PostgreSQL Config File and Query Design for Calculations

I'm new to PG and trying to run this simple query, and it's crashing Postgres. The query works in a few seconds if I only try to calculate r1, but says "Out of memory" if I try to calculate r2 through r6 in addition to r1, as below. Poor query design? I'm going to reference the calculated fields r1...r6 in other calculations, so I was thinking about making this query a view. My key config file parameters are below. Windows 10, PG 9.6, 40 GB RAM, 64-bit. Any ideas on what to do? Thanks!
Edit: I tried adding LIMIT 500 to the end and that worked, but if I run a query on this query, i.e. to use the calculated r1, r2, r3, ... in another query, will the new query see all the records or will it be limited to just the 500?
SELECT
public.psda.price_y1,
public.psda.price_y2,
public.psda.price_y3,
public.psda.price_y4,
public.psda.price_y5,
public.psda.price_y6,
public.psda.price_y7,
(price_y1 - price_y2) / nullif(price_y2, 0) AS r1,
(price_y2 - price_y3) / nullif(price_y3, 0) AS r2,
(price_y3 - price_y4) / nullif(price_y4, 0) AS r3,
(price_y4 - price_y5) / nullif(price_y5, 0) AS r4,
(price_y5 - price_y6) / nullif(price_y6, 0) AS r5,
(price_y6 - price_y7) / nullif(price_y7, 0) AS r6
FROM
public.psda
My config file parameters:
max_connections = 50
shared_buffers = 1GB
effective_cache_size = 20GB
work_mem = 400MB
maintenance_work_mem = 1GB
wal_buffers = 16MB
max_wal_size = 2GB
min_wal_size = 1GB
checkpoint_completion_target = 0.7
default_statistics_target = 100

Use this service: http://pgtune.leopard.in.ua/
When calculating the total RAM for PostgreSQL, use: total RAM - RAM for the OS.
PGTune calculates a PostgreSQL configuration aimed at maximum performance for a given hardware configuration.
It isn't a silver bullet for PostgreSQL's optimization settings.
Many settings depend not only on the hardware configuration, but also on the size of the database, the number of clients and the complexity of the queries, so the database can only be configured optimally when all of these parameters are taken into account.
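On the follow-up about LIMIT and views: a LIMIT baked into a view definition applies to every query built on top of that view, so downstream queries would only ever see those 500 rows. Leave the LIMIT out of the view and use it only when browsing results interactively. A minimal sketch, assuming the table and columns from the question (the view name psda_returns is made up for illustration):

-- Hypothetical view exposing the per-year returns; no LIMIT here,
-- so queries on top of it see every row of public.psda.
CREATE VIEW public.psda_returns AS
SELECT
    price_y1, price_y2, price_y3, price_y4, price_y5, price_y6, price_y7,
    (price_y1 - price_y2) / nullif(price_y2, 0) AS r1,
    (price_y2 - price_y3) / nullif(price_y3, 0) AS r2,
    (price_y3 - price_y4) / nullif(price_y4, 0) AS r3,
    (price_y4 - price_y5) / nullif(price_y5, 0) AS r4,
    (price_y5 - price_y6) / nullif(price_y6, 0) AS r5,
    (price_y6 - price_y7) / nullif(price_y7, 0) AS r6
FROM public.psda;

-- Other calculations reference r1..r6 directly; use LIMIT only where you inspect output.
SELECT price_y1, r1, r2, r3 FROM public.psda_returns WHERE r1 > 0 LIMIT 500;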

High autovacuum_vacuum_cost_limit - why?

Update 2022-11-10: I have opened a case with AWS for this one and will let you know here once they have responded.
Postgres 12.9 AWS Managed on db.r5.4xlarge which has 64GB RAM.
autovacuum_vacuum_cost_limit is at 1800:
select setting from pg_settings where name = 'autovacuum_vacuum_cost_limit'; -- 1800
Parameter group in AWS Console:
autovacuum_vacuum_cost_limit: GREATEST({log(DBInstanceClassMemory/21474836480)*600},200)
rds.adaptive_autovacuum: 1
Calculation for autovacuum_vacuum_cost_limit IMHO:
64 Gigabytes = 68,719,476,736 Bytes
GREATEST({log(68719476736/21474836480)*600},200)
GREATEST({log(3.2)*600},200)
GREATEST({0.50514997832*600},200)
GREATEST(303.089986992,200)
The CloudWatch metric MaximumUsedTransactionIDs hovers around 200 million, and many tables are close to 200 million.
So autovacuum_vacuum_cost_limit should be 303 IMHO? Why is autovacuum_vacuum_cost_limit at 1800? I would expect 303. I think I can rule out manual intervention. Thank you.
Here is the answer from AWS. In short, the explanation is twofold:
by "log" they mean "log base 2" and not "log base 10"
they are rounding at one point
The autovacuum_vacuum_cost_limit formula is GREATEST({log(DBInstanceClassMemory/21474836480)*600},200).
The log in the above formula is base 2, and the log value is rounded off before it is multiplied by 600.
As your instance type is r5.2xlarge, the instance class memory is 64 GB.
DBInstanceClassMemory= 64 GiB =68719476736 bytes
Therefore, the following calculation is used to calculate the autovacuum_vacuum_cost_limit value:
GREATEST({log(68719476736/21474836480)*600},200)
= GREATEST({log(3.2)*600},200) --> log base 2 of 3.2 is 1.678, which is rounded off to 2
= GREATEST({2*600},200)
= GREATEST({1200},200)
= 1200
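To double-check that arithmetic, the same formula can be reproduced in psql using PostgreSQL's two-argument log (the byte values are the ones from the question; the explicit round() mirrors the rounding step AWS describes):

SELECT GREATEST(round(log(2, 68719476736::numeric / 21474836480)) * 600, 200);
-- log(2, 3.2) ≈ 1.678, round() -> 2, 2 * 600 = 1200, GREATEST(1200, 200) = 1200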

Is Spark filter/predicate pushdown not working as intended for ORC files?

While "spark.sql.orc.filterPushdown" is equal to false (by default). Following statement took 3 minutes to execute.
val result = spark.read.schema(schema).orc("s3a://......./*")
result.select("a","b").where(col("a")===1318138224).explain(extended = true)
result.select("a","b").where(col("a")===1318138224).show()
The physical plan says: PushedFilters: [IsNotNull(a), EqualTo(a,1318138224)]
So even though filterPushdown is disabled by default, the PushedFilters entry made me think Spark was somehow pushing the filters down.
But after setting spark.sql.orc.filterPushdown to true, the same code snippet took about 30 seconds. The weird part is that the physical plan was the same.
So I looked at the "Stages" section in the Spark UI and the input sizes were different.
spark.conf.set("spark.sql.orc.filterPushdown", false)
spark.conf.set("spark.sql.orc.filterPushdown", true)
So I get the feeling that when reading ORC files, even if PushedFilters in the physical plan is filled with some parameters (not empty), that doesn't mean Spark actually pushes down the predicates/filters?
Or is there a point that I'm missing?
The query plan is not affected by the filterPushdown config (for either Parquet or ORC). Spark always tries to push filters down to the source; the config controls whether the source is allowed to apply the data skipping. Sometimes, even when the config is set to true, there are cases where data skipping does not happen (due to bugs, incorrect column stats, or unsupported types).
You can also check "number of output rows" in the SQL tab of the web UI.
It is the number of rows read after predicate pushdown and partition pruning.
Scan parquet
number of files read: 1
scan time total (min, med, max): 18.7 s (12 ms, 214 ms, 841 ms)
metadata time: 0 ms
size of files read: 783.9 MiB
number of output rows: 100,000,000
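A small Scala sketch of that comparison (it reuses the schema, path and column from the question, wrapped in a hypothetical helper): the physical plan prints PushedFilters either way, so the meaningful signal is the runtime and the scan metrics in the web UI, not the plan.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().appName("orc-pushdown-check").getOrCreate()

// Hypothetical helper: toggle the config, run the same filtered read, and time it.
def runWithPushdown(enabled: Boolean): Unit = {
  spark.conf.set("spark.sql.orc.filterPushdown", enabled)
  val result = spark.read.schema(schema).orc("s3a://......./*")
  val filtered = result.select("a", "b").where(col("a") === 1318138224L)
  filtered.explain()                      // PushedFilters shows up regardless of the setting
  val start = System.nanoTime()
  filtered.show()
  println(s"filterPushdown=$enabled took ${(System.nanoTime() - start) / 1e9} s")
}

runWithPushdown(enabled = false)  // reads far more data from the ORC files
runWithPushdown(enabled = true)   // stripe-level data skipping kicks in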

PromQL Query Variable Not Returning Data But Works When Hardcoded

I ran into a problem where CPU was showing up higher than 100%, and the solution was to divide it by the number of CPUs available.
The following worked on a gauge panel (using [ORIGINAL QUERY] / sum(machine_cpu_cores)):
sum(sum by (container_name)( rate(container_cpu_usage_seconds_total[1m] ) ) / count(node_cpu_seconds_total{mode="system"}) * 100 / sum(machine_cpu_cores)
On a graph, the following returns no data when using [ORIGINAL QUERY] / sum(machine_cpu_cores):
sum(rate(container_cpu_usage_seconds_total{name=~".+"}[$interval])) by (name) * 100 / sum(machine_cpu_cores)
However, the following returns the expected data when hardcoding the value ([ORIGINAL QUERY] / 12):
sum(rate(container_cpu_usage_seconds_total{name=~".+"}[$interval])) by (name) * 100 / 12
What am I missing here?
I expect you have issues with the label sets, or more likely with cardinality.
Display each part of the expression independently in the Prometheus console and check that the labels are the same (i.e. matching label sets) and that you have an equivalent number of elements on each side (the same cardinality for a metric).
If not, you may have to add vector matching expressions to your query.
This is especially true for the one-to-many matching in the latter PromQL query, where you would need a / group_left sum(machine_cpu_cores).
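A sketch of what that could look like with the metric and label names from the question (note that group_left also needs an on()/ignoring() clause; an empty on() works here because the right-hand sum carries no labels):

sum(rate(container_cpu_usage_seconds_total{name=~".+"}[$interval])) by (name) * 100
  / on() group_left()
sum(machine_cpu_cores)

An alternative that avoids vector matching entirely is to turn the divisor into a scalar: ... * 100 / scalar(sum(machine_cpu_cores)).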

How to handle data skew in a Spark data frame for an outer join

I have two data frames and I am performing an outer join on 5 columns.
Below is an example of my data set.
uniqueFundamentalSet|^|PeriodId|^|SourceId|^|StatementTypeCode|^|StatementCurrencyId|^|FinancialStatementLineItem.lineItemId|^|FinancialAsReportedLineItemName|^|FinancialAsReportedLineItemName.languageId|^|FinancialStatementLineItemValue|^|AdjustedForCorporateActionValue|^|ReportedCurrencyId|^|IsAsReportedCurrencySetManually|^|Unit|^|IsTotal|^|StatementSectionCode|^|DimentionalLineItemId|^|IsDerived|^|EstimateMethodCode|^|EstimateMethodNote|^|EstimateMethodNote.languageId|^|FinancialLineItemSource|^|IsCombinedItem|^|IsExcludedFromStandardization|^|DocByteOffset|^|DocByteLength|^|BookMark|^|ItemDisplayedNegativeFlag|^|ItemScalingFactor|^|ItemDisplayedValue|^|ReportedValue|^|EditedDescription|^|EditedDescription.languageId|^|ReportedDescription|^|ReportedDescription.languageId|^|AsReportedInstanceSequence|^|PhysicalMeasureId|^|FinancialStatementLineItemSequence|^|SystemDerivedTypeCode|^|AsReportedExchangeRate|^|AsReportedExchangeRateSourceCurrencyId|^|ThirdPartySourceCode|^|FinancialStatementLineItemValueUpperRange|^|FinancialStatementLineItemLocalLanguageLabel|^|FinancialStatementLineItemLocalLanguageLabel.languageId|^|IsFinal|^|FinancialStatementLineItem.lineItemInstanceKey|^|StatementSectionIsCredit|^|CapitalChangeAdjustmentDate|^|ParentLineItemId|^|EstimateMethodId|^|StatementSectionId|^|SystemDerivedTypeCodeId|^|UnitEnumerationId|^|FiscalYear|^|IsAnnual|^|PeriodPermId|^|PeriodPermId.objectTypeId|^|PeriodPermId.objectType|^|AuditID|^|AsReportedItemId|^|ExpressionInstanceId|^|ExpressionText|^|FFAction|!|
192730239205|^|235|^|1|^|FTN|^|500186|^|221|^|Average Age of Employees|^|505074|^|30.00000|^||^||^|False|^|1.00000|^|False|^|EMP|^||^|False|^|ARV|^||^|505074|^||^|False|^|False|^||^||^||^||^|0|^||^||^||^|505074|^||^|505074|^||^||^|122880|^|NA|^||^||^|TK |^||^||^|505126|^|True|^|1235002211206722736|^|True|^||^||^|3019656|^|3013652|^|3019679|^|1010066|^|1976|^|True|^||^|1000220295|^||^||^||^||^||^|I|!|
192730239205|^|235|^|1|^|FTN|^|500186|^|498|^|Shareholders' Equity Per Share|^|505074|^|91.37000|^|678.74654|^|500186|^|False|^|1.00000|^|False|^|TAN|^||^|False|^|ARV|^||^|505074|^||^|False|^|False|^||^||^||^||^|0|^||^||^||^|505074|^||^|505074|^||^||^|474880|^|NA|^||^||^|TK |^||^||^|505126|^|True|^|1235004981302988315|^|True|^||^||^|3019656|^|3013751|^|3019679|^|1010066|^|1976|^|True|^||^|1000220295|^||^||^||^||^||^|I|!|
192730239205|^|235|^|1|^|FTN|^|500186|^|500|^|Number of Shares Outstanding at Period End-Common Shares|^|505074|^|90000000.00000|^|12115420.96161|^||^|False|^|1000.00000|^|False|^|TAN|^||^|False|^|ARV|^||^|505074|^||^|False|^|False|^||^||^||^||^|3|^||^||^||^|505074|^||^|505074|^||^||^|499712|^|NA|^||^||^|TK |^||^||^|505126|^|True|^|1235005001178855709|^|True|^||^||^|3019656|^|3013751|^|3019679|^|1010067|^|1976|^|True|^||^|1000220295|^||^||^||^||^||^|I|!|
192730239205|^|235|^|1|^|FTN|^|500186|^|562|^|Number of Employees|^|505074|^|2924.00000|^||^||^|False|^|1.00000|^|False|^|EMP|^||^|False|^|ARV|^||^|505074|^||^|False|^|False|^||^||^||^||^|0|^||^||^||^|505074|^||^|505074|^||^||^|464864|^|NA|^||^||^|TK |^||^||^|505126|^|True|^|1235005621461877526|^|True|^||^||^|3019656|^|3013652|^|3019679|^|1010066|^|1976|^|True|^||^|1000220295|^||^||^||^||^||^|I|!|
192730239205|^|235|^|1|^|FTN|^|500186|^|655|^|Total number of shareholders|^|505074|^|11792.00000|^||^||^|False|^|1.00000|^|False|^|OTH|^||^|False|^|ARV|^||^|505074|^||^|False|^|False|^||^||^||^||^|0|^||^||^||^|505074|^||^|505074|^||^||^|466927|^|NA|^||^||^|TK |^||^||^|505126|^|True|^|1235006551335570418|^|True|^||^||^|3019656|^|3013716|^|3019679|^|1010066|^|1976|^|True|^||^|1000220295|^||^||^||^||^||^|I|!|
192730239205|^|235|^|1|^|FTN|^|500186|^|657|^|Total dividends paid (common stock)|^|505074|^|540000000.00000|^||^|500186|^|False|^|1000000.00000|^|False|^|OTH|^||^|False|^|ARV|^||^|505074|^||^|False|^|False|^||^||^||^||^|6|^||^||^||^|505074|^||^|505074|^||^||^|233463|^|NA|^||^||^|TK |^||^||^|505126|^|True|^|12350065712483219|^|True|^||^||^|3019656|^|3013716|^|3019679|^|1010068|^|1976|^|True|^||^|1000220295|^||^||^||^||^||^|I|!|
192730239205|^|235|^|1|^|FTN|^|500186|^|1452|^|Order received|^|505074|^|26936000000.00000|^||^|500186|^|False|^|1000000.00000|^|False|^|OTH|^||^|False|^|ARV|^||^|505074|^||^|False|^|False|^||^||^||^||^|6|^||^||^||^|505074|^||^|505074|^||^||^|350195|^|NA|^||^||^|TK |^||^||^|505126|^|True|^|1235014521608462544|^|True|^||^||^|3019656|^|3013716|^|3019679|^|1010068|^|1976|^|True|^||^|1000220295|^||^||^||^||^||^|I|!|
192730239205|^|235|^|1|^|FTN|^|500186|^|1453|^|Order backlogs|^|505074|^|1447000000.00000|^||^|500186|^|False|^|1000000.00000|^|False|^|OTH|^||^|False|^|ARV|^||^|505074|^||^|False|^|False|^||^||^||^||^|6|^||^||^||^|505074|^||^|505074|^||^||^|350195|^|NA|^||^||^|TK |^||^||^|505126|^|True|^|1235014531922884465|^|True|^||^||^|3019656|^|3013716|^|3019679|^|1010068|^|1976|^|True|^||^|1000220295|^||^||^||^||^||^|I|!|
192730239205|^|235|^|1|^|FTN|^|500186|^|1457|^|Export amount|^|505074|^|3924000000.00000|^||^|500186|^|False|^|1000000.00000|^|False|^|OTH|^||^|False|^|ARV|^||^|505074|^||^|False|^|False|^||^||^||^||^|6|^||^||^||^|505074|^||^|505074|^||^||^|291829|^|NA|^||^||^|TK |^||^||^|505126|^|True|^|1235014571728332413|^|True|^||^||^|3019656|^|3013716|^|3019679|^|1010068|^|1976|^|True|^||^|1000220295|^||^||^||^||^||^|I|!|
192730239205|^|235|^|1|^|FTN|^|500186|^|1459|^|Capital expenditures (Note)|^|505074|^|659000000.00000|^||^|500186|^|False|^|1000000.00000|^|False|^|OTH|^||^|False|^|ARV|^||^|505074|^||^|False|^|False|^||^||^||^||^|6|^||^||^||^|505074|^||^|505074|^||^||^|350195|^|NA|^||^||^|TK |^||^||^|505126|^|True|^|1235014591148256870|^|True|^||^||^|3019656|^|3013716|^|3019679|^|1010068|^|1976|^|True|^||^|1000220295|^||^||^||^||^||^|I|!|
192730239285|^|236|^|1|^|FTN|^|500186|^|255|^|Number of Employees|^|505074|^|10152.00000|^||^||^|False|^|1.00000|^|False|^|EMP|^||^|False|^|ARV|^||^|505074|^||^|False|^|False|^||^||^||^||^|0|^||^||^||^|505074|^||^|505074|^||^||^|12288|^|NA|^||^||^|TK |^||^||^|505126|^|True|^|1236002551128894330|^|True|^||^||^|3019656|^|3013652|^|3019679|^|1010066|^|1976|^|True|^||^|1000220295|^||^||^||^||^||^|I|!|
192730239285|^|236|^|1|^|FTN|^|500186|^|256|^|Average Age of Employees|^|505074|^|34.00000|^||^||^|False|^|1.00000|^|False|^|EMP|^||^|False|^|ARV|^||^|505074|^||^|False|^|False|^||^||^||^||^|0|^||^||^||^|505074|^||^|505074|^||^||^|122880|^|NA|^||^||^|TK |^||^||^|505126|^|True|^|1236002561111316467|^|True|^||^||^|3019656|^|3013652|^|3019679|^|1010066|^|1976|^|True|^||^|1000220295|^||^||^||^||^||^|I|!|
192730239285|^|236|^|1|^|FTN|^|500186|^|542|^|Shareholders' Equity Per Share|^|505074|^|160.20000|^|691.93184|^|500186|^|False|^|1.00000|^|False|^|TAN|^||^|False|^|ARV|^||^|505074|^||^|False|^|False|^||^||^||^||^|0|^||^||^||^|505074|^||^|505074|^||^||^|471038|^|NA|^||^||^|TK |^||^||^|505126|^|True|^|1236005421170597389|^|True|^||^||^|3019656|^|3013751|^|3019679|^|1010066|^|1976|^|True|^||^|1000220295|^||^||^||^||^||^|I|!|
192730239285|^|236|^|1|^|FTN|^|500186|^|545|^|Number of Shares Outstanding at Period End-Common Shares|^|505074|^|679468000.00000|^|157314300.64243|^||^|False|^|1000.00000|^|False|^|TAN|^||^|False|^|ARV|^||^|505074|^||^|False|^|False|^||^||^||^||^|3|^||^||^||^|505074|^||^|505074|^||^||^|472064|^|NA|^||^||^|TK |^||^||^|505126|^|True|^|1236005451445165969|^|True|^||^||^|3019656|^|3013751|^|3019679|^|1010067|^|1976|^|True|^||^|1000220295|^||^||^||^||^||^|I|!|
192730239285|^|236|^|1|^|FTN|^|500186|^|718|^|Total dividends paid (common stock)|^|505074|^|4750000000.00000|^||^|500186|^|False|^|1000000.00000|^|False|^|OTH|^||^|False|^|ARV|^||^|505074|^||^|False|^|False|^||^||^||^||^|6|^||^||^||^|505074|^||^|505074|^||^||^|458752|^|NA|^||^||^|TK |^||^||^|505126|^|True|^|1236007181118043352|^|True|^||^||^|3019656|^|3013716|^|3019679|^|1010068|^|1976|^|True|^||^|1000220295|^||^||^||^||^||^|I|!|
192730239285|^|236|^|1|^|FTN|^|500186|^|1364|^|Export amount|^|505074|^|15379000000.00000|^||^|500186|^|False|^|1000000.00000|^|False|^|OTH|^||^|False|^|ARV|^||^|505074|^||^|False|^|False|^||^||^||^||^|6|^||^||^||^|505074|^||^|505074|^||^||^|459752|^|NA|^||^||^|TK |^||^||^|505126|^|True|^|1236013641649895533|^|True|^||^||^|3019656|^|3013716|^|3019679|^|1010068|^|1976|^|True|^||^|1000220295|^||^||^||^||^||^|I|!|
192730239285|^|236|^|1|^|FTN|^|500186|^|1407|^|Total number of shareholders|^|505074|^|57288.00000|^||^||^|False|^|1.00000|^|False|^|OTH|^||^|False|^|ARV|^||^|505074|^||^|False|^|False|^||^||^||^||^|0|^||^||^||^|505074|^||^|505074|^||^||^|460752|^|NA|^||^||^|TK |^||^||^|505126|^|True|^|1236014071623011361|^|True|^||^||^|3019656|^|3013716|^|3019679|^|1010066|^|1976|^|True|^||^|1000220295|^||^||^||^||^||^|I|!|
The structure of the second data set is the same.
I am joining on the first 5 columns.
As you can see, the combination of the first 5 columns does not give me enough distinct keys, and that leads to data skew.
The Spark job gets stuck on some of the executors.
The size of the first dataset is 270 GB and the second is 5 GB, but the sizes are expected to increase.
Total number of partitions: 1128.
This is how I perform my join
val dfMainOutput = dataMain
  .join(latestForEachKey,
    Seq("uniqueFundamentalSet", "PeriodId", "SourceId", "StatementTypeCode",
        "StatementCurrencyId", "FinancialStatementLineItem_lineItemId"),
    "outer")
  .select(exprsExtended: _*)
  .filter(!$"FFAction|!|".contains("D|!|"))
I tried implementing a broadcast join, but it had no impact.
So in this case, can I use salting or hashing on the join keys so that the join keys become randomized and the skew does not occur?
Here are my query and app details.
Here are the cluster details while we are loading the data.
And here are the cluster details when most of the containers are idle.
Adding the details of the tasks, where some executors run 10 and some only 3 to 4.
Please consider the following points:
1) Since you have 60 executors and 10 cores per executor, you should have at least 60 x 10 = 600 partitions.
2) In your case you have 270 GB / 1128 ≈ 241 MB, which is approximately the partition size; that looks quite big to me (considering the data exchange during shuffling). First try to repartition to something more realistic, for instance 8K or even 16K partitions.
3) Since I cannot clearly see how many executors participate in the job execution, you need to check it again, figure out the exact number of participating executors and whether the data is equally distributed. If the data deviation between executors is low, then your data is well distributed; otherwise you are facing skew.
4) If the skew persists after repartitioning, try to redistribute the join keys (salting), as described here and sketched below.
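A minimal salting sketch, assuming dataMain is the large skewed side and latestForEachKey is the small side (the salt factor of 16 and the "salt" column name are illustrative assumptions). Caveat: with a full outer join, unmatched rows from the replicated small side come out once per salt value and need deduplication afterwards; for inner or left joins this caveat does not apply.

import org.apache.spark.sql.functions._

val SALT = 16
val keyCols = Seq("uniqueFundamentalSet", "PeriodId", "SourceId",
                  "StatementTypeCode", "StatementCurrencyId",
                  "FinancialStatementLineItem_lineItemId")

// Large, skewed side: attach a random salt in [0, SALT) so hot keys spread across more partitions.
val bigSalted = dataMain.withColumn("salt", (rand() * SALT).cast("int"))

// Small side: replicate every row once per salt value so each salted key still finds its match.
val smallSalted = latestForEachKey
  .withColumn("salt", explode(array((0 until SALT).map(lit): _*)))

val dfSalted = bigSalted
  .join(smallSalted, keyCols :+ "salt", "outer")
  .drop("salt")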

Sphinx composite (distributed) big indexes

I'm experiencing a problem with indexing a lot of content data and am searching for a suitable solution.
The logic is the following:
A robot uploads content to the database every day.
The Sphinx index must reindex only the new (daily) data, i.e. the previous content is never changed.
Sphinx delta indexing is exactly the right solution for this, but with too much content the error appears: too many string attributes (the current index format allows up to 4 GB).
Distributed indexing seems usable, but how can the indexed data be added and split dynamically (without dirty hacks)?
I.e.: on day 1 there are 10000 rows in total, on day 2 there are 20000 rows, and so on. The index throws the >4GB error at about 60000 rows.
The expected index flow: on days 1-5 there is 1 index (distributed or not, it doesn't matter), on days 6-10 there is 1 distributed (composite) index (50000 + 50000 rows), and so on.
The question is: how do I fill the distributed index dynamically?
Daily iteration sample:
main index
chunk1 - 50000 rows
chunk2 - 50000 rows
chunk3 - 35000 rows
delta index
10000 new rows
rotate "delta"
merge "delta" into "main"
Please advise.
Thanks to @barryhunter.
RT indexes are the solution here.
A good manual is here: https://www.sphinxconnector.net/Tutorial/IntroductionToRealTimeIndexes
I've tested match queries on 3 000 000 000 letters. The speed is close to the same as for the "plain" index type. The total index size on HDD is about 2 GB.
Populating the Sphinx RT index:
CPU usage: ~50% of 1 core (out of 8 cores),
RAM usage: ~0.5% of 32 GB; speed: as quick as a usual select/insert (mostly depends on using batch inserts vs. row-by-row).
NOTE:
"SELECT MAX(id) FROM sphinx_index_name" will produce the error "fullscan requires extern docinfo", and setting docinfo = extern will not solve this. So simply keep the counter in a MySQL table (as for a Sphinx delta index: http://sphinxsearch.com/docs/current.html#delta-updates).