Postgres: Out of memory - postgresql

I have 3 GB of RAM assigned to my VirtualBox VM. I'm doing a similarity check on a column of a table with about 3885 entries, so roughly 3885 * 3885 comparisons in total. But I get this error:
[23636.611505] Out of memory: Kill process 987 (postgres) score 848 or sacrifice child
[23636.612762] Killed process 987 (postgres) total-vm:5404836kB, anon-rss:2772756kB, file-rss:828kB
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request
The connection to the server was lost. Attempting reset: Succeeded.
I still have no idea why 15,093,225 comparisons cause "Out of memory" on 3 GB of RAM. Any solutions?
Edit: I increased
shared_buffers from 128MB to 1000MB
work_mem from 4MB to 50MB
but I still get the same error.
Edit 2: I ran EXPLAIN and got the same error. The join is a Cartesian product, because I have to compare the sentences stored in each row against all the others.
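For reference, the kind of query that produces this all-pairs comparison looks roughly like the sketch below. The table and column names (sentences, sentence) and the use of pg_trgm's similarity() are assumptions for illustration, not taken from the original post:
-- Hypothetical reconstruction of the all-pairs similarity check
-- (assumes the pg_trgm extension and a table sentences(id, sentence))
CREATE EXTENSION IF NOT EXISTS pg_trgm;

SELECT a.id AS id_a,
       b.id AS id_b,
       similarity(a.sentence, b.sentence) AS sim
FROM sentences a
CROSS JOIN sentences b   -- 3885 * 3885 = 15,093,225 row pairs
ORDER BY sim DESC;
A threshold filter such as WHERE similarity(a.sentence, b.sentence) > 0.4 would shrink the result set that has to be held and sorted in memory, although it does not reduce the number of comparisons.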

Related

sql bulk insert never completes for 10 million records when using df.bulkCopyToSqlDB on databricks

I am reading a 1 GB CSV file (record count: 10 million, columns: 13) and trying to dump it into SQL Server. Below are the infra details:
CSV file location: Azure Blob Storage
Code: Spark + Scala
Cluster: Databricks
Size:
Code used to read the file and dump it:
val df = spark.read
  .format(fileparser_config("fileFormat").as[String])
  .option("header", fileparser_config("IsFirstRowHeader").toString)
  .load(fileparser_config("FileName").as[String])
  .withColumn("_ID", monotonically_increasing_id)
val bulkCopyConfig = Config(Map(
  "url" -> connConfig("dataSource").as[String],
  "databaseName" -> connConfig("dbName").as[String],
  "user" -> connConfig("userName").as[String],
  "password" -> connConfig("password").as[String],
  "dbTable" -> tableName,
  "bulkCopyBatchSize" -> "500000",
  "bulkCopyTableLock" -> "true",
  "bulkCopyTimeout" -> "600"))
println(s" ${LocalDateTime.now()} ************ sql bulk insert start ************")
df.bulkCopyToSqlDB(bulkCopyConfig)
println(s" ${LocalDateTime.now()} ************ sql bulk insert end ************")
Problem :
The cluster goes into limbo and my job never completes. One time, when it ran long enough, it threw an error:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 13 in stage 38.0 failed 4 times, most recent failure: Lost task 13.3 in stage 38.0 (TID 1532, 10.0.6.6, executor 4): com.microsoft.sqlserver.jdbc.SQLServerException: The connection is closed.\n\tat com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDriverError(SQLServerException.java:227)\n\tat com.microsoft.sqlserver.jdbc.SQLServerConnection.checkClosed(SQLServerConnection.java:796)\n\tat com.microsoft.sqlserver.jdbc.SQLServ
Cluster Event logs:
Other observations:
While the job runs for a very long time, the cluster is not completely unresponsive. I tested this by submitting more jobs in that same window. The jobs ran, but took considerably more time than usual (around 10x).
I tried increasing the number of worker nodes and the node type (even chose 128 GB nodes), but the outcome was still the same.
While the job was running, I tried checking the SQL table row count with a NOLOCK query. When I ran it 3-4 minutes into the job, it showed around 2 million records in the table; when I ran it again after 10 minutes, the query kept running forever and never returned.
I have tried tweaking the bulkCopyBatchSize property, but it hasn't helped much.
I tried removing the SQL insertion code and running an aggregation on the dataframe created from the 1 GB file instead; the whole thing takes only 40-50 seconds, so the problem is only with the SQL driver / SQL Server.
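The row-count check mentioned above was presumably something along these lines (the table name is a placeholder, not from the original post):
-- Hypothetical dirty-read row count while the bulk load is running
SELECT COUNT(*) FROM dbo.TargetTable WITH (NOLOCK);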
I was facing the same issue.
Azure SQL Server - Standard S7: 800 DTUs
HDInsight - 6 node (2 D13V2 Head and 4 D13V2 Worker)
Data Size - 100GB Parquet with 1.7 billion rows.
Initially I was using a "bulkCopyTimeout" of 600 seconds, and I observed the load restarting after the timeout passed. Then I changed the timeout to a very large value and it worked fine.
For performance improvement:
Create a columnstore index on the target table and use
"bulkCopyBatchSize" = 1048576 (loads each batch at the row group's maximum capacity and compresses it directly into the columnstore, rather than loading into the delta store and compressing later)
"bulkCopyTableLock" = "false" (to allow parallelism)

error in postgres (connection limit exceeded for non-superusers)

We got an error from postgres one day.
"connection limit exceeded for non-superusers"
In postgresql.conf, max_connections is set to 100.
At that time I checked the activity with the command (select * from pg_stat_activity;)
and the result showed only 17 connections.
We have used this application for almost 10 years and never changed anything.
This is the first time we have received this kind of error.
So I assume that "not closing the connections properly in the program"
is not the cause of this error.
Any tips?
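When this happens again, a generic diagnostic along these lines (not from the original question) can show what is actually occupying the connection slots and whether any per-role limit applies:
-- Sessions per database, user and state (idle sessions still hold a slot)
SELECT datname, usename, state, count(*)
FROM pg_stat_activity
GROUP BY datname, usename, state
ORDER BY count(*) DESC;

-- Per-role connection limits (-1 means unlimited)
SELECT rolname, rolconnlimit
FROM pg_roles
WHERE rolconnlimit <> -1;

-- Server-wide limits
SHOW max_connections;
SHOW superuser_reserved_connections;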

Apache solr 5.3.1 out of memory

I'm new to Solr, and I've been struggling for a few days to run a full import from a PostgreSQL 9.4 DB on an entity with about 117,000,000 entries.
I'm using Solr 5.3.1 on Windows 7 x64 with 16 GB of RAM. I'm not intending to use this machine as a server; it's just some kind of prototyping I'm at.
I kept getting this error on a JDK x86, just starting Solr with solr start without any options. Then I tried:
solr start -m 2g, which results in Solr not coming up at all
solr start -m 1g makes Solr start, but after indexing about 87,000,000 entries it dies with an out-of-memory error.
It is exactly the same point at which it dies without any options, though at the admin dashboard I can see the JVM heap is full.
So, since Solr warns me anyway to use an x64 JDK, I did, and now use 8u65. I started Solr with a 4g heap and started the full import again. Again it threw the same exception after 87,000,000 entries. But the heap isn't even full (42%), and neither is RAM or swap.
Does anyone have an idea what could be the reason for this behaviour?
Here is my data-config:
<dataConfig>
  <dataSource
    type="JdbcDataSource"
    driver="org.postgresql.Driver"
    url="jdbc:postgresql://localhost:5432/dbname"
    user="user"
    password="secret"
    readOnly="true"
    autoCommit="false"
    transactionIsolation="TRANSACTION_READ_COMMITTED"
    holdability="CLOSE_CURSORS_AT_COMMIT" />
  <entity name="hotel"
    query="select * from someview;"
    deltaImportQuery="select * from someview where solr_id = '${dataimporter.delta.id}'"
    deltaQuery="select * from someview where changed > '${dataimporter.last_index_time}';">
    <field name="id" column="id"/>
... etc for all 84 columns
In solrconfig.xml I have defined a RequestProcessorChain to generate a unique key while indexing, which seems to work.
In schema.xml there are again 84 columns with type, indexed and other attributes.
Here is the exception I'm getting. The messages are in German: "Speicher aufgebraucht" means "out of memory" and "Fehler bei Anfrage mit Größe 48" means "failure on request of size 48".
getNext() failed for query 'select * from someview;':org.apache.solr.handler.dataimport.DataImportHandlerException: org.postgresql.util.PSQLException: FEHLER: Speicher aufgebraucht
Detail: Fehler bei Anfrage mit Größe 48.
at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:62)
at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.hasnext(JdbcDataSource.java:416)
at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.access$500(JdbcDataSource.java:296)
at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator$1.hasNext(JdbcDataSource.java:331)
at org.apache.solr.handler.dataimport.EntityProcessorBase.getNext(EntityProcessorBase.java:132)
at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:74)
at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:475)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:414)
at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:329)
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:232)
at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:416)
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:480)
at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:461)
Caused by: org.postgresql.util.PSQLException: FEHLER: Speicher aufgebraucht
Detail: Fehler bei Anfrage mit Größe 48.
at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2182)
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1911)
at org.postgresql.core.v3.QueryExecutorImpl.fetch(QueryExecutorImpl.java:2113)
at org.postgresql.jdbc2.AbstractJdbc2ResultSet.next(AbstractJdbc2ResultSet.java:1964)
at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.hasnext(JdbcDataSource.java:408)
... 12 more
Thank you in advance
As pointed out by MatsLindh, it was a JDBC error. Meanwhile I worked with Hibernate Search and experienced the same error at exactly the same point (near 87,000,000 indexed entities). The trick was to commit more often.
So in this case I tried several things at once and it worked (I don't know which option exactly did the trick):
1. Set maxDocs for autoCommit in solrconfig.xml to 100,000. I believe the default is to commit only after something like 15 seconds with no new documents being added, which during a full import never happens, so the heap keeps filling up (see the config sketch below).
2. Set batchSize for the PostgreSQL JDBC driver to 100 (the default is 500).
3. Changed the evil 'select * from table' to 'select c1, c2, ..., c85 from table'
4. Updated the JDBC Driver from 9.4.1203 to 9.4.1207
5. Updated Java to 1.8u74
I think it worked due to 1 and/or 3; I will do some further testing and update my post.
While I was trying the indexing with Hibernate Search, I could see that the RAM allocated by the PostgreSQL server was freed at each commit, so RAM was never an issue again. That didn't happen here, and the DB server was at 85 GB of RAM in the end, but it kept on working.
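For points 1 and 2 above, the relevant settings look roughly like this (only a sketch; the values are the ones mentioned in the list):
In solrconfig.xml, inside <updateHandler>:
<autoCommit>
  <maxDocs>100000</maxDocs> <!-- hard commit every 100,000 documents -->
</autoCommit>
In the data-config shown in the question, batchSize goes on the dataSource element:
<dataSource type="JdbcDataSource" batchSize="100" ... />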

PostgreSQL performance tuning under heavy load

My Server has following resources :
[postgres@srv2813 ~]$ free -m
             total       used       free     shared    buffers     cached
Mem:         15929      15118        810        142         12        219
-/+ buffers/cache:      14885       1043
Swap:         8031       2007       6024
[postgres@srv2813 ~]$ cat /proc/cpuinfo | grep processor | wc -l
8
[root@srv2813 postgres]# sysctl kernel.shmall
kernel.shmall = 4194304
[root@srv2813 postgres]# sysctl kernel.shmmax
kernel.shmmax = 17179869184
and my PostgreSQL conf:
default_statistics_target = 100
maintenance_work_mem = 1GB
checkpoint_completion_target = 0.9
effective_cache_size = 12GB
work_mem = 32MB
wal_buffers = 16MB
shared_buffers = 3840MB
max_connections = 500
fsync = off
temp_buffers=32MB
But it's getting a "too many connections" error. The nginx_status page of the web server shows around 500 active connections when this happens. The server hosts an API server, so every HTTP request invariably initiates a database read. It's not write-heavy, but it is very read-heavy.
It's possible that I have maxed out our server, but I still expected a little more from a 16 GB / 8-core box, considering the read-only nature of the application. Can I push PostgreSQL in any other direction?
PostgreSQL is process-based rather than thread-based, so it does not generally work well with a lot of connections.
I would look at using something like PgBouncer, a lightweight connection pooler for PostgreSQL.
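A minimal pgbouncer.ini sketch for this kind of read-heavy API workload might look like the following (the database name, auth file and pool sizes are placeholders to adapt, not values from the question):
[databases]
apidb = host=127.0.0.1 port=5432 dbname=apidb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction      ; reuse one server connection per transaction
default_pool_size = 20       ; a small pool, roughly 2-3x the CPU cores, is a common starting point
max_client_conn = 500        ; the API connects here (port 6432) instead of directly to Postgres
The application then keeps its ~500 client connections open against PgBouncer while only a handful of real PostgreSQL backends do the work.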

"Lost connection to MySQL server during query" in Google Cloud SQL

I am having a weird error, recurring but not constant, where I get "2013, 'Lost connection to MySQL server during query'". These are the premises:
a Python app runs for around 15-20 minutes every hour and then stops (scheduled hourly by cron)
the app is on a GCE n1-highcpu-2 instance; the DB is on a D1 tier with a per-package pricing plan and the following MySQL flags:
max_allowed_packet 1073741824
slow_query_log on
log_output TABLE
log_queries_not_using_indexes on
the database is accessed by this app and this app only, so the usage pattern is always the same: around 20 consecutive minutes per hour and then nothing at all for the other 40 minutes
the first query it does is
SELECT users.user_id, users.access_token, users.access_token_secret, users.screen_name, metadata.last_id
FROM users
LEFT OUTER JOIN metadata ON users.user_id = metadata.user_id
WHERE users.enabled = 1
the above query joins two tables that are each around 700 rows long and do not have indexes
after this query (which takes 0.2 seconds when it runs without problems) the app starts without any issues
Looking at the logs I see that each time this error presents itself the interval between the start of the query and the error is 15 minutes.
I've also enabled the slow query log, and those queries are registered like this:
start_time: 2014-10-27 13:19:04
query_time: 00:00:00
lock_time: 00:00:00
rows_sent: 760
rows_examined: 1514
db: foobar
last_insert_id: 0
insert_id: 0
server_id: 1234567
sql_text: ...
Any ideas?
If your connection is idle for the 15-minute gap, then you are probably seeing GCE disconnect your idle TCP connection, as described at https://cloud.google.com/compute/docs/troubleshooting#communicatewithinternet. Try the workaround that page suggests:
sudo /sbin/sysctl -w net.ipv4.tcp_keepalive_time=60 net.ipv4.tcp_keepalive_intvl=60 net.ipv4.tcp_keepalive_probes=5
(You may need to put this configuration into /etc/sysctl.conf to make it permanent)
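For the permanent version, the equivalent /etc/sysctl.conf entries would be the following (same values as the command above), applied with sysctl -p:
# Keep idle TCP connections alive so the GCE network does not drop them
net.ipv4.tcp_keepalive_time = 60
net.ipv4.tcp_keepalive_intvl = 60
net.ipv4.tcp_keepalive_probes = 5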