PostgreSQL streaming replication slow on macOS

I am using PostgreSQL 10.1 on macOS, on which I am trying to set up streaming replication. I configured both the master and the slave to be on the same machine. I find the streaming replication lag to be higher than expected on macOS; the same test runs on an Ubuntu 16.04 Linux machine without much lag.
I have the following insert script.
# $1 is the name of the target table; the master listens on port 8999
for i in $(seq 1 1 1000)
do
    bin/psql postgres -p 8999 -c "Insert into $1 select tz, $i * 127361::bigint, $i::real, random()*12696::bigint from generate_series('01-01-2018'::timestamptz, '02-01-2018'::timestamptz, '30 sec'::interval)tz;"
    echo $i
done
The lag is measured using the following queries:
SELECT pg_last_wal_receive_lsn() - pg_last_wal_replay_lsn();
SELECT (extract(epoch FROM now()) - extract(epoch FROM pg_last_xact_replay_timestamp()))::int;
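For context, the lag figures below come from looping over those two queries on the standby. What follows is a minimal sketch of such a polling loop in Scala over JDBC, added here for illustration; the PostgreSQL JDBC driver on the classpath, the standby port 9001 and the "Slave localhost_9001" label are assumptions chosen to match the output shown below.

import java.sql.DriverManager

object ReplicationLagMonitor extends App {
  // Connect to the standby (assumed to listen on port 9001, as in the output below).
  val conn = DriverManager.getConnection("jdbc:postgresql://localhost:9001/postgres")
  val stmt = conn.createStatement()
  while (true) {
    val rs = stmt.executeQuery(
      """SELECT pg_last_wal_receive_lsn() - pg_last_wal_replay_lsn() AS byte_lag,
        |       (extract(epoch FROM now())
        |        - extract(epoch FROM pg_last_xact_replay_timestamp()))::int AS sec_lag""".stripMargin)
    if (rs.next()) println(s"Slave localhost_9001: ${rs.getLong("byte_lag")} ${rs.getInt("sec_lag")}")
    rs.close()
    Thread.sleep(1000)   // poll once per second
  }
}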
However, the observed behavior is unexpected: the lag keeps increasing from the moment the transactions start on the master.
Slave localhost_9001: 12680304 1
Slave localhost_9001: 12354168 1
Slave localhost_9001: 16086800 1
.
.
.
Slave localhost_9001: 3697460920 121
Slave localhost_9001: 3689335376 122
Slave localhost_9001: 3685571296 122
.
.
.
.
Slave localhost_9001: 312752632 190
Slave localhost_9001: 308177496 190
Slave localhost_9001: 303548984 190
.
.
Slave localhost_9001: 22810280 199
Slave localhost_9001: 8255144 199
Slave localhost_9001: 4214440 199
Slave localhost_9001: 0 0
It took around 4.5 minutes for a single client inserting into a single table to complete on the master, and another 4 minutes for the slave to catch up. Note that no simultaneous SELECTs are run other than the script that measures the lag.
I understand that replay in PostgreSQL is fairly simple, essentially "move a particular block to a location", but I am not sure what explains this behavior.
I have the following other configuration settings:
checkpoint_timeout = 5min
max_wal_size = 1GB
min_wal_size = 80MB
When I run the same tests with the same configuration on an Ubuntu 16.04 Linux machine, I find the lag perfectly reasonable.
Am I missing anything?

Related

Kafka producer quota and timeout exceptions

I am trying to come up with a configuration that enforces a producer quota based on the producer's average byte rate.
I did a test with a 3-node cluster. The topic, however, was created with 1 partition and a replication factor of 1 so that the producer_byte_rate is measured against only one broker (the leader).
I set the producer_byte_rate to 20480 for client id test_producer_quota.
I used kafka-producer-perf-test to test the throughput and throttling.
kafka-producer-perf-test --producer-props bootstrap.servers=SSL://kafka-broker1:6667 \
client.id=test_producer_quota \
--topic quota_test \
--producer.config /myfolder/client.properties \
--record-size 2048 --num-records 4000 --throughput -1
I expected the producer client to learn about the throttle and eventually smooth out the requests sent to the broker. Instead, I noticed an alternating throughput of 98 recs/sec and 21 recs/sec for a period of more than 30 seconds. During this time the average latency slowly kept increasing, and when it finally hit 120000 ms I started to see the timeout exception below:
org.apache.kafka.common.errors.TimeoutException : Expiring 7 records for quota_test-0: 120000 ms has passed since batch creation.
What is possibly causing this issue?
The producer hits the timeout when latency reaches 120 seconds (the default value of delivery.timeout.ms).
Why isn't the producer learning about the throttle/quota and slowing down or backing off?
What other producer configuration could help alleviate this timeout issue?
(2048 bytes * 4000 records) / 20480 bytes/sec = 400 (sec)
This means that if your producer tries to send the 4000 records at full speed (which is the case, because you set throughput to -1), it will batch them and put them in the queue within maybe one or two seconds (depending on your CPU).
Then, because of your quota setting (20480 bytes/sec), you can be sure that the broker won't 'complete' the processing of those 4000 records before at least roughly 398-399 seconds have passed.
The broker does not return an error when a client exceeds its quota, but instead attempts to slow the client down. It computes the amount of delay needed to bring the client under its quota and delays the response for that amount of time.
With the delivery timeout (delivery.timeout.ms) at its 120-second default, you then hit this TimeoutException.
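As a hedged sketch (not part of the original answer), one way to avoid the expirations, assuming a kafka-clients producer recent enough to support delivery.timeout.ms, is to raise that timeout well above the worst-case throttle delay; the broker address, topic name and exact values below are illustrative:

import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object ThrottleTolerantProducer extends App {
  val props = new Properties()
  props.put("bootstrap.servers", "kafka-broker1:6667")   // illustrative broker address
  props.put("client.id", "test_producer_quota")
  props.put("key.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer")
  props.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer")
  // The quota makes the worst case roughly 400 s here, so give batches more than
  // the default 120 s before they are expired (assumption: the client supports this setting).
  props.put("delivery.timeout.ms", "600000")
  // Bound how much data can pile up client-side while the broker throttles us.
  props.put("buffer.memory", (32 * 1024 * 1024).toString)

  val producer = new KafkaProducer[Array[Byte], Array[Byte]](props)
  val payload = new Array[Byte](2048)                     // 2 KB records, as in the perf test
  for (_ <- 1 to 4000) producer.send(new ProducerRecord("quota_test", payload))
  producer.flush()
  producer.close()
}

Alternatively, running the perf test with a --throughput value at or below 10 records/sec (10 * 2048 = 20480 bytes/sec) keeps the client under the quota, so no broker-side throttling is needed at all.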

Neo4j performance: Neo4j REST API vs Neo4j Java Driver

Our current implementation makes all calls to Neo4j through the REST API. We are in the process of replacing some of that code with the neo4j-java-driver.
We had some performance issues which we tried to resolve through Cypher optimization and by moving load from the Neo4j DB to the application layer.
Will using the Java driver further reduce the load on the Neo4j DB, or will it just help by reducing network latency?
The driver is a bit more efficient for some things. Version 1.5 will also allow async operations, and the next major version will add backpressure and reactive operations.
It doesn't have to generate JSON anymore but streams a binary protocol instead, so that might reduce the load a bit. I'm not sure it will have a lot of impact.
Best to measure yourself.
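When measuring, note that session.run() over Bolt streams records lazily, so timing only the run() call may not reflect the full transfer. Here is a minimal Scala sketch with the 1.x driver that consumes the whole result; the bolt URI and credentials are illustrative:

import org.neo4j.driver.v1.{AuthTokens, GraphDatabase}

object BoltTiming extends App {
  val limit = if (args.nonEmpty) args(0) else "1000"
  val driver = GraphDatabase.driver("bolt://localhost:7687",
    AuthTokens.basic("neo4j", "password"))              // illustrative credentials
  val session = driver.session()

  val start = System.currentTimeMillis()
  val result = session.run(s"MATCH (n) RETURN n LIMIT $limit")
  var rows = 0
  while (result.hasNext) { result.next(); rows += 1 }   // pull every record over the wire
  println(s"With bolt for LIMIT $limit ($rows rows) -- ${System.currentTimeMillis() - start}")

  session.close()
  driver.close()
}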
I did some testing and below are the results, which are not very encouraging:
Windows with 16 GB RAM, 100K nodes, connecting to a local Neo4j instance.
// session is an org.neo4j.driver.v1.Session obtained from the bolt driver;
// Neo4jRESTHandler is our own wrapper around the REST endpoint.
String defaultNodes = "1000";
if (args.length > 0) {
    defaultNodes = args[0];
}
String query = "MATCH (n) return n LIMIT " + defaultNodes;

long time = System.currentTimeMillis();
session.run(query);   // note: the returned result is not consumed here
System.out.println("With bolt for LIMIT " + defaultNodes + " -- " + (System.currentTimeMillis() - time));

time = System.currentTimeMillis();
Neo4jRESTHandler dbHandler = new Neo4jRESTHandler();
dbHandler.executeCypherQuery(query);
System.out.println("With REST for LIMIT " + defaultNodes + " -- " + (System.currentTimeMillis() - time));
C:\Migration>java -jar neo4jtestexamples-1.0.0-SNAPSHOT-jar-with-dependencies.jar
With bolt for LIMIT 1000 -- 131
With REST for LIMIT 1000 -- 162
C:\Migration>java -jar neo4jtestexamples-1.0.0-SNAPSHOT-jar-with-dependencies.jar
With bolt for LIMIT 1000 -- 143
With REST for LIMIT 1000 -- 156
C:\Migration>java -jar neo4jtestexamples-1.0.0-SNAPSHOT-jar-with-dependencies.jar 10000
With bolt for LIMIT 10000 -- 377
With REST for LIMIT 10000 -- 156
C:\Migration>java -jar neo4jtestexamples-1.0.0-SNAPSHOT-jar-with-dependencies.jar 10000
With bolt for LIMIT 10000 -- 335
With REST for LIMIT 10000 -- 157
C:\Migration>java -jar neo4jtestexamples-1.0.0-SNAPSHOT-jar-with-dependencies.jar
With bolt for LIMIT 1000 -- 104
With REST for LIMIT 1000 -- 161
C:\Migration>java -jar neo4jtestexamples-1.0.0-SNAPSHOT-jar-with-dependencies.jar 25000
With bolt for LIMIT 25000 -- 595
With REST for LIMIT 25000 -- 155
C:\Migration>java -jar neo4jtestexamples-1.0.0-SNAPSHOT-jar-with-dependencies.jar 25000
With bolt for LIMIT 25000 -- 544
With REST for LIMIT 25000 -- 151

Spark 2.2 cache() causes driver OutOfMemoryError

I'm running Spark 2.2 with Scala on AWS EMR (Zeppelin / spark-shell).
I'm trying to run a very simple computation: loading, filtering, caching and counting a large data set. My data is 4,500 GB (4.8 TB) in ORC format with 51,317,951,565 (51 billion) rows.
First I tried to process it with the following cluster:
1 master node - m4.xlarge - 4 cpu, 16 gb Mem
150 core nodes - r3.xlarge - 4 cpu, 29 gb Mem
150 task nodes - r3.xlarge - 4 cpu, 29 gb Mem
but it failed with an OutOfMemoryError.
When I looked at the Spark UI and Ganglia, I saw that after the application had loaded more than 80% of the data, the driver node became very busy while the executors stopped working (CPU usage very low), until it crashed.
Ganglia CPU usage for master and worker nodes
Then I executed the same process, only increasing the driver node to:
1 master node - m4.2xlarge - 8 cpu, 31 gb Mem
and it succeeded.
I don't understand why the driver node's memory keeps filling up until it crashes. AFAIK only the executors load and process the tasks, and the data should not pass through the master. What could be the reason for this?
1) Ganglia Master Node usage for the second scenario
2) Spark UI stages
3) Spark UI DAG visualization
Below you can find the code:
import org.apache.spark.SparkConf
import org.apache.spark.sql.{Dataset, SaveMode, SparkSession, DataFrame}
import org.apache.spark.sql.functions.{concat_ws, expr, lit, udf}
import org.apache.spark.storage.StorageLevel
val df = spark.sql("select * from default.level_1 where date_ >= ('2017-11-08') and date_ <= ('2017-11-27')")
.drop("carrier", "city", "connection_type", "geo_country", "geo_country","geo_lat","geo_lon","geo_lon","geo_type", "ip","keywords","language","lat","lon","store_category","GEO3","GEO4")
.where("GEO4 is not null")
.withColumn("is_away", lit(0))
df.persist(StorageLevel.MEMORY_AND_DISK_SER)
df.count()
Below you can find the error message -
{"Event":"SparkListenerLogStart","Spark Version":"2.2.0"}
{"Event":"SparkListenerBlockManagerAdded","Block Manager ID":{"Executor ID":"driver","Host":"10.44.6.179","Port":44257},"Maximum Memory":6819151872,"Timestamp":1512024674827,"Maximum Onheap Memory":6819151872,"Maximum Offheap Memory":0}
{"Event":"SparkListenerEnvironmentUpdate","JVM Information":{"Java Home":"/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.141-1.b16.32.amzn1.x86_64/jre","Java Version":"1.8.0_141 (Oracle Corporation)","Scala Version":"version 2.11.8"},"Spark Properties":{"spark.sql.warehouse.dir":"hdfs:///user/spark/warehouse","spark.yarn.dist.files":"file:/etc/spark/conf/hive-site.xml","spark.executor.extraJavaOptions":"-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70 -XX:+CMSClassUnloadingEnabled -XX:OnOutOfMemoryError='kill -9 %p'","spark.driver.host":"10.44.6.179","spark.history.fs.logDirectory":"hdfs:///var/log/spark/apps","spark.eventLog.enabled":"true","spark.driver.port":"33707","spark.shuffle.service.enabled":"true","spark.driver.extraLibraryPath":"/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native","spark.repl.class.uri":"spark://10.44.6.179:33707/classes","spark.jars":"","spark.yarn.historyServer.address":"ip-10-44-6-179.ec2.internal:18080","spark.stage.attempt.ignoreOnDecommissionFetchFailure":"true","spark.repl.class.outputDir":"/mnt/tmp/spark-52cac1b4-614f-43a5-ab9b-5c60c6c1c5a5/repl-9389c888-603e-4988-9593-86e298d2514a","spark.app.name":"Spark shell","spark.scheduler.mode":"FIFO","spark.driver.memory":"11171M","spark.executor.instances":"200","spark.default.parallelism":"3200","spark.resourceManager.cleanupExpiredHost":"true","spark.executor.id":"driver","spark.yarn.appMasterEnv.SPARK_PUBLIC_DNS":"$(hostname -f)","spark.driver.extraJavaOptions":"-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70 -XX:+CMSClassUnloadingEnabled -XX:OnOutOfMemoryError='kill -9 %p'","spark.submit.deployMode":"client","spark.master":"yarn","spark.ui.filters":"org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter","spark.blacklist.decommissioning.timeout":"1h","spark.executor.extraLibraryPath":"/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native","spark.sql.hive.metastore.sharedPrefixes":"com.amazonaws.services.dynamodbv2","spark.executor.memory":"20480M","spark.driver.extraClassPath":"/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar","spark.home":"/usr/lib/spark","spark.eventLog.dir":"hdfs:///var/log/spark/apps","spark.dynamicAllocation.enabled":"true","spark.executor.extraClassPath":"/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar","spark.sql.catalogImplementation":"hive","spark.executor.cores":"8","spark.history.ui.port":"18080","spark.driver.appUIAddress":"http://ip-10-44-6-179.ec2.internal:4040","spark.org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.param.PROXY_HOSTS":"ip-10-44-6-
Notes -
1) I tried changing the StorageLevel to cache() and to DISK_ONLY and it didn't affect the result.
2) I checked the volume of the "scratch space" and saw that more than 90% of it was still unused.
Thanks!!
My assumption is that this may be caused by a mechanism inside Spark SQL.
In short, Spark SQL collects every broadcast dataset on the driver side, so when you have a big query the driver must have enough memory to hold the broadcasted data.
Related link to the issue
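If broadcast data on the driver is indeed the culprit, a quick way to test that hypothesis (a sketch added here for illustration, not part of the original answer) is to inspect the physical plan for broadcast operators and, if necessary, disable automatic broadcast joins:

// Run inside spark-shell / Zeppelin, where `spark` is the active SparkSession.
val df = spark.sql(
  "select * from default.level_1 where date_ >= '2017-11-08' and date_ <= '2017-11-27'")

// Look for BroadcastExchange / BroadcastHashJoin nodes in the physical plan.
df.explain(true)

// Disable automatic broadcast joins so nothing is collected to the driver for broadcasting.
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")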

Tuning Kafka performance to get 1 million messages/second

I'm using 3 VM servers, each with 16 cores / 56 GB RAM / 1 TB disk, to set up a Kafka cluster. I work with Kafka version 0.10.0. I installed a broker on two of them and created a topic with 2 partitions, 1 partition per broker and no replication.
My goal is to reach 1 000 000 messages / second.
I ran a test with the kafka-producer-perf-test.sh script and I get between 150 000 msg/s and 204 000 msg/s.
My configuration is:
-batch size: 8k (8192)
-message size: 300 byte (0.3 KB)
-thread num: 1
The producer configuration:
-request.required.acks=1
-queue.buffering.max.ms=0 #linger.ms=0
-compression.codec=none
-queue.buffering.max.messages=100000
-send.buffer.bytes=100000000
Any help getting to 1 000 000 msg/s would be appreciated.
Thank you
You're running an old version of Apache Kafka. The most recent release (0.11) includes several improvements, including around performance.
You might find this useful too: https://www.confluent.io/blog/optimizing-apache-kafka-deployment/
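Beyond upgrading, producer-side batching and compression usually have the biggest effect on raw throughput. Here is a hedged Scala sketch of a throughput-oriented producer configuration; the broker addresses, topic name and exact values are illustrative starting points, not guaranteed to reach 1M msg/s:

import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object ThroughputProducer extends App {
  val props = new Properties()
  props.put("bootstrap.servers", "broker1:9092,broker2:9092")  // illustrative addresses
  props.put("acks", "1")
  props.put("batch.size", (64 * 1024).toString)   // much larger than the 8k used above
  props.put("linger.ms", "5")                     // give batches a few ms to fill up
  props.put("compression.type", "lz4")            // fewer bytes per record on the wire
  props.put("buffer.memory", (128 * 1024 * 1024).toString)
  props.put("key.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer")
  props.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer")

  val producer = new KafkaProducer[Array[Byte], Array[Byte]](props)
  val payload = new Array[Byte](300)              // 300-byte messages, as in the test
  for (_ <- 1 to 1000000) producer.send(new ProducerRecord("perf_topic", payload))
  producer.close()
}

More partitions (and more producer threads) typically matter at least as much as these settings, since two partitions cap how much parallelism the brokers can use.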

Spark Application - High "Executor Computing Time"

I have a Spark application that has now been running for 46 hours. While the majority of its jobs complete within 25 seconds, specific jobs take hours. Some details are provided below:
Task Time    Shuffle Read (size / records)    Shuffle Write (size / records)
7.5 h        2.2 MB / 257402                  2.9 MB / 128601
There are other similar task times, of course, with values of 11.3 h, 10.6 h, 9.4 h, etc., each of them spending the bulk of its time on "rdd at DataFrameFunctions.scala:42". The stage details show that the executor spends this time on "Executor Computing Time". This executor runs on DataNode 1, where CPU utilization is fairly normal, about 13%. The other boxes (4 more worker nodes) also show very nominal CPU utilization.
When the Shuffle Read is within 5000 records, the job is extremely fast and completes within 25 seconds, as stated previously. Nothing is appended to the logs (spark/hadoop/hbase), and nothing shows up under /tmp or /var/tmp that would indicate disk-related activity in progress.
I am clueless about what is going wrong and have been struggling with this for quite some time now. The versions of the software used are as follows:
Hadoop : 2.7.2
Zookeeper : 3.4.9
Kafka : 2.11-0.10.1.1
Spark : 2.1.0
HBase : 1.2.6
Phoenix : 4.10.0
Some configuration from the spark-defaults file:
spark.eventLog.enabled true
spark.eventLog.dir hdfs://SDCHDPMAST1:8111/data1/spark-event
spark.history.fs.logDirectory hdfs://SDCHDPMAST1:8111/data1/spark-event
spark.yarn.jars hdfs://SDCHDPMAST1:8111/user/appuser/spark/share/lib/*.jar
spark.driver.maxResultSize 5G
spark.deploy.zookeeper.url SDCZKPSRV01
spark.executor.memory 12G
spark.driver.memory 10G
spark.executor.heartbeatInterval 60s
spark.network.timeout 300s
Is there any way I can reduce the time spent on "Executor Computing time"?
The job is running on a skewed dataset. Because of the skew, some tasks take much longer than expected.
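One way to confirm the skew (a sketch added for illustration; df and keyCol are placeholders for the DataFrame feeding the slow stage and its grouping/join key) is to look at the row distribution per key and per partition:

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, desc, spark_partition_id}

// `df` and `keyCol` are placeholders: the DataFrame feeding the slow stage
// and the key it is grouped or joined on.
def showSkew(df: DataFrame, keyCol: String): Unit = {
  // Heaviest keys: a handful of keys owning most of the rows indicates key skew.
  df.groupBy(col(keyCol)).count().orderBy(desc("count")).show(20, truncate = false)

  // Rows per partition: one huge partition explains a single multi-hour task.
  df.groupBy(spark_partition_id().alias("partition")).count()
    .orderBy(desc("count")).show(20, truncate = false)
}

If skew is confirmed, the usual first mitigations are repartitioning on a salted key or raising spark.sql.shuffle.partitions so the heavy data is spread across more tasks.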