Spark - Java heap space issue - ExecutorLostFailure - container exited with status 143 - Scala

I am reading strings that are more than 100K bytes long and splitting them into columns based on fixed widths. I have close to 16K columns, which I split from the string based on those widths.
While writing to Parquet I am using the code below:
import spark.implicits._

val rdd1 = spark.sparkContext.textFile("file1")

// splits one fixed-width record into its columns
def substrString(line: String, colLength: Seq[Int]): Seq[String] = {
  var now = 0
  val collector = new Array[String](colLength.length)
  for (k <- colLength.indices) {
    collector(k) = line.substring(now, now + colLength(k))
    now = now + colLength(k)
  }
  collector.toSeq
}

val stringArray = rdd1.map(substrString(_, colLengthSeq))
// colLengthSeq is read from another schema file that holds the column widths

stringArray.toDF("StringCol")
  .select((0 until colCount).map(j => $"StringCol"(j).as(columnSeq(j))): _*)
  .write.mode("overwrite").parquet("c:\\home\\")
Here colCount = 16000 and columnSeq is a Seq[String] holding the 16K column names.
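A minimal sketch of an alternative way to build the same wide DataFrame, using an explicit schema and Rows instead of a 16K-column select on an array column (columnSeq, colLengthSeq, rdd1 and substrString are the same placeholders as above):

import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// one StringType field per fixed-width column
val schema = StructType(columnSeq.map(name => StructField(name, StringType, nullable = true)))

// build Rows directly from the split columns
val rowRdd = rdd1.map(line => Row.fromSeq(substrString(line, colLengthSeq)))

spark.createDataFrame(rowRdd, schema)
  .write.mode("overwrite").parquet("c:\\home\\")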
I am running this on YARN with 16GB of executor memory and 20 executors.
The file size is 4GB.
I am getting the following error:
Lost task 113.0 in stage 0.0 (TID 461, gsta32512.foo.com): ExecutorLostFailure (executor 28 exited caused by one of the running tasks) Reason:
Container marked as failed:
container_e05_1472185459203_255575_01_000183 on host: gsta32512.foo.com. Exit status: 143. Diagnostics:
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Killed by external signal
When I check the status in the UI, it shows:
java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: GC overhead limit exceeded
Please advise on performance tuning of the above code and on spark-submit parameter optimization.
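For reference, a minimal sketch of how memory-related settings can be passed programmatically when building the session (or equivalently via spark-submit --conf); the values below are placeholders for illustration, not tuned recommendations:

import org.apache.spark.sql.SparkSession

// placeholder values only; actual numbers depend on the cluster and data
val spark = SparkSession.builder()
  .appName("fixed-width-to-parquet")
  .config("spark.executor.memory", "16g")
  .config("spark.yarn.executor.memoryOverhead", "4096") // off-heap headroom for YARN containers, in MB
  .config("spark.sql.shuffle.partitions", "400")
  .getOrCreate()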

Related

containerd - cannot update memory of a running container lower than its current memory

I am using the 'crictl' tool to work with containerd runtime containers (under Kubernetes) in a managed cluster.
I'm trying to set the memory limit (in bytes) to 16MB with the command:
crictl -r unix:///run/containerd/containerd.sock update --memory 16777216 c60df9ef3381e
And I get the following error:
E1219 11:10:11.616194 1241 remote_runtime.go:640] "UpdateContainerResources from runtime service failed" err=<
rpc error: code = Unknown desc = failed to update resources: failed to update resources: /usr/bin/runc did not terminate successfully: exit status 1: unable to set memory limit to 16777216 (current usage: 97058816, peak usage: 126517248)
: unknown
> containerID="c60df9ef3381e"
FATA[0000] updating container resources for "c60df9ef3381e": rpc error: code = Unknown desc = failed to update resources: failed to update resources: /usr/bin/runc did not terminate successfully: exit status 1: unable to set memory limit to 16777216 (current usage: 97058816, peak usage: 126517248)
: unknown
At first I thought that maybe I cannot directly set a memory limit on a running container that is lower than the limit that appears in the Kubernetes YAML.
Here are the limits from K8s:
Requests:{"cpu":"100m","memory":"64Mi"} Limits:{"cpu":"200m","memory":"128Mi"}
But no, even setting a memory limit above the K8s request (e.g. 65MB) gives the same error!
This works on the Docker runtime - I'm able to limit the memory of the container. Yes, it might crash, but the operation works.
Then I tried to set a memory limit higher than the current usage, and it succeeded...
Can anyone help me understand this error and what might be causing it on the containerd runtime? Is this indeed a limitation, meaning I cannot set a limit lower than the memory currently used by the container? Is there a way to overcome that?
Thanks a lot for your time!

Guru Meditation Error: Core 1 panic'ed (InstrFetchProhibited). Exception was unhandled

I am running MicroPython code from the repo tensorflow-micropython-examples. I print the memory info before invoking the interpreter with interpreter.invoke():
GC: total: 4098240, used: 2251528, free: 1846712
No. of 1-blocks: 26462, 2-blocks: 99, max blk sz: 67725, max free sz: 115420
mem_info: None
However, I get the following error, which makes no sense, as I have sufficient memory and the interpreter was loaded successfully:
Guru Meditation Error: Core 1 panic'ed (InstrFetchProhibited). Exception was unhandled.

Spark "Task serialization failed: java.lang.OutOfMemoryError"

I am trying to write 300 rows to Parquet files; each row contains a BinaryType column of length ~14,000,000.
As a result I get this exception:
org.apache.spark.SparkException: Job aborted due to stage failure: Task serialization failed: java.lang.OutOfMemoryError java.lang.OutOfMemoryError at java.io.ByteArrayOutputStream.hugeCapacity(ByteArrayOutputStream.java:123)
But this works when I write 100 or 200 rows with the same values.
My spark-submit configuration:
<param>--driver-memory 20G</param>
<param>--num-executors 100</param>
<param>--executor-memory 20G</param>
<param>--executor-cores 3</param>
<param>--driver-cores 3</param>
<param>--conf spark.driver.maxResultSize=10G</param>
<param>--conf spark.executor.memoryOverhead=10G</param>
<param>--conf spark.driver.memoryOverhead=10G</param>
<param>--conf spark.network.timeout=1200000</param>
<param>--conf spark.memory.offHeap.enabled=true</param>
<param>--conf spark.sql.files.maxRecordsPerFile=2</param>
I really can't understand how ~14 MB per row can throw an OutOfMemoryError when writing. If I run show on the final DataFrame, it works without errors.
Using persist(StorageLevel.MEMORY_AND_DISK_SER) on the final DataFrame also didn't help.
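To make the setup concrete, here is a minimal sketch of this kind of write; df, the output path, and the repartition count are placeholders, and the repartition is only there to show how the 300 rows could be spread over smaller tasks:

import org.apache.spark.sql.{DataFrame, SaveMode}

// df is a placeholder for the final DataFrame with the large BinaryType column
def writeLargeBinaryRows(df: DataFrame, path: String): Unit = {
  // same per-file limit as in the submit configuration above
  df.sparkSession.conf.set("spark.sql.files.maxRecordsPerFile", "2")
  df.repartition(150) // placeholder count: spreads the rows over many small tasks
    .write
    .mode(SaveMode.Overwrite)
    .parquet(path)
}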

How to fill data up to a given size on multiple disks?

I am creating 4 mount-point disks on Windows. I need to copy files up to a threshold value (say 50 GB).
I tried vdbench. It works fine, but it throws an exception at the end.
compratio=4
dedupratio=1
dedupunit=256k
* Host Definition section
hd=default,user=Administator,shell=vdbench,jvms=1
hd=localhost,system=localhost
********************************************************************************
* Storage Definition section
fsd=fsd1,anchor=C:\UnMapTest-Volume1\disk1\,depth=1,width=1,files=1,size=5g
fsd=fsd2,anchor=C:\UnMapTest-Volume2\disk2\,depth=1,width=1,files=1,size=5g
fwd=fwd1,fsd=fsd*,operation=write,xfersize=1m,fileio=sequential,fileselect=random,threads=10
rd=rd1,fwd=fwd1,fwdrate=max,format=yes,elapsed=1h,interval=1
Below is the exception from vdbench. Because of this, my calling script fails.
05:29:14.287 Message from slave localhost-0:
05:29:14.289 file=C:\UnMapTest-Volume1\disk1\\vdb.1_1.dir\vdb_f0001.file,busy=true
05:29:14.290 Thread: FwgThread write C:\UnMapTest-Volume1\disk1\ rd=rd1 For loops: None
05:29:14.291
05:29:14.292 last_ok_request: Thu Dec 28 05:28:57 PST 2017
05:29:14.292 Duration: 16.92 seconds
05:29:14.293 consecutive_blocks: 10001
05:29:14.294 last_block: FILE_BUSY File busy
05:29:14.294 operation: write
05:29:14.295
05:29:14.296 Do you maybe have more threads running than that you have
05:29:14.296 files and therefore some threads ultimately give up after 10000 tries?
05:29:14.300 *
05:29:14.301 ******************************************************
05:29:14.302 * Slave localhost-0 aborting: Too many thread blocks *
05:29:14.302 ******************************************************
05:29:14.303 *
05:29:21.235
05:29:21.235 Slave localhost-0 prematurely terminated.
05:29:21.235
05:29:21.235 Slave aborted. Abort message received:
05:29:21.235 Too many thread blocks
05:29:21.235
05:29:21.235 Look at file localhost-0.stdout.html for more information.
05:29:21.735
05:29:21.735 Slave localhost-0 prematurely terminated.
05:29:21.735
java.lang.RuntimeException: Slave localhost-0 prematurely terminated.
at Vdb.common.failure(common.java:335)
at Vdb.SlaveStarter.startSlave(SlaveStarter.java:198)
at Vdb.SlaveStarter.run(SlaveStarter.java:47)
I am using PowerShell on a Windows machine. If some other tool, such as DiskSpd, has a way to fill data up to a threshold, please let me know.
I found the answer myself.
I did this using Diskspd.exe as shown below.
The following command fills 50 GB of data in the mentioned disk folder:
.\diskspd.exe -c50G -b4K -t2 C:\UnMapTest-Volume1\disk1\testfile1.dat
It is much simpler than vdbench for my requirement.
Caution: it does not write real data, so the array-side disk usage does not reflect the file size.

Mappers fail when using Pig to insert data into MongoDB

I am trying to import a file from HDFS into MongoDB using MongoInsertStorage with Pig. The files are large, around 5GB. The script runs fine when I run it in local mode with
pig -x local example.pig
However, if I run it in MapReduce mode, most of the mappers fail with the following error:
Error: com.mongodb.ConnectionString.getReadConcern()Lcom/mongodb/ReadConcern;
Container killed by the ApplicationMaster.
Container killed on request.
Exit code is 143 Container exited with a non-zero exit code 143
Can someone help me solve this issue? I also increased the memory allocated to YARN containers, but that hasn't helped.
Some mappers are also timing out after 300 seconds.
The Pig script is as follows:
REGISTER mongo-java-driver-3.2.2.jar;
REGISTER mongo-hadoop-core-1.4.0.jar;
REGISTER mongo-hadoop-pig-1.4.0.jar;
REGISTER mongodb-driver-3.2.2.jar;
DEFINE MongoInsertStorage com.mongodb.hadoop.pig.MongoInsertStorage();
SET mapreduce.reduce.speculative true;
BIG_DATA = LOAD 'hdfs://example.com:8020/user/someuser/sample.csv' USING PigStorage(',') AS (a:chararray, b:chararray, c:chararray);
STORE BIG_DATA INTO 'mongodb://insert.some.ip.here:27017/test.samplecollection' USING MongoInsertStorage('', '');
I found a solution.
For the error:
Error: com.mongodb.ConnectionString.getReadConcern()Lcom/mongodb/ReadConcern;
Container killed by the ApplicationMaster.
Container killed on request.
Exit code is 143 Container exited with a non-zero exit code 143
I changed the JAR versions: mongo-hadoop-core and mongo-hadoop-pig from 1.4.0 to 2.0.2, and the MongoDB Java driver from 3.2.2 to 3.4.2. This eliminated the ReadConcern error on the mappers!
For the timeout, I added this after registering the JARs:
SET mapreduce.task.timeout 1800000;
I had been using SET mapred.task.timeout, which didn't work.
Hope this helps anyone who has a similar issue!