Total noob here. I installed Cloudera Manager on a single node on AWS EC2 and followed the install wizard, but when I try running
spark-shell or pyspark I get the following error message:
ERROR spark.SparkContext: Error initializing SparkContext.
java.lang.IllegalArgumentException: Required executor memory (1024+384
MB) is above the max threshold (1024 MB) of this cluster! Please check
the values of 'yarn.scheduler.maximum-allocation-mb' and/or
'yarn.nodemanager.resource.memory-mb'.
Can somebody explain to me what is going on or where to begin reading? Any help or direction is greatly appreciated.
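The required executor memory is above the maximum threshold: the executor needs 1024 MB plus 384 MB of overhead (1408 MB in total), but the cluster will allocate at most 1024 MB to a single container. You need to increase the YARN memory limits.
The values of yarn.scheduler.maximum-allocation-mb and yarn.nodemanager.resource.memory-mb both live in the config file yarn-site.xml, which is managed by Cloudera Manager in your case.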
yarn.nodemanager.resource.memory-mb is the amount of physical memory, in MB, that can be allocated for containers.
yarn.scheduler.maximum-allocation-mb is the maximum memory, in MB, that can be allocated per YARN container. The documentation describes it as: "The maximum allocation for every container request at the RM, in MBs. Memory requests higher than this won't take effect, and will get capped to this value."
You can read more on the definitions and default values here: https://hadoop.apache.org/docs/r2.4.1/hadoop-yarn/hadoop-yarn-common/yarn-default.xml
In the Cloudera Manager user interface, go to YARN service > Configurations > Search, look up both properties, and increase their values.
Restart YARN for the changes to take effect.
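As a rough illustration of where the 1024+384 figure in the error comes from, here is a minimal Python sketch. It assumes that Spark on YARN requests the executor memory plus an overhead of max(384 MB, 10% of the executor memory); the exact overhead factor depends on the Spark version, and the function names are made up for this example.

```python
# Sketch only: the 10% factor and 384 MB floor are assumptions about
# Spark's default executor memory overhead; check your Spark version.
MIN_OVERHEAD_MB = 384
OVERHEAD_FACTOR = 0.10


def required_container_mb(executor_memory_mb):
    """Memory YARN must grant for one executor: heap plus overhead."""
    overhead = max(MIN_OVERHEAD_MB, int(executor_memory_mb * OVERHEAD_FACTOR))
    return executor_memory_mb + overhead


def fits_in_yarn(executor_memory_mb, max_allocation_mb, nodemanager_memory_mb):
    """True if the request is within both YARN limits named in the error."""
    needed = required_container_mb(executor_memory_mb)
    return needed <= min(max_allocation_mb, nodemanager_memory_mb)


print(required_container_mb(1024))     # 1408
print(fits_in_yarn(1024, 1024, 1024))  # False
print(fits_in_yarn(1024, 2048, 2048))  # True
```

With the default 1 GB executor, both YARN values therefore need to be at least 1408 MB before spark-shell can start.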
We are trying to deploy an Apache Flink job on a K8s cluster, but we are noticing odd behavior: when we start our job, the task manager memory starts at the amount assigned, which in our case is 3 GB.
taskmanager.memory.process.size: 3g
Eventually, the memory starts decreasing until it reaches about 160 MB. At that point it recovers a little memory, so it never runs out completely.
That very low memory often causes the job to be terminated with a task manager heartbeat exception, even when we are just viewing the logs on the Flink dashboard or while the job is processing.
Why is it going so low on memory? We expected this behavior, but in the range of GB, because we assigned 3 GB to the task manager. Even if we change the task manager memory size, we see the same behavior.
Our Flink conf looks like this:
flink-conf.yaml: |+
taskmanager.numberOfTaskSlots: 1
blob.server.port: 6124
taskmanager.rpc.port: 6122
taskmanager.memory.process.size: 3g
metrics.reporters: prom
metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
metrics.reporter.prom.port: 9999
metrics.system-resource: true
metrics.system-resource-probing-interval: 5000
jobmanager.rpc.address: flink-jobmanager
jobmanager.rpc.port: 6123
Is there a recommended memory configuration for K8s, or something that we are missing in our flink-conf.yaml?
Thanks.
Your configuration looks fine. It's most likely an issue with your code and some kind of memory leak. This is a very good answer describing what may be the problem.
You can try setting a limit on the JVM heap with taskmanager.memory.task.heap.size so that you give the JVM some extra room to do GC, etc. But in the end, if you keep allocating objects that are never released, you will run into the same situation.
Presumably, you are using your memory to store your state, in which case you can also try RocksDB as a state backend, especially if you are storing large objects.
What are the requests/limits in your deployment templates? If no request sizes are specified, you may be seeing your cluster resources get eaten.
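To see why a heap metric can sit far below the configured 3 GB, it may help to sketch how taskmanager.memory.process.size is divided up. The following Python sketch is only an approximation: the fractions and fixed sizes are assumed Flink defaults, and the function names are invented for this example, so verify the numbers against the memory documentation for your Flink version.

```python
# Sketch only: the fractions and sizes below are assumed Flink defaults
# (JVM overhead 10% clamped to 192 MB..1 GB, metaspace 256 MB, managed
# memory 40%, network 10% clamped to 64 MB..1 GB, framework heap and
# off-heap 128 MB each). Verify them against your Flink version.

def clamp(value, low, high):
    return max(low, min(value, high))


def taskmanager_breakdown_mb(process_mb):
    """Split taskmanager.memory.process.size into its main components."""
    metaspace = 256.0
    jvm_overhead = clamp(0.10 * process_mb, 192.0, 1024.0)
    total_flink = process_mb - metaspace - jvm_overhead
    managed = 0.40 * total_flink
    network = clamp(0.10 * total_flink, 64.0, 1024.0)
    framework_heap = 128.0
    framework_off_heap = 128.0
    task_heap = total_flink - managed - network - framework_heap - framework_off_heap
    return {
        "jvm_overhead": jvm_overhead,
        "metaspace": metaspace,
        "managed": managed,
        "network": network,
        "framework_heap": framework_heap,
        "framework_off_heap": framework_off_heap,
        "task_heap": task_heap,
    }


for name, size in taskmanager_breakdown_mb(3 * 1024).items():
    print(f"{name:>20}: {size:8.1f} MB")
```

Under these assumptions only roughly 1 GB of the 3 GB ends up as task heap, so free heap dropping to a few hundred MB before a GC cycle is less surprising than it first looks.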
I have 3 YARN node managers working in a YARN cluster, and an issue connected with vcore availability per YARN node.
For example, I have:
on the first node: 15 vcores available,
on the second node: no vcores available,
on the third node: 37 vcores available.
Now a job tries to start and fails with the error:
"Queue's AM resource limit exceeded"
Is this connected with there being no vcores available on the second node, or can I somehow increase the resource limit in the queue?
I also want to mention, that I have the following setting:
yarn.scheduler.capacity.maximum-am-resource-percent=1.0
That means that your drivers have exceeded the max memory configured in Max Application Master Resources. You can either increase the max memory for AMs or decrease the driver memory in your jobs.
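A simplified way to picture this check is sketched below in Python. It is not the CapacityScheduler's actual code: it assumes the AM limit is just the queue's memory capacity multiplied by maximum-am-resource-percent, ignores per-user limits and vcores, and uses hypothetical queue sizes and function names.

```python
# Sketch only: a simplified model of the AM resource limit. The real
# CapacityScheduler also accounts for per-user limits and vcores.

def am_limit_mb(queue_capacity_mb, max_am_resource_percent):
    """Memory in a queue that may be used by Application Masters."""
    return queue_capacity_mb * max_am_resource_percent


def can_start_am(running_am_mb, new_am_mb, queue_capacity_mb, max_am_resource_percent):
    """True if a new AM still fits under the queue's AM limit."""
    limit = am_limit_mb(queue_capacity_mb, max_am_resource_percent)
    return sum(running_am_mb) + new_am_mb <= limit


# Hypothetical queue with 16 GB capacity and three 4 GB drivers running.
running = [4096, 4096, 4096]
print(can_start_am(running, 4096, 16384, 1.0))  # True
print(can_start_am(running, 8192, 16384, 1.0))  # False
print(can_start_am(running, 2048, 16384, 0.5))  # False
```

Even with the percentage set to 1.0, the limit is still bounded by the queue's own capacity, which is why shrinking the driver memory can help.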
I'm starting Spark in standalone client mode on a 10-node cluster using Spark-2.1.0-SNAPSHOT.
9 nodes are workers; the 10th is master and driver. Each has 256GB of memory.
I'm having difficulty utilizing my cluster fully.
I'm setting the memory limit for executors and driver to 200GB using the following parameters to spark-shell:
spark-shell --executor-memory 200g --driver-memory 200g --conf spark.driver.maxResultSize=200g
When my application starts I can see those values set as expected both in the console and in the Spark web UI /environment/ tab.
But when I go to the /executors/ tab, I see that my nodes got only 114.3GB of storage memory assigned.
The total memory shown there is 1.1TB, while I would expect to have 2TB. I double checked that other processes were not using the memory.
Any idea what the source of that discrepancy is? Did I miss some setting? Is it a bug in the /executors/ tab or in the Spark engine?
You're fully utilizing the memory, but here you are only looking at the storage portion of the memory. By default, the storage portion is 60% of the total memory.
From the Spark docs:
Memory usage in Spark largely falls under one of two categories: execution and storage. Execution memory refers to that used for computation in shuffles, joins, sorts and aggregations, while storage memory refers to that used for caching and propagating internal data across the cluster.
As of Spark 1.6, the execution memory and the storage memory are shared, so it's unlikely that you would need to tune the spark.memory.fraction parameter.
If you're using YARN, the "Memory Used" and "Memory Total" figures on the main page of the resource manager show the total memory usage.
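To make the 60% figure concrete, here is a minimal Python sketch of the estimate. It assumes Spark's unified memory model, where 300 MB of the heap is reserved and spark.memory.fraction (0.6 by default in Spark 2.x) of the remainder is shared by execution and storage; the function name is made up for this example.

```python
# Sketch only: assumes Spark's unified memory model with 300 MB reserved
# and spark.memory.fraction = 0.6 (the default in Spark 2.x).
RESERVED_MB = 300


def unified_memory_mb(jvm_heap_mb, memory_fraction=0.6):
    """Memory shared by execution and storage for one executor."""
    return (jvm_heap_mb - RESERVED_MB) * memory_fraction


# The JVM usually reports a little less usable heap than -Xmx, so the
# value in the UI can be lower than this estimate for a 200 GB executor.
estimate_gb = unified_memory_mb(200 * 1024) / 1024
print(f"about {estimate_gb:.1f} GB per executor")
```

The estimate lands in the same range as the 114.3GB shown in the /executors/ tab, which suggests the executors really did get their 200GB heaps.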
When I start the Spark shell using:
./bin/spark-shell --master spark://IP:7077 --executor-memory 4G
then no memory is allocated to Spark.
If, however, I just use the default:
./bin/spark-shell --master spark://IP:7077
then memory is allocated.
How can I use the max available memory in the Spark shell? In this case:
845MB+845MB+2.8GB = 4.49GB
Update: It appears Spark will just allocate to each node the max available memory of the node with the least amount of memory. So if I use:
./bin/spark-shell --master spark://IP:7077 --executor-memory 845M
then 2 nodes are fully allocated, but the node with 2.8GB is not fully allocated.
So the question now becomes: can Spark be configured so that each node uses its max free memory?
If your cluster consists of machines of different types, the memory limit for all executors is determined by the machine with the least memory.
Spark is not smart enough to allocate more memory to one specific executor because it runs on a bigger machine. That's why it's recommended to use homogeneous hardware to build a Spark/Hadoop cluster.
Secondly, try to avoid allocating 100% of a node's memory to executors, because it will make garbage collection slower and other daemons need memory too. My suggestion would be to spare at least 5% of the memory on every single machine.
For more insights I recommend this article http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/
Hope it helps.
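The behavior described in the question can be sketched in a few lines of Python. This is only a model: it assumes standalone mode launches at most one executor per worker for an application, and only on workers that have at least --executor-memory free. The function names are invented for this example.

```python
# Sketch only: assumes standalone mode launches at most one executor per
# worker for an application, and only on workers with enough free memory.

def workers_that_fit(free_memory_mb, executor_memory_mb):
    """Indexes of workers able to host an executor of the given size."""
    return [i for i, free in enumerate(free_memory_mb) if free >= executor_memory_mb]


def total_allocated_mb(free_memory_mb, executor_memory_mb):
    """Total memory the application gets across the cluster."""
    return len(workers_that_fit(free_memory_mb, executor_memory_mb)) * executor_memory_mb


free = [845, 845, 2867]  # two small workers and one with about 2.8 GB

print(workers_that_fit(free, 4096))   # [] -> nothing is allocated
print(workers_that_fit(free, 845))    # [0, 1, 2]
print(total_allocated_mb(free, 845))  # 2535
```

Because every executor gets the same size, the largest request that still uses all three workers is bounded by the smallest one.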
In spark-env.sh, it's possible to configure the following environment variables:
# - SPARK_WORKER_MEMORY, to set how much memory to use (e.g. 1000m, 2g)
export SPARK_WORKER_MEMORY=22g
[...]
# - SPARK_MEM, to change the amount of memory used per node (this should
# be in the same format as the JVM's -Xmx option, e.g. 300m or 1g)
export SPARK_MEM=3g
If I start a standalone cluster with this:
$SPARK_HOME/bin/start-all.sh
I can see at the Spark Master UI webpage that all the workers start with only 3GB RAM:
-- Workers Memory Column --
22.0 GB (3.0 GB Used)
22.0 GB (3.0 GB Used)
22.0 GB (3.0 GB Used)
[...]
However, I specified 22g as SPARK_WORKER_MEMORY in spark-env.sh
I'm somewhat confused by this. Probably I don't understand the difference between "node" and "worker".
Can someone explain the difference between the two memory settings and what I might have done wrong?
I'm using spark-0.7.0. See also here for more configuration info.
A standalone cluster can host multiple Spark clusters (each "cluster" is tied to a particular SparkContext), i.e. you can have one cluster running kmeans, one cluster running Shark, and another one running some interactive data mining.
In this case, the 22GB is the total amount of memory you allocated to the Spark standalone cluster on each worker, and your particular instance of SparkContext is using 3GB per node. So you can create 6 more SparkContexts of the same size, for a total of up to 21GB used per node.
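The arithmetic behind that last statement, as a small Python sketch. It models SPARK_WORKER_MEMORY as a per-worker budget and SPARK_MEM as what each SparkContext takes from it; the function names are made up for this example.

```python
# Sketch only: models SPARK_WORKER_MEMORY as a per-worker budget and
# SPARK_MEM as what each application takes from it on that worker.

def max_concurrent_apps(worker_memory_gb, app_memory_gb):
    """How many applications of equal size fit on one worker."""
    return worker_memory_gb // app_memory_gb


def unused_gb(worker_memory_gb, app_memory_gb):
    """Worker memory left over once the worker is full."""
    return worker_memory_gb % app_memory_gb


print(max_concurrent_apps(22, 3))  # 7
print(unused_gb(22, 3))            # 1
```

So with 22g per worker and 3g per application, 7 applications fit in total (the current one plus 6 more), leaving 1GB unused on each worker.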