Syntax error in topology.py when I try to run a Scala command in Spark through the Cloudera VM - scala

Every time I try to run the following Scala commands
val dataRDD = sc.textFile("hdfs://quickstart.cloudera:8020/user/cloudera/data/data.txt")
dataRDD.collect().foreach(println)
//or
dataRDD.count()
I get the following exception:
exitCodeException exitCode=1: File "/etc/hadoop/conf.cloudera.yarn/topology.py", line 43
    print default_rack
                      ^
SyntaxError: Missing parentheses in call to 'print'
I am running Spark 1.6.0 on the Cloudera VM.
Has anyone else faced this issue? What could be the reason? I understand that it comes from the topology.py file, which calls print without the parentheses that Python 3 requires. But why is this script being executed at all when I am not running Python/PySpark?
This only happens inside the Cloudera VM; when I run the same commands outside the VM with some other sample data, they work!

I know it might be too late, but I am posting the answer anyway in case any other user faces the same issue.
The above is a known issue, and the workaround is the following:
Workaround: Add a YARN gateway role to each host that does not already have at least one YARN role (of any type). The YARN gateway needs to be added on the node/host where you are facing this issue.
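For context, the statement the traceback points at is Python 2 syntax, which is why it only fails when the rack-awareness script is run under Python 3. Below is a minimal sketch of the difference; default_rack is a hypothetical stand-in for the value Cloudera's script actually computes, and the supported fix is still the gateway workaround above rather than editing the managed file.
# Hypothetical stand-in for the value Cloudera's topology.py computes.
default_rack = "/default"

# Python 2 statement from the traceback -- a SyntaxError on Python 3:
#     print default_rack
# Python 3 compatible form (also valid on Python 2.7):
print(default_rack)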

Related

How to get the performance of a MongoDB cluster from logs using MongoDB Keyhole?

I have installed MongoDB Keyhole on my Ubuntu server. I am trying to analyze the performance of a MongoDB cluster from the log file using the below command.
keyhole --loginfo log_file[.gz] [--collscan] [-v]
But the problem is that I am getting the below error, even though the log file is in the same directory where I am running the command. Can anyone please help me with this?
2022/10/12 11:20:45 open logfilename_mongodb.log.gz.[gz]: no such file or directory
I have fixed the issue with the below command format:
./keyhole -loginfo -v ~/Downloads/logfilepath.log
Glancing at the Logs Analytics readme for the project, it looks like you've got a simple syntax issue here. The [] characters are intended to indicate optional arguments/settings to use when running keyhole.
Have you tried a syntax similar to this?
keyhole --loginfo log_file --collscan -v

kedro context and catalog missing from ipython session

I launched an IPython session and am trying to load a dataset.
I am running
df = catalog.load("test_dataset")
I am facing the below error:
NameError: name 'catalog' is not defined
I also tried %reload_kedro but got the below error:
UsageError: Line magic function `%reload_kedro` not found.
I am not able to load the context either.
I am running the kedro environment from a Docker container.
I am not sure where I am going wrong.
New in 0.17.5 there is a fallback option; please run the following commands in your Jupyter/IPython session:
%load_ext kedro.extras.extensions.ipython
%reload_kedro <path_to_project_root>
This should help you get up and running.
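Once %reload_kedro succeeds, it injects variables such as catalog and context into the session, so the call from the original question should then work. A minimal sketch, assuming a dataset named test_dataset is defined in the project's catalog:
# Run in the same IPython session, after the two magics above have executed.
df = catalog.load("test_dataset")   # reads the dataset registered in the Data Catalog
print(context.project_path)         # sanity-check that the expected project was loaded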

Source of the "unexpected keyword argument 'fetch'" error in pandas to_sql?

I am trying to upload a dataframe to a Heroku PostgreSQL server, which I have done successfully several times before.
Here is my code, where for_db is the name of my pandas dataframe:
from sqlalchemy import create_engine
engine = create_engine('postgresql://wgam{rest of url}',
                       echo=False)
# attach the data frame to the sql server
for_db.to_sql('phil_nlp',
              con=engine,
              if_exists='replace')
At first, it was not able to connect because the server URL Heroku gave me started with 'postgres'; I understand it has to be changed to 'postgresql' to work properly, and I have gotten past that initial error.
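For anyone hitting that first error, here is a minimal sketch of the scheme fix, assuming the connection string comes from Heroku's DATABASE_URL config var:
import os
from sqlalchemy import create_engine

db_url = os.environ["DATABASE_URL"]            # e.g. postgres://user:pass@host:5432/dbname
if db_url.startswith("postgres://"):
    # SQLAlchemy 1.4+ no longer accepts the bare "postgres" scheme
    db_url = db_url.replace("postgres://", "postgresql://", 1)
engine = create_engine(db_url, echo=False)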
Now I am getting a new error.
/usr/local/lib/python3.7/dist-packages/sqlalchemy/dialects/postgresql/psycopg2.py in do_executemany(self, cursor, statement, parameters, context)
899 template=executemany_values,
900 fetch=bool(context.compiled.returning),
--> 901 **kwargs
902 )
903
TypeError: execute_values() got an unexpected keyword argument 'fetch'
I don't understand why this would come up. Obviously I never specified such a keyword argument. I've done a lot of searching without any good results. Does anyone know why it would now throw this error in code that was working just last week?
I ran into the same issue running the DataFrame.to_sql method. Adding method='multi' does get it working and is a good workaround.
Investigating it a bit further, it turned out to be an issue with the versions of SQLAlchemy and psycopg2 that I had installed. These GitHub issues here and here led me to the following.
The fetch parameter was added in psycopg2 version 2.8. I had version 2.7 and SQLAlchemy 1.4.15.
Installing a newer version fixed the problem without the need to add the method='multi' parameter.
pip install psycopg2-binary==2.8.6
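If you want to confirm which versions are actually installed before and after upgrading, a quick check from Python is:
import psycopg2
import sqlalchemy

print(psycopg2.__version__)     # the fetch argument needs psycopg2 >= 2.8
print(sqlalchemy.__version__)   # reported as 1.4.15 in the answer above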
Hope this helps anyone else who finds this issue.
I was able to fix this by adding 'multi' as the method parameter:
for_db.to_sql('phil_nlp',
              con=engine,
              if_exists='replace',
              index=False,
              method='multi')
Still not sure what caused the error, but I guess that's the problem fixed :)
This also worked for me: pip install psycopg2-binary==2.8.6

What causes error "Connection test failed: spawn npm; ENOENT" when creating new Strapi project with MongoDB?

I am trying to create a new Strapi app on Ubuntu 16.04 using MongoDB. After stepping through the tutorial here: https://strapi.io/documentation/3.0.0-beta.x/guides/databases.html#mongodb-installation, I get the following error: Connection test failed: spawn npm; ENOENT
The error seems obvious, but I'm having trouble getting to the cause of it. I've installed the latest version of MongoDB and have ensured it is running using service mongod status. I can also connect directly using nc, as shown below.
$ nc -zvv localhost 27017
Connection to localhost 27017 port [tcp/*] succeeded!
Any help troubleshooting this would be appreciated! Does Strapi perhaps log setup errors somewhere, or is there a way to get verbose logging? Is it possible the connection error would be logged by MongoDB somewhere?
I was able to find the answer. The problem was with using npx instead of Yarn. Strapi documentation states that either should work, however, it is clear from my experience that there is a bug when using npx.
I switched to Yarn and the process proceeded as expected without error. Steps were otherwise exactly the same.
Update: There is also a typo in Strapi documentation for yarn. They include the word "new" before the project name, which will create a project called new and ignore the project name.
Strapi docs (incorrect):
yarn create strapi-app new my-project
Correct usage, based on my experience:
yarn create strapi-app my-project
The ENOENT error is "an abbreviation of Error NO ENTry (or Error NO ENTity), and can actually be used for more than files/directories."
Why does ENOENT mean "No such file or directory"?
Everything I've read on this points toward issues with environment variables and the process.env.PATH.
"NOTE: This error is almost always caused because the command does not exist, because the working directory does not exist, or from a windows-only bug."
How do I debug "Error: spawn ENOENT" on node.js?
If you take the function that Jiaji Zhou provides in the link above and paste it into the top of your config/functions/bootstrap.js file (above module.exports), it might give you a better idea of where the error is occurring; specifically, it should tell you the command it ran. Then run the command > which nameOfCommand to see what file path it returns.
"miss-installed programs are the most common cause for a not found command. Refer to each command documentation if needed and install it." - laconbass (from the same link, below Jiaji Zhou's answer)
This is how I interpret all of the above and form a solution. Put that function in bootstrap.js, then take the command returned from the function and run > which nameOfCommand. Then in bootstrap.js (you can comment out the function), put console.log(process.env.PATH), which will print a string of all the directories your current environment checks for executables. If the path returned from your which command isn't in your process.env.PATH, you can move the command into one of those paths, or try re-installing.

What is a Spark kernel for Apache Toree?

I have a Spark cluster whose master is at 192.168.0.60:7077.
I used to use Jupyter Notebook to write some PySpark scripts.
I now want to move on to Scala.
I don't know the Scala world.
I am trying to use Apache Toree.
I installed it, downloaded the Scala kernels, and ran it to the point of opening a Scala notebook. Up to there everything seems OK :-/
But I can't find the Spark context, and there are errors in Jupyter's server logs:
[I 16:20:35.953 NotebookApp] Kernel started: afb8cb27-c0a2-425c-b8b1-3874329eb6a6
Starting Spark Kernel with SPARK_HOME=/Users/romain/spark
Error: Master must start with yarn, spark, mesos, or local
Run with --help for usage help or --verbose for debug output
[I 16:20:38.956 NotebookApp] KernelRestarter: restarting kernel (1/5)
As I don't know Scala, I am not sure what the issue is here.
It could be:
I need a Spark kernel (according to https://github.com/ibm-et/spark-kernel/wiki/Getting-Started-with-the-Spark-Kernel)
I need to add an option on the server (the error message says 'Master must start with yarn, spark, mesos, or local')
or something else :-/
I was just trying to migrate from Python to Scala, and I spent a few hours lost just on starting up the Jupyter IDE :-/
It looks like you are using Spark in standalone deploy mode. As Tzach suggested in his comment, the following should work:
SPARK_OPTS='--master=spark://192.168.0.60:7077' jupyter notebook
SPARK_OPTS expects the usual spark-submit parameter list.
If that does not help, you would need to check the SPARK_MASTER_PORT value in conf/spark-env.sh (7077 is the default).