Can we paste multiline code into Pyspark Shell - pyspark

I am trying to copy code from PyCharm into a pyspark shell. Even a simple copy of two import statements leads to an error. Please see the code snippet below.
Can someone please point out what I am doing wrong here? It would be very helpful if I could copy whole snippets of code into the shell (e.g. copy-pasting the contents of an entire Python file). Is this meant to work?
>>> import subprocess
import pickle
File "<stdin>", line 1
import pickle
^
SyntaxError: multiple statements found while compiling a single statement

It sounds like you want to run this:
PYTHONSTARTUP=code.py pyspark
This runs your script when the pyspark shell starts, before the first prompt appears.
Usually when I paste into pyspark the issue is the whitespace.
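If the shell is already running, another option that works in any Python REPL is to read the file and hand its text to exec, which compiles it as a whole module rather than one statement at a time. The following is only a minimal sketch: the file name snippet.py and its contents are hypothetical stand-ins for the code copied from PyCharm.

```python
import os
import tempfile

# Stand-in for the code you would otherwise paste (hypothetical contents).
source = "import subprocess\nimport pickle\nanswer = 6 * 7\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "snippet.py")  # hypothetical file name
    with open(path, "w") as f:
        f.write(source)

    # "exec" mode compiles the text as a module, so several statements are fine.
    namespace = {}
    with open(path) as f:
        exec(compile(f.read(), path, "exec"), namespace)

print(namespace["answer"])
```

Inside the shell itself, the short form exec(open("snippet.py").read()) runs the file in the current namespace, so its imports and variables remain available afterwards.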

Related

How to read a .sql file (not notebook) in a DataBricks repo to a String in Scala?

I can't seem to accomplish this in Scala. It seems easy in Python, bash and R, but I'm trying to keep our code base in Scala. I know the repo is set up correctly because when I execute %sh cat /absolute/path/to/file/in/repo I get the file contents printed to the console. I have tried most Java/Scala methods for reading a text file and can't seem to get any of them to work. The error message I am repeatedly hitting is like:
Any help would be greatly appreciated.

'no module found' error in Python 3.9 for windows 10

I'm a total beginner in Python, I'm reading 'Learning Python' by Mark Lutz and following along in the book. I created a file called 'script1.py', saved as an 'all files' (as opposed to a .txt file) in notepad. The name of the file, code, etc. is verbatim to that which the book instructs.
I know script1.py is there and works, I know this because when I enter 'py script1.py' in my command prompt or git bash, it correctly outputs the product of the code.
The book instructs me to run 'import script1' in Python. When I do, I get the 'module not found' error. I've tried importing as stated above, using a from-import command, and moving 'script1' to and from different directories. I confirmed my PATH was set up to support Python, but nothing worked.
It seems (to my UNtrained eye anyway) that python isn't finding my files. Any and all help would be greatly appreciated. Thank you.
Remember that you can only import python files into a python program. In your File Explorer, make sure that file extensions are shown and verify that your script is script1.py and not script1.py.txt or something of that sort.
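It may also help to see where Python looks: import searches the directories listed in sys.path (which includes the directory Python was started from), so the folder that holds script1.py has to be one of them. A minimal sketch, assuming a throwaway directory and made-up file contents:

```python
import importlib
import os
import sys
import tempfile

# Create a throwaway directory holding script1.py (contents are hypothetical).
workdir = tempfile.mkdtemp()
with open(os.path.join(workdir, "script1.py"), "w") as f:
    f.write("value = 2 ** 10\n")

# import searches the directories in sys.path; add the folder, not the file.
sys.path.insert(0, workdir)
importlib.invalidate_caches()
script1 = importlib.import_module("script1")

print(script1.value)

# A file saved as script1.py.txt really ends in .txt, so import will not find it.
print(os.path.splitext("script1.py.txt"))
```

Starting Python from the folder that contains script1.py has the same effect as the sys.path.insert line here.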

In what order will multiple pyspark programs get executed on a spark cluster

If I submit multiple python (pyspark) files to a spark submit command, in which order will they get executed?
For Java, there is a main method which will get executed first, and the rest of the classes will get executed in the order their objects/methods are created/invoked.
But Python (and also Scala) allows the whole REPL-style syntax whereby one is allowed to type commands in an 'open code' fashion, i.e. outside method blocks.
So when a whole bunch of these REPL statements get submitted to the spark cluster, in what order will they execute?
According to http://spark.apache.org/docs/3.0.1/configuration.html
spark.submit.pyFiles (which is --py-files): Comma-separated list of .zip, .egg, or .py files to place on the PYTHONPATH for Python apps. Globs are allowed.
So the Python files added by --py-files are meant to be libraries, modules, or packages, not runnable scripts. You will need to create a main.py or something similar, then import the other 5 files and trigger them in any order you want:
spark-submit --py-files five-files.zip main.py
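The mechanism can be tried out without a cluster, because an archive on the import path behaves like a folder of modules. The sketch below is only an illustration: the module names step_load and step_transform and their run functions are hypothetical, and the sys.path line stands in for what --py-files arranges for you.

```python
import importlib
import os
import sys
import tempfile
import zipfile

# Build a zip of two library modules (hypothetical names and contents).
tmp = tempfile.mkdtemp()
zip_path = os.path.join(tmp, "five-files.zip")
with zipfile.ZipFile(zip_path, "w") as zf:
    zf.writestr("step_load.py", "def run(log):\n    log.append('load')\n")
    zf.writestr("step_transform.py", "def run(log):\n    log.append('transform')\n")

# Stand-in for --py-files: make the archive importable.
sys.path.insert(0, zip_path)
importlib.invalidate_caches()

# What a main.py would do: import the modules, then call them in a chosen order.
step_load = importlib.import_module("step_load")
step_transform = importlib.import_module("step_transform")

log = []
step_load.run(log)
step_transform.run(log)
print(log)
```

The order of execution is simply the order of the calls in main.py; the order in which the files are listed in the archive plays no part.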

Neo4j Import Tool for Dummies

I am new to neo4j and have "0" coding background (although I am trying to learn some). I understand the basic functionalities and am also able to import nodes and relationships using LOAD CSV. However, I absolutely cannot make the neo4j-admin import tool work.
I created a new database, included the simplest CSV file in the import folder and tried the following (I will have to explain in the most simple terms - so don't laugh :))
Name of the file is test.csv
Content:
PropertyTest,:LABEL
proptest,TEST
I tried running the neo4j-import file by trying to open it. A black screen opens up and immediately disappears.
I tried ---> bin/neo4j-admin import --id-type=STRING \
--nodes:TEST=test.csv \
--nodes="test.csv" \
Could someone please explain to me with the simplest terms what the steps would be to import this?
Thank you.
The import folder under your Neo4j installation is fine to use but just bear in mind that the dbms.directories.import setting in neo4j.conf is just for the LOAD CSV command, not for neo4j-admin import.
Since your current directory in the command prompt is the bin folder, when you run the import command specifying import/movies.csv then that implies that the CSV file is in a folder called import under the current directory, under the bin folder.
If you run the command this way it should find the CSV files:
neo4j-admin import --nodes=../import/movies.csv --nodes=../import/actors.csv --relationships=../import/roles.csv
.. means the parent directory so running the command this way means to go up to the parent directory and then into the import directory under the parent dir.
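The same path arithmetic can be checked in a few lines of Python. This is just a sketch of how a relative path is resolved against the current directory; the install location /opt/neo4j is an assumption, so substitute your own Neo4j home.

```python
import posixpath

# Hypothetical install location; substitute your own Neo4j home.
neo4j_home = "/opt/neo4j"
cwd = posixpath.join(neo4j_home, "bin")  # the prompt is sitting in bin


def resolve(relative):
    """Resolve a relative path against the current directory."""
    return posixpath.normpath(posixpath.join(cwd, relative))


print(resolve("import/movies.csv"))     # /opt/neo4j/bin/import/movies.csv
print(resolve("../import/movies.csv"))  # /opt/neo4j/import/movies.csv
```

The first result points inside bin, where no import folder exists, which is why the command cannot find the file without the leading ../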

Import excel spreadsheet into Oracle using sdcli command line tool

I'm attempting to import into Oracle 11gR2 using the command line tool for SqlDeveloper 4.0. The ultimate reason is that we are attempting to import a lot of free-text fields that need to preserve the exact formatting (CR LF, etc.) for legal reasons. End users need to edit these in Excel.
SQLLoader baulks at the CR LFs. You can achieve this in SqlDeveloper by switching the formatting to UTF-8 for import/export. We are now trying to build up some scripts after discovering how to do this in the command line runtime sdcli64, but there doesn't appear to be an option to import from a flat file or .xlsx in that utility.
Any pointers or are we missing an obvious parameter?
(we are using the latest version of SqlDeveloper we can find, 4.03)
Cheers,
Chris
A new version, SQL Developer 4.1, was released as an Early Adopter today. You can run the sdcli or sdcli64 command line version with the new parameters. This will import Excel files, as is possible in the SqlDeveloper GUI, and it will preserve the formatting using the new [-utility] switch.
You can then use the scripting tool/method of choice to build scripts to do all files in a directory, etc.
With the new SQL Developer 4.1 you may import an XLSX file (and other formats too) via the command line:
Use sdcli utility import.
You will need a config XML file. To create one, start the import in the UI, configure the columns, etc., and at the last step click the 'Save State' button. It will create an XML file that you can reuse on the command line.