How to export spark data frames into excel sheets in pyspark - pyspark

I have an output data frame that I want to export into an Excel sheet. I used xlsxwriter to export it, but when I import the module it shows an error saying "No module named xlsxwriter". Is there any alternative library for converting a data frame to Excel sheets?
Note: I am using Databricks Community Edition with PySpark.

There is some helpful information here:
https://xlsxwriter.readthedocs.io/getting_started.html
Try importing the package first. If it isn't available, install it and restart your notebook:
pip install --user xlsxwriter
import xlsxwriter
workbook = xlsxwriter.Workbook('hello.xlsx')
worksheet = workbook.add_worksheet()
worksheet.write('A1', 'Hello world')
workbook.close()
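The snippet above writes a single cell, while the question asks about a whole Spark data frame. One possible route, sketched here under assumptions, is to convert the Spark data frame to pandas with toPandas() and let pandas write the workbook with xlsxwriter as its engine. The helper names module_available and export_spark_df_to_excel and the output path are hypothetical. The sketch assumes pandas and xlsxwriter are installed, and since toPandas() collects every row on the driver, it only suits results that fit in memory.

```python
import importlib.util


def module_available(name):
    """Return True if `name` can be imported in the current environment."""
    return importlib.util.find_spec(name) is not None


def export_spark_df_to_excel(spark_df, path, sheet_name="Sheet1"):
    """Write a Spark DataFrame to an .xlsx file via pandas (hypothetical helper)."""
    # toPandas() pulls every row onto the driver, so keep the data small.
    pandas_df = spark_df.toPandas()
    # pandas delegates the actual file writing to the xlsxwriter package.
    pandas_df.to_excel(path, sheet_name=sheet_name, index=False, engine="xlsxwriter")
    return path


if not module_available("xlsxwriter"):
    print("xlsxwriter is not installed; install it and restart the notebook.")

# Usage in a notebook, assuming `df` is an existing Spark DataFrame:
# export_spark_df_to_excel(df, "/tmp/output.xlsx")
```

Checking availability first gives a clearer message than the raw "No module named xlsxwriter" error from the question.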

Related

Neo4j Import Tool for Dummies

I am new to Neo4j and have "0" coding background (although I am trying to learn some). I understand the basic functionality and am able to import nodes and relationships using LOAD CSV. However, I absolutely cannot make the neo4j-admin import tool work.
I created a new database, included the simplest CSV file in the import folder and tried the following (I will have to explain in the most simple terms - so don't laugh :))
Name of the file is test.csv
Content:
PropertyTest,:LABEL
proptest,TEST
I tried running the neo4j-import file by trying to open it. A black screen opens up and immediately disappears.
I tried ---> bin/neo4j-admin import --id-type=STRING \
--nodes:TEST=test.csv \
--nodes="test.csv" \
Could someone please explain to me with the simplest terms what the steps would be to import this?
Thank you.
The import folder under your Neo4j installation is fine to use but just bear in mind that the dbms.directories.import setting in neo4j.conf is just for the LOAD CSV command, not for neo4j-admin import.
Since your current directory in the command prompt is the bin folder, specifying import/movies.csv in the import command implies that the CSV file is in a folder called import under the current directory, that is, under bin (bin/import/movies.csv).
If you run the command this way it should find the CSV files:
neo4j-admin import --nodes=../import/movies.csv --nodes=../import/actors.csv --relationships=../import/roles.csv
".." means the parent directory, so this command goes up to the parent directory and then into the import directory under it.
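To see why the ../ prefix matters, here is a small sketch that resolves both relative paths from the bin folder. The installation directory /opt/neo4j and the helper name resolve_from are assumptions made only for illustration; the resolution is purely lexical and does not touch the file system.

```python
import posixpath


def resolve_from(current_dir, relative_path):
    """Resolve `relative_path` lexically against `current_dir`."""
    return posixpath.normpath(posixpath.join(current_dir, relative_path))


# "/opt/neo4j" is a hypothetical installation directory.
bin_dir = "/opt/neo4j/bin"

# Without "..", the path points inside bin, where no import folder exists.
print(resolve_from(bin_dir, "import/movies.csv"))     # /opt/neo4j/bin/import/movies.csv

# With "..", the path goes up one level first and lands in the real import folder.
print(resolve_from(bin_dir, "../import/movies.csv"))  # /opt/neo4j/import/movies.csv
```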

Create a file download link to a dynamically generated file in Google Colab jupyter notebooks

How do you get FileLink(filename) to work in Google Colab in order to generate a download link? Is there a better way than FileLink?
Right now this code generates a download link pointing to localhost:
import pandas as pd
from IPython.display import FileLink, FileLinks
df = pd.DataFrame([[1,2,3],[4,5,6]])
df.to_csv('mydf.csv', index=False)
FileLink('mydf.csv')
The link generated as output points to: https://localhost:8080/myfile.csv
How do I get it to point to the correct file?
Try:
from google.colab import files
files.download('mydf.csv')
Or, more simply, use the file browser in the left hand pane.

How can I import custom modules from a Github repository in Google Colab?

I understand how to run a single notebook in Colab. However, I am not sure how to use all the files from a repository, i.e. how to import its functions inside a Colab notebook.
Thank you.
Let's say we want to run the ipynb file, named as "1-fully-connected-binarized-mnist" residing in the repo "qnn-inference-examples".
https://github.com/maltanar/qnn-inference-examples
The notebook of interest uses a custom QNN library and functions from that repo, so we need to be able to import them. To do this, first get the repo folder into Google Colab, then correct the library and file paths.
0) Open the ipynb file "1-fully-connected-binarized-mnist" in Colab. You can rename it if you like.
Try to run it; you will probably get some errors (as I did), so let's fix them.
1) Insert a new code cell at the top of the notebook and clone the repo into Colab:
!git clone https://github.com/maltanar/qnn-inference-examples.git
The new folder "qnn-inference-examples" is now created under your "content" folder, and you should see it in the file browser on the left side. Remember the path "/content/qnn-inference-examples".
2) Now add the second new cell on top:
import sys
sys.path.insert(0,'/content/qnn-inference-examples')
This fixes the error about the library location not being found when importing the QNN libraries.
3) Manually fix the file paths in the existing code to match the new location, because the library and files now live under the folder "/content/qnn-inference-examples".
For example, replace:
img = Image.open("7.png")
with
img = Image.open("/content/qnn-inference-examples/7.png")
These steps should do the job.
Please note that this is not my own solution but a mix of two or three solutions. Credit goes to Hüseyin Elçi, KDnuggets and Alexandr Haymin.
https://medium.com/analytics-vidhya/importing-your-own-python-module-or-python-file-into-colab-3e365f0a35ec
https://www.kdnuggets.com/2018/02/google-colab-free-gpu-tutorial-tensorflow-keras-pytorch.html/2
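Steps 2 and 3 of the answer above (extend sys.path, then fix file paths) can be sketched with two small helpers, so that paths are built from one place instead of being edited by hand. This is only a sketch: the helper names add_repo_to_path and repo_file are hypothetical, and it assumes the repo was already cloned to the path given in step 1.

```python
import os
import sys

# Path where the repo ends up after `!git clone` in Colab (from step 1 above).
REPO_DIR = "/content/qnn-inference-examples"


def add_repo_to_path(repo_dir=REPO_DIR):
    """Put the cloned repo first on the module search path (step 2)."""
    if repo_dir not in sys.path:
        sys.path.insert(0, repo_dir)


def repo_file(name, repo_dir=REPO_DIR):
    """Build the full path of a file inside the cloned repo (step 3)."""
    return os.path.join(repo_dir, name)


add_repo_to_path()
print(repo_file("7.png"))
```

With this in place, a call such as Image.open(repo_file("7.png")) replaces the hard-coded path from step 3.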
Please see the example below:
!git clone https://www.github.com/matterport/Mask_RCNN.git
import os
import sys
# Assumption: ROOT_DIR is the directory the repo was cloned into (the notebook's working directory)
ROOT_DIR = os.getcwd()
os.chdir('Mask_RCNN')
# To find local version of the library
sys.path.append(os.path.join(ROOT_DIR, 'Mask_RCNN'))
# here is your import
from mrcnn.config import Config

Import csv file in postgresql using scala

I am new to Scala and need to write code that imports a CSV file into a PostgreSQL table. Can anyone help with this?

Import excel spreadsheet into Oracle using sdcli command line tool

I'm attempting to import into Oracle 11gR2 using the command-line tool for SQL Developer 4.0. The ultimate reason is that we are attempting to import a lot of free-text fields that need to preserve the exact formatting (CR LF, etc.) for legal reasons. End users need to edit these in Excel.
SQL*Loader baulks at the CR LFs. You can achieve this in SQL Developer by switching the encoding to UTF-8 for import/export. We are now trying to build some scripts after discovering how to do this in the command-line runtime sdcli64, but there doesn't appear to be an option to import from a flat file or .xlsx in that utility.
Any pointers or are we missing an obvious parameter?
(we are using the latest version of SqlDeveloper we can find, 4.03)
Cheers,
Chris
A new version, Oracle SQL Developer 4.1, was released as an Early Adopter release today. You can run the sdcli or sdcli64 command-line version with the new parameters. It imports Excel files just as the SQL Developer GUI does, and it preserves the formatting, using the new [-utility] switch.
You can then use the scripting tool/method of choice to build scripts to do all files in a directory, etc.
With the new SQL Developer 4.1 you can import an XLSX file (and other formats too) via the command line:
Use sdcli utility import.
You will need a config XML file. To create one, start the import in the UI, configure the columns, etc., and at the last step click the 'Save State' button. It creates an XML file that you can reuse on the command line.