Read excel file in a directory using pyspark - pyspark

Hi,
I am trying to read an Excel file in a directory using PySpark, but I am getting a file-not-found error.
env_path='dbfs:/mnt'
raw='dev/raw/work1'
path=env_path+raw
file_path=path+'/'
objects = dbutils.fs.ls(file_path)
for file_name in objects:
    if file_name.isFile():
        sample_df=spark.read.format("com.crealytics.spark.excel").option("header", "false").load(objects+file_name)
I am using this code to read my Excel file, but I get a file-not-found error. Can someone help me with this?

I think you are not passing a proper path to load, which is why you get the file-not-found error. In your code, objects+file_name combines the whole directory listing with a single entry instead of supplying that entry's path.
Try this:
objects = dbutils.fs.ls(path)
for file_name in objects:
    if file_name.isFile():
        if file_name[0].endswith("xlsx"):  # optional
            # file_name[0] is the entry's full path
            sample_df=spark.read.format("com.crealytics.spark.excel").option("header", "true").load(file_name[0])
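As a minimal sketch of the same idea, the path handling can be pulled into two small helpers that run outside Databricks. Note that env_path+raw in the question evaluates to 'dbfs:/mntdev/raw/work1', because neither piece carries the separating slash, so the join is worth checking too. The names join_dbfs, excel_paths and FakeEntry are hypothetical and introduced only for illustration; FakeEntry stands in for the entries that dbutils.fs.ls returns, assuming they expose path and name attributes.

```python
import posixpath
from collections import namedtuple

# Hypothetical stand-in for the entries returned by dbutils.fs.ls, so the
# helpers below can be tried without a Databricks cluster.
FakeEntry = namedtuple("FakeEntry", ["path", "name"])


def join_dbfs(base, *parts):
    """Join path pieces with exactly one '/' between them."""
    return posixpath.join(base, *(p.strip("/") for p in parts))


def excel_paths(entries, suffixes=(".xlsx",)):
    """Return the full path of every entry whose name ends in an Excel suffix."""
    return [e.path for e in entries if e.name.lower().endswith(suffixes)]


folder = join_dbfs("dbfs:/mnt", "dev/raw/work1")
listing = [
    FakeEntry(folder + "/report.xlsx", "report.xlsx"),
    FakeEntry(folder + "/archive/", "archive/"),  # directory entry
    FakeEntry(folder + "/notes.txt", "notes.txt"),
]
print(folder)
print(excel_paths(listing))

# On Databricks (assumption: the spark-excel library is installed):
# for p in excel_paths(dbutils.fs.ls(folder)):
#     df = (spark.read.format("com.crealytics.spark.excel")
#           .option("header", "true").load(p))
```

Filtering on the file suffix also skips directory entries, since their names end in a slash, and load always receives a single path string.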

Related

Unable to convert XLS file to DRL file

We are trying to convert an XLS file to a DRL file using Drool.net in C#. It throws the error java.lang.RuntimeException: 'Script template is null - check for missing script definition.'. I am using the basic sample template, but the error keeps occurring.
Sample rules table
Sample code
Please help us to resolve this issue.
Sample tried code
We are expecting this code to return text in DRL format.

KDB generating ERROR:file/path/location/sym os reports: No such file or directory

I am trying to save a table as a partition using .Q.dpt[hdbroot;.z.d;`tablename].
It generates a "No such file or directory" error, even though the directory is present.
Can you please help me with this?
I created a blank folder to store the data, but it checks for a sym file while storing the data.
I set the hdbroot variable to that folder's path, but it is not working.
I could replicate your error by trying to save to a location that doesn't exist on the machine.
q).Q.dpt[`:/does/not/exist;.z.d;`t]
'/does/not/exist/sym. OS reports: No such file or directory
[0] .Q.dpt[`:/does/not/exist;.z.d;`t]
As I mentioned in my comment, make sure that the hdbroot variable is exactly the location you're expecting. key can help you determine this; here is a quick helper function.
q)exists:{"Folder/file ",$[11=abs type key x;"exists";"does not exist"]}
q)exists`:/does/not/exist
"Folder/file does not exist"
q)exists`:/tmp
"Folder/file exists"

SpreadsheetGear - Save Specific Workbook Sheet to CSV

I am opening an existing Excel file using SpreadsheetGear, using the following code:
SpreadsheetGear.IWorkbook xlBook = SpreadsheetGear.Factory.GetWorkbook(fileName, System.Globalization.CultureInfo.CurrentCulture);
xlBook.SaveAs(fileNameCSV, SpreadsheetGear.FileFormat.CSV);
This works, but the saved CSV file contains the wrong sheet.
Can anyone help with a code snippet showing how to open an Excel file in SpreadsheetGear and then save only a SPECIFIC sheet to a CSV file?
Please note I am working with SpreadsheetGear and want a solution for that library. Thanks!
The IWorksheet interface includes a SaveAs(...) method for just this purpose:
using SpreadsheetGear;
using System.Globalization;
...
IWorkbook xlBook = Factory.GetWorkbook(fileName, CultureInfo.CurrentCulture);
xlBook.Worksheets["My Sheet"].SaveAs(fileNameCSV, FileFormat.CSV);
There is also an IRange.SaveAs(...) method if you want to save just a particular range to CSV / UnicodeText (tab-delimited).

An error in executing a .gap file in GAP software

I am trying to load two .gap files but I receive the following error message. What could be the reason?
Thanks a lot in advance.
Error in executing .gap files
I have already read F1.gap file. I don't understand why it says F1 must be readable to load the F2.gap file.
Please help me to solve this problem.

Error when loading .mat file with scipy.io (ValueError: Mat 4 mopt wrong format)

I'm currently trying to load a .mat file in python using scipy and the following bit of code:
from scipy import io as sio
data= "file.mat"
output= sio.loadmat(data)
However when running the command I get the error:
ValueError: Mat 4 mopt wrong format, byteswapping problem?
What does this error message mean? Is there an issue with the file I'm trying to load?
I'm quite the novice when it comes to programming so any suggestions would be greatly appreciated : ) If there is a better way to load .mat files in python I'm open to hearing those too. Thanks in advance!
I've never seen this error before, but it is produced by line 113 in scipy/scipy/io/matlab/mio4.py
def read_header(self):
    ''' Read and return header for variable '''
    data = read_dtype(self.mat_stream, self.dtypes['header'])
    name = self.mat_stream.read(int(data['namlen'])).strip(b'\x00')
    if data['mopt'] < 0 or data['mopt'] > 5000:
        raise ValueError('Mat 4 mopt wrong format, byteswapping problem?')
    ...
Normally loadmat is the right file loader, at least among the supported types:
v4 (Level 1.0), v6 and v7 to 7.2 matfiles are supported.
Do you know anything about how this file was saved in MATLAB? Any format specifications such as these?
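If the save options are unknown, one way to get a hint is to look at the first bytes of the file. The sketch below is only a rough check, not what scipy does internally: it assumes that v5-family files typically begin with the text "MATLAB 5.0 MAT-file", that v7.3 files begin with "MATLAB 7.3 MAT-file" and carry an HDF5 signature at byte offset 512, and that anything else is either v4 or not a MAT-file at all. The function names classify_mat_header and sniff_mat_file are hypothetical.

```python
# Eight-byte signature that marks the start of HDF5 data.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def classify_mat_header(head):
    """Guess the MAT-file family from the leading bytes of a file."""
    if head.startswith(b"MATLAB 7.3 MAT-file") or head[512:520] == HDF5_SIGNATURE:
        return "v7.3 (HDF5)"
    if head.startswith(b"MATLAB 5.0 MAT-file"):
        return "v5 family (v6/v7)"
    # No recognisable text header: a v4 file, or not a MAT-file.
    return "unrecognised (possibly v4, or not a MAT-file)"


def sniff_mat_file(path):
    """Read enough of the file to cover the header and the HDF5 offset."""
    with open(path, "rb") as fh:
        return classify_mat_header(fh.read(520))


# Usage, with the file name from the question:
# print(sniff_mat_file("file.mat"))
```

A result of "v7.3 (HDF5)" would mean the file is outside the versions loadmat supports, in which case an HDF5 reader such as h5py is the usual route. An unrecognised header suggests checking whether the file is a MAT-file at all, for example whether it was truncated or saved as text.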