pyspark.sql.utils.IllegalArgumentException - pyspark

pyspark.sql.utils.IllegalArgumentException: Pathname /F:/spark/sample_files/column_containing_JSON_data.csv from F:/spark/sample_files/column_containing_JSON_data.csv is not a valid DFS filename.
I am giving a local input file path (as shown below), but Spark is trying to access an HDFS path (/F:/spark/sample_files/column_containing_JSON_data.csv) and throwing the above error.
inputFile = spark.read.option("header", True).option("multiline", True).option("escape", "\"") \
    .csv(r'F:\spark\sample_files\column_containing_JSON_data.csv')

I have the same problem.
You have to put file:/// before the input path.
Like this:
inputFile = spark.read.option("header", True).option("multiline", True).option("escape", "\"").csv('file:///F:/spark/sample_files/column_containing_JSON_data.csv')
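As a side note, Python's standard library can build the file:// URI for you from a Windows path, which avoids hand-mixing backslashes into a URI. A minimal sketch (plain pathlib, no Spark required):

```python
from pathlib import PureWindowsPath

# Convert a Windows path into a file:// URI. Spark's Hadoop filesystem
# layer treats a file:// scheme as a local path rather than HDFS.
local_path = r'F:\spark\sample_files\column_containing_JSON_data.csv'
uri = PureWindowsPath(local_path).as_uri()
print(uri)  # file:///F:/spark/sample_files/column_containing_JSON_data.csv
```

The resulting string can then be passed straight to spark.read...csv(uri).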

Related

Error: `path` does not exist: ‘MIS_655_RS_T3_Wholesale_Customers’

I imported my Excel file into the R environment and saved the path by creating a new file in an R script. However, when I tried to check my directory and load the dataset, I received the following message: Error: `path` does not exist: 'MIS_655_RS_T3_Wholesale_Customers'
What am I doing wrong here?
Thanks
Have you missed the file extension of your dataset, e.g. .csv or .xlsx?
I suggest you first set the folder containing the file as your working directory; then the following code might help:
Dat_customers <- readxl::read_excel("MIS_655_RS_T3_Wholesale_Customers.xlsx")

Citrus waitFor().file fails to read a file

I'm trying to use waitFor() in my Citrus test to wait for an output file on disk to be written by the process I'm testing. I've used this code:
File outputFile = new File("/esbfiles/blesbt/bl03orders.99160221.14289.xml");
waitFor().file(outputFile).seconds(65L).interval(1000L);
After a few seconds, the file appears in the folder as expected, and the user I'm running the test as has permission to read it. The waitFor(), however, ends in a timeout.
09:46:44 09:46:44,818 DEBUG dition.FileCondition| Checking file path '/esbfiles/blesbt/bl03orders.99160221.14289.xml'
09:46:44 09:46:44,818 WARN dition.FileCondition| Failed to access file resource 'class path resource [esbfiles/blesbt/bl03orders.99160221.14289.xml] cannot be resolved to URL because it does not exist'
What could be the problem? Can’t I check for files outside the classpath?
This is actually a bug in Citrus. Citrus is working with the file path instead of the file object and in combination with Spring's PathMatchingResourcePatternResolver this causes Citrus to search for a classpath resource instead of using the absolute file path as external file system resource.
You can fix this by providing the absolute file path instead of the file object like this:
waitFor().file("file:/esbfiles/blesbt/bl03orders.99160221.14289.xml")
.seconds(65L)
.interval(1000L);
An issue for the broken file object conversion has been opened: https://github.com/christophd/citrus/issues/303
Thanks for pointing to it!

Problems with non-"UTF-8" file collection using flume - Spooldir type

My Flume spool directory contains non-UTF-8 files, so I get a java.nio.charset.MalformedInputException when I try to collect them.
Changing the encoding option in the .conf file also causes an error, and I have to use the spooldir type.
How can I collect non-UTF-8 files?
The encoding of our log files was Latin-5 (Turkish, ISO-8859-9).
I fixed it by adding the line below to the conf file:
AGENTNAME.sources.SOURCENAME.inputCharset = ISO-8859-9
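To see why the charset mismatch blows up, here is a small Python sketch (hypothetical sample bytes, not from the actual logs): a Latin-5 byte sequence fails to decode as UTF-8, which is the same mismatch behind Flume's MalformedInputException, while decoding with ISO-8859-9 succeeds.

```python
# In ISO-8859-9 (Latin-5), byte 0xDD encodes the Turkish letter 'İ'.
# Decoding it as UTF-8 fails, because 0xDD starts a multi-byte UTF-8
# sequence that the following ASCII byte does not continue.
data = b'\xddstanbul'  # "İstanbul" encoded in ISO-8859-9

try:
    data.decode('utf-8')
except UnicodeDecodeError as exc:
    print('UTF-8 decode failed:', exc.reason)

print(data.decode('iso-8859-9'))  # İstanbul
```

Setting inputCharset tells the spooldir source to decode with the file's real encoding instead of the UTF-8 default.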

Spark Shell unable to read file at valid path

I am trying to read a file in Spark Shell that comes with the Cloudera CentOS distribution on my local machine. Following are the commands I entered in Spark Shell:
spark-shell
val fileData = sc.textFile("hdfs://user/home/cloudera/cm_api.py");
fileData.count
I also tried this statement for reading the file:
val fileData = sc.textFile("user/home/cloudera/cm_api.py");
However, I am getting:
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://quickstart.cloudera:8020/user/cloudera/user/cloudera/cm_api.py
I haven't changed any settings or configurations. What am I doing wrong?
You are missing the leading slash in your URL, so the path is relative. To make it absolute, use
val fileData = sc.textFile("hdfs:///user/home/cloudera/cm_api.py")
or
val fileData = sc.textFile("/user/home/cloudera/cm_api.py")
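The doubled path in the error message is the giveaway: HDFS resolves a relative path against the user's home directory (typically /user/&lt;username&gt;). A minimal sketch of that resolution rule, using plain POSIX path joining as a stand-in for what the Hadoop client does:

```python
import posixpath

# HDFS resolves relative paths against the user's home directory,
# here /user/cloudera on the Cloudera quickstart VM.
home = '/user/cloudera'

# Relative path (no leading slash): prefixed with the home directory,
# which produces the doubled path seen in the error message.
print(posixpath.join(home, 'user/cloudera/cm_api.py'))
# -> /user/cloudera/user/cloudera/cm_api.py

# Absolute path (leading slash): used as-is, ignoring the home directory.
print(posixpath.join(home, '/user/home/cloudera/cm_api.py'))
# -> /user/home/cloudera/cm_api.py
```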
I think you need to put the file into HDFS first (hadoop fs -put), then check it with hadoop fs -ls, then start spark-shell and run val fileData = sc.textFile("cm_api.py").
In "hdfs://user/home/cloudera/cm_api.py", you are missing the hostname of the URI, so "user" is parsed as the host. You should pass something like "hdfs://<host>:<port>/user/home/cloudera/cm_api.py", where <host> is the Hadoop NameNode host and <port> is the NameNode RPC port, which is 8020 by default on Cloudera (50070 is the NameNode web UI port, not the filesystem port).
The error message says hdfs://quickstart.cloudera:8020/user/cloudera/user/cloudera/cm_api.py does not exist. The path looks suspicious! The file you mean is probably at hdfs://quickstart.cloudera:8020/user/cloudera/cm_api.py.
If it is, you can access it by using that full path. Or, if the default file system is configured as hdfs://quickstart.cloudera:8020/user/cloudera/, you can use simply cm_api.py.
You may be confused between HDFS file paths and local file paths. By specifying
hdfs://quickstart.cloudera:8020/user/home/cloudera/cm_api.py
you are saying two things:
1) there is a computer named "quickstart.cloudera" reachable via the network (try ping to make sure), and it is running HDFS;
2) the HDFS file system contains a file at /user/home/cloudera/cm_api.py (try hdfs dfs -ls /user/home/cloudera/ to verify this).
If you are trying to access a file on the local file system you have to use a different URI:
file:///user/home/cloudera/cm_api.py

unoconv fails to save in my specified directory

I am using unoconv to convert an ods spreadsheet to a csv file.
Here is the command:
unoconv -vvv --doctype=spreadsheet --format=csv --output= ~/Dropbox/mariners_site/textFiles/expenses.csv ~/Dropbox/Aldeburgh/expenses/expenses.ods
It saves the output file in the same directory as the source file, not in the specified directory. The error message is:
Output file: /home/richard/Dropbox/mariners_site/textFiles/expenses.csv
unoconv: UnoException during export phase:
Unable to store document to file:///home/richard/Dropbox/mariners_site
/textFiles/expenses.csv (ErrCode 19468)
I'm sure that this worked initially, but it has since stopped.
I have checked for permissions and they are identical for both directories.
I translated ErrCode 19468 for you: it boils down to ERRCODE_SFX_DOCUMENTREADONLY.
You can find more information about the specific meaning of LibreOffice ErrCode numbers from the unoconv documentation at: https://github.com/dagwieers/unoconv/blob/master/doc/errcode.adoc
The clue here is the whitespace character between --output= and the filename (--output= ~/Dropbox/mariners_site/textFiles/expenses.csv). Because of it, unoconv gets an empty output value (which means the current directory) and is given two input files, which explains this specific error, in my opinion. Remove the space so the output path is attached to --output=.