Error: `path` does not exist: ‘MIS_655_RS_T3_Wholesale_Customers’ - import

I imported my Excel file into the R environment and saved the path by creating a new file in an R script. However, when I tried to check my directory and load the dataset, I received the following message: Error: `path` does not exist: 'MIS_655_RS_T3_Wholesale_Customers'
What am I doing wrong here?
Thanks

Did you leave off the file format of your dataset, e.g. .csv or .xlsx? The path in the error message has no extension.
I suggest you first set the folder that contains the file as your working directory; then the following code should read it:
Dat_customers <- readxl::read_excel("MIS_655_RS_T3_Wholesale_Customers.xlsx")
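A minimal sketch, assuming the workbook sits in ~/data (a placeholder for wherever you actually saved it):
setwd("~/data")  # the folder that contains the workbook
file.exists("MIS_655_RS_T3_Wholesale_Customers.xlsx")  # should print TRUE before you try to read
Dat_customers <- readxl::read_excel("MIS_655_RS_T3_Wholesale_Customers.xlsx")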


Talend - how to configure tFileInputDelimited so it does not throw an error when the file is not found

Good day,
I am using tFileInputDelimited in Talend Data Studio to read a txt file and extract some values from it.
The input file name contains the day, something like this:
checksum_150123.txt
The file is only created in the last few steps before the job ends, so on the first run each day the file does not exist yet, and tFileInputDelimited throws a file-not-found error:
C:\LandingZone\jx\checksum_180123.txt (The system cannot find the file specified)
[ERROR] 14:13:35 my_track.my_precheck_registration_0_1.DL_PRECHECK_REGISTRATION- CollectCheckSum_1_tFileInputDelimited_1 - C:\LandingZone\jx\checksum_180123.txt (The system cannot find the file specified)
I have a requirement not to show this error; how can I configure that?
For that I recommend you use the tFileExist component first, and then check its Exists variable, e.g. ((Boolean)globalMap.get("tFileExist_1_EXISTS")), in a 'Run if' trigger so the read only happens when the file exists.
Hope this answers your question.
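A sketch of the flow, with tFileExist_1 standing in for whatever your component is actually named:
tFileExist_1 --[Run if: ((Boolean)globalMap.get("tFileExist_1_EXISTS"))]--> tFileInputDelimited_1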

pyspark.sql.utils.IllegalArgumentException

pyspark.sql.utils.IllegalArgumentException: Pathname /F:/spark/sample_files/column_containing_JSON_data.csv from F:/spark/sample_files/column_containing_JSON_data.csv is not a valid DFS filename.
I am giving a local input file path (as shown below), but Spark is trying to access it as an HDFS path (/F:/spark/sample_files/column_containing_JSON_data.csv) and throwing the above error.
inputFile = spark.read.option("header", True).option("multiline", True).option("escape", "\"") \
    .csv('F:\spark\sample_files\column_containing_JSON_data.csv')
I had the same problem.
You have to put file:/// before the input path, like this:
inputFile = spark.read.option("header", True).option("multiline", True).option("escape", "\"").csv('file:///F:\spark\sample_files\column_containing_JSON_data.csv')
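If the Windows backslashes give Python trouble, a variant with forward slashes (the same options and the path from the question, just reformatted) should work as well:
inputFile = (spark.read
    .option("header", True)       # first row is the header
    .option("multiline", True)    # quoted fields may span multiple lines
    .option("escape", "\"")       # a double quote escapes a double quote inside quoted fields
    .csv("file:///F:/spark/sample_files/column_containing_JSON_data.csv"))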

citrus waitFor().file fails to read a file

I’m trying to use waitFor() in my Citrus test to wait for an output file on disk to be written by the process I’m testing. I’ve used this code:
File outputFile = new File("/esbfiles/blesbt/bl03orders.99160221.14289.xml");
waitFor().file(outputFile).seconds(65L).interval(1000L);
After a few seconds, the file appears in the folder as expected. The user I’m running the test code as has permission to read the file. The waitFor(), however, ends in a timeout.
09:46:44 09:46:44,818 DEBUG dition.FileCondition| Checking file path '/esbfiles/blesbt/bl03orders.99160221.14289.xml'
09:46:44 09:46:44,818 WARN dition.FileCondition| Failed to access file resource 'class path resource [esbfiles/blesbt/bl03orders.99160221.14289.xml] cannot be resolved to URL because it does not exist'
What could be the problem? Can’t I check for files outside the classpath?
This is actually a bug in Citrus. Citrus works with the file path instead of the file object, and in combination with Spring's PathMatchingResourcePatternResolver this causes Citrus to search for a classpath resource instead of using the absolute file path as an external file system resource.
You can fix this by providing the absolute file path instead of the file object, like this:
waitFor().file("file:/esbfiles/blesbt/bl03orders.99160221.14289.xml")
.seconds(65L)
.interval(1000L);
An issue regarding the broken file object conversion has been opened: https://github.com/christophd/citrus/issues/303
Thanks for pointing to it!

Spark-SQL: access file in current worker node directory

I need to read a file using spark-sql, and the file is in the current directory.
I use this command to decompress a list of files I have stored on HDFS.
import scala.sys.process._  // for the .!! shell-out syntax
val decompressCommand = Seq(laszippath, "-i", inputFileName, "-o", "out.las").!!
The file is written to the current worker node directory; I know this because when I execute "ls -a".!! from Scala I can see that the file is there. I then try to access it with the following command:
val dataFrame = sqlContext.read.las("out.las")
I assumed that the SQL context would try to find the file in the current directory, but it doesn't. Also, it doesn't throw an error, only a warning stating that the file could not be found (so Spark continues to run).
I attempted to add the file using: sparkContext.addFile("out.las") and then access the location using: val location = SparkFiles.get("out.las") but this didn't work either.
I even ran the command val locationPt = "pwd".!! and then did val fullLocation = locationPt + "/out.las" and attempted to use that value, but it didn't work either.
The actual exception that gets thrown is the following:
User class threw exception: org.apache.spark.sql.AnalysisException: cannot resolve 'x' given input columns: [];
org.apache.spark.sql.AnalysisException: cannot resolve 'x' given input columns: []
And this happens when I try to access column 'x' from a dataframe. I know that column 'x' exists because I've downloaded some of the files from HDFS, decompressed them locally, and run some tests.
I need to decompress the files one by one because I have 1.6TB of data, so I cannot decompress them all in one go and access them later.
Can anyone tell me what I can do to access files which are being outputted to the worker node directory? Or maybe should I be doing it some other way?
So I managed to do it now. What I'm doing is saving the file to HDFS and then retrieving it through the SQL context from HDFS. I overwrite "out.las" in HDFS each time so that it doesn't take up too much space.
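A minimal sketch of that approach, assuming the decompression and the HDFS upload run on the same node (laszippath and inputFileName are the values used above; the HDFS directory is a placeholder, and read.las is the same LAS reader used earlier):
import scala.sys.process._
import org.apache.hadoop.fs.{FileSystem, Path}

// decompress one archive locally, then push the result to HDFS,
// overwriting the previous round's copy to keep space usage flat
Seq(laszippath, "-i", inputFileName, "-o", "out.las").!!
val fs = FileSystem.get(sc.hadoopConfiguration)
fs.copyFromLocalFile(true, true, new Path("out.las"), new Path("/user/me/dataForHDFS/out.las"))  // delSrc = true, overwrite = true
val dataFrame = sqlContext.read.las("/user/me/dataForHDFS/out.las")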
I have used the Hadoop API before to get at files; I don't know if it will help you here.
import org.apache.hadoop.fs.{FileSystem, FSDataInputStream, Path}
val filePath = "/user/me/dataForHDFS/"
val fs: FileSystem = FileSystem.get(new java.net.URI(filePath + "out.las"), sc.hadoopConfiguration)
And I've not tested the below, but roughly: open the file, size a byte array from the file's length, and read fully into it.
val path = new Path(filePath + "out.las")
val fileIn: FSDataInputStream = fs.open(path)
val readIn = new Array[Byte](fs.getFileStatus(path).getLen.toInt)  // buffer sized to the file length
fileIn.readFully(0, readIn)  // read the whole file starting at offset 0
fileIn.close()

unoconv fails to save in my specified directory

I am using unoconv to convert an ods spreadsheet to a csv file.
Here is the command:
unoconv -vvv --doctype=spreadsheet --format=csv --output= ~/Dropbox/mariners_site/textFiles/expenses.csv ~/Dropbox/Aldeburgh/expenses/expenses.ods
It saves the output file in the same directory as the source file, not in the specified directory. The error message is:
Output file: /home/richard/Dropbox/mariners_site/textFiles/expenses.csv
unoconv: UnoException during export phase:
Unable to store document to file:///home/richard/Dropbox/mariners_site/textFiles/expenses.csv (ErrCode 19468)
I'm sure that this worked initially, but it has since stopped.
I have checked for permissions and they are identical for both directories.
I translated ErrCode 19468 for you and it boils down to meaning ERRCODE_SFX_DOCUMENTREADONLY.
You can find more information about the specific meaning of LibreOffice ErrCode numbers from the unoconv documentation at: https://github.com/dagwieers/unoconv/blob/master/doc/errcode.adoc
The clue here is that you have a whitespace character between --output= and the filename (--output= ~/Dropbox/mariners_site/textFiles/expenses.csv). Because of that, unoconv gets an empty output value (which means the current directory) and is given 2 files, and that explains why you get this specific error IMO.
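Removing that space should make unoconv write to the intended path (if your shell does not expand ~ after =, spell the home directory out in full):
unoconv -vvv --doctype=spreadsheet --format=csv --output=~/Dropbox/mariners_site/textFiles/expenses.csv ~/Dropbox/Aldeburgh/expenses/expenses.ods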