How to use tFileInputExcel with tFileList in Talend

I want to process multiple files one by one and merge them into one Excel file (xlsx) using tFileList, so I set up the job as in the screenshot, but I'm getting these errors:
For input string: "N_Vol"
For input string: "N_Vol"
Exception in component tFileInputExcel_1
org.apache.poi.openxml4j.exceptions.InvalidOperationException:
Can't open the specified file: 'C:\Users\dell\Desktop\Data2 -
Copie\~$S_1.xlsx'
at org.apache.poi.openxml4j.opc.ZipPackage.<init>
(ZipPackage.java:112)
at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:224)
at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:186)
at org.apache.poi.POIXMLDocument.openPackage(POIXMLDocument.java:74)
at org.apache.poi.xssf.usermodel.XSSFWorkbook.<init>(XSSFWorkbook.java:296)
at tunisair.lig_0_1.lig.tFileList_1Process(lig.java:913)
at tunisair.lig_0_1.lig.runJobInTOS(lig.java:2053)
at tunisair.lig_0_1.lig.main(lig.java:1910)
Caused by: java.util.zip.ZipException: error in opening zip file
at java.util.zip.ZipFile.open(Native Method)
at java.util.zip.ZipFile.<init>(Unknown Source)
at java.util.zip.ZipFile.<init>(Unknown Source)
at java.util.zip.ZipFile.<init>(Unknown Source)
at org.apache.poi.openxml4j.opc.internal.ZipHelper.openZipFile(ZipHelper.java:174)
at org.apache.poi.openxml4j.opc.ZipPackage.<init>(ZipPackage.java:110)
This is my job.

Here are my tFileInputExcel parameters:
https://i.stack.imgur.com/6qOuD.jpg

As said somewhere else ;) it seems there is a problem with 'C:\Users\dell\Desktop\Data2 - Copie\~$S_1.xlsx'.
Is this file in your input folder?
Are you able to open it with Excel?
Also, I saw you've selected the option to read all sheets in the Excel files. Are they all based on the same schema?

Yes, all files have the same schema, and this file is not in my input folder. I don't know why it reads that file instead of the files in my input folder.
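
A likely explanation (not confirmed in this thread): files whose names start with ~$ are the hidden owner/lock files Excel creates while a workbook is open, which is why the file doesn't show up when you browse the folder, and why POI fails with a ZipException: a lock file is not a valid xlsx/zip archive. Closing Excel (or deleting the leftover ~$ file) should make it disappear. Alternatively, assuming the files are picked up by a tFileList filemask, the mask can be tightened to skip such files, e.g. with "Use Glob Expressions as Filemask" unchecked in the Advanced settings so the mask is treated as a Java regex:

"[^~].*\\.xlsx"

This is a sketch of the idea; the exact option names may differ between Talend versions.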

Related

Talend - how to configure tFileInputDelimited do not throw error when file not found

Good day,
I am using tFileInputDelimited in Talend Data Studio to read a txt file and get some values from it.
The input file name is something like the following; it contains the day in the file name:
checksum_150123.txt
This file is only created in the last few steps before the job ends, so on the first run of each day the file does not exist yet, and tFileInputDelimited throws a file-not-found error.
C:\LandingZone\jx\checksum_180123.txt (The system cannot find the file specified)
[ERROR] 14:13:35 my_track.my_precheck_registration_0_1.DL_PRECHECK_REGISTRATION- CollectCheckSum_1_tFileInputDelimited_1 - C:\LandingZone\jx\checksum_180123.txt (The system cannot find the file specified)
I have a requirement not to show this error. May I know how I can configure this?
For that I recommend you to use the tFileExist component and then use its Exist variable in a Run If trigger (for example ((Boolean)globalMap.get("tFileExist_1_EXISTS"))), as sketched below.
Hope this answers your question
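
A possible layout (a sketch; the component and variable names are the defaults and may differ in your job):

tFileExist_1 --[Run if: ((Boolean)globalMap.get("tFileExist_1_EXISTS"))]--> tFileInputDelimited_1 --> ...

The Run If condition is evaluated after tFileExist_1 runs, so tFileInputDelimited_1 is only reached when the file actually exists, and the file-not-found error is never raised.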

IOException while overwriting parquet

I have a parquet file, let's say abc/A.parquet. A few records are filtered out based on a certain condition to create a DataFrame, and I am trying to overwrite the file with the resulting filtered DataFrame using the overwrite save mode, but it throws the exception below.
Command used to overwrite:
filterDF.coalesce(1).write.mode("overwrite").parquet("file:/home/psub2/cls_parquet2/file:/home/psub7/abc/A.parquet")
failed while writing rows.
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:285)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:197)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:196)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:381)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.FileNotFoundException: File file:/home/psub7/abc/A.parquet does not exist
It is possible the underlying files have been updated. You can explicitly invalidate the cache in Spark by running 'REFRESH TABLE tableName' command in SQL or by recreating the Dataset/DataFrame involved.
Please help, thanks in advance.
Conceptually, you can't read a DataFrame from a file and write it back to that same file. The IOException is thrown when you read a DataFrame from file A and try to write the same DataFrame into file A: Spark evaluates lazily, so the overwrite deletes A before the underlying read has actually happened, and the write then finds its own input gone. You can overwrite the A parquet file only if you didn't read the DataFrame from file A.
For example, you can read a DataFrame from file A and overwrite file B, as in the sketch below.
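
A minimal sketch (paths follow the question; the filter condition and column name are placeholders):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.getOrCreate()
import spark.implicits._

// read from A ...
val df = spark.read.parquet("file:/home/psub7/abc/A.parquet")
val filtered = df.filter($"someColumn" =!= "badValue")  // hypothetical condition

// ... and overwrite a different location, B
filtered.coalesce(1).write.mode("overwrite").parquet("file:/home/psub7/abc/B.parquet")

If the result really has to end up at A, write to B first and then move or rename B to A outside Spark.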

Problem when using writer_pdf_Export filter in UNO LibreOffice

I installed LibreOffice [LibreOffice 6.1.3.2 10(Build:2)] and the SDK on Ubuntu 16.04.
I used the Java sample DocumentConverter from the SDK package to convert an ODT into different formats.
With "MS WORD 97" or "Text" there is no problem, but with "writer_pdf_Export" it ends in an exception.
OK
java -jar /home/js/libreoffice6.1_sdk/LINUXexample.out/class/JavaDocumentHandlingExamples/DocumentConverter.jar "./test" "MS WORD 97" "doc" "/home/js/libreoffice6.1_sdk/LINUXexample.out/misc/JavaDocumentHandlingExamples/converted_files"
Connected to a running office ...
The converted documents will stored in "/home/js/libreoffice6.1_sdk/LINUXexample.out/misc/JavaDocumentHandlingExamples/converted_files!
[test]
test1.odt
KO
java -jar /home/js/libreoffice6.1_sdk/LINUXexample.out/class/JavaDocumentHandlingExamples/DocumentConverter.jar "./test" "writer_pdf_Export" "pdf" "/home/js/libreoffice6.1_sdk/LINUXexample.out/misc/JavaDocumentHandlingExamples/converted_files"
Connected to a running office ...
The converted documents will stored in "/home/js/libreoffice6.1_sdk/LINUXexample.out/misc/JavaDocumentHandlingExamples/converted_files!
[test]
com.sun.star.task.ErrorCodeIOException: SfxBaseModel::impl_store <file:////home/js/libreoffice6.1_sdk/LINUXexample.out/misc/JavaDocumentHandlingExamples/converted_files/test1.pdf> failed: 0x81a(Error Area:Io Class:Parameter Code:26)
at com.sun.star.lib.uno.environments.remote.Job.remoteUnoRequestRaisedException(Job.java:158)
at com.sun.star.lib.uno.environments.remote.Job.execute(Job.java:122)
at com.sun.star.lib.uno.environments.remote.JobQueue.enter(JobQueue.java:312)
at com.sun.star.lib.uno.environments.remote.JobQueue.enter(JobQueue.java:281)
at com.sun.star.lib.uno.environments.remote.JavaThreadPool.enter(JavaThreadPool.java:81)
at com.sun.star.lib.uno.bridges.java_remote.java_remote_bridge.sendRequest(java_remote_bridge.java:618)
at com.sun.star.lib.uno.bridges.java_remote.ProxyFactory$Handler.request(ProxyFactory.java:145)
at com.sun.star.lib.uno.bridges.java_remote.ProxyFactory$Handler.invoke(ProxyFactory.java:129)
at com.sun.proxy.$Proxy5.storeAsURL(Unknown Source)
at DocumentConverter.traverse(DocumentConverter.java:137)
at DocumentConverter.main(DocumentConverter.java:216)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.sun.star.lib.loader.Loader.main(Loader.java:132)
test1.odt
I need to use the UNO interface through Java to convert to any format, and I really need PDF as well...
Any help?
Are you doing "storeAsURL" or "storeToURL"?
"storeAsURL" is analogous to "Save As" -- for when you are saving a document in an editable format.
"storeToURL" is analogous to "Export" -- for generating an output in an uneditable format.
So I suspect that, if you switch from "storeAsURL" to "storeToURL", you'll get the PDF you wanted.
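
For illustration, a minimal sketch of the export call (the SDK sample is Java; the same JVM API is shown here from Scala, and `doc` is assumed to be the XStorable interface of the already-loaded document):

import com.sun.star.beans.PropertyValue
import com.sun.star.frame.XStorable

def exportToPdf(doc: XStorable, targetUrl: String): Unit = {
  // the media descriptor names the output filter
  val filter = new PropertyValue()
  filter.Name = "FilterName"
  filter.Value = "writer_pdf_Export"
  // storeToURL = "Export": it accepts output-only filters such as writer_pdf_Export,
  // whereas storeAsURL rejects them with the Io/Parameter error seen above
  doc.storeToURL(targetUrl, Array(filter))
}

In DocumentConverter.java the change would go at the storeAsURL call visible in the stack trace (DocumentConverter.traverse).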

Scala Spark - Overwrite parquet File on HDFS

I was trying to append a DataFrame to an existing parquet file and found the option of setting the save mode to Append. But when I try to append, it throws an error that the target is not a directory:
data.coalesce(1).write.mode(SaveMode.Append).parquet("/user/root/AppendTest");
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Permission denied: user=root, access=EXECUTE, inode="/user/root/AppendTest":root:root:-rw-r--r-- (Ancestor /user/root/AppendTest is not a directory).
P.S.: When the file was first created, it was generated as a folder, and I then renamed it to the desired file.
I have checked "How to overwrite the output directory in spark", but that doesn't solve my problem here. I have tried the ways mentioned in that question (and the issue described there is also different).
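
For context (not stated in the question): a parquet target in Spark is a directory of part files, and the rename described in the P.S. is what turns it into a plain file that the later Append then trips over. A minimal sketch under that assumption, keeping the directory intact:

import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}

// `data` stands for the DataFrame from the question
def appendBatch(data: DataFrame): Unit = {
  // leave /user/root/AppendTest as the directory Spark created;
  // each append just adds new part files inside it
  data.coalesce(1).write.mode(SaveMode.Append).parquet("/user/root/AppendTest")
}

// the whole directory reads back as one logical dataset
def readAll(spark: SparkSession): DataFrame = {
  spark.read.parquet("/user/root/AppendTest")
}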

Caused by: java.io.IOException: File already exists

I want to save my DataFrame in CSV format. This is a small data set, so I use coalesce(1):
df.coalesce(1).write.mode(SaveMode.Overwrite).csv(outputPath + "/test.csv")
I get this error:
Caused by: java.io.IOException: File already exists:s3://test/test.csv/part-00000-c9f8a000-2601-4b83-a6d6-a3f023937fdc-c000.csv
at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.create(S3NativeFileSystem.java:617)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:915)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:896)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:793)
at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.create(EmrFileSystem.java:176)
at org.apache.spark.sql.execution.datasources.CodecStreams$.createOutputStream(CodecStreams.scala:81)
at org.apache.spark.sql.execution.datasources.CodecStreams$.createOutputStreamWriter(CodecStreams.scala:92)
at org.apache.spark.sql.execution.datasources.csv.CsvOutputWriter.<init>(CSVFileFormat.scala:135)
at org.apache.spark.sql.execution.datasources.csv.CSVFileFormat$$anon$1.newInstance(CSVFileFormat.scala:77)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.newOutputWriter(FileFormatWriter.scala:305)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.execute(FileFormatWriter.scala:314)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:258)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:256)
at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1375)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:261)
However, I can save this DataFrame as a parquet file without any error:
df.write.mode(SaveMode.Overwrite).parquet(outputPath + "/test")
How can I solve this issue and save my DataFrame in CSV format?
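
Two things are commonly checked for this symptom (assumptions, since the root cause isn't visible in the trace): on S3 a "File already exists" error is often a secondary failure, raised when a failed or speculative task attempt is retried and collides with the partial file its first attempt left behind, so the earliest failure in the executor logs is usually the real one; and the csv() path, like the parquet one, is a directory of part files, so pointing it at a fresh directory rules out leftovers. A sketch:

import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}

// disable speculative execution so duplicate task attempts can't race
// on the same S3 output file (an assumption, not a confirmed fix)
val spark = SparkSession.builder
  .config("spark.speculation", "false")
  .getOrCreate()

// `df` and `outputPath` as in the question; the single part file
// written inside the fresh directory is the CSV
def writeCsv(df: DataFrame, outputPath: String): Unit = {
  df.coalesce(1).write.mode(SaveMode.Overwrite).csv(outputPath + "/test_csv")
}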