Talend - how to configure tFileInputDelimited do not throw error when file not found - talend

Good day,
I am using tFileInputDelimited in Talend Data Studio to read a txt file and get some value inside.
The input file name is something like follow, it contain day in the file name:
checksum_150123.txt
This file will create in last few steps before the job end and the file not found.
Thus, every day the job first run, there is no file exist, and then tFileInputDelimited will throw error on file not found.
C:\LandingZone\jx\checksum_180123.txt (The system cannot find the file specified)
[ERROR] 14:13:35 my_track.my_precheck_registration_0_1.DL_PRECHECK_REGISTRATION- CollectCheckSum_1_tFileInputDelimited_1 - C:\LandingZone\jx\checksum_180123.txt (The system cannot find the file specified)
I have a requirement to not showing this, may I know how can I configure this?

for that I recommend you to use the tFileExist component and then use the tFileExist variable Exist (((Boolean)globalMap.get("tFileExist_1_EXISTS")) for example) in a run if trigger
Hope this answers your question

Related

Synapse Spark exception handling - Can't write to log file

I have written PySpark code to hit a REST API and extract the contents in an XML format and later wrote to Parquet in a data lake container.
I am trying to add logging functionality where I not only write out errors but updates of actions/process we execute.
I am comparatively new to Spark I have been relying on online articles and samples. All explain the error handling and logging through "1/0" examples and saving logs in the default folder structure (not in ADLS account/container/folder) which do not help at all. Most of the code written in Pure Python doesn't run as-is.
Could I get some assistance with setting up the following:
Push errors to a log file under a designated folder sitting under a data lake storage account/container/folder hierarchy".
Catching REST specific exceptions.
This is a sample of what I have written:
''''
LogFilepath = "abfss://raw#.dfs.core.windows.net/Data/logging/data.log"
#LogFilepath2 = "adl://.azuredatalakestore.net/raw/Data/logging/data.log"
print(LogFilepath)
try:
1/0
except Exception as e:
print('My Error...' + str(e))
with open(LogFilepath, "a") as f:
f.write("An error occured: {}\n".format(e))
''''
I have tried it both ABFSS and ADL file paths with no luck. The log file is already available in the storage account/container/folder.
I have reproduced the above using abfss path in with open() function but it gave me the below error.
FileNotFoundError: [Errno 2] No such file or directory: 'abfss://synapsedata#rakeshgen2.dfs.core.windows.net/datalogs.logs'
As per this Documentation
we can use open() on ADLS file with a path like /synfs/{jobId}/mountpoint/{filename}.
For that, first we need to mount the ADLS.
Here I have mounted it using ADLS linked service. you can mount either by Storage account access key or SAS as per your requirement.
mssparkutils.fs.mount(
"abfss://<container_name>#<storage_account_name>.dfs.core.windows.net",
"/mountpoint",
{"linkedService":"<ADLS linked service name>"}
)
Now use the below code to achieve your requirement.
from datetime import datetime
currentDateAndTime = datetime.now()
jobid=mssparkutils.env.getJobId()
LogFilepath='/synfs/'+jobid+'/synapsedata/datalogs.log'
print(LogFilepath)
try:
1/0
except Exception as e:
print('My Error...' + str(e))
with open(LogFilepath, "a") as f:
f.write("Time : {}- Error : {}\n".format(currentDateAndTime,e))
Here I am writing date time along with the error and there is no need to create the log file first. The above code will create and append the error.
If you want to generate the logs daily, you can generate date file names log files as per your requirement.
My Execution:
Here I have executed 2 times.

Error: `path` does not exist: ‘MIS_655_RS_T3_Wholesale_Customers’

I imported my excel file into R Environment and saved the path by creating a new file in R scrip. However, when I tried to check my directory and load the dataset, I received the following message " Error: path does not exist: ‘MIS_655_RS_T3_Wholesale_Customers’
What am I doing wrong here?
Thanks
Have you missed the format of your dataset, eg. csv, xlsx.
I suggest you first set your file as working directory, then the following code might help you with it.
Dat_customers <- readxl::read_excel("MIS_655_RS_T3_Wholesale_Customers.xlsx")

citrus waitFor().file fails to read a file

I’m trying to use waitFor() in my Citrustest to wait for an output file on disk to be written by the process I’m testing. I’ve used this code
outputFile = new File “/esbfiles/blesbt/bl03orders.99160221.14289.xml");
waitFor().file(outputFile).seconds(65L).interval(1000L);
after a few seconds, the file appears in the folder as expected. The user I’m running the test code as has permissions to read the file. The waitFor(), however, ends in a timeout.
09:46:44 09:46:44,818 DEBUG dition.FileCondition| Checking file path '/esbfiles/blesbt/bl03orders.99160221.14289.xml'
09:46:44 09:46:44,818 WARN dition.FileCondition| Failed to access file resource 'class path resource [esbfiles/blesbt/bl03orders.99160221.14289.xml] cannot be resolved to URL because it does not exist'
What could be the problem? Can’t I check for files outside the classpath?
This is actually a bug in Citrus. Citrus is working with the file path instead of the file object and in combination with Spring's PathMatchingResourcePatternResolver this causes Citrus to search for a classpath resource instead of using the absolute file path as external file system resource.
You can fix this by providing the absolute file path instead of the file object like this:
waitFor().file(“file:/esbfiles/blesbt/bl03orders.99160221.14289.xml")
.seconds(65L)
.interval(1000L);
Issue regarding broken file object conversion has been opened: https://github.com/christophd/citrus/issues/303
Thanks for pointing to it!

Spring batch, handle failed files again, skip successfully handled

How to ideologically correct organize file handling?
I have a folder for new files (NEW), folder for old files (OLD), a folder for failed files (FAIL). New file puts in NEW, then if the handling was correct, the file goes to OLD, if the handling was failed, the file goes to ERR. Then we take this file again and correcting it and put in NEW if all ok file goes to OLD if failed goes to ERR. And repeat again and again.
I have job with constant name "fileHandlingJob", in job i have some steps: "extract", "handling", "utilize", and i have job parameters: "filePath", "fileName".
Thanks!
If you state that uniqueness criteria of file - it's file's name, then you are on right way.
If job was in state FAILED (ERR folder) then you can retrigger it with same set of parameters. If job was COMPLETED - you can't run it again. Spring batch will complain.
You can ensure this behaviour by having unique file name as Job's parameter. So no other job could be triggered with same file name. Spring batch will simply prevent this.
Second parameter filePath can be additional non-unique parameter.
JobParametersBuilder jobParametersBuilder = new JobParametersBuilder()
.addString("fileName", "myfile.xml", true)
.addDate("filePath", "C:\new\myfile.xml", false);
true/false here means whether parameter is unique or not.

unoconv fails to save in my specified directory

I am using unoconv to convert an ods spreadsheet to a csv file.
Here is the command:
unoconv -vvv --doctype=spreadsheet --format=csv --output= ~/Dropbox
/mariners_site/textFiles/expenses.csv ~/Dropbox/Aldeburgh/expenses
/expenses.ods
It saves the output file in the same directory as the source file, not in the specified directory. The error message is:
Output file: /home/richard/Dropbox/mariners_site/textFiles/expenses.csv
unoconv: UnoException during export phase:
Unable to store document to file:///home/richard/Dropbox/mariners_site
/textFiles/expenses.csv (ErrCode 19468)
I'm sure that this worked initially, but it has since stopped.
I have checked for permissions and they are identical for both directories.
I translated ErrCode 19468 for you and it boils down to meaning ERRCODE_SFX_DOCUMENTREADONLY.
You can find more information about the specific meaning of LibreOffice ErrCode numbers from the unoconv documentation at: https://github.com/dagwieers/unoconv/blob/master/doc/errcode.adoc
The clue here is that you have a whitespace-character between --output= and the filename (--output= ~/Dropbox
/mariners_site/textFiles/expenses.csv) and because of that unoconv gets an empty output value (which means the current directory) and is given 2 files. And that explains why you get this specific error IMO