Looping with tFileList - Talend

I'm trying to loop over a directory to get image files. My job is as follows:
Talend tFileList Job
The problem is that in my Oracle table, I find only one row (containing the last file in my folder).
The hierarchy of my files is as follows:
MainRepo
  SubRepo1
    - File1-1
    - File1-2
    - File1-3
    - File1-4
  SubRepo2
    - File2-1
    - File2-2
    - File2-3
    - File2-4
In my case, only File2-4 is persisted into Oracle DB.
Yours,
Amine

Related

Synapse Spark exception handling - Can't write to log file

I have written PySpark code to hit a REST API, extract the contents in XML format, and later write them to Parquet in a data lake container.
I am trying to add logging functionality where I write out not only errors but also updates on the actions/processes we execute.
I am comparatively new to Spark and have been relying on online articles and samples. They all explain error handling and logging through "1/0" examples and save the logs in the default folder structure (not in an ADLS account/container/folder), which does not help at all. Most of the code written in pure Python doesn't run as-is.
Could I get some assistance with setting up the following:
Pushing errors to a log file under a designated folder sitting under a data lake storage account/container/folder hierarchy.
Catching REST-specific exceptions.
This is a sample of what I have written:
LogFilepath = "abfss://raw@.dfs.core.windows.net/Data/logging/data.log"
# LogFilepath2 = "adl://.azuredatalakestore.net/raw/Data/logging/data.log"
print(LogFilepath)

try:
    1/0
except Exception as e:
    print('My Error...' + str(e))
    with open(LogFilepath, "a") as f:
        f.write("An error occurred: {}\n".format(e))
I have tried both the ABFSS and ADL file paths with no luck. The log file is already available in the storage account/container/folder.
I have reproduced the above using the abfss path in the open() function, but it gave me the below error.
FileNotFoundError: [Errno 2] No such file or directory: 'abfss://synapsedata@rakeshgen2.dfs.core.windows.net/datalogs.logs'
As per this Documentation, we can use open() on an ADLS file with a path like /synfs/{jobId}/mountpoint/{filename}.
For that, we first need to mount the ADLS container.
Here I have mounted it using an ADLS linked service; you can mount it either with a storage account access key or with SAS, as per your requirement.
mssparkutils.fs.mount(
    "abfss://<container_name>@<storage_account_name>.dfs.core.windows.net",
    "/mountpoint",
    {"linkedService": "<ADLS linked service name>"}
)
Now use the below code to achieve your requirement.
from datetime import datetime

currentDateAndTime = datetime.now()
jobid = mssparkutils.env.getJobId()
LogFilepath = '/synfs/' + jobid + '/synapsedata/datalogs.log'
print(LogFilepath)

try:
    1/0
except Exception as e:
    print('My Error...' + str(e))
    with open(LogFilepath, "a") as f:
        f.write("Time : {} - Error : {}\n".format(currentDateAndTime, e))
Here I am writing the date and time along with the error, and there is no need to create the log file first; the above code will create the file and append the error.
If you want to generate the logs daily, you can build date-based log file names as per your requirement, as sketched below.
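For example, here is a minimal sketch of a date-stamped log file name. It assumes the same mount point and /synapsedata folder used in the code above; the datalogs_YYYY-MM-DD.log naming pattern is just an illustration you can adapt.

from datetime import datetime

# Assumption: the ADLS container is already mounted as shown above,
# so the file can be written through the /synfs/{jobId}/... path.
jobid = mssparkutils.env.getJobId()

# Build a name like datalogs_2023-01-18.log so each day gets its own file.
today = datetime.now().strftime("%Y-%m-%d")
LogFilepath = '/synfs/' + jobid + '/synapsedata/datalogs_' + today + '.log'

try:
    1/0
except Exception as e:
    # Append mode creates the file on first write and appends afterwards.
    with open(LogFilepath, "a") as f:
        f.write("Time : {} - Error : {}\n".format(datetime.now(), e))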
My execution:
Here I have executed it 2 times.
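The question also asked about catching REST-specific exceptions. Assuming the API calls are made with the requests library (the original post does not show how the REST call is issued), a minimal sketch that logs both progress updates and REST failures to the mounted path could look like this; the URL is a placeholder:

import requests
from datetime import datetime

jobid = mssparkutils.env.getJobId()
LogFilepath = '/synfs/' + jobid + '/synapsedata/datalogs.log'

def log_message(message):
    # Append both errors and progress updates to the same mounted log file.
    with open(LogFilepath, "a") as f:
        f.write("Time : {} - {}\n".format(datetime.now(), message))

url = "https://example.com/api/data"  # hypothetical endpoint
try:
    log_message("Calling REST API: " + url)
    response = requests.get(url, timeout=30)
    response.raise_for_status()  # raises HTTPError for 4xx/5xx responses
    log_message("REST call succeeded with status {}".format(response.status_code))
except requests.exceptions.Timeout as e:
    log_message("REST call timed out: {}".format(e))
except requests.exceptions.HTTPError as e:
    log_message("REST call returned an error status: {}".format(e))
except requests.exceptions.RequestException as e:
    log_message("REST call failed: {}".format(e))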

Talend - how to configure tFileInputDelimited to not throw an error when the file is not found

Good day,
I am using tFileInputDelimited in Talend Data Studio to read a txt file and get some values inside.
The input file name is something like the following; it contains the day in the file name:
checksum_150123.txt
This file is only created in the last few steps before the job ends, so at first it does not exist.
Thus, on the first run of the job each day, the file does not exist, and tFileInputDelimited throws a file-not-found error.
C:\LandingZone\jx\checksum_180123.txt (The system cannot find the file specified)
[ERROR] 14:13:35 my_track.my_precheck_registration_0_1.DL_PRECHECK_REGISTRATION- CollectCheckSum_1_tFileInputDelimited_1 - C:\LandingZone\jx\checksum_180123.txt (The system cannot find the file specified)
I have a requirement not to show this error; may I know how I can configure this?
For that, I recommend using the tFileExist component and then using its EXISTS variable (for example ((Boolean)globalMap.get("tFileExist_1_EXISTS"))) in a Run if trigger.
Hope this answers your question.

Azure Data Factory - Event based triggers on multiple files/blobs

I am invoking an ADF V2 pipeline via an event-based trigger when new files/blobs are created in a folder within a blob container.
Blob Container structure:
BlobContainer
  FolderName
    File1.csv
    File2.csv
    File3.csv
I've created the trigger with the below configuration:
Container name: BlobContainer
Blob path begins with: FolderName/
Blob path ends with: .csv
Event checked: Blob Created
Trigger Screenshot
Problem: Three csv files are created in the folder on an ad hoc basis. The trigger that invokes the pipeline runs 3 times (probably because 3 blobs are created). The pipeline actually moves the files to another blob container, so the 1st trigger run succeeds and the remaining 2 fail because the files have already been moved. How can I configure the trigger so that it only runs once per folder even though 3 files are created within it?
Because the files are generated together, I am required to move them together into a new location using ADF.
Your blob event trigger fires the pipeline once for each file. To handle this, you can use a Lookup activity that gets the file names, then a Filter activity that filters for the required file names and exposes a FilteredItemsCount attribute that can be checked in an If Condition activity. When there is no matching file, FilteredItemsCount returns 0 and the rest of your pipeline does not run.
Summary:
Lookup Activity -> Filter Activity -> If Activity -> Your Pipeline

Replace values in multiple appsettings files stored in different artifacts during VSTS Release pipeline

I am trying to replace variables in appsettings.json and e2e-appsettings.json, which are stored in two different artifacts, in the release pipeline.
With the below configuration, ONLY appsettings.json is updated; for the second entry it gives an error:
error: NO JSON file matched with specific pattern: e2e/XXX.EndToEnd.Integration.Tests/e2e-appsettings.json
As per the information, it should be the relative path to the root. In this case I am not sure what the root should be, as there are two build artifacts.
Further, the download artifact log says:
2019-04-09T02:33:55.9132583Z Downloaded e2e/XXX.EndToEnd.Integration.Tests/e2e-appsettings.json to D:\a\r1\a\EstimationCore\e2e\XXX.EndToEnd.Integration.Tests\e2e-appsettings.json
The artifact with the other appsettings.json file, which is working fine, is a zip file. Its log shows: Downloading app/app.zip to D:\a\r1\a\EstimationCore\app\app.zip
I have already tried the following patterns, which gave the same error:
- NO JSON file matched with specific pattern: e2e/XXX.EndToEnd.Integration.Tests/e2e-appsettings.json.
- NO JSON file matched with specific pattern: **/*e2e-appsettings.json
- NO JSON file matched with specific pattern: d:\a\r1\a\EstimationCore\e2e\XXX.EndToEnd.Integration.Tests\e2e-appsettings.json.
- NO JSON file matched with specific pattern: d:\a\r1\a\EstimationCore\**\**\e2e-appsettings.json.

MongoDB - Eclipse

I need to create a server-side app that saves information to MongoDB. I'm working with Java in the Eclipse IDE and I have some problems with that.
First, I downloaded mongo-2.7.2.jar and added it to the build path (Project -> Properties -> Java Build Path -> Add JARs -> adding the mongo-2.7.2.jar file).
When I press "Run" without writing any other line except the empty class and main function, the console shows me this:
CLI (1) [java application] path date
Usage : [--bucket bucketname] action
where action is one of:
list : lists all files in the store
put filename : puts the file filename into the store
get filename1 filename2 : gets filename1 from store and sends to filename2
md5 filename : does an md5 hash on a file in the db (for testing)
I tried to put a System.out.print("indications") in the main function, but the console shows me the same output.
Another interesting fact: when I write code using MongoDB, the compiler accepts it and does not throw errors (it seems to accept the mongo-2.7.2.jar).
Second, I thought maybe I need to install a MongoDB plugin for Eclipse; should I?
Third, I saw that Maven integrates with MongoDB; is it right that I should add Maven to Eclipse to handle MongoDB?
I need help as soon as possible.
Thanks.
Sounds like it's trying to do something with GridFS but is missing a bucket name... are you using GridFS?