Talend: Using tFileList to access files from a shared network path

I have a Talend job that searches a directory and then uploads its files to our database.
It's something like this: tDBConnection > tWaitForFile > tFileList > file schema > tMap > DB
I have an OnSubjobOk link that commits the data into the table, then iterates through the directory and moves the files to another folder.
Recently I was instructed to change the directory to a shared network path while keeping the same components as before (I originally thought of switching to tFTPFileList, etc.).
My question is how to point the job at the shared network path. I was able to get it to accept the path by doubling the backslashes (\\), but it won't pick up any of the new files that arrive.
Thanks!

I suppose that when you use tWaitForFile on the local filesystem, Talend/Java hooks into the folder somehow and receives a notification when a new file is put into it.
Now that you are on a network drive, first of all this mechanism is out of the component's reach; second, the OS behind the network drive could be different.
I understand your job runs all the time, listening. You could change the approach by putting a tLoop first, which would check the file system for new files and then proceed. There must be some delta check in how the new files get recognized.
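The delta check described above can be sketched in plain Java; the class and method names here are illustrative (not Talend components), and in a real job this logic would sit inside the tLoop, e.g. in a tJavaFlex:

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class NewFileScanner {
    // Delta check: return the files in dir that are not yet in 'seen',
    // recording them so the next scan skips them.
    public static List<String> scan(File dir, Set<String> seen) {
        List<String> fresh = new ArrayList<>();
        File[] entries = dir.listFiles();
        if (entries == null) {
            return fresh; // directory missing or unreadable (e.g. share offline)
        }
        for (File f : entries) {
            if (f.isFile() && seen.add(f.getName())) {
                fresh.add(f.getName());
            }
        }
        return fresh;
    }
}
```

Each tLoop iteration would call scan with the share path and a Set kept across iterations; anything returned is a new file to hand to tFileList/tMap.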

Related

customize the location of save files (apkovl)

To be concise: I would like Alpine to save the backups made with lbu ci in a subdirectory of the bootable disk, whereas the default behavior is to put them in its root.
I have searched the internet and tried various things, but they all failed.
The linked documentation talks about the boot parameter of syslinux.conf:
A relative path, interpreted relative to the root of the alpine_dev.
This is my append line inside syslinux.conf.
This boot parameter should specify where the backups are found at startup, while where lbu ci should save them is configured in /etc/lbu/lbu.conf.
However, I don't understand how to use these variables here either, although it should be clear.

Trigger a task on file Creates/Edits/Deletes in specific folder (AWS FSx)

I have a network path from AWS FSx (auditing is already enabled in the folder's Advanced Security Settings).
I need to log file Create/Delete/Edit events on that network path from my server (Windows).
I tried to create a custom view in Windows Event Viewer with event ID 4663.
The problem is that it shows logs from other folders as well.
I want to filter only the events from my network path and trigger a Windows task based on that custom event view.
As it was a bit complicated to catch the detailed events from AWS, I changed my approach.
I created a background worker that tracks all file changes in the given folder
using the .NET Directory, File, and FileSystemWatcher classes.

Watching for files on remote shared folder using tWaitForFile

I am trying the tWaitForFile component in Talend to watch for newly created files. It seems to work for a local directory (I am using Windows 7).
However, when I point it at a shared folder like //ps1.remotemachine.com/Continents/Africa it doesn't work: it doesn't give me file-creation signals the way it does for a local directory.
Am I missing something?
Update:
In my testing so far, these are my observations for monitoring files on a network path:
Talend tWaitForFile - inconsistent results; it only raises a notification sometimes, and the majority of the time it doesn't.
Java NIO WatchService - tried this outside of Talend. It does give notifications for files created on a network path. However, when the number of folders to be monitored on the network path is large, it starts missing events for some of them. In my case there were around 100 folders to monitor.
Hence I abandoned both approaches and am sticking with scheduler-based runs of the Talend jobs.
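For reference, the WatchService approach mentioned above reduces to a sketch like this (a single directory and one poll with a timeout; the class and method names are my own):

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

public class CreateWatcher {
    // Blocks for up to timeoutMs and returns the names of files
    // created in dir during that window (empty list on timeout).
    public static List<String> awaitCreated(Path dir, long timeoutMs)
            throws IOException, InterruptedException {
        List<String> created = new ArrayList<>();
        try (WatchService ws = dir.getFileSystem().newWatchService()) {
            dir.register(ws, StandardWatchEventKinds.ENTRY_CREATE);
            WatchKey key = ws.poll(timeoutMs, TimeUnit.MILLISECONDS);
            if (key != null) {
                for (WatchEvent<?> ev : key.pollEvents()) {
                    created.add(ev.context().toString());
                }
                key.reset();
            }
        }
        return created;
    }
}
```

Against a UNC path the registration itself usually succeeds, but, as the observations above show, event delivery depends on the remote side, which is why polling ended up more reliable.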
Use
"\\\\ps1.remotemachine.com/Continents/Africa"
If you take the value from a context variable, then you don't need to double the "\".
And set that path in the tWaitForFile component.
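The doubling is a Java string-escaping issue: in generated Java source, each literal backslash is written twice, while a value read from a context at runtime already contains single backslashes. A minimal illustration (the class name is mine):

```java
public class UncPath {
    // In Java source, "\\\\" is two literal backslash characters, so at
    // runtime SHARE holds \\ps1.remotemachine.com/Continents/Africa.
    public static final String SHARE = "\\\\ps1.remotemachine.com/Continents/Africa";

    public static void main(String[] args) {
        System.out.println(SHARE);
    }
}
```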

SFTP file uploading and downloading at same time

A cronjob runs every 3 hours to download a file using SFTP. The scheduled program is written in Perl and the module used is Net::SFTP::Foreign.
Can Net::SFTP::Foreign download files that are only partially uploaded via SFTP?
If so, do we need to check the SFTP file's modified date to detect that the copy has completed?
Suppose someone is uploading a new file over SFTP and the upload/copy is still in progress. If a download is attempted at the same time, do I need to code for the possibility of fetching only part of the file?
It's not a question of the SFTP client you use; that's irrelevant. It's how the SFTP server handles the situation.
Some SFTP servers may lock the file being uploaded, preventing you from accessing it, while it is still being uploaded. But most SFTP servers, particularly the common OpenSSH SFTP server, won't lock the file.
There's no generic solution to this problem. Checking for timestamp or size changes may work for you, but it's hardly reliable.
There are some common workarounds to the problem:
Have the uploader upload a "done" file once the upload finishes. Make your program wait for the "done" file to appear.
You can have a dedicated "upload" folder and have the uploader (atomically) move the uploaded file to a "done" folder. Make your program look in the "done" folder only.
Have a file naming convention for files being uploaded (e.g. ".filepart") and have the uploader (atomically) rename the file to its final name after the upload. Make your program ignore the ".filepart" files.
See (my) article Locking files while uploading / Upload to temporary file name for example of implementing this approach.
Also, some FTP servers have this functionality built-in. For example ProFTPD with its HiddenStores directive.
A gross hack is to periodically check for file attributes (size and time) and consider the upload finished, if the attributes have not changed for some time interval.
You can also make use of the fact that some file formats have clear end-of-the-file marker (like XML or ZIP). So you know, when you download an incomplete file.
For details, see my answer to SFTP file lock mechanism.
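The naming-convention workaround from the list above comes down to a one-line filter on the download side (the class name is mine; the suffix follows the ".filepart" convention mentioned):

```java
import java.util.List;
import java.util.stream.Collectors;

public class UploadFilter {
    // Keep only the names of finished uploads: files still being
    // uploaded carry the ".filepart" suffix and are skipped.
    public static List<String> completed(List<String> names) {
        return names.stream()
                .filter(n -> !n.endsWith(".filepart"))
                .collect(Collectors.toList());
    }
}
```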
The easiest way to do that, when the upload process is also under your control, is to upload files using temporary names (for instance, foo-20170809.tgz.temp) and rename them once the upload finishes (the Net::SFTP::Foreign put method supports the atomic option, which does just that). Then, on the download side, filter out the files whose names match the temporary pattern.
Anyway, the Net::SFTP::Foreign get and rget methods can be instructed to resume a transfer by passing the option resume => 1.
Also, if you have full SSH access to the SFTP server, you could check if some other process is still writing to the file to be downloaded using fuser or some similar tool (though, note that even then, the file may be incomplete if for instance there is some network issue and the uploader needs to reconnect before resuming the transfer).
You can check the size of the file:
1. Connect to SFTP.
2. Check the file size.
3. Sleep for 5-10 seconds.
4. Check the file size again.
5. If the size did not change, download the file; if the size changed, go back to step 3.
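Those steps generalize to a small stability check. Here the LongSupplier stands in for whatever stat call your SFTP client offers (in Perl that would be a stat via Net::SFTP::Foreign); all names are illustrative:

```java
import java.util.function.LongSupplier;

public class SizeStabilityCheck {
    // Polls sizeOf until two consecutive reads match; returns true when
    // the size is stable, false after maxChecks attempts without a match.
    public static boolean waitUntilStable(LongSupplier sizeOf, long sleepMs, int maxChecks)
            throws InterruptedException {
        long last = sizeOf.getAsLong();
        for (int i = 0; i < maxChecks; i++) {
            Thread.sleep(sleepMs);
            long now = sizeOf.getAsLong();
            if (now == last) {
                return true; // size unchanged across the interval: assume done
            }
            last = now;
        }
        return false; // still growing after maxChecks polls
    }
}
```

As noted above, attribute-based checks remain a heuristic: an upload that stalls for longer than the sleep interval will look finished even though it isn't.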

Powershell script run against share on server

I'm running a PowerShell script that lives on my local PC against a file share on a server. The script lets the user choose to delete something permanently (using Remove-Item) or to send it to the Recycle Bin using this code:
[Microsoft.VisualBasic.FileIO.Filesystem]::DeleteFile($file.fullname,'OnlyErrorDialogs','SendToRecycleBin')
When run locally (either from my desktop or from the server) against a folder local to that respective location, it works fine: a file that is identified gets deleted and immediately shows up in the Recycle Bin.
However, when run from my desktop against the file share, it deletes the file, but the file doesn't show up in either the server's Recycle Bin or my local one. I've tried UNC naming and mapped-drive naming, and have come to believe this may be by design.
Is there a workaround for this?
Only files deleted from redirected folders end up in the Recycle Bin. If you want to be able to undelete files deleted across the network, you need to use a third-party utility.