I am trying the tWaitForFile component in Talend to watch for newly created files. It seems to work for a local directory (I am using Windows 7).
However, when I point it to a shared folder like //ps1.remotemachine.com/Continents/Africa, it doesn't work. It doesn't give me file creation signals the way it does for a local directory.
Am I missing something?
Update:
In my testing so far, these are my observations for monitoring files on a network path:
Talend tWaitForFile - Inconsistent results. It only gives a notification sometimes; most of the time it doesn't.
Java NIO WatchService - I tried this as a solution outside Talend. It does give notifications for files created on a network path. However, when too many folders on the network path are monitored, it starts missing events for some of them. In my case, around 100 folders were being monitored.
Hence, I abandoned both approaches and am sticking with scheduler-based runs of Talend jobs.
Use
"\\\\ps1.remotemachine.com/Continents/Africa"
If the value comes from a context, you don't need to double the "\".
And in the tWaitForFile
I have a cluster of machines hosting Hadoop (MapR) and have installed StreamSets on one of the nodes (say node002) following the RPM documentation. However, I am accessing the web UI for the Data Collector from another node, node001.
My question is: when I specify file paths (e.g. an origin directory), which file system is the web UI referring to? For example, if I set the origin directory to /home/myuser/mydata, will the pipeline created in the web UI look for that directory on node001 or node002? I am new to StreamSets, so a more detailed answer would be appreciated. Thanks.
** Ultimately I am asking this because I am currently getting "FileNotFound" and "permission denied" errors while trying to follow the documentation's tutorial and am trying to debug the situation.
From the StreamSets community forums: It will be the path to the local file on the machine running that particular SDC instance.
The FileNotFound and permission errors come from the fact that the sdc service runs by default as a user called sdc. I am still working out a proper fix, but I can produce a workable prototype by opening read and write access on the directories in question to everyone. That is only a workaround, but it answers the posted question.
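A quick way to narrow down such errors is to check what the service account can actually see. The following is only a diagnostic sketch, not part of StreamSets: `diagnose_path` is a hypothetical helper, and it reports access for whichever user runs the script, so it would need to be run on the node hosting SDC as the sdc user (for example through `sudo -u sdc`) to be meaningful.

```python
import os


def diagnose_path(path):
    """Report whether the current user can see and use `path`.

    All checks apply to the user running this script, so run it as the
    service account whose access you want to verify.
    """
    return {
        "exists": os.path.exists(path),
        "is_dir": os.path.isdir(path),
        "readable": os.access(path, os.R_OK),
        # Listing a directory needs both read and execute permission.
        "listable": os.access(path, os.R_OK | os.X_OK),
        "writable": os.access(path, os.W_OK),
    }


if __name__ == "__main__":
    # The path from the question; replace it with the origin directory in use.
    for check, result in diagnose_path("/home/myuser/mydata").items():
        print(f"{check}: {result}")
```

If `exists` is False on the SDC node, the directory lives on a different machine; if it exists but is not listable, the problem is permissions on the directory or one of its parents.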
I have a Talend job that searches a directory and then uploads it to our database.
It's something like this: dbconnection>twaitforfile>tfilelist>fileschema>tmap>db
I have an OnSubjobOk trigger that then commits the data into the table, iterates through the directory, and moves the files to another folder.
Recently I was instructed to change the directory to a shared network path using the same components as before (I originally thought of changing components to tFTPFileList, etc.).
My question is how to point it at the shared network path. I was able to get it to run using double \, but it won't read any of the new files arriving.
Thanks!
I suppose that when you use tWaitForFile on the local filesystem, Talend/Java hooks into the folder somehow and gets a message when a new file is put into it.
Now, since you are on a network drive, first of all this is out of reach of the component. Second, the OS behind the network drive could be different.
I understand your job is running all the time, listening. You could change the behaviour by putting a tLoop first, which would check the file system for new files and then proceed. There must be some delta check to recognize which files are new.
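The loop-plus-delta-check idea can be sketched outside Talend as plain polling. This is only an illustration of the technique, not what tLoop or tWaitForFile do internally; `list_files`, `new_files_since` and `watch` are hypothetical names, and the delta check is a simple set difference between two directory listings.

```python
import os
import time


def list_files(directory):
    """Return the set of regular file names currently in `directory`."""
    with os.scandir(directory) as entries:
        return {entry.name for entry in entries if entry.is_file()}


def new_files_since(directory, seen):
    """Delta check: return (sorted new names, full current listing)."""
    current = list_files(directory)
    return sorted(current - seen), current


def watch(directory, on_new_file, interval_seconds=5.0, max_polls=None):
    """Poll `directory` and call `on_new_file(path)` for each new file.

    `max_polls=None` polls forever; a number stops after that many polls.
    """
    seen = list_files(directory)  # files already present are not reported
    polls = 0
    while max_polls is None or polls < max_polls:
        time.sleep(interval_seconds)
        added, seen = new_files_since(directory, seen)
        for name in added:
            on_new_file(os.path.join(directory, name))
        polls += 1
```

Because it only lists the directory, this works the same on a local folder and on a share, at the cost of a delay of up to one polling interval. A file is reported as soon as its name appears, so it may still be in the middle of being written.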
I have two drives, A and B. A Python script creates some files on drive A, and I am running a PowerShell script that copies all the files from drive A to drive B at 1-second intervals.
I am getting this error in PowerShell:
2015/03/10 23:55:35 ERROR 32 (0x00000020) Time-Stamping Destination
File \x.x.x.x\share1\source\ Dummy_100.txt The process cannot access
the file because it is being used by another process. Waiting 30
seconds...
How will I overcome this error?
This happens because the file is locked by a running process. To fix it, download Process Explorer, then use Find > Find Handle or DLL to find out which process has locked the file. Use 'taskkill' on the command line to kill that process, and the copy should go through.
If you want to skip such files instead, you can use /r:n, where n is the number of retries.
For example, /w:3 /r:5 will retry up to 5 times, waiting 3 seconds between attempts.
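Since the files here are produced by a Python script, the same retry-then-skip idea can be sketched in Python. This is an approximation of what the /r and /w switches do, not robocopy's implementation; `copy_with_retries` and its parameters are hypothetical names, and the `copy` and `sleep` arguments exist only so the behaviour can be exercised without a real locked file.

```python
import shutil
import time


def copy_with_retries(source, destination, retries=5, wait_seconds=3.0,
                      copy=shutil.copy2, sleep=time.sleep):
    """Copy `source` to `destination`, retrying when the OS refuses access.

    Makes one initial attempt plus up to `retries` retries, sleeping
    `wait_seconds` between attempts. Returns True on success, or False
    if the file was skipped after the last retry.
    """
    for attempt in range(retries + 1):
        try:
            copy(source, destination)
            return True
        except OSError:  # includes PermissionError for a locked file
            if attempt < retries:
                sleep(wait_seconds)
    return False
```

A call such as `copy_with_retries(src, dst, retries=5, wait_seconds=3)` mirrors `/r:5 /w:3`: the file is skipped only after the initial attempt and five retries have all failed.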
If backup is what you have in mind and you frequently run into in-use files, look into Volume Shadow Copy (VSS), which allows copying files even while they are 'in use'. It's not a product but a Windows technology used by various backup tools.
Sadly, it's not built into robocopy, but it can be used in conjunction with it. See
➝ https://superuser.com/a/602833/75914
and especially:
➝ https://github.com/candera/shadowspawn
There could be many reasons.
In my case, I was running a CMD script to copy a heap of SQL Server backups and transaction logs from one server to another. I had the same problem, apparently because it was trying to write to a log file that was supposedly opened by another process. It was not.
I ran so many IP checks and process ID checkers that I ran out of ideas about what was hogging the log file. Event Viewer said nothing.
I found out it was not even the log file that was being locked. I was able to delete it by logging into the server as a normal user with no admin privileges!
It was the backup files themselves, which were locked by the SQL Server Agent. Like @Oseack said, another tool may have been needed while the backup files themselves were still being used or locked by the SQL Server Agent.
The way I got around it was to force ROBOCOPY to wait.
/W:5
did it.
I finally managed to automate our release process using Desired State Configuration with the Azure PowerShell SDK methods, in particular the Publish-AzureVMDscConfiguration -> Set-AzureVMDscExtension -> Update-AzureVM combo.
After thinking for a while in a way to send my build outputs somewhere accessible by the VM, I ended up with the strategy of appending my build drops in the configuration package that gets uploaded to Azure Storage.
My problem now is that as soon as the PowerShell DSC Extension in the VM starts downloading that package, its memory consumption goes through the roof. When I open Task Manager, I can see the newly created PowerShell process going from 30 or so megabytes to 300, and then to 1.3GB, completely ruining my VM.
Yesterday afternoon, I left work and left it processing, but when I logged into the VM today, the inner zip file containing my build outputs had 0 bytes in the DSCWork folder. Even if it worked in the end, it takes a very long time and makes my VM useless. I can't even switch between windows in remote access, since the machine is completely stuck at 100% RAM usage.
Why is PowerShell taking so much memory and time to download my configuration package? It is only 60MB zipped, and roughly 200MB unzipped. Is there something I can do to prevent that from happening?
UPDATE:
I tested it just now and it finally finished correctly. It took more than a full hour, but the files are there. This is unacceptable, though.
This issue should be resolved in the next iteration of the extension. In the meantime, you may want to consider uploading your build content to a blob separate from your configuration ZIP package (you can use Set-AzureStorageBlobContent for this).
Then you can use either the remote file or script resources in your original configuration to download the blob. Be sure to add the appropriate dependencies in your configuration so that the blob gets downloaded before you use it.
configuration DownloadSample
{
    Import-DscResource -Module xPSDesiredStateConfiguration

    xRemoteFile Download
    {
        Uri = 'https://....blob.core.windows.net/windows-powershell-dsc/foo.zip?sv=...'
        DestinationPath = 'd:\tmp\download.zip'
    }
}
We have an IIS .NET application deployed across several machines. We use IIS log information to report on the performance of the web application and on user navigation. Currently the reporting is only required infrequently (once a day, for the previous day), so we just roll the logs every 24 hours and move the old logs to our reporting server.
We have a new requirement that means we need much faster turnaround on the IIS log information, say every minute for the sake of the discussion.
There exist Apache tools like Facebook's Scribe to scalably move Apache web server logs across a network of servers.
Are there any similar tools available for IIS?
Is this the right question to ask?
Should we be doing something different, if the timing requirements have changed so much?
I've looked at this question and the answers, and the only one that seems to come close is this one.
Pointers appreciated!
Snare is a little old but worth mentioning.
Snare Agent for IIS Servers
http://www.intersectalliance.com/projects/SnareIIS/index.html
I used this old version a long time ago and it worked well by forwarding/sending/replicating IIS logs over a network via syslog.
Today, they have a newer version called Snare Epilog
http://www.intersectalliance.com/projects/EpilogWindows/index.html
The code is also open source; perhaps you might find it useful.
You might also want to try ...
http://nxlog.org
http://www.syslogserver.com/syslogagent.html
I tend to write a .bat file in conjunction with Log Parser 2.2. The .bat file determines the appropriate file dates and pulls the corresponding logs from multiple IIS server log locations into a single local directory. Once the files are across, I run a Log Parser command to query the log content over all log files and produce a single output file in .csv format. Finally, I run an SSIS job to import the new .csv file into a running log table, which I can then query on an ongoing basis.
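The merge step of that workflow can be sketched in Python for readers without Log Parser. This is a stand-in under stated assumptions, not the .bat/Log Parser setup described above: it assumes the logs are in the W3C extended format with a `#Fields:` header line, and `read_w3c_log`, `merge_logs_to_csv` and the extra `source_file` column are hypothetical names introduced for the example.

```python
import csv
import glob
import os


def read_w3c_log(path):
    """Yield one dict per request line of a W3C extended log file."""
    fields = []
    with open(path, encoding="utf-8", errors="replace") as handle:
        for line in handle:
            line = line.rstrip("\r\n")
            if line.startswith("#Fields:"):
                # The header names the space-separated columns that follow.
                fields = line[len("#Fields:"):].split()
            elif line and not line.startswith("#") and fields:
                yield dict(zip(fields, line.split()))


def merge_logs_to_csv(log_dirs, output_path, pattern="*.log"):
    """Merge every matching log under `log_dirs` into one CSV file.

    Returns the number of request rows written.
    """
    rows = []
    columns = []
    for log_dir in log_dirs:
        for path in sorted(glob.glob(os.path.join(log_dir, pattern))):
            for row in read_w3c_log(path):
                row["source_file"] = path  # remember which server log it came from
                rows.append(row)
                for name in row:
                    if name not in columns:
                        columns.append(name)
    with open(output_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.DictWriter(out, fieldnames=columns, restval="")
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

The sketch holds all rows in memory so that servers logging different field sets still end up under one header; for large logs a streaming variant with a fixed column list would be a better fit.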