Is there a way for Apama to read files line by line?

I'm new to Apama. I see that a com.apama.file lib exists, but I am unsure how to actually use it to read a file. I want to send each line as an event to be parsed and then, depending on its contents, sent on as a different event from there. Googling suggests that I'd need a transport (not sure what that is either) to do so, but my project lead is under the impression that this can all be done in Apama EPL. How true is this, and if it has some validity, how can I go about achieving it?

Yes, this is certainly possible. To help you do it, though, please can you provide a little more information about your setup? For example, what is the file type and is the file local to where the correlator will be running? Will there only be one file to process at a time? How large is the file, and are there any specific performance requirements?
You may find this helpful:
https://github.com/SoftwareAG/apama-streaming-analytics-connectivity-FileTransport

You don't quite say what you are trying to achieve, but if you are new to Apama then I will say that this is not something that is done frequently, especially in simpler solutions when you are just starting.
Depending on what you are trying to achieve, are you aware of the "engine_send" tool and the ability to use it to send in a text file of Apama events (normally a .evt file), with batch tags if you want to spread them over time?
http://www.apamacommunity.com/documents/10.5.3.0/apama_10.5.3.0_webhelp/apama-webhelp/apama-webhelp/re-DepAndManApaApp_sending_events_to_correlators.html
http://www.apamacommunity.com/documents/10.5.3.0/apama_10.5.3.0_webhelp/apama-webhelp/apama-webhelp/co-DepAndManApaApp_event_file_format.html
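For illustration, an event file with batch tags might look something like the following. The event type com.mycorp.Line is a hypothetical example, and the exact BATCH semantics are described in the event file format page above:

```
// hypothetical event type: com.mycorp.Line(string text)
BATCH 0
com.mycorp.Line("first line of the file")
com.mycorp.Line("second line of the file")
BATCH 500
com.mycorp.Line("a line injected later in the batch schedule")
```

You would then inject it with something like engine_send myfile.evt against a running correlator.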

Related

Sockets to read/watch file?

This was prompted by this question: OS.File check last modified date before OS.read
I read that it might be wiser to use sockets to read files?
In my addon, every time the user clicks the PanelUI-popup button at top right (the new one in Australis), my add-on does an OS.File.read on the profiles.ini file to look for any changes. I don't even do an OS.File.stat, like in the topic I linked above. And there are absolutely no performance issues from what I'm seeing. I have a computer from 2004, a Pentium 4, so I would notice any performance issues visually.
But I was waiting for a file watcher service, which is in the works right now at Bugzilla. Then I thought: what are sockets? I searched SO, but it didn't yield anything I understood; the results all seem to be about opening connections to the internet, not to local files. (https://stackoverflow.com/search?q=[firefox-addon]+sockets)
Can sockets be used to watch a file for changes?
I read that it might be wiser to use sockets to read files?
No, I wrote that it might be wiser to use something like sockets for inter-process-communication (IPC) instead of files, to avoid disk I/O and polling in the first place. (I mentioned sockets for IPC in particular, because Firefox comes with a reasonably easy-to-use, cross-platform sockets API accessible from Javascript; still: nothing to do with files).
Since you're after the contents of a particular file (profiles.ini) and not after IPC, you'll have to actually read that file.

Perl Website with Dancer2 - how can I log user activity, history, etc?

We have a perl web interface that I am currently working on to slowly convert to using Dancer 2 and PSGI instead of our slow old plain vanilla CGI model.
In our old model, we stored everything in sessions -- the history of what the users did, the call stacks, the data inputs... you get the idea.
We do not want to do it that way anymore so that we can keep the sessions small and efficient. BUT, we'd still like to log just what the users have been doing (that way when an error gets reported we can see what they did to get to the error, what input(s) they put in, etc).
I looked at the Logging documentation for Dancer2, but this doesn't seem to quite get to what we need; it would only record Dancer2 messages plus whatever other messages I put in.
This one that I found, Dancer2::Logger, doesn't seem to quite cut it either.
What other libraries could I use to do what I need? I seriously doubt that Perl does NOT have something that does this, so...
Just off the top of my head, I can think of Log::Log4perl and Log::Dispatch, though there are myriad others.
You can use them to establish your own log files, separate from Dancer's log.
As for the best way: most logging interfaces have the same API for logging, but differ in run-time instantiation and configuration syntax. So read the docs on a few of them and maybe try a couple on for size.

Synchronize Directory of Files Between Server and iOS Application

I am building an internal iOS application (so - it won't ever be in the app store), and I need to keep a directory of content synchronized between a server and each of the instances of the iOS application. This would be easy enough if I just wanted to delete and re-download this content each time, but I would rather use something similar to rsync to only download the elements that have changed.
I haven't found any good way to utilize rsync. I considered Objective-Git as a possibility here, but at a quick glance it looks like a lot of the support for remote repositories isn't implemented yet.
As a final note: while this won't be in the app store, I will not be jailbreaking these devices, and I would prefer not to rely on any private APIs (although if there were an elegant solution that utilized private APIs, I might consider it).
Thoughts?
ADDITIONAL NOTE: This needs to be an isolated solution. I won't be relying on outside services (like Dropbox, Box.net, etc...). This needs to work solely between the device and the server (which is on a local network with the device).
Use HTTP to list the contents of each folder on the server.
Compare last modification time of each file with those on the device, and identify added/removed files.
Get added and modified files, remove deleted files.
It sounds like you're maybe asking for a library that already does this, but if you don't find one, it's moderately easy to write this from the ground up using stat(2) on the server and the same, or a higher-level equivalent, on the iOS devices. Have the iPhone send a tree of files with their modification dates to the server, and get back a list of insert/delete/update operations to perform, with the URL (or whatever) for each one, so you can apply them incrementally on a background thread. Have the information from the server for new/updated files include the modification date the server has, so you can set it to be the same on the iOS device and send it back when asking the server for the status of each file (kind of a hack, using the file system to store that, but it works).
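If you do roll your own, the core of it is just building a manifest on each side and diffing. A minimal sketch in Python (the server side here; the paths and the mtime comparison rule are assumptions):

```python
# Build a path -> mtime manifest for a directory tree, then diff the server's
# manifest against the one the device sent to get add/update/delete operations.
import os

def build_manifest(root):
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            manifest[rel] = os.stat(full).st_mtime  # stat(2) under the hood
    return manifest

def diff(server_manifest, device_manifest):
    ops = []
    for path, mtime in server_manifest.items():
        if path not in device_manifest:
            ops.append(("add", path))
        elif device_manifest[path] < mtime:
            ops.append(("update", path))
    for path in device_manifest:
        if path not in server_manifest:
            ops.append(("delete", path))
    return ops
```

The device applies the returned operations incrementally, setting each file's modification time to the server's value as described above.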
Why not just set up a RESTful interface and do it across HTTP; that way you could query the modification times easily enough to determine whether client or server files need to be updated. You might also want to keep track of what files on the client have been synced, so you can easily know which files to add or delete. This can be done with a simple .sync file or using a plist / sqlite / etc.
If you'll consider FTP, there are some pretty advanced client libraries available.
For example, the iOS Chilkat bundle includes an FTP client library that supports synchronization in both directions. It's not free, but it's pretty cheap -- and you get a ton of other stuff that will likely prove useful someday. Here's an example of iOS pulling down all additions and changes (mode 2):
http://www.example-code.com/ios/ftp_syncLocalTree.asp
One caveat -- judging solely from the example, it doesn't appear to synchronize deletions. If this is a requirement, you could do it yourself without too much effort immediately following a sync.
acrosync (see https://acrosync.com/library.html) seems like a good fit given the initial question; however, I haven't used it myself yet.

importing updated files into a database

I have files that are updated every 2 hours. I have to detect the files automatically and insert the extracted information from them into a database.
Our DBMS is PostgreSQL and the programming language is Python. How would you suggest I do that?
I want to make use of a DAL (Database Abstraction Layer) to make the connection between the files and the database, and use PostgreSQL's LISTEN/NOTIFY technique to detect the new files. If you agree with me, please tell me how I can use the LISTEN/NOTIFY functions to detect the files.
Thank you
What you need is to write a script that stays running as a dæmon, using a file system notify API to run a callback function when the files change. When the script is notified that the files change it should connect to PostgreSQL and do the required work, then go back to sleep waiting for the next change.
The only truly cross-platform way to watch a directory for changes is to use a delay loop to poll os.listdir and os.stat, checking for new files and updated modification times. This wastes power and disk I/O, and it also gets slow for big sets of files. If your OS reliably changes the directory's modification time whenever files within it change, you can just os.stat the directory in the delay loop instead, which helps.
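A minimal sketch of that polling loop in Python (the directory path and interval are placeholders):

```python
# Poll a directory for new or modified files by comparing mtimes between scans.
import os
import time

def poll_directory(path, interval=5.0):
    seen = {}  # filename -> last observed modification time
    while True:
        for name in os.listdir(path):
            full = os.path.join(path, name)
            try:
                mtime = os.stat(full).st_mtime
            except FileNotFoundError:
                continue  # file vanished between listdir and stat
            if seen.get(name) != mtime:
                seen[name] = mtime
                yield full  # new or changed file
        time.sleep(interval)

for changed in poll_directory("/data/incoming"):
    print("process", changed)
```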
It's much better to use an operating system specific notification API. Were you using Java I'd tell you to use the NIO2 watch service, which handles all the platform specifics for you. It looks like Watchdog may offer something similar for Python, but I haven't needed to do directory change notification in my Python coding so I haven't tested it. If it doesn't work out you can use platform-specific techniques like inotify/dnotify for Linux, and the various watcher APIs for Windows.
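Untested, as I said, but going by Watchdog's documented API, a daemon along these lines should be close. The directory, database name, table, and per-line parsing are all placeholders for illustration:

```python
# pip install watchdog psycopg2
import time
import psycopg2
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

class DataFileHandler(FileSystemEventHandler):
    def on_created(self, event):
        if not event.is_directory:
            self.load_into_db(event.src_path)

    # Treat updates like new files; note on_modified can fire while the
    # file is still being written, so real code may need a settle delay.
    on_modified = on_created

    def load_into_db(self, path):
        # Connection details and schema are assumptions; adapt to your setup.
        conn = psycopg2.connect(dbname="mydb")
        try:
            with conn, conn.cursor() as cur:
                with open(path) as f:
                    for line in f:
                        cur.execute("INSERT INTO readings (raw_line) VALUES (%s)",
                                    (line.rstrip("\n"),))
        finally:
            conn.close()

if __name__ == "__main__":
    observer = Observer()
    observer.schedule(DataFileHandler(), path="/data/incoming", recursive=False)
    observer.start()
    try:
        while True:
            time.sleep(1)  # watchdog delivers events on its own threads
    except KeyboardInterrupt:
        observer.stop()
    observer.join()
```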
See also:
How do I watch a file for changes?
Python daemon to watch a folder and update a database
You can't use LISTEN/NOTIFY because that can only send messages from within the database and your files obviously aren't in there.
You'll want to have your Python script scan the directory the files are in and check their modification time (mtime). If they have been updated, you'll need to read in the files, parse the data, and insert it into the db. Without knowing the format of the files, there's no way to be more specific.

Detect a file in transit?

I'm writing an application that monitors a directory for new input files by polling the directory every few seconds. New files may often be several megabytes, and so take some time to fully arrive in the input directory (eg: on copy from a remote share).
Is there a simple way to detect whether a file is currently in the process of being copied? Ideally any method would be platform and filesystem agnostic, but failing that specific strategies might be required for different platforms.
I've already considered taking two directory listings separated by a few seconds and comparing file sizes, but this introduces a time/reliability trade-off that my superiors aren't happy with unless there is no alternative.
For background, the application is being written as a set of Matlab M-files, so no JRE/CLR tricks I'm afraid...
Edit: files are arriving in the input directory by a straight move/copy operation, either from a network drive or from another location on a local filesystem. This copy operation will probably be initiated by a human user rather than another application.
As a result, it's pretty difficult to place any responsibility on the file provider to add control files or use an intermediate staging area...
Conclusion: it seems like there's no easy way to do this, so I've settled for a belt-and-braces approach - a file is ready for processing if:
its size doesn't change in a certain period of time, and
it's possible to open the file in read-only mode (some copying processes place a lock on the file).
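In Python, that check might look something like this (written in Python for illustration; the original app is Matlab, and the stability window is arbitrary):

```python
import os
import time

def is_ready(path, stable_seconds=5.0):
    """True if the file's size is stable and it can be opened read-only."""
    try:
        size_before = os.path.getsize(path)
        time.sleep(stable_seconds)
        if os.path.getsize(path) != size_before:
            return False  # still growing; a copy is probably in progress
        with open(path, "rb"):
            pass  # fails if a copying process holds an exclusive lock
        return True
    except OSError:
        return False
```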
Thanks to everyone for their responses!
The safest method is to have the application(s) that put files in the directory first put them in a different, temporary directory, and then move them to the real one (which should be an atomic operation even when using FTP or file shares). You could also use naming conventions to achieve the same result within one directory.
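For what it's worth, the pattern is trivial for any uploader that can be changed. A sketch in Python (the paths are placeholders, and both directories must be on the same filesystem for the rename to be atomic):

```python
import os
import shutil

def deliver(src, staging_dir, incoming_dir):
    name = os.path.basename(src)
    tmp = os.path.join(staging_dir, name)
    shutil.copy2(src, tmp)  # the slow, interruptible part happens here
    os.replace(tmp, os.path.join(incoming_dir, name))  # one atomic rename
```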
Edit:
It really depends on the filesystem, and on whether its copy functionality even has the concept of a "completed file". I don't know the SMB protocol well, but if it has that concept, you could write an app that exposes an SMB interface (or patch Samba) with an API to get notified of completed file copies. Probably a lot of work, though.
This is a middleware problem as old as the hills, and the short answer is: no.
The two 'solutions' put the onus on the file uploader: (1) upload the file to a staging directory and then move it into the destination directory; (2) upload the file, and then create/upload a 'ready' file that indicates the state of the content file.
The first one is better, but both are inelegant. The truth is that better communication media exist than the filesystem. Consider using some IPC that involves only a push or a pull (and not both, as the filesystem does), such as an HTTP POST, or a JMS or MSMQ queue. Furthermore, this can also be synchronous, allowing the process receiving the file to acknowledge the content, even check it for worthiness, and hand the client a receipt; this is the righteous road to non-repudiation. Follow this, and you will never suffer arguments over whether a file was or was not delivered to your server for processing.
M.
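As a sketch of the push-based idea (the URL and whatever receipt the server returns are assumptions for illustration):

```python
# POST the file over HTTP and get an acknowledgement back, so delivery is
# confirmed by the receiver rather than inferred from the filesystem.
import requests  # third-party: pip install requests

with open("report.csv", "rb") as f:
    resp = requests.post("http://example.com/inbox/report.csv", data=f, timeout=30)
resp.raise_for_status()
print("server receipt:", resp.text)
```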
One simple possibility would be to poll at a fairly large interval (2 to 5 minutes) and only acknowledge the new file the second time you see it.
I don't know of a way in any OS to determine whether a file is still being copied, other than maybe checking if the file is locked.
How are the files getting there? Can you set an attribute on them as they are written and then change the attribute when write is complete? This would need to be done by the thing doing the writing ... which sounds like it isn't an option.
Otherwise, caching the listing and treating a file as new if it has the same file size for two consecutive listings is the best way I can think of.
Alternatively, you could use the modified time on the file - the file has to be new and have a modified time that is at least x in the past. But I think this will be about equivalent to caching the listing.
If you are polling the folder every few seconds, it's not much of a time penalty, is it? And it's platform agnostic.
Also, Linux only: http://www.linux.com/feature/144666
It's like cron, but for files. Not sure how it deals with your specific problem, but it may be of use.
What is your OS? On Unix you can use the "lsof" utility to determine if a user has the file open for writing. Apparently the MS Windows Process Explorer has the same functionality somewhere.
Alternatively, you could just try an exclusive open on the file and bail out if this fails. But this can be a little unreliable, and it's easy to tread on your own toes.