Sockets to read/watch file?

This was prompted by this question: OS.File check last modified date before OS.read
I read that it might be wiser to use sockets to read files?
In my add-on, every time the user clicks the PanelUI-popup button at top right (the new one in Australis), my add-on does an OS.File.read on the profiles.ini file to look for any changes. I don't even do OS.File.stat like in the topic I linked above. And there are absolutely no performance issues from what I'm seeing. I have a computer from 2004 (a Pentium 4), so I would notice any performance issues visually.
But I was waiting for a file watcher service, which is in the works right now at Bugzilla. Then I thought: what are sockets? I searched SO, but it didn't yield anything I understood; the results all seem to be about opening connections to the internet, not to a local file. (https://stackoverflow.com/search?q=[firefox-addon]+sockets)
Can sockets be used to watch a file for changes?

I read that it might be wiser to use sockets to read files?
No, I wrote that it might be wiser to use something like sockets for inter-process communication (IPC) instead of files, to avoid disk I/O and polling in the first place. (I mentioned sockets for IPC in particular because Firefox comes with a reasonably easy-to-use, cross-platform sockets API accessible from JavaScript; still: nothing to do with files.)
Since you're after the contents of a particular file (profiles.ini) and not after IPC, you'll have to actually read that file.

Is there a way for Apama to read files line by line?

I'm new to Apama. I see that a com.apama.file lib exists, but I am unsure how to actually use it to read a file. I want to send each line as an event to be parsed and then, depending on the contents, sent on as a different event from there. Googling suggests that I'd need a transport (not sure what that is either) to do so, but my project lead is under the impression that this can all be done in Apama EPL. How true is this, and if it has some validity, how can I go about achieving it?
Yes, this is certainly possible. To help you do it, though, please can you provide a little more information about your setup? For example, what is the file type and is the file local to where the correlator will be running? Will there only be one file to process at a time? How large is the file, and are there any specific performance requirements?
You may find this helpful:
https://github.com/SoftwareAG/apama-streaming-analytics-connectivity-FileTransport
You don't say quite what you are trying to achieve, but if you are new to Apama then I will say that this is not something that is done frequently, especially in simpler solutions when you are just starting out.
Depending on what you are trying to achieve, are you aware of the "engine_send" tool and the ability to use it to send in a text file of Apama events (normally a .evt file), with batch tags if you want to spread them over time?
http://www.apamacommunity.com/documents/10.5.3.0/apama_10.5.3.0_webhelp/apama-webhelp/apama-webhelp/re-DepAndManApaApp_sending_events_to_correlators.html
http://www.apamacommunity.com/documents/10.5.3.0/apama_10.5.3.0_webhelp/apama-webhelp/apama-webhelp/co-DepAndManApaApp_event_file_format.html
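As a rough sketch of that approach, you could generate a .evt file from a plain text file and send it with engine_send. Here the event type MyLineEvent, the file names, and the lack of any quote escaping are all placeholder assumptions:
#!/bin/sh
## Sketch only: MyLineEvent and the file names are hypothetical, and
## lines containing double quotes would need escaping before use.
: > lines.evt
while IFS= read -r line; do
    printf 'MyLineEvent("%s")\n' "$line" >> lines.evt
done < input.txt
## Send the generated event file to the local correlator.
engine_send lines.evt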

Site on two different servers

I'm considering getting a web server in China to reduce site loading times for users in China. The problem is how to sync/keep the same data between the two sites. When content is edited on the main site, the changes should be propagated to the site on the Chinese server.
The server is running Linux, Apache, and MySQL. The website uses WordPress.
FYI, I'm already using a CDN and the site loading speed is still too slow from China.
Basically your solution would need to...
Copy the entire contents of your http'd directory from the main server to the Chinese server.
Copy the entire contents of your MySQL database from the main server to the Chinese server.
Perform these tasks at a regular interval without manual intervention.
I can guide you to references that will help with each task and sometimes can show you a quick example. However, if you want to get it to work and especially if you want to optimize the process, you're going to have to look through the references yourself.
If I didn't do it this way, this answer would get even more horrendously long than it already is.
Before we start you should remember...
Thing 0 - Please Try Not to be Intimidated by the Length of this Answer
I know I've written a lot, perhaps more than I should have, but I guarantee you are capable of implementing this in no more than a day. I have tried to be thorough but that does not mean that what I'm describing is particularly complicated.
Thing 1 - Shutdown your Chinese Server During Transfer
This transfer of data is going to make your Chinese server unusable while it's in progress, as you might have guessed. You need to make sure that your Chinese server is not operational during the transfer. Otherwise the server might have only partial data available, which could cause problems for both client and server, particularly in relation to MySQL.
Thing 2 - Use Compression as much as You Can
As time consuming as compression and decompression can be for large amounts of data, believe me it is nothing compared to the time you will waste sending the uncompressed data to China. Network usage, not processor time, is really going to be the limiting factor in getting the transfer done quickly. Try to send compressed files whenever possible.
Thing 3 - Try to Use Checksums
Sending all your data, particularly in compressed format, will leave it vulnerable to corruption in transit. Whenever you send a file I encourage you to use some kind of checksum on the data to verify that it has not been corrupted. For brevity I will not be showing you how to do this but I'm sure you're smart enough to figure out how to pepper in some verification.
In case you're not familiar with checksums, the Wikipedia article about them is pretty straightforward. The most commonly used are MD5 and SHA-1, but both are somewhat collision-prone. I would recommend SHA-2 (also called SHA-256/512) or the newer SHA-3.
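For instance, a minimal sketch of verifying a transferred archive with sha256sum (the file name is just an example):
#!/bin/sh
## On the sending side: record the archive's SHA-256 digest.
sha256sum copy.zip > copy.zip.sha256
## ...transfer copy.zip and copy.zip.sha256 to the other server...
## On the receiving side: verify the archive before unpacking it.
sha256sum -c copy.zip.sha256 || { echo "checksum mismatch, aborting"; exit 1; }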
Copying your Http'd Directory to the Chinese Server
As far as I know (and I could be wrong) there is no built-in way to transfer files from one Apache server to another, so you're going to have to write your own script for this.
You're also going to need to have two separate scripts: one for the main server and one for the Chinese server. Here's a breakdown of what each script needs to do.
On your main server...
Log in as your Apache server's user. (Reference for switching users.)
zip/gzip/tar.gz your http'd directory's contents. (Reference for zip. Reference for gzip. Reference for tar.)
scp (secure copy) the compressed file to your Chinese server. Make sure to copy it to the username that Apache runs under. (Reference for scp.)
Delete the compressed file.
Initiate the Chinese server's script (this will be discussed later).
You will likely be using a shell script for all of this, so I hope you're familiar with the terminal. A simple example would look like this.
#!/bin/sh
## First I'll define some variables to explain this better.
APACHE_USER="whatever your Apache server's username is (usually it's www-data)"
WWW_DIR="your http'd directory (usually it's /var/www)"
CHINA_HOST="the host name/IP address of your Chinese server"
CHINA_USER="Apache's username on the Chinese server"
CHINA_HOME="the home directory of the Apache user on your Chinese server"
## Now to the real scripting. I will be using zip for compression.
## Run this whole script as the Apache user (a bare "su" here would open
## a new shell, and the lines after it would not run as that user):
##   sudo -u "$APACHE_USER" sh this_script.sh
zip -r copy.zip "$WWW_DIR"
## scp cannot read a password from stdin; set up SSH key-based
## authentication between the two servers instead.
scp copy.zip "$CHINA_USER@$CHINA_HOST:$CHINA_HOME"
rm copy.zip
## Then you initiate the next step of the process.
## Like I said this will be covered later.
On your Chinese server...
Log in as the Apache user.
Delete the contents of the http'd directory (probably /var/www).
Decompress the scp'd file (this will change depending on how you compressed it).
Copy the decompressed directory to the http'd directory (this step is unnecessary if you choose to compress with zip).
Delete the compressed, scp'd file.
Notify main server to continue next step (again, will be discussed later).
This is pretty straightforward, and I don't think you need another example for this part.
Copying the MySQL Database Contents
You can find a good reference for how to do this in this article from the MySQL website. Basically, copying database contents is a built-in feature. Try to make use of the compression options!
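For example, a minimal sketch using mysqldump piped through gzip (the database name and credentials are placeholders):
#!/bin/sh
## On the main server: dump and compress the WordPress database.
mysqldump -u root -p'secret' wordpress | gzip > wordpress.sql.gz
## Transfer the dump, then on the Chinese server: decompress and load it.
gunzip < wordpress.sql.gz | mysql -u root -p'secret' wordpress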
Performing these Tasks at Regular Intervals without Manual Intervention
Ok this is where things get kind of complicated.
The first thing you need to know is how to schedule tasks at regular intervals on Linux. This is done with a command line tool called crontab. You can see good examples for setting up cron jobs in this article, and the full crontab documentation here.
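For example, a crontab entry that runs the sync script every night at 3 a.m. might look like this (the script path is a placeholder):
# minute hour day-of-month month day-of-week command
0 3 * * * /home/www-data/sync_to_china.sh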
However what will take more skill than just scheduling the job at regular intervals will be synchronizing the data transfer. If you simply set one server to send data at a certain time and the other to receive it at a certain time, you will get many bugs. Be sure of that.
My recommendation would be to create a socket in the Chinese server that listens for instructions from the main server.
This can be done in a variety of languages. Because you're using Linux I would recommend doing this in C, but it can be done in almost any language including Bash.
A full example would be too much, but basically this is the flow of what you have to do (a minimal sketch of the listening side follows the list).
Socket in China listens for connections.
Cron job in main server connects to China socket.
Main server authenticates itself.
Chinese server stops Apache, stops accepting requests.
Chinese server acknowledges authentication approved.
Main server scp's website contents to Chinese server.
Main server tells Chinese server that scp is complete.
Chinese server replaces Apache's http'd directory's contents with the data that has been scp'd.
Chinese server announces success to main server.
Main server copies MySQL data.
Main server tells Chinese server process is complete.
Chinese server resumes Apache service.
Chinese server notifies main server that service is resumed.
Socket is closed.
Chinese server goes back to listening for connection from main server.
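As promised, here is a minimal sketch of just the listen-and-trigger part on the Chinese server, using nc (netcat) with a shared-secret token standing in for real authentication. The port, token, and commands are placeholders, and nc flags vary between netcat implementations:
#!/bin/sh
## Chinese server: wait for a "SYNC <token>" line, stop Apache while the
## transfer runs, then bring it back up.
TOKEN="some-shared-secret"
while true; do
    MSG=$(nc -l -p 9999)            ## block until the main server connects
    if [ "$MSG" = "SYNC $TOKEN" ]; then
        apachectl stop
        ## ...receive and unpack the scp'd data, reload MySQL, etc...
        apachectl start
    fi
done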
I hope this helps!

Synchronize Directory of Files Between Server and iOS Application

I am building an internal iOS application (so - it won't ever be in the app store), and I need to keep a directory of content synchronized between a server and each of the instances of the iOS application. This would be easy enough if I just wanted to delete and re-download this content each time, but I would rather use something similar to rsync to only download the elements that have changed.
I haven't found any good way to utilize rsync. I considered looking at Objective-Git as a possibility here, but at a quick glance it looked like a lot of the support for remote repositories isn't implemented yet.
As a final note, while this won't be in the app store, I will not be jailbreaking these devices and I would prefer to not rely on any private API's (although if there was an elegant solution that utilized private API's I might consider it).
Thoughts?
ADDITIONAL NOTE: This needs to be an isolated solution. I won't be relying on outside services (like Dropbox, Box.net, etc...). This needs to work solely between the device and the server (which is on a local network with the device).
Use HTTP to list the contents of each folder on the server.
Compare last modification time of each file with those on the device, and identify added/removed files.
Get added and modified files, remove deleted files.
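On the server side, a minimal sketch of producing such a listing as a manifest the app could fetch over HTTP (the content directory is a placeholder, and this assumes GNU find on a Linux server):
#!/bin/sh
## Generate "mtime<TAB>size<TAB>path" for every file under the content
## directory, for the iOS client to diff against its local copies.
cd /var/www/content || exit 1
find . -type f -printf '%T@\t%s\t%p\n' > manifest.txt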
It sounds like you're maybe asking for a library that already does this, but if you don't find one, it's moderately easy to write from the ground up using stat(2) on the server and the same (or a higher-level equivalent) on the iOS devices. Have the iPhone send a tree of files with their modification dates to the server, and get back a list of insert/delete/update operations to perform, with the URL (or whatever) for each one, so you can apply them incrementally on a background thread. Have the information from the server for new/updated files include the modification date the server has, so you can set it to be the same on the iOS device and send it back when asking the server for the status of each file (a bit of a hack, using the file system to store that, but it works).
Why not just set up a RESTful interface and do it over HTTP? That way you could query the modification times easily enough to determine whether client or server files need to be updated. You might also want to keep track of which files on the client have been synced, so you can easily know which files to add or delete. This can be done with a simple .sync file or using a plist / sqlite / etc.
If you'll consider FTP, there are some pretty advanced client libraries available.
For example, the iOS Chilkat bundle includes an FTP client library that supports synchronization in both directions. It's not free, but it's pretty cheap -- and you get a ton of other stuff that will likely prove useful someday. Here's an example of iOS pulling down all additions and changes (mode 2):
http://www.example-code.com/ios/ftp_syncLocalTree.asp
One caveat -- judging solely from the example, it doesn't appear to synchronize deletions. If this is a requirement, you could do it yourself without too much effort immediately following a sync.
acrosync (see https://acrosync.com/library.html) seems like a good fit given the initial question; however, I haven't used it myself yet.

How can I communicate across Perl CGI scripts?

I am searching for efficient ways of communicating between two Perl scripts. I have two scripts; Script 1 generates some data. I want my Script 2 to be able to access that information.
The easiest/dumbest way is to write the data generated by Script 1 to a file and read it later using Script 2. Is there any other way than this? Can I store the data in memory and make it available to Script 2 (of course with support from my Linux)? Meaning, malloc some data in Script 1 and make Script 2 able to access it.
There is no guarantee that Script 2 will be run after Script 1, so there should be some way to free that memory, such as a watchdog timer.
Let me reveal some more context. I am running these scripts on a web server using Perl CGI. At the click of a button, Script 1 runs and generates an HTML web page. The user can then add some input to this generated page and click a button on the new page. Now Script 2 should be able to read the data on the new page. I could post the data back to the web server again, but a more efficient way would be to keep a copy of the generated page on the server and make it available to Script 2. I would like to avoid writing the generated page to a file, so I was thinking of storing it in memory.
This depends somewhat on your usage... one large set of data? Many small messages? Do you care at all about data persistence? Is it TOTALLY asynchronous?
Some of the options are:
For any but the most high-performance web sites, the best approach is to write out the HTML pages to files! Unless the inter-process communication is benchmarked to be the bottleneck in performance, don't bother with any of the non-file solutions (shared memory, cache, intermediate server).
Specifically for two CGI scripts on the same server: if you run them under mod_perl or some other arrangement that shares a Perl interpreter between the two CGI processes, you can develop a package to serve as a cache, which, with its package-level variables, would be preserved in memory by mod_perl as long as mod_perl is running, and can thus be used by a writer CGI process and a reader CGI process to communicate. Of course the usual synchronization/deadlock and persistence issues associated with readers/writers need to be considered.
As an alternative, use Apache::Session sessions to store inter-session data.
As you noted, shared memory. For example use IPC::ShareLite, IPC::Cache, or this solution from perlmonks.
Also, please check Chapter 16 Recipe 12 "Sharing Variables in Different Processes" from O'Reilly's "Perl Cookbook" (no link since non-pirated versions aren't online anywhere I know of)
Use a permanent medium. A file is one option. A database is another.
For async, use an intermediate messaging system (MQ, Tibco, something more lightweight). Probably overkill in this scenario, but a valid option to be aware of. This one is likely to be pretty stable, solid, and optimized, but possibly not free and less flexible/tailored.
Or roll your own simple messaging server; it's not THAT complicated for the very simple one you seem to need. A rough sketch follows below.
Listen on one port for requests from the first process to store data, listen on another port for requests from the consumer process asking you to send that data, store the data in a storage area in memory, and purge it when it expires, using alarms or a separate watcher child process.
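As an illustration only, here is a crude version of that idea in shell with netcat; the ports and spool file are placeholders, there is no expiry or locking, and nc flags vary between netcat implementations:
#!/bin/sh
## Trivial store-and-forward broker: port 9001 accepts data to store,
## port 9002 serves the most recently stored payload.
SPOOL=/tmp/cgi-ipc.spool
while true; do
    nc -l -p 9001 > "$SPOOL"        ## writer side: store the payload
done &
while true; do
    nc -l -p 9002 < "$SPOOL"        ## reader side: serve the payload
done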
You've tagged your question as "cgi". Are they both CGI programs? In that case, they can just talk to each other by making HTTP requests.
However, you'll have to tell a lot more about why you are trying to do this and what you need to accomplish for us to help you. It's certainly easy for Perl programs to communicate with each other in some fashion, but that doesn't mean it's the right answer for you.
When you have complex requirements for interaction among CGI programs, you probably want to move to a web framework that handles a lot of those details for you. Catalyst might be where you'd want to start. There's even a book for it.

Detect a file in transit?

I'm writing an application that monitors a directory for new input files by polling the directory every few seconds. New files may often be several megabytes, and so take some time to fully arrive in the input directory (eg: on copy from a remote share).
Is there a simple way to detect whether a file is currently in the process of being copied? Ideally any method would be platform and filesystem agnostic, but failing that specific strategies might be required for different platforms.
I've already considered taking two directory listings separated by a few seconds and comparing file sizes, but this introduces a time/reliability trade-off that my superiors aren't happy with unless there is no alternative.
For background, the application is being written as a set of Matlab M-files, so no JRE/CLR tricks I'm afraid...
Edit: files are arriving in the input directory by a straight move/copy operation, either from a network drive or from another location on a local filesystem. This copy operation will probably be initiated by a human user rather than another application.
As a result, it's pretty difficult to place any responsibility on the file provider to add control files or use an intermediate staging area...
Conclusion: it seems like there's no easy way to do this, so I've settled for a belt-and-braces approach - a file is ready for processing if:
its size doesn't change in a certain period of time, and
it's possible to open the file in read-only mode (some copying processes place a lock on the file).
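For illustration, here is the same belt-and-braces check sketched in shell; the Matlab version follows the same logic. The 5-second window is arbitrary, GNU stat is assumed, and opening for read is only a crude stand-in for a real lock check:
#!/bin/sh
## Treat a file as ready if its size is stable for 5 seconds and it can
## be opened for reading.
FILE="$1"
SIZE1=$(stat -c %s "$FILE") || exit 1
sleep 5
SIZE2=$(stat -c %s "$FILE") || exit 1
if [ "$SIZE1" = "$SIZE2" ] && cat "$FILE" > /dev/null 2>&1; then
    echo "ready"
else
    echo "still in transit"
fi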
Thanks to everyone for their responses!
The safest method is to have the application(s) that put files in the directory first put them in a different, temporary directory, and then move them to the real one (which should be an atomic operation even when using FTP or file shares). You could also use naming conventions to achieve the same result within one directory.
Edit:
It really depends on the filesystem, on whether its copy functionality even has the concept of a "completed file". I don't know the SMB protocol well, but if it has that concept, you could write an app that exposes an SMB interface (or patch Samba) and an API to get notified for completed file copies. Probably a lot of work though.
This is a middleware problem as old as the hills, and the short answer is: no.
The two 'solutions' put the onus on the file-uploader: (1) upload the file in a staging directory and then move it into the destination directory (2) upload the file, and then create/upload a 'ready' file that indicates the state of the content file.
The 1st one is the better, but both are inelegant. The truth is that better communication media exist than the filesystem. Consider using some IPC that involves only a push or a pull (and not both, as does the filesystem) such as an HTTP POST, a JMS or MSMQ queue, etc. Furthermore, this can also be synchronous, allowing the process receiving the file to acknowledge the content, even check it for worthiness, and hand the client a receipt - this is the righteous road to non-repudiation. Follow this, and you will never suffer arguments over whether a file was or was not delivered to your server for processing.
M.
One simple possibility would be to poll at a fairly large interval (2 to 5 minutes) and only acknowledge the new file the second time you see it.
I don't know of a way in any OS to determine whether a file is still being copied, other than maybe checking if the file is locked.
How are the files getting there? Can you set an attribute on them as they are written and then change the attribute when write is complete? This would need to be done by the thing doing the writing ... which sounds like it isn't an option.
Otherwise, caching the listing and treating a file as new if it has the same file size for two consecutive listings is the best way I can think of.
Alternatively, you could use the modified time on the file - the file has to be new and have a modified time that is at least x in the past. But I think this will be about equivalent to caching the listing.
If you are polling the folder every few seconds, it's not much of a time penalty, is it? And it's platform-agnostic.
Also, Linux only: http://www.linux.com/feature/144666
Like cron but for files. Not sure how it deals with your specific problem - but may be of use?
What is your OS? On Unix you can use the "lsof" utility to determine if a user has the file open for writing. Apparently the MS Windows Process Explorer has the same functionality somewhere.
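For instance, a rough lsof-based check might look like this (lsof exits non-zero when no process has the file open):
#!/bin/sh
## If any process still has the file open, treat it as "in transit".
if lsof -- "$1" > /dev/null 2>&1; then
    echo "file is still open by some process"
else
    echo "no process has the file open"
fi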
Alternatively, you could just try an exclusive open on the file and bail out if this fails. But this can be a little unreliable, and it's easy to tread on your own toes.