Perl share hashmap through file - perl

Currently I have a script that collects data about a specified server. The data is stored inside a hash, which I store into a file for persistence.
If the script is called with another server, it should load the hash from the file, extend it with the data from the second server, and then save it back.
I use the Storable module.
use Storable;
my $recordedpkgs = {};
$recordedpkgs = retrieve($MONPKGS_DATA_FILE) if -e $MONPKGS_DATA_FILE;
# ... add this server's data to %$recordedpkgs ...
store $recordedpkgs, $MONPKGS_DATA_FILE;
Obviously there is an access issue if one instance writes while another has already read the file; some data will then be lost.
What would be an ideal solution to that? Basic file locking? Or is there a better way to achieve this?

It depends - what you're talking about is inter-process communication, and Perl has a whole documentation section on the subject: perlipc.
But to answer your question directly - yes, file locking is the way to go. It's exactly the tool for the job you describe.
Unfortunately, it's often OS dependent - Windows and Linux locking semantics are different. flock is the basic starting point on Unix-based systems; take a look at http://www.perlmonks.org/?node_id=7058
It's an advisory lock, where you can request a shared (read) or exclusive (write) lock, and either block until the lock is released, or fail and return immediately if you cannot acquire it.
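For example, a minimal sketch of flock usage (what you do while holding the lock is up to you; this assumes the data file already exists):

use Fcntl qw(:flock);

open my $fh, '+<', $MONPKGS_DATA_FILE or die "open: $!";
flock $fh, LOCK_EX or die "flock: $!";   # exclusive (write) lock; blocks until granted
# LOCK_SH would request a shared (read) lock instead, and adding LOCK_NB
# makes the call return false immediately rather than block
# ... read and write the file here ...
flock $fh, LOCK_UN;                      # release (closing the handle also releases it)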
Storable does implement some locking semantics: http://perldoc.perl.org/Storable.html#ADVISORY-LOCKING
But you might find you want to use a lock file if you're doing a read-modify-write cycle on the saved content.
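Storable's built-in helpers (lock_store and lock_retrieve) only hold the lock for the single read or write they perform, so a read-modify-write cycle needs a lock that spans the whole cycle. A minimal sketch using a sidecar lock file (the lock-file name and collect_data() are made up):

use Fcntl qw(:flock);
use Storable qw(retrieve store);

# hold an exclusive lock on the sidecar file for the whole read-modify-write cycle
open my $lock, '>', "$MONPKGS_DATA_FILE.lock" or die "open lock: $!";
flock $lock, LOCK_EX or die "flock: $!";

my $recordedpkgs = -e $MONPKGS_DATA_FILE ? retrieve($MONPKGS_DATA_FILE) : {};
$recordedpkgs->{$server} = collect_data($server);   # collect_data() is hypothetical
store $recordedpkgs, $MONPKGS_DATA_FILE;

close $lock;   # releases the lock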

I would just use a basic lock file that is checked before operations are performed on the data file. If the lock file is in place, make the other process either wait and re-check (either indefinitely or a set number of times before giving up), or simply exit with an error.
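A rough sketch of that approach, relying on the atomicity of O_EXCL file creation (the lock-file name and retry count are arbitrary):

use Fcntl qw(O_CREAT O_EXCL O_WRONLY);

my $lockfile = "$MONPKGS_DATA_FILE.lock";

# try a limited number of times to create the lock file, then give up
my $have_lock = 0;
for my $attempt (1 .. 10) {
    if (sysopen my $fh, $lockfile, O_WRONLY | O_CREAT | O_EXCL) {
        close $fh;
        $have_lock = 1;
        last;
    }
    sleep 1;   # another process holds the lock; wait and re-check
}
die "could not acquire $lockfile\n" unless $have_lock;

# ... read, update and write the data file here ...

unlink $lockfile;   # release the lock so the next process can proceed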

Related

Powershell holding a lock on a log file

I use Add-Content and Out-File to write information to log files, and to reduce the number of log files I have, I'd like multiple instances of my script to be able to share log files. Is there any way to make sure that PowerShell doesn't hold a lock on those files when writing to them?
For example, I have a SQLCMD call that restores a database, which can take 20 minutes or so. During this time, it writes the output to a log file and thus maintains a lock on that file (so I can't write to it with other scripts).
Ideally I would like both processes to be able to write at the same time. Should I write a test-file function to see if the file is locked prior to writing? And if it is, sleep for x seconds and check again?
Having multiple processes write to the same file is difficult to pull off safely. A better method is a transactional system: many people have multiple processes log to a transactional database. Another good option is to write to a custom or system event log, which is also transactional and should avoid collisions.

Google Cloud Storage transactions?

It does not appear that GCS has any transaction mechanism. Is this correct?
I would like to be able to have a long-lived transaction. For example, it would be great if I could start a transaction and specify an expiration time (if not committed within X time it automatically gets rolled back). Then I could use this handle to insert objects, compose, delete etc., and if all goes well, issue an isCommitPossible(), and if yes, then commit().
Is this a possibility?
Object writes are transactional (either the complete object and its metadata are successfully written and the object becomes visible; or it fails without becoming visible). But there's no transaction mechanism spanning multiple GCS operations.
Mike
The Cloud Storage client libraries offer a file-like object to work with, which has an Open() and Close() operation. If a single operation is transactional then, in theory, it should be possible to open a single "lock file" for the duration of all the other operations, only closing it when you're done with all the other files.
In other words, you would have to write your processes to use a "lock file", and that way you could at least know whether all your files were written/read or whether there was some error. Whenever the next round of operations takes place, it would just look for the existence of the lock file that corresponds to the set of files written (you'd have to arrange your naming, directory layout, etc. so this makes sense). If the lock file exists, you can assume the file group was written successfully; if it doesn't exist, assume that something happened (or that the process hasn't completed yet).
I have not actually tested this out. But I offer it as an idea for others who might be desperate enough to try.

Verify .mat file exists and is not Corrupt - Matlab

I have 2 independent Matlab workers, with FIRST getting/saving data and SECOND reading it (and doing some calculations etc).
FIRST saves data as a .mat file on the hard disk while SECOND reads it from there. It takes ~20 seconds to save this data as .mat and 8 milliseconds to delete it. Before saving data, FIRST deletes the old file and then saves a newer version.
How can SECOND verify that the data exists and is not corrupt? I can use exist, but that doesn't tell me whether the data is corrupt. For example, if SECOND tries to read the data exactly when FIRST is saving it, exist passes but load gives an error saying the data is corrupt.
Thanks.
You can't, without some synchronization mechanism - by the time SECOND completes its check and starts to read the file, FIRST might have started writing it again. You need some sort of lock or mutex.
Two options for base Matlab.
If this is on a local filesystem, you could use a separate lock file sitting next to the data file to manage concurrent access to the data file. Use Java's NIO FileChannel and FileLock objects from within Matlab to lock the first byte of the lock file and use that as a semaphore to control access to the data file, so the reader waits until the writer is finished and vice versa. (If this is on a network filesystem, don't try this - file locking may seem to work but usually is not officially supported and in my experience is unreliable.)
Or you could just put a try/catch around your load() call and have it pause a few seconds and retry if you get a corrupt file error. The .mat file format is such that you won't get a partial read if the writer is still writing it; you'll get that corrupt file error. So you could use this as a lazy sort of collision detection and backoff. This is what I usually do.
To reduce the window of contention, consider having FIRST write to a temporary file in the same directory, and then use a rename to move it to its final destination. That way the file is only unavailable during a quick filesystem move operation, not the 20 seconds of data writing. If you have multiple writers, stick the PID and hostname in the temp file name to avoid collisions.
Sounds like a classic resource-sharing problem between two threads (reader-writer).
In short, you should find a method of safe inter-worker communication.
Also, try to type
showdemo('paralleldemo_communic_prof')
in Matlab

Sharing a file among several processes [Perl]

I have an application that updates a single CSV file. The CSV is updated at random times by several processes, and I guess that if two processes try to update it (add a row, ...) at the same time, some data will be lost or overwritten(?).
What is the best way to avoid this?
Thanks.
Use Perl's DBI with the DBD::CSV driver to access your data; that'll take care of the flocking for you. (Unless you're using Windows 95 or the old Mac OS.) If you decide to switch to an RDBMS later on, you'll be well prepared.
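A rough sketch of the DBD::CSV route (the directory, table and column names are made up, and this assumes a CSV file named logtable with matching headers already exists in that directory):

use DBI;

# each CSV file in f_dir is exposed as a table named after the file
my $dbh = DBI->connect('dbi:CSV:', undef, undef, {
    f_dir      => '/path/to/csv/dir',
    RaiseError => 1,
});

# append a row; the driver takes care of locking the file while it writes
$dbh->do('INSERT INTO logtable (host, message) VALUES (?, ?)',
         undef, 'web01', 'row added');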
Simple flocking as suggested by #Fluff should also be fine, of course.
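And a bare-bones version of the flock approach for appending a single row (the file name and row contents are placeholders):

use Fcntl qw(:flock);

my @fields = ('web01', 'ok');             # whatever row you are adding
open my $csv, '>>', 'shared.csv' or die "open: $!";
flock $csv, LOCK_EX or die "flock: $!";   # wait until no other writer holds the lock
print {$csv} join(',', @fields), "\n";
close $csv;                               # closing the handle releases the lock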
If you want a simple, manual way to take care of file locking:
1) As soon as a process opens the CSV, it creates a lock. (The lock can be in the form of creating a dummy file. The process has to delete the lock file as soon as it is done reading/updating the CSV.)
2) Have each process check for the lock file before trying to update the CSV. (If the dummy file is present, some other process is accessing the CSV; otherwise it can update the CSV.)

Perl DBM vs. Storable

For my current project I need to store a small database on disk, which I read once when my program runs and write once.
I have looked into Perl's DBM functionality and, from what I understand, it provides merely a hash that is stored on disk, with every read and write going directly to disk.
My question is: could I not simply use Storable or any of the related modules to achieve the same (a persistent hash) with far less file I/O overhead? (The hashes will never be too large to fit into memory easily.)
Regards
Nick
SQLite is fast becoming the standard for simple on-disk databases. And in Perl you can just use DBD::SQLite and you're good to go.
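A minimal sketch of that, with the database file, table and keys made up:

use DBI;

my $dbh = DBI->connect('dbi:SQLite:dbname=mydata.db', '', '', { RaiseError => 1 });
$dbh->do('CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)');

# write a value, then read it back
$dbh->do('INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)', undef, 'colour', 'blue');
my ($value) = $dbh->selectrow_array('SELECT v FROM kv WHERE k = ?', undef, 'colour');

SQLite also handles locking between concurrent processes for you, which sidesteps the sharing problems discussed above.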
Since the previous answers didn't really answer your actual question, "yes, you can"... with the following caveats:
Storable isn't really suited to concurrent access.
You will need to roll your own "atomic" update (i.e. write to a temp file, then rename it into place); see the sketch at the end of this answer.
If performance isn't really an issue, you could also use Data::Dumper (with the resulting file being somewhat human readable).
You could splat the contents to CSV.
I often use Dumper when there is only going to be a single task accessing the file - and it gives me a way to read/modify the contents if I see fit.
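For the "atomic" update mentioned above, a minimal sketch (the file names are illustrative, and rename() only replaces the target atomically when both paths are on the same filesystem):

use Storable qw(nstore);

my %data   = (answer => 42);          # whatever you need to persist
my $target = 'mydata.storable';       # made-up file name
my $tmp    = "$target.tmp.$$";        # per-process temp name next to the target

nstore \%data, $tmp;                            # write the whole hash to the temp file
rename $tmp, $target or die "rename: $!";       # atomically replace the old copy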