Sharing a file among several processes [Perl]

I have an application that updates a single CSV file. The CSV is updated at random times by several processes, and I guess that if two processes try to update it (add a row, ...) at the same time, some data will be lost or overwritten(?).
What is the best way to avoid this?
Thanks.

Use Perl's DBI with the DBD::CSV driver to access your data; that'll take care of the flocking for you. (Unless you're using Windows 95 or the old Mac OS.) If you decide to switch to an RDBMS later on, you'll be well prepared.
Simple flocking as suggested by @Fluff should also be fine, of course.
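A minimal sketch of the DBD::CSV approach; the directory, file name and column names below are invented for illustration:

use strict;
use warnings;
use DBI;

# Point DBD::CSV at the directory holding the CSV file(s).
my $dbh = DBI->connect('dbi:CSV:', undef, undef, {
    f_dir      => '/path/to/data',   # assumption: where data.csv lives
    f_ext      => '.csv',            # table "data" maps to file "data.csv"
    csv_eol    => "\n",
    RaiseError => 1,
}) or die $DBI::errstr;

# The driver serializes concurrent writers (the flocking mentioned above),
# so two processes adding rows at the same time won't clobber each other.
my $sth = $dbh->prepare('INSERT INTO data (id, name) VALUES (?, ?)');
$sth->execute(42, 'example row');

$dbh->disconnect;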

If you want a simple, manual way to take care of file locking (sketched below):
1) As soon as a process opens the CSV, it creates a lock. (The lock can be a dummy file; the process has to delete that lock file as soon as it is done reading/updating the CSV.)
2) Have each process check for the lock file before trying to update the CSV. (If the dummy file is present, some other process is accessing the CSV; otherwise it can go ahead and update it.)
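A rough Perl sketch of that dummy-file idea (file names are placeholders); sysopen with O_EXCL makes the "create the lock" step atomic, so two processes can't both think they own it:

use strict;
use warnings;
use Fcntl qw(O_CREAT O_EXCL O_WRONLY);

my $lock_file = 'data.csv.lock';   # assumption: lock file sits next to the CSV

# Step 1/2: try to create the lock; O_EXCL fails while another process holds it.
# (A real script should also check $! for errors other than EEXIST.)
my $lock_fh;
until (sysopen $lock_fh, $lock_file, O_CREAT | O_EXCL | O_WRONLY) {
    sleep 1;                       # someone else is using the CSV - wait and re-check
}

# Safe to update the CSV now.
open my $csv, '>>', 'data.csv' or die "open data.csv: $!";
print {$csv} "new,row,here\n";
close $csv;

close $lock_fh;
unlink $lock_file;                 # done - remove the lock so other processes can proceed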

Related

Perl share hashmap through file

Currently I have a script that collects data about a specified server. The data is stored inside a hash, which I store into a file for persistence.
If the script is called with another server, it should load the hash from the file, extend it with the data from the second server, and then save it back.
I use the Storable module.
use Storable;
$recordedpkgs = retrieve($MONPKGS_DATA_FILE) if ( -e $MONPKGS_DATA_FILE);
store $recordedpkgs, $MONPKGS_DATA_FILE;
Obviously there is an access issue if one process writes while another has already read the file; some data will then be lost.
What would be an ideal solution to that? Basic file locking? Are there better ways to achieve this?
It depends: what you're talking about is inter-process communication, and Perl has a whole documentation section on the subject, perlipc.
But to answer your question directly - yes, file locking is the way to go. It's exactly the tool for the job you describe.
Unfortunately, locking is often OS-dependent; Windows and Linux locking semantics are different. flock is the basic starting point on Unix-based systems; take a look at http://www.perlmonks.org/?node_id=7058
It's an advisory lock, where you can request a shared (read) or exclusive (write) lock, and either block (until the lock is released) or fail and return if you cannot acquire it.
Storable does implement some locking semantics: http://perldoc.perl.org/Storable.html#ADVISORY-LOCKING
But you might find you want to use a lock file if you're doing a read-modify-write cycle on the saved content.
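A minimal sketch of that pattern, assuming a separate lock file next to the Storable file and holding an exclusive flock for the whole retrieve/update/store cycle (file names and the update itself are placeholders):

use strict;
use warnings;
use Fcntl qw(:flock);
use Storable qw(retrieve store);

my $data_file = 'monpkgs.dat';            # assumption: the Storable file
my $lock_file = "$data_file.lock";

open my $lock, '>', $lock_file or die "open $lock_file: $!";
flock $lock, LOCK_EX or die "flock: $!";  # blocks until this process holds the lock

# Read, modify and write back while holding the exclusive lock.
my $recordedpkgs = -e $data_file ? retrieve($data_file) : {};
$recordedpkgs->{'server42'} = { updated => time };   # illustrative update only
store $recordedpkgs, $data_file;

close $lock;                              # releases the lock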
I would just use a basic lock file that is checked before any operation on the file. If the lock file is in place, have the other process either wait and re-check (indefinitely, or a set number of times before giving up) or simply exit with an error.
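A bounded-retry variant of that, here using flock in non-blocking mode rather than a separate sentinel file (the retry count and file name are arbitrary):

use strict;
use warnings;
use Fcntl qw(:flock);

open my $lock, '>', 'monpkgs.dat.lock' or die "open: $!";   # placeholder name

my $max_tries = 5;
for my $try (1 .. $max_tries) {
    last if flock $lock, LOCK_EX | LOCK_NB;                 # got the lock
    die "could not get lock after $max_tries attempts\n" if $try == $max_tries;
    sleep 1;                                                # wait, then check again
}

# ... safe to read/update the shared file here ...

close $lock;                                                # releases the lock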

Powershell holding a lock on a log file

I use Add-Content and Out-File to write information to log files, and to reduce the number of log files I have, I'd like multiple instances of my script to be able to share log files. Is there any way to make sure that PowerShell doesn't hold a lock on those files while writing to them?
For example, I have a SQLCMD call that restores a database, which can take 20 minutes or so. During this time, it writes the output to a log file and thus maintains a lock on that file (so I can't write to it with other scripts).
Ideally I would like both processes to be able to write at the same time. Should I write a test-file function to see if the file is locked prior to writing? And if it is, sleep for x seconds and check again?
Multiple processes writing to a file is pretty difficult to pull off safely. A better method is a transactional system. Many people use a transactional database for multiple processes to log to. Another good option is to write to a custom or system event log. This is also transactional and should avoid collisions.

In OpenEdge, how do you transfer parts of the data in the database in an easy way?

I have a lot of data in 2 different databases and in many different tables that I would like to move from one computer to a few others. The others have the same database definitions. Note that not all of the data should be transferred, only some that I define: some tables fully, others only partly.
What is the easiest way to move this data? Dumping each table and loading it separately from many .d files is not an easy way. Is there something similar to the incremental .df file that contains everything that has to be changed?
Dumping (and loading) entire tables is easy. You can do it from the GUI or from the command line. Look at, for instance, this KnowledgeBase entry about command-line dump & load and this one about creating scripts for dumping the entire database.
Moving parts of the data is another story. This is very individual and depends on your database and your application. It's hard for a generic tool to compare data and tell whether a difference comes from changed, added or deleted records. Different databases have different layouts, keys and indices.
There are however several built in commands that could help you:
For instance:
IMPORT and EXPORT for importing and exporting data to files, streams etc.
Basic import and export
OUTPUT TO c:\temp\foo.data.
FOR EACH foo NO-LOCK:
    EXPORT foo.
END.
OUTPUT CLOSE.

INPUT FROM c:\temp\foo.data.
REPEAT:
    CREATE foo.
    IMPORT foo.
END.
INPUT CLOSE.
BUFFER-COPY and BUFFER-COMPARE for copying and comparing data between tables (and possibly even databases).
You could also use the built-in commands for doing a "dump" and then manually edit the created files.
Calling Progress built-in commands
You can call the back end that dumps data from Data Administration. That will require you to extract the .p files from its archives and call them manually. It will also require you to change PROPATH etc., so it's not straightforward. You could also look into modifying the extracted files to fit your needs. Remember that this might break when upgrading Progress, so keep your changes in separate files.
Look at this Progress KB entry:
Progress KB 15884
The best way for you depends on whether this is a one-time or recurring task, the size and layout of the database, etc.

Verify .mat file exists and is not Corrupt - Matlab

I have 2 independent Matlab workers, with FIRST getting/saving data and SECOND reading it (and doing some calculations etc).
FIRST saves data as a .mat file on the hard disk while SECOND reads it from there. It takes ~20 seconds to SAVE this data as .mat and about 8 ms to DELETE it. Before SAVING data, FIRST deletes the old file and then saves a newer version.
How can SECOND verify that the data exists and is not corrupt? I can use exist, but that doesn't tell me whether the data is corrupt or not. For example, if SECOND tries to read the data exactly when FIRST is saving it, exist passes but LOAD gives an error saying the data is corrupt.
Thanks.
You can't, without some synchronization mechanism - by the time SECOND completes its check and starts to read the file, FIRST might have started writing it again. You need some sort of lock or mutex.
Two options for base Matlab.
If this is on a local filesystem, you could use a separate lock file sitting next to the data file to manage concurrent access to the data file. Use Java's NIO FileChannel and FileLock objects from within Matlab to lock the first byte of the lock file and use that as a semaphore to control access to the data file, so the reader waits until the writer is finished and vice versa. (If this is on a network filesystem, don't try this - file locking may seem to work but usually is not officially supported and in my experience is unreliable.)
Or you could just put a try/catch around your load() call and have it pause a few seconds and retry if you get a corrupt file error. The .mat file format is such that you won't get a partial read if the writer is still writing it; you'll get that corrupt file error. So you could use this as a lazy sort of collision detection and backoff. This is what I usually do.
To reduce the window of contention, consider having FIRST write to a temporary file in the same directory, and then use a rename to move it to its final destination. That way the file is only unavailable during a quick filesystem move operation, not the 20 seconds of data writing. If you have multiple writers, stick the PID and hostname in the temp file name to avoid collisions.
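That write-to-a-temp-file-then-rename trick is language-agnostic; here is a rough sketch of the same idea in Perl (target name and payload are placeholders), relying on rename being atomic within one filesystem:

use strict;
use warnings;
use File::Temp qw(tempfile);
use Sys::Hostname qw(hostname);

my $target       = 'results.mat';                     # placeholder target name
my $new_contents = "serialized data here\n";          # stand-in for the real payload

# Write into a temp file in the SAME directory as the target,
# with host name and PID in the name to avoid writer collisions.
my ($fh, $tmp) = tempfile('results-' . hostname() . "-$$-XXXX", DIR => '.');
print {$fh} $new_contents;
close $fh or die "close: $!";

# Readers see either the old file or the complete new one, never a partial write.
rename $tmp, $target or die "rename: $!";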
Sounds like a classic reader-writer resource-sharing problem between two processes.
In short, you should find a method of safe inter-worker communication. Check this out.
Also, try to type
showdemo('paralleldemo_communic_prof')
in Matlab

SQLite3: Batch Insert?

I've got some old code on a project I'm taking over.
One of my first tasks is to reduce the final size of the app binary.
Since the contents include a lot of text files (around 10,000 of them), my first thought was to create a database containing them all.
I'm not really used to SQLite and Core Data, so I've got basically two questions:
1 - Is my assumption correct? Should my SQLite file have a smaller size than all of the text files together?
2 - Is there any way of automating the task of getting them all into my newly created database (maybe using some kind of GUI or script), one file per record inside a single table?
I'm still experimenting with CoreData, but I've done a lot of searching already and could not find anything relevant to bringing everything together inside the database file. Doing that manually has proven no easy task already!
Thanks.
An alternative to using SQLite might be to use a zip file instead. This is easy to create, and will surely save space (and definitely reduce the number of files). There are several implementations of using zip files on the iPhone, e.g. ziparchive or TWZipArchive.
1 - It probably won't be any smaller, but you can compress the files before storing them in the database. Or without the database for that matter.
2 - Sure. It shouldn't be too hard to write a script to do that.
If you're looking for a SQLite bulk-insert command to write your script for 2), there isn't one AFAIK. Prepared insert statements in a loop inside a transaction are the best you can do; I imagine it would take only a few seconds (if that) to insert 10,000 records.
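For example, a build-time import script along those lines in Perl with DBI and DBD::SQLite (the table layout, directory and file extension are assumptions):

use strict;
use warnings;
use DBI;
use File::Find;

my $dbh = DBI->connect('dbi:SQLite:dbname=content.db', '', '',
                       { RaiseError => 1, AutoCommit => 1 });

$dbh->do('CREATE TABLE IF NOT EXISTS documents (path TEXT PRIMARY KEY, body TEXT)');
my $sth = $dbh->prepare('INSERT INTO documents (path, body) VALUES (?, ?)');

$dbh->begin_work;                        # one transaction around all the inserts
find(sub {
    return unless -f && /\.txt\z/;       # assumption: the text files end in .txt
    open my $fh, '<', $_ or die "open $File::Find::name: $!";
    my $body = do { local $/; <$fh> };   # slurp the whole file
    $sth->execute($File::Find::name, $body);
}, 'content');                           # assumption: files live under ./content
$dbh->commit;

$dbh->disconnect;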