Perl DBM vs. Storable

For my current project I need to store a small database on disk, which I read once when my program starts and write back once when it finishes.
I have looked into Perl's DBM functionality, and from what I understand it merely provides a hash that is stored on disk, with every read and write going directly to disk.
My question is: could I not simply use Storable or one of the related modules to achieve the same thing (a persistent hash) with far less file I/O overhead? (The hashes will never be too large to fit easily into memory.)
Regards
Nick

SQLite is fast becoming the standard for simple on-disk databases. And in Perl you can just use DBD::SQLite and you're good to go.
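A minimal sketch of what that looks like, assuming DBD::SQLite is installed (the file and table names here are invented for illustration):

```perl
use strict;
use warnings;
use DBI;

# One file on disk, no server process needed.
my $dbh = DBI->connect('dbi:SQLite:dbname=app.db', '', '',
                       { RaiseError => 1, AutoCommit => 1 });

$dbh->do('CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)');
$dbh->do('INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)',
         undef, 'greeting', 'hello');

my ($v) = $dbh->selectrow_array(
    'SELECT v FROM kv WHERE k = ?', undef, 'greeting');
print "$v\n";   # prints hello
$dbh->disconnect;
```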

Since the previous answers didn't really answer your actual question: yes, you can, with the following caveats:
Storable isn't really suited to concurrent access.
You will need to roll your own "atomic" update (i.e. write to a temporary file, then rename it over the original).
If performance isn't really an issue, you could also use Data::Dumper (with the resulting file being somewhat human readable).
You could splat the contents to CSV.
I often use Dumper when there is only going to be a single task accessing the file - and it gives me a way to read/modify the contents if I see fit.
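A sketch of the read-once/write-once pattern with the temp-file-then-rename trick mentioned above (the file name is a placeholder):

```perl
use strict;
use warnings;
use Storable qw(store retrieve);
use File::Temp qw(tempfile);

# Hypothetical file name, for illustration only.
my $db_file = 'mydata.stor';

# Read the whole hash once at startup (if a previous run saved one).
my %data = -e $db_file ? %{ retrieve($db_file) } : ();

$data{last_run} = time;

# "Atomic" update: write to a temp file in the same directory, then
# rename() it over the real file. On POSIX filesystems rename() is
# atomic, so a reader never sees a half-written database.
my ($tmp_fh, $tmp_name) = tempfile(DIR => '.');
close $tmp_fh;
store \%data, $tmp_name;
rename $tmp_name, $db_file or die "rename failed: $!";
```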

Related

How to keep a big hash on disk instead of in RAM?

I've got too little RAM to finish a calculation because of a large hash. Is there a drop-in Perl module that would let me use the hash without keeping it all in RAM? I expect it to top out around 4GB, and I've got a bit less than 2GB available for the script. I don't think processing time or disk I/O would be an issue.
You can use dbmopen (or tie a hash to a DBM module) to open a hash connected to a DBM file. These are not particularly sophisticated, and handle only shallow hashes of simple keys and values.
For anything more sophisticated, I would recommend using SQLite.
You may try DB_File module (or similar modules).
Memory usage hints: https://www.perlmonks.org/?node_id=146377
Take a look at AnyDBM_File for other similar modules available with rudimentary comparison.
The $hash{$key1,$key2} syntax can be used to turn a multi-level hash into a flat (single-level) hash;
see $SUBSCRIPT_SEPARATOR ($;) in perlvar for details.
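A minimal sketch of both ideas together, using the portable AnyDBM_File front end and the $; flat-key trick (the file name is made up):

```perl
use strict;
use warnings;
use Fcntl;         # for O_RDWR, O_CREAT
use AnyDBM_File;   # uses whichever DBM implementation is available

# Keys and values live in the DBM file on disk, not in RAM.
tie my %on_disk, 'AnyDBM_File', 'bighash', O_RDWR | O_CREAT, 0644
    or die "tie failed: $!";

# Flatten a two-level lookup: $on_disk{$a,$b} really means
# $on_disk{ join($;, $a, $b) }, i.e. a single-level string key.
$on_disk{'server1', 'uptime'} = 12345;
print $on_disk{'server1', 'uptime'}, "\n";   # prints 12345

untie %on_disk;
```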

Serialize data to binary using multicore

I'm using the store function from the Storable module to get a binary representation of my hash. The hash is big enough to make the process take 20 minutes. Is there a function similar to store that can use multiple cores, to speed things up?
I've been searching for a while and couldn't find anything relevant, even using BSON for the storage.
In the end I decided to split the data I want to store into as many pieces as there are cores on the computer. That way I can run store in parallel threads, each writing a different output file, as ikegami suggested in the comments.

Perl share hashmap through file

Currently I have a script that collects data about a specified server. The data is stored in a hash, which I write to a file for persistence.
If the script is called with another server, it should load the hash from the file, extend it with the data from the second server, then save it back.
I use the Storable module:
use Storable;
$recordedpkgs = retrieve($MONPKGS_DATA_FILE) if ( -e $MONPKGS_DATA_FILE);
store $recordedpkgs, $MONPKGS_DATA_FILE;
Obviously there is an access issue if one instance writes while another has already read the file; some data will then be lost.
What would be an ideal solution to that? Basic file locking? Or is there a better way to achieve this?
It depends. What you're talking about is inter-process communication, and Perl has a whole documentation section on the subject: perlipc.
But to answer your question directly - yes, file locking is the way to go. It's exactly the tool for the job you describe.
Unfortunately, it's often OS-dependent; Windows and Linux locking semantics are different. Take a look at flock: that's the basic starting point on Unix-based systems. See also: http://www.perlmonks.org/?node_id=7058
It's an advisory lock, where you can request a shared (read) or exclusive (write) lock. And either block (until released), or fail and return if you cannot acquire that lock.
Storable does implement some locking semantics: http://perldoc.perl.org/Storable.html#ADVISORY-LOCKING
But you might find you want to use a lock file if you're doing a read-modify-write cycle on the saved content.
I would just use a basic lock file that is checked before operations are performed on the data file. If the lock file is in place, make the other process either wait and retry (either indefinitely or a set number of times before giving up), or simply exit with an error.
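A sketch of the read-modify-write cycle guarded by flock on a separate lock file (the file names and the server entry are placeholders, not the OP's actual variables):

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use Storable qw(store retrieve);

my $data_file = 'monpkgs.stor';        # placeholder file name
my $lock_file = "$data_file.lock";

# Hold an exclusive advisory lock for the WHOLE read-modify-write
# cycle, so two instances can't interleave their updates.
open my $lock_fh, '>', $lock_file or die "open lock: $!";
flock $lock_fh, LOCK_EX or die "flock: $!";   # blocks until available

my $recordedpkgs = -e $data_file ? retrieve($data_file) : {};
$recordedpkgs->{server2} = { scanned => time };   # extend with new data
store $recordedpkgs, $data_file;

close $lock_fh;   # closing the handle releases the lock
```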

What is the Storable module used for?

I am having a hard time understanding what Storable does.
I know that it "stores" a variable into your disk, but why would I need to do that? What would I use this module for, and how would I do it?
Reasons that spring to mind:
Persist memory across script calls
Sharing variables across different processes (sometimes it isn't possible to pipe stuff)
Of course, that's not all that Storable does. It also:
Makes it possible to create deep clones of data structures
Serializes the data structure stored, which implies a smaller file footprint than output from Data::Dump
Is optimized for speed (so it's faster to retrieve than to require a file containing Data::Dump output)
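For instance, the deep-clone feature is a one-liner with dclone:

```perl
use strict;
use warnings;
use Storable qw(dclone);

# dclone copies the entire nested structure, not just the top level,
# so the clone shares no references with the original.
my $orig = { list => [ 1, 2, 3 ] };
my $copy = dclone($orig);

push @{ $copy->{list} }, 4;                 # modify only the clone
print scalar @{ $orig->{list} }, "\n";      # prints 3: original untouched
```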
One example:
Your program spends a long time populating a data structure, say a graph or a trie, and if the program crashes you'd lose it all and have to start again from square one. To avoid losing this data and be able to continue where it stopped last time, you can save a snapshot of the data to a file manually, or simply use Storable.
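A minimal version of that snapshot idea (the file name is invented; nstore writes in portable network byte order):

```perl
use strict;
use warnings;
use Storable qw(nstore retrieve);

# Stand-in for an expensive-to-build structure (e.g. a graph or trie).
my %trie = ( c => { a => { t => { end => 1 } } } );

# Snapshot it to disk.
nstore \%trie, 'trie.snapshot';

# Later (or after a crash and restart), pick up where we left off.
my $restored = retrieve('trie.snapshot');
print "restored\n" if $restored->{c}{a}{t}{end};   # prints "restored"
```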

looking for light-weight data persistence solution in perl

In my app I need to store some simple data both in memory and on disk. A real database would be overkill in my case, so I need something lighter to handle the simple data-persistence requirement. I did some Google searching on my own and found interesting things like DBM and DBI CSV, etc., but since there are so many options it's difficult for me to make the actual choice, so I'd like to ask here for a "best-practice" lightweight data persistence solution in Perl.
You have several options:
Storable is a core module and is very efficient. It has some problems with portability; for example, someone using an older version of Storable may not be able to read your data. The endianness of the system creating and retrieving the data also matters, although the network-order storage options help reduce the portability issues. You can store an arbitrarily nested data structure to a file or string and restore it. Storable is supported only by Perl.
YAML is a text-based format that works like Storable: you can store and restore arbitrary structures to/from YAML files. YAML is nice because there are YAML libraries for several languages. It's not quite as speedy or space-efficient as Storable.
JSON is a popular data exchange format with support in many languages. It is very much like YAML in both its strengths and weaknesses.
DBD::SQLite is a database driver for the DBI that allows you to keep a whole relational database in a single file. It is powerful and allows you work with many of the persistence tools that are aimed at other databases like MySQL and Postgres.
DBM::Deep is a convenient and powerful pure-Perl module that allows efficient retrieval and modification of small parts of a large persistent data structure. It is almost as easy to use as Storable, but far more efficient when dealing with only small portions of a large structure.
Update: I realized that I should mention that I have used all of these modules and depending on your particular needs, any of them could be "the right choice".
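As a concrete point of comparison for the text formats above, here is a JSON round-trip using the core JSON::PP module (the file name and settings are made up):

```perl
use strict;
use warnings;
use JSON::PP;   # in core since Perl 5.14

my %settings = ( retries => 3, hosts => [ 'a', 'b' ] );

# Write: encode the structure to JSON text and save it.
open my $out, '>', 'settings.json' or die $!;
print {$out} JSON::PP->new->pretty->encode(\%settings);
close $out;

# Read: slurp the file and decode it back into a structure.
open my $in, '<', 'settings.json' or die $!;
my $loaded = JSON::PP->new->decode(do { local $/; <$in> });
close $in;

print $loaded->{retries}, "\n";   # prints 3
```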
You might want to try Tie::Storable. Then it's as simple as addressing a hash.
If you're not looking to store a ton of data and you're OK loading everything all at once at program startup, it might be the way to go.
If you're looking for something more sophisticated but still light weight, a lot of people (including myself) swear by SQLite.
If I had to do this I would probably go with DBI and DBD::SQLite, since it does not involve reading all the data into memory, but I'd just like to mention a few other ways, because "there's more than one way to do it":
The old way to do this was with DB_File and its cousins. It still works with modern versions of Perl. The drawback is that it is only useful for storing a one-dimensional hash (a hash which doesn't contain any references). The advantages are that you can find nice, inexpensive books about it, as well as online articles, and, I believe, it doesn't involve reading the whole file into memory.
Another method is to print the contents of Data::Dumper to a file to store, and eval the contents of the file to read the data.
Yet another thing which hasn't been mentioned is KiokuDB, which looks like the cutting-edge Moose-based module, if you want to be trendy.
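The Data::Dumper method from the list above can look like this (names are illustrative; `do FILE` is a tidier spelling of the eval-the-file idiom):

```perl
use strict;
use warnings;
use Data::Dumper;

my %config = ( host => 'example.com', port => 8080 );

# Store: dump the structure as evalable Perl source.
$Data::Dumper::Purity = 1;    # emit extra code for self-references
open my $out, '>', 'config.pl' or die $!;
print {$out} Data::Dumper->Dump([ \%config ], ['data']);
close $out;

# Read: `do` compiles and runs the file; the value of the dumped
# assignment (the hashref) is returned.
my $data = do './config.pl' or die "couldn't load config: $@ $!";
print $data->{host}, "\n";    # prints example.com
```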
Do you want your data to be transparently persisted, i.e. you won't have to worry about doing a commit()-type operation after every write? I just asked a very similar question: Simple, modern, robust, transparent persistence of data structures for Perl, and listed all the solutions I found.
If you do want transparent persistence (autocommit), then DBM::Deep may be easier to use than Storable. Here is example code that works out of the box:
use DBM::Deep;
tie my %db, 'DBM::Deep', 'file.db';
if ( exists $db{foo}->{bar} ) {
    print $db{foo}->{bar}, "\n";
} else {
    $db{foo}->{bar} = 'baz';
}
Look into Tie::File and submodules like Tie::File::AsHash, or Tie::Handle::CSV. All available on CPAN, fast and easy to use.
Storable lets you serialize any Perl data structure and read it back in. For in-memory storage, just use IO::Scalar to store into a string, that way you only need to write the code once and for writing to disk you just pass in another I/O handle.
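Note that Storable can also serialize straight to and from an in-memory string with its own freeze and thaw pair, with no filehandle (or IO::Scalar) needed:

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

# Serialize to a string instead of a file.
my %h = ( answer => 42 );
my $frozen = freeze(\%h);      # binary string representation

# Ship $frozen over a socket, stash it in a DB column, etc.,
# then reconstruct the structure on the other side.
my $copy = thaw($frozen);
print $copy->{answer}, "\n";   # prints 42
```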