Looking for a light-weight data persistence solution in Perl

In my app I need to store some simple data both in memory and on disk. A real database would be overkill in my case, so I need something lighter to handle this simple data persistence requirement. I did some Google searching on my own and found some interesting options like DBM and DBD::CSV, but there are so many choices that it is difficult for me to actually pick one. So I'd like to ask here: what is the "best-practice" light-weight data persistence solution in Perl?

You have several options:
Storable is a core module and is very efficient. It has some problems with portability: for example, someone using an older version of Storable may not be able to read your data, and the endianness of the systems creating and retrieving that data matters. The network-order storage functions help reduce the portability issues. You can store an arbitrarily nested data structure to a file or string and restore it. Storable is supported only by Perl. (See the sketch below.)
YAML is a text-based format that works like Storable: you can store and restore arbitrary structures to/from YAML files. YAML is nice because there are YAML libraries for several languages. It's not quite as speedy or space-efficient as Storable.
JSON is a popular data exchange format with support in many languages. It is very much like YAML in both its strengths and weaknesses.
DBD::SQLite is a database driver for the DBI that allows you to keep a whole relational database in a single file. It is powerful and allows you to work with many of the persistence tools that are aimed at other databases like MySQL and Postgres.
DBM::Deep is a convenient and powerful Perl-only module that allows efficient retrieval and modification of small parts of a large persistent data structure. It is almost as easy to use as Storable, but far more efficient when dealing with only small portions of a large data structure.
Update: I realized that I should mention that I have used all of these modules and depending on your particular needs, any of them could be "the right choice".
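To make the Storable option concrete, here is a minimal sketch using its network-order variant; the file name and data are made up:

use Storable qw(nstore retrieve);

my %config = ( hosts => [ 'alpha', 'beta' ], retries => 3 );

# nstore() writes in network byte order, which sidesteps the endianness
# issue mentioned above; retrieve() reads either byte order.
nstore( \%config, 'config.stor' );

my $loaded = retrieve('config.stor');    # hashref with the same structure
print $loaded->{retries}, "\n";          # prints 3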

You might want to try Tie::Storable. Then it's as simple as addressing a hash.
If you're not looking to store a ton of data and you're OK loading everything all at once at program startup, it might be the way to go.
If you're looking for something more sophisticated but still lightweight, a lot of people (including myself) swear by SQLite.

If I had to do this I would probably go with DBI and DBD::SQLite, since it does not involve reading all the data into memory, but I'd just like to mention a few other ways, because "there's more than one way to do it":
The old way to do this was with DB_File and its cousins. It still works with modern versions of Perl. The drawback is that it is only useful for storing a one-dimensional hash (a hash which doesn't have any references in it). The advantage is that you can find nice books about it which don't cost very much money, as well as online articles, and I believe it doesn't involve reading the whole file into memory.
Another method is to print the output of Data::Dumper to a file to store the data, and eval the contents of that file to read it back (a sketch follows).
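Here is a minimal sketch of that Data::Dumper round trip; the file name and variable names are made up:

use Data::Dumper;

my %store = ( apples => 3, pears => 7 );

# Write: Data::Dumper emits valid Perl source for the structure.
open my $out, '>', 'store.pl' or die "Cannot write: $!";
print {$out} Data::Dumper->Dump( [ \%store ], ['data'] );
close $out;

# Read: do() is a file-level eval; it returns the value of the last
# expression in the file, here the hashref assigned to $data.
my $restored = do './store.pl' or die "Cannot load store.pl: $@";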
Yet another thing which hasn't been mentioned is KiokuDB, which looks like the cutting-edge Moose-based module, if you want to be trendy.

Do you want your data to be transparently persisted, i.e. you won't have to worry about doing a commit()-type operation after every write? I just asked a very similar question: Simple, modern, robust, transparent persistence of data structures for Perl, and listed all the solutions I found.
If you do want transparent persistence (autocommit), then DBM::Deep may be easier to use than Storable. Here is example code that works out of the box:
use DBM::Deep;

# Tie %db to a file; every change is written through to disk automatically.
tie my %db, 'DBM::Deep', 'file.db';

if ( exists $db{foo}->{bar} ) {
    print $db{foo}->{bar}, "\n";
} else {
    $db{foo}->{bar} = 'baz';    # persisted immediately, no commit needed
}

Look into Tie::File and submodules like Tie::File::AsHash, or Tie::Handle::CSV. All available on CPAN, fast and easy to use.
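For example, a minimal Tie::File sketch (the file name is made up); each array element maps to one line of the file, and edits write straight through:

use Tie::File;

tie my @lines, 'Tie::File', 'data.txt' or die "Cannot tie: $!";

push @lines, 'a new record';     # appends a line to the file
$lines[0] = 'updated header';    # rewrites the first line in place
print scalar @lines, " lines\n";
untie @lines;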

Storable lets you serialize any Perl data structure and read it back in. For in-memory storage, just use IO::Scalar to store into a string; that way you only need to write the code once, and for writing to disk you just pass in another I/O handle.
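A sketch of both directions; note that Storable's own freeze/thaw pair also covers the in-memory case without an extra I/O layer (the data here is made up):

use Storable qw(freeze thaw store retrieve);

my %data = ( user => 'alice', visits => 42 );

# In memory: freeze() to a byte string, thaw() back to a structure.
my $frozen = freeze(\%data);
my $copy   = thaw($frozen);

# On disk: the same structure, written to and read from a file.
store( \%data, 'data.stor' );
my $from_disk = retrieve('data.stor');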

Light weight data store in Perl

My requirement is to maintain a simple data store with some rows (~1000) & columns (6).
Over a period of time (2 years) I expect the data to grow to 1000-1500 lines/rows.
I would like to query, insert & update the data store.
I need this data store because it needs to be processed by another script.
I am using Perl for programming.
I have seen some threads on Stack Overflow about this (e.g. "looking for light-weight data persistence solution in perl") but I cannot make a decision.
Is anyone using a light-weight data store in Perl with query, insert & update capabilities?
Go for SQLite. It is powerful, tunable, and lightweight.
You've already accepted an answer, but in your case I might just go with hashes and use Storable to write the structure to disk. That is, if you don't have multiple people using the data at once.
The advantage is that it's all standard Perl, so it will work with almost any Perl installation. Can't get any lighter weight than this.
Probably the simplest lightweight solution would be to use DBI with DBD::SQLite.
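A minimal sketch of the query/insert/update cycle the question asks for; the table and column names are made up:

use DBI;

my $dbh = DBI->connect( 'dbi:SQLite:dbname=store.db', '', '',
                        { RaiseError => 1, AutoCommit => 1 } );

$dbh->do('CREATE TABLE IF NOT EXISTS records (
              id INTEGER PRIMARY KEY, name TEXT, value TEXT)');

# insert
$dbh->do( 'INSERT INTO records (name, value) VALUES (?, ?)',
          undef, 'foo', 'bar' );

# query
my $rows = $dbh->selectall_arrayref(
    'SELECT * FROM records WHERE name = ?', { Slice => {} }, 'foo' );

# update
$dbh->do( 'UPDATE records SET value = ? WHERE name = ?',
          undef, 'baz', 'foo' );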
If your data is relational and you are comfortable with SQL then I vote DBD::SQLite.
However if your data is more like documents (each entry data is self contained) or if you are not comfortable with SQL then I recommend DBM::Deep. Its interface is exactly as easy to use as regular Perl variables.
Finally, if you want to be really modern, MongoDB is very easy to install and the new Mango Perl module is very cool, just saying :-)

memcached-like software with disk persistence

I have an application that runs on Ubuntu Linux 12.04 which needs to store and retrieve a large number of large serialized objects. Currently the store is implemented by simply saving the serialized streams as files, where the filename equals the md5 hash of the serialized object. However, I would like to speed things up by replacing the file store with one that does in-memory caching of recently read/written objects, and preferably does the hashing for me.
The design of my application should not get any more complicated. Hence, preferable would be a storage back-end that manages a key-value database and caching in an abstracted and efficient way. I am a bit lost with all of the key-value stores that are out there, and much of the information seems outdated. I was initially looking at something like memcached+membase, but maybe there are better solutions out there. I looked into Redis, MongoDB, and CouchDB, but it is not quite clear to me whether they fit my needs.
My most important requirements:
Transparent saving to a persistent store, so that the most recently written/read objects are quickly available through automatic in-memory caching.
The store should survive a reboot, hence in-memory objects should be saved to disk as soon as possible.
Currently I am calculating the md5 manually. It would actually be nicer if the back-end did this for me: the ability to get the hash key when an object is stored, and to retrieve the object later using that hash key.
A big plus would be packages available for Ubuntu 12.04, either in universe or through Launchpad or wherever.
Other than this, the software should preferably be light and not more complicated than necessary (I don't need distributed map-reduce jobs, etc.).
Thanks for any advice!
I would normally suggest Redis because it is fast and in-memory with an asynchronous persistent store. Plus you'll find you can use its different data types for other purposes, so it's not as single-purpose as memcached. As for auto-hashing, I don't think it does that, since you define your own keys when you store objects (as in most of these stores).
One downside to Redis is that if you're storing a TON of binary objects, you'll be limited by available RAM (unless you shard), so you could hit performance limits. In that case you can store the objects on the file system, hash them, store the keys in Redis, and map each key to the filename on the file server; then you'd be fine (see the sketch below).
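A hedged sketch of that key-by-content-hash pattern using the Redis CPAN client; the serializer here (Storable's nfreeze) is just a stand-in for whatever your application already uses:

use Redis;
use Storable qw(nfreeze thaw);
use Digest::MD5 qw(md5_hex);

my $redis = Redis->new;                     # assumes redis-server on localhost

my $object = { id => 1, payload => 'big' }; # stand-in for a real object
my $blob   = nfreeze($object);              # serialize; your app may differ

my $key = md5_hex($blob);                   # hash the content ourselves; Redis won't
$redis->set( $key => $blob );

my $back = thaw( $redis->get($key) );       # retrieve and deserialize later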
--
An alternate option would be to check out ElasticSearch, which is like Mongo in that it stores objects natively as JSON, but it includes the Lucene search engine on top with a RESTful API. It "warms up" data in memory for fast response, but is also a persistent store, and the nicest part is that it auto-shards and auto-clusters using multicast to find other nodes.
--
Hope that helps and if so, share the love! ;-)
I'd look at MongoDB. It caches things efficiently, using your OS to page data in and out, and is pretty simple to set up. Redis and Memcached won't be good solutions for you because they keep everything in RAM. Other, simpler solutions like LevelDB or BDB would probably also be suitable. I don't think any database is going to compute hashes automatically for you. It sounds like you already have code for this, though.

Perl DBM vs. Storable

For my current project I need to store a little database on disk that I read once when my program runs and write once.
I have looked into Perl's DBM functionality, and from what I understand it provides merely a hash that is stored on disk, with every read and write going directly to disk.
My question is: could I not simply use Storable or any of the related modules to achieve the same (a persistent hash) with far less file I/O overhead? (The hashes will never be too large to fit into memory easily.)
Regards
Nick
SQLite is fast becoming the standard for simple on-disk databases. And in Perl you can just use DBD::SQLite and you're good to go.
Since the previous answers didn't really answer your actual question, "yes, you can"... with the following caveats:
Storable isn't really suited to concurrent access.
You will need to roll your own "atomic" update (i.e. write to a tmp file, then rename; see the sketch below).
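A minimal sketch of that tmp-file-then-rename idiom with Storable; the file names are made up:

use Storable qw(store retrieve);

sub save_hash {
    my ($href, $path) = @_;
    my $tmp = "$path.tmp.$$";    # same directory, so rename() is atomic
    store( $href, $tmp );
    rename $tmp, $path or die "Cannot rename $tmp to $path: $!";
}

my $href = -e 'db.stor' ? retrieve('db.stor') : {};   # read once at startup
$href->{counter}++;                                   # work on it in memory
save_hash( $href, 'db.stor' );                        # write once at the end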
If performance isn't really an issue, you could also use Data::Dumper (with the resulting file being somewhat human readable).
You could splat the contents to CSV.
I often use Dumper when there is only going to be a single task accessing the file - and it gives me a way to read/modify the contents if I see fit.

Processing 2 million records with Perl

I have 2 million records in the database. Is it possible to fetch them all and store them in a Perl hash reference without running out of memory?
What is your reason for reading them all into memory? Speed, or ease of coding (i.e. treating the whole thing as a hashref)?
If it's the former, then sure, I think; you just need a ton of RAM.
If it's the latter, then there are interesting options. For example, there are tied interfaces for databases that look like native Perl hashes but in reality query and return data as needed. A quick search of CPAN shows Tie::DBI, Tie::Hash::DBD, and several tied interfaces for specific databases, flat-file DBs, and CSV files, including my own Tie::Array::CSV.
On the one hand, processing two million elements in a hash isn't unheard of. However, we don't know how big your records are. At any rate, it sounds like an XY problem. It may not be the best solution for the problem you're facing.
Why not use DBIx::Class so that your tables can be treated like Perl classes (which are themselves glorified data-structures)? There's a ton of documentation at DBIx::Class::Manual::DocMap. This is really what DBIx::Class is all about; letting you abstract away the SQL details of the database and treat it like a series of classes.
That completely depends on how much data your records have. Perl hashes and arrays take up more memory than you'd think, although it's not crazy. But again, it totally depends on what your data looks like and how much RAM you have. Perl won't have any problems with it if you have the RAM.
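If loading everything turns out to be the wrong move, here is a hedged sketch of the streaming alternative mentioned above; the DSN, table, and handler are made up:

use DBI;

my $dbh = DBI->connect( 'dbi:SQLite:dbname=big.db', '', '',
                        { RaiseError => 1 } );

# Stream rows one at a time instead of pulling 2 million into a hashref.
my $sth = $dbh->prepare('SELECT id, payload FROM records');
$sth->execute;
while ( my $row = $sth->fetchrow_hashref ) {
    process_record($row);
}

sub process_record { my ($row) = @_; print "$row->{id}\n" }   # stand-in handler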

What is the Storable module used for?

I am having a hard time understanding what Storable does.
I know that it "stores" a variable to disk, but why would I need to do that? What would I use this module for, and how would I do it?
Reasons that spring to mind:
Persist memory across script calls
Sharing variables across different processes (sometimes it isn't possible to pipe stuff)
Of course, that's not all that Storable does. It also:
Makes it possible to create deep clones of data structures
Serializes the data structure stored, which implies a smaller file footprint than output from Data::Dump
Is optimized for speed (so it's faster to retrieve than to require a file containing Data::Dump output)
One example:
Your program spends a long time populating a data structure such as a graph or trie; if the program crashes, you'd lose it all and have to start again from square one. To avoid losing this data and to be able to continue where you left off last time, you can save a snapshot of the data to a file manually, or simply use Storable.
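A minimal sketch of that snapshot pattern, plus the deep-clone use mentioned above; the structure and file name are made up:

use Storable qw(store retrieve dclone);

my %trie = ( a => { b => { c => 1 } } );    # stand-in for a slow-to-build structure

store( \%trie, 'snapshot.stor' );           # snapshot, so a crash loses nothing

my $restored = retrieve('snapshot.stor');   # after a restart, pick up here

my $copy = dclone(\%trie);                  # deep clone: a fully independent copy
$copy->{a}{b}{c} = 2;                       # does not affect %trie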