How can I cache the mathobjects like clans and couplets in memory? - algebraixlib

Is it possible to cache MathObjects using algebraixlib? I would like to store MathObjects like clans and couplets in memory and use them whenever I need them.

When you create a MathObject, it is stored in memory. As long as you have a reference to it, it will remain in memory during the runtime of your program. Depending on your needs, there are several ways to cache the MathObjects of interest. One way would be to store them in a dictionary.
If you want to store them between program runs, you need to store them on disk. algebraixlib provides the module algebraixlib.io.mojson for (de-)serializing MathObjects. You can use this module to serialize clans and couplets to disk, and at program startup read them from disk. However, you need to manage the objects and their associated files yourself; there is no 'automatic' cache.
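A minimal sketch of the dictionary approach, with persistence bolted on. The class name and the JSON stand-in serializer are illustrative assumptions; in real code you would plug in the (de-)serialization from algebraixlib.io.mojson instead:

```python
import json
import os
import tempfile

class MathObjectCache:
    """A name -> object cache with optional disk persistence.

    serialize/deserialize are pluggable callables; for algebraixlib you
    would plug in the (de-)serialization from algebraixlib.io.mojson here.
    json.dumps/json.loads stand in for illustration.
    """

    def __init__(self, path, serialize=json.dumps, deserialize=json.loads):
        self._path = path
        self._serialize = serialize
        self._deserialize = deserialize
        self._objects = {}  # in-memory cache: name -> object

    def put(self, name, obj):
        self._objects[name] = obj

    def get(self, name):
        return self._objects.get(name)

    def save(self):
        # Persist every cached object so it survives program restarts.
        with open(self._path, 'w') as f:
            json.dump({name: self._serialize(obj)
                       for name, obj in self._objects.items()}, f)

    def load(self):
        if os.path.exists(self._path):
            with open(self._path) as f:
                self._objects = {name: self._deserialize(s)
                                 for name, s in json.load(f).items()}

path = os.path.join(tempfile.mkdtemp(), 'mathobjects.json')
cache = MathObjectCache(path)
cache.put('my_clan', [[1, 2], [3, 4]])  # a plain list stands in for a clan
cache.save()

cache2 = MathObjectCache(path)          # e.g. on the next program run
cache2.load()
cache2.get('my_clan')                   # -> [[1, 2], [3, 4]]
```

The object/file bookkeeping mentioned above is exactly what `save()`/`load()` do here; there is still no automatic cache, only explicit calls.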

Related

Are recently accessed Realm objects kept in memory?

Suppose I fetch a list of Realm objects and access all the data from the results (causing the data to be loaded into memory). Next, with the reference to the list of objects still around, I fetch one of those objects directly by its primary key. Am I correct to assume that, in this case, the object will be returned without having to hit the disk? What if I no longer had a reference to that original list of results? Might it still be in memory?
Just curious to understand how objects are cached internally by Realm, so I don't unnecessarily try to optimize things in my code (e.g. holding onto objects that I think I'll use again soon) if it's not needed.
Realm memory-maps the file, and only accesses it when you actually read the properties on objects. This means that recently accessed data will still be in memory until the OS has to discard it to free up that RAM for something else, and there is no difference in terms of disk I/O between obtaining a new object from Realm and holding on to an existing object.

memcached-like software with disk persistence

I have an application that runs on Ubuntu Linux 12.04 which needs to store and retrieve a large number of large serialized objects. Currently the store is implemented by simply saving the serialized streams as files, where the filenames equal the md5 hash of the serialized object. However, I would like to speed things up by replacing the file store with one that does in-memory caching of recently read/written objects, and that preferably does the hashing for me.
The design of my application should not get any more complicated. Hence what I'd prefer is a storage back-end that manages a key-value database and caching in an abstracted and efficient way. I am a bit lost among all the key-value stores that are out there, and much of the information seems to be outdated. I initially looked at something like memcached+membase, but maybe there are better solutions out there. I looked into Redis, MongoDB, and CouchDB, but it is not quite clear to me whether they fit my needs.
My most important requirements:
Transparent saving to a persistent store in a way that the most recently written/read objects are quickly available by automatically caching them in memory.
Store should survive a reboot. Hence in-memory objects should be saved to disk as soon as possible.
Currently I am calculating the md5 manually. It would actually be nicer if the back-end did this for me: the ability to get the hash key when an object is stored, and to retrieve the object later using that hash key.
Big plus is that if there are packages available for Ubuntu 12.04, either in universe or through launchpad or whatever.
Other than this, the software should preferably be lightweight and not be more complicated than necessary (I don't need distributed map-reduce jobs, etc.).
Thanks for any advice!
I would normally suggest Redis because it is fast and in-memory, with asynchronous persistence to disk. Plus you'll find you can use its different data types for other purposes, so it's not as single-purpose as memcached. As for auto-hashing, I don't think it does that, since you define your own keys when you store objects (as in most of these stores).
One downside to Redis is that if you're storing a ton of binary objects, you'll be limited to the available RAM (unless you shard), so you could hit performance limitations. In that case you could store the objects on the file system, hash them, store the keys in Redis, and map each key to the filename on the file server; that would work fine.
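The "hash the object, store the blob on the file system, keep the key in your key-value store" pattern described above can be sketched like this. The class name and directory layout are illustrative assumptions; the returned key is what you would put in Redis:

```python
import hashlib
import os
import tempfile

class ContentAddressedStore:
    """Store blobs on the file system under their md5 hash.

    The returned key is what you would keep in Redis (or any key-value
    store) to find the blob again later.
    """

    def __init__(self, root):
        self._root = root
        os.makedirs(root, exist_ok=True)

    def put(self, data: bytes) -> str:
        # Content-addressed: the key is derived from the data itself,
        # so identical objects are stored only once.
        key = hashlib.md5(data).hexdigest()
        with open(os.path.join(self._root, key), 'wb') as f:
            f.write(data)
        return key

    def get(self, key: str) -> bytes:
        with open(os.path.join(self._root, key), 'rb') as f:
            return f.read()

store = ContentAddressedStore(tempfile.mkdtemp())
key = store.put(b'serialized object bytes')
store.get(key)  # -> b'serialized object bytes'
```

This also gives you the "back-end computes the hash for me" requirement from the question: `put` returns the key.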
--
An alternative option would be to check out ElasticSearch, which is like Mongo in that it stores objects natively as JSON, but it also includes the Lucene search engine on top, with a RESTful API. It "warms up" data in memory for fast responses, but it is also a persistent store, and the nicest part is that it auto-shards and auto-clusters, using multicast to find other nodes.
--
Hope that helps and if so, share the love! ;-)
I'd look at MongoDB. It caches things efficiently by using your OS to page data in and out, and it is pretty simple to set up. Redis and memcached won't be good solutions for you because they keep everything in RAM. Other, simpler solutions like LevelDB or BDB would probably also be suitable. I don't think any database is going to compute hashes automatically for you, but it sounds like you already have code for that.

Is there any java memory structures that will automatically page data to disk?

Basically, I am caching a bunch of files in memory. The problem is, if I get too many files cached into memory I can run out of memory.
Is there any sort of Java memory structure that will automatically page part of itself to disk?
For example, I would like to set a 2 MB limit on the size of the files in memory. Beyond that limit, I would like some of the files to be written to disk.
Is there any library that does this sort of thing?
Grae
"Files in memory": conventionally, in-memory data is stored in some data structure like a HashMap or whatever, and referred to as a 'file' once it is written to disk. You could code a data storage class that does this programmatically. I don't know of any library that does; it would be pretty useful. In effect you would be implementing virtual memory.
This link might help you:
http://docs.oracle.com/javase/7/docs/api/java/nio/channels/FileChannel.html
EhCache is a general-purpose caching library for Java. One of its options is a disk-backed cache, which overflows to the file system. That seems to be exactly what you need.
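The overflow-to-disk idea that EhCache implements can be sketched in a few lines (shown here in Python for brevity; this is an illustrative toy, not EhCache's actual design): keep an LRU map in memory and spill the oldest entries to files once a byte budget is exceeded.

```python
import os
import tempfile
from collections import OrderedDict

class OverflowCache:
    """Keep at most max_bytes of values in memory; spill the least
    recently used entries to disk, loading them back transparently
    on access."""

    def __init__(self, max_bytes, spill_dir):
        self._max_bytes = max_bytes
        self._dir = spill_dir
        self._mem = OrderedDict()  # key -> bytes, in LRU order
        self._size = 0

    def _spill_path(self, key):
        return os.path.join(self._dir, f'{key}.bin')

    def put(self, key, value: bytes):
        self._mem[key] = value
        self._mem.move_to_end(key)
        self._size += len(value)
        # Evict oldest entries to disk once over budget
        # (always keep at least one entry in memory).
        while self._size > self._max_bytes and len(self._mem) > 1:
            old_key, old_val = self._mem.popitem(last=False)
            self._size -= len(old_val)
            with open(self._spill_path(old_key), 'wb') as f:
                f.write(old_val)

    def get(self, key):
        if key in self._mem:
            self._mem.move_to_end(key)
            return self._mem[key]
        path = self._spill_path(key)
        if os.path.exists(path):
            with open(path, 'rb') as f:
                value = f.read()
            self.put(key, value)  # promote back into memory
            return value
        return None

cache = OverflowCache(max_bytes=10, spill_dir=tempfile.mkdtemp())
cache.put('a', b'x' * 8)
cache.put('b', b'y' * 8)  # total would be 16 bytes, so 'a' is spilled
cache.get('a')            # read back transparently from the spill file
```

In effect this is the "implementing virtual memory" idea from the first answer, done at the application level.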

What is the Storable module used for?

I am having a hard time understanding what Storable does.
I know that it "stores" a variable on your disk, but why would I need to do that? What would I use this module for, and how would I do it?
Reasons that spring to mind:
Persist memory across script calls
Sharing variables across different processes (sometimes it isn't possible to pipe stuff)
Of course, that's not all that Storable does. It also:
Makes it possible to create deep clones of data structures
Serializes the data structure stored, which implies a smaller file footprint than output from Data::Dump
Is optimized for speed (so it's faster to retrieve than to require a file containing Data::Dump output)
One example:
Your program spends a long time populating a data structure such as a graph or a trie; if the program crashes, you'd lose it all and have to start again from square one. To avoid losing this data and to be able to continue where you stopped last time, you can save a snapshot of the data to a file manually, or simply use Storable.
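Perl's Storable has a direct analogue in Python's pickle, and the snapshot pattern described above looks the same in either language. A sketch (the graph is a made-up example):

```python
import pickle
import tempfile

# An expensive-to-build structure (a small graph as adjacency lists).
graph = {'a': ['b', 'c'], 'b': ['c'], 'c': []}

# Snapshot it so a crash doesn't force a rebuild from scratch.
path = tempfile.mktemp(suffix='.snapshot')
with open(path, 'wb') as f:
    pickle.dump(graph, f)

# Later, or on the next run, restore the snapshot.
with open(path, 'rb') as f:
    restored = pickle.load(f)

restored == graph  # the deep structure round-trips intact
```

Storable's `store`/`retrieve` functions play the role of `pickle.dump`/`pickle.load` here.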

How does memcache store data?

I am a newbie to caching and have no idea how data is stored in a cache. I have tried to read a few examples online, but everybody provides code snippets for storing and getting data rather than explaining how data is cached using memcache. I have read that it stores data in key-value pairs, but I am unable to understand where those key-value pairs are stored.
Also, could someone explain why data going into the cache is hashed or encrypted? I am a little confused about the difference between serializing data and hashing data.
A couple of quotes from the Memcache page on Wikipedia:
Memcached's APIs provide a giant hash table distributed across multiple machines. When the table is full, subsequent inserts cause older data to be purged in least recently used (LRU) order.
And
The servers keep the values in RAM; if a server runs out of RAM, it discards the oldest values. Therefore, clients must treat Memcached as a transitory cache; they cannot assume that data stored in Memcached is still there when they need it.
The rest of the page on Wikipedia is pretty informative, and it might help you get started.
They are stored in memory on the server. That way, if you use the same key/value pair often and you know it won't change for a while, you can keep it in memory for faster access.
I'm not deeply familiar with memcached, so take what I have to say with a grain of salt :-)
Memcached is a separate process or set of processes that store a key-value store in-memory so they can be easily accessed later. In a sense, they provide another global scope that can be shared by different aspects of your program, enabling a value to be calculated once, and used in many distinct and separate areas of your program. In another sense, they provide a fast, forgetful database that can be used to store transient data. The data is not stored permanently, but in general it will be stored beyond the life of a particular request (it is possible for Memcached to never store your data, so every read will be a miss, but that's generally an indication that you do not have it set up correctly for your use case).
The data going into cache does not have to be hashed or encrypted (but both things can happen to the data, depending on the caching mechanism.)
Serializing data actually has nothing to do with either concept; instead, it is the process of converting data from one format (generally one suited to in-memory use) to another (generally one suited to storage in a persistent medium or to transmission). Other terms for this process are marshalling and unmarshalling.
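The difference between serializing and hashing can be seen concretely (Python used for illustration): serialization is reversible, hashing is not.

```python
import hashlib
import json

data = {'user': 'alice', 'visits': 3}

# Serialization: a reversible change of format (object <-> text/bytes).
serialized = json.dumps(data)
json.loads(serialized) == data  # round-trips back to the original

# Hashing: a one-way, fixed-size digest; the input cannot be recovered.
digest = hashlib.md5(serialized.encode()).hexdigest()
len(digest)  # md5 hex digests are always 32 characters, whatever the input size
```

A memcached client serializes your value so it can be sent over the wire and stored as bytes; any hashing of the key is only used to decide where the entry lives, not to protect or transform the value.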