Is there a way in memcached to just get all values? Instead of retrieving the value for a given key, I want to dump everything currently in memcached.
Take a look at the new beta version of memcached (1.6). I believe it supports a new concept called TAP streams, which let you stream all keys out of memcached, along with other kinds of streams.
I am currently using Hazelcast Community Edition as a caching layer for a web application, so it needs to be fast.
Currently, we store a lot of data in it, and the volume keeps growing. As it's an in-memory DB and RAM is expensive, I wanted to know what the best practice is. I was planning to store only a small amount of data in the cache and the rest in MongoDB. I want Hazelcast to persist data to MongoDB, and to fetch data from MongoDB only if it can't find it in the cache.
I have created the MapStore, but I am not sure how to tell it to "look" in MongoDB for data it can't find in the cache. Is it simply a case of calling getMap("something") and, if the result is empty, calling load("") on the MapStore?
Thanks
In EAGER initial load mode, it loads all the entries when the map is initialized (i.e. on hz.getMap()).
In LAZY mode, it loads a partition when that partition is first touched.
Additionally, if you call map.get() and there's no value in the IMap, it will try to load that value from the MapLoader using the MapLoader.load(key) method.
Also, if you call map.put() and there is no value in the IMap, it will call MapLoader.load(key), because put() is supposed to return the previous value. If you want to avoid that, use map.set().
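To make the get/put/set distinction concrete, here is a toy model in plain Java with no Hazelcast dependency. ReadThroughMap and the loader function are hypothetical stand-ins for IMap and MapLoader.load(key). The sketch ignores EAGER/LAZY initial loading, write-through to the MapStore, and partitioning, so treat it only as an illustration of when a load is triggered.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Toy model of read-through semantics; NOT Hazelcast's IMap or MapLoader API.
class ReadThroughMap<K, V> {
    private final Map<K, V> memory = new HashMap<>();
    private final Function<K, V> loader; // hypothetical stand-in for MapLoader.load(key)
    int loadCalls = 0;                   // counts trips to the backing store (e.g. MongoDB)

    ReadThroughMap(Function<K, V> loader) {
        this.loader = loader;
    }

    // On a miss, ask the loader and keep whatever it returns in memory.
    private V loadIfAbsent(K key) {
        V value = memory.get(key);
        if (value == null) {
            loadCalls++;
            value = loader.apply(key);
            if (value != null) {
                memory.put(key, value);
            }
        }
        return value;
    }

    // get(): a miss falls back to the loader.
    V get(K key) {
        return loadIfAbsent(key);
    }

    // put(): must return the previous value, so a miss triggers a load first.
    V put(K key, V value) {
        V previous = loadIfAbsent(key);
        memory.put(key, value);
        return previous;
    }

    // set(): returns nothing, so no load is needed.
    void set(K key, V value) {
        memory.put(key, value);
    }
}
```

In this model, a get() miss and a put() on a missing key each cost one trip to the backing store, while set() never does.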
It would also be good to have a look at the manual section on MapStore/MapLoader; it describes all the subtle differences.
I am using a triple store database for one of my projects (a semantic search engine for healthcare), and it works pretty well. I am considering giving it a performance boost by adding a key-value store layer above the triple store, because triple store queries are slow due to the deep semantic processing we do.
This is how I am planning to improve performance:
1) Run a Hadoop job every day that queries the triple store for all query terms.
2) Cache these results in a key-value store running on a cluster.
3) When a user searches for a query term, search the key-value store first instead of the triple store. The triple store is searched only when the query term is not found in the key-value store.
Each key-value pair I plan to save maps a String to a List of POJOs. I can save the value as a BLOB.
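One way to turn a String-to-List-of-POJOs mapping into a BLOB is plain Java serialization, sketched below. SearchHit is a hypothetical POJO, and Java serialization is only one choice. JSON or a compact binary format would work just as well, as long as the writer and reader agree on it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical POJO; the real one would hold whatever the triple store returns.
class SearchHit implements Serializable {
    private static final long serialVersionUID = 1L;
    final String uri;
    final double score;

    SearchHit(String uri, double score) {
        this.uri = uri;
        this.score = score;
    }
}

class BlobCodec {
    // Serialize the whole list into one byte[] that a key-value store can hold as a BLOB.
    static byte[] toBlob(List<SearchHit> hits) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(new ArrayList<>(hits)); // ArrayList itself is Serializable
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reverse of toBlob: rebuild the list from the stored bytes.
    @SuppressWarnings("unchecked")
    static List<SearchHit> fromBlob(byte[] blob) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(blob))) {
            return (List<SearchHit>) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```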
I am unsure which key-value store to use. I am mainly looking for failover and load-balancing support. All I need is a simple key-value store that provides these features; I do not need to sort or search within values, or any other functionality.
Please correct me if I am wrong. I am assuming memcached and Redis will be faster since they are in memory, but I do not know whether the Java clients for Redis (JRedis) or memcached (spymemcached) support failover. I am not sure whether to go with in-memory or persistent storage. I am also considering Voldemort, Cassandra and HBase. In total, the key-value data will be around 2 GB to 4 GB. Any pointers on this would be really helpful.
I am very new to NoSQL and key-value stores. Please let me know if you need any more details.
Have you gone over the memcached tutorial articles? They explain load balancing (memcached instances balance load based on your key's hash) and also discuss how spymemcached handles connectivity failures:
Use Memcached for Java enterprise performance, Part 1: Architecture and setup http://www.javaworld.com/javaworld/jw-04-2012/120418-memcached-for-java-enterprise-performance.html
Use Memcached for Java enterprise performance, Part 2: Database-driven web apps http://www.javaworld.com/javaworld/jw-05-2012/120515-memcached-for-java-enterprise-performance-2.html
For enterprise-grade failover and cross-data-center replication with memcached, you should use Couchbase, which offers these features. The product evolved from memcached.
Before you build infrastructure to pre-load your cache, you might try simply adding memcached on top of your existing system. First, measure your current performance carefully; I suggest JMeter or a similar tool. The workflow in your application would be: check memcached, and if the result is there, you're done. If not, run the query against the triple store and save the results in memcached. This will improve performance if you have repeated queries. Memcached uses the memory you give it efficiently, discarding items that are not used often. Failover is handled by your application: if a result is not in memcached, you fall back to your existing infrastructure.
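The workflow above is the cache-aside pattern. Here is a minimal sketch, assuming a HashMap as a stand-in for the memcached client and a Function as a stand-in for the triple-store query. Both are hypothetical, and a real memcached set would also take an expiration time.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Cache-aside lookup: check the cache first, fall back to the slow store on a miss.
class CacheAside {
    private final Map<String, List<String>> cache = new HashMap<>();  // stand-in for memcached
    private final Function<String, List<String>> tripleStore;         // stand-in for the SPARQL query
    int storeQueries = 0;                                              // how often we hit the slow path

    CacheAside(Function<String, List<String>> tripleStore) {
        this.tripleStore = tripleStore;
    }

    List<String> search(String term) {
        List<String> cached = cache.get(term);
        if (cached != null) {
            return cached;                            // hit: done
        }
        storeQueries++;
        List<String> fresh = tripleStore.apply(term); // miss: run the slow query
        cache.put(term, fresh);                       // save it for the next request
        return fresh;
    }
}
```

Because memcached may evict an entry at any time, the miss path will always be exercised sooner or later. That is also what gives you the fallback behaviour described above.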
We use a triple store and cache data in the memcache service provided by Google App Engine, and it works fine. It reduced the overhead of SPARQL queries against the triple store.
Of the stores you mention, only Cassandra has all the features you listed, plus full CQL support, which helps with maintenance; otherwise, you might look in another direction:
Write heavy, replicated, bigger-than-memory key-value store
Since you just want to cache data in front of your triple store, going with disk-based or replicated/distributed key-value stores seems pointless. All you really need is to cache query results right on the machines where those queries run. No "key-value stores", just vanilla Java caching solutions.
In 2016 the best cache for Java is Caffeine.
I have an application running on Ubuntu Linux 12.04 that needs to store and retrieve a large number of large serialized objects. Currently the store is implemented by simply saving the serialized streams as files, where each filename is the MD5 hash of the serialized object. However, I would like to speed things up by replacing the file store with one that caches recently read/written objects in memory, and preferably computes the hash for me.
The design of my application should not get any more complicated, so ideally I would use a storage back end that manages a key-value database and caching in an abstracted and efficient way. I am a bit lost among all the key/value stores out there, and much of the information I find seems outdated. I was initially looking at something like memcached+membase, but maybe there are better solutions. I looked into Redis, MongoDB and CouchDB, but it is not quite clear to me whether they fit my needs.
My most important requirements:
Transparent saving to a persistent store, where the most recently written/read objects are quickly available because they are automatically cached in memory.
The store should survive a reboot, so in-memory objects should be saved to disk as soon as possible.
Currently I calculate the MD5 manually. It would be nicer if the back end did this for me: I would get the hash key back when an object is stored, and could later retrieve the object using that hash key.
A big plus would be packages available for Ubuntu 12.04, either in universe or through Launchpad or similar.
Other than this, the software should preferably be lightweight and no more complicated than necessary (I don't need distributed map-reduce jobs, etc.).
Thanks for any advice!
I would normally suggest Redis, because it is fast and in-memory with asynchronous persistence to disk. You'll also find you can use its different data types for other purposes, so it's not as single-purpose as memcached. As for auto-hashing, I don't think it does that, since you define your own keys when you store objects (as with most of these stores).
One downside of Redis is that if you're storing a TON of binary objects, you'll be limited by the available RAM (unless you shard), so you could hit performance limits. In that case you could store the objects on the file system, hash them, and store the keys in Redis, mapped to the filenames on the file server, and you'd be fine.
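Here is a sketch of that hybrid layout, assuming the objects arrive as serialized byte arrays. The file name is the MD5 of the content, computed with java.security.MessageDigest, and a HashMap stands in for the Redis index of keys to files. HashedFileStore and its methods are hypothetical names, and in a real setup the index would live in Redis rather than in process memory.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Content-addressed blob store: file name = MD5 of the bytes; index maps hash -> file.
class HashedFileStore {
    private final Path dir;
    private final Map<String, Path> index = new HashMap<>(); // stand-in for Redis

    HashedFileStore(Path dir) {
        this.dir = dir;
    }

    static HashedFileStore inTempDir() {
        try {
            return new HashedFileStore(Files.createTempDirectory("blobstore"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static String md5Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(data);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b)); // two lowercase hex digits per byte
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Writes the blob to <dir>/<md5> (once) and returns the hash to use as the key.
    String put(byte[] blob) {
        String key = md5Hex(blob);
        Path file = dir.resolve(key);
        try {
            if (!Files.exists(file)) {
                Files.write(file, blob);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        index.put(key, file);
        return key;
    }

    // Returns the stored bytes, or null if the key is unknown.
    byte[] get(String key) {
        Path file = index.get(key);
        if (file == null) {
            return null;
        }
        try {
            return Files.readAllBytes(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```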
--
An alternative option would be to check out Elasticsearch, which is like Mongo in that it stores objects natively as JSON, but it adds the Lucene search engine on top, with a RESTful API. It "warms up" data in memory for fast responses, but is also a persistent store, and the nicest part is that it auto-shards and auto-clusters, using multicast to find other nodes.
--
Hope that helps and if so, share the love! ;-)
I'd look at MongoDB. It caches data efficiently by letting the OS page data in and out, and it is pretty simple to set up. Redis and Memcached won't be good solutions for you because they keep everything in RAM. Other, simpler solutions like LevelDB or BDB would probably also be suitable. I don't think any database is going to compute hashes automatically for you, but it sounds like you already have code for this.
I am trying to understand why I would need a solution like memcached. It may seem like a silly question, but what does it bring to the table if all I need is to cache objects? Won't a simple hashmap do?
Quoting from the memcached website, memcached is…
Free & open source, high-performance, distributed memory object caching system, generic in nature, but intended for use in speeding up dynamic web applications by alleviating database load.
Memcached is an in-memory key-value store for small chunks of arbitrary data (strings, objects) from results of database calls, API calls, or page rendering. Memcached is simple yet powerful. Its simple design promotes quick deployment, ease of development, and solves many problems facing large data caches. Its API is available for most popular languages.
At heart it is a simple Key/Value store.
A key word here is distributed. In general, quoting from the memcached site again,
Memcached servers are generally unaware of each other. There is no crosstalk, no synchronization, no broadcasting. The lack of interconnections means adding more servers will usually add more capacity as you expect. There might be exceptions to this rule, but they are exceptions and carefully regarded.
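Because the servers never coordinate, the client decides which server owns a key, typically by hashing the key. Here is a minimal sketch, assuming CRC32 and simple modulo placement over a hypothetical server list. Many real clients use consistent hashing (for example ketama) instead, so that adding a server remaps fewer keys.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.zip.CRC32;

// Client-side sharding: each key maps to exactly one server; servers never talk to each other.
class KeyRouter {
    private final List<String> servers; // hypothetical "host:port" entries

    KeyRouter(List<String> servers) {
        this.servers = servers;
    }

    String serverFor(String key) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        // getValue() is a non-negative 32-bit value, so the modulo is a valid index.
        int index = (int) (crc.getValue() % servers.size());
        return servers.get(index);
    }
}
```

Every client that uses the same server list and hash function agrees on where a key lives, which is why adding servers adds capacity without any server-to-server traffic.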
I would highly recommend reading the detailed description of memcached.
Where are you going to put this hashmap? That's what memcached does for you. Any structure you build in PHP only exists until the request ends. If you put data in a persistent cache, you can fetch it back for other requests instead of rebuilding it.
I know this question is rather old, but in addition to being able to share a cache across multiple servers, there is another aspect not mentioned in the other answers: value expiration.
If you store the values in a HashMap bound to the application context, it will keep growing unless you expire items somehow. Memcached expires objects lazily, for maximum performance.
When an item is added to memcached, it can have an expiration time, for instance 600 seconds. After the item expires it simply remains in memory, but when a client next requests it, memcached purges it and returns a miss (null).
Similarly, when memcached's memory is full, it looks for an expired item of adequate size and reclaims it to make room for the new item. Finally, if the cache is full and there is no expired item to reclaim, it evicts the least recently used items.
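The lazy-expiration and eviction behaviour described above can be modelled in a few lines of Java. This is a simplified sketch, not memcached's implementation. It uses a single LinkedHashMap in access order for LRU, has no slab classes or item sizes, measures TTLs in milliseconds rather than seconds, and takes an injected clock (a hypothetical LongSupplier) so that expiry is easy to test. A TTL of 0 means "never expire", mirroring memcached's convention.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Simplified model of memcached-style lazy expiration plus LRU eviction.
class ExpiringLruCache<K, V> {
    // Named Item (not Entry) to avoid clashing with Map.Entry inside the anonymous subclass.
    private static final class Item<V> {
        final V value;
        final long expiresAt; // absolute time in ms; Long.MAX_VALUE means "never"

        Item(V value, long expiresAt) {
            this.value = value;
            this.expiresAt = expiresAt;
        }
    }

    private final int capacity;
    private final LongSupplier clock;
    private final LinkedHashMap<K, Item<V>> map;

    ExpiringLruCache(int capacity, LongSupplier clock) {
        this.capacity = capacity;
        this.clock = clock;
        // accessOrder = true: iteration runs from least to most recently used.
        this.map = new LinkedHashMap<K, Item<V>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, Item<V>> eldest) {
                return size() > capacity; // evict the LRU item when over capacity
            }
        };
    }

    void set(K key, V value, long ttlMillis) {
        long now = clock.getAsLong();
        if (!map.containsKey(key) && map.size() >= capacity) {
            // Prefer reclaiming expired items over evicting a live one.
            map.entrySet().removeIf(e -> now >= e.getValue().expiresAt);
        }
        long expiresAt = ttlMillis > 0 ? now + ttlMillis : Long.MAX_VALUE;
        map.put(key, new Item<>(value, expiresAt));
    }

    // Lazy expiration: an expired item is only purged when someone asks for it.
    V get(K key) {
        Item<V> item = map.get(key);
        if (item == null) {
            return null;
        }
        if (clock.getAsLong() >= item.expiresAt) {
            map.remove(key);
            return null;
        }
        return item.value;
    }

    int size() {
        return map.size();
    }
}
```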
Using a full-fledged cache system usually lets you replicate the cache across many servers, or simply scale out to many servers to handle a large number of parallel requests, while keeping response times acceptably fast.
There is an (old) article that compares different caching systems used by php:
https://www.percona.com/blog/2006/08/09/cache-performance-comparison/
Basically, in that benchmark file caching was faster than memcached.
So, to answer the question, I believe you would get better performance with a file-based cache system.
Here are the results from the tests of the article:
Cache type                            Cache gets/sec
Array Cache                                   365000
APC Cache                                      98000
File Cache                                     27000
Memcached Cache (TCP/IP)                       12200
MySQL Query Cache (TCP/IP)                      9900
MySQL Query Cache (Unix Socket)                13500
Selecting from table (TCP/IP)                   5100
Selecting from table (Unix Socket)              7400
I'm writing a lot of key/value pairs to memcached, with keys like PREFIX_KEY1, PREFIX_KEY2, PREFIX_KEY3.
I need to get all the keys that start with PREFIX_.
Is it possible?
Sorry, but no. Memcached uses a hashing algorithm that distributes keys at apparently random places, and so those keys are scattered all over. You'd have to scan everything to find them.
Also, you should be aware that, by design, memcached can drop any key at any time for any reason. If you put something in, you can't depend on getting it back out. This is absolutely fine for its original use case, a cache to reduce hits on a database, but it can be a severe problem if you want to do something more complicated with it.
If these limitations are a problem, I would suggest that you use Redis instead. It behaves a lot like memcached, except that it will persist data and it lets you store complex data structures. So for your use case you can store a hash in Redis, then pull the whole hash out later.
A quick command to check whether a specific key exists (the key pattern is a grep regular expression):
for i in {1..40}; do (echo "stats cachedump $i 0"; sleep 1; echo "quit";) | telnet localhost 11211 | grep 'APREFIX\|ANOTHERPREFIX'; done
i is the slab number.
In the example above, we search slabs 1 to 40.
Don't miss the grep part, 'APREFIX\|ANOTHERPREFIX' ;)
based on the discussion at https://groups.google.com/forum/#!topic/memcached/YyzonP9HUi0
While @btilly is correct that memcached does not do this natively, you can emulate it (quite efficiently) by maintaining an index of the keys that share your prefix, which then lets you fetch all entries matching that prefix.
Obviously this will only work for specific keys that you choose in advance and not arbitrary data, but it's quite workable if you can live with that limitation. There is a good article on this subject by one of the memcache developers.
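Here is a minimal sketch of that index approach, assuming a HashMap as a stand-in for memcached and a hypothetical naming convention for the index entry. In real memcached, the index and the items can be evicted independently, and concurrent writers would need gets/cas to update the index without losing entries.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Emulates "get all keys with a prefix" by storing an index entry next to the data.
class PrefixIndexedCache {
    private final Map<String, Object> store = new HashMap<>(); // stand-in for memcached

    // Hypothetical naming convention for the index entry of a prefix.
    private static String indexKey(String prefix) {
        return prefix + "_index_";
    }

    @SuppressWarnings("unchecked")
    private Set<String> readIndex(String prefix) {
        Object raw = store.get(indexKey(prefix));
        if (raw == null) {
            return Collections.emptySet();
        }
        return (Set<String>) raw;
    }

    void put(String prefix, String suffix, Object value) {
        String fullKey = prefix + suffix;
        store.put(fullKey, value);
        // Read-modify-write of the index; a real client would use gets/cas here.
        Set<String> index = new LinkedHashSet<>(readIndex(prefix));
        index.add(fullKey);
        store.put(indexKey(prefix), index);
    }

    Map<String, Object> getAll(String prefix) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (String key : readIndex(prefix)) {
            Object value = store.get(key); // a real client would issue one multi-get
            if (value != null) {           // skip items that have been evicted
                result.put(key, value);
            }
        }
        return result;
    }
}
```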
You can use namespaces to do what you need. Here is a PHP library that does this. It also lets you use the same memcached instance for multiple applications.
https://github.com/vijayabose/n_memcached
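A common way to namespace keys in a shared memcached is to prefix every key with the application name plus a version number stored in the cache itself. Bumping the version makes every old key unreachable at once, and memcached evicts the stale items over time. The sketch below uses a HashMap as a stand-in for memcached, and NamespacedCache is a hypothetical name. The linked library may use a different scheme.

```java
import java.util.HashMap;
import java.util.Map;

// Namespaced keys so several applications can share one memcached.
class NamespacedCache {
    private final Map<String, Object> store; // shared stand-in for memcached
    private final String namespace;

    NamespacedCache(Map<String, Object> store, String namespace) {
        this.store = store;
        this.namespace = namespace;
    }

    // Current version of this namespace; initialised to 1 on first use.
    private long version() {
        Object v = store.get("ns:" + namespace);
        if (v == null) {
            store.put("ns:" + namespace, 1L);
            return 1L;
        }
        return (Long) v;
    }

    private String fullKey(String key) {
        return namespace + ":" + version() + ":" + key;
    }

    void set(String key, Object value) {
        store.put(fullKey(key), value);
    }

    Object get(String key) {
        return store.get(fullKey(key));
    }

    // Invalidates every key in this namespace by bumping the version
    // (with real memcached, the atomic incr command fits this step).
    void clear() {
        store.put("ns:" + namespace, version() + 1);
    }
}
```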