In-memory key-value store database with multiple levels - memcached

Is there any in-memory key-value datastore available with multiple levels, in which I can store the value itself as another set of key-value pairs, similar to JSON?
Example:
{
school1 : {classroom1 : ['student1','student2'], classroom2 : ['studentx','studenty']},
school2 : {classroom3 : ['student10','student20'], classroom4 : ['studentq','studentw']}
}
I wish to access classroom4 from school2's value without fetching the entire school2 value.

Aerospike is an open source key-value in-memory (persistence optional) clustered database. We refer to the "value" as a "Record" and each record may have one or more "Bins". When retrieving the record you can optionally specify a specific set of bins to retrieve over the network.
Aerospike has clients for many popular languages such as C, Go, Java, and Python. The Go client's Get call, for example, accepts a list of bin names, so a chosen set of bins can be read without transferring the entire record over the network.
Aerospike is ridiculously fast, recently achieving 1 Million transactions per second on a single C3.8xlarge Amazon EC2 instance.
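The answer above points at the Go API; as a rough JavaScript sketch of the same idea, here is how the school/classroom example could be modelled with the Aerospike Node.js client, one bin per classroom. The namespace, set, and method names are my assumptions rather than something taken from the answer, so verify them against the client documentation.
// Sketch only: model each school as a record and each classroom as a bin,
// then read a single bin so the rest of the record never crosses the network.
const Aerospike = require('aerospike');

Aerospike.connect({ hosts: '127.0.0.1:3000' }).then(async client => {
    // One record per school; each classroom stored in its own bin.
    const key = new Aerospike.Key('test', 'schools', 'school2');
    await client.put(key, {
        classroom3: ['student10', 'student20'],
        classroom4: ['studentq', 'studentw']
    });

    // select() is expected to fetch only the named bins.
    const record = await client.select(key, ['classroom4']);
    console.log(record.bins.classroom4);   // ['studentq', 'studentw']

    client.close();
});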

Managing transaction documents with MongoDB

Imagine you have millions of users who perform transactions on your platform. Assuming each transaction is a document in your MongoDB collection, millions of documents would be generated every day, bloating your database in no time. I have received the following suggestions from friends and family.
Having a TTL index on the documents - This won't work because we need those documents stored somewhere so that they can be retrieved at a later point in time when the user asks for them.
Sharding the collection with the timestamp as the key - This doesn't give us control over the time frame after which the data should be backed up.
I would like to understand and implement a strategy somewhat similar to what banks follow. They keep your transactions up to a certain point (e.g. 6 months), after which you have to request them via support or another channel. I assume they follow a hot/cold storage pattern, but I am not completely sure about it.
The whole point is to manage transaction documents and, on a daily basis, back up or move the older records to another place from which they can still be read. Any idea how that is possible with MongoDB?
Update: Sample document (please note that a few other keys in the document have been redacted)
{
"_id" : ObjectId("5d2c92d547d273c1329b49f0"),
"transactionType" : "type_3",
"transactionTimestamp" : ISODate("2019-07-15T14:51:54.444Z"),
"transactionValue" : 0.2,
"userId" : ObjectId("5d2c92f947d273c1329b49f1")
}
First, create a collection where you want to save all records (as in your sample; let's say these entries are stored in a collection named A).
Then take a backup every day at midnight and, once the backup has succeeded, restore it into a new collection named with the timestamp.
After the archived copy has been stored, you can truncate the original collection.
With this approach the live collection stays small, while all historical records are still available.
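As a minimal sketch of this nightly copy-then-truncate step done entirely inside MongoDB (rather than with mongodump/mongorestore), assuming the live collection is called transactions - the collection name and the 24-hour cutoff are illustrative, not from the answer:
// Run once a day (e.g. from a cron-triggered script) against the live database.
const cutoff = new Date(Date.now() - 24 * 60 * 60 * 1000);                 // 24 hours ago
const archiveName = "transactions_" + cutoff.toISOString().slice(0, 10);   // e.g. transactions_2019-07-14

// Copy the older documents into a date-stamped archive collection.
db.transactions.aggregate([
    { $match: { transactionTimestamp: { $lt: cutoff } } },
    { $out: archiveName }
]);

// Only remove them from the live collection after the copy has succeeded.
db.transactions.deleteMany({ transactionTimestamp: { $lt: cutoff } });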

MongoDB : Arrays or not?

I am storing some data in a Mongo database and I'm not sure about the structure I should use... It's about IoT sensors that send a value (temperature, pressure, etc.) at regular intervals. I want to store in a collection (the collection name would be the sensor name) all the values from the sensor, each with its timestamp (I thought about an array), plus the sensor type (like temperature).
Here is an example :
{
    history : [
        { date : ISODate("2016-02-01T11:23:21.321Z"), value : 10.232216 },
        { date : ISODate("2016-02-01T11:26:41.314Z"), value : 10.164892 }
    ],
    type : "temperature"
}
But my problem is that I want to query the database to get the history as a "list" of documents, each one with the date and the value.
At the same time, I want to add a new value to the history each time a new reading arrives.
Thanks
Store every reading in a readings collection like:
{
    date : ISODate("2016-02-01T11:23:21.321Z"),
    value : 10.232216,
    type : "temperature",
    "sensor-name" : "sensor-1"
}
This way you can access readings by type, date, value AND sensor. There is no reason why you would need to create a collection for each sensor.
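A short sketch of what inserting and querying those per-reading documents could look like in the mongo shell - the collection name readings and the index are illustrative choices, not part of the answer:
// Insert one document per measurement.
db.readings.insertOne({
    date : new Date("2016-02-01T11:23:21.321Z"),
    value : 10.232216,
    type : "temperature",
    "sensor-name" : "sensor-1"
});

// Fetch one sensor's history as a plain list of documents, oldest first,
// projecting only the date and the value.
db.readings.find(
    { "sensor-name" : "sensor-1", type : "temperature" },
    { _id : 0, date : 1, value : 1 }
).sort({ date : 1 });

// An index on the fields you filter and sort by keeps these queries fast.
db.readings.createIndex({ "sensor-name" : 1, type : 1, date : 1 });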
Ting Sun's answer is absolutely appropriate: just store each measurement reading as a separate document in a collection. It's up to you whether you arrange a separate collection for each sensor, although putting them all into the same collection seems more natural.
In particular, you should not store items - in your case measurement readings - whose number grows without bound or could become very large in an embedded array of another MongoDB document. This is because:
The size of an individual document is limited to 16 MB (as of MongoDB 3.2).
Frequently modifying the parent document can be inefficient for the database engine's memory management.
Furthermore, queries for individual embedded items/measurements are inefficient and more difficult to implement, because you effectively have to query for the entire parent document.
How you divide readings into collections is completely up to you, whether one collection or multiple. And there are likely good arguments to be had on both sides.
However, regarding arrays: just remember that sensor readings are unbounded. That is, they are potentially infinite in nature - just a continuous flow of readings. MongoDB documents are limited in size (currently 16MB), so with unbounded arrays you will eventually hit this limit, which will result in failed updates and require you to alter your storage architecture to accommodate your sensor readings.
So... you either need to devise a sharding solution to split array data across multiple documents (to avoid document-size-limit issues), or avoid arrays and store readings in separate documents.
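If you do want to keep an array shape, one sketch of the "split array data across multiple documents" idea is a bucket document per sensor and per day. This is my illustration of that suggestion, not something from the answers above, and the collection and field names are made up:
// Append a reading to the bucket for sensor-1 on 2016-02-01, creating the
// bucket if it does not exist yet. Each bucket holds at most one day of data,
// which keeps individual documents far below the 16 MB limit.
db.buckets.updateOne(
    { "sensor-name" : "sensor-1", day : "2016-02-01" },
    {
        $push : { history : { date : new Date("2016-02-01T11:23:21.321Z"), value : 10.232216 } },
        $setOnInsert : { type : "temperature" }
    },
    { upsert : true }
);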

Mongodb - expire subset of data in collection

I am wondering what is the best way to expire only a subset of a collection.
In one collection I store conversion data and click data.
I would like to keep the click data for, let's say, a week,
and the conversion data for a year.
In my collection "customers" I store something like:
{ "_id" : ObjectId("53f5c0cfeXXXXXd"), "appid" : 2, "action" : "conversion", "uid" : "2_b2f5XXXXXX3ea3", "iid" : "2_2905040001", "t" : ISODate("2014-07-18T15:01:00.001Z") }
And
{ "_id" : ObjectId("53f5c0cfe4b0d9cd24847b7d"), "appid" : 2, "action" : "view", "uid" : "2_b2f58679e6f73ea3", "iid" : "2_2905040001", "t" : ISODate("2014-07-18T15:01:00.001Z") }
for the click data
So should I execute an ensureIndex, or set up something like a cronjob?
Thank you in advance
There are a couple of built in techniques you can use. The most obvious is a TTL collection which will automatically remove documents based on a date/time field. The caveat here is that for that convenience, you lose some control. You will be automatically doing deletes all the time that you have no control over, and deletes are not free - they require a write lock, they need to be flushed to disk etc. Basically you will want to test to see if your system can handle the level of deletes you will be doing and how it impacts your performance.
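A sketch of what the TTL route could look like, assuming the collection is named customers and a MongoDB version that supports partial indexes, so that only the click ("view") documents are expired automatically - treat the exact options as something to verify against your server version:
// TTL index on the timestamp field "t": documents matching the partial filter
// are deleted in the background once "t" is older than expireAfterSeconds.
db.customers.createIndex(
    { t : 1 },
    {
        expireAfterSeconds : 60 * 60 * 24 * 7,            // one week
        partialFilterExpression : { action : "view" }     // only expire the click/view documents
    }
);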
Another option is a capped collection - capped collections are pre-allocated on disk and don't grow (except for indexes), they don't have the same overheads as TTL deletes do (though again, not free). If you have a consistent insert rate and document size, then you can work out how much space corresponds to the time frame you wish to keep data. Perhaps 20GiB is 5 days, so to be safe you allocate 30GiB and make sure to monitor from time to time to make sure your data size has not changed.
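For completeness, creating such a pre-allocated capped collection looks like this; the 30 GiB figure is just the illustrative number from the paragraph above, and the collection name is made up:
// Pre-allocate a 30 GiB capped collection; once it is full, the oldest
// documents are overwritten automatically as new ones are inserted.
db.createCollection("clicks", { capped : true, size : 30 * 1024 * 1024 * 1024 });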
After that you are into more manual options. For example, you could simply have a field that marks a document as expired or not, perhaps a boolean - that would mean that expiring a document would be an in-place update and about as efficient as you can get in terms of a MongoDB operation. You could then do a batch delete of your expired documents at a quiet time for your system when the deletes and their effect on performance are less of a concern.
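A sketch of that flag-then-delete pattern using the newer updateMany/deleteMany shell helpers, again assuming the customers collection and a one-week window for the click data (the field name expired is made up):
// Cheap in-place update: mark old click documents as expired.
var cutoff = new Date(Date.now() - 7 * 24 * 60 * 60 * 1000);   // one week ago
db.customers.updateMany(
    { action : "view", t : { $lt : cutoff } },
    { $set : { expired : true } }
);

// Later, during a quiet period, remove them in one batch.
db.customers.deleteMany({ expired : true });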
Another alternative: you could start writing to a new database every X days in a predictable pattern so that your application knows what the name of the current database is and can determine the names of the previous 2. When you create your new database, you delete the one older than the previous two and essentially always just have 3 (sub in numbers as appropriate). This sounds like a lot of work, but the benefit is that the removal of the old data is just a drop database command, which just unlinks/deletes the data files at the OS level and is far more efficient from an IO perspective than randomly removing documents from within a series of large files. This model also allows for a very clean backup model - mongodump the old database, compress and archive, then drop etc.
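As a tiny illustration of that rotation scheme (the database names are made up), dropping the database that has aged out is a single command, far cheaper than deleting documents one by one:
// Suppose click databases are named clicks_YYYY_MM_DD and rotated every X days.
// Keep the two most recent ones and drop the oldest:
db.getSiblingDB("clicks_2014_07_01").dropDatabase();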
As you can see, there are a lot of trade offs here - you can go for convenience, IO efficiency, database efficiency, or something in between - it all depends on what your requirements are and what fits best for your particular use case and system.

Store Records without keys in mongodb or document mapping with keys

MongoDB always stores records in a form like this:
{
'_id' : '1' ,
'name' : 'nidhi'
}
But I want to store them in a way like:
{ 123 , 'nidhi'}
I do not want to store the keys again and again in the database.
Is this possible with MongoDB or with any other database?
Is there anything SQL-like possible in NoSQL, where I define the structure first (like in MySQL) and then start inserting values into documents?
That is not possible with MongoDB. Documents are defined by key/value pairs; this is because BSON (Binary JSON) - the internal storage format of MongoDB - was developed from JSON (JavaScript Object Notation). And without keys, you couldn't query the database at all, except by rather awkward positional parameters.
However, if disk space is that precious, you could revise your modeling to something like:
{
    _id : 1,
    values : ['nidhi', 'foo', 'bar', 'baz']
}
However, since disk space is relatively cheap compared to computing power (not a resource MongoDB uses a lot, though) and RAM (rule of thumb for MongoDB: the more, the better), your approach doesn't make much sense. For a REST API to return a record to the client, all you have to do is (pseudo code):
var docToReturn = collection.findOne({ _id: requestedId });
if (docToReturn) {
    response.send(200, docToReturn);
} else {
    response.send(404, { 'state': 404, 'error': 'resource ' + requestedId + ' not available' });
}
Even if data stored that way were queryable, you would have to map the returned values back to meaningful keys. And how would you keep track of which array position corresponds to which field? Or deal with the fact that MongoDB has dynamic schemas, so that one doc in the collection may have a totally different structure than another?

Using Large number of collections in MongoDB

I am considering MongoDB to hold our campaign log data:
{
    "domain" : "",
    "log_time" : "",
    "email" : "",
    "event_type" : "",
    "data" : {
        "campaign_id" : "",
        "campaign_name" : "",
        "message" : "",
        "subscriber_id" : ""
    }
}
The above is our event structure. Each event is associated with one domain;
a domain can contain any number of events, and there is no relation between one domain and another.
Most of our queries are specific to a single domain at a time.
For quick query responses I'm planning to create one collection per domain, so that I can query that domain's collection instead of querying the whole dataset containing every domain's data.
We will have at least 100k+ domains in the future, so I would need to create 100k+ collections.
We are expecting 1 million+ documents per collection.
Our main intention is to index only the collections that need it; we don't want to index the whole dataset, which is why we are planning to have one collection per domain.
Which approach is better for my case?
1. Storing all domains' events in one collection
(or)
2. Storing each domain's events in a separate collection
I have seen some questions on the maximum number of collections MongoDB can support, but I didn't get clarity on this topic. As far as I know the default namespace limit of about 24k can be raised, but if I create 100k+ collections, how will performance be affected?
Is this solution (using a very large number of collections) the right approach for my case?
Please advise on my approach, thanks in advance.
Without some hard numbers, this question would be probably just opinion based.
However, if you do some calculations with the numbers you provided, you will get to a solution.
So your total document count is:
100K collections x 1M documents per collection = 100 billion (100,000,000,000) documents.
From your document structure, I'm going to do a rough estimate and say that the average size for each document will be 240 bytes (it may be even higher).
Multiplying those two numbers, you get ~21.82 TB of data. You can't store this amount of data on just one server, so you will have to split your data across multiple servers.
With this amount of data, your problem isn't anymore one collection vs multiple collections, but rather, how do I store all of this data in MongoDB on multiple servers, so I can efficiently do my queries.
If you have 100K collections, you can probably do some manual work and store e.g. 10 K collections per MongoDB server. But there's a better way.
You can use sharding and let the MongoDB do the hard work of splitting your data across servers. With sharding, you will have one collection for all domains and then shard that collection across multiple servers.
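As a rough sketch of that sharded single-collection setup (the database name campaigns, the collection name events, and the shard key are illustrative choices, not from the question), run from a mongos shell:
// Enable sharding for the database and shard one shared events collection.
// A compound key on domain plus a time field avoids piling an entire domain
// into a single chunk.
sh.enableSharding("campaigns");
db.getSiblingDB("campaigns").events.createIndex({ domain : 1, log_time : 1 });
sh.shardCollection("campaigns.events", { domain : 1, log_time : 1 });

// Queries that include the shard key are routed only to the shards holding
// that domain's data:
db.getSiblingDB("campaigns").events.find({ domain : "example.com" });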
I would strongly recommend reading all of the documentation on sharding before trying to deploy a system of this size.