Store Records without keys in mongodb or document mapping with keys - mongodb

MongoDB always stores records in the following way:
{
'_id' : '1' ,
'name' : 'nidhi'
}
But I want to store them like this:
{ 123 , 'nidhi' }
I do not want to store the keys again and again in the database.
Is this possible with MongoDB, or with any other database?
Is there anything like SQL possible in NoSQL, where I define the schema first (as in MySQL) and then start inserting values into the documents?

That is not possible with MongoDB. Documents are defined by key/value pairs. This is because BSON (Binary JSON) – the internal storage format of MongoDB – was derived from JSON (JavaScript Object Notation). And without keys, you couldn't query the database at all, except by rather awkward positional parameters.
However, if disk space is so precious, you could revise your modelling to something like:
{
  _id: 1,
  values: ['nidhi', 'foo', 'bar', 'baz']
}
However, since disk space is relatively cheap compared to computing power (not a resource MongoDB uses a lot, though) and RAM (rule of thumb for MongoDB: the more, the better), your approach doesn't make sense. For a REST API to return a record to the client, all you have to do is (pseudo code):
var docToReturn = collection.findOne({_id: requestedId});
if (docToReturn) {
  response.send(200, docToReturn);
} else {
  response.send(404, {'state': 404, 'error': 'resource ' + requestedId + ' not available'});
}
Even if it were possible to query data stored this way, you would have to map the returned values to meaningful keys. And how would you deal with the fact that arrays don't have a guaranteed order? Or with the fact that MongoDB has dynamic schemas, so that one doc in a collection may have a totally different structure than another?
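To see the mapping problem concretely, here is a plain-JavaScript sketch of what every reader of such "keyless" documents would have to do. The positional schema here is hypothetical and would have to be maintained by the application itself, outside the database:

```javascript
// Hypothetical positional schema the application must maintain on its own
const schema = ["name", "city", "role"];

// A "keyless" document as proposed: values identified only by position
const doc = { _id: 123, values: ["nidhi", "pune", "dev"] };

// Every read has to map positions back to meaningful keys in application code
const restored = Object.fromEntries(schema.map((key, i) => [key, doc.values[i]]));
console.log(restored); // { name: "nidhi", city: "pune", role: "dev" }
```

Any change to the schema array silently corrupts the mapping for all previously stored documents, which is exactly the fragility the keyed format avoids.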

Related

How to efficiently do complex queries on MongoDB unindexed fields?

I am building filtering functionality for a web application; it should look like JIRA or TFS filtering queries. The user should be able to filter on field contents and use logical operators in the filter query.
The data lives in MongoDB. The main challenge is that the fields we will filter on must support not only strict equality but also full-text search, and they are difficult to index because they can vary per user.
In a nutshell, there is a nested object which contains three other nested objects; these can have a different number of fields depending on the user, and the field names are also set by the user, so we don't know them in advance.
For example document structure in collection can be:
{
  _id: ObjectId(),
  storage: {
    obj_1: {},
    obj_2: {}
  }
},
{
  _id: ObjectId(),
  storage: {
    obj_1: {
      field_1: val,
      field_2: val
    },
    obj_2: {}
  }
}
I imagine queries will be something like:
find({$and:[{"storage.obj_1.field_1":{$regex: "va"}},{"storage.obj_1.var_2":"val"}]})
Unfortunately, I am not a database expert, so the solutions I see now are:
1) Use Elasticsearch as a search engine. But the question is: how do I define an Elasticsearch index if I don't know my documents' structure?
2) Use a sparse index in Mongo. But I will need to use regexes for matching; is that solution better than Elasticsearch?
So the question is: what is the best way to do filtering in such a DB structure?
p.s.
I have put this question on SO and not Software Engineering because SO has more active members; please keep your downvotes for later.
Elasticsearch and MongoDB (much like a relational database) behave differently for indexing: in MongoDB you need to explicitly index every field (for a non-$text index), whereas in Elasticsearch every field is automatically indexed. Don't go too crazy on the number of fields in Elasticsearch, since there is a little overhead for each one (in terms of disk space, though that has improved in version 6).
As soon as you have more than a test data set, regular expressions are often slow, since they can only use indices in specific cases, and you need to define those indices explicitly. Maybe the $text index and $text search operator are more what you are looking for; a $text index can cover every field in a collection as well. If you need more features and a system that is fully built for search, then Elasticsearch will be the better option, though.
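The $text route handles the "unknown field names" problem via a wildcard text index, which indexes every string field in a document regardless of its name. A mongosh sketch (the collection name is illustrative; note this supports word-based search, not arbitrary substring regexes):

```javascript
// Index every string field, including ones whose names you don't know in advance
db.userStorage.createIndex({ "$**": "text" });

// Word-based search across all indexed string values in the collection
db.userStorage.find({ $text: { $search: "val" } });
```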

Can I store data that won't affect query performance in MongoDB?

We have an application which requires saving data that should be in documents, for querying and sorting purposes. The data should be schemaless, as some of the fields will only become known through usage. For this, MongoDB is a great solution and it works great for us.
Part of the data in each document, is for displaying purposes. Meaning the data can be objects (let's say json) that the client side uses in order to plot diagrams.
I tried to save this data using GridFS, but our use cases made it not responsive enough. Also, the documents won't exceed the 16 MB limit even with the diagram data inside them. In fact, when saving this data directly within the documents, we got better results.
This data is used only for client-side responses, meaning we should never query it. So my question is: can I insert this data into MongoDB and mark it as 'not for query' data? Meaning, can I insert this data without affecting Mongo's performance? The data is strict; once a document is inserted, there may only be updates to existing fields, not additions of new ones.
I've noticed there is a Binary data type in Mongo, and I am wondering if I should use this type for objects that are not binary. Can this give me what I'm looking for?
Also, I would love to know what the advantage is of using this type inside my documents. Can it save me disk space?
As of MongoDB 3.4, read and write operations are atomic at the level of a single document from the storage/memory point of view. If the MongoDB server needs to fetch a document from memory or disk (even when projecting a subset of fields to return), the full document generally has to be loaded into memory on a mongod. The only exception is if you can take advantage of covered queries, where all of the fields returned are also included in the index used.
This data is used only for client side responses, meaning we should never query it.
Data fields which aren't queried directly do not need to be in any indexes. However, there is currently no concept like "not for query" fields in MongoDB. You can query or project any field (with or without an index).
Meaning, can I insert this data without affecting Mongo's performance?
Data with very different access or growth patterns (such as your infrequently requested client data) is a recommended candidate for storing separately from a parent document with frequently accessed data. This will improve the efficiency of memory usage for mongod by avoiding unnecessary retrieval of data when working with documents in the parent collection.
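Such a split might look like this in mongosh (collection and field names are illustrative, not from the question):

```javascript
// The parent collection keeps only the frequently accessed, queryable fields
db.charts.insertOne({ _id: 1, title: "Q3 report", owner: "dana" });

// The bulky display-only payload lives in a sibling collection, keyed by the same _id
db.chartPayloads.insertOne({ _id: 1, diagram: { type: "bar", series: [[1, 2], [3, 4]] } });

// The payload is fetched only when the client actually needs to render it
db.chartPayloads.findOne({ _id: 1 });
```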
I've noticed there is a Binary Data type in Mongo, and I am wondering if I should use this type for objects that are not binary. Can this give me what I'm looking for? Also, I would love to know what is the advantage in using this type inside my documents. Can it save me disk space?
You should use a type that is most appropriate for the data that you are storing. Storing text data as binary will not gain you any obvious efficiencies in server storage. However, storing a complex object as a single value (for example, a JSON document serialized as a string) could save some serialization overhead if that object will only be interpreted via your client-side code. Binary data stored in MongoDB will be an opaque blob as far as indexing or querying, which sounds fine for your purposes.
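The serialization idea can be sketched in plain JavaScript (field names are illustrative): the document carries the payload as one opaque string value, and only the client-side code ever re-inflates it.

```javascript
// A complex display-only object (e.g. data for plotting a diagram)
const diagram = { type: "bar", series: [[1, 2], [3, 4]], options: { stacked: true } };

// Stored in the document as a single string value: one field, no sub-structure
// for the server to walk or index
const doc = { _id: 1, diagramJson: JSON.stringify(diagram) };

// Only the client interprets the payload
const restored = JSON.parse(doc.diagramJson);
console.log(restored.options.stacked); // true
```

The trade-off is that individual parts of the payload can no longer be queried or updated in place; the whole string must be rewritten on change, which matches the "update existing fields only" constraint described above.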

Two mongodb collections in one query

I have a big collection of clients and a huge collection of client data. The collections are separate, and I don't want to combine them into a single collection (because of other servlets that already work with them), but now I need to "join" data from both collections into a single result.
Since the query will return a large number of results, I don't want to query the server once and then use the result to query again. I'm also concerned about the traffic between the server and the DB, and about the memory the result set will occupy in the server's RAM.
The way it works now is that I get the relevant client list from the 'clients' collection, send this list to the query on the 'client data' collection, and only then get the aggregated results.
I want to cut out fetching the client list and sending it straight back; I want the server to handle this itself, letting the query on the client-data collection ask the clients collection for the relevant client list.
How can I use a stored procedure (JavaScript function) to run the query in the DB and return only the relevant clients from the collection?
Alternatively, is there a way to write a query that joins results from another collection?
"Good news everyone": this aggregation query works just fine in the mongo shell as a join query:
db.clientData.aggregate([
  {
    $match: {
      id: {
        $in: db.clients.distinct("_id", { "tag": "qa" })
      }
    }
  },
  {
    $group: {
      _id: "$computerId",
      total_usage: { $sum: "$workingTime" }
    }
  }
]);
The key idea of MongoDB data modelling is to do the work at write time, not at read time: store the data in the format that you need for reading, not in a format that minimizes/avoids redundancy (i.e. use a denormalized data model).
I don't want to combine them to a single collection
That's not a good argument
I'm also concerned about the traffic between the server and the DB [...]
If you need the data, you need the data. How does the way it is queried make a difference here?
[...] and the memory that the result set will occupy in the server RAM.
Is the amount of data so large that you want to stream it from the server to the client, such that it is transferred in chunks? How much data are we talking about, and why does the client read it all?
How can I use a stored procedure to do the query in the DB and return only the relevant clients out of the collection
There are no stored procedures in MongoDB, but you can use server-side map/reduce to 'join' collections. Generally, code that is stored in and run by the database violates the separation of concerns of a layered architecture. I consider it one of the ugliest hacks of all time, but that's debatable.
Also, less debatably: keep in mind that M/R has huge overhead in MongoDB and is not geared towards real-time queries made, e.g., in a web server call. Such calls will take hundreds of milliseconds.
Is there a way to write a query that joins result from another collection ?
No, operations are constrained to a single collection. However, you can perform a second query and use the $in operator there, which is similar to a subselect and reasonably fast, but of course requires two round-trips.
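In mongosh, that two-round-trip $in pattern looks like this (field names taken from the aggregation in the question):

```javascript
// Round-trip 1: fetch the relevant client ids
const clientIds = db.clients.distinct("_id", { tag: "qa" });

// Round-trip 2: use them as a subselect-style filter on the data collection
db.clientData.find({ id: { $in: clientIds } });
```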
How can I use a stored procedure to do the query in the DB and return only the relevant clients out of the collection. Alternatively
There are no stored procedures in MongoDB.
Alternatively, Is there a way to write a query that joins result from another collection ?
You normally don't need joins in MongoDB, and there is no such operation. The flexibility of the document model already covers the typical need for joins. You should think about your document model; asking how to design joins out of your schema should always be your first port of call. As an alternative, you may need to use aggregation or map-reduce on the server side to handle this.
First of all, mnemosyn and Michael9 are right. But if I were in your shoes, also assuming that the client-data collection has one document per client, I would store the document ID of the client-data document in the client document to make the "join" (still no joins in Mongo) easier.
If you have several client-data documents per client, then store an array of document IDs.
But none of this saves you from having to implement the "join" in your application code; if it's a Rails app, then probably in your controller.
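These answers predate MongoDB 3.2; since 3.2, the aggregation pipeline has a $lookup stage that performs a left outer join server-side. A mongosh sketch using the collections from the question:

```javascript
db.clients.aggregate([
  { $match: { tag: "qa" } },
  // Attach each client's documents from the clientData collection
  { $lookup: {
      from: "clientData",
      localField: "_id",
      foreignField: "id",
      as: "data"
  } }
]);
```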

MongoDB objectId references

I have one collection where the documents' ids are MongoDB ObjectIds, so they appear in the db as:
Collection1
"someId": {
"$oid": "5003cb802e28076412000001"
},
In another collection I reference these. However, sometimes the references appear to be stored as proper ObjectIds:
Collection 2
"someForiegnId": {
"$oid": "5003cb802e28076412000001"
},
But at other times they have made it into the db as plain strings.
Collection 2
"someForiegnId": "5003cb802e28076412000001",
My question is: is it important to store these foreign references in ObjectId format, or can they just be strings?
I know that I am answering a 1 year old question, but still.
It is always desirable to be consistent in your database. No matter how you store your data (for example, an IP address as a string "87.123.12.12", an array [87, 123, 12, 12], or a number 1467681804), it should always be stored the same way. The same goes for your data: you have to select one format and stick with it.
The format you select has implications for how much storage you will use and how fast you will be able to query the data. The best way is to store them as ObjectIds, for the following reasons:
1) It takes only 12 bytes to store an ObjectId, whereas the same value as a hex string takes twice that. For sure this is a small difference, but it comes absolutely for free. Moreover, in a world where you try to fit all your data in memory (or at least as much as possible), this is a good consideration: you save not only HDD but RAM as well.
2) You can easily get the timestamp from your ID. Sometimes it is useful to know when an object was created. With strings you cannot do this.
3) If you decide to create an index based on this field, its size will be much smaller, and you will be able to query it much faster (because it is a number) than querying by a string.
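The timestamp point can be demonstrated in plain JavaScript, without any driver: the first 4 bytes (8 hex characters) of an ObjectId encode the creation time as a big-endian Unix timestamp in seconds, so it is recoverable even from the string form.

```javascript
// Recover the creation time from an ObjectId's hex string:
// the first 8 hex characters are a Unix timestamp in seconds
function objectIdToDate(oidHex) {
  return new Date(parseInt(oidHex.slice(0, 8), 16) * 1000);
}

// Using the ObjectId from the question:
console.log(objectIdToDate("5003cb802e28076412000001").toISOString());
// 2012-07-16T08:06:24.000Z
```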
So even if I only had string representations, I would change to ObjectId(); in your case it is definitely worth switching (I know you have most probably already done so).
P.S. You can modify the field by modifying the query in the following answer.

MongoDB - lookup/taxonomies

In my MongoDB database I have a number of document types that require look-up/taxonomy information. Typically I either hold the Id of the look-up, or an Id plus the look-up info denormalised into the main document.
e.g.
task = {
  TaskDetail: "Some task",
  TaskPriority: { Id: xxxxx, Code: 'U', Description: 'Urgent' }
}
Coming from traditional relational databases (RDBMS), where I would just have a TaskPriority table, I was wondering what the best practice is when using documents within Mongo?
My initial thought was to have a taxonomy/look-up collection. Typically, look-ups and enums are short, so each could be a separate document in the collection? Or I could mirror what you'd typically do in an RDBMS and create one collection per look-up?
Can anybody point in me in the right direction?
Thanks in advance,
Sam
Referencing has advantages such as:
Better management of master data
De-duplication, hence fewer updates
Disadvantages:
A look-up is a separate API call; there are no joins in MongoDB
$DBRef or $lookup can still be used, but they are cumbersome; manual references are easier
Atomicity is at the document level, hence look-ups across collections can get out of sync for some time if the referenced look-up collection is updated
Embedding:
On the other hand, embedding does not make sense for reference data, since whenever a look-up value gets updated you need to update all your documents. That is a huge exercise: keeping track of all the documents and keys that need reference-data updates.
Secondly, embedded reference data also cannot provide SCD (slowly changing dimension) history keeping. If referencing is used instead, you can achieve this with a version date in the reference collection.
Considering these points, I am inclined towards referencing. But my only doubt is: if I do not have the referenced value in my parent documents, how will search work? When a user searches with descriptive referenced values, how will Mongo consult the reference-data collection?
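A mongosh sketch of the manual-reference approach, including the two-step search for a descriptive value (collection and field names are illustrative, not from the question):

```javascript
// Reference collection: one document per look-up value,
// with a version date so SCD-style history can be kept
db.taskPriorities.insertOne({
  _id: "U", Description: "Urgent", validFrom: ISODate("2020-01-01")
});

// The task stores only the reference key
db.tasks.insertOne({ TaskDetail: "Some task", TaskPriorityId: "U" });

// Searching by a descriptive value is a two-step look-up (no join):
// first resolve the description to keys, then filter the parent collection
const ids = db.taskPriorities.distinct("_id", { Description: /Urgent/ });
db.tasks.find({ TaskPriorityId: { $in: ids } });
```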