Relational or full object in MongoDB documents - mongodb

I have a general MongoDB question as I have recently found an issue with how I store things.
Currently, there is a collection called spaces like this:
{
_id: 5e1c4689429a8a0decf16f69,
challengers: [
5dfa24dce9cbc0180fb60226,
5dfa26f46719311869ac1756,
5dfa270c6719311869ac1757
],
tasks: [],
owner: 5dfa24dce9cbc0180fb60226,
name: 'testSpace',
description: 'testSpace'
}
As you can see, this has a challengers array, in which we store the ID of the User.
Would it be okey, if instead of storing the ID, I would store the entire User object, minus fields such as password etc?
Or should I continue with this reference path of referring to the ID of other documents?
The problem I have with this, is that when I want to go through all the spaces that a user has, I want to see what members are a part of that space (challengers array). However, I receive the IDS instead of name and email obviously. I am therefore struggling with sending the correct data to the frontend (I have tried doing some manual manipulation without luck).
So, if I have to continue the path of reference, then I will need to solve my problem somehow.
If it is okey to store the entire object in the array, It would be a lot easier.
HOWEVER, I want to do what is the best practice.
Thank you everyone!

Related

Using childByAutoId On Single Value?

I am pretty new to both Swift and Firebase, and I am attempting to make a simple app using Firebase as the backend. As far as I know, there is no memory-efficient way to use the numChildren() function without loading every single child into memory for counting, so I am implementing my own simple counter for the number of "Events" that have been created in my app.
The documentation for Firebase states that the childByAutoID() method should be used for updating lists in multi-user applications. I am assuming it adds a timestamp to the requested update and does them in order.
My question is whether it is necessary to use childByAutoID() when only updating a SINGLE field in a multi-user application. That is, will there be conflicts on my numEvents field if I do:
dbRef = FIRDatabase.database().reference()
dbRef.child("numEvents").setValue(num)
Or must I do:
dbRef = FIRDatabase.database().reference()
dbRef.child("numEvents").childByAutoId().setValue(num)
In order to avoid write conflicts? My only real confusion is that the documentation for childByAutoID stresses that it is useful when the children are a list of items, but mine is only a single item.
If you are only updating a single field you should not be using childByAutoId. To update a child value for an object, you need to obtain a reference to that object somehow, perhaps by a query of some sort (in many cases you will naturally already have a reference to the object if it needs to be changed) and you can change the value like this:
dbRef.child("events").child(objectToUpdateId).child(fieldToUpdateKey).setValue(newValue)
childByAutoId in this context would be used to create a new field like:
dbRef.child("events").childByAutoId().setValue(newObject)
I'm not exactly sure how this applies to your situation, but those are some descriptions of how to update a field, and use childByAutoId.
What childByAutoId does is create a unique key for a node, to avoid using the same key multiple times and then creating data conflicts like inconsistency (not write conflicts) to avoid write conflicts you use the transaction blocks.
The best way to learn is to try it out
If num == 1 , in the first example the result will be
dbRef:{
numEvents:1
}
While the second will be
dbRef:{
numEvents:{
//The auto-generated key
KLBHJBjhbjJBJHB:1
}
}
The childByAutoId would be useful if you want to save in a node multiple children of the same type, that way each children will have its own unique identifier
For example
pet:{
KJHBJJHB:{
name:fluffy,
owner:John Smith,
},
KhBHJBJjJ:{
name:fluffy,
owner:Jane Foster,
}
}
This way you have a unique identifier for cases where there is no clear way with the item data to guarantee it will be unique (in this case the pet's name)
Few things here:
childByAutoId is not a timestamp. But is used to create unique nodes in any given node.
Use case of childByAutoId :
You have messages node which stores messages from multiple user who are involved in a group chat. So each user can add messages in the group chat so you would do something like this each time user sends message:
dbRef = FIRDatabase.database().reference()
dbRef.child("messages").childByAutoId().setValue(messageText)
So this will create a unique message id for each message from different users. This will kind of act like primary key of message in normal databases.
The structure of database will be something like this:
messages: {
"randomIdGenerated-12asd12" : "hello",
"randomIdGenerated-12323D123" : "Hi, HOw are you",
}
So in your case your first approach is good enough! Since you dont need unique node for counting number of events added.

How to form an unordered key with many elements in mongodb

I'm attempting to use mongodb to implement a simple messaging system between two users in mongo. I want to be able to take two users, user0 and user1, and search for their entry in a collection. If the entry for those two users doesn't exist I want to create it and then add the message that was sent to its message field. If it does exist I just want to push the message to the message field.
I'm not really sure the best way to implement this.
db.privateChat.update(
{between:{$all:['user0', 'user1']}},
{$push:{message:'text'}}, {upsert:true}
)
And other similar entry schemes but they don't work. They produce the error:
"Cannot create base during insert of update. Caused by :ConflictingUpdateOperators Cannot update 'between' and 'between' at the same time"
I can think of other ways to do this producing a symmetric key (where the order of the users don't matter for the purposes of the search) from say adding the hashes together or a query that checks if either messenger0 or messenger1 is either user0 or user1 but these don't seem like great ways of doing it. Is this totally the wrong approach?
Thanks.
I think this could be solved by design.
let say that we have document in collection chats;
chat{
_id,
between[arrayOfIds],
startTime,
events[
{message:{
fromUserId,
timeStamp,
data}
}}
]}
}
then messages will be stored in message object inside chat .
App will be aware of chat _id so there will be no issues when you will have a group chat between more than 2 users.
This approach will allow you to prevent overflowing document size limitation as you could start new chat entry every week, day, etc...
Have a fun!

Meteor and MongoDB dropdown population and integrity

Hopefully I can describe this correctly but I come from the RDBMS world and I'm building an inventory type application with Meteor. Meteor and Mongodb may not be the best option for this application but hopefully it can be done and this seems like a circumstance that many converts will run into.
I'm trying to forget many of the things I know about relational databases and referential integrity so I can get my head wrapped around Mongodb but I'm hung up on this issue and how I would appropriately find the data with Meteor.
The inventory application will have a number of drop downs but I'll use an example to better explain. Let's say I wanted to track an item so I'll want the Name, Qty on Hand, Manufacturer, and Location. Much more than that but I'm keeping it simple.
The Name and Qty on Hand are easy since they are entered by the user but the Manufacturer and the Location should be chosen in a drop down from a data driven list (I'm assuming a Collection of sorts (or a new one added to the list if it is a new Manufacturer or Location). Odds are that I will use the Autocomplete package as well but the point is the same. I certainly wouldn't want the end user to misspell the Manufacturer name and thereby end up with documents that are supposed to have the same Manufacturer but that don't due to a typo. So I need some way to enforce the integrity of the data stored for Manufacturer and Location.
The reason is because when the user is viewing all inventory items later, they will have the option of filtering the data. They might want to filter the inventory items by Manufacturer. Or by Location. Or by both.
In my relational way of thinking this would just be three tables. INVENTORY, MANUFACTURER, and LOCATION. In the INVENTORY table I would store the ID of the related respective table row.
I'm trying to figure out how to store this data with Mongodb and, equally important, how to then find these Manufacturer and Location items to populate the drop down in the first place.
I found the following article which helps me understand some things but not quite what I need to connect the dots in my head.
Thanks!
referential data
[EDIT]
Still working at this, of course, but the best I've come up with is to do it normalized way much like is listed in the above article. Something like this:
inventory
{
name: "Pen",
manufacturer: id: "25643"},
location: {id: "95789"}
}
manufacturer
{
name: "BIC",
id: "25643"
}
location
{
name: "East Warehouse",
id: "95789"
}
Seems like this (in a more simple form) would have to be an extremely common need for many/most applications so want to make sure that I'm approaching it correctly. Even if this example code were correct, should I use an id field with generated numbers like that or should I just use the built-in _id field?
I've come from a similar background so I don't know if I'm doing it correctly in my application but I have gone for a similar option to you. My app is an e-learning app so an Organisation will have many Courses.
So my schema looks similar to yours except I obviously have an array of objects that look like {course_id: <id>}
I then registered a helper than takes the data from the organisation and adds in the additional data I need about the courses.
// Gets Organisation Courses - In your case could get the locations/manufacturers
UI.registerHelper('organisationCourses', function() {
user = Meteor.user();
if (user) {
organisation = Organisations.findOne({_id: user.profile.organisation._id});
courses = organisation.courses.courses;
return courses;
} else {
return false;
}
});
// This takes the coursedata and for each course_id value finds and adds all the course data to the object
UI.registerHelper('courseData', function() {
var courseContent = this;
var course = Courses.findOne({'_id': courseContent.course_id});
return _.extend(courseContent, _.omit(course, '_id'));
});
Then from my page all I have to call is:
{{#each organisationCourses}}
{{#with courseData}}
{{> admListCoursesItem}}
{{/with}}
{{/each}}
If I remember rightly I picked up this approach from an EventedMind How-to video.

How to change live date?

I wonder, How do I change a live data schema with MongoDB ?
For example If I have "Users" collection with the following document:
var user = {
_id:123312,
name:"name",
age:12,
address:{
country:"",
city:"",
location:""
}
};
now, in a new version of my application, if I add a new property to "User" entity, let us say weight, tall or adult ( based on users year ), How to change all the current live data which does not have adult property. I read MapReduce and group aggregation command but, they seem to be comfortable and suitable for analytic operation or other calculations, or I am wrong.
So what is the best way to change your current running data schema in MongoDB ?
It really depends upon your programming language. MongoDB is really good at having a dynamic schema. I think your pattern of thought at the moment is too SQL related whereby you believe that all rows, even if they do not yet have a value, must have the new field.
The reality is quite different. The rows which have nothing meaningful to put into them do not require the field and you can, in your application, just check to see if the returned document has a value, if not then you can assume, as in a fixed SQL schema, that the value is null.
So this is one aspect where MongoDB shines, is the fact that you don't have to apply that new field to the entire schema on demand, instead you can lazy fill it as data is entered by the user.
So just code the field into your application and let the user do the work for you.
The best way to add this field is to write a loop, in maybe the console close or on the primary of your replica (if you have one, otherwise just on the server), like so:
db.users.find().forEach(function(doc){
doc.weight = '44 stone';
db.users.save(doc);
});
That is currently the best way to do something like what your asking.

Searches (and general querying) with HBase and/or Cassandra (best practices?)

I have User model object with quite few fields (properties, if you wish) in it. Say "firstname", "lastname", "city" and "year-of-birth". Each user also gets "unique id".
I want to be able to search by them. How do I do that properly? How to do that at all?
My understanding (will work for pretty much any key-value storage -- first goes key, then value)
u:123456789 = serialized_json_object
("u" as a simple prefix for user's keys, 123456789 is "unique id").
Now, thinking that I want to be able to search by firstname and lastname, I can save in:
f:Steve = u:384734807,u:2398248764,u:23276263
f:Alex = u:12324355,u:121324334
so key is "f" - which is prefix for firstnames, and "Steve" is actual firstname.
For "u:Steve" we save as value all user id's who are "Steve's".
That makes every search very-very easy. Querying by few fields (properties) -- say by firstname (i.e. "Steve") and lastname (i.e. "l:Anything") is still easy - first get list of user ids from "f:Steve", then list from "l:Anything", find crossing user ids, an here you go.
Problems (and there are quite a few):
Saving, updating, deleting user is a pain. It has to be atomic and consistent operation. Also, if we have size of value limited to some value - then we are in (potential) trouble. And really not of an answer here. Only zipping the list of user ids? Not too cool, though.
What id we want to add new field to search by. Eventually. Say by "city". We certainly can do the same way "c:Los Angeles" = ..., "c:Chicago" = ..., but if we didn't foresee all those "search choices" from the very beginning, then we will have to be able to create some night job or something to go by all existing User records and update those "c:CITY" for them... Quite a big job!
Problems with locking. User "u:123" updates his name "Alex", and user "u:456" updates his name "Alex". They both have to update "f:Alex" with their id's. That means either we get into overwriting problem, or one update will wait for another (and imaging if there are many of them?!).
What's the best way of doing that? Keeping in mind that I want to search by many fields?
P.S. Please, the question is about HBase/Cassandra/NoSQL/Key-Value storages. Please please - no advices to use MySQL and "read about" SELECTs; and worry about scaling problems "later". There is a reason why I asked MY question exactly the way I did. :-)
Being able to query properties directly is one of the features you lose when moving away from SQL, so you need a way to maintain your own index to let you find records.
If your datastore does not have built in indexing or atomic list operations, you will need to deal with the locking issues you mention. However, indexing doesn't necessarily need to be synchronous - maintain a queue of updated records to be reindexed and you have a solution for 3 that can be reused to solve 2 also.
If the index list for a particular value becomes too large for the system to handle in a single list, you can replace the list of users with a list of lists. However, if you have that many records with the same value it probably isn't a particularly useful search criteria anyway.
Another option that is useful in some cases is to use a seperate system for the indexing - for example you could set up lucene to index the records in your main datastore.
I guess i would have implemented this as a MapReduce job, which would run on schedule.
Each search word, would be a row-key with lookup to UID.
Rowkey:uid1
profile:firstName: Joe
profile:lastName: Doe
profile:nick: DoeMaster
Rowkey: uid2
profile:firstName: Jane
profile:lastName: Doe
profile:nick: SuperBabe
MapReduse indexes all searchable properties and add them with search word as row key
Rowkey: Jane
lookup:uid: uid2
Rowkey: Doe
lookup:uid: uid2, uid1
Rowkey: DoeMaster
lookup:uid: uid1
..etc
Now, if you need to update the index list on the fly as a user change, you would write the change directly to the index base, by remove uid value from index and add to another row key. In case of this happens at the same time, temporary locking could be implemented.
For users being removed, an additional attribute telling the state of the user could be use to filter them out from search.
Adding additional search word isn't very hard, since its just about which name:value you want to index. you could filter search more also by adding type attribute to your row key/keyword. i.e boston - lookup:type: city.
The idea is to maintain your own row key based search index inside hbase.