Confusion regarding MongoDB schema. How to make it better?

I am using mongoose with node.js for this.
My current Schema is this:
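var mongoose = require('mongoose');
var Schema = mongoose.Schema;

var linkSchema = new Schema({
  text: String,
  tags: [String], // array of strings (a bare `array` is not defined in JavaScript)
  body: String,
  user: String
});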
My use case is this: there is a list of users, and each user has a list of links associated with it. Users and links are different schemas, of course. How does one model that sort of one-to-one relationship in MongoDB?
Should I make a User Schema and embed linkSchema in it? Or the other way around?
Another doubt regarding that. Tags would always be an array of strings which I can use to browse through links later. Should it be an array data type or is there a better way to represent it?

If it's 1:1 then nest one document inside the other. Which way around depends on the queries, but you could easily do both if you need to.
For tags, you can index an array field and use that for searching/filtering documents and from the information you've given that sounds reasonable IMHO.
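To make the tags idea concrete, here is a minimal sketch in plain JavaScript, assuming the links are embedded in a user document. The field values and the linksWithTag helper are made up for illustration, and the in-memory filter only mimics what a query on an indexed array field would return.

```javascript
// Hypothetical user document with embedded links; each link keeps its tags
// as a plain array of strings.
const user = {
  name: "joe",
  links: [
    { text: "Mongo docs", tags: ["mongodb", "database"], body: "Official manual" },
    { text: "Node docs", tags: ["node", "javascript"], body: "API reference" },
    { text: "Mongoose guide", tags: ["mongodb", "node"], body: "Getting started" }
  ]
};

// In-memory stand-in for filtering by tag. In MongoDB, a filter such as
// { "links.tags": "mongodb" } matches documents whose array contains the value.
function linksWithTag(userDoc, tag) {
  return userDoc.links.filter((link) => link.tags.includes(tag));
}

console.log(linksWithTag(user, "mongodb").map((link) => link.text));
```

If links lived in their own collection instead, the same array-of-strings field would work unchanged, and an index on tags would be a multikey index.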
If you had a fixed set of tags it would make sense to represent those as a nested object with named fields perhaps, depending on queries. Don't forget that you can not only create nested documents in Mongo, you can also search on sub-fields and even use entire nested documents as searchable/indexable fields. For instance, you could store an email address like this:
email: "joe@somewhere.com"
as a string, and you could also do:
email: {
  user: "joe",
  domain: "somewhere.com"
}
You could index email in both cases and use either for matching. In the latter case, though, you could also search on domain or user only, without resorting to RegEx-style queries. You could also store both variants, so there are lots of flexible options in Mongo.
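As a rough sketch of the nested-email variant, the helper below splits an address into the two sub-fields and shows the filter shapes you could then use. The toEmailDoc name is hypothetical, and the validation is deliberately naive.

```javascript
// Split an address into the nested { user, domain } form.
// Assumes a single local part and domain separated by "@"; real-world
// address validation is more involved than this.
function toEmailDoc(address) {
  const at = address.lastIndexOf("@");
  if (at <= 0 || at === address.length - 1) {
    throw new Error("not a valid address: " + address);
  }
  return { user: address.slice(0, at), domain: address.slice(at + 1) };
}

const doc = { email: toEmailDoc("joe@somewhere.com") };

// Dot notation reaches into the sub-document, so no regex is needed.
const byDomain = { "email.domain": "somewhere.com" };

// Matching on the whole sub-document is an exact match, including field order.
const byWholeEmail = { email: { user: "joe", domain: "somewhere.com" } };

console.log(doc, byDomain, byWholeEmail);
```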
Going back to tags, I think your array of strings is a fine model given what you've described, but if you were doing more complex bulk aggregation, it wouldn't be crazy to store a document for every tag with the same document contents, since that's essentially what you'd have to do for every query during aggregation.

Related

MongoDB ObjectId vs string in find()

I'm starting to play with mongodb, and I learned that when inserting a document, you can either provide an ID, or let mongodb generate it for you.
I thought this is nice, because I want to let my users optionally choose an id, and if not generate it for them.
But the problem is, the generated one is of type ObjectId while the user provided one is a string, and the find method only returns the correct answer if you pass it with the correct type. So when a user requests GET /widget/123, I have no idea if the original ID was stored as an ObjectId or a string do I?
So how am I supposed to use this feature?
First off, I'd recommend against letting users provide _ids: if 2 users want to use the same _id, the second user will be unable to, which will be frustrating. If users want that functionality, I'd recommend storing the user created id on a separate field & querying by user (or company or whatever) and the user-created id.
That said, mongo ObjectIds are 24 hex characters, so you can safely identify when an id is not a MongoId by checking whether it doesn't match /^[a-f0-9]{24}$/ (or by seeing whether a call to ObjectId("maybeAnObjectId") throws). In the case where it's unclear (where a user might have provided 24 hex characters as their id), you'll need to use $in (or $or) to query for both cases:
const query = /^[a-f0-9]{24}$/.test(id) ? { _id: { $in: [new ObjectId(id), id] } } : { _id: id };
(An annoying user could re-use an autogenerated ObjectId as their string id, and then queries to that route would return two documents with no way of telling them apart.)
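For the first recommendation (keep _id auto-generated and put the user's own id in a separate field), a lookup could be sketched like this. The owner and customId field names are assumptions for illustration, not anything Mongo prescribes.

```javascript
// Hypothetical field names: `owner` is whoever created the widget and
// `customId` is the optional id they picked. `_id` stays auto-generated.
function widgetFilter(owner, idFromUrl) {
  return { owner: owner, customId: idFromUrl };
}

// True when the string has the 24-hex-character shape of an ObjectId.
// The "i" flag is an addition here so that upper-case hex is recognized too.
function looksLikeObjectId(id) {
  return /^[a-f0-9]{24}$/i.test(id);
}

console.log(widgetFilter("acme", "123"));
console.log(looksLikeObjectId("123"));
console.log(looksLikeObjectId("56b0d36c23da2af0363abe37"));
```

With a unique compound index on owner and customId, two different owners could each have a widget called "123" without colliding.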

mgo - bson.ObjectId vs string id

Using mgo, it seems that best practice is to set object ids to be bson.ObjectId.
This is not very convenient, as the result is that instead of a plain string id the id is stored as binary in the DB. Googling this seems to yield tons of questions like "how do I get a string out of the bson id?", and indeed in golang there is the Hex() method of the ObjectId to allow you to get the string.
BSON ids become even more annoying when exporting data from Mongo to another DB platform (for example, when big data is collected and you want to merge it with some properties from the back-office Mongo DB): you need to transform the binary ObjectId to a string in order to join on the id in platforms that do not use the BSON representation.
My question is: what are the benefits of using bson.ObjectId vs string id? Will I lose anything significant if I store my mongo entities with a plain string id?
As was already mentioned in the comments, storing the ObjectId as a hex string would double the space needed for it and in case you want to extract one of its values, you'd first need to construct an ObjectId from that string.
But you have a misconception. There is absolutely no need to use an ObjectId for the mandatory _id field. Quite often, I advise against it. Here is why.
Take the simple example of a book, with relations and some other considerations set aside for simplicity:
{
_id: ObjectId("56b0d36c23da2af0363abe37"),
isbn: "978-3453056657",
title: "Neuromancer",
author: "William Gibson",
language: "German"
}
Now, what use would the ObjectId have here? Actually, none. It would be an index with hardly any use, since you would never search your book database by an artificial key like that. It holds no semantic value. It would be a unique ID for an object which already has a globally unique ID – the ISBN.
So we simplify our book document like this:
{
_id: "978-3453056657",
title: "Neuromancer",
author: "William Gibson",
language: "German"
}
We have reduced the size of the document, made use of a preexisting globally unique ID, and no longer have a basically unused index.
Back to your basic question of whether you lose something by not using ObjectIds: quite often, not using the ObjectId is the better choice. But if you use it, use the binary form.
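To see the size difference between the two forms, here is a small Node.js sketch that uses only the built-in Buffer and the ObjectId from the example document. It decodes the id by hand purely to show what is packed into it; in a real application you would let the driver's ObjectId type do this.

```javascript
// The same id in both representations.
const hex = "56b0d36c23da2af0363abe37";
const binary = Buffer.from(hex, "hex");

console.log(hex.length);    // characters in the string form
console.log(binary.length); // bytes in the binary form

// The first 4 bytes of an ObjectId are a big-endian Unix timestamp in seconds.
const createdAt = new Date(binary.readUInt32BE(0) * 1000);
console.log(createdAt.toISOString());
```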

how to join a collection and sort it, while limiting results in MongoDB

Let's say I have two collections, where each document may look like this:
Collection 1:
target:
  _id,
  comments:
  [
    { _id,
      message,
      full_name
    },
    ...
  ]
Collection 2:
user:
  _id,
  full_name,
  username
I am paging through comments via $slice, let's say I take the first 25 entries.
For these entries I need the corresponding usernames, which I get from the second collection. What I want is to get the comments sorted by the username they reference. The problem is that I can't add the username to the comments, because usernames may change often, and if one did I would need to update every target document that contained the old username.
I can only imagine one way to solve this: read out all the full_names and query them in the user collection. The result would be sortable, but it is not paged, so it takes a lot of resources to do that with large documents.
Is there anything I am missing with this problem?
Thanks in advance
If comments are an embedded array, you will have to do work on the client side to sort the comments array unless you store it in sorted order. Your application requirements for username force you to either read out all of the usernames of the users who commented to do the sort, or to store the username in the comments and have (much) more difficult and expensive updates.
Sorting and pagination don't work unless you can return the documents in sorted order. You should consider a different schema where comments form a separate collection so that you can return them in sorted order and paginate them. Store the username in each comment to facilitate the sort on the MongoDB side. Depending on your application's usage pattern this might work better for you.
It also seems strange to sort on usernames and expect/allow usernames to change frequently. If you could drop these requirements it'd make your life easier :D
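Here is a rough in-memory sketch of what the separate-collection layout buys you. The sample comments and the pageOfComments helper are hypothetical; the point is only that filter, sort, skip and limit can all be applied to top-level documents.

```javascript
// Hypothetical comments stored as their own documents, each carrying the
// username it should be sorted by.
const comments = [
  { _id: 1, target_id: "t1", username: "carol", message: "third" },
  { _id: 2, target_id: "t1", username: "alice", message: "first" },
  { _id: 3, target_id: "t1", username: "bob", message: "second" },
  { _id: 4, target_id: "t2", username: "aaron", message: "other target" }
];

// In-memory equivalent of filter + sort + skip + limit.
function pageOfComments(all, targetId, page, pageSize) {
  return all
    .filter((c) => c.target_id === targetId)
    .sort((a, b) => (a.username < b.username ? -1 : a.username > b.username ? 1 : 0))
    .slice(page * pageSize, (page + 1) * pageSize);
}

console.log(pageOfComments(comments, "t1", 0, 2).map((c) => c.username));
```

In the mongo shell the same page would be roughly db.comments.find({target_id: "t1"}).sort({username: 1}).skip(0).limit(2), assuming a comments collection shaped like the array above.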

Mongoid: retrieving documents whose _id exists in another collection

I am trying to fetch the documents from a collection based on the existence of a reference to these documents in another collection.
Let's say I have two collections Users and Courses and the models look like this:
User: {_id, name}
Course: {_id, name, user_id}
Note: this is just a hypothetical example and not the actual use case, so let's assume that duplicates are fine in the name field of Course. Let's think of Course as CourseRegistrations.
Here, I am maintaining a reference to User in Course, with user_id holding the _id of the User. Note that it's stored as a string.
Now I want to retrieve all users who are registered for a particular set of courses.
I know that it can be done with two queries: first run a query to get the user_id field from the Course collection for the set of courses, then query the User collection using $in with the user ids retrieved by the first query. But this may not be good if the number of documents is in the tens of thousands or more.
Is there a better way to do this in just one query?
What you are describing is a typical SQL join. MongoDB has no direct equivalent for a plain query like this (newer versions do offer a $lookup aggregation stage). As you already suggested, you can do it in two separate queries.
There is one more way to handle it. It's not exactly a solution, but it is a valid workaround in NoSQL databases: store the most frequently accessed fields inside the same collection.
You can store some of the user collection's fields inside the course collection as an embedded field.
Course: {
  _id: 'xx',
  name: 'yy',
  user: {
    fname: 'r',
    lname: 'v',
    pic: 's'
  }
}
This is a good approach if the subset of fields you intend to retrieve from the user collection is small. You might be wondering about the redundant user data stored in the course collection, but that's exactly what makes MongoDB powerful. It's a one-time insert, but your queries will be a lot faster.
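For comparison, the two-query approach from the question can be sketched in plain JavaScript. The sample registrations and the usersFilter helper are made up for illustration; the helper only builds the filter for the second query.

```javascript
// Hypothetical result of the first query: registrations for the wanted courses.
const registrations = [
  { _id: "c1", name: "math", user_id: "u1" },
  { _id: "c2", name: "math", user_id: "u2" },
  { _id: "c3", name: "physics", user_id: "u1" }
];

// Build the filter for the second query, de-duplicating the ids first so the
// $in list does not grow with repeated registrations.
function usersFilter(regs) {
  const ids = Array.from(new Set(regs.map((r) => r.user_id)));
  return { _id: { $in: ids } };
}

console.log(JSON.stringify(usersFilter(registrations)));
```

De-duplicating before the second query keeps the $in list as small as the number of distinct users, which matters once the registrations run into the tens of thousands.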

MongoDB - forcing stored value to uppercase and searching

In the SQL world I could do something to the effect of:
SELECT name FROM table WHERE UPPER(name) = UPPER('Smith');
and this would match a search for "Smith", "SMITH", "SmiTH", etc... because it forces the query and the value to be the same case.
However, MongoDB doesn't seem to have this capability without using a RegEx, which won't use indexes and would be slow for a large amount of data.
Is there a way to convert a stored value to a particular case before doing a search against it in MongoDB?
I've come across the $toUpper aggregate, but I can't figure out how that would be used in this particular case.
If there's no way to convert stored values before searching, is it possible to have MongoDB convert a value when it's created in Mongo? So when I add a document to the collection, it would force the "name" attribute to a particular case? Something like a callback in the Rails world.
It looks like there's the ability to create stored JS for MongoDB as well, similar to a Stored Procedure. Would that be a feasible solution as well?
Mostly looking for a push in the right direction; I can figure out the particular code once I know what I'm looking for, but so far I'm not even sure if my desired functionality is doable.
You have to normalize your data before storing it. There is no support for performing normalization as part of a query at runtime.
The simplest thing to do is probably to save both a case-normalized (i.e. all-uppercase) and display version of the field you want to search by. Suppose you are storing users and want to do a case-insensitive search on last name. You might store:
{
_id: ObjectId(...),
first_name: "Dan",
last_name: "Crosta",
last_name_upper: "CROSTA"
}
You can then create an index on last_name_upper, and query like:
> db.users.find({last_name_upper: "CROSTA"})
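On the application side, the normalization step could look something like this sketch. The withSearchKey and lastNameFilter helpers are hypothetical names; the idea is just to upper-case once on write and once on each search term.

```javascript
// Add the upper-cased search key before the document is saved.
function withSearchKey(user) {
  return Object.assign({}, user, { last_name_upper: user.last_name.toUpperCase() });
}

// Normalize the search term the same way when building the filter.
function lastNameFilter(term) {
  return { last_name_upper: term.toUpperCase() };
}

console.log(withSearchKey({ first_name: "Dan", last_name: "Crosta" }));
console.log(lastNameFilter("crosta"));
```

Because both sides go through the same toUpperCase call, "Crosta", "CROSTA" and "crosta" all end up as the same indexed value.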