Fast Searching Based on "custom fields" along with other fields - mongodb

We have a legacy implementation of User Groups that does far more than the name implies. Users can be assigned to a group, and groups can be arranged in a hierarchy. Groups can also have system-wide permissions assigned to them, or a group can be used by another module for permissions. You can even define a permission with a complicated rule such as
((In Group1 or Group2) and (In Group3 and Group4)) or (In Group5 and (not IN Group1 or Group2))
When a permission like this is created, it will actually select all the users that match this, create a "derived group" and then assign those users to the new group.
Our new application has a completely different permissions system that handles these use cases well; it is attribute based rather than group/role based.
That said, groups are still used for things other than permissions. We might build a report based on a group, send emails to a group, and so on. We still need this functionality.
It also looks like we are moving our user information into MongoDB, because each client can customize the fields available for a user to populate. A user might have a "job title" for one client, while for another client the same field is called "designation" or "position". We call these "custom fields", and a client can create as many of them as they want.
So that's the backstory. My issue is that I don't really want to create a new "groups" implementation, since all we really need is a way to create and save filters for users. When we need to send an email to a specific subset of users, we would either use a filter that has already been saved or create a new one.
So this is the original format for the user document in MongoDB:
{
"id": 123456,
"username": "john.smith#domain.com",
"first_name": "John",
"last_name": "Smith",
"email": "john.smith#domain.com",
"employee_type": "permanent",
"account": {
"enabled": true,
"locked": false,
"redeem_only": false
},
"custom_fields": {
"job_title": "Cashier",
"branch_code": "000123",
"social_team_name": "The Terrible Trolls"
}
},
{
"id": 123457,
"username": "jane.smith#domain.com",
"first_name": "Jane",
"last_name": "Smith",
"email": "john.smith#domain.com",
"employee_type": "permanent",
"account": {
"enabled": true,
"locked": false,
"redeem_only": false
},
"custom_fields": {
"job_title": "Mortgage Consultant",
"branch_code": "000123",
"social_team_name": "Team Savage"
}
},
{
"id": 123458,
"username": "morgan.jones#domain.com",
"first_name": "Morgan",
"last_name": "Jones",
"email": "morgan.jones#domain.com",
"employee_type": "permanent",
"account": {
"enabled": true,
"locked": false,
"redeem_only": false
},
"custom_fields": {
"job_title": "Regional Manager",
"branch_code": "000124",
"social_team_name": "The Terrible Trolls"
}
}
So we might want to create a filter where account.enabled = true AND employee_type = 'permanent' AND custom_fields.branch_code = '000124'. A filter could combine any of the fields in any way.
Ultimately I'm wondering whether this structure is the best way to do this. I know I can use wildcard indexes to index the custom fields, but I'm still limited in the number of indexes I can create. If a query uses a field that isn't indexed, or we've hit our limit for creating indexes, things are going to start slowing down.
Another structure I saw is as follows:
{
"id": 123456,
"username": "john.smith#domain.com",
"first_name": "John",
"last_name": "Smith",
"email": "john.smith#domain.com",
"employee_type": "permanent",
"account": {
"enabled": true,
"locked": false,
"redeem_only": false
},
"custom_fields": [
{
"k": "Job Title",
"v": "Cashier"
},
{
"k": "Branch Code",
"v": "000123"
},
{
"k": "Social Team Name",
"v": "The Terrible Trolls"
}
]
},
{
"id": 123457,
"username": "jane.smith#domain.com",
"first_name": "Jane",
"last_name": "Smith",
"email": "john.smith#domain.com",
"employee_type": "permanent",
"account": {
"enabled": true,
"locked": false,
"redeem_only": false
},
"custom_fields": [
{
"k": "Job Title",
"v": "Mortgage Consultant"
},
{
"k": "Branch Code",
"v": "000123"
},
{
"k": "Social Team Name",
"v": "Team Savage"
}
]
},
{
"id": 123458,
"username": "morgan.jones#domain.com",
"first_name": "Morgan",
"last_name": "Jones",
"email": "morgan.jones#domain.com",
"employee_type": "permanent",
"account": {
"enabled": true,
"locked": false,
"redeem_only": false
},
"custom_fields": [
{
"k": "Job Title",
"v": "Regional Manager"
},
{
"k": "Branch Code",
"v": "000124"
},
{
"k": "Social Team Name",
"v": "The Terrible Trolls"
}
]
}
However, I'm not sure whether this would be any better, as we would still be limited by the number of indexes we can create.
Is there a viable solution for this (links to articles/resources would be great)? Or am I going to end up saving a "filter", selecting all the users that match it, and assigning them to that filter for easy lookup, only to have to rebuild it every time a user updates their information, gets promoted, or otherwise changes a field value?
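One possible direction, sketched here under assumptions rather than offered as a definitive answer: store each saved filter as data and translate it into a query when it is needed, so nothing has to be rebuilt when a user changes. The sketch assumes the key/value layout of custom_fields shown above. The saved-filter shape, the function name buildUserQuery, and the collection name users are all hypothetical. With that layout, one compound index on custom_fields.k and custom_fields.v can in principle serve every custom field, which is the usual argument for the pattern.

```javascript
// Hypothetical saved-filter shape: fixed fields are matched by path,
// custom fields by key/value pair.
const savedFilter = {
  fixed: { "account.enabled": true, employee_type: "permanent" },
  custom: { "Branch Code": "000124" },
};

// Translate a saved filter into a MongoDB query document.
// Each custom field becomes an $elemMatch so that k and v must match
// inside the same array element.
function buildUserQuery(filter) {
  const clauses = Object.entries(filter.custom || {}).map(([k, v]) => ({
    custom_fields: { $elemMatch: { k, v } },
  }));
  const query = { ...(filter.fixed || {}) };
  if (clauses.length > 0) {
    query.$and = clauses;
  }
  return query;
}

const query = buildUserQuery(savedFilter);
console.log(JSON.stringify(query, null, 2));

// In the mongo shell the result could then be used like this
// (collection name "users" is an assumption):
//   db.users.createIndex({ "custom_fields.k": 1, "custom_fields.v": 1 })
//   db.users.find(query)
```

Because the filter is evaluated at query time, a user who is promoted or moves branch is picked up automatically the next time the filter runs. Whether the single index is fast enough for filters that combine many custom fields would need to be measured on real data.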

Related

Updating Mongo DB collection field from object to array of objects

I had to change one of the fields of my MongoDB collection from an object to an array of objects containing a lot of data. New documents are inserted without any problem, but when I try to read old documents, they never map to the DTO correctly and I run into errors.
subject is the field that was changed in the Students collection.
Is there any way to update all the records so that they all have the same data type, without losing any data?
The old version of Student:
{
"_id": "5fb2ae251373a76ae58945df",
"isActive": true,
"details": {
"picture": "http://placehold.it/32x32",
"age": 17,
"eyeColor": "green",
"name": "Vasquez Sparks",
"gender": "male",
"email": "vasquezsparks#orbalix.com",
"phone": "+1 (962) 512-3196",
"address": "619 Emerald Street, Nutrioso, Georgia, 6576"
},
"subject":
{
"id": 0,
"name": "math",
"module": {
"name": "Advanced",
"semester": "second"
}
}
}
This needs to be updated to the new version like this:
{
"_id": "5fb2ae251373a76ae58945df",
"isActive": true,
"details": {
"picture": "http://placehold.it/32x32",
"age": 17,
"eyeColor": "green",
"name": "Vasquez Sparks",
"gender": "male",
"email": "vasquezsparks#orbalix.com",
"phone": "+1 (962) 512-3196",
"address": "619 Emerald Street, Nutrioso, Georgia, 6576"
},
"subject": [
{
"id": 0,
"name": "math",
"module": {
"name": "Advanced",
"semester": "second"
}
},
{
"id": 1,
"name": "history",
"module": {
"name": "Basic",
"semester": "first"
}
},
{
"id": 2,
"name": "English",
"module": {
"name": "Basic",
"semester": "second"
}
}
]
}
I understand I could rename the old collection, create a new one, and insert data into it based on the old one. I was hoping for a more direct way.
The goal is to turn subject into an array of one element if it is not already an array, and otherwise leave it alone. The following will do the trick. It uses the pipeline form of update, which requires MongoDB 4.2 or later.
The update arguments are (predicate, actions, options).
db.foo.update(
// Match only those docs where subject is an object (i.e. not turned into array):
{$expr: {$eq:[{$type:"$subject"},"object"]}},
// Actions: set subject to be an array containing $subject. You MUST use the pipeline version
// of the update actions to correctly substitute $subject in the expression!
[ {$set: {subject: ["$subject"] }} ],
// Do this for ALL matches, not just first:
{multi:true});
You can run this converter over and over because it will ignore converted docs.
If the goal is to convert and add some new subjects, preserving the first one, then we can set up the additional subjects and concatenate them into one array as follows:
var mmm = [ {id:8, name:"CORN"}, {id:9, name:"DOG"} ];
rc = db.foo.update({$expr: {$eq:[{$type:"$subject"},"object"]}},
[ {$set: {subject: {$concatArrays: [["$subject"], mmm]} }} ],
{multi:true});
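The claim that the converter can be run repeatedly can be checked on plain objects. The following is only an in-memory emulation of the update's logic, not MongoDB code; toSubjectArray is a hypothetical helper name.

```javascript
// Emulates the update on a single plain object.
// Mirrors the predicate {$type: "$subject"} == "object": only a non-null,
// non-array object is wrapped; anything else is returned unchanged.
function toSubjectArray(student, extra = []) {
  const s = student.subject;
  const isPlainObject = s !== null && typeof s === "object" && !Array.isArray(s);
  if (!isPlainObject) {
    return student; // already converted (or nothing to convert)
  }
  // Equivalent in spirit to $concatArrays: [["$subject"], extra]
  return { ...student, subject: [s, ...extra] };
}

const oldDoc = { _id: "5fb2ae251373a76ae58945df", subject: { id: 0, name: "math" } };
const once = toSubjectArray(oldDoc);
const twice = toSubjectArray(once); // second run is a no-op
console.log(JSON.stringify(once.subject), once === twice);
```

The second call returns the same object it was given, which is the in-memory counterpart of the update skipping documents whose subject is already an array.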

Delete sub-document from array in array of sub documents

Let's imagine a MongoDB collection of, say, magazines. For some reason, we've ended up storing each issue of the magazine as a separate document. Each article is a subdocument inside an Articles array, and the authors of each article are represented as subdocuments inside the Writers array on the article subdocument. Only the name and email of the author are stored inside the article, but there is a Writers array at the magazine level containing more information about each author.
{
"Title": "The Magazine",
"Articles": [
{
"Title": "Mongo Queries 101",
"Summary": ".....",
"Writers": [
{
"Name": "tom",
"Email": "tom#example.com"
},
{
"Name": "anna",
"Email": "anna#example.com"
}
]
},
{
"Title": "Why not SQL instead?",
"Summary": ".....",
"Writers": [
{
"Name": "mike",
"Email": "mike#example.com"
},
{
"Name": "anna",
"Email": "anna#example.com"
}
]
}
],
"Writers": [
{
"Name": "tom",
"Email": "tom#example.com",
"Web": "tom.example.com"
},
{
"Name": "mike",
"Email": "mike#example.com",
"Web": "mike.example.com"
},
{
"Name": "anna",
"Email": "anna#example.com",
"Web": "anna.example.com"
}
]
}
How can one author be completely removed from a magazine?
Finding magazines where the unwanted author exists is quite easy. The problem is pulling the author out of all the subdocuments.
MongoDB 3.6 introduced some new placeholder operators, $[] and $[<identifier>], and I suspect these could be used with either $pull or $pullAll, but so far I haven't had any success.
Is it possible to do this in one go? Or at least in no more than two queries: one for removing the author from all the articles, and one for removing the biography from the magazine?
You can try below query.
db.col.update(
{},
{"$pull":{
"Articles.$[].Writers":{"Name": "tom","Email": "tom#example.com"},
"Writers":{"Name": "tom","Email": "tom#example.com"}
}},
{"multi":true}
);
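To make the intended effect of that update concrete, here is a plain JavaScript emulation run on an in-memory document shaped like the one in the question. It is only an illustration: removeWriter is a hypothetical helper, not a MongoDB API, and it matches writers on Name and Email while ignoring any other fields, which is an assumption of this sketch.

```javascript
// Emulates the intended effect of
//   $pull: { "Articles.$[].Writers": w, "Writers": w }
// on one plain object. Returns a new object; the input is not modified.
function removeWriter(magazine, writer) {
  const differs = (w) => w.Name !== writer.Name || w.Email !== writer.Email;
  return {
    ...magazine,
    // "Articles.$[].Writers": visit every article and filter its Writers
    Articles: magazine.Articles.map((article) => ({
      ...article,
      Writers: article.Writers.filter(differs),
    })),
    // "Writers": filter the magazine-level list as well
    Writers: magazine.Writers.filter(differs),
  };
}

const magazine = {
  Title: "The Magazine",
  Articles: [
    { Title: "Mongo Queries 101", Writers: [
      { Name: "tom", Email: "tom@example.com" },
      { Name: "anna", Email: "anna@example.com" },
    ] },
    { Title: "Why not SQL instead?", Writers: [
      { Name: "mike", Email: "mike@example.com" },
      { Name: "anna", Email: "anna@example.com" },
    ] },
  ],
  Writers: [
    { Name: "tom", Email: "tom@example.com", Web: "tom.example.com" },
    { Name: "mike", Email: "mike@example.com", Web: "mike.example.com" },
    { Name: "anna", Email: "anna@example.com", Web: "anna.example.com" },
  ],
};

const cleaned = removeWriter(magazine, { Name: "tom", Email: "tom@example.com" });
console.log(JSON.stringify(cleaned, null, 2));
```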

MongoDB database design for users and their data

I'm new to MongoDB and to databases in general. I'm trying to make a basic property app with Express and MongoDB for practice.
I'm looking for some help on the best way to design the schema.
Basically, my app would have landlords and tenants. Each landlord would have a number of properties, with information stored about each one: lease terms, tenant name, maintenance requests, images, etc.
The tenants would be able to sign up and be associated with the property they live in. They could submit maintenance forms, etc.
Is this a good approach? Should everything be kept in the same collection? Thanks.
{
"_id": "507f1f77bcf86cd799439011",
"user": "Corey",
"password": "hashed#PASSWORD",
"email": "corey#email.com",
"role": "landlord",
"properties": [
{
"addressId": "1",
"address": "101 Main Street",
"tenant": "John Smith",
"leaseDate": "04/21/2016",
"notes": "These are my notes about this property.",
"images": [ "http://www.imagelink.com/image1", "http://www.imagelink.com/image2", "http://www.imagelink.com/image3"]
},
{
"addressId": "2",
"address": "105 Maple Street",
"tenant": "John Jones",
"leaseDate": "01/01/2018",
"notes": "These are my notes about 105 Maple Ave property.",
"images": ["http://www.imagelink.com/image1", "http://www.imagelink.com/image2", "http://www.imagelink.com/image3"],
"forms": [
{
"formType": "lease",
"leaseTerm": "12 months",
"leaseName": "John Jones",
"leaseDate": "01/01/2018"
},
{
"formType": "maintenance",
"maintenanceNotes": "Need furnace looked at. Doesn't heat properly.",
"maintenanceName": "John Jones",
"maintenanceDate": "01/04/2018",
"status": "resolved"
}
]
},
{
"addressId": "3",
"address": "110 Chestnut Street",
"tenant": "John Brown",
"leaseDate": "07/28/2014",
"notes": "These are some notes about 110 Chestnut Ave property.",
"images": [ "http://www.imagelink.com/image1", "http://www.imagelink.com/image2", "http://www.imagelink.com/image3"]
}
]
}
{
"_id": "507f1f77bcf86cd799439012",
"user": "John",
"password": "hashed#PASSWORD",
"email": "john#email.com",
"role": "tenant",
"address": "2",
"images": [ "http://www.imagelink.com/image1", "http://www.imagelink.com/image2" ]
}
For this relationship I'd suggest three collections (Landlords, Properties, and Tenants), with each tenant having a "landLordId" and a "propertyId".
The "landLordId" would simply be the ObjectId of the landlord, and likewise for the "propertyId".
This will make your life easier if you plan to do any kind of roll-up reports, or if you have more than one-to-one mappings of landlords to properties or landlords to tenants (for example, more than one property manager for a given property).
It also makes everything easier and more intuitive, as you could simply add things like maintenance requests, lease terms, etc. in arrays on the tenants, with references to whatever else is needed.
This offers the most flexibility for aggregating data for any kind of report or query.
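As a rough illustration of that layout, the documents might look like the following. Only "landLordId" and "propertyId" on the tenant come from the suggestion above; every other field name is carried over from the question or assumed, including the landLordId on each property. The ids are short placeholder strings rather than real ObjectIds, and tenantsForLandlord is a hypothetical in-memory stand-in for a query such as db.tenants.find({ landLordId: ... }).

```javascript
// Three separate collections, represented here as plain arrays.
const landlords = [
  { _id: "L1", user: "Corey", email: "corey@email.com" },
];

const properties = [
  { _id: "P1", landLordId: "L1", address: "101 Main Street" },
  { _id: "P2", landLordId: "L1", address: "105 Maple Street" },
];

const tenants = [
  {
    _id: "T1",
    user: "John",
    landLordId: "L1",  // reference to the landlord
    propertyId: "P2",  // reference to the property
    maintenanceRequests: [
      { notes: "Need furnace looked at.", date: "01/04/2018", status: "resolved" },
    ],
  },
];

// Roll-up example: every tenant of a landlord together with their address.
function tenantsForLandlord(landLordId) {
  return tenants
    .filter((t) => t.landLordId === landLordId)
    .map((t) => {
      const property = properties.find((p) => p._id === t.propertyId);
      return { tenant: t.user, address: property ? property.address : null };
    });
}

console.log(tenantsForLandlord("L1"));
```

Keeping the references on the tenant means a property or landlord can be renamed in one place without touching tenant documents.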

Search people on facebook by name and get fields=name, id, education

I want to create a search engine that searches for people by name, and I need these fields: id, name, education.
Is this possible?
I am using this request:
https://graph.facebook.com/search?q=tom&type=user&fields=id,name,education&access_token=XXX
but the education info is missing from the result:
{ "data": [
{
"id": "123456789",
"name": "name0"
},
{
"id": "123456789",
"name": "name1"
},
{
"id": "123456789",
"name": "name2"
},
{
"id": "123456789",
"name": "name3"
}, ...
That's not possible. You can only get the education of users who have authorized your app with the user_education_history permission. This would be the API call:
https://graph.facebook.com/me?fields=education

How can I query an indexed object list in mongodb?

I have some documents in the "company" collection structured this way:
[
{
"company_name": "Company 1",
"contacts": {
"main": {
"email": "main#company1.com",
"name": "Mainuser"
},
"store1": {
"email": "store1#company1.com",
"name": "Store1 user"
},
"store2": {
"email": "store2#company1.com",
"name": "Store2 user"
}
}
},
{
"company_name": "Company 2",
"contacts": {
"main": {
"email": "main#company2.com",
"name": "Mainuser"
},
"store1": {
"email": "store1#company2.com",
"name": "Store1 user"
},
"store2": {
"email": "store2#company2.com",
"name": "Store2 user"
}
}
}
]
I'm trying to retrieve the document that has store1#company2.com as a contact, but I cannot find out how to query a specific value of a specific property in an "indexed" list of objects.
My feeling is that the contacts list should not be indexed, which would result in the following structure:
{
"company_name": "Company 1",
"contacts": [
{
"email": "main#company1.com",
"name": "Mainuser",
"label": "main"
},
{
"email": "store1#company1.com",
"name": "Store1 user",
"label": "store1"
},
{
"email": "store2#company1.com",
"name": "Store2 user",
"label": "store2"
}
]
}
This way I can retrieve matching documents with the following request:
db.company.find({"contacts.email":"main#company1.com"})
But is there any way to do a similar request on documents using the previous structure?
Thanks a lot for your answers!
P.S.: same question for documents structured this way:
{
"company_name": "Company 1",
"contacts": {
"0": {
"email": "main#company1.com",
"name": "Mainuser"
},
"4": {
"email": "store1#company1.com",
"name": "Store1 user"
},
"1": {
"email": "store2#company1.com",
"name": "Store2 user"
}
}
}
Short answer: yes, they can be queried but it's probably not what you want and it's not going to be really efficient.
The document structure in the first and third blocks is basically the same: you have an embedded document. The only difference is the names of the keys in the contacts object.
To query documents with that kind of structure you will have to write a query like this:
db.company.find({ $or : [
{"contacts.main.email":"main#company1.com"},
{"contacts.store1.email":"main#company1.com"},
{"contacts.store2.email":"main#company1.com"}
]});
This query will not be efficient, especially if you have a lot of keys in the contacts object. Building the query will also be unnecessarily difficult and error prone.
The second document structure, with an array of embedded objects, is optimal. You can create a multikey index on the contacts array, which will make your query faster. As a bonus, you can use a short and simple query.
I think the easiest option is to shape your documents using the structure described in your second example (I have not fixed the JSON):
{
"company_name": "Company 1",
"contacts":{[
{"email":"main#company1.com","name":"Mainuser", "label": "main", ...}
{"email":"store1#company1.com","name":"Store1 user", "label": "store1",...}
{"email":"store2#company1.com","name":"Store2 user", "label": "store2",...}
]}
}
That way you can easily query on email independently of the "label".
If you really want to use the other structure (you would need to fix the JSON too), you will have to write more complex code or an aggregation pipeline, since the names and number of attributes are not known when querying. These structures are also probably hard for developers to work with, independently of MongoDB queries.
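For completeness, here is a sketch of what such an aggregation pipeline could look like if the object-keyed structure has to stay. It relies on the $objectToArray aggregation operator, available in reasonably recent MongoDB versions, which turns { main: {...} } into [{ k: "main", v: {...} }]. The function names and the temporary field contactList are hypothetical. Since the match runs on a computed field, it cannot use an index on the stored documents.

```javascript
// Builds an aggregation pipeline that finds companies by contact email
// when "contacts" is an object whose keys are not known in advance.
function pipelineForContactEmail(email) {
  return [
    // Convert the contacts object into an array of { k, v } pairs
    { $addFields: { contactList: { $objectToArray: "$contacts" } } },
    // Match any pair whose value has the requested email
    { $match: { "contactList.v.email": email } },
    // Drop the temporary field from the output
    { $project: { contactList: 0 } },
  ];
}

// In-memory equivalent of the same idea, for illustration only.
function findByContactEmail(companies, email) {
  return companies.filter((c) =>
    Object.values(c.contacts || {}).some((contact) => contact.email === email)
  );
}

const companies = [
  { company_name: "Company 1", contacts: {
    main: { email: "main@company1.com", name: "Mainuser" },
    store1: { email: "store1@company1.com", name: "Store1 user" },
  } },
  { company_name: "Company 2", contacts: {
    main: { email: "main@company2.com", name: "Mainuser" },
    store1: { email: "store1@company2.com", name: "Store1 user" },
  } },
];

console.log(findByContactEmail(companies, "store1@company2.com").map((c) => c.company_name));
// In the mongo shell the pipeline would be passed to aggregate:
//   db.company.aggregate(pipelineForContactEmail("store1@company2.com"))
```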
Since that was not clear, here is what I have in mind:
// insertMany replaces the deprecated save() calls
db.company.insertMany([
{
"company_name": "Company 1",
"contacts":[
{"email":"main#company1.com","name":"Mainuser", "label": "main"},
{"email":"store1#company1.com","name":"Store1 user", "label": "store1"},
{"email":"store2#company1.com","name":"Store2 user", "label": "store2"}
]
},
{
"company_name": "Company 2",
"contacts":[
{"email":"main#company2.com","name":"Mainuser", "label": "main"},
{"email":"store1#company2.com","name":"Store1 user", "label": "store1"},
{"email":"store2#company2.com","name":"Store2 user", "label": "store2"}
]
}
]);
// createIndex replaces the deprecated ensureIndex; because contacts is an array,
// this becomes a multikey index
db.company.createIndex( { "contacts.email" : 1 } );
// Matches any document with at least one contact that has this email
db.company.find( { "contacts.email" : "store1#company2.com" } );
This allows you to store many emails, and query with an index.