Elastic/Nearest search based on document properties in MongoDB

We need to implement a nearest-match search based on document properties in MongoDB.
Let's take an example: there is a Car schema in MongoDB, and information will be stored similar to this:
{
  Make: "Hyundai",
  Model: "Creta",
  Title: "Hyundai Creta E 1.6 Petrol",
  Description: "Compact SUV",
  Feature: {
    ABS: true,
    EBD: true,
    Speakers: 4,
    Display: false
  },
  Specification: {
    Length: "4270 mm",
    Width: "1780 mm",
    Height: "1630 mm",
    Wheelbase: "2590 mm",
    Doors: 5,
    Seating: 5,
    Displacement: "1591 cc"
  },
  Safety: {
    Airbags: 2,
    SeatBeltWarning: false
  },
  Maintenance: {
    LastService: "21/06/2016",
    WashingDone: true
  }
}
Search needs to be provided based on following criteria:
1. Make
2. Model
3. ABS
4. Seating
5. Displacement
6. Airbags
Now results should contain records where 3 or more of the properties match exactly, ordered by the number of matching properties.
What is the best way to implement this with MongoDB?

You could write something to generate a clause for each triplet of fields, and then put them together with $or, producing something like:
{$or: [
  {Make: "Hyundai", Model: "Creta", "Feature.ABS": true},
  {Make: "Hyundai", Model: "Creta", "Specification.Seating": 5},
  ...
]}
Note that the dotted paths ("Feature.ABS", "Specification.Seating") must be quoted. A plain find() cannot order the results by how many properties matched, though; textScore only applies to $text searches, so ranking by match count is a job for the aggregation framework.
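A hedged sketch of that aggregation, assuming the collection is named cars (and MongoDB 3.4+, since it uses $addFields): score each document by the number of exactly-matching criteria, keep those with 3 or more, and sort by the score.
// Count exact matches across the six searchable properties,
// then filter and rank by that count.
db.cars.aggregate([
  { $addFields: { matches: { $add: [
    { $cond: [{ $eq: ["$Make", "Hyundai"] }, 1, 0] },
    { $cond: [{ $eq: ["$Model", "Creta"] }, 1, 0] },
    { $cond: [{ $eq: ["$Feature.ABS", true] }, 1, 0] },
    { $cond: [{ $eq: ["$Specification.Seating", 5] }, 1, 0] },
    { $cond: [{ $eq: ["$Specification.Displacement", "1591 cc"] }, 1, 0] },
    { $cond: [{ $eq: ["$Safety.Airbags", 2] }, 1, 0] }
  ] } } },
  { $match: { matches: { $gte: 3 } } },
  { $sort: { matches: -1 } }
])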

Related

What's the best way to update a Map/Dictionary which is part of a MongoDb document?

I'm new to MongoDB, so I'm not sure what the best approach is regarding the following:
I have a MongoDB document which contains multiple fields, including a map/dictionary (priceHistogram), e.g.:
rents {
  _id: "1234",
  city: "London",
  currentPrice: "500",
  priceHistogram: { "14-02-2021": "500" }
}
I would like to update the currentPrice field with the latest price, and also add today's date and price to the price histogram; e.g. if today's price were 600, I would like to obtain the following:
rents {
  _id: "1234",
  city: "London",
  currentPrice: "600",
  priceHistogram: { "14-02-2021": "500", "20-02-2021": "600" }
}
What would be the most efficient MongoDb function/approach allowing me to achieve this (everything else remains the same - _id/city)?
Thank you
I'm not sure what your schema looks like, so I will assume it is similar to:
const rentsSchema = mongoose.Schema({
  city: { type: String, required: true },
  currentPrice: { type: String },
  priceHistogram: { type: Map, of: String }
});
const rents = mongoose.model('Rents', rentsSchema);
And the update:
rents.updateOne(
  { city: "London" },
  { $set: { currentPrice: "600", "priceHistogram.24-02-2021": "600" } }
)
As I understand it, a Map is just another way to store arbitrary properties, so its keys can be addressed with dot notation.
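A small usage sketch of the same update with the key built at runtime; the date-formatting line is my own assumption about how you derive today's key:
// Build the dotted Map key ("DD-MM-YYYY") from the current date.
const today = new Date().toLocaleDateString('en-GB').replace(/\//g, '-'); // e.g. "24-02-2021"
await rents.updateOne(
  { city: 'London' },
  { $set: { currentPrice: '600', [`priceHistogram.${today}`]: '600' } }
);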

How to structure a firestore collection for items with variants, when using Algolia Search?

So as the title implies I've got an items collection in firestore that looks like this:
ParentCollection
  items
    ->docId_1
      ->name: 'Shirt'
      ->price: 5
      ->tags: ['tag1', 'tag2', 'tag3']
      ->attribute1: 'blah'
      ->attribute2: 'blahblah'
However, I need these items to have variants, such as different sizes/colors/etc. In this structure the only way is to create a completely new item for each variant, which isn't ideal.
Here's my current thinking on the requirements:
A parent item that houses all variants, has a unique docId
Each variant has a unique docId but is attached somehow to the parent Item
Common attributes are fields on the parent item (name, company, tags etc)
Unique attributes are attributed to each variant (available, size/type, price)
My current plan is this:
ParentCollection
  items
    ->parentDocId_1
      ->name: 'Shirt'
      ->variants: [{
          size: 'S',
          available: true,
          variantDocId: variantDocId_1
        },{
          size: 'L',
          available: false,
          variantDocId: variantDocId_2
        }]
      ->tags: ['tag1', 'tag2', 'tag3']
      ->attribute1: 'blah'
      ->attribute2: 'blahblah'
SubCollection
  variants
    ->variantDocId_1
      ->size: 'S'
      ->price: 5
      ->parentDocId: parentDocId_1
    ->variantDocId_2
      ->size: 'L'
      ->price: 10
      ->parentDocId: parentDocId_1
I can see a couple issues with this.
It will require extra db calls since Firestore cannot get the sub Collection when requesting the parent collection.
A big problem is that I need to attach an available attribute to the items. Previously I had it at the item collection level, but now that I have variants I'll need to put it on each variant. If I have it in the variants array at the parent collection level, I don't think I can filter for available in Algolia search anymore, as there will be multiple values for the one item. If I put available down in the variants sub collection, it is not indexed by Algolia, as Algolia only indexes documents in the items collection. Not sure what the solution is here.
I'd really prefer not to have a sub collection at all, I suppose I could do the above without a sub collection and just merge all the unique attributes into the variants array. But then I won't have a unique docId for each variant, and I'm pretty sure I need that (not totally sure yet). Also it doesn't fix my available attribute issue.
Any thoughts on how to properly do this? Is there a way to do it without a sub collection?
Turns out the solution to this issue has more to do with Algolia search than Firestore. In order to achieve what I want, I can keep my Firestore database very close to what it was originally (no sub collection necessary); I just need to add an array of objects that contains the information unique to each variant (variants: []), as well as an identifier to bind the individual documents together as variants of each other (distinct: 12345).
The key here is Algolia's distinct feature, which allows de-duplication of items that are bound together by a specific key. So in the example below I tie the three variants of the Shirt item together by the distinct: 12345 field. You have to go into the Algolia dashboard to turn on distinct and set the key name. Now only one of the variants will show when searching (which one depends on custom rankings or filters), but I will have access to the info of all of them via the variants field. This allows me to have unique ids for each variant, as well as have all the attributes of each be filterable within Algolia search. It also allows me to build a selection dropdown to choose which variant the user would like to interact with. One caveat is that redundancy is added, and updating an item variant will require a cloud function to propagate the change across all variants (see the sketch after the structure below). But it works, and problem solved!
ParentCollection
  items
    ->docId_1
      ->name: 'Shirt'
      ->tags: ['tag1', 'tag2', 'tag3']
      ->attribute1: 'blah'
      ->attribute2: 'blahblah'
      ->distinct: 12345
      ->variants: [
        {
          size: 'S',
          available: true,
          price: 5,
          docId: docId_1
        },
        {
          size: 'M',
          available: false,
          price: 10,
          docId: docId_2
        },
        {
          size: 'L',
          available: true,
          price: 15,
          docId: docId_3
        }]
    ->docId_2
      ->name: 'Shirt'
      ->tags: ['tag1', 'tag2', 'tag3']
      ->attribute1: 'blah'
      ->attribute2: 'blahblah'
      ->distinct: 12345
      ->variants: [
        {
          size: 'S',
          available: true,
          price: 5,
          docId: docId_1
        },
        {
          size: 'M',
          available: false,
          price: 10,
          docId: docId_2
        },
        {
          size: 'L',
          available: true,
          price: 15,
          docId: docId_3
        }]
    ->docId_3
      ->name: 'Shirt'
      ->tags: ['tag1', 'tag2', 'tag3']
      ->attribute1: 'blah'
      ->attribute2: 'blahblah'
      ->distinct: 12345
      ->variants: [
        {
          size: 'S',
          available: true,
          price: 5,
          docId: docId_1
        },
        {
          size: 'M',
          available: false,
          price: 10,
          docId: docId_2
        },
        {
          size: 'L',
          available: true,
          price: 15,
          docId: docId_3
        }]
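For the propagation mentioned above, here is a hedged sketch of such a cloud function, assuming the Firebase Cloud Functions v1 API and a top-level items collection; the name syncVariants is my own:
const functions = require('firebase-functions');
const admin = require('firebase-admin');
admin.initializeApp();

// When one variant document changes, copy its variants array to the
// sibling documents that share the same `distinct` key.
exports.syncVariants = functions.firestore
  .document('items/{docId}')
  .onUpdate(async (change, context) => {
    const before = change.before.data();
    const after = change.after.data();
    // Skip writes that didn't touch `variants`, otherwise the siblings'
    // updates would re-trigger this function indefinitely.
    if (JSON.stringify(before.variants) === JSON.stringify(after.variants)) {
      return null;
    }
    const siblings = await admin.firestore()
      .collection('items')
      .where('distinct', '==', after.distinct)
      .get();
    const batch = admin.firestore().batch();
    siblings.forEach((doc) => {
      if (doc.id !== context.params.docId) {
        batch.update(doc.ref, { variants: after.variants });
      }
    });
    return batch.commit();
  });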

Historical data structure on MongoDB

I need to implement pre-aggregated statistical data, recalculated at a certain time interval, based on the following model:
I have a Product entity and a ProductGroup entity that plays the role of a Products container. I can have 0..N Products and 0..N ProductGroups, with a MANY_2_MANY relationship between Products and ProductGroups.
Based on some of my own business logic I can calculate the order of every Product in every ProductGroup.
I will do this calculation continuously, per some period of time... let's say via a cron job.
I would also like to store the history of every calculation (versions) in order to be able to analyze shifts in Product positions.
I have created a simple picture with this structure:
Right now I use a MongoDB database and am really interested in implementing this structure on MongoDB without introducing new technologies.
My functional requirements: I need the ability to quickly get the position (and position offset) for a certain Product in a certain ProductGroup. Let's say the P2 position and offset for ProductGroup1. The output should be:
position: 1
offset : +2
Also, I'd like to visualize graphs showing the historical position changes of a certain Product within a certain ProductGroup. For example, for Product P2 in ProductGroup1 the output should be:
1(+2), 3(-3), 0
Is it possible to implement with MongoDB and if so, could you please describe the MongoDB collection(s) structure in order to support this?
Since the only limitation is to "quickly query the data as described in the question", the simplest way is to have a collection of snapshots with an array of products:
db.snapshots.insert({
  group: "group 1",
  products: [
    { id: "P2", position: 0, offset: 0 },
    { id: "P4", position: 1, offset: 0 },
    { id: "P5", position: 2, offset: 0 },
    { id: "P6", position: 3, offset: 0 }
  ],
  ver: 0
});
db.snapshots.insert({
  group: "group 1",
  products: [
    { id: "P3", position: 0, offset: 0 },
    { id: "P5", position: 1, offset: 1 },
    { id: "P1", position: 2, offset: 0 },
    { id: "P2", position: 3, offset: -3 },
    { id: "P4", position: 4, offset: 0 }
  ],
  ver: 1
});
The index would be:
db.snapshots.createIndex(
  { group: 1, ver: -1, "products.id": 1 },
  { unique: true, partialFilterExpression: { "products.id": { $exists: true } } }
);
And the query to fetch the current position of a product in a group ("P4" in "group 1" in the example):
db.snapshots.find(
  { group: "group 1" },
  { _id: 0, products: { $elemMatch: { id: "P4" } } }
).sort({ ver: -1 }).limit(1)
A query to fetch the historical data is almost the same:
db.snapshots.find(
  { group: "group 1" },
  { _id: 0, products: { $elemMatch: { id: "P4" } }, ver: 1 }
).sort({ ver: -1 })
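To produce the "1(+2), 3(-3), 0" series from the question, one hedged option is to flatten the snapshots with $unwind; a sketch against the snapshots collection above:
// All historical positions of P2 in "group 1", newest version first.
db.snapshots.aggregate([
  { $match: { group: "group 1" } },
  { $sort: { ver: -1 } },
  { $unwind: "$products" },
  { $match: { "products.id": "P2" } },
  { $project: { _id: 0, ver: 1, position: "$products.position", offset: "$products.offset" } }
])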

How to handle large data sets in MongoDB

I need help deciding which schema type is more appropriate for my MongoDB collection.
Let's say I want to store a list of things a person has. There will be a relatively small number of people, but one person can have very many things. Let's assume people will be counted in the hundreds, but the things a person owns in the hundreds of thousands.
I can think of two options:
Option 1:
[
  {
    id: 1,
    name: "Tom",
    things: [
      {
        name: 'red tie',
        weight: 0.3,
        value: 5
      },
      {
        name: 'carpet',
        weight: 15,
        value: 700
      } // ... and 300'000 other things
    ]
  },
  {
    id: 2,
    name: "Rob",
    things: [
      {
        name: 'can of olives',
        weight: 0.4,
        value: 2
      },
      {
        name: 'Porsche',
        weight: 1500,
        value: 40000
      } // ... and 170'000 other things
    ]
  } // ... and 214 other people
]
Option 2:
[
  {
    name: 'red tie',
    weight: 0.3,
    value: 5,
    owner: {
      name: 'Tom',
      id: 1
    }
  },
  {
    name: 'carpet',
    weight: 15,
    value: 700,
    owner: {
      name: 'Tom',
      id: 1
    }
  },
  {
    name: 'can of olives',
    weight: 0.4,
    value: 2,
    owner: {
      name: 'Rob',
      id: 2
    }
  },
  {
    name: 'Porsche',
    weight: 1500,
    value: 40000,
    owner: {
      name: 'Rob',
      id: 2
    }
  } // ... and 20'000'000 other things
];
1. I will only ask for things from one owner in a single request and never ask for things from multiple owners.
2. I will need pagination for the returned list of things, so...
3. ...things will need to be sorted by one of the parameters.
From what I understand, the first point suggests it would be much more efficient to use Option 1 (querying only a few hundred documents instead of millions), but points 2 and 3 are handled much more easily with Option 2 (the limit, skip and sort methods instead of the $slice projection and the aggregation framework).
Can anybody tell me which way would be more suitable? Or maybe I've got something wrong and there's even better solution?
I will only ask for things from one owner in a single request and never ask for things from multiple owners.
I will need pagination for the returned list of things, so...
...things will need to be sorted by one of the parameters.
Your requirements 2 and 3 would be fulfilled much better by creating a collection where each item is an individual document. With an array, you would have to use the aggregation framework to $unwind that array, which can become quite slow. Your first requirement can easily be optimized for by creating an index on the owner.name or owner.id field of said collection, depending on which one you use for querying.
Also, MongoDB does not handle growing documents very well. To discourage users from creating indefinitely growing documents, MongoDB has a 16 MB per-document limit. When each of your items is a few hundred bytes, hundreds of thousands of array entries would exceed that limit.
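To make the trade-off concrete, here is a minimal sketch of Option 2 with pagination, assuming a things collection shaped like the documents above and value as the sort key:
// Compound index so the owner filter and the sort are both covered.
db.things.createIndex({ "owner.id": 1, value: -1 });

// Page 3 of Tom's (owner id 1) things, 20 per page, most valuable first.
db.things.find({ "owner.id": 1 })
  .sort({ value: -1 })
  .skip(40)
  .limit(20);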

Is there a MongoDB maximum bson size work around?

The document I am working on is extremely large. It collects user input from an extremely long survey (like Survey Monkey) and stores the answers in a MongoDB database.
Unsurprisingly, I am getting the following error:
Error: Document exceeds maximal allowed bson size of 16777216 bytes
If I cannot change the fields in my document, is there anything I can do? Is there some way to compress the document down, by removing whitespace or something like that?
Edit
Here is the structure of the document
Schema({
  id: { type: Number, required: true },
  created: { type: Date, default: Date.now },
  last_modified: { type: Date, default: Date.now },
  data: { type: Schema.Types.Mixed, required: true }
});
An example of the data field:
{
  id: 65,
  question: {
    test: "some questions",
    answers: [2, 5, 6]
  }
  // there could be thousands of these question objects
}
One thing you can do is build your own MongoDB :-). MongoDB is open source, and the limitation on document size is rather arbitrary, there to enforce better schema design. You could just modify this line and build it for yourself. Be careful with this.
The most straightforward idea is to keep each small question in a separate document, with a field that references its parent.
Another idea is to limit the number of elements in the parent. Let's say your limit is N elements; then the parent looks like this:
{
  _id: ObjectId(),
  id: { type: Number, required: true },
  created: { type: Date, default: Date.now },       // you can store it only for the first element
  last_modified: { type: Date, default: Date.now }, // the same here
  data: [{
    id: 65,
    question: {
      test: "some questions",
      answers: [2, 5, 6]
    }
  } // ... up to N of such things
  ]
}
This way, by tuning the number N, you can make sure that you stay within the 16 MB BSON limit. In order to read the whole survey you can select db.coll.find({id: the Id you need}) and then combine the whole survey at the application level. Also do not forget to ensureIndex on id.
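A hedged sketch of what the corresponding write could look like; the count field (bucket fill level), the surveys collection name, and surveyId are my own additions:
var N = 1000;       // max question objects per bucket
var surveyId = 42;  // hypothetical survey id
db.surveys.updateOne(
  { id: surveyId, count: { $lt: N } },  // find a bucket that still has room
  {
    $push: { data: { id: 65, question: { test: "some questions", answers: [2, 5, 6] } } },
    $inc: { count: 1 },
    $set: { last_modified: new Date() },
    $setOnInsert: { created: new Date() }
  },
  { upsert: true }  // no bucket with room: upsert creates a fresh one
);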
Try different things, do a benchmark on your data and see what works for you.
You should be using GridFS. It allows you to store documents in chunks. Here's the link: http://docs.mongodb.org/manual/reference/gridfs/