Mongodb large document design


I plan to create a database for price history.
Each day of the year, the history database should store prices defined 90 days in advance.
That means: 90 days x 365 days/year = 32,850 database items.
Is there any way to design the schema to improve query performance?
My first idea was to store the values hierarchically, like this:
{
"Address": "xxxxx",
"City": "xxxxx",
"Country": "Deutschland",
"Currency": "EUR",
"Item_Name": "xxxxxx",
"Location": [
log, lat
],
"Postal_code": "xxxx",
"Price_History": [
2014 : [
"January" : {
"CW_1" : { 1: [ price1 .. price90 ], 2: [ price1 .. price90 ], },
"CW_2" : {},
"CW_3" : {},
} ,
"February" : {},
"March" : {},
]
]
}
Thank you in advance!

It all depends on which queries you are planning to run against this data. It seems to me that if you are interested in keeping a history of actions, then your queries will almost always contain a date parameter.
The Price_History array might be better structured as sub-documents. Each of these sub-documents would have a varied (but limited) range of values: the year and the month. It might be a good idea to add an index on those attributes. That way, whenever you query by a date range, the index will help MongoDB find the relevant data relatively quickly.
Another option would be to store each price as its own document. The item the price belongs to could be embedded as a sub-document, perhaps not containing all of the item data, but enough to make your calculations and to fetch the other relevant data once your result set is small enough. For this usage, I would recommend creating a single attribute that combines the date components, to be indexed, and also an index on the item._id attribute. You can keep the individual date components if you still need to query them individually. Something like this:
{
"ind_attr": "2014_January_CW1",
"date": {
"year": 2014,
"month": "January"
},
"CW": 1,
"price": [ price1... price90 ],
"item": {
"name": ...,
"_id": ...,
// minimal data about the actual item
}
}
With this document structure, you could easily add an index on the ind_attr attribute. The document.item._id attribute can be used to retrieve more detailed data on the actual item if needed.
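A minimal JavaScript sketch of building such a document, assuming a 0-based month index and that each week's prices arrive as a plain array. The helper name buildPriceDoc, the MONTHS table and the prices collection name in the comments are hypothetical, not part of the original answer.

```javascript
// Hypothetical month-name table used to build the combined index key.
const MONTHS = ["January", "February", "March", "April", "May", "June",
  "July", "August", "September", "October", "November", "December"];

// Builds one price document in the shape suggested above.
// monthIndex is 0-based (0 = January), like Date.prototype.getMonth().
function buildPriceDoc(year, monthIndex, cw, prices, item) {
  const month = MONTHS[monthIndex];
  if (month === undefined) {
    throw new RangeError("monthIndex must be between 0 and 11");
  }
  return {
    // single combined attribute that gets its own index
    ind_attr: `${year}_${month}_CW${cw}`,
    // individual components kept for queries that need them separately
    date: { year, month },
    CW: cw,
    price: prices,
    // minimal data about the actual item; full details are fetched via _id
    item: { name: item.name, _id: item._id },
  };
}

// Matching indexes and a lookup in the mongo shell (collection name assumed):
//   db.prices.createIndex({ ind_attr: 1 })
//   db.prices.createIndex({ "item._id": 1 })
//   db.prices.find({ ind_attr: "2014_January_CW1", "item._id": itemId })
```

Generating ind_attr in one helper keeps the key format consistent between writes and queries, which matters because an index on a combined string only helps if every writer builds the string identically.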

Related

Is updating Embedded Documents in MongoDB a Manual process?

I am not very familiar with MongoDB yet, but I have a question about embedded documents.
I have seen a number of posts which show you how to update embedded documents through some update query.
My question is this: if I have a collection with embedded documents (denormalised for performance) and the source of one of the embedded documents changes, do I need to manually update all the embedded copies, or is there some way of specifying the link in MongoDB so that they auto-update?
For Example:
An Order record might look like the structure below. Note there is a Product item in one of the rows.
Let's say the itemName field changes to "Product1a" in the product document in a different collection, and I want to update that product in every order where it appears. Is that a manual process, or is there a way of setting up MongoDB to auto-update embedded documents?
{
"id": "ccc1beb1-e022-11e9-97f0-e7e789106ab2",
"type": "order",
"orderNumber": "ORD-100209857x",
"orderDate": "2019-09-26T17:42:31.000+12:00",
"orderItems": [
{
"discount": 0,
"price": 24.4944,
"product": {
"id": "ccc1beb1-e022-11e9-97f0-e7e789106ab2",
"itemNumber": "prd1",
"itemName": "Product1"
},
"qty": 4,
"rowTotal": 97.96,
"taxAmount": 9.8
},
{
"discount": 0,
"price": 3.21,
"itemName": "Shipping",
"qty": 1,
"rowTotal": 3.21,
"taxAmount": 0
}
]
}

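Not sure what you mean by manual process, but MongoDB does not propagate changes to embedded copies on its own, so you run the update yourself. Because orderItems is an array, a plain dotted path like "orderItems.product.itemName" cannot be used in $set; the filtered positional operator $[<identifier>] with arrayFilters (MongoDB 3.6+) updates the matching array elements in every order that contains the product:
db.collection.updateMany({ "orderItems.product.id": "ccc1beb1-e022-11e9-97f0-e7e789106ab2" },
{ $set: { "orderItems.$[item].product.itemName": "Product1a" } }, { arrayFilters: [ { "item.product.id": "ccc1beb1-e022-11e9-97f0-e7e789106ab2" } ] })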
Let me know if this is not what you are looking for.
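Since there is no automatic cascade, one common pattern is to run this update from the same code path that renames the product. A minimal sketch, assuming the official Node.js driver, whose Collection.updateMany(filter, update, options) accepts an arrayFilters option; the function name propagateProductRename is hypothetical.

```javascript
// Propagates a product rename into every order that embeds that product.
// `orders` is assumed to be a Node.js driver Collection, or any object with a
// compatible updateMany(filter, update, options) method.
function propagateProductRename(orders, productId, newName) {
  return orders.updateMany(
    // only touch orders that actually contain the product
    { "orderItems.product.id": productId },
    // $[item] stands for every array element matched by the "item" filter below
    { $set: { "orderItems.$[item].product.itemName": newName } },
    { arrayFilters: [{ "item.product.id": productId }] }
  );
}
```

For example, right after saving the renamed product: `await propagateProductRename(db.collection("orders"), productId, "Product1a");` (the collection name orders is an assumption). Rows without a product, such as the Shipping row, never match the array filter, so they are left untouched.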

MongoDB document setup and aggregation

I'm pretty new to MongoDB and while preparing data to be consumed I got into Aggregation... what a powerful little thing this database has! I got really excited and started to test some things :)
I'm saving time entries for a companyId and employeeId, and each employee can have many entries. Those are normally sorted by date, but one date can have several entries (multiple registrations on the same day).
I'm trying to come up with a good schema so I can easily get my data exactly how I need it, and as a newbie I would rather ask for guidance and check that I'm on the right path.
My output should look like this:
[{
"company": "474A5D39-C87F-440C-BE99-D441371BF88C",
"employee": "BA75621E-5D46-4487-8C9F-C0CE0B2A7DE2",
"name": "Bruno Alexandre",
"registrations": [{
"id": 1448364,
"spanned": false,
"spannedDay": 0,
"date": "2019-01-17",
"timeStart": "09:00:00",
"timeEnd": "12:00:00",
"amount": {
"days": 0.4,
"hours": 2,
"km": null,
"unit": "days and hours",
"normHours": 5
},
"dateDetails": {
"week": 3,
"weekDay": 4,
"weekDayEnglish": "Thursday",
"holiday": false
},
"jobCode": {
"id": null,
"isPayroll": true,
"isFlex": false
},
"payroll": {
"guid": null
},
"type": "Sick",
"subType": "Sick",
"status": "APP",
"reason": "IS",
"group": "LeaveAndAbsence",
"note": null,
"createdTimeStamp": "2019-01-17T15:53:55.423Z"
}, /* more date entries */ ]
}, /* other employees */ ]
What is the best way to add the data into a collection?
Is it more efficient if I create a document per company/employee and add all registration entries inside that document (it could get really big as time passes)... or is it better to have one document per company/employee/date and add all daily events in that document instead?
Regarding aggregation, I'm still new to all this, but I imagine I could simply call:
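RegistrationsModel.aggregate([
{
$match: {
// $lt the first day of the next month also keeps entries later on Jan 31
date: { $gte: new Date('2019-01-01'), $lt: new Date('2019-02-01') },
company: '474A5D39-C87F-440C-BE99-D441371BF88C'
}
},
{
// sort before grouping so each employee's registrations are pushed newest first
$sort: { date: -1 }
},
{
$group: {
_id: '$employee',
company: { $first: '$company' },
name: { $first: '$name' },
// collect every matching registration document into an array
registrations: { $push: '$$ROOT' }
}
}
]);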
P.S. I'm taking the Aggregation course to start getting familiar with all of it.

Is it more efficient if I create a document per company/employee and add all registration entries inside that document (it could get really big as time passes)... or is it better to have one document per company/employee/date and add all daily events in that document instead?
From what I understand of document oriented databases, I would say the aim is to have all the data you need, in a specific context, grouped inside one document.
So what you need to do is identify what data you're going to need (thinking in terms of the features you want to implement) and build your data structure around that. Try to anticipate future features too: the better your data structure is prepared for them, the less tricky it will be to scale your database to your needs.
Your aggregation query looks OK!
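For the "one document per company/employee/date" option from the question, each new registration can be appended with an upsert, so the first registration of the day creates the document. Keeping one document per day also keeps each document small, well below MongoDB's 16 MB document limit, whereas a single document per employee keeps growing. A minimal sketch that only builds the arguments for updateOne(filter, update, { upsert: true }); field names come from the question's output, while the helper name and the registrations collection name are assumptions.

```javascript
// Builds [filter, update, options] for appending one registration to the
// per-day document of a company/employee pair (hypothetical helper).
// Usage with the Node.js driver (collection name is an assumption):
//   await db.collection("registrations").updateOne(...buildDailyUpsert(entry));
function buildDailyUpsert(entry) {
  const filter = {
    company: entry.company,
    employee: entry.employee,
    date: entry.date, // one document per calendar day, e.g. "2019-01-17"
  };
  const update = {
    // only written when the upsert creates the day's document
    $setOnInsert: { name: entry.name },
    // appends to the array, creating it on first insert
    $push: { registrations: entry.registration },
  };
  // upsert: create the day's document if it does not exist yet
  return [filter, update, { upsert: true }];
}
```

On insert, MongoDB copies the equality fields of the filter (company, employee, date) into the new document, so they do not need to be repeated in the update.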

Should I use selector or views in Cloudant?

I'm confused about whether to use a selector, views, or both to get the result in the following scenario:
I need to do a wildcard search for a book and return the matching books plus the price and the store branch name.
So I tried using a selector with a regex to do the wildcard search:
"selector": {
"_id": {
"$gt": null
},
"type":"product",
"product_name": {
"$regex":"(?i)"+search
}
},
"fields": [
"_id",
"_rev",
"product_name"
]
I am able to get the result. The idea after getting the result is to take all the _id values from the result set and query views to get more details, like the price and store branch name, from other documents. That feels kind of odd, and I'm not certain it's the correct way to do it.
Below is just the idea, once I have the resulting _id values and pass each one in as a "productId" variable:
var input = {
method : 'GET',
returnedContentType : 'json',
path : 'test/_design/app/_view/find_price'+"?keys=[\""+productId+"\"]",
};
return WL.Server.invokeHttp(input);
So I'm asking for input from an expert on this.
Another question is how to get the store_branch_name. Can it be done in a single view that returns the product details, prices and store branch name? Or do I need several views to achieve this?
Expected result:
product_name (from book document) : Book 1
branch_name (from branch array in Store document) : store 1 branch one
price ( from relationship document) : 79.9
References:
Book
{
"_id": "book1",
"_rev": "1...b",
"product_name": "Book 1",
"type": "book"
}
{
"_id": "book2",
"_rev": "1...b",
"product_name": "Book 2 etc",
"type": "book"
}
relationship
{
"_id": "c...5",
"_rev": "3...",
"type": "relationship",
"product_id": "book1",
"store_branch_id": "Store1_branch1",
"price": "79.9"
}
Store
{
"_id": "store1",
"_rev": "1...2",
"store_name": "Store 1 Name",
"type": "stores",
"branch": [
{
"branch_id": "store1_branch1",
"branch_name": "store 1 branch one",
"address": {
"street": "some address",
"postalcode": "33490",
"type": "addresses"
},
"geolocation": {
"coordinates": [
42.34493,
-71.093232
],
"type": "point"
},
"type": "storebranch"
},
{
"branch_id": "store1_branch2",
"branch_name": ...
// details omitted...
}
]
}

In Cloudant Query, you can specify two different kinds of indexes, and it's important to know the differences between the two.
For the first part of your question, if you're using Cloudant Query's $regex operator for wildcard searches like that, you might be better off creating a Cloudant Query index of type "text" instead of type "json". It's in the Cloudant docs, but see the intro blog post for details: https://cloudant.com/blog/cloudant-query-grows-up-to-handle-ad-hoc-queries/ There's a more advanced post on this that covers the tradeoffs between the two types of indexes https://cloudant.com/blog/mango-json-vs-text-indexes/
It's harder to address the second part of your question without understanding how your application interacts with your data, but there are a couple pieces of advice.
1) Consider denormalizing some of this information so you're not doing the JOINs to begin with.
2) Inject more logic into your document keys, and use the traditional MapReduce View indexing system to emit a compound key (an array), that you can use to emulate a JOIN by taking advantage of the CouchDB/Cloudant index sorting rules.
That second one's a mouthful, but check out this example on YouTube: https://youtu.be/0al1KnCKjlA?t=23m39s
Here's a preview (example map function) of what I'm talking about:
'map' : function(doc)
{
if (doc.type==="user") {
emit( [doc._id], null );
}
else if (doc.type==="edge:follower") {
emit( [doc.user, doc.follows], {"_id":doc.follows} );
}
}
The resulting secondary index here would take advantage of the rules outlined in http://wiki.apache.org/couchdb/View_collation -- that strings sort before arrays, and arrays sort before objects. You could then issue range queries to emulate the results you'd get with a JOIN.
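To make the collation trick concrete, here is a small in-memory simulation of that view in JavaScript. It is a sketch only: the comparator follows the type order from the collation rules (null, false, true, numbers, strings, arrays, objects) but compares strings by plain code-unit order, whereas CouchDB uses Unicode (ICU) collation; the sample documents are made up.

```javascript
// Rank of each JSON type in CouchDB view collation order.
function typeRank(v) {
  if (v === null) return 0;
  if (v === false) return 1;
  if (v === true) return 2;
  if (typeof v === "number") return 3;
  if (typeof v === "string") return 4;
  if (Array.isArray(v)) return 5;
  return 6; // plain object
}

// Simplified collation: strings by code-unit order (assumption, not ICU).
function collate(a, b) {
  const ra = typeRank(a), rb = typeRank(b);
  if (ra !== rb) return ra - rb;
  if (ra === 3) return a - b;
  if (ra === 4) return a < b ? -1 : a > b ? 1 : 0;
  if (ra === 5) {
    for (let i = 0; i < Math.min(a.length, b.length); i++) {
      const c = collate(a[i], b[i]);
      if (c !== 0) return c;
    }
    return a.length - b.length; // a prefix sorts before the longer array
  }
  return 0; // object-vs-object comparison is not needed in this sketch
}

// Same logic as the map function above, with emit passed in.
function mapFn(doc, emit) {
  if (doc.type === "user") {
    emit([doc._id], null);
  } else if (doc.type === "edge:follower") {
    emit([doc.user, doc.follows], { _id: doc.follows });
  }
}

// Builds the sorted view index from all documents.
function buildView(docs) {
  const rows = [];
  for (const doc of docs) {
    mapFn(doc, (key, value) => rows.push({ id: doc._id, key, value }));
  }
  return rows.sort((x, y) => collate(x.key, y.key));
}

// Inclusive range query, like ?startkey=...&endkey=...
function queryRange(rows, startkey, endkey) {
  return rows.filter(r => collate(r.key, startkey) >= 0 && collate(r.key, endkey) <= 0);
}

const sampleDocs = [
  { _id: "alice", type: "user" },
  { _id: "bob", type: "user" },
  { _id: "e1", type: "edge:follower", user: "alice", follows: "bob" },
  { _id: "e2", type: "edge:follower", user: "alice", follows: "carol" },
  { _id: "e3", type: "edge:follower", user: "bob", follows: "alice" },
];
// Equivalent of startkey=["alice"]&endkey=["alice",{}]
const aliceRows = queryRange(buildView(sampleDocs), ["alice"], ["alice", {}]);
console.log(aliceRows.map(r => r.id)); // → [ 'alice', 'e1', 'e2' ]
```

Because {} sorts after every string, the endkey ["alice", {}] captures Alice's own row plus every edge row that starts with "alice". Adding include_docs=true to the real query makes CouchDB/Cloudant return the document whose _id appears in each row's value (the "linked documents" feature), which is how the {"_id": doc.follows} value performs the join.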
I think that's as much detail as is appropriate here. Hope it helps!

Storing clicks and impressions (for banners) in a Mongodb document?

I am new to Mongodb and I have a SQL background.
So my app records the number of clicks and impressions for banners, and I have decided to store all of this in a single document per banner, which looks like this:
{
"_id":ObjectId('534b45b9b6d966a8010002323'),
"active": true,
"banner_end": ISODate("2015-06-05T23:59:59.0Z"),
"banner_name": "Cool banner",
"banner_position": "bottom",
"banner_url": "http:\/\/www.google.com",
"banner_image":"http:\/\/www.google.com/pic.jpg",
"click_details": [
{
"date": ISODate("2014-04-14T02:29:22.961Z"),
"ip": "::1"
}
],
"clicks": NumberInt(1),
"impression_details": [
{
"date": ISODate("2014-04-14T02:28:41.353Z"),
"ip": "::1"
},
{
"date": ISODate("2014-04-14T02:28:53.52Z"),
"ip": "::1"
}
],
"impressions": NumberInt(2)
}
Obviously, as time goes by, the click_details and impression_details arrays will grow (especially impressions). Am I doing this correctly, or should I store click_details and impression_details in a separate collection?
I will need click_details and impression_details later to plot graphs.
Many thanks

There is nothing wrong with this approach, and MongoDB's 16 MB limit, which applies to the whole document including its embedded arrays (not to each sub-document), still leaves room for many records.
Can you also tell us how many users your site gets and the expected number of impressions/clicks per banner?
P.S. You can save a lot of space by using shorter field names, since the field names are stored in every document and every array entry: e.g. banner_name could be written as bn_nm, and so on.
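To put a rough number on "many records", here is a back-of-the-envelope BSON size calculation in JavaScript. It is a sketch under stated assumptions: each entry holds only date and ip as in the question, strings are ASCII, and everything else in the banner document is covered by a fixed allowance you pass in; it follows the BSON element layout but is not an exact model of a real document.

```javascript
// MongoDB's maximum BSON document size: 16 MB.
const MAX_BSON_BYTES = 16 * 1024 * 1024;

// BSON pieces (ASCII assumed): cstring = bytes + 0x00,
// string value = int32 length + bytes + 0x00.
function cstringSize(s) { return s.length + 1; }
function stringValueSize(s) { return 4 + s.length + 1; }

// Size of the embedded document { <dateKey>: Date, <ipKey>: string }.
function entryDocSize(ip, dateKey = "date", ipKey = "ip") {
  const dateElem = 1 + cstringSize(dateKey) + 8;               // type + name + int64 datetime
  const ipElem = 1 + cstringSize(ipKey) + stringValueSize(ip); // type + name + string
  return 4 + dateElem + ipElem + 1;                            // int32 size + elements + 0x00
}

// Arrays are stored as documents keyed "0", "1", "2", ...; count how many
// entries fit before the limit, given a byte allowance for the other fields.
function maxEntries(ip, otherFieldsBytes, dateKey, ipKey) {
  const perEntryDoc = entryDocSize(ip, dateKey, ipKey);
  let used = otherFieldsBytes;
  let n = 0;
  for (;;) {
    const elem = 1 + cstringSize(String(n)) + perEntryDoc;
    if (used + elem > MAX_BSON_BYTES) return n;
    used += elem;
    n++;
  }
}

console.log(maxEntries("::1", 1024));
console.log(maxEntries("::1", 1024, "d", "i"));
```

With ip "::1" and a 1 KB allowance this comes out at a little over 430,000 entries in one array, shared between clicks and impressions if both live in the same document. Shortening the keys to "d" and "i" saves 4 bytes per entry, which is the effect the field-name tip is about; longer real IP strings reduce the count.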

Mongodb time series query

I'm new to MongoDB coming from a relational world. Any help would be greatly appreciated. I have a list of patients with the following attributes:
{"id": 101,
"demographics": {"sex": "male","dob": "1/1/1984"},
"hospital_visits": [
{ "date": "1/1/2012",
"diagnosis": ["diabetes","fractured hip"]
},
{ "date": "3/1/2012",
"treatment": ["hip replacement"],
"outcome": "normal discharge"
},
{ "date": "5/1/2012",
"diagnosis": ["hip infection"],
"outcome": "inpatient admission"
}
]
},
{"id": 102,
"hospital_visits": [
{ "date": "1/1/2013",
"diagnosis": ["fractured hip"]
},
{ "date": "3/1/2013",
"treatment": ["hip replacement"],
"outcome": "normal discharge"
}
]
}
Now, if I want to find out the number of patients that had a "fractured hip" as a diagnosis at some point (in one of the hospital visits), had a "hip replacement" treatment done subsequently at a later date (not in the same hospital visit), and within 3 months from the day of "hip replacement" treatment had an "inpatient admission" as an outcome, how do I frame my MongoDB query? Going by this logic, in the given example data, clearly patient 101 is a hit while patient 102 is not, so the count is 1. But just how do I formulate this query in MongoDB? Any ideas? Would it be easier if I change the document structure to address this question?
Many thanks!

One option:
If you run these queries frequently, then whenever you update a patient document, check it for the condition you want, store the result in a suitable field (say, a Boolean), and index that field.
Counting matching patients will then be very fast, because the count can use that index.
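A minimal JavaScript sketch of the check you would run on each write, under these assumptions: dates are "M/D/YYYY" strings as in the sample data, the admission must come after the replacement date and no later than the same day three calendar months on, and the flag name hipReadmitFlag is hypothetical.

```javascript
// Parses "M/D/YYYY" into a UTC Date (format assumed from the sample data).
function parseDate(s) {
  const [m, d, y] = s.split("/").map(Number);
  return new Date(Date.UTC(y, m - 1, d));
}

// Adds calendar months in UTC; JavaScript rolls over invalid days
// (e.g. 30 Nov + 3 months lands in early March), acceptable for a sketch.
function addMonths(date, n) {
  const r = new Date(date.getTime());
  r.setUTCMonth(r.getUTCMonth() + n);
  return r;
}

// True if some "fractured hip" diagnosis is followed, on a later visit, by a
// "hip replacement" treatment, and an "inpatient admission" outcome occurs
// after that treatment and no more than 3 months later.
function matchesHipReadmit(patient) {
  const visits = (patient.hospital_visits || []).map(v => ({ ...v, when: parseDate(v.date) }));
  const list = (v, field) => v[field] || [];
  const fractures = visits.filter(v => list(v, "diagnosis").includes("fractured hip"));
  const replacements = visits.filter(v => list(v, "treatment").includes("hip replacement"));
  const admissions = visits.filter(v => v.outcome === "inpatient admission");
  return fractures.some(dx =>
    replacements.some(tx => {
      if (tx.when <= dx.when) return false; // must be a later visit
      const limit = addMonths(tx.when, 3);
      return admissions.some(a => a.when > tx.when && a.when <= limit);
    })
  );
}

// Storing, indexing and counting the flag (mongo shell; names hypothetical):
//   db.patients.updateOne({ id: 101 }, { $set: { hipReadmitFlag: true } })
//   db.patients.createIndex({ hipReadmitFlag: 1 })
//   db.patients.countDocuments({ hipReadmitFlag: true })
```

Computing the flag at write time moves the sequence logic out of the query, so the count becomes a simple indexed equality match; the trade-off is that the flag must be recomputed whenever a patient's visits change.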
