MongoDB time series query

I'm new to MongoDB coming from a relational world. Any help would be greatly appreciated. I have a list of patients with the following attributes:
{"id": 101,
"demographics": {"sex": "male","dob": "1/1/1984"},
"hospital_visits": [
{ "date": "1/1/2012",
"diagnosis": ["diabetes","fractured hip"]
},
{ "date": "3/1/2012",
"treatment": ["hip replacement"],
"outcome": "normal discharge"
},
{ "date": "5/1/2012",
"diagnosis": ["hip infection"],
"outcome": "inpatient admission"
}
]
},
{"id": 102,
"hospital_visits": [
{ "date": "1/1/2013",
"diagnosis": ["fractured hip"]
},
{ "date": "3/1/2013",
"treatment": ["hip replacement"],
"outcome": "normal discharge"
}
]
}
Now, if I want to find out the number of patients that had a "fractured hip" as a diagnosis at some point (in one of the hospital visits), had a "hip replacement" treatment done subsequently at a later date (not in the same hospital visit), and within 3 months from the day of "hip replacement" treatment had an "inpatient admission" as an outcome, how do I frame my MongoDB query? Going by this logic, in the given example data, clearly patient 101 is a hit while patient 102 is not, so the count is 1. But just how do I formulate this query in MongoDB? Any ideas? Would it be easier if I change the document structure to address this question?
Many thanks!

One option can be:
If you run this kind of query often, you can evaluate the condition whenever you update a document, store the result in a field (say, a Boolean), and index that field.
The count will then return very quickly.
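As a rough sketch of the check that could run on every update, the function below evaluates the condition from the question for a single patient document. It assumes the dates are "M/D/YYYY" strings as in the sample data, reads "within 3 months" as three calendar months, and requires the admission to fall strictly after the replacement date; the function names are hypothetical.

```javascript
// Assumption: dates are "M/D/YYYY" strings, as in the sample documents.
function parseDate(s) {
  const [month, day, year] = s.split("/").map(Number);
  return new Date(Date.UTC(year, month - 1, day));
}

// Returns a new Date shifted forward by the given number of calendar months.
function addMonths(date, months) {
  const d = new Date(date.getTime());
  d.setUTCMonth(d.getUTCMonth() + months);
  return d;
}

// True if the patient had a "fractured hip" diagnosis, a later
// "hip replacement", and an "inpatient admission" outcome after the
// replacement but no more than 3 months later.
function hasHipReadmission(patient) {
  const visits = (patient.hospital_visits || [])
    .map(v => ({ ...v, when: parseDate(v.date) }));
  const fractures = visits.filter(v => (v.diagnosis || []).includes("fractured hip"));
  const replacements = visits.filter(v => (v.treatment || []).includes("hip replacement"));
  const admissions = visits.filter(v => v.outcome === "inpatient admission");
  return replacements.some(r =>
    fractures.some(f => f.when < r.when) &&
    admissions.some(a => a.when > r.when && a.when <= addMonths(r.when, 3)));
}
```

The result could be stored in a field on the patient document (the field name is up to you), so the count becomes a simple indexed equality query.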

Related

which is the best way to implement: two separate array or array in array?

I need some advice!
I have an array of objects holding data for many students (more than 200). I now want to add lessons for these students, so every day I will push an array of lesson data into each student's entry. Later I will work with all of the lesson data.
So my question is:
a) Is it best to push the lesson array into each student's entry?
b) Or should I create a separate array with unique _ids and later filter lessons by student _id?
I'm looking for performance and speed...
As I understand your architecture, a suitable option is to move lessons into a separate collection and store lesson._id in students[].lessons. You can do this with the ref property in your mongoose schema.
Here's the example:
lessons collection data:
[
{
"_id": ObjectId("5a934e000102030405000001"),
"name": "First lesson"
},
{
"_id": ObjectId("5a934e000102030405000002"),
"name": "Second lesson"
}
]
groups collection data:
[
{
"_id": ObjectId("5a934e000102030405000003"),
"name": "Group 1",
"students": [
{
"_id": ObjectId("5a934e000102030405000004"),
"name": "John",
"lessons": [ObjectId("5a934e000102030405000001")]
},
{
"_id": ObjectId("5a934e000102030405000005"),
"name": "James",
"lessons": [ObjectId("5a934e000102030405000001"), ObjectId("5a934e000102030405000002")]
}
]
}
]
I would also move the students into a separate students collection if possible (if you currently store students as an array field).
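To illustrate what storing only lesson ids means for reads, here is a minimal in-memory sketch that resolves the ids back into lesson documents. It is similar in spirit to what mongoose's populate does for ref fields, but it is plain JavaScript with made-up string ids, not the mongoose API; resolveLessons is a hypothetical helper.

```javascript
// Hypothetical in-memory copies of the two collections; ids are plain strings here.
const lessons = [
  { _id: "lesson1", name: "First lesson" },
  { _id: "lesson2", name: "Second lesson" },
];
const students = [
  { _id: "student1", name: "John", lessons: ["lesson1"] },
  { _id: "student2", name: "James", lessons: ["lesson1", "lesson2"] },
];

// Replace each stored lesson id with the lesson document it points to.
// Ids that have no matching lesson are dropped.
function resolveLessons(student, lessonList) {
  const byId = new Map(lessonList.map(l => [l._id, l]));
  return {
    ...student,
    lessons: student.lessons.map(id => byId.get(id)).filter(Boolean),
  };
}
```

Because each lesson is stored once, renaming a lesson touches a single document, and every student that references it sees the new name on the next read.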

Is updating Embedded Documents in MongoDB a Manual process?

I am not overly familiar with MongoDB yet, but I have a question about embedded documents.
I have seen a number of posts which show you how to update embedded documents through some update query.
My question is this: if I have a collection with embedded documents, which is denormalised for performance, and one of the embedded documents changes, do I need to manually update all the embedded copies, or is there some way of specifying a link in MongoDB so they auto-update?
For example:
An Order record might look like the structure below. Note there is a Product item in one of the rows.
Let's say the ItemName field changed to "Product1a" in the product from a different collection and I want to update the product in every single order where it exists. Is that a manual process, or is there a way of setting it up in MongoDB to auto-update embedded documents?
{
"id": "ccc1beb1-e022-11e9-97f0-e7e789106ab2",
"type": "order",
"orderNumber": "ORD-100209857x",
"orderDate": "2019-09-26T17:42:31.000+12:00",
"orderItems": [
{
"discount": 0,
"price": 24.4944,
"product": {
"id": "ccc1beb1-e022-11e9-97f0-e7e789106ab2",
"itemNumber": "prd1",
"itemName": "Product1"
},
"qty": 4,
"rowTotal": 97.96,
"taxAmount": 9.8
},
{
"discount": 0,
"price": 3.21,
"itemName": "Shipping",
"qty": 1,
"rowTotal": 3.21,
"taxAmount": 0
}
]
}
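Not sure what you mean by manual process, but MongoDB does not propagate changes to embedded copies for you; you have to issue the update yourself. Here is some sample code that updates the product name in every order that contains the product (arrayFilters requires MongoDB 3.6 or later):
db.collection.updateMany({"orderItems.product.itemNumber": "prd1"}, {$set: {"orderItems.$[item].product.itemName": "Product1a"}}, {arrayFilters: [{"item.product.itemNumber": "prd1"}]})
Let me know if this is not what you are looking for.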

MongoDB document setup and aggregation

I'm pretty new to MongoDB and while preparing data to be consumed I got into Aggregation... what a powerful little thing this database has! I got really excited and started to test some things :)
I'm saving time entries for a companyId and employeeId ... that can have many entries... those are normally sorted by date, but one date can have several entries (multiple registrations in the same day)
I'm trying to come up with a good schema so I can easily get my data exactly how I need it and, as a newbie, I would rather ask for guidance and check that I'm on the right path.
My output should look like this:
[{
"company": "474A5D39-C87F-440C-BE99-D441371BF88C",
"employee": "BA75621E-5D46-4487-8C9F-C0CE0B2A7DE2",
"name": "Bruno Alexandre",
"registrations": [{
"id": 1448364,
"spanned": false,
"spannedDay": 0,
"date": "2019-01-17",
"timeStart": "09:00:00",
"timeEnd": "12:00:00",
"amount": {
"days": 0.4,
"hours": 2,
"km": null,
"unit": "days and hours",
"normHours": 5
},
"dateDetails": {
"week": 3,
"weekDay": 4,
"weekDayEnglish": "Thursday",
"holiday": false
},
"jobCode": {
"id": null,
"isPayroll": true,
"isFlex": false
},
"payroll": {
"guid": null
},
"type": "Sick",
"subType": "Sick",
"status": "APP",
"reason": "IS",
"group": "LeaveAndAbsence",
"note": null,
"createdTimeStamp": "2019-01-17T15:53:55.423Z"
}, /* more date entries */ ]
}, /* other employees */ ]
what is the best way to add the data into a collection?
Is it more efficient if I create a document per company/employee and add all registration entries inside that document (it could get really big as time passes)... or is it better to have one document per company/employee/date and add all daily events in that document instead?
regarding aggregation, I'm still new to all this, but I'm imagining I could simply call
RegistrationsModel.aggregate([
{
$match: {
date: { $gte: new Date('2019-01-01'), $lte: new Date('2019-01-31') },
company: '474A5D39-C87F-440C-BE99-D441371BF88C'
}
},
{
$group: {
_id: '$employee',
name: { '$first': '$name' }
}
},
{
// ... get all registrations as an Array ...
},
{
$sort: {
'registrations.date': -1
}
}
]);
P.S. I'm taking the Aggregation course to get familiar with all of it.
Is it more efficient if I create a document per company/employee and
add all registration entries inside that document (it could get really
big as time passes)... or is it better to have one document per
company/employee/date and add all daily events in that document
instead?
From what I understand of document-oriented databases, the aim is to have all the data you need in a specific context grouped inside one document.
So what you need to do is identify what data you're going to need (staying close to the features you want to implement) and build your data structure accordingly. Be sure to identify future features too, because the better your data structure is prepared for them, the easier it will be to scale your database to your needs.
Your aggregation query looks OK!
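One way the placeholder stage in the question could be filled in is sketched below. It assumes one document per registration with company, employee, name and date fields (as in the question's $match and $group), and it sorts before grouping so that the entries are intended to arrive newest first; buildRegistrationsPipeline is a hypothetical helper, not part of mongoose.

```javascript
// Builds an aggregation pipeline as plain data. Assumes each document is a
// single registration carrying company, employee, name and date fields.
function buildRegistrationsPipeline(company, from, to) {
  return [
    // Keep only this company's registrations inside the date range.
    { $match: { company, date: { $gte: from, $lte: to } } },
    // Sort the individual registrations, newest first, before grouping.
    { $sort: { date: -1 } },
    // One output document per employee, collecting whole registrations.
    {
      $group: {
        _id: "$employee",
        name: { $first: "$name" },
        registrations: { $push: "$$ROOT" },
      },
    },
  ];
}

const pipeline = buildRegistrationsPipeline(
  "474A5D39-C87F-440C-BE99-D441371BF88C",
  new Date("2019-01-01"),
  new Date("2019-01-31")
);
// With the model from the question: RegistrationsModel.aggregate(pipeline)
```

Keeping the pipeline as plain data makes it easy to inspect or unit test without a database connection.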

Should I use selector or views in Cloudant?

I'm unsure whether to use a selector, views, or both to get a result in the following scenario:
I need to do a wildcard search for a book and return the matching books together with the price and the store branch name.
So I tried using a selector to do the wildcard search with a regex:
"selector": {
"_id": {
"$gt": null
},
"type":"product",
"product_name": {
"$regex":"(?i)"+search
}
},
"fields": [
"_id",
"_rev",
"product_name"
]
I am able to get the result. The idea after that is to take all the _ids from the result set and query views to get more details, such as the price and store branch name, from other documents. That feels kind of odd, and I'm not certain it is the correct way to do it.
Below is just the idea: once I have the _ids from the result, I insert each one as a "productId" variable.
var input = {
method : 'GET',
returnedContentType : 'json',
path : 'test/_design/app/_view/find_price'+"?keys=[\""+productId+"\"]",
};
return WL.Server.invokeHttp(input);
So I'm asking for input from an expert on this.
Another question is how to get the store_branch_name? Can it be done in a single view where we can get the product detail, prices and store branch name? Or do I need to have several views to achieve this?
expected result
product_name (from book document) : Book 1
branch_name (from branch array in Store document) : store 1 branch one
price ( from relationship document) : 79.9
References:
Book
"_id": "book1",
"_rev": "1...b",
"product_name": "Book 1",
"type": "book"
"_id": "book2",
"_rev": "1...b",
"product_name": "Book 2 etc",
"type": "book"
relationship
"_id": "c...5",
"_rev": "3...",
"type": "relationship",
"product_id": "book1",
"store_branch_id": "Store1_branch1",
"price": "79.9"
Store
{
"_id": "store1",
"_rev": "1...2",
"store_name": "Store 1 Name",
"type": "stores",
"branch": [
{
"branch_id": "store1_branch1",
"branch_name": "store 1 branch one",
"address": {
"street": "some address",
"postalcode": "33490",
"type": "addresses"
},
"geolocation": {
"coordinates": [
42.34493,
-71.093232
],
"type": "point"
},
"type": "storebranch"
},
{
"branch_id": "store1_branch2",
"branch_name":
**details ommit...**
}
]
}
In Cloudant Query, you can specify two different kinds of indexes, and it's important to know the differences between the two.
For the first part of your question, if you're using Cloudant Query's $regex operator for wildcard searches like that, you might be better off creating a Cloudant Query index of type "text" instead of type "json". It's in the Cloudant docs, but see the intro blog post for details: https://cloudant.com/blog/cloudant-query-grows-up-to-handle-ad-hoc-queries/ There's a more advanced post on this that covers the tradeoffs between the two types of indexes https://cloudant.com/blog/mango-json-vs-text-indexes/
It's harder to address the second part of your question without understanding how your application interacts with your data, but there are a couple pieces of advice.
1) Consider denormalizing some of this information so you're not doing the JOINs to begin with.
2) Inject more logic into your document keys, and use the traditional MapReduce View indexing system to emit a compound key (an array), that you can use to emulate a JOIN by taking advantage of the CouchDB/Cloudant index sorting rules.
That second one's a mouthful, but check out this example on YouTube: https://youtu.be/0al1KnCKjlA?t=23m39s
Here's a preview (example map function) of what I'm talking about:
'map' : function(doc)
{
if (doc.type==="user") {
emit( [doc._id], null );
}
else if (doc.type==="edge:follower") {
emit( [doc.user, doc.follows], {"_id":doc.follows} );
}
}
The resulting secondary index here would take advantage of the rules outlined in http://wiki.apache.org/couchdb/View_collation -- that strings sort before arrays, and arrays sort before objects. You could then issue range queries to emulate the results you'd get with a JOIN.
I think that's as much detail as is appropriate here. Hope it helps!
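Applied to the book and relationship documents from the question, the compound-key idea might look like the sketch below. The map function emits a [productId, 0] key for a book and a [productId, 1] key for each of its prices, so the rows for one product end up adjacent in the index. Everything apart from the document fields is an assumption for illustration: in a real design document emit is a global rather than a parameter, and buildView is only a tiny in-memory stand-in with simplified sorting, not Cloudant's actual index or collation.

```javascript
// Hypothetical map function for the book / relationship documents above.
// emit is passed in here so the sketch can run outside Cloudant.
function mapDoc(doc, emit) {
  if (doc.type === "book") {
    emit([doc._id, 0], { product_name: doc.product_name });
  } else if (doc.type === "relationship") {
    emit([doc.product_id, 1], {
      price: doc.price,
      store_branch_id: doc.store_branch_id,
    });
  }
}

// In-memory stand-in for a view: run the map function over all documents
// and sort the rows by key (string part first, then the numeric part).
function buildView(docs) {
  const rows = [];
  for (const doc of docs) {
    mapDoc(doc, (key, value) => rows.push({ id: doc._id, key, value }));
  }
  rows.sort((a, b) =>
    a.key[0] < b.key[0] ? -1 : a.key[0] > b.key[0] ? 1 : a.key[1] - b.key[1]);
  return rows;
}

// Emulates a range query such as startkey=["book1"]&endkey=["book1",{}].
function rowsForProduct(view, productId) {
  return view.filter(row => row.key[0] === productId);
}
```

A single range read then returns the book row followed by its price rows. The branch name would still need either a further lookup by store_branch_id or some denormalization, as suggested in point 1.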

MongoDB large document design

I plan to create a database for price history.
The history database should store, for each day of the year, the prices defined 90 days in advance.
That means: 90 days x 365 days/year = 32850 database items.
Is there any way to design the schema to improve query performance?
My first idea was to store the values hierarchically, like this:
{
"Address": "xxxxx",
"City": "xxxxx",
"Country": "Deutschland",
"Currency": "EUR",
"Item_Name": "xxxxxx",
"Location": [
log, lat
],
"Postal_code": "xxxx",
"Price_History": [
2014 : [
"January" : {
"CW_1" : { 1: [ price1 .. price90 ], 2: [ price1 .. price90 ], },
"CW_2" : {},
"CW_3" : {},
} ,
"February" : {},
"March" : {},
]
]
}
Thank you in advance!
It all depends on which queries you are planning to run against this data. It seems to me that if you are interested in keeping a history of actions, then your queries will almost always contain a date parameter.
The Price_History array might be better structured as a set of sub-documents. Each of these documents would have a varied (but limited) range of values: the year and the month. It might be a good idea to add an index on that attribute. This way, whenever you query by a certain date range, your indexes will help MongoDB find the relevant dataset relatively quickly.
Another option would be to store each price as a document of its own. The item connected to the price could be a sub-document, perhaps not containing all of the item data, but enough to make the calculations and fetch the other relevant data once your dataset is small enough. For this usage, I would recommend creating a single attribute for the date range to be indexed, and also an index on the item._id attribute. You can keep the individual date components as well if you still need to query them individually. Something like this:
{
"ind_attr": "2014_January_CW1",
"date": {
"year": 2014,
"month": "January",
},
"CW": 1,
"price": [ price1... price90 ],
"item": {
"name": ...,
"_id": ...,
// minimal data about the actual item
}
}
With this document structure, you could easily add an index on the ind_attr attribute. The document.item._id attribute can be used to retrieve more detailed data on the actual item if needed.
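A minimal sketch of how such documents could be assembled before insertion is shown below. The makePriceDoc helper and the prices collection name are hypothetical; only the field layout comes from the example above.

```javascript
// Builds one price document per item and calendar week, following the
// layout above. Only minimal item data is copied into the sub-document.
function makePriceDoc(item, year, month, calendarWeek, prices) {
  return {
    ind_attr: `${year}_${month}_CW${calendarWeek}`,
    date: { year, month },
    CW: calendarWeek,
    price: prices,
    item: { _id: item._id, name: item.name },
  };
}

// In the mongo shell, the two suggested indexes could then be created with
// (assuming a collection named "prices"):
//   db.prices.createIndex({ ind_attr: 1 })
//   db.prices.createIndex({ "item._id": 1 })
```

Deriving ind_attr in one place keeps the key format consistent, which matters because queries have to match it exactly.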