MongoDB design for revisioned data

There are many articles and SO questions about the MongoDB data model for storing old revisions of documents.
However, I found nothing that satisfies one of my requirements: I need to be able to query the database retroactively to unambiguously find all documents that matched arbitrary criteria at a given point in time.
To clarify, I need to be able to efficiently answer the question:
"Which documents (and preferably versions) matched criteria {X:Y...} at time T?"
Pseudocode:
/* Should match a version that was active from 2010 until 2016-05-01 with zipcode 12345 */
db.my_objs.find({zipcode: "12345", time: ISODate("2016-01-01T22:14:31.003Z")}) // "time" means "as of this moment"; it is not a real operator
I haven't managed to find a solution, either on Google or by myself. I have tried:
1. Storing a simple "from" timestamp on the data and then selecting "the most recent version before my queried point in time that also matches the other criteria", but I have not managed to express that in Mongo.
2. Storing a from/to pair on each version: whenever I write a new version, I update "to" on the previous version to match "from" on the new version. However, I have not found a way to do this atomically or with eventual consistency, so concurrent updates could wreak havoc and create ambiguous timelines (double entries for the same point in time).
Any ideas?
Edit: an undesirable example query for #1:
db.my_objs.find({
    data: {
        $elemMatch: {
            from: { $lte: ISODate("2015-01-01") }
        }
    }
}, {
    // "data.$" returns only the first array element matching the query condition
    "data.$": 1
}).forEach(function (obj) {
    if (obj.data[0].state == 'active') {
        printjson(obj)
    }
})
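Why the query above is undesirable: the positional projection "data.$" returns only the first array element that matches the query condition. If versions are appended oldest-first, that is the oldest version with from <= the date, not the version in effect on that date, so checking data[0].state can give the wrong answer. The sketch below reproduces the difference in plain JavaScript with hypothetical sample versions; firstMatch and versionInEffect are illustrative names, not MongoDB functions.

```javascript
// Hypothetical versions of one document, appended oldest-first.
const data = [
  { from: new Date("2010-01-01T00:00:00Z"), state: "active" },
  { from: new Date("2014-06-01T00:00:00Z"), state: "inactive" },
  { from: new Date("2016-05-01T00:00:00Z"), state: "active" },
];

// Mimics the positional projection "data.$": the FIRST element with from <= t.
function firstMatch(versions, t) {
  return versions.find((v) => v.from <= t);
}

// What the question needs: the LATEST element with from <= t (the version in effect).
function versionInEffect(versions, t) {
  let best;
  for (const v of versions) {
    if (v.from <= t && (best === undefined || v.from > best.from)) best = v;
  }
  return best;
}

const t = new Date("2015-01-01T00:00:00Z");
console.log(firstMatch(data, t).state); // active (the 2010 version)
console.log(versionInEffect(data, t).state); // inactive (the 2014 version)
```

Because versionInEffect keeps the latest qualifying element instead of the first one, its result does not depend on the order in which versions are stored.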

You can use the aggregation framework: its $unwind stage turns each element of an array into a separate document, so you can build a more sophisticated $match condition on individual versions.
Example document:
{
    "_id" : ObjectId("577275589ea91b3799341aba"),
    "title" : "Test of design",
    "firstCreated" : ISODate("2016-06-28T13:02:16.156Z"),
    "lastUpdated" : ISODate("2016-06-28T13:02:16.156Z"),
    "firstAuthor" : "profesor79",
    "lastAuthor" : "Rawler",
    "versions" : [
        {
            "versionId" : 1.0,
            "dateCreated" : ISODate("2015-10-10T00:00:00.000Z"),
            "datePublished" : ISODate("2015-10-12T00:00:00.000Z"),
            "isActive" : false,
            "documnetPayload" : { "a" : 1.0, "b" : 2.0, "c" : 3.0 }
        },
        {
            "versionId" : 2.0,
            "dateCreated" : ISODate("2015-12-10T00:00:00.000Z"),
            "datePublished" : ISODate("2015-12-31T00:00:00.000Z"),
            "isActive" : true,
            "documnetPayload" : { "a" : 1.0, "b" : 3.0, "c" : 30.0 }
        },
        {
            "versionId" : 3.0,
            "dateCreated" : ISODate("2016-01-31T00:00:00.000Z"),
            "datePublished" : ISODate("2016-02-21T00:00:00.000Z"),
            "isActive" : true,
            "documnetPayload" : { "a" : 11.0, "b" : 3.0, "c" : 31.0 }
        }
    ]
}
Aggregation framework example:
db.rawler.aggregate([
    { $match: { "_id": ObjectId("577275589ea91b3799341aba") } },
    // one document per element of the versions array
    { $unwind: "$versions" },
    {
        $match: {
            $and: [
                { "versions.dateCreated": { $gt: ISODate("2015-10-10T00:00:00.000Z") } },
                { "versions.dateCreated": { $lte: ISODate("2016-01-30T00:00:00.000Z") } }
            ],
            // explicit UTC, consistent with the other dates
            "versions.datePublished": { $gt: ISODate("2015-10-13T00:00:00.000Z") }
            // , "versions.versionId": { $in: [1, 3, 4, 5] }
        }
    },
    // newest version first
    { $sort: { "versions.dateCreated": -1 } }
])
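The pipeline above filters versions by a date range. To answer "which documents matched {X:Y} at time T", one approach, sketched here and not tested against a server, is: $unwind the versions, $match datePublished <= T, $sort by datePublished descending, $group by _id keeping the $first version, and only then $match the payload criteria. The in-memory JavaScript below mirrors those stages so the logic can be checked without a database. It assumes datePublished marks when a version takes effect; asOf and predicate are hypothetical names.

```javascript
// Versions copied from the example document above (payload field name kept as-is).
const docs = [
  {
    _id: "577275589ea91b3799341aba",
    versions: [
      { versionId: 1, datePublished: new Date("2015-10-12T00:00:00Z"), documnetPayload: { a: 1, b: 2, c: 3 } },
      { versionId: 2, datePublished: new Date("2015-12-31T00:00:00Z"), documnetPayload: { a: 1, b: 3, c: 30 } },
      { versionId: 3, datePublished: new Date("2016-02-21T00:00:00Z"), documnetPayload: { a: 11, b: 3, c: 31 } },
    ],
  },
];

// Returns [{ _id, version }] for every document whose version in effect at time t
// satisfies predicate. Mirrors: $unwind -> $match -> $sort -> $group($first) -> $match.
function asOf(collection, t, predicate) {
  // $unwind + $match: one row per version published at or before t
  const rows = collection.flatMap((doc) =>
    doc.versions
      .filter((v) => v.datePublished <= t)
      .map((v) => ({ _id: doc._id, version: v }))
  );
  // $sort newest first, then $group by _id keeping the first (newest) row
  rows.sort((x, y) => y.version.datePublished - x.version.datePublished);
  const latest = new Map();
  for (const row of rows) {
    if (!latest.has(row._id)) latest.set(row._id, row);
  }
  // final $match on the payload of the version in effect
  return [...latest.values()].filter((row) => predicate(row.version.documnetPayload));
}

const t = new Date("2016-01-01T22:14:31.003Z");
const hits = asOf(docs, t, (p) => p.b === 3);
console.log(hits.map((h) => h.version.versionId)); // [ 2 ]
```

Applying the criteria only after picking the latest version is what keeps the answer unambiguous: at 2015-11-01 the document is not returned for b = 3, even though later versions have b = 3, because the version in effect then (versionId 1) has b = 2.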

Related

Spring Data - MongoDB comparing two fields in the same document after aggregation

I am new to MongoDB and am attempting a query using Spring Boot Data Mongo Templates. Below is the sample data that I’m using for this application:
{
    "book" : {
        "isbn" : "ABCD1234",
        "publisher" : "Penguin",
        "dateCheckedOutLast" : "2019-12-22",
        "library" : "Pickwah"
    },
    "isLost" : false
},
{
    "book" : {
        "isbn" : "ABCD1234",
        "publisher" : "Penguin",
        "dateCheckedOutLast" : "2018-12-22",
        "library" : "BlueRidge"
    },
    "isLost" : false
},
{
    "book" : {
        "isbn" : "DECF1234",
        "publisher" : "Marvel",
        "dateCheckedOutLast" : "2019-07-22",
        "library" : "Pickwah"
    },
    "isLost" : false
},
{
    "book" : {
        "isbn" : "DECF1234",
        "publisher" : "Marvel",
        "dateCheckedOutLast" : "2020-01-07",
        "library" : "BlueRidge"
    },
    "isLost" : false
}
I would like the query to return all the books in BlueRidge library such that the dateCheckedOutLast at BlueRidge library is greater than the dateCheckedOutLast at Pickwah library. The association between the books in the collection is the isbn attribute which uniquely identifies the books.
I have attempted the following code (BookData is the name of the Mongo collection), but when I try to compare the two date fields (dateCheckedOutLast) after the lookup, it fails.
Aggregation agg = newAggregation(
    match(Criteria.where("book.library").is("BlueRidge")),
    lookup("BookData", "book.isbn", "book.isbn", "anotherLib"),
    unwind("anotherLib"),
    match(Criteria.where("anotherLib.book.library").is("Pickwah")),
    match(Criteria.where("book.dateCheckedOutLast")
        .gt("anotherLib.book.dateCheckedOutLast")));
Given the sample data, the query should return one document, the one with isbn "DECF1234".
Any feedback is appreciated. Thank you!
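A likely cause of the failure: Criteria.where("book.dateCheckedOutLast").gt("anotherLib.book.dateCheckedOutLast") compares the field with the literal string "anotherLib.book.dateCheckedOutLast", not with the other field. Comparing two fields of the same document needs an aggregation expression, for example $expr inside $match (MongoDB 3.6+), or a $project stage that computes $gt / $cmp followed by a $match on the result. The in-memory JavaScript below is not Spring Data code; it only mirrors the intended pipeline to confirm the expected result, and checkedOutLaterAtBlueRidge is a hypothetical name.

```javascript
// Sample data from the question (publisher and isLost omitted; they play no role here).
const bookData = [
  { book: { isbn: "ABCD1234", dateCheckedOutLast: "2019-12-22", library: "Pickwah" } },
  { book: { isbn: "ABCD1234", dateCheckedOutLast: "2018-12-22", library: "BlueRidge" } },
  { book: { isbn: "DECF1234", dateCheckedOutLast: "2019-07-22", library: "Pickwah" } },
  { book: { isbn: "DECF1234", dateCheckedOutLast: "2020-01-07", library: "BlueRidge" } },
];

// Mirrors match(BlueRidge) -> lookup(by isbn) -> unwind -> match(Pickwah)
// -> field-to-field comparison (what $expr with $gt would do on the server).
function checkedOutLaterAtBlueRidge(docs) {
  return docs
    .filter((d) => d.book.library === "BlueRidge")
    .flatMap((d) =>
      docs
        .filter((o) => o.book.isbn === d.book.isbn && o.book.library === "Pickwah")
        .map((o) => ({ ...d, anotherLib: o }))
    )
    // ISO "YYYY-MM-DD" strings sort chronologically, so string comparison works here.
    .filter((d) => d.book.dateCheckedOutLast > d.anotherLib.book.dateCheckedOutLast)
    .map((d) => d.book.isbn);
}

console.log(checkedOutLaterAtBlueRidge(bookData)); // [ 'DECF1234' ]
```

Because the dates are stored as "YYYY-MM-DD" strings, a plain $gt between the two fields compares them correctly; with other string formats they would need converting to dates first.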

MongoDB Conditional validation on arrays and embedded documents

I have a number of documents in my database to which I am applying document validation. All of these documents may have embedded documents. I can apply simple validation along the lines of SQL NOT NULL checks (these essentially enforce the primary key constraints), but what I would like to do is apply some sort of conditional validation to the optional arrays and embedded documents. For example, let's say I have a document that looks like this:
{
    "date" : <<insertion date>>,
    "name" : <<the portfolio name>>,
    "assets" : <<amount of money we have to trade with>>
}
Clearly I can put validation on this document to ensure that date, name and assets all exist at insertion time. Let's say, however, that I'm managing a stock portfolio and the document can later be updated to include an array of stocks, like this:
{
    "date" : <<insertion date>>,
    "name" : <<the portfolio name>>,
    "assets" : <<amount of money we have to trade with>>,
    "portfolio" : [
        {
            "stockName" : "IBM",
            "pricePaid" : 155.39,
            "sharesHeld" : 100
        },
        {
            "stockName" : "Microsoft",
            "pricePaid" : 57.22,
            "sharesHeld" : 250
        }
    ]
}
Is it possible to apply conditional validation to this array of subdocuments? It's valid for the portfolio not to be there, but if it is, each document in the array must contain the three fields "stockName", "pricePaid" and "sharesHeld".
Using the Mongo shell
db.createCollection("collectionname", {
    validator: {
        $or: [
            { "portfolio": { $exists: false } },
            {
                $and: [
                    { "portfolio": { $exists: true } },
                    { "portfolio.stockName": { $type: "string", $exists: true } },
                    { "portfolio.pricePaid": { $type: "double", $exists: true } },
                    { "portfolio.sharesHeld": { $type: "double", $exists: true } }
                ]
            }
        ]
    }
})
With the validation above in place, you can insert documents with or without a portfolio.
After creating the collection with this validator in the shell, you can insert data such as the following:
db.collectionname.insert({
    "_id" : ObjectId("58061aac8812662c9ae1b479"),
    "date" : ISODate("2016-10-18T12:50:52.372Z"),
    "name" : "B",
    "assets" : 200
})
db.collectionname.insert({
    "_id" : ObjectId("58061ab48812662c9ae1b47a"),
    "date" : ISODate("2016-10-18T12:51:00.747Z"),
    "name" : "A",
    "assets" : 100,
    "portfolio" : [
        {
            "stockName" : "Microsoft",
            "pricePaid" : 57.22,
            "sharesHeld" : 250
        }
    ]
})
If we try to insert a document like this one, where pricePaid is missing:
db.collectionname.insert({
    "date" : new Date(),
    "name" : "A",
    "assets" : 100,
    "portfolio" : [
        {
            "stockName" : "IBM",
            "sharesHeld" : 100
        }
    ]
})
then we get the following error message:
WriteResult({
    "nInserted" : 0,
    "writeError" : {
        "code" : 121,
        "errmsg" : "Document failed validation"
    }
})
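One caveat with this validator: a dot-notation condition such as "portfolio.pricePaid": { $exists: true } on an array is satisfied if any element matches, so an array with one complete element and one incomplete element still passes. (MongoDB 3.6+ also offers $jsonSchema validation, whose items and required keywords can enforce the rule for every element.) The in-memory sketch below contrasts the two semantics; the function names are hypothetical, and the BSON type checks are simplified to JavaScript typeof.

```javascript
// Required fields and their (simplified) JavaScript types.
const REQUIRED = { stockName: "string", pricePaid: "number", sharesHeld: "number" };

// Mirrors the dot-notation validator: each field must appear, with the right type,
// in at least ONE array element.
function passesAnyElementRule(doc) {
  if (doc.portfolio === undefined) return true; // { portfolio: { $exists: false } }
  return Object.entries(REQUIRED).every(([field, type]) =>
    doc.portfolio.some((item) => typeof item[field] === type)
  );
}

// The rule the question asks for: EVERY element must carry all three fields.
function passesEveryElementRule(doc) {
  if (doc.portfolio === undefined) return true;
  return doc.portfolio.every((item) =>
    Object.entries(REQUIRED).every(([field, type]) => typeof item[field] === type)
  );
}

const mixed = {
  name: "A",
  assets: 100,
  portfolio: [
    { stockName: "Microsoft", pricePaid: 57.22, sharesHeld: 250 },
    { stockName: "IBM", sharesHeld: 100 }, // pricePaid missing
  ],
};
console.log(passesAnyElementRule(mixed)); // true
console.log(passesEveryElementRule(mixed)); // false
```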
Using Mongoose
Yes, it can be done. Based on your scenario, you may need to define both the parent and the child schema.
Shown below is a sample of the child (portfolio) schema in Mongoose, embedded as an array in a parent schema.
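var mongoose = require('mongoose');
var Schema = mongoose.Schema;
// Child schema: every portfolio entry must have all three fields
var portfolioSchema = new Schema({
    stockName: { type: String, required: true },
    pricePaid: { type: Number, required: true },
    sharesHeld: { type: Number, required: true }
});
// Parent schema (hypothetical name): portfolio is optional, each element is validated by portfolioSchema
var parentSchema = new Schema({ date: Date, name: String, assets: Number, portfolio: [portfolioSchema] });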
References:
http://mongoosejs.com/docs/guide.html
http://mongoosejs.com/docs/subdocs.html
Can I require an attribute to be set in a mongodb collection? (not null)
Hope it helps!

Mongo query to return distinct count, large documents

I need to get a count of distinct 'transactions'. The problem I'm having is that .distinct() returns an error because the result is too large.
I'm not familiar with aggregation either.
I need to group the count by 'agencyID' (there are 2 different agencyIDs).
I need to be able to count the transactions where the agencyID is 01721487, etc.
db.myCollection.distinct("bookings.transactions").length
This doesn't work, as I need to group by agencyID, and if there are too many results I get an error saying the result is too large.
{
    "_id" : ObjectId("5624a610a6e6b53b158b4744"),
    "agencyID" : "01721487",
    "paxID" : "-530189664",
    "bookings" : [
        {
            "bookingID" : "24232",
            "transactions" : [
                {
                    "tranID" : "001",
                    "invoices" : [
                        {
                            "invNum" : "1312",
                            "type" : "r",
                            "inv_date" : "20150723",
                            "inv_time" : "0953",
                            "inv_val" : -300
                        }
                    ],
                    "tranType" : "Fee",
                    "tranDate" : "20150723",
                    "tranTime" : "0952",
                    "opCode" : "admin",
                    "udf_1" : "j s"
                }
            ],
            "acctID" : "acct11",
            "agt_id" : "xy"
        }
    ],
    "title" : "",
    "firstname" : "",
    "surname" : "f bar"
}
I've also tried this but it didn't work for me.
Thank you for the test data; this is something you could play with:
db.myCollection.aggregate([
    { $unwind: "$bookings" },
    {
        $match: {
            // only bookings that have a non-empty transactions array
            "bookings.transactions": { $exists: true, $not: { $size: 0 } }
        }
    },
    {
        $group: {
            _id: "$agencyID",
            count: { $sum: { $size: "$bookings.transactions" } }
        }
    }
])
As there is a nested array, we need to $unwind it first; then we can check the size of the inner array.
Happy reporting!
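Note that this pipeline counts every transaction per agency (the sum of the array sizes), not distinct ones. If the same transaction can appear more than once, one option is to $unwind "$bookings.transactions" as well, $group on a key such as { agency: "$agencyID", booking: "$bookings.bookingID", tran: "$bookings.transactions.tranID" }, and then $group again by agency with $sum: 1. That key is an assumption: the sample suggests but does not state what identifies a transaction. The in-memory sketch below mirrors both variants on the sample document; the function names and the keyOf parameter are hypothetical.

```javascript
// The sample document from the question, trimmed to the fields the counts use.
const sample = [
  {
    agencyID: "01721487",
    bookings: [{ bookingID: "24232", transactions: [{ tranID: "001", tranType: "Fee" }] }],
  },
];

// Mirrors the answer's pipeline: $unwind bookings, skip missing/empty transactions,
// then $group by agencyID summing the size of each transactions array.
function countTransactionsByAgency(docs) {
  const counts = {};
  for (const doc of docs) {
    for (const booking of doc.bookings || []) {
      const txs = booking.transactions;
      if (!Array.isArray(txs) || txs.length === 0) continue;
      counts[doc.agencyID] = (counts[doc.agencyID] || 0) + txs.length;
    }
  }
  return counts;
}

// Distinct variant. ASSUMPTION (hypothetical key): a transaction is identified by
// bookingID + tranID; replace keyOf with whatever really identifies one in your data.
function countDistinctTransactionsByAgency(docs, keyOf = (b, t) => `${b.bookingID}/${t.tranID}`) {
  const seen = {};
  for (const doc of docs) {
    for (const booking of doc.bookings || []) {
      for (const tx of booking.transactions || []) {
        if (!seen[doc.agencyID]) seen[doc.agencyID] = new Set();
        seen[doc.agencyID].add(keyOf(booking, tx));
      }
    }
  }
  const counts = {};
  for (const [agencyID, keys] of Object.entries(seen)) counts[agencyID] = keys.size;
  return counts;
}

console.log(countTransactionsByAgency(sample)); // { '01721487': 1 }
```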

How to update a property in a nested Mongo document

I want to update a particular property in a nested mongo document
{
    "_id" : ObjectId("55af76e60b0e4b318ba822ec"),
    "make" : "MERCEDES-BENZ",
    "model" : "E-CLASS",
    "variant" : "E 250 CDI CLASSIC",
    "fuel" : "Diesel",
    "cc" : 2143,
    "seatingCapacity" : 5,
    "variant_+_fuel" : "E 250 CDI CLASSIC (Diesel)",
    "make_+_model_+_variant_+_fuel" : "MERCEDES-BENZ E-CLASS E 250 CDI CLASSIC (Diesel)",
    "dropdown_display" : "E-CLASS E 250 CDI CLASSIC (Diesel)",
    "vehicleSegment" : "HIGH END CARS",
    "abc" : {
        "variantId" : 1000815,
        "makeId" : 1000016,
        "modelId" : 1000556,
        "fuelId" : 2,
        "segmentId" : 1000002,
        "price" : 4020000
    },
    "def" : {
        "bodyType" : 1,
        "makeId" : 87,
        "modelId" : 21584,
        "fuel" : "DIESEL",
        "vehicleSegmentType" : "E2"
    },
    "isActive" : false
}
This is my document. If I want to add or update a value for key "nonPreferred" inside "abc", how do I go about it?
I tried it with this query:
db.FourWheelerMaster.update(
    { "abc.modelId": 1000556 },
    {
        $set: {
            "abc": {
                "nonPreferred": ["Mumbai", "Pune"]
            }
        }
    },
    { multi: true }
)
but it replaces the whole "abc" subdocument, removing all existing key/value pairs inside it and keeping only the newly inserted ones:
"abc" : {
    "nonPreferred" : [
        "Mumbai",
        "Pune"
    ]
},
Can anyone tell me how to update only a particular property inside it, without replacing the whole embedded document?
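The problem is that { $set: { "abc": {...} } } names the whole embedded document, so it is replaced. Use dot notation to target a single field instead: { $set: { "abc.nonPreferred": ["Mumbai", "Pune"] } } sets (or overwrites) only that field and leaves the rest of "abc" intact. To append elements to the array rather than overwrite it, use the $push operator together with the $each modifier, which appends each element of the value separately:
db.FourWheelerMaster.update(
    { "abc.modelId": 1000556 },
    {
        "$push": {
            "abc.nonPreferred": {
                "$each": ["Mumbai", "Pune"]
            }
        }
    },
    { "multi": true }
)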
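What makes the difference is the dot-notation path "abc.nonPreferred": it addresses one field inside the embedded document, whereas { "abc": {...} } addresses the whole subdocument. The in-memory sketch below imitates the two dot-notation updates with hypothetical helpers (setPath, pushEach); it illustrates the semantics only and is not how the server implements updates. Note that running the $push version twice appends the values twice, whereas $set simply overwrites.

```javascript
// Walks all but the last path segment, creating empty objects where needed.
function parentOf(doc, keys) {
  let target = doc;
  for (const key of keys.slice(0, -1)) {
    if (typeof target[key] !== "object" || target[key] === null) target[key] = {};
    target = target[key];
  }
  return target;
}

// Roughly { $set: { "abc.nonPreferred": value } }: only the addressed field changes.
function setPath(doc, path, value) {
  const keys = path.split(".");
  parentOf(doc, keys)[keys[keys.length - 1]] = value;
  return doc;
}

// Roughly { $push: { "abc.nonPreferred": { $each: values } } }: appends, creating the array if missing.
function pushEach(doc, path, values) {
  const keys = path.split(".");
  const parent = parentOf(doc, keys);
  const last = keys[keys.length - 1];
  parent[last] = (parent[last] || []).concat(values);
  return doc;
}

const car = { abc: { variantId: 1000815, modelId: 1000556, price: 4020000 } };
setPath(car, "abc.nonPreferred", ["Mumbai", "Pune"]);
console.log(Object.keys(car.abc)); // [ 'variantId', 'modelId', 'price', 'nonPreferred' ]
pushEach(car, "abc.nonPreferred", ["Delhi"]);
console.log(car.abc.nonPreferred); // [ 'Mumbai', 'Pune', 'Delhi' ]
```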

Get a specific object in an array of arrays in MongoDB

I need to get a specific object in an array of arrays in MongoDB.
I need to get only the task object with _id = ObjectId("543429a2cb38b1d83c3ff2c2").
My document (projects):
{
    "_id" : ObjectId("543428c2cb38b1d83c3ff2bd"),
    "name" : "new project",
    "author" : ObjectId("5424ac37eb0ea85d4c921f8b"),
    "members" : [
        ObjectId("5424ac37eb0ea85d4c921f8b")
    ],
    "US" : [
        {
            "_id" : ObjectId("5434297fcb38b1d83c3ff2c0"),
            "name" : "Test Story",
            "author" : ObjectId("5424ac37eb0ea85d4c921f8b"),
            "tasks" : [
                {
                    "_id" : ObjectId("54342987cb38b1d83c3ff2c1"),
                    "name" : "teste3",
                    "author" : ObjectId("5424ac37eb0ea85d4c921f8b")
                },
                {
                    "_id" : ObjectId("543429a2cb38b1d83c3ff2c2"),
                    "name" : "jklasdfa_XXX",
                    "author" : ObjectId("5424ac37eb0ea85d4c921f8b")
                }
            ]
        }
    ]
}
Expected result:
{
    "_id" : ObjectId("543429a2cb38b1d83c3ff2c2"),
    "name" : "jklasdfa_XXX",
    "author" : ObjectId("5424ac37eb0ea85d4c921f8b")
}
But I'm not getting it.
I'm still testing, without success:
db.projects.find({
    "US.tasks._id": ObjectId("543429a2cb38b1d83c3ff2c2")
}, { "US.tasks.$": 1 })
I tried with $elemMatch too, but it returns nothing:
db.projects.find({
    "US": {
        "tasks": {
            $elemMatch: {
                "_id": ObjectId("543429a2cb38b1d83c3ff2c2")
            }
        }
    }
})
Can I get ONLY my expected result using find()? If not, what should I use, and how?
Thanks!
You will need an aggregation for that, because the positional $ projection cannot be used in queries that traverse more than one array:
db.projects.aggregate([
    { $unwind: "$US" },
    { $unwind: "$US.tasks" },
    { $match: { "US.tasks._id": ObjectId("543429a2cb38b1d83c3ff2c2") } },
    { $project: { _id: 0, "task": "$US.tasks" } }
])
This should return:
{
    "task" : {
        "_id" : ObjectId("543429a2cb38b1d83c3ff2c2"),
        "name" : "jklasdfa_XXX",
        "author" : ObjectId("5424ac37eb0ea85d4c921f8b")
    }
}
Explanation:
$unwind creates a new (virtual) document for each array element.
$match is the query part of your find.
$project is similar to the projection part of find, i.e. it specifies the fields you want in the results.
You might want to add an extra $match stage before the first $unwind if you know which document you are searching for (check the performance metrics).
Edit: added a second $unwind since US is an array.
I don't know exactly what you are doing, so this is just a suggestion, but you might want to examine whether your schema (and MongoDB) is ideal for your task: the document looks just like denormalized relational data, so a relational database would probably suit you better.
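To see what each stage of the aggregation does, the in-memory JavaScript below mirrors it on the question's document, with ObjectIds simplified to plain strings and the author fields omitted. findTask is a hypothetical name; this only illustrates the semantics and does not replace the server-side pipeline.

```javascript
// Project document from the question (ObjectIds as strings, author fields omitted).
const project = {
  _id: "543428c2cb38b1d83c3ff2bd",
  name: "new project",
  US: [
    {
      _id: "5434297fcb38b1d83c3ff2c0",
      name: "Test Story",
      tasks: [
        { _id: "54342987cb38b1d83c3ff2c1", name: "teste3" },
        { _id: "543429a2cb38b1d83c3ff2c2", name: "jklasdfa_XXX" },
      ],
    },
  ],
};

// $unwind "$US" -> $unwind "$US.tasks" -> $match on the task _id -> $project { task }
function findTask(docs, taskId) {
  return docs
    // first $unwind: one document per story
    .flatMap((doc) => doc.US.map((story) => ({ ...doc, US: story })))
    // second $unwind: one document per task inside that story
    .flatMap((doc) => doc.US.tasks.map((task) => ({ ...doc, US: { ...doc.US, tasks: task } })))
    // $match
    .filter((doc) => doc.US.tasks._id === taskId)
    // $project: keep only the task, exposed as "task"
    .map((doc) => ({ task: doc.US.tasks }));
}

console.log(JSON.stringify(findTask([project], "543429a2cb38b1d83c3ff2c2")));
// [{"task":{"_id":"543429a2cb38b1d83c3ff2c2","name":"jklasdfa_XXX"}}]
```

If you want the task itself as the top-level result rather than wrapped in "task", MongoDB 3.4+ also has a $replaceRoot stage ({ $replaceRoot: { newRoot: "$US.tasks" } }) that can take the place of the $project.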