I am trying to find a MongoDB query that looks at a collection containing multiple versions of the same document and returns only the latest version of each document as the result set.
I cannot explain it in English any better than that, but maybe the SQL below explains it further. I want each document by transaction_reference, but only the latest dated version (by object_creation_date).
select
t.transaction_reference,
t.transaction_date,
t.object_creation_date,
t.transaction_sale_value
from MyTable t
inner join (
select
transaction_reference,
max(object_creation_date) as MaxDate
from MyTable
group by transaction_reference
) tm
on t.transaction_reference = tm.transaction_reference
and t.object_creation_date = tm.MaxDate
The reason why there are multiple versions of the same document is because I want to store each iteration of a transaction. The first time I receive a document, it may be in transaction_status of UNPAID then I receive the same transaction again and this time the transaction_status is PAID.
Some analysis will be to SUM all unique transactions whereas some other analysis may be to measure the time distance between a document with status UNPAID and the next of PAID.
As per request, here are two documents:
{
"_id": {
"$oid": "579aa337f36d2808839a05e8"
},
"object_class": "Goods & Services Transaction",
"object_category": "Revenue",
"object_type": "Transaction",
"object_origin": "Sage One",
"object_origin_category": "Bookkeeping",
"object_creation_date": "2016-07-05T00:00:00.201Z",
"party_uuid": "dfa1e80a-5521-11e6-beb8-9e71128cae77",
"connection_uuid": "b945bd7c-7988-4d2a-92f5-8b50ab218e00",
"transaction_reference": "SI-1",
"transaction_status": "UNPAID",
"transaction_date": "2016-06-16T00:00:00.201Z",
"transaction_due_date": "2016-07-15T00:00:00.201Z",
"transaction_currency": "GBP",
"goods_and_services": [
{
"item_identifier": "PROD01",
"item_name": "Product One",
"item_quantity": 1,
"item_gross_unit_sale_value": 1800,
"item_revenue_category": "Sales Revenue",
"item_net_unit_cost_value": null,
"item_net_unit_sale_value": 1500,
"item_unit_tax_value": 300,
"item_net_total_sale_value": 1500,
"item_gross_total_sale_value": 1800,
"item_tax_value": 300
}
],
"transaction_gross_value": 1800,
"transaction_gross_curr_value": 1800,
"transaction_net_value": 1500,
"transaction_cost_value": null,
"transaction_payments_value": null,
"transaction_payment_extras_value": null,
"transaction_tax_value": 300,
"party": {
"customer": {
"customer_identifier": "11",
"customer_name": "KP"
}
}
}
And here is the second version, where it is now paid:
{
"_id": {
"$oid": "579aa387f36d2808839a05ee"
},
"object_class": "Goods & Services Transaction",
"object_category": "Revenue",
"object_type": "Transaction",
"object_origin": "Sage One",
"object_origin_category": "Bookkeeping",
"object_creation_date": "2016-07-16T00:00:00.201Z",
"party_uuid": "dfa1e80a-5521-11e6-beb8-9e71128cae77",
"connection_uuid": "b945bd7c-7988-4d2a-92f5-8b50ab218e00",
"transaction_reference": "SI-1",
"transaction_status": "PAID",
"transaction_date": "2016-06-16T00:00:00.201Z",
"transaction_due_date": "2016-07-15T00:00:00.201Z",
"transaction_currency": "GBP",
"goods_and_services": [
{
"item_identifier": "PROD01",
"item_name": "Product One",
"item_quantity": 1,
"item_gross_unit_sale_value": 1800,
"item_revenue_category": "Sales Revenue",
"item_net_unit_cost_value": null,
"item_net_unit_sale_value": 1500,
"item_unit_tax_value": 300,
"item_net_total_sale_value": 1500,
"item_gross_total_sale_value": 1800,
"item_tax_value": 300
}
],
"transaction_gross_value": 1800,
"transaction_gross_curr_value": 1800,
"transaction_net_value": 1500,
"transaction_cost_value": null,
"transaction_payments_value": null,
"transaction_payment_extras_value": null,
"transaction_tax_value": 300,
"party": {
"customer": {
"customer_identifier": "11",
"customer_name": "KP"
}
}
}
Thanks for your support, Matt
If I understand the question correctly, you could use something like this:
db.getCollection('yourTransactionsCollection').aggregate([
{
$sort: {
"transaction_reference": 1,
"object_creation_date": -1
}
},
{
$group: {
_id: "$transaction_reference",
"transaction_date": { $first: "$transaction_date" },
"object_creation_date": { $first: "$object_creation_date" },
"transaction_sale_value": { $first: "$transaction_sale_value" }
}
}
])
which outputs a result like the following
{
"_id" : "SI-1",
"transaction_date" : "2016-06-16T00:00:00.201Z",
"object_creation_date" : "2016-07-16T00:00:00.201Z",
"transaction_sale_value" : null
}
Note that you can change the $sort to include only object_creation_date, but I included both transaction_reference and object_creation_date because I think it makes sense to create a compound index on both of them instead of just the creation date. Adjust that according to your indexes so that the $sort can use one.
Also, there is no transaction_sale_value field in your sample documents, hence the null for it in the result. Maybe you missed it or it is just not in your samples, but I think you get the idea and can adjust it to your needs.
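If you would rather get the whole latest document back instead of listing fields one by one, a variation on the pipeline above might look like the sketch below. It assumes MongoDB 3.4 or later (for $replaceRoot) and that object_creation_date is stored either as a Date or as ISO-8601 strings in one consistent format, so that sorting orders it chronologically. The names latestVersionPipeline and latestVersions are made up for illustration; the function is only a plain-JavaScript mirror of what the pipeline computes.

```javascript
// Hypothetical pipeline: newest document per transaction_reference, returned whole.
const latestVersionPipeline = [
  { $sort: { transaction_reference: 1, object_creation_date: -1 } },
  // $$ROOT is the full input document; $first picks the newest one per group.
  { $group: { _id: "$transaction_reference", latest: { $first: "$$ROOT" } } },
  // Promote the stored document back to the top level (assumes MongoDB 3.4+).
  { $replaceRoot: { newRoot: "$latest" } }
];

// Plain-JavaScript mirror of the pipeline, for illustration only.
function latestVersions(docs) {
  const newestByRef = new Map();
  for (const doc of docs) {
    const current = newestByRef.get(doc.transaction_reference);
    // ISO-8601 strings in the same format compare correctly as strings.
    if (!current || doc.object_creation_date > current.object_creation_date) {
      newestByRef.set(doc.transaction_reference, doc);
    }
  }
  return [...newestByRef.values()];
}

// Trimmed-down versions of the two sample documents from the question.
const sampleDocs = [
  { transaction_reference: "SI-1", transaction_status: "UNPAID",
    object_creation_date: "2016-07-05T00:00:00.201Z" },
  { transaction_reference: "SI-1", transaction_status: "PAID",
    object_creation_date: "2016-07-16T00:00:00.201Z" }
];

const latest = latestVersions(sampleDocs);
console.log(latest.length, latest[0].transaction_status); // prints: 1 PAID
```

In the shell the pipeline would be run as db.getCollection('yourTransactionsCollection').aggregate(latestVersionPipeline).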
Hello Good Developers,
I am facing a situation in MongoDB where I've JSON Data like this
[{
"id": "GLOBAL_EDUCATION",
"general_name": "GLOBAL_EDUCATION",
"display_name": "GLOBAL_EDUCATION",
"profile_section_id": 0,
"translated": [
{
"con_lang": "US-EN",
"country_code": "US",
"language_code": "EN",
"text": "What is the highest level of education you have completed?",
"hint": null
},
{
"con_lang": "US-ES",
"country_code": "US",
"language_code": "ES",
"text": "\u00bfCu\u00e1l es su nivel de educaci\u00f3n?",
"hint": null
}...
{
....
}
]
I am projecting result using the following query :
db.collection.find({
},
{
_id: 0,
id: 1,
general_name: 1,
translated: {
$elemMatch: {
con_lang: "US-EN"
}
}
})
here's a fiddle for the same: https://mongoplayground.net/p/I99ZXBfXIut
I want records that don't match the $elemMatch not to be returned at all.
In the fiddle output, you can see that the second item doesn't have a translated attribute. In this case, I don't want the second item to be returned at all.
I am using Laravel as the backend, so I could filter out those records in PHP, but a lot of records are returned and I don't think filtering in PHP is the best option.
You need to use $elemMatch in the first parameter of find(), which is the query filter:
db.collection.find({
translated: {
$elemMatch: {
con_lang: "IT-EN"
}
}
})
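To keep the original projection and still drop non-matching records, the same $elemMatch condition can go in both arguments of find(). A minimal sketch, where buildTranslationQuery is a hypothetical helper name and the field names are the ones from the question:

```javascript
// Hypothetical helper: builds the filter and projection for one con_lang value.
function buildTranslationQuery(conLang) {
  return {
    // First argument of find(): only documents with a matching array element.
    filter: { translated: { $elemMatch: { con_lang: conLang } } },
    // Second argument of find(): return only the first matching array element.
    projection: {
      _id: 0,
      id: 1,
      general_name: 1,
      translated: { $elemMatch: { con_lang: conLang } }
    }
  };
}

const { filter, projection } = buildTranslationQuery("US-EN");
// In the mongo shell: db.collection.find(filter, projection)
console.log(JSON.stringify(filter));
```

With the condition in the filter, documents without a US-EN translation are never returned, so nothing has to be filtered out in PHP.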
I upgraded Wekan from 0.48 to 0.95. It looks like what happened in Mongo is that it took the checklist collection from one containing a nested list of items and split it out into a new checklistItems collection. It appears to have copied the data correctly- except that instead of copying each item's title, it copied the checklist title to each list.
I started with this in wekan.checklists:
{
"_id": "z329QEDfjsuQcxz7E",
"cardId": "TBgz6gMGCcn9XNPSW",
"title": "A list",
"sort": 0,
"createdAt": {
"$date": "2018-05-09T22:20:50.537Z"
},
"items": [
{
"_id": "z329QEDfjsuQcxz7E0",
"title": "Do some stuff",
"isFinished": false,
"sort": 0
},
{
"_id": "z329QEDfjsuQcxz7E1",
"title": "Do some other stuff",
"isFinished": false,
"sort": 1
}
],
"userId": "YndMrPQ5XhZTTKD2S"
}
and wound up with the following in wekan.checklistItems:
{
"_id": "RADPEu4nhr9PgwPHH",
"title": "A list",
"sort": 0,
"isFinished": false,
"checklistId": "z329QEDfjsuQcxz7E",
"cardId": "TBgz6gMGCcn9XNPSW"
}
{
"_id": "Guy3aaJL4WLJQjzRX",
"title": "A list",
"sort": 1,
"isFinished": false,
"checklistId": "z329QEDfjsuQcxz7E",
"cardId": "TBgz6gMGCcn9XNPSW"
}
and this in wekan.checklists:
{ "_id" : "z329QEDfjsuQcxz7E", "cardId" : "TBgz6gMGCcn9XNPSW", "title" : "MVP", "sort" : 0, "createdAt" : ISODate("2018-05-09T22:20:50.537Z"), "userId" : "YndMrPQ5XhZTTKD2S" }
Is there a quick query to go back through my original wekan.checklists and update the titles in wekan.checklistItems? I note that the checklistIDs stayed the same but the card id's are different- I can of course load the old wekan.checklists collection into my current (upgraded) db to query against.
Fix: load your old db.checklists into db.checklistsOld (I used mongoimport -d wekan -c checklistsOld ~/checklistsOld.bson, where checklistsOld.bson held my backup from before the upgrade). Then use the following script in Robo3T:
db.checklistsOld.find({}, {"_id": 1, "items.title":1, "items.sort": 1 }).forEach( (list, i, lists) => {
var checklistId = list._id;
list.items.forEach( (item, j, items) => {
var sort = item.sort,
title = item.title;
db.checklistItems.update({"checklistId": checklistId, "sort":sort}, {$set: {"title": title}} );
});
});
Depending on how many items you have, you may need to adjust "shellTimeoutSec" in Robo3T (https://github.com/Studio3T/robomongo/wiki/Robomongo-Config-File-Guide)
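If the collection is large, the same fix could be sent as one batch instead of one update per item. The sketch below only builds the operations in plain JavaScript; buildTitleFixOps is a hypothetical helper, and it assumes, like the script above, that checklistId plus sort identifies an item uniquely. It also assumes a server and shell that support bulkWrite (MongoDB 3.2 or later).

```javascript
// Hypothetical helper: one updateOne operation per item of every old checklist.
function buildTitleFixOps(oldChecklists) {
  const ops = [];
  for (const list of oldChecklists) {
    // Checklists without items contribute no operations.
    for (const item of list.items || []) {
      ops.push({
        updateOne: {
          filter: { checklistId: list._id, sort: item.sort },
          update: { $set: { title: item.title } }
        }
      });
    }
  }
  return ops;
}

// The old checklist from the question, reduced to the fields that matter.
const oldChecklists = [{
  _id: "z329QEDfjsuQcxz7E",
  items: [
    { title: "Do some stuff", sort: 0 },
    { title: "Do some other stuff", sort: 1 }
  ]
}];

const ops = buildTitleFixOps(oldChecklists);
// In the shell:
//   db.checklistItems.bulkWrite(buildTitleFixOps(db.checklistsOld.find().toArray()))
console.log(ops.length); // prints: 2
```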
As the title says, I need to retrieve the names of all the keys in my MongoDB collection, BUT I need them split up based on a key/value pair that each document has. Here's my clunky analogy: If you imagine the original collection is a zoo, I need a new collection that contains all the keys Zebras have, all the keys Lions have, and all the keys Giraffes have. The different animal types share many of the same keys, but those keys are meant to be specific to each type of animal (because the user needs to be able to (for example) search for Zebras taller than 3ft and giraffes shorter than 10ft).
Here's a bit of example code that I ran which worked well - it grabbed all the unique keys in my entire collection and threw them into their own collection:
db.runCommand({
"mapreduce" : "MyZoo",
"map" : function() {
for (var key in this) { emit(key, null); }
},
"reduce" : function(key, stuff) { return null; },
"out": "MyZoo" + "_keys"
})
I'd like a version of this command that would look through the MyZoo collection for animals with "type":"zebra", find all the unique keys, and place them in a new collection (MyZoo_keys) - then do the same thing for "type":"lion" & "type":"giraffe", giving each "type" its own array of keys.
Here's the collection I'm starting with:
{
"name": "Zebra1",
"height": "300",
"weight": "900",
"type": "zebra",
"zebraSpecific1": "somevalue"
},
{
"name": "Lion1",
"height": "325",
"weight": "1200",
"type": "lion"
},
{
"name": "Zebra2",
"height": "500",
"weight": "2100",
"type": "zebra",
"zebraSpecific2": "somevalue"
},
{
"name": "Giraffe",
"height": "4800",
"weight": "2400",
"type": "giraffe",
"giraffeSpecific1": "somevalue",
"giraffeSpecific2": "someothervalue"
}
And here's what I'd like the MyZoo_keys collection to look like:
{
"zebra": [
{
"name": null,
"height": null,
"weight": null,
"type": null,
"zebraSpecific1": null,
"zebraSpecific2": null
}
],
"lion": [
{
"name": null,
"height": null,
"weight": null,
"type": null
}
],
"giraffe": [
{
"name": null,
"height": null,
"weight": null,
"type": null,
"giraffeSpecific1": null,
"giraffeSpecific2": null
}
]
}
That's probably imperfect JSON, but you get the idea...
Thanks!
You can modify your code to dump the results in a more readable and organized format.
The map function:
Emit the type of animal as key, and an array of keys for
each animal(document). Leave out the _id field.
Code:
var map = function(){
var keys = [];
Object.keys(this).forEach(function(k){
if(k != "_id"){
keys.push(k);
}
})
emit(this.type,{"keys":keys});
}
The reduce function:
For each type of animal, consolidate and return the unique keys.
Use an object (uniqueKeys) to check for duplicates; this reduces the running
time at the cost of some memory, because each lookup is O(1).
Code:
var reduce = function(key,values){
var uniqueKeys = {};
var result = [];
values.forEach(function(value){
value.keys.forEach(function(k){
if(!uniqueKeys[k]){
uniqueKeys[k] = 1;
result.push(k);
}
})
})
return {"keys":result};
}
Invoking Map-Reduce:
db.collection.mapReduce(map,reduce,{out:"t1"});
Aggregating the result:
db.t1.aggregate([
{$project:{"_id":0,"animal":"$_id","keys":"$value.keys"}}
])
Sample output:
{
"animal" : "lion",
"keys" : [
"name",
"height",
"weight",
"type"
]
}
{
"animal" : "zebra",
"keys" : [
"name",
"height",
"weight",
"type",
"zebraSpecific1",
"zebraSpecific2"
]
}
{
"animal" : "giraffe",
"keys" : [
"name",
"height",
"weight",
"type",
"giraffeSpecific1",
"giraffeSpecific2"
]
}
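On newer servers the same result can likely be reached without map-reduce. The sketch below assumes MongoDB 3.4.4 or later (for $objectToArray); keysByTypePipeline and keysByType are names invented here, and the function is just a plain-JavaScript mirror of the pipeline so the logic can be checked without a server.

```javascript
// Hypothetical pipeline: distinct field names per animal type.
const keysByTypePipeline = [
  // Turn each document into an array of { k: <field name>, v: <value> } pairs.
  { $project: { type: 1, fields: { $objectToArray: "$$ROOT" } } },
  { $unwind: "$fields" },
  { $match: { "fields.k": { $ne: "_id" } } },
  // Collect the distinct field names per type (order is not guaranteed).
  { $group: { _id: "$type", keys: { $addToSet: "$fields.k" } } }
];
// In the shell: db.MyZoo.aggregate(keysByTypePipeline)

// Plain-JavaScript mirror of the pipeline, for illustration only.
function keysByType(docs) {
  const sets = {};
  for (const doc of docs) {
    if (!sets[doc.type]) sets[doc.type] = new Set();
    for (const key of Object.keys(doc)) {
      if (key !== "_id") sets[doc.type].add(key);
    }
  }
  const result = {};
  for (const type of Object.keys(sets)) result[type] = [...sets[type]];
  return result;
}

// Shortened version of the zoo collection from the question.
const zoo = [
  { name: "Zebra1", height: "300", weight: "900", type: "zebra", zebraSpecific1: "somevalue" },
  { name: "Lion1", height: "325", weight: "1200", type: "lion" },
  { name: "Zebra2", height: "500", weight: "2100", type: "zebra", zebraSpecific2: "somevalue" }
];

const zooKeys = keysByType(zoo);
console.log(zooKeys.zebra);
```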
I am trying to retrieve information on how many attempts a user takes to solve a particular problem as a JSON from a mongodb database. If there are multiple attempts on the same problem, I would only like to pull out the last entry - for instance, right now, if I do a db.proficiencies.find() - I will pull out entries A, B, C, and D but I would like to only pull out entries B and D (latest entries for the problems maze and circle respectively).
Is there an easy way to do so?
Entry A
{
"problem": "maze",
"courseLesson": "elementary_one, 1",
"studentId": "51ed51d0fcb4cc3696000001",
"studentName": "Sarah",
"_id": "51ed51defcb4cc3696000011",
"__v": 0,
"date": "2013-07-22T15:38:06.259Z",
"numberOfAttemptsBeforeSolved": 1
}
Entry B
{
"problem": "maze",
"courseLesson": "elementary_one, 1",
"studentId": "51ed51d0fcb4cc3696000001",
"studentName": "Sarah",
"_id": "51ed51defcb4cc3696000011",
"__v": 0,
"date": "2013-07-27T15:38:06.259Z",
"numberOfAttemptsBeforeSolved": 1
}
Entry C
{
"problem": "circle",
"courseLesson": "elementary_one, 1",
"studentId": "51ed51d0fcb4cc3696000001",
"studentName": "Sarah",
"_id": "51ed51defcb4cc3696000011",
"__v": 0,
"date": "2013-07-22T15:38:06.259Z",
"numberOfAttemptsBeforeSolved": 2
}
Entry D
{
"problem": "circle",
"courseLesson": "elementary_one, 1",
"studentId": "51ed51d0fcb4cc3696000001",
"studentName": "Sarah",
"_id": "51ed51defcb4cc3696000011",
"__v": 0,
"date": "2013-07-27T15:38:06.259Z",
"numberOfAttemptsBeforeSolved": 4
}
var ProficiencySchema = new Schema({
problem: String
, numberOfAttemptsBeforeSolved: {type: Number, default: 0}
//refers to which lesson, e.g. elementary_one, 2 refers to lesson 2 of elementary_one
, courseLesson: String
, date: {type: Date, default: Date.now}
, studentId: Schema.Types.ObjectId
, studentName: String
})
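The best way to do this would be to sort the results in descending date order (so the latest entry comes first) and then limit the result set to one document. Note that this returns a single document, so YOUR QUERY has to select one problem at a time (for example {problem: "maze"}). This would look something like: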
db.proficiencies.find(YOUR QUERY).sort({'date': -1}).limit(1)
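To get the latest entry for every problem in one round trip (B and D in the example), a grouping pipeline is one option. This is a sketch rather than part of the answer above: latestPerProblemPipeline and latestPerProblem are hypothetical names, and grouping on studentId plus problem is an assumption about what counts as the same problem.

```javascript
// Hypothetical pipeline: newest entry per student and problem.
const latestPerProblemPipeline = [
  { $sort: { date: -1 } },
  { $group: {
      _id: { studentId: "$studentId", problem: "$problem" },
      latest: { $first: "$$ROOT" } // newest entry, because of the sort above
  } }
];
// In the shell: db.proficiencies.aggregate(latestPerProblemPipeline)

// Plain-JavaScript mirror of the pipeline, for illustration only.
function latestPerProblem(entries) {
  const newest = new Map();
  for (const entry of entries) {
    const key = entry.studentId + "|" + entry.problem;
    const current = newest.get(key);
    if (!current || new Date(entry.date) > new Date(current.date)) {
      newest.set(key, entry);
    }
  }
  return [...newest.values()];
}

// Entries A to D from the question, reduced to the relevant fields.
const studentId = "51ed51d0fcb4cc3696000001";
const entries = [
  { problem: "maze", studentId, date: "2013-07-22T15:38:06.259Z", numberOfAttemptsBeforeSolved: 1 },
  { problem: "maze", studentId, date: "2013-07-27T15:38:06.259Z", numberOfAttemptsBeforeSolved: 1 },
  { problem: "circle", studentId, date: "2013-07-22T15:38:06.259Z", numberOfAttemptsBeforeSolved: 2 },
  { problem: "circle", studentId, date: "2013-07-27T15:38:06.259Z", numberOfAttemptsBeforeSolved: 4 }
];

const latestEntries = latestPerProblem(entries);
console.log(latestEntries.map(e => e.problem + " " + e.date));
```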
The document is like below.
{
"title": "Book1",
"dailyactiviescores":[
{
"date": 2013-06-05,
"score": 10,
},
{
"date": 2013-06-06,
"score": 21,
},
]
}
The daily active score is intended to increase each time the book is opened by a reader. The first solution that comes to mind is to use the positional "$" operator to check whether the target date already has a score, and to handle the case where it does not.
err = bookCollection.Update(
{"title":"Book1", "dailyactiviescores.date": 2013-06-06},
{"$inc":{"dailyactiviescores.$.score": 1}})
if err == ErrNotFound {
bookCollection.Update({"title":"Book1"}, {"$push":...})
}
But I cannot help wondering whether there is any way to return the index of an item inside an array. If so, I could use one query to do the job rather than two, like this:
index = bookCollection.Find(
{"title":"Book1", "dailyactiviescores.date": 2013-06-06}).Select({"$index"})
if index != -1 {
incTarget = FormatString("dailyactiviescores.%d.score", index)
bookCollection.Update(..., {"$inc": {incTarget: 1}})
} else {
//push here
}
Incrementing a field that's not present isn't the issue as doing $inc:1 on it will just create it and set it to 1 post-increment. The issue is when you don't have an array item corresponding to the date you want to increment.
There are several possible solutions here (that don't involve multiple steps to increment).
One is to pre-create all the dates in the array elements with scores:0 like so:
{
"title": "Book1",
"dailyactiviescores":[
{
"date": 2013-06-01,
"score": 0,
},
{
"date": 2013-06-02,
"score": 0,
},
{
"date": 2013-06-03,
"score": 0,
},
{
"date": 2013-06-04,
"score": 0,
},
{
"date": 2013-06-05,
"score": 0,
},
{
"date": 2013-06-06,
"score": 0
}, { etc ... }
]
}
But how far into the future to go? So one option here is to "bucket" - for example, have an activities document "per month" and before the start of a month have a job that creates the new documents for next month. Slightly yucky. But it'll work.
Other options involve slight changes in schema.
You can use a collection with book, date, activity_scores. Then you can use a simple upsert to increment a score:
db.books.update({title:"Book1", date:"2013-06-02"}, {$inc:{score:1}}, {upsert:true})
This will increment the score or insert new record with score:1 for this book and date and your collection will look like this:
{
"title": "Book1",
"date": 2013-06-01,
"score": 10,
},
{
"title": "Book1",
"date": 2013-06-02,
"score": 1,
}, ...
Depending on how much you simplified your example from your real use case, this might work well.
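To make the upsert behaviour concrete, here is a small in-memory imitation of it in plain JavaScript. It is not MongoDB code: incrementDailyScore and the scores array are stand-ins invented for illustration, mimicking an update with $inc and upsert:true on a collection with one document per book and date.

```javascript
// Stand-in for the collection: one { title, date, score } object per book and day.
function incrementDailyScore(scores, title, date) {
  const existing = scores.find(s => s.title === title && s.date === date);
  if (existing) {
    // A matching document exists: $inc adds 1 to its score.
    existing.score += 1;
  } else {
    // No match: the upsert inserts the query fields and $inc sets score to 1.
    scores.push({ title, date, score: 1 });
  }
  return scores;
}

const scores = [{ title: "Book1", date: "2013-06-01", score: 10 }];
incrementDailyScore(scores, "Book1", "2013-06-02"); // inserts with score 1
incrementDailyScore(scores, "Book1", "2013-06-02"); // increments to 2
incrementDailyScore(scores, "Book1", "2013-06-01"); // increments to 11
console.log(scores);
```

In MongoDB itself it is probably worth adding a unique index on title and date, since concurrent upserts can otherwise insert duplicate documents for the same book and day.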
Another option is to keep the scores embedded in the book document but switch to using the date string as a key that you increment:
Schema:
{
"title": "Book1",
"dailyactiviescores":{
"2013-06-01":10,
"2013-06-02":8
}
}
Note it's now a subdocument and not an array and you can do:
db.books.update({title:"Book1"}, {$inc:{"dailyactiviescores.2013-06-03":1}})
and it will add a new date into the subdocument and increment it resulting in:
{
"title": "Book1",
"dailyactiviescores":{
"2013-06-01":10,
"2013-06-02":8,
"2013-06-03":1
}
}
Note it's now harder to "add-up" the scores for the book so you can atomically also update a "subtotal" in the same update statement whether it's for all time or just for the month.
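For instance, the daily counter and a running total could be bumped in one atomic update. In this sketch buildScoreIncrement is a hypothetical helper and totalscore is an invented field name, not something from the question:

```javascript
// Hypothetical helper: one $inc that touches both the day and the running total.
function buildScoreIncrement(day) {
  return {
    $inc: {
      // Per-day counter inside the subdocument; $inc creates it if missing.
      ["dailyactiviescores." + day]: 1,
      // Running total for the book (invented field name).
      totalscore: 1
    }
  };
}

const update = buildScoreIncrement("2013-06-03");
// In the shell: db.books.update({ title: "Book1" }, update)
console.log(JSON.stringify(update));
// prints: {"$inc":{"dailyactiviescores.2013-06-03":1,"totalscore":1}}
```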
But here it's once again problematic to keep adding days to this subdocument - what happens when you're still around in a few years and these book documents grow hugely?
I suspect that unless you will only be keeping activity scores for the last N days (which you can do with capped array feature in 2.4) it will be simpler to have a separate collection for book-activity-score tracking where each book-day is a separate document than to embed the scores for each day into the book in a collection of books.
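The capped array mentioned above refers to $push combined with the $each and $slice modifiers. A minimal sketch of such an update document follows, where buildCappedPush is a hypothetical helper and 30 is an arbitrary window size. It only appends a new day's entry and trims the array; incrementing an existing day would still need the positional update from the question.

```javascript
// Hypothetical helper: append one entry and keep only the last keepLast elements.
function buildCappedPush(entry, keepLast) {
  return {
    $push: {
      dailyactiviescores: {
        $each: [entry],     // $slice has to be used together with $each
        $slice: -keepLast   // a negative value keeps the last N elements
      }
    }
  };
}

const cappedUpdate = buildCappedPush({ date: "2013-06-07", score: 1 }, 30);
// In the shell: db.books.update({ title: "Book1" }, cappedUpdate)
console.log(JSON.stringify(cappedUpdate));
```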
According to the docs:
The $inc operator increments a value of a field by a specified amount.
If the field does not exist, $inc sets the field to the specified
amount.
So, if the array item has no score field, $inc will set it to 1 in your case. Given this document:
{
"title": "Book1",
"dailyactiviescores":[
{
"date": 2013-06-05,
"score": 10,
},
{
"date": 2013-06-06,
},
]
}
bookCollection.Update(
{"title":"Book1", "dailyactiviescores.date": 2013-06-06},
{"$inc":{"dailyactiviescores.$.score": 1}})
will result in:
{
"title": "Book1",
"dailyactiviescores":[
{
"date": 2013-06-05,
"score": 10,
},
{
"date": 2013-06-06,
"score": 1
},
]
}
Hope that helps.