SQL Server ROW_NUMBER() with PARTITION BY in MongoDB for returning a subset of rows - mongodb

How do I write the query below using the MongoDB C# driver?
SELECT SubSet.*
FROM ( SELECT T.ProductName ,
              T.Price ,
              ROW_NUMBER() OVER ( PARTITION BY T.ProductName ORDER BY T.ProductName ) AS ProductRepeat
       FROM myTable T
     ) SubSet
WHERE SubSet.ProductRepeat = 1
What I am trying to achieve is
Collection
ProductName|Price|SKU
Cap|10|AB123
Bag|5|ED567
Cap|20|CD345
Cap|5|EC123
The expected result is
ProductName|Price|SKU
Cap|10|AB123
Bag|5|ED567
Here is one attempt (please don't read too much into the object and field names)
public List<ProductOL> Search(ProductOL obj, bool topOneOnly)
{
    List<ProductOL> products = new List<ProductOL>();
    var database = MyMongoClient.Instance.OpenToRead(dbName: ConfigurationManager.AppSettings["MongoDBDefaultDB"]);
    var collection = database.GetCollection<RawBsonDocument>("Products");
    List<IMongoQuery> build = new List<IMongoQuery>();
    if (!string.IsNullOrEmpty(obj.ProductName))
    {
        var productNameQuery = Query.Matches("ProductName", new BsonRegularExpression(obj.ProductName, "i"));
        build.Add(productNameQuery);
    }
    if (!string.IsNullOrEmpty(obj.BrandName))
    {
        var brandNameQuery = Query.Matches("BrandName", new BsonRegularExpression(obj.BrandName, "i"));
        build.Add(brandNameQuery);
    }
    var fullQuery = Query.And(build.ToArray());
    products = collection.FindAs<ProductOL>(fullQuery).SetSortOrder(SortBy.Ascending("ProductName")).ToList();
    if (topOneOnly)
    {
        // Keep only the first product seen for each ProductName
        var tmpProducts = new List<ProductOL>();
        foreach (var item in products)
        {
            if (!tmpProducts.Any(x => x.ProductName == item.ProductName))
                tmpProducts.Add(item);
        }
        products = tmpProducts;
    }
    return products;
}

My mongo query works and gives me the right results, but it is not efficient when I am dealing with huge data, so I was wondering if MongoDB has any concept like SQL Server's ROW_NUMBER() and partitioning.
If your query returns the expected results but isn't efficient, you should look into index usage with explain(). Given your query generation code includes conditional clauses, it seems likely you will need multiple indexes to efficiently cover common variations.
I'm not sure how the C# code you've provided relates to the original SQL query, as they seem to be entirely different. I'm also not clear how grouping is expected to help your query performance, aside from limiting the results returned.
Equivalent of the SQL query
There is no direct equivalent of ROW_NUMBER() .. PARTITION BY grouping in MongoDB, but you should be able to work out the desired result using either the Aggregation Framework (fastest) or Map/Reduce (slower but more functionality). The MongoDB manual includes an Aggregation Commands Comparison as well as usage examples.
As an exercise in translation, I'll focus on your SQL query which is pulling out the first product match by ProductName:
SELECT SubSet.*
FROM ( SELECT T.ProductName ,
              T.Price ,
              ROW_NUMBER() OVER ( PARTITION BY T.ProductName ORDER BY T.ProductName ) AS ProductRepeat
       FROM myTable T
     ) SubSet
WHERE SubSet.ProductRepeat = 1
Setting up the test data you provided:
db.myTable.insert([
    { ProductName: 'Cap', Price: 10, SKU: 'AB123' },
    { ProductName: 'Bag', Price: 5, SKU: 'ED567' },
    { ProductName: 'Cap', Price: 20, SKU: 'CD345' },
    { ProductName: 'Cap', Price: 5, SKU: 'EC123' },
])
Here's an aggregation query in the mongo shell which will find the first match per group (ordered by ProductName). It should be straightforward to translate that aggregation query to the C# driver using the MongoCollection.Aggregate() method.
I've included comments with the rough equivalent SQL fragment in your original query.
db.myTable.aggregate(
    // Apply a sort order so the $first product is somewhat predictable
    // ("ORDER BY T.ProductName")
    { $sort: {
        ProductName: 1
        // Should really have an additional sort by Price or SKU (otherwise order may change)
    }},

    // Group by Product Name
    // ("PARTITION BY T.ProductName")
    { $group: {
        _id: "$ProductName",
        // Find first matching product details per group (can use $$CURRENT in MongoDB 2.6 or list specific fields)
        // "SELECT SubSet.* ... WHERE SubSet.ProductRepeat = 1"
        Price: { $first: "$Price" },
        SKU: { $first: "$SKU" },
    }},

    // Rename _id to match expected results
    { $project: {
        _id: 0,
        ProductName: "$_id",
        Price: 1,
        SKU: 1,
    }}
)
Results given the test data appear to be what you were looking for:
{ "Price" : 10, "SKU" : "AB123", "ProductName" : "Cap" }
{ "Price" : 5, "SKU" : "ED567", "ProductName" : "Bag" }
Notes:
This aggregation query uses the $first operator, so if you want to find the second or third product per grouping you'd need a different approach (e.g. $group and then take the subset of results needed in your application code)
If you want predictable results for finding the first item in a $group there should be more specific sort criteria than ProductName (for example, sorting by ProductName & Price or ProductName & SKU). Otherwise the order of results may change in future as documents are added or updated.
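For readers without a database at hand, the $sort plus $group/$first semantics can be sketched in plain JavaScript on the question's sample data. The tie-breaking sort on SKU is an assumption, following the note above about needing more specific sort criteria:

```javascript
// Sample documents from the question.
var docs = [
  { ProductName: 'Cap', Price: 10, SKU: 'AB123' },
  { ProductName: 'Bag', Price: 5,  SKU: 'ED567' },
  { ProductName: 'Cap', Price: 20, SKU: 'CD345' },
  { ProductName: 'Cap', Price: 5,  SKU: 'EC123' }
];

// $sort: order by ProductName, then SKU as a deterministic tie-breaker.
var sorted = docs.slice().sort(function (a, b) {
  return a.ProductName.localeCompare(b.ProductName) ||
         a.SKU.localeCompare(b.SKU);
});

// $group with $first: keep the first document seen per ProductName.
var firstPerGroup = {};
sorted.forEach(function (doc) {
  if (!(doc.ProductName in firstPerGroup)) {
    firstPerGroup[doc.ProductName] = doc;
  }
});

var results = Object.keys(firstPerGroup).map(function (k) {
  return firstPerGroup[k];
});
console.log(results);
// [ { ProductName: 'Bag', Price: 5, SKU: 'ED567' },
//   { ProductName: 'Cap', Price: 10, SKU: 'AB123' } ]
```

The server-side pipeline does the same thing, but can use an index for the sort and never ships the duplicate rows to the client.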

Thanks to @Stennie; with the help of his answer I could come up with the C# aggregation code
var match = new BsonDocument
{
    {
        "$match",
        new BsonDocument
        {
            { "ProductName", new BsonRegularExpression("cap", "i") }
        }
    }
};
var group = new BsonDocument
{
    {
        "$group",
        new BsonDocument
        {
            { "_id", "$ProductName" },
            { "SKU", new BsonDocument { { "$first", "$SKU" } } }
        }
    }
};
var project = new BsonDocument
{
    {
        "$project",
        new BsonDocument
        {
            { "_id", 0 },
            { "ProductName", "$_id" },
            { "SKU", 1 }
        }
    }
};
var sort = new BsonDocument
{
    {
        "$sort",
        new BsonDocument
        {
            { "ProductName", 1 }
        }
    }
};
var pipeline = new[] { match, group, project, sort };
var aggResult = collection.Aggregate(pipeline);
var products = aggResult.ResultDocuments.Select(BsonSerializer.Deserialize<ProductOL>).ToList();
Using AggregateArgs:
AggregateArgs args = new AggregateArgs();
List<BsonDocument> pipeline = new List<BsonDocument>();
pipeline.Add(match);
pipeline.Add(group);
pipeline.Add(project);
pipeline.Add(sort);
args.Pipeline = pipeline;
var aggResult = collection.Aggregate(args);
products = aggResult.Select(BsonSerializer.Deserialize<ProductOL>).ToList();

Related

How to guarantee unique primary key with one update query

In my Movie schema, I have a field "release_date" which can contain nested subdocuments.
These subdocuments contain three fields:
country_code
date
details
I need to guarantee that the first two fields are unique (a composite primary key).
I first tried to set a unique index, but I finally realized that MongoDB does not support unique indexes on subdocuments:
the index is created, but validation does not trigger, and I can still add duplicates.
Then, I tried to modify my update function to prevent duplicates, as explained in this article (see Workarounds): http://joegornick.com/2012/10/25/mongodb-unique-indexes-on-single-embedded-documents/
$ne works well, but in my case I have a combination of two fields, so it's way more complicated...
$addToSet is nice, but not exactly what I am searching for, because the "details" field does not have to be unique.
I also tried plugins like mongoose-unique-validator, but they do not work with subdocuments...
I finally ended up with two queries: one to search for an existing primary key, and another to add a subdocument if the previous query returns no document.
insertReleaseDate: async (root, args) => {
    const { movieId, fields } = args
    // Searching for an existing primary key
    const document = await Movie.find(
        {
            _id: movieId,
            release_date: {
                $elemMatch: {
                    country_code: fields.country_code,
                    date: fields.date
                }
            }
        }
    )
    if (document.length > 0) {
        throw new Error('Duplicate error')
    }
    // Updating the document
    const response = await Movie.updateOne(
        { _id: movieId },
        { $push: { release_date: fields } }
    )
    return response
}
This code works fine, but I would have preferred to use only one query.
Any idea? I don't understand why it's so complicated, as this should be a common use case.
Thanks RichieK for your answer! It's working great.
Just take care to put the field name before $not, like this:
insertReleaseDate: async (root, args) => {
    const { movieId, fields } = args
    const response = await Movie.updateOne(
        {
            _id: movieId,
            release_date: {
                $not: {
                    $elemMatch: {
                        country_code: fields.country_code,
                        date: fields.date
                    }
                }
            }
        },
        { $push: { release_date: fields } }
    )
    return formatResponse(response, movieId)
}
Thanks a lot !
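The trick in that one-query version is that the filter only matches the movie when no array element already has the same (country_code, date) pair, so the $push either happens or silently matches nothing. That guard can be simulated in plain JavaScript (the pushIfUnique helper is hypothetical, for illustration only):

```javascript
// Sketch of the "release_date: { $not: { $elemMatch: ... } } then $push" semantics:
// push the new subdocument only if no existing element matches both key fields.
function pushIfUnique(releaseDates, fields) {
  var duplicate = releaseDates.some(function (rd) {
    return rd.country_code === fields.country_code && rd.date === fields.date;
  });
  if (duplicate) {
    return { matched: 0, pushed: false };  // filter matched no document; nothing modified
  }
  releaseDates.push(fields);
  return { matched: 1, pushed: true };
}

var movie = {
  release_date: [{ country_code: 'US', date: '2020-01-01', details: 'premiere' }]
};

var first = pushIfUnique(movie.release_date, { country_code: 'FR', date: '2020-02-01' });
var second = pushIfUnique(movie.release_date, { country_code: 'FR', date: '2020-02-01' });
console.log(first.pushed, second.pushed, movie.release_date.length);
// true false 2
```

In the real updateOne call, "matched: 0" shows up as modifiedCount of 0, which is how the caller can detect the duplicate without a second query.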

Meteor/Mongo - add/update element in sub array dynamically

So I have found quite a few related posts on SO about how to update a field in a sub-array, such as this one here.
What I want to achieve is basically the same thing, but updating a field in a sub-array dynamically, instead of hard-coding the field name in the query.
I also found how to do that directly in the main object, but I can't seem to do it in the sub-array.
Code to insert dynamically in sub-object:
_.each(data.data, function(val, key) {
    var obj = {};
    obj['general.' + key] = val;
    insert = 0 || (Documents.update(
        { _id: data._id },
        { $set: obj }
    ));
});
Here is the tree of what I am trying to do:
Documents: {
    _id: '123123',
    ...
    smallRoom: [
        {
            _id: '456456',
            name: 'name1',
            description: 'description1'
        },
        {
            ...
        }
    ]
}
Here is my code:
// insert a new object in smallRoom, with only the _id so far
var newID = new Mongo.ObjectID();
var createId = { _id: newID._str };
Documents.update({ _id: data._id }, { $push: { smallRooms: createId } })
And the part to insert the other fields:
_.each(data.data, function(val, key) {
    var obj = {};
    obj['simpleRoom.$' + key] = val;
    console.log(Documents.update(
        {
            _id: data._id,           // <<== the document id that I want to update
            smallRoom: {
                $elemMatch: {
                    _id: newID._str  // <<== the smallRoom id that I want to update
                }
            }
        },
        {
            $set: obj
        }
    ));
});
Ok, having said that, I understand I can insert the whole object straight away, rather than pushing every single field.
But I guess this question is really: how does it work if smallRoom had 50 fields and I wanted to update 3 arbitrary fields? (I would need the _.each loop, since I wouldn't know in advance which fields to update, and I would not want to replace the whole object.)
I'm not sure I 100% understand your question, but I think the answer to what you are asking is to use the $ symbol.
Example:
Documents.update(
    {
        _id: data._id, 'smallRoom._id': newID._str
    },
    {
        $set: { 'smallRoom.$.name': 'new name' }
    }
);
You are finding the document that matches _id: data._id, then finding the object in the smallRoom array that has an _id equal to newID._str. Then you use the $ sign to tell Mongo to update that object's name key.
Hope that helps
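The effect of the positional $ operator can be sketched in plain JavaScript: locate the one array element the query matched, then set the field on just that element, leaving its siblings untouched. Names follow the question; the helper itself is hypothetical:

```javascript
// Sketch of what { _id: ..., 'smallRoom._id': roomId } with
// { $set: { 'smallRoom.$.<field>': value } } does to the matched document.
function setInMatchedElement(doc, roomId, field, value) {
  var elem = doc.smallRoom.find(function (room) {
    return room._id === roomId;   // the array-element match from the query part
  });
  if (elem) elem[field] = value;  // "$" targets only the matched element
  return doc;
}

var doc = {
  _id: '123123',
  smallRoom: [
    { _id: '456456', name: 'name1', description: 'description1' },
    { _id: '789789', name: 'name2', description: 'description2' }
  ]
};

setInMatchedElement(doc, '456456', 'name', 'new name');
console.log(doc.smallRoom[0].name, doc.smallRoom[1].name);
// new name name2
```

For the dynamic case in the question, the field argument would come from the _.each key, i.e. the $set document would be built as obj['smallRoom.$.' + key] = val (note the dot after $).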

Meteor: Speeding up MongoDB Joins for Big Data?

I have two collections: Data and Users. In the Data collection, each document has an array of user IDs containing approximately 300 to 800 users.
I need to join in the countries of all of the users for each row in the Data collection, which hangs my web browser because too much data is being queried at once.
I query about 16 rows of the Data collection at a time, and there are 18833 users so far in the Users collection.
So far I have tried both a Meteor method and a transform() join on the Meteor collection, which is what's hanging my app.
Mongo Collection:
UserInfo = new Mongo.Collection("userInfo")
GlyphInfo = new Mongo.Collection("GlyphAllinOne", {
    transform: function(doc) {
        // map (not forEach) so the transformed array is actually returned
        doc.peopleInfo = doc.peopleInfo.map(function(person) {
            person.code3 = UserInfo.findOne({ userId: person.name }).code3;
            return person;
        })
        return doc;
    }
});
'code3' designates the user's country.
Publication:
Meteor.publish("glyphInfo", function (courseId) {
    this.unblock();
    var query = {};
    if (courseId) query.courseId = courseId;
    return [GlyphInfo.find(query), UserInfo.find({})];
})
Tested Server Method:
Meteor.methods({
    'glyph.countryDistribution': function(courseId) {
        var query = {};
        if (courseId) query.courseId = courseId;
        var glyphs = _.map(_.pluck(GlyphInfo.find(query).fetch(), 'peopleInfo'), function(glyph) {
            _.map(glyph, function(user) {
                var data = Users.findOne({ userId: user.name });
                if (data) {
                    user.country = data.code3;
                    console.log(user.country)
                    return user;
                }
            });
            return glyph;
        });
        return glyphs;
    }
});
There is the option of preprocessing my collection so that countries would already be included; however, I'm not allowed to modify these collections. I presume that doing this join on server startup and then exposing it through an array via a Meteor method might stall the server's startup for far too long, though I'm not sure.
Does anyone have any ideas on how to speed up this query?
EDIT: I tried MongoDB aggregation commands as well, and they appear to be extremely slow when run through Meteor: the query took 4 minutes, compared to 1 second on a native MongoDB client.
var codes = GlyphInfo.aggregate([
    { $unwind: "$peopleInfo" },
    { $lookup: {
        from: "users",
        localField: "peopleInfo.name",
        foreignField: "userId",
        as: "details"
    }},
    { $unwind: "$details" },
    { $project: { "peopleInfo.Count": 1, "details.code3": 1 } }
])
I would approach this problem slightly differently, using reywood:publish-composite.
On the server, I would publish the glyph info using publish-composite, including the related users and their country field in the publication.
I would then join in the country on the client wherever I needed to display the country name along with the glyphInfo object.
Publication:
Meteor.publishComposite('glyphInfo', function(courseId) {
    this.unblock();
    return {
        find: function() {
            var query = {};
            if (courseId) query.courseId = courseId;
            return GlyphInfo.find(query);
        },
        children: [
            {
                find: function(glyph) {
                    var nameArray = [];
                    glyph.peopleInfo.forEach(function(person) {
                        nameArray.push(person.name);
                    });
                    return UserInfo.find({ userId: { $in: nameArray } });
                }
            }
        ]
    };
});
I solved the problem by creating one big MongoDB aggregation call, and the biggest factor in reducing latency was indexing the queried fields in the database.
After carefully adding indexes to my database of over 4.6 million entries, the query took 0.3 seconds in Robomongo and 1.4 seconds on Meteor, including sending the data to the client.
Here is the aggregation code for those who'd like to see it:
Meteor.methods({
    'course.countryDistribution': function (courseId, videoId) {
        var query = {};
        if (courseId) query.courseId = courseId;
        var data = GlyphInfo.aggregate([
            { $unwind: "$peopleInfo" },
            { $lookup: {
                from: "users",
                localField: "peopleInfo.name",
                foreignField: "userId",
                as: "details"
            }},
            { $unwind: "$details" },
            { $project: { "peopleInfo.Count": 1, "details.code3": 1 } },
            { $group: { _id: "$details.code3", count: { $sum: "$peopleInfo.Count" } } }
        ])
        return data;
    }
});
If anyone else is tackling similar issues, feel free to contact me. Thanks everyone for your support!
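The shape of that pipeline can be checked offline in plain JavaScript; the sketch below mimics $unwind, $lookup, and $group on tiny in-memory arrays (field names follow the question, but the data itself is invented for illustration):

```javascript
// Invented sample data, for illustration only.
var glyphs = [
  { courseId: 'c1', peopleInfo: [{ name: 'u1', Count: 2 }, { name: 'u2', Count: 3 }] },
  { courseId: 'c1', peopleInfo: [{ name: 'u1', Count: 1 }] }
];
var users = [
  { userId: 'u1', code3: 'USA' },
  { userId: 'u2', code3: 'FRA' }
];

// $unwind peopleInfo, $lookup the user's code3, then $group summing Count.
var counts = {};
glyphs.forEach(function (glyph) {
  glyph.peopleInfo.forEach(function (person) {            // $unwind
    var user = users.find(function (u) {                  // $lookup on userId
      return u.userId === person.name;
    });
    if (!user) return;                                    // $unwind drops empty "details"
    counts[user.code3] = (counts[user.code3] || 0) + person.Count; // $group + $sum
  });
});
console.log(counts);
// { USA: 3, FRA: 3 }
```

The in-memory version is O(glyphs × users) per lookup without an index, which mirrors why indexing the userId field made such a difference in the server-side aggregation.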

Embedding fields in all mongodb documents

I have a collection with documents that follow this structure:
child: {
    id: int,
    name: string,
    age: int,
    dob: date,
    school: string,
    class: string
}
I would like to embed certain fields, into something like this:
child: {
    id: int,
    personalInfo: {
        name: string,
        age: int,
        dob: date
    },
    educationInfo: {
        school: string,
        class: string
    }
}
How would one go about doing this in code? I am new to MongoDB, so I apologize if my syntax is incorrect. All of the fields have one-to-one relationships with the child (i.e. one child has one id, one name, one age, one school, etc.), so I'm also wondering if embedding is even necessary.
Try using $set to set the new fields personalInfo and educationInfo, with $unset to remove the old fields name, age, etc. Before doing so, it is better to check that all of those fields exist, using $exists. Here is sample code:
> var personFields = [ "name", "age", "dob" ];
> var educationFields = [ "school", "class" ];
> var query = {};
> personFields.forEach(function(k){ query[k] = { $exists: 1 } });
> educationFields.forEach(function(k){ query[k] = { $exists: 1 } });
> db.collection.find(query).forEach(function(doc) {
      var personalInfo = {};
      var educationInfo = {};
      for (var k in doc) {
          if (personFields.indexOf(k) !== -1) {
              personalInfo[k] = doc[k];
          } else if (educationFields.indexOf(k) !== -1) {
              educationInfo[k] = doc[k];
          }
      }
      db.collection.update({ _id: doc._id },
          { $set: {
                personalInfo: personalInfo,
                educationInfo: educationInfo },
            $unset: { 'name': '',
                      'age': '',
                      'dob': '',
                      'school': '',
                      'class': '' } });
  })
It's OK to embed them; that's what document DBs are for. So if you need a migration, you'll basically use MongoDB's update with $set and $unset.
See more here: https://docs.mongodb.org/manual/reference/method/db.collection.update/
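The per-document restructuring step in the shell script above can be sketched (and unit-tested) as a pure function in plain JavaScript; buildUpdate is a hypothetical helper that produces the { $set, $unset } pair the migration applies to each matched document:

```javascript
var personFields = ['name', 'age', 'dob'];
var educationFields = ['school', 'class'];

// Build the { $set, $unset } update document for one child document.
function buildUpdate(doc) {
  var personalInfo = {}, educationInfo = {}, unset = {};
  Object.keys(doc).forEach(function (k) {
    if (personFields.indexOf(k) !== -1) {
      personalInfo[k] = doc[k];   // moved under personalInfo
      unset[k] = '';              // and removed from the top level
    } else if (educationFields.indexOf(k) !== -1) {
      educationInfo[k] = doc[k];  // moved under educationInfo
      unset[k] = '';
    }
  });
  return {
    $set: { personalInfo: personalInfo, educationInfo: educationInfo },
    $unset: unset
  };
}

var update = buildUpdate({
  _id: 1, id: 7, name: 'Ann', age: 9, dob: '2015-01-01', school: 'X', class: '3A'
});
console.log(update.$set.personalInfo, update.$set.educationInfo);
// { name: 'Ann', age: 9, dob: '2015-01-01' } { school: 'X', class: '3A' }
```

Keeping this logic in a pure function makes it easy to verify the mapping on a handful of sample documents before running the migration against the real collection.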

How to update a nested object (i.e., a document field of object type) in a single document in MongoDB

I have a doc collection with an object-type field named price (see below). I want to update/insert into that field by adding new key-value pairs to it.
Suppose I have this collection (in the db):
[
    {
        _id: 1,
        price: {
            amazon: 102.1,
            apple: 500
        }
    },
    ....
    ....
];
Now I want to write a query which either updates a key in price, or inserts it if it does not exist in price.
Let's suppose this input data to update/insert with:
var key1 = 'ebay', value1 = 300; // will insert
var key2 = 'amazon', value2 = 100; // will update
Assume the doc has _id: 1 for now.
Is there something like the $addToSet operator for this? ($addToSet only works for arrays, and I want to work within an object.)
Expected output:
[
    {
        _id: 1,
        price: {
            amazon: 100, // updated
            apple: 500,
            ebay: 300    // inserted
        }
    },
    ....
    ....
];
How can I achieve this?
Thanks.
You can construct the update document dynamically, using dot notation with the $set operator to do the update correctly. Using your example above, you'd run the following update operation:
db.collection.update(
    { "_id": 1 },
    {
        "$set": { "price.ebay": 300, "price.amazon": 100 }
    }
)
So, given the input data, you would want to construct an update document like { "price.ebay": 300, "price.amazon": 100 }.
With the inputs as you have described
var key1 = 'ebay', value1 = 300; // will insert
var key2 = 'amazon', value2 = 100; // will update
Construct the update object:
var query = { "_id": 1 },
    update = {};
update["price." + key1] = value1;
update["price." + key2] = value2;
db.collection.update(query, { "$set": update });
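To see why dot notation merges new keys into price rather than replacing the whole subdocument, here is a plain-JavaScript sketch of applying such a $set document to an object (the applySet helper is hypothetical, for illustration only; the server does this internally):

```javascript
// Apply a { "a.b": value } style $set document to a plain object.
function applySet(doc, setDoc) {
  Object.keys(setDoc).forEach(function (path) {
    var parts = path.split('.');
    var target = doc;
    for (var i = 0; i < parts.length - 1; i++) {
      if (typeof target[parts[i]] !== 'object' || target[parts[i]] === null) {
        target[parts[i]] = {};   // create missing intermediate objects
      }
      target = target[parts[i]];
    }
    target[parts[parts.length - 1]] = setDoc[path];  // set only the leaf key
  });
  return doc;
}

var doc = { _id: 1, price: { amazon: 102.1, apple: 500 } };
applySet(doc, { 'price.ebay': 300, 'price.amazon': 100 });
console.log(doc.price);
// { amazon: 100, apple: 500, ebay: 300 }
```

Had the update used { $set: { price: { ebay: 300, amazon: 100 } } } instead, the apple key would have been lost, which is exactly what the dot notation avoids.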