Meteor: Speeding up MongoDB joins for big data

I have two collections: Data and Users. Each document in the Data collection has an array of user IDs containing roughly 300 to 800 users.
For each row of the Data collection I need to join in the countries of all of its users, which hangs my web browser because too much data is being queried at once.
I query about 16 rows of the Data collection at a time, and there are 18,833 users so far in the Users collection.
So far I have tried both a Meteor method and a transform() join on the Meteor collection; both hang my app.
Mongo Collection:
UserInfo = new Mongo.Collection("userInfo");
GlyphInfo = new Mongo.Collection("GlyphAllinOne", {
  transform: function(doc) {
    // map, not forEach: forEach returns undefined and would wipe the array
    doc.peopleInfo = doc.peopleInfo.map(function(person) {
      var user = UserInfo.findOne({userId: person.name});
      person.code3 = user ? user.code3 : null;
      return person;
    });
    return doc;
  }
});
The 'code3' field designates the user's country.
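Calling findOne() once per person inside the transform is an N+1 query pattern; one way to cut the cost is a single $in query per document, joined through a Map. A minimal plain-JavaScript sketch, with the collection access stubbed out (fetchUsersByIds is a hypothetical stand-in for UserInfo.find({userId: {$in: ids}}).fetch()):

```javascript
// Sketch: batch the user lookups instead of one findOne() per person.
// Data shapes (peopleInfo, userId, code3) are taken from the question.
function joinCountries(doc, fetchUsersByIds) {
  const ids = doc.peopleInfo.map((p) => p.name);
  const users = fetchUsersByIds(ids); // one query instead of N
  const codeById = new Map(users.map((u) => [u.userId, u.code3]));
  doc.peopleInfo = doc.peopleInfo.map((p) =>
    Object.assign({}, p, { code3: codeById.has(p.name) ? codeById.get(p.name) : null })
  );
  return doc;
}

// Stand-in data to exercise the join
const fakeUsers = [
  { userId: "a", code3: "USA" },
  { userId: "b", code3: "DEU" },
];
const fetchUsersByIds = (ids) => fakeUsers.filter((u) => ids.includes(u.userId));
const joined = joinCountries({ peopleInfo: [{ name: "a" }, { name: "c" }] }, fetchUsersByIds);
```

Unknown users come back as null rather than throwing, which also fixes the crash the original transform would hit when findOne() returns undefined.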
Publication:
Meteor.publish("glyphInfo", function (courseId) {
this.unblock();
var query = {};
if (courseId) query.courseId = courseId;
return [GlyphInfo.find(query), UserInfo.find({})];
})
Tested Server Method:
Meteor.methods({
  'glyph.countryDistribution': function(courseId) {
    var query = {};
    if (courseId) query.courseId = courseId;
    var glyphs = _.map(_.pluck(GlyphInfo.find(query).fetch(), 'peopleInfo'), function(glyph) {
      // return the mapped array, otherwise the join result is discarded
      return _.map(glyph, function(user) {
        var data = UserInfo.findOne({userId: user.name});
        user.country = data ? data.code3 : null;
        return user;
      });
    });
    return glyphs;
  }
});
Collection Data:
I could preprocess the collection so that countries are already included, but I'm not allowed to modify these collections. I suspect that performing this join at server startup and then exposing the result as an array through a Meteor method would stall server startup for far too long, though I'm not sure.
Does anyone have any ideas on how to speed up this query?
EDIT: I tried MongoDB aggregation commands as well, and they appear extremely slow when run through Meteor: the query took 4 minutes, compared to about 1 second in a native MongoDB client.
var codes = GlyphInfo.aggregate([
{$unwind: "$peopleInfo"},
{$lookup: {
from: "users",
localField: "peopleInfo.name",
foreignField: "userId",
as: "details"
}
},
{$unwind: "$details"},
{$project: {"peopleInfo.Count": 1, "details.code3": 1}}
])

I would approach this problem slightly differently, using reywood:publish-composite:
On the server, publish the glyph info with publishComposite, including each related user and their country field in the publication.
On the client, join up the country wherever you need to display the country name along with the glyphInfo object.
Publication:
Meteor.publishComposite('glyphInfo', function(courseId) {
  this.unblock();
  return {
    find: function() {
      var query = {};
      if (courseId) query.courseId = courseId;
      return GlyphInfo.find(query);
    },
    children: [
      {
        find: function(glyph) {
          // the question's documents store people under peopleInfo
          var nameArray = glyph.peopleInfo.map(function(person) {
            return person.name;
          });
          return UserInfo.find({ userId: { $in: nameArray } });
        }
      }
    ]
  };
});
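The client-side join mentioned above can then be a small helper used wherever the country is rendered. A sketch in plain JavaScript, with findUser standing in for UserInfo.findOne({userId: name}) on the client (field names assumed from the question):

```javascript
// Sketch of the client-side join: with both publications on the client,
// attach code3 at display time. findUser is a stub for a minimongo lookup.
function withCountries(glyph, findUser) {
  return glyph.peopleInfo.map((person) => {
    const user = findUser(person.name);
    return { name: person.name, code3: user ? user.code3 : null };
  });
}

// Stand-in data to exercise the helper
const users = [{ userId: "p1", code3: "FRA" }];
const findUser = (id) => users.find((u) => u.userId === id);
const result = withCountries({ peopleInfo: [{ name: "p1" }, { name: "p2" }] }, findUser);
```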

I solved the problem by building one large MongoDB aggregation call; the biggest factor in reducing latency was indexing the fields used in the join.
After carefully adding those indexes to my database of over 4.6 million entries, the query took 0.3 seconds in Robomongo, and 1.4 seconds on Meteor including sending the data to the client.
Here is the aggregation code for those who'd like to see it:
Meteor.methods({
'course.countryDistribution': function (courseId, videoId) {
var query = {};
if (courseId) query.courseId = courseId;
var data = GlyphInfo.aggregate([
{$unwind: "$peopleInfo"},
{$lookup: {
from: "users",
localField: "peopleInfo.name",
foreignField: "userId",
as: "details"
}
},
{$unwind: "$details"},
{$project: {"peopleInfo.Count": 1, "details.code3": 1}},
{$group: {_id: "$details.code3", count: {$sum: "$peopleInfo.Count"}}}
])
return data;
}
});
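The pipeline unwinds each glyph's peopleInfo, joins users on name to userId, and sums Count per country code. Its semantics can be checked with a small plain-JavaScript model (sample data below is hypothetical):

```javascript
// Plain-JS model of the pipeline: $unwind peopleInfo, $lookup users by
// name -> userId, then $group by code3 summing peopleInfo.Count.
const glyphs = [
  { peopleInfo: [{ name: "u1", Count: 3 }, { name: "u2", Count: 1 }] },
  { peopleInfo: [{ name: "u1", Count: 2 }] },
];
const users = [{ userId: "u1", code3: "USA" }, { userId: "u2", code3: "DEU" }];

const byId = new Map(users.map((u) => [u.userId, u.code3]));
const totals = {};
for (const g of glyphs) {
  for (const p of g.peopleInfo) {      // $unwind peopleInfo
    const code3 = byId.get(p.name);    // $lookup into details
    if (code3 === undefined) continue; // $unwind details drops non-matches
    totals[code3] = (totals[code3] || 0) + p.Count; // $group with $sum
  }
}
// totals -> { USA: 5, DEU: 1 }
```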
If anyone else is tackling similar issues, feel free to contact me. Thanks everyone for your support!

Related

How to guarantee unique primary key with one update query

In my Movie schema, I have a field "release_date" which can contain nested subdocuments.
Each of these subdocuments contains three fields:
country_code
date
details
I need to guarantee that the first two fields are unique together (a compound primary key).
I first tried to set a unique index, but I finally realized that MongoDB does not support unique indexes on subdocuments: the index is created, but validation never triggers and I can still add duplicates.
Then I tried to modify my update function to prevent duplicates, as explained in this article (see Workarounds): http://joegornick.com/2012/10/25/mongodb-unique-indexes-on-single-embedded-documents/
$ne works well, but in my case I have a combination of two fields, which is way more complicated...
$addToSet is nice, but not exactly what I'm looking for, because the "details" field may not be unique.
I also tried plugins like mongoose-unique-validator, but it does not work with subdocuments...
I finally ended up with two queries: one to search for an existing subdocument, and another to add the subdocument if the first query returns no document.
insertReleaseDate: async(root, args) => {
const { movieId, fields } = args
// Searching for an existing primary key
const document = await Movie.find(
{
_id: movieId,
release_date: {
$elemMatch: {
country_code: fields.country_code,
date: fields.date
}
}
}
)
if (document.length > 0) {
throw new Error('Duplicate error')
}
// Updating the document
const response = await Movie.updateOne(
{ _id: movieId },
{ $push: { release_date: fields } }
)
return response
}
This code works fine, but I would have preferred to use only one query.
Any ideas? I don't understand why this is so complicated, as it should be a common use case.
Thanks RichieK for your answer! It's working great. Just take care to put the field name before "$not", like this:
insertReleaseDate: async(root, args) => {
const { movieId, fields } = args
const response = await Movie.updateOne(
{
_id: movieId,
release_date: {
$not: {
$elemMatch: {
country_code: fields.country_code,
date: fields.date
}
}
}
},
{ $push: { release_date: fields } }
)
return formatResponse(response, movieId)
}
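The reason this works in one query: the filter matches the movie only when no release_date element already has that (country_code, date) pair, so the $push is simply skipped for duplicates. A plain-JavaScript model of that guard (data shapes follow the question):

```javascript
// Model of the one-query guard: push only when no existing element matches
// both country_code and date ($not wrapping $elemMatch in the filter).
function pushReleaseDate(movie, fields) {
  const duplicate = movie.release_date.some(
    (r) => r.country_code === fields.country_code && r.date === fields.date
  );
  if (!duplicate) movie.release_date.push(fields); // $push runs only on match
  return !duplicate; // mirrors nModified: 1 vs 0
}

const movie = { release_date: [{ country_code: "FR", date: "2001-01-01", details: "x" }] };
const rejected = pushReleaseDate(movie, { country_code: "FR", date: "2001-01-01", details: "y" });
const accepted = pushReleaseDate(movie, { country_code: "US", date: "2001-01-01", details: "z" });
```

On the real updateOne you can inspect the returned modified count to tell the two cases apart.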
Thanks a lot !

How to return results in random order with Mongoose (working with skip, limit)

Let's say I have some book schema, and I want to get books in a random order whenever the page loads. I cannot simply shuffle the find result like this (callback removed; mixing a callback with await executes the query twice):
const books = await BookModel.find({})
  .limit(args.limit || 3)
  .skip(args.offset || 3)
  .lean();
return _.shuffle(books);
The reason is that on each additional skip/limit, e.g. when the user scrolls down to load more results, the randomization starts again from scratch and the results already shown shuffle around in the frontend.
What I want is to load the first batch of e.g. 9 results in randomized order, then the next 9 in randomized order, and so on. Is there any way to do this with Mongoose out of the box?
I think we can solve this the following way: the client sends the ids it has already seen to the server when requesting the next chunk, and the server excludes them before sampling:
...
let ids = [];
if (retrievedIds.length) {
  ids = retrievedIds.map((id) => mongoose.Types.ObjectId(id));
}
const filter = { _id: { $nin: ids } };
BookModel.aggregate([
  { $match: filter },        // exclude already-seen documents
  { $sample: { size: 9 } }   // random batch from the remainder
]);
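The paging behaviour of that $nin + $sample scheme can be modeled in plain JavaScript: each page samples only from ids the client has not seen, so pages never repeat (sample data below is hypothetical):

```javascript
// Model of $match {$nin: seen} followed by $sample: random, non-repeating pages.
function samplePage(allIds, seenIds, size) {
  const pool = allIds.filter((id) => !seenIds.includes(id)); // the $nin filter
  // Fisher-Yates shuffle, then take `size` items (the $sample stage)
  for (let i = pool.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [pool[i], pool[j]] = [pool[j], pool[i]];
  }
  return pool.slice(0, size);
}

const all = ["a", "b", "c", "d", "e", "f"];
const page1 = samplePage(all, [], 3);
const page2 = samplePage(all, page1, 3);
// page1 and page2 are disjoint and together cover all six ids
```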

Meteor publication sorting

I have a Meteor-React app which contains a collection with lots of data, and I display the data with pagination. On the server side I publish only the data for the current page:
Meteor.publish('animals', function(currPage, displayPerPage, options) {
  const userId = this.userId;
  if (userId) {
    const currentUser = Meteor.users.findOne({ _id: userId });
    let skip = (currPage - 1) * displayPerPage;
    if (displayPerPage > 0) {
      Counts.publish(this, 'count-animals', Animals.find(
        {$and: [
          // Counter query
        ]}
      ), {fastCount: true});
      return Animals.find(
        {$and: [
          // Data query
        ]}, {sort: options.sortOption, skip: skip, limit: displayPerPage});
    } else {
      Counts.publish(this, 'count-animals', 0);
      return [];
    }
  }
});
And on the client side I am using withTracker:
export default AnimalsContainer = withTracker(({subscriptionName, subscriptionFun, options, counterName}) => {
let displayPerPage = Session.get("displayPerPage");
let currPage = Session.get("currPage");
let paginationSub = Meteor.subscribe(subscriptionName, currPage, displayPerPage, options );
let countAnimals = Counts.get(counterName);
let data = Animals.find({}).fetch({});
// console.log(data);
return {
// data: data,
data: data.length <= displayPerPage ? data : data.slice(0, displayPerPage),
countAnimals: countAnimals,
}
})(Animals);
The problem: when I modify the sort options on the client side, the server does not sort starting from the first document; it skips the first few, sometimes starting from the 20th, sometimes from the 10th.
The type checks are done on both sides.
Two things I can think of:
Keep an eye on the {sort: options.sortOption, skip: skip, limit: displayPerPage} order. As far as I know it runs in the order you place it: sort first, then skip, then limit.
Sort on both client and server. When the sort happens on the server and the data is streamed to the client, the client holds a minimongo copy which doesn't guarantee any order, so you need to sort on the client as well.
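The second point can be sketched in plain JavaScript: re-apply the same sort specifier to the array fetched from minimongo before rendering. This model assumes a single-field specifier like {name: 1} or {name: -1}, as in the question's options.sortOption:

```javascript
// Model of re-sorting on the client: minimongo does not preserve server
// order, so apply the same sort specifier to the fetched array.
function sortBySpec(docs, spec) {
  const field = Object.keys(spec)[0];
  const dir = spec[field]; // 1 ascending, -1 descending
  return docs.slice().sort((a, b) =>
    a[field] < b[field] ? -dir : a[field] > b[field] ? dir : 0
  );
}

const docs = [{ name: "Zebra" }, { name: "Ant" }, { name: "Mole" }];
const ascending = sortBySpec(docs, { name: 1 });
const descending = sortBySpec(docs, { name: -1 });
```

In the real app the simpler route is passing the same specifier to the client-side find: Animals.find({}, {sort: options.sortOption}).fetch().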

Update multiple documents in mongodb in one go

I'm still getting familiar with MongoDB queries, so my problem might be very easy for you guys.
The problem is -
I have two documents as shown in document.jpg. Both have the entry '5ac649444f3df45be852df84' in their products array, and I want to remove it. I have the names of these two documents in an array, and also the entry itself:
arr = ['andwived', 'QA Infotech']
productId = 5ac649444f3df45be852df84
Now the query I am using is -
productId = mongoose.Types.ObjectId(productId)
Org.update(
{
'name':{"$in":[arr]}
},
{
$pull: {'products': productId}
},
callback
)
This doesn't throw any error, but it doesn't remove the id either. Please help.
Try this. The problem is that {"$in":[arr]} wraps your array in another array, so $in compares each name against the whole array as a single value and nothing matches; pass the array directly, and use updateMany so both documents are updated:
const productId = mongoose.Types.ObjectId('5ac649444f3df45be852df84');
const arr = ['andwived', 'QA Infotech'];
Org.updateMany(
{
name: {$in: arr}
},
{
$pull: {products: productId}
},
callback
);
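The difference between $in: [arr] and $in: arr can be modeled in a few lines of plain JavaScript (MongoDB compares whole arrays by value, modeled here with JSON.stringify):

```javascript
// Model of the $in mismatch: with [arr] the candidate list holds a single
// element (the entire array), so a plain name string never matches it.
function matchesIn(value, list) {
  return list.some((item) =>
    Array.isArray(item)
      ? JSON.stringify(item) === JSON.stringify(value)
      : item === value
  );
}

const arr = ["andwived", "QA Infotech"];
const broken = matchesIn("andwived", [arr]); // nothing matches
const fixed = matchesIn("andwived", arr);    // matches as intended
```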

SQL Server ROW_NUMBER with PARTITION BY in MongoDB for returning a subset of rows

How can I write the query below using the MongoDB C# driver?
SELECT SubSet.*
FROM ( SELECT T.ProductName ,
T.Price ,
ROW_NUMBER() OVER ( PARTITION BY T.ProductName ORDER BY T.ProductName ) AS ProductRepeat
FROM myTable T
) SubSet
WHERE SubSet.ProductRepeat = 1
What I am trying to achieve is
Collection
ProductName|Price|SKU
Cap|10|AB123
Bag|5|ED567
Cap|20|CD345
Cap|5|EC123
Expected results is
ProductName|Price|SKU
Cap|10|AB123
Bag|5|ED567
Here is one attempt (please don't go by the object and field names):
public List<ProductOL> Search(ProductOL obj, bool topOneOnly)
{
List<ProductOL> products = new List<ProductOL>();
var database = MyMongoClient.Instance.OpenToRead(dbName: ConfigurationManager.AppSettings["MongoDBDefaultDB"]);
var collection = database.GetCollection<RawBsonDocument>("Products");
List<IMongoQuery> build = new List<IMongoQuery>();
if (!string.IsNullOrEmpty(obj.ProductName))
{
var ProductNameQuery = Query.Matches("ProductName", new BsonRegularExpression(obj.ProductName, "i"));
build.Add(ProductNameQuery);
}
if (!string.IsNullOrEmpty(obj.BrandName))
{
var brandNameQuery = Query.Matches("BrandName", new BsonRegularExpression(obj.BrandName, "i"));
build.Add(brandNameQuery);
}
var fullQuery = Query.And(build.ToArray());
products = collection.FindAs<ProductOL>(fullQuery).SetSortOrder(SortBy.Ascending("ProductName")).ToList();
if (topOneOnly)
{
var tmpProducts = new List<ProductOL>();
foreach (var item in products)
{
if (tmpProducts.Any(x => x.ProductName== item.ProductName)) { }
else
tmpProducts.Add(item);
}
products = tmpProducts;
}
return products;
}
My mongo query works and gives me the right results, but it is not efficient when dealing with huge data, so I was wondering whether MongoDB has anything like SQL Server's ROW_NUMBER() and partitioning.
If your query returns the expected results but isn't efficient, you should look into index usage with explain(). Given your query generation code includes conditional clauses, it seems likely you will need multiple indexes to efficiently cover common variations.
I'm not sure how the C# code you've provided relates to the original SQL query, as they seem to be entirely different. I'm also not clear how grouping is expected to help your query performance, aside from limiting the results returned.
Equivalent of the SQL query
There is no direct equivalent of ROW_NUMBER() .. PARTITION BY grouping in MongoDB, but you should be able to work out the desired result using either the Aggregation Framework (fastest) or Map/Reduce (slower but more functionality). The MongoDB manual includes an Aggregation Commands Comparison as well as usage examples.
As an exercise in translation, I'll focus on your SQL query which is pulling out the first product match by ProductName:
SELECT SubSet.*
FROM ( SELECT T.ProductName ,
T.Price ,
ROW_NUMBER() OVER ( PARTITION BY T.ProductName ORDER BY T.ProductName ) AS ProductRepeat
FROM myTable T
) SubSet
WHERE SubSet.ProductRepeat = 1
Setting up the test data you provided:
db.myTable.insert([
{ ProductName: 'Cap', Price: 10, SKU: 'AB123' },
{ ProductName: 'Bag', Price: 5, SKU: 'ED567' },
{ ProductName: 'Cap', Price: 20, SKU: 'CD345' },
{ ProductName: 'Cap', Price: 5, SKU: 'EC123' },
])
Here's an aggregation query in the mongo shell which will find the first match per group (ordered by ProductName). It should be straightforward to translate that aggregation query to the C# driver using the MongoCollection.Aggregate() method.
I've included comments with the rough equivalent SQL fragment in your original query.
db.myTable.aggregate(
// Apply a sort order so the $first product is somewhat predictable
// ( "ORDER BY T.ProductName")
{ $sort: {
ProductName: 1
// Should really have additional sort by Price or SKU (otherwise order may change)
}},
// Group by Product Name
// (" PARTITION BY T.ProductName")
{ $group: {
_id: "$ProductName",
// Find first matching product details per group (can use $$CURRENT in MongoDB 2.6 or list specific fields)
// "SELECT SubSet.* ... WHERE SubSet.ProductRepeat = 1"
Price: { $first: "$Price" },
SKU: { $first: "$SKU" },
}},
// Rename _id to match expected results
{ $project: {
_id: 0,
ProductName: "$_id",
Price: 1,
SKU: 1,
}}
)
Results given the test data appear to be what you were looking for:
{ "Price" : 10, "SKU" : "AB123", "ProductName" : "Cap" }
{ "Price" : 5, "SKU" : "ED567", "ProductName" : "Bag" }
Notes:
This aggregation query uses the $first operator, so if you want to find the second or third product per grouping you'd need a different approach (eg. $group and then take the subset of results needed in your application code)
If you want predictable results for finding the first item in a $group there should be more specific sort criteria than ProductName (for example, sorting by ProductName & Price or ProductName & SKU). Otherwise the order of results may change in future as documents are added or updated.
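As a quick check of the $sort + $group/$first combination, the same grouping can be modeled in plain JavaScript with the sample data above (this is a model of the pipeline's semantics, not driver code):

```javascript
// Model of $sort then $group with $first: keep the first document per
// ProductName. Relies on Array.prototype.sort being stable (guaranteed in
// modern JS), which mirrors the caveat about needing more sort criteria.
const rows = [
  { ProductName: "Cap", Price: 10, SKU: "AB123" },
  { ProductName: "Bag", Price: 5, SKU: "ED567" },
  { ProductName: "Cap", Price: 20, SKU: "CD345" },
  { ProductName: "Cap", Price: 5, SKU: "EC123" },
];

const firstPerGroup = {};
for (const row of rows.slice().sort((a, b) => a.ProductName.localeCompare(b.ProductName))) {
  if (!(row.ProductName in firstPerGroup)) firstPerGroup[row.ProductName] = row; // $first
}
// firstPerGroup -> Bag: {5, ED567}, Cap: {10, AB123}, matching the expected results
```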
Thanks to @Stennie: with the help of his answer I could come up with the C# aggregation code.
var match = new BsonDocument
{
{
"$match",
new BsonDocument{
{"ProductName", new BsonRegularExpression("cap", "i")}
}
}
};
var group = new BsonDocument
{
{"$group",
new BsonDocument
{
{"_id", "$ProductName"},
{"SKU", new BsonDocument{
{
"$first", "$SKU"
}}
}
}}
};
var project = new BsonDocument{
{
"$project",
new BsonDocument
{
{"_id", 0 },
{"ProductName","$_id" },
{"SKU", 1}
}}};
var sort = new BsonDocument{
{
"$sort",
new BsonDocument
{
{
"ProductName",1 }
}
}};
var pipeline = new[] { match, group, project, sort };
var aggResult = collection.Aggregate(pipeline);
var products= aggResult.ResultDocuments.Select(BsonSerializer.Deserialize<ProductOL>).ToList();
Using AggregateArgs
AggregateArgs args = new AggregateArgs();
List<BsonDocument> pipelineStages = new List<BsonDocument>();
pipelineStages.Add(match);
pipelineStages.Add(group);
pipelineStages.Add(project);
pipelineStages.Add(sort);
args.Pipeline = pipelineStages;
// var pipeline = new[] { match, group, project, sort };
var aggResult = collection.Aggregate(args);
products = aggResult.Select(BsonSerializer.Deserialize<ProductOL>).ToList();