MongoDB/Meteor: combine documents with ids in many arrays (returned by another query)

I am using Meteor to make a query (for a publication) to find all the Results generated by Workers which have finished their work.
Results has the following structure:
result example 1:
{
  _id: "sldf234sdf",
  result_a: 0,
  result_b: 0
}
result example 2:
{
  _id: "ghjwef23qql",
  result_a: 0,
  result_b: 0
}
Workers is defined as:
{
  _id: "iweyr23s",
  results: ["sldf234sdf", "ghjwef23qql"], // a list of Result _ids
  tag: 'running'
}
Here is what I am trying to do:
// 1) I want to find all the workers which are finished, i.e. have the tag 'done'
const workers = Workers.find({tag:'done'});
// 2) I want to get the results arrays from all the workers and combine them into one big array
const results_id_arrays = workers[0].results + workers[1].results + ...
const results = Results.find({_id:{$in: results_id_arrays }});
So, my question is: how do I make a MongoDB query to implement the second step?

Is this what you are looking for?
You can get the result ids like this:
const results_id_arrays = Workers.aggregate([
  {$project: {a: '$results'}},
  {$unwind: '$a'},
  {$group: {_id: null, res: {$addToSet: '$a'}}}
])[0].res; // a single group document comes back; its `res` field is the flat array of ids
and then
const results = Results.find({_id:{$in: results_id_arrays }});
does this work?

You can use the mighty underscore.js for this. There is also a Meteor package for it. Check the documentation here.
// fetch the finished workers; note that in Meteor, field limiting goes under the `fields` option
const workers = Workers.find({tag: 'done'}, {fields: {results: 1, _id: 0}}).fetch();
// pluck the results arrays and flatten them into a single array of ids
const ids = _.flatten(_.pluck(workers, 'results'));
// use the flattened ids to find the results
return Results.find({_id: {$in: ids}});


How can I apply conditional filtering to MongoDB documents?

I am having a really hard time figuring out how to apply multiple levels of filtering to my MongoDB documents (not sure if this phrasing is correct).
I am trying to create an app that will allow users to perform a search and retrieve only those documents that match the filters they have chosen to apply. A user might choose to apply only one filter or combine multiple filters.
For example, the user might be looking for a house. The available filters could be location, size and type. If the user applies the location filter with a value of ‘London’, they should get only those houses available in London. If they choose to combine the above location with the type filter with a value of ‘2-bedroom-apartment’, they should get all 2-bedroom apartments available in London.
How can I make sure that the results are conditionally filtered, depending on the filters that the user has applied?
I think I am supposed to use $match, but I don’t understand if I can use multiple queries with it.
What I have come up with so far is the following:
const getFilteredData = async (req, res) => {
  try {
    const { filter1, filter2, filter3 } = req.query;
    const filteredData = await dataModel.aggregate([
      { $match: {
        $and: [{ filter1: filter1 }, { filter2: filter2 }, { filter3: filter3 }] // 1st option: all of the filters are applied by the user
      }}
    ]);
    res.status(201).json({ data: filteredData });
  } catch (err) {
    res.status(404).json({ message: err.message });
  }
};
With the above code, the results are filtered only when all 3 filters are being applied. How can I cater to different combinations of filters being applied by the user (only one filter, filter1 & filter3 combined etc)?
Any help will be massively appreciated.
Assuming req.query can be {name: "Coat", city: "Athens"}, you can do something like:
const getFilteredData = async (req, res) => {
  try {
    const filtersArr = [];
    for (const filterKey of ['name', 'city', 'category']) {
      if (req.query[filterKey]) {
        const thisFilter = {};
        thisFilter[filterKey] = req.query[filterKey];
        filtersArr.push(thisFilter);
      }
    }
    console.log(filtersArr);
    const filteredData = await filteredDataModel.aggregate([
      { $match: {
        $and: filtersArr // only the filters the user actually supplied end up here
      }}
    ]);
    res.status(201).json({ data: filteredData });
  } catch (err) {
    res.status(404).json({ message: err.message });
  }
};
You can also use the original req.query like this:
const filteredData = await filteredDataModel.find(req.query)
But iterating as in the code above allows you to validate the keys that you want.
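One caveat with the aggregation version: $and must be a non-empty array, so a request with no recognised filters would throw an error. A minimal guard, as a sketch along the same lines:
// fall back to an unfiltered $match when the user supplied no filters,
// since $and requires a non-empty array
const matchStage = filtersArr.length > 0 ? { $and: filtersArr } : {};
const filteredData = await filteredDataModel.aggregate([{ $match: matchStage }]);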

MongoDB bulkWrite multiple updateOne vs updateMany

I have cases where I build bulkWrite operations in which some documents have the same update object. Is there any performance benefit to merging the filters and sending one updateMany with those filters, instead of multiple updateOne operations in the same bulkWrite?
It's obviously better to use updateMany over multiple updateOne calls when using the normal methods, but with bulkWrite, since it's a single command, are there any significant gains in preferring one over the other?
Example:
I have 200K documents that I need to update, with 10 unique status values across all 200K documents, so my options are:
Solutions:
A) Send one single bulkWrite with 10 updateMany operations, and each one of those operations will affect 20K documents.
B) Send one single bulkWrite with 200K updateOne operations, each holding its own filter and status.
As @AlexBlex noted, I have to look out for accidentally updating more than one document with the same filter. In my case I use _id as my filter, so accidentally updating other documents is not a concern, but it is definitely something to look out for when considering the updateMany option.
Thanks @AlexBlex.
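For concreteness, here is a sketch of how option A could be assembled, assuming the pending updates are held in memory as { _id, status } pairs (the collection and updates variables here are hypothetical):
// group the pending updates by their target status, then emit one
// updateMany per unique status value (option A above)
function buildStatusBulkOps (updates) {
  const idsByStatus = new Map();
  for (const { _id, status } of updates) {
    if (!idsByStatus.has(status)) idsByStatus.set(status, []);
    idsByStatus.get(status).push(_id);
  }
  return [...idsByStatus.entries()].map(([status, ids]) => ({
    updateMany: {
      filter: { _id: { $in: ids } },
      update: { $set: { status } }
    }
  }));
}
// ordered: false lets the server continue past individual failures and
// execute the groups in any order
collection.bulkWrite(buildStatusBulkOps(updates), { ordered: false });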
Short answer:
Using updateMany is at least twice as fast, but it might accidentally update more documents than you intended; keep reading to learn how to avoid this and gain the performance benefits.
Long answer:
We ran the following experiment to find the answer; these are the steps:
Create a bankaccounts MongoDB collection, where each document contains only one field (balance).
Insert 1 million documents into the bankaccounts collection.
Randomize the order in memory of all 1 million documents to avoid any possible optimizations from the database using ids that are inserted in the same sequence, simulating a real-world scenario.
Build the write operations for bulkWrite from the documents, each update setting balance to a random number between 0 and 100.
Execute the bulkWrite.
Log the time the bulkWrite took.
Now, the experiment lies in the 4th step.
In one variation of the experiment we build an array consisting of 1 million updateOne operations, each with a filter for a single document and its respective update object.
In the second variation, we build 100 updateMany operations, each including a filter for 10K document ids and their respective update.
Results:
updateMany with multiple document ids is about 2.4× as fast as multiple updateOne operations (21.04 s vs 51.28 s on average, see below). It cannot be used everywhere, though; please read "The risk" section to learn when it should be used.
Details:
We ran the script 5 times for each variation, the detailed results are as follows:
With updateOne: 51.28 seconds on average.
With updateMany: 21.04 seconds on average.
The risk:
As many people have already pointed out, updateMany is not a direct substitute for updateOne, since it can incorrectly update multiple documents when our intention was to update only one.
This approach is only valid when you're filtering on a unique field such as _id; if the filter depends on fields that are not unique, multiple documents will be updated and the results will not be equivalent.
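To make the risk concrete, a small sketch (the status field and its values are illustrative):
// non-unique filter: this updates EVERY matching document, not just one
await collection.updateMany({ status: 'pending' }, { $set: { status: 'done' } });
// unique filter: each _id matches at most one document, so one updateMany
// over a list of _ids is equivalent to the corresponding updateOne calls
await collection.updateMany(
  { _id: { $in: ids } },
  { $set: { status: 'done' } }
);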
65831219.js
// 65831219.js
'use strict';

const mongoose = require('mongoose');
const { Schema } = mongoose;

const DOCUMENTS_COUNT = 1_000_000;
const UPDATE_MANY_OPERATIONS_COUNT = 100;
const MINIMUM_BALANCE = 0;
const MAXIMUM_BALANCE = 100;
const SAMPLES_COUNT = 10;

const bankAccountSchema = new Schema({
  balance: { type: Number }
});
const BankAccount = mongoose.model('BankAccount', bankAccountSchema);

mainRunner().catch(console.error);

async function mainRunner () {
  for (let i = 0; i < SAMPLES_COUNT; i++) {
    await runOneCycle(buildUpdateManyWriteOperations).catch(console.error);
    await runOneCycle(buildUpdateOneWriteOperations).catch(console.error);
    console.log('-'.repeat(80));
  }
  process.exit(0);
}

/**
 * @param {buildUpdateManyWriteOperations|buildUpdateOneWriteOperations} buildBulkWrite
 */
async function runOneCycle (buildBulkWrite) {
  await mongoose.connect('mongodb://localhost:27017/test', {
    useNewUrlParser: true,
    useUnifiedTopology: true
  });
  await mongoose.connection.dropDatabase();

  const { accounts } = await createAccounts({ accountsCount: DOCUMENTS_COUNT });
  const { writeOperations } = buildBulkWrite({ accounts });

  const writeStartedAt = Date.now();
  await BankAccount.bulkWrite(writeOperations);
  const writeEndedAt = Date.now();

  console.log(`Write operations took ${(writeEndedAt - writeStartedAt) / 1000} seconds with \`${buildBulkWrite.name}\`.`);
}

async function createAccounts ({ accountsCount }) {
  const rawAccounts = Array.from(
    { length: accountsCount },
    () => ({ balance: getRandomInteger(MINIMUM_BALANCE, MAXIMUM_BALANCE) })
  );
  const accounts = await BankAccount.insertMany(rawAccounts);
  return { accounts };
}

function buildUpdateOneWriteOperations ({ accounts }) {
  const writeOperations = shuffleArray(accounts).map((account) => ({
    updateOne: {
      filter: { _id: account._id },
      update: { balance: getRandomInteger(MINIMUM_BALANCE, MAXIMUM_BALANCE) }
    }
  }));
  return { writeOperations };
}

function buildUpdateManyWriteOperations ({ accounts }) {
  shuffleArray(accounts);
  const accountsChunks = chunkArray(accounts, accounts.length / UPDATE_MANY_OPERATIONS_COUNT);
  const writeOperations = accountsChunks.map((accountsChunk) => ({
    updateMany: {
      filter: { _id: { $in: accountsChunk.map(account => account._id) } },
      update: { balance: getRandomInteger(MINIMUM_BALANCE, MAXIMUM_BALANCE) }
    }
  }));
  return { writeOperations };
}

function getRandomInteger (min = 0, max = 1) {
  min = Math.ceil(min);
  max = Math.floor(max);
  return min + Math.floor(Math.random() * (max - min + 1));
}

function shuffleArray (array) {
  let currentIndex = array.length;
  let temporaryValue;
  let randomIndex;
  // While there remain elements to shuffle...
  while (0 !== currentIndex) {
    // Pick a remaining element...
    randomIndex = Math.floor(Math.random() * currentIndex);
    currentIndex -= 1;
    // And swap it with the current element.
    temporaryValue = array[currentIndex];
    array[currentIndex] = array[randomIndex];
    array[randomIndex] = temporaryValue;
  }
  return array;
}

function chunkArray (array, sizeOfTheChunkedArray) {
  const chunked = [];
  for (const element of array) {
    const last = chunked[chunked.length - 1];
    if (!last || last.length === sizeOfTheChunkedArray) {
      chunked.push([element]);
    } else {
      last.push(element);
    }
  }
  return chunked;
}
Output
$ node 65831219.js
Write operations took 20.803 seconds with `buildUpdateManyWriteOperations`.
Write operations took 50.84 seconds with `buildUpdateOneWriteOperations`.
----------------------------------------------------------------------------------------------------
Tests were run using MongoDB version 4.0.4.
At a high level, if you have the same update object, then you can use updateMany rather than bulkWrite.
Reason:
bulkWrite is designed to send multiple different commands to the server, as mentioned here.
If you have the same update object, updateMany is best suited.
Performance:
If you have 10K update commands in a bulkWrite, they will be executed in batches internally, which may impact the execution time.
Exact lines from the reference about batching:
Each group of operations can have at most 1000 operations. If a group exceeds this limit, MongoDB will divide the group into smaller groups of 1000 or less. For example, if the bulk operations list consists of 2000 insert operations, MongoDB creates 2 groups, each with 1000 operations.
Thanks @Alex.

Get a subdocument from a document with criteria in MongoDB Dart

Hello, I have JSON data like this:
{
  "_id": ObjectId('5dfe907f80580559fedcc9b1'),
  "companyMail": "mail@gmail.com",
  "workers": [
    {
      "name": name,
      "surName": surname,
      "mail": "mail2@gmail.com",
      "password": "password",
      "companyMail": "mail@gmail.com"
    }
  ]
}
And I want to get a worker from workers:
{
  "name": name,
  "surName": surname,
  "mail": "mail2@gmail.com",
  "password": "password",
  "companyMail": "mail@gmail.com"
}
I'm writing this query:
collection.findOne({
  'companyMail': "mail@gmail.com",
  'workers.mail': "mail2@gmail.com",
});
But it gives me the whole document. I only want to get the worker I searched for. How can I do that with mongo_dart?
https://pub.dev/packages/mongo_dart
I found the solution. We should use aggregation, but we need to add a specific query to get only one result. In mongo_dart we can use the Filter object, like this:
final pipeline = AggregationPipelineBuilder()
    .addStage(Match(where.eq('companyMail', companyMail).map['\$query']))
    .addStage(Match(where.eq('customers.mail', customerMail).map['\$query']))
    .addStage(Project({
      '_id': 0, // You can use as: 'customer' instead of this keyword.
      // note: 'customers' here corresponds to 'workers' in the question
      'customers': Filter(
        input: '\$customers',
        cond: {'\$eq': ['\$\$this.mail', customerMail]},
      ).build(),
    }))
    .build();
final result = await DbCollection(_db, 'Companies')
    .aggregateToStream(pipeline)
    .toList();
The mongo-dart driver API is very limited and there is no good documentation, whereas the mongo-node.js driver API is very good and well documented, so it is better to do the server side with Node. For example, in Node your problem can be solved with one short query:
collection.find({
  'companyMail': "mail@gmail.com",
  'workers.mail': "mail2@gmail.com",
}).project({
  '_id': 0, 'workers': 1
});
Pass options to project only the matching element of the workers field (the positional $ keeps just the worker that matched the query, rather than the whole array):
db.company.findOne(
  {
    'companyMail': "mail@gmail.com",
    'workers.mail': "mail2@gmail.com",
  },
  {
    'workers.$': 1,
    _id: 0
  }
);
In mongo-dart, looking at their API, you can use aggregation as follows:
final pipeline = AggregationPipelineBuilder()
    .addStage(Match(where.eq('companyMail', 'mail@gmail.com').map['\$query']))
    .addStage(Project({
      '_id': 0,
      'workers': 1,
    }))
    .build();
final result = await DbCollection(db, 'company')
    .aggregateToStream(pipeline)
    .toList();
// result[0] contains the matched company with only its workers array

Mongoose $in [ObjectIds] returns 0 records

In our Mongoose model, we have a product referring to an article.
This is a piece of the schema:
const product = new Schema({
  article_id: Schema.Types.ObjectId,
  title: String,
  description: String,
  ...
In the API we are looking for products that are referring to a list of specific articles, and I wanted to use the $in operator:
const articles = ["5dcd2a95d7e2999332441825",
  "5dcd2a95d7e2999332441827",
  "5dcd2a96d7e2999332441829"];
filter.article_id = {
  $in: articles.map(
    article => new mongoose.Types.ObjectId(article)
  ),
};
return Product.find({ ...filter })
This returns 0 records, whereas I know for sure it should have returned at least 3. Looking at the console log, all that has happened is that the double quotes have been removed from the array during the ObjectId conversion.
Then I tried a different approach by returning {$oid: "id goes here"} for each mapped array item:
const articles = ["5dcd2a95d7e2999332441825",
  "5dcd2a95d7e2999332441827",
  "5dcd2a96d7e2999332441829"];
filter.article_id = {
  $in: articles.map(
    article => ({$oid: article})
  ),
};
return Product.find({ ...filter })
This gives a different array:
console.log(filter);
// {article_id: {$in: [{$oid: "5dcd2a95d7e2999332441825"}, {$oid: "5dcd2a95d7e2999332441827"}, {$oid: "5dcd2a96d7e2999332441829"}]}}
But in this case I get the following error:
CastError: Cast to ObjectId failed for value "\"{$oid: \"5dcd2a95d7e2999332441825\"}\"".
Though, if I take that particular console-logged filter and pass it into Studio 3T as a filter, I do indeed get the desired results.
Any idea what I am doing wrong in this case?
Frick me! I am just a big idiot.. Apparently there was a .skip(10) added after the find() method -.-'.... Now I understand why 0 records were returned... Been spending hours on this..
For future reference, Mongoose casts strings to ObjectIds automatically if the field is defined as an ObjectId in the schema. Therefore the following works exactly as it should, given you don't skip the first 10 records:
const articles = ["5dcd2a95d7e2999332441825",
  "5dcd2a95d7e2999332441827",
  "5dcd2a96d7e2999332441829"];
filter.article_id = {
  $in: articles
};
return Product.find({ ...filter }) // Note that I DON'T put .skip() here..
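For reference, the buggy version looked roughly like this, a sketch of the situation described above:
// .skip(10) silently discards the first ten matches, so a query matching
// only 3 documents returns nothing at all
return Product.find({ article_id: { $in: articles } }).skip(10);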

Algolia: how to order by string value and geolocation

I am working on a prototype for a new product and need some help figuring out whether Algolia is feasible for it. I went through their documentation but couldn't find the right answer. Any help appreciated.
The best example of our data is classifieds. In the dataset we have a geolocation (lat-lng) and a category:
[
  {
    category: 'retail_shops',
    "_geoloc": {
      "lat": 40.639751,
      "lng": -73.778925
    },
    title: 'walget1'
  },
  {
    category: 'retail_shops',
    "_geoloc": {
      "lat": 40.639751,
      "lng": -73.778925
    },
    title: 'walget1'
  },
]
Search Requirement
sort by category: the selected category first, then all others in any order
sort by geolocation: this is the secondary criterion
We need to display all data in the system, but with the selected category first and distance as the secondary criterion.
I am trying to do this in JavaScript; I did find the search and searchForFacetValues methods, but the documentation is scattered. Any idea how to achieve this? I don't need code, but general guidance will definitely help.
In order to do so, you can use a multiple-queries strategy, where you would have one query with a filter that matches the category, and another query that matches all objects not in the category. In the query parameters you can then use the aroundLatLng field or the aroundLatLngViaIP field.
The following snippet should help you get a better understanding of how to achieve what you want:
const client = algoliasearch('ApplicationID', 'apiKey');

// these two values would be obtained from somewhere (like InstantSearch)
const query = 'walg';
const category = 'retail_shops';

const queries = [{
  indexName: 'products',
  query: `${query}`,
  params: {
    hitsPerPage: 3,
    filters: `category:${category}`,
    aroundLatLngViaIP: true
  }
}, {
  indexName: 'products',
  query: `${query}`,
  params: {
    hitsPerPage: 10,
    filters: `NOT category:${category}`,
    aroundLatLngViaIP: true
  }
}];

const searchCallback = (err, content) => {
  if (err) {
    console.error(err);
    return;
  }
  const categorySpecificResults = content.results[0];
  for (var i = 0; i < categorySpecificResults.hits.length; ++i) {
    console.log(categorySpecificResults.hits[i]);
  }
  const geolocatedOnlyResults = content.results[1];
  for (var i = 0; i < geolocatedOnlyResults.hits.length; ++i) {
    console.log(geolocatedOnlyResults.hits[i]);
  }
};

// perform the 2 queries in a single API call:
// - 1st query targets index `products` with category filter and geolocation
// - 2nd query targets index `products` with negative category filter and geolocation
client.search(queries, searchCallback);
You will find more information about how to use multiple queries on this page:
How to use multiple Queries
If you want more fine-grained control over the geolocation, like for instance defining a precision or a maximum range for the search, you can find out more about these features on these pages (a combined sketch follows the list):
aroundLatLng: used when you want to define the geographical center of your query
aroundRadius: used to limit a search to a certain range around the center of the query
aroundPrecision: defines how precise the ranking has to be when it comes to distance.
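Putting those three parameters together, a sketch that reuses the client and searchCallback from the snippet above (the values are illustrative, not recommendations):
const geoQueries = [{
  indexName: 'products',
  query: 'walg',
  params: {
    filters: 'category:retail_shops',
    aroundLatLng: '40.639751,-73.778925', // explicit center (lat, lng) instead of IP-based
    aroundRadius: 10000,                  // metres: only return hits within ~10 km
    aroundPrecision: 100                  // rank distance in 100 m buckets
  }
}];
client.search(geoQueries, searchCallback);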