How to MapReduce collection to get joined array of all values from the field containing array - mongodb

I have the following schema in the database:
{
  id: 12345,
  friends: [123, 345, 678, 908]
},
{
  id: 908,
  friends: [123, 345]
}
Is there a way to get an array of all unique friends IDs from the entire collection?

To get the distinct friends values you do not need to write a map/reduce job.
Just run:
> db.collection.distinct("friends")
[ 123, 345, 678, 908 ]
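To make the behavior of distinct on an array field concrete, here is a small plain-JavaScript sketch that mimics it in memory. It does not talk to MongoDB, and distinctArrayField is a hypothetical helper. Each element of an array field is treated as its own value, and documents missing the field are skipped. The sketch returns values in first-seen order; the server's distinct does not promise any particular order.

```javascript
// In-memory sketch of db.collection.distinct(field): when the field holds an
// array, each element counts as a separate value. Not MongoDB code.
function distinctArrayField(docs, field) {
  const seen = new Set();
  for (const doc of docs) {
    const value = doc[field];
    // Documents without the field contribute nothing.
    if (value === undefined) continue;
    // Array fields contribute each element; scalars contribute themselves.
    const values = Array.isArray(value) ? value : [value];
    for (const v of values) {
      seen.add(v);
    }
  }
  return [...seen];
}

const docs = [
  { id: 12345, friends: [123, 345, 678, 908] },
  { id: 908, friends: [123, 345] },
];

console.log(distinctArrayField(docs, "friends")); // [ 123, 345, 678, 908 ]
```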

I'm not too familiar with MongoDB's MapReduce implementation, but I imagine you could have your mappers write out the values passed to them as keys, and simply use null values.
This way you can ensure each reducer receives a given key (one of your friend IDs) only once, and you can simply write that key out once without iterating over the values. As the values are null anyway, there is no point in iterating (and if you did iterate, you would write out the key multiple times; you want it written exactly once so that it stays distinct).
However, bear in mind that your keys will be spread across the reducers' output files, e.g. reducer 1 might output 123 and reducer 2 might output 345, so you may have to consolidate the output files' contents afterwards in order to construct your array.
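The map/reduce idea above can be simulated in plain JavaScript to see why emitting friend IDs as keys yields each ID exactly once. This is an in-memory sketch, not MongoDB's engine: mapFriends, reduceToNull and runMapReduce are hypothetical names, and the map function here receives the document and emit as parameters, whereas a real MongoDB map function reads the document from this and calls a global emit. In the mongo shell the job would be started with db.collection.mapReduce(map, reduce, { out: { inline: 1 } }), which returns the results inline rather than as output files; map-reduce is deprecated in recent MongoDB versions.

```javascript
// In-memory simulation of map -> shuffle -> reduce; it does not call MongoDB.

// Map step (hypothetical name): emit every friend ID as a key with a null value.
function mapFriends(doc, emit) {
  for (const friendId of doc.friends) {
    emit(friendId, null);
  }
}

// Reduce step (hypothetical name): all values are null, nothing to combine.
function reduceToNull(key, values) {
  return null;
}

function runMapReduce(docs, map, reduce) {
  // Shuffle: group emitted values by key, so each key ends up in one group.
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  docs.forEach((doc) => map(doc, emit));

  // Reduce each key once, producing { _id, value } pairs.
  const out = [];
  for (const [key, values] of groups) {
    out.push({ _id: key, value: reduce(key, values) });
  }
  return out;
}

const docs = [
  { id: 12345, friends: [123, 345, 678, 908] },
  { id: 908, friends: [123, 345] },
];

// Consolidate the reduce output into one array of unique friend IDs.
const uniqueFriends = runMapReduce(docs, mapFriends, reduceToNull).map((r) => r._id);
console.log(uniqueFriends); // [ 123, 345, 678, 908 ]
```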

Related

How to query a firestore array with multiple key value pairs in Swift

One of my collections in Cloud Firestore has an array field where each item in the array contains three separate values (see the groupMembership field):
I know I can't write a query to find documents that match one of the array values. Is there a way to write a query to find documents that match a specific value for all three items?
For instance, I want to find all users where the groupMembership array contains one object that is equal to groupId: X, groupName: Y, membershipStatus: active
You can pass the entire object in an array-contains clause.
So something like:
usersRef
    .whereField("groupMembership", arrayContains: [
        "groupId": "kmDT8OUOTCxSMIBf9yZC",
        "groupName": "Jon and Charles Group",
        "membershipStatus": "pending",
    ])
The array item must completely and exactly match the dictionary you pass in. If you want to filter on only some of the properties in each array element, you will have to create additional fields with just those properties. For example, it is quite common to have, say, a field groupIds with just the (unique) group IDs, so that you can also filter on those (of course within the limits of Firestore's queries).
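The exact-match rule can be illustrated with a plain-JavaScript sketch of the semantics. It does not use the Firestore SDK, and it is written in JavaScript rather than Swift to match the other examples on this page; arrayContains, arrayContainsValue, the sample users and the "g1" IDs are all hypothetical. A map element matches only when it has the same keys with the same values as the target, so a partial object matches nothing, while a denormalized groupIds array lets you match on the ID alone.

```javascript
// Sketch of array-contains semantics for map elements (not the Firestore SDK).
// Shallow comparison is enough here because the example values are strings.
function sameFields(a, b) {
  const keysA = Object.keys(a);
  const keysB = Object.keys(b);
  return (
    keysA.length === keysB.length &&
    keysA.every((k) => Object.prototype.hasOwnProperty.call(b, k) && a[k] === b[k])
  );
}

// Documents whose array field contains an element exactly equal to target.
function arrayContains(docs, field, target) {
  return docs.filter((doc) => (doc[field] || []).some((item) => sameFields(item, target)));
}

// Matching on a denormalized array of plain values (hypothetical groupIds).
function arrayContainsValue(docs, field, value) {
  return docs.filter((doc) => (doc[field] || []).includes(value));
}

const users = [
  {
    name: "jon",
    groupMembership: [
      { groupId: "g1", groupName: "Jon and Charles Group", membershipStatus: "pending" },
    ],
    groupIds: ["g1"],
  },
  {
    name: "charles",
    groupMembership: [
      { groupId: "g1", groupName: "Jon and Charles Group", membershipStatus: "active" },
    ],
    groupIds: ["g1"],
  },
];

const pending = arrayContains(users, "groupMembership", {
  groupId: "g1",
  groupName: "Jon and Charles Group",
  membershipStatus: "pending",
});
console.log(pending.map((u) => u.name)); // [ 'jon' ]

// A partial object matches no element exactly.
const partial = arrayContains(users, "groupMembership", { groupId: "g1" });
console.log(partial.length); // 0

console.log(arrayContainsValue(users, "groupIds", "g1").map((u) => u.name)); // [ 'jon', 'charles' ]
```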

Need help querying distinct combinations of nested fields

Desired result
I am trying to query my collection and obtain every unique combination of a batch and entry code. I don't care about anything other than these fields, the parent objects do not matter to me.
What I have tried
I tried running:
db.accountant_ledgers.aggregate( [ {"$group": { "_id": { entryCode: "$actions.entry.entryCode", batchCode: "$actions.entry.batchCode" } } } ]);
Problem
I get unexpected results when I run that query. I'm looking for a list of every unique combination of batch and entry codes, but instead I get a list of arrays? Perhaps these are the results I'm looking for, but I have no idea how to read them if they are.
Theory
I think perhaps this could have to do with the fact that these fields are nested. Each object has several actions, each action has several entries. I believe that the result from that query is just the aggregated entry and batch codes found in each object. I don't know how long the list of results is, but I'd guess it's the same number as the total number of objects in my collection (~90 million).
EDIT: I found out that there are only 182 results from my query, which is clearly significantly smaller than 90 million. My new theory is that it has found all unique objects, with the criteria for "uniqueness" being the list of the batch and entry codes that appear in their actions, which makes sense. There should be a lot of repetition in the collection.
Question
How can I achieve the result I'm looking for? I'm expecting something like:
FEE, MG
EXN, WT
ACH, 9C
...etc
Notes
I apologize if this is a bad question, I'm not sure how else to frame it. Let me know if I can improve my question at all.
Picture below shows the results of the query.
EDIT FOR ADDITIONAL INFORMATION
I can't share any sample documents, but the general structure of the data is shown (crudely) in the below image. Each Entity has several Actions, each Action has one Entry and each Entry has one Batch code and one Entry code.
You are getting a list of documents (each is a map or a hash), not a list of arrays.
The GUI you are using is trying to show you the contents of each document at the top level, which may be what is confusing you.
If you run the query in the mongo shell you should see a list of documents.
It looks like your inputs are documents where the entry code and batch code are arrays. If so:
Edit your question to include the sample documents you are querying, as text
You could use $unwind to flatten those arrays before using $group.
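Following the $unwind suggestion and the structure described in the question's edit (each entity has several actions, each action has one entry), the sketch below mimics $unwind on actions followed by $group on the (entryCode, batchCode) pair in plain JavaScript. The pipeline in the leading comment is an untested assumption about the real collection, and distinctEntryBatchPairs and the sample ledgers are hypothetical. $group does not guarantee output order; the sketch returns pairs in first-seen order.

```javascript
// In-memory sketch of $unwind followed by $group on a compound key.
// Assumed (untested) shell equivalent:
//   db.accountant_ledgers.aggregate([
//     { $unwind: "$actions" },
//     { $group: { _id: { entryCode: "$actions.entry.entryCode",
//                        batchCode: "$actions.entry.batchCode" } } }
//   ])
function distinctEntryBatchPairs(entities) {
  const seen = new Map();
  for (const entity of entities) {
    // $unwind: one row per element of the actions array.
    for (const action of entity.actions || []) {
      const { entryCode, batchCode } = action.entry;
      // $group: key on the (entryCode, batchCode) pair.
      const key = JSON.stringify([entryCode, batchCode]);
      if (!seen.has(key)) {
        seen.set(key, { entryCode, batchCode });
      }
    }
  }
  return [...seen.values()];
}

// Hypothetical sample data following the structure described in the question.
const ledgers = [
  {
    actions: [
      { entry: { entryCode: "FEE", batchCode: "MG" } },
      { entry: { entryCode: "EXN", batchCode: "WT" } },
    ],
  },
  {
    actions: [
      { entry: { entryCode: "FEE", batchCode: "MG" } },
      { entry: { entryCode: "ACH", batchCode: "9C" } },
    ],
  },
];

for (const { entryCode, batchCode } of distinctEntryBatchPairs(ledgers)) {
  console.log(`${entryCode}, ${batchCode}`);
}
// FEE, MG
// EXN, WT
// ACH, 9C
```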

How to do querying, sorting and filtering on values of an object in mongodb?

I am trying to find a way to do querying, sorting and filtering on values of an object(which is again an object) in a mongo document. The document structure is,
{
_id: '',
uid: '12345',
objects:{
dkey1: {
prop1: val1,
prop2: val2,
...
},
dkey2: {
prop1: val1,
prop2: val2,
...
},
dkey3: {
prop1: val1,
prop2: val2,
...
},
dkey4: {
prop1: val1,
prop2: val2,
...
}
...
}
}
The objects property can contain 1000s of objects with dynamic keys. They are hash-based unique keys. When I get these objects, I don't want to return all of them. I want to query, sort and limit them as I could if they were in different documents. For example, if I say prop1 = val1 sort by prop2 limit 10, the query should return the first 10 sub-objects in objects whose prop1 is val1, sorted by prop2.
I think it cannot be done with a normal find, so I am trying the aggregation framework. In the first stage I will do a match on uid. Next? That is where I am confused. If, instead of objects with dynamic keys, it were an array of objects, I could do $unwind and, in further stages, filter on the inner properties (prop1, prop2...), sort, apply a limit, etc. But the problem is that it is not an array of objects. If there were a way to convert the values of the objects object into an array of objects, it would be easier. I looked for a way, but I could not find a solution.
I know the structure is not good and changing the schema would help me, but I am in a situation where I cannot change it now. Is there a way to convert the values of objects into an array of objects? Or is there a different way to achieve the same result with some other aggregation pipeline stages?
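For reference, MongoDB 3.4.4 and later provide an $objectToArray aggregation operator that converts an object into an array of { k, v } pairs, which can then go through $unwind, $match, $sort and $limit. On versions without it, one fallback is to fetch the document by uid and do the conversion app-side. The sketch below shows that fallback in plain JavaScript; querySubObjects and the sample data are hypothetical, and it assumes loading a few thousand sub-objects into the application is acceptable.

```javascript
// App-side sketch: turn the values of a dynamic-key object into an array,
// then filter, sort and limit them the way a query would.
function querySubObjects(objects, { filter, sortKey, limit }) {
  return (
    Object.entries(objects)
      // Keep the dynamic key next to the sub-object's own properties.
      .map(([k, v]) => ({ k, ...v }))
      .filter(filter)
      // Ascending sort on sortKey.
      .sort((a, b) => (a[sortKey] < b[sortKey] ? -1 : a[sortKey] > b[sortKey] ? 1 : 0))
      .slice(0, limit)
  );
}

// Hypothetical document with the shape from the question.
const doc = {
  uid: "12345",
  objects: {
    dkey1: { prop1: "val1", prop2: 3 },
    dkey2: { prop1: "other", prop2: 1 },
    dkey3: { prop1: "val1", prop2: 1 },
    dkey4: { prop1: "val1", prop2: 2 },
  },
};

const top = querySubObjects(doc.objects, {
  filter: (o) => o.prop1 === "val1",
  sortKey: "prop2",
  limit: 2,
});
console.log(top.map((o) => o.k)); // [ 'dkey3', 'dkey4' ]
```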
Why can't this be done with a normal find? According to the documentation, aggregation operations "group values from multiple documents together", and if I understood correctly, that's not what you want to do here.
Try this:
db.objects.find({prop1: 'val1'}).sort({prop2: 1}).limit(10)
This was tested in the mongo shell
objects would be your collection
the number 1 on sort means ascending order, -1 would be descending
and the number 10 on limit is the limit value of course
--- Edit ----
If you want to access to properties of documents inside another document, use the dot notation.
Example: db.objects.find({'nestedobj.prop1': 'val1'})
--- Edit 2 ---
Now I see I misunderstood your question. Sorry. The problem here is that I don't think there is an operator that will let you access any embedded document (I really don't know; I can look into it, but not right now).
But maybe it helps to know that if you are going to use 'aggregate', '$match' is there to filter the results, so uid wouldn't go in that pipeline stage. MongoDB 2.4 supports JavaScript functions executed in the database, but the purpose of the aggregation framework, as I said before, is to map-reduce the documents, so this isn't the best-case scenario. I would be concerned about the performance and the ability of the database engine to accomplish what you want, but I think it should be tested before dismissing the idea.
Sorry again for misunderstanding your question and I hope you can solve your problem. Let me know if you do and how!

Find a set of documents in collection A, based on an array from collection B

The following data needs to be stored in MongoDb:
A collection of persons (approximately 100-2000) together with their relevant attributes.
Another collection of queues (approximately 5-50).
Information about the relationship between persons and queues. Each person can stand in line in several queues, and each queue can hold several persons. The order of the persons waiting in a queue is important.
Currently this is what I have in mind:
Persons:
{
  _id: ObjectId("507c35dd8fada716c89d0001"),
  first_name: 'john',
  email: 'john.doe#doe.com',
  id_number: 8101011234,
  ...
},
Queues:
{
  _id: ObjectId("507c35dd8fada716c89d0011"),
  title: 'A title for this queue',
  people_waiting: [
    ObjectId("507c35dd8fada716c89d0001"),
    ObjectId("507c35dd8fada716c89d0002"),
    ObjectId("507c35dd8fada716c89d0003"),
    ...
  ]
},
In a web page, I want to list (in order) all persons standing in a certain queue. I'm thinking that I first need to query the 'people_waiting' array from the 'Queues' collection, and then loop through this array and query each item from the 'Persons' collection.
But that seems like a lot of queries to generate this list, and I wonder if there is a smarter way to write or combine queries than the way described above.
You can only query one collection at a time in MongoDB, so it does take two queries. But you can use $in instead of looping through array and querying each person individually.
In the shell:
queue = db.Queues.findOne({_id: idOfQueue});
peopleWaiting = db.Persons.find({_id: {$in: queue.people_waiting}}).toArray();
But peopleWaiting will not be sorted by the order of the ids in the queue and there's no support for doing that in a MongoDB query. So you'd have to reorder peopleWaiting in your code to match the order in queue.people_waiting.
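The reordering step described above can be done with a lookup map instead of a nested search. This is a plain-JavaScript sketch: orderByIds is a hypothetical helper, plain strings stand in for ObjectIds, and it assumes the ID type converts to a stable string (as ObjectId does), so that IDs compare by value rather than by object identity. IDs with no matching person are dropped.

```javascript
// Reorder documents returned by an $in query to match the order of the
// ID array they were fetched with.
function orderByIds(docs, ids) {
  // Index the fetched documents by the string form of their _id.
  const byId = new Map(docs.map((doc) => [String(doc._id), doc]));
  return (
    ids
      .map((id) => byId.get(String(id)))
      // Skip IDs whose person document was not found.
      .filter((doc) => doc !== undefined)
  );
}

// Hypothetical data: plain strings stand in for ObjectIds.
const queue = { people_waiting: ["p2", "p3", "p1"] };
const peopleWaiting = [
  { _id: "p1", first_name: "john" },
  { _id: "p3", first_name: "maria" },
  { _id: "p2", first_name: "ben" },
];

console.log(orderByIds(peopleWaiting, queue.people_waiting).map((p) => p.first_name));
// [ 'ben', 'maria', 'john' ]
```

Building the map costs one pass over the results, so the reordering stays linear in the queue length even for the 100-2000 persons mentioned in the question.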

MongoDB: Speed of field ("inside record") search in comparison with speed of search in "global scope"

My question may not be well formulated because I haven't worked with MongoDB yet, but I'd like to know one thing.
I have an object (record/document/anything else) in my database, in global scope,
and it holds a really huge array of other objects.
So, how does the speed of a search in global scope compare with a search "inside" the object? Is it possible to index all the "inner" records?
Thanks beforehand.
So, like this
users: {
..
user_maria:
{
age: "18",
best_comments :
{
goodnight:"23rr",
sleeptired:"dsf3"
..
}
}
user_ben:
{
age: "18",
best_comments :
{
one:"23rr",
two:"dsf3"
..
}
}
So, how can I quickly find user_maria->best_comments->goodnight (i.e. index the contents of the "best_comments" objects)?
First of all, your example schema is very questionable. If you want to embed comments (which is a big if), you'd want to store them in an array for appropriate indexing. Also, post your schema in JSON format so we don't have to parse the whole name/value thing:
db.users {
name:"maria",
age: 18,
best_comments: [
{
title: "goodnight",
comment: "23rr"
},
{
title: "sleeptired",
comment: "dsf3"
}
]
}
With that schema in mind you can put an index on name and best_comments.title, for example like so:
db.users.ensureIndex({name: 1, 'best_comments.title': 1})
Then, when you want the query you mentioned, simply do
db.users.find({name: "maria", 'best_comments.title': "goodnight"})
And the database will hit the index and will return this document very fast.
Now, all that said, your schema is very questionable. You mention you want to query specific comments, but that requires either comments being in a separate collection or you filtering the comments array app-side. Additionally, having huge, ever-growing embedded arrays in documents can become a problem. Documents have a 16 MB limit, and if documents increase in size all the time, mongo will have to continuously move them on disk.
My advice:
Put comments in a separate collection
Either do one document per comment or make comment bucket documents (say, 100 comments per document)
Read up on Mongo/NoSQL schema design. You always query for root documents so if you end up needing a small part of a large embedded structure you need to reexamine your schema or you'll be pumping huge documents over the connection and require app-side filtering.
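The bucket suggestion can be sketched as a function that splits one user's comments into documents holding at most 100 comments each. The field names user_id, bucket and count are hypothetical, and the sketch only shows the document shape; it does not write anything to MongoDB.

```javascript
// Sketch of the "bucket" idea: split one user's comments into documents
// that each hold at most bucketSize comments.
function toBuckets(userId, comments, bucketSize = 100) {
  const buckets = [];
  for (let i = 0; i < comments.length; i += bucketSize) {
    const slice = comments.slice(i, i + bucketSize);
    buckets.push({
      user_id: userId, // hypothetical field linking back to the user
      bucket: buckets.length, // hypothetical 0-based sequence number
      count: slice.length,
      comments: slice,
    });
  }
  return buckets;
}

// 250 hypothetical comments produce buckets of 100, 100 and 50.
const comments = Array.from({ length: 250 }, (_, i) => ({ title: `t${i}`, comment: `c${i}` }));
const buckets = toBuckets("maria", comments);
console.log(buckets.map((b) => b.count)); // [ 100, 100, 50 ]
```

Each bucket stays far below the 16 MB document limit, and a new bucket is started instead of growing one document forever.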
I'm not sure I understand your question but it sounds like you have one record with many attributes.
record = {'attr1':1, 'attr2':2, etc.}
You can create an index on any single attribute or any combination of attributes. Also, you can create any number of indices on a single collection (MongoDB collection == MySQL table), whether or not each record in the collection has the attributes being indexed on.
edit: I don't know what you mean by 'global scope' within MongoDB. To insert any data, you must define a database and collection to insert that data into.
Database 'Example':
Collection 'table1':
records:
{a: 1, b: 1, c: 1}
{a: 1, b: 2, d: 1}
{a: 1, c: 1, d: 1}
indices:
db.table1.ensureIndex({a: 1, d: 1}) <- this indexes on a, then on d; the fact that record 1 has no attribute 'd' doesn't matter, and the index will speed up queries on a and d
edit 2:
Well first of all, in your table here, you are assigning multiple values to the attribute "name" and "value". MongoDB will ignore/overwrite the original instantiations of them, so only the final ones will be included in the collection.
I think you need to reconsider your schema here. You're trying to use it as a series of key value pairs, and it is not specifically suited for this (if you really want key value pairs, check out Redis).
Check out: http://www.jonathanhui.com/mongodb-query