MongoDB aggregation: find document pairs using a list of pairs that cannot go together

I have the following data structure:
{
"_id": <id_of_document>,
"name": <name_of_person>
}
I am trying to find a way to match each person randomly with another person in the collection.
As a constraint, I have a list of pairs of ids that can't go together, e.g.
[
[1, 2],
[10, 6],
]
This says that 1 and 2 cannot go together and that 10 and 6 cannot go together, but 1 and 10 could go together, as could 6 and 2.
How would I express this in a MongoDB aggregation?
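To make the constraint concrete, here is a minimal client-side sketch in Python rather than an aggregation. It assumes an even number of ids and simply reshuffles until no forbidden pair shows up; the names random_pairs, forbidden and max_tries are made up for illustration.

```python
import random

def random_pairs(ids, forbidden, max_tries=1000, rng=random):
    """Randomly pair up ids, retrying until no forbidden pair appears.

    Assumes an even number of ids; returns None if no valid pairing
    is found within max_tries shuffles.
    """
    # Store forbidden pairs as unordered sets so [1, 2] also blocks (2, 1).
    blocked = {frozenset(pair) for pair in forbidden}
    ids = list(ids)
    for _ in range(max_tries):
        rng.shuffle(ids)
        # Neighbours in the shuffled list become partners.
        pairs = list(zip(ids[0::2], ids[1::2]))
        if all(frozenset(p) not in blocked for p in pairs):
            return pairs
    return None

pairs = random_pairs([1, 2, 6, 10], [[1, 2], [10, 6]])
```

Retrying is only reasonable when the forbidden list is small compared to the number of people; with many constraints a proper matching algorithm would probably be needed.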

Related

How to do multiple where query without effect data in TypeORM?

I want to apply multiple where conditions without them affecting each other's data. I want to get records that include at least one value from each array. Pseudocode:
data =[1,3]
array1 = [1,2]
array2 = [3,4]
if(data.IsIntersect(array1) and data.IsIntersect(array2))
IsIntersect checks whether there is an intersection between the arrays.
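For reference, the pseudocode above can be written as runnable Python. This is only a sketch of the intended check, not TypeORM code; is_intersect is a hypothetical helper name.

```python
def is_intersect(data, candidates):
    """Return True if data and candidates share at least one element."""
    return not set(data).isdisjoint(candidates)

data = [1, 3]
array1 = [1, 2]
array2 = [3, 4]

# True only when data hits both arrays at least once.
matches = is_intersect(data, array1) and is_intersect(data, array2)
```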
What I have so far:
queryBuilder.andWhere(
'properties.id IN (:...sizeIds) AND properties.id IN (:...colorIds)',
{ sizeIds: [1, 2], colorIds: [3, 4] },
);
It returns an empty result because it first checks properties against sizeIds and then against colorIds. For example:
properties includes 1 and 3
the check for sizeIds returns 1
the check for colorIds returns nothing
How can I do that with TypeORM?
How can properties.id be 1 and 3? And if it is, how could 1 or 3 be in both? You're asking for the impossible.
I assume you mean to ask for when properties.id is 1 or 3, because if it is [1,3] then you should use the Postgres array syntax {1,3} and the ANY keyword (some variation on this: "Check if value exists in Postgres array").
TL;DR: I think all you need is brackets and OR instead of AND:
queryBuilder.andWhere(
'(properties.id IN (:...sizeIds) OR properties.id IN (:...colorIds))',
{ sizeIds: [1, 2], colorIds: [3, 4] },
);
If properties.id is in fact an array, then please add the entity definition to your question. If you want to merge the rows where properties.id is in the list you will need a GROUP BY (https://orkhan.gitbook.io/typeorm/docs/select-query-builder).

How can I perform a word count in spring-batch and sort the output?

Given an input of some objects each containing a set of strings, I want to count the number of occurrences of each string for the entire batch, and output the word counts (to a CSV in my case) alongside each string (preferably sorted by frequency). How can I achieve this in spring batch? I can't find any suitable examples. I've tried to implement this using item readers/processors but am getting the output duplicated, I'm assuming because it's chunked? Should I use Tasklets for this?
"Spring Batch: Aggregating records and write count" seems close to what I want to achieve, but it's not clear to me how it works.
The input is along the lines of:
[{
"id": 1,
"tags": ["foo", "bar"]
}, {
"id": 2,
"tags": ["foo", "baz"]
}
...]
and the desired output from that would be
foo, 2
bar, 1
baz, 1
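Leaving Spring Batch aside, the aggregation being asked for can be sketched in a few lines of Python. This is not Spring Batch code; it only illustrates that a single counter has to span the whole input rather than one chunk. count_tags is a hypothetical name.

```python
from collections import Counter

def count_tags(items):
    """Count tag occurrences across the whole input, most frequent first."""
    counts = Counter()
    for item in items:
        counts.update(item["tags"])
    # most_common() sorts by descending count.
    return counts.most_common()

items = [
    {"id": 1, "tags": ["foo", "bar"]},
    {"id": 2, "tags": ["foo", "baz"]},
]
rows = count_tags(items)
```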

How to sum arrays from different documents in MongoDB Aggregation?

I have a collection of documents, each containing an array of revenues of different lengths.
I want to get a single array of revenue from the values that match the query.
Example data
...
{"cohort": "2112", "revenue": [1, 1, 0, 0, 5], ...},
{"cohort": "2113", "revenue": [0, 0, 2, 0], ...},
{"cohort": "2114", "revenue": [0, 1, 3], ...}
...
Expected result for cohorts 2113 and 2114
[0, 1, 5] or [0, 1, 5, 0]
The two results are equal for my purpose, since I know the length of the shortest array.
Is there any way to perform the operation with MongoDB Aggregate pipeline?
Or can you suggest a better solution?
And yes, I use PyMongo to access the database.
I just discovered this new 3.2 feature: includeArrayIndex.
So you can unwind the revenue field with this option, group on the includeArrayIndex field as the _id while summing the values, and then sort and push to get a new array.
Reference: https://docs.mongodb.org/manual/reference/operator/aggregation/unwind/
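A rough PyMongo-style sketch of those steps, assuming the field names from the example data (cohort, revenue). The index field name idx, the collection name in the comment, and both function names are made up here. sum_by_index is a pure-Python reference for what the pipeline is meant to compute, so the idea can be checked without a database.

```python
from itertools import zip_longest

def revenue_sum_pipeline(cohorts):
    """Build an aggregation pipeline that sums revenue arrays index by index."""
    return [
        {"$match": {"cohort": {"$in": list(cohorts)}}},
        # Emit one document per array element, keeping its position in "idx".
        {"$unwind": {"path": "$revenue", "includeArrayIndex": "idx"}},
        # Sum all values that sat at the same array position.
        {"$group": {"_id": "$idx", "total": {"$sum": "$revenue"}}},
        {"$sort": {"_id": 1}},
        # Collect the per-position totals back into one array.
        {"$group": {"_id": None, "revenue": {"$push": "$total"}}},
    ]

def sum_by_index(arrays):
    """Pure-Python reference: element-wise sum, padding short arrays with 0."""
    return [sum(column) for column in zip_longest(*arrays, fillvalue=0)]

# With PyMongo this would be run as (collection name is hypothetical):
#   result = list(db.cohorts.aggregate(revenue_sum_pipeline(["2113", "2114"])))
expected = sum_by_index([[0, 0, 2, 0], [0, 1, 3]])
```

For cohorts 2113 and 2114 the reference function gives the longer of the two results mentioned in the question, since shorter arrays are padded rather than longer ones truncated.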

Mongo: How do I group documents based on one of two fields?

If I have documents that look like:
{
_id: 1,
val1: x,
val2: aa
},
{
_id: 2,
val1: y,
val2: bb
},
{
_id: 3,
val1: x,
val2: cc
},
{
_id: 4,
val1: z,
val2: bb
}
Is it possible to group them in MongoDB so that docs 1 and 3 are paired and docs 2 and 4 are paired?
In essence, I'm looking to group docs if their val1 OR val2 are the same. Also, no doc can belong to two different groups, meaning I should be able to partition the set of docs. Is this possible in Mongo?
Ultimately, I want to partition my set of documents based on the aforementioned criteria and then count the size of each subset.
I've tried attacking this problem by grouping on the val1 field and using $addToSet to create an array of val2's. But then I'm stuck because I don't know of a way in Mongo to merge arrays that contain at least one common element. If I did, I could use that list of arrays and aggregate again using $in.
Please let me know if I can clarify my question in any way!
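The "merge arrays that share an element" step is a connected-components problem, which can be sketched on the client side with a union-find structure in Python. This is not a Mongo aggregation, only an illustration of the grouping rule; partition_sizes is a hypothetical name, and the placeholder values x, aa, etc. are treated as strings.

```python
from collections import Counter

def partition_sizes(docs):
    """Group docs that share val1 or val2 (transitively) and count each group."""
    parent = {}

    def find(x):
        # Follow parent links to the root, shortening the path as we go.
        while parent.setdefault(x, x) != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for doc in docs:
        # Link each document to its two values; shared values join documents.
        union(("doc", doc["_id"]), ("val1", doc["val1"]))
        union(("doc", doc["_id"]), ("val2", doc["val2"]))

    groups = Counter(find(("doc", doc["_id"])) for doc in docs)
    return sorted(groups.values())

docs = [
    {"_id": 1, "val1": "x", "val2": "aa"},
    {"_id": 2, "val1": "y", "val2": "bb"},
    {"_id": 3, "val1": "x", "val2": "cc"},
    {"_id": 4, "val1": "z", "val2": "bb"},
]
sizes = partition_sizes(docs)
```

Here docs 1 and 3 end up together through val1 and docs 2 and 4 through val2, so the result is two groups of size 2.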

Is there a way to return part of an array in a document in MongoDB?

Pretend I have this document:
{
"name": "Bob",
"friends": [
"Alice",
"Joe",
"Phil"
],
"posts": [
12,
15,
55,
61,
525,
515
]
}
All is good with only a handful of posts. However, let's say posts grows substantially (and gets to the point of 10K+ posts). A friend mentioned that I might be able to keep the array in order (i.e. the first entry is the ID of the newest post so I don't have to sort) and append new posts to the beginning. This way, I could get the first, say, 10 elements of the array to get the 10 newest items.
Is there a way to retrieve only n posts at a time? I don't need 10K posts being returned when most of them won't even be looked at, but I still need to keep them around for records.
You can use MongoDB's $slice operator in the projection to get n elements from the array, like the following:
db.collection.find({
    // add condition here
}, {
    "posts": {
        $slice: 3 // number of elements to return;
                  // a negative number slices from the end of the array
    }
})
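The same projection can be sketched with PyMongo, where the operator name has to be quoted as a string. newest_posts is a hypothetical helper, and collection is assumed to be a PyMongo Collection object (or anything with a compatible find method).

```python
def newest_posts(collection, query, n=10):
    """Fetch matching documents with only the first n entries of "posts"."""
    # $slice in the projection trims the array on the server side.
    projection = {"posts": {"$slice": n}}
    return list(collection.find(query, projection))
```

Called as newest_posts(db.collection, {"name": "Bob"}, n=3), this should return Bob's document with only the first three post ids, assuming the array is kept newest-first as described in the question.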
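You can also do this on the client: build a list of the posts you want (say, the first 3 posts) for each document and return it. Note that the whole array is still fetched from the server.
def first_posts(db, query, n=3):
    results = []
    for doc in db.collection.find(query):
        # keep only the first n posts of each matching document
        results.append(doc["posts"][:n])
    return results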