I have a collection named genre_collection with the following structure:
user | genres
----------------
1 | comedy
1 | action
1 | thriller
1 | comedy
1 | action
2 | war
2 | adventure
2 | war
2 | thriller
I'm trying to find the count of each genre for each user, i.e. my ideal final result would be something like this:
1 | comedy    | 2
1 | action    | 2
1 | thriller  | 1
2 | war       | 2
2 | adventure | 1
2 | thriller  | 1
Any help would be really appreciated.
You can do this with aggregation using $group.
Try this:
db.genre_collection.aggregate([
    {
        $group: {
            _id: {
                genre: "$genres",
                user: "$user"
            },
            count: {
                $sum: 1
            }
        }
    }
])
output:
{ "_id" : { "genre" : "adventure", "user" : 2 }, "count" : 1 }
{ "_id" : { "genre" : "action", "user" : 1 }, "count" : 2 }
{ "_id" : { "genre" : "thriller", "user" : 2 }, "count" : 1 }
{ "_id" : { "genre" : "war", "user" : 2 }, "count" : 2 }
{ "_id" : { "genre" : "comedy", "user" : 1 }, "count" : 2 }
{ "_id" : { "genre" : "thriller", "user" : 1 }, "count" : 1 }
Try this:
db.genre_collection.aggregate([
    { "$group" : { _id : { user : "$user", genres : "$genres" }, count : { $sum : 1 } } }
])
Hope it helps!
I need to delete certain entries from an Elasticsearch table. I cannot find any hints in the documentation. I'm also an Elasticsearch noob. The rows to be deleted are identified by their type and an owner_id. Is it possible to call deleteByQuery with multiple parameters? Or are there any alternatives to achieve the same result?
I'm using this library: https://github.com/sksamuel/elastic4s
This is what the table looks like:
| id | type | owner_id | cost |
|------------------------------|
| 1 | house | 1 | 10 |
| 2 | hut | 1 | 3 |
| 3 | house | 2 | 16 |
| 4 | house | 1 | 11 |
In the code it looks like this currently:
deleteByQuery(someIndex, matchQuery("type", "house"))
and I would need something like this:
deleteByQuery(someIndex, matchQuery("type", "house"), matchQuery("owner_id", 1))
But this won't work since deleteByQuery only accepts a single Query.
In this example it should delete the entries with id 1 and 4.
Explaining it in JSON and REST API format, to make it clearer.
Index the sample documents:
PUT myindex/_doc/1
{
    "type" : "house",
    "owner_id" : 1
}
PUT myindex/_doc/2
{
    "type" : "hut",
    "owner_id" : 1
}
PUT myindex/_doc/3
{
    "type" : "house",
    "owner_id" : 2
}
PUT myindex/_doc/4
{
    "type" : "house",
    "owner_id" : 1
}
Search using a boolean query:
GET myindex/_search
{
    "query": {
        "bool": {
            "must": [
                {
                    "match": {
                        "type": "house"
                    }
                }
            ],
            "filter": [
                {
                    "term": {
                        "owner_id": 1
                    }
                }
            ]
        }
    }
}
And the query result:
"hits" : [
{
"_index" : "myindex",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.35667494,
"_source" : {
"type" : "house",
"owner_id" : 1
}
},
{
"_index" : "myindex",
"_type" : "_doc",
"_id" : "4",
"_score" : 0.35667494,
"_source" : {
"type" : "house",
"owner_id" : 1
}
}
]
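Once the search returns exactly the documents you want to remove, the same bool query can be used as the body of the delete by query API. A sketch in REST form (in elastic4s, the equivalent is to combine the match and the term clause into a single bool query and pass that one query to deleteByQuery):
POST myindex/_delete_by_query
{
    "query": {
        "bool": {
            "must": [
                { "match": { "type": "house" } }
            ],
            "filter": [
                { "term": { "owner_id": 1 } }
            ]
        }
    }
}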
I have some input data:
Brand | Model | Number
Peugeot | 208 | 1
Peugeot | 4008 | 2
Renault | Clio | 3
Renault | Megane | 4
I would like to get both:
the sum for each brand
the global sum
Here is my expected output:
Brand | Number
Peugeot | 3
Renault | 7
Total | 10
I think I have to create two $group operations and set Total with $literal.
What is the right way to do so?
As you said, this can be done with two group-bys, so let's start by putting some data into mongo similar to your example input:
> db.cars.insertMany([
{ "Brand" : "Peugeot", "Model" : "208", "Number": 1 },
{ "Brand" : "Peugeot", "Model" : "4008", "Number": 2 },
{ "Brand" : "Renault", "Model" : "Clio", "Number": 3 },
{ "Brand" : "Renault", "Model" : "Megane", "Number": 4 }
]);
Now that we've got all our cars inserted, we can aggregate them using two $group stages:
db.cars.aggregate([
{ $group : { "_id" : "$Brand", "Number" : { $sum : "$Number" }}},
{ $group : { "_id" : null, "Rows" : { $push : { "Brand" : "$$ROOT._id", "Number" : "$Number" } }, "Total" : {$sum : "$Number" } }}
])
This will give us the following output
{
    "_id" : null,
    "Rows" : [
        {
            "Brand" : "Renault",
            "Number" : 7
        },
        {
            "Brand" : "Peugeot",
            "Number" : 3
        }
    ],
    "Total" : 10
}
We can then clean it up with a projection
db.cars.aggregate([
{ "$group" : { "_id" : "$Brand", "Number" : { $sum : "$Number" }}},
{ "$group" : { "_id" : null, "Rows" : { $push : { "Brand" : "$$ROOT._id", "Number" : "$Number" } }, "Total" : {$sum : "$Number" } } },
{ "$project" : { "_id" : 0, "Data" : { "$concatArrays" : [ "$Rows", [ { "Brand": { $literal : "Total" }, "Number" : "$Total" } ] ] } } }
])
Giving us the following result
{
    "Data" : [
        {
            "Brand" : "Renault",
            "Number" : 7
        },
        {
            "Brand" : "Peugeot",
            "Number" : 3
        },
        {
            "Brand" : "Total",
            "Number" : 10
        }
    ]
}
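If you would rather get one document per row, matching the expected Brand / Number table exactly, the Data array can be unwound and each element promoted to the root. This is a sketch building on the pipeline above; it uses $replaceRoot, which requires MongoDB 3.4 or later:
db.cars.aggregate([
    { "$group" : { "_id" : "$Brand", "Number" : { "$sum" : "$Number" } } },
    { "$group" : { "_id" : null, "Rows" : { "$push" : { "Brand" : "$_id", "Number" : "$Number" } }, "Total" : { "$sum" : "$Number" } } },
    { "$project" : { "_id" : 0, "Data" : { "$concatArrays" : [ "$Rows", [ { "Brand" : { "$literal" : "Total" }, "Number" : "$Total" } ] ] } } },
    { "$unwind" : "$Data" },
    { "$replaceRoot" : { "newRoot" : "$Data" } }
])
This would return { "Brand" : "Renault", "Number" : 7 }, { "Brand" : "Peugeot", "Number" : 3 } and { "Brand" : "Total", "Number" : 10 } as separate documents.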
Say in mongo I have a collection that looks like this:
+----+-----+-----+----------+
| id | x | y | quantity |
+----+-----+-----+----------+
| 1 | abc | jkl | 5 |
+----+-----+-----+----------+
| 2 | jkl | xyz | 10 |
+----+-----+-----+----------+
| 3 | xyz | abc | 20 |
+----+-----+-----+----------+
I want to do a $group where x equals y and sum up the quantity. So the output would look like:
+-----+-------+
| x | total |
+-----+-------+
| abc | 25 |
+-----+-------+
| jkl | 15 |
+-----+-------+
| xyz | 30 |
+-----+-------+
Is this even possible to do in mongo?
You won't be performing a $group to retrieve these results; you'll be performing a $lookup, a feature that is new in MongoDB 3.2.
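For reference, the sample rows from the question can be loaded into a collection (named join here, which is the name the pipeline below assumes) with the id column used as _id:
db.join.insertMany([
    { "_id" : 1, "x" : "abc", "y" : "jkl", "quantity" : 5 },
    { "_id" : 2, "x" : "jkl", "y" : "xyz", "quantity" : 10 },
    { "_id" : 3, "x" : "xyz", "y" : "abc", "quantity" : 20 }
])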
Using the sample data you provided, the aggregation would be the following:
db.join.aggregate([
    {
        // self-join: match each document's x against other documents' y in the same collection
        "$lookup" : {
            "from" : "join",
            "localField" : "x",
            "foreignField" : "y",
            "as" : "matching_field"
        }
    },
    {
        // one document per matched entry
        "$unwind" : "$matching_field"
    },
    {
        // add the document's own quantity to the matched document's quantity
        "$project" : {
            "_id" : 0,
            "x" : 1,
            "total" : { "$sum" : [ "$quantity", "$matching_field.quantity" ] }
        }
    }
])
The sample data set is pretty simple, so you'll need to test the behavior when more than a single matching document is returned for a value, etc.
Edit:
It gets more complicated if there can be more than a single match between X and Y.
// Add document to return more than a single match for abc
db.join.insert( { "x" : "123", "y" : "abc", "quantity" : 100 })
// Had to add $group stage to consolidate matched results
db.join.aggregate([
    {
        "$lookup" : {
            "from" : "join",
            "localField" : "x",
            "foreignField" : "y",
            "as" : "matching_field"
        }
    },
    {
        "$unwind" : "$matching_field"
    },
    {
        "$group" : {
            "_id" : { "x" : "$x", "quantity" : "$quantity" },
            "matched_quantities" : { "$sum" : "$matching_field.quantity" }
        }
    },
    {
        "$project" : {
            "x" : "$_id.x",
            "total" : { "$sum" : [ "$_id.quantity", "$matched_quantities" ] }
        }
    }
])
I have a collection in MongoDB to which a document with sampling data is added every day. I want to observe how its fields change over time.
I want to use a MongoDB aggregation to condense consecutive rows with the same value into the first row of each run:
+--+-------------------------+
|id|field | date |
+--+-------------------------+
| 1|hello | date1|
+--+-------------------------+
| 2|foobar | date2| \_ Condense these into one row with date2
+--+-------------------------+ /
| 3|foobar | date3|
+--+-------------------------+
| 4|hello | date4|
+--+-------------------------+
| 5|world | date5| \__ Condense these into a row with date5
+--+-------------------------+ /
| 6|world | date6|
+--+-------------------------+
| 7|puppies | date7|
+--+-------------------------+
| 8|kittens | date8| \__ Condense these into a row with date8
+--+-------------------------+ /
| 9|kittens | date9|
+--+-------------------------+
Is it possible to create a mongoDB aggregation for this problem?
Here is an answer to a similar problem in MySQL:
Grouping similar rows next to each other in MySQL
Sample Data
Data are already sorted by date.
These documents:
{ "_id" : "566ee064d56d02e854df756e", "date" : "2015-12-14T15:29:40.432Z", "score" : 59 },
{ "_id" : "566a8c70520d55771f2e9871", "date" : "2015-12-11T08:42:23.880Z", "score" : 60 },
{ "_id" : "566932f5572bd1720db7a4ef", "date" : "2015-12-10T08:08:21.514Z", "score" : 60 },
{ "_id" : "5667e652c021206f34e2c9e4", "date" : "2015-12-09T08:29:06.696Z", "score" : 60 },
{ "_id" : "5666a468cc45e9d9a82b81c9", "date" : "2015-12-08T09:35:35.837Z", "score" : 61 },
{ "_id" : "56653fe099799049b66dab97", "date" : "2015-12-07T08:14:24.494Z", "score" : 60 },
{ "_id" : "5663f6b3b7d0b00b74d9fdf9", "date" : "2015-12-06T08:49:55.299Z", "score" : 60 },
{ "_id" : "56629fb56099dfe31b0c72be", "date" : "2015-12-05T08:26:29.510Z", "score" : 60 }
should group to:
{ "_id" : "566ee064d56d02e854df756e", "date" : "2015-12-14T15:29:40.432Z", "score" : 59 }
{ "_id" : "566a8c70520d55771f2e9871", "date" : "2015-12-11T08:42:23.880Z", "score" : 60 }
{ "_id" : "5666a468cc45e9d9a82b81c9", "date" : "2015-12-08T09:35:35.837Z", "score" : 61 }
{ "_id" : "56653fe099799049b66dab97", "date" : "2015-12-07T08:14:24.494Z", "score" : 60 }
If you don't insist on using the aggregation framework, this could be done by iterating over the cursor and comparing each document to the previous one:
// fetch all documents sorted by date (newest first)
var docs = db.test.find().sort({date: -1}).toArray();
var result = [];
result.push(docs[0]); // the first document must always be kept
for (var i = 1; i < docs.length; i++) {
    // keep a document only if its score differs from the previous document's score
    if (docs[i].score != docs[i-1].score) {
        result.push(docs[i]);
    }
}
result:
[
    {
        "_id" : "566ee064d56d02e854df756e",
        "date" : "2015-12-14T15:29:40.432Z",
        "score" : 59
    },
    {
        "_id" : "566a8c70520d55771f2e9871",
        "date" : "2015-12-11T08:42:23.880Z",
        "score" : 60
    },
    {
        "_id" : "5666a468cc45e9d9a82b81c9",
        "date" : "2015-12-08T09:35:35.837Z",
        "score" : 61
    },
    {
        "_id" : "56653fe099799049b66dab97",
        "date" : "2015-12-07T08:14:24.494Z",
        "score" : 60
    }
]
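Since MongoDB 5.0 the same thing can also be expressed in the aggregation framework with $setWindowFields and $shift. This is a sketch assuming the collection is named test and the compared field is score, as in the sample data: each document is compared to the previous one in date-descending order and kept only when the score changes.
db.test.aggregate([
    {
        $setWindowFields: {
            sortBy: { date: -1 },
            output: {
                // score of the previous document in date-descending order (null for the first one)
                prevScore: { $shift: { output: "$score", by: -1, default: null } }
            }
        }
    },
    { $match: { $expr: { $ne: [ "$score", "$prevScore" ] } } },
    { $project: { prevScore: 0 } },
    { $sort: { date: -1 } }
])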
I have two PostgreSQL tables with the following data:
houses:
-# select * from houses;
id | address
----+----------------
1 | 123 Main Ave.
2 | 456 Elm St.
3 | 789 County Rd.
(3 rows)
and people:
-# select * from people;
id | name | house_id
----+-------+----------
1 | Fred | 1
2 | Jane | 1
3 | Bob | 1
4 | Mary | 2
5 | John | 2
6 | Susan | 2
7 | Bill | 3
8 | Nancy | 3
9 | Adam | 3
(9 rows)
In Spoon I have two table inputs, the first named House Input with the SQL:
SELECT
id
, address
FROM houses
ORDER BY id;
The second table input is named People Input with the SQL:
SELECT
"name"
, house_id
FROM people
ORDER BY house_id;
I have both table inputs going into a Merge Join that uses House Input as the first step with a key of id and People Input as the second step with a key of house_id.
I then have this going into a MongoDB Output with the database demo, collection houses, and Mongo document fields address and name (as I am expecting MongoDB to assign the _id).
When I run the transformation and type db.houses.find(); from a Mongo shell, I get:
{ "_id" : ObjectId("52083706b251cc4be9813153"), "address" : "123 Main Ave.", "name" : "Fred" }
{ "_id" : ObjectId("52083706b251cc4be9813154"), "address" : "123 Main Ave.", "name" : "Jane" }
{ "_id" : ObjectId("52083706b251cc4be9813155"), "address" : "123 Main Ave.", "name" : "Bob" }
{ "_id" : ObjectId("52083706b251cc4be9813156"), "address" : "456 Elm St.", "name" : "Mary" }
{ "_id" : ObjectId("52083706b251cc4be9813157"), "address" : "456 Elm St.", "name" : "John" }
{ "_id" : ObjectId("52083706b251cc4be9813158"), "address" : "456 Elm St.", "name" : "Susan" }
{ "_id" : ObjectId("52083706b251cc4be9813159"), "address" : "789 County Rd.", "name" : "Bill" }
{ "_id" : ObjectId("52083706b251cc4be981315a"), "address" : "789 County Rd.", "name" : "Nancy" }
{ "_id" : ObjectId("52083706b251cc4be981315b"), "address" : "789 County Rd.", "name" : "Adam" }
What I want to get is something like:
{ "_id" : ObjectId("52083706b251cc4be9813153"), "address" : "123 Main Ave.", "people" : [
{ "_id" : ObjectId("52083706b251cc4be9813154"), "name" : "Fred"} ,
{ "_id" : ObjectId("52083706b251cc4be9813155"), "name" : "Jane" } ,
{ "_id" : ObjectId("52083706b251cc4be9813155"), "name" : "Bob" }
]
},
{ "_id" : ObjectId("52083706b251cc4be9813156"), "address" : "345 Elm St.", "people" : [
{ "_id" : ObjectId("52083706b251cc4be9813157"), "name" : "Mary"} ,
{ "_id" : ObjectId("52083706b251cc4be9813158"), "name" : "John" } ,
{ "_id" : ObjectId("52083706b251cc4be9813159"), "name" : "Susan" }
]
},
{ "_id" : ObjectId("52083706b251cc4be981315a"), "address" : "789 County Rd.", "people" : [
{ "_id" : ObjectId("52083706b251cc4be981315b"), "name" : "Mary"} ,
{ "_id" : ObjectId("52083706b251cc4be981315c"), "name" : "John" } ,
{ "_id" : ObjectId("52083706b251cc4be981315d"), "name" : "Susan" }
]
}
}
I know why I am getting what I am getting, but can't seem to find anything online or in the examples to get me where I want to be.
I was hoping someone could nudge me in the right direction, point to an example that is closer to what I am trying to accomplish, or tell me that this is out of scope for what Kettle is supposed to do (Hopefully not the latter).
It turns out that creating sub-documents is all done in the MongoDB Output step.
First make sure that you have Upsert and Modifier update checked on the Configure connection tab.
Then on the Mongo document fields tab enter the following (the first row is the column names):
Name    | Mongo document Path | Use field name | Match field for upsert | Modifier operation | Modifier policy
--------+---------------------+----------------+------------------------+--------------------+----------------
address |                     | Y              | N                      | N/A                | Insert
address |                     | Y              | Y                      | N/A                | Insert
name    | people[0]           | Y              | N                      | $set               | Insert
name    | people[1]           | Y              | N                      | $push              | Update
Now when I run db.houses.find(); I get:
{ "_id" : ObjectId("520ccb8978d96b204daa029d"), "address" : "123 Main Ave.", "people" : [ { "name" : "Fred" }, { "name" : "Jane" }, { "name" : "Bob" } ] }
{ "_id" : ObjectId("520ccb8978d96b204daa029e"), "address" : "456 Elm St.", "people" : [ { "name" : "Mary" }, { "name" : "John" }, { "name" : "Susan" } ] }
{ "_id" : ObjectId("520ccb8a78d96b204daa029f"), "address" : "789 County Rd.", "people" : [ { "name" : "Bill" }, { "name" : "Nancy" }, { "name" : "Adam" } ] }
Two things I would like to note:
This assumes that my addresses are unique and that names are unique within a house. If this is not the case, I would need to map the ids from my OLTP tables to id (not _id) fields in MongoDB and match for upsert on the house id.
As @G Gordon Worley III pointed out above, if these two tables are in the same database, I could do the join in the Table Input step, and this would be a two-step transformation (and faster).
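As a side note, outside the Kettle solution: if the flat rows had already been loaded into a staging collection, say houses_flat (a hypothetical name), the nested shape could also be produced on the MongoDB side with a $group/$push aggregation:
db.houses_flat.aggregate([
    // one document per address, collecting the names into a people array
    { $group: { _id: "$address", people: { $push: { name: "$name" } } } },
    { $project: { _id: 0, address: "$_id", people: 1 } }
])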