Count multiple columns and true results only in a MongoDB query - mongodb

I'm new to MongoDB and have been trying to figure this out without success so far, though I think I'm close. I have records similar to this:
{
"_id" : ObjectId("5bfdf4385a37e507cff0ff62"),
"Search_word" : "job01",
"hadoop" : 0,
"hive" : 0,
"javascript" : 0,
"mongodb" : 0,
"sql" : 1,
"java" : 0,
"sas" : 0,
"powerbi" : 0,
"python" : 1,
"pig" : 0,
"scala" : 0
}
I'm trying to group by search word and count the 1s in each of the other fields. I've never used MongoDB before.
The end result would look like this:
job01, sql : 100, hive: 205, etc...
job02, sql : 121, hive 10, etc...
In Python it's literally this:
skill_data = df.groupby(by='Search_word').sum()
I tried something like this just to get the sql count where it is 1:
db.data_final.aggregate(
  { "$group": {
      _id: { Search_word: "$Search_word", sql: { "$eq": ["$sql", 1] } },
      count: { $sum: 1 }
  } }
)
But it gives me two counts: one for false, where sql is not equal to 1, and the one I want, where sql is 1. How can I get rid of the false one? This is what I get now:
{ "_id" : { "Search_word" : "job01", "sql" : true }, "count" : 124 } // I want only this one, and for all the other fields in one query
{ "_id" : { "Search_word" : "job01", "sql" : false}, "count" : 279 }
Any help is appreciated

Since the value of each field is either 0 or 1, you could simply sum the field per group; the sum equals the number of 1s.
db.collection.aggregate([{$group: {_id: "$Search_word", sql: {$sum: "$sql"}}}]);
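To cover every skill column in one query, the same $sum idea can be repeated per field. The following is a minimal sketch, assuming the field names from the sample document and the collection name data_final from the question; buildSkillGroupStage is a hypothetical helper written for illustration, not part of MongoDB.

```javascript
// Skill field names taken from the sample document in the question.
const skillFields = [
  "hadoop", "hive", "javascript", "mongodb", "sql", "java",
  "sas", "powerbi", "python", "pig", "scala",
];

// Hypothetical helper: builds a $group stage that sums each 0/1 field
// per Search_word, which is the same as counting the 1s.
function buildSkillGroupStage(fields) {
  const stage = { _id: "$Search_word" };
  for (const field of fields) {
    stage[field] = { $sum: "$" + field };
  }
  return { $group: stage };
}

const pipeline = [buildSkillGroupStage(skillFields)];

// In the mongo shell this would be run as (assumed collection name):
//   db.data_final.aggregate(pipeline)
console.log(JSON.stringify(pipeline, null, 2));
```

Building the stage from a list keeps the query short and makes it easy to add or drop a skill column later.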

Related

What is an efficient MongoDB query that skips the update and returns an error if a value already exists in an array?

I have entries stored in my collection like this:
{
"_id" : ObjectId("5d416c595f19962ff0680dbc"),
"data" : {
"a" : 6,
"b" : [
"5c35f04c4e92b8337885d9a6"
]
},
"image" : "123.jpg",
"hyperlinks" : "google.com",
"expirydate" : ISODate("2019-08-27T06:10:35.074Z"),
"createdate" : ISODate("2019-07-31T10:24:25.311Z"),
"lastmodified" : ISODate("2019-07-31T10:24:25.311Z"),
"__v" : 0
},
{
"_id" : ObjectId("5d416c595f19962ff0680dbd"),
"data" : {
"a" : 90,
"b" : [
"5c35f04c4e92b8337885d9a7"
]
},
"image" : "456.jpg",
"hyperlinks" : "google.com",
"expirydate" : ISODate("2019-08-27T06:10:35.074Z"),
"createdate" : ISODate("2019-07-31T10:24:25.311Z"),
"lastmodified" : ISODate("2019-07-31T10:24:25.311Z"),
"__v" : 0
}
I need to write a query that pushes a user id onto the b array inside the data object and increments the a counter, which is also inside data.
For that, I wrote this code:
db.collection.updateOne({_id: ObjectId("5d416c595f19962ff0680dbd")},
{$inc: {'data.a': 1}, $push: {'data.b': '124sdff54f5s4fg5'}}
)
I also want to check whether that id already exists in the array and, if it does, return a response saying so. For that I wrote an extra query that performs the check and returns the error response.
My question is: can a single query do this? I don't want to write two queries for a single task.
Any help is really appreciated.
You can add one more check in the update query on "data.b". Following would be the query:
db.collection.updateOne(
  {
    _id: ObjectId("5d416c595f19962ff0680dbd"),
    "data.b": {
      $ne: "124sdff54f5s4fg5"
    }
  },
  {
    $inc: {'data.a': 1},
    $push: {'data.b': '124sdff54f5s4fg5'}
  }
)
For a duplicate entry, you would get the following response:
{ "acknowledged" : true, "matchedCount" : 0, "modifiedCount" : 0 }
If matchedCount is 0, you can show the error that the id already exists.
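The check on matchedCount could look roughly like the sketch below. It assumes the update was run with the "data.b": { $ne: ... } filter shown above; describePushResult is a hypothetical helper name. Note that matchedCount is also 0 when no document has the given _id, so the message here covers both cases.

```javascript
// Hypothetical helper: turns an updateOne result into an application-level outcome.
// Assumes the filter was { _id: ..., "data.b": { $ne: userId } }.
function describePushResult(result, userId) {
  if (result.matchedCount === 0) {
    // Either userId is already in data.b, or no document has that _id.
    return { ok: false, error: "id " + userId + " already exists or document not found" };
  }
  return { ok: true };
}

// The response quoted above for a duplicate entry.
const duplicateResult = { acknowledged: true, matchedCount: 0, modifiedCount: 0 };
const successResult = { acknowledged: true, matchedCount: 1, modifiedCount: 1 };

console.log(describePushResult(duplicateResult, "124sdff54f5s4fg5"));
console.log(describePushResult(successResult, "124sdff54f5s4fg5"));
```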
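You can use the $addToSet operator, which adds the element only if it does not already exist in the array. Note that $inc in the same update still runs, so data.a is incremented even when the id is already present, and the response does not tell you the id was a duplicate.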
db.collection.updateOne({_id: ObjectId("5d416c595f19962ff0680dbd")},
{$inc: {'data.a': 1}, $addToSet: {'data.b': '124sdff54f5s4fg5'}}
)

Insert document into mongodb from existing table

I am trying to write a query in mongo that will create a new table, loop through my data set, and insert each Topexecutivetitle into the new table. I would also like it to keep a count of each position and only insert a position into the table when it is new.
This is what I have so far. This code loops through my table and inserts the Topexecutivetitle into a new table. However, it does not group the titles together or keep a count. How do I write my query so that it does?
db.car.find().forEach( function (x) {
db.TopExecutiveTable.insert({Topexecutivetitle: x.Topexecutivetitle})
});
Here is a sample of a document in my database.
{
"_id" : ObjectId("5a22c8e562c2e489c5df70fa"),
"2016rank" : 1,
"Dealershipgroupname" : "AutoNation Inc.?",
"Address" : "200 S.W. 1st Ave.",
"City/State/Zip" : "Fort Lauderdale, FL 33301",
"Phone" : "(954) 769-7000",
"Companywebsite" : "www.autonation.com",
"Topexecutive" : "Mike Jackson",
"Topexecutivetitle" : "chairman & CEO",
"Totalnewretailunits" : "337,622",
"Totalusedunits" : "225,713",
"Totalfleetunits" : 3,
"Totalwholesaleunits" : "82,342",
"Total_units" : "649,415",
"Total_number_of _dealerships" : 260,
"Grouprevenuealldepartments*" : "$21,609,000,000",
"2015rank" : 1
}
The result I would like is something like this:
{
"Topexecutivetitle" : "chairman & CEO",
"Count" : 3
},
{
"Topexecutivetitle" : "president",
"Count" : 7
}
To do this, use MongoDB's aggregate function, something like this:
db.car.aggregate([
  {
    $group: {
      _id: "$Topexecutivetitle",
      count: {$sum: 1}
    }
  },
  {
    $project: {
      Topexecutivetitle: "$_id",
      count: 1,
      _id: 0
    }
  },
  {
    $out: "result"
  }
])
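This will give you your desired counts and store them in a new collection "result". The $group stage produces documents like the following, and the $project stage then renames _id to Topexecutivetitle: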
{
"_id" : "president",
"count" : 1.0
},
{
"_id" : "chairman & CEO",
"count" : 3.0
}
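To make the grouping concrete, here is a small in-memory sketch of what the $group and $project stages compute. The sample documents are made up for illustration, and countByTitle is a hypothetical function; the real work would still be done by the aggregation pipeline on the server.

```javascript
// Made-up documents standing in for db.car; only the grouped field is shown.
const cars = [
  { Topexecutivetitle: "chairman & CEO" },
  { Topexecutivetitle: "president" },
  { Topexecutivetitle: "chairman & CEO" },
];

// Mirrors { $group: { _id: "$Topexecutivetitle", count: { $sum: 1 } } }
// followed by the $project stage that renames _id to Topexecutivetitle.
function countByTitle(docs) {
  const counts = new Map();
  for (const doc of docs) {
    const title = doc.Topexecutivetitle;
    counts.set(title, (counts.get(title) || 0) + 1);
  }
  return Array.from(counts, ([Topexecutivetitle, count]) => ({ Topexecutivetitle, count }));
}

const titleCounts = countByTitle(cars);
console.log(titleCounts);
```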

MapReduce in MongoDB doesn't output

I was trying to use MongoDB 2.4.3 (also tried 2.4.4) with mapReduce on a cluster with 2 shards, each with 3 replicas. I have a problem with the results of the mapReduce job not being reduced into the output collection. I tried an incremental map-reduce. I also tried "merging" instead of reducing, but that didn't work either.
The mapReduce command, run on mongos (coll isn't sharded):
db.coll.mapReduce(map, reduce, {out: {reduce: "events", "sharded": true}})
Which yields the following output:
{
"result" : "events",
"counts" : {
"input" : NumberLong(2),
"emit" : NumberLong(2),
"reduce" : NumberLong(0),
"output" : NumberLong(28304112)
},
"timeMillis" : 418,
"timing" : {
"shardProcessing" : 11,
"postProcessing" : 407
},
"shardCounts" : {
"stats2/192.168.…:27017,192.168.…" : {
"input" : 2,
"emit" : 2,
"reduce" : 0,
"output" : 2
}
},
"postProcessCounts" : {
"stats1/192.168.…:27017,…" : {
"input" : NumberLong(0),
"reduce" : NumberLong(0),
"output" : NumberLong(14151042)
},
"stats2/192.168.…:27017,…" : {
"input" : NumberLong(0),
"reduce" : NumberLong(0),
"output" : NumberLong(14153070)
}
},
"ok" : 1,
}
So I see that the mapReduce is run over 2 records, which results in 2 records being output. However, in the postProcessCounts for both shards the input count stays 0. Also, searching for the record by _id yields no result. I wasn't able to find any related error messages in the MongoDB log file.
I tried to reproduce this with a newly created output collection, which I also sharded on hashed _id and gave the same indexes, but I wasn't able to. When outputting the same input to a different collection
db.coll.mapReduce(map, reduce, {out: {reduce: "events_test2", "sharded": true}})
The result is stored in the output collection and I got the following output:
{
"result" : "events_test2",
"counts" : {
"input" : NumberLong(2),
"emit" : NumberLong(2),
"reduce" : NumberLong(0),
"output" : NumberLong(4)
},
"timeMillis" : 321,
"timing" : {
"shardProcessing" : 68,
"postProcessing" : 253
},
"shardCounts" : {
"stats2/192.168.…:27017,…" : {
"input" : 2,
"emit" : 2,
"reduce" : 0,
"output" : 2
}
},
"postProcessCounts" : {
"stats1/192.168.…:27017,…" : {
"input" : NumberLong(2),
"reduce" : NumberLong(0),
"output" : NumberLong(2)
},
"stats2/192.168.…:27017,…" : {
"input" : NumberLong(2),
"reduce" : NumberLong(0),
"output" : NumberLong(2)
}
},
"ok" : 1,
}
When running the script again with the same input, outputting again to the second collection, it shows that it is reducing in postProcessCounts. So the map and reduce functions do their job fine. Why doesn't it work on the larger first collection? Am I doing something wrong here? Are there any special limitations on collections that can be used as output for map-reduce?
mapReduce is run over 2 records, which results in 2 records outputted. However in the postProcessCounts for both shards the input count stays 0.
Map is run over 2 records. If those two records have different keys, then map will output 2 keys with one value each, which is normal.
But something that I noticed in an older version of MongoDB (not sure if this applies in your case) is that if the "values array" for the reduce phase has a length of 1, then reducing will be skipped.
Is the output collection empty in the first case?
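The skipped-reduce behaviour for single-value keys can be illustrated with a small in-memory sketch. This is not MongoDB's implementation: runMapReduce is a hypothetical function, and unlike the real mapReduce (where map uses this and a global emit) the document and emit are passed in explicitly.

```javascript
// In-memory sketch of map-reduce grouping; not MongoDB's implementation.
function runMapReduce(docs, map, reduce) {
  const grouped = new Map();
  const emit = (key, value) => {
    if (!grouped.has(key)) grouped.set(key, []);
    grouped.get(key).push(value);
  };
  docs.forEach((doc) => map(doc, emit));

  let reduceCalls = 0;
  const output = {};
  for (const [key, values] of grouped) {
    if (values.length === 1) {
      // Only one value for this key: reduce is skipped and the value passes through.
      output[key] = values[0];
    } else {
      reduceCalls += 1;
      output[key] = reduce(key, values);
    }
  }
  return { output, reduceCalls };
}

const mapFn = (doc, emit) => emit(doc.id, doc.n);
const reduceFn = (key, values) => values.reduce((sum, v) => sum + v, 0);

// Two records with different keys: two outputs, zero reduce calls.
const distinctKeys = runMapReduce([{ id: "a", n: 1 }, { id: "b", n: 5 }], mapFn, reduceFn);
// Two records sharing a key: one output, one reduce call.
const sharedKey = runMapReduce([{ id: "a", n: 1 }, { id: "a", n: 2 }], mapFn, reduceFn);
console.log(distinctKeys, sharedKey);
```

In this sketch a "reduce" count of 0 with an "output" count of 2 is exactly what two distinct keys produce, which matches the shardCounts in the question.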

Errors while creating a collection in MongoDB

I am new to MongoDB. I am not able to create a collection. The mongo shell responds with the prompt "Display all 169 possibilities? (y or n)". The code is:
db.Lead.insert(
{ LeadID: 1,
MasterAccountID: 100,
LeadName: 'Sarah',
LeadEmailID : 'sarah#hmail.com',
LeadPhoneNumber : '2132155445',
Details : [{ StateID: 1,
TaskID : 1,
Assigned By : 1001,
TimeStamp : '10:00:00',
StatusID : 1 }
]
}
)
Not sure what the issue is. Please help me out with the same.
Regards.
Apart from the space in the key Assigned By, everything looks good.
With the space removed, I am able to insert it properly.
> db.Lead.find().pretty()
{
"_id" : ObjectId("517ebe75278e0557fd167eb7"),
"LeadID" : 1,
"MasterAccountID" : 100,
"LeadName" : "Sarah",
"LeadEmailID" : "sarah#hmail.com",
"LeadPhoneNumber" : "2132155445",
"Details" : [
{
"StateID" : 1,
"TaskID" : 1,
"AssignedBy" : 1001,
"TimeStamp" : "10:00:00",
"StatusID" : 1
}
]
}
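The key issue can be checked outside the database, since the insert argument is an ordinary JavaScript object literal. A minimal sketch follows; the variable names are made up for illustration. Either rename the key, as in the output above, or quote it if the space has to stay.

```javascript
// A key containing a space must be quoted in a JavaScript object literal.
const detailQuoted = { "Assigned By": 1001 }; // valid: quoted key
const detailRenamed = { AssignedBy: 1001 };   // valid: no space in the key

// The unquoted form is a syntax error. Parsing it through the Function
// constructor lets this script report the error instead of failing to load.
let unquotedParses = true;
try {
  new Function("return { Assigned By : 1001 };");
} catch (err) {
  if (err instanceof SyntaxError) unquotedParses = false;
}

console.log(detailQuoted["Assigned By"], detailRenamed.AssignedBy, unquotedParses);
```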

Querying a Sub Object in MongoDB is not using the Index

I am recording site usage events in a sub-object of a visitor document. Here is a basic example of the data structure:
{ "_id" : ObjectId("4d4c695794b332a0740009bd"), "evs" : [
{
"ev" : "Visit Home Page",
"d" : 1,
"s" : 1
},
{
"ev" : "Buy Product",
"d" : "110.10",
"upc" : 1234,
"s" : 1
},
{
"ev" : "Sign up to newsletter",
"d" : "1",
"s" : 1
}
]}
I have an index on 'evs.s', but when I search on evs.s, the index is not used:
db.visitors.find({'evs.s':0}).explain()
{
"cursor" : "BtreeCursor evs.s_1",
"nscanned" : 33361,
"nscannedObjects" : 33361,
"n" : 33361,
"millis" : 311,
"nYields" : 105,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
"evs.s" : [
[
0,
0
]
]
}
}
That query takes 311 milliseconds and scans through every object.
Here is the index: db.visitors.getIndexes()
{
"ns" : "tracking.visitors",
"unique" : false,
"key" : {
"evs.s" : 1
},
"name" : "evs.s_1",
"v" : 0
}
Your query actually is using an index, as indicated by the cursor type in the explain output ("BtreeCursor evs.s_1"). If you were not using an index, it would be "BasicCursor".
From your input data, it looks like evs.s might not be a very efficient key to index on. If all of the values of evs.s are either 1 or 0, your index will always hit a large number of matches.
My guess is that your query did not do a full table scan, but that there are actually that many records with a value of evs.s = 0 in your index.
You might compare the output of
db.visitors.find({'evs.s': 0}).count();
db.visitors.find({'evs.s': 1}).count();
db.visitors.find().count();
to verify this.
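What those three counts tell you can be sketched in memory as a match fraction per value. The documents below are made up in the shape of the question's data, and matchFraction is a hypothetical helper; a fraction close to 1 means the index cannot narrow the search much.

```javascript
// Made-up visitor documents in the same shape as the question's data.
const visitors = [
  { evs: [{ ev: "Visit Home Page", s: 0 }] },
  { evs: [{ ev: "Buy Product", s: 0 }, { ev: "Sign up to newsletter", s: 1 }] },
  { evs: [{ ev: "Visit Home Page", s: 1 }] },
  { evs: [{ ev: "Visit Home Page", s: 0 }] },
];

// Fraction of documents that a query like { "evs.s": value } would match:
// a document matches if any element of evs has s equal to value.
function matchFraction(docs, value) {
  const matches = docs.filter((d) => d.evs.some((e) => e.s === value)).length;
  return matches / docs.length;
}

const fractionZero = matchFraction(visitors, 0);
const fractionOne = matchFraction(visitors, 1);
console.log({ fractionZero, fractionOne });
```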
There are several things you can do to speed this up:
1) You can use a different index that has more distinct values. This will reduce the search space on each query.
2) You can add a limit statement to your query. This will stop scanning the index once limit documents have been found.
"cursor" : "BtreeCursor evs.s_1"
means that the index is used.