How do I remove duplicates in MongoDB?

I have a database that consists of a few collections, and I tried copying documents from one collection to another.
During the copy the connection was lost, so I had to copy them again.
Now I find around 40,000 duplicate records.
Format of my data:
{
"_id" : ObjectId("555abaf625149715842e6788"),
"reviewer_name" : "Sudarshan A",
"emp_name" : "Wilson Erica",
"evaluation_id" : NumberInt(550056),
"teamleader_id" : NumberInt(17199),
"reviewer_id" : NumberInt(1659),
"team_manager" : "Las Vegas",
"teammanager_id" : NumberInt(12245),
"team_leader" : "Thomas Donald",
"emp_id" : NumberInt(7781)
}
Here only evaluation_id is unique.
Queries that I have tried:
ensureIndex({id:1}, {unique:true, dropDups:true})

The dropDups option was removed in MongoDB around version 2.7.
Here is another way to do it,
but I have not tested it.
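
One possible approach, given here only as an untested sketch: group the documents by evaluation_id, keep the first _id in each group, and delete the rest. The function name removeDuplicates and the collection name reviews are hypothetical; the code assumes a shell where aggregate() and deleteMany() are available (MongoDB 3.2 or later). Try it on a copy of the collection first.

```javascript
// Sketch: keep one document per evaluation_id and delete the others.
// `collection` is assumed to be a shell collection handle such as db.reviews
// (a hypothetical name); only aggregate() and deleteMany() are used.
function removeDuplicates(collection) {
  const pipeline = [
    // Collect the _id values of all documents that share an evaluation_id.
    { $group: { _id: "$evaluation_id", ids: { $push: "$_id" }, count: { $sum: 1 } } },
    // Keep only the groups that actually contain duplicates.
    { $match: { count: { $gt: 1 } } }
  ];
  let removed = 0;
  collection.aggregate(pipeline, { allowDiskUse: true }).forEach(function (group) {
    const extras = group.ids.slice(1); // every _id except the first one
    removed += extras.length;
    collection.deleteMany({ _id: { $in: extras } });
  });
  return removed; // number of documents scheduled for deletion
}

// Usage in the shell (collection name is hypothetical):
//   removeDuplicates(db.reviews);
//   db.reviews.createIndex({ evaluation_id: 1 }, { unique: true });
```

Once the duplicates are gone, a unique index on evaluation_id should prevent them from reappearing if the copy is interrupted again.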

Related

How to reduce execution time in this mongo db find query?

A sample document looks like this:
{
"_id" : ObjectId("62317ae9d007af22f984c0b5"),
"productCategoryName" : "Product category 1",
"productCategoryDescription" : "Description about product category 1",
"productCategoryIcon" : "abcd.svg",
"status" : true,
"productCategoryUnits" : [
{
"unitId" : ObjectId("61fa5c1273a4aae8d89e13c9"),
"unitName" : "kilogram",
"unitSymbol" : "kg",
"_id" : ObjectId("622715a33c8239255df084e4")
}
],
"productCategorySizes" : [
{
"unitId" : ObjectId("61fa5c1273a4aae8d89e13c9"),
"unitName" : "kilogram",
"unitSize" : 10,
"unitSymbol" : "kg",
"_id" : ObjectId("622715a33c8239255df084e3")
}
],
"attributes" : [
{
"attributeId" : ObjectId("62136ed38a35a8b4e195ccf4"),
"attributeName" : "Country of Origin",
"attributeOptions" : [],
"isRequired" : true,
"_id" : ObjectId("622715ba3c8239255df084f8")
}
]
}
This collection is indexed on "_id". Without the sub-documents the execution time is lower, but I need all the document fields.
db.getCollection('product_categories').find({})
This collection contains 30,000 records and this query takes more than 30 seconds to execute. How can I solve this issue? Can anybody suggest a better solution? Thanks.
Indexes, including compound indexes, let a filtered query avoid scanning every document each time you run it, although an unfiltered find({}) still has to read and return the whole collection. 30,000 documents is a small collection for MongoDB, which routinely handles millions. If these fields are populated from other collections during the query, that is another heavy operation.
Check whether your schema is structured efficiently and whether the connection to the server is the bottleneck. Another thing to consider is projecting only the fields that you require, either with a projection in find() or with a $project stage in an aggregation pipeline.
Although the question is not very clear, you can follow this article for some best practices.
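
As a rough illustration of the projection advice, the sketch below builds a projection from a list of field names and passes it to find(). The helper name findWithFields is hypothetical, and which fields to keep is an assumption that depends on what the caller actually needs.

```javascript
// Sketch: return only the named top-level fields (plus _id) for every document.
// `collection` is assumed to be a shell collection handle.
function findWithFields(collection, fields) {
  const projection = {};
  fields.forEach(function (name) {
    projection[name] = 1; // 1 means "include this field"
  });
  return collection.find({}, projection);
}

// Usage in the shell:
//   findWithFields(db.getCollection('product_categories'),
//                  ['productCategoryName', 'productCategoryIcon', 'status']);
// To see where the time goes for the original query:
//   db.getCollection('product_categories').find({}).explain("executionStats");
```

If the explain output reports a short server-side execution time, the 30 seconds are probably spent transferring or rendering the results rather than in the query itself.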

How to collect specific samples from MongoDB collections?

I have a MongoDB collection "Events" with 1 million documents similar to:
{
"_id" : 32423,
"testid" : 43212,
"description" : "fdskfhsdj kfsdjfhskdjf hksdjfhsd kjfs",
"status" : "error",
"datetime" : ISODate("2018-12-04T15:55:00.000Z"),
"failure" : 0,
}
Assuming the documents are sorted by the datetime field (ascending), I want to check them in chronological order one by one and pick only the records where the "failure" field was 0 in the previous document and is 1 in the current document. I want to skip the other records in between.
For example, if I also have the following records:
{
"_id" : 32424,
....
"datetime" : ISODate("2018-12-04T16:55:00.000Z"),
"failure" : 0,
}
,
{
"_id" : 32425,
....
"datetime" : ISODate("2018-12-04T17:55:00.000Z"),
"failure" : 1,
}
,
{
"_id" : 32426,
....
"datetime" : ISODate("2018-12-04T18:55:00.000Z"),
"failure" : 0,
}
I only want to collect the one with "_id:32425", and repeat the same policy for the following cases.
Of course, if I extract all the data at once, then I can process it using Python for instance. But, extracting all the records would be really time-consuming (1 million documents!).
Is there a way to do the above via MongoDB commands?
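
One way this might be done on the server, sketched under the assumption of MongoDB 5.0 or later: the $setWindowFields stage with the $shift operator can attach the previous document's failure value to each document, and a $match stage then keeps only the 0 to 1 transitions. The names failureTransitionPipeline, previousFailure and pickTransitions are made up for this example; pickTransitions applies the same rule in plain JavaScript so the pipeline can be checked against a small sample.

```javascript
// Assumes MongoDB 5.0+, where $setWindowFields and $shift are available.
const failureTransitionPipeline = [
  {
    $setWindowFields: {
      sortBy: { datetime: 1 },
      output: {
        // "failure" of the previous document in datetime order (null for the first).
        previousFailure: { $shift: { output: "$failure", by: -1, default: null } }
      }
    }
  },
  // Keep only documents where failure went from 0 to 1.
  { $match: { failure: 1, previousFailure: 0 } }
];

// The same rule in plain JavaScript, for checking a small sample in memory.
function pickTransitions(docs) {
  const sorted = docs.slice().sort(function (a, b) {
    return a.datetime - b.datetime;
  });
  return sorted.filter(function (doc, i) {
    return i > 0 && sorted[i - 1].failure === 0 && doc.failure === 1;
  });
}

// Usage in the shell:
//   db.Events.aggregate(failureTransitionPipeline, { allowDiskUse: true });
```

An index on datetime would likely help the sort that the window stage needs over 1 million documents.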

How to merge two collections in MongoDB using map-reduce

I have two collections, shown below; products hold a reference to a user. I search for a product by name, and in return I want the combined output of the product and its user, using the map-reduce method.
user collection
{
"_id" : ObjectId("52ac5dd1fb670c2007000000"),
"company" : {
"about" : "This is textile machinery dealer",
"contactAddress" : [{
"address" : "abcd",
"city" : "52ac4bc6fb670c1007000000",
"zipcode" : "39as46as80"
},{
"address" : "abcd",
"city" : "52ac4bc6fb670c1007000000",
"zipcode" : "39as46as80"
}],
"fax" : "58784868",
"mainProducts" : "ads,asd,asd",
"mobileNumber" : "9537236588",
"name" : "krishna steels",
},
"user" : ObjectId("52ac4eb7fb670c0c07000000")
}
product collection
{
"_id" : ObjectId("52ac5722fb670cf806000002"),
"category" : "52a2a9cc48a508b80e00001d",
"deliveryTime" : "10 days after received the ",
"price" : {
"minPrice" : "2000",
"maxPrice" : "3000",
"perUnit" : "5288ac6f7c104203e0976851",
"currency" : "INR"
},
"productName" : "New Mobile Solar Charger with Carabiner",
"rejectReason" : "",
"status" : 1,
"user" : ObjectId("52ac4eb7fb670c0c07000000")
}
This cannot be done. MongoDB supports map-reduce on only one collection at a time. You could try to fetch the data and merge it in a Java collection. A couple of days ago I solved a similar problem using a Java collection.
See this similar response about joins and multiple collections not being supported in MongoDB.
This can be done using two map-reduces.
You run your first MR, and then you run the second MR with the reduce output option ({out: {reduce: ...}}) so that it reduces onto the results of the first.
You shouldn't do this, though. JOINs are not designed to be done through MR; in fact, it sounds like you are trying to run this MR with inline output, which in itself is a very bad idea.
MRs are not designed to run inline with the application.
You would be better off doing the JOIN elsewhere.
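
As one example of doing the join elsewhere: in MongoDB 3.2 and later the aggregation framework has a $lookup stage that can attach matching documents from a second collection. The sketch below only builds the pipeline; the function name productsWithUsers, the output field userInfo, and the collection names products and users are assumptions, since the question does not give the real collection names.

```javascript
// Sketch: find products by name and attach the matching user documents.
// Assumes MongoDB 3.2+ ($lookup) and hypothetical collection names.
function productsWithUsers(productName) {
  return [
    { $match: { productName: productName } },
    {
      $lookup: {
        from: "users",        // hypothetical name of the user collection
        localField: "user",   // reference stored on the product document
        foreignField: "user", // same reference stored on the user document
        as: "userInfo"        // array of matching user documents
      }
    }
  ];
}

// Usage in the shell (collection name is hypothetical):
//   db.products.aggregate(productsWithUsers("New Mobile Solar Charger with Carabiner"));
```

Each returned product would then carry a userInfo array holding the company details of its user, with no map-reduce involved.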

MongoDB: get data from two tables using the same id

I have two tables, history and jobs.
My history table contains:
> db.history.find()
{ "id" : "21", "browser" : "FF", "os" : "Windows", "datetime" : "2013-11-26 17:04:21", "_id" : ObjectId("5294873d6b441e2c16000002") }
> db.jobs.find()
{ "_id" : ObjectId("5289c147db9ed2b022f95a36"), "id" : "21", "launch" : "ertret", "names" : "234", "script" : "art-pagination" }
From the above two tables I need to get browser, launch, script and os by using the common id: 21.
How is this possible?
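You can do it with the following two queries. It is not possible to get it with a single find() query, because each find() reads from one collection. Note that id is stored as a string in both collections, so the filter must use '21' rather than 21.
> db.history.find({'id':'21'}, {'browser':1, 'os':1})
> db.jobs.find({'id':'21'}, {'launch':1,'script':1 })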
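
The two results can then be combined on the client. A minimal sketch, assuming a shell where findOne() accepts a projection; the helper name getHistoryWithJob is hypothetical, and it takes the id as a string because that is how both sample documents store it.

```javascript
// Sketch: run one query per collection and merge the fields by hand.
// `db` is assumed to be the shell's database handle.
function getHistoryWithJob(db, id) {
  const history = db.history.findOne({ id: id }, { browser: 1, os: 1, _id: 0 });
  const job = db.jobs.findOne({ id: id }, { launch: 1, script: 1, _id: 0 });
  if (!history || !job) {
    return null; // the id is missing from at least one collection
  }
  return {
    id: id,
    browser: history.browser,
    os: history.os,
    launch: job.launch,
    script: job.script
  };
}

// Usage in the shell:
//   getHistoryWithJob(db, "21");
```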

MongoDB : query result size greater than collection size

I'm analyzing a MongoDB data source to check its quality.
I want to know whether every document contains the attribute time, so I ran these two commands:
> db.droppay.find().count();
291822
> db.droppay.find({time: {$exists : true}}).count()
293525
How can there be more documents with a given field than documents in the whole collection? What is going wrong? I'm unable to find the mistake.
If necessary, I can post the expected structure of the document.
The mongo shell version is 1.8.3. The MongoDB version is 1.8.3.
Thanks in advance
This is the expected structure of the document entry:
{
"_id" : ObjectId("4e6729cc96babe974c710611"),
"action" : "send",
"event" : "sent",
"job_id" : "50a1b7ac-7482-4ad6-ba7d-853249d6a123",
"result_code" : "0",
"sender" : "",
"service" : "webcontents",
"service_name" : "webcontents",
"tariff" : "0",
"time" : "2011-09-07 10:22:35",
"timestamp" : "1315383755",
"trace_id" : "372",
"ts" : "2011-09-07 09:28:42"
}
My guess is that this is an issue with the index. I bet that droppay has an index on time, and some unsafe operation updated the underlying collection without updating the index.
Can you try repairing the db and see if that makes it better?
Good luck.
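There are probably time values that are of type array.
You can run db.droppay.find({time: {$type : 4}}) to find such documents. Note that before MongoDB 3.6, $type : 4 matches only documents whose time array itself contains a nested array, so on 1.8.3 this query may miss plain arrays.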