MongoDB 2.4 new Text Index feature

So I have a weirdly configured DB imported into MongoDB; the documents look like this:
"_id" : ObjectId("51191d45890311d9b2a0865d"),
"field1" : "randomtextstuff",
"field2" : "randomtextstuff",
"field3" : "randomtextstuff",
"field4" : "randomtextstuff",
"field5" : "randomtextstuff"
Some documents have 100 fields, others have none.
I wanted to try out the new text search, so I attempted to create the following index:
db.profile_specialties.ensureIndex({"field1":"text",
"field2":"text",
"field3":"text",
"field4":"text",
"field5":"text",
"field6":"text",
... All the way to 100
"field96":"text",
"field97":"text",
"field98":"text",
"field99":"text",
"field100":"text"})
The returned error message was:
{
"err" : "ns name too long, max size is 128",
"code" : 10080,
"n" : 0,
"connectionId" : 1,
"ok" : 1
}
Has anyone else experienced this problem?

With MongoDB 2.4 text search you can use the new wildcard specifier ($**) to index all fields with string content:
db.profile_specialties.ensureIndex("$**":"text"})
Bear in mind, though, that a text index across all fields is going to be very large.
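For example, once the wildcard index exists, a search in 2.4 goes through the text command (the $text query operator only arrived in 2.6), and the feature has to be enabled first; a minimal sketch with a made-up search term:
// enable the (still experimental in 2.4) text search feature
db.adminCommand({ setParameter: 1, textSearchEnabled: true })
// run a text search against the wildcard index
db.profile_specialties.runCommand("text", { search: "randomtextstuff", limit: 10 })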

Related

How to collect specific samples from MongoDB collections?

I have a MongoDB collection "Events" with 1 million documents similar to:
{
"_id" : 32423,
"testid" : 43212,
"description" : "fdskfhsdj kfsdjfhskdjf hksdjfhsd kjfs",
"status" : "error",
"datetime" : ISODate("2018-12-04T15:55:00.000Z"),
"failure" : 0,
}
Assuming the documents are sorted by the datetime field (ascending), I want to walk through them in chronological order one by one and pick only the records where the "failure" field was 0 in the previous document and is 1 in the current document. I want to skip the other records in between.
For example, if I also have the following records:
{
"_id" : 32424,
....
"datetime" : ISODate("2018-12-04T16:55:00.000Z"),
"failure" : 0,
}
,
{
"_id" : 32425,
....
"datetime" : ISODate("2018-12-04T17:55:00.000Z"),
"failure" : 1,
}
,
{
"_id" : 32426,
....
"datetime" : ISODate("2018-12-04T18:55:00.000Z"),
"failure" : 0,
}
I only want to collect the one with "_id" : 32425, and apply the same policy to the following cases.
Of course, if I extract all the data at once, I can process it with Python, for instance. But extracting all the records would be really time-consuming (1 million documents!).
Is there a way to do the above via MongoDB commands?
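One server-side option (not from the original thread) is an aggregation with $setWindowFields and $shift, which requires MongoDB 5.0 or later; a rough sketch, assuming the collection is called Events:
// compare each document's failure value with the previous document in datetime order
db.Events.aggregate([
  { $setWindowFields: {
      sortBy: { datetime: 1 },
      output: { prevFailure: { $shift: { output: "$failure", by: -1 } } }
  } },
  // keep only the transitions from failure 0 to failure 1
  { $match: { prevFailure: 0, failure: 1 } }
])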

How do I remove duplicates in MongoDB?

I have a database which consists of a few collections, and I tried copying from one collection to another.
In the process the connection was lost and I had to recopy them;
now I find around 40,000 duplicate records.
Format of my data:
{
"_id" : ObjectId("555abaf625149715842e6788"),
"reviewer_name" : "Sudarshan A",
"emp_name" : "Wilson Erica",
"evaluation_id" : NumberInt(550056),
"teamleader_id" : NumberInt(17199),
"reviewer_id" : NumberInt(1659),
"team_manager" : "Las Vegas",
"teammanager_id" : NumberInt(12245),
"team_leader" : "Thomas Donald",
"emp_id" : NumberInt(7781)
}
Here only the evaluation_id is unique.
Queries that I have tried:
ensureIndex({id:1}, {unique:true, dropDups:true})
The dropDups option was removed around MongoDB 2.7 (the development series leading to 3.0), so it is no longer available.
Here is another way to do it, although I haven't tested it; a rough sketch follows.
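Not from the original answer, but one common approach is to group on evaluation_id with the aggregation framework, keep one _id per group and delete the rest (the collection name here is a placeholder):
// collect the _ids of every document sharing an evaluation_id
db.myCollection.aggregate([
  { $group: { _id: "$evaluation_id", dups: { $push: "$_id" }, count: { $sum: 1 } } },
  { $match: { count: { $gt: 1 } } }
], { allowDiskUse: true }).forEach(function (doc) {
  doc.dups.shift();                                   // keep the first document
  db.myCollection.remove({ _id: { $in: doc.dups } }); // delete the remaining duplicates
});
// afterwards a unique index prevents new duplicates
db.myCollection.ensureIndex({ evaluation_id: 1 }, { unique: true })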

MongoDB sorting exception - too much data for sort() with no index

Using MongoDB version 2.4.4, I have a profile collection containing profiles documents.
I have the following query:
Query: { "loc" : { "$near" : [ 32.08290052711715 , 34.80888522811172] , "$maxDistance" : 0.0089992800575954}}
Fields: { "friendsCount" : 1 , "tappsCount" : 1 , "imageUrl" : 1 , "likesCount" : 1 , "lastActiveTime" : 1 , "smallImageUrl" : 1 , "loc" : 1 , "pid" : 1 , "firstName" : 1}
Sort: { "lastActiveTime" : -1}
Limited to 100 documents.
loc - an embedded document containing the keys (lat, lon)
I am getting the exception:
org.springframework.data.mongodb.UncategorizedMongoDbException: too much data for sort() with no index. add an index or specify a smaller limit;
As stated in the exception, when I reduce the limit to 50 it works, but that isn't an option for me.
I have the following 2 relevant indexes on the profile document:
{'loc':'2d'}
{'lastActiveTime':-1}
I have also tried a compound index, as below, but without success.
{'loc':'2d', 'lastActiveTime':-1}
This is example document (with the relevant keys):
{
"_id" : "5d5085601208aa918bea3c1ede31374d",
"gender" : "female",
"isCreated" : true,
"lastActiveTime" : ISODate("2013-04-08T11:30:56.615Z"),
"loc" : {
"lat" : 32.082230499955806,
"lon" : 34.813542940344945,
"locTime" : NumberLong(0)
}
}
There are other fields in the profile documents; the average profile document size is about 0.5 MB. Correct me if I am wrong, but since I am returning only the relevant fields (as above), that should not be the cause of the problem.
I don't know if it helps, but when I reduce the limit to 50 and the query succeeds,
I get the following explain output (via the MongoVUE client):
cursor : GeoSearchCursor
isMultiKey : False
n : 50
nscannedObjects : 50
nscanned : 50
nscannedObjectsAllPlans : 50
nscannedAllPlans : 50
scanAndOrder : True
indexOnly : False
nYields : 0
nChunkSkips : 0
millis : 10
indexBounds :
This is a blocker for me and I would appreciate your help. What am I doing wrong? How can I get the query to run with the limit I need?
Try creating a compound index instead of two indexes.
db.collection.ensureIndex( { 'loc':'2d','lastActiveTime':-1 } )
You can also hint the query to tell it which index to use:
db.collection.find(...).hint('myIndexName')
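Putting both together for this query might look roughly like this (the collection name and the projection are illustrative, and the index name is the default one the shell would generate):
// compound geo index, then force the query to use it and sort on lastActiveTime
db.profile.ensureIndex({ loc: '2d', lastActiveTime: -1 })
db.profile.find(
  { loc: { $near: [32.0829, 34.8089], $maxDistance: 0.009 } },
  { firstName: 1, imageUrl: 1, smallImageUrl: 1, lastActiveTime: 1, loc: 1, pid: 1 }
).sort({ lastActiveTime: -1 })
 .hint('loc_2d_lastActiveTime_-1')
 .limit(100)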

MongoDB : query result size greater than collection size

I'm analyzing a MongoDB data source to check its quality.
I'm wondering whether every document contains the attribute time, so I used these two commands:
> db.droppay.find().count();
291822
> db.droppay.find({time: {$exists : true}}).count()
293525
How can I have more documents with a given field than there are documents in the whole collection? What's going wrong? I'm unable to find the mistake.
If necessary I can post the expected structure of the document.
The Mongo shell version is 1.8.3; the MongoDB version is 1.8.3.
Thanks in advance
This is the expected structure of the document entry:
{
"_id" : ObjectId("4e6729cc96babe974c710611"),
"action" : "send",
"event" : "sent",
"job_id" : "50a1b7ac-7482-4ad6-ba7d-853249d6a123",
"result_code" : "0",
"sender" : "",
"service" : "webcontents",
"service_name" : "webcontents",
"tariff" : "0",
"time" : "2011-09-07 10:22:35",
"timestamp" : "1315383755",
"trace_id" : "372",
"ts" : "2011-09-07 09:28:42"
}
My guess is that it is an issue with the index. I bet that droppay has an index on time, and some unsafe operation updated the underlying collection without updating the index.
Can you try repairing the database and see if that makes it better?
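A quick sketch of what that could look like in the 1.8 shell (reIndex rebuilds the collection's indexes, repairDatabase rewrites the data files for the current database):
db.droppay.reIndex()   // rebuild all indexes on the collection
db.repairDatabase()    // repair and compact the current database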
Good luck.
There are probably time values that are of type array. You can run db.droppay.find({time: {$type : 4}}) to find such documents. An array value produces one index entry per element, so counts answered from the index can come out higher than the actual number of documents.

MongoDB - too much data for sort() with no index error

I am using MongoDB 1.6.3 to store a big collection (300k+ records). I added a composite index.
db['collection_name'].getIndexes()
[
{
"name" : "_id_",
"ns" : "db_name.event_logs",
"key" : {
"_id" : 1
}
},
{
"key" : {
"updated_at.t" : -1,
"community_id" : 1
},
"ns" : "db_name.event_logs",
"background" : true,
"name" : "updated_at.t_-1_community_id_1"
}
]
However, when I try to run this code:
db['collection_name']
.find({:community_id => 1})
.sort(['updated_at.t', -1])
.skip(#skip)
.limit(#limit)
I am getting:
Mongo::OperationFailure (too much data
for sort() with no index. add an
index or specify a smaller limit)
What am I doing wrong?
Try adding a {community_id: 1, 'updated_at.t': -1} index. It needs to search by community_id first and then sort.
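In the shell, creating that index would look something like this (building in the background, as the existing index does, is optional):
db['collection_name'].ensureIndex({ community_id: 1, 'updated_at.t': -1 }, { background: true })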
So it "feels" like you're using the index, but the index is actually a composite index. I'm not sure that the sort is "smart enough" to use only the partial index.
So two problems:
Based on your query, I would put community_id as the first part of the index, not the second. updated_at.t sounds like a field on which you'll do range queries. Indexes work better if the range query is the second bit.
How many entries are going to come back from community_id => 1? If the number is not big, you may be able to get away with just sorting without an index.
So you may have to switch the index around and you may have to change the sort to use both community_id and updated_at.t. I know it seems redundant, but start there and check the Google Groups if it's still not working.
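A sketch of the suggested query in the mongo shell (event_logs is taken from the ns shown above; the skip/limit values are placeholders):
db.event_logs.find({ community_id: 1 })
  .sort({ community_id: 1, 'updated_at.t': -1 })  // sort on both fields, matching the compound index
  .skip(0)
  .limit(20)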
Even with an index, I think you can still get that error if your result set exceeds 4MB.
You can see the size by going into the mongodb console and doing this:
show dbs
# pick yours (e.g., production)
use db-production
db.articles.stats()
I ended up with results like this:
{
"ns" : "mdalert-production.encounters",
"count" : 89077,
"size" : 62974416,
"avgObjSize" : 706.9660630690302,
"storageSize" : 85170176,
"numExtents" : 8,
"nindexes" : 6,
"lastExtentSize" : 25819648,
"paddingFactor" : 1,
"flags" : 1,
"totalIndexSize" : 18808832,
"indexSizes" : {
"_id_" : 3719168,
"patient_num_1" : 3440640,
"msg_timestamp_1" : 2981888,
"practice_id_1" : 2342912,
"patient_id_1" : 3342336,
"msg_timestamp_-1" : 2981888
},
"ok" : 1
}
Having a cursor batch size that is too large will cause this error. Setting the batch size does not limit the amount of data you can process, it just limits how much data is brought back from the database. When you iterate through and hit the batch limit, the process will make another trip to the database.
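If you want to experiment with that, the shell lets you cap the batch size directly on the cursor (100 here is arbitrary):
// fetch at most 100 documents per round trip while iterating the cursor
db.event_logs.find({ community_id: 1 })
  .sort({ 'updated_at.t': -1 })
  .batchSize(100)
  .forEach(function (doc) { printjson(doc._id); })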