MongoDB error: data.includes is not a function

I am running a JS script to identify field values with a certain date format (YYYY-mm-dd in this case) and write them to a txt file. All was going well with single fields and even with one subfield, such as 'Status.dateTime', for which I wrote var data = ({'Status.dateTime': {$exists: true}}) and it worked just fine.
The problem started in the following case, where I have a field, an array element ('0'), and a subfield of that element. How should I declare the variable and proceed in this case?
db.ato.find({'Area.0.Date': {$exists: true}})
    .projection({})
    .sort({_id: -1})
    //.limit(1000)
    .forEach(function(doc) {
        const fs = require('fs')
        var data = ({'Area.0.Date': {$exists: true}})
        if (data != null && !data.includes("/") && data != "" && data.indexOf("-") == 4) {
            //print(doc._id + " " + data)
            fs.appendFile('C:/Users/victo/Desktop/output_query_YYYY-mm-dd.txt', doc._id + " " + data + "\n", (err) => {
                if (err) throw err;
            })
        }
    })

Inside the forEach, MongoDB uses JavaScript, so the doc inside the function is your MongoDB document.
To get the date from the first element of the Area array inside your doc, grab it with plain JavaScript:
var data = doc.Area[0].Date
To make things more intuitive, you can use
print(doc)
inside the forEach and inspect the JavaScript object.
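Putting it together, a corrected version of the script might look like this (a sketch, assuming a shell that supports require('fs'), such as mongosh or a GUI shell; the output path is kept from the question):

const fs = require('fs')
db.ato.find({'Area.0.Date': {$exists: true}})
    .sort({_id: -1})
    .forEach(function(doc) {
        // read the actual value instead of re-declaring the query object
        var data = doc.Area[0].Date
        // note: if Area[0].Date is stored as a Date object rather than a string,
        // convert it first (e.g. String(data)) before calling string methods
        if (data != null && data !== "" && !data.includes("/") && data.indexOf("-") == 4) {
            fs.appendFile('C:/Users/victo/Desktop/output_query_YYYY-mm-dd.txt', doc._id + " " + data + "\n", (err) => {
                if (err) throw err;
            })
        }
    })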

Related

Firestore - getting documents fields if included in an array [duplicate]

In Firebase Cloud Firestore, I have a "user_goals" collection, and a goal may be a predefined goal (with master_id: "XXXX") or a custom goal (no "master_id" key).
In JavaScript, I need to write two functions, one to get all predefined goals and other to get all custom goals.
I found a workaround for custom goals by setting "master_id" to an empty string, and I can fetch them as below:
db.collection('user_goals')
.where('challenge_id', '==', '') // workaround works
.get()
Since this is not the correct way, I tried a similar approach for predefined goals, which do have a "master_id", as below:
db.collection('user_goals')
.where('challenge_id', '<', '') // this workaround
.where('challenge_id', '>', '') // is not working
.get()
Since Firestore has no "!=" operator, I tried to combine the "<" and ">" operators, but still no success.
Question: Ignoring these workarounds, what is the preferred way to get docs by checking whether a specific field exists or does not exist?
As in #Emile Moureau's solution, I prefer
.orderBy('field')
to query documents where the field exists, since it works with any type of data and any value, even null.
But as #Doug Stevenson said:
You can't query for something that doesn't exist in Firestore. A field needs to exist in order for a Firestore index to be aware of it.
You can't query for documents without the field. At least for now.
The preferred way to get docs where a specified field exists is to use the:
.orderBy(fieldPath)
As specified in the Firebase documentation.
Thus the answer provided by #hisoft is valid. I just decided to provide the official source, as the question asked for the preferred way.
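For example, a minimal sketch with the client SDK:

db.collection('user_goals')
    .orderBy('challenge_id') // only documents where challenge_id exists appear in this field's index
    .get()
    .then(snap => console.log(snap.size));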
Firestore is an indexed database. For each field in a document, that document is inserted into that field's index as appropriate based on your configuration. If a document doesn't contain a particular field (like challenge_id) it will not appear in that field's index and will be omitted from queries on that field. Normally, because of the way Firestore is designed, queries should read an index in one continuous sweep. Prior to the introduction of the != and not-in operators, this meant you couldn't exclude particular values as this would require jumping over sections of an index. This limitation is still encountered when trying to use exclusive ranges (v<2 || v>4) in a single query.
Field values are sorted according to the Realtime Database sort order except that the results can be sorted by multiple fields when duplicates are encountered instead of just the document's ID.
Firestore Value Sort Order

Priority | Sorted Values | Priority | Sorted Values
1        | null          | 6        | strings
2        | false         | 7        | DocumentReference
3        | true          | 8        | GeoPoint
4        | numbers       | 9        | arrays
5        | Timestamp     | 10       | maps
Inequality !=/<>
This section documents how inequalities worked prior to the release of the != and not-in operators in Sep 2020. See the documentation on how to use these operators. The following section will be left for historical purposes.
To perform an inequality query on Firestore, you must rework your query so that it can be answered by reading Firestore's indexes in continuous sweeps. For an inequality, this is done by using two queries - one for values less than the excluded value and another for values greater than it.
As a trivial example, let's say I wanted the numbers that aren't equal to 3.
const someNumbersThatAreNotThree = someNumbers.filter(n => n !== 3)
can be written as
const someNumbersThatAreNotThree = [
    ...someNumbers.filter(n => n < 3),
    ...someNumbers.filter(n => n > 3)
];
Applying this to Firestore, you can convert this (formerly) incorrect query:
const docsWithChallengeID = await colRef
    .where('challenge_id', '!=', '')
    .get()
    .then(querySnapshot => querySnapshot.docs);
into these two queries and merge their results:
const docsWithChallengeID = await Promise.all([
    colRef
        .orderBy('challenge_id')
        .endBefore('')
        .get()
        .then(querySnapshot => querySnapshot.docs),
    colRef
        .orderBy('challenge_id')
        .startAfter('')
        .get()
        .then(querySnapshot => querySnapshot.docs),
]).then(results => results.flat());
Important Note: The requesting user must be able to read all the documents that would match the queries, otherwise you will get a permissions error.
Missing/Undefined Fields
Simply put, in Firestore, if a field doesn't appear in a document, that document won't appear in that field's index. This is in contrast to the Realtime Database where omitted fields had a value of null.
Because of the nature of NoSQL databases where the schema you are working with might change leaving your older documents with missing fields, you might need a solution to "patch your database". To do this, you would iterate over your collection and add the new field to the documents where it is missing.
To avoid permissions errors, it is best to make these adjustments using the Admin SDK with a service account, but you can do this using a regular SDK using a user with the appropriate read/write access to your database.
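For instance, a minimal Admin SDK setup could look like this (a sketch; the service-account path is illustrative):

// Node.js with the firebase-admin package
const admin = require('firebase-admin');
admin.initializeApp({
    credential: admin.credential.cert(require('./service-account.json')), // illustrative path
});
// the query to patch, passed to the function below
const queryRef = admin.firestore().collection('user_goals');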
This function pages through the query results and is intended to be executed once.
async function addDefaultValueForField(queryRef, fieldName, defaultFieldValue, pageSize = 100) {
    let checkedCount = 0, pageCount = 1;
    const initFieldPromises = [], newData = { [fieldName]: defaultFieldValue };

    // get first page of results
    console.log(`Fetching page ${pageCount}...`);
    let querySnapshot = await queryRef
        .limit(pageSize)
        .get();

    // while page has data, parse documents
    while (!querySnapshot.empty) {
        // for fetching the next page
        let lastSnapshot = undefined;

        // for each document in this page, add the field as needed
        querySnapshot.forEach(doc => {
            if (doc.get(fieldName) === undefined) {
                const addFieldPromise = doc.ref.update(newData)
                    .then(
                        () => ({ success: true, ref: doc.ref }),
                        (error) => ({ success: false, ref: doc.ref, error }) // trap errors for later analysis
                    );
                initFieldPromises.push(addFieldPromise);
            }
            lastSnapshot = doc;
        });

        checkedCount += querySnapshot.size;
        pageCount++;

        // fetch next page of results
        console.log(`Fetching page ${pageCount}... (${checkedCount} documents checked so far, ${initFieldPromises.length} need initialization)`);
        querySnapshot = await queryRef
            .limit(pageSize)
            .startAfter(lastSnapshot)
            .get();
    }

    console.log(`Finished searching documents. Waiting for writes to complete...`);

    // wait for all writes to resolve
    const initFieldResults = await Promise.all(initFieldPromises);
    console.log(`Finished`);

    // count & sort results
    let initializedCount = 0, errored = [];
    initFieldResults.forEach((res) => {
        if (res.success) {
            initializedCount++;
        } else {
            errored.push(res);
        }
    });

    const results = {
        attemptedCount: initFieldResults.length,
        checkedCount,
        errored,
        erroredCount: errored.length,
        initializedCount
    };

    console.log([
        `From ${results.checkedCount} documents, ${results.attemptedCount} needed the "${fieldName}" field added.`,
        results.attemptedCount == 0
            ? ""
            : ` ${results.initializedCount} were successfully updated and ${results.erroredCount} failed.`
    ].join(""));

    const errorCountByCode = errored.reduce((counters, result) => {
        const code = result.error.code || "unknown";
        counters[code] = (counters[code] || 0) + 1;
        return counters;
    }, {});
    console.log("Errors by reported code:", errorCountByCode);

    return results;
}
You would then apply changes using:
const goalsQuery = firebase.firestore()
    .collection("user_goals");

addDefaultValueForField(goalsQuery, "challenge_id", "")
    .catch((err) => console.error("failed to patch collection with new default value", err));
The above function could also be tweaked to allow the default value to be calculated based on the document's other fields:
let getUpdateData;
if (typeof defaultFieldValue === "function") {
    getUpdateData = (doc) => ({ [fieldName]: defaultFieldValue(doc) });
} else {
    const updateData = { [fieldName]: defaultFieldValue };
    getUpdateData = () => updateData;
}

/* ... later ... */

const addFieldPromise = doc.ref.update(getUpdateData(doc))
The solution I use is:
.where('field', '>', '')
where 'field' is the field we are looking for.
As you correctly state, it is not possible to filter based on !=. If possible, I would add an extra field to define the goal type. It is possible to use != in security rules, along with various string comparison methods, so you can enforce the correct goal type, based on your challenge_id format.
Specify the goal type
Create a type field and filter based on this field.
type: master or type: custom and search .where('type', '==', 'master') or search for custom.
Flag custom goals
Create a customGoal field which can be true or false.
customGoal: true and search .where('customGoal', '==', true) or false (as required).
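A minimal sketch of the extra-field approach (the title value is illustrative):

// written once, at goal-creation time
db.collection('user_goals').add({
    title: 'My goal',      // illustrative payload
    type: 'custom',        // or 'master' for predefined goals
});

// each list then becomes a simple equality query
db.collection('user_goals').where('type', '==', 'master').get();
db.collection('user_goals').where('type', '==', 'custom').get();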
Update
It is now possible to perform a != query in Cloud Firestore
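With it, the original query can be written directly (a sketch; note that != still only matches documents where the field actually exists):

db.collection('user_goals')
    .where('master_id', '!=', '')
    .get();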
Firestore does index booleans, and they can be ordered with orderBy. So often, as in this case, I maintain a derived boolean field while processing documents from onSnapshot or get (using .get().then( during development):
if (this.props.auth !== undefined) {
    if (community && community.place_name) {
        const sc = community.place_name && community.place_name.split(",")[1];
        const splitComma = sc ? sc : false;
        if (community.splitComma !== splitComma) {
            firebase
                .firestore()
                .collection("communities")
                .doc(community.id)
                .update({ splitComma });
        }
        const sc2 = community.place_name && community.place_name.split(",")[2];
        const splitComma2 = sc2 ? sc2 : false;
        console.log(splitComma2);
        if (community.splitComma2 !== splitComma2) {
            firebase
                .firestore()
                .collection("communities")
                .doc(community.id)
                .update({ splitComma2 });
        }
    }
}
This way, I can query with orderBy instead of where
browseCommunities = (paginate, cities) => {
    const collection = firebase.firestore().collection("communities");
    const query =
        cities === 1 // countries
            ? collection.where("splitComma2", "==", false) // without a second comma
            : cities // cities
                ? collection
                    .where("splitComma2", ">", "")
                    .orderBy("splitComma2", "desc") // has at least two
                : collection.orderBy("members", "desc");
    var shot = null;
    if (!paginate) {
        shot = query.limit(10);
    } else if (paginate === "undo") {
        shot = query.startAfter(this.state.undoCommunity).limit(10);
    } else if (paginate === "last") {
        shot = query.endBefore(this.state.lastCommunity).limitToLast(10);
    }
    shot &&
        shot.onSnapshot((querySnapshot) => {
            let p = 0;
            let browsedCommunities = [];
            if (querySnapshot.empty) {
                this.setState({
                    [nuller]: null
                });
            }
            querySnapshot.docs.forEach((doc) => {
                p++;
                if (doc.exists) {
                    var community = doc.data();
                    community.id = doc.id;
It is not an ideal solution, but here is my workaround when a field does not exist:
let user_goals = await db.collection('user_goals').get()
user_goals.forEach(goal => {
    let data = goal.data()
    if (!Object.keys(data).includes('challenge_id')) {
        // Perform your task here
    }
})
Note that it would impact your read counts a lot, so only use this if you have a small collection or can afford the reads.

Mongo Shell print is not displaying?

So I am trying to compare a simple comma delimited list to the documents in my collection. This is my code:
var file = cat("Price Level V.csv");
var skus = file.split("\n");
for(var i = 0; i < skus.length; i++) {
var vasku = skus[i].split(',');
db.getCollection('skus').findOne({sku:vasku[0]}, function(err, mydoc) {
if(err)
print(err);
if(mydoc == null) {
print('NF');
} else if(mydoc.VA == vasku[1]) {
print('Correct');
} else {
print('Incorrect');
}
});
}
For some reason, I am not seeing anything pop up in the shell for all my print statements. It should at least print 'Incorrect', right?
If the loop is entered and the skus collection is not empty, then this can happen if you misspell the collection name of the model that you try to query (I see that from time to time when someone writes the collection name in camelCase).
It's a long shot but maybe the model name in the db is actually skuss (second 's' added for plural form)?
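Another thing worth checking (a sketch, assuming the legacy mongo shell): findOne() in the shell is synchronous and does not take a Node-style callback, so the callback in the question is never invoked. A shell-friendly version would be:

var file = cat("Price Level V.csv");
var skus = file.split("\n");
for (var i = 0; i < skus.length; i++) {
    var vasku = skus[i].split(',');
    // findOne returns the document (or null) directly in the shell
    var mydoc = db.getCollection('skus').findOne({sku: vasku[0]});
    if (mydoc == null) {
        print('NF');
    } else if (mydoc.VA == vasku[1]) {
        print('Correct');
    } else {
        print('Incorrect');
    }
}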

mongodb move documents from one collection to another collection

How can documents be moved from one collection to another in MongoDB? For example: I have a lot of documents in collection A and I want to move all documents older than 1 month to collection B (these older documents should no longer be in collection A).
Using aggregation we can copy documents. But what I am trying to do is move them.
What method can be used to move documents?
The bulk operations #markus-w-mahlberg showed (and #mark-mullin refined) are efficient but unsafe as written. If the bulkInsert fails, the bulkRemove will still continue. To make sure you don't lose any records when moving, use this instead:
function insertBatch(collection, documents) {
    var bulkInsert = collection.initializeUnorderedBulkOp();
    var insertedIds = [];
    var id;
    documents.forEach(function(doc) {
        id = doc._id;
        // Insert without raising an error for duplicates
        bulkInsert.find({_id: id}).upsert().replaceOne(doc);
        insertedIds.push(id);
    });
    bulkInsert.execute();
    return insertedIds;
}

function deleteBatch(collection, documents) {
    var bulkRemove = collection.initializeUnorderedBulkOp();
    documents.forEach(function(doc) {
        bulkRemove.find({_id: doc._id}).removeOne();
    });
    bulkRemove.execute();
}

function moveDocuments(sourceCollection, targetCollection, filter, batchSize) {
    print("Moving " + sourceCollection.find(filter).count() + " documents from " + sourceCollection + " to " + targetCollection);
    var count;
    while ((count = sourceCollection.find(filter).count()) > 0) {
        print(count + " documents remaining");
        var sourceDocs = sourceCollection.find(filter).limit(batchSize);
        var idsOfCopiedDocs = insertBatch(targetCollection, sourceDocs);
        var targetDocs = targetCollection.find({_id: {$in: idsOfCopiedDocs}});
        deleteBatch(sourceCollection, targetDocs);
    }
    print("Done!");
}
Update 2
Please do NOT upvote this answer any more. As written #jasongarber's answer is better in any aspect.
Update
This answer by #jasongarber is a safer approach and should be used instead of mine.
Provided I got you right and you want to move all documents older than 1 month, and you use MongoDB 2.6 or later, there is no reason not to use bulk operations, which are the most efficient way of doing multiple operations I am aware of:
> var bulkInsert = db.target.initializeUnorderedBulkOp()
> var bulkRemove = db.source.initializeUnorderedBulkOp()
> var date = new Date()
> date.setMonth(date.getMonth() -1)
> db.source.find({"yourDateField":{$lt: date}}).forEach(
    function(doc){
        bulkInsert.insert(doc);
        bulkRemove.find({_id: doc._id}).removeOne();
    }
)
> bulkInsert.execute()
> bulkRemove.execute()
This should be pretty fast and it has the advantage that in case something goes wrong during the bulk insert, the original data still exists.
Edit
To prevent too much memory from being used, you can execute the bulk operation after every x docs processed:
> var bulkInsert = db.target.initializeUnorderedBulkOp()
> var bulkRemove = db.source.initializeUnorderedBulkOp()
> var x = 10000
> var counter = 0
> var date = new Date()
> date.setMonth(date.getMonth() -1)
> db.source.find({"yourDateField":{$lt: date}}).forEach(
    function(doc){
        bulkInsert.insert(doc);
        bulkRemove.find({_id: doc._id}).removeOne();
        counter++
        if (counter % x == 0){
            bulkInsert.execute()
            bulkRemove.execute()
            bulkInsert = db.target.initializeUnorderedBulkOp()
            bulkRemove = db.source.initializeUnorderedBulkOp()
        }
    }
)
> bulkInsert.execute()
> bulkRemove.execute()
Insert and remove:
var documentsToMove = db.collectionA.find({});
documentsToMove.forEach(function(doc) {
    db.collectionB.insert(doc);
    db.collectionA.remove(doc);
});
note: this method might be quite slow for large collections or collections holding large documents.
$out is used to create a new collection with the data, so use $out (note that $out replaces the target collection if it already exists):
db.oldCollection.aggregate([{$out : "newCollection"}])
then use drop:
db.oldCollection.drop()
You can use a range query to get the data from sourceCollection, keep the cursor in a variable, loop over it, and insert into the target collection:
var doc = db.sourceCollection.find({
    "Timestamp": {
        $gte: ISODate("2014-09-01T00:00:00Z"),
        $lt: ISODate("2014-10-01T00:00:00Z")
    }
});

doc.forEach(function(doc){
    db.targetCollection.insert(doc);
})
Hope it helps!
First option (using mongodump)
1. Get a dump of the collection:
mongodump -d db -c source_collection
2. Restore into the target collection:
mongorestore -d db -c target_collection dump/db/source_collection.bson
Second Option
Running aggregate
db.getCollection('source_collection').aggregate([ { $match: {"emailAddress" : "apitester#mailinator.com"} }, { $out: "target_collection" } ])
Third option (slowest)
Running through a for loop:
db.getCollection('source_collection').find().forEach(function(docs) {
    db.getCollection('target_collection').insert(docs);
});
print("Rollback Completed!");
Maybe from the performance point of view it's better to remove a lot of documents using one command (especially if you have indexes for the query part) rather than deleting them one-by-one.
For example:
// the range needs a field name; bare $gte/$lt without one is invalid
db.source.find({"yourDateField": {$gte: start, $lt: end}}).forEach(function(doc){
    db.target.insert(doc);
});
db.source.remove({"yourDateField": {$gte: start, $lt: end}});
This is a restatement of #Markus W Mahlberg's approach - returning the favor, as a function:
function moveDocuments(sourceCollection, targetCollection, filter) {
    var bulkInsert = targetCollection.initializeUnorderedBulkOp();
    var bulkRemove = sourceCollection.initializeUnorderedBulkOp();

    sourceCollection.find(filter).forEach(function(doc) {
        bulkInsert.insert(doc);
        bulkRemove.find({_id: doc._id}).removeOne();
    });

    bulkInsert.execute();
    bulkRemove.execute();
}
An example use
var x = {dsid:{$exists: true}};
moveDocuments(db.pictures,db.artifacts,x)
to move all documents that have a top-level element dsid from the pictures collection to the artifacts collection.
Here's an update to #jasongarber's answer which uses the more recent 'bulkWrite' operation (see the driver's bulkWrite documentation), and also keeps the whole process asynchronous so you can run it as part of a wider script which depends on its completion.
async function moveDocuments (sourceCollection, targetCollection, filter) {
    const sourceDocs = await sourceCollection.find(filter)
    console.log(`Moving ${await sourceDocs.count()} documents from ${sourceCollection.collectionName} to ${targetCollection.collectionName}`)

    const idsOfCopiedDocs = await insertDocuments(targetCollection, sourceDocs)

    const targetDocs = await targetCollection.find({_id: {$in: idsOfCopiedDocs}})
    await deleteDocuments(sourceCollection, targetDocs)

    console.log('Done!')
}

async function insertDocuments (collection, documents) {
    const insertedIds = []
    const bulkWrites = []

    await documents.forEach(doc => {
        const {_id} = doc
        insertedIds.push(_id)
        bulkWrites.push({
            replaceOne: {
                filter: {_id},
                replacement: doc,
                upsert: true,
            },
        })
    })

    if (bulkWrites.length) await collection.bulkWrite(bulkWrites, {ordered: false})

    return insertedIds
}

async function deleteDocuments (collection, documents) {
    const bulkWrites = []

    await documents.forEach(({_id}) => {
        bulkWrites.push({
            deleteOne: {
                filter: {_id},
            },
        })
    })

    if (bulkWrites.length) await collection.bulkWrite(bulkWrites, {ordered: false})
}
You can use the copyTo command (deprecated since MongoDB 3.0 and removed in 4.2) with the following syntax:
db.source_collection.copyTo("target_collection")
Then you can use the drop command to remove the old collection:
db.source_collection.drop()
I do like the response from #markus-w-mahlberg, however at times I have seen the need to keep it a bit simpler for people. As such I have a couple of functions below. You could naturally wrap things here with bulk operators as he did, but this code works with new and old Mongo systems equally.
function parseNS(ns) {
    // Expects that names do not violate the rules; if a collection name itself contains dots
    // (e.g. "foodb.foocollection.month.day.year"), pass it as an array instead.
    if (ns instanceof Array) {
        database = ns[0];
        collection = ns[1];
    } else {
        tNS = ns.split(".");
        if (tNS.length > 2) {
            print('ERROR: NS had more than 1 period in it, please pass as ["dbname", "coll.name.with.dots"]!');
            return false;
        }
        database = tNS[0];
        collection = tNS[1];
    }
    return {database: database, collection: collection};
}

function insertFromCollection(sourceNS, destNS, query, batchSize, pauseMS) {
    // Parse and check namespaces
    srcNS = parseNS(sourceNS);
    destNS = parseNS(destNS);
    if (srcNS == false || destNS == false) { return false; }

    batchBucket = new Array();
    totalToProcess = db.getSiblingDB(srcNS.database).getCollection(srcNS.collection).find(query, {_id: 1}).count();
    currentCount = 0;
    print("Processed " + currentCount + "/" + totalToProcess + "...");
    db.getSiblingDB(srcNS.database).getCollection(srcNS.collection).find(query).addOption(DBQuery.Option.noTimeout).forEach(function(doc) {
        batchBucket.push(doc);
        if (batchBucket.length > batchSize) {
            db.getSiblingDB(destNS.database).getCollection(destNS.collection).insert(batchBucket);
            currentCount += batchBucket.length;
            batchBucket = [];
            sleep(pauseMS);
            print("Processed " + currentCount + "/" + totalToProcess + "...");
        }
    });
    // flush any documents left in the final partial batch
    if (batchBucket.length > 0) {
        db.getSiblingDB(destNS.database).getCollection(destNS.collection).insert(batchBucket);
        currentCount += batchBucket.length;
    }
    print("Completed");
}
/** Example Usage:
    insertFromCollection("foo.bar", "foo2.bar", {"type": "archive"}, 1000, 20);
*/
You could obviously add a db.getSiblingDB(srcNS.database).getCollection(srcNS.collection).remove(query, true) if you wanted to also remove the records after they are copied to the new location. The code can easily be built like that to make it restartable.
I had 2297 collections holding 15 million documents, but some collections were empty.
Using only copyTo the script failed, but with this script optimization:
db.getCollectionNames().forEach(function(collname) {
    var c = db.getCollection(collname).count();
    if (c !== 0) {
        db.getCollection(collname).copyTo('master-collection');
        print('Copied collection ' + collname);
    }
});
all works fine for me.
NB: copyTo is deprecated because it blocks read/write operations, so I think it is fine only if you know that the database is not usable during this operation.
In my case forEach didn't work, so I had to make some changes.
var kittySchema = new mongoose.Schema({
name: String
});
var Kitten = mongoose.model('Kitten', kittySchema);
var catSchema = new mongoose.Schema({
name: String
});
var Cat = mongoose.model('Cat', catSchema);
These are the models for both collections.
function Recursion() {
    Kitten.findOne().lean().exec(function(error, results) {
        if (!error) {
            var objectResponse = results;
            var RequiredId = objectResponse._id;
            delete objectResponse._id;
            var swap = new Cat(objectResponse);
            swap.save(function(err) {
                if (err) {
                    return err;
                } else {
                    console.log("SUCCESSFUL");
                    Kitten.deleteOne({ _id: RequiredId }, function(err) {
                        if (!err) {
                            console.log('notification!');
                        } else {
                            return err;
                        }
                    });
                    Recursion();
                }
            });
        } else {
            // the outer callback parameter is named error, not err
            console.log("No object found");
        }
    });
}
I planned to archive 1000 records at a time using the bulk insert and bulk delete methods of pymongo.
For both source and target:
create mongodb objects to connect to the database.
instantiate the bulk objects. Note: I created backups of the bulk objects too. These will help me roll back the insertions or removals when an error occurs.
example:
For source
// replace this with mongodb object creation logic
source_db_obj = db_help.create_db_obj(source_db, source_col)
source_bulk = source_db_obj.initialize_ordered_bulk_op()
source_bulk_bak = source_db_obj.initialize_ordered_bulk_op()
For target
// replace this with mongodb object creation logic
target_db_obj = db_help.create_db_obj(target_db, target_col)
target_bulk = target_db_obj.initialize_ordered_bulk_op()
target_bulk_bak = target_db_obj.initialize_ordered_bulk_op()
Obtain the source records that matches the filter criteria
source_find_results = source_db_obj.find(filter)
Loop through the source records
create target and source bulk operations
Append archived_at field with the current datetime to the target collection
//replace this with the logic to obtain the UTCtime.
doc['archived_at'] = db_help.getUTCTime()
target_bulk.insert(document)
source_bulk.remove(document)
for rollback in case of any errors or exceptions, create target_bulk_bak and source_bulk_bak operations.
target_bulk_bak.find({'_id':doc['_id']}).remove_one()
source_bulk_bak.insert(doc)
//remove the extra column
doc.pop('archived_at', None)
When the record count reaches 1000, execute the target bulk insertion and source bulk removal. Note: this method takes the target_bulk and source_bulk objects for execution.
execute_bulk_insert_remove(source_bulk, target_bulk)
When an exception occurs, execute the target_bulk_bak removals and source_bulk_bak insertions. This rolls back the changes. Since mongodb doesn't have rollback, I came up with this hack.
execute_bulk_insert_remove(source_bulk_bak, target_bulk_bak)
Finally re-initialize the source and target bulk and bulk_bak objects. This is necessary because you can use them only once.
Complete code
from datetime import datetime
from pymongo.errors import BulkWriteError


def execute_bulk_insert_remove(source_bulk, target_bulk):
    try:
        target_bulk.execute()
        source_bulk.execute()
    except BulkWriteError as bwe:
        raise Exception(
            "could not archive document, reason: {}".format(bwe.details))


def archive_bulk_immediate(filter, source_db, source_col, target_db, target_col):
    """
    filter: filter criteria for backup
    source_db: source database name
    source_col: source collection name
    target_db: target database name
    target_col: target collection name
    """
    # db_help and logger are assumed to be provided by the surrounding project
    count = 0
    bulk_count = 1000

    source_db_obj = db_help.create_db_obj(source_db, source_col)
    source_bulk = source_db_obj.initialize_ordered_bulk_op()
    source_bulk_bak = source_db_obj.initialize_ordered_bulk_op()

    target_db_obj = db_help.create_db_obj(target_db, target_col)
    target_bulk = target_db_obj.initialize_ordered_bulk_op()
    target_bulk_bak = target_db_obj.initialize_ordered_bulk_op()

    source_find_results = source_db_obj.find(filter)

    start = datetime.now()
    for doc in source_find_results:
        doc['archived_at'] = db_help.getUTCTime()

        target_bulk.insert(doc)
        source_bulk.find({'_id': doc['_id']}).remove_one()

        target_bulk_bak.find({'_id': doc['_id']}).remove_one()
        doc.pop('archived_at', None)  # remove the extra field before backing up the original
        source_bulk_bak.insert(doc)

        count += 1
        if count % bulk_count == 0:
            logger.info("count: {}".format(count))
            try:
                execute_bulk_insert_remove(source_bulk, target_bulk)
            except BulkWriteError as bwe:
                execute_bulk_insert_remove(source_bulk_bak, target_bulk_bak)
                logger.info("Bulk Write Error: {}".format(bwe.details))
                raise
            source_bulk = source_db_obj.initialize_ordered_bulk_op()
            source_bulk_bak = source_db_obj.initialize_ordered_bulk_op()
            target_bulk = target_db_obj.initialize_ordered_bulk_op()
            target_bulk_bak = target_db_obj.initialize_ordered_bulk_op()

    # flush the final partial batch, if any
    if count % bulk_count != 0:
        execute_bulk_insert_remove(source_bulk, target_bulk)

    end = datetime.now()
    logger.info("archived {} documents to {} in {}.".format(
        count, target_col, (end - start)))

Mongodb - How to find string in multiple fields?

Using Pymongo for this scenario.
I have User that has email, first_name, last_name.
I am using this Pymongo snippet:
user_found = users.find({'$or': [
    {'email': {'$regex': searchString, '$options': 'i'}},
    {'first_name': {'$regex': searchString, '$options': 'i'}},
    {'last_name': {'$regex': searchString, '$options': 'i'}}]})
this example works, if I want to find searchString in:
email, or
first_name, or
last_name
Now I need to also find searchString in first_name + last_name combined.
How can I do that?
Is there a way in mongo, through the query, to combine the two into a "fullname" and then search the fullname?
Easiest way is to add an array field and populate it with all of the variants that you want to search on. Index that array field.
That way you only need one index and your search across all fields is simple and doesn't change when you want to search on some new search variant. You can also normalize the text you put into the search array, for example, lower casing it, removing punctuation etc.
See https://stackoverflow.com/q/8206188/224370
Edit: MongoDB's documentation now covers keyword search and the new full-text search feature.
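A minimal sketch of that approach in the shell (the collection, field names, and values are illustrative):

// maintain a normalized array of search variants on each user document
db.users.updateOne(
    { _id: userId }, // userId assumed to be in scope
    { $set: { search: [
        "jane@example.com",   // email
        "jane",               // first_name
        "doe",                // last_name
        "jane doe"            // first_name + last_name combined
    ] } }
);

// a multikey index covers every element of the array
db.users.createIndex({ search: 1 });

// exact-term lookups then hit the index
db.users.find({ search: "jane doe" });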
I had the same problem. I already used regex string search, so my solution was to generate a helper collection where I combine all the relevant strings, like:
{
    search_field: email + " " + first_name + " " + last_name,
    ref_id: (id of the real object)
}
I then use a regexp to create what I allow to be searched for:
// logic found here: http://stackoverflow.com/questions/10870372/regex-match-if-string-contain-all-the-words-or-a-condition
var words = query.split(/[ ,]+/);
var regstr = "";
for (var i = 0; i < words.length; ++i) {
    var word = words[i];
    regstr += "(?=.*?\\b" + word + ")";
}
regstr += "^.*$";
regex = new RegExp(regstr, "i");
This then also gives some flexibility about the order.
Searching is not the fastest, since it still uses a regex on all elements, but it is OK for me. (I also index the collection on search_field.)
Getting results also becomes a nested call, since first you need to get the _ids you really want, and then you can query for them like so:
connection.find({ "search_field" : regex }, { _id: 0, ref_id: 1 }, { limit: limit, skip: start }).toArray(function (err, docs) {
if (err) throw err;
// map array of documents into simple array of ids
var ids = [];
for (var i = 0; i < docs.length; ++i)
{
var doc = docs[i];
ids.push(doc.ref_id);
}
if (ids.length > 0)
MongooseEmails.find({ "_id": { $in: ids } }, function (err, docres) {
if (err) throw err;
res.send(JSON.stringify(docsres));
});
else
res.send("");
});
This is edited code, so there may be a syntax error; in general, it is working for me.

Looking for help with reading from MongoDB in Node.JS

I have a number of records stored in MongoDB and I'm trying to output them to the browser window by way of a Node.JS http server. I think I'm a good portion of the way along, but I'm missing a few little things that are keeping it from actually working.
The code below uses node-mongo-native to connect to the database.
If there is anyone around who can help me make those last few connections with working in node I'd really appreciate it. To be fair, I'm sure this is just the start.
var sys = require("sys");
var test = require("assert");
var http = require('http');
var Db = require('../lib/mongodb').Db,
Connection = require('../lib/mongodb').Connection,
Server = require('../lib/mongodb').Server,
//BSON = require('../lib/mongodb').BSONPure;
BSON = require('../lib/mongodb').BSONNative;
var host = process.env['MONGO_NODE_DRIVER_HOST'] != null ? process.env['MONGO_NODE_DRIVER_HOST'] : 'localhost';
var port = process.env['MONGO_NODE_DRIVER_PORT'] != null ? process.env['MONGO_NODE_DRIVER_PORT'] : Connection.DEFAULT_PORT;
sys.puts("Connecting to " + host + ":" + port);
function PutItem(err, item) {
    var result = "";
    if (item != null) {
        for (key in item) {
            result += key + '=' + item[key];
        }
    }
    // sys.puts(sys.inspect(item)) // debug output
    return result;
}

function ReadTest() {
    var db = new Db('mydb', new Server(host, port, {}), {native_parser: true});
    var result = "";
    db.open(function (err, db) {
        db.collection('test', function(err, collection) {
            collection.find(function (err, cursor) {
                cursor.each(function (err, item) {
                    result += PutItem(err, item);
                });
            });
        });
    });
    return result;
}

http.createServer(function (req, res) {
    res.writeHead(200, {'Content-Type': 'text/plain'});
    res.end("foo" + ReadTest());
}).listen(8124);
console.log('Server running on 8124');
Sources:
- mongo connectivity code: https://github.com/christkv/node-mongodb-native/blob/master/examples/simple.js
- node http code: nodejs.org
EDIT CORRECTED CODE
Thanks to Mic below who got me rolling in the right direction. For anyone interested, the corrected solution is here:
function ReadTest(res) {
    var db = new Db('mydb', new Server(host, port, {}), {native_parser: true});
    var result = "";
    res.write("in readtest\n");
    db.open(function (err, db) {
        res.write("now open\n");
        db.collection('test', function(err, collection) {
            res.write("in collection\n");
            collection.find(function (err, cursor) {
                res.write("found\n");
                cursor.each(function (err, item) {
                    res.write("now open\n");
                    var x = PutItem(err, item);
                    sys.puts(x);
                    res.write(x);
                    if (item == null) {
                        res.end('foo');
                    }
                });
            });
        });
    });
}

http.createServer(function (req, res) {
    res.writeHead(200, {'Content-Type': 'text/plain'});
    res.write("start\n");
    ReadTest(res);
}).listen(8124);
console.log('Server running on 8124');
My guess is that you are returning result, writing the response, and closing the connection before anything is fetched from the db.
One solution would be to pass the response object to where you actually need it, something like:
function readTest(res) {
    db.open(function (err, db) {
        db.collection('test', function(err, collection) {
            collection.find(function (err, cursor) {
                res.writeHead(200, {'Content-type' : 'text/plain'});
                cursor.each(function (err, item) { res.write(item); });
                res.end();
                ...
Of course, you should also handle errors and try to avoid nesting too many levels, but that's a different discussion.
Instead of writing all the low-level Mongodb access code, you might want to try a simple library like mongous so that you can focus on your data, not on MongoDB quirks.
You might want to try mongoskin too.
Reading documents
To apply specific value filters, we can pass specific values to the find() command. Here is a SQL query:
SELECT * FROM Table1 WHERE name = 'ABC'
which is equivalent to the following in MongoDB (notice Collection1 for Table1):
db.Collection1.find({name: 'ABC'})
We can chain count() to get the number of results, or pretty() to get a readable result. The results can be further narrowed by adding additional parameters:
db.Collection1.find({name: 'ABC', rollNo: 5})
It's important to notice that these filters are ANDed together by default. To apply an OR filter, we need to use $or. These filters are specified depending upon the structure of the document. E.g., for the attribute name of a nested object school, we need to specify the filter as "school.name" = 'AUHS'.
We're using dot notation here, accessing the nested field name of the field school. Also notice that the filter path is quoted, without which we'll get syntax errors.
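For example, a couple of sketches in the shell:

// OR filter: name is 'ABC' OR rollNo is 5
db.Collection1.find({ $or: [{ name: 'ABC' }, { rollNo: 5 }] })

// dot notation for a nested field, with the path quoted
db.Collection1.find({ "school.name": "AUHS" })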
Equality matches on arrays can be performed:
on the entire array
based on any element
based on a specific element
more complex matches using operators
In the below query:
db.Collection1.find({name: ['ABC','XYZ']})
MongoDB is going to identify documents by an exact match to an array of one or more values. For these types of queries, the order of elements matters: we will only match documents where name is exactly ABC followed by XYZ, and those are the only 2 elements of the array.
{name:["ABC","GHI","XYZ"]},
{name:["DEF","ABC","XYZ"]}
In the above documents, let's say that we need to get all the documents where ABC is the first element. We'll use the below filter:
db.Schools.find({'name.0': 'ABC' })
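And for the other array-match forms listed above, two sketches (the scores field is illustrative):

// match if any element of the name array equals "ABC"
db.Schools.find({ name: "ABC" })

// operator-based match: at least one element strictly between 80 and 90
db.Schools.find({ scores: { $elemMatch: { $gt: 80, $lt: 90 } } })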