mongodump by date / find() in dumped data - mongodb

How do I dump all collections by date if my records don't have a timestamp field?
Fields: _id, name, email, carnumber... etc.
And how can I look up / find() data in the archived/dumped database?
I need to create a search mechanism for searching in the archive.

You can pass a query to mongodump that will make it dump only a portion of your data. If you can't write a query that selects the required portion of data, then you're out of luck.
The result of mongodump is a collection of BSON files. They are not directly queryable, but you can load them into another database and query that. Or you can use the mongoexport utility, which creates JSON documents; JSON is a little bit easier to work with.
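If you do want to search the dumped files directly, a small script can decode them. Here is a minimal sketch using PyMongo's bson module - the dump path and the carnumber filter are just placeholders for illustration:
# Sketch: scan a mongodump .bson file without restoring it first.
# Requires PyMongo (which ships the bson package); path and field are examples.
import bson

def search_dump(path, field, value):
    matches = []
    with open(path, "rb") as dump_file:
        # decode_file_iter decodes one document at a time, so large dumps are OK
        for doc in bson.decode_file_iter(dump_file):
            if doc.get(field) == value:
                matches.append(doc)
    return matches

print(search_dump("dump/mydb/users.bson", "carnumber", "AB-1234"))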

Although what Sergio says is broadly true, let me expand a bit:
First, you mention using _id - if that is an ObjectId (the default), then it contains a timestamp - the first 4 bytes are a Unix-style timestamp:
http://www.mongodb.org/display/DOCS/Object+IDs#ObjectIDs-BSONObjectIDSpecification
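For example, with PyMongo you can build a boundary ObjectId from a date and select documents created in a given range; roughly the same filter could be passed to mongodump's --query option for a per-collection dump. A minimal sketch (database and collection names are made up):
# Sketch: filter by the creation time embedded in ObjectId.
# Assumes PyMongo; "mydb"/"users" are placeholder names.
from datetime import datetime
from bson.objectid import ObjectId
from pymongo import MongoClient

coll = MongoClient("mongodb://localhost:27017")["mydb"]["users"]

# ObjectId.from_datetime() builds a boundary id whose first 4 bytes encode this time
start = ObjectId.from_datetime(datetime(2013, 1, 1))
end = ObjectId.from_datetime(datetime(2013, 2, 1))

# All documents whose _id was generated in January 2013
for doc in coll.find({"_id": {"$gte": start, "$lt": end}}):
    print(doc["_id"].generation_time, doc.get("name"))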
Next, the problem with using mongoexport is that JSON does not preserve all BSON types (http://bsonspec.org/#/specification) - BSON has more types than JSON does, so storing as JSON can be problematic unless you have rules for re-importing it.
If you keep the data in BSON format, there is the bsondump utility to inspect things as-is in the files:
http://www.mongodb.org/display/DOCS/Import+Export+Tools#ImportExportTools-bsondump
Or, if you had an "archive" MongoDB instance, you could just use mongodump/mongorestore, which work directly with the BSON files and do not have the JSON issues seen with mongoexport etc.:
http://www.mongodb.org/display/DOCS/Import+Export+Tools#ImportExportTools-mongodumpandmongorestore

Related

How to use mongodb functions with mongoimport?

Let's say I want to insert an object that contains date objects using mongoimport from the command line.
echo "{\"int_key\": 1, \"date_key\": new Date(\"2022-12-27\")}" | mongoimport --host "192.168.60.10" --db example_db --collection example_collection
will not work because the object I am trying to insert is not valid JSON. The reason I want to use mongoimport is that there is an array with a large number of objects that I want to persist in one go. If I try to use the mongo command, the argument length for --eval is too long. For example,
mongo --host "192.168.60.10" --eval "db=db.getSiblingDB(\"example_db\");db.getCollection(\"example_collection\").insert([{\"int_key\": 1, \"date_key\": new Date(\"2022-12-27\")}])"
but the array inside insert() has a very large number of objects. Can you suggest any workaround for this? I was thinking I could use mongoimport to read all the objects, collected into an array, through stdin or a file. The options for importing a JSON array would not allow the kind of array of objects I insert using insert() in mongo --eval.
You have to use this:
echo "{\"int_key\": 1, \"date_key\": {\"$date\": \"2022-12-27\"}}"
Depending on your shell, it may require escaping the $ and a full timestamp:
echo "{\"int_key\": 1, \"date_key\": {\"\$date\": \"2022-12-27T00:00:00Z\"}}"
For other data types, see MongoDB Extended JSON (v2).
I use mongoimport in the same way to insert around 6 billion documents per day; it is very fast and reliable.
Depending on how you use it, the question "mongoimport does not import small amount of documents" could be relevant for you.
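If you need to build such a file programmatically, one workable sketch is to write one Extended JSON document per line and feed the file to mongoimport (the file name and values below are illustrative):
# Sketch: write newline-delimited Extended JSON that mongoimport accepts.
# The {"$date": ...} wrapper is what makes mongoimport create a real BSON date.
import json

docs = [
    {"int_key": 1, "date_key": {"$date": "2022-12-27T00:00:00Z"}},
    {"int_key": 2, "date_key": {"$date": "2022-12-28T00:00:00Z"}},
]

with open("bulk.json", "w") as out:
    for doc in docs:
        out.write(json.dumps(doc) + "\n")

# Then, for example:
# mongoimport --host 192.168.60.10 --db example_db --collection example_collection --file bulk.json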

MongoDB - querying GridFS by metadata does not return any results

I am trying to query MongoDB database for a file stored in GridFS using metadata in the following way:
db['fs'].files.find({'metadata': {'a_field': 'a_value'}})
And it does not return any results, whereas I can see that a file with such a field value exists when I run e.g.:
db['fs'].files.find()
What is wrong about my query?
It turns out the problem is solved by changing the nesting of the JSON query document from:
{'metadata': {'a_field': 'a_value'}}
to:
{'metadata.a_field': 'a_value'}
It is still a mystery to me why the two queries are not equivalent, though.
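For what it's worth, the two forms are not equivalent because {'metadata': {'a_field': 'a_value'}} asks for documents whose metadata equals exactly that sub-document (no other keys), while dot notation matches one field inside metadata regardless of what else it contains. A small PyMongo sketch, with a placeholder database name:
# Sketch: exact sub-document match vs. dot notation on GridFS metadata.
from pymongo import MongoClient

files = MongoClient()["mydb"]["fs.files"]

# Matches only if metadata is exactly {"a_field": "a_value"} -
# any extra key inside metadata makes this query miss the file.
print(files.count_documents({"metadata": {"a_field": "a_value"}}))

# Matches any file whose metadata.a_field equals "a_value",
# no matter what other fields metadata carries.
print(files.count_documents({"metadata.a_field": "a_value"}))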

Converting Parse objectId to Mongo ObjectId?

I'm trying to migrate data from Parse to a new project that uses Mongo as its database (without Parse/Parse Server). Since the schemas are different between the two projects, I'm manually writing a migration script to achieve this.
As I understand it, Parse appears to use 10-character-long IDs for their objects (combinations of digits, lower-case letters, and upper-case letters), while Mongo uses 24-character-long IDs (12 bytes represented as hex).
Right now, when migrating data for a document from the old project to the new one, I'm using a function that converts the Parse ID to a unique Mongo ObjectId (it converts each character to a 2-digit hex value, then pads the 20-character string with 4 zeroes).
Is this a good approach? I'm avoiding Mongo's automatic ObjectId generation in case I ever need to re-migrate any of the old Parse documents and find the matching document in the new database. I know automatically generated ObjectIds in Mongo also embed some other information, like creation dates, but I don't think that would be important, so I can just use my custom ObjectId generator? However, I'm not sure about the implications for performance, or whether I'm just going about this migration the wrong way.
The approach I recommend is letting Mongo auto-generate the _ids and then storing Parse's ID in a new field called parseId for future reference if needed.
For example:
PARSE DATA:
"_id": "1234567890",
"title": "Mongo Migrate",
"description": "Migrating from Parse to Mongo"
MONGO DATA:
"_id": ObjectId("1ad83e4f2ab8e0daa8ebde71"), // mongo generated
"parseId": "1234567890",
"title": "Mongo Migrate",
"description": "Migrating from Parse to Mongo"
Then if you need to match a document between the two databases later, you can write a script that goes along the lines of Parse.find({"_id": Mongo.parseId}).....
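A rough sketch of that migration loop in Python, letting MongoDB generate _id and keeping the Parse ID alongside (collection names are placeholders, and it assumes each exported Parse record carries its ID under _id - adjust if your export uses objectId):
# Sketch: copy Parse documents into MongoDB, letting Mongo create _id
# and preserving the original Parse ID in a parseId field.
from pymongo import MongoClient

target = MongoClient()["newdb"]["items"]

def migrate(parse_docs):
    for parse_doc in parse_docs:           # parse_docs: dicts exported from Parse
        doc = dict(parse_doc)
        doc["parseId"] = doc.pop("_id")    # keep the 10-character Parse ID
        # with no _id key left, MongoDB auto-generates an ObjectId on insert
        target.insert_one(doc)

# Later, to find the migrated copy of a known Parse object:
# target.find_one({"parseId": "1234567890"})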
MongoDB uses _id as the primary key by default, and _id has to be unique to avoid collisions. The way you are generating a unique ObjectId for each _id is fine. As long as the values are unique, you could even skip the zero padding and use the 20-character hex string directly as _id to save space.

Querying on Date in Mongo

I'm inserting a Mongo doc with the following time-stamp:
val format = new java.text.SimpleDateFormat("yyyyMMddHHmmss")
format.format(new Date()).toLong
Here's what the section looks like from Mongo's shell:
"{Timestamp" : NumberLong("20130919161948")}"
Based on a few tests, it appears that I can simply compare two documents by Timestamp by checking > or < on the yyyyMMddHHmmss format.
Please let me know if this time-stamp is OK for Mongo. Will I be able to query with it?
Mongo will not understand this as a timestamp, but as a number. Since your format orders the fields from year down to seconds, numeric order matches chronological order, so you will be able to query with > or < to know whether one value is before or after another.
However, if you want Mongo to treat the data as a date, you will need to use the appropriate BSON date type. By having Mongo treat it as a date, you will have all of Mongo's date operations available, like extracting the year, day of week, etc.
If you are using Casbah and Joda, you can enable serialization and deserialization with an explicit call:
import com.mongodb.casbah.conversions.scala._
RegisterJodaTimeConversionHelpers()
@Kevin, I think you are right. java.util.Date is supported as a BSON date.
Using NumberLong to represent the timestamp allows you to do range queries, but with the BSON date type, date operations in the aggregation framework become possible, which is more powerful.
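For illustration, here is the kind of thing a real BSON date enables that a NumberLong such as 20130919161948 does not: date operators in the aggregation pipeline. A minimal PyMongo sketch with an invented collection name:
# Sketch: store a real datetime and use $year / $dayOfWeek in aggregation.
from datetime import datetime
from pymongo import MongoClient

events = MongoClient()["mydb"]["events"]
events.insert_one({"Timestamp": datetime(2013, 9, 19, 16, 19, 48)})

pipeline = [
    {"$project": {
        "year": {"$year": "$Timestamp"},            # works on BSON dates, not on NumberLong
        "dayOfWeek": {"$dayOfWeek": "$Timestamp"},
    }}
]
for row in events.aggregate(pipeline):
    print(row)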

Convert a ISODate string to mongoDB native ISODate data type

My application generates logs in JSON format. The logs look something like this:
{"LogLevel":"error","Datetime":"2013-06-21T11:20:17Z","Module":"DB","Method":"ExecuteSelect","Request":"WS_VALIDATE","Error":"Procedure or function 'WS_VALIDATE' expects parameter '#LOGIN_ID', which was not supplied."}
Currently, I'm pushing the aforementioned log lines as-is into mongoDB. But mongoDB stores the Datetime as a string (which is expected). Now that I want to run some data-crunching jobs on these logs, I'd prefer to store the Datetime as mongoDB's native ISODate data type.
There are 3 ways I can think of to do this:
i) Parse every JSON log line and convert the string to the ISODate type in the application code, then insert it. Cons: I'll have to parse each and every line before pushing it to mongoDB, which is going to be a little expensive.
ii) After every insert, run a query to convert the last inserted document's string datetime to ISODate using
element.Datetime = ISODate(element.Datetime);
Cons: again expensive, as I'm going to be running one extra query per insert.
iii) Modify my logs at the generation point so that I don't have to do any parsing at the application-code level or run an update query after every insert.
Also, just curious: is there a way I can configure mongoDB to auto-convert datetime strings to its native ISODate format?
TIA
EDIT:
I'm using pymongo to insert the JSON logs.
My file looks something like this:
{"LogLevel":"error","Datetime":"2013-06-21T11:20:17Z","Module":"DB","Method":"ExecuteSelect","Request":"WS_VALIDATE","Error":"Procedure or function 'WS_VALIDATE' expects parameter '#LOGIN_ID', which was not supplied."}
There are hundreds of lines like the one mentioned above.
And this is how I'm inserting them into mongodb:
for line in logfile:
    collection.insert(json.loads(line))
The following will fix my problem:
for line in logfile:
    data = json.loads(line)
    data["Datetime"] = datetime.strptime(data["Datetime"], "%Y-%m-%dT%H:%M:%SZ")
    collection.insert(data)
What I want to do is get rid of the extra manipulation of the datetime I'm having to do above. Hope this clarifies the problem.
Looks like you already have the answer... I would stick with:
for line in logfile:
    data = json.loads(line)
    data["Datetime"] = datetime.strptime(data["Datetime"], "%Y-%m-%dT%H:%M:%SZ")
    collection.insert(data)
I had a similar problem, but I didn't know beforehand where I should replace strings with datetime objects. So I changed my JSON information to something like:
{"LogLevel":"error","Datetime":{"__timestamp__": "2013-06-21T11:20:17Z"},"Module":"DB","Method":"ExecuteSelect","Request":"WS_VALIDATE","Error":"Procedure or function 'WS_VALIDATE' expects parameter '#LOGIN_ID', which was not supplied."}
and parsed json with:
json.loads(data, object_hook=logHook)
with 'logHook' defined as:
def logHook(d):
    if '__timestamp__' in d:
        return datetime.strptime(d['__timestamp__'], "%Y-%m-%dT%H:%M:%SZ")
    return d
This logHook function could also be extended to replace many other 'variables' with elif, elif, ...
Hope this helps!
Also, just curious, is there a way I can configure mongoDB to auto convert datetime strings to its native isodate format ?
You probably want to create a Python datetime object for the timestamp, and insert that using PyMongo. This is stored under the hood as the native date object in MongoDB.
So, for example in Python:
from datetime import datetime
object_with_timestamp = { "timestamp": datetime.now() }
your_collection.insert(object_with_timestamp)
When this object gets queried from the Mongo shell, an ISODate object is present:
"timestamp" : ISODate("2013-06-24T09:29:58.615Z")
It depends on what language/driver/utility you're using to push the logs. I am assuming you're using mongoimport.
mongoimport doesn't support ISODate(). Refer to this issue: https://jira.mongodb.org/browse/SERVER-5543. ISODate() is not valid JSON, hence not supported by mongoimport.
Approach i) seems more efficient; ii) does two operations on Mongo: an insert and an update. I had the same issue while importing some log data into Mongo. I ended up converting the ISO 8601 date to epoch format:
{"LogLevel":"error","Datetime":{"$date" : 1371813617000},"Module":"DB","Method":"ExecuteSelect","Request":"WS_VALIDATE","Error":"Procedure or function 'WS_VALIDATE' expects parameter '#LOGIN_ID', which was not supplied."}
The JSON above should work. Note that the epoch value is in milliseconds (64-bit), not seconds (32-bit).
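A small sketch of that conversion step, turning the ISO 8601 string from the log line into the epoch-milliseconds $date form before handing it to mongoimport (the sample line is shortened):
# Sketch: rewrite the Datetime field as epoch milliseconds for mongoimport.
import json
from datetime import datetime, timezone

def to_epoch_millis(iso_string):
    dt = datetime.strptime(iso_string, "%Y-%m-%dT%H:%M:%SZ")
    return int(dt.replace(tzinfo=timezone.utc).timestamp() * 1000)

line = '{"LogLevel":"error","Datetime":"2013-06-21T11:20:17Z","Module":"DB"}'
doc = json.loads(line)
doc["Datetime"] = {"$date": to_epoch_millis(doc["Datetime"])}
print(json.dumps(doc))
# {"LogLevel": "error", "Datetime": {"$date": 1371813617000}, "Module": "DB"}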