How to get sysdate in Talend using the MongoDB connector

I have Talend and MongoDB on two different servers in two different time zones (these time zones may change in the future). I am trying to set a context variable in Talend to MongoDB's sysdate. I have tried a few approaches using tMongoDBInput and tMongoDBRow with new Date(), new ISODate() and the $currentDate operator, but I get either a JSON parse exception or a "Bad Command" error. I am looking for the MongoDB equivalent of "select sysdate from dual". Is there any way to get sysdate like this in Talend using MongoDB?

I'm not sure if you still need this, but you can write your MongoDB query like this:
{'comparison_date':{$lt:{$date: '" + context.date + "'}}}
$lt = less than comparison
$date = date object
Having your date either stored in a context variable named 'date' like this: "2019-01-17T00:00:00.000Z" or hard-coded in the query is up to you; also, I guess the time portion could be optional.
To set the current date, you can use a tJava component and assign it to the context variable.
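Not part of the original answer, but since the question asks for MongoDB's own sysdate (the database server's clock rather than the Talend machine's), one option is to read the localTime field returned by the serverStatus command. A minimal pymongo sketch, assuming a reachable server with default connection settings (shown in pymongo only to illustrate the underlying MongoDB command):
from pymongo import MongoClient

client = MongoClient()  # assumes a locally reachable mongod

# serverStatus reports the server's current clock in its "localTime" field,
# which pymongo returns as a Python datetime in UTC
server_now = client.admin.command("serverStatus")["localTime"]
print(server_now)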

Related

Converting date string (YYYY-MM-DD) to datetime object pymongo

Before bulk inserting with insert_many(), I need to change the "Date" field of each document from a string in the format 'YYYY-MM-DD' (for example '2020-02-28') to a datetime object that can be used in Mongo for later purposes...
Is there a possible way of doing this using pymongo?
So my idea would look something like this
dict["Date"] = Mongo_Date(dict["Date"]) #converting the original string to a date object
outputList.append(dict)
#Later on in code
mycol.insert_many(outputList)
Is there any easy way of doing this with pymongo?
A couple of possibilities come to mind (both are sketched below):
use the Python map function to modify all of the objects at once before inserting
insert the objects into MongoDB as-is, and then use an update with $dateFromString to convert them server-side
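Not from the original answer, but here is a minimal sketch of both options, assuming a hypothetical collection mycol and documents shaped like the question's; the server-side variant assumes MongoDB 4.2+ for pipeline-style updates:
from datetime import datetime
from pymongo import MongoClient

mycol = MongoClient()["testdb"]["testcol"]  # hypothetical database/collection names

def convert_date(doc):
    # 'YYYY-MM-DD' string -> datetime object, stored by MongoDB as a BSON date
    doc["Date"] = datetime.strptime(doc["Date"], "%Y-%m-%d")
    return doc

# Option 1: convert in Python with map() before the bulk insert
outputList = [{"Date": "2020-02-28"}, {"Date": "2020-03-01"}]
mycol.insert_many(list(map(convert_date, outputList)))

# Option 2: insert the raw strings first, then convert server-side with $dateFromString
mycol.update_many(
    {"Date": {"$type": "string"}},
    [{"$set": {"Date": {"$dateFromString": {"dateString": "$Date", "format": "%Y-%m-%d"}}}}],
)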

querying documents in a date range using Mongo built-in 'timestamp'

I'm aware that the Mongo "ObjectId" has the method "getTimestamp()" , which works like
ObjectId("507f191e810c19729de860ea").getTimestamp()
And also I'm aware that it can be sorted based on built-in 'timestamp'
db.collection.find().sort({'timestamp': -1})
I know I can create a new field "created_time" in each document by converting ObjectId to created_time, then query based on this new field.
I've also read this post, which converts the date range to ObjectIds and then compares the ObjectIds directly, but with this method I'm worried about the other bytes, which are not for time but for the machine and process.
My question is: is there a way to directly query documents in a date range using Mongo's built-in 'timestamp', without an extra field or extra effort?
Something like the command below (I tried it and it does not work), which would directly query Mongo using its built-in timestamp:
db.collection.find({'timestamp':{$gt: new Date(ISODate("2015-08-14T14:00:00Z"))}})
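Not an answer from the original thread, but a minimal pymongo sketch of the ObjectId-range approach mentioned above: bson's ObjectId.from_datetime() builds a boundary id whose machine/process/counter bytes are zeroed, so it is safe to use purely for range comparisons (collection name and date range below are hypothetical):
from datetime import datetime, timezone
from bson.objectid import ObjectId
from pymongo import MongoClient

collection = MongoClient()["testdb"]["testcol"]  # hypothetical names

# Boundary ObjectIds carry only the timestamp portion
start = ObjectId.from_datetime(datetime(2015, 8, 14, 14, 0, tzinfo=timezone.utc))
end = ObjectId.from_datetime(datetime(2015, 8, 15, 14, 0, tzinfo=timezone.utc))

for doc in collection.find({"_id": {"$gte": start, "$lt": end}}):
    print(doc["_id"].generation_time, doc)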

get day of the week from mongodb datetime query

I am constructing a database where I might want to query a day of the week. Is it possible to use mongodb to query days in the week in a datetime (or utc timestamp) field?
Something like: get every object that has a datetime that falls on a Monday.
If it is not possible, then the alternative seems to be to create dummy variables in the collection that record what day of the week it was. Preferably I would like to only query the datetime object for this, as this would keep the database smaller.
There are three solutions that I can think of:
Your solution: create an extra "day_of_week" field, either an int or string, and then query against this field rather than the datetime field.
Query for everything in your collection, and then filter the results by day of the week on the client side.
Use $where, passing a javascript function which calls date.getDay(). For example, {$where: function () { return this.date.getDay() == 5; }} for getting every date on a Friday.
Solution #2 would call datetime.date.weekday() in pymongo on the client side. The downside of this method is that every document in the collection will end up being sent over the wire, which could add unnecessary network load. It's better than #1, however, in that it's more space efficient and you don't have duplicated information to keep in sync. Solution #3 has neither of these problems, but $where is slow because it requires the server to create a JavaScript execution context and cannot make use of indexes.
Pymongo can return Mongo BSON timestamp fields as python datetimes: http://api.mongodb.org/python/current/api/bson/timestamp.html
From there you can call datetime.date.weekday()
http://docs.python.org/2/library/datetime.html#datetime.date
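A minimal pymongo sketch of the client-side option (#2 above), assuming a hypothetical collection whose documents have a BSON date field named "date"; PyMongo returns such fields as Python datetime objects, so weekday() works directly:
from pymongo import MongoClient

collection = MongoClient()["testdb"]["events"]  # hypothetical names

# Fetch everything and filter on the client;
# datetime.weekday() returns 0 for Monday through 6 for Sunday
mondays = [doc for doc in collection.find() if doc["date"].weekday() == 0]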

Querying on Date in Mongo

I'm inserting a Mongo doc with the following time-stamp:
val format = new java.text.SimpleDateFormat("yyyyMMddHHmmss")
format.format(new Date()).toLong
Here's what the section looks like from Mongo's shell:
"{Timestamp" : NumberLong("20130919161948")}"
Based on a few tests, it appears that I can compare two documents by Timestamp simply by checking > or < on the yyyyMMddHHmmss format.
Please let me know if this time-stamp is OK for Mongo. Will I be able to query with it?
Mongo will not understand this as a timestamp, but as a number. Since you set your date with a format going from year down to seconds, you will be able to query Mongo using > or < to know whether one is before or after another.
However, if you want Mongo to treat the data as a date, you will need to use the appropriate BSON date format. By having Mongo treat it as a date, you will have all of Mongo's date operations available, like extracting the year, day of week, etc.
If you are using casbah, and Joda, you can enable serialization and deserialization by an explicit call:
import com.mongodb.casbah.commons.conversions.scala._
RegisterJodaTimeConversionHelpers()
Read more here.
@Kevin, I think you are right. java.util.Date is supported in BSON objects.
Using NumberLong to represent the timestamp allows you to do range queries, but with the BSON date type, date operations in the aggregation framework become possible, which is more powerful.
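A minimal pymongo sketch of the difference, with hypothetical collection and field names: the NumberLong form still supports range comparisons, but only a real BSON date unlocks aggregation date operators such as $year and $dayOfWeek:
from datetime import datetime
from pymongo import MongoClient

collection = MongoClient()["testdb"]["events"]  # hypothetical names

# Stored as a BSON date rather than a NumberLong like 20130919161948
collection.insert_one({"Timestamp": datetime(2013, 9, 19, 16, 19, 48)})

# Range queries work with either representation...
recent = collection.find({"Timestamp": {"$gt": datetime(2013, 9, 19)}})

# ...but aggregation date operators only work on real BSON dates
pipeline = [{"$project": {"year": {"$year": "$Timestamp"},
                          "dayOfWeek": {"$dayOfWeek": "$Timestamp"}}}]
print(list(collection.aggregate(pipeline)))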

Convert an ISODate string to MongoDB's native ISODate data type

My application generates logs in JSON format. The logs look something like this:
{"LogLevel":"error","Datetime":"2013-06-21T11:20:17Z","Module":"DB","Method":"ExecuteSelect","Request":"WS_VALIDATE","Error":"Procedure or function 'WS_VALIDATE' expects parameter '#LOGIN_ID', which was not supplied."}
Currently, I'm pushing in the aforementioned log line as it is into mongoDB. But mongoDB stores the Datetime as a string (which is expected). Now that I want to run some data crunching job on these logs, I'd prefer to store the Datetime as mongoDB's native ISODate data type.
There are 3 ways I can think of for doing this :
i) parse every JSON log line and convert the string to ISODate type in the application code and then insert it. Cons : I'll have to parse each and every line before pushing it to mongoDB, which is going to be a little expensive
ii) After every insert run a query to convert the last inserted document's string date time to ISODate using
element.Datetime = ISODate(element.Datetime);
Cons : Again expensive, as I'm gonna be running one extra query per insert
iii) Modify my logs at generation point so that I don't have to do any parsing at application code level, or run an update query after every insert
Also, just curious, is there a way I can configure mongoDB to auto convert datetime strings to its native isodate format ?
TIA
EDIT:
I'm using pymongo for inserting the json logs
My file looks something like this :
{"LogLevel":"error","Datetime":"2013-06-21T11:20:17Z","Module":"DB","Method":"ExecuteSelect","Request":"WS_VALIDATE","Error":"Procedure or function 'WS_VALIDATE' expects parameter '#LOGIN_ID', which was not supplied."}
There are hundreds of lines like the one mentioned above.
And this is how I'm inserting them into mongodb:
for line in logfile:
    collection.insert(json.loads(line))
The following will fix my problem:
for line in logfile:
    data = json.loads(line)
    data["Datetime"] = datetime.strptime(data["Datetime"], "%Y-%m-%dT%H:%M:%SZ")
    collection.insert(data)
What I want to do is get rid of the extra manipulation of datetime I'm having to do above. Hope this clarifies the problem.
Looks like you already have the answer... I would stick with:
for line in logfile:
    data = json.loads(line)
    data["Datetime"] = datetime.strptime(data["Datetime"], "%Y-%m-%dT%H:%M:%SZ")
    collection.insert(data)
I had a similar problem, but I didn't know beforehand where I should replace the string with a datetime object. So I changed my JSON information to something like:
{"LogLevel":"error","Datetime":{"__timestamp__": "2013-06-21T11:20:17Z"},"Module":"DB","Method":"ExecuteSelect","Request":"WS_VALIDATE","Error":"Procedure or function 'WS_VALIDATE' expects parameter '#LOGIN_ID', which was not supplied."}
and parsed json with:
json.loads(data, object_hook=logHook)
with 'logHook' defined as:
def logHook(d):
    if '__timestamp__' in d:
        return datetime.strptime(d['__timestamp__'], "%Y-%m-%dT%H:%M:%SZ")
    return d
This logHook function could also be extended to replace many other 'variables' with elif, elif, ...
Hope this helps!
Also, just curious, is there a way I can configure mongoDB to auto convert datetime strings to its native isodate format ?
You probably want to create a Python datetime object for the timestamp, and insert that using PyMongo. This is stored under the hood as the native date object in MongoDB.
So, for example in Python:
from datetime import datetime
object_with_timestamp = { "timestamp": datetime.now() }
your_collection.insert(object_with_timestamp)
When this object gets queried from the Mongo shell, an ISODate object is present:
"timestamp" : ISODate("2013-06-24T09:29:58.615Z")
It depends on what language/driver/utility you're using to push the logs. I am assuming you're using mongoimport.
mongoimport doesn't support ISODate(). Refer to this issue: https://jira.mongodb.org/browse/SERVER-5543. ISODate() is not valid JSON, hence it is not supported by mongoimport.
Approach i) seems more efficient; ii) does two actions on Mongo: an insert and an update. I had the same issue while importing some log data into Mongo, and I ended up converting the ISO 8601 date to epoch format.
{"LogLevel":"error","Datetime":{"$date" : 1371813617000},"Module":"DB","Method":"ExecuteSelect","Request":"WS_VALIDATE","Error":"Procedure or function 'WS_VALIDATE' expects parameter '#LOGIN_ID', which was not supplied."}
The JSON above should work. Note that it is the 64-bit epoch in milliseconds, not the 32-bit epoch in seconds.
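Not part of the original answer, but a small sketch of how the 64-bit millisecond epoch for the $date field could be computed from the ISO 8601 string before writing each line out for mongoimport (field layout follows the example above):
import json
from datetime import datetime, timezone

log = {"LogLevel": "error", "Datetime": "2013-06-21T11:20:17Z", "Module": "DB"}

# Parse the ISO 8601 string and convert it to milliseconds since the Unix epoch
parsed = datetime.strptime(log["Datetime"], "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
log["Datetime"] = {"$date": int(parsed.timestamp() * 1000)}

print(json.dumps(log))  # Datetime becomes {"$date": 1371813617000}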