MongoDB: searching for date strings in a range

I have a DB with entries in Universal Time such as this:
{"problemDate": "Mon May 11 2020 13:09:14 GMT+0000 (Coordinated Universal Time)"},
{"problemDate": "Mon May 11 2020 13:09:14 GMT+0000 (Coordinated Universal Time)"},
{"problemDate": "Mon Dec 07 2020 14:08:26 GMT+0000 (Coordinated Universal Time)"},
{"problemDate": "Tue May 12 2020 00:18:21 GMT+0000 (Coordinated Universal Time)"}
I am trying to query for a range of dates. My DB has around 2k entries with dates ranging from May to December. If I query for dates greater than December 7th, the results still include dates such as Mon May 11. I believe this is because my dates are stored as strings. Is there a workaround? Any help would be appreciated.
This was my query:
db.problem.find({"problemDate":{ $gte:("Mon Dec 07 2020 14:08:24 GMT+0000 (Coordinated Universal Time)") }});

Your guess is right: string values are compared character by character, so "Mon May ..." sorts after "Mon Dec ...". You could run an update on the collection to convert the dates from strings to ISODate.
The following script is designed to run in a mongo shell. It assumes that you have selected the database (use databasename) and that your collection is named 'datestest':
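db.getCollection('datestest').find({}).forEach(function (item) {
    // Only touch documents whose problemDate is still a string.
    if (typeof item.problemDate === "string") {
        print(item.problemDate);
        // Parse the string into a real Date so range queries compare by time.
        item.problemDate = new Date(item.problemDate);
        db.getCollection('datestest').save(item);
    }
});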
You can first save the converted "bad dates" to a different collection to check that the update is correct. Just replace the save call with one that targets a different collection name:
db.getCollection('datestest_check').save(item);
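To see why the original string query misbehaves, here is a minimal Python sketch, separate from the mongo shell script. It assumes every value has the same shape as the samples in the question; the helper name parse_problem_date is hypothetical.

```python
from datetime import datetime

def parse_problem_date(text):
    """Parse 'Mon May 11 2020 13:09:14 GMT+0000 (Coordinated Universal Time)'."""
    # Drop the trailing "(Coordinated Universal Time)" comment before parsing.
    stamp = text.split(" (")[0]
    # %a and %b expect English day and month names (the default C locale).
    return datetime.strptime(stamp, "%a %b %d %Y %H:%M:%S GMT%z")

may = "Mon May 11 2020 13:09:14 GMT+0000 (Coordinated Universal Time)"
dec = "Mon Dec 07 2020 14:08:24 GMT+0000 (Coordinated Universal Time)"

# As strings, the comparison runs character by character: "Mon M" > "Mon D".
string_match = may >= dec
# As real dates, May 11 is before December 7.
date_match = parse_problem_date(may) >= parse_problem_date(dec)

print(string_match, date_match)
```

The string comparison wrongly reports the May entry as "greater than" the December bound, while the parsed comparison does not. This is the same effect the $gte query shows on string fields.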

AWS Glue Dynamic Frame Pushdown Predicate List

When using a pushdown predicate with an AWS Glue DynamicFrame, how does it iterate through a list?
For example, the following list was created to be used as a pushdown predicate:
day=list(p_day.select('day').toPandas()['day'])
month=list(p_month.select('month').na.drop().toPandas()['month'])
year=list(p_year.select('year').toPandas()['year'])
predicate = "day in (%s) and month in (%s) and year in (%s)"%(",".join(map(lambda s: "'"+str(s)+"'",day))
,",".join(map(lambda s: "'"+str(s)+"'",month))
,",".join(map(lambda s: "'"+str(s)+"'",year)))
Let's say it returns this:
"day in ('07','15') and month in ('11','09','08') and year in ('2021')"
How would the push down predicate read this combination/list?
Is it:
day  month  year
07   11     2021
15   11     2021
07   09     2021
15   09     2021
07   08     2021
15   08     2021
-OR-
day  month  year
07   11     2021
15   11     2021
15   08     2021
15   09     2021
I have a feeling that this list is read like the first table rather than the second. But it's the second that I would like to pass through as a pushdown predicate. Does creating a list essentially produce every combination? It's as if the true day, month, and year combinations are lost in the list, which should be 11/7/2021, 11/15/2021, 08/15/2021, and 09/15/2021.
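This has nothing to do with Glue itself, since the partition predicate is just basic Spark SQL. Each IN clause is evaluated independently and the clauses are combined with AND, so every day/month/year combination matches. You will receive the first list and not the second. You would have to restructure the boolean expression to receive the second list.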
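One way to restructure the expression is to emit one AND group per exact date and join the groups with OR. The following is only a sketch: the helper names are hypothetical, and it assumes the partition columns are strings named day, month, and year, as in the question.

```python
from itertools import product

def in_list_predicate(days, months, years):
    """Predicate in the style of the question: three independent IN lists."""
    quote = lambda values: ",".join("'%s'" % v for v in values)
    return "day in (%s) and month in (%s) and year in (%s)" % (
        quote(days), quote(months), quote(years))

def exact_dates_predicate(dates):
    """One AND group per (day, month, year) tuple, joined with OR."""
    groups = ["(day='%s' and month='%s' and year='%s')" % d for d in dates]
    return " or ".join(groups)

days, months, years = ["07", "15"], ["11", "09", "08"], ["2021"]

# The IN-list form matches every combination: 2 * 3 * 1 = 6 partitions.
matched_by_in_lists = list(product(days, months, years))

# Only these four partitions are actually wanted.
wanted = [("07", "11", "2021"), ("15", "11", "2021"),
          ("15", "08", "2021"), ("15", "09", "2021")]
predicate = exact_dates_predicate(wanted)
print(predicate)
```

The resulting string can then be passed wherever the original predicate was used, for example as the push_down_predicate argument when creating the DynamicFrame.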

MongoDB/mongoose some dates as EDT and others as EST

When getting some records, I'm seeing that some dates are showing as GMT-0400 (EDT) and others are GMT-0500 (EST).
Dates are being added in mongoose simply by using Date.now in the schema.
Any ideas what could cause the offsets to be different?
Edit: Here's an example:
Stored as: ISODate("2015-10-30T15:36:47.287Z") Returned as: Fri Oct 30 2015 11:36:47 GMT-0400 (EDT) using find() with Mongoose
Stored as: ISODate("2015-11-07T14:44:47.956Z") Returned as: Sat Nov 07 2015 09:44:47 GMT-0500 (EST) using find() with Mongoose
It's just the default string representation of the underlying UTC timestamp, showing the timestamp in the local time zone that was in effect at the time.
In 2015, daylight saving time ended at 2:00 AM on Nov 1, so that's why the first one is shown in EDT and the second one is shown in EST.
Any queries you do are always performed using UTC time.
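The effect can be reproduced outside Mongoose. This Python sketch converts the two stored UTC timestamps (milliseconds dropped) to local time. It assumes the machine's zone is America/New_York, which the EDT/EST labels suggest but the question does not state, and it needs the system time zone database to be available.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Assumption: the local zone is US Eastern, as the EDT/EST labels suggest.
eastern = ZoneInfo("America/New_York")

# The two stored values from the question, as UTC instants.
stored = [
    datetime(2015, 10, 30, 15, 36, 47, tzinfo=timezone.utc),
    datetime(2015, 11, 7, 14, 44, 47, tzinfo=timezone.utc),
]

# Same instants, rendered with the offset in effect on each date.
local = [moment.astimezone(eastern) for moment in stored]
for moment in local:
    print(moment.strftime("%a %b %d %Y %H:%M:%S"), moment.tzname())
```

The first value falls before the November 1 switch and is rendered with the EDT offset; the second falls after it and is rendered with the EST offset. The stored instants themselves are unchanged.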

twitter4j - setSince and setUntil don't work

I'm having a problem filtering tweets by specific dates: with setUntil I get no tweets, and with setSince I get only recent tweets. The code follows, and after it the output.
// Assumes `twitter` is an already-authenticated twitter4j.Twitter field.
public void readTweetFromKeyword(String keywordString) throws TwitterException
{
    twitter4j.Query query = new twitter4j.Query("#clt20");
    query.setSince("2014-12-12");
    int cont = 0;
    QueryResult result = twitter.search(query);
    for (Status status : result.getTweets())
    {
        System.out.print("original " + status.getId());
        System.out.println("\t\tdata " + status.getCreatedAt());
        // Print only tweets that are not retweets.
        if (!status.getText().startsWith("RT")) {
            System.out.println(status.getText());
            cont++;
        }
    }
    System.out.println(result.getTweets().size());
    System.out.println("cont = " + cont);
}
CONSOLE:
original 619433499116896256 data Fri Jul 10 06:10:29 GMT-03:00 2015
If the #BCCI is looking for an alternative to #Clt20, then how about a
league of teams consisting of only Indian players ?
original 619408117495939072 data Fri Jul 10 04:29:37 GMT-03:00 2015
#TesT, #ODI, #T20I, #IPL, #CLT20 Live record, score, history shedule ke lia, Follow #PTV_SpOrtsOne snt to 40404.
original 619330143258050560 data Thu Jul 09 23:19:47 GMT-03:00 2015
Need 66 From 6 Balls. Kinda Impossible #clt20
original 619301555532120065 data Thu Jul 09 21:26:11 GMT-03:00 2015
Kamran Akmals feet are stuck #soshit #CLT20
original 619095093962608640 data Thu Jul 09 07:45:47 GMT-03:00 2015
original 619095079983017984 data Thu Jul 09 07:45:43 GMT-03:00 2015
original 619095051524665344 data Thu Jul 09 07:45:37 GMT-03:00 2015
original 619095028304973825 data Thu Jul 09 07:45:31 GMT-03:00 2015
original 619094989943902209 data Thu Jul 09 07:45:22 GMT-03:00 2015
original 619094910516400129 data Thu Jul 09 07:45:03 GMT-03:00 2015
original 619094893441363969 data Thu Jul 09 07:44:59 GMT-03:00 2015
original 619035151578722304 data Thu Jul 09 03:47:35 GMT-03:00 2015
#abhisek_taneja Games r played in Himachal Pradesh every year if u go
through the schedule of #IPL & #CLT20 properly
original 618914815730290688 data Wed Jul 08 19:49:25 GMT-03:00 2015
original 618908444939186177 data Wed Jul 08 19:24:06 GMT-03:00 2015
original 618862474687705088 data Wed Jul 08 16:21:26 GMT-03:00 2015
We as #T20 follower , #clt20 should be oganized #CLT20
15
cont = 6
Thanks a lot!!
If you set an until date, keep in mind this from the documentation:
Returns tweets generated before the given date. Date should be
formatted as YYYY-MM-DD. Keep in mind that the search index may not go
back as far as the date you specify here.
And this too:
Before getting involved, it’s important to know that the Search API is
focused on relevance and not completeness. This means that some Tweets
and users may be missing from search results. If you want to match for
completeness you should consider using a Streaming API instead.
So, if you set an until date that is too old, you could get zero tweets. On the other hand, if you set a since date that is too old, you could get only tweets from the past few days, as you see in the console.

Converting date into PostgreSQL format

I have a date time in format: Mon Jun 11 12:16:14 EDT 2013
I want to insert it into Postgres as a date attribute, but Postgres always inserts the current date and time. I think the issue is the format. The normal date format in Postgres is something like: 2012-06-13 04:24:45
How can I convert Mon Jun 11 12:16:14 EDT 2013 into a format compatible with Postgres?
Thank you!!!
You want the to_date or to_timestamp function.
You give it your date string and a pattern describing how to parse it. For your example it would be:
select to_timestamp('Mon Jun 11 12:16:14 EDT 2013', 'Dy Mon DD HH24:MI:SS ??? YYYY');
to_timestamp
------------------------
2013-06-11 12:16:14+01
Unfortunately, I don't think you can work with the time zone using these functions (hence the ???).
You should also be able to just cast the string:
select 'Mon Jun 11 12:16:14 EDT 2013'::timestamptz;
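If the conversion has to happen on the client before the insert, the string can also be rewritten there. The sketch below is only an illustration in Python: the function name and the ZONE_OFFSETS table are hypothetical, and the table must list every abbreviation your data contains, because abbreviations such as EDT are not parsed reliably by standard date routines.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical lookup table: abbreviations are ambiguous, so list only the
# ones the input is known to contain.
ZONE_OFFSETS = {"EDT": timedelta(hours=-4), "EST": timedelta(hours=-5)}

def to_postgres_timestamp(text):
    """Rewrite 'Mon Jun 11 12:16:14 EDT 2013' as an ISO-style timestamp."""
    # The weekday is redundant and the zone is handled separately.
    _weekday, month, day, clock, zone, year = text.split()
    naive = datetime.strptime(f"{month} {day} {clock} {year}",
                              "%b %d %H:%M:%S %Y")
    aware = naive.replace(tzinfo=timezone(ZONE_OFFSETS[zone]))
    # A space separator gives the 'YYYY-MM-DD HH:MM:SS+offset' layout.
    return aware.isoformat(sep=" ")

print(to_postgres_timestamp("Mon Jun 11 12:16:14 EDT 2013"))
# prints: 2013-06-11 12:16:14-04:00
```

The output keeps the UTC offset, so it can be inserted into a timestamptz column without losing the original zone information.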

Can't get pubDate to output in Yahoo! Pipes?

In one of my RSS feeds in Yahoo! Pipes, I'm formatting dates using the Date Formatter module with the format %K so they are pubDate-compliant. In Pipe Output, my four dates appear as follows: Wed, 25 Jul 2012 03:30:00 +0000, Mon, 16 Jul 2012 06:30:00 +0000, Wed, 11 Jul 2012 07:00:00 +0000, and Wed, 27 Jun 2012 13:00:00 +0000.
However, in the RSS feed output, none of these dates appear. Are they formatted incorrectly? Why does Yahoo! Pipes not output these dates?
Okay, so I now realize that I need to output dates to y:published rather than pubDate. This doesn't seem to be widely documented. Even Googling y:published doesn't return many results.
Here are the more detailed steps:
1. You have an easy-to-read date such as 8 Jan 2013 in its own field, such as pubDate (name doesn't matter; it's just used in Step 2).
2. Connect your feed to a Loop module. Inside that module, put the Date Builder module, and specify the field where the date is found (such as pubDate).
3. Still in the Loop module, select "assign results to" and enter item.y:published.
That should output the date in the RSS output in the pubDate field, and it should therefore be readable in any RSS reader.