When using a pushdown predicate with an AWS Glue DynamicFrame, how does it iterate through a list?
For example, the following lists were created to build a pushdown predicate:
day = list(p_day.select('day').toPandas()['day'])
month = list(p_month.select('month').na.drop().toPandas()['month'])
year = list(p_year.select('year').toPandas()['year'])
# Quote every value and join with commas, e.g. ['07', '15'] -> "'07','15'"
predicate = "day in (%s) and month in (%s) and year in (%s)" % (
    ",".join(map(lambda s: "'" + str(s) + "'", day)),
    ",".join(map(lambda s: "'" + str(s) + "'", month)),
    ",".join(map(lambda s: "'" + str(s) + "'", year)),
)
Let's say it returns this:
"day in ('07','15') and month in ('11','09','08') and year in ('2021')"
How would the pushdown predicate interpret this combination of lists?
Is it:
day  month  year
07   11     2021
15   11     2021
07   09     2021
15   09     2021
07   08     2021
15   08     2021
-OR-
day  month  year
07   11     2021
15   11     2021
15   08     2021
15   09     2021
I suspect the predicate is read like the first table rather than the second, but it's the second that I want to pass as a pushdown predicate. Does building the predicate from separate lists essentially produce every combination of their values? It's as if the true day, month, and year combinations, which should be 11/7/2021, 11/15/2021, 08/15/2021, and 09/15/2021, are lost in the lists.
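This has nothing to do with Glue itself, since the partition predicate is just a basic Spark SQL boolean expression. Each in clause is checked on its own and the three clauses are joined with and, so a partition matches whenever its day, its month, and its year each appear in their own list. You will therefore get the first table, not the second. To get only the second, you have to restructure the boolean expression, for example as an or of fully specified conditions such as (day = '07' and month = '11' and year = '2021').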
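A minimal plain-Python sketch of the point above (no Spark or Glue needed): the AND of three IN clauses accepts the Cartesian product of the lists, while an OR of per-tuple conditions selects only the listed combinations. The helper names in_list_matches and exact_predicate are hypothetical, and the sketch assumes the wanted combinations are already available as (day, month, year) string tuples. The resulting string could then be passed as the push_down_predicate argument of glueContext.create_dynamic_frame.from_catalog.

```python
from itertools import product


def in_list_matches(days, months, years):
    # "day in (...) and month in (...) and year in (...)" accepts a partition
    # whenever each value is in its own list, i.e. every combination in the
    # Cartesian product of the three lists.
    return set(product(days, months, years))


def exact_predicate(partitions):
    # OR together one fully specified clause per (day, month, year) tuple,
    # so only those exact partitions match.
    return " or ".join(
        "(day = '%s' and month = '%s' and year = '%s')" % (d, m, y)
        for d, m, y in partitions
    )


matches = in_list_matches(["07", "15"], ["11", "09", "08"], ["2021"])
print(len(matches))  # 6

wanted = [("07", "11", "2021"), ("15", "11", "2021"),
          ("15", "08", "2021"), ("15", "09", "2021")]
print(exact_predicate(wanted))
```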
How can I apply DISTINCT to one column while still getting a few other columns from the table?
For example, something like this:
Select distinct date(date0), name from note
The goal is to get one result per distinct day (distinct date(date0)).
Sample data:
Date           Name
01st Jan 2021  Bohemian
01st Jan 2021  Bohemian
01st Jan 2021  Bohemian
02nd Jan 2021  Jack
03rd Jan 2021  John
Expected results:
Date           Name
01st Jan 2021  Bohemian
02nd Jan 2021  Jack
03rd Jan 2021  John
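You seem to be asking for this (since each day in the sample has a single name, DISTINCT over both columns already returns one row per day):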
Select distinct date, name from note
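To check the idea end to end, here is a small sketch using Python's built-in sqlite3 module and an in-memory table. The schema (a note table whose date0 column holds ISO-8601 timestamp text) and the sample times are assumptions for illustration; SQLite's own date() function stands in for whatever date() your database provides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE note (date0 TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO note VALUES (?, ?)",
    [
        ("2021-01-01 09:15:00", "Bohemian"),
        ("2021-01-01 12:30:00", "Bohemian"),
        ("2021-01-01 18:45:00", "Bohemian"),
        ("2021-01-02 10:00:00", "Jack"),
        ("2021-01-03 11:00:00", "John"),
    ],
)
# date(date0) truncates each timestamp to its day; DISTINCT then drops rows
# whose (day, name) pair has already been returned.
rows = conn.execute(
    "SELECT DISTINCT date(date0) AS day, name FROM note ORDER BY day"
).fetchall()
print(rows)
# [('2021-01-01', 'Bohemian'), ('2021-01-02', 'Jack'), ('2021-01-03', 'John')]
```

If one day had two different names, DISTINCT would return one row per (day, name) pair, so keeping a single row per day would then need GROUP BY with an aggregate such as MIN(name).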
I have columns with a year and an ISO week number. I would like to get the corresponding date (the Monday of that week), but my formula currently gives the wrong result.
I have the following table:
Year Week
2020 52
2020 53
2021 1
2021 2
In the Power Query Editor, I used the following formula:
Date.StartOfWeek(Date.AddWeeks(#date([Year], 1, 1), [Week]), Day.Monday)
and obtained:
Year Week Date
2020 52 28.12.2020
2020 53 04.01.2021
2021 1 04.01.2021
2021 2 11.01.2021
What I would like to have instead:
Year Week Date
2020 52 21.12.2020
2020 53 28.12.2020
2021 1 04.01.2021
2021 2 11.01.2021
For example, in DAX, this works:
Date = DATE([Year],1,-2)-WEEKDAY(DATE([Year],1,3))+[Week]*7
But I would prefer to do this in Power Query, because my data source needs to be refreshed regularly.
In case anyone has the same problem, this now works:
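Date.AddDays(
    // Start from 28 December of the previous year (1 January minus 4 days)
    Date.AddDays(#date([Year], 1, 1), -4),
    // Step back to that week's Monday, then add 7 days per week number.
    // Day.Monday makes the weekday count independent of culture settings.
    -Date.DayOfWeek(Date.AddDays(#date([Year], 1, 1), -4), Day.Monday) + [Week] * 7
)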
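For reference, here is the same arithmetic as a small Python sketch. The function name iso_week_monday is hypothetical; Python's date.weekday() (Monday == 0) plays the role of Date.DayOfWeek(..., Day.Monday). In Python 3.8+, date.fromisocalendar(year, week, 1) returns the same Monday directly and is a handy cross-check.

```python
from datetime import date, timedelta


def iso_week_monday(year: int, week: int) -> date:
    # Mirrors the Power Query formula: start 4 days before 1 January
    # (28 December of the previous year, which always falls in that year's
    # last ISO week), step back to the Monday of that week, then add one
    # week per week number.
    anchor = date(year, 1, 1) - timedelta(days=4)
    monday = anchor - timedelta(days=anchor.weekday())  # weekday(): Monday == 0
    return monday + timedelta(weeks=week)


for year, week in [(2020, 52), (2020, 53), (2021, 1), (2021, 2)]:
    print(year, week, iso_week_monday(year, week).strftime("%d.%m.%Y"))
# 2020 52 21.12.2020
# 2020 53 28.12.2020
# 2021 1 04.01.2021
# 2021 2 11.01.2021
```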
I have a MongoDB collection with entries in Coordinated Universal Time, such as these:
{"problemDate": "Mon May 11 2020 13:09:14 GMT+0000 (Coordinated Universal Time)"},
{"problemDate": "Mon May 11 2020 13:09:14 GMT+0000 (Coordinated Universal Time)"},
{"problemDate": "Mon Dec 07 2020 14:08:26 GMT+0000 (Coordinated Universal Time)"},
{"problemDate": "Tue May 12 2020 00:18:21 GMT+0000 (Coordinated Universal Time)"}
I am trying to query for a range of dates. My database has around 2k entries with dates ranging from May to December. If I query for dates later than December 7th, the results still include dates such as Mon May 11. I believe this is because my dates are stored as plain strings, so they are compared character by character rather than chronologically. Is there a workaround for this? Any help would be appreciated.
This was my query:
db.problem.find({"problemDate": {$gte: "Mon Dec 07 2020 14:08:24 GMT+0000 (Coordinated Universal Time)"}});
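A small Python sketch of why the string query misbehaves, assuming MongoDB's default simple (binary) string comparison: "Mon May..." sorts after "Mon Dec..." because "M" comes after "D", so May dates pass the $gte test. Parsing the strings into real datetimes, which is what converting the field to dates achieves in the database, restores chronological comparison. The helper parse_js_date is hypothetical and assumes every value has exactly this "... GMT+0000 (Coordinated Universal Time)" shape.

```python
from datetime import datetime


def parse_js_date(s: str) -> datetime:
    # Drop the trailing "(Coordinated Universal Time)" label, then parse
    # e.g. "Mon May 11 2020 13:09:14 GMT+0000" into a timezone-aware datetime.
    return datetime.strptime(s.split(" (")[0], "%a %b %d %Y %H:%M:%S GMT%z")


docs = [
    "Mon May 11 2020 13:09:14 GMT+0000 (Coordinated Universal Time)",
    "Mon Dec 07 2020 14:08:26 GMT+0000 (Coordinated Universal Time)",
    "Tue May 12 2020 00:18:21 GMT+0000 (Coordinated Universal Time)",
]
cutoff = "Mon Dec 07 2020 14:08:24 GMT+0000 (Coordinated Universal Time)"

# String comparison is character by character, so the May dates also pass.
by_string = [d for d in docs if d >= cutoff]
# Comparing parsed datetimes is chronological.
by_date = [d for d in docs if parse_js_date(d) >= parse_js_date(cutoff)]
print(len(by_string), len(by_date))  # 3 1
```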
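You could run an update on the collection to convert the dates from strings to ISODate values. Once they are stored as real dates, $gte compares them chronologically, as long as you also query with a date (for example ISODate("2020-12-07T14:08:24Z")) instead of a string.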
The following script is designed to run in the mongo shell. It assumes that you have already selected the database (use databasename) and that your collection is named 'datestest':
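db.getCollection('datestest').find({}).forEach(function (item) {
    // Only convert documents whose date is still stored as a string
    if (typeof item.problemDate === "string") {
        print(item.problemDate);
        // new Date() parses the "Mon May 11 2020 13:09:14 GMT+0000 (...)" format
        item.problemDate = new Date(item.problemDate);
        db.getCollection('datestest').save(item);
    }
});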
You can first save the "bad dates" to a different collection to check that the update worked correctly. Just replace the save(item) line with one that uses a different collection name:
db.getCollection('datestest_check').save(item);
My data contains dates such as:
02 Aug 2018
03 Aug 2018
04 Aug 2018
...
30 Aug 2018
Now I want the start-of-month date from a DAX formula, which is 01/08/2018. But the earliest date in my data is 02/08/2018, which is not what I want.
I tried the formula below:
Start_Monthdate = STARTOFMONTH(EStart_Date[Date])
With the formula above I get 02 Aug 2018, which is not what I want.
In DAX, what you can do is use the EOMONTH function.
https://dax.guide/eomonth/
Column Name = EOMONTH(table[date], -1) + 1
The DAX above finds the end of the previous month, then adds 1 day to it.
For the date 2/4/2020 (day/month/year), EOMONTH returns 31/3/2020, and adding one day gives 1/4/2020.
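The EOMONTH(date, -1) + 1 idea can be sketched in Python for illustration. Python has no EOMONTH, so eomonth below is a hypothetical stand-in written for this example (last day of the month that lies a given number of months away); start_of_month is likewise a hypothetical name.

```python
import calendar
from datetime import date, timedelta


def eomonth(d: date, months: int) -> date:
    # Stand-in for DAX EOMONTH: last day of the month `months` months
    # away from d (negative values go back in time).
    total = d.year * 12 + (d.month - 1) + months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    return date(year, month, calendar.monthrange(year, month)[1])


def start_of_month(d: date) -> date:
    # EOMONTH(d, -1) + 1: end of the previous month plus one day.
    return eomonth(d, -1) + timedelta(days=1)


print(eomonth(date(2020, 4, 2), -1))      # 2020-03-31
print(start_of_month(date(2020, 4, 2)))   # 2020-04-01
print(start_of_month(date(2018, 8, 2)))   # 2018-08-01
```

Because the result is computed from the date itself, it does not matter that the first day of the month is missing from the data.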
Time intelligence only works reliably if you use it on a calendar table that has all the dates in the year you're working with. Since your date column is missing the first day of the month, STARTOFMONTH returns the first one that you do have.
Without creating a proper calendar table, you can either use EOMONTH as @Jonee mentioned or try this:
DATE ( YEAR ( EStart_Date[Date] ), MONTH ( EStart_Date[Date] ), 1 )
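A rough Python analogue of the difference described above, for illustration only: startofmonth_like mimics the observed behaviour of returning the earliest date actually present in the column for that month (real DAX time intelligence also depends on filter context), while first_of_month builds the first day directly, like DATE(YEAR(...), MONTH(...), 1). Both function names and the sample range are hypothetical.

```python
from datetime import date


def startofmonth_like(dates, d):
    # Without a full calendar table, only dates present in the data can be
    # returned: the earliest one in d's month.
    return min(x for x in dates if (x.year, x.month) == (d.year, d.month))


def first_of_month(d):
    # DATE(YEAR(d), MONTH(d), 1): construct the first day directly.
    return date(d.year, d.month, 1)


dates = [date(2018, 8, day) for day in range(2, 31)]  # 02 Aug .. 30 Aug 2018
print(startofmonth_like(dates, dates[5]))  # 2018-08-02
print(first_of_month(dates[5]))            # 2018-08-01
```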
I'm having trouble filtering tweets by specific dates. Using setUntil returns no tweets, and using setSince returns only recent tweets. The code is below, followed by the output:
public void readTweetFromKeyword(String keywordString) throws TwitterException
{
    // Note: keywordString is not used; the hashtag is hard-coded here
    twitter4j.Query query = new twitter4j.Query("#clt20");
    query.setSince("2014-12-12"); // date formatted as YYYY-MM-DD
    // `twitter` is assumed to be an already configured Twitter instance
    QueryResult result = twitter.search(query);
    int cont = 0; // number of tweets that are not retweets
    for (Status status : result.getTweets())
    {
        System.out.print("original " + status.getId());
        System.out.println("\t\tdata " + status.getCreatedAt());
        // startsWith avoids an exception on texts shorter than two characters
        if (!status.getText().startsWith("RT")) {
            System.out.println(status.getText());
            cont++;
        }
    }
    System.out.println(result.getTweets().size());
    System.out.println("cont = " + cont);
}
CONSOLE:
original 619433499116896256 data Fri Jul 10 06:10:29 GMT-03:00 2015
If the #BCCI is looking for an alternative to #Clt20, then how about a
league of teams consisting of only Indian players ?
original 619408117495939072 data Fri Jul 10 04:29:37 GMT-03:00 2015
#TesT, #ODI, #T20I, #IPL, #CLT20 Live record, score, history shedule ke lia, Follow #PTV_SpOrtsOne snt to 40404.
original 619330143258050560 data Thu Jul 09 23:19:47 GMT-03:00 2015
Need 66 From 6 Balls. Kinda Impossible #clt20
original 619301555532120065 data Thu Jul 09 21:26:11 GMT-03:00 2015
Kamran Akmals feet are stuck #soshit #CLT20
original 619095093962608640 data Thu Jul 09 07:45:47 GMT-03:00 2015
original 619095079983017984 data Thu Jul 09 07:45:43 GMT-03:00 2015
original 619095051524665344 data Thu Jul 09 07:45:37 GMT-03:00 2015
original 619095028304973825 data Thu Jul 09 07:45:31 GMT-03:00 2015
original 619094989943902209 data Thu Jul 09 07:45:22 GMT-03:00 2015
original 619094910516400129 data Thu Jul 09 07:45:03 GMT-03:00 2015
original 619094893441363969 data Thu Jul 09 07:44:59 GMT-03:00 2015
original 619035151578722304 data Thu Jul 09 03:47:35 GMT-03:00 2015
#abhisek_taneja Games r played in Himachal Pradesh every year if u go
through the schedule of #IPL & #CLT20 properly
original 618914815730290688 data Wed Jul 08 19:49:25 GMT-03:00 2015
original 618908444939186177 data Wed Jul 08 19:24:06 GMT-03:00 2015
original 618862474687705088 data Wed Jul 08 16:21:26 GMT-03:00 2015
We as #T20 follower , #clt20 should be oganized #CLT20
15
cont = 6
Thanks a lot!!
If you set an until date, keep this in mind from the documentation:
Returns tweets generated before the given date. Date should be formatted as YYYY-MM-DD. Keep in mind that the search index may not go back as far as the date you specify here.
And this as well:
Before getting involved, it’s important to know that the Search API is focused on relevance and not completeness. This means that some Tweets and users may be missing from search results. If you want to match for completeness you should consider using a Streaming API instead.
So if you set an until date that is too far in the past, you may get zero tweets. On the other hand, if you set a since date that is too far in the past, you may still get only tweets from the past few days, as in your console output.