I am quite new to PySpark. I am trying to filter a PySpark dataframe on dates iteratively: I want a function that, for a given date range, queries one week's data for each month.
My input table looks something like this, and it is partitioned on date:
date        user_id
2018-01-01  abc
2018-01-02  abc
2018-01-02  xyz
2018-01-02  ghk
So I would ideally define:
start_date = '2018-01-01' and end_date = '2021-12-31'
and create a filtered dataframe.
I am having trouble creating a loop that I can use as a filter.
Can someone help please!
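One possible approach, as a minimal sketch: build the list of (start, end) date pairs in plain Python first, then turn them into a filter. The sketch assumes that "one week's data for each month" means the first seven days of each calendar month, clipped to the requested range; the function name first_week_ranges is hypothetical.

```python
from datetime import date


def first_week_ranges(start_date, end_date):
    """Return (week_start, week_end) pairs, one per month in the range.

    Assumption: "one week per month" means days 1-7 of each calendar month,
    clipped so that no pair falls outside [start_date, end_date].
    """
    ranges = []
    year, month = start_date.year, start_date.month
    while (year, month) <= (end_date.year, end_date.month):
        week_start = max(date(year, month, 1), start_date)
        week_end = min(date(year, month, 7), end_date)
        if week_start <= week_end:  # skip a month whose first week lies outside the range
            ranges.append((week_start, week_end))
        # Advance to the next month, rolling over the year after December.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return ranges


ranges = first_week_ranges(date(2018, 1, 1), date(2021, 12, 31))
print(len(ranges))
print(ranges[0], ranges[-1])
```

In PySpark, each pair could then become a col("date").between(lo, hi) condition, with the conditions combined using | and passed to a single df.filter(...) call. That keeps the predicate on the partition column instead of looping over many small dataframes.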
Related
Hi, I have a date stored as text in the format 'YYYY/MM' in one table, for example 2018/01, 2020/08, etc.
I need to join it with another table where the date is numeric (with the DATETIME20 format attached to it), converting that value to a month for the comparison.
Is there any way to do this in PROC SQL, as the rest of my query is in PROC SQL?
E.g. Table 1: Month = 2018/01; Table 2: Date = 20.01.2018 10:48:17. These should be joined in the PROC SQL query.
I would also like to calculate the difference in months between these two dates.
Thank you in advance.
Convert both to the same DATE value. To convert a datetime value to a date, use the DATEPART() function. To move to the first day of the month, use the INTNX() function with the month interval. To convert a string like '2018/01' to a date, you could append '/01' and use the INPUT() function with the YYMMDD informat.
proc sql ;
create table want as
select *
from table1,table2
where input(cats(table1.month,'/01'),yymmdd10.)=intnx('month',datepart(table2.date),0)
;
quit;
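The same normalization can be sketched in Python, purely as an illustration of the logic and not as SAS code: both values are reduced to the first day of their month and then compared. The helper names are hypothetical. The last helper counts calendar-month boundaries, which also covers the month-difference part of the question; in SAS the INTCK() function with the 'month' interval would be the usual tool for that count.

```python
from datetime import date, datetime


def month_start_from_text(text):
    """Parse a 'YYYY/MM' string into the first day of that month."""
    return datetime.strptime(text + "/01", "%Y/%m/%d").date()


def month_start_from_datetime(value):
    """Drop the time part and move to the first day of the month."""
    return value.date().replace(day=1)


def months_between(earlier, later):
    """Count calendar-month boundaries crossed from earlier to later."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


left = month_start_from_text("2018/01")
right = month_start_from_datetime(datetime(2018, 1, 20, 10, 48, 17))
print(left == right)                           # True: both are 2018-01-01
print(months_between(left, date(2018, 4, 1)))  # 3
```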
In Hive SQL I have a yearmonth [yyyymm] column from which I need to subtract 3 months.
For example: if yearmonth is 201912, the required value is 201909.
Can someone please help me with the syntax or script I need for this?
I have tried addmonths, conv(), and reg_extract,
but nothing works.
add_months() function works with dates. Convert yyyyMM to yyyy-MM-01 date, apply add_months and format as yyyyMM again:
with your_table as (select '201912' as yearmonth)
select date_format(add_months(concat_ws('-',substr(yearmonth,1,4),substr(yearmonth,5,2),'01'),-3),'yyyyMM') as yearmonth
from your_table;
Result:
201909
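The month arithmetic behind this can be checked with a short Python sketch. It is not Hive code, and the function name shift_yearmonth is hypothetical. It shows why going through a real date (or an equivalent month count) matters: a plain integer subtraction on yyyymm breaks across a year boundary.

```python
def shift_yearmonth(yearmonth, months):
    """Shift a 'yyyyMM' string by a signed number of months."""
    year, month = int(yearmonth[:4]), int(yearmonth[4:6])
    # Work with a zero-based month count so divmod handles the year rollover.
    total = year * 12 + (month - 1) + months
    new_year, new_month0 = divmod(total, 12)
    return f"{new_year:04d}{new_month0 + 1:02d}"


print(shift_yearmonth("201912", -3))  # 201909
print(shift_yearmonth("202001", -3))  # 201910, not 201998
```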
I have a table Logs with the fields Amount and date.
I need to get the sum of amount and months grouped by each day
I need to migrate my SQLite code to PostgreSQL, but I find the migration rather hard. The SQLite code is as follows:
SELECT SUM(amount),transaction_date FROM log WHERE user_id = 1 AND strftime('%Y', transaction_date) = '2019' GROUP BY strftime('%d', transaction_date);
What I need is the date and the total amount grouped by day for the year 2019.
You need the extract() function instead of strftime() to get the year. Cast transaction_date to date only if it is a timestamp; if it is already of data type date, remove the ::date cast:
SELECT
SUM(amount),
transaction_date::date
FROM log WHERE user_id = 1 AND extract(year from transaction_date) = 2019
GROUP BY transaction_date::date
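One detail worth checking during the migration: the original SQLite query groups on strftime('%d', ...), which is the day of the month, so the same day number in different months lands in one group. Grouping on the full date, as in the query above, keeps them apart. The sketch below illustrates the difference with Python's built-in sqlite3 module; the sample rows are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (user_id INTEGER, amount REAL, transaction_date TEXT)")
conn.executemany(
    "INSERT INTO log VALUES (?, ?, ?)",
    [  # made-up sample rows
        (1, 10.0, "2019-01-05 09:00:00"),
        (1, 5.0, "2019-01-05 17:30:00"),
        (1, 7.0, "2019-02-05 12:00:00"),
        (1, 99.0, "2018-12-31 23:59:59"),  # wrong year, filtered out
        (2, 50.0, "2019-01-05 08:00:00"),  # other user, filtered out
    ],
)

where = "WHERE user_id = 1 AND strftime('%Y', transaction_date) = '2019' "

# Grouping on day-of-month merges 5 January with 5 February.
by_day_of_month = conn.execute(
    "SELECT strftime('%d', transaction_date) AS d, SUM(amount) FROM log "
    + where + "GROUP BY d ORDER BY d"
).fetchall()

# Grouping on the full calendar date keeps each day separate.
by_date = conn.execute(
    "SELECT date(transaction_date) AS d, SUM(amount) FROM log "
    + where + "GROUP BY d ORDER BY d"
).fetchall()

print(by_day_of_month)
print(by_date)
```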
I have a table with date and value columns. I need to sum the values per month and show just one date column.
Ex:
I have this:
date value
2018-01-01 150
2018-01-23 140
what I need:
date sum(value)
2018-01 290
Simple solution to get sums per month:
SELECT to_char(date, 'YYYY-MM') AS mon, sum(value) AS sum_value
FROM tbl
GROUP BY 1;
For large tables it's cheaper to group on date_trunc('month', date) instead.
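The two grouping keys can be compared in a small Python sketch, using the sample rows from the question. This only illustrates the logic and is not PostgreSQL code: the first key is a formatted string, in the spirit of to_char(date, 'YYYY-MM'), and the second is a date truncated to the first of the month, in the spirit of date_trunc('month', date).

```python
from collections import defaultdict
from datetime import date

rows = [(date(2018, 1, 1), 150), (date(2018, 1, 23), 140)]  # sample rows from the question

by_text_key = defaultdict(int)  # key is a 'YYYY-MM' string
by_date_key = defaultdict(int)  # key is the first day of the month, still a date
for day, value in rows:
    by_text_key[day.strftime("%Y-%m")] += value
    by_date_key[day.replace(day=1)] += value

print(dict(by_text_key))  # {'2018-01': 290}
print(dict(by_date_key))  # {datetime.date(2018, 1, 1): 290}
```

Both keys give the same groups; the truncated date stays a date value, so it still sorts and compares as one.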
Related:
Concatenate multiple result rows of one column into one, group by another column
Group and count events per time intervals, plus running total
How to get the date and time from timestamp in PostgreSQL select query?
I have sales data stored in a database. The sales_date field contains the date on which the sale took place. I want to extract this data grouped by month, so that I will get the aggregate data for Jan, Feb, etc. Is there a way I can do this without having to extract the entire data and then doing it manually?
Something like the following should work. If the data is partitioned on disk, remember to include the partition in the where clause.
q)tbl:([]dt:20?(2013.01.01;2013.02.01;2013.01.03);sales:20?100000)
q)select sum sales by `month$dt from tbl
dt     | sales
-------| ------
2013.01| 701075
2013.02| 298200
q)select avg sales by `month$dt from tbl
dt     | sales
-------| --------
2013.01| 50076.79
2013.02| 49700
You could also use the simpler form:
q)t:([] date:10?.z.Z)
q)select date.month from t
month
-------
2013.10
2013.08
2004.09
2010.03
...