Google BigQuery: Select and group by "yyyy-mm" from date field ("yyyy-mm-dd") or timestamp - date

I would like to group by "yyyy-mm" from a date field ("yyyy-mm-dd") or timestamp field so that I can pull and group transactional data over multiple years without having to pull separate queries grouping by month for each year.

SELECT
CONCAT(STRING(YEAR(timestamp)),'-',RIGHT(STRING(100 + MONTH(timestamp)), 2)) AS yyyymm,
<any aggregations here>
FROM YourTable
GROUP BY 1
Another option:
SELECT
STRFTIME_UTC_USEC(timestamp, "%Y-%m") AS yyyymm,
<any aggregations here>
FROM YourTable
GROUP BY 1
Both versions use Legacy SQL functions and should work with a timestamp or a date.

You can use the following for Standard SQL:
SELECT concat(cast(format_date("%E4Y", cast(current_date() as date)) as string),'-',cast(format_date("%m", cast(current_date() as date)) as string)) as yyyymm, <other aggregations>
FROM <YourTable>
GROUP BY 1;
Just replace current_date() with the name of your date or timestamp column.
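A shorter Standard SQL variant is sketched below. It formats the value in a single call instead of concatenating the year and month separately. The table `mydataset.transactions` and the columns `created_at` (a TIMESTAMP) and `amount` are hypothetical names used only for illustration.

```sql
#standardSQL
-- Hypothetical table and columns; substitute your own.
SELECT
  FORMAT_TIMESTAMP('%Y-%m', created_at) AS yyyymm,  -- e.g. '2017-02'
  SUM(amount) AS total_amount
FROM `mydataset.transactions`
GROUP BY yyyymm
ORDER BY yyyymm
```

If the column is a DATE rather than a TIMESTAMP, FORMAT_DATE('%Y-%m', your_date_column) should do the same job. Note that FORMAT_TIMESTAMP formats in UTC unless a time zone is passed as a third argument.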

Related

How to include three or more aggregators in a sql query?

I have a table called retail which stores items and their price along with date of purchase. I want to find out total monthly count of unique items sold.
This is the sql query I tried
select date_trunc('month', date) as month, sum(count(distinct(items))) as net_result from retail group by month order by date;
But I get the following error
ERROR: aggregate function calls cannot be nested
I searched for similar Stack Overflow posts, one of which is "postgres aggregate function calls may not be nested", but I am unable to adapt it to build the correct SQL query.
What am I doing wrong?
From your description, it doesn't seem like you need to nest the aggregate functions; count(distinct items) will give you a count of distinct items sold, like so:
select date_trunc('month', date) as month
, count(distinct items) as unique_items_sold
, count(items) as total_items_sold
from retail
group by "month"
order by "month" ;
If you had a column called item_count (say there was one row in the table for each item sold, but a sale might include, say, three widgets), you could sum that column instead:
select date_trunc('month', date) as month
, count(distinct items) as unique_items_sold
, sum(item_count) as total_items_sold
from retail
group by "month"
order by "month" ;
Use subqueries:
Select month, sum(citems) as net_result
from
(select
date_trunc('month', date) as month,
count(distinct(items)) as citems
from
retail
group by month
) as t
group by month
order by month
I suspect your group by statement will throw an error, because month is a column alias defined at the same level of the query, so you cannot reference it there. Use the full expression instead.
select
month,
sum(disct_item) as net_results
from
(select
date_trunc('month', date) as month,
count(distinct items) as disct_item
from
retail
group by
date_trunc('month', date)) as tbl
group by
month
order by
month;
You cannot nest aggregate functions, so wrap the first count in a subquery and apply the sum in the outer query.

Big Query Order By Grouped Field

I have a query that groups by date which works fine.
SELECT EXTRACT(date FROM DATETIME(timestamp, 'US/Eastern')) date, SUM(users) total_users FROM `mydataset.mytable`
GROUP BY EXTRACT(date FROM DATETIME(timestamp, 'US/Eastern'))
but when I try to order by date:
SELECT EXTRACT(date FROM DATETIME(timestamp, 'US/Eastern')) date, SUM(users) total_users FROM `mydataset.mytable`
GROUP BY EXTRACT(date FROM DATETIME(timestamp, 'US/Eastern'))
ORDER BY EXTRACT(date FROM DATETIME(timestamp, 'US/Eastern'));
I get the following error:
SELECT list expression references column timestamp which is neither grouped nor aggregated at [1:35]
The timestamp column is clearly part of the group by and even stranger still is that it works without the ORDER BY clause... What's going on here?
#standardSQL
SELECT
EXTRACT(DATE FROM DATETIME(timestamp, 'US/Eastern')) date,
SUM(users) total_users
FROM `mydataset.mytable`
GROUP BY 1
ORDER BY 1
You can try a subselect:
#standardSQL
SELECT
date,
total_users
FROM (
SELECT
EXTRACT(date FROM DATETIME(timestamp,'US/Eastern')) date,
SUM(users) total_users
FROM
`mydataset.mytable`
GROUP BY EXTRACT(date FROM DATETIME(timestamp, 'US/Eastern'))
)
ORDER BY
date

DB2: substring a number

In my dataset I have a variable (numeric) which is year+month, called year_month with values 201702, 201703 etc.
Normally my code looks like this:
select
year_month
,variable2
,variable3
from dataset
I wish to extract the month and the year from the year_month variable, but I'm not sure how to do this when year_month is numeric.
edit: not a duplicate, different problem, I do not care about dates.
To extract the date parts from an integer:
SELECT year_month/100, MOD(year_month,100) FROM dataset
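As a quick check of the arithmetic, the sketch below applies the same two expressions to a literal value. It assumes the column is an INTEGER, so the division truncates; with a DECIMAL column the quotient would keep its fractional part. SYSIBM.SYSDUMMY1 is DB2's one-row dummy table.

```sql
-- Integer division drops the last two digits; MOD keeps only them.
SELECT 201702 / 100     AS yr,   -- 2017
       MOD(201702, 100) AS mon   -- 2
FROM SYSIBM.SYSDUMMY1
```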
To fully convert the integer to a date:
SELECT TO_DATE(CHAR(year_month),'YYYYMM') FROM dataset
These methods work too:
select left(cast(year_month as varchar(6)), 4) as YYYY,
right(cast(year_month as varchar(6)), 2) as MM from yourtable
You can get a timestamp like this:
select TIMESTAMP_FORMAT(cast(year_month as varchar(6)), 'YYYYMM') as YouTimeStamp
from yourtable
Or a date:
select Date(TIMESTAMP_FORMAT(cast(year_month as varchar(6)), 'YYYYMM')) as YouTimeStamp
from yourtable

Group by YYYYMM from YYYYMMDD int column - Postgresql

I have a date in yyyymmdd format in an int column and I would like to group by month, i.e. yyyymm. I've tried the two versions below:
select to_char(to_timestamp(create_dt),'YYYYMM'),count(*) from table_name
group by to_char(to_timestamp(create_dt),'YYYYMM')
order by to_char(to_timestamp(create_dt),'YYYYMM') desc
AND
select to_char(create_dt,'YYYYMM'),count(*) from table_name
group by to_char(create_dt,'YYYYMM')
order by to_char(create_dt,'YYYYMM') desc
select create_dt / 100, count(*)
from t
group by 1
order by 1 desc
limit 6
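The division works because PostgreSQL truncates integer division, so dividing a yyyymmdd integer by 100 drops the two day digits. A minimal check on a literal value, assuming the column is an integer type:

```sql
-- Integer arithmetic on a yyyymmdd value.
SELECT 20170215 / 100         AS yyyymm,  -- 201702
       20170215 / 10000       AS yyyy,    -- 2017
       (20170215 / 100) % 100 AS mm;      -- 2
```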
Figured it out; any alternative approaches would be helpful.
select substring(create_dt::text,1,6),count(*) from table_name
group by substring(create_dt::text,1,6)
order by substring(create_dt::text,1,6) desc
limit 6;

PostgreSQL SELECT date before max(DATE)

I need to select the rows for which the difference between max(date) and the date just before max(date) is smaller than 366 days. I know about SELECT MAX(date) FROM table to get the last date from now, but how could I get the date before?
I would need a query of this kind:
SELECT code, MAX(date) - before_date FROM troncon WHERE MAX(date) - before_date < 366 ;
NB: before_date does not refer to anything; it is a placeholder to be replaced by something that works.
Edit : Example of the table I'm testing it on:
CREATE TABLE troncon (code TEXT, ope_date DATE) ;
INSERT INTO troncon (code, ope_date) VALUES
('C086000-T10001', '2014-11-11'),
('C086000-T10001', '2014-11-11'),
('C086000-T10002', '2014-12-03'),
('C086000-T10002', '2014-01-03'),
('C086000-T10003', '2014-08-11'),
('C086000-T10003', '2014-03-03'),
('C086000-T10003', '2012-02-27'),
('C086000-T10004', '2014-08-11'),
('C086000-T10004', '2013-12-30'),
('C086000-T10004', '2013-06-01'),
('C086000-T10004', '2012-07-31'),
('C086000-T10005', '2013-10-01'),
('C086000-T10005', '2012-11-01'),
('C086000-T10006', '2014-04-01'),
('C086000-T10006', '2014-05-15'),
('C086000-T10001', '2014-07-05'),
('C086000-T10003', '2014-03-03');
Many thanks!
The subquery returns every row together with the max date for its code, and the outer query keeps only the rows whose difference from that max date is smaller than 366 days:
select * from
(
SELECT id, date, max(date) over(partition by code) max_date FROM your_table
) A
-- assuming a DATE column: date - date yields an integer number of days
where max_date - date < 366
PS: As @a_horse_with_no_name said, you can partition by code to get the max date for each code.
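If what is needed is specifically the date just before the max date for each code, one possible sketch uses the lag() window function instead of max(). This is a different technique from the answer above, not a restatement of it. It assumes the troncon table from the question, with code stored as text and ope_date as a DATE, and it collapses duplicate (code, ope_date) rows first so that a repeated max date is not treated as its own predecessor.

```sql
SELECT code,
       ope_date            AS max_date,
       prev_date,
       ope_date - prev_date AS gap_days   -- date - date gives days as an integer
FROM (
  SELECT code,
         ope_date,
         -- previous distinct date for the same code
         lag(ope_date) OVER (PARTITION BY code ORDER BY ope_date) AS prev_date,
         -- rn = 1 marks the latest date for each code
         row_number() OVER (PARTITION BY code ORDER BY ope_date DESC) AS rn
  FROM (SELECT DISTINCT code, ope_date FROM troncon) AS d
) AS t
WHERE rn = 1
  AND ope_date - prev_date < 366;
```

Codes with only one distinct date have a NULL prev_date and are filtered out by the comparison.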