Group by YYYYMM from YYYYMMDD int column - Postgresql - postgresql

I have date in yyyymmdd format in an int column and I would like to group by month i.e. yyyymm.I've tried the below two versions
select to_char(to_timestamp(create_dt),'YYYYMM'),count(*) from table_name
group by to_char(to_timestamp(create_dt),'YYYYMM')
order by to_char(to_timestamp(create_dt),'YYYYMM') desc
AND
select to_char(create_dt,'YYYYMM'),count(*) from table_name
group by to_char(create_dt,'YYYYMM')
order by to_char(create_dt,'YYYYMM') desc

select create_dt / 100, count(*)
from t
group by 1
order by 1 desc
limit 6

Figured it out,any alternate ways would be helpful.
select substring(create_dt::int8,1,6),count(*) from table
group by substring(create_dt::int8,1,6)
order by substring(create_dt::int8,1,6) desc
limit 6;

Related

PostgreSQL - SQL function to loop through all months of the year and pull 10 random records from each

I am attempting to pull 10 random records from each month of this year using this query here but I get an error "ERROR: relation "c1" does not exist
"
Not sure where I'm going wrong - I think it may be I'm using Mysql syntax instead, but how do I resolve this?
My desired output is like this
Month
Another header
2021-01
random email 1
2021-01
random email 2
total of ten random emails from January, then ten more for each month this year (til November of course as Dec yet to happen)..
With CTE AS
(
Select month,
email,
Row_Number() Over (Partition By month Order By FLOOR(RANDOM()*(1-1000000+1))) AS RN
From (
SELECT
DISTINCT(TO_CHAR(DATE_TRUNC('month', timestamp ), 'YYYY-MM')) AS month
,CASE
WHEN
JSON_EXTRACT_PATH_TEXT(json_extract_array_element_text (form_data,0),'name') = 'email'
THEN
JSON_EXTRACT_PATH_TEXT(json_extract_array_element_text (form_data,0),'value')
END AS email
FROM form_submits_y2 fs
WHERE fs.website_id IN (791)
AND month LIKE '2021%'
GROUP BY 1,2
ORDER BY 1 ASC
)
)
SELECT *
FROM CTE C1
LEFT JOIN
(SELECT RN
,month
,email
FROM CTE C2
WHERE C2.month = C1.month
ORDER BY RANDOM() LIMIT 10) C3
ON C1.RN = C3.RN
ORDER By month ASC```
You can't reference an outer table inside a derived table with a regular join. You need to use left join lateral to make that work
I did end up finding a more elegant solution to my query here via this source from github :
SELECT
month
,email
FROM
(
Select month,
email,
Row_Number() Over (Partition By month Order By FLOOR(RANDOM()*(1-1000000+1))) AS RN
From (
SELECT
TO_CHAR(DATE_TRUNC('month', timestamp ), 'YYYY-MM') AS month
,CASE
WHEN JSON_EXTRACT_PATH_TEXT(json_extract_array_element_text (form_data,0),'name') = 'email'
THEN JSON_EXTRACT_PATH_TEXT(json_extract_array_element_text (form_data,0),'value')
END AS email
FROM form_submits_y2 fs
WHERE fs.website_id IN (791)
AND month LIKE '2021%'
GROUP BY 1,2
ORDER BY 1 ASC
)
) q
WHERE
RN <=10
ORDER BY month ASC

Count distinct loop in sql

I am trying to pull unique active users before a date.
So specifically, I have a date range (let's say August - November) where I want to know the cumulative unique active users on or before a day within a month.
So, the pseudocode would look something like this:
SELECT COUNT(DISTINCT USERS) FROM USER_DB
WHERE
Month = [loop through months 8-11]
AND
DAY <= [day in loop of 1:31]
The output I desire is something Like this
step-by-step demo: db<>fiddle
SELECT
mydate,
SUM( -- 3
COUNT(DISTINCT username) -- 1, 2
) OVER (ORDER BY mydate) -- 3
FROM t
GROUP BY mydate -- 2
GROUP BY your date and count the users
Because you don't want to count ALL user accesses, but only one access per user and day, you need to add the DISTINCT
This is a window function. This one aggregates all counts which where previously done cumulatively.
If you want to get unique user over ALL days (count a user only on its first access) you can filter the users with a DISTINCT ON clause first:
demo: db<>fiddle
SELECT DISTINCT ON (username)
*
FROM t
ORDER BY username, mydate
This yields:
SELECT
mydate,
SUM(
COUNT(*)
) OVER (ORDER BY mydate)
FROM (
SELECT DISTINCT ON (username)
*
FROM t
ORDER BY username, mydate
) s
GROUP BY mydate

Postgres : Need distinct records count

I have a table with duplicate entries and the objective is to get the distinct entries based on the latest time stamp.
In my case 'serial_no' will have duplicate entries but I select unique entries based on the latest time stamp.
Below query is giving me the unique results with the latest time stamp.
But my concern is I need to get the total of unique entries.
For example assume my table has 40 entries overall. With the below query I am able to get 20 unique rows based on the serial number.
But the 'total' is returned as 40 instead of 20.
Any help on this pls?
SELECT
*
FROM
(
SELECT
DISTINCT ON (serial_no) id,
serial_no,
name,
timestamp,
COUNT(*) OVER() as total
FROM
product_info
INNER JOIN my.account ON id = accountid
WHERE
lower(name) = 'hello'
ORDER BY
serial_no,
timestamp DESC OFFSET 0
LIMIT
10
) AS my_info
ORDER BY
serial_no asc
product_info table intially has this data
serial_no name timestamp
11212 pulp12 2018-06-01 20:00:01
11213 mango 2018-06-01 17:00:01
11214 grapes 2018-06-02 04:00:01
11215 orange 2018-06-02 07:05:30
11212 pulp12 2018-06-03 14:00:01
11213 mango 2018-06-03 13:00:00
After the distict query I got all unique results based on the latest
timestamp:
serial_no name timestamp total
11212 pulp12 2018-06-03 14:00:01 6
11213 mango 2018-06-03 13:00:00 6
11214 grapes 2018-06-02 04:00:01 6
11215 orange 2018-06-02 07:05:30 6
But total is appearing as 6 . I wanted the total to be 4 since it has
only 4 unique entries.
I am not sure how to modify my existing query to get this desired
result.
Postgres supports COUNT(DISTINCT column_name), so if I have understood your request, using that instead of COUNT(*) will work, and you can drop the OVER.
What you could do is move the window function to a higher level select statement. This is because window function is evaluated before distinct on and limit clauses are applied. Also, you can not include DISTINCT keyword within window functions - it has not been implemented yet (as of Postgres 9.6).
SELECT
*,
COUNT(*) OVER() as total -- here
FROM
(
SELECT
DISTINCT ON (serial_no) id,
serial_no,
name,
timestamp
FROM
product_info
INNER JOIN my.account ON id = accountid
WHERE
lower(name) = 'hello'
ORDER BY
serial_no,
timestamp DESC
LIMIT
10
) AS my_info
Additionally, offset is not required there and one more sorting is also superfluous. I've removed these.
Another way would be to include a computed column in the select clause but this would not be as fast as it would require one more scan of the table. This is obviously assuming that your total is strictly connected to your resultset and not what's beyond that being stored in the table, but gets filtered out.
select count(*), serial_no from product_info group by serial_no
will give you the number of duplicates for each serial number
The most mindless way of incorporating that information would be to join in a sub query
SELECT
*
FROM
(
SELECT
DISTINCT ON (serial_no) id,
serial_no,
name,
timestamp,
COUNT(*) OVER() as total
FROM
product_info
INNER JOIN my.account ON id = accountid
WHERE
lower(name) = 'hello'
ORDER BY
serial_no,
timestamp DESC OFFSET 0
LIMIT
10
) AS my_info
join (select count(*) as counts, serial_no from product_info group by serial_no) as X
on X.serial_no = my_info.serial_no
ORDER BY
serial_no asc

PostgreSQL SELECT date before max(DATE)

I need to select the rows for which the difference between max(date) and the date just before max(date) is smaller than 366 days. I know about SELECT MAX(date) FROM table to get the last date from now, but how could I get the date before?
I would need a query of this kind:
SELECT code, MAX(date) - before_date FROM troncon WHERE MAX(date) - before_date < 366 ;
NB : before_date does not refer to anything and is to be replaced by a functionnal stuff.
Edit : Example of the table I'm testing it on:
CREATE TABLE troncon (code INTEGER, ope_date DATE) ;
INSERT INTO troncon (code, ope_date) VALUES
('C086000-T10001', '2014-11-11'),
('C086000-T10001', '2014-11-11'),
('C086000-T10002', '2014-12-03'),
('C086000-T10002', '2014-01-03'),
('C086000-T10003', '2014-08-11'),
('C086000-T10003', '2014-03-03'),
('C086000-T10003', '2012-02-27'),
('C086000-T10004', '2014-08-11'),
('C086000-T10004', '2013-12-30'),
('C086000-T10004', '2013-06-01'),
('C086000-T10004', '2012-07-31'),
('C086000-T10005', '2013-10-01'),
('C086000-T10005', '2012-11-01'),
('C086000-T10006', '2014-04-01'),
('C086000-T10006', '2014-05-15'),
('C086000-T10001', '2014-07-05'),
('C086000-T10003', '2014-03-03');
Many thanks!
The sub query contains all rows joined with the unique max date, and you select only ones which there differente with the max date is smaller than 366 days:
select * from
(
SELECT id, date, max(date) over(partition by code) max_date FROM your_table
) A
where max_date - date < interval '366 day'
PS: As #a_horse_with_no_name said, you can partition by code to get maximum_date for each code.

Summing Multiple Records by maxdate

I have a table with the following data
Bldg Suit SQFT Date
1 1 1,000 9/24/2012
1 1 1,500 12/31/2011
1 2 800 8/31/2012
1 2 500 10/1/2005
I want to write a query that will sum the max date for each suit record, so the desired result would be 1,800, and must be in one cell/row. This will ultimately be part of subquery, I am just not getting what I expect with the queries I have writtren so far.
Thanks in advance.
You can use the following (See SQL Fiddle with Demo):
select sum(t1.sqft) Total
from yourtable t1
inner join
(
select max(dt) mxdt, suit, bldg
from yourtable
group by suit, bldg
) t2
on t1.dt = t2.mxdt
and t1.bldg = t2.bldg
and t1.suit = t2.suit
; With Data As
(
Select Bldg, Suit, SQFT, Row_Number() Over (Partition By Bldg, Suit Order By Date DESC) As RowID
From YourTableNameHere
)
Select Bldg, Sum(SQFT) As TotalSQFT
From Data
Where RowId = 1
Group By Bldg