Counting consecutive days in Postgres

I'm trying to count the number of consecutive days in two tables with the following structure:
| id | email              | timestamp           |
| -- | ------------------ | ------------------- |
| 1  | hello@example.com  | 2021-10-22 00:35:22 |
| 2  | hello2@example.com | 2021-10-21 21:17:41 |
| 1  | hello@example.com  | 2021-10-19 00:35:22 |
| 1  | hello@example.com  | 2021-10-18 00:35:22 |
| 1  | hello@example.com  | 2021-10-17 00:35:22 |
I would like to count the number of consecutive days of activity. The data above would show:
| id | email              | length |
| -- | ------------------ | ------ |
| 1  | hello@example.com  | 1      |
| 2  | hello2@example.com | 1      |
| 1  | hello@example.com  | 3      |
This is made more difficult because I need to join the two tables using a UNION (or something similar) and then run the grouping. I tried to build on this query (Finding the length of a series in postgres) but I'm unable to group by consecutive days.
select max(id) as max_id, email, count(*) as length
from (
  select *, row_number() over wa - row_number() over wp as grp
  from began_playing_video
  window
    wp as (partition by email order by id desc),
    wa as (order by id desc)
) s
group by email, grp
order by 1 desc
Any ideas on how I could do this in Postgres?

First, create an aggregate function that counts adjacent dates within an ascending ordered list. The jsonb data type is used because it allows mixing various data types inside the same array:
CREATE OR REPLACE FUNCTION count_date(x jsonb, y jsonb, d date)
RETURNS jsonb LANGUAGE sql AS
$$
SELECT CASE
  WHEN d IS NULL
  THEN COALESCE(x, y)
  ELSE
    -- prepend the current date: it becomes the reference for the next row
    to_jsonb(d :: text)
    || CASE
         -- first date of the group: start a streak of length 1
         WHEN COALESCE(x, y) = '[]' :: jsonb
         THEN '[1]' :: jsonb
         -- current date directly follows the stored one: extend the last streak
         WHEN COALESCE(x ->> 0, y ->> 0) :: date + 1 = d
         THEN jsonb_set(COALESCE(x - 0, y - 0), '{-1}', to_jsonb(COALESCE(x ->> -1, y ->> -1) :: integer + 1))
         -- gap between dates: append a new streak of length 1
         ELSE COALESCE(x - 0, y - 0) || to_jsonb(1)
       END
  END ;
$$ ;
DROP AGGREGATE IF EXISTS count_date(jsonb, date) ;
CREATE AGGREGATE count_date(jsonb, date)
(
sfunc = count_date
, stype = jsonb
) ;
Then apply the count_date aggregate to your table, grouped by id and email. The aggregate state ends up as an array of the form [last_date, len1, len2, ...], so removing the leading date with count_list - 0 leaves one element per streak:
WITH list AS (
  SELECT id, email, count_date('[]', timestamp :: date ORDER BY timestamp) AS count_list
  FROM your_table
  GROUP BY id, email
)
SELECT id, email, jsonb_array_elements(count_list - 0) AS length
FROM list
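For comparison, the same result is available without a custom aggregate via the classic gaps-and-islands technique: subtract a row number from each distinct activity date, so that consecutive days share a constant group key. A minimal sketch, assuming began_playing_video plus a second, hypothetical table other_activity, both with id, email and timestamp columns:
WITH days AS (
  SELECT DISTINCT id, email, timestamp :: date AS day
  FROM (
    SELECT id, email, timestamp FROM began_playing_video
    UNION ALL
    SELECT id, email, timestamp FROM other_activity
  ) u
)
SELECT id, email, count(*) AS length
FROM (
  SELECT id, email, day,
         day - (row_number() OVER (PARTITION BY id, email ORDER BY day)) :: int AS grp
  FROM days
) s
GROUP BY id, email, grp
ORDER BY id;
Subtracting the row number from the date yields the same date for every day in an unbroken run, which is what makes the grouping work.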

Related

2 Level pivot using Postgresql

I have a table (table_name: raw_data) whose schema and data appear to be this:
name | category | clear_date |
A    | GOOD     | 2020-05-30 |
A    | GOOD     | 2020-05-30 |
A    | GOOD     | 2020-05-30 |
A    | GOOD     | 2020-05-30 |
A    | BAD      | 2020-05-30 |
A    | BAD      | 2020-05-31 |
Now if I perform a GROUP BY operation using the following statement:
SELECT name, category, date(clear_date), count(clear_date)
FROM raw_data
GROUP BY name, category, date(clear_date)
ORDER BY name
I get the following answer:
name | category | date       | count |
A    | GOOD     | 2020-05-30 | 4     |
A    | BAD      | 2020-05-30 | 1     |
A    | BAD      | 2020-05-31 | 1     |
In order to produce the pivot in the following format:
name | category | 2020-05-30 | 2020-05-31 |
A    | GOOD     | 4          | NULL       |
A    | BAD      | 1          | 1          |
I am using the following query:
select * from crosstab (
'select name, category, date(clear_date), count(clear_date) from raw_data group by name, category, date(clear_date) order by 1,2,3',
'select distinct date(clear_date) from raw_data order by 1'
)
as newtable (
node_name varchar, alarm_name varchar, "2020-05-30" integer, "2020-05-31" integer
)
ORDER BY name
But I am getting results as follows:
name | category | 2020-05-30 | 2020-05-31 |
A    | BAD      | 4          | 1          |
Can anyone suggest how I can achieve the result mentioned above? It appears crosstab removes the duplicate entry of A automatically.
Not sure if this is possible using crosstab, because you have missing records on some dates. Here is an example of how to get the expected result; not sure if it is exactly what you need, but anyway, hope this helps.
SELECT r1.*, r2.counter AS "2020-05-30", r3.counter AS "2020-05-31"
FROM (
SELECT DISTINCT name, category
FROM raw_data
) AS r1
LEFT JOIN (
SELECT name, category, count(*) AS counter
FROM raw_data
WHERE clear_date = '2020-05-30'
GROUP BY name, category
) AS r2 ON (r2.category = r1.category AND r2.name = r1.name)
LEFT JOIN (
SELECT name, category, count(*) AS counter
FROM raw_data
WHERE clear_date = '2020-05-31'
GROUP BY name, category
) AS r3 ON (r3.category = r1.category AND r3.name = r1.name)
ORDER BY r1.category DESC;
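On PostgreSQL 9.4+, the same pivot can also be written in a single pass with conditional aggregation rather than one LEFT JOIN per date; NULLIF turns the zero counts back into NULLs to match the expected output. A sketch, under the assumption that clear_date is a date or timestamp:
SELECT name, category,
       NULLIF(count(*) FILTER (WHERE date(clear_date) = '2020-05-30'), 0) AS "2020-05-30",
       NULLIF(count(*) FILTER (WHERE date(clear_date) = '2020-05-31'), 0) AS "2020-05-31"
FROM raw_data
GROUP BY name, category
ORDER BY category DESC;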

Transpose rows to columns where transposed column changes based on another column

I want to transpose rows to columns using the PIVOT function in Oracle and/or SQL Server. My use case is very similar to this one: Efficiently convert rows to columns in sql server
However, I am organizing the data by specific data type (below, StringValue and NumericValue are shown).
This is my example:
----------------------------------------------------------------------
| Id | Person_ID | ColumnName | StringValue | NumericValue |
----------------------------------------------------------------------
| 1 | 1 | FirstName | John | (null) |
| 2 | 1 | Amount | (null) | 100 |
| 3 | 1 | PostalCode | (null) | 112334 |
| 4 | 1 | LastName | Smith | (null) |
| 5 | 1 | AccountNumber | (null) | 123456 |
----------------------------------------------------------------------
This is my result:
---------------------------------------------------------------------
| FirstName |Amount| PostalCode | LastName | AccountNumber |
---------------------------------------------------------------------
| John | 100 | 112334 | Smith | 123456 |
---------------------------------------------------------------------
How can I build this SQL query?
I have already tried using MAX(DECODE()) and CASE statements in Oracle, but the performance is very poor. I am looking to see whether the PIVOT function in Oracle and/or SQL Server can do this faster. Or should I go to a single column value?
The code below will satisfy your requirement:
CREATE TABLE #test
( id int,
  person_id int,
  ColumnName varchar(50),
  StringValue varchar(50),
  numericValue varchar(50)
)

INSERT INTO #test VALUES (1, 1, 'FirstName', 'John', null)
INSERT INTO #test VALUES (2, 1, 'Amount', null, '100')
INSERT INTO #test VALUES (3, 1, 'PostalCode', null, '112334')
INSERT INTO #test VALUES (4, 1, 'LastName', 'Smith', null)
INSERT INTO #test VALUES (5, 1, 'AccountNumber', null, '123456')
--select * from #test

DECLARE @Para varchar(max) = '',
        @Para1 varchar(max) = '',
        @main varchar(max) = ''

-- build the quoted, comma-separated column list from the distinct ColumnName values
SELECT @Para += ',' + QUOTENAME(ColumnName)
FROM (SELECT DISTINCT ColumnName FROM #test) AS P

-- strip the leading comma
SET @Para1 = STUFF(@Para, 1, 1, '')
PRINT @Para1

-- build and execute the dynamic PIVOT statement
SET @main = 'select * from (
  select coalesce(StringValue, numericValue) as Val, ColumnName from #test) as Main
  pivot
  (
    min(Val) for ColumnName in (' + @Para1 + ')
  ) as pvt'
EXEC(@main)
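If the column names are known up front, the dynamic SQL is unnecessary; a hard-coded SQL Server PIVOT over the same #test data (same assumption: one row per person per ColumnName) looks like this:
SELECT [FirstName], [Amount], [PostalCode], [LastName], [AccountNumber]
FROM (
  -- collapse the two typed value columns into one before pivoting
  SELECT ColumnName, COALESCE(StringValue, numericValue) AS Val
  FROM #test
) AS Main
PIVOT (
  MIN(Val) FOR ColumnName IN ([FirstName], [Amount], [PostalCode], [LastName], [AccountNumber])
) AS pvt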

PostgreSQL Case statement based on other values in same Group

I have the following table and want to write a SELECT (in PostgreSQL) that will produce the table below it. It should GROUP by timestamp; where there is an 'X' and a 'Y' with the same timestamp, it should be classed as 'Standard', otherwise 'Premium' if there are only 'X's for that timestamp.
| timestamp               | data |
|-------------------------|------|
| 2018-08-13 09:26:10.872 | X    |
| 2018-08-13 09:26:10.872 | Y    |
| 2018-08-13 09:26:11.125 | X    |
| 2018-08-13 09:26:11.125 | X    |
Expected output:
| timestamp               | type     |
|-------------------------|----------|
| 2018-08-13 09:26:10.872 | Standard |
| 2018-08-13 09:26:11.125 | Premium  |
I have gotten as far as writing the following:
SELECT
timestamp,
CASE
WHEN -- ??
THEN 'Standard'
ELSE 'Premium'
END AS type
FROM my_table
GROUP BY timestamp, type;
You will want to group by timestamp only and count the distinct data entries. To get a boolean value out of this count (required by the CASE WHEN), compare it to 1:
SELECT
timestamp,
CASE
WHEN count(distinct data) > 1
THEN 'Standard'
ELSE 'Premium'
END AS type
FROM my_table
GROUP BY timestamp;
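If the rule is specifically "both an 'X' and a 'Y' must be present", rather than "more than one distinct value", bool_or states that intent directly. A sketch against the same my_table:
SELECT
  timestamp,
  CASE
    WHEN bool_or(data = 'X') AND bool_or(data = 'Y')
    THEN 'Standard'
    ELSE 'Premium'
  END AS type
FROM my_table
GROUP BY timestamp;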

Postgresql use more than one row as expression in sub query

As the title says, I need to create a query where I SELECT all items from one table and use those items as expressions in another query. Suppose I have the main table that looks like this:
main_table
-------------------------------------
id | name | location | //more columns
---|------|----------|---------------
1 | me | pluto | //
2 | them | mercury | //
3 | we | jupiter | //
And the sub query table looks like this:
some_table
---------------
id | item
---|-----------
1 | sub-col-1
2 | sub-col-2
3 | sub-col-3
where each item in some_table has a price which is in an amount_table like so:
amount_table
--------------
1 | 1000
2 | 2000
3 | 3000
So that the query returns results like this:
name | location | sub-col-1 | sub-col-2 | sub-col-3 |
----------------------------------------------------|
me | pluto | 1000 | | |
them | mercury | | 2000 | |
we | jupiter | | | 3000 |
My query currently looks like this
SELECT name, location, (SELECT item FROM some_table)
FROM main_table
INNER JOIN amount_table WHERE //match the id's
But I'm running into the error more than one row returned by a subquery used as an expression
How can I formulate this query to return the desired results?
You should decide on the expected result.
To get a one-to-many relation:
SELECT name, location, some_table.item
FROM main_table
JOIN some_table on true -- or id if they match
INNER JOIN amount_table --WHERE match the id's
To get one-to-one with all rows:
SELECT name, location, (SELECT array_agg(item) FROM some_table)
FROM main_table
INNER JOIN amount_table --WHERE //match the id's
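Note that the desired output is really a pivot: one column per item, filled from the price. Assuming the three tables join on id and that amount_table's columns are (id, amount) (the column names are a guess from the sample), conditional aggregation produces that shape:
SELECT m.name, m.location,
       min(a.amount) FILTER (WHERE s.item = 'sub-col-1') AS "sub-col-1",
       min(a.amount) FILTER (WHERE s.item = 'sub-col-2') AS "sub-col-2",
       min(a.amount) FILTER (WHERE s.item = 'sub-col-3') AS "sub-col-3"
FROM main_table m
JOIN some_table s ON s.id = m.id
JOIN amount_table a ON a.id = s.id
GROUP BY m.name, m.location;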

Updating multiple rows with a certain value from the same table

So, I have the following table:
time | name | ID |
12:00:00| access | 1 |
12:05:00| select | null |
12:10:00| update | null |
12:15:00| insert | null |
12:20:00| out | null |
12:30:00| access | 2 |
12:35:00| select | null |
The table is bigger (approx. 1-1.5 million rows), and there will be IDs equal to 2, 3, 4, etc., with rows in between.
The following should be the result:
time | name | ID |
12:00:00| access | 1 |
12:05:00| select | 1 |
12:10:00| update | 1 |
12:15:00| insert | 1 |
12:20:00| out | 1 |
12:30:00| access | 2 |
12:35:00| select | 2 |
What is the simplest method to update the rows without filling up the log? Like, one ID at a time.
You can do it with a subquery:
-- for each non-access row, pick the ID of the closest preceding 'access' row
UPDATE t
SET t.ID = (SELECT TOP 1 s.ID
            FROM YourTable s
            WHERE s.time < t.time AND s.name = 'access'
            ORDER BY s.time DESC)
FROM YourTable t
WHERE t.name <> 'access'
An index on (ID, time, name) will help.
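In T-SQL that could be created as follows (the index name is illustrative):
CREATE INDEX IX_YourTable_access ON YourTable (ID, time, name)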
You can do it using a CTE as below:
;WITH myCTE
AS ( SELECT time
          , name
          , ROW_NUMBER() OVER ( PARTITION BY name ORDER BY time ) AS [rank]
          , ID
     FROM YourTable
   )
UPDATE myCTE
SET myCTE.ID = myCTE.[rank]

SELECT *
FROM YourTable ORDER BY ID
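A set-based alternative that avoids the correlated subquery: carry the last seen non-NULL ID forward with a running MAX (this relies on the IDs increasing with time, as they do in the sample), then update through the CTE. A sketch in T-SQL:
;WITH filled AS (
  SELECT ID,
         MAX(ID) OVER (ORDER BY time
                       ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS new_id
  FROM YourTable
)
UPDATE filled
SET ID = new_id
WHERE ID IS NULL;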