PostgreSQL CASE statement based on other values in the same group

I have the following table and want to write a SELECT (in PostgreSQL) that produces the table below it. It should GROUP BY timestamp; where an 'X' and a 'Y' share the same timestamp, the group should be classed as 'Standard', and 'Premium' where there are only 'X's for that timestamp.
| timestamp               | data |
|-------------------------|------|
| 2018-08-13 09:26:10.872 | X    |
| 2018-08-13 09:26:10.872 | Y    |
| 2018-08-13 09:26:11.125 | X    |
| 2018-08-13 09:26:11.125 | X    |

| timestamp               | type     |
|-------------------------|----------|
| 2018-08-13 09:26:10.872 | Standard |
| 2018-08-13 09:26:11.125 | Premium  |
I have gotten as far as writing the following:
SELECT
  timestamp,
  CASE
    WHEN -- ??
    THEN 'Standard'
    ELSE 'Premium'
  END AS type
FROM my_table
GROUP BY timestamp, type;

You will want to group by timestamp only, and count the distinct data values. To get a boolean out of that count (as CASE WHEN requires), compare it to 1:
SELECT
  timestamp,
  CASE
    WHEN count(distinct data) > 1
    THEN 'Standard'
    ELSE 'Premium'
  END AS type
FROM my_table
GROUP BY timestamp;
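Equivalently, the "only X's" condition can be tested directly with bool_and; a sketch, assuming data only ever holds 'X' or 'Y':

SELECT
  timestamp,
  CASE
    WHEN bool_and(data = 'X')  -- true only when every row in the group is an 'X'
    THEN 'Premium'
    ELSE 'Standard'
  END AS type
FROM my_table
GROUP BY timestamp;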

Related

Counting consecutive days in postgres

I'm trying to count the number of consecutive days in two tables with the following structure:
| id | email              | timestamp           |
|----|--------------------|---------------------|
| 1  | hello@example.com  | 2021-10-22 00:35:22 |
| 2  | hello2@example.com | 2021-10-21 21:17:41 |
| 1  | hello@example.com  | 2021-10-19 00:35:22 |
| 1  | hello@example.com  | 2021-10-18 00:35:22 |
| 1  | hello@example.com  | 2021-10-17 00:35:22 |
I would like to count the number of consecutive days of activity. The data above would show:
| id | email              | length |
|----|--------------------|--------|
| 1  | hello@example.com  | 1      |
| 2  | hello2@example.com | 1      |
| 1  | hello@example.com  | 3      |
This is made more difficult because I need to join the two tables using a UNION (or something similar) and then run the grouping. I tried to build on this query (Finding the length of a series in postgres) but I'm unable to group by consecutive days.
select max(id) as max_id, email, count(*) as length
from (
  select *, row_number() over wa - row_number() over wp as grp
  from began_playing_video
  window
    wp as (partition by email order by id desc),
    wa as (order by id desc)
) s
group by email, grp
order by 1 desc
Any ideas on how I could do this in Postgres?
First create an aggregate function that counts adjacent dates within a list ordered ascending. The jsonb data type is used because it allows mixing various data types inside the same array:
CREATE OR REPLACE FUNCTION count_date(x jsonb, y jsonb, d date)
RETURNS jsonb LANGUAGE sql AS
$$
  -- State format: [last_date_seen, run_1, run_2, ...]
  SELECT CASE
           WHEN d IS NULL
           THEN COALESCE(x, y)
           ELSE
             to_jsonb(d :: text)
             || CASE
                  -- first date: start a run of length 1
                  WHEN COALESCE(x, y) = '[]' :: jsonb
                  THEN '[1]' :: jsonb
                  -- d directly follows the previous date: increment the current run
                  WHEN COALESCE(x->>0, y->>0) :: date + 1 = d
                  THEN jsonb_set(COALESCE(x-0, y-0), '{-1}',
                                 to_jsonb(COALESCE(x->>-1, y->>-1) :: integer + 1))
                  -- gap: append a new run of length 1
                  ELSE COALESCE(x-0, y-0) || to_jsonb(1)
                END
         END;
$$ ;
DROP AGGREGATE IF EXISTS count_date(jsonb, date) ;
CREATE AGGREGATE count_date(jsonb, date)
(
  sfunc = count_date,
  stype = jsonb
) ;
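As a quick sanity check (a sketch with hand-picked dates; the expected result is traced by hand from the function's logic):

SELECT count_date('[]', d ORDER BY d) AS state
FROM (VALUES ('2021-10-17' :: date), ('2021-10-18'), ('2021-10-19'), ('2021-10-22')) t(d);
-- should yield ["2021-10-22", 3, 1]: the last date seen, followed by the run lengths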
Then apply the count_date aggregate to your table, grouped by id and email:
WITH list AS (
  SELECT id, email, count_date('[]', timestamp :: date ORDER BY timestamp) AS count_list
  FROM your_table
  GROUP BY id, email
)
SELECT id, email, jsonb_array_elements(count_list - 0) AS length
FROM list
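For reference: the classic gaps-and-islands trick is a simpler alternative. A sketch, reusing your_table from above; dense_rank (plus DISTINCT) tolerates several rows on the same day, because consecutive days then share the same value of day minus rank:

SELECT id, email, count(*) AS length
FROM (
  SELECT DISTINCT id, email,
         timestamp :: date AS day,
         timestamp :: date
           - (dense_rank() OVER (PARTITION BY id, email
                                 ORDER BY timestamp :: date)) :: int AS grp
  FROM your_table
) s
GROUP BY id, email, grp
ORDER BY id, grp;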

Update column with correct daterange using generate_series

I have a column with incorrect dateranges (a day is missing). The code
to generate these dateranges was written by a previous employee and
cannot be found.
The dateranges look like this, notice the missing day:
+-------+--------+-------------------------+
|  id   | client |       date_range        |
+-------+--------+-------------------------+
| 12885 | 30     | [2016-01-07,2016-01-13) |
| 12886 | 30     | [2016-01-14,2016-01-20) |
| 12887 | 30     | [2016-01-21,2016-01-27) |
| 12888 | 30     | [2016-01-28,2016-02-03) |
| 12889 | 30     | [2016-02-04,2016-02-10) |
| 12890 | 30     | [2016-02-11,2016-02-17) |
| 12891 | 30     | [2016-02-18,2016-02-24) |
+-------+--------+-------------------------+
And should look like this:
+-------------------------+
|          range          |
+-------------------------+
| [2016-01-07,2016-01-14) |
| [2016-01-14,2016-01-21) |
| [2016-01-21,2016-01-28) |
| [2016-01-28,2016-02-04) |
| [2016-02-04,2016-02-11) |
| [2016-02-11,2016-02-18) |
| [2016-02-18,2016-02-25) |
| [2016-02-25,2016-03-03) |
+-------------------------+
The code I've written to generate correct dateranges looks like this:
create or replace function generate_date_series(startsOn date, endsOn date, frequency interval)
returns setof date as $$
  select (startsOn + (frequency * count))::date
  from (
    select (row_number() over ()) - 1 as count
    from generate_series(startsOn, endsOn, frequency)
  ) series
$$ language sql immutable;
select DATERANGE(
  generate_date_series(
    '2016-01-07'::date, '2024-11-07'::date, interval '7 days'
  )::date,
  generate_date_series(
    '2016-01-14'::date, '2024-11-13'::date, interval '7 days'
  )::date
) as range;
However, I'm having trouble trying to update the column with the
correct dateranges. I initially executed this UPDATE query on a test
database I created:
update factored_daterange set date_range = dt.range from (
  select daterange(
    generate_date_series(
      '2016-01-07'::date, '2024-11-07'::date, interval '7 days'
    )::date,
    generate_date_series(
      '2016-01-14'::date, '2024-11-14'::date, interval '7 days'
    )::date
  ) as range
) dt
where client_id = 30;
But that is not correct, it simply assigns the first generated
daterange to each row. I want to essentially update the dateranges
row-by-row since there is no other join or condition I can match the
dates up to. Any assistance in this matter is greatly appreciated.
You're working too hard. Just update the upper bound of each range:
update your_table_name
set date_range = daterange(lower(date_range), (upper(date_range) + interval '1 day')::date) ;
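Before touching real data, it may be worth previewing the rewritten ranges with a plain SELECT first; a sketch using the table and column names from the question:

SELECT id, date_range,
       daterange(lower(date_range), (upper(date_range) + interval '1 day')::date) AS fixed_range
FROM factored_daterange
WHERE client_id = 30;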

How to select rows in a postgres table based on a date and store it in a new table?

I have a postgres table which has some data. Each row has a date associated with it. I want to extract the rows whose date falls in the month of April. Here is a CSV version of my postgres table's data:
,date,location,device,provider,cpu,mem,load,drops,id,latency,gw_latency,upload,download,sap_drops,sap_latency,alert_id
0,2018-02-10 11:52:59.342269+00:00,BEM,10.11.100.1,COD,6.0,23.0,11.75,0.0,,,,,,,,
1,2018-02-10 11:53:04.006971+00:00,VER,10.11.100.1,KOD,6.0,23.0,4.58,0.0,,,,,,,,
2,2018-03-25 20:28:36.186015+00:00,RET,10.11.100.1,POL,7.0,26.0,9.83,0.0,,86.328,5.0,4.33,15.33,0.0,23.0,
3,2018-03-25 20:28:59.155453+00:00,ASR,10.12.100.1,VOL,5.0,14.0,2.67,0.0,,52.406,12.0,2.17,3.17,0.0,28.0,
4,2018-04-01 13:16:44.472119+00:00,RED,10.19.0.1,SEW,6.0,14.0,2.77,0.0,,52.766,2.0,3.25,2.29,0.0,1.0,0.0
5,2018-04-01 13:16:48.478708+00:00,RED,10.19.0.1,POL,6.0,14.0,4.065,0.0,,52.766,1.0,6.63,1.5,0.0,1.0,0.0
6,2018-04-06 21:00:44.769702+00:00,GOK,10.61.100.1,FDE,4.0,22.0,3.08,0.0,,54.406,8.0,3.33,2.83,0.0,19.0,0.0
7,2018-04-06 21:01:07.211395+00:00,WER,10.4.100.1,FDE,3.0,3.0,9.28,0.0,,0.346,2.0,10.54,8.02,0.0,33.0,0.0
8,2018-04-13 11:18:08.411550+00:00,DER,10.19.0.1,CVE,14.0,14.0,7.88,0.0,,50.545,2.0,6.17,9.59,0.0,1.0,0.0
9,2018-04-13 11:18:12.420974+00:00,RTR,10.19.0.1,BOL,14.0,14.0,1.345,0.0,,50.545,1.0,2.26,0.43,0.0,1.0,0.0
So I want only the rows with April dates, such that I end up with a table which looks something like this:
4,2018-04-01 13:16:44.472119+00:00,RED,10.19.0.1,SEW,6.0,14.0,2.77,0.0,,52.766,2.0,3.25,2.29,0.0,1.0,0.0
5,2018-04-01 13:16:48.478708+00:00,RED,10.19.0.1,POL,6.0,14.0,4.065,0.0,,52.766,1.0,6.63,1.5,0.0,1.0,0.0
6,2018-04-06 21:00:44.769702+00:00,GOK,10.61.100.1,FDE,4.0,22.0,3.08,0.0,,54.406,8.0,3.33,2.83,0.0,19.0,0.0
7,2018-04-06 21:01:07.211395+00:00,WER,10.4.100.1,FDE,3.0,3.0,9.28,0.0,,0.346,2.0,10.54,8.02,0.0,33.0,0.0
8,2018-04-13 11:18:08.411550+00:00,DER,10.19.0.1,CVE,14.0,14.0,7.88,0.0,,50.545,2.0,6.17,9.59,0.0,1.0,0.0
9,2018-04-13 11:18:12.420974+00:00,RTR,10.19.0.1,BOL,14.0,14.0,1.345,0.0,,50.545,1.0,2.26,0.43,0.0,1.0,0.0
Now, if I try to extract a particular date with the query below
select * from metrics_data where date = 2018-04-13;
I get the error message
No operator matches the given name and argument type(s). You might need to add explicit type casts.
How do I get the rows for the month of April and store it in a new table say april_data?
Below is the structure of my existing table
   Column    |           Type           | Modifiers | Storage  | Stats target | Description
-------------+--------------------------+-----------+----------+--------------+-------------
 date        | timestamp with time zone |           | plain    |              |
 location    | character varying(255)   |           | extended |              |
 device      | character varying(255)   |           | extended |              |
 provider    | character varying(255)   |           | extended |              |
 cpu         | double precision         |           | plain    |              |
 mem         | double precision         |           | plain    |              |
 load        | double precision         |           | plain    |              |
 drops       | double precision         |           | plain    |              |
 id          | integer                  |           | plain    |              |
 latency     | double precision         |           | plain    |              |
 gw_latency  | double precision         |           | plain    |              |
 upload      | double precision         |           | plain    |              |
 download    | double precision         |           | plain    |              |
 sap_drops   | double precision         |           | plain    |              |
 sap_latency | double precision         |           | plain    |              |
 alert_id    | double precision         |           | plain    |              |
The type of the column date in your table is timestamp with time zone, whose output format looks like YYYY-MM-DD HH24:MI:SS.MS. In your query, the unquoted literal 2018-04-13 parses as integer arithmetic rather than a date, so the comparison has no matching operator and throws an error.
To fix it, make one side match the type of the other.
In your case I suggest the options below.
Match exactly one day:
select * from metrics_data where date(date) = '2018-04-13';
Match within one month:
select * from metrics_data where date BETWEEN '2018-04-01 00:00:00' AND '2018-04-30 23:59:59.999';
OR
select * from metrics_data where date(date) BETWEEN '2018-04-01' AND '2018-04-30';
OR
select * from metrics_data where to_char(date,'YYYY-MM') = '2018-04';
Match only April (of any year):
select * from metrics_data where to_char(date,'MM') = '04';
OR
select * from metrics_data where extract(month from date) = 4;
Hopefully this answer will help you.
You need single quotes around the string literal.
PostgreSQL will automatically cast it to the correct data type (timestamp with time zone).
You could use the extract function to select only the dates from April:
SELECT * FROM yourtable WHERE extract (month FROM yourtable.date) = 4;
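And to store the April rows in a new table named april_data, as the question asks, any of these SELECTs can be wrapped in CREATE TABLE ... AS:

CREATE TABLE april_data AS
SELECT *
FROM metrics_data
WHERE extract(month from date) = 4;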

How to query just the last record of every second within a period of time in postgres

I have a 'prices' table with hundreds of millions of records and only four columns: uid, price, unit, dt. dt is a timestamp in a standard format like '2017-05-01 00:00:00.585'.
I can quite easily select a period using
SELECT uid, price, unit from prices
WHERE dt > '2017-05-01 00:00:00.000'
AND dt < '2017-05-01 02:59:59.999'
What I can't work out is how to select the price of the last record within each second. (I also need the very first one of each second, but I guess that will be a similar, separate query.) There are some similar examples around, but they did not work for me when I tried to adapt them to my needs; they just generated errors.
Could someone please help me crack this nut?
Let's say there is a table which has been generated with the help of this command:
CREATE TABLE test AS
SELECT timestamp '2017-09-16 20:00:00' + x * interval '0.1' second AS my_timestamp
FROM generate_series(0, 100) x
This table contains an increasing series of timestamps, each differing by 100 milliseconds (0.1 second) from its neighbors, so there are 10 records within each second.
| my_timestamp           |
|------------------------|
| 2017-09-16T20:00:00Z   |
| 2017-09-16T20:00:00.1Z |
| 2017-09-16T20:00:00.2Z |
| 2017-09-16T20:00:00.3Z |
| 2017-09-16T20:00:00.4Z |
| 2017-09-16T20:00:00.5Z |
| 2017-09-16T20:00:00.6Z |
| 2017-09-16T20:00:00.7Z |
| 2017-09-16T20:00:00.8Z |
| 2017-09-16T20:00:00.9Z |
| 2017-09-16T20:00:01Z   |
| 2017-09-16T20:00:01.1Z |
| 2017-09-16T20:00:01.2Z |
| 2017-09-16T20:00:01.3Z |
.......
The below query determines and prints the first and the last timestamp within each second:
SELECT my_timestamp,
       CASE
         WHEN rn1 = 1 THEN 'First'
         WHEN rn2 = 1 THEN 'Last'
         ELSE 'Somewhere in the middle'
       END AS which_row_within_a_second
FROM (
  SELECT *,
         row_number() OVER (PARTITION BY date_trunc('second', my_timestamp)
                            ORDER BY my_timestamp) rn1,
         row_number() OVER (PARTITION BY date_trunc('second', my_timestamp)
                            ORDER BY my_timestamp DESC) rn2
  FROM test
) xx
WHERE 1 IN (rn1, rn2)
ORDER BY my_timestamp;
| my_timestamp           | which_row_within_a_second |
|------------------------|---------------------------|
| 2017-09-16T20:00:00Z   | First                     |
| 2017-09-16T20:00:00.9Z | Last                      |
| 2017-09-16T20:00:01Z   | First                     |
| 2017-09-16T20:00:01.9Z | Last                      |
| 2017-09-16T20:00:02Z   | First                     |
| 2017-09-16T20:00:02.9Z | Last                      |
| 2017-09-16T20:00:03Z   | First                     |
| 2017-09-16T20:00:03.9Z | Last                      |
| 2017-09-16T20:00:04Z   | First                     |
| 2017-09-16T20:00:04.9Z | Last                      |
| 2017-09-16T20:00:05Z   | First                     |
| 2017-09-16T20:00:05.9Z | Last                      |
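Applied to the asker's prices table, the same idea can also be written with DISTINCT ON, which keeps exactly one row per second; a sketch (flip dt DESC to dt ASC to get the first record of each second instead of the last):

SELECT DISTINCT ON (date_trunc('second', dt)) uid, price, unit, dt
FROM prices
WHERE dt >= '2017-05-01 00:00:00'
  AND dt <  '2017-05-01 03:00:00'
ORDER BY date_trunc('second', dt), dt DESC;  -- the last record of each second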

Join column with timestamps where value is maximum

I have a table that looks like
+-------+-----------+
| value | timestamp |
+-------+-----------+
and I'm trying to build a query that gives a result like
+-------+-----------+------------+------------------------+
| value | timestamp | MAX(value) | timestamp of max value |
+-------+-----------+------------+------------------------+
so that the result looks like
+---+----------+---+----------+
| 1 | 1.2.1001 | 3 | 1.1.1000 |
| 2 | 5.5.1021 | 3 | 1.1.1000 |
| 3 | 1.1.1000 | 3 | 1.1.1000 |
+---+----------+---+----------+
but I got stuck on joining the column with the corresponding timestamps.
Any hints or suggestions?
Thanks in advance!
For further information (if that helps):
In the real project the max-values are grouped by month and day (with group by clause, which works btw), but somehow I got stuck on joining the timestamps for max-values.
EDIT
Cross joins are a good idea, but I want to have them grouped by month e.g.:
+---+----------+---+----------+
| 1 | 1.1.1101 | 6 | 1.1.1300 |
| 2 | 2.6.1021 | 5 | 5.6.1000 |
| 3 | 1.1.1200 | 6 | 1.1.1300 |
| 4 | 1.1.1040 | 6 | 1.1.1300 |
| 5 | 5.6.1000 | 5 | 5.6.1000 |
| 6 | 1.1.1300 | 6 | 1.1.1300 |
+---+----------+---+----------+
EDIT 2
I've added a fiddle with some sample data and an example of the current query.
http://sqlfiddle.com/#!1/efa42/1
How to add the corresponding timestamp to the maximum?
Try a cross join with two subqueries: the first one selects all records, the second one gets the single row representing the timestamp of the max value, e.g. <3;"1000-01-01">.
SELECT col_value, col_timestamp, max_col_value, col_timestamp_of_max_value
FROM table1
CROSS JOIN
(
  SELECT max(col_value) AS max_col_value, col_timestamp AS col_timestamp_of_max_value
  FROM table1
  GROUP BY col_timestamp
  ORDER BY max_col_value DESC
  LIMIT 1
) A -- one row representing the timestamp of the max value, i.e. <3;"1000-01-01">
Use a window function, since you're on Postgres:
SELECT *, max(value) OVER (), max(timestamp) OVER () FROM your_table;
That gives you the overall maximum of each column, repeated on every row.
http://www.postgresql.org/docs/9.1/static/tutorial-window.html
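Note that max(timestamp) OVER () is the latest timestamp overall, not necessarily the timestamp of the row holding the max value. To pull the timestamp that belongs to the max value, a sketch with first_value (ties resolved arbitrarily):

SELECT value, timestamp,
       max(value) OVER () AS max_value,
       first_value(timestamp) OVER (ORDER BY value DESC) AS timestamp_of_max_value
FROM your_table;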