Get every 7th record - PostgreSQL

I have a statistics table in PostgreSQL:
article_id | date | read_count
1 | 2011-06-02 | 12
1 | 2011-06-03 | 54
1 | 2011-06-04 | 2
1 | 2011-06-05 | 432
Since I need this data for a chart, I want one value per week.
I know Postgres has a row_number() function, but I didn't get it to work properly:
SELECT "date", "read_count" FROM "articles_stats"
WHERE row_number() OVER (ORDER BY "date" ASC) % 7 = 0
ORDER BY "date" ASC
ERROR: window functions not allowed in WHERE clause

From @pst's link:
"If there is a need to filter or group rows after the window
calculations are performed, you can use a sub-select."
For example:
SELECT *
FROM (
    SELECT "date", "read_count",
           row_number() OVER (ORDER BY "date" ASC) AS n
    FROM "articles_stats"
) x
WHERE x.n % 7 = 0
ORDER BY x."date" ASC
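Note that taking every 7th row samples a single day per week rather than summarising the week. If the chart needs one aggregated value per week instead, a plain GROUP BY over date_trunc may fit better; a minimal sketch, assuming the same articles_stats table and that summing read_count is the desired weekly value:

SELECT date_trunc('week', "date")::date AS week_start,
       sum("read_count") AS read_count
FROM "articles_stats"
GROUP BY week_start
ORDER BY week_start;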

Related

PostgreSQL how to generate a partition row_number() with certain numbers overridden

I have an unusual problem I'm trying to solve with SQL: I need to generate sequential numbers for partitioned rows, but override specific numbers with values from the data, without breaking the sequence (unless the override causes a number to be used that is greater than the number of rows present).
I feel I might be able to achieve this by selecting the rows where I need to override the generated sequence value and the rows where I don't, unioning them together, and somehow using coalesce to get the desired dynamically generated sequence value; or maybe there's some way I can utilise recursion.
I've not been able to solve this problem yet, but I've put together a SQL Fiddle which provides a simplified version:
http://sqlfiddle.com/#!17/236b5/5
The desired_dynamic_number is what I'm trying to generate and the generated_dynamic_number is my current work-in-progress attempt.
Any pointers around the best way to achieve the desired_dynamic_number values dynamically?
Update:
I'm almost there using lag:
http://sqlfiddle.com/#!17/236b5/24
Step-by-step demo: db<>fiddle
SELECT
    *,
    COALESCE(                                       -- 3
        first_value(override_as_number) OVER w,     -- 2
        1
    )
    + row_number() OVER w - 1                       -- 4, 5
FROM (
    SELECT
        *,
        SUM(                                        -- 1
            CASE WHEN override_as_number IS NOT NULL THEN 1 ELSE 0 END
        ) OVER (PARTITION BY grouped_by ORDER BY secondary_order_by)
        AS grouped
    FROM sample
) s
WINDOW w AS (PARTITION BY grouped_by, grouped ORDER BY secondary_order_by)
1. Create a new subpartition within your partitions: this cumulative sum creates a unique group id for every group of records that starts with a non-NULL override_as_number followed by NULL records. So, for instance, your rows (AAA, d) to (AAA, f) belong to the same subpartition/group.
2. first_value() gives the first value of such a subpartition.
3. The COALESCE ensures a non-NULL result from the first_value() function if your partition starts with a NULL record.
4. row_number() - 1 creates a row count within a subpartition, starting with 0.
5. Adding the first_value() of a subpartition to the row count creates your result: beginning with the one non-NULL record of a subpartition (adding the 0 row count), the first following NULL record results in the value +1, and so forth.
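To see what the cumulative sum in step 1 produces, here is a minimal sketch with the AAA sample rows inlined (the VALUES list is hypothetical, mirroring the fiddle data):

SELECT grouped_by, secondary_order_by, override_as_number,
       SUM(CASE WHEN override_as_number IS NOT NULL THEN 1 ELSE 0 END)
           OVER (PARTITION BY grouped_by ORDER BY secondary_order_by) AS grouped
FROM (VALUES
    ('AAA', 'a', 1),
    ('AAA', 'b', NULL),
    ('AAA', 'c', 3),
    ('AAA', 'd', 3),
    ('AAA', 'e', NULL),
    ('AAA', 'f', NULL),
    ('AAA', 'g', 999)
) AS sample(grouped_by, secondary_order_by, override_as_number);
-- grouped comes out as 1, 1, 2, 3, 3, 3, 4: rows d..f share subpartition 3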
The query below gives the exact result, but you should verify it against all combinations:
select c.*, COALESCE(c.override_as_number, c.act) as final
from (
    select b.*,
           dense_rank() over (partition by grouped_by order by grouped_by, actual) as act
    from (
        select a.*, COALESCE(override_as_number, row_num) as actual
        from (
            select grouped_by, secondary_order_by,
                   dense_rank() over (partition by grouped_by order by grouped_by, secondary_order_by) as row_num,
                   override_as_number, desired_dynamic_number
            from fiddle
        ) a
    ) b
) c;
column "final" is the result
grouped_by | secondary_order_by | row_num | override_as_number | desired_dynamic_number | actual | act | final
------------+--------------------+---------+--------------------+------------------------+--------+-----+-------
AAA | a | 1 | 1 | 1 | 1 | 1 | 1
AAA | b | 2 | | 2 | 2 | 2 | 2
AAA | c | 3 | 3 | 3 | 3 | 3 | 3
AAA | d | 4 | 3 | 3 | 3 | 3 | 3
AAA | e | 5 | | 4 | 5 | 4 | 4
AAA | f | 6 | | 5 | 6 | 5 | 5
AAA | g | 7 | 999 | 999 | 999 | 6 | 999
XYZ | a | 1 | | 1 | 1 | 1 | 1
ZZZ | a | 1 | | 1 | 1 | 1 | 1
ZZZ | b | 2 | | 2 | 2 | 2 | 2
(10 rows)
Hope this helps!
The real-world problem I was trying to solve did not have a nicely ordered secondary_order_by column; instead it was something a bit more randomised (a created timestamp).
For the benefit of people who stumble across this question with a similar problem to solve, a colleague solved it using a cartesian join, whose solution I'm posting below. The solution is Snowflake SQL, which should be possible to adapt to Postgres. It does fall down on higher override_as_number values, though, unless the 1000 in from table(generator(rowcount => 1000)) is increased to something suitably high.
The SQL:
with tally_table as (
    select row_number() over (order by seq4()) as gen_list
    from table(generator(rowcount => 1000))
),
base as (
    select *,
        IFF(override_as_number IS NULL,
            row_number() OVER (PARTITION BY grouped_by, override_as_number order by random),
            override_as_number) as rownum
    from "SANDPIT"."TEST"."SAMPLEDATA"
    order by grouped_by, override_as_number, random
), --select * from base order by grouped_by,random;
cart_product as (
    select *
    from tally_table
    cross join (select distinct grouped_by from base) as distinct_grouped_by
), --select * from cart_product;
filter_product as (
    select *,
        row_number() OVER (partition by cart_product.grouped_by order by cart_product.grouped_by, gen_list) as seq_order
    from cart_product
    where CONCAT(grouped_by, '~', gen_list) NOT IN
        (select concat(grouped_by, '~', override_as_number) from base where override_as_number is not null)
) --select * from filter_product order by 2,3;
select base.grouped_by,
       base.random,
       base.override_as_number,
       base.answer, -- This is hard coded as test data
       IFF(override_as_number is null, gen_list, seq_order) as computed_answer
from base
inner join filter_product
    on base.rownum = filter_product.seq_order
    and base.grouped_by = filter_product.grouped_by
order by base.grouped_by,
         random;
In the end I went for a simpler solution using a temporary table and cursor to inject override_as_number values and shuffle other numbers.

MySQL group by timestamp difference

I need to write a MySQL query that groups results by the difference between timestamps.
Is that possible?
I have a table with locations; every row has created_at (timestamp), and I want to start a new group wherever the difference to the previous row is > 1 min.
Example:
id | lat | lng | created_at
1 | ... | ... | 2020-05-03 06:11:35
2 | ... | ... | 2020-05-03 06:11:37
3 | ... | ... | 2020-05-03 06:11:46
4 | ... | ... | 2020-05-03 06:12:48
5 | ... | ... | 2020-05-03 06:12:52
The result for this data should be 2 groups: (1, 2, 3) and (4, 5).
It depends on what you actually want. If you want to group together records that belong to the same minute, regardless of the difference with the previous record, then simple aggregation is enough:
select
    date_format(created_at, '%Y-%m-%d %H:%i:00') date_minute,
    min(id) min_id,
    max(id) max_id,
    min(created_at) min_created_at,
    max(created_at) max_created_at,
    count(*) no_records
from mytable
group by date_minute
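With the sample data this already yields the desired two groups (rows 1-3 fall in minute 06:11, rows 4-5 in minute 06:12), but note that two rows only a couple of seconds apart would be split whenever they straddle a minute boundary.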
On the other hand, if you want to build groups of consecutive records that have less than a 1 minute gap in between, this is a gaps-and-islands problem. Here is one way to solve it using window functions (available in MySQL 8.0):
select
    min(id) min_id,
    max(id) max_id,
    min(created_at) min_created_at,
    max(created_at) max_created_at,
    count(*) no_records
from (
    select
        t.*,
        sum(case when created_at < lag_created_at + interval 1 minute then 0 else 1 end)
            over(order by created_at) grp
    from (
        select
            t.*,
            lag(created_at) over(order by created_at) lag_created_at
        from mytable t
    ) t
) t
group by grp
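As a sanity check, a hedged sketch (MySQL 8.0+, sample rows inlined with union all) exposing the inner lag() step; the outer sum() then yields grp = 1 for ids 1-3 and grp = 2 for ids 4-5, because only row 4 starts more than a minute after its predecessor:

select id, created_at,
       lag(created_at) over (order by created_at) as lag_created_at,
       created_at < lag(created_at) over (order by created_at) + interval 1 minute
           as within_one_minute  -- NULL for the first row, handled by the case's else branch
from (
    select 1 as id, timestamp '2020-05-03 06:11:35' as created_at
    union all select 2, timestamp '2020-05-03 06:11:37'
    union all select 3, timestamp '2020-05-03 06:11:46'
    union all select 4, timestamp '2020-05-03 06:12:48'
    union all select 5, timestamp '2020-05-03 06:12:52'
) t;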

How to force a query to return only the first row from a window?

I have data:
id | price | date
1 | 25 | 2019-01-01
2 | 35 | 2019-01-01
1 | 27 | 2019-02-01
2 | 37 | 2019-02-01
Is it possible to write a query that returns only the first row from each window? Something like LIMIT 1, but for the window OVER (date)?
I expect this result:
id | price | date
1 | 25 | 2019-01-01
1 | 27 | 2019-02-01
Or, ignoring the whole window if its first row has a NULL:
id | price | date
1 | NULL | 2019-01-01
2 | 35 | 2019-01-01
1 | 27 | 2019-02-01
2 | 37 | 2019-02-01
result:
1 | 27 | 2019-02-01
Order the rows by date and id, and take only the first row per date.
Then remove those where the price is NULL.
SELECT *
FROM (SELECT DISTINCT ON (date)
             id, price, date
      FROM mytable
      ORDER BY date, id
     ) AS q
WHERE price IS NOT NULL;
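For illustration, a runnable sketch inlining the second sample dataset (the VALUES list is hypothetical, standing in for mytable):

SELECT *
FROM (SELECT DISTINCT ON (date)
             id, price, date
      FROM (VALUES
            (1, NULL::int, DATE '2019-01-01'),
            (2, 35,        DATE '2019-01-01'),
            (1, 27,        DATE '2019-02-01'),
            (2, 37,        DATE '2019-02-01')) AS mytable(id, price, date)
      ORDER BY date, id
     ) AS q
WHERE price IS NOT NULL;
-- Returns only (1, 27, 2019-02-01): the first row per date is id 1,
-- and the 2019-01-01 window is dropped because its first price is NULL.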
@Laurenz, let me provide a bit more explanation.
select distinct on (<fldlist>) * from <table> order by <fldlist+>;
is equivalent to the much more complex query:
select * from (
    select row_number() over (partition by <fldlist> order by <fldlist+>) as rn, *
    from <table>
) t
where rn = 1;
And here <fldlist> must be a leading part of (or equal to) <fldlist+>.
As Myon on IRC said:
if you want to use a window function in WHERE, you need to put it into a subselect first
So the target query is:
select * from (
    select
        *,
        agg_function( my_field ) OVER ( PARTITION BY other_field ) as agg_field
    from sometable
) x
WHERE agg_field <condition>
In my case I have the following query:
SELECT * FROM (
    SELECT *,
        FIRST_VALUE( p.price ) OVER ( PARTITION BY crate.app_period ORDER BY st.DEPTH ) AS first_price,
        ROW_NUMBER() OVER ( PARTITION BY crate.app_period ORDER BY st.DEPTH ) AS row_number
    FROM st
    LEFT JOIN price p ON <COND>
    LEFT JOIN currency_rate crate ON <COND>
) p
WHERE p.row_number = 1 AND p.first_price IS NOT NULL
Here I select only the first row from each group, and only where the price IS NOT NULL.

Finding the length of a series in Postgres

A tricky query for postgres. Imagine, I have a set of rows with a boolean column called (for example) success. Like this:
id | success
9 | false
8 | false
7 | true
6 | true
5 | true
4 | false
3 | false
2 | true
1 | false
And I need to calculate the length of the latest successful (or unsuccessful) series. E.g. in this case it would be "3" for successful and "2" for unsuccessful. Or, using window functions, something like:
id | success | length
9 | false | 2
8 | false | 2
7 | true | 3
6 | true | 3
5 | true | 3
4 | false | 2
3 | false | 2
2 | true | 1
1 | false | 1
(note that I generally need the length of only the latest series, not all of them)
The closest answer I've found so far was this article:
https://jaxenter.com/10-sql-tricks-that-you-didnt-think-were-possible-125934.html
(See #5)
However, Postgres doesn't support the "IGNORE NULLS" option, so the query doesn't work. Without "IGNORE NULLS" it simply returns NULLs in the length column.
Here is the closest I was able to get:
WITH
trx1(id, success, rn) AS (
    SELECT id, success, row_number() OVER (ORDER BY id DESC)
    FROM results
),
trx2(id, success, rn, lo, hi) AS (
    SELECT trx1.*,
           CASE WHEN coalesce(lag(success)  OVER (ORDER BY id DESC), FALSE) != success THEN rn END,
           CASE WHEN coalesce(lead(success) OVER (ORDER BY id DESC), FALSE) != success THEN rn END
    FROM trx1
)
SELECT trx2.*, 1
       - last_value(lo) IGNORE NULLS OVER (ORDER BY id DESC ROWS BETWEEN
             UNBOUNDED PRECEDING AND CURRENT ROW)
       + first_value(hi) OVER (ORDER BY id DESC ROWS BETWEEN CURRENT ROW
             AND UNBOUNDED FOLLOWING)
       AS length
FROM trx2;
Do you have any ideas of such a query?
You can use the window function row_number() to designate the series: the difference between the overall row number and the row number within each success partition is constant for consecutive rows with the same success value, so it identifies each series:
select max(id) as max_id, success, count(*) as length
from (
    select *, row_number() over wa - row_number() over wp as grp
    from my_table
    window
        wp as (partition by success order by id desc),
        wa as (order by id desc)
) s
group by success, grp
order by 1 desc
max_id | success | length
--------+---------+--------
9 | f | 2
7 | t | 3
4 | f | 2
2 | t | 1
1 | f | 1
(5 rows)
DbFiddle.
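A quick sketch exposing the intermediate row numbers (against the same my_table) shows why their difference is constant within each run:

select id, success,
       row_number() over (order by id desc) as rn_all,
       row_number() over (partition by success order by id desc) as rn_per_value,
       row_number() over (order by id desc)
         - row_number() over (partition by success order by id desc) as grp
from my_table
order by id desc;
-- ids 9,8 (false) get grp 0; ids 7,6,5 (true) get grp 2; ids 4,3 (false)
-- get grp 1; id 2 (true) and id 1 (false) both get grp 4, but they land in
-- different groups because the outer query groups by (success, grp).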
Even though the answer by Klin is totally correct, I'd like to post another solution my friend suggested:
with last_success as (
select max(id) id from my_table where success
)
select count(mt.id) last_fails_count
from my_table mt, last_success lt
where mt.id > lt.id;
--------------------
| last_fails_count |
--------------------
| 2 |
--------------------
DbFiddle
It is twice as fast if I only need the length of the last failing or successful series.
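For the latest successful series, a symmetric variant (a hedged sketch against the same my_table) counts the rows after the last failure:

with last_failure as (
    select max(id) as id from my_table where not success
)
select count(mt.id) as last_success_count
from my_table mt, last_failure lf
where mt.id > lf.id;
-- Returns 0 here, since the latest run in the sample data is a failing one.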

Iterate through rows, compare them against each other and store results in another table

I have a table that contains the following rows:
product_id | order_date
A | 12/04/12
A | 01/11/13
A | 01/21/13
A | 03/05/13
B | 02/14/13
B | 03/09/13
What I now need is an overview, for each month, of how many products have been bought for the first time (i.e. were not bought the month before), how many are existing products (i.e. were bought the month before), and how many have not been purchased within a given month. Taking the sample above as input, the script should deliver the following result, regardless of what period of time is in the data:
month | new | existing | nopurchase
12/2012 | 1 | 0 | 0
01/2013 | 0 | 1 | 0
02/2013 | 1 | 0 | 1
03/2013 | 1 | 1 | 0
It would be great to get a first hint at how this could be solved, so I'm able to continue.
Thanks!
SQL Fiddle
with t as (
    -- one row per product and purchase month (the base table is also named t)
    select product_id pid, date_trunc('month', order_date)::date od
    from t
    group by 1, 2
)
select od,
       sum(is_new::integer) "new",
       sum(is_existing::integer) existing,
       sum(not_purchased::integer) nopurchase
from (
    select od,
           -- bought this month but not the month before (lag is NULL then)
           lag(t_pid) over(partition by s_pid order by od) is null and t_pid is not null is_new,
           -- bought this month and the month before
           lag(t_pid) over(partition by s_pid order by od) is not null and t_pid is not null is_existing,
           -- bought the month before but not this month
           lag(t_pid) over(partition by s_pid order by od) is not null and t_pid is null not_purchased
    from (
        -- right join purchases against the full (product, month) grid,
        -- so months without a purchase show up with t_pid = null
        select t.pid t_pid, s.pid s_pid, s.od
        from t
        right join (
            -- every product combined with every month between min and max
            select pid, s.od
            from t
            cross join (
                select date_trunc('month', d)::date od
                from generate_series(
                    (select min(od) from t),
                    (select max(od) from t),
                    '1 month'
                ) s(d)
            ) s
            group by pid, s.od
        ) s on t.od = s.od and t.pid = s.pid
    ) s
) s
group by 1
order by 1