Filling Postgres table gaps with oldest available value

I am trying to fill the missing rows with the earliest available value, which remains valid until a new value is set, using SQL queries in a Postgres database.

You can use first_value() like this:
select
client,
activity_date,
val,
first_value(val) over (partition by client, grp_t order by activity_date) as new_val
from (
select client, activity_date, val,
sum(case when val is not null then 1 end) over (order by client, activity_date) as grp_t
from filling
) t
order by 1,2,3;
Here's dbfiddle example
Update: fixed new_val to handle the case where the first row per client is NULL.
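A minimal sketch of the gap-filling query above, run against an in-memory SQLite database through Python's sqlite3 module as a stand-in for Postgres (this assumes SQLite 3.25 or later for window functions). The sample rows and the helper name fill_gaps are made up for illustration; only the filling table and its columns come from the answer.

```python
import sqlite3

# Running count of non-null values gives each "value + following gaps" run
# its own group number; first_value() then spreads the value over the run.
QUERY = """
select client, activity_date, val,
       first_value(val) over (
           partition by client, grp_t order by activity_date
       ) as new_val
from (
    select client, activity_date, val,
           sum(case when val is not null then 1 end)
               over (order by client, activity_date) as grp_t
    from filling
) t
order by client, activity_date
"""

# Hypothetical sample data: client 'a' starts with a NULL row.
SAMPLE_ROWS = [
    ("a", "2020-01-01", None),
    ("a", "2020-01-02", 10),
    ("a", "2020-01-03", None),
    ("a", "2020-01-04", 20),
    ("a", "2020-01-05", None),
    ("b", "2020-01-01", 5),
    ("b", "2020-01-02", None),
]


def fill_gaps(rows):
    """Load rows into a scratch table and return the gap-filled result."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table filling (client text, activity_date text, val integer)"
    )
    conn.executemany("insert into filling values (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in fill_gaps(SAMPLE_ROWS):
        print(row)
```

The leading NULL row of client 'a' stays NULL because partitioning by client keeps it from inheriting a value from another client.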

Related

Removing null values from lead function

I have used the following postgresql query to find the maximum difference between timestamp events for each user:
select
sq.user_id,
max(sq.diffs) inactivity
from (
select
user_id,
(lead("when", 1, now()) over (partition by user_id order by "when") - "when") as diffs
from tracking_viewed
) as sq
group by sq.user_id
order by inactivity desc;
This query works for a different table, but on this table, whose "when" column contains nulls, it returns NULL for every user.
How can I remove or skip nulls from the lead and partition functions?

PostgreSQL row diff timestamp, and calculate stddev for group

I have a table with an ID column called mmsi and another column of timestamp, with multiple timestamps per mmsi.
For each mmsi I want to calculate the standard deviation of the difference between consecutive timestamps.
I'm not very experienced with SQL but have tried to construct a function as follows:
SELECT
mmsi, stddev(time_diff)
FROM
(SELECT mmsi,
EXTRACT(EPOCH FROM (timestamp - lag(timestamp) OVER (ORDER BY mmsi ASC, timestamp ASC)))
FROM ais_messages.ais_static
ORDER BY mmsi ASC, timestamp ASC) AS time_diff
WHERE time_diff IS NOT NULL
GROUP BY mmsi;
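Your query is on the right track, but it has several problems. You gave your subquery the alias time_diff and then selected that alias as if it were a column, but the alias names the whole derived table, which returns multiple rows and columns, so this doesn't make sense. The computed difference needs its own column alias, and LAG needs PARTITION BY mmsi so that a difference is never taken across two different mmsi values. Here is a corrected version: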
SELECT
t.mmsi,
STDDEV(t.time_diff) AS std
FROM
(
SELECT
mmsi,
EXTRACT(EPOCH FROM (timestamp - LAG(timestamp) OVER
(PARTITION BY mmsi ORDER BY timestamp))) AS time_diff
FROM ais_messages.ais_static
ORDER BY mmsi, timestamp
) t
WHERE t.time_diff IS NOT NULL
GROUP BY t.mmsi
This approach should be fine, but there is one edge case where it might not behave as expected. If a given mmsi group has only one record, it will not appear in the result set of standard deviations at all. This is because the LAG calculation returns NULL for that single record, and the WHERE clause then filters it out.
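The edge case can be seen in a small sketch. It runs on SQLite through Python's sqlite3 module as a stand-in for Postgres, so several things are assumptions: timestamps are stored as integer seconds in a column named ts, the table is called ais_static without a schema, and stddev is a hand-registered aggregate (SQLite has none) written to skip NULLs and return NULL for fewer than two values, which to my understanding matches Postgres.

```python
import sqlite3
import statistics


class SampleStddev:
    """Stand-in for stddev(): skips NULLs, NULL for fewer than two values."""

    def __init__(self):
        self.values = []

    def step(self, value):
        if value is not None:
            self.values.append(value)

    def finalize(self):
        return statistics.stdev(self.values) if len(self.values) > 1 else None


# Difference between consecutive timestamps, restarted for every mmsi.
DIFFS = """
select mmsi, ts - lag(ts) over (partition by mmsi order by ts) as time_diff
from ais_static
"""

# Hypothetical data: mmsi 1 has three messages, mmsi 2 has only one.
SAMPLE_ROWS = [(1, 0), (1, 10), (1, 30), (2, 5)]


def stddev_per_mmsi(rows, drop_null_diffs):
    """Return (mmsi, stddev) pairs, optionally filtering NULL differences."""
    conn = sqlite3.connect(":memory:")
    conn.create_aggregate("stddev", 1, SampleStddev)
    conn.execute("create table ais_static (mmsi integer, ts integer)")
    conn.executemany("insert into ais_static values (?, ?)", rows)
    where = "where t.time_diff is not null" if drop_null_diffs else ""
    sql = (
        f"select t.mmsi, stddev(t.time_diff) from ({DIFFS}) t "
        f"{where} group by t.mmsi order by t.mmsi"
    )
    result = conn.execute(sql).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(stddev_per_mmsi(SAMPLE_ROWS, drop_null_diffs=True))
    print(stddev_per_mmsi(SAMPLE_ROWS, drop_null_diffs=False))
```

With the filter, mmsi 2 disappears; without it, mmsi 2 is kept with a NULL standard deviation, which may be preferable if every mmsi should be listed.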

PostgreSQL - get records with null values

I'm trying to write a query that shows distributors that haven't sold anything in 90 days, but I'm running into a problem with NULL values. It seems PostgreSQL ignores NULL values even when I query for them (or maybe I did it the wrong way).
Let's say there are 1000 distributors. With this query I get only 1 distributor, but there should be more that didn't sell anything: if I write a query to show distributors that sold any amount in the last 90 days, it shows about 500. So where are the other 499? If I understand correctly, those 499 didn't have any sales, so all their records are NULL and are not shown by the query.
Does anyone know how to show rows where one table has a record but the related table does not? For example, the partner table (res_partner) has a row, but the sale_order table has none. (I also tried filtering with so.id IS NULL, but that returns an empty result.)
Code of my query:
(
SELECT
min(f1.id) as id,
f1.partner as partner,
f1.sum1
FROM
(
SELECT
min(f2.id) as id,
f2.partner as partner,
sum(f2.null_sum) as sum1
FROM
(
SELECT
min(rp.id) as id,
rp.search_name as partner,
CASE
WHEN
sol.price_subtotal IS NULL
THEN
0
ELSE
sol.price_subtotal
END as null_sum
FROM
sale_order as so,
sale_order_line as sol,
res_partner as rp
WHERE
sol.order_id=so.id and
so.partner_id=rp.id
and
rp.distributor=TRUE
and
so.date_order <= now()::timestamp::date
and
so.date_order >= date_trunc('day', now() - '90 day'::interval)::timestamp::date
and
rp.contract_date <= date_trunc('day', now() - '90 day'::interval)::timestamp::date
GROUP BY
partner,
null_sum
)as f2
GROUP BY
partner
) as f1
WHERE
sum1=0
GROUP BY
partner,
sum1
)as fld
EDIT: 2012-09-18 11 AM.
I think I understand why PostgreSQL behaves like this. It is because of the time interval. It checks whether there is any non-null value in that interval. So it found only one record, because that record had a sale order with zero (it was not converted from null to zero), and the part that checked for null values was just skipped. If I remove the time interval, I see all distributors that didn't sell anything at all. But with the time interval, for some reason it stops checking null values and only looks at non-null values.
So does anyone know how to make it check for null values too in given interval?.. (for the last 90 days to be exact)
Aggregates like sum() and min() do ignore NULL values. This is required by the SQL standard and every DBMS I know behaves like that.
If you want to treat a NULL value as e.g. a zero, then use something like this:
sum(coalesce(f2.null_sum, 0)) as sum1
But as far as I understand your question and your invalid query, you actually want an outer join between res_partner and the sales tables.
Something like this:
SELECT min(rp.id) as id,
rp.search_name as partner,
sum(coalesce(sol.price_subtotal,0)) as price_subtotal
FROM res_partner as rp
-- the date filter goes in the ON clause: in WHERE it would discard
-- partners without orders and turn the outer join into an inner join
LEFT JOIN sale_order as so ON so.partner_id=rp.id
and so.date_order <= CURRENT_DATE
and so.date_order >= date_trunc('day', now() - '90 day'::interval)::timestamp::date
LEFT JOIN sale_order_line as sol ON sol.order_id=so.id
-- conditions on res_partner itself belong in WHERE
WHERE rp.distributor=TRUE
and rp.contract_date <= date_trunc('day', now() - '90 day'::interval)::timestamp::date
GROUP BY rp.search_name
I'm not 100% sure I understood your problem correctly, but it might give you a headstart.
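One thing to watch with an outer join is where the date condition goes. A condition on sale_order in the WHERE clause removes the NULL-extended rows again, so distributors without recent orders vanish; the same condition in the ON clause keeps them. The sketch below illustrates this on SQLite through Python's sqlite3 module as a stand-in for Postgres. The schema is cut down to the columns involved, the sample data and the fixed cutoff date are made up, and the HAVING clause is my assumption about how to keep only distributors with no sales.

```python
import sqlite3

# Simplified, hypothetical versions of the three tables.
SCHEMA = """
create table res_partner (id integer, search_name text, distributor integer);
create table sale_order (id integer, partner_id integer, date_order text);
create table sale_order_line (order_id integer, price_subtotal real);
insert into res_partner values (1, 'acme', 1), (2, 'bolt', 1), (3, 'core', 1);
insert into sale_order values (10, 1, '2012-09-01'), (11, 2, '2012-01-01');
insert into sale_order_line values (10, 100.0), (11, 50.0);
"""

# Date filter in WHERE: partners without a recent order are dropped.
FILTER_IN_WHERE = """
select rp.search_name, sum(coalesce(sol.price_subtotal, 0)) as total
from res_partner rp
left join sale_order so on so.partner_id = rp.id
left join sale_order_line sol on sol.order_id = so.id
where rp.distributor = 1 and so.date_order >= :cutoff
group by rp.search_name
order by rp.search_name
"""

# Date filter in ON: every distributor survives, totals default to 0.
FILTER_IN_ON = """
select rp.search_name, sum(coalesce(sol.price_subtotal, 0)) as total
from res_partner rp
left join sale_order so
       on so.partner_id = rp.id and so.date_order >= :cutoff
left join sale_order_line sol on sol.order_id = so.id
where rp.distributor = 1
group by rp.search_name
having sum(coalesce(sol.price_subtotal, 0)) = 0
order by rp.search_name
"""


def run(query, cutoff="2012-06-20"):
    """Build the scratch database and run one of the two queries."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    rows = conn.execute(query, {"cutoff": cutoff}).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(run(FILTER_IN_WHERE))
    print(run(FILTER_IN_ON))
```

In this data, 'bolt' has only an old order and 'core' has none; only the second query reports them.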
Try to name your subqueries and refer to their columns as q1.col, q2.col, etc., so it is clear which column comes from which query or subquery. Maybe the cause is something simple, e.g. rows containing only NULLs are being merged into one row. Also, at least for debugging purposes, it helps to add count(*) to the select list of each query/subquery to see the number of rows that went into each result. It is hard to guess exactly what happened.

how to develop t-sql subquery to select only one record each?

I am using SSMS 2008, trying to select just one row/client. I need to select the following columns: client_name, end_date, and program. Some clients have just one client row. But others have multiple.
For those clients with multiple rows, they normally have different end_date and program. For instance:
CLIENT PROGRAM END_DATE
a b c
a d e
a f g
h d e
h f NULL
This is a real simplified version of the actual data. As you will see, different clients can be in the same program ("d"). But the same client cannot be in the same program more than one time.
Also the tricky thing is that the end_date can be NULL, so when I tried selecting those clients with > 1 row, I added a HAVING statement > 1. But this eliminated all of my NULL End_date rows.
To sum up, I want one row per client. So those clients with only one row total + those clients listed above with the following criteria:
Select only the row where either the End_date is greatest or NULL. (In most cases the end_date is null for these clients).
How can I achieve this with as little logic as possible?
On SQL Server 2005 and up, you can use a Common Table Expression (CTE) combined with ROW_NUMBER() and PARTITION BY. The CTE "partitions" your data by one criterion (in your case Client), creating a partition for each separate client. ROW_NUMBER() then numbers the rows within each partition, ordered by another criterion (here End_Date, a DATETIME), assigning numbers from 1 on up, separately for each partition.
So in this case, ordering by DATETIME DESC, the newest row gets numbered as 1 - and that's the fact I use when selecting from the CTE. I used the ISNULL() function here to assign those rows that have a NULL end_date some arbitrary value to "get them in order". I wasn't quite sure if I understood your question properly: did you want to select the NULL rows over those with a given end_Date, or did you want to give precedence to an existing end_Date value over NULL?
This will select the most recent row for each client (for each "partition" of your data):
DECLARE @clients TABLE (Client CHAR(1), Program CHAR(1), END_DATE DATETIME)
INSERT INTO @clients
VALUES('a', 'b', '20090505'),
('a', 'd', '20100808'),
('a', 'f', '20110303'),
('h', 'd', '20090909'),
('h', 'f', NULL)
;WITH LatestData AS
(
SELECT Client, Program, End_Date,
ROW_NUMBER() OVER(PARTITION BY Client ORDER BY ISNULL(End_Date, '99991231') DESC) AS 'RowNum'
FROM @clients
)
)
SELECT Client, Program, End_Date
FROM LatestData
WHERE RowNum = 1
Results in an output of:
Client Program End_Date
a f 2011-03-03
h f (NULL)
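The same idea can be tried outside SQL Server. The sketch below is an assumption-laden port to SQLite through Python's sqlite3 module: COALESCE replaces ISNULL, dates are ISO-formatted text rather than DATETIME, and the table is an ordinary table named clients. It is meant only to show the ROW_NUMBER() pattern on the sample data from the answer.

```python
import sqlite3

# A NULL end_date is mapped to a far-future date so it sorts first
# when ordering descending, and therefore gets row number 1.
QUERY = """
with latest as (
    select client, program, end_date,
           row_number() over (
               partition by client
               order by coalesce(end_date, '9999-12-31') desc
           ) as row_num
    from clients
)
select client, program, end_date
from latest
where row_num = 1
order by client
"""

SAMPLE_ROWS = [
    ("a", "b", "2009-05-05"),
    ("a", "d", "2010-08-08"),
    ("a", "f", "2011-03-03"),
    ("h", "d", "2009-09-09"),
    ("h", "f", None),
]


def latest_per_client(rows):
    """Return one row per client: NULL end_date first, else the latest."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table clients (client text, program text, end_date text)"
    )
    conn.executemany("insert into clients values (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in latest_per_client(SAMPLE_ROWS):
        print(row)
```

To give precedence to an existing end date over NULL instead, the far-future placeholder could be swapped for a far-past one.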

Postgresql vlookup

Let's say I have a table "uservalue" with the following columns:
integer user_id
integer group_id
integer value
I can get the maximum value for each group easily:
select max(value) from uservalue group by group_id;
What I would like is for it to return the user_id in each group that had the highest value. The max function in matlab will also return the index of the maximum, is there some way to make postgresql do the same thing?
The proper way to do this is with a subquery.
select
u.user_id,
u.value
from
uservalue u
join
(select group_id, max(value) as max_value from uservalue group by group_id) mv
on u.value = mv.max_value and mv.group_id = u.group_id
However, I sometimes prefer a simpler hack:
select max(value*100000 + user_id) % 100000 as user_id, max(value) from uservalue group by group_id
Make sure the multiplier (100000) is higher than any user_id you expect to have, and that value is never negative. When several users tie for the maximum, this returns only one user_id per group (the largest), whereas the join above returns all of them.
Seems you should be able to do this with a windowing query, something like:
SELECT DISTINCT
group_id,
first_value(user_id) OVER w AS user,
first_value(value) OVER w AS val
FROM
uservalue
WINDOW w AS (PARTITION BY group_id ORDER BY value DESC)
This query will also work if you have multiple users with the same value (unless you add a second column to ORDER BY you will not know which one you will get back though - but you will only get one row back per group)
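The difference between the join and the window query on ties can be checked with a small sketch. It uses SQLite through Python's sqlite3 module as a stand-in for Postgres (the named WINDOW clause assumes SQLite 3.28 or later). The sample data, the top_user alias, and the extra user_id tie-breaker in the window ordering are my additions for illustration.

```python
import sqlite3

# Join against the per-group maximum: returns every user tied for the max.
JOIN_QUERY = """
select u.group_id, u.user_id, u.value
from uservalue u
join (select group_id, max(value) as max_value
      from uservalue
      group by group_id) mv
  on u.value = mv.max_value and u.group_id = mv.group_id
order by u.group_id, u.user_id
"""

# Window version: one row per group; user_id breaks ties deterministically.
WINDOW_QUERY = """
select distinct group_id,
       first_value(user_id) over w as top_user,
       first_value(value) over w as val
from uservalue
window w as (partition by group_id order by value desc, user_id)
order by group_id
"""

# Hypothetical rows as (user_id, group_id, value); users 2 and 3 tie.
SAMPLE_ROWS = [(1, 1, 10), (2, 1, 30), (3, 1, 30), (4, 2, 7)]


def run(query, rows=SAMPLE_ROWS):
    """Load the sample rows and run one of the two queries."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table uservalue (user_id integer, group_id integer, value integer)"
    )
    conn.executemany("insert into uservalue values (?, ?, ?)", rows)
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(run(JOIN_QUERY))
    print(run(WINDOW_QUERY))
```

Which behavior is wanted depends on whether ties should be reported or collapsed to a single user.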
Here are several ways to do this.
It's pretty much a FAQ.