Select only the rows with the latest date in postgres - postgresql

I only want the latest date for each row (house). The number of entries per house varies: sometimes there is one sale, sometimes several.
Date of sale | house number | street | price |uniqueref
-------------|--------------|--------|-------|----------
15-04-1990 |1 |castle |100000-| 1xzytt
15-04-1995 |1 |castle |200000-| 2jhgkj
15-04-2005 |1 |castle |800000-| 3sdfsdf
15-04-1995 |2 |castle |200000-| 2jhgkj
15-04-2005 |2 |castle |800000-| 3sdfsdf
What I have working is as follows.
I create a VIEW (v_orderedhouses) with ORDER BY house number, street, and date DESC, so that the latest date is returned first.
I then feed that into another VIEW (v_latesthouses) using DISTINCT ON (house number, street), which gives me:
Date of sale | house number | street | price |uniqueref
-------------|--------------|--------|-------|----------
15-04-2005 |1 |castle |800000-| 3sdfsdf
15-04-2005 |2 |castle |800000-| 3sdfsdf
This works but seems like there should be a more elegant solution. Can I get to the filtered view in one step?

You do not need to create a bunch of views, just:
select distinct on(street, house_number)
*
from your_table
order by
street, house_number, -- the "distinct on" expressions must come first in the "order by" clause
date_of_sale desc;
To make this query faster, you could create an index that matches the order by:
create index index_name on your_table(street, house_number, date_of_sale desc);
Do not forget to analyse your tables regularly (how often depends on how fast they grow):
analyse your_table;

You can use the window function row_number for this:
select * from (
select your_table.*, row_number() over(partition by street, house_number order by date_of_sale desc) as rn from your_table
) tt
where rn = 1
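The window-function approach can be checked against the sample data from the question. This is only a minimal sketch: it uses Python's built-in sqlite3 module (SQLite 3.25 or later, for window functions) as a stand-in for Postgres, the table name sales is an assumption, and the dates are rewritten in ISO format so that they sort correctly as text. It partitions by both street and house_number, the same key the question uses in DISTINCT ON.

```python
import sqlite3

# Assumption: SQLite (>= 3.25) stands in for Postgres; "sales" is a made-up table name.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (date_of_sale TEXT, house_number INTEGER, "
    "street TEXT, price INTEGER, uniqueref TEXT)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?, ?)",
    [
        ("1990-04-15", 1, "castle", 100000, "1xzytt"),
        ("1995-04-15", 1, "castle", 200000, "2jhgkj"),
        ("2005-04-15", 1, "castle", 800000, "3sdfsdf"),
        ("1995-04-15", 2, "castle", 200000, "2jhgkj"),
        ("2005-04-15", 2, "castle", 800000, "3sdfsdf"),
    ],
)

# Number the sales of each house from newest to oldest, then keep number 1.
latest = conn.execute(
    """
    SELECT date_of_sale, house_number, street, price, uniqueref
    FROM (
        SELECT s.*,
               row_number() OVER (
                   PARTITION BY street, house_number
                   ORDER BY date_of_sale DESC
               ) AS rn
        FROM sales AS s
    ) AS ranked
    WHERE rn = 1
    ORDER BY street, house_number
    """
).fetchall()

for row in latest:
    print(row)
```

Each house yields exactly one row, its 2005 sale, which matches the output of the two-view approach in the question.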

This is what I use, and it is fast. It is a generic solution; as far as I have tested, every database engine can run it:
SELECT t1.date_of_sale, t1.street, t1.house_number
FROM your_table t1
LEFT JOIN your_table t2 ON (t2.street = t1.street AND t2.house_number = t1.house_number AND t2.date_of_sale > t1.date_of_sale)
WHERE t2.pk IS NULL -- pk stands for any NOT NULL column, such as the primary key
GROUP BY t1.date_of_sale, t1.street, t1.house_number

Related

How can I rank a table in postgresql and then find the rank of a specific row?

I have a postgresql table
cubing=# SELECT * FROM times;
count | name | time
-------+---------+--------
4 | sean | 32.97
5 | Austin | 15.64
6 | Kirk | 117.02
I retrieve all from it with SELECT * FROM times ORDER BY time ASC. But now I want to give the user the option to search for a specific value (say, WHERE name = 'Austin') and have it tell them what rank they are in the table. Right now, I have SELECT name, time, RANK() OVER (ORDER BY time ASC) rank_number FROM times. As I understand it, that gives me the rank of every row in the table. I would like the rank, name, and time of just the person I am searching for. I am afraid that if I add a WHERE clause with the name Austin to that last SELECT statement, it will first keep only the rows where the name equals Austin and rank those, rather than give Austin's rank within the whole table.
thanks for reading
I think the behavior you want here is to first rank your current data, then query it with some WHERE filter:
WITH cte AS (
SELECT *, RANK() OVER (ORDER BY time) rank_number
FROM times
)
SELECT count, name, time, rank_number
FROM cte
WHERE name = 'Austin';
The point here is that at the time we do a query searching for Austin, the ranks for each row in your original table have already been generated.
Edit:
If you're running this query from an application, it would probably be best to avoid CTE syntax. Instead, just inline the CTE as a subquery:
SELECT count, name, time, rank_number
FROM
(
SELECT *, RANK() OVER (ORDER BY time) rank_number
FROM times
) t
WHERE name = 'Austin';
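The difference between ranking first and filtering first can be seen on the question's three rows. The sketch below is an illustration only, with Python's built-in sqlite3 module (SQLite 3.25 or later) standing in for Postgres. It searches for Kirk rather than Austin, because Austin has the fastest time and would get rank 1 either way.

```python
import sqlite3

# Assumption: SQLite (>= 3.25) stands in for Postgres.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE times ("count" INTEGER, name TEXT, "time" REAL)')
conn.executemany(
    "INSERT INTO times VALUES (?, ?, ?)",
    [(4, "sean", 32.97), (5, "Austin", 15.64), (6, "Kirk", 117.02)],
)

# Rank the whole table in a subquery, then filter the ranked rows.
ranked_then_filtered = conn.execute(
    """
    SELECT "count", name, "time", rank_number
    FROM (
        SELECT *, RANK() OVER (ORDER BY "time") AS rank_number
        FROM times
    ) AS t
    WHERE name = ?
    """,
    ("Kirk",),
).fetchone()

# Filter first: WHERE runs before the window function, so only one row is ranked.
filtered_then_ranked = conn.execute(
    """
    SELECT name, RANK() OVER (ORDER BY "time") AS rank_number
    FROM times
    WHERE name = ?
    """,
    ("Kirk",),
).fetchone()

print(ranked_then_filtered)
print(filtered_then_ranked)
```

Ranking inside the subquery reports Kirk as third of three, while filtering in the same SELECT as the window function always reports rank 1, which is exactly the behaviour the question was worried about.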

Postgres 9.3 count rows matching a column relative to row's timestamp

I've used WINDOW functions before but only when working with data that has a fixed cadence/interval. I am likely missing something simple in aggregation but I've never had a scenario where I'm not working with fixed intervals.
I have a table that records samples at arbitrary timestamps. A sample is only recorded when it differs from the previous sample, and the sample rate is completely irregular due to a large number of conditions. The table is very simple:
id (int)
happened_at (timestamp)
sensor_id (int)
new_value (float)
I'm trying to construct a query that will include a count of all of the samples before the happened_at of a given result row. So given an ultra simple 2 row sample data set:
id|happened_at |sensor_id| new_value
1 |2019-06-07:21:41|134679 | 123.331
2 |2019-06-07:19:00|134679 | 100.009
I'd like the result set to look like this:
happened_at |sensor_id | new_value | sample_count
2019-06-07:21:41|134679 |123.331 |2
2019-06-07:19:00|134679 |100.009 |1
I've tried:
SELECT *,
(SELECT count(sample_history.id) OVER (PARTITION BY score_history.sensor_id
ORDER BY sample_history.happened_at DESC))
FROM sensor_history
ORDER by happened_at DESC
and the duh not going to work.
(SELECT count(*)
FROM sample_history
WHERE sample_history.happened_at <= sample_timestamp)
Insights greatly appreciated.
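Get rid of the SELECT (sub-query) when using the window function. Order the window by happened_at ascending, so that each row counts itself plus every earlier sample, and keep the descending order only for the final result:
SELECT *,
count(*) OVER (PARTITION BY sensor_id ORDER BY happened_at) AS sample_count
FROM sensor_history
ORDER BY happened_at DESC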
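The direction of the ORDER BY inside the window decides which rows each count covers, because the default frame runs from the start of the partition up to the current row. The sketch below tries both directions on the question's two rows. It is an illustration only: Python's built-in sqlite3 module (SQLite 3.25 or later) stands in for Postgres, and the helper name running_counts is made up for this example.

```python
import sqlite3

# Assumption: SQLite (>= 3.25) stands in for Postgres.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sensor_history (id INTEGER PRIMARY KEY, happened_at TEXT, "
    "sensor_id INTEGER, new_value REAL)"
)
conn.executemany(
    "INSERT INTO sensor_history VALUES (?, ?, ?, ?)",
    [
        (1, "2019-06-07 21:41", 134679, 123.331),
        (2, "2019-06-07 19:00", 134679, 100.009),
    ],
)


def running_counts(direction):
    """Return rows newest first, counted with the window ordered ASC or DESC."""
    if direction not in ("ASC", "DESC"):
        raise ValueError("direction must be 'ASC' or 'DESC'")
    sql = f"""
        SELECT happened_at, sensor_id, new_value,
               count(*) OVER (
                   PARTITION BY sensor_id
                   ORDER BY happened_at {direction}
               ) AS sample_count
        FROM sensor_history
        ORDER BY happened_at DESC
    """
    return conn.execute(sql).fetchall()


ascending = running_counts("ASC")
descending = running_counts("DESC")
print(ascending)
print(descending)
```

With the window ordered ascending, the 21:41 row gets a count of 2 and the 19:00 row a count of 1, as in the desired result set. With the window ordered descending, the counts come out the other way round.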

UPDATE in a specific order

So let's say I have a table:
SELECT * from test_table order by name;
----|----
name|ord
----|----
a |4
a |5
b |2
c |3
d |1
And I want to change the ord such that it matches the alphabetized result of the "order by name" clause. My goal, therefore, is:
SELECT * from test_table order by name;
----|----
name|ord
----|----
a |1
a |2
b |3
c |4
d |5
Is there a good way in Postgres to do this? I have a new sequence I can pull from, I'm just not sure how to do this cleanly in-place, or if that's even possible. Or should I just store the results of the selection, then iterate over and select each name, assigning a new ord value to them? (They all have unique IDs, so the repeat shouldn't matter)
You don't need any sequence for this.
The first step is to determine the new data:
SELECT
*
FROM test_table AS test_table_old
LEFT JOIN (
SELECT
*, row_number() OVER (ORDER BY name, ord) AS ord_new -- order inside the window: OVER () alone does not guarantee the numbering order
FROM test_table
ORDER BY name, ord
) AS test_table_new USING (name, ord)
;
Then convert this to an update:
UPDATE test_table SET
ord = test_table_new.ord_new
FROM test_table AS test_table_old
LEFT JOIN (
SELECT
*, row_number() OVER (ORDER BY name, ord) AS ord_new
FROM test_table
ORDER BY name, ord
) AS test_table_new USING (name, ord)
WHERE (test_table.name, test_table.ord) = (test_table_old.name, test_table_old.ord)
;
If you need a new sequence, then replace "row_number() OVER (ORDER BY name, ord)" with "nextval('the_new_sequence_name')".
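Since the question mentions that every row has a unique ID, the renumbering can also be keyed on that ID instead of on (name, ord). The sketch below is an illustration under assumptions: the id column and the helper table new_order are hypothetical names, and Python's built-in sqlite3 module (SQLite 3.25 or later) stands in for Postgres. It computes the new numbers into a temporary table first and then applies them, so the update never reads rows it has already changed.

```python
import sqlite3

# Assumptions: SQLite (>= 3.25) stands in for Postgres; the "id" column and the
# "new_order" helper table are hypothetical names used only for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_table (id INTEGER PRIMARY KEY, name TEXT, ord INTEGER)")
conn.executemany(
    "INSERT INTO test_table VALUES (?, ?, ?)",
    [(1, "a", 4), (2, "a", 5), (3, "b", 2), (4, "c", 3), (5, "d", 1)],
)

# Step 1: compute the new position of every row, ordered by name and then old ord.
conn.execute(
    """
    CREATE TEMP TABLE new_order AS
    SELECT id, row_number() OVER (ORDER BY name, ord) AS ord_new
    FROM test_table
    """
)

# Step 2: copy the new positions back, matching rows by their unique id.
conn.execute(
    """
    UPDATE test_table
    SET ord = (SELECT ord_new FROM new_order WHERE new_order.id = test_table.id)
    """
)

result = conn.execute("SELECT name, ord FROM test_table ORDER BY name, ord").fetchall()
print(result)
```

In Postgres the two steps can be folded into one statement by putting the numbering subquery in the FROM clause of the UPDATE and joining it to the target table on id.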

Getting earliest date by matching two columns, and returning array

I have a query I'm trying to write, but I cannot get the syntax quite right. In the table below, I have a set of dates, each with an id and a parent_id; if an id has no parent, its parent_id is NULL.
I'm trying to get an output of all the children of a parent that have the same date as the parent. As shown in the expected output below, [D#P, Z#Z] would be assigned to A because they have the same date and their parent_id is A, however Q#L would not be assigned to A because its date is not 1/1/2019. Nothing is assigned to B or D because they have no children on their created dates.
I've found some posts on how to do this in Postgres, however because I'm using Redshift some of the operations don't work.
Any help would be appreciated.
|date |id |parent_id |
-------------------------
1/1/2019|A |NULL
1/1/2019|B |NULL
1/1/2019|C |NULL
1/1/2019|D#P |A
1/1/2019|Z#Z |A
1/1/2019|K#H |C
1/2/2019|Q#L |A
1/3/2019|D |NULL
1/4/2019|H#Q |C
Expected Output:
date |id |children
-----------------------
1/1/2019 |A |[D#P, Z#Z]
1/1/2019 |C |[K#H]
Current Work:
SELECT
first_value(case
when parent_id
then date
end)
over (
partition by parent_id
order by date
rows between unbounded preceding and unbounded following)
as first_date)
id,
list_agg(parent_id)
FROM foo
I don't know why I am getting an error when using the LISTAGG aggregate function, so I decided to use SELECT DISTINCT with the LISTAGG window function instead:
WITH input as (
SELECT '1/1/2019' as date, 'A' as id, NULL as parent_id UNION ALL
SELECT '1/1/2019', 'B', NULL UNION ALL
SELECT '1/1/2019', 'C', NULL UNION ALL
SELECT '1/1/2019', 'D#P', 'A' UNION ALL
SELECT '1/1/2019', 'Z#Z', 'A' UNION ALL
SELECT '1/1/2019', 'K#H', 'C' UNION ALL
SELECT '1/2/2019', 'Q#L', 'A' UNION ALL
SELECT '1/3/2019', 'D', NULL UNION ALL
SELECT '1/4/2019', 'H#Q', 'C'
), parents as (
SELECT *
FROM input
WHERE parent_id IS NULL
), children as (
SELECT *
FROM input
WHERE parent_id IS NOT NULL
)
SELECT DISTINCT
parents.date,
parents.id,
listagg(children.id, ',') WITHIN GROUP (ORDER BY children.id) OVER (PARTITION BY parents.id, parents.date) as children
FROM parents JOIN children
ON parents.id = children.parent_id
AND parents.date = children.date
Outputs:
date id children
1/1/2019 A D#P,Z#Z
1/1/2019 C K#H
A solution with GROUP BY and the LISTAGG aggregate function would, to me, be the more natural way of solving your problem:
WITH input as (
[...]
SELECT
parents.date,
parents.id,
listagg(children.id, ',') WITHIN GROUP ( ORDER BY children.id )
FROM parents JOIN children
ON parents.id = children.parent_id
AND parents.date = children.date
group by parents.id, parents.date
Sadly it returns an error which I don't really understand:
[XX000][500310] Amazon Invalid operation: One or more of the used functions must be applied on at least one user created tables. Examples of user table only functions are LISTAGG, MEDIAN, PERCENTILE_CONT, etc; java.lang.RuntimeException: com.amazon.support.exceptions.ErrorException: Amazon Invalid operation: One or more of the used functions must be applied on at least one user created tables. Examples of user table only functions are LISTAGG, MEDIAN, PERCENTILE_CONT, etc;

How to group by in DB2 IBM and get the first item in each group?

I have a table like this:
|sub_account|name|email|
|-----------|----|-----|
// same account and same name: email different
|a1 |n1 |e1 |
|a1 |n1 |e2 |
// same account, name and email
|a2 |n2 |e3 |
|a2 |n2 |e3 |
I would like a query to get a table like this:
|sub_account|name|email|
|-----------|----|-----|
// nothing to do here
|a1 |n1 |e1 |
|a1 |n1 |e2 |
// remove the one that is exactly the same, but leave at least one
|a2 |n2 |e3 |
I've tried:
select sub_account, name, first(email)
from table
group by sub_account, name
but as you know, "first" doesn't exist in DB2; what is the alternative to it?
thanks
select sub_account, name, email
from table
group by sub_account, name, email
I am not sure about DB2, but in SQL Server you can use DISTINCT for this. You could try:
SELECT DISTINCT sub_account, name, email
from TABLE
Create a subquery with the table values + a counter (pos) that gets increased for each row and gets reset to 1 each time a new sub-account+name is reached.
The final query filters out all results from the subquery other than those with pos 1 (i.e. first entries of the group):
select *
from (
select sub_account, name, email,
ROW_NUMBER() OVER (PARTITION BY sub_account, name
ORDER BY email DESC) AS pos
from table
)
where pos = 1
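The DISTINCT and ROW_NUMBER answers return different results on the question's data, which is worth checking before picking one. The sketch below is an illustration only: Python's built-in sqlite3 module (SQLite 3.25 or later) stands in for DB2, and the table name accounts is an assumption, since table itself is a reserved word.

```python
import sqlite3

# Assumptions: SQLite (>= 3.25) stands in for DB2; "accounts" is a made-up table name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (sub_account TEXT, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?, ?)",
    [
        ("a1", "n1", "e1"),
        ("a1", "n1", "e2"),
        ("a2", "n2", "e3"),
        ("a2", "n2", "e3"),
    ],
)

# DISTINCT removes only rows that are identical in all three columns.
deduplicated = conn.execute(
    """
    SELECT DISTINCT sub_account, name, email
    FROM accounts
    ORDER BY sub_account, name, email
    """
).fetchall()

# ROW_NUMBER keeps a single row per (sub_account, name), whatever the email.
first_per_group = conn.execute(
    """
    SELECT sub_account, name, email
    FROM (
        SELECT sub_account, name, email,
               row_number() OVER (
                   PARTITION BY sub_account, name
                   ORDER BY email DESC
               ) AS pos
        FROM accounts
    ) AS numbered
    WHERE pos = 1
    ORDER BY sub_account, name
    """
).fetchall()

print(deduplicated)
print(first_per_group)
```

DISTINCT keeps both a1 rows and collapses the duplicated a2 row, which is the table the question asks for. The ROW_NUMBER version keeps one row per sub_account and name, which answers the title of the question but drops e1.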
I found a way:
SELECT sub_account,
name,
CASE WHEN split_index=0 THEN MyList ELSE SUBSTR(MyList,1,LOCATE('|',MyList)-1) END
FROM (select sub_account, name, LISTAGG(email,'|') as MyList, LOCATE('|',LISTAGG(email,'|')) AS split_index
from TABLE
group by sub_account, name) AS TABLEA
This aggregates the emails of each group into one string, then splits that string and takes the first one.