Query a history table to find state on a given date in PostgreSQL

I have created a history table that is populated by triggers on another "live" table. I now want to be able to see how it looked on a given date. I am able to query a single product using a where clause which gives me the desired output for a single product.
SELECT * FROM test
WHERE productid = 1
AND updated < '2020-02-15'
ORDER BY updated DESC
LIMIT 1
But how do I get the last updated value before my given date (mid-Feb in this example) for each product in the table?
A simple version of my table looks like this:
productid  amount  updated
1          5       01/01/2020
1          6       01/02/2020
1          7       01/03/2020
2          13      01/01/2020
2          14      01/02/2020
2          15      01/04/2020
and my desired outcome is:
productid  amount  updated
1          6       01/02/2020
2          14      01/02/2020
Many thanks

You can use distinct on:
select distinct on (productid) t.*
from test t
where updated < date '2020-02-15'
order by productid, updated desc
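SQLite has no DISTINCT ON, but the same "latest row per product before a date" pick can be emulated with a ROW_NUMBER() window, which makes for an easy sanity check from Python. The table below recreates the question's sample with ISO dates:

```python
import sqlite3

# Recreate the question's sample data (ISO dates; dd/mm/yyyy in the post).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE test (productid INTEGER, amount INTEGER, updated TEXT);
INSERT INTO test VALUES
  (1, 5,  '2020-01-01'), (1, 6,  '2020-02-01'), (1, 7,  '2020-03-01'),
  (2, 13, '2020-01-01'), (2, 14, '2020-02-01'), (2, 15, '2020-04-01');
""")

# ROW_NUMBER() numbers each product's rows newest-first among those before
# the cutoff; rn = 1 keeps the same row DISTINCT ON (productid) would.
rows = conn.execute("""
SELECT productid, amount, updated FROM (
  SELECT t.*,
         ROW_NUMBER() OVER (PARTITION BY productid
                            ORDER BY updated DESC) AS rn
  FROM test t
  WHERE updated < '2020-02-15'
) WHERE rn = 1
ORDER BY productid
""").fetchall()
print(rows)  # [(1, 6, '2020-02-01'), (2, 14, '2020-02-01')]
```

This matches the desired outcome table: amount 6 for product 1 and amount 14 for product 2, both last updated 1 Feb 2020.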


How to make a query to get the last rows matching a list with Postgres?

let's imagine a simple table:
name varchar,
ts timestamp,
val int,
UNIQUE (name, ts)
I have data pushed to the database as a batch. Every row in that batch has the same timestamp.
I would like to do a query where for a list of names I would get all the rows from the last batch.
For example, if I have:
ts          name    val
2020-01-01  joe     3
2020-01-01  eric    5
2020-01-01  amelia  9
2020-01-01  marcel  2
2020-01-01  erika   3
2020-01-02  joe     6
2020-01-02  amelia  8
2020-01-02  marcel  9
2020-01-02  erika   5
I would like to be able to pass to the query: [joe, eric, amelia] and only get data from the latest batch (2020-01-02).
The output should be:
ts          name    val
2020-01-02  joe     6
2020-01-02  amelia  8
So I was thinking about doing a query to know what's the latest timestamp and then do a query requiring that timestamp. Is there a way to do it in a single query?
Also, how can I pass a list of names in this scenario? (I'm a beginner at SQL)
Replace table with your table's name:
SELECT * FROM table WHERE ts = (SELECT max(ts) FROM table)
If you want only certain names add:
AND name IN ('joe', 'amelia')
Please note: within a transaction, the function now() always returns the same value, regardless of how long the transaction runs.
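The asker also wondered how to pass a list of names. A common approach from application code is to build one placeholder per name and bind the list as parameters; sketched here against SQLite (Postgres drivers offer equivalent parameter binding):

```python
import sqlite3

# The batches table from the question, in SQLite.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE batches (ts TEXT, name TEXT, val INTEGER, UNIQUE (name, ts));
INSERT INTO batches VALUES
  ('2020-01-01','joe',3),    ('2020-01-01','eric',5),
  ('2020-01-01','amelia',9), ('2020-01-01','marcel',2),
  ('2020-01-01','erika',3),  ('2020-01-02','joe',6),
  ('2020-01-02','amelia',8), ('2020-01-02','marcel',9),
  ('2020-01-02','erika',5);
""")

# One "?" placeholder per name, bound as parameters (never string-pasted).
names = ['joe', 'eric', 'amelia']
placeholders = ','.join('?' * len(names))
rows = conn.execute(f"""
SELECT ts, name, val FROM batches
WHERE ts = (SELECT max(ts) FROM batches)
  AND name IN ({placeholders})
ORDER BY name
""", names).fetchall()
print(rows)  # [('2020-01-02', 'amelia', 8), ('2020-01-02', 'joe', 6)]
```

eric has no row in the latest batch, so only joe and amelia come back, exactly as in the desired output.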

Merge selected group keys in KDB (Q) group by query

I have a query that essentially does counting by group key in KDB, in which I want to treat some of the groups as one for the purpose of this query. A simplified description of what I'm trying to do would be to count orders by customer in a month, where I have a couple of customers in the database that are actually subsidiaries of another customer, and I want to combine the counts of the subsidiaries with their parent organisation. The real scenario is much more complicated than that, and without getting into unnecessary detail, suffice it to say that I can't just group by customer and merge the counts after the query is executed - I need the "by" clause of my query to do the merging directly.
In SQL, I would do something like this:
select case when customer_id = 1 then 2 when customer_id = 3 then 4 else customer_id end as customer_id,
       count(*) as order_count
from orders
group by case when customer_id = 1 then 2 when customer_id = 3 then 4 else customer_id end
In the above example, customer 1 is a subsidiary of customer 2, customer 3 is a subsidiary of customer 4 and every other customer is treated normally
Let's say the equivalent code in Q (without the manipulation of group keys) is:
select order_count:count i by customer_id from orders
How would I put in the equivalent select case statement to manipulate the group key? I tried this, but got a rank error:
select order_count:count i by $[customer_id=1;2;customer_id=3;4;customer_id] from orders
I'm terrible at Q so I'm probably making a very simple mistake. Any advice greatly appreciated.
One approach might be to have a dictionary of subsidiaries and use a lookup/re-map in your by clause:
q)dict:1 3!2 4
q)show t:([] order:1+til 10;customer:1+10?6)
order customer
--------------
1     1
2     1
3     6
4     2
5     3
6     4
7     5
8     5
9     3
10    5
q)select order_count:count i by customer^dict[customer] from t
customer| order_count
--------| -----------
2       | 3
4       | 3
5       | 3
6       | 1
You will lose some information about who actually owns the orders, though; you'll only know the counts at the parent level.
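For readers without a q session, the same lookup-and-fill idea can be sketched in Python: the dict plays the role of dict:1 3!2 4, and dict.get(c, c) mirrors the ^ fill (remap subsidiaries to parents, pass everything else through). The customer ids below are copied from the example table t above:

```python
from collections import Counter

# dict:1 3!2 4 from the q answer: subsidiary -> parent.
parent = {1: 2, 3: 4}

# Customer id of each order, copied from the q example table t.
orders = [1, 1, 6, 2, 3, 4, 5, 5, 3, 5]

# parent.get(c, c) mirrors q's fill (^): remap if a parent exists,
# otherwise keep the customer id unchanged, then count per key.
counts = Counter(parent.get(c, c) for c in orders)
print(sorted(counts.items()))  # [(2, 3), (4, 3), (5, 3), (6, 1)]
```

The counts match the q result: customers 1 and 3 are folded into 2 and 4 respectively before counting.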

CASE WHEN with COLLECT_SET

I have a toy table:
hive> SELECT * FROM ds.forgerock;
OK
forgerock.id  forgerock.productname  forgerock.description
1             OpenIDM                Platform for building enterprise provisioning solutions
2             OpenAM                 Full-featured access management
3             OpenDJ                 Robust LDAP server for Java
4             OpenDJ                 desc2
4             OpenDJ                 desc2
Time taken: 0.083 seconds, Fetched: 5 row(s)
I am trying to get a table like:
id  flag
1   0
2   0
3   1
4   1
I am using the toy table to iterate and develop working code.
SELECT id, CASE WHEN "OpenDJ" IN COLLECT_SET(productname) THEN 1 ELSE 0 END AS flag
FROM ds.forgerock
GROUP BY id;
Note that in the toy data set, every id only has one distinct value, so COLLECT_SET doesn't seem necessary. However, given the actual data set actually has more than one distinct value, what I am trying to do will make more sense.
Use max() for flag aggregation by id:
SELECT id, max(CASE WHEN productname='OpenDJ' THEN 1 ELSE 0 END) AS flag
FROM ds.forgerock
GROUP BY id;
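The max(CASE ...) trick is plain SQL, so it can be sanity-checked outside Hive; here it runs against the toy data in SQLite:

```python
import sqlite3

# The toy table from the question (id 4 appears twice on purpose).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE forgerock (id INTEGER, productname TEXT, description TEXT);
INSERT INTO forgerock VALUES
  (1, 'OpenIDM', 'Platform for building enterprise provisioning solutions'),
  (2, 'OpenAM',  'Full-featured access management'),
  (3, 'OpenDJ',  'Robust LDAP server for Java'),
  (4, 'OpenDJ',  'desc2'),
  (4, 'OpenDJ',  'desc2');
""")

# max() over the per-row 0/1 flag yields 1 if any row in the group matches.
rows = conn.execute("""
SELECT id, max(CASE WHEN productname = 'OpenDJ' THEN 1 ELSE 0 END) AS flag
FROM forgerock
GROUP BY id
ORDER BY id
""").fetchall()
print(rows)  # [(1, 0), (2, 0), (3, 1), (4, 1)]
```

Unlike the COLLECT_SET attempt, this needs no array handling: the CASE flags each row, and max() aggregates the flags per id.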

Count number of days by ignoring times

I am trying to calculate the number of days that an event category has occurred in T-SQL in SSMS 2008. How do I write this expression?
This value is stored as a datetime, but I want to count the day portion only. For example, if my values were:
2013-01-05 19:20:00.000
2013-01-06 17:20:00.000
2013-01-06 18:20:00.000
2013-01-06 19:20:00.000
2013-01-03 16:15:00.000
2013-01-04 12:55:00.000
Then although there are 6 unique records listed above, I would want to count this as only 4, since there are 3 records on 1/6/2013. Make sense?
This is what I'm trying now that doesn't work:
select count(s.date_value)
from table_A s
Cast the datetime value as a date. Also if you want only unique values, use DISTINCT:
SELECT COUNT(DISTINCT CAST(date_value AS date)) FROM table_A
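A quick check of the distinct-day count, run in SQLite from Python; SQLite's date() function stands in for the T-SQL CAST(... AS date), and the millisecond parts of the question's values are trimmed:

```python
import sqlite3

# The question's datetime values, milliseconds trimmed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_A (date_value TEXT);
INSERT INTO table_A VALUES
  ('2013-01-05 19:20:00'), ('2013-01-06 17:20:00'),
  ('2013-01-06 18:20:00'), ('2013-01-06 19:20:00'),
  ('2013-01-03 16:15:00'), ('2013-01-04 12:55:00');
""")

# date() drops the time portion, so DISTINCT collapses same-day rows.
(days,) = conn.execute(
    "SELECT COUNT(DISTINCT date(date_value)) FROM table_A").fetchone()
print(days)  # 4
```

The three 2013-01-06 rows collapse into one day, giving 4 rather than 6.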

Select unique values sorted by date

I am trying to solve an interesting problem. I have a table that has, among other data, these columns (dates in this sample are shown in European format - dd/mm/yyyy):
n_place_id  dt_visit_date
(integer)   (date)
==========  =============
1           10/02/2012
3           11/03/2012
4           11/05/2012
13          14/06/2012
3           04/10/2012
3           03/11/2012
5           05/09/2012
13          18/08/2012
Basically, each place may be visited multiple times - and the dates may be in the past (completed visits) or in the future (planned visits). For the sake of simplicity, today's visits are part of planned future visits.
Now, I need to run a select on this table, which would pull unique place IDs from this table (without date) sorted in the following order:
Future visits go before past visits
Future visits take precedence in sorting over past visits for the same place
For future visits, the earliest date must take precedence in sorting for the same place
For past visits, the latest date must take precedence in sorting for the same place.
For example, for the sample data shown above, the result I need is:
5 (earliest future visit)
3 (next future visit into the future)
13 (latest past visit)
4 (previous past visit)
1 (earlier visit in the past)
Now, I can achieve the desired sorting using case when in the order by clause like so:
select
n_place_id
from
place_visit
order by
(case when dt_visit_date >= now()::date then 1 else 2 end),
(case when dt_visit_date >= now()::date then 1 else -1 end) * extract(epoch from dt_visit_date)
This sort of does what I need, but it contains repeated IDs, whereas I need unique place IDs. If I try to add distinct to the select statement, Postgres complains that the ORDER BY expressions must appear in the select list; but then the distinct won't be sensible any more, as I have dates in there.
Somehow I feel that there should be a way to get the result I need in one select statement, but I can't get my head around how to do it.
If this can't be done, then, of course, I'll have to do the whole thing in the code, but I'd prefer to have this in one SQL statement.
P.S. I am not worried about the performance, because the dataset I will be sorting is not large. After the where clause will be applied, it will rarely contain more than about 10 records.
With DISTINCT ON you can easily show additional columns of the row with the resulting n_place_id:
SELECT n_place_id, dt_visit_date
FROM (
   SELECT DISTINCT ON (n_place_id) *
         ,dt_visit_date < now()::date AS prio        -- future first
         ,@ (now()::date - dt_visit_date) AS diff    -- closest first
   FROM   place_visit
   ORDER  BY n_place_id, prio, diff
   ) x
ORDER BY prio, diff;
Effectively I pick the row with the earliest future date (including "today") per n_place_id - or latest date in the past, failing that.
Then the resulting unique rows are sorted by the same criteria.
FALSE sorts before TRUE
The absolute-value operator @ helps to sort "closest first"
Result:
n_place_id | dt_visit_date
------------+--------------
5 | 2012-09-05
3 | 2012-10-04
13 | 2012-08-18
4 | 2012-05-11
1 | 2012-02-10
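The DISTINCT ON pick can be emulated in SQLite with ROW_NUMBER(), which makes the answer easy to verify from Python. The date '2012-09-01' is pinned below as a stand-in for now()::date so the sample splits into future and past the same way as in the question:

```python
import sqlite3

# Sample data in ISO dates (dd/mm/yyyy in the question).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE place_visit (n_place_id INTEGER, dt_visit_date TEXT);
INSERT INTO place_visit VALUES
  (1, '2012-02-10'), (3, '2012-03-11'), (4, '2012-05-11'), (13, '2012-06-14'),
  (3, '2012-10-04'), (3, '2012-11-03'), (5, '2012-09-05'), (13, '2012-08-18');
""")

# '2012-09-01' stands in for now()::date.  Per place, ROW_NUMBER() keeps the
# row DISTINCT ON would: future rows (prio 0) beat past rows, and the
# smallest day distance wins within each; the outer ORDER BY then sorts the
# surviving rows by the same two keys.
rows = conn.execute("""
SELECT n_place_id FROM (
  SELECT n_place_id,
         dt_visit_date < '2012-09-01' AS prio,
         abs(julianday(dt_visit_date) - julianday('2012-09-01')) AS diff,
         ROW_NUMBER() OVER (
           PARTITION BY n_place_id
           ORDER BY dt_visit_date < '2012-09-01',
                    abs(julianday(dt_visit_date) - julianday('2012-09-01'))
         ) AS rn
  FROM place_visit
) WHERE rn = 1
ORDER BY prio, diff
""").fetchall()
print([r[0] for r in rows])  # [5, 3, 13, 4, 1]
```

This reproduces the ordering the asker wanted: future visits first (5, then 3), then past visits latest-first (13, 4, 1).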
Try this
select n_place_id
from
(
  select *,
    extract(epoch from (dt_visit_date - now())) as seconds,
    case when dt_visit_date >= now()::date then 0 else 1 end as past
  from place_visit
) v
group by n_place_id
order by
  min(past),                                     -- places with a future visit first
  min(case when past = 0 then seconds end),      -- among those, by earliest future visit
  max(case when past = 1 then seconds end) desc  -- past-only places: latest visit first