How to find nearest entries before and after a value in PostgreSQL

Similar to this question: How to find the first nearest value up and down in SQL?
I have a Postgres DB Table named prices structured as:
id | column_1 | column_2 | column_3 | date_col
1  | 1.5      | 1.7      | 1.6      | 1234560000
2  | 0.9      | 1.1      | 1.0      | 1234570000
3  | 11.5     | 23.5     | 17.5     | 1234580000
4  | 8.3      | 12.3     | 10.3     | 1234600000
I'm trying to select either the row that exactly matches an input date:
Example #1: input query for date_col = 1234580000 would return...
id | column_1 | column_2 | column_3 | date_col
3  | 11.5     | 23.5     | 17.5     | 1234580000
or if that date does not exist, then retrieve the entries immediately before and after:
Example #2: input query for date_col = 1234590000 would return...
id | column_1 | column_2 | column_3 | date_col
3  | 11.5     | 23.5     | 17.5     | 1234580000
4  | 8.3      | 12.3     | 10.3     | 1234600000
I tried adapting the code from the similar question, but I am a bit stuck. As a workaround, I run the original query for the exact date and check in Python whether the DB returned anything. If not, I send a second query over a broad date range and iterate over the returned rows in Python. If that still returns nothing, I widen the range and try again. I know this is the wrong way to do it, but I can't work out the right query.
Current code that works when the entry does exist, but does not work when the entry does not exist:
SELECT *
FROM prices,
     (SELECT id, next_val, last_val
      FROM (SELECT t.*,
                   LEAD(t.id, 1) OVER (ORDER BY t.date_col) as next_val,
                   LAG(t.id, 1) OVER (ORDER BY t.date_col) as last_val
            FROM prices AS t) AS s
      WHERE 1234580000 IN (s.date_col, s.next_val, s.last_val)) AS x
WHERE prices.id = x.id OR prices.id = x.next_val OR prices.id = x.last_val
Based on the accepted answer, this worked like a charm:
SELECT *
FROM (SELECT * FROM prices WHERE prices.date_col <= 1234580000 ORDER BY prices.date_col DESC LIMIT 1) AS a
UNION
(SELECT * FROM prices WHERE prices.date_col >= 1234580000 ORDER BY prices.date_col ASC LIMIT 1)

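I guess the simplest solution would be a UNION. Each branch needs its own parentheses so that ORDER BY and LIMIT apply to that branch rather than to the combined result:
(SELECT *
 FROM prices
 WHERE date_col <= 1234580000
 ORDER BY date_col DESC
 LIMIT 1)
UNION
(SELECT *
 FROM prices
 WHERE date_col >= 1234580000
 ORDER BY date_col ASC
 LIMIT 1)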
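The UNION idea can be tried without a PostgreSQL server. The following is a minimal sketch in Python, assuming SQLite (through the standard sqlite3 module) as a stand-in for Postgres. The helper name nearest_rows is hypothetical, and the rows are the sample data from the question. SQLite does not accept ORDER BY and LIMIT directly inside a UNION branch, so each branch is wrapped in a subquery here.

```python
import sqlite3

# Sample rows from the question; SQLite only stands in for Postgres here.
ROWS = [
    (1, 1.5, 1.7, 1.6, 1234560000),
    (2, 0.9, 1.1, 1.0, 1234570000),
    (3, 11.5, 23.5, 17.5, 1234580000),
    (4, 8.3, 12.3, 10.3, 1234600000),
]

# One branch finds the closest row at or before the date, the other the
# closest row at or after it. UNION removes the duplicate on an exact match.
QUERY = """
SELECT * FROM (SELECT * FROM prices WHERE date_col <= :d
               ORDER BY date_col DESC LIMIT 1)
UNION
SELECT * FROM (SELECT * FROM prices WHERE date_col >= :d
               ORDER BY date_col ASC LIMIT 1)
ORDER BY date_col
"""

def nearest_rows(conn, date):
    """Hypothetical helper: exact match, or the neighbours before and after."""
    return conn.execute(QUERY, {"d": date}).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE prices (id INTEGER PRIMARY KEY, column_1 REAL, "
    "column_2 REAL, column_3 REAL, date_col INTEGER)"
)
conn.executemany("INSERT INTO prices VALUES (?, ?, ?, ?, ?)", ROWS)

exact = nearest_rows(conn, 1234580000)    # date exists: one row
between = nearest_rows(conn, 1234590000)  # date missing: two neighbours
print(exact)
print(between)
```

With the sample data, the first call returns only the row with id 3, and the second returns the rows with ids 3 and 4.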

Related

Postgres 9.3 count rows matching a column relative to row's timestamp

I've used WINDOW functions before but only when working with data that has a fixed cadence/interval. I am likely missing something simple in aggregation but I've never had a scenario where I'm not working with fixed intervals.
I have a table that records samples at arbitrary timestamps. A sample is only recorded when it is a delta from the previous sample, and the sample rate is completely irregular due to a large number of conditions. The table is very simple:
id (int)
happened_at (timestamp)
sensor_id (int)
new_value (float)
I'm trying to construct a query that will include a count of all of the samples before the happened_at of a given result row. So given an ultra simple 2 row sample data set:
id|happened_at |sensor_id| new_value
1 |2019-06-07:21:41|134679 | 123.331
2 |2019-06-07:19:00|134679 | 100.009
I'd like the result set to look like this:
happened_at |sensor_id | new_value | sample_count
2019-06-07:21:41|134679 |123.331 |2
2019-06-07:19:00|134679 |100.009 |1
I've tried:
SELECT *,
(SELECT count(sample_history.id) OVER (PARTITION BY score_history.sensor_id
ORDER BY sample_history.happened_at DESC))
FROM sensor_history
ORDER by happened_at DESC
and this, which was obviously not going to work:
(SELECT count(*)
FROM sample_history
WHERE sample_history.happened_at <= sample_timestamp)
Insights greatly appreciated.

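Get rid of the SELECT (sub-query) when using the window function. Order the window by happened_at ascending so that each row counts itself and every earlier sample:
SELECT *,
count(*) OVER (PARTITION BY sensor_id ORDER BY happened_at) AS sample_count
FROM sensor_history
ORDER BY happened_at DESC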
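To check how a running count behaves on the two sample rows, here is a small sketch in Python. It assumes SQLite 3.25 or newer (for window functions) as a stand-in for Postgres, and it stores the timestamps as text. Note that the window itself is ordered ascending, so each row counts itself plus every earlier sample for the same sensor, while the outer ORDER BY only controls the display order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sensor_history "
    "(id INTEGER, happened_at TEXT, sensor_id INTEGER, new_value REAL)"
)
conn.executemany(
    "INSERT INTO sensor_history VALUES (?, ?, ?, ?)",
    [
        (1, "2019-06-07 21:41", 134679, 123.331),
        (2, "2019-06-07 19:00", 134679, 100.009),
    ],
)

# Running count per sensor: the default window frame ends at the current
# row, so an ascending window order counts this sample and all earlier ones.
counts = conn.execute("""
    SELECT happened_at, sensor_id, new_value,
           count(*) OVER (PARTITION BY sensor_id
                          ORDER BY happened_at) AS sample_count
    FROM sensor_history
    ORDER BY happened_at DESC
""").fetchall()

for row in counts:
    print(row)
```

The later sample gets a sample_count of 2 and the earlier one gets 1, which matches the result set asked for in the question.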

PostgreSQL Crosstab issues / "Return and SQL tuple descriptions are incompatible"

Good afternoon, I am using PostgreSQL version 9.2 and I'm trying to use the crosstab function to transpose two columns of a table so that I can later join it to a different SELECT query.
I have installed the tablefunc extension.
However, I keep getting this "Return and SQL tuple descriptions are incompatible" error, which seems to be caused by type casts.
I don't need them to be a specific type.
My original SELECT query is this
SELECT inventoryid, ttype, tamount
FROM inventorytesting
Which gives me the following result:
inventoryid ttype tamount
2451530088940460 7 0.2
2451530088940460 2 0.5
2451530088940460 8 0.1
2451530088940460 1 15.7
8751530077940461 7 0.7
8751530077940461 2 0.2
8751530077940461 8 1.1
8751530077940461 1 19.2
and my goal is to get it like:
inventoryid 7 2 8 1
8751530077940461 0.7 0.2 1.1 19.2
2451530088940460 0.2 0.5 0.1 15.7
The 'ttype' field has 49 different values such as "7","2","8","1" which are fixed.
The 'tamount' field varies its values depending on the 'inventoryid' field but there will always be 49 of them, even if its value is zero. It will never be "null".
I have tried a few variations that I could find on the internet, which boil down to this:
SELECT *
FROM crosstab (
$$SELECT inventoryid, ttype, tamount
FROM inventorytesting
WHERE inventoryid = '2451530088940460'
ORDER BY inventoryid, ttype$$
)
AS ct("inventoryid" text,"ttype" smallint,"tamount" numeric)
The fieldtypes on the inventorytesting table are
select column_name, data_type from information_schema.columns
where table_name = 'inventorytesting'
Results:
column_name data_type
id bigint
ttype smallint
tamount numeric
tunit text
tlessthan smallint
plantid text
sessiontime bigint
deleted smallint
inventoryid text
docdata text
docname text
labid bigint
Any pointers would be great.

The resulting table definition has to contain the table structure you are expecting - the pivoted one - and not the structure of the given one:
SELECT *
FROM crosstab(
$$SELECT inventoryid, ttype, tamount
FROM inventorytesting
WHERE inventoryid = '2451530088940460'
ORDER BY inventoryid, ttype$$
)
AS ct("inventoryid" text,"type1" numeric,"type2" numeric,"type7" numeric,"type8" numeric)
Additionally, there is no need to use the crosstab function. You can achieve a pivot by simply using a standard CASE expression:
SELECT
inventoryid,
SUM(CASE WHEN ttype = 1 THEN tamount END) AS type1,
SUM(CASE WHEN ttype = 2 THEN tamount END) AS type2,
SUM(CASE WHEN ttype = 7 THEN tamount END) AS type7,
SUM(CASE WHEN ttype = 8 THEN tamount END) AS type8
FROM
inventorytesting
GROUP BY 1
If you were on 9.4 or higher, you could use the FILTER clause:
SELECT
inventoryid,
SUM(tamount) FILTER (WHERE ttype = 1) AS type1,
SUM(tamount) FILTER (WHERE ttype = 2) AS type2,
SUM(tamount) FILTER (WHERE ttype = 7) AS type7,
SUM(tamount) FILTER (WHERE ttype = 8) AS type8
FROM
inventorytesting
GROUP BY 1
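Since the real table has 49 ttype values, writing one SUM(CASE ...) column per type by hand gets tedious. The sketch below generates those columns in Python and runs the pivot against the sample rows. It assumes SQLite as a stand-in for Postgres, and the ttypes list is an assumption covering only the four types shown in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inventorytesting (inventoryid TEXT, ttype INTEGER, tamount REAL)"
)
conn.executemany(
    "INSERT INTO inventorytesting VALUES (?, ?, ?)",
    [
        ("2451530088940460", 7, 0.2), ("2451530088940460", 2, 0.5),
        ("2451530088940460", 8, 0.1), ("2451530088940460", 1, 15.7),
        ("8751530077940461", 7, 0.7), ("8751530077940461", 2, 0.2),
        ("8751530077940461", 8, 1.1), ("8751530077940461", 1, 19.2),
    ],
)

# Assumption: only the four types from the sample; the real list has 49.
ttypes = [7, 2, 8, 1]

# One conditional aggregate per type. The values interpolated here are
# integers from the fixed list above, never user input.
cols = ", ".join(
    f"SUM(CASE WHEN ttype = {t} THEN tamount END) AS type{t}" for t in ttypes
)
sql = (
    f"SELECT inventoryid, {cols} "
    "FROM inventorytesting GROUP BY inventoryid ORDER BY inventoryid"
)
pivot = conn.execute(sql).fetchall()

for row in pivot:
    print(row)
```

Each result row holds the inventoryid followed by the amounts for types 7, 2, 8 and 1, in the column order requested in the question.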

With the crosstab, you define the actual result table (basically the result of the pivot). The input query defines three columns which are then processed as:
the grouping column, whose values become the result rows
the pivot column, whose values determine the result columns
the value to place in each pivot column
In your case, the crosstab therefore needs to be defined as:
ct(
"inventoryid" text,
"tamount_1" numeric,
"tamount_2" numeric,
"tamount_3" numeric,
...
)
Each column then corresponds to one value of ttype, in the order defined by the inner query's ORDER BY.
One caveat with this form of crosstab: it fills the output columns from left to right, so if a ttype is missing for some inventoryid (e.g. a value is returned for 4 but not for 3), the values that follow shift one column to the left. If you need consistent output, make sure your inner query returns a row for every ttype, with a NULL value where necessary (e.g. via LEFT JOIN).

sql oracle question about joins or subquery

select *
from amazon_shipment, customer
where amazon_shipment.customer_id = customer.customer_id
and amazon_shipment.customer_id in
(select top(1) amazon_shipment.customer_id
from amazon_shipment
group by amazon_shipment.customer_id
order by count(*) desc);
I am trying to select all the customers with the most orders; however, I get an error:
FROM keyword not found where expected

TOP(1) isn't available in Oracle.
In Oracle 11gR2 and lower, you can use WHERE ROWNUM < 2. ROWNUM is assigned before ORDER BY is applied, so sort in a subquery first:
select *
from (select *
      from EMPLOYEES
      order by SALARY desc)
where rownum < 2;
In Oracle 12c and higher, you can use FETCH FIRST 1 ROWS ONLY
select *
from EMPLOYEES
order by SALARY desc
fetch first 1 rows only;

Postgres : Need distinct records count

I have a table with duplicate entries and the objective is to get the distinct entries based on the latest time stamp.
In my case 'serial_no' will have duplicate entries but I select unique entries based on the latest time stamp.
Below query is giving me the unique results with the latest time stamp.
But my concern is I need to get the total of unique entries.
For example assume my table has 40 entries overall. With the below query I am able to get 20 unique rows based on the serial number.
But the 'total' is returned as 40 instead of 20.
Any help on this pls?
SELECT
*
FROM
(
SELECT
DISTINCT ON (serial_no) id,
serial_no,
name,
timestamp,
COUNT(*) OVER() as total
FROM
product_info
INNER JOIN my.account ON id = accountid
WHERE
lower(name) = 'hello'
ORDER BY
serial_no,
timestamp DESC OFFSET 0
LIMIT
10
) AS my_info
ORDER BY
serial_no asc
The product_info table initially has this data:
serial_no name timestamp
11212 pulp12 2018-06-01 20:00:01
11213 mango 2018-06-01 17:00:01
11214 grapes 2018-06-02 04:00:01
11215 orange 2018-06-02 07:05:30
11212 pulp12 2018-06-03 14:00:01
11213 mango 2018-06-03 13:00:00
After the DISTINCT query I got all unique results based on the latest timestamp:
serial_no name timestamp total
11212 pulp12 2018-06-03 14:00:01 6
11213 mango 2018-06-03 13:00:00 6
11214 grapes 2018-06-02 04:00:01 6
11215 orange 2018-06-02 07:05:30 6
But total is appearing as 6. I wanted the total to be 4 since it has only 4 unique entries.
I am not sure how to modify my existing query to get this desired result.

Postgres supports COUNT(DISTINCT column_name), so if I have understood your request, using that instead of COUNT(*) will work, and you can drop the OVER.

What you could do is move the window function to a higher-level SELECT statement. This is because window functions are evaluated before the DISTINCT ON and LIMIT clauses are applied. Also, you cannot use the DISTINCT keyword within window functions; it has not been implemented (as of Postgres 9.6).
SELECT
*,
COUNT(*) OVER() as total -- here
FROM
(
SELECT
DISTINCT ON (serial_no) id,
serial_no,
name,
timestamp
FROM
product_info
INNER JOIN my.account ON id = accountid
WHERE
lower(name) = 'hello'
ORDER BY
serial_no,
timestamp DESC
LIMIT
10
) AS my_info
Additionally, OFFSET 0 is not required there, and the extra outer sort is superfluous. I've removed both.
Another way would be to include a computed column in the select clause, but this would not be as fast because it would require one more scan of the table. This assumes that the total should count only the rows in your result set, not rows that are stored in the table but filtered out.
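The effect of counting in the outer query can be sketched in Python against the sample data. This assumes SQLite 3.25 or newer as a stand-in for Postgres. SQLite has no DISTINCT ON, so the de-duplication is swapped for a ROW_NUMBER() filter, which is a different technique from the query above. The point it illustrates is the same: COUNT(*) OVER () only sees rows that survive the de-duplication step. The join to my.account and the name filter are left out because the question does not show that table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product_info (serial_no INTEGER, name TEXT, timestamp TEXT)"
)
conn.executemany(
    "INSERT INTO product_info VALUES (?, ?, ?)",
    [
        (11212, "pulp12", "2018-06-01 20:00:01"),
        (11213, "mango", "2018-06-01 17:00:01"),
        (11214, "grapes", "2018-06-02 04:00:01"),
        (11215, "orange", "2018-06-02 07:05:30"),
        (11212, "pulp12", "2018-06-03 14:00:01"),
        (11213, "mango", "2018-06-03 13:00:00"),
    ],
)

# Inner query: number the rows of each serial_no, newest first.
# Outer query: keep the newest row per serial_no, then count what is left.
latest = conn.execute("""
    SELECT serial_no, name, timestamp,
           COUNT(*) OVER () AS total
    FROM (
        SELECT serial_no, name, timestamp,
               ROW_NUMBER() OVER (PARTITION BY serial_no
                                  ORDER BY timestamp DESC) AS rn
        FROM product_info
    ) AS ranked
    WHERE rn = 1
    ORDER BY serial_no
""").fetchall()

for row in latest:
    print(row)
```

With the six sample rows this yields four rows, each with a total of 4 rather than 6.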

select count(*), serial_no from product_info group by serial_no
will give you the number of duplicates for each serial number
The most mindless way of incorporating that information would be to join in a sub query
SELECT
*
FROM
(
SELECT
DISTINCT ON (serial_no) id,
serial_no,
name,
timestamp,
COUNT(*) OVER() as total
FROM
product_info
INNER JOIN my.account ON id = accountid
WHERE
lower(name) = 'hello'
ORDER BY
serial_no,
timestamp DESC OFFSET 0
LIMIT
10
) AS my_info
join (select count(*) as counts, serial_no from product_info group by serial_no) as X
on X.serial_no = my_info.serial_no
ORDER BY
my_info.serial_no asc

PostgreSQL: getting ordinal rank (row index? ) efficiently

I have a table like so:
id dollars dollars_rank points points_rank
1 20 1 35 1
2 18 2 30 3
3 10 3 33 2
I want a query that updates the table's rank columns (dollars_rank and points_rank) to set the rank for the given ID, which is just the row's index for that ID sorted by the relevant column in a descending order. How best to do this in PostgreSQL?

The window function dense_rank() is what you need - or maybe rank(). The UPDATE could look like this:
UPDATE tbl
SET dollars_rank = r.d_rnk
, points_rank = r.p_rnk
FROM (
SELECT id
, dense_rank() OVER (ORDER BY dollars DESC NULLS LAST) AS d_rnk
, dense_rank() OVER (ORDER BY points DESC NULLS LAST) AS p_rnk
FROM tbl
) r
WHERE tbl.id = r.id;
NULLS LAST is only relevant if the involved columns can be NULL:
Sort by column ASC, but NULL values first?
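As a quick check of the ranking logic on the sample table, here is a sketch in Python. It assumes SQLite 3.25 or newer as a stand-in for Postgres. To stay compatible with older SQLite builds it does not use UPDATE ... FROM: the ranks are computed with dense_rank() in a SELECT and then written back with executemany, which is a variation on the single-statement UPDATE above. NULLS LAST is omitted because the sample contains no NULLs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl (id INTEGER PRIMARY KEY, dollars INTEGER, "
    "dollars_rank INTEGER, points INTEGER, points_rank INTEGER)"
)
conn.executemany(
    "INSERT INTO tbl (id, dollars, points) VALUES (?, ?, ?)",
    [(1, 20, 35), (2, 18, 30), (3, 10, 33)],
)

# Compute both ranks in one pass; the column order matches the
# placeholders of the UPDATE statement below.
ranks = conn.execute("""
    SELECT dense_rank() OVER (ORDER BY dollars DESC) AS d_rnk,
           dense_rank() OVER (ORDER BY points DESC)  AS p_rnk,
           id
    FROM tbl
""").fetchall()

# Write the ranks back, one UPDATE per row.
conn.executemany(
    "UPDATE tbl SET dollars_rank = ?, points_rank = ? WHERE id = ?", ranks
)

result = conn.execute(
    "SELECT id, dollars_rank, points_rank FROM tbl ORDER BY id"
).fetchall()
print(result)
```

The stored ranks match the dollars_rank and points_rank columns shown in the question.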
