Heroku dataclip variable substitution in query - postgresql

I want to set some parameters/variables at the top of the query and use them in several places in a complex query on Heroku Dataclips.
Here is a simple example:
WITH vars AS ( SELECT
'2018-01-07' AS calcdate,
12345 AS salary
)
select *
from taxes
where country_alpha3='USA' and year='2018' and active=true
and subdivision_code='US-MEDI'
and local_code is null
and start_date <= DATE(vars.calcdate) and end_date >= DATE(vars.calcdate)
and lower_amount_cents <= vars.salary and upper_amount_cents >= vars.salary;
I saw this style of code in another answer (from 2013) and it is not working in the actual dataclip as of today.
Error: Your query couldn't be updated.
ERROR: missing FROM-clause entry for table "vars"
LINE 12: and start_date <= DATE(vars.calcdate) and end_date >= DATE(v...
^
If the vars.calcdate and vars.salaryare changed to the constants, then the SQL works fine. It is something to do with the vars. syntax or usage, I think.

I found the solution. It was syntax as I supposed. The format of vars.name is not allowed (not certain where I saw that).
WITH vars AS ( SELECT
DATE('2018-01-07') AS calcdate,
12345 AS salary
)
select *
from taxes
where country_alpha3='USA' and year='2018' and active=true
and subdivision_code='US-MEDI'
and local_code is null
and start_date <= (SELECT calcdate from vars) and end_date >= (SELECT calcdate from vars)
and lower_amount_cents <= (SELECT salary from vars) and upper_amount_cents >= (SELECT salary from vars);

Related

ERROR: missing FROM-clause entry for table "max_table"

I want to get the maximum id from the home_history table and filtered home_history by this value. I used it with the operator. where did I mistake it?
with max_table as (select max(id) as max_id from home_history),
current_data as (
select Cast(created_at As date), count(id)
from home_history
where id > (max_table.max_id - 30 * 500000)
and created_at >= CAST((now() + (INTERVAL '-30 day')) AS date)
and home_history.created_at < CAST(now() AS date)
group by CAST(created_at AS date)
order by CAST(created_at As Date)
)
SELECT *
from current_data;
[42P01] ERROR: missing FROM-clause entry for table "max_table"
Position: 201
Even if it is obvious to you, you have to explicitly state in your main query that you select from the CTE.
Think of a CTE as a view defined just for a single query. You'd need a FROM clause to indicate the view you select from.

Pivot table using crosstab and count

I have to display a table like this:
Year
Month
Delivered
Not delivered
Not Received
2021
Jan
10
86
75
2021
Feb
13
36
96
2021
March
49
7
61
2021
Apr
3
21
72
Using raw data generated by this query:
SELECT
year,
TO_CHAR( creation_date, 'Month') AS month,
marking,
COUNT(*) AS count
FROM invoices
GROUP BY 1,2,3
I have tried using crosstab() but I got error:
SELECT * FROM crosstab('
SELECT
year,
TO_CHAR( creation_date, ''Month'') AS month,
marking,
COUNT(*) AS count
FROM invoices
GROUP BY 1,2,3
') AS ct(year text, month text, marking text)
I would prefer to not manually type all marking values because they are a lot.
ERROR: invalid source data SQL statement
DETAIL: The provided SQL must return 3 columns: rowid, category, and values.
1. Static solution with a limited list of marking values :
SELECT year
, TO_CHAR( creation_date, 'Month') AS month
, COUNT(*) FILTER (WHERE marking = 'Delivered') AS Delivered
, COUNT(*) FILTER (WHERE marking = 'Not delivered') AS "Not delivered"
, COUNT(*) FILTER (WHERE marking = 'Not Received') AS "Not Received"
FROM invoices
GROUP BY 1,2
2. Full dynamic solution with a large list of marking values :
This proposal is an alternative solution to the crosstab solution as proposed in A and B.
The proposed solution here just requires a dedicated composite type which can be dynamically created and then it relies on the jsonb type and standard functions :
Starting from your query which counts the number of rows per year, month and marking value :
Using the jsonb_object_agg function, the resulting rows are first
aggregated by year and month into jsonb objects whose jsonb keys
correspond to the marking values and whose jsonb values
correspond to the counts.
the resulting jsonb objects are then converted into records using the jsonb_populate_record function and the dedicated composite type.
First we dynamically create a composite type which corresponds to the ordered list of marking values :
CREATE OR REPLACE PROCEDURE create_composite_type() LANGUAGE plpgsql AS $$
DECLARE
column_list text ;
BEGIN
SELECT string_agg(DISTINCT quote_ident(marking) || ' bigint', ',' ORDER BY quote_ident(marking) || ' bigint' ASC)
INTO column_list
FROM invoices ;
EXECUTE 'DROP TYPE IF EXISTS composite_type' ;
EXECUTE 'CREATE TYPE composite_type AS (' || column_list || ')' ;
END ;
$$ ;
CALL create_composite_type() ;
Then the expected result is provided by the following query :
SELECT a.year
, TO_CHAR(a.year_month, 'Month') AS month
, (jsonb_populate_record( null :: composite_type
, jsonb_object_agg(a.marking, a.count)
)
).*
FROM
( SELECT year
, date_trunc('month', creation_date) AS year_month
, marking
, count(*) AS count
FROM invoices AS v
GROUP BY 1,2,3
) AS a
GROUP BY 1,2
ORDER BY month
Obviously, if the list of marking values may vary in time, then you have to recall the create_composite_type() procedure just before executing the query. If you don't update the composite_type, the query will still work (no error !) but some old marking values may be obsolete (not used anymore), and some new marking values may be missing in the query result (not displayed as columns).
See the full demo in dbfiddle.
You need to generate the crosstab() call dynamically.
But since SQL does not allow dynamic return types, you need a two-step workflow:
Generate query
Execute query
If you are unfamiliar with crosstab(), read this first:
PostgreSQL Crosstab Query
It's odd to generate the month from creation_date, but not the year. To simplify, I use a combined column year_month instead.
Query to generate the crosstab() query:
SELECT format(
$f$SELECT * FROM crosstab(
$q$
SELECT to_char(date_trunc('month', creation_date), 'YYYY_Month') AS year_month
, marking
, COUNT(*) AS ct
FROM invoices
GROUP BY date_trunc('month', creation_date), marking
ORDER BY date_trunc('month', creation_date) -- optional
$q$
, $c$VALUES (%s)$c$
) AS ct(year_month text, %s);
$f$, string_agg(quote_literal(sub.marking), '), (')
, string_agg(quote_ident (sub.marking), ' int, ') || ' int'
)
FROM (SELECT DISTINCT marking FROM invoices ORDER BY 1) sub;
If the table invoices is big with only few distinct values for marking (which seems likely) there are faster ways to get distinct values. See:
Optimize GROUP BY query to retrieve latest row per user
Generates a query of the form:
SELECT * FROM crosstab(
$q$
SELECT to_char(date_trunc('month', creation_date), 'YYYY_Month') AS year_month
, marking
, COUNT(*) AS ct
FROM invoices
GROUP BY date_trunc('month', creation_date), marking
ORDER BY date_trunc('month', creation_date) -- optional
$q$
, $c$VALUES ('Delivered'), ('Not Delivered'), ('Not Received')$c$
) AS ct(year_month text, "Delivered" int, "Not Delivered" int, "Not Received" int);
The simplified query does not need "extra columns. See:
Pivot on Multiple Columns using Tablefunc
Note the use date_trunc('month', creation_date) in GROUP BY and ORDER BY. This produces a valid sort order, and faster, too. See:
Cumulative sum of values by month, filling in for missing months
How to get rows by max(date) group by Year-Month in Postgres?
Also note the use of dollar-quotes to avoid quoting hell. See:
Insert text with single quotes in PostgreSQL
Months without entries don't show up in the result, and no markings for an existing month show as NULL. You can adapt either if need be. See:
Join a count query on generate_series() and retrieve Null values as '0'
Then execute the generated query.
db<>fiddle here (reusing
Edouard's fiddle, kudos!)
See:
Execute a dynamic crosstab query
In psql
In psql you can use \qexec to immediately execute the generated query. See:
Simulate CREATE DATABASE IF NOT EXISTS for PostgreSQL?
In Postgres 9.6 or later, you can also use the meta-command \crosstabview instead of crosstab():
test=> SELECT to_char(date_trunc('month', creation_date), 'YYYY_Month') AS year_month
test-> , marking
test-> , COUNT(*) AS count
test-> FROM invoices
test-> GROUP BY date_trunc('month', creation_date), 2
test-> ORDER BY date_trunc('month', creation_date)\crosstabview
year_month | Not Received | Delivered | Not Delivered
----------------+--------------+-----------+---------------
2020_January | 1 | 1 | 1
2020_March | | 2 | 2
2021_January | 1 | 1 | 2
2021_February | 1 | |
2021_March | | 1 |
2021_August | 2 | 1 | 1
2022_August | | 2 |
2022_November | 1 | 2 | 3
2022_December | 2 | |
(9 rows)
Note that \crosstabview - unlike crosstab() - does not support "extra" columns. If you insist on separate year and month columns, you need crosstab().
See:
How do I generate a pivoted CROSS JOIN where the resulting table definition is unknown?

Using min/max values from a CTE in a later query, instead of using a subquery in Postgres

I've got a remedial question about pulling results out of a CTE in a later part of the query. For the example code, below are the relevant, stripped down tables:
CREATE TABLE print_job (
created_dts timestamp not null default now(),
status text not null
);
CREATE TABLE calendar_day (
date_actual date not null
);
In the current setup, there are gaps in the dates in the print_job data, and we would like to have a gapless result. For example, there are 87 days from the first to last date in the table, and only 77 days in there have data. We've already got a calendar_day dimension table to join with to get the 87 rows for the 87-day range. It's easy enough to figure out the min and max dates in the data with a subquery or in a CTE, but I don't know how to use those values from a CTE. I've got a full query below, but here are the relevant fragments with comments:
-- Get the date range from the data.
date_range AS (
select min(created_dts::date) AS start_date,
max(created_dts::date) AS end_date
from print_job),
-- This CTE does not work because it doesn't know what date_range is.
complete_date_series_using_cte AS (
select actual_date
from calendar_day
where actual_date >= date_range.start_date
and actual_date <= date_range.end_date
),
-- Subqueries are fine, because the FROM is specified in the subquery condition directly.
complete_date_series_using_subquery AS (
select date_actual
from calendar_day
where date_actual >= (select min(created_dts::date) from print_job)
and date_actual <= (select max(created_dts::date) from print_job)
)
I run into this regularly, and finally figured I'd ask. I've hunted around already for an answer, but I'm not clear how to summarize it well. And while there's nothing wrong with the subqueries in this case, I've got other situations where a CTE is nicer/more readable.
If it helps, I've listed the complete query below.
-- Get some counts and give them names.
WITH
daily_status AS (
select created_dts::date as created_date,
count(*) AS daily_total,
count(*) FILTER (where status = 'Error') AS status_error,
count(*) FILTER (where status = 'Processing') AS status_processing,
count(*) FILTER (where status = 'Aborted') AS status_aborted,
count(*) FILTER (where status = 'Done') AS status_done
from print_job
group by created_dts::date
),
-- Get the date range from the data.
date_range AS (
select min(created_dts::date) AS start_date,
max(created_dts::date) AS end_date
from print_job),
-- There are gaps in the data, and we want a row for dates with no results.
-- Could use generate_series on a timestamp & convert that to dates. But,
-- in our case, we've already got dimension tables for days. All that's needed
-- here is the actual date.
-- This CTE does not work because it doesn't know what date_range is.
-- complete_date_series_using_cte AS (
-- select actual_date
--
-- from calendar_day
--
-- where actual_date >= date_range.start_date
-- and actual_date <= date_range.end_date
-- ),
complete_date_series_using_subquery AS (
select date_actual
from calendar_day
where date_actual >= (select min(created_dts::date) from print_job)
and date_actual <= (select max(created_dts::date) from print_job)
)
-- The final query joins the complete date series with whatever data is in the print_job table daily summaries.
select date_actual,
coalesce(daily_total,0) AS total,
coalesce(status_error,0) AS errors,
coalesce(status_processing,0) AS processing,
coalesce(status_aborted,0) AS aborted,
coalesce(status_done,0) AS done
from complete_date_series_using_subquery
left join daily_status
on daily_status.created_date =
complete_date_series_using_subquery.date_actual
order by date_actual
I said it was a remedial question....I remembered where I'd seen this done before:
https://tapoueh.org/manual-post/2014/02/postgresql-histogram/
In my example, I need to list the CTE in the table list. That's obvious in retrospect, and I realize that I automatically don't think to do that as I'm habitually avoiding CROSS JOIN. The fragment below shows the slight change needed:
WITH
date_range AS (
select min(created_dts)::date as start_date,
max(created_dts)::date as end_date
from print_job
),
complete_date_series AS (
select date_actual
from calendar_day, date_range
where date_actual >= date_range.start_date
and date_actual <= date_range.end_date
),

How to store a temporary value in a Postgresql query and then use it multiple times?

I have this simple query:
SELECT COUNT(*)
FROM CompressorAlerts
WHERE CAST('2020-09-20' as timestamp) IS NULL OR faultTimestamp >= CAST('2020-09-20' as timestamp)
(I hardcoded 2020-09-20 ... it is really a value I am getting from somewhere else, irrelevant to this question, but I hardcoded it for simplicity).
As you can see, I am repeating twice CAST('2020-09-20' as timestamp). I want to avoid that, so I tried doing this:
WITH myDate as (SELECT CAST('2020-09-20' as timestamp))
SELECT COUNT(*)
FROM CompressorAlerts
WHERE myDate IS NULL OR faultTimestamp >= myDate
However, I get this error: column "mydate" does not exist.
How can I reference the myDate value I defined in the WITH clause? Am I doing something wrong?
mydate is the name of the CTE, not the name of the column expression inside it. Additionally you need to include the CTE in the FROM part of the query.
WITH params (mydate) as (
values (timestamp '2020-09-20')
)
SELECT COUNT(*)
FROM CompressorAlerts, params
WHERE mydate IS NULL
OR faultTimestamp >= mydate

Postgresql - get closest datetime row relative to given datetime value

I have a postgres table with a unique datetime field.
I would like to use/create a function that takes as argument a datetime value and returns the row id having the closest datetime relative (but not equal) to the passed datetime value. A second argument could specify before or after the passed value.
Ideally, some combination of native datetime functions could handle this requirement. Otherwise it'll have to be a custom function.
Question: What are methods for querying relative datetime over a collection of rows?
select id, passed_ts - ts_column difference
from t
where
passed_ts > ts_column and positive_interval
or
passed_ts < ts_column and not positive_interval
order by abs(extract(epoch from passed_ts - ts_column))
limit 1
passed_ts is the timestamp parameter and positive_interval is a boolean parameter. If true only rows where the timestamp column is lower then the passed timestamp. If false the inverse.
use simply -.
Assuming you have a table with attributes Key, Attr and T (timestamp with or without timezone):
you can search with
select min(T - TimeValue) from Table where (T - TimeValue) > 0;
this will give you the main difference. You can combine this value with a join to the same table to get the tuple you are interested in:
select * from (select *, T - TimeValue as diff from Table) as T1 NATURAL JOIN
( select min(T - TimeValue) as diff from Table where (T - TimeValue) > 0) as T2;
that should do it
--dmg
You want the first row of a select statement producing all the rows below (or above) the given datetime in descending (or ascending) order.
Pseudo code for the function body:
SELECT id
FROM table
WHERE IF(#above, datecol < #param, datecol > #param)
ORDER BY IF (#above. datecol ASC, datecol DESC)
LIMIT 1
However, this does not work: one cannot condition the ordering direction.
The second idea is to do both queries, and select afterwards:
SELECT *
FROM (
(
SELECT 'below' AS dir, id
FROM table
WHERE datecol < #param
ORDER BY datecol DESC
LIMIT 1
) UNION (
SELECT 'above' AS dir, id
FROM table
WHERE datecol > #param
ORDER BY datecol ASC
LIMIT 1)
) AS t
WHERE dir = #dir
That should be pretty fast with an index on the datetime column.
-- test rig
DROP SCHEMA tmp CASCADE;
CREATE SCHEMA tmp ;
SET search_path=tmp;
CREATE TABLE lutser
( dt timestamp NOT NULL PRIMARY KEY
);
-- populate it
INSERT INTO lutser(dt)
SELECT gs
FROM generate_series('2013-04-30', '2013-05-01', '1 min'::interval) gs
;
DELETE FROM lutser WHERE random() < 0.9;
--
-- The query:
WITH xyz AS (
SELECT dt AS hh
, LAG (dt) OVER (ORDER by dt ) AS ll
FROM lutser
)
SELECT *
FROM xyz bb
WHERE '2013-04-30 12:00' BETWEEN bb.ll AND bb.hh
;
Result:
NOTICE: drop cascades to table tmp.lutser
DROP SCHEMA
CREATE SCHEMA
SET
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "lutser_pkey" for table "lutser"
CREATE TABLE
INSERT 0 1441
DELETE 1288
hh | ll
---------------------+---------------------
2013-04-30 12:02:00 | 2013-04-30 11:50:00
(1 row)
Wrapping it into a function is left as an excercise for the reader
UPDATE: here is a second one with the sandwiched-not-exists-trick (TM):
SELECT lo.dt AS ll
FROM lutser lo
JOIN lutser hi ON hi.dt > lo.dt
AND NOT EXISTS (
SELECT * FROM lutser nx
WHERE nx.dt < hi.dt
AND nx.dt > lo.dt
)
WHERE '2013-04-30 12:00' BETWEEN lo.dt AND hi.dt
;
You have to join the table to itself with the where condition looking for the smallest nonzero (negative or positive) interval between the base table row's datetime and the joined table row's datetime. It would be good to have an index on that datetime column.
P.S. You could also look for the max() of the previous or the min() of the subsequent.
Try something like:
SELECT *
FROM your_table
WHERE (dt_time > argument_time and search_above = 'true')
OR (dt_time < argument_time and search_above = 'false')
ORDER BY CASE WHEN search_above = 'true'
THEN dt_time - argument_time
ELSE argument_time - dt_time
END
LIMIT 1;