How do I join rows in Postgres? - postgresql

I have a table of reservations in Postgres and I want to generate a table that has a row per month and shows earnings (and a lot of other things left out here for simplicity) for each year in a column.
I can do it by hard coding years, but there must be a better way. How do I do this to scale to x number of years?
Thanks!
CREATE TABLE reservations (
checkin date NOT NULL,
earnings integer
-- other data fields omitted
);
CREATE OR REPLACE FUNCTION compareYears()
RETURNS TABLE(month double precision, earnings_2016 bigint, earnings_2017 bigint)
AS
$$
BEGIN
RETURN QUERY
with
r2017 as (SELECT
date_part('month', reservations.checkin) AS month,
sum(reservations.earnings) as earnings_2017
FROM cd.reservations
WHERE date_part('year', reservations.checkin) = 2017
GROUP by date_part('month', reservations.checkin)),
r2016 as (SELECT
date_part('month', reservations.checkin) AS month,
sum(reservations.earnings) as earnings_2016
FROM cd.reservations
WHERE date_part('year', reservations.checkin) = 2016
GROUP by date_part('month', reservations.checkin))
SELECT r2017.month, r2016.earnings_2016, r2017.earnings_2017
FROM r2016, r2017
WHERE r2017.month = r2016.month;
END;
$$
LANGUAGE 'plpgsql' VOLATILE;

Have a look at crosstab. I think it's exactly what you need.
Here's an example
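The linked example is not reproduced here. As a rough sketch of what a crosstab call could look like for the reservations table in the question, assuming the tablefunc extension is available and using the years 2016 and 2017 from the question: the first query supplies (month, year, earnings) rows and the second lists the years to turn into columns.

```sql
-- Assumption: the tablefunc extension can be installed in this database.
CREATE EXTENSION IF NOT EXISTS tablefunc;

SELECT *
FROM crosstab(
    -- source query: row name (month), category (year), value (earnings)
    $$ SELECT date_part('month', checkin)::int AS month,
              date_part('year', checkin)::int  AS year,
              sum(earnings)                    AS earnings
       FROM reservations
       GROUP BY 1, 2
       ORDER BY 1, 2 $$,
    -- category query: one row per year that should become a column
    $$ VALUES (2016), (2017) $$
) AS ct(month int, earnings_2016 bigint, earnings_2017 bigint);
```

Note that the column list after AS still has to name every year, so this replaces the per-year CTEs rather than removing all hard-coding. Months with no reservations in a given year come back as NULL in that year's column.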

Related

Trigger taking time to insert data in postgres (column count 300)

I have created a trigger, and inserting multiple records into the table has become very slow.
Inserting 1 or 2 records works fine, but with more than 1000 records it is not fast; the query has been running for 2 hours.
The table below shows only 15 columns. My actual table has 300 columns.
Is there any other way to insert multiple records into the table with the trigger?
Table
create table patients (
id serial,
name character varying (50),
daily varchar (8),
month varchar (6),
quarter varchar (6),
registration_date timestamp,
age integer,
address text,
country text,
city text,
phone_number integer,
Education text,
Occupation text,
Marital_Status text,
"E-mail" text
);
trigger function
CREATE OR REPLACE FUNCTION update_data_after_insert_data_into_patients()
RETURNS trigger AS
$$BEGIN
update patients t1
set quarter=t2.quarter
from (SELECT (extract(year from registration_date)::text || 'Q' || extract(quarter from registration_date)::text) as quarter,registration_date
from patients) t2 where t1.registration_date =t2.registration_date;
update patients t1
set month=t2.month
from (select (extract(year from registration_date)::text || '' || to_char(registration_date,'MM')) as month,registration_date
from patients) t2 where t1.registration_date =t2.registration_date;
update patients t1
set daily=t2.daily
from (select extract(year from registration_date) || '' ||to_char(registration_date,'MM') || '' || to_char(registration_date,'DD') as daily,registration_date
from patients) t2 where t1.registration_date =t2.registration_date;
RETURN new;
END;
$$ LANGUAGE plpgsql;
Trigger definition
create TRIGGER trigger_update_data_after_insert_patients
AFTER insert ON patients
FOR EACH ROW
EXECUTE PROCEDURE update_data_after_insert_data_into_patients();
insert multiple records into patients table
INSERT INTO public.patients
("name", daily, "month", quarter, registration_date, age, address, country, city, phone_number, education, occupation, marital_status, "E-mail")
VALUES('Adam', '20221215', '202212', '2022Q4', '2022-08-17 19:01:10-08', 24, '', '', '', 1245578, '', '', '', '');
select statement
select * from patients;
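Your three UPDATE statements never reference the inserted row: each one joins patients to itself on registration_date, so every single inserted row rewrites every row in the table that has a registration_date, three times, just to calculate those generated columns.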
You can do this more efficiently by assigning the generated values to the NEW record in a BEFORE trigger.
CREATE OR REPLACE FUNCTION update_data_after_insert_data_into_patients()
RETURNS trigger AS
$$
BEGIN
new.quarter := to_char(new.registration_date, 'yyyy"Q"q');
new.month := to_char(new.registration_date, 'yyyymm');
new.daily := to_char(new.registration_date, 'yyyymmdd');
RETURN new;
END;
$$
LANGUAGE plpgsql;
create TRIGGER trigger_update_data_after_insert_patients
BEFORE insert ON patients
FOR EACH ROW
EXECUTE PROCEDURE update_data_after_insert_data_into_patients();
However, I don't see the need to store these calculated values when you can easily format registration_date when retrieving the data. I would get rid of those columns and the trigger, and create a VIEW that does the formatting.
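A minimal sketch of such a view, assuming the patients table from the question. The view name patients_formatted is made up for illustration, and only a few of the table's columns are listed:

```sql
-- Sketch only: patients_formatted is a hypothetical name.
-- The three derived values are computed on read instead of being stored.
CREATE VIEW patients_formatted AS
SELECT id,
       name,
       registration_date,
       to_char(registration_date, 'yyyymmdd') AS daily,
       to_char(registration_date, 'yyyymm')   AS month,
       to_char(registration_date, 'yyyy"Q"q') AS quarter
FROM patients;

SELECT * FROM patients_formatted;
```

With this in place, inserts touch only the new row, and the formatted values can never drift out of sync with registration_date.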

postgresql filter users by age range

Is there a better way of doing this?
Basically, I have a users table, and one of the columns is birth_date (date).
I am supposed to filter by age range, meaning, I will get a range like 18-24.
This will be passed to a function in a jsonb parameter, as an array of 2 integers.
So I have done the following
create or replace function my_filter_function(
p_search_parameters jsonb
)
returns TABLE(
user_id bigint,
birth_date date,
age interval,
years double precision
)
security definer
language plpgsql
as
$$
begin
return query
select u.user_id, u.birth_date, age(u.birth_date), date_part('year', age(u.birth_date))
from users u
where u.birth_date is not null
and ( (p_search_parameters#>>'{age,0}') is null or u.birth_date <= (now() - ((p_search_parameters#>>'{age,0}')::integer * interval '1 year'))::date)
and ( (p_search_parameters#>>'{age,1}') is null or u.birth_date >= (now() - ((p_search_parameters#>>'{age,1}')::integer * interval '1 year'))::date)
;
end;
$$;
-- this is just a little helper function to better post and explain the question
This seems to be doing the job, but was hoping to find other ways of doing this while still getting a jsonb parameter, array of 2 integers
Any ideas?
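One possible variation, shown as a standalone query rather than a function: extract the two bounds once, then compare birth_date against them. This is only a sketch. The params CTE stands in for p_search_parameters, and the sample value is made up. It also treats the upper bound as inclusive of the whole year, so a user aged 24 years and 6 months still matches 18-24, which the >= comparison in the function above would exclude.

```sql
with params as (
    -- hypothetical sample input standing in for p_search_parameters
    select '{"age": [18, 24]}'::jsonb as p
),
bounds as (
    -- a missing or null array element yields NULL, meaning "no limit"
    select (p #>> '{age,0}')::int as min_age,
           (p #>> '{age,1}')::int as max_age
    from params
)
select u.user_id,
       u.birth_date,
       age(u.birth_date)                    as age,
       date_part('year', age(u.birth_date)) as years
from users u
cross join bounds b
where u.birth_date is not null
  -- at least min_age: born on or before today minus min_age years
  and (b.min_age is null
       or u.birth_date <= current_date - make_interval(years => b.min_age))
  -- at most max_age: has not yet reached the (max_age + 1)th birthday
  and (b.max_age is null
       or u.birth_date > current_date - make_interval(years => b.max_age + 1));
```

Keeping birth_date alone on one side of each comparison means an index on users(birth_date), if one exists, remains usable.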

Weighted Random Selection

Please. I have two tables with the most common first and last names. Each table has basically two fields:
Tables
CREATE TABLE "common_first_name" (
"first_name" text PRIMARY KEY, --The text representing the name
"ratio" numeric NOT NULL, -- the % of how many times it occurs compared to the other names.
"inserted_at" timestamp WITH time zone DEFAULT timezone('utc'::text, now()) NOT NULL,
"updated_at" timestamp WITH time zone DEFAULT timezone('utc'::text, now()) NOT NULL
);
CREATE TABLE "common_last_name" (
"last_name" text PRIMARY KEY, --The text representing the name
"ratio" numeric NOT NULL, -- the % of how many times it occurs compared to the other names.
"inserted_at" timestamp WITH time zone DEFAULT timezone('utc'::text, now()) NOT NULL,
"updated_at" timestamp WITH time zone DEFAULT timezone('utc'::text, now()) NOT NULL
);
P.S: The TOP 1 name occurs only ~ 1.8% of the time. The tables have 1000 rows each.
Function (Pseudo, not READY)
CREATE OR REPLACE FUNCTION create_sample_data(p_number_of_records INT)
RETURNS VOID
AS $$
DECLARE
SUM_OF_WEIGHTS CONSTANT INT := 100;
BEGIN
FOR i IN 1..coalesce(p_number_of_records, 0) LOOP
--Get the random first and last name but taking in consideration their probability (RATIO)round(random()*SUM_OF_WEIGHTS);
--create_person (random_first_name || ' ' || random_last_name);
END LOOP;
END
$$
LANGUAGE plpgsql VOLATILE;
P.S.: The sum of all ratios for each name (per table) sums up to 100%.
I want to run a function N times and get a name and a surname to create sample data... both tables have 1000 rows each.
The sample size can be anywhere from 1000 full names to 1000000 names, so if there is a "fast" way of doing this random weighted function, even better.
Any suggestion of how to do it in PL/PGSQL?
I am using PG 13.3 on SUPABASE.IO.
Thanks
Given the small input dataset, it's straightforward to do this in pure SQL. Use CTEs to build lower & upper bound columns for each row in each of the common_FOO_name tables, then use generate_series() to generate sets of random numbers. Join everything together, and use the random value between the bounds as the WHERE clause.
with first_names_weighted as (
select first_name,
sum(ratio) over (order by first_name) - ratio as lower_bound,
sum(ratio) over (order by first_name) as upper_bound
from common_first_name
),
last_names_weighted as (
select last_name,
sum(ratio) over (order by last_name) - ratio as lower_bound,
sum(ratio) over (order by last_name) as upper_bound
from common_last_name
),
randoms as (
select random() * (select sum(ratio) from common_first_name) as f_random,
random() * (select sum(ratio) from common_last_name) as l_random
from generate_series(1, 32)
)
select r, first_name, last_name
from randoms r
cross join first_names_weighted f
cross join last_names_weighted l
where f.lower_bound <= r.f_random and r.f_random < f.upper_bound
and l.lower_bound <= r.l_random and r.l_random < l.upper_bound;
Change the value passed to generate_series() to control how many names to generate. If it's important that it be a function, you can just use a LANGUAGE SQL function definition to parameterize that number:
https://www.db-fiddle.com/f/mmGQRhCP2W1yfhZTm1yXu5/3
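The contents of the fiddle are not reproduced here. As a minimal sketch of what such a function might look like, assuming the two tables from the question: the function name random_full_names is made up, the body is the query above with the generate_series() bound replaced by the parameter, and the upper bound of each range is treated as exclusive so that a draw can never match two names.

```sql
-- Sketch only: random_full_names is a hypothetical name.
create or replace function random_full_names(p_number_of_records int)
returns table (first_name text, last_name text)
language sql
volatile
as $$
    with first_names_weighted as (
        select f.first_name,
               sum(f.ratio) over (order by f.first_name) - f.ratio as lower_bound,
               sum(f.ratio) over (order by f.first_name)           as upper_bound
        from common_first_name f
    ),
    last_names_weighted as (
        select l.last_name,
               sum(l.ratio) over (order by l.last_name) - l.ratio as lower_bound,
               sum(l.ratio) over (order by l.last_name)           as upper_bound
        from common_last_name l
    ),
    randoms as (
        -- one pair of random draws per requested name
        select random() * (select sum(ratio) from common_first_name) as f_random,
               random() * (select sum(ratio) from common_last_name)  as l_random
        from generate_series(1, coalesce(p_number_of_records, 0))
    )
    select f.first_name, l.last_name
    from randoms r
    join first_names_weighted f
      on f.lower_bound <= r.f_random and r.f_random < f.upper_bound
    join last_names_weighted l
      on l.lower_bound <= r.l_random and r.l_random < l.upper_bound;
$$;

-- Example call: 1000 random full names.
select first_name || ' ' || last_name as full_name
from random_full_names(1000);
```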

Why won't Postgres filter my date range partitions?

I have a table that uses declarative partitioning (w00t!) to partition tables by date range - one year in my case.
When I query against the table - SELECT * FROM tbl WHERE date > date '2016-01-01', it works exactly as intended; only tables containing newer data are scanned.
When I specify a date using variables or functions (CURRENT_DATE, NOW(), etc), EXPLAIN says it scans every partition.
Things that work as intended:
SELECT * FROM tbl WHERE date > date '2016-01-01'
--
SELECT * FROM tbl WHERE date > '2016-01-01'::date
Things that scan all partitions unnecessarily:
SELECT * FROM tbl WHERE date > CURRENT_DATE
--
SELECT * FROM tbl WHERE date > NOW()
--
SELECT * FROM tbl WHERE date > (NOW() - interval '365 days')::date
--
SELECT * FROM tbl WHERE date > (SELECT (NOW()::date - 365)::date AS d)
-- Even CTEs are no dice:
WITH a AS (SELECT CURRENT_DATE AS d)
SELECT * FROM tbl, a WHERE date > a.d
-- Same with JOINs
SELECT w.*
FROM (SELECT CURRENT_DATE - 365 AS d) a
LEFT JOIN tbl w ON w.date > a.d
..etc
I get the same behavior with other comparison operators - =, <, etc.
The docs say I don't need an idx on the field (which I don't anyways). I added one just in case and it did not help.
Why is this happening, and what can I do to stop it (preferably without adding complication to a simple query)?
Thanks to JustMe for answering this- see the comments on the OP.
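The issue lies with when NOW() and CURRENT_TIMESTAMP are evaluated in relation to FROM: they are STABLE rather than IMMUTABLE functions, so their value is not yet known when the query is planned, and the planner can only rule out partitions by comparing the partition bounds against constants. It's the same issue you see when you try to filter in a join, ala WHERE join_table.a > from_table.b.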
Supposing today is Jan 1, 1970, these queries
SELECT * FROM my_stuff WHERE date > NOW()::date;
--
SELECT * FROM my_stuff WHERE date > '1970-01-01'::date;
will necessarily produce an identical resultset but will not necessarily be evaluated in an identical way.
That's why this is happening and unfortunately, there doesn't seem to be a simple way to stop it. A function seems to be the best-ish option:
CREATE OR REPLACE FUNCTION myfunc()
RETURNS setof tbl
LANGUAGE 'plpgsql'
AS $$
DECLARE
n date := CURRENT_DATE - 365;
BEGIN
RETURN query EXECUTE $a$
SELECT * FROM tbl
WHERE date > $1;
$a$ using n;
END $$;
You can test this by changing RETURNS setof tbl to RETURNS setof text and SELECT... to EXPLAIN SELECT...
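Spelled out, that test variant might look like the following sketch. The name myfunc_explain is made up so that the original function is left untouched; tbl and its date column are the ones from the question.

```sql
-- Sketch only: myfunc_explain is a hypothetical name.
CREATE OR REPLACE FUNCTION myfunc_explain()
RETURNS setof text
LANGUAGE plpgsql
AS $$
DECLARE
    n date := CURRENT_DATE - 365;
BEGIN
    -- Same dynamic query as above, wrapped in EXPLAIN so the plan
    -- is returned (one line of text per row) instead of the data.
    RETURN QUERY EXECUTE $a$
        EXPLAIN SELECT * FROM tbl
        WHERE date > $1
    $a$ USING n;
END $$;

SELECT * FROM myfunc_explain();
```

If the approach works on your version, the returned plan should list only the partitions that can contain rows newer than n.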

Extracting the number of days from a calculated interval

I am trying to get a query like the following one to work:
SELECT EXTRACT(DAY FROM INTERVAL to_date - from_date) FROM histories;
In the referenced table, to_date and from_date are of type timestamp without time zone. A regular query like
SELECT to_date - from_date FROM histories;
gives me interval results such as '65 days 04:58:09.99'. But using this expression inside the first query gives me an error: invalid input syntax for type interval. I've tried various quotations and even nesting the query without luck. Can this be done?
SELECT EXTRACT(DAY FROM INTERVAL to_date - from_date) FROM histories;
This makes no sense. INTERVAL xxx is syntax for interval literals. So INTERVAL from_date is a syntax error, since from_date isn't a literal. If your code really looks more like INTERVAL '2012-02-01' then that's going to fail, because 2012-02-01 is not valid syntax for an INTERVAL.
The INTERVAL keyword here is just noise. I suspect you misunderstood an example from the documentation. Remove it and the expression will be fine.
I'm guessing you're trying to get the number of days between two dates represented as timestamp or timestamptz.
If so, either cast both to date:
SELECT to_date::date - from_date::date FROM histories;
or get the interval, then extract the day component:
SELECT extract(day from to_date - from_date) FROM histories;
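The two forms do not always agree, because the first counts calendar dates crossed while the second takes the day field of the elapsed interval. A small self-contained illustration, using made-up sample values in place of the histories table:

```sql
-- Sample values are hypothetical; the two timestamps are 26 hours apart.
SELECT t.to_date::date - t.from_date::date       AS calendar_days,  -- 2
       t.to_date - t.from_date                   AS elapsed,        -- 1 day 02:00:00
       extract(day FROM t.to_date - t.from_date) AS whole_days      -- 1
FROM (VALUES (timestamp '2012-01-01 23:00',
              timestamp '2012-01-03 01:00')) AS t(from_date, to_date);
```

Pick the date cast when you want "how many midnights apart", and extract(day ...) when you want "how many full 24-hour periods elapsed".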
This example demonstrates creating a table with a trigger that stores the difference between stop_time and start_time in DDD HH24:MI:SS format, where DDD stands for the number of days ...
DROP TABLE IF EXISTS benchmarks ;
SELECT 'create the "benchmarks" table'
;
CREATE TABLE benchmarks (
guid UUID NOT NULL DEFAULT gen_random_uuid()
, id bigint UNIQUE NOT NULL DEFAULT cast (to_char(current_timestamp, 'YYMMDDHH12MISS') as bigint)
, git_hash char (8) NULL DEFAULT 'hash...'
, start_time timestamp NOT NULL DEFAULT DATE_TRUNC('second', NOW())
, stop_time timestamp NOT NULL DEFAULT DATE_TRUNC('second', NOW())
, diff_time varchar (20) NOT NULL DEFAULT 'HH:MI:SS'
, update_time timestamp DEFAULT DATE_TRUNC('second', NOW())
, CONSTRAINT pk_benchmarks_guid PRIMARY KEY (guid)
) WITH (
OIDS=FALSE
);
create unique index idx_uniq_benchmarks_id on benchmarks (id);
-- START trigger trg_benchmarks_upsrt_diff_time
-- hrt = human readable time
CREATE OR REPLACE FUNCTION fnc_benchmarks_upsrt_diff_time()
RETURNS TRIGGER
AS $$
BEGIN
-- NEW.diff_time = age(NEW.stop_time::timestamp-NEW.start_time::timestamp);
NEW.diff_time = to_char(NEW.stop_time-NEW.start_time, 'DDD HH24:MI:SS');
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER trg_benchmarks_upsrt_diff_time
BEFORE INSERT OR UPDATE ON benchmarks
FOR EACH ROW EXECUTE PROCEDURE fnc_benchmarks_upsrt_diff_time();
--
-- STOP trigger trg_benchmarks_upsrt_diff_time
Just remove the keyword INTERVAL:
SELECT EXTRACT(DAY FROM to_date - from_date) FROM histories;