PostgreSQL filter users by age range

Is there a better way of doing this?
Basically, I have a users table, and one of the columns is birth_date (date).
I am supposed to filter by age range, meaning I will get a range like 18-24.
This will be passed to a function in a jsonb parameter, as an array of 2 integers.
So I have done the following:
create or replace function my_filter_function(
p_search_parameters jsonb
)
returns TABLE(
user_id bigint,
birth_date date,
age interval,
years double precision
)
security definer
language plpgsql
as
$$
begin
return query
select u.user_id, u.birth_date, age(u.birth_date), date_part('year', age(u.birth_date))
from users u
where u.birth_date is not null
and ( (p_search_parameters#>>'{age,0}') is null or u.birth_date <= (now() - ((p_search_parameters#>>'{age,0}')::integer * interval '1 year'))::date)
and ( (p_search_parameters#>>'{age,1}') is null or u.birth_date >= (now() - ((p_search_parameters#>>'{age,1}')::integer * interval '1 year'))::date)
;
end;
$$;
-- this is just a little helper function to better post and explain the question
This seems to be doing the job, but I was hoping to find other ways of doing this while still getting a jsonb parameter holding an array of 2 integers.
Any ideas?
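One possible variation, offered as a sketch rather than a definitive answer: the same filter as a plain SQL function, still taking the jsonb parameter with a two-integer age array. It assumes the upper bound is meant to be inclusive, so that for 18-24 someone aged 24 years and 11 months still matches; the function above compares against exactly 24 years ago and therefore drops everyone past their 24th birthday. The name my_filter_function_v2 and the reduced column list are assumptions for illustration, and security definer is left out.

```sql
-- Sketch only: my_filter_function_v2 is a hypothetical name.
create or replace function my_filter_function_v2(p_search_parameters jsonb)
returns table (user_id bigint, birth_date date, years integer)
language sql
stable
as $$
    select u.user_id,
           u.birth_date,
           date_part('year', age(u.birth_date))::integer
    from users u
    where u.birth_date is not null
      -- lower bound: at least N years old
      and ( (p_search_parameters #>> '{age,0}') is null
            or u.birth_date <= (current_date
                 - make_interval(years => (p_search_parameters #>> '{age,0}')::integer))::date )
      -- upper bound (assumed inclusive): has not yet turned M + 1
      and ( (p_search_parameters #>> '{age,1}') is null
            or u.birth_date > (current_date
                 - make_interval(years => (p_search_parameters #>> '{age,1}')::integer + 1))::date );
$$;

-- Example call with the range 18-24:
select * from my_filter_function_v2('{"age": [18, 24]}');
```

As in the original, the comparison stays on the bare birth_date column rather than on age(birth_date), so an index on birth_date can still be considered by the planner.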

Related

Postgres on conflict

CREATE FUNCTION get_or_create_id(scheduleid integer,member_id character varying, user_id integer,role_id integer, _appointment_date timestamp without time zone,active boolean) RETURNS INT AS
$$
WITH get AS (
SELECT memid FROM member_appointment_details d WHERE appointment_date=_appointment_date
), new AS (
INSERT INTO member_appointment_details (schedule_id,memberid,userid,roleid,appointment_date,active)
VALUES (scheduleid,member_id,user_id,role_id,_appointment_date,active)
ON CONFLICT (appointment_date) DO Nothing
RETURNING memid
)
SELECT memid FROM get
UNION ALL
SELECT memid FROM new
$$
LANGUAGE sql;
Table 1, appointment_details, has columns:
id integer NOT NULL GENERATED ALWAYS AS IDENTITY
appointment_date timestamp without time zone
Table 2, appointment_schedule, has columns:
id integer NOT NULL GENERATED ALWAYS AS IDENTITY
time_duration integer
Now I want the conflict check on appointment_date from table 1 to take a duration into account, given dynamically (15 minutes, 20 minutes, etc.).
Can I use a join in this function, and how do I get the output I want?
Currently, if an appointment exists at 10:33:00 and I pass 10:34:00, it is accepted. I want the slot from 10:33:00 to 10:33:00 + 15 minutes = 10:48:00 to be blocked, i.e. a new appointment should only be accepted from the existing time plus the given duration onward.
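ON CONFLICT (appointment_date) only detects exact duplicates of the timestamp, so it cannot express "this slot overlaps an existing one". One way this is sometimes handled, sketched here with a different technique than the function above, is an exclusion constraint over a time range combined with ON CONFLICT DO NOTHING. The table appointment_slot and its columns are hypothetical and are not the asker's schema; the duration is stored on the row itself instead of being joined in from appointment_schedule.

```sql
-- Hypothetical table for illustration only.
CREATE TABLE appointment_slot (
    id               integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    appointment_date timestamp without time zone NOT NULL,
    duration_minutes integer NOT NULL,
    -- Reject any row whose [start, start + duration) range overlaps an existing one.
    EXCLUDE USING gist (
        (tsrange(appointment_date,
                 appointment_date + make_interval(mins => duration_minutes))) WITH &&
    )
);

-- First appointment: occupies 10:33:00 up to (not including) 10:48:00.
INSERT INTO appointment_slot (appointment_date, duration_minutes)
VALUES (timestamp '2024-01-01 10:33:00', 15)
ON CONFLICT DO NOTHING
RETURNING id;

-- Overlaps the slot above, so no row is inserted and nothing is returned.
INSERT INTO appointment_slot (appointment_date, duration_minutes)
VALUES (timestamp '2024-01-01 10:34:00', 15)
ON CONFLICT DO NOTHING
RETURNING id;

-- Starts exactly at 10:48:00, which does not overlap, so it is accepted.
INSERT INTO appointment_slot (appointment_date, duration_minutes)
VALUES (timestamp '2024-01-01 10:48:00', 15)
ON CONFLICT DO NOTHING
RETURNING id;
```

ON CONFLICT is written without a conflict target here because an exclusion constraint cannot be named through a column list, and only DO NOTHING is available for exclusion constraints.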

Weighted Random Selection

Please. I have two tables with the most common first and last names. Each table has basically two fields:
Tables
CREATE TABLE "common_first_name" (
"first_name" text PRIMARY KEY, --The text representing the name
"ratio" numeric NOT NULL, -- the % of how many times it occurs compared to the other names.
"inserted_at" timestamp WITH time zone DEFAULT timezone('utc'::text, now()) NOT NULL,
"updated_at" timestamp WITH time zone DEFAULT timezone('utc'::text, now()) NOT NULL
);
CREATE TABLE "common_last_name" (
"last_name" text PRIMARY KEY, --The text representing the name
"ratio" numeric NOT NULL, -- the % of how many times it occurs compared to the other names.
"inserted_at" timestamp WITH time zone DEFAULT timezone('utc'::text, now()) NOT NULL,
"updated_at" timestamp WITH time zone DEFAULT timezone('utc'::text, now()) NOT NULL
);
P.S: The TOP 1 name occurs only ~ 1.8% of the time. The tables have 1000 rows each.
Function (Pseudo, not READY)
CREATE OR REPLACE FUNCTION create_sample_data(p_number_of_records INT)
RETURNS VOID
AS $$
DECLARE
SUM_OF_WEIGHTS CONSTANT INT := 100;
BEGIN
FOR i IN 1..coalesce(p_number_of_records, 0) LOOP
-- Get a random first and last name, taking their probability (RATIO) into account, e.g. via round(random()*SUM_OF_WEIGHTS);
--create_person (random_first_name || ' ' || random_last_name);
END LOOP;
END
$$
LANGUAGE plpgsql VOLATILE;
P.S.: The sum of all ratios for each name (per table) sums up to 100%.
I want to run a function N times and get a name and a surname to create sample data... both tables have 1000 rows each.
The sample size can be anywhere from 1000 full names to 1000000 names, so if there is a "fast" way of doing this random weighted function, even better.
Any suggestion of how to do it in PL/PGSQL?
I am using PG 13.3 on SUPABASE.IO.
Thanks
Given the small input dataset, it's straightforward to do this in pure SQL. Use CTEs to build lower & upper bound columns for each row in each of the common_FOO_name tables, then use generate_series() to generate sets of random numbers. Join everything together, and use the random value between the bounds as the WHERE clause.
with first_names_weighted as (
select first_name,
sum(ratio) over (order by first_name) - ratio as lower_bound,
sum(ratio) over (order by first_name) as upper_bound
from common_first_name
),
last_names_weighted as (
select last_name,
sum(ratio) over (order by last_name) - ratio as lower_bound,
sum(ratio) over (order by last_name) as upper_bound
from common_last_name
),
randoms as (
select random() * (select sum(ratio) from common_first_name) as f_random,
random() * (select sum(ratio) from common_last_name) as l_random
from generate_series(1, 32)
)
select r, first_name, last_name
from randoms r
cross join first_names_weighted f
cross join last_names_weighted l
where f.lower_bound <= r.f_random and r.f_random < f.upper_bound
and l.lower_bound <= r.l_random and r.l_random < l.upper_bound;
Change the value passed to generate_series() to control how many names to generate. If it's important that it be a function, you can just use a LANGUAGE SQL function definition to parameterize that number:
https://www.db-fiddle.com/f/mmGQRhCP2W1yfhZTm1yXu5/3
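For readers who cannot open the fiddle, a LANGUAGE SQL wrapper along those lines might look like the sketch below. The function name random_full_names is an assumption, not taken from the fiddle; the bounds are treated as half-open intervals so that a random value landing exactly on a boundary matches a single name.

```sql
-- Sketch only: random_full_names is a hypothetical name.
create or replace function random_full_names(p_number_of_records int)
returns table (first_name text, last_name text)
language sql
volatile
as $$
    with first_names_weighted as (
        -- running total of ratios gives each name its own [lower, upper) slice
        select c.first_name,
               sum(c.ratio) over (order by c.first_name) - c.ratio as lower_bound,
               sum(c.ratio) over (order by c.first_name) as upper_bound
        from common_first_name c
    ),
    last_names_weighted as (
        select c.last_name,
               sum(c.ratio) over (order by c.last_name) - c.ratio as lower_bound,
               sum(c.ratio) over (order by c.last_name) as upper_bound
        from common_last_name c
    ),
    randoms as (
        -- one pair of random points per requested record
        select random() * (select sum(ratio) from common_first_name) as f_random,
               random() * (select sum(ratio) from common_last_name) as l_random
        from generate_series(1, coalesce(p_number_of_records, 0))
    )
    select f.first_name, l.last_name
    from randoms r
    join first_names_weighted f
      on f.lower_bound <= r.f_random and r.f_random < f.upper_bound
    join last_names_weighted l
      on l.lower_bound <= r.l_random and r.l_random < l.upper_bound;
$$;

-- Example call: 1000 weighted random full names.
select first_name || ' ' || last_name as full_name
from random_full_names(1000);
```

Because each call returns the whole set at once, the create_person step from the question could consume the result in a single INSERT ... SELECT instead of a PL/pgSQL loop.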

Why won't Postgres filter my date range partitions?

I have a table that uses declarative partitioning (w00t!) to partition tables by date range - one year in my case.
When I query against the table - SELECT * FROM tbl WHERE date > date '2016-01-01', it works exactly as intended; only tables containing newer data are scanned.
When I specify a date using variables or functions (CURRENT_DATE, NOW(), etc), EXPLAIN says it scans every partition.
Things that work as intended:
SELECT * FROM tbl WHERE date > date '2016-01-01'
--
SELECT * FROM tbl WHERE date > '2016-01-01'::date
Things that scan all partitions unnecessarily:
SELECT * FROM tbl WHERE date > CURRENT_DATE
--
SELECT * FROM tbl WHERE date > NOW()
--
SELECT * FROM tbl WHERE date > (NOW() - interval '365 days')::date
--
SELECT * FROM tbl WHERE date > (SELECT (NOW()::date - 365)::date AS d)
-- Even CTEs are no dice:
WITH a AS (SELECT CURRENT_DATE AS d)
SELECT * FROM tbl, a WHERE date > a.d
-- Same with JOINs
SELECT w.*
FROM (SELECT CURRENT_DATE - 365 AS d) a
LEFT JOIN tbl w ON w.date > a.d
..etc
I get the same behavior with other comparison operators - =, <, etc.
The docs say I don't need an index on the field (which I don't have anyway). I added one just in case and it did not help.
Why is this happening, and what can I do to stop it (preferably without adding complication to a simple query)?
Thanks to JustMe for answering this; see the comments on the OP.
The issue lies with when NOW() and CURRENT_TIMESTAMP are evaluated in relation to FROM: their values are not yet known when the planner chooses which partitions to scan, so it cannot rule any of them out. It's the same issue you see when you try to filter in a join a la WHERE join_table.a > from_table.b.
Supposing today is Jan 1, 1970, these queries
SELECT * FROM my_stuff WHERE date > NOW()::date;
--
SELECT * FROM my_stuff WHERE date > '1970-01-01'::date;
will necessarily produce an identical resultset but will not necessarily be evaluated in an identical way.
That's why this is happening and unfortunately, there doesn't seem to be a simple way to stop it. A function seems to be the best-ish option:
CREATE OR REPLACE FUNCTION myfunc()
RETURNS setof tbl
LANGUAGE 'plpgsql'
AS $$
DECLARE
n date := CURRENT_DATE - 365;
BEGIN
RETURN query EXECUTE $a$
SELECT * FROM tbl
WHERE date > $1;
$a$ using n;
END $$;
You can test this by changing RETURNS setof tbl to RETURNS setof text and SELECT... to EXPLAIN SELECT...
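Spelled out, that testing variant might look like the sketch below. The name myfunc_explain is hypothetical; everything else follows the function above, with the return type switched to text and EXPLAIN placed in front of the query.

```sql
-- Sketch only: myfunc_explain is a hypothetical name.
CREATE OR REPLACE FUNCTION myfunc_explain()
RETURNS SETOF text
LANGUAGE plpgsql
AS $$
DECLARE
    -- computed once, then handed to the dynamic query as a parameter value
    n date := CURRENT_DATE - 365;
BEGIN
    RETURN QUERY EXECUTE $a$
        EXPLAIN SELECT * FROM tbl
        WHERE date > $1
    $a$ USING n;
END $$;

-- Each returned row is one line of the plan; check which partitions appear.
SELECT * FROM myfunc_explain();
```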

How do I join rows in Postgres?

I have a table of reservations in Postgres and I want to generate a table that has a row per month and shows earnings (and a lot of other things left out here for simplicity) for each year in a column.
I can do it by hard coding years, but there must be a better way. How do I do this to scale to x number of years?
Thanks!
CREATE TABLE reservations (
checkin date NOT NULL,
earnings integer
-- other data fields omitted
);
CREATE OR REPLACE FUNCTION compareYears()
RETURNS TABLE(month double precision, earnings_2016 bigint, earnings_2017 bigint)
AS
$$
BEGIN
RETURN QUERY
with
r2017 as (SELECT
date_part('month', reservations.checkin) AS month,
sum(reservations.earnings) as earnings_2017
FROM cd.reservations
WHERE date_part('year', reservations.checkin) = 2017
GROUP by date_part('month', reservations.checkin)),
r2016 as (SELECT
date_part('month', reservations.checkin) AS month,
sum(reservations.earnings) as earnings_2016
FROM cd.reservations
WHERE date_part('year', reservations.checkin) = 2016
GROUP by date_part('month', reservations.checkin))
SELECT r2017.month, r2016.earnings_2016, r2017.earnings_2017
FROM r2016, r2017
WHERE r2017.month = r2016.month;
END;
$$
LANGUAGE 'plpgsql' VOLATILE;
Have a look at crosstab. I think it's exactly what you need.
Here's an example
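As a rough sketch of what a crosstab call could look like for the reservations table from the question (this is an illustration, not the linked example): it assumes the tablefunc extension can be installed and uses the unqualified reservations table as created above. Note that crosstab still needs the output columns listed explicitly, so the years remain spelled out in the column definition list.

```sql
-- crosstab() is provided by the tablefunc extension.
CREATE EXTENSION IF NOT EXISTS tablefunc;

SELECT *
FROM crosstab(
    -- source query: row name (month), category (year), value (earnings)
    $q$
    SELECT date_part('month', checkin)::int AS month,
           date_part('year', checkin)::int  AS year,
           sum(earnings)                    AS earnings
    FROM reservations
    WHERE date_part('year', checkin) IN (2016, 2017)
    GROUP BY 1, 2
    ORDER BY 1, 2
    $q$,
    -- category query: one output column per year, in this order
    $q$ VALUES (2016), (2017) $q$
) AS ct(month int, earnings_2016 bigint, earnings_2017 bigint);
```

Unlike the inner join in the question, a month that has earnings in only one of the years still shows up, with NULL in the other column.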

Extracting the number of days from a calculated interval

I am trying to get a query like the following one to work:
SELECT EXTRACT(DAY FROM INTERVAL to_date - from_date) FROM histories;
In the referenced table, to_date and from_date are of type timestamp without time zone. A regular query like
SELECT to_date - from_date FROM histories;
Gives me interval results such as '65 days 04:58:09.99'. But using this expression inside the first query gives me an error: invalid input syntax for type interval. I've tried various quotations and even nesting the query without luck. Can this be done?
SELECT EXTRACT(DAY FROM INTERVAL to_date - from_date) FROM histories;
This makes no sense. INTERVAL xxx is syntax for interval literals. So INTERVAL from_date is a syntax error, since from_date isn't a literal. If your code really looks more like INTERVAL '2012-02-01' then that's going to fail, because 2012-02-01 is not valid syntax for an INTERVAL.
The INTERVAL keyword here is just noise. I suspect you misunderstood an example from the documentation. Remove it and the expression will be fine.
I'm guessing you're trying to get the number of days between two dates represented as timestamp or timestamptz.
If so, either cast both to date:
SELECT to_date::date - from_date::date FROM histories;
or get the interval, then extract the day component:
SELECT extract(day from to_date - from_date) FROM histories;
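The two forms do not always agree, which a small worked example may make clearer. The timestamps below are made-up values supplied through a VALUES list, not rows from the histories table.

```sql
-- 2012-03-01 23:00 to 2012-03-06 01:00 is an interval of '4 days 02:00:00'.
SELECT t.to_date::date - t.from_date::date         AS calendar_days,  -- 5
       extract(day from t.to_date - t.from_date)   AS whole_days      -- 4
FROM (VALUES (timestamp '2012-03-06 01:00:00',
              timestamp '2012-03-01 23:00:00')) AS t(to_date, from_date);
```

Casting to date counts calendar-day boundaries crossed, while extracting the day field counts only complete 24-hour periods, so the choice depends on which notion of "days" is wanted.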
This example demonstrates the creation of a table with a trigger which updates the difference between stop_time and start_time in DDD HH24:MI:SS format, where DDD stands for the number of days ...
DROP TABLE IF EXISTS benchmarks ;
SELECT 'create the "benchmarks" table'
;
CREATE TABLE benchmarks (
guid UUID NOT NULL DEFAULT gen_random_uuid()
, id bigint UNIQUE NOT NULL DEFAULT cast (to_char(current_timestamp, 'YYMMDDHH12MISS') as bigint)
, git_hash char (8) NULL DEFAULT 'hash...'
, start_time timestamp NOT NULL DEFAULT DATE_TRUNC('second', NOW())
, stop_time timestamp NOT NULL DEFAULT DATE_TRUNC('second', NOW())
, diff_time varchar (20) NOT NULL DEFAULT 'HH:MI:SS'
, update_time timestamp DEFAULT DATE_TRUNC('second', NOW())
, CONSTRAINT pk_benchmarks_guid PRIMARY KEY (guid)
) WITH (
OIDS=FALSE
);
create unique index idx_uniq_benchmarks_id on benchmarks (id);
-- START trigger trg_benchmarks_upsrt_diff_time
-- hrt = human readable time
CREATE OR REPLACE FUNCTION fnc_benchmarks_upsrt_diff_time()
RETURNS TRIGGER
AS $$
BEGIN
-- NEW.diff_time = age(NEW.stop_time::timestamp-NEW.start_time::timestamp);
NEW.diff_time = to_char(NEW.stop_time-NEW.start_time, 'DDD HH24:MI:SS');
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER trg_benchmarks_upsrt_diff_time
BEFORE INSERT OR UPDATE ON benchmarks
FOR EACH ROW EXECUTE PROCEDURE fnc_benchmarks_upsrt_diff_time();
--
-- STOP trigger trg_benchmarks_upsrt_diff_time
Just remove the keyword INTERVAL:
SELECT EXTRACT(DAY FROM to_date - from_date) FROM histories;