Compare counts in PostgreSQL - postgresql

I want to compare two results of queries of the same table, by checking theresulting row count, but Postgres doesn't support column aliases in the where clause.
select id from article where version=1308
and exists(
select count(ident) as count1 from artprice AS p1
where p1.valid_to<to_timestamp(1586642400000) or p1.valid_from>to_timestamp(1672441199000)
and p1.article=article.id
and p1.count1=(select count(ident) from artprice where article=article.id)
)
I also cannot use aggregate functions in the where clause, so
select id from article where version=1308
and exists(
select count(ident) as count1 from artprice AS p1
where p1.valid_to<to_timestamp(1586642400000) or p1.valid_from>to_timestamp(1672441199000)
and p1.article=article.id
and p1.count(ident)=(select count(ident) from artprice where article=article.id)
)
also doesn't work. Any ideas?
EDIT:
What I want to get are articles where every article price is outside of a valid range defined by validFrom andValidTo.

I now changed the statement by negating the positive conditions:
Select distinct article.id from Article article, ArtPrice price
where
(
(article.version=?)
and
(
(
(
(
(not(price.valid_from>=?)) or (not(price.valid_to<=?))
)
and
(
(not(price.valid_from<=?)) or (not(price.valid_to>=?))
)
)
and
(
(not(price.valid_to>=?)) or (not(price.valid_to<=?))
)
)
and
(
(not(price.valid_from>=?)) or (not(price.valid_from<=?))
)
)
) and article.id=price.article
Probably not the very elegant solution, but it works.

Aggregates are not allowed in WHERE clause, but there's HAVING clause for them.
EDIT: What I want to get are articles where every article price is outside of a valid range defined by validFrom andValidTo.
I think that bool_or() would be a good fit here when combined with range operations:
SELECT article.id
FROM Article AS article
JOIN ArtPrice AS price ON price.article = article.id
WHERE article.version = 1308
GROUP BY article.id
HAVING NOT bool_or(tsrange(price.valid_from, price.valid_to)
&& tsrange(to_timestamp(1586642400000),
to_timestamp(1672441199000)))
This reads as "...those having not any price tsrange overlap with given tsrange".
Postgresql also supports the SQL OVERLAPS operator:
(price.valid_from, price.valid_to) OVERLAPS (to_timestamp(1586642400000),
to_timestamp(1672441199000))
As a note, it operates on half-open intervals start <= time < end.

Related

How to identify and list Circular Reference elements in Postgres

I have an employee , manager hierarchy which could end up being circular.
Ex:
28397468N>88518119N>87606705N>28397468N
Create Table emp_manager ( Emp_id varchar(30), Manager_id varchar(30));
Insert into emp_manager values ('28397468N','88518119N');
Insert into emp_manager values ('88518119N','87606705N');
Insert into emp_manager values ('87606705N','28397468N');
My requirement is:
When my proc is called and there are circular hierarchies in the emp_manager table, we should return an error listing the employees in the hierarchy.
The below link contains some useful info:
https://mccalljt.io/blog/2017/01/postgres-circular-references/
I have modified it as below:
select * from (
WITH RECURSIVE circular_managers(Emp_id, Manager_id, depth, path, cycle) AS (
SELECT u.Emp_id, u.Manager_id, 1,
ARRAY[u.Emp_id],
false
FROM emp_manager u
UNION ALL
SELECT u.Emp_id, u.Manager_id, cm.depth + 1,
(path || u.Emp_id)::character varying(32)[],
u.Emp_id = ANY(path)
FROM emp_manager u, circular_managers cm
WHERE u.Emp_id = cm.Manager_id AND NOT cycle
)
select
distinct (path) d
FROM circular_managers
WHERE cycle
AND path[1] = path[array_upper(path, 1)]) cm
BUT, the problem is, it is returning all combinations of the hierarchy:
{28397468N,88518119N,87606705N,28397468N}
{87606705N,28397468N,88518119N,87606705N}
{88518119N,87606705N,28397468N,88518119N}
I need a simple answer like this:
28397468N>88518119N>87606705N>28397468N
even this will do:
28397468N>88518119N>87606705N
Please help!
So all references:
{28397468N,88518119N,87606705N,28397468N}
{87606705N,28397468N,88518119N,87606705N}
{88518119N,87606705N,28397468N,88518119N}
are correct but just start from different element.
I need a simple answer like this: 28397468N>88518119N>87606705N>28397468N
So what's needed is a filter for the same circle refs.
Let's do that in a way:
sort distinct items in arrays
aggregate them back - so for all references it will be '{28397468N,87606705N,88518119N}'
use produced value for DISTINCT FIRST_VALUE
WITH D (circle_ref ) AS (
VALUES
('{28397468N,88518119N,87606705N,28397468N}'::text[]),
('{87606705N,28397468N,88518119N,87606705N}'::text[]),
('{88518119N,87606705N,28397468N,88518119N}'::text[])
), ordered AS (
SELECT
D.circle_ref,
(SELECT ARRAY_AGG(DISTINCT el ORDER BY el) FROM UNNEST(D.circle_ref) AS el ) AS ordered_circle
FROM
D
)
SELECT DISTINCT
FIRST_VALUE (circle_ref) OVER (PARTITION BY ordered_circle ORDER BY circle_ref) AS circle_ref
FROM
ordered;
circle_ref
{28397468N,88518119N,87606705N,28397468N}
DB Fiddle: https://www.db-fiddle.com/f/6ytb2v11s8T95PPLoTZZed/0
To prevent circular references, you can use a closure table and a trigger - as explained in https://stackoverflow.com/a/38701519/5962802
The closure table will also allow you to easily get all subordinates for a given supervisor (no matter how deep in the hierarchy) - or all direct bosses of a given employee (up to the root).
Before using the rebuild_tree stored procedure you will have to remove all circular references from the hierarchy.

Comparing two text array columns using % and ordering by number of matches

I have a user table that contains a "skills" column which is a text array. Given some input array, I would like to find all the users whose skills % one or more of the entries in the input array, and order by number of matches (according to the % operator from pg_trgm).
For example, I have Array['java', 'ruby', 'postgres'] and I want users who have these skills ordered by the number of matches (max is 3 in this case).
I tried unnest() with an inner join. It looked like I was getting somewhere, but I still have no idea how I can capture the count of the matching array entries. Any ideas on what the structure of the query may look like?
Edit: Details:
Here is what my programmers table looks like:
id | skills
----+-------------------------------
1 | {javascript,rails,css}
2 | {java,"ruby on rails",adobe}
3 | {typescript,nodejs,expressjs}
4 | {auth0,c++,redis}
where skills is a text array.
Here is what I have so far:
SELECT * FROM programmers, unnest(skills) skill_array(x)
INNER JOIN unnest(Array['ruby', 'node']) search(y)
ON skill_array.x % search.y;
which outputs the following:
id | skills | x | y
----+-------------------------------+---------------+---------
2 | {java,"ruby on rails",adobe} | ruby on rails | ruby
3 | {typescript,nodejs,expressjs} | nodejs | node
3 | {typescript,nodejs,expressjs} | expressjs | express
*Assuming pg_trgm is enabled.
For an exact match between the user skills and the searched skills, you can proceed like this :
You put the searched skills in the target_skills text array
You filter the users from the table user_table whose user_skills array has at least one common element with the target_skills array by using the && operator
For each of the selected users, you select the common skills by using unnest and INTERSECT, and you calculate the number of these common skills
You order the result by the number of common skills DESC
In this process, the users with skill "ruby" will be selected for the target skill "ruby", but not the users with skill "ruby on rails".
This process can be implemented as follow :
SELECT u.user_id
, u.user_skills
, inter.skills
FROM user_table AS u
CROSS JOIN LATERAL
( SELECT array( SELECT unnest(u.user_skills)
INTERSECT
SELECT unnest(target_skills)
) AS skills
) AS inter
WHERE u.user_skills && target_skills
ORDER BY array_length(inter.skills, 1) DESC
or with this variant :
SELECT u.user_id
, u.user_skills
, array_agg(t_skill) AS inter_skills
FROM user_table AS u
CROSS JOIN LATERAL unnest(target_skills) AS t_skill
WHERE u.user_skills && array[t_skill]
GROUP BY u.user_id, u.user_skills
ORDER BY array_length(inter_skills, 1) DESC
This query can be accelerated by creating a GIN index on the user_skills column of the user_table.
For a partial match between the user skills and the target skills (ie the users with skill "ruby on rails" must be selected for the target skill "ruby"), you need to use the pattern matching operator LIKE or the regular expression, but it is not possible to use them with text arrays, so you need first to transform your user_skills text array into a simple text with the function array_to_string. The query becomes :
SELECT u.user_id
, u.user_skills
, array_agg(t_skill) AS inter_skills
FROM user_table AS u
CROSS JOIN unnest(target_skills) AS t_skill
WHERE array_to_string(u.user_skills, ' ') ~ t_skill
GROUP BY u.user_id, u.user_skills
ORDER BY array_length(inter_skills, 1) DESC ;
Then you can accelerate the queries by creating the following GIN (or GiST) index :
DROP INDEX IF EXISTS user_skills ;
CREATE INDEX user_skills
ON user_table
USING gist (array_to_string(user_skills, ' ') gist_trgm_ops) ; -- gin_trgm_ops and gist_trgm_ops indexes are compliant with the LIKE operator and the regular expressions
In any case, managing the skills as text will ever fail if there are typing errors or if the skills list is not normalized.
I accepted Edouard's answer, but I thought I'd show something else I adapted from it.
CREATE OR REPLACE FUNCTION partial_and_and(list1 TEXT[], list2 TEXT[])
RETURNS BOOLEAN AS $$
SELECT EXISTS(
SELECT * FROM unnest(list1) x, unnest(list2) y
WHERE x % y
);
$$ LANGUAGE SQL IMMUTABLE;
Then create the operator:
CREATE OPERATOR &&% (
LEFTARG = TEXT[],
RIGHTARG = TEXT[],
PROCEDURE = partial_and_and,
COMMUTATOR = &&%
);
And finally, the query:
SELECT p.id, p.skills, array_agg(t_skill) AS inter_skills
FROM programmers AS p
CROSS JOIN LATERAL unnest(Array['ruby', 'java']) AS t_skill
WHERE p.skills &&% array[t_skill]
GROUP BY p.id, p.skills
ORDER BY array_length(inter_skills, 1) DESC;
This will output an error saying column 'inter_skills' does not exist (not sure why), but oh well point is the query seems to work. All credit goes to Edouard.

How to get ids of grouped by rows in postgresql and use result?

I have a table containing transactions with an amount. I want to create a batch of transactions so that the sum of amount of each 'group by' is negative.
My problematic is to get all ids of the rows concerned by a 'group by' where each group is validate by a sum condition.
I find many solutions which don't work for me.
The best solution I found is to request the db a first time with the 'group by' and the sum, then return ids to finally request the db another time with all of them.
Here an example of what I would like (it doesn't work!) :
SELECT * FROM transaction_table transaction
AND transaction.id IN (
select string_agg(grouped::character varying, ',' ) from (
SELECT array_agg(transaction2.id) as grouped FROM transaction_table transaction2
WHERE transaction2.c_scte='c'
AND (same conditions)
GROUP BY
transaction2.motto ,
transaction2.accountBnf ,
transaction2.payment ,
transaction2.accountClt
HAVING sum(transaction2.amount)<0
)
);
the result of the array_agg is like:
{39758,39759}
{39757,39756,39755,39743,39727,39713}
and the string_agg is :
{39758,39759},{39757,39756,39755,39743,39727,39713}
Now I just need to use them but I don't know how to...
unfortunatly, it doesn't work because of type casting :
ERROR: operator does not exist: integer = integer[]
IndiceĀ : No operator matches the given name and argument type(s). You might need to add explicit type casts.
Maybe you are looking for
SELECT id, motto, accountbnf, payment, accountclnt, amount
FROM (SELECT id, motto, accountbnf, payment, accountclnt, amount,
sum(amount)
OVER (PARTITION BY motto, accountbnf, payment, accountclnt)
AS group_total
FROM transaction_table) AS q
WHERE group_total < 0;
The inner SELECT adds an additional column using a window function that calculates the sum for each group, and the outer query removes all results where that sum is not negative.
Finally I found this option using the 'unnest' method. It works perfectly.
Array_agg bring together all ids in different array
unnest flattened all of them
This comes from here
SELECT * FROM transaction_table transaction
WHERE transaction.id = ANY(
SELECT unnest(array_agg(transaction2.id)) as grouped FROM transaction_table transaction2
WHERE transaction2.c_scte='c'
AND (same conditions)
GROUP BY
transaction2.motto ,
transaction2.accountBnf ,
transaction2.payment ,
transaction2.accountClt
HAVING sum(transaction2.amount)<0
);
The problem with this solution is that hibernate doesn't take into account the array_agg method.

Greatest N per group in Open SQL

Selecting the rows from a table by (partial) key with the maximum value in a particular column is a common task in SQL. This question has some excellent answers that cover a variety of approaches to it. Unfortunately I'm struggling to replicate this in my ABAP program.
None of the commonly used approaches seem to be supported:
Joining on a subquery is not supported in syntax: SELECT * FROM X as x INNER JOIN ( SELECT ... ) AS y
Using IN for a composite key is not supported in syntax as far as I know: SELECT * FROM X WHERE (key1, key2) IN ( SELECT key1 key2 FROM ... )
Left join to itself with smaller-than comparison is not supported, outer joins only support EQ comparisons: SELECT * FROM X AS x LEFT JOIN X as xmax ON x-key1 = xmax-key1 AND x-key2 < xmax-key2 WHERE xmax-key IS INITIAL
After trying each of these solutions in turn only to discover that ABAP doesn't seem to support them and being unable to find any equivalents I'm starting to think that I'll have no choice but to dump the data of the subquery to an itab.
What is the best practice for this common programming requirement in ABAP development?
First of all, specific requirement, would give you a better answer. As it happens I bumped into this question when working on a program, that uses 3 distinct methods of pseudo-grouping, (while looking for alternatives) and ALL 3 can be used to answer your question, depending on what exactly you need to do. I'm sure there are more ways to do it.
For instance, you can pull maximum values within a group by simply selecting max( your_field ) and grouping by some fields, if that's all you need.
select bname, nation, max( date_from ) from adrp group by bname, nation. "selects highest "from" date for each bname
If you need to use that max value as a filter condition within a query, you can do it by performing pseudo-grouping using sub-query and max within sub-query like this (notice how I move out the BNAME check into sub query, which means I don't have to check both fields using in (subquery) addition):
select ... from adrp as b_adrp "Pulls the latest person info for a user (some conditions are missing, but this is a part of an actual query)
where b_adrp~date_from in (
select max( date_from ) "Highest date_from where both dates are valid
from adrp where persnumber = b_adrp~persnumber and nation = b_adrp~nation and date_from <= #sy-datum )
The query above allows you to select selects all user info from base query and (where the first one only allows to take aggregated and grouped data).
Finally, If you need to check based on composite key and compare it to multiple agregate function results, the implementation will heavily depend on specifics of your requirement (and since your question has none, I'll provide a generic one). Easiest option is to use exists / not exists instead of in (subquery), in exact same way and form the subquery to check for existance of specific key or condition rather than pull a list ( you can nest subqueries if you have to ):
select * from bkpf where exists ( select 1 from bkpf as b where belnr = bkpf~belnr and gjahr = bkpf~gjahr group by belnr, gjahr having max( budat ) = bkpf~budat ) "Took an available example, that I had in testing program.
All 3 queries will get you max value of a column within a group and in fact, all 3 can use joins to achieve identical results.
please find my answers below your questions.
Joining on a subquery is not supported in syntax: SELECT * FROM X as x INNER JOIN ( SELECT ... ) AS y
Putting the subquery in your where condition should do the work SELECT * FROM X AS x INNER JOIN Y AS y ON x~a = y~b WHERE ( SELECT * FROM y WHERE ... )
Using IN for a composite key is not supported in syntax as far as I know: SELECT * FROM X WHERE (key1, key2) IN ( SELECT key1 key2 FROM ... )
You have to split your WHERE clause: SELECT * FROM X WHERE key1 IN ( SELECT key1 FROM y ) AND key2 IN ( SELECT key2 FROM y )
Left join to itself with smaller-than comparison is not supported, outer joins only support EQ comparisons.
Yes, thats right at the moment.
Left join to itself with smaller-than comparison is not supported, outer joins only support EQ comparisons:
SELECT * FROM X AS x LEFT JOIN X as xmax ON x-key1 = xmax-key1 AND x-key2 < xmax-key2 WHERE xmax-key IS INITIAL
This is not true. This SELECT is perfectly valid:
SELECT b1~budat
INTO TABLE lt_bkpf
FROM bkpf AS b1
LEFT JOIN bkpf AS b2
ON b2~belnr < b1~belnr
WHERE b1~bukrs <> ''.
And was valid at least since 7.40 SP08, since July 2013, so at the time you asked this question it was valid as well.

MySQL select rows where date not between date

I have a booking system in which I need to select any available room from the database. The basic setup is:
table: room
columns: id, maxGuests
table: roombooking
columns: id, startDate, endDate
table: roombooking_room:
columns: id, room_id, roombooking_id
I need to select rooms that can fit the requested guests in, or select two (or more) rooms to fit the guests in (as defined by maxGuests, obviously using the lowest/closet maxGuests first)
I could loop through my date range and use this sql:
SELECT `id`
FROM `room`
WHERE `id` NOT IN
(
SELECT `roombooking_room`.`room_id`
FROM `roombooking_room`, `roombooking`
WHERE `roombooking`.`confirmed` =1
AND DATE(%s) BETWEEN `roombooking`.`startDate` AND `roombooking`.`endDate`
)
AND `room`.`maxGuests`>=%d
Where %$1 is the looped date and %2d is the number of guests to be booked in. But this will just return false if there are more guests than any room can take, and there must be a quicker way of doing this rather than looping with php and running the query?
This is similar to part of the sql I was thinking of: Getting Dates between a range of dates but with Mysql
Solution, based on ircmaxwell's answer:
$query = sprintf(
"SELECT `id`, `maxGuests`
FROM `room`
WHERE `id` NOT IN
(
SELECT `roombooking_room`.`room_id`
FROM `roombooking_room`
JOIN `roombooking` ON `roombooking_room`.`roombooking_id` = `roombooking`.`id`
WHERE `roombooking`.`confirmed` =1
AND (`roomBooking`.`startDate` > DATE(%s) OR `roomBooking`.`endDate` < DATE(%s))
)
AND `maxGuests` <= %d ORDER BY `maxGuests` DESC",
$endDate->toString('yyyy-MM-dd'), $startDate->toString('yyyy-MM-dd'), $noGuests);
$result = $db->query($query);
$result = $result->fetchAll();
$rooms = array();
$guests = 0;
foreach($result as $res) {
if($guests >= $noGuests) break;
$guests += (int)$res['maxGuests'];
$rooms[] = $res['id'];
}
Assuming that you are interested to place #Guests from #StartDate to #EndDate
SELECT DISTINCT r.id,
FROM room r
LEFT JOIN roombooking_room rbr ON r.id = rbr.room_id
LEFT JOIN roombooking ON rbr.roombooking_id = rb.id
WHERE COALESCE(#StartDate NOT BETWEEN rb.startDate AND rb.endDate, TRUE)
AND COALESCE(#EndDate NOT BETWEEN rb.startDate AND rb.endDate, TRUE)
AND #Guests < r.maxGuests
should give you a list of all rooms that are free and can accommodate given number of guests for the given period.
NOTES
This query works only for single rooms, if you want to look at multiple rooms you will need to apply the same criteria to a combination of rooms. For this you would need recursive queries or some helper tables.
Also, COALESCE is there to take care of NULLs - if a room is not booked at all it would not have any records with dates to compare to, so it would not return completely free rooms. Date between date1 and date2 will return NULL if either date1 or date2 is null and coalesce will turn it to true (alternative is to do a UNION of completely free rooms; which might be faster).
With multiple rooms things get really interesting.
Is that scenario big part of your problem? And which database are you using i.e. do you have access to recursive queries?
EDIT
As I stated multiple times before, your way of looking for a solution (greedy algorithm that looks at the largest free rooms first) is not the optimal if you want to get the best fit between required number of guests and rooms.
So, if you replace your foreach with
$bestCapacity = 0;
$bestSolution = array();
for ($i = 1; $i <= pow(2,sizeof($result))-1; $i++) {
$solutionIdx = $i;
$solutionGuests = 0;
$solution = array();
$j = 0;
while ($solutionIdx > 0) :
if ($solutionIdx % 2 == 1) {
$solution[] = $result[$j]['id'];
$solutionGuests += $result[$j]['maxGuests'];
}
$solutionIdx = intval($solutionIdx/2);
$j++;
endwhile;
if (($solutionGuests <= $bestCapacity || $bestCapacity == 0) && $solutionGuests >= $noGuests) {
$bestCapacity = $solutionGuests;
$bestSolution = $solution;
}
}
print_r($bestSolution);
print_r($bestCapacity);
Will go through all possible combinations and find the solution that wastes the least number of spaces.
Ok, first off, the inner query you're using is a cartesian join, and will be VERY expensive. You need to specify join criteria (roombooking_room.booking_id = roombooking.id for example).
Secondly, assuming that you have a range of dates, what can we say about that? Well, let's call the start of your range rangeStartDate and rangeEndDate.
Now, what can we say about any other range of dates that does not have any form of overlap with this range? Well, the endDate must not be between be either the rangeStartDate and the rangeEndDate. Same with the startDate. And the rangeStartDate (and rangeEndDate, but we don't need to check it) cannot be between startDate and endDate...
So, assuming %1$s is rangeStartDate and %2$s is rangeEndDate, a comprehensive where clause might be:
WHERE `roomBooking`.`startDate` NOT BETWEEN %1$s AND %2s
AND `roomBooking`.`endDate` NOT BETWEEN %1$s AND %2$$s
AND %1s NOT BETWEEN `roomBooking`.`startDate` AND `roomBooking`.`endDate`
But, there's a simpler way of saying that. The only way for a range to be outside of another is for the start_date to be after the end_date, or the end_date be before the start_id
So, assuming %1$s is rangeStartDate and %2$s is rangeEndDate, another comprehensive where clause might be:
WHERE `roomBooking`.`startDate` > %2$s
OR `roomBooking`.`endDate` < %1$s
So, that brings your overall query to:
SELECT `id`
FROM `room`
WHERE `id` NOT IN
(
SELECT `roombooking_room`.`room_id`
FROM `roombooking_room`
JOIN `roombooking` ON `roombooking_room`.`roombooking_id` = `roombooking`.`id`
WHERE `roombooking`.`confirmed` =1
AND (`roomBooking`.`startDate` > %2$s
OR `roomBooking`.`endDate` < %1$s)
)
AND `room`.`maxGuests`>=%d
There are other ways of doing this as well, so keep looking...
SELECT rooms.id
FROM rooms LEFT JOIN bookings
ON booking.room_id = rooms.id
WHERE <booking overlaps date range of interest> AND <wherever else>
GROUP BY rooms.id
HAVING booking.id IS NULL
I might be miss remembering how left join works so you might need to use a slightly different condition on the having, maybe a count or a sum.
At worst, with suitable indexes, that should scan half the bookings.