code works to a point - postgresql

The following code looks good to me but works to a point. The function should display the grade levels of students based on exam performance but it does not run the last two else statements and so, if a student scores lower than 50 the function still displays "pass".
CREATE OR REPLACE FUNCTION stud_Result(integer,numeric) RETURNS text
AS
$$
DECLARE
stuNum ALIAS FOR $1;
grade ALIAS FOR $2;
result TEXT;
BEGIN
IF grade >= 70.0 THEN SELECT 'distinction' INTO result FROM student,entry
WHERE student.sno = entry.sno AND student.sno = stuNum;
ELSIF grade >=50.0 OR grade <=70.0 THEN SELECT 'pass' INTO result FROM student,entry
WHERE student.sno = entry.sno AND student.sno = stuNum;
ELSIF grade >0 OR grade< 50.0 THEN SELECT 'fail' INTO result FROM student,entry
WHERE student.sno = entry.sno AND student.sno = stuNum;
ELSE SELECT 'NOT TAKEN' INTO result FROM student,entry
WHERE student.sno = entry.sno AND student.sno = stuNum;
END IF;
RETURN result;
END;$$
LANGUAGE PLPGSQL;
Can anyone point me to the problem?

This is a PostgreSQL gotcha that has tripped me up as well. You need to replace your ELSE IFs with ELSIF.
You're seeing that error because each successive ELSE IF is being interpreted as starting a nested IF block, which expects its own END IF;.
See the documentation on conditionals for more information on the proper syntax.

Your logic in the conditionals is a bit strange. You have these:
grade >= 70.0
grade >= 50.0 OR grade <= 70.0
grade > 0 OR grade < 50.0
Note that zero satisfies the second condition as do a lot of other values that you don't want in that branch of the conditonal. I think you want these:
grade >= 70.0
grade >= 50.0 AND grade <= 70.0
grade > 0 AND grade < 50.0
You also seem to be using your SELECTs to check if the person is in the course but if the grade is given and they're not in the course, you will end up with a NULL result. Either the "in the course" check should be outside your function or you should convert a NULL result to 'NOT TAKEN' before returning.
This looks like homework so I'm not going to be anymore explicit than this.

Generally, I don't think it is a good idea to hide data in code. Data belongs in tables:
SET search_path='tmp';
-- create some data
DROP TABLE tmp.student CASCADE;
CREATE TABLE tmp.student
( sno INTEGER NOT NULL
, grade INTEGER
, sname varchar
);
INSERT INTO tmp.student(sno) SELECT generate_series(1,10);
UPDATE tmp.student SET grade = sno*sno;
DROP TABLE tmp.entry CASCADE;
CREATE TABLE tmp.entry
( sno INTEGER NOT NULL
, sdate TIMESTAMP
);
INSERT INTO tmp.entry(sno) SELECT generate_series(1,10);
-- table with interval lookup
DROP TABLE tmp.lookup CASCADE;
CREATE TABLE tmp.lookup
( llimit NUMERIC NOT NULL
, hlimit NUMERIC
, result varchar
);
INSERT INTO lookup (llimit,hlimit,result) VALUES(70, NULL, 'Excellent'), (50, 70, 'Passed'), (30, 50, 'Failed')
;
CREATE OR REPLACE FUNCTION stud_result(integer,numeric) RETURNS text
AS $BODY$
DECLARE
stunum ALIAS FOR $1;
grade ALIAS FOR $2;
result TEXT;
BEGIN
SELECT COALESCE(lut.result, 'NOT TAKEN') INTO result
FROM student st, entry en
LEFT JOIN lookup lut ON (grade >= lut.llimit
AND (grade < lut.hlimit OR lut.hlimit IS NULL) )
WHERE st.sno = en.sno
AND st.sno = stunum
;
RETURN result;
END; $BODY$ LANGUAGE PLPGSQL;
-- query joining students with their function values
SELECT st.*
, stud_result (st.sno, st.grade)
FROM student st
;
But wait: you can just as well do without the ugly function:
-- Plain query
SELECT
st.sno, st.sname, st.grade
, COALESCE(lut.result, 'NOT TAKEN') AS result
FROM student st
LEFT JOIN lookup lut ON ( 1=1
AND lut.llimit <= st.grade
AND ( lut.hlimit > st.grade OR lut.hlimit IS NULL)
)
JOIN entry en ON st.sno = en.sno
;
Results:
sno | grade | sname | stud_result
-----+-------+-------+-------------
1 | 1 | | NOT TAKEN
2 | 4 | | NOT TAKEN
3 | 9 | | NOT TAKEN
4 | 16 | | NOT TAKEN
5 | 25 | | NOT TAKEN
6 | 36 | | Failed
7 | 49 | | Failed
8 | 64 | | Passed
9 | 81 | | Excellent
10 | 100 | | Excellent
(10 rows)
sno | sname | grade | result
-----+-------+-------+-----------
1 | | 1 | NOT TAKEN
2 | | 4 | NOT TAKEN
3 | | 9 | NOT TAKEN
4 | | 16 | NOT TAKEN
5 | | 25 | NOT TAKEN
6 | | 36 | Failed
7 | | 49 | Failed
8 | | 64 | Passed
9 | | 81 | Excellent
10 | | 100 | Excellent
(10 rows)

Get rid of the function altogether and use a query:
SELECT
s.*,
CASE e.grade
WHEN >= 0 AND < 50 THEN 'failed'
WHEN >= 50 AND < 70 THEN 'passed'
WHEN >= 70 AND <= 100 THEN 'excellent'
ELSE 'not taken'
END
FROM
student s,
entry e
WHERE
s.sno = e.sno;

Related

how to drop rows if a variale is less than x, in sql

I have the following query code
query = """
with double_entry_book as (
SELECT to_address as address, value as value
FROM `bigquery-public-data.crypto_ethereum.traces`
WHERE to_address is not null
AND block_timestamp < '2022-01-01 00:00:00'
AND status = 1
AND (call_type not in ('delegatecall', 'callcode', 'staticcall') or call_type is null)
union all
-- credits
SELECT from_address as address, -value as value
FROM `bigquery-public-data.crypto_ethereum.traces`
WHERE from_address is not null
AND block_timestamp < '2022-01-01 00:00:00'
AND status = 1
AND (call_type not in ('delegatecall', 'callcode', 'staticcall') or call_type is null)
union all
)
SELECT address,
sum(value) / 1000000000000000000 as balance
from double_entry_book
group by address
order by balance desc
LIMIT 15000000
"""
In the last part, I want to drop rows where "balance" is less than, let's say, 0.02 and then group, order, etc. I imagine this should be a simple code. Any help will be appreciated!
We can delete on a CTE and use returning to get the id's of the rows being deleted, but they still exist until the transaction is comitted.
CREATE TABLE t (
id serial,
variale int);
insert into t (variale) values
(1),(2),(3),(4),(5);
✓
5 rows affected
with del as
(delete from t
where variale < 3
returning id)
select
t.id,
t.variale,
del.id ids_being_deleted
from t
left join del
on t.id = del.id;
id | variale | ids_being_deleted
-: | ------: | ----------------:
1 | 1 | 1
2 | 2 | 2
3 | 3 | null
4 | 4 | null
5 | 5 | null
select * from t;
id | variale
-: | ------:
3 | 3
4 | 4
5 | 5
db<>fiddle here

Pivot sum in PostgreSQL

I'm using PosgreSQL 4.2. And I have the following code:
SELECT ID, Num1, Num2 FROM Tab1
WHERE ID IN(1,2,3,4.....50);
Returns results as:
ID | Num1 | Num2
----+--------------------
1 | 100 | 0
2 | 50 | 1
3 | 30 | 2
4 | 110 | 3
5 | 33 | 4
6 | 46 | 5
7 | 36 | 6
8 | 19 | 7
9 | 20 | 8
10 | 31 | 9
11 | 68 | 10
12 | 123 | 11
13 | 588 | 0
14 | 231 | 1
15 | 136 | 2
I want to Pivot sum to return result with pairs of number in IN clause, and result return like this:
| ID | Meaning
-------------------------
Num1 | 150 | 1+2(num1)
Num2 | 1 | 1+2(num2)
Num1 | 140 | 3+4(num1)
Num2 | 5 | 3+4(num2)
Num1 | 79 | 5+6(num1)
Num2 | 9 | 5+6(num2)
.........................
How can I do that?
PostgreSQL 4.2. No you must have another application ans confusing your application versions:
In 1996, the project was renamed to PostgreSQL to reflect its support
for SQL. The online presence at the website PostgreSQL.org began on
October 22, 1996.[26] The first PostgreSQL release formed version 6.0
on January 29, 1997.
Wikipedia. You might want to run the query: select version();
However, any supported version is sufficient for this request. Actually the SQL necessary to supply the necessary summations is simple once you understand LEAD (LAG) functions. The LEAD function permits access to the following row. Using that then the query:
select id, nid, n1s, n2s
from ( select id
, coalesce(lead(id) nid
, num1 + coalesce(lead(num1) over (order by id),0) n1s
, num2 + coalesce(lead(num2) over (order by id),0) n2s
, row_number() over() rn
from tab1
) i
where rn%2 = 1;
Provides all the necessary data for the presentation layer to format the the result as desired. However, that's not the result requested, that requires some SQL gymnastics.
We begin the gymnastics show by wrapping the above into an CTE adding a little setup for the show to follow. The main event then breaks the results into 2 sets in order to add the syntactical sugar tags before bringing then back together. So for the big show:
with joiner(id,nid,n1s,n2s,rn) as
( select *
from ( select id
, coalesce(lead(id) over (order by id),0)
, num1 + coalesce(lead(num1) over (order by id),0)
, num2 + coalesce(lead(num2) over (order by id),0)
, row_number() over() rn
from tab1
) i
where rn%2 = 1
)
select "Name","Sum","Meaning"
from (select 'Num1' as "Name"
, n1s as "Sum"
, concat(id::text, case when nid = 0
then null
else '+' || nid::text
end
) || ' (num1)'as "Meaning"
, rn
from joiner
union all
select 'Num2'
, n2s
, concat(id::text, case when nid = 0
then null
else '+' || nid::text
end
) || ' (num2)'
, rn
from joiner
) j
order by rn, "Name"
See fiddle. Note: I used "Sum" instead of ID for the column title, as it indicates the id not at all. That is just in the "Meaning" column.

Coalesce overlapping time ranges in PostgreSQL

I have a PostgreSQL (9.4) table that contains time stamp ranges and user IDs, and I need to collapse any overlapping ranges (with the same user ID) into a single record.
I've tried a complicated set of CTEs to accomplish this, but there are some edge cases in our (40,000+ rows) real table that complicate matters. I've come to the conclusion that I probably need a recursive CTE, but I haven't had any luck writing it.
Here's some code to create a test table and populate it with data. This isn't the exact layout of our table, but it's close enough for an example.
CREATE TABLE public.test
(
id serial,
sessionrange tstzrange,
fk_user_id integer
);
insert into test (sessionrange, fk_user_id)
values
('[2016-01-14 11:57:01-05,2016-01-14 12:06:59-05]', 1)
,('[2016-01-14 12:06:53-05,2016-01-14 12:17:28-05]', 1)
,('[2016-01-14 12:17:24-05,2016-01-14 12:21:56-05]', 1)
,('[2016-01-14 18:18:00-05,2016-01-14 18:42:09-05]', 2)
,('[2016-01-14 18:18:08-05,2016-01-14 18:18:15-05]', 1)
,('[2016-01-14 18:38:12-05,2016-01-14 18:48:20-05]', 1)
,('[2016-01-14 18:18:16-05,2016-01-14 18:18:26-05]', 1)
,('[2016-01-14 18:18:24-05,2016-01-14 18:18:31-05]', 1)
,('[2016-01-14 18:18:12-05,2016-01-14 18:18:20-05]', 3)
,('[2016-01-14 19:32:12-05,2016-01-14 23:18:20-05]', 3)
,('[2016-01-14 18:18:16-05,2016-01-14 18:18:26-05]', 4)
,('[2016-01-14 18:18:24-05,2016-01-14 18:18:31-05]', 2);
I have found that I can do this to get the sessions sorted by the time they started:
select * from test order by fk_user_id, sessionrange
I could use this to determine whether an individual record overlaps with the previous, using window functions:
SELECT *, sessionrange && lag(sessionrange) OVER (PARTITION BY fk_user_id ORDER BY sessionrange)
FROM test
ORDER BY fk_user_id, sessionrange
But this only detects whether the single previous record overlaps the current one (see the record where id = 6). I need to detect all the way back to the beginning of the partition.
After that, I'd need to group any records that overlap together, to find the beginning of the earliest session and the end of the last session to terminate.
I'm sure there's a way to do this that I'm overlooking. How can I collapse these overlapping records?
It is relatively easy to merge overlapping ranges as elements of an array. For simplicity the following function returns set of tstzrange:
create or replace function merge_ranges(tstzrange[])
returns setof tstzrange language plpgsql as $$
declare
t tstzrange;
r tstzrange;
begin
foreach t in array $1 loop
if r && t then r:= r + t;
else
if r notnull then return next r;
end if;
r:= t;
end if;
end loop;
if r notnull then return next r;
end if;
end $$;
Just aggregate the ranges for a user and use the function:
select fk_user_id, merge_ranges(array_agg(sessionrange))
from test
group by 1
order by 1, 2
fk_user_id | merge_ranges
------------+-----------------------------------------------------
1 | ["2016-01-14 17:57:01+01","2016-01-14 18:21:56+01"]
1 | ["2016-01-15 00:18:08+01","2016-01-15 00:18:15+01"]
1 | ["2016-01-15 00:18:16+01","2016-01-15 00:18:31+01"]
1 | ["2016-01-15 00:38:12+01","2016-01-15 00:48:20+01"]
2 | ["2016-01-15 00:18:00+01","2016-01-15 00:42:09+01"]
3 | ["2016-01-15 00:18:12+01","2016-01-15 00:18:20+01"]
3 | ["2016-01-15 01:32:12+01","2016-01-15 05:18:20+01"]
4 | ["2016-01-15 00:18:16+01","2016-01-15 00:18:26+01"]
(8 rows)
Alternatively, the algorithm can be applied to the entire table in one function loop. I'm not sure but for a large dataset this method should be faster.
create or replace function merge_ranges_in_test()
returns setof test language plpgsql as $$
declare
curr test;
prev test;
begin
for curr in
select *
from test
order by fk_user_id, sessionrange
loop
if prev notnull and prev.fk_user_id <> curr.fk_user_id then
return next prev;
prev:= null;
end if;
if prev.sessionrange && curr.sessionrange then
prev.sessionrange:= prev.sessionrange + curr.sessionrange;
else
if prev notnull then
return next prev;
end if;
prev:= curr;
end if;
end loop;
return next prev;
end $$;
Results:
select *
from merge_ranges_in_test();
id | sessionrange | fk_user_id
----+-----------------------------------------------------+------------
1 | ["2016-01-14 17:57:01+01","2016-01-14 18:21:56+01"] | 1
5 | ["2016-01-15 00:18:08+01","2016-01-15 00:18:15+01"] | 1
7 | ["2016-01-15 00:18:16+01","2016-01-15 00:18:31+01"] | 1
6 | ["2016-01-15 00:38:12+01","2016-01-15 00:48:20+01"] | 1
4 | ["2016-01-15 00:18:00+01","2016-01-15 00:42:09+01"] | 2
9 | ["2016-01-15 00:18:12+01","2016-01-15 00:18:20+01"] | 3
10 | ["2016-01-15 01:32:12+01","2016-01-15 05:18:20+01"] | 3
11 | ["2016-01-15 00:18:16+01","2016-01-15 00:18:26+01"] | 4
(8 rows)
The problem is very interesting. I've tried to find a recursive solution but it seems the procedural attempt is most natural and efficient.
I have finally found a recursive solution. The query deletes overlapping rows and inserts their compacted equivalent:
with recursive cte (user_id, ids, range) as (
select t1.fk_user_id, array[t1.id, t2.id], t1.sessionrange + t2.sessionrange
from test t1
join test t2
on t1.fk_user_id = t2.fk_user_id
and t1.id < t2.id
and t1.sessionrange && t2.sessionrange
union all
select user_id, ids || t.id, range + sessionrange
from cte
join test t
on user_id = t.fk_user_id
and ids[cardinality(ids)] < t.id
and range && t.sessionrange
),
list as (
select distinct on(id) id, range, user_id
from cte, unnest(ids) id
order by id, upper(range)- lower(range) desc
),
deleted as (
delete from test
where id in (select id from list)
)
insert into test
select distinct on (range) id, range, user_id
from list
order by range, id;
Results:
select *
from test
order by 3, 2;
id | sessionrange | fk_user_id
----+-----------------------------------------------------+------------
1 | ["2016-01-14 17:57:01+01","2016-01-14 18:21:56+01"] | 1
5 | ["2016-01-15 00:18:08+01","2016-01-15 00:18:15+01"] | 1
7 | ["2016-01-15 00:18:16+01","2016-01-15 00:18:31+01"] | 1
6 | ["2016-01-15 00:38:12+01","2016-01-15 00:48:20+01"] | 1
4 | ["2016-01-15 00:18:00+01","2016-01-15 00:42:09+01"] | 2
9 | ["2016-01-15 00:18:12+01","2016-01-15 00:18:20+01"] | 3
10 | ["2016-01-15 01:32:12+01","2016-01-15 05:18:20+01"] | 3
11 | ["2016-01-15 00:18:16+01","2016-01-15 00:18:26+01"] | 4
(8 rows)

Can window function LAG reference the column which value is being calculated?

I need to calculate value of some column X based on some other columns of the current record and the value of X for the previous record (using some partition and order). Basically I need to implement query in the form
SELECT <some fields>,
<some expression using LAG(X) OVER(PARTITION BY ... ORDER BY ...) AS X
FROM <table>
This is not possible because only existing columns can be used in window function so I'm looking way how to overcome this.
Here is an example. I have a table with events. Each event has type and time_stamp.
create table event (id serial, type integer, time_stamp integer);
I wan't to find "duplicate" events (to skip them). By duplicate I mean the following. Let's order all events for given type by time_stamp ascending. Then
the first event is not a duplicate
all events that follow non duplicate and are within some time frame after it (that is their time_stamp is not greater then time_stamp of the previous non duplicate plus some constant TIMEFRAME) are duplicates
the next event which time_stamp is greater than previous non duplicate by more than TIMEFRAME is not duplicate
and so on
For this data
insert into event (type, time_stamp)
values
(1, 1), (1, 2), (2, 2), (1,3), (1, 10), (2,10),
(1,15), (1, 21), (2,13),
(1, 40);
and TIMEFRAME=10 result should be
time_stamp | type | duplicate
-----------------------------
1 | 1 | false
2 | 1 | true
3 | 1 | true
10 | 1 | true
15 | 1 | false
21 | 1 | true
40 | 1 | false
2 | 2 | false
10 | 2 | true
13 | 2 | false
I could calculate the value of duplicate field based on current time_stamp and time_stamp of the previous non-duplicate event like this:
WITH evt AS (
SELECT
time_stamp,
CASE WHEN
time_stamp - LAG(current_non_dupl_time_stamp) OVER w >= TIMEFRAME
THEN
time_stamp
ELSE
LAG(current_non_dupl_time_stamp) OVER w
END AS current_non_dupl_time_stamp
FROM event
WINDOW w AS (PARTITION BY type ORDER BY time_stamp ASC)
)
SELECT time_stamp, time_stamp != current_non_dupl_time_stamp AS duplicate
But this does not work because the field which is calculated cannot be referenced in LAG:
ERROR: column "current_non_dupl_time_stamp" does not exist.
So the question: can I rewrite this query to achieve the effect I need?
Naive recursive chain knitter:
-- temp view to avoid nested CTE
CREATE TEMP VIEW drag AS
SELECT e.type,e.time_stamp
, ROW_NUMBER() OVER www as rn -- number the records
, FIRST_VALUE(e.time_stamp) OVER www as fst -- the "group leader"
, EXISTS (SELECT * FROM event x
WHERE x.type = e.type
AND x.time_stamp < e.time_stamp) AS is_dup
FROM event e
WINDOW www AS (PARTITION BY type ORDER BY time_stamp)
;
WITH RECURSIVE ttt AS (
SELECT d0.*
FROM drag d0 WHERE d0.is_dup = False -- only the "group leaders"
UNION ALL
SELECT d1.type, d1.time_stamp, d1.rn
, CASE WHEN d1.time_stamp - ttt.fst > 20 THEN d1.time_stamp
ELSE ttt.fst END AS fst -- new "group leader"
, CASE WHEN d1.time_stamp - ttt.fst > 20 THEN False
ELSE True END AS is_dup
FROM drag d1
JOIN ttt ON d1.type = ttt.type AND d1.rn = ttt.rn+1
)
SELECT * FROM ttt
ORDER BY type, time_stamp
;
Results:
CREATE TABLE
INSERT 0 10
CREATE VIEW
type | time_stamp | rn | fst | is_dup
------+------------+----+-----+--------
1 | 1 | 1 | 1 | f
1 | 2 | 2 | 1 | t
1 | 3 | 3 | 1 | t
1 | 10 | 4 | 1 | t
1 | 15 | 5 | 1 | t
1 | 21 | 6 | 1 | t
1 | 40 | 7 | 40 | f
2 | 2 | 1 | 2 | f
2 | 10 | 2 | 2 | t
2 | 13 | 3 | 2 | t
(10 rows)
An alternative to a recursive approach is a custom aggregate. Once you master the technique of writing your own aggregates, creating transition and final functions is easy and logical.
State transition function:
create or replace function is_duplicate(st int[], time_stamp int, timeframe int)
returns int[] language plpgsql as $$
begin
if st is null or st[1] + timeframe <= time_stamp
then
st[1] := time_stamp;
end if;
st[2] := time_stamp;
return st;
end $$;
Final function:
create or replace function is_duplicate_final(st int[])
returns boolean language sql as $$
select st[1] <> st[2];
$$;
Aggregate:
create aggregate is_duplicate_agg(time_stamp int, timeframe int)
(
sfunc = is_duplicate,
stype = int[],
finalfunc = is_duplicate_final
);
Query:
select *, is_duplicate_agg(time_stamp, 10) over w
from event
window w as (partition by type order by time_stamp asc)
order by type, time_stamp;
id | type | time_stamp | is_duplicate_agg
----+------+------------+------------------
1 | 1 | 1 | f
2 | 1 | 2 | t
4 | 1 | 3 | t
5 | 1 | 10 | t
7 | 1 | 15 | f
8 | 1 | 21 | t
10 | 1 | 40 | f
3 | 2 | 2 | f
6 | 2 | 10 | t
9 | 2 | 13 | f
(10 rows)
Read in the documentation: 37.10. User-defined Aggregates and CREATE AGGREGATE.
This feels more like a recursive problem than windowing function. The following query obtained the desired results:
WITH RECURSIVE base(type, time_stamp) AS (
-- 3. base of recursive query
SELECT x.type, x.time_stamp, y.next_time_stamp
FROM
-- 1. start with the initial records of each type
( SELECT type, min(time_stamp) AS time_stamp
FROM event
GROUP BY type
) x
LEFT JOIN LATERAL
-- 2. for each of the initial records, find the next TIMEFRAME (10) in the future
( SELECT MIN(time_stamp) next_time_stamp
FROM event
WHERE type = x.type
AND time_stamp > (x.time_stamp + 10)
) y ON true
UNION ALL
-- 4. recursive join, same logic as base
SELECT e.type, e.time_stamp, z.next_time_stamp
FROM event e
JOIN base b ON (e.type = b.type AND e.time_stamp = b.next_time_stamp)
LEFT JOIN LATERAL
( SELECT MIN(time_stamp) next_time_stamp
FROM event
WHERE type = e.type
AND time_stamp > (e.time_stamp + 10)
) z ON true
)
-- The actual query:
-- 5a. All records from base are not duplicates
SELECT time_stamp, type, false
FROM base
UNION
-- 5b. All records from event that are not in base are duplicates
SELECT time_stamp, type, true
FROM event
WHERE (type, time_stamp) NOT IN (SELECT type, time_stamp FROM base)
ORDER BY type, time_stamp
There are a lot of caveats with this. It assumes no duplicate time_stamp for a given type. Really the joins should be based on a unique id rather than type and time_stamp. I didn't test this much, but it may at least suggest an approach.
This is my first time to try a LATERAL join. So there may be a way to simplify that moe. Really what I wanted to do was a recursive CTE with the recursive part using MIN(time_stamp) based on time_stamp > (x.time_stamp + 10), but aggregate functions are not allowed in CTEs in that manner. But it seems the lateral join can be used in the CTE.

Select multiple row values into single row with multi-table clauses

I've searched the forums and while I see similar posts, they only address pieces of the full query I need to formulate (array_aggr, where exists, joins, etc.). If the question I'm posting has been answered, I will gladly accept references to those threads.
I did find this thread ...which is very similar to what I need, except it is for MySQL, and I kept running into errors trying to get it into psql syntax. Hoping someone can help me get everything together. Here's the scenario:
Attribute
attrib_id | attrib_name
UserAttribute
user_id | attrib_id | value
Here's a small example of what the data looks like:
Attribute
attrib_id | attrib_name
-----------------------
1 | attrib1
2 | attrib2
3 | attrib3
4 | attrib4
5 | attrib5
UserAttribute -- there can be up to 15 attrib_id's/value's per user_id
user_id | attrib_id | value
----------------------------
101 | 1 | valueA
101 | 2 | valueB
102 | 1 | valueC
102 | 2 | valueD
103 | 1 | valueA
103 | 2 | valueB
104 | 1 | valueC
104 | 2 | valueD
105 | 1 | valueA
105 | 2 | valueB
Here's what I'm looking for
Result
user_id | attrib1_value | attrib2_value
--------------------------------------------------------
101 | valueA | valueB
102 | valueC | valueD
103 | valueA | valueB
104 | valueC | valueD
105 | valueA | valueB
As shown, I'm looking for single rows that contain:
- user_id from the UserAttribute table
- attribute values from the UserAttribute table
Note: I only need attribute values from the UserAttribute table for two specific attribute names in the Attribute table
Again, any help or reference to an existing solution would be greatly appreciated.
UPDATE:
#ronin provided a query that gets the results desired:
SELECT ua.user_id
,MAX(CASE WHEN a.attrib_name = 'attrib1' THEN ua.value ELSE NULL END) AS attrib_1_val
,MAX(CASE WHEN a.attrib_name = 'attrib2' THEN ua.value ELSE NULL END) AS attrib_2_val
FROM UserAttribute ua
JOIN Attribute a ON (a.attrib_id = ua.attrib_id)
WHERE a.attrib_name IN ('attrib1', 'attrib2')
GROUP BY ua.user_id;
To build on that, I tried to add some 'LIKE' pattern matching within the 'WHEN' condition (against the ua.value), but everything ends up as the 'FALSE' value. Will start a new question to see if that can be incorporated if I cannot figure it out. Thanks all for the help!!
If each attribute only has a single value for a user, you can start by making a sparse matrix:
SELECT user_id
,CASE WHEN attrib_id = 1 THEN value ELSE NULL END AS attrib_1_val
,CASE WHEN attrib_id = 2 THEN value ELSE NULL END AS attrib_2_val
FROM UserAttribute;
Then compress the matrix using an aggregate function:
SELECT user_id
,MAX(CASE WHEN attrib_id = 1 THEN value ELSE NULL END) AS attrib_1_val
,MAX(CASE WHEN attrib_id = 2 THEN value ELSE NULL END) AS attrib_2_val
FROM UserAttribute
GROUP BY user_id;
In response to the comment, searching by attribute name rather than id:
SELECT ua.user_id
,MAX(CASE WHEN a.attrib_name = 'attrib1' THEN ua.value ELSE NULL END) AS attrib_1_val
,MAX(CASE WHEN a.attrib_name = 'attrib2' THEN ua.value ELSE NULL END) AS attrib_2_val
FROM UserAttribute ua
JOIN Attribute a ON (a.attrib_id = ua.attrib_id)
WHERE a.attrib_name IN ('attrib1', 'attrib2')
GROUP BY ua.user_id;
Starting with Postgres 9.4 you can use the simpler aggregate FILTER clause:
SELECT user_id
,MAX(value) FILTER (WHERE attrib_id = 1) AS attrib_1_val
,MAX(value) FILTER (WHERE attrib_id = 2) AS attrib_2_val
FROM UserAttribute
WHERE attrib_id IN (1,2)
GROUP BY 1;
For more than a few attributes or for top performance, look to crosstab() from the additional module tablefunc (Postgres 8.3+). Details here:
PostgreSQL Crosstab Query
What about something like this:
select ua.user_id, a.attrib_name attrib_value1, a2.attrib_name attrib_value2
from user_attribute ua
left join attribute a on a.atribute_id=ua.attribute_id and a.attribute_id in (1,2)
left join user_attribute ua2 on ua2.user_id=ua.user_id and ua2.attribute_id > ua.attribute_id
left join attribute a2 on a2.attribute_id=ua2.attribute_id and a2.attribute_id in (1,2)