how to optimize a query which needs timestamp normalization - postgresql

I have the following data source, which has several physical values (one per column) coming from several devices at different times:
+-----------+------------+---------+-------+
| id_device | timestamp  | Vln1    | kWl1  |
+-----------+------------+---------+-------+
|       123 | 1495696500 |         |       |
|       122 | 1495696800 |         |       |
|       122 | 1495697100 | 230     | 5.748 |
|       122 | 1495697100 | 230     | 5.185 |
|       124 | 1495700100 | 226.119 | 0.294 |
|       122 | 1495713900 | 230     |       |
|       122 | 1495716000 |         |       |
|       122 | 1495716300 | 230     |       |
|       122 | 1495716300 |         |       |
|       122 | 1495716300 |         |       |
|       122 | 1495716600 | 230     | 4.606 |
|       122 | 1495716600 |         |       |
|       124 | 1495739100 |         |       |
|       123 | 1495739400 |         |       |
+-----------+------------+---------+-------+
timestamp is (unfortunately) bigint, and each device sends data at different times and with a different frequency: some devices push every 5 minutes, others every 10 minutes, others every 15 minutes. The physical values can be NULL.
A front-end application needs to plot charts - let us say line charts - over a specific time span, with time ticks every N minutes. The tick interval is chosen by the user.
A chart can be made of multiple physical values of multiple devices, and each line is an independent request made to the backend.
Let us think about a case where:
the chosen time tick is 10 mins
two lines to plot are chosen, having two different physical values (columns) on two different devices:
A device pushes every 5 mins
The other every 10 mins
What the front-end app expects are normalized results:
<timestamp>, <value>
Where
timestamp represents rounded time (00:00, 00:10, 00:20, and so forth)
in case there are more than one value in each "timebox" (ex: there will be 2 values for a device pushing every 5 minutes within 00:00 and 00:10), a single value will be returned, which is an aggregated value (AVG)
In order to accomplish this I created some plpgsql functions that help me, but I'm not sure that what I'm doing is the best in terms of performance.
Basically what I do is:
Get the data for the particular device and physical measure, within the selected timespan
Normalize the returned data: each timestamp is rounded down to the selected time tick (i.e. 10:12:23 -> 10:10:00), so that each tuple represents a value within a "time bucket"
Create a range of time buckets, according to the time ticks the user selected
JOIN the timestamp-normalized data with the range, aggregating (AVG) in case of multiple values within the same range
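Incidentally, since the stored timestamp is already an epoch bigint, the per-row rounding in the second step can be done with plain integer arithmetic instead of converting to timestamp and back, which is cheaper. A sketch, assuming a 10-minute tick (600 seconds) and a hypothetical table name hst_device:

```sql
-- Round each epoch to the start of its 10-minute bucket (600 s)
-- using integer division; no per-row timestamp conversion needed.
SELECT ("timestamp" / 600) * 600 AS bucket_start,
       avg(kWl1)                 AS val
FROM   public.hst_device          -- hypothetical table name
WHERE  id_device = 122
GROUP  BY 1
ORDER  BY 1;
```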
Here are my functions:
create or replace function app_iso50k1.blkGetTimeSelParams(
    t_end bigint,
    t_granularity integer,
    t_span bigint,
    OUT delta_time_bucket interval,
    OUT b_timebox timestamp,
    OUT e_timebox timestamp)
as $$
DECLARE
    delta_time interval;
BEGIN
    /* normalization: truncate to the minute */
    t_end = extract('epoch' from date_trunc('minute', (to_timestamp(t_end) at time zone 'UTC')::timestamp));
    delta_time = app_iso50k1.blkGetDeltaTimeBucket(t_end, t_granularity);
    e_timebox = date_trunc('minute', (to_timestamp(t_end - extract('epoch' from delta_time)) at time zone 'UTC'))::timestamp;
    b_timebox = (to_timestamp(extract('epoch' from e_timebox) - t_span) at time zone 'UTC')::timestamp;
    delta_time_bucket = delta_time;
END
$$ immutable language plpgsql security invoker;
create or replace function app_iso50k1.getPhyMetData(
    tablename character varying,
    t_span bigint,
    t_end bigint,
    t_granularity integer,
    idinstrum integer,
    id_device integer,
    varname character varying,
    op character varying,
    page_size int,
    page int)
RETURNS TABLE(times bigint, val double precision) as $$
DECLARE
    series REFCURSOR;
    serie RECORD;
    first_notnull bool = false;
    prev_val double precision;
    time_params record;
    q_offset int;
BEGIN
    time_params = app_iso50k1.blkGetTimeSelParams(t_end, t_granularity, t_span);
    if (page = 1) then
        q_offset = 0;
    else
        q_offset = page_size * (page - 1);
    end if;
    if not public.blkIftableexists('resgetphymetdata') then
        create temporary table resgetphymetdata (times bigint, val double precision);
    else
        truncate table resgetphymetdata;
    end if;
    execute format($ff$
        insert into resgetphymetdata (
            /* generate every possible bucket start between these dates */
            with ranges as (
                select generate_series($1, $2, $5 * interval '1 minute') as range_start
            ),
            /* normalize the data to the <t_granularity>-minute interval it belongs to */
            rounded_hst as (
                select
                    date_trunc('minutes', (to_timestamp("timestamp") at time zone 'UTC')::timestamp)::timestamp
                    - mod(extract('minutes' from ((to_timestamp("timestamp") at time zone 'UTC')::timestamp))::int, $5) * interval '1 minute' as round_time,
                    *
                from public.%I
                where
                    idinstrum = $3 and
                    id_device = $4 and
                    "timestamp" <= $8
            )
            select
                extract('epoch' from r.range_start)::bigint as times,
                %s(hd.%I) as val
            from ranges r
            left join rounded_hst hd on r.range_start = hd.round_time
            group by r.range_start
            order by r.range_start
            limit $6 offset $7
        );
    $ff$, tablename, op, varname)
    using time_params.b_timebox, time_params.e_timebox, idinstrum, id_device,
          t_granularity, page_size, q_offset, t_end;
    /* data cleansing: val holes between not-null values are filled with the previous value */
    open series no scroll for select * from resgetphymetdata;
    loop
        fetch series into serie;
        exit when not found;
        if not first_notnull then
            if serie.val notnull then
                first_notnull = true;
                prev_val = serie.val;
            end if;
        else
            if serie.val is null then
                update resgetphymetdata
                set val = prev_val
                where current of series;
            else
                prev_val = serie.val;
            end if;
        end if;
    end loop;
    close series;
    return query select * from resgetphymetdata;
END;
$$ volatile language plpgsql security invoker;
Do you see good alternatives to what I coded? Is there room for improvements?
Thanks!

You can fully translate your iterative logic into a pure SQL query.
You can parametrize the query with a function.
For better performance, use the SQL language (rather than plpgsql) for that function.
You can build partial sums over time-series intervals using window functions, as explained here:
Window function trailing dates in PostgreSQL
Other suggestions:
Manage NULL values with coalesce
Avoid the timestamp conversion with a dedicated computed column
You can use small computed views and join them in your final query, or use a LATERAL JOIN
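As a sketch of the "pure SQL" idea (untested; it reuses the temp table and column names from the question): the cursor loop that fills NULL holes with the previous value can be replaced by a window query that locates, for each row, the latest bucket at or before it that has a non-NULL value, and joins back to it:

```sql
-- Hypothetical sketch: carry the last non-NULL val forward without a cursor.
with filled as (
    select
        times,
        val,
        max(times) filter (where val is not null)
            over (order by times
                  rows between unbounded preceding and current row) as last_ts
    from resgetphymetdata
)
select f.times,
       coalesce(f.val, p.val) as val   -- leading NULLs stay NULL, as in the loop
from filled f
left join resgetphymetdata p on p.times = f.last_ts
order by f.times;
```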

Related

Coalesce overlapping time ranges in PostgreSQL

I have a PostgreSQL (9.4) table that contains time stamp ranges and user IDs, and I need to collapse any overlapping ranges (with the same user ID) into a single record.
I've tried a complicated set of CTEs to accomplish this, but there are some edge cases in our (40,000+ rows) real table that complicate matters. I've come to the conclusion that I probably need a recursive CTE, but I haven't had any luck writing it.
Here's some code to create a test table and populate it with data. This isn't the exact layout of our table, but it's close enough for an example.
CREATE TABLE public.test
(
id serial,
sessionrange tstzrange,
fk_user_id integer
);
insert into test (sessionrange, fk_user_id)
values
('[2016-01-14 11:57:01-05,2016-01-14 12:06:59-05]', 1)
,('[2016-01-14 12:06:53-05,2016-01-14 12:17:28-05]', 1)
,('[2016-01-14 12:17:24-05,2016-01-14 12:21:56-05]', 1)
,('[2016-01-14 18:18:00-05,2016-01-14 18:42:09-05]', 2)
,('[2016-01-14 18:18:08-05,2016-01-14 18:18:15-05]', 1)
,('[2016-01-14 18:38:12-05,2016-01-14 18:48:20-05]', 1)
,('[2016-01-14 18:18:16-05,2016-01-14 18:18:26-05]', 1)
,('[2016-01-14 18:18:24-05,2016-01-14 18:18:31-05]', 1)
,('[2016-01-14 18:18:12-05,2016-01-14 18:18:20-05]', 3)
,('[2016-01-14 19:32:12-05,2016-01-14 23:18:20-05]', 3)
,('[2016-01-14 18:18:16-05,2016-01-14 18:18:26-05]', 4)
,('[2016-01-14 18:18:24-05,2016-01-14 18:18:31-05]', 2);
I have found that I can do this to get the sessions sorted by the time they started:
select * from test order by fk_user_id, sessionrange
I could use this to determine whether an individual record overlaps with the previous, using window functions:
SELECT *, sessionrange && lag(sessionrange) OVER (PARTITION BY fk_user_id ORDER BY sessionrange)
FROM test
ORDER BY fk_user_id, sessionrange
But this only detects whether the single previous record overlaps the current one (see the record where id = 6). I need to detect all the way back to the beginning of the partition.
After that, I'd need to group any records that overlap together, to find the beginning of the earliest session and the end of the last session to terminate.
I'm sure there's a way to do this that I'm overlooking. How can I collapse these overlapping records?
It is relatively easy to merge overlapping ranges as elements of an array. For simplicity the following function returns set of tstzrange:
create or replace function merge_ranges(tstzrange[])
returns setof tstzrange language plpgsql as $$
declare
t tstzrange;
r tstzrange;
begin
foreach t in array $1 loop
if r && t then r:= r + t;
else
if r notnull then return next r;
end if;
r:= t;
end if;
end loop;
if r notnull then return next r;
end if;
end $$;
Just aggregate the ranges for a user (in order, since the function scans the array sequentially) and use the function:
select fk_user_id, merge_ranges(array_agg(sessionrange order by sessionrange))
from test
group by 1
order by 1, 2
fk_user_id | merge_ranges
------------+-----------------------------------------------------
1 | ["2016-01-14 17:57:01+01","2016-01-14 18:21:56+01"]
1 | ["2016-01-15 00:18:08+01","2016-01-15 00:18:15+01"]
1 | ["2016-01-15 00:18:16+01","2016-01-15 00:18:31+01"]
1 | ["2016-01-15 00:38:12+01","2016-01-15 00:48:20+01"]
2 | ["2016-01-15 00:18:00+01","2016-01-15 00:42:09+01"]
3 | ["2016-01-15 00:18:12+01","2016-01-15 00:18:20+01"]
3 | ["2016-01-15 01:32:12+01","2016-01-15 05:18:20+01"]
4 | ["2016-01-15 00:18:16+01","2016-01-15 00:18:26+01"]
(8 rows)
Alternatively, the algorithm can be applied to the entire table in one function loop. I'm not sure but for a large dataset this method should be faster.
create or replace function merge_ranges_in_test()
returns setof test language plpgsql as $$
declare
curr test;
prev test;
begin
for curr in
select *
from test
order by fk_user_id, sessionrange
loop
if prev notnull and prev.fk_user_id <> curr.fk_user_id then
return next prev;
prev:= null;
end if;
if prev.sessionrange && curr.sessionrange then
prev.sessionrange:= prev.sessionrange + curr.sessionrange;
else
if prev notnull then
return next prev;
end if;
prev:= curr;
end if;
end loop;
return next prev;
end $$;
Results:
select *
from merge_ranges_in_test();
id | sessionrange | fk_user_id
----+-----------------------------------------------------+------------
1 | ["2016-01-14 17:57:01+01","2016-01-14 18:21:56+01"] | 1
5 | ["2016-01-15 00:18:08+01","2016-01-15 00:18:15+01"] | 1
7 | ["2016-01-15 00:18:16+01","2016-01-15 00:18:31+01"] | 1
6 | ["2016-01-15 00:38:12+01","2016-01-15 00:48:20+01"] | 1
4 | ["2016-01-15 00:18:00+01","2016-01-15 00:42:09+01"] | 2
9 | ["2016-01-15 00:18:12+01","2016-01-15 00:18:20+01"] | 3
10 | ["2016-01-15 01:32:12+01","2016-01-15 05:18:20+01"] | 3
11 | ["2016-01-15 00:18:16+01","2016-01-15 00:18:26+01"] | 4
(8 rows)
The problem is very interesting. I've tried to find a recursive solution but it seems the procedural attempt is most natural and efficient.
I have finally found a recursive solution. The query deletes overlapping rows and inserts their compacted equivalent:
with recursive cte (user_id, ids, range) as (
select t1.fk_user_id, array[t1.id, t2.id], t1.sessionrange + t2.sessionrange
from test t1
join test t2
on t1.fk_user_id = t2.fk_user_id
and t1.id < t2.id
and t1.sessionrange && t2.sessionrange
union all
select user_id, ids || t.id, range + sessionrange
from cte
join test t
on user_id = t.fk_user_id
and ids[cardinality(ids)] < t.id
and range && t.sessionrange
),
list as (
select distinct on(id) id, range, user_id
from cte, unnest(ids) id
order by id, upper(range)- lower(range) desc
),
deleted as (
delete from test
where id in (select id from list)
)
insert into test
select distinct on (range) id, range, user_id
from list
order by range, id;
Results:
select *
from test
order by 3, 2;
id | sessionrange | fk_user_id
----+-----------------------------------------------------+------------
1 | ["2016-01-14 17:57:01+01","2016-01-14 18:21:56+01"] | 1
5 | ["2016-01-15 00:18:08+01","2016-01-15 00:18:15+01"] | 1
7 | ["2016-01-15 00:18:16+01","2016-01-15 00:18:31+01"] | 1
6 | ["2016-01-15 00:38:12+01","2016-01-15 00:48:20+01"] | 1
4 | ["2016-01-15 00:18:00+01","2016-01-15 00:42:09+01"] | 2
9 | ["2016-01-15 00:18:12+01","2016-01-15 00:18:20+01"] | 3
10 | ["2016-01-15 01:32:12+01","2016-01-15 05:18:20+01"] | 3
11 | ["2016-01-15 00:18:16+01","2016-01-15 00:18:26+01"] | 4
(8 rows)
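For completeness, the same merge can also be expressed without a function or recursion, using the window-function "gaps and islands" pattern (a sketch, untested): a row starts a new group when its range begins after everything seen so far in its partition, and each group is then collapsed with min/max:

```sql
-- Mark rows that start a new group, number the groups, collapse each group.
WITH marked AS (
    SELECT *,
           lower(sessionrange) > coalesce(
               max(upper(sessionrange)) OVER (
                   PARTITION BY fk_user_id
                   ORDER BY sessionrange
                   ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING),
               '-infinity') AS new_group
    FROM test
), grouped AS (
    SELECT *,
           count(*) FILTER (WHERE new_group)
               OVER (PARTITION BY fk_user_id ORDER BY sessionrange) AS grp
    FROM marked
)
SELECT fk_user_id,
       tstzrange(min(lower(sessionrange)), max(upper(sessionrange)), '[]') AS merged
FROM grouped
GROUP BY fk_user_id, grp
ORDER BY 1, 2;
```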

Can window function LAG reference the column whose value is being calculated?

I need to calculate value of some column X based on some other columns of the current record and the value of X for the previous record (using some partition and order). Basically I need to implement query in the form
SELECT <some fields>,
<some expression using LAG(X) OVER(PARTITION BY ... ORDER BY ...) AS X
FROM <table>
This is not possible because only existing columns can be used in window function so I'm looking way how to overcome this.
Here is an example. I have a table with events. Each event has type and time_stamp.
create table event (id serial, type integer, time_stamp integer);
I want to find "duplicate" events (to skip them). By duplicate I mean the following. Let's order all events for a given type by time_stamp ascending. Then
the first event is not a duplicate
all events that follow a non-duplicate and are within some time frame after it (that is, their time_stamp is not greater than the time_stamp of the previous non-duplicate plus some constant TIMEFRAME) are duplicates
the next event whose time_stamp is greater than the previous non-duplicate's by more than TIMEFRAME is not a duplicate
and so on
For this data
insert into event (type, time_stamp)
values
(1, 1), (1, 2), (2, 2), (1,3), (1, 10), (2,10),
(1,15), (1, 21), (2,13),
(1, 40);
and TIMEFRAME=10 result should be
time_stamp | type | duplicate
-----------+------+----------
         1 |    1 | false
         2 |    1 | true
         3 |    1 | true
        10 |    1 | true
        15 |    1 | false
        21 |    1 | true
        40 |    1 | false
         2 |    2 | false
        10 |    2 | true
        13 |    2 | false
I could calculate the value of the duplicate field based on the current time_stamp and the time_stamp of the previous non-duplicate event like this:
WITH evt AS (
SELECT
time_stamp,
CASE WHEN
time_stamp - LAG(current_non_dupl_time_stamp) OVER w >= TIMEFRAME
THEN
time_stamp
ELSE
LAG(current_non_dupl_time_stamp) OVER w
END AS current_non_dupl_time_stamp
FROM event
WINDOW w AS (PARTITION BY type ORDER BY time_stamp ASC)
)
SELECT time_stamp, time_stamp != current_non_dupl_time_stamp AS duplicate
FROM evt
But this does not work, because the field being calculated cannot be referenced in LAG:
ERROR: column "current_non_dupl_time_stamp" does not exist.
So the question: can I rewrite this query to achieve the effect I need?
Naive recursive chain knitter:
-- temp view to avoid nested CTE
CREATE TEMP VIEW drag AS
SELECT e.type,e.time_stamp
, ROW_NUMBER() OVER www as rn -- number the records
, FIRST_VALUE(e.time_stamp) OVER www as fst -- the "group leader"
, EXISTS (SELECT * FROM event x
WHERE x.type = e.type
AND x.time_stamp < e.time_stamp) AS is_dup
FROM event e
WINDOW www AS (PARTITION BY type ORDER BY time_stamp)
;
WITH RECURSIVE ttt AS (
SELECT d0.*
FROM drag d0 WHERE d0.is_dup = False -- only the "group leaders"
UNION ALL
SELECT d1.type, d1.time_stamp, d1.rn
, CASE WHEN d1.time_stamp - ttt.fst > 20 THEN d1.time_stamp
ELSE ttt.fst END AS fst -- new "group leader"
, CASE WHEN d1.time_stamp - ttt.fst > 20 THEN False
ELSE True END AS is_dup
FROM drag d1
JOIN ttt ON d1.type = ttt.type AND d1.rn = ttt.rn+1
)
SELECT * FROM ttt
ORDER BY type, time_stamp
;
Results:
CREATE TABLE
INSERT 0 10
CREATE VIEW
type | time_stamp | rn | fst | is_dup
------+------------+----+-----+--------
1 | 1 | 1 | 1 | f
1 | 2 | 2 | 1 | t
1 | 3 | 3 | 1 | t
1 | 10 | 4 | 1 | t
1 | 15 | 5 | 1 | t
1 | 21 | 6 | 1 | t
1 | 40 | 7 | 40 | f
2 | 2 | 1 | 2 | f
2 | 10 | 2 | 2 | t
2 | 13 | 3 | 2 | t
(10 rows)
An alternative to a recursive approach is a custom aggregate. Once you master the technique of writing your own aggregates, creating transition and final functions is easy and logical.
State transition function:
create or replace function is_duplicate(st int[], time_stamp int, timeframe int)
returns int[] language plpgsql as $$
begin
if st is null or st[1] + timeframe <= time_stamp
then
st[1] := time_stamp;
end if;
st[2] := time_stamp;
return st;
end $$;
Final function:
create or replace function is_duplicate_final(st int[])
returns boolean language sql as $$
select st[1] <> st[2];
$$;
Aggregate:
create aggregate is_duplicate_agg(time_stamp int, timeframe int)
(
sfunc = is_duplicate,
stype = int[],
finalfunc = is_duplicate_final
);
Query:
select *, is_duplicate_agg(time_stamp, 10) over w
from event
window w as (partition by type order by time_stamp asc)
order by type, time_stamp;
id | type | time_stamp | is_duplicate_agg
----+------+------------+------------------
1 | 1 | 1 | f
2 | 1 | 2 | t
4 | 1 | 3 | t
5 | 1 | 10 | t
7 | 1 | 15 | f
8 | 1 | 21 | t
10 | 1 | 40 | f
3 | 2 | 2 | f
6 | 2 | 10 | t
9 | 2 | 13 | f
(10 rows)
Read in the documentation: 37.10. User-defined Aggregates and CREATE AGGREGATE.
This feels more like a recursive problem than windowing function. The following query obtained the desired results:
WITH RECURSIVE base(type, time_stamp, next_time_stamp) AS (
-- 3. base of recursive query
SELECT x.type, x.time_stamp, y.next_time_stamp
FROM
-- 1. start with the initial records of each type
( SELECT type, min(time_stamp) AS time_stamp
FROM event
GROUP BY type
) x
LEFT JOIN LATERAL
-- 2. for each of the initial records, find the next TIMEFRAME (10) in the future
( SELECT MIN(time_stamp) next_time_stamp
FROM event
WHERE type = x.type
AND time_stamp > (x.time_stamp + 10)
) y ON true
UNION ALL
-- 4. recursive join, same logic as base
SELECT e.type, e.time_stamp, z.next_time_stamp
FROM event e
JOIN base b ON (e.type = b.type AND e.time_stamp = b.next_time_stamp)
LEFT JOIN LATERAL
( SELECT MIN(time_stamp) next_time_stamp
FROM event
WHERE type = e.type
AND time_stamp > (e.time_stamp + 10)
) z ON true
)
-- The actual query:
-- 5a. All records from base are not duplicates
SELECT time_stamp, type, false
FROM base
UNION
-- 5b. All records from event that are not in base are duplicates
SELECT time_stamp, type, true
FROM event
WHERE (type, time_stamp) NOT IN (SELECT type, time_stamp FROM base)
ORDER BY type, time_stamp
There are a lot of caveats with this. It assumes no duplicate time_stamp for a given type; really, the joins should be based on a unique id rather than on type and time_stamp. I didn't test this much, but it may at least suggest an approach.
This is my first time trying a LATERAL join, so there may be a way to simplify it more. What I really wanted was a recursive CTE whose recursive part uses MIN(time_stamp) based on time_stamp > (x.time_stamp + 10), but aggregate functions are not allowed in CTEs in that manner. It seems, though, that a lateral join can be used inside the CTE.

Collect rating statistics using postgresql

I am currently trying to collect rating statistics from a PostgreSQL database. Below you can find a simplified example of the database schema I would like to query.
CREATE DATABASE test_db;
CREATE TABLE rateable_object (
id BIGSERIAL PRIMARY KEY,
cdate TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
mdate TIMESTAMP,
name VARCHAR(160) NOT NULL,
description VARCHAR NOT NULL
);
CREATE TABLE ratings (
id BIGSERIAL PRIMARY KEY,
cdate TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
mdate TIMESTAMP,
parent_id BIGINT NOT NULL,
rating INTEGER NOT NULL DEFAULT -1
);
I would now like to collect a statistic for the values in the ratings column. The response should look like this:
+--------------+-------+
| column_value | count |
+--------------+-------+
|           -1 |     2 |
|            0 |    45 |
|            1 |    37 |
|            2 |    13 |
|            3 |     5 |
|            4 |    35 |
|            5 |    75 |
+--------------+-------+
My first solution (see below) is very naive and probably neither the fastest nor the simplest one. So my question is whether there is a better solution.
WITH
stars AS (SELECT generate_series(-1, 5) AS value),
votes AS (SELECT * FROM ratings WHERE parent_id = 1)
SELECT
stars.value AS stars, coalesce(COUNT(votes.*), 0) as votes
FROM
stars
LEFT JOIN
votes
ON
votes.rating = stars.value
GROUP BY stars.value
ORDER BY stars.value;
As I would not like to waste your time, I prepared some test data for you:
INSERT INTO rateable_object (name, description) VALUES
('Penguin', 'This is the Linux penguin.'),
('Gnu', 'This is the GNU gnu.'),
('Elephant', 'This is the PHP elephant.'),
('Elephant', 'This is the postgres elephant.'),
('Duck', 'This is the duckduckgo duck.'),
('Cat', 'This is the GitHub cat.'),
('Bird', 'This is the Twitter bird.'),
('Lion', 'This is the Leo lion.');
CREATE OR REPLACE FUNCTION generate_test_data() RETURNS INTEGER LANGUAGE plpgsql AS
$$
BEGIN
FOR i IN 0..1000 LOOP
INSERT INTO ratings (parent_id, rating) VALUES
(
(1 + (10 - 1) * random())::numeric::int,
(-1 + (5 + 1) * random())::numeric::int
);
END LOOP;
RETURN 0;
END;
$$;
SELECT generate_test_data();
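The approach itself (generate the full star range, then LEFT JOIN the votes) is sound; the two CTEs can simply be folded away, and since COUNT never returns NULL the coalesce is unnecessary. A sketch of the same query, untested:

```sql
SELECT s.value     AS stars,
       count(r.id) AS votes   -- count(r.id) is 0 when no vote matches
FROM   generate_series(-1, 5) AS s(value)
LEFT   JOIN ratings r
       ON  r.rating = s.value
       AND r.parent_id = 1
GROUP  BY s.value
ORDER  BY s.value;
```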

Get integer part of number

So I have a table with numbers in decimals, say
id    value
2323   2.43
4954  63.98
And I would like to get
id    value
2323   2
4954  63
Is there a simple function in T-SQL to do that?
SELECT FLOOR(value)
http://msdn.microsoft.com/en-us/library/ms178531.aspx
FLOOR returns the largest integer less than or equal to the specified numeric expression.
Assuming you are OK with truncation of the decimal part you can do:
SELECT Id, CAST(value AS INT) INTO IntegerTable FROM NumericTable
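In T-SQL specifically, truncation toward zero (which matches the "integer part" even for negative numbers) can also be spelled with ROUND's optional third argument, which switches it from rounding to truncating:

```sql
-- In T-SQL, ROUND(x, 0, f) truncates instead of rounding when f <> 0.
SELECT id, ROUND(value, 0, 1) AS int_part FROM NumericTable;
```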
FLOOR and CAST do not return the integer part for negative numbers; a solution is to define a stored function for the integer part:
DELIMITER //
DROP FUNCTION IF EXISTS INTEGER_PART//
CREATE FUNCTION INTEGER_PART(n DOUBLE)
RETURNS INTEGER
DETERMINISTIC
BEGIN
IF (n >= 0) THEN RETURN FLOOR(n);
ELSE RETURN CEILING(n);
END IF;
END
//
MariaDB [sidonieDE]> SELECT INTEGER_PART(3.7);
+-------------------+
| INTEGER_PART(3.7) |
+-------------------+
| 3 |
+-------------------+
1 row in set (0.00 sec)
MariaDB [sidonieDE]> SELECT INTEGER_PART(-3.7);
+--------------------+
| INTEGER_PART(-3.7) |
+--------------------+
| -3 |
+--------------------+
1 row in set (0.00 sec)
Afterwards you can use the function in a query like this:
SELECT INTEGER_PART(value) FROM table;
If you do not want to define a stored function in the database, you can put an IF in the query instead:
select if(value < 0,CEILING(value),FLOOR(value)) from table ;
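For PostgreSQL (the dialect of the surrounding questions), no user-defined function is needed at all: trunc() already truncates toward zero, unlike floor():

```sql
SELECT trunc(-3.7);  -- -3 (toward zero)
SELECT floor(-3.7);  -- -4 (toward negative infinity)
```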

code works to a point

The following code looks good to me, but it only works up to a point. The function should display the grade levels of students based on exam performance, but it does not run the last two branches, so if a student scores lower than 50 the function still displays "pass".
CREATE OR REPLACE FUNCTION stud_Result(integer,numeric) RETURNS text
AS
$$
DECLARE
stuNum ALIAS FOR $1;
grade ALIAS FOR $2;
result TEXT;
BEGIN
IF grade >= 70.0 THEN SELECT 'distinction' INTO result FROM student,entry
WHERE student.sno = entry.sno AND student.sno = stuNum;
ELSIF grade >=50.0 OR grade <=70.0 THEN SELECT 'pass' INTO result FROM student,entry
WHERE student.sno = entry.sno AND student.sno = stuNum;
ELSIF grade >0 OR grade< 50.0 THEN SELECT 'fail' INTO result FROM student,entry
WHERE student.sno = entry.sno AND student.sno = stuNum;
ELSE SELECT 'NOT TAKEN' INTO result FROM student,entry
WHERE student.sno = entry.sno AND student.sno = stuNum;
END IF;
RETURN result;
END;$$
LANGUAGE PLPGSQL;
Can anyone point me to the problem?
This is a PostgreSQL gotcha that has tripped me up as well. You need to replace your ELSE IFs with ELSIF.
You're seeing that error because each successive ELSE IF is being interpreted as starting a nested IF block, which expects its own END IF;.
See the documentation on conditionals for more information on the proper syntax.
Your logic in the conditionals is a bit strange. You have these:
grade >= 70.0
grade >= 50.0 OR grade <= 70.0
grade > 0 OR grade < 50.0
Note that zero satisfies the second condition, as do a lot of other values that you don't want in that branch of the conditional. I think you want these:
grade >= 70.0
grade >= 50.0 AND grade <= 70.0
grade > 0 AND grade < 50.0
You also seem to be using your SELECTs to check if the person is in the course but if the grade is given and they're not in the course, you will end up with a NULL result. Either the "in the course" check should be outside your function or you should convert a NULL result to 'NOT TAKEN' before returning.
This looks like homework, so I'm not going to be any more explicit than this.
Generally, I don't think it is a good idea to hide data in code. Data belongs in tables:
SET search_path='tmp';
-- create some data
DROP TABLE tmp.student CASCADE;
CREATE TABLE tmp.student
( sno INTEGER NOT NULL
, grade INTEGER
, sname varchar
);
INSERT INTO tmp.student(sno) SELECT generate_series(1,10);
UPDATE tmp.student SET grade = sno*sno;
DROP TABLE tmp.entry CASCADE;
CREATE TABLE tmp.entry
( sno INTEGER NOT NULL
, sdate TIMESTAMP
);
INSERT INTO tmp.entry(sno) SELECT generate_series(1,10);
-- table with interval lookup
DROP TABLE tmp.lookup CASCADE;
CREATE TABLE tmp.lookup
( llimit NUMERIC NOT NULL
, hlimit NUMERIC
, result varchar
);
INSERT INTO lookup (llimit,hlimit,result) VALUES(70, NULL, 'Excellent'), (50, 70, 'Passed'), (30, 50, 'Failed')
;
CREATE OR REPLACE FUNCTION stud_result(integer,numeric) RETURNS text
AS $BODY$
DECLARE
stunum ALIAS FOR $1;
grade ALIAS FOR $2;
result TEXT;
BEGIN
SELECT COALESCE(lut.result, 'NOT TAKEN') INTO result
FROM student st, entry en
LEFT JOIN lookup lut ON (grade >= lut.llimit
AND (grade < lut.hlimit OR lut.hlimit IS NULL) )
WHERE st.sno = en.sno
AND st.sno = stunum
;
RETURN result;
END; $BODY$ LANGUAGE PLPGSQL;
-- query joining students with their function values
SELECT st.*
, stud_result (st.sno, st.grade)
FROM student st
;
But wait: you can just as well do without the ugly function:
-- Plain query
SELECT
st.sno, st.sname, st.grade
, COALESCE(lut.result, 'NOT TAKEN') AS result
FROM student st
LEFT JOIN lookup lut ON ( 1=1
AND lut.llimit <= st.grade
AND ( lut.hlimit > st.grade OR lut.hlimit IS NULL)
)
JOIN entry en ON st.sno = en.sno
;
Results:
sno | grade | sname | stud_result
-----+-------+-------+-------------
1 | 1 | | NOT TAKEN
2 | 4 | | NOT TAKEN
3 | 9 | | NOT TAKEN
4 | 16 | | NOT TAKEN
5 | 25 | | NOT TAKEN
6 | 36 | | Failed
7 | 49 | | Failed
8 | 64 | | Passed
9 | 81 | | Excellent
10 | 100 | | Excellent
(10 rows)
sno | sname | grade | result
-----+-------+-------+-----------
1 | | 1 | NOT TAKEN
2 | | 4 | NOT TAKEN
3 | | 9 | NOT TAKEN
4 | | 16 | NOT TAKEN
5 | | 25 | NOT TAKEN
6 | | 36 | Failed
7 | | 49 | Failed
8 | | 64 | Passed
9 | | 81 | Excellent
10 | | 100 | Excellent
(10 rows)
Get rid of the function altogether and use a query:
SELECT
s.*,
CASE
WHEN e.grade >= 0 AND e.grade < 50 THEN 'failed'
WHEN e.grade >= 50 AND e.grade < 70 THEN 'passed'
WHEN e.grade >= 70 AND e.grade <= 100 THEN 'excellent'
ELSE 'not taken'
END
FROM
student s,
entry e
WHERE
s.sno = e.sno;