Collect rating statistics using postgresql - postgresql

I am currently trying to collect rating statistics from a PostgreSQL database. Below you can find a simplified example of the database schema I would like to query.
CREATE DATABASE test_db;
CREATE TABLE rateable_object (
id BIGSERIAL PRIMARY KEY,
cdate TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
mdate TIMESTAMP,
name VARCHAR(160) NOT NULL,
description VARCHAR NOT NULL
);
CREATE TABLE ratings (
id BIGSERIAL PRIMARY KEY,
cdate TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
mdate TIMESTAMP,
parent_id BIGINT NOT NULL,
rating INTEGER NOT NULL DEFAULT -1
);
I would now like to collect a statistic for the values in the rating column. The response should look like this:
+--------------+-------+
| column_value | count |
+--------------+-------+
| -1 | 2 |
| 0 | 45 |
| 1 | 37 |
| 2 | 13 |
| 3 | 5 |
| 4 | 35 |
| 5 | 75 |
+--------------+-------+
My first solution (see below) is very naive and probably neither the fastest nor the simplest one. So my question is whether there is a better solution.
WITH
stars AS (SELECT generate_series(-1, 5) AS value),
votes AS (SELECT * FROM ratings WHERE parent_id = 1)
SELECT
stars.value AS stars, coalesce(COUNT(votes.*), 0) as votes
FROM
stars
LEFT JOIN
votes
ON
votes.rating = stars.value
GROUP BY stars.value
ORDER BY stars.value;
As I would not like to waste your time, I prepared some test data for you:
INSERT INTO rateable_object (name, description) VALUES
('Penguin', 'This is the Linux penguin.'),
('Gnu', 'This is the GNU gnu.'),
('Elephant', 'This is the PHP elephant.'),
('Elephant', 'This is the postgres elephant.'),
('Duck', 'This is the duckduckgo duck.'),
('Cat', 'This is the GitHub cat.'),
('Bird', 'This is the Twitter bird.'),
('Lion', 'This is the Leo lion.');
CREATE OR REPLACE FUNCTION generate_test_data() RETURNS INTEGER LANGUAGE plpgsql AS
$$
BEGIN
FOR i IN 0..1000 LOOP
INSERT INTO ratings (parent_id, rating) VALUES
(
(1 + (10 - 1) * random())::numeric::int,
(-1 + (5 + 1) * random())::numeric::int
);
END LOOP;
RETURN 0;
END;
$$;
SELECT generate_test_data();
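One possible simplification, shown as a minimal sketch rather than a definitive answer: move the parent_id filter into the join condition and count a column of the joined table. COUNT over a column already yields 0 when no row matches, so the coalesce is not needed. The sketch runs against SQLite through Python's sqlite3 module purely so that it is self-contained; the helper name rating_histogram is hypothetical, and in PostgreSQL the VALUES list could stay generate_series(-1, 5).

```python
import sqlite3

# SQLite stands in for PostgreSQL here; the query uses only portable SQL.
HISTOGRAM_SQL = """
WITH stars(value) AS (VALUES (-1), (0), (1), (2), (3), (4), (5))
SELECT s.value, COUNT(r.id) AS votes
FROM stars AS s
LEFT JOIN ratings AS r
       ON r.rating = s.value
      AND r.parent_id = ?
GROUP BY s.value
ORDER BY s.value
"""


def rating_histogram(conn, parent_id):
    """Return [(rating_value, vote_count), ...] for one rateable object."""
    return conn.execute(HISTOGRAM_SQL, (parent_id,)).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ratings (id INTEGER PRIMARY KEY, "
        "parent_id INTEGER NOT NULL, rating INTEGER NOT NULL DEFAULT -1)"
    )
    conn.executemany(
        "INSERT INTO ratings (parent_id, rating) VALUES (?, ?)",
        [(1, 5), (1, 5), (1, 0), (2, 3)],
    )
    for value, votes in rating_histogram(conn, 1):
        print(value, votes)
```

Because the filter sits in the ON clause, rating values without any vote still appear with a count of 0.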

Related

PostgreSQL Fetch by dates that don't exist in jsonb item in another table

I have two tables in my Postgres database: parkingplaces, which holds all the places that I want to fetch, and a second table holding date ranges (in jsonb[] format). A place must not be selected if the date given by the user falls within one of these date ranges (each contains a start date, an end date and a range of hours).
CREATE TABLE parkingplaces
(
id integer NOT NULL DEFAULT nextval('parkingplaces_id_seq'::regclass),
namePlace character varying COLLATE pg_catalog."default" NOT NULL,
CONSTRAINT parkingplaces_pkey PRIMARY KEY (id)
);
and
CREATE TABLE parkingblockedtimeslots (
id SERIAL UNIQUE,
idplace character varying,
timeslot jsonb[],
PRIMARY KEY (idplace)
);
id | idplace | timeslot
9 | 140 | {"{\"end\": \"2020-10-09\", \"slot\": \"[08-12h, 12-16h]\", \"start\": \"2020-10-09\"}","{\"end\": \"2021-01-07\", \"slot\": \"[08-12h, 20-00h]\", \"start\": \"2021-01-05\"}","{\"end\": \"2021-02-25\", \"slot\": \"[08-12h, 20-00h]\", \"start\": \"2021-02-09\"}","{\"end\": \"2021-02-25\", \"slot\": \"[08-12h, 20-00h, 00-04h, 04-08h]\", \"start\": \"2021-02-09\"}"}
Given a date, I want to select all the places in parkingplaces that don't have an entry in parkingblockedtimeslots that contains this date.
It would look like this:
select * from parkingplaces p where NOT EXISTS
(select * from parkingblockedtimeslots b WHERE b.idplace = p.id AND [date] BETWEEN EACH ITEM in the jsonb[] )
-- [date] must be between start and end and also within the range of hours (slot item)
-- If startdate == enddate in the jsonb item -> [date] must be within the range of hours of this day
-- If enddate > startdate in the jsonb item -> [date] must be within the range of hours of each day
Thank you for your help
second table where there are date ranges (in jsonb[] format)
I would start with normalising that database schema. Instead of storing an array of objects for each place (already in a separate table at least), better store each object in its own row:
CREATE TABLE parkingblockedtimeslots (
id SERIAL PRIMARY KEY,
idplace integer REFERENCES parkingplaces(id),
timeslot jsonb
);
id | idplace | timeslot
9 | 140 | '{"end": "2020-10-09", "slot": "[08-12h, 12-16h]", "start": "2020-10-09"}'
10 | 140 | '{"end": "2021-01-07", "slot": "[08-12h, 20-00h]", "start": "2021-01-05"}'
11 | 140 | '{"end": "2021-02-25", "slot": "[08-12h, 20-00h]", "start": "2021-02-09"}'
12 | 140 | '{"end": "2021-02-25", "slot": "[08-12h, 20-00h, 00-04h, 04-08h]", "start": "2021-02-09"}'
Instead of storing jsonb objects, better store the values in individual columns of a suitable datatype. In this case, I'd suggest
CREATE TABLE parkingblockedtimeslots (
id SERIAL PRIMARY KEY,
idplace integer REFERENCES parkingplaces(id),
valid_period daterange, -- use tstzrange for hours on start and end date
hour_slots int4range[]
);
-- a GiST index on the plain integer column idplace needs the btree_gist extension
CREATE INDEX ON parkingblockedtimeslots USING gist (idplace, valid_period);
id | idplace | valid_period | hour_slots
9 | 140 | '[2020-10-09, 2020-10-09]' | '{[8, 12), [12, 16)}'
10 | 140 | '[2021-01-05, 2021-01-07]' | '{[8, 12), [20, 24)}'
11 | 140 | '[2021-02-09, 2021-02-25]' | '{[8, 12), [20, 24)}'
12 | 140 | '[2021-02-09, 2021-02-25]' | '{[0, 4), [4, 8), [8, 12), [20, 24)}'
(Syntax of literal values might need work)
With this, your query will be
SELECT *
FROM parkingplaces p
WHERE NOT EXISTS (
SELECT *
FROM parkingblockedtimeslots b
WHERE b.idplace = p.id
AND [date] <@ b.valid_period
AND [hour] <@ ANY (b.hour_slots)
)
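To make the containment test concrete, here is a small sketch of the same rule outside the database. It is only an illustration: the function name is_blocked and the tuple layout are assumptions, days are treated as inclusive on both ends (like the daterange literals above) and hour slots as half-open intervals (like [8, 12)).

```python
from datetime import date

# Each entry mirrors one row: (first_day, last_day, [(from_hour, to_hour), ...]).
# Days are inclusive on both ends; hour slots include from_hour and exclude to_hour.
BLOCKS_FOR_PLACE_140 = [
    (date(2020, 10, 9), date(2020, 10, 9), [(8, 12), (12, 16)]),
    (date(2021, 1, 5), date(2021, 1, 7), [(8, 12), (20, 24)]),
    (date(2021, 2, 9), date(2021, 2, 25), [(8, 12), (20, 24)]),
]


def is_blocked(day, hour, blocks):
    """True if some block covers the day and one of its hour slots covers the hour."""
    return any(
        first <= day <= last and any(lo <= hour < hi for lo, hi in slots)
        for first, last, slots in blocks
    )


if __name__ == "__main__":
    print(is_blocked(date(2021, 1, 6), 9, BLOCKS_FOR_PLACE_140))
    print(is_blocked(date(2021, 1, 6), 13, BLOCKS_FOR_PLACE_140))
```

A place is selectable for a given day and hour exactly when is_blocked is false for all of its rows, which is what the NOT EXISTS expresses.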

Efficiently copying a tree modelled with adjacency list in postgres

I have the following table:
CREATE TABLE tree_node (
id serial primary key,
name character varying(255),
parent_id integer references tree_node (id)
);
The table contains many trees with up to about 1000 nodes.
(I'm able to query a tree and its descendants efficiently with a recursive query).
However, I need to be able to copy a single tree in one operation. Say I have a tree with 3 nodes, ids 1,2,3 (this is potentially a large tree). I would like to make a copy of it i.e. creating new nodes with new ids. (Here the copied tree is ids 4,5,6):
id | name | parent_id
----+-----------------+-----------
1 | NodeA |
2 | NodeA.1 | 1
3 | NodeA.1.1 | 2
4 | NodeA(copy) |
5 | NodeA.1(copy) | 4
6 | NodeA.1.1(copy) | 5
Is there a way to copy a tree and its descendants more efficiently than inserting each tree node separately (because the new parent_id is needed)?
There you go:
\i tmp.sql
CREATE TABLE tree_node (
id serial primary key
, name varchar
, parent_id integer references tree_node (id)
);
INSERT INTO tree_node(name, parent_id) VALUES
( 'Node-A', NULL)
, ( 'Node-A.1', 1)
, ( 'Node-A.1.1', 2)
;
SELECT * FROM tree_node;
-- Find the top value of the sequence
-- and use it as an increment on all the copies
WITH top(val) AS
(select currval('tree_node_id_seq')
)
INSERT INTO tree_node(id, name, parent_id)
SELECT id+top.val
, name|| '(copy)'
, parent_id + top.val
FROM tree_node
CROSS JOIN top
;
SELECT * FROM tree_node;
-- bump the sequence
WITH nxt AS (
select max(id) mx from tree_node
)
SELECT setval('tree_node_id_seq', (select mx FROM nxt) )
;
Output:
DROP SCHEMA
CREATE SCHEMA
SET
CREATE TABLE
INSERT 0 3
id | name | parent_id
----+------------+-----------
1 | Node-A |
2 | Node-A.1 | 1
3 | Node-A.1.1 | 2
(3 rows)
INSERT 0 3
id | name | parent_id
----+------------------+-----------
1 | Node-A |
2 | Node-A.1 | 1
3 | Node-A.1.1 | 2
4 | Node-A(copy) |
5 | Node-A.1(copy) | 4
6 | Node-A.1.1(copy) | 5
(6 rows)
setval
--------
6
(1 row)
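Note that the INSERT above copies every row in the table, which is fine for the three-row demo. If the table holds many trees, the same offset idea can be limited to one tree by first collecting its nodes with a recursive query. The following is a rough sketch under stated assumptions: SQLite via Python's sqlite3 stands in for PostgreSQL, copy_tree is a hypothetical helper, node names are assumed to be non-NULL, and the offset is taken from max(id) instead of the sequence. In PostgreSQL the sequence would still have to be bumped afterwards, as shown above.

```python
import sqlite3

# Recursive CTE that walks from the given root down to all descendants.
SUBTREE_SQL = """
WITH RECURSIVE sub(id, name, parent_id) AS (
    SELECT id, name, parent_id FROM tree_node WHERE id = ?
    UNION ALL
    SELECT t.id, t.name, t.parent_id
    FROM tree_node AS t
    JOIN sub AS s ON t.parent_id = s.id
)
SELECT id, name, parent_id FROM sub
"""


def copy_tree(conn, root_id):
    """Copy the tree rooted at root_id and return the id of the new root."""
    nodes = conn.execute(SUBTREE_SQL, (root_id,)).fetchall()
    if not nodes:
        raise ValueError(f"no node with id {root_id}")
    # Shifting every id by the current maximum guarantees no collision.
    (offset,) = conn.execute("SELECT max(id) FROM tree_node").fetchone()
    copies = [
        (
            node_id + offset,
            name + "(copy)",
            # The copied root keeps the original root's parent (NULL for a top-level tree).
            parent_id if node_id == root_id else parent_id + offset,
        )
        for node_id, name, parent_id in nodes
    ]
    conn.executemany(
        "INSERT INTO tree_node (id, name, parent_id) VALUES (?, ?, ?)", copies
    )
    return root_id + offset
```

The parent links of the copy need no lookup table because every node is shifted by the same amount.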

Remove duplicate with lower Date from SQL result

I have the following table:
CREATE TABLE Kundendaten (
beschreiben_knr INTEGER REFERENCES Kunde(knr) DEFERRABLE INITIALLY DEFERRED,
erstelldatum DATE,
anschrift VARCHAR(40),
sonderrabat INTEGER,
PRIMARY KEY (erstelldatum, beschreiben_knr)
);
If I run this query:
select * from Kundendaten ORDER BY erstelldatum DESC;
I get:
beschreiben_knr | erstelldatum | anschrift | sonderrabat
-----------------+--------------+---------------+-------------
1 | 2015-11-01 | Winkelgasse 5 | 0
2 | 2015-11-01 | Badeteich 7 | 10
3 | 2015-11-01 | Senfgasse 7 | 15
1 | 2015-10-30 | Sonnenweg 3 | 5
But I need to get only the entry with the highest date if there is more than one. In this case the last row should not appear.
How can I achieve this in PostgreSQL?
You want something like WHERE erstelldatum = MAX(erstelldatum), but that doesn't work because aggregate functions are not allowed in a WHERE clause. You can use a subquery to get the newest date instead.
SELECT *
FROM Kundendaten
WHERE erstelldatum = (
SELECT MAX(erstelldatum) FROM Kundendaten
);
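Postgres will optimize that subquery so it is only run once, but you'll want to make sure erstelldatum is indexed (the primary key on (erstelldatum, beschreiben_knr) already provides an index that leads with that column).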
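The subquery above keeps the rows that carry the single newest date in the whole table. If what is wanted is instead the newest row per customer (an assumption about the intent; both readings give the same result for the sample data), a join against the per-customer maximum is one option. A minimal sketch, run against SQLite through Python's sqlite3 only to keep it self-contained, with dates stored as ISO strings and latest_per_customer as a hypothetical helper name:

```python
import sqlite3

# Join every row to the newest erstelldatum of its own customer.
LATEST_PER_CUSTOMER_SQL = """
SELECT k.beschreiben_knr, k.erstelldatum, k.anschrift, k.sonderrabat
FROM Kundendaten AS k
JOIN (
    SELECT beschreiben_knr, MAX(erstelldatum) AS latest
    FROM Kundendaten
    GROUP BY beschreiben_knr
) AS m
  ON m.beschreiben_knr = k.beschreiben_knr
 AND m.latest = k.erstelldatum
ORDER BY k.beschreiben_knr
"""


def latest_per_customer(conn):
    """Return one row per customer: the one with the newest erstelldatum."""
    return conn.execute(LATEST_PER_CUSTOMER_SQL).fetchall()
```

With this variant a customer whose newest entry is older than the newest entry in the table still shows up once.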

way to calculate sum of two corresponded columns in postgresql

I have two tables called "incoming" and "orders", and I want to create a view called "stock" that is computed from the data in incoming and orders.
CREATE TABLE incoming
(
id serial NOT NULL,
model integer,
size integer,
color integer,
price real,
quanity integer,
CONSTRAINT pk PRIMARY KEY (id),
CONSTRAINT "incoming_model_size_color_key" UNIQUE (model, size, color)
);
CREATE TABLE orders
(
id serial NOT NULL,
model integer,
size integer,
color integer,
price real,
quanity integer,
Comenttext text,
CONSTRAINT pk_orders PRIMARY KEY (id)
);
For now I have this dirty solution:
CREATE OR REPLACE VIEW stock AS
WITH total_orders AS (
SELECT orders.model,
orders.size,
orders.color,
sum(orders.quanity) AS sum
FROM orders
GROUP BY orders.color, orders.size, orders.model
)
SELECT incoming.model,
incoming.size,
incoming.color,
incoming.quanity - (( SELECT
CASE count(*)
WHEN 1 THEN ( SELECT total_orders_1.sum
FROM total_orders total_orders_1
WHERE incoming.model = total_orders_1.model AND incoming.size = total_orders_1.size)
ELSE 0::bigint
END AS "case"
FROM total_orders
WHERE incoming.model = total_orders.model AND incoming.size=total_orders.size)) AS quanity
FROM incoming;
How can I make this clearer and simpler?
Examples:
select * from incoming
id | model | size | color | price | quanity
----+-------+------+-------+-------+--------
1 | 1 | 6 | 5 | 550 | 15
2 | 1 | 5 | 5 | 800 | 20
select * from orders
id | model | size | color | price | quanity |
----+-------+------+-------+-------+---------+
1 | 1 | 6 | 5 | 1000 | 1 |
2 | 1 | 6 | 5 | 1000 | 2 | -- sum is 3
select * from stock
model | size | color | quanity
-------+------+-------+----------
1 | 6 | 5 | 12 --= 15 - 3 !! excellent
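1 | 5 | 5 | 20 -- has no orders yet
You just need to left join on the aggregated orders (the queries below spell the column quantity, whereas the schema in the question spells it quanity, so adjust the name to match your tables):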
select i.model, i.size, i.color, i.quantity,
o.qty as ordered,
i.quantity - coalesce(o.qty, 0) as quantity_on_stock
from incoming i
left join (
select model, size, color, sum(quantity) as qty
from orders
group by model, size, color
) o on (o.model, o.size, o.color) = (i.model, i.size, i.color);
SQLFiddle: http://sqlfiddle.com/#!15/7fbec/2
If you use your CTE as the base, you wind up with this:
WITH total_orders AS (
SELECT orders.model,
orders.size,
orders.color,
sum(orders.quantity) AS sum
FROM orders
GROUP BY color, size, model
)
SELECT i.model,
i.size,
i.color,
i.quantity - coalesce(tot.sum, 0) AS quanity
FROM incoming i
LEFT JOIN total_orders tot on (tot.model, tot.size, tot.color) = (i.model, i.size, i.color);
Whether the CTE or the derived table (the first solution) is faster is something you need to test.
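For anyone who wants to try the left join without a PostgreSQL instance at hand, here is a minimal sketch against SQLite through Python's sqlite3. It assumes the column is spelled quantity, as in the queries above; the helper name stock is hypothetical, and the join condition is written as separate equality tests so that it does not depend on row-value support.

```python
import sqlite3

# Aggregate the orders first, then left join so that items without orders survive.
STOCK_SQL = """
SELECT i.model, i.size, i.color,
       i.quantity - COALESCE(o.qty, 0) AS quantity_on_stock
FROM incoming AS i
LEFT JOIN (
    SELECT model, size, color, SUM(quantity) AS qty
    FROM orders
    GROUP BY model, size, color
) AS o
  ON o.model = i.model
 AND o.size = i.size
 AND o.color = i.color
ORDER BY i.id
"""


def stock(conn):
    """Return [(model, size, color, quantity_on_stock), ...] in incoming order."""
    return conn.execute(STOCK_SQL).fetchall()
```

The COALESCE is what turns the NULL produced for items without any order into 0 before the subtraction.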

Restart primary key numbers of existing rows after deleting most of a big table

I am working with a PostgreSQL 8.4.13 database.
Recently I had around 86.5 million records in a table. I deleted almost all of them; only 5000 records are left now. I ran:
vacuum full
after deleting the rows, and that returned disk space to the OS (thanks to a suggestion from a fellow SO member).
But I see that my id numbers are still stuck in the millions. For example:
id | date_time | event_id | secs_since_1970 | value
---------+-------------------------+----------+-----------------+-----------
61216287 | 2013/03/18 16:42:42:041 | 6 | 1363646562.04 | 46.4082
61216289 | 2013/03/18 16:42:43:041 | 6 | 1363646563.04 | 55.4496
61216290 | 2013/03/18 16:42:44:041 | 6 | 1363646564.04 | 40.0553
61216291 | 2013/03/18 16:42:45:041 | 6 | 1363646565.04 | 38.5694
In an attempt to start the id value at 1 again for the remaining rows, I tried:
cluster mytable_pkey on mytable;
where mytable is the name of my table. But that did not help.
So, my question(s) is/are:
Is there a way to get the index (id value) to start at 1 again?
If I add or update the table with a new record, will it start from 1 or pick up the next highest integer value (say 61216292 in above example)?
My table description is as follows: There is no FK constraint and no sequence in it.
jbossql=> \d mytable;
Table "public.mytable"
Column | Type | Modifiers
-----------------+------------------------+-----------
id | bigint | not null
date_time | character varying(255) |
event_id | bigint |
secs_since_1970 | double precision |
value | real |
Indexes:
"mydata_pkey" PRIMARY KEY, btree (id) CLUSTER
Drop the primary key first and create a temporary sequence:
alter table mytable drop constraint mydata_pkey;
create temporary sequence temp_seq;
Use the sequence to update:
update mytable
set id = nextval('temp_seq');
Recreate the primary key and drop the sequence:
alter table mytable add primary key (id);
drop sequence temp_seq;
If there is a foreign key dependency on this table then you will have to deal with it first and the update will be a more complex procedure.
Is your primary key defined using a serial? If so that creates an implicit sequence. You can use ALTER SEQUENCE (see: http://www.postgresql.org/docs/8.2/static/sql-altersequence.html for syntax) to change the starting number back to 1.
Based on the fact that you have some records left (just noticed the 5000 left), you DO NOT want to reset that number to a number before the last ID of the remaining records because then that sequence will generate non-unique numbers. The point of using a sequence is it gives you a transactional way to increment a number and guarantee successive operations get unique incremented numbers.
DROP SCHEMA tmp CASCADE;
CREATE SCHEMA tmp ;
SET search_path=tmp;
--
-- Note: the key is declared DEFERRABLE so that the renumbering UPDATE below cannot fail on transient duplicate ids (the default is NOT DEFERRABLE)
--
CREATE TABLE funky
( id SERIAL NOT NULL PRIMARY KEY DEFERRABLE INITIALLY DEFERRED
, tekst varchar
);
-- create some data with gaps in it
INSERT INTO funky(id, tekst)
SELECT gs, 'Number:' || gs::text
FROM generate_series(1,100,10) gs
;
-- set the sequence to the max occurring id
SELECT setval('funky_id_seq' , mx.mx)
FROM (SELECT max(id) AS mx FROM funky) mx
;
SELECT * FROM funky ;
-- compress the keyspace, making the ids consecutive
UPDATE funky xy
SET id = self.newid
FROM (
SELECT id AS id
, row_number() OVER (ORDER BY id) AS newid
FROM funky
) self
WHERE self.id = xy.id
;
-- set the sequence to the new max occurring id
SELECT setval('funky_id_seq' , mx.mx)
FROM (SELECT max(id) AS mx FROM funky) mx
;
SELECT * FROM funky ;
Result:
CREATE TABLE
INSERT 0 10
setval
--------
91
(1 row)
id | tekst
----+-----------
1 | Number:1
11 | Number:11
21 | Number:21
31 | Number:31
41 | Number:41
51 | Number:51
61 | Number:61
71 | Number:71
81 | Number:81
91 | Number:91
(10 rows)
UPDATE 10
setval
--------
10
(1 row)
id | tekst
----+-----------
1 | Number:1
2 | Number:11
3 | Number:21
4 | Number:31
5 | Number:41
6 | Number:51
7 | Number:61
8 | Number:71
9 | Number:81
10 | Number:91
(10 rows)
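The same compaction can be sketched client-side. This is only an illustration under stated assumptions: SQLite via Python's sqlite3 stands in for PostgreSQL, the table is the funky demo table from above, and renumber_ids is a hypothetical helper. Rows are renumbered in ascending order of their old id; since the new id is never larger than the old one, each target id is free at the moment it is assigned, so no deferrable constraint is needed here.

```python
import sqlite3


def renumber_ids(conn):
    """Make the ids of table funky consecutive (1..n), keeping their order.

    Returns the highest id after renumbering, i.e. the value a sequence
    would have to be set to afterwards.
    """
    old_ids = [row[0] for row in conn.execute("SELECT id FROM funky ORDER BY id")]
    with conn:  # one transaction: either every row is renumbered or none
        for new_id, old_id in enumerate(old_ids, start=1):
            if new_id != old_id:
                conn.execute(
                    "UPDATE funky SET id = ? WHERE id = ?", (new_id, old_id)
                )
    return len(old_ids)
```

Any table that references these ids would have to be updated in the same transaction.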
WARNING WARNING WARNING WARNING WARNING WARNING ACHTUNG:
Changing key values is generally a terrible idea. Avoid it at all costs.