I'm trying to set up a database structure for storing user progress in an app. Right now I'm using PostgreSQL to store user information and other data related to the app. I'm not sure how to structure the database for when the user makes progress, i.e. unlocks a certain level. I was thinking of a table with one row per user and one column for each thing they can possibly unlock, storing true or false values in it, but this seems rather inefficient. Is there a better way to store this information?
I would rather add achievements as rows, not columns, eg:
t=# create table achievements (i smallserial primary key, ach text);
CREATE TABLE
t=# create table user_achievements (i serial, user_id int, ach_id smallint references achievements(i), level int, achieved timestamptz default now());
CREATE TABLE
t=# insert into achievements (ach) values('blah');
INSERT 0 1
t=# insert into user_achievements(user_id,ach_id,level) values(1,1,1);
INSERT 0 1
t=# insert into user_achievements(user_id,ach_id,level) values(1,1,2);
INSERT 0 1
t=# select * from user_achievements;
i | user_id | ach_id | level | achieved
---+---------+--------+-------+-------------------------------
1 | 1 | 1 | 1 | 2018-01-29 08:25:32.018466+00
2 | 1 | 1 | 2 | 2018-01-29 08:25:34.089929+00
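A minimal sketch of the same rows-not-columns layout, using Python's built-in sqlite3 module as a stand-in for PostgreSQL so that it runs without a server. The table and column names follow the answer above; the helper unlocked() and the second achievement name are made up for illustration.

```python
import sqlite3

# In-memory SQLite database; an assumption made only so the sketch is runnable.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE achievements (i INTEGER PRIMARY KEY, ach TEXT);
    CREATE TABLE user_achievements (
        i INTEGER PRIMARY KEY,
        user_id INTEGER,
        ach_id INTEGER REFERENCES achievements(i),
        level INTEGER,
        achieved TEXT DEFAULT CURRENT_TIMESTAMP
    );
""")
conn.executemany(
    "INSERT INTO achievements (ach) VALUES (?)",
    [("blah",), ("first_win",)],  # 'first_win' is a hypothetical achievement
)
conn.executemany(
    "INSERT INTO user_achievements (user_id, ach_id, level) VALUES (?, ?, ?)",
    [(1, 1, 1), (1, 1, 2), (2, 2, 1)],
)


def unlocked(conn, user_id):
    """Return (achievement name, highest level reached) pairs for one user."""
    rows = conn.execute(
        """
        SELECT a.ach, MAX(ua.level)
        FROM user_achievements ua
        JOIN achievements a ON a.i = ua.ach_id
        WHERE ua.user_id = ?
        GROUP BY a.ach
        ORDER BY a.ach
        """,
        (user_id,),
    )
    return rows.fetchall()
```

Adding a new unlockable item is then an INSERT into achievements rather than an ALTER TABLE, and a user who has unlocked nothing simply has no rows.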
I am using PostgreSQL 13 and have intermediate-level experience with PostgreSQL.
I have a table named tbl_employee. It stores employee details for a number of customers.
Below is my table structure, followed by datatype and index access method
Column | Data Type | Index name | Idx Access Type
-------------+-----------------------------+---------------------------+---------------------------
id | bigint | |
name | character varying | |
customer_id | bigint | idx_customer_id | btree
is_active | boolean | idx_is_active | btree
is_delete | boolean | idx_is_delete | btree
I want to delete the employees of a specific customer by customer_id.
The table holds 1,800,000+ records in total.
When I execute the query below for customer_id 1001, it returns 85,000.
SELECT COUNT(*) FROM tbl_employee WHERE customer_id=1001;
When I perform delete operation using below query for this customer then it takes 2 hours, 45 minutes to delete the records.
DELETE FROM tbl_employee WHERE customer_id=1001
Problem
My concern is that this query should take less than 1 min to delete the records. Is it normal for it to take this long, or is there a way to optimise it and reduce the execution time?
Below is Explain output of delete query
The values of seq_page_cost = 1 and random_page_cost = 4.
Below are no.of pages occupied by the table "tbl_employee" from pg_class.
Please guide. Thanks
While this statement is running:
DELETE FROM tbl_employee WHERE customer_id=1001
is any other operation accessing the table? If this is the only statement touching it, I don't think it should take that long.
In RDBMS systems each SQL statement is also a transaction, unless it's wrapped in BEGIN; and COMMIT; to make multi-statement transactions.
It's possible your multirow DELETE statement is generating a very large transaction that's forcing PostgreSQL to thrash -- to spill its transaction logs from RAM to disk.
You can try repeating this statement until you've deleted all the rows you need to delete:
DELETE FROM tbl_employee WHERE customer_id=1001 LIMIT 1000;
Doing it this way will keep your transactions smaller, and may avoid the thrashing.
The statement
DELETE FROM tbl_employee WHERE customer_id=1001 LIMIT 1000;
will not work, because DELETE in PostgreSQL does not accept a LIMIT clause.
To make the batch delete smaller, you can try this:
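DELETE FROM tbl_employee WHERE ctid IN (SELECT ctid FROM tbl_employee WHERE customer_id=1001 LIMIT 1000);
Repeat it until there is nothing left to delete.
Here ctid is a system column that every PostgreSQL table has. It identifies the physical location of a row version, so it can be used to pick out the rows selected by the subquery.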
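The repeat-until-empty loop can be sketched as follows. This is an illustration under stated assumptions, not a tested PostgreSQL script: it uses Python's built-in sqlite3 module so it runs without a server, and SQLite's rowid plays the role that ctid plays in the statement above. The function name delete_in_batches and the sample data are hypothetical.

```python
import sqlite3


def delete_in_batches(conn, customer_id, batch_size=1000):
    """Delete one customer's rows in small transactions; return rows deleted."""
    total = 0
    while True:
        with conn:  # each batch is committed as its own transaction
            cur = conn.execute(
                """
                DELETE FROM tbl_employee
                WHERE rowid IN (
                    SELECT rowid FROM tbl_employee
                    WHERE customer_id = ? LIMIT ?
                )
                """,
                (customer_id, batch_size),
            )
        if cur.rowcount == 0:  # nothing left to delete
            break
        total += cur.rowcount
    return total


# Sample data: 5000 rows, half of them for customer 1001 (made-up values).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl_employee (id INTEGER PRIMARY KEY, name TEXT, customer_id INTEGER)"
)
conn.executemany(
    "INSERT INTO tbl_employee (name, customer_id) VALUES (?, ?)",
    [("emp%d" % n, 1001 if n % 2 else 2002) for n in range(5000)],
)
conn.commit()

deleted = delete_in_batches(conn, 1001, batch_size=1000)
remaining = conn.execute(
    "SELECT COUNT(*) FROM tbl_employee WHERE customer_id = 1001"
).fetchone()[0]
untouched = conn.execute(
    "SELECT COUNT(*) FROM tbl_employee WHERE customer_id = 2002"
).fetchone()[0]
```

Against PostgreSQL the same loop would issue the ctid statement through a driver and commit after every batch; whether that actually cures the slowness depends on what is causing it.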
I have a table:
book_id | part | name
1 | 1 | chap 1
1 | 2 | chap 2
1 | 3 | chap 3
1 | 4 | chap 4
Primary key is book_id and part.
How can I delete part 2 and renumber the remaining parts to get:
book_id | part | name
1 | 1 | chap 1
1 | 2 | chap 3
1 | 3 | chap 4
I can use a transaction and first delete part 2, but how can I then update the part column without getting a duplicate primary key error?
I would choose a different approach. Instead of persisting the part number, persist the order of the parts:
CREATE TABLE book_part (
book_id bigint NOT NULL,
part_order real NOT NULL,
name text NOT NULL,
PRIMARY KEY (book_id, part_order)
);
The first part that gets entered gets a part_order of 0.0. If you add a part at the beginning or the end, you just assign to part_order 1.0 less or more than the previous minimum or maximum. If you insert a part between two existing parts, you assign a part_order that is the arithmetic mean of the adjacent parts.
An example:
-- insert the first part
INSERT INTO book_part VALUES (1, 0.0, 'Introduction');
-- insert a part at the end
INSERT INTO book_part VALUES (1, 1.0, 'Getting started with PostgreSQL');
-- insert a part between the two existing parts
INSERT INTO book_part VALUES (1, 0.5, 'The history of PostgreSQL');
-- adding yet another part between two existing parts
INSERT INTO book_part VALUES (1, 0.25, 'An introductory example');
The actual part number is calculated when you query the table:
SELECT book_id,
row_number() OVER (PARTITION BY book_id ORDER BY part_order) AS part,
name
FROM book_part;
The beauty of that is that you don't need to update a lot of rows when you add or delete a part.
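The rule for choosing part_order can be sketched in a few lines of Python. This is only an illustration of the arithmetic described above; the function names order_for_insert and numbered are made up, and numbered() merely mimics what row_number() does in the query.

```python
def order_for_insert(existing, position):
    """Return a part_order that places a new part at 0-based `position`
    among the already stored part_order values in `existing`."""
    ordered = sorted(existing)
    if not ordered:
        return 0.0                      # first part ever entered
    if position <= 0:
        return ordered[0] - 1.0         # new first part
    if position >= len(ordered):
        return ordered[-1] + 1.0        # new last part
    # between two neighbours: arithmetic mean of their part_order values
    return (ordered[position - 1] + ordered[position]) / 2.0


def numbered(parts):
    """Mimic row_number() OVER (ORDER BY part_order).
    `parts` is a list of (part_order, name) tuples."""
    return [(n, name) for n, (_, name) in enumerate(sorted(parts), start=1)]


# Replay the four inserts from the example for a single book.
parts = []  # list of (part_order, name)
for position, name in [
    (0, "Introduction"),
    (1, "Getting started with PostgreSQL"),
    (1, "The history of PostgreSQL"),
    (1, "An introductory example"),
]:
    key = order_for_insert([order for order, _ in parts], position)
    parts.append((key, name))

# Deleting a part needs no renumbering: the gap closes at query time.
without_history = [p for p in parts if p[1] != "The history of PostgreSQL"]
```

One caveat worth keeping in mind: repeatedly halving the gap between the same two neighbours eventually runs out of floating-point precision (real has roughly 24 bits of mantissa), so a table that sees many inserts at one spot may need an occasional renumbering pass.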
Unlike most RDBMS, PostgreSQL does not support updating a primary key to a value that collides with a preexisting one, even temporarily, unless the constraint is deferrable.
PostgreSQL checks a non-deferrable unique constraint row by row while the update executes, which leads it to report a "phantom" duplicate key, whereas other RDBMS that follow the standard (MS SQL Server, Oracle, DB2...) use a set-based approach and check at the end of the statement.
So you must use a deferrable constraint.
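-- ALTER CONSTRAINT only accepts foreign key constraints, so drop and re-create the primary key
ALTER TABLE book_part
DROP CONSTRAINT ??? *PK constraint name* ???,
ADD PRIMARY KEY (book_id, part) DEFERRABLE INITIALLY IMMEDIATE;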
This is a severe limitation of PG... See "5 – The hard way to udpates unique values" in
http://mssqlserver.fr/postgresql-vs-sql-server-mssql-part-3-very-extremely-detailed-comparison/
I am currently looking into an efficient way to allocate data into a partitioned table. Is it possible to use postgres/psql to COPY data into a specific table partition (instead of using INSERT)?
According to the documentation on COPY here:
COPY FROM can be used with plain, foreign, or partitioned tables or with views that have INSTEAD OF INSERT triggers.
And according to the documentation on partitioning here:
Be aware that COPY ignores rules. If you want to use COPY to insert data, you'll need to copy into the correct partition table rather than into the master. COPY does fire triggers, so you can use it normally if you use the trigger approach.
From my understanding of the aforementioned resources, it seems possible to copy into partition; however, I can't find any examples or support for that online.
In other words, can I write something like:
COPY some_table_partition_one FROM '/some_dir/some_file'
COPY to a partitioned table was introduced in v11:
Allow INSERT, UPDATE, and COPY on partitioned tables to properly route rows to foreign partitions (Etsuro Fujita, Amit Langote)
But COPY directly to a partition is possible in all releases since v10, where declarative partitioning was introduced.
It seems like we forgot to remove the second quotation from the documentation.
It is possible at least with PG 12.2:
CREATE TABLE measurement (
city_id int not null,
logdate date not null,
peaktemp int,
unitsales int
) PARTITION BY RANGE (logdate);
CREATE TABLE
CREATE TABLE measurement_y2020m03 PARTITION OF measurement
FOR VALUES FROM ('2020-03-01') TO ('2020-04-01'); -- the upper bound is exclusive
CREATE TABLE
CREATE TABLE measurement_y2020m04 PARTITION OF measurement
FOR VALUES FROM ('2020-04-01') TO ('2020-05-01'); -- the upper bound is exclusive
CREATE TABLE
insert into measurement values (1, current_date, 10,100);
INSERT 0 1
select * from measurement;
city_id | logdate | peaktemp | unitsales
---------+------------+----------+-----------
1 | 2020-03-27 | 10 | 100
(1 row)
cat /tmp/m.dat
4,2020-04-01,40,400
copy measurement_y2020m04 from '/tmp/m.dat' delimiter ',';
COPY 1
select * from measurement;
city_id | logdate | peaktemp | unitsales
---------+------------+----------+-----------
1 | 2020-03-27 | 10 | 100
4 | 2020-04-01 | 40 | 400
(2 rows)
I'm trying to write a rule on a view to delete tuples from the component tables, but so far can only remove data from one of them. I've used postgres with basic views for a while, but I don't have any experience with rules on views.
I wrote a stupid little test case to figure out/show my problem. There's only one parent tuple per child tuple in this example (my actual schema isn't actually like this of course).
Component tables:
CREATE TABLE parent(
id serial PRIMARY KEY,
p_data integer NOT NULL UNIQUE
);
CREATE TABLE child(
id serial PRIMARY KEY,
parent_id integer NOT NULL UNIQUE REFERENCES parent(id),
c_data integer NOT NULL
);
View:
CREATE TABLE child_view(
id integer,
p_data integer,
c_data integer
);
CREATE RULE "_RETURN" AS ON SELECT TO child_view DO INSTEAD
SELECT child.id, p_data, c_data
FROM parent JOIN child ON (parent_id=parent.id);
Problem delete rule:
CREATE RULE child_delete AS ON DELETE TO child_view DO INSTEAD(
DELETE FROM child WHERE id=OLD.id;
DELETE FROM parent WHERE p_data=OLD.p_data;
);
The intent of the above rule is to remove tuples referenced in the view from the component tables. The WHERE p_data=OLD.p_data seems odd to me, but I don't see how else to reference the desired tuple in the parent table.
Here's what happens when I try to use the above rule:
>SELECT * FROM child_view;
id | p_data | c_data
----+--------+--------
1 | 1 | 10
2 | 2 | 11
3 | 3 | 12
(3 rows)
>DELETE FROM child_view WHERE id=3;
DELETE 0
>SELECT * FROM child_view;
id | p_data | c_data
----+--------+--------
1 | 1 | 10
2 | 2 | 11
(2 rows)
But looking at the parent table, the second part of the delete isn't working (id=3 "should" have been deleted):
>SELECT * FROM parent;
id | p_data
----+--------
1 | 1
2 | 2
3 | 3
(3 rows)
How should I write the deletion rule to remove both child and parent tuples?
This is using postgres v9.
Any help is appreciated. Also pointers to any materials covering rules on views beyond the postgres docs (unless I've obviously missed something) would also be appreciated. Thanks.
EDIT: as jmz points out, it would be easier to use a cascading delete than a rule here, but that approach doesn't work for my actual schema.
What you're seeing with the rule problem is that the rule system doesn't handle the data atomically. The first delete is executed regardless of the order of the two statements in the DO INSTEAD rule. The second statement is never executed, because the row to which OLD.id refers has already been removed from the view. You could use a LEFT JOIN, but that won't help you because of the example table design (it may work on your actual database schema).
The fundamental problem, as I see it, is that you're treating the rule system as if it were a trigger.
Your best option is to use foreign keys and ON DELETE CASCADE instead of rules. With them your example schema would work too: you'd only need a single delete on the parent table to get rid of all its children.
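A minimal sketch of the cascading-delete approach on the example schema. It uses Python's built-in sqlite3 module as a stand-in so it runs without a PostgreSQL server; the only change to the question's tables is the added ON DELETE CASCADE clause, and the PRAGMA line is a SQLite requirement that PostgreSQL does not need.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when asked to; PostgreSQL always does.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE parent (
        id INTEGER PRIMARY KEY,
        p_data INTEGER NOT NULL UNIQUE
    );
    CREATE TABLE child (
        id INTEGER PRIMARY KEY,
        parent_id INTEGER NOT NULL UNIQUE
            REFERENCES parent(id) ON DELETE CASCADE,
        c_data INTEGER NOT NULL
    );
    INSERT INTO parent (p_data) VALUES (1), (2), (3);
    INSERT INTO child (parent_id, c_data) VALUES (1, 10), (2, 11), (3, 12);
""")

# One delete on the parent removes the matching child row as well.
conn.execute("DELETE FROM parent WHERE id = 3")
conn.commit()

parents = conn.execute("SELECT id, p_data FROM parent ORDER BY id").fetchall()
children = conn.execute(
    "SELECT parent_id, c_data FROM child ORDER BY id"
).fetchall()
```

Note that the cascade runs from parent to child only: deleting a child row leaves its parent in place, which may be one reason this approach does not fit every schema.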
What you want to do will work fine. But you made a left turn on this:
CREATE TABLE child_view(
id integer,
p_data integer,
c_data integer
);
CREATE RULE "_RETURN" AS ON SELECT TO child_view DO INSTEAD
SELECT child.id, p_data, c_data
FROM parent JOIN child ON (parent_id=parent.id);
You want a real life view here not a table. That is why delete will not work.
CREATE VIEW child_view AS SELECT
child.id,
p_data,
c_data
FROM parent
JOIN child ON (parent_id=parent.id)
;
Replace the top with the bottom and it will work perfectly (it did when I tested it). The reason the delete does not work is that it is trying to delete id from the TABLE child_view, which is of course empty! It does not execute the 'select do instead' rule, so it is working on the real table child_view. People may poo-poo using rules, but if they cannot see such an obvious mistake I wonder how much they know?
I have used rules successfully in defining interfaces to enforce business rules. They can lead to elegant solutions in ways triggers could not.
Note: I only recommend this to make writable views for an interface. You could do clever things like checking constraints across tables, but then you may be asking for trouble. That kind of stuff really should be done with triggers.
Edit: script per request
-- set this, as you may have had an error when running
-- from a script and not noticed it among all the NOTICEs
\set ON_ERROR_STOP
drop table if exists parent cascade;
drop table if exists child cascade;
CREATE TABLE parent(
id serial PRIMARY KEY,
p_data integer NOT NULL UNIQUE
);
CREATE TABLE child(
id serial PRIMARY KEY,
parent_id integer NOT NULL UNIQUE REFERENCES parent(id),
c_data integer NOT NULL
);
CREATE VIEW child_view AS SELECT
child.id,
p_data,
c_data
FROM parent
JOIN child ON (parent_id=parent.id)
;
CREATE RULE child_delete AS ON DELETE TO child_view DO INSTEAD(
DELETE FROM child WHERE id=OLD.id;
DELETE FROM parent WHERE p_data=OLD.p_data;
);
insert into parent (p_data) values (1), (2), (3);
insert into child (parent_id, c_data) values (1, 1), (2, 2), (3, 3);
select * from child_view;
id | p_data | c_data
----+--------+--------
1 | 1 | 1
2 | 2 | 2
3 | 3 | 3
(3 rows)
delete from child_view where id=3;
DELETE 0
select * from child_view;
id | p_data | c_data
----+--------+--------
1 | 1 | 1
2 | 2 | 2
(2 rows)
There is a table:
CREATE TABLE temp
(
IDR decimal(9) NOT NULL,
IDS decimal(9) NOT NULL,
DT date NOT NULL,
VAL decimal(10) NOT NULL,
AFFID decimal(9),
CONSTRAINT PKtemp PRIMARY KEY (IDR,IDS,DT)
)
;
Let's see the plan for select star query:
SQL>explain plan for select * from temp;
Explained.
SQL> select plan_table_output from table(dbms_xplan.display('plan_table',null,'serial'));
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
---------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
---------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 61 | 2 (0)|
| 1 | TABLE ACCESS FULL| TEMP | 1 | 61 | 2 (0)|
---------------------------------------------------------------
Note
-----
- 'PLAN_TABLE' is old version
11 rows selected.
SQL Server 2008 shows a Clustered Index Scan in the same situation. What is the reason?
select * with no where clause -- means read every row in the table, fetch every column.
What do you gain by using an index? You have to go to the index, get a rowid, translate the rowid into a table offset, and then read the file.
What happens when you do a full table scan? You go to the first rowid in the table, then read on through the table to the end.
Which one of these is faster given the table you have above? The full table scan. Why? Because it skips having to go to the index, retrieve values, and then go back to where the table lives to fetch the rows.
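The same planner choice can be observed in other engines. The sketch below uses SQLite through Python's built-in sqlite3 module, purely as an assumption-laden illustration: it is neither Oracle nor SQL Server, the table name readings is a hypothetical stand-in for the table in the question, and the exact plan wording differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE readings (
        idr INTEGER NOT NULL,
        ids INTEGER NOT NULL,
        dt TEXT NOT NULL,
        val INTEGER NOT NULL,
        affid INTEGER,
        PRIMARY KEY (idr, ids, dt)
    )
""")


def plan(sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


# No WHERE clause: every row is needed, so the planner scans the table.
full_scan = plan("SELECT * FROM readings")

# A predicate on the leading key column gives the index something to do.
key_lookup = plan("SELECT * FROM readings WHERE idr = 1")
```

Here full_scan is reported as a scan of the table, while key_lookup is reported as a search using the primary key's index, which matches the reasoning above: the index only pays off when it lets the engine skip rows.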
To answer this more simply without mumbo-jumbo, the reason is:
Clustered Index = Table
That's by definition in SQL Server. If this is not clear, look up the definition.
To be absolutely clear once again, since most people seem to miss this, the Clustered Index IS the table itself. It therefore follows that "Clustered Index Scan" is another way of saying "Table Scan". Or what Oracle calls "TABLE ACCESS FULL"