I have worked with the ESRI geodatabase, where you can attach metadata (compliant with some international standards) to a table as a "property": creation date, organization, source, copyright info, etc.
Is there something similar in Postgres for metadata of a table as a whole? I know only COMMENT, but it seems too limited for my purposes.
Here is a very naive example of how you can store the "metadata" you want.
Let's assume you have two tables you want to keep metadata about:
t=# create table so66(i int, t text);
CREATE TABLE
Time: 5.431 ms
t=# create table so67(i int, t text);
CREATE TABLE
Time: 4.797 ms
and a "metadata" holder table:
t=# create table metadata(tname text, created timestamptz, details json);
CREATE TABLE
Time: 6.814 ms
t=# insert into metadata select 'so66',now(),'{"organization":"n/a","source":"manual","catalog":false}';
INSERT 0 1
Time: 3.144 ms
t=# insert into metadata select 'so67',now(),'{"organization":"home","source":"manual","catalog":true}';
INSERT 0 1
Time: 0.907 ms
t=# select * from metadata ;
 tname |            created            |                         details
-------+-------------------------------+----------------------------------------------------------
 so66  | 2017-04-21 09:24:08.233346+00 | {"organization":"n/a","source":"manual","catalog":false}
 so67  | 2017-04-21 09:24:26.641526+00 | {"organization":"home","source":"manual","catalog":true}
(2 rows)
Time: 0.253 ms
I used json to store arbitrary details. Of course you can add columns with dedicated data types for your needs. You might also want to reference tables by OID instead of by name (see the sketch below), or add some logic on insert/update of the metadata table.
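For example, a minimal sketch of the OID variant (the table and column names here are mine, not a standard):
create table metadata_oid(
    tbl     regclass primary key,   -- the table's OID; follows renames, unlike a text name
    created timestamptz default now(),
    details jsonb
);
insert into metadata_oid(tbl, details)
values ('so66', '{"organization":"n/a","source":"manual","catalog":false}');
-- regclass is displayed as the table's current name
select tbl, created, details from metadata_oid;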
As school work, we're supposed to create a table that logs all operations done by users on another table. To be more clear, say I have table1 and logtable: table1 can contain any info (names, ids, job, etc.), while logtable contains info on who did what, and when, on table1. Using a function and a trigger I managed to get INSERT, DELETE and UPDATE operations logged in logtable, but we're also supposed to keep a log of SELECTs.
To be more specific about the SELECTs: when someone does a SELECT on a view, this is supposed to be logged into logtable via an INSERT, so logtable ends up with a new row saying that somebody did a SELECT. My problem is that I can't figure out any way to accomplish this, since SELECTs can't fire triggers (and in turn can't call functions), and rules don't allow two different operations to take place. The only thing that came close was using the query logs, but as the database is the school's and not mine, I can't make use of them.
Here is a rough example of what I'm working with (in reality tstamp also has hours, minutes and so on):
id  operation  hid  tablename  who   tstamp      val_new    val_old
x   INSERT     x    table1     name  YYYY-MM-DD  newValues  previousValues
That works as intended, but what I also need to get working is this (note: whether val_new and val_old come out empty in this case is not a concern):
id  operation  hid  tablename  who   tstamp      val_new  val_old
x   SELECT     x    table1     name  YYYY-MM-DD  NULL     previousValues
Any and all help is appreciated.
Here is an example:
CREATE TABLE public.test (id integer PRIMARY KEY, value integer);
INSERT INTO test VALUES (1,42),(2,13);
CREATE TABLE test_log(id serial PRIMARY KEY, dbuser varchar, datetime timestamp);
-- get_test() inserts username / timestamp into the log, then returns all rows
-- of test
CREATE OR REPLACE FUNCTION get_test() RETURNS SETOF test AS $$
    INSERT INTO test_log (dbuser, datetime) VALUES (current_user, now());
    SELECT * FROM test;
$$ LANGUAGE sql;
-- now a view returns the full row set of test by calling our function instead
CREATE VIEW test_v AS SELECT * FROM get_test();
SELECT * FROM test_v;
id | value
----+-------
1 | 42
2 | 13
(2 rows)
SELECT * FROM test_log;
id | dbuser | datetime
----+----------+----------------------------
1 | postgres | 2020-11-30 12:42:00.188341
(1 row)
If your table has many rows and/or the SELECTs are complex, you may not want to use this view for performance reasons: the function always returns every row of test before any WHERE condition on the view is applied.
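If you do need filtering, one workaround (just a sketch; get_test_by_id is an assumed name, not part of the original example) is to pass the filter into the function so only matching rows are selected:
-- hypothetical variant: the filter runs inside the function instead of on the view's full output
CREATE OR REPLACE FUNCTION get_test_by_id(p_id integer) RETURNS SETOF test AS $$
    INSERT INTO test_log (dbuser, datetime) VALUES (current_user, now());
    SELECT * FROM test WHERE id = p_id;
$$ LANGUAGE sql;
SELECT * FROM get_test_by_id(1);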
I am currently looking for an efficient way to load data into a partitioned table. Is it possible to use Postgres/psql to COPY data into a specific table partition (instead of using INSERT)?
According to the documentation on COPY here:
COPY FROM can be used with plain, foreign, or partitioned tables or with views that have INSTEAD OF INSERT triggers.
And according to the documentation on partitioning here:
Be aware that COPY ignores rules. If you want to use COPY to insert data, you'll need to copy into the correct partition table rather than into the master. COPY does fire triggers, so you can use it normally if you use the trigger approach.
From my understanding of the aforementioned resources, it seems possible to copy into a partition; however, I can't find any examples or support for that online.
In other words, can I write something like:
COPY some_table_partition_one FROM '/some_dir/some_file'
COPY to a partitioned table was introduced in v11:
Allow INSERT, UPDATE, and COPY on partitioned tables to properly route rows to foreign partitions (Etsuro Fujita, Amit Langote)
But COPY directly to a partition is possible in all releases since v10, where declarative partitioning was introduced.
It seems like we forgot to remove the second quotation from the documentation.
It is possible at least with PG 12.2:
CREATE TABLE measurement (
city_id int not null,
logdate date not null,
peaktemp int,
unitsales int
) PARTITION BY RANGE (logdate);
CREATE TABLE
CREATE TABLE measurement_y2020m03 PARTITION OF measurement
FOR VALUES FROM ('2020-03-01') TO ('2020-04-01');
CREATE TABLE
CREATE TABLE measurement_y2020m04 PARTITION OF measurement
FOR VALUES FROM ('2020-04-01') TO ('2020-05-01');
CREATE TABLE
insert into measurement values (1, current_date, 10,100);
INSERT 0 1
select * from measurement;
city_id | logdate | peaktemp | unitsales
---------+------------+----------+-----------
1 | 2020-03-27 | 10 | 100
(1 row)
cat /tmp/m.dat
4,2020-04-01,40,400
copy measurement_y2020m04 from '/tmp/m.dat' delimiter ',';
COPY 1
select * from measurement;
city_id | logdate | peaktemp | unitsales
---------+------------+----------+-----------
1 | 2020-03-27 | 10 | 100
4 | 2020-04-01 | 40 | 400
(2 rows)
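And, per the release note quoted above, on versions that route COPY through the parent you don't have to name the partition at all; this sketch loads the same file through the partitioned table itself:
copy measurement from '/tmp/m.dat' delimiter ',';
-- tableoid::regclass shows which partition each row actually landed in
select tableoid::regclass, * from measurement;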
I'm trying to set up a database structure for storing user progress in an app. Right now I'm using PostgreSQL to store user information and other data related to the app. I'm not sure how to structure the database for when the user makes progress, i.e. unlocks a certain level. I was thinking of making a table that has a row for each user and a column for each thing they can possibly unlock, storing true or false values in it, but this seems rather inefficient. Is there a better way to store this information?
I would rather add achievements as rows, not columns, e.g.:
t=# create table achievements (i smallserial primary key, ach text);
CREATE TABLE
t=# create table user_achievements (i serial, user_id int, ach_id smallint references achievements(i), level int, achieved timestamptz default now());
CREATE TABLE
t=# insert into achievements (ach) values('blah');
INSERT 0 1
t=# insert into user_achievements(user_id,ach_id,level) values(1,1,1);
INSERT 0 1
t=# insert into user_achievements(user_id,ach_id,level) values(1,1,2);
INSERT 0 1
t=# select * from user_achievements;
i | user_id | ach_id | level | achieved
---+---------+--------+-------+-------------------------------
1 | 1 | 1 | 1 | 2018-01-29 08:25:32.018466+00
2 | 1 | 1 | 2 | 2018-01-29 08:25:34.089929+00
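Reading the data back is then a plain join; for illustration, something like this lists what user 1 has unlocked:
select a.ach, ua.level, ua.achieved
from user_achievements ua
join achievements a on a.i = ua.ach_id
where ua.user_id = 1
order by ua.achieved;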
I have a table that's designed as follows.
master_table
id -> serial
timestamp -> timestamp without time zone
fk_slave_id -> integer
fk_id -> id of the table
fk_table1_id -> foreign key relationship with table1
...
fk_table30_id -> foreign key relationship with table30
Every time a new table is added, this master table gets altered to include a new column linking to it. I've been told it was designed this way so that deletes in the other tables cascade into the master.
The issue I'm having is finding a proper solution to linking the master table to the other tables. I can do it programmatically using loops and such, but that would be incredibly inefficient.
Here's the query being used to grab the id of the table and the id of the row within that table.
SELECT fk_slave_id, concat(fk_table1_id,...,fk_table30_id) AS id
FROM master_table
ORDER BY id DESC
LIMIT 100;
The results are:
fk_slave_id | id
-------------+-----
30 | 678
25 | 677
29 | 676
1 | 675
15 | 674
9 | 673
The next step is using this data to work out which table holds the required data. For example, data is required from table30 with id 678.
This is where I'm stuck. If I use WITH it doesn't seem to accept the output in the FROM clause.
WITH items AS (
SELECT fk_slave_id, concat(fk_table1_id,...,fk_table30_id) AS id
FROM master_table
ORDER BY id DESC
LIMIT 100
)
SELECT data
FROM concat('table', items.fk_slave_id)
WHERE id = items.id;
This produces the following error.
ERROR: missing FROM-clause entry for table "items"
LINE x: FROM concat('table', items.fk_slave_id)
plpgsql is an option, using EXECUTE with format(), but then I'd have to loop through each result and process it with EXECUTE.
Is there any way to achieve what I'm after using SQL or is it a matter of needing to do it programmatically?
Apologies for the bad title. I can't think of another way to word this question.
edit 1: Replaced rows with items
edit 2: Based on the responses it doesn't seem like this can be accomplished cleanly. I'll be resorting to creating an additional column and using triggers instead.
I don't think you can reference a dynamically named table like that in your FROM clause:
FROM concat('table', rows.fk_slave_id)
Have you tried building and executing that SQL from a stored procedure/function? You can create the SQL you want to execute as a string and then just EXECUTE it.
Take a look at this one:
PostgreSQL - Writing dynamic sql in stored procedure that returns a result set
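For reference, a rough sketch of that idea in PL/pgSQL (the function name is mine, and it assumes every tableN has an integer id column plus the data column you are selecting):
CREATE OR REPLACE FUNCTION fetch_linked_data()
RETURNS TABLE (tablename text, row_id int, data text) AS $$
DECLARE
    rec record;
BEGIN
    FOR rec IN
        SELECT fk_slave_id,
               -- remaining fk_tableN_id columns elided, as in the question
               concat(fk_table1_id, fk_table30_id)::int AS id
        FROM master_table
        ORDER BY id DESC
        LIMIT 100
    LOOP
        -- build "SELECT ... FROM tableN WHERE id = $1" for the right N and run it
        RETURN QUERY EXECUTE
            format('SELECT %L::text, id, data FROM %I WHERE id = $1',
                   'table' || rec.fk_slave_id,
                   'table' || rec.fk_slave_id)
            USING rec.id;
    END LOOP;
END;
$$ LANGUAGE plpgsql;
SELECT * FROM fetch_linked_data();
format()'s %I and %L handle identifier and literal quoting, so the table name is injected safely.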
I just discovered JSONB for PostgreSQL and was wondering what could go wrong if I used it for all my tables' columns?
That is to say, all my tables would have primary and foreign keys as columns and a field column of type JSONB for any other data.
Besides taking up extra space because of JSONB's overhead, and losing typing on "columns", what would I miss?
It turns out you're on to something here.
The major points of using a relational database:
Well-defined relationships.
A well-defined and detailed schema.
High performance for large data sets.
You get to keep the relationships. But you lose the schema and a lot of the performance. The schema is more than just data validation: losing it means you can't use triggers or constraints on individual fields.
As for performance... you'll notice that most tests of JSONB performance are against other similar data types. They're never against normal SQL tables. That's because, while JSONB is astonishingly efficient, it's not nearly as efficient as regular SQL. So let's test it.
Using the dataset from this JSONB performance presentation I created a proper SQL schema...
create table customers (
id text primary key
);
create table products (
id text primary key,
title text,
sales_rank integer,
"group" text,
category text,
subcategory text,
similar_ids text[]
);
create table reviews (
customer_id text references customers(id),
product_id text references products(id),
"date" timestamp,
rating integer,
votes integer,
helpful_votes integer
);
And one that uses SQL relationships but JSONB for data...
create table customers (
id text primary key
);
create table products_jb (
id text primary key,
fields jsonb
);
create table reviews_jb (
customer_id text references customers(id),
product_id text references products_jb(id),
fields jsonb
);
And a single JSONB table.
create table reviews_jsonb (
review jsonb
);
Then I imported the same data into all three variants using a little script: 589,859 reviews, 93,319 products, 98,761 customers.
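To make the three layouts concrete, here is roughly how a single review would be stored in each (values invented; the first two inserts also assume matching customers/products rows exist for the foreign keys):
-- traditional: every field is a typed column
insert into reviews (customer_id, product_id, "date", rating, votes, helpful_votes)
values ('A123', 'B0001', '2004-05-01', 5, 10, 8);
-- hybrid: relationships stay as real columns, everything else goes into fields
insert into reviews_jb (customer_id, product_id, fields)
values ('A123', 'B0001', '{"date": "2004-05-01", "rating": 5, "votes": 10, "helpful_votes": 8}');
-- full JSONB: the whole review is one document, product and customer embedded
insert into reviews_jsonb (review)
values ('{"customer": {"id": "A123"},
          "product": {"id": "B0001", "category": "Home & Garden"},
          "review": {"date": "2004-05-01", "rating": 5, "votes": 10, "helpful_votes": 8}}');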
Let's try the same query as in the JSONB performance article, getting the average review for a product category. First, without indexes.
Traditional SQL: 138 ms
test=> select round(avg(r.rating), 2)
from reviews r
join products p on p.id = r.product_id
where p.category = 'Home & Garden';
round
-------
4.59
(1 row)
Time: 138.631 ms
Full JSONB: 380 ms
test=> select round(avg((review#>>'{review,rating}')::numeric),2)
test-> from reviews_jsonb
test-> where review #>>'{product,category}' = 'Home & Garden';
round
-------
4.59
(1 row)
Time: 380.697 ms
Hybrid JSONB: 190 ms
test=> select round(avg((r.fields#>>'{rating}')::numeric),2)
from reviews_jb r
join products_jb p on p.id = r.product_id
where p.fields#>>'{category}' = 'Home & Garden';
round
-------
4.59
(1 row)
Time: 192.333 ms
That honestly went better than I thought. The hybrid approach is twice as fast as full JSONB, but roughly 40% slower than normal SQL. Now how about with indexes?
Traditional SQL: 130 ms (+500 ms for the index)
test=> create index products_category on products(category);
CREATE INDEX
Time: 491.969 ms
test=> select round(avg(r.rating), 2)
from reviews r
join products p on p.id = r.product_id
where p.category = 'Home & Garden';
round
-------
4.59
(1 row)
Time: 128.212 ms
Full JSONB: 360 ms (+ 25000 ms for the index)
test=> create index on reviews_jsonb using gin(review);
CREATE INDEX
Time: 25253.348 ms
test=> select round(avg((review#>>'{review,rating}')::numeric),2)
from reviews_jsonb
where review #>>'{product,category}' = 'Home & Garden';
round
-------
4.59
(1 row)
Time: 363.222 ms
Hybrid JSONB: 185 ms (+6900 ms for the indexes)
test=> create index on products_jb using gin(fields);
CREATE INDEX
Time: 3654.894 ms
test=> create index on reviews_jb using gin(fields);
CREATE INDEX
Time: 3237.534 ms
test=> select round(avg((r.fields#>>'{rating}')::numeric),2)
from reviews_jb r
join products_jb p on p.id = r.product_id
where p.fields#>>'{category}' = 'Home & Garden';
round
-------
4.59
(1 row)
Time: 183.679 ms
It turns out this is a query that indexing isn't going to help much with.
That's what I see playing with the data a bit: hybrid JSONB is always slower than full SQL, but faster than full JSONB. It seems like a good compromise. You get to use traditional foreign keys and joins, but have the flexibility of adding whatever fields you like.
I recommend taking the hybrid approach a step further: use SQL columns for the fields you know are going to be there, and have a JSONB column to pick up any additional fields for flexibility.
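For the products data from the test above, that could look something like this (a sketch; products_hybrid and extras are my names, and which fields deserve real columns is your call):
create table products_hybrid (
    id       text primary key,
    title    text,
    category text,   -- fields you know you'll filter and join on
    extras   jsonb   -- anything else, free-form
);
create index on products_hybrid (category);          -- cheap b-tree on the hot column
create index on products_hybrid using gin (extras);  -- optional, for ad-hoc JSONB lookups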
I encourage you to play around with the test data here and see what the performance is like.