How do I represent an array of tuples in PostgreSQL?

Here's the easiest way I can think of to explain this. Imagine a user wants to bookmark a bunch of webpages. There's a url table with a UrlID and the actual url. I'd like each user to have a list of UrlIDs which are unique (but I don't need the constraint), each paired with a 32-bit int value such as an epoch date. The only two things I care about are 1) being able to check whether a UrlID is in this list or not, and 2) getting the entire list and sorting it by date (the second value).
If it helps, I'm expecting no more than 8K bookmarks, but most likely it will be <128.

If you really want to avoid the extra table to express the relationship, you can do something like this:
CREATE TABLE "user" (
id integer primary key,
name text not null,
bookmarks integer[] not null
);
CREATE TABLE url (
id integer primary key,
time timestamp with time zone not null,
val text not null
);
Then finding all bookmarks for a particular user (say, with id 66) would involve doing something like this:
SELECT url.val, url.time
FROM (SELECT bookmarks FROM "user" WHERE id = 66) u
JOIN url ON url.id = ANY (bookmarks)
ORDER BY url.time;
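For the membership check (the question's requirement 1), the array type covers this directly; a sketch, using 66 and 123 as example ids:
-- Is url 123 in user 66's bookmark list?
SELECT 123 = ANY (bookmarks) AS is_bookmarked
FROM "user"
WHERE id = 66;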
Now here's why I don't like this schema. First, adding a new bookmark requires rewriting the bookmarks array and hence the entire user row (so adding n bookmarks, one after the other, would require Θ(n^2) time). Secondly, you cannot use foreign keys on the elements of the array. Thirdly, many queries become more complicated to write, e.g. in order to retrieve all bookmarks for all users, you have to do something like this:
SELECT "user".id,"user".name,url.val,url.time
FROM "user",
LATERAL unnest((SELECT bookmarks)) b
LEFT JOIN url ON b = url.id;
Edit: So here's the schema I would use, and which I think fits the relational paradigm best:
CREATE TABLE "user" (
id integer primary key,
name text not null
);
CREATE TABLE url (
id integer primary key,
val text not null
);
CREATE TABLE bookmark (
user_id integer not null REFERENCES "user",
url_id integer REFERENCES url,
time timestamp with time zone not null,
UNIQUE (user_id,url_id)
);
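With the normalized schema, both of the question's requirements map to short queries; a sketch, again using 66 and 123 as example ids:
-- 1) Is this url bookmarked by this user?
SELECT EXISTS (
    SELECT 1 FROM bookmark WHERE user_id = 66 AND url_id = 123
) AS is_bookmarked;
-- 2) The entire list, sorted by date:
SELECT url.val, bookmark.time
FROM bookmark
JOIN url ON url.id = bookmark.url_id
WHERE bookmark.user_id = 66
ORDER BY bookmark.time;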

Related

How to create table (SQL) for storing products information

I have been doing quite a few tutorials, such as SQLBolt, trying to learn more and more about SQL.
I have asked some of my friends, who recommended I look into "JOIN" for my situation, even though I don't think it fits my purpose.
My idea is to store product information (title, image and url), and I have come to the conclusion to use:
CREATE TABLE products (
    id INTEGER PRIMARY KEY,
    title TEXT,
    image TEXT,
    url TEXT UNIQUE,
    added_date DATE
);
The reason url is UNIQUE is that we cannot have the same URL in the database twice, as it would be a duplicate, which we do not want. But I still don't really understand why and how I should use JOIN in my situation.
So my question is: what would be the best way for me to store the product information? Which way would be the most beneficial as well as the best performance-wise? If you are planning to use JOIN, I would gladly take more information on why in that case. (There could be a situation where I have over 4000 rows inserted over time.)
I hope you all who are reading this will have a wonderful day! :)
The solution, using a stores table:
CREATE TABLE stores (
    id SERIAL PRIMARY KEY,
    store_name TEXT
    -- add more fields if needed
);
CREATE TABLE products (
    id SERIAL,
    store_id INTEGER NOT NULL,
    title TEXT,
    image TEXT,
    url TEXT UNIQUE,
    added_date timestamp without time zone NOT NULL DEFAULT NOW(),
    PRIMARY KEY (id, store_id)
);
ALTER TABLE products
    ADD CONSTRAINT "FK_products_stores" FOREIGN KEY ("store_id")
    REFERENCES stores (id) MATCH SIMPLE
    ON UPDATE NO ACTION
    ON DELETE RESTRICT;
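This is also where JOIN comes in: once products reference stores through store_id, listing each product together with its store is a single join. A sketch using the tables defined above:
-- Each product with the name of the store it belongs to, newest first:
SELECT p.title, p.url, s.store_name
FROM products p
JOIN stores s ON s.id = p.store_id
ORDER BY p.added_date DESC;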

Create/alter table for each new user/project

I am building a platform with two kinds of users: Users_A create projects with unique virtual coins associated, and Users_B can buy and exchange these coins.
The problem:
Approach 1: if I use one unique table as a virtual wallet, the User_B ID will be the row, and each column will be a coin. This way, I have to ALTER the table each time a new project is created.
Approach 2: I create an electronic wallet (table) for every single User_B.
Which one of the two is worse/better in terms of performance?
Is there any other possible approach?
It's a bit unclear to me what exactly you are trying to model. But any model that requires ALTERing a table because you add new content to the database is flawed.
That sounds like a basic many-to-many relationship to me:
You definitely need a table for the users:
create table users
(
    user_id integer primary key,
    ... other columns ...
);
and one for the different coins:
create table coin
(
    coin_id integer primary key,
    ... other columns ...
);
You need a table for the projects. You said "unique virtual coins associated", so I assume one project deals with exactly one type of coin:
create table project
(
    project_id integer primary key,
    owner_user_id integer not null references users,
    coin_id integer not null references coin,
    ... other columns ...
);
I am not sure what exactly you mean by "buy and exchange" coins, but you probably need something like a transfer table:
create table coin_transfer
(
    from_user_id integer not null references users,
    to_user_id integer not null references users,
    project_id integer not null references project,
    transfer_type text not null check (transfer_type in ('buy', 'exchange')),
    amount numeric not null
);
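As a sketch of how such a transfer table could be queried (assuming balances are derived from the transfer history rather than stored; 42 is an example user id):
-- Net balance per project for user 42: incoming minus outgoing transfers
SELECT project_id,
    SUM(CASE WHEN to_user_id = 42 THEN amount ELSE -amount END) AS balance
FROM coin_transfer
WHERE 42 IN (to_user_id, from_user_id)
GROUP BY project_id;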
You also mention a "wallet" that belongs to a user. You would never create one table per wallet; instead, create a single table that records which user owns which wallet. Assuming each user has one wallet per coin type, you'd need something like this:
create table wallet
(
    wallet_id integer primary key,
    owner_user_id integer not null references users,
    coin_id integer not null references coin,
    ... other columns ...
);
The above is only a very rough sketch, because there is a lot of information missing from your question.

Fine on SQLite, broken in PostgreSQL: column must appear in the GROUP BY clause or be used in an aggregate function

I have a query which works fine on SQLite, but when I run it on the same data in Postgresql I get:
column "role.id" must appear in the GROUP BY clause or be used in an aggregate function
I have three tables, for people, exhibitions, and a table that links the two: "One person in one exhibition performing a particular role" (such as "Artist" or "Curator"):
CREATE TABLE "person" ("id" integer NOT NULL PRIMARY KEY AUTOINCREMENT,
"name" varchar(255));
CREATE TABLE "exhibition" ("id" integer NOT NULL PRIMARY KEY AUTOINCREMENT,
"name" varchar(255));
CREATE TABLE "role" (`id` integer NOT NULL PRIMARY KEY AUTOINCREMENT,
`name` varchar(30) NOT NULL,
`exhibition_id` integer NOT NULL,
`person_id` integer NOT NULL,
FOREIGN KEY(`exhibition_id`) REFERENCES `exhibition`(`id`),
FOREIGN KEY(`person_id`) REFERENCES `person`(`id`));
I want to display the people involved in an exhibition ordered by how many things they've done. So, I get the IDs of the people in an exhibition (1,2,3,4) and then do this:
SELECT
    *,
    COUNT(person.id) AS role_count
FROM person
INNER JOIN role
    ON person.id = role.person_id
WHERE person.id IN (1, 2, 3, 4)
GROUP BY person.id
ORDER BY role_count DESC
That orders the people by role_count, which is the number of roles they've had across all exhibitions.
It works fine on SQLite, but not in Postgresql. I've tried putting role.id into the GROUP BY (instead of, and as well as, person.id) but that changes the results.
You know when you struggle for ages, post an SO question, and then immediately stumble on the answer?
From this answer I realised that I couldn't select role.id (which the SELECT * is implicitly doing) as it wasn't in the GROUP BY.
I couldn't add it to the GROUP BY (because that changes the results) so the solution was to not select it.
So I changed the SELECT part to:
SELECT
    person.*,
    COUNT(person.id) AS role_count
FROM person
...
Now role.id is not being selected. And that works.
If I needed any other fields from the role table, like name, I couldn't simply add them to the SELECT for the same reason; they too would have to be wrapped in an aggregate function (as the answer below explains), e.g.:
SELECT
    person.*,
    MAX(role.name) AS role_name,
    COUNT(person.id) AS role_count
FROM person
...
Just like the error says, Standard SQL doesn't let you SELECT anything other than one of the GROUP BY columns or a call to an aggregate function. (For a logical reason: How would the RDBMS know which role.id to select when there are multiple rows to select from within a group?) PostgreSQL actually enforces this rule; SQLite ignores it and just returns data from an arbitrary row in the group.
As you discovered, omitting role.id from the SELECT fixes your error. But if you do want SQLite's behavior of selecting the ID from an arbitrary row, you can simply wrap it in an aggregate function, e.g., SELECT MAX(role.id) instead of just SELECT role.id.
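Putting that together, a version of the original query that keeps a role column and satisfies PostgreSQL could look like this (grouping by the primary key person.id is what allows person.* in the SELECT, via functional dependency):
SELECT
    person.*,
    MAX(role.id) AS max_role_id,
    COUNT(person.id) AS role_count
FROM person
INNER JOIN role
    ON person.id = role.person_id
WHERE person.id IN (1, 2, 3, 4)
GROUP BY person.id
ORDER BY role_count DESC;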

Generate column value automatically from other columns values and be used as PRIMARY KEY

I have a table with columns named "source" and "id". This table is populated from open-data DBs.
"id" can't be UNIQUE, since my data comes from other DBs with their own id systems. There is a real risk of getting the same id for really different data.
I want to create another column which combines source and id into a single value:
"openDataA" + 123456789 -> "openDataA123456789"
"openDataB" + 123456789 -> "openDataB123456789"
I have seen examples that use || and functions to concatenate values. This is good, but I want to make this third column my PRIMARY KEY, to avoid duplicates and create a really unique id that I can query without much computation and use as a foreign key constraint in other tables.
I think Composite Types are what I'm looking for, but instead of setting the value manually each time, I want it derived automatically from "source" and "id".
I'm fairly new to PostgreSQL, so any help is welcome.
Thank you.
You could just have a composite key in your table:
CREATE TABLE mytable (
    source VARCHAR(10),
    id VARCHAR(10),
    PRIMARY KEY (source, id)
);
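That composite key can also be the target of a foreign key from other tables, which covers the foreign-key requirement from the question; a sketch (othertable is illustrative):
CREATE TABLE othertable (
    source VARCHAR(10),
    id VARCHAR(10),
    -- reference the composite primary key as a whole
    FOREIGN KEY (source, id) REFERENCES mytable (source, id)
);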
If you really want a joined column, you could create a view to display it:
CREATE VIEW myview AS
SELECT *, source || id AS primary_key
FROM mytable;
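Note that a view cannot carry a PRIMARY KEY constraint, so the concatenated column is for querying convenience only, e.g.:
SELECT * FROM myview WHERE primary_key = 'openDataA123456789';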

PostgreSQL table design

I need to create a table (PostgreSQL 9.1) and I am stuck. Could you possibly help?
The incoming data can assume either of the two formats:
client id(int), shop id(int), asof(date), quantity
client id(int), , asof(date), quantity
The given incoming CSV template is: {client id, shop id, shop type, shop genre, asof, quantity}
In the first case, the key is -- client id, shop id, asof
In the second case, the key is -- client id, shop type, shop genre, asof
I tried something like:
create table (
    client_id int references ...,
    shop_id int references ...,
    shop_type int references ...,
    shop_genre varchar(30),
    asof date,
    quantity real,
    primary key (client_id, shop_id, shop_type, shop_genre, asof)
);
But then I ran into a problem: when the data is of format 1, the inserts fail because of NULLs in the PK.
The queries within a client can be either by shop id, or by a combination of shop type and genre. There are no use cases of partial or regex matches on genre.
What would be a suitable design? Must I split this into 2 tables and then take a union of search results? Or, is it customary to put 0's and blanks for missing values and move along?
If it matters, the table is expected to be 100-500 million rows once all historic data is loaded.
Thanks.
You could try partial unique indexes, aka filtered unique indexes, aka conditional unique indexes.
http://www.postgresql.org/docs/9.2/static/indexes-partial.html
Basically, what it comes down to is that uniqueness is only enforced on the rows matching a WHERE clause. That fits here naturally: one index can cover the rows that have a shop id, and another the rows that don't. For example (of course, test for correctness and impact on performance):
CREATE TABLE client (
    pk_id SERIAL,
    client_id int,
    shop_id int,
    shop_type int,
    shop_genre varchar(30),
    asof date,
    quantity real,
    PRIMARY KEY (pk_id)
);
CREATE UNIQUE INDEX uidx1_client
    ON client
    USING btree
    (client_id, shop_id, asof)
    WHERE shop_id IS NOT NULL;
CREATE UNIQUE INDEX uidx2_client
    ON client
    USING btree
    (client_id, shop_type, shop_genre, asof)
    WHERE shop_id IS NULL;
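With those two indexes in place, each incoming format is deduplicated on its own key; illustrative inserts (the values are made up):
-- format 1: shop id present, unique on (client_id, shop_id, asof)
INSERT INTO client (client_id, shop_id, asof, quantity)
VALUES (1, 10, '2013-01-01', 5.0);
-- format 2: no shop id, unique on (client_id, shop_type, shop_genre, asof)
INSERT INTO client (client_id, shop_type, shop_genre, asof, quantity)
VALUES (1, 3, 'grocery', '2013-01-01', 5.0);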
A simple solution would be to create a field for the primary key which would use one of two algorithms to generate its data depending on what is passed in.
If you wanted a fully normalised solution, you would probably need to split the shop information into two separate tables and have it referenced from this table using outer joins.
You may also be able to use table inheritance available in postgres.
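A minimal sketch of the inheritance idea, assuming one child table per incoming format (the names are illustrative, and each child would still need its own unique index, since indexes are not inherited):
-- common columns shared by both formats
CREATE TABLE fact (
    client_id int,
    asof date,
    quantity real
);
-- format 1 rows: identified by shop id
CREATE TABLE fact_by_shop (
    shop_id int NOT NULL
) INHERITS (fact);
-- format 2 rows: identified by shop type and genre
CREATE TABLE fact_by_genre (
    shop_type int NOT NULL,
    shop_genre varchar(30) NOT NULL
) INHERITS (fact);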