I have a products table and a separate table in which I'd like to store related products. It consists of two fields, create table related_products(id1 int, id2 int), with an index on each field. This means I'd have to search both id1 and id2 for a product id and then pull out the other id field, which seems quite messy. (Of course, one product could have many related products.)
Is there a better table structure for storing the related products that I could use in PostgreSQL?
That is not messy from a database perspective, but exactly the way it should be, as long as only pairs of products can be related.
If you want to make sure that a relationship can be entered only once, you could use a unique index:
CREATE UNIQUE INDEX ON related_products(LEAST(id1, id2), GREATEST(id1, id2));
To search products related to product 42, you could query like this:
SELECT products.*
FROM products
   JOIN (SELECT id2 AS id
         FROM related_products
         WHERE id1 = 42
         UNION ALL
         SELECT id1
         FROM related_products
         WHERE id2 = 42
        ) rel
      USING (id);
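The unique index and the lookup query above can be tried end to end. The following is only a minimal sketch: it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, and because SQLite has no LEAST/GREATEST, the two-argument min()/max() functions are substituted. The index name, the helper functions and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE related_products (id1 INTEGER, id2 INTEGER);
    -- SQLite stand-in for LEAST/GREATEST: two-argument min()/max()
    CREATE UNIQUE INDEX related_pair_uq
        ON related_products (min(id1, id2), max(id1, id2));
    INSERT INTO products VALUES (7, 'cable'), (42, 'laptop'), (99, 'mouse');
    INSERT INTO related_products VALUES (42, 7), (99, 42);
""")

def insert_pair(a, b):
    """Return True if the pair was stored, False if it already exists."""
    try:
        conn.execute("INSERT INTO related_products VALUES (?, ?)", (a, b))
        return True
    except sqlite3.IntegrityError:
        return False

def related_to(product_id):
    """Names of products related to product_id, whichever column holds it."""
    rows = conn.execute("""
        SELECT p.name
        FROM products p
        JOIN (SELECT id2 AS id FROM related_products WHERE id1 = ?
              UNION ALL
              SELECT id1 FROM related_products WHERE id2 = ?) rel
          ON rel.id = p.id
        ORDER BY p.name
    """, (product_id, product_id)).fetchall()
    return [name for (name,) in rows]

# (7, 42) is the same relationship as the stored (42, 7), only reversed
duplicate_accepted = insert_pair(7, 42)
related = related_to(42)
```

The reversed pair is rejected because both orderings map to the same (smaller, larger) index entry, which is the point of the expression index.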
Consider Amazon's product category architecture (one product may have 7 parent categories, another might have 2). I want to build the same thing using Postgres.
A: Is there a scalable, logical way to do this, or must I consider using a graph database?
PS: The project will not be Amazon-big. It is a monolith project, not a microservice.
B: My thought is to have a field named parent_categories in the categories table, holding an array of category UUIDs, and a field named category_id in the products table that points to the last category in the parent chain.
Something like this:
CREATE TABLE categories (
    id UUID PRIMARY KEY NOT NULL DEFAULT gen_random_uuid (),
    name VARCHAR NOT NULL,
    parent_categories UUID[]
);
CREATE TABLE products (
    id UUID PRIMARY KEY NOT NULL DEFAULT gen_random_uuid (),
    name VARCHAR NOT NULL,
    -- a single UUID, not UUID[]: a foreign key cannot point from an array to categories(id)
    category_id UUID,
    CONSTRAINT fk_category FOREIGN KEY(category_id) REFERENCES categories(id)
);
The problem is joining the chained categories. I'm expecting a result like the one below when fetching categories (I'm using Node.js), and I don't know how to join every element of that array.
categories: [{
    id: "id",
    name: "name",
    parent_categories: [{
        id: "id",
        name: "name"
    }]
}]
This question is about relational theory.
You have a pair of tables containing id and name, that's lovely.
Discard the array attributes, and then
CREATE TABLE product_category (
    product_id UUID REFERENCES products(id),
    category_id UUID REFERENCES categories(id),
    PRIMARY KEY (product_id, category_id)
);
Now you are perfectly set up for 3-way JOINs.
Consider adopting the "table names are singular" convention,
rather than the current plural-form names.
Add a parent_id column to categories,
so the table supports self-joins.
Then use WITH RECURSIVE to navigate
the hierarchical tree of categories.
(Classic example in the Oracle documentation
shows how manager can be used for emp
self-joins to produce a deeply nested org chart.)
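To make the junction table, the parent_id column and WITH RECURSIVE concrete, here is a rough sketch. It runs on Python's built-in sqlite3 module rather than PostgreSQL, uses integer ids instead of UUIDs to keep it short, and the sample rows and the category_chain helper are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE categories (
        id        INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        parent_id INTEGER REFERENCES categories(id)  -- NULL for a root
    );
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE product_category (
        product_id  INTEGER REFERENCES products(id),
        category_id INTEGER REFERENCES categories(id),
        PRIMARY KEY (product_id, category_id)
    );
    INSERT INTO categories VALUES
        (1, 'Electronics', NULL), (2, 'Computers', 1), (3, 'Laptops', 2);
    INSERT INTO products VALUES (10, 'Example laptop');
    INSERT INTO product_category VALUES (10, 3);
""")

def category_chain(product_id):
    """Category names for a product, from its own category up to the root."""
    rows = conn.execute("""
        WITH RECURSIVE chain(id, name, parent_id, depth) AS (
            -- start at the categories the product is linked to
            SELECT c.id, c.name, c.parent_id, 0
            FROM product_category pc
            JOIN categories c ON c.id = pc.category_id
            WHERE pc.product_id = ?
            UNION ALL
            -- then repeatedly step to the parent category
            SELECT p.id, p.name, p.parent_id, chain.depth + 1
            FROM categories p
            JOIN chain ON chain.parent_id = p.id
        )
        SELECT name FROM chain ORDER BY depth
    """, (product_id,)).fetchall()
    return [name for (name,) in rows]

chain = category_chain(10)
```

The flat list of rows can then be nested into the desired JSON shape in application code.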
This may be a dumb question, as I am a beginner in PostgreSQL, but here is what I'm trying to do.
I have a table called Products with 3 columns: Name, Price, Expiry Date. I have a second table called Orders with 4 columns: Product, purchasePrice, Amount, and CountryRecieved.
All I want is for the Product column to reference the Products table, so that it has all the information from the Products table.
Is this doable?
The key concepts you need to read up on are:
"normalisation": the process of breaking down data into multiple related entities
"foreign keys": pointers from one database table to another
"joins": the query construct used to follow that pointer and get the data back together
In your case:
You have correctly determined that the information from Products should not just be copied manually into each row of the Orders table. This is one of the most basic aspects of normalisation: each piece of data is in one place, so updates cannot make it inconsistent.
You have deduced that the Orders table needs some kind of Product column; this is your foreign key. The most common way to represent this is to give the Products table an ID column that uniquely identifies each row, and then have a ProductID column in the Orders table. You could also use the product's name as the key, but this means you can never rename a product, as other entities in the database might reference it; integer keys will generally be more efficient in storage and speed, as well.
To use that foreign key relationship, you use a JOIN in your SQL queries. For example, to get the name and quantity of products ordered, you could write:
SELECT
    P.Name,
    O.Amount
FROM
    Products AS P
INNER JOIN
    Orders AS O
    -- This "ON" clause tells the database how to look up the foreign key
    ON O.ProductId = P.ProductId
ORDER BY
    P.Name;
Here I've used an "inner join"; there are also "left outer join" and "right outer join", which can be used when only some rows on one side will meet the condition. I recommend you find a tutorial that explains them better than I can in a single paragraph.
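As a rough illustration of the difference between an inner and a left outer join, and of what the foreign key buys you, here is a small sketch using Python's built-in sqlite3 module in place of PostgreSQL. The OrderId column, the sample rows and the helper function are assumptions made for the example, not part of the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when asked to
conn.executescript("""
    CREATE TABLE Products (
        ProductId INTEGER PRIMARY KEY,
        Name      TEXT NOT NULL,
        Price     REAL
    );
    CREATE TABLE Orders (
        OrderId   INTEGER PRIMARY KEY,
        ProductId INTEGER NOT NULL REFERENCES Products(ProductId),
        Amount    INTEGER
    );
    INSERT INTO Products VALUES (1, 'Apple', 0.5), (2, 'Pear', 0.7);
    INSERT INTO Orders VALUES (100, 1, 12);
""")

# Inner join: only products that have at least one order
inner_rows = conn.execute("""
    SELECT P.Name, O.Amount
    FROM Products AS P
    INNER JOIN Orders AS O ON O.ProductId = P.ProductId
    ORDER BY P.Name
""").fetchall()

# Left outer join: every product, with NULL where no order matches
left_rows = conn.execute("""
    SELECT P.Name, O.Amount
    FROM Products AS P
    LEFT OUTER JOIN Orders AS O ON O.ProductId = P.ProductId
    ORDER BY P.Name
""").fetchall()

def order_unknown_product():
    """Try to order a product id that does not exist in Products."""
    try:
        conn.execute("INSERT INTO Orders VALUES (101, 999, 1)")
        return True
    except sqlite3.IntegrityError:
        return False

fk_violation_accepted = order_unknown_product()
```

The unordered product shows up only in the left outer join, and the foreign key stops an order from pointing at a product that does not exist.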
Assuming the name column is the key in the Products table and the product column in the Orders table refers to it, you can join the two tables on the related column(s) and get all the information:
select
o.*, p.*
from orders o
join products p on o.product = p.name;
I mainly focus on the query operation, not union or intersection.
Here is an example.
Let's say we have a multi-level category:
CATEGORY-TOP-LEVEL:
    CATEGORY1:
        CATEGORY1.1:
            item1
    CATEGORY2:
        CATEGORY2.1:
            item2
Here, item[N] is the data. Category is a tree structure to represent which category the item belongs to.
Now, suppose I'd like to query all data in category 1, the database should give me item1.
Suppose I'd like to query all data in category-top-level, the database should give me item1 and item2.
It's like set theory. Because item1 belongs to CATEGORY1.1, and CATEGORY1.1 belongs to CATEGORY1. Thus item1 belongs to CATEGORY1.
One solution is to use materialized paths: we put a field named path in each item, with a value like ",CATEGORY-TOP-LEVEL,CATEGORY1,CATEGORY1.2". The problem is that this causes a lot of write operations whenever I change a category's name or the category hierarchy.
Can MongoDB support that? If not, is there a database that can?
P.S. Let's take query performance into consideration.
Every modern relational database can support that.
There are different ways of modeling this in a relational database; the most common one is called the "adjacency model" (also known as the adjacency list model):
create table category
(
id integer primary key,
name varchar(100) not null,
parent_category_id integer references category
);
If an item can only ever belong to a single category, the item table would look like this:
create table item
(
   id integer primary key,
   name varchar(100) not null,
   category_id integer not null references category
);
If an item can belong to more than one category, you need a many-to-many relationship (also very common in the relational world).
To get all categories below a certain category you can use a recursive query:
with recursive cat_tree as (
   select id, name, parent_category_id
   from category
   where id = 42 -- this is the category where you want to start
   union all
   select c.id, c.name, c.parent_category_id
   from category c
     join cat_tree p on p.id = c.parent_category_id
)
select *
from cat_tree;
To get the items together with the categories, just join the above to the item table.
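Joining the recursive query to the item table could look roughly like the sketch below. It uses the category tree from the question and runs on Python's built-in sqlite3 module as a stand-in for a full relational database. The numeric ids, the lookup by category name and the items_under helper are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category (
        id INTEGER PRIMARY KEY,
        name VARCHAR(100) NOT NULL,
        parent_category_id INTEGER REFERENCES category
    );
    CREATE TABLE item (
        id INTEGER PRIMARY KEY,
        name VARCHAR(100) NOT NULL,
        category_id INTEGER NOT NULL REFERENCES category
    );
    INSERT INTO category VALUES
        (1, 'CATEGORY-TOP-LEVEL', NULL),
        (2, 'CATEGORY1', 1), (3, 'CATEGORY1.1', 2),
        (4, 'CATEGORY2', 1), (5, 'CATEGORY2.1', 4);
    INSERT INTO item VALUES (1, 'item1', 3), (2, 'item2', 5);
""")

def items_under(category_name):
    """Names of all items in the named category or any of its descendants."""
    rows = conn.execute("""
        WITH RECURSIVE cat_tree AS (
            SELECT id, name, parent_category_id
            FROM category
            WHERE name = ?            -- starting category
            UNION ALL
            SELECT c.id, c.name, c.parent_category_id
            FROM category c
            JOIN cat_tree p ON p.id = c.parent_category_id
        )
        SELECT i.name
        FROM item i
        JOIN cat_tree t ON t.id = i.category_id
        ORDER BY i.name
    """, (category_name,)).fetchall()
    return [name for (name,) in rows]

in_category1 = items_under("CATEGORY1")
in_top_level = items_under("CATEGORY-TOP-LEVEL")
```

Renaming or moving a category touches a single row in category, which avoids the rewrite cost of materialized paths mentioned in the question.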
The above query is standard ANSI SQL.
Other popular models are the nested set model, the materialized path (you mentioned that) and the closure table.
This gets asked a lot. See the tags recursive-query and hierarchical-data for many more examples.
I am trying to create a normalized set of tables for my books and then
to select them ordering by either book title or authors.
I want to be able to have 'n' books per author, and 'n' authors per book.
The problem I want to solve is how to display my books and authors
ordered by title or by lastname, firstname, middlename.
I started with a table like this with some 1441 entries.
create table books(
bookid serial,
title text,
firstname text,
lastname text);
I then created an authors table
create table authors(
authorid serial,
firstname text,
lastname text);
and populated it.
I then created a cross reference table
create table bookAuthor
(
bookId INTEGER NOT NULL REFERENCES books(bookId),
authorId INTEGER NOT NULL REFERENCES authors(authorId)
);
and
create unique index bookAuthor_unique_index on bookAuthor(bookId, authorId);
I then populated the bookauthor table with 1441 entries.
I am pretty sure the three tables are populated correctly. I managed to
do several inserts into the authors table and then insert the correct cross relationships into the bookauthor table.
I am now stuck, I can't figure out how to display my books and authors
ordered by title or by authors names.
Am I going down the wrong path to get N titles per author and N authors per book?
I did multiple searches for foreign keys, and multiple tables with nothing that seemed to resolve my problem.
I'm in a postgresql 9.x environment.
A join will be helpful.
select * from bookAuthor
inner join books on bookAuthor.bookId = books.bookid
inner join authors on bookAuthor.authorId = authors.authorid
order by books.title;
Or you can order by authors.lastname, authors.firstname instead.
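Both orderings can be checked on a tiny data set. This is just a sketch using Python's built-in sqlite3 module instead of PostgreSQL 9.x. The sample books and authors are invented, and the extra sort columns after the first one are my own addition to make the order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE books (bookid INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE authors (
        authorid INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT
    );
    CREATE TABLE bookAuthor (
        bookId   INTEGER NOT NULL REFERENCES books(bookid),
        authorId INTEGER NOT NULL REFERENCES authors(authorid)
    );
    CREATE UNIQUE INDEX bookAuthor_unique_index ON bookAuthor(bookId, authorId);
    INSERT INTO books VALUES (1, 'Zebra Tales'), (2, 'Alpine Walks');
    INSERT INTO authors VALUES (1, 'Ann', 'Young'), (2, 'Bob', 'Abel');
    -- 'Zebra Tales' has two authors; Ann Young wrote two books
    INSERT INTO bookAuthor VALUES (1, 1), (1, 2), (2, 1);
""")

# The ORDER BY list is filled in from fixed strings below, never from user input
QUERY = """
    SELECT books.title, authors.lastname, authors.firstname
    FROM bookAuthor
    INNER JOIN books ON bookAuthor.bookId = books.bookid
    INNER JOIN authors ON bookAuthor.authorId = authors.authorid
    ORDER BY {order_by}
"""

by_title = conn.execute(
    QUERY.format(order_by="books.title, authors.lastname, authors.firstname")
).fetchall()
by_author = conn.execute(
    QUERY.format(order_by="authors.lastname, authors.firstname, books.title")
).fetchall()
```

Each book/author pair produces one row, so a book with two authors appears twice; that is expected with a plain join over the cross-reference table.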
I need to create a table (postgresql 9.1) and I am stuck. Could you possibly help?
The incoming data can assume either of the two formats:
client id(int), shop id(int), asof(date), quantity
client id(int), shop type(int), shop genre(varchar), asof(date), quantity
The given incoming CSV template is: {client id, shop id, shop type, shop genre, asof, quantity}
In the first case, the key is -- client id, shop id, asof
In the second case, the key is -- client id, shop type, shop genre, asof
I tried something like:
create table(
client_id int references...,
shop_id int references...,
shop_type int references...,
shop_genre varchar(30),
asof date,
quantity real,
primary key( client_id, shop_id, shop_type, shop_genre, asof )
);
But then I ran into a problem. When data is of format 1, the inserts fail because of nulls in pk.
The queries within a client can be either by shop id, or by a combination of shop type and genre. There are no use cases of partial or regex matches on genre.
What would be a suitable design? Must I split this into 2 tables and then take a union of search results? Or, is it customary to put 0's and blanks for missing values and move along?
If it matters, the table is expected to be 100-500 million rows once all historic data is loaded.
Thanks.
You could try partial unique indexes (also known as filtered or conditional unique indexes).
http://www.postgresql.org/docs/9.2/static/indexes-partial.html
Basically, uniqueness is enforced only for the rows that match a WHERE clause.
For example (test for correctness and performance impact, of course):
CREATE TABLE client(
pk_id SERIAL,
client_id int,
shop_id int,
shop_type int,
shop_genre varchar(30),
asof date,
quantity real,
PRIMARY KEY (pk_id)
);
CREATE UNIQUE INDEX uidx1_client
ON client
USING btree
(client_id, shop_id, asof, quantity)
WHERE client_id = 200;
CREATE UNIQUE INDEX uidx2_client
ON client
USING btree
(client_id, asof, quantity)
WHERE client_id = 500;
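A variation on the same idea, applied to the two key formats from the question, is sketched below. It assumes that rows of the second format arrive with a NULL shop_id and puts the predicate on that column instead of on client_id. Python's built-in sqlite3 module stands in for PostgreSQL here; the index names, the helper and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE client (
        pk_id      INTEGER PRIMARY KEY,
        client_id  INTEGER NOT NULL,
        shop_id    INTEGER,
        shop_type  INTEGER,
        shop_genre VARCHAR(30),
        asof       DATE NOT NULL,
        quantity   REAL
    );
    -- Format 1: rows that carry a shop_id
    CREATE UNIQUE INDEX uidx_client_shop
        ON client (client_id, shop_id, asof)
        WHERE shop_id IS NOT NULL;
    -- Format 2: rows identified by shop type and genre instead
    CREATE UNIQUE INDEX uidx_client_type_genre
        ON client (client_id, shop_type, shop_genre, asof)
        WHERE shop_id IS NULL;
""")

INSERT = """
    INSERT INTO client
        (client_id, shop_id, shop_type, shop_genre, asof, quantity)
    VALUES (?, ?, ?, ?, ?, ?)
"""

def try_insert(row):
    """Return True if the row was stored, False if a unique index rejected it."""
    try:
        conn.execute(INSERT, row)
        return True
    except sqlite3.IntegrityError:
        return False

results = [
    try_insert((1, 10, None, None, "2013-01-01", 5.0)),    # format 1
    try_insert((1, 10, None, None, "2013-01-01", 9.0)),    # same format 1 key
    try_insert((1, None, 3, "books", "2013-01-01", 2.0)),  # format 2
    try_insert((1, None, 3, "books", "2013-01-01", 4.0)),  # same format 2 key
    try_insert((1, None, 3, "music", "2013-01-01", 4.0)),  # different genre
]
```

Note that quantity is left out of both indexes in this sketch, so two rows with the same key but different quantities count as duplicates; whether that is the right rule depends on the data.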
A simple solution would be to create a field for the primary key which would use one of two algorithms to generate its data depending on what is passed in.
If you wanted a fully normalised solution, you would probably need to split the shop information into two separate tables and have it referenced from this table using outer joins.
You may also be able to use the table inheritance available in PostgreSQL.