Is there a database that can represent a set-like structure? - mongodb

I mainly focus on the query operation, not union or intersection.
Here is an example.
Let's say we have a multi-level category:
CATEGORY-TOP-LEVEL:
    CATEGORY1:
        CATEGORY1.1:
            item1
    CATEGORY2:
        CATEGORY2.1:
            item2
Here, item[N] is the data. Category is a tree structure to represent which category the item belongs to.
Now, suppose I'd like to query all data in category 1, the database should give me item1.
Suppose I'd like to query all data in category-top-level, the database should give me item1 and item2.
It's like set theory: item1 belongs to CATEGORY1.1, and CATEGORY1.1 belongs to CATEGORY1, so item1 belongs to CATEGORY1.
One solution is to use Materialized Paths: we put a field named path in each item, with a value like ",CATEGORY-TOP-LEVEL,CATEGORY1,CATEGORY1.2". The problem is that this causes a lot of write operations when I change a category's name or its position in the hierarchy.
Can MongoDB support that? If not, is there a database that can?
P.S. Let's take query performance into consideration.
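To make the trade-off above concrete, here is a minimal sketch of the materialized-path idea. It runs on SQLite through Python's sqlite3 module purely so it is self-contained; it is not MongoDB. The trailing comma on each path and the new name CAT-ONE are assumptions added for illustration (the trailing comma keeps ",CATEGORY1," from matching ",CATEGORY1.1").

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (name TEXT PRIMARY KEY, path TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO item VALUES (?, ?)",
    [
        # Assumption: paths carry a trailing comma so names match exactly.
        ("item1", ",CATEGORY-TOP-LEVEL,CATEGORY1,CATEGORY1.1,"),
        ("item2", ",CATEGORY-TOP-LEVEL,CATEGORY2,CATEGORY2.1,"),
    ],
)


def items_in(category):
    # An item belongs to a category if ",<category>," occurs in its path.
    rows = conn.execute(
        "SELECT name FROM item WHERE instr(path, ?) > 0 ORDER BY name",
        (f",{category},",),
    )
    return [row[0] for row in rows]


in_category1 = items_in("CATEGORY1")
in_top_level = items_in("CATEGORY-TOP-LEVEL")

# The cost described in the question: renaming a category rewrites the
# path of every item stored below it.
rewritten = conn.execute(
    "UPDATE item SET path = replace(path, ',CATEGORY1,', ',CAT-ONE,') "
    "WHERE instr(path, ',CATEGORY1,') > 0"
).rowcount
```

Reads are a single scan or index lookup on path, while a rename touches one row per item under the renamed category.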

Every modern relational database can support that.
There are different ways of modeling this in a relational database. The most common one is called the "adjacency model":
create table category
(
id integer primary key,
name varchar(100) not null,
parent_category_id integer references category
);
If an item can only ever belong to a single category, the item table would look like this:
create table item
(
id integer primary key,
name varchar(100) not null,
category_id integer not null references category
);
If an item can belong to more than one category, you need a many-to-many relationship (also very common in the relational world).
To get all categories below a certain category you can use a recursive query:
with recursive cat_tree as (
select id, name, parent_category_id
from category
where id = 42 -- this is the category where you want to start
union all
select c.id, c.name, c.parent_category_id
from category c
join cat_tree p on p.id = c.parent_category_id
)
select *
from cat_tree;
To get the items together with the categories, just join the above to the item table.
The above query is standard ANSI SQL.
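As a rough sketch of joining the recursive query to the item table, the following loads the example tree from the question and lists the items below a given category. It uses SQLite via Python's sqlite3 module only to be runnable without a server; the ids 1 to 5, the helper items_below, and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table category (
    id integer primary key,
    name varchar(100) not null,
    parent_category_id integer references category
);
create table item (
    id integer primary key,
    name varchar(100) not null,
    category_id integer not null references category
);
insert into category values
    (1, 'CATEGORY-TOP-LEVEL', null),
    (2, 'CATEGORY1', 1),
    (3, 'CATEGORY1.1', 2),
    (4, 'CATEGORY2', 1),
    (5, 'CATEGORY2.1', 4);
insert into item values (1, 'item1', 3), (2, 'item2', 5);
""")

# Walk down from the starting category, then join the collected
# category ids to the item table.
ITEMS_BELOW = """
with recursive cat_tree as (
    select id from category where id = ?
    union all
    select c.id
    from category c
      join cat_tree p on p.id = c.parent_category_id
)
select i.name
from item i
  join cat_tree t on t.id = i.category_id
order by i.name
"""


def items_below(category_id):
    return [row[0] for row in conn.execute(ITEMS_BELOW, (category_id,))]


under_category1 = items_below(2)
under_top_level = items_below(1)
```

Renaming or moving a category here changes a single row in category; no item rows need rewriting.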
Other popular models are the nested set model, the materialized path (you mentioned that) and the closure table.
This gets asked a lot. See the tags recursive-query and hierarchical-data for many more examples.

Related

how to relate many rows to a row from the same table postgresql?

Consider Amazon's product category architecture (one product may have 7 parent categories, another might have 2). I want to build the same thing using Postgres.
A: Is there a scalable, logical way to do this, or must I consider using a graph database?
P.S. The project will not be Amazon-big. It is a monolith, not a microservice.
B: My thought is that the categories table should have a field named parent_categories, an array of category UUIDs, and the products table should have a field named category_id that points to the last category in the chain.
Something like this:
CREATE TABLE categories (
id UUID PRIMARY KEY NOT NULL DEFAULT gen_random_uuid (),
name VARCHAR NOT NULL,
parent_categories UUID[]
);
CREATE TABLE products (
id UUID PRIMARY KEY NOT NULL DEFAULT gen_random_uuid (),
name VARCHAR NOT NULL,
category_id UUID[],
CONSTRAINT fk_category FOREIGN KEY(category_id) REFERENCES categories(id)
);
The problem is joining the chained categories. I'm expecting a result like the one below when fetching categories (I'm using Node.js), and I don't know how to join every element of that array.
categories: [{
id: "id",
name: "name",
parent_categories: [{
id: "id",
name: "name"
}]
}]
This question is about relational theory.
You have a pair of tables containing id and name, that's lovely.
Discard the array attributes, and then
CREATE TABLE product_category (
product_id UUID REFERENCES products(id),
category_id UUID REFERENCES categories(id),
PRIMARY KEY (product_id, category_id)
);
Now you are perfectly set up for 3-way JOINs.
Consider adopting the "table names are singular" convention,
rather than the current plural-form names.
Add a parent_id column to categories,
so the table supports self-joins.
Then use WITH RECURSIVE to navigate
the hierarchical tree of categories.
(The classic example in the Oracle documentation
shows how manager can be used for emp
self-joins to produce a deeply nested org chart.)
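A minimal sketch of that layout, assuming integer ids instead of UUIDs for brevity and SQLite (via Python's sqlite3) instead of Postgres so it runs standalone. The sample rows, the CHAIN query, and the helper categories_of are made up for illustration. It returns each category of a product together with its chain of parents, roughly the shape asked for in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE categories (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    parent_id INTEGER REFERENCES categories(id)
);
CREATE TABLE products (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE product_category (
    product_id INTEGER REFERENCES products(id),
    category_id INTEGER REFERENCES categories(id),
    PRIMARY KEY (product_id, category_id)
);
INSERT INTO categories VALUES
    (1, 'Electronics', NULL),
    (2, 'Audio', 1),
    (3, 'Headphones', 2),
    (4, 'Gifts', NULL);
INSERT INTO products VALUES (10, 'Example headphones');
INSERT INTO product_category VALUES (10, 3), (10, 4);
""")

# Depth 0 is the category the product is filed under; each recursive
# step climbs one parent_id link until a root category is reached.
CHAIN = """
WITH RECURSIVE chain(category_id, node_id, depth) AS (
    SELECT pc.category_id, pc.category_id, 0
    FROM product_category pc
    WHERE pc.product_id = ?
    UNION ALL
    SELECT chain.category_id, c.parent_id, chain.depth + 1
    FROM chain
      JOIN categories c ON c.id = chain.node_id
    WHERE c.parent_id IS NOT NULL
)
SELECT leaf.name, node.name, chain.depth
FROM chain
  JOIN categories leaf ON leaf.id = chain.category_id
  JOIN categories node ON node.id = chain.node_id
ORDER BY chain.category_id, chain.depth
"""


def categories_of(product_id):
    result = []
    for leaf, node, depth in conn.execute(CHAIN, (product_id,)):
        if depth == 0:
            result.append({"name": leaf, "parent_categories": []})
        else:
            result[-1]["parent_categories"].append(node)
    return result


nested = categories_of(10)
```

The nesting is assembled in application code here; the database only returns flat rows ordered so that each category is followed by its ancestors.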

PostgreSQL hierarchy(?) structure

Please excuse my ignorance. I'm certain this is a FAQ, but I don't know the terminology well enough to know what to look for.
My company uses the following structure in terms of territory (example following):
Customer -> Market -> Area -> District -> Region
XYZ Co. -> Queens -> NYC -> Mid Atlantic -> Northeast
Each customer has only one market. Each market has only one area, and so forth. (I'm not sure if you'd call that one-to-many or many-to-one. I don't want to label it incorrectly.)
This is how I have things set up right now:
create table region(
id int not null primary key,
name varchar(24)
);
create table district(
id int not null primary key,
name varchar(24),
region_id int references region(id) on update cascade
);
create table area(
id int not null primary key,
name varchar(24),
district_id int references district(id) on update cascade
);
create table market(
id int not null primary key,
name varchar(24),
area_id int references area(id) on update cascade
);
create table customer(
id int not null primary key,
name varchar(32),
sixweekavg numeric,
market_id int references market(id) on update cascade
);
Right now I have an opportunity to improve that setup as I'm more or less rewriting the site. I looked at this popular page:
What are the options for storing hierarchical data in a relational database?
And I'm sure that my best scenario lies there, but I don't know enough to figure out which one.
It's a reporting site, so there are way more reads than writes. Some of my pages show aggregated data at each level, customer through region (and top, too). So right now on a page that shows district-level data I would write something like:
select d.name, sum(sixweekavg) as avg from customer c
inner join market m on m.id = c.market_id
inner join area a on a.id = m.area_id
inner join district d on d.id = a.district_id
group by d.name order by d.name;
Pretty standard stuff, right? I'm sure a whole separate conversation could be had about materialized views, but for now I'd like to explore a better option for structuring the hierarchy (if that's even the correct term for this).
So given the following summary
PostgreSQL (it can be assumed this will not change)
Fixed hierarchy (my employer may at some point add or remove a tier, but every row in the customer table will always have the same number of "parents")
Significantly more reads than writes
Is there one method that may be better than the others for setting this up?
ltree
I did look at ltree, but I'm not quite sure how that would work. On pages where a user can select a district, for example, I query the district table for the names of each district. I had the idea to add an ltree column in my customers table which would hold the hierarchy, but still maintain the other tables. Is that a feasible and reasonable approach? I've searched for real-world examples of ltree but came up short - most that I found were designed for a random number of parent/child nodes, like a threaded comment section.
I appreciate your help and your patience!

Storing and accessing related objects in postgres

I have a products table and a separate table in which I'd like to store related products. It consists of two fields, create table related_products(id1 int, id2 int), with an index on each field. This would mean that I'd have to search both id1 and id2 for a product id, then pull out the other id field, which seems quite messy. (Of course, one product could have many related products.)
Is there a better table structure for storing the related products that I could use in postgresql?
That is not messy from a database perspective, but exactly the way it should be, as long as only pairs of products can be related.
If you want to make sure that a relationship can be entered only once, you could use a unique index:
CREATE UNIQUE INDEX ON related_products(LEAST(id1, id2), GREATEST(id1, id2));
To search products related to product 42, you could query like this:
SELECT products.*
FROM products
JOIN (SELECT id2 AS id
FROM related_products
WHERE id1 = 42
UNION ALL
SELECT id1
FROM related_products
WHERE id2 = 42
) rel
USING (id);
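Here is a small runnable sketch of both pieces, using SQLite through Python's sqlite3 module as a stand-in for PostgreSQL. SQLite has no LEAST/GREATEST, so the index uses its two-argument min()/max() instead. The index name related_pair and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE related_products (id1 INT, id2 INT);
-- Stand-in for the PostgreSQL index on LEAST(id1, id2), GREATEST(id1, id2).
CREATE UNIQUE INDEX related_pair
    ON related_products (min(id1, id2), max(id1, id2));
INSERT INTO products VALUES (7, 'seven'), (42, 'forty-two'), (99, 'ninety-nine');
INSERT INTO related_products VALUES (42, 7), (99, 42);
""")

# The pair (7, 42) is the same relationship as (42, 7), so the unique
# index should refuse it.
try:
    conn.execute("INSERT INTO related_products VALUES (7, 42)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Product 42 may sit in either column, hence the two branches.
rows = conn.execute("""
SELECT products.id
FROM products
  JOIN (SELECT id2 AS id FROM related_products WHERE id1 = 42
        UNION ALL
        SELECT id1 FROM related_products WHERE id2 = 42
       ) rel
  USING (id)
""")
related_to_42 = sorted(row[0] for row in rows)
```

Because the index already guarantees each pair appears once, UNION ALL cannot produce duplicates here and avoids the extra de-duplication step that UNION would perform.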

How to Carry over a Table into a Column PostgreSQL

This may be a dumb question, as I am a beginner in PostgreSQL, but here is what I'm trying to do.
I have a table called Products with 3 columns: Name, Price, Expiry Date. I have a second table called Orders with 4 columns: Product, purchasePrice, Amount, and CountryRecieved.
All I want is for the Product column to reference the Products table so that it has all the information from the Products table.
Is this doable?
The key concepts you need to read up on are:
"normalisation": the process of breaking down data into multiple related entities
"foreign keys": pointers from one database table to another
"joins": the query construct used to follow that pointer and get the data back together
In your case:
You have correctly determined that the information from Products should not just be copied manually into each row of the Orders table. This is one of the most basic aspects of normalisation: each piece of data is in one place, so updates cannot make it inconsistent.
You have deduced that the Orders table needs some kind of Product column; this is your foreign key. The most common way to represent this is to give the Products table an ID column that uniquely identifies each row, and then have a ProductID column in the Orders table. You could also use the product's name as the key, but this means you can never rename a product, as other entities in the database might reference it; integer keys will generally be more efficient in storage and speed, as well.
To use that foreign key relationship, you use a JOIN in your SQL queries. For example, to get the name and quantity of products ordered, you could write:
SELECT
P.Name,
O.Amount
FROM
Products as P
INNER JOIN
Orders as O
-- This "ON" clause tells the database how to look up the foreign key
ON O.ProductId = P.ProductId
ORDER BY
P.Name
Here I've used an "inner join"; there are also "left outer join" and "right outer join", which can be used when only some rows on one side will meet the condition. I recommend you find a tutorial that explains them better than I can in a single paragraph.
Assuming the name column is the key in the Products table and the product column in the Orders table refers to it, you can join the two tables on the related column(s) and get all the information:
select
o.*, p.*
from orders o
join products p on o.product = p.name;
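To tie the integer-key suggestion together, here is a hedged end-to-end sketch: an ID column on Products, a ProductId foreign key on Orders, and the join. It runs on SQLite via Python's sqlite3 so it needs no server. The OrderId column, the column types, and all sample rows are assumptions; CountryRecieved keeps the spelling from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when asked; PostgreSQL always does.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Products (
    ProductId INTEGER PRIMARY KEY,
    Name TEXT NOT NULL,
    Price NUMERIC NOT NULL,
    ExpiryDate TEXT
);
CREATE TABLE Orders (
    OrderId INTEGER PRIMARY KEY,
    ProductId INTEGER NOT NULL REFERENCES Products(ProductId),
    PurchasePrice NUMERIC NOT NULL,
    Amount INTEGER NOT NULL,
    CountryRecieved TEXT
);
INSERT INTO Products VALUES
    (1, 'Milk', 1.50, '2030-01-01'),
    (2, 'Rice', 3.00, NULL);
INSERT INTO Orders VALUES
    (100, 2, 2.75, 10, 'FR'),
    (101, 1, 1.40, 4, 'DE');
""")

# Follow the foreign key from each order back to its product.
ordered = conn.execute("""
SELECT P.Name, O.Amount
FROM Products AS P
  INNER JOIN Orders AS O ON O.ProductId = P.ProductId
ORDER BY P.Name
""").fetchall()

# An order that points at a product that does not exist is refused.
try:
    conn.execute("INSERT INTO Orders VALUES (102, 999, 1.00, 1, 'US')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

With this layout a product can be renamed by updating one row in Products, and every order picks up the new name through the join.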

How to create a hierarchy of categories and subcategories with ENUMs in Postgres

I am trying to create a hierarchy of categories and subcategories. The ENUM type makes the most sense to me, but I am trying to figure out how to create the tables to model my data. I have a list of categories. I also have a list of subcategories that are going to be in a 1 to many relation with the category. Here are my ENUMs.
CREATE TYPE categories as ENUM('Casting', 'Cutting');
CREATE TYPE casting_subs as ENUM('Composite', 'Metal', 'Plastic', 'Rubber', 'Foam');
CREATE TYPE cutting_subs as ENUM('Die Cutting', 'Flame', 'Laser', 'Plasma', 'Saw', 'Waterjet');
How do I create the tables that models my data?
Using an enum here makes very little sense, actually. You're doing it correctly already if that is what you really want, i.e. without a table. What you probably want instead is a hierarchical table:
create table categories (
id serial primary key,
parent int references categories (id),
name varchar not null
);
Start filling it like:
insert into categories (name)
values ('Casting'), ('Cutting');
Then fill subcategories using their relevant id, like:
insert into categories (parent, name)
values (1, 'Composite'), (1, 'Metal');
To query the full hierarchy, you can use a CTE:
http://www.postgresql.org/docs/current/static/queries-with.html
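For illustration, a recursive CTE over that table might look like the sketch below. It runs on SQLite via Python's sqlite3 so it is self-contained, with integer primary key standing in for serial. The ' > ' separator, the extra 'Laser' row, and the CTE name tree are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table categories (
    id integer primary key,  -- stands in for PostgreSQL's serial
    parent int references categories (id),
    name varchar not null
);
insert into categories (name)
values ('Casting'), ('Cutting');
insert into categories (parent, name)
values (1, 'Composite'), (1, 'Metal'), (2, 'Laser');
""")

# Start from the top-level categories (no parent) and extend the path
# by one name for each level of subcategory.
rows = conn.execute("""
with recursive tree(id, path) as (
    select id, name from categories where parent is null
    union all
    select c.id, tree.path || ' > ' || c.name
    from categories c
      join tree on tree.id = c.parent
)
select path from tree order by path
""").fetchall()
paths = [row[0] for row in rows]
```

Adding a subcategory is then a single insert, with no type to alter as there would be with an enum.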