Postgres database: how to model multiple attributes that can have also multiple value, and have relations to other two entities - postgresql

I have three entities, Items, Categories, and Attributes.
An Item can be in one or multiple Categories, so there is N:M relation.
Item ItemCategories Categories
id name item_id category_id id name
1 alfa 1 1 1 chipset
1 2 2 interface
An Item can have multiple Attributes depending on the 'Categories' they are in.
For example, the items in Category 'chipset' can have as attributes: 'interface', 'memory' 'tech'.
These attributes have a set of predefined values that don't change often, but they can change.
For example: 'memory' can only be ddr2, ddr3, ddr4.
Attributes CategoryAttributes
id name values category_id attribute_id
1 memory {ddr2, ddr3, ddr4} 1 1
An Item that is in the 'chipset' Category has access to the Attribute and can only have Null or the predefined value of the attribute.
I thought to use Enum or Json for Attribute values, but I have two other conditions:
ItemAttributes
item_id attribute_id value
1 1 {ddr2, ddr4}
1) If an Attribute appears in 2 Categories, and an Ithe is in both categories, only once an attribute can be shown.
2) I need to use the value with rank, so if two corresponding attribute values appear for an item, the rank should be greater if it is only one, or the value doesn't exist.
3)Creating separate tables for Attributes is not an option, because the number is not fixed, and can be big.
So, I don't know exactly the best options in the database design are to constrain the values and use for order ranking.

The problem you are describing is a typical open schema or vertical database, which is a classic use case for some kind of EAV model.
EAV is a complex yet powerful paradigm that allows a potentially open schema while respecting the database normal forms and allows to have what you need: having a variable number of attributes depending on specific instances of the same entity.
That is what happens typically in e-commerce using relational database since different products have different attributes (i.e a lipstick has color, but maybe for a hard drive you dont care about color but about capacity) and it doesn't make sense to have one attribute table, because the number is not fixed and can be big, and for most rows, there would be a lot of NULL values (that is the mathematical notion of a sparse matrix, that looks very ugly in a DB table)
You can take a look at Magento DB Model, a true reference in pure EAV at scale, or Wikipedia, but probably you can do that later, and for now, you just need the basics:
The basic idea is to store attributes, and their corresponding values as rows, instead of columns, in a single table.
In the simpler implementation the table has at least three columns: entity (usually a foreign key to an entity, or entity type/category), attribute (this can be a string, o a foreign key in more complex systems), and value.
In my previous example, oversimplifying, we could have a table like this, that lists attribute names and its values for
Item table Attributes table
+------+--------------+ +-------------+-----------+-------------+
| id | name | | item_id | attribute | value |
+------+--------------+ +-------------+-----------+-------------+
| 1 | "hard drive" | | 2 | "color" | "red" |
+------+--------------+ +-------------+-----------+-------------+
| 2 | "lipstick" | | 2 | "price" | 10 |
+------+--------------+ +-------------+-----------+-------------+
| 1 | "capacity"| "1TB" |
+-------------+-----------+-------------+
| 1 | "price" | 200 |
+-------------+-----------+-------------+
So for every item, you can have a list of attributes.
Since your model is more complex, has a few more constraints, so we need to adapt this model.
Since you want to limit the possible values, you will need a table for values
Since you will have a values table, the values hast to refer to an attribute, so you need the attributes to have an id, so you will have an attribute table
to make explicit and strict what categories have what attribute, you need a category-attribute table
With this, you end up with something like
Categories table
List of categories ids and names
+------+--------------+
| id | name |
+------+--------------+
| 1 | "chipset" |
+------+--------------+
| 2 | "interface" |
+------+--------------+
Attributes table
List of attribute ids and their name
+------+--------------+
| id | name |
+------+--------------+
| 1 | "interface" |
+------+--------------+
| 2 | "memory" |
+------+--------------+
| 3 | "tech" |
+------+--------------+
| 4 | "price" |
+------+--------------+
Category-Attribute table
What category has what attributes. Note that one attribute (i.e 4) can belong to 2 categories
+--------------+--------------+
| attribute_id | category_id |
+--------------+--------------+
| 1 | 1 |
+--------------+--------------+
| 2 | 1 |
+--------------+--------------+
| 3 | 1 |
+--------------+--------------+
| 4 | 1 |
+--------------+--------------+
| 4 | 2 |
+--------------+--------------+
Value table
List of possible values for every attribute
+----------+--------------+--------+
| value_id | attribute_id | value |
+-------------+-----------+--------+
| 1 | 2 | "ddr2" |
+----------+--------------+--------+
| 2 | 2 | "ddr3" |
+----------+--------------+--------+
| 3 | 2 | "ddr4" |
+----------+--------------+--------+
| 4 | 3 |"tech_1"|
+----------+--------------+--------+
| 5 | 3 |"tech_2"|
+----------+--------------+--------+
| 6 | ... | ... |
+----------+--------------+--------+
| 7 | ... | ... |
And finally, what you can imagine, the
Item-Attribute table will list one attribute value per row
+----------+--------------+-------+
| item_id | attribute_id | value |
+----------+-----------+----------+
| 1 | 2 | 1 |
+----------+--------------+-------+
| 1 | 2 | 3 |
+----------+--------------+-------+
Meaning that item 1, for attribute 2 (`memory`), has values 1 and 3 (`ddr2` and `ddr3`)
This will cover all your conditions:
Number of attributes is unlimited, as big as needed and not fixed
You can define clearly what category has what attributes
Two categories can have the same attribute
If 1 item belongs to two categories that have the same attribute, you can show only one (ie SELECT * from Category-Attribute where category_id in (SELECT category_id from ItemCategories where item_id = ...) will give you the list of eligible attributes, only one of each even if 2 categories had the same
You can do a rank, I think I dont have enough info for this query, but being this a fully normalized model, definitely, you can do a rank. You have here pretty much the full model, so surely you can figure out the query.
This is very similar to the model that Magento uses. It is very powerful but of course, it can get hard to manage, but it is the best way if we want to keep the model strict and make sure that it will enforce the constraints and that will accept all the SQL functions.
For systems less strict, it is always an option to go for a NoSQL database with much more flexible schemas.

Related

Hibernate - SQL query: How to get all child descandants starting with specific node

I have the following sample data (items) with some kind of recursion. For the sake of simplicity I limited the sample to 2 level. Matter of fact - they could grow quite deep.
+----+--------------------------+----------+------------------+-------+
| ID | Item - Name | ParentID | MaterializedPath | Color |
+----+--------------------------+----------+------------------+-------+
| 1 | Parent 1 | null | 1 | green |
| 2 | Parent 2 | null | 2 | green |
| 4 | Parent 2 Child 1 | 2 | 2.4 | orange|
| 6 | Parent 2 Child 1 Child 1 | 4 | 2.4.6 | red |
| 7 | Parent 2 Child 1 Child 2 | 4 | 2.4.7 | orange|
| 3 | Parent 1 Child 1 | 1 | 1.3 | orange|
| 5 | Parent 1 Child 1 Child | 3 | 1.3.5 | red |
+----+--------------------------+----------+------------------+-------+
I need to get via SQL all children
which are not orange
for a given starting ID
with either starting ID=1. The result should be 1, 1.3.5. When start with ID=4 the should be 2.4.6.
I read little bit and found the CTE should be used. I tried the following simplified definition
WITH w1( id, parent_item_id) AS
( SELECT
i.id,
i.parent_item_id
FROM
item i
WHERE
id = 4
UNION ALL
SELECT
i.id,
i.parent_item_id
FROM
item, JOIN w1 ON i.parent_item_id = w1.id
);
However, this won't even be executed as SQL-statement. I have several question to this:
CTE could be used with Hibernate?
Is there a way have the result via SQL queries? (more or less as recursive pattern)
I'm somehow lost with the recursive pattern combined with selection of color for the end result.
Your query is invalid for the following reasons:
As documented in the manual a recursive CTE requires the RECURSIVE keyword
Your JOIN syntax is wrong. You need to remove the , and give the items table an alias.
If you need the color column, just add it to both SELECTs inside the CTE and filter the rows in the final SELECT.
If that is changed, the following works fine:
WITH recursive w1 (id, parent_item_id, color) AS
(
SELECT i.id,
i.parent_item_id,
i.color
FROM item i
WHERE id = 4
UNION ALL
SELECT i.id,
i.parent_item_id,
i.color
FROM item i --<< missing alias
JOIN w1 ON i.parent_item_id = w1.id
)
select *
from w1
where color <> 'orange'
Note that the column list for the CTE definition is optional, so you can just write with recursive w1 as ....

Database design with multiple units and multiple attributes

Supposing I have this database design which I have researched.
Table: Products
ProductId | Name | BaseUnitId
1 | Lab gown | 1
2 | Gloves | 1
FK: BaseUnitId references Units.UnitId
Table: Units
UnitId | Name
1 | Each / Pieces
2 | Dozen
3 | Box
Table: Unit Conversion
ProdID | BaseUnitID | Factor | ConvertToUnitID
1 | 1 | 12 | 2
2 | 1 | 100 | 3
FK: BaseUnitId references Units.UnitID
FK: ConvertToUnitId references Units.UnitID
Table: Product Attribute
AttribId | Prod_ID | Attribute | Value
1 | 1 | Color | Blue
2 | 1 | Size | Large
3 | 2 | Color | Violet
4 | 2 | Size | Small
5 | 2 | Size | Medium
6 | 2 | Size | Large
7 | 2 | Color | White
FK: Prod_ID references Product.ProductID
Table: Inventory
Prod_ID | Base Unit Qty | Expiry
1 | 12 | n/a
2 | 100 | 2020-01-01
2 | 100 | 2021-12-31
FK: Prod_ID references Product.ProductID
How can I breakdown the inventory per unit per attribute?
e.g How can I get the inventory of SMALL VIOLET GLOVES? LARGE WHITE GLOVES?
Any suggestions? My idea is to create another table which will link product unit, product attribute and quantity.
But I dont know how to link the size attribute and color attribute to a unit.
Lastly, is there something wrong with this design?
I think it is quite wrong to split off the attributes of a product into a different table. I understand the desire to normalize, but it should be done differently.
I'd handle a product and its attributes like this:
CREATE TABLE product (
id bigint PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
name text NOT NULL,
baseunit_id bigint NOT NULL REFERENCES unit
);
CREATE TABLE inventory (
id bigint PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
product_id bigint NOT NULL REFERENCES product,
color integer REFERENCES product_color,
size integer REFERENCES product_size,
other_attributes jsonb
);
That also makes sense if you think about in natural language terms: “How many dozens of large blue gloves do we have on store?”
Attributes that do not apply to a certain product can be left NULL.
I make a distinction between common and rare attributes. Common attributes have their own column. Rare attributes are bunched together in a jsonb column. I know that the latter is not normalized nor pretty, but varying attributes are not very suited for a relational model. A GIN index on the column will allow searches to be efficient.

How to get back aggregate values across 2 dimensions using Python Cubes?

Situation
Using Python 3, Django 1.9, Cubes 1.1, and Postgres 9.5.
These are my datatables in pictorial form:
The same in text format:
Store table
------------------------------
| id | code | address |
|-----|------|---------------|
| 1 | S1 | Kings Row |
| 2 | S2 | Queens Street |
| 3 | S3 | Jacks Place |
| 4 | S4 | Diamonds Alley|
| 5 | S5 | Hearts Road |
------------------------------
Product table
------------------------------
| id | code | name |
|-----|------|---------------|
| 1 | P1 | Saucer 12 |
| 2 | P2 | Plate 15 |
| 3 | P3 | Saucer 13 |
| 4 | P4 | Saucer 14 |
| 5 | P5 | Plate 16 |
| and many more .... |
|1000 |P1000 | Bowl 25 |
|----------------------------|
Sales table
----------------------------------------
| id | product_id | store_id | amount |
|-----|------------|----------|--------|
| 1 | 1 | 1 |7.05 |
| 2 | 1 | 2 |9.00 |
| 3 | 2 | 3 |1.00 |
| 4 | 2 | 3 |1.00 |
| 5 | 2 | 5 |1.00 |
| and many more .... |
| 1000| 20 | 4 |1.00 |
|--------------------------------------|
The relationships are:
Sales belongs to Store
Sales belongs to Product
Store has many Sales
Product has many Sales
What I want to achieve
I want to use cubes to be able to do a display by pagination in the following manner:
Given the stores S1-S3:
-------------------------
| product | S1 | S2 | S3 |
|---------|----|----|----|
|Saucer 12|7.05|9 | 0 |
|Plate 15 |0 |0 | 2 |
| and many more .... |
|------------------------|
Note the following:
Even though there were no records in sales for Saucer 12 under Store S3, I displayed 0 instead of null or none.
I want to be able to do sort by store, say descending order for, S3.
The cells indicate the SUM total of that particular product spent in that particular store.
I also want to have pagination.
What I tried
This is the configuration I used:
"cubes": [
{
"name": "sales",
"dimensions": ["product", "store"],
"joins": [
{"master":"product_id", "detail":"product.id"},
{"master":"store_id", "detail":"store.id"}
]
}
],
"dimensions": [
{ "name": "product", "attributes": ["code", "name"] },
{ "name": "store", "attributes": ["code", "address"] }
]
This is the code I used:
result = browser.aggregate(drilldown=['Store','Product'],
order=[("Product.name","asc"), ("Store.name","desc"), ("total_products_sale", "desc")])
I didn't get what I want.
I got it like this:
----------------------------------------------
| product_id | store_id | total_products_sale |
|------------|----------|---------------------|
| 1 | 1 | 7.05 |
| 1 | 2 | 9 |
| 2 | 3 | 2.00 |
| and many more .... |
|---------------------------------------------|
which is the whole table with no pagination and if the products not sold in that store it won't show up as zero.
My question
How do I get what I want?
Do I need to create another data table that aggregates everything by store and product before I use cubes to run the query?
Update
I have read more. I realised that what I want is called dicing as I needed to go across 2 dimensions. See: https://en.wikipedia.org/wiki/OLAP_cube#Operations
Cross-posted at Cubes GitHub issues to get more attention.
This is a pure SQL solution using crosstab() from the additional tablefunc module to pivot the aggregated data. It typically performs better than any client-side alternative. If you are not familiar with crosstab(), read this first:
PostgreSQL Crosstab Query
And this about the "extra" column in the crosstab() output:
Pivot on Multiple Columns using Tablefunc
SELECT product_id, product
, COALESCE(s1, 0) AS s1 -- 1. ... displayed 0 instead of null
, COALESCE(s2, 0) AS s2
, COALESCE(s3, 0) AS s3
, COALESCE(s4, 0) AS s4
, COALESCE(s5, 0) AS s5
FROM crosstab(
'SELECT s.product_id, p.name, s.store_id, s.sum_amount
FROM product p
JOIN (
SELECT product_id, store_id
, sum(amount) AS sum_amount -- 3. SUM total of product spent in store
FROM sales
GROUP BY product_id, store_id
) s ON p.id = s.product_id
ORDER BY s.product_id, s.store_id;'
, 'VALUES (1),(2),(3),(4),(5)' -- desired store_id's
) AS ct (product_id int, product text -- "extra" column
, s1 numeric, s2 numeric, s3 numeric, s4 numeric, s5 numeric)
ORDER BY s3 DESC; -- 2. ... descending order for S3
Produces your desired result exactly (plus product_id).
To include products that have never been sold replace [INNER] JOIN with LEFT [OUTER] JOIN.
SQL Fiddle with base query.
The tablefunc module is not installed on sqlfiddle.
Major points
Read the basic explanation in the reference answer for crosstab().
I am including with product_id because product.name is hardly unique. This might otherwise lead to sneaky errors conflating two different products.
You don't need the store table in the query if referential integrity is guaranteed.
ORDER BY s3 DESC works, because s3 references the output column where NULL values have been replaced with COALESCE. Else we would need DESC NULLS LAST to sort NULL values last:
PostgreSQL sort by datetime asc, null first?
For building crosstab() queries dynamically consider:
Dynamic alternative to pivot with CASE and GROUP BY
I also want to have pagination.
That last item is fuzzy. Simple pagination can be had with LIMIT and OFFSET:
Displaying data in grid view page by page
I would consider a MATERIALIZED VIEW to materialize results before pagination. If you have a stable page size I would add page numbers to the MV for easy and fast results.
To optimize performance for big result sets, consider:
SQL syntax term for 'WHERE (col1, col2) < (val1, val2)'
Optimize query with OFFSET on large table

Sane way to store different data types within same column in postgres?

I'm currently attempting to modify an existing API that interacts with a postgres database. Long story short, it's essentially stores descriptors/metadata to determine where an actual 'asset' (typically this is a file of some sort) is storing on the server's hard disk.
Currently, its possible to 'tag' these 'assets' with any number of undefined key-value pairs (i.e. uploadedBy, addedOn, assetType, etc.) These tags are stored in a separate table with a structure similar to the following:
+---------------+----------------+-------------+
|assetid (text) | tagid(integer) | value(text) |
|---------------+----------------+-------------|
|someStringValue| 1234 | someValue |
|---------------+----------------+-------------|
|aDiffStringKey | 1235 | a username |
|---------------+----------------+-------------|
|aDiffStrKey | 1236 | Nov 5, 1605 |
+---------------+----------------+-------------+
assetid and tagid are foreign keys from other tables. Think of the assetid representing a file and the tagid/value pair is a map of descriptors.
Right now, the API (which is in Java) creates all these key-value pairs as a Map object. This includes things like timestamps/dates. What we'd like to do is to somehow be able to store different types of data for the value in the key-value pair. Or at least, storing it differently within the database, so that if we needed to, we could run queries checking date-ranges and the like on these tags. However, if they're stored as text items in the db, then we'd have to a.) Know that this is actually a date/time/timestamp item, and b.) convert into something that we could actually run such a query on.
There is only 1 idea I could think of thus far, without complete changing changing the layout of the db too much.
It is to expand the assettag table (shown above) to have additional columns for various types (numeric, text, timestamp), allow them to be null, and then on insert, checking the corresponding 'key' to figure out what type of data it really is. However, I can see a lot of problems with that sort of implementation.
Can any PostgreSQL-Ninjas out there offer a suggestion on how to approach this problem? I'm only recently getting thrown back into the deep-end of database interactions, so I admit I'm a bit rusty.
You've basically got two choices:
Option 1: A sparse table
Have one column for each data type, but only use the column that matches that data type you want to store. Of course this leads to most columns being null - a waste of space, but the purists like it because of the strong typing. It's a bit clunky having to check each column for null to figure out which datatype applies. Also, too bad if you actually want to store a null - then you must chose a specific value that "means null" - more clunkiness.
Option 2: Two columns - one for content, one for type
Everything can be expressed as text, so have a text column for the value, and another column (int or text) for the type, so your app code can restore the correct value in the correct type object. Good things are you don't have lots of nulls, but importantly you can easily extend the types to something beyond SQL data types to application classes by storing their value as json and their type as the class name.
I have used option 2 several times in my career and it was always very successful.
Another option, depending on what your doing, could be to just have one value column but store some json around the value...
This could look something like:
{
"type": "datetime",
"value": "2019-05-31 13:51:36"
}
That could even go a step further, using a Json or XML column.
I'm not in any way PostgreSQL ninja, but I think that instead of two columns (one for name and one for type) you could look at hstore data type:
data type for storing sets of key/value pairs within a single
PostgreSQL value. This can be useful in various scenarios, such as
rows with many attributes that are rarely examined, or semi-structured
data. Keys and values are simply text strings.
Of course, you have to check how date/timestamps converting into and from this type and see if it good for you.
You can use 2 different technics:
if you have floating type for every tagid
Define table and ID for every tagid-assetid combination and actual data tables:
maintable:
+---------------+----------------+-----------------+---------------+
|assetid (text) | tagid(integer) | tablename(text) | table_id(int) |
|---------------+----------------+-----------------+---------------|
|someStringValue| 1234 | tablebool | 123 |
|---------------+----------------+-----------------+---------------|
|aDiffStringKey | 1235 | tablefloat | 123 |
|---------------+----------------+-----------------+---------------|
|aDiffStrKey | 1236 | tablestring | 123 |
+---------------+----------------+-----------------+---------------+
tablebool
+-------------+-------------+
| id(integer) | value(bool) |
|-------------+-------------|
| 123 | False |
+-------------+-------------+
tablefloat
+-------------+--------------+
| id(integer) | value(float) |
|-------------+--------------|
| 123 | 12.345 |
+-------------+--------------+
tablestring
+-------------+---------------+
| id(integer) | value(string) |
|-------------+---------------|
| 123 | 'text' |
+-------------+---------------+
In case if every tagid has fixed type
create tagid description table
tag descriptors
+---------------+----------------+-----------------+
|assetid (text) | tagid(integer) | tablename(text) |
|---------------+----------------+-----------------|
|someStringValue| 1234 | tablebool |
|---------------+----------------+-----------------|
|aDiffStringKey | 1235 | tablefloat |
|---------------+----------------+-----------------|
|aDiffStrKey | 1236 | tablestring |
+---------------+----------------+-----------------+
and correspodnding data tables
tablebool
+-------------+----------------+-------------+
| id(integer) | tagid(integer) | value(bool) |
|-------------+----------------+-------------|
| 123 | 1234 | False |
+-------------+----------------+-------------+
tablefloat
+-------------+----------------+--------------+
| id(integer) | tagid(integer) | value(float) |
|-------------+----------------+--------------|
| 123 | 1235 | 12.345 |
+-------------+----------------+--------------+
tablestring
+-------------+----------------+---------------+
| id(integer) | tagid(integer) | value(string) |
|-------------+----------------+---------------|
| 123 | 1236 | 'text' |
+-------------+----------------+---------------+
All this is just for general idea. You should adapt it for your needs.

How to show all recursive results with hierarchyid sql

I have a table categories:
ID | NAME | PARENT ID | POSITION | LEVEL | ORDER
----------------------------------------------------------------------------
1 | root | -1 | 0x | 0 | 255
2 | cars | 1 | 0x58 | 1 | 10
5 | trucks | 1 | 0x68 | 1 | 10
13 | city cars | 2 | 0x5AC0 | 2 | 255
14 | offroad cars | 2 | 0x5B40 | 2 | 255
where:
ID int ident
NAME nvarchar(255)
PARENT ID int
POSITION hierarchyid
LEVEL hierarchyid GetLevel()
ORDER tinyint
This table model specifies model name and category where it belongs. Example:
ID | NAME | CATEGORY
-----------------------------
1 | Civic | 13
2 | Pajero | 14
3 | 815 | 5
4 | Avensis | 13
where:
ID int ident
NAME nvarchar(255)
CATEGORY int link to ID category table
What I am trying to do is to be able to show:
all models - would show all models from root recursively,
models within category cars (cars included)
models from city cars (or its children if any)
How do I use hierarchyid for such filtering and how to join the table for results with models? Is that a quick way how to show all model results from certain level?
I believe this would have given you what you were looking for:
declare #id hierarchyid
select #id = Position from Categories where Name = 'root' -- or 'cars', 'city cars', etc.
select m.*
from Models m
join Categories c on m.Category = c.ID
where c.Position.IsDescendantOf(#id) = 1
For more information on the IsDescendantOf method and other hierarchyid methods, check the method reference.
You going to want to use a CTE: Common Table Expression
https://web.archive.org/web/20210927200924/http://www.4guysfromrolla.com/webtech/071906-1.shtml
Introduced in SQL 2005 the allow for an easy way to do hierarchic or recursive relationships.
This is pretty close to your example:
http://www.sqlservercurry.com/2009/06/simple-family-tree-query-using.html