PostgreSQL table design

I need to create a table (PostgreSQL 9.1) and I am stuck. Could you help?
The incoming data can assume either of the two formats:
client id(int), shop id(int), asof(date), quantity
client id(int), shop type(int), shop genre(varchar), asof(date), quantity
The given incoming CSV template is: {client id, shop id, shop type, shop genre, asof, quantity}
In the first case, the key is -- client id, shop id, asof
In the second case, the key is -- client id, shop type, shop genre, asof
I tried something like:
create table(
client_id int references...,
shop_id int references...,
shop_type int references...,
shop_genre varchar(30),
asof date,
quantity real,
primary key( client_id, shop_id, shop_type, shop_genre, asof )
);
But then I ran into a problem: when the data is in format 1, the inserts fail because of NULLs in the primary key.
Queries within a client are either by shop id or by a combination of shop type and genre. There are no use cases for partial or regex matches on genre.
What would be a suitable design? Must I split this into two tables and then take a union of the search results? Or is it customary to put 0s and blanks in for missing values and move along?
If it matters, the table is expected to hold 100-500 million rows once all historic data is loaded.
Thanks.

You could try partial unique indexes (also called filtered or conditional unique indexes).
http://www.postgresql.org/docs/9.2/static/indexes-partial.html
Basically, uniqueness is enforced only among the rows that match the index's WHERE clause.
For example (of course, test this for correctness and performance impact):
CREATE TABLE client(
pk_id SERIAL,
client_id int,
shop_id int,
shop_type int,
shop_genre varchar(30),
asof date,
quantity real,
PRIMARY KEY (pk_id)
);
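-- Format 1 rows (shop_id present): unique on (client_id, shop_id, asof)
CREATE UNIQUE INDEX uidx1_client
ON client
USING btree
(client_id, shop_id, asof)
WHERE shop_id IS NOT NULL;
-- Format 2 rows (no shop_id): unique on (client_id, shop_type, shop_genre, asof)
CREATE UNIQUE INDEX uidx2_client
ON client
USING btree
(client_id, shop_type, shop_genre, asof)
WHERE shop_id IS NULL;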
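A minimal, runnable sketch of the partial-unique-index idea. It uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for PostgreSQL (both support UNIQUE partial indexes, though their types differ), and it assumes the index filters are shop_id IS NOT NULL and shop_id IS NULL, so each incoming format gets its own key. The sample rows are made up.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL; both enforce UNIQUE partial indexes.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE client (
    pk_id      INTEGER PRIMARY KEY,
    client_id  INTEGER NOT NULL,
    shop_id    INTEGER,
    shop_type  INTEGER,
    shop_genre TEXT,
    asof       TEXT NOT NULL,
    quantity   REAL
);
-- Format 1 (shop_id present): key is (client_id, shop_id, asof)
CREATE UNIQUE INDEX uidx1_client ON client (client_id, shop_id, asof)
    WHERE shop_id IS NOT NULL;
-- Format 2 (shop_id missing): key is (client_id, shop_type, shop_genre, asof)
CREATE UNIQUE INDEX uidx2_client ON client (client_id, shop_type, shop_genre, asof)
    WHERE shop_id IS NULL;
""")


def try_insert(row):
    """Insert one row; return False if a unique index rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO client (client_id, shop_id, shop_type, shop_genre, asof, quantity)"
                " VALUES (?, ?, ?, ?, ?, ?)",
                row,
            )
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    try_insert((1, 10, None, None, "2013-01-01", 5.0)),      # format 1
    try_insert((1, None, 3, "grocery", "2013-01-01", 7.0)),  # format 2
    try_insert((1, 10, None, None, "2013-01-01", 9.0)),      # same format-1 key
    try_insert((1, None, 3, "grocery", "2013-01-01", 1.0)),  # same format-2 key
]
print(results)  # [True, True, False, False]
```

Both formats live in one table with a surrogate primary key, and each partial index only polices the rows of its own format, so no placeholder 0s or blanks are needed.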

A simple solution would be to create a primary key column whose value is generated by one of two algorithms, depending on which format the incoming row uses.
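A minimal sketch of such a two-algorithm key, assuming the key is stored as text. The function name make_row_key, the "S"/"T" prefixes, and the "|" delimiter are hypothetical choices for illustration; the delimiter is assumed never to occur inside shop_genre.

```python
from datetime import date
from typing import Optional


def make_row_key(client_id: int, asof: date, shop_id: Optional[int] = None,
                 shop_type: Optional[int] = None,
                 shop_genre: Optional[str] = None) -> str:
    """Build a single text key from whichever of the two formats the row uses.

    The "S"/"T" prefixes keep the two key spaces apart; "|" is assumed never
    to appear inside shop_genre.
    """
    if shop_id is not None:
        # Format 1: client id, shop id, asof
        return f"S|{client_id}|{shop_id}|{asof.isoformat()}"
    if shop_type is None or shop_genre is None:
        raise ValueError("a row without shop_id needs shop_type and shop_genre")
    # Format 2: client id, shop type, shop genre, asof
    return f"T|{client_id}|{shop_type}|{shop_genre}|{asof.isoformat()}"


print(make_row_key(1, date(2013, 1, 1), shop_id=10))
# S|1|10|2013-01-01
print(make_row_key(1, date(2013, 1, 1), shop_type=3, shop_genre="grocery"))
# T|1|3|grocery|2013-01-01
```

The trade-off is a wider text key on every one of the 100-500 million rows, so it is worth comparing against the partial-index approach before committing to it.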
If you wanted a fully normalised solution, you would probably need to split the shop information into two separate tables and have it referenced from this table using outer joins.
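One possible reading of the normalized design, sketched with Python's sqlite3 as a stand-in for PostgreSQL: shops and (shop type, shop genre) categories live in two separate tables, each fact row references exactly one of them, and LEFT JOINs bring the details back. The table names shop, shop_category and shop_quantity and the sample data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE shop (shop_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE shop_category (
    category_id INTEGER PRIMARY KEY,
    shop_type   INTEGER NOT NULL,
    shop_genre  TEXT NOT NULL,
    UNIQUE (shop_type, shop_genre)
);
CREATE TABLE shop_quantity (
    client_id   INTEGER NOT NULL,
    shop_id     INTEGER REFERENCES shop (shop_id),
    category_id INTEGER REFERENCES shop_category (category_id),
    asof        TEXT NOT NULL,
    quantity    REAL,
    -- exactly one of the two references must be present
    CHECK ((shop_id IS NULL) <> (category_id IS NULL))
);
INSERT INTO shop VALUES (10, 'Main St');
INSERT INTO shop_category VALUES (1, 3, 'grocery');
INSERT INTO shop_quantity VALUES (1, 10, NULL, '2013-01-01', 5.0);
INSERT INTO shop_quantity VALUES (1, NULL, 1, '2013-01-01', 7.0);
""")

# Outer joins bring back whichever kind of shop information each row has.
rows = conn.execute("""
SELECT q.client_id, s.name, q.shop_id, c.shop_type, c.shop_genre, q.asof, q.quantity
FROM shop_quantity q
LEFT JOIN shop s ON s.shop_id = q.shop_id
LEFT JOIN shop_category c ON c.category_id = q.category_id
ORDER BY q.quantity
""").fetchall()
for row in rows:
    print(row)
```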
You may also be able to use PostgreSQL's table inheritance.

Related

Use index to speed up query using values from different tables

I have a table products, a table orders and a table orderProducts.
Products have a name as the PK (apple, banana, mango) and a price.
Orders have a created_at timestamp and an id as the PK.
orderProducts connects orders and products, so each row has a product (the product name) and an order (the order id). Now I would like to show all orders for a given product that were placed in the last 24 hours.
I use the following query:
SELECT
orders.id,
orders.created_at,
products.name,
products.price
FROM
orderProducts
JOIN products ON
products.name=orderProducts.product
JOIN orders ON
orders.id=orderProducts.order
WHERE
products.name='banana'
AND
orders.created_at BETWEEN NOW() - INTERVAL '24 HOURS' AND NOW()
ORDER BY
orders.created_at
This works, but I would like to optimize this query with an index. This index would need to be ordered first by the product name, so it can be filtered, and then by the order's created_at in descending order, so it can select only the orders from the last 24 hours.
The problem is that, from what I have seen, an index can only be created on a single table; there is no way to include another table's values in it. Since two individual indexes do not solve this problem either, I was wondering whether there is an alternative way to optimize this particular query.
Here are the table scripts:
CREATE TABLE products
(
name text PRIMARY KEY,
price integer
);
CREATE TABLE orders
(
id SERIAL PRIMARY KEY,
created_at TIMESTAMP DEFAULT NOW()
);
CREATE TABLE orderProducts
(
product text REFERENCES products(name),
"order" integer REFERENCES orders(id)
);
First of all, please do not put indexes everywhere: every extra index slows down data-modifying operations.
As Laurenz Albe suggested: do not guess, check.
Other than that, note that you already know the product name, and its price is the same on every row, so you could query it once. Whether two queries are faster than a single one in your case is something you should check.
Please read the documentation. I would try this index:
create index orders_id_created_at on orders(created_at desc, id)
Normally id would go first, since it is unique; here, however, the system should be able to filter on both predicates (the WHERE clause and the join). I am just guessing here.
On orderProducts I would like to see an index on both columns, although this query should need only one. In practice you may go from products to orders or the other way round; both paths are possible, which is why I suggest indexing both columns. I would use two separate indexes:
create index orderproducts_product_id on orderproducts (product) include ("order");
create index orderproducts_order_id on orderproducts ("order") include (product);
This probably does not change much, but the idea is to let PostgreSQL answer the query from the index alone, without reading the table itself. Note that INCLUDE requires PostgreSQL 11 or later.
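Whether a query can be answered from an index alone shows up in the query plan. The sketch below uses Python's sqlite3 as a stand-in for PostgreSQL: SQLite has no INCLUDE clause, so (an assumption of this sketch) both columns go into each index key, which also makes the index covering. The helper name plan is hypothetical. In PostgreSQL you would run EXPLAIN (ANALYZE) on the real query and look for an Index Only Scan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orderProducts (product TEXT, "order" INTEGER);
-- SQLite has no INCLUDE clause, so both columns go into each index key.
CREATE INDEX orderproducts_product_id ON orderProducts (product, "order");
CREATE INDEX orderproducts_order_id ON orderProducts ("order", product);
""")


def plan(sql, params=()):
    """Return SQLite's EXPLAIN QUERY PLAN details as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)


# Going from products to orders, and from orders to products:
print(plan('SELECT "order" FROM orderProducts WHERE product = ?', ("banana",)))
print(plan('SELECT product FROM orderProducts WHERE "order" = ?', (1,)))
```

Each lookup direction is served by its own index, which is the reason for indexing both columns.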
These rules are important in terms of performance:
Integer indexes are faster than string indexes, so try to make primary keys integers, because joins between tables use the primary keys too.
If your WHERE clauses always use two fields, create one index on both fields.
Foreign keys are not indexed automatically; you must create indexes on foreign-key columns manually.
So the recommended table scripts are:
CREATE TABLE products
(
id serial primary key,
name text,
price integer
);
CREATE UNIQUE INDEX products_name_idx ON products USING btree (name);
CREATE TABLE orders
(
id SERIAL PRIMARY KEY,
created_at TIMESTAMP DEFAULT NOW()
);
CREATE INDEX orders_created_at_idx ON orders USING btree (created_at);
CREATE TABLE orderProducts
(
product_id integer REFERENCES products(id),
order_id integer REFERENCES orders(id)
);
CREATE INDEX orderproducts_product_id_idx ON orderproducts USING btree (product_id, order_id);
---- OR ----
CREATE INDEX orderproducts_product_id ON orderproducts (product_id);
CREATE INDEX orderproducts_order_id ON orderproducts (order_id);
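A sketch of the recommended schema with the original query rewritten against the integer keys, again using Python's sqlite3 in place of PostgreSQL. SQLite has no NOW() or INTERVAL, so this assumes datetime('now', '-24 hours') as the equivalent; the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price INTEGER);
CREATE UNIQUE INDEX products_name_idx ON products (name);
CREATE TABLE orders (id INTEGER PRIMARY KEY, created_at TEXT DEFAULT CURRENT_TIMESTAMP);
CREATE INDEX orders_created_at_idx ON orders (created_at);
CREATE TABLE orderProducts (
    product_id INTEGER REFERENCES products (id),
    order_id   INTEGER REFERENCES orders (id)
);
CREATE INDEX orderproducts_product_id_idx ON orderProducts (product_id, order_id);

INSERT INTO products VALUES (1, 'apple', 3), (2, 'banana', 2);
-- Timestamps relative to now, as 'YYYY-MM-DD HH:MM:SS' text (UTC).
INSERT INTO orders VALUES (1, datetime('now', '-1 hours')),
                          (2, datetime('now', '-48 hours')),
                          (3, datetime('now', '-2 hours'));
INSERT INTO orderProducts VALUES (2, 1), (2, 2), (1, 3);
""")

# Banana orders from the last 24 hours: order 2 is too old, order 3 is apples.
recent_banana_orders = conn.execute("""
SELECT o.id, p.name, p.price
FROM orderProducts op
JOIN products p ON p.id = op.product_id
JOIN orders o ON o.id = op.order_id
WHERE p.name = ?
  AND o.created_at BETWEEN datetime('now', '-24 hours') AND datetime('now')
ORDER BY o.created_at
""", ("banana",)).fetchall()
print(recent_banana_orders)  # [(1, 'banana', 2)]
```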

Do I really need individual tables for my three types of users?

Suppose I have three types of users: sellers, consumers, and salespeople. Should I keep their details (name, email, password, and other credentials) in a single table together with a role_type table, or create a separate table for each of them? Which is the best approach for a large project, considering DBMS engineering principles such as normalization?
Also, does it affect the app's performance if I need lots of joins between tables to perform certain operations?
If the only thing that distinguishes those people is the role but all details are the same, then I would definitely go for a single table.
The question, however, is whether a single person can have more than one role. If that is never the case, add a role_type column to the person table. Depending on how fixed those roles are, you might use a lookup table and a foreign key, e.g.:
create table role_type
(
id integer primary key,
name varchar(20) not null unique
);
create table person
(
id integer primary key,
.... other attributes ...,
role_id integer not null references role_type
);
However, in my experience the restriction to exactly one role per person usually doesn't hold, so you would need a many-to-many relationship:
create table role_type
(
id integer primary key,
name varchar(20) not null unique
);
create table person
(
id integer primary key,
.... other attributes ...,
);
create table person_role
(
person_id integer not null references person,
role_id integer not null references role_type,
primary key (person_id, role_id)
);
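A quick sketch of the many-to-many variant using Python's sqlite3 as a stand-in for PostgreSQL. The email column and the sample people are hypothetical placeholders for the elided other attributes. The composite primary key stops the same role being assigned twice, while a person can still hold several roles.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE role_type (id INTEGER PRIMARY KEY, name VARCHAR(20) NOT NULL UNIQUE);
CREATE TABLE person (id INTEGER PRIMARY KEY, email TEXT NOT NULL UNIQUE);
CREATE TABLE person_role (
    person_id INTEGER NOT NULL REFERENCES person,
    role_id   INTEGER NOT NULL REFERENCES role_type,
    PRIMARY KEY (person_id, role_id)
);
INSERT INTO role_type VALUES (1, 'seller'), (2, 'consumer'), (3, 'sales person');
INSERT INTO person VALUES (1, 'ann@example.com'), (2, 'bob@example.com');
-- Ann is both a seller and a consumer; Bob is only a consumer.
INSERT INTO person_role VALUES (1, 1), (1, 2), (2, 2);
""")


def roles_of(email):
    """List the role names held by the person with the given email."""
    return [name for (name,) in conn.execute("""
        SELECT rt.name
        FROM person p
        JOIN person_role pr ON pr.person_id = p.id
        JOIN role_type rt ON rt.id = pr.role_id
        WHERE p.email = ?
        ORDER BY rt.name""", (email,))]


print(roles_of("ann@example.com"))  # ['consumer', 'seller']
```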
It sounds like this is a case of trying to model inheritance in your relational database. It's a complex topic, discussed here and here.
It sounds like your "seller, consumer, sales person" will need lots of different attributes and relationships. A seller typically belongs to a department, has targets, and is linked to sales. A consumer has a purchase history, maybe a credit limit, etc.
If that's the case, I'd suggest that "class table inheritance" might be the right solution.
That might look something like this.
create table user_account
(id int not null,
username varchar not null,
password varchar not null
....);
create table buyer
(id int not null,
user_account_id int not null(fk),
credit_limit float not null,
....);
create table seller
(id int not null,
user_account_id int not null(fk),
sales_target float,
....);
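A runnable version of the class-table-inheritance layout, with the elided columns dropped and the (fk) markers turned into real foreign keys. It uses Python's sqlite3 in place of PostgreSQL; the UNIQUE constraint on user_account_id (at most one buyer row and one seller row per account) and the sample data are assumptions of this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
-- Shared attributes live in the parent table.
CREATE TABLE user_account (
    id       INTEGER PRIMARY KEY,
    username TEXT NOT NULL UNIQUE,
    password TEXT NOT NULL
);
-- Each subtype table holds only its own attributes plus a FK to the parent.
CREATE TABLE buyer (
    id              INTEGER PRIMARY KEY,
    user_account_id INTEGER NOT NULL UNIQUE REFERENCES user_account (id),
    credit_limit    REAL NOT NULL
);
CREATE TABLE seller (
    id              INTEGER PRIMARY KEY,
    user_account_id INTEGER NOT NULL UNIQUE REFERENCES user_account (id),
    sales_target    REAL
);
-- Placeholder sample data.
INSERT INTO user_account VALUES (1, 'ann', 'x'), (2, 'bob', 'y');
INSERT INTO buyer VALUES (10, 1, 500.0);
INSERT INTO seller VALUES (20, 2, 10000.0);
""")

# One join recovers the shared attributes for a given subtype.
buyers = conn.execute("""
SELECT ua.username, b.credit_limit
FROM buyer b
JOIN user_account ua ON ua.id = b.user_account_id
""").fetchall()
print(buyers)  # [('ann', 500.0)]
```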
To answer your other question: relational databases are optimized for joining tables. Decades of research and development have gone into this area, and a well-designed database (with indexes on the columns you join on) will show no noticeable performance impact from joins. In my practical experience, queries over hundreds of millions of records with ten or more joins run very fast on modern hardware.

Model `select by partition key in Cassandra` in BigTable

What would the equivalent of modeling a select by partition key in Cassandra be in BigTable?
For example, if I had a Cassandra table
CREATE TABLE emp (
empID int,
deptID int,
first_name varchar,
last_name varchar,
PRIMARY KEY (empID, deptID));
I can query
SELECT deptid FROM emp WHERE empid = 104;
In BigTable, I think this is equivalent to adding columns to a row?
If so, is that a relatively standard design pattern?
If not, is there another pattern that can be used?
Thanks
Brent
This is mostly addressed in the comments. Bigtable does not have separate partition key and primary key concepts; it has only a single index, the row key.
In your example, you would probably want to make both the employee ID and the department ID part of your row key. Keys are stored lexicographically, and you can use prefixes to do more efficient sub-scans, so you would need to decide whether to concatenate the employee ID followed by the department ID, or vice versa.
This is somewhat akin to the reverse-domain-name pattern, and you may want to review the guidance here:
https://cloud.google.com/bigtable/docs/schema-design#types_of_row_keys
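To illustrate the prefix idea without a Bigtable instance, the sketch below keeps row keys in a sorted Python list and does a prefix scan with bisect; it only mimics lexicographic key order and is not the Bigtable client API. The key format (zero-padded IDs joined by "#", employee ID first) and the helper names are assumptions. Padding matters because keys compare as strings: unpadded, "1040#..." would sort between "104#..." and "105#...".

```python
from bisect import bisect_left


def row_key(emp_id: int, dept_id: int) -> str:
    """Employee-first row key; width 10 zero-padding is an assumption."""
    return f"{emp_id:010d}#{dept_id:010d}"


# Stand-in for a Bigtable table: row keys kept in lexicographic order.
keys = sorted(row_key(e, d) for e, d in [(104, 7), (104, 12), (105, 7), (1040, 3)])


def prefix_scan(sorted_keys, prefix):
    """Return every key that starts with prefix, reading a contiguous range."""
    start = bisect_left(sorted_keys, prefix)
    out = []
    for k in sorted_keys[start:]:
        if not k.startswith(prefix):
            break  # keys are sorted, so the matching range has ended
        out.append(k)
    return out


def depts_for_employee(emp_id):
    """Equivalent of SELECT deptid FROM emp WHERE empid = ?."""
    prefix = f"{emp_id:010d}#"
    return [int(k.split("#")[1]) for k in prefix_scan(keys, prefix)]


print(depts_for_employee(104))  # [7, 12]
```

Putting the department ID first instead would make "all employees of a department" the cheap prefix scan, so the order should follow the query you run most.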

How to map tables without FK

I have these tables:
carWash
order
orderHistory
user
services
Connections: one carWash to many orders, one order to one orderHistory, one carWash to many services, one user to many orders
The question is: when a user creates an order, they choose some of the services that the car wash provides from the services table (for example, wash_car_body = $20, wash_wheels = $10). When the user later looks at their order history, I want to show all the services they chose. What is the best way to do this?
My services script:
create table services(
id SERIAL not null,
car_wash_id int not null,
name text not null,
price double precision not null,
car_body text,
CONSTRAINT "gmoika_service_company_id_fKey" FOREIGN KEY (car_wash_id)
REFERENCES car_wash (id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION
)
Car Wash Script:
create table car_wash(
id SERIAL not null,
name text not null);
For example:
insert into services (car_wash_id, price, name, car_body) values (1, 20, 'wheels_wash', 'sedan');
When a user creates an order at the car wash with a given id, I use the following query to give them all of its services:
select * from services s where s.car_wash_id = $1; -- $1: the chosen car wash id
Then the user chooses services, and I want to save the chosen services in the order history.
OrderHistory script
CREATE TABLE order_history(
id SERIAL not null,
order_id int,
wash_price double precision,
car_body text,
box_number int,
car_wash_name text,
currency text,
time timestamp with time zone NOT NULL,
status text,
CONSTRAINT "gmoika_wash_history_id_fKey" FOREIGN KEY (order_id)
REFERENCES "order" (id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION
);
You'd need to add another table, e.g. orderServiceHistory, that maps order_id to service_id. Both columns would be foreign keys, to the order and services tables respectively. You could then use it to store which services were chosen in a historic car wash order.
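A minimal sketch of such a mapping table and the query that lists the services chosen for one order, using Python's sqlite3 as a stand-in for PostgreSQL. The table names order_service_history and orders (used because order is a reserved word) and the sample data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE car_wash (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE services (
    id          INTEGER PRIMARY KEY,
    car_wash_id INTEGER NOT NULL REFERENCES car_wash (id),
    name        TEXT NOT NULL,
    price       REAL NOT NULL
);
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    car_wash_id INTEGER NOT NULL REFERENCES car_wash (id)
);
-- Mapping table: one row per service chosen in an order.
CREATE TABLE order_service_history (
    order_id   INTEGER NOT NULL REFERENCES orders (id),
    service_id INTEGER NOT NULL REFERENCES services (id),
    PRIMARY KEY (order_id, service_id)
);
INSERT INTO car_wash VALUES (1, 'Downtown');
INSERT INTO services VALUES (1, 1, 'wash_car_body', 20), (2, 1, 'wash_wheels', 10),
                            (3, 1, 'wax', 15);
INSERT INTO orders VALUES (100, 1);
INSERT INTO order_service_history VALUES (100, 1), (100, 2);
""")

# Services the user picked for order 100 (the unchosen 'wax' is excluded).
chosen = conn.execute("""
SELECT s.name, s.price
FROM order_service_history osh
JOIN services s ON s.id = osh.service_id
WHERE osh.order_id = ?
ORDER BY s.name
""", (100,)).fetchall()
print(chosen)  # [('wash_car_body', 20.0), ('wash_wheels', 10.0)]
```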
However, I'd recommend you think about your schema a bit more. Few things to consider off the top of my head:
Why the mix of plural and singular forms (e.g. order vs. services)?
You haven't defined your primary keys.
What is the purpose of the one-to-one mapping between order and orderHistory?
How would you handle a change in the price of a service?
How would the removal or deactivation of services, car washes, etc. be handled?

Index method for a column used only for ordering

I have a table product_images with a foreign key product_id and an integer field "order" to manually set the order of a product's images. The table will only be queried like this:
SELECT * FROM product_images
WHERE product_id = ?
ORDER BY "order"
What is the optimal index method for product_id and order?
Is this enough?
CREATE INDEX product_images_unique_order
ON "product_images"("product_id", "order");
Yes, that should do it.
PostgreSQL might decide not to use that index, depending on how many rows you have, how many images a given product_id has, how scattered the rows with the same product_id are across the table, how wide the rows of product_images are, and many other factors.
But by having that index you provide PostgreSQL with the opportunity to use it.
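One way to see whether an index covers both the filter and the sort is to inspect the query plan. This sketch uses Python's sqlite3, whose planner differs from PostgreSQL's, so treat it only as an illustration: if no "USE TEMP B-TREE FOR ORDER BY" step appears, the rows come out of the index already ordered by "order". The url column is a hypothetical extra; in PostgreSQL, run EXPLAIN against your real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_images (
    id         INTEGER PRIMARY KEY,
    product_id INTEGER NOT NULL,
    "order"    INTEGER NOT NULL,
    url        TEXT
);
CREATE INDEX product_images_unique_order ON product_images (product_id, "order");
""")

# Join the 'detail' column (index 3) of each plan row into one string.
plan = " | ".join(
    row[3]
    for row in conn.execute(
        'EXPLAIN QUERY PLAN SELECT * FROM product_images '
        'WHERE product_id = ? ORDER BY "order"',
        (42,),
    )
)
print(plan)
```

Because product_id is fixed by the equality condition, the second index column already delivers the rows in "order" sequence, which is why no separate sort step is needed.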