I have an application that handles products and sales in different currencies, so every row in the same database table can store prices in different currencies. How do I do that correctly?
The most straightforward way is to define a numeric price_amount column and a varchar price_currency column, but I feel that two technically independent columns for what is essentially a single value (a price) is wrong. Just as physical measurements are meaningless numbers without units, an amount of money is also meaningless without its unit: the currency.
In my opinion money should be a single value containing both amount and currency within itself.
I started searching and was a bit surprised that there are no ready-made solutions or good articles in the search results. There is the pg-currency extension that does what I want, but it was abandoned almost 10 years ago.
I created the following composite datatype as a starting point:
CREATE TYPE true_money AS (
currency varchar,
amount numeric
);
And then I started writing supporting things for it: validations, arithmetic, aggregates… and realized that this rabbit hole is truly deep.
All my current (partial) results over this composite type can be found here for reference: https://gist.github.com/Envek/780b917e72a86c123776ee763b8dd986#file-true_money-sql
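For example, the addition piece alone needs a function and an operator. A simplified sketch of what I have (see the gist above for the full version; this uses the PostgreSQL 11+ CREATE OPERATOR syntax):

-- Simplified sketch: addition that refuses to mix currencies.
CREATE FUNCTION true_money_add(a true_money, b true_money)
RETURNS true_money AS $$
BEGIN
    IF a.currency <> b.currency THEN
        RAISE EXCEPTION '% can not be added to % - currencies do not match', b, a;
    END IF;
    RETURN ROW(a.currency, a.amount + b.amount)::true_money;
END;
$$ LANGUAGE plpgsql IMMUTABLE STRICT;

CREATE OPERATOR + (
    LEFTARG = true_money,
    RIGHTARG = true_money,
    FUNCTION = true_money_add
);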
And now I can do the following things:
INSERT INTO "products" ("title", "price") VALUES ('Гравицапа', ('RUB',100500));
INSERT INTO "products" ("title", "price") VALUES ('Эцих с гвоздями', ('RUB',12100.42));
INSERT INTO "products" ("title", "price") VALUES ('Gravizapa', ('USD',19999.99));
-- You can access its parts if you need to extract them or do filtering or grouping
SELECT SUM(price) FROM products WHERE (price).currency = 'RUB';
-- (RUB,112600.42)
-- And if you forget filtering/grouping then custom types can save you from nonsense results
SELECT SUM(price) FROM products;
ERROR: (USD,19999.99) can not be added to (RUB,112600.42) - currencies do not match
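The SUM above is not the built-in one either; it is a custom aggregate over the same addition function, roughly:

CREATE AGGREGATE sum (true_money) (
    SFUNC = true_money_add,
    STYPE = true_money
);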
Is this a correct approach? How do I do it right?
A bit of context: in our app, users (sellers) can manage their stock (products) in any currency they want (e.g., USD, EUR, JPY, RUB, whatever). The app will convert currencies and publish products on local sites (like British or Australian ones). Buyers will also buy these goods in their local currency (GBP, AUD, etc.), which will eventually be converted to the seller's currency and paid out to them (minus fees). So in many places in the application, almost any supported currency can appear. And finally, a seller should be able to change their currency, and all their products should then be converted to the new currency in batches (a single UPDATE within a transaction can't be used, for certain reasons). So we can't say “keep only numeric values in the products table and JOIN with the sellers table to get the currency” (which I believe is an anti-pattern anyway).
Yes, creating your own type is quite a lot of work if you want to integrate it seamlessly with PostgreSQL.
If an item can be sold in different countries and has a different price in each, you should model the data accordingly. Storing just an exchange rate is not good enough, because the same item might be more expensive in Japan than in China.
If you are only interested in the current price, that could look like this:
CREATE TABLE currency (
currency_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
denomination text CHECK (length(denomination) = 3) NOT NULL
);
CREATE TABLE exchange (
from_curr_id bigint REFERENCES currency NOT NULL,
to_curr_id bigint REFERENCES currency NOT NULL,
rate numeric(10,5) NOT NULL,
PRIMARY KEY (from_curr_id, to_curr_id)
);
CREATE TABLE country (
country_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
name text UNIQUE NOT NULL,
currency_id bigint REFERENCES currency NOT NULL
);
CREATE TABLE product (
product_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
title text NOT NULL
);
CREATE TABLE price (
country_id bigint REFERENCES country NOT NULL,
product_id bigint REFERENCES product NOT NULL,
amount numeric(10,2) NOT NULL,
PRIMARY KEY (product_id, country_id)
);
CREATE INDEX ON price (country_id); -- for the foreign key
This way, each product can have a certain price in each country, and the price is associated with a currency via the country.
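For example, fetching the current price of a product for a given country is just a chain of joins (product 1 and 'Australia' are made-up values):

SELECT p.amount, cu.denomination AS currency
FROM price p
JOIN country co ON co.country_id = p.country_id
JOIN currency cu ON cu.currency_id = co.currency_id
WHERE p.product_id = 1
  AND co.name = 'Australia';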
Of course, the real world might be still more complicated:
you could have more than one currency per country
you might want to keep historical price information
The main thing is that you can always follow a chain of foreign keys that leads you to the desired amount and currency unambiguously.
For converting between currencies, follow the exchange table: a (from_curr_id, to_curr_id) pair maps to a rate.
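A conversion could then look roughly like this (a sketch, assuming rate is the multiplier from from_curr_id to to_curr_id; the product and country values are made up):

SELECT p.amount * e.rate AS amount_in_usd
FROM price p
JOIN country co ON co.country_id = p.country_id
JOIN exchange e ON e.from_curr_id = co.currency_id
JOIN currency target ON target.currency_id = e.to_curr_id
WHERE p.product_id = 1
  AND co.name = 'Australia'
  AND target.denomination = 'USD';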
I'm making a real-time chat app and was stuck figuring out what the DB model should look like. I've made this diagram, but would this work? My issue is mostly to do with foreign keys.
I know this is a very vague question, but I have been struggling with this model for a while now. This is the first database I'm setting up, so it's probably got a load of errors.
Actually, you are fairly close, but have overcomplicated it a bit. At the conceptual/logical level you have just two entities, Users and Messages, with a many-to-many relationship. At the physical level, the Channels table resolves the M:M into the two one-to-many relationships you have described. But viewing it this way reveals a couple of issues. The user attribute is not required in the Messages table, and if physically implemented it requires a validation, not easily done, that the user also exists in the Channels table. Further, everything the Messages:Users relationship provides is already available via the Users:Channels:Messages relationship. A similar argument applies to the Channels column in Users: it is completely resolved by the resolution table. Suggestion: drop user from the Messages table and channels from the Users table.
Now let's look at the columns of Channels. It looks like you are using boilerplate for created_at and updated_at, but are they necessary? Well, at least for updated_at, no. What can be updated? If either User or Message changes, you have a brand-new entry. Yes, it may seem like the same physical row (actually it is not), but the meaning is completely different. And what about last message? What does it indicate that the maximum created_at for the user does not already give you? I cannot see anything. I guess you could change created_at, but what is the point of tracking when I changed that column? Suggestion: drop last message sent and updated_at (unless required by institution standards).
That leaves the Users table itself. Besides Channels, mentioned above, there is the Contacts column. Physically, as an array it violates 1NF and becomes difficult to manage (as well as to validate that each contact is in fact a user). Logically, it creates a M:M relationship of Users:Users. So resolve it the same way as Users:Messages: pull it out into another table, say User_Contacts, with two foreign keys to the Users table. Suggestion: drop contacts from the Users table and create a resolution table.
Unfortunately, I do not have a good ERD diagrammer, so I just provide DDL.
create table users (
user_id integer generated always as identity primary key
, name text
, phone_number text
, last_login timestamptz
, created_at timestamptz
, updated_at timestamptz
) ;
create type message_type as enum ('short', 'long'); -- list all values
create table messages(
msg_id integer generated always as identity primary key
, msg_type message_type
, message text
, created_at timestamptz
, updated_at timestamptz
);
create table channels( -- resolves M:M Users:Messages
user_id integer
, msg_id integer
, created_at timestamptz
, constraint channels_pk
primary key (user_id, msg_id)
, constraint channels_2_users_fk
foreign key (user_id)
references users(user_id)
, constraint channels_2_messages_fk
foreign key (msg_id)
references messages(msg_id )
);
create table user_contacts( -- resolves M:M Users:Users
user_id integer
, contact_id integer
, created_at timestamptz
, constraint user_contacts_pk
primary key (user_id, contact_id)
, constraint user_2_users_fk
foreign key (user_id)
references users(user_id)
, constraint contact_2_user_fk
foreign key (contact_id)
references users(user_id)
, constraint contact_not_me_check check (user_id <> contact_id)
);
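As a sanity check that the resolution tables really do replace the dropped columns, here is a hypothetical query pulling one user's messages through channels (user 42 is made up):

select u.name, m.message, c.created_at
from users u
join channels c on c.user_id = u.user_id
join messages m on m.msg_id = c.msg_id
where u.user_id = 42
order by c.created_at;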
Notes:
Do not use text as PK, use either integer (bigint) or UUID, and generate them during insert.
Caution with ENUM. In Postgres you can add new values, but you cannot remove a value. Depending on the number of values and how often they change, consider creating a lookup/reference table for them.
Do not use the data type TIME. It is really not that useful without the date. A simple example: I log in today at 15:00, you log in tomorrow at 13:00. Now, from the database itself, which of us logged in first?
If I have three types of users, let's say sellers, consumers, and sales persons, should I make an individual table for their details (name, email, password, and all other credentials) with a role_type table, or a separate table for each of them? Which is the best approach for a large project, considering all engineering principles for DBMS, like normalization?
Also, does it affect the performance of the app if I have lots of joins between tables to perform certain operations?
If the only thing that distinguishes those people is the role but all details are the same, then I would definitely go for a single table.
The question is, however: can a single person have more than one role? If that is never the case, then add a role_type column to the person table. Depending on how fixed those roles are, maybe use a lookup table and a foreign key, e.g.:
create table role_type
(
id integer primary key,
name varchar(20) not null unique
);
create table person
(
id integer primary key,
.... other attributes ...,
role_id integer not null references role_type
);
However, in my experience the restriction to exactly one role per person usually doesn't hold, so you would need a many-to-many relationship:
create table role_type
(
id integer primary key,
name varchar(20) not null unique
);
create table person
(
id integer primary key,
.... other attributes ...,
);
create table person_role
(
person_id integer not null references person,
role_id integer not null references role_type,
primary key (person_id, role_id)
);
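Either way, pulling everyone with a given role is then a plain join; for the many-to-many variant, for example (role name made up):

select p.*
from person p
join person_role pr on pr.person_id = p.id
join role_type rt on rt.id = pr.role_id
where rt.name = 'seller';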
It sounds like this is a case of trying to model inheritance in your relational database. Complex topic, discussed here and here.
It sounds like your "seller, consumer, sales person" will need lots of different attributes and relationships. A seller typically belongs to a department, has targets, is linked to sales. A consumer has purchase history, maybe a credit limit, etc.
If that's the case, I'd suggest "class table inheritance" might be the right solution.
That might look something like this.
create table user_account
(id int not null primary key,
username varchar not null,
password varchar not null,
....);
create table buyer
(id int not null primary key,
user_account_id int not null references user_account (id),
credit_limit float not null,
....);
create table seller
(id int not null primary key,
user_account_id int not null references user_account (id),
sales_target float,
....);
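Role-specific attributes then live in their own tables, and the shared ones are one join away. A sketch of a lookup based on the tables above:

select ua.username, b.credit_limit
from buyer b
join user_account ua on ua.id = b.user_account_id;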
To answer your other question - relational databases are optimized for joining tables. Decades of research and development have gone into this area, and a well-designed database (with indexes on the columns you're joining on) will show no noticeable performance impact due to joins. From practical experience, queries with hundreds of millions of records and ten or more joins run very fast on modern hardware.
I have a case where a marketplace has many promotions. Each promotion is eligible only for whitelisted customers. There is a probability that a customer is whitelisted for more than one promotion. It must also handle high traffic, since customers constantly enter promotion codes to see whether they are eligible. Currently I have implemented a whitelist_customer table with a 'customer_id' column with a unique index, and a 'values' column with the text[] array data type. For example, in one row I can store customer_id '1'; this customer is eligible for 3 promotions, so I store 'PROMOCODE1','PROMOCODE2','PROMOCODE3' in 'values'.
So when customer_id '1' enters the code PROMOCODE2, the app checks eligibility by searching by customer_id and then searching the array using @> (contains).
Is this approach the best solution?
No.
Arrays break the First Normal Form rule in database design (repeating values).
Also, searching an array takes time.
The best solution would be a table where each promotion is in a separate row. This table will have a composite primary key (which means there will be an index on the two columns as well):
CREATE TABLE customer_promotion (
cust_id int,
promotion varchar(50),
PRIMARY KEY(cust_id, promotion)
);
Searching for a specific cust_id and promotion then becomes trivial.
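For example (customer and code values made up), the eligibility check is a single primary-key lookup:

SELECT EXISTS (
    SELECT 1
    FROM customer_promotion
    WHERE cust_id = 1
      AND promotion = 'PROMOCODE2'
) AS eligible;

Because (cust_id, promotion) is the primary key, this is answered straight from the index.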
I am building a platform with two kinds of users: Users_A create projects with unique virtual coins associated, and Users_B can buy and exchange these coins.
The problem:
Approach 1: if I use one single table as a virtual wallet, each User_B ID is a row and each coin is a column. This way, I have to ALTER the table each time a new project is created.
Approach 2: I create an electronic wallet (table) for every single User_B.
Which of the two is worse/better in terms of performance?
Is there any other possible approach?
It's a bit unclear to me what exactly you are trying to model. But any model that requires ALTERing a table because you add new content to the database is flawed.
That sounds like a basic many-to-many relationship to me:
You definitely need a table for the users:
create table users
(
user_id integer primary key,
... other columns ...
);
and one for the different coins:
create table coin
(
coin_id integer primary key,
... other columns ...
);
You need a table for the projects. You said "unique virtual coins associated", so I assume one project deals with exactly one type of coin:
create table project
(
project_id integer primary key,
owner_user_id integer not null references users,
coin_id integer not null references coin,
... other columns ...
);
I am not sure what exactly you mean by "buy and exchange" coins, but you probably need something like a transfer table:
create table coin_transfer
(
from_user_id integer not null references users,
to_user_id integer not null references users,
project_id integer not null references project,
transfer_type text not null check (transfer_type in ('buy', 'exchange')),
amount numeric not null
);
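If balances are meant to be derived from transfers rather than stored, a per-project balance for one user could be computed like this (a hypothetical query; user 42 is made up, and it assumes the transfer table above):

select project_id,
       coalesce(sum(amount) filter (where to_user_id = 42), 0)
     - coalesce(sum(amount) filter (where from_user_id = 42), 0) as balance
from coin_transfer
where from_user_id = 42
   or to_user_id = 42
group by project_id;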
You also mention a "wallet" that belongs to a user. You would never create one table per wallet; instead, have a single table that records which user owns which wallet. Assuming each user has one wallet per coin type, you'd need something like this:
create table wallet
(
wallet_id integer primary key,
owner_user_id integer not null references users,
coin_id integer not null references coin,
... other columns ...
);
The above is only a very rough sketch, because there is a lot of information missing from your question.
I need to create a table (PostgreSQL 9.1) and I am stuck. Could you possibly help?
The incoming data can assume either of two formats:
client id(int), shop id(int), asof(date), quantity
client id(int), shop type(int), shop genre(varchar), asof(date), quantity
The given incoming CSV template is: {client id, shop id, shop type, shop genre, asof, quantity}
In the first case, the key is: client id, shop id, asof
In the second case, the key is: client id, shop type, shop genre, asof
I tried something like:
create table client(
client_id int references...,
shop_id int references...,
shop_type int references...,
shop_genre varchar(30),
asof date,
quantity real,
primary key( client_id, shop_id, shop_type, shop_genre, asof )
);
But then I ran into a problem: when data is of format 1, the inserts fail because of NULLs in the primary key.
Queries within a client can be either by shop id, or by a combination of shop type and genre. There are no use cases of partial or regex matches on genre.
What would be a suitable design? Must I split this into two tables and then take a union of search results? Or is it customary to put 0's and blanks for missing values and move along?
If it matters, the table is expected to be 100-500 million rows once all historic data is loaded.
Thanks.
You could try partial unique indexes, also known as filtered or conditional unique indexes:
http://www.postgresql.org/docs/9.2/static/indexes-partial.html
Basically, what it comes down to is that the uniqueness is enforced only for rows matching the index's WHERE clause.
For example (of course, test for correctness and impact on performance):
CREATE TABLE client(
pk_id SERIAL,
client_id int,
shop_id int,
shop_type int,
shop_genre varchar(30),
asof date,
quantity real,
PRIMARY KEY (pk_id)
);
CREATE UNIQUE INDEX uidx1_client
ON client
USING btree
(client_id, shop_id, asof)
WHERE shop_id IS NOT NULL;
CREATE UNIQUE INDEX uidx2_client
ON client
USING btree
(client_id, shop_type, shop_genre, asof)
WHERE shop_id IS NULL;
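With those two indexes, each row shape gets its own uniqueness rule. A quick hypothetical demonstration (values made up):

-- first shape: shop_id present, keyed by (client_id, shop_id, asof)
INSERT INTO client (client_id, shop_id, asof, quantity)
VALUES (1, 10, '2013-01-01', 5);
-- rejected as a duplicate of the row above
INSERT INTO client (client_id, shop_id, asof, quantity)
VALUES (1, 10, '2013-01-01', 7);
-- second shape: no shop_id, keyed by (client_id, shop_type, shop_genre, asof)
INSERT INTO client (client_id, shop_type, shop_genre, asof, quantity)
VALUES (1, 2, 'grocery', '2013-01-01', 5);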
A simple solution would be to create a surrogate primary key column and use one of two algorithms to generate its data, depending on which format is passed in.
If you wanted a fully normalised solution, you would probably need to split the shop information into two separate tables and reference them from this table, querying with outer joins.
You may also be able to use table inheritance, which is available in Postgres.