Hstore vs many-to-many relationship search performance - postgresql

A completely hypothetical question to compare the performance of hstore in Postgres.
Let's say each user has a list of followers. There are two ways to implement it:
Many-to-Many relationship with a 'follower' table ( user_id, follower_id )
An hstore column where the values are the ids of the followers. (with a GiST index)
If I want to find all the users that follow a certain user, which version would perform faster?
SELECT follower_id from follower where user_id = '1234'
SELECT user_id from "user" where data @> 'followers=>1234'
In real life, for the hstore option we would probably also maintain a list of all the users a user follows - for the sake of the question let's assume we don't do that.
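For concreteness, the two layouts might look something like this (a sketch only; the table names, the quoted "user" table, and the index choice are my assumptions, not part of the question):
-- Option A: many-to-many join table
CREATE TABLE follower (
    user_id     bigint NOT NULL,
    follower_id bigint NOT NULL,
    PRIMARY KEY (user_id, follower_id)
);
-- Option B: hstore column holding follower ids, indexed with GiST
CREATE EXTENSION IF NOT EXISTS hstore;
CREATE TABLE "user" (
    user_id bigint PRIMARY KEY,
    data    hstore
);
CREATE INDEX user_data_gist ON "user" USING GIST (data);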

Related

If a Postgres DB has unique IDs across its tables, how do you find a row using its ID without knowing its table?

Following Rob Conery's blog, I have a set of unique IDs across the tables of my Postgres DB.
Now, using these unique IDs, is there a way to query a row in the DB without knowing which table it is in? Or can those tables be indexed such that if the row is not in the current table, I just move on and query the next table?
In short - if you did not prepare for that - then no. You can prepare for it by generating your own UUIDs. Please look here. For instance, PG has UUIDs that preserve order. Also, UUID v5 has something like namespaces, so you can build a hierarchy. However, that is done by hashing the namespace, and I don't know of a tool to reverse it inside PG.
If you know all possible tables in advance, you could prepare a query that simply UNIONs a search with a tagged type over all tables. In the case of two tables named comments and news, you could do something like:
PREPARE type_of_id(uuid) AS
SELECT id, 'comments' AS type
FROM comments
WHERE id = $1
UNION
SELECT id, 'news' AS type
FROM news
WHERE id = $1;
EXECUTE type_of_id('8ecf6bb1-02d1-4c04-8875-f1da62b7f720');
Automatically generating this could probably be done by querying pg_catalog.pg_tables and generating the relevant query on the fly.
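As a rough sketch of that idea (my own, assuming every table of interest is in the public schema and has a uuid id column), the query text could be assembled like this and the resulting string then fed to PREPARE:
-- Build the UNION query text from the catalog (sketch, assumptions noted above)
SELECT string_agg(
         format('SELECT id, %L AS type FROM %I WHERE id = $1',
                tablename, tablename),
         ' UNION ALL ')
FROM pg_catalog.pg_tables
WHERE schemaname = 'public';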

How to Carry over a Table into a Column PostgreSQL

This may be a dumb question as I am a beginner in PostgreSQL, but here is what I'm trying to do.
I have a table called Products, and inside Products there are three columns: Name, Price, and Expiry Date. I also have a second table called Orders with four columns: Product, purchasePrice, Amount, and CountryRecieved.
All I want is for the Product column to reference the Products table so it has all the information from the Products table.
Is this doable?
The key concepts you need to read up on are:
"normalisation": the process of breaking down data into multiple related entities
"foreign keys": pointers from one database table to another
"joins": the query construct used to follow that pointer and get the data back together
In your case:
You have correctly determined that the information from Products should not just be copied manually into each row of the Orders table. This is one of the most basic aspects of normalisation: each piece of data is in one place, so updates cannot make it inconsistent.
You have deduced that the Orders table needs some kind of Product column; this is your foreign key. The most common way to represent this is to give the Products table an ID column that uniquely identifies each row, and then have a ProductID column in the Orders table. You could also use the product's name as the key, but this means you can never rename a product, as other entities in the database might reference it; integer keys will generally be more efficient in storage and speed, as well.
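As a minimal sketch of that layout (the column names here are adapted guesses, not taken from the question):
CREATE TABLE products (
    product_id   serial PRIMARY KEY,
    name         text NOT NULL,
    price        numeric NOT NULL,
    expiry_date  date
);
CREATE TABLE orders (
    order_id         serial PRIMARY KEY,
    -- the foreign key pointing back at products
    product_id       integer NOT NULL REFERENCES products (product_id),
    purchase_price   numeric,
    amount           integer,
    country_received text
);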
To use that foreign key relationship, you use a JOIN in your SQL queries. For example, to get the name and quantity of products ordered, you could write:
SELECT
    P.Name,
    O.Amount
FROM
    Products AS P
INNER JOIN
    Orders AS O
    -- This "ON" clause tells the database how to look up the foreign key
    ON O.ProductId = P.ProductId
ORDER BY
    P.Name
Here I've used an "inner join"; there are also "left outer join" and "right outer join", which can be used when only some rows on one side will meet the condition. I recommend you find a tutorial that explains them better than I can in a single paragraph.
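For instance, a left outer join (sketched here on the same hypothetical columns) keeps products that have no matching orders, showing NULL for the amount:
SELECT P.Name, O.Amount
FROM Products AS P
LEFT OUTER JOIN Orders AS O ON O.ProductId = P.ProductId
ORDER BY P.Name;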
Assuming the Name column is the key in the Products table and the Product column in the Orders table refers to it, you can join the two tables on the related column(s) and get all the information:
select
o.*, p.*
from orders o
join products p on o.product = p.name;

Postgres table partitioning with star schema

I have a schema with one table with the majority of data, customer, and three other tables with foreign key references to customer.entry_id which is a BIGSERIAL field. The three other tables are called location, devices and urls where we store various data related to a specific entry in the customer table.
I want to partition the customer table into monthly child tables, and have that part worked out; customer will stay as-is, each month will have a table customer_YYYY_MM that inherits from the master table with the right CHECK constraint and indexes will be created on each individual child table. Data will be moved to the correct child tables while the master table stays empty.
My question is about the other three tables, as I want to partition them as well. However, they have no date information (at all), only the reference to the primary key from the master table. How can I setup the constraints on these tables? Is it even meaningful or possible without date information?
My application logic knows where to insert all the data (it's fairly trivial), but I expect to be able to do simple SELECT queries without specifying which child tables to get it from. So this should work as you would expect from non-partitioned tables:
SELECT l.*
FROM customer c
JOIN location l USING (entry_id)
WHERE c.date_field > '2015-01-01'
I would partition them by the reference key (a sketch follows below). The foreign key is used in join conditions and is not usually subject to change, so it fulfills the following important points:
Partition by the information that is used mostly in the WHERE clauses of the queries or other parts where partitioning can be used to filter out tables that don't need to be scanned. As one guide puts it:
The objective when defining partitions should be to allow as many queries as possible to fetch data from as few partitions as possible - ideally one.
Partition by information that is not going to be changed so that rows don't constantly need to be thrown from one subtable to another
This all depends on the size of the tables too, of course. If they stay small then there is no need to partition.
Read more about partitioning here.
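As a sketch of what partitioning by the reference key could look like with inheritance (the entry_id ranges are invented for illustration and would have to match the ranges that actually land in each monthly customer partition):
CREATE TABLE location_2015_01 (
    -- range of entry_id values assumed to correspond to January 2015
    CHECK (entry_id >= 1000000 AND entry_id < 2000000)
) INHERITS (location);
CREATE TABLE location_2015_02 (
    CHECK (entry_id >= 2000000 AND entry_id < 3000000)
) INHERITS (location);
Note that constraint exclusion only kicks in when the query has a constant filter on entry_id, so a plain join from customer will still scan all the location children.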
Use views:
create view customer as
select * from customer_jan_15 union all
select * from customer_feb_15 union all
select * from customer_mar_15;
create view location as
select * from location_jan_15 union all
select * from location_feb_15 union all
select * from location_mar_15;

Indexing for efficient querying and pagination of financial data in PostgreSQL

I'm working on an API that needs to return a list of financial transactions. These records are held in 6 different tables, but all have 3 common fields:
transaction_id int NOT NULL,
account_id bigint NOT NULL,
created timestamptz NOT NULL
Note: this might actually have been a good use of table inheritance in PostgreSQL, but it wasn't done that way.
The business requirement is to return all transactions for a given account_id in one list, sorted by created in descending order (similar to an online banking page where your latest transaction is at the top). Originally, they wanted to paginate in groups of 50 records, but I've got them to do it on date ranges (believing that I can do that more efficiently in the database than using offset and limit).
My intent is to create an index on each of these tables like this:
CREATE INDEX idx_table_1_account_created ON table_1(account_id, created desc);
ALTER TABLE table_1 CLUSTER ON idx_table_1_account_created;
Then, finally, I'll create a view to UNION all of the records from the 6 tables into one list; obviously the records from the 6 tables will need to be re-sorted to produce a unified list in the correct order. The call will look like:
SELECT * FROM vw_all_transactions
WHERE account_id = 12345678901234
AND created >= '2014-01-01' AND created < '2014-02-01'
ORDER BY created desc;
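For reference, the view itself would be along these lines (a sketch; the real thing unions all 6 tables and their full column lists):
CREATE VIEW vw_all_transactions AS
SELECT transaction_id, account_id, created FROM table_1
UNION ALL
SELECT transaction_id, account_id, created FROM table_2
UNION ALL
SELECT transaction_id, account_id, created FROM table_3;
-- ...and so on for the remaining tables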
My question is related to the indexing and clustering scheme. Since the records are going to have to be re-sorted by the view anyway, is there any reason to specify the individual indexes with created desc? And does sorting this way carry any penalty when periodically calling CLUSTER?
I've done some googling and reading but can't really seem to find any information that answers how this clustering is going to work.
Using PostgreSQL 9.2 on Heroku.

Is it possible to insert values in column using just a primary key reference?

I have a table 'users' with the columns:
user_id(PK), user_firstname, user_lastname
and another table 'room' with the columns:
event_id(PK), user_id(FK), user_firstname, user_lastname....(and more columns).
I want to know if it is possible to fill the user_firstname and user_lastname automatically just knowing the user_id column.
For example, the default value of user_firstname would be something like: "select users.user_firstname from users where users.user_id = user_id".
I don't know if I was clear enough... As you can see, my knowledge of databases is very limited.
What you want to achieve can be done with JOINs. They will avoid those redundant user_firstname and user_lastname columns. So you'd just fetch from both tables when querying the room table and you get the extra columns of users into the result set:
SELECT * FROM room AS r INNER JOIN users AS u ON r.user_id = u.user_id;
The thing we did here is called normalization. Another important thing to take care of is foreign key constraints and their cascades; in your case room.user_id references users.user_id. A delete on users should most probably cascade to room, if you actually delete users instead of flagging them as deleted.
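A minimal sketch of such a constraint (assuming the tables exist as described in the question) would be:
-- Hypothetical constraint: deletes on users cascade to room
ALTER TABLE room
    ADD CONSTRAINT room_user_id_fkey
    FOREIGN KEY (user_id) REFERENCES users (user_id)
    ON DELETE CASCADE;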
The columns user_firstname and user_lastname do not belong in your room table. The user_id column references the users table; that is all you need.
To select the data, you can use a JOIN statement, something like
SELECT R.event_id, R.user_id, U.user_firstname, U.user_lastname
FROM room AS R
JOIN users AS U ON R.user_id = U.user_id
The answer here is somewhat sideways to the question. You do not want user_firstname and user_lastname columns in the room table. The user_id is a proxy for that entire row of the users table. When you need to access user_firstname, you do a JOIN of the two tables on the common column.