Does creating a temp table inside a PostgreSQL function create conflicts between different function calls?

I want to use a temporary table (let's call it temp_tbl) created in a PostgreSQL function in order to SELECT into it just some rows and columns from a table (let's call it tbl).
One of the columns that both temp_tbl and tbl share is order_date of type DATE, and the function also takes a start_date DATE argument. I want to SELECT into temp_tbl just the rows from tbl whose order_date is later than the function's start_date.
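A minimal sketch of the shape of such a function (the names and the selected columns are placeholders):
CREATE OR REPLACE FUNCTION fill_temp_tbl(start_date DATE)
RETURNS void
LANGUAGE plpgsql
AS $$
BEGIN
    -- create and populate the temporary table in one step
    CREATE TEMP TABLE temp_tbl AS
    SELECT order_date            -- plus whatever other columns are needed
    FROM tbl
    WHERE order_date > start_date;
END;
$$;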
My question is: if this function gets called concurrently two or more times in the same session, won't the calls use the same instance of the temporary table temp_tbl?
More specifically, when using psycopg2 in the backend of a web server, different clients of the web server might need to call our function at the same time. Will this cause a conflict over the temp_tbl temporary table created inside the function?
EDIT: my actual context
I'm building (for education purposes) an online shop. I have 3 tables for 3 kinds of products that all use a common sequence for their ids. I have another table for orders that includes a column which is an array of product ids and a column which is an array of quantities (associated with the product ids of the ordered products).
I want to return a table of common product details (columns common to all 3 tables, like id, name, price, etc.) and the associated number of sales for each product.
My current method is to concatenate all the arrays of ids and quantities from all order entries, then create a temporary table out of the two arrays and sum the quantities for each product id, so I end up with one row for each ordered product.
Then, I create 3 temporary tables in order to join each product table with the temporary product orders figures table and SELECT only the columns that are common to all 3 tables.
Finally, I UNION the 3 temporary tables.
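Condensed, the whole pipeline amounts to something like this (table and column names simplified):
WITH sales AS (
    -- explode the id/quantity arrays and total them per product
    SELECT t.product_id, SUM(t.qty) AS units_sold
    FROM orders o,
         unnest(o.product_ids, o.quantities) AS t(product_id, qty)
    GROUP BY t.product_id
)
SELECT p.id, p.name, p.price, s.units_sold
FROM products_a p JOIN sales s ON s.product_id = p.id
UNION ALL
SELECT p.id, p.name, p.price, s.units_sold
FROM products_b p JOIN sales s ON s.product_id = p.id
UNION ALL
SELECT p.id, p.name, p.price, s.units_sold
FROM products_c p JOIN sales s ON s.product_id = p.id;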
This is kind of complicated for me, so I wonder: were there better design decisions I could have made?

Related

Handle Changes in multiple tables used in creation of dimension

I am working on a data warehousing project and need help with the below.
OLAP Table:
Product Dimension Table:
Product_id, category_id, category_name, brand_id, brand_name, manufacturer_id, manufacturer_name
OLTP Tables:
Each table contains create_ts and update_ts columns for tracking creation and updates.
Product_info: id, product_name, category_id, brand_id, manufacturer, create_ts, update_ts
Product_category_mapping: id, product_id, category_id, create_ts, update_ts
brand: id, name, create_ts, update_ts
manufacturer: id, name, create_ts, update_ts
I'm looking to track all changes: a change in any of these tables should be reflected in the dimension table.
For Example:
Current OLAP Snapshot:
Product_id, category_id, category_name, brand_id, brand_name, manufacturer_id, manufacturer_name
1, 33, Noodles, 45, Nestle, 455, nestle_pvt_ltd
Suppose the brand name changes from Nestle to Nestle-US. How will we track this if we are capturing changes based only on product_info.update_ts? Should we consider changes in all 4 tables?
Please suggest.
If data changes in any table that is a source for your DW, then you need to include it in your extract logic.
For reference data like this, where a number of tables contribute to a single "target" table, an approach I often take is to create a view across these tables in the source DB. Include all the columns you need to take across to the DW, but only a single update_ts column, calculated with the SQL GREATEST function applied to the update_ts columns from all the tables in the view. Then you only need to compare this single column to your "last extracted date" to determine whether there are any changes you may need to process.
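A sketch of such a view over the tables above (I'm guessing at the join columns, and the category_name lookup is left out because its source table isn't listed):
CREATE VIEW product_dim_source AS
SELECT
    p.id     AS product_id,
    pcm.category_id,
    b.id     AS brand_id,
    b.name   AS brand_name,
    m.id     AS manufacturer_id,
    m.name   AS manufacturer_name,
    -- one combined change timestamp across all contributing tables
    GREATEST(p.update_ts, pcm.update_ts, b.update_ts, m.update_ts) AS update_ts
FROM product_info p
JOIN product_category_mapping pcm ON pcm.product_id = p.id
JOIN brand b        ON b.id = p.brand_id
JOIN manufacturer m ON m.id = p.manufacturer;
The extract then reduces to a single comparison, e.g. SELECT * FROM product_dim_source WHERE update_ts > <last extracted date>.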

How to Carry over a Table into a Column PostgreSQL

This may be a dumb question, as I am a beginner in PostgreSQL, but here is what I'm trying to do.
I have a table called Products, and inside Products there are 3 columns: Name, Price, and Expiry Date. I also have a second table called Orders with 4 columns: Product, purchasePrice, Amount, and CountryRecieved.
All I want is to reference the Product column to the Products table so it has all the information of the Products table.
Is this doable?
The key concepts you need to read up on are:
"normalisation": the process of breaking down data into multiple related entities
"foreign keys": pointers from one database table to another
"joins": the query construct used to follow that pointer and get the data back together
In your case:
You have correctly determined that the information from Products should not just be copied manually into each row of the Orders table. This is one of the most basic aspects of normalisation: each piece of data is in one place, so updates cannot make it inconsistent.
You have deduced that the Orders table needs some kind of Product column; this is your foreign key. The most common way to represent this is to give the Products table an ID column that uniquely identifies each row, and then have a ProductID column in the Orders table. You could also use the product's name as the key, but this means you can never rename a product, as other entities in the database might reference it; integer keys will generally be more efficient in storage and speed, as well.
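For example, the two tables might be declared like this (a sketch; pick the types that fit your data):
CREATE TABLE Products (
    ProductId  serial PRIMARY KEY,
    Name       text NOT NULL,
    Price      numeric NOT NULL,
    ExpiryDate date
);

CREATE TABLE Orders (
    OrderId         serial PRIMARY KEY,
    -- the foreign key: each order row points at exactly one product
    ProductId       integer NOT NULL REFERENCES Products (ProductId),
    purchasePrice   numeric,
    Amount          integer,
    CountryRecieved text
);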
To use that foreign key relationship, you use a JOIN in your SQL queries. For example, to get the name and quantity of products ordered, you could write:
SELECT
    P.Name,
    O.Amount
FROM
    Products AS P
INNER JOIN
    Orders AS O
    -- This "ON" clause tells the database how to look up the foreign key
    ON O.ProductId = P.ProductId
ORDER BY
    P.Name;
Here I've used an "inner join"; there are also "left outer join" and "right outer join", which can be used when only some rows on one side will meet the condition. I recommend you find a tutorial that explains them better than I can in a single paragraph.
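For instance, a left outer join would also list products that have never been ordered, with a NULL Amount for those rows:
SELECT
    P.Name,
    O.Amount
FROM
    Products AS P
LEFT OUTER JOIN
    Orders AS O
    ON O.ProductId = P.ProductId
ORDER BY
    P.Name;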
Assuming the name column is the key in the Products table and the product column in the Orders table refers to it, you can join the two tables on the related column(s) and get all the information:
select
o.*, p.*
from orders o
join products p on o.product = p.name;

Postgres table partitioning with star schema

I have a schema with one table with the majority of data, customer, and three other tables with foreign key references to customer.entry_id which is a BIGSERIAL field. The three other tables are called location, devices and urls where we store various data related to a specific entry in the customer table.
I want to partition the customer table into monthly child tables, and have that part worked out: customer will stay as-is, each month will have a table customer_YYYY_MM that inherits from the master table with the right CHECK constraint, and indexes will be created on each individual child table. Data will be moved to the correct child tables while the master table stays empty.
My question is about the other three tables, as I want to partition them as well. However, they have no date information (at all), only the reference to the primary key from the master table. How can I setup the constraints on these tables? Is it even meaningful or possible without date information?
My application logic knows where to insert all the data (it's fairly trivial), but I expect to be able to do simple SELECT queries without specifying which child tables to get it from. So this should work as you would expect from non-partitioned tables:
SELECT l.*
FROM customer c
JOIN location l USING (entry_id)
WHERE c.date_field > '2015-01-01';
I would partition them by the reference key. The foreign key is used in join conditions and is not usually subject to change, so it fulfils the following important points:
Partition by the information that is mostly used in the WHERE clauses of the queries, or in other parts where partitioning can be used to filter out tables that don't need to be scanned. As one guide puts it:
The objective when defining partitions should be to allow as many queries as possible to fetch data from as few partitions as possible - ideally one.
Partition by information that is not going to change, so that rows don't constantly need to be moved from one subtable to another.
This all depends on the size of the tables too, of course. If the sizes stay small then there is no need to partition.
Read more about partitioning in the PostgreSQL documentation.
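A sketch of what partitioning location by entry_id could look like, mirroring the inheritance setup already planned for customer (the ranges are arbitrary):
CREATE TABLE location_1 (
    CHECK (entry_id >= 1       AND entry_id < 1000000)
) INHERITS (location);

CREATE TABLE location_2 (
    CHECK (entry_id >= 1000000 AND entry_id < 2000000)
) INHERITS (location);

-- each child needs its own index on the partition key
CREATE INDEX ON location_1 (entry_id);
CREATE INDEX ON location_2 (entry_id);
One caveat: constraint exclusion only kicks in when entry_id is constrained by constants in the query itself, not by values arriving through a join, so the SELECT above would still scan every location child.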
Use views:
create view customer as
select * from customer_jan_15 union all
select * from customer_feb_15 union all
select * from customer_mar_15;
create view location as
select * from location_jan_15 union all
select * from location_feb_15 union all
select * from location_mar_15;

Indexing for efficient querying and pagination of financial data in PostgreSQL

I'm working on an API that needs to return a list of financial transactions. These records are held in 6 different tables, but all have 3 common fields:
transaction_id int NOT NULL,
account_id bigint NOT NULL,
created timestamptz NOT NULL
(Note: this might actually have been a good use of table inheritance in PostgreSQL, but it wasn't done that way.)
The business requirement is to return all transactions for a given account_id in one list sorted by created in descending order (similar to an online banking page where your latest transaction is at the top). Originally they wanted to paginate in groups of 50 records, but I've got them to agree to date ranges instead (believing that I can do that more efficiently in the database than with OFFSET and LIMIT).
My intent is to create an index on each of these tables like this:
CREATE INDEX idx_table_1_account_created ON table_1(account_id, created desc);
ALTER TABLE table_1 CLUSTER ON idx_table_1_account_created;
Finally, I'll create a view to UNION all of the records from the 6 tables into one list; obviously the records from the 6 tables will need to be re-sorted to come up with a unified list in the correct order. The call will look like:
SELECT * FROM vw_all_transactions
WHERE account_id = 12345678901234
AND created >= '2014-01-01' AND created < '2014-02-01'
ORDER BY created desc;
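where vw_all_transactions is essentially just a UNION ALL over the common fields, something like:
CREATE VIEW vw_all_transactions AS
SELECT transaction_id, account_id, created FROM table_1
UNION ALL
SELECT transaction_id, account_id, created FROM table_2
-- ... and so on for the remaining tables ...
UNION ALL
SELECT transaction_id, account_id, created FROM table_6;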
My question is about the indexing and clustering scheme. Since the records are going to have to be re-sorted by the view anyway, is there any reason to specify the individual indexes as created desc? And does sorting this way carry any penalties when periodically calling CLUSTER?
I've done some googling and reading but can't really find anything that explains how this clustering will work.
Using PostgreSQL 9.2 on Heroku.

What should be the strategy to read from many tables having millions rows each in postgresql?

I have the following scenario while using PostgreSQL:
Number of tables: 100
Number of rows per table: ~10 million
All the tables have the same schema. E.g. each table contains the daily call records of a company, so 100 tables contain the call records for 100 days.
I want to run the following type of query against these tables:
For each column of each table, get the count of records having a NULL value in that column.
Given the above scenario, what are the major optimizations to make in the table structures? How should I write my query, and is there an efficient way of querying for such cases?
If you're using Postgres table inheritance, a simple select count(*) from calls where foo is null will work fine. It will use an index on foo provided null foo rows aren't too common.
Internally, that will do what you'd do manually without table inheritance, i.e. union all the results from each individual child table.
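A minimal sketch of that setup and query (table, column, and child names assumed):
-- parent table; one child per day, constrained on the date
CREATE TABLE calls (caller text, callee text, foo text, call_date date);

CREATE TABLE calls_2015_01_01 (
    CHECK (call_date = DATE '2015-01-01')
) INHERITS (calls);

CREATE INDEX ON calls_2015_01_01 (foo);
-- ...repeat for the other 99 days...

-- scans all children, using each child's index on foo where helpful
SELECT count(*) FROM calls WHERE foo IS NULL;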
If you need to run this repeatedly, maintain the count in memcached or in another table.