Stored procedure help for taking IDs and getting result from another table - postgresql

In a table (store100) I have a list of store IDs that make more than 100 sales a day. In another table I have sales. What I want to do is, for every store ID in table store100, see how many of product x they sold in the sales table. How do I achieve this? Obviously I don't want to enter the store IDs manually every time, so I want the query to take all the IDs in store100 and look up sales of x in the sales table.
Table structure:
store100 table:
ID
lon1
lon2
glas4
edi5
etc
Sales table:
ID   | Location | Product | Quantity | Total Price
lon1 | London   | Wallet  | 5        | 50
edi5 | Manc     | Shoes   | 4        | 100
So, for example, I want a query that takes all the store100 IDs and shows how many wallets each of them sold.
If anyone has a better idea of achieving this, please tell me.

You will need a join for this:
SELECT S100.ID,
       S.Product,
       S.Quantity
FROM store100 S100
INNER JOIN sales S
    ON S100.ID = S.ID
Of course you will still need a WHERE clause if you want to filter (e.g. by product), and you can modify the SELECT list to fit your needs.
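For the wallet example specifically, a minimal sketch (assuming the Product column stores the value exactly as 'Wallet'):

-- wallets sold per store, for every store listed in store100
SELECT S100.ID,
       SUM(S.Quantity) AS wallets_sold
FROM store100 S100
INNER JOIN sales S
    ON S100.ID = S.ID
WHERE S.Product = 'Wallet'
GROUP BY S100.ID;

Swap the WHERE value to count any other product the same way.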

Related

How to create a working pivot table without killing the system

Let's say you have a simple Customer table with just 4 columns:
customerCode numeric(7,0)
customerName char(50)
customerVATNumber char(11)
customerLocation char(35)
Keep in mind that the Customer table contains 3 million rows because it holds all the customers of the last 40 years, but only 980,000 of them are active.
Suppose we then have a table called Sales structured in this way:
saleID integer
customerCode numeric(7,0)
agentID numeric(6,0)
productID char(2)
dateBeginSale date
dateEndSale date
There are about three and a half million rows in this table (here too we have data from 40 years ago), but the currently active supplies across the various products total about one million. The company sells only 4 products. Each customer can purchase up to 4 products, under 4 different contracts, even from 4 different agents. Most customers (90%) buy only one product; the rest buy two to four (those who take the complete assortment are a tiny handful).
I was asked to build a pivot table showing, for each customer, their name and location, all the products they purchased, and the agent for each purchase.
The proposed layout for this pivot table is:
customerCode
customerName
customerLocation
productID1
agentID1
saleID1
dateBeginSale1
dateEndSale1
productID2
agentID2
saleID2
dateBeginSale2
dateEndSale2
productID3
agentID3
saleID3
dateBeginSale3
dateEndSale3
productID4
agentID4
saleID4
dateBeginSale4
dateEndSale4
I built the pivot with a view.
First I created 4 views, one for each product ID on the Sales table; they are also useful for other statistical and reporting purposes:
View1 as
customerCode1
productID1
agentID1
saleID1
dateBeginSale1
dateEndSale1
View2 as
customerCode2
productID2
agentID2
saleID2
dateBeginSale2
dateEndSale2
and so on till View4
Then I joined the 4 views with the Customer table and created the PivotView I needed.
Now Select * from PivotView works perfectly.
Select * from PivotView where customerLocation = 'NEW YORK CITY' works too.
Any other request, however, literally makes the machine sit down. For example: selecting and counting the customers residing in LOS ANGELES who purchased their products from the same agent, or from different agents. I can watch memory usage grow (probably due to the construction of some temporary table or view), and the query often crashes.
However, if I build the same pivot on a table instead of a view, query times drop dramatically; even the heavy ones (there are always about a million records to scan to verify the various conditions) become acceptable.
Surely I am doing something wrong and/or there must be a better way to achieve the result: a pivot built on live data instead of one built from data extracted nightly.
I'll be happy to read your comments and suggestions.
I don't clearly understand your data layout and what you need, but I'll say that the usual problem with pivoting data on Db2 for IBM i is that there's no built-in way to dynamically pivot the data.
Given that you only have 4 products, the above limitation doesn't really apply.
Your problem would seem to be that by creating 4 views over the same table, you're processing records repeatedly. Instead, try to touch the data only once:
create view PivotSales as
select
customerCode,
-- product 1
max(case productID when '01' then productID end) as productID1,
max(case productID when '01' then agentID end) as agentID1,
max(case productID when '01' then saleID end) as saleID1,
max(case productID when '01' then dateBeginSale end) as dateBeginSale1,
max(case productID when '01' then dateEndSale end) as dateEndSale1,
-- product 2
max(case productID when '02' then productID end) as productID2,
max(case productID when '02' then agentID end) as agentID2,
max(case productID when '02' then saleID end) as saleID2,
max(case productID when '02' then dateBeginSale end) as dateBeginSale2,
max(case productID when '02' then dateEndSale end) as dateEndSale2
-- repeat the five max(case ...) columns for products '03' and '04'
from Sales
group by customerCode;
Now you can have a CustomerSales view:
create view CustomerSales as
select *
from Customers join PivotSales using (customerCode);
Run your queries, using Visual Explain to see what indexes the system suggests. At a minimum, you should have indexes on the following (sketched below):
Customer (customerCode)
Customer (location, customerCode)
Sales (customerCode)
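A minimal sketch of those indexes (index names are illustrative; table names as used above):

create index customers_code_idx on Customers (customerCode);
create index customers_loc_code_idx on Customers (location, customerCode);
create index sales_custcode_idx on Sales (customerCode);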
I suspect that some Encoded Vector Indexes (EVIs) over various columns in Sales and Customers would prove helpful, especially since you mention counting. An EVI keeps track of the counts of its symbols, so counting is "free". An example:
create encoded vector index customerLocEvi
on Customers (location);
-- this doesn't have to read any rows in Customers
select count(*)
from Customers
where location = 'LOS ANGELES';
"Surely I am doing something wrong and/or there must be a better way
to achieve the result: a pivot built on live data instead of one built
from data extracted nightly."
Don't be too sure about that. The DB structure that best supports Business Intelligence-type queries usually doesn't match the typical transactional data structure. A periodic "extract, transform, load" (ETL) cycle is pretty typical.
For your particular use case, you could turn CustomerSales into a Materialized Query Table (MQT), build some supporting indexes on it, and run queries directly over it. The nightly rebuild would be as simple as REFRESH TABLE CustomerSales;
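A minimal sketch of that MQT, assuming the CustomerSales view defined above (Db2 for IBM i supports user-maintained MQTs):

create table CustomerSales_mqt as
    (select * from CustomerSales)
    data initially deferred
    refresh deferred
    maintained by user;

-- the nightly rebuild
refresh table CustomerSales_mqt;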
Or, if you wanted to, since Db2 for IBM i doesn't support system-maintained MQTs, a trigger over Sales could automatically propagate data to CustomerSales instead of rebuilding it nightly.
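A heavily simplified sketch of such a trigger, assuming CustomerSales has been made an ordinary table with the pivot layout and that a row already exists for the customer (handling products '02' through '04', and customers with no row yet, is omitted):

create trigger sales_to_customersales
after insert on Sales
referencing new as n
for each row
mode db2row
    update CustomerSales
       set productID1     = n.productID,
           agentID1       = n.agentID,
           saleID1        = n.saleID,
           dateBeginSale1 = n.dateBeginSale,
           dateEndSale1   = n.dateEndSale
     where customerCode = n.customerCode
       and n.productID = '01';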

Convert certain columns that are recurring to rows in postgres

I have a table in Postgres that has ~50 million rows. I need to convert certain columns to rows.
Specifically, I need to unpivot the per-individual columns (which repeat as separate columns per individual) into rows, repeating the non-individual variables against the respective ID.
The following is the output I need -
I'd appreciate any help on this.
50 million rows is not a big deal in Greenplum, but returning that many rows to a client is kind of pointless. I'm guessing you want to create a new table for this output. Note that the new table will be 2x larger, because you are taking a single row and turning it into two.
create table new_table as
select id, mid_1 as mid, name_1 as name, age_1 as age, location
from your_table
union all
select id, mid_2 as mid, name_2 as name, age_2 as age, location
from your_table
distributed by (id);
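On plain PostgreSQL (9.3+), a lateral VALUES list is an alternative sketch that scans the source table only once instead of twice; column names are taken from the query above, and note that Greenplum's LATERAL support may be limited depending on version:

create table new_table as
select t.id,
       v.mid,
       v.name,
       v.age,
       t.location
from your_table t
cross join lateral (values
    (t.mid_1, t.name_1, t.age_1),
    (t.mid_2, t.name_2, t.age_2)
) as v(mid, name, age);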

Divide count of Table 1 by count of Table 2 on the same time interval in Tableau

I have two tables with IDs and time stamps. Table 1 has two columns: ID and created_at. Table 2 has two columns: ID and post_date. I'd like to create a chart in Tableau that displays the Number of Records in Table 1 divided by Number of Records in Table 2, by week. How can I achieve this?
One way might be to use Custom SQL like this to create a new data source for your visualization:
SELECT created_table.created_date,
       created_table.created_count,
       posted_table.posted_count
FROM (SELECT TRUNC(created_at) AS created_date, COUNT(*) AS created_count
      FROM Table1
      GROUP BY TRUNC(created_at)) created_table
LEFT JOIN
     (SELECT TRUNC(post_date) AS posted_date, COUNT(*) AS posted_count
      FROM Table2
      GROUP BY TRUNC(post_date)) posted_table
ON created_table.created_date = posted_table.posted_date
This would give you dates and counts from both tables for those dates, which you could then group by week using Tableau's date functions in the visualization. I made created_table the left side of the join on the assumption that some records would be created and not posted, but you wouldn't have posts without creations. If that isn't the case, you will want a different join.
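If the underlying data source happens to be PostgreSQL, a hedged sketch that computes the weekly ratio directly, so Tableau only has to display it (table and column names from the question):

SELECT c.week,
       c.created_count::numeric / NULLIF(p.posted_count, 0) AS ratio
FROM (SELECT date_trunc('week', created_at) AS week, COUNT(*) AS created_count
      FROM Table1
      GROUP BY 1) c
LEFT JOIN (SELECT date_trunc('week', post_date) AS week, COUNT(*) AS posted_count
           FROM Table2
           GROUP BY 1) p
    ON p.week = c.week
ORDER BY c.week;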

number of points within a radius of another set of points

I have two tables. One is a list of stores (with lat/long). The other is a list of customer addresses (with lat/long). What I want is a query that will return the number of customers within a certain radius for each store in my table. The query below gives me the total number of customers within 10,000 meters of ANY store, but I'm not sure how to turn it into one row per store, each with its own count.
Note that I'm running these queries on CartoDB, where the_geom is basically long/lat.
SELECT COUNT(*) AS customer_count
FROM customer_table
WHERE EXISTS (
    SELECT 1 FROM store_table
    WHERE ST_Distance_Sphere(store_table.the_geom, customer_table.the_geom) < 10000
)
This results in a single row:
customer_count
4009
Suggestions on how to make this work for my problem? I'm open to doing this in other ways that might be more efficient (faster).
For reference, the column with store names is store_table.store_identifier.
I'll assume that you use the_geom to represent the coordinates (lat/lon) of stores and customers, and that the_geom is of the geography type. Your query will be something like this:
select s.id, count(*) as customer_count
from customers c
inner join stores s
on st_dwithin(c.the_geom, s.the_geom, 10000)
group by s.id
This should give you a neat table with a store id and the count of customers within 10,000 meters of each store.
If the_geom is of type geometry, your query will be very similar, but you should use st_distance_sphere() instead in order to express the distance in meters (not degrees).
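Since you mentioned the store_identifier column, a variant of the same idea that returns store names and keeps stores with zero customers (assuming the_geom is geography; cast or switch functions as above if it is geometry):

select s.store_identifier,
       count(c.the_geom) as customer_count
from store_table s
left join customer_table c
    on st_dwithin(s.the_geom, c.the_geom, 10000)
group by s.store_identifier;

count(c.the_geom) counts only matched customers, so unmatched stores report 0.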

Complex Joins in Postgresql

It's possible I'm stupid, but I've been querying and checking for hours and I can't seem to find the answer to this, so I apologize in advance if the post is redundant... but I can't seem to find its doppelganger.
OK: I have a PostgreSQL db with the following tables:
Key (containing two fields in which I'm interested, ID and Name)
and a second table, Data.
Data contains, well... data, sorted by ID. ID is unique, but each Name has multiple IDs. E.g. if Bill enters the building, this is ID 1 for Bill; Mary enters the building, ID 2 for Mary; Bill re-enters the building, ID 3 for Bill.
The ID field is in both the Key table and the Data table.
What I want to do is... find
The MAX (i.e. last) ID, unique to EACH NAME, and the Data associated with it.
E.g. Bill - Last Login: ID 10. Time: 123UTC. Door: West. And so on.
So... I'm trying the following query:
SELECT *
FROM Data, Key
WHERE Key.ID = (
    SELECT MAX(ID)
    FROM Key
    GROUP BY ID
)
Here's the kicker: there are something like 800M items in these tables, so errors are... time-consuming. Can anyone help me check whether this query is going to do what I expect?
Thanks so much.
To get the maximum ID for each name:
select Name, max(ID) as max_id
from data
group by Name;
Join that to your other table.
select *
from key t1
inner join (select Name, max(ID) as max_id
            from data
            group by Name) t2
    on t1.id = t2.max_id;
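A Postgres-idiomatic alternative worth knowing is DISTINCT ON, sketched here under the assumption that Key holds (ID, Name) and Data holds the event data keyed by ID:

-- keep only the row with the highest ID per Name
select distinct on (k.Name)
       k.Name, d.*
from Key k
inner join Data d on d.ID = k.ID
order by k.Name, k.ID desc;

An index on Key (Name, ID desc) helps this stay fast at the row counts you describe.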