Plotting Total Spending/Purchase of Unique Customers against Each Store - TSQL

I have a Transactions table that records the purchases (spending) customers made at different stores. The requirement is to show:
1. The total purchases made at each store.
2. How many unique/distinct customers made purchases at each store.
3. The total purchases, across all stores, of the unique customers who spent at a particular store. For example, if three unique customers spent at 'Mega Store', plot the total spend of those three customers (at any store) against 'Mega Store'; if four unique customers spent at 'Alpine Store', show the total purchases of those customers against 'Alpine Store'.
The challenge I am facing is how to attain the third requirement. I did some manual work in Excel to display the desired results.
Below is the Table Creation Script:
CREATE TABLE [dbo].[Transactions](
[Date] [datetime] NULL,
[Cust_ID] [smallint] NULL,
[Amount] [smallint] NULL,
[Store] [varchar](10) NULL
)
GO
INSERT INTO Transactions (Date,Cust_ID,Amount,Store)
Values
('20210222','1001',100,'Mega Store'),
('20210223','1002',200,'Z Trade'),
('20210224','1003',300,'Alpine'),
('20210227','1002',200,'Alpine'),
('20210228','1003',300,'Mega Store'),
('20210302','1001',100,'Alpine'),
('20210303','1002',200,'Mega Store'),
('20210304','1003',300,'Z Trade'),
('20210306','1001',100,'Mega Store'),
('20210307','1002',200,'Z Trade'),
('20210308','1003',300,'Alpine'),
('20210309','1004',400,'Mega Store')
select * from Transactions
-- Get Total Spend and Distinct Customer --
SELECT A.Store,SUM(A.Amount) AS Spend,B.Distinct_Customer
from Transactions A
left outer join
(select store,count(distinct cust_id) as Distinct_Customer from Transactions group by Store) B
on A.Store = B.store
group by a.store,b.Distinct_Customer
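As an aside (my suggestion, not part of the original post), requirements 1 and 2 don't need the self-join; a single aggregate over Transactions produces the same result:
-- Total spend and distinct customer count per store in one pass
SELECT Store,
       SUM(Amount) AS Spend,
       COUNT(DISTINCT Cust_ID) AS Distinct_Customer
FROM Transactions
GROUP BY Store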
-- Store wise Unique Cust_ids
select distinct store,cust_id from Transactions
group by store,cust_id
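For the third requirement, here is a sketch of one possible approach (my attempt, assuming the Transactions table above): take each store's distinct customers and join them back to all of their transactions, so each store is credited with its customers' spend across every store.
-- For each store, sum the spend of its distinct customers at ALL stores
SELECT S.Store,
       SUM(T.Amount) AS Customers_Total_Spend
FROM (SELECT DISTINCT Store, Cust_ID FROM Transactions) S
JOIN Transactions T
  ON T.Cust_ID = S.Cust_ID
GROUP BY S.Store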
An image of the working done in Excel is also attached.

Related

How to create a working pivot table without killing the system

Let's say you have a simple Customer table with just 4 columns:
customerCode numeric(7,0)
customerName char(50)
customerVATNumber char(11)
customerLocation char(35)
Keep in mind that the Customer table contains 3 million rows, because it holds all the customers of the last 40 years, but only about 980,000 of them are active.
Suppose we then have a table called Sales structured in this way:
saleID integer
customerCode numeric(7,0)
agentID numeric(6,0)
productID char(2)
dateBeginSale date
dateEndSale date
There are about three and a half million rows in this table (here too we have data from 40 years ago), but the current supplies for the various products total about one million rows. The company sells only 4 products. Each customer can purchase up to 4 products, under 4 different contracts, even from 4 different agents. Most customers (90%) buy only one; the rest buy two to four (those who take the complete assortment are just a handful).
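For concreteness, here is the described layout as DDL (my sketch; the primary keys are my assumption, and the table names Customers/Sales follow the usage later in the answer):
create table Customers (
    customerCode numeric(7,0) not null primary key,  -- key is assumed
    customerName char(50),
    customerVATNumber char(11),
    customerLocation char(35)
);
create table Sales (
    saleID integer not null primary key,             -- key is assumed
    customerCode numeric(7,0),
    agentID numeric(6,0),
    productID char(2),
    dateBeginSale date,
    dateEndSale date
);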
I was asked to build a pivot showing, for each customer, the name and location, all the products they purchased, and the agent for each purchase.
The proposed layout for this pivot table is:
customerCode
customerName
customerLocation
productID1
agentID1
saleID1
dateBeginSale1
dateEndSale1
productID2
agentID2
saleID2
dateBeginSale2
dateEndSale2
productID3
agentID3
saleID3
dateBeginSale3
dateEndSale3
productID4
agentID4
saleID4
dateBeginSale4
dateEndSale4
I built the pivot with a view.
First I created 4 views, one for each product ID on the Sales table; they are also useful for other statistical and reporting purposes.
View1 as
customerCode1
productID1
agentID1
saleID1
dateBeginSale1
dateEndSale1
View2 as
customerCode2
productID2
agentID2
saleID2
dateBeginSale2
dateEndSale2
and so on till View4
Then I joined the 4 views with the Customer table and created the PivotView I needed.
Now Select * from PivotView works perfectly.
Select * from PivotView where customerLocation='NEW YORK CITY' works too.
Any other request, for example selecting and counting the customers residing in LOS ANGELES who purchased the products from the same agent or from different agents, literally brings the machine to its knees: I watch memory consumption grow (probably due to the construction of some temporary table or view) and the query often crashes.
However, if I build the same pivot as a table instead of a view, the times of the various selections drop and, even though they are still heavy (there are always about a million records to scan to verify the various conditions), they become acceptable.
Surely I am doing something wrong and/or there must be a better way to achieve the result: a pivot built on live data instead of one built from data extracted nightly.
I'll be happy to read your comments and suggestions.
I don't clearly understand your data layout and what you need, but I'll say that the usual problem with pivoting data on Db2 for IBM i is that there's no built-in way to dynamically pivot the data.
Given that you only have 4 products, the above limitation doesn't really apply.
Your problem would seem to be that by creating 4 views over the same table, you're processing records repeatedly. Instead, try to touch the data one time.
create view PivotSales as
select
    customerCode,
    -- product 1
    max(case productID when '01' then productID end) as productID1,
    max(case productID when '01' then agentID end) as agentID1,
    max(case productID when '01' then saleID end) as saleID1,
    max(case productID when '01' then dateBeginSale end) as dateBeginSale1,
    max(case productID when '01' then dateEndSale end) as dateEndSale1,
    -- product 2
    max(case productID when '02' then productID end) as productID2,
    max(case productID when '02' then agentID end) as agentID2,
    max(case productID when '02' then saleID end) as saleID2,
    max(case productID when '02' then dateBeginSale end) as dateBeginSale2,
    max(case productID when '02' then dateEndSale end) as dateEndSale2
    -- repeat for products 3 and 4
from Sales
group by customerCode;
Now you can have a CustomerSales view:
create view CustomerSales as
select *
from Customers join PivotSales using (customerCode);
Run your queries, using Visual Explain to see what indexes the system suggests are needed. At minimum, you should have these indexes:
Customer (customerCode)
Customer (location, customerCode)
Sales (customerCode)
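Expressed as DDL, those suggestions would look something like this (the index names are made up, and I'm assuming the location column is the question's customerLocation):
create index customers_code_idx on Customers (customerCode);
create index customers_loc_code_idx on Customers (customerLocation, customerCode);
create index sales_code_idx on Sales (customerCode);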
I suspect that some Encoded Vector Indexes (EVIs) over various columns in Sales and Customer would prove helpful, especially since you mention "counting": an EVI keeps track of the counts of its symbols, so counting is "free". An example:
create encoded vector index customerLocEvi
on Customers (location);
-- this doesn't have to read any rows in customer
select count(*)
from customer
where location = 'LOS ANGELES';
"Surely I am doing something wrong and/or there must be a better way to achieve the result: a pivot built on live data instead of one built from data extracted nightly."
Don't be too sure about that. The DB structure that best supports Business Intelligence type queries usually doesn't match the typical transactional data structure. A periodic "extract, transform, load (ETL)" is pretty typical.
For your particular use case, you could turn CustomerSales into a Materialized Query Table (MQT), build some supporting indexes for it, and just run queries directly over it. The nightly rebuild would be as simple as REFRESH TABLE CustomerSales;
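A sketch of what that could look like (my assumption of the syntax, with the existing CustomerSales view dropped first):
drop view CustomerSales;
create table CustomerSales as
    (select * from Customers join PivotSales using (customerCode))
    data initially deferred
    refresh deferred
    maintained by user;
refresh table CustomerSales;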
Or, if you wanted to, since Db2 for IBM i doesn't support SYSTEM MAINTAINED MQTs, a trigger over Sales could automatically propagate data to CustomerSales instead of rebuilding it nightly.
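For illustration only, such a trigger might start out like the sketch below (my assumption, not the answerer's code; it only covers inserts of product '01', and a real version would handle all four products, updates, deletes, and customers with no pivot row yet):
create trigger sales_insert_propagate
after insert on Sales
referencing new as n
for each row
when (n.productID = '01')
update CustomerSales as cs
set productID1     = n.productID,
    agentID1       = n.agentID,
    saleID1        = n.saleID,
    dateBeginSale1 = n.dateBeginSale,
    dateEndSale1   = n.dateEndSale
where cs.customerCode = n.customerCode;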

Order picking in warehouse

In implementing the warehouse management system for an ecommerce store, I'm trying to create a picking list for warehouse workers, who will walk around a warehouse picking products in orders from different shelves.
One type of product can be on different shelves, and on each shelf there can be many of the same type of product.
If there are many of the same product in one order, sometimes the picker has to pick from multiple shelves to get all the items in an order.
To further make things trickier, sometimes the product will run out of stock as well.
My data model looks like this (simplified):
CREATE TABLE order_product (
id SERIAL PRIMARY KEY,
product_id integer,
order_id text
);
INSERT INTO "public"."order_product"("id","product_id","order_id")
VALUES
(1,1,'order1'),
(2,1,'order1'),
(3,1,'order1'),
(4,2,'order1'),
(5,2,'order2'),
(6,2,'order2');
CREATE TABLE warehouse_placement (
id SERIAL PRIMARY KEY,
product_id integer,
shelf text,
quantity integer
);
INSERT INTO "public"."warehouse_placement"("id","product_id","shelf","quantity")
VALUES
(1,1,E'A',2),
(2,2,E'B',2),
(3,1,E'C',2);
Is it possible, in postgres, to generate a picking list of instructions like the following:
order_id   product_id   shelf   quantity_left_on_shelf
order1     1            A       1
order1     1            A       0
order1     2            B       1
order1     1            C       1
order2     2            B       0
order2     2            NONE    null
I currently do this in the application code, but that feels quite clunky, and somehow I feel like there should be a way to do this directly in SQL.
Thanks for any help!
Here we go:
WITH product_on_shelf AS (
    SELECT warehouse_placement.*,
           generate_series(1, quantity) AS order_on_shelf,
           quantity - generate_series(1, quantity) AS quantity_left_on_shelf
    FROM warehouse_placement
)
, product_on_shelf_with_product_order AS (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY product_id
               ORDER BY quantity, shelf, order_on_shelf
           ) AS order_among_product
    FROM product_on_shelf
)
, order_product_with_order_among_product AS (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY product_id
               ORDER BY id
           ) AS order_among_product
    FROM order_product
)
SELECT order_product_with_order_among_product.id,
       order_product_with_order_among_product.order_id,
       order_product_with_order_among_product.product_id,
       product_on_shelf_with_product_order.shelf,
       product_on_shelf_with_product_order.quantity_left_on_shelf
FROM order_product_with_order_among_product
LEFT JOIN product_on_shelf_with_product_order
    ON  order_product_with_order_among_product.product_id = product_on_shelf_with_product_order.product_id
    AND order_product_with_order_among_product.order_among_product = product_on_shelf_with_product_order.order_among_product
ORDER BY order_product_with_order_among_product.id;
Here's the idea:
We create a CTE product_on_shelf, which is the same as warehouse_placement except that each row is duplicated n times, n being the quantity of the product on the shelf.
We assign a number order_among_product to each row in product_on_shelf, so that each item on a shelf knows its order among the same product.
We assign a symmetric number order_among_product to each row in order_product.
For each row in order_product, we try to find the product on shelf with the same order_among_product. If we can't find any, it means we've run out of that product on every shelf.
Side note #1: Picking products off shelves is a concurrent action. You should make sure, either on the application side or on the DB side via appropriate locks, that any product on a shelf can be attributed to one single order; see the lock sketch below. Treating each row of order_product on the application side might be the best option to deal with concurrency.
Side note #2: I've written this query using CTEs for clarity. To boost performance, consider using subqueries instead. Make sure to run EXPLAIN ANALYZE to compare the plans.
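Regarding side note #1, one possible locking approach in Postgres (my sketch, not part of the original answer; SKIP LOCKED requires Postgres 9.5+) is to claim a shelf row before picking from it:
BEGIN;
-- claim one shelf row holding product 1, skipping rows other pickers have locked
SELECT id, shelf, quantity
FROM warehouse_placement
WHERE product_id = 1
  AND quantity > 0
ORDER BY shelf
LIMIT 1
FOR UPDATE SKIP LOCKED;
-- then decrement the claimed row's quantity and record the pick, e.g.:
UPDATE warehouse_placement SET quantity = quantity - 1 WHERE id = 1;  -- id returned above
COMMIT;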

Stored procedure help for taking IDs and getting result from another table

In a table (store100) I have a list of store IDs for stores that make more than 100 sales a day. In another table I have sales. Now, for every store ID in table store100, I want to see how many of product x they sold according to the sales table. How do I achieve this? Obviously I don't want to be manually entering the store IDs all the time, so I want it to take all the IDs in the table and compare them against sales of x in the sales table.
Table Structure:
store100 table:
ID
lon1
lon2
glas4
edi5
etc
Sales Table:
ID   | Location | Product | Quantity | Total Price
lon1 | London   | Wallet  | 5        | 50
edi5 | Manc     | Shoes   | 4        | 100
So, for example, I want a query that takes all the store100 IDs and shows how many wallets each of them sold.
If anyone has a better idea of achieving this, please tell me.
You will need a join for this:
SELECT S100.ID
,S.Product
,S.Quantity
FROM Store100 S100
INNER JOIN Sales S
ON (S100.ID = S.ID)
Of course, you can still add a WHERE clause if you need one, and modify the SELECT to fit your needs.
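For the specific wallet example from the question, a possible refinement (my sketch, assuming the columns shown above) filters on the product and aggregates the quantity:
SELECT S100.ID
      ,SUM(S.Quantity) AS WalletsSold
FROM Store100 S100
INNER JOIN Sales S
    ON (S100.ID = S.ID)
WHERE S.Product = 'Wallet'
GROUP BY S100.ID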

Concurrency scenarios with INSERTs

I'm designing a booking system in PHP + PostgreSQL.
I'm not able to find a clean solution to a concurrency problem based on INSERT operations.
The DB system is mainly made of these tables:
CREATE TABLE booking (
    booking_id INT,
    user_id INT,
    state SMALLINT,
    nb_coupons INT
);
CREATE TABLE booking_state_history (
    booking_state_history_id INT,
    timestamp TIMESTAMP,
    booking_id INT,
    state SMALLINT
);
CREATE TABLE coupon_purchase (
    coupon_purchase_id INT,
    user_id INT,
    nb INT,
    value MONEY
);
CREATE TABLE coupon_refund (
    coupon_refund_id INT,
    user_id INT,
    nb INT,
    value MONEY
);
CREATE TABLE booking_payment (
    booking_payment_id INT,
    user_id INT,
    booking_id INT,
    nb INT,
    value MONEY
);
A booking must be paid with coupons that have been previously purchased by the user. Some coupons may have been refunded. All these operations are stored in the two corresponding tables to keep a history and to be able to compute the coupon balance.
Constraint: the coupon balance cannot be negative at any time.
A booking is finalized when it is paid with coupons.
Then the following operations happen:
BEGIN;
(1) Check there are enough coupons remaining to pay the booking. (SELECT)
(2) Decide which coupons (number and value) will be used to pay the booking
(mainly, higher cost coupon used first. But that is not the issue here.)
(3) Add records to booking_payment (INSERTs)
(4) Move the booking to state="PAID" (integer value representing "PAID") (UPDATE)
(5) Add a record to booking_state_history (INSERT)
COMMIT;
These operations need to be atomic to preserve the coherency of the DB information.
Hence the use of a transaction, which allows a COMMIT or ROLLBACK in case of failure, a DB exception, a PHP exception, or any other issue in the middle of the operations.
Scenario 1
Since I'm in a concurrent access environment (a web site), nothing prevents the user from, for instance, asking for a coupon refund while doing a booking payment at the same time.
Scenario 2
He can also trigger two concurrent booking payments at the same time in two different transactions.
So the following can happen:
Scenario 1
After (1) is done, a coupon refund is triggered by the user, and the resulting coupon balance is no longer enough to pay the booking.
When the payment transaction COMMITs, the balance becomes negative.
Note:
Even if I re-check the coupon balance in a new step (6), the coupon refund can still happen between (6) and COMMIT.
Scenario 2
Two concurrent booking payment transactions whose combined coupon usage is too much for the global balance to stay positive; only one of them should be allowed.
Transactions 1 and 2 both check the balance in step (1) and each sees enough coupons for its own payment.
They both proceed with their operations and COMMIT. The new balance is negative, violating the constraint.
Note:
Even if I re-check the coupon balance in a new step (6), neither transaction can see the operations not yet committed by the other one.
So they blindly proceed to COMMIT.
I guess this is a common concurrency case, but I cannot find a pattern to solve it on the internet.
I thought of re-checking the balance after the COMMIT so I could manually undo all the operations, but that is not totally safe: if an exception happens after the COMMIT, the undo won't be done.
Any idea how to solve this concurrency problem?
Thanks.
Your problem boils down to the question of what should be the synchronization lock. From your question it seems that the booking is not a booking of a specific item. But let's assume a user is booking a specific hotel room, so you need to solve two problems:
prevent overbooking (e.g. booking the same thing for two people)
prevent parallel account state miscalculation
So when a user gets to the point where he/she is about to hit the confirm button, this is a possible scenario you can implement:
1. begin transaction
2. lock the user entry so that parallel processes are blocked:
SELECT * FROM user WHERE id = :id FOR UPDATE
3. re-check the account balance and throw an exception / roll back if there are insufficient funds
4. lock the item to be booked to prevent overbooking:
SELECT * FROM room WHERE id = :id FOR UPDATE
5. re-check booking availability and throw an exception / roll back if the item is already booked
6. create the booking entry and subtract the funds from the user's account
7. commit transaction (all locks will be released)
If, in your case, you don't need to check for overbooking, just skip / ignore steps 4 and 5.
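A minimal Postgres-flavored transcript of that recipe (my sketch; the users/rooms table names and the literal values are hypothetical, while the booking columns follow the question's schema):
BEGIN;
-- step 2: serialize all operations for this user
SELECT * FROM users WHERE user_id = 42 FOR UPDATE;
-- step 3: re-check the coupon balance here; ROLLBACK if insufficient
-- step 4: lock the item to prevent overbooking
SELECT * FROM rooms WHERE room_id = 7 FOR UPDATE;
-- step 5: re-check availability here; ROLLBACK if already booked
-- step 6: record the booking (the state value is illustrative)
INSERT INTO booking (booking_id, user_id, state, nb_coupons) VALUES (1, 42, 2, 3);
-- step 7: release all locks
COMMIT;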
Below is the solution I've implemented.
Note: I only treat the coupon transfer part below, but the same applies to the booking state change and booking_state_history.
The main idea is to preserve this part of the processing as a critical section.
When an INSERT into booking_payment, coupon_purchase or coupon_refund is to be done, I prevent other transactions from doing the same by taking a lock on a dedicated table through an UPDATE for the given user_id.
This way, only transactions impacting the same user_id for the same kind of treatment are blocked.
Initialization
DROP TABLE coupon_purchase;
DROP TABLE coupon_refund;
DROP TABLE booking_payment;
DROP TABLE lock_coupon_transaction;
CREATE TABLE coupon_purchase(
coupon_purchase_id SERIAL PRIMARY KEY,
user_id INT,
nb INT);
CREATE TABLE coupon_refund(
coupon_refund_id SERIAL PRIMARY KEY,
user_id INT,
nb INT);
CREATE TABLE booking_payment(
booking_payment_id SERIAL PRIMARY KEY,
user_id INT,
booking_id INT,
nb INT);
CREATE TABLE lock_coupon_transaction (
user_id INT,
timestamp TIMESTAMP);
INSERT INTO coupon_purchase
(user_id, nb) VALUES
(1, 1),
(1, 5);
INSERT INTO coupon_refund
(user_id, nb) VALUES
(1, 3);
INSERT INTO lock_coupon_transaction
(user_id, timestamp) VALUES
(1, current_timestamp);
Transaction 1
BEGIN;
UPDATE lock_coupon_transaction SET timestamp=current_timestamp WHERE user_id='1';
WITH coupon_balance AS (
SELECT
t1.nb_purchased_coupons -
t2.nb_refunded_coupons -
t3.nb_booking_payment_coupons AS total
FROM
(SELECT COALESCE(SUM(nb),0) AS nb_purchased_coupons FROM coupon_purchase WHERE user_id='1' ) t1,
(SELECT COALESCE(SUM(nb),0) AS nb_refunded_coupons FROM coupon_refund WHERE user_id='1' ) t2,
(SELECT COALESCE(SUM(nb),0) AS nb_booking_payment_coupons FROM booking_payment WHERE user_id='1' ) t3
)
INSERT INTO booking_payment
(user_id, booking_id, nb)
SELECT 1::INT, 1::INT, 3::INT
FROM coupon_balance
WHERE (total::INT >= 3::INT);
INSERT 0 1
Transaction 2
BEGIN;
UPDATE lock_coupon_transaction SET timestamp=current_timestamp WHERE user_id='1';
-- Transaction 2 is blocked, waiting for a COMMIT or ROLLBACK from transaction 1.
Transaction 1
COMMIT;
COMMIT
Transaction 2
-- Transaction 1's lock has been released, so transaction 2 can go on.
WITH coupon_balance AS (
SELECT
t1.nb_purchased_coupons -
t2.nb_refunded_coupons -
t3.nb_booking_payment_coupons AS total
FROM
(SELECT COALESCE(SUM(nb),0) AS nb_purchased_coupons FROM coupon_purchase WHERE user_id='1' ) t1,
(SELECT COALESCE(SUM(nb),0) AS nb_refunded_coupons FROM coupon_refund WHERE user_id='1' ) t2,
(SELECT COALESCE(SUM(nb),0) AS nb_booking_payment_coupons FROM booking_payment WHERE user_id='1' ) t3
)
INSERT INTO coupon_refund
(user_id, nb)
SELECT 1::INT, 3::INT
FROM coupon_balance
WHERE (total::INT >= 3::INT);
INSERT 0 0
COMMIT;
COMMIT
The INSERT couldn't be done since there was not enough money in the account. This is the expected behavior.
The previous transaction had been committed when the second one proceeded, so transaction 2 could see all the changes made by transaction 1.
This way there is no risk of concurrent access corrupting the coupon handling.

Creating a many to many in postgresql

I have two tables that I need to make a many-to-many relationship with. The one table, which we will call inventory, is populated via a form. The other table, sales, is populated by importing CSVs into the database weekly.
[Image: example inventory and sales tables]
I want to step through the sales table and associate each sales row with a row with the same sku in the inventory table. Here's the kicker: I need to associate only the number of sales rows indicated in the Quantity field of each inventory row.
[Image: example of the linked tables]
Now I know I can do this by creating a perl script that steps through the sales table and creates links using the ItemIDUniqueKey field in a loop based on the Quantity field. What I want to know is: is there a way to do this using SQL commands alone? I've read a lot about many-to-many relationships and I've not found anyone doing this.
Assuming tables:
create table a(
item_id integer,
quantity integer,
supplier_id text,
sku text
);
and
create table b(
sku text,
sale_number integer,
item_id integer
);
the following query seems to do what you want: for each sku it computes a running total of inventory quantity (ordered by item_id), counts how many earlier sales of that sku exist, and assigns each sale to the first inventory row whose running total exceeds that count:
update b b_updated
set item_id = (
    select item_id
    from (
        select *, sum(quantity) over (partition by sku order by item_id) as sum
        from a
    ) a
    where a.sku = b_updated.sku
      and a.sum > (
          select count(1)
          from b b_counted
          where b_counted.sale_number < b_updated.sale_number
            and b_counted.sku = b_updated.sku
      )
    order by a.sum asc
    limit 1
);
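To sanity-check the assignment afterwards, a hypothetical verification query (not part of the original answer) could list each sale next to its assigned inventory row; unmatched sales keep a null item_id:
select b.sku, b.sale_number, b.item_id, a.quantity
from b
left join a using (item_id)
order by b.sku, b.sale_number;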