Creating a many to many in postgresql

I have two tables that I need to link in a many-to-many relationship. One table, which we will call inventory, is populated via a form. The other table, sales, is populated by importing CSVs into the database weekly.
[Image: example tables]
I want to step through the sales table and associate each sale row with a row with the same sku in the inventory table. Here's the kicker: I need to associate only the number of sales rows indicated in the Quantity field of each inventory row.
Example: [image: example of linked tables]
Now I know I can do this by creating a Perl script that steps through the sales table and creates links using the ItemIDUniqueKey field, looping based on the Quantity field. What I want to know is: is there a way to do this using SQL commands alone? I've read a lot about many-to-many relationships and I've not found anyone doing this.

Assuming tables:
create table a(
item_id integer,
quantity integer,
supplier_id text,
sku text
);
and
create table b(
sku text,
sale_number integer,
item_id integer
);
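The following query seems to do what you want:
update b b_updated set item_id = (
    select item_id
    -- running total of quantity per sku, in item_id order
    from (select *, sum(quantity) over (partition by sku order by item_id) as sum from a) a
    where
        a.sku = b_updated.sku and
        -- keep inventory rows whose running total exceeds the number of earlier sales of this sku
        a.sum > (
            select count(1) from b b_counted
            where
                b_counted.sale_number < b_updated.sale_number and
                b_counted.sku = b_updated.sku
        )
    -- the first such row is the one this sale belongs to
    order by a.sum asc limit 1
);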
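To make the running-total logic easier to follow, here is a minimal Python sketch of the same allocation. It is an illustration only, not part of the original answer: the tuple layouts and the function name assign_sales are assumptions, and it presumes unique sale numbers and non-negative quantities.

```python
from bisect import bisect_right
from collections import defaultdict
from itertools import accumulate


def assign_sales(inventory, sales):
    """Map each sale_number to an item_id, or None when stock is exhausted.

    inventory: iterable of (item_id, sku, quantity) tuples (hypothetical layout)
    sales:     iterable of (sale_number, sku) tuples (hypothetical layout)
    """
    # Group inventory rows per sku in item_id order, mirroring
    # "partition by sku order by item_id".
    items_by_sku = defaultdict(list)
    for item_id, sku, quantity in sorted(inventory):
        items_by_sku[sku].append((item_id, quantity))

    # Per sku: the item ids and their running quantity totals.
    running = {
        sku: ([item_id for item_id, _ in rows],
              list(accumulate(q for _, q in rows)))
        for sku, rows in items_by_sku.items()
    }

    assigned = {}
    earlier_sales = defaultdict(int)  # sales of the same sku seen so far
    for sale_number, sku in sorted(sales):
        item_ids, totals = running.get(sku, ([], []))
        # Index of the first running total that exceeds the earlier-sales count.
        pos = bisect_right(totals, earlier_sales[sku])
        assigned[sale_number] = item_ids[pos] if pos < len(item_ids) else None
        earlier_sales[sku] += 1
    return assigned


inventory = [(1, "A", 2), (2, "A", 1), (3, "B", 1)]
sales = [(10, "A"), (11, "A"), (12, "A"), (13, "A"), (14, "B")]
result = assign_sales(inventory, sales)
```

With this sample data the first two sales of sku A go to item 1, the third to item 2, and the fourth gets None because the stock for A is used up.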

Related

Postgres query filter by non column in table

I have a challenge that consists of filtering a query not by a value stored in a table column but by a value returned by a function.
Consider a table that contains all sales in the database:
id, description, category, price, col1 , ..... col n
I have a function that returns the sales similar to a given one (based on rules and business logic). This function runs another query against all records in the sales table and matches on some fields.
similar_sales (sale_id integer) -> returns an integer[]
Now I need to list all similar sales for each row in the sales table.
select s.id, similar_sales (s.id)
from sales s
But similar_sales can return null, and I am only interested in sales that have at least one similar sale.
select id, similar
from (
select s.id, similar_sales (s.id) as similar
from sales s
) q
where #similar > 1 (Pseudocode)
limit x
I can't apply the limit in the subquery because I don't know in advance which sales have similar ones.
I want to run the function over a small set of rows rather than the entire table, to improve query performance (pagination strategy).
You can try this:
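select s.id, similar
from sales s
cross join lateral similar_sales (s.id) as similar
-- isempty() applies to range types only; for an integer[] use cardinality() (PostgreSQL 9.4+)
where cardinality(similar) > 0
limit x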
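The intent behind combining a lateral call with LIMIT is to evaluate the function row by row and stop once enough rows qualify. Whether PostgreSQL actually stops early depends on the plan it chooses, so treat the following Python sketch as an illustration of the idea only. The names sales_with_similar, first_page and fake_similar_sales are hypothetical stand-ins.

```python
from itertools import islice


def sales_with_similar(sale_ids, similar_sales):
    """Yield (sale_id, similar) lazily, skipping sales with no similar ones."""
    for sale_id in sale_ids:
        similar = similar_sales(sale_id)
        if similar:  # drops both None and an empty list
            yield sale_id, similar


def first_page(sale_ids, similar_sales, limit):
    """Take only the first `limit` qualifying rows, like LIMIT x."""
    return list(islice(sales_with_similar(sale_ids, similar_sales), limit))


# Hypothetical stand-in for the similar_sales() database function.
calls = []
fake = {1: [], 2: [5], 3: None, 4: [7, 8], 5: [9]}


def fake_similar_sales(sale_id):
    calls.append(sale_id)
    return fake[sale_id]


page = first_page([1, 2, 3, 4, 5], fake_similar_sales, limit=2)
```

Here the function is called for sales 1 to 4 only; sale 5 is never evaluated because the page is already full.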

using 1 WITH statement with multiple (n) INSERTs

I wanted to know if it is possible to create one WITH statement and insert multiple rows into another table with a select, or do something equivalent.
I have 2 tables.
One holds the item data:
create table bg_item(
item_id text primary key default 'a'||nextval('bg_item_seq'),
sellerid bigint not null references bg_user(userid),
item_type char(1)not null default 'N', --NORMAL (public)
item_upload_date date NOT NULL default current_date,
item_name varchar(30) not null,
item_desc text not null,
The other holds the image links:
create table item_images(
img_id bigint primary key default nextval('bg_item_image_seq'),
item_id text not null references bg_item (item_id),
image_link text not null
);
The user can add an item to sell and upload images of it; there can be 3 or more images. When the item's description and images are submitted from the app, the request goes to my backend. There I want a query that inserts the user's item, returns the item's id (generated from a sequence in PostgreSQL), and uses that id to reference the images I am inserting.
Currently I am doing this (for 1 image):
WITH ins1 AS (
INSERT INTO bg_item(sellerid,item_type,item_date,item_name,item_desc,item_specs,item_category)
VALUES (1005, 'k',default,'asdf','asdf','asd','asd')
RETURNING item_id
)
INSERT INTO item_images (item_id, image_link)
select item_id,'asdfg.asd.asdf.com' from ins1
(for 3 images)
WITH ins1 AS (
INSERT INTO bg_item(sellerid,item_type,item_date,item_name,item_desc,item_specs,item_category)
VALUES (1005, 'k',default,'asdf','asdf','asd','asd')
RETURNING item_id
)
INSERT INTO item_images (item_id, image_link)
select item_id,'asdfg.asd.asdf.com' from ins1
union all
select item_id,'asdfg.asdaws3f.com' from ins1
union all
select item_id,'asdfg.gooolefnsd.sfsjf.com' from ins1
This would work for 3 images.
So my question is: how do I do this for n images (the user can upload from 1 to n images)?
Can I write a for loop, or a procedure or function?
References:
With and Insert
Sql multiple insert select
I didn't understand the Edit 3 (if it is related to my answer) in the above one.
One solution I can think of is to write one procedure that returns the item_id and another that runs the multiple inserts, but I want a more efficient solution.
If you are going to work with SQL then there is a concept you need to expel from your thoughts: LOOP. As soon as you think it, it is time to rethink. It does not exist in SQL and is not typically needed. SQL works on sets of qualifying things, not individual things.
Now to your issue: it can be done in 1 statement. You pass your image list as an array of text in the WITH clause, then unnest that array and join to your existing CTE during the INSERT/SELECT:
with images (ilist) as
(
select array['image1','image2','image3','image4','image5']
)
, item (item_id) as
(
insert into bg_item(sellerid,item_type,item_date,item_name,item_desc,item_specs,item_category)
values (1005, 'k',default,'asdf','asdf','asd','asd')
returning item_id
)
insert into item_images (item_id, image_link)
select item_id,unnest (ilist)
from images
join item on true;
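To see what the final SELECT produces, here is a tiny Python sketch of the same set-based pairing: the single returned item_id is combined with every element of the image list, whatever its length. The function name image_rows and the sample id are made up for illustration.

```python
def image_rows(item_id, image_links):
    """Pair one item_id with every link, like "join ... on true" plus unnest()."""
    return [(item_id, link) for link in image_links]


# "a42" is a hypothetical id standing in for the value returned by the insert.
rows = image_rows("a42", ["image1", "image2", "image3"])
```

The number of rows follows the length of the list, which is why no loop is needed in the SQL statement.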

Order picking in warehouse

In implementing the warehouse management system for an ecommerce store, I'm trying to create a picking list for warehouse workers, who will walk around a warehouse picking products in orders from different shelves.
One type of product can be on different shelves, and on each shelf there can be many of the same type of product.
If there are many of the same product in one order, sometimes the picker has to pick from multiple shelves to get all the items in an order.
To further make things trickier, sometimes the product will run out of stock as well.
My data model looks like this (simplified):
CREATE TABLE order_product (
id SERIAL PRIMARY KEY,
product_id integer,
order_id text
);
INSERT INTO "public"."order_product"("id","product_id","order_id")
VALUES
(1,1,'order1'),
(2,1,'order1'),
(3,1,'order1'),
(4,2,'order1'),
(5,2,'order2'),
(6,2,'order2');
CREATE TABLE warehouse_placement (
id SERIAL PRIMARY KEY,
product_id integer,
shelf text,
quantity integer
);
INSERT INTO "public"."warehouse_placement"("id","product_id","shelf","quantity")
VALUES
(1,1,E'A',2),
(2,2,E'B',2),
(3,1,E'C',2);
Is it possible, in postgres, to generate a picking list of instructions like the following:
order_id  product_id  shelf  quantity_left_on_shelf
order1    1           A      1
order1    1           A      0
order1    2           B      1
order1    1           C      1
order2    2           B      0
order2    2           NONE   null
I currently do this in the application code, but that feels quite clunky, and I feel like there should be a way to do this directly in SQL.
Thanks for any help!
Here we go:
WITH product_on_shelf AS (
SELECT warehouse_placement.*,
generate_series(1, quantity) AS order_on_shelf,
quantity - generate_series(1, quantity) AS quantity_left_on_shelf
FROM warehouse_placement
)
, product_on_shelf_with_product_order AS (
SELECT *,
ROW_NUMBER() OVER (
PARTITION BY product_id
ORDER BY quantity, shelf, order_on_shelf
) AS order_among_product
FROM product_on_shelf
)
, order_product_with_order_among_product AS (
SELECT *,
ROW_NUMBER() OVER (
PARTITION BY product_id
ORDER BY id
) AS order_among_product
FROM order_product
)
SELECT order_product_with_order_among_product.id,
order_product_with_order_among_product.order_id,
order_product_with_order_among_product.product_id,
product_on_shelf_with_product_order.shelf,
product_on_shelf_with_product_order.quantity_left_on_shelf
FROM order_product_with_order_among_product
LEFT JOIN product_on_shelf_with_product_order
ON order_product_with_order_among_product.product_id = product_on_shelf_with_product_order.product_id
AND order_product_with_order_among_product.order_among_product = product_on_shelf_with_product_order.order_among_product
ORDER BY order_product_with_order_among_product.id
;
Here's the idea:
We create a CTE product_on_shelf, which is the same as warehouse_placement, except that each row is duplicated n times, n being the quantity of the product on the shelf.
We assign a number order_among_product to each row in product_on_shelf, so that each unit on a shelf knows its position among the units of the same product.
We assign a matching number order_among_product to each row in order_product.
For each row in order_product, we try to find the unit on a shelf with the same order_among_product. If we can't find any, it means we've exhausted the product on every shelf.
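The steps above can be sketched in Python, using the sample rows from the question. This is only an illustration of the expand-and-match idea, not a replacement for the query; the function name picking_list and the tuple layouts are assumptions.

```python
from collections import defaultdict


def picking_list(order_products, placements):
    """Match the n-th ordered unit of a product to the n-th unit on a shelf.

    order_products: (id, product_id, order_id) tuples
    placements:     (id, product_id, shelf, quantity) tuples
    """
    # Expand each placement into one entry per unit, ordered by
    # (quantity, shelf, position on shelf) as in the query.
    units = defaultdict(list)
    for _id, product_id, shelf, quantity in sorted(
        placements, key=lambda p: (p[3], p[2])
    ):
        for picked in range(1, quantity + 1):
            units[product_id].append((shelf, quantity - picked))

    taken = defaultdict(int)  # units already assigned per product
    result = []
    for _id, product_id, order_id in sorted(order_products):
        n = taken[product_id]
        taken[product_id] += 1
        if n < len(units[product_id]):
            shelf, left = units[product_id][n]
        else:
            shelf, left = None, None  # product is out of stock
        result.append((order_id, product_id, shelf, left))
    return result


orders = [(1, 1, "order1"), (2, 1, "order1"), (3, 1, "order1"),
          (4, 2, "order1"), (5, 2, "order2"), (6, 2, "order2")]
shelves = [(1, 1, "A", 2), (2, 2, "B", 2), (3, 1, "C", 2)]
picks = picking_list(orders, shelves)
```

The result is ordered by order_product id, like the query, so the third unit of product 1 (from shelf C) appears before the first unit of product 2.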
Side note #1: Picking products off shelves is a concurrent action. You should make sure, either on the application side or on the DB side via smart locks, that any product on a shelf can be attributed to only one order. Processing each row of order_product on the application side might be the best option for dealing with concurrency.
Side note #2: I've written this query using CTEs for clarity. To boost performance, consider using subqueries instead. Make sure to run EXPLAIN ANALYZE.

Summarize repeated data in a Postgres table

I have a Postgres 9.1 table called ngram_sightings. Each row is a record of seeing an ngram in a document. An ngram can appear multiple times in a given document.
CREATE TABLE ngram_sightings
(
ngram VARCHAR,
doc_id INTEGER
);
I want to summarize this table in another table called ngram_counts.
CREATE TABLE ngram_counts
(
ngram VARCHAR PRIMARY KEY,
-- the number of unique doc_ids for a given ngram
doc_count INTEGER,
-- the count of a given ngram in ngram_sightings
corpus_count INTEGER
);
What is the best way to do this?
ngram_sightings is ~1 billion rows.
Should I create an index on ngram_sightings.ngram first?
Give this a shot!
INSERT INTO ngram_counts (ngram, doc_count, corpus_count)
SELECT
    ngram
    , count(distinct doc_id) AS doc_count
    , count(*) AS corpus_count
FROM ngram_sightings -- read from the source table, not the summary table being filled
GROUP BY ngram;
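As a quick way to check the shape of this aggregation on a few rows, here is a sketch that runs the same INSERT ... SELECT against an in-memory SQLite database from Python. SQLite is only a stand-in for Postgres here, the sample rows are made up, and nothing in it says anything about performance on a billion rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ngram_sightings (ngram TEXT, doc_id INTEGER);
CREATE TABLE ngram_counts (
    ngram TEXT PRIMARY KEY,
    doc_count INTEGER,
    corpus_count INTEGER
);
""")

# Made-up sample data: "the cat" is seen twice in doc 1 and once in doc 2.
conn.executemany(
    "INSERT INTO ngram_sightings (ngram, doc_id) VALUES (?, ?)",
    [("the cat", 1), ("the cat", 1), ("the cat", 2), ("a dog", 3)],
)

conn.execute("""
INSERT INTO ngram_counts (ngram, doc_count, corpus_count)
SELECT ngram, count(DISTINCT doc_id), count(*)
FROM ngram_sightings
GROUP BY ngram
""")

rows = conn.execute(
    "SELECT ngram, doc_count, corpus_count FROM ngram_counts ORDER BY ngram"
).fetchall()
```

For "the cat" the distinct document count is 2 while the corpus count is 3, which is the difference between the two columns.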
-- EDIT --
Here is a longer version using some temporary tables. First, count how many documents each ngram is associated with. I'm using 'tf' for "term frequency" and 'df' for "doc frequency": you are heading in the direction of tf-idf vectorization, so you may as well use the standard language, and it will help with the next few steps.
CREATE TEMPORARY TABLE ngram_df AS
SELECT
    ngram
    , count(distinct doc_id) AS df
FROM ngram_sightings
GROUP BY ngram;
Now you can create a table for the total count of each ngram.
CREATE TEMPORARY TABLE ngram_tf AS
SELECT
    ngram
    , count(*) AS tf
FROM ngram_sightings
GROUP BY ngram;
Then join the two on ngram.
CREATE TABLE ngram_tfidf AS
SELECT
    tf.ngram
    , tf.tf
    , df.df
FROM ngram_tf tf
INNER JOIN ngram_df df ON tf.ngram = df.ngram;
At this point, I expect you will be looking up ngram quite a bit, so it makes sense to index the last table on ngram. Keep me posted!

Finding duplicates between two tables

I've got two SQL Server 2008 tables: an "Import" table containing new data and a "Destination" table with the live data. Both tables are similar but not identical (there are more columns in the Destination table, updated by a CRM system), but both have three "phone number" fields: Tel1, Tel2 and Tel3. I need to remove all records from the Import table where any of the phone numbers already exists in the Destination table.
I've tried knocking together a simple query (just a SELECT to test with just now):
select t2.account_id
from ImportData t2, Destination t1
where
(t2.Tel1!='' AND (t2.Tel1 IN (t1.Tel1,t1.Tel2,t1.Tel3)))
or
(t2.Tel2!='' AND (t2.Tel2 IN (t1.Tel1,t1.Tel2,t1.Tel3)))
or
(t2.Tel3!='' AND (t2.Tel3 IN (t1.Tel1,t1.Tel2,t1.Tel3)))
... but I'm aware this is almost certainly Not The Way To Do Things, especially as it's very slow. Can anyone point me in the right direction?
This query requires a little more information than this. To write it efficiently we need to know whether each load contains mostly duplicates or mostly new records. I assume that account_id is the primary key and has a clustered index.
I would use the temporary table approach, that is, create a normalized table #tmp holding one row per phone number and account_id, like
SELECT Phone, account_id INTO #tmp
FROM
    (SELECT account_id, Tel1, Tel2, Tel3
     FROM Destination) p
UNPIVOT
    -- TelColumn receives the source column name (Tel1, Tel2 or Tel3)
    (Phone FOR TelColumn IN
        (Tel1, Tel2, Tel3)
    ) AS unpvt;
Then create a nonclustered index on this table, with the phone number as the first column and the account number as the second. You can't escape one full table scan, so I assume you can scan the import table (probably the smaller one). Then just join with this table and use the NOT EXISTS qualifier as explained. Then of course drop the table after the processing.
luke
I am not sure about the performance of this query, but since I made the effort of writing it I will post it anyway...
;with aaa(tel)
as
(
select Tel1
from Destination
union
select Tel2
from Destination
union
select Tel3
from Destination
)
,bbb(tel, id)
as
(
select Tel1, account_id
from ImportData
union
select Tel2, account_id
from ImportData
union
select Tel3, account_id
from ImportData
)
select distinct b.id
from bbb b
where b.tel in
(
select a.tel
from aaa a
intersect
select b2.tel
from bbb b2
)
EXISTS will short-circuit the query instead of doing a full traversal of the table as a join would. You could refactor the WHERE clause as well, if this still doesn't perform the way you want.
SELECT *
FROM ImportData t2
WHERE NOT EXISTS (
select 1
from Destination t1
where (t2.Tel1!='' AND (t2.Tel1 IN (t1.Tel1,t1.Tel2,t1.Tel3)))
or
(t2.Tel2!='' AND (t2.Tel2 IN (t1.Tel1,t1.Tel2,t1.Tel3)))
or
(t2.Tel3!='' AND (t2.Tel3 IN (t1.Tel1,t1.Tel2,t1.Tel3)))
)
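The answers above share one idea: normalize the Destination phone numbers into a single lookup structure, then test each Import row against it. A minimal Python sketch of that idea, assuming rows are dicts keyed by the column names from the question (the function name split_import and the sample numbers are made up):

```python
TEL_COLUMNS = ("Tel1", "Tel2", "Tel3")


def split_import(import_rows, destination_rows):
    """Return (duplicates, new_rows) from import_rows.

    A row is a duplicate when any of its non-empty phone numbers
    already appears in any phone column of destination_rows.
    """
    # One set of all non-empty destination numbers (the "normalized table").
    known = {row[c] for row in destination_rows for c in TEL_COLUMNS if row[c]}

    duplicates, new_rows = [], []
    for row in import_rows:
        phones = [row[c] for c in TEL_COLUMNS if row[c]]  # skip '' and None
        if any(p in known for p in phones):
            duplicates.append(row)
        else:
            new_rows.append(row)
    return duplicates, new_rows


destination = [
    {"account_id": 1, "Tel1": "555-0100", "Tel2": "", "Tel3": "555-0101"},
]
imported = [
    {"account_id": 10, "Tel1": "555-0199", "Tel2": "555-0101", "Tel3": ""},
    {"account_id": 11, "Tel1": "555-0142", "Tel2": "", "Tel3": ""},
    {"account_id": 12, "Tel1": "", "Tel2": "", "Tel3": ""},
]
duplicates, new_rows = split_import(imported, destination)
```

Empty strings are skipped on both sides, so a row with no phone numbers at all is never treated as a duplicate just because Destination also contains blank fields.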