I'm building an ecommerce system with products and variants, each of which has between 1 and 5 images stored on Amazon S3. Is it considered best practice to have a separate images table that stores the S3 URLs, or is it acceptable to just add 5 image columns to each of the products and variants tables? Having a separate images table means that on import I need to do 6 SELECTs and then INSERTs (to make sure the product and each of its images don't already exist, and then to import them) rather than 1. And on retrieval, I need to join the images table to the products table 5 times to return the images with the product, like this:
SELECT prd."id" AS id, prd."title" AS title, prd."description" AS description,
prd."createdAt" AS productcreatedate,
prdPic1."url" AS productpic1,
prdPic2."url" AS productpic2,
prdPic3."url" AS productpic3,
prdPic4."url" AS productpic4,
prdPic5."url" AS productpic5,
brd."name" AS brandname, brd."id" AS brandid,
cat."name" AS categoryname, cat."id" AS categoryid,
prt."name" AS partnername, prt."id" AS partnerid
FROM "Products" prd
LEFT OUTER JOIN "Pictures" prdPic1 ON prdPic1."entityId" = prd."id" AND prdPic1."entity" = '1'
AND prdPic1."sortOrder" = 1
LEFT OUTER JOIN "Pictures" prdPic2 ON prdPic2."entityId" = prd."id" AND prdPic2."entity" = '1'
AND prdPic2."sortOrder" = 2
LEFT OUTER JOIN "Pictures" prdPic3 ON prdPic3."entityId" = prd."id" AND prdPic3."entity" = '1'
AND prdPic3."sortOrder" = 3
LEFT OUTER JOIN "Pictures" prdPic4 ON prdPic4."entityId" = prd."id" AND prdPic4."entity" = '1'
AND prdPic4."sortOrder" = 4
LEFT OUTER JOIN "Pictures" prdPic5 ON prdPic5."entityId" = prd."id" AND prdPic5."entity" = '1'
AND prdPic5."sortOrder" = 5
INNER JOIN "Brands" brd ON brd."id" = prd."BrandId"
INNER JOIN "Categories" cat ON cat."id" = prd."CategoryId"
INNER JOIN "Partners" prt ON prt."id" = brd."PartnerId";
The value of normalizing Brands, Categories, and Partners to reduce redundancy is clear to me; I'm less clear on the value of an images table. EXPLAIN ANALYZE on Postgres says this query takes 3310 ms to return 22000 rows. However, I haven't created indexes on Pictures yet, so that's not a fair analysis.
In this case, I think I would add a single column of type text[] (array of text). That way you have the images right where you need them, and you won't end up in trouble when you later add more images to a product.
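For instance, a minimal sketch (the "imageUrls" column name and the URLs are illustrative, not part of the original schema):

-- Store the S3 URLs directly on the product row, in display order.
ALTER TABLE "Products" ADD COLUMN "imageUrls" text[];

-- Import touches one row per product...
UPDATE "Products"
   SET "imageUrls" = ARRAY['https://s3.example/p42-1.jpg',
                           'https://s3.example/p42-2.jpg']
 WHERE "id" = 42;

-- ...and retrieval needs no joins against a Pictures table.
SELECT prd."id", prd."title", prd."imageUrls"
FROM "Products" prd;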
Goal: Create a query to pull the closest cycle count event (Table C) for a product ID, based on the inventory adjustment results sourced from another table (Table A).
All records from Table A will be used, but they are not guaranteed to have a match in Table C.
The ID column is present in both tables but is not unique in either, so the pair of ID and timestamp together is needed for each table.
Current simplified SQL
SELECT A.WHENOCCURRED,
       A.LPID,
       A.ITEM,
       A.ADJQTY,
       C.WHENOCCURRED,
       C.LPID,
       C.LOCATION,
       C.ITEM,
       C.QUANTITY,
       C.ENTQUANTITY
FROM A
LEFT JOIN C
    ON A.LPID = C.LPID
WHERE A.facility = 'FACID'
  AND A.WHENOCCURRED > '23-DEC-22'
  AND A.ADJREASONABBREV = 'CYCLE COUNTS'
ORDER BY A.WHENOCCURRED DESC;
This currently pulls the first hit on C.WHENOCCURRED for each LPID match. I want to see if there is a simpler JOIN solution before going in a direction that creates 2 temp tables based on WHENOCCURRED.
I have a functioning INDEX(MATCH(MIN())) solution in Excel, but that requires exporting a couple of system reports first and is extremely slow with X,XXX-row tables.
If you are using Oracle 12 or later, you can use a LATERAL join and FETCH FIRST ROW ONLY:
SELECT A.WHENOCCURRED,
       A.LPID,
       A.ITEM,
       A.ADJQTY,
       C.WHENOCCURRED,
       C.LPID,
       C.LOCATION,
       C.ITEM,
       C.QUANTITY,
       C.ENTQUANTITY
FROM A
LEFT OUTER JOIN LATERAL (
    SELECT *
    FROM C
    WHERE A.LPID = C.LPID
      AND A.whenoccurred <= c.whenoccurred
    ORDER BY c.whenoccurred
    FETCH FIRST ROW ONLY
) C
ON (1 = 1) -- The join condition is inside the lateral subquery
WHERE A.facility = 'FACID'
  AND A.WHENOCCURRED > DATE '2022-12-23'
  AND A.ADJREASONABBREV = 'CYCLE COUNTS'
ORDER BY A.WHENOCCURRED DESC;
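Since the lateral subquery is probed once per row of A, filtering on LPID and sorting on WHENOCCURRED, a composite index on those two columns would likely keep each probe to a short index scan (the index name is illustrative):

CREATE INDEX c_lpid_whenoccurred_idx ON C (LPID, WHENOCCURRED);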
I have 3 tables in our ERP database holding all delivery data (table documents holds one row for each delivery note, documentpos holds all positions on delivery notes, documentserialnumbers holds all serial numbers for delivered items).
I want to show all items, with their serial numbers, that have been delivered to the customer and still reside there.
The output of the following query (shown above) reveals, however, that one item that had been delivered was returned later (red marks). The return delivery note has the document number 527419 (dark red mark) and refers to the delivery note 319821 (green), which is listed in yellow.
The correct list would consequently show only the items that are still on the customer's site, without the returned items (see below).
How do I have to change the query in order to exclude the returned items from the output?
The upper table in the image shows the output of my query, the table below shows how it should be.
select a.BelID, c.ReferenzBelID, a.itemnumber, a.itemname, c.deliverynotenumber, c.documenttype, c.documentmark, b.serialnumber
from dbo.documentpos a
inner join dbo.documentserialnumbers b on a.BelPosID = b.BelPosID
inner join dbo.documents c on a.BelID = c.BelID
inner join sysdba.customers d on d.account = c.A0Name1
where d.AccountID = 'customername' and c.documenttype like '%delivery%'
order by a.BelID
You may exclude positions which are referenced by any "return" delivery note, like this:
select a.BelID, c.ReferenzBelID, a.itemnumber, a.itemname, c.deliverynotenumber, c.documenttype, c.documentmark, b.serialnumber
from dbo.documentpos a
inner join dbo.documentserialnumbers b on a.BelPosID = b.BelPosID
inner join dbo.documents c on a.BelID = c.BelID
inner join sysdba.customers d on d.account = c.A0Name1
where d.AccountID = 'customername' and c.documenttype like '%delivery%'
-- skip the return delivery notes themselves (documents marked 'VLR' that reference another delivery)
and not exists (select 1
                from dbo.documents cc
                where cc.documenttype like '%delivery%'
                and c.ReferenzBelID = cc.BelID
                and c.documentmark = 'VLR')
-- skip delivered positions that a 'VLR' return note referencing this delivery lists for the same item
and not exists (select 1
                from dbo.documents ccc
                join dbo.documentpos aa on aa.BelID = ccc.BelID
                where ccc.ReferenzBelID = c.BelID
                and ccc.documentmark = 'VLR'
                and a.itemnumber = aa.itemnumber)
order by a.BelID
It is my understanding that when this query runs, it will never populate any data, no matter how many times it runs, because of the WHERE clause
where c.company_id = lot.company_id
and p.product_id = lot.product_id
and l.packlevel_id = lot.packlevel_id
It looks to me that at the very beginning, when the table fact_table_lot is empty, the WHERE clause would return no data because it would not find anything in an empty table, and this would happen every time. Is my understanding wrong?
insert into fact_table_lot(company_id, product_id, packlevel_id, l_num, sn_count, comm_loct, comm_start, commdate_end, man_date, exp_date, user_id, created_datetime)
select c.company_id, p.product_id, l.packlevel_id, l_num, sn_count, comm_loct, comm_start, commdate_end, man_date, exp_date, user_id, sysdate
from staging_serials s
left outer join fact_table_lot lot on s.lotnumber = lot.l_num
join company c on c.lsc_company_id = s.companyid
join product p on s.compositeprodcode = p.compositeprodcode
join level l on l.unit_of_measure = p.packaginguom
where c.company_id = lot.company_id
and p.product_id = lot.product_id
and l.packlevel_id = lot.packlevel_id
and lot.created_datetime is null
In your query, staging_serials s left outer join fact_table_lot lot on s.lotnumber = lot.l_num gives a result set containing all records from staging_serials and, since the fact table is empty, NULL values for the columns from the fact table. Your WHERE clause then compares those NULL fact-table columns with =, which is never true, so the NULL-extended rows are filtered out and nothing is inserted. If you want no records to be returned when the fact table is empty, use an inner join instead of a left join.
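If the intent is instead to insert only staging rows that do not yet exist in the fact table, the usual anti-join pattern moves the matching conditions into the ON clause and keeps only the unmatched rows. A sketch, not the poster's code; the s. qualifications on the staging columns are assumptions:

insert into fact_table_lot(company_id, product_id, packlevel_id, l_num, sn_count, comm_loct, comm_start, commdate_end, man_date, exp_date, user_id, created_datetime)
select c.company_id, p.product_id, l.packlevel_id, s.lotnumber, s.sn_count, s.comm_loct, s.comm_start, s.commdate_end, s.man_date, s.exp_date, s.user_id, sysdate
from staging_serials s
join company c on c.lsc_company_id = s.companyid
join product p on s.compositeprodcode = p.compositeprodcode
join level l on l.unit_of_measure = p.packaginguom
left outer join fact_table_lot lot
  on lot.l_num = s.lotnumber
 and lot.company_id = c.company_id
 and lot.product_id = p.product_id
 and lot.packlevel_id = l.packlevel_id
where lot.l_num is null -- no matching fact row yet, so this staging row is new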
I have a bunch of tables in PostgreSQL and I run a query as follows:
SELECT DISTINCT ON ...some stuff...
FROM "rent_flats" INNER JOIN "rent_flats_linked_users"
ON "rent_flats_linked_users"."rent_flat_id" = "rent_flats"."id"
INNER JOIN "users"
ON "users"."id" = rent_flats_linked_users"."user_id"
INNER JOIN "owners"
ON "owners"."id" = "users"."profile_id" AND "users"."profile_type" = 'Owner'
INNER JOIN "phone_numbers"
ON "phone_numbers"."person_id" = "owners"."id" AND "phone_numbers"."person_type" = 'Owner'
INNER JOIN "phone_number_categories"
ON "phone_number_categories"."id" = "phone_numbers"."phone_number_category_id"
INNER JOIN "localities"
ON "localities"."id" = "rent_flats"."locality_id"
INNER JOIN "regions"
ON "regions"."id" = "localities"."region_id"
INNER JOIN "cities"
ON "cities"."id" = "regions"."city_id"
INNER JOIN "property_types"
ON "property_types"."id" = "rent_flats"."property_type_id"
INNER JOIN "apartment_types"
ON "apartment_types"."id" = "rent_flats"."apartment_type_id"
WHERE "rent_flats"."status" = 3
AND (((extract(epoch from age(current_date,rent_flats.date_added))/86400)::int) IN (cities.short_period,cities.long_period))
AND (phone_number_categories.name IN ('SMS','SMS & Mobile'))
ORDER BY rf_id, phone_numbers.priority ASC
Note: The rent_flats table contains around 5 million rows, rent_flats_linked_users contains around 600k rows, and users contains 350k rows. The other tables are small.
The query takes about 6.8 secs to execute, and EXPLAIN ANALYZE shows that around 50% of the total time goes to sequential scans of the rent_flats, users, and rent_flats_linked_users tables, and another 30% to hash joins.
On setting enable_seqscan to off, the query takes even longer, ~11 secs (in this case Hash and Hash Join take up to 97.5% of the time).
Here's the EXPLAIN ANALYZE query plan.
I have put indexes on the fields involved in the inner joins as well as on fields involved in the filters, like phone_numbers.priority and cities.short_period and cities.long_period. But I still get sequential scans. What can be the reasons, and what are possible solutions to speed up the query?
I suspect that if there is a part of that query worth optimising then it is this:
(((extract(epoch from age(current_date,rent_flats.date_added))/86400)::int) IN (cities.short_period,cities.long_period))
You really need to turn that into something like:
rent_flats.date_added in (...)
Then you can index date_added, and maybe index (date_added, status).
The next step would be to make sure that the join columns are indexed.
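For example (a sketch, assuming date_added is a date column and that cities.short_period and cities.long_period hold integer day counts):

-- Keep the indexed column bare on the left-hand side so an index can be used:
AND rent_flats.date_added IN (current_date - cities.short_period,
                              current_date - cities.long_period)

-- Candidate index (the name is illustrative):
CREATE INDEX rent_flats_date_added_status_idx ON rent_flats (date_added, status);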
In my application I have a query that does multiple joins with a table position, just like this:
SELECT *
FROM (...) as trips
join trip as t on trips.trip_id = t.trip_id
left outer join vehicle as v on v.vehicle_id = t.trip_vehicle_id
left outer join position as start on trips.start_position_id = start.position_id and start.position_vehicle_id = v.vehicle_id
left outer join position as "end" on trips.end_position_id = "end".position_id and "end".position_vehicle_id = v.vehicle_id
left outer join position as last on trips.last_position_id = last.position_id and last.position_vehicle_id = v.vehicle_id;
My table position has 35 columns (for example position_id).
When I run the query, the result should contain the table position 3 times: start, end and last. But Postgres cannot distinguish between, for example, start.position_id, end.position_id and last.position_id, so these 3 columns are collapsed and appear as one, position_id.
As the data from start.position_id and end.position_id are different, the column position_id that appears in the result is empty.
How can I get each group of data separately, for example all columns from the table 'start', without having to rename all the columns (like start.position_id as start_position_id)? In MySQL I can do this operation by calling fetch_fields and giving the function an alias, like 'start'.
But how can I do this in Postgres?
Best Regards,
Nuno Oliveira
My understanding is that you can't (or find it difficult to) discern which table each column with a shared name (such as "position_id") belongs to, but you only need to see one of the sets of shared columns at any one time. If that is the case, use tablename.* in your SELECT, so SELECT trips.*, start.* ... would show the columns from trips and start, but no columns from other tables involved in the join.
SELECT [...,] start.* [,...] FROM [...] atable AS start [...]
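Applied to the query in the question, a sketch that pulls one position alias's column set at a time might look like this:

SELECT trips.*, start.*
FROM (...) as trips
join trip as t on trips.trip_id = t.trip_id
left outer join vehicle as v on v.vehicle_id = t.trip_vehicle_id
left outer join position as start on trips.start_position_id = start.position_id
                                 and start.position_vehicle_id = v.vehicle_id;
-- Run again with "end".* or last.* in the select list when the other column sets are needed.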