I'm having issues with the query below in my iPhone app. When the app runs the query it takes quite a while to process the result, maybe around a second or so... I was wondering if the query can be optimised in any way? I'm using the FMDB framework to process all my SQL.
select pd.discounttypeid, pd.productdiscountid, pd.quantity, pd.value, p.name, p.price, pi.path
from productdeals as pd, product as p, productimages as pi
where pd.productid = 53252
and pd.discounttypeid = 8769
and pd.productdiscountid = p.parentproductid
and pd.productdiscountid = pi.productid
and pi.type = 362
order by pd.id
limit 1
My CREATE statements for the tables are below:
CREATE TABLE "ProductImages" (
"ProductID" INTEGER,
"Type" INTEGER,
"Path" TEXT
)
CREATE TABLE "Product" (
"ProductID" INTEGER PRIMARY KEY,
"ParentProductID" INTEGER,
"levelType" INTEGER,
"SKU" TEXT,
"Name" TEXT,
"BrandID" INTEGER,
"Option1" INTEGER,
"Option2" INTEGER,
"Option3" INTEGER,
"Option4" INTEGER,
"Option5" INTEGER,
"Price" NUMERIC,
"RRP" NUMERIC,
"averageRating" INTEGER,
"publishedDate" DateTime,
"salesLastWeek" INTEGER
)
CREATE TABLE "ProductDeals" (
"ID" INTEGER,
"ProductID" INTEGER,
"DiscountTypeID" INTEGER,
"ProductDiscountID" INTEGER,
"Quantity" INTEGER,
"Value" INTEGER
)
Do you have indexes on the foreign key columns (productimages.productid and product.parentproductid), and on the columns you use to find the right product deal (productdeals.productid and productdeals.discounttypeid)? If not, that could be the cause of the poor performance.
You can create them like this:
CREATE INDEX idx_images_productid ON productimages(productid);
CREATE INDEX idx_products_parentid ON product(parentproductid);
CREATE INDEX idx_deals ON productdeals(productid, discounttypeid);
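To check that SQLite actually picks these indexes up, you can prefix the query with EXPLAIN QUERY PLAN; the plan should then show SEARCH ... USING INDEX lines rather than SCAN TABLE. A trimmed-down sketch of the check (column list shortened for brevity):
-- ask SQLite for the plan instead of executing the query
EXPLAIN QUERY PLAN
select pd.quantity, p.name, pi.path
from productdeals pd, product p, productimages pi
where pd.productid = 53252
  and pd.discounttypeid = 8769
  and pd.productdiscountid = p.parentproductid
  and pd.productdiscountid = pi.productid
  and pi.type = 362;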
The query below could help reduce the execution time; also, make sure the indexes are created on the right fields to speed up your query.
select pd.discounttypeid, pd.productdiscountid, pd.quantity, pd.value,
       p.name, p.price, pi.path
from productdeals pd
join product p on pd.productdiscountid = p.parentproductid
join productimages pi on pd.productdiscountid = pi.productid
where pd.productid = 53252
  and pd.discounttypeid = 8769
  and pi.type = 362
order by pd.id
limit 1
Thanks
I have 2 tables in my PostgreSQL TimescaleDB database (version 12.06) that I am trying to query with an inner join.
Tables' structure:
CREATE TABLE currency(
id serial PRIMARY KEY,
symbol TEXT NOT NULL,
name TEXT NOT NULL,
quote_asset TEXT
);
CREATE TABLE currency_price (
currency_id integer NOT NULL,
dt timestamp without time zone NOT NULL,
open NUMERIC NOT NULL,
high NUMERIC NOT NULL,
low NUMERIC NOT NULL,
close NUMERIC,
volume NUMERIC NOT NULL,
PRIMARY KEY (
currency_id,
dt
),
CONSTRAINT fk_currency FOREIGN KEY (currency_id) REFERENCES currency(id)
);
The query I'm trying to make is:
SELECT currency_id AS id, symbol, MAX(close) AS close, DATE(dt) AS date
FROM currency_price
JOIN currency ON
currency.id = currency_price.currency_id
GROUP BY currency_id, symbol, date
LIMIT 100;
Basically, it returns all the rows that exist in the currency_price table. I know that Postgres doesn't allow selecting columns without an aggregate function or without including them in the GROUP BY clause. So, if I don't include the dt column in my select, I receive the expected results, but if I include it, the output shows rows for every single day of each currency, while I only want the max value for each currency, so that I can filter them by date afterwards.
I'm very inexperienced with SQL in general.
Any suggestions to solve this would be much appreciated.
There are several ways to do it; the easiest one that comes to mind is using window functions.
select *
from (
    select currency_id, symbol, close, dt,
           row_number() over (partition by currency_id, symbol
                              order by close desc, dt desc) as rr
    from currency_price
    join currency on currency.id = currency_price.currency_id
    where dt::date = '2021-06-07'
) q1
where rr = 1
General documentation on window functions: https://www.postgresql.org/docs/9.5/functions-window.html
The same approach also works with standard aggregate functions like SUM, AVG, MAX, and MIN.
Some examples: https://www.postgresqltutorial.com/postgresql-window-function/
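Alternatively, PostgreSQL's DISTINCT ON is a common idiom for this kind of greatest-row-per-group query; a minimal sketch under the same assumptions (one row per currency, the highest close on the chosen date):
-- DISTINCT ON keeps the first row per currency_id according to ORDER BY,
-- so sorting by close desc yields the max-close row for each currency
SELECT DISTINCT ON (cp.currency_id)
       cp.currency_id AS id, c.symbol, cp.close, cp.dt
FROM currency_price cp
JOIN currency c ON c.id = cp.currency_id
WHERE cp.dt::date = '2021-06-07'
ORDER BY cp.currency_id, cp.close DESC, cp.dt DESC;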
I have 2 big tables
CREATE TABLE "public"."linkages" (
"supplierid" integer NOT NULL,
"articlenumber" character varying(32) NOT NULL,
"article_id" integer,
"vehicle_id" integer
);
CREATE INDEX "__linkages_datasupplierarticlenumber" ON "public"."__linkages" USING btree ("datasupplierarticlenumber");
CREATE INDEX "__linkages_supplierid" ON "public"."__linkages" USING btree ("supplierid");
having 215 000 000 records, and
CREATE TABLE "public"."article" (
"id" integer DEFAULT nextval('tecdoc_article_id_seq') NOT NULL,
"part_number" character varying(32),
"supplier_id" integer,
CONSTRAINT "tecdoc_article_part_number_supplier_id" UNIQUE ("part_number", "supplier_id")
) WITH (oids = false);
having 5 500 000 records.
I need to update linkages.article_id according to article.part_number and article.supplier_id, like this:
UPDATE linkages
SET article_id = article.id
FROM
article
WHERE
linkages.supplierid = article.supplier_id AND
linkages.articlenumber = article.part_number;
But it is too heavy. I tried it, but it ran for a day with no result, so I terminated it.
I need to do this update only once, to normalize my datatable structure for using foreign keys in the Django ORM. How can I resolve this issue?
Thanks a lot!
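One thing that often helps with a one-off update of this size is splitting it into batches, so each transaction stays small and progress is visible; a sketch, assuming supplierid can be ranged over (the bounds below are hypothetical, pick them from your data):
-- update one slice of suppliers at a time, then repeat with the next range
UPDATE linkages
SET article_id = article.id
FROM article
WHERE linkages.supplierid = article.supplier_id
  AND linkages.articlenumber = article.part_number
  AND linkages.supplierid BETWEEN 1 AND 1000;  -- hypothetical batch bounds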
I have a table consisting of products (with IDs, ~15k records) and another table price_changes (~88m records) recording a change in the price for a given productID at a given changedate.
I'm now interested in the price of each product at given points in time (say every 2 hours for a year, so altogether ~4300 points; altogether resulting in ~64m data points of interest). While it's very straightforward to determine the price of a given product at a given time, it seems to be quite time-consuming to determine all 64m data points.
My approach is to pre-populate a new target table fullprices with the data points of interest:
insert into fullprices(obsdate,productID)
select obsdate, productID from targetdates, products
and then update each price observation in this new table like this:
update fullprices f
set price = (select price
             from price_changes
             where productID = f.productID
               and date < f.obsdate
             order by date desc
             limit 1)
which should give me the most recent price change at each point in time.
Unfortunately, this takes ... well, ages. Is there any better way to do it?
== Edit: My tables are created as follows: ==
CREATE TABLE products
(
productID uuid NOT NULL,
name text NOT NULL,
CONSTRAINT products_pkey PRIMARY KEY (productID)
);
CREATE TABLE price_changes
(
id integer NOT NULL,
productID uuid NOT NULL,
price smallint,
date timestamp NOT NULL
);
CREATE INDEX idx_pc_date
ON price_changes USING btree
(date);
CREATE INDEX idx_pc_productID
ON price_changes USING btree
(productID);
CREATE TABLE targetdates
(
obsdate timestamp
);
CREATE TABLE fullprices
(
obsdate timestamp NOT NULL,
productID uuid NOT NULL,
price smallint
);
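A set-based alternative to the per-row correlated UPDATE would be to compute all observations in one statement with a LATERAL join (PostgreSQL 9.3+); a sketch, assuming a composite index on (productID, date) is added first (the index name below is made up):
-- composite index so each lateral lookup is a single index descent
CREATE INDEX idx_pc_product_date ON price_changes (productID, date DESC);

INSERT INTO fullprices (obsdate, productID, price)
SELECT t.obsdate, p.productID, pc.price
FROM targetdates t
CROSS JOIN products p
LEFT JOIN LATERAL (
    -- most recent price change before the observation date
    SELECT c.price
    FROM price_changes c
    WHERE c.productID = p.productID
      AND c.date < t.obsdate
    ORDER BY c.date DESC
    LIMIT 1
) pc ON true;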
There is a table 'TICKETS' in PostgreSQL. I perform an ETL job using Pentaho to populate this table.
There is also a GUI on which a user makes changes and the result is reflected in this table.
The fields in the table are :
"OID" Char(36) <------ **PRIMARY KEY**
, "CUSTOMER" VARCHAR(255)
, "TICKETID" VARCHAR(255)
, "PRIO_ORIG" CHAR(36)
, "PRIO_COR" CHAR(36)
, "CATEGORY" VARCHAR(255)
, "OPENDATE_ORIG" TIMESTAMP
, "OPENDATE_COR" TIMESTAMP
, "TTA_ORIG" TIMESTAMP
, "TTA_COR" TIMESTAMP
, "TTA_DUR" DOUBLE PRECISION
, "MTTA_TARGET" DOUBLE PRECISION
, "TTA_REL_ORIG" BOOLEAN
, "TTA_REL_COR" BOOLEAN
, "TTA_DISCOUNT_COR" DOUBLE PRECISION
, "TTA_CHARGE_COR" DOUBLE PRECISION
, "TTR_ORIG" TIMESTAMP
, "TTR_COR" TIMESTAMP
, "TTR_DUR" DOUBLE PRECISION
, "MTTR_TARGET" DOUBLE PRECISION
, "TTR_REL_ORIG" BOOLEAN
, "TTR_REL_COR" BOOLEAN
, "TTR_DISCOUNT_COR" DOUBLE PRECISION
, "TTR_CHARGE_COR" DOUBLE PRECISION
, "COMMENT" VARCHAR(500)
, "USER" CHAR(36)
, "MODIFY_DATE" TIMESTAMP
, "CORRECTED" BOOLEAN
, "OPTIMISTICLOCKFIELD" INTEGER
, "GCRECORD" INTEGER
, "ORIGINATOR" Char(36)
I want to update the table when the columns TICKETID+ORIGINATOR+CUSTOMER are the same. Otherwise, an insert will be performed.
How should I do it using Pentaho? Is the Dimension lookup/update step fine for it, or will the Update/Insert step do the work?
Any help would be much appreciated. Thanks in advance.
Eugene Lisitsky's suggestion is good practice: you may hard-wire it in the database constraints and let PostgreSQL do the job.
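A sketch of what that constraint-based route could look like in PostgreSQL (9.5+ for ON CONFLICT), assuming a unique constraint over the three key columns is viable for your data; the constraint name and values below are illustrative only:
-- enforce the business key once...
ALTER TABLE "TICKETS"
    ADD CONSTRAINT tickets_business_key UNIQUE ("TICKETID", "ORIGINATOR", "CUSTOMER");

-- ...then let inserts fall back to updates on conflict (placeholder values)
INSERT INTO "TICKETS" ("OID", "TICKETID", "ORIGINATOR", "CUSTOMER", "CATEGORY")
VALUES ('36-char-oid...', 'T-1', 'orig-1', 'cust-1', 'network')
ON CONFLICT ("TICKETID", "ORIGINATOR", "CUSTOMER")
DO UPDATE SET "CATEGORY" = EXCLUDED."CATEGORY",
              "MODIFY_DATE" = now();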
For a PDI solution: your table does not look like a Slowly Changing Dimension, so the Insert/Update step covers your need.
If you want to use the Dimension lookup/update step, you need to alter the table into the Pentaho SCD format: add a version column and valid_from_date/valid_upto_date columns (with PDI the alter is a one-button operation).
After that, when a new row comes in, TICKETID+ORIGINATOR+CUSTOMER is searched for in the table, and if found, that row receives a validity_upto = now(). At the same time, a version+1 row is created, valid from now() to the end of time.
The (main) pro is that you can retrieve the state of the database as it was at any date in the past with a simple where clause: some_date between validity_from and validity_upto.
The (main) con is that you have to alter the table, which may have some impact on the GUIs (plural).
I have two SQL Server tables like this:
[Management].[Person](
[PersonsID] [int] IDENTITY(1,1) NOT NULL,
[FirstName] [nvarchar](50) NOT NULL,
[LastName] [nvarchar](100) NOT NULL,
[Semat] [nvarchar](50) NOT NULL,
[Vahed] [nvarchar](50) NOT NULL,
[Floor] [int] NOT NULL,
[ShowInList] [bit] NOT NULL,
[LastState] [nchar](10) NOT NULL)
and
[Management].[PersonEnters](
[PersonEnters] [int] IDENTITY(1,1) NOT NULL,
[PersonID] [int] NOT NULL,
[Vaziat] [nchar](10) NOT NULL,
[Time] [nchar](10) NOT NULL,
[PDate] [nchar](10) NOT NULL)
PersonsID in the second table is a foreign key.
I register every person's entry to the system in the PersonEnters table.
I want to show all persons' enter status on a certain date (the PDate field): if a person entered the system, show their information, and if they did not, show null instead.
I tried this query:
select * from [Management].[Person] left outer join [Management].[PersonEnters]
on [Management].[Person].[PersonsID] = [Management].[PersonEnters].[PersonID]
where [Management].[PersonEnters].PDate = '1392/11/14'
but it just shows the registered enter data for 1392/11/14 and shows nothing for the others.
I want to show this data, plus null or a constant string like "NOT REGISTERED", for the persons that did not register their entry in the PersonEnters table on '1392/11/14'. Please help me.
Logically, the WHERE clause will be applied after the join. If some Person entries do not have matches in PersonEnters, they will have NULLs in PDate as a result of the join, but the WHERE clause will filter them out because the comparison NULL = '1392/11/14' will not yield true.
If I understand your question correctly, you essentially want an outer join to a subset of PersonEnters (the one where PDate = '1392/11/14'), not to the entire table. One way to express that could be like this:
SELECT *
FROM Management.Person AS p
LEFT JOIN (
SELECT *
FROM Management.PersonEnters
WHERE PDate = '1392/11/14'
) AS pe
ON p.PersonsID = pe.PersonID
;
As you can see, this query very explicitly tells the server that a particular subset should be derived from PersonEnters before the join takes place, because you want to find matches with that particular subset, not with the whole table.
However, the same intent could be rewritten in a more concise way (without a derived table):
SELECT *
FROM Management.Person AS p
LEFT JOIN Management.PersonEnters AS pe
ON p.PersonsID = pe.PersonID AND pe.PDate = '1392/11/14'
;
The effect of the above query would be the same and you would get all Person entries, with matching results from PersonEnters only if they have PDate = '1392/11/14'.
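If you want the constant string "NOT REGISTERED" instead of NULL, as mentioned in the question, you could wrap one of the joined columns in COALESCE; a sketch using Vaziat as the substituted column (COALESCE rather than ISNULL, so the result is not truncated to Vaziat's nchar(10) type):
SELECT p.*,
       COALESCE(pe.Vaziat, N'NOT REGISTERED') AS EnterStatus  -- constant string for unmatched rows
FROM Management.Person AS p
LEFT JOIN Management.PersonEnters AS pe
    ON p.PersonsID = pe.PersonID
   AND pe.PDate = '1392/11/14';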
select *
from [Management].[Person]
left outer join [Management].[PersonEnters]
on [Management].[Person].[PersonsID] = [Management].[PersonEnters].[PersonID]
and [Management].[PersonEnters].PDate = '1392/11/14'