Updating a big amount of data in PostgreSQL

The main table used is transaction, and it can store millions of rows (let's say 4-5 million max). I need to update a status as fast as possible.
The update query looks like this:
UPDATE transaction SET status = 'TO_EXECUTE'
WHERE transaction.id IN (SELECT transaction.id FROM transaction
JOIN anotherTable ON transaction.id = anotherTable.id
JOIN anotherTable2 ON transaction.serviceId = anotherTable2.id
WHERE transaction.status = :filter1 AND transaction.filter2 = :filter2 AND ...)
Do you have a better solution? Could it be better to create another table to store the status and the id? (I read that updating large tables can be really slow.)

The IN part of your query could likely be rewritten as EXISTS to potentially get improvements, depending on the other tables' layouts and volume. Also, it's quite possible that you do not need the transaction table mentioned yet again in the subquery (whether EXISTS or IN):
UPDATE transaction tx SET status = 'TO_EXECUTE'
WHERE EXISTS (SELECT 1
FROM anotherTable
JOIN anotherTable2 ON tx.serviceId = anotherTable2.id
WHERE anotherTable.id = tx.id
AND tx.status = :filter1 AND tx.filter2 = :filter2
AND ...)

Try this:
UPDATE transaction
SET status = 'TO_EXECUTE'
FROM anotherTable
JOIN anotherTable2 ON transaction.serviceId = anotherTable2.id
WHERE transaction.id = anotherTable.id AND transaction.status = :filter1 AND transaction.filter2 = :filter2 AND ...
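If the matching set is still in the millions, another common approach is to update in batches so each pass commits quickly and vacuum can keep up. A minimal sketch (assuming id is the indexed primary key, and that updated rows stop matching :filter1, so re-running the statement until it affects 0 rows terminates):
UPDATE transaction t
SET status = 'TO_EXECUTE'
WHERE t.id IN (SELECT tr.id
               FROM transaction tr
               JOIN anotherTable a ON tr.id = a.id
               JOIN anotherTable2 a2 ON tr.serviceId = a2.id
               WHERE tr.status = :filter1 AND tr.filter2 = :filter2
               LIMIT 10000); -- one batch per execution; rerun until 0 rows are updated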

Related

T-SQL Question for Getting One Customer Type When There Can be More Than One Value

We have an organization that can basically have more than one customer type. However, what a user wants to see is either the partner or the direct type (the customer type is Direct, Partner1, Partner2, or Partner3; it can be Direct plus a partner value, but only one of the partner values). So if a customer is both (e.g. Direct and Partner1), they just want the type that is a partner (e.g. Partner1).

So I tried splitting out partners only into one temp table from a few tables joining together different org data. I have the same query without any limit pulling into a different temp table. Then I calculate a count and put that into a temp table. Then I tried gathering data from all the temp tables. That is where I run into trouble and lose some of the customers whose type is Direct (image links below for a direct customer and a customer who is both).

I have been out of SQL for a bit, so this one is throwing me... I figure the issue is the fact that I have a CASE statement referencing a table that a direct customer will not exist in (#WLPO). However, I am not sure how to pull in these customers while also selecting only the partner type for a customer that has a partner and is also Direct. FYI, I am using SSMS for querying.
IF OBJECT_ID('tempdb..#WLPO') IS NOT NULL
DROP TABLE #WLPO
IF OBJECT_ID('tempdb..#org') IS NOT NULL
DROP TABLE #org
IF OBJECT_ID('tempdb..#OrgCount') IS NOT NULL
DROP TABLE #OrgCount
IF OBJECT_ID('tempdb..#cc') IS NOT NULL
DROP TABLE #cc
Select
o.OrganizationID,
o.OrganizationName,
os.WhiteLabelPartnerID,
s.StateName
INTO #WLPO
from [Org].[Organizations] o
join [Org].[OrganizationStates] os on o.OrganizationID=os.OrganizationID --and os.WhiteLabelPartnerID = 1
join [Lookup].[States] s on os.StateID = s.StateID
join [Org].[PaymentOnFile] pof on pof.OrganizationID=o.OrganizationID
where os.WhiteLabelPartnerID in (2,3,4)
and os.StateID in (1, 2, 3)
and o.OrganizationID = 7613
select * from #WLPO
Select
o.OrganizationID,
o.OrganizationName,
os.WhiteLabelPartnerID,
s.StateName
INTO #org
from [Org].[Organizations] o
join [Org].[OrganizationStates] os on o.OrganizationID=os.OrganizationID --and os.WhiteLabelPartnerID = 1
join [Lookup].[States] s on os.StateID = s.StateID
join [Org].[PaymentOnFile] pof on pof.OrganizationID=o.OrganizationID
where 1=1--os.WhiteLabelPartnerID = 1
and os.StateID in (1, 2, 3)
and o.OrganizationID = 7613
select * from #org
Select
OrganizationID,
count(OrganizationID) AS CountOrgTypes
INTO #OrgCount
from #org
where OrganizationID = 7613
group by OrganizationID
select * from #OrgCount
Select distinct
ct.OrganizationID,
ok.OrganizationName,
ct.CountOrgTypes,
case when ct.CountOrgTypes = 2 then wlp.WhiteLabelPartnerID
when ct.CountOrgTypes = 1 then ok.WhiteLabelPartnerID
END AS CustomerTypeCode,
case when ct.CountOrgTypes = 2 then wlp.StateName
when ct.CountOrgTypes = 1 then ok.StateName END As OrgState
INTO #cc
from #org ok
left join #WLPO wlp on wlp.OrganizationID=ok.OrganizationID
join #OrgCount ct on wlp.OrganizationID=ct.OrganizationID
select * from #cc
Select
OrganizationID,
OrganizationName,
CountOrgTypes,
case when CustomerTypeCode = 1 then 'Direct'
when CustomerTypeCode = 2 then 'Partner1'
when CustomerTypeCode = 3 then 'Partner2'
when CustomerTypeCode = 4 then 'Partner3' ELSE Null END AS CustomerType,
OrgState
from #cc
order by OrganizationName asc
(Image links: DirectCustomer, CustomerwithBoth)
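For what it's worth, the symptom described (direct-only customers disappearing) is consistent with the #cc step: the inner join to #OrgCount is keyed on wlp.OrganizationID, which turns the preceding LEFT JOIN into an effective inner join whenever #WLPO has no row for that org. A sketch of that step with the count join keyed on #org instead (untested against the real schema):
Select distinct
ct.OrganizationID,
ok.OrganizationName,
ct.CountOrgTypes,
case when ct.CountOrgTypes = 2 then wlp.WhiteLabelPartnerID
when ct.CountOrgTypes = 1 then ok.WhiteLabelPartnerID
END AS CustomerTypeCode,
case when ct.CountOrgTypes = 2 then wlp.StateName
when ct.CountOrgTypes = 1 then ok.StateName END As OrgState
INTO #cc
from #org ok
left join #WLPO wlp on wlp.OrganizationID = ok.OrganizationID
join #OrgCount ct on ok.OrganizationID = ct.OrganizationID -- join on #org, not #WLPO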

PostgreSQL Update & Inner Join

I am trying to update data in table local.import_payments with data from table local.payments, using an UPDATE with an INNER JOIN. The query I used:
Update local.import_payments
Set local.import_payments.client_id = local.payments.payment_for_client__record_id,
local.import_payments.client_name = local.payments.payment_for_client__company_name,
local.import_payments.customer_id = local.payments.customer__record_id,
local.import_payments.customer_name = local.payment_from_customer,
local.import_payments.payment_id = local.payments.payment_id
From local.import_payments
Inner Join local.payments
Where local.payments.copy_to_imported_payments = 'true'
The client_id, client_name, customer_id, and customer_name fields in local.import_payments need to get updated with the values from the table local.payments, based on the condition that the field copy_to_imported_payments is checked.
I am getting a syntax error while executing the query. I tried a couple of things, but they did not work. Can anyone look over the query and let me know where the issue is?
Try the following
UPDATE local.import_payments
SET client_id = lpay.payment_for_client__record_id,
client_name = lpay.payment_for_client__company_name,
customer_id = lpay.customer__record_id,
customer_name = lpay.payment_from_customer,
payment_id = lpay.payment_id
FROM local.payments AS lpay
WHERE lpay.<<field>> = local.import_payments.<<field>>
AND lpay.copy_to_imported_payments = 'true'
You shouldn't specify the schema/table for the updated columns, only the column names:
Do not include the table's name in the specification of a target column — for example, UPDATE table_name SET table_name.col = 1 is invalid.
(from the docs)
You also shouldn't repeat the target table in the FROM clause, except in the case of a self-join.
Finally, you can make your query shorter using the "column-list syntax":
update local.import_payments as target
set (
client_id,
client_name,
customer_id,
customer_name,
payment_id) = (
source.payment_for_client__record_id,
source.payment_for_client__company_name,
source.customer__record_id,
source.payment_from_customer,
source.payment_id)
from local.payments as source
where
<join condition> and
source.copy_to_imported_payments = 'true'
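For example, with a hypothetical shared key payment_id as the join condition (substitute the real matching columns; payment_id then drops out of the SET list because it is the join key), the whole statement would look like:
update local.import_payments as target
set (client_id,
     client_name,
     customer_id,
     customer_name) = (
     source.payment_for_client__record_id,
     source.payment_for_client__company_name,
     source.customer__record_id,
     source.payment_from_customer)
from local.payments as source
where target.payment_id = source.payment_id
  and source.copy_to_imported_payments = 'true';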

Optimizing Postgres query with timestamp filter

I have a query:
SELECT DISTINCT ON (analytics_staging_v2s.event_type, sent_email_v2s.recipient, sent_email_v2s.sent)
    sent_email_v2s.id, sent_email_v2s.user_id, analytics_staging_v2s.event_type,
    sent_email_v2s.campaign_id, sent_email_v2s.recipient, sent_email_v2s.sent,
    sent_email_v2s.stage, sent_email_v2s.sequence_id, people.role, people.company,
    people.first_name, people.last_name, sequences.name as sequence_name
FROM "sent_email_v2s"
LEFT JOIN analytics_staging_v2s ON sent_email_v2s.id = analytics_staging_v2s.sent_email_v2_id
JOIN people ON sent_email_v2s.person_id = people.id
JOIN sequences on sent_email_v2s.sequence_id = sequences.id
JOIN users ON sent_email_v2s.user_id = users.id
WHERE "sent_email_v2s"."status" = 1
AND "people"."person_type" = 0
AND (sent_email_v2s.sequence_id = 1888) AND (sent_email_v2s.sent >= '2016-03-18')
AND "users"."team_id" = 1
When I run EXPLAIN ANALYZE on it, I get: (query plan screenshot not reproduced here)
Then, if I change that to the following (just removing the sent_email_v2s.sent >= '2016-03-18' filter):
SELECT DISTINCT ON (analytics_staging_v2s.event_type, sent_email_v2s.recipient, sent_email_v2s.sent)
    sent_email_v2s.id, sent_email_v2s.user_id, analytics_staging_v2s.event_type,
    sent_email_v2s.campaign_id, sent_email_v2s.recipient, sent_email_v2s.sent,
    sent_email_v2s.stage, sent_email_v2s.sequence_id, people.role, people.company,
    people.first_name, people.last_name, sequences.name as sequence_name
FROM "sent_email_v2s"
LEFT JOIN analytics_staging_v2s ON sent_email_v2s.id = analytics_staging_v2s.sent_email_v2_id
JOIN people ON sent_email_v2s.person_id = people.id
JOIN sequences on sent_email_v2s.sequence_id = sequences.id
JOIN users ON sent_email_v2s.user_id = users.id
WHERE "sent_email_v2s"."status" = 1
AND "people"."person_type" = 0
AND (sent_email_v2s.sequence_id = 1888) AND "users"."team_id" = 1
When I run EXPLAIN ANALYZE on this query, the results are: (query plan screenshot not reproduced here)
EDIT:
The results above from today are about what I expected. When I ran this last night, however, the difference created by including the timestamp filter was about 100x (0.5 s -> 59 s). The EXPLAIN ANALYZE from last night showed all of the extra time attributed to the first unique/sort operation in the query plan above.
Could there be some kind of caching issue here? I am worried now that there might be something else going on (transiently) that could make this query take 100x longer, since it happened at least once.
Any thoughts are appreciated!
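Since the plans aren't shown here, one hedged guess: if sent_email_v2s has no index covering both filters, a composite index that puts the equality column first and the range column last would let the planner satisfy sequence_id = 1888 and sent >= '2016-03-18' in a single scan. A sketch (index name assumed; the partial WHERE clause matches the query's constant status = 1 filter):
CREATE INDEX idx_sent_email_v2s_sequence_sent
ON sent_email_v2s (sequence_id, sent)
WHERE status = 1;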

DB2 Query: insert data in history table if it doesn't already exist

I have a history table, a transaction table, and a reference table. If the status in the reference table is CLOSE, I need to check whether those records are already in the history table and, if not, insert them from the transaction table. I am writing a query like the one below and checking whether there is a better one. Please advise: can this query be used for huge data?
INSERT INTO LIB1.HIST_TBL
( SELECT R.ACCT, R.STATUS, R.DATE FROM
LIB2.HIST_TBL R JOIN LIB1.REF_TBL C
ON R.ACCT = C.ACCT WHERE C.STATUS = '5'
AND R.ACCT NOT IN
(SELECT ACTNO FROM LIB1.HIST_TBL)) ;
If you're on a current release of DB2 for i, take a look at the MERGE statement
MERGE INTO hist_tbl h
USING (SELECT * FROM ref_tbl
       WHERE status = 'S') r
ON h.actno = r.actno
WHEN NOT MATCHED THEN
INSERT (actno, histcol2, histcol3) VALUES (r.actno, r.refcol2, r.refcol3)
--if needed
WHEN MATCHED THEN
UPDATE SET (actno, histcol2, histcol3) = (r.actno, r.refcol2, r.refcol3)
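If MERGE isn't available on your release, a NOT EXISTS variant of the original INSERT is worth trying; unlike NOT IN, it behaves sanely when the subquery can return NULLs, and it usually optimizes well on large tables. A sketch, keeping the original's table and column names:
INSERT INTO LIB1.HIST_TBL
SELECT R.ACCT, R.STATUS, R.DATE
FROM LIB2.HIST_TBL R
JOIN LIB1.REF_TBL C ON R.ACCT = C.ACCT
WHERE C.STATUS = '5'
AND NOT EXISTS (SELECT 1 FROM LIB1.HIST_TBL H
                WHERE H.ACTNO = R.ACCT);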

Metadata about a column in SQL Server 2008 R2?

I'm trying to figure out a way to store metadata about a column without repeating myself.
I'm currently working on a generic dimension-loading SSIS package that will handle all my dimensions. It currently does the following:
Create a temporary table identical to the table whose name is given as a parameter (this is a generic stored procedure that receives the table name as a parameter and then does: select top 0 * into ##[INSERT ORIGINAL TABLE NAME HERE] from [INSERT ORIGINAL TABLE NAME HERE]).
==> Here we insert custom code for this particular dimension that first queries the data from a data source to get my delta, then transforms the data, and finally loads it into my temporary table.
Merge the temporary table into my original table with a T-SQL MERGE, taking care of type 1 and type 2 fields accordingly.
My problem right now is that I have to maintain a table with all the fields in it just to store a piece of metadata telling my scripts whether a particular field is type 1 or type 2... this is nonsense: I can get the same data (minus the type1/type2 flag) from sys.columns/sys.types.
I was ultimately thinking about renaming my fields to include their type, such as:
FirstName_T2, LastName_T2, Sex_T1 (well, I know this one can be type 2; let's not fall into that debate here).
What would you guys do with that? My solution (using a table with that metadata) is currently in place and working, but it's obvious that duplicating information from the system tables into a custom table is nonsense, just for a simple type1/type2 flag.
UPDATE: I also thought about creating user-defined types like varchar => t1_varchar, t2_varchar, etc. This sounds a bit clunky too...
Everything you need should already be in INFORMATION_SCHEMA.COLUMNS.
I can't follow your reasoning for not using the provided tables/views...
Edit: as scarpacci mentioned, this is also somewhat portable if needed.
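For reference, a quick look at what those standard views expose (the table name is borrowed from later in this thread):
SELECT COLUMN_NAME, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH, IS_NULLABLE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'DimDivision'
ORDER BY ORDINAL_POSITION;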
I know this is bad, but I will post an answer to my own question... Thanks to GBN for the help, though!
I am now storing "flags" in the "description" field of my columns. For example, I can store a flag this way: "TYPE_2_DATA".
Then I use this query to get the flag back for each and every column:
select columns.name as [column_name]
,types.name as [type_name]
,extended_properties.value as [column_flags]
from sys.columns
inner join sys.types
on columns.user_type_id = types.user_type_id -- user_type_id avoids duplicate rows for aliased types
left join sys.extended_properties
on extended_properties.major_id = columns.object_id
and extended_properties.minor_id = columns.column_id
and extended_properties.name = 'MS_Description'
where columns.object_id = object_id('DimDivision')
and columns.is_identity = 0
order by columns.column_id
Now I can store metadata about columns without having to create a separate table. I use what's already in place, and I don't repeat myself. I'm not sure this is the best possible solution yet, but it works and is far better than duplicating information.
In the future, I will be able to use this field to store more metadata, such as: "TYPE_2_DATA|ANOTHER_FLAG|ETC|OH BOY!".
UPDATE:
I now store the information in separate extended properties. You can manage extended properties using the sp_addextendedproperty and sp_updateextendedproperty stored procedures. I have created a simple stored procedure that helps me update those values whether or not they already exist:
create procedure [dbo].[UpdateSCDType]
@tablename nvarchar(50),
@fieldname nvarchar(50),
@scdtype char(1),
@dbschema nvarchar(25) = 'dbo'
as
begin
declare @already_exists int;
if ( @scdtype = '1' or @scdtype = '2' )
begin
-- does the column already carry an Scd_Type property?
select @already_exists = count(1)
from sys.columns
inner join sys.extended_properties
on extended_properties.major_id = columns.object_id
and extended_properties.minor_id = columns.column_id
and extended_properties.name = 'Scd_Type' -- must match the property name used below
where columns.object_id = object_id(@dbschema + '.' + @tablename)
and columns.name = @fieldname
if ( @already_exists = 0 )
begin
exec sys.sp_addextendedproperty
@name = N'Scd_Type',
@value = @scdtype,
@level0type = N'SCHEMA',
@level0name = @dbschema,
@level1type = N'TABLE',
@level1name = @tablename,
@level2type = N'COLUMN',
@level2name = @fieldname
end
else
begin
exec sys.sp_updateextendedproperty
@name = N'Scd_Type',
@value = @scdtype,
@level0type = N'SCHEMA',
@level0name = @dbschema,
@level1type = N'TABLE',
@level1name = @tablename,
@level2type = N'COLUMN',
@level2name = @fieldname
end
end
end
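A hypothetical call, flagging the LastName column of DimCustomer as type 2:
exec dbo.UpdateSCDType @tablename = 'DimCustomer', @fieldname = 'LastName', @scdtype = '2';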
Thanks again