Best way to write a CTAS with outer joins - Oracle 12c

I am trying to create a table using CTAS. Below is my query, which is taking 24 minutes. Is there any way I can improve it?
create table CHUB1
nologging
parallel
as
select /*+ PARALLEL */
       D.CUST_ID,
       D.FIRST_NM,
       D.MIDDLE_NM
from   CHUB_DETAIL_OT D   -- 29 M records
      ,CHUB_ADDR_OT   A   -- 28 M records
      ,CHUB_PHONES    P   -- 22 M records
      ,CHUB_EMAILS    E   --  5 M records
where 1 = 1
  and A.CUSTOMER_ID(+) = D.CUSTOMER_ID
  and P.CUSTOMER_ID(+) = D.CUSTOMER_ID
  and E.CUSTOMER_ID(+) = D.CUSTOMER_ID;
Database: Oracle 12c.
Please let me know if you need further details. Thank you!
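For readers more used to ANSI joins: the (+) notation above marks the optional side of an outer join, so the same CTAS can be written as below (a straight translation, nothing new assumed). Note that since only columns from D are selected, the three left joins can only duplicate detail rows here.
create table CHUB1
nologging
parallel
as
select /*+ PARALLEL */
       D.CUST_ID,
       D.FIRST_NM,
       D.MIDDLE_NM
from CHUB_DETAIL_OT D
left join CHUB_ADDR_OT A on A.CUSTOMER_ID = D.CUSTOMER_ID
left join CHUB_PHONES  P on P.CUSTOMER_ID = D.CUSTOMER_ID
left join CHUB_EMAILS  E on E.CUSTOMER_ID = D.CUSTOMER_ID;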

Related

Does Spark support the below cascaded query?

I have a requirement to run some queries against some tables in a PostgreSQL database to populate a DataFrame. The tables are as follows.
Table 1 has the below data:
QueryID  WhereClauseID  Enabled
1        1              true
2        2              true
3        3              true
...
Table 2 has the below data:
WhereClauseID  WhereClauseString
1              a>b
2              a>c
3              a>b && a<c
...
Table 3 has the below data:
a   b   c   value
30  20  30  100
20  10  40  200
...
I want to query in the following way. From table 1, I want to pick the rows where Enabled is true. Based on the WhereClauseID in each of those rows, I want to pick the matching rows in table 2. Then, using the WhereClauseString picked up from table 2, I want to run a query with that WHERE clause against table 3 to get the value. Finally, I want all records in table 3 that meet any of the WhereClauses enabled in table 1.
I know I can go through table 1 row by row and use the clause string to build a SQL query against table 3, but querying row by row is very inefficient, especially if table 1 is big. Is there a better way to organize the query to improve efficiency? Thanks a lot!
Depending on your use case, you might be able to solve this in PySpark using the when expression.
Here is a suggestion.
import pyspark.sql.functions as F

tbl1 = spark.table("table1")
tbl3 = spark.table("table3")

tbl3 = (
    tbl3
    .withColumn(
        "WhereClauseID",
        # You can do some fancy parsing of your tbl2 here if you want this
        # to be evaluated programmatically from your table2.
        # Note: the chain assigns the first matching ID, so the most
        # specific clause (3) is tested before the broader one (1).
        F.when((F.col("a") > F.col("b")) & (F.col("a") < F.col("c")), 3)
         .when(F.col("a") > F.col("c"), 2)
         .when(F.col("a") > F.col("b"), 1)
         .otherwise(-1)
    )
)

# Keep only the clauses enabled in table1, then join.
tbl1_with_tbl_3 = tbl1.filter(F.col("Enabled")).join(tbl3, "WhereClauseID", "left")
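If you would rather keep the mapping in SQL, the same when-chain can be written as a CASE expression in Spark SQL; a sketch with the three clauses from table 2 hard-coded (CASE returns the first match, so the most specific clause comes first):
-- Run via spark.sql(...); table3 is assumed to be a catalog table here.
SELECT t3.*,
       CASE
           WHEN a > b AND a < c THEN 3
           WHEN a > c           THEN 2
           WHEN a > b           THEN 1
           ELSE -1
       END AS WhereClauseID
FROM table3 t3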

Select query became very slow in PostgreSQL

I have a table that contains 133,072,194 records, and I am trying to execute
SELECT COUNT(test)
FROM mytable
WHERE test = false
but it is taking 128320.712 ms to execute.
I already have an index on the test column. Could you please let me know what I can optimize or change so that my query becomes faster?
Because of this, my other select queries are also slow.
If there are many rows where test is FALSE, you won't be able to get an exact result faster than with a sequential scan, which is slow for big tables.
If only a few rows satisfy the condition, you should create a partial index:
CREATE INDEX mytable_notest_ind ON mytable(id) WHERE NOT test;
(assuming that id is the primary key) and keep mytable autovacuumed often enough that you get an index-only scan.
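A hypothetical usage sketch: if the counting query is phrased so its predicate matches the index predicate exactly, the planner can answer it from the partial index alone:
-- With the partial index in place and the table vacuumed, this can be
-- satisfied by an index-only scan over just the false rows.
SELECT count(*) FROM mytable WHERE NOT test;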
But usually exact results for queries like this are not required.
You could calculate an estimated count from the table statistics with a query like this:
SELECT t.reltuples
       * (1 - t.nullfrac)
       * mcv.freq AS count_false
FROM pg_stats AS s
   CROSS JOIN LATERAL unnest(s.most_common_vals::text::boolean[],
                             s.most_common_freqs) AS mcv(val, freq)
   JOIN pg_class AS t
      ON s.tablename = t.relname
         AND s.schemaname = t.relnamespace::regnamespace::text
WHERE s.tablename = 'mytable'
  AND s.attname = 'test'
  AND mcv.val = FALSE;
That would be very fast.
See my blog post for more considerations about the speed of SELECT count(*).

Query to retrieve only Identity Columns in Teradata

In Oracle, DBA_SEQUENCES retrieves all the sequences used across every table.
Can you please tell me how I can find the same in Teradata?
Identity information is stored in dbc.IdCol. There's no Data Dictionary view on top of it, but one is easy to write:
SELECT
    d.DatabaseName
   ,t.TVMName AS TableName
   ,c.FieldName
   ,id.AvailValue
   ,id.StartValue
   ,id.MinValue
   ,id.MaxValue
   ,id.Increment
   ,id.Cyc
FROM dbc.IdCol AS id
JOIN dbc.Dbase AS d
  ON id.DatabaseId = d.DatabaseId
JOIN dbc.TVM AS t
  ON id.TableId = t.TVMId
JOIN dbc.TVFields AS c
  ON c.TableId = id.TableId
WHERE c.IdColType IS NOT NULL
;
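For context, a hypothetical table whose identity column would show up in that query; the StartValue, MinValue, MaxValue, Increment, and Cyc columns of dbc.IdCol map to the clauses below (database, table, and column names are illustrative):
-- Hypothetical Teradata table with an identity column.
CREATE TABLE demo_db.demo_tbl (
    id INTEGER GENERATED ALWAYS AS IDENTITY
       (START WITH 1
        INCREMENT BY 1
        MINVALUE 1
        MAXVALUE 2147483647
        NO CYCLE),
    payload VARCHAR(100)
);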

SQL Server Union Query Optimization

I have been given a task to optimize the SQL query below. Currently the query is timing out and causing a lot of blocking. I just started using T-SQL, so please help me optimize the query.
select ExcludedID
from OfferConditions with (NoLock)
where OfferID = 27251
  and ExcludedID in (210,223,409,423,447,480,633,...lots and lots of these...,
                     13346,13362,13380,13396,13407,1,2)
union
select CustomerGroupID as ExcludedID
from CPE_IncentiveCustomerGroups ICG with (NoLock)
inner join CPE_RewardOptions RO with (NoLock)
        on RO.RewardOptionID = ICG.RewardOptionID
where RO.IncentiveID = 27251
  and ICG.Deleted = 0
  and RO.Deleted = 0
  and ExcludedUsers = 1
  and CustomerGroupID in (210,223,409,423,447,480,633,...lots and lots of these...,
                          13346,13362,13380,13396,13407,1,2);
You can try to insert those IDs into a temp table and join it instead of using the IN list.
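A minimal sketch of that suggestion (the temp-table name is illustrative, and the full ID list goes where the values are elided):
-- Load the IDs once, then join instead of the giant IN list.
CREATE TABLE #ExcludedIDs (ID int PRIMARY KEY);
INSERT INTO #ExcludedIDs (ID) VALUES (210), (223), (409); -- ...rest of the list...

SELECT oc.ExcludedID
FROM OfferConditions oc
JOIN #ExcludedIDs x ON x.ID = oc.ExcludedID
WHERE oc.OfferID = 27251;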
The key to solving your problem is NOT to fix the SQL, but to fix the indexes on your tables. For example, you should have a compound index on the OfferConditions table with OfferID and ExcludedID.
When you create the indexes on the other tables, remember that if a field appears in the WHERE clause or in a join condition, it should be part of your compound index.
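For example, the compound index suggested above might look like this (the index name is illustrative):
CREATE INDEX IX_OfferConditions_OfferID_ExcludedID
    ON OfferConditions (OfferID, ExcludedID);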

Long-running query on a self-joined table

I am trying to improve the performance of a query that updates a column on each row of a table by comparing the current row's values with all other rows in the same table. Here is the query:
update F
set PartOfPairRC = 1
from RangeChange F
where Reject = 0
  and exists (
      select 1
      from RangeChange S
      where F.StoreID = S.StoreID
        and F.ItemNo = S.ItemNo
        and F.Reject = S.Reject
        and F.ChangeDateEnd = S.ChangeDate - 1)
The query's performance degrades rapidly as the number of rows in the table increases. I have 50 million rows in the table.
Is there a better way to do this? Would SSIS be able to handle such an operation better?
Any help much appreciated. Thanks, Robert
You can try to create an index on that table:
create index idx_test on RangeChange (StoreID, ItemNo, Reject, ChangeDateEnd) where Reject = 0
-- If the filtered index is not an option, get rid of the where condition and
-- put the Reject column as an included column in the index instead.
-- Make sure you already have a clustered index on the table (if not, you can
-- create the index above as clustered).
-- I would write the query as a join:
update F
set F.PartOfPairRC = 1
from RangeChange F
join RangeChange S
  on F.StoreID = S.StoreID
 and F.ItemNo = S.ItemNo
 and F.Reject = S.Reject
 and F.ChangeDateEnd = S.ChangeDate - 1
where F.Reject = 0
  and S.Reject = 0
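An alternative sketch, not from the answer above: if the ranges for a given (StoreID, ItemNo) never overlap, the matching row is always the next one by ChangeDate, so a single window-function pass can replace the self-join (assuming the date columns are datetime, as the ChangeDate - 1 arithmetic implies):
-- Updating through a CTE is allowed in T-SQL as long as only base table
-- columns are assigned; LEAD() fetches the next row's ChangeDate.
with ordered as (
    select PartOfPairRC, Reject, ChangeDateEnd,
           lead(ChangeDate) over (partition by StoreID, ItemNo, Reject
                                  order by ChangeDate) as NextChangeDate
    from RangeChange
)
update ordered
set PartOfPairRC = 1
where Reject = 0
  and ChangeDateEnd = NextChangeDate - 1;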