If I do
select *
from table1
where table1.col1 = 'xx'
and table1.col2 = 'yy'
and table1.col3 = 'zz'
the execution plan shows a full table scan.
The indexes on this table exist for col4 and col5.
Do I need to create an index on each of col1, col2 and col3 to make the query perform better?
Also if the query is like this:
select *
from table1,table2
where table1.col1 = table2.col2
and table1.col2 = 'yy'
and table1.col3 = 'zz'
If we create an index on col1 and col2, will it suffice?
You should try adding indexes on the columns that you are using in the query:
table1 col1
table1 col2
table1 col3
table2 col2
Note that it can also be advantageous in some cases to have multi-column indexes, for example:
table1 (col2, col3)
It's hard to predict which index will work best without knowing more about your data, but you can try a few different combinations and compare the execution plans to see what works best.
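For example, a minimal sketch of what those definitions could look like (the index names are made up; adjust them to your own convention):
-- single-column indexes on the filtered and joined columns
create index ix_table1_col1 on table1 (col1);
create index ix_table1_col2 on table1 (col2);
create index ix_table1_col3 on table1 (col3);
create index ix_table2_col2 on table2 (col2);
-- or a composite index covering both constant filters in one structure
create index ix_table1_col2_col3 on table1 (col2, col3);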
I have a table in postgres with two columns:
col1 col2
a a
b c
d e
f f
I would like to get the distinct values across the two columns combined into a single column, and then tag each value with the name of the column it comes from. The desired output is:
col source
a col1, col2
b col1
c col2
d col1
e col2
f col1, col2
I am able to find the distinct values in each column individually, but not to combine them into a single column and add the source label.
Below is the query I am using:
select distinct on (col1, col2) col1, col2 from table
Any suggestions would be really helpful.
You can un-pivot the columns and then aggregate them back:
select u.value, string_agg(distinct u.source, ',' order by u.source)
from data
cross join lateral (
    values ('col1', col1), ('col2', col2)
) as u(source, value)
group by u.value
order by u.value;
Alternatively, if you don't want to list each column, you can convert the row to a JSON value and then un-pivot that:
select x.value, string_agg(distinct x.source, ',' order by x.source)
from data d
cross join lateral jsonb_each_text(to_jsonb(d)) as x(source, value)
group by x.value
order by x.value;
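For the sample data in the question, both variants should produce something along these lines (string_agg joins the column names with a comma):
value  source
a      col1,col2
b      col1
c      col2
d      col1
e      col2
f      col1,col2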
I need to delete some rows based on a ROW_NUMBER. As Postgres doesn't support deleting from a subquery directly, I need to do:
WITH cte AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY col1, col2 ORDER BY col3 DESC, col4 DESC) AS rank_id
    FROM {working_table}
)
DELETE FROM {working_table} AS s
USING cte AS t
WHERE s.col1 = t.col1
  AND s.col2 = t.col2
  AND s.col3 = t.col3
  AND …
  AND rank_id > 3
  AND s.time <= …
That is, I have to do a self-join on all the PK columns, and performance is very bad when the table is big. I'm thinking of inserting the rows I want to keep into another table with just a SELECT, so I don't have to self-join, then dropping the original table and renaming the new one. The original table keeps receiving new rows, so I need to make sure that no rows are lost while I do this. What's the best way to do this?
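For what it's worth, a minimal sketch of that copy-and-swap idea, using the same placeholders as the query above (it does not by itself handle rows that arrive while the copy is running; those would need to be blocked or copied over separately):
BEGIN;
-- copy only the rows the DELETE above would have kept
CREATE TABLE working_table_new AS
SELECT *
FROM (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY col1, col2 ORDER BY col3 DESC, col4 DESC) AS rank_id
    FROM {working_table}
) ranked
WHERE NOT (rank_id > 3 AND time <= …);
ALTER TABLE working_table_new DROP COLUMN rank_id;
-- swap the tables; indexes and constraints must be recreated on the new table
DROP TABLE {working_table};
ALTER TABLE working_table_new RENAME TO {working_table};
COMMIT;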
I have a list of values:
(56957,85697,56325,45698,21367,56397,14758,39656)
and a 'template' row in a table.
I want to do this:
for value in valuelist:
{
insert into table1 (field1, field2, field3, field4)
select value1, value2, value3, (value)
from table1
where ID = (ID of template row)
}
I know how I would do this in code, like C# for instance, but I'm not sure how to 'loop' this while passing a new value into the insert statement. (I know that code makes no sense; I'm just trying to convey what I'm trying to accomplish.)
There is no need to loop here; SQL is a set-based language, and you apply your operations to entire sets of data at once rather than looping through row by row.
Insert statements can take their rows either from an explicit list of values or from the result of a regular select statement, for example:
insert into table1(col1, col2)
select col3
,col4
from table2;
There is nothing stopping you from selecting your data from the same table you are inserting into, which will duplicate all your data:
insert into table1(col1, col2)
select col1
,col2
from table1;
If you want to edit one of these column values, say by incrementing the value currently held, you simply apply that logic in your select statement and make sure the resulting dataset matches your target table in number of columns and data types:
insert into table1(col1, col2)
select col1
,col2+1 as col2
from table1;
Optionally, if you only want to do this for a subset of those values, just add a standard where clause:
insert into table1(col1, col2)
select col1
,col2+1 as col2
from table1
where col1 = <your value>;
Now, if that isn't enough for you to work it out by yourself, you can join your dataset to your list of values to get one copy of the data to be inserted for each value in that list. Because you want each row to join to each value, you can use a cross join:
declare @v table(value int);
insert into @v values(56957),(85697),(56325),(45698),(21367),(56397),(14758),(39656);

insert into table1(col1, col2, value)
select t.col1
      ,t.col2
      ,v.value
from table1 as t
cross join @v as v;
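The table variable above assumes SQL Server; the same cross join also works by putting the VALUES list straight into the query as a derived table. A sketch using the column names and the template-row filter from the question (<template row id> is a placeholder):
insert into table1 (field1, field2, field3, field4)
select t.field1
      ,t.field2
      ,t.field3
      ,v.value
from table1 as t
cross join (
    values (56957),(85697),(56325),(45698),(21367),(56397),(14758),(39656)
) as v(value)
where t.ID = <template row id>;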
I have two tables like
table1 table2
------------ ----------------
col1 col2 col1 col2
I need to count the distinct col1 values from table1 that match table2.col1.
Note: table2.col1 is also distinct.
select count(distinct table1.col1)
from table1,table2
where table1.col1=table2.col1
Because you count the distinct col1 values of table1 and the join only keeps rows that have a match in table2.col1, the result is the number of distinct matching values.
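Equivalently, with explicit join syntax (same tables and columns, just written as an inner join):
select count(distinct table1.col1)
from table1
join table2
  on table1.col1 = table2.col1;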
I'm trying to get the output of the queries within the WITH clause of my final query as CSV or some sort of text files. I only have query access; I'm not allowed to create tables in this database. I have a set of queries that do some calculations on a data set, another set of queries that compute on the previous set, and yet another that calculates on the final set. I don't want to run all of it as three separate queries because the results of the first two are already part of the last one.
WITH
Q1 AS (
    SELECT col1, col2, col3, col4, col5, col6, col7
    FROM table1
),
Q2 AS (
    SELECT AVG(col1) AS col1Avg, MAX(col1) AS col1Max, col2, col3, col4
    FROM Q1
    GROUP BY col2, col3, col4
)
SELECT AVG(col1Avg), col3
FROM Q2
GROUP BY col3
I would like the results from Q1, Q2 and the final select statement, preferably as 3 CSV files, but I could live with all of it in one CSV file. Is this possible?
Thanks!
Edit: Just to clarify, the columns from the queries are very different. I'm definitely pulling more columns from my first query than my second. I've edited the above code a bit to make this more clear.
To combine all the results together you'd use UNION ALL, but the number and data types of the columns must match.
select col1, col2, col3
from blah
union all
select col1, col2, col3
from blah2
union all
... etc
You can reference CTEs in there, of course ...
with
cte_1 as (
    select ... from ...),
cte_2 as (
    select ... from ... cte_1),
cte_3 as (
    select ... from ... cte_2)
select col1, col2, col3
from cte_1
union all
select col1, col2, col3
from cte_2
union all
select col1, col2, col3
from cte_3
If your final output is a CSV then it looks like you have multiple row formats in there (checksums?). If so, in the queries that you union all together, you might like to combine all the columns from each query into one string ...
with
cte_1 as (
    select ... from ...),
cte_2 as (
    select ... from ... cte_1),
cte_3 as (
    select ... from ... cte_2)
select col1||','||col2||','||col3
from cte_1
union all
select col1||','||col2
from cte_2
union all
select col1
from cte_3