How to make a postgres query run faster? - postgresql

I have the following tables. Table1 has more than 25 million rows.
Table1:
chrom strand ref_base alt_base pos gene_ensembl_identifier seq_window_9mers mutated_base seq_window_mut_9mers
----- ------ -------- -------- -------- ----------------------- ----------------- ------------ --------------------
3 1 C T 40457498 ENSG00000168032 ACGCTCTACACACACAG A ACGCTCTAAACACACAG
Table2
seq_window_mut_9mers start substring
-------------------- ----- ---------
ACGCTCTAAACACACAG 1 ACGCTCTAA
ACGCTCTAAACACACAG 2 CGCTCTAAA
ACGCTCTAAACACACAG 3 GCTCTAAAC
ACGCTCTAAACACACAG 4 CTCTAAACA
ACGCTCTAAACACACAG 5 TCTAAACAC
ACGCTCTAAACACACAG 6 CTAAACACA
ACGCTCTAAACACACAG 7 TAAACACAC
ACGCTCTAAACACACAG 8 AAACACACA
ACGCTCTAAACACACAG 9 AACACACAG
I would like to join the two tables on column seq_window_mut_9mers to get the following table.
final_table
chrom strand ref_base alt_base pos gene_ensembl_identifier seq_window_mut_9mers substring
----- ------ -------- -------- -------- ----------------------- -------------------- ---------
3 1 C T 40457498 ENSG00000168032 ACGCTCTAAACACACAG ACGCTCTAA
3 1 C T 40457498 ENSG00000168032 ACGCTCTAAACACACAG CGCTCTAAA
3 1 C T 40457498 ENSG00000168032 ACGCTCTAAACACACAG GCTCTAAAC
3 1 C T 40457498 ENSG00000168032 ACGCTCTAAACACACAG CTCTAAACA
3 1 C T 40457498 ENSG00000168032 ACGCTCTAAACACACAG TCTAAACAC
3 1 C T 40457498 ENSG00000168032 ACGCTCTAAACACACAG CTAAACACA
3 1 C T 40457498 ENSG00000168032 ACGCTCTAAACACACAG TAAACACAC
3 1 C T 40457498 ENSG00000168032 ACGCTCTAAACACACAG AAACACACA
3 1 C T 40457498 ENSG00000168032 ACGCTCTAAACACACAG AACACACAG
I am running the following Postgres query through DbVisualizer. At the moment the query is very slow (I am still waiting for the output after more than 10 minutes).
SELECT
chrom, strand, ref_base, alt_base, pos, gene_ensembl_identifier, table1.seq_window_mut_9mers, table2.substring
FROM table1
LEFT JOIN table2 ON table2.seq_window_mut_9mers = table1.seq_window_mut_9mers;
How can I make it run faster? Any suggestions would be really helpful.
Thanks

It looks like you don't really need to join with table2. You can generate its rows on the fly with the substring function, like this:
select
table1.*,
offsets.start,
substring(seq_window_mut_9mers from offsets.start for 9) as substring
from
table1,
(select generate_series(1,9) as start) as offsets;
It will be much faster than a join.
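To check that the offsets 1 to 9 with a length of 9 really reproduce the rows of Table2, here is a minimal sketch of the same sliding window in Python. The function name nine_mers is only an illustration, and it assumes 1-based start positions, as SQL substring uses.

```python
def nine_mers(window, k=9):
    """Return (start, substring) pairs with 1-based start, like SQL substring()."""
    last_start = len(window) - k + 1
    return [(start, window[start - 1:start - 1 + k])
            for start in range(1, last_start + 1)]


# The 17-base window from the question yields 9 overlapping 9-mers.
rows = nine_mers("ACGCTCTAAACACACAG")
for start, mer in rows:
    print(start, mer)
```

For a 17-character window this gives exactly 9 rows, which is why generate_series(1,9) is enough in the query above.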

Related

kdb+ Q : select from table where column value =

How do I select all rows from a table where a specific column value equals something?
I have tried the following:
select from tablname where columnvalue = value
thanks
You could do:
q)table:([]a:1 2 3 4 5;b:`a`b`c`d`e;c:`hi`bye`bye`bye`hi)
q)table
a b c
-------
1 a hi
2 b bye
3 c bye
4 d bye
5 e hi
q)select from table where c=`bye
a b c
-------
2 b bye
3 c bye
4 d bye
You could do:
q)tbl:([] a:1 2 3;b:4 5 6;c:7 8 9)
q)tbl
a b c
-----
1 4 7
2 5 8
3 6 9
q)select a from tbl
a
-
1
2
3

T-SQL: split string on multiple delimiters

I have been given a T-SQL task: to convert/format names which are in ALL CAPS into Title Case. I have decided that splitting the names into tokens, and capitalizing the first letter out of each token, would be a reasonable approach (I am willing to take advice if there's a better option, especially in T-SQL).
That said, to accomplish this, I'd have to split the name fields on spaces AND dashes, hyphens, etc. Then, once it is tokenized, I can worry about normalizing the case.
Is there any reasonable way to split a string along any delimiter in a list?
If ease and performance are important, then grab a copy of PatExtract8k.
Here's a basic example where I split on any character that is not a letter or number ([^a-z0-9]):
-- Sample String
DECLARE @string VARCHAR(8000) = 'abc.123&xyz!4445556__5566^rrr';
-- Basic Use
SELECT pe.* FROM samd.patExtract8K(@string,'[^a-z0-9]') AS pe;
Output:
itemNumber itemIndex itemLength item
--------------- ----------- ----------- -------------
1 1 3 abc
2 5 3 123
3 9 3 xyz
4 13 7 4445556
5 22 4 5566
6 27 3 rrr
It returns what you need as well as:
the length of the item (ItemLength)
Its position in the string (ItemIndex)
Its ordinal position in the string (ItemNumber)
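If it helps to see what those columns mean, here is a rough Python sketch that produces the same four values. It is not the patExtract8K implementation, just an emulation under two assumptions: the pattern is a single-character class, and matching is case-insensitive (as under a typical case-insensitive SQL Server collation). The name pat_extract is made up for the illustration.

```python
import re


def pat_extract(text, delimiter_pattern):
    """Return (item_number, item_index, item_length, item) rows; item_index is 1-based."""
    delimiter = re.compile(delimiter_pattern, re.IGNORECASE)
    rows = []
    start = None  # 0-based start of the token currently being read
    for i, ch in enumerate(text):
        if delimiter.fullmatch(ch):
            if start is not None:
                rows.append((len(rows) + 1, start + 1, i - start, text[start:i]))
                start = None
        elif start is None:
            start = i
    if start is not None:  # token that runs to the end of the string
        rows.append((len(rows) + 1, start + 1, len(text) - start, text[start:]))
    return rows


items = pat_extract("abc.123&xyz!4445556__5566^rrr", "[^a-z0-9]")
for row in items:
    print(row)
```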
Now against a table. Here we're doing the same thing but I'll explicitly call out the characters I want to use as a delimiter. Here it's any of these characters: *.&,?%/>
-- Sample Table
DECLARE @table TABLE (SomeId INT IDENTITY, SomeString VARCHAR(100));
INSERT @table VALUES('abc***332211,,XXX'),('abc.123&&555%jjj'),('ll/111>ff?12345');
SELECT t.*, pe.*
FROM @table AS t
CROSS APPLY samd.patExtract8K(t.SomeString,'[*.&,?%/>]') AS pe;
This returns:
SomeId SomeString itemNumber itemIndex itemLength item
----------- ------------------- ------------ ---------- ----------- ---------
1 abc***332211,,XXX 1 1 3 abc
1 abc***332211,,XXX 2 7 6 332211
1 abc***332211,,XXX 3 15 3 XXX
2 abc.123&&555%jjj 1 1 3 abc
2 abc.123&&555%jjj 2 5 3 123
2 abc.123&&555%jjj 3 10 3 555
2 abc.123&&555%jjj 4 14 3 jjj
3 ll/111>ff?12345 1 1 2 ll
3 ll/111>ff?12345 2 4 3 111
3 ll/111>ff?12345 3 8 2 ff
3 ll/111>ff?12345 4 11 5 12345
On the other hand, if I wanted to extract the delimiters, I could change the pattern like this: [^*.&,?%/>]. Now the same query returns:
SomeId itemNumber itemIndex itemLength item
----------- -------------------- -------------------- ----------- ---------
1 1 4 3 ***
1 2 13 2 ,,
2 1 4 1 .
2 2 8 2 &&
2 3 13 1 %
3 1 3 1 /
3 2 7 1 >
3 3 10 1 ?
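For the original goal (ALL CAPS names to Title Case), the tokenize-then-capitalize idea can be sketched as below. This is Python rather than T-SQL, purely to show the logic: split on the delimiters while keeping them, capitalize each token, and glue everything back together. The delimiter list (space, hyphen, apostrophe) and the function name title_case are assumptions for the example, not something from the question.

```python
import re


def title_case(name, delimiters=" -'"):
    """Capitalize each token of an ALL CAPS name, keeping the delimiters in place."""
    # The capturing group makes re.split return the delimiters as list items too.
    pattern = "([" + re.escape(delimiters) + "])"
    parts = re.split(pattern, name)
    # str.capitalize upper-cases the first character and lower-cases the rest;
    # delimiter items and empty strings pass through unchanged.
    return "".join(part.capitalize() for part in parts)


print(title_case("MARY-JANE O'NEIL"))
```

Because the delimiters are kept, the result has the same length and punctuation as the input, so nothing needs to be re-joined by hand.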

Number of rows within a group in Oracle

In order to generate a report in iReport I need this query in Oracle 10g.
SCHOOL:
SELECT STID,NAME,DEPT,SUM(CHARGE)
STID | PROG | DEPT | CHARGE
1 1 A 1
2 1 B 2
3 2 A 2
4 2 B 1
5 1 A 2
Desired OUTPUT:
DEPT | PROG | NBER_OF_STID | TOT_CHG
A 1 2 3
2 1 2
B 1 1 2
2 1 1
This is my query:
SELECT DISTINCT DEPT, DISTINCT PROG, COUNT(STID), SUM (CHARGE) TOT_CHG
FROM SCHOOL
GROUP BY DEPT, PROG, STID, CHARGE
Help Thanks.
You need to group by only the columns that aren't going to be aggregated.
Try this:
SELECT DEPT, PROG, COUNT(STID) NBER_OF_STID, SUM (CHARGE) TOT_CHG
FROM SCHOOL
GROUP BY DEPT, PROG
Note: in your query you'll always get a tabular view, so results will be like this:
DEPT | PROG | NBER_OF_STID | TOT_CHG
A 1 2 3
A 2 1 2
B 1 1 2
B 2 1 1
IMHO, the visual formatting (hiding the repeated DEPT values) should be done in the report itself (iReport).
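As a quick sanity check of the GROUP BY DEPT, PROG version, here is a small sketch that runs it on the sample rows. It uses Python's built-in sqlite3 instead of Oracle 10g, which is an assumption made only so the example is self-contained; the ORDER BY is added to make the row order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE school (stid INTEGER, prog INTEGER, dept TEXT, charge INTEGER)")
conn.executemany(
    "INSERT INTO school VALUES (?, ?, ?, ?)",
    [(1, 1, "A", 1), (2, 1, "B", 2), (3, 2, "A", 2), (4, 2, "B", 1), (5, 1, "A", 2)],
)

# Group only by the non-aggregated columns; COUNT and SUM are computed per group.
report = conn.execute(
    """
    SELECT dept, prog, COUNT(stid) AS nber_of_stid, SUM(charge) AS tot_chg
    FROM school
    GROUP BY dept, prog
    ORDER BY dept, prog
    """
).fetchall()

for row in report:
    print(row)
```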

TSQL A recursive update?

I'm wondering whether a recursive update exists in T-SQL (using a CTE).
ID parentID value
-- -------- -----
1 NULL 0
2 1 0
3 2 0
4 3 0
5 4 0
6 5 0
Is it possible to update the column value recursively, e.g. using a CTE, from ID = 6 up to the topmost row?
Yes, it should be. MSDN gives an example:
USE AdventureWorks;
GO
WITH DirectReports(EmployeeID, NewVacationHours, EmployeeLevel)
AS
(SELECT e.EmployeeID, e.VacationHours, 1
FROM HumanResources.Employee AS e
WHERE e.ManagerID = 12
UNION ALL
SELECT e.EmployeeID, e.VacationHours, EmployeeLevel + 1
FROM HumanResources.Employee as e
JOIN DirectReports AS d ON e.ManagerID = d.EmployeeID
)
UPDATE HumanResources.Employee
SET VacationHours = VacationHours * 1.25
FROM HumanResources.Employee AS e
JOIN DirectReports AS d ON e.EmployeeID = d.EmployeeID;
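The MSDN example walks down from a manager to the reports. For the table in the question the walk goes the other way, from a row up through parentID to the root. A minimal sketch of that direction is below, using Python's sqlite3 rather than SQL Server so it can be run anywhere; the table name nodes, the helper mark_ancestors and the new value 1 are all assumptions for the illustration. It starts at ID 4 instead of 6 only to show that rows below the starting point are left alone.

```python
import sqlite3


def mark_ancestors(conn, start_id, new_value):
    """Set value on start_id and every ancestor reached by following parentID."""
    conn.execute(
        """
        WITH RECURSIVE chain(id, parentID) AS (
            SELECT id, parentID FROM nodes WHERE id = ?
            UNION ALL
            SELECT n.id, n.parentID
            FROM nodes AS n
            JOIN chain AS c ON n.id = c.parentID
        )
        UPDATE nodes SET value = ? WHERE id IN (SELECT id FROM chain)
        """,
        (start_id, new_value),
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, parentID INTEGER, value INTEGER)")
conn.executemany(
    "INSERT INTO nodes VALUES (?, ?, 0)",
    [(1, None), (2, 1), (3, 2), (4, 3), (5, 4), (6, 5)],
)

mark_ancestors(conn, 4, 1)
values = conn.execute("SELECT id, value FROM nodes ORDER BY id").fetchall()
print(values)
```

In T-SQL the shape should be the same idea: the anchor member selects the starting row, the recursive member joins on the parent, and the UPDATE joins back to the CTE as in the MSDN example.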

Select max value rows from table column

My table looks like this:
id name count
-- ---- -----
1 Mike 0
2 Duke 2
3 Smith 1
4 Dave 6
5 Rich 3
6 Rozie 8
7 Romeo 0
8 Khan 1
----------------------
I want to select the 5 rows with the highest count (top 5 names by count).
That would look something like:
id name count
-- ---- -----
6 Rozie 8
4 Dave 6
5 Rich 3
2 Duke 2
3 Smith 1
Please help.
Thanks
Here is how:
MySQL:
SELECT * FROM tableName ORDER BY count DESC LIMIT 5
MS SQL:
SELECT TOP 5 * FROM tableName ORDER BY count DESC
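One thing to watch with the sample data: Smith and Khan both have count 1, so which of them takes the fifth place is not guaranteed unless the ORDER BY has a tiebreaker. Here is a small sketch of that, run through Python's sqlite3 (an assumption for portability; its LIMIT syntax matches the MySQL form). The table name people and the choice of id as the tiebreaker are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE people (id INTEGER, name TEXT, "count" INTEGER)')
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [(1, "Mike", 0), (2, "Duke", 2), (3, "Smith", 1), (4, "Dave", 6),
     (5, "Rich", 3), (6, "Rozie", 8), (7, "Romeo", 0), (8, "Khan", 1)],
)

# Highest counts first; ties are broken by the lower id so the result is stable.
top5 = conn.execute(
    'SELECT id, name, "count" FROM people ORDER BY "count" DESC, id ASC LIMIT 5'
).fetchall()

for row in top5:
    print(row)
```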