Need help building complex multi-table queries - tsql

This is something that many people who are learning bioinformatics and are new to DNA data analysis struggle with:
Let's say I have 20 tables with the same column headings. Each table represents a patient sample and each row represents a locus (site) which has mutated in that sample. Each site is uniquely identified by two columns together: chromosome number and base number (e.g. 1 and 43535, 1 and 33456, 1 and 3454353). There are several columns which give different characteristics of each mutation, including a column called Gene which gives the gene at that site. Multiple sites can be mutated in a gene, meaning the Gene column can have the same value multiple times in one table.
I want to query all these tables at the same time by, let's say, Gene. I input a value from the Gene column and I want as output the names of all the tables (samples) in which that gene name is present in the Gene column, and preferably also the entire line(s) for each sample, so that I can compare the characteristics of the mutation in that gene across multiple samples on one output page.
I also want to input a number, say 4, and get as output a list of genes which have mutated in at least 4 of 20 patients (a list of genes whose names appear in the Gene column in at least 4 of 20 tables).
What is the "easiest way" to do this? What is the "best way", assuming I want to make more flexible queries besides these two?
I am an MD and do not have any particular software expertise, but I am willing to put in the necessary time to build this query system. A few lines of code won't put me off.
Eg data:
Func Gene ExonicFunc Chr Start End Ref Obs
exonic ACTRT2 nonsynonymous SNV 1 2939346 2939346 G A
exonic EIF4G3 nonsynonymous SNV 1 21226201 21226201 G A
exonic CSMD2 nonsynonymous SNV 1 34123714 34123714 C T
This is just a third of the columns. Multiple columns were removed to fit the page size here...
Thank you.

Create a view that unions all the tables together. You should probably add a column that records which table each row comes from:
create view allpatients as
select 'a' as whichtable, t.*
from tableA t
union all
select 'b' as whichtable, t.*
from tableB t
...
You might find that it is easier to "instantiate" the view by creating a table with all patients. Just have a stored procedure that recreates the table by combining the 20 tables.
Alternatively, you could find that you have large individual tables (millions of rows). In this case, you would want to treat each of the original tables as a partition.
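As a rough illustration of the view idea, here is a minimal sketch in Python using the built-in sqlite3 module rather than SQL Server. The two tables, the reduced column set, and the second CSMD2 row are hypothetical stand-ins for the 20 patient tables; the helper name rows_for_gene is also made up for this example.

```python
import sqlite3

# In-memory database with two hypothetical per-patient tables
# (a reduced column set; the real tables have many more columns).
conn = sqlite3.connect(":memory:")
for name in ("tableA", "tableB"):
    conn.execute(
        f"create table {name} (Func text, Gene text, Chr integer, Start integer)"
    )
conn.executemany("insert into tableA values (?, ?, ?, ?)", [
    ("exonic", "ACTRT2", 1, 2939346),
    ("exonic", "CSMD2", 1, 34123714),
])
conn.executemany("insert into tableB values (?, ?, ?, ?)", [
    ("exonic", "CSMD2", 1, 34123714),
    ("exonic", "CSMD2", 1, 34200000),  # hypothetical second site in the same gene
])

# The view tags every row with the table (sample) it came from.
conn.execute("""
    create view allpatients as
    select 'a' as whichtable, t.* from tableA t
    union all
    select 'b' as whichtable, t.* from tableB t
""")

def rows_for_gene(gene):
    """Return every row, from every sample, whose Gene column matches."""
    return conn.execute(
        "select * from allpatients where Gene = ? order by whichtable, Start",
        (gene,),
    ).fetchall()

csmd2_rows = rows_for_gene("CSMD2")
for row in csmd2_rows:
    print(row)
```

The first element of each returned row is the sample label, so one query answers both "which samples have this gene" and "what do the mutations look like".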

If what you have is a bunch of Excel files, you can import them all into the same table, with a distinct column for patient id. There is no need to create 20 different tables for this -- in fact, it would be a bad idea.
Once you do, go to Access' query design, SQL view and use these queries:
To create a query that returns all fields for the input gene name:
select *
from gene_data
where gene = [GeneName]
To create a query that returns gene names that are mutated in at least 4 samples:
select gene
from
(select gene, sample_id
from gene_data
group by gene, sample_id) g
group by gene
having count(sample_id) >= 4
After this, change to design view -- you'll see how to create similar queries using the GUI.
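For anyone who wants to try the single-table layout outside Access, here is a minimal sketch using Python's built-in sqlite3 module. The table contents and the helper name genes_in_at_least are hypothetical, and the query uses count(distinct ...) in place of the nested subquery, which SQLite supports; the threshold is written as "at least N samples" to match the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table gene_data (sample_id text, gene text, chr integer, start integer)"
)
# Hypothetical rows: one table for all samples, with a sample_id column.
conn.executemany("insert into gene_data values (?, ?, ?, ?)", [
    ("s1", "CSMD2", 1, 34123714),
    ("s1", "CSMD2", 1, 34200000),  # second site in the same gene, same sample
    ("s2", "CSMD2", 1, 34123714),
    ("s3", "CSMD2", 1, 34123714),
    ("s1", "ACTRT2", 1, 2939346),
    ("s2", "EIF4G3", 1, 21226201),
])

def genes_in_at_least(min_samples):
    """Genes whose name appears in at least min_samples distinct samples."""
    query = """
        select gene
        from gene_data
        group by gene
        having count(distinct sample_id) >= ?
        order by gene
    """
    return [gene for (gene,) in conn.execute(query, (min_samples,))]

print(genes_in_at_least(3))
```

Counting distinct sample ids matters because a gene with several mutated sites in one sample should still count as one sample.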

Related

Aminoacid screening library in Knime

I have a task to create a tetrapeptide screening library from amino acids using Knime. Sadly, I have never used Knime before. I need to create a workflow with all 20 amino acids, multiply it by another 20, then multiply the result by another 20, and repeat to get the final set of tetrapeptides. Can someone suggest how to input the amino acids in Knime? Thank you very much!
Use a Table Creator node to enter the amino acid single-letter codes, one per row. Now use a Cross Joiner node to cross-join the table to itself - you should now have a table with rows like:
A|A
A|C
etc.
Now put this table into both inputs of a second Cross Joiner node, which should give you now quite a long table starting something like:
A|A|A|A
A|A|A|C
A|C|A|A
A|C|A|C
etc.
Now use a Column Aggregator node: select all columns as aggregation columns, set the aggregation method to Concatenate, and change the delimiter to an empty string.
This will give you a table with a single column, 'Peptide':
AAAA
AAAC
ACAA
ACAC
etc.
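If you want to sanity-check the size and contents of the Knime output outside the workflow, the same cross product can be sketched in a few lines of Python. This is not part of the Knime workflow; the function name tetrapeptides is made up here, and the alphabet is assumed to be the 20 standard one-letter codes. The row order may differ from what Knime produces.

```python
from itertools import product

# The 20 standard amino acid one-letter codes (assumed alphabet).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def tetrapeptides(alphabet=AMINO_ACIDS, length=4):
    """Every sequence of the given length over the alphabet, as strings."""
    return ["".join(letters) for letters in product(alphabet, repeat=length)]

peptides = tetrapeptides()
print(len(peptides))   # 20 ** 4 = 160000
print(peptides[:2])    # ['AAAA', 'AAAC']
```

The final table should therefore have 160000 rows, which is a quick check that no Cross Joiner step was missed.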
If you want the output as a chemical structure, then as of v1.36.0 the Vernalis community contribution contains a node Speedy Sequence to SMILES which will convert the sequence to a SMILES string (make sure you select the option that your input column is a Protein!)
The full workflow is as shown:

In Tableau: Count entries Column B that have the same value in Column A

I have a table with two columns. For simplicity, let's say Column A is general contractors and Column B is subcontractors. Any given general contractor can have a variable number of subcontractors. I would like to add a third column that simply displays a count of how many subcontractors each contractor has.
I have tried several calculations using the "fixed" and "include" functions as well as the "Count" and "CountD" functions, and have tried using the count functions directly (right-click>>measure>>count), but all I get are 1's in the resulting column.
The data come from a table where there is one row for each subcontractor, so if a general contractor had 5 subcontractors there would be 5 rows in which the general contractor repeats, each with a different subcontractor next to it.
There are far too many different general contractors to use conditional statements.
Is what I'm doing possible and what other things should I try?
Try this
{Fixed [general contractor]: Countd([sub contractor]) }
Add this field to your view after contractor and sub-contractor. You'll get that count (say, 5) repeated in each row for that general contractor.
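To make it clearer what the FIXED expression computes, here is a small Python sketch of the same logic. It is only an illustration of the semantics, not Tableau code; the contractor names and the function name with_sub_count are made up.

```python
from collections import defaultdict

# Hypothetical data: one row per (general contractor, subcontractor) pair.
rows = [
    ("Acme Builders", "Sparks Electric"),
    ("Acme Builders", "Pipeworks"),
    ("Acme Builders", "Pipeworks"),        # duplicate pair, counted once
    ("Bolt Construction", "Sparks Electric"),
]

def with_sub_count(rows):
    """Append, to every row, the distinct subcontractor count of its contractor."""
    subs = defaultdict(set)
    for contractor, sub in rows:
        subs[contractor].add(sub)
    # The count is computed per contractor, then repeated on each of its rows.
    return [(contractor, sub, len(subs[contractor])) for contractor, sub in rows]

result = with_sub_count(rows)
for row in result:
    print(row)
```

The count is fixed at the contractor level regardless of what else is in the row, which is why it does not collapse to 1 the way a plain row-level count does.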

Compare two sql data using mysqlworkbench

I have 3 tables in MySQL Workbench. Two of them, ch (17 million rows) and cl (9 million rows), are supposed to be one table and need to be combined; the other table is named alldoc (121k rows).
Basically I need to combine ch and cl into one table and compare the result with the alldoc data. Technically they are supposed to be the same, but people made mistakes, which is why I need to compare them. There are 100 columns.
I plan to write the query out until I reach 100, because all the data have 100 columns. Only the row counts are different.
Thank you in advance. I know it is complicated, but I really need to compare these two data sets by writing a query.

Exclude Combination of Data Items From One Table From Another

I have a view, A, with 20 columns which forms my primary data. I have a table B which lists some of the columns from A and contains data I want to exclude from A.
For example, table B will have 6 columns, 2 of which are 'customer' and 'country' and contain the data 'HP' and 'America'. These columns exist in A. I want to write a query that brings back data from A except for any rows that have the combination HP and America.
There are 6 columns and table B can have any combination of rows. Anywhere between 1 and all 6 columns could be filled in, or there could be a row which has 5 columns filled in, another row with a different 5 columns filled in, and so on.
I want to be prepared for any possible combination of the 6 columns, and I want the query to search A for each combination in B and exclude any rows with that data.
I have tried this
SELECT *
FROM A T1
WHERE not EXISTS
(SELECT * FROM [dbo].[ExcludedItems] T2
WHere ReportNumber=1
AND
(
T1.job=ISNULL(T2.job,T1.job) and T1.CustomerName=ISNULL(T2.CustomerName,T1.CustomerName) and
T1.COUNTRY= ISNULL(T2.COUNTRY,T1.COUNTRY) and T1.CONTINENT=ISNULL(T2.CONTINENT,T1.CONTINENT) AND
T1.continer= ISNULL(T2.ContainerName, T1.continer) and T1.UnscheduledJob= ISNULL(T2.unscheduledJob, T1.UnscheduledJob) and
T1.[Price]= ISNULL(T2.Price, T1.Price) and
T1.[Haulage]= ISNULL(T2.[Haulage], T1.[Haulage]) and
T1.SiteAdress= ISNULL(T2.SiteAddress, T1.SiteAdress) and T1.Delta=ISNULL(T2.Delta, T1.Delta) and
T1.Cost= ISNULL(T2.Cost, T1.Cost)
)
)
The problem is that the result set is not correct. I have tried with a smaller column sample and was able to exclude the correct combination of Customer and Country, but when I introduce a 3rd or 4th column combination I can eyeball the result set and immediately see it's incorrect. I'm not sure if I have to use multiple NOT EXISTS clauses, one for each possible combination; I was hoping not to.
A constraint is A has to be a view not a table. Otherwise I would have used variables in some manner and wrapped the whole thing in a stored procedure.
Appreciate any help, fall back is to manually add to the code each time an item combination is supplied in B!
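One thing that may be worth checking, assuming some of the compared columns in A can themselves be NULL (the question does not say whether they can): ISNULL(T2.col, T1.col) falls back to T1.col, and T1.col = T1.col is not true when T1.col is NULL, so such a row never matches an exclusion rule even when that column is meant as "don't care". The sketch below reproduces this in Python's built-in sqlite3, with ifnull standing in for T-SQL's ISNULL, a plain table standing in for the view A, and hypothetical three-column data. It is an illustration of that one possibility, not a diagnosis of the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table A (customer text, country text, price real)")
conn.execute("create table excluded (customer text, country text, price real)")
conn.executemany("insert into A values (?, ?, ?)", [
    ("HP", "America", 10.0),
    ("HP", "America", None),   # price is NULL in A itself
    ("HP", "France", 10.0),
    ("Dell", "America", 10.0),
])
# One exclusion rule: HP + America, any price (NULL means "don't care").
conn.execute("insert into excluded values ('HP', 'America', NULL)")

# Same shape as the query in the question, reduced to three columns.
ISNULL_STYLE = """
    select customer, country, price from A t1
    where not exists (
        select 1 from excluded t2
        where t1.customer = ifnull(t2.customer, t1.customer)
          and t1.country  = ifnull(t2.country,  t1.country)
          and t1.price    = ifnull(t2.price,    t1.price))
    order by customer, country
"""

# Alternative: treat a NULL in the rule as a wildcard explicitly.
OR_STYLE = """
    select customer, country, price from A t1
    where not exists (
        select 1 from excluded t2
        where (t2.customer is null or t1.customer = t2.customer)
          and (t2.country  is null or t1.country  = t2.country)
          and (t2.price    is null or t1.price    = t2.price))
    order by customer, country
"""

isnull_rows = conn.execute(ISNULL_STYLE).fetchall()
or_rows = conn.execute(OR_STYLE).fetchall()
print(isnull_rows)  # the HP/America row with a NULL price survives
print(or_rows)      # both HP/America rows are excluded
```

If none of the columns in A can be NULL, the two forms behave the same and the cause lies elsewhere.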

Showing 4 records in a portal from same table

I have a table that contains students' results. These results are generally broken into four types: term1, term2, term3 and term4. So over a year, a student may have up to four records in that table containing his results.
I want to create a layout that contains a portal that will show all 4 records in a single portal row. Is there any way to do this? Or any workaround?
The reason I do not want to display the records as four rows in the portal is that there are different subjects, and it would not look right if each subject occupied four rows, since a student may take many subjects.
I can think of two ways to approach this, both of which would require a relationship from your Results table occurrence to another table occurrence based on Results; let's call it Results~SameStudentID. (The matching field would be the foreign key to the Student table, FK_StudentID = FK_StudentID.)
The first way: create 4 calculation fields in your Results table: Result_1, Result_2, Result_3 and Result_4. The formula to use for each calculation (starting from the context of the Results table occurrence), with n running from 1 to 4, would be:
GetNthRecord ( Results~SameStudentID::Result ; n )
Then, simply include the 4 "Result_n" fields in your portal.
The second way: create just one field, Results_1_4, with the following formula:
Substitute ( List ( Results~SameStudentID::Result ) ; ¶ ; " " )