Sphinx: filter by relation column

I have 2 tables:
apartment
building
Each apartment has a building_id.
The apartment table has a "floor" column, which stores the apartment's floor.
Each building has a "totalFloors" column, which stores the total number of floors (for example, the current apartment is on floor 3 of 9).
I need to filter apartments like "where the apartment's floor is not the last floor".
How can I do this?
I'm using SphinxQL.

Copied from the comments:
SELECT *, totalFloors - floor AS filter FROM index WHERE filter > 0;
That should do it :)
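A side note: in SphinxQL (at least in the classic Sphinx 2.x releases) the WHERE clause can only compare an attribute or a SELECT-list alias against constants, which is why the expression is computed as an alias first and the filter is applied to that alias. If you prefer to state the condition directly, a sketch along the same lines (apartments here is just a placeholder for your actual index name):
SELECT *, floor < totalFloors AS not_last_floor
FROM apartments
WHERE not_last_floor = 1;
not_last_floor evaluates to 1 for apartments below the top floor, so the filter reads the same way as the original requirement.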

Related

Filter by/Aggregate multiple columns in Power BI

I would like to group by item and, for each store, count how many rows there are in my sales data.
table:
Id Item Store Qty
1 A store1 5
2 B store1 2
3 A store2 3
4 B store2 10
....
To group by item I tried:
groupby_item = SUMMARIZE(table, table[Item], "Count", COUNT(table[Item]))
which gives the table:
Item Count
A 2
B 2
but I want to introduce a Store slicer in a visual, and I can't because the Store column is absent from the aggregated table. Can I group by Store, then by Item, and count?
Like in Python you could maybe do:
table.groupby('Item').agg({'Store': 'first', 'Id': 'count'})
to keep the Store information by keeping the first value of Store in each item group.
Would you be able to do that in Power BI? Or is there a better way to do this?
Why create an aggregate table in the first place? You can use the base table in the visual and it would reflect any filters on Store.
The default behavior of PBI visuals is to group categorical columns and aggregate numerical ones.
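If an explicit summary table is still wanted, a sketch (assuming the table really is named table and Id is never blank) is to group by both Store and Item in SUMMARIZE so the Store column survives into the result and can drive a slicer:
groupby_store_item = SUMMARIZE(table, table[Store], table[Item], "Count", COUNT(table[Id]))
A slicer on Store then filters this summary as well, although, as noted above, putting the base table straight into the visual avoids the extra table altogether.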

Tableau - Related Data Source Filter

I have data split between two different tables, at different levels of detail. The first table has transaction data in the format:
category item spend
a 1 10
a 2 5
a 3 10
b 1 15
b 2 10
The second table is a budget by category, in the format:
category limit
a 40
b 30
I want to show three BANs: Total Spend, Total Limit, and Total Limit - Spend, and be able to filter by category across the related data sources (the transaction table is related to the budget table by category). However, I can't seem to get the filter / relationship right: if I use category as a filter from the transaction table and set it to filter all using related data sources, it doesn't filter the Total Limit amount. Using Tableau 2018.1, FYI.
Although your data is split across two tables, they can be joined on the category field and made available as a single data source. You would then be able to use category as a quick filter.
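If the data lives in a database, that join can also be written as a Custom SQL source. A sketch, assuming the tables are named transactions and budget (both names hypothetical), with spend aggregated per category first so the budget limit isn't repeated on every transaction row:
SELECT b.category,
       b."limit" AS total_limit,               -- "limit" is a reserved word in many databases, hence the quotes
       COALESCE(s.spend, 0) AS total_spend,
       b."limit" - COALESCE(s.spend, 0) AS remaining
FROM budget b
LEFT JOIN (
    SELECT category, SUM(spend) AS spend
    FROM transactions
    GROUP BY category
) s ON s.category = b.category;
Filtering this single source by category then affects all three BANs at once.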

Add computed value as new column while also removing duplicates

Let's say I have the following table playgrounds:
serialnumber length breadth country
1 15 10 Brazil
2 12 11 Chile
3 14 10 Brazil
4 14 10 Brazil
Now, I want to add a column area to the table, that is essentially length*breadth.
Obviously, I can do this update:
UPDATE playgrounds SET area = length * breadth WHERE country = 'Brazil';
Using the above statement, I will have to unnecessarily compute length * breadth twice, once each for serial numbers 3 and 4. Is there a way to add a GROUP BY and minimize the amount of calculations?
Something like:
UPDATE playgrounds SET area = length * breadth WHERE country = 'Brazil'
GROUP BY length, breadth;
The first thing to note is that you should not add the area as a column. Data items that happen to be the result of simple arithmetic operations do not need their own column.
The second point is that you don't need to worry about doing a multiplication operation once each for rows 3 and 4. That's almost zero effort for the server.
The third point is that if you are worried about rows 3 and 4, that means they are duplicated, and duplicated data should not be in the database. Consider deleting duplicates as described here: https://wiki.postgresql.org/wiki/Deleting_duplicates
To answer your question:
Is there a way I could add GROUP BY and minimize the amount of calculations?
SELECT DISTINCT ON (1,2,3)
length, breadth, country, length * breadth AS area
FROM playgrounds
ORDER BY 1, 2, 3, serialnumber;
This takes the row with the smallest serialnumber from each set of duplicates. Detailed explanation:
Select first row in each GROUP BY group?
But consider #e4c5's answer and Pavel's comment first. Don't store functionally dependent values that can be computed on the fly cheaply. Just drop duplicate rows and use a view:
To permanently delete dupes with greater serialnumber:
DELETE FROM playgrounds p
WHERE EXISTS (
SELECT 1
FROM playgrounds
WHERE length = p.length
AND breadth = p.breadth
AND country = p.country
AND serialnumber < p.serialnumber
);
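Once the duplicates are gone, it may be worth keeping them out for good. A sketch, assuming that rows sharing the same length, breadth and country really are duplicates in your model (the constraint name is made up):
ALTER TABLE playgrounds
ADD CONSTRAINT playgrounds_length_breadth_country_uni UNIQUE (length, breadth, country);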
Then:
CREATE VIEW playgrounds_plus AS
SELECT *, length * breadth AS area
FROM playgrounds;
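Queries then go against the view and area is computed on the fly; for example:
SELECT * FROM playgrounds_plus WHERE country = 'Brazil';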
Related:
Clean up SQL data before unique constraint

Why Nulls are automatically skipped in Tableau

I created a Tableau view that gives the number of students in each school.
My input dataset is below. I intentionally left some student_name values null.
As you can see, the 3rd and 4th rows have null names.
student_name school
Stev Boston Academy
Mike Florida school
Boston Academy
Boston Academy
Sue Florida school
Jim Florida school
But here nulls are automatically skipped.
Even if I apply a quick filter to include nulls, they are still skipped.
As you can see, there are 2 null names for Boston Academy, so I am expecting a count of 3 for Boston Academy.
Below is the view.
I would like to know how Tableau behaves if student_name is null.
Does it skip nulls? Does it skip nulls even if we apply a filter to include them?
Count() by definition ignores nulls, like the other aggregation functions. More precisely, CNT([Student Name]) returns the number of records with a non-null value for the field [Student Name].
That is standard database behavior.
If you want to count the number of data rows per school, regardless of whether [Student Name] has a value, then you can use CNT(1) (the 1 could be any non-null constant value), or possibly slightly less efficiently SUM(1), or equivalently SUM([Number of Records]).
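The same behavior can be seen in plain SQL, which is where it comes from. A small illustration, assuming a students table holding the rows shown above:
SELECT school,
       COUNT(student_name) AS named_students,  -- ignores rows where student_name is NULL
       COUNT(*)            AS all_rows         -- counts every row, NULL names included
FROM students
GROUP BY school;
For Boston Academy this returns 1 and 3 respectively, matching what the Tableau view shows.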

Need help building complex multi-table queries

This question is something that a lot of people learning bioinformatics and new to DNA data analysis are struggling with:
Let's say I have 20 tables with the same column headings. Each table represents a patient sample and each row represents a locus (site) that has mutated in that sample. Each site is uniquely identified by two columns together: chromosome number and base number (e.g. 1 and 43535, 1 and 33456, 1 and 3454353). There are several columns which give different characteristics of each mutation, including a column called Gene which gives the gene at that site. Multiple sites can be mutated in a gene, meaning the Gene column can have the same value multiple times in one table.
I want to query all these tables at the same time by, let's say, Gene. I input a value from the Gene column, and I want as output the names of all the tables (samples) in which the gene name is present in the Gene column, and also (preferably) the entire line(s) for each sample, so that I can compare the characteristics of the mutation in that gene across multiple samples on one output page.
I also want to input a number, say 4, and get as output a list of genes which have mutated in at least 4 of the 20 patients (a list of genes whose names appear in the Gene column in at least 4 of the 20 tables).
What is the "easiest way" to do this? What is the "best way", assuming I want to make more flexible queries besides these two?
I am an MD and do not have any particular software expertise, but I am willing to put in the necessary time to build this query system. A few lines of code won't put me off.
Eg data:
Func Gene ExonicFunc Chr Start End Ref Obs
exonic ACTRT2 nonsynonymous SNV 1 2939346 2939346 G A
exonic EIF4G3 nonsynonymous SNV 1 21226201 21226201 G A
exonic CSMD2 nonsynonymous SNV 1 34123714 34123714 C T
This is just a third of the columns. Multiple columns were removed to fit the page size here...
Thank you.
Create a view that unions all the tables together. You should probably add additional information about which table each row comes from:
create view allpatients as
select 'a' as whichtable, t.*
from tableA t
union all
select 'b' as whichtable, t.*
from tableB t
...
You might find that it is easier to "instantiate" the view by creating a table with all patients. Just have a stored procedure that recreates the table by combining the 20 tables.
Alternatively, you could find that you have large individual tables (millions of rows). In this case, you would want to treat each of the original tables as a partition.
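With the view in place, both of the original requests become single queries against it. A sketch (the gene value 'EIF4G3' is just taken from the sample rows above; if the backend is Access, which lacks COUNT(DISTINCT), the nested-grouping form in the next answer does the same job):
-- all samples (tables) in which a given gene is mutated, with the full rows
select *
from allpatients
where Gene = 'EIF4G3';
-- genes mutated in at least 4 of the 20 samples
select Gene
from allpatients
group by Gene
having count(distinct whichtable) >= 4;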
If what you have is a bunch of Excel files, you can import them all into the same table, with a distinct column for patient id. There is no need to create 20 different tables for this -- in fact, it would be a bad idea.
Once you do, go to Access's query design, switch to SQL view, and use these queries:
To create a query that returns all fields for the input gene name:
select *
from gene_data
where gene = [GeneName]
To create a query that returns gene names that are mutated in at least 4 samples (the nested grouping collapses each gene/sample pair to one row first, since Access SQL has no COUNT(DISTINCT)):
select gene
from
(select gene, sample_id
from gene_data
group by gene, sample_id) g
group by gene
having count(sample_id) >= 4
After this, change to design view -- you'll see how to create similar queries using the GUI.