Converting complex query with inner join to tableau - tableau-api

I have a query like this, which we use to generate data for our custom dashboard (A Rails app) -
SELECT AVG(wait_time) FROM (
SELECT TIMESTAMPDIFF(MINUTE,a.finished_time,b.start_time) wait_time
FROM (
SELECT max(start_time + INTERVAL avg_time_spent SECOND) finished_time, branch
FROM mytable
WHERE name IN ('test_name')
AND status = 'SUCCESS'
GROUP by branch) a
INNER JOIN
(
SELECT MIN(start_time) start_time, branch
FROM mytable
WHERE name IN ('test_name_specific')
GROUP by branch) b
ON a.branch = b.branch
HAVING avg_time_spent between 0 and 1000)t
GROUP BY week
Now I am trying to port this to tableau, and I am not being able to find a way to represent this data in tableau. I am stuck at how to represent the inner group by in a calculated field. I can also try to just use a custom sql data source, but I am already using another data source.
columns in mytable -
start_time
avg_time_spent
name
branch
status
I think this could be achieved new Level Of Details formulas, but unfortunately I am stuck at version 8.3

Save custom SQL for rare cases. This doesn't look like a rare case. Let Tableau generate the SQL for you.
If you simply connect to your table, then you can usually write calculated fields to get the information you want. I'm not exactly sure why you have test_name in one part of your query but test_name_specific in another, so ignoring that, here is a simplified example to a similar query.
If you define a calculated field called worst_case_test_time
datediff(min(start_time), dateadd('second', max(start_time), avg_time_spent)), which seems close to what your original query says.
It would help if you explained what exactly you are trying to compute. It appears to be some sort of worst case bound for avg test time. There may be an even simpler formula, but its hard to know without a little context.
You could filter on status = "Success" and avg_time_spent < 1000, and place branch and WEEK(start_time) on say the row and column shelves.
P.S. Your query seems a little off. Don't you need an aggregation function like MAX or AVG after the HAVING keyword?

Related

where column in (single value) performance

I am writing dynamic sql code and it would be easier to use a generic where column in (<comma-seperated values>) clause, even when the clause might have 1 term (it will never have 0).
So, does this query:
select * from table where column in (value1)
have any different performance than
select * from table where column=value1
?
All my test result in the same execution plans, but if there is some knowledge/documentation that sets it to stone, it would be helpful.
This might not hold true for each and any RDBMS as well as for each an any query with its specific circumstances.
The engine will translate WHERE id IN(1,2,3) to WHERE id=1 OR id=2 OR id=3.
So your two ways to articulate the predicate will (probably) lead to exactly the same interpretation.
As always: We should not really bother about the way the engine "thinks". This was done pretty well by the developers :-) We tell - through a statement - what we want to get and not how we want to get this.
Some more details here, especially the first part.
I Think this will depend on platform you are using (optimizer of the given SQL engine).
I did a little test using MySQL Server and:
When I query select * from table where id = 1; i get 1 total, Query took 0.0043 seconds
When I query select * from table where id IN (1); i get 1 total, Query took 0.0039 seconds
I know this depends on Server and PC and what.. But The results are very close.
But you have to remember that IN is non-sargable (non search argument able), it will not use the index to resolve the query, = is sargable and support the index..
If you want the best one to use, You should test them in your environment because they both work so good!!

Construct SQL where clause to pull data within Power Query

Basically I want to retrieve rows of data that meet my clause conditions using Power Query.
I got 400 rows of lookup values in my spreadsheet.
Each row represent 1 lookup code for example, code AAA1, AAB2 and so on
So lets say I have a select statement and I want to construct the where clauses using the above codes so my end sql statement will look like
select * from MyTable where Conditions in ('AA1', 'AAB2')
so so far I have this
let
Source = Excel.CurrentWorkbook(){[Name="Table5"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Form ID",
Int64.Type}}),
test = Sql.Database("myserver", "myDB", [Query="SELECT * FROM myTable where" & #"Changed Type" & "])"
in
test
Obviously that didnt work but thats my pseduo scenario anyway.
Please could you advice what to do?
Thank you
Peddie
I would create a "lookup" Power Query based on the Excel table. I would set the "Load To" properties to "Only Create Connection".
Then I would start the main Query by connecting to the SQL server using the Navigator to select "MyTable". Then I would add a Merge step to the main Query, to join to the "lookup" Query, matching the "Conditions" column to the "lookup" code. I would set the Join Type to "Inner". The Merge properties window will show you visually if the 2 columns you select actually contain matching data.
This approach does not require any coding, and is easier to build, extend and maintain.
Mike Honey's join is best for your problem, but here's a more general solution if you find yourself needing other logic in your where clause.
Normally Power query only generates row filters on an equality expression, but you can put any code you want in a Table.SelectRows filter, like each List.Contains({"AA1", "AAB2"}, [Conditions])
So for your table, your query would look something like:
let
Source = Excel.CurrentWorkbook(){[Name="Table5"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Form ID", Int64.Type}}),
test = Sql.Database("myserver", "myDB"),
yourTable = test{[Name="myTable"]}[Data],
filtered = Table.SelectRows(yourTable, each List.Contains(#"Changed Type"[Form ID], [Conditions]))
in
filtered
The main downside to using the library functions is that Table.SelectRows only knows how to generate SQL where clauses for specific expression patterns, so the row filter probably runs on your machine after downloading the whole table, instead of having the Sql Server run the filter.

COUNT(field) returns correct amount of rows but full SELECT query returns zero rows

I have a UDF in my database which basically tries to get a station (e.g. bus/train) based on some input data (geographic/name/type). Inside this function i try to check if there are any rows matching the given values:
SELECT
COUNT(s.id)
INTO
firsttry
FROM
geographic.stations AS s
WHERE
ST_DWithin(s.the_geom,plocation,0.0017)
AND
s.name <-> pname < 0.8
AND
s.type ~ stype;
The firsttry variable now contains the value 1. If i use the following (slightly extended) SELECT statement i get no results:
RETURN query SELECT
s.id, s.name, s.type, s.the_geom,
similarity(
regexp_replace(s.name::text,'(Hauptbahnhof|Hbf)','Hbf'),
regexp_replace(pname::text,'(Hauptbahnhof|Hbf)','Hbf')
)::double precision AS sml,
st_distance(s.the_geom,plocation) As dist from geographic.stations AS s
WHERE ST_DWithin(s.the_geom,plocation,0.0017) and s.name <-> pname < 0.8
AND s.type ~ stype
ORDER BY dist asc,sml desc LIMIT 1;
the parameters are as follows:
stype = '^railway'
pname = 'Amsterdam Science Park'
plocation = ST_GeomFromEWKT('SRID=4326;POINT(4.9492530 52.3531670)')
the tuple i need to be returned is:
id name type geom (displayed as ST_AsText)
909658;"Amsterdam Sciencepark";"railway_station";"POINT(4.9482893 52.352904)"
The same UDF returns quite well for a lot of other stations, but this is one (of more) which just won't work. Any suggestions?
P.S. The use of the <-> operator is coming from the pg_trgm module.
Some ideas on how to troubleshoot this:
Break your troubleshooting into steps. Start with the simplest query possible. No aggregates, just joins and no filters. Then add filters. Then add order by, then add aggregates. Look at exactly where the change occurs.
Try reindexing the database.
One possibility that occurs to me based on this is that it could be a corrupted index used in the second query but not the first. I have seen corrupted indexes in the past and usually they throw errors but at least in theory they should be able to create a problem like this.
If this is correct, your query will suddenly return rows if you remove the ORDER BY clause.
If you have a corrupted index, then you need to pay close attention to hardware. Is the RAM ECC? Is the processor overheating? How are you disks doing?
A second possibility is that there is a typo on a join condition of filter statement. Normally this is something I would suspect first but it is easy enough to weed out index problems to start there. If removing the ORDER BY doesn't change things, then chances are it is a typo. If you can't find a typo, then try reindexing.

SQL Select rows by comparison of value to aggregated function result

I have a table listing (gameid, playerid, team, max_minions) and I want to get the players within each team that have the lowest max_minions (within each team, within each game). I.e. I want a list (gameid, team, playerid_with_lowest_minions) for each game/team combination.
I tried this:
SELECT * FROM MinionView GROUP BY gameid, team
HAVING MIN(max_minions) = max_minions;
Unfortunately, this doesn't seem to work as it seems to select a random row from the available rows for each (gameid, team) and then does the HAVING comparison. If the randomly selected row doesn't match, it's simply skipped.
Using WHERE won't work either since you can't use aggregate functions within WHERE clauses.
LIMIT won't work since I have many more games and LIMIT limits the total number of rows returned.
Is there any way to do this without adding another table/view that contains (gameid, teamid, MIN(max_minions))?
Example data:
sqlite> SELECT * FROM MinionView;
gameid|playerid|team|champion|max_minions
21|49|100|Champ1|124
21|52|100|Champ2|18
21|53|100|Champ3|303
21|54|200|Champ4|356
21|57|200|Champ5|180
21|58|200|Champ6|21
64|49|100|Champ7|111
64|50|100|Champ8|208
64|53|100|Champ9|8
64|54|200|Champ0|226
64|55|200|ChampA|182
64|58|200|ChampB|15
...
Expected result (I mostly care about playerid, but included champion, max_minions here for better overview):
21|52|100|Champ2|18
21|58|200|Champ6|21
64|53|100|Champ9|8
64|58|200|ChampB|15
...
I'm using Sqlite3 under Python 3.1 if that matters.
This is in SQL Server, hopefully the syntax works for you too:
SELECT
MV.*
FROM
(
SELECT
team, gameid, min(max_minions) as maxmin
FROM
MinionView
GROUP BY
team, gameid
) groups
JOIN MinionView MV ON
MV.team = groups.team
AND MV.gameid = groups.gameid
AND MV.max_minions = groups.maxmin
In words, first you make the usual grouping query (the nested one). At this point you have the min value for each group but you don't know to which row it belongs. For this you join with the original table and match the "keys" (team, game and min) to get the other columns as well.
Note that if a team will have more than one member with the same value for max_minions then all these rows will be selected. If you only want one of them then that's probably a bit more complicated.

How to avoid T-SQL function being called more times when needing combined results?

I have two T-SQL scalar functions that both perform calculations over large sums of data (taking 'a lot' of time) and return a value, e.g. CalculateAllIncomes(EmployeeID) and CalculateAllExpenditures(EmployeeID).
I run a select statement that calls these and returns results for each Employee. I also need the balance of each employee calculated as AllIncomes-AllExpenditures.
I have a function GetBalance(EmployeeID) that calls the two above mentioned functions and returns the result {CalculateAllIncomes(EmployeeID) - CalculateAllExpenditures(EmployeeID)}. But if I do:
Select CalculateAllIncomes(EmployeeID), CalculateAllExpenditures(EmployeeID), GetBalance(EmployeeID) .... the functions CalcualteAllIncomes() and CalculateAllExpenditures get called twice (once explicitly and once inside the GetBalance funcion) and so the resulting query takes twice as long as it should.
I'd like to find some better solution. I tried:
select alculateAllIncomes(EmployeeID), AS Incomes, CalculateAllExpenditures
(EmployeeID) AS Expenditures, (Incomes - Expenditures) AS Balance....
but it throws errors:
Invalid column name Incomes and
Invalid column name Expenditures.
I'm sure there has to be a simple solution, but I cannot figure it out. For some reason it seems that I am not able to use column Aliases in the SELECT clause. Is it so? And if so, what could be the workaround in this case?
Thanks for any suggestions.
Forget function calls: you can probably do it everything in one normal query.
Function calls misused (trying for OO encapsulation) force you into this situation. In addition, if you have GetBalance(EmployeeID) per row in the Employee table then you are CURSORing over the table. And you've now compounded this by multiple calls too.
What you need is something like this:
;WITH cSUMs AS
(
SELECT
SUM(CASE WHEN type = 'Incomes' THEN SomeValue ELSE 0 END) AS Income),
SUM(CASE WHEN type = 'Expenditures' THEN SomeValue ELSE 0 END) AS Expenditure)
FROM
MyTable
WHERE
EmployeeID = #empID --optional for all employees
GROUP BY
EmployeeID
)
SELECT
Income, Expenditure, Income - Expenditure
FROM
cSUMs
I once got a query down from a weekend to under a second by eliminating this kind of OO thinking from a bog standard set based aggregate query.