Aggregate at either of two levels - tableau-api

In Tableau, I am joining two tables where a header can have multiple details
Work Order Header
Work Order Details
The joined data looks like this:
Header.ID | Header.ManualTotal | Details.ID | Details.LineTotal
A | 1000 | 1 | 550
A | 1000 | 2 | 35
A | 1000 | 3 | 100
B | 335 | 1 | 250
B | 335 | 2 | 300
C | null | 1 | 50
C | null | 2 | 25
C | null | 3 | 5
C | null | 4 | 5
Where there is a manual total, use that; if there is no manual total, use the sum of the line totals:
ID | Total
A | 1000
B | 335
C | 85
I tried something like this:
ifnull( sum({fixed [Header ID] : [Manual Total] }), sum([Line Total]) )
Basically, I need to use IFNULL so that the manual total is used if it exists, and the sum of the line totals is used if it doesn't.
Please advise on how to use LODs or some other solution to get the correct answer.

Here is a solution that does not require a level-of-detail calculation.
Just try this:
Use an inner join on the ID of the two tables.
Create this calculation: ifnull(median([Manual Total]), sum([Line Total]))
Insert AGG(your_calculation) into your sheet.
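For reference, the same logic can also be written in plain SQL; this is only a sketch to illustrate the intended result, and the table and column names (work_order_header, work_order_details) are assumptions based on the question, not the actual schema:
-- Per header: take the manual total when present, otherwise the sum of the line totals.
SELECT h.id,
       COALESCE(h.manual_total, SUM(d.line_total)) AS total
FROM work_order_header h
JOIN work_order_details d ON d.header_id = h.id
GROUP BY h.id, h.manual_total;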

Related

Get the max value for each column in a table

I have a table for player stats like so:
player_id | game_id | rec | rec_yds | td | pas_att | pas_yds | ...
--------------------------------------------------------
1 | 3 | 1 | 5 | 0 | 3 | 20 |
2 | 3 | 0 | 8 | 1 | 7 | 20 |
3 | 3 | 3 | 9 | 0 | 0 | 0 |
4 | 3 | 5 | 15 | 0 | 0 | 0 |
I want to return the max values for every column in the table except player_id and game_id.
I know I can return the max of one single column by doing something like so:
SELECT MAX(rec) FROM stats
However, this table has almost 30 columns, so I would just be repeating the query below for all 30 stats, replacing only the name of the stat.
SELECT MAX(rec) as rec FROM stats
This would get tedious quickly, and it won't scale.
Is there any way to loop over the columns, getting every column in the table and returning its max value, like so:
player_id | game_id | rec | rec_yds | td | pas_att | pas_yds | ...
--------------------------------------------------------
4 | 3 | 5 | 15 | 1 | 7 | 20 |
You can get the maximum of multiple columns in a single query:
SELECT
    MAX(rec) AS rec_max,
    MAX(rec_yds) AS rec_yds_max,
    MAX(td) AS td_max,
    MAX(pas_att) AS pas_att_max,
    MAX(pas_yds) AS pas_yds_max
FROM stats
However, there is no way to dynamically get an arbitrary number of columns. You could dynamically build the query by loading all column names of the table, then apply conditions such as "except player_id and game_id", but that cannot be done within the query itself.
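As a sketch of that dynamic approach, on a database that exposes information_schema.columns and string_agg (for example Postgres), you could generate the statement from the catalog and then execute the generated text; the table and column names follow the question, everything else here is an assumption:
-- Build a "SELECT MAX(...) ..." statement covering every column except player_id and game_id.
SELECT 'SELECT '
       || string_agg('MAX(' || column_name || ') AS ' || column_name || '_max', ', ')
       || ' FROM stats' AS generated_query
FROM information_schema.columns
WHERE table_name = 'stats'
  AND column_name NOT IN ('player_id', 'game_id');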

Select from 2 rows in one table into a single row with 2 or more columns in the second table

I have a table that has 2 columns. One is a type column and the other is a value amount column. There are only 2 types. I would like to select rows of this table into another table with 2 combined columns based on type and value. For example, the table may have an order with the 2 types in 2 rows. It would be inserted into the 2nd table as one row.
Example:
Table 1
| ID | OrderID | Type | Value |
|:-----|:--------:|:------------:|-------:|
| 1 | 300 | bike | 100 |
| 2 | 300 | skateboard | 150 |
| 3 | 700 | bike | 200 |
| 4 | 700 | skateboard | 50 |
| 5 | 800 | bike | 150 |
| 6 | 800 | skateboard | 100 |
What is the TSQL to have it inserted into the 2nd table with these values?
Table 2
| ID | OrderID | BikeValue | SkateboardValue |
|:----|:--------:|:----------:|-----------------:|
| 1 | 300 | 100 | 150 |
| 2 | 700 | 200 | 50 |
| 3 | 800 | 150 | 100 |
Just make it simple for yourself: do two SQL statements, one to insert and another to update.
INSERT INTO Table2 (OrderID, BikeValue)
SELECT Table1.OrderID, Table1.Value
FROM Table1 (NOLOCK)
WHERE Table1.Type = 'bike';

UPDATE Table2 SET Table2.SkateboardValue = Table1.Value
FROM Table2
INNER JOIN Table1 ON Table1.OrderID = Table2.OrderID
WHERE Table1.Type = 'skateboard';
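If a single statement is preferred, a conditional-aggregation sketch along these lines should also work; it assumes Table2.ID is an identity column, as the example output suggests:
-- One row per OrderID; each CASE picks out the value for one type.
INSERT INTO Table2 (OrderID, BikeValue, SkateboardValue)
SELECT OrderID,
       MAX(CASE WHEN [Type] = 'bike' THEN Value END) AS BikeValue,
       MAX(CASE WHEN [Type] = 'skateboard' THEN Value END) AS SkateboardValue
FROM Table1
GROUP BY OrderID;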

query count of rows where id is less than a series of values in Redshift

I have a table etl_control which stores the latest_id of the x_data table every day. Now I have a requirement to get the number of rows for each day.
My idea is to run a query that gets the count based on the condition x_data.id <= etl_control.latest_id for every day.
The table structures are as follows.
etl_control:
record_date | latest_id |
---------------------------------
2016-11-01 | 55 |
2016-11-02 | 125 |
2016-11-03 | 154 |
2016-11-04 | 190 |
2016-11-05 | 201 |
2016-11-06 | 225 |
2016-11-07 | 287 |
x_data:
id | value |
---------------------------------
10 | xyz |
11 | xyz |
21 | xyz |
55 | xyz |
101 | xyz |
108 | xyz |
125 | xyz |
142 | xyz |
154 | xyz |
160 | xyz |
166 | xyz |
178 | xyz |
190 | xyz |
191 | xyz |
The end result should have the number of rows in x_data for each day. I tried a number of variations using JOIN, WITH and COUNT(*) OVER, but the biggest hurdle is to iteratively compare x_data.id with etl_control.latest_id.
Really sorry folks. Got the answer myself after posting the question.
The query is really simple.
WITH data AS (
    SELECT e.latest_id
    FROM x_data AS x, etl_control AS e
    WHERE x.id <= e.latest_id
)
SELECT latest_id, count(*) FROM data GROUP BY latest_id;
This basically creates a temp result with latest_id repeated once for each x_data row whose id is less than or equal to that latest_id.
A simple GROUP BY on this temp result then gives the expected count for each latest_id.
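If the counts should be keyed by the day rather than by latest_id, the same pattern can be extended to group by record_date instead (a sketch only; it assumes each record_date maps to exactly one latest_id, as in the sample data):
-- For each control row, count how many x_data ids fall at or below its latest_id.
SELECT e.record_date, COUNT(*) AS row_count
FROM x_data AS x, etl_control AS e
WHERE x.id <= e.latest_id
GROUP BY e.record_date
ORDER BY e.record_date;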

Full Outer Join on two columns is omitting rows

Some background: I am making a table in Postgres 9.5 that counts the number of actions performed by a user, grouping these actions by month using date_trunc(). The counts for each individual action are divided into separate tables, following this format:
Feedback table:
id | month | feedback_counted
----+---------+-------------------
1 | 2 | 3
1 | 3 | 10
1 | 4 | 7
1 | 5 | 2
Comments table:
id | month | comments_counted
----+---------+-------------------
1 | 4 | 12
1 | 5 | 4
1 | 6 | 57
1 | 7 | 12
Ideally, I would like to do a FULL OUTER JOIN of these tables ON the "id" and "month" columns at the same time and produce this result:
Combined table:
id | month | feedback_counted | comments_counted
----+---------+--------------------+-------------------
1 | 2 | 3 |
1 | 3 | 10 |
1 | 4 | 7 | 12
1 | 5 | 2 | 4
1 | 6 | | 57
1 | 7 | | 12
However, my current query does not capture the feedback dates, displaying the result like this:
Rollup table:
id | month | feedback_counted | comments_counted
----+---------+--------------------+-------------------
| | |
| | |
1 | 4 | 7 | 12
1 | 5 | 2 | 4
1 | 6 | | 57
1 | 7 | | 12
This is my current statement; note that it uses date_trunc in place of month. I add the action counts later; the main issue is somewhere here.
CREATE TABLE rollup_table AS
SELECT c.id, c.date_trunc
FROM comments_counted c FULL OUTER JOIN feedback_counted f
ON c.id = f.id AND c.date_trunc = f.date_trunc
GROUP BY c.id, c.date_trunc, f.id, f.date_trunc;
I'm a bit of a novice with SQL and am not sure how to fix this, any help would be appreciated.
Replace ON c.id = f.id AND c.month = f.month with USING(id, month).
SELECT id, month, feedback_counted, comments_counted
FROM comments c
FULL OUTER JOIN feedback f
USING(id, month);
id | month | feedback_counted | comments_counted
----+-------+------------------+------------------
1 | 2 | 3 |
1 | 3 | 10 |
1 | 4 | 7 | 12
1 | 5 | 2 | 4
1 | 6 | | 57
1 | 7 | | 12
(6 rows)
Test it in db<>fiddle.
USING() is basically the same as ON; when the two tables share the same column names, you can use USING() instead of ON to save some typing. That said, USING() on its own won't help if the SELECT still references c.id and c.date_trunc, as the original query does: in PostgreSQL those qualified columns come only from the comments side, so they are NULL for rows that exist only in feedback, and that is why rows go missing from the full outer join.
Here is a way that at least works for me.
SELECT COALESCE(c.id, f.id) AS id,
COALESCE(c.month, f.month) AS month,
feedback_counted,
comments_counted
FROM comments c
FULL OUTER JOIN feedback f
ON c.id = f.id AND c.month = f.month;

How to set sequence number of sub-elements in T-SQL using same element as parent?

I need to set a sequence in T-SQL where the first column holds a repeating sequence marker and another column is used for ordering.
It is hard to explain, so I will try with an example.
This is what I need:
|------------|-------------|----------------|
| Group Col | Order Col | Desired Result |
|------------|-------------|----------------|
| D | 1 | NULL |
| A | 2 | 1 |
| C | 3 | 1 |
| E | 4 | 1 |
| A | 5 | 2 |
| B | 6 | 2 |
| C | 7 | 2 |
| A | 8 | 3 |
| F | 9 | 3 |
| T | 10 | 3 |
| A | 11 | 4 |
| Y | 12 | 4 |
|------------|-------------|----------------|
So my marker is A (each time I meet A, I must start a new group in my result). All rows before the first A must be set to NULL.
I know that I can achieve this with a loop, but that would be a slow solution, and I need to update a lot of rows (sometimes several thousand).
Is there a way to achieve this without a loop?
You can use window version of COUNT to get the desired result:
SELECT [Group Col], [Order Col],
       COUNT(CASE WHEN [Group Col] = 'A' THEN 1 END)
           OVER (ORDER BY [Order Col]) AS [Desired Result]
FROM mytable
If you need all rows before the first A set to NULL, use SUM instead of COUNT.
Demo here
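For reference, a minimal sketch of that SUM variant (same table and column names as above); before the first 'A' every CASE result is NULL, so the running SUM stays NULL:
SELECT [Group Col], [Order Col],
       SUM(CASE WHEN [Group Col] = 'A' THEN 1 END)
           OVER (ORDER BY [Order Col]) AS [Desired Result]
FROM mytable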