Subtract fields of a column - Tableau - tableau-api

I would like to subtract promoters and detractors in Tableau by creating a new column. Thanks for all the help!
Customer Type Table (I would like to create the NPS field as shown below):
+---------+------------+----------+-----------+--------------+
| Quarter | Detractors | Passives | Promoters | NPS          |
+---------+------------+----------+-----------+--------------+
| Q1 15   | 40.56      | 23.56    | 35.79     | =35.79-40.56 |
| ...     | ...        | ...      | ...       | ...          |
+---------+------------+----------+-----------+--------------+

Simply create a calculated field (called NPS):
[Promoters] - [Detractors]
This will add a new field called NPS to every row of your partition.
Check out the Tableau online help on calculated fields - this is a skill well worth learning.

I understand the OP's question. The data comes in like this:
+---------+---------------+-------+
| Quarter | Customer Type | Score |
+---------+---------------+-------+
| Q1 15   | Detractors    | 25    |
| Q1 15   | Promoters     | 32    |
| Q1 15   | Passives      | 45    |
| Q1 15   | Detractors    | 17    |
| Q1 15   | Detractors    | 28    |
| ...     | ...           | ...   |
+---------+---------------+-------+
And when brought into Tableau, the [Customer Type] field is put on the Columns shelf, which arranges the data like the table below. The OP wants to calculate the [NPS] column (Promoters - Detractors).
+---------+------------+----------+-----------+--------------+
| Quarter | Detractors | Passives | Promoters | NPS          |
+---------+------------+----------+-----------+--------------+
| Q1 15   | 40.56      | 23.56    | 35.79     | =35.79-40.56 |
| ...     | ...        | ...      | ...       | ...          |
+---------+------------+----------+-----------+--------------+
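With the data in that long format, one way to get the [NPS] column is an aggregate calculated field along these lines (a sketch only; the field names [Customer Type] and [Score] are taken from the sample above and may need adjusting):
SUM(IF [Customer Type] = "Promoters" THEN [Score] END)
- SUM(IF [Customer Type] = "Detractors" THEN [Score] END)
Placed on a view with [Quarter], this evaluates per quarter without having to pivot the data first.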
I hope this clarifies. I am stuck with a similar situation (I want a column that shows the difference between 2015 and 2016):
+---------+------+------+------------+
| Measure | 2015 | 2016 | Difference |
+---------+------+------+------------+
| # Hires | 100  | 115  | 15         |
| # Terms | 9    | 6    | 3          |
+---------+------+------+------------+
I believe the steps are similar. I hope someone can help.
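If the 2015 and 2016 figures come from a single measure split by a year dimension, the same pattern should apply; a sketch with hypothetical field names [Year] and [Value]:
SUM(IF [Year] = 2016 THEN [Value] END) - SUM(IF [Year] = 2015 THEN [Value] END)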

Related

Copy the date in org table

Suppose I have a spreadsheet like this in an Org table:
|------------+-------+------------+--------+--------+------------|
| Date       | Items | Unit Price | Amount | Amount | Categories |
|------------+-------+------------+--------+--------+------------|
| 2019/09/17 | A     |       2.64 |      1 |   2.64 | materials  |
|            | B     |      52.67 |      2 | 105.34 | diagnosis  |
|            | C     |       3.08 |      1 |   3.08 | materials  |
|            | D     |       3.85 |      2 |    7.7 | materials  |
|            | E     |      33.66 |      2 |  67.32 | materials  |
|            | F     |         40 |      1 |     40 | treatments |
|            | G     |       16.5 |      1 |   16.5 | materials  |
|            | H     |          4 |      3 |     12 | treatments |
|            | I     |         40 |      1 |     40 | bed        |
|            | M     |          6 |     13 |     78 | treatments |
|------------+-------+------------+--------+--------+------------|
#+TBLFM: $5=$3*$4
How could I copy the date 2019/09/17 down to the bottom of the Date column?
The link that @manandearth posted in the comments describes how to duplicate (perhaps with slight modifications) the entries in a column. Briefly, pressing S-RET in a cell duplicates its contents from the cell above (if it is not empty); if the cell above is full and the current cell is empty, it duplicates the full cell into the empty one. If the contents are numeric, then the "duplication" involves a slight modification: it increases the value by 1. The same happens with a date: it increases the date to the next day (but the date has to be in a format that Org mode recognizes: either an active date <YYYY-MM-DD> or an inactive date [YYYY-MM-DD]). The increment is 1 by default in these cases, but it can be changed by setting the variable org-table-copy-increment to a different value. That's the "interactive" case I mention in my comment.
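If you want to change that behaviour permanently, the variable can be set in your init file; a minimal sketch (this only changes the step used by S-RET):
(setq org-table-copy-increment nil)  ;; copy verbatim, no +1
;; or, for example, (setq org-table-copy-increment 7) to step a week at a time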
The other way to fill a column in a table is by using a formula. For example here's a formula to fill the first column with a copy of the first entry in the column:
#+TBLFM: #3$1..#>$1 = #2$1
This says: Set all rows from row 3 (#3) to the last row (#>) of column 1 ($1) to the value of the cell in row 2 (#2), column 1 ($1). Note that row 1 is the header. Press C-c C-c on the table formula line above and ... wait, what happened?
|------------+-------+------------+--------+--------+------------|
| Date       | Items | Unit Price | Amount | Amount | Categories |
|------------+-------+------------+--------+--------+------------|
| 2019/09/17 | A     |       2.64 |      1 |   2.64 | materials  |
| 13.196078  | B     |      52.67 |      2 | 105.34 | diagnosis  |
| 13.196078  | C     |       3.08 |      1 |   3.08 | materials  |
| 13.196078  | D     |       3.85 |      2 |    7.7 | materials  |
| 13.196078  | E     |      33.66 |      2 |  67.32 | materials  |
| 13.196078  | F     |         40 |      1 |     40 | treatments |
| 13.196078  | G     |       16.5 |      1 |   16.5 | materials  |
| 13.196078  | H     |          4 |      3 |     12 | treatments |
| 13.196078  | I     |         40 |      1 |     40 | bed        |
| 13.196078  | M     |          6 |     13 |     78 | treatments |
|------------+-------+------------+--------+--------+------------|
#+TBLFM: #3$1..#>$1 = #2$1
It does not quite work in this case for a technical reason: Org mode uses Calc in table formula calculations and Calc looks at 2019/09/17 and says: "Aha, I have to divide 2019 by 9 and then divide the result by 17", and fills the rest of the column with the result of the divisions: 13.196078. You may have meant 2019/09/17 to be a date, but Org mode does not know that: it gives it to Calc which interprets it as an arithmetic expression. The solution here is the same as in the linked answer: make Org mode aware that it's a date by making it either an active date: <2019-09-17> or an inactive date: [2019-09-17]:
|------------------+-------+------------+--------+--------+------------|
| Date             | Items | Unit Price | Amount | Amount | Categories |
|------------------+-------+------------+--------+--------+------------|
| [2019-09-17]     | A     |       2.64 |      1 |   2.64 | materials  |
| [2019-09-17 Tue] | B     |      52.67 |      2 | 105.34 | diagnosis  |
| [2019-09-17 Tue] | C     |       3.08 |      1 |   3.08 | materials  |
| [2019-09-17 Tue] | D     |       3.85 |      2 |    7.7 | materials  |
| [2019-09-17 Tue] | E     |      33.66 |      2 |  67.32 | materials  |
| [2019-09-17 Tue] | F     |         40 |      1 |     40 | treatments |
| [2019-09-17 Tue] | G     |       16.5 |      1 |   16.5 | materials  |
| [2019-09-17 Tue] | H     |          4 |      3 |     12 | treatments |
| [2019-09-17 Tue] | I     |         40 |      1 |     40 | bed        |
| [2019-09-17 Tue] | M     |          6 |     13 |     78 | treatments |
|------------------+-------+------------+--------+--------+------------|
#+TBLFM: #3$1..#>$1 = #2$1
This does not do automatic incrementation but if that's what you want, it's easy to accomplish: Calc can do calculations on dates, so we can increment daily by adding to the date in each row the row number minus 2 (e.g. row 3 would get an increment of 3 - 2 = 1, row 4 would get 4 - 2 = 2, etc). To accomplish this, you have to get the row number of the current row: the idiom is ##. Then the formula becomes:
#+TBLFM: #3$1..#>$1 = #2$1 + ## - 2
and the table becomes:
|------------------+-------+------------+--------+--------+------------|
| Date             | Items | Unit Price | Amount | Amount | Categories |
|------------------+-------+------------+--------+--------+------------|
| [2019-09-17]     | A     |       2.64 |      1 |   2.64 | materials  |
| [2019-09-18 Wed] | B     |      52.67 |      2 | 105.34 | diagnosis  |
| [2019-09-19 Thu] | C     |       3.08 |      1 |   3.08 | materials  |
| [2019-09-20 Fri] | D     |       3.85 |      2 |    7.7 | materials  |
| [2019-09-21 Sat] | E     |      33.66 |      2 |  67.32 | materials  |
| [2019-09-22 Sun] | F     |         40 |      1 |     40 | treatments |
| [2019-09-23 Mon] | G     |       16.5 |      1 |   16.5 | materials  |
| [2019-09-24 Tue] | H     |          4 |      3 |     12 | treatments |
| [2019-09-25 Wed] | I     |         40 |      1 |     40 | bed        |
| [2019-09-26 Thu] | M     |          6 |     13 |     78 | treatments |
|------------------+-------+------------+--------+--------+------------|
#+TBLFM: #3$1..#>$1 = #2$1 + ## - 2
The various anomalies of the display of dates (do we include the day of the week? do we include the time?) might be worked around using org-time-stamp-custom-formats but that gets us into waters that I have not explored.

Create Calculated Pivot from Several Query Results in PostgreSQL

I have a question regarding how to make a calculated pivot table from several query results in PostgreSQL. I've managed to produce three query results but don't have any idea how to combine and calculate all the data into a single table. I've tried to google it but found that most of the questions are about how to make a pivot table from a single table, which I'm able to do using sum, case, and group by. Here's the simplified version of my query results.
Result from query 1 which contains gross values
| city  | code | gross  |
|-------|------|--------|
| city1 | 21   | 194793 |
| city1 | 25   | 139241 |
| city1 | 28   | 231365 |
| city2 | 21   | 282025 |
| city2 | 25   | 334458 |
| city2 | 28   | 410852 |
| city3 | 21   | 109237 |
Result from query 2 which contains positive adjustments
| city  | code | adj_pos |
|-------|------|---------|
| city1 | 21   | 16259   |
| city1 | 25   | 13634   |
| city1 | 28   | 45854   |
| city2 | 25   | 18060   |
| city2 | 28   | 18220   |
Result from query 3 which contains negative adjustments
| city  | code | adj_neg |
|-------|------|---------|
| city1 | 25   | 23364   |
| city2 | 21   | 27478   |
| city2 | 25   | 23474   |
And what I want to do is create something like this:
| city  | 21_gross | 25_gross | 28_gross | 21_pos | 25_pos | 28_pos | 21_neg | 25_neg | 28_neg |
|-------|----------|----------|----------|--------|--------|--------|--------|--------|--------|
| city1 | 194793   | 139241   | 231365   | 16259  | 13634  | 45854  |        | 23364  |        |
| city2 | 282025   | 334458   | 410852   |        | 18060  | 18220  | 27478  | 23474  |        |
| city3 | 109237   |          |          |        |        |        |        |        |        |
or, preferably, the final calculation (gross + positive adjustment - negative adjustment) for each city and each code, like this:
| city  | 21_nett | 25_nett | 28_nett |
|-------|---------|---------|---------|
| city1 | 211052  | 129511  | 277219  |
| city2 | 254547  | 329044  | 429072  |
| city3 | 109237  | 0       | 0       |
Any suggestion will be appreciated. Thank you!
I think the best you can achieve is to get the pivoting part as JSON - http://sqlfiddle.com/#!17/b7d64/23:
select
  city,
  json_object_agg(
    code,
    coalesce(gross,0) + coalesce(adj_pos,0) - coalesce(adj_neg,0)
  ) as js
from q1
left join q2 using (city,code)
left join q3 using (city,code)
group by city
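If you need real columns rather than JSON, and the codes (21, 25, 28) are known up front, one workaround is conditional aggregation with FILTER (PostgreSQL 9.4+); a sketch assuming the three result sets are available as q1, q2 and q3 (for example as CTEs), with the same joins as above:
select
  city,
  coalesce(sum(coalesce(gross,0) + coalesce(adj_pos,0) - coalesce(adj_neg,0))
           filter (where code = 21), 0) as "21_nett",
  coalesce(sum(coalesce(gross,0) + coalesce(adj_pos,0) - coalesce(adj_neg,0))
           filter (where code = 25), 0) as "25_nett",
  coalesce(sum(coalesce(gross,0) + coalesce(adj_pos,0) - coalesce(adj_neg,0))
           filter (where code = 28), 0) as "28_nett"
from q1
left join q2 using (city,code)
left join q3 using (city,code)
group by city;
The outer coalesce turns the missing city/code combinations into 0, matching the desired nett output.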

PostgreSQL Crosstab generate_series of weeks for columns

From a table of "time entries" I'm trying to create a report of weekly totals for each user.
Sample of the table:
+-----+---------+-------------------------+--------------+
| id  | user_id | start_time              | hours_worked |
+-----+---------+-------------------------+--------------+
| 997 | 6       | 2018-01-01 03:05:00 UTC | 1.0          |
| 996 | 6       | 2017-12-01 05:05:00 UTC | 1.0          |
| 998 | 6       | 2017-12-01 05:05:00 UTC | 1.5          |
| 999 | 20      | 2017-11-15 19:00:00 UTC | 1.0          |
| 995 | 6       | 2017-11-11 20:47:42 UTC | 0.04         |
+-----+---------+-------------------------+--------------+
Right now I can run the following and basically get what I need
SELECT COALESCE(SUM(time_entries.hours_worked), 0) AS total,
       time_entries.user_id,
       week::date
--Using generate_series here to account for weeks with no time entries when
--doing the join
FROM generate_series(DATE_TRUNC('week', '2017-11-01 00:00:00'::date),
                     (DATE_TRUNC('week', '2017-12-31 23:59:59.999999'::date)),
                     interval '7 day') AS week
LEFT JOIN time_entries
       ON DATE_TRUNC('week', time_entries.start_time) = week
GROUP BY week, time_entries.user_id
ORDER BY week
This will return
+-------+---------+------------+
| total | user_id | week       |
+-------+---------+------------+
| 14.08 | 5       | 2017-10-30 |
| 21.92 | 6       | 2017-10-30 |
| 10.92 | 7       | 2017-10-30 |
| 14.26 | 8       | 2017-10-30 |
| 14.78 | 10      | 2017-10-30 |
| 14.08 | 13      | 2017-10-30 |
| 15.83 | 15      | 2017-10-30 |
| 8.75  | 5       | 2017-11-06 |
| 10.53 | 6       | 2017-11-06 |
| 13.73 | 7       | 2017-11-06 |
| 14.26 | 8       | 2017-11-06 |
| 19.45 | 10      | 2017-11-06 |
| 15.95 | 13      | 2017-11-06 |
| 14.16 | 15      | 2017-11-06 |
| 1.00  | 20      | 2017-11-13 |
| 0     |         | 2017-11-20 |
| 2.50  | 6       | 2017-11-27 |
| 0     |         | 2017-12-04 |
| 0     |         | 2017-12-11 |
| 0     |         | 2017-12-18 |
| 0     |         | 2017-12-25 |
+-------+---------+------------+
However, this is difficult to parse, particularly when there's no data for a week. What I would like is a pivot or crosstab table where the weeks are the columns and the rows are the users, and to include nulls for each (for instance when a user had no entries in that week, or a week had no entries from any user).
Something like this
+---------+------------+------------+------------+
| user_id | 2017-10-30 | 2017-11-06 | 2017-11-13 |
+---------+------------+------------+------------+
| 6       | 4.0        | 1.0        | 0          |
| 7       | 4.0        | 1.0        | 0          |
| 8       | 4.0        | 0          | 0          |
| 9       | 0          | 1.0        | 0          |
| 10      | 4.0        | 0.04       | 0          |
+---------+------------+------------+------------+
I've been looking around online and it seems that "dynamically" generating a list of columns for crosstab is difficult. I'd rather not hard-code them, which seems weird to do anyway for dates, or use something like a CASE expression keyed on week number.
Should I look for another solution besides crosstab? If I could get the series of weeks for each user including all nulls I think that would be good enough. It just seems that right now my join strategy isn't returning that.
Personally I would use a Date Dimension table and use that table as the basis for the query. I find it far easier to use tabular data for these types of calculations, as it leads to SQL that's easier to read and maintain. There's a great article on creating a Date Dimension table in PostgreSQL at https://medium.com/@duffn/creating-a-date-dimension-table-in-postgresql-af3f8e2941ac, though you could get away with a much simpler version of this table.
Ultimately what you would do is use the Date table as the base for the SELECT cols FROM table section and then join against that, or probably use Common Table Expressions, to create the calculations.
If you would like, I'll write up a solution demonstrating how you could create such a query.
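In the meantime, here is a rough sketch of the CTE variant (not the full date-dimension table), using the table and column names from the question; it cross joins the generated weeks with the distinct users so that every user/week pair shows up even when there are no entries:
with weeks as (
  select week::date
  from generate_series(date_trunc('week', timestamp '2017-11-01'),
                       date_trunc('week', timestamp '2017-12-31'),
                       interval '7 day') as week
),
users as (
  select distinct user_id from time_entries
)
select u.user_id,
       w.week,
       coalesce(sum(te.hours_worked), 0) as total
from users u
cross join weeks w
left join time_entries te
       on te.user_id = u.user_id
      and date_trunc('week', te.start_time) = w.week
group by u.user_id, w.week
order by u.user_id, w.week;
That still leaves the pivoting to the reporting layer, but it produces the complete "series of weeks for each user" that the question mentions would be good enough.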

Tableau - Calculated field for difference between date and maximum date in table

I have the following table that I have loaded into Tableau (it has only one column, CreatedOnDate):
+---------------+
| CreatedOnDate |
+---------------+
| 1/1/2016      |
| 1/2/2016      |
| 1/3/2016      |
| 1/4/2016      |
| 1/5/2016      |
| 1/6/2016      |
| 1/7/2016      |
| 1/8/2016      |
| 1/9/2016      |
| 1/10/2016     |
| 1/11/2016     |
| 1/12/2016     |
| 1/13/2016     |
| 1/14/2016     |
+---------------+
I want to be able to find the maximum date in the table, compare it with every date in the table and get the difference in days. For the above table, the maximum date in table is 1/14/2016. Every date is compared to 1/14/2016 to find the difference.
Expected Output
+---------------+------------+
| CreatedOnDate | Difference |
+---------------+------------+
| 1/1/2016      | 13         |
| 1/2/2016      | 12         |
| 1/3/2016      | 11         |
| 1/4/2016      | 10         |
| 1/5/2016      | 9          |
| 1/6/2016      | 8          |
| 1/7/2016      | 7          |
| 1/8/2016      | 6          |
| 1/9/2016      | 5          |
| 1/10/2016     | 4          |
| 1/11/2016     | 3          |
| 1/12/2016     | 2          |
| 1/13/2016     | 1          |
| 1/14/2016     | 0          |
+---------------+------------+
My goal is to create this Difference calculated field. I am struggling to find a way to do this using DATEDIFF.
Any help would be appreciated!
woodhead92, this approach would work, but it means you have to use table calculations. A much more flexible approach (available since v9) is Level of Detail expressions:
First, define a MAX date for the whole dataset with this calculated field called MaxDate LOD:
{ FIXED : MAX([CreatedOnDate]) }
This will always calculate the maximum date for the whole table (it will override filters as well; if you need it to reflect them, make sure you add them to context).
Then you can use pretty much the same calculated field, but no need for ATTR or Table Calculations:
DATEDIFF('day', [CreatedOnDate], [MaxDate LOD])
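For comparison, the table-calculation route mentioned at the top would look roughly like the sketch below (an assumption about that approach, not a quote of it), with the calculation computed across the whole table:
DATEDIFF('day', ATTR([CreatedOnDate]), WINDOW_MAX(MAX([CreatedOnDate])))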
Hope this helps!

In postgresql, how do you find aggregate base on time range

For example, if I have a database table of transactions done over the counter, I would like to find out whether there was any period that could be defined as extremely busy (more than 10 transactions processed in the span of 10 minutes). How would I go about querying that? Could I aggregate based on time ranges and count the number of transaction ids within each range?
Adding example to clarify my input and desired output:
+----+--------------------+
| Id | register_timestamp |
+----+--------------------+
| 25 | 08:10:50           |
| 26 | 09:07:36           |
| 27 | 09:08:06           |
| 28 | 09:08:35           |
| 29 | 09:12:08           |
| 30 | 09:12:18           |
| 31 | 09:12:44           |
| 32 | 09:15:29           |
| 33 | 09:15:47           |
| 34 | 09:18:13           |
| 35 | 09:18:42           |
| 36 | 09:20:33           |
| 37 | 09:20:36           |
| 38 | 09:21:04           |
| 39 | 09:21:53           |
| 40 | 09:22:23           |
| 41 | 09:22:42           |
| 42 | 09:22:51           |
| 43 | 09:28:14           |
+----+--------------------+
Desired output would be something like:
+-------+----------+
| Count | Min      |
+-------+----------+
| 1     | 08:10:50 |
| 3     | 09:07:36 |
| 7     | 09:12:08 |
| 8     | 09:20:33 |
+-------+----------+
How about this:
SELECT c, time
FROM (
  SELECT count(*) AS c, min(time) AS time
  FROM transactions
  GROUP BY floor(extract(epoch from time)/600)
) AS sub
WHERE c > 10;
This will find all ten minute intervals for which more than ten transactions occurred within that interval. It assumes that the table is called transactions and that it has a column called time where the timestamp is stored.
Thanks to redneb, I ended up with the following query:
SELECT count(*) AS c, min(register_timestamp) AS register_timestamp
FROM trak_participants_data
GROUP BY floor(extract(epoch from register_timestamp)/600)
order by register_timestamp
It works well enough for me to be able to tell which time chunks are the busiest at the counter.
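For what it's worth, the fixed 600-second buckets can split one busy burst across two groups; a self-join sketch on the same table (assuming register_timestamp is a timestamp column; the interval arithmetic would need adjusting for a bare time) that counts, for each transaction, how many fall in the following 10 minutes would catch those cases too:
SELECT t.register_timestamp, count(*) AS c
FROM trak_participants_data t
JOIN trak_participants_data u
  ON u.register_timestamp >= t.register_timestamp
 AND u.register_timestamp <  t.register_timestamp + interval '10 minutes'
GROUP BY t.register_timestamp
HAVING count(*) > 10
ORDER BY t.register_timestamp;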