PostgreSQL Crosstab generate_series of weeks for columns - postgresql

From a table of "time entries" I'm trying to create a report of weekly totals for each user.
Sample of the table:
+-----+---------+-------------------------+--------------+
| id | user_id | start_time | hours_worked |
+-----+---------+-------------------------+--------------+
| 997 | 6 | 2018-01-01 03:05:00 UTC | 1.0 |
| 996 | 6 | 2017-12-01 05:05:00 UTC | 1.0 |
| 998 | 6 | 2017-12-01 05:05:00 UTC | 1.5 |
| 999 | 20 | 2017-11-15 19:00:00 UTC | 1.0 |
| 995 | 6 | 2017-11-11 20:47:42 UTC | 0.04 |
+-----+---------+-------------------------+--------------+
Right now I can run the following and basically get what I need
SELECT COALESCE(SUM(time_entries.hours_worked),0) AS total,
time_entries.user_id,
week::date
--Using generate_series here to account for weeks with no time entries when
--doing the join
FROM generate_series( (DATE_TRUNC('week', '2017-11-01 00:00:00'::date)),
(DATE_TRUNC('week', '2017-12-31 23:59:59.999999'::date)),
interval '7 day') as week LEFT JOIN time_entries
ON DATE_TRUNC('week', time_entries.start_time) = week
GROUP BY week, time_entries.user_id
ORDER BY week
This will return
+-------+---------+------------+
| total | user_id | week |
+-------+---------+------------+
| 14.08 | 5 | 2017-10-30 |
| 21.92 | 6 | 2017-10-30 |
| 10.92 | 7 | 2017-10-30 |
| 14.26 | 8 | 2017-10-30 |
| 14.78 | 10 | 2017-10-30 |
| 14.08 | 13 | 2017-10-30 |
| 15.83 | 15 | 2017-10-30 |
| 8.75 | 5 | 2017-11-06 |
| 10.53 | 6 | 2017-11-06 |
| 13.73 | 7 | 2017-11-06 |
| 14.26 | 8 | 2017-11-06 |
| 19.45 | 10 | 2017-11-06 |
| 15.95 | 13 | 2017-11-06 |
| 14.16 | 15 | 2017-11-06 |
| 1.00 | 20 | 2017-11-13 |
| 0 | | 2017-11-20 |
| 2.50 | 6 | 2017-11-27 |
| 0 | | 2017-12-04 |
| 0 | | 2017-12-11 |
| 0 | | 2017-12-18 |
| 0 | | 2017-12-25 |
+-------+---------+------------+
However, this is difficult to parse particularly when there's no data for a week. What I would like is a pivot or crosstab table where the weeks are the columns and the rows are the users. And to include nulls from each (for instance if a user had no entries in that week or week without entries from any user).
Something like this
+---------+---------------+--------------+--------------+
| user_id | 2017-10-30 | 2017-11-06 | 2017-11-13 |
+---------+---------------+--------------+--------------+
| 6 | 4.0 | 1.0 | 0 |
| 7 | 4.0 | 1.0 | 0 |
| 8 | 4.0 | 0 | 0 |
| 9 | 0 | 1.0 | 0 |
| 10 | 4.0 | 0.04 | 0 |
+---------+---------------+--------------+--------------+
I've been looking around online and it seems that "dynamically" generating a list of columns for crosstab is difficult. I'd rather not hard code them, which seems weird to do anyway for dates. Or use something like this case with week number.
Should I look for another solution besides crosstab? If I could get the series of weeks for each user including all nulls I think that would be good enough. It just seems that right now my join strategy isn't returning that.

Personally I would use a Date Dimension table and use that table as the basis for the query. I find it far easier to use tabular data for these types of calculations as it leads to SQL that's easier to read and maintain. There's a great article on creating a Date Dimension table in PostgreSQL at https://medium.com/#duffn/creating-a-date-dimension-table-in-postgresql-af3f8e2941ac, though you could get away with a much simpler version of this table.
Ultimately what you would do is use the Date table as the base for the SELECT cols FROM table section and then join against that, or probably use Common Table Expressions, to create the calculations.
I'll write up a solution to that if you would like demonstrating how you could create such a query.

Related

Spotfire - Calculate average only if there are minimum 3 values

I want to create a cross table in Spotfire where in which Average is calculated only when there are at least 3 values. If there are no values or less than 3 values the average should be blank.
+-------+-----+---------+
| Month | Age | Average |
+-------+-----+---------+
| 1 | 10 | |
| 2 | 11 | |
| 3 | 2 | 7.7 |
| 4 | | |
| 5 | 13 | |
| 6 | 14 | |
| 7 | | |
| 8 | 19 | |
| 9 | 20 | |
| 10 | 21 | 20 |
+-------+-----+---------+
If I'm understanding you correctly, you want to group by Month, and then have something like this as your aggregation:
If(Count()>2,Avg([Age]),null) as [AverageAge_3Min]

Are pg_stat_database and pg_stat_activity really listing the same stuff aka how do I get a list of all backends

In this answer to the question Right query to get the current number of connections in a PostgreSQL DB the poster implies that
SELECT sum(numbackends) FROM pg_stat_database;
and
SELECT count(*) FROM pg_stat_activity;
give the same results.
However, if I do this on my db the first one says 119 and the second one 30.
This is the difference as shown by summing numbackends and counting:
+------+-------------+-------+
| | numbackends | count |
+------+-------------+-------+
| db1 | 1 | 1 |
| db2 | 1 | 1 |
| db3 | 1 | 1 |
| db4 | 1 | 1 |
| db5 | 2 | 2 |
| db6 | 2 | 2 |
| db7 | 12 | 3 | <--
| db8 | 4 | 4 |
| db9 | 5 | 5 |
| db10 | 78 | 35 | <--
+------+-------------+-------+
Why does this difference exist?
How can I list each of the 119-30=89 backends not shown in pg_stat_activity?

Tableau - Calculated field for difference between date and maximum date in table

I have the following table that I have loaded in Tableau (It has only one column CreatedOnDate)
+-----------------+
| CreatedOnDate |
+-----------------+
| 1/1/2016 |
| 1/2/2016 |
| 1/3/2016 |
| 1/4/2016 |
| 1/5/2016 |
| 1/6/2016 |
| 1/7/2016 |
| 1/8/2016 |
| 1/9/2016 |
| 1/10/2016 |
| 1/11/2016 |
| 1/12/2016 |
| 1/13/2016 |
| 1/14/2016 |
+-----------------+
I want to be able to find the maximum date in the table, compare it with every date in the table and get the difference in days. For the above table, the maximum date in table is 1/14/2016. Every date is compared to 1/14/2016 to find the difference.
Expected Output
+-----------------+------------+
| CreatedOnDate | Difference |
+-----------------+------------+
| 1/1/2016 | 13 |
| 1/2/2016 | 12 |
| 1/3/2016 | 11 |
| 1/4/2016 | 10 |
| 1/5/2016 | 9 |
| 1/6/2016 | 8 |
| 1/7/2016 | 7 |
| 1/8/2016 | 6 |
| 1/9/2016 | 5 |
| 1/10/2016 | 4 |
| 1/11/2016 | 3 |
| 1/12/2016 | 2 |
| 1/13/2016 | 1 |
| 1/14/2016 | 0 |
+-----------------+------------+
My goal is to create this Difference calculated field. I am struggling to find a way to do this using DATEDIFF.
And help would be appreciated!!
woodhead92, this approach would work, but means you have to use table calculations. Much more flexible approach (available since v8) is Level of Details expressions:
First, define a MAX date for the whole dataset with this calculated field called MaxDate LOD:
{FIXED : MAX(CreatedOnDate) }
This will always calculate the maximum date on table (will overwrite filters as well, if you need to reflect them, make sure you add them to context.
Then you can use pretty much the same calculated field, but no need for ATTR or Table Calculations:
DATEDIFF('day', [CreatedOnDate], [MaxDate LOD])
Hope this helps!

In postgresql, how do you find aggregate base on time range

For example, if I have a database table of transactions done over the counter. And I would like to search whether there was any time that was defined as extremely busy (Processed more than 10 transaction in the span of 10 minutes). How would I go about querying it? Could I aggregate based on time range and count the amount of transaction id within those ranges?
Adding example to clarify my input and desired output:
+----+--------------------+
| Id | register_timestamp |
+----+--------------------+
| 25 | 08:10:50 |
| 26 | 09:07:36 |
| 27 | 09:08:06 |
| 28 | 09:08:35 |
| 29 | 09:12:08 |
| 30 | 09:12:18 |
| 31 | 09:12:44 |
| 32 | 09:15:29 |
| 33 | 09:15:47 |
| 34 | 09:18:13 |
| 35 | 09:18:42 |
| 36 | 09:20:33 |
| 37 | 09:20:36 |
| 38 | 09:21:04 |
| 39 | 09:21:53 |
| 40 | 09:22:23 |
| 41 | 09:22:42 |
| 42 | 09:22:51 |
| 43 | 09:28:14 |
+----+--------------------+
Desired output would be something like:
+-------+----------+
| Count | Min |
+-------+----------+
| 1 | 08:10:50 |
| 3 | 09:07:36 |
| 7 | 09:12:08 |
| 8 | 09:20:33 |
+-------+----------+
How about this:
SELECT time,
FROM (
SELECT count(*) AS c, min(time) AS time
FROM transactions
GROUP BY floor(extract(epoch from time)/600);
)
WHERE c > 10;
This will find all ten minute intervals for which more than ten transactions occurred within that interval. It assumes that the table is called transactions and that it has a column called time where the timestamp is stored.
Thanks to redneb, I ended up with the following query:
SELECT count(*) AS c, min(register_timestamp) AS register_timestamp
FROM trak_participants_data
GROUP BY floor(extract(epoch from register_timestamp)/600)
order by register_timestamp
It works close enough for me to be able tell which time chunks are the most busiest for the counter.

Subtract fields of a column - Tableau

I would like to subtract promoters and detractors in Tableau by creating a new column. Thanks for all the help!
Customer Type Table (I would like to create the NPS field as shown below):
+---------+------------+----------+-----------+--------------+
| Quarter | Detractors | Passives | Promoters | NPS |
+---------+------------+----------+-----------+--------------+
| Q1 15 | 40.56 | 23.56 | 35.79 | =35.79-40.56 |
| ... | ... | ... | ... | ... |
+---------+------------+----------+-----------+--------------+
Simply create a calculated field (called NPS):
[Promoters] - [Detractors]
This will add a new field to every row of your partition called NPS.
Check out the Tableau online help on calculated fields - this is a skill well worth learning.
I understand the OPs question. The data comes in like this:
+---------+---------------+------+
| Quarter | Customer Type | Score|
+---------+------------+---------+
| Q1 15 | Detractors | 25 |
| Q1 15 | Promoters | 32 |
| Q1 15 | Passives | 45 |
| Q1 15 | Detractors | 17 |
| Q1 15 | Detractors | 28 |
| ... | ... | ... |
+---------+------------+---------+
And when brought into Tableau, the [Customer Type] field is put in the Column shelf and this arranges the data like the table below. The OP wants to calculate the [NPS] column (Promoters - Detractors).
+---------+------------+----------+-----------+--------------+
| Quarter | Detractors | Passives | Promoters | NPS |
+---------+------------+----------+-----------+--------------+
| Q1 15 | 40.56 | 23.56 | 35.79 | =35.79-40.56 |
| ... | ... | ... | ... | ... |
+---------+------------+----------+-----------+--------------+
I hope this clarifies. I am stuck with a similar situation (I want a column that shows the difference between 2015 and 2016):
+---------+-------+-------+------------+
| Measure | 2015 | 2016 | Difference |
+---------+---------------+------------+
| # Hires | 100 | 115 | 15 |
| # Terms | 9 | 6 | 3 |
+---------+---------------+------------+
I believe the steps are similar. I hope someone can help.