PostgreSQL: Use count on multiple columns

I have two tables. The first generates the condition for counting records in the second. The two tables are linked in a 1:1 relation by timestamp.
The problem is that the second table has many columns, and I need a count for each column that matches the condition from the first table.
Example:
Tables met and pot
CREATE TABLE met (
    tstamp timestamp without time zone NOT NULL,
    h1_rad double precision,
    CONSTRAINT met_pkey PRIMARY KEY (tstamp)
);
CREATE TABLE pot (
    tstamp timestamp without time zone NOT NULL,
    c1 double precision,
    c2 double precision,
    c3 double precision,
    CONSTRAINT pot_pkey PRIMARY KEY (tstamp)
);
In reality, pot has 108 columns, from c1 to c108.
Tables values:
+ Table met + + Table pot +
+----------------+--------+--+----------------+------+------+------+
| tstamp | h1_rad | | tstamp | c1 | c2 | c3 |
+----------------+--------+--+----------------+------+------+------+
| 20150101 00:00 | 0 | | 20150101 00:00 | 5,5 | 3,3 | 15,6 |
| 20150101 00:05 | 1,8 | | 20150101 00:05 | 12,8 | 15,8 | 1,5 |
| 20150101 00:10 | 15,4 | | 20150101 00:10 | 25,4 | 4,5 | 1,4 |
| 20150101 00:15 | 28,4 | | 20150101 00:15 | 18,3 | 63,5 | 12,5 |
| 20150101 00:20 | 29,4 | | 20150101 00:20 | 24,5 | 78 | 17,5 |
| 20150101 00:25 | 13,5 | | 20150101 00:25 | 12,8 | 5,4 | 18,4 |
| 20150102 00:00 | 19,5 | | 20150102 00:00 | 11,1 | 25,6 | 6,5 |
| 20150102 00:05 | 2,5 | | 20150102 00:05 | 36,5 | 21,4 | 45,2 |
| 20150102 00:10 | 18,4 | | 20150102 00:10 | 1,4 | 35,5 | 63,5 |
| 20150102 00:15 | 20,4 | | 20150102 00:15 | 18,4 | 23,4 | 8,4 |
| 20150102 00:20 | 6,8 | | 20150102 00:20 | 16,8 | 12,5 | 18,4 |
| 20150102 00:25 | 17,4 | | 20150102 00:25 | 25,8 | 23,5 | 9,5 |
+----------------+--------+--+----------------+------+------+------+
What I need, grouped by day, is the number of rows of pot where the value is higher than 15 while the met value for the same timestamp is also higher than 15.
With the data supplied, I need something like:
+----------+----+----+----+
| day | c1 | c2 | c3 |
+----------+----+----+----+
| 20150101 | 3 | 2 | 1 |
| 20150102 | 2 | 4 | 1 |
+----------+----+----+----+
How can I get this?
Is this possible with a single query, even with subqueries?
Actually, the raw data is stored every minute in other tables; met and pot are summarized and filtered tables kept for performance.
If necessary, I can create tables with data summarized by day if that simplifies the solution.
Thanks.
P.S. Sorry for my English.

You can solve this with some CASE statements. Test for both conditions, and if true return a 1. Then SUM() the results using a GROUP BY on the timestamp converted to a date to get your total:
SELECT
    date(met.tstamp) AS day,
    SUM(CASE WHEN met.h1_rad > 15 AND pot.c1 > 15 THEN 1 END) AS c1,
    SUM(CASE WHEN met.h1_rad > 15 AND pot.c2 > 15 THEN 1 END) AS c2,
    SUM(CASE WHEN met.h1_rad > 15 AND pot.c3 > 15 THEN 1 END) AS c3
FROM met
INNER JOIN pot ON met.tstamp = pot.tstamp
GROUP BY date(met.tstamp);
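Since pot really has 108 columns, writing the CASE list by hand gets tedious. A minimal sketch that generates the column list from the catalog (assuming all counted columns in pot are named c1 through c108; paste the output into the query above):
-- Build the 108 SUM(CASE ...) expressions from information_schema
-- (assumes every counted column in pot matches the pattern c<number>)
SELECT string_agg(
           format('SUM(CASE WHEN met.h1_rad > 15 AND pot.%I > 15 THEN 1 END) AS %I',
                  column_name, column_name),
           E',\n' ORDER BY ordinal_position)
FROM information_schema.columns
WHERE table_name = 'pot'
  AND column_name ~ '^c[0-9]+$';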

Related

Value of first day of current month (except zero)

I need a way to get the value on column B that corresponds to the 1st day of the present month like so:
Table A
+---------------------+------+
| ColA | ColB |
+---------------------+------+
| 28/10/2012 00:19:01 | 42 |
| 29/10/2012 00:29:01 | 100 |
| 30/10/2012 00:39:01 | 23 |
| 31/10/2012 00:29:01 | 1 |
| 1/11/2012 00:19:01 | 24 |<---
| 2/11/2012 00:19:01 | 4 |
| 3/11/2012 00:19:01 | 2 |
+---------------------+------+
Table B
+---------------------+------+
| ColA | ColB |
+---------------------+------+
| 28/11/2012 00:19:01 | 67 |
| 29/11/2012 00:29:01 | 2 |
| 30/11/2012 00:39:01 | 63 |
| 31/11/2012 00:29:01 | 5 |
| 1/12/2012 00:19:01 | 69 |<---
| 2/12/2012 00:19:01 | 42 |
| 3/12/2012 00:19:01 | 6 |
+---------------------+------+
Table C
+---------------------+------+
| ColA | ColB |
+---------------------+------+
| 28/11/2012 00:19:01 | 11 |
| 29/11/2012 00:29:01 | 12 |
| 30/11/2012 00:39:01 | 3 |
| 31/11/2012 00:29:01 | 20 |
| 1/12/2012 00:19:01 | 0 |
| 2/12/2012 00:19:01 | 71 |<---
| 3/12/2012 00:19:01 | 21 |
+---------------------+------+
So I need to be able to have a formula that I can use in a cell anywhere on my sheet that gets me that "24" in Table A. In other words, it would always get me the value corresponding to the first day of the current month.
Then when the month is over and the next month starts, the same formula will get me the value on column B corresponding to the 1st day of the present month again, in Table B that would be "69".
Now one thing I'd like to add: if the value in column B corresponding to the 1st day of the month is equal to "0", the formula should search the next cell/day until it finds a value greater than "0", and output that one. See Table C. In this example it would be "71".
Is this possible? I imagine so I just can't figure out how to go about doing it.
Dummy file:
https://docs.google.com/spreadsheets/d/1ExXtmQ8nyuV1o_UtabVJ-TifIbORItFMWjtN6ZlruWc/edit?usp=sharing
If your timestamps are sorted as in your example, try:
=--FILTER(B:B, B:B>0, MONTH(A:A)=MONTH(TODAY()))
You can use MINIFS to get the first date of this month that has a corresponding value > 0:
=MINIFS(A1:A7,B1:B7,">0",A1:A7,">"&EOMONTH(TODAY(),-1),A1:A7,"<="&EOMONTH(TODAY(),0))
You can then combine that formula with a basic INDEX/MATCH to get the correct value:
=INDEX(B1:B7,MATCH(MINIFS(A1:A7,B1:B7,">0",A1:A7,">"&EOMONTH(TODAY(),-1),A1:A7,"<="&EOMONTH(TODAY(),0)),A1:A7,0))

T-SQL Query with SUM, DATEADD logic

I'm looking for a query that can calculate the Calculation column from the Value column.
The query has to take the Date value, like '2017-03-01', and sum the Value of that record together with the records from up to 2 months before, but only records with the same CustomerID. In this scenario it must take the values from 2017-03-01, 2017-02-01 and 2017-01-01 (993, 492, 312), sum them together (1,797), and store the result in the 2017-03-01 record, like below, for the customer where CustomerID = 1001.
|1001 | 2017-01-01 | 312 |      |
|1001 | 2017-02-01 | 492 |      |
|1001 | 2017-03-01 | 993 | 1797 |
This needs to be done for all records.
Of course, some records cannot go back a full 2 months; for those, the value can stay NULL.
I really don't know how to write this query.
I have some test queries as first steps, like:
SELECT SUM(Value) FROM TestTable WHERE Date BETWEEN Date AND DATEADD(month, -2, Date);
+------------+------------+-------+-------------+
| CustomerID | Date | Value | Calculation |
+------------+------------+-------+-------------+
| 1001 | 2016-08-01 | 123 | |
| 1001 | 2016-09-01 | 434 | |
| 1001 | 2016-10-01 | 423 | |
| 1001 | 2016-11-01 | 235 | |
| 1001 | 2016-12-01 | 432 | |
| 1001 | 2017-01-01 | 312 | |
| 1001 | 2017-02-01 | 492 | |
| 1001 | 2017-03-01 | 993 | |
| 1002 | 2017-01-01 | 838 | |
| 1002 | 2017-02-01 | 234 | |
| 1002 | 2017-03-01 | 453 | |
| 1002 | 2017-04-01 | 838 | |
| 1003 | 2017-01-01 | 746 | |
| 1003 | 2017-02-01 | 242 | |
| 1003 | 2017-03-01 | 432 | |
| 1004 | 2017-01-01 | 431 | |
| 1004 | 2017-02-01 | 113 | |
+------------+------------+-------+-------------+
I want my table to look like below:
+------------+------------+-------+-------------+
| CustomerID | Date | Value | Calculation |
+------------+------------+-------+-------------+
| 1001 | 2016-08-01 | 123 | NULL |
| 1001 | 2016-09-01 | 434 | NULL |
| 1001 | 2016-10-01 | 423 | 980 |
| 1001 | 2016-11-01 | 235 | 1092 |
| 1001 | 2016-12-01 | 432 | 1090 |
| 1001 | 2017-01-01 | 312 | 979 |
| 1001 | 2017-02-01 | 492 | 1236 |
| 1001 | 2017-03-01 | 993 | 1797 |
| 1002 | 2017-01-01 | 838 | NULL |
| 1002 | 2017-02-01 | 234 | NULL |
| 1002 | 2017-03-01 | 453 | 1525 |
| 1002 | 2017-04-01 | 838 | 1525 |
| 1003 | 2017-01-01 | 746 | NULL |
| 1003 | 2017-02-01 | 242 | NULL |
| 1003 | 2017-03-01 | 432 | 1420 |
| 1004 | 2017-01-01 | 431 | NULL |
| 1004 | 2017-02-01 | 113 | NULL |
+------------+------------+-------+-------------+
I hope you can help me with this! 😉
--First create the test table
create table Testtable(
    CustomerID int,
    Date date,
    Value int
)
--Insert test values
insert into Testtable VALUES(1001,'2016-08-01',123),
(1001,'2016-09-01',434),
(1001,'2016-10-01',423),
(1001,'2016-11-01',235),
(1001,'2016-12-01',432),
(1001,'2017-01-01',312),
(1001,'2017-02-01',492),
(1001,'2017-03-01',993),
(1002,'2017-01-01',838),
(1002,'2017-02-01',234),
(1002,'2017-03-01',453),
(1002,'2017-04-01',838),
(1003,'2017-01-01',746),
(1003,'2017-02-01',242),
(1003,'2017-03-01',432),
(1004,'2017-01-01',431),
(1004,'2017-02-01',113);
--Select query: the inner subquery sums the current and previous 2 months;
--the outer CASE returns NULL when fewer than 2 earlier rows exist for the customer
SELECT
    CustomerID,
    Date,
    Value,
    CASE WHEN (SELECT COUNT(*) FROM Testtable T4
               WHERE T4.CustomerID = T3.CustomerID AND T4.Date < T3.Date) < 2
         THEN NULL
         ELSE Calculation
    END AS Calculation
FROM
    (SELECT
         *,
         (SELECT SUM(T2.Value) FROM Testtable T2
          WHERE T.CustomerID = T2.CustomerID
            AND T2.Date BETWEEN DATEADD(month, -2, T.Date) AND T.Date) AS Calculation
     FROM Testtable T) AS T3
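For completeness, on SQL Server 2012 or later the correlated subqueries can be replaced with window functions. A sketch, assuming (as in the sample data) exactly one row per customer per month with no gaps:
-- Rolling 3-month sum per customer; NULL until a full window exists
SELECT
    CustomerID,
    Date,
    Value,
    CASE WHEN ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY Date) >= 3
         THEN SUM(Value) OVER (PARTITION BY CustomerID ORDER BY Date
                               ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
    END AS Calculation
FROM Testtable;
If months can be missing, the ROWS frame would silently sum whatever rows precede, so the date-arithmetic version above is safer in that case.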
This might take some trial and error to get completely correct, but I'll give it a shot. Try the query below:
SELECT CustomerID
     , Date
     , Value
     , Value +
       (SELECT Value FROM table_name WHERE CustomerID = x.CustomerID
        AND Date = DATEADD(m, -1, x.Date)) +
       (SELECT Value FROM table_name WHERE CustomerID = x.CustomerID
        AND Date = DATEADD(m, -2, x.Date)) AS Calculation
FROM table_name x
Note that this will only work if CustomerID/Date form a composite key in your table.
Hope this helps!

Return unique grouped rows with the latest timestamp [duplicate]

This question already has answers here:
Select first row in each GROUP BY group?
(20 answers)
Closed 3 years ago.
At the moment I'm struggling with a problem that looks very easy.
Table content:
Primary keys: Timestamp, COL_A, COL_B, COL_C, COL_D
+------------------+-------+-------+-------+-------+--------+--------+
| Timestamp | COL_A | COL_B | COL_C | COL_D | Data_A | Data_B |
+------------------+-------+-------+-------+-------+--------+--------+
| 31.07.2019 15:12 | - | - | - | - | 1 | 2 |
| 31.07.2019 15:32 | 1 | 1 | 100 | 1 | 5000 | 20 |
| 10.08.2019 09:33 | - | - | - | - | 1000 | 7 |
| 31.07.2019 15:38 | 1 | 1 | 100 | 1 | 33 | 5 |
| 06.08.2019 08:53 | - | - | - | - | 0 | 7 |
| 06.08.2019 09:08 | - | - | - | - | 0 | 7 |
| 06.08.2019 16:06 | 3 | 3 | 3 | 3 | 0 | 23 |
| 07.08.2019 10:43 | - | - | - | - | 0 | 42 |
| 07.08.2019 13:10 | - | - | - | - | 0 | 24 |
| 08.08.2019 07:19 | 11 | 111 | 111 | 12 | 0 | 2 |
| 08.08.2019 10:54 | 2334 | 65464 | 565 | 76 | 1000 | 19 |
| 08.08.2019 11:15 | 232 | 343 | 343 | 43 | 0 | 2 |
| 08.08.2019 11:30 | 2323 | rtttt | 3434 | 34 | 0 | 2 |
| 10.08.2019 14:47 | - | - | - | - | 123 | 23 |
+------------------+-------+-------+-------+-------+--------+--------+
Needed query output:
+------------------+-------+-------+-------+-------+--------+--------+
| Timestamp | COL_A | COL_B | COL_C | COL_D | Data_A | Data_B |
+------------------+-------+-------+-------+-------+--------+--------+
| 31.07.2019 15:38 | 1 | 1 | 100 | 1 | 33 | 5 |
| 06.08.2019 16:06 | 3 | 3 | 3 | 3 | 0 | 23 |
| 08.08.2019 07:19 | 11 | 111 | 111 | 12 | 0 | 2 |
| 08.08.2019 10:54 | 2334 | 65464 | 565 | 76 | 1000 | 19 |
| 08.08.2019 11:15 | 232 | 343 | 343 | 43 | 0 | 2 |
| 08.08.2019 11:30 | 2323 | rtttt | 3434 | 34 | 0 | 2 |
| 10.08.2019 14:47 | - | - | - | - | 123 | 23 |
+------------------+-------+-------+-------+-------+--------+--------+
As you can see, I'm trying to get a single row per key combination, using the latest timestamp, even though the timestamp is itself part of the primary key.
Currently, I'm trying a query like:
SELECT Timestamp, COL_A, COL_B, COL_C, COL_D, Data_A, Data_B FROM XY op
WHERE Timestamp = (
    SELECT MAX(Timestamp) FROM XY AS tsRow
    WHERE op.COL_A = tsRow.COL_A
      AND op.COL_B = tsRow.COL_B
      AND op.COL_C = tsRow.COL_C
      AND op.COL_D = tsRow.COL_D
);
which gives me a result that looks fine at first glance.
Is there a better or safer way to get my preferred result?
You can use the DISTINCT ON clause, which gives you the first record of each ordered group. Here your group is (COL_A, COL_B, COL_C, COL_D), ordered by the Timestamp column in descending order so that the most recent record comes first.
SELECT DISTINCT ON ("COL_A", "COL_B", "COL_C", "COL_D")
*
FROM
mytable
ORDER BY "COL_A", "COL_B", "COL_C", "COL_D", "Timestamp" DESC
If you want to get your expected order, you need a second ORDER BY after this operation:
SELECT
*
FROM (
SELECT DISTINCT ON ("COL_A", "COL_B", "COL_C", "COL_D")
*
FROM
mytable
ORDER BY "COL_A", "COL_B", "COL_C", "COL_D", "Timestamp" DESC
) s
ORDER BY "Timestamp"
Note: if the Timestamp column is part of the PK, are you sure you really need the four other columns in the PK as well? It seems that the Timestamp column alone is already unique.
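For reference, a portable sketch of the same query using the standard ROW_NUMBER() window function, in case the PostgreSQL-only DISTINCT ON syntax is a concern (same mytable as above):
-- Number rows within each key group, newest first, then keep the first of each
SELECT "Timestamp", "COL_A", "COL_B", "COL_C", "COL_D", "Data_A", "Data_B"
FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY "COL_A", "COL_B", "COL_C", "COL_D"
                              ORDER BY "Timestamp" DESC) AS rn
    FROM mytable
) s
WHERE rn = 1
ORDER BY "Timestamp";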

PostgreSQL Crosstab generate_series of weeks for columns

From a table of "time entries" I'm trying to create a report of weekly totals for each user.
Sample of the table:
+-----+---------+-------------------------+--------------+
| id | user_id | start_time | hours_worked |
+-----+---------+-------------------------+--------------+
| 997 | 6 | 2018-01-01 03:05:00 UTC | 1.0 |
| 996 | 6 | 2017-12-01 05:05:00 UTC | 1.0 |
| 998 | 6 | 2017-12-01 05:05:00 UTC | 1.5 |
| 999 | 20 | 2017-11-15 19:00:00 UTC | 1.0 |
| 995 | 6 | 2017-11-11 20:47:42 UTC | 0.04 |
+-----+---------+-------------------------+--------------+
Right now I can run the following and basically get what I need
SELECT COALESCE(SUM(time_entries.hours_worked), 0) AS total,
       time_entries.user_id,
       week::date
--Using generate_series here to account for weeks with no time entries
--when doing the join
FROM generate_series(DATE_TRUNC('week', '2017-11-01 00:00:00'::date),
                     DATE_TRUNC('week', '2017-12-31 23:59:59.999999'::date),
                     interval '7 day') AS week
LEFT JOIN time_entries
       ON DATE_TRUNC('week', time_entries.start_time) = week
GROUP BY week, time_entries.user_id
ORDER BY week
This will return
+-------+---------+------------+
| total | user_id | week |
+-------+---------+------------+
| 14.08 | 5 | 2017-10-30 |
| 21.92 | 6 | 2017-10-30 |
| 10.92 | 7 | 2017-10-30 |
| 14.26 | 8 | 2017-10-30 |
| 14.78 | 10 | 2017-10-30 |
| 14.08 | 13 | 2017-10-30 |
| 15.83 | 15 | 2017-10-30 |
| 8.75 | 5 | 2017-11-06 |
| 10.53 | 6 | 2017-11-06 |
| 13.73 | 7 | 2017-11-06 |
| 14.26 | 8 | 2017-11-06 |
| 19.45 | 10 | 2017-11-06 |
| 15.95 | 13 | 2017-11-06 |
| 14.16 | 15 | 2017-11-06 |
| 1.00 | 20 | 2017-11-13 |
| 0 | | 2017-11-20 |
| 2.50 | 6 | 2017-11-27 |
| 0 | | 2017-12-04 |
| 0 | | 2017-12-11 |
| 0 | | 2017-12-18 |
| 0 | | 2017-12-25 |
+-------+---------+------------+
However, this is difficult to parse, particularly when there's no data for a week. What I would like is a pivot or crosstab table where the weeks are the columns and the rows are the users, including nulls from each side (for instance, when a user had no entries in a week, or a week had no entries from any user).
Something like this
+---------+---------------+--------------+--------------+
| user_id | 2017-10-30 | 2017-11-06 | 2017-11-13 |
+---------+---------------+--------------+--------------+
| 6 | 4.0 | 1.0 | 0 |
| 7 | 4.0 | 1.0 | 0 |
| 8 | 4.0 | 0 | 0 |
| 9 | 0 | 1.0 | 0 |
| 10 | 4.0 | 0.04 | 0 |
+---------+---------------+--------------+--------------+
I've been looking around online, and it seems that "dynamically" generating a list of columns for crosstab is difficult. I'd rather not hard-code them, which seems weird to do anyway for dates, or use something like a CASE per week number.
Should I look for another solution besides crosstab? If I could get the series of weeks for each user, including all the nulls, I think that would be good enough. It just seems that right now my join strategy isn't returning that.
Personally, I would use a date dimension table and use that table as the basis for the query. I find it far easier to use tabular data for these types of calculations, as it leads to SQL that's easier to read and maintain. There's a great article on creating a date dimension table in PostgreSQL at https://medium.com/@duffn/creating-a-date-dimension-table-in-postgresql-af3f8e2941ac, though you could get away with a much simpler version of this table.
Ultimately, you would use the date table as the base of the SELECT cols FROM table section and then join against it, probably via common table expressions, to build the calculations.
If you would like, I'll write up a solution demonstrating how you could create such a query.
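As a starting point, a minimal sketch (assuming the time_entries table from the question) that cross-joins the generated weeks with the distinct users before the LEFT JOIN, so every user gets a row for every week, with a zero total where there are no entries. This is the shape you said would be good enough, and a ready input for a pivot:
WITH weeks AS (
    -- one row per week in the reporting window
    SELECT week::date AS week
    FROM generate_series(DATE_TRUNC('week', DATE '2017-11-01'),
                         DATE_TRUNC('week', DATE '2017-12-31'),
                         INTERVAL '7 day') AS week
),
users AS (
    SELECT DISTINCT user_id FROM time_entries
)
SELECT u.user_id,
       w.week,
       COALESCE(SUM(te.hours_worked), 0) AS total
FROM weeks w
CROSS JOIN users u           -- every (user, week) pair exists before the join
LEFT JOIN time_entries te
       ON te.user_id = u.user_id
      AND DATE_TRUNC('week', te.start_time)::date = w.week
GROUP BY u.user_id, w.week
ORDER BY u.user_id, w.week;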

Crosstab function and Dates PostgreSQL

I have to create a crosstab table from a query in which dates are turned into column names. The number of these order-date columns can increase or decrease depending on the dates passed to the query. The order date is stored in Unix format and is converted to a normal date.
The query is the following:
Select cd.cust_id
     , od.order_id
     , od.order_size
     , (TIMESTAMP 'epoch' + od.order_date * INTERVAL '1 second')::Date As order_date
From consumer_details cd
   , consumer_order od
Where cd.cust_id = od.cust_id
  And od.order_date Between 1469212200 And 1469212600
Order By od.order_id, od.order_date
The query returns the following table:
cust_id | order_id | order_size | order_date
-----------|----------------|---------------|--------------
210721008 | 0437756 | 4323 | 2016-07-22
210721008 | 0437756 | 4586 | 2016-09-24
210721019 | 10749881 | 0 | 2016-07-28
210721019 | 10749881 | 0 | 2016-07-28
210721033 | 13639 | 2286145 | 2016-09-06
210721033 | 13639 | 2300040 | 2016-10-03
The desired crosstab result would be:
cust_id | order_id | 2016-07-22 | 2016-09-24 | 2016-07-28 | 2016-09-06 | 2016-10-03
-----------|----------------|---------------|---------------|---------------|---------------|---------------
210721008 | 0437756 | 4323 | 4586 | | |
210721019 | 10749881 | | | 0 | |
210721033 | 13639 | | | | 2286145 | 2300040
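A minimal sketch of the tablefunc crosstab approach, assuming the tablefunc extension is available and the five dates are known in advance. crosstab cannot emit a truly dynamic column list, so for varying date ranges this statement itself would have to be generated in a second step (in a function or in application code). The column types below are guesses from the sample data:
CREATE EXTENSION IF NOT EXISTS tablefunc;

SELECT *
FROM crosstab(
    -- source: row_name (cust_id), extra (order_id), category (date), value
    $$ SELECT cd.cust_id, od.order_id,
              (TIMESTAMP 'epoch' + od.order_date * INTERVAL '1 second')::date,
              od.order_size
       FROM consumer_details cd
       JOIN consumer_order od ON cd.cust_id = od.cust_id
       WHERE od.order_date BETWEEN 1469212200 AND 1469212600
       ORDER BY 1, 2 $$,
    -- categories: one row per output date column, in column order
    $$ SELECT unnest(ARRAY['2016-07-22','2016-09-24','2016-07-28',
                           '2016-09-06','2016-10-03']::date[]) $$
) AS ct(cust_id bigint, order_id text,
        "2016-07-22" bigint, "2016-09-24" bigint, "2016-07-28" bigint,
        "2016-09-06" bigint, "2016-10-03" bigint);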