Transpose a matrix in PostgreSQL without using crosstab

To share some insights automatically in a Slack channel, I need to create a transposed matrix from the input data I have. Below you can find the input and the desired output (made in Excel).
Please note that I cannot use the tablefunc extension.

You can use filtered aggregation, as shown in this answer:
select cat,
       max(lr_perc) filter (where day = 1) as "1",
       max(lr_perc) filter (where day = 2) as "2",
       max(lr_perc) filter (where day = 3) as "3"
       -- ...: add one such column (comma-separated) for each remaining day
from the_table
group by cat;
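To make the semantics of the filtered aggregates concrete, here is a small Python model of the query (an illustration, not PostgreSQL itself). Each output column keeps the largest lr_perc among the rows whose day matches, and a category with no row for some day gets None where the SQL would return NULL. The row shape, the sample data, and the helper name pivot_max are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

Row = Tuple[str, int, float]  # (cat, day, lr_perc): a hypothetical row shape


def pivot_max(rows: Iterable[Row], days: List[int]) -> Dict[str, List[Optional[float]]]:
    """Model of: select cat, max(lr_perc) filter (where day = d) ... group by cat."""
    groups: Dict[str, Dict[int, Optional[float]]] = defaultdict(
        lambda: {d: None for d in days}
    )
    for cat, day, lr_perc in rows:
        cols = groups[cat]  # every cat gets an output row, as with GROUP BY
        if day not in cols:
            continue  # no output column for this day
        # max() ignores NULLs, so the first value seen simply becomes the max
        if cols[day] is None or lr_perc > cols[day]:
            cols[day] = lr_perc
    return {cat: [cols[d] for d in days] for cat, cols in groups.items()}


rows = [("a", 1, 0.5), ("a", 2, 0.7), ("b", 1, 0.2), ("b", 3, 0.9), ("a", 1, 0.6)]
print(pivot_max(rows, [1, 2, 3]))
# {'a': [0.6, 0.7, None], 'b': [0.2, None, 0.9]}
```

Each additional day means one more filtered column, so this approach fits a known, fixed set of days.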

Related

Azure Data Factory: return an array of dates from a specified range

I'm trying to return an array of dates in Data Factory, but I just want the user to specify a date range with two parameters, startDate and endDate.
For example, I want to return this array by specifying "12-08-2020" and "12-13-2020" in the trigger:
["12-08-2020","12-09-2020","12-10-2020","12-11-2020","12-12-2020","12-13-2020"]
I have not found a simple way to do it yet.
One way I thought about would be to:
add a Lookup activity on a date dimension,
then add two filters to select only items greater than startDate and lower than endDate.
But this seems cumbersome and overkill. Is there a simpler way to do it?
EDIT:
This answer seems to be relevant (I did not see it at first): Execute azure data factory foreach activity with start date and end date
I think we can use a recursive query in a Lookup activity.
The approach is as follows.
In SQL Server, we can use this recursive CTE to get a table of dates:
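;with temp as
(
    -- anchor member: the start date, kept as an mm-dd-yyyy string
    select CONVERT(varchar(100), '12-08-2020', 110) as dt
    union all
    -- recursive member: add one day while the result is not after the end date
    select CONVERT(varchar(100), DATEADD(day, 1, dt), 110) from temp
    where datediff(day, CONVERT(varchar(100), DATEADD(day, 1, dt), 110), '12-13-2020') >= 0
)
select * from temp
-- the default recursion limit is 100; for longer ranges append OPTION (MAXRECURSION 0)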
The result is as follows:
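That is, one row per day, formatted mm-dd-yyyy, from 12-08-2020 through 12-13-2020. The recursive CTE emits the anchor date and then keeps adding one day while the next date is still on or before the end date. As a rough illustration of that loop (not of how SQL Server executes it), the Python sketch below uses datetime with the format string %m-%d-%Y in place of CONVERT style 110; the function name date_range_mdy is hypothetical.

```python
from datetime import datetime, timedelta
from typing import List

FMT = "%m-%d-%Y"  # same layout as CONVERT(..., 110): mm-dd-yyyy


def date_range_mdy(start: str, end: str) -> List[str]:
    """Inclusive list of dates between start and end, both given as mm-dd-yyyy."""
    current = datetime.strptime(start, FMT)  # anchor member
    last = datetime.strptime(end, FMT)
    out = [current.strftime(FMT)]
    # recursive member: emit the next day only while datediff(day, next, end) >= 0
    while (last - (current + timedelta(days=1))).days >= 0:
        current += timedelta(days=1)
        out.append(current.strftime(FMT))
    return out


print(date_range_mdy("12-08-2020", "12-13-2020"))
# ['12-08-2020', '12-09-2020', '12-10-2020', '12-11-2020', '12-12-2020', '12-13-2020']
```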
So in ADF, I think we can use a SQL query in a Lookup activity to return the result you want.
According to this official document, we only need to replace the parameters in the SQL statement.
Next, I will use '@{pipeline().parameters.startDate}' to return a date string. Note: the expression is wrapped in a pair of single quotes.
I set two parameters as follows:
Type the following code into a Lookup activity.
;with temp as
(
    select CONVERT(varchar(100), '@{pipeline().parameters.startDate}', 110) as dt
    union all
    select CONVERT(varchar(100), DATEADD(day, 1, dt), 110) from temp
    where datediff(day, CONVERT(varchar(100), DATEADD(day, 1, dt), 110), '@{pipeline().parameters.endDate}') >= 0
)
select * from temp
Make sure "First row only" is not selected; otherwise the Lookup returns only the first date.
The debug result is as follows:
I had a similar use case and ended up using an Until activity with a few changes.
The pipeline takes two parameters, start_day and end_day.
I also had to introduce two variables to implement the counter logic; more details can be found at how-to-increment-parameter-data-factory.
Finally, the expression in the Until block is:
@less(int(adddays(pipeline().parameters.end_day, 0, 'yyyyMMdd')), int(adddays(pipeline().parameters.start_day, int(variables('counter')), 'yyyyMMdd')))
A final note: the Until block keeps executing while the expression returns false and exits the loop once it returns true.
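As I understand it, Until behaves like a do-until loop: the inner activities run, then the condition is checked, and the loop stops once it is true. The Python sketch below models that with the expression above, comparing yyyyMMdd strings as integers. The loop body (collecting one date per iteration) and the names yyyymmdd and until_dates are hypothetical, and the single counter stands in for the two ADF variables.

```python
from datetime import date, timedelta
from typing import List


def yyyymmdd(d: date) -> int:
    """Like int(adddays(..., 'yyyyMMdd')): a date as a sortable integer."""
    return int(d.strftime("%Y%m%d"))


def until_dates(start_day: date, end_day: date) -> List[int]:
    counter = 0  # stands in for variables('counter')
    visited: List[int] = []
    while True:
        # hypothetical loop body: work with the date start_day + counter
        visited.append(yyyymmdd(start_day + timedelta(days=counter)))
        counter += 1  # the answer uses two ADF variables to perform this increment
        # Until condition: less(end_day, start_day + counter); true exits the loop
        if yyyymmdd(end_day) < yyyymmdd(start_day + timedelta(days=counter)):
            break
    return visited


print(until_dates(date(2020, 12, 8), date(2020, 12, 13)))
# [20201208, 20201209, 20201210, 20201211, 20201212, 20201213]
```

Because the check happens after the body in this model, the start day is always processed, even when start_day is later than end_day.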
I managed to get something similar to work with a derived column transformation that uses the mapLoop() function, followed by a flatten transformation.
The derived column expression first calculates an array of dates in a single column.
mapLoop(toInteger((To_Date - From_Date)/86400000), toDate(addDays(From_Date, #index)))
where 86400000 is the number of milliseconds in 24 hours.
The flatten transformation uses this column to unroll the array into separate rows.
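A Python model of those two steps, assuming From_Date and To_Date are timestamps (so their difference is in milliseconds) and that #index counts from 1, as in the mapLoop examples in the data flow expression documentation. Under those assumptions the array runs from the day after From_Date through To_Date; if From_Date itself is needed, the loop count and the offset would have to be adjusted. The helpers map_loop and derive_and_flatten and the output column name Date are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Callable, Dict, List

MS_PER_DAY = 86_400_000  # milliseconds in 24 hours


def map_loop(n: int, expr: Callable[[int], object]) -> List[object]:
    """Mimic mapLoop(n, expr), assuming #index runs 1 .. n."""
    return [expr(index) for index in range(1, n + 1)]


def derive_and_flatten(rows: List[Dict[str, datetime]]) -> List[Dict[str, object]]:
    out: List[Dict[str, object]] = []
    for row in rows:
        start = row["From_Date"]
        diff_ms = (row["To_Date"] - start) / timedelta(milliseconds=1)
        n = int(diff_ms / MS_PER_DAY)  # toInteger((To_Date - From_Date)/86400000)
        # derived column: an array of dates held in a single column
        dates = map_loop(n, lambda i: (start + timedelta(days=i)).date())
        # flatten: unroll the array into one output row per element
        out.extend({**row, "Date": d} for d in dates)
    return out


rows = [{"From_Date": datetime(2020, 12, 8), "To_Date": datetime(2020, 12, 13)}]
print([r["Date"] for r in derive_and_flatten(rows)])
```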

Calculate sum of unique tag values

Sometimes one has to calculate the SUM of unique TAG values in InfluxDB. How can this be done?
For example, I have multiple users who download software, and I want to find out how many unique users downloaded it.
The following query was tested in Grafana; it counts unique users and also respects the time filter applied to the database.
To do this, we first apply a subquery that calculates mean values; this results in a table with the value 1 for each user:
SELECT mean("count") FROM "autogen"."downloads" WHERE $timeFilter GROUP BY "username"
Here count is an integer field equal to 1 each time a user downloads the software.
Then we can calculate the sum of these mean values. Yes, this is not cheap on a huge database, but it is still a working solution:
SELECT SUM(mean) FROM (
SELECT mean("count") FROM "autogen"."downloads" WHERE $timeFilter GROUP BY "username"
)
Please feel free to propose a better-performing or more native solution; that would be nice to apply to larger databases.
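The reason the two-step query works: every point's count is 1, so each user's mean is exactly 1.0, and summing one 1.0 per user gives the number of distinct users in the time window. A Python model of the inner and outer queries, using hypothetical (username, count) points rather than a real InfluxDB client:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Point = Tuple[str, int]  # (username tag, count field): a hypothetical shape


def unique_users(points: Iterable[Point]) -> float:
    # inner query: SELECT mean("count") ... GROUP BY "username"
    per_user: Dict[str, List[int]] = defaultdict(list)
    for username, count in points:
        per_user[username].append(count)
    means = [sum(vals) / len(vals) for vals in per_user.values()]
    # outer query: SELECT SUM(mean) FROM (...)
    return sum(means)


points = [("alice", 1), ("bob", 1), ("alice", 1), ("carol", 1), ("bob", 1)]
print(unique_users(points))  # 3.0
```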

Group by month using cell above filtered cells

I have a spreadsheet that looks like this:
https://docs.google.com/spreadsheets/d/1b29gyEgCDwor_KJ6ACP2rxdvauOzacDI9FL2K-jgg5E/edit?usp=sharing
I have two columns I'm interested in, Date and Count. Every few dates, there is a "TOTAL" row in which all the counts belonging to that TOTAL are summed.
I want an output that looks like the cells to the right, where all the TOTAL counts are summed by month. The problem is that column A holds either a date or TOTAL, in separate rows, and this layout can't be changed. That leaves me thinking I need to reference the cell directly above each TOTAL in column A, which holds the month I want to group that TOTAL by.
I can't just filter column A by date range because the sheet is used inconsistently: sometimes the count is only entered in the TOTAL row.
I've searched the internet, exploring FILTER, INDIRECT, QUERY, SUMIFS, etc., but can't find exactly how to do this.
I can easily filter column B where A:A="TOTAL", but what I think I need to do after that is use the cell above each A:A="TOTAL" row as the month criterion, somehow using what I found here: https://exceljet.net/formula/sum-by-month, expressed as ">="&D3 and "<="&EOMONTH(D3,0).
Any help or alternatives would be appreciated. Thank you.
Or a different (offset) approach, which pairs each value in column B with the column A cell one row above it:
=QUERY(FILTER({EOMONTH(INDIRECT("A1:A"&ROWS(B2:B)), 0), B2:B}, A2:A="total"),
"select Col1,sum(Col2)
group by Col1
label sum(Col2)''
format Col1'mmmm'", 0)
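Why the offset works: INDIRECT("A1:A"&ROWS(B2:B)) starts one row higher than B2:B, so each value in column B is paired with the column A cell directly above it. EOMONTH turns that date into a month-end key, FILTER keeps only the rows whose column A is "total" (Sheets' = comparison ignores case), and QUERY sums per key. A Python model of that pipeline on hypothetical (A, B) rows; the helper names are made up for illustration.

```python
from calendar import monthrange
from collections import defaultdict
from datetime import date
from typing import Dict, List, Tuple, Union

Row = Tuple[Union[date, str], float]  # (column A, column B)


def eomonth(d: date) -> date:
    """Like EOMONTH(d, 0): the last day of d's month."""
    return date(d.year, d.month, monthrange(d.year, d.month)[1])


def totals_by_month(rows: List[Row]) -> Dict[date, float]:
    sums: Dict[date, float] = defaultdict(float)
    # zip(rows, rows[1:]) pairs every row with the row directly above it
    for (above, _), (a, b) in zip(rows, rows[1:]):
        is_total = isinstance(a, str) and a.lower() == "total"
        if is_total and isinstance(above, date):  # here we simply skip non-date cells
            sums[eomonth(above)] += b
    return dict(sums)


rows = [
    (date(2020, 1, 5), 3), (date(2020, 1, 20), 2), ("TOTAL", 5),
    (date(2020, 2, 3), 4), ("TOTAL", 4),
    (date(2020, 2, 25), 0), ("TOTAL", 6),
]
print(totals_by_month(rows))
```

Since the key is a full month-end date, the same month in different years stays separate; only the 'mmmm' display format hides the year.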
QUERY is great for these kinds of situations, but grouping by month will cause issues if you plan on looking at multi-year data, because the same month from different years ends up in one group:
=arrayformula(QUERY(QUERY({row(A:A),TEXT(A:A,"MMMM"),B:B},"SELECT max(Col1),Col2,sum(Col3) where Col3 is not null group by Col2 order by max(Col1) label Col2 'Month', sum(Col3) 'Count'"),"SELECT Col2,Col3"))
try:
=ARRAYFORMULA(QUERY({IF((B2:B="")*(A2:A=""),,VLOOKUP(ROW(A2:A),
IF(A2:A<>"total", {ROW(A2:A), DATEVALUE("01/"&MONTH(A2:A)&"/2000")}), 2, 1)),
IF(A2:A= "total", A2:A, ), B2:B},
"select Col1,sum(Col3)
where lower(Col2) = 'total'
group by Col1
label sum(Col3)''
format Col1'mmmm'", 0))
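In this version, VLOOKUP(ROW(A2:A), ..., 2, 1) is an approximate (sorted) match on row numbers over the rows that are not TOTAL, so each TOTAL row picks up the month of the closest row at or above it; DATEVALUE("01/"&MONTH(A2:A)&"/2000") pins every month to the year 2000 (which assumes a dd/mm locale). In effect, the latest month is carried down to each TOTAL row. A simplified Python model of that carry-forward, with hypothetical rows and with only date cells treated as month sources:

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, Optional, Tuple, Union

Row = Tuple[Union[date, str, None], float]  # (column A, column B)


def totals_by_month_number(rows: List[Row]) -> Dict[int, float]:
    sums: Dict[int, float] = defaultdict(float)
    last_month: Optional[int] = None  # month of the nearest date row above
    for a, b in rows:
        if isinstance(a, date):
            last_month = a.month  # the year is dropped, like the fixed /2000
        elif isinstance(a, str) and a.lower() == "total" and last_month is not None:
            sums[last_month] += b  # where lower(Col2) = 'total'
    return dict(sums)


rows = [
    (date(2020, 1, 5), 3), ("TOTAL", 3),
    (date(2020, 2, 3), 4), (None, 0), ("TOTAL", 4),
    (date(2021, 2, 9), 2), ("TOTAL", 2),
]
print(totals_by_month_number(rows))  # {1: 3.0, 2: 6.0}
```

The two Februaries are merged, which is the multi-year caveat mentioned in the previous answer.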

Google Sheets Query - Sort By Date; Blanks/Null to the bottom

I'm running into an issue with a query in Google Sheets. The results of the array-formula query are correct, but the column used to order the results (Col1) contains both blank/null cells and dates. As a result, when ordering by this column, the blank/null values are listed before the dates. Is it possible to rank the dates first and push the blank/null cells to the bottom?
Ordering with DESC will not work, as I want the earlier dates listed first. Additionally, the blank/null cells cannot be excluded from the results (they correspond to tasks without deadlines, which must still be listed).
The formula I am currently using is:
=ARRAYFORMULA(QUERY({DATA RANGE},"SELECT Col1 WHERE Col2 = X OR Col3 = X ORDER BY Col1 LIMIT 10",0))
It seems like there should be an easy way to achieve this, but I cannot find anything on the topic in other forums. Any help would be greatly appreciated.
Use SORT()
I believe for your example you could make it work like so:
=SORT(ARRAYFORMULA(QUERY({DATA RANGE},"SELECT Col1 WHERE Col2 = X OR Col3 = X",0)), 1, 1) (untested)
If your LIMIT 10 is important, then I think you could wrap the whole thing in another query and re-add the LIMIT.
Illustrated Example:
Range That Needs Querying and Sorting
Formula
Simple version defining a range in which the header is omitted:
=SORT(QUERY(A2:B7, "select *"), 1, 1)
Version that handles headers:
={A1:B1;SORT(QUERY(tabname!A2:B7, "select *"), 1, 1)}
This version creates an array combining the header row and the data rows so it can sort the data rows independently of the header.
Queried and Sorted Results
Breakdown of Formula Components
Array: {[range 1]; [range 2]}
SORT(): SORT([range], [column to sort on], [sort ascending: TRUE/FALSE or 1/0])
QUERY(): QUERY([range], "[query]")
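The answer relies on SORT placing blank cells after the dates in an ascending sort, which QUERY's ORDER BY does not do here. The same "dates first, blanks last" ordering can be expressed in Python with a two-part sort key; the task rows and the helper name sort_blanks_last are hypothetical, and limit plays the role of re-adding LIMIT 10 in an outer query.

```python
from datetime import date
from typing import List, Optional, Tuple

Task = Tuple[Optional[date], str]  # (deadline or None, task name)


def sort_blanks_last(tasks: List[Task], limit: Optional[int] = None) -> List[Task]:
    """Ascending by deadline, with blank (None) deadlines pushed to the bottom."""
    # (is_blank, deadline): False sorts before True, so dated tasks come first;
    # date.min is only a placeholder so blank rows still have a comparable key
    ordered = sorted(tasks, key=lambda t: (t[0] is None, t[0] or date.min))
    return ordered if limit is None else ordered[:limit]


tasks = [(None, "x"), (date(2021, 3, 1), "b"), (date(2021, 1, 5), "a"), (None, "y")]
print([name for _, name in sort_blanks_last(tasks)])  # ['a', 'b', 'x', 'y']
```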

Pivot charts in Google Sheets by counting non-numeric data?

I have a dataset that I'd like to summarize in chart form. There are about 30 categories whose counts I'd like to display in a bar chart, from about 300+ responses. I think a pivot table is probably the best way to do this, but when I create a pivot table and select multiple columns, each newly added column is nested as a subset of the previous one. My data looks something like the following:
ID  Country  Age    thingA  thingB  thingC  thingD  thingE  thingF
1   US       5-9            thB             thD             thF
2   FI       5-9    thA                                     thF
3   GA       5-9    thA                                     thF
4   US       10-14                  thC
5   US       10-14          thB                             thF
6   US       15-18
7   BR       5-9    thA
8   US       15-18                          thD             thF
9   FI       10-14  thA
So I'd like to create an interactive chart that shows the counts of "thing" items, and then filter it by demographic data (e.g., Country, Age). Note that the data is non-numeric, so I have to use COUNTA to see how many there are in each category.
Is there a simple way to display chart data that summarizes the counts and will allow me to filter based on different criteria?
QUERY can summarize the data in the form you want. The fact that you have "thA", "thB", etc. instead of "1" complicates matters, but the strings can be transformed into numeric data on the fly.
Assuming the data you've shown is in the cells A1:I10, the following formula will summarize it:
=query({B2:C10, arrayformula(if(len(D2:I10), 1, 0))}, "select Col1, Col2, count(Col3), sum(Col3), sum(Col4), sum(Col5), sum(Col6), sum(Col7), sum(Col8) group by Col1, Col2", 0)
Explanation:
{B2:C10, arrayformula(if(len(D2:I10), 1, 0))} creates a table where the first two columns are your B and C (Country, Age) and the other six are filled with 1 or 0 depending on whether the corresponding cells in D-I are filled.
select Col1, Col2, count(Col3), sum(Col3), ... group by Col1, Col2 selects Country, Age, the total count of rows with this Country-Age combination, the number of rows with thingA for this Country-Age combination, etc.
the last argument, 0, indicates there are no header rows in the table passed to the query.
It's possible to give labels to the columns returned by the query using the label clause (see the Query Language documentation). It would be something like:
label Col1 'Country', Col2 'Age', count(Col3) 'Total count', sum(Col3) 'thingA count', ...
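To see what the formula computes, the Python sketch below repeats its two steps on the sample table from the question: turn each non-empty thing cell into 1 and each empty one into 0, then group by (Country, Age), counting rows and summing each column. Groups are returned sorted, roughly as QUERY's group by returns them; the function name summarize is hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# (Country, Age, thingA .. thingF) from the question; "" marks an empty cell
ROWS = [
    ("US", "5-9", "", "thB", "", "thD", "", "thF"),
    ("FI", "5-9", "thA", "", "", "", "", "thF"),
    ("GA", "5-9", "thA", "", "", "", "", "thF"),
    ("US", "10-14", "", "", "thC", "", "", ""),
    ("US", "10-14", "", "thB", "", "", "", "thF"),
    ("US", "15-18", "", "", "", "", "", ""),
    ("BR", "5-9", "thA", "", "", "", "", ""),
    ("US", "15-18", "", "", "", "thD", "", "thF"),
    ("FI", "10-14", "thA", "", "", "", "", ""),
]


def summarize(rows: Sequence[Tuple[str, ...]]) -> Dict[Tuple[str, str], List[int]]:
    """Map (Country, Age) to [row count, thingA count, ..., thingF count]."""
    out: Dict[Tuple[str, str], List[int]] = {}
    for country, age, *things in rows:
        flags = [1 if len(t) else 0 for t in things]  # if(len(D2:I10), 1, 0)
        acc = out.setdefault((country, age), [0] * (1 + len(flags)))
        acc[0] += 1  # count(Col3): rows in this Country-Age group
        for i, flag in enumerate(flags, start=1):
            acc[i] += flag  # sum(Col3) .. sum(Col8): one count per thing column
    return dict(sorted(out.items()))


for key, counts in summarize(ROWS).items():
    print(key, counts)
```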
Add a Count column to your data with a "1" for every occurrence; this might solve your problem in the pivot table. I was just looking for a solution and thought of this. It's working for me now.