How to use a (repeating) aggregate function value with other columns from the table I use the aggregate function on - postgresql

Problem: I have to count the number of times a certain user has a certificate and then return the users name, his number of certificates and the difference between the maximum number of certificates across all users and this specific users number of certificates. I succeeded in the first part (getting the number of certificates) which I'll denote as $query$ (because I have a feeling my problem has something to do with aliasing).
So $query$ looks like this:
User |N_Certificates
Geoff 4
Ann 2
Lisa 0
And my end result should look like this:
User |N_Certificates |Difference
Geoff 4 0
Ann 2 2
Lisa 0 4
I tried this query:
SELECT Sub.name, Sub.N_Certificates,
MAX(Sub.N_Certificates)- Sub.Certificates AS Difference FROM ($_query_$) AS SUB
but it returned a error (because I was trying to use an aggregate function in combination with a column I was not grouping by) or a wrong result (notably, difference=0 for all columns).
I tried a contraption with INNER JOIN on another version of sub (same $query$ code with another alias) but it also didn't work (same reason). I could ofcourse hard code the max but I don't think that's a good solution. My about screen tells me I'm using version 1.18 of pg_Admin.

You can't do it in this way, SQL syntax doesn't allow this.
The easiest way is to use a subquery:
SELECT Sub.name, Sub.N_Certificates,
(SELECT MAX(Sub.N_Certificates) FROM ($_query_$))
-
Sub.Certificates AS Difference
FROM ($_query_$) AS SUB
You can also use a common table expression:
WITH some_alias AS(
SELECT * FROM ($_query_$)
)
SELECT name, N_Certificates.
(SELECT MAX(N_Certificates)FROM some_alias)
-
Certificates AS Difference
FROM some_alias
And you can use a windows function: http://www.postgresql.org/docs/9.1/static/tutorial-window.html
SELECT Sub.name, Sub.N_Certificates,
MAX(Sub.N_Certificates) OVER ()
-
Sub.Certificates AS Difference
FROM ($_query_$) AS SUB

Related

Counting several repeated values in a cloumn

I need to count how many times certain value occurs in a column
It looks like this (select * from znajezyki order by klos)
klos - is a serial number of a person (this value is repeated)
Now I need to create a query that will show me how many people knows 1,2,3 languages
If the same "klos" value is repeated 2 times that means that this person knows 2 languages if it occurs 3 times that means that pearson knows 3 languages and so on
I'd like my result to look something like this:
I tried refering to this post here but i cannot understand it and can't get it to work
I hope y'all can understand what I am talking about :)
First do the simple thing: count how many languages each person knows
SELECT klos, COUNT(*) AS langs
FROM znajezyki
GROUP BY klos
Then use that result in a subquery to count the people by how many languages they know:
SELECT langs, COUNT(*) AS persons
FROM (
SELECT klos, COUNT(*) AS langs
FROM znajezyki
GROUP BY klos
) AS temp
GROUP BY langs

How to dynamically pivot based on rows data and parameter value?

I am trying to pivot using crosstab function and unable to achieve for the requirement. Is there is a way to perform crosstab dynamically and also dynamic result set?
I have tried using crosstab built-in function and unable to meet my requirement.
select * from crosstab ('select item,cd, type, parts, part, cnt
from item
order by 1,2')
AS results (item text,cd text, SUM NUMERIC, AVG NUMERIC);
Sample Data:
ITEM CD TYPE PARTS PART CNT
Item 1 A AVG 4 1 10
Item 1 B AVG 4 2 20
Item 1 C AVG 4 3 30
Item 1 D AVG 4 4 40
Item 1 A SUM 4 1 10
Item 1 B SUM 4 2 20
Item 1 C SUM 4 3 30
Item 1 D SUM 4 4 40
Expected Results:
ITEM CD PARTS TYPE_1 CNT_1 TYPE_1 CNT_1 TYPE_2 CNT_2 TYPE_2 CNT_2 TYPE_3 CNT_3 TYPE_3 CNT_3 TYPE_4 CNT_4 TYPE_4 CNT_4
Item 1 A 4 AVG 10 SUM 10 AVG 20 SUM 20 AVG 30 SUM 30 AVG 40 SUM 40
The PARTS value is based on a parameter passed by the user. If the user passes 2 for example, there will be 4 rows in the result set (2 parts for AVG and 2 parts of SUM).
Can I achieve this requirement using CROSSTAB function or is there a custom SQL statement that need to be developed?
I'm not following your data, so I can't offer examples based on it. But I have been looking at pivot/cross-tab features over the past few days. I was just looking at dynamic cross tabs just before seeing your post. I'm hoping that your question gets some good answers, I'll start off with a bit of background.
You can use the crosstab extension for standard cross tabs, what when wrong when you tried it? Here's an example I wrote for myself the other day with a bunch of comments and aliases for clarity. The pivot is looking at item scans to see where the scans were "to", like the warehouse or the floor.
/* Basic cross-tab example for crosstab (text) format of pivot command.
Notice that the embedded query has to return three columns, see the aliases.
#1 is the row label, it shows up in the output.
#2 is the category, what determines how many columns there are. *You have to work this out in advance to declare them in the return.*
#3 is the cell data, what goes in the cross tabs. Note that this form of the crosstab command may return NULL, and coalesce does not work.
To get rid of the null count/sums/whatever, you need crosstab (text, text).
*/
select *
from crosstab ('select
specialty_name as row_label,
scanned_to as column_splitter,
count(num_inst)::numeric as cell_data
from scan_table
group by 1,2
order by 1,2')
as scan_pivot (
row_label citext,
"Assembly" numeric,
"Warehouse" numeric,
"Floor" numeric,
"QA" numeric);
As a manual alternative, you can use a series of FILTER statements. Here's an example that summaries errors_log records by day of the week. The "down" is the error name, the "across" (columns) are the days of the week.
select "error_name",
count(*) as "Overall",
count(*) filter (where extract(dow from "updated_dts") = 0) as "Sun",
count(*) filter (where extract(dow from "updated_dts") = 1) as "Mon",
count(*) filter (where extract(dow from "updated_dts") = 2) as "Tue",
count(*) filter (where extract(dow from "updated_dts") = 3) as "Wed",
count(*) filter (where extract(dow from "updated_dts") = 4) as "Thu",
count(*) filter (where extract(dow from "updated_dts") = 5) as "Fri",
count(*) filter (where extract(dow from "updated_dts") = 6) as "Sat"
from error_log
where "error_name" is not null
group by "error_name"
order by 1;
You can do the same thing with CASE, but FILTER is easier to write.
It looks like you want something basic, maybe the FILTER solution appeals? It's easier to read than calls to crosstab(), since that was giving you trouble.
FILTER may be slower than crosstab. Probably. (The crosstab extension is written in C, and I'm not sure how smart FILTER is about reading off indexes.) But I'm not sure as I haven't tested it out yet. (It's on my to do list, but I haven't had time yet.) I'd be super interested if anyone can offer results. We're on 11.4.
I wrote a client-side tool to build FILTER-based pivots over the past few days. You have to supply the down and across fields, an aggregate formula and the tool spits out the SQL. With support for coalesce for folks who don't want NULL, ROLLUP, TABLESAMPLE, view creation, and some other stuff. It was a fun project. Why go to that effort? (Apart from the fun part.) Because I haven't found a way to do dynamic pivots that I actually understand. I love this quote:
"Dynamic crosstab queries in Postgres has been asked many times on SO all involving advanced level functions/types. Consider building your needed query in application layer (Java, Python, PHP, etc.) and pass it in a Postgres connected query call. Recall SQL is a special-purpose, declarative type while app layers are general-purpose, imperative types." – Parfait
So, I wrote a tool to pre-calculate and declare the output columns. But I'm still curious about dynamic options in SQL. If that's of interest to you, have a look at these two items:
https://postgresql.verite.pro/blog/2018/06/19/crosstab-pivot.html
Flatten aggregated key/value pairs from a JSONB field?
Deep magic in both.

SUM the NUMC field in SELECT

I need to group a table by the sum of a NUMC-column, which unfortunately seems not to be possible with ABAP / OpenSQL.
My code looks like that:
SELECT z~anln1
FROM zzanla AS z
INTO TABLE gt_
GROUP BY z~anln1 z~anln2
HAVING SUM( z~percent ) <> 100 " percent unfortunately is a NUMC -> summing up not possible
What would be the best / easiest practices here as I cannot alter the table itself?
Unfortunately the NUMC type is described as numerical text, so at the end it lands in the database as VARCHAR and that is why the functions like SUM or AVG cannot be used.
It all depends on how big your table is. If it is rather small you could get the group fields and the values for sum into an internal table and then sum it using COLLECT statement and eventually remove the rows for which the sum is equal 100%.
One solution is to define the field in the table using a more appropriate type.
NUMC is often used for key fields - like document numbers, which there would never be a reason to add together.
I didn't find a smooth solution.
What I did, was to copy everything in an internal table, looped over it converting the NUMC values to DEC values. Grouping and summing up worked at that point.
At the end, I converted the DEC values back to NUMC values.
It's been awhile. I came back to this post, because someone voted up my original answer. I was thinking about editing my old answer but I decided to post a new one. As this question was asked in 2017, there were some restictions but now it can be done by using CAST function in new OpenSQL.
SELECT z~anln1
FROM zzanla AS z
INTO TABLE #gt_
GROUP BY z~anln1, z~anln2
HAVING SUM( CAST( z~percent AS INT4 ) ) <> 100

n-th row in PostgreSQL for p-quantile

I'm trying to fetch the n-th row of a query result. Further posts suggested the use of OFFSET or LIMIT but those forbid the use of variables (ERROR: argument of OFFSET must not contain variables). Further I read about the usage of cursors but I'm not quite sure how to use them even after reading their PostgreSQL manpage. Any other suggestions or examples for how to use cursors?
My main goal is to calculate the p-quantile of a row and since PostgreSQL doesn't provide this function by default I have to write it on my own.
Cheers
The following returns the 5th row of a result set:
select *
from (
select <column_list>,
row_number() over (order by some_sort_column) as rn
) t
where rn = 5;
You have to include an order by because otherwise the concept of "5th row" doesn't make sense.
You mention "use of variable" so I'm not sure what you are actually trying to achive. But you should be able to supply the value 5 as a variable for this query (or even a sub-select).
You might also want to dig further into windowing functions. Because with that you could e.g. do a sum() over the 3 rows before the current row (or similar constructs) - which could also be useful for you.
if you would like to get 10th record, below query also work fine.
select * from table_name order by sort_column limit 1 offset 9
OFFSET simply skip that many rows before beginning to return rows as mentioned in LIMIT clause.

Incredibly slow Materialized View creation when using string aggregation, any performance suggestions?

I've got a load of materialized views, some of them take just a few seconds to create and refresh, whereas others can take me up to 40 minutes to compile, if SQLDeveloper doesn't crash before that.
I need to aggregate some strings in my query, and I have the following function
create or replace
function stragg
( input varchar2 )
return varchar2
deterministic
parallel_enable
aggregate using stragg_type
;
Then, in my MV I use a select statement such as
SELECT
hse.refno,
STRAGG (DISTINCT per.person_name) as PERSONS
FROM
HOUSES hse,
PERSONS per
This is great, because it gives me the following :
refno persons
1 Dave, John, Mary
2 Jack, Jill
Instead of :
refno persons
1 Dave
1 John
1 Mary
2 Jack
2 Jill
It seems that when I use this STRAGG function, the time it takes to create/refresh an MV increases dramatically. Is there an alternative method to achieve a comma separate list of values? I use this throughout my MVs so it is quite a required feature for me
Thanks
There are a number of techniques for string aggregation at the link below. They might provide better performance for you.
http://www.oracle-base.com/articles/misc/StringAggregationTechniques.php