Counting several repeated values in a column - postgresql

I need to count how many times a certain value occurs in a column.
It looks like this (select * from znajezyki order by klos):
klos is the serial number of a person (this value is repeated).
Now I need to create a query that will show me how many people know 1, 2, 3 languages.
If the same "klos" value is repeated 2 times, that means this person knows 2 languages; if it occurs 3 times, that person knows 3 languages, and so on.
I'd like my result to look something like this:
I tried referring to this post here, but I cannot understand it and can't get it to work.
I hope y'all can understand what I am talking about :)

First do the simple thing: count how many languages each person knows
SELECT klos, COUNT(*) AS langs
FROM znajezyki
GROUP BY klos
Then use that result in a subquery to count the people by how many languages they know:
SELECT langs, COUNT(*) AS persons
FROM (
    SELECT klos, COUNT(*) AS langs
    FROM znajezyki
    GROUP BY klos
) AS temp
GROUP BY langs
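To see the two-step count end to end, here is a minimal sketch that runs the same query through Python's built-in sqlite3 module. SQLite stands in for PostgreSQL only so the example is self-contained; the sample rows and the jezyk column are made up for illustration.

```python
import sqlite3

# Hypothetical sample data: one row per (person, language) pair.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE znajezyki (klos INTEGER, jezyk TEXT)")
conn.executemany(
    "INSERT INTO znajezyki VALUES (?, ?)",
    [(1, "en"), (1, "pl"), (2, "en"), (3, "en"), (3, "pl"), (3, "de"), (4, "fr")],
)

# Inner query: languages per person. Outer query: persons per language count.
query = """
SELECT langs, COUNT(*) AS persons
FROM (
    SELECT klos, COUNT(*) AS langs
    FROM znajezyki
    GROUP BY klos
) AS temp
GROUP BY langs
ORDER BY langs
"""
language_counts = conn.execute(query).fetchall()
print(language_counts)
```

With this data, persons 2 and 4 know one language each, person 1 knows two and person 3 knows three. The ORDER BY is only there to make the output order predictable.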

Related

How to select distinct combinations in T-SQL

I'm using SQL in Devexpress dashboard designer. I want to select distinct combinations of two parameters.
Perhaps Devexpress uses Transact-SQL, but the GROUP BY clause never works for me.
DISTINCT BY somehow doesn't work either.
Example:
There are two IDs 11 and 22
And there are two values of Date for 11, as an example: 21.01.2000 and 22.01.2000. And there's one for 22 as an example: 23.05.2008
The problem here is that I can't choose DISTINCT by date because there are many other IDs which have the same dates.
So I expect to have one distinct combination of ID and Date.
Has anyone faced the same problem? Can you advise any solution / code example?
Using select distinct will filter duplicates if you leave unique row properties out of the selected fields.
so:
Mike Smit
Mike Smit
Will be reduced to
Mike Smit
But if you're also asking for a PK like an Id field, you get the following, because the Id makes both rows distinct:
1 Mike Smit
2 Mike Smit
Does this help?
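Applied to the ID/Date case from the question, that means selecting only those two columns with DISTINCT. A small sketch, assuming a hypothetical events table and run against SQLite through Python's sqlite3 rather than T-SQL (DISTINCT over several columns is standard SQL, but I have not tried it inside the Devexpress designer):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; "note" plays the role of a column that makes rows unique.
conn.execute("CREATE TABLE events (id INTEGER, event_date TEXT, note TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        (11, "2000-01-21", "a"),
        (11, "2000-01-21", "b"),
        (11, "2000-01-22", "c"),
        (22, "2008-05-23", "d"),
        (22, "2008-05-23", "e"),
    ],
)

# DISTINCT applies to the whole selected row, so leaving "note" out
# collapses the rows to one per (id, event_date) combination.
distinct_pairs = conn.execute(
    "SELECT DISTINCT id, event_date FROM events ORDER BY id, event_date"
).fetchall()
print(distinct_pairs)
```

Adding note back into the select list would return all five rows again, which is the Id effect described above.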

Join data from 2 String dimensions in Tableau

I am wondering if any of you can help me on my problem.
I have a table containing money exchanges between individuals. Thus, the table is composed of columns ID A and ID B, which are unique IDs, and another column with an integer, a price.
My problem is that I want to perform the sum of the integer for a precise individual and I can find the same individual either in column ID A or ID B because the software is putting IDs in random columns. Therefore, I have 2 dimensions ID A and ID B.
I have some experience in Tableau but I am in a dead end on this one.
Do you have any idea?
Thanks a lot!
Julien
If you only need to sum one individual at a time, use a parameter for the IDs.
Something like the following:
sum(IF [PARAMETER_ID] = [ID_A] THEN [PRICE] END)
+
sum(IF [PARAMETER_ID] = [ID_B] THEN [PRICE] END)
Matt got the answer. Make a custom SQL request to fuse the 2 ID columns. In the end you have the double of columns but hey that's what I wanted ;)
Also, it seems to be the most reasonable way to solve this.
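The custom SQL itself isn't shown above, so this is only a guess at what "fusing" the two ID columns might look like: stack both columns with UNION ALL and sum per person. The table and column names (exchanges, id_a, id_b, price) are assumptions, and the query is run on SQLite through Python's sqlite3 just to keep it self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical layout of the exchanges table described in the question.
conn.execute("CREATE TABLE exchanges (id_a TEXT, id_b TEXT, price INTEGER)")
conn.executemany(
    "INSERT INTO exchanges VALUES (?, ?, ?)",
    [("A", "B", 10), ("B", "C", 5), ("C", "A", 7)],
)

# Each exchange appears twice in the stacked result: once per participant.
query = """
SELECT person_id, SUM(price) AS total
FROM (
    SELECT id_a AS person_id, price FROM exchanges
    UNION ALL
    SELECT id_b AS person_id, price FROM exchanges
) AS both_sides
GROUP BY person_id
ORDER BY person_id
"""
totals = conn.execute(query).fetchall()
print(totals)
```

UNION ALL rather than UNION matters here, because UNION would drop rows that happen to have the same ID and price.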

How to dynamically pivot based on rows data and parameter value?

I am trying to pivot using the crosstab function but am unable to meet my requirement. Is there a way to perform a crosstab dynamically, with a dynamic result set as well?
I have tried using the built-in crosstab function and was unable to meet my requirement.
select * from crosstab ('select item,cd, type, parts, part, cnt
from item
order by 1,2')
AS results (item text,cd text, SUM NUMERIC, AVG NUMERIC);
Sample Data:
ITEM    CD  TYPE  PARTS  PART  CNT
Item 1  A   AVG   4      1     10
Item 1  B   AVG   4      2     20
Item 1  C   AVG   4      3     30
Item 1  D   AVG   4      4     40
Item 1  A   SUM   4      1     10
Item 1  B   SUM   4      2     20
Item 1  C   SUM   4      3     30
Item 1  D   SUM   4      4     40
Expected Results:
ITEM CD PARTS TYPE_1 CNT_1 TYPE_1 CNT_1 TYPE_2 CNT_2 TYPE_2 CNT_2 TYPE_3 CNT_3 TYPE_3 CNT_3 TYPE_4 CNT_4 TYPE_4 CNT_4
Item 1 A 4 AVG 10 SUM 10 AVG 20 SUM 20 AVG 30 SUM 30 AVG 40 SUM 40
The PARTS value is based on a parameter passed by the user. If the user passes 2 for example, there will be 4 rows in the result set (2 parts for AVG and 2 parts of SUM).
Can I achieve this requirement using the CROSSTAB function, or is there a custom SQL statement that needs to be developed?
I'm not following your data, so I can't offer examples based on it. But I have been looking at pivot/cross-tab features over the past few days. I was just looking at dynamic cross tabs just before seeing your post. I'm hoping that your question gets some good answers, I'll start off with a bit of background.
You can use the crosstab extension for standard cross tabs; what went wrong when you tried it? Here's an example I wrote for myself the other day with a bunch of comments and aliases for clarity. The pivot is looking at item scans to see where the scans were "to", like the warehouse or the floor.
/* Basic cross-tab example for crosstab (text) format of pivot command.
Notice that the embedded query has to return three columns, see the aliases.
#1 is the row label, it shows up in the output.
#2 is the category, what determines how many columns there are. *You have to work this out in advance to declare them in the return.*
#3 is the cell data, what goes in the cross tabs. Note that this form of the crosstab command may return NULL, and coalesce does not work.
To get rid of the null count/sums/whatever, you need crosstab (text, text).
*/
select *
from crosstab ('select
specialty_name as row_label,
scanned_to as column_splitter,
count(num_inst)::numeric as cell_data
from scan_table
group by 1,2
order by 1,2')
as scan_pivot (
row_label citext,
"Assembly" numeric,
"Warehouse" numeric,
"Floor" numeric,
"QA" numeric);
As a manual alternative, you can use a series of FILTER clauses. Here's an example that summarizes error_log records by day of the week. The "down" is the error name, the "across" (columns) are the days of the week.
select "error_name",
count(*) as "Overall",
count(*) filter (where extract(dow from "updated_dts") = 0) as "Sun",
count(*) filter (where extract(dow from "updated_dts") = 1) as "Mon",
count(*) filter (where extract(dow from "updated_dts") = 2) as "Tue",
count(*) filter (where extract(dow from "updated_dts") = 3) as "Wed",
count(*) filter (where extract(dow from "updated_dts") = 4) as "Thu",
count(*) filter (where extract(dow from "updated_dts") = 5) as "Fri",
count(*) filter (where extract(dow from "updated_dts") = 6) as "Sat"
from error_log
where "error_name" is not null
group by "error_name"
order by 1;
You can do the same thing with CASE, but FILTER is easier to write.
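For comparison, here is a rough sketch of the CASE form of the same pivot, cut down to two day columns. It runs on SQLite through Python's sqlite3 so it is self-contained, which means strftime('%w', ...) stands in for PostgreSQL's extract(dow from ...); the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE error_log (error_name TEXT, updated_dts TEXT)")
# 2024-01-07 is a Sunday, 2024-01-08 is a Monday.
conn.executemany(
    "INSERT INTO error_log VALUES (?, ?)",
    [
        ("timeout", "2024-01-07"),
        ("timeout", "2024-01-08"),
        ("timeout", "2024-01-08"),
        ("disk_full", "2024-01-07"),
    ],
)

# SUM(CASE WHEN ... THEN 1 ELSE 0 END) plays the role of COUNT(*) FILTER (WHERE ...).
# In SQLite, strftime('%w') returns the day of week as text, '0' = Sunday.
query = """
SELECT error_name,
       COUNT(*) AS overall,
       SUM(CASE WHEN strftime('%w', updated_dts) = '0' THEN 1 ELSE 0 END) AS sun,
       SUM(CASE WHEN strftime('%w', updated_dts) = '1' THEN 1 ELSE 0 END) AS mon
FROM error_log
WHERE error_name IS NOT NULL
GROUP BY error_name
ORDER BY 1
"""
pivot_rows = conn.execute(query).fetchall()
print(pivot_rows)
```

Each pivot column needs its own full CASE expression, which is the extra typing that FILTER saves.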
It looks like you want something basic, maybe the FILTER solution appeals? It's easier to read than calls to crosstab(), since that was giving you trouble.
FILTER may be slower than crosstab. Probably. (The crosstab extension is written in C, and I'm not sure how smart FILTER is about reading off indexes.) But I'm not sure as I haven't tested it out yet. (It's on my to do list, but I haven't had time yet.) I'd be super interested if anyone can offer results. We're on 11.4.
I wrote a client-side tool to build FILTER-based pivots over the past few days. You have to supply the down and across fields, an aggregate formula and the tool spits out the SQL. With support for coalesce for folks who don't want NULL, ROLLUP, TABLESAMPLE, view creation, and some other stuff. It was a fun project. Why go to that effort? (Apart from the fun part.) Because I haven't found a way to do dynamic pivots that I actually understand. I love this quote:
"Dynamic crosstab queries in Postgres has been asked many times on SO all involving advanced level functions/types. Consider building your needed query in application layer (Java, Python, PHP, etc.) and pass it in a Postgres connected query call. Recall SQL is a special-purpose, declarative type while app layers are general-purpose, imperative types." – Parfait
So, I wrote a tool to pre-calculate and declare the output columns. But I'm still curious about dynamic options in SQL. If that's of interest to you, have a look at these two items:
https://postgresql.verite.pro/blog/2018/06/19/crosstab-pivot.html
Flatten aggregated key/value pairs from a JSONB field?
Deep magic in both.
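To give an idea of the "build the query in the application layer" approach, here is a much simplified sketch of a generator for FILTER-based pivots. It is not the tool described above; the function names and parameters are made up, and it assumes the list of categories has already been fetched (for example with a select distinct on the across column).

```python
def quote_ident(name: str) -> str:
    """Double-quote an SQL identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value: str) -> str:
    """Single-quote a string literal, escaping embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_filter_pivot(table, down, across, categories, aggregate="count(*)"):
    """Return SQL with one FILTER-based output column per category.

    Hypothetical helper: `aggregate` is pasted in as-is, so it must come
    from trusted code, never from user input.
    """
    columns = [quote_ident(down)]
    for category in categories:
        columns.append(
            f"{aggregate} filter (where {quote_ident(across)} = "
            f"{quote_literal(category)}) as {quote_ident(category)}"
        )
    select_list = ",\n       ".join(columns)
    return (
        f"select {select_list}\n"
        f"from {quote_ident(table)}\n"
        f"group by {quote_ident(down)}\n"
        f"order by 1;"
    )


pivot_sql = build_filter_pivot(
    "scan_table", "specialty_name", "scanned_to", ["Assembly", "Warehouse"]
)
print(pivot_sql)
```

The point of doing it this way is that the output columns are known before the query is sent, so nothing dynamic has to happen inside Postgres.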

How to use a (repeating) aggregate function value with other columns from the table I use the aggregate function on

Problem: I have to count the number of times a certain user has a certificate and then return the users name, his number of certificates and the difference between the maximum number of certificates across all users and this specific users number of certificates. I succeeded in the first part (getting the number of certificates) which I'll denote as $query$ (because I have a feeling my problem has something to do with aliasing).
So $query$ looks like this:
User |N_Certificates
Geoff 4
Ann 2
Lisa 0
And my end result should look like this:
User |N_Certificates |Difference
Geoff 4 0
Ann 2 2
Lisa 0 4
I tried this query:
SELECT Sub.name, Sub.N_Certificates,
MAX(Sub.N_Certificates)- Sub.Certificates AS Difference FROM ($_query_$) AS SUB
but it returned an error (because I was trying to use an aggregate function in combination with a column I was not grouping by) or a wrong result (notably, difference=0 for all rows).
I tried a contraption with INNER JOIN on another version of sub (same $query$ code with another alias) but it also didn't work (same reason). I could of course hard-code the max, but I don't think that's a good solution. My about screen tells me I'm using version 1.18 of pgAdmin.
You can't do it in this way, SQL syntax doesn't allow this.
The easiest way is to use a subquery:
SELECT Sub.name, Sub.N_Certificates,
(SELECT MAX(N_Certificates) FROM ($_query_$) AS m)
-
Sub.N_Certificates AS Difference
FROM ($_query_$) AS Sub
You can also use a common table expression:
WITH some_alias AS (
SELECT * FROM ($_query_$) AS q
)
SELECT name, N_Certificates,
(SELECT MAX(N_Certificates) FROM some_alias)
-
N_Certificates AS Difference
FROM some_alias
And you can use a window function: http://www.postgresql.org/docs/9.1/static/tutorial-window.html
SELECT Sub.name, Sub.N_Certificates,
MAX(Sub.N_Certificates) OVER ()
-
Sub.N_Certificates AS Difference
FROM ($_query_$) AS SUB
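Here is a runnable sketch of the common table expression variant, using the numbers from the question. Since $query$ isn't shown, the users and certificates tables and the query that counts certificates are assumptions; SQLite (through Python's sqlite3) stands in for PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema that reproduces the counts from the question.
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("CREATE TABLE certificates (user_name TEXT, title TEXT)")
conn.executemany("INSERT INTO users VALUES (?)", [("Geoff",), ("Ann",), ("Lisa",)])
conn.executemany(
    "INSERT INTO certificates VALUES (?, ?)",
    [("Geoff", f"cert{i}") for i in range(4)] + [("Ann", "cert0"), ("Ann", "cert1")],
)

# The CTE plays the role of $query$; the scalar subquery computes the max once
# and is subtracted from each user's own count.
query = """
WITH cert_counts AS (
    SELECT u.name, COUNT(c.title) AS n_certificates
    FROM users AS u
    LEFT JOIN certificates AS c ON c.user_name = u.name
    GROUP BY u.name
)
SELECT name,
       n_certificates,
       (SELECT MAX(n_certificates) FROM cert_counts) - n_certificates AS difference
FROM cert_counts
ORDER BY n_certificates DESC
"""
result = conn.execute(query).fetchall()
print(result)
```

The LEFT JOIN with COUNT(c.title) is what keeps Lisa in the result with a count of 0.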

Greatest n per group with multiple criteria for greatest

I need to select the largest, most recent or currently active term across a number of schools, with the assumption that it is possible for a school to have multiple concurrent terms (i.e., one term that honors students are registered in, and another for non-honors). I also need to take the end date into account, as the honors term may have the same start date but may be a year long instead of just a semester, and I want the semester.
Code looks something like this:
SELECT t.school_id, t.term_id, COUNT(s.id) AS size, t.start_date, t.end_date
FROM term t
INNER JOIN students s ON t.term_id = s.term_id
WHERE t.school_id = (some school id)
GROUP BY t.school_id, t.term_id
ORDER BY t.start_date DESC, t.end_date ASC, size DESC LIMIT 1;
This works perfectly to find the largest currently or most recently active term, but I want to be able to eliminate the WHERE t.school_id = (some school id) part.
A standard greatest n per group can easily choose the largest OR most recent term, but I need to select the most recent term that ends soonest with the largest number of students.
Not sure I am interpreting your question correctly. It would be easier if you had supplied table definitions including primary and foreign keys.
If you want the most recent term that ends soonest with the largest number of students per school, this might do it:
SELECT DISTINCT ON (t.school_id)
t.school_id, t.term_id, s.size, t.start_date, t.end_date
FROM term t
JOIN (
SELECT term_id, COUNT(id) AS size
FROM students
GROUP BY term_id
) s USING (term_id)
ORDER BY t.school_id, t.start_date DESC, t.end_date, size DESC;
More explanation for DISTINCT ON in this related answer:
Select first row in each GROUP BY group?
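DISTINCT ON is PostgreSQL-specific. For anyone who wants to try the same ordering logic elsewhere, here is a sketch of an equivalent using ROW_NUMBER(), run on SQLite (3.25 or later, for window functions) through Python's sqlite3. The table layout and sample rows are assumptions, since the question gives no table definitions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and data.
conn.execute(
    "CREATE TABLE term (term_id INTEGER, school_id INTEGER, start_date TEXT, end_date TEXT)"
)
conn.execute("CREATE TABLE students (id INTEGER, term_id INTEGER)")
conn.executemany(
    "INSERT INTO term VALUES (?, ?, ?, ?)",
    [
        (10, 1, "2024-01-10", "2024-05-30"),  # semester
        (11, 1, "2024-01-10", "2024-12-15"),  # year-long honors term
        (12, 1, "2023-08-20", "2023-12-15"),  # older term
        (20, 2, "2024-01-10", "2024-05-30"),
        (21, 2, "2024-01-10", "2024-05-30"),  # same dates, more students
    ],
)
conn.executemany(
    "INSERT INTO students VALUES (?, ?)",
    [(1, 10), (2, 10), (3, 11), (4, 12), (5, 12), (6, 12), (7, 20), (8, 21), (9, 21)],
)

# Rank terms within each school: latest start, then earliest end, then largest size.
query = """
SELECT school_id, term_id, size, start_date, end_date
FROM (
    SELECT t.school_id, t.term_id, s.size, t.start_date, t.end_date,
           ROW_NUMBER() OVER (
               PARTITION BY t.school_id
               ORDER BY t.start_date DESC, t.end_date, s.size DESC
           ) AS rn
    FROM term AS t
    JOIN (
        SELECT term_id, COUNT(id) AS size
        FROM students
        GROUP BY term_id
    ) AS s ON s.term_id = t.term_id
) AS ranked
WHERE rn = 1
ORDER BY school_id
"""
best_terms = conn.execute(query).fetchall()
print(best_terms)
```

School 1 gets the semester (term 10) rather than the larger but older term 12, and school 2 falls through to the size tiebreak. Note that, as in the DISTINCT ON version, terms without any students drop out because of the inner join.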