I have a table like this
Code
A123
B3123
C93485
D345
E29845
The first letter of each value in the Code column maps to a category as follows:
Char Category
A-B A
C B
D-E C
I would like to display the output table like this
Category Total Percentage
A 2 0.4%
B 1 0.2%
C 2 0.4%
Total 5 1.0%
I'm not sure how to start. Any hints or help is much appreciated
Here is one option:
SELECT
CASE WHEN SUBSTR(Code, 1, 1) IN ('A', 'B') THEN 'A'
WHEN SUBSTR(Code, 1, 1) = 'C' THEN 'B'
ELSE 'C' END AS Category,
COUNT(*) AS Total,
200.0 * COUNT(*) / SUM(COUNT(*)) OVER () AS Percentage
FROM yourTable
GROUP BY
ROLLUP(CASE WHEN SUBSTR(Code, 1, 1) IN ('A', 'B') THEN 'A'
WHEN SUBSTR(Code, 1, 1) = 'C' THEN 'B'
ELSE 'C' END);
This approach uses a CASE expression on the first letter of each code to assign a category. We then aggregate by category to get the counts and the percentages. ROLLUP generates a total row at the bottom of the result set. As a side effect, the window sum SUM(COUNT(*)) OVER () also runs over that total row, so it equals twice the table count. Multiplying by 200 instead of 100 compensates for the double counting.
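The factor of 200 can be checked with plain arithmetic. The following is a minimal Python sketch, not SQL: it mimics the grouped rows plus the rollup row for the five sample codes, and the names category, rows and window_sum are made up for illustration.

```python
from collections import Counter

codes = ["A123", "B3123", "C93485", "D345", "E29845"]

def category(code):
    """Mirror the CASE expression: A-B -> 'A', C -> 'B', anything else -> 'C'."""
    first = code[0]
    if first in ("A", "B"):
        return "A"
    if first == "C":
        return "B"
    return "C"

counts = Counter(category(code) for code in codes)

# Grouped rows followed by the extra total row that ROLLUP would add.
rows = sorted(counts.items())
rows.append(("Total", sum(counts.values())))

# The window sum sees every output row, including the total row,
# so it is twice the number of codes.
window_sum = sum(n for _, n in rows)

percentages = {name: 200.0 * n / window_sum for name, n in rows}
```

With the sample data, window_sum is 10 rather than 5, which is why 100 would give half the expected percentages.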
Let's say I have the following data that represents taxes:
SELECT trunc(i*i, 3) tax
FROM generate_series(1.17, 5) i;
tax
--------
1.368
4.708
10.048
17.388
(4 rows)
Is there a nice way in PostgreSQL to carry the mill remainder (the third decimal place) of each row into the next row, so that the last row receives all the leftovers?
So, I need to make it the following:
tax
--------
1.360
4.710
10.050
17.392
(4 rows)
It could be a query or SQL / PL/pgSQL function.
"Next row" and "last row" make sense only when the sort order is defined. I assume the rows are sorted by tax ascending (the query below orders by i, which gives the same order for this data).
The first subquery adds row numbers to the data, while the second one calculates the number of rows. The next part is a recursion based on increasing row numbers:
with recursive data as (
select trunc(i*i, 3) tax, row_number() over (order by i) as rn
from generate_series(1.17, 5) i
),
count as (
select count(*)
from data
),
result as (
select
tax, rn,
floor(tax* 100)/100 as new_tax,
tax- floor(tax* 100)/100 as remainder
from data
where rn = 1
union all
select
d.tax, d.rn,
case d.rn
when count then d.tax+ r.remainder
else floor((d.tax+ r.remainder)* 100)/100 end as new_tax,
d.tax+ r.remainder- floor((d.tax+ r.remainder)* 100)/100 as remainder
from data d
join result r on d.rn = r.rn+ 1
cross join count
)
select new_tax as tax
from result
order by rn;
Live demo in rextester.
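To make the carry logic easier to follow, here is a minimal Python sketch of the same idea outside the database. It assumes the rows are already in the desired order; the function name carry_remainders is hypothetical, and Decimal is used to avoid binary floating-point noise.

```python
from decimal import Decimal, ROUND_FLOOR

def carry_remainders(taxes):
    """Floor each value to whole cents, carrying the cut-off part forward.

    The last value is not floored, so it absorbs all leftovers.
    """
    cent = Decimal("0.01")
    remainder = Decimal("0")
    result = []
    for index, tax in enumerate(taxes):
        total = tax + remainder
        if index == len(taxes) - 1:
            # Last row: keep everything that has accumulated.
            result.append(total)
        else:
            floored = total.quantize(cent, rounding=ROUND_FLOOR)
            remainder = total - floored
            result.append(floored)
    return result

taxes = [Decimal("1.368"), Decimal("4.708"), Decimal("10.048"), Decimal("17.388")]
adjusted = carry_remainders(taxes)
```

The sum of the adjusted values equals the sum of the inputs, which is the property the recursive query preserves as well.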
I'm trying to create a table with the following columns:
I want to use a with recursive table to do this. The following code however is giving the following error:
'ERROR: column "b" does not exist'
WITH recursive numbers AS
(
SELECT 1,2,4 AS a, b, c
UNION ALL
SELECT a+1, b+1, c+1
FROM Numbers
WHERE a + 1 <= 10
)
SELECT * FROM numbers;
I'm stuck because when I just include one column this works perfectly. Why is there an error for multiple columns?
This appears to be a simple syntax issue: you are aliasing the columns incorrectly. SELECT 1,2,4 AS a, b, c does not name three columns. It selects five: the unnamed literals 1 and 2, the literal 4 aliased as a, and two column references b and c.
Run just SELECT 1,2,4 AS a, b, c and you see the same error, whereas SELECT 1 a, 2 b, 4 c works fine.
b is unknown in the base select because it is interpreted as a column name, yet no table with that column is in scope. Additionally, the union would fail because the base select has 5 columns and the recursive select has 3.
DEMO: http://rextester.com/IUWJ67486
One can define the columns outside the select making it easier to manage or change names.
WITH recursive numbers (a,b,c) AS
(
SELECT 1,2,4
UNION ALL
SELECT a+1, b+1, c+1
FROM Numbers
WHERE a + 1 <= 10
)
SELECT * FROM numbers;
Alternatively, alias the columns inside the first select. The column names of a union come from its first query, so the result columns are a, b, c and the someReallyLongAlias... names in the recursive query are ignored. Note that not only the names but also the data types of the columns originate from the first query, and the types must match between the two queries.
WITH recursive numbers AS
(
SELECT 1 as a ,2 as b,4 as c
UNION ALL
SELECT a+1 someReallyLongAlias
, b+1 someReallyLongAliasAgain
, c+1 someReallyLongAliasYetAgain
FROM Numbers
WHERE a<5
)
SELECT * FROM numbers;
Lastly, if you truly want to stop at 5, then the where clause should be WHERE a < 5. The image depicts this whereas your query does not, so I am not sure what your end goal is here.
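As a quick way to see the rows such a multi-column recursive CTE produces, the column-list form can be run from Python. This is only a sketch under the assumption that SQLite (via the standard-library sqlite3 module) stands in for PostgreSQL; the ORDER BY is added here just to make the row order explicit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Column names are declared once in the CTE header, as in the first fix above.
rows = conn.execute(
    """
    WITH RECURSIVE numbers (a, b, c) AS (
        SELECT 1, 2, 4
        UNION ALL
        SELECT a + 1, b + 1, c + 1
        FROM numbers
        WHERE a + 1 <= 10
    )
    SELECT a, b, c FROM numbers ORDER BY a
    """
).fetchall()
conn.close()
```

With WHERE a + 1 <= 10 the recursion yields ten rows, from (1, 2, 4) up to (10, 11, 13).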
I have a table (only one row) in my PostgreSQL 9.5 database with two columns: count (bigint) and array (text).
count array
6 "112,19.3,142,142,19.3,172,172,20.3,202,202,20.3,232,232,19.3,262,262,19.3,292"
The array represents six (thus count = 6) sets of values, each consisting of Lower_limit, Value and Upper_limit. Now I need to conditionally modify my array: when the upper limit of one set coincides with the lower limit of the next, keep only the first lower limit and the last upper limit, and return the most common value (which is 19.3) between them. My desired output would be:
count array
1 112, 19.3, 292
Could anyone give me some pointers toward my desired output?
I must admit I don't understand how you get count = 1, but below is an example of how you can build an array with the first, last and most common values. Note that if several values are tied for most common, it will pick one of them unpredictably.
t=#
with a(r) as (values(array[112,19.3,142,142,19.3,172,172,20.3,202,202,20.3,232,232,19.3,262,262,19.3,292]))
, p as (select * from a,unnest(a.r) with ordinality)
, t as (
select count(1) over (partition by unnest) cnt
, unnest u
, r[1] a
, r[array_length(r,1)] e
from p
-- most frequent value first; ordering by the value itself would pick the minimum
order by cnt desc
limit 1
)
select array[a,u,e]
from t
;
array
----------------
{112,19.3,292}
(1 row)
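The same first / most common / last selection can be sketched in a few lines of Python, which may help when checking the query result by hand. The variable names are illustrative only. On ties, Counter.most_common returns the value encountered first, whereas the SQL above makes no such promise.

```python
from collections import Counter

values = [112, 19.3, 142, 142, 19.3, 172, 172, 20.3, 202,
          202, 20.3, 232, 232, 19.3, 262, 262, 19.3, 292]

# Most frequent element and how often it occurs.
most_common_value, occurrences = Counter(values).most_common(1)[0]

# First element, most common element, last element.
summary = [values[0], most_common_value, values[-1]]
```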
I am building a complex report with T-SQL. The user gave me the table that is the source for the report; it has about 10 million rows. The table contains descriptive attributes and numeric columns, something like this:
segment product_group gmis lpt numeric_field1 numeric_field2 numeric_field3
The report has about a thousand rows, and it is defined row by row, something like this:
'Name of the row one' - sum of the numeric_field1 for segment in 3,4,5 and lpt = 3 and gmis <> 50
'Row number two' - sum of num2 for segment in 1,2,3 and lpt <> 5, and gmis = 7
'Row number 3' - you take num2 + num3 for product_id = 7
'Row number 4' - row 1 + row 2
So I end up with a T-SQL query that has a separate select for each row, combined with union all:
select 'Row number 1' name, (select sum(num1) from source_table where segment in (3,4,5) and lpt=3 and gmis <> 50) value
union all
select 'Row number 2' , (select sum(num2) from source_table where segment in (1,2,3) and lpt<> 5 and gmis = 7)
union all
select 'Row number 3' , (select sum(num2 + num3) from source_table where product_id = 7)
.....
....
etc
Am I missing some smarter way to do this kind of query? The report is very slow.
Assuming that you are always selecting from the same base table and always aggregating without grouping, the following should suit your purpose.
Instead of combining multiple selects, you should aim to perform a set of operations on a single table. Take your where clauses and combine them into a select statement as a set of when clauses to calculate the base values used for aggregation. This means that you only have to read the source table once:
Select
Case When segment in (3,4,5) and lpt = 3 and gmis <> 50 Then num1 Else 0 End As row1,
Case When segment in (1,2,3) and lpt<> 5 and gmis = 7 Then num2 Else 0 End As row2,
Case When product_id = 7 then (num2+num3) Else 0 End As row3
From
source_table
Use this as a table expression and perform the aggregation selecting from the table expression. In this example, I have used a CTE (Common Table Expression):
;With Column_Values
(
row1,
row2,
row3
)
As
(
Select
Case When segment in (3,4,5) and lpt = 3 and gmis <> 50 Then num1 Else 0 End As row1,
Case When segment in (1,2,3) and lpt<> 5 and gmis = 7 Then num2 Else 0 End As row2,
Case When product_id = 7 then (num2+num3) Else 0 End As row3
From
source_table
)
Select
Sum(row1) As [Row number 1],
Sum(row2) As [Row number 2],
Sum(row3) As [Row number 3],
Sum(row1) + Sum(row2) As [Row number 4]
From
Column_Values
You now have the values as a series of columns. To convert them to rows, use the UNPIVOT operator:
;With Column_Values
(
row1,
row2,
row3
)
As
(
Select
Case When segment in (3,4,5) and lpt = 3 and gmis <> 50 Then num1 Else 0 End As row1,
Case When segment in (1,2,3) and lpt<> 5 and gmis = 7 Then num2 Else 0 End As row2,
Case When product_id = 7 then (num2+num3) Else 0 End As row3
From
source_table
)
Select
name,
value
From
(
Select
Sum(row1) As [Row number 1],
Sum(row2) As [Row number 2],
Sum(row3) As [Row number 3],
Sum(row1) + Sum(row2) As [Row number 4]
From
Column_Values
) pvt
Unpivot
(
value For name In
(
[Row number 1],
[Row number 2],
[Row number 3],
[Row number 4]
)
) As upvt
If you are still having performance issues, there are the following options you may wish to consider, which will depend on your requirements:
• If you don't need to report live data you could pre-calculate the values, store them in another table and report from this table.
• If you have the Enterprise edition and don't need to report live data, you could put a columnstore index on the source table. In SQL Server 2012 this makes the table read only, so the index would have to be dropped and recreated each time new data is loaded into the source table. Columnstore indexes can offer a massive performance boost when aggregating large tables.
• If you need to report live data, you can use an indexed view. This may improve performance when aggregating data in the view.
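The single-scan idea from this answer can be checked on a small data set. The following is a minimal Python sketch that uses the standard-library sqlite3 module instead of SQL Server. The five sample rows are invented for illustration, and product_id is assumed to be a column of source_table, as the question's filters imply.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE source_table (
        segment INTEGER, lpt INTEGER, gmis INTEGER, product_id INTEGER,
        num1 INTEGER, num2 INTEGER, num3 INTEGER
    )
    """
)
# Invented sample rows: (segment, lpt, gmis, product_id, num1, num2, num3)
conn.executemany(
    "INSERT INTO source_table VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (3, 3, 10, 1, 100, 1, 2),
        (4, 3, 50, 7, 200, 3, 4),
        (1, 2, 7, 7, 300, 5, 6),
        (3, 5, 7, 2, 400, 7, 8),
        (3, 3, 7, 3, 500, 9, 10),
    ],
)

# One scan: each report row becomes a conditional sum.
single_pass = conn.execute(
    """
    SELECT
        SUM(CASE WHEN segment IN (3,4,5) AND lpt = 3 AND gmis <> 50 THEN num1 ELSE 0 END),
        SUM(CASE WHEN segment IN (1,2,3) AND lpt <> 5 AND gmis = 7 THEN num2 ELSE 0 END),
        SUM(CASE WHEN product_id = 7 THEN num2 + num3 ELSE 0 END)
    FROM source_table
    """
).fetchone()

# Original style: one query (and one scan) per report row.
filters = [
    ("num1", "segment IN (3,4,5) AND lpt = 3 AND gmis <> 50"),
    ("num2", "segment IN (1,2,3) AND lpt <> 5 AND gmis = 7"),
    ("num2 + num3", "product_id = 7"),
]
separate = tuple(
    conn.execute(f"SELECT SUM({expr}) FROM source_table WHERE {cond}").fetchone()[0]
    for expr, cond in filters
)
conn.close()
```

Both approaches return the same three sums for this data. One difference to keep in mind: when no row matches a filter, the per-row subquery returns NULL, while the CASE ... ELSE 0 form returns 0.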
Is there a way to write something like a custom aggregate function when MAX and SUM are not enough to get the result?
Here is my table:
DROP TABLE IF EXISTS temp1;
CREATE TABLE temp1(mydate text, code int, price decimal);
INSERT INTO temp1 (mydate, code, price) VALUES
('01.01.2014 14:32:11', 1, 9.75),
( '', 1, 9.99),
( '', 2, 40.13),
('01.01.2014 09:12:04', 2, 40.59),
( '', 3, 18.10),
('01.01.2014 04:13:59', 3, 18.20),
( '', 4, 10.59),
('01.01.2014 15:44:32', 4, 10.48),
( '', 5, 8.19),
( '', 5, 8.24),
( '', 6, 11.11),
('04.01.2014 10:22:35', 6, 11.09),
('01.01.2014 11:48:15', 6, 11.07),
('01.01.2014 22:18:33', 7, 22.58),
('03.01.2014 13:15:40', 7, 21.99),
( '', 7, 22.60);
Here is query for getting result:
SELECT code,
ROUND(AVG(price), 2),
MAX(price)
FROM temp1
GROUP BY code
ORDER BY code;
In short:
I have to get the LAST price by date (stored as text) for every grouped code if a date exists; otherwise (if no date is recorded) the price should be 0.
The LAST column shows the wanted result; AVG and MAX are included for illustration:
CODE LAST AVG MAX
------------------------------
1 9.75 9.87 9.99
2 40.59 40.36 40.59
3 18.20 18.15 18.20
4 10.48 10.54 10.59
5 0.00 8.22 8.24
6 11.09 11.09 11.11
7 21.99 22.39 22.60
How would I get the wanted result?
What would that query look like?
EDITED
I simply had to try IMSoP's advice to update and use the custom aggregate functions first/last.
SELECT code,
CASE WHEN MAX(mydate)<>'' THEN
(SELECT last(price ORDER BY TO_TIMESTAMP(mydate, 'DD.MM.YYYY HH24:MI:SS')))
ELSE
0
END AS "LAST",
ROUND(AVG(price), 2) AS "AVG",
MAX(price) AS "MAX"
FROM temp1
GROUP BY code
ORDER BY code;
With this simple query I get the same results as with Mike's complex query.
Moreover, it handles duplicate (identical) entries in the mydate column better, and it is faster.
Is this possible? It looks similar to 'SELECT * FROM magic()' :)
You said in comments that one code can have two rows with the same date. So this is sane data.
01.01.2014 1 3.50
01.01.2014 1 17.25
01.01.2014 1 99.34
There's no deterministic way to tell which of those rows is the "last" one, even if you sort by code and "date". (In the relational model, which is based on mathematical sets, the order of columns is irrelevant, and the order of rows is irrelevant.) The query optimizer is free to return rows in the way it thinks best, so this query
select *
from temp1
order by mydate, code
might return this on one run,
01.01.2014 1 3.50
01.01.2014 1 17.25
01.01.2014 1 99.34
and this on another.
01.01.2014 1 3.50
01.01.2014 1 99.34
01.01.2014 1 17.25
Unless you store some value that makes the meaning of last obvious, what you're trying to do isn't possible. When people need to make last obvious, they usually use a timestamp.
After your changes, this query seems to return what you're looking for.
with distinct_codes as (
select distinct code
from temp1
),
corrected_table as (
select
case when mydate <> '' then TO_TIMESTAMP(mydate, 'DD.MM.YYYY HH24:MI:SS')
else null
end as mydate,
code,
price
from temp1
),
max_dates as (
select code, max(mydate) max_date
from corrected_table
group by code
)
select c1.mydate, d1.code, coalesce(c1.price, 0)
from corrected_table c1
inner join max_dates m1
on m1.code = c1.code
and m1.max_date = c1.mydate
right join distinct_codes d1
on d1.code = c1.code
order by code;
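For checking the expected LAST column independently of the database, the rule can be sketched in Python: parse the text timestamps, keep the price with the latest timestamp per code, and fall back to 0 when a code has no timestamp at all. The function name last_prices is hypothetical, and when two rows of one code share the same timestamp this sketch simply keeps the first one it sees, which is an arbitrary choice for the reason explained above.

```python
from datetime import datetime
from decimal import Decimal

# Sample data from the question: (mydate, code, price)
rows = [
    ("01.01.2014 14:32:11", 1, "9.75"), ("", 1, "9.99"),
    ("", 2, "40.13"), ("01.01.2014 09:12:04", 2, "40.59"),
    ("", 3, "18.10"), ("01.01.2014 04:13:59", 3, "18.20"),
    ("", 4, "10.59"), ("01.01.2014 15:44:32", 4, "10.48"),
    ("", 5, "8.19"), ("", 5, "8.24"),
    ("", 6, "11.11"), ("04.01.2014 10:22:35", 6, "11.09"),
    ("01.01.2014 11:48:15", 6, "11.07"),
    ("01.01.2014 22:18:33", 7, "22.58"),
    ("03.01.2014 13:15:40", 7, "21.99"), ("", 7, "22.60"),
]

def last_prices(rows):
    """Return {code: price at the latest timestamp, or 0 if no timestamp}."""
    result = {}
    latest = {}  # code -> latest timestamp seen so far
    for mydate, code, price in rows:
        result.setdefault(code, Decimal("0"))
        if not mydate:
            continue  # rows without a date never count as "last"
        stamp = datetime.strptime(mydate, "%d.%m.%Y %H:%M:%S")
        if code not in latest or stamp > latest[code]:
            latest[code] = stamp
            result[code] = Decimal(price)
    return result

last = last_prices(rows)
```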