Sum of a column within the subquery in Postgresql - postgresql

I have a PostgreSQL table with two fields, ID and Name (column1 and column2 in the SQL Fiddle). Each row gets a default record_count of 1. I want to get that record_count for column1 and then sum it for each value of column1.
I tried this query, but it raises an error:
select sum(column_record) group by column_record ,
* from (select column1,1::int4 as column_record from test) a
The input and expected output are shown in the Excel screenshot below:
SQL Fiddle for the same:
http://sqlfiddle.com/#!15/12fe9/1

If you're using a window function, this is the way to do it (though you may want to use a normal GROUP BY instead, which is usually much faster):
-- create temp table test as (select * from (values ('a', 'b'), ('c', 'd')) a(column1, column2));
-- partition by column1 so that record_count is summed per column1 value
select sum(column_record) over (partition by column1),
* from (select column1, 1::int4 as column_record from test) a;
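For comparison, here is a minimal sketch of the plain GROUP BY alternative mentioned above. It runs against an in-memory SQLite database through Python's sqlite3 module purely so that it is self-contained; SQLite standing in for PostgreSQL and the three sample rows are assumptions, not part of the original fiddle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (column1 TEXT, column2 TEXT)")
# Hypothetical sample rows: 'a' appears twice in column1, 'c' once.
conn.executemany(
    "INSERT INTO test VALUES (?, ?)",
    [("a", "b"), ("a", "c"), ("c", "d")],
)

# Every row contributes a record_count of 1; GROUP BY sums it per column1.
grouped = conn.execute(
    """
    SELECT column1, SUM(column_record) AS record_count
    FROM (SELECT column1, 1 AS column_record FROM test) AS a
    GROUP BY column1
    ORDER BY column1
    """
).fetchall()
print(grouped)
```

Unlike the window function, this returns one row per column1 value rather than repeating the sum on every input row.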

Related

How to repeat a query in postgreSQL, and add the results to a table

I need to run a query 1000 times, and add the results to a table. I have the following code which is what I would like to be repeated:
Select max_gridco, count(max_gridco) as TaxLots, sum(population) as pop_sum
from
(
Select tlid, max_gridco, population
from (
select tlid, max_gridco, population, st_intersects(tle.geom, acres025.geom)
from tle, acres025
where max_gridco not in (0, 1)
order by RANDOM()
limit 1000
) as count
where st_intersects = 't'
order by max_gridco
) as gridcode_count
group by max_gridco;
Is there a way that I can run this 1000 times automatically, and in the output table have a column that includes the run number? So my table would look like the following:
Run number | max_gridco | TaxLots | pop_sum
I am trying to do a Loop command in PgAdmin3, but cannot seem to get the syntax correct.
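Use a DO block or create a function, e.g.:
DO
$$
begin
for i in 1..1000 loop
-- assumes the target column is named run_number (an unquoted identifier cannot contain a space)
insert into save_to_table (run_number, max_gridco, TaxLots, pop_sum)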
Select i, max_gridco, count(max_gridco) as TaxLots, sum(population) as pop_sum
from
(
Select tlid, max_gridco, population
from (
select tlid, max_gridco, population, st_intersects(tle.geom, acres025.geom)
from tle, acres025
where max_gridco not in (0, 1)
order by RANDOM()
limit 1000
) as count
where st_intersects = 't'
order by max_gridco
) as gridcode_count
group by max_gridco;
end loop;
end;
$$
;
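The same loop-and-insert pattern can also be driven from client code. Below is a rough sketch using Python's sqlite3 module with an in-memory database; the lots and results tables, their contents, and the run_once helper are all hypothetical stand-ins for the PostGIS tables in the question, and the spatial filter is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the tax lot data (no geometry here).
conn.execute("CREATE TABLE lots (max_gridco INTEGER, population INTEGER)")
conn.executemany(
    "INSERT INTO lots VALUES (?, ?)",
    [(g, 10 * g) for g in (2, 2, 3, 3, 3, 4)],
)
conn.execute(
    "CREATE TABLE results ("
    "run_number INTEGER, max_gridco INTEGER, taxlots INTEGER, pop_sum INTEGER)"
)


def run_once(run_number, sample_size):
    """Draw one random sample, aggregate it, and tag the rows with the run number."""
    conn.execute(
        """
        INSERT INTO results (run_number, max_gridco, taxlots, pop_sum)
        SELECT ?, max_gridco, COUNT(*), SUM(population)
        FROM (SELECT max_gridco, population
              FROM lots ORDER BY RANDOM() LIMIT ?) AS s
        GROUP BY max_gridco
        """,
        (run_number, sample_size),
    )


for run in range(1, 6):  # 5 runs here; the question wants 1000
    run_once(run, 4)

runs = [r[0] for r in conn.execute(
    "SELECT DISTINCT run_number FROM results ORDER BY run_number")]
print(runs)
```

Keeping the loop inside the database, as the DO block does, avoids a round trip per run, so the client-side version is mainly useful when the run count or parameters come from application code.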
If you need to insert values from a specific select you can do it by using:
INSERT INTO table2 (column1, column2, column3, ...)
SELECT column1, column2, column3, ...
FROM table1
WHERE condition;
(https://www.w3schools.com/sql/sql_insert_into_select.asp)
Is that what you're looking for?

PostgreSQL - return most common value for all columns in a table

I've got a table with a lot of columns in it and I want to run a query to find the most common value in each column.
Ordinarily for a single column, I'd run something like:
SELECT country
FROM users
GROUP BY country
ORDER BY count(*) DESC
LIMIT 1
Does PostgreSQL have a built in function for doing this or can anyone suggest a query I could run to achieve this?
Using the same query, for more than one column you should do:
SELECT *
FROM
(
SELECT country
FROM users
GROUP BY 1
ORDER BY count(*) DESC
LIMIT 1
) country
,(
SELECT city
FROM users
GROUP BY 1
ORDER BY count(*) DESC
LIMIT 1
) city
This works for any type and returns all the values in the same row, with the columns keeping their original names.
For more columns, just add more subqueries, such as:
,(
SELECT someOtherColumn
FROM users
GROUP BY 1
ORDER BY count(*) DESC
LIMIT 1
) someOtherColumn
Edit:
You could also do this with window functions. However, it would be no better in performance or readability.
Starting from PG 9.4 there is an aggregate function for this:
mode() WITHIN GROUP (ORDER BY sort_expression)
returns the most frequent input value (arbitrarily choosing the first one if there are multiple equally-frequent results)
And for earlier versions, you could create one...
CREATE OR REPLACE FUNCTION mode_array(anyarray)
RETURNS anyelement AS
$BODY$
SELECT a FROM unnest($1) a GROUP BY 1 ORDER BY COUNT(1) DESC, 1 LIMIT 1;
$BODY$
LANGUAGE SQL IMMUTABLE;
CREATE AGGREGATE mode(anyelement)(
SFUNC = array_append, --Function to call for each row. Just builds the array
STYPE = anyarray,
FINALFUNC = mode_array, --Function to call after everything has been added to array
INITCOND = '{}'--Initialize an empty array when starting
) ;
Usage: SELECT mode(column) FROM table;
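To make the intended result concrete, here is a small Python sketch of "most common value per column" using collections.Counter. The users rows and the most_common_per_column helper are made up for illustration. Each column is counted independently, which is what one mode() call per column would give you.

```python
from collections import Counter

# Hypothetical rows: the most common country is US, the most common age is 30,
# and no single row contains both values.
users = [
    {"country": "US", "age": 21},
    {"country": "US", "age": 22},
    {"country": "US", "age": 23},
    {"country": "FR", "age": 30},
    {"country": "DE", "age": 30},
]


def most_common_per_column(rows):
    """Return {column: most frequent value}, counting each column on its own."""
    columns = rows[0].keys()
    return {
        col: Counter(row[col] for row in rows).most_common(1)[0][0]
        for col in columns
    }


result = most_common_per_column(users)
print(result)
```

Note that the two winning values come from different rows, so any query that picks a single "best" row cannot be relied on to return the mode of every column.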
If I were doing this, I'd write a query like this one:
-- each branch needs parentheses so that ORDER BY / LIMIT apply to it alone
(SELECT 'country', country
FROM users
GROUP BY country
ORDER BY count(*) DESC
LIMIT 1)
UNION ALL
(SELECT 'city', city
FROM users
GROUP BY city
ORDER BY count(*) DESC
LIMIT 1)
-- etc.
It should be noted this only works if all the columns are of compatible types. If they are not, you'll probably need a different solution.
This window function version reads the users table and the derived table once each, whereas the subquery version reads the users table once for each column. If there are many columns, as in the OP's case, my guess is that this is faster.
-- each column is picked by its own count, so the most common age
-- does not have to come from a row with the most common country
select distinct
first_value(country) over (order by country_count desc) as country,
first_value(age) over (order by age_count desc) as age
from (
select
country,
count(*) over (partition by country) as country_count,
age,
count(*) over (partition by age) as age_count
from users
) s
limit 1

T-SQL GROUP BY over a dynamic list

I have a table with an XML type column. This column contains a dynamic list of attributes that may be different between records.
I am trying to GROUP BY COUNT over these attributes without having to go through the table separately for each attribute.
For example, one record could have attributes A, B and C and the other would have B, C, D then, when I do the GROUP BY COUNT I would get A = 1, B = 2, C = 2 and D = 1.
Is there any straightforward way to do this?
EDIT in reply to Andrew's answer:
Because my knowledge of this construct is superficial at best I had to fiddle with it to get it to do what I want. In my actual code I needed to group by the TimeRange, as well as only select some attributes depending on their name. I am pasting the actual query below:
WITH attributes AS (
SELECT
Timestamp,
N.a.value('@name[1]', 'nvarchar(max)') AS AttributeName,
N.a.value('(.)[1]', 'nvarchar(max)') AS AttributeValue
FROM MyTable
CROSS APPLY AttributesXml.nodes('/Attributes/Attribute') AS N(a)
)
SELECT Datepart(dy, Timestamp), AttributeValue, COUNT(AttributeValue)
FROM attributes
WHERE AttributeName IN ('AttributeA', 'AttributeB')
GROUP BY Datepart(dy, Timestamp), AttributeValue
As a side-note: Is there any way to reduce this further?
WITH attributes AS (
SELECT a.value('(.)[1]', 'nvarchar(max)') AS attribute
FROM YourTable
CROSS APPLY YourXMLColumn.nodes('//path/to/attributes') AS N(a)
)
SELECT attribute, COUNT(attribute)
FROM attributes
GROUP BY attribute
CROSS APPLY is like being able to JOIN the XML as a table. The WITH is needed because you can't use XML methods in a GROUP BY clause.
Here is a way to get the attribute data into a form that is easy to work with, and to reduce the number of times you need to go through the main table.
--create test data
declare @tmp table (
field1 varchar(20),
field2 varchar(20),
field3 varchar(20))
insert into @tmp (field1, field2, field3)
values ('A', 'B', 'C'),
('B', 'C', 'D')
--convert the individual fields from separate columns to one column
declare @table table(
field varchar(20))
insert into @table (field)
select field1 from @tmp
union all
select field2 from @tmp
union all
select field3 from @tmp
--run the group by and get the count
select field, count(*)
from @table
group by field
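The counting itself can be sketched outside the database as well. The Python snippet below flattens the attributes of every record into one stream and counts them, which is the same idea as CROSS APPLY followed by GROUP BY. The XML layout (an Attributes root with Attribute children carrying a name attribute) is an assumption modelled on the path and @name used in the question's query.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Two hypothetical records: the first has A, B, C and the second has B, C, D.
records = [
    "<Attributes><Attribute name='A'/><Attribute name='B'/>"
    "<Attribute name='C'/></Attributes>",
    "<Attributes><Attribute name='B'/><Attribute name='C'/>"
    "<Attribute name='D'/></Attributes>",
]

# One pass over the records: shred each XML value into its attribute names
# and count the names across all records.
counts = Counter(
    attr.get("name")
    for xml_text in records
    for attr in ET.fromstring(xml_text).findall("Attribute")
)
print(dict(counts))
```

This reproduces the A = 1, B = 2, C = 2, D = 1 result described in the question while touching each record only once.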

tsql - using internal stored procedure as parameter is where clause

I'm trying to build a stored procedure that makes use of another stored procedure, taking its result and using it as part of its WHERE clause. For some reason I receive an error:
Invalid object name 'dbo.GetSuitableCategories'.
Here is a copy of the code:
select distinct top 6 * from
(
SELECT TOP 100 *
FROM [dbo].[products] products
where products.categoryId in
(select top 10 categories.categoryid from
[dbo].[GetSuitableCategories]
(
-- @Age
-- ,@Sex
-- ,@Event
1,
1,
1
) categories
ORDER BY NEWID()
)
--and products.Price <=@priceRange
ORDER BY NEWID()
)as d
union
select * from
(
select TOP 1 * FROM [dbo].[products] competingproducts
where competingproducts.categoryId =-2
--and competingproducts.Price <=@priceRange
ORDER BY NEWID()
) as d
and here is [dbo].[GetSuitableCategories] :
if (@gender =0)
begin
select * from categoryTable categories
where categories.gender =3
end
else
begin
select * from categoryTable categories
where categories.gender = @gender
or categories.gender =3
end
I would use an inline table-valued user-defined function. Or simply code it inline if no re-use is required.
CREATE FUNCTION dbo.GetSuitableCategories
(
--parameters
)
RETURNS TABLE
AS
RETURN (
select * from categoryTable categories
where categories.gender IN (3, @gender)
)
Some points though:
I assume categoryTable has no gender = 0
Do you have 3 genders in your categoryTable? :-)
Why do you pass in 3 parameters but only use 1? See below, please
Does @sex map to @gender?
If you have extra processing on the 3 parameters, then you'll need a multi-statement table-valued function, but beware that these can be slow
You can't use the results of a stored procedure directly in a SELECT statement.
You'll either have to output the results into a temp table, or turn the sproc into a table-valued function to do what you're doing.
I think this is valid, but I'm doing this from memory:
create table #tmp (blah, blah)
Insert into #tmp
exec dbo.sprocName

Aggregate GREATEST in T-SQL

My SQL is rusty -- I have a simple requirement to calculate the sum of the greater of two column values:
CREATE TABLE [dbo].[Test]
(
column1 int NOT NULL,
column2 int NOT NULL
);
insert into Test (column1, column2) values (2,3)
insert into Test (column1, column2) values (6,3)
insert into Test (column1, column2) values (4,6)
insert into Test (column1, column2) values (9,1)
insert into Test (column1, column2) values (5,8)
In the absence of the GREATEST function in SQL Server, I can get the larger of the two columns with this:
select column1, column2, (select max(c)
from (select column1 as c
union all
select column2) as cs) Greatest
from test
And I was hoping that I could simply sum them thus:
select sum((select max(c)
from (select column1 as c
union all
select column2) as cs))
from test
But no dice:
Msg 130, Level 15, State 1, Line 7
Cannot perform an aggregate function on an expression containing an aggregate or a subquery.
Is this possible in T-SQL without resorting to a procedure/temp table?
UPDATE: Eran, thanks - I used this approach. My final expression is a little more complicated, however, and I'm wondering about performance in this case:
SUM(CASE WHEN ABS(column1 * column2) > ABS(column3 * column4)
THEN column5 * ABS(column1 * column2) * column6
ELSE column5 * ABS(column3 * column4) * column6 END)
Try this:
SELECT SUM(CASE WHEN column1 > column2
THEN column1
ELSE column2 END)
FROM test
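As a quick check of the CASE approach, here is a sketch that loads the five rows from the question into an in-memory SQLite database via Python's sqlite3 module and runs the same query. Using SQLite instead of SQL Server is an assumption made only to keep the example self-contained; the CASE expression itself is standard SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Test (column1 INTEGER NOT NULL, column2 INTEGER NOT NULL)"
)
# The sample rows from the question.
conn.executemany(
    "INSERT INTO Test (column1, column2) VALUES (?, ?)",
    [(2, 3), (6, 3), (4, 6), (9, 1), (5, 8)],
)

# Pick the greater value per row with CASE, then sum those values.
(total,) = conn.execute(
    """
    SELECT SUM(CASE WHEN column1 > column2 THEN column1 ELSE column2 END)
    FROM Test
    """
).fetchone()
print(total)  # 3 + 6 + 6 + 9 + 8 = 32
```

Because the CASE expression is evaluated per row before aggregation, there is no nested aggregate and no subquery inside SUM, which is what triggered Msg 130 in the original attempt.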
Try this... It's not the best-performing option, but should work.
SELECT
'LargerValue' = CASE
WHEN SUM(c1) >= SUM(c2) THEN SUM(c1)
ELSE SUM(c2)
END
FROM Test
SELECT
SUM(MaximumValue)
FROM (
SELECT
CASE WHEN column1 > column2
THEN
column1
ELSE
column2
END AS MaximumValue
FROM
Test
) A
FYI, the more complicated case should be fine, so long as all of those columns are part of the same table. It's still looking up the same number of rows, so performance should be very similar to the simpler case (as SQL Server performance is usually IO bound).
How to find the max from single-row data:
-- e.g. a table with columns (emplid, date1, date2, date3); "table" is a placeholder name
select tmp.emplid, max(tmp.a)
from
(select emplid, date1 as a from table
union all
select emplid, date2 from table
union all
select emplid, date3 from table
) tmp
group by tmp.emplid
select sum(id) from (
select (select max(c)
from (select column1 as c
union all
select column2) as cs) id
from test
) as t -- a derived table must have an alias in T-SQL
The best answer to this is, simply put:
;With Greatest_CTE As
(
Select ( Select Max(ValueField) From ( Values (column1), (column2) ) ValueTable(ValueField) ) Greatest
From Test
)
Select Sum(Greatest)
From Greatest_CTE
It scales a lot better than the other answers with more than two value columns.