I am trying to update column3 based on a calculation between column1 and column2. The idea is relatively simple, but I seem to be struggling with CTEs: if column1 is not null, then column1 * AVG(column2) should go into column3.
I have searched the forums and tried a few different methods, including CTEs and standard UPDATE queries, but I seem to be making a mistake somewhere.
WITH cte_avg1 AS (
SELECT "column1" * AVG("column2") AS avg
FROM table1
)
UPDATE table1
SET "column3" = cte_avg1.avg
FROM cte_avg1
WHERE "column1" IS NOT NULL;
The error message I am getting is as follows:
ERROR: column must appear in the GROUP BY clause or be used in an aggregate function
LINE 5: SELECT "column1" * AVG("column2"...
In an aggregating query, every column in the SELECT list must either appear in the GROUP BY clause or be an argument to an aggregate function. Move the multiplication out of the CTE so that it computes only the average; the CTE then returns a single row, and joining it in the UPDATE simply makes that average available to every row:
WITH cte_avg1
AS
(
SELECT avg(column2) avg
FROM table1
)
UPDATE table1
SET column3 = column1 * cte_avg1.avg
FROM cte_avg1
WHERE column1 IS NOT NULL;
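The CTE is not strictly required here, by the way; an uncorrelated scalar subquery is a minimal sketch of the same fix:
UPDATE table1
SET "column3" = "column1" * (SELECT AVG("column2") FROM table1)
WHERE "column1" IS NOT NULL;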
I’m building an insert statement dynamically to add multiple rows of data to a table in a batch, as I believe it is more efficient than inserting one at a time. However, I need the last couple of columns in each inserted row to be set with the results of querying another table using a value from the new row. This is my imaginary pseudocode version:
INSERT INTO TableA (column1, column2, column3, column4, column5)
VALUES (SELECT {value1a}, {value1b}, {value1c}, b.column1, b.column2 FROM TableB b WHERE b.column3 = {value1c}),
(SELECT {value2a}, {value2b}, {value2c}, b.column1, b.column2 FROM TableB b WHERE b.column3 = {value2c}),
…
Now here is another wrinkle: I have a unique index on TableA with an ignore clause, and there is a lot of redundant data to process, so only about 15% of the rows in any given batch insert will actually be added to the database. Does this mean it would be more efficient to insert the rows with values for columns 1 – 3, then query for the rows that were inserted, and update column 4 and 5? If so, would the following be the most efficient way to do that for all the inserted rows?
UPDATE a SET a.column4 = b.column1, a.column5 = b.column2
FROM TableA a INNER JOIN TableB b ON b.column3 = a.column3
WHERE a.CreatedAt >= {BatchInsertTime}
(assuming no other processes are adding rows to the table)
For better efficiency and a simpler way to join TableB, send all the TableA rows in a JSON doc, e.g.:
insert into TableA (column1, column2, column3, column4, column5)
select d.*, b.column1 as column4, b.column2 as column5
from openjson(@json)
with
(
column1 varchar(20),
column2 int,
column3 varchar(20)
) as d
left join TableB b
on b.column3 = d.column3
where @json is an NVARCHAR(MAX) parameter that looks like:
[
{"column1":"foo", "column2":3,"column3":"bar" },
{"column1":"foo2", "column2":4,"column3":"bar2" },
. . .
]
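For testing, the @json parameter can be simulated with a local variable holding sample data (the values here are just the ones from the example above):
DECLARE @json NVARCHAR(MAX) = N'[
{"column1":"foo", "column2":3, "column3":"bar" },
{"column1":"foo2", "column2":4, "column3":"bar2" }
]';
-- then run the insert ... select above unchanged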
I have the following query
SELECT DISTINCT ON (user_id) user_id, timestamp
FROM entries
WHERE user_id in (1,2)
AND entry_type IN(
SELECT jsonb_array_elements_text(
SELECT entry_types
FROM users INNER JOIN orgs
ON org_id = orgs.id
WHERE users.id = 1
)
);
I'm getting a syntax error at or near select
syntax error at or near "select" LINE 1: ... entry_type in( select
jsonb_array_elements_text(select ent.
The field entry_types is a JSONB field, so I am trying to convert it to text in order to use it in the WHERE IN clause.
PostgreSQL 13.0
This sub-query within jsonb_array_elements_text
SELECT entry_types
FROM users INNER JOIN orgs
ON org_id = orgs.id
WHERE users.id = 1
returns a single JSONB entry like this:
entry_types
--------------------------------------------
["type1", "type2", "type3"]
I'm simply trying to use the array of text values returned there as the criteria inside the WHERE IN clause.
The syntax error seems to point somewhere else, so maybe I am wrong, but the problem I see is a missing pair of parentheses around the subquery:
jsonb_array_elements_text((SELECT ...))
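With those parentheses added, the full statement from the question becomes:
SELECT DISTINCT ON (user_id) user_id, timestamp
FROM entries
WHERE user_id IN (1,2)
AND entry_type IN (
    SELECT jsonb_array_elements_text((
        SELECT entry_types
        FROM users INNER JOIN orgs
        ON org_id = orgs.id
        WHERE users.id = 1
    ))
);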
I have a PostgreSQL table with 2 fields, ID and Name (column1 and column2 in the SQL Fiddle). The default record_count I put for a particular ID is 1. I want to get the record_count for column1 and sum that record_count by column1.
I tried to use this query but somehow it's showing an error.
select sum(column_record) group by column_record ,
* from (select column1,1::int4 as column_record from test) a
SQL Fiddle for the same :
http://sqlfiddle.com/#!15/12fe9/1
If you want to use a window function (though you may want normal grouping instead, which is generally much faster; see the sketch below), this is the way to do it:
-- create temp table test as (select * from (values ('a', 'b'), ('c', 'd')) a(column1, column2));
select sum(column_record) over (partition by column_record),
* from (select column1, 1::int4 as column_record from test) a;
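The plain GROUP BY variant of the same idea would look like this (a sketch that, per the question's stated goal, produces one row per column1 value instead of one per input row):
select column1, sum(column_record)
from (select column1, 1::int4 as column_record from test) a
group by column1;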
I've got a table with a lot of columns in it and I want to run a query to find the most common value in each column.
Ordinarily for a single column, I'd run something like:
SELECT country
FROM users
GROUP BY country
ORDER BY count(*) DESC
LIMIT 1
Does PostgreSQL have a built-in function for doing this, or can anyone suggest a query I could run to achieve this?
Using the same query, for more than one column you should do:
SELECT *
FROM
(
SELECT country
FROM users
GROUP BY 1
ORDER BY count(*) DESC
LIMIT 1
) country
,(
SELECT city
FROM users
GROUP BY 1
ORDER BY count(*) DESC
LIMIT 1
) city
This works for any type and returns all the values in the same row, with each column keeping its original name.
For more columns, just add more subqueries like:
,(
SELECT someOtherColumn
FROM users
GROUP BY 1
ORDER BY count(*) DESC
LIMIT 1
) someOtherColumn
Edit:
You could achieve this with window functions as well, but it would be no better in performance or readability.
Starting with PostgreSQL 9.4 there is an aggregate function for this:
mode() WITHIN GROUP (ORDER BY sort_expression)
returns the most frequent input value (arbitrarily choosing the first one if there are multiple equally-frequent results)
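For the users table from the question, that looks like:
SELECT mode() WITHIN GROUP (ORDER BY country) AS country,
       mode() WITHIN GROUP (ORDER BY city) AS city
FROM users;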
And for earlier versions, you could create one...
CREATE OR REPLACE FUNCTION mode_array(anyarray)
RETURNS anyelement AS
$BODY$
SELECT a FROM unnest($1) a GROUP BY 1 ORDER BY COUNT(1) DESC, 1 LIMIT 1;
$BODY$
LANGUAGE SQL IMMUTABLE;
CREATE AGGREGATE mode(anyelement) (
SFUNC = array_append,   -- function to call for each row; just builds the array
STYPE = anyarray,
FINALFUNC = mode_array, -- function to call after everything has been added to the array
INITCOND = '{}'         -- initialize with an empty array
);
Usage: SELECT mode(column) FROM table;
If I were doing this, I'd write a query like this one (each branch needs its own parentheses in PostgreSQL, so that its ORDER BY and LIMIT apply to that branch rather than to the whole union):
(SELECT 'country', country
FROM users
GROUP BY country
ORDER BY count(*) DESC
LIMIT 1)
UNION ALL
(SELECT 'city', city
FROM users
GROUP BY city
ORDER BY count(*) DESC
LIMIT 1)
-- etc.
It should be noted this only works if all the columns are of compatible types. If they are not, you'll probably need a different solution.
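One simple workaround in that case is to cast each value to text, e.g. (a sketch using the country and age columns that appear in this thread):
(SELECT 'country' AS column_name, country::text AS most_common
FROM users
GROUP BY country
ORDER BY count(*) DESC
LIMIT 1)
UNION ALL
(SELECT 'age', age::text
FROM users
GROUP BY age
ORDER BY count(*) DESC
LIMIT 1);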
This window function version will read the users table and the computed table once each, whereas the subquery version reads the users table once for each column. If the columns are many, as in the OP's case, my guess is that this one is faster. SQL Fiddle
select distinct on (country_count, age_count) *
from (
select
country,
count(*) over(partition by country) as country_count,
age,
count(*) over(partition by age) as age_count
from users
) s
order by country_count desc, age_count desc
limit 1
I have the following table:
RecordID
Name
Col1
Col2
....
ColN
RecordID is BIGINT PRIMARY KEY CLUSTERED IDENTITY(1,1); RecordID and Name are already populated, and the other columns are NULL.
I have a function which returns information about the other columns by Name.
To initialize my table I use the following algorithm:
1. Create a loop
2. Get a row and select its Name value
3. Execute the function using the selected name, and store its result in temp variables
4. Insert the temp variables into the table
5. Move to the next record
Is there a way to do this without looping?
Cross apply was basically built for this:
SELECT D.deptid, D.deptname, D.deptmgrid
,ST.empid, ST.empname, ST.mgrid
FROM Departments AS D
CROSS APPLY fn_getsubtree(D.deptmgrid) AS ST;
Using APPLY (note that the derived table needs an alias, and the UPDATE should target the table's alias):
UPDATE st
SET some_row = ca.another_row,
some_row2 = ca.another_row / 2
FROM some_table st
CROSS APPLY
-- TOP 1 without an ORDER BY picks an arbitrary matching row
(SELECT TOP 1 another_row FROM another_table at WHERE at.shared_id = st.shared_id) AS ca
WHERE ...
using cross apply in an update statement
You can simply say the following if you already have the records in the table.
UPDATE MyTable
SET
col1 = dbo.col1Method(Name),
col2 = dbo.col2Method(Name),
...
While inserting new records, assuming RecordID is auto-generated, you can say
INSERT INTO MyTable(Name, Col1, Col2, ...)
VALUES(@Name, dbo.col1Method(@Name), dbo.col2Method(@Name), ...)
where @Name contains the value for the Name column.