T-SQL Error converting data type on join, on non-joining field

A query is written to join data from two tables and group it. If the two lines defining the join are commented out, the query returns correctly. Here is the query (with the join commented out):
SELECT tblTurbineLocations.TurbineLayoutProjectID as ProjectID
,TurbineLayoutNumber
,Count(HubHeight) as NumTurbines
,tblTurbineLocations.WTCode
FROM [TurbineLayout].[dbo].[tblTurbineLocations]
--LEFT OUTER JOIN [TurbineModel].[dbo].[tblTurbineModels]
--ON str(tblTurbineLocations.WTCode) = str(tblTurbineModels.WTCode) --Need to force string conversion to avoid data type conflict.
WHERE tblTurbineLocations.TurbineLayoutProjectID = 2255
AND tblTurbineLocations.TurbineLayoutNumber IN (406, 407)
GROUP BY tblTurbineLocations.TurbineLayoutProjectID ,tblTurbineLocations.TurbineLayoutNumber ,tblTurbineLocations.WTCode
Uncommenting the two join lines makes the query attempt the join on the WTCode field, which returns the following error:
Msg 8114, Level 16, State 5, Line 2
Error converting data type nvarchar to float.
The error points to line 2, rather than the line containing the join, and says that nvarchar cannot be converted to float. However, the column on line 2, tblTurbineLocations.TurbineLayoutProjectID, is not an nvarchar; it is an int.
Reviewing the other columns in the query, none are of type nvarchar except the joining column, WTCode (nvarchar(11) in one table, nvarchar(5) in the other). Both sides are wrapped in str() to avoid a different error, which that wrapping did resolve:
Cannot resolve the collation conflict between "Latin1_General_CI_AS" and "SQL_Latin1_General_CP1_CI_AS" in the equal to operation.
WTCode is not being cast as a float in the query.
What is the error in my code or my approach?

As indicated by Larnu in the comments:
The issue is that the str() function expects a float, so it raises this error when fed an nvarchar. To solve the collation conflict, COLLATE can be applied in the join condition to specify which collation should be used for the nvarchar fields. Thus the following works:
SELECT tblTurbineLocations.TurbineLayoutProjectID as ProjectID
,TurbineLayoutNumber
,Count(HubHeight) as NumTurbines
,tblTurbineLocations.WTCode
FROM [TurbineLayout].[dbo].[tblTurbineLocations]
LEFT OUTER JOIN [TurbineModel].[dbo].[tblTurbineModels]
ON tblTurbineLocations.WTCode = tblTurbineModels.WTCode
COLLATE Latin1_General_CS_AS_KS_WS
WHERE tblTurbineLocations.TurbineLayoutProjectID = 2255
AND tblTurbineLocations.TurbineLayoutNumber IN (406, 407)
GROUP BY tblTurbineLocations.TurbineLayoutProjectID ,tblTurbineLocations.TurbineLayoutNumber ,tblTurbineLocations.WTCode
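To see the STR() behavior in isolation, here is a minimal sketch; the turbine code N'D8-2000' is a made-up value for illustration:
SELECT STR(123.45);      -- fine: STR() takes a float expression
SELECT STR(N'D8-2000');  -- Msg 8114: Error converting data type nvarchar to float.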

Create rows from part of column names

Source data
I am working on an ELT project to load data from CSV files into PostgreSQL where I will transform it. The CSV files have many columns that are consistent across files, but also contain activity columns that are inconsistent with names like Date (05/19/2020), Type (05/19/2020), etc.
In the loading script I am merging all of the columns with dates in the column name into one jsonb column so I don't have to constantly add new columns to the raw data table.
The resulting jsonb column in the raw data table looks like this:
id       | activity
---------+---------------------------------------------------------------------------------------------------------------------
12345678 | {"Date (05/19/2020)": null, "Type (05/19/2020)": null, "Date (06/03/2020)": "06/01/2020", "Type (06/03/2020)": "E"}
98765432 | {"Date (05/19/2020)": "05/18/2020", "Type (05/19/2020)": "B", "Date (10/23/2020)": "10/26/2020", "Type (10/23/2020)": "T"}
JSON to columns
Using the amazing create_jsonb_flat_view function from this post I can convert the jsonb to columns like this:
id       | Date (05/19/2020) | Type (05/19/2020) | Date (06/03/2020) | Type (06/03/2020) | Date (10/23/2020) | Type (10/23/2020)
---------+-------------------+-------------------+-------------------+-------------------+-------------------+------------------
12345678 | null              | null              | 06/01/2020        | E                 |                   |
98765432 | 05/18/2020        | B                 |                   |                   | 10/26/2020        | T
Need to move part of column name to row
Now, this is where I'm stuck. I need to remove the portion of the column name that is the Activity Date (e.g. (05/19/2020)) and create a row for each id and ActivityDate with additional columns for Date and Type like this:
id       | ActivityDate | Date       | Type
---------+--------------+------------+------
12345678 | 05/19/2020   | null       | null
12345678 | 06/03/2020   | 06/01/2020 | E
98765432 | 05/19/2020   | 05/18/2020 | B
98765432 | 10/23/2020   | 10/26/2020 | T
I followed your link to the create_jsonb_flat_view article yesterday and then forgot about this question. While I thank you for pointing me there, I think that mentioning it worked against you.
A more conventional approach using regexp_replace() works here. I left the date values as strings, but you can convert them with to_date() if needed:
with parse as (
    select id, e.k, e.v,
           regexp_replace(e.k, '\s+\([0-9/]{10}\)', '') as k_no_date,
           regexp_replace(e.k, '^.+([0-9/]{10}).+', '\1') as k_date_only
    from rawinput
    cross join lateral jsonb_each_text(activity) as e(k, v)
)
select id,
       k_date_only as activity_date,
       min(v) filter (where k_no_date = 'Date') as date,
       min(v) filter (where k_no_date = 'Type') as type
from parse
group by id, k_date_only;
db<>fiddle here
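If you want to try this locally, here is a minimal setup matching the sample data above (the table name rawinput is taken from the query):
create table rawinput (id bigint, activity jsonb);
insert into rawinput values
  (12345678, '{"Date (05/19/2020)": null, "Type (05/19/2020)": null, "Date (06/03/2020)": "06/01/2020", "Type (06/03/2020)": "E"}'),
  (98765432, '{"Date (05/19/2020)": "05/18/2020", "Type (05/19/2020)": "B", "Date (10/23/2020)": "10/26/2020", "Type (10/23/2020)": "T"}');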
@Mike-Organek's answer works beautifully!
However, I was curious whether the regexp_replace() calls might be slowing the query down, and it seemed I could get the same results using a simpler function.
Since Mike gave me a great example to start with, I modified it to split on the space between Date and (05/19/2020).
For 20,000 rows, it went from taking an average of 7 sec on my local machine to an average of 0.9 sec.
Here is the resulting query:
with parse as (
    select id, e.k, e.v,
           split_part(e.k, ' ', 1) as k_no_date,
           trim(split_part(e.k, ' ', 2), '()') as k_date_only
    from rawinput
    cross join lateral jsonb_each_text(activity) as e(k, v)
)
select id,
       k_date_only as activity_date,
       min(v) filter (where k_no_date = 'Date') as date,
       min(v) filter (where k_no_date = 'Type') as type
from parse
group by id, k_date_only;
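Both versions leave the dates as text, as noted above. If real date values are wanted, to_date() can be applied in the final select; here is a sketch on top of the split_part version, with the MM/DD/YYYY format string inferred from the sample data:
with parse as (
    select id, e.k, e.v,
           split_part(e.k, ' ', 1) as k_no_date,
           trim(split_part(e.k, ' ', 2), '()') as k_date_only
    from rawinput
    cross join lateral jsonb_each_text(activity) as e(k, v)
)
select id,
       to_date(k_date_only, 'MM/DD/YYYY') as activity_date,
       to_date(min(v) filter (where k_no_date = 'Date'), 'MM/DD/YYYY') as date,
       min(v) filter (where k_no_date = 'Type') as type
from parse
group by id, k_date_only;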

Literal SQL works: Array value must start with "{" or dimension information

I am trying to add an array inside element [0] of an existing jsonb array. When I hardcode the path it works, but when I try to build the path dynamically it fails with the error below. What am I doing wrong?
PostgreSQL 13 (database server version).
with whatposition as (
    select position pos
    from users
    cross join lateral jsonb_array_elements(user_details->'Profile') with ordinality arr(elem, position)
    where display_ok = false
)
update users
set user_details = jsonb_set(
    user_details,
    concat('ARRAY[''userProfile'',''', (select pos-1 from whatposition)::text, '''', ',''DocumentDetails'']')::text[],
    '[{"y":"supernewValue"}]')
where display_ok = false;
SQL Error [22P02]: ERROR: malformed array literal: "ARRAY['userProfile','0','DocumentDetails']"
  Detail: Array value must start with "{" or dimension information.
Here is the WITH subquery and its output:
with whatposition as (
    select position pos
    from users
    cross join lateral jsonb_array_elements(user_details->'userProfile') with ordinality arr(elem, position)
    where display_ok = false
)
select concat('ARRAY[''userProfile'',''', (select pos-1 from whatposition)::text, '''', ',''DocumentDetails'']');
Output of the above SQL:
ARRAY['userProfile','0','DocumentDetails']
But when I pass the value as a literal to the above SQL it works just fine.
with whatposition as (
    select position pos
    from users
    cross join lateral jsonb_array_elements(user_details->'userProfile') with ordinality arr(elem, position)
    where display_ok = false
)
update users
set user_details = jsonb_set(
    user_details,
    ARRAY['userProfile','0','DocumentDetails'],
    '[{"y":"cccValue"}]')
where display_ok = false;
You shouldn't put the ARRAY[…] syntax inside a string value; it is SQL constructor syntax, not part of an array literal. Build the path array directly:
with whatposition as (
select position pos
from users
cross join lateral jsonb_array_elements(user_details->'Profile') with ordinality arr(elem,position)
where display_ok=false
)
update users
set user_details=jsonb_set(
user_details,
ARRAY['userProfile', (select pos-1 from whatposition)::text, 'DocumentDetails'],
'[{"y":"supernewValue"}]'
)
where display_ok=false;
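For context, the string form that PostgreSQL can cast to text[] is the brace literal syntax; a minimal sketch:
select '{userProfile,0,DocumentDetails}'::text[];  -- valid array literal
select 'ARRAY[''userProfile'',''0'',''DocumentDetails'']'::text[];  -- ERROR: malformed array literal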
The query you are trying is broken beyond the superficial syntax error (which is addressed by Bergi).
If the CTE returns multiple rows (as expected), the ARRAY constructor will fail because the nested subselect is only allowed to return a single value in this place.
To "upsert" (insert or update) the property "DocumentDetails": [{"y": "cccValue"}]} to the first element (the one with subscript 0) of the nested JSON array user_details->'userProfile':
Postgres 14 or later
Make use of JSONB subscripting:
UPDATE users
SET user_details['userProfile'][0]['DocumentDetails'] = '[{"y":"cccValue"}]'
WHERE display_ok = FALSE;
Postgres 13
Use jsonb_set() - exactly like you already have in your last code example, only without the unneeded CTE:
UPDATE users
SET user_details = jsonb_set(user_details, '{userProfile, 0, DocumentDetails}', '[{"y":"cccValue"}]')
WHERE display_ok = FALSE;
db<>fiddle here

Postgresql - Interpreted type for NULL is wrong

I have a problem with the following CTE expression: prev_count in new_values is being interpreted as text, but the column I'm updating in counts is of type integer. I'm getting this error on the marked line:
ERROR: column "prev_count" is of type integer but expression is of type text
LINE 12: prev_count = new_values.prev_count
Here's the query:
WITH
new_values (word, count, txid, prev_count) AS (
    VALUES ('cat', 1, 5, NULL)
),
updated AS (
    UPDATE counts t
    SET count = new_values.count,
        txid = new_values.txid,
        prev_count = new_values.prev_count -- ERROR HERE
    FROM new_values
    WHERE t.word = new_values.word
    RETURNING t.*
)
INSERT INTO counts (word, count, txid, prev_count)
SELECT word, count, txid, prev_count
FROM new_values
WHERE NOT EXISTS (
    SELECT 1 FROM updated WHERE updated.word = new_values.word
)
My question is, what's an elegant way to fix the error? I would rather specify the type of prev_count in new_values instead of adding an explicit cast, but I don't see anything like that in the docs.
Adding this here as an explicit answer along with a detailed explanation.
The fix is:
WITH
new_values (word, count, txid, prev_count) AS (
    VALUES ('cat', 1, 5, NULL::int)
),
As a_horse_with_no_name suggested in the comments.
Why is this necessary? Because the row specification comes from the VALUES section, where a bare NULL has unknown type, and PostgreSQL resolves that unknown type to text by default. That is not what you want here, so you have to give the NULL an explicit type.
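A quick way to see that default resolution, as a sketch (behavior on current Postgres versions):
SELECT pg_typeof(prev_count)
FROM (VALUES ('cat', 1, 5, NULL)) AS v(word, count, txid, prev_count);
-- pg_typeof
-- ---------
-- text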
This often comes up in other cases too, such as UNION statements, where a NULL in the first branch of the column list can be given an implicit type that clashes with the type of the corresponding column in another branch, as sketched below. So this is a tricky corner worth knowing about.
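A contrived sketch of the UNION variant; the CTE forces the untyped NULL to resolve to text before the branches are matched:
WITH first_part AS (SELECT NULL AS prev_count)
SELECT prev_count FROM first_part
UNION ALL
SELECT 5;
-- ERROR: UNION types text and integer cannot be matched
-- Fix: SELECT NULL::int AS prev_count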

What does a column assignment using an aggregate in the columns area of a select do?

I'm trying to decipher another programmer's code who is long-gone, and I came across a select statement in a stored procedure that looks like this (simplified) example:
SELECT #Table2.Col1, #Table2.Col2, #Table2.Col3,
       MysteryColumn = CASE WHEN y.Col3 IS NOT NULL THEN #Table2.MysteryColumn - y.Col3 ELSE #Table2.MysteryColumn END
INTO #Table1
FROM #Table2
LEFT OUTER JOIN (
    SELECT Table3.Col1, Table3.Col2, Col3 = SUM(Table3.Col3)
    FROM Table3
    INNER JOIN #Table4 ON #Table4.Col1 = Table3.Col1 AND #Table4.Col2 = Table3.Col2
    GROUP BY Table3.Col1, Table3.Col2
) AS y ON #Table2.Col1 = y.Col1 AND #Table2.Col2 = y.Col2
WHERE #Table2.Col2 < @EnteredValue
My question: what does the fourth column of the primary selection do? Does it produce a boolean value checking whether the values are equal? Or does it set #Table2.MysteryColumn equal to some value and then insert it into #Table1? Or does it just update #Table2.MysteryColumn and not output a value into #Table1?
The same thing seems to happen inside the sub-query on the third column, and I am equally at a loss as to what that does.
MysteryColumn = gives the expression a name, also called a column alias. The fact that a column in #Table2 has the same name is beside the point.
Since the query uses INTO, the alias also becomes the column's name in the resulting temporary table. See the SELECT clause documentation, noting the | column_alias = expression form, and the INTO clause documentation.
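For illustration, the two alias spellings below are equivalent, and neither one assigns anything to a table column; a minimal sketch:
SELECT MysteryColumn = 1 + 1;   -- "alias = expression" form
SELECT 1 + 1 AS MysteryColumn;  -- "expression AS alias" form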

Postgres query error

I have a query in Postgres:
insert into c_d (select * from cd where ak = '22019763');
And I get the following error
ERROR: column "region" is of type integer but expression is of type character varying
HINT: You will need to rewrite or cast the expression.
An INSERT INTO table1 SELECT * FROM table2 depends entirely on the order of the columns, which is part of the table definition. It will line each column of table1 up with the column of table2 at the same ordinal position, regardless of names.
The problem here is that whatever column of cd sits at the same ordinal position as the "region" column of c_d has an incompatible type, and no implicit typecast is available to clear the confusion.
INSERT INTO ... SELECT * statements are stylistically bad form unless the two tables are defined, and will forever be defined, exactly the same way. All it takes is a single extra column added to cd, and you'll start getting errors about extraneous columns.
If it is at all possible, what I would suggest is explicitly calling out the columns within the SELECT statement. You can call a function to change the type within each of the column references (or you could define a new type cast to do this implicitly; see CREATE CAST), and you can use AS to set the column label to match that of your target column, as sketched below.
If you can't do this for some reason, indicate that in your question.
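A hedged sketch of that shape; region and ak come from the question, while name stands in for whatever other columns c_d and cd actually have:
INSERT INTO c_d (region, name, ak)
SELECT region::integer,  -- explicit cast for the mismatched column
       name,
       ak
FROM cd
WHERE ak = '22019763';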
Check out the PostgreSQL insert documentation. The syntax is:
INSERT INTO table [ ( column [, ...] ) ]
{ DEFAULT VALUES | VALUES ( { expression | DEFAULT } [, ...] ) | query }
which here would look something like:
INSERT INTO c_d (column1, column2, ...) SELECT column1, column2, ... FROM cd WHERE ak = '22019763'
This is the syntax you want to use when inserting values from one table into another where the column types and order are not exactly the same.