How to update a new empty column with data that depends on a mathematical operation across different data types? - amazon-redshift

[beginner]
I have a table that looks like this:
colA colB
1 <null>
2 <null>
3 <null>
colB is the new empty column I added to the table. colA is varchar and colB is double precision data type (float).
I want to update colB with colA multiplied by 2.
New table should look like this:
colA colB
1 2
2 4
3 6
When I go to update colB like so:
update tablename set colB = colA * 2
I get error:
Invalid operation: Invalid input syntax for type numeric
I've tried to work around this with solutions like this:
update tablename set colB = COALESCE(colA::numeric::text,'') * 2
but get the same error.
In a select statement on the same table, this works on colA which is varchar:
select colA * 2 from tablename
How can I update a column using a mathematical operation on a reference column of a different data type? I can't change the data type of colA.

I suppose that Laurenz Albe is correct and there are non-numeric values in col_a.
So the UPDATE must be guarded:
UPDATE T
SET col_b =
    CASE
        WHEN col_a ~ '^([0-9]+\.?[0-9]*|\.[0-9]+)$' THEN col_a::numeric * 2
    END;
-- or this way
UPDATE T
SET col_b = col_a::numeric * 2
WHERE col_a ~ '^([0-9]+\.?[0-9]*|\.[0-9]+)$';
The difference: the CASE version writes NULL into col_b for the non-numeric rows, while the WHERE version leaves those rows untouched.
See the fiddle: https://www.db-fiddle.com/f/4wFynf9WiEuiE499XMcsCT/1
Recipes for an "isnumeric" check can be found here: isnumeric() with PostgreSQL
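If the check is needed in more than one place, one option (a sketch of mine, not from the linked recipes; the function name f_is_numeric is made up) is to wrap the same regex in a Redshift scalar SQL UDF:
-- Hypothetical helper that reuses the regex from the guarded UPDATE above
CREATE OR REPLACE FUNCTION f_is_numeric (varchar)
RETURNS boolean
IMMUTABLE
AS $$
    SELECT $1 ~ '^([0-9]+\.?[0-9]*|\.[0-9]+)$'
$$ LANGUAGE sql;

UPDATE T
SET col_b = col_a::numeric * 2
WHERE f_is_numeric(col_a);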

There is a value in the string column that is not a valid number. You will have to fix the data or exclude certain rows with a WHERE condition.
If you say that running the query from your client works, that leads me to suspect that your client doesn't actually execute the whole query, but slaps a LIMIT on it (some client tools do that).
The following query will have to process all rows and should fail:
SELECT colA * 2 AS double
FROM tablename
ORDER BY double;
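To find the rows that cause the error, here is a small sketch reusing the regex from the other answer (not part of this answer):
-- Rows whose colA is not a plain unsigned number; fix these or exclude them with a WHERE clause
SELECT colA
FROM tablename
WHERE colA !~ '^([0-9]+\.?[0-9]*|\.[0-9]+)$';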

update tablename set colB = colA::numeric * 2

Related

DB2 - Concat all values in a column into a Single string

Let's say I have a table like this:
test
Col1  Col2
A     1
B     1
C     1
D     2
I am running the query: select col1 from test where col2 = 1;
This returns the values A, B, and C in 3 separate rows.
I want the SQL to return a single row with value A|B|C. Is this possible to do? If it is how should I do it?
You can use the LISTAGG function like this (using '|' as the separator to match the desired output):
SELECT LISTAGG("Col1", '|')
FROM "test"
WHERE "Col2" = 1
If LISTAGG is not available, it can be reproduced with XMLAGG; the SUBSTR(..., 2) strips the leading '|':
SELECT SUBSTR(XMLSERIALIZE(XMLAGG(XMLTEXT('|'||"Col1"))),2)
FROM "test"
WHERE "Col2" = 1

TSQL Update Column B with Column A's Value that is also being updated

I am processing 155+ Million rows in a table and doing it in batches for performance and ran into a situation where the value in a column didn't get updated as expected. I know I can solve it a couple of different ways but I wanted to see if someone knows how to do it in a single update statement.
Below is a test case:
DECLARE @TempIds TABLE (
    ColA int,
    ColB int
)
--seed the values for a test
INSERT INTO @TempIds (ColA, ColB) VALUES (1, 2)
--do the update
UPDATE @TempIds SET ColA = (3+5), ColB = (1-ColA)
--see the results
SELECT * FROM @TempIds
The result of the above is:
ColA ColB
8 0
The desired outcome is:
ColA ColB
8 -7
The update statement uses the current value of ColA, which is 1, when updating ColB, instead of using the final value of ColA, which is 8.
I know that I can solve this by doing one of the following:
UPDATE @TempIds SET ColA = (3+5)
UPDATE @TempIds SET ColB = (1-ColA)
--see the results
SELECT * FROM @TempIds
OR
UPDATE @TempIds SET ColA = (3+5), ColB = 1-(3+5)
--see the results
SELECT * FROM @TempIds
Either one of the above will result in the following output:
ColA ColB
8 -7
This is a simplified version as the actual query has a lot of computations and formulas to calculate the value for ColA.
I am trying to avoid two update statements or the need to repeat the formula for ColA in the ColB formula.
Thanks in advance!
Could you possibly make use of an updatable CTE, first computing the new column A value and then referencing it:
with updateme as (
    select *, (3+5) as NewA
    from @TempIds
)
update updateme set colA = NewA, colb = 1 - NewA;
select * from @TempIds;
There's the possibility to use a variable.
DECLARE @newcola integer;
UPDATE @TempIds
SET @newcola = cola = 3 + 5,
    colb = 1 - @newcola;
db<>fiddle

postgresql: How to grab an existing id at random from the non-consecutive ids of a table

Postgresql version 9.4
I have a table with an integer column that contains a number of integers with some gaps, like the sample below. I'm trying to get an existing id from the column at random with the following query, but it occasionally returns NULL:
CREATE TABLE IF NOT EXISTS test_tbl (
    id INTEGER);
INSERT INTO test_tbl
VALUES (10),
       (13),
       (14),
       (16),
       (18),
       (20);
-------------------------------
SELECT * FROM test_tbl;
-------------------------------
SELECT COALESCE(tmp.id, 20) AS classification_id
FROM (
    SELECT tt.id,
           row_number() OVER (ORDER BY tt.id) AS row_num
    FROM test_tbl tt
) tmp
WHERE tmp.row_num = floor(random() * 10);
Please let me know where I'm going wrong.
but it returns NULL occasionally
and I must add that it sometimes returns more than one row, right?
In your sample data there are 6 rows, so the column row_num will have a value from 1 to 6.
This:
floor(random() * 10)
creates a random integer from 0 up to 9, since random() itself returns a value from 0 up to 0.9999...
You should use:
floor(random() * 6 + 1)::int
to get a random integer from 1 to 6.
But this would not solve the problem, because the WHERE clause is executed once for each row with a new random number each time, so it can happen that row_num never matches the generated number and nothing is returned, or that it matches more than once and more than one row is returned.
See the demo.
The proper (although sometimes not the most efficient) way to get a random row is:
SELECT id FROM test_tbl ORDER BY random() LIMIT 1
Also check other links from SO, like:
quick random row selection in Postgres
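For completeness, here is a sketch of my own (not from the answer) of the row_number/offset idea with the random pick evaluated only once, so exactly one existing row always comes back:
-- Pick a random offset once via a scalar subquery, then return the single row at that position
SELECT id
FROM test_tbl
ORDER BY id
OFFSET (SELECT floor(random() * count(*))::int FROM test_tbl)
LIMIT 1;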
You could select one row and order by random(); this way you are guaranteed to hit an existing row:
select id
from test_tbl
order by random()
LIMIT 1;

Does String Value Exist in a List of Strings | Redshift Query

I have some interesting data that I'm trying to query, but I cannot get the syntax right. I have a temporary table (temp_id), which I've filled with the id values I care about. In this example there are only two ids.
CREATE TEMPORARY TABLE temp_id (id bigint PRIMARY KEY);
INSERT INTO temp_id (id) VALUES ( 1 ), ( 2 );
I have another table in production (let's call it foo) which holds multiple such ids in a single cell. The ids column looks like this (below), with the ids stored as a single string separated by "|":
ids
-----------
1|9|3|4|5
6|5|6|9|7
NULL
2|5|6|9|7
9|11|12|99
I want to evaluate each cell in foo.ids and see if any of the ids in it match the ones in my temp_id table.
Expected output
ids |does_match
-----------------------
1|9|3|4|5 |true
6|5|6|9|7 |false
NULL |false
2|5|6|9|7 |true
9|11|12|99 |false
So far I've come up with this, but I can't seem to return anything. Instead of trying to create a new column does_match, I tried to filter within the WHERE clause. However, the issue is that I cannot figure out how to compare all the id values in my temp table against the string blob full of ids in foo.
SELECT ids
FROM foo
WHERE ids = ANY(SELECT LISTAGG(id, ' | ') FROM temp_id)
Any suggestions would be helpful.
Cheers,
This would work; however, I'm not sure about performance:
SELECT foo.ids
FROM foo
JOIN temp_id
  ON '|' || foo.ids || '|' LIKE '%|' || temp_id.id::varchar || '|%'
You wrap the ids list in a pair of additional separators, so you can always search for |id|, including for the first and the last number in the string.
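A quick illustration of the wrapping, using literal values taken from the sample data above:
-- '|1|9|3|4|5|' contains '|1|' but not '|11|', even though the digit 1 appears several times
SELECT ('|' || '1|9|3|4|5' || '|') LIKE ('%|' || '1'  || '|%') AS matches_1,   -- true
       ('|' || '1|9|3|4|5' || '|') LIKE ('%|' || '11' || '|%') AS matches_11;  -- false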
The following SQL (I know it's a bit of a hack) returns exactly the output you expect. It was tested with your sample data; I don't know how it would behave on your real data, so try it and let me know.
with seq as ( -- create a sequence CTE to stand in for postgres' unnest
    select 1 as i union all -- assuming you have at most 10 ids in the ids field,
                            -- feel free to extend this part
    select 2 union all
    select 3 union all
    select 4 union all
    select 5 union all
    select 6 union all
    select 7 union all
    select 8 union all
    select 9 union all
    select 10)
select distinct k.ids,
       case -- since I can't do a max on a boolean field, I used two cases
            -- for 1s and 0s and converted them to boolean
           when max(case
                        when t.id in (
                            select split_part(f.ids, '|', seq.i)
                            from seq
                            join foo f on seq.i <= regexp_count(f.ids, '[|]') + 1
                            where split_part(f.ids, '|', seq.i) != '' and k.ids = f.ids)
                        then 1
                        else 0
                    end) = 1
           then true
           else false
       end as does_match
from temp_id t, foo k
group by 1
Please let me know if this works for you!

T-SQL Select between two ranges and in a list

I know this has to be an easy one, but I have been searching and cannot work out why my logic is wrong.
I have a select statement like below.
SELECT * from MyTable
where Column1 between 1 and 5
   or Column1 between 10 and 15
   or Column2 in (1,2,3)
So I need values based on two ranges in Column1 and a list in Column2.
It is returning the correct rows for my ranges, but I am getting extra values based on my list. I know it has to be my AND/OR but I cannot get it to work.
SELECT * from MyTable
where (Column1 between 1 and 5)
   or (Column1 between 10 and 15)
   or Column2 in (1,2,3)
I guess sample data and desired results would have been good information. I ended up finding my own answer through trial and error. Thank you for looking. I needed to apply the AND with my list condition inside each of my ranges.
SELECT * from MyTable
where (Column1 between 1 and 5 and Column2 in (1,2,3))
   or (Column1 between 10 and 15 and Column2 in (1,2,3))
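Since both branches share the same list condition, an equivalent (and arguably easier to read) rewrite is to factor the IN out; this is only a reformulation of the query above, not a different result:
SELECT * from MyTable
where (Column1 between 1 and 5 or Column1 between 10 and 15)
  and Column2 in (1,2,3)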