How to retrieve CHAR and VARCHAR field definitions depending on Firebird database encoding

In Firebird (4.0), I get the field length with the following query:
SELECT T.RDB$RELATION_NAME, RF.RDB$FIELD_NAME, RF.RDB$NULL_FLAG, F.RDB$FIELD_TYPE, F.RDB$FIELD_LENGTH
FROM RDB$RELATIONS T, RDB$RELATION_FIELDS RF, RDB$FIELDS F
WHERE T.RDB$VIEW_BLR IS NULL AND (T.RDB$SYSTEM_FLAG IS NULL OR T.RDB$SYSTEM_FLAG = 0)
AND (RF.RDB$SYSTEM_FLAG IS NULL OR RF.RDB$SYSTEM_FLAG = 0)
AND (T.RDB$RELATION_NAME = RF.RDB$RELATION_NAME)
AND (RF.RDB$FIELD_SOURCE = F.RDB$FIELD_NAME)
ORDER BY T.RDB$RELATION_NAME, RF.RDB$FIELD_NAME
F.RDB$FIELD_LENGTH is the allocated size in bytes: if I have a VARCHAR(256) with UTF8 encoding, the corresponding F.RDB$FIELD_LENGTH value is 1024, but if the column uses the default NONE character set, the allocated value is 256.
Is there a way to find the actual value X of VARCHAR(X) directly within the query itself, depending on the database encoding?
In my case, I only have two possible encodings, NONE or UTF8, and I can check which one is in use with the following query:
SELECT A.RDB$CHARACTER_SET_NAME containing 'UTF8' FROM RDB$DATABASE A;
If it returns true, the field lengths should be divided by 4; otherwise the value can be used as-is.
Should I build a query based on an IF statement on the above queries? If so, what would it be? Or is there a better solution?

There are two ways:
1. Use RDB$FIELDS.RDB$CHARACTER_LENGTH, which contains the length in characters
2. Join RDB$CHARACTER_SETS and use RDB$CHARACTER_SETS.RDB$BYTES_PER_CHARACTER to calculate based on RDB$FIELDS.RDB$FIELD_LENGTH
An example including both:
SELECT T.RDB$RELATION_NAME, RF.RDB$FIELD_NAME, RF.RDB$NULL_FLAG, F.RDB$FIELD_TYPE,
F.RDB$FIELD_LENGTH,
F.RDB$CHARACTER_LENGTH,
F.RDB$FIELD_LENGTH / RCS.RDB$BYTES_PER_CHARACTER as calculated
FROM RDB$RELATIONS T
inner join RDB$RELATION_FIELDS RF
on RF.RDB$RELATION_NAME = T.RDB$RELATION_NAME
inner join RDB$FIELDS F
on F.RDB$FIELD_NAME = RF.RDB$FIELD_SOURCE
left join RDB$CHARACTER_SETS RCS
on RCS.RDB$CHARACTER_SET_ID = F.RDB$CHARACTER_SET_ID
WHERE T.RDB$VIEW_BLR IS NULL
AND (T.RDB$SYSTEM_FLAG IS NULL OR T.RDB$SYSTEM_FLAG = 0)
AND (RF.RDB$SYSTEM_FLAG IS NULL OR RF.RDB$SYSTEM_FLAG = 0)
ORDER BY T.RDB$RELATION_NAME, RF.RDB$FIELD_NAME
You should not rely on the default character set of the database for this, because that only tells you the character set of newly created columns without an explicit character set. Each column has a character set, which was either specified explicitly or derived from the default character set at the time the column was created.
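If you want to see which character set each column actually uses, a minimal sketch using the same system tables as above (note it does not join RDB$RELATIONS, so it also lists view columns):
SELECT RF.RDB$RELATION_NAME, RF.RDB$FIELD_NAME, RCS.RDB$CHARACTER_SET_NAME
FROM RDB$RELATION_FIELDS RF
INNER JOIN RDB$FIELDS F
ON F.RDB$FIELD_NAME = RF.RDB$FIELD_SOURCE
LEFT JOIN RDB$CHARACTER_SETS RCS
ON RCS.RDB$CHARACTER_SET_ID = F.RDB$CHARACTER_SET_ID
-- RDB$CHARACTER_SET_NAME is NULL for non-character columns
WHERE (RF.RDB$SYSTEM_FLAG IS NULL OR RF.RDB$SYSTEM_FLAG = 0)
ORDER BY RF.RDB$RELATION_NAME, RF.RDB$FIELD_NAME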

Related

Postgresql update column of numeric type with NULL value fails if all value of this column is NULL

I have a database table like this:
idx[PK]    a[numeric]    b[numeric]
1          1             1
2          2             2
3          3             3
4          4             4
...        ...           ...
In pgadmin4, I tried to update this table with some null values, and I noticed the following queries failed:
UPDATE test as t SET
a = e.a,b = e.b
FROM (VALUES (1,NULL,NULL),(2,NULL,NULL),(3,NULL,NULL))
AS e(idx, a, b)
WHERE t.idx = e.idx
UPDATE test as t SET
a = e.a,b = e.b
FROM (VALUES (1,NULL,1),(2,NULL,2),(3,NULL,NULL))
AS e(idx, a, b)
WHERE t.idx = e.idx
The error message is like this:
ERROR: column "a" is of type numeric but expression is of type text
LINE 2: a = e.a,b = e.b
^
HINT: You will need to rewrite or cast the expression.
SQL state: 42804
Character: 43
However, this will be successful:
UPDATE test as t SET
a = e.a,b = e.b
FROM (VALUES (1,NULL,1),(2,2,NULL),(3,NULL,NULL))
AS e(idx, a, b)
WHERE t.idx = e.idx
It seems that if the new values for one of the columns I am updating are all NULL, the query fails. However, as long as at least one of the values for that column is a non-NULL number, the query succeeds. Why is this?
I simplified my real-world case here, as my actual table has millions of rows and more than 10 columns. Using Python and psycopg2, when I tried to update 50,000 rows in one query, the error could still show up even though a column contained at least one NOT NULL value. I guess that is because the system scans only a certain number of rows to decide whether the type is correct, instead of all 50,000 rows.
Therefore, how can I avoid this failure in my real-world situation? Is there a better query to use instead of UPDATE?
Thank you very much!
UPDATE
Per comments from @Marth and @Gordon Linoff, and as I am using psycopg2, I did the following in my code:
from psycopg2.extras import execute_values
sql = """UPDATE test as t SET
a = (e.a::numeric),
b = (e.b::numeric)
FROM (VALUES %s)
AS e(idx, a, b)
WHERE t.idx = e.idx"""
execute_values(cursor, sql, data)
cursor is from the database connection; data is a list of tuples of the form (idx, a, b) containing my values.
This is due to the default behavior of how NULL works in these situations. NULL has an unknown type, which is then treated as whatever type is necessary.
In a VALUES list, Postgres tries to infer the types, treating the individual records as it would in a UNION. But if all the values are NULL, there is no information to go on, and Postgres falls back to text as the universal default.
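A quick way to see this type resolution at work (just a sanity check, not part of the fix) is pg_typeof:
SELECT pg_typeof(v.a) FROM (VALUES (NULL), (NULL)) AS v(a); -- reports text
SELECT pg_typeof(v.a) FROM (VALUES (NULL), (1)) AS v(a);    -- reports integer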
It is also important to understand that this fails with the same error:
UPDATE test t
SET a = ''
WHERE t.idx = 1;
The issue is that Postgres does not convert empty strings to numbers (unlike some other databases).
In any case, this is easily fixed by casting the NULL to an appropriate type:
UPDATE test t
SET a = e.a,b = e.b
FROM (VALUES (1, NULL::numeric, NULL::numeric),
(2, NULL, NULL),
(3, NULL, NULL)
) e(idx, a, b)
WHERE t.idx = e.idx ;
You can be explicit for all occurrences of NULL, but that is not necessary.
Here is a db<>fiddle that illustrates some of this.

Update with ISNULL and operation

The original query (MySQL) looks like this:
UPDATE reponse_question_finale t1, reponse_question_finale t2
SET t1.nb_question_repondu = (9 - (ISNULL(t1.valeur_question_4) + ISNULL(t1.valeur_question_6) + ISNULL(t1.valeur_question_7) + ISNULL(t1.valeur_question_9)))
WHERE t1.APPLICATION = t2.APPLICATION;
I know you cannot update two tables in a single query, so I tried this:
UPDATE reponse_question_finale t1
SET nb_question_repondu = (9-(COALESCE(t1.valeur_question_4,'')::int+COALESCE(t1.valeur_question_6,'')::int+COALESCE(t1.valeur_question_7)::int+COALESCE(t1.valeur_question_9,'')::int))
WHERE t1.APPLICATION = t1.APPLICATION;
But this query gave me an error: invalid input syntax for integer: ""
I saw that the Postgres equivalent of MySQL's ISNULL() is COALESCE(), so I think I'm on the right track here.
I also know you cannot add varchar to varchar, so I tried to cast to integer. I'm not sure whether I placed the parentheses correctly, and given the error, maybe I cannot cast to int with COALESCE.
Last thing: I can certainly do a correlated sub-select to update my two tables, but I'm a little lost at this point.
The output must be an integer matching the number of questions answered in a backup survey.
Any thoughts?
Thanks.
coalesce() returns the first non-null value from the supplied list. So if the column value is null, the expression COALESCE(t1.valeur_question_4,'') returns an empty string, and that's why you get the error.
But it seems you want something completely different: you want to check whether the column is null (or empty) and subtract 1 if it is, to count the number of non-null columns.
To get 1 if a value is null (or empty) and 0 if it is not, you can use:
(nullif(valeur_question_4, '') is null)::int
nullif() returns null if the first value equals the second. The IS NULL condition returns a boolean (something MySQL doesn't have), and that can be cast to an integer (false is cast to 0 and true to 1).
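As a quick check (the literals here are only illustrative):
SELECT (nullif('', '') IS NULL)::int  AS empty_value,     -- returns 1
       (nullif('x', '') IS NULL)::int AS non_empty_value; -- returns 0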
So the whole expression should be:
nb_question_repondu = 9 - (
(nullif(t1.valeur_question_4,'') is null)::int
+ (nullif(t1.valeur_question_6,'') is null)::int
+ (nullif(t1.valeur_question_7,'') is null)::int
+ (nullif(t1.valeur_question_9,'') is null)::int
)
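Plugged into an UPDATE, this would look roughly like the following (a sketch; the self-join condition t1.APPLICATION = t2.APPLICATION from the MySQL version is dropped here since, as in the second option below, every referenced column is on the row being updated):
update reponse_question_finale t1
set nb_question_repondu = 9 - (
      (nullif(t1.valeur_question_4, '') is null)::int
    + (nullif(t1.valeur_question_6, '') is null)::int
    + (nullif(t1.valeur_question_7, '') is null)::int
    + (nullif(t1.valeur_question_9, '') is null)::int
);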
Another option is to unpivot the columns and do a select on them in a sub-select:
update reponse_question_finale
set nb_question_repondu = (select count(*)
from (
values
(valeur_question_4),
(valeur_question_6),
(valeur_question_7),
(valeur_question_9)
) as t(q)
where nullif(trim(q),'') is not null);
Adding more columns to be considered is quite easy then, as you just need to add a single line to the values() clause.

What does a column assignment using an aggregate in the columns area of a select do?

I'm trying to decipher the code of another programmer who is long gone, and I came across a SELECT statement in a stored procedure that looks like this (simplified) example:
SELECT #Table2.Col1, #Table2.Col2, #Table2.Col3,
       MysteryColumn = CASE WHEN y.Col3 IS NOT NULL THEN #Table2.MysteryColumn - y.Col3 ELSE #Table2.MysteryColumn END
INTO #Table1
FROM #Table2
LEFT OUTER JOIN (
SELECT Table3.Col1, Table3.Col2, Col3 = SUM(Table3.Col3)
FROM Table3
INNER JOIN #Table4 ON #Table4.Col1 = Table3.Col1 AND #Table4.Col2 = Table3.Col2
GROUP BY Table3.Col1, Table3.Col2
) AS y ON #Table2.Col1 = y.Col1 AND #Table2.Col2 = y.Col2
WHERE #Table2.Col2 < @EnteredValue
My question: what does the fourth column of the primary selection do? Does it produce a boolean value checking whether the values are equal? Does it set #Table2.MysteryColumn equal to some value and then insert it into #Table1? Or does it just update #Table2.MysteryColumn and not output a value into #Table1?
This same thing seems to happen inside of the sub-query on the third column, and I am equally at a loss as to what that does as well.
MysteryColumn = gives the expression a name, also called a column alias. The fact that a column in #Table2 has the same name is beside the point.
Since the statement uses the INTO syntax, this alias also becomes the column's name in the resulting temporary table. See the SELECT clause documentation (note the | column_alias = expression form) and the INTO clause.
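A minimal T-SQL sketch (using a throwaway #Demo temp table, not anything from the original procedure) showing that the alias-first form names the column exactly like AS does, and that SELECT ... INTO carries that name into the new table:
SELECT MysteryColumn = 1 + 2   -- alias-first form, as in the original query
INTO #Demo;                    -- #Demo now has a single column named MysteryColumn

SELECT 1 + 2 AS MysteryColumn; -- equivalent standard form

SELECT MysteryColumn FROM #Demo;
DROP TABLE #Demo;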

Prepare dynamic case statement using PostgreSQL 9.3

I have the following CASE statement that I want to make dynamic, as shown below:
Example:
I have the case statement:
case
when cola between '2001-01-01' and '2001-01-05' then 'G1'
when cola between '2001-01-10' and '2001-01-15' then 'G2'
when cola between '2001-01-20' and '2001-01-25' then 'G3'
when cola between '2001-02-01' and '2001-02-05' then 'G4'
when cola between '2001-02-10' and '2001-02-15' then 'G5'
else ''
end
Note: I want to build the CASE statement dynamically because the dates and names are passed as parameters and may change.
Declare
dates varchar = '2001-01-01to2001-01-05,2001-01-10to2001-01-15,
2001-01-20to2001-01-25,2001-02-01to2001-02-05,
2001-02-10to2001-02-15';
names varchar = 'G1,G2,G3,G4,G5';
The values in the variables may change depending on the requirements, so the CASE statement should be built dynamically, without using a loop.
You may not need any function for this, just join to a mapping data-set:
with cola_map(low, high, value) as (
values(date '2001-01-01', date '2001-01-05', 'G1'),
('2001-01-10', '2001-01-15', 'G2'),
('2001-01-20', '2001-01-25', 'G3'),
('2001-02-01', '2001-02-05', 'G4'),
('2001-02-10', '2001-02-15', 'G5')
-- you can include as many rows as you want
)
select table_name.*,
coalesce(cola_map.value, '') -- else branch from case expression
from table_name
left join cola_map on table_name.cola between cola_map.low and cola_map.high
If your date ranges could collide, you can use DISTINCT ON or GROUP BY to avoid row duplication.
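For example, if the ranges could overlap, something along these lines would keep a single match per row (it assumes table_name has a primary key column called id, which is my assumption, not something from the question):
with cola_map(low, high, value) as (
  values (date '2001-01-01', date '2001-01-05', 'G1'),
         ('2001-01-10', '2001-01-15', 'G2')
  -- add the remaining ranges as above
)
select distinct on (table_name.id)
       table_name.*,
       coalesce(cola_map.value, '')
from table_name
left join cola_map
  on table_name.cola between cola_map.low and cola_map.high
order by table_name.id, cola_map.low;  -- keeps the mapping with the lowest "low" per row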
Note: you can use a simple sub-select too; I used a CTE because it's more readable.
Edit: passing these data (as a single parameter) can be achieved by passing a multi-dimensional array (or an array of row-values, but that requires you to have a distinct, predefined composite type).
Passing arrays as parameters can depend on the actual client (& driver) you use, but in general, you can use the array's input representation:
-- sql
with cola_map(low, high, value) as (
select d[1]::date, d[2]::date, d[3]
from unnest(?::text[][]) d
)
select table_name.*,
coalesce(cola_map.value, '') -- else branch from case expression
from table_name
left join cola_map on table_name.cola between cola_map.low and cola_map.high
// client pseudo code
query = db.prepare(sql);
query.bind(1, "{{2001-01-10,2001-01-15,G2},{2001-01-20,2001-01-25,G3}}");
query.execute();
Passing each chunk of data separately is also possible with some clients (or with some abstractions), but this highly depends on the driver/ORM/etc. you use.

Postgres: buckets always filled from left in crosstab query

My query looks like this:
SELECT mthreport.*
FROM crosstab
('SELECT
to_char(ipstimestamp, ''mon DD HH24h'') As row_name,
varid::text || log.varid || ''_'' || ips.objectname::text As bucket,
COUNT(*)::integer As bucketvalue
FROM loggingdb_ips_boolean As log
INNER JOIN IpsObjects As ips
ON log.Varid=ips.ObjectId
WHERE ((log.varid = 37551)
OR (log.varid = 27087)
OR (log.varid = 50876)
OR (log.varid = 45096)
OR (log.varid = 54708)
OR (log.varid = 47475)
OR (log.varid = 54606)
OR (log.varid = 25528)
OR (log.varid = 54729))
GROUP BY to_char(ipstimestamp, ''yyyy MM DD HH24h''), row_name, objectid, bucket
ORDER BY to_char(ipstimestamp, ''yyyy MM DD HH24h''), row_name, objectid, bucket' )
As mthreport(item_name text, varid_37551 integer,
varid_27087 integer ,
varid_50876 integer ,
varid_45096 integer ,
varid_54708 integer ,
varid_47475 integer ,
varid_54606 integer ,
varid_25528 integer ,
varid_54729 integer ,
varid_29469 integer)
The query can be tested against a test table with this connection string:
"host=bellariastrasse.com port=5432 dbname=IpsLogging user=guest password=guest"
The query is syntactically correct and runs fine. My problem is that the COUNT(*) values always fill the leftmost column. However, in many instances the left columns should contain a zero or a NULL, and only the 2nd (or n-th) column should be filled. My brain is melting and I cannot figure out what is wrong!
The solution for your problem is to use the crosstab() variant with two parameters.
The second parameter (another query string) produces the list of output columns, so that NULL values in the data query (the first parameter) are assigned correctly.
Check the manual for the tablefunc extension, and in particular crosstab(text, text):
The main limitation of the single-parameter form of crosstab is that
it treats all values in a group alike, inserting each value into the
first available column. If you want the value columns to correspond to
specific categories of data, and some groups might not have data for
some of the categories, that doesn't work well. The two-parameter form
of crosstab handles this case by providing an explicit list of the
categories corresponding to the output columns.
Emphasis mine. I posted a couple of related answers recently here or here or here.
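Applied to the query above, the two-parameter form could look roughly like this (a sketch: the bucket expression is simplified to 'varid_' || log.varid so that it lines up with the output column names, dollar-quoting replaces the doubled single quotes for readability, and only the varids from the WHERE clause are listed):
SELECT mthreport.*
FROM crosstab(
    $$SELECT to_char(ipstimestamp, 'mon DD HH24h') AS row_name,
             'varid_' || log.varid AS bucket,
             COUNT(*)::integer AS bucketvalue
      FROM loggingdb_ips_boolean AS log
      INNER JOIN IpsObjects AS ips ON log.Varid = ips.ObjectId
      WHERE log.varid IN (37551, 27087, 50876, 45096, 54708, 47475, 54606, 25528, 54729)
      GROUP BY 1, 2
      ORDER BY 1, 2$$,
    -- second parameter: the explicit category list, one entry per output column, in the same order
    $$VALUES ('varid_37551'), ('varid_27087'), ('varid_50876'),
             ('varid_45096'), ('varid_54708'), ('varid_47475'),
             ('varid_54606'), ('varid_25528'), ('varid_54729')$$
) AS mthreport(item_name text,
               varid_37551 integer, varid_27087 integer, varid_50876 integer,
               varid_45096 integer, varid_54708 integer, varid_47475 integer,
               varid_54606 integer, varid_25528 integer, varid_54729 integer);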