Check if a set of columns has scientific notation values - postgresql

I need to check whether 4 columns (varchar) have at least 1 row with scientific notation values (E+).
I'm doing this for a single column:
declare
    _ean int;
    t_query text;
begin
    -- note the doubled single quotes inside the dynamic SQL string
    t_query := 'select count(*) from mytable
                where trim_to_null(myfield) is not null
                  and trim_to_null(myfield) ilike ''%E+%''';
    execute t_query into _ean;
    IF _ean != 0 THEN
        RAISE NOTICE 'EAN has a scientific notation value, please review the file';
        RETURN 'Error: the file contains ' || _ean || ' EAN values in scientific notation';
    END IF;
    RETURN null;
It works fine for this one column, but now I also need to check 4 more columns and report in which column the scientific notation was found. I could do this with multiple "IF" blocks, one per column, but I bet there's a better way to do it in one statement and return the name(s) of the column(s) that contained scientific notation.

As stated in the comments, you don't need dynamic SQL for that.
Also, storing numbers as strings is really bad practice – your queries get more complicated, and somebody could store non-numbers as well.
All that said, I think your query ignores the fact that scientific notation could also look like 1e-7 or 1e4.
So I think the query should contain
WHERE trim_to_null(myfield) ILIKE '%E%'
or, if you want to check the number for correctness, something like
WHERE trim_to_null(myfield) ~ '^[+-]?[0-9]+(\.[0-9]*)?[eE][+-]?[0-9]+$'
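To see what that pattern accepts, here is a quick check against a few sample literals (the values are mine, purely for illustration):
SELECT v,
       v ~ '^[+-]?[0-9]+(\.[0-9]*)?[eE][+-]?[0-9]+$' AS is_scientific
FROM (VALUES ('1e-7'), ('1.5E+3'), ('-2E4'), ('123'), ('abc')) AS t(v);
-- 1e-7, 1.5E+3 and -2E4 match; plain '123' and 'abc' do not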
But to your original question:
You could run
SELECT id, col1_is_ean, col2_is_ean, col3_is_ean
FROM (SELECT id,
             col1 ILIKE '%E%' AS col1_is_ean,
             col2 ILIKE '%E%' AS col2_is_ean,
             col3 ILIKE '%E%' AS col3_is_ean
      FROM mytable) AS q
WHERE col1_is_ean OR col2_is_ean OR col3_is_ean;
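If you also want the offending column names in the result, one way (a sketch in the same spirit; array_remove requires PostgreSQL 9.3+) is to collect them per row:
SELECT id,
       array_remove(ARRAY[
           CASE WHEN col1 ILIKE '%E%' THEN 'col1' END,
           CASE WHEN col2 ILIKE '%E%' THEN 'col2' END,
           CASE WHEN col3 ILIKE '%E%' THEN 'col3' END
       ], NULL) AS ean_columns
FROM mytable
WHERE col1 ILIKE '%E%' OR col2 ILIKE '%E%' OR col3 ILIKE '%E%';
A CASE without an ELSE yields NULL, and array_remove then drops the NULLs, leaving only the names of the columns that matched.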

Casting rows to arrays in PostgreSQL

I need to query a table as in
SELECT *
FROM table_schema.table_name
except that each row needs to be a TEXT[], with array values corresponding to the column values cast to TEXT, coming in the same order as in SELECT *. So, assuming the table has columns a, b and c, I need the result to look like
SELECT ARRAY[a::TEXT, b::TEXT, c::TEXT]
FROM table_schema.table_name
except that it shouldn't explicitly list the columns by name. Ideally it would look like
SELECT as_text_array(a)
FROM table_schema.table_name AS a
The best I came up with looks ugly and relies on the "hstore" extension:
WITH columnz AS ( -- get ordered column name array
    SELECT array_agg(attname::TEXT ORDER BY attnum) AS column_name_array
    FROM pg_attribute
    WHERE attrelid = 'table_schema.table_name'::regclass AND attnum > 0 AND NOT attisdropped
)
SELECT hstore(a)->(SELECT column_name_array FROM columnz)
FROM table_schema.table_name AS a
I have a feeling there must be a simpler way to achieve this.
UPDATE 1
Another query that achieves the same result, but is arguably as ugly and inefficient as the first one, was inspired by the answer by @bspates. It may be even less efficient, but it doesn't rely on extensions:
SELECT r.text_array
FROM table_schema.table_name AS a
INNER JOIN LATERAL ( -- parse the ROW::TEXT representation of a row
    SELECT array_agg(COALESCE(replace(val[1], '""', '"'), NULLIF(val[2], ''))) AS text_array
    FROM regexp_matches(a::text, -- parse double-quoted and simple values separated by commas
                        '(?<=\A\(|,) (?: "( (?:[^"]|"")* )" | ([^,"]*) ) (?=,|\)\Z)', 'xg') AS t(val)
) AS r ON TRUE
It is still far from ideal
UPDATE 2
I tested all 3 options existing at the moment:
1. Using JSON. It doesn't rely on any extensions, it is short to write, easy to understand, and the speed is OK.
2. Using hstore. This alternative is the fastest (>10 times faster than the JSON approach on a 100K dataset) but requires an extension. hstore is in general a very handy extension to have, though.
3. Using a regex to parse the TEXT representation of a ROW. This option is really slow.
A somewhat ugly hack is to convert the row to a JSON value, then unnest the values and aggregate it back to an array:
select array(select (json_each_text(to_json(t))).value) as row_value
from some_table t
Which is to some extent the same as your hstore hack.
If the order of the columns is important, json together with WITH ORDINALITY can be used to preserve it:
select array(select val
             from json_each_text(to_json(t)) with ordinality as t(k,val,idx)
             order by idx)
from the_table t
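A quick sanity check with a throwaway table (the table and its values are my own illustration):
CREATE TEMP TABLE the_table (a int, b text, c date);
INSERT INTO the_table VALUES (1, 'x', '2020-01-01');
SELECT array(SELECT val
             FROM json_each_text(to_json(tt)) WITH ORDINALITY AS j(k, val, idx)
             ORDER BY idx) AS row_value
FROM the_table tt;
-- row_value => {1,x,2020-01-01}, matching the column order a, b, c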
The easiest (read: hacky-est) way I can think of is to convert to a string first and then parse that string into an array. Like so:
SELECT string_to_array(btrim(table_name::text, '()'), ',') FROM table_name;  -- btrim strips the outer parentheses of the row's text form
BUT depending on the size and type of the data in the table, this could perform very badly.

How to add a leading zero when the length of the column is unknown?

How can I add a leading zero to a varchar column when I don't know the length of its values? If the column is not null, then I should add a leading zero.
Examples:
345 - output should be 0345
4567 - output should be 04567
I tried:
SELECT lpad(column1,WHAT TO SPECIFY HERE?, '0')
from table_name;
I will run an update query after I get this.
You may be overthinking this. Use plain concatenation:
SELECT '0' || column1 AS padded_col1 FROM table_name;
If the column is NULL, nothing happens: concatenating anything to NULL returns NULL.
In particular, don't use concat(). You would get '0' for NULL columns, which you do not want.
If you also have empty strings (''), you may need to do more, depending on what you want.
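For example, a minimal sketch of one possible treatment (whether '' should become '0' is an assumption about the desired behavior):
SELECT CASE
         WHEN column1 IS NULL THEN NULL     -- NULL stays NULL, as above
         WHEN column1 = ''    THEN '0'      -- assumption: pad empty strings too
         ELSE '0' || column1
       END AS padded_col1
FROM table_name;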
And since you mentioned your plan to update the table afterwards: consider not doing that. You would be adding noise that can be added for display with the simple expression above. A VIEW might come in handy for this.
If all your varchar values are in fact valid numbers, use an appropriate numeric data type instead and format for display with the same expression as above. The concatenation automatically produces a text result.
If circumstances should force your hand and you need to update anyway, consider this:
UPDATE table_name
SET column1 = '0' || column1
WHERE column1 IS DISTINCT FROM '0' || column1;
The added WHERE clause avoids empty updates. Compare:
How do I (or can I) SELECT DISTINCT on multiple columns?
Try concat instead?
SELECT concat(0::text, column1) FROM table_name;
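To illustrate the NULL behavior difference pointed out above:
SELECT '0' || NULL       AS with_operator,  -- NULL
       concat('0', NULL) AS with_concat;    -- '0'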

SELECT * except nth column

Is it possible to SELECT * but without the n-th column, for example the 2nd?
I have views with 4 and 5 columns (each has different column names, except for the 2nd column), but I do not want to show the second column.
SELECT * -- how to prevent 2nd column to be selected?
FROM view4
WHERE col2 = 'foo';
SELECT * -- how to prevent 2nd column to be selected?
FROM view5
WHERE col2 = 'foo';
without having to list all the columns (since they all have different column names).
The real answer is that you practically just can not (see LINK). This has been a requested feature for decades, and the developers refuse to implement it. The best practice is to mention the column names instead of *. Using * is in itself a source of performance penalties anyway.
However, in case you really need it, you can select the column names directly from the schema -> check LINK. Or, as in the example below, use two PostgreSQL built-in functions: ARRAY and ARRAY_TO_STRING. The first one transforms a query result into an array, and the second one concatenates array components into a string. The list separator can be specified with the second parameter of the ARRAY_TO_STRING function:
SELECT 'SELECT ' ||
       ARRAY_TO_STRING(ARRAY(SELECT COLUMN_NAME::VARCHAR(50)
                             FROM INFORMATION_SCHEMA.COLUMNS
                             WHERE TABLE_NAME = 'view4'
                               AND COLUMN_NAME NOT IN ('col2')
                             ORDER BY ORDINAL_POSITION
                       ), ', ') || ' FROM view4';
where strings are concatenated with the standard operator ||. The COLUMN_NAME data type is information_schema.sql_identifier; this data type requires explicit conversion to the CHAR/VARCHAR data type.
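For instance, if view4 has columns col1 through col4 (as in the question), the statement above would generate something like:
SELECT col1, col3, col4 FROM view4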
But that is not recommended either. What if you add more columns in the long run, but they are not necessarily required for that query?
You would start pulling more columns than you need.
What if the SELECT is part of an INSERT, as in
Insert into tableA (col1, col2, col3.. coln) Select everything but 2 columns FROM tableB
The column match will be wrong and your insert will fail.
It's possible, but I still recommend writing out every needed column for every SELECT, even if nearly every column is required.
Conclusion:
Since you are already using a VIEW, the simplest and most reliable way is to alter your view and mention the column names, excluding your 2nd column.
-- my table with 2 rows and 4 columns
DROP TABLE IF EXISTS t_target_table;
CREATE TEMP TABLE t_target_table AS
SELECT 1 AS id, 1 AS v1, 2 AS v2, 3 AS v3, 4 AS v4
UNION ALL
SELECT 2 AS id, 5 AS v1, -6 AS v2, 7 AS v3, 8 AS v4
;
-- my computation and stuff that I have to measure; any logic could be done here!
DROP TABLE IF EXISTS t_processing;
CREATE TEMP TABLE t_processing AS
SELECT *,
       md5(t_target_table::text) AS row_hash,
       CASE WHEN v2 < 0 THEN true ELSE false END AS has_negative_value_in_v2
FROM t_target_table
;
-- now we want to insert that stuff into the t_target_table
-- this is standard
-- INSERT INTO t_target_table (id, v1, v2, v3, v4) SELECT id, v1, v2, v3, v4 FROM t_processing;
-- this is advanced ;-)
INSERT INTO t_target_table
-- the following selects only the columns that are present in the target table, and ignores the others
SELECT r.*
FROM (SELECT to_jsonb(t_processing) AS d FROM t_processing) t
JOIN LATERAL jsonb_populate_record(NULL::t_target_table, d) AS r ON TRUE
;
-- WARNING: you need an object that represents the target structure; excluding just a single column is not possible
For columns col1, col2, col3 and col4 you will need to request
SELECT col1, col3, col4 FROM...
to omit the second column. Requesting
SELECT *
will get you all the columns

How to return gaps in numbers stored as char with leading zeros?

I have a table with a char(5) field for tracking Bin Numbers. The numbers are stored with leading zeros. The numbers go from 00200 through 90000. There are a lot of gaps in the numbers already in use and I need to be able to query them out so the user knows which numbers are available to use.
Assume you have a table of valid bin numbers.
Table: bins
bin_num
--
00200
00201
00202
...
90000
Assume your table is named "inventory". The bin numbers returned by this query are the ones that aren't in "inventory".
select bins.bin_num
from bins
left join inventory t2
on bins.bin_num = t2.bin_num
where t2.bin_num is null
order by bins.bin_num
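If you don't have a bins table and happen to be on PostgreSQL (like most of this page), the valid range can be generated on the fly instead of maintained by hand. A sketch, assuming the same "inventory" table:
select to_char(n, 'FM00000') as bin_num
from generate_series(200, 90000) as n
where not exists (select 1
                  from inventory i
                  where i.bin_num = to_char(n, 'FM00000'))
order by bin_num;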
If your version of SQL Server supports analytic functions (and, solely for convenience, common table expressions), you can find most of the gaps like this.
with bin_and_next_bin as (
    select bin, lead(bin) over (order by bin) as next_bin
    from inventory
)
select bin
from bin_and_next_bin
where cast(bin as integer) <> cast(next_bin as integer) - 1
Analytic functions don't require a table of valid bin numbers, although you can make a really strong case that you ought to have such a table in the first place. If you're working in an environment where you don't have such a table, and you're not allowed to build such a table, a common table expression can save the day. (It doesn't show "missing" bin numbers before the first used bin number, though, as it's written here.)
One other disadvantage of this statement is that the WHERE clause isn't sargable; it can't use an index. Yet another is that it assumes bin numbers can be cast to integer. The table-based approach doesn't assume anything about the value or data type of the bin number; it works just as well with mixed alphanumerics as it does with integers or anything else.
I was able to get exactly what I needed by reading this article by Pinal Dave.
I created a stored procedure that returned the gaps in the bin number sequence starting from the first bin number to the last. In my application I group the bin numbers by Shop (Vehicles would be 1000 through 2000, Buildings 2001 through 3000, etc).
ALTER PROCEDURE [dbo].[spSelectLOG_BinsAvailable]
    (@Shop varchar(9))
AS
BEGIN
    DECLARE @start AS varchar(5) = (SELECT b.Start FROM BinShopCodeBlocks b WHERE b.Shop = @Shop)
    DECLARE @finish AS varchar(5) = (SELECT b.Finish FROM BinShopCodeBlocks b WHERE b.Shop = @Shop)
    SET NOCOUNT ON;
    WITH CTE AS
    (
        SELECT CAST(@start AS int) AS Start,
               CAST(@finish AS int) AS Finish
        UNION ALL
        SELECT Start + 1,
               Finish
        FROM CTE
        WHERE Start < Finish
    )
    SELECT RIGHT('00000' + CAST(Start AS VARCHAR(5)), 5)
    FROM CTE
    WHERE NOT EXISTS
          (SELECT *
           FROM BinMaster b
           WHERE b.BinNumber = RIGHT('00000' + CAST(Start AS VARCHAR(5)), 5))
    OPTION (MAXRECURSION 0);
END
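A hypothetical invocation (the shop code here is made up for illustration):
EXEC [dbo].[spSelectLOG_BinsAvailable] @Shop = 'VEHICLES';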

how to do 'any(::text[]) ilike ::text'

here is table structure
table1
pk int, email character varying(100)[]
data
1, {'mr_a@gmail.com', 'mr_b@yahoo.com', 'mr_c@postgre.com'}
What I'm trying to achieve is to find any 'gmail' in the records.
query
select * from table1 where any(email) ilike '%gmail%';
but ANY() can only appear on the right-hand side of the operator, and unnest() might slow down performance. Anyone have any idea?
Edit
Actually, I was a bit confused when I first posted. What I'm trying to achieve goes through any(array[]).
This is my actual structure:
pk int,
code1 character varying(100),
code2 character varying(100),
code3 character varying(100), ...
My first approach was
select * from table1 where code1 ilike '%code%' or code2 ilike '%code%' or...
then I tried
select * from table1 where any(array[code1, code2, ...]) ilike '%code%'
which does not work.
Create an operator that implements ILIKE "backwards", e.g.:
CREATE FUNCTION backward_texticlike(text, text) RETURNS boolean
STRICT IMMUTABLE LANGUAGE SQL
AS $$ SELECT texticlike($2, $1) $$;
CREATE OPERATOR !!!~~* (
PROCEDURE = backward_texticlike,
LEFTARG = text,
RIGHTARG = text,
COMMUTATOR = ~~*
);
(Note that ILIKE internally corresponds to the operator ~~*. Pick your own name for the reverse.)
Then you can run
SELECT * FROM table1 WHERE '%code%' !!!~~* ANY(ARRAY[code1, code2, ...]);
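Applied to the original array column from the question, the same operator works directly:
SELECT * FROM table1 WHERE '%gmail%' !!!~~* ANY(email);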
Store email addresses in a normalized table structure. Then you can avoid the expense of unnest, have "proper" database design, and take full advantage of indexing. If you're looking to do full-text-style queries, you should be storing your email addresses in a table and then using a tsvector data type so you can perform full text queries AND use indexes. ILIKE '%whatever%' is going to result in a full table scan, since the planner can't take advantage of any index. With your current design and a sufficient number of records, unnest will be the least of your worries.
Update: Even with the updates to the question, using a normalized codes table will cause you the least amount of headache and result in optimal scans. Any time you find yourself creating numbered columns, it's a good indication that you might want to normalize. That being said, you can create a computed text column to use as a search-words column. In your case you could create a search_words column that is populated on insert and update by a trigger. Then you can create a tsvector to build full text queries on search_words.
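As a sketch of the normalized design (the table and column names here are illustrative assumptions, not from the original post):
CREATE TABLE table1_codes (
    pk   int NOT NULL,           -- references table1.pk
    code varchar(100) NOT NULL
);

SELECT t.*
FROM table1 t
WHERE EXISTS (SELECT 1
              FROM table1_codes c
              WHERE c.pk = t.pk
                AND c.code ILIKE '%code%');
A trigram index (via the pg_trgm extension) on table1_codes.code can then speed up even the leading-wildcard ILIKE search.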