How to filter out data with special character in vectorwise - special-characters

There is a column named Prod_code in my product table.
I need to select only valid prod_code from product table and load it into another table.
valid prod_code are the codes which don't have any special characters in it.
VALID prod_code: WER1234, ASD1345
INVALID prod_code: ABC$123,LPS????,$$$ (which I need to check and filter out).
How can I achieve this?
Src list of prod code
WER1234
ASD1345
ABC$123
LPS????
$$$
target list of prod code
WER1234
ASD1345

This is pretty hideous but it does the job. It uses the LIKE predicate to reject unacceptable strings. You can probably also use the SIMILAR TO predicate to specify acceptable strings but I would guess that LIKE is faster.
select prod_code from products
where prod_code not like '%\['+x'00'+'-'+x'2f'+','+
x'3a'+'-'+x'40'+','+
x'5d'+'-'+x'7f'+'\]%' escape '\'
and prod_code not like '%\%'
and prod_code not like '%[%'

Related

Postgres fuzzy array intersection

I'm using Postgresql 13 and my problem was easily solved with #> operator like this:
select id from documents where keywords #> '{"winter", "report", "2020"}';
meaning that keywords array should contain all these elements. Also I've created a GIN index on this column.
Is it possible to achieve similar behavior even if I provide my request like '{"re", "202", "w"}' ? I heard that ngrams have semantics like this, but "intersection" capabilities of arrays are crucial for me.
In your example, the matches are all prefixes. Is that the general rule here? If so, you would probably went to use the match feature of full text search, not trigrams. It would require you reformat your data, or at least your query.
select * from
(values (to_tsvector('simple','winter report 2020'))) f(x)
where x## 're:* & 202:* & w:*'::tsquery;
If the strings can contain punctuation which you want preserved, you would need to take pains to properly format them into a quoted tsvector yourself rather than just letting to_tsvector deal with it. Using 'simple' config gets rid of the stemming and stop word removal features, which would interfere with what you want to do.

Using arrays with pg-promise

I'm using pg-promise and am not understanding how to run this query. The first query works, but I would like to use pg-promise's safe character escaping, and then I try the second query it doesn't work.
Works:
db.any(`SELECT title FROM books WHERE id = ANY ('{${ids}}') ORDER BY id`)
Doesn't work
db.any(`SELECT title FROM books WHERE id = ANY ($1) ORDER BY id`, ids)
The example has 2 problems. First, it goes against what the documentation tells you:
IMPORTANT: Never use the reserved ${} syntax inside ES6 template strings, as those have no knowledge of how to format values for PostgreSQL. Inside ES6 template strings you should only use one of the 4 alternatives - $(), $<>, $[] or $//.
Manual query formatting, like in your first example, is a very bad practice, resulting in bad things, ranging from broken queries to SQL injection.
And the second issue is that after switching to the correct SQL formatting, you should use the CSV Filter to properly format the list of values:
db.any(`SELECT title FROM books WHERE id IN ($/ids:csv/) ORDER BY id`, {ids})
or via an index variable:
db.any(`SELECT title FROM books WHERE id IN ($1:csv) ORDER BY id`, [ids])
Note that I also changed from ANY to IN operand, as we are providing a list of open values here.
And you can use filter :list interchangeably, whichever you like.

Efficient way to find ordered string's exact, prefix and postfix match in PostgreSQL

Given a table name table and a string column named column, I want to search for the word word in that column in the following way: exact matches be on top, followed by prefix matches and finally postfix matches.
Currently I got the following solutions:
Solution 1:
select column
from (select column,
case
when column like 'word' then 1
when column like 'word%' then 2
when column like '%word' then 3
end as rank
from table) as ranked
where rank is not null
order by rank;
Solution 2:
select column
from table
where column like 'word'
or column like 'word%'
or column like '%word'
order by case
when column like 'word' then 1
when column like 'word%' then 2
when column like '%word' then 3
end;
Now my question is which one of the two solutions are more efficient or better yet, is there a solution better than both of them?
Your 2nd solution looks simpler for the planner to optimize, but it is possible that the first one gets the same plan as well.
For the Where, is not needed as it is covered by ; it might confuse the DB to do 2 checks instead of one.
But the biggest problem is the third one as this has no way to be optimized by an index.
So either way, PostgreSQL is going to scan your full table and manually extract the matches. This is going to be slow for 20,000 rows or more.
I recommend you to explore fuzzy string matching and full text search; looks like that is what you're trying to emulate.
Even if you don't want the full power of FTS or fuzzy string matching, you definitely should add the extension "pgtrgm", as it will enable you to add a GIN index on the column that will speedup LIKE '%word' searches.
https://www.postgresql.org/docs/current/pgtrgm.html
And seriously, have a look to FTS. It does provide ranking. If your requirements are strict to what you described, you can still perform the FTS query to "prefilter" and then apply this logic afterwards.
There are tons of introduction articles to PostgreSQL FTS, here's one:
https://www.compose.com/articles/mastering-postgresql-tools-full-text-search-and-phrase-search/
And even I wrote a post recently when I added FTS search to my site:
https://deavid.wordpress.com/2019/05/28/sedice-adding-fts-with-postgresql-was-really-easy/

How do I match variables from FreeRADIUS in PostgreSQL with the LIKE operator?

In a PostgreSQL query, executed by FreeRADIUS, I want to do something similar to (the table names and values are just examples):
SELECT name
FROM users
WHERE city LIKE '%blahblah%';
but there is a catch: the blahblah value is contained in a FreeRADIUS variable, represented with '%{variable-name}'. It expands to 'blahblah'.
Now my question is: How do I match the %{variable-name} variable to the value stored in the table using the LIKE operator?
I tried using
SELECT name
FROM users
WHERE city LIKE '%%{variable-name}%';
but it doesn't expand correctly like that and is obviously incorrect.
The final query I want to achieve is
...
WHERE city LIKE '%blahblah%';
so it matches the longer string containing 'blahblah' stored in the table, but I want the variable to expand dynamically into the correct query. Is there a way to do it?
Thanks!
Wild guess:
Assuming that FreeRADIUS is doing dumb substitution across the entire SQL string, with no attempt to parse literals, etc, before sending the SQL to PostgreSQL then you could use:
SELECT name
FROM users
WHERE city LIKE '%'||'%{variable-name}'||'%';
Edit: To avoid the warnings caused by FreeRADIUS not parsing cleverly enough, hide the %s as hex chars:
WHERE city LIKE E'\x25%{variable-name}\x25';
Note the leading E for the string marking it as a string subject to escape processing.
SELECT name
FROM users
WHERE city LIKE '%%'||'%{variable-name}'||'%%';
Is slightly cleaner. %% is FreeRADIUS' escape sequence for percents.

Parameterized SQL Columns?

I have some code which utilizes parameterized queries to prevent against injection, but I also need to be able to dynamically construct the query regardless of the structure of the table. What is the proper way to do this?
Here's an example, say I have a table with columns Name, Address, Telephone. I have a web page where I run Show Columns and populate a select drop-down with them as options.
Next, I have a textbox called Search. This textbox is used as the parameter.
Currently my code looks something like this:
result = pquery('SELECT * FROM contacts WHERE `' + escape(column) + '`=?', search);
I get an icky feeling from it though. The reason I'm using parameterized queries is to avoid using escape. Also, escape is likely not designed for escaping column names.
How can I make sure this works the way I intend?
Edit:
The reason I require dynamic queries is that the schema is user-configurable, and I will not be around to fix anything hard-coded.
Instead of passing the column names, just pass an identifier that you code will translate to a column name using a hardcoded table. This means you don't need to worry about malicious data being passed, since all the data is either translated legally, or is known to be invalid. Psudoish code:
#columns = qw/Name Address Telephone/;
if ($columns[$param]) {
$query = "select * from contacts where $columns[$param] = ?";
} else {
die "Invalid column!";
}
run_sql($query, $search);
The trick is to be confident in your escaping and validating routines. I use my own SQL escape function that is overloaded for literals of different types. Nowhere do I insert expressions (as opposed to quoted literal values) directly from user input.
Still, it can be done, I recommend a separate — and strict — function for validating the column name. Allow it to accept only a single identifier, something like
/^\w[\w\d_]*$/
You'll have to rely on assumptions you can make about your own column names.
I use ADO.NET and the use of SQL Commands and SQLParameters to those commands which take care of the Escape problem. So if you are in a Microsoft-tool environment as well, I can say that I use this very sucesfully to build dynamic SQL and yet protect my parameters
best of luck
Make the column based on the results of another query to a table that enumerates the possible schema values. In that second query you can hardcode the select to the column name that is used to define the schema. if no rows are returned then the entered column is invalid.
In standard SQL, you enclose delimited identifiers in double quotes. This means that:
SELECT * FROM "SomeTable" WHERE "SomeColumn" = ?
will select from a table called SomeTable with the shown capitalization (not a case-converted version of the name), and will apply a condition to a column called SomeColumn with the shown capitalization.
Of itself, that's not very helpful, but...if you can apply the escape() technique with double quotes to the names entered via your web form, then you can build up your query reasonably confidently.
Of course, you said you wanted to avoid using escape - and indeed you don't have to use it on the parameters where you provide the ? place-holders. But where you are putting user-provided data into the query, you need to protect yourself from malicious people.
Different DBMS have different ways of providing delimited identifiers. MS SQL Server, for instance, seems to use square brackets [SomeTable] instead of double quotes.
Column names in some databases can contain spaces, which mean you'd have to quote the column name, but if your database contains no such columns, just run the column name through a regular expression or some sort of check before splicing into the SQL:
if ( $column !~ /^\w+$/ ) {
die "Bad column name [$column]";
}