String verification for letters and symbols - PostgreSQL

How can I check whether my string contains any letters (a-z) or symbols (#, #, -, etc.) in PostgreSQL?

Regular Expression (regex) example from my comment:
select *
from mytable
where my_field ~ '[a-z]' -- any lowercase character
other examples:
'[A-Z]' -- any uppercase
'[aeiou]' -- any vowel
'[##-]' -- the symbols you listed -- put the hyphen last, otherwise it denotes a range
'[A-Za-z##-]' -- all letters and your symbols
The Pg docs on Regex are superb:
https://www.postgresql.org/docs/9.5/static/functions-matching.html
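Python's `re` module follows the same bracket-expression rules for simple classes like these, so the hyphen-last advice can be tried out client-side. A minimal sketch: `has_letter_or_symbol` is a hypothetical helper, and `#` and `-` stand in for the symbols from the question.

```python
import re

# ASCII letters plus '#' and '-'; the hyphen is last, so it is a literal character.
LETTER_OR_SYMBOL = re.compile(r"[A-Za-z#-]")

# Hyphen in the middle: '#-z' is a range from 0x23 to 0x7A, which also covers digits.
RANGE_BY_MISTAKE = re.compile(r"[#-z]")


def has_letter_or_symbol(s: str) -> bool:
    """Return True if s contains at least one ASCII letter, '#' or '-'."""
    return LETTER_OR_SYMBOL.search(s) is not None


print(has_letter_or_symbol("12345"))             # False
print(has_letter_or_symbol("123-45"))            # True
print(RANGE_BY_MISTAKE.search("5") is not None)  # True: '5' (0x35) lies inside the range
```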

Related

Search problem with special characters in PostgreSQL

SELECT * FROM "main_parse_user"
WHERE ("main_parse_user"."bio"::text ~* '\mFounder of JoJoWorld | Python'
OR "main_parse_user"."first_name"::text ~* '\mFounder of JoJoWorld | Python')
I'm searching for text with this query.
Sometimes the search words contain '|'.
How can I make '|' be treated as an ordinary character?
For text without such characters, everything works correctly.
You'll have to escape characters that have a special meaning in regular expressions with a backslash to deprive them of their special meaning. Per the documentation:
\k (where k is a non-alphanumeric character) matches that character taken as an ordinary character, e.g., \\ matches a backslash character
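Applied to the query above, only the regex metacharacters need a backslash; here `|` becomes `\|`. A minimal Python sketch, assuming the pattern is built client-side and sent as a bind parameter rather than spliced into the SQL text; `escape_are` and `ARE_SPECIAL` are hypothetical names, and the character set is based on the ARE syntax described in the PostgreSQL documentation.

```python
# Characters that are special in PostgreSQL advanced regular expressions (AREs)
# outside a bracket expression.
ARE_SPECIAL = set("\\^$.|?*+()[]{}")


def escape_are(text: str) -> str:
    """Backslash-escape every ARE metacharacter so it matches literally."""
    return "".join("\\" + ch if ch in ARE_SPECIAL else ch for ch in text)


# Keep the leading \m (start of word) from the original query unescaped.
pattern = r"\m" + escape_are("Founder of JoJoWorld | Python")
print(pattern)  # \mFounder of JoJoWorld \| Python
```

With `standard_conforming_strings` on (the default since PostgreSQL 9.1), the backslashes also survive unchanged if the pattern is written as an ordinary '...' literal.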

Postgres escape double quotes

I am working with a malformed database which seems to have double quotes as part of the column names.
Example:
|"Market" |
|---------|
|Japan |
|UK |
|USA |
And I want to select like below
SELECT "\"Market\"" FROM mytable; /* Does not work */
How does one select such a thing?
The documentation says
[A] delimited identifier or quoted identifier […] is formed by enclosing an arbitrary sequence of characters in double-quotes ("). […]
Quoted identifiers can contain any character, except the character with code zero. (To include a double quote, write two double quotes.)
So you'll want to use
SELECT """Market""" AS "Market" FROM mytable;
An alternative is the Unicode escape form. From the documentation:
A variant of quoted identifiers allows including escaped Unicode characters identified by their code points. This variant starts with U& (upper or lower case U followed by ampersand) immediately before the opening double quote, without any spaces in between, for example U&"foo". […] Inside the quotes, Unicode characters can be specified in escaped form by writing a backslash followed by the four-digit hexadecimal code point number or alternatively a backslash followed by a plus sign followed by a six-digit hexadecimal code point number.
which in your case would mean
SELECT U&"\0022Market\0022" AS "Market" FROM mytable;
SELECT U&"\+000022Market\+000022" AS "Market" FROM mytable;
Disclaimer: your database may not actually have double quotes as part of the name itself. As mentioned in the comments, this might just be the way the tool you are using displays a column named Market (not market), since
Quoting an identifier also makes it case-sensitive
So all you might need could be
SELECT "Market" FROM mytable;

Postgres only select rows with latin characters

I want a condition in the WHERE clause that selects only rows with Latin characters (and digits). Currently I am trying this, but it also returns rows with Cyrillic characters:
SELECT * FROM temp WHERE temp.name SIMILAR TO '%[^[:ascii:]]%';
How can I select only the rows that contain nothing but Latin characters (and digits)?
Try the following regular expression:
... WHERE name ~ '^[[:ascii:]]*$' COLLATE "C"
This requires the whole string to consist of ASCII characters. Note that control characters such as newline, spaces, and other ASCII symbols such as $ are ASCII characters too. If you want only letters and digits, try
... WHERE name ~ '^[a-zA-Z0-9]*$' COLLATE "C"
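The two conditions can be mirrored in Python to check sample values before writing the query. A minimal sketch; the helper names are hypothetical, `str.isascii()` stands in for `[[:ascii:]]` (Python's `re` has no POSIX classes), and `[a-zA-Z0-9]` is a plain code-point range, like the pattern under COLLATE "C".

```python
import re

# Equivalent of '^[a-zA-Z0-9]*$': fullmatch anchors at both ends
# (Python's '$' would also accept a trailing newline).
LATIN_ALNUM = re.compile(r"[a-zA-Z0-9]*")


def ascii_only(s: str) -> bool:
    """Like '^[[:ascii:]]*$': every character is in U+0000..U+007F."""
    return s.isascii()


def latin_alnum_only(s: str) -> bool:
    """Like '^[a-zA-Z0-9]*$' COLLATE "C": only ASCII letters and digits."""
    return LATIN_ALNUM.fullmatch(s) is not None


for name in ["Moscow", "Москва", "New York", "R2D2"]:
    print(f"{name!r}: ascii={ascii_only(name)}, latin_alnum={latin_alnum_only(name)}")
```

Like the `*` in the SQL patterns, both helpers accept the empty string.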

Error when running postgresql COPY Command

I would like to use characters like '-' in the schema name when running the COPY command in PostgreSQL. Is there any way around this? Thanks!
psql -d postgres -c "\COPY (SELECT * FROM test-schema.tableName) TO data.csv DELIMITER ',' CSV"
ERROR: syntax error at or near "-"
LINE 1: COPY ( SELECT * FROM test-schema.tableName ) TO STDOUT DELIMITER ',...
Yes, though I tend to discourage it.
Identifiers:
SQL identifiers and key words must begin with a letter (a-z, but also letters with diacritical marks and non-Latin letters) or an underscore (_). Subsequent characters in an identifier or key word can be letters, underscores, digits (0-9), or dollar signs ($). Note that dollar signs are not allowed in identifiers according to the letter of the SQL standard, so their use might render applications less portable. The SQL standard will not define a key word that contains digits or starts or ends with an underscore, so identifiers of this form are safe against possible conflict with future extensions of the standard.
There is a second kind of identifier: the delimited identifier or quoted identifier. It is formed by enclosing an arbitrary sequence of characters in double-quotes (").
So:
create schema "test-schema";
CREATE SCHEMA
\dn "test-schema"
List of schemas
Name | Owner
-------------+----------
test-schema | postgres
create table "test-schema"."test-table"(id int);
select * from test-schema."test-table";
ERROR: syntax error at or near "-"
LINE 1: select * from test-schema."test-table";
select * from "test-schema"."test-table";
id
----
(0 rows)
As you can see, if you double-quote an identifier to get around the identifier naming rules, you are bound to quote it every time you refer to it.
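Applied to the COPY command from the question, the schema name needs its own double quotes inside the psql meta-command. From a shell, those inner quotes must be escaped (`... FROM \"test-schema\".tableName ...` inside the double-quoted `-c` argument); building the argument list in Python and passing it to `subprocess.run` without a shell avoids that layer. A minimal sketch with `copy_command` as a hypothetical helper; it only builds the list and does not run psql.

```python
from typing import List


def copy_command(schema: str, table: str, outfile: str) -> List[str]:
    """Build psql arguments that run \\COPY against a schema whose name needs quoting."""
    # Double-quote the schema name, doubling any embedded double quotes.
    quoted_schema = '"' + schema.replace('"', '""') + '"'
    # The table name is used as given; an unquoted tableName folds to tablename.
    meta = f"\\COPY (SELECT * FROM {quoted_schema}.{table}) TO {outfile} DELIMITER ',' CSV"
    return ["psql", "-d", "postgres", "-c", meta]


argv = copy_command("test-schema", "tableName", "data.csv")
print(argv[-1])
# \COPY (SELECT * FROM "test-schema".tableName) TO data.csv DELIMITER ',' CSV
# To run it without a shell: subprocess.run(argv, check=True)
```

If the table was created with a mixed-case quoted name, it would need the same quoting as the schema.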

Column names with line breaks

I know that for PostgreSQL string literals, a line break can be written as \n by prefixing the literal with E or e:
SELECT E'first\nsecond'
results in:
first
second
But PostgreSQL also supports line breaks within column names - not sure why or how evil this practice is, but one can do the following:
CREATE TABLE One("first\nsecond" text);
CREATE TABLE Two("first
second" text);
When you are unfortunate enough to run into one of these, you would find that while these queries work:
SELECT "first\nsecond" from One;
SELECT "first
second" from Two;
these ones do not:
SELECT "first
second" from One;
SELECT "first\nsecond" from Two;
My question is: Is there a way in PostgreSQL that unifies such differences, similar to the situation with the column values?
I have tried putting E in front of "first\nsecond" column names, but it is not supported. Trying to put \r\n instead (I'm using Windows) gave me a third kind of column name, one that can only be queried as:
SELECT "first\r\nsecond" FROM Third
Column names are identifiers, and the gory details of the syntax for identifiers are described at:
http://www.postgresql.org/docs/current/static/sql-syntax-lexical.html#SQL-SYNTAX-IDENTIFIERS
TL;DR: use the U&"..." syntax to inject non-printable characters into identifiers through their Unicode codepoints, and there's no way to unify CR,LF with LF alone.
How to refer to the column in a single line
We're allowed to use Unicode escape sequences in identifiers, so per documentation, the following does work:
select U&"first\000asecond" from Two;
if it's just a newline character between the two words.
What happens with the queries on the first table
The table is created with:
CREATE TABLE One("first\nsecond" text);
As the backslash character has no special meaning here, this column does not contain any newline.
It contains first followed by \ followed by n followed by second.
So:
SELECT "first\nsecond" from One;
does work because it's the same as what's in the CREATE TABLE
whereas
SELECT "first
second" from One;
fails because there's a newline in that SELECT where the actual column name in the table has a backslash followed by an n.
What happens with the queries on the second table
This is the opposite of "One".
CREATE TABLE Two("first
second" text);
The newline is taken verbatim and is part of the column.
So
SELECT "first
second" from Two;
works because the column name contains an embedded newline, exactly as in the CREATE TABLE,
whereas
SELECT "first\nsecond" from Two;
fails because, as before, \n in this context does not mean a newline.
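Because identifiers are compared character by character, the difference between the two tables can be reproduced with plain Python strings, where a raw string keeps the backslash just as the quoted identifier in One does. A minimal illustration:

```python
# Column names as stored by the two CREATE TABLE statements.
one_name = r"first\nsecond"  # table One: a backslash followed by the letter n
two_name = "first\nsecond"   # table Two: a real newline, U+000A

for label, name in [("One", one_name), ("Two", two_name)]:
    specials = [f"{ord(c):04x}" for c in name if not c.isalnum()]
    print(label, len(name), specials)
# One 13 ['005c']
# Two 12 ['000a']
```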
Carriage Return followed by Newline, or anything weirder
As mentioned in comments and your edit, this could be carriage return and newline instead, in which case the following should do:
select U&"first\000d\000asecond" from Two;
although in my test, hitting Enter in the middle of a column name with psql on Unix and Windows has the same effect: a single newline in the column's name.
To check what exact characters ended up in a column name, we can inspect them in hexadecimal.
When applied to your create table example, from inside psql under Unix:
CREATE TABLE Two("first
second" text);
select convert_to(column_name::text,'UTF-8')
from information_schema.columns
where table_schema='public'
and table_name='two';
The result is:
convert_to
----------------------------
\x66697273740a7365636f6e64
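The same hex dump can be produced client-side from a name you suspect, to compare against the `convert_to` output. A minimal Python sketch, assuming the name is available as a Python string:

```python
name = "first\nsecond"  # column name of table Two

# UTF-8 bytes in hex, prefixed like PostgreSQL's bytea output.
print("\\x" + name.encode("utf-8").hex())  # \x66697273740a7365636f6e64

# And back again: decode the hex digits from the query result.
decoded = bytes.fromhex("66697273740a7365636f6e64").decode("utf-8")
print(repr(decoded))  # 'first\nsecond'
```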
For more complex cases (e.g. non-ASCII characters encoded as several bytes in UTF-8), a query that lists each character with its code point is easier to read:
select c,lpad(to_hex(ascii(c)),4,'0') from (
select regexp_split_to_table(column_name::text,'') as c
from information_schema.columns
where table_schema='public'
and table_name='two'
) as g;
 c | lpad
---+------
 f | 0066
 i | 0069
 r | 0072
 s | 0073
 t | 0074
  +| 000a
   |
 s | 0073
 e | 0065
 c | 0063
 o | 006f
 n | 006e
 d | 0064
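A Python counterpart of that listing, which also turns the control characters it finds into `U&` escapes like the one used for the CR LF case. A minimal sketch; `describe_identifier` and `u_escape_controls` are hypothetical helpers. One difference worth knowing: `lpad(..., 4, '0')` truncates values longer than four characters, so a code point above U+FFFF would be cut short in the SQL listing, whereas Python's `04x` format only pads.

```python
import unicodedata
from typing import List, Tuple


def describe_identifier(name: str) -> List[Tuple[str, str]]:
    """Pair each character with its code point as 4-digit hex, like the query above."""
    return [(ch, f"{ord(ch):04x}") for ch in name]


def u_escape_controls(name: str) -> str:
    """Build a U&"..." identifier escaping control characters, backslash and double quote."""
    out = []
    for ch in name:
        if unicodedata.category(ch) == "Cc" or ch in '\\"':
            out.append(f"\\{ord(ch):04x}")
        else:
            out.append(ch)  # other characters, including non-ASCII letters, kept as-is
    return 'U&"' + "".join(out) + '"'


crlf_name = "first\r\nsecond"  # hypothetical CR LF variant of the column name
for ch, code in describe_identifier(crlf_name):
    print(repr(ch), code)
print(u_escape_controls(crlf_name))  # U&"first\000d\000asecond"
```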