PostgreSQL - how to check if my data contains a backslash - postgresql

SELECT count(*) FROM table WHERE column ilike '%/%';
gives me the number of values containing "/"
How to do the same for "\"?

SELECT count(*)
FROM table
WHERE column ILIKE E'%\\\\%';

Excerpt from the docs:
Note that the backslash already has a special meaning in string literals, so to write a pattern constant that contains a backslash you must write two backslashes in an SQL statement (assuming escape string syntax is used, see Section 4.1.2.1). Thus, writing a pattern that actually matches a literal backslash means writing four backslashes in the statement. You can avoid this by selecting a different escape character with ESCAPE; then a backslash is not special to LIKE anymore. (But it is still special to the string literal parser, so you still need two of them.)
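The two layers of escaping described in the excerpt can be mimicked outside the database. This is a minimal Python sketch, not PostgreSQL's implementation: unescape_literal and like are hypothetical helpers that model only the doubled-backslash escape of an escape string and the %, _ and escape-character rules of LIKE.

```python
import re


def unescape_literal(body: str) -> str:
    # Layer 1: inside E'...', each pair of backslashes becomes one backslash.
    # Only this one escape sequence is modelled here.
    return body.replace("\\\\", "\\")


def like(value: str, pattern: str, escape: str = "\\") -> bool:
    # Layer 2: translate a LIKE pattern into a regex and match the whole value.
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == escape and i + 1 < len(pattern):
            parts.append(re.escape(pattern[i + 1]))  # escaped char is literal
            i += 2
            continue
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
        i += 1
    return re.fullmatch("".join(parts), value, flags=re.DOTALL) is not None


# Text typed between the quotes of E'%\\\\%' (four backslashes).
four = unescape_literal(r"%\\\\%")      # LIKE receives two backslashes
# Text typed between the quotes of E'%\\%' (two backslashes).
two = unescape_literal(r"%\\%")         # LIKE receives one backslash

print(like("c\\d", four))               # True: the value c\d contains a backslash
print(like("a/b", four))                # False
print(like("c\\d", two, escape="!"))    # True: models ESCAPE '!'
print(like("c\\d", two))                # False: the lone backslash escapes the %
```

The last line shows why two backslashes are not enough with the default escape character: the single backslash that reaches LIKE is consumed as an escape for the following %.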

Better yet - don't use like, just use standard position:
select count(*) from table where 0 < position( E'\\' in column );

I found that on 12.5 I did not need an escape character:
# select * from t;
x
-----
a/b
c\d
(2 rows)
# select count(*) from t where 0 < position('/' in x);
count
-------
1
(1 row)
# select count(*) from t where 0 < position('\' in x);
count
-------
1
(1 row)
whereas on 9.6 I did.
Bit strange but there you go.
Usefully,
position(E'/' in x)
worked on both versions.
You need to be careful: E'//' parses, but it searches for two consecutive slashes, so it does not find a single slash.

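You need E'\\\\' because backslash is the default escape character of the LIKE pattern as well as the escape character of the string literal, so it has to be escaped twice. Regular expressions use the same escape character (e.g. ~ E'\\w' would match any string containing a word character, that is a letter, digit, or underscore).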
See the doc

db2 remove all non-alphanumeric, including non-printable, and special characters

This may sound like a duplicate, but the existing solutions do not work.
I need to remove all non-alphanumerics from a varchar field. I'm using the following but it doesn't work in all cases (it works with diamond questionmark characters):
select TRANSLATE(FIELDNAME, '?',
TRANSLATE(FIELDNAME , '', 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789'))
from TABLENAME
What it does: the inner TRANSLATE picks out all non-alphanumeric characters, then the outer TRANSLATE replaces them all with a '?'. This seems to work for the replacement character �. However, it throws "The second, third or fourth argument of the TRANSLATE scalar function is incorrect.", which is expected according to IBM:
The TRANSLATE scalar function does not allow replacement of a character by another character which is encoded using a different number of bytes. The second and third arguments of the TRANSLATE scalar function must end with correctly formed characters.
Is there any way to get around this?
Edit: @Paul Vernon's solution seems to be working:
· 6005308 ??6005308
–6009908 ?6009908
–6011177 ?6011177
��6011183�� ??6011183??
Try regexp_replace(c,'[^\w\d]','') or regexp_replace(c,'[^a-zA-Z\d]','')
E.g.
select regexp_replace(c,'[^a-zA-Z\d]','') from table(values('AB_- C$£abc�$123£')) t(c)
which returns
1
---------
ABCabc123
BTW, note that the allowed regular expression patterns are listed on the page "Regular expression control characters":
Outside of a set, the following must be preceded with a backslash to be treated as a literal:
* ? + [ ( ) { } ^ $ | \ . /
Inside a set, the following must be preceded with a backslash to be treated as a literal:
Characters that must be quoted to be treated as literals are [ ] \
Characters that might need to be quoted, depending on the context, are - &
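To see how the two suggested character classes differ, here is a small Python sketch using the standard re module. It is an illustration only: DB2's regex engine is not Python's, and the re.ASCII flag is an assumption made here so that \w and \d cover ASCII only.

```python
import re

# Same sample as above; \ufffd is the replacement character.
sample = "AB_- C$£abc\ufffd$123£"

# Keep ASCII letters and digits only.
letters_digits = re.sub(r"[^a-zA-Z\d]", "", sample, flags=re.ASCII)

# \w also matches the underscore, so the underscore survives here.
word_chars = re.sub(r"[^\w\d]", "", sample, flags=re.ASCII)

print(letters_digits)  # ABCabc123
print(word_chars)      # AB_Cabc123
```

If underscores must go as well, the explicit [^a-zA-Z\d] class is the safer of the two.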

PostgreSQL: Sorting letters before numbers

I would like to order a bpchar column in the following order (first order by a-z, then numbers):
abc
bcd
xrf/1
zyd
0/abc
0/bdc
0/efg
How could I accomplish that?
Thank you
Can't fully tell from your question what you actually want. If it is the first character of the string you want to check whether it's numeric or alphabet, you may use a CASE expression in ORDER BY like this.
select * FROM t ORDER BY
CASE
WHEN col ~ '^[a-zA-Z]' THEN 1
WHEN col ~ '^[0-9]' THEN 2
END,col;
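The same two-level ordering (a rank for the first character, then the value itself) can be sketched in Python. This is only an illustration of the idea: sort_rank is a hypothetical helper, rank 3 stands in for the NULL that the CASE expression yields for other first characters, and Python compares strings by code point, which may differ from the database collation.

```python
import re


def sort_rank(value: str) -> int:
    # Mirrors the CASE expression: letters first, then digits, anything else last.
    if re.match(r"[a-zA-Z]", value):
        return 1
    if re.match(r"[0-9]", value):
        return 2
    return 3


rows = ["0/abc", "xrf/1", "abc", "0/efg", "zyd", "bcd", "0/bdc"]
ordered = sorted(rows, key=lambda v: (sort_rank(v), v))
print(ordered)
# ['abc', 'bcd', 'xrf/1', 'zyd', '0/abc', '0/bdc', '0/efg']
```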

Strip out characters that are not numeric, dashes, or pipes

I am trying to find a solution, but I keep getting the wrong output (I referred to some online solutions and confused myself). Please advise where I am going wrong.
I need to strip out any character that is not numeric, a dash "-", or a pipe "|" using PL/SQL.
As an example:
if I need to filter the string 0094-78556232_imk*.ext|4444; the output should be 0094-78556232|4444
Use REGEXP_REPLACE:
SELECT
col,
REGEXP_REPLACE (col, '[^0-9|-]', '') AS col_updated
FROM yourTable;
Don't use regexp_replace, especially if performance is important.
Instead use the standard string function TRANSLATE. Like so:
select col,
translate(col, '0123456789|-' || col, '0123456789|-') as col_updated
from yourTable;
This translates each character in the col value, according to the following scheme: 0 is translated to itself, ...., - is translated to itself. Any other character in col, which is not in this list already, is "translated" to nothing, since there is nothing for it to be translated to in the third argument to the function. So those characters that are NOT on the list are simply removed from the string.
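The scheme described above can be traced with a small Python model. This is a sketch of the TRANSLATE semantics as explained in this answer, not Oracle's code: oracle_translate is a hypothetical helper, and it assumes that the first occurrence of a character in the second argument decides its mapping.

```python
def oracle_translate(expr: str, from_chars: str, to_chars: str) -> str:
    # Build the mapping; a character with no counterpart in to_chars is deleted.
    mapping = {}
    for i, ch in enumerate(from_chars):
        if ch in mapping:
            continue  # assumed rule: the first occurrence wins
        mapping[ch] = to_chars[i] if i < len(to_chars) else None

    out = []
    for ch in expr:
        if ch not in mapping:
            out.append(ch)           # untouched character
        elif mapping[ch] is not None:
            out.append(mapping[ch])  # translated character
        # mapped to None: dropped
    return "".join(out)


keep = "0123456789|-"
col = "0094-78556232_imk*.ext|4444;"
cleaned = oracle_translate(col, keep + col, keep)
print(cleaned)  # 0094-78556232|4444
```

Appending col to the second argument is what puts every unwanted character on the "from" side without a partner on the "to" side.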

Macro variable generating a space while creating a macro variable in Proc SQL

I'm creating a macro variable, but when I use the same macro variable in my Proc Report, it generates a space in front of the value.
Select COUNT(DISTINCT USUBJID) into: N1 from DMDD where ARMN=1;
How do I rectify this in the same code?
This is actually 'working as designed' for PROC SQL SELECT INTO. While all of the other answers are, in some ways, correct, this is a special case as opposed to normal macro variables, such as
%let var= 5 ;
%put [&var];
where that will return just
[5]
while this is not doing that. That is a behavior of PROC SQL SELECT INTO, and is intentional.
These two statements:
proc sql;
select name into :name from sashelp.class where name='Alfred';
select name into :shortname separated by ' ' from sashelp.class where name='Alfred';
quit;
%put `&name` `&shortname`;
are non-identical. separated by ' ' (or any other separated by) will always trim automatically unless notrim is included, and if you have 9.3 or newer, you have a new option, trimmed, which you can use if you intend to select a single variable. I think this behavior was introduced in 9.2 (the non-trimming of select into without a separated by, by default).
If you are only selecting a single value, adding separated by ' ' will have no impact on your result other than to cause the trimming to occur.
This is because any macro variable is stored as a character. If the source data is numeric then SAS uses the best12. format to convert to character, therefore the result is padded with leading blanks. I get around this by using the CATS function which strips out leading and trailing blanks. You can't use the LEFT or STRIP functions as these only work against character variables.
Select cats(COUNT(DISTINCT USUBJID)) into: N1 from DMDD where ARMN=1;
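The padding effect described above is easy to picture with a fixed-width conversion. The Python lines below are only an analogy, assuming a 12-column right-aligned conversion as in the best12. explanation; the count value 5 is made up.

```python
count = 5                  # hypothetical result of COUNT(DISTINCT USUBJID)
padded = f"{count:>12}"    # right-aligned in 12 columns, so leading blanks appear
trimmed = padded.strip()   # what a CATS-style strip leaves behind

print(f"[{padded}]")       # [           5]
print(f"[{trimmed}]")      # [5]
```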
Use the %cmpres() macro to remove blanks.
http://support.sas.com/documentation/cdl/en/mcrolref/64754/HTML/default/viewer.htm#n0tvdbcgr9xc6dn14wmx9hpd6h51.htm
Select trim(put(COUNT(DISTINCT USUBJID), 16. -L)) into: N1 from DMDD where ARMN=1;
Use PUT() to format the output string with -L (left) alignment.

bytea type & nulls, Postgres

I'm using a bytea type in PostgreSQL, which, to my understanding, contains just a series of bytes. However, I can't get it to play well with nulls. For example:
=# select length(E'aa\x00aa'::bytea);
length
--------
2
(1 row)
I was expecting 5. Also:
=# select md5(E'aa\x00aa'::bytea);
md5
----------------------------------
4124bc0a9335c27f086f24ba207a4912
(1 row)
That's the MD5 of "aa", not "aa\x00aa". Clearly, I'm Doing It Wrong, but I don't know what I'm doing wrong. I'm also on an older version of Postgres (8.1.11) for reasons outside of my control. (I'll see if this behaves the same on the latest Postgres as soon as I get home...)
Try this:
# select length(E'aa\\000aa'::bytea);
length
--------
5
Updated: Why didn't the original work? First, understand the difference between one backslash and two:
pg=# select E'aa\055aa', length(E'aa\055aa') ;
?column? | length
----------+--------
aa-aa | 5
(1 row)
pg=# select E'aa\\055aa', length(E'aa\\055aa') ;
?column? | length
----------+--------
aa\055aa | 8
In the first case, I'm writing a literal string with four unescaped characters ('a') and one escaped character. The backslash is consumed by the parser in a first pass, which converts the full \055 to a single char ('-' in this case).
In the second case, the first backslash just escapes the second: the pair \\ is translated by the parser to a single \ and the 055 is seen as three characters.
Now, when converting a text to a bytea, escape characters (in an already parsed or produced text) are parsed/interpreted again! (Yes, this is confusing.)
So, when I write
select E'aa\000aa'::bytea;
in the first parsing, the literal E'aa\000aa' is converted to an internal text with a null character in the third position (and depending on your postgresql version, the null character is interpreted as an EOS, and the text is assumed to be of length two - or in other versions an illegal string error is thrown).
Instead, when I write
select E'aa\\000aa'::bytea;
in the first parsing, the literal string "aa\000aa" (eight characters) is seen and is assigned to a text; then, in the cast to bytea, it is parsed again, and the character sequence '\000' is interpreted as a null byte.
IMO postgresql kind of sucks here.
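The two parsing passes walked through above can be simulated in a few lines of Python. This is a simplified model, not PostgreSQL code: parse_escape_string and text_to_bytea are hypothetical helpers that handle only doubled backslashes and octal escapes, and the input is assumed to be ASCII.

```python
import re


def parse_escape_string(body: str) -> str:
    # Pass 1: what the parser does to the text between the quotes of E'...'.
    def unescape(match):
        token = match.group(1)
        return "\\" if token == "\\" else chr(int(token, 8))

    return re.sub(r"\\(\\|[0-7]{1,3})", unescape, body)


def text_to_bytea(text: str) -> bytes:
    # Pass 2: the cast to bytea interprets backslash escapes once more.
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] != "\\":
            out.append(ord(text[i]))
            i += 1
        elif text[i + 1:i + 2] == "\\":
            out.append(0x5C)                        # doubled backslash
            i += 2
        else:
            out.append(int(text[i + 1:i + 4], 8))   # three octal digits
            i += 4
    return bytes(out)


one_pass = parse_escape_string(r"aa\055aa")     # body of E'aa\055aa'
two_slashes = parse_escape_string(r"aa\\055aa")  # body of E'aa\\055aa'
print(one_pass, len(one_pass))                  # aa-aa 5
print(two_slashes, len(two_slashes))            # aa\055aa 8

as_text = parse_escape_string(r"aa\\000aa")      # body of E'aa\\000aa'
as_bytes = text_to_bytea(as_text)
print(len(as_text), len(as_bytes))              # 8 5
```

With a single backslash, pass 1 already produces a NUL character inside the text value, which is where the truncation or error described above comes from; the model does not reproduce that part.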
You can use regular strings or dollar-quoted strings instead of escaped strings:
# select length('aa\000aa'::bytea);
length
════════
5
(1 row)
# select length($$aa\000aa$$::bytea);
length
════════
5
(1 row)
I think that dollar-quoted strings are a better option because, if the configuration parameter standard_conforming_strings is off, then PostgreSQL recognizes backslash escapes in both regular and escape string constants. However, as of PostgreSQL 9.1, the default is on, meaning that backslash escapes are recognized only in escape string constants.