PostgreSQL sorting mixed alphanumeric data - postgresql

Running this query:
select name from folders order by name
returns these results:
alphanumeric
a test
test 20
test 19
test 1
test 10
But I expected:
a test
alphanumeric
test 1
test 10
test 19
test 20
What's wrong here?

You can simply cast the name column to the bytea data type, which allows collation-agnostic ordering:
SELECT name
FROM folders
ORDER BY name::bytea;
Result:
name
--------------
a test
alphanumeric
test 1
test 10
test 19
test 20
(6 rows)
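As a rough illustration of why a byte-wise comparison gives the expected order, here is a minimal Python sketch (Python is used only for illustration and is not part of the answer above). It sorts the sample names by their UTF-8 bytes, which is approximately what ordering by a bytea value does. The variable names are made up for this example.

```python
# Sample names from the question.
names = ["alphanumeric", "a test", "test 20", "test 19", "test 1", "test 10"]

# Sort by the raw UTF-8 bytes of each name, ignoring any locale collation rules.
# The space (0x20) compares lower than every letter, so "a test" comes first.
bytewise = sorted(names, key=lambda s: s.encode("utf-8"))

print(bytewise)
```

Many linguistic collations give spaces a low priority when comparing strings, which is probably why the plain ORDER BY in the question mixes "a test" and "alphanumeric".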

All of these methods sorted my selection in alphabetical order:
test 1
test 10
test 2
test 20
This solution worked for me (lc_collate: 'ru_RU.UTF8'):
SELECT name
FROM folders
ORDER BY SUBSTRING(name FROM '([0-9]+)')::BIGINT ASC, name;
test 1
test 2
test 10
test 20
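The idea behind that ORDER BY can be sketched in Python (for illustration only; first_number_key is a hypothetical helper, not something from the answer). The key is the first run of digits converted to an integer, with the full name as a tie-breaker. Names without digits are placed last, on the assumption that PostgreSQL's default NULLS LAST behaviour for ascending order applies.

```python
import re

def first_number_key(name):
    """Rough analogue of ORDER BY SUBSTRING(name FROM '([0-9]+)')::BIGINT ASC, name."""
    match = re.search(r"[0-9]+", name)
    number = int(match.group()) if match else None
    # The leading boolean pushes names without digits to the end (NULLS LAST).
    # The final tie-breaker compares by code point, not by database collation.
    return (number is None, number or 0, name)

names = ["test 1", "test 10", "test 2", "test 20"]
ordered = sorted(names, key=first_number_key)

print(ordered)
```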

select * from "public"."directory" where "directoryId" = 17888 order by
COALESCE(SUBSTRING("name" FROM '^(\d+)')::INTEGER, 99999999),
SUBSTRING("name" FROM '[a-zA-Z_-]+'),
COALESCE(SUBSTRING("name" FROM '(\d+)$')::INTEGER, 0),
"name";
NOTE: Escape the regex as needed; in some languages you will have to add one more "\".
In my Postgres DB, the name column contains the following values, shown here as returned by a simple ORDER BY name query:
1
10
2
21
A
A1
A11
A5
B
B2
B22
B3
M 1
M 11
M 2
Result of the query after I modified it:
1
2
10
21
A
A1
A5
A11
B
B2
B3
B22
M 1
M 2
M 11
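To check the logic of the multi-part ordering outside the database, here is a small Python sketch (illustration only; sort_key is a hypothetical name). It assumes the letter class is meant to be [a-zA-Z_-], that a missing letter part behaves like a NULL sorted last, and that comparing by code point is close enough to the database collation for this sample.

```python
import re

def sort_key(name):
    """Approximate the four ORDER BY expressions from the query above."""
    leading = re.search(r"^(\d+)", name)        # number at the start, if any
    letters = re.search(r"[a-zA-Z_-]+", name)   # first run of letters, if any
    trailing = re.search(r"(\d+)$", name)       # number at the end, if any
    return (
        int(leading.group(1)) if leading else 99999999,
        letters is None,                         # NULL sorts last when ascending
        letters.group() if letters else "",
        int(trailing.group(1)) if trailing else 0,
        name,
    )

names = ["1", "10", "2", "21", "A", "A1", "A11", "A5",
         "B", "B2", "B22", "B3", "M 1", "M 11", "M 2"]
ordered = sorted(names, key=sort_key)

print(ordered)
```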

You may be able to sort manually by splitting the text up when there are trailing numerals, like so:
SELECT * FROM sort_test
ORDER BY SUBSTRING(text FROM '^(.*?)( \\d+)?$'),
COALESCE(SUBSTRING(text FROM ' (\\d+)$')::INTEGER, 0);
This will sort on column text, first by all characters optionally excluding an ending space followed by digits, then by those optional digits.
Worked well in my test.
Update: fixed the string-only sorting with a simple COALESCE (duh).

OverZealous's answer helped me, but it didn't work if the string in the database began with numbers followed by additional characters.
The following worked for me:
SELECT name
FROM folders
ORDER BY
COALESCE(SUBSTRING(name FROM '^(\\d+)')::INTEGER, 99999999),
SUBSTRING(name FROM '^\\d* *(.*?)( \\d+)?$'),
COALESCE(SUBSTRING(name FROM ' (\\d+)$')::INTEGER, 0),
name;
So this one:
Extracts the first number in the string, or uses 99999999.
Extracts the string that follows the possible first number.
Extracts a trailing number, or uses 0.
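The three steps above can be sketched in Python to see what each part extracts (illustration only; split_parts is a hypothetical helper, and Python's regex engine is only assumed to agree with PostgreSQL's for these simple patterns).

```python
import re

def split_parts(name):
    """Return (leading number, middle text, trailing number) for one name."""
    leading = re.match(r"(\d+)", name)
    middle = re.match(r"^\d* *(.*?)( \d+)?$", name)
    trailing = re.search(r" (\d+)$", name)
    return (
        int(leading.group(1)) if leading else 99999999,  # step 1
        middle.group(1) if middle else None,             # step 2
        int(trailing.group(1)) if trailing else 0,       # step 3
    )

print(split_parts("test 10"))
print(split_parts("10 test 3"))
print(split_parts("a test"))
```

Sorting by this tuple (for example with sorted(names, key=split_parts)) puts names that start with a number first, then orders the rest by their text and trailing number.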

A Vlk's answer above helped me a lot, but it sorted items only by the numeric part, which in my case came second. My data looked like (desk 1, desk 2, desk 3 ...): a string part, a space, and a numeric part. The syntax in A Vlk's answer returned the data sorted by the number, and it was the only answer above that did the trick. However, when the string part differed (e.g. desk 3, desk 4, table 1, desk 5 ...), table 1 would come before desk 2. I fixed this using the syntax below:
...order by SUBSTRING(name, '\\w+'), SUBSTRING(name FROM '([0-9]+)')::BIGINT ASC;

Tor's last SQL worked for me. However, if you are calling this code from PHP, you need to add extra backslashes.
SELECT name
FROM folders
ORDER BY
COALESCE(SUBSTRING(name FROM '^(\\\\d+)')::INTEGER, 99999999),
SUBSTRING(name FROM '^\\\\d* *(.*?)( \\\\d+)?$'),
COALESCE(SUBSTRING(name FROM ' (\\\\d+)$')::INTEGER, 0),
name;
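Why the backslashes multiply can be shown with a short Python sketch (Python rather than PHP, purely for illustration; escape_backslashes is a hypothetical helper). Each layer that treats the backslash as an escape character consumes one level, so the text has to be escaped once per layer it passes through.

```python
# The regular expression that should finally reach the regex engine:
# a single backslash before "d".
regex = r"^(\d+)"

# Inside a string literal that treats backslash as an escape character,
# every backslash has to be doubled to survive.
one_layer = "^(\\d+)"

def escape_backslashes(text):
    """Double every backslash, as needed for one escaping layer."""
    return text.replace("\\", "\\\\")

# Passing through two escaping layers means escaping twice.
two_layers = escape_backslashes(escape_backslashes(regex))

print(regex, one_layer, two_layers)
```

How many layers apply in practice depends on the host language's quoting rules and on how the database is configured to treat backslashes in string literals.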

Related

Putting keyword data into a csv file MATLAB

Given a table of the following format in MATLAB:
userid | itemid | keywords
A = [ 3 10 'book'
3 10 'briefcase'
3 10 'boat'
12 20 'windows'
12 20 'picture'
12 35 'love'
4 10 'day'
12 10 'working day'
... ... ... ];
where A is a table of size (58000*3), I want to write the data in a csv file with the following format:
csv.file
itemid keywords
10 book, briefcase, boat, day, working day, ...
20 windows, picture, ...
35 love, ...
where the list of itemids is stored in Iids = [10,20,35,...]
I would like to avoid using loops for this since, as you can imagine, the matrix is large. Any idea is appreciated.
I wasn't able to think of a solution without loops. But you can optimize your loop by:
using logical indexing
running such loop only M times (if M is the number of unique itemid elements) instead of N times (if N is the number of elements in your table).
The solution I came up with is this.
First of all, create your table
A=table([3;3;3;12;12;12;4;12], [10;10;10;20;20;35;10;10],{'book','briefcase','boat','windows','picture','love','day','working day'}','VariableNames',{'userid','itemid','keywords'});
Select the unique values for column itemid (your Iids):
Iids=unique(A.itemid);
Create a new, empty, table which will contain the results:
NewTable=table();
And now the minimal loop I've come up with:
for id=Iids'
% select rows with given itemid value
RowsWithGivenId=A(A.itemid==id,:);
% create new row in NewTable with the id and the (joined together) keywords from the selected rows
NewTable=[NewTable; table(id,{strjoin(RowsWithGivenId.keywords,', ')})];
end
Then set the column names in NewTable:
NewTable.Properties.VariableNames = {'itemid','keywords'};
NewTable now holds one row per itemid, with that item's keywords joined into a single string.
Please note: because the keywords in the new table are separated by commas, a CSV file is not the format I recommend. If you write the table with writetable(NewTable,'myfile.csv'); the keyword lists contain the same comma that the file uses as its field delimiter.
If you instead join the keywords with ; rather than a comma (in strjoin()), you'll get a nicer format.
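For comparison, the same grouping idea can be sketched in Python (illustration only, not a MATLAB replacement; all names here are made up). It groups the sample rows by itemid in a single pass and joins each group's keywords with "; " so they do not clash with the comma used as the field delimiter.

```python
import csv
import io
from collections import defaultdict

# (userid, itemid, keyword) rows from the question's sample table.
rows = [
    (3, 10, "book"), (3, 10, "briefcase"), (3, 10, "boat"),
    (12, 20, "windows"), (12, 20, "picture"), (12, 35, "love"),
    (4, 10, "day"), (12, 10, "working day"),
]

# Group the keywords by itemid in one pass over the rows.
keywords_by_item = defaultdict(list)
for _userid, itemid, keyword in rows:
    keywords_by_item[itemid].append(keyword)

# Write one line per itemid; an in-memory buffer stands in for the file.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["itemid", "keywords"])
for itemid in sorted(keywords_by_item):
    writer.writerow([itemid, "; ".join(keywords_by_item[itemid])])

csv_text = buffer.getvalue()
print(csv_text)
```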

Substring SQL Select statement

I have a number of references with a length of 20 and I need to remove the 1st 12 numbers, replace with a G and select the next 7 numbers
An example of the format of the numbers being received
50125426598525412584
I then need to remove first 12 digits and select the next 7 (not including the last)
2541258
Lastly I need to put a G in front of the number so I'm left with
G25412584
My SQL is as follows:
SELECT SUBSTRING(ref, 12, 7) AS ref
FROM mytable
WHERE ref LIKE '5012%'
The results of this will leave me with
25412584
But how do I insert the G in front of the number in the same SQL statement?
Many thanks
SELECT 'G'+SUBSTRING(ref, 13, 7) AS ref FROM mytable where ref like '5012%'
SELECT CONCAT( 'G', SUBSTRING('50125426598525412584', 13,7)) from dual;
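The offsets are easy to get wrong, so here is a tiny Python check of the arithmetic (illustration only). SQL SUBSTRING positions are one-based, so skipping the first 12 characters means starting at position 13; Python slices are zero-based, so the same seven characters are ref[12:19].

```python
ref = "50125426598525412584"

# Skip the first 12 characters, take the next 7, and prefix them with "G".
result = "G" + ref[12:19]

print(result)
```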

Getting categoryid for more than one shortname passed

I have the following tables:
business
id catid subcatid
---------------------
10 {1} {10,20}
20 {2} {30,40}
30 {3} {50,60,70}
cat_subcat
catid  shortname  parent_id  bid
--------------------------------------------
1      A                     10
2      B                     20
3      c                     30
10     x          1          10
20     y          1          10
30     z          2          20
40     w          2          20
Both the tables have a relationship using id. The problem I am getting is outlined below. Here is my query currently:
SELECT ARRAY[category_id]::int[] from cat_subcat
where parentcategoryid IS not NULL and shortname ilike ('x,y');
I want to get the category_id for an entered shortname, but my query is not giving the proper output. If I pass one shortname it will retrieve the category_id, but if I pass more than one shortname it will not display category_id. Please tell me how to get the category_id for more than one shortname passed.
To actually use pattern matching with ILIKE, you cannot use a simple IN expression. Instead, you need ILIKE ANY (...) or ALL (...), depending on whether you want the tests ORed or ANDed.
Also, your ARRAY constructor will be applied to individual rows, which seems rather pointless. I assume you want this instead (educated guess):
SELECT array_agg(catid) AS cats
FROM cat_subcat
WHERE parent_id IS NOT NULL
AND shortname ILIKE ANY ('{x,y}');
Well, as long as you don't use wildcards (%, _) for your pattern, you can translate this to:
AND lower(shortname) IN ('x','y');
But that would be rather pointless, since Postgres internally converts this to:
AND lower(shortname) = ANY ('{x,y}');
.. before evaluating.
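The difference between the ANY and ALL forms can be sketched in Python (illustration only; ilike, ilike_any and ilike_all are hypothetical helpers, and the pattern translation ignores LIKE escape characters). ANY succeeds if at least one pattern matches, ALL only if every pattern matches.

```python
import re

def ilike(value, pattern):
    """Case-insensitive LIKE: % matches any run of characters, _ exactly one."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    regex = "".join(parts)
    return re.fullmatch(regex, value, flags=re.IGNORECASE | re.DOTALL) is not None

def ilike_any(value, patterns):
    return any(ilike(value, p) for p in patterns)   # tests are ORed

def ilike_all(value, patterns):
    return all(ilike(value, p) for p in patterns)   # tests are ANDed

# (catid, shortname, parent_id) rows from the question's cat_subcat table.
rows = [(1, "A", None), (2, "B", None), (3, "c", None),
        (10, "x", 1), (20, "y", 1), (30, "z", 2), (40, "w", 2)]

cats = [catid for catid, shortname, parent_id in rows
        if parent_id is not None and ilike_any(shortname, ["x", "y"])]

print(cats)
```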

Adding leading zero if length is not equal to 10 digit using sql

I am trying to join 2 tables, but my problem is that one of the tables has a 10-digit number and the other may have a number of 10 or fewer digits. Because of this I am losing some data, so I would like to check the length first and, if it is less than 10 digits, add leading zeros to make it a 10-digit number. I want to do this as part of the join, and I am not sure whether that is possible. For example, if I have 251458 in TABLE_WITHOUT_LEADING_ZERO, I want to change it to 0000251458. Here is what I have so far:
select ACCT_NUM, H.CODE
FROM TABLE_WITH_LEEDING_ZERO D, TABLE_WITHOUT_LEADING_ZERO H
WHERE substring(D.ACCT_NUM from position('.' in D.ACCT_NUM) + 2) = cast (H.CODE as varchar (10))
thanks
Another alternative:
SELECT TO_CHAR(12345,'fm0000000000');
to_char
------------
0000012345
In Netezza you can use LPAD:
select lpad(s.sample,10,0) as result
from (select 12345 as sample) s
result
-------
0000012345
However it would be more efficient to remove the zeros like in the example below:
select cast(trim(Leading '0' from s.sample) as integer) as result
from (select '0000012345' as sample) s
result
-------
12345
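Both directions of the normalisation can be sketched in Python (illustration only; pad_account and strip_zeros are hypothetical names). Either pad the shorter value up to 10 digits, or strip the zeros from the longer one so that both sides of the join compare as plain numbers.

```python
def pad_account(code, width=10):
    """Left-pad a numeric code with zeros up to the given width."""
    return str(code).zfill(width)

def strip_zeros(acct_num):
    """Drop leading zeros and convert the remainder to an integer."""
    return int(acct_num.lstrip("0") or "0")

padded = pad_account(251458)
stripped = strip_zeros("0000012345")

print(padded, stripped)
```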

Perl + PostgreSQL-- Selective Column to Row Transpose

I'm trying to find a way to use Perl to further process a PostgreSQL output. If there's a better way to do this via PostgreSQL, please let me know. I basically need to concatenate the values of certain columns (Realtime, Value) in a file into a single row while keeping ID and CAT.
First time posting, so please let me know if I missed anything.
Input:
ID CAT Realtime Value
A 1 time1 55
A 1 time2 57
B 1 time3 75
C 2 time4 60
C 3 time5 66
C 3 time6 67
Output:
ID CAT Time Values
A 1 time1,time2 55,57
B 1 time3 75
C 2 time4 60
C 3 time5,time6 66,67
You could do this most simply in Postgres like so (using array columns)
CREATE TEMP TABLE output AS SELECT
id, cat, ARRAY_AGG(realtime) as time, ARRAY_AGG(value) as values
FROM input GROUP BY id, cat;
Then select whatever you want out of the output table.
SELECT id
, cat
, string_agg(realtime, ',') AS realtimes
, string_agg(value, ',') AS values
FROM input
GROUP BY 1, 2
ORDER BY 1, 2;
string_agg() requires PostgreSQL 9.0 or later and concatenates all values into a delimiter-separated string, while array_agg() (v8.4+) creates an array out of the input values.
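For readers who want to see what the aggregation does step by step, here is a small Python sketch of the same grouping (illustration only; it is not the Perl post-processing the question asks about, and the variable names are made up). It collects the Realtime and Value entries per (ID, CAT) pair and joins them with commas.

```python
# (id, cat, realtime, value) rows from the question's input.
rows = [
    ("A", 1, "time1", 55), ("A", 1, "time2", 57), ("B", 1, "time3", 75),
    ("C", 2, "time4", 60), ("C", 3, "time5", 66), ("C", 3, "time6", 67),
]

# Collect the times and values for each (id, cat) group.
groups = {}
for id_, cat, realtime, value in rows:
    times, values = groups.setdefault((id_, cat), ([], []))
    times.append(realtime)
    values.append(str(value))

# Join each group's lists into comma-separated strings, ordered by id and cat.
result = [
    (id_, cat, ",".join(times), ",".join(values))
    for (id_, cat), (times, values) in sorted(groups.items())
]

for row in result:
    print(*row)
```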
About 1, 2 - I quote the manual on the SELECT command:
GROUP BY clause
expression can be an input column name, or the name or ordinal number
of an output column (SELECT list item), or ...
ORDER BY clause
Each expression can be the name or ordinal number of an output column
(SELECT list item), or
Emphasis mine. So that's just notational convenience. Especially handy with complex expressions in the SELECT list.