unicode() for all characters of a query - unicode

The unicode() function in SQLite just accepts one parameter; if I execute this query:
select unicode(text) from table
and suppose the table has just 1 row which is abc; then the result would be: 97 which is the numeric unicode code for just a character; what if I want to get the numeric unicode code for all characters in the result of the query? Is such a thing possible within SQLite?
(I know that I can get the numerical codes out of SQLite within the environment of a programming language; I'm just curious if such a thing is possible with a SQL command within SQLite or not.)

SQLite does not really have looping constructs; doing this in the programming language would be much easier.
In SQLite 3.8.3 or later, you could use a recursive common table expression:
WITH RECURSIVE
all_chars(id, u, rest) AS (
SELECT id, unicode(text), substr(text, 2)
FROM MyTable
UNION ALL
SELECT id, unicode(rest), substr(rest, 2)
FROM all_chars
WHERE rest <> ''),
unicodes(id, list) AS (
SELECT id, group_concat(u)
FROM all_chars
GROUP BY id)
SELECT * FROM unicodes
The ID is required to merge the values of a single source row back together; doing this for a single, specific row in MyTable would be easier.

Related

Alphanumeric sorting without any pattern on the strings [duplicate]

I've got a Postgres ORDER BY issue with the following table:
em_code name
EM001 AAA
EM999 BBB
EM1000 CCC
To insert a new record to the table,
I select the last record with SELECT * FROM employees ORDER BY em_code DESC
Strip alphabets from em_code usiging reg exp and store in ec_alpha
Cast the remating part to integer ec_num
Increment by one ec_num++
Pad with sufficient zeors and prefix ec_alpha again
When em_code reaches EM1000, the above algorithm fails.
First step will return EM999 instead EM1000 and it will again generate EM1000 as new em_code, breaking the unique key constraint.
Any idea how to select EM1000?
Since Postgres 9.6, it is possible to specify a collation which will sort columns with numbers naturally.
https://www.postgresql.org/docs/10/collation.html
-- First create a collation with numeric sorting
CREATE COLLATION numeric (provider = icu, locale = 'en#colNumeric=yes');
-- Alter table to use the collation
ALTER TABLE "employees" ALTER COLUMN "em_code" type TEXT COLLATE numeric;
Now just query as you would otherwise.
SELECT * FROM employees ORDER BY em_code
On my data, I get results in this order (note that it also sorts foreign numerals):
Value
0
0001
001
1
06
6
13
۱۳
14
One approach you can take is to create a naturalsort function for this. Here's an example, written by Postgres legend RhodiumToad.
create or replace function naturalsort(text)
returns bytea language sql immutable strict as $f$
select string_agg(convert_to(coalesce(r[2], length(length(r[1])::text) || length(r[1])::text || r[1]), 'SQL_ASCII'),'\x00')
from regexp_matches($1, '0*([0-9]+)|([^0-9]+)', 'g') r;
$f$;
Source: http://www.rhodiumtoad.org.uk/junk/naturalsort.sql
To use it simply call the function in your order by:
SELECT * FROM employees ORDER BY naturalsort(em_code) DESC
The reason is that the string sorts alphabetically (instead of numerically like you would want it) and 1 sorts before 9.
You could solve it like this:
SELECT * FROM employees
ORDER BY substring(em_code, 3)::int DESC;
It would be more efficient to drop the redundant 'EM' from your em_code - if you can - and save an integer number to begin with.
Answer to question in comment
To strip any and all non-digits from a string:
SELECT regexp_replace(em_code, E'\\D','','g')
FROM employees;
\D is the regular expression class-shorthand for "non-digits".
'g' as 4th parameter is the "globally" switch to apply the replacement to every occurrence in the string, not just the first.
After replacing every non-digit with the empty string, only digits remain.
This always comes up in questions and in my own development and I finally tired of tricky ways of doing this. I finally broke down and implemented it as a PostgreSQL extension:
https://github.com/Bjond/pg_natural_sort_order
It's free to use, MIT license.
Basically it just normalizes the numerics (zero pre-pending numerics) within strings such that you can create an index column for full-speed sorting au naturel. The readme explains.
The advantage is you can have a trigger do the work and not your application code. It will be calculated at machine-speed on the PostgreSQL server and migrations adding columns become simple and fast.
you can use just this line
"ORDER BY length(substring(em_code FROM '[0-9]+')), em_code"
I wrote about this in detail in this related question:
Humanized or natural number sorting of mixed word-and-number strings
(I'm posting this answer as a useful cross-reference only, so it's community wiki).
I came up with something slightly different.
The basic idea is to create an array of tuples (integer, string) and then order by these. The magic number 2147483647 is int32_max, used so that strings are sorted after numbers.
ORDER BY ARRAY(
SELECT ROW(
CAST(COALESCE(NULLIF(match[1], ''), '2147483647') AS INTEGER),
match[2]
)
FROM REGEXP_MATCHES(col_to_sort_by, '(\d*)|(\D*)', 'g')
AS match
)
I thought about another way of doing this that uses less db storage than padding and saves time than calculating on the fly.
https://stackoverflow.com/a/47522040/935122
I've also put it on GitHub
https://github.com/ccsalway/dbNaturalSort
The following solution is a combination of various ideas presented in another question, as well as some ideas from the classic solution:
create function natsort(s text) returns text immutable language sql as $$
select string_agg(r[1] || E'\x01' || lpad(r[2], 20, '0'), '')
from regexp_matches(s, '(\D*)(\d*)', 'g') r;
$$;
The design goals of this function were simplicity and pure string operations (no custom types and no arrays), so it can easily be used as a drop-in solution, and is trivial to be indexed over.
Note: If you expect numbers with more than 20 digits, you'll have to replace the hard-coded maximum length 20 in the function with a suitable larger length. Note that this will directly affect the length of the resulting strings, so don't make that value larger than needed.

What PostgreSQL type is good for stroring array of strings and offering fast lookup afterwards

I am using PostgreSQL 11.9
I have a table containing a jsonb column with arbitrary number of key-values. There is a requirement when we perform a search to include all values from this column as well. Searching in jsonb is quite slow so my plan is to create a trigger which will extract all the values from the jsonb column:
select t.* from app.t1, jsonb_each(column_jsonb) as t(k,v)
with something like this. And then insert the values in a newly created column in the same table so I can use this column for faster searches.
My question is what type would be most suitable for storing the keys and then searchin within them. Currently the search looks like this:
CASE
WHEN something IS NOT NULL
THEN EXISTS(SELECT value FROM jsonb_each(column_jsonb) WHERE value::text ILIKE search_term)
END
where the search_term is what the user entered from the front end.
This is not going to be pretty, and normalizing the data model would be better.
You can define a function
CREATE FUNCTION jsonb_values_to_string(
j jsonb,
separator text DEFAULT ','
) RETURNS text LANGUAGE sql IMMUTABLE STRICT
AS 'SELECT string_agg(value->>0, $2) FROM jsonb_each($1)';
Then you can query like
WHERE jsonb_values_to_string(column_jsonb, '|') ILIKE 'search_term'
and you can define a trigram index on the left hand side expression to speed it up.
Make sure that you choose a separator that does not occur in the data or the pattern...

PostgreSQL integer thousands separator

Sometimes I need to generate large amount of synthetic test data, like several millions of rows.
Normally, when I expect there to be, say, ten millions rows in a table, I write queries like the one below to generate this synthetic data
SELECT generate_series(1, regexp_replace('10 000 000', ' ', '', 'g')::INTEGER) AS id;
But I am bothered by this useless regex function call and explicit integer conversion.
Is there any default thousands separator in postgres dialect? Like underscores in Java and C# (10_000_000) so I could write the query above this way:
SELECT generate_series(1, 10_000_000) AS id;
Or maybe there is another solution to making the code more understandable than this separator?
Could you use scentific notation, like this
select generate_series(1, 1e10)
It would meet your ergonomic needs as long as you don't need to specify the exact end of the series

Process a row with unknown structure in a cursor

I am new to using cursors for looping through a set of rows. But so far I had prior knowledge of which columns I am about to read.
E.g.
DECLARE db_cursor FOR
SELECT Column1, Column2
FROM MyTable
DECLARE #ColumnOne VARCHAR(50), #ColumnTwo VARCHAR(50)
OPEN db_cursor
FETCH NEXT FROM db_cursor INTO #ColumnOne, #ColumnTwo
...
But the tables I am about to read into my key/value table have no specific structure and I should be able to process them one row at a time. How, using a nested cursor, can I loop through all the columns of the fetched row and process them according to their type and name?
TSQL cursors are not really designed to read data from tables of unknown structure. The two possibilities I can think of to achieve something in that direction are:
First read the column names of an unknown table from the Information Schema Views (see System Information Schema Views (Transact-SQL)). Then use dynamic SQL to create the cursor.
If you simply want to get any columns as a large string value, you might also try a simple SELECT * FROM TABLE_NAME FOR XML AUTO and further process the retrieved data for your purposes (see FOR XML (SQL Server)).
SQL is not very good in dealing with sets generically. In most cases you must know column names, data types and much more in advance. But there is XQuery. You can transform any SELECT into XML rather easily and use the mighty abilities to deal with generic structures there. I would not recommend this, but it might be worth a try:
CREATE PROCEDURE dbo.Get_EAV_FROM_SELECT
(
#SELECT NVARCHAR(MAX)
)
AS
BEGIN
DECLARE #tmptbl TABLE(TheContent XML);
DECLARE #cmd NVARCHAR(MAX)= N'SELECT (' + #SELECT + N' FOR XML RAW, ELEMENTS XSINIL);';
INSERT INTO #tmptbl EXEC(#cmd);
SELECT r.value('*[1]/text()[1]','nvarchar(max)') AS RowID
,c.value('local-name(.)','nvarchar(max)') AS ColumnKey
,c.value('text()[1]','nvarchar(max)') AS ColumnValue
FROM #tmptbl t
CROSS APPLY t.TheContent.nodes('/row') A(r)
CROSS APPLY A.r.nodes('*[position()>1]') B(c)
END;
GO
EXEC Get_EAV_FROM_SELECT #SELECT='SELECT TOP 10 o.object_id,o.* FROM sys.objects o';
GO
--Clean-Up for test purpose
DROP PROCEDURE Get_EAV_FROM_SELECT;
The idea in short
The select is passed into the procedure as string. With the SP we create a statement dynamically and create XML from it.
The very first column is considered to be the Row's ID, if not (like in sys.objects) we can write the SELECT and force it that way.
The inner SELECT will read each row and return a classical EAV-list.

Why is SELECT without columns valid

I accidently wrote a query like select from my_table; and surprisingly it is valid statement. Even more interesting to me is that even SELECT; is a valid query in PostgreSQL. You can try to write a lot funny queries with this:
select union all select;
with t as (select) select;
select from (select) a, (select) b;
select where exists (select);
create table a (b int); with t as (select) insert into a (select from t);
Is this a consequence of some definition SQL standard, or there is some use case for it, or it is just funny behavior that no one cared to programatically restrict?
Right from the manual:
The list of output expressions after SELECT can be empty, producing a zero-column result table. This is not valid syntax according to the SQL standard. PostgreSQL allows it to be consistent with allowing zero-column tables. However, an empty list is not allowed when DISTINCT is used.
The possibility of "zero-column" tables is a side effect of the table inheritance if I'm not mistaken. There were discussions over this on the Postgres mailing lists (but I can't find them right now)