Extracting values from non-standard markup strings in PostgreSQL - postgresql

Unfortunately, I have a table like the following:
DROP TABLE IF EXISTS my_list;
CREATE TABLE my_list (index int PRIMARY KEY, mystring text, status text);
INSERT INTO my_list
(index, mystring, status) VALUES
(12, '', 'D'),
(14, '[id] 5', 'A'),
(15, '[id] 12[num] 03952145815', 'C'),
(16, '[id] 314[num] 03952145815[name] Sweet', 'E'),
(19, '[id] 01211[num] 03952145815[name] Home[oth] Alabama', 'B');
Is there any trick to get out number of [id] as integer from the mystring text shown above? As though I ran the following query:
SELECT index, extract_id_function(mystring), status FROM my_list;
and got results like:
12 0 D
14 5 A
15 12 C
16 314 E
19 1211 B
Preferably with only simple string functions and if not regular expression will be fine.

If I understand correctly, you have a rather unconventional markup format where [id] is followed by a space, then a series of digits that represents a numeric identifier. There is no closing tag, the next non-numeric field ends the ID.
If so, you're going to be able to do this with non-regexp string ops, but only quite badly. What you'd really need is the SQL equivalent of strtol, which consumes input up to the first non-digit and just returns that. A cast to integer will not do that, it'll report an error if it sees non-numeric garbage after the number. (As it happens I just wrote a C extension that exposes strtol for decoding hex values, but I'm guessing you don't want to use C extensions if you don't even want regex...)
It can be done with string ops if you make the simplifying assumption that an [id] nnnn tag always ends with either end of string or another tag, so it's always [ at the end of the number. We also assume that you're only interested in the first [id] if multiple appear in a string. That way you can write something like the following horrible monstrosity:
select
"index",
case
when next_tag_idx > 0 then substring(cut_id from 0 for next_tag_idx)
else cut_id
end AS "my_id",
"status"
from (
select
position('[' in cut_id) AS next_tag_idx,
*
from (
select
case
when id_offset = 0 then null
else substring(mystring from id_offset + 4)
end AS cut_id,
*
from (
select
position('[id] ' in mystring) AS id_offset,
*
from my_list
) x
) y
) z;
(If anybody ever actually uses that query for anything, kittens will fall from the sky and splat upon the pavement, wailing in horror all the way down).
Or you can be sensible and just use a regular expression for this kind of string processing, in which case your query (assuming you only want the first [id]) is:
regress=> SELECT
"index",
coalesce((SELECT (regexp_matches(mystring, '\[id\]\s?(\d+)'))[1])::integer, 0) AS my_id,
status
FROM my_list;
index | my_id | status
-------+----------------+--------
12 | 0 | D
14 | 5 | A
15 | 12 | C
16 | 314 | E
19 | 01211 | B
(5 rows)
Update: If you're having issues with unicode handling in regex, upgrade to Pg 9.2. See https://stackoverflow.com/a/14293924/398670

Related

Postgresql: Concatenating regexp_split_to_table into a single string

In Postgresql, I need to take an ASCII string like CNR39XL and turn it into a number like 21317392311 (ASCII values - 65, but digits left as is). This is to take some logic in our software and push it down into the databse level. I’m partway there with this:
ourdb=> SELECT CASE
ourdb-> WHEN ASCII(c) < 65 THEN c::integer
ourdb-> ELSE ASCII(c)-65
ourdb-> END
ourdb-> FROM regexp_split_to_table('CNR39XL','') s(c);
case
------
2
13
17
3
9
23
11
(7 rows)
However, I can't figure out how to take that final set of numbers and convert it into the string. What am I looking for?
Use string_agg
SELECT CAST(
string_agg(
CAST (CASE ... END AS text),
''
)
AS numeric)
FROM ...

DB2: Converting varchar to money

I have 2 varchar(64) values that are decimals in this case (say COLUMN1 and COLUMN2, both varchars, both decimal numbers(money)). I need to create a where clause where I say this:
COLUMN1 < COLUMN2
I believe I have to convert these 2 varchar columns to a different data types to compare them like that, but I'm not sure how to go about that. I tried a straight forward CAST:
CAST(COLUMN1 AS DECIMAL(9,2)) < CAST(COLUMN2 AS DECIMAL(9,2))
But I had to know that would be too easy. Any help is appreciated. Thanks!
You can create a UDF like this to check which values can't be cast to DECIMAL
CREATE OR REPLACE FUNCTION IS_DECIMAL(i VARCHAR(64)) RETURNS INTEGER
CONTAINS SQL
--ALLOW PARALLEL -- can use this on Db2 11.5 or above
NO EXTERNAL ACTION
DETERMINISTIC
BEGIN
DECLARE NOT_VALID CONDITION FOR SQLSTATE '22018';
DECLARE EXIT HANDLER FOR NOT_VALID RETURN 0;
RETURN CASE WHEN CAST(i AS DECIMAL(31,8)) IS NOT NULL THEN 1 END;
END
For example
CREATE TABLE S ( C VARCHAR(32) );
INSERT INTO S VALUES ( ' 123.45 '),('-00.12'),('£546'),('12,456.88');
SELECT C FROM S WHERE IS_DECIMAL(c) = 0;
would return
C
---------
£546
12,456.88
It really is that easy...this works fine...
select cast('10.15' as decimal(9,2)) - 1
from sysibm.sysdummy1;
You've got something besides a valid numerical character in your data..
And it's something besides leading or trailing whitespace...
Try the following...
select *
from table
where translate(column1, ' ','0123456789.')
<> ' '
or translate(column2, ' ','0123456789.')
<> ' '
That will show you the rows with alpha characters...
If the above does't return anything, then you've probably got a string with double decimal points or something...
You could use a regex to find those.
There is a built-in ability to do this without UDFs.
The xmlcast function below does "safe" casting between (var)char and decfloat (you may use as double or as decimal(X, Y) instead, if you want). It returns NULL if it's impossible to cast.
You may use such an expression twice in the WHERE clause.
SELECT
S
, xmlcast(xmlquery('if ($v castable as xs:decimal) then xs:decimal($v) else ()' passing S as "v") as decfloat) D
FROM (VALUES ( ' 123.45 '),('-00.12'),('£546'),('12,456.88')) T (S);
|S |D |
|---------|------------------------------------------|
| 123.45 |123.45 |
|-00.12 |-0.12 |
|£546 | |
|12,456.88| |

Inserting values into multiple columns by splitting a string in PostgreSQL

I have the following heap of text:
"BundleSize,155648,DynamicSize,204800,Identifier,com.URLConnectionSample,Name,
URLConnectionSample,ShortVersion,1.0,Version,1.0,BundleSize,155648,DynamicSize,
16384,Identifier,com.IdentifierForVendor3,Name,IdentifierForVendor3,ShortVersion,
1.0,Version,1.0,".
What I'd like to do is extract data from this in the following manner:
BundleSize:155648
DynamicSize:204800
Identifier:com.URLConnectionSample
Name:URLConnectionSample
ShortVersion:1.0
Version:1.0
BundleSize:155648
DynamicSize:16384
Identifier:com.IdentifierForVendor3
Name:IdentifierForVendor3
ShortVersion:1.0
Version:1.0
All tips and suggestions are welcome.
It isn't quite clear what do you need to do with this data. If you really need to process it entirely in the database (looks like the task for your favorite scripting language instead), one option is to use hstore.
Converting records one by one is easy:
Assuming
%s =
BundleSize,155648,DynamicSize,204800,Identifier,com.URLConnectionSample,Name,URLConnectionSample,ShortVersion,1.0,Version,1.0
SELECT * FROM each(hstore(string_to_array(%s, ',')));
Output:
key | value
--------------+-------------------------
Name | URLConnectionSample
Version | 1.0
BundleSize | 155648
Identifier | com.URLConnectionSample
DynamicSize | 204800
ShortVersion | 1.0
If you have table with columns exactly matching field names (note the quotes, populate_record is case-sensitive to key names):
CREATE TABLE data (
"BundleSize" integer, "DynamicSize" integer, "Identifier" text,
"Name" text, "ShortVersion" text, "Version" text);
You can insert hstore records into it like this:
INSERT INTO data SELECT * FROM
populate_record(NULL::data, hstore(string_to_array(%s, ',')));
Things get more complicated if you have comma-separated values for more than one record.
%s = BundleSize,155648,DynamicSize,204800,Identifier,com.URLConnectionSample,Name,URLConnectionSample,ShortVersion,1.0,Version,1.0,BundleSize,155648,DynamicSize,16384,Identifier,com.IdentifierForVendor3,Name,IdentifierForVendor3,ShortVersion,1.0,Version,1.0,
You need to break up an array into chunks of number_of_fields * 2 = 12 elements first.
SELECT hstore(row) FROM (
SELECT array_agg(str) AS row FROM (
SELECT str, row_number() OVER () AS i FROM
unnest(string_to_array(%s, ',')) AS str
) AS str_sub
GROUP BY (i - 1) / 12) AS row_sub
WHERE array_length(row, 1) = 12;
Output:
"Name"=>"URLConnectionSample", "Version"=>"1.0", "BundleSize"=>"155648", "Identifier"=>"com.URLConnectionSample", "DynamicSize"=>"204800", "ShortVersion"=>"1.0"
"Name"=>"IdentifierForVendor3", "Version"=>"1.0", "BundleSize"=>"155648", "Identifier"=>"com.IdentifierForVendor3", "DynamicSize"=>"16384", "ShortVersion"=>"1.0"
And inserting this into the aforementioned table:
INSERT INTO data SELECT (populate_record(NULL::data, hstore(row))).* FROM ...
the rest of the query is the same.

Split a string and populate a table for all records in table in SQL Server 2008 R2

I have a table EmployeeMoves:
| EmployeeID | CityIDs
+------------------------------
| 24 | 23,21,22
| 25 | 25,12,14
| 29 | 1,2,5
| 31 | 7
| 55 | 11,34
| 60 | 7,9,21,23,30
I'm trying to figure out how to expand the comma-delimited values from the EmployeeMoves.CityIDs column to populate an EmployeeCities table, which should look like this:
| EmployeeID | CityID
+------------------------------
| 24 | 23
| 24 | 21
| 24 | 22
| 25 | 25
| 25 | 12
| 25 | 14
| ... and so on
I already have a function called SplitADelimitedList that splits a comma-delimited list of integers into a rowset. It takes the delimited list as a parameter. The SQL below will give me a table with split values under the column Value:
select value from dbo.SplitADelimitedList ('23,21,1,4');
| Value
+-----------
| 23
| 21
| 1
| 4
The question is: How do I populate EmployeeCities from EmployeeMoves with a single (even if complex) SQL statement using the comma-delimited list of CityIDs from each row in the EmployeeMoves table, but without any cursors or looping in T-SQL? I could have 100 records in the EmployeeMoves table for 100 different employees.
This is how I tried to solve this problem. It seems to work and is very quick in performance.
INSERT INTO EmployeeCities
SELECT
em.EmployeeID,
c.Value
FROM EmployeeMoves em
CROSS APPLY dbo.SplitADelimitedList(em.CityIDs) c;
UPDATE 1:
This update provides the definition of the user-defined function dbo.SplitADelimitedList. This function is used in above query to split a comma-delimited list to table of integer values.
CREATE FUNCTION dbo.fn_SplitADelimitedList1
(
#String NVARCHAR(MAX)
)
RETURNS #SplittedValues TABLE(
Value INT
)
AS
BEGIN
DECLARE #SplitLength INT
DECLARE #Delimiter VARCHAR(10)
SET #Delimiter = ',' --set this to the delimiter you are using
WHILE len(#String) > 0
BEGIN
SELECT #SplitLength = (CASE charindex(#Delimiter, #String)
WHEN 0 THEN
datalength(#String) / 2
ELSE
charindex(#Delimiter, #String) - 1
END)
INSERT INTO #SplittedValues
SELECT cast(substring(#String, 1, #SplitLength) AS INTEGER)
WHERE
ltrim(rtrim(isnull(substring(#String, 1, #SplitLength), ''))) <> '';
SELECT #String = (CASE ((datalength(#String) / 2) - #SplitLength)
WHEN 0 THEN
''
ELSE
right(#String, (datalength(#String) / 2) - #SplitLength - 1)
END)
END
RETURN
END
Preface
This is not the right way to do it. You shouldn't create comma-delimited lists in SQL Server. This violates first normal form, which should sound like an unbelievably vile expletive to you.
It is trivial for a client-side application to select rows of employees and related cities and display this as a comma-separated list. It shouldn't be done in the database. Please do everything you can to avoid this kind of construction in the future. If at all possible, you should refactor your database.
The Right Answer
To get the list of cities, properly expanded, from a table containing lists of cities, you can do this:
INSERT dbo.EmployeeCities
SELECT
M.EmployeeID,
C.CityID
FROM
EmployeeMoves M
CROSS APPLY dbo.SplitADelimitedList(M.CityIDs) C
;
The Wrong Answer
I wrote this answer due to a misunderstanding of what you wanted: I thought you were trying to query against properly-stored data to produce a list of comma-separated CityIDs. But I realize now you wanted the reverse: to query the list of cities using existing comma-separated values already stored in a column.
WITH EmployeeData AS (
SELECT
M.EmployeeID,
M.CityID
FROM
dbo.SplitADelimitedList ('23,21,1,4') C
INNER JOIN dbo.EmployeeMoves M
ON Convert(int, C.Value) = M.CityID
)
SELECT
E.EmployeeID,
CityIDs = Substring((
SELECT ',' + Convert(varchar(max), CityID)
FROM EmployeeData C
WHERE E.EmployeeID = C.EmployeeID
FOR XML PATH (''), TYPE
).value('.[1]', 'varchar(max)'), 2, 2147483647)
FROM
(SELECT DISTINCT EmployeeID FROM EmployeeData) E
;
Part of my difficulty in understanding is that your question is a bit disorganized. Next time, please clearly label your example data and show what you have, and what you're trying to work toward. Since you put the data for EmployeeCities last, it looked like it was what you were trying to achieve. It's not a good use of people's time when questions are not laid out well.

How do you exclude a column from showing up if there is no value?

Question about a query I'm trying to write in SQL Server Management Studio 2008. I am pulling 2 rows. The first row being the header information, the second row being the information for a certain Line Item. Keep in mind, the actual header information reads as "Column 0, 1, 2, 3, 4,.... etc."
The data looks something like this:
ROW 1: Model # | Item Description| XS | S | M | L | XL|
ROW 2: 3241 | Gray Sweatshirt| | 20 | 20 | 30 | |
Basically this shows that there are 20 smalls, 20 mediums, and 30 larges of this particular item. There are no XS's or XL's.
I want to create a subquery that puts this information in one row, but at the same time, disinclude the sizes with a blank quantity amount as shown under the XS and XL sizes.
I want it to look like this when all is said and done:
ROW 1: MODEL #| 3241 | ITEM DESCRIPTION | Gray Sweatshirt | S | 10 | M | 20 | L | 30 |
Notice there are no XS or XL's included. How do I do make it so those columns do not appear?
Since you are not posting your query, nor your table structure, I guess it is with columns Id, Description, Size. If so, you could do this and just replace with your table and column names:
DECLARE #columns varchar(8000)
SELECT #columns = COALESCE (#columns + ',[' + cast(Size as varchar) + ']', '[' + cast(Size as varchar) + ']' )
FROM YourTableName
WHERE COUNT(Size) > 0
DECLARE #query varchar(8000) = 'SELECT Id, Description, '
+ #columns +'
FROM
(SELECT Id, Description, Size
FROM YourTableName) AS Source
PIVOT
(
COUNT(Size)
FOR Size IN ('+ #columns +')
) AS Pvt'
EXEC(#query)
Anyhow, I also agree with #MichaelFredickson. I have implemented this pivot solution, yet it is absolutely better to let the presentation layer to take care of this after just pulling the raw data from SQL. If not, you would be processing the data twice, one on SQL to create the table and the other in the presentation when reading and displaying the values with your c#/vb/other code.