I have been having fun with an issue where I need to break apart a string in SQL Server 2012 and test for values it may or may not contain. The values, when present, will be separated by up to two ; delimiters.
When there is nothing, it will be blank.
When there is a single value, it will show up without the delimiter.
When there are two or three values, they will be separated by the delimiter.
As I said, if there is nothing in the record, it will be blank. Below are some examples of how the data may come across:
' ',
'1',
'24',
'15;1;24',
'10;1;22',
'5;1;7',
'12;1',
'10;12',
'1;5',
'1;1;1',
'15;20;22'
I have searched the forums and found many clues, but I have not been able to come up with a total solution given all potential data values. Essentially, I would like to break it into 3 separate values.
Text before the first delimiter or, in the absence of a delimiter, just the text.
Text after the first delimiter and before the second, in situations where there are two delimiters.
The following has worked consistently:
substring(SUBSTRING(Food_Desc, charindex(';', Food_Desc) + 1, 4), 0,
charindex(';', SUBSTRING(Food_Desc, charindex(';', Food_Desc) + 1, 4))) as [Middle]
Text after the second delimiter, in the event there are two delimiters and there is a third value.
The main challenge is that the delimiter, when present, moves depending on the value in the table: values 1-9 put it at the second character of the string, values 10-24 put it at the third, etc.
Any help would be greatly appreciated.
This is simple if you have a well-written T-SQL splitter function. For this solution I'm using Jeff Moden's DelimitedSplit8K (the code below calls the DelimitedSplit8K_LEAD variant).
Sample data and solution:
DECLARE @table table (someid int identity, sometext varchar(100));
INSERT @table VALUES (' '),('1'),('24'),('15;1;24'),('10;1;22'),
('5;1;7'),('12;1'),('10;12'),('1;5'),('1;1;1'),('15;20;22');
SELECT
    someid,
    sometext,
    ItemNumber,
    Item
FROM @table
CROSS APPLY dbo.DelimitedSplit8K_LEAD(sometext, ';');
Results:
someid      sometext          ItemNumber  Item
----------- ----------------- ----------- --------
1                             1
2           1                 1           1
3           24                1           24
4           15;1;24           1           15
4           15;1;24           2           1
4           15;1;24           3           24
5           10;1;22           1           10
5           10;1;22           2           1
5           10;1;22           3           22
6           5;1;7             1           5
6           5;1;7             2           1
6           5;1;7             3           7
7           12;1              1           12
7           12;1              2           1
8           10;12             1           10
8           10;12             2           12
9           1;5               1           1
9           1;5               2           5
10          1;1;1             1           1
10          1;1;1             2           1
10          1;1;1             3           1
11          15;20;22          1           15
11          15;20;22          2           20
11          15;20;22          3           22
Below is a modified version of the approach from a similar question, "How do I split a string so I can access item x?". Changing the value of @sample to each of the possibilities you listed seemed to work for me.
DECLARE @sample VARCHAR(200) = '15;20;22';
DECLARE @individual VARCHAR(20) = NULL;
WHILE LEN(@sample) > 0
BEGIN
    IF PATINDEX('%;%', @sample) > 0
    BEGIN
        -- Take the text before the first delimiter, then drop it and the delimiter.
        SET @individual = SUBSTRING(@sample, 0, PATINDEX('%;%', @sample));
        SELECT @individual;
        SET @sample = SUBSTRING(@sample, LEN(@individual + ';') + 1, LEN(@sample));
    END
    ELSE
    BEGIN
        -- No delimiter left: the remainder is the last value.
        SET @individual = @sample;
        SET @sample = NULL;
        SELECT @individual;
    END;
END;
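The loop's logic may be easier to follow outside T-SQL. The following is a minimal Python sketch, not part of the original answer, that peels values off the front of the string in the same way and then pads the result to the three slots the question asks for. `split_three` is a hypothetical helper name, and treating a blank string as "no values" is an assumption based on the sample data.

```python
def split_three(text, delimiter=";"):
    """Return (first, second, third) from a string holding up to three values.

    Missing values come back as None. A blank or whitespace-only string is
    treated as containing no values (an assumption based on the samples).
    """
    remaining = text.strip()
    parts = []
    while remaining:
        position = remaining.find(delimiter)
        if position >= 0:
            # Take the text before the delimiter, then drop it and the delimiter.
            parts.append(remaining[:position])
            remaining = remaining[position + len(delimiter):]
        else:
            # No delimiter left: the rest of the string is the final value.
            parts.append(remaining)
            remaining = ""
    # Pad with None and keep exactly three slots.
    return tuple((parts + [None, None, None])[:3])


for sample in [" ", "1", "12;1", "15;20;22"]:
    print(repr(sample), "->", split_three(sample))
```

Because the position of the delimiter is searched for each time, it does not matter whether a value has one digit or two.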
Missing newline: Unexpected character 0x22 found at location 0
The table's CREATE statement is:
CREATE TABLE venue (
    City varchar(45),
    Country varchar(2),
    Description varchar(82),
    lat_lon varchar(30),
    Region varchar(30),
    State varchar(15),
    Venue_Config_ID int,
    zip varchar(8),
    CT_ID int,
    CN_ID int,
    DS_ID int,
    RG_ID int,
    ZP_ID int,
    ST_ID int
)
and a sample line from the CSV file is:
" D e n v e r"," U S"," E l l i e C a u l k i n s O p e r a H o u s e"," 3 9 . 7 4 3 6 4 7 9 , - 1 0 4 . 9 9 8 1 4"," D e n v e r"," C O","1230057"," 8 0 2 0 4","11","1","8771","11","2673","11"
Any help would be appreciated
0x22 is a quotation mark ("). This could be related to the fact that you are loading int fields with text that is quoted.
Try using the REMOVEQUOTES option on your COPY command to remove the quotes.
I had the same problem. What I found was that my CSV files were saved as "Unicode" rather than UTF-8, and Redshift COPY expects UTF-8:
By default, the COPY command expects the source data to be in character-delimited UTF-8 text files. The default delimiter is a pipe character ( | ). If the source data is in another format, use the following parameters to specify the data format.
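One way to act on this is to re-encode the file before uploading it. The following is a minimal Python sketch under the assumption that "Unicode" here means UTF-16, as many Windows tools label it; the gaps between characters in the sample line above would be consistent with that, but it is not confirmed. `reencode_to_utf8` and the file names in the usage comment are hypothetical.

```python
def reencode_to_utf8(source_path, target_path, source_encoding="utf-16"):
    """Copy a text file, decoding it as source_encoding and writing UTF-8.

    newline="" stops Python from translating line endings, so the rows
    keep whatever terminators the original file used.
    """
    with open(source_path, "r", encoding=source_encoding, newline="") as source:
        with open(target_path, "w", encoding="utf-8", newline="") as target:
            for line in source:
                target.write(line)


# Hypothetical usage before uploading the file for COPY:
# reencode_to_utf8("venue_utf16.csv", "venue_utf8.csv")
```

If the source encoding is something other than UTF-16, pass it as `source_encoding`.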
I downloaded a database with translations of countries and cities into 70 languages (some of the translations are ''), but the translations and the technical information (population, flags, territory, phones, etc.) about cities and countries are saved in the same table.
I mean every translation has its own columns (the translation itself plus a description in the language of the translation) next to the other info that is not related to translation. In total there are about 190 columns, including 70*2 (translation + description).
I don't think this is the proper way, and I want to move all translations to a separate table, keeping an FK to the main (technical-info) table.
So, now I have a table "cities" with the structure like below:
id  region_id  countries_id  phone  population  lang_1   description_1  lang_2   description_2  lang_3   description_3  ....  lang_70        description_70
1   1          1             +7     123         Москва   SomeDesc       Moscow   SomeDesc2      Moskwa   SomeText3            Translation70  SomeDesc70
2   1          1             +7     123         Кубинка  SomeDesc       Kubinka  SomeDesc2      Kubinka  ''                   Translation70  SomeDesc70
with 2.5M rows\cities.
I want to move all "lang_(1-70)" columns and their descriptions to a new table "cities_translated", which should look like this:
id   cities_id  name           description  lang
1    1          Москва         SomeDesc     lang_1
2    1          Moscow         SomeDesc2    lang_2
3    1          Moskwa         SomeText3    lang_3
...
70   1          Translation70  SomeDesc70   lang_70
71   2          Кубинка        SomeDesc     lang_1
72   2          Kubinka        SomeDesc2    lang_2
73   2          Kubinka        SomeDesc3    lang_3
...
140  2          Translation70  SomeDesc70   lang_70
Could anyone please help me with a proper query to do this transfer?
P.S. I already have a table "languages", and as the next step I will replace all values like 'lang_1', 'lang_2' and so on with the proper FK.
I had hoped to get a raw SQL solution in order to improve my SQL knowledge, but since there were no answers, I decided to use Python.
import re

import psycopg2

initial_table = 'countries.city'
table_translation = 'countries.city_translated'

# lang_1, description_1, lang_2, description_2, ..., lang_70, description_70
init_table_columns = [
    col
    for n in range(1, 71)
    for col in ('lang_%d' % n, 'description_%d' % n)
]

conn = psycopg2.connect(database="countries", host='localhost', user="postgres", password="Password")
cur = conn.cursor()
new_cursor = conn.cursor()

cur.execute("""SELECT id FROM %s """ % initial_table)
rows = cur.fetchall()
print("%i rows retrieved" % cur.rowcount)

for row in rows:
    print('row:', row)
    get_id = row[0]
    # Table and column names cannot be bound as parameters, so they are
    # interpolated into the query text; the id is passed as a bound parameter.
    cur.execute("""SELECT %s FROM %s WHERE id=%%s """ % (",".join(init_table_columns), initial_table), (get_id,))
    row_w_info = cur.fetchall()
    # Columns come in (lang_N, description_N) pairs, so step through them two at a time.
    for i in range(0, len(init_table_columns), 2):
        name = row_w_info[0][i]
        description = row_w_info[0][i + 1]
        lang_text = init_table_columns[i]
        lang_id = int(re.findall(r'\d+', lang_text)[0])
        # There are 70 translations, but there is no info what languages are 68, 69, 70
        if lang_id >= 68:
            lang_id = None
        new_cursor.execute("INSERT INTO countries.city_translated (city_id, name, description, lang_id) VALUES (%s, %s, %s, %s)", (get_id, name, description, lang_id))

# psycopg2 opens a transaction implicitly, so a single commit at the end is enough.
conn.commit()
cur.close()
new_cursor.close()
conn.close()
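Since the question asked for raw SQL, one possible single-statement form is an INSERT ... SELECT that unpivots each row with CROSS JOIN LATERAL (VALUES ...). Writing 70 value rows by hand is tedious, so the sketch below generates the statement text in Python. It is an untested sketch, not the method used above: `build_unpivot_sql` is a hypothetical helper, it assumes PostgreSQL 9.3 or later (for LATERAL), and it assumes the same column names and the same "languages 68 to 70 are unknown" rule as the script above.

```python
def build_unpivot_sql(source_table, target_table, n_languages=70, first_unknown_lang=68):
    """Build one INSERT ... SELECT that turns each wide row into n_languages rows."""
    value_rows = []
    for n in range(1, n_languages + 1):
        # Languages from first_unknown_lang upwards have no known id, so store NULL.
        lang_id = "NULL" if n >= first_unknown_lang else str(n)
        value_rows.append("(c.lang_%d, c.description_%d, %s)" % (n, n, lang_id))
    return (
        "INSERT INTO %s (city_id, name, description, lang_id)\n"
        "SELECT c.id, v.name, v.description, v.lang_id\n"
        "FROM %s AS c\n"
        "CROSS JOIN LATERAL (VALUES\n    %s\n) AS v (name, description, lang_id)"
        % (target_table, source_table, ",\n    ".join(value_rows))
    )


# Print a shortened three-language version to show the shape of the statement.
print(build_unpivot_sql("countries.city", "countries.city_translated",
                        n_languages=3, first_unknown_lang=3))
```

Running the generated statement once lets the database do the whole transfer in a single pass, instead of one SELECT and 70 INSERTs per city.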
Unfortunately, I have a table like the following:
DROP TABLE IF EXISTS my_list;
CREATE TABLE my_list (index int PRIMARY KEY, mystring text, status text);
INSERT INTO my_list
(index, mystring, status) VALUES
(12, '', 'D'),
(14, '[id] 5', 'A'),
(15, '[id] 12[num] 03952145815', 'C'),
(16, '[id] 314[num] 03952145815[name] Sweet', 'E'),
(19, '[id] 01211[num] 03952145815[name] Home[oth] Alabama', 'B');
Is there any trick to get the number after [id] out of the mystring text shown above as an integer? As though I ran the following query:
SELECT index, extract_id_function(mystring), status FROM my_list;
and got results like:
12 0 D
14 5 A
15 12 C
16 314 E
19 1211 B
Preferably with only simple string functions; if not, a regular expression will be fine.
If I understand correctly, you have a rather unconventional markup format where [id] is followed by a space, then a series of digits that represents a numeric identifier. There is no closing tag; the next non-numeric field ends the ID.
If so, you're going to be able to do this with non-regexp string ops, but only quite badly. What you'd really need is the SQL equivalent of strtol, which consumes input up to the first non-digit and just returns that. A cast to integer will not do that, it'll report an error if it sees non-numeric garbage after the number. (As it happens I just wrote a C extension that exposes strtol for decoding hex values, but I'm guessing you don't want to use C extensions if you don't even want regex...)
It can be done with string ops if you make the simplifying assumption that an [id] nnnn tag always ends with either end of string or another tag, so it's always [ at the end of the number. We also assume that you're only interested in the first [id] if multiple appear in a string. That way you can write something like the following horrible monstrosity:
select
    "index",
    case
        when next_tag_idx > 0 then substring(cut_id from 0 for next_tag_idx)
        else cut_id
    end AS "my_id",
    "status"
from (
    select
        position('[' in cut_id) AS next_tag_idx,
        *
    from (
        select
            case
                when id_offset = 0 then null
                else substring(mystring from id_offset + 4)
            end AS cut_id,
            *
        from (
            select
                position('[id] ' in mystring) AS id_offset,
                *
            from my_list
        ) x
    ) y
) z;
(If anybody ever actually uses that query for anything, kittens will fall from the sky and splat upon the pavement, wailing in horror all the way down).
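The steps the nested query performs (find the tag, cut everything before the number, stop at the next `[`) are easier to see in a procedural form. The following Python sketch is only an illustration of that logic, not something to run in the database. `extract_id_with_string_ops` is a hypothetical name, and returning 0 when no tag is present follows the results requested in the question rather than the query above, which yields null.

```python
def extract_id_with_string_ops(mystring):
    """Return the digits after the first '[id] ' tag as an int, or 0 if absent."""
    tag = "[id] "
    start = mystring.find(tag)
    if start < 0:
        return 0
    # Drop everything up to and including the tag.
    cut = mystring[start + len(tag):]
    # The number ends at the next tag, or at the end of the string.
    next_tag = cut.find("[")
    digits = cut if next_tag < 0 else cut[:next_tag]
    # int() raises ValueError if the simplifying assumption does not hold.
    return int(digits)


for text in ["", "[id] 5", "[id] 12[num] 03952145815",
             "[id] 01211[num] 03952145815[name] Home[oth] Alabama"]:
    print(repr(text), "->", extract_id_with_string_ops(text))
```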
Or you can be sensible and just use a regular expression for this kind of string processing, in which case your query (assuming you only want the first [id]) is:
regress=> SELECT
    "index",
    coalesce((SELECT (regexp_matches(mystring, '\[id\]\s?(\d+)'))[1])::integer, 0) AS my_id,
    status
FROM my_list;
 index | my_id | status
-------+-------+--------
    12 |     0 | D
    14 |     5 | A
    15 |    12 | C
    16 |   314 | E
    19 |  1211 | B
(5 rows)
Update: If you're having issues with unicode handling in regex, upgrade to Pg 9.2. See https://stackoverflow.com/a/14293924/398670
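To check what the pattern captures without a database at hand, the same regular expression can be tried in Python. This is a sketch for experimentation only; `extract_id_with_regex` is a hypothetical name, and Python's `re` module is not guaranteed to match PostgreSQL's regex engine in every detail, although this simple pattern uses only features common to both.

```python
import re

# Same pattern as in the query: the literal tag, an optional space, then digits.
ID_PATTERN = re.compile(r"\[id\]\s?(\d+)")


def extract_id_with_regex(mystring):
    """Return the first [id] number as an int, or 0 when there is no match."""
    match = ID_PATTERN.search(mystring)
    return int(match.group(1)) if match else 0


for text in ["", "[id] 5", "[id] 314[num] 03952145815[name] Sweet",
             "[id] 01211[num] 03952145815[name] Home[oth] Alabama"]:
    print(repr(text), "->", extract_id_with_regex(text))
```

As with the cast to integer in the query, converting the captured digits to a number drops the leading zero in 01211.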