using patindex to replace characters - tsql

I have a table with a name column in it that contains names like:
A & A Turf
C & D Railways
D & B Railways
I have the following query that will get me the correct columns I want
select name from table where patindex('_ & _ %', name) > 0
What I need to accomplish is making anything with that type of pattern collapsed. Like this
A&A Turf
C&D Railways
D&B Railways
I'm also looking for a way to do the same thing with a single letter followed by a space, another single letter, another space, and then words with more than one letter, like this:
A F Consulting -> AF Consulting
D B Catering -> DB Catering
but only if the single letter stuff is at the beginning of the value.
Example would be if the name has the pattern mentioned above anywhere in the name then don't do anything unless it's at the beginning
ALBERS, J K -> ALBERS, J K This would not change because it's a name and it's not at the beginning.
So something like this would be the desired result:
Original Name      New Name         Rule
________________   ______________   _______________________________________________
A & K Consulting   A&K Consulting   Space taken out around & between single characters
C B Finance        CB Finance       Space taken out only if at the beginning
Albert J K         Albert J K       Not at the beginning, so left alone

This can be done without PATINDEX: what needs to be replaced is at the start and follows fixed patterns, so you already know the positions.
Example snippet:
DECLARE @Table TABLE (ID INT IDENTITY(1,1) PRIMARY KEY, name VARCHAR(30));
INSERT INTO @Table (name) VALUES
('A & K Consulting'),
('C B Finance'),
('Albert J K'),
('Foo B & A & R');
SELECT
  name AS OldName,
  (CASE
     -- 'C B Finance': remove the single space at position 2
     WHEN name LIKE '[A-Z] [A-Z] %' THEN STUFF(name,2,1,'')
     -- 'A & K Consulting': replace the 3 characters ' & ' starting at position 2 with '&'
     WHEN name LIKE '[A-Z] & [A-Z] %' THEN STUFF(name,2,3,'&')
     ELSE name
   END) AS NewName
FROM @Table;
Test on rextester here
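To check the rule outside the database, here is a minimal Python sketch of the same two cases. The function name collapse_leading_initials is made up for illustration, and the [A-Za-z] class assumes the column uses a case-insensitive collation, so that [A-Z] in the LIKE pattern also matches lowercase letters.

```python
import re

def collapse_leading_initials(name: str) -> str:
    """Rewrite a name only when the single-letter pattern is at its start."""
    # Case 1: 'A & K Consulting' -> 'A&K Consulting'
    if re.match(r"[A-Za-z] & [A-Za-z] ", name):
        # keep the first letter, put '&' in place of ' & ', keep the rest
        return name[0] + "&" + name[4:]
    # Case 2: 'C B Finance' -> 'CB Finance'
    if re.match(r"[A-Za-z] [A-Za-z] ", name):
        # drop only the space between the two leading initials
        return name[0] + name[2:]
    # Anything else (including initials later in the name) is left alone
    return name

for n in ["A & K Consulting", "C B Finance", "Albert J K", "Foo B & A & R"]:
    print(n, "->", collapse_leading_initials(n))
```

Because re.match only matches at the start of the string, names such as 'ALBERS, J K' pass through unchanged, which is the same effect as the LIKE patterns having no leading %.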

The first one is straightforward: replace " & " with "&". The second I'll have to take more time.

Related

to find new lines character in postgres

I have a couple of entries in a database table whose "names" data spans multiple lines.
I am trying to find a single newline character in it.
SELECT
id,
strpos ( NAME, E'\n' ) AS Position_of_substring
FROM
problems
WHERE
strpos ( NAME, E'\n' ) > 0;
But it fails for data that has more than one newline character (\n).
Is there any way to find "n" occurrences of "\n" in the names data?
regexp_matches emits a row for each match, so counting its rows gives the number of newlines:
SELECT
id,
strpos ( NAME, E'\n' ) AS Position_of_substring
FROM
problems p
WHERE
(select count(*) from regexp_matches(p.name,E'\n','g') ) = ?;
This one gives you a list of all indexes with \n in your string. I am not sure if you were expecting this result:
demo:db<>fiddle
SELECT
name,
array_remove( -- 5
(array_agg(sum))::int[], -- 4
length(name) + 1
)
FROM (
-- 3
SELECT
name,
SUM(length(lines) + 1) OVER (PARTITION BY name ORDER BY row_number)
FROM (
-- 2
SELECT
*,
row_number() OVER ()
FROM (
-- 1
SELECT
name,
regexp_split_to_table(name, '\n') as lines
FROM problems
)s
)s
) s
GROUP BY name
1. Splitting the string at the \n chars. Every split part is now one row in a temporary table.
2. Adding a row number to ensure the right order of the split parts.
3. This computes the length of each split part. The (length + 1) gives the position of the \n relative to the start of that part. The SUM window function accumulates these values within a group (your original text), which is why the order matters. For example, the first two parts of "abc\nde\nfgh" have lengths 3 and 2, so the per-part values are 4 (abc = 3, + 1) and 3 (de = 2, + 1). The 3 of the second part is not a real index, but the running sum gives the right indexes: 4 and 7.
4. Aggregating these results.
5. If (as in my example) the last char is always a \n and you are only interested in the \n chars inside the string, you can remove the last entry of the aggregated array.
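The running-sum idea is easy to verify outside the database. The following Python sketch (the function name newline_positions is hypothetical) splits on \n, accumulates length + 1 for each part, and drops the final total, which always equals length + 1 of the whole string and is not a real newline position.

```python
from itertools import accumulate

def newline_positions(text: str) -> list[int]:
    """Return the 1-based positions of every newline in text."""
    parts = text.split("\n")
    # Running total of (part length + 1): the position right after each part
    totals = list(accumulate(len(p) + 1 for p in parts))
    # The last total is len(text) + 1, so it is removed (step 5 above)
    return totals[:-1]

print(newline_positions("abc\nde\nfgh"))  # [4, 7]
```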
Changed problem in comments below:
Would like to replace \n with spaces. So I am thinking how above query
will look in the Update statement. – Pranav Unde
Replacing the \n with spaces is quite a different problem than getting the indexes of all occurrences of a special character. And it's much simpler:
UPDATE problems
SET name = trim(regexp_replace(name, E'\n', ' ', 'g'));
regexp_replace(..., 'g') finds all occurrences of \n and replaces them
trim() removes the whitespace before and after the string if necessary (for example because there was a trailing \n, as in my example, which was replaced by a space in the previous step)
demo:db<>fiddle

SQL: Split post code value to return 'outer' part of code

I've got a results set of UK postcodes. Some are formatted with spaces, and some are not e.g. S14HG and S1 4HG
I want my select query to just return the outer part of the post code value in the results, i.e. 'S1'
I can do this in Excel using the following formula:
=IF(ISERROR(LEFT(A1,LEN(A1)-3)),"",LEFT(A1,LEN(A1)-3))
Is it possible to perform the same function in SQL through a SELECT query?
UK postcodes can have one of many formats for their outward code.
However, as you can see from the possible formats in that link, there is a consistent format for the remainder of the postcode. If you are confident your postcodes are correct, you can simply remove any spaces and the last 3 characters:
declare @Postcodes table (Postcode nvarchar(10));
insert into @Postcodes values
 ('S1 4HG')
,('S14HG')
,('S10 4HG')
,('S104HG');
select Postcode
      -- drop the 3-character inward code, then strip any remaining space
      ,replace(left(Postcode,len(Postcode)-3),' ','') as OutwardCode
from @Postcodes
Output:
Postcode   OutwardCode
S1 4HG     S1
S14HG      S1
S10 4HG    S10
S104HG     S10
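The same rule, sketched in Python for a quick check against the four sample values. The function name outward_code is made up for illustration, and this version removes the spaces first and then drops the last three characters, which gives the same result as the query for well-formed postcodes.

```python
def outward_code(postcode: str) -> str:
    """Return the outward part of a well-formed UK postcode."""
    # Remove any spaces so 'S1 4HG' and 'S14HG' are treated alike
    compact = postcode.replace(" ", "")
    # The inward code is always the last 3 characters
    return compact[:-3]

for pc in ["S1 4HG", "S14HG", "S10 4HG", "S104HG"]:
    print(pc, "->", outward_code(pc))
```

As with the SQL version, this assumes the postcodes are valid; it does no format checking.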
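You can use LEFT() regardless of spaces if you only want the first two characters, which won't contain a space. Note that this only works when the outward code is exactly two characters long; for 'S10 4HG' it returns 'S1'.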
SELECT LEFT('S1 4HG',2)
Or just get rid of the spaces...
declare @t varchar(64) = 'S 1 4 H G'
SELECT LEFT(REPLACE(@t,' ',''),2)

T-SQL Join on foreign key that has leading zero

I need to link various tables that each have a common key (a serial number in this case). In some tables the key has a leading zero e.g. '037443' and on others it doesn't e.g. '37443'. In both cases the serial refers to the same product. To confound things serial 'numbers' are not always just numeric e.g. may be "BDO1234", in these cases there is never a leading zero.
I'd prefer to use the WHERE statement (WHERE a.key = b.key) but could use joins if required. Is there any way to do this?
I'm still learning so please keep it simple if possible. Many thanks.
Based on the accepted answer in this link, I've written a small tsql sample to show you what I meant by 'the right direction':
Create the test table:
CREATE TABLE tblTempTest
(
keyCol varchar(20)
)
GO
Populate it:
INSERT INTO tblTempTest VALUES
('1234'), ('01234'), ('10234'), ('0k234'), ('k2304'), ('00034')
Select values:
SELECT keyCol,
SUBSTRING(keyCol, PATINDEX('%[^0]%', keyCol + '.'), LEN(keyCol)) As trimmed
FROM tblTempTest
Results:
keyCol               trimmed
-------------------- --------------------
1234                 1234
01234                1234
10234                10234
0k234                k234
k2304                k2304
00034                34
Cleanup:
DROP TABLE tblTempTest
Note that the values are alpha-numeric, and only leading zeroes are trimmed.
One possible drawback is that if there is a 0 after a white space it will not be trimmed, but that's an easy fix - just add ltrim:
SUBSTRING(LTRIM(keyCol), PATINDEX('%[^0]%', LTRIM(keyCol + '.')), LEN(keyCol)) As trimmed
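To see how the trimmed value can serve as the comparison key, here is a small Python sketch of the same normalization. The names trim_leading_zeros and same_serial are hypothetical, and stripping leading white space first mirrors the LTRIM variant above.

```python
def trim_leading_zeros(key: str) -> str:
    """Strip leading white space, then leading zeroes; the rest is untouched."""
    return key.lstrip().lstrip("0")

def same_serial(a: str, b: str) -> bool:
    """Compare two serials after normalizing both sides the same way."""
    return trim_leading_zeros(a) == trim_leading_zeros(b)

print(trim_leading_zeros("00034"))       # 34
print(same_serial("037443", "37443"))    # True
print(same_serial("BDO1234", "BDO1234")) # True
```

In SQL the equivalent is to apply the SUBSTRING/PATINDEX expression to both sides of the join or WHERE condition.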
You need to create a function
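CREATE FUNCTION CompareSerialNumbers(@SerialA varchar(max), @SerialB varchar(max))
RETURNS bit
AS
BEGIN
  DECLARE @ReturnValue AS bit
  -- Note: ISNUMERIC also returns 1 for values such as '1e4' or '$',
  -- which would make the CAST to int fail
  IF (ISNUMERIC(@SerialA) = 1 AND ISNUMERIC(@SerialB) = 1)
    -- both numeric: compare as integers, so leading zeroes are ignored
    SELECT @ReturnValue =
      CASE
        WHEN CAST(@SerialA AS int) = CAST(@SerialB AS int) THEN 1
        ELSE 0
      END
  ELSE
    -- otherwise compare as plain strings
    SELECT @ReturnValue =
      CASE
        WHEN @SerialA = @SerialB THEN 1
        ELSE 0
      END
  RETURN @ReturnValue
END;
GO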
If both are numeric, it compares them as integers; otherwise it compares them as strings.

Get substring into a new column

I have a table that contains a column with data in the following format. Let's call the column "title" and the table "s".
title
ab.123
ab.321
cde.456
cde.654
fghi.789
fghi.987
I am trying to get a unique list of the characters that come before the "." so that I end up with this:
ab
cde
fghi
I have tried selecting the initial column into a table then trying to do an update to create a new column that is the position of the dot using "ss".
something like this:
t: select title from s
update thedot: (title ss `.)[0] from t
I was then going to add a third column holding the first N characters of "title", where N is the value stored in the "thedot" column.
All I get when I try the update is a "type" error.
Any ideas? I am very new to kdb, so no doubt I'm doing something simple in a very silly way.
The reason you get the type error is that ss only works on the string type, not on symbols. Also, ss is not a vector-based function, so you need to combine it with each ('):
q)update thedot:string[title] ss' "." from t
title thedot
---------------
ab.123 2
ab.321 2
cde.456 3
cde.654 3
fghi.789 4
There are a few ways to solve your problem:
q)select distinct(`$"." vs' string title)[;0] from t
x
----
ab
cde
fghi
q)select distinct(` vs' title)[;0] from t
x
----
ab
cde
fghi
You can read here for more info: http://code.kx.com/q/ref/casting/#vs
An alternative is to make use of the 0: operator, to parse around the "." delimiter. This operator is especially useful if you have a fixed number of 'columns' like in a csv file. In this case where there is a fixed number of columns and we only want the first, a list of distinct characters before the "." can be returned with:
exec distinct raze("S ";".")0:string title from t
`ab`cde`fghi
OR:
distinct raze("S ";".")0:string t`title
`ab`cde`fghi
Where "S " defines the types of each column and "." is the record delimiter. For records with differing number of columns it would be better to use the vs operator.
A variation of WooiKent's answer using each-right (/:) :
q)exec distinct (` vs/:x)[;0] from t
`ab`cde`fghi
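For readers less familiar with q, the split-then-distinct idea behind these answers can be sketched in Python. This is only an illustration of the logic, not a q replacement; the function name distinct_prefixes is made up.

```python
def distinct_prefixes(titles: list[str]) -> list[str]:
    """Return the distinct parts before the first '.', in first-seen order."""
    # dict.fromkeys keeps insertion order and drops duplicates
    return list(dict.fromkeys(t.split(".", 1)[0] for t in titles))

titles = ["ab.123", "ab.321", "cde.456", "cde.654", "fghi.789", "fghi.987"]
print(distinct_prefixes(titles))  # ['ab', 'cde', 'fghi']
```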

PostgreSQL and word games

In a word game similar to Ruzzle or Letterpress, where users have to construct words out of a given set of letters:
I keep my dictionary in a simple SQL table:
create table good_words (
word varchar(16) primary key
);
Since the game duration is very short I do not want to check every entered word by calling a PHP script, which would look that word up in the good_words table.
Instead I'd like to download all possible words by one PHP script call before the round starts - since all letters are known.
My question is: is there a nice SQLish way to find such words?
I.e. I could run a longer-taking script once to add a column to the good_words table, which would have the same letters as in the word column, but sorted alphabetically... But I still can't think of a way to match against it given a set of letters.
And doing the word matching inside of a PHP script (vs. inside the database) would probably take too long (because of bandwidth: would have to fetch every row from the database to the PHP script).
Any suggestions or insights please?
Using postgresql-8.4.13 with CentOS Linux 6.3.
UPDATE:
Other ideas I have:
Create a constantly running script (cronjob or daemon) which would prefill an SQL table with precompiled letters board and possible words - but still feels like a waste of bandwidth and CPU, I would prefer to solve this inside the database
Add integer columns a, b, ... , z and whenever I store a word into good_words, store the letter occurrences there. I wonder if it is possible to create an insert trigger in PL/pgSQL for that?
Nice question, I upvoted.
What you're up to is a list of all possible permutations of the given letters of a given length. As described in the PostgreSQL wiki, you can create a function and call it like this (matches highlighted letters in your screenshot):
SELECT * FROM permute('{E,R,O,M}'::text[]);
Now, to query the good_words use something like:
SELECT gw.word, gw.stamp
FROM good_words gw
JOIN permute('{E,R,O,M}'::text[]) s(w) ON gw.word=array_to_string(s.w, '');
This could be a start, except that it doesn't check whether we have enough letters, only whether we have the right letters.
SELECT word from
(select word,generate_series(0,length(word)) as s from good_words) as q
WHERE substring(word,s,1) IN ('t','h','e','l','e','t','t','e','r','s')
GROUP BY word
HAVING count(*)>=length(word);
http://sqlfiddle.com/#!1/2e3a2/3
EDIT:
This query selects only the valid words, though it seems a bit redundant. It's not perfect but certainly proves it can be done.
WITH words AS
(SELECT word, substring(word,s,1) as sub from
(select word,generate_series(1,length(word)) as s from good_words) as q
WHERE substring(word,s,1) IN ('t','e','s','e','r','e','r','o','r','e','m','a','s','d','s','s'))
SELECT w.word FROM
(
SELECT word,words.sub,count(DISTINCT s) as cnt FROM
(SELECT s, substring(array_to_string(l, ''),s,1) as sub FROM
(SELECT l, generate_subscripts(l,1) as s FROM
(SELECT ARRAY['t','e','s','e','r','e','r','o','r','e','m','a','s','d','s','s'] as l)
as q)
as q) as let JOIN
words ON let.sub=words.sub
GROUP BY words.word,words.sub) as let
JOIN
(select word,sub,count(*) as cnt from words
GROUP BY word, sub)
as w ON let.word=w.word AND let.sub=w.sub AND let.cnt>=w.cnt
GROUP BY w.word
HAVING sum(w.cnt)=length(w.word);
Fiddle with all possible 3+ letters words (485) for that image: http://sqlfiddle.com/#!1/2fc66/1
Fiddle with 699 words out of which 485 are correct: http://sqlfiddle.com/#!1/4f42e/1
Edit 2:
We can use array operators like so to get a list of words that contain the letters we want:
SELECT word as sub from
(select word,generate_series(1,length(word)) as s from good_words) as q
GROUP BY word
HAVING array_agg(substring(word,s,1)) <@ ARRAY['t','e','s','e','r','e','r','o','r','e','m','a','s','d','s','s'];
So we can use it to narrow down the list of words we need to check.
WITH words AS
(SELECT word, substring(word,s,1) as sub from
(select word,generate_series(1,length(word)) as s from
(
SELECT word from
(select word,generate_series(1,length(word)) as s from good_words) as q
GROUP BY word
HAVING array_agg(substring(word,s,1)) <@ ARRAY['t','e','s','e','r','e','r','o','r','e','m','a','s','d','s','s']
)as q) as q)
SELECT DISTINCT w.word FROM
(
SELECT word,words.sub,count(DISTINCT s) as cnt FROM
(SELECT s, substring(array_to_string(l, ''),s,1) as sub FROM
(SELECT l, generate_subscripts(l,1) as s FROM
(SELECT ARRAY['t','e','s','e','r','e','r','o','r','e','m','a','s','d','s','s'] as l)
as q)
as q) as let JOIN
words ON let.sub=words.sub
GROUP BY words.word,words.sub) as let
JOIN
(select word,sub,count(*) as cnt from words
GROUP BY word, sub)
as w ON let.word=w.word AND let.sub=w.sub AND let.cnt>=w.cnt
GROUP BY w.word
HAVING sum(w.cnt)=length(w.word) ORDER BY w.word;
http://sqlfiddle.com/#!1/4f42e/44
We can use GIN indexes to work on arrays so we probably could create a table that would store the arrays of letters and make words point to it (act, cat and tact would all point to array [a,c,t]) so probably that would speed things up but that's up for testing.
Create a table that has entries (id, char), and let n be the number of characters you are querying for.
select id, count(char) AS count from chartable where (char = x or char = y or char = z ...) group by id having count(char) = n;
OR (for partial matching)
select id, count(char) AS count from chartable where (char = x or char = y or char = z ...) group by id order by count;
The result of that query has all the word ids that fit the specifications. Cache the result in a HashSet and simply do a lookup whenever a word is entered.
You can add a column with the sorted letters formatted like '%a%c%t%'. Then use the query:
select * from table where 'abcttx' like sorted_letters
to find words that can be built from letters 'abcttx'. I don't know about performance, but simplicity probably can't be beaten :)
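The LIKE trick works because, once both sides are sorted, a word can be built from the letters exactly when its sorted letters form a subsequence of the sorted letter pool. A minimal Python sketch of that check follows; the function names like_pattern and can_build are hypothetical, and it assumes the available letters are sorted before matching, as 'abcttx' already is.

```python
def like_pattern(word: str) -> str:
    """Build the value for the sorted-letters column, e.g. 'cat' -> '%a%c%t%'."""
    return "%" + "%".join(sorted(word)) + "%"

def can_build(word: str, letters: str) -> bool:
    """True if the sorted word is a subsequence of the sorted letters."""
    pool = iter(sorted(letters))
    # 'ch in pool' advances the iterator until it finds ch, so each
    # available letter can be consumed at most once
    return all(ch in pool for ch in sorted(word))

print(like_pattern("cat"))          # %a%c%t%
print(can_build("tact", "abcttx"))  # True
print(can_build("tattoo", "abcttx"))  # False
```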
Here is a query that finds the answers that can be found by walking through adjacent fields.
with recursive
input as (select '{{"t","e","s","e"},{"r","e","r","o"},{"r","e","m","a"},{"s","d","s","s"}}'::text[] as inp),
dxdy as(select * from (values(-1,-1),(-1,0),(-1,1),(0,1),(0,-1),(1,-1),(1,0),(1,1)) as v(dx, dy)),
start_position as(select * from generate_series(1,4) x, generate_series(1,4) y),
work as(select x,y,inp[y][x] as word from start_position, input
union
select w.x + dx, w.y + dy, w.word || inp[w.y+dy][w.x+dx]
from dxdy cross join input cross join work w
inner join good_words gw on gw.word like w.word || '%'
)
select distinct word from work
where exists(select * from good_words gw where gw.word = work.word)
Other answers don't take adjacency into account.
SQL Fiddle link: http://sqlfiddle.com/#!1/013cc/14 (note that you need an index with varchar_pattern_ops for the query to be reasonably fast).
Does not work in 8.4. Probably 9.1+ only. SQL Fiddle
select word
from (
select unnest(string_to_array(word, null)) c, word from good_words
intersect all
select unnest(string_to_array('TESTREROREMASDSS', null)) c, word from good_words
) s
group by word
having
array_agg(c order by c) =
(select array_agg(c order by c) from unnest(string_to_array(word, null)) a(c))
My own solution is to create an insert trigger, which writes letter frequencies into an array column:
create table good_words (
word varchar(16) primary key,
letters integer[26]
);
create or replace function count_letters() returns trigger as $body$
declare
alphabet varchar[];
i integer;
begin
alphabet := regexp_split_to_array('abcdefghijklmnopqrstuvwxyz', '');
new.word := lower(new.word);
for i in 1 .. array_length(alphabet, 1)
loop
-- raise notice '%: %', i, alphabet[i];
new.letters[i] := length(new.word) - length(replace(new.word, alphabet[i], ''));
end loop;
return new;
end;
$body$ language plpgsql;
create trigger count_letters
before insert on good_words
for each row execute procedure count_letters();
Then I generate similar array for the random board string tesereroremasdss
and compare both arrays using the array contains operator @>
Any new ideas or improvements are always welcome!
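As a rough illustration of the letter-frequency idea, here is a Python sketch that builds the same 26-element count array for a word and for the board string and checks that the board has at least as many of each letter. The function names are made up, and the element-wise comparison is an assumption about the intended check rather than a description of what a particular array operator does.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def letter_counts(text: str) -> list[int]:
    """26 integers: how often each letter a..z occurs in text."""
    text = text.lower()
    return [text.count(c) for c in ALPHABET]

def fits_board(word: str, board: str) -> bool:
    """True if the board has at least as many of every letter as the word needs."""
    return all(w <= b for w, b in zip(letter_counts(word), letter_counts(board)))

board = "tesereroremasdss"
print(fits_board("dress", board))  # True
print(fits_board("otter", board))  # False: the board has only one 't'
```

Characters outside a..z are ignored by this sketch, so it assumes the dictionary contains plain lowercase words only.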