Replacing a phrase with a leading space in T-SQL - but it's also replacing the phrases without the leading space - tsql

I've run into an interesting problem I'm hoping someone can shed some light on.
I'm trying to pull a unique list of names from an MS SQL Database - but the company has been sloppy with their names. They were tacking on a code to the end of last name for some users. I need to remove that code.
Example:
firstname lastname
John Doe
Mary Smith AST
Mike Jackson AST
Brian Astor
Jackie Masterson
In the example, "AST" is the code they tack on. It's not tacked on to all last names either. I need to get an output of just the last names without the code.
I would have expected this to be a simple use of REPLACE. I tried:
select REPLACE(lastname, ' AST', '') from table
Note the leading space in the quotes for the search phrase... this does work to remove the "AST" appended to the last names.
However - my problem is that it will also remove anywhere AST appears at the BEGINNING of the field. So Brian Astor comes out as "Brian or" since the field started with AST. However... it correctly does not remove ast from the middle, so Jackie Masterson is fine.
Any ideas why it is ignoring the leading space in my search phrase for the beginning of the field? I've tried ltrim to eliminate the possibility the field has leading spaces.
Thanks!

REPLACE with an empty string removes the search string wherever it occurs in your source string, not only at the end. So the behaviour is as expected.
If you only need to remove ' AST' at the end of your source string, try something like this:
select replace(replace(lastname + '$$$', ' AST$$$', ''), '$$$', '') from table
The outer replace strips the $$$ marker from names that did not end in ' AST'. Of course, you need to be sure that the appended $$$ doesn't appear by chance in your source string (lastname), which I guess is not that likely.
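To make the end-marker idea concrete, here is a minimal Python sketch of the same logic. The function name strip_trailing_code and the sample names are hypothetical, and Python's str.replace is case-sensitive, whereas REPLACE in SQL Server follows the collation of its input, so this models the technique rather than T-SQL itself.

```python
SENTINEL = "$$$"  # assumed never to occur in real data


def strip_trailing_code(name: str, code: str = " AST") -> str:
    """Remove `code` only when it is the very end of `name`."""
    marked = name + SENTINEL                      # 'Smith AST' -> 'Smith AST$$$'
    marked = marked.replace(code + SENTINEL, "")  # can only match at the end
    return marked.replace(SENTINEL, "")           # drop marker from unmatched names


for name in ["Doe", "Smith AST", " ASTOR", "MASTERSON"]:
    plain = name.replace(" AST", "")              # replaces anywhere in the string
    print(repr(name), "plain:", repr(plain), "end-only:", repr(strip_trailing_code(name)))
```

The plain replace mangles ' ASTOR' because the search string occurs at the start, while the end-only version leaves it alone.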

Related

I can't understand the behaviour of btrim()

I'm currently working with PostgreSQL and learned about the btrim function. I checked many websites for an explanation, but I don't really understand it.
Here they mention this example:
btrim('xyxtrimyyx', 'xyz')
It gives trim.
When I try this example:
btrim('xyxtrimyyx', 'yzz')
or
btrim('xyxtrimyyx', 'y')
I get this: xyxtrimyyx
I don't understand this. Why didn't it remove the y?
From the docs you point to, the definition says:
Remove the longest string consisting only of characters in characters
(a space by default) from the start and end of string
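Your examples don't work because the function only strips characters from the two ends of the string, and only while the character at that end is one of the characters specified. 'xyxtrimyyx' starts and ends with x, which is not in 'yzz' or 'y', so stripping stops immediately at both ends and nothing is removed.
Let's take a look at the first example (from the docs):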
btrim('xyxtrimyyx', 'xyz')
This returns trim, because it goes through xyxtrimyyx and gets up to the t and doesn't see that letter in xyz, so that is where the function stops stripping from the front.
We are now left with trimyyx
Now we do the same, but from the end of the string.
While one of xyz is the last letter, remove that letter.
We do this until m, so we are left with trim.
Note: I have never worked with any form of SQL. I could be wrong about the exact way that PostgreSQL does this, but I am fairly certain from the docs that this is how it is done.
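The walk-through above can be sketched in a few lines of Python. This is a hand-written model of the documented behaviour, not PostgreSQL's implementation; the function is named btrim here only to mirror the SQL function.

```python
def btrim(text: str, characters: str = " ") -> str:
    """Strip from both ends while the end character is in `characters`."""
    allowed = set(characters)
    start, end = 0, len(text)
    # Advance from the front until a character outside the set is found.
    while start < end and text[start] in allowed:
        start += 1
    # Do the same from the back.
    while end > start and text[end - 1] in allowed:
        end -= 1
    return text[start:end]


print(btrim("xyxtrimyyx", "xyz"))  # stops at 't' and at 'm'
print(btrim("xyxtrimyyx", "y"))    # 'x' at both ends is not in the set
```

Python's built-in str.strip(chars) follows the same rule, so "xyxtrimyyx".strip("xyz") gives the same result as the first call.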

PostgreSQL Trimming Leading and Trailing Characters: = and "

I'm working to build an import tool that utilizes a quoted CSV file. However, several of the fields in the CSV file are reported as such:
"=""38000"""
Where 38000 is the data I need. The data integration software I use (Talend 6.11) already strips the leading and trailing double quotes for me (so, "38000" becomes 38000), but I can't find a way to get rid of those others.
So, essentially, I need "=""38000""" to become "38000" where the leading "=" is removed and the trailing "" is removed.
Is there a TRIM function that can accomplish this for me? Perhaps there is a method in Talend that can do this?
As the other answer stated, you could do that operation in SQL. Or, you could do it in Java, Groovy, etc, within Talend. However, if there is an existing Talend component which does the job, my preference is to use it. That leads to faster development, potentially less testing, and easier maintenance. Having said that, it is important to review all the components which are available, so you know what's available to you.
You can use the Talend component tReplace to inspect each of the input columns you want to trim of quotes and equals signs. A single tReplace component can do search-and-replace operations on multiple input columns. If all of the replacements are related to each other, I would keep them within a single tReplace. When it gets to the point of doing unrelated replacements, I might place those within a new tReplace so that logical operations are organized and grouped together.
tReplace
For a given Input Column
search for "=", replace with ""
search for "\"", replace with ""
Something like this:
SELECT format( '"%s"', trim( both '"=' from '"=""38000"""' ) );
-[ RECORD 1 ]---
format | "38000"
1st: the trim() function removes all leading and trailing " and = characters. The result is simply 38000.
2nd: format() then adds the double quotes back to get the desired end result.
Alternatively, you can use regular expressions and other Postgres string functions.
See more:
https://www.postgresql.org/docs/current/static/functions-string.html
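The two suggestions in this thread differ in one respect worth seeing side by side: a search-and-replace removes the characters everywhere, while trim removes them only at the ends. The Python sketch below models both; the second sample value is hypothetical and is there only to show where the results diverge.

```python
raw = '"=""38000"""'

# Search-and-replace style: remove every '=' and '"' anywhere in the value.
replaced = raw.replace("=", "").replace('"', "")

# trim(both ...) style: remove '=' and '"' only at the two ends.
trimmed = raw.strip('"=')

# Add one pair of double quotes back, as format('"%s"', ...) does.
requoted = f'"{trimmed}"'
print(replaced, trimmed, requoted)

# Hypothetical value with '=' in the middle: the two approaches differ.
other = '"=""a=b"""'
print(other.replace("=", "").replace('"', ""), other.strip('"='))
```

If the data can legitimately contain = or " in the middle of a value, the trim approach is the safer of the two.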

Postgres import a double quote value

I have a large .csv file with 9 million rows that I would like to import into the database. Some of its columns contain text with quotes or other special characters in them. For example, I would like to import this row:
ID BH Units Name Type_building Year_cons
1 4 900.00 schoolgebouw "De Bolster Schoolgebouw 2014-01-01
As you can see there is a double quote in the fourth column. None of the values in the .csv file are quoted, but sometimes a double quote or backslash '\' appears in the text. When I try to upload the data using:
\COPY <tablename> FROM <path to file> WITH CSV DELIMITER ';' NULL '\N';
It gives an error message: ERROR: value too long for type character varying(25).
Apparently it sees the double quote as the start of a quoted string and tries to combine everything after it in the .csv file (including the fifth and sixth columns) into a single cell (so that cell will contain 'De Bolster Schoolgebouw 2014-01-01'), which doesn't fit because the 'Name' column allows a maximum of 25 characters.
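As an analogy only, Python's csv module shows the same effect of an unclosed quote swallowing the delimiters that follow it. This is not Postgres's COPY parser, and the sample line is an assumption: it guesses where the ';' delimiters fall in the row above and puts the stray quote at the start of the fourth field.

```python
import csv

# Hypothetical line: ';' delimiters are assumed, with a stray quote
# opening the fourth field and never being closed.
line = '1;4;900.00;"De Bolster;Schoolgebouw;2014-01-01'

# Default quote handling: the stray quote opens a quoted field, so the
# delimiters after it are read as part of that field.
with_quotes = next(csv.reader([line], delimiter=";"))

# Quote handling disabled: the quote is kept as an ordinary character.
without_quotes = next(csv.reader([line], delimiter=";", quoting=csv.QUOTE_NONE))

print(len(with_quotes), repr(with_quotes[3]))
print(len(without_quotes), repr(without_quotes[3]))
```

With quoting enabled the fourth field absorbs the rest of the line and exceeds 25 characters; with quoting disabled the line splits into six fields and the quote stays in the data.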
I found a similar topic (Is it possible to turn off quote processing in the Postgres COPY command with CSV format?) in which this solution was presented:
\COPY <tablename> FROM <path to file> WITH CSV DELIMITER ';' QUOTE E'\b' NULL '\N';
I think what it does is sets the quote value (default is double quote) to something else, in this case a backspace, so it won't recognize a double quote as a quote anymore. However when I run this I get another error: INVALID input syntax for integer.
What has happened is that every value now is quoted, so ID with value '1' becomes value '"1"' and because ID is defined as an integer it won't accept quotes.
Do you have any idea how to import double quotes and other special characters from a .csv file into a postgres database?
Thanks in advance!!
Based on the error message, I'd be skeptical that it has anything to do with double quoting or anything of the sort. Had it been so, it would have been a widely reported bug and fixed ages ago.
When it comes to Postgres, the error messages are almost always correct and helpful. As such, consider the very real possibility that there are more characters than meets the eye.
My own guess is that you've some trailing (or leading) spaces in there somewhere, and as such have pieces of data that look 24 characters long when viewed in a spreadsheet while being, in fact, longer.
If you don't, my second guess would be some kind of bizarre character set conflict. Perhaps you have some double-byte characters, or two characters rendering as a single one due to a diacritic. These look fine in the viewer you're using for your data, but when they get interpreted as UTF-8 they end up counting as two distinct characters. Unlikely, in my opinion, but possible.
Lastly, and per Frank's suggestion, try removing the length constraint. As things stand it is only getting in your way: it adds a check to every insert and prevents you from moving forward. Once the import is done, you'll be able to find the offending rows using something like:
select name from table where length(name) > 25;
... and upon fixing them, you'll be able to re-add your constraint if it serves any purpose. (Hint: it doesn't, or at the very least shouldn't have. There's a real person out there whose name is: "Kim-Jong Sexy Glorious Beast Divine Dick Father Lovely Iron Man Even Unique Poh Un Winn Charlie Ghora Khaos Mehan Hansa Kimmy Humbero Uno Master Over Dance Shake Bouti Bepop Rocksteady Shredder Kung Ulf Road House Gilgamesh Flap Guy Theo Arse Hole Im Yoda Funky Boy Slam Duck Chuck Jorma Jukka Pekka Ryan Super Air Ooy Rusell Salvador Alfons Molgan Akta Papa Long Nameh Ek.")
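If you would rather check the file before importing, the same idea can be sketched in Python. The sample values and the helper name too_long are hypothetical; the point is that repr() makes trailing spaces visible and that a letter followed by a combining diacritic counts as two characters.

```python
import unicodedata

LIMIT = 25  # length of the varchar column in the question


def too_long(values, limit=LIMIT):
    """Return (value, length) pairs whose character count exceeds `limit`."""
    return [(v, len(v)) for v in values if len(v) > limit]


names = [
    "De Bolster",
    "Basisschool De Regenboog  ",  # hypothetical: two trailing spaces
]
for value, length in too_long(names):
    print(repr(value), length)     # repr() shows the trailing spaces

decomposed = "e\u0301"             # 'e' followed by a combining acute accent
print(len(decomposed), len(unicodedata.normalize("NFC", decomposed)))
```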

Strip excess padding from a string

I asked a question earlier today and got a really quick answer from llbrink. I really should have asked that question before I spent several hours trying to find an answer.
So - here's another question that I have never found an answer for (although I have created a workaround which seems very kludgy).
My AHK program asks the user for a login name. The program then compares the login name with an existing list of names in a file.
The login name in the file may contain spaces, but there are never spaces at the beginning of the name. When the user enters the name, he may include spaces at the beginning. This means that when my program compares the name with those in the file, it can not find a match (because of the extra spaces).
I want to find a way of stripping the spaces from the beginning of the input.
My workaround has been to split the input string into an array (which does ignore leading spaces) and then use the first element of the array. This is my code:
name := DoStrip(name)
DoStrip(xyz) ; strip leading and trailing spaces from string
{
StringSplit, out, xyz, `,, %A_Space%
Return out1
}
This seems to be a very laboured way to do it - is there a better way ?
I don't see a problem with your example if it works on all cases.
There is a much simpler way: just use AutoTrim, which works like this.
AutoTrim, On ; not required it is on by default
my_variable = %my_variable%
There are also many other ways to trim strings in AutoHotkey, which you can combine into something useful.
You can also use the built-in LTrim() and RTrim() functions (or Trim() for both ends), available in AutoHotkey v1.1 and later, to remove whitespace at the beginning and at the end of a string.
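For readers who want the logic without the AHK syntax, here is a small Python sketch of the trim-then-compare step. The list of names and the function find_login are hypothetical stand-ins for the file lookup in the question; only spaces and tabs at the two ends are removed, so spaces inside a name are preserved.

```python
known_names = ["john smith", "mary jones"]  # hypothetical file contents


def find_login(entered: str, names=known_names):
    """Return the matching stored name, or None if there is none."""
    cleaned = entered.strip(" \t")  # drop leading/trailing spaces and tabs only
    return cleaned if cleaned in names else None


print(find_login("   john smith"))  # leading spaces no longer block the match
print(find_login("john  smith"))    # interior spacing is left untouched
```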

crystal reports : substring error

Since Crystal Reports doesn't seem to have a substring function, I've developed a workaround with the following formula:
right({_v_hardware.groupname},
truncate(instr(replace({_v_hardware.groupname},".",
","), ","))
What I'm trying to do is search for the period (".") in a string and replace it with a comma. Then find the comma position in the string and print all characters following after the comma. This is assuming the string will only have 1 period in the entire string.
Now when I attempt to do this, I get some weird characters which look like wingdings. Any ideas?
thanks in advance.
I don't know the whole problem you are trying to solve, but for this question alone, the step of replacing the period with a comma seems unnecessary. Note that right() takes the number of characters to return, not a starting position, so the count has to be the string length minus the position of the period. If you know that there is only one period in the string and you only want the characters to the right of the period, then you should be able to do something like the following (this is @first_formula):
right({_v_hardware.groupname}, len({_v_hardware.groupname}) - instr({_v_hardware.groupname},"."))
If for some reason you want to show the comma, then I'd do that in a separate formula. If you need the entire string with the period replaced by a comma, then just do:
replace({_v_hardware.groupname},".",",")
And if you need the comma included in front of the extracted string, then it might just be easier to do something like:
"," + {@first_formula}
Hope this helps.
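For anyone comparing the two formulas, here is a rough Python model of the arithmetic. The helpers instr and right are hand-written stand-ins that imitate a 1-based InStr (0 when not found) and a Right that returns the rightmost N characters; the group name is a made-up example.

```python
def instr(text: str, sub: str) -> int:
    """1-based position of `sub` in `text`, or 0 when absent."""
    return text.find(sub) + 1


def right(text: str, count: int) -> str:
    """Rightmost `count` characters of `text`."""
    return text[-count:] if count > 0 else ""


def after_period(text: str) -> str:
    """Everything to the right of the first period."""
    return right(text, len(text) - instr(text, "."))


name = "server.group"                    # hypothetical group name
print(right(name, instr(name, ".")))     # position used as the count
print(after_period(name))                # length minus position used as the count
```

Using the position directly as the count only gives the right answer when the period happens to sit exactly in the middle of the string.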