How to get rid of some characters from a field of string in Hive? - hiveql

I want to get rid of some characters in a field of string type, for example by replacing punctuation with blank characters. How can I do this, given the list of characters to erase?

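Use regexp_replace with a character-class pattern such as '[_.,!?-]', listing the characters you want to get rid of inside the brackets (keep the - last so it is read as a literal hyphen, not a range). Every character matched by the class is replaced with a space:
select regexp_replace('test_string_with-punctuations,.!?','[_.,!?-]',' ');
Output (the trailing ,.!? become four trailing spaces):
test string with punctuations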
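Hive documents the pattern argument of regexp_replace as Java regular expression syntax, so one way to sanity-check a character class before putting it in a query is plain java.util.regex. This is a minimal sketch, not Hive's own code; the class and method names (HivePunctuation, blankOut) are hypothetical.

```java
import java.util.regex.Pattern;

// Mirrors regexp_replace(value, '[_.,!?-]', ' ') with java.util.regex.
final class HivePunctuation {
    // Character class listing every character to erase; '-' is last, so it is literal.
    private static final Pattern UNWANTED = Pattern.compile("[_.,!?-]");

    // Replaces each unwanted character with a single space.
    static String blankOut(String value) {
        return UNWANTED.matcher(value).replaceAll(" ");
    }

    public static void main(String[] args) {
        // Brackets make the trailing spaces visible in the printed result.
        System.out.println("[" + blankOut("test_string_with-punctuations,.!?") + "]");
    }
}
```

To delete the characters instead of blanking them, pass an empty replacement ("" in Java, '' in Hive).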

Related

Inserting hyphens into length limited String using regex

Within a Swift project I have some regex which at present ensures that an input can only be 10 characters long:
"^[\\da-zA-Z]{10,10}$"
I need to tweak this slightly so that the string it works on has the following format:
#####-####
i.e., a hyphen inserted after the fifth character.
So far I have tried combining what I have with some other regex, but this is incorrect and I can't figure out what I need to do differently to make it work:
"^[\\da-zA-Z]{10,10}$(.{5}),$1-$2"
If you have a string of 10 characters and you want to replace the sixth character (the one after the fifth), you could use 2 capturing groups.
Capture the first 5 characters in the first group, then match the sixth character, which you want to replace, and capture the last 4 in the second group.
^([\\da-zA-Z]{5})[\\da-zA-Z]([\\da-zA-Z]{4})$
In the replacement use $1-$2, which gives 10 characters in total, as in your desired pattern #####-####.
Note that {10,10} can be written as {10}.
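A minimal Java sketch of the same replacement, useful for checking the pattern outside Swift. It assumes NSRegularExpression treats these basic groups, classes and quantifiers the same way java.util.regex does, which I expect for this pattern but have not verified against ICU; ZipStyleFormat and format are hypothetical names.

```java
// Applies ^([\da-zA-Z]{5})[\da-zA-Z]([\da-zA-Z]{4})$ -> $1-$2 with java.util.regex.
final class ZipStyleFormat {
    // Group 1: first five alphanumerics; the sixth is matched but not captured;
    // group 2: the last four. Backslashes are doubled for the Java string literal.
    private static final String PATTERN = "^([\\da-zA-Z]{5})[\\da-zA-Z]([\\da-zA-Z]{4})$";

    // Inputs that are not exactly 10 alphanumerics do not match and come back unchanged.
    static String format(String input) {
        return input.replaceAll(PATTERN, "$1-$2");
    }

    public static void main(String[] args) {
        System.out.println(format("ABCDE0FGHI"));
    }
}
```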

Conditional replacement of a character

I would like to replace a character in a long string only if a special sequence is present in the input.
Example:
This string is a sample! I wrote it to describe my problem! I hope somebody can help me with this! I have the ID: 12345! That's all!
My desired output is:
This string is a sample. I wrote it to describe my problem. I hope somebody can help me with this. I have the ID: 12345. That's all.
Only when '12345' is present in the input string.
I tried (positive|negative) look(ahead|behind)
(?<!=12345)(!+(.*))+
That does not work, and neither do ?=, ?! and the like.
Is this possible with PCRE replacement in one step?
In general, this is possible with any regex flavor that supports the \G operator (start of string, or end of the previous match). Replace with $1 plus the desired text when searching with the following patterns:
(?:\G(?!^)|^(?=.*CHECKME))(.*?)REPLACEME <-- Replace REPLACEME if CHECKME is present
(?:\G(?!^)|^(?!.*CHECKME))(.*?)REPLACEME <-- Replace REPLACEME if CHECKME is absent
With Perl/PCRE/Onigmo, which support \K, you can replace with just the required text when searching with:
(?:\G(?!^)|^(?=.*CHECKME)).*?\KREPLACEME <-- Replace REPLACEME if CHECKME is present
(?:\G(?!^)|^(?!.*CHECKME)).*?\KREPLACEME <-- Replace REPLACEME if CHECKME is absent
In your case, since the text searched for is a single character, you may use a more efficient regex with just one .*:
(?:\G(?!^)|^(?=.*12345))[^!]*\K!
and replace with . (or with $1. if you use (?:\G(?!^)|^(?=.*12345))([^!]*)!).
If there can be line breaks in the string use (?s)(?:\G(?!^)|^(?=.*12345))[^!]*\K!.
Details
(?:\G(?!^)|^(?=.*12345)) - either the end of the previous match (\G(?!^)) or (|) the start-of-string position, provided 12345 occurs somewhere after it (^(?=.*12345); .* matches any 0+ chars, as many as possible, and backtracks to the last occurrence of 12345)
[^!]* - 0 or more chars other than !
\K - match reset operator that discards all text matched so far in the match memory buffer
! - a ! char.
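java.util.regex supports \G but not \K, so a Java version has to use the capture-group form of the pattern, (?:\G(?!^)|^(?=.*12345))([^!]*)! replaced with $1. A minimal sketch; ConditionalReplace and replaceBangs are hypothetical names.

```java
import java.util.regex.Pattern;

final class ConditionalReplace {
    // \G(?!^) continues right after the previous match; ^(?=.*12345) can start the
    // chain only if 12345 occurs in the input. Java has no \K, so the text before
    // each '!' is captured in group 1 and written back via $1.
    private static final Pattern BANG_IF_ID =
            Pattern.compile("(?:\\G(?!^)|^(?=.*12345))([^!]*)!");

    // Turns every '!' into '.', but only when the input contains 12345.
    static String replaceBangs(String input) {
        return BANG_IF_ID.matcher(input).replaceAll("$1.");
    }

    public static void main(String[] args) {
        System.out.println(replaceBangs("I have the ID: 12345! That's all!"));
        System.out.println(replaceBangs("No ID here! Still none!"));
    }
}
```

Because the chain can only begin at ^ when 12345 is present, input without the ID produces no match at all and is returned unchanged.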

Scala: replace with $

I want to replace a word with text containing a literal $. I tried:
var s = string.replaceAll("Register","$10")
I want the text "Register saved" to be changed to "$10 saved".
Illegal group reference is the error I get.
If you look at the Scaladoc for replaceAll, you'll see that it takes a regular expression string as its pattern parameter, and the replacement string is not taken literally either. Escape the $ with a \ (written "\\$10" in a string literal), or use replaceAllLiterally.
replaceAll uses a regular expression to find the match. In the replacement string, $ is a special character that refers to a specific capture group in the match. You have no capture groups, so this is an error. It's not what you want anyway, since you want the literal text "$10".
Use replace instead of replaceAll. It just does a direct (literal) string replacement of every occurrence.
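Scala's String is java.lang.String, so the same calls are available from Scala. A minimal Java sketch of the three options: plain replace, an escaped \$ in the replacement, and Matcher.quoteReplacement, which escapes $ and \ for you. LiteralDollar and its method names are hypothetical.

```java
import java.util.regex.Matcher;

final class LiteralDollar {
    // String.replace treats both arguments literally and replaces every occurrence.
    static String viaReplace(String s) {
        return s.replace("Register", "$10");
    }

    // In a replaceAll replacement, \$ is a literal dollar sign ("\\$" in Java source).
    static String viaEscapedDollar(String s) {
        return s.replaceAll("Register", "\\$10");
    }

    // quoteReplacement escapes $ and \ so the replacement text is used verbatim.
    static String viaQuoteReplacement(String s) {
        return s.replaceAll("Register", Matcher.quoteReplacement("$10"));
    }

    public static void main(String[] args) {
        System.out.println(viaReplace("Register saved"));
    }
}
```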

Search a pattern in the first 1000 characters of a string

I want to display the first 1000 characters of a string, with literals replaced by a special symbol. I am using the PCRE library to replace the literals. After each replacement I check the length of the string, and if it is > 1000 I stop matching and display the string.
My problem is this: suppose I pass a 1 GB string that contains no literal at all; PCRE will then scan the entire string. I want to search for the pattern only in the first 1000 characters. Is there any way to do this?
Just cut a 1000-character head off your string and run the substitution on that, not on the whole text.
If you get fewer than 1000 characters after substitution, cut the next 1000 characters, run the substitution on them and concatenate the two results. Repeat in a loop until you have a 1000-character string or reach the end of the whole text.
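A minimal Java sketch of that loop, using java.util.regex instead of the PCRE C API from the question. headAfterSubstitution and its parameters (literal, symbol, limit) are hypothetical names. It assumes a literal never straddles a chunk boundary; a match split across two chunks would be missed, so in practice you may want to carry the tail of each chunk over into the next one.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ChunkedHead {
    // Returns up to `limit` characters of the substituted text while only ever
    // running the regex on `limit`-sized chunks, never on the whole input.
    static String headAfterSubstitution(String text, Pattern literal, String symbol, int limit) {
        StringBuilder out = new StringBuilder();
        String replacement = Matcher.quoteReplacement(symbol); // use the symbol verbatim
        int pos = 0;
        // Keep cutting chunks until enough output exists or the text is exhausted.
        while (out.length() < limit && pos < text.length()) {
            int end = Math.min(pos + limit, text.length());
            out.append(literal.matcher(text.substring(pos, end)).replaceAll(replacement));
            pos = end;
        }
        // The last chunk usually overshoots, so trim to at most `limit` characters.
        return out.length() > limit ? out.substring(0, limit) : out.toString();
    }

    public static void main(String[] args) {
        String huge = "a".repeat(5_000_000); // no literal anywhere (String.repeat needs Java 11+)
        System.out.println(headAfterSubstitution(huge, Pattern.compile("XX"), "#", 1000).length());
    }
}
```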

I want to find and replace ordered-list numbers in Word, changing the . after each number to a )

I have tried [0-9] and checked the Use wildcards box, but it replaces the individual numbers with the literal string [0-9]. How do I replace with the number it found plus a character?
Backreferences. Your unspecified environment may or may not support them, but if it does, you would:
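replace \([0-9][0-9]*\)\.
with \1) (the captured number followed by the character you want; [0-9][0-9]* requires at least one digit, and \. consumes the period being replaced)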
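Word's wildcard syntax differs from other regex flavors, so this is only an illustration of the backreference idea, written in Java, where the replacement refers to the first group as $1 rather than \1. The class name OrderedListMarkers is hypothetical, and the sketch assumes each list number sits at the start of a line.

```java
final class OrderedListMarkers {
    // (?m) lets ^ match at every line start; group 1 captures the number,
    // \\. matches the literal period, and $1) writes the number back followed by ')'.
    static String dotToParen(String text) {
        return text.replaceAll("(?m)^([0-9]+)\\.", "$1)");
    }

    public static void main(String[] args) {
        System.out.println(dotToParen("1. First\n2. Second\n10. Tenth"));
    }
}
```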