Removing a trailing space from a regex matched group - iPhone

I'm using the regular expression library icucore via RegKit on the iPhone to
replace a pattern in a large string.
The pattern I'm looking for looks something like this:
| hello world (P1)|
I'm matching this pattern with the following regular expression
\|((\w*|.| )+)\((\w\d+)\)\|
This transforms the input string into 3 groups when a match is found, of which group 1 (the string) and group 3 (the string in parentheses) are of interest to me.
I'm converting these formatted strings into HTML links, so the above would be transformed into
Hello world
My problem is the trailing space in the first group: when the link is highlighted and underlined, the underline extends beyond the printed characters.
While I know I could extract all the matches and process them manually, using the search-and-replace feature of the ICU library is a much cleaner solution, so I would rather not do that.
Many thanks as always

Would the following work as an alternate regular expression?
\|((\w*|.| )+)\s+\((\w\d+)\)\| Where inserting the extra \s+ pulls the space outside the 1st grouping.
Though, given your example & regex, I'm not sure why you don't just do:
\|(.+)\s+\((\w\d+)\)\|
Which will have the same effect. However, both your original regex and my simpler one would fail on:
| hello world (P1)| and on the same line | howdy world (P1)|
where it would roll it up into 1 match.

\|\s*([\w ,.-]+)\s+\((\w\d+)\)\|
will put the trailing space(s) outside the capturing group. This will of course only work if there always is a space. Can you guarantee that?
If not, use
\|\s*([\w ,.-]+(?<!\s))\s*\((\w\d+)\)\|
This uses a lookbehind assertion to make sure the capturing group ends in a non-space character.
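As a rough illustration of the lookbehind version, here is a minimal sketch using Python's re module rather than ICU (the two engines agree on this syntax as far as I know, but that is an assumption). The link template in REPLACEMENT is hypothetical, since the question does not show the real href format.

```python
import re

# Pattern from the answer above; the lookbehind (?<!\s) stops group 1
# from ending in a whitespace character.
PATTERN = re.compile(r"\|\s*([\w ,.-]+(?<!\s))\s*\((\w\d+)\)\|")

# Hypothetical link template: the actual href format is not given in the question.
REPLACEMENT = r'<a href="\2">\1</a>'


def linkify(text):
    """Replace every |text (ID)| marker with a link, dropping the trailing space."""
    return PATTERN.sub(REPLACEMENT, text)


print(linkify("| hello world (P1)| and | howdy world (P2)|"))
# <a href="P1">hello world</a> and <a href="P2">howdy world</a>
```

Because the character class excludes |, two markers on the same line stay separate matches instead of being rolled into one.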


PostgreSQL Trimming Leading and Trailing Characters: = and "

I'm working to build an import tool that utilizes a quoted CSV file. However, several of the fields in the CSV file are reported as such:
"=""38000"""
Where 38000 is the data I need. The data integration software I use (Talend 6.11) already strips the leading and trailing double quotes for me (so, "38000" becomes 38000), but I can't find a way to get rid of those others.
So, essentially, I need "=""38000""" to become "38000" where the leading "=" is removed and the trailing "" is removed.
Is there a TRIM function that can accomplish this for me? Perhaps there is a method in Talend that can do this?
As the other answer stated, you could do that operation in SQL. Or, you could do it in Java, Groovy, etc, within Talend. However, if there is an existing Talend component which does the job, my preference is to use it. That leads to faster development, potentially less testing, and easier maintenance. Having said that, it is important to review all the components which are available, so you know what's available to you.
You can use the Talend component tReplace to inspect each of the input columns you want to trim of quotes and equal signs. A single tReplace component can do search and replace operations on multiple input columns. If all of the replacements are related to each other, I would keep them within a single tReplace. When it gets to the point of doing unrelated replacements, I might place those within a new tReplace so that logical operations are organized and grouped together.
tReplace
For a given Input Column
search for "=", replace with ""
search for "\"", replace with ""
Something like that:
SELECT format( '"%s"', trim( both '"=' from '"=""38000"""' ) );
-[ RECORD 1 ]---
format | "38000"
1st: the trim() function removes all leading and trailing " and = characters. The result is simply 38000
2nd: format() adds the double quotes back to get the desired end result
Alternatively, you can use regexp and other Postgres string functions.
See more:
https://www.postgresql.org/docs/current/static/functions-string.html
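If the cleanup ends up in a script rather than in SQL, the same character-set trimming can be sketched in Python. This is only an illustration of the idea, not Talend or Postgres code; clean_field is a made-up helper name.

```python
def clean_field(raw):
    """Strip leading/trailing '"' and '=' characters, then re-add one pair of quotes."""
    # str.strip with an argument removes any of the listed characters from
    # both ends, much like trim(both '"=' from ...) in the query above.
    inner = raw.strip('"=')
    return f'"{inner}"'


print(clean_field('"=""38000"""'))  # "38000"
```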

Perl Pattern Matching extracting inside brackets

I really need help with coming up with the pattern matching solution...
If the string is <6>[ 84.982642] Killing the process
How can I extract them into three separate strings...
I need one for 6, 84.982642, and Killing the process..
I've tried many things but these brackets and blank spaces are really confusing me and I keep getting the error message
"WARNING: Use of uninitialized value $bracket in pattern match..."
Is there any way I can somehow write it this way?
($num_1, $num_2, $name_process) = split(/[\-,. :;!?()[\]{}]+/);
Not sure how to extract these..
Help Please?
Thank you so much
Assuming the input is in $_
($num_1, $num_2, $name_process) = /^<(\d+)>\[([^\]]+)\]\s+(.*)$/;
This assumes the first token in the angle brackets is always a number. For a little more generality use
($num_1, $num_2, $name_process) = /^<([^>]+)>\[([^\]]+)\]\s+(.*)$/;
Explanation:
<([^>]+)> - a left angle-bracket followed by one or more characters that are not a right angle-bracket, followed by a right angle-bracket.
\[([^\]]+)\] - a left-bracket followed by one or more characters that are not a right bracket, followed by a right bracket
\s+(.*) - one or more spaces, then capture everything starting with the first non-blank after that.
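For comparison, here is the same pattern tried out in Python (a sketch, not the Perl from the answer; parse_line is a made-up name). Note that the second group captures everything inside the square brackets, including the leading space, so the sketch strips it.

```python
import re

# Same pattern as the more general Perl version above.
LINE_RE = re.compile(r"^<([^>]+)>\[([^\]]+)\]\s+(.*)$")


def parse_line(line):
    """Return (level, timestamp, message), or None if the line does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    level, timestamp, message = m.groups()
    # The bracket group is captured as ' 84.982642', so trim the padding.
    return level, timestamp.strip(), message


print(parse_line("<6>[ 84.982642] Killing the process"))
# ('6', '84.982642', 'Killing the process')
```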

Regex multiline matching

I was wondering if it's possible to make a regular expression to find all of the text that is in between the following two strings:
mutablePath = CGPathCreateMutable();
...
CGPathAddPath(skinMutablePath, NULL, mutablePath);
Basically, the first and last lines will always be the same, and there will be a whole bunch of random stuff in between. I'm using the find feature in Xcode and would like to count the number of lines that appear between all instances of the first and last line from above.
Is this even possible?
Xcode does not support multi-line regex matching. You'll have to search for your first and last line and count the lines in between by yourself.
Looks like you can use the DOTALL modifier,
I was able to find a block of code like yours with this regex:
(?s)mutablePath = CGPathCreateMutable\(\);.+CGPathAddPath\(skinMutablePath, NULL, mutablePath\);
More info is in the ICU regex documentation.
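If counting inside the editor is awkward, the same idea can be sketched outside Xcode with Python's re module. Two assumptions here: the match is made non-greedy (.*? instead of .+) so that several blocks in one file are not merged into a single match, and blank lines between the markers are not counted. lines_between is a made-up helper name.

```python
import re

START = "mutablePath = CGPathCreateMutable();"
END = "CGPathAddPath(skinMutablePath, NULL, mutablePath);"

# re.DOTALL plays the role of (?s): '.' also matches newlines.
BLOCK = re.compile(re.escape(START) + r"(.*?)" + re.escape(END), re.DOTALL)


def lines_between(source):
    """Return the number of non-blank lines inside each START...END block."""
    counts = []
    for m in BLOCK.finditer(source):
        inner_lines = [ln for ln in m.group(1).splitlines() if ln.strip()]
        counts.append(len(inner_lines))
    return counts
```

Calling lines_between on a file's contents returns one count per block, in the order the blocks appear.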

Sed for partial replacement?

Imagine I have a file that has the following type of line:
FIXED_DATA1 VARIABLE_DATA FIXED_DATA2
I want to change the fixed data and leave the variable data as is. For various reasons, using two sed operations to replace the fixed data will not work. For instance, the fixed fields might be double-quotes, and the line has other areas containing them, thus really the regex is written to match a pattern in the variable data and the fixed data.
If I'm bent on using sed, is there a way to change both fixed data fields at once while leaving the variable field unchanged?
Thanks.
You need to partition the line into the three pieces, replace the outer two and leave the middle alone:
sed 's/^FIX1 \(.*\) FIX2$/New \1 End/'
You can make the beginning and end matches more complex as needed.
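The same capture-and-reinsert idea, sketched in Python for anyone who wants to try the pattern outside sed. FIX1, FIX2, New and End are the placeholder strings from the sed command above; replace_fixed is a made-up name.

```python
import re


def replace_fixed(line):
    """Swap both fixed parts in one pass, keeping the captured middle as-is."""
    # (.*) captures the variable data; \1 puts it back unchanged.
    return re.sub(r"^FIX1 (.*) FIX2$", r"New \1 End", line)


print(replace_fixed('FIX1 some "quoted" data FIX2'))
# New some "quoted" data End
```

Lines that do not match the whole pattern are returned untouched, which is also what the sed command does.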

Regular Expression for number.(space), objective-c

I have an NSArray of lines (Objective-C, iPhone), and I'm trying to find the line which starts with a number, followed by a dot and a space, but can have any number of spaces (including none) before it, and have any text following it, e.g.:
1. random text
2. text random
3.
What regular expression would I use to get this? (I'm trying to learn it, and I needed the above expression anyway, so I thought I'd use it as an example)
With C#:
@"^ *[0-9]+\. "
It doesn't check for the presence of something after the ., so this is legal:
1.(space)
If you delete the @ and escape the \ it should work with other languages (it is pretty "down-to-earth" as RegExpes go)
I may suggest (Perl-compatible regexp):
^\s*\d+\.\s
At the beginning of a line:
Any number (0-n) of spaces
One or more digits
A dot
A space
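A quick way to check the pattern above against sample lines is a small Python sketch (Python instead of Objective-C, purely for illustration; numbered_lines is a made-up name):

```python
import re

# Optional leading whitespace, one or more digits, a dot, one whitespace character.
NUMBERED = re.compile(r"^\s*\d+\.\s")


def numbered_lines(lines):
    """Return only the lines that start with '<number>. '."""
    return [ln for ln in lines if NUMBERED.match(ln)]


samples = ["1. random text", "   2. text random", "3. ", "3.", "plain text", "4.no space"]
print(numbered_lines(samples))
# ['1. random text', '   2. text random', '3. ']
```

A bare "3." with nothing after the dot is rejected, because the pattern requires a whitespace character after it.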
Something like
^\s*\d+\.
But it depends on the language.
/^\s*[0-9]+\.\s+/
would be my guess providing you don't have any space before the number