This question already has answers here:
split() but keep delimiter
(2 answers)
Closed 11 months ago.
I want to split a multi-sentence paragraph into its constituent sentences while retaining the split characters, i.e. the '. ? !'. The code I'm using is:
my @Sentence = split(/[\.\?\!]/,$Paragraph);
Is there any way that I can save those sentence terminators?
Yes. If you put capturing parentheses around the delimiter pattern, the delimiters are included in the result list.
my @Sentence = split /([\.\?\!])/, $Paragraph;
For example, given the string foo.bar.baz, the split without parentheses returns qw(foo bar baz), while the split with parentheses returns qw(foo . bar . baz).
If you want to keep the delimiters attached to the sentences, you could use a lookbehind assertion:
my @Sentence = split /(?<=[\.\?\!])/, $Paragraph;
# result qw(foo. bar. baz)
If you want to strip unnecessary spaces after the match, you could use /(?<=[\.\?\!]) */.
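The three variants can be compared side by side. This is only a minimal sketch: the sample text and the variable names are made up for illustration, and the character class is written without the (harmless but unnecessary) backslashes.

```perl
use strict;
use warnings;

# Hypothetical sample text, not taken from the question.
my $paragraph = 'Is it late? Yes! Go home.';

# Capturing group: each terminator comes back as its own list element.
my @with_delims = split /([.?!])/, $paragraph;

# Lookbehind: split right after each terminator, so it stays attached.
my @attached = split /(?<=[.?!])/, $paragraph;

# Lookbehind plus optional spaces: the spaces between sentences are consumed.
my @trimmed = split /(?<=[.?!]) */, $paragraph;

print join('|', @with_delims), "\n";   # Is it late|?| Yes|!| Go home|.
print join('|', @attached),    "\n";   # Is it late?| Yes!| Go home.
print join('|', @trimmed),     "\n";   # Is it late?|Yes!|Go home.
```

Note that split drops trailing empty fields by default, which is why no empty element follows the final terminator.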
This question already has answers here:
How to use different separators (/ , |) in a regular expression
(2 answers)
Closed 1 year ago.
While trying to debug an issue I am having with git-svn and the --ignore-paths option, I've run into this Perl expression that I don't understand, and I haven't been able to find anything similar in the Perl documentation.
$path =~ m!$self->{ignore_regex}!
My understanding is that this matches the $path string against ignore_regex, but it doesn't seem to match anything the way I expect. The part that I don't understand is the m! ! around $self->{ignore_regex}.
How should I be reading this syntax?
This is the match operator.
m!...! is more commonly written as /.../, but they are 100% identical.
The perlop documentation explains:
If "/" is the delimiter then the initial m is optional. With the m you can use any pair of non-whitespace (ASCII) characters as delimiters. This is particularly useful for matching path names that contain "/", to avoid LTS (leaning toothpick syndrome). [...]
For example, the following are identical:
$path =~ /$self->{ignore_regex}/
$path =~ m/$self->{ignore_regex}/
$path =~ m^$self->{ignore_regex}^
$path =~ m{$self->{ignore_regex}}
The code in question checks if the string in $path matches the regex pattern in $self->{ignore_regex}.
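A short sketch of that equivalence follows. The hash, the path, and the pattern are hypothetical stand-ins, not git-svn's real data; the last two tests show why a non-slash delimiter is convenient when the pattern itself is a literal path.

```perl
use strict;
use warnings;

# Hypothetical object and path, for illustration only.
my $self = { ignore_regex => '^trunk/docs/' };
my $path = 'trunk/docs/readme.txt';

my @results = (
    ($path =~ /$self->{ignore_regex}/  ? 1 : 0),   # default delimiters
    ($path =~ m!$self->{ignore_regex}! ? 1 : 0),   # the form used in git-svn
    ($path =~ m{$self->{ignore_regex}} ? 1 : 0),   # bracketing delimiters
    ($path =~ m!^trunk/docs/!          ? 1 : 0),   # literal path, no escaping needed
    ($path =~ /^trunk\/docs\//         ? 1 : 0),   # same pattern with "leaning toothpicks"
);

print "@results\n";   # prints: 1 1 1 1 1
```

Slashes inside the interpolated variable never clash with the delimiter, because Perl finds the end of the pattern before it interpolates $self->{ignore_regex}.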
This question already has answers here:
Is it possible to escape regex metacharacters reliably with sed
(4 answers)
Closed 3 years ago.
Reading the sed manual, it seems that only \ and & need to be watched out for in the replacement part of sed's s command.
Are there any other characters that have a special meaning in the replacement, i.e. that affect the result?
Short Answer: You need to escape '\', '&', and newline.
Long Answer: The sed manual states:
To include a literal '\', '&', or newline in the final replacement,
be sure to precede the desired '\', '&', or newline in the REPLACEMENT
with a '\'.
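This implies that all other characters are literal in the REPLACEMENT string, with one exception: the delimiter of the s command (usually '/') must also be preceded by a '\' if it appears in the replacement.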
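As a minimal sketch of that rule, a small Perl helper can prepare an arbitrary literal string for use as a sed replacement. The function name escape_sed_replacement and its optional delimiter argument are hypothetical, and the helper also escapes the s-command delimiter, which has to be escaped whenever it occurs in the replacement.

```perl
use strict;
use warnings;

# Hypothetical helper: escape a literal string for the REPLACEMENT part of
# sed's s command. $delim is the s-command delimiter (assumed to be a single
# character, '/' by default).
sub escape_sed_replacement {
    my ($text, $delim) = @_;
    $delim = '/' unless defined $delim;
    # Backslash, ampersand, newline, and the delimiter each get a leading backslash.
    $text =~ s/([\\&\n]|\Q$delim\E)/\\$1/g;
    return $text;
}

my $escaped = escape_sed_replacement('a/b & c\d');
print "$escaped\n";   # prints: a\/b \& c\\d
```

The result can then be interpolated into a command such as s/PATTERN/$escaped/ without any character in it being interpreted specially.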
This question already has answers here:
What's the most robust way to efficiently parse CSV using awk?
(6 answers)
Closed 5 years ago.
I have a text file of comma-separated values in which some column values contain newline characters. These split a record across two lines, which causes data issues.
Sample data
"604","56-1203802","xx","VEN","null","50","1","20","N�
jTï"
"5526","841328305","yyINC","VEN","null","50","1","20","~R¿½K�ï
¿½ï¿½}("
"604","561203802","C","VEN",,"null","50","1","20","2ï½a��"
Expected Output
"604","56-1203802","xx","VEN","null","50","1","20","N�jTï"
"5526","841328305","yyINC","VEN","null","50","1","20","~R¿½K���}("
"604","561203802","C","VEN",,"null","50","1","20","2ï½a��"
I need to remove the newlines inside double-quoted strings.
I tried the below awk command to remove it, but it is not working as expected.
gawk -v RS='"' 'NR % 2 == 0 { gsub(/\n/, "") } { printf("%s%s", $0, RT) }' infile.txt > outfile.txt
The required result is the data with the LF and CR characters removed.
I tried the solutions posted for similar questions, but they are not working for me.
The newline characters in the file are not visible unless the text is copied into Notepad++, which shows them as CR LF.
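You can try this sed command (written for GNU sed), which keeps appending the next line to the current one until the line ends with a double quote: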
sed ':loop; /" *$/!{N;s/\n//g; b loop}' file
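The same idea can be sketched in Perl by counting quotes instead of looking at the end of the line. This is an illustration under stated assumptions, not a full CSV parser: the helper name join_quoted_lines is hypothetical, the sample records are shortened stand-ins for the question's data, and a literal quote inside a field is assumed to be written as "" so that a complete record always contains an even number of quotes.

```perl
use strict;
use warnings;

# Hypothetical helper: merge physical lines into logical CSV records.
sub join_quoted_lines {
    my (@lines) = @_;
    my @records;
    my $buffer = '';
    for my $line (@lines) {
        $line =~ s/\r?\n\z//;                  # drop the LF or CR LF line ending
        $buffer .= $line;
        my $quotes = () = $buffer =~ /"/g;     # number of quotes seen so far
        next if $quotes % 2;                   # odd count: still inside a quoted field
        push @records, $buffer;
        $buffer = '';
    }
    push @records, $buffer if length $buffer;  # keep an unterminated last record
    return @records;
}

# Shortened sample: the first record is broken across two physical lines.
my @records = join_quoted_lines(
    qq{"604","xx","N\r\n},
    qq{jT"\r\n},
    qq{"5526","yy","ok"\r\n},
);
print "$_\n" for @records;
```

Because the line ending is stripped with \r?\n, the sketch treats CR LF and plain LF input alike.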
This question already has answers here:
The "backspace" escape character '\b': unexpected behavior?
(5 answers)
Closed 7 years ago.
The \b (backspace) escape in Perl doesn't work when it is used at the end of a string.
For example, I have written this code:
print "Hello\n";
print "Hello\n";
print "\bHe\bllo\b";
It gives me this output:
Hello
Hello
Hllo
Shouldn't the trailing \b have deleted the final o? And shouldn't the leading \b have deleted the \n, moving the cursor back to the second line?
\b is a shorthand for \x08, so
print "a\b";
simply outputs the bytes
61 08
Most terminals interpret 08 as a request to move the cursor one position to the left. If you want to "erase" a character, you need to overwrite it with another.
print "a\b \b";
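To see what is actually sent to the terminal, the bytes of both strings can be dumped as hex. This is a minimal sketch; hex_bytes is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: show the raw bytes of a string as hex.
sub hex_bytes {
    my ($string) = @_;
    return join ' ', map { sprintf('%02x', ord($_)) } split(//, $string);
}

my $plain  = hex_bytes("a\b");      # backspace only: the cursor moves left
my $erased = hex_bytes("a\b \b");   # back, overwrite with a space, back again

print "$plain\n";    # prints: 61 08
print "$erased\n";   # prints: 61 08 20 08
```

The second sequence is what visually erases the a on most terminals: the space overwrites it and the final backspace puts the cursor back where the a used to be.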