lowercase everything except content between single quotes - perl

Is there a way in Perl to lowercase all text in an input line except the parts within single quotes (there could be more than one quoted part) using a regex? I have achieved this with the code below, but I would like to see whether it can be done with a regex and map.
while (<>) {
    my $m = 0;                  # 1 while we are inside a quoted part
    for (split(//)) {
        if (/'/ and ! $m) {
            $m = 1;
            print;
        }
        elsif (/'/ and $m) {
            $m = 0;
            print;
        }
        elsif ($m) {
            print;
        }
        else {
            print lc;
        }
    }
}
**Sample input:**
and (t.TARGET_TYPE='RAC_DATABASE' or (t.TARGET_TYPE='ORACLE_DATABASE' and t.TYPE_QUALIFIER3 != 'racinst'))
**Sample output:**
and (t.target_type='RAC_DATABASE' or (t.target_type='ORACLE_DATABASE' and t.type_qualifier3 != 'racinst'))

You can give this a shot. All one regexp.
$str =~ s/(?:^|'[^']*')\K[^']*/lc($&)/ge;
Or, cleaner and more documented (this is semantically equivalent to the above):
$str =~ s/
    (?:
        ^ |          # Match either the start of the string, or
        '[^']*'      # some text in quotes.
    )\K              # Then ignore that part,
                     # because we want to leave it be.
    [^']*            # Take the text after it, and
                     # lowercase it.
/lc($&)/gex;
The g flag tells the regexp to run as many times as necessary. e tells it that the substitution portion (lc($&), in our case) is Perl code, not just text. x lets us put those comments in there so that the regexp isn't total gibberish.
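As a rough sketch of how that substitution could be wrapped for reuse (the sub name lc_outside_quotes and the sample string are made up for illustration, not part of the answer above):

```perl
use strict;
use warnings;

# Hypothetical helper: lowercase everything outside single-quoted runs.
sub lc_outside_quotes {
    my ($str) = @_;
    # Because of \K, $& holds only the text after the quoted part (or after ^).
    $str =~ s/(?:^|'[^']*')\K[^']*/lc($&)/ge;
    return $str;
}

my $in = "and (t.TARGET_TYPE='RAC_DATABASE' or t.TYPE_QUALIFIER3 != 'racinst')";
print lc_outside_quotes($in), "\n";
# prints: and (t.target_type='RAC_DATABASE' or t.type_qualifier3 != 'racinst')
```

This assumes the quotes in the input are balanced; with a stray quote, the text after it is left untouched.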

Don't you play too hard with regexp for such a simple job?
Why not get the kid 'split' for it today?
#!/usr/bin/perl
while (<>)
{
    @F = split "'";
    @F = map { $_ % 2 ? $F[$_] : lc $F[$_] } (0..$#F);
    print join "'", @F;
}
The above is for understanding. We often join the latter two lines reasonably into:
print join "'", map { $_ % 2 ? $F[$_] : lc $F[$_] } (0..$#F);
Or enjoy more, making it a one-liner? (in bash shell) In concept, it looks like this (with -p, the joined result has to be assigned back to $_ to be printed):
perl -apF/'/ -e '$_ = join "'", map { $_ % 2 ? $F[$_] : lc $F[$_] } (0..$#F);' YOUR_FILE
In reality, however, we need to respect the shell and do some escape (hard) job:
perl -apF/\'/ -e '$_ = join "'"'"'", map { $_ % 2 ? $F[$_] : lc $F[$_] } (0..$#F);' YOUR_FILE
(The single-quoted single quote needs to become 5 characters: '"'"')
If it doesn't help your job, it helps sleep.

One more variant with Perl one-liner. I'm using hex \x27 for single quotes
$ cat sql_str.txt
and (t.TARGET_TYPE='RAC_DATABASE' or (t.TARGET_TYPE='ORACLE_DATABASE' and t.TYPE_QUALIFIER3 != 'racinst'))
$ perl -ne ' { @F=split(/\x27/); for my $val (0..$#F) { $F[$val]=lc($F[$val]) if $val%2==0 } ; print join("\x27",@F) } ' sql_str.txt
and (t.target_type='RAC_DATABASE' or (t.target_type='ORACLE_DATABASE' and t.type_qualifier3 != 'racinst'))
$
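The split-based answers can also be sketched as a small sub. The name lc_even_fields is hypothetical, and the -1 limit is an addition of this sketch: it keeps trailing empty fields, so a string that ends in a quote is joined back unchanged.

```perl
use strict;
use warnings;

# Hypothetical helper: even-indexed fields lie outside quotes, odd-indexed inside.
sub lc_even_fields {
    my ($line) = @_;
    # A limit of -1 stops split from dropping trailing empty fields.
    my @f = split /'/, $line, -1;
    return join "'", map { $_ % 2 ? $f[$_] : lc $f[$_] } 0 .. $#f;
}

print lc_even_fields("WHERE T.TYPE = 'RAC_DATABASE'"), "\n";
# prints: where t.type = 'RAC_DATABASE'
```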

Related

Concatenate with non-variable stuff in perl

I need to concatenate the result of map with a string.
perl -le 'print (map { (q(a)..q(z))[rand(26)] } 1..3) . "123"'
Expected 3 random symbols and 123. But there is no 123 just 3 random symbols.
In general, I need to add a variable there.
With warnings:
print (...) interpreted as function at -e line 1.
Useless use of concatenation (.) or string in void context at -e line 1.
This is because your code is of the following form:
print(...) . "123"
Solutions:
perl -le'print( map( { (q(a)..q(z))[rand(26)] } 1..3 ) . "123" )' # Fully parenthesized
perl -le'print map( { (q(a)..q(z))[rand(26)] } 1..3 ) . "123"' # Opt parens dropped
perl -le'print( ( map { (q(a)..q(z))[rand(26)] } 1..3 ) . "123" )'
perl -le'print +( map { (q(a)..q(z))[rand(26)] } 1..3 ) . "123"' # "Disambiguated"
Except those aren't right either. While they fix the problem you asked about, they reveal a second problem. They invariably print 3123 because map in scalar context returns the number of scalars it would otherwise return in list context.
Solutions:
perl -le'print( map( { (q(a)..q(z))[rand(26)] } 1..3 ), "123" )' # . => ,
perl -le'print map( { (q(a)..q(z))[rand(26)] } 1..3 ), "123"' # . => ,
perl -le'print( ( map { (q(a)..q(z))[rand(26)] } 1..3 ), "123" )' # . => ,
perl -le'print +( map { (q(a)..q(z))[rand(26)] } 1..3 ), "123"' # . => ,
perl -le'print join "", ( map { (q(a)..q(z))[rand(26)] } 1..3 ), "123"' # join
There are a couple of interesting things going on here. First let's ask Perl to help us track down any problems by turning on warnings.
$ perl -Mwarnings -le 'print (map { (q(a)..q(z))[rand(26)] } 1..3) . "123"'
print (...) interpreted as function at -e line 1.
Useless use of concatenation (.) or string in void context at -e line 1.
lfy
Two warnings there. Let's look at both of them.
print (...) interpreted as function
If the first non-whitespace character following print (or any other list operator) is an opening parenthesis, then Perl assumes that you want to call print as a function and it will look for the balancing closing parenthesis to end the list of arguments to print.
Useless use of concatenation (.) or string in void context
Because the print call is assumed to end with the closing parenthesis, the . "123" isn't doing anything useful and is therefore ignored.
The standard way to tell Perl that an opening parenthesis isn't marking a function call is to use a +.
$ perl -Mwarnings -le 'print +(map { (q(a)..q(z))[rand(26)] } 1..3) . "123"'
3123
Well, we lost the warnings. But we got '3' where we were hoping to see three symbols. What we have here now is basically this:
print +(map ...) . "123";
Because of the concatenation, map is being called in scalar context. And in scalar context, map no longer returns a list of values, but the size of that list (an integer - 3 in this case).
The fix for that is to replace the . with a comma, so map is called in list context.
$ perl -Mwarnings -le 'print +(map { (q(a)..q(z))[rand(26)] } 1..3), "123"'
ntg123
So you were being burnt by a) the parentheses not doing what you wanted them to do and b) map being called in scalar context.
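To see the context effect in isolation, here is a small deterministic sketch (uc on fixed letters instead of random ones; the variable names are illustrative only):

```perl
use strict;
use warnings;

my @letters = map { uc $_ } qw(a b c);              # list context: A, B, C
my $count   = map { uc $_ } qw(a b c);              # scalar context: 3
my $wrong   = ( map { uc $_ } qw(a b c) ) . "123";  # "." forces scalar context
my $right   = join( "", map { uc $_ } qw(a b c) ) . "123";

print "@letters\n";   # A B C
print "$count\n";     # 3
print "$wrong\n";     # 3123
print "$right\n";     # ABC123
```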
perl -le 'print join("",map { (q(a)..q(z))[rand(26)] } 1..3) . "123"'
Enclose the whole line to be printed inside parentheses and use a comma as separator:
perl -le 'print ( (map { (q(a)..q(z))[rand(26)] } 1..3) , "123")'

Perl: How to use command line special characters (newline, tab) in $LIST_SEPARATOR ($")

I would like to use the value of a variable (fixed by a command line option for instance) as a list separator, enabling that value to be a special character (newline, tabulation, etc.).
Unfortunately the naïve approach does not work, because the following two print statements behave differently:
my @tab = ("a","b","c");
# block 1 gives expected result:
# a
# b
# c
{
    local $" = "\n"; #" let us please the color syntax engine
    print "@tab";
}
# block 2 gives unwanted result:
# a\nb\nc
{
    use Getopt::Long;
    my $s;
    GetOptions('separator=s' => \$s);
    local $" = "$s"; #" let us please the color syntax engine
    print "@tab";
}
Any idea how I can correct block 2 so that I get the wanted result (the one produced by block 1)?
It actually does work the same if you assign the same string. Perl's
"\n"
creates a one character string consisting of a newline. With my shell (bash), you'd use
'
'
to do the same.
$ perl a.pl --separator='
'
a
b
ca
b
c
You didn't do this. You passed a string consisting of the two characters \ and n to Perl instead.
If you want your program to convert the two characters \n into a newline, you'll need to tell it to do so.
my @tab = qw( a b c );
sub handle_escapes {
    my ($s) = @_;
    $s =~ s/\\([\\a-z])/
        $1 eq '\\' ? '\\' :
        $1 eq 'n' ? "\n" :
        do { warn("Unrecognised escape \\$1"); "\\$1" }
    /seg;
    return $s;
}
{
    my $s = '\n'; #" let us please the color syntax engine
    local $" = handle_escapes($s);
    print "@tab";
}
{
    use Getopt::Long;
    my $s;
    GetOptions('separator=s' => \$s);
    local $" = handle_escapes($s); #" let us please the color syntax engine
    print "@tab";
}
$ perl a.pl --separator='\n'
a
b
ca
b
c
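Since the question also mentions tabs, one possible variation is a lookup table instead of a chain of conditions. This is only a sketch: the %escape table and the name expand_escapes are assumptions, and unlike handle_escapes it passes unknown escapes through silently instead of warning.

```perl
use strict;
use warnings;

# Hypothetical lookup table; extend it with whatever escapes you need.
my %escape = ( n => "\n", t => "\t", '\\' => '\\' );

sub expand_escapes {
    my ($s) = @_;
    # Replace a backslash plus one character when the table knows it;
    # otherwise leave the two characters untouched.
    $s =~ s/\\(.)/exists $escape{$1} ? $escape{$1} : "\\$1"/seg;
    return $s;
}

{
    my @tab = qw( a b c );
    local $" = expand_escapes('\t');
    print "@tab\n";   # a, b and c separated by tab characters
}
```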

Perl parsing - mixture of chars, tabs and spaces

I have the following types of line in my code:
MMAPI_CLOCK_OUTPUTS = 1, /*clock outputs system*/
MMAPI_SYSTEM_MANAGEMENT = 0, /*sys man system*/
I want to parse them to get:
'MMAPI_CLOCK_OUTPUTS'
'1'
'clock outputs system'
So I tried:
elsif($TheLine =~ /\s*(.*)s*=s*(.*),s*\/*(.*)*\//)
but this doesn't get the last string 'clock outputs system'
What should the parsing code actually be?
You should escape the slashes, stars and the s for spaces. Instead of writing /, * or s in your regex, write \/, \* and \s:
/\s*(.*)\s=\s*(.*),\s\/\*(.*)\*\//
if($TheLine =~ m%^(\S+)\s+=\s+(\d+),\s+/\*(.*)\*/%) {
print "$1 $2 $3\n"
}
This uses % as an alternative delimiter in order to avoid leaning toothpick syndrome when you escape the / characters.
Try this regex: /^\s*(.*?)\s*=\s*(\d+),\s*\/\*(.*?)\*\/$/
Here is an example in which you can test it:
#!/usr/bin/perl
use strict;
use warnings;
my $str = "MMAPI_CLOCK_OUTPUTS = 1, /*clock outputs system*/\n
MMAPI_SYSTEM_MANAGEMENT = 0, /*sys man system*/";
while ($str =~ /^\s*(.*?)\s*=\s*(\d+),\s*\/\*(.*?)\*\/$/gm) {
print "$1 $2 $3 \n";
}
# Output:
# MMAPI_CLOCK_OUTPUTS 1 clock outputs system
# MMAPI_SYSTEM_MANAGEMENT 0 sys man system
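For readability, the same idea could be written with the /x flag and named captures. The sketch below is an illustration only: the sub name parse_enum_line is made up, and it assumes the name is a plain identifier and the value is an unsigned integer.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (name, value, comment), or an empty list.
sub parse_enum_line {
    my ($line) = @_;
    return unless $line =~ m{
        ^ \s* (?<name> \w+ )            # identifier
        \s* = \s* (?<value> \d+ ) ,     # numeric value and trailing comma
        \s* /\* (?<comment> .*? ) \*/   # text inside the C comment
    }x;
    return ( $+{name}, $+{value}, $+{comment} );
}

# Tabs and repeated spaces between the tokens are all covered by \s*.
printf "'%s'\n", $_
    for parse_enum_line("MMAPI_CLOCK_OUTPUTS \t=  1,\t/*clock outputs system*/");
# prints:
# 'MMAPI_CLOCK_OUTPUTS'
# '1'
# 'clock outputs system'
```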

Exact pattern match using perl index() function

I am trying to use the index() function and I want to find the position of a word inside a string, only when it is an exact match. For example:
My string is STRING="CATALOG SCATTER CAT CATHARSIS"
And my search string is KEY=CAT
I want to say something like index($STRING, $KEY) and check match for CAT, and not CATALOG. How do I accomplish this? The documentation says
The index function searches for one string within another, but without the wildcard-like behavior of a full regular-expression pattern match.
which makes me think that it may not be that straight-forward, but my perl skills are limited :). Is it possible to do what I am trying to do?
Hopefully, I was able to articulate my question well. Thanks in advance for your help!
How about:
use feature 'say';    # say is not enabled by default
my $str = "CATALOG SCATTER CAT CATHARSIS";
my $key = "CAT";
if ($str =~ /\b$key\b/) {
    say "match at char ", $-[0];
} else {
    say "no match";
}
output:
match at char 16
You need to learn about Regular Expressions in Perl. Perl didn't invent Regular Expressions, but tremendously expanded upon the concept. In fact, many other programming languages talk specifically about using Perl Regular Expressions.
A regular expression matches a specific word pattern. For example, /cat/ matches the sequence cat in a string.
if ( $string =~ /cat/ ) {
print "String contains the letters 'cat' in a row\n";
}
In many ways, this does the same thing as:
my $location = index ( $string, "cat" );
if ( $location != -1 ) { # index returns -1 when substring isn't found
    print "String contains the letters 'cat' in a row\n";
}
But, both of these would match:
"Don't let the cat out of the bag"
"The Sears catalog arrived in the mail"
You don't want to match the last. So, you could do this:
my $location = index $string, " cat ";
Now, index $string, " cat " won't match the word catalog. Case closed! Or is it? What about:
"cat and dog it doth rain."
Maybe you could check and say things are okay if a sentence starts with "cat":
if ( (index ($string, " cat ") != -1) or (index ($string, "cat") == 0) ) {
    print "String contains the letters 'cat' in a row\n";
}
But, what about these?
"The word CAT in all uppercase"
"Stupid cat"
"Cat! Here Cat! Common Cat!": Punctuation after the word "cat"
"Don't let the 'cat' out of the 'bag'": Quotation Marks around "cat"
It could take dozens of lines to specify each and every one of these conditions.
However:
if ( $string =~ /\bcat\b/i ) {
print "String contains the word 'cat' in it\n";
}
Specifies each and every one -- and then some. The \b says this is a word boundary. This could be a space, a tab, a quote, the beginning or ending of a line. Thus /\bcat\b/ specifies that this should be the word cat and not catalog. The i on the end tells your regular expression to ignore case when matching, so you'll find Cat, cat, CAT, cAt, and all other possible combinations.
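A quick way to convince yourself is to run the pattern over a few of the sentences above. In this sketch the helper has_word is a made-up name, and \Q...\E is added so that the word is always treated as literal text:

```perl
use strict;
use warnings;

# Hypothetical helper: true if $word occurs as a whole word, ignoring case.
sub has_word {
    my ($string, $word) = @_;
    return $string =~ /\b\Q$word\E\b/i ? 1 : 0;
}

my @samples = (
    "Don't let the cat out of the bag",
    "The Sears catalog arrived in the mail",
    "Cat! Here Cat!",
    "Don't let the 'cat' out of the 'bag'",
);

for my $s (@samples) {
    printf "%-8s %s\n", ( has_word($s, "cat") ? "word" : "no word" ), $s;
}
```

Only the "catalog" sentence is reported as "no word".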
In fact, Perl's regular expressions are what made Perl such a popular language to begin with.
Fortunately, Perl comes with not one, but two tutorials on Regular Expressions:
perlretut: Perl Regular Expression Tutorial
perlrequick: Perl Regular Expression Quick Start.
Hope this helps.
Here's a (partial) solution to this problem using index:
use warnings;
use strict;
my $test = 'CATALOG SCATTER CAT CATHARSIS';
my $key = 'CAT';
my $k_length = length $key;
my $s_length = (length $test) - $k_length;
my $pos = -1;
while (($pos = index $test, $key, $pos + 1) > -1) {
    if ($pos > 0) {
        my $prev_char = substr $test, $pos - 1, 1;
        ### print "Previous character: '$prev_char'\n";
        next if $prev_char ge 'A' && $prev_char le 'Z'
             || $prev_char ge 'a' && $prev_char le 'z';
    }
    if ($pos < $s_length) {
        my $next_char = substr $test, $pos + $k_length, 1;
        ### print "Next character: '$next_char'\n";
        next if $next_char ge 'A' && $next_char le 'Z'
             || $next_char ge 'a' && $next_char le 'z';
    }
    # The parentheses matter: "." and "+" have the same precedence.
    print "Word '$key' found at " . ($pos + 1) . "th position.\n";
}
As you see, it's kinda wordy, because it uses basic Perl string functions - index and substr - only. Checking whether the substring found is indeed a word is done via checking its next and previous characters (if they exist): if they belong to either A-Z or a-z range, it's not a word.
You can simplify it a bit by trying to lowercase these characters (with lc), then check against the single character range only:
my $lc_prev_char = lc( substr $test, $pos - 1, 1 );
next if $lc_prev_char ge 'a' && $lc_prev_char le 'z';
... but then again, it's quite a minor improvement (if improvement at all).
Now consider this:
my $test = 'CATALOG SCATTER CAT CATHARSIS CAT';
my $key = 'CAT';
while ($test =~ /(?<![A-Za-z])$key(?![A-Za-z])/g) {
print "Word '$key' found at " . ($-[0] + 1) . "th position.\n";
}
... and that's it! The pattern literally tests the given string ($test) for the given substring ($key) not being preceded or followed by a character in the A-Za-z range, and the supporting Perl regex magic (the @- variable, in particular) makes it easy to get the starting position of such a substring.
The bottom line: use regexes to do the regexes' work.
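If the positions are needed as data rather than printed, the same loop could be wrapped roughly like this. The sub name word_positions is hypothetical, \Q...\E is added in case the key contains regex metacharacters, and the offsets are 0-based (add 1 for the "th position" wording used above):

```perl
use strict;
use warnings;

# Hypothetical helper: 0-based offsets of every whole-word occurrence.
sub word_positions {
    my ($text, $word) = @_;
    my @positions;
    while ( $text =~ /(?<![A-Za-z])\Q$word\E(?![A-Za-z])/g ) {
        push @positions, $-[0];   # $-[0] is the start offset of the last match
    }
    return @positions;
}

print join( ", ", word_positions( "CATALOG SCATTER CAT CATHARSIS CAT", "CAT" ) ), "\n";
# prints: 16, 30
```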
Regular expressions allow for the search to contain word boundaries as well as distinct characters. While
my $string = "CATALOG SCATTER CAT CATHARSIS";
index($string, 'CAT');
will return zero or greater if $string contains the characters CAT anywhere (here it returns 0, because of CATALOG), a regular expression like
$string =~ /\bCAT\b/;
will return true only if $string contains CAT preceded and followed by a word boundary, which here is the standalone word at offset 16. (A word boundary is either the beginning or end of the string, or the position between a word character and a non-word character. A word character is any alphanumeric character or an underscore.)
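Use \Q...\E around the value so that any regex metacharacters in it are matched literally (\E on its own, without a preceding \Q, has no effect). Note that this is still a substring match, so "Ton" matches inside "Tony":
#!/usr/bin/perl
my $string = "Little Tony";
my $check = "Ton";
if ($string =~ m/\Q$check\E/)
{
    print "match";
}
else
{
    die("No Match");
}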

How to compress 4 consecutive blank lines into one single line in Perl

I'm writing a Perl script that reads a log and rewrites it into a new log, removing empty lines wherever there are 4 or more consecutive blank lines. In other words, I have to compress any run of 4 or more consecutive blank lines into one single line, but runs of 1, 2 or 3 blank lines must keep their format. I have looked for a solution online, but the only things I can find are
perl -00 -pe ''
or
perl -00pe0
Also, I have seen a vim example that deletes blocks of 4 empty lines, :%s/^\n\{4}//, which matches what I'm looking for, but it is for vim, not Perl. Can anyone help with this? Thanks.
To collapse 4+ consecutive Unix-style EOLs to a single newline:
$ perl -0777 -pi.bak -e 's|\n{4,}|\n|g' file.txt
An alternative flavor using look-behind:
$ perl -0777 -pi.bak -e 's|(?<=\n)\n{3,}||g' file.txt
use strict;
use warnings;
my $cnt = 0;
sub flush_ws {
    $cnt = 1 if ($cnt >= 4);
    while ($cnt > 0) { print "\n"; $cnt--; }
}
while (<>) {
    if (/^$/) {
        $cnt++;
    } else {
        flush_ws();
        print $_;
    }
}
flush_ws();
Your -0 hint is a good one, since you can use -0777 to slurp the whole file in -p mode. Read more about these switches in perlrun. So this one-liner should do the trick:
$ perl -0777 -pe 's/\n{5,}/\n\n/g'
If there are up to four new lines in a row, nothing happens. Five newlines or more (four empty lines or more) are replaced by two newlines (one empty line). Note the /g switch here to replace not only the first match.
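The newline counting is easy to get wrong by one, so here is a small sketch that applies the same substitution to an in-memory string (the sub name collapse_blank_runs and the sample text are made up for illustration):

```perl
use strict;
use warnings;

# Hypothetical helper: runs of 4+ blank lines (5+ newlines) become one blank line.
sub collapse_blank_runs {
    my ($text) = @_;
    $text =~ s/\n{5,}/\n\n/g;
    return $text;
}

# "a" is followed by 3 blank lines (kept), "b" by 5 blank lines (collapsed).
my $log = "a\n\n\n\nb\n\n\n\n\n\nc\n";
( my $shown = collapse_blank_runs($log) ) =~ s/\n/\\n/g;   # make newlines visible
print "$shown\n";
# prints: a\n\n\n\nb\n\nc\n
```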
Deparsed code:
BEGIN { $/ = undef; $\ = undef; }
LINE: while (defined($_ = <ARGV>)) {
s/\n{5,}/\n\n/g;
}
continue {
die "-p destination: $!\n" unless print $_;
}
HTH! :)
One way using GNU awk, setting the record separator to NUL:
awk 'BEGIN { RS="\0" } { gsub(/\n{5,}/,"\n")}1' file.txt
This assumes that your definition of empty excludes whitespace.
This will do what you need
perl -ne 'if (/\S/) {$n = 1 if $n >= 4; print "\n" x $n, $_; $n = 0} else {$n++}' myfile