awk perl grep pattern match ignoring - perl

I have a file with about 100,000 lines that look more or less like this:
if (uri=~"^proto:[+]*55555.*"){
rewritehostport("10.10.10.2:1337");
rewritehostport("10.20.30.2:2345");
sl_send_reply("302", "Redirect");
exit;
}
if (uri=~"^proto:[+]*4444.*"){
rewritehostport("10.10.10.2:1337");
rewritehostport("10.20.30.2:2345");
sl_send_reply("302", "Redirect");
exit;
}
if (uri=~"^proto:[+]*3333.*"){
rewritehostport("10.10.10.2:1337");
rewritehostport("10.20.30.2:2345");
sl_send_reply("302", "Redirect");
exit;
}
I am looking for a method to selectively ignore a variable (say 55555) along with the lines up until the closing curly bracket }
awk '/proto/{a=1} a; /{/{a=0}' myfile.cfg ignores the center piece but still yields the beginning portion:
if (uri=~"^proto:[+]*55555.*"){
I'd like to be able to look for certain patterns and ignore those I choose to ignore, e.g., find 5555 and 3333 and ignore that entire string, leaving 4444 alone. I initially thought something to the tune of:
awk '!/4444/ && /proto/{a=1} a; /{/{a=0}'
But it's non-functional. So I said hrmm, perl loops:
if ($_[1] =~ /proto/) {
if ($_[6] =~ /\}/) {
print "something\n";
foreach (@_) {
print $_;
}
print "something\n";
}
}
Buttttttt... that wouldn't always work because some lines might be:
if (uri=~"^proto:[+]*9999.*"){
rewritehostport("10.10.10.2:1337");
sl_send_reply("302", "Redirect");
exit;
}
Then I thought: grep -wvf file_with_data_I_want_removed original_file >> new_file But that defeats the purpose because I'd have to create file_with_data_I_want_removed
In essence, I want to say:
for [ this list of numbers (55555, 3333) ]
go into this_file if_number_exists remove line with number along with everything until the nearest curly bracket while ignoring the other ones
done
if (uri=~"^proto:[+]*4444.*"){
rewritehostport("10.10.10.2:1337");
rewritehostport("10.20.30.2:2345");
sl_send_reply("302", "Redirect");
exit;
}

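You were very close. Just rearranging the flag state should get you the desired output. (The BEGIN block turns printing on from the start, so a block you want to keep is not lost when it comes first in the file.)
awk 'BEGIN{a=1} /proto.*(55555|3333)/{a=0};a;/}/{a=1}' myfile.cfg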
if (uri=~"^proto:[+]*4444.*"){
rewritehostport("10.10.10.2:1337");
rewritehostport("10.20.30.2:2345");
sl_send_reply("302", "Redirect");
exit;
}
You disable the flag when your pattern that needs to be skipped is seen.
You print the lines for which your flag is set.
When you see the end of pattern enable the flag.
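The three steps above can be sketched as a small shell function, so the numbers to drop are passed as a list, which is what the question asked for. This is only a sketch: the name drop_blocks and the sample data are made up for illustration, and it assumes every block ends with a line containing } and that the numbers contain no regex metacharacters.

```shell
# drop_blocks FILE NUMBER...   (hypothetical helper, not from the original answer)
# Prints FILE without the blocks whose "proto" line mentions one of the numbers.
drop_blocks() {
    file=$1; shift
    # Join the remaining arguments into an alternation such as 55555|3333.
    pattern=$(printf '%s|' "$@"); pattern=${pattern%|}
    awk -v pat="proto.*($pattern)" '
        BEGIN            { keep = 1 }   # printing is on until a block is rejected
        $0 ~ pat         { keep = 0 }   # unwanted block starts: stop printing
        keep                            # print the line while the flag is set
        index($0, "}")   { keep = 1 }   # closing brace: printing resumes
    ' "$file"
}

# Made-up sample in the format shown in the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
if (uri=~"^proto:[+]*55555.*"){
rewritehostport("10.10.10.2:1337");
exit;
}
if (uri=~"^proto:[+]*4444.*"){
rewritehostport("10.10.10.2:1337");
exit;
}
if (uri=~"^proto:[+]*3333.*"){
exit;
}
EOF

drop_blocks "$sample" 55555 3333
```

Note that a number such as 3333 would also match 33333; if that matters, tighten the pattern yourself (for example by anchoring on the characters that follow the number in your file).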

You could set the record separator, through the RS variable, to } and drop the records that mention the unwanted numbers:
awk '!/55555|3333/' RS='}' ORS='}' file

Related

Print each line of a file

I have a file test.txt that reads as follows:
one
two
three
Now, I want to print each line of this file as follows:
.one (one)
.two (two)
.three (three)
I try this in Perl:
@ARGV = ("test.txt");
while (<>) {
print (".$_ \($_\)");
}
This doesn't seem to work and this is what I get:
.one
(one
).two
(two
).three
(three
)
Can someone help me figure out what's going wrong?
Update :
Thanks to Aureliano Guedes for the suggestion.
This 1-liner seems to work :
perl -pe 's/([^\s]+)/.$1 ($1)/'
$_ will include the newline, e.g. one\n, so print ".$_ \($_\)" becomes something like print ".one\n (one\n)".
Use chomp to get rid of them, or use s/\s+\z// to remove all trailing whitespace.
while (<>) {
chomp;
print ".$_ ($_)\n";
}
(But add a \n to print the newline that you do want.)
Besides the correct answer already given, you can do this in a one-liner:
perl -pe 's/(.+)/.$1 ($1)/'
Or if you prefer a while loop:
while (<>) {
s/(.+)/.$1 ($1)/;
print;
}
This simply modifies your current line to your desired output and prints it then.
Another Perl one-liner without using regex.
perl -ple ' $_=".$_ ($_)" '
with the given inputs
$ cat test.txt
one
two
three
$ perl -ple ' $_=".$_ ($_)" ' test.txt
.one (one)
.two (two)
.three (three)
$

String can't find the value in a same line

use strict;
use warnings;
my $str = "This is the test and new paragraph...\n";
if($str=~m/paragraph/gi) # First Loop
{
if($str=~m/test/gi) # Second Loop
{
print "Ok...\n";
}
else
{
print "Not Ok...\n";
}
}
With the second test written as if($str=~m/test/gi), as above, the output is: Not Ok...
With the second test written as if($str=~m/test/i), the output is: Ok...
In the above case the first test finds paragraph, but the second test cannot find test while it carries the global g modifier; once g is removed from it, it can.
Could someone please explain what's happening? Thanks in advance.
/g is used for finding all matches of a pattern. It doesn't make sense to alter the pattern between matches. Generally speaking, if (/.../g) makes no sense and should be replaced with if (/.../).
There are advanced uses for if (/\G.../gc), but that's different. if (/.../g) only makes sense if you're unrolling a while loop. (e.g. while (1) { ...; last if !/.../g; ... }).
Here's what's happening in this specific circumstance:
Because you signaled you wanted to find all matches (by using /g), the position at which to start matching is set to the end of the match (denoted by ^ below).
This is the test and new paragraph...
                         ---------^
You can see this using pos.
$ perl -e'
my $str = "This is the test and new paragraph...";
if ($str =~ /paragraph/g) {
CORE::say pos($str) // 0;
if ($str =~ /test/g) {
CORE::say pos($str) // 0;
}
}
'
34
The subsequent m/test/gi doesn't match because test does not appear at or after the position at which the last match ended.
The solution is to simply remove the g modifier from your match operators.
$ perl -e'
my $str = "This is the test and new paragraph...";
if ($str =~ /paragraph/) {
CORE::say pos($str) // 0;
if ($str =~ /test/) {
CORE::say pos($str) // 0;
}
}
'
0
0
From perldoc perlretut:
Global matching
The final two modifiers we will discuss here, //g and //c , concern multiple matches. The modifier //g stands for global matching and allows the matching operator to match within a string as many times as possible. In scalar context, successive invocations against a string will have //g jump from match to match, keeping track of position in the string as it goes along. You can get or set the position with the pos() function.
In the first test you're using the global flag, so the position where that match ended is remembered; the second match then doesn't find test because it comes before paragraph.
You have to remove the global flag from the first match.
my $str = "This is the test and new paragraph...\n";
if ($str =~ /paragraph/i) {
if ($str =~ /test/i) {
print "Ok...\n";
} else {
print "Not Ok...\n";
}
}

Bash/perl Printing line(s) from file until a character with conditions

I'm trying to scan a file for lines containing a specific string, and print the lines to another file.
However, I need to print out multiple lines until ")" character IF the line containing the string ended in "," ignoring whitespaces.
Currently I'm using
for func in $fnnames
do
sed -n "/$func/p" <"$file" >>"$CODEBASEDIR/function_signature"
done
where $func contains the string I look for, but of course it doesn't work for the restriction.
Is there a way to do this? Currently using bash, but perl is fine also.
Thanks.
Your question is tricky because your restrictions are not precise. You say - I think - that a block should look like this:
foo,
bar,
baz)
Where foo is the string that starts the block, and closing parenthesis ends it. However, you could also be saying:
foo bar baz) xxxxxxxxxxx,
And you only want to print until the ), which is to say foo bar baz), IF the line ends with comma.
You could also be saying that only lines that end with a comma should be continued:
foo, # print + is continued
bar # print + is not continued
xxxxx # ignored line
foo # print + is not continued
foo,
bar,
baz) # closing parens also end block
Since I can only guess that you mean the first alternative, I give you two options:
use strict;
use warnings;
sub flip {
while (<DATA>) {
print if /^foo/ .. /\)\s*$/;
}
}
sub ifchain {
my ($foo, $print);
while (<DATA>) {
if (/^foo/) {
$foo = 1; # start block
print;
} elsif ($foo) {
if (/,\s*$/) {
print;
} elsif (/\)\s*$/) {
$foo = 0; # end block
print;
}
# for catching input errors:
else { chomp; warn "Mismatched line '$_'" }
}
}
}
ifchain();  # or flip(); both read from DATA, so call only one per run
__DATA__
foo1,
bar,
baz)
sadsdasdasdasd,
asda
adaffssd
foo2,
two,
three)
yada
The first one will print any lines found between a line starting with foo and a line ending with ). It will ignore the "lines end with comma" restriction. On the positive side, it can be simplified to a one-liner:
perl -ne 'print if /^foo/ .. /\)\s*$/' file.txt
The second one is just a simplistic if-structure that will consider both restrictions, and warn (print to STDERR) if it finds a line inside a block that matches neither of them.
perl -ne 'print if 1 .. /\).*,\s*$/'

How do I ignore multiple newlines in perl?

Suppose I have a file with these inputs:
line 1
line 2
line3
My program should only store "line1", "line2" and "line3" not the newlines. How do I achieve that?
My program already removed leading and trailing whitespaces but it doesn't help to remove newline.
I am setting $/ as \n because each input is separated by a \n.
while (<>) {
chomp;
next unless /\S/;
print "$_\n";
}
Set
$/ = q(); # that's an empty string, like "" or ''
while (<>) {
chomp;
...
}
The special value of the defined empty string is how you tell the input operator to treat two or more consecutive newlines (that is, one or more blank lines) as the terminator, and also to get chomp to remove them all. That way each record always starts with real data.
Perl -n is the equivalent of wrapping while(<>) { } around your script. Assuming that all you need to do is eliminate blank lines, you can do it like this:
#! /usr/bin/perl -n
print unless ( /^$/ );
... On the other hand, if that's all you need to do, you might as well ditch perl and use
grep -v '^$'
Edit: your post says that you want to store values where lines are not blank... in that case, assuming that you don't have too much work to do in the rest of your script, you might do something like this:
#! /usr/bin/perl -n
our @values;  # a package variable, so the END block sees what the implicit loop collected
push @values, $_ unless ( /^$/ );
END {
# do whatever work you want to do here
}
... but this quickly reaches a point of diminishing returns if you have very much code inside the END{} block.

Perl: basic question, function functionality

What does this function do?
sub MyDigit {
return <<END;
0030\t0039
END
}
That's called a "here-document", and is used for breaking strings up over multiple lines as an alternative to concatenation or list operations:
print "this is ",
"one line when printed, ",
"because print takes multiple ",
"arguments and prints them all!\n";
print "however, you can also " .
"concatenate strings together " .
"and print them all as one string.\n";
print <<DOC;
But if you have a lot of text to print,
you can use a "here document" and create
a literal string that runs until the
delimiter that was declared with <<.
DOC
print "..and now we're back to regular code.\n";
You can read more about here-documents in the manual: see perldoc perlop.
You’ve all missed the point!
It’s defining a user-defined property for use in \p{MyDigit} and \P{MyDigit} using regular expressions.
It’s like these:
sub InKana {
return <<'END';
3040 309F
30A0 30FF
END
}
Alternatively, you could define it in terms of existing property names:
sub InKana {
return <<'END';
+utf8::InHiragana
+utf8::InKatakana
END
}
You can also do set subtraction using a "-" prefix. Suppose you only
wanted the actual characters, not just the block ranges of characters.
You could weed out all the undefined ones like this:
sub IsKana {
return <<'END';
+utf8::InHiragana
+utf8::InKatakana
-utf8::IsCn
END
}
You can also start with a complemented character set using the "!" prefix:
sub IsNotKana {
return <<'END';
!utf8::InHiragana
-utf8::InKatakana
+utf8::IsCn
END
}
I figure I must be right, since I’m speaking ex camelis. :)
It uses something called a Here Document to return a string "0030\t0039"
It returns the string "0030\t0039\n" (\t being a tab and \n a newline that is being added because the line ends in a newline (obviously)).
<<FOO
sometext
FOO
Is a so-called heredoc, a way to conveniently write multi-line strings (though here it is used with only one line).
You can help yourself by trying a simple experiment:
C:\Temp> cat t.pl
#!/usr/bin/perl
use strict; use warnings;
print MyDigit();
sub MyDigit {
return <<END;
0030\t0039
END
}
Output:
C:\Temp> t | xxd
0000000: 2020 2020 3030 3330 0930 3033 390d 0a 0030.0039..
Now, in your case, the END is not lined up at the beginning of the line, so you should have gotten the message:
Can't find string terminator "END" anywhere before EOF at …