Extract multi-line output if it contains a certain word and is inside brackets - perl

I have input like this:
Jun 29 16:46:13 iPhone SomeThing[79987] <Notice>: [AppName] file.x:115 DEBUG: ClassNumberOne viewControllers: (
"<UINavigationController: 0x105031a00>",
"<UINavigationController: 0x10505ba00>",
"<UINavigationController: 0x10486fe00>",
"<UINavigationController: 0x105052600>",
"<UINavigationController: 0x105065c00>"
)
Jun 29 16:46:13 iPhone SomeThing[79987] <Notice>: [AppName] file.x:151 DEBUG: ClassNumberTwo ARG2 2
There are two cases to match here: one multi-line, the other single-line. The criterion is that the line must contain the DEBUG: keyword. In the multi-line case, if that line has the keyword and a (, then the match should extend to the closing ). Lines are separated by newlines. I can't figure this out. Currently I'm using a simple grep DEBUG: and that's it, but in the multi-line case everything after the first line is lost. I'm not familiar with Perl. Any ideas? Thanks in advance!
Note that I'm on iOS (jailbroken), so other tools might be limited.
EDIT:
The expected output is every line that matches the criteria, which is the same as the input example shown above. The actual input has tons of other lines that don't contain the keyword DEBUG: and should be ignored.

With any awk in any shell on every UNIX box:
$ awk 'f; /\)/{f=0} /DEBUG:/{print; f=/\(/}' file
Jun 29 16:46:13 iPhone SomeThing[79987] <Notice>: [AppName] file.x:115 DEBUG: ClassNumberOne viewControllers: (
"<UINavigationController: 0x105031a00>",
"<UINavigationController: 0x10505ba00>",
"<UINavigationController: 0x10486fe00>",
"<UINavigationController: 0x105052600>",
"<UINavigationController: 0x105065c00>"
)
Jun 29 16:46:13 iPhone SomeThing[79987] <Notice>: [AppName] file.x:151 DEBUG: ClassNumberTwo ARG2 2
Explanation:
awk '               # WHILE read line DO
    f;              #   IF the flag `f` is set THEN print the current line ENDIF
    /\)/{f=0}       #   IF the current line contains `)` THEN clear the flag ENDIF
    /DEBUG:/ {      #   IF the current line contains `DEBUG:` THEN
        print;      #     print the line
        f=/\(/      #     set the flag `f` to 1 if the line contains `(`, 0 otherwise
    }               #   ENDIF
' file              # ENDWHILE
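Since the question asks about Perl, here is a minimal sketch of the same flag logic in Perl. The sub name extract_debug and the variable $in_block are made up for illustration; the behaviour is meant to mirror the awk script line for line, not to improve on it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the lines that belong to DEBUG: entries, using the same
# flag logic as the awk script above.
sub extract_debug {
    my @lines = @_;
    my @out;
    my $in_block = 0;                       # plays the role of awk's f
    for my $line (@lines) {
        push @out, $line if $in_block;      # f;
        $in_block = 0 if $line =~ /\)/;     # /\)/{f=0}
        if ($line =~ /DEBUG:/) {            # /DEBUG:/{print; f=/\(/}
            push @out, $line;
            $in_block = $line =~ /\(/ ? 1 : 0;
        }
    }
    return @out;
}

# Filter the files named on the command line, if any were given.
print extract_debug(<>) if @ARGV;
```

Saved under a hypothetical name such as extract.pl, it would be run as perl extract.pl file.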

Here is an example of using a Regexp in Perl (but this should probably be handled more accurately by a parser like Regexp::Grammars):
use feature qw(say);
use strict;
use warnings;
my $data = do { local $/; <> };
my @lines = $data
=~ /
^( (?:(?!$).)* DEBUG:
(?:
(?: [^(]*? $ )
|
(?: (?:(?!$).)* \( [^)]* \) .*? $ )
)
)/gmsx;
say for @lines;

Related

Can somebody explain this obfuscated perl regexp script?

This code is taken from the HackBack DIY guide to rob banks by Phineas Fisher. It outputs a long text (The Sixth Declaration of the Lacandon Jungle). Where does it fetch it from? I don't see any alphanumeric characters at all. What is going on here? And what does the -r switch do? It seems undocumented.
perl -Mre=eval <<\EOF
''
=~(
'(?'
.'{'.(
'`'|'%'
).("\["^
'-').('`'|
'!').("\`"|
',').'"(\\$'
.':=`'.(('`')|
'#').('['^'.').
('['^')').("\`"|
',').('{'^'[').'-'.('['^'(').('{'^'[').('`'|'(').('['^'/').('['^'/').(
'['^'+').('['^'(').'://'.('`'|'%').('`'|'.').('`'|',').('`'|'!').("\`"|
'#').('`'|'%').('['^'!').('`'|'!').('['^'+').('`'|'!').('['^"\/").(
'`'|')').('['^'(').('['^'/').('`'|'!').'.'.('`'|'%').('['^'!')
.('`'|',').('`'|'.').'.'.('`'|'/').('['^')').('`'|"\'").
'.'.('`'|'-').('['^'#').'/'.('['^'(').('`'|('$')).(
'['^'(').('`'|',').'-'.('`'|'%').('['^('(')).
'/`)=~'.('['^'(').'|</'.('['^'+').'>|\\'
.'\\'.('`'|'.').'|'.('`'|"'").';'.
'\\$:=~'.('['^'(').'/<.*?>//'
.('`'|"'").';'.('['^'+').('['^
')').('`'|')').('`'|'.').(('[')^
'/').('{'^'[').'\\$:=~/('.(('{')^
'(').('`'^'%').('{'^'#').('{'^'/')
.('`'^'!').'.*?'.('`'^'-').('`'|'%')
.('['^'#').("\`"| ')').('`'|'#').(
'`'|'!').('`'| '.').('`'|'/')
.'..)/'.('[' ^'(').'"})')
;$:="\."^ '~';$~='@'
|'(';$^= ')'^'[';
$/='`' |'.';
$,= '('
EOF
The basic idea of the code you posted is that each alphanumeric character has been replaced by a bitwise operation between two non-alphanumeric characters. For instance,
'`'|'%'
(the 5th line of the "star" in your code) is a bitwise OR between a backquote and a percent sign, whose code points are 96 and 37 respectively; their OR is 101, which is the code point of the letter "e". The following lines all print the same thing:
say '`' | '%' ;
say chr( ord('`' | '%') );
say chr( ord('`') | ord('%') );
say chr( 96 | 37 );
say chr( 101 );
say "e"
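The same trick works with ^ (XOR). As a small sketch, the first four character pairs of the star can be decoded like this; the array name @decoded is only for illustration, and the code relies on Perl's default string behaviour of | and ^ (that is, without the bitwise feature enabled).

```perl
use strict;
use warnings;

# The first four character pairs from the obfuscated script, decoded.
# On two strings, | and ^ combine the character codes position by position.
my @decoded = (
    '`' | '%',   # 0x60 | 0x25 = 0x65 -> e
    '[' ^ '-',   # 0x5B ^ 0x2D = 0x76 -> v
    '`' | '!',   # 0x60 | 0x21 = 0x61 -> a
    '`' | ',',   # 0x60 | 0x2C = 0x6C -> l
);
print join('', @decoded), "\n";   # prints "eval"
```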
Your code starts with (ignoring whitespace, which doesn't matter):
'' =~ (
The corresponding closing parenthesis is 28 lines later:
^'(').'"})')
(C-f this pattern to see it on the web-page; I used my editor's matching parenthesis highlighting to find it)
We can assign everything in between the opening and closing parenthesis to a variable that we can then print:
$x = '(?'
.'{'.(
'`'|'%'
).("\["^
'-').('`'|
'!').("\`"|
',').'"(\\$'
.':=`'.(('`')|
'#').('['^'.').
('['^')').("\`"|
',').('{'^'[').'-'.('['^'(').('{'^'[').('`'|'(').('['^'/').('['^'/').(
'['^'+').('['^'(').'://'.('`'|'%').('`'|'.').('`'|',').('`'|'!').("\`"|
'#').('`'|'%').('['^'!').('`'|'!').('['^'+').('`'|'!').('['^"\/").(
'`'|')').('['^'(').('['^'/').('`'|'!').'.'.('`'|'%').('['^'!')
.('`'|',').('`'|'.').'.'.('`'|'/').('['^')').('`'|"\'").
'.'.('`'|'-').('['^'#').'/'.('['^'(').('`'|('$')).(
'['^'(').('`'|',').'-'.('`'|'%').('['^('(')).
'/`)=~'.('['^'(').'|</'.('['^'+').'>|\\'
.'\\'.('`'|'.').'|'.('`'|"'").';'.
'\\$:=~'.('['^'(').'/<.*?>//'
.('`'|"'").';'.('['^'+').('['^
')').('`'|')').('`'|'.').(('[')^
'/').('{'^'[').'\\$:=~/('.(('{')^
'(').('`'^'%').('{'^'#').('{'^'/')
.('`'^'!').'.*?'.('`'^'-').('`'|'%')
.('['^'#').("\`"| ')').('`'|'#').(
'`'|'!').('`'| '.').('`'|'/')
.'..)/'.('[' ^'(').'"})';
print $x;
This will print:
(?{eval"(\$:=`curl -s https://enlacezapatista.ezln.org.mx/sdsl-es/`)=~s|</p>|\\n|g;\$:=~s/<.*?>//g;print \$:=~/(SEXTA.*?Mexicano..)/s"})
The rest of the code is a series of assignments to some variables, probably there only to complete the pattern. The end of the star is:
$:="\."^'~';
$~='@'|'(';
$^=')'^'[';
$/='`'|'.';
$,='(';
These statements just assign simple one-character strings to some variables.
Back to the main code:
(?{eval"(\$:=`curl -s https://enlacezapatista.ezln.org.mx/sdsl-es/`)=~s|</p>|\\n|g;\$:=~s/<.*?>//g;print \$:=~/(SEXTA.*?Mexicano..)/s"})
This code is inside a regex which is matched against an empty string (don't forget that we first had '' =~ (...)). (?{...}) inside a regex runs the code in the .... With some whitespace added, and with the code taken out of the string passed to eval, this gives us:
# fetch a URL from the web using curl _quietly_ (-s)
($: = `curl -s https://enlacezapatista.ezln.org.mx/sdsl-es/`)
# replace end of paragraphs with newlines in the HTML fetched
=~ s|</p>|\n|g;
# Remove all HTML tags
$: =~ s/<.*?>//g;
# Print everything between SEXTA and Mexicano (+2 chars)
print $: =~ /(SEXTA.*?Mexicano..)/s
You can automate this deobfuscation process by using B::Deparse. Running
perl -MO=Deparse yourcode.pl
will produce something like:
'' =~ m[(?{eval"(\$:=`curl -s https://enlacezapatista.ezln.org.mx/sdsl-es/`)=~s|</p>|\\n|g;\$:=~s/<.*?>//g;print \$:=~/(SEXTA.*?Mexicano..)/s"})];
$: = 'P';
$~ = 'h';
$^ = 'r';
$/ = 'n';
$, = '(';

Perl: IRC Notice Printing

Having a few issues with the below code..
my $file=File::Tail->new("/var/log/messages");
while (defined(my $line=$file->read)) {
print $sock "NOTICE #logs $line";
}
As you can see I'm tailing the servers message logs (which works) and printing it into an IRC socket as a NOTICE, but for some reason it's only printing out the first word of each line into the channel - for example, it's only printing out 'Jan' as that's the month.
Can anyone help with this?
[06:55:48] IRCBOT (~IRCBOT@10.1.0.4) joined the channel.
[06:56:00] -IRCBOT- Jan
[06:56:00] -IRCBOT- Jan
[06:56:00] -IRCBOT- Jan
[06:56:00] -IRCBOT- Jan
[06:56:00] -IRCBOT- Jan
[06:56:02] -IRCBOT- Jan
Many Thanks in Advance!
EDIT: Just in case it matters, this is how I'm connecting to the IRC server..
use IO::Socket;
use File::Tail;
my $file=File::Tail->new("/var/log/messages");
my $server = "irc.example.co.uk";
my $nick = "IRCBOT";
my $login = "IRCBOT";
my $channel = "#logs";
my $sock = new IO::Socket::INET(PeerAddr => $server,
PeerPort => 6667,
Proto => 'tcp') or
die "Can't connect\n";
You need to prefix the text string with a : character,
print $sock "NOTICE #logs :$line";
If you want to escape the "$line", you can do something like:
print $sock "NOTICE #logs :\x01$line\x01";
By default, the IRC protocol separates parameters with spaces. You need to put a leading colon before a text string to indicate that it is the trailing parameter and should not be split.
NOTICE #logs Jan 1st 2014
is treated as a command plus 4 parameters,
NOTICE #logs :Jan 1st 2014 ... more stuff ... long line
is treated as a command plus two parameters: the target #logs and a trailing parameter that extends to the CR LF (possibly including trailing whitespace).
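To make the splitting rule concrete, here is a rough sketch of how a receiver might break a message into a command and its parameters. The sub name parse_irc is hypothetical, and the sketch assumes messages without a leading prefix; it is not a full IRC parser.

```perl
use strict;
use warnings;

# Split one IRC message (assumed to have no prefix) into command and parameters.
# Everything after the first " :" is kept as a single trailing parameter.
sub parse_irc {
    my ($msg) = @_;
    $msg =~ s/\r?\n\z//;                           # drop the line terminator
    my ($head, $trailing) = split / :/, $msg, 2;
    my ($command, @params) = split ' ', $head;     # space-separated parameters
    push @params, $trailing if defined $trailing;
    return ($command, @params);
}

for my $msg ("NOTICE #logs Jan 1st 2014\r\n", "NOTICE #logs :Jan 1st 2014\r\n") {
    my ($command, @params) = parse_irc($msg);
    printf "%s with %d parameter(s): %s\n",
        $command, scalar @params, join(' | ', @params);
}
```

Without the colon the text is spread over several parameters, and only the first of them after the target is shown as the notice text, which matches the "Jan" seen in the channel.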

How to match an integer after finding a keyword?

I have a text file content as below:
Starting log...
Sample at 10000000
Mode is set to 0
0007F43: CHANGE DETECTED at 290313 line 0 from 00 to 04
0007F46: Mismatched at 290316 line 0
0007F50: Matched occur at 290326 line 1
0007F53: Mismatched at 290336 line 2
0007F56: Matched occur at 290346 line 0
0007F60: Mismatched at 290356 line 2
0007F63: Matched occur at 290366 line 0
Saving log....
DONE!!!
I am running a simple Perl program, shown below, to get the value from the lines that contain "Mismatched":
#!/usr/bin/perl
print "Starting perl script\n\n";
open (LOG,"dump.log");
while (<LOG>) {
next if !/Mismatched/;
/at\s+"([^"]+)"/;
print $1,"\n";
}
close(LOG);
print "DONE!!\n";
exit;
But I get the error messages below. What's wrong with my code? Am I missing something related to chomp()?
Use of uninitialized value in print at test.pl line 9, <LOG> line 5.
Use of uninitialized value in print at test.pl line 9, <LOG> line 7.
Use of uninitialized value in print at test.pl line 9, <LOG> line 9.
DONE!!
Also, is there a simpler way to get the integer (i.e. 290316) that follows the keyword "Mismatched"? I just want the first value only.
$1 is printed even when nothing was captured. The print should be conditional on the match:
print $1,"\n" if (/Mismatched at (\d+)/);
To store all values in an array:
push @arr,$1 if (/Mismatched at (\d+)/);
Change the regex to:
/at\s+(\d+)/;
You've got answers that show you the correct way to do this, but nothing yet that explains what you were doing wrong. The problem is in your regex.
/at\s+"([^"]+)"/
Let's break it down and see what it's trying to match.
at : the string 'at'
\s+ : one or more whitespace characters
" : a double quote character
([^"]+) : one or more characters that aren't double quote characters
" : a double quote character
So, effectively, you're looking for 'at' followed by a double quoted string. And you're capturing (into $1) the contents of the double quoted string.
But none of your data contains any double quote characters. So there are no double quoted strings. So nothing ever matches and nothing ever gets captured into $1. Which is why you get the 'uninitialised value' error when you try to print $1.
I'd be interested to hear why you thought you wanted to match double quote characters in a piece of text that doesn't contain any of them.
I'd change your script to use a more modern Perl style:
#!/usr/bin/perl
use strict;
use warnings;
print "Starting perl script\n\n";
open my $LOG, '<', 'dump.log' or die $!;
while( <$LOG> ) {
print "$1\n" if /Mismatched at (\d+)/;
}
close $LOG;
print "DONE!!\n";
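To see the difference between the two patterns without needing the log file, here is a small self-contained sketch that runs both against a few of the sample lines from the question. The array names are illustrative only.

```perl
use strict;
use warnings;

my @log = (
    '0007F43: CHANGE DETECTED at 290313 line 0 from 00 to 04',
    '0007F46: Mismatched at 290316 line 0',
    '0007F50: Matched occur at 290326 line 1',
    '0007F53: Mismatched at 290336 line 2',
);

# The original pattern requires double quotes, so it never matches this data.
my @quoted = grep { /at\s+"([^"]+)"/ } @log;

# Capture the number after "Mismatched at"; other lines contribute nothing.
my @values = map { /Mismatched at (\d+)/ ? $1 : () } @log;

print "quoted pattern matched ", scalar @quoted, " line(s)\n";   # 0
print "mismatch values: @values\n";                              # 290316 290336
print "first value: $values[0]\n";                               # 290316
```

Taking $values[0] covers the case where only the first value is wanted.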

Perl+Selenium: chomp() fails

I'm using Selenium for work and I have to extract some data from "//ul". Unfortunately this data contains newlines. I tried to use the chomp() function to remove them (because I need to write to a CSV file), but it's not working. The relevant portion of the code is:
open (INFO, '>>file.csv') or die "$!";
print INFO ("codice\;descrizione\;prezzo\;URLFoto\n");
my $sel = Test::WWW::Selenium->new( host => "localhost",
port => 4444,
browser => "*chrome",
browser_url => "http://www.example.com/page.htm" );
$sel->open_ok("/page.htm");
$sel->click_ok("//table[2]/tbody/tr/td/a/img");
$sel->wait_for_page_to_load_ok("30000");
my $descrizione = $sel->get_text("//ul");
my $prezzo = $sel->get_text("//p/font");
my $codice = $sel->get_text("//p/font/b");
my $img = $sel->get_attribute ("//p/img/\@src");
chomp ($descrizione);
print INFO ("$codice\;$descrizione\;$prezzo\;$img\n");
$sel->go_back_ok();
# Close file
close (INFO);
but the output is:
Art. S500 Set Yoga "Siddhartha";Idea regalo ?SET YOGA Siddhartha? Elegante scatola in cartone lucido contenente:
2 mattoni in legno naturale mis. cm 20 x 12,5 x 7
1 cinghia in cotone mis. cm 4 x 235
1 stuoia in cotone mis. cm 70 x 170
1 manuale di introduzione allo yoga stampato
Tutto rigorosamente realizzato con materiali natural;€ 82,50;../images/S500%20(Custom).jpg
chomp removes a trailing input record separator ($/, a newline by default) from the end of a string, or from the end of each string in a list. It does not touch line breaks elsewhere in the string.
In your case, you seem to have a single string with embedded newlines and/or carriage returns. Hence, you probably want to replace any sequence of possible line endings with something else, let's say a single space character. In that case, you'd do:
$descrizione =~ s/[\r\n]+/ /g;
If you want to replace all vertical whitespace, Perl has a special character class shortcut for that:
use v5.10;
$descrizione =~ s/\v+/ /g;
Use this to remove a trailing \r as well:
$descrizione =~ s#[\r\n]+\z##;
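The difference between chomp and the substitution can be checked with a short sketch. The sample string below is made up and merely stands in for whatever get_text("//ul") returns.

```perl
use strict;
use warnings;

# Hypothetical text with embedded line breaks, standing in for get_text("//ul").
my $descrizione = "Set Yoga:\r\n2 mattoni in legno\n1 cinghia in cotone\n";

my $chomped = $descrizione;
chomp $chomped;                                     # removes only the final "\n"

(my $flattened = $descrizione) =~ s/[\r\n]+/ /g;    # each run of CR/LF becomes one space
$flattened =~ s/\s+\z//;                            # trim the space left at the end

print "after chomp the string ",
    ($chomped =~ /\n/ ? "still contains" : "contains no"), " newlines\n";
print "$flattened\n";
```

The flattened string is then safe to place in a single CSV field, provided it does not contain the ; separator itself.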

Why does defined sdf return true in this Perl example?

I tried this example in Perl. Can someone explain why it is true?
if (defined sdf) { print "true"; }
It prints true.
sdf could be any name.
In addition, if there is an sdf function defined and it returns 0, then nothing is printed.
print (sdf); does not print the string sdf, but
if (sdf eq "sdf")
{
print "true";
}
prints true.
The related question remains: if sdf is a string, why is it not printed by print?
sdf is a bareword.
perl -Mstrict -e "print qq{defined\n} if defined sdf"
Bareword "sdf" not allowed while "strict subs" in use at -e line 1.
Execution of -e aborted due to compilation errors.
For more fun, try
perl -Mstrict -e "print sdf => qq{\n}"
See Strictly speaking about use strict:
The subs aspect of use strict disables the interpretation of ``bare words'' as text strings. By default, a Perl identifier (a sequence of letters, digits, and underscores, not starting with a digit unless it is completely numeric) that is not otherwise a built-in keyword or previously seen subroutine definition is treated as a quoted text string:
@daynames = (sun, mon, tue, wed, thu, fri, sat);
However, this is considered to be a dangerous practice, because obscure bugs may result:
@monthnames = (jan, feb, mar, apr, may, jun,
jul, aug, sep, oct, nov, dec);
Can you spot the bug? Yes, the 10th entry is not the string 'oct', but rather an invocation of the built-in oct() function, returning the numeric equivalent of the default $_ treated as an octal number.
Corrected: (thanks @ysth)
E:\Home> perl -we "print sdf"
Unquoted string "sdf" may clash with future reserved word at -e line 1.
Name "main::sdf" used only once: possible typo at -e line 1.
print() on unopened filehandle sdf at -e line 1.
If a bareword is supplied to print in the indirect object slot, it is taken as a filehandle to print to. Since no other arguments are supplied, print defaults to printing $_ to filehandle sdf. Since sdf has not been opened, it fails. If you run this without warnings, you do not see any output. Note also:
E:\Home> perl -MO=Deparse -e "print sdf"
print sdf $_;
as confirmation of this observation. Note also:
E:\Home> perl -e "print asdfg, sadjkfsh"
No comma allowed after filehandle at -e line 1.
E:\Home> perl -e "print asdfg => sadjkfsh"
asdfgsadjkfsh
The latter prints both strings because => automatically quotes strings on the LHS if they consist solely of 'word' characters, removing the filehandle interpretation of the first argument.
All of these examples show that using barewords leads to many surprises. You should use strict to avoid such cases.
This is a "bareword". If it is allowed, it has the value of "sdf", and is therefore not undefined.
The example isn't special:
telemachus ~ $ perl -e 'if (defined sdf) { print "True\n" };'
True
telemachus ~ $ perl -e 'if (defined abc) { print "True\n" };'
True
telemachus ~ $ perl -e 'if (defined ccc) { print "True\n" };'
True
telemachus ~ $ perl -e 'if (defined 8) { print "True\n" };'
True
None of those is equivalent to undef which is what defined checks for.
You might want to check out this article on truth in Perl: What is Truth?
defined returns true if the expression has a value other than the undefined value.
The defined function returns true unless the value passed as its argument is undefined. This is useful for distinguishing a variable containing 0 or "" from a variable that just winked into existence.
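As a small illustration that runs under strict, the sketch below shows that being defined is independent of being true, and that => quotes a bareword on its left-hand side. The helper state_of and the variable names are invented for the example.

```perl
use strict;
use warnings;

# Describe a value as "defined" or "undefined", independent of its truthiness.
sub state_of { return defined $_[0] ? 'defined' : 'undefined' }

my $never_assigned;                 # declared, but holds undef
my %labels = (sdf => 1);            # => quotes the bareword, so strict accepts it

print "'sdf' is ", state_of('sdf'),           "\n";   # defined
print "0     is ", state_of(0),               "\n";   # defined, although false
print "''    is ", state_of(''),              "\n";   # defined, although false
print "unset is ", state_of($never_assigned), "\n";   # undefined
print "key:     ", join(',', keys %labels),   "\n";   # sdf
```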