Perl: perl regex for extracting values from complex lines - perl

Input log file:
Nservdrx_cycle 4 servdrx4_cycle
HCS_cellinfo_st[10] (type = (LTE { 2}),cell_param_id = (28)
freq_info = (10560),band_ind = (rsrp_rsrq{ -1}),Qoffset1 = (0)
Pcompensation = (0),Qrxlevmin = (-20),cell_id = (7),
agcreserved{3} = ({ 0, 0, 0 }))
channelisation_code1 16/5 { 4} channelisation_code1
sync_ul_info_st_ (availiable_sync_ul_code = (15),uppch_desired_power =
(20),power_ramping_step = (3),max_sync_ul_trans = (8),uppch_position_info =
(0))
trch_type PCH { 7} trch_type8
last_report 0 zeroth bit
I was trying to extract only integer for my above inputs but I am facing some
issue with if the string contain integer at the beginning and at the end
For ( e.g agcreserved{3},HCS_cellinfo_st[10],Qoffset1)
here I don't want to ignore {3},[10] and 1 but in my code it does.
since I was extracting only integer.
Here I have written simple regex for extracting only integer.
MY SIMPLE CODE:
use strict;
use warnings;
my $Ipfile = 'data.txt';
open my $FILE, "<", $Ipfile or die "Couldn't open input file: $!";
my #array;
while(<$FILE>)
{
while ($_ =~ m/( [+-]?\d+ )/xg)
{
push #array, ($1);
}
}
print "#array \n";
output what I am getting for above inputs:
4 4 10 2 28 10560 -1 1 0 0 -20 7 3 0 0 0 1 16 5 4 1 15 20 3 8 0 7 8 0
expected output:
4 2 28 10560 -1 0 0 -20 7 0 0 0 4 15 20 3 8 0 7 0
If some body can help me with explanation ?

You are catching every integer because your regex has no restrictions on which characters can (or can not) come before/after the integer. Remember that the /x modifier only serves to allow whitespace/comments inside your pattern for readability.
Without knowing a bit more about the possible structure of your output data, this modification achieves the desired output:
while ( $_ =~ m! [^[{/\w] ( [+-]?\d+ ) [^/\w]!xg ) {
push #array, ($1);
}
I have added rules before and after the integer to exclude certain characters. So now, we will only capture if:
There is no [, {, /, or word character immediately before the number
There is no / or word character immediately after the number
If your data could have 2-digit numbers in the { N} blocks (e.g. PCH {12}) then this will not capture those and the pattern will need to become much more complex. This solution is therefore quite brittle, without knowing more of the rules about your target data.

Related

horizontal absolute values of every line

I am trying to calculate the absolute values of line 2 - values of line 1
and then the horizontal absolute values of every line in my input file. Here's a part of that input.
43 402 51 360
63 60 69 63
65 53 89 55
103 138 135 135
109 36 123 38
To be more precise about what im trying to do I made the following example
initial data
0 2 0 0
0 1 1 1
next stage (absolute value after subscription the second line minus the first line)
2 2 0
1 0 0
final stage (horizontal application of abs values until one column remained)
0
1
The below code was a failed attempt to obtain the final stage of the single column. My problem here is that I don't know how to obtain the final (desired) stage by using subroutine, as I believe that it is a better way to solving my problem. Of course, every idea or better approach is welcome.
#!/usr/bin/perl
use feature qw(say);
use strict;
use warnings;
use Data::Dumper;
my #rows = 'table_only_numbers';
open(my $fh, '<:encoding(UTF-8)', $rows)
sub ori {
for ($num_cols=#{ $rows[$r-1]}; $num_cols=1; $num_cols-- ){
my #diff_diffs = map { abs($diffs[$_-1] - $diffs[$_]) } 1..$num_cols-1;
#final=#diff_diffs;
say join ' ',#final;
return (final) }
my $num_cols = #{ $rows[0] };
for my $r (1..$#rows) {
die "Bad format!" if #{ $rows[$r] } != $num_cols;
my #diffs = map { abs($rows[$r-1][$_] - $rows[$r][$_]) } 0..$num_cols-1;
while ($num_cols>1)
{
$final_output = ori(#{ $rows[0] })
say "final_output";
}
}
close $fh;
Finally, I figure it by myself without subroutines!!! Im posting it in case someone face the same issue in the future.I know that it is an easiest way to do it but as I am newbie in Perl it is the easiest way for me.
So I used:
for the first abs of the line 2 minus the line 1
my #data = map { abs($current[$_]-$previous[$_]) } 0..$#current;
push #final, \#data;
To obtain the absolute value of row 2 minus the row 1
And after I used 3 times as I had 3 columns left (in my case) the following coding line and each time I Substituted the #xxx with a new variable. and I have desired output of I column.
foreach my $row (#XXX) {
my #data = map { abs(#{$row}[$_]-#{$row}[$_+1]) } 0..$#{$row}-1;
say join ' ', #data;
push #XXX, \#data;}

Combining two different multi-row strings into one using Perl

I'm having some issues trying to combine two multi-row strings into one after performing regular expression manipulations on those strings. As an example, I start with data in this form:
TMS: xxxxxxx11110000
TDI: xxxxxxx00001111
TMS: xxxx00001111
TDI: xxxx11110000
To get it in the form I need, I search the file for the key word "TMS: ", extract just the data, use regular expressions to remove the "x's", reverse the data, and then place each bit on its own line and store it in a string. Resultant string would look like this:
0
0
0
0
1
1
1
1
I then search the file for "TDI: " and repeat that same process. The last step would be to concatenate the first string with the second string to get the following output (given the example above):
01
01
01
01
10
10
10
10
10
10
10
10
01
01
01
01
However, when I concatenate the two strings, what I'm getting as an output right now is
0
0
0
0
1
1
1
1
1
1
1
1
0
0
0
0
1
1
1
1
0
0
0
0
0
0
0
0
1
1
1
1
Is there a way to get the result I'm looking for without changing too much about my process? I've tried using the split command, chomp command, etc. without any luck.
Would be good to have a minimal example to see how you're approaching this problem. Additionally, there are a lot of things about your input file that are not clear. For example, are TMS and TDI always paired in the file, or do you have to check for that? Will you always take the next TDI instance to pair with the preceeding TMS event, or can they be more disjointed? Does TMS always preceed TDI or can they be reversed?
One simple way to do this assuming that the data look just like you've indicated in your example, might be to read each line and store the data in one array for the TMS string and one array for the TDI string. If both arrays are full, then we have a pair to output, so output the pair and clear the arrays for the next events. Otherwise, read the next line to get the TDI data:
#!/usr/bin/env perl
use strict;
use warnings;
my (#first, #second);
while (my $elem = <DATA>) {
($elem =~ /^TMS/)
? (#first = read_string($elem))
: (#second = read_string($elem));
if (#second) {
for my $index (0..$#first) {
print "$first[$index]$second[$index]\n";
}
print "\n";
#first = #second = ();
}
}
sub read_string {
my $string = shift;
my #bits = grep {/\d/} split('', $string);
return reverse(#bits);
}
__DATA__
TMS: xxxxxxx11110000
TDI: xxxxxxx00001111
TMS: xxxx00001111
TDI: xxxx11110000
Output from this would be:
01
01
01
01
10
10
10
10
10
10
10
10
01
01
01
01
What you want is a zip operation. Conveniently List::MoreUtils provides one for you.
#x = qw/a b c d/;
#y = qw/1 2 3 4/;
# returns [a, 1], [b, 2], [c, 3], [d, 4]
#z = zip6 #x, #y;
To get the input for zip either put your resultants into an array in the first place, or split your input string.
hobbs answer from Code Golf: Lasers was solving quite a different problem, but part of the solution was about how to "rotate" a multi-line string, and it could be useful here.
First, don't put each bit on its own line, just separate bits from different rows of input on different lines. Put the multi-line string into $_.
$_ = '0000111111110000
1111000000001111';
Now execute the following code:
$_ = do {
my $o;
$o .= "\n" while s/^./$o.=$&,""/meg;
$o };
(the substitution in hobbs's algorithm started with s/.$/.../. By using s/^./.../, it becomes an algorithm for transposition rather than for rotation)
Input:
$_ = '0000111111110000
1111000000001111';
Output:
01
01
01
01
10
10
10
10
10
10
10
10
01
01
01
01
This algorithm easily generalizes to any number of rows and columns in the input.
Input:
$_='ABCDE
12345
FGHIJ
67890';
Output:
A1F6
B2G7
C3H8
D4I9
E5J0

Insert the highest value among the number of times it occurs

I have two files:
1) Tab file with the following content. Let's call this reference file:
V$HMGIY_01_rc Ncor=0.405
V$CACD_01 Ncor=0.405
V$GKLF_02 Ncor=0.650
V$AML2_Q3 Ncor=0.792
V$WT1_Q6 Ncor=0.607
V$KID3_01 Ncor=0.668
V$CNOT3_01 Ncor=0.491
V$KROX_Q6 Ncor=0.423
V$ETF_Q6_rc Ncor=0.547
V$E2F_Q2_rc Ncor=0.653
V$SP1_Q6_01_rc Ncor=0.650
V$SP4_Q5 Ncor=0.660
2) The second tab file contains the search string X as shown below. Let's call this file as search_string:
A X
NF-E2_SC-22827 NF-E2
NRSF NRSF
NFATC1_SC-17834 NFATC1
NFKB NFKB
TCF3_SC-349 TCF3
MEF2A MEF2A
what I have already done is: Take the first search term (from search_string file; column X), check if it occurs in first column of the reference file. Example: The first search term is NF-E2. I checked if this string occurs in the first column of the reference file. If it occurs, then give a score of 1, else give 0. Also i have counted the number of times it matches the pattern. Now my output is of the format:
Keyword Keyword in file? Number of times keyword occurs in file
NF-E2 1 3
NRSF 0 0
NFATC1 0 0
NFKB 1 7
TCF3 0 0
Now, in addition to this, what I would like to add is the highest Ncor value for each string in each file. Say for example: while I search for NF-E2 in NF-E2.txt, the Ncor values present are: 3.02, 2.87 and 4.59. Then I want the value 4.59 to be printed in the next column. So now my output should look like:
Keyword Keyword in file? Number of times keyword occurs in file Ncor
NF-E2 1 3 4.59
NRSF 0 0
NFATC1 0 0
NFKB 1 7 1.66
TCF3 0 0
Please note: I need to search each string in different files i.e. The first string (Nf-E2) should be searched in file NF-E2.tab; the second string (NRSF) should be searched in file NRSF.tab and so on.
Here is my code:
perl -lanE '$str=$F[1]; $f="/home/$str/list/$str.txt"; $c=`grep -c "$str" "$f"`;chomp($c);$x=0;$x++ if $c;say "$str\t$x\t$c"' file2
PLease help!!!
This should work:
#!/usr/bin/perl
use strict;
use warnings;
while (<>) {
chomp;
my $keyword = (split /\s+/)[1];
my $file = "/home/$keyword/list/${keyword}.txt";
open my $reference, '<', "$file" or die "Cannot open $file: $!";
my $key_cnt = 0;
my $max_ncor = 0;
while (my $line = <$reference>) {
my ($string, undef, $ncor) = split /\s+|=/, $line;
if ($string =~ $keyword) {
$key_cnt++;
$max_ncor = $ncor if ($max_ncor < $ncor);
}
}
print join("\t", $keyword, $key_cnt ? 1 : 0, $key_cnt, $key_cnt ? $max_ncor : ''), "\n";
}
Run it like this:
perl t.pl search_string.txt

Greping an array obtained through NET:TELNET

I'm writing a Munin-Pluging and I like to capture the screen output from a telnet session.
The output of such a session looks as follows:
...
0x00017 0x41b3f340 BPING 0 0 0 0 198 132 330
0x00018 0x41b47340 CHKFAILED 0 0 0 0 198 132 330
0x00026 0x41b4f340 CIP 0 0 0 0 370 264 634
0x0001e 0x41b57340 CONTROL 0 1 0 0 3876 2178 6054
0x01014 0x41b5f340 UNETSRVR 0 0 0 1 296 198 494
0x00037 0x41b67340 ---- 0 0 0 0 198 132 330
0x00000 0x43b67450 ---- 0 0 0 0 0 0 0
0x00000 0x4bb67450 ---- 0 0 0 0 5084 4224 9308
0x00000 0x49367450 ---- 0 0 0 0 14742 4158 18900
-------------------------------------------------------------------------------------------
SUMMARY : 2 40 5 7 4898229 2728176 7626405
This script extract the screen content into an array (#lines).
#!/usr/bin/perl
use Net::Telnet ();
use strict;
use warnings;
my $t = new Net::Telnet (Timeout => 10);
$t->port(777);
$t->open("192.168.0.1");
$t->buffer_empty;
my #lines = $t->waitfor(match =>"m/.* SUMMARY : .* \n/");
my #gagu = grep { "$_" =~ /^.*BPING.*\n/ } #lines;
print #gagu;
Of what type is the array #lines?
Why do I always get the whole
content from grep and not a filtered line?
Is the array i got from net:telnet different from other arrays?
Yes, I'm new to Perl.
I am not familiar with this module and what it does, but I assume it gives you some sort of return value similar to what you have stated.
If you are getting all the lines in your #gagu array, that can be either that your data in the #lines array consists of just one line, or that the grep fails.
For example, #lines may contain the string:
"foo bar baz\nfoo1 bar1 baz1\n";
and not, as you expect
"foo bar baz\n";
"foo1 bar1 baz1\n";
Your grep statement probably works as expected, though you might want to consider:
Not quoting $_, since that serves no purpose.
Not using $_ at all, since that is the default variable it is not needed (except for clarity) to use it.
Not using anchors ^ and \n, because they are redundant.
For example, ^.* matches any string, anywhere. Using it to simply match a string is redundant. Ending the regex with .*\n is redundant, because all it says is "match any character except newline until we find a newline". Assuming you have newlines, it does nothing. Assuming you don't, it gives you a false negative. All you need for this match is /BPING/. So here's what your code might look like:
use Data::Dumper;
my #lines = $t->waitfor(match =>"m/ SUMMARY :/");
my #gagu = grep /BPING/, #lines;
print Dumper \#gagu;
If you want to see whitespace printed out visibly, you can use the $Data::Dumper::Useqq variable:
$Data::Dumper::Useqq = 1;
print Dumper \#gagu;
Printing variables is a very good debugging tool.

Help writing flexible splits, perl

A couple weeks ago I posted a question about trouble I was having parsing an irregularly-formatted data file. Here's a sample of the data:
01-021412 15/02/2007 207,000.00 14,839.00 18 -6 2 6 6 5 16 6 4 4 3 -28 -59 -88 -119
-149 -191 -215 -246
Atraso Promedio ---> 2.88
I need a program that would extract 01-021412, 18, count and sum all the digits in the subsequent series, and store atraso promedio, and that could repeat this operation for over 40,000 entires. I received a very helpful response, and from that was able to write the code:
use strict;
use warnings;
#Create an output file
open(OUT, ">outFull.csv");
print OUT "loanID,nPayments,atrasoPromedio,atrasoAlt,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72\n";
open(MYINPUTFILE, "<DATOS HISTORICO ASPIRE2.txt");
my #payments;
my $numberOfPayments;
my $loanNumber;
while(<MYINPUTFILE>)
{
if(/\b\d{2}-\d{6}\b/)
{
($loanNumber, undef, undef, undef, $numberOfPayments, #payments) = split;
}
elsif(m/---> *(\d*.\d*)/)
{
my (undef, undef, undef, $atrasoPromedio) = split;
my $N = scalar #payments;
print "$numberOfPayments,$N,$loanNumber\n";
if($N==$numberOfPayments){
my $total = 0;
($total+=$_) for #payments;
my $atrasoAlt = $total/$N;
print OUT "$loanNumber,$numberOfPayments,$atrasoPromedio,$atrasoAlt,",join( ',', #payments),"\n";
}
}
else
{
push(#payments, split);
}
}
This would work fine, except for the fact that about 50 percent of entries include an '*' as follows:
* 01-051948 06/03/2009 424,350.00 17,315.00 48 0 6 -2 0 21 10 9 13 10 9 7 13 3 4
12 -3 14 8 6
Atraso Promedio ---> 3.02
The asterisk causes the program to fail because it interrupts the split pattern, causing incorrect variable assignments. Until now I've dealt with this by removing the asterisks from the input data file, but I just realized that by doing this the program actually omits these loans altogether. Is there an economical way to modify my script so that it handles entries with and without asterisks?
As an aside, if an entry does include an asterisk I would like to record this fact in the output data.
Many thanks in advance,
Aaron
Use an intermediate array:
my $has_asterisk;
# ...
if(/\b\d{2}-\d{6}\b/)
{
my #fields = split;
$has_asterisk = $fields[0] eq '*';
shift #fields if $has_asterisk;
($loanNumber, undef, undef, undef, $numberOfPayments, #payments) = #fields;
}
You could discard the asterisk before doing the split :
while(<MYINPUTFILE>) {
s/^\s*\*\s*//;
if(/\b\d{2}-\d{6}\b/) {
($loanNumber, undef, undef, undef, $numberOfPayments, #payments) = split;
...
And, apart of this, you should use 3 args open, lexical filehandles and test open for failure.
my $file = 'DATOS HISTORICO ASPIRE2.txt';
open my $MYINPUTFILE, '<', $file or die "unable to open '$file' for reading : $!";
so it looks like your first if statement regex is not accounting for that '*', so how about we modify it. my perl regex skillz are a little rusty, note that this is untested.
if(/(?:\* )?\b\d{2}-\d{6}\b/)
* is a modifier meaning "zero or more times" so we need to escape it, \*
(?: ) means "group this together but don't save it", I just use that so I can apply the ? to both the space and * at the same time
At beginning of the while loop, try this:
...
while(<MYINPUTFILE>)
{
my $asterisk_exists = 0;
if (s/^\* //) {
$asterisk_exists = 1;
}
...
In addition to removing the asterisk by using the s/// function, you also keep track of whether or not the asterisk was there in the first place. With the asterisk removed, the rest of your script should function as normal.