/* start of maker a_b.c[0] */
/* start of maker a_b.c[1] */
maker ( "a_b.c[0]" )
maker ( "a_b.c[1]" )
How to extract the strings inside double quotes and store them into an array? Here's what i have tried.
open(file, "P2.txt");
#A = (<file>) ;
foreach $str(#A)
{
if($str =~ /"a_b.c"/)
{
print "$str \n";
}
}
Note: Only content inside double quotes have to be stored into an array. If you see the 1st line of example inside slashes, you'll see same string that i want to match. That shouldn't get printed. So only the string inside double quotes should be stored into an array. Even if the same string gets repeated somewhere else without double quotes, it should not get printed. .
It's not about looking for strings in double quotes. It's about defining a pattern (a regular expression) that matches the lines that you want to find.
Here's the smallest change that I can make to your code in order to make this work:
open(file, "P2.txt");
#A = (<file>) ;
foreach $str(#A)
{
if($str =~ /"a_b.c/) # <=== Change here
{
print "$str \n";
}
}
All I've done is to remove the closing double-quote from your match expression. Because you don't care what comes after that, you don't need to specify it in the regular expression.
I should point out that this isn't completely correct. In a regular expression, a dot has a special meaning (it means "match any character here") so to match an actual dot (which is what you want), you need to escape the dot with a backslash. So it should be:
if($str =~ /"a_b\.c/)
Rewriting to use a few more modern Perl practices, I would do something like this:
# Two safety nets to find problems in your code
use strict;
use warnings;
# say() is a better print()
use feature 'say';
# Use a variable for the filehandle (and declare it with 'my')
# Use three-arg version of open()
# Check return value from open() and die if it fails
open(my $file, '<', "P2.txt") or die $!;
# Read data directly from filehandle
while ($str = <$file>)
{
if ($str =~ /"a_b\.c/)
{
say $str;
}
}
You could even use the implicit variable ($_) and statement modifiers to make your loop even simpler.
while (<$file>) {
say if /"a_b\.c/;
}
Looking at the sample input you provided, the task can be paraphrased as "extract single string arguments to things that look like function invocations". It seems like there is the added complication not matching in C-style comments. For that, note perlfaq -q comment.
As the FAQ entry demonstrates, ignoring content in arbitrary C-style comments is generally not trivial. I decided to try C::Tokenize to help:
#!/usr/bin/env perl
use strict;
use warnings;
use feature 'say';
use C::Tokenize qw( tokenize );
use Const::Fast qw( const );
use Path::Tiny qw( path );
sub is_open_paren {
($_[0]->{type} eq 'grammar') && ($_[0]->{grammar} eq '(');
}
sub is_close_paren {
($_[0]->{type} eq 'grammar') && ($_[0]->{grammar} eq ')');
}
sub is_comment {
$_[0]->{type} eq 'comment';
}
sub is_string {
$_[0]->{type} eq 'string';
}
sub is_word {
$_[0]->{type} eq 'word';
}
sub find_single_string_args_in_invocations {
my ($source) = #_;
my $tokens = tokenize(path( $source )->slurp);
for (my $i = 0; $i < #$tokens; ++$i) {
next if is_comment( $tokens->[$i] );
next unless is_word( $tokens->[$i] );
next unless is_open_paren( $tokens->[$i + 1] );
next unless is_string( $tokens->[$i + 2] );
next unless is_close_paren( $tokens->[$i + 3]);
say $tokens->[$i + 2]->{string};
$i += 3;
}
}
find_single_string_args_in_invocations($ARGV[0]);
which, with your input, yields:
C:\Temp> perl t.pl test.c
"a_b.c[0]"
"a_b.c[1]"
Related
I'm having some problem with a subroutine that locates certain files and extracts some data out of them.
This subroutine is called inside a foreach loop, but whenever the call is made the loop skips to its next iteration. So I am wondering whether any of the next;'s are somehow escaping from the subroutine to the foreach loop where it is called?
To my knowledge the sub looks solid though so I'm hoping if anyone can see something I'm missing?
sub FindKit{
opendir(DH, "$FindBin::Bin\\data");
my #kitfiles = readdir(DH);
closedir(DH);
my $nametosearch = $_[0];
my $numr = 1;
foreach my $kitfile (#kitfiles)
{
# skip . and .. and Thumbs.db and non-K-files
if($kitfile =~ /^\.$/) {shift #kitfiles; next;}
if($kitfile =~ /^\.\.$/) {shift #kitfiles; next;}
if($kitfile =~ /Thumbs\.db/) {shift #kitfiles; next;}
if($kitfile =~ /^[^K]/) {shift #kitfiles; next;}
# $kitfile is the file used on this iteration of the loop
open (my $fhkits,"<","data\\$kitfile") or die "$!";
while (<$fhkits>) {}
if ($. <= 1) {
print " Empty File!";
next;
}
seek($fhkits,0,0);
while (my $kitrow = <$fhkits>) {
if ($. == 0 && $kitrow =~ /Maakartikel :\s*(\S+)\s+Montagekit.*?($nametosearch)\s{3,}/g) {
close $fhkits;
return $1;
}
}
$numr++;
close $fhkits;
}
return 0;
}
To summarize comments, the refactored code:
use File::Glob ':bsd_glob';
sub FindKit {
my $nametosearch = $_[0];
my #kitfiles = glob "$FindBin::Bin/data/K*"; # files that start with K
foreach my $kitfile (#kitfiles)
{
open my $fhkits, '<', $kitfile or die "$!";
my $kitrow_first_line = <$fhkits>; # read first line
return if eof; # next read is end-of-file so it was just header
my ($result) = $kitrow_first_line =~
/Maakartikel :\s*(\S+)\s+Montagekit.*?($nametosearch)\s{3,}/;
return $result if $result;
}
return 0;
}
I use core File::Glob and enable :bsd_glob option, which can handle spaces in filenames. I follow the docs note to use "real slash" on Win32 systems.
I check whether there is only a header line using eof.†
I do not see how this can affect the calling code, other than by its return value. Also, I don't see how the posted code can make the caller skip the beat, either. That problem is unlikely to be in this sub.
Please let me know if I missed some point with the above rewrite.
† Previous version used to check whether there is just one (header) line by
1 while <$fhkits>; # check number of lines ...
return if $. == 1; # there was only one line, the header
Also correct but eof is way better
The thing that is almost certainly screwing you here, is that you are shifting the list that you are iterating.
That's bad news, as you're deleting elements ... but in places you aren't necessarily thinking.
For example:
#!/usr/bin/env perl
use strict;
use warnings;
my #list = qw ( one two three );
my $count;
foreach my $value ( #list ) {
print "Iteration ", ++$count," value is $value\n";
if ( $value eq 'two' ) { shift #list; next };
}
print "#list";
How many times do you think that should iterate, and which values should end up in the array?
Because you shift you never process element 'three' and you delete element 'one'. That's almost certainly what's causing you problems.
You also:
open using a relative path, when your opendir used an absolute one.
skip a bunch of files, and then skip anything that doesn't start with K. Why not just search for things that do start with K?
read the file twice, and one is to just check if it's empty. The perl file test -z will do this just fine.
you set $kitrow for each line in the file, but don't really use it for anything other than pattern matching. It'd probably work better using implicit variables.
You only actually do anything on the first line - so you don't ever need to iterate the whole file. ($numr seems to be discarded).
you use a global match, but only use one result. The g flag seems redundant here.
I'd suggest a big rewrite, and do something like this:
#!/usr/bin/env perl
use strict;
use warnings;
use FindBin;
sub FindKit{
my ($nametosearch) = #_;
my $numr = 1;
foreach my $kitfile (glob "$FindBin::Bin\\data\\K*" )
{
if ( -z $kitfile ) {
print "$kitfile is empty\n";
next;
}
# $kitfile is the file used on this iteration of the loop
open (my $fhkits,"<", $kitfile) or die "$!";
<$kitfile> =~ m/Maakartikel :\s*(\S+)\s+Montagekit.*?($nametosearch)\s{3,}/
and return $1;
return 0;
}
}
As a big fan of the Path::Tiny module (me have it always installed and using it in every project) my solution would be:
use strict;
use warnings;
use Path::Tiny;
my $found = FindKit('mykit');
print "$found\n";
sub FindKit {
my($nametosearch) = #_;
my $datadir = path($0)->realpath->parent->child('data');
die "$datadir doesn't exists" unless -d $datadir;
for my $file ($datadir->children( qr /^K/ )) {
next if -z $file; #skip empty
my #lines = $file->lines;
return $1 if $lines[0] =~ /Maakartikel :\s*(\S+)\s+Montagekit.*?($nametosearch)\s{3,}/;
}
return;
}
Some comments and still opened issues:
Using the Path::Tiny you could always use forward slashes in the path-names, regardless of the OS (UNIX/Windows), e.g. the data/file will work on windows too.
AFAIK the FindBin is considered broken - so the above uses the $0 and realpath ...
what if the Kit is in multiple files? The above always returns on the 1st found one
the my #lines = $file->lines; reads all lines - unnecessary - but on small files doesn't big deal.
the the reality this function returns the arg for the Maakartikel, so probably better name would be find_articel_by_kit or find_articel :)
easy to switch to utf8 - just change the $file->lines to $file->lines_utf8;
I have the following code that uses Text::CSV:
#!/usr/bin/perl
package main;
use strict;
use warnings;
use Text::CSV;
my $csv = Text::CSV -> new ({ binary => 0, eol => $/ });
open my $io, "<", $file or die "$file: $!";
while (my $row = $csv -> getline ($io))
{
my #fields = #$row;
while(my $field = <#fields>)
{
print $field."\n";
}
}
(adapted from the Text::CSV documentation)
When I try to loop through my #fields array after assigning it the values from #$row, it breaks up the values on spaces. For instance
"FOO","BAR","IM FOO BAR'D"
comes out as
FOO
BAR
IM
FOO
BAR'D
Why is this happening and how can I fix it?
Try
for my $field (#fields)
instead of
while(my $field = <#fields>)
That while statement does not do what you think it does. It's equivalent to
while (my $field = glob "#fields")
which isn't remotely what you meant. (glob breaks its argument on spaces and tries to expand wildcards, matching files on disk. Your argument doesn't have any wildcards, so it winds up meaning about the same as split ' ', "#fields").
The problem is in your second loop:
while(my $field = <#fields>)
{
print $field."\n";
}
You can see what's actually going on using B::Deparse:
$ perl -MO=Deparse -e 'while (my $field = <#fields>) { print $field."\n" }'
use File::Glob ();
while (defined(my $field = glob(join($", #fields)))) {
do {
print $field . "\n"
};
}
-e syntax OK
Let's break that down a bit:
join($", #fields)
joins the elements of #fields into a single string, separated by $" (default is a single space). So if #fields contains FOO, BAR, and IM FOO BAR'D, the result of the join will be
FOO BAR IM FOO BAR'D
Now, what does glob do? From perldoc -f glob:
In list context, returns a (possibly empty) list of filename expansions on the value of EXPR such as the standard Unix shell /bin/csh would do. In scalar context, glob iterates through such filename expansions, returning undef when the list is exhausted. This is the internal function implementing the <*.c> operator [...]
Note that glob splits its arguments on whitespace and treats each segment as separate pattern.
So
glob("FOO BAR IM FOO BAR'D")
in scalar context will return FOO, followed by BAR, followed by IM, and so on.
As cjm suggests, change your loop to something like this to fix:
foreach my $field (#fields)
{
print "$field\n";
}
Or better yet, instead of copying the contents of #$row into #fields and looping through #fields, simply loop through #$row directly:
foreach my $field (#$row) {
print "$field\n";
}
I have been learning perl for the past two weeks. I have been writing some perl scripts for my school project. I need to parse a text file for multiple strings. I searched perl forums and got some information.The below function parses a text file for one string and returns a result. However I need the script to search the file for multiple strings.
use strict;
use warnings;
sub find_string {
my ($file, $string) = #_;
open my $fh, '<', $file;
while (<$fh>) {
return 1 if /\Q$string/;
}
die "Unable to find string: $string";
}
find_string('filename', 'string');
Now for instance if the file contains multiple strings with regular expressions as listed below
"testing"
http://www.yahoo.com =1
http://www.google.com=2
I want the function to search for multiple strings like
find_string('filename', 'string1','string2','string3');
Please can somebody explain me how i need to do that.It would be really helpful
Going through this very quickly here:
You right now pass the name of a file, and one string. What if you pass multiple strings:
if ( find_string ( $file, #strings ) ) {
print "Found a string!\n";
}
else {
print "No string found\n";
}
..
sub find_string {
my $file = shift;
my #strings = #_;
#
# Let's make the strings into a regular expression
#
my $reg_exp = join "|" ,#strings; # Regex is $string1|$string2|$string3...
open my $fh, "<", $file or die qq(Can't open file...);
while ( my $line = <$fh> ) {
chomp $line;
if ( $line =~ $reg_exp ) {
return 1; # Found the string
}
}
return 0; # String not found
}
I am about to go into a meeting, so I haven't really even tested this, but the idea is there. A few things:
You want to handle characters in your strings that could be regular expression characters. You can use either the quotemeta command, or use \Q and \E before and after each string.
Think about using use autodie to handle files that can't be open. Then, you don't have to check your open statement (like I did above).
There are limitations. This would be awful if you were searching for 1,000 different strings, but should be okay with a few.
Note how I use a scalar file handle ($fh). Instead of opening your file via the subroutine, I would pass in a scalar file handle. This would allow you to take care of an invalid file issue in your main program. That's the big advantage of scalar file handles: They can be easily passed to subroutines and stored in class objects.
Tested Program
#! /usr/bin/env perl
#
use strict;
use warnings;
use autodie;
use feature qw(say);
use constant {
INPUT_FILE => 'test.txt',
};
open my $fh, "<", INPUT_FILE;
my #strings = qw(foo fo+*o bar fubar);
if ( find_string ( $fh, #strings ) ) {
print "Found a string!\n";
}
else {
print "No string found\n";
}
sub find_string {
my $fh = shift; # The file handle
my #strings = #_; # A list of strings to look for
#
# We need to go through each string to make sure there's
# no special re characters
for my $string ( #strings ) {
$string = quotemeta $string;
}
#
# Let's join the stings into one big regular expression
#
my $reg_exp = join '|', #strings; # Regex is $string1|$string2|$string3...
$reg_exp = qr($reg_exp); # This is now a regular expression
while ( my $line = <$fh> ) {
chomp $line;
if ( $line =~ $reg_exp ) {
return 1; # Found the string
}
}
return 0; # String not found
}
autodie handles issues when I can't open a file. No need to check for it.
Notice I have three parameters in my open. This is the preferred way.
My file handle is $fh which allows me to pass it to my find_string subroutine. Open the file in the main program, and I can handle read errors there.
I loop through my #strings and use the quotemeta command to automatically escape special regular expression characters.
Note that when I change $string in my loop, it actually modifies the #strings array.
I use qr to create a regular expression.
My regular expression is /foo|fo\+\*o|bar|fubar/.
There are a few bugs For example, the string fooburberry will match with foo. Do you want that, or do you want your strings to be whole words?
I'm happy to see use strict and use warnings in your script. Here is one basic way to do it.
use strict;
use warnings;
sub find_string {
my ($file, $string1, $string2, $string3) = #_;
my $found1 = 0;
my $found2 = 0;
my $found3 = 0;
open my $fh, '<', $file;
while (<$fh>) {
if ( /$string1/ ) {
$found1 = 1;
}
if ( /$string2/ ) {
$found2 = 1;
}
if ( /$string3/ ) {
$found3 = 1;
}
}
if ( $found1 == 1 and $found2 == 1 and $found3 == 1 ) {
return 1;
} else {
return 0;
}
}
my $result = find_string('filename', 'string1'. 'string2', 'string3');
if ( $result == 1 ) {
print "Found all three strings\n";
} else {
print "Didn't find all three\n";
}
I think you can store the file content in an array first, then grep the input in the array.
use strict;
use warnings;
sub find_multi_string {
my ($file, #strings) = #_;
my $fh;
open ($fh, "<$file");
#store the whole file in an array
my #array = <$fh>;
for my $string (#strings) {
if (grep /$string/, #array) {
next;
} else {
die "Cannot find $string in $file";
}
}
return 1;
}
I need some help with following perl code.
#!perl -w
use strict;
use warnings;
open my $file, '<', 'ubb' or die $1;
my $spool = 0;
my #matchingLines;
while (<$file>) {
if (/GROUPS/i) {
$spool = 1;
next;
}
elsif (/SERVERS/i) {
$spool = 0;
print map { "$_" } #matchingLines;
#matchingLines = ();
}
if ($spool) {
push (#matchingLines, $_);
}
}
close ($file);
Output from that is shown below.
ADM LMID=GW_S4_1_PM,GW_S4_2_BM
GRPNO=1
ADM_TMS LMID=GW_S4_1_PM,GW_S4_2_BM
GRPNO=2
TMSNAME=TMS
ADM_1 LMID=GW_S4_1_PM
GRPNO=11
ADM_2 LMID=GW_S4_2_BM
GRPNO=12
DMWSG_Gateway_1 LMID=GW_S4_1_PM
GRPNO=101
ENVFILE="../GW_S4.Gateway.envfile"
DMWSG_Gateway_2 LMID=GW_S4_2_BM
GRPNO=201
ENVFILE="../GW_S4.Gateway.envfile"
DMWSG_1 LMID=GW_S4_1_PM
GRPNO=106
DMWSG_2 LMID=GW_S4_2_BM
GRPNO=206
But I only would like to get the first word of each line (e.g. ADM, ADM_TMS, ADM_1).
Note that the file has a lot of other lines above and below what's printed here. I only want to do this for lines that is in between GROUPS and SERVERS.
I would suggest 2 changes in your code
Note: Tested these with your sample data (plus other stuff) in your question.
I: Extract first word before push
Change this
push (#matchingLines, $_);
to
push (#matchingLines, /^(\S+)/);
This would push the first word of each line into the array, instead of the entire line.
Note that /^(\S+)/ is shorthand for $_ =~ /^(\S+)/. If you're using an explicit loop variable like in 7stud's answer, you can't use this shorthand, use the explicit syntax instead, say $line =~ /^(\S+)/ or whatever your loop variable is.
Of course, you can also use split function as suggested in 7stud's answer.
II: Change how you print
Change this
print map { "$_" } #matchingLines;
into
local $" = "\n";
print "#matchingLines \n";
$" specifies the delimiter used for list elements when the array is printed with print or say inside double quotes.
Alternatively, as per TLP's suggestion,
$\ = $/;
print for #lines;
or
print join("\n", #lines), "\n"
Note that $/ is the input record separator (newline by default), $\ is the output record separator (undefined by default). $\ is appended after each print command.
For more information on $/, $\, and $":
See perldoc perlvar (just use CTRL+F to find them in that page)
Or you can simply use perldoc -v '$/' etc on your console to get those information.
Note on readability
I don't think implicit regex matching i.e. /pattern/ is bad per se.
But matching against a variable, i.e. $variable =~ /pattern/ is more readable (as in you can immediately see there's a regex matching going on) and more beginner-friendly, at the cost of conciseness.
use strict;
use warnings;
use 5.014; #say()
my $fname = 'data.txt';
open my $INFILE, '<', $fname
or die "Couldn't open $fname: $!"; #-->Not $1"
my $recording_on = 0;
my #matching_lines;
for my $line (<$INFILE>) {
if ($line =~ /groups/i) {
$recording_on = 1;
next;
}
elsif ($line =~ /servers/i) {
say for #matching_lines; #say() is the same as print(), but it adds a newline at the end
#matching_lines = ();
$recording_on = 0;
}
if ($recording_on) {
my ($first_word, $trash) = split " ", $line, 2;
push #matching_lines, $first_word;
}
}
close $INFILE;
You can use the flip-flop operator (range) to select a part of your input. The idea of this operator is that it returns false until its LHS (left hand side) returns true, and after that it returns true until its RHS returns false, after which it is reset. It is somewhat like preserving a state.
Note that the edge lines are also included in the match, so we need to remove those. After that, use doubleDown's idea and push /^(\S+)/ onto an array. The nice thing about using this with push is that the capture regex returns an empty list if it fails, and this gives us a warning-free failure when the regex does not match.
use strict;
use warnings;
my #matches;
while (<>) {
if (/GROUPS/i .. /SERVERS/i) { # flip-flop remembers the matches
next if (/GROUPS/i or /SERVERS/i);
push #matches, /^(\S+)/;
}
}
# #matches should now contain the first words of those lines
I need to detect if the first character in a file is an equals sign (=) and display the line number. How should I write the if statement?
$i=0;
while (<INPUT>) {
my($line) = $_;
chomp($line);
$findChar = substr $_, 0, 1;
if($findChar == "=")
$output = "$i\n";
print OUTPUT $output;
$i++;
}
Idiomatic perl would use a regular expression (^ meaning beginning of line) plus one of the dreaded builtin variables which happens to mean "line in file":
while (<INPUT>) {
print "$.\n" if /^=/;
}
See also perldoc -v '$.'
Use $findChar eq "=". In Perl:
== and != are numeric comparisons. They will convert both operands to a number.
eq and ne are string comparisons. They will convert both operands to a string.
Yes, this is confusing. Yes, I still write == when I mean eq ALL THE TIME. Yes, it takes me forever to spot my mistake too.
It looks like you are not using strict and warnings. Use them, especially since you do not know Perl, you might also want to add diagnostics to the list of must-use pragmas.
You are keeping track of the input line number in a separate variable $i. Perl has various builtin variables documented in perlvar. Some of these, such as $. are very useful use them.
You are using my($line) = $_; in the body of the while loop. Instead, avoid $_ and assign to $line directly as in while ( my $line = <$input> ).
Note that bareword filehandles such as INPUT are package global. With the exception of the DATA filehandle, you are better off using lexical filehandles to properly limit the scope of your filehandles.
In your posts, include sample data in the __DATA_ section so others can copy, paste and run your code without further work.
With these comments in mind, you can print all lines that do not start with = using:
#!/usr/bin/perl
use strict; use warnings;
while (my $line = <DATA> ) {
my $first_char = substr $line, 0, 1;
if ( $first_char ne '=' ) {
print "$.:$first_char\n";
}
}
__DATA__
=
=
a
=
+
However, I would be inclined to write:
while (my $line = <DATA> ) {
# this will skip blank lines
if ( my ($first_char) = $line =~ /^(.)/ ) {
print "$.:$first_char\n" unless $first_char eq '=';
}
}