While reading an input file in Perl, I get the line below:
u_pwrup_control/g_pwrup_bscan_cell[262]_u_pwrup_bscan
Now I want to find a similar line in a reference file using a regexp, but it does not match when I use the code below.
while(<INPUT_FILE>){
$k=$_;
##opening ref file in read mode
while(<REF_FILE>) {
if ($_ =~ /$k/) {
print $_;
} else {
print "$k is not matching";
}
}
}
Please tell me how to match [ and ] without escaping them with \.
You are looking for the function quotemeta. Alternatively, you can use \Q...\E inside the regex (more information about that in perlre).
Applied to your code :
either do $k = quotemeta $_; (the $_ is optional though), instead of $k = $_;
or keep $k = $_; and in the regex, do $_ =~ /\Q$k/.
You didn't provide a lot of details in your question, so I can't guarantee that this will actually match what you are trying to match, but at least [ and ] (and any other unsafe character) will be escaped in the regex.
In particular, you might want to chomp in both while loops after reading the lines, but it really depends on what you are reading.
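As a minimal sketch of the \Q...\E approach, assuming the reference lines are already in memory (the sub name matching_lines and the sample reference lines are made up for illustration):

```perl
use strict;
use warnings;

# Return every line from @lines that contains $needle as a literal string.
# \Q...\E escapes regex metacharacters such as [ and ] in $needle.
sub matching_lines {
    my ($needle, @lines) = @_;
    chomp $needle;    # drop the trailing newline read from the input file
    return grep { /\Q$needle\E/ } @lines;
}

my $k = "u_pwrup_control/g_pwrup_bscan_cell[262]_u_pwrup_bscan\n";

# Hypothetical reference lines: only the first contains the literal text.
my @ref = (
    "top/u_pwrup_control/g_pwrup_bscan_cell[262]_u_pwrup_bscan/Q\n",
    "top/u_pwrup_control/g_pwrup_bscan_cell2_u_pwrup_bscan/Q\n",
);

print for matching_lines($k, @ref);    # prints only the first reference line
```

Without \Q, [262] would be read as a character class matching a single 2 or 6, so the second line would match and the first would not.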
But your code could be improved in many ways, including:
Always add use strict; and use warnings; at the beginning of your script.
Related to the previous point: use lexically scoped variables (i.e. declared with my) instead of your undeclared global ones. So write my $k = ... instead of just $k = ... (only where you first declare it).
Instead of doing your first while like that :
while (<INPUT_FILE>){ $k = $_; ... }
It would be much cleaner to do something like :
while (my $k = <INPUT_FILE>) { ... }
Using $_ is convenient in a lot of cases, but in that one, it's really not.
Don't use global filehandles, but instead use lexical variables :
open my $INPUT_FILE, '<', 'your_file_name' or die $!;
And then you can use them the same way as your old global ones: while (<$INPUT_FILE>) { ... }
/* start of maker a_b.c[0] */
/* start of maker a_b.c[1] */
maker ( "a_b.c[0]" )
maker ( "a_b.c[1]" )
How can I extract the strings inside double quotes and store them in an array? Here's what I have tried.
open(file, "P2.txt");
@A = (<file>);
foreach $str (@A)
{
if($str =~ /"a_b.c"/)
{
print "$str \n";
}
}
Note: Only the content inside double quotes has to be stored in the array. The first line of the example, inside the comment delimiters, contains the same string that I want to match; that should not be printed. Even if the same string is repeated somewhere else without double quotes, it should not be printed.
It's not about looking for strings in double quotes. It's about defining a pattern (a regular expression) that matches the lines that you want to find.
Here's the smallest change that I can make to your code in order to make this work:
open(file, "P2.txt");
@A = (<file>);
foreach $str (@A)
{
if($str =~ /"a_b.c/) # <=== Change here
{
print "$str \n";
}
}
All I've done is to remove the closing double-quote from your match expression. Because you don't care what comes after that, you don't need to specify it in the regular expression.
I should point out that this isn't completely correct. In a regular expression, a dot has a special meaning (it means "match any character here") so to match an actual dot (which is what you want), you need to escape the dot with a backslash. So it should be:
if($str =~ /"a_b\.c/)
Rewriting to use a few more modern Perl practices, I would do something like this:
# Two safety nets to find problems in your code
use strict;
use warnings;
# say() is a better print()
use feature 'say';
# Use a variable for the filehandle (and declare it with 'my')
# Use three-arg version of open()
# Check return value from open() and die if it fails
open(my $file, '<', "P2.txt") or die $!;
# Read data directly from filehandle
while (my $str = <$file>)
{
if ($str =~ /"a_b\.c/)
{
say $str;
}
}
You could even use the implicit variable ($_) and statement modifiers to make your loop even simpler.
while (<$file>) {
say if /"a_b\.c/;
}
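The question also asked for the quoted strings to end up in an array. A minimal sketch of that, assuming at most one quoted string per line and using the sample lines from the question as in-memory data (the sub name quoted_strings is made up for illustration):

```perl
use strict;
use warnings;

# Collect the text between double quotes from each line whose quoted
# string starts with a_b.c. The comment lines contain no double quotes,
# so they never match.
sub quoted_strings {
    my @found;
    for my $line (@_) {
        push @found, $1 if $line =~ /"(a_b\.c[^"]*)"/;
    }
    return @found;
}

my @sample = (
    '/* start of maker a_b.c[0] */',
    '/* start of maker a_b.c[1] */',
    'maker ( "a_b.c[0]" )',
    'maker ( "a_b.c[1]" )',
);

my @A = quoted_strings(@sample);
print "$_\n" for @A;    # a_b.c[0] and a_b.c[1], one per line
```

The capture group keeps only the text between the quotes; the pattern would need adjusting if a line could hold several quoted strings.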
Looking at the sample input you provided, the task can be paraphrased as "extract single string arguments to things that look like function invocations". There seems to be the added complication of not matching inside C-style comments. For that, see perldoc -q comment.
As the FAQ entry demonstrates, ignoring content in arbitrary C-style comments is generally not trivial. I decided to try C::Tokenize to help:
#!/usr/bin/env perl
use strict;
use warnings;
use feature 'say';
use C::Tokenize qw( tokenize );
use Const::Fast qw( const );
use Path::Tiny qw( path );
sub is_open_paren {
($_[0]->{type} eq 'grammar') && ($_[0]->{grammar} eq '(');
}
sub is_close_paren {
($_[0]->{type} eq 'grammar') && ($_[0]->{grammar} eq ')');
}
sub is_comment {
$_[0]->{type} eq 'comment';
}
sub is_string {
$_[0]->{type} eq 'string';
}
sub is_word {
$_[0]->{type} eq 'word';
}
sub find_single_string_args_in_invocations {
my ($source) = @_;
my $tokens = tokenize(path( $source )->slurp);
for (my $i = 0; $i < @$tokens; ++$i) {
next if is_comment( $tokens->[$i] );
next unless is_word( $tokens->[$i] );
next unless is_open_paren( $tokens->[$i + 1] );
next unless is_string( $tokens->[$i + 2] );
next unless is_close_paren( $tokens->[$i + 3]);
say $tokens->[$i + 2]->{string};
$i += 3;
}
}
find_single_string_args_in_invocations($ARGV[0]);
which, with your input, yields:
C:\Temp> perl t.pl test.c
"a_b.c[0]"
"a_b.c[1]"
I'm reading this textfile to get ONLY the words in it and ignore all kind of whitespaces:
hello
now
do you see this.sadslkd.das,msdlsa but
i hoohoh
And this is my Perl code:
#!/usr/bin/perl -w
require 5.004;
open F1, './text.txt';
while ($line = <F1>) {
#print $line;
@arr = split /\s+/, $line;
foreach $w (@arr) {
if ($w !~ /^\s+$/) {
print $w."\n";
}
}
#print @arr;
}
close F1;
And this is the output:
hello
now
do
you
see
this.sadslkd.das,msdlsa
but
i
hoohoh
The output is showing two newlines but I am expecting the output to be just words. What should I do to just get words?
You should always use strict and use warnings (in preference to the -w command-line qualifier) at the top of every Perl program, and declare each variable at its first point of use using my. That way Perl will tell you about simple errors that you may otherwise overlook.
You should also use lexical file handles with the three-parameter form of open, and check the status to make sure it succeeded. There is little point in explicitly closing an input file unless you expect your program to run for an appreciable time, as Perl will close all files for you on exit.
Do you really need to require Perl v5.4? That version is fifteen years old, and if there is anything older than that installed then you have a museum!
Your program would be better like this:
use strict;
use warnings;
open my $fh, '<', './text.txt' or die $!;
while (my $line = <$fh>) {
my @arr = split /\s+/, $line;
foreach my $w (@arr) {
if ($w !~ /^\s+$/) {
print $w."\n";
}
}
}
Note: my apologies. The warnings pragma and lexical file handles were introduced only in v5.6, so that part of my answer is irrelevant. The latest version of Perl is v5.16 and you really should upgrade.
As Birei has pointed out, the problem is that, when the line has leading whitespace, there is an empty field before the first separator. Imagine if your data was comma-separated: you would want Perl to report a leading empty field if the line started with a comma.
To extract all the non-space characters you can use a regular expression that does exactly that:
my @arr = $line =~ /\S+/g;
and this can be emulated by using the default parameter for split, which is a single quoted space (not a regular expression):
my @arr = split ' ', $line;
In this case split behaves like the awk utility and discards any leading empty fields as you expected.
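A small sketch comparing the three approaches on a line with leading whitespace (the sample line and the variable names are made up for illustration):

```perl
use strict;
use warnings;

my $line = "   do you see\n";    # note the leading spaces

my @by_pattern = split /\s+/, $line;   # ('', 'do', 'you', 'see')
my @by_space   = split ' ',   $line;   # ('do', 'you', 'see')
my @by_match   = $line =~ /\S+/g;      # ('do', 'you', 'see')

printf "%d %d %d\n", scalar @by_pattern, scalar @by_space, scalar @by_match;
# prints "4 3 3"
```

Only the /\s+/ version keeps the empty leading field, which is what shows up as a blank line in the question's output.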
This is even simpler if you let Perl use the $_ variable in the read loop, as all of the parameters for split can be defaulted:
while (<F1>) {
my @arr = split;
foreach my $w (@arr) {
print "$w\n" if $w !~ /^\s+$/;
}
}
This line is the problem:
@arr=split(/\s+/,$line);
\s+ matches the leading spaces, so split returns an empty field before them. Use ' ' instead.
@arr=split(' ',$line);
I believe that in this line:
if(!($w =~ /^\s+$/))
you wanted to say: if there is nothing in this field, don't print it.
But the "+" in the regex actually forces it to match at least one whitespace character.
If you change the "\s+" to "\s*", you'll see that it works, because * means zero or more occurrences.
I need to detect if the first character in a file is an equals sign (=) and display the line number. How should I write the if statement?
$i=0;
while (<INPUT>) {
my($line) = $_;
chomp($line);
$findChar = substr $_, 0, 1;
if($findChar == "=")
$output = "$i\n";
print OUTPUT $output;
$i++;
}
Idiomatic perl would use a regular expression (^ meaning beginning of line) plus one of the dreaded builtin variables which happens to mean "line in file":
while (<INPUT>) {
print "$.\n" if /^=/;
}
See also perldoc -v '$.'
Use $findChar eq "=". In Perl:
== and != are numeric comparisons. They will convert both operands to a number.
eq and ne are string comparisons. They will convert both operands to a string.
Yes, this is confusing. Yes, I still write == when I mean eq ALL THE TIME. Yes, it takes me forever to spot my mistake too.
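A minimal sketch of why == goes wrong here (the helper names same_string and same_number are made up for illustration):

```perl
use strict;
use warnings;

# String comparison: true only when the characters are identical.
sub same_string {
    my ($x, $y) = @_;
    return $x eq $y ? 1 : 0;
}

# Numeric comparison: non-numeric strings such as "=" and "a" are both
# converted to 0, so they compare equal.
sub same_number {
    my ($x, $y) = @_;
    no warnings 'numeric';    # silence "isn't numeric" warnings for the demo
    return $x == $y ? 1 : 0;
}

print same_string("=", "a"), "\n";    # prints 0
print same_number("=", "a"), "\n";    # prints 1
```

With use warnings enabled and without the no warnings line, Perl would warn that the argument isn't numeric, which is one more reason to turn warnings on.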
It looks like you are not using strict and warnings. Use them, especially since you do not know Perl. You might also want to add diagnostics to the list of must-use pragmas.
You are keeping track of the input line number in a separate variable $i. Perl has various builtin variables documented in perlvar. Some of these, such as $., are very useful. Use them.
You are using my($line) = $_; in the body of the while loop. Instead, avoid $_ and assign to $line directly as in while ( my $line = <$input> ).
Note that bareword filehandles such as INPUT are package global. With the exception of the DATA filehandle, you are better off using lexical filehandles to properly limit the scope of your filehandles.
In your posts, include sample data in the __DATA__ section so others can copy, paste and run your code without further work.
With these comments in mind, you can print all lines that do not start with = using:
#!/usr/bin/perl
use strict; use warnings;
while (my $line = <DATA> ) {
my $first_char = substr $line, 0, 1;
if ( $first_char ne '=' ) {
print "$.:$first_char\n";
}
}
__DATA__
=
=
a
=
+
However, I would be inclined to write:
while (my $line = <DATA> ) {
# this will skip blank lines
if ( my ($first_char) = $line =~ /^(.)/ ) {
print "$.:$first_char\n" unless $first_char eq '=';
}
}
I've been programming in Perl for a while, but I never have understood a couple of subtleties about Perl:
The use and the setting/unsetting of the $_ variable confuses me. For instance, why does
# ...
shift @queue;
($item1, @rest) = split /,/;
work, but (at least for me)
# ...
shift @queue;
/some_pattern.*/ or die();
does not seem to work?
Also, I don't understand the difference between iterating through a file using foreach versus while. For instance, I seem to be getting different results for
while(<SOME_FILE>){
# Do something involving $_
}
and
foreach (<SOME_FILE>){
# Do something involving $_
}
Can anyone explain these subtle differences?
shift @queue;
($item1, @rest) = split /,/;
If I understand you correctly, you seem to think that this shifts off an element from @queue to $_. That is not true.
The value that is shifted off of @queue simply disappears. The following split operates on whatever is contained in $_ (which is independent of the shift invocation).
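A short sketch of that behaviour, with made-up queue contents and variable names:

```perl
use strict;
use warnings;

my @queue = ('a,b,c', 'd,e,f');

$_ = 'x,y,z';
shift @queue;                        # 'a,b,c' is removed and discarded
my ($item1, @rest) = split /,/;      # splits $_, not the shifted value
print "$item1\n";                    # prints x

# To work on the shifted element, capture it explicitly.
my $element = shift @queue;          # 'd,e,f'
my ($first, @others) = split /,/, $element;
print "$first\n";                    # prints d
```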
while(<SOME_FILE>){
# Do something involving $_
}
Reading from a filehandle in a while statement is special: It is equivalent to
while ( defined( $_ = readline *SOME_FILE ) ) {
This way, you can process even colossal files line-by-line.
On the other hand,
for(<SOME_FILE>){
# Do something involving $_
}
will first load the entire file as a list of lines into memory. Try a 1GB file and see the difference.
Another, albeit subtle, difference between:
while (<FILE>) {
}
and:
foreach (<FILE>) {
}
is that while() will modify the value of $_ outside of its scope, whereas, foreach() makes $_ local. For example, the following will die:
$_ = "test";
while (<FILE1>) {
print "$_";
}
die if $_ ne "test";
whereas, this will not:
$_ = "test";
foreach (<FILE1>) {
print "$_";
}
die if $_ ne "test";
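Both cases can be checked without a real file by reading from an in-memory filehandle. This is a sketch; the data and the variable names are made up for illustration:

```perl
use strict;
use warnings;

my $data = "line 1\nline 2\n";

# foreach aliases $_ to each element and restores the old value afterwards.
$_ = "test";
open my $fh, '<', \$data or die $!;
foreach (<$fh>) { }
my $after_foreach = $_;              # still "test"
close $fh;

# while (<$fh>) assigns to the global $_ and leaves it undefined at EOF.
$_ = "test";
open $fh, '<', \$data or die $!;
while (<$fh>) { }
my $after_while = $_;                # undef
close $fh;

print defined($after_foreach) ? "foreach kept '$after_foreach'\n"
                              : "foreach left \$_ undefined\n";
print defined($after_while)   ? "while kept '$after_while'\n"
                              : "while left \$_ undefined\n";
```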
This becomes more important with more complex scripts. Imagine something like:
sub func1() {
while (<$fh2>) { # clobbers $_ set from <$fh1> below
<...>
}
}
while (<$fh1>) {
func1();
<...>
}
Personally, I stay away from using $_ for this reason, in addition to it being less readable, etc.
Regarding the 2nd question:
while (<FILE>) {
}
and
foreach (<FILE>) {
}
Have the same functional behavior, including setting $_. The difference is that while() evaluates <FILE> in a scalar context, while foreach() evaluates <FILE> in a list context. Consider the difference between:
$x = <FILE>;
and
@x = <FILE>;
In the first case, $x gets the first line of FILE, and in the second case @x gets the entire file. Each entry in @x is a different line in FILE.
So, if FILE is very big, you'll waste memory slurping it all at once using foreach (<FILE>) compared to while (<FILE>). This may or may not be an issue for you.
The place where it really matters is if FILE is a pipe descriptor, as in:
open FILE, "some_shell_program|";
Now foreach(<FILE>) must wait for some_shell_program to complete before it can enter the loop, while while(<FILE>) can read the output of some_shell_program one line at a time and execute in parallel to some_shell_program.
That said, the behavior with regard to $_ remains unchanged between the two forms.
foreach evaluates the entire list up front. while evaluates the condition to see if it's true on each pass. while should be considered for incremental operations, foreach only for list sources.
For example:
my $t= time() + 10 ;
while ( $t > time() ) {
# do something
}
StackOverflow: What’s the difference between iterating over a file with foreach or while in Perl?
It is to avoid this sort of confusion that it's considered better form to avoid using the implicit $_ constructions.
my $element = shift @queue;
($item,@rest) = split /,/ , $element;
or
($item,@rest) = split /,/, shift @queue;
likewise
while(my $foo = <SOMEFILE>){
# do something
}
or
foreach my $thing(<FILEHANDLE>){
# do something
}
while only checks whether the value is true; for also places the value in $_. There are exceptions: for example, <> will set $_ if it is used as the condition of a while loop.
To get behaviour similar to:
foreach(qw'a b c'){
# Do something involving $_
}
You have to set $_ explicitly.
my @list = qw'a b c';
while( $_ = shift @list ){
# Do something involving $_
}
It is better to explicitly set your variables
for my $line(<SOME_FILE>){
}
or better yet
while( my $line = <SOME_FILE> ){
}
which will only read in the file one line at a time.
Also, shift doesn't set $_ unless you specifically ask it to:
$_ = shift @_;
And split works on $_ by default. If used in scalar or void context, it will populate @_ (in older Perls only; the implicit split to @_ was removed in Perl 5.12).
Please read perldoc perlvar so that you will have an idea of the different variables in Perl.
I'm looking through perl code and I see this:
sub html_filter {
my $text = shift;
for ($text) {
s/&/&amp;/g;
s/</&lt;/g;
s/>/&gt;/g;
s/"/&quot;/g;
}
return $text;
}
what does the for loop do in this case and why would you do it this way?
The for loop aliases each element of the list it's looping over to $_. In this case, there is only one element, $text.
Within the body, this allows one to write
s/&/&amp;/g;
etc. instead of having to write
$text =~ s/&/&amp;/g;
repeatedly. See also perldoc perlsyn.
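A minimal sketch of the aliasing, assuming the substitutions are meant to produce the usual HTML entities (the sample string is made up):

```perl
use strict;
use warnings;

my $text = 'R&D <dept>';

# for aliases $_ to $text, so each s/// below edits $text in place.
for ($text) {
    s/&/&amp;/g;    # first, so the entities added below are not re-escaped
    s/</&lt;/g;
    s/>/&gt;/g;
}

print "$text\n";    # prints R&amp;D &lt;dept&gt;
```

Because $_ is an alias rather than a copy, no assignment back to $text is needed after the loop.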
Without an explicit loop variable, the for loop uses the special variable called $_. The substitution statements inside the loop also use the special $_ variable because none other is specified, so this is just a trick to make the source code shorter. I would probably write this function as:
sub html_filter {
my $text = shift;
$text =~ s/&/&amp;/g;
$text =~ s/</&lt;/g;
$text =~ s/>/&gt;/g;
$text =~ s/"/&quot;/g;
return $text;
}
This will have no performance consequences and is readable by people other than Perl experts.
As Mr Hewgill points out, the code sample is implicitly localizing and aliasing to $_, the magical implied variable.
He offers a substitute that is more readable at the cost of boilerplate code.
There is no reason to sacrifice readability for brevity. Simply replace the implicit localization and assignment with an explicit version:
sub html_filter {
local $_ = shift;
s/&/&amp;/g;
s/</&lt;/g;
s/>/&gt;/g;
s/"/&quot;/g;
return $_;
}
If I didn't know Perl all that well and came across this code, I'd know that I needed to look at the docs for $_ and local. As a bonus, in perlvar there are a few examples of localizing $_.
For anyone who uses Perl a lot, the above should be easy to understand.
So there is really no reason to sacrifice readability for brevity here.
It's just used to alias $text to $_, the default variable. Done because they're too lazy to use an explicit variable or don't want to waste precious cycles creating a new scalar.
It's cleaning up &, <, > and quote characters by replacing them with the appropriate HTML entities.
It loops through your text and substitutes ampersands (&) with &amp;, < with &lt;, > with &gt; and " with &quot;. You'd do this for output to a .html document; those are the proper entity characters.
The original code could be more flexible by using wantarray to test the desired context:
sub html_filter {
my @text = @_;
for (@text) {
s/&/&amp;/g;
s/</&lt;/g;
s/>/&gt;/g;
s/"/&quot;/g;
}
return wantarray ? @text : "@text";
}
That way you could call it in list context or scalar context and get back the correct results, for example:
my @stuff = html_filter('"','>');
print "$_\n" for @stuff;
my $stuff = html_filter('&');
print $stuff;