I need to detect if the first character in a file is an equals sign (=) and display the line number. How should I write the if statement?
$i=0;
while (<INPUT>) {
my($line) = $_;
chomp($line);
$findChar = substr $_, 0, 1;
if($findChar == "=")
$output = "$i\n";
print OUTPUT $output;
$i++;
}
Idiomatic perl would use a regular expression (^ meaning beginning of line) plus one of the dreaded builtin variables which happens to mean "line in file":
while (<INPUT>) {
print "$.\n" if /^=/;
}
See also perldoc -v '$.'
Use $findChar eq "=". In Perl:
== and != are numeric comparisons. They will convert both operands to a number.
eq and ne are string comparisons. They will convert both operands to a string.
Yes, this is confusing. Yes, I still write == when I mean eq ALL THE TIME. Yes, it takes me forever to spot my mistake too.
It looks like you are not using strict and warnings. Use them, especially since you do not know Perl, you might also want to add diagnostics to the list of must-use pragmas.
You are keeping track of the input line number in a separate variable $i. Perl has various builtin variables documented in perlvar. Some of these, such as $. are very useful use them.
You are using my($line) = $_; in the body of the while loop. Instead, avoid $_ and assign to $line directly as in while ( my $line = <$input> ).
Note that bareword filehandles such as INPUT are package global. With the exception of the DATA filehandle, you are better off using lexical filehandles to properly limit the scope of your filehandles.
In your posts, include sample data in the __DATA_ section so others can copy, paste and run your code without further work.
With these comments in mind, you can print all lines that do not start with = using:
#!/usr/bin/perl
use strict; use warnings;
while (my $line = <DATA> ) {
my $first_char = substr $line, 0, 1;
if ( $first_char ne '=' ) {
print "$.:$first_char\n";
}
}
__DATA__
=
=
a
=
+
However, I would be inclined to write:
while (my $line = <DATA> ) {
# this will skip blank lines
if ( my ($first_char) = $line =~ /^(.)/ ) {
print "$.:$first_char\n" unless $first_char eq '=';
}
}
Related
/* start of maker a_b.c[0] */
/* start of maker a_b.c[1] */
maker ( "a_b.c[0]" )
maker ( "a_b.c[1]" )
How to extract the strings inside double quotes and store them into an array? Here's what i have tried.
open(file, "P2.txt");
#A = (<file>) ;
foreach $str(#A)
{
if($str =~ /"a_b.c"/)
{
print "$str \n";
}
}
Note: Only content inside double quotes have to be stored into an array. If you see the 1st line of example inside slashes, you'll see same string that i want to match. That shouldn't get printed. So only the string inside double quotes should be stored into an array. Even if the same string gets repeated somewhere else without double quotes, it should not get printed. .
It's not about looking for strings in double quotes. It's about defining a pattern (a regular expression) that matches the lines that you want to find.
Here's the smallest change that I can make to your code in order to make this work:
open(file, "P2.txt");
#A = (<file>) ;
foreach $str(#A)
{
if($str =~ /"a_b.c/) # <=== Change here
{
print "$str \n";
}
}
All I've done is to remove the closing double-quote from your match expression. Because you don't care what comes after that, you don't need to specify it in the regular expression.
I should point out that this isn't completely correct. In a regular expression, a dot has a special meaning (it means "match any character here") so to match an actual dot (which is what you want), you need to escape the dot with a backslash. So it should be:
if($str =~ /"a_b\.c/)
Rewriting to use a few more modern Perl practices, I would do something like this:
# Two safety nets to find problems in your code
use strict;
use warnings;
# say() is a better print()
use feature 'say';
# Use a variable for the filehandle (and declare it with 'my')
# Use three-arg version of open()
# Check return value from open() and die if it fails
open(my $file, '<', "P2.txt") or die $!;
# Read data directly from filehandle
while ($str = <$file>)
{
if ($str =~ /"a_b\.c/)
{
say $str;
}
}
You could even use the implicit variable ($_) and statement modifiers to make your loop even simpler.
while (<$file>) {
say if /"a_b\.c/;
}
Looking at the sample input you provided, the task can be paraphrased as "extract single string arguments to things that look like function invocations". It seems like there is the added complication not matching in C-style comments. For that, note perlfaq -q comment.
As the FAQ entry demonstrates, ignoring content in arbitrary C-style comments is generally not trivial. I decided to try C::Tokenize to help:
#!/usr/bin/env perl
use strict;
use warnings;
use feature 'say';
use C::Tokenize qw( tokenize );
use Const::Fast qw( const );
use Path::Tiny qw( path );
sub is_open_paren {
($_[0]->{type} eq 'grammar') && ($_[0]->{grammar} eq '(');
}
sub is_close_paren {
($_[0]->{type} eq 'grammar') && ($_[0]->{grammar} eq ')');
}
sub is_comment {
$_[0]->{type} eq 'comment';
}
sub is_string {
$_[0]->{type} eq 'string';
}
sub is_word {
$_[0]->{type} eq 'word';
}
sub find_single_string_args_in_invocations {
my ($source) = #_;
my $tokens = tokenize(path( $source )->slurp);
for (my $i = 0; $i < #$tokens; ++$i) {
next if is_comment( $tokens->[$i] );
next unless is_word( $tokens->[$i] );
next unless is_open_paren( $tokens->[$i + 1] );
next unless is_string( $tokens->[$i + 2] );
next unless is_close_paren( $tokens->[$i + 3]);
say $tokens->[$i + 2]->{string};
$i += 3;
}
}
find_single_string_args_in_invocations($ARGV[0]);
which, with your input, yields:
C:\Temp> perl t.pl test.c
"a_b.c[0]"
"a_b.c[1]"
I'm trying to grep multiple patterns from a log file using perl. For the first pattern i'm getting the desired matching pattern via read only variable($1,$2..). But for the next pattern the read only variable is returning the previous value but not the value matching the second pattern.
here is the code:
$tmp = `grep "solo_video_channel_.*(0): queueing" $log`;
chomp($tmp);
$tmp =~ m/(.*):.*solo_video_channel_write(.*): queueing page (.*).*/;
$chnl = $2;
$page = $3;
$timestamp = $1;
$tmp1 = `grep "(0): DUMP GO" $log`;
chomp($tmp1);
$tmp1 =~ m/(.*): solo_video_channel_write(0): DUMP GO/;
$dmp = $1;
print "dump go time = $1\n";
tmp1's value after grep is coming as expected. but $1 value remains same as the previous one.
Any suggestions?
Always make sure that you verify that a regex matched before using a captured variable.
Additionally, there is no reason to shell out to grep. Use Perl's file processing instead:
use strict;
use warnings;
local #ARGV = $log;
while (<>) {
chomp;
if (/solo_video_channel_.*\(0\): queueing/) {
if ( my ( $timestamp, $chnl, $page ) = m/(.*):.*solo_video_channel_write(.*): queueing page (.*).*/ ) {
print "$. - $timestamp, $chnl, $page\n";
}
}
if ( my ($dmp) = m/(.*): solo_video_channel_write\(0\): DUMP GO/ ) {
print "dump go time = $dmp\n";
}
}
Note, your first set of if's could almost certainly be combined into a single if statement, but I left it as is for now.
Why not use Pure Perl? It's faster than running external greps. Plus, you can grep both regular expressions at once. Faster than looping through the file twice.
Always check the value of your rexp match. Here I'm using if statements to do this. Note too that I am printing all lines that don't match with UNMATCHED LINES. You can remove the else when you see that everything is working, or simply redirect 2> /dev/null.
use strict;
use warnings;
use autodie;
use feature qw(say);
my $log = "log.txt";
open my $log_fh, "<", $log;
while ( my $line = <$log_fh> ) {
my $timestamp;
my $channel;
my $page;
my $gotime;
if ( $line =~ /(.*):.*solo_video_channel_(.*):\s+queueing page (.*)/ ) {
$timestamp = $1;
$channel = $2;
$page = $3;
say qq(Timestamp = "$timestamp" Channel = "$channel" Page = "$page");
}
elsif ( $line =~ /(.*): solo_video_channel_write(0): DUMP GO/ ) {
$gotime = $1;
say "Dump Go Time = $1";
}
else {
say STDERR qq(UNMATCHED LINES: "$line");
}
}
close $log_fh;
In the second regexp you need to escape the literal brackets
$tmp1 =~ m/(.*): solo_video_channel_write\(0\): DUMP GO/
This is because the expression \(0\) matches the exact pattern (0)
In the example given in this answer this would include strings such as
37: solo_video_channel_write(0): DUMP GO
In contrast, the expression (0) matches the exact pattern 0 and sets a capture group.
With the regexp given in your original question
$tmp1 =~ m/(.*): solo_video_channel_write(0): DUMP GO/;
matching would occur on strings such as
37: solo_video_channel_write0: DUMP GO
Of course in the original program the strings are not in this format, so they do not match and $1 is not set
The regular expression syntax for the shell program grep is (confusingly) different
To use round brackets for setting a capture group they must be escaped with a backslash, which is the opposite to the syntax in perl
I am trying to write in Perl to count the number of each A/C/G/T bases in a DNA sequence. But couldn't figure out what went wrong in my code. "ATCTAGCTAGCTAGCTA" is the kind of data I am given.
#!usr/bin/perl
use strict;
use warnings;
my $in_file = <$ARGV[0]>;
open( my $FH_IN, "<", $in_file );
my $dna = <$FH_IN>;
my $index = 0;
my ( $freq_a, $freq_c, $freq_g, $freq_t ) = 0;
my $dna_length = length($dna);
while ( $index < $dna_length ) {
my $base = substr( $dna, $index, 1 );
if ( $base eq "A" ) {
$freq_a++;
$index++;
next;
} elsif ( $base eq "C" ) {
$freq_c++;
$index++;
next;
} elsif ( $base eq "G" ) {
$freq_g++;
$index++;
next;
} elsif ( $base eq "T" ) {
$freq_t++;
$index++;
next;
} else {
next;
}
}
print "$freq_a\n$freq_c\n$freq_g\n$freq_t\n";
exit;
I know there are a lot of ways to do it, but what I want to know is what I did wrong so I can learn from mistakes.
Perl has a special file handle to use with these kinds of problems: The diamond operator <>. It will read input from either a file name, if provided, and standard input if not.
Secondly, since you are only interested in ACGT, might as well look for only them, using a regex: /([ACGT])/g.
Thirdly, using a hash is the idiomatic way to count characters in Perl: $count{A}++
So your script becomes:
use strict;
use warnings;
my %count;
while (<>) {
while (/([ACGT])/g) {
$count{$1}++;
}
}
print "$_\n" for #count{qw(A C G T)};
Usage:
script.pl input.txt
Okay, you've done well so far and there's only one problem that stops your program from working.
It's far from obvious, but each line that's read from the file has a newline character "\n" at the end. So what's happening is that $index reaches the newline in the string, which is processed by the else clause (because it's not A, C, G or T) which just does a next, so the same character is processed over and over again. Your program just hangs, right?
You could remove the newline with chomp, but a proper fix is to increment $index in the else clause just as you do with all the other characters. So it looks like
else {
++$index;
next;
}
As you've suspected, there are much better ways to write this. There are also a couple of other nasties in your code, but that change should get you on your way for now.
It would be instructive for you to print out the values in $dna_length, $index and $base each time you go round the loop - immediately after you assign a value to $base.
Your code would be more robust if you moved the incrementing of $index to the end of the loop (outside of the if/elsif/else block) and removed all of your next statements.
An alternative "quick fix" is to chomp() the input line before you start processing it.
I need some help with following perl code.
#!perl -w
use strict;
use warnings;
open my $file, '<', 'ubb' or die $1;
my $spool = 0;
my #matchingLines;
while (<$file>) {
if (/GROUPS/i) {
$spool = 1;
next;
}
elsif (/SERVERS/i) {
$spool = 0;
print map { "$_" } #matchingLines;
#matchingLines = ();
}
if ($spool) {
push (#matchingLines, $_);
}
}
close ($file);
Output from that is shown below.
ADM LMID=GW_S4_1_PM,GW_S4_2_BM
GRPNO=1
ADM_TMS LMID=GW_S4_1_PM,GW_S4_2_BM
GRPNO=2
TMSNAME=TMS
ADM_1 LMID=GW_S4_1_PM
GRPNO=11
ADM_2 LMID=GW_S4_2_BM
GRPNO=12
DMWSG_Gateway_1 LMID=GW_S4_1_PM
GRPNO=101
ENVFILE="../GW_S4.Gateway.envfile"
DMWSG_Gateway_2 LMID=GW_S4_2_BM
GRPNO=201
ENVFILE="../GW_S4.Gateway.envfile"
DMWSG_1 LMID=GW_S4_1_PM
GRPNO=106
DMWSG_2 LMID=GW_S4_2_BM
GRPNO=206
But I only would like to get the first word of each line (e.g. ADM, ADM_TMS, ADM_1).
Note that the file has a lot of other lines above and below what's printed here. I only want to do this for lines that is in between GROUPS and SERVERS.
I would suggest 2 changes in your code
Note: Tested these with your sample data (plus other stuff) in your question.
I: Extract first word before push
Change this
push (#matchingLines, $_);
to
push (#matchingLines, /^(\S+)/);
This would push the first word of each line into the array, instead of the entire line.
Note that /^(\S+)/ is shorthand for $_ =~ /^(\S+)/. If you're using an explicit loop variable like in 7stud's answer, you can't use this shorthand, use the explicit syntax instead, say $line =~ /^(\S+)/ or whatever your loop variable is.
Of course, you can also use split function as suggested in 7stud's answer.
II: Change how you print
Change this
print map { "$_" } #matchingLines;
into
local $" = "\n";
print "#matchingLines \n";
$" specifies the delimiter used for list elements when the array is printed with print or say inside double quotes.
Alternatively, as per TLP's suggestion,
$\ = $/;
print for #lines;
or
print join("\n", #lines), "\n"
Note that $/ is the input record separator (newline by default), $\ is the output record separator (undefined by default). $\ is appended after each print command.
For more information on $/, $\, and $":
See perldoc perlvar (just use CTRL+F to find them in that page)
Or you can simply use perldoc -v '$/' etc on your console to get those information.
Note on readability
I don't think implicit regex matching i.e. /pattern/ is bad per se.
But matching against a variable, i.e. $variable =~ /pattern/ is more readable (as in you can immediately see there's a regex matching going on) and more beginner-friendly, at the cost of conciseness.
use strict;
use warnings;
use 5.014; #say()
my $fname = 'data.txt';
open my $INFILE, '<', $fname
or die "Couldn't open $fname: $!"; #-->Not $1"
my $recording_on = 0;
my #matching_lines;
for my $line (<$INFILE>) {
if ($line =~ /groups/i) {
$recording_on = 1;
next;
}
elsif ($line =~ /servers/i) {
say for #matching_lines; #say() is the same as print(), but it adds a newline at the end
#matching_lines = ();
$recording_on = 0;
}
if ($recording_on) {
my ($first_word, $trash) = split " ", $line, 2;
push #matching_lines, $first_word;
}
}
close $INFILE;
You can use the flip-flop operator (range) to select a part of your input. The idea of this operator is that it returns false until its LHS (left hand side) returns true, and after that it returns true until its RHS returns false, after which it is reset. It is somewhat like preserving a state.
Note that the edge lines are also included in the match, so we need to remove those. After that, use doubleDown's idea and push /^(\S+)/ onto an array. The nice thing about using this with push is that the capture regex returns an empty list if it fails, and this gives us a warning-free failure when the regex does not match.
use strict;
use warnings;
my #matches;
while (<>) {
if (/GROUPS/i .. /SERVERS/i) { # flip-flop remembers the matches
next if (/GROUPS/i or /SERVERS/i);
push #matches, /^(\S+)/;
}
}
# #matches should now contain the first words of those lines
I'm reading this textfile to get ONLY the words in it and ignore all kind of whitespaces:
hello
now
do you see this.sadslkd.das,msdlsa but
i hoohoh
And this is my Perl code:
#!usr/bin/perl -w
require 5.004;
open F1, './text.txt';
while ($line = <F1>) {
#print $line;
#arr = split /\s+/, $line;
foreach $w (#arr) {
if ($w !~ /^\s+$/) {
print $w."\n";
}
}
#print #arr;
}
close F1;
And this is the output:
hello
now
do
you
see
this.sadslkd.das,msdlsa
but
i
hoohoh
The output is showing two newlines but I am expecting the output to be just words. What should I do to just get words?
You should always use strict and use warnings (in preference to the -w command-line qualifier) at the top of every Perl program, and declare each variable at its first point of use using my. That way Perl will tell you about simple errors that you may otherwise overlook.
You should also use lexical file handles with the three-parameter form of open, and check the status to make sure it succeeded. There is little point in explicitly closing an input file unless you expect your program to run for an appreciable time, as Perl will close all files for you on exit.
Do you really need to require Perl v5.4? That version is fifteen years old, and if there is anything older than that installed then you have a museum!
Your program would be better like this:
use strict;
use warnings;
open my $fh, '<', './text.txt' or die $!;
while (my $line = <$fh>) {
my #arr = split /\s+/, $line;
foreach my $w (#arr) {
if ($w !~ /^\s+$/) {
print $w."\n";
}
}
}
Note: my apologies. The warnings pragma and lexical file handles were introduced only in v5.6 so that part of my answer is irrelevant. The latest version of Perl is v5.16 and you really should upgrade
As Birei has pointed out, the problem is that, when the line has leading whitespace, there is a empty field before the first separator. Imagine if your data was comma-separated, then you would want Perl to report a leading empty field if the line started with a comma.
To extract all the non-space characters you can use a regular expression that does exactly that
my #arr = $line =~ /\S+/g;
and this can be emulated by using the default parameter for split which is a single quoted space (not a regular expression)
my #arr = $line =~ split ' ', $line;
In this case split behaves like the awk utility and discards any leading empty fields as you expected.
This is even simpler if you let Perl use the $_ variable in the read loop, as all of the parameters for split can be defaulted:
while (<F1>) {
my #arr = split;
foreach my $w (#arr) {
print "$w\n" if $w !~ /^\s+$/;
}
}
This line is the problem:
#arr=split(/\s+/,$line);
\s+ does a match just before the leading spaces. Use ' ' instead.
#arr=split(' ',$line);
I believe that in this line:
if(!($w =~ /^\s+$/))
You wanted to ask if there's nothing in this row - don't print it.
But the "+" in the REGEX actually force it to have at least 1 space.
If you change the "\s+" to "\s*", you'll see that it's working. because * is 0 occurrences or more ...