This question already has answers here:
Counting number of occurrences of a string inside another (Perl)
(4 answers)
Closed 7 years ago.
I have to check how many times was "," in each line in file. Anybody have idea how can I do it in Perl?
On this moment my code looks like it:
open($list, "<", $student_list)
while ($linelist = <$list>)
{
printf("$linelist");
}
close($list)
But I have no idea how to check how many times is "," in each $linelist :/
Use the transliteration operator in counting mode:
my $commas = $linelist =~ y/,//;
Edited in your code :
use warnings;
use strict;
open my $list, "<", "file.csv" or die $!;
while (my $linelist = <$list>)
{
my $commas = $linelist =~ y/,//;
print "$commas\n";
}
close($list);
If you just want to count the number of somethings in a file, you don't need to read it into memory. Since you aren't changing the file, mmap would be just fine:
use File::Map qw(map_file);
map_file my $map, $filename, '<';
my $count = $map =~ tr/,//;
#! perl
# perl script.pl [file path]
use strict;
use warnings;
my $file = shift or die "No file name provided";
open(my $IN, "<", $file) or die "Couldn't open file $file: $!";
my #matches = ();
my $index = 0;
# while <$IN> will get the file one line at a time rather than loading it all into memory
while(<$IN>){
my $line = $_;
my $current_count = 0;
# match globally, meaning keep track of where the last match was
$current_count++ while($line =~ m/,/g);
$matches[$index] = $current_count;
$index++;
}
$index = 0;
for(#matches){
$index++;
print "line $index had $_ matches\n"
}
You can use mmap Perl IO layer instead of File::Map. It is almost as efficient as former but most probably present in your Perl installation without needing installing a module. Next, using y/// is more efficient than m//g in array context.
use strict;
use warnings;
use autodie;
use constant STUDENT_LIST => 'text.txt';
open my $list, '<:mmap', STUDENT_LIST;
while ( my $line = <$list> ) {
my $count = $line =~ y/,//;
print "There is $count commas at $.. line.\n";
}
If you would like grammatically correct output you can use Lingua::EN::Inflect in the right place
use Lingua::EN::Inflect qw(inflect);
print inflect "There PL_V(is,$count) $count PL_N(comma,$count) at ORD($.) line.\n";
Example output:
There are 7 commas at 1st line.
There are 0 commas at 2nd line.
There is 1 comma at 3rd line.
There are 2 commas at 4th line.
There are 7 commas at 5th line.
Do you want #commas for each line in the file, or #commas in the entire file?
On a per-line basis, replace your while loop with:
my #data = <list>;
foreach my $line {
my #chars = split //, $line;
my $count = 0;
foreach my $c (#chars) { $count++ if $c eq "," }
print "There were $c commas\n";
}
Related
I am trying to parse data like:
header1
-------
var1 0
var2 5
var3 9
var6 1
header2
-------
var1 -3
var3 5
var5 0
Now I want to get e.g. var3 for header2. Whats the best way to do this?
So far I was parsing my files line-by-line via
open(FILE,"< $file");
while (my $line = <FILE>){
# do stuff
}
but I guess it's not possible to handle multiline parsing properly.
Now I am thinking to parse the file at once but wasn't successful so far...
my #Input;
open(FILE,"< $file");
while (<FILE>){ #Input = <FILE>; }
if (#Input =~ /header2/){
#...
}
The easier way to handle this is "paragraph mode".
local $/ = "";
while (<>) {
my ($header, $body) =~ /^([^\n]*)\n-+\n(.*)/s
or die("Bad data");
my #data = map [ split ], split /\n/, $body;
# ... Do something with $header and #data ...
}
The same can be achieved without messing with $/ as follows:
my #buf;
while (1) {
my $line = <>;
$line =~ s/\s+\z// if !defined($line);
if (!length($line)) {
if (#buf) {
my $header = shift(#buf);
shift(#buf);
my #data = map [ split ], splice(#buf);
# ... Do something with $header and #data ...
}
last if !defined($line);
next;
}
push #buf, $line;
}
(In fact, the second snippet includes a couple of small improvements over the first.)
Quick comments on your attempt:
The while loop is useless because #Input = <FILE> places the remaining lines of the file in #Input.
#Input =~ /header2/ matches header2 against the stringification of the array, which is the stringification of the number of elements in #Input. If you want to check of an element of #Input contains header2, will you will need to loop over the elements of #Inputs and check them individually.
while (<FILE>){ #Input = <FILE>; }
This doesn't make much sense. "While you can read a record from FILE, read all of the data on FILE into #Input". I think what you actually want is just:
my #Input = <FILE>;
if (#Input =~ /header2/){
This is quite strange too. The binding operator (=~) expects scalar operands, so it evaluates both operands in scalar context. That means #Input will be evaluated as the number of elements in #Input. That's an integer and will never match "header2".
A couple of approaches. Firstly a regex approach.
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
my $file = 'file';
open my $fh, '<', $file or die $!;
my $data = join '', <$fh>;
if ($data =~ /header2.+var3 (.+?)\n/s) {
say $1;
} else {
say 'Not found';
}
The key to this is the /s on the m// operator. Without it, the two dots in the regex won't match newlines.
The other approach is more of a line by line parser.
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
my $file = 'file';
open my $fh, '<', $file or die $!;
my $section = '';
while (<$fh>) {
chomp;
# if the line all word characters,
# then we've got a section header.
if ($_ !~ /\W/) {
$section = $_;
next;
}
my ($key, $val) = split;
if ($section eq 'header2' and $key eq 'var3') {
say $val;
last;
}
}
We read the file a line at a time and make a note of the section headers. For data lines, we split on whitespace and check to see if we're in the right section and have the right key.
In both cases, I've switched to using a more standard approach (lexical filehandles, 3-arg open(), or die $!) for opening the file.
I'm new to Perl and I'm afraid I am stuck and wanted to ask if someone might be able to help me.
I have a file with two columns (tab separated) of oldname and newname.
I would like to use the oldname as key and newname as value and store it as a hash.
Then I would like to open a different file (gff file) and replace all the oldnames in there with the newnames and write it to another file.
I have given it my best try but am getting a lot of errors.
If you could let me know what I am doing wrong, I would greatly appreciate it.
Here are how the two files look:
oldname newname(SFXXXX) file:
genemark-scaffold00013-abinit-gene-0.18 SF130001
augustus-scaffold00013-abinit-gene-1.24 SF130002
genemark-scaffold00013-abinit-gene-1.65 SF130003
file to search and replace in (an example of one of the lines):
scaffold00013 maker gene 258253 258759 . - . ID=maker-scaffold00013-augustus-gene-2.187;Name=maker-scaffold00013-augustus-gene-2.187;
Here is my attempt:
#!/usr/local/bin/perl
use warnings;
use strict;
my $hashfile = $ARGV[0];
my $gfffile = $ARGV[1];
my %names;
my $oldname;
my $newname;
if (!defined $hashfile) {
die "Usage: $0 hash_file gff_file\n";
}
if (!defined $gfffile) {
die "Usage: $0 hash_file gff_file\n";
}
###save hashfile with two columns, oldname and newname, into a hash with oldname as key and newname as value.
open(HFILE, $hashfile) or die "Cannot open $hashfile\n";
while (my $line = <HFILE>) {
chomp($line);
my ($oldname, $newname) = split /\t/;
$names{$oldname} = $newname;
}
close HFILE;
###open gff file and replace all oldnames with newnames from %names.
open(GFILE, $gfffile) or die "Cannot open $gfffile\n";
while (my $line2 = <GFILE>) {
chomp($line2);
eval "$line2 =~ s/$oldname/$names{oldname}/g";
open(OUT, ">SFrenamed.gff") or die "Cannot open SFrenamed.gff: $!";
print OUT "$line2\n";
close OUT;
}
close GFILE;
Thank you!
Your main problem is that you aren't splitting the $line variable. split /\t/ splits $_ by default, and you haven't put anything in there.
This program builds the hash, and then constructs a regex from all the keys by sorting them in descending order of length and joining them with the | regex alternation operator. The sorting is necessary so that the longest of all possible choices is selected if there are any alternatives.
Every occurrence of the regex is replaced by the corresponding new name in each line of the input file, and the output written to the new file.
use strict;
use warnings;
die "Usage: $0 hash_file gff_file\n" if #ARGV < 2;
my ($hashfile, $gfffile) = #ARGV;
open(my $hfile, '<', $hashfile) or die "Cannot open $hashfile: $!";
my %names;
while (my $line = <$hfile>) {
chomp($line);
my ($oldname, $newname) = split /\t/, $line;
$names{$oldname} = $newname;
}
close $hfile;
my $regex = join '|', sort { length $b <=> length $a } keys %names;
$regex = qr/$regex/;
open(my $gfile, '<', $gfffile) or die "Cannot open $gfffile: $!";
open(my $out, '>', 'SFrenamed.gff') or die "Cannot open SFrenamed.gff: $!";
while (my $line = <$gfile>) {
chomp($line);
$line =~ s/($regex)/$names{$1}/g;
print $out $line, "\n";
}
close $out;
close $gfile;
Why are you using an eval? And $oldname is going to be undefined in the second while loop, because the first while loop you redeclare them in that scope (even if you used the outer scope, it would store the very last value that you processed, which wouldn't be helpful).
Take out the my $oldname and my $newname at the top of your script, it is useless.
Take out the entire eval line. You need to repeat the regex for each thing you want to replace. Try something like:
$line2 =~ s/$_/$names{$_}/g for keys %names;
Also see Borodin's answer. He made one big regex instead of a loop, and caught your lack of the second argument to split.
#!/usr/bin/perl
use strict;
use warnings;
use warnings;
use 5.010;
my #names = ("RD", "HD", "MP");
my $flag = 0;
my $filename = 'Sample.txt';
if (open(my $fh, '<', $filename))
{
while (my $row = <$fh>)
{
foreach my $i (0 .. $#names)
{
if( scalar $row =~ / \G (.*?) ($names[$i]) /xg )
{
$flag=1;
}
}
}
if( $flag ==1)
{
say $filename;
}
$flag=0;
}
here i read the content from one file and compare with array values, if file contant matches with array value i just display the file. in the same way how can i access different file from different direcory and compare the array values with same?
Q: How can I access a different file?
A: By specifying a different filename.
By the way: If you are using flags for loop control in Perl, you are doing something wrong. You can specify that this was the last iteration of the loop (in C: break), or that you want to start the next iteration. You can label the loops so that you can break out of as many loops as you like at once:
#!/usr/bin/perl
use 5.010; use warnings;
my #names = qw(RD HD MP);
# unpack command line arguments
my ($filename) = #ARGV;
open my $fh, "<", $filename or die "Oh noes, $filename is bad: $!";
LINE:
while (my $line = <$fh>) {
NAME:
foreach my $name (#names) {
if ($line =~ /\Q$name\E/) { # \QUOT\E the $name to escape everything
say "$filename contains $name";
last LINE;
}
}
}
Other highlights:
using a foreach loop as intended and
removing the (in this context) senseless \G assertion
You can then execute the script as perl script.pl Sample.txt or perl script.pl ../another.dir/foo.bar or whatever.
You can use the ~~ operator in Perl 5.10.
Don't forget to chomp the trailing whitespace.
#!/usr/bin/perl
use 5.010;
use strict;
use warnings;
my #names = ('RD', 'HD', 'MP');
my $other_dir = '/tmp';
my $filename = 'Sample.txt';
if ( open( my $fh, '<', "$other_dir/$filename" ) ) {
ROW:
while ( my $row = <$fh> ) {
chomp $row; # remove trailing \n
if ( $row ~~ #names ) {
say $filename;
last ROW;
}
}
}
close $fh;
I have a simple .csv file that has that I want to extract data out of a write to a new file.
I to write a script that reads in a file, reads each line, then splits and structures the columns in a different order, and if the line in the .csv contains 'xxx' - dont output the line to output file.
I have already managed to read in a file, and create a secondary file, however am new to Perl and still trying to work out the commands, the following is a test script I wrote to get to grips with Perl and was wondering if I could aulter this to to what I need?-
open (FILE, "c1.csv") || die "couldn't open the file!";
open (F1, ">c2.csv") || die "couldn't open the file!";
#print "start\n";
sub trim($);
sub trim($)
{
my $string = shift;
$string =~ s/^\s+//;
$string =~ s/\s+$//;
return $string;
}
$a = 0;
$b = 0;
while ($line=<FILE>)
{
chop($line);
if ($line =~ /xxx/)
{
$addr = $line;
$post = substr($line, length($line)-18,8);
}
$a = $a + 1;
}
print $b;
print " end\n";
Any help is much appreciated.
To manipulate CSV files it is better to use one of the available modules at CPAN. I like Text::CSV:
use Text::CSV;
my $csv = Text::CSV->new ({ binary => 1, empty_is_undef => 1 }) or die "Cannot use CSV: ".Text::CSV->error_diag ();
open my $fh, "<", 'c1.csv' or die "ERROR: $!";
$csv->column_names('field1', 'field2');
while ( my $l = $csv->getline_hr($fh)) {
next if ($l->{'field1'} =~ /xxx/);
printf "Field1: %s Field2: %s\n", $l->{'field1'}, $l->{'field2'}
}
close $fh;
If you need do this only once, so don't need the program later you can do it with oneliner:
perl -F, -lane 'next if /xxx/; #n=map { s/(^\s*|\s*$)//g;$_ } #F; print join(",", (map{$n[$_]} qw(2 0 1)));'
Breakdown:
perl -F, -lane
^^^ ^ <- split lines at ',' and store fields into array #F
next if /xxx/; #skip lines what contain xxx
#n=map { s/(^\s*|\s*$)//g;$_ } #F;
#trim spaces from the beginning and end of each field
#and store the result into new array #n
print join(",", (map{$n[$_]} qw(2 0 1)));
#recombine array #n into new order - here 2 0 1
#join them with comma
#print
Of course, for the repeated use, or in a bigger project you should use some CPAN module. And the above oneliner has much cavetas too.
This question already has an answer here:
Closed 12 years ago.
Possible Duplicate:
How do I change, delete, or insert a line in a file, or append to the beginning of a file in Perl?
How would I use perl to open up a file, look for an item that someone inputs, and if it is found, it will delete from that line to 14 lines below.
Something like this will work:
#!/usr/bin/env perl
use Modern::Perl;
use IO::File;
say "Enter a pattern please: ";
chomp (my $input = <>);
my $pattern;
# check that the pattern is good
eval {
$pattern = qr ($input);
}; die $# if $#;
my $fh = IO::File->new("test.txt", "+<") or die "$!\n";
my #lines = $fh->getlines;
$fh->seek(0,0);
for (my $pos = 0; $pos < $#lines; ++$pos) {
if ($lines[$pos] =~ $pattern) {
$pos += 14;
} else {
print {$fh} $lines[$pos];
}
}
$fh->close;
$|++
#!/usr/bin/env perl
use strict;
use warnings;
use autodie;
my $filename = 'filename.txt';
my $tmp = 'filename.txt.tmp';
print "Enter a pattern please: ";
chomp (my $input = <>);
my $pattern = qr($input)x;
open my $i_fh, '+<', $filename;
open my $o_fh, '>', $tmp;
while(<$i_fh>){
# move print here if you want to print the matching line
if( /$pattern/ ){
<$i_fh> for 1..14;
next;
}
print {$o_fh} $_ ;
}
close $o_fh;
close $i_fh;
use File::Copy
move $tmp, $filename;