File manipulation in Perl - perl

I have a simple .csv file that I want to extract data from and write to a new file.
I want to write a script that reads in a file, reads each line, then splits it and writes the columns in a different order; if the line in the .csv contains 'xxx', the line should not be written to the output file.
I have already managed to read in a file and create a second file, but I am new to Perl and still trying to work out the commands. The following is a test script I wrote to get to grips with Perl; could I alter it to do what I need?
open (FILE, "c1.csv") || die "couldn't open the file!";
open (F1, ">c2.csv") || die "couldn't open the file!";
#print "start\n";
sub trim($);
sub trim($)
{
my $string = shift;
$string =~ s/^\s+//;
$string =~ s/\s+$//;
return $string;
}
$a = 0;
$b = 0;
while ($line=<FILE>)
{
chop($line);
if ($line =~ /xxx/)
{
$addr = $line;
$post = substr($line, length($line)-18,8);
}
$a = $a + 1;
}
print $b;
print " end\n";
Any help is much appreciated.

To manipulate CSV files it is better to use one of the available modules at CPAN. I like Text::CSV:
use Text::CSV;
my $csv = Text::CSV->new ({ binary => 1, empty_is_undef => 1 }) or die "Cannot use CSV: ".Text::CSV->error_diag ();
open my $fh, "<", 'c1.csv' or die "ERROR: $!";
$csv->column_names('field1', 'field2');
while ( my $l = $csv->getline_hr($fh)) {
next if ($l->{'field1'} =~ /xxx/);
printf "Field1: %s Field2: %s\n", $l->{'field1'}, $l->{'field2'}
}
close $fh;
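The answer above only prints the two fields. A minimal sketch of the whole task from the question, still using Text::CSV (assumed to be installed): it skips rows in which any field contains 'xxx', trims each field, reorders the columns and writes the result to a second file. The column order 2, 0, 1 is a made-up example (the same order the one-liner below uses), and the subroutine name filter_and_reorder is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;

# Hypothetical output order: third column first, then the first and second.
my @ORDER = (2, 0, 1);

# Copy rows from $in_fh to $out_fh, skipping any row where a field contains
# 'xxx', and writing the remaining fields in the order given by @$order.
# Returns the number of rows written.
sub filter_and_reorder {
    my ($in_fh, $out_fh, $order) = @_;
    my $csv = Text::CSV->new({ binary => 1, eol => "\n", auto_diag => 1 })
        or die "Cannot use CSV: " . Text::CSV->error_diag();
    my $written = 0;
    while (my $row = $csv->getline($in_fh)) {
        # Skip the row if any field contains 'xxx'.
        next if grep { defined($_) && /xxx/ } @$row;
        # Trim each selected field and pick the columns in the new order.
        my @out = map {
            my $f = $row->[$_] // '';
            $f =~ s/^\s+|\s+$//g;
            $f;
        } @$order;
        $csv->print($out_fh, \@out);
        $written++;
    }
    return $written;
}

# Usage: perl reorder.pl c1.csv c2.csv
if (@ARGV == 2) {
    my ($in_name, $out_name) = @ARGV;
    open my $in,  '<', $in_name  or die "Cannot open $in_name: $!";
    open my $out, '>', $out_name or die "Cannot create $out_name: $!";
    my $n = filter_and_reorder($in, $out, \@ORDER);
    close $in;
    close $out or die "Cannot close $out_name: $!";
    print "Wrote $n rows to $out_name\n";
}
```

Writing through $csv->print rather than join(',', ...) means that fields containing commas or quotes are quoted correctly in c2.csv.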

If you only need to do this once and don't need to keep the program, you can do it with a one-liner:
perl -F, -lane 'next if /xxx/; @n=map { s/(^\s*|\s*$)//g;$_ } @F; print join(",", (map{$n[$_]} qw(2 0 1)));'
Breakdown:
perl -F, -lane
# -F, splits each line at ',' and stores the fields in the array @F
# (-a turns on autosplit, -n loops over the input lines, -l chomps each
# line and makes print add a newline, -e introduces the code)
next if /xxx/; # skip lines that contain xxx
@n=map { s/(^\s*|\s*$)//g;$_ } @F;
# trim spaces from the beginning and end of each field
# and store the result in the new array @n
print join(",", (map{$n[$_]} qw(2 0 1)));
# recombine the fields of @n in a new order (here 2 0 1),
# join them with commas
# and print the result
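For comparison, a sketch of what the one-liner does, written out as a script: -n wraps the code in a while (<>) loop, -l chomps each line and makes print add a newline, and -F, (with -a) splits each line on commas into @F. The subroutine name reorder_line is hypothetical, the splitting is just as naive as in the one-liner, and the // '' guard (not in the one-liner) avoids "uninitialized" warnings for short lines.

```perl
use strict;
use warnings;

# Same steps as the one-liner: skip lines containing 'xxx', split on ',',
# trim each field, then join the fields in the requested order.
# Returns nothing (undef in scalar context) for skipped lines.
sub reorder_line {
    my ($line, @order) = @_;
    chomp $line;                      # -l removes the line ending
    return if $line =~ /xxx/;         # next if /xxx/;
    my @n = split /,/, $line;         # what -F, with -a puts in @F
    s/^\s+|\s+$//g for @n;            # trim each field
    return join ',', map { $n[$_] // '' } @order;
}

# Usage: perl reorder.pl file.csv > out.csv
if (@ARGV) {
    while (my $line = <>) {
        my $out = reorder_line($line, 2, 0, 1);
        print "$out\n" if defined $out;
    }
}
```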
Of course, for repeated use or in a bigger project you should use a CPAN module. The one-liner above also has many caveats; for example, splitting on every comma breaks quoted fields that contain commas.

Related

Check how many "," in each line in Perl [duplicate]

This question already has answers here:
Counting number of occurrences of a string inside another (Perl)
Closed 7 years ago.
I have to check how many times "," appears in each line of a file. Does anybody have an idea how I can do it in Perl?
At the moment my code looks like this:
open($list, "<", $student_list) or die $!;
while ($linelist = <$list>)
{
print $linelist;
}
close($list);
But I have no idea how to count how many times "," appears in each $linelist :/
Use the transliteration operator in counting mode:
my $commas = $linelist =~ y/,//;
Applied to your code:
use warnings;
use strict;
open my $list, "<", "file.csv" or die $!;
while (my $linelist = <$list>)
{
my $commas = $linelist =~ y/,//;
print "$commas\n";
}
close($list);
If you just want to count the occurrences of something in a file, you don't need to read it into memory. Since you aren't changing the file, mmap would be just fine:
use File::Map qw(map_file);
map_file my $map, $filename, '<';
my $count = $map =~ tr/,//;
#! perl
# perl script.pl [file path]
use strict;
use warnings;
my $file = shift or die "No file name provided";
open(my $IN, "<", $file) or die "Couldn't open file $file: $!";
my @matches = ();
my $index = 0;
# while <$IN> will get the file one line at a time rather than loading it all into memory
while(<$IN>){
my $line = $_;
my $current_count = 0;
# match globally, meaning keep track of where the last match was
$current_count++ while($line =~ m/,/g);
$matches[$index] = $current_count;
$index++;
}
$index = 0;
for(@matches){
$index++;
print "line $index had $_ matches\n"
}
You can use the :mmap PerlIO layer instead of File::Map. It is almost as efficient as the former, and it is most likely already present in your Perl installation, so you don't need to install a module. Also, counting with y/// is more efficient than counting with m//g in list context.
use strict;
use warnings;
use autodie;
use constant STUDENT_LIST => 'text.txt';
open my $list, '<:mmap', STUDENT_LIST;
while ( my $line = <$list> ) {
my $count = $line =~ y/,//;
print "There is $count commas at $.. line.\n";
}
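To check the efficiency claim on your own Perl, a sketch using the core Benchmark module that compares y/// with the two m//g variants used in these answers. The 50-field sample line is made up, and the results depend on the Perl version and machine, so no numbers are quoted here.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Count commas with the transliteration operator (it returns the count).
sub count_tr    { my ($s) = @_; return $s =~ tr/,//; }

# Count commas by looping over m//g in scalar context.
sub count_match { my ($s) = @_; my $n = 0; $n++ while $s =~ /,/g; return $n; }

# Count commas with m//g in list context.
sub count_list  { my ($s) = @_; my $n = () = $s =~ /,/g; return $n; }

# A made-up sample line with 50 fields.
my $line = join(',', ('field') x 50) . "\n";

# Run each variant for at least one CPU second and print a comparison table.
cmpthese(-1, {
    tr_count   => sub { count_tr($line) },
    match_loop => sub { count_match($line) },
    match_list => sub { count_list($line) },
});
```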
If you would like grammatically correct output, you can use Lingua::EN::Inflect in the right place:
use Lingua::EN::Inflect qw(inflect);
print inflect "There PL_V(is,$count) $count PL_N(comma,$count) at ORD($.) line.\n";
Example output:
There are 7 commas at 1st line.
There are 0 commas at 2nd line.
There is 1 comma at 3rd line.
There are 2 commas at 4th line.
There are 7 commas at 5th line.
Do you want the number of commas for each line in the file, or the number of commas in the entire file?
On a per-line basis, replace your while loop with:
my @data = <$list>;
foreach my $line (@data) {
my @chars = split //, $line;
my $count = 0;
foreach my $c (@chars) { $count++ if $c eq "," }
print "There were $count commas\n";
}

Parsing a CSV file and Hashing

I am trying to parse a CSV file to read in all the zip codes. I am trying to create a hash where each key is a zip code and the value is the number of times it appears in the file. Then I want to print out the contents as Zip Code - Number. Here is the Perl script I have so far.
use strict;
use warnings;
my %hash = qw (
zipcode count
);
my $file = $ARGV[0] or die "Need CSV file on command line \n";
open(my $data, '<', $file) or die "Could not open '$file $!\n";
while (my $line = <$data>) {
chomp $line;
my @fields = split "," , $line;
if (exists($hash{$fields[2]})) {
$hash{$fields[1]}++;
}else {
$hash{$fields[1]} = 1;
}
}
my $key;
my $value;
while (($key, $value) = each(%hash)) {
print "$key - $value\n";
}
exit;
You don't say which column your zip code is in, but you are using the third field to check for an existing hash element, and then the second field to increment it.
There is no need to check whether a hash element already exists: Perl will happily create a non-existent hash element and increment it to 1 the first time you access it.
There is also no need to explicitly open any files passed as command line parameters: Perl will open them and read them if you use the <> operator without a file handle.
This reworking of your own program may work. It assumes the zip code is in the second column of the CSV. If it is anywhere else, just change the index in ++$counts{$fields[1]} appropriately.
use strict;
use warnings;
@ARGV or die "Need CSV file on command line \n";
my %counts;
while (my $line = <>) {
chomp $line;
my @fields = split /,/, $line;
++$counts{$fields[1]};
}
while (my ($key, $value) = each %counts) {
print "$key - $value\n";
}
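One thing to note is that each returns the pairs in no particular order. A sketch that prints the counts sorted by zip code instead, still assuming the zip code is in the second column and still splitting naively on commas (quoted fields containing commas will break it); the helper name count_field is hypothetical.

```perl
use strict;
use warnings;

# Add the value found in column $col (0-based) of $line to the hash %$counts.
sub count_field {
    my ($counts, $line, $col) = @_;
    chomp $line;
    my @fields = split /,/, $line;
    ++$counts->{$fields[$col]} if defined $fields[$col];   # skip short or blank lines
    return;
}

# Usage: perl zipcount.pl file.csv
if (@ARGV) {
    my %counts;
    while (my $line = <>) {
        count_field(\%counts, $line, 1);   # zip code assumed in the second column
    }
    # Sorting the keys gives a stable, readable output order.
    for my $zip (sort keys %counts) {
        print "$zip - $counts{$zip}\n";
    }
}
```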
Sorry if this is off-topic, but if you're on a system with the standard Unix text processing tools, you could use this command to count the number of occurrences of each value in field #2, and not need to write any code.
cut -d, -f2 filename.csv | sort | uniq -c
which will generate something like this output, where the count is listed first, and the zipcode second:
12 12345
2 56789
34 78912
1 90210

Parsing Tab Delimited File into an array

I am attempting to read a CSV into an array in a way that I can access each column in a row. However when I run the following code with the goal of printing a specific column from each row, it only outputs empty lines.
#set command line arguments
my ($infi, $outdir, $idcol) = @ARGV;
#lead file of data to get annotations for
open FILE, "<", $infi or die "Can't read file '$infi' [$!]\n";
my #data;
foreach my $row (<FILE>){
chomp $row;
my #cells = split /\t/, $row;
push @data, @cells;
}
#fetch genes
foreach (@data){
print "@_[$idcol]\n";
# print $geneadaptor->fetch_by_dbID($_[$idcol]);
}
With a test input of
a b c
1 2 3
d e f
4 5 6
I think the issue here isn't so much loading the file, but in treating the resulting array. How should I be approaching this problem?
First of all you need to push @data, \@cells; otherwise all the fields are concatenated into a single flat list.
Then you need to use the loop value in the second for loop.
foreach (@data){
print $_->[$idcol], "\n";
}
@_ is a completely different variable from $_ and is not populated here.
You should also consider using
while (my $row = <FILE>) { ... }
to read your file. It reads only a single line at a time whereas for will read the entire file into a list of lines before iterating over it.
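Putting both fixes together (an array reference per row, and while instead of for), a sketch of the corrected script. The helper name read_rows is hypothetical, and $outdir is read but unused, just as in the question.

```perl
use strict;
use warnings;

# Read tab-separated lines from $fh into an array of array references,
# so each row keeps its own list of columns.
sub read_rows {
    my ($fh) = @_;
    my @data;
    while (my $row = <$fh>) {       # reads one line at a time
        chomp $row;
        my @cells = split /\t/, $row;
        push @data, \@cells;        # a reference, not the flattened cells
    }
    return @data;
}

# Usage: perl script.pl input.tsv outdir idcol
if (@ARGV >= 3) {
    my ($infi, $outdir, $idcol) = @ARGV;   # $outdir is unused, as in the question
    open my $fh, '<', $infi or die "Can't read file '$infi' [$!]\n";
    my @data = read_rows($fh);
    close $fh;
    foreach my $row (@data) {
        my $value = $row->[$idcol] // '';  # empty if the row is too short
        print "$value\n";
    }
}
```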
I recommend not parsing the CSV file by hand and using the Text::CSV module instead.
use Text::CSV;
use Carp;
#set command line arguments
my ($infi, $outdir, $idcol) = @ARGV;
my $csv = Text::CSV->new({
sep_char => "\t"
});
open(my $fh, "<:encoding(UTF-8)", $infi) || croak "can't open $infi: $!";
# Uncomment if you need to skip header line
# <$fh>;
while (<$fh>) {
if ($csv->parse($_)) {
my @columns = $csv->fields();
print "$columns[0]\t$columns[1]\t$columns[2]\n";
} else {
my $err = $csv->error_input;
print "Failed to parse line: $err";
}
}
close $fh;
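With Text::CSV you can also let the module read the records itself through getline instead of reading lines with <$fh> and calling parse; with binary => 1 this also copes with quoted fields that span several lines. A sketch that prints column $idcol as the question intended, assuming Text::CSV is installed; column_values is a hypothetical helper name.

```perl
use strict;
use warnings;
use Text::CSV;

# Return the value of column $idcol (0-based) for every record read from $fh.
sub column_values {
    my ($fh, $idcol) = @_;
    my $csv = Text::CSV->new({ sep_char => "\t", binary => 1, auto_diag => 1 })
        or die "Cannot use CSV: " . Text::CSV->error_diag();
    my @values;
    while (my $row = $csv->getline($fh)) {
        push @values, $row->[$idcol];
    }
    return @values;
}

# Usage: perl script.pl input.tsv outdir idcol
if (@ARGV >= 3) {
    my ($infi, $outdir, $idcol) = @ARGV;
    open my $fh, '<:encoding(UTF-8)', $infi or die "can't open $infi: $!";
    print "$_\n" for map { $_ // '' } column_values($fh, $idcol);
    close $fh;
}
```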

how to find the first occurrence of a string in all the files in a folder in perl

I'm trying to find the line with the first occurrence of the string "victory" in each .txt file in a folder. For the first "victory" in each file, I would like to save the number from that line to @num and the file name to @filename.
Example: For the file a.txt that starts with the line: "lalala victory 123456" -> $num[$i]=123456 and $filename[$i]="a.txt"
@ARGV holds all the file names. My problem is that I'm trying to go line by line and I don't know what I'm doing wrong.
One more thing: how can I get the last occurrence of "victory" in the last file?
use strict;
use warnings;
use File::Find;
my $dir = "D:/New folder";
find(sub { if (-f && /\.txt$/) { push @ARGV, $File::Find::name } }, $dir); $^I = ".bak";
my $argvv;
my $counter=0;
my $prev_arg=0;
my $line = 0;
my @filename=0;
my @num=0;
my $i = 0;
foreach $argvv (@ARGV)
{
#open $line, $argvv or die "Could not open file: $!";
my $line = IN
while (<$line>)
{
if (/victory/)
{
$line = s/[^0-9]//g;
$first_bit[$i] = $line;
$filename[$i]=$argvv;
$i++;
last;
}
}
close $line;
}
for ($i=0; $i<3; $i++)
{
print $filename[$i]." ".$num[$i]."\n";
}
Thank you very much! :)
Your example script has a number of minor problems. The following example should do what you want in a fairly clean manner:
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
# Find the files we're interested in parsing
my @files = ();
my $dir = "D:/New folder";
find(sub { if (-f && /\.txt$/) { push @files, $File::Find::name } }, $dir);
# We'll store our results in a hash, rather than in 2 arrays as you did
my %foundItems = ();
foreach my $file (@files)
{
# Using a lexical file handle is the recommended way to open files
open my $in, '<', $file or die "Could not open $file: $!";
while (<$in>)
{
# Uncomment the next two lines to see what's being parsed
# chomp; # Not required, but helpful for the debug print below
# print "$_\n"; # Print out the line being parsed; for debugging
# Capture the number if we find the word 'victory'
# This assumes the number is immediately after the word; if that
# is not the case, it's up to you to modify the logic here
if (m/victory\s+(\d+)/)
{
$foundItems{$file} = $1; # Store the item
last;
}
}
close $in;
}
foreach my $file (sort keys %foundItems)
{
print "$file=> $foundItems{$file}\n";
}
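The answer above does not cover the second part of the question (the last occurrence). A sketch that records both the first and the last "victory" number in each file, under the same assumption that the number immediately follows the word. Files are visited in sorted order here, so the last occurrence in the last file is the "last=" value on the final line printed. The helper name victory_numbers is hypothetical, and the directory is taken from the command line instead of being hard-coded.

```perl
use strict;
use warnings;
use File::Find;

# Scan an open file handle and return the numbers that follow the first
# and the last occurrence of 'victory', or an empty list if there is none.
sub victory_numbers {
    my ($fh) = @_;
    my ($first, $last);
    while (my $line = <$fh>) {
        while ($line =~ /victory\s+(\d+)/g) {   # every match on the line
            $first //= $1;
            $last = $1;
        }
    }
    return defined $first ? ($first, $last) : ();
}

# Usage: perl victory.pl "D:/New folder"
if (@ARGV) {
    my $dir = $ARGV[0];
    my @files;
    find(sub { push @files, $File::Find::name if -f $_ && /\.txt$/ }, $dir);
    for my $file (sort @files) {
        open my $in, '<', $file or die "Could not open $file: $!";
        my ($first, $last) = victory_numbers($in);
        close $in;
        print "$file: first=$first last=$last\n" if defined $first;
    }
}
```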
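The one-liner below searches for the string abc in all the files (file*.txt) and prints only the first matching line of each file. (eof without parentheses is true at the end of each input file, which resets the flag.)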
perl -lne 'BEGIN{$flag=1}if(/abc/ && $flag){print $_;$flag=0}if(eof){$flag=1}' file*.txt
tested:
> cat temp
abc 11
22
13
,,
abc 22
bb
cc
,,
ww
kk
ll
,,
> cat temp2
abc t goes into 1000
fileA1, act that abc specific place
> perl -lne 'BEGIN{$flag=1}if(/abc/ && $flag){print $_;$flag=0}if(eof){$flag=1}' temp temp2
abc 11
abc t goes into 1000
>

Why does my Perl script say "Can't call method parse on an undefined value"?

I am new to Perl and still trying to figure out how to code in this language.
I am currently trying to split a long single string of csv into multiple lines.
Data example
a,b,c<br />x,y,x<br />
which I have so far managed to split up, adding quotes, so that I can write it to a CSV file again later on:
"a,b,c""x,y,z"
The quotes just indicate which sets of CSV values belong together.
The problem I am having is that when I try to create a CSV file, passing in the data as a string, I get the error
"Can't call method "parse" on an undefined value"
When I print out the string which I am passing in, it is defined and holds data. I am hoping that this is something simple which I am doing wrong through lack of experience.
The CSV code which I am using is:
use warnings;
use Text::CSV;
use Data::Dumper;
use constant debug => 0;
use Text::CSV;
print "Running CSV editor......\n";
#my $csv = Text::CSV->new({ sep_char => ',' });
my $file = $ARGV[0] or die "Need to get CSV file on the command line\n";
my $fileextension = substr($file, -4);
#If the file is a CSV file then read in the file.
if ($fileextension =~ m/csv/i)
{
print "Reading and formating: $ARGV[0] \n";
open(my $data, '<', $file) or die "Could not open '$file' $!\n";
my #fields;
my $testline;
my $line;
while ($line = <$data>)
{
#Clears the white space at the end of the line.
chomp $line;
#Splits the line up and removes the <br />.
$testline = join "\" \" ", split qr{<br\s?/>}, $line;
#my $newStr = join $/, @lines;
#print $newStr;
my $q1 = "\"";
$testline = join "", $q1,$testline,$q1;
print "\n printing testline: \n $testline \n";
}
$input_string = $testline;
print "\n Testing input string line:\n $input_string";
if ($csv->parse ($input_string))
{
my @field = $csv->fields;
foreach my $col (0 .. $#field) {
my $quo = $csv->is_binary ($col) ? $csv->{quote_char} : "";
printf "%2d: %s%s%s\n", $col, $quo, $field[$col], $quo;#
}
}
else
{
print STDERR "parse () failed on argument: ",
$csv->error_input, "\n";
$csv->error_diag ();
}
#print $_,$/ for @lines;
print "\n Finished reading and formating: $ARGV[0] \n";
}else
{
print "Error: File is not a CSV file\n"
}
You did not create a Text::CSV object, but you try to use it (the line near the top of your script that would create it is commented out).
"Can't call method "parse" on an undefined value"
This means that $csv is undefined, so there is no object on which to call the parse method. Simply create a Text::CSV object first, at the top of your code below all the use lines:
my $csv = Text::CSV->new;
Please take a look at the CPAN documentation of Text::CSV.
Also, did I mention you should use strict? With use strict, Perl would have refused to compile the script and pointed out that $csv was never declared.
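Once the object exists, the parse step works. Rather than re-quoting the groups into one string, one option (an assumption about what you ultimately need) is to split the line on the <br /> markers and let Text::CSV parse each group as a separate record. A sketch, assuming Text::CSV is installed; parse_groups is a hypothetical helper name.

```perl
use strict;
use warnings;
use Text::CSV;

# Create the parser object once, before it is used.
my $csv = Text::CSV->new({ binary => 1 })
    or die "Cannot use CSV: " . Text::CSV->error_diag();

# Split a line on <br /> markers and parse each piece as one CSV record.
# Returns a list of array references, one per record.
sub parse_groups {
    my ($line) = @_;
    chomp $line;
    my @records;
    for my $piece (grep { length($_) > 0 } split m{<br\s?/>}, $line) {
        if ($csv->parse($piece)) {
            push @records, [ $csv->fields ];
        } else {
            warn "parse() failed on: ", $csv->error_input, "\n";
        }
    }
    return @records;
}

# Usage: perl csvedit.pl input.csv
if (@ARGV) {
    while (my $line = <>) {
        print join('|', @$_), "\n" for parse_groups($line);
    }
}
```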