Perl - count number of columns per row in a CSV file

I want to count the number of columns in a row for a CSV file.
row 1 10 columns
row 2 11 columns
etc.
I can print out the value of the last column, but I really just want a count per row.
perl -F, -lane "{print @keys[$_].$F[$_] foreach(-1)}" < testing.csv
I am on a Windows machine.
Thanks.

If you have a proper csv file, it can contain embedded delimiters (e.g. 1,"foo,bar",2), in which case a simple split will not be enough. You can use the Text::CSV module fairly easily with a one-liner like this:
Copy/paste version:
perl -MText::CSV -lwe"my $c=Text::CSV->new({sep_char=>','}); while($r=$c->getline(*STDIN)) { print scalar @$r }" < sorted.csv
Readable version:
perl -MText::CSV # use Text::CSV module
-lwe # add newline to print, use warnings
"my $c = Text::CSV->new(); # set up csv object
while( $r = $c->getline(*STDIN) ) { # get lines from stdin
print scalar @$r # print row size
}" < sorted.csv # input file to stdin
If your input can be erratic, Text::CSV->getline might choke on corrupted lines (the while loop is ended), in which case it may be safer to use plain parsing:
perl -MText::CSV -nlwe"
BEGIN { $r = Text::CSV->new() };
$r->parse($_);
print scalar $r->fields
" comma.csv
Note that in this case we use a different input method. This is because while getline() requires a file handle, parse() does not. Since the diamond operator uses either ARGV or STDIN depending on your argument, I find it is better to be explicit.

If you don't have commas as part of the fields, you can split the line and count the number of fields:
#! /usr/bin/perl
use strict;
use warnings;
my @cols = split(',', $_);
my $n = @cols;
print "row $. $n columns\n";
You can call it like this:
perl -n script.pl testing.csv
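If Text::CSV is not installed, the same per-row count can be sketched with the core module Text::ParseWords, which also keeps quoted commas together. This is only a sketch under assumptions: count_columns and report_columns are hypothetical helper names, an empty line is treated as zero columns, and parse_line treats single quotes and backslashes as special, so it is not a full CSV parser.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Count the comma-separated fields in one line, keeping quoted commas intact.
# count_columns is a hypothetical helper name, not part of any module.
sub count_columns {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;          # drop the line ending (LF or Windows CRLF)
    return 0 if $line eq '';       # assumption: an empty line has zero columns
    my @fields = parse_line(',', 0, $line);
    return scalar @fields;
}

# Print "row N M columns" for every line of the files named on the command line.
sub report_columns {
    while (my $line = <>) {
        printf "row %d %d columns\n", $., count_columns($line);
    }
}

report_columns() if @ARGV;
```

Saved as, say, count_cols.pl (a made-up name), it would be run as perl count_cols.pl testing.csv.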

Related

How do I search and print a matched word in UNIX or Perl?

1=ABC,2=mnz,3=xyz
1=pqr,3=ijk,2=lmn
I have this in a text file. I want to search for 1= and print only the matched words, 1=ABC and 1=pqr.
Any suggestions in Perl or Unix?
Input:
$ cat grep.in
1=ABC,2=mnz,3=xyz
1=pqr,3=ijk,2=lmn
4=pqr,3=ijk,2=lmn
Command:
$ grep -o '1=[^,]\+' grep.in
1=ABC
1=pqr
Explanations:
You can just use grep on your input
-o is to output only the matching pattern
1=[^,]\+ the regex matches strings that start with 1= followed by at least one character that is not a comma (this assumes there is no comma to the right of the = other than the separator)
if you want to accept an empty value, you can replace the \+ with *
It appears that your input data is in CSV format. Here is a Perl solution based on Text::CSV
parse the CSV content row-wise
print out columns that start with 1=
#!/usr/bin/perl
use warnings;
use strict;
use Text::CSV;
my $csv = Text::CSV->new({
binary => 1,
eol => "\n",
}) or die "CSV\n";
# parse
while (my $row = $csv->getline(\*DATA)) {
foreach (@{ $row }) {
print "$_\n" if /^1=/;
}
}
exit 0;
__DATA__
1=ABC,2=mnz,3=xyz
1=pqr,3=ijk,2=lmn
Test run:
$ perl dummy.pl
1=ABC
1=pqr
Replace DATA with STDIN to read the input from standard input instead.
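When the fields never contain quoted commas, the same filtering can be sketched with only core Perl by splitting each line and keeping the fields that begin with the prefix. The helper name fields_with_prefix is hypothetical. Unlike grep -o '1=[^,]\+', which would also print 1=x out of a field such as 11=x, this version only accepts fields that start with the prefix.

```perl
use strict;
use warnings;

# Return the comma-separated fields of $line that begin with $prefix.
# fields_with_prefix is a hypothetical helper name.
sub fields_with_prefix {
    my ($line, $prefix) = @_;
    chomp $line;
    return grep { index($_, $prefix) == 0 } split /,/, $line;
}

# Print the matching fields of every line in the files named on the command line.
if (@ARGV) {
    while (my $line = <>) {
        print "$_\n" for fields_with_prefix($line, '1=');
    }
}
```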

Replace single space with multiple spaces in perl

I have a requirement of replacing a single space with multiple spaces so that the second field always starts at a particular position (here 36 is the position of second field always).
I have a perl script written for this:
while(<INP>)
{
my $md=35-index($_," ");
my $str;
$str.=" " for(1..$md);
$_=~s/ +/$str/;
print "$_" ;
}
Is there a better approach using just a regex in s/// so that I can run it directly from the command line instead of from a script?
Assuming that the fields in your data are demarcated by spaces
while (<$fh>) {
my ($first, @rest) = split;
printf "%-35s @rest\n", $first;
}
The first field is now padded to 35 characters, aligned left due to the - in the printf format, and is followed by one space, so the second field begins at offset 36 counting from 0. See sprintf for the many details. The rest is printed with single spaces between the original space-separated fields, but can instead be done as desired (tab separated, fixed width...).
Or you can leave the "rest" after the first field untouched by splitting the line into two parts
while (<$fh>) {
my ($first, $rest) = /(\S+)\s+(.*)/;
printf "%-35s $rest\n", $first;
}
(or use split ' ', $_, 2 instead of regex)
Please give more detail if there are other requirements.
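The two-part split can be wrapped in a small function, sketched below. It assumes the goal is the layout the question's own script produces, where the second field starts at 0-based offset 35 (column 36). The name align_second_field is hypothetical, and a first field that is too long is simply followed by a single space.

```perl
use strict;
use warnings;

# Pad the first whitespace-separated field so that the rest of the line
# starts at the 0-based $offset. align_second_field is a hypothetical name.
sub align_second_field {
    my ($line, $offset) = @_;
    chomp $line;
    my ($first, $rest) = split ' ', $line, 2;
    return ''     unless defined $first;    # blank line
    return $first unless defined $rest;     # only one field on the line
    # Pad to $offset - 1 and add one space, so an over-long first field
    # is still separated from the rest.
    return sprintf '%-*s %s', $offset - 1, $first, $rest;
}

# Reformat every line of the files named on the command line.
if (@ARGV) {
    while (my $line = <>) {
        print align_second_field($line, 35), "\n";
    }
}
```

Everything after the first field is left untouched, including any runs of spaces inside it.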
One approach is to use plain ol' Perl formats:
#!/usr/bin/perl
use warnings;
use strict;
my($first, $second, $remainder);
format STDOUT =
#<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< #<<<<<< #<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
$first, $second,$remainder
.
while (<DATA>) {
($first, $second, $remainder) = split(/\s+/, $_, 3);
write;
}
exit 0;
__DATA__
ABCD TEST EFGH don't touch
FOO BAR FUD don't touch
Test output. I probably miscounted the columns, but you should get the idea:
$ perl dummy.pl
ABCD TEST EFGH don't touch
FOO BAR FUD don't touch
Another option would be Text::Table.

print lines after finding a key word in perl

I have a variable $string and I want to print all the lines after I find a keyword in a line (including the line with the keyword).
$string=~ /apple /;
I'm using this regexp to find the keyword, but I do not know how to print the lines after it.
It's not really clear where your data is coming from. Let's assume it's a string containing newlines. Let's start by splitting it into an array.
my @string = split /\n/, $string;
We can then use the flip-flop operator to decide which lines to print. I'm using \0 as a regex that is very unlikely to match any string (so, effectively, it's always false).
for (@string) {
say if /apple / .. /\0/;
}
Just keep a flag variable, set it to true when you see the string, print if the flag is true.
perl -ne 'print if $seen ||= /apple/'
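For data that is already in a scalar, the same flag idea can be sketched as a function that returns the lines instead of printing them. The name lines_from_match is hypothetical, and the sketch assumes lines are separated by \n.

```perl
use strict;
use warnings;

# Return every line of $text from the first line matching $pattern onward,
# including the matching line itself. lines_from_match is a hypothetical name.
sub lines_from_match {
    my ($text, $pattern) = @_;
    my $seen = 0;                       # becomes true at the first match and stays true
    return grep { $seen ||= /$pattern/ } split /\n/, $text;
}
```

It could then be used as print "$_\n" for lines_from_match($string, qr/apple /);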
If your data is in a scalar variable, you can use several methods.
Recommended method
($matching) = $string=~ /([^\n]*apple.+)/s;
print "$matching\n";
And there is another way to do it
$string=~ /[^\n]*apple.+/s;
print $&; # prints the text that matched
If you are reading the data from a file, try the following
while (<$fh>)
{
if(/apple/)
{
print $_, <$fh>; # print the matching line, then the rest of the file
}
}
Or else try the following one liner
perl -ne 'print $_, <> and exit if /apple/' file.txt

How to store value from cut command into a Perl array?

my @up = `cat abc.txt|head -2|tail -1|cut -d' ' -f1-3`;
Instead of storing the individual fields in the array, it's storing the entire output as a string in the first element.
This is the output I am getting
$up[0] = 'xxx 12 234'
I want this
@up = ('xxx', 12, 234)
It looks like you want the first three space-delimited fields of the second line of file abc.txt
The problem is that backticks will return one line of output in each element of the array, and because cut prints all three fields on a single line, they appear as a single array element.
You could split the value again inside Perl, but when you have the whole of the Perl language available, it's wasteful to use the shell to do something so simple and you should do everything in Perl
This program will do as you ask. I've used Data::Dump only so that you can verify that the contents of @up are as you wanted
use strict;
use warnings 'all';
use Data::Dump;
my @up = do {
open my $fh, '<', 'abc.txt' or die $!;
<$fh>; # Skip one line
(split ' ', <$fh>)[0 .. 2];
};
dd \@up;
output
["xxx", 12, 234]
You can either split the result by whitespaces:
my @up = split(/\s+/, `cat abc.txt ...`);
Or you can set the input record separator to a space beforehand. This is less flexible: the separator is a plain string, so two spaces in a row produce an empty field in the middle, and each element keeps its trailing separator unless you chomp it:
local $/ = " ";
my @up = `cat abc.txt ...`;
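The difference between a literal single-space separator and whitespace splitting can be seen in a short sketch. Both helper names are hypothetical. Note that split ' ' also skips leading whitespace, whereas split /\s+/ would return an empty first field for a line that starts with a space.

```perl
use strict;
use warnings;

# Split on exactly one space: two spaces in a row yield an empty field.
sub split_on_single_space {
    my ($line) = @_;
    chomp $line;
    return split / /, $line;
}

# Split awk-style: leading whitespace is skipped and a run of
# whitespace counts as a single separator.
sub split_on_whitespace {
    my ($line) = @_;
    return split ' ', $line;
}
```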

Summing a column of numbers in a text file using Perl

Ok, so I'm very new to Perl. I have a text file and in the file there are 4 columns of data(date, time, size of files, files). I need to create a small script that can open the file and get the average size of the files. I've read so much online, but I still can't figure out how to do it. This is what I have so far, but I'm not sure if I'm even close to doing this correctly.
#!/usr/bin/perl
open FILE, "files.txt";
#@array = File;
while(FILE){
#chomp;
($date, $time, $numbers, $type) = split(/ /,<FILE>);
$total += $numbers;
}
print"the total is $total\n";
This is how the data looks in the file. These are just a few of them. I need to get the numbers in the third column.
12/02/2002 12:16 AM 86016 a2p.exe
10/10/2004 11:33 AM 393 avgfsznew.pl
11/01/2003 04:42 PM 38124 c2ph.bat
Your program is reasonably close to working. With these changes it will do exactly what you want
Always use use strict and use warnings at the start of your program, and declare all of your variables using my. That will help you by finding many simple errors that you may otherwise overlook
Use lexical file handles, the three-parameter form of open, and always check the return status of any open call
Declare the $total variable outside the loop. Declaring it inside the loop means it will be created and destroyed each time around the loop and it won't be able to accumulate a total
Declare a $count variable in the same way. You will need it to calculate the average
Using while (FILE) {...} just tests that FILE is true. You need to read from it instead, so you must use the readline operator like <FILE>
You want the default call to split (without any parameters) which will return all the non-space fields in $_ as a list
You need to add a variable in the assignment to allow for the AM or PM field in each line
Here is a modification of your code that works fine
use strict;
use warnings;
open my $fh, '<', "files.txt" or die $!;
my $total = 0;
my $count = 0;
while (<$fh>) {
my ($date, $time, $ampm, $numbers, $type) = split;
$total += $numbers;
$count += 1;
}
print "The total is $total\n";
print "The count is $count\n";
print "The average is ", $total / $count, "\n";
output
The total is 124533
The count is 3
The average is 41511
It's tempting to use Perl's awk-like auto-split option. There are 5 columns; three containing date and time information, then the size and then the name.
The first version of the script that I wrote is also the most verbose:
perl -n -a -e '$total += $F[3]; $num++; END { printf "%12.2f\n", $total / ($num + 0.0); }'
The -a (auto-split) option splits a line up on white space into the array @F. Combined with the -n option (which makes Perl run in a loop that reads the file name arguments in turn, or standard input, without printing each line), the code adds $F[3] (the fourth column; array indexes count from 0) to $total, which is automagically initialized to zero on first use. It also counts the lines in $num. The END block is executed when all the input is read; it uses printf() to format the value. The + 0.0 is there to ensure that the arithmetic is done in floating point, not integer arithmetic. This is very similar to the awk script:
awk '{ total += $4 } END { print total / NR }'
First drafts of programs are seldom optimal — or, at least, I'm not that good a programmer. Revisions help.
Perl was designed, in part, as an awk killer. There is still a program a2p distributed with Perl for converting awk scripts to Perl (and there's also s2p for converting sed scripts to Perl). And Perl does have an automatic (built-in) variable that keeps track of the number of lines read. It has several names. The tersest is $.; the mnemonic name $NR is available if you put use English; in the script, and so is $INPUT_LINE_NUMBER. So, using $num is not necessary. It also turns out that Perl does a floating point division anyway, so the + 0.0 part was unnecessary. This leads to the next versions:
perl -MEnglish -n -a -e '$total += $F[3]; END { printf "%12.2f\n", $total / $NR; }'
or:
perl -n -a -e '$total += $F[3]; END { printf "%12.2f\n", $total / $.; }'
You can tune the print format to suit your whims and fancies. This is essentially the script I'd use in the long term; it is fairly clear without being long-winded in any way. The script could be split over multiple lines if you desired. It is a simple enough task that the legibility of the one-line is not a problem, IMNSHO. And the beauty of this is that you don't have to futz around with split and arrays and read loops on your own; Perl does most of that for you. (Granted, it does blow up on empty input; that fix is trivial; see below.)
Recommended version
perl -n -a -e '$total += $F[3]; END { printf "%12.2f\n", $total / $. if $.; }'
The if $. tests whether the number of lines read is zero or not; the printf and division are omitted if $. is zero so the script outputs nothing when given no input.
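For use inside a larger script, the same guarded average can be sketched as a function. The name average_column is hypothetical. Unlike the one-liner, which divides by $. and so counts every input line, this sketch skips lines that are too short to have the requested column and returns undef when nothing was summed.

```perl
use strict;
use warnings;

# Average the whitespace-separated column at 0-based $index over @$lines.
# Returns undef when no line has that column, mirroring the "if $." guard.
# average_column is a hypothetical name.
sub average_column {
    my ($lines, $index) = @_;
    my ($total, $count) = (0, 0);
    for my $line (@$lines) {
        my @fields = split ' ', $line;
        next unless defined $fields[$index];   # assumption: skip short or blank lines
        $total += $fields[$index];
        $count++;
    }
    return $count ? $total / $count : undef;
}
```

With the three sample lines from the question, average_column(\@lines, 3) averages the sizes 86016, 393 and 38124.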
There is a noble (or ignoble) game called 'Code Golf' that was much played in the early days of Stack Overflow, but Code Golf questions are no longer considered good questions. The object of Code Golf is to write a program that does a particular task in as few characters as possible. You can play Code Golf with this and compress it still further if you're not too worried about the format of the output and you're using at least Perl 5.10:
perl -Mv5.10 -n -a -e '$total += $F[3]; END { say $total / $. if $.; }'
And, clearly, there are a lot of unnecessary spaces and letters in there:
perl -Mv5.10 -nae '$t+=$F[3];END{say$t/$.if$.}'
That is not, however, as clear as the recommended version.
#!/usr/bin/perl
use warnings;
use strict;
open my $file, "<", "files.txt" or die "Cannot open files.txt: $!";
my ($total, $cnt);
while(<$file>){
$total += (split(/\s+/, $_))[3];
$cnt++;
}
close $file;
print "number of files: $cnt\n";
print "total size: $total\n";
printf "avg: %.2f\n", $total/$cnt;
Or you can use awk:
awk '{t+=$4} END{print t/NR}' files.txt
Try doing this :
#!/usr/bin/perl -l
use strict; use warnings;
open my $file, '<', "my_file" or die "open error [$!]";
my ($total, $count);
while (<$file>){
chomp;
next if /^$/;
my ($date, $time, $x, $numbers, $type) = split;
$total += $numbers;
$count++;
}
print "the average is " . $total/$count . " and the total is $total";
close $file;
It is as simple as this:
perl -F -lane '$a+=$F[3];END{print "The average size is ".$a/$.}' your_file
tested below:
> cat temp
12/02/2002 12:16 AM 86016 a2p.exe
10/10/2004 11:33 AM 393 avgfsznew.pl
11/01/2003 04:42 PM 38124 c2ph.bat
Now the execution:
> perl -F -lane '$a+=$F[3];END{print "The average size is ".$a/$.}' temp
The average size is 41511
>
Explanation:
-F -a stores each line in an array, @F, with the default separator being a space or tab.
So now $F[3] holds the size of the file.
Sum up all the sizes in the 4th column until all the lines are processed.
END is executed after all the lines in the file have been processed.
So $. at the end gives the number of lines,
and $a/$. gives the average.
This solution opens the file and loops through each line of it, splitting each line into five variables on one or more whitespace characters.
open opens the file for reading ("<"); if that fails, or die "..." raises an error.
my ($total, $cnt) are the running column total and the count of files added.
while(<FILE>) { ... } loops through each line of the file using the file handle and stores the line in $_.
chomp removes the input record separator from $_. On Unix, the default separator is a newline, \n.
split(/\s+/, $_) splits the current line, represented by $_, on the delimiter \s+. \s represents a whitespace character, and the + afterward means "1 or more". So we split the line on 1 or more whitespace characters.
Next we update $total and $cnt.
#!/usr/bin/perl
open FILE, "<", "files.txt" or die "Error opening file: $!";
my ($total, $cnt);
while(<FILE>){
chomp;
my ($date, $time, $am_pm, $numbers, $type) = split(/\s+/, $_);
$total += $numbers;
$cnt++;
}
close FILE;
print "the total is $total and count of $cnt\n";