I have a problem. This Perl program should open all the files and merge their lines together, similar to the paste command on Unix systems.
my @files;
for (@fileList ? @fileList : qw(-)) {
    open $files[@files], '<', $_;
}
while (grep defined, (my @lines = map { scalar <$_>; } @files)) {
    chomp @lines;
    print join("\t", @lines), "\n";
}
The problem is that when it comes to two files of different lengths, like:
One
Two
Three
And:
Apple
Banana
Orange
Kiwi
It throws uninitialized value warnings:
Use of uninitialized value $lines[0] in chomp
Use of uninitialized value $lines[0] in join
The same warnings appear the other way around, when the files are Apple, Banana, ... and One, Two, Three.
Thank you in advance
The problem is that since your files have different lengths, you will get some undefined values in your @lines array. You cannot filter them out in the while condition, but you can filter them out afterwards, inside the loop:
while (grep defined, (my @lines = map { scalar <$_>; } @files)) {
    @lines = map defined($_) ? $_ : "", @lines;
This will replace undefined values with the empty string, removing any uninitialized warnings for those values, while keeping your tabs correctly aligned.
Note that your code actually does what it is meant to do, because undef will stringify to the empty string. It is only because you have warnings on that you get the warnings about it. However, it is a very good idea to have warnings on, so errors such as this can be found and fixed properly.
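Putting the pieces together, the corrected loop would look something like this (a sketch that keeps the structure of your original code):

while (grep defined, (my @lines = map { scalar <$_> } @files)) {
    @lines = map { defined($_) ? $_ : "" } @lines;   # an exhausted file yields undef; turn it into ""
    chomp @lines;
    print join("\t", @lines), "\n";
}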
You should note that using open without checking its return value is not a good idea, unless you explicitly handle the failed cases. The proper way is to do this:
open $files[@files], '<', $_ or die $!;
For simplicity one can also use
use autodie;
The autodie module will produce fatal errors for critical function calls such as open.
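For example, a minimal sketch (the file name is just a placeholder):

use autodie;                    # failed open, close, etc. now die automatically

open my $fh, '<', 'input.txt';  # no "or die" needed; a failure is fatal with a useful message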
If you have Perl v5.10 or later installed (and you really should, as it is five years old now) then you can use the // (defined-or) operator to substitute a null string for undef after the end of the files. It will also impose scalar context on its operands, so there is no need to use the scalar operator. In addition you must substitute the length operator for defined, resulting in this code
while (grep length, (my @lines = map { <$_> // '' } @files)) {
    chomp @lines;
    print join("\t", @lines), "\n";
}
I am reviewing for a test and I can't seem to get this example to code out right.
Problem: Write a Perl script, called ileaf, which will interleave the lines of one file with those of another file, writing the result to a third file. If the files are of different lengths, the excess lines are written at the end.
A sample invocation:
ileaf file1 file2 outfile
This is what I have:
#!/usr/bin/perl -w
open(file1, "$ARGV[0]");
open(file2, "$ARGV[1]");
open(file3, ">$ARGV[2]");
while(($line1 = <file1>) || ($line2 = <file2>)){
    if($line1){
        print $line1;
    }
    if($line2){
        print $line2;
    }
}
This sends the information to the screen so I can immediately see the result. The final version should use print file3 $line1;. I am getting all of file1, then all of file2, without any interleaving of the lines.
If I understand correctly, this is a function of the use of the "||" in my while loop. The while checks the first comparison and if it's true drops into the loop. Which will only check file1. Once file1 is false then the while checks file2 and again drops into the loop.
What can I do to interleave the lines?
You're not getting what you want from while(($line1 = <file1>)||($line2 = <file2>)){ because as long as ($line1 = <file1>) is true, ($line2 = <file2>) never happens.
Try something like this instead:
open my $file1, "<", $ARGV[0] or die;
open my $file2, "<", $ARGV[1] or die;
open my $file3, ">", $ARGV[2] or die;

while (my $f1 = readline($file1)) {
    print $file3 $f1;                     # line from file1
    if (my $f2 = readline($file2)) {      # if there are any lines left in file2
        print $file3 $f2;
    }
}
while (my $f2 = readline($file2)) {       # if there are any lines left in file2
    print $file3 $f2;
}

close $file1;
close $file2;
close $file3;
You'd think if they're teaching you Perl, they'd use the modern Perl syntax. Please don't take this personally. After all, this is how you were taught. However, you should know the new Perl programming style because it helps eliminate all sorts of programming mistakes, and makes your code easier to understand.
Use the pragmas use strict; and use warnings;. The warnings pragma replaces the need for the -w flag on the command line. It's actually more flexible and better. For example, I can turn off particular warnings when I know they'll be an issue. The use strict; pragma requires me to declare my variables with either a my or our. (NOTE: Don't declare Perl built-in variables). 99% of the time, you'll use my. These variables are called lexically scoped, but you can think of them as true local variables. Lexically scoped variables don't have any value outside of their scope. For example, if you declare a variable using my inside a while loop, that variable will disappear once the loop exits.
Use the three parameter syntax for the open statement: In the example below, I use the three parameter syntax. This way, if a file is called >myfile, I'll be able to read from it.
Use locally defined file handles: Note that I use my $file_1_fh instead of simply FILE_1_HANDLE. The old way, FILE_1_HANDLE, is globally scoped, plus it's very difficult to pass the file handle to a function. Using lexically scoped file handles just works better.
Use or and and instead of || and &&: They're easier to understand, and their operator precedence is lower, which makes them less likely to cause subtle problems (see the sketch after this list).
Always check whether your open statement worked: You need to make sure your open statement actually opened a file. Or use the use autodie; pragma, which will kill your program if the open statements fail (which is probably what you want to do anyway).
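To illustrate the precedence point from the list above, here is a sketch ($file is a hypothetical variable holding a file name):

# || binds more tightly than the comma, so this parses as
# open(my $fh, '<', ($file || die ...)) -- the die only fires if $file is false,
# not if the open itself fails:
open my $bad_fh, '<', $file || die "Cannot open $file: $!";

# or has very low precedence, so the whole open is what gets tested:
open my $good_fh, '<', $file or die "Cannot open $file: $!";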
And, here's your program:
#! /usr/bin/env perl
#
use strict;
use warnings;
use autodie;
open my $file_1, "<", shift;
open my $file_2, "<", shift;
open my $output_fh, ">", shift;
for (;;) {
    my $line_1 = <$file_1>;
    my $line_2 = <$file_2>;
    last if not defined $line_1 and not defined $line_2;

    no warnings qw(uninitialized);
    print {$output_fh} $line_1 . $line_2;
    use warnings;
}
In the above example, I read from both files even if they're empty. If there's nothing to read, then $line_1 or $line_2 is simply undefined. After I do my read, I check whether both $line_1 and $line_2 are undefined. If so, I use last to end my loop.
Because my file handle is a scalar variable, I like putting it in curly braces, so people know it's a file handle and not a variable I want to print out. I don't need it, but it improves clarity.
Notice the no warnings qw(uninitialized);. This turns off the uninitialized warning I'll get. I know that either $line_1 or $line_2 might be uninitialized, so I don't want the warning. I turn it back on right below my print statement because it is a valuable warning.
Here's another way to do that for loop:
while ( 1 ) {
    my $line_1 = <$file_1>;
    my $line_2 = <$file_2>;
    last if not defined $line_1 and not defined $line_2;
    print {$output_fh} $line_1 if defined $line_1;
    print {$output_fh} $line_2 if defined $line_2;
}
The infinite loop is a while loop instead of a for loop. Some people don't like the C style of for loop and have banned it from their coding practices. Thus, if you have an infinite loop, you use while ( 1 ) {. To me, maybe because I came from a C background, for (;;) { means infinite loop, and while ( 1 ) { takes a few extra milliseconds to digest.
Also, I check whether $line_1 or $line_2 is defined before I print them out. I guess it's better than using no warnings and use warnings, but I need two separate print statements instead of combining them into one.
Here's another option that uses List::MoreUtils's zip to interleave arrays and File::Slurp to read and write files:
use strict;
use warnings;
use List::MoreUtils qw/zip/;
use File::Slurp qw/read_file write_file/;
chomp( my @file1 = read_file shift );
chomp( my @file2 = read_file shift );
write_file shift, join "\n", grep defined $_, zip @file1, @file2;
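For illustration, here is roughly what zip produces, using made-up data (the undef for the missing fourth line is what grep defined filters out):

use List::MoreUtils qw(zip);

my @file1 = qw(One Two Three);
my @file2 = qw(Apple Banana Orange Kiwi);

my @interleaved = zip @file1, @file2;
# ('One', 'Apple', 'Two', 'Banana', 'Three', 'Orange', undef, 'Kiwi')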
Just noticed Tim A has a nice solution already posted. This solution is a bit wordier, but might illustrate exactly what is going on a bit more.
The method I went with reads all of the lines from both files into two arrays, then loops through them using a counter.
#!/usr/bin/perl -w
use strict;
open(IN1, "<", $ARGV[0]);
open(IN2, "<", $ARGV[1]);

my @file1_lines;
my @file2_lines;

while (<IN1>) {
    push(@file1_lines, $_);
}
close IN1;

while (<IN2>) {
    push(@file2_lines, $_);
}
close IN2;

my $file1_items = @file1_lines;
my $file2_items = @file2_lines;

open(OUT, ">", $ARGV[2]);

my $i = 0;
while (($i < $file1_items) || ($i < $file2_items)) {
    if (defined($file1_lines[$i])) {
        print OUT $file1_lines[$i];
    }
    if (defined($file2_lines[$i])) {
        print OUT $file2_lines[$i];
    }
    $i++;
}
close OUT;
I'm reading this text file to get ONLY the words in it and ignore all kinds of whitespace:
hello
now
do you see this.sadslkd.das,msdlsa but
i hoohoh
And this is my Perl code:
#!/usr/bin/perl -w
require 5.004;

open F1, './text.txt';

while ($line = <F1>) {
    #print $line;
    @arr = split /\s+/, $line;
    foreach $w (@arr) {
        if ($w !~ /^\s+$/) {
            print $w . "\n";
        }
    }
    #print @arr;
}
close F1;
And this is the output:
hello
now
do
you
see
this.sadslkd.das,msdlsa
but
i
hoohoh
The output is showing two newlines but I am expecting the output to be just words. What should I do to just get words?
You should always use strict and use warnings (in preference to the -w command-line qualifier) at the top of every Perl program, and declare each variable at its first point of use using my. That way Perl will tell you about simple errors that you may otherwise overlook.
You should also use lexical file handles with the three-parameter form of open, and check the status to make sure it succeeded. There is little point in explicitly closing an input file unless you expect your program to run for an appreciable time, as Perl will close all files for you on exit.
Do you really need to require Perl v5.4? That version is fifteen years old, and if there is anything older than that installed then you have a museum!
Your program would be better like this:
use strict;
use warnings;

open my $fh, '<', './text.txt' or die $!;

while (my $line = <$fh>) {
    my @arr = split /\s+/, $line;
    foreach my $w (@arr) {
        if ($w !~ /^\s+$/) {
            print $w . "\n";
        }
    }
}
Note: my apologies. The warnings pragma and lexical file handles were introduced only in v5.6, so that part of my answer is irrelevant. The latest version of Perl is v5.16 and you really should upgrade.
As Birei has pointed out, the problem is that, when the line has leading whitespace, there is an empty field before the first separator. Imagine if your data was comma-separated: then you would want Perl to report a leading empty field if the line started with a comma.
To extract all the non-space characters you can use a regular expression that does exactly that:
my @arr = $line =~ /\S+/g;
and this can be emulated by using the default parameter for split, which is a single quoted space (not a regular expression):
my @arr = split ' ', $line;
In this case split behaves like the awk utility and discards any leading empty fields as you expected.
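A quick sketch of the difference on a line with leading whitespace:

my $line = "  do you see\n";

my @a = split /\s+/, $line;   # ('', 'do', 'you', 'see')  -- empty leading field
my @b = split ' ',   $line;   # ('do', 'you', 'see')      -- awk-like, leading whitespace discarded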
This is even simpler if you let Perl use the $_ variable in the read loop, as all of the parameters for split can be defaulted:
while (<F1>) {
    my @arr = split;
    foreach my $w (@arr) {
        print "$w\n" if $w !~ /^\s+$/;
    }
}
This line is the problem:
@arr = split(/\s+/, $line);
/\s+/ matches the leading whitespace as well, so split produces an empty leading field. Use ' ' instead.
@arr = split(' ', $line);
I believe that in this line:
if(!($w =~ /^\s+$/))
You wanted to say: if there's nothing in this field, don't print it.
But the + in the regex actually forces it to match at least one space character.
If you change \s+ to \s*, you'll see that it works, because * means zero or more occurrences.
I am very new to Perl and need your help.
I have a CSV file xyz.csv with contents:
Here the level1 and er values are string names, not numbers:
level1,er
level2,er2
level3,er3
level4,er4
I parse this CSV file using the script below and pass the fields to an array in the first run
open(my $d, '<', $file) or die "Could not open '$file' $!\n";
while (my $line = <$d>) {
    chomp $line;
    my @data = split "," , $line;
    @XYX = ( [ "$data[0]", "$data[1]" ], );
}
For the second run I take an input from the command prompt and store it in the variable $val. My program should parse the CSV file starting from the value stored in that variable until it reaches the end of the file.
For example
I input level2, so I need the script to parse from the second line to the end of the CSV file, ignoring the values before level2 in the file, and pass these values (level2 to level4) to @XYX = (["$data[1]","$data[1]"],);
level2,er2
level3,er3
level4,er4
I input level3, so I need the script to parse from the third line to the end of the CSV file, ignoring the values before level3 in the file, and pass these values (level3 and level4) to @XYX = (["$data[0]","$data[1]"],);
level3,er3
level4,er4
How do I achieve that? Please do give your valuable suggestions. I appreciate your help
As long as you are certain that there are never any commas in the data you should be OK using split. But even so it would be wise to limit the split to two fields, so that you get everything up to the first comma and everything after it
There are a few issues with your code. First of all I hope you are putting use strict and use warnings at the top of all your Perl programs. That simple measure will catch many trivial problems that you could otherwise overlook, and so it is especially important before you ask for help with your code
It isn't commonly known, but putting a newline "\n" at the end of your die string prevent Perl from giving file and line number details in the output of where the error occurred. While this may be what you want, it is usually more helpful to be given the extra information
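For example, a sketch of the two forms (the script name and line number are made up):

open my $fh, '<', 'no_such_file' or die "Could not open file: $!\n";
# Could not open file: No such file or directory

open my $fh, '<', 'no_such_file' or die "Could not open file: $!";
# Could not open file: No such file or directory at script.pl line 4.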
Your variable names are very unhelpful, and by convention Perl variables consist of lower-case alphanumerics and underscores. Names like @XYX and $W don't help me understand your code at all!
Rather than splitting to an array, it looks like you would be better off putting the two fields into two scalar variables to avoid all that indexing. And I am not sure what you intend by @XYX = (["$data[1]","$data[1]"],). First of all, do you really mean to use $data[1] twice? Secondly, you should never put scalar variables inside double quotes, as it does something very specific, and unless you know what that is you should avoid it. Finally, did you mean to push an anonymous array onto @XYX each time around the loop? Otherwise the contents of the array will be overwritten each time a line is read from the file, and the earlier data will be lost.
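A sketch of that last point, using the names from your code (assignment vs push):

# assignment replaces the contents of @XYX on every pass through the loop,
# so only the last line of the file survives:
@XYX = ( [ $data[0], $data[1] ] );

# push adds a new anonymous array each time, keeping every line:
push @XYX, [ $data[0], $data[1] ];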
This program uses a regular expression to extract $level_num from the first field. All it does is find the first sequence of digits in the string, which can then be compared to the minimum required level $min_level to decide whether a line from the log is relevant.
use strict;
use warnings;
my $file = 'xyz.csv';
my $min_level = 3;
my @list;

open my $fh, '<', $file or die "Could not open '$file' $!";

while (my $line = <$fh>) {
    chomp $line;
    my ($level, $error) = split ',', $line, 2;
    my ($level_num) = $level =~ /(\d+)/;
    next unless $level_num >= $min_level;
    push @list, [ $level, $error ];
}
For deciding which records to process you can use the "flip-flop" operator (..) along these lines.
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
my $level = shift || 'level1';
while (<DATA>) {
    if (/^\Q$level,/ .. 0) {
        print;
    }
}
__DATA__
level1,er
level2,er2
level3,er3
level4,er4
The flip-flop operator returns false until its first operand is true. At that point it returns true until its second operand is true, at which point it returns false again.
I'm assuming that your file is ordered so that once you start to process it, you never want to stop. That means that the first operand to the flip-flop can be /^\Q$level,/ (match the string $level at the start of the line) and the second operand can just be zero (as we never want it to stop processing).
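A minimal sketch of that behaviour on its own, with made-up data:

for (qw(level1 level2 level3 level4)) {
    print "$_\n" if /^level2$/ .. 0;   # false until level2 matches, then true for the rest
}
# prints: level2, level3, level4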
I'd also strongly recommend not parsing CSV records using split /,/. That may work on your current data but, in general, the fields in a CSV file are allowed to contain embedded commas which will break this approach. Instead, have a look at Text::CSV or Text::ParseWords (which is included with the standard Perl distribution).
Update: I seem to have got a couple of downvotes on this. It would be great if people would take the time to explain why.
#!/usr/bin/perl

use strict;
use warnings;

use Text::CSV;

my @XYZ;
my $file = 'xyz.csv';

open my $fh, '<', $file or die "$file: $!\n";

my $level  = shift;                 # get level from command line
my $getall = not defined $level;    # true if level not given on command line

my $parser = Text::CSV->new({ binary => 1 });   # object for parsing lines of CSV

while (my $row = $parser->getline($fh)) {   # $row is an array reference containing the cells from a line of CSV
    if ($getall                      # if level was not given on the command line, put all rows into @XYZ
        or                           # if level *was* given on the command line, then...
        $row->[0] eq $level .. 0     # ...wait until the first cell in a row equals $level, then put that row and all subsequent rows into @XYZ
    ) {
        push @XYZ, $row;
    }
}
close $fh;
#!/usr/bin/perl
use strict;
use warnings;

my $file = 'xyz.csv';    # assuming the file name from the question; $file was defined elsewhere in my full script

open(my $data, '<', $file) or die "Could not open '$file' $!\n";

my $level = shift || "level1";

while (my $line = <$data>) {
    chomp $line;
    my @fields = split "," , $line;
    if ($fields[0] eq $level .. 0) {
        print "\n$fields[0]\n";
        print "$fields[1]\n";
    }
}
This worked....thanks ALL for your help...
I have some code that looks like
my ($ids, $nIds);
while (<myFile>) {
    chomp;
    $ids .= $_ . " ";
    $nIds++;
}
This should concatenate every line in myFile, and $nIds should be the number of lines. How do I print out my $ids and $nIds?
I tried simply print $ids, but Perl complains.
my ($ids, $nIds) is a list, right? With two elements?
print "Number of lines: $nids\n";
print "Content: $ids\n";
How did Perl complain? print $ids should work, though you probably want a newline at the end, either explicitly with print as above or implicitly by using say or -l/$\.
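For example, a sketch of the implicit forms (say needs Perl 5.10 or later):

use feature 'say';

say $ids;             # say appends the newline for you

{
    local $\ = "\n";  # $\ is the output record separator, appended by every print
    print $ids;
}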
If you want to interpolate a variable in a string and have something immediately after it that would looks like part of the variable but isn't, enclose the variable name in {}:
print "foo${ids}bar";
You should always include all relevant code when asking a question. In this case, the print statement that is the center of your question. The print statement is probably the most crucial piece of information. The second most crucial piece of information is the error, which you also did not include. Next time, include both of those.
print $ids should be a fairly hard statement to mess up, but it is possible. Possible reasons:
1. $ids is undefined. Gives the warning "Use of uninitialized value in print".
2. $ids is out of scope. With use strict, gives the fatal error "Global symbol "$ids" requires explicit package name", and otherwise the uninitialized warning from above.
3. You forgot a semicolon at the end of the line.
4. You tried to do print $ids $nIds, in which case Perl thinks that $ids is supposed to be a filehandle, and you get an error such as "print to unopened filehandle".
Explanations
1: Should not happen. It might happen if you do something like this (assuming you are not using strict):
my $var;
while (<>) {
    $Var .= $_;
}
print $var;
Gives the warning for undefined value, because $Var and $var are two different variables.
2: Might happen, if you do something like this:
if ($something) {
    my $var = "something happened!";
}
print $var;
my declares the variable inside the current block. Outside the block, it is out of scope.
3: Simple enough, common mistake, easily fixed. Easier to spot with use warnings.
4: Also a common mistake. There are a number of ways to correctly print two variables in the same print statement:
print "$var1 $var2"; # concatenation inside a double quoted string
print $var1 . $var2; # concatenation
print $var1, $var2; # supplying print with a list of args
Lastly, some perl magic tips for you:
use strict;
use warnings;

# open with explicit direction '<', check the return value
# to make sure open succeeded. Using a lexical filehandle.
open my $fh, '<', 'file.txt' or die $!;

# read the whole file into an array and
# chomp all the lines at once
chomp(my @file = <$fh>);
close $fh;

my $ids  = join(' ', @file);
my $nIds = scalar @file;

print "Number of lines: $nIds\n";
print "Text:\n$ids\n";
Reading the whole file into an array is suitable for small files only, otherwise it uses a lot of memory. Usually, line-by-line is preferred.
Variations:
print "#file" is equivalent to
$ids = join(' ',#file); print $ids;
$#file will return the last index
in #file. Since arrays usually start at 0,
$#file + 1 is equivalent to scalar #file.
You can also do:
my $ids;
do {
    local $/;
    $ids = <$fh>;
};
By temporarily "turning off" $/, the input record separator, i.e. newline, you will make <$fh> return the entire file. What <$fh> really does is read until it finds $/, then return that string. Note that this will preserve the newlines in $ids.
Line-by-line solution:
open my $fh, '<', 'file.txt' or die $!;   # btw, $! contains the most recent error

my $ids;
while (<$fh>) {
    chomp;
    $ids .= "$_ ";   # concatenate with string
}
my $nIds = $.;   # $. is the current line number for the last filehandle accessed
How do I print out my $ids and $nIds?
print "$ids\n";
print "$nIds\n";
I tried simply print $ids, but Perl complains.
Complains about what? Uninitialised value? Perhaps your loop was never entered due to an error opening the file. Be sure to check if open returned an error, and make sure you are using use strict; use warnings;.
my ($ids, $nIds) is a list, right? With two elements?
It's a (very special) function call. $ids,$nIds is a list with two elements.
I have a hash in which I store the products a customer buys (%orders). It uses the product code as key and has a reference to an array with the other info as value.
At the end of the program, I have to rewrite the inventory to the updated version (i.e. subtract the quantity of the bought items)
This is how I do rewrite the inventory:
sub rewriteInventory {
    open(FILE, '>inv.txt');
    foreach $key (%inventory) {
        print FILE "$key\|$inventory{$key}[0]\|$inventory{$key}[1]\|$inventory{$key}[2]\n";
    }
    close(FILE);
}
where $inventory{$key}[x] is 0 → Title, 1 → price, 2 → quantity.
The problem here is that when I look at inv.txt afterwards, I see things like this:
CD-911|Lady Gaga - The Fame|15.99|21
ARRAY(0x145030c)|||
BOOK-1453|The Da Vinci Code - Dan Brown|14.75|12
ARRAY(0x145bee4)|||
Where do these ARRAY(0x145030c)||| entries come from? Or more important, how do I get rid of them?
You want to iterate over keys %inventory, and not over %inventory, which, as you see, causes you to iterate over the key, value pairs.
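To see why, here is a small sketch of what a hash flattens to in list context:

my %inventory = ( 'CD-911' => [ 'Lady Gaga - The Fame', 15.99, 21 ] );

my @flattened = %inventory;        # ( 'CD-911', ARRAY(0x...) ) -- each key followed by its value
my @keys      = keys %inventory;   # ( 'CD-911' )               -- keys only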
You're using your hash in list context, so you're getting all your values thrown in with your keys. I think what you actually want to do is:
foreach $key (keys %inventory) {
    print FILE "...";
}
EDIT: I was wrong about having to use an explicit dereferencing arrow; this is inferred between brackets when necessary, even if the first brackets do NOT require a dereference. That said, I will leave the remainder of the answer as posted since it was accepted, but merely note that if you choose not to use join, you needn't actually use $inventory{$key}->[0] but can in fact use $inventory{$key}[0] as originally posted.
Just be aware that the first (hash) brackets do not imply a dereference but the second (array) brackets do. Your errant array refs in the output were coming from looping over not only keys but also values of the hash.
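In other words, both of these refer to the same element (a minimal sketch):

my $title = $inventory{$key}->[0];   # explicit arrow
my $same  = $inventory{$key}[0];     # arrow omitted between subscripts -- equivalent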
ORIGINAL ANSWER:
In addition to using keys, you also need to dereference the references-to-array (this is why you're seeing each value output as ARRAY with an address---you're printing the references, not the values of the dereferenced array) when you print, so your loop becomes something like:
foreach my $key (sort keys %inventory) {
    print FILE "$key\|$inventory{$key}->[0]\|$inventory{$key}->[1]\|$inventory{$key}->[2]\n";
}
I'd probably rewrite it a little more idiomatically as:
foreach my $key (sort keys %inventory) {
    print FILE join('|', $key, @{ $inventory{$key} }) . "\n";
}
Hope that helps!
Here is one way to write that routine:
#!/usr/bin/perl

use strict; use warnings;

my %inventory;

while ( my $line = <DATA> ) {
    chomp $line;
    my ($key, @values) = split qr{\|}, $line;
    last unless @values;
    $inventory{$key} = \@values;
}

write_inventory(\%inventory, 'test.txt');

sub write_inventory {
    my ($inventory, $output_file) = @_;

    open my $output, '>', $output_file
        or die "Cannot open '$output_file': $!";

    for my $item ( keys %$inventory ) {
        unless ( 'ARRAY' eq ref $inventory->{$item} ) {
            warn "Invalid item '$item' in inventory\n";
            next;
        }
        print $output join('|', $item, @{ $inventory->{$item} }), "\n";
    }

    close $output
        or die "Cannot close '$output': $!";
}
__DATA__
CD-911|Lady Gaga - The Fame|15.99|21
BOOK-1453|The Da Vinci Code - Dan Brown|14.75|12
The rules are:
Don't use global variables: Pass a reference to %inventory to write_inventory instead of having it operate on the global %inventory.
Don't use global variables: Instead of using the bareword file handle FILE which has package scope, use a lexical file handle whose scope is limited to write_inventory.
Check for errors on file operations: Make sure open succeeded before plowing ahead and trying to write. Make sure close succeeded before assuming all data you printed actually got saved.
You MUST use strict and warnings because, at this point in your learning process, you do not know enough to know what you do not know.