Perl open file from command line with wildcard - perl

I am executing my script this way:
./script.pl -f files*
I looked at some other threads (like How can I open a file in Perl using a wildcard in the directory name?)
If i hard code the file name like it is written in this thread I get my desired result. If I take it from the command line it does not.
My options subroutine should save all the files I get this way in an array.
my #file;
sub Options{
my $i=0;
foreach my $opt (#ARGV){
switch ($opt){
case "-f" {
$i++;
### This part does not work:
#file= glob $ARGV[$i];
print Dumper("$ARGV[$i]"); #$VAR1 = 'files';
print Dumper(#file); #$VAR1 = 'files';
}
}
$i++;
}
}
It seems the execution is interpreted in advance and the wildcard (*) is dropped in the process.
Desired result: All files beginning with files are saved in an array, after execution from the command line.
I hope you get my problem. If not feel free to ask.
Thank you.

Well, first I'd suggest using a module to do args on command line:
Getopt::Long for example.
But otherwise your problem is simpler - your shell is expanding the 'file*' before perl gets it. (shell glob is getting there first).
If you do this with:
-f 'file*'
then it'll work properly. You should be able to see this - for example - if you just:
use Data::Dumper;
print Dumper \#ARGV;
I expect you'll see a much longer list than you thought.
However, I'd also point out - perl has a really nice feature you may be able to use (depending what you're doing with your files).
You can use <>, which automatically opens and reads all files specified on command line (in order).

Since your shell is already expanding the glob files* into a list of filenames, that's what the Perl program gets.
$ perl -E 'say #ARGV' files*
files1files2files3
There's no need to do that in Perl, if your shell can do it for you. If all you want is the filenames in an array, you already have #ARGV which contains those.

Related

Error while running sed command in perl cript

I am trying to run the following command in perl script :
#!/usr/bin/perl
my $cmd3 =`sed ':cycle s/^\(\([^,]*,\)\{0,13\}[^,|]*\)|[^,]*/\1/;t cycle' file1 >file2`;
system($cmd3);
but is not producing any output nor any error.
Although when I am running the command from command line it is working perfectly and gives desired output. Can you guys please help what I am doing wrong here ?
Thanks
system() doesn't return the output, just the exit status.
To see the output, print $cmd3.
my $cmd3 = `sed ':cycle s/^\(\([^,]*,\)\{0,13\}[^,|]*\)|[^,]*/\1/;t cycle' file1 >file2`;
print "$cmd3\n";
Edit:
If you want to check for exceptional return values, use CPAN module IPC::System::Simple:
use IPC::System::Simple qw(capture);
my $result = capture("any-command");
Running sed from inside Perl is just insane.
#!/usr/bin/perl
open (F, '<', "file1") or die "$O: Could not open file1: $!\n";
while (<F>) {
1 while s/^(([^,]*,){0,13}[^,|]*)\|[^,]*/$1/;
print;
}
Notice how Perl differs from your sed regex dialect in that grouping parentheses and alternation are unescaped, whereas a literal round parenthesis or pipe symbol needs to be backslash-escaped (or otherwise made into a literal, such as by putting it in a character class). Also, the right-hand side of the substitution prefers $1 (you will get a warning if you use warnings and have \1 in the substitution; technically, at this level, they are equivalent).
man perlrun has a snippet explaining how to implement the -i option inside a script if you really need that, but it's rather cumbersome. (Search for the first occurrence of "LINE:" which is part of the code you want.)
However, if you want to modify file1 in-place, and you pass it to your Perl script as its sole command-line argument, you can simply say $^I = 1; (or with use English; you can say $INPLACE_EDIT = 1;). See man perlvar.
By the way, the comment that your code "isn't producing any output" isn't entirely correct. It does what you are asking it to; but you are apparently asking for the wrong things.
Quoting a command in backticks executes that command. So
my $cmd3 = `sed ... file1 >file2`;
runs the sed command in a subshell, there and then, with input from file1, and redirected into file2. Because of the redirection, the output from this pipeline is nothing, i.e. an empty string "", which is assigned to $cmd3, which you then completely superfluously attempt to pass to system.
Maybe you wanted to put the sed command in regular quotes instead of backticks (so that the sed command line would be the value of $cmd3, which it then makes sense to pass to system). But because of the redirection, it would still not produce any visible output; it would create file2 containing the (possibly partially substituted) text from file1.

Perl Porter Stemmer

I was checking this porter stemmer. Below they said I should change my first line. To what exactly I tried every thing but the stemmer ain't working. What a good example might be?
#!/usr/local/bin/perl -w
#
# Perl implementation of the porter stemming algorithm
# described in the paper: "An algorithm for suffix stripping, M F Porter"
# http://www.muscat.com/~martin/stem.html
#
# Daniel van Balen (vdaniel#ldc.usb.ve)
#
# October-1999
#
# To Use:
#
# Put the line "use porter;" in your code. This will import the subroutine
# porter into your current name space (by default this is Main:: ). Make
# sure this file, "porter.pm" is in your #INC path (it includes the current
# directory).
# Afterwards use by calling "porter(<word>)" where <word> is the word to strip.
# The stripped word will be the returned value.
#
# REMEMBER TO CHANGE THE FIRST LINE TO POINT TO THE PATH TO YOUR PERL
# BINARY
#
As A code I am writing what follows:
use Lingua::StopWords qw(getStopWords);
use Main::porter;
my $stopwords = getStopWords('en');
#stopwords = grep { $stopwords->{$_} } (keys %$stopwords);
chdir("c:/perl/input");
#files = <*>;
foreach $file (#files)
{
open (input, $file);
while (<input>)
{
open (output,">>c:/perl/normalized/".$file);
chomp;
porter<$_>;
for my $stop (#stopwords)
{
s/\b\Q$stop\E\b//ig;
}
$_ =~s/<[^>]*>//g;
$_ =~ s/[[:punct:]]//g;
print output "$_\n";
}
}
close (input);
close (output);
The code gives no errors except it is not stemming anything!!!
That comment block is full of incorrect advice.
A #! line in a .pm file has no effect. It's a common mistake. The #! line tells Unix which interpreter to run the program with if and only if you run the file as a command line program.
./somefile # uses #! to determine what to run somefile with
/usr/bin/perl somefile # runs somefile with /usr/bin/perl regardless of #!
The #! line does nothing in a module, a .pm file which you use. Perl is already running at that point. The line is nothing but a comment.
The second problem is that your default namespace is main not Main. Casing matters.
Moving on to your code, use Main::porter; should not work. It should be use porter. You should get an error message like Can't locate Main/porter.pm in #INC (#INC contains: ...). If that code runs, perhaps you moved porter.pm into a Main/ directory? Move it out, it will confuse the importing of the porter function.
porter<$_>; says "try to read a line from the filehandle $_ and pass that into porter". $_ isn't a filehandle, it's a line from the file you just opened. You want porter($_) to pass the line into the porter function. If you turn on warnings (add use warnings to the top of your script) Perl will warn you about mistakes like that.
You'll also presumably want to do something with the return value from porter, otherwise it will truly do nothing. my #whatever_porter_returns = porter($_).
Likely one or more of your chdir or opens have silently failed so your program may have no input. Unfortunately, Perl does not let you know when this happens, you have to check. Normally you add an or die $! after the function to check for the error. This is busy work and often one forgets, instead you can use autodie which will automatically produce an error if any system calls like chdir or open fail.
With that stuff fixed your code should work, or at least produce useful error messages.
Finally, there are many stemming modules on CPAN which are likely to be higher quality than the one you've found with documentation and tests and updates and all that. Lingua::Stem and Text::English specifically use the porter algorithm. You might want to give those a shot.

What's the use of <> in Perl?

What's the use of <> in Perl. How to use it ?
If we simply write
<>;
and
while(<>)
what is that the program doing in both cases?
The answers above are all correct, but it might come across more plainly if you understand general UNIX command line usage. It is very common to want a command to work on multiple files. E.g.
ls -l *.c
The command line shell (bash et al) turns this into:
ls -l a.c b.c c.c ...
in other words, ls never see '*.c' unless the pattern doesn't match. Try this at a command prompt (not perl):
echo *
you'll notice that you do not get an *.
So, if the shell is handing you a bunch of file names, and you'd like to go through each one's data in turn, perl's <> operator gives you a nice way of doing that...it puts the next line of the next file (or stdin if no files are named) into $_ (the default scalar).
Here is a poor man's grep:
while(<>) {
print if m/pattern/;
}
Running this script:
./t.pl *
would print out all of the lines of all of the files that match the given pattern.
cat /etc/passwd | ./t.pl
would use cat to generate some lines of text that would then be checked for the pattern by the loop in perl.
So you see, while(<>) gets you a very standard UNIX command line behavior...process all of the files I give you, or process the thing I piped to you.
<>;
is a short way of writing
readline();
or if you add in the default argument,
readline(*ARGV);
readline is an operator that reads a line from the specified file handle. Reading from the special file handle ARGV will read from STDIN if #ARGV is empty or from the concatenation of the files named by #ARGV if it's not.
As for
while (<>)
It's a syntax error. If you had
while (<>) { ... }
it get rewritten to
while (defined($_ = <>)) { ... }
And as previously explained, that means the same as
while (defined($_ = readline(*ARGV))) { ... }
That means it will read lines from (previously explained) ARGV until there are no more lines to read.
It is called the diamond operator and feeds data from either stdin if ARGV is empty or each line from the files named in ARGV. This webpage http://docstore.mik.ua/orelly/perl/learn/ch06_02.htm explains it very well.
In many cases of programming with syntactical sugar like this, Deparse of O is helpful to find out what's happening:
$ perl -MO=Deparse -e 'while(<>){print 42}'
while (defined($_ = <ARGV>)) {
print 42;
}
-e syntax OK
Quoting perldoc perlop:
The null filehandle <> is special: it can be used to emulate the
behavior of sed and awk, and any other Unix filter program that takes
a list of filenames, doing the same to each line of input from all of
them. Input from <> comes either from standard input, or from each
file listed on the command line.
it takes the STDIN standard input:
> cat temp.pl
#!/usr/bin/perl
use strict;
use warnings;
my $count=<>;
print "$count"."\n";
>
below is the execution:
> temp.pl
3
3
>
so as soon as you execute the script it will wait till the user gives some input.
after 3 is given as input,it stores that value in $count and it prints the value in the next statement.

Perl system + split + array

my name is luis, live in arg.
i have a problem, which can not solve.
**IN BASH**
pwd
/home/labs-perl
ls
file1.pl file2.pl
**IN PERL**
my $ls = exec("ls");
my #lsarray = split(/\s+/, $ls);
print "$lsarray[1]\n"; #how this, i need the solution. >> file.pl
file1.pl file2.pl # but this is see in shell.
The output you see is not from the print statement, it is the console output of ls. To get the ls output into a variable, use backticks:
my $ls = `ls`;
my #lsarray = split(/\s+/, $ls);
print "$lsarray[1]\n";
This is because exec does not return, the statements after it are not executed. From perldoc:
The exec function executes a system command and never returns; use
system instead of exec if you want it to return. It fails and returns
false only if the command does not exist and it is executed directly
instead of via your system's command shell
But using system command will not help you as it does not allow output capturing, hence, the backticks. However, using glob functions is better:
my #arr = glob("*");
print $arr[1], "\n";
Also, perl array indices start at 0, not 1. To get file1.pl you should use print "$lsarray[0]\n".
It is bad practice to use the shell when you can write something within Perl.
This program displays what I think you want.
chdir '/home/labs-perl' or die $!;
my #dir = glob '*';
print "#dir\n";
Edit
I have just understood better what you need from perreal's post.
To display the first file in the current working directory, just write
print((glob '*')[0], "\n");
print <*> // die("No file found\n"), "\n";
(Though using an iterator in scalar context should usually be avoided if the script will be doing anything further.)

Can I obtain values from a perl script using a system call from the middle of another perl script?

I'm trying to modify a script that someone else has written and I wanted to keep my script separate from his.
The script I wrote ends with a print line that outputs all relevant data separated by spaces.
Ex: print "$sap $stuff $more_stuff";
I want to use this data in the middle of another perl script and I'm not sure if it's possible using a system call to the script I wrote.
Ex: system("./sap_calc.pl $id"); #obtain printed data from sap_calc.pl here
Can this be done? If not, how should I go about this?
Somewhat related, but not using system():
How can I get one Perl script to see variables in another Perl script?
How can I pass arguments from one Perl script to another?
You're looking for the "backtick operator."
Have a look at perlop, Section "Quote-like operators".
Generally, capturing a program's output goes like this:
my $output = `/bin/cmd ...`;
Mind that the backtick operator captures STDOUT only. So in order to capture everything (STDERR, too) the commands needs to be appended with the usual shell redirection "2>&1".
If you want to use the data printed to stdout from the other script, you'd need to use backticks or qx().
system will only return the return value of the shell command, not the actual output.
Although the proper way to do this would be to import the actual code into your other script, by building a module, or simply by using do.
As a general rule, it is better to use all perl solutions, than relying on system/shell as a way of "simplifying".
myfile.pl:
sub foo {
print "Foo";
}
1;
main.pl:
do 'myfile.pl';
foo();
perldoc perlipc
Backquotes, like in shell, will yield the standard output of the command as a string (or array, depending on context). They can more clearly be written as the quote-like qx operator.
#lines = `./sap_calc.pl $id`;
#lines = qx(./sap_calc.pl $id);
$all = `./sap_calc.pl $id`;
$all = qx(./sap_calc.pl $id);
open can also be used for streaming instead of reading into memory all at once (as qx does). This can also bypass the shell, which avoids all sorts of quoting issues.
open my $fh, '-|', './sap_calc.pl', $id;
while (readline $fh) {
print "read line: $_";
}