I have a file containing 1000 lines in the following format:
abc def ghi gkl
How can I write a Perl script to print only the first and the third fields?
abc ghi
perl -lane 'print "@F[0,2]"' file
Here -n loops over the input line by line, -a autosplits each line on whitespace into @F, -l strips the input newline and adds one back on print, and @F[0,2] is an array slice holding the first and third fields.
If no answer is good for you yet, I'll try to get the bounty ;-)
#!/usr/bin/perl
# Lines beginning with a hash (#) denote optional comments,
# except the first line, which is required,
# see http://en.wikipedia.org/wiki/Shebang_(Unix)
use strict; # http://perldoc.perl.org/strict.html
use warnings; # http://perldoc.perl.org/warnings.html
# http://perldoc.perl.org/perlsyn.html#Compound-Statements
# http://perldoc.perl.org/functions/defined.html
# http://perldoc.perl.org/functions/my.html
# http://perldoc.perl.org/perldata.html
# http://perldoc.perl.org/perlop.html#I%2fO-Operators
while (defined(my $line = <>)) {
    # http://perldoc.perl.org/functions/split.html
    my @chunks = split ' ', $line;
    # http://perldoc.perl.org/functions/print.html
    # http://perldoc.perl.org/perlop.html#Quote-Like-Operators
    print "$chunks[0] $chunks[2]\n";
}
To run this script, given that its name is script.pl, invoke it as
perl script.pl FILE
where FILE is the file that you want to parse. See also http://perldoc.perl.org/perlrun.html. Good luck! ;-)
That's really kind of a waste for something as powerful as Perl, since you can do the same thing in one trivial line of awk.
awk '{ print $1, $3 }'
while (<>) {
    my @fields = split;
    print "@fields[0,2]\n";
}
and just for variety, on Windows:
C:\Temp> perl -pale "$_=qq{@F[0,2]}"
and on Unix
$ perl -pale '$_="@F[0,2]"'
As perl one-liner:
perl -ane 'print "@F[0,2]\n"' file
Or as executable script:
#!/usr/bin/perl
use strict;
use warnings;
open my $fh, '<', 'file' or die "Can't open file: $!\n";
while (<$fh>) {
    my @fields = split;
    print "@fields[0,2]\n";
}
Execute the script like this:
perl script.pl
or
chmod 755 script.pl
./script.pl
I'm sure I shouldn't get the bounty since the question asks for the result to be given in perl, but anyway:
In bash/ksh/ash/etc:
cut -d " " -f 1,3 "file"
In Windows/DOS:
for /f "tokens=1-4 delims= " %i in (file) do (echo %i %k)
Advantages: as others have said, there is no need to learn Perl, awk, or anything else; just knowing a few standard tools is enough. (In the batch version, %i holds the first token and %k the third, since each extra token is assigned to the next letter.) The result of both calls can be saved to disk using the ">" and ">>" operators.
while (<>) {
    chomp;
    my @s = split;
    print "$s[0] $s[2]\n";
}
Please start going through the documentation as well.
#!/usr/bin/env perl
open my $F, "<", "file" or die;
print join(" ", (split)[0,2]) . "\n" while (<$F>);
close $F;
One easy way is:
(split)[0,2]
Example:
$_ = 'abc def ghi gkl';
print( (split)[0,2] , "\n");
print( join(" ", (split)[0,2] ),"\n");
Command line:
perl -e '$_="abc def ghi gkl";print(join(" ",(split)[0,2]),"\n")'
In AWK, it is common to see this kind of structure for a script that runs on two files:
awk 'NR==FNR { print "first file"; next } { print "second file" }' file1 file2
Which uses the fact that there are two variables defined: FNR, which is the line number in the current file and NR which is the global count (equivalent to Perl's $.).
Is there something similar to this in Perl? I suppose that I could maybe use eof and a counter variable:
perl -nE 'if (! $fn) { say "first file" } else { say "second file" } ++$fn if eof' file1 file2
This works but it feels like I might be missing something.
To provide some context, I wrote this answer in which I manually define a hash but instead, I would like to populate the hash from the values in the first file, then do the substitutions on the second file. I suspect that there is a neat, idiomatic way of doing this in Perl.
Unfortunately, perl doesn't have a similar NR==FNR construct to differentiate between two files. What you can do is use the BEGIN block to process one file and main body to process the other.
For example, to process a file with the following:
map.txt
a=apple
b=ball
c=cat
d=dog
alpha.txt
f
a
b
d
You can do:
perl -lne'
BEGIN {
    $x = pop;
    %h = map { chomp; ($k, $v) = split /=/; $k => $v } <>;
    @ARGV = $x;
}
print join ":", $_, $h{$_} //= "Not Found"
' map.txt alpha.txt
f:Not Found
a:apple
b:ball
d:dog
Update:
I gave a pretty simple example, and now when I look at that, I can only say TIMTOWTDI, since you can do:
perl -F'=' -lane'
if (@F == 2) { $h{$F[0]} = $F[1]; next }
print join ":", $_, $h{$_} //= "Not Found"
' map.txt alpha.txt
f:Not Found
a:apple
b:ball
d:dog
However, I can say for sure that there is no NR==FNR construct in Perl; you can process the files in various ways depending on what they contain.
It looks like what you're aiming for is to use the same loop for reading both files, with a conditional inside the loop that chooses what to do with the data. I would avoid that idea because you are hiding two distinct processes in the same stretch of code, making it less than clear what is going on.
But, in the case of just two files, you could compare the current file with the first element of @ARGV, like this
perl -nE 'if ($ARGV eq $ARGV[0]) { say "first file" } else { say "second file" }' file1 file2
Forgetting about one-line programs, which I hate with a passion, I would just explicitly open $ARGV[0] and $ARGV[1]. Perhaps naming them like this
use strict;
use warnings;
use 5.010;
use autodie;
my ($definitions, $data) = @ARGV;
open my $fh, '<', $definitions;
while (<$fh>) {
    # Build hash
}
open $fh, '<', $data;
while (<$fh>) {
    # Process file
}
But if you want to avail yourself of the automatic opening facilities then you can mess with @ARGV like this
use strict;
use warnings;
my ($definitions, $data) = @ARGV;
@ARGV = ($definitions);
while (<>) {
    # Build hash
}
@ARGV = ($data);
while (<>) {
    # Process file
}
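For completeness, a minimal filled-in version of that skeleton might look like this, assuming the map.txt/alpha.txt format from the earlier answer (key=value definitions in the first file, bare keys in the second):
use strict;
use warnings;
use 5.010;

my ($definitions, $data) = @ARGV;
my %h;

@ARGV = ($definitions);
while (<>) {
    # Build hash: each line of the first file is key=value
    chomp;
    my ($k, $v) = split /=/;
    $h{$k} = $v;
}

@ARGV = ($data);
while (<>) {
    # Process file: look each key up in the hash
    chomp;
    say join ":", $_, $h{$_} // "Not Found";
}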
You can also create your own $fnr and compare to $..
Given:
var='first line
second line'
echo "$var" >f1
echo "$var" >f2
echo "$var" >f3
You can create a pseudo FNR by setting a variable in the BEGIN block and resetting at each eof:
perl -lnE 'BEGIN{$fnr=1;}
if ($fnr==$.) {
say "first file: $ARGV, $fnr, $. $_";
}
else {
say "$ARGV, $fnr, $. $_";
}
eof ? $fnr=1 : $fnr++;' f{1..3}
Prints:
first file: f1, 1, 1 first line
first file: f1, 2, 2 second line
f2, 1, 3 first line
f2, 2, 4 second line
f3, 1, 5 first line
f3, 2, 6 second line
Definitely not as elegant as awk but it works.
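One more idiom worth knowing, suggested in the eof entry in perlfunc: closing ARGV at the end of each file resets $., so $. itself then behaves like awk's FNR:
perl -nE 'print "$ARGV, $. $_"; close ARGV if eof' f{1..3}
This prints f1, 1 first line, then f1, 2 second line, then f2, 1 first line, and so on, with $. restarting for every file.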
Note that Ruby has support for FNR==NR type logic.
I have two queries about the Tie::File module.
I used the Tie::File module to search a 55 MB file, with the memory limit set to 20 MB in Tie::File. When I grep the tied array for a search string, it takes a lot of time. Is there any workaround for it?
Can Tie::File be used for reading a binary file? The tied array is delimited by "\n". How do I use Tie::File to read a binary file? Could you please include some sample code.
/home/a814899> perl -e 'print "x\n"x27 for 1..1024*1024;' >a
/home/a814899> echo "hello world" >> a
Using Unix grep
/home/a814899> time grep "hello " a
hello world
real 0m8.280s
user 0m8.129s
sys 0m0.139s
Using the regex
/home/a814899> (time perl -e 'while (<>) { if (/hello/) { print "hello world"} }' a)
hello world
real 0m51.316s
user 0m51.087s
sys 0m0.189s
Using Perl Grep
#!/usr/bin/perl
print "executing\n";
my $outputFileDir="/home/a814899";
my $sFileName="a";
open my $fh, "<", $outputFileDir . "/" . $sFileName or do {
print "Could not open the file";
};
print "success in open" . "\n";
my @out = grep { /hello world/ } <$fh>;
print "@out";
close($fh);
Yes.
This is how you probably did it using Tie::File:
$ (
time perl -MTie::File -e'
tie @a, "Tie::File", $ARGV[0];
for (@a) { if (/y/) { } }
' a
) 2>&1 | grep real
real 2m44.333s
This is the "workaround":
$ (
time perl -e'
while (<>) { if (/y/) { } }
' a
) 2>&1 | grep real
real 0m0.644s
The data file was created using
$ perl -E'say "x"x54 for 1..1024*1024;' >a
Tie::File doesn't read files; Tie::File provides a means of mapping lines of a file to array elements. Since "binary" files have no lines, accessing one using Tie::File wouldn't make any sense.
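For illustration, here is a minimal sketch of Tie::File usage (the file name data.txt and the 20 MB figure are just placeholders taken from the question). Note that Tie::File does document a recsep option for files whose records are separated by something other than "\n":
use strict;
use warnings;
use Tie::File;

# Map the lines of a file to an array; only a bounded amount is cached.
tie my @lines, 'Tie::File', 'data.txt', memory => 20 * 1024 * 1024
    or die "Cannot tie data.txt: $!";

print "first line: $lines[0]\n";    # reading an element reads that line
$lines[0] = 'replaced';             # assigning an element rewrites that line in the file

# For records separated by something other than "\n", use recsep:
# tie my @recs, 'Tie::File', 'data.bin', recsep => "\x00";

untie @lines;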
I have an input file as follows. I need to break it into multiple files based on columns 2, 3 and 5. The file has more columns, but I have used the cut command to keep only the required columns.
12,Accounts,India,free,Internal
13,Finance,China,used,Internal
16,Finance,China,free,Internal
12,HR,India,free,External
19,HR,China,used,Internal
33,Finance,Japan,free,Internal
39,Accounts,US,used,External
14,Accounts,Japan,used,External
11,Finance,India,used,External
11,HR,US,used,External
10,HR,India,used,External
Output files:
Accounts_India_Internal --
12,Accounts,India,free,Internal
Finance_China_Internal --
13,Finance,China,used,Internal
16,Finance,China,free,Internal
HR_India_External --
12,HR,India,free,External
10,HR,India,used,External
HR_China_Internal --
19,HR,China,used,Internal
and so on..
Please let me know how to achieve this.
As of now, I am thinking of sorting the file based on these columns (2, 3, 5) and then looping over each record, creating files as I go: if a file does not exist, create it and add the record; otherwise open the existing file and append the record.
Is it possible to do this using shell scripting (bash)?
Is it possible to do this using shell scripting (bash)?
If you simply want to split the files based on fields 2, 3 and 5 you can do that quickly with awk:
awk -F, '{ print >> ($2"_"$3"_"$5) }' infile.txt
That appends each line to a file whose name is made up of fields 2, 3 and 5.
Example:
[me@home]$ awk -F, '{ print >> ($2"_"$3"_"$5) }' infile.txt
[me@home]$ cat Accounts_India_Internal
12,Accounts,India,free,Internal
[me@home]$ cat Finance_China_Internal
13,Finance,China,used,Internal
16,Finance,China,free,Internal
If you do want output sorted, you can first run the file through sort.
sort -t, -k2,3 -k5,5 infile.txt | awk -F, '{ print >> ($2"_"$3"_"$5) }'
That sorts the lines on fields 2, 3, and 5 before passing them on to the awk command.
Do note that we're appending to the files, so if you repeat the command without deleting the output files, you'll end up with duplicate data in them. To address this, as well as your additional requirement (using the first line as a header for all new files) mentioned in the chat, see this solution.
I suggest you keep a hash of file handles keyed by their corresponding file names.
This program demonstrates the technique. The input file is expected as a parameter on the command line.
use strict;
use warnings;
my %fh;
while (<>) {
    chomp;
    my $filename = join '_', (split /,/)[1,2,4];
    if (not $fh{$filename}) {
        open $fh{$filename}, '>', $filename or die "Unable to open '$filename' for output: $!";
        print "$filename created\n";
    }
    print { $fh{$filename} } $_, "\n";
}
output
Accounts_India_Internal created
Finance_China_Internal created
HR_India_External created
HR_China_Internal created
Finance_Japan_Internal created
Accounts_US_External created
Accounts_Japan_External created
Finance_India_External created
HR_US_External created
Note: To use the code, simply change <DATA> to <> and use the file name as argument. The Data::Dumper print is there only for demonstration purposes and can also be removed.
use strict;
use warnings;
use Data::Dumper;
my %h;
while (<DATA>) {
    chomp;
    my @data = split /,/;
    my $file = join "_", @data[1,2,4];
    push @{$h{$file}}, $_;
}
print Dumper \%h;
__DATA__
12,Accounts,India,free,Internal
13,Finance,China,used,Internal
16,Finance,China,free,Internal
12,HR,India,free,External
19,HR,China,used,Internal
33,Finance,Japan,free,Internal
39,Accounts,US,used,External
14,Accounts,Japan,used,External
11,Finance,India,used,External
11,HR,US,used,External
10,HR,India,used,External
To print the files, you could use a subroutine like so:
for my $key (keys %h) {
    print_file($key, $h{$key});
}
sub print_file {
    my ($file, $data) = @_;
    open my $fh, ">", $file or die $!;
    print $fh "$_\n" for @$data;
}
Save the input text as foo, then:
cat foo | perl -nle '$k = join "_", (split ",", $_)[1,2,4]; $t{$k} = [@{$t{$k}}, $_]; END{for (keys %t){print join "\n", "$_ --", @{$t{$_}}, undef }}' | csplit -sz - '/^$/' {*}
I recorded some data on my laptop and because the OS system language is German it converted the decimal separator to a comma (didn't think of that at the time...).
The column separator (there are three columns in the text file) is a comma too, so I end up with six columns instead of three.
Example.txt
4,0,5,0,6,0
should be
4.0, 5.0, 6.0
How can I loop through all files in a folder and replace every first, third and fifth comma with a point in all lines of my data files? I would prefer a bash script (.sh) or possibly a Perl solution.
Or how about awk
for F in *; do awk -F, 'BEGIN { OFS = "," } { print $1"."$2, $3"."$4, $5"."$6 }' "$F" | sponge "$F"; done
You need "moreutils" for sponge, by the way. And back up your files first!
Generally, for CSV parsing you should use Text::CSV; however, for this correction task, a quick-and-dirty approach could be:
#!/usr/bin/perl
use strict;
use warnings;
my $output;
#open my $out, '>', 'outfile.dat';
#open my $in, '<', 'infile.dat';
#while(<$in>){
while (<DATA>) {
    chomp;
    my @fields = split ',';
    while (@fields) {
        $output .= shift(@fields) . '.' . shift(@fields);
        $output .= ', ' if @fields;
    }
    $output .= "\n";
}
#print $out $output;
print $output;
__DATA__
4,0,5,0,6,0
4,0,5,0,6,0
Of course, you will presumably read from a file rather than DATA and print to a new file; I have added this real-world usage as comments.
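And since Text::CSV was mentioned as the generally preferred route, here is a rough sketch of what that might look like, assuming the module is installed (infile.dat is a placeholder name):
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new({ binary => 1 })
    or die "Cannot use Text::CSV: " . Text::CSV->error_diag;
open my $in, '<', 'infile.dat' or die "Can't open infile.dat: $!";
while (my $row = $csv->getline($in)) {
    my @fields = @$row;
    my @values;
    # pair up adjacent fields: (4,0,5,0,6,0) -> ("4.0", "5.0", "6.0")
    push @values, join('.', splice(@fields, 0, 2)) while @fields;
    print join(', ', @values), "\n";
}
close $in;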
Well I see lots of valid and good answers here, here's another.
perl -wpe 'my $i; s/,/($i^=1) ? "." : ","/ge'
Here /e means "execute the replacement part as code"; $i^=1 generates a 1, 0, 1, 0, ... sequence, and x ? y : z selects y or z based on x's value (i.e. if (x) {y} else {z}).
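To see it in action on one of the sample lines from the question:
my $i = 0;
(my $fixed = "4,0,5,0,6,0") =~ s/,/($i ^= 1) ? "." : ","/ge;
print "$fixed\n";    # prints 4.0,5.0,6.0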
The following Perl one-liner should help you.
perl -e '$a = $ARGV[0]; $a =~ s/(\d)\,(\d\,)?/$1\.$2/g; print $a' "4,0,5,0,6,0"
OUTPUT
4.0,5.0,6.0
In Perl, the necessary regex would be s/,([^,]*,?)/.$1/g. If you apply this to a string, it will replace the first comma with a period, preserve the next comma (if any), and then start looking for commas again after the second one.
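Applied to the sample line:
my $line = "4,0,5,0,6,0";
$line =~ s/,([^,]*,?)/.$1/g;
print "$line\n";    # prints 4.0,5.0,6.0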
So I'm trying to read in a config. file in Perl. The config file uses a trailing backslash to indicate a line continuation. For instance, the file might look like this:
=== somefile ===
foo=bar
x=this\
is\
a\
multiline statement.
I have code that reads in the file, and then processes the trailing backslash(es) to concatenate the lines. However, it looks like Perl already did it for me. For instance, the code:
open(fh, 'somefile');
@data = <fh>;
print join('', @data);
prints:
foo=bar
x=thisisamultiline statement
Lo and behold, the '@data = <fh>;' statement appears to have already handled the trailing backslash!
Is this defined behavior in Perl?
I have no idea what you are seeing, but that is not valid Perl code and that is not a behavior in Perl. Here is some Perl code that does what you want:
#!/usr/bin/perl
use strict;
use warnings;
while (my $line = <DATA>) {
    # collapse lines that end with \
    while ($line =~ s/\\\n//) {
        $line .= <DATA>;
    }
    print $line;
}
__DATA__
foo=bar
x=this\
is\
a\
multiline statement.
Note: If you are typing the file in on the command line like this:
perl -ple 1 <<!
foo\
bar
baz
!
Then you are seeing the effect of your shell, not Perl. Consider the following counterexample:
printf 'foo\\\nbar\nbaz\n' | perl -ple 1
My ConfigReader::Simple module supports continuation lines in config files, and should handle your config if it's the format in your question.
If you want to see how to do it yourself, check out the source for that module. It's not a lot of code.
I don't know what exactly you are doing, but the code you gave us doesn't even run:
=> cat z.pl
#!/usr/bin/perl
fh = open('somefile', 'r');
@data = <fh>;
print join('', @data);
=> perl z.pl
Can't modify constant item in scalar assignment at z.pl line 2, near ");"
Execution of z.pl aborted due to compilation errors.
And if I change the snippet to be actual perl:
=> cat z.pl
#!/usr/bin/perl
open my $fh, '<', 'somefile';
my @data = <$fh>;
print join('', @data);
it clearly doesn't mangle the data:
=> perl z.pl
foo=bar
x=this\
is\
a\
multiline statement.