Perl script, parse text file between words - perl

I have a text file that looks like this:
... //John/box/sandbox/users/abc/project/build/file2
... //John/box/sandbox/users/cde/project/build/file1
... //John/box/sandbox/users/hdf/project/config/file
Using a Perl script, how can I parse this file so that my final output is:
//John/box/sandbox/users/abc/project/
//John/box/sandbox/users/cde/project/
//John/box/sandbox/users/hdf/project/
Basically my ultimate goal is to search for "//" and "project" on the same line and then take everything between them.
Thanks for the fast responses. Neither of them seems to work for me.
I'm using perl 5.8.3 build 809
perl -nle 'print $1 if m#(//.*project/)#;' output.txt
use FileHandle;
use Env;
use Tk;
use File::Copy;
open(DAT, "output.txt") || die("Could not open file!");
my $input = <DAT>;
while (<$input>){
chomp;
print "$1\n" if ($_ =~ /(^\/\/.*project\/)/);
}
Thank you all for your help. It worked fine once I removed the ^.
In future questions I will include my own attempt; sorry, this is my first question. Humans make mistakes :)

my $infile = 'in.txt';
open my $input, '<', $infile or die "Can't open $infile: $!";
while (<$input>){
chomp;
print "$1\n" if ($_ =~ /(\/\/.*project\/)/);
}

This is simple enough to do as a command-line filter:
perl -nle'print $1 if m#(//.*project/)#;' output.txt
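For what it's worth, here is a minimal sketch of why dropping the ^ mattered, assuming the input lines really do begin with "... " as shown in the question. The helper name project_prefix is made up for illustration. Note that .* is greedy, so if "project/" occurs twice on a line, the match runs up to the last occurrence.

```perl
use strict;
use warnings;

# Hypothetical helper: return the "//...project/" part of a line,
# or undef if the line has no such part. The pattern is unanchored,
# so leading text such as "... " is simply skipped.
sub project_prefix {
    my ($line) = @_;
    return $line =~ m{(//.*project/)} ? $1 : undef;
}

# Sample lines copied from the question.
my @lines = (
    '... //John/box/sandbox/users/abc/project/build/file2',
    '... //John/box/sandbox/users/hdf/project/config/file',
);

for my $line (@lines) {
    my $prefix = project_prefix($line);
    print "$prefix\n" if defined $prefix;
}
```

With a leading ^ the pattern can only match when // is the very first thing on the line, which is not the case for these lines.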

Related

How to extract a specific string using perl?

I have a set of strings such as "-f /path/filename1.f", "-f $path/filename2.f", etc. in a single file, file.f. I want to read file.f and extract /path/filename1.f, $path/filename2.f, etc. into another file.
I tried finding a solution online, but what I found looks like a mess.
Is there a clean and simple solution for this kind of simple pattern search?
Below is the requirement.
Example,
file.f (input file to perl script)
-f /path/filename1.f
-f $path1/filename2.f
-f /path/filename3.f
-f $path2/filename4.f
outputfile.f
/path/filename1.f
$path1/filename2.f
/path/filename3.f
$path2/filename4.f
Basically I just want path string from the file.f
Some perl code to solve your problem:
use strict;
use warnings;
open my $fhi, "<", "file.f" or die "Error: $!";
open my $fho, ">", "output.f" or die "Error: $!";
while( <$fhi> ) { # Read each line in $_ variable
s/^-f //; # Remove "-f " at the beginning of $_
print $fho $_; # print $_ to output.f file
}
close $fhi;
close $fho;
The simplest way is using cut:
cut -f2 -d' ' input_file > output_file
Or you can use Perl:
perl -lane 'print $F[1]' input_file > output_file
These solutions extract the second field of the input and print it.
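As a rough sketch of what the -a switch does behind the scenes: it fills @F with the result of split ' ', which splits on runs of whitespace and ignores leading whitespace. The helper name second_field is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring "perl -lane 'print $F[1]'":
# split the line on runs of whitespace and return the second field.
sub second_field {
    my ($line) = @_;
    my @fields = split ' ', $line;   # same split that -a performs
    return $fields[1];
}

# Single quotes keep $path1 literal, as it is in file.f.
print second_field('-f /path/filename1.f'), "\n";
print second_field('-f $path1/filename2.f'), "\n";
```

One difference from cut -d' ': cut treats every single space as a delimiter, so two spaces after -f would produce an empty second field there, whereas this split would still return the path.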
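Have a look at the solution below.
Here everything after -f is captured and printed.
#!/usr/bin/perl
use strict;
use warnings;
open(my $fh, '<', 'file.f') or die "Cannot open file.f: $!";
while (<$fh>) {
    # Capture whatever follows "-f " at the start of the line
    print "$1\n" if /^-f\s(.*)/;
}
close($fh);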

I have a file that I want to split using pipe as delimiter. How can I read the file using Perl?

Here is a shell script reading the file.
#!/bin/sh
procDate=$1
echo "Date $procDate"
file=`cat filename_$procDate.txt`
echo "$file"
I want to convert it to Perl and use the split operator with pipe | as delimiter.
It's far from clear from your question what you want to do with these fields once you have split them.
Your own shell script uses cat to copy the entire contents of your file into $file, but that's unlikely to be what you need to do.
A very generalised Perl program would look like this:
use strict;
use warnings 'all';
my ($procDate) = @ARGV;
print "Date $procDate\n";
open my $fh, '<', "filename_$procDate.txt" or die $!;
while ( <$fh> ) {
    chomp;
    my @fields = split /\|/;
    # do something with @fields, for instance
    print "@fields\n";
}
That code splits each line on pipe | characters, puts the list of substrings in @fields and then prints it separated by spaces. But I can't guess what more you might want to do.
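One detail that may matter for pipe-delimited records: by default split drops trailing empty fields, and a negative limit keeps them. A small sketch, where the helper name pipe_fields and the sample record are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: split one record on literal pipe characters.
# With a true second argument, trailing empty fields are preserved
# by passing a negative limit to split.
sub pipe_fields {
    my ($line, $keep_trailing) = @_;
    chomp $line;
    return $keep_trailing
        ? split(/\|/, $line, -1)
        : split(/\|/, $line);
}

my $record = "0010|name|address||\n";   # made-up sample with two empty fields at the end
my @short  = pipe_fields($record);
my @full   = pipe_fields($record, 1);
print scalar(@short), " fields by default\n";      # 3
print scalar(@full),  " fields with limit -1\n";   # 5
```

The pipe has to be escaped (or put in a character class) because a bare | in a pattern is the alternation operator.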
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
my ($procDate) = @ARGV;    # date suffix, as in the shell script's $1
open(my $fh, '<', "filename_$procDate.txt") or die "Couldn't open file filename_$procDate.txt, $!";
while ( my $line = <$fh> ) {
    print "Line content is $line\n";
    my @line_content = split(/\|/, $line);
    print Dumper(\@line_content);
}
close($fh);

splitting a large file into small files based on column value in perl

I am trying to split a large file (around 17.6 million records) into 6-7 smaller files based on a column value. Currently, I use the SQL bcp utility to load all the data into one table and then create the separate files with bcp out.
Someone suggested I use Perl instead, as it would be faster and would not require creating a table. I am not a Perl person, so I am not sure how to do this in Perl.
Any help would be appreciated.
INPUT file :
inputfile.txt
0010|name|address|city|.........
0020|name|number|address|......
0030|phone no|state|street|...
output files:
0010.txt
0010|name|address|city|.........
0020.txt
0020|name|number|address|......
0030.txt
0030|phone no|state|street|...
It is simplest to keep a hash of output file handles, keyed by the file name. This program shows the idea. The number at the start of each record is used to create the name of the file where it belongs, and a file of that name is opened unless we already have a file handle for it.
All of the handles are closed once all of the data has been processed. Any errors are caught by use autodie, so explicit checking of the open, print and close calls is unnecessary.
use strict;
use warnings;
use autodie;
open my $in_fh, '<', 'inputfile.txt';
my %out_fh;
while (<$in_fh>) {
next unless /^(\d+)/;
my $filename = "$1.txt";
open $out_fh{$filename}, '>', $filename unless $out_fh{$filename};
print { $out_fh{$filename} } $_;
}
close $_ for values %out_fh;
Note that close caught me out here because, unlike most operators that work on $_ if you pass no parameters, a bare close will close the currently selected file handle. That is a bad choice IMO, but it's way too late to change it now.
17.6 million rows is going to be a pretty large file, I'd imagine. It'll still be slow with perl to process.
That said, you're going to want something like the below:
use strict;
use warnings;
my $input = 'FILENAMEHERE.txt';
my %results;
open(my $fh, '<', $input) or die "cannot open input file: $!";
while (<$fh>) {
    my ($key) = split /\|/, $_;    # split takes a pattern; a bare '|' would split between every character
    push @{ $results{$key} }, $_;  # autovivifies the array for a new key
}
close($fh);
for my $key (keys %results) {
    open(my $out, '>', "$key.txt") or die "Cannot open output file $key.txt: $!";
    print {$out} @{ $results{$key} };  # lines still carry their newlines
    close($out) or die "Cannot close $key.txt: $!";
}
I haven't explicitly tested this, but it should get you going in the right direction.
$ perl -F'\|' -lane '
$key = $F[0];
$fh{$key} or open $fh{$key}, ">", "$key.txt" or die $!;
print { $fh{$key} } $_
' inputfile.txt
perl -Mautodie -ne'
sub out { $h{$_[0]} ||= open(my $f, ">", "$_[0].txt") && $f }
print { out($1) } $_ if /^(\d+)/;
' file

foreach and special variable $_ not behaving as expected

I'm learning Perl and wrote a small script to open perl files and remove the comments
# Will remove this comment
my $name = ""; # Will not remove this comment
#!/usr/bin/perl -w <- wont remove this special comment
The names of the files to be edited are passed as arguments via the terminal
die "You need to a give atleast one file-name as an arguement\n" unless (@ARGV);
foreach (@ARGV) {
$^I = "";
(-w && open FILE, $_) || die "Oops: $!";
/^\s*#[^!]/ || print while(<>);
close FILE;
print "Done! Please see file: $_\n";
}
Now when I ran it via Terminal:
perl removeComments file1.pl file2.pl file3.pl
I got the output:
Done! Please see file:
This script is working EXACTLY as I'm expecting but
Issue 1: Why didn't $_ print the name of the file?
Issue 2: Since the loop runs 3 times, why was Done! Please see file: printed only once?
How would you write this script in as few lines as possible?
Please comment on my code as well, if you have time.
Thank you.
The while stores the lines read by the diamond operator <> into $_, so you're writing over the variable that stores the file name.
On the other hand, you open the file with open but don't actually use the handle to read; it uses the empty diamond operator instead. The empty diamond operator makes an implicit loop over files in @ARGV, removing file names as it goes, so the foreach runs only once.
To fix the second issue you could use while(<FILE>), or rewrite the loop to take advantage of the implicit loop in <> and write the entire program as:
$^I = "";
/^\s*#[^!]/ || print while(<>);
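To see the "removing file names as it goes" behaviour in isolation, here is a small sketch. It assumes nothing from the question except the behaviour of <>; the temporary files and the names make_temp and read_all_via_diamond are made up for the demonstration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: create a throwaway file holding $text, return its name.
sub make_temp {
    my ($text) = @_;
    my ($fh, $name) = tempfile(UNLINK => 1);
    print {$fh} $text;
    close $fh or die "close failed: $!";
    return $name;
}

# Read every line of the given files through <> and report how many
# names were in @ARGV before and after, plus the number of lines read.
sub read_all_via_diamond {
    local @ARGV = @_;              # pretend these were the command-line arguments
    my $before = @ARGV;
    my @lines;
    while (my $line = <>) {        # a named variable, so $_ is left alone
        push @lines, $line;
    }
    return ($before, scalar @ARGV, scalar @lines);
}

my @files = (make_temp("one\n"), make_temp("two\n"));
my ($before, $after, $count) = read_all_via_diamond(@files);
print "before=$before after=$after lines=$count\n";   # before=2 after=0 lines=2
```

By the time the while loop finishes, @ARGV is empty, which is why an enclosing foreach over @ARGV has nothing left to iterate over.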
Here's a more readable approach.
#!/usr/bin/perl
# always!!
use warnings;
use strict;
use autodie;
use File::Copy;
# die with some usage message
die "usage: $0 [ files ]\n" if @ARGV < 1;
for my $filename (@ARGV) {
# create tmp file name that we are going to write to
my $new_filename = "$filename\.new";
# open $filename for reading and $new_filename for writing
open my $fh, "<", $filename;
open my $new_fh, ">", $new_filename;
# Iterate over each line in the original file: $filename,
# if our regex matches, we bail out. Otherwise we print the line to
# our temporary file.
while(my $line = <$fh>) {
next if $line =~ /^\s*#[^!]/;
print $new_fh $line;
}
close $fh;
close $new_fh;
# use File::Copy's move function to rename our files.
move($filename, "$filename\.bak");
move($new_filename, $filename);
print "Done! Please see file: $filename\n";
}
Sample output:
$ ./test.pl a.pl b.pl
Done! Please see file: a.pl
Done! Please see file: b.pl
$ cat a.pl
#!/usr/bin/perl
print "I don't do much\n"; # comments dont' belong here anyways
exit;
print "errrrrr";
$ cat a.pl.bak
#!/usr/bin/perl
# this doesn't do much
print "I don't do much\n"; # comments dont' belong here anyways
exit;
print "errrrrr";
It's not safe to nest loops and expect $_ to keep the value you want. The while loop overwrites your $_. Give the file name its own variable inside the loop. You can do it like this:
foreach my $filename (@ARGV) {
    $^I = "";
    (-w $filename && open my $FILE, '<', $filename) || die "Oops: $!";
    /^\s*#[^!]/ || print while(<$FILE>);
    close $FILE;
    print "Done! Please see file: $filename\n";
}
or this way:
foreach (@ARGV) {
    my $filename = $_;
    $^I = "";
    (-w && open my $FILE, '<', $filename) || die "Oops: $!";
    /^\s*#[^!]/ || print while(<$FILE>);
    close $FILE;
    print "Done! Please see file: $filename\n";
}
Please never use barewords for filehandles, and do use the 3-argument open.
Good: open my $FILE, '<', $filename
Bad: open FILE, $filename
Simpler solution: Don't use $_.
When Perl was first written, it was conceived as a replacement for Awk and shell, and Perl heavily borrowed from that syntax. Perl also for readability created the special variable $_ which allowed you to use various commands without having to create variables:
while ( <INPUT> ) {
next if /foo/;
print OUTPUT;
}
The problem is that if everything uses $_, then everything affects $_, with many unpleasant side effects.
Now, Perl is a much more sophisticated language, and has things like locally scoped variables (hint: you don't use local to create these variables; that merely gives package variables (aka global variables) a local value).
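A minimal sketch of the difference between my and local, with made-up names ($level, show, with_local, with_my) that are not from the question:

```perl
use strict;
use warnings;

our $level = 'global';            # a package (global) variable

sub show { return $level }        # always reads the package variable

sub with_local {
    local $level = 'temporary';   # same global; old value restored on exit
    return show();
}

sub with_my {
    my $level = 'lexical';        # a brand new variable, invisible to show()
    return show();
}

print with_local(), "\n";   # temporary
print with_my(), "\n";      # global
print show(), "\n";         # global
```

So local changes what every other piece of code sees until the enclosing block exits, while my creates a variable that only the enclosing block can see.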
Since you're learning Perl, you might as well learn Perl correctly. The problem is that there are too many books that are still based on Perl 3.x. Find a book or web page that incorporates modern practice.
In your program, $_ switches from the file name to the line in the file and back to the next file. That's what's confusing you. If you used named variables, you could distinguish between files and lines.
I've rewritten your program using more modern syntax, but your same logic:
use strict;
use warnings;
use autodie;
use feature qw(say);
if ( not $ARGV[0] ) {
die "You need to give at least one file name as an argument\n";
}
for my $file ( @ARGV ) {
    # Require a suffix, then derive the output name from the input name
    if ( $file !~ /\..+?$/ ) {
        die qq(File "$file" doesn't have a suffix);
    }
    ( my $output_file = $file ) =~ s/\..+?$/./; # Replace the suffix with a bare "." for the output name
    open my $input_fh, "<", $file;
    open my $output_fh, ">", $output_file;
    while ( my $line = <$input_fh> ) {
        print {$output_fh} $line unless $line =~ /^\s*#[^!]/;
    }
    close $input_fh;
    close $output_fh;
}
This is a bit more typing than your version of the program, but it's easier to see what's going on and maintain.

problems with replacing first line of file using perl

I have a file that looks like this:
I,like
blah...
I want to replace only the first line with 'i,am' to get:
i,am
blah...
These are big files, so this is what I did (based on this):
open(FH, "+< input.txt") or die "FAIL!";
my $header = <FH>;
chop($header);
$header =~ s/I,like/i,am/g;
seek FH, 0, 0; # go back to start of file
printf FH $header;
close FH;
However, I get this when I run it:
i,amke
blah...
It looks like the 'ke' from like is still there. How do I get rid of it?
What I would do is probably something like this:
perl -i -pe 'if ($. == 1) { s/.*/i,am/; }' yourfile.txt
Which will only affect the first line, when the line counter for the current file handle $. is equal to 1. The regex will replace everything except newline. If you need it to match your specific line, you can include that in the if-statement:
perl -i -pe 'if ($. == 1 and /^I,like$/) { s/.*/i,am/; }' yourfile.txt
You can also look into Tie::File, which allows you to treat the file like an array, which means you can simply do $line[0] = "i,am\n". It is mentioned that there may be performance issues with this module, however.
If the replacement has a different length than the original, you cannot use this technique. You can for example create a new file and then rename it to the original name.
open my $IN, '<', 'input.txt' or die $!;
open my $OUT, '>', 'input.new' or die $!;
my $header = <$IN>;
$header =~ s/I,like/i,am/g;
print $OUT $header;
print $OUT $_ while <$IN>; # Just copy the rest.
close $IN;
close $OUT or die $!;
rename 'input.new', 'input.txt' or die $!;
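To illustrate why the in-place approach leaves 'ke' behind, here is a small sketch that reproduces it on a temporary file. The helper name overwrite_start is made up; the file contents are the ones from the question.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write $original to a temp file, overwrite its
# first bytes with $replacement via a read-write handle, and return
# the resulting file contents.
sub overwrite_start {
    my ($original, $replacement) = @_;
    my ($tmp, $name) = tempfile(UNLINK => 1);
    print {$tmp} $original;
    close $tmp or die "close failed: $!";

    open my $fh, '+<', $name or die "Can't open $name: $!";
    seek $fh, 0, 0;                 # back to the start of the file
    print {$fh} $replacement;       # overwrites bytes, never removes any
    close $fh or die "close failed: $!";

    open my $in, '<', $name or die "Can't open $name: $!";
    local $/;                       # slurp mode: read the whole file at once
    my $content = <$in>;
    close $in;
    return $content;
}

print overwrite_start("I,like\nblah...\n", "i,am");
# prints:
# i,amke
# blah...
```

Writing through a '+<' handle replaces bytes one for one, so a 4-byte replacement over a 6-byte header leaves the last 2 bytes of the old header in place. That is why rewriting to a new file and renaming it is the safer route when lengths differ.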
I'd just use Tie::File:
#! /usr/bin/env perl
use common::sense;
use Tie::File;
sub firstline {
tie my @f, 'Tie::File', shift or die $!;
$f[0] = shift;
untie @f;
}
firstline $0, '#! ' . qx(which perl);
Usage:
$ ./example
$ head -2 example
#! /bin/perl
use common::sense;