I want to split a large file into smaller files, splitting at specific lines that match a regex. Any help?
My code does the job, but it also creates an empty file.
#!/usr/local/lib/perl/5.14.2
open( INFILE, 'test.txt' );
@lines = <INFILE>;
$file = "outfile";
for ( $j = 0; $j <= $#lines; $j++ ) {
open( OUTFILE, ">", $file . $j );
$file_name = $file . $j;
#print "file is $file_name\n";
$i = 0;
while (@lines) {
$_ = shift @lines;
chomp;
$i++;
if ( $_ =~ /^###\s*(.*)\s*###/ && $i > 1 ) {
unshift @lines, "$_\n";
print "$filename\n";
last;
}
print OUTFILE "$_\n";
}
close(OUTFILE);
}
close(INFILE);
My input file contains :
-------------
### abcd hdkjfkdj ####
body 1 dsjklsjdfskl
### zyz fhid ###
abcdksdsd djnfkldsfmnsldk ;lkjfkl
---------------------------
It creates three output files called outfile0, outfile1, and outfile2, but outfile0 is empty. I want to avoid this.
The fix is to open an output file only when a matching line is found. Your program opens a new file on every pass through the outer loop regardless, which is why it leaves an empty output file.
Here is a rewrite that works. I've also removed the temporary @lines array.
#!/usr/bin/perl
#
use warnings;
use strict;
open(my $file,"<", "test.txt") || die $!;
my $counter=1;
my $out;
while(<$file>) {
if (/###\s*(.*)\s*###/) {
open($out, ">", "outfile$counter") || warn "outfile$counter $!";
$counter++;
}
print $out $_ if $out;
}
If you want to use the material between the ### blocks as file titles, you can set the file name when you're doing the pattern match on the lines with ### blocks.
#!/usr/bin/perl
use strict;
use warnings;
open my $fh, '<', 'my_file.txt' or die "Could not open file: $!";
# initialise a variable that will hold the output file handle
my $out;
while (<$fh>) {
# capture the title between the # signs
if (/##+ (.*?) ##+/) {
open $out, '>', $1.".txt" or die "Could not create file $1.txt: $!";
}
elsif ($out) {
print $out $_;
}
else {
# if $out is not set, we haven't yet encountered a title block
warn "Error: line found with no title block: $_";
}
}
Sample input:
Text files containing their own name
### questions-1 ####
Why are a motorcycle's front brakes more effective than back?
Is it possible to make a gradient follow a path in Illustrator?
Text files containing their own name
### questions-2 ###
Why does Yoda mourn the Jedi after order 66 is executed?
what are the standard gui elements called?
Flybe just cancelled my return flight. Will they refund that part of the trip?
### questions-3 ###
Merge two arrays of ElementModels?
Is this set open or closed?
Output: three files, questions-1.txt, questions-2.txt, questions-3.txt, containing the appropriate lines. e.g. questions-1.txt:
Why are a motorcycle's front brakes more effective than back?
Is it possible to make a gradient follow a path in Illustrator?
Text files containing their own name
You haven't stated whether you want the ### lines in the output or not, so I've left them off.
Depending on what OS you're on and what your potential file names contain, you may want to filter them and replace special characters with an underscore (or just remove the special characters).
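The filtering mentioned above could look something like the following. This is only a sketch: `sanitize_filename` is a hypothetical helper that is not part of the answer, and the set of characters it keeps (letters, digits, dot, underscore, hyphen) is an assumption chosen to be conservative rather than a rule any operating system imposes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn a captured title into a conservative file name.
sub sanitize_filename {
    my ($title) = @_;
    $title =~ s/^\s+|\s+$//g;            # trim surrounding whitespace
    $title =~ s/[^A-Za-z0-9._-]+/_/g;    # each run of other characters becomes one underscore
    # avoid an empty name and the special directory names "." and ".."
    $title = 'untitled' if $title eq '' or $title =~ /^\.+$/;
    return $title;
}

print sanitize_filename('questions-1'), "\n";
print sanitize_filename('notes: draft/v2?'), "\n";
```

In the loop above, the call would wrap the capture, for example `sanitize_filename($1) . ".txt"`. Note that two different titles can map to the same sanitized name, in which case the later file would overwrite the earlier one.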
I'm using Perl to open two text files, process them, and then write the output to another file.
I have a file INPUT where every line is a customer. I process each line into variables that are used to substitute text in another file, TEMP. The result should be written to an individual file for each customer, OUTPUT.
My program seems to work for only the first file. The rest of the files remain empty, with no output.
#!/usr/bin/perl -w
if ( $#ARGV < 0) {
print "Usage: proj5.pl <mm/dd/yyyy>\n";
exit;
}
my $date = $ARGV[0];
open(INFO, "p5Customer.txt") or die("Could not open p5Customer.txt file\n");
open(TEMP, "template.txt") or die("Could not open template.txt file\n");
my $directory = "Emails";
mkdir $directory unless(-e $directory);
foreach $info (<INFO>){
($email, $fullname, $title, $payed, $owed) = split /,/, $info;
next if($owed < $payed);
chomp($owed);
$filepath = "$directory/$email";
unless(open OUTPUT, '>>'.$filepath){
die "Unable to create '$filepath'\n";
}
foreach $detail (<TEMP>){
$detail =~ s/EMAIL/$email/g;
$detail =~ s/(NAME|FULLNAME)/$fullname/g;
$detail =~ s/TITLE/$title/g;
$detail =~ s/AMOUNT/$owed/g;
$detail =~ s{DATE}{$date}g;
print OUTPUT $detail;
}
close(OUTPUT);
}
close(INFO);
close(TEMP);
As has been said, you need to open your template file again each time you read from it. There are a number of other issues with your code too:
Always use strict and use warnings 'all' and declare every variable with my as close as possible to where it is first used
$#ARGV is the index of the last element of @ARGV, so $#ARGV < 0 is much better written as @ARGV < 1
You should use lexical file handles, and the three-parameter form of open, so open(INFO, "p5Customer.txt") should be open my $info_fh, '<', "p5Customer.txt"
You should use while instead of for to read from a file
It is easier to use the default variable $_ for short loops
It is pointless to capture a substring in a regular expression if you're not going to use it, so (NAME|FULLNAME) should be NAME|FULLNAME
There is no point in closing input files before the end of your program
It is also much better to use an existing template system, such as Template::Toolkit
This should work for you
#!/usr/bin/perl
use strict;
use warnings 'all';
if ( @ARGV < 1 ) {
print "Usage: proj5.pl <mm/dd/yyyy>\n";
exit;
}
my $date = $ARGV[0];
open my $info_fh, '<', 'p5Customer.txt' or die qq{Could not open "p5Customer.txt" file: $!};
my $directory = "Emails";
mkdir $directory unless -e $directory;
while ( <$info_fh> ) {
chomp;
my ($email, $fullname, $title, $payed, $owed) = split /,/;
next if $owed < $payed;
open my $template_fh, '<', 'template.txt' or die qq{Could not open "template.txt" file: $!};
my $filepath = "$directory/$email";
open my $out_fh, '>', $filepath or die qq{Unable to create "$filepath": $!};
while ( <$template_fh> ) {
s/EMAIL/$email/g;
s/FULLNAME|NAME/$fullname/g;
s/TITLE/$title/g;
s/AMOUNT/$owed/g;
s/DATE/$date/g;
print $out_fh $_;
}
close($out_fh);
}
Your problem is that the TEMP loop is inside the INPUT loop, so the TEMP filehandle is read to the end of the file while the INPUT loop is still on its first line. Every later customer finds nothing left to read.
It is best to read the TEMP file data into a hash table (or an array of lines) once and work on that copy inside the INPUT loop.
Good luck.
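A minimal sketch of the read-once idea, under some assumptions: it keeps the template as an array of lines rather than a hash table, `fill_template` is a hypothetical helper, and the template text and customer records are made-up stand-ins for template.txt and p5Customer.txt. It also swaps the chain of separate substitutions for a single pass over all placeholders, so text that has just been substituted in is never scanned again.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: fill one copy of the template for one customer.
# $template is an array reference of lines; $values maps placeholder => text.
sub fill_template {
    my ($template, $values) = @_;
    my @filled = @$template;    # copy, so the cached template is never modified
    for my $line (@filled) {
        # one pass over all placeholders; the replacement text is not rescanned
        $line =~ s/(EMAIL|FULLNAME|NAME|TITLE|AMOUNT|DATE)/$values->{$1}/g;
    }
    return @filled;
}

# Stand-in for the lines of template.txt, which would be read once with
#   open my $fh, '<', 'template.txt' or die $!;  my @template = <$fh>;
my @template = ("Dear TITLE NAME,\n", "You owe AMOUNT as of DATE.\n");

# Made-up customer records: email, full name, title, amount owed
for my $customer (['a@example.com', 'Ann Lee', 'Dr', '12.50'],
                  ['b@example.com', 'Bob Ray', 'Mr', '7.00']) {
    my ($email, $fullname, $title, $owed) = @$customer;
    print fill_template(\@template, {
        EMAIL => $email, FULLNAME => $fullname, NAME => $fullname,
        TITLE => $title, AMOUNT => $owed, DATE => '01/02/2024',
    });
}
```

Because the template lives in memory, no filehandle has to be reopened or rewound for the second and later customers.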
My program is trying to search a string from multiple files in a directory. The code searches for single patterns like perl but fails to search a long string like Status Code 1.
Can you please let me know how to search for strings with multiple words?
#!/usr/bin/perl
my @list = `find /home/ad -type f -mtime -1`;
# printf("Lsit is $list[1]\n");
foreach (@list) {
# print("Now is : $_");
open(FILE, $_);
$_ = <FILE>;
close(FILE);
unless ($_ =~ /perl/) { # works, but fails to find string "Status Code 1"
print "found\n";
my $filename = 'report.txt';
open(my $fh, '>>', $filename) or die "Could not open file '$filename' $!";
say $fh "My first report generated by perl";
close $fh;
} # end unless
} # end For
There are a number of problems with your code:
You must always use strict and use warnings at the top of every Perl program. There is little point in declaring anything with my without strict in place
The lines returned by the find command will have a newline at the end which must be removed before Perl can find the files
You should use lexical file handles (my $fh instead of FILE) and the three-parameter form of open as you do with your output file
$_ = <FILE> reads only the first line of the file into $_
unless ($_ =~ /perl/) is inverted logic, and there's no need to specify $_ as it is the default. You should write if ( /perl/ )
You can't use say unless you have use feature 'say' at the top of your program (or use 5.010, which adds all features available in Perl v5.10)
It is also best to avoid using shell commands as Perl is more than able to do anything that you can using command line utilities. In this case -f $file is a test that returns true if the file is a plain file, and -M $file returns the (floating point) number of days since the file's modification time
This is how I would write your program
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
for my $file ( glob '/home/ad/*' ) {
next unless -f $file and -M $file < 1;    # modified less than a day ago, like find -mtime -1
open my $fh, '<', $file or die $!;
while ( <$fh> ) {
if ( /perl/ ) {
print "found\n";
my $filename = 'report.txt';
open my $out_fh, '>>', $filename or die "Could not open file '$filename': $!";
say $out_fh "My first report generated by perl";
close $out_fh;
last;
}
}
}
It should have matched, unless $_ contains the text in a different case.
Try this:
unless($_ =~ /Status\s+Code\s+1/i) {
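The \s+ and /i idea can be wrapped in a small function and tried on a few strings. This is a sketch only: `contains_phrase` is a hypothetical helper, and the sample lines are invented. It escapes each word with quotemeta so that any regex metacharacters in the phrase are matched literally.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return 1 if $text contains $phrase, ignoring case and
# allowing any run of whitespace wherever the phrase has a space; else 0.
sub contains_phrase {
    my ($text, $phrase) = @_;
    # split ' ' breaks the phrase into words; quotemeta escapes metacharacters
    my $pattern = join '\s+', map { quotemeta($_) } split ' ', $phrase;
    return $text =~ /$pattern/i ? 1 : 0;
}

for my $line ('job ended with Status Code 1', 'status   code 1', 'Status Code: 1') {
    print "$line => ", (contains_phrase($line, 'Status Code 1') ? 'match' : 'no match'), "\n";
}
```

As written, the pattern is not anchored at word boundaries, so a line containing "Status Code 12" would also count as a match.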
Change
unless ($_ =~ /perl/) {
to:
unless ($_ =~ /(Status Code 1)/) {
I am certain the above works, except it's case sensitive.
Since you question it, I rewrote your script to make more sense of what you're trying to accomplish and to implement the above suggestion. Correct me if I am wrong, but you're trying to write a script that matches "Status Code 1" in a bunch of files that were last modified within the past day and prints each matching filename to a text file.
Anyways, below is what I recommend:
#!/usr/bin/perl
use strict;
use warnings;
my $output_file = 'report.txt';
my @list = `find /home/ad -type f -mtime -1`;
chomp @list;    # remove the trailing newline from each file name
foreach my $filename (@list) {
print "PROCESSING: $filename\n";
open (INCOMING, "<$filename") || die "FATAL: Could not open '$filename' $!";
foreach my $line (<INCOMING>) {
if ($line =~ /(Status Code 1)/) {
open( FILE, ">>$output_file") or die "FATAL: Could not open '$output_file' $!";
print FILE sprintf ("%s\n", $filename);
close(FILE) || die "FATAL: Could not CLOSE '$output_file' $!";
# Bail when we get the first match
last;
}
}
close(INCOMING) || die "FATAL: Could not close '$filename' $!";
}
I've got a script that reformats an input file and creates an output file. When I try to read that output file for the second part of the script, it doesn't work. However if I split the script into two parts it works fine and gives me the output that I need. I'm not a programmer and surprised I've got this far - I've been banging my head for days trying to resolve this.
My command for running it is this (BTW the temp.txt was just a brute force workaround for getting rid of the final comma to get my final output file - couldn't find another solution):
c:\perl\bin\perl merge.pl F146.sel temp.txt F146H.txt
Input looks like this (from another software package) ("F146.sel"):
/ Selected holes from the .\Mag_F146_Trimmed.gdb database.
"L12260"
"L12270"
"L12280"
"L12290"
Output looks like this (mods to the text: quotes removed, insert comma, concatenate into one line, remove the last comma) "F146H.txt":
L12260,L12270,L12280,L12290
Then I want to use this as input in the next part of the script, which basically inserts this output into a line of code that I can use in another software package (my "merge.gs" file). This is the output that I get if I split my script into two parts, but it just gives me a blank if I do it as one (see below).
CURRENT Database,"RAD_F146.gdb"
SETINI MERGLINE.OUT="DALL"
SETINI MERGLINE.LINES="L12260,L12270,L12280,L12290"
GX mergline.gx
What follows is my "merge.pl". What have I done wrong?
(actually, the question could be - what haven't I done wrong, as this is probably the most retarded code you've seen in a while. In fact, I bet some of you could get this entire operation done in 10-15 lines of code, instead of my butchered 90. Thanks in advance.)
# this reformats the SEL file to remove the first line and replace the " with nothing
$file = shift ;
$temp = shift ;
$linesH = shift ;
#open (Profiles, ">.\\scripts\\P2.gs")||die "couldn't open output .gs file";
open my $in, '<', $file or die "Can't read old file: Inappropriate I/O control operation";
open my $out, '>', $temp or die "Can't write new file: Inappropriate I/O control operation";
my $firstLine = 1;
while( <$in> )
{
if($firstLine)
{
$firstLine = 0;
}
else{
s/"L/L/g; # replace "L with L
s/"/,/g; # replace " with,
s|\s+||; # concatenates it all into one line
print $out $_;
}
}
close $out;
open (part1, "${temp}")||die "Couldn't open selection file";
open (part2, ">${linesH}")||die "Couldn't open selection file";
printitChomp();
sub printitChomp
{
print part2 <<ENDGS;
ENDGS
}
while ($temp = <part1> )
{
print $temp;
printit();
}
sub printit
{$string = substr (${temp}, 0,-1);
print part2 <<ENDGS;
$string
ENDGS
}
####Theoretically this creates the merge script from the output
####file from the previous loop. However it only seems to work
####if I split this into 2 perl scripts.
open (MergeScript, ">MergeScript.gs")||die "couldn't open output .gs file";
printitMerge();
open (SEL, "${linesH}")||die "Couldn't open selection file";
sub printitMerge
#open .sel file
{
print MergeScript <<ENDGS;
ENDGS
}
#iterate over required files
while ( $line = <SEL> ){
chomp $line;
print STDOUT $line;
printitLines();
}
sub printitLines
{
print MergeScript <<ENDGS;
CURRENT Database,"RAD_F146.gdb"
SETINI MERGLINE.OUT="DALL"
SETINI MERGLINE.LINES="${line}"
GX mergline.gx
ENDGS
}
I think all you were really missing was a close(part2); call, which flushes the buffered output to disk before the same file is reopened as SEL:
#!/usr/bin/env perl
use strict;
use warnings;
# this reformats the SEL file to remove the first line and replace the " with nothing
my $file = shift;
my $temp = shift;
my $linesH = shift;
open my $in, '<', $file or die "Can't read old file: Inappropriate I/O control operation";
open my $out, '>', $temp or die "Can't write new file: Inappropriate I/O control operation";
my $firstLine = 1;
while (my $line = <$in>){
print "LINE: $line\n";
if ($firstLine){
$firstLine = 0;
} else {
$line =~ s/"L/L/g; # replace "L with L
$line =~ s/"/,/g; # replace " with,
$line =~ s/\s+//g; # concatenates it all into one line
print $out $line;
}
}
close $out;
open (part1, $temp) || die "Couldn't open selection file";
open (part2, ">", $linesH) || die "Couldn't open selection file";
while (my $temp_line = <part1>){
print "TEMPLINE: $temp_line\n";
my $string = substr($temp_line, 0, -1);
print part2 <<ENDGS;
$string
ENDGS
}
close(part2);
#### this creates the merge script from the output
#### file from the previous loop.
open (MergeScript, ">MergeScript.gs")||die "couldn't open output .gs file";
open (SEL, $linesH) || die "Couldn't open selection file";
#iterate over required files
while ( my $sel_line = <SEL> ){
chomp $sel_line;
print STDOUT $sel_line;
print MergeScript <<"ENDGS";
CURRENT Database,"RAD_F146.gdb"
SETINI MERGLINE.OUT="DALL"
SETINI MERGLINE.LINES="$sel_line"
GX mergline.gx
ENDGS
}
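The effect of the missing close(part2) can be seen in isolation. The sketch below assumes Perl's default output buffering (no autoflush on the handle); it uses a scratch file from File::Temp instead of F146H.txt, and `slurp` is a hypothetical helper. A short line written to a file normally stays in Perl's buffer until the handle is closed or flushed, so a second handle opened on the same file sees nothing yet.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: read a whole file into one string.
sub slurp {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot read $path: $!";
    local $/;    # undefined record separator: read everything at once
    my $content = <$fh>;
    return defined $content ? $content : '';
}

my ($out_fh, $path) = tempfile(UNLINK => 1);    # scratch file, removed at exit
print $out_fh "L12260,L12270,L12280,L12290\n";

my $before = slurp($path);    # typically empty: the data is still in Perl's buffer
close $out_fh or die "Cannot close $path: $!";
my $after = slurp($path);     # the line has now been written out

printf "before close: %d bytes, after close: %d bytes\n", length($before), length($after);
```

Closing the write handle (or going out of scope, for a lexical handle) is the simplest fix; enabling autoflush on the handle is an alternative when the file has to stay open.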
and one alternative way of doing it..
#!/usr/bin/env perl
use strict;
use warnings;
my $file = shift;
open my $in, '<', $file or die "Can't read old file: Inappropriate I/O control operation";
my @lines = <$in>; # read in all the lines
shift @lines; # discard the first line
my $line = join(',', @lines); # join the lines with commas
$line =~ s/[\r\n"]+//g; # remove the quotes and newlines
# print the line into the mergescript
open (MergeScript, ">MergeScript.gs")||die "couldn't open output .gs file";
print MergeScript <<"ENDGS";
CURRENT Database,"RAD_F146.gdb"
SETINI MERGLINE.OUT="DALL"
SETINI MERGLINE.LINES="$line"
GX mergline.gx
ENDGS
Can anybody help me read all the files of a particular format from a directory, line by line, and print them on the screen?
I would also like to include the command lines in the program itself.
Then, whenever I simply run the program, it should display the contents of all the files.
Below is the program I wrote. Can anybody help me, please?
#!/usr/local/bin/perl
$filepath="/home/hclabv";
opendir(DIR,"$filepath");
@files=grep{/\.out$/} readdir(DIR);
closedir(DIR);
$c = 0;
for ($c=0 ;
while ($c <= @files)
{
$cmd = "Perlsc11 $files[$c]";
system($cmd);
if($#ARGV != 0) {
print STDERR "You must specify exactly one argument.\n";
exit 4;
}
else
{
print ("$files[$c]\n");
# Open the file.
open(INFILE, $ARGV[0]) or die "Cannot open $ARGV[0]: $!.\n";
while(my $l = <INFILE>) {
print $l;
}
close INFILE;
}
$c++;
}
You can use Perl's glob function to get a list of filenames with the ".out" extension in the specified directory. You can then open these files one by one in a loop and print their contents to the screen. Here's the code:
# get all file-names with ".out" extension into array
my @outFiles = glob "/home/hclabv/*.out";
# loop through list of file names in array
foreach my $outFileName ( @outFiles )
{
# open the file for processing
open my $outFile, '<', $outFileName or die "Unable to open file for reading : $!";
# iterate through each line in the file
while ( my $line = <$outFile> )
{
# print the individual line (it already ends with a newline)
print $line;
}
# close the file
close $outFile;
}
Please clarify what you mean by "including command lines", so we can help further.
I need some perl help in putting these (2) processes/code to work together. I was able to get them working individually to test, but I need help bringing them together especially with using the loop constructs. I'm not sure if I should go with foreach..anyways the code is below.
Also, any best practices would be great too as I'm learning this language. Thanks for your help.
Here's the process flow I am looking for:
read a directory
look for a particular file
use the file name to strip out some key information to create a newly processed file
process the input file
create the newly processed file for each input file read (if i read in 10, I create 10 new files)
Part 1:
my $target_dir = "/backups/test/";
opendir my $dh, $target_dir or die "can't opendir $target_dir: $!";
while (defined(my $file = readdir($dh))) {
next if ($file =~ /^\.+$/);
#Get filename attributes
if ($file =~ /^foo(\d{3})\.name\.(\w{3})-foo_p(\d{1,4})\.\d+.csv$/) {
print "$1\n";
print "$2\n";
print "$3\n";
}
print "$file\n";
}
Part 2:
use strict;
use Digest::MD5 qw(md5_hex);
#Create new file
open (NEWFILE, ">/backups/processed/foo$1.name.$2-foo_p$3.out") || die "cannot create file";
my $data = '';
my $line1 = <>;
chomp $line1;
my @heading = split /,/, $line1;
my ($sep1, $sep2, $eorec) = ( "^A", "^E", "^D");
while (<>)
{
my $digest = md5_hex($data);
chomp;
my (@values) = split /,/;
my $extra = "__mykey__$sep1$digest$sep2" ;
$extra .= "$heading[$_]$sep1$values[$_]$sep2" for (0..scalar(@values));
$data .= "$extra$eorec";
print NEWFILE "$data";
}
#print $data;
close (NEWFILE);
You are using an old style of Perl programming. I recommend using functions and CPAN modules (http://search.cpan.org). Perl pseudocode:
use Modern::Perl;
# use...
sub get_input_files {
# return an array of files (@)
}
sub extract_file_info {
# takes the file name and returns an array of values (filename attrs)
}
sub process_file {
# reads the input file, takes the previous attribs and build the output file
}
my @ifiles = get_input_files;
foreach my $ifile(@ifiles) {
my @attrs = extract_file_info($ifile);
process_file($ifile, @attrs);
}
Hope it helps
I've bashed your two code fragments together (making the second a sub that the first calls for each matching file) and, if I understood your description of the objective correctly, this should do what you want. Comments on style and syntax are inline:
#!/usr/bin/env perl
# - Never forget these!
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
my $target_dir = "/backups/test/";
opendir my $dh, $target_dir or die "can't opendir $target_dir: $!";
while (defined(my $file = readdir($dh))) {
# Parens on postfix "if" are optional; I prefer to omit them
next if $file =~ /^\.+$/;
if ($file =~ /^foo(\d{3})\.name\.(\w{3})-foo_p(\d{1,4})\.\d+.csv$/) {
process_file($file, $1, $2, $3);
}
print "$file\n";
}
sub process_file {
my ($orig_name, $foo_x, $name_x, $p_x) = @_;
my $new_name = "/backups/processed/foo$foo_x.name.$name_x-foo_p$p_x.out";
# - From your description of the task, it sounds like we actually want to
# read from the found file, not from <>, so opening it here to read
# - Better to use lexical ("my") filehandle and three-arg form of open
# - "or" has lower operator precedence than "||", so less chance of
# things being grouped in the wrong order (though either works here)
# - Including $! in the error will tell why the file open failed
# - readdir returns bare file names, so prepend the directory when opening
open my $in_fh, '<', "$target_dir$orig_name" or die "cannot read $target_dir$orig_name: $!";
open(my $out_fh, '>', $new_name) or die "cannot create $new_name: $!";
my $data = '';
my $line1 = <$in_fh>;
chomp $line1;
my @heading = split /,/, $line1;
my ($sep1, $sep2, $eorec) = ("^A", "^E", "^D");
while (<$in_fh>) {
chomp;
my $digest = md5_hex($data);
my (@values) = split /,/;
my $extra = "__mykey__$sep1$digest$sep2";
# - Loop to the last index ($#values); 0 .. scalar(@values) runs one past the end
$extra .= "$heading[$_]$sep1$values[$_]$sep2"
for (0 .. $#values);
# - Useless use of double quotes removed on next two lines
$data .= $extra . $eorec;
#print $out_fh $data;
}
# - Moved print to output file to here (where it will print the complete
# output all at once) rather than within the loop (where it will print
# all previous lines each time a new line is read in) to prevent
# duplicate output records. This could also be achieved by printing
# $extra inside the loop. Printing $data at the end will be slightly
# faster, but requires more memory; printing $extra within the loop and
# getting rid of $data entirely would require less memory, so that may
# be the better option if you find yourself needing to read huge input
# files.
print $out_fh $data;
# - $in_fh and $out_fh will be closed automatically when it goes out of
# scope at the end of the block/sub, so there's no real point to
# explicitly closing it unless you're going to check whether the close
# succeeded or failed (which can happen in odd cases usually involving
# full or failing disks when writing; I'm not aware of any way that
# closing a file open for reading can fail, so that's just being left
# implicit)
close $out_fh or die "Failed to close file: $!";
}
Disclaimer: perl -c reports that this code is syntactically valid, but it is otherwise untested.