Merge two files based on the starting of the line - perl

I want to merge two files into one using perl. Below are the sample files.
***FILE 1***
XDC123
XDC456
XDC678
BB987
BB654
*** FILE 2 ***
XDC876
XDC234
XDC789
BB456
BB678
And I want the merged file to look like:
***MERGED FILE***
XDC123
XDC456
XDC678
XDC876
XDC234
XDC789
BB987
BB654
BB456
BB678
For the above functionality I have written the below perl script snippet:
#!/usr/bin/env perl;
use strict;
use warnings;
my $file1 = 'C:/File1';
my $file2 = 'C:/File2';
my $file3 = 'C:/File3';
open( FILEONE, '<$file1' );
open( FILETWO, '<$file2' );
open( FILETHREE, '>$file3' );
while (<FILEONE>) {
if (/^XDC/) {
print FILETHREE;
}
if (/^BB/) {
last;
}
}
while (<FILETWO>) {
if (/^XDC/) {
print FILETHREE;
}
if (/^BB/) {
last;
}
}
while (<FILEONE>) {
if (/^BB/) {
print FILETHREE;
}
}
while (<FILETWO>) {
if (/^BB/) {
print FILETHREE;
}
}
close($file1);
close($file2);
close($file3);
But the merged file that is generated from the above code looks like:
***FILE 3***
XDC123
XDC456
XDC678
XDC876
XDC234
XDC789
BB654
BB678
The first line that starts from BB is missed out from both the files. Any help on this will be appreciated. Thank you.

The problem is, you iterate each file to the end, but never 'rewind' for if you're wanting to start over.
So your while ( <FILEONE> ) { line consumes (and discards) the first line that matches m/^BB/ - the last exits the "while" loop, but only after it's already read the line.
However that's assuming you get your open statements right, because:
open( FILEONE, '>$file1' );
Actually empties it, it doesn't read from it. So I am assuming you've transposed your code, and introduced new errors whilst doing so.
As a style point - you should really use 3 argument open, with lexical filehandles.
So instead:
#!/usr/bin/env perl;
use strict;
use warnings;
my $file1 = 'C:/File1';
my $file2 = 'C:/File2';
my $file3 = 'C:/File3';
my #lines;
foreach my $file ( $file1, $file2 ) {
open( my $input, '<', $file ) or die $!;
push( #lines, <$input> );
close($input);
}
open( my $output, '>', $file3 ) or die $!;
print {$output} sort #lines;
close($output)
(Although as noted in the comments - if that's all you want to do, the unix sort utility is probably sufficient).
However, if you need to preserve the numeric ordering, whilst sorting on the alphabetical, you need a slightly different data structure:
#!/usr/bin/env perl;
use strict;
use warnings;
my $file1 = 'C:/File1';
my $file2 = 'C:/File2';
my $file3 = 'C:/File3';
my %lines;
foreach my $file ( $file1, $file2 ) {
open( my $input, '<', $file ) or die $!;
while ( my $line = <$file> ) {
my ( $key ) = $line =~ m/^(\D+)/;
push %{$lines{$key}}, $line;
}
close($input);
}
open( my $output, '>', $file3 ) or die $!;
foreach my $key ( sort keys %lines ) {
print {$output} #{$lines{$key}};
}
close($output)

Related

What produces the white space in my perl programm?

As the title says, I have a program or better two functions to read and write a file either in an array or to one. But now to the mean reason why I write this: when running my test several times my test program that tests my functions produces more and more white space. Is there somebody that could explain my fail and correct me?
my code
Helper.pm:
#!/usr/bin/env perl
package KconfCtl::Helper;
sub file_to_array($) {
my $file = shift();
my ( $filestream, $string );
my #rray;
open( $filestream, $file ) or die("cant open $file: $!");
#rray = <$filestream>;
close($filestream);
return #rray;
}
sub array_to_file($$;$) {
my #rray = #{ shift() };
my $file = shift();
my $mode = shift();
$mode='>' if not $mode;
my $filestream;
if ( not defined $file ) {
$filestream = STDOUT;
}
else {
open( $filestream, $mode, $file ) or die("cant open $file: $!");
}
my $l = #rray; print $l,"\n";
foreach my $line (#rray) {
print $filestream "$line\n";
}
close($filestream);
}
1;
test_helper.pl:
use KconfCtl::Helper;
use strict;
my #t;
#t= KconfCtl::Helper::file_to_array("kconf.test");
#print #t;
my $t_index=#t;
#t[$t_index]="n";
KconfCtl::Helper::array_to_file(\#t, "kconf.test", ">");
the result after the first:
n
and the 2nd run:
n
n
When you read from a file, the data includes the newline characters at the end of each line. You're not stripping those off, but you are adding an additional newline when you output your data again. That means your file is gaining additional blank lines each time you read and write it
Also, you must always use strict and use warnings 'all' at the top of every Perl script; you should avoid using subroutine prototypes; and you should declare all of your variables as late as possible
Here's a more idiomatic version of your module code which removes the newlines on input using chomp. Note that you don't need the #! line on the module file as it won't be run from the command line, but you my want it on the program file. It's also more normal to export symbols from a module using the Exporter module so that you don't have to qualify the subroutine names by prefixing them with the full package name
use strict;
use warnings 'all';
package KconfCtl::Helper;
sub file_to_array {
my ($file) = #_;
open my $fh, '<', $file or die qq{Can't open "$file" for input: $!}; #'
chomp(my #array = <$fh>);
return #array;
}
sub array_to_file {
my ($array, $file, $mode) = #_;
$mode //= '>';
my $fh;
if ( $file ) {
open $fh, $mode, $file or die qq{Can't open "$file" for output: $!}; #'
}
else {
$fh = \*STDOUT;
}
print $fh $_, "\n" for #$array;
}
1;
and your test program would be like this
#!/usr/bin/env perl
use strict;
use warnings 'all';
use KconfCtl::Helper;
use constant FILE => 'kconf.test';
my #t = KconfCtl::Helper::file_to_array(FILE);
push #t, 'n';
KconfCtl::Helper::array_to_file(\#t, FILE);
When you read in from your file, you need to chomp() the lines, or else the \n at the end of the line is included.
Try this and you'll see what's happening:
use Data::Dumper; ## add this line
sub file_to_array($) {
my $file = shift();
my ( $filestream, $string );
my #rray;
open( $filestream, '<', $file ) or die("cant open $file: $!");
#rray = <$filestream>;
close($filestream);
print Dumper( \#rray ); ### add this line
return #rray;
}
you can add
foreach(#rray){
chomp();
}
into your module to stop this happening.

match string in file and replacement with other string

I have a file containing lines as follows
#comments abc
#comments xyz
SerialPort=100
Baudrate=9600
Parity=2
Databits=8
Stopbits=1
also I have array #in = ( SerialPort=500 , Baudrate=300, parity=0, Databits=16, Stopbits=0 ),these array elements read from browser, I am trying to write perl script to match "SerialPort" in file and replace SerialPort=100 in file with SerialPort=500 of array, I want match all other elments in loop I tried code not working please improve the code which is below, I think regular expression is not working and each time if condition to match and substitution resulting false, and also when I look at file after execution of script file consists of duplicates.
#!/usr/bin/perl
$old_file = "/home/work/conf";
open (fd_old, "<", $old_file) || die "cant open file";
#read_file = <fd_old>;
close (fd_old);
#temp = ();
$flag = 0;
foreach $infile ( #read_file )
{
foreach $rr ( #in )
{
($key, $value ) = split(/=/, $rr );
if ( $infile =~ s/\b$key\b(.*)/$rr/ )
{
push ( #temp , $infile );
$flag = 0;
}
else
{
$flag = 1;
}
}
if ( $flag )
{
push (#temp, $infile );
}
}
open ( fd, ">", $old_file ) || die "can't open";
print fd #temp;
close(fd);
Perl 101: use strict; use warnings;.
Prefix variable names with $.
$old_file is undef when you try to open it.
And spell falg correctly, which if you'd turned on those options, you'd have been told about.
Also: When asking questions on SO, it's helpful if you point out what's not working.
#Maruti: Never write a perl program without use strict; and use warnings;. I have modified your code. Just have a look.
Code:
#!/usr/bin/perl
use strict;
use warnings;
my $old_file = "/home/work/conf";
open (my $fh, "<", $old_file) || die "cant open file";
my #read_file = <$fh>;
close ($fh);
my #temp = ();
my #in = ('SerialPort=500' , 'Baudrate=300', 'parity=0', 'Databits=16', 'Stopbits=0');
foreach my $infile ( #read_file )
{
foreach my $rr ( #in )
{
my ($key, $value) = split(/=/, $rr );
if ( $infile =~ m/\b$key\b\=\d+/ && $infile =~ /#.*/)
{
$infile =~ s/\b$key\b\=\d+/$rr/ig;
}
}
push (#temp, $infile );
}
open (my $out, ">", $old_file ) || die "can't open";
foreach my $res(#temp)
{
print $out $res;
}
close($out);

perl + read multiple csv files + manipulate files + provide output_files

Apologies if this is a bit long winded, bu i really appreciate an answer here as i am having difficulty getting this to work.
Building on from this question here, i have this script that works on a csv file(orig.csv) and provides a csv file that i want(format.csv). What I want is to make this more generic and accept any number of '.csv' files and provide a 'output_csv' for each inputed file. Can anyone help?
#!/usr/bin/perl
use strict;
use warnings;
open my $orig_fh, '<', 'orig.csv' or die $!;
open my $format_fh, '>', 'format.csv' or die $!;
print $format_fh scalar <$orig_fh>; # Copy header line
my %data;
my #labels;
while (<$orig_fh>) {
chomp;
my #fields = split /,/, $_, -1;
my ($label, $max_val) = #fields[1,12];
if ( exists $data{$label} ) {
my $prev_max_val = $data{$label}[12] || 0;
$data{$label} = \#fields if $max_val and $max_val > $prev_max_val;
}
else {
$data{$label} = \#fields;
push #labels, $label;
}
}
for my $label (#labels) {
print $format_fh join(',', #{ $data{$label} }), "\n";
}
i was hoping to use this script from here but am having great difficulty putting the 2 together:
#!/usr/bin/perl
use strict;
use warnings;
#If you want to open a new output file for every input file
#Do it in your loop, not here.
#my $outfile = "KAC.pdb";
#open( my $fh, '>>', $outfile );
opendir( DIR, "/data/tmp" ) or die "$!";
my #files = readdir(DIR);
closedir DIR;
foreach my $file (#files) {
open( FH, "/data/tmp/$file" ) or die "$!";
my $outfile = "output_$file"; #Add a prefix (anything, doesn't have to say 'output')
open(my $fh, '>', $outfile);
while (<FH>) {
my ($line) = $_;
chomp($line);
if ( $line =~ m/KAC 50/ ) {
print $fh $_;
}
}
close($fh);
}
the script reads all the files in the directory and finds the line with this string 'KAC 50' and then appends that line to an output_$file for that inputfile. so there will be 1 output_$file for every inputfile that is read
issues with this script that I have noted and was looking to fix:
- it reads the '.' and '..' files in the directory and produces a
'output_.' and 'output_..' file
- it will also do the same with this script file.
I was also trying to make it dynamic by getting this script to work in any directory it is run in by adding this code:
use Cwd qw();
my $path = Cwd::cwd();
print "$path\n";
and
opendir( DIR, $path ) or die "$!"; # open the current directory
open( FH, "$path/$file" ) or die "$!"; #open the file
**EDIT::I have tried combining the versions but am getting errors.Advise greatly appreciated*
UserName#wabcl13 ~/Perl
$ perl formatfile_QforStackOverflow.pl
Parentheses missing around "my" list at formatfile_QforStackOverflow.pl line 13.
source dir -> /home/UserName/Perl
Can't use string ("/home/UserName/Perl/format_or"...) as a symbol ref while "strict refs" in use at formatfile_QforStackOverflow.pl line 28.
combined code::
use strict;
use warnings;
use autodie; # this is used for the multiple files part...
#START::Getting current working directory
use Cwd qw();
my $source_dir = Cwd::cwd();
#END::Getting current working directory
print "source dir -> $source_dir\n";
my $output_prefix = 'format_';
opendir my $dh, $source_dir; #Changing this to work on current directory; changing back
for my $file (readdir($dh)) {
next if $file !~ /\.csv$/;
next if $file =~ /^\Q$output_prefix\E/;
my $orig_file = "$source_dir/$file";
my $format_file = "$source_dir/$output_prefix$file";
# .... old processing code here ...
## Start:: This part works on one file edited for this script ##
#open my $orig_fh, '<', 'orig.csv' or die $!; #line 14 and 15 above already do this!!
#open my $format_fh, '>', 'format.csv' or die $!;
#print $format_fh scalar <$orig_fh>; # Copy header line #orig needs changeing
print $format_file scalar <$orig_file>; # Copy header line
my %data;
my #labels;
#while (<$orig_fh>) { #orig needs changing
while (<$orig_file>) {
chomp;
my #fields = split /,/, $_, -1;
my ($label, $max_val) = #fields[1,12];
if ( exists $data{$label} ) {
my $prev_max_val = $data{$label}[12] || 0;
$data{$label} = \#fields if $max_val and $max_val > $prev_max_val;
}
else {
$data{$label} = \#fields;
push #labels, $label;
}
}
for my $label (#labels) {
#print $format_fh join(',', #{ $data{$label} }), "\n"; #orig needs changing
print $format_file join(',', #{ $data{$label} }), "\n";
}
## END:: This part works on one file edited for this script ##
}
How do you plan on inputting the list of files to process and their preferred output destination? Maybe just have a fixed directory that you want to process all the cvs files, and prefix the result.
#!/usr/bin/perl
use strict;
use warnings;
use autodie;
my $source_dir = '/some/dir/with/cvs/files';
my $output_prefix = 'format_';
opendir my $dh, $source_dir;
for my $file (readdir($dh)) {
next if $file !~ /\.csv$/;
next if $file =~ /^\Q$output_prefix\E/;
my $orig_file = "$source_dir/$file";
my $format_file = "$source_dir/$output_prefix$file";
.... old processing code here ...
}
Alternatively, you could just have an output directory instead of prefixing the files. Either way, this should get you on your way.

Can i collect the output of find(\&wanted, #directories) in an array

I am writing a script which will traverse the directory(including subdir also) and push the desired file in an array so that i can work on each file.
Here is my code:
use strict;
use warnings;
use File::Find;
my $path = $ARGV[0];
find({ wanted => \&GetappropriateFile }, $path);
sub GetappropriateFile
{
my $file = $_;
my #all_file;
# print "$file\n";
if ( -f and /traces[_d+]/)
{
#print "$file\n";
open(my $fh, "<", $file) or die "cannot open file:$!\n";
while( my $line = <$fh>){
$line =~ /Cmd\sline:\s+com.android*/;
push(#all_file,$file);
#print "$file\n";
}
close($fh);
#print"#all_file\n";
}
}
Problem Area : my $file = $_;
Instead of using " $file" if i could get a way to use an array here then i can easily read those files one by one and filter it.
Here what i am tring to do is : I have to open each file and check for the string "Cmd line: com.android" as soon as i get this string in the file i have to push this current file in an array and start reading the another file.
It would be better to avoid global vars.
use strict;
use warnings;
use File::Find qw( find );
sub IsAppropriateFile {
my ($file) = #_;
if (-f $file && $file =~ /traces[_d+]/) {
open(my $fh, "<", $file) or die "cannot open file:$!\n";
while ( my $line = <$fh> ) {
if ($line =~ /Cmd\sline:\s+com.android*/) {
return 1;
}
}
}
return 0;
}
{
my $path = $ARGV[0];
my #matching_files;
find({
wanted => sub {
push #matching_files, $_ if IsAppropriateFile($_);
},
}, $path);
print("$_\n") for #matching_files; # Or whatever.
}
Put declaration of #all_file outside of function, and use it after find() finishes,
my #all_file;
sub GetappropriateFile
{
..
}
You could also stop with file reading after successful match,
if ($line =~ /Cmd\sline:\s+com.android*/) {
push(#all_file, $file);
last;
}

Perl - empty rows while writing CSV from Excel

I want to convert excel-files to csv-files with Perl. For convenience I like to use the module File::Slurp for read/write operations. I need it in a subfunction.
While printing out to the screen, the program generates the desired output, the generated csv-files unfortunately just contain one row with semicolons, field are empty.
Here is the code:
#!/usr/bin/perl
use File::Copy;
use v5.14;
use Cwd;
use File::Slurp;
use Spreadsheet::ParseExcel;
sub xls2csv {
my $currentPath = getcwd();
my #files = <$currentPath/stage0/*.xls>;
for my $sourcename (#files) {
print "Now working on $sourcename\n";
my $outFile = $sourcename;
$outFile =~ s/xls/csv/g;
print "Output CSV-File: ".$outFile."\n";
my $source_excel = new Spreadsheet::ParseExcel;
my $source_book = $source_excel->Parse($sourcename)
or die "Could not open source Excel file $sourcename: $!";
foreach my $source_sheet_number ( 0 .. $source_book->{SheetCount} - 1 )
{
my $source_sheet = $source_book->{Worksheet}[$source_sheet_number];
next unless defined $source_sheet->{MaxRow};
next unless $source_sheet->{MinRow} <= $source_sheet->{MaxRow};
next unless defined $source_sheet->{MaxCol};
next unless $source_sheet->{MinCol} <= $source_sheet->{MaxCol};
foreach my $row_index (
$source_sheet->{MinRow} .. $source_sheet->{MaxRow} )
{
foreach my $col_index (
$source_sheet->{MinCol} .. $source_sheet->{MaxCol} )
{
my $source_cell =
$source_sheet->{Cells}[$row_index][$col_index];
if ($source_cell) {
print $source_cell->Value, ";"; # correct output!
write_file( $outFile, { binmode => ':utf8' }, $source_cell->Value, ";" ); # only one row of semicolons with empty fields!
}
}
print "\n";
}
}
}
}
xls2csv();
I know it has something to do with the parameter passing in the write_file function, but couldn't manage to fix it.
Has anybody an idea?
Thank you very much in advance.
write_file will overwrite the file unless the append => 1 option is given. So this:
write_file( $outFile, { binmode => ':utf8' }, $source_cell->Value, ";" );
Will write a new file for each new cell value. It does however not match your description of "only one row of semi-colons of empty fields", as it should only be one semi-colon, and one value.
I am doubtful towards this sentiment from you: "For convenience I like to use the module File::Slurp". While the print statement works as it should, using File::Slurp does not. So how is that convenient?
What you should do, if you still want to use write_file is to gather all the lines to print, and then print them all at once at the end of the loop. E.g.:
$line .= $source_cell->Value . ";"; # use concatenation to build the line
...
push #out, "$line\n"; # store in array
...
write_file(...., \#out); # print the array
Another simple option would be to use join, or to use the Text::CSV module.
Well, in this particular case, File::Slurp was indeed complicating this for me. I just wanted to avoid to repeat myself, which I did in the following clumsy working solution:
#!/usr/bin/perl
use warnings;
use strict;
use File::Copy;
use v5.14;
use Cwd;
use File::Basename;
use File::Slurp;
use Tie::File;
use Spreadsheet::ParseExcel;
use open qw/:std :utf8/;
# ... other functions
sub xls2csv {
my $currentPath = getcwd();
my #files = <$currentPath/stage0/*.xls>;
my $fh;
for my $sourcename (#files) {
say "Now working on $sourcename";
my $outFile = $sourcename;
$outFile =~ s/xls/csv/gi;
if ( -e $outFile ) {
unlink($outFile) or die "Error: $!";
print "Old $outFile deleted.";
}
my $source_excel = new Spreadsheet::ParseExcel;
my $source_book = $source_excel->Parse($sourcename)
or die "Could not open source Excel file $sourcename: $!";
foreach my $source_sheet_number ( 0 .. $source_book->{SheetCount} - 1 )
{
my $source_sheet = $source_book->{Worksheet}[$source_sheet_number];
next unless defined $source_sheet->{MaxRow};
next unless $source_sheet->{MinRow} <= $source_sheet->{MaxRow};
next unless defined $source_sheet->{MaxCol};
next unless $source_sheet->{MinCol} <= $source_sheet->{MaxCol};
foreach my $row_index (
$source_sheet->{MinRow} .. $source_sheet->{MaxRow} )
{
foreach my $col_index (
$source_sheet->{MinCol} .. $source_sheet->{MaxCol} )
{
my $source_cell =
$source_sheet->{Cells}[$row_index][$col_index];
if ($source_cell) {
print $source_cell->Value, ";";
open( $fh, '>>', $outFile ) or die "Error: $!";
print $fh $source_cell->Value, ";";
close $fh;
}
}
print "\n";
open( $fh, '>>', $outFile ) or die "Error: $!";
print $fh "\n";
close $fh;
}
}
}
}
xls2csv();
I'm actually NOT happy with it, since I'm opening and closing the files so often (I have many files with many lines). That's not very clever in terms of performance.
Currently I still don't know how to use the split or Text:CSV in this case, in order to put everything into an array and to open, write and close each file only once.
Thank you for your answer TLP.