\n not working. new line not working in Perl script - perl

I wrote the Perl code below, but the \n newline character in the regex of my if condition never matches.
#!/usr/bin/perl
use strict;
#use warnings;
use Cwd;
use File::Basename;
use File::Copy;
my $path = getcwd;
#print $path."\n";
opendir( INP, "$path\/" );
my @out = grep( /\.(xml)$/, readdir(INP) );
closedir INP;
#print @out;
open( F6, ">Log.txt" );
foreach my $f1 (@out) {
open( FF, "<$path\/$f1" ) or die "Cannot open file $f1: $!";
my $data1 = join( "", <FF> );
my @FILE_KA_ARRAY = split( /\n/, $data1 );
my $file_ka_len = @FILE_KA_ARRAY;
#print F6 $file_ka_len."\n";
#print F6 $f."\t".$file_ka_len."\n";
print F6 $f1 . "\n";
for ( my $x = 1; $x < $file_ka_len; $x++ ) {
my $y = $x + 1;
my $temp_file_arr = "";
$temp_file_arr = $FILE_KA_ARRAY[$x];
#print F6 $temp_file_arr."\t$x\n";
my $temp1 = $temp_file_arr;
if ( $temp1
=~ m#(<list .*? depth="(\d+)">)\n?(<list .*? depth="(\d+)">)#gs )
{
my $list3 = $1;
print F6 "\t\t\t\t\t\t\t\t" . $y . "\t\t" . $list3 . "\n";
}
}
}

Assuming your problem line is this:
if($temp1=~m#(<list .*? depth="(\d+)">)\n?(<list .*? depth="(\d+)">)#gs)
Then the problem is here:
my @FILE_KA_ARRAY = split(/\n/, $data1);
Your split removes the linefeeds and puts each line into its own array element. So when you do:
$temp_file_arr = $FILE_KA_ARRAY[$x];
my $temp1=$temp_file_arr;
$temp1 holds a single line with no linefeed in it, so a pattern that expects two <list> tags separated by \n never sees the second tag.
Additionally:
Don't turn off warnings. If you have warnings, fix them.
This looks like XML, so use a parser (although I'd avoid XML::Simple, it's nasty).
Indent your code; it makes the structure much easier to follow.
If you use glob("$path/*.xml") instead of readdir and grep, you get a list of full paths directly.
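A minimal sketch of the point about split, using a made-up two-line sample rather than your real files: after splitting on \n no element contains a linefeed, so a pattern that needs two tags only has a chance against the unsplit string.

```perl
use strict;
use warnings;

# Hypothetical sample standing in for the contents of one XML file.
my $data = qq{<list type="a" depth="1">\n<list type="b" depth="2">\n};

# The same pattern as in the question, compiled once.
my $pair = qr{(<list .*? depth="(\d+)">)\n?(<list .*? depth="(\d+)">)}s;

# After split, every element is a single line holding a single tag.
my @lines = split /\n/, $data;
my $hits_per_line = grep { $_ =~ $pair } @lines;

# Against the whole string the pattern can span the line break.
my $hits_whole = 0;
$hits_whole++ while $data =~ /$pair/g;

print "per line: $hits_per_line, whole string: $hits_whole\n";
```

With this sample the per-line count is 0 and the whole-string count is 1. For real XML a parser is still the safer choice.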

Related

Make the same edit for each column in a multi-column file

I have multiple CSV files with varying numbers of columns that I need to reformat into a fixed-format text file.
At this stage, I hash and unhash the columns that need to be edited, but it's tedious and I can't add new columns without changing the program first.
Is there a simpler way of reading, splitting and editing all columns, regardless of the number of columns in the file?
Here is my code thus far:
use strict;
use warnings;
my $input_file = 'FILENAME.csv';
my $output_file = 'FILENAME.txt';
open (INPUT, "<", "$input_file") or die "\n !! Cannot open $input_file: $!";
open (OUTPUT, ">>", "$output_file") or die "\n !! Cannot create $output_file: $!";
while ( <INPUT> ) {
my $line = $_;
$line =~ s/\s*$//g;
my ( $a, $b, $c, $d, $e, $f, $g, $h, $i, $j ) = split('\,', $line);
$a = sprintf '%10s', $a;
$b = sprintf '%10s', $b;
$c = sprintf '%10s', $c;
$d = sprintf '%10s', $d;
$e = sprintf '%10s', $e;
$f = sprintf '%10s', $f;
$g = sprintf '%10s', $g;
$h = sprintf '%10s', $h;
$i = sprintf '%10s', $i;
$j = sprintf '%10s', $j;
print OUTPUT "$a$b$c$d$e$f$g$h$i$j\n";
}
close INPUT;
close OUTPUT;
exit;
Do you mean something like this?
perl -aF/,/ -lne 'print map sprintf("%10s", $_), @F' FILENAME.csv > FILENAME.txt
Any time you're using sequential variables, you should be using an array. And in this case, since you only use the array once, you don't even need to do more than hold it temporarily.
Also: Use lexical filehandles, it's better practice.
#!/usr/bin/env perl
use strict;
use warnings;
my $input_file = 'FILENAME.csv';
my $output_file = 'FILENAME.txt';
my $format = '%10s';
open( my $input_fh, "<", $input_file ) or die "\n !! Cannot open $input_file: $!";
open( my $output_fh, ">>", $output_file ) or die "\n !! Cannot create $output_file: $!";
while (<$input_fh>) {
chomp;
print {$output_fh} join( "", map { sprintf $format, $_ } split /,/ ), "\n";
}
close $input_fh;
close $output_fh;
exit;

Perl : Need to append two columns if the ID's are repeating

If id gets repeated I am appending app1, app2 and printing it once.
Input:
id|Name|app1|app2
1|abc|234|231|
2|xyz|123|215|
1|abc|265|321|
3|asd|213|235|
Output:
id|Name|app1|app2
1|abc|234,265|231,321|
2|xyz|123|215|
3|asd|213|235|
Output I'm getting:
id|Name|app1|app2
1|abc|234,231|
2|xyz|123,215|
1|abc|265,321|
3|asd|213,235|
My Code:
#!/usr/bin/perl
use strict;
use warnings;
my $basedir = 'E:\Perl\Input\\';
my $file ='doctor.txt';
my $counter = 0;
my %RepeatNumber;
my $pos=0;
open(OUTFILE, '>', 'E:\Perl\Output\DoctorOpFile.csv') || die $!;
open(FH, '<', join('', $basedir, $file)) || die $!;
my $line = readline(FH);
unless ($counter) {
chomp $line;
print OUTFILE $line;
print OUTFILE "\n";
}
while ($line = readline(FH)) {
chomp $line;
my @obj = split('\|',$line);
if($RepeatNumber{$obj[0]}++) {
my $str1= join("|",$obj[0]);
my $str2=join(",",$obj[2],$obj[3]);
print OUTFILE join("|",$str1,$str2);
print OUTFILE "\n";
}
}
This should do the trick:
use strict;
use warnings;
my $file_in = "doctor.txt";
open (FF, "<", $file_in) or die "Cannot open $file_in: $!";
my $temp = <FF>; # remove first line
my %out;
while (<FF>)
{
my ($id, $Name, $app1, $app2) = split /\|/, $_;
$out{$id}[0] = $Name;
push @{$out{$id}[1]}, $app1;
push @{$out{$id}[2]}, $app2;
}
foreach my $key (keys %out)
{
print $key, "|", $out{$key}[0], "|", join (",", @{$out{$key}[1]}), "|", join (",", @{$out{$key}[2]}), "\n";
}
EDIT
To see what the %out contains (in case it's not clear), you can use
use Data::Dumper;
and print it via
print Dumper(\%out);
I'd tackle it like this:
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
use 5.14.0;
my %stuff;
#extract the header row.
#use the regex to remove the linefeed, because
#we can't chomp it inline like this.
#works since perl 5.14
#otherwise we could just chomp (@header) later.
my ( $id, @header ) = split( /\|/, <DATA> =~ s/\n//r );
while (<DATA>) {
    #turn this row into a hash of key-values.
    my %row;
    ( $id, @row{@header} ) = split(/\|/);
    #print for diag
    print Dumper \%row;
    #iterate each key, and insert into $row.
    foreach my $key ( keys %row ) {
        push( @{ $stuff{$id}{$key} }, $row{$key} );
    }
}
#print for diag
print Dumper \%stuff;
print join ("|", "id", @header ),"\n";
#iterate ids in the hash
foreach my $id ( sort keys %stuff ) {
    #join this record by '|'.
    print join('|',
        $id,
        #turn inner arrays into comma separated via map.
        map {
            my %seen;
            #use grep to remove dupes - e.g. "abc,abc" -> "abc"
            join( ",", grep !$seen{$_}++, @$_ )
        } @{ $stuff{$id} }{@header}
        ),
        "\n";
}
__DATA__
id|Name|app1|app2
1|abc|234|231|
2|xyz|123|215|
1|abc|265|321|
3|asd|213|235|
This is perhaps a bit overkill for your application, but it should handle arbitrary column headings and arbitrary numbers of duplicates. It coalesces repeated values though, so the two abc entries don't end up as abc,abc.
Output is:
id|Name|app1|app2
1|abc|234,265|231,321
2|xyz|123|215
3|asd|213|235
Another way of doing it, which doesn't use a hash (in case you want to be more memory efficient). My contribution starts below the open calls:
#!/usr/bin/perl
use strict;
use warnings;
my $basedir = 'E:\Perl\Input\\';
my $file ='doctor.txt';
open(OUTFILE, '>', 'E:\Perl\Output\DoctorOpFile.csv') || die $!;
select(OUTFILE);
open(FH, '<', join('', $basedir, $file)) || die $!;
# copy the header line through unchanged
print(scalar(<FH>));
my @lastobj;
# sort the rows by id so that repeated ids end up next to each other
foreach my $obj (sort {$a->[0] <=> $b->[0]}
                 map {chomp;[split(/\|/)]} <FH>) {
    if(@lastobj && $obj->[0] eq $lastobj[0])
    {
        # same id as the previous row: append app1 and app2
        @lastobj = (@{$obj}[0..1],
                    $lastobj[2].','.$obj->[2],
                    $lastobj[3].','.$obj->[3]);
    }
    else
    {
        # new id: flush the previous record, if there is one
        print(join('|',@lastobj),"|\n") if @lastobj;
        @lastobj = @{$obj}[0..3];
    }
}
print(join('|',@lastobj),"|\n") if @lastobj;
Note that split, without its third argument, drops trailing empty fields, which is why you have to add the last bar yourself. If you don't do a chomp, you won't need to supply the bar or the trailing newline, but you would have to keep $obj->[4] as well.
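A small illustration of that split behaviour, using one sample row from the question. This is only a sketch of how the limit argument changes the result, not part of the solution above.

```perl
use strict;
use warnings;

# One row from the question's input, already chomped.
my $line = '1|abc|234|231|';

# Without a limit, the trailing empty field after the last bar is dropped.
my @default = split /\|/, $line;

# A negative limit keeps trailing empty fields.
my @keep_all = split /\|/, $line, -1;

printf "default: %d fields, limit -1: %d fields\n",
    scalar @default, scalar @keep_all;
```

Here the default call returns four fields and the call with a limit of -1 returns five, the last one being an empty string.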

modification of script in perl

Currently I have the following script:
#!/usr/bin/env perl
use strict;
use warnings;
my %seen;
my $header = <> . <>;
print $header;
my $last_sequence_number = 0;
open( my $output, ">", "output.$last_sequence_number.out" ) or die $!;
print {$output} $header;
$seen{$last_sequence_number}++;
while (<>) {
my ($key) = split;
next unless $key =~ m/^\d+$/;
my $sequence_number = int( $key / 1000 );
if ( not $sequence_number == $last_sequence_number ) {
print "Opening new file for $sequence_number\n";
close($output);
open( $output, ">", "output.$sequence_number.out" ) or die $!;
print {$output} $header unless $seen{$sequence_number}++;
$last_sequence_number = $sequence_number;
}
print {$output} $_;
}
The script splits a file into several output files (file 1, file 2, ...). Now I need to pass the script another parameter that specifies a prefix for the output names, so that if this additional input is 1, the output would be 1_file1, 1_file2, and so on. How could I do that?
I know that something like
use Getopt::Long;
could be used?
I tried this:
#!/usr/bin/env perl
use strict;
use warnings;
my %seen;
my $header = <> . <>;
print $header;
my ( $suffix, $filename ) = @ARGV;
open ( my $input, "<", $filename ) or die $!;
my $last_sequence_number = 0;
open( my $output, ">", "output.$last_sequence_number.out" ) or die $!;
print {$output} $header;
$seen{$last_sequence_number}++;
while (<$input>) {
my ($key) = split;
next unless $key =~ m/^\d+$/;
my $sequence_number = int( $key / 1000 );
if ( not $sequence_number == $last_sequence_number ) {
print "Opening new file for $sequence_number\n";
close($output);
open( $output, ">", "output.$sequence_number.out" ) or die $!;
print {$output} $header unless $seen{$sequence_number}++;
$last_sequence_number = $sequence_number;
}
print {$output} $_;
}
but that is not working. What is wrong?
I get
No such file or directory at ./spl.pl line 10, <> line 2.
after the header is printed.
As Sobrique says, your problem is the magical nature of <>. But I don't think that it's as hard to deal with as he thinks.
The point is that <> looks at the current value of @ARGV. So you can add other command line arguments as long as you ensure that you have removed them from @ARGV before you use <> for the first time.
So change your code so that it starts like this:
my %seen;
my $prefix = shift;
my $header = <> . <>;
You can then call your program like this:
$ your_program.pl prefix_goes_here list of file names...
Everything else should now work the same as it currently does, but you have your prefix stored away in $prefix so that you can use it in your print statements.
I hope that's what you wanted. Your question isn't particularly clear.
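A minimal sketch of how the shifted prefix could then be used. The helper name output_name and the "prefix_output.N.out" naming scheme are assumptions for illustration; adapt them to whatever names you actually want.

```perl
use strict;
use warnings;

# Hypothetical helper: builds a name such as "1_output.3.out".
sub output_name {
    my ( $prefix, $sequence_number ) = @_;
    return "${prefix}_output.$sequence_number.out";
}

# Simulate the call: your_program.pl 1 data.txt
@ARGV = ( '1', 'data.txt' );

# Remove the prefix first, so that <> only ever sees file names.
my $prefix = shift @ARGV;

print output_name( $prefix, 0 ), "\n";
print "remaining arguments: @ARGV\n";
```

In the real script you would drop the simulated @ARGV assignment and pass the result of output_name to each open call.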
I would do something like this.
#!/usr/bin/env perl
use strict;
use warnings;
use autodie;
use Getopt::Long qw(:config bundling);
use Pod::Usage;
{
my $man = 0;
my $help = 0;
my $verbose = 0;
my $prefix = '';
my $suffix = '';
my $header_lines = 2;
my $bunch_size = 1000;
GetOptions(
'help|?' => \$help,
'man' => \$man,
'verbose|v+' => \$verbose,
'prefix|p=s' => \$prefix,
'suffix|s=s' => \$suffix,
'header|h=i' => \$header_lines,
'bunch|batch|bucket|b=i' => \$bunch_size
) or pod2usage(2);
pod2usage(1) if $help;
pod2usage( -exitval => 0, -verbose => 2 ) if $man;
pod2usage(
-exitval => 3,
-message => "Header lines can't be a negative number"
) if $header_lines < 0;
pod2usage(
-exitval => 4,
-message => "Bunch size has to be positive"
) unless $bunch_size > 0;
my $header = '';
$header .= <> for 1 .. $header_lines;
my %seen;
my $current_output_number = -1;
sub key2output { int( shift() / $bunch_size ) }
sub set_output {
my $output_number = shift;
if ( $output_number != $current_output_number ) {
my $seen = $seen{$output_number}++;
printf STDOUT "Opening %sfile for %d\n", $seen ? '' : 'new ',
$output_number
if $verbose;
open my $fh, $seen ? '>>' : '>',
$prefix . $output_number . $suffix;
select $fh;
print $header unless $seen;
$current_output_number = $output_number;
}
}
}
while (<>) {
my ($key) = /^(\d+)\s/;
next unless defined $key;
set_output( key2output($key) );
print;
}
__END__

=head1 NAME

code.pl - splits file by first number by thousands

=head1 SYNOPSIS

code.pl [options] [file ...]

 Options:
   --help            brief help message
   --man             full documentation
   --prefix          output filename prefix
   --suffix          output filename suffix
   --header          number of header lines (default: 2)

=head1 OPTIONS

=over 8

=item B<--help>

Prints a brief help message and exits.

=item B<--man>

Prints the manual page and exits.

=back

=head1 DESCRIPTION

B<This program> will read the given input file(s) and do something
useful with the contents thereof.

=cut
Just finish the documentation and you can ship it to your colleagues.
The problem you've got is that the diamond operator <> is a piece of special Perl magic.
It takes all the filenames on the command line, opens them, and processes them in order.
To do what you're trying to do:
my ( $suffix, $filename ) = @ARGV;
open ( my $input, "<", $filename ) or die $!;
Then you can change your while loop to:
while ( <$input> ) {
And modify the output filename as you wish. The key difference is that it will only take one filename at that point: the first argument is the suffix, the second is the file name.
You could perhaps extend this with:
my ( $suffix, @names ) = @ARGV;
And then run a foreach loop:
foreach my $filename ( @names ) {
open .... #etc

Output .Resx From .CS using perl script

My .cs files contain strings within double quotes, and I am trying to extract these strings into a .resx file.
The existing code outputs the .resx file, but with only one string, whereas the .cs file contains more than one quoted string.
Can you please provide any reference to achieve this?
use strict;
use warnings;
use File::Find;
use XML::Writer;
use Cwd;
#user input: [Directory]
my $wrkdir = getcwd;
system "attrib -r /s";
print "Processing $wrkdir\n";
find( \&recurse_src_path, $wrkdir );
sub recurse_src_path
{
my $file = $File::Find::name;
my $fname = $_;
my #lines;
my $line;
if ( ( -f $file ) && ( $file =~ /.*\.cs$/i ) )
{
print "..";
open( FILE, $file ) || die "Cannot open $file:\n$!";
while ( $line = <FILE> )
{
if ( $line =~ s/\"(.*?)\"/$1/m )
{
chomp $line;
push( @lines, $line );
my $nl = '0';
my $dataIndent;
my $output = new IO::File(">Test.resx");
#binmode( $output, ":encoding(utf-8)" );
my $writer = XML::Writer->new(
OUTPUT => $output,
DATA_MODE => 1,
DATA_INDENT => 2
);
$writer->xmlDecl("utf-8");
$writer->startTag('root');
foreach my $r ($line)
{
print "$1\n";
$writer->startTag( 'data', name => $_ );
$writer->startTag('value');
$writer->characters($1);
$writer->endTag('value');
$writer->startTag('comment');
$writer->characters($1);
$writer->endTag('comment');
$writer->endTag('data');
}
$writer->endTag('root');
$writer->end;
$output->close();
}
}
close FILE;
}
}
Use the /g regex modifier. For example:
use strict;
use warnings;
my $cs_string = '
// Imagine this is .cs code here
system "attrib -r /s";
print "Processing $wrkdir\n";
find( \&recurse_src_path, $wrkdir );
';
while ($cs_string =~ /\"(.*?)\"/g) {
print "Found quoted string: '$1'\n";
}
See also: http://perldoc.perl.org/perlrequick.html#Matching-repetitions
You might also want to look at File::Slurp to read your .cs code into a single Perl scalar, trusting that your .cs file is not too large.
Finally combine this with your existing code to get the .resx output format.
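As a sketch of the collecting step, a /g match in list context returns every capture at once, which gives you a list to loop over when writing the data elements. The sample line is made up, and the pattern deliberately ignores escaped quotes inside strings.

```perl
use strict;
use warnings;

# Hypothetical line of C# source containing two string literals.
my $cs_line = 'Console.WriteLine("Hello" + name + "Goodbye");';

# In list context, /g returns all captures. The non-greedy .*? stops at
# the nearest closing quote instead of the last one on the line.
my @strings = $cs_line =~ /"(.*?)"/g;

print "Found quoted string: '$_'\n" for @strings;
```

For this sample line, @strings ends up holding Hello and Goodbye, in that order.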

Searching for two matches in perl

I'm looking for a way to match two terms in a single string. For instance if I need to match both "foo" and "bar" in order for the string to match and be printed, and the string is "foo 121242Z AUTO 123456KT 8SM M10/M09 SLP02369", it would not match. But if the string was "foo 121242Z AUTO 123456KT 8SM bar M10/M09 SLP02369", it would match and then go on to be printed. Here's the code that I have currently but I am a bit stuck. Thanks!
use strict;
use warnings;
use File::Find;
use Cwd;
my @folder = ("/d2/aschwa/archive_project/METAR_data/");
open(OUT , '>', 'TEKGEZ_METARS.txt') or die "Could not open $!";
print OUT "Date (YYYYMMDD), Station, Day/Time, Obs Type, Wind/Gust (Kt), Vis (SM),
Sky, T/Td (C), Alt, Rmk\n";
print STDOUT "Finding METAR files\n";
my $criteria = sub {if(-e && /^/) {
open(my $file,$_) or die "Could not open $_ $!\n";
my $dir = getcwd;
my @dirs = split ('/', $dir);
while(<$file>) {
$_ =~ tr/\015//d;
print OUT $dirs[-1], ' ', $_ if /foo?.*bar/;
}
}
};
find($criteria, @folder);
close OUT;
print STDOUT "Done Finding Station METARS\n";
Why not simply:
perl -ne'print if /foo.*bar/'
If you want to process several files from a directory, use find:
find /d2/aschwa/archive_project/METAR_data/ -type f -exec perl -MFile::Spec -ne'BEGIN{$dir = (File::Spec->splitdir($ARGV[0]))[-2]} print $dir, " ", $_ if /foo.*bar/' {} \; > TEKGEZ_METARS.txt
You can achieve it with positive look-ahead for both strings:
print OUT $dirs[-1], ' ', $_ if m/(?=.*foo)(?=.*bar)/;
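A short sketch of how the two look-aheads behave. The first two strings are the ones from the question; the third is a made-up line with the terms in reverse order, to show that the order does not matter.

```perl
use strict;
use warnings;

my @metars = (
    'foo 121242Z AUTO 123456KT 8SM M10/M09 SLP02369',
    'foo 121242Z AUTO 123456KT 8SM bar M10/M09 SLP02369',
    'bar 121242Z AUTO foo 8SM',    # hypothetical: bar before foo
);

# Each look-ahead scans the whole line without consuming it,
# so both terms must be present, in any order.
my @matched = grep { /^(?=.*foo)(?=.*bar)/ } @metars;

print "$_\n" for @matched;
```

Only the second and third lines are printed; the first has foo but no bar.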
#!/usr/bin/perl
use warnings;
use strict;
my $string1 = "foo 121242Z AUTO 123456KT 8SM M10/M09 SLP02369";
my $string2 = "foo 121242Z AUTO 123456KT 8SM bar M10/M09 SLP02369";
my @array = split(/\s+/, $string2);
my $count = 0;
foreach (@array){
$count++ if /foo/;
$count++ if /bar/;
}
print join(" ", @array), "\n" if $count == 2;
This will print for $string2, but not for $string1.