Saving Data that's Been Run Through ActivePerl - perl

This must be a basic question, but I can't find a satisfactory answer to it. I have a script here that is meant to convert CSV formatted data to TSV. I've never used Perl before now and I need to know how to save the data that is printed after the Perl script runs it though.
Script below:
#!/usr/bin/perl
use warnings;
use strict;
my $filename = data.csv;
open FILE, $filename or die "can't open $filename: $!";
while (<FILE>) {
s/"//g;
s/,/\t/g;
s/Begin\.Time\.\.s\./Begin Time (s)/;
s/End\.Time\.\.s\./End Time (s)/;
s/Low\.Freq\.\.Hz\./Low Freq (Hz)/;
s/High\.Freq\.\.Hz\./High Freq (Hz)/;
s/Begin\.File/Begin File/;
s/File\.Offset\.\.s\./File Offset (s)/;
s/Random.Number/Random Number/;
s/Random.Percent/Random Percent/;
print;
}
All the data that's been analyzed is in the cmd prompt. How do I save this data?
edit:
thank you everyone! It worked perfectly!

From your cmd prompt:
perl yourscript.pl > C:\result.txt
Here you run the perl script and redirect the output to a file called result.txt

It's always potentially dangerous to treat all commas in a CSV file as field separators. CSV files can also include commas embedded within the data. Here's an example.
1,"Some data","Some more data"
2,"Another record","A field with, an embedded comma"
In your code, the line s/,/\t/g treats all tabs the same and the embedded comma in the final field will also be expanded to a tab. That's probably not what you want.
Here's some code that uses Text::ParseWords to do this correctly.
#!/usr/bin/perl
use strict;
use warnings;
use Text::ParseWords;
while (<>) {
my #line = parse_line(',', 0, $_);
$_ = join "\t", #line;
# All your s/.../.../ lines here
print;
}
If you run this, you'll see that the comma in the final field doesn't get updated.

Related

Remove multiple duplicate lines from a file

I have a Perl script run in crontab that generates a file rich with duplicate entries, because on each run it rewrites information previously written.
I would use a sort -u of file, but, I would do it at the end of the Perl script file.
My list
10/10/2017 00:01:39:000;Sagitter
10/11/2017 00:00:01:002;Lupus
10/12/2017 00:03:14:109;Leon
10/12/2017 00:09:00:459;Sagitter
10/13/2017 01:11:03:009;Lupus
12/13/2017 04:29:00:609;Ariet
10/11/2017 00:00:01:002;Lupus
10/12/2017 00:03:14:109;Leon
...
My code
#!/usr/bin/perl
# Libraries
use strict;
use warnings 'all';
%lines = ();
# Remove duplicate
open( TMP_GL_OUTPUT, '>', $OUTPUT_FILE ) or die $!;
while ( <TMP_GL_OUTPUT> ) {
$lines{$_}++;
}
open( OUTFILE, '>', $TMPOUTPUT_FILE ) or die $!;
print OUTFILE keys %lines;
close( OUTFILE );
close( TMP_GL_OUTPUT );
Where am I going wrong? In shell it feels shorter than in Perl.
sort -u $TMPOUTPUT_FILE > $OUTPUT_FILE
As Suggested by ikegamy user, I've do as following:
move $OUTPUT_FILE, $TMPOUTPUT_FILE; # Copy file
run [ 'sort', '-u', '--', $TMPOUTPUT_FILE ], '>', $OUTPUT_FILE; # Remove duplicate
unlink $TMPOUTPUT_FILE;
I think you are asking why your Perl program is longer than your shell script.
First of all, your shell script does something completely different than your Perl program.
Your shell script executes a program, and stores its out in a file.
Your Perl program reads a file, manipulates the data it read, and stores the output in a file.
The Perl equivalent to
sort -u -- "$TMPOUTPUT_FILE" > "$OUTPUT_FILE"
is
use IPC::Run qw( run );
run [ 'sort', '-u', '--', $TMPOUTPUT_FILE ], '>', $OUTPUT_FILE;
(There are differences in error handling between these two.)
They're not that different in length.
This brings up the second difference. The shell specializes in executing programs, but Perl is a general purpose language. It would be surprising if it wasn't longer in Perl!
(Now try comparing the size of your Perl program to the source of sort...)
List::Util is a core module.
use List::Util 'uniq';
print for uniq <>
Your code looks almost OK.
My proposition is only to chomp each line, before you
save an element in the hash.
The reason is that e.g. the last line, not terminated
with a \n may look just the same as one of previous lines,
but without chomp the previous line would have contained
the terminating \n, whereas the last - not.
The resut is that both these lines will be different keys in the hash.
Compare my example program (working, presented below) with yours, there are
no other significant differences, apart from reading from __DATA__ and
writing to the console.
In my program, for demonstration purposes, I put 2 variants of printout,
one with key values (repetition counts) and another, printing just keys.
In your program leave only the second printout.
use strict; use warnings; use feature qw(say);
my %lines;
while(<DATA>) {
chomp;
$lines{$_}++;
}
while(my($key, $val) = each %lines) {
printf "%-32s / %d\n", $key, $val;
}
say '========';
foreach my $key (keys %lines) {
say $key;
}
__DATA__
10/10/2017 00:01:39:000;Sagitter
10/11/2017 00:00:01:002;Lupus
10/12/2017 00:03:14:109;Leon
10/12/2017 00:09:00:459;Sagitter
10/13/2017 01:11:03:009;Lupus
12/13/2017 04:29:00:609;Ariet
10/11/2017 00:00:01:002;Lupus
10/12/2017 00:03:14:109;Leon
Edit
Your code assigns no names to $OUTPUT_FILE and $TMPOUTPUT_FILE,
you even didn't declare these variables, but I assume, that in your actual
code you did it.
Another detail is that %lines should be preceded with my,
otherwise, as you put use strict; the compiler prints an error.
Edit 2
There is a quicker and shorter solution than yours.
Instead of writing lines to a hash and printing them as late as in
the second step, you can do it in a single loop:
Read the line.
Check whether the hash already contains a key equal to the line just read.
If not, then:
write the line to the hash, to block the printout, if just the
same line occured again,
print the line.
You can even write this program as a Perl one-liner:
perl -lne"print if !$lines{$_}++" input.txt
If you run the above command from the Windows cmd, it will print the output
to the console. If you use Linux, instead of double quotes, you can use apostrophes.
You may of course redirect the output to any file, adding > output.txt to
the above command.
The code is executed for each input line, chomped due to -l option.
If any other details concerning Perl one-liners are not known to you, search the web.

Remove mysterious line breaks in CSV file using Perl

I have a CSV file that I'm parsing using Perl. The file is a BOM produced by Solidworks 2015 that was saved as an XLS file, then opened in Excel and saved as a CSV file.
There are cells that have line breaks. When I read a line with such a cell from the file, the line comes in with the line breaks. For example, here is one of the lines read looks like this:
74,,74,1,1,"SJ-TL303202-DET-074-
001",PDSI,"2.25"" DIA. X 8.00""",A2,513,1,
It reads in as a single line in Perl.
When I turn the Show All Characters in Notepad++, I can see the line breaks are cause by [CR][LF].
So I thought this would work to remove the line feeds:
$line =~ s/[\r\n]+//g;
but it does not.
You don't give much of a sample of your CSV data, but what you show is perfectly valid. A text field may contain newlines if you wish, as long as it is enclosed in double-quotes
The Text::CSV module will process it quite happily as long as you enable the binary option in the constructor call, and you may reformat the data as you wish before you write it back out again
This program expects the path to the input file as a parameter on the command line, and it will write the modified data to STDOUT, which you can redirect on the command line, like this
$ perl fix_csv.pl input.csv > output.csv
I've assumed that your data contains only 7-bit ASCII data, and it should work whether you're running it on a Windows system or on Linux
use strict;
use warnings 'all';
my ($csv_file) = #ARGV;
use Text::CSV;
open my $fh, '<', $csv_file or die qq{Unable to open "$csv_file" for input: $!};
my $csv = Text::CSV->new( { binary => 1 } );
while ( my $row = $csv->getline( $fh ) ) {
tr/\r\n//d for #$row;
$csv->combine(#$row);
print $csv->string, "\n";
}
output
74,,74,1,1,SJ-TL303202-DET-074-001,PDSI,"2.25"" DIA. X 8.00""",A2,513,1,

perl - string compare failing while fetching a line from a file.

my code,
#!/usr/bin/perl -w
use strict;
use warnings;
my $codes=" ";
my $count=0;
my $str1="code1";
open (FILE, '/home/vpnuser/testFile.txt') or die("Could not open the file.");
while($codes=<FILE>)
{
print($codes);
if($codes eq $str1)
{
$count++;
}
}
print "$count";
the comparison always fails. my testFile.txt contains one simple line - code1
when i have written a separate perl script where i have two strings declared in the script it self rather than getting it from a file, the eq operator works fine. but when i am getting it from a file, there is a problem. Pease help,
Thanks in advance!
Don't forget to chomp your file input if you don't want it to end in a return character.
while(my $codes = <FILE>)
{
chomp $codes;
That is likely the reason why your string comparison is failing.
As on additional aside, kudus for including use strict; and use warnings; at the the top of your script, like one should always do.
I'd like to recommend that you also include use autodie; at the top as well when doing file processing. It will automatically give you a detailed error message for doing many kinds of operations, such as opening a file, so you won't have to remember to include the error code $! or the filename in your die statement.

Perl script -- Multiple text file parsing and writing

Suppose i have this directory full of text files (raw text). What i need is a Perl script that will parse the directory (up2bottom) text files one by one and save their contents in a new single file, appointed by me. In other words i simply want to create a corpus of many documents. Note: these documents have to be separated by some tag e.g. indicating the sequence in which they were parsed.
So far i have managed to follow some examples and i know how to read, write and parse text files. But i am not yet in position to merge them into one script and handle many text files. Can you please provide some assistance. thanks
edit:
example code for writing to a file.
#!/usr/local/bin/perl
open (MYFILE, '>>data.txt');
print MYFILE "text\n";
close (MYFILE);
example code for reading a file.
#!/usr/local/bin/perl
open (MYFILE, 'data.txt');
while (<MYFILE>) {
chomp;
print "$_\n";
}
close (MYFILE);
I've also find out about the foreach function which can be used for tasks as such, but still don't know how to combine them and achieve the result explained in the description.
The important points in this suggestion are:
the "magic" diamond operator (a.k.a. readline), which reads from each file in *ARGV,
the eof function, which tells if the next readline on the current filehandle will return any data
the $ARGV variable, that contains the name of the currently opened file.
With that intro, here we go!
#!/usr/bin/perl
use strict; # Always!
use warnings; # Always!
my $header = 1; # Flag to tell us to print the header
while (<>) { # read a line from a file
if ($header) {
# This is the first line, print the name of the file
print "========= $ARGV ========\n";
# reset the flag to a false value
$header = undef;
}
# Print out what we just read in
print;
}
continue { # This happens before the next iteration of the loop
# Check if we finished the previous file
$header = 1 if eof;
}
To use it, just do: perl concat.pl *.txt > compiled.TXT

Is there an issue with opening filenames provided on the command line through $_?

I'm having trouble modifying a script that processes files passed as command line arguments, merely for copying those files, to additionally modifying those files. The following perl script worked just fine for copying files:
use strict;
use warnings;
use File::Copy;
foreach $_ (#ARGV) {
my $orig = $_;
(my $copy = $orig) =~ s/\.js$/_extjs4\.js/;
copy($orig, $copy) or die(qq{failed to copy $orig -> $copy});
}
Now that I have files named "*_extjs4.js", I would like to pass those into a script that similarly takes file names from the command line, and further processes the lines within those files. So far I am able get a file handle successfully as the following script and it's output shows:
use strict;
use warnings;
foreach $_ (#ARGV) {
print "$_\n";
open(my $fh, "+>", $_) or die $!;
print $fh;
#while (my $line = <$fh>) {
# print $line;
#}
close $fh;
}
Which outputs (in part):
./filetree_extjs4.js
GLOB(0x1a457de8)
./async_submit_extjs4.js
GLOB(0x1a457de8)
What I really want to do though rather than printing a representation of the file handle, is to work with the contents of the files themselves. A start would be to print the files lines, which I've tried to do with the commented out code above.
But that code has no effect, the files' lines do not get printed. What am I doing wrong? Is there a conflict between the $_ used to process command line arguments, and the one used to process file contents?
It looks like there are a couple of questions here.
What I really want to do though rather than printing a representation of the file handle, is to work with the contents of the files themselves.
The reason why print $fh is returning GLOB(0x1a457de8) is because the scalar $fh is a filehandle and not the contents of the file itself. To access the contents of the file itself, use <$fh>. For example:
while (my $line = <$fh>) {
print $line;
}
# or simply print while <$fh>;
will print the contents of the entire file.
This is documented in pelrdoc perlop:
If what the angle brackets contain is a simple scalar variable (e.g.,
<$foo>), then that variable contains the name of the filehandle to
input from, or its typeglob, or a reference to the same.
But it has already been tried!
I can see that. Try it after changing the open mode to +<.
According to perldoc perlfaq5:
How come when I open a file read-write it wipes it out?
Because you're using something like this, which truncates the file
then gives you read-write access:
open my $fh, '+>', '/path/name'; # WRONG (almost always)
Whoops. You should instead use this, which will fail if the file
doesn't exist:
open my $fh, '+<', '/path/name'; # open for update
Using ">" always clobbers or creates. Using "<" never does either. The
"+" doesn't change this.
It goes without saying that the or die $! after the open is highly recommended.
But take a step back.
There is a more Perlish way to back up the original file and subsequently manipulate it. In fact, it is doable via the command line itself (!) using the -i flag:
$ perl -p -i._extjs4 -e 's/foo/bar/g' *.js
See perldoc perlrun for more details.
I can't fit my needs into the command-line.
If the manipulation is too much for the command-line to handle, the Tie::File module is worth a try.
To read the contents of a filehandle you have to call readline read or place the filehandle in angle brackets <>.
my $line = readline $fh;
my $actually_read = read $fh, $text, $bytes;
my $line = <$fh>; # similar to readline
To print to a filehandle other than STDIN you have to have it as the first argument to print, followed by what you want to print, without a comma between them.
print $fh 'something';
To prevent someone from accidentally adding a comma, I prefer to put the filehandle in a block.
print {$fh} 'something';
You could also select your new handle.
{
my $oldfh = select $fh;
print 'something';
select $oldfh; # reset it back to the previous handle
}
Also your mode argument to open, causes it to clobber the contents of the file. At which point there is nothing left to read.
Try this instead:
open my $fh, '+<', $_ or die;
I'd like to add something to Zaid's excellent suggestion of using a one-liner.
When you are new to perl, and trying some tricky regexes, it can be nice to use a source file for them, as the command line may get rather crowded. I.e.:
The file:
#!/usr/bin/perl
use warnings;
use strict;
s/complicated/regex/g;
While tweaking the regex, use the source file like so:
perl -p script.pl input.js
perl -p script.pl input.js > testfile
perl -p script.pl input.js | less
Note that you don't use the -i flag here while testing. These commands will not change the input files, only print the changes to stdout.
When you're ready to execute the (permanent!) changes, just add the in-place edit -i flag, and if you wish (recommended), supply an extension for backups, e.g. ".bak".
perl -pi.bak script.pl *.js