Perl - prepend/remove line numbers to a file

I'm currently writing a Perl script that needs a menu option to prepend line numbers to a file and another option to remove those line numbers.
Example outputs: "This is the first line of my file" should be changed to "01 - This is the first line of my file"
and
"01 - This is the first line of my file" should be changed to "This is the first line of my file"
I've thought of using loops to accomplish this, but I have a feeling that there might be a simpler solution. How should I go about solving this?

From the command line:
To add the line numbers as requested:
perl -ne 'printf("%02d - ",$.);print' < file > newfile
To remove the line numbers:
perl -ne 's/^\d* - //;print' < newfile
Update
To edit the file "in place", use the -i flag. You can also use -p instead of -n plus an explicit print. From the command line:
To add the line numbers as requested:
perl -pe 'printf("%02d - ",$.)' -i file
To remove the line numbers:
perl -pe 's/^\d* - //' -i file

Since you want to use it in a script, I am going to expand on nirys' answer:
I don't think there is going to be a simpler and more maintainable solution than using a loop. One example with foreach:
sub addLinenumbersToFile {
my($file) = @_;
open my $in_handle, "<", $file or die "Cannot read $file: $!";
my @lines = <$in_handle>;
close $in_handle;
my $line_count = 1;
foreach my $line (@lines) {
# modify element directly:
$line = sprintf("%02d - ", $line_count) . $line;
$line_count++;
}
open my $out_handle, ">", $file or die "Cannot write $file: $!";
print $out_handle join('', @lines);
close $out_handle;
}
You can then call this function with the filename as parameter: addLinenumbersToFile("inputfile.txt");
And to return the file to its original state:
sub removeLinenumbersFromFile {
my($file) = @_;
open my $in_handle, "<", $file or die "Cannot read $file: $!";
my @lines = <$in_handle>;
close $in_handle;
my $line_count = 1;
foreach my $line (@lines) {
my $formatted_line_count = sprintf("%02d - ", $line_count);
# modify element directly:
$line =~ s/^$formatted_line_count//;
$line_count++;
}
open my $out_handle, ">", $file or die "Cannot write $file: $!";
print $out_handle join('', @lines);
close $out_handle;
}
Since the line number remover uses the exact same format, the chance of modifying the wrong file by accident is small. It could be reduced further by returning from the function as soon as a line doesn't match.
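A minimal sketch of that stricter variant, working on a list of lines rather than a file so it can be tried in isolation. The name strip_line_numbers is made up for this example; it returns nothing as soon as one line lacks the expected prefix, so the caller can leave the file alone.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the answer above): returns the lines with
# their "NN - " prefixes removed, or an empty list if any line lacks its prefix.
sub strip_line_numbers {
    my @lines = @_;
    my $line_count = 1;
    for my $line (@lines) {
        my $prefix = sprintf("%02d - ", $line_count++);
        # give up on input that was not numbered in the expected format
        return unless $line =~ s/^\Q$prefix\E//;
    }
    return @lines;
}

my @stripped = strip_line_numbers("01 - first\n", "02 - second\n");
print @stripped;

my @rejected = strip_line_numbers("01 - first\n", "no number here\n");
print "input left untouched\n" unless @rejected;
```

A file-based caller would only reopen the file for writing when the returned list is non-empty.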

Related

Split file Perl

I want to split parts of a file. Here is what the start of the file looks like (it continues in same way):
Location Strand Length PID Gene
1..822 + 273 292571599 CDS001
906..1298 + 130 292571600 trxA
I want to split the Location column, compute 822-1, do the same for every row, and add the results together. For these two rows the value would be: (822-1)+(1298-906) = 1213
How?
My code right now (I don't get any output at all in the terminal, it just continues to process forever):
use warnings;
use strict;
my $infile = $ARGV[0]; # Reading infile argument
open my $IN, '<', $infile or die "Could not open $infile: $!, $?";
my $line2 = <$IN>;
my $coding = 0; # Initialize coding variable
while(my $line = $line2){ # reading the file line by line
# TODO Use split and do the calculations
my @row = split(/\.\./, $line);
my @row2 = split(/\D/, $row[1]);
$coding += $row2[0]- $row[0];
}
print "total amount of protein coding DNA: $coding\n";
So what I get from my code if I put:
print "$coding \n";
at the end of the while loop just to test is:
821
1642
And so the first number is correct (822-1) but the next number doesn't make any sense to me, it should be (1298-906). What I want in the end outside the loop:
print "total amount of protein coding DNA: $coding\n";
is the sum of all the subtractions of every line i.e. 1213. But I don't get anything, just a terminal that works on forever.

As a one-liner:
perl -nE '$c += $2 - $1 if /^(\d+)\.\.(\d+)/; END { say $c }' input.txt
(Extracting the important part of that and putting it into your actual script should be easy to figure out).

Explicitly opening the file makes your code more complicated than it needs to be. Perl will automatically open any files passed on the command line and allow you to read from them using the empty file input operator, <>. So your code becomes as simple as this:
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
my $total;
while (<>) {
my ($min, $max) = /(\d+)\.\.(\d+)/;
next unless $min and $max;
$total += $max - $min;
}
say $total;
If this code is in a file called adder and your input data is in add.dat, then you run it like this:
$ adder add.dat
1213
Update: And, to explain where you were going wrong...
You only ever read a single line from your file:
my $line2 = <$IN>;
And then you continually assign that same value to another variable:
while(my $line = $line2){ # reading the file line by line
The comment in this line is wrong. I'm not sure where you got that line from.
To fix your code, just remove the my $line2 = <$IN> line and replace your loop with:
while (my $line = <$IN>) {
# your code here
}
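Putting the corrected loop together with the split-based calculation from the question might look like the following. It is only a sketch: the three sample rows are inlined as an array instead of being read from $ARGV[0], and the header check is an addition that the question's code did not have.

```perl
use strict;
use warnings;

# Sample rows from the question, inlined so the sketch runs on its own.
my @rows = (
    "Location Strand Length PID Gene\n",
    "1..822 + 273 292571599 CDS001\n",
    "906..1298 + 130 292571600 trxA\n",
);

my $coding = 0;
for my $line (@rows) {
    # skip the header and any row that does not start with start..end
    next unless $line =~ /^\d+\.\.\d+/;
    my ($start, $rest) = split /\.\./, $line, 2;
    my ($end) = split /\D/, $rest;   # digits up to the first non-digit
    $coding += $end - $start;
}
print "total amount of protein coding DNA: $coding\n";
```

With the two data rows shown, the sum is (822-1)+(1298-906) = 1213.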

How to print lines from log file which occurs after some particular time

I want to print all the lines that occur after a given time, say the value returned by Perl's localtime function, from inside a Perl script. I tried something like this:
my $timestamp = localtime();
open(CMD,'-|','cat xyz.log | grep -A1000 \$timestamp' || die ('Could not open');
while (defined(my $line=<CMD>)){
print $line;
}
If I replace $timestamp in the cat command with an actual time component from xyz.log, it prints lines, but it prints nothing with the $timestamp variable.
Is there an alternative way to print the lines that occur after the current time in a log file, or how can I improve the above command?

Your $timestamp is never evaluated in Perl as it appears only in single quotes. But why go out to shell in order to match a string and process a file? Perl is far better for that.
Here is a direct way first, then a basic approach. A full script is shown in the second example.
Read the file until you get to the line with the pattern, and exit the loop at that point. The next time you access that filehandle you'll be on the next line and can start printing, in another loop.
while (<$fh>) { last if /$timestamp/ }
print while <$fh>;
This prints out the part of the file starting with the line following the one which has the $timestamp anywhere in it. Adjust how exactly to match the timestamp if it is more specific.
Or -- set a flag when a line matches the timestamp, print if flag is set.
use warnings 'all';
use strict;
my $timestamp = localtime();
my $logfile = 'xyz.log';
open my $fh, '<', $logfile or die "Can't open $logfile: $!";
my $mark = 0;
while (<$fh>)
{
if (not $mark) {
$mark = 1 if /$timestamp/;
}
else { print }
}
close $fh;
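The first approach (read up to the matching line, then print the rest) can be tried on an in-memory log. The log contents and the timestamp value below are assumptions made up for this sketch; a real script would open xyz.log and build the marker in whatever format that log uses. \Q...\E is added so that any regex metacharacters in the marker are matched literally.

```perl
use strict;
use warnings;

# Assumed marker and log contents, inlined so the sketch is self-contained.
my $timestamp = 'Sep  9 08:17:01';
my $log = <<'END';
Sep  9 08:16:59 localhost before the marker
Sep  9 08:17:01 localhost marker line
Sep  9 08:17:05 localhost after one
Sep  9 08:17:09 localhost after two
END

open my $fh, '<', \$log or die "Cannot open in-memory log: $!";
# consume lines up to and including the one that carries the timestamp
while (<$fh>) { last if /\Q$timestamp\E/ }
my @after = <$fh>;   # everything that follows the matching line
close $fh;
print @after;
```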

If you're doing the grepping in Shell anyway you might as well do it the other way round and call perl only to give you the result of localtime:
sed <xyz.log -ne"/^$(perl -E'say scalar localtime')/,\$p"
This uses sed's range addressing: first, -n keeps sed from printing lines unless explicitly told to; then the address selects everything between the first occurrence of the timestamp (I added a ^ for good measure, in case log lines could contain time stamps in plain text) and the end of the file ($), and p prints it.
A pure Perl solution could look like this:
my $timestamp = localtime();
my $found;
open(my $fh, '<', 'xyz.log') or die ("Could not open xyz.log: $!");
while (<$fh>) {
if($found) {
print;
} else {
$found = 1 if /^$timestamp/;
}
}

I would suggest a Perlish approach like below:
open (my $cmd, "<", "xyz.log") or die $!;
#get all lines in an array with each index containing each line
my @log_lines = <$cmd>;
my $index = 0;
foreach my $line (@log_lines){
#write regex to capture time from line
my $rex = qr/regex_for_time_as_per_logline/is;
if ($line =~ /$rex/){
#found the line with expected time
last;
}
$index++;
}
#At this point we have got the index of array from where our expected time starts.
#So all indexes after that have desired lines, which you can write as below
foreach ($index..$#log_lines){
print $log_lines[$_];
}
If you share one of your log lines, I could help with the regex.

You may also try this approach:
In this case, I open /var/log/messages, convert each line's timestamp to epoch seconds, and find all the lines that occurred after time().
use Date::Parse;
my $epoch_now = time(); # print epoch current time.
open (my $fh, "</var/log/messages") || die "error: $!\n";
while (<$fh>) {
chomp;
# one log line - looks like this
# Sep 9 08:17:01 localhost rsyslogd: rsyslogd was HUPed
my ($mon, $day, $hour, $min, $sec) = ($_ =~ /(\S+)\s*(\d+)\s*(\d+):(\d+):(\d+)/);
# date string part shouldn't be empty
if (defined($mon) && defined($day)
&& defined($hour) && defined($min)
&& defined($sec)) {
my $epoch_log = str2time("$mon $day $hour:$min:$sec");
if ($epoch_log > $epoch_now) {
print "$_\n";
}
}
}

Perl - reading through command output file line by line

I have a log file which contains an output of several commands run on each server. The format is like
APRHY01> lt all
131119-15:41:39 10.105.219.68 10.0b stopfile=/tmp/27599
Checking MOM version...RNC_NODE_MODEL_M_1_200
Parsing MOM (cached): /home/ekisjay/moshell//jarxml/RNC_NODE_MODEL_M_1_200.xml.cache.gz Done.
.............
.
.
.
APRHY01> alt
131119-15:41:55 10.105.219.68 10.0b RNC_NODE_MODEL_M_1_200 stopfile=/tmp/27599
Connecting to 10.105.219.68:56834 (CorbaSecurity=OFF, corba_class=2, java=1.6.0_26, jacoms=R73D19, jacorb=R73D01)
Starting to retrieve active alarms
Nr of active alarms are: 3
APRHY01> strt
131119-15:41:58 10.105.219.68 10.0b RNC_NODE_MODEL_M_1_200 stopfile=/tmp/27599
Following 326 sites are up:
---------------------------------------------------------------------------------------------------------------------
MOD IUBLINK CELLNAMES CFRPHEM1 CFRPHEM2 CFRPHEM3 CFRPHEM4 CFRPHEM5 CFRPHEM6 ICDS TN ATMPORTS
---------------------------------------------------------------------------------------------------------------------
21 Iub_00023 UHYD494-X 111111 1 1 I
21 Iub_00032 UHY4100-X 111111 1 1 I
then for next server or node this repeats...
APRHY02> lt all
131119-15:44:51 10.105.219.4 10.0b stopfile=/tmp/2874
Checking MOM version...RNC_NODE_MODEL_M_1_200
Parsing MOM (cached): /home/ekisjay/moshell//jarxml/RNC_NODE_MODEL_M_1_200.xml.cache.gz Done.
Using paramfile /home/ekisjay/moshell//commonjars/pm/PARAM_RNC_M_1_50.txt
Parsing
file /home/ekisjay/moshell//commonjars/pm/PARAM_RNC_M_1_50.txt ...
I have to take a few lines (according to the conditions stated in the requirement) between every command for each node. I wrote a Perl program that reads through the file line by line, stops at every line that matches a command, like /[A-Z][A-Z][A-Z][A-Z][A-Z][0-9][0-9]\>/, then retrieves the required lines up to the next command line and writes them to another file. In the loop, my program actually skips one command in between and goes on to the next one (it handles the 1st, 3rd, 5th, and so on). Can anyone help me?

Perhaps the following will be helpful:
use strict;
use warnings;
my ( $fileName, $fh, $i );
while (<>) {
if ( !$fileName or $fileName ne $ARGV ) {
$fileName = $ARGV;
$i = 0;
}
if ( my ($cmd) = /^([A-Z]{5}\d{2}>.+)/ ) {
$cmd =~ s/\W+/_/g;
open $fh, '>', $cmd . '_' . ( sprintf '%05d', ++$i ) . '.txt' or die $!;
}
print $fh $_;
}
Command-line usage: >perl script.pl logFile1 [logFile2 .. logFileN]
The [ ] notation indicates optional, multiple files.
The script uses a regex to capture the command/server line, then substitutes 'non-word' characters with an underscore, and this plus a count plus .txt becomes the file name to which that block of command text is written. Thus, using your dataset, the following text files were created containing command content:
APRHY01_lt_all_00001.txt
APRHY01_alt_00002.txt
APRHY01_strt_00003.txt
APRHY02_lt_all_00001.txt
APRHY02_alt_00002.txt
APRHY02_strt_00003.txt
The count was inserted just in case the same command was issued more than once to the same server, as this number ensures separate files for each.

You haven't shown us the code you're using to parse the file, so it's hard to say what might be wrong with it :-)
For breaking down multi-line log output like this, a good method is to loop through the file, appending lines to a block of text, until you find the first line of the next block -- then flush the block you've been appending and create a new one, starting with the current line.
my $block = "";
while (<>) {
if (/[A-Z][A-Z][A-Z][A-Z][A-Z][0-9][0-9]\>/) {
write_block($block) if $block;
$block = "";
}
$block .= $_;
}
write_block($block);
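The loop above leaves write_block undefined. A runnable sketch, under the assumption that collecting the blocks in an array is enough for illustration (a real write_block would apply the selection rules and print to the output file). The sample log is a shortened, made-up excerpt of the question's data, and the pattern is written with quantifiers and anchored at the start of the line.

```perl
use strict;
use warnings;

# Shortened sample of the log from the question, inlined for the sketch.
my $log = <<'END';
APRHY01> lt all
Checking MOM version...RNC_NODE_MODEL_M_1_200
APRHY01> alt
Nr of active alarms are: 3
APRHY02> lt all
Checking MOM version...RNC_NODE_MODEL_M_1_200
END

my @blocks;
# Stand-in for write_block: remember each finished block.
sub write_block { push @blocks, $_[0] }

open my $in, '<', \$log or die "Cannot read in-memory log: $!";
my $block = "";
while (<$in>) {
    if (/^[A-Z]{5}[0-9]{2}>/) {
        # a new command starts here, so flush the previous block first
        write_block($block) if $block;
        $block = "";
    }
    $block .= $_;
}
write_block($block) if $block;   # the last block has no following command line
close $in;

printf "%d blocks collected\n", scalar @blocks;
print "first block:\n$blocks[0]";
```

Because every line is read by the single while loop, no command line can be consumed without being tested.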

Your code:
my $srcFile = "new.log";
my $destFile = "deviations.log";
my @grabbed = {};
my $line = "";
open (my $src, "$srcFile") or die "Could not open the log file $srcFile: $!";
open (my $dest, ">>$destFile") or die "Could not open the destination file $destFile: $!";
while ($line = <$src>)
{ if ($line =~ /[A-Z][A-Z][A-Z][A-Z][A-Z][0-9][0-9]\>/)
{ push @grabbed, "Deviations of the output of command: $line\n";
while ($line = <$src>)
{if ($line !~ /[A-Z][A-Z][A-Z][A-Z][A-Z][0-9][0-9]\>/)
{push @grabbed, $line;
}
else
{last;
} } }}
print $dest "\n@grabbed";
close $dest;
close $src;
When this code executes last on finding a new command line, control goes back to the outer while ($line = <$src>), which reads the next line (the first output line of the command) and so fails to recognize the start of that command. A simple fix is to avoid reading a new line by labeling the outer loop and using redo instead of last:
LINE:
while ($line = <$src>)
{ if ($line =~ /[A-Z][A-Z][A-Z][A-Z][A-Z][0-9][0-9]\>/)
{ push @grabbed, "Deviations of the output of command: $line\n";
while ($line = <$src>)
{ if ($line !~ /[A-Z][A-Z][A-Z][A-Z][A-Z][0-9][0-9]\>/)
{ push @grabbed, $line }
else
{ redo LINE }
} } }
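To see that redo really keeps every command, the loop can be run against a small in-memory log. The three-command sample and the @commands array are assumptions made for this sketch, and the pattern is shortened with quantifiers; only the command lines are collected, to make the result easy to check.

```perl
use strict;
use warnings;

# Made-up sample: three commands, each followed by one line of output.
my $log = "APRHY01> lt all\noutput 1\nAPRHY01> alt\noutput 2\nAPRHY01> strt\noutput 3\n";
open my $src, '<', \$log or die "Cannot read in-memory log: $!";

my @commands;
my $line;
LINE:
while ($line = <$src>) {
    if ($line =~ /^[A-Z]{5}[0-9]{2}>/) {
        push @commands, $line;
        while ($line = <$src>) {
            # redo restarts the outer block without reading a new line,
            # so the command line just read is tested again by the outer if
            redo LINE if $line =~ /^[A-Z]{5}[0-9]{2}>/;
        }
    }
}
close $src;
print @commands;
```

With last in place of redo LINE, the outer loop would read "output 2" next and the second command would be lost, which matches the 1st, 3rd, 5th pattern described in the question.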

Read the last line of file with data in Perl

I have a text file to parse in Perl. I parse it from the start of file and get the data that is needed.
After all that is done I want to read the last line in the file with data. The problem is that the last two lines are blank. So how do I get the last line that holds any data?

If the file is relatively short, just read on from where you finished getting the data, keeping the last non-blank line:
use autodie ':io';
open(my $fh, '<', 'file_to_read.txt');
# get the data that is needed, then:
my $last_non_blank_line;
while (my $line = readline $fh) {
# choose one of the following two lines, depending what you meant
if ( $line =~ /\S/ ) { $last_non_blank_line = $line } # line isn't all whitespace
# if ( $line !~ /^$/ ) { $last_non_blank_line = $line } # line has no characters before the newline
}
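The two tests differ only for lines that contain whitespace but nothing else. A small sketch on made-up in-memory data (one data line, one whitespace-only line, one empty line) shows which line each test keeps:

```perl
use strict;
use warnings;

# Made-up input: a data line, a whitespace-only line, then an empty line.
my $text = "data line\n \t\n\n";
open my $fh, '<', \$text or die "Cannot read in-memory text: $!";

my ($last_nonspace, $last_nonempty);
while (my $line = <$fh>) {
    $last_nonspace = $line if $line =~ /\S/;    # has a non-whitespace character
    $last_nonempty = $line if $line !~ /^$/;    # has any character before the newline
}
close $fh;

print "with /\\S/:   $last_nonspace";
print "with !~ /^\$/: ", ($last_nonempty eq " \t\n" ? "the whitespace-only line" : $last_nonempty), "\n";
```

The /\S/ test keeps "data line", while the /^$/ test keeps the whitespace-only line, so the first form is usually what "last line with data" means.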
If the file is longer, or you may have passed the last non-blank line in your initial data gathering step, reopen it and read from the end:
use File::ReadBackwards;
my $backwards = File::ReadBackwards->new( 'file_to_read.txt' ) or die "Can't read file_to_read.txt: $!";
my $last_non_blank_line;
do {
$last_non_blank_line = $backwards->readline;
} until ! defined $last_non_blank_line || $last_non_blank_line =~ /\S/;

perl -e 'while (<>) { if (/\S/) {$last = $_;} } print $last;' < my_file.txt

You can use the module File::ReadBackwards in the following way:
use File::ReadBackwards ;
$bw = File::ReadBackwards->new('filepath') or
die "can't read file";
while( defined( $log_line = $bw->readline ) ) {
print $log_line ;
exit 0;
}
To skip the blank lines at the end, test each $log_line and ignore those that contain only \n.

If the file is small, I would store it in an array and read from the end. If it's large, use the File::ReadBackwards module.

Here's my variant of command line perl solution:
perl -ne 'END {print $last} $last= $_ if /\S/' file.txt

No one mentioned Path::Tiny. If the file size is relatively small you can do this:
use Path::Tiny;
my $file = path($file_name);
my ($last_line) = $file->lines({count => -1});
See the Path::Tiny page on CPAN.
Just remember that for a large file, as @ysth said, it's better to use File::ReadBackwards. The difference can be substantial.

Sometimes it is more convenient for me to run shell commands from Perl code, so I'd prefer the following to solve this:
$result=`tail -n 1 /path/file`;

How do I use variables to do substitution in Perl?

I have several text files, that were once tables in a database, which is now disassembled. I'm trying to reassemble them, which will be easy, once I get them into a usable form. The first file, "keys.text" is just a list of labels, inconsistently formatted. Like:
Sa 1 #
Sa 2
U 328 #*
It's always letter(s), [space], number(s), [space], and sometimes symbol(s). The lines in the text files start with the same keys, followed by a line of text, also separated by a space.
Sa 1 # Random line of text follows.
Sa 2 This text is just as random.
U 328 #* Continuing text...
What I'm trying to do in the code below, is match the key from "keys.text", with the same key in the .txt files, and put a tab between the key, and the text. I'm sure I'm overlooking something very basic, but the result I'm getting, looks identical to the source .txt file.
Thanks in advance for any leads or assistance!
#!/usr/bin/perl
use strict;
use warnings;
use diagnostics;
open(IN1, "keys.text");
my $key;
# Read each line one at a time
while ($key = <IN1>) {
# For each txt file in the current directory
foreach my $file (<*.txt>) {
open(IN, $file) or die("Cannot open TXT file for reading: $!");
open(OUT, ">temp.txt") or die("Cannot open output file: $!");
# Add temp modified file into directory
my $newFilename = "modified\/keyed_" . $file;
my $line;
# Read each line one at a time
while ($line = <IN>) {
$line =~ s/"\$key"/"\$key" . "\/t"/;
print(OUT "$line");
}
rename("temp.txt", "$newFilename");
}
}
EDIT: Just to clarify, the results should retain the symbols from the keys as well, if there are any. So they'd look like:
Sa 1 # Random line of text follows.
Sa 2 This text is just as random.
U 328 #* Continuing text...

The regex seems quoted rather oddly to me. Wouldn't
$line =~ s/$key/$key\t/;
work better?
Also, IIRC, <IN1> will leave the newline on the end of your $key. chomp $key to get rid of that.
And don't put parentheses around your print args, esp when you're writing to a file handle. It looks wrong, whether it is or not, and distracts people from the real problems.
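One more detail matters for these particular keys: symbols such as * are regex metacharacters, so the key needs quoting with \Q...\E (quotemeta) before it is used as a pattern. A minimal sketch with the sample keys and lines inlined as arrays; replacing the space after the key with the tab is an assumption about the desired output.

```perl
use strict;
use warnings;

# Sample data from the question, inlined for the sketch.
my @keys  = ("Sa 1 #\n", "U 328 #*\n");
my @lines = (
    "Sa 1 # Random line of text follows.\n",
    "U 328 #* Continuing text...\n",
);

for my $raw_key (@keys) {
    chomp(my $key = $raw_key);          # drop the newline left by <IN1>
    for my $line (@lines) {
        # \Q...\E makes #* match literally instead of meaning "zero or more #"
        $line =~ s/^\Q$key\E\s+/$key\t/;
    }
}
print @lines;
```

Without \Q, the pattern "U 328 #*" would match only "U 328 #" and the replacement would leave a stray * in the text.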

If Perl is not a must, you can use this awk one-liner:
$ cat keys.txt
Sa 1 #
Sa 2
U 328 #*
$ cat mytext.txt
Sa 1 # Random line of text follows.
Sa 2 This text is just as random.
U 328 #* Continuing text...
$ awk 'FNR==NR{ k[$1 SEP $2];next }($1 SEP $2 in k) {$2=$2"\t"}1 ' keys.txt mytext.txt
Sa 1 # Random line of text follows.
Sa 2 This text is just as random.
U 328 #* Continuing text...

Using split rather than s/// makes the problem straightforward. In the code below, read_keys extracts the keys from keys.text and records them in a hash.
Then for all files named on the command line, available in the special Perl array @ARGV, we inspect each line to see whether it begins with a key. If not, we leave it alone, but otherwise insert a TAB between the key and the text.
Note that we edit the files in-place thanks to Perl's handy -i option:
-i[extension]
specifies that files processed by the <> construct are to be edited in-place. It does this by renaming the input file, opening the output file by the original name, and selecting that output file as the default for print statements. The extension, if supplied, is used to modify the name of the old file to make a backup copy …
The line split " ", $_, 3 separates the current line into exactly three fields. This is necessary to protect whitespace that's likely to be present in the text portion of the line.
#! /usr/bin/perl -i.bak
use warnings;
use strict;
sub usage { "Usage: $0 text-file\n" }
sub read_keys {
my $path = "keys.text";
open my $fh, "<", $path
or die "$0: open $path: $!";
my %key;
while (<$fh>) {
my($text,$num) = split;
++$key{$text}{$num} if defined $text && defined $num;
}
wantarray ? %key : \%key;
}
die usage unless @ARGV;
my %key = read_keys;
while (<>) {
my($text,$num,$line) = split " ", $_, 3;
$_ = "$text $num\t$line" if defined $text &&
defined $num &&
$key{$text}{$num};
print;
}
Sample run:
$ ./add-tab input
$ diff -u input.bak input
--- input.bak 2010-07-20 20:47:38.688916978 -0500
+++ input 2010-07-20 21:00:21.119531937 -0500
@@ -1,3 +1,3 @@
-Sa 1 # Random line of text follows.
-Sa 2 This text is just as random.
-U 328 #* Continuing text...
+Sa 1 # Random line of text follows.
+Sa 2 This text is just as random.
+U 328 #* Continuing text...
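The effect of the limit argument to split can be seen in isolation. This is just an illustration on one sample line from the question, not part of the script above:

```perl
use strict;
use warnings;

my $line = "Sa 2 This text is just as random.\n";

my @all = split " ", $line;                      # no limit: one field per word
my ($text, $num, $rest) = split " ", $line, 3;   # limit 3: the remainder stays intact

print scalar(@all), " fields without a limit\n";
print "third field with a limit: $rest";
```

With the limit, the third field keeps its internal spaces and its trailing newline, which is why the script can print the rebuilt line as it is.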

Fun answers:
$line =~ s/(?<=$key)/\t/;
Where (?<=XXXX) is a zero-width positive lookbehind for XXXX. That means it matches just after XXXX without being part of the match that gets substituted.
And:
$line =~ s/$key/$key . "\t"/e;
Where the /e flag at the end means to do one eval of what's in the second half of the s/// before filling it in.
Important note: I'm not recommending either of these, they obfuscate the program. But they're interesting. :-)
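For the curious, a short sketch confirms that both forms produce the same string. The key and the line are sample values, and \Q...\E is added here as a precaution so the key is matched literally.

```perl
use strict;
use warnings;

my $key = "Sa 2";

# zero-width lookbehind: insert the tab right after the key
(my $with_lookbehind = "Sa 2 This text is just as random.") =~ s/(?<=\Q$key\E)/\t/;

# /e: evaluate the replacement as Perl code
(my $with_eval = "Sa 2 This text is just as random.") =~ s/\Q$key\E/$key . "\t"/e;

print "same result\n" if $with_lookbehind eq $with_eval;
```

Note that both leave the original space in place after the inserted tab.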

How about doing two separate slurps of each file. For the first file you open the keys and create a preliminary hash. For the second file then all you need to do is add the text to the hash.
use strict;
use warnings;
my $keys_file = "path to keys.txt";
my $content_file = "path to content.txt";
my $output_file = "path to output.txt";
my %hash = ();
# letters, a number, then an optional run of symbols
my $keys_regex = qr/^([a-zA-Z]+)\s*(\d+)\s*([^\da-zA-Z\s]*)/;
open my $fh, '<', $keys_file or die "could not open $keys_file: $!";
while(<$fh>){
my $line = $_;
if ($line =~ $keys_regex){
# key on letters plus number, since the letters alone repeat (Sa 1, Sa 2)
my $key = "$1 $2";
$hash{$key}{'symbol'} = $3;
}
}
close $fh;
open $fh, '<', $content_file or die "could not open $content_file: $!";
while(<$fh>){
my $line = $_;
chomp $line;
if ($line =~ /^([a-zA-Z]+)\s*(\d+)/ and exists $hash{"$1 $2"}){
my $key = "$1 $2";
# strip the key, number and symbols from the line to leave the text
my $symbol = quotemeta $hash{$key}{'symbol'};
$line =~ s/^[a-zA-Z]+\s*\d+//;
$line =~ s/^\s*$symbol//;
$line =~ s/^\s+//;
$hash{$key}{'text'} = $line;
}
}
close $fh;
open $fh, '>', $output_file or die "could not open $output_file: $!";
for my $key (sort keys %hash){
my $symbol = $hash{$key}{'symbol'};
my $text = $hash{$key}{'text'} // '';
print $fh $key . (length($symbol) ? " $symbol" : "") . "\t" . $text . "\n";
}
close $fh;
I haven't had a chance to test it yet and the solution seems a little hacky with all the regex but might give you an idea of something else you can try.

This looks like the perfect place for the map function in Perl! Read the entire text file into an array, then apply the map function across the entire array. The only other thing you need to do is use quotemeta (or \Q...\E) to escape any regular-expression metacharacters in your keys.
Using map is concise. I also read the keys into an array in order to not have to keep opening and closing the keys file in my loop. Every key is tried against every line, so it's an O(n*m) algorithm, but if your keys file isn't that big, it shouldn't be too bad.
#! /usr/bin/env perl
use strict;
use warnings;
open (KEYS, "keys.text")
or die "Cannot open 'keys.text' for reading\n";
my @keys = <KEYS>;
close (KEYS);
foreach my $file (glob("*.txt")) {
open (TEXT, "$file")
or die "Cannot open '$file' for reading\n";
my @textArray = <TEXT>;
close (TEXT);
foreach my $line (@keys) {
chomp $line;
# \Q...\E keeps symbols such as * in the key from acting as regex operators
map($_ =~ s/^\Q$line\E/$line\t/, @textArray);
}
open (NEW_TEXT, ">$file.new") or
die qq(Can't open file "$file.new" for writing\n);
# the lines still end in their newlines, so print them as they are
print NEW_TEXT @textArray;
close (NEW_TEXT);
}
