How to insert a line into the middle of an existing file - perl

consider an example where i want to insert few lines of text when
particular patter matches(if $line=~m/few lines in here/ then
insert lines in next line):
*current file:*
"This is my file and i wanna insert few lines in here and other
text of the file will continue."
*After insertion:*
"This is my file and i wanna insert few lines in here this is my
new text which i wanted to insert and other text of the file will
continue."
This is my code:
my $sourcename = $ARGV[1];
my $destname = $ARGV[0];
print $sourcename,"\n";
print $destname,"\n";
my $source_excel = new Spreadsheet::ParseExcel;
my $source_book = $source_excel->Parse($sourcename) or die "Could not open source Excel file $sourcename: $!";
my $source_cell;
#Sheet 1 - source sheet page having testnumber and worksheet number
my $source_sheet = $source_book->{Worksheet}[0]; #It is used to access worksheet
$source_cell = $source_sheet->{Cells}[1][0]; #Reads content of the cell;
my $seleniumHost = $source_cell->Value;
print $seleniumHost,"\n";
open (F, '+>>',"$destname") or die "Couldn't open `$destname': $!";
my $line;
while ($line = <F>){
print $line;
if($line=~m/FTP/){
#next if /FTP/;
print $line;
print F $seleniumHost;}

The perlfaq covers this. How do I change, delete, or insert a line in a file, or append to the beginning of a file?
Files are fixed blocks of data. They behave much like a piece of paper. How do you insert a line into the middle of a piece of paper? You can't, not unless you left space. You must recopy the whole thing, inserting your line into the new copy.

In a perl one-liner :
perl -ane 's/few lines in here and other\n/this is my\nnew text which i wanted to insert and other /; s/continue./\ncontinue./; print ' FILE
If you don't want a one-liner, it's easy to takes the substitutions in any script ;)

As long as you know the line:
perl -ne 'if ($. == 8) {s//THIS IS NEW!!!\n/}; print;'
Obviously you'd have to use -i to make the actual changes
OR:
perl -i -pe 'if($. == 8) {s//THIS IS NEW!!!\n/}' file

Someone mentioned Tie::File, which is a solution I'll have to look at for editing a file, but I generally use File::Slurp, which has relatively recently added edit_file and edit_file_lines subs.

Using perl's in-place edit flag (-i), it's easy to add lines to an existing file using Perl, as long as you can key off a text string, such as (in your case) "wanna insert few lines in here":
perl -pi -e 's{wanna insert few lines in here}{wanna insert few lines in here this is my\nnew text which i wanted to insert }' filename
It overwrites your old sentence (don't be scared) with a copy of your old sentence (nothing lost) plus the new stuff you want injected. You can even create a backup of the original file if you wish by passing a ".backup" extension to the -i flag:
perl -p -i'.backup' -e 's{wanna insert few lines in here}{wanna insert few lines in here this is my\nnew text which i wanted to insert }' filename
More info on Perl's search & replace capabilities can be found here:
http://www.atrixnet.com/in-line-search-and-replace-in-files-with-real-perl-regular-expressions/

You can avoid having to repeat the "markup" text using variable substitution.
echo -e "first line\nthird line" | perl -pe 's/(^first line$)/\1\nsecond line/'

Related

How to split a file (with sed) into numerous files according to a value found on each line?

I have several Company_***.csv files (altough the separator's a tab not a comma; hence should be *.tsv, but never mind) which contains a header plus numerous data lines e.g
1stHeader 2ndHeader DateHeader OtherHeaders...
111111111 SOME STRING 2020-08-01 OTHER STRINGS..
222222222 ANOT STRING 2020-08-02 OTHER STRINGS..
I have to split them according to the 3rd column here, it's a date.
Each file should be named like e.g. Company_2020_08_01.csv Company_2020_08_02.csv & so one
and containing: same header on the 1st line + matching rows as the following lines.
At first I thought about saving (once) the header in a single file e.g.
sed -n '1w Company_header.csv' Company_*.csv
then parsing the files with a pattern for the date (hence the headers would be skipped) e.g.
sed -n '/\t2020-[01][0-9]-[0-3][0-9]\t/w somefilename.csv' Company_*.csv
... and at last, insert the (missing) header in each generated file.
But I'm stuck at step 2: I can't find how I could generate (dynamically) the "filename" expected by the w command, neither how to capture the date in the search pattern (because apparently this is just an address, not a search-replace "field" as in the s/regexp/replacement/[flags] command, so you can't have capturing groups ( ) in there).
So I wonder if this is actually doable with sed? Or should I look upon other tools e.g. awk?
Disclaimer: I'm quite a n00b with these commands so I'm just learning/starting from scratch...
Perl to the rescue!
perl -e 'while (<>) {
$h = $_, next if $. == 1;
$. = 0 if eof;
#c = split /\t/;
open my $out, ">>", "Company_" . $c[2] =~ tr/-/_/r . ".csv" or die $!;
print {$out} $h unless tell $out;
print {$out} $_;
}' -- Company_*.csv
The diamond operator <> in scalar context reads a line from the input.
The first line of each file is stored in the variable $h, see $. and eof
split populates the #c array by the column values for each line
$c[2] contains the date, using tr we translate dashes to underscores to create a filename from it. open opens the file for appending.
print prints the header if the file is empty (see tell)
and prints the current line, too.
Note that it only appends to the files, so don't forget to delete any output files before running the script again.

Deleting a line from a huge file in Perl

I have huge text file and first five lines of it reads as below :
This is fist line
This is second line
This is third line
This is fourth line
This is fifth line
Now, I want to write something at a random position of the third line of that file which will replace the characters in that line by the new string I am writing. I am able to achieve that with the below code :
use strict;
use warnings;
my #pos = (0);
open my $fh, "+<", "text.txt";
while(<$fh) {
push #pos, tell($fh);
}
seek $fh , $pos[2]+1, 0;
print $fh "HELLO";
close($fh);
However, I am not able to figure out with the same kind of approach how can I delete the entire third line from that file so that the texts reads below :
This is fist line
This is second line
This is fourth line
This is fifth line
I do not want to read the entire file into an array, neither do I want to use Tie::File. Is it possible to achieve my requirement using seek and tell ? A solution will be very helpful.
A file is a sequence of bytes. We can replace (overwrite) some of them, but how would we remove them? Once a file is written its bytes cannot be 'pulled out' of the sequence or 'blanked' in any way. (The ones at the end of the file can be dismissed, by truncating the file as needed.)
The rest of the content has to move 'up', so that what follows the text to be removed overwrites it. We have to rewrite the rest of the file. In practice it is often far simpler to rewrite the whole file.
As a very basic example
use warnings 'all';
use strict;
use File::Copy qw(move);
my $file_in = '...';
my $file_out = '...'; # best use `File::Temp`
open my $fh_in, '<', $file_in or die "Can't open $file_in: $!";
open my $fh_out, '>', $file_out or die "Can't open $file_out: $!";
# Remove a line with $pattern
my $pattern = qr/this line goes/;
while (<$fh_in>)
{
print $fh_out $_ unless /$pattern/;
}
close $fh_in;
close $fh_out;
# Rename the new fie into the original one, thus replacing it
move ($file_out, $file_in) or die "Can't move $file_out to $file_in: $!";
This writes every line of input file into the output file, unless a line matches a given pattern. Then that file is renamed, replacing the original (what does not involve data copy). See this topic in perlfaq5.
Since we really use a temporary file I'd recommend the core module File::Temp for that.
This may be made more efficient, but far more complicated, by opening in update '+<' mode so to overwrite only a portion of the file. You iterate until the line with the pattern, record (tell) its position and the line length, then copy all remaining lines in memory. Then seek back to the position minus length of that line, and dump the copied rest of the file, overwriting the line and all that follows it.
Note that now the data for the rest of the file is copied twice, albeit one copy is in memory. Going to this trouble may make sense if the line to be removed is far down a very large file. If there are more lines to remove this gets messier.
Writing out a new file and copying it over the original changes the file's inode number. That may be a problem for some tools or procedures, and if it is you can instead update the original by either
Once the new file is written out, open it for reading and open the original for writing. This clobbers the original file. Then read from the new file and write to the original one, thus copying the content back to the same inode. Remove the new file when done.
Open the original file in read-write mode ('+<') to start with. Once the new file is written, seek to the beginning of the original (or to the place from which to overwrite) and write to it the content of the new file. Remember to also set the end-of-file if the new file is shorter,
truncate $fh, tell($fh);
after copying is done. This requires some care and the first way is probably generally safer.
If the file weren't huge the new "file" can be "written" in memory, as an array or a string.
Use sed command from Linux command line in Perl:
my $return = `sed -i '3d' text.txt`;
Where "3d" means delete the 3rd row.
It is useful to look at perlrun and see how perl itself modifies a file 'in-place.'
Given:
$ cat text.txt
This is fist line
This is second line
This is third line
This is fourth line
This is fifth line
You can apparently 'modify in-place', sed like, by using the -i and -p switch to invoke Perl:
$ perl -i -pe 's/This is third line\s*//' text.txt
$ cat text.txt
This is fist line
This is second line
This is fourth line
This is fifth line
But if you consult the Perl Cookbook recipe 7.9 (or look at perlrun) you will see that this:
$ perl -i -pe 's/This is third line\s*//' text.txt
is equivalent to:
while (<>) {
if ($ARGV ne $oldargv) { # are we at the next file?
rename($ARGV, $ARGV . '.bak');
open(ARGVOUT, ">$ARGV"); # plus error check
select(ARGVOUT);
$oldargv = $ARGV;
}
s/This is third line\s*//;
}
continue{
print;
}
select (STDOUT); # restore default output

Change line in textfile using perl

I read other places on how to do this but they were confusing for me.
I want to read lines from a text file and when I come across a certain line I want to append something to it.
My code is:
open my $p, "$username_filename" or die "can not open $username_filename: $!";
foreach $line (<$p>){
if ($line =~ /^listen/){
`echo "whatever" >> $username_file`;
}
}
However when I run this I get this error
sh: -c: line 0: syntax error near unexpected token `newline' sh: -c: line 0: `echo "current_user" >> '
Is this way correct to edit the file and why am I getting this error?
Working with files is not like editing in a word processor. Lines are an illusion, a file is just a big string of characters. You can't change a line in the middle of a file for the same reason you can't change a line in the middle of a book, the words can't be moved around to make room.
Instead, like a book, if you want to change something you need to rewrite the whole thing.
The basic algorithm is to...
Open the file for reading.
Open a temporary file for writing.
Read a line, alter the line, write the line.
Repeat 3 until done reading.
Overwrite the file with the temp file.
Some other notes...
print writes to STDOUT by default, but you can give it a filehandle to write to instead.
foreach my $line (<$fh>) is unfortunately not optimized to read files. It will read the possibly enormous file into memory. while(my $line = <$fh>) reads one line at a time.
I've turned on strict. This forces you to declare your variables. It protects you from typos like the one you made of $username_file vs $username_filename.
You could use something like "$filename.tmp" but File::Temp provides temp files that are guaranteed to be temporary, unique and cleaned up when the program exits.
use strict;
use warnings;
use autodie; # because writing 'or die' gets old fast
use File::Temp; # provides safe temp files
my $filename = ...; # set it somehow
open my $read, "<", $filename;
my $temp = File::Temp->new;
while(my $line = <$read>) {
if( $line =~ /^listen/ ) {
chomp $line; # remove the newline
$line .= " whatever\n"; # add our content and put a newline back
}
# Write the line to the temp file
print $temp $line;
}
# Overwrite our file with the rewritten temp file
rename $temp->filename, $filename;
That's inside a program. If you just want to do it quickly, you can do it on the command line with -i and -p.
perl -i.bak -pe 'if( /^listen/ ) { chomp; $_ .= "whatever" }' filename
-p says to run the code on each line of the file. The line will be put into $_ and whatever is in $_ will be printed. -i says to edit the file in place. -i.bak makes a backup of the original file just in case you make a mistake.
There are a few problems with your attempt. The big one is that using echo >> file will append to the file, not insert at some arbitrary place inside the file.
Another problem is that you're trying to append to a file called $username_file, and you haven't declared or defined that variable.
I don't think perl lets you insert into the middle of a file. I think your best bet would be to read the file a line at a time, and on the correct line(s), append the text you want. Write each line to a new file, then swap the files around at the end.
For example:
#!/usr/bin/perl
my $in_filename = "in.txt";
my $out_filename = "out.txt";
open (my $in, "<", $in_filename) or die;
open (my $out, ">", $out_filename) or die;
while (my $lline = <$in>)
{
chomp $lline;
if ( $lline =~ /listen/ )
{
print "$lline whatever\n";
}
else
{
print "$lline\n";
}
}
close $in;
close $out;
rename $in_filename, "$in_filename.original";
rename $out_filename, $in_filename;
I use chomp to remove line endings, because <$in> gives us a line including its line endings, wish otherwise messes up the append.
As always there are many ways to achieve this. I think using sed is probably a better option for this, but you specifically asked how to do it in perl, so perl it is.

Counting records separated by CR/LF (carriage return and newline) in Perl

I'm trying to create a simple script to read a text file that contains records of book titles. Each record is separated with a plain old double space (\r\n\r\n). I need to count how many records are in the file.
For example here is the input file:
record 1
some text
record 2
some text
...
I'm using a regex to check for carriage return and newline, but it fails to match. What am I doing wrong? I'm at my wits' end.
sub readInputFile {
my $inputFile = $_[0]; #read first argument from the commandline as fileName
open INPUTFILE, "+<", $inputFile or die $!; #Open File
my $singleLine;
my #singleRecord;
my $recordCounter = 0;
while (<INPUTFILE>) { # loop through the input file line-by-line
$singleLine = $_;
push(#singleRecord, $singleLine); # start adding each line to a record array
if ($singleLine =~ m/\r\n/) { # check for carriage return and new line
$recordCounter += 1;
createHashTable(#singleRecord); # send record make a hash table
#singleRecord = (); # empty the current record to start a new record
}
}
print "total records : $recordCounter \n";
close(INPUTFILE);
}
It sounds like you are processing a Windows text file on Linux, in which case you want to open the file with the :crlf layer, which will convert all CRLF line-endings to the standard Perl \n ending.
If you are reading Windows files on a Windows platform then the conversion is already done for you, and you won't find CRLF sequences in the data you have read. If you are reading a Linux file then there are no CR characters in there anyway.
It also sounds like your records are separated by a blank line. Setting the built-in input record separator variable $/ to a null string will cause Perl to read a whole record at a time.
I believe this version of your subroutine is what you need. Note that people familiar with Perl will thank you for using lower-case letters and underscore for variables and subroutine names. Mixed case is conventionally reserved for package names.
You don't show create_hash_table so I can't tell what data it needs. I have chomped and split the record into lines, and passed a list of the lines in the record with the newlines removed. It would probably be better to pass the entire record as a single string and leave create_hash_table to process it as required.
sub read_input_file {
my ($input_file) = #_;
open my $fh, '<:crlf', $input_file or die $!;
local $/ = '';
my $record_counter = 0;
while (my $record = <$fh>) {
chomp;
++$record_counter;
create_hash_table(split /\n/, $record);
}
close $fh;
print "Total records : $record_counter\n";
}
You can do this more succinctly by changing Perl's record-separator, which will make the loop return a record at a time instead of a line at a time.
E.g. after opening your file:
local $/ = "\r\n\r\n";
my $recordCounter = 0;
$recordCounter++ while(<INPUTFILE>);
$/ holds Perl's global record-separator, and scoping it with local allows you to override its value temporarily until the end of the enclosing block, when it will automatically revert back to its previous value.
But it sounds like the file you're processing may actually have "\n\n" record-separators, or even "\r\r". You'd need to set the record-separator correctly for whatever file you're processing.
If your files are not huge multi-gigabytes files, the easiest and safest way is to read the whole file, and use the generic newline metacharacter \R.
This way, it also works if some file actually uses LF instead of CRLF (or even the old Mac standard CR).
Use it with split if you also need the actual records:
perl -ln -0777 -e 'my #records = split /\R\R/; print scalar(#records)' $Your_File
Or if you only want to count the records:
perl -ln -0777 -e 'my $count=()=/\R\R/g; print $count' $Your_File
For more details, see also my other answer here to a similar question.

How can I extract a line or row?

How can I extract the whole line in a row, for example, row 3.
These data are saved in my text editor in linux.
Here's my data:
1,julz,kath,shiela,angel
2,may,ann,janice,aika
3,christal,justine,kim
4,kris,allan,jc,mine
I want output like:
3,christal,justine,kim
The following snippet reads in the first three lines, prints only the third then exits to ensure that no unnecessary processing takes place.
Without the exit, the script would continue to process the input file despite you knowing that you have no use for it.
perl -ne 'if ($. == 3) {print;exit}' infile.txt
As perlvar points out, $. is the current line number for the last file handle accessed.
$ perl -ne'print if $. == 3' your_file.txt
Below is a script version of #ysth's answer:
$ perl -mTie::File -e'tie #lines, q(Tie::File), q(your_file.txt);
> print $lines[2]'
If it's always the third line:
perl -ne 'print if 3..3' <infile >outfile
If it's always the one that has a numeric value of "3" as the first column:
perl -F, -nae 'print if $F[0] == 3' <infile >outfile # thanks for the comment doh!
Since you didn't say how you were identifying that line, I am providing alternatives.
For a more general solution:
open my $fh, '<', 'infile.txt';
while (my $line = <$fh>) {
print $line if i_want_this_line($line);
}
where i_want_this_line implements the criteria defining which line(s) you want.
Um, the -n answers are assuming the question is "what is a script that...". In which case, perl isn't even the best answer. But I don't read that into the question.
In general, if the lines are not of fixed length, you have to read through a file line by
line until you get to the line you want. Tie::File automates this process for you (though since the code it would replace is so trivial, I rarely bother with it, myself).
use Tie::File;
use Fcntl "O_RDONLY";
tie my #line, "Tie::File", "yourfilename", mode => O_RDONLY
or die "Couldn't open file: $!";
print "The third line is ", $line[2];
You can assign the diamond operator on your filehandle to a list, each element will be a line or row.
open $fh, "myfile.txt";
my #lines = <$fh>;
EDIT: This solution grabs all the lines so that you can access any one you want, e.g. row 3 would be $lines[2] ... If you really only want one specific line, that'd be a different solution, like the other answerers'.