i want to merge multiple csv files by specific condition using perl - perl

i have multiple csv files, i want to merge all those files.....
i am showing some of my sample csv files below...
M1DL1_Interpro_sum.csv
IPR017690,Outer membrane, omp85 target,821
IPR014729,Rossmann,327
IPR013785,Aldolase,304
IPR015421,Pyridoxal,224
IPR003594,ATPase,179
IPR000531,TonB receptor,150
IPR018248,EF-hand,10
M1DL2_Interpro_sum.csv
IPR017690,Outer membrane, omp85 target,728
IPR013785,Aldolase,300
IPR014729,Rossmann,261
IPR015421,Pyridoxal,189
IPR011991,Winged,113
IPR000873,AMP-dependent synthetase/ligase,111
M1DL3_Interpro_sum.csv
IPR017690,Outer membrane,905
IPR013785,Aldolase,367
IPR014729,Rossmann,338
IPR015421,Pyridoxal,271
IPR003594,ATPase,158
IPR018248,EF-hand,3
now to merge these files i have tried the following code
#ARGV = <merge_csvfiles/*.csv>;
print #ARGV[0],"\n";
open(PAGE,">outfile.csv") || die"Can't open outfile.csv\n";
while($i<scalar(#ARGV))
{
open(FILE,#ARGV[$i]) || die"Can't open ...#ARGV[$i]...\n";
$data.=join("",<FILE>);
close FILE;
print"file completed...",$i+1,"\n";
$i++;
}
#data=split("\n",$data);
#data2=#data;
print scalar(#data);
for($i=0;$i<scalar(#data);$i++)
{
#id1=split(",",#data[$i]);
$id_1=#id1[0];
#data[$j]=~s/\n//;
if(#data[$i] ne "")
{
print PAGE "\n#data[$i],";
for($j=$i+1;$j<scalar(#data2);$j++)
{
#id2=split(",",#data2[$j]);
$id_2=#id2[0];
if($id_1 eq $id_2)
{
#data[$j]=~s/\n//;
print PAGE "#data2[$j],";
#data2[$j]="";
#data[$j]="";
print "match found at ",$i+1," and ",$j+1,"\n";
}
}
}
print $i+1,"\n";
}
merge_csvfiles is a folder which contains all the files
output of above code is
IPR017690,Outer membrane,821,IPR017690,Outer membrane ,728,IPR017690,Outer membrane,905
IPR014729,Rossmann,327,IPR014729,Rossmann,261,IPR014729,Rossmann,338
IPR013785,Aldolase,304,IPR013785,Aldolase,300,IPR013785,Aldolase,367
IPR015421,Pyridoxal,224,IPR015421,Pyridoxal,189,IPR015421,Pyridoxal,271
IPR003594,ATPase,179,IPR003594,ATPase,158
IPR000531,TonB receptor,150
IPR018248,EF-hand,10,IPR018248,EF-hand,3
IPR011991,Winged,113
IPR000873,AMP-dependent synthetase/ligase
but i want the output in following format....
IPR017690,Outer membrane,821,IPR017690,Outer membrane ,728,IPR017690,Outer membrane,905
IPR014729,Rossmann,327,IPR014729,Rossmann,261,IPR014729,Rossmann,338
IPR013785,Aldolase,304,IPR013785,Aldolase,300,IPR013785,Aldolase,367
IPR015421,Pyridoxal,224,IPR015421,Pyridoxal,189,IPR015421,Pyridoxal,271
IPR003594,ATPase,179,0,0,0,IPR003594,ATPase,158
IPR000531,TonB receptor,150,0,0,0,0,0,0
IPR018248,EF-hand,10,0,0,0,IPR018248,EF-hand,3
0,0,0,IPR011991,Winged,113,0,0,0
0,0,0,IPR000873,AMP-dependent synthetase/ligase,111,0,0,0
Has anybody got any idea how can i do this?
Thank you for the help

As mentioned in Miguel Prz's comment, you haven't explained how you want the merge to be performed, but, judging by the "desired output" sample, it appears that what you want is to concatenate lines with matching IDs from all three input files into a single line in the output file, with "0,0,0" taking the place of any lines which don't appear in a given file.
So, then:
#!/usr/bin/env perl
use strict;
use warnings;
my #input_files = glob 'merge_csvfiles/*.csv';
my %data;
for my $i (0 .. $#input_files) {
open my $infh, '<', $input_files[$i]
or die "Failed to open $input_files[$i]: $!";
while (<$infh>) {
chomp;
my $id = (split ',', $_, 2)[0];
$data{$id}[$i] = $_;
}
print "Input file read: $input_files[$i]\n";
}
open my $outfh, '>', 'outfile.csv' or die "Failed to open outfile.csv: $!";
for my $id (sort keys %data) {
my #merge_data;
for my $i (0 .. $#input_files) {
push #merge_data, $data{$id}[$i] || '0,0,0';
}
print $outfh join(',', #merge_data) . "\n";
}
The first loop collects all the lines from each file into a hash of arrays. The hash keys are the IDs, so the lines for that ID from all files are kept together, and the value for each key is (a reference to) an array of the line associated with that ID in each file; using an array for this allows us to keep track of values which are missing as well as those which are present.
The second loop then takes the keys of that hash (in alphabetical order) and, for each one, creates a temporary array of the values associated with that ID, substituting "0,0,0" for missing values, joins them into a single string, and prints that to the output file.
The results, in outfile.csv, are:
IPR000531,TonB receptor,150,0,0,0,0,0,0
0,0,0,IPR000873,AMP-dependent synthetase/ligase,111,0,0,0
IPR003594,ATPase,179,0,0,0,IPR003594,ATPase,158
0,0,0,IPR011991,Winged,113,0,0,0
IPR013785,Aldolase,304,IPR013785,Aldolase,300,IPR013785,Aldolase,367
IPR014729,Rossmann,327,IPR014729,Rossmann,261,IPR014729,Rossmann,338
IPR015421,Pyridoxal,224,IPR015421,Pyridoxal,189,IPR015421,Pyridoxal,271
IPR017690,Outer membrane, omp85 target,821,IPR017690,Outer membrane, omp85 target,728,IPR017690,Outer membrane,905
IPR018248,EF-hand,10,0,0,0,IPR018248,EF-hand,3
Edit: Added explanations requested by OP in comments
can u expalain me the working of my $id = (split ',', $_, 2)[0]; and $# in this program
my $id = (split ',', $_, 2)[0]; gets the text prior to the first comma in the last line of text that was read:
Because I didn't specify what variable to put the data in, while (<$infh>) reads it into the default variable $_.
split ',', $_, 2 splits up the value of $_ into a list of comma-separated fields. The 2 at the end tells it to only produce at most 2 fields; the code will work fine without the 2, but, since I only need the first field, splitting into more parts isn't necessary.
Putting (...)[0] around the split command turns the returned list of fields into an (anonymous) array and returns the first element of that array. It's the same as if I'd written my #fields = split ',', $_, 2; my $id = $fields[0];, but shorter and without the extra variable.
$#array returns the highest-numbered index in the array #array, so for my $i (0 .. $#array) just means "loop over the indexes for all elements in #array". (Note that, if I hadn't needed the value of the index counter, I would have instead looped over the array's data directly, by using for my $filename (#input_files), but it would have been less convenient to keep track of the missing values if I'd done it that way.)

Related

Program argument is 100 but returns the value as 0100

Right now I am trying to do an assignment where I have to
- Extract information from an HTML file
- Save it to a scalar
- Run a regular expression to find the number of seats available in the designated course (the program argument is the course number for example 100 for ICS 100)
- If the course has multiple sessions, I have to find the sum of the seats available and print
- The output is just the number of seats available
The problem here is that when I was debugging and checking to make sure that my variable I have the program arg saved to was storing the correct value, it was storing the values with an extra 0 behind it.
ex.) perl filename.pl 100
ARGV[0] returns as 0100
I've tried storing the True regular expression values to an array, saving using multiple scalar variables, and changing my regular expression but none worked.
die "Usage: perl NameHere_seats.pl course_number" if (#ARGV < 1);
# This variable will store the .html file contents
my $fileContents;
# This variable will store the sum of seats available in the array #seatAvailable
my $sum = 0;
# This variable will store the program argument
my $courseNum = $ARGV[0];
# Open the file to read contents all at once
open (my $fh, "<", "fa19_ics_class_availability.html") or die ("Couldn't open 'fa19_ics_class_availability.html'\n");
# use naked brakets to limit the $/
{
#use local $/ to get <$fh> to read the whole file, and not one line
local $/;
$fileContents = <$fh>;
}
# Close the file handle
close $fh;
# Uncomment the line below to check if you've successfully extracted the text
# print $fileContents;
# Check if the course exists
die "No courses matched...\n" if ($ARGV[0] !~ m/\b(1[0-9]{2}[A-Z]?|2[0-8][0-9][A-Z]?|29[0-3])[A-Z]?\b/);
while ($fileContents =~ m/$courseNum(.+?)align="center">(\d)</) {
my $num = $2;
$sum = $sum + $num;
}
print $sum;
# Use this line as error checking to make sure #ARGV[0] is storing proper number
print $courseNum;
The current output I am receiving when program argument is 100 is just 0, and I assume it's because the regular expression is not catching any values as true therefore the sum remains at a value of 0. The output should be 15...
This is a link to the .html page > https://laulima.hawaii.edu/access/content/user/emeyer/ics/215/FA19/01/perl/fa19_ics_class_availability.html
You're getting "0100" because you have two print() statements.
print $sum;
...
print $courseNum;
And because there are no newlines or other output between them, you get the two values printed out next to each other. $sum is '0' and $courseNum is '100'.
So why is $sum zero? Well, that's because your regex isn't picking up the data you want it to match. Your regex looks like this:
m/$courseNum(.+?)align="center">(\d)</
You're looking for $courseNum followed by a number of other characters, followed by 'align="center">' and then your digit. This doesn't work for a number of reasons.
The string "100" appears many times in your text. Many times it doesn't even mean a course number (e.g. "100%"). Perhaps you should look for something more precise (ICS $coursenum).
The .+? doesn't do what you think it does. The dot doesn't match newline characters unless you use the /s option on the match operator.
But even if you fix those first two problems, it still won't work as there are a number of numeric table cells for each course and you're doing nothing to ensure that you're grabbing the last one. Your current code will get the "Curr. Enrolled" column, not the "Seats Avail" one.
This is a non-trivial HTML parsing problem. It shouldn't be addressed using regexes (HTML should never be parsed using regexes). You should look at one of the HTML parsing modules from CPAN - I think I'd use Web::Query.
Update: An example solution using Web::Query:
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
use File::Basename;
use Web::Query;
my $course_num = shift
or die 'Usage: perl ' . basename $0 . " course_number\n";
my $source = 'fa19_ics_class_availability.html';
open my $fh, '<', $source
or die "Cannot open '$source': $!\n";
my $html = do { local $/; <$fh> };
my $count_free;
wq($html)
# Get each table row in the table
->find('table.listOfClasses tr')
->each(sub {
my ($i, $elem) = #_;
my #tds;
# Get each <td> in the <tr>
$elem->find('td')->each(sub { push #tds, $_[1] });
# Ignore rows that don't have 13 columns
return if #tds != 13;
# Ignore rows that aren't about the right course
return if $tds[2]->text ne "ICS $course_num";
# Add the number of available places
$count_free += $tds[8]->text;
});
say $count_free;

Appending values to Hash if key is same in Perl

Problem is to read a file with value at every new line. Content of file looks like
3ssdwyeim3,3ssdwyeic9,2017-03-16,09:10:35.372,0.476,EndInbound
3ssdwyeim3,3ssdwyfyyn,2017-03-16,09:10:35.369,0.421,EndOutbound
3ssdwyfxc0,3ssdwyfxfi,2017-03-16,09:10:35.456,0.509,EndInbound
3ssdwyfxc0,3ssdwyhg0v,2017-03-16,09:10:35.453,0.436,EndOutbound
With the string before first comma being the Key and string in between last and second last comma the Value
i.e. for the first line 3ssdwyeim3 becomes the key and 0.476 Value.
Now as we are looping over each line if the key exists we have to concatenate the values separated by comma.
Hence for the next new line as key already exists key remains 3ssdwyeim3 but the value is updated to 0.476,0.421.
Finally we have to print the keys and values in a file.
I have written a code to achieve the same, which is as follows.
sub findbreakdown {
my ( $out ) = #_;
my %timeLogger;
open READ, "out.txt" or die "Cannot open out.txt for read :$!";
open OUTBD, ">$out\_breakdown.csv" or die "Cannot open $out\_breakdown.csv for write :$!";
while ( <READ> ) {
if ( /(.*),.*,.*,.*,(.*),.*/ ) {
$btxnId = $1;
$time = $2;
if ( !$timeLogger{$btxnId} ) {
$timeLogger{$btxnId} = $time;
}
else {
$previousValue = $timeLogger{$btxnId};
$newValue = join ",", $previousValue, $time;
$timeLogger{$btxnId} = $newValue;
}
}
foreach ( sort keys %timeLogger ) {
print OUTBD "$_ ,$timeLogger{$_}\n";
}
}
close OUTBD;
close READ;
}
However Something is going wrong and its printing like this
3ssdwyeim3,0.476
3ssdwyeim3,0.476,0.421
3ssdwyeim3,0.476,0.421
3ssdwyfxc0,0.509
3ssdwyeim3,0.476,0.421
3ssdwyfxc0,0.509,0.436
3ssdwyeim3,0.476,0.421
3ssdwyfxc0,0.509,0.436
Whereas expected is:
3ssdwyeim3,0.476,0.421
3ssdwyfxc0,0.509,0.436
Your program is behaving correctly, but you are printing the current state of the entire hash after you process each line.
Therefore you are printing hash keys before they have the complete set of values, and you have many duplicated lines.
If you move the foreach loop that prints to the end of your program (or simply use the debugger to inspect the variables) you will find that the final state of the hash is exactly what you expect.
Edit: I previously thought the problem was the below, but it's because I misread the sample data in your question.
This regular expression is not ideal:
if (/(.*),.*,.*,.*,(.*),.*/) {
The .* is greedy and will match as much as possible (including some content with commas). So if any line contains more than six comma-separated items, more than one item will be included in the first matching group. This may not be a problem in your actual data, but it's not an ideal way to write the code. The expression is more ambiguous than necessary.
It would be better written like this:
if (/^([^,]*),[^,]*,[^,]*,[^,]*,([^,]*),[^,]*$/) {
Which would only match lines with exactly six items.
Or consider using split on the input line, which would be a cleaner solution.
This is much simpler than you have made it. You can just split each line into fields and use push to add the value to the list corresponding to the key
I trust you can modify this to read from an external file instead of the DATA file handle?
use strict;
use warnings 'all';
my %data;
while ( <DATA> ) {
my #fields = split /,/;
push #{ $data{$fields[0]} }, $fields[-2];
}
for my $key ( sort keys %data ) {
print join(',', $key, #{ $data{$key} }), "\n";
}
__DATA__
3ssdwyeim3,3ssdwyeic9,2017-03-16,09:10:35.372,0.476,EndInbound
3ssdwyeim3,3ssdwyfyyn,2017-03-16,09:10:35.369,0.421,EndOutbound
3ssdwyfxc0,3ssdwyfxfi,2017-03-16,09:10:35.456,0.509,EndInbound
3ssdwyfxc0,3ssdwyhg0v,2017-03-16,09:10:35.453,0.436,EndOutbound
output
3ssdwyeim3,0.476,0.421
3ssdwyfxc0,0.509,0.436

Perl: Find a match, remove the same lines, and to get the last field

Being a Perl newbie, please pardon me for asking this basic question.
I have a text file #server1 that shows a bunch of sentences (white space is the field separator) on many lines in the file.
I needed to match lines with my keyword, remove the same lines, and extract only the last field, so I have tried with:
my #allmatchedlines;
open(output1, "ssh user1#server1 cat /tmp/myfile.txt |");
while(<output1>) {
chomp;
#allmatchedlines = $_ if /mysearch/;
}
close(output1);
my #uniqmatchedline = split(/ /, #allmatchedlines);
my $lastfield = $uniqmatchedline[-1]\n";
print "$lastfield\n";
and it gives me the output showing:
1
I don't know why it's giving me just "1".
Could someone please explain why I'm getting "1" and how I can get the last field of the matched line correctly?
Thank you!
my #uniqmatchedline = split(/ /, #allmatchedlines);
You're getting "1" because split takes a scalar, not an array. An array in scalar context returns the number of elements.
You need to split on each individual line. Something like this:
my #uniqmatchedline = map { split(/ /, $_) } #allmatchedlines;
There are two issues with your code:
split is expecting a scalar value (string) to split on; if you are passing an array, it will convert the array to scalar (which is just the array length)
You did not have a way to remove same lines
To address these, the following code should work (not tested as no data):
my #allmatchedlines;
open(output1, "ssh user1#server1 cat /tmp/myfile.txt |");
while(<output1>) {
chomp;
#allmatchedlines = $_ if /mysearch/;
}
close(output1);
my %existing;
my #uniqmatchedline = grep !$existing{$_}++, #allmatchedlines; #this will return the unique lines
my #lastfields = map { ((split / /, $_)[-1]) . "\n" } #uniqmatchedline ; #this maps the last field in each line into an array
print for #lastfields;
Apart from two errors in the code, I find the statement "remove the same lines and extract only the last field" unclear. Once duplicate matching lines are removed, there may still be multiple distinct sentences with the pattern.
Until a clarification comes, here is code that picks the last field from the last such sentence.
use warnings 'all';
use strict;
use List::MoreUtils qw(uniq)
my $file = '/tmp/myfile.txt';
my $cmd = "ssh user1\#server1 cat $file";
open my $fh, '-|', $cmd // die "Error opening $cmd: $!"; # /
while (<$fh>) {
chomp;
push #allmatchedlines, $_ if /mysearch/;
}
close(output1);
my #unique_matched_lines = uniq #allmatchedlines;
my $lastfield = ( split ' ', $unique_matched_lines[-1] )[-1];
print $lastfield, "\n";
I changed to the three-argument open, with error checking. Recall that open for a process involves a fork and returns pid, so an "error" doesn't at all relate to what happened with the command itself. See open. (The # / merely turns off wrong syntax highlighting.) Also note that # under "..." indicates an array and thus need be escaped.
The (default) pattern ' ' used in split splits on any amount of whitespace. The regex / / turns off this behavior and splits on a single space. You most likely want to use ' '.
For more comments please see the original post below.
The statement #allmatchedlines = $_ if /mysearch/; on every iteration assigns to the array, overwriting whatever has been in it. So you end up with only the last line that matched mysearch. You want push #allmatchedlines, $_ ... to get all those lines.
Also, as shown in the answer by Justin Schell, split needs a scalar so it is taking the length of #allmatchedlines – which is 1 as explained above. You should have
my #words_in_matched_lines = map { split } #allmatchedlines;
When all this is straightened out, you'll have words in the array #uniqmatchedline and if that is the intention then its name is misleading.
To get unique elements of the array you can use the module List::MoreUtils
use List::MoreUtils qw(uniq);
my #unique_elems = uniq #whole_array;

Using Perl to find and fix errors in CSV files

I am dealing with very large amounts of data. Every now and then there is a slip up. I want to identify each row with an error, under a condition of my choice. With that I want the row number along with the line number of each erroneous row. I will be running this script on a handful of files and I will want to output the report to one.
So here is my example data:
File_source,ID,Name,Number,Date,Last_name
1.csv,1,Jim,9876,2014-08-14,Johnson
1.csv,2,Jim,9876,2014-08-14,smith
1.csv,3,Jim,9876,2014-08-14,williams
1.csv,4,Jim,9876,not_a_date,jones
1.csv,5,Jim,9876,2014-08-14,dean
1.csv,6,Jim,9876,2014-08-14,Ruzyck
Desired output:
Row#5,4.csv,4,Jim,9876,not_a_date,jones (this is an erroneous row)
The condition I have chosen is print to output if anything in the date field is not a date.
As you can see, my desired output contains the line number where the error occurred, along with the data itself.
After I have my output that shows the lines within each file that are in error, I want to grab that line from the untouched original CSV file to redo (both modified and original files contain the same amount of rows). After I have a file of these redone rows, I can omit and clean up where needed to prevent interruption of an import.
Folder structure will contain:
Modified: 4.txt
Original: 4.csv
I have something started here, written in Perl, which by the logic will at least return the rows I need. However I believe my syntax is a little off and I do not know how to plug in the other subroutines.
Code:
$count = 1;
while (<>) {
unless ($F[4] =~ /\d+[-]\d+[-]\d+/)
print "Row#" . $count++ . "," . "$_";
}
The code above is supposed to give me my erroneous rows, but to be able to extract them from the originals is beyond me. The above code also contains some syntax errors.
This will do as you ask.
Please be certain that none of the fields in the data can ever contain a comma , otherwise you will need to use Text::CSV to process it instead of just a simple split.
use strict;
use warnings;
use 5.010;
use autodie;
open my $fh, '<', 'example.csv';
<$fh>; # Skip header
while (<$fh>) {
my #fields = split /,/;
if( $fields[4] !~ /^\d{4}-\d{2}-\d{2}$/ ) {
print "Row#$.,$_";
}
}
output
Row#5,4.csv,4,Jim,9876,not_a_date,jones
Update
If you want to process a number of files then you need this instead.
The close ARGV at the end of the loop is there so that the line counter $. is reset to
1 at the start of each file. Without it it just continues from 1 upwards across all the files.
You would run this like
rob#Samurai-U:~$ perl findbad.pl *.csv
or you could list the files individually, separated by spaces.
For the test I have created files 1.csv and 2.csv which are identical to your example data except that the first field of each line is the name of the file containing the data.
You may not want the line in the output that announces each file name, in which case you should replace the entire first if block with just next if $. == 1.
use strict;
use warnings;
#ARGV = map { glob qq{"$_"} } #ARGV; # For Windows
while (<>) {
if ($. == 1) {
print "\n\nFile: $ARGV\n\n";
next;
}
my #fields = split /,/;
unless ( $fields[4] =~ /^\d{4}-\d{2}-\d{2}$/ ) {
printf "Row#%d,%s", $., $_;
}
close ARGV if eof ARGV;
}
output
File: 1.csv
Row#5,1.csv,4,Jim,9876,not_a_date,jones
File: 2.csv
Row#5,2.csv,4,Jim,9876,not_a_date,jones

Perl: How to add a line to sorted text file

I want to add a line to the text file in perl which has data in a sorted form. I have seen examples which show how to append data at the end of the file, but since I want the data in a sorted format.
Please guide me how can it be done.
Basically from what I have tried so far :
(I open a file, grep its content to see if the line which I want to add to the file already exists. If it does than exit else add it to the file (such that the data remains in a sorted format)
open(my $FH, $file) or die "Failed to open file $file \n";
#file_data = <$FH>;
close($FH);
my $line = grep (/$string1/, #file_data);
if($line) {
print "Found\n";
exit(1);
}
else
{
#add the line to the file
print "Not found!\n";
}
Here's an approach using Tie::File so that you can easily treat the file as an array, and List::BinarySearch's bsearch_str_pos function to quickly find the insert point. Once you've found the insert point, you check to see if the element at that point is equal to your insert string. If it's not, splice it into the array. If it is equal, don't splice it in. And finish up with untie so that the file gets closed cleanly.
use strict;
use warnings;
use Tie::File;
use List::BinarySearch qw(bsearch_str_pos);
my $insert_string = 'Whatever!';
my $file = 'something.txt';
my #array;
tie #array, 'Tie::File', $file or die $!;
my $idx = bsearch_str_pos $insert_string, #array;
splice #array, $idx, 0, $insert_string
if $array[$idx] ne $insert_string;
untie #array;
The bsearch_str_pos function from List::BinarySearch is an adaptation of a binary search implementation from Mastering Algorithms with Perl. Its convenient characteristic is that if the search string isn't found, it returns the index point where it could be inserted while maintaining the sort order.
Since you have to read the contents of the text file anyway, how about a different approach?
Read the lines in the file one-by-one, comparing against your target string. If you read a line equal to the target string, then you don't have to do anything.
Otherwise, you eventually read a line 'greater' than your current line according to your sort criteria, or you hit the end of the file. In the former case, you just insert the string at that position, and then copy the rest of the lines. In the latter case, you append the string to the end.
If you don't want to do it that way, you can do a binary search in #file_data to find the spot to add the line without having to examine all of the entries, then insert it into the array before outputting the array to the file.
Here's a simple version that reads from stdin (or filename(s) specified on command line) and appends 'string to append' to the output if it's not found in the input. Outuput is printed on stdout.
#! /usr/bin/perl
$found = 0;
$append='string to append';
while(<>) {
$found = 1 if (m/$append/o);
print
}
print "$append\n" unless ($found);;
Modifying it to edit a file in-place (with perl -i) and taking the append string from the command line would be quite simple.
A 'simple' one-liner to insert a line without using any module could be:
perl -ni -le '$insert="lemon"; $eq=($insert cmp $_); if ($eq == 0){$found++}elsif($eq==-1 && !$found){print$insert} print'
giver a list.txt whose context is:
ananas
apple
banana
pear
the output is:
ananas
apple
banana
lemon
pear
{
local ($^I, #ARGV) = ("", $file); # Enable in-place editing of $file
while (<>) {
# If we found the line exactly, bail out without printing it twice
last if $_ eq $insert;
# If we found the place where the line should be, insert it
if ($_ gt $insert) {
print $insert;
print;
last;
}
print;
}
# We've passed the insertion point, now output the rest of the file
print while <>;
}
Essentially the same answer as pavel's, except with a lot of readability added. Note that $insert should already contain a trailing newline.