How to append and the line once the matched pattern is found - perl

I want to search the string in a file and if the search string is found
then I want to replace the three line based on value in curly braces.
I was going through one of solution from stack overflow
Perl - Insert lines after a match is found in a file
is-found-in-a-file
But the things are not working for me
input_file:
abcdef1{3} { 0x55, 0x55, 0x55 }
abcdef2{2} { 0x55, 0x55}
code:
use strict;
use warnings;
my $ipfile = 'input.txt';
open my $my_fh "<", $ipfile or die "Couldn't open input file: $!";
while(<$my_fh>)
{
if (/$abcdef1/)
{
s/abcdef1{3} {\n/abcdef1{3} {\nabcdef1 0x55\nabcdef1 0x55\nabcdef1
0x55\n/gm;
}
}
expected output:
abcdef1 0x55
abcdef1 0x55
abcdef1 0x55
abcdef2 0x55
abcdef2 0x55
Any help with explanation would be grateful.

Note in perlre and RE.info that using $ and { ... } have special meanings within regular expressions. You may not see output because you are missing at least one print statement. The first curly enclosure (ie: {\d+}) could be optional unless you want to validate the length of the series in the second enclosure.
Your loop may look something like:
while (<$my_fh>) {
if (/
^ # beginning of line
([^{]+) # the base pattern captured in $1 ("non-left curly braces")
.* # any number of characters
\{\s*(.*?)\s*\} # the data section surrounded by curlies captured in $2
$ # end of line
/x) # allow whitespace and comments
{
for my $val (split /, /, $2) {
print "$1 $val\n";
}
} else {
print;
}
}
Or more tersely:
while (my $line = <$my_fh>) {
if ($line =~ /^([^{]+).*\{\s*(.*?)\s*\}$/) {
$line = '';
$line .= "$1 $_\n" for split /, /, $2;
}
print $line;
}
The ? in the pattern .*? indicates a non-greedy match. In this case, it avoids matching the whitespace next to the second right curly brace.

Related

find a match and replace next line in perl

I am working on the perl script and need some help with it. The requirement is, I have to find a lable and once the label is found, I have to replace the word in a line immediately following the label. for Example, if the label is ABC:
ABC:
string to be replaced
some other lines
ABC:
string to be replaced
some other lines
ABC:
string to be replaced
I want to write a script to match the label (ABC) and once the label is found, replace a word in the next line immediately following the label.
Here is my attempt:
open(my $fh, "<", "file1.txt") or die "cannot open file:$!";
while (my $line = <$fh>))
{
next if ($line =~ /ABC/) {
$line =~ s/original_string/replaced_string/;
}
else {
$msg = "pattern not found \n ";
print "$msg";
}
}
Is this correct..? Any help will be greatly appreciated.
The following one-liner will do what you need:
perl -pe '++$x and next if /ABC:/; $x-- and s/old/new/ if $x' inFile > outFile
The code sets a flag and gets the next line if the label is found. If the flag is set, it's unset and the substitution is executed.
Hope this helps!
You're doing this in your loop:
next if ($line =~ /ABC/);
So, you're reading the file, if a line contains ABC anywhere in that line, you skip the line. However, for every other line, you do the replacement. In the end, you're replacing the string on all other lines and printing that out, and your not printing out your labels.
Here's what you said:
I have to read the file until I find a line with the label:
Once the label is found
I have to read the next line and replace the word in a line immediately following the label.
So:
You want to read through a file line-by-line.
If a line matches the label
read the next line
replace the text on the line
Print out the line
Following these directions:
use strict;
use warnings; # Hope you're using strict and warnings
use autodie; # Program automatically dies on failed opens. No need to check
use feature qw(say); # Allows you to use say instead of print
open my $fh, "<", "file1.txt"; # Removed parentheses. It's the latest style
while (my $line = <$fh>) {
chomp $line; # Always do a chomp after a read.
if ( $line eq "ABC:" ) { # Use 'eq' to ensure an exact match for your label
say "$line"; # Print out the current line
$line = <$fh> # Read the next line
$line =~ s/old/new/; # Replace that word
}
say "$line"; # Print the line
}
close $fh; # Might as well do it right
Note that when I use say, I don't have to put the \n on the end of the line. Also, by doing my chomp after my read, I can easily match the label without worrying about the \n on the end.
This is done exactly as you said it should be done, but there are a couple of issues. The first is that when we do $line = <$fh>, there's no guarantee we are really reading a line. What if the file ends right there?
Also, it's bad practice to read a file in multiple places. It makes it harder to maintain the program. To get around this issue, we'll use a flag variable. This allows us to know if the line before was a tag or not:
use strict;
use warnings; # Hope you're using strict and warnings
use autodie; # Program automatically dies on failed opens. No need to check
use feature qw(say); # Allows you to use say instead of print
open my $fh, "<", "file1.txt"; # Removed parentheses. It's the latest style
my $tag_found = 0; # Flag isn't set
while (my $line = <$fh>) {
chomp $line; # Always do a chomp after a read.
if ( $line eq "ABC:" ) { # Use 'eq' to ensure an exact match for your label
$tag_found = 1 # We found the tag!
}
if ( $tag_found ) {
$line =~ s/old/new/; # Replace that word
$tag_found = 0; # Reset our flag variable
}
say "$line"; # Print the line
}
close $fh; # Might as well do it right
Of course, I would prefer to eliminate mysterious values. For example, the tag should be a variable or constant. Same with the string you're searching for and the string you're replacing.
You mentioned this was a word, so your regular expression replacement should probably look like this:
$line =~ s/\b$old_word\b/$new_word/;
The \b mark word boundaries. This way, if you're suppose to replace the word cat with dog, you don't get tripped up on a line that says:
The Jeopardy category is "Say what".
You don't want to change category to dogegory.
Your problem is that reading in a file does not work like that. You're doing it line by line, so when your regex tests true, the line you want to change isn't there yet. You can try adding a boolean variable to check if the last line was a label.
#!/usr/bin/perl;
use strict;
use warnings;
my $found;
my $replacement = "Hello";
while(my $line = <>){
if($line =~ /ABC/){
$found = 1;
next;
}
if($found){
$line =~ s/^.*?$/$replacement/;
$found = 0;
print $line, "\n";
}
}
Or you could use File::Slurp and read the whole file into one string:
use File::Slurp;
$x = read_file( "file.txt" );
$x =~ s/^(ABC:\s*$ [\n\r]{1,2}^.*?)to\sbe/$1to was/mgx;
print $x;
using /m to make the ^ and $ match embedded begin/end of lines
x is to allow the space after the $ - there is probably a better way
Yields:
ABC:
string to was replaced
some other lines
ABC:
string to was replaced
some other lines
ABC:
string to was replaced
Also, relying on perl's in-place editing:
use File::Slurp qw(read_file write_file);
use strict;
use warnings;
my $file = 'fakefile1.txt';
# Initialize Fake data
write_file($file, <DATA>);
# Enclosed is the actual code that you're looking for.
# Everything else is just for testing:
{
local #ARGV = $file;
local $^I = '.bac';
while (<>) {
print;
if (/ABC/ && !eof) {
$_ = <>;
s/.*/replaced string/;
print;
}
}
unlink "$file$^I";
}
# Compare new file.
print read_file($file);
1;
__DATA__
ABC:
string to be replaced
some other lines
ABC:
string to be replaced
some other lines
ABC:
string to be replaced
ABC:
outputs
ABC:
replaced string
some other lines
ABC:
replaced string
some other lines
ABC:
replaced string
ABC:

search a group of string in a file is present in another file or not

I"m writing to perl script where basically want to open a file having many strings(one string in one line) and compare each of these strings is present in another file(search file) and print each occurrence of it. I have written the below code for one particular string finding. How can i improve it for list of strings from a file.
open(DATA, "<filetosearch.txt") or die "Couldn't open file filetosearch.txt for reading: $!";
my $find = "word or string to find";
#open FILE, "<signatures.txt";
my #lines = <DATA>;
print "Lined that matched $find\n";
for (#lines) {
if ($_ =~ /$find/) {
print "$_\n";
}
}
I'd try something like this:
use strict;
use warnings;
use Tie::File;
tie my #lines, 'Tie::File', 'filetosearch.txt';
my #matched;
my #result;
tie my #patterns, 'Tie::File', 'patterns.txt';
foreach my $pattern (#patterns)
{
$pattern = quotemeta $pattern;
#matched = grep { /$pattern/ } #lines;
push #result, #matched;
}
I use Tie::File, because it is convenient (not especially in this case, but others), others (perhaps a lot of others?) would disagree, but it is of no importance here
grep is a core function, that is very good at what it does (In my experience)
Ok, something like this will be faster.
sub testmatch
{
my ($find, $linesref)= #_ ;
for ( #$linesref ) { if ( $_ =~ /$find/ ) { return 1 ; } }
return 0 ;
}
{
open(DATA, "<filetosearch.txt") or die "die" ;
my #lines = <DATA> ;
open(SRC, "tests.txt") ;
while (<SRC>)
{
if ( testmatch( $_, \#lines )) { print "a match\n" }
}
}
If its matching full line to full line, you can pack the one line in as keys to a hash and just test existance:
{
open(DATA, "<filetosearch.txt") or die "die" ;
my %lines ;
#lines{<DATA>}= undef ;
open(SRC, "tests.txt") ;
while (<SRC>)
{
if ($_ ~~ %lines) { print "a match\n" }
}
}
maybe something like this will do the job:
open FILE1, "filetosearch.txt";
my #arrFileToSearch = <FILE1>;
close FILE1;
open FILE2, "signatures.txt";
my #arrSignatures = <FILE2>;
close FILE2;
for(my $i = 0; defined($arrFileToSearch[$i]);$i++){
foreach my $signature(#arrSignatures){
chomp($signature);
$signature = quotemeta($signature);#to be sure you are escaping special characters
if($arrFileToSearch[$i] =~ /$signature/){
print $arrFileToSearch[$i-3];#or any other index that you want
}
}
}
Here's another option:
use strict;
use warnings;
my $searchFile = pop;
my #strings = map { chomp; "\Q$_\E" } <>;
my $regex = '(?:' . ( join '|', #strings ) . ')';
push #ARGV, $searchFile;
while (<>) {
print if /$regex/;
}
Usage: perl script.pl strings.txt searchFile.txt [>outFile.txt]
The last, optional parameter directs output to a file.
First, the search file's name is (implicitly) popped off #ARGV and saved for later. Then the strings' file is read (<>) and map is used to chomp each line, escape meta-characters (the \Q and \E, in case there may be regex chars, e.g., a '.' or '*' etc., in the string) then these lines are passed to an array. The array's elements are joined with the regex alternation character (|) to effectively form an OR statement of all the strings that will be matched against each of the search file's lines. Next, the search file's name is pushed onto #ARGV so its lines can be searched. Again, each line is chomped and printed if one of the strings are found on the line.
Hope this helps!

Print only the first word in line

I need some help with following perl code.
#!perl -w
use strict;
use warnings;
open my $file, '<', 'ubb' or die $1;
my $spool = 0;
my #matchingLines;
while (<$file>) {
if (/GROUPS/i) {
$spool = 1;
next;
}
elsif (/SERVERS/i) {
$spool = 0;
print map { "$_" } #matchingLines;
#matchingLines = ();
}
if ($spool) {
push (#matchingLines, $_);
}
}
close ($file);
Output from that is shown below.
ADM LMID=GW_S4_1_PM,GW_S4_2_BM
GRPNO=1
ADM_TMS LMID=GW_S4_1_PM,GW_S4_2_BM
GRPNO=2
TMSNAME=TMS
ADM_1 LMID=GW_S4_1_PM
GRPNO=11
ADM_2 LMID=GW_S4_2_BM
GRPNO=12
DMWSG_Gateway_1 LMID=GW_S4_1_PM
GRPNO=101
ENVFILE="../GW_S4.Gateway.envfile"
DMWSG_Gateway_2 LMID=GW_S4_2_BM
GRPNO=201
ENVFILE="../GW_S4.Gateway.envfile"
DMWSG_1 LMID=GW_S4_1_PM
GRPNO=106
DMWSG_2 LMID=GW_S4_2_BM
GRPNO=206
But I only would like to get the first word of each line (e.g. ADM, ADM_TMS, ADM_1).
Note that the file has a lot of other lines above and below what's printed here. I only want to do this for lines that is in between GROUPS and SERVERS.
I would suggest 2 changes in your code
Note: Tested these with your sample data (plus other stuff) in your question.
I: Extract first word before push
Change this
push (#matchingLines, $_);
to
push (#matchingLines, /^(\S+)/);
This would push the first word of each line into the array, instead of the entire line.
Note that /^(\S+)/ is shorthand for $_ =~ /^(\S+)/. If you're using an explicit loop variable like in 7stud's answer, you can't use this shorthand, use the explicit syntax instead, say $line =~ /^(\S+)/ or whatever your loop variable is.
Of course, you can also use split function as suggested in 7stud's answer.
II: Change how you print
Change this
print map { "$_" } #matchingLines;
into
local $" = "\n";
print "#matchingLines \n";
$" specifies the delimiter used for list elements when the array is printed with print or say inside double quotes.
Alternatively, as per TLP's suggestion,
$\ = $/;
print for #lines;
or
print join("\n", #lines), "\n"
Note that $/ is the input record separator (newline by default), $\ is the output record separator (undefined by default). $\ is appended after each print command.
For more information on $/, $\, and $":
See perldoc perlvar (just use CTRL+F to find them in that page)
Or you can simply use perldoc -v '$/' etc on your console to get those information.
Note on readability
I don't think implicit regex matching i.e. /pattern/ is bad per se.
But matching against a variable, i.e. $variable =~ /pattern/ is more readable (as in you can immediately see there's a regex matching going on) and more beginner-friendly, at the cost of conciseness.
use strict;
use warnings;
use 5.014; #say()
my $fname = 'data.txt';
open my $INFILE, '<', $fname
or die "Couldn't open $fname: $!"; #-->Not $1"
my $recording_on = 0;
my #matching_lines;
for my $line (<$INFILE>) {
if ($line =~ /groups/i) {
$recording_on = 1;
next;
}
elsif ($line =~ /servers/i) {
say for #matching_lines; #say() is the same as print(), but it adds a newline at the end
#matching_lines = ();
$recording_on = 0;
}
if ($recording_on) {
my ($first_word, $trash) = split " ", $line, 2;
push #matching_lines, $first_word;
}
}
close $INFILE;
You can use the flip-flop operator (range) to select a part of your input. The idea of this operator is that it returns false until its LHS (left hand side) returns true, and after that it returns true until its RHS returns false, after which it is reset. It is somewhat like preserving a state.
Note that the edge lines are also included in the match, so we need to remove those. After that, use doubleDown's idea and push /^(\S+)/ onto an array. The nice thing about using this with push is that the capture regex returns an empty list if it fails, and this gives us a warning-free failure when the regex does not match.
use strict;
use warnings;
my #matches;
while (<>) {
if (/GROUPS/i .. /SERVERS/i) { # flip-flop remembers the matches
next if (/GROUPS/i or /SERVERS/i);
push #matches, /^(\S+)/;
}
}
# #matches should now contain the first words of those lines

Perl --Change in all matches in previous lines according to match in current line

File I want to parse:
input Pattern;
input SDF;
input ABC
input Pattern;
output Pattern;
output XYZ;
In perl, usual operation is scan line by line.
I want to check that if
current line has output Pattern; and previous line (or all previous lines)has input Pattern;
then change all the previous lines matches to "input Pattern 2;" and current line to "output Pattern2;".
It is complicated ,I hope I have explained properly.
Is it possible in Perl to scan and change previous lines after they have been read?
Thanks
If this is your data:
my $sfile =
'input Pattern;
input SDF;
input ABC
input Pattern;
output Pattern;
output XYZ;' ;
then, the following snippet will read the whole file and change text accordingly:
open my $fh, '<', \$sfile or die $!;
local $/ = undef; # set file input mode to 'slurp'
my $content = <$fh>;
close $fh;
$content =~ s{ ( # open capture group
input \s+ (Pattern); # find occurence of input pattern
.+? # skip some text
output \s+ \2 # find same for output
) # close capture group
}
{ # replace by evaluated expression
do{ # within a do block
local $_=$1; # get whole match to $_
s/($2)/$1 2/g; # substitute Pattern by Pattern 2
$_ # return substituted text
} # close do block
}esgx;
Then, you may close your file and check the string:
print $content;
=>
input Pattern 2;
input SDF;
input ABC
input Pattern 2;
output Pattern 2;
output XYZ;
You may even include a counter $n which will be incremented after each successful match (by code assertion (?{ ... }):
our $n = 1;
$content =~ s{ ( # open capture group
input \s+ (Pattern); # find occurence of input pattern
.+? # skip some text
output \s+ \2 # find same for output
) # close capture group
(?{ $n++ }) # ! update match count
}
{ # replace by evaluated expression
do{ # within a do block
local $_=$1; # get whole match to $_
s/($2)/$1 $n/g; # substitute Pattern by Pattern and count
$_ # return substituted text
} # close do block
}esgx;
The substitution will now start with input Pattern 2; und increment subsequently.
I think this will do what you need, but try it on a 'scratch' file first (a copy of the original) since it actually changes the file:
use Modern::Perl;
open my $fh_in, '<', 'parseThis.txt' or die $!;
my #fileLines = <$fh_in>;
close $fh_in;
for ( my $i = 1 ; $i < scalar #fileLines ; $i++ ) {
next
if $fileLines[$i] !~ /output Pattern;/
and $fileLines[ $i - 1 ] !~ /input Pattern;/;
$fileLines[$i] =~ s/output Pattern;/output Pattern2;/g;
$fileLines[$_] =~ s/input Pattern;/input Pattern 2;/g for 0 .. $i - 1;
}
open my $fh_out, '>', 'parseThis.txt' or die $!;
print $fh_out #fileLines;
close $fh_out;
Results:
input Pattern 2;
input SDF;
input ABC;
input Pattern 2;
output Pattern2;
output XYZ;
Hope this helps!
#!/usr/bin/env perl
$in1 = 'input Pattern';
$in2 = 'input Pattern2';
$out1 = 'output Pattern';
$out2 = 'output Pattern2';
undef $/;
$_ = <DATA>;
if (/^$in1\b.*?^$out1\b/gms) {
s/(^$in1\b)(?=.*?^$out1\b)/$in2/gms;
s/^$out1\b/$out2/gms;
}
print;
__DATA__
input Pattern;
input SDF;
input ABC;
input Pattern;
output Pattern;
output XYZ;
Will there be additional "Input pattern1: lines folloring an occurence of "Output Patttern1?"
Are there going to be multiple pattern to search for, or will it just be "If we find Output Pattern1 then perform the replacement?
Will the "output pattern occur multiple times, or just once?
Will there be additional "Input pattern1: lines folloring an occurence of "Output Patttern1?"
I would perform this task in two/mutiple passes:
Pass1 - read the file, looking for the matching output lines, store the line number in memory.
Pass 2 - read the file, and based on the line numbers in the set of matches, perform the replacement on the appropriate Input lines.
So in semi-perlish, untested psuedocode:
my #matches = ();
open $fh, $inputfile, '<';
while (<$fh>) {
if (/Pattern1/) {
push #matches, $.;
}
}
close $fh;
open $fh, $inputfile, '<';
while (<$fh>) {
if ($. <= $matches[-1]) {
s/Input Pattern1/Input Pattern2/;
print ;
}
else {
pop #matches);
last unless #matches;
}
}
close $fh;
You run this like:
$ replace_pattern.pl input_file > output_file
You'll need to adjust it a little to meet your exact needs, but that should get you close.
You cannot go back and change lines in Perl. What you can do is open the file for the first time in read mode, find out which line has the pattern (say the 5th line), close it before gulping the entire file into an array, open it again in write mode, modify the contents of the array upto the 5th line, dump that array into that file, and close it. Something like this (assuming each file will have at most one output pattern):
my #arr;
my #files = ();
while (<>) {
if ($. == 0) {
$curindex = undef;
#lines = ();
push #files, $ARGV;
}
push #lines, $_;
if (/output pattern/) { $curindex = $. }
if (eof) {
push #arr, [\#lines, $curindex];
close $ARGV;
}
}
for $file (#files) {
open file, "> $file";
#currentfiledetails = #{ $arr[$currentfilenumber++] };
#currentcontents = #{ $currentfiledetails[0] };
$currentoutputmarker = $currentfiledetails[1];
if ($currentoutputmarker) {
for (0 .. $currentoutputmarker - 2) {
$currentcontents[$_] =~ s/input pattern/input pattern2/g;
}
$currentcontents[$currentoutputmarker - 1] =~
s/output pattern/output pattern2/g;
}
print file for #currentcontents;
close file;
}

Neatest way to remove linebreaks in Perl

I'm maintaining a script that can get its input from various sources, and works on it per line. Depending on the actual source used, linebreaks might be Unix-style, Windows-style or even, for some aggregated input, mixed(!).
When reading from a file it goes something like this:
#lines = <IN>;
process(\#lines);
...
sub process {
#lines = shift;
foreach my $line (#{$lines}) {
chomp $line;
#Handle line by line
}
}
So, what I need to do is replace the chomp with something that removes either Unix-style or Windows-style linebreaks.
I'm coming up with way too many ways of solving this, one of the usual drawbacks of Perl :)
What's your opinion on the neatest way to chomp off generic linebreaks? What would be the most efficient?
Edit: A small clarification - the method 'process' gets a list of lines from somewhere, not nessecarily read from a file. Each line might have
No trailing linebreaks
Unix-style linebreaks
Windows-style linebreaks
Just Carriage-Return (when original data has Windows-style linebreaks and is read with $/ = '\n')
An aggregated set where lines have different styles
After digging a bit through the perlre docs a bit, I'll present my best suggestion so far that seems to work pretty good. Perl 5.10 added the \R character class as a generalized linebreak:
$line =~ s/\R//g;
It's the same as:
(?>\x0D\x0A?|[\x0A-\x0C\x85\x{2028}\x{2029}])
I'll keep this question open a while yet, just to see if there's more nifty ways waiting to be suggested.
Whenever I go through input and want to remove or replace characters I run it through little subroutines like this one.
sub clean {
my $text = shift;
$text =~ s/\n//g;
$text =~ s/\r//g;
return $text;
}
It may not be fancy but this method has been working flawless for me for years.
$line =~ s/[\r\n]+//g;
Reading perlport I'd suggest something like
$line =~ s/\015?\012?$//;
to be safe for whatever platform you're on and whatever linefeed style you may be processing because what's in \r and \n may differ through different Perl flavours.
Note from 2017: File::Slurp is not recommended due to design mistakes and unmaintained errors. Use File::Slurper or Path::Tiny instead.
extending on your answer
use File::Slurp ();
my $value = File::Slurp::slurp($filename);
$value =~ s/\R*//g;
File::Slurp abstracts away the File IO stuff and just returns a string for you.
NOTE
Important to note the addition of /g , without it, given a multi-line string, it will only replace the first offending character.
Also, the removal of $, which is redundant for this purpose, as we want to strip all line breaks, not just line-breaks before whatever is meant by $ on this OS.
In a multi-line string, $ matches the end of the string and that would be problematic ).
Point 3 means that point 2 is made with the assumption that you'd also want to use /m otherwise '$' would be basically meaningless for anything practical in a string with >1 lines, or, doing single line processing, an OS which actually understands $ and manages to find the \R* that proceed the $
Examples
while( my $line = <$foo> ){
$line =~ $regex;
}
Given the above notation, an OS which does not understand whatever your files '\n' or '\r' delimiters, in the default scenario with the OS's default delimiter set for $/ will result in reading your whole file as one contiguous string ( unless your string has the $OS's delimiters in it, where it will delimit by that )
So in this case all of these regex are useless:
/\R*$// : Will only erase the last sequence of \R in the file
/\R*// : Will only erase the first sequence of \R in the file
/\012?\015?// : When will only erase the first 012\015 , \012 , or \015 sequence, \015\012 will result in either \012 or \015 being emitted.
/\R*$// : If there happens to be no byte sequences of '\015$OSDELIMITER' in the file, then then NO linebreaks will be removed except for the OS's own ones.
It would appear nobody gets what I'm talking about, so here is example code, that is tested to NOT remove line feeds. Run it, you'll see that it leaves the linefeeds in.
#!/usr/bin/perl
use strict;
use warnings;
my $fn = 'TestFile.txt';
my $LF = "\012";
my $CR = "\015";
my $UnixNL = $LF;
my $DOSNL = $CR . $LF;
my $MacNL = $CR;
sub generate {
my $filename = shift;
my $lineDelimiter = shift;
open my $fh, '>', $filename;
for ( 0 .. 10 )
{
print $fh "{0}";
print $fh join "", map { chr( int( rand(26) + 60 ) ) } 0 .. 20;
print $fh "{1}";
print $fh $lineDelimiter->();
print $fh "{2}";
}
close $fh;
}
sub parse {
my $filename = shift;
my $osDelimiter = shift;
my $message = shift;
print "Parsing $message File $filename : \n";
local $/ = $osDelimiter;
open my $fh, '<', $filename;
while ( my $line = <$fh> )
{
$line =~ s/\R*$//;
print ">|" . $line . "|<";
}
print "Done.\n\n";
}
my #all = ( $DOSNL,$MacNL,$UnixNL);
generate 'Windows.txt' , sub { $DOSNL };
generate 'Mac.txt' , sub { $MacNL };
generate 'Unix.txt', sub { $UnixNL };
generate 'Mixed.txt', sub {
return #all[ int(rand(2)) ];
};
for my $os ( ["$MacNL", "On Mac"], ["$DOSNL", "On Windows"], ["$UnixNL", "On Unix"]){
for ( qw( Windows Mac Unix Mixed ) ){
parse $_ . ".txt", #{ $os };
}
}
For the CLEARLY Unprocessed output, see here: http://pastebin.com/f2c063d74
Note there are certain combinations that of course work, but they are likely the ones you yourself naĆ­vely tested.
Note that in this output, all results must be of the form >|$string|<>|$string|< with NO LINE FEEDS to be considered valid output.
and $string is of the general form {0}$data{1}$delimiter{2} where in all output sources, there should be either :
Nothing between {1} and {2}
only |<>| between {1} and {2}
In your example, you can just go:
chomp(#lines);
Or:
$_=join("", #lines);
s/[\r\n]+//g;
Or:
#lines = split /[\r\n]+/, join("", #lines);
Using these directly on a file:
perl -e '$_=join("",<>); s/[\r\n]+//g; print' <a.txt |less
perl -e 'chomp(#a=<>);print #a' <a.txt |less
To extend Ted Cambron's answer above and something that hasn't been addressed here: If you remove all line breaks indiscriminately from a chunk of entered text, you will end up with paragraphs running into each other without spaces when you output that text later. This is what I use:
sub cleanLines{
my $text = shift;
$text =~ s/\r/ /; #replace \r with space
$text =~ s/\n/ /; #replace \n with space
$text =~ s/ / /g; #replace double-spaces with single space
return $text;
}
The last substitution uses the g 'greedy' modifier so it continues to find double-spaces until it replaces them all. (Effectively substituting anything more that single space)