I'm trying to read contents from an input file, copy only certain lines of code from the file and print in an output file.
Certain lines of code is determined by:
Code name to determine the first line (IP1_NAME or IP2_NAME)
Pattern to determine the last line (END_OF_LIST)
Input file:
IP1_NAME
/ip1name/ip1dir/ //CLIENT_NAME/ip1name/ip1dir
/ip1testname/ip1testdir/ //CLIENT_NAME/ip1testname/ip1testdir
END_OF_LIST
IP2_NAME
/ip2name/ip2dir/ //CLIENT_NAME/ip2name/ip2dir
/ip2testname/ip2testdir/ //CLIENT_NAME/ip2testname/ip2testdir
END_OF_LIST
Output file:
(If IP1_NAME is chosen and the CLIENT_NAME should be replaced by tester_ip)
/ip1name/ip1dir/ //tester_ip/ip1name/ip1dir
/ip1testname/ip1testdir/ //tester_ip/ip1testname/ip1testdir
You could use the following one-liner to pull out the lines between the two patterns:
perl -0777 -ne 'print "$1\n" while /IP1_NAME(.*?)END_OF_LIST/gs' in.txt > out.txt
Where in.txt is your input file and out.txt is the output file.
This use case is actually described in perlfaq6: Regular Expressions.
You can then modify the output file to replace CLIENT_NAME with tester_ip:
perl -pi -e 's/CLIENT_NAME/tester_ip/' y.txt
As a script instead of a one-liner, using the scalar range operator:
#/usr/bin/env perl
use warnings;
use strict;
use autodie;
use feature qw/say/;
process('input.txt', qr/^IP1_NAME$/, qr/^END_OF_LIST$/, 'tester_ip');
sub process {
my ($filename, $startpat, $endpat, $newip) = #_;
open my $file, '<', $filename;
while (my $line = <$file>) {
chomp $line;
if ($line =~ /$startpat/ .. $line =~ /$endpat/) {
next unless $line =~ /^\s/; # Skip the start and lines.
$line =~ s/^\s+//; # Remove indentation
$line =~ s/CLIENT_NAME/$newip/g; # Replace with desired value
say $line;
}
}
}
Running this on your sample input file produces:
/ip1name/ip1dir/ //tester_ip/ip1name/ip1dir
/ip1testname/ip1testdir/ //tester_ip/ip1testname/ip1testdir
I am assuming there is additional stuff in your input file, otherwise we would not have to jump through the hoops with these start and end markers as and we could just say
perl -ne "print if /^ /"
and that would be silly, right ;-)
So, the flipflop has potential problems as I stated in my comment. And while clever, it does not buy you that much in terms of readability or verbosement (verbocity?), since you have to test again anyway in order to not process the marker lines.
As long as there is no exclusive flip flop operator, I would go for a more robust solution.
my $in;
while (<DATA>) {
$in = 1, next if /^IP\d_NAME/;
$in = 0 if /^END_OF_LIST/;
if ( $in )
{
s/CLIENT_NAME/tester_ip/;
print;
}
}
__DATA__
cruft
IP1_NAME
/ip1name/ip1dir/ //CLIENT_NAME/ip1name/ip1dir
/ip1testname/ip1testdir/ //CLIENT_NAME/ip1testname/ip1testdir
END_OF_LIST
more
cruft
IP2_NAME
/ip2name/ip2dir/ //CLIENT_NAME/ip2name/ip2dir
/ip2testname/ip2testdir/ //CLIENT_NAME/ip2testname/ip2testdir
END_OF_LIST
Lore Ipsargh!
Related
I am working on the perl script and need some help with it. The requirement is, I have to find a lable and once the label is found, I have to replace the word in a line immediately following the label. for Example, if the label is ABC:
ABC:
string to be replaced
some other lines
ABC:
string to be replaced
some other lines
ABC:
string to be replaced
I want to write a script to match the label (ABC) and once the label is found, replace a word in the next line immediately following the label.
Here is my attempt:
open(my $fh, "<", "file1.txt") or die "cannot open file:$!";
while (my $line = <$fh>))
{
next if ($line =~ /ABC/) {
$line =~ s/original_string/replaced_string/;
}
else {
$msg = "pattern not found \n ";
print "$msg";
}
}
Is this correct..? Any help will be greatly appreciated.
The following one-liner will do what you need:
perl -pe '++$x and next if /ABC:/; $x-- and s/old/new/ if $x' inFile > outFile
The code sets a flag and gets the next line if the label is found. If the flag is set, it's unset and the substitution is executed.
Hope this helps!
You're doing this in your loop:
next if ($line =~ /ABC/);
So, you're reading the file, if a line contains ABC anywhere in that line, you skip the line. However, for every other line, you do the replacement. In the end, you're replacing the string on all other lines and printing that out, and your not printing out your labels.
Here's what you said:
I have to read the file until I find a line with the label:
Once the label is found
I have to read the next line and replace the word in a line immediately following the label.
So:
You want to read through a file line-by-line.
If a line matches the label
read the next line
replace the text on the line
Print out the line
Following these directions:
use strict;
use warnings; # Hope you're using strict and warnings
use autodie; # Program automatically dies on failed opens. No need to check
use feature qw(say); # Allows you to use say instead of print
open my $fh, "<", "file1.txt"; # Removed parentheses. It's the latest style
while (my $line = <$fh>) {
chomp $line; # Always do a chomp after a read.
if ( $line eq "ABC:" ) { # Use 'eq' to ensure an exact match for your label
say "$line"; # Print out the current line
$line = <$fh> # Read the next line
$line =~ s/old/new/; # Replace that word
}
say "$line"; # Print the line
}
close $fh; # Might as well do it right
Note that when I use say, I don't have to put the \n on the end of the line. Also, by doing my chomp after my read, I can easily match the label without worrying about the \n on the end.
This is done exactly as you said it should be done, but there are a couple of issues. The first is that when we do $line = <$fh>, there's no guarantee we are really reading a line. What if the file ends right there?
Also, it's bad practice to read a file in multiple places. It makes it harder to maintain the program. To get around this issue, we'll use a flag variable. This allows us to know if the line before was a tag or not:
use strict;
use warnings; # Hope you're using strict and warnings
use autodie; # Program automatically dies on failed opens. No need to check
use feature qw(say); # Allows you to use say instead of print
open my $fh, "<", "file1.txt"; # Removed parentheses. It's the latest style
my $tag_found = 0; # Flag isn't set
while (my $line = <$fh>) {
chomp $line; # Always do a chomp after a read.
if ( $line eq "ABC:" ) { # Use 'eq' to ensure an exact match for your label
$tag_found = 1 # We found the tag!
}
if ( $tag_found ) {
$line =~ s/old/new/; # Replace that word
$tag_found = 0; # Reset our flag variable
}
say "$line"; # Print the line
}
close $fh; # Might as well do it right
Of course, I would prefer to eliminate mysterious values. For example, the tag should be a variable or constant. Same with the string you're searching for and the string you're replacing.
You mentioned this was a word, so your regular expression replacement should probably look like this:
$line =~ s/\b$old_word\b/$new_word/;
The \b mark word boundaries. This way, if you're suppose to replace the word cat with dog, you don't get tripped up on a line that says:
The Jeopardy category is "Say what".
You don't want to change category to dogegory.
Your problem is that reading in a file does not work like that. You're doing it line by line, so when your regex tests true, the line you want to change isn't there yet. You can try adding a boolean variable to check if the last line was a label.
#!/usr/bin/perl;
use strict;
use warnings;
my $found;
my $replacement = "Hello";
while(my $line = <>){
if($line =~ /ABC/){
$found = 1;
next;
}
if($found){
$line =~ s/^.*?$/$replacement/;
$found = 0;
print $line, "\n";
}
}
Or you could use File::Slurp and read the whole file into one string:
use File::Slurp;
$x = read_file( "file.txt" );
$x =~ s/^(ABC:\s*$ [\n\r]{1,2}^.*?)to\sbe/$1to was/mgx;
print $x;
using /m to make the ^ and $ match embedded begin/end of lines
x is to allow the space after the $ - there is probably a better way
Yields:
ABC:
string to was replaced
some other lines
ABC:
string to was replaced
some other lines
ABC:
string to was replaced
Also, relying on perl's in-place editing:
use File::Slurp qw(read_file write_file);
use strict;
use warnings;
my $file = 'fakefile1.txt';
# Initialize Fake data
write_file($file, <DATA>);
# Enclosed is the actual code that you're looking for.
# Everything else is just for testing:
{
local #ARGV = $file;
local $^I = '.bac';
while (<>) {
print;
if (/ABC/ && !eof) {
$_ = <>;
s/.*/replaced string/;
print;
}
}
unlink "$file$^I";
}
# Compare new file.
print read_file($file);
1;
__DATA__
ABC:
string to be replaced
some other lines
ABC:
string to be replaced
some other lines
ABC:
string to be replaced
ABC:
outputs
ABC:
replaced string
some other lines
ABC:
replaced string
some other lines
ABC:
replaced string
ABC:
I'm reading this textfile to get ONLY the words in it and ignore all kind of whitespaces:
hello
now
do you see this.sadslkd.das,msdlsa but
i hoohoh
And this is my Perl code:
#!usr/bin/perl -w
require 5.004;
open F1, './text.txt';
while ($line = <F1>) {
#print $line;
#arr = split /\s+/, $line;
foreach $w (#arr) {
if ($w !~ /^\s+$/) {
print $w."\n";
}
}
#print #arr;
}
close F1;
And this is the output:
hello
now
do
you
see
this.sadslkd.das,msdlsa
but
i
hoohoh
The output is showing two newlines but I am expecting the output to be just words. What should I do to just get words?
You should always use strict and use warnings (in preference to the -w command-line qualifier) at the top of every Perl program, and declare each variable at its first point of use using my. That way Perl will tell you about simple errors that you may otherwise overlook.
You should also use lexical file handles with the three-parameter form of open, and check the status to make sure it succeeded. There is little point in explicitly closing an input file unless you expect your program to run for an appreciable time, as Perl will close all files for you on exit.
Do you really need to require Perl v5.4? That version is fifteen years old, and if there is anything older than that installed then you have a museum!
Your program would be better like this:
use strict;
use warnings;
open my $fh, '<', './text.txt' or die $!;
while (my $line = <$fh>) {
my #arr = split /\s+/, $line;
foreach my $w (#arr) {
if ($w !~ /^\s+$/) {
print $w."\n";
}
}
}
Note: my apologies. The warnings pragma and lexical file handles were introduced only in v5.6 so that part of my answer is irrelevant. The latest version of Perl is v5.16 and you really should upgrade
As Birei has pointed out, the problem is that, when the line has leading whitespace, there is a empty field before the first separator. Imagine if your data was comma-separated, then you would want Perl to report a leading empty field if the line started with a comma.
To extract all the non-space characters you can use a regular expression that does exactly that
my #arr = $line =~ /\S+/g;
and this can be emulated by using the default parameter for split which is a single quoted space (not a regular expression)
my #arr = $line =~ split ' ', $line;
In this case split behaves like the awk utility and discards any leading empty fields as you expected.
This is even simpler if you let Perl use the $_ variable in the read loop, as all of the parameters for split can be defaulted:
while (<F1>) {
my #arr = split;
foreach my $w (#arr) {
print "$w\n" if $w !~ /^\s+$/;
}
}
This line is the problem:
#arr=split(/\s+/,$line);
\s+ does a match just before the leading spaces. Use ' ' instead.
#arr=split(' ',$line);
I believe that in this line:
if(!($w =~ /^\s+$/))
You wanted to ask if there's nothing in this row - don't print it.
But the "+" in the REGEX actually force it to have at least 1 space.
If you change the "\s+" to "\s*", you'll see that it's working. because * is 0 occurrences or more ...
I have a text file to parse in Perl. I parse it from the start of file and get the data that is needed.
After all that is done I want to read the last line in the file with data. The problem is that the last two lines are blank. So how do I get the last line that holds any data?
If the file is relatively short, just read on from where you finished getting the data, keeping the last non-blank line:
use autodie ':io';
open(my $fh, '<', 'file_to_read.txt');
# get the data that is needed, then:
my $last_non_blank_line;
while (my $line = readline $fh) {
# choose one of the following two lines, depending what you meant
if ( $line =~ /\S/ ) { $last_non_blank_line = $line } # line isn't all whitespace
# if ( line !~ /^$/ ) { $last_non_blank_line = $line } # line has no characters before the newline
}
If the file is longer, or you may have passed the last non-blank line in your initial data gathering step, reopen it and read from the end:
my $backwards = File::ReadBackwards->new( 'file_to_read.txt' );
my $last_non_blank_line;
do {
$last_non_blank_line = $backwards->readline;
} until ! defined $last_non_blank_line || $last_non_blank_line =~ /\S/;
perl -e 'while (<>) { if ($_) {$last = $_;} } print $last;' < my_file.txt
You can use the module File::ReadBackwards in the following way:
use File::ReadBackwards ;
$bw = File::ReadBackwards->new('filepath') or
die "can't read file";
while( defined( $log_line = $bw->readline ) ) {
print $log_line ;
exit 0;
}
If they're blank, just check $log_line for a match with \n;
If the file is small, I would store it in an array and read from the end. If its large, use File::ReadBackwards module.
Here's my variant of command line perl solution:
perl -ne 'END {print $last} $last= $_ if /\S/' file.txt
No one mentioned Path::Tiny. If the file size is relativity small you can do this:
use Path::Tiny;
my $file = path($file_name);
my ($last_line) = $file->lines({count => -1});
CPAN page.
Just remember for the large file, just as #ysth said it's better to use File::ReadBackwards. The difference can be substantial.
sometimes it is more comfortable for me to run shell commands from perl code. so I'd prefer following code to resolve the case:
$result=`tail -n 1 /path/file`;
I have several text files, that were once tables in a database, which is now disassembled. I'm trying to reassemble them, which will be easy, once I get them into a usable form. The first file, "keys.text" is just a list of labels, inconsistently formatted. Like:
Sa 1 #
Sa 2
U 328 #*
It's always letter(s), [space], number(s), [space], and sometime symbol(s). The text files that match these keys are the same, then followed by a line of text, also separated, or delimited, by a SPACE.
Sa 1 # Random line of text follows.
Sa 2 This text is just as random.
U 328 #* Continuing text...
What I'm trying to do in the code below, is match the key from "keys.text", with the same key in the .txt files, and put a tab between the key, and the text. I'm sure I'm overlooking something very basic, but the result I'm getting, looks identical to the source .txt file.
Thanks in advance for any leads or assistance!
#!/usr/bin/perl
use strict;
use warnings;
use diagnostics;
open(IN1, "keys.text");
my $key;
# Read each line one at a time
while ($key = <IN1>) {
# For each txt file in the current directory
foreach my $file (<*.txt>) {
open(IN, $file) or die("Cannot open TXT file for reading: $!");
open(OUT, ">temp.txt") or die("Cannot open output file: $!");
# Add temp modified file into directory
my $newFilename = "modified\/keyed_" . $file;
my $line;
# Read each line one at a time
while ($line = <IN>) {
$line =~ s/"\$key"/"\$key" . "\/t"/;
print(OUT "$line");
}
rename("temp.txt", "$newFilename");
}
}
EDIT: Just to clarify, the results should retain the symbols from the keys as well, if there are any. So they'd look like:
Sa 1 # Random line of text follows.
Sa 2 This text is just as random.
U 328 #* Continuing text...
The regex seems quoted rather oddly to me. Wouldn't
$line =~ s/$key/$key\t/;
work better?
Also, IIRC, <IN1> will leave the newline on the end of your $key. chomp $key to get rid of that.
And don't put parentheses around your print args, esp when you're writing to a file handle. It looks wrong, whether it is or not, and distracts people from the real problems.
if Perl is not a must, you can use this awk one liner
$ cat keys.txt
Sa 1 #
Sa 2
U 328 #*
$ cat mytext.txt
Sa 1 # Random line of text follows.
Sa 2 This text is just as random.
U 328 #* Continuing text...
$ awk 'FNR==NR{ k[$1 SEP $2];next }($1 SEP $2 in k) {$2=$2"\t"}1 ' keys.txt mytext.txt
Sa 1 # Random line of text follows.
Sa 2 This text is just as random.
U 328 #* Continuing text...
Using split rather than s/// makes the problem straightforward. In the code below, read_keys extracts the keys from keys.text and records them in a hash.
Then for all files named on the command line, available in the special Perl array #ARGV, we inspect each line to see whether it begins with a key. If not, we leave it alone, but otherwise insert a TAB between the key and the text.
Note that we edit the files in-place thanks to Perl's handy -i option:
-i[extension]
specifies that files processed by the <> construct are to be edited in-place. It does this by renaming the input file, opening the output file by the original name, and selecting that output file as the default for print statements. The extension, if supplied, is used to modify the name of the old file to make a backup copy …
The line split " ", $_, 3 separates the current line into exactly three fields. This is necessary to protect whitespace that's likely to be present in the text portion of the line.
#! /usr/bin/perl -i.bak
use warnings;
use strict;
sub usage { "Usage: $0 text-file\n" }
sub read_keys {
my $path = "keys.text";
open my $fh, "<", $path
or die "$0: open $path: $!";
my %key;
while (<$fh>) {
my($text,$num) = split;
++$key{$text}{$num} if defined $text && defined $num;
}
wantarray ? %key : \%key;
}
die usage unless #ARGV;
my %key = read_keys;
while (<>) {
my($text,$num,$line) = split " ", $_, 3;
$_ = "$text $num\t$line" if defined $text &&
defined $num &&
$key{$text}{$num};
print;
}
Sample run:
$ ./add-tab input
$ diff -u input.bak input
--- input.bak 2010-07-20 20:47:38.688916978 -0500
+++ input 2010-07-20 21:00:21.119531937 -0500
## -1,3 +1,3 ##
-Sa 1 # Random line of text follows.
-Sa 2 This text is just as random.
-U 328 #* Continuing text...
+Sa 1 # Random line of text follows.
+Sa 2 This text is just as random.
+U 328 #* Continuing text...
Fun answers:
$line =~ s/(?<=$key)/\t/;
Where (?<=XXXX) is a zero-width positive lookbehind for XXXX. That means it matches just after XXXX without being part of the match that gets substituted.
And:
$line =~ s/$key/$key . "\t"/e;
Where the /e flag at the end means to do one eval of what's in the second half of the s/// before filling it in.
Important note: I'm not recommending either of these, they obfuscate the program. But they're interesting. :-)
How about doing two separate slurps of each file. For the first file you open the keys and create a preliminary hash. For the second file then all you need to do is add the text to the hash.
use strict;
use warnings;
my $keys_file = "path to keys.txt";
my $content_file = "path to content.txt";
my $output_file = "path to output.txt";
my %hash = ();
my $keys_regex = '^([a-zA-Z]+)\s*\(d+)\s*([^\da-zA-Z\s]+)';
open my $fh, '<', $keys_file or die "could not open $key_file";
while(<$fh>){
my $line = $_;
if ($line =~ /$keys_regex/){
my $key = $1;
my $number = $2;
my $symbol = $3;
$hash{$key}{'number'} = $number;
$hash{$key}{'symbol'} = $symbol;
}
}
close $fh;
open my $fh, '<', $content_file or die "could not open $content_file";
while(<$fh>){
my $line = $_;
if ($line =~ /^([a-zA-Z]+)/){
my $key = $1;
// strip content_file line from keys/number/symbols to leave text
line =~ s/^$key//;
line =~ s/\s*$hash{$key}{'number'}//;
line =~ s/\s*$hash{$key}{'symbol'}//;
$line =~ s/^\s+//g;
$hash{$key}{'text'} = $line;
}
}
close $fh;
open my $fh, '>', $output_file or die "could not open $output_file";
for my $key (keys %hash){
print $fh $key . " " . $hash{$key}{'number'} . " " . $hash{$key}{'symbol'} . "\t" . $hash{$key}{'text'} . "\n";
}
close $fh;
I haven't had a chance to test it yet and the solution seems a little hacky with all the regex but might give you an idea of something else you can try.
This looks like the perfect place for the map function in Perl! Read in the entire text file into an array, then apply the map function across the entire array. The only other thing you might want to do is use the quotemeta function to escape out any possible regular expressions in your keys.
Using map is very efficient. I also read the keys into an array in order to not have to keep opening and closing the keys file in my loop. It's an O^2 algorithm, but if your keys aren't that big, it shouldn't be too bad.
#! /usr/bin/env perl
use strict;
use vars;
use warnings;
open (KEYS, "keys.text")
or die "Cannot open 'keys.text' for reading\n";
my #keys = <KEYS>;
close (KEYS);
foreach my $file (glob("*.txt")) {
open (TEXT, "$file")
or die "Cannot open '$file' for reading\n";
my #textArray = <TEXT>;
close (TEXT);
foreach my $line (#keys) {
chomp $line;
map($_ =~ s/^$line/$line\t/, #textArray);
}
open (NEW_TEXT, ">$file.new") or
die qq(Can't open file "$file" for writing\n);
print TEXT join("\n", #textArray) . "\n";
close (TEXT);
}
I'm maintaining a script that can get its input from various sources, and works on it per line. Depending on the actual source used, linebreaks might be Unix-style, Windows-style or even, for some aggregated input, mixed(!).
When reading from a file it goes something like this:
#lines = <IN>;
process(\#lines);
...
sub process {
#lines = shift;
foreach my $line (#{$lines}) {
chomp $line;
#Handle line by line
}
}
So, what I need to do is replace the chomp with something that removes either Unix-style or Windows-style linebreaks.
I'm coming up with way too many ways of solving this, one of the usual drawbacks of Perl :)
What's your opinion on the neatest way to chomp off generic linebreaks? What would be the most efficient?
Edit: A small clarification - the method 'process' gets a list of lines from somewhere, not nessecarily read from a file. Each line might have
No trailing linebreaks
Unix-style linebreaks
Windows-style linebreaks
Just Carriage-Return (when original data has Windows-style linebreaks and is read with $/ = '\n')
An aggregated set where lines have different styles
After digging a bit through the perlre docs a bit, I'll present my best suggestion so far that seems to work pretty good. Perl 5.10 added the \R character class as a generalized linebreak:
$line =~ s/\R//g;
It's the same as:
(?>\x0D\x0A?|[\x0A-\x0C\x85\x{2028}\x{2029}])
I'll keep this question open a while yet, just to see if there's more nifty ways waiting to be suggested.
Whenever I go through input and want to remove or replace characters I run it through little subroutines like this one.
sub clean {
my $text = shift;
$text =~ s/\n//g;
$text =~ s/\r//g;
return $text;
}
It may not be fancy but this method has been working flawless for me for years.
$line =~ s/[\r\n]+//g;
Reading perlport I'd suggest something like
$line =~ s/\015?\012?$//;
to be safe for whatever platform you're on and whatever linefeed style you may be processing because what's in \r and \n may differ through different Perl flavours.
Note from 2017: File::Slurp is not recommended due to design mistakes and unmaintained errors. Use File::Slurper or Path::Tiny instead.
extending on your answer
use File::Slurp ();
my $value = File::Slurp::slurp($filename);
$value =~ s/\R*//g;
File::Slurp abstracts away the File IO stuff and just returns a string for you.
NOTE
Important to note the addition of /g , without it, given a multi-line string, it will only replace the first offending character.
Also, the removal of $, which is redundant for this purpose, as we want to strip all line breaks, not just line-breaks before whatever is meant by $ on this OS.
In a multi-line string, $ matches the end of the string and that would be problematic ).
Point 3 means that point 2 is made with the assumption that you'd also want to use /m otherwise '$' would be basically meaningless for anything practical in a string with >1 lines, or, doing single line processing, an OS which actually understands $ and manages to find the \R* that proceed the $
Examples
while( my $line = <$foo> ){
$line =~ $regex;
}
Given the above notation, an OS which does not understand whatever your files '\n' or '\r' delimiters, in the default scenario with the OS's default delimiter set for $/ will result in reading your whole file as one contiguous string ( unless your string has the $OS's delimiters in it, where it will delimit by that )
So in this case all of these regex are useless:
/\R*$// : Will only erase the last sequence of \R in the file
/\R*// : Will only erase the first sequence of \R in the file
/\012?\015?// : When will only erase the first 012\015 , \012 , or \015 sequence, \015\012 will result in either \012 or \015 being emitted.
/\R*$// : If there happens to be no byte sequences of '\015$OSDELIMITER' in the file, then then NO linebreaks will be removed except for the OS's own ones.
It would appear nobody gets what I'm talking about, so here is example code, that is tested to NOT remove line feeds. Run it, you'll see that it leaves the linefeeds in.
#!/usr/bin/perl
use strict;
use warnings;
my $fn = 'TestFile.txt';
my $LF = "\012";
my $CR = "\015";
my $UnixNL = $LF;
my $DOSNL = $CR . $LF;
my $MacNL = $CR;
sub generate {
my $filename = shift;
my $lineDelimiter = shift;
open my $fh, '>', $filename;
for ( 0 .. 10 )
{
print $fh "{0}";
print $fh join "", map { chr( int( rand(26) + 60 ) ) } 0 .. 20;
print $fh "{1}";
print $fh $lineDelimiter->();
print $fh "{2}";
}
close $fh;
}
sub parse {
my $filename = shift;
my $osDelimiter = shift;
my $message = shift;
print "Parsing $message File $filename : \n";
local $/ = $osDelimiter;
open my $fh, '<', $filename;
while ( my $line = <$fh> )
{
$line =~ s/\R*$//;
print ">|" . $line . "|<";
}
print "Done.\n\n";
}
my #all = ( $DOSNL,$MacNL,$UnixNL);
generate 'Windows.txt' , sub { $DOSNL };
generate 'Mac.txt' , sub { $MacNL };
generate 'Unix.txt', sub { $UnixNL };
generate 'Mixed.txt', sub {
return #all[ int(rand(2)) ];
};
for my $os ( ["$MacNL", "On Mac"], ["$DOSNL", "On Windows"], ["$UnixNL", "On Unix"]){
for ( qw( Windows Mac Unix Mixed ) ){
parse $_ . ".txt", #{ $os };
}
}
For the CLEARLY Unprocessed output, see here: http://pastebin.com/f2c063d74
Note there are certain combinations that of course work, but they are likely the ones you yourself naívely tested.
Note that in this output, all results must be of the form >|$string|<>|$string|< with NO LINE FEEDS to be considered valid output.
and $string is of the general form {0}$data{1}$delimiter{2} where in all output sources, there should be either :
Nothing between {1} and {2}
only |<>| between {1} and {2}
In your example, you can just go:
chomp(#lines);
Or:
$_=join("", #lines);
s/[\r\n]+//g;
Or:
#lines = split /[\r\n]+/, join("", #lines);
Using these directly on a file:
perl -e '$_=join("",<>); s/[\r\n]+//g; print' <a.txt |less
perl -e 'chomp(#a=<>);print #a' <a.txt |less
To extend Ted Cambron's answer above and something that hasn't been addressed here: If you remove all line breaks indiscriminately from a chunk of entered text, you will end up with paragraphs running into each other without spaces when you output that text later. This is what I use:
sub cleanLines{
my $text = shift;
$text =~ s/\r/ /; #replace \r with space
$text =~ s/\n/ /; #replace \n with space
$text =~ s/ / /g; #replace double-spaces with single space
return $text;
}
The last substitution uses the g 'greedy' modifier so it continues to find double-spaces until it replaces them all. (Effectively substituting anything more that single space)