Why is chomp not removing whitespace around my string? - perl

I don't understand why perl chomp isn't removing the whitespace surrounding my string. I've even tried to call chomp twice, for example, using bash:
$ perl -e 'use 5.22.4; chomp(my $extra=" lol "); chomp($extra); say "<$extra>"'
< lol >
I really expected to get
<lol>

chomp only removes the line ending (the value of $/, the input record separator) from the end of the string. It does not trim the string. Perl has traditionally had no built-in trim function (builtin::trim was added in 5.36, where it was still experimental). I usually spell out two substitutions instead:
s/^\s+//, s/\s+$// for $string;
Further reading:
perldoc -f chomp
Perl FAQ: How do I strip blank space from the beginning/end of a string?
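As a rough illustration of the difference, here is a minimal sketch. The helper name trim_copy is made up for this example and is not a Perl built-in; it returns a trimmed copy using the two substitutions, while chomp is shown removing only the trailing newline.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a trimmed copy and leaves the argument untouched.
sub trim_copy {
    my ($text) = @_;
    s/^\s+//, s/\s+$// for $text;
    return $text;
}

my $line = " lol \n";
chomp(my $chomped = $line);          # removes only the trailing newline
print "<$chomped>\n";                # prints "< lol >"
print "<", trim_copy($line), ">\n";  # prints "<lol>"
```

Returning a copy is a design choice for this sketch; it avoids surprising the caller by changing the original string.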

To remove all whitespace:
$string =~ s/\s+//g;
Left trim:
$string =~ s/^\s+//;
Right Trim:
$string =~ s/\s+$//;
Left and Right trim:
$string =~ s/^\s+|\s+$//g;
We can also build trimming functions. This helps in larger scripts where you do not want to write out the full substitution each time: write it once, then call the function to do the work.
This simple function can be used in any script as trim($string); it changes its argument in place, because @_ aliases the caller's variables.
sub trim {
$_[0] =~ s/^\s+|\s+$//g;
}
Similarly, for a full strip of whitespace:
sub full_strip {
$_[0] =~ s/\s+//g;
}
In a script:
use strict;
use warnings;
my $string = " this is line with leading and trailing whitespaces ";
my $string2 = " another one of those lines ";
trim($string);
trim($string2);
print "$string\n";
print "$string2\n";
full_strip($string);
full_strip($string2);
print "$string\n";
print "$string2\n";
sub trim {
$_[0] =~ s/^\s+|\s+$//g;
}
sub full_strip {
$_[0] =~ s/\s+//g;
}

$string=~s/^\s+|\s+$//g;
This would work well for any generic string where you want to remove beginning and ending spaces.

Related

Unmatched ) in reg when using lc function

I am trying to run the following code:
$lines = "Enjoyable )) DAY";
$lines =~ lc $lines;
print $lines;
It fails on the second line where I get the error mentioned in the title. I understand the brackets are causing the trouble. I think I could use "quotemeta", but the thing is that my string contains info that I go on to process later, so I would like to keep the string intact as far as possible and not tamper with it too much.
You have two problems here.
1. =~ is used to execute a specific set of operations
The =~ operator is used to either match with //, m//, qr// or a string; or to substitute with s/// or tr///.
If all you want to do is lowercase the contents of $lines then you should use = not =~.
$lines = "Enjoyable )) DAY";
$lines = lc $lines;
print $lines;
2. Regular expressions have special characters which must be escaped
If you want to match $lines against a lower case version of $lines, which should return true if $lines was already entirely lower case and false otherwise, then you need to escape the ")" characters.
#!/usr/bin/env perl
use strict;
use warnings;
my $lines = "enjoyable )) day";
if ($lines =~ lc quotemeta $lines) {
print "lines is lower case\n";
}
print $lines;
Note this is a toy example trying to find a reason for doing $lines =~ lc $lines - It would be much better (faster, safer) to solve this with eq as in $lines eq lc $lines.
See perldoc -f quotemeta or http://perldoc.perl.org/functions/quotemeta.html for more details on quotemeta.
=~ is used for regular expressions. "lc" is not part of regex, it's a function like this: $new = lc($old);
I don't recall the regex operator for lowercase, because I use lc() all the time.
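To make the two points concrete, here is a small sketch. The helper name is_lower_case is invented for this example; it uses eq as suggested above, and the last line shows \Q...\E as an inline alternative to quotemeta if a regex match is really wanted.

```perl
use strict;
use warnings;

my $lines = "Enjoyable )) DAY";

# Plain assignment lowercases the string; no regex is involved.
my $lower = lc $lines;
print "$lower\n";    # prints "enjoyable )) day"

# Hypothetical helper: string comparison needs no escaping at all.
sub is_lower_case {
    my ($text) = @_;
    return $text eq lc $text;
}

print is_lower_case($lines) ? "lower\n" : "not lower\n";   # prints "not lower"
print is_lower_case($lower) ? "lower\n" : "not lower\n";   # prints "lower"

# If a regex match is really wanted, \Q...\E quotes metacharacters inline.
print "matches itself\n" if $lower =~ /\A\Q$lower\E\z/;    # prints "matches itself"
```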

A Perl Program Which divides a String by spaces between them?

I want my program to divide the string by the spaces between them
$string = "hello how are you";
The output should look like that:
hello
how
are
you
You can do this in a few different ways.
use strict;
use warnings;
my $string = "hello how are you";
my @first = $string =~ /\S+/g; # regex match on runs of non-whitespace
my @second = split ' ', $string; # split on whitespace
my $third = $string;
$third =~ tr/ /\n/; # copy string, replace each space with a newline
# $third =~ s/ /\n/g; # same thing, but with s///
The first two create arrays with the individual words; the last creates a different single string. If all you want is something to print, the last will suffice. To print an array, do something like:
print "$_\n" for @first;
Notes:
Normally, regex capture requires parentheses /(\S+)/, but when the /g modifier is used, and parentheses are omitted, the entire match is returned.
When using capture this way, you need to assure list context on the assignment. If the left hand parameter is a scalar, you would force list context with parentheses: my ($var) = ...
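The context rules in these notes can be checked with a short sketch. The variable names are only for illustration; the empty-list assignment on the third line is a common idiom for counting matches.

```perl
use strict;
use warnings;

my $string = "hello how are you";

my @words   = $string =~ /\S+/g;       # list context: every match
my ($first) = $string =~ /\S+/g;       # parentheses force list context: first match kept
my $count   = () = $string =~ /\S+/g;  # number of matches, via an empty list assignment

print scalar(@words), "\n";   # prints 4
print "$first\n";             # prints "hello"
print "$count\n";             # prints 4
```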
I like to keep it simple:
$string = "hello how are you";
print $_, "\n" for split ' ', $string;
@Array = split(" ", $string); then @Array contains the answer.
You need a split for dividing the string by spaces like
use strict;
my $string = "hello how are you";
my @substr = split(' ', $string); # split the string on whitespace
{
local $, = "\n"; # set the output field separator so each value prints on its own line
print @substr;
}
Output:
hello
how
are
you
Split with regexp to account for extra spaces if any:
my $string = "hello how are you";
my @words = split /\s+/, $string; ## account for extra spaces if any
print join "\n", @words;
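One difference between the two split forms used in these answers is worth seeing: with leading whitespace, split ' ' skips it, while split /\s+/ produces an empty first field. A minimal sketch, with a made-up padded input string:

```perl
use strict;
use warnings;

my $padded = "  hello   how are you ";

my @awk_mode = split ' ', $padded;     # leading whitespace is skipped
my @regex    = split /\s+/, $padded;   # leading whitespace yields an empty first field

print scalar(@awk_mode), "\n";   # prints 4
print scalar(@regex), "\n";      # prints 5
print "[$regex[0]]\n";           # prints "[]"
```

If the input may start with whitespace, split ' ' is usually the safer choice.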

Check for spaces in perl using regex match in perl

I have a variable. How do I use a regex in Perl to check whether a string has spaces in it or not? For example:
$test = "abc small ThisIsAVeryLongUnbreakableStringWhichIsBiggerThan20Characters";
So for this string it should check if any word in the string is not bigger than some x characters.
#!/usr/bin/env perl
use strict;
use warnings;
my $test = "ThisIsAVeryLongUnbreakableStringWhichIsBiggerThan20Characters";
if ( $test !~ /\s/ ) {
print "No spaces found\n";
}
Please make sure to read about regular expressions in Perl.
Perl regular expressions tutorial - perldoc perlretut
You should have a look at the perl regex tutorial. Adapting their very first "Hello World" example to your question would look like this:
if ("ThisIsAVeryLongUnbreakableStringWhichIsBiggerThan20Characters" =~ / /) {
print "It matches\n";
}
else {
print "It doesn't match\n";
}
die "No spaces" if $test !~ /[ ]/; # Match a space
die "No spaces" if $test =~ /^[^ ]*\z/; # Match non-spaces for entire string
die "No whitespace" if $test !~ /\s/; # Match a whitespace character
die "No whitespace" if $test =~ /^\S*\z/; # Match non-whitespace for entire string
To find the length of the longest unbroken sequence of non-space characters, write this
use strict;
use warnings;
use List::Util 'max';
my $string = 'abc small ThisIsAVeryLongUnbreakableStringWhichIsBiggerThan20Characters';
my $max = max map length, $string =~ /\S+/g;
print "Maximum unbroken length is $max\n";
output
Maximum unbroken length is 61
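If the goal is only a yes/no check against a limit, as the question describes, a bounded quantifier may be enough. This is a sketch under that assumption; the helper name has_word_longer_than and the limit of 20 are chosen for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: true if some whitespace-delimited word exceeds $limit characters.
sub has_word_longer_than {
    my ($text, $limit) = @_;
    my $min = $limit + 1;                 # shortest length that counts as "too long"
    return $text =~ /\S{$min,}/ ? 1 : 0;
}

my $test = "abc small ThisIsAVeryLongUnbreakableStringWhichIsBiggerThan20Characters";
print has_word_longer_than($test, 20), "\n";         # prints 1
print has_word_longer_than("abc small", 20), "\n";   # prints 0
```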

perl blank substitution in a string

I have these two kinds of strings:
EVASA 2144
IN ELABORAZIONE 16278
I need some perl script to substitute all the blanks with just one.
The output I need is:
EVASA 2144
Any suggestion?
You can use a very simple regex:
#!/usr/bin/perl
use strict;
my $line = 'EVASA 2144';
# This is the line that actually does the work
$line =~ s/\s+/ /g;
print $line, "\n";
My suggestion would be that you spend some time reading the Regular Expression tutorial that is distributed with every modern version of Perl.
$a = "hello \t world";
$a =~ s/\s+/ /;
print $a;
If there may be several places in the string where the substitution should happen, add the /g modifier:
$a = "hello \t world hi";
$a =~ s/\s+/ /g;
print $a;
You can also use the tr operator with the /s (squeeze) option. It can do more for you (transliterating characters) and is probably faster than the regex approach:
$a =~ tr/ \t/ /s;
Explanation can be found in the perlop manpage:
perldoc perlop
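A side-by-side sketch of the two approaches, using a made-up input with runs of spaces and tabs. Note one difference: \s also covers newlines and other whitespace, while the tr version only handles the characters it lists.

```perl
use strict;
use warnings;

my $text = "IN  ELABORAZIONE \t\t 16278";

(my $by_regex = $text) =~ s/\s+/ /g;    # any whitespace run becomes one space
(my $by_tr    = $text) =~ tr/ \t/ /s;   # spaces and tabs map to space, runs are squeezed

print "$by_regex\n";   # prints "IN ELABORAZIONE 16278"
print "$by_tr\n";      # prints "IN ELABORAZIONE 16278"
```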

Cleanest Perl parser for Makefile-like continuation lines

A perl script I'm writing needs to parse a file that has continuation lines like a Makefile. i.e. lines that begin with whitespace are part of the previous line.
I wrote the code below but don't feel like it is very clean or perl-ish (heck, it doesn't even use "redo"!)
There are many edge cases: EOF at odd places, single-line files, files that start or end with a blank line (or non-blank line, or continuation line), empty files. All my test cases (and code) are here: http://whatexit.org/tal/flatten.tar
Can you write cleaner, perl-ish, code that passes all my tests?
#!/usr/bin/perl -w
use strict;
sub process_file_with_continuations {
my $processref = shift @_;
my $nextline;
my $line = <ARGV>;
$line = '' unless defined $line;
chomp $line;
while (defined($nextline = <ARGV>)) {
chomp $nextline;
next if $nextline =~ /^\s*#/; # skip comments
$nextline =~ s/\s+$//g; # remove trailing whitespace
if (eof()) { # Handle EOF
$nextline =~ s/^\s+/ /;
if ($nextline =~ /^\s+/) { # indented line
&$processref($line . $nextline);
}
else {
&$processref($line);
&$processref($nextline) if $nextline ne '';
}
$line = '';
}
elsif ($nextline eq '') { # blank line
&$processref($line);
$line = '';
}
elsif ($nextline =~ /^\s+/) { # indented line
$nextline =~ s/^\s+/ /;
$line .= $nextline;
}
else { # non-indented line
&$processref($line) unless $line eq '';
$line = $nextline;
}
}
&$processref($line) unless $line eq '';
}
sub process_one_line {
my $line = shift @_;
print "$line\n";
}
process_file_with_continuations \&process_one_line;
How about slurping the whole file into memory and processing it with regular expressions? That is much more 'perlish'. This passes your tests and is much smaller and neater:
#!/usr/bin/perl
use strict;
use warnings;
$/ = undef; # we want no input record separator.
my $file = <>; # slurp whole file
$file =~ s/^\n//; # Remove newline at start of file
$file =~ s/\s+\n/\n/g; # Remove trailing whitespace.
$file =~ s/\n\s*#[^\n]+//g; # Remove comments.
$file =~ s/\n\s+/ /g; # Merge continuations
# Done
print $file;
If you don't mind loading the entire file in memory, then the code below passes the tests.
It stores the lines in an array, adding each line either to the previous one (continuation) or at the end of the array (other).
#!/usr/bin/perl
use strict;
use warnings;
my @out;
while( <>)
{ chomp;
s{#.*}{}; # suppress comments
next unless( m{\S}); # skip blank lines
if( s{^\s+}{ }) # does the line start with spaces?
{ $out[-1] .= $_; } # yes, continuation, add to last line
else
{ push @out, $_; } # no, add as new line
}
$, = "\n"; # set output field separator
$\ = "\n"; # set output record separator
print @out;
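The array-based idea can also be wrapped in a function that takes lines and returns the merged result, which makes it easier to test without files. This is only a sketch: the name merge_continuations is invented here, it has not been run against the asker's test suite, and it assumes that blank lines and whole-line comments are simply skipped and that an indented line with nothing before it starts a new logical line.

```perl
use strict;
use warnings;

# Hypothetical helper: takes raw lines, returns logical lines with continuations merged.
sub merge_continuations {
    my @merged;
    for my $raw (@_) {
        (my $line = $raw) =~ s/\s+\z//;     # drop newline and trailing whitespace
        next if $line =~ /^\s*#/;           # skip comment lines (assumption)
        next if $line eq '';                # skip blank lines (assumption)
        if (@merged && $line =~ s/^\s+/ /) {
            $merged[-1] .= $line;           # indented: append to previous logical line
        }
        else {
            $line =~ s/^\s+//;              # indented first line: nothing to continue
            push @merged, $line;
        }
    }
    return @merged;
}

my @input = ("target: a\n", "   b\n", "# comment\n", "\n", "other\n", "\tmore  \n");
print "$_\n" for merge_continuations(@input);
# prints "target: a b" and then "other more"
```

Checking @merged before appending avoids modifying $merged[-1] on an empty array when the input begins with a continuation line.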