Perl: function to trim string leading and trailing whitespace


Is there a built-in function to trim leading and trailing whitespace such that trim(" hello world ") eq "hello world"?

Here's one approach using a regular expression:
$string =~ s/^\s+|\s+$//g ; # remove both leading and trailing whitespace
Perl 6 (now Raku) has a built-in trim method:
$string .= trim;
Source: Wikipedia

This is available in String::Util with the trim method:
Editor's note: String::Util is not a core module, but you can install it from CPAN with [sudo] cpan String::Util.
use String::Util 'trim';
my $str = " hello ";
$str = trim($str);
print "string is now: '$str'\n";
prints:
string is now: 'hello'
However it is easy enough to do yourself:
$str =~ s/^\s+//;
$str =~ s/\s+$//;

There's no built-in trim function, but you can easily implement your own using a simple substitution:
sub trim {
(my $s = $_[0]) =~ s/^\s+|\s+$//g;
return $s;
}
or using non-destructive substitution in Perl 5.14 and later:
sub trim {
return $_[0] =~ s/^\s+|\s+$//rg;
}
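A quick way to check either version is to run it against a few edge cases. The sketch below assumes Perl 5.14 or later for the /r flag; the sample strings are made up for illustration.

```perl
use strict;
use warnings;

# Return a trimmed copy; the caller's string is left untouched (needs Perl 5.14+ for /r).
sub trim {
    my ($s) = @_;
    return $s =~ s/^\s+|\s+$//gr;
}

# Sample inputs are illustrative only.
for my $input (" hello world ", "\tTabbed\n", "no-padding", "   ") {
    printf "[%s]\n", trim($input);
}
```

A string that consists only of whitespace comes back as the empty string.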

According to this PerlMonks thread:
$string =~ s/^\s+|\s+$//g;

Complete howto in the perlfaq here:
http://learn.perl.org/faq/perlfaq4.html#How-do-I-strip-blank-space-from-the-beginning-end-of-a-string-

For those that are using Text::CSV I found this thread and then noticed within the CSV module that you could strip it out via switch:
$csv = Text::CSV->new({allow_whitespace => 1});
The logic is backwards in that if you want to strip then you set to 1. Go figure. Hope this helps anyone.

One option is Text::Trim:
use Text::Trim;
print trim(" example ");

Apply: s/^\s*//; s/\s+$//; to it. Or use s/^\s+|\s+$//g if you want to be fancy.

I also use a positive lookahead to trim repeating spaces inside the text:
s/^\s+|\s(?=\s)|\s+$//g
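To see what the lookahead variant does, here is a small sketch on a made-up sample string. Each whitespace character that is followed by another whitespace character is deleted, so every inner run shrinks to its last character.

```perl
use strict;
use warnings;

my $text = "  too    many   spaces  ";   # made-up sample

# Copy first, then edit the copy so $text stays unchanged.
(my $clean = $text) =~ s/^\s+|\s(?=\s)|\s+$//g;

print "[$clean]\n";   # prints [too many spaces]
```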

No, but you can use the s/// substitution operator and the \s whitespace character class to get the same result.

Related

Substitute, replace string with perl

I have string like this:
$string= "only this I need".
I am new in perl, and I tried to translate a PL/SQL code in perl.
My goal is to replace " with a blank space, finally it should look like this:
$string = only this I need
In PL/SQL I use this, and is working very well:
REGEXP_REPLACE(string,'"','');
In perl I tried this, but is not working: $string=~s/"/''; receiving an error.
Please, help me, tell me what I need to read to do my job properly?

Try this it should work:
use strict;
use warnings;
my $string= '"only this I need"';
print "$string \n"; #prints "only this I need"
$string =~ s/"/ /g;
print "$string \n"; #prints only this I need

This is a way to remove quotes from string:
my $string= '"only this I need"';
$string =~ m/"([^"]*)"/;
print "$1\n";
In case if you know the first and last character is quotes, you can do this without using regex, just use substr:
my $string= '"only this I need"';
$string = substr $string, 1, -1;
print "$string\n";
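If the goal is to mirror the PL/SQL call, which replaces each quote with an empty string, a minimal sketch could look like this. The variable names $via_subst and $via_tr are made up for the example.

```perl
use strict;
use warnings;

my $string = '"only this I need"';

(my $via_subst = $string) =~ s/"//g;    # delete every double quote with s///
(my $via_tr    = $string) =~ tr/"//d;   # same result using transliteration

print "$via_subst\n";   # prints only this I need
print "$via_tr\n";      # prints only this I need
```

Note the /g flag: without it, only the first quote is removed.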

Why is Perl's chomp not doing what I want it to do?

I am receiving output from some process (shown in @result_listosp below). When I chomp it, the output is not what I expect. I want the following output:
origin-server-pool-1 http_TestABC https_TestABC
Code:
use strict;
use warnings;
my @result_listosp = ( # From backticks
"origin-server-pool-1\n",
"http_TestABC \n",
"https_TestABC\n",
);
chomp @result_listosp;
Output:
origin-server-pool-1http_TestABC https_TestABC

I'm not sure what you think chomp is supposed to do, but it's not to add spaces?!
And it does not remove trailing whitespace either. If you want to remove trailing whitespace (including newlines) use the following instead of chomp(@result_listosp):
s/\s+\z// for @result_listosp;
As for adding a space between elements, you can use
print(join(' ', @result_listosp), "\n");
or even just
print("@result_listosp\n");
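The difference between the two is easy to see side by side. This sketch reuses the question's data; the array names @chomped and @trimmed and the '|' separator are only there to make the leftover space visible.

```perl
use strict;
use warnings;

my @lines = ("origin-server-pool-1\n", "http_TestABC \n", "https_TestABC\n");

my @chomped = @lines;
chomp @chomped;             # strips only the trailing newline

my @trimmed = @lines;
s/\s+\z// for @trimmed;     # strips all trailing whitespace

print join('|', @chomped), "\n";   # origin-server-pool-1|http_TestABC |https_TestABC
print join('|', @trimmed), "\n";   # origin-server-pool-1|http_TestABC|https_TestABC
```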

The function chomp only removes the newline (\n in this case) character at the end of a line.
If you want to trim (remove whitespaces from the ends), you can do this:
#!/usr/bin/perl
use strict;
use warnings;
sub trim_elements {
for my $i (@_) {
$i =~ s/^\s+|\s+$//g;
}
}
my @result_listosp = ( # From backticks
"origin-server-pool-1\n",
"http_TestABC \n",
"https_TestABC\n",
);
trim_elements @result_listosp;
for my $i (@result_listosp) {
print $i;
}
As you can see, I didn't use parentheses in the call. That works only because the sub is declared before the call. If you declare the sub after the code that calls it, you need parentheses.
Francisco

If you have newlines in each line and you want to remove them, use chomp. If you want to concatenate strings with a space in between then use join:
my @result_listosp = ( # From backticks
"origin-server-pool-1\n",
"http_TestABC \n",
"https_TestABC\n",
);
print join (" ", map { /^\s*(.*?)\s*$/ } @result_listosp), "\n";
Output
origin-server-pool-1 http_TestABC https_TestABC

How to split this string 'gi|216ATGCTGATGCTGTG' into this format 'gi|216 ATGCTGTGCTGATGCTG' in Perl?

I am parsing the fasta alignment file which contains
gi|216CCAACGAAATGATCGCCACACAA
gi|21-GCTGGTTCAGCGACCAAAAGTAGC
I want to split this string into this:
gi|216 CCAACGAAATGATCGCCACACAA
gi|21- GCTGGTTCAGCGACCAAAAGTAGC
For first string, I use
$aar=split("\d",$string);
But that didn't work. What should I do?

So you're parsing some genetic data and each line has a gi| prefix followed by a sequence of numbers and hyphens followed by the nucleotide sequence? If so, you could do something like this:
my ($number, $nucleotides);
if($string =~ /^gi\|([\d-]+)([ACGT]+)$/) {
$number = $1;
$nucleotides = $2;
}
else {
# Broken data?
}
That assumes that you've already stripped off leading and trailing whitespace. If you do that, you should get $number = '216' and $nucleotides = 'CCAACGAAATGATCGCCACACAA' for the first one and $number = '21-' and $nucleotides = 'GCTGGTTCAGCGACCAAAAGTAGC' for the second one.
Looks like BioPerl has some stuff for dealing with fasta data so you might want to use BioPerl's tools rather than rolling your own.
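To produce the exact output the question asks for, the same kind of pattern can capture the whole gi| prefix and print it with a space before the sequence. This is only a sketch: split_id is a made-up helper name, and it assumes every line has the gi| form shown in the question.

```perl
use strict;
use warnings;

# Hypothetical helper: returns "prefix sequence", or undef if the line does not match.
sub split_id {
    my ($line) = @_;
    return $line =~ /^(gi\|[\d-]+)([ACGT]+)$/ ? "$1 $2" : undef;
}

for my $line ('gi|216CCAACGAAATGATCGCCACACAA', 'gi|21-GCTGGTTCAGCGACCAAAAGTAGC') {
    my $result = split_id($line);
    if (defined $result) {
        print "$result\n";
    } else {
        warn "Unrecognized line: $line\n";
    }
}
```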

Here's how I'd go about doing that.
#!/usr/bin/perl -Tw
use strict;
use warnings;
use Data::Dumper;
while ( my $line = <DATA> ) {
my @strings =
grep {m{\A \S+ \z}xms} # no whitespace tokens
split /\A ( \w+ \| [\d-]+ )( [ACTG]+ ) /xms, # capture left & right
$line;
print Dumper( \@strings );
}
__DATA__
gi|216CCAACGAAATGATCGCCACACAA
gi|21-GCTGGTTCAGCGACCAAAAGTAGC

If you just want to add a space (can't really tell from your question), use substitution. To put a space in front of the first run of ACTG characters:
$string =~ s/([ACTG]+)/ $1/;
or to add a tab after the first run of digits and dashes:
$string =~ s/([\d-]+)/$1\t/;
Note that this modifies $string in place.

Why is my Perl code not omitting newlines?

I'm reading this textfile to get ONLY the words in it and ignore all kind of whitespaces:
hello
now
do you see this.sadslkd.das,msdlsa but
i hoohoh
And this is my Perl code:
#!usr/bin/perl -w
require 5.004;
open F1, './text.txt';
while ($line = <F1>) {
#print $line;
@arr = split /\s+/, $line;
foreach $w (@arr) {
if ($w !~ /^\s+$/) {
print $w."\n";
}
}
#print @arr;
}
close F1;
And this is the output:
hello
now
do
you
see
this.sadslkd.das,msdlsa
but
i
hoohoh
The output is showing two newlines but I am expecting the output to be just words. What should I do to just get words?

You should always use strict and use warnings (in preference to the -w command-line qualifier) at the top of every Perl program, and declare each variable at its first point of use using my. That way Perl will tell you about simple errors that you may otherwise overlook.
You should also use lexical file handles with the three-parameter form of open, and check the status to make sure it succeeded. There is little point in explicitly closing an input file unless you expect your program to run for an appreciable time, as Perl will close all files for you on exit.
Do you really need to require Perl v5.4? That version is fifteen years old, and if there is anything older than that installed then you have a museum!
Your program would be better like this:
use strict;
use warnings;
open my $fh, '<', './text.txt' or die $!;
while (my $line = <$fh>) {
my @arr = split /\s+/, $line;
foreach my $w (@arr) {
if ($w !~ /^\s+$/) {
print $w."\n";
}
}
}
Note: my apologies. The warnings pragma and lexical file handles were introduced only in v5.6, so that part of my answer does not apply to v5.4. The latest version of Perl is v5.16 and you really should upgrade.
As Birei has pointed out, the problem is that, when the line has leading whitespace, there is an empty field before the first separator. Imagine if your data was comma-separated, then you would want Perl to report a leading empty field if the line started with a comma.
To extract all the non-space characters you can use a regular expression that does exactly that
my @arr = $line =~ /\S+/g;
and this can be emulated by using the default parameter for split, which is a single quoted space (not a regular expression)
my @arr = split ' ', $line;
In this case split behaves like the awk utility and discards any leading empty fields as you expected.
This is even simpler if you let Perl use the $_ variable in the read loop, as all of the parameters for split can be defaulted:
while (<F1>) {
my @arr = split;
foreach my $w (@arr) {
print "$w\n" if $w !~ /^\s+$/;
}
}

This line is the problem:
@arr=split(/\s+/,$line);
When the line starts with whitespace, \s+ matches at the very beginning, so split returns an empty first field. Use ' ' instead.
@arr=split(' ',$line);
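The two split forms can be compared directly. The sketch below uses a made-up line with leading whitespace; the array names are chosen only for the example.

```perl
use strict;
use warnings;

my $line = "   i hoohoh\n";             # made-up line with leading whitespace

my @by_regex = split /\s+/, $line;      # ('', 'i', 'hoohoh'): leading empty field kept
my @by_space = split ' ',   $line;      # ('i', 'hoohoh'): leading whitespace skipped

print scalar(@by_regex), " fields with /\\s+/\n";   # 3 fields with /\s+/
print scalar(@by_space), " fields with ' '\n";      # 2 fields with ' '
```

In both cases split drops trailing empty fields, so the newline at the end of the line does not produce an extra element.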

I believe that in this line:
if(!($w =~ /^\s+$/))
you wanted to say: if there's nothing in this field, don't print it.
But the "+" in the regex requires at least one whitespace character, so an empty string does not match.
If you change the "\s+" to "\s*", you'll see that it works, because * means zero or more occurrences.

Split on comma, but only when not in parenthesis

I am trying to do a split on a string with comma delimiter
my $string='ab,12,20100401,xyz(A,B)';
my @array=split(',',$string);
If I do a split as above the array will have values
ab
12
20100401
xyz(A,
B)
I need values as below.
ab
12
20100401
xyz(A,B)
(should not split xyz(A,B) into 2 values)
How do I do that?

use Text::Balanced qw(extract_bracketed);
my $string = "ab,12,20100401,xyz(A,B(a,d))";
my @params = ();
while ($string) {
if ($string =~ /^([^(]*?),/) {
push @params, $1;
$string =~ s/^\Q$1\E\s*,?\s*//;
} else {
my ($ext, $pre);
($ext, $string, $pre) = extract_bracketed($string,'()','[^()]+');
push @params, "$pre$ext";
$string =~ s/^\s*,\s*//;
}
}
This one supports:
nested parentheses;
empty fields;
strings of any length.

Here is one way that should work.
use Regexp::Common;
my $string = 'ab,12,20100401,xyz(A,B)';
my @array = ($string =~ /(?:$RE{balanced}{-parens=>'()'}|[^,])+/g);
Regexp::Common can be installed from CPAN.
There is a bug in this code, coming from the depths of Regexp::Common. Be warned that it will (unfortunately) fail to return the empty field between two consecutive commas (,,).

Well, old question, but I just happened to wrestle with this all night, and the question was never marked answered, so in case anyone arrives here by Google as I did, here's what I finally got. It's a very short answer using only built-in Perl regex features:
my $string='ab,12,20100401,xyz(A,B)';
$string =~ s/((\((?>[^)(]*(?2)?)*\))|[^,()]*)(*SKIP),/$1\n/g;
my @array=split('\n',$string);
Commas that are not inside parentheses are changed to newlines and then the array is split on them. This will ignore commas inside any level of nested parentheses, as long as they're properly balanced with a matching number of open and close parens.
This assumes you won't have newline \n characters in the initial value of $string. If you need to, either temporarily replace them with something else before the substitution line and then use a loop to replace back after the split, or just pick a different delimiter to split the array on.

Limit the number of elements it can be split into:
split(',', $string, 4)
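A short sketch of the limit approach with the question's string follows. It only helps under the assumption that the parenthesized value is the last field and that the number of fields before it is known in advance.

```perl
use strict;
use warnings;

my $string = 'ab,12,20100401,xyz(A,B)';

# At most 4 fields: everything after the third comma stays in the last field.
my @array = split /,/, $string, 4;

print "$_\n" for @array;   # ab, 12, 20100401, xyz(A,B) on separate lines
```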

Here's another way:
my $string='ab,12,20100401,xyz(A,B)';
my @array = ($string =~ /(
[^,]*\([^)]*\) # comma inside parens is part of the word
|
[^,]*) # split on comma outside parens
(?:,|$)/gx);
Produces:
ab
12
20100401
xyz(A,B)

Here is my attempt. It should handle depth well and could even be extended to include other bracketed symbols easily (though harder to be sure that they MATCH). This method will not in general work for quotation marks rather than brackets.
#!/usr/bin/perl
use strict;
use warnings;
my $string='ab,12,20100401,xyz(A(2,3),B)';
print "$_\n" for parse($string);
sub parse {
my ($string) = @_;
my @fields;
my @comma_separated = split(/,/, $string);
my @to_be_joined;
my $depth = 0;
foreach my $field (@comma_separated) {
my @brackets = $field =~ /(\(|\))/g;
foreach (@brackets) {
$depth++ if /\(/;
$depth-- if /\)/;
}
if ($depth == 0) {
push @fields, join(",", @to_be_joined, $field);
@to_be_joined = ();
} else {
push @to_be_joined, $field;
}
}
return @fields;
}
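The same depth-counting idea can also be applied one character at a time, which avoids splitting and re-joining. This is a variation offered as a sketch, not the answer's own method; split_top_level is a made-up name, and unbalanced closing parentheses are simply ignored rather than reported.

```perl
use strict;
use warnings;

# Hypothetical helper: split on commas that sit at parenthesis depth 0.
sub split_top_level {
    my ($string) = @_;
    my @fields = ('');
    my $depth  = 0;
    for my $char (split //, $string) {
        if ($char eq ',' && $depth == 0) {
            push @fields, '';           # top-level comma: start a new field
            next;
        }
        $depth++ if $char eq '(';
        $depth-- if $char eq ')' && $depth > 0;
        $fields[-1] .= $char;           # everything else belongs to the current field
    }
    return @fields;
}

print "$_\n" for split_top_level('ab,12,20100401,xyz(A(2,3),B)');
```

Empty fields between consecutive top-level commas are kept as empty strings.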
