removing trailing words end with : from a string using perl - perl

i have question on how to remove specific set of words that end with : in a string using perl.
For instance,
lunch_at_home: start at 1pm.
I want to get only "start at 1 pm"after discarding "lunch_at_home:"
note that lunch_at_home is just an example. It can be any string with any length but it should end with ":"

This should do the job.
my $string = "lunch_at_home: start at 1pm."
$string =~ s/^.*:\s*//;
It will remove all char before : including the :

If you want to remove a specific set of words that are set apart from the data you want:
my $string = 'lunch_at_home: start at 1pm.';
$string =~ s/\b(lunch_at_home|breakfast_at_work):\s*//;
That would leave you with start at 1pm. and you can expand the list as needed.
If you just want to remove any "words" (we'll use the term loosely) that end with a colon:
my $string = 'lunch_at_home: start at 1pm.';
$string =~ s/\b\S+:\s*//;
You'd end up with the same thing in this case.

take
my $string = "lunch_at_home: start at 1pm.";
to remove everything up to the last ":" and the period at the end of the entry as in your question:
$string =~ s/.*: (.*)\./$1/;
to remove everything up to the first ":"
$string =~ s/.*?: (.*)\./$1/;

split on : and discard the first part:
my (undef, $value) = split /:\s*/, $string, 2;
The final argument (2), ensures this works correctly if the trailing string contains a :.

You can use split function to achieve this:
my $string = "lunch_at_home: start at 1pm.";
$string = (split /:\s*/, $string)[1];
print "$string\n";

Related

Finding index of white space in Perl

I'm trying to find the index of white space in a string in Perl.
For example, if I have the string
stuff/more stuffhere
I'd like to select the word "more" with a substring method. I can find the index of "/" but haven't figured out how to find the index of white space. The length of the substring I'm trying to select will vary, so I can't hard code the index. There will only be one white space in the string (other than those after the end of the string).
Also, if anybody has any better ideas of how to do this, I'd appreciate hearing them. I'm fairly new to programming so I'm open to advice. Thanks.
Just use index:
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };
my $string = 'stuff/more stuffhere';
my $index_of_slash = index $string, '/';
my $index_of_space = index $string, ' ';
say "Between $index_of_slash and $index_of_space.";
The output is
Between 5 and 10.
Which is correct:
0 1
01234567890123456789
stuff/more stuffhere
If by "whitespace" you also mean tabs or whatever, you can use a regular expression match and the special variables #- and #+:
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };
my $string = "stuff/more\tstuffhere";
if ($string =~ m{/.*(?=\s)}) {
say "Between $-[0] and $+[0]";
}
The (?=\s) means is followed by a whitespace character, but the character itself is not part of the match, so you don't need to do any maths on the returned values.
As you stated, you want to select the word between the first /
and the first space following it.
If this is the case, you maybe don't need any index (you need just
the word).
A perfect tool to find something in a text is regex.
Look at the following code:
$txt = 'stuff/more stuffxx here';
if ($txt =~ /\/(.+?) /) {
print "Match: $1.\n";
}
The regex used tries to match:
a slash,
a non-empty sequence of any chars (note ? - reluctant
version), enclosed in a capturing group,
a space.
So after the match $1 contains what was captured by the first
capturing group, i.e. "your" word.
But if for any reason you were interested in starting and ending
offsets to this word, you can read them from $-[1]
and $+[1] (starting / ending indices of the first capturing group).
The arrays #- (#LAST_MATCH_START) and #+ (#LAST_MATCH_END) give offsets of the start and end of last successful submatches. See Regex related variables in perlvar.
You can capture your real target, and then read off the offset right after it with $+[0]
#+
This array holds the offsets of the ends of the last successful submatches in the currently active dynamic scope. $+[0] is the offset into the string of the end of the entire match. This is the same value as what the pos function returns when called on the variable that was matched against.
Example
my $str = 'target and target with spaces';
while ($str =~ /(target)\s/g)
{
say "Position after match: $+[0]"
}
prints
Position after match: 7
Position after match: 18
These are positions right after 'target', so of spaces that come after it.
Or you can capture \s instead and use $-[1] + 1 (first position of the match, the space).
You can use
my $str = "stuff/more stuffhere";
if ($str =~ m{/\K\S+}) {
... substr($str, $-[0], $+[0] - $-[0]) ...
}
But why substr? That's very weird there. Maybe if you told us what you actually wanted to do, we could provide a better alternatives. Here are three cases:
Data extraction:
my $str = "stuff/more stuffhere";
if ( my ($word) = $str =~ m{/(\S+)} ) {
say $word; # more
}
Data replacement:
my $str = "stuff/more stuffhere";
$str =~ s{/\K\S+}{REPLACED};
say $str; # stuff/REPLACED stuffhere
Data replacement (dynamic):
my $str = "stuff/more stuffhere";
$str =~ s{/\K(\S+)}{ uc($1) }e;
say $str; # stuff/MORE stuffhere

how to remove date identifier and special charactar from string

I want to remove date identifier and * from string .
$string = "*102015 Supplied air hood";
$output = "Supplied air hood";
i have used
$string =~ s/[#\%&\""*+]//g;
$string =~ s/^\s+//;
what should i used to get string value = "Supplied air hood";
Thanks in advance
To remove everything from the string up to the first space, you can write
$str =~ s/^\S*\s+//;
Your pattern doesn't contain numbers. It would remove the *, but nothing else. If you want to remove a * followed by six digits and a blank at the beginning of the string, do it like this:
$string =~ s/^\*\d{6} //;
However, if that string always contains a pattern like this, you don't need a regular expression substitution. You can simply take a substring.
my $output = substr $string, 8;
That will assign the content of $string starting from the 9th character
The script below does what you want, assuming that the date always appears at the beginning the line, and that it is follow by exactly one space.
use strict;
use warnings;
while (<DATA>)
{
# skip one or more characters not a space
# then skip exactly one space
# then capture all remaining characters
# and assign them to $s
my ($s) = $_ =~ /[^ ]+ (.*)/;
print $s, "\n";
}
__DATA__
*110115 first date
*110115 second date
*110315 third date
Output is:
first date
second date
third date

issue in matching regexp in perl

I am having following code
$str = "
OTNPKT0553 04-02-03 21:43:46
M X DENY
PLNA
/*Privilege, Login Not Active*/
;";
$val = $str =~ /[
]*([\n]?[\n]+
[\n]?) ([^;^
]+)/s;
print "$1 and $2";
Getting output as
and PLNA
Why it is getting PLNA as output. I believe it should stop at first\n. I assume output should be OTNPKT0553 04-02-03 21:43:46
Your regex is messy and contains a lot of redundancy. The following steps demonstrate how it can be simplified and then it becomes more clear why it is matching PLNA.
1) Translating the literal new lines in your regex:
$val = $str =~ /[\n\n]*([\n]?[\n]+\n[\n]?) ([^;^\n]+)/s;
2) Then simplifying this code to remove the redundancy:
$val = $str =~ /(\n{2}) ([^;^\n]+)/s;
So basically, the regex is looking for two new lines followed by 3 spaces.
There are three spaces before OTNPKT0553, but there is only a single new line, so it won't match.
The next three spaces are before PLNA which IS preceded by two new lines, and so matches.
You have a whole lot of newlines in there - some literal and some encoded as \n. I'm not clear how you were thinking. Did you think \n matched a number maybe? A \d matches a digit, and will also match many Unicode characters that are digits in other languages. However for simple ASCII text it works fine.
What you need is something like this
use strict;
use warnings;
my $str = "
OTNPKT0553 04-02-03 21:43:46
M X DENY
PLNA
/*Privilege, Login Not Active*/
;";
my $val = $str =~ / (\w+) \s+ ( [\d-]+ \s [\d:]+ ) /x;
print "$1 and $2";
output
OTNPKT0553 and 04-02-03 21:43:46
You have an extra line feed, change the regex to:
$str =~ /[
]*([\n]?[\n]+[\n]?) ([^;^
]+)/s;
and simpler:
$str =~ /\n+ ([^;^\n]+)/s;

Read string within double quotes in perl

I have a file which contains lines like this:
"pin1" Inpin; "pin2" outpin; "pin3" inoutPin;
some other string "pin4" inpin
I want to store just pin1, pin2, pin3, pin4 (basically words within double quotes). Can someone please help..? Basically read one line at a time and grab the word within double quotes only. I tried to split a line by ";" but it doesn't work since ";" may not be present in all lines.
thanks!
join ', ' $str =~ /"([^"]*)"/g
try this code:
my $string = '"pin1" Inpin; "pin2" outpin; "pin3" inoutPin;`';
my #array;
while ( $string =~ /\"(.+?)\"/ )
{
put (array, $1.),
$string =~ s/\"$1\"//;
}

How can I expand a string like "1..15,16" into a list of numbers?

I have a Perl application that takes from command line an input as:
application --fields 1-6,8
I am required to display the fields as requested by the user on command line.
I thought of substituting '-' with '..' so that I can store them in array e.g.
$str = "1..15,16" ;
#arr2 = ( $str ) ;
#arr = ( 1..15,16 ) ;
print "#arr\n" ;
print "#arr2\n" ;
The problem here is that #arr works fine ( as it should ) but in #arr2 the entire string is not expanded as array elements.
I have tried using escape sequences but no luck.
Can it be done this way?
If this is user input, don't use string eval on it if you have any security concerns at all.
Try using Number::Range instead:
use Number::Range;
$str = "1..15,16" ;
#arr2 = Number::Range->new( $str )->range;
print for #arr2;
To avoid dying on an invalid range, do:
eval { #arr2 = Number::Range->new( $str )->range; 1 } or your_error_handling
There's also Set::IntSpan, which uses - instead of ..:
use Set::IntSpan;
$str = "1-15,16";
#arr2 = Set::IntSpan->new( $str )->elements;
but it requires the ranges to be in order and non-overlapping (it was written for use on .newsrc files, if anyone remembers what those are). It also allows infinite ranges (where the string starts -number or ends number-), which the elements method will croak on.
You're thinking of #arr2 = eval($str);
Since you're taking input and evaluating that, you need to be careful.
You should probably #arr2 = eval($str) if ($str =~ m/^[0-9.,]+$/)
P.S. I didn't know about the Number::Range package, but it's awesome. Number::Range ftw.
I had the same problem in dealing with the output of Bit::Vector::to_Enum. I solved it by doing:
$range_string =~ s/\b(\d+)-(\d+)\b/expand_range($1,$2)/eg;
then also in my file:
sub expand_range
{
return join(",",($_[0] .. $_[1]));
}
So "1,3,5-7,9,12-15" turns into "1,3,5,6,7,9,12,13,14,15".
I tried really hard to put that expansion in the 2nd part of the s/// so I wouldn't need that extra function, but I couldn't get it to work. I like this because while Number::Range would work, this way I don't have to pull in another module for something that should be trivial.
#arr2 = ( eval $str ) ;
Works, though of course you have to be very careful with eval().
You could use eval:
$str = "1..15,16" ;
#arr2 = ( eval $str ) ;
#arr = ( 1..15,16 ) ;
print "#arr\n" ;
print "#arr2\n" ;
Although if this is user input, you'll probably want to do some validation on the input string first, to make sure they haven't input anything dodgy.
Use split:
#parts = split(/\,/, $fields);
print $parts[0];
1-6
print $parts[1];
8
You can't just put a string containing ',' in an array, and expect it to turn to elements (except if you use some Perl black magic, but we won't go into that here)
But Regex and split are your friends.