I want to remove the date identifier and the * from a string.
$string = "*102015 Supplied air hood";
$output = "Supplied air hood";
I have used
$string =~ s/[#\%&\""*+]//g;
$string =~ s/^\s+//;
What should I use to get the string value "Supplied air hood"?
Thanks in advance
To remove everything from the string up to the first space, you can write
$str =~ s/^\S*\s+//;
Your pattern doesn't contain numbers. It would remove the *, but nothing else. If you want to remove a * followed by six digits and a blank at the beginning of the string, do it like this:
$string =~ s/^\*\d{6} //;
However, if that string always contains a pattern like this, you don't need a regular expression substitution. You can simply take a substring.
my $output = substr $string, 8;
That will assign the content of $string, starting from the 9th character, to $output.
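As a minimal sketch, the three approaches above can be run side by side on the sample value from the question. The variable names $by_space, $by_pattern and $by_substr are made up for this illustration, and the substr variant assumes the prefix is always exactly 8 characters long.

```perl
use strict;
use warnings;

# Sample value taken from the question.
my $string = "*102015 Supplied air hood";

# 1. Drop everything up to and including the first run of whitespace.
(my $by_space = $string) =~ s/^\S*\s+//;

# 2. Drop a literal '*', exactly six digits and one space.
(my $by_pattern = $string) =~ s/^\*\d{6} //;

# 3. Skip a fixed-width prefix of 8 characters ('*' + 6 digits + space).
my $by_substr = substr $string, 8;

# All three lines print: Supplied air hood
print "$_\n" for $by_space, $by_pattern, $by_substr;
```

The copy-then-substitute form (my $copy = $string) =~ s/...//; leaves $string itself untouched, which makes the comparison possible.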
The script below does what you want, assuming that the date always appears at the beginning of the line and that it is followed by exactly one space.
use strict;
use warnings;
while (<DATA>)
{
# skip one or more characters not a space
# then skip exactly one space
# then capture all remaining characters
# and assign them to $s
my ($s) = $_ =~ /[^ ]+ (.*)/;
print $s, "\n";
}
__DATA__
*110115 first date
*110115 second date
*110315 third date
Output is:
first date
second date
third date
I need to increment a numeric value in a string:
my $str = "tool_v01.zip";
(my $newstr = $str) =~ s/\_v(\d+)\.zip$/ ($1++);/eri;
#(my $newstr = $str) =~ s/\_v(\d+)\.zip$/ ($1+1);/eri;
#(my $newstr = $str) =~ s/\_v(\d+)\.zip$/ $1=~s{(\d+)}{$1+1}/r; /eri;
print $newstr;
Expected output is tool_v02.zip
Note: the version number 01 may contain any number of leading zeroes
I don't think this question has anything to do with the /r modifier, but rather how to properly format the output. For that, I'd suggest sprintf:
my $newstr = $str =~ s{ _v (\d+) \.zip$ }
{ sprintf("_v%0*d.zip", length($1), $1+1 ) }xeri;
Or, replacing just the number with zero-width Lookaround Assertions:
my $newstr = $str =~ s{ (?<= _v ) (\d+) (?= \.zip$ ) }
{ sprintf("%0*d", length($1), $1+1 ) }xeri;
Note: With either of these solutions, something like tool_v99.zip would be altered to tool_v100.zip because the new sequence number cannot be expressed in two characters. If that's not what you want then you need to specify what alternative behaviour you require.
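A small sketch of the lookaround variant, wrapped in a helper so the padding behaviour can be checked on a few inputs. The name bump_version and the sample file names are hypothetical; the /r modifier assumes Perl 5.14 or later.

```perl
use strict;
use warnings;

# bump_version is a hypothetical helper name used only for this sketch.
sub bump_version {
    my ($name) = @_;
    # /r returns the modified copy; /e evaluates the replacement as code.
    # %0*d takes the field width from the first argument, length($1).
    return $name =~ s{ (?<= _v ) (\d+) (?= \.zip$ ) }
                     { sprintf("%0*d", length($1), $1 + 1) }xeri;
}

# Prints tool_v02.zip, tool_v0010.zip and tool_v100.zip, one per line.
print bump_version($_), "\n" for qw(tool_v01.zip tool_v0009.zip tool_v99.zip);
```

The width is taken from the original number, so leading zeroes are preserved for as long as the incremented value still fits in that width.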
The bit you're missing is sprintf, which works the same way as printf except that, rather than outputting the formatted string to stdout or a file handle, it returns it as a string. Example:
sprintf("%02d",3)
generates the string 03.
Putting this into your regex, you can do the following. Rather than using /r, you can use a zero-width lookahead ((?=...)) to match the file suffix and just replace the matched number with the new value. Note that the dot must be escaped, otherwise it matches any character:
s/(\d+)(?=\.zip$)/sprintf("%02d",$1+1)/ei
I'm trying to find the index of white space in a string in Perl.
For example, if I have the string
stuff/more stuffhere
I'd like to select the word "more" with a substring method. I can find the index of "/" but haven't figured out how to find the index of white space. The length of the substring I'm trying to select will vary, so I can't hard code the index. There will only be one white space in the string (other than those after the end of the string).
Also, if anybody has any better ideas of how to do this, I'd appreciate hearing them. I'm fairly new to programming so I'm open to advice. Thanks.
Just use index:
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };
my $string = 'stuff/more stuffhere';
my $index_of_slash = index $string, '/';
my $index_of_space = index $string, ' ';
say "Between $index_of_slash and $index_of_space.";
The output is
Between 5 and 10.
Which is correct:
0 1
01234567890123456789
stuff/more stuffhere
If by "whitespace" you also mean tabs or whatever, you can use a regular expression match and the special variables @- and @+:
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };
my $string = "stuff/more\tstuffhere";
if ($string =~ m{/.*(?=\s)}) {
say "Between $-[0] and $+[0]";
}
The (?=\s) means "is followed by a whitespace character", but the character itself is not part of the match, so you don't need to do any maths on the returned values.
As you stated, you want to select the word between the first /
and the first space following it.
If this is the case, you maybe don't need any index (you need just
the word).
A perfect tool to find something in a text is regex.
Look at the following code:
$txt = 'stuff/more stuffxx here';
if ($txt =~ /\/(.+?) /) {
print "Match: $1.\n";
}
The regex used tries to match:
a slash,
a non-empty sequence of any chars (note ? - reluctant
version), enclosed in a capturing group,
a space.
So after the match $1 contains what was captured by the first
capturing group, i.e. "your" word.
But if for any reason you were interested in starting and ending
offsets to this word, you can read them from $-[1]
and $+[1] (starting / ending indices of the first capturing group).
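A minimal sketch of reading those offsets, assuming the same sample string as above. The helper name word_span is hypothetical and exists only to keep the example self-contained.

```perl
use strict;
use warnings;

# word_span is a hypothetical helper: it returns the captured word
# together with the start and end offsets of capture group 1.
sub word_span {
    my ($txt) = @_;
    return unless $txt =~ /\/(.+?) /;
    # $-[1] and $+[1] must be read before any other successful match.
    my ($start, $end) = ($-[1], $+[1]);
    return (substr($txt, $start, $end - $start), $start, $end);
}

my ($word, $start, $end) = word_span('stuff/more stuffxx here');
print "Word '$word' spans offsets $start to $end.\n";
# prints: Word 'more' spans offsets 6 to 10.
```

The end offset is exclusive, so here it is also the index of the space that follows the word.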
The arrays @- (@LAST_MATCH_START) and @+ (@LAST_MATCH_END) give offsets of the start and end of last successful submatches. See Regex related variables in perlvar.
You can capture your real target, and then read off the offset right after it with $+[0]
@+
This array holds the offsets of the ends of the last successful submatches in the currently active dynamic scope. $+[0] is the offset into the string of the end of the entire match. This is the same value as what the pos function returns when called on the variable that was matched against.
Example
my $str = 'target and target with spaces';
while ($str =~ /(target)\s/g)
{
say "Position after match: $+[0]"
}
prints
Position after match: 7
Position after match: 18
These are the offsets just past each whole match, i.e. one past the space that follows 'target'; the space itself is at $+[1] (6 and 17 here).
Or you can capture \s instead: $-[1] is then the position of the space, and $-[1] + 1 the position right after it.
You can use
my $str = "stuff/more stuffhere";
if ($str =~ m{/\K\S+}) {
... substr($str, $-[0], $+[0] - $-[0]) ...
}
But why substr? That's very weird there. Maybe if you told us what you actually wanted to do, we could provide better alternatives. Here are three cases:
Data extraction:
my $str = "stuff/more stuffhere";
if ( my ($word) = $str =~ m{/(\S+)} ) {
say $word; # more
}
Data replacement:
my $str = "stuff/more stuffhere";
$str =~ s{/\K\S+}{REPLACED};
say $str; # stuff/REPLACED stuffhere
Data replacement (dynamic):
my $str = "stuff/more stuffhere";
$str =~ s{/\K(\S+)}{ uc($1) }e;
say $str; # stuff/MORE stuffhere
I have a question on how to remove a specific set of words that end with : in a string using Perl.
For instance,
lunch_at_home: start at 1pm.
I want to get only "start at 1pm." after discarding "lunch_at_home:".
Note that lunch_at_home is just an example. It can be any string of any length, but it always ends with ":".
This should do the job.
my $string = "lunch_at_home: start at 1pm.";
$string =~ s/^.*:\s*//;
It will remove every character up to and including the last : (the .* is greedy), plus any whitespace that follows it.
If you want to remove a specific set of words that are set apart from the data you want:
my $string = 'lunch_at_home: start at 1pm.';
$string =~ s/\b(lunch_at_home|breakfast_at_work):\s*//;
That would leave you with start at 1pm. and you can expand the list as needed.
If you just want to remove any "words" (we'll use the term loosely) that end with a colon:
my $string = 'lunch_at_home: start at 1pm.';
$string =~ s/\b\S+:\s*//;
You'd end up with the same thing in this case.
take
my $string = "lunch_at_home: start at 1pm.";
to remove everything up to the last ":" and the period at the end of the entry as in your question:
$string =~ s/.*: (.*)\./$1/;
to remove everything up to the first ":"
$string =~ s/.*?: (.*)\./$1/;
split on : and discard the first part:
my (undef, $value) = split /:\s*/, $string, 2;
The final argument (2), ensures this works correctly if the trailing string contains a :.
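To make the difference between these variants concrete, here is a small sketch using a hypothetical input whose value part itself contains a colon (1:30pm is an invented example, not from the question).

```perl
use strict;
use warnings;

# Hypothetical input whose value part itself contains a colon.
my $line = 'lunch_at_home: start at 1:30pm.';

# Greedy .* runs to the LAST colon, so part of the value is lost.
(my $greedy = $line) =~ s/^.*:\s*//;

# Non-greedy .*? stops at the FIRST colon.
(my $lazy = $line) =~ s/^.*?:\s*//;

# split with a limit of 2 also cuts only at the first colon.
my (undef, $value) = split /:\s*/, $line, 2;

print "greedy: $greedy\n";   # greedy: 30pm.
print "lazy:   $lazy\n";     # lazy:   start at 1:30pm.
print "split:  $value\n";    # split:  start at 1:30pm.
```

When the value can never contain a colon, all three give the same result, and the choice is mostly a matter of taste.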
You can use the split function to achieve this:
my $string = "lunch_at_home: start at 1pm.";
$string = (split /:\s*/, $string)[1];
print "$string\n";
I have two strings, XXXXXXnumber and XXXXXXdate, and I want to extract the XXXXXX part of each string. The actual number of characters represented by XXXXXX can vary. The suffixes 'number' and 'date' are constant. XXXXXXnumber and XXXXXXdate should both become XXXXXX.
my ($prefix) = ($string =~ /\A (.+?) (?:date|number) \z/x);
Alternatively:
$string =~ s/ (?:date|number) \z//x;
I would use a regular expression like $line =~ s/(number|date)$// for that task, where $line can be either line.
If your line has additional characters after number or date, they must be filtered out, too. An alternative approach would be using an expression like ($num) = ($line =~ /^(.*)(number|date).*$/);
use regexes:
($newvar = $oldvar) =~ s/^(.*)(number|date)$/$1/;
If you have no more use for $oldvar's original value (including the Xes), this simplifies to
$oldvar =~ s/^(.*)(number|date)$/$1/;
A simple substitution takes care of it:
$str =~ s/(?:number|date)\z//;
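A short sketch of that substitution wrapped in a helper, to show what the \z anchor does and does not match. The name strip_suffix and the sample strings are hypothetical.

```perl
use strict;
use warnings;

# strip_suffix is a hypothetical helper name for this sketch.
sub strip_suffix {
    my ($s) = @_;
    # \z anchors at the very end of the string;
    # (?:...) groups the alternatives without capturing.
    $s =~ s/(?:number|date)\z//;
    return $s;
}

# Prints ORDER, ORDER and numbers, one per line.
print strip_suffix($_), "\n" for qw(ORDERnumber ORDERdate numbers);
```

The last input is left alone because "number" is not at the very end of the string.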
I have
print $str;
abcd*%1234$sdfsd..#d
The string will always have only one continuous stretch of digits, like 1234 in this case. Everything else will be either letters or other special characters.
How can I extract the number (1234 in this case) and store it back in str?
This page suggests that I should use \d, but how?
If you don't want to modify the original string, you can extract the numbers by capturing them in the regex, using subpatterns. In list context, a regular expression returns the matches defined in the subpatterns.
my $str = 'abc 123 x456xy 789foo';
my ($first_num) = $str =~ /(\d+)/; # 123
my @all_nums = $str =~ /(\d+)/g; # (123, 456, 789)
$str =~ s/\D//g;
This removes all nondigit characters from the string. That's all that you need to do.
EDIT: if Unicode digits in other scripts may be present, a better solution is:
$str =~ s/[^0-9]//g;
If you wanted to do it the destructive way, this is the fastest way to do it.
$str =~ tr/0-9//cd;
This translates all characters in the complement of 0-9 to nothing, i.e. it deletes them.
The one caveat to this approach, and Phillip Potter's, is that were there another group of digits further down the string, they would be concatenated with the first group of digits. So it's not clear that you would want to do this.
The surefire way to get one and only one group of digits is
( $str ) = $str =~ /(\d+)/;
The match, in a list context returns a list of captures. The parens around $str are simply to put the expression in a list context and assign the first capture to $str.
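The caveat about a second group of digits can be seen in a small sketch. The input below is the string from the question with a hypothetical extra "56" appended.

```perl
use strict;
use warnings;

# Hypothetical input with two separate groups of digits.
# Single quotes keep $sdfsd from being interpolated.
my $input = 'abcd*%1234$sdfsd..#d56';

# Deleting every non-digit concatenates both groups.
(my $deleted = $input) =~ tr/0-9//cd;

# Capturing in list context keeps only the first group.
my ($first) = $input =~ /(\d+)/;

print "tr:      $deleted\n";   # tr:      123456
print "capture: $first\n";     # capture: 1234
```

If the input is guaranteed to hold a single run of digits, both approaches give the same result.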
Personally, I would do it like this:
$s =~ /([0-9]+)/;
print $1;
$1 will contain the first group matched by the given regular expression (the part in round brackets).