I am trying to write a script to make a integer into comma separated value:
For Example Input : 23454546.3435353 test 123454546789.3435353 #'n' number of values will be present.
I am receiving output: 234, 54, 546.34, 35, 353 test 123454546789.34, 35, 353
My Query is, after the decimal value do not insert ,.
Required Output : 2,34,54,546.**3435353** test 1,23,45,45,46,789.**3435353**
My Code:
my $strg = "23454546.3435353 test 123454546789.3435353"
$strg=~s#([^\.])(\d{4,})# my $no_dot=$1; my $nums = $2; my $cnt = $nums=~s/\d/$&/g;
if($cnt eq 8) { $nums=~s/^(\d{1})(\d{2})(\d{2})(\d{3})$/$1\, $2\, $3\, $4/g; }
if($cnt eq 7) { $nums=~s/^(\d{2})(\d{2})(\d{3})$/$1\, $2\, $3/g; }
if($cnt eq 6) { $nums=~s/^(\d{1})(\d{2})(\d{3})$/$1\, $2\, $3/g; }
if($cnt eq 5) { $nums=~s/^(\d{2})(\d{3})$/$1\, $2/g; }
if($cnt eq 4) { $nums=~s/^(\d{1})(\d{3})$/$1\, $2/g; }
($no_dot.$nums);
#ge;
Please advice where I am doing wrong.
Maybe easier to use Locale::Currency::Format ?
use Locale::Currency::Format;
my $inr = currency_format('INR', 23454546.3435353);
say $inr;
Output:
23,454,546.34 INR
Edit:
It seems like the precision cannot be modified by the module. If you want a precision other than 2, you can try use the brute force approach. For example:
my $strg = "23454546.3435353 test 123454546789.3435353";
$strg =~ s/(\d+)\.(\d+)/do_subst($1,$2)/ge;
sub do_subst {
my ($p1, $p2) = #_;
my $temp = reverse $p1;
my #parts;
push #parts, $1 if $temp =~ s/^(\d{1,3})//;
while (length $temp) {
push #parts, $1, if $temp =~ s/^(\d{1,2})//;
}
return (join ',', map {my $p = reverse $_; $p} reverse #parts) . '.' . $p2;
}
say $strg;
Output:
2,34,54,546.3435353 test 1,23,45,45,46,789.3435353
Your detection of relevant strings is flawed. You have:
([^\.])(\d{4,})
but this means the first character can be a digit.
That explains why you get 234, in your output.
It will also match the digits one beyond a period (so in .3435353, you match 3435353). Note that the \ is not needed (. is not special inside square brackets).
Try this:
$strg =~ s{
( # $1:
( ^ | \s ) # delimiter (whitespace or line start)
\d{4,} # digits that need commas
)
(?= # lookahead:
( \. \d+ )? # optional fractional part
( \s | $ ) # delimiter (whitespace or line end)
)
}{
local $_ = $1;
s/(\d)(?=(\d{2})*\d{3}$)/$1,/g;
$_;
}xeg;
The outer regex matches runs of 4 or more digits (optionally followed by a fractional part), delimited either by whitespace or at the start/end of the string.
The inner regex also makes use of lookahead to find digits that need commas (ie. 2n+4 from the right). Because of greedy matching of the lookahead, replacements are made starting from the left.
Related
To achieve below task I have written below C like perl program (As I am new to Perl), But I am not sure if this is the best way to achieve.
Can someone please guide?
Note: Not with the full program, But where I can make improvement.
Thanks in advance
Input :
$str = "mail1, local<mail1#mail.local>, mail2#mail.local, <mail3#mail.local>, mail4 local<mail4#mail.local>"
Expected Output :
mail1, local<mail1#mail.local>
mail2#mail.local
<mail3#mail.local>
mail4, local<mail4#mail.local>
Sample Program
my $str="mail1, \#local<mail1\#mail.local>, mail2\#mail.local, <mail3\#mail.local>, mail4, local<mail4\#mail.local>";
my $count=0, #array, $flag=0, $tempStr="";
for my $c (split (//,$str)) {
if( ($count eq 0) and ($c eq ' ') ) {
next;
}
if($c) {
if( ($c eq ',') and ($flag eq 1) ) {
push #array, $tempStr;
$count=0;
$flag1=0;
$tempStr="";
next;
}
if( ($c eq '>' ) or ( $c eq '#' ) ) {
$flag=1;
}
$tempStr="$tempStr$c";
$count++;
}
}
if($count>0) {
push #array, $tempStr;
}
foreach my $var (#array) {
print "$var\n";
}
Edit:
Input:
Input is the output of above code.
Expected Output :
"mail1, local"<mail1#mail.local>
"mail4, local"<mail4#mail.local>
Sample Code:
$str =~ s/([^#>]+[#>][^,]+),\s*/$1\n/g;
my #addresses = split('\n',$str);
if(scalar #addresses) {
foreach my $address (#addresses) {
if (($address =~ /</) and ($address !~ /\"/) and ($address !~ /^</)){
$address="\"$address";
$address=~ s/</\"</g;
}
}
$str = join(',',#addresses);
}
print "$str\n";
As I see, you want to replace each:
comma and following spaces,
occurring after either # or >,
with a newline.
To make such replacement, instead of writing a parsing program, you can use
a regex.
The search part can be as follows:
([^#>]+[#>][^,]+),\s*
Details:
( - Start of the 1st capturing group.
[^#>]+ - A non-empty sequence of chars other than # or >.
[#>] - Either # or >.
[^,]+ - A non-empty sequence of chars other than a comma.
) - End of the 1st capturing group.
,\s* - A comma and optional sequence of spaces.
The replace part should be:
$1 - The 1st capturing group.
\n - A newline.
So the whole program, much shorter than yours, can be as follows:
my $str='mail1, local<mail1#mail.local>, mail2#mail.local, <mail3#mail.local>, mail4, local<mail4#mail.local>';
print "Before:\n$str\n";
$str =~ s/([^#>]+[#>][^,]+),\s*/$1\n/g;
print "After:\n$str\n";
To replace all needed commas I used g option.
Note that I put the source string in single quotes, otherwise Perl
would have complained about Possible unintended interpolation of #mail.
Edit
Your modified requirements must be handled different way.
"Ordinary" replacement is not an option, because now there are some
fragments to match and some framents to ignore.
So the basic idea is to write a while loop with a matching regex:
(\w+),?\s+(\w+)(<[^>]+>), meaning:
(\w+) - First capturing group - a sequence of word chars (e.g. mail1).
,?\s+ - Optional comma and a sequence of spaces.
(\w+) - Second capturing group - a sequence of word chars (e.g. local).
(<[^>]+>) - Third capturing group - a sequence of chars other than >
(actual mail address), enclosed in angle brackets, e.g. <mail1#mail.local>.
Within each execution of the loop you have access to the groups
captured in this particular match ($1, $2, ...).
So the content of this loop is to print all these captured groups,
with required additional chars.
The code (again much shorter than yours) should look like below:
my $str = 'mail1, local<mail1#mail.local>, mail2#mail.local, <mail3#mail.local>, mail4 local<mail4#mail.local>';
while ($str =~ /(\w+),?\s+(\w+)(<[^>]+>)/g) {
print "\"$1, $2\"$3\n";
}
Here is an approach using split, which in this case also needs a careful regex
use warnings;
use strict;
use feature 'say';
my $string = # broken into two parts for readabililty
q(mail1, local<mail1#mail.local>, mail2#mail.local, )
. q(<mail3#mail.local>, mail4, local<mail4#mail.local>);
my #addresses = split /#.+?\K,\s*/, $string;
say for #addresses;
The split takes a full regex in its delimiter specification. In this case I figure that each record is delimited by a comma which comes after the email address, so #.+?,
To match a pattern only when it is preceded by another brings to mind a negative lookbehind before the comma. But those can't be of variable length, which is precisely the case here.
We can instead normally match the pattern #.+? and then use the \K form (of the lookbehind) which drops all previous matches so that they are not taken out of the string. Thus the above splits on ,\s* when that is preceded by the email address, #... (what isn't consumed).
It prints
mail1, local<mail1#mail.local>
mail2#mail.local
<mail3#mail.local>
mail4, local<mail4#mail.local>
The edit asks about quoting the description preceding <...> when it's there. A simple way is to make another pass once addresses have been parsed out of the string as above. For example
my #addresses = split /#.+?\K,\s*/, $string; #/ stop syntax highlight
s/(.+?,\s*.+?)</"$1"</ for #addresses;
say for #addresses;
The regex in a loop is one way to change elements of an array. I use it for its efficiency (changes elements in place), conciseness, and as a demonstration of the following properties.
In a foreach loop the index variable (or $_) is an alias for the currently processed element – so changing it changes that element. This is a known source of bugs when allowed unknowingly, which was another reason to show it in the above form.
The statement also uses the statement modifier and it is equivalent to
foreach my $elem (#addresses) {
$elem =~ s/(.+?,\s*.+?)</"$1"</;
}
This is often considered a more proper way to write it but I find that the other form emphasizes more clearly that elements are being changed, when that is the sole purpose of the foreach.
I need to increment a numeric value in a string:
my $str = "tool_v01.zip";
(my $newstr = $str) =~ s/\_v(\d+)\.zip$/ ($1++);/eri;
#(my $newstr = $str) =~ s/\_v(\d+)\.zip$/ ($1+1);/eri;
#(my $newstr = $str) =~ s/\_v(\d+)\.zip$/ $1=~s{(\d+)}{$1+1}/r; /eri;
print $newstr;
Expected output is tool_v02.zip
Note: the version number 01 may contain any number of leading zeroes
I don't think this question has anything to do with the /r modifier, but rather how to properly format the output. For that, I'd suggest sprintf:
my $newstr = $str =~ s{ _v (\d+) \.zip$ }
{ sprintf("_v%0*d.zip", length($1), $1+1 ) }xeri;
Or, replacing just the number with zero-width Lookaround Assertions:
my $newstr = $str =~ s{ (?<= _v ) (\d+) (?= \.zip$ ) }
{ sprintf("%0*d", length($1), $1+1 ) }xeri;
Note: With either of these solutions, something like tool_v99.zip would be altered to tool_v100.zip because the new sequence number cannot be expressed in two characters. If that's not what you want then you need to specify what alternative behaviour you require.
The bit you're missing is sprintf which works the same way as printf except rather than outputting the formatted string to stdout or a file handle, it returns it as a string. Example:
sprintf("%02d",3)
generates a string 03
Putting this into your regex you can do this. Rather than using /r you can use do a zero-width look ahead ((?=...)) to match the file suffix and just replace the matched number with the new value
s/(\d+)(?=.zip$)/sprintf("%02d",$1+1)/ei
How to combine two regex . This is my input:
1.UE_frequency_offset_flag else { 2} UE_frequency_offset_flag
2.served1 0x00 Uint8,unsigned char
#my first regex expression is used for extracting the values inside curly braces
my ($first_match) = /(\b(\d+)\b)/g;
print "$1 \n";
#my second regex expression
my ($second_match) = / \S \s+ ( \{ [^{}]+ \} | \S+ ) /x;
I was trying to combine both regex but did not get the expected output.
my ($second_match) = / \S \s+ ( \{ [^{}]+ \} |\b(\d+)\b| \S+ ) /x;
My expected output:
2,0x00
Please help where I am doing mistake?
The question is not completely clear to me, because I don't see how you want to combine two regex or pass the output of one to the other.
If you want to pass the captured part of the first regex then you need to save it to a variable:
my ($first_match) = /(\b(\d+)\b)/g;
my $captured = $1;
Then you can place the variable $captured in the second regex.
If you want to use the complete match and search inside that. Then you need to do the following:
my ($first_match) = /(\b(\d+)\b)/g;
print "$1,"; # Don't print one space then new line if you want to have a comma separating the two values
my ($second_match) = $first_match =~ / \S \s+ ( \{ [^{}]+ \} | \S+ ) /x;
Based on your input, this won't generate the expected output.
The following code would print out:
2,0x00
When processing your input.
print "$1," if /\{\s*(\d+)\s*\}/;
print "$1\n" if /(\d+x\d+)/;
I am trying to use the index() function and I want to find the position of a word inside a string, only when it is an exact match. For example:
My string is STRING="CATALOG SCATTER CAT CATHARSIS"
And my search string is KEY=CAT
I want to say something like index($STRING, $KEY) and check match for CAT, and not CATALOG. How do I accomplish this? The documentation says
The index function searches for one string within another, but without the wildcard-like behavior of a full regular-expression pattern match.
which makes me think that it may not be that straight-forward, but my perl skills are limited :). Is it possible to do what I am trying to do?
Hopefully, I was able to articulate my question well. Thanks in advance for your help!
How about:
my $str = "CATALOG SCATTER CAT CATHARSIS";
my $key = "CAT";
if ($str =~ /\b$key\b/) {
say "match at char ",$-[0];;
} else {
say "no match";
}
output:
match at char 16
You need to learn about Regular Expressions in Perl. Perl didn't invent Regular Expressions, but tremendously expanded upon the concept. In fact, many other programming languages talk specifically about using Perl Regular Expressions.
A regular expression matches a specific word pattern. For example, /cat/ matches the sequence cat in a string.
if ( $string =~ /cat/ ) {
print "String contains the letters 'cat' in a row\n";
}
In many ways, this does the same thing as:
my $location = index ( $string, "cat" );
if ( $location =! -1 ) { # index returns -1 when substring isn't found
print "String contains the letters 'cat' in a row\n";
}
But, both of these would match:
"Don't let the cat out of the bag"
"The Sears catalog arrived in the mail"
You don't want to match the last. So, you could do this:
my $location = index $string, " cat ";
Now, index $string, " cat " won't match the word catalog. Case closed! Or is it? What about:
"cat and dog it doth rain."
Maybe you could check and say things are okay if a sentence starts with "cat":
if ( (index ($string, " cat ") != -1) or (index ($string, "cat") = 0) ) {
print "String contains the letters 'cat' in a row\n";
}
But, what about these?
"The word CAT in all uppercase"
"Stupid cat"
"Cat! Here Cat! Common Cat!": Punctuation after the word "cat"
"Don't let the 'cat' out of the 'bag'": Quotation Marks around "cat"
It could take dozens of lines to specify each and every one of these conditions.
However:
if ( $string =~ /\bcat\b/i ) {
print "String contains the word 'cat' in it\n";
}
Specifies each and every one -- and then some. The \b says this is a word boundary. This could be a space, a tab, a quote, the beginning or ending of a line. Thus /\bcat\b/ specifies that this should be the word cat and not catalog. The i on the end tells your regular expression to ignore case when matching, so you'll find Cat, cat, CAT, cAt, and all other possible combinations.
In fact, Perl's regular expressions is what made Perl such a popular language to begin with.
Fortunately, Perl comes with not one, but two tutorials on Regular Expressions:
perlretut: Perl Regular Expression Tutorial
perlrequick: Perl Regular Expression Quick Start.
Hope this helps.
That's (partial) solution of this problem with index:
use warnings;
use strict;
my $test = 'CATALOG SCATTER CAT CATHARSIS';
my $key = 'CAT';
my $k_length = length $key;
my $s_length = (length $test) - $k_length;
my $pos = -1;
while (($pos = index $test, $key, $pos + 1) > -1) {
if ($pos > 0) {
my $prev_char = substr $test, $pos - 1, 1;
### print "Previous character: '$prev_char'\n";
next if $prev_char ge 'A' && $prev_char le 'Z'
|| $prev_char ge 'a' && $prev_char le 'z';
}
if ($pos < $s_length) {
my $next_char = substr $test, $pos + $k_length, 1;
### print "Next character: '$next_char'\n";
next if $next_char ge 'A' && $next_char le 'Z'
|| $next_char ge 'a' && $next_char le 'z';
}
print "Word '$key' found at " . $pos + 1 . "th position.\n";
}
As you see, it's kinda wordy, because it uses basic Perl string functions - index and substr - only. Checking whether the substring found is indeed a word is done via checking its next and previous characters (if they exist): if they belong to either A-Z or a-z range, it's not a word.
You can simplify it a bit by trying to lowercase these characters (with lc), then check against the single character range only:
my $lc_prev_char = lc( substr $test, $pos - 1, 1 );
next if $lc_prev_char ge 'a' && $lc_prev_char le 'z';
... but then again, it's quite a minor improvement (if improvement at all).
Now consider this:
my $test = 'CATALOG SCATTER CAT CATHARSIS CAT';
my $key = 'CAT';
while ($test =~ /(?<![A-Za-z])$key(?![A-Za-z])/g) {
print "Word '$key' found at " . ($-[0] + 1) . "th position.\n";
}
... and that's it! The pattern literally tests the string given ($test) for the substring given ($key) not being either preceded with or followed by the symbol of A-Za-z range, and supporting Perl regex magic (this variable, in particular) makes it easy to get the starting position of such substring.
The bottom line: use regexes to do the regexes' work.
Regular expressions allow for the search to contain word boundaries as well as distinct characters. While
my $string = "CATALOG SCATTER CAT CATHARSIS";
index($string, 'CAT');
will return zero or greater if $string contains the characters CAT, a regular expression like
$string =~ /\bCAT\b/;
will return false as $string doesn't contain CAT preceded and followed by a word boundary. (A word boundary is either the beginning or end of the string, or between an word character and a non-word character. A word character is any alphanumeric character or an underscore.)
use \E value.
so :
#!usr/bin/perl
my $string ="Little Tony";
my $check = "Ton";
if($string =~ m/$check\E/g)
{
print "match";
}
else
{
die("No Match");
}
I wrote a perl script whihc will output a list containing similar entries like below:
$var = ' whatever'
$var contains: a single quote, a space, the word whatever, single quote
actually, this is key of a hash and i want to pull the value for the same. but due to the single quotes and a space in betweene, i am not able to pull the hash key value.
So, i want to strip $var as below:
$var = whatever
meaning remove the single quote, the space and the trailing single quote.
so that I can use $var as hash key to pull the respective value.
could you guide me on a perl oneliner for the same.
thnaks.
Here is several ways to do it, but beware - modifying the keys in a hash can end with unwanted results, like:
use strict;
use warnings;
use Data::Dumper;
my $src = {
"a a" => 1,
" a a " => 2,
"' a a '" => 3,
};
print "src: ", Dumper($src);
my $trg;
#$trg{ map { s/^[\s']*(.*?)[\s']*$/$1/; $_ } keys %$src } = values %$src;
print "copy: ", Dumper($trg);
will produce:
src: $VAR1 = {
' a a ' => 2,
'\' a a \'' => 3,
'a a' => 1
};
copy: $VAR1 = {
'a a' => 1
};
Any regex is possible do explain with YAPE::Regex::Explain module. (from CPAN). For the above regex:
use YAPE::Regex::Explain;
print YAPE::Regex::Explain->new( qr(^[\s']*(.*?)[\s']*$) )->explain;
will produce:
The regular expression:
(?-imsx:^[\s']*(.*?)[\s']*$)
matches as follows:
NODE EXPLANATION
----------------------------------------------------------------------
(?-imsx: group, but do not capture (case-sensitive)
(with ^ and $ matching normally) (with . not
matching \n) (matching whitespace and #
normally):
----------------------------------------------------------------------
^ the beginning of the string
----------------------------------------------------------------------
[\s']* any character of: whitespace (\n, \r, \t,
\f, and " "), ''' (0 or more times
(matching the most amount possible))
----------------------------------------------------------------------
( group and capture to \1:
----------------------------------------------------------------------
.*? any character except \n (0 or more times
(matching the least amount possible))
----------------------------------------------------------------------
) end of \1
----------------------------------------------------------------------
[\s']* any character of: whitespace (\n, \r, \t,
\f, and " "), ''' (0 or more times
(matching the most amount possible))
----------------------------------------------------------------------
$ before an optional \n, and the end of the
string
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
In short the: s/^[\s']*(.*?)[\s']*$/$1/; mean:
at the beginning of the string match whitespaces or apostrophe as much times is possible,
then match anything
match at the end of string whitespaces or apostrophes as much times as possible
and keep the only the "anything" part
#!/usr/bin/perl
$string = "' my string'";
print $string . "\n";
$string =~ s/'//g;
$string =~ s/^ //g;
print $string;
Output
' my string'
my string
$var =~ tr/ '//d;
see: tr operator
or, by regex
$var =~ s/(?:^['\s]+)|'//g;
The latter will keep the spaces in the middle of the word, the former removes all spaces and single quotes.
A short test:
...
$var = q{' what ever'};
$var =~ s/
(?: # find the following group
^ # at string begin, followed by
['\s]+ # space or single quote, one or more
) # close group
| # OR
' # single quotes in the while string
//gx ; # replace by nothing, use formatted regex (x)
print "|$var|\n";
...
prints:
|what ever|
as expected.