How do I find the sum of all numbers in STDIN even if there are non-digit characters? - perl

I have an assignment asking me to enter a sequence of numbers and characters, each separated by a space, where the sequence is ended by entering "q" or "Q" followed by a space. Everything except the numbers should be discarded and we are to find the sum. So for example if the input is "1 12 a 2 5 P Q" then we should expect to get "20" as the output.
So far I'm using
$input = <>;
$input =~ tr/0-9//cd;
to get only the numbers but what I want is to split them up and get the sum. Right now the output would be 11225 and I want "1+12+2+5" and get the sum.

perl -ne '$s=0;($line)=/(.*?)[Qq]/;while($line=~/(\d+)/g) {$s+=$1} print "$s\n"'
Explanation:
Strips the trailing part of each line starting at the first Q or q, then scans the remaining part for runs of digits and adds them together.

First, strip out all characters that aren't numbers or spaces:
$input =~ s/[^0-9\s]//g;
Then, split on whitespace:
@digits = split(/\s+/, $input);
Then you have a list of numbers that you can add up.

Preserve spaces in your first step:
$input =~ tr/0-9 //cd;
Then split on spaces:
my @numbers = split ' ', $input;
(this is a special form of split that works like split /\s+/ but also discards empty leading fields).
You probably want to start by getting rid of everything after a Q though:
$input =~ s/Q .*//i;
For what it's worth, I wouldn't have jumped to using tr here; I'd have started by splitting on spaces, then processed fields that were only digits until a Q was reached.
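The split-first approach mentioned in the last sentence might look like the sketch below. The sub name sum_until_q is made up for illustration, and the sketch assumes that a field consisting solely of q or Q ends the sequence. Unlike the tr version, a mixed field such as "a1" is ignored entirely rather than contributing its digits.

```perl
use strict;
use warnings;

# Hypothetical helper: sum the all-digit fields of a line, stopping at q/Q.
sub sum_until_q {
    my ($input) = @_;
    my $sum = 0;
    for my $field (split ' ', $input) {
        last if $field =~ /^q$/i;                  # terminator reached
        $sum += $field if $field =~ /^[0-9]+$/;    # skip anything that is not purely digits
    }
    return $sum;
}

print sum_until_q("1 12 a 2 5 P Q"), "\n";    # prints 20
```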

Add a new line in a variable using perl

I am trying to add a new line in a variable after certain number of words. For example: If we have a variable:
$x = "This is a variable, start a new line here, This is a new line.";
If I print the above variable
print $x;
I should get the below output:
This is a variable,
start a new line here,
This is a new line.
How can I achieve this in Perl from the variable itself?
I do not agree with the "after a certain number of words" formulation.
Note that the first target line has 4 words, whereas the remaining 2 have 5 words each.
What you actually need is to replace each comma and any whitespace that follows it with a comma and \n.
So the intuitive way to do it is:
$x =~ s/,\s*/,\n/g;
The simplest way is to split the string on comma followed by a space and then
join the word groups with a comma followed by a newline.
my $x = "This a variable, start a new line here, This is a new line.";
print join(",\n", split /, /, $x) . "\n";
output
This a variable,
start a new line here,
This is a new line.
For the general problem of "how do I reformat this string with line breaks after n columns?", use the Text::Wrap module (as suggested by @ikegami):
use Text::Wrap;
my $x = "The quick brown fox jumped over the lazy dog.";
$Text::Wrap::columns = 15;
# wrap() takes a list of strings; here we pass the individual words
my @words = split /\s+/, $x;
# Initial tab, subsequent tab values set to '' (think indent amount)
print wrap('', '', @words) . "\n";
output
The quick
brown fox
jumped over
the lazy dog.
You probably want to use regular expressions. You can do this:
$x =~ s/^(\S+\s+){3}\K/\n/;
Or if this is about the commas and not the spaces:
$x =~ s/^([^,]+,+){2}\s*\K/\n/;
(in this case the \s* before \K also skips any whitespace after the comma, so the newline is inserted after it)
You can also configure how many words or commas you want by putting the count in a variable:
my $nbwords = 7; # add a line after the 7th word
$x =~ s/^(\S+\s+){$nbwords}\K/\n/;
Now, that would keep the last space so you may want to do this:
my $nbwords = 7; # add a line after the 7th word
$nbwords--; # becomes 6 because there is another word after that we match as well
$x =~ s/^(\S+\s+){$nbwords}\S+\K\s+/\n/;
You should probably learn to use Regexps but just to explain the above:
\s is any space character (like space, tab, line feed, etc)
\S (uppercase) is any character except a space character
+ means one or more repetitions of whatever comes before it. So \s+ means one or more consecutive space characters.
{123} means exactly 123 repetitions of whatever comes before it.
{3,80} means 3 to 80 times. So + is equivalent to {1,} (one to unlimited)
\K means that whatever is before will not be replaced, only what is after.
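Putting those pieces together, here is a small sketch of the "newline after the n-th word" substitution wrapped in a sub. The name break_after_word and the sample string are invented for illustration; the pattern is the same one used above, with a non-capturing group.

```perl
use strict;
use warnings;

# Hypothetical helper: replace the whitespace after the $n-th word with a newline.
sub break_after_word {
    my ($text, $n) = @_;
    my $before = $n - 1;    # words that are matched together with their trailing whitespace
    # \K keeps everything matched so far; only the following \s+ is replaced.
    $text =~ s/^(?:\S+\s+){$before}\S+\K\s+/\n/;
    return $text;
}

print break_after_word("one two three four five", 3), "\n";
# prints:
# one two three
# four five
```

If the string has fewer than n words followed by whitespace, the pattern does not match and the text is returned unchanged.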

Unpack and x option in TEMPLATE

I have a row which looks like this (no new lines, this is one line, and I replaced the spaces with _ as otherwise they are trimmed):
46S990BZ6BRIG___1381TRANSOCEAN_LTD______________BCALL_FEB00025000__1000000000000000000000000000000000000000B90002132015000000099999900161100000000000000007500111214111414121714100003000_H8817H100015012200005000000000010000000000000000000000009920202020150213__20_________________________________________________OV__0203P
Using Perl's unpack on it returns the following:
unpack("x49 A4", $line); # where $line is the above example line
returns: CALL
unpack("x68 A4", $line);
returns: 0122
unpack("x238 A4", $line);
returns: 2015
Apparently, the column numbers do not match with the number given after 'x' in the TEMPLATE, as x238 is not equal to column 238 ('0000'), I have '2015' on column 251, not 238. The same for the other.
Please, explain how exactly the numbers given after 'x' in TEMPLATE work.
Thank you
First of all, your data isn't what you said it is. The data you provided produces the desired result.
$ perl -E'
my $line = "46S990BZ6BRIG___1381TRANSOCEAN_LTD______________BCALL_FEB00025000__1000000000000000000000000000000000000000B90002132015000000099999900161100000000000000007500111214111414121714100003000_H8817H100015012200005000000000010000000000000000000000009920202020150213__20_________________________________________________OV__0203P";
$line =~ s/_/ /g;
say unpack("x238 A4", $line);
'
0000
Maybe your actual data contained non-printable characters. Or maybe some of the spaces were actually tabs.
$ perl -E'
$_ = "46S990BZ6BRIG___1381TRANSOCEAN_LTD______________BCALL_FEB00025000__1000000000000000000000000000000000000000B90002132015000000099999900161100000000000000007500111214111414121714100003000_H8817H100015012200005000000000010000000000000000000000009920202020150213__20_________________________________________________OV__0203P";
s/_/ /g;
s/LTD\K\s+/\t\t/; # If the spaces after LTD were tabs
say unpack("x238 A4", $_);
'
2015
If you want to extract the characters based on how your viewer expands tabs, you will need to expand tabs to spaces in the same fashion before passing the string to unpack.
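As a rough sketch of that last point: "xN" in a template skips N bytes of the raw string, so the next field starts at zero-based offset N regardless of how the line looks on screen. The core Text::Tabs module can expand tabs before unpacking. The sub name field_at_display_offset, the sample record, and the 8-column tab stop are all assumptions made for the example.

```perl
use strict;
use warnings;
use Text::Tabs;    # core module; exports expand() and $tabstop

$tabstop = 8;      # assumption: the viewer uses 8-column tab stops

# Hypothetical helper: read $width characters at the offset seen in the viewer.
sub field_at_display_offset {
    my ($line, $offset, $width) = @_;
    my $expanded = expand($line);                        # tabs become spaces
    my ($field)  = unpack "x$offset A$width", $expanded;
    return $field;
}

my $line = "AB\tCD";                      # made-up record containing one tab
my ($raw) = unpack "x3 A2", $line;        # raw offsets: the tab counts as one byte
print "$raw\n";                           # prints CD
print field_at_display_offset($line, 8, 2), "\n";   # prints CD
```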

Perl: Replace consecutive spaces in this given scenario?

an excerpt of a big binary file ($data) looks like this:
\n1ax943021C xxx\t2447\t5
\n1ax951605B yyy\t10400\t6
\n1ax919275 G2L zzz\t6845\t6
The first 25 characters contain an article number, filled with spaces. How can I convert all spaces between the article numbers and the next column into a \x09 ? Note the one or more spaces between different parts of the article number.
I tried a workaround, but that overwrites the article number with ".{25}xxx»"
$data =~ s/\n.{25}/\n.{25}xxx/g
Anyone able to help?
Thanks so much!
Gary
You can use unpack for fixed width data:
use strict;
use warnings;
use Data::Dumper;
$Data::Dumper::Useqq=1;
print Dumper $_ for map join("\t", unpack("A25A*")), <DATA>;
__DATA__
1ax943021C xxx 2447 5
1ax951605B yyy 10400 6
1ax919275 G2L zzz 6845 6
Output:
$VAR1 = "1ax943021C\txxx\t2447\t5";
$VAR1 = "1ax951605B\tyyy\t10400\t6";
$VAR1 = "1ax919275 G2L\tzzz\t6845\t6";
Note that Data::Dumper's Useqq option prints whitespace characters in their escaped form.
Basically, what I do here is take each line, unpack it into two space-padded text fields (the A format strips trailing whitespace), join those fields back together with a tab, and print them. Note also that this preserves the space inside the article number on the last line.
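For a self-contained variant of the same idea, the sketch below builds a record with explicit padding via sprintf so the 25-character field is visible in the code. The sub name article_to_tab is made up, and the record is modelled on the third sample line.

```perl
use strict;
use warnings;

# Hypothetical helper: split off the 25-character article field and
# rejoin it with the rest of the line using a single tab.
sub article_to_tab {
    my ($line) = @_;
    # A25 takes 25 characters and strips trailing spaces; A* takes the rest.
    my ($article, $rest) = unpack 'A25 A*', $line;
    return join "\t", $article, $rest;
}

# Article number padded to 25 characters, followed by tab-separated columns.
my $record = sprintf("%-25s", "1ax919275 G2L") . "zzz\t6845\t6";
print article_to_tab($record), "\n";
```

The space inside "1ax919275 G2L" survives because only trailing padding is stripped.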
I interpret the question as there being a 25 character wide field that should have its trailing spaces stripped and then delimited by a tab character before the next field. Spaces within the article number should otherwise be preserved (like "1ax919275 G2L").
The following construct should do the trick:
$data =~ s/^(.{25})/{$t=$1;$t=~s! *$!\t!;$t}/emg;
That matches 25 characters from the beginning of each line in the data, then evaluates an expression for each article number by stripping its trailing spaces and appending a tab character.
Have a try with:
$data =~ s/ +/\t/g;
Not sure exactly what you want - this will match the two columns and print them out - with all the original spaces. Let me know the desired output and I will fix it for you...
#!/usr/bin/perl -w
use strict;
my @file = ('\n1ax943021C xxx\t2447\t5', '\n1ax951605B yyy\t10400\t6',
'\n1ax919275 G2L zzz\t6845\t6');
foreach (@file) {
my ($match1, $match2) = ($_ =~ /(\\n.{25})(.*)/);
print "$match1'[insertsomethinghere]'$match2\n";
}
Output:
\n1ax943021C '[insertsomethinghere]'xxx\t2447\t5
\n1ax951605B '[insertsomethinghere]'yyy\t10400\t6
\n1ax919275 G2L '[insertsomethinghere]'zzz\t6845\t6

How to extract a number from a string in Perl?

I have
print $str;
abcd*%1234$sdfsd..#d
The string would always have only one continuous stretch of numbers, like 1234 in this case. Rest all will be either alphabets or other special characters.
How can I extract the number (1234 in this case) and store it back in str?
This page suggests that I should use \d, but how?
If you don't want to modify the original string, you can extract the numbers by capturing them in the regex, using subpatterns. In list context, a regular expression returns the matches defined in the subpatterns.
my $str = 'abc 123 x456xy 789foo';
my ($first_num) = $str =~ /(\d+)/; # 123
my @all_nums = $str =~ /(\d+)/g; # (123, 456, 789)
$str =~ s/\D//g;
This removes all nondigit characters from the string. That's all that you need to do.
EDIT: if Unicode digits in other scripts may be present, a better solution is:
$str =~ s/[^0-9]//g;
If you wanted to do it the destructive way, this is the fastest way to do it.
$str =~ tr/0-9//cd;
translate all characters in the complement of 0-9 to nothing, delete them.
The one caveat to this approach, and Phillip Potter's, is that were there another group of digits further down the string, they would be concatenated with the first group of digits. So it's not clear that you would want to do this.
The surefire way to get one and only one group of digits is
( $str ) = $str =~ /(\d+)/;
The match, in a list context returns a list of captures. The parens around $str are simply to put the expression in a list context and assign the first capture to $str.
Personally, I would do it like this:
$s =~ /([0-9]+)/;
print $1;
$1 will contain the first group matched by the given regular expression (the part in round brackets).

What's happening in this Perl foreach loop?

I have this Perl code:
foreach (@tmp_cycledef)
{
chomp;
my ($cycle_code, $close_day, $first_date) = split(/\|/, $_,3);
$cycle_code =~ s/^\s*(\S*(?:\s+\S+)*)\s*$/$1/;
$close_day =~ s/^\s*(\S*(?:\s+\S+)*)\s*$/$1/;
$first_date =~ s/^\s*(\S*(?:\s+\S+)*)\s*$/$1/;
#print "$cycle_code, $close_day, $first_date\n";
$cycledef{$cycle_code} = [ $close_day, split(/-/,$first_date) ];
}
The value of tmp_cycledef comes from output of an SQL query:
select cycle_code,cycle_close_day,to_char(cycle_first_date,'YYYY-MM-DD')
from cycle_definition d
order by cycle_code;
What exactly is happening inside the for loop?
Huh, I'm surprised no one fixed it for you :)
It looks like the person who wrote this was trying to trim leading and trailing whitespace from each field. It's a really odd way to do that, and for some reason he was overly concerned with interior whitespace in each field despite his anchors.
I think that should be the same as trimming the whitespace around the delimiter in the split:
foreach (@tmp_cycledef)
{
s/^\s+//; s/\s+$//; #leading and trailing whitespace on the whole string
my ($cycle_code, $close_day, $first_date) = split(/\s*\|\s*/, $_, 3);
$cycledef{$cycle_code} = [ $close_day, split(/-/,$first_date) ];
}
The key to thinking about split is considering which parts of the string you want to throw away, not just what separates the fields that you want.
As for the regex part, s/^\s*(\S*(?:\s+\S+)*)\s*$/$1/ strips leading and trailing whitespace.
Each row in @tmp_cycledef is a string of the form "cycle_code | close_day | first_date".
my ($cycle_code, $close_day, $first_date) = split(/\|/, $_,3);
Split the string into three parts. The following regular expressions are used to strip leading and trailing whitespaces.
The last instruction of the loop creates an entry in the hash %cycledef, keyed by $cycle_code. The entry is formatted using the following scheme:
[ $close_day, YYYY, MM, DD ]
where $first_date = "YYYY-MM-DD".
@tmp_cycledef: The output of the SQL query is stored in this array.
foreach (@tmp_cycledef) : For every element in this array.
chomp : remove the trailing \n character from each element.
my ($cycle_code, $close_day, $first_date) = split(/\|/, $_,3);
Split each element into 3 parts and assign each part to a variable. The arguments of split are "split(/PATTERN/,EXPR,LIMIT)".
$cycle_code =~ s/^\s*(\S*(?:\s+\S+)*)\s*$/$1/;
$close_day =~ s/^\s*(\S*(?:\s+\S+)*)\s*$/$1/;
$first_date =~ s/^\s*(\S*(?:\s+\S+)*)\s*$/$1/;
This regex part strips leading and trailing whitespace from each variable.
my god, it's been such a long time since I've read perl... but I'll give it a shot.
you grab a record from @tmp_cycledef, chomp off the newline at the end, and split it up into the three variables. then, like S.Mark said, each substitution regex strips off the leading and trailing whitespace from each of the three variables. Finally, the values get stored in a hash as an array reference, with some debugging code commented out right above it.
hth
Your query gives a set of rows that are stored in the array @tmp_cycledef.
We iterate over each row in the result using: foreach (@tmp_cycledef).
The result rows might have a trailing newline character; we get rid of it using chomp.
Next we split the row (which is now in $_) on the pipe and assign the first 3 pieces to $cycle_code, $close_day and $first_date respectively.
The split pieces might have leading and trailing whitespace; the next 3 lines remove it from the 3 variables.
Finally we make an entry in the hash %cycledef. The key used is $cycle_code and the value is an array reference whose first element is $close_day and whose remaining elements are the pieces obtained by splitting $first_date on the hyphen.
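To see the resulting data structure, here is a small runnable sketch of the same loop. The sub name build_cycledef and the sample row are invented, and the trimming uses a simpler substitution than the original code, so treat it as an illustration rather than a drop-in replacement.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the loop; returns a reference to the hash.
sub build_cycledef {
    my @rows = @_;
    my %cycledef;
    for my $row (@rows) {
        chomp $row;
        my ($cycle_code, $close_day, $first_date) = split /\|/, $row, 3;
        # Trim leading and trailing whitespace from each field in place.
        s/^\s+|\s+$//g for $cycle_code, $close_day, $first_date;
        $cycledef{$cycle_code} = [ $close_day, split(/-/, $first_date) ];
    }
    return \%cycledef;
}

# Invented sample row in the "cycle_code | close_day | first_date" layout.
my $cycledef = build_cycledef(" A1 | 15 | 2010-01-31 \n");
print join(",", @{ $cycledef->{A1} }), "\n";    # prints 15,2010,01,31
```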