What's happening in this Perl foreach loop? - perl

I have this Perl code:
foreach (#tmp_cycledef)
{
chomp;
my ($cycle_code, $close_day, $first_date) = split(/\|/, $_,3);
$cycle_code =~ s/^\s*(\S*(?:\s+\S+)*)\s*$/$1/;
$close_day =~ s/^\s*(\S*(?:\s+\S+)*)\s*$/$1/;
$first_date =~ s/^\s*(\S*(?:\s+\S+)*)\s*$/$1/;
#print "$cycle_code, $close_day, $first_date\n";
$cycledef{$cycle_code} = [ $close_day, split(/-/,$first_date) ];
}
The value of tmp_cycledef comes from output of an SQL query:
select cycle_code,cycle_close_day,to_char(cycle_first_date,'YYYY-MM-DD')
from cycle_definition d
order by cycle_code;
What exactly is happening inside the for loop?

Huh, I'm surprised no one fixed it for you :)
It looks like the person who wrote this was trying to trim leading and trailing whitespace from each field. It's a really odd way to do that, and for some reason he was overly concerned with interior whitespace in each field despite his anchors.
I think that should be the same as trimming the whitespace around the delimiter in the split:
foreach (#tmp_cycledef)
{
s/^\s+//; s/$//; #leading and trailing whitespace on the whole string
my ($cycle_code, $close_day, $first_date) = split(/\s*\|\s*/, $_, 3);
$cycledef{$cycle_code} = [ $close_day, split(/-/,$first_date) ];
}
The key to thinking about split is considering which parts of the string you want to throw away, not just what separates the fields that you want.

For regex part, s/^\s*(\S*(?:\s+\S+)*)\s*$/$1/ do stripping of leading and trailing whitespaces

Each row in #tmp_cycledef is composed of a string formatted following "cycle_code | close_day | first_date".
my ($cycle_code, $close_day, $first_date) = split(/\|/, $_,3);
Split the string into three parts. The following regular expressions are used to strip leading and trailing whitespaces.
The last instruction of the loop creates an entry in the dictionary $cycledef indexed by $cycle_code. The entry is formated is formatted using the following scheme:
[ $close_day, YYYY, MM, DD ]
where $first_date = "YYYY-MM-DD".

#tmp_cycledef: The output of the sql query is stored in this array
foreach (#tmp_cycledef) : For every element in this array.
chomp : remove the \n char from the end of every element.
my ($cycle_code, $close_day, $first_date) = split(/\|/, $_,3);
split the elements into 3 parts and assign the variable to each of the splited element. parts of split are "split(/PATTERN/,EXPR,LIMIT)"
$cycle_code =~ s/^\s*(\S*(?:\s+\S+)*)\s*$/$1/;
$close_day =~ s/^\s*(\S*(?:\s+\S+)*)\s*$/$1/;
$first_date =~ s/^\s*(\S*(?:\s+\S+)*)\s*$/$1/;
This regex part is sripping of leading and trailing whitespaces from each variable.

my god, it's been such a long time since I've read perl... but I'll give it a shot.
you grab a record from #tmp_cycledef, and chomp off the newline at the end, and split it up into the three variables: then, like S.Mark said, each substitution regex strips off the leading and trailing whitespace for each of the three variable. Finally, the values get pushed into a hash as a list, with some debugging code commented out right above it.
hth

Your query gives a set of rows that
are stored in the array
#tmp_cycledef.
We iterate over each row in the
result using: foreach
(#tmp_cycledef).
The result rows might have trailing
newline char, we get rid of them
using chomp.
Next we split the row (which is not
in $_) on the pipe and assign the
first 3 pieces to $cycle_code,
$close_day and $first_date
respectively.
The split pieces might have leading
and trailing white spaces, the next 3
lines are to remove the leading and
trailing white space in the 3
variables.
Finally we make an entry into the
hash %cycledef. The key use is
$cycle_code and the value is an
array whose first element is
$close_day and rest of the elements
are pieces got after splitting
$first_date on hyphen.

Related

Add a new line in a variable using perl

I am trying to add a new line in a variable after certain number of words. For example: If we have a variable:
$x = "This a variable, start a new line here, This is a new line.";
If I print the above variable
print $x;
I should get the below output:
This is a variable,
start a new line here,
This is a new line.
How can I achieve this in Perl from the variable itself?
I do not agree to the formula "after certain number of words".
Note that the first target line has 4 words, whereas remaining 2 have
5 words each.
Actually you need to replace each comma and following sequence of
spaces (if any) with a comma and \n.
So the intuitive way to do it is:
$x =~ s/,\s*/,\n/g;
The simplest way is to split the string on comma followed by a space and then
join the word groups with a comma followed by a newline.
my $x = "This a variable, start a new line here, This is a new line.";
print join(",\n", split /, /, $x) . "\n";
output
This a variable,
start a new line here,
This is a new line.
For solving the general, how do I reformat this string with line breaks after n-columns? problem, use the Text::Wrap library (as suggested by #ikegami):
use Text::Wrap;
my $x = "The quick brown fox jumped over the lazy dog.";
$Text::Wrap::columns = 15;
# wrap() needs an array of words
my #words = split /\s+/, $x;
# Initial tab, subsequent tab values set to '' (think indent amount)
print wrap('', '', #words) . "\n";
output
The quick
brown fox
jumped over
the lazy dog.
You probably want to use regular expressions. You can do this:
$x =~ s/^(\S+\s+){3}\K/\n/;
Or if this is about the commas and not the spaces:
$x =~ s/^([^,]+,+){2}\s*\K/\n/;
(in this case I also remove any potential space that would be after the comma)
You can also configure separately how many words or comma you want, by putting this in a variable:
my $nbwords = 7; # add a line after the 7th word
$x =~ s/^(\S+\s+){$nbwords}\K/\n/;
Now, that would keep the last space so you may want to do this:
my $nbwords = 7; # add a line after the 7th word
$nbwords--; # becomes 6 because there is another word after that we match as well
$x =~ s/^(\S+\s+){$nbwords}\S+\K\s+/\n/;
You should probably learn to use Regexps but just to explain the above:
\s is any space character (like space, tab, line feed, etc)
\S (uppercase) is any character except a space character
+ means any number of characters of that type described with what is before. So \s+ means any number of consecutive space characters.
{123} means 123 times that type of character ...
{3,80} means 3 to 80 times. So + is equivalent to {1,} (one to unlimited)
\K means that whatever is before will not be replaced, only what is after.

Perl, Split string by specific pattern

I found how to split a string by whitespaces, but that only takes into an account a single character. In my case, I have comments pasted into a file that includes newlines and whitespaces. I have them separated by this string: [|]
So I need to split my $string into an array for example, where $string =
This is a comment.
This is a newline.
This is the end[|]This is second comment.
This is second newline.
[|]Last comment
Gets split into $array[0], $array[1], and $array[2] which include the newlines and whitespaces. Separated by [|]
Every example I find on the web uses a single character, such as space or newline, to split strings. In my case I have to use a more specific identifier, which is why I selected [|] but having troubles splitting it by this.
I have tried to limit it to parse by a single '|' character with this code:
my #words = split /|/, $string;
foreach my $thisline (#words) {
print "This line = '" . $thisline . "'\n";
But this seems to split the entire string, character-by-character into #words.
[, |, and ] are all special characters in regular expressions -- | is used to separate options, and […] are used to specify character sets. Using an unquoted | makes the expression match the empty string (more specifically: the empty string or the empty string), causing it to match and split on every character boundary. These characters must be escaped to use them literally in an expression:
my #words = split /\[\|\]/, $string;
Since all the lines makes this visually confusing, you should probably use m{} quotes instead of //, and \Q…\E to quote a range of characters instead of a separate backslash for each one. (This is functionally identical, it's just a little easier to read.)
my #words = split m{\Q[|]\E}, $string;

How do I find the sum of all numbers in STDIN even if there are non-digit characters?

I have an assignment asking me to enter a sequence of numbers and characters each separated by a space and the sequence in ended by entering in "q" or "Q" followed by a space. Everything except the numbers should be discarded and we are to find the sum. So for example if the input is "1 12 a 2 5 P Q" then we should expect to get "20" as the output.
So far I'm using
$input = <>;
$input =~ tr/0-9//cd;
to get only the numbers but what I want is to split them up and get the sum. Right now the output would be 11225 and I want "1+12+2+5" and get the sum.
perl -ne '$s=0;($line)=/(.*?)[Qq]/;while($line=~/(\d+)/g) {$s+=$1} print "$s\n"'
Explanation:
Strips the trailing part of each line starting with a Q or a q, then scan the remaining part for isolated positive integers and adds these together.
First, strip out all characters that aren't numbers or spaces:
$input =~ s/[^0-9\s]//g;
Then, split on whitespace:
#digits = split(/\s/, $input);
Then you have a list of digits that you can add up.
Preserve spaces in your first step:
$input =~ tr/0-9 //cd;
Then split on spaces:
my #numbers = split ' ', $input;
(this is a special form of split that works like split /\s+/ but also discards empty leading fields).
You probably want to start by getting rid of everything after a Q though:
$input =~ s/Q .*//i;
For what it's worth, I wouldn't have jumped to using tr here; I'd have started by spliting on spaces, then processed fields that were only digits until a Q was reached.

Perl: Replace consecutive spaces in this given scenario?

an excerpt of a big binary file ($data) looks like this:
\n1ax943021C xxx\t2447\t5
\n1ax951605B yyy\t10400\t6
\n1ax919275 G2L zzz\t6845\t6
The first 25 characters contain an article number, filled with spaces. How can I convert all spaces between the article numbers and the next column into a \x09 ? Note the one or more spaces between different parts of the article number.
I tried a workaround, but that overwrites the article number with ".{25}xxx»"
$data =~ s/\n.{25}/\n.{25}xxx/g
Anyone able to help?
Thanks so much!
Gary
You can use unpack for fixed width data:
use strict;
use warnings;
use Data::Dumper;
$Data::Dumper::Useqq=1;
print Dumper $_ for map join("\t", unpack("A25A*")), <DATA>;
__DATA__
1ax943021C xxx 2447 5
1ax951605B yyy 10400 6
1ax919275 G2L zzz 6845 6
Output:
$VAR1 = "1ax943021C\txxx\t2447\t5";
$VAR1 = "1ax951605B\tyyy\t10400\t6";
$VAR1 = "1ax919275 G2L\tzzz\t6845\t6";
Note that Data::Dumper's Useqq option prints whitecharacters in their escaped form.
Basically what I do here is take each line, unpack it, using 2 strings of space padded text (which removes all excess space), join those strings back together with tab and print them. Note also that this preserves the space inside the last string.
I interpret the question as there being a 25 character wide field that should have its trailing spaces stripped and then delimited by a tab character before the next field. Spaces within the article number should otherwise be preserved (like "1ax919275 G2L").
The following construct should do the trick:
$data =~ s/^(.{25})/{$t=$1;$t=~s! *$!\t!;$t}/emg;
That matches 25 characters from the beginning of each line in the data, then evaluates an expression for each article number by stripping its trailing spaces and appending a tab character.
Have a try with:
$data =~ s/ +/\t/g;
Not sure exactly what you what - this will match the two columns and print them out - with all the original spaces. Let me know the desired output and I will fix it for you...
#!/usr/bin/perl -w
use strict;
my #file = ('\n1ax943021C xxx\t2447\t5', '\n1ax951605B yyy\t10400\t6',
'\n1ax919275 G2L zzz\t6845\t6');
foreach (#file) {
my ($match1, $match2) = ($_ =~ /(\\n.{25})(.*)/);
print "$match1'[insertsomethinghere]'$match2\n";
}
Output:
\n1ax943021C '[insertsomethinghere]'xxx\t2447\t5
\n1ax951605B '[insertsomethinghere]'yyy\t10400\t6
\n1ax919275 G2L '[insertsomethinghere]'zzz\t6845\t6

split() but keep delimiter

my $string1 = "Hi. My name is Vlad. It is snowy outside.";
my #array = split('.' $string1); ##essentially I want this, but I want the period to be kept
I want to split this string at the ., but I want to keep the period. How can this be accomplished?
You can use lookbehind to do this:
split(/(?<=\.)/, $string)
The regex matches an empty string that follows a period.
If you want to remove the whitespace between the sentences at the same time, you can change it to:
split(/(?<=\.)\s*/, $string)
Positive and negative lookbehind is explained here
If you don't mind the periods being split into their own elements in the array, you can use parentheses to tell split to keep them:
my #array = split(/(\.)/, $string);