I want to insert a colon between a word and the number(s) that follow it, and then add a newline after the number(s).
For example:
"cat 11052000 cow_and_owner_ 01011999 12031981 dog 22032011";
My expected output:
cat:11052000
cow_and_owner_:01011999 12031981
dog:22032011
My attempt:
$Bday=~ /^([a-z]||\_)/:/^([0-9])/
print "\n";
#!/usr/bin/perl
use warnings;
use strict;
my $str = "cat 11052000 cow_and_owner_ 01011999 12031981 dog 22032011";
$str =~ s/\s*([a-z_]+)\s+(\d+(?: \d+)*)/$1:$2\n/g;
print $str;
This produces your desired output from your sample input.
Edit: Note the use of the s operator for regular expression substitution. One of the many problems with your code is that you're not using it (if your intent is to modify the string in place rather than extract parts of it for further processing).
One more variant:
> cat test_perl.pl
#!/usr/bin/perl
use strict;
use warnings;
my $str = "cat 11052000 cow_and_owner_ 01011999 12031981 dog 22032011";
# m//g in scalar context resumes from where the previous match ended
while ( $str =~ m/([a-z_]+)\s+(\d+(?: \d+)*)/g )
{
print "$1:$2\n";
}
> test_perl.pl
cat:11052000
cow_and_owner_:01011999 12031981
dog:22032011
>
The original code $Bday=~ /^([a-z]||\_)/:/^([0-9])/ doesn't make much sense. Apart from missing a semicolon and having too many delimiters (matching patterns have the form /.../ or m/.../, and substitutions s/.../.../), it could never match what you want.
([a-z]||\_) would match:
one lowercase ASCII letter (a through z);
an empty string (the empty alternative between the two |s); or
one underscore (escaping it with a backslash is superfluous).
To get it (or the corresponding subexpression for numbers) to match a sequence of one or more of the characters, you need to follow it with a +.
^([0-9]) would fail to match unless it was at the beginning of the string. There it would match a single digit.
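To see the empty-alternative problem concretely, here is a small sketch (the variable names are only illustrative). It runs the original alternation, minus the needless backslash, against a string of digits: the match succeeds with an empty $1. A character class followed by + only succeeds when there is at least one letter or underscore.

```perl
use strict;
use warnings;

# The original alternation, minus the needless backslash before "_".
# Its middle alternative is empty, so it can succeed without consuming anything.
my $loose_ok = ("11052000" =~ /([a-z]||_)/) ? 1 : 0;
my $loose    = $1;    # "" - the empty alternative matched at position 0

# A character class followed by + needs at least one letter or underscore.
my $strict_ok = ("cow_and_owner_ 01011999" =~ /^([a-z_]+)/) ? 1 : 0;
my $strict    = $1;   # "cow_and_owner_"

print "loose:  matched=$loose_ok, \$1='$loose'\n";
print "strict: matched=$strict_ok, \$1='$strict'\n";
```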
My solution (taking into account the later comments by the OP about having input such as cat[1] or dog3):
use strict;
use warnings;
my $bday = "cat 11052000 cow_and_owner_ 01011999 12031981 dog 22032011 cat[1] 01012018 dog3 02012018";
# capture groups:
# $1------------------------\ $2-------------\
$bday =~ s/([A-Za-z][A-Za-z0-9_\[\]]*)\h+(\d+(?:\h+\d+)*)(?!\S)\s*/$1:$2\n/g;
print $bday;
will print out:
cat:11052000
cow_and_owner_:01011999 12031981
dog:22032011
cat[1]:01012018
dog3:02012018
Breakdown:
[A-Za-z]: Begin with a letter.
[A-Za-z0-9_\[\]]*: Follow with zero or more letters, numbers, underscores and square brackets.
\h+: Separate with one or more horizontal whitespace.
\d+(?:\h+\d+)*: One sequence of digits (\d+) followed by zero or more sequences of horizontal whitespace and digits.
(?!\S): Can't be followed by non-whitespace.
\s*: Consume the following whitespace, including line feeds. This allows the input to be split across multiple lines, as long as a single entry is not spread over several lines; to allow that too, replace all the \h+ with \s+.
The substitution repeats (the /g modifier) along the source string as long as it matches, placing each header-date record on its own line and then continuing with the rest of the string.
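If the one-line pattern is hard to follow, the same substitution can be spread out with the /x modifier so that each part carries its own comment. This is a sketch of an equivalent formatting, not a different technique. Note that the comments deliberately avoid $1/$2, because variables are interpolated into the pattern even inside /x comments.

```perl
use strict;
use warnings;

my $bday = "cat 11052000 cow_and_owner_ 01011999 12031981 dog 22032011 cat[1] 01012018 dog3 02012018";

# Same pattern as the one-liner, laid out with /x (unescaped whitespace is ignored).
$bday =~ s/
    ( [A-Za-z] [A-Za-z0-9_\[\]]* )   # capture 1: header, starting with a letter
    \h+                               # horizontal whitespace separator
    ( \d+ (?: \h+ \d+ )* )            # capture 2: digit runs separated by spaces
    (?!\S)                            # must not be followed by non-whitespace
    \s*                               # swallow following whitespace, incl. newlines
/$1:$2\n/gx;

print $bday;
```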
Note that if your headers (dog etc.) might contain non-ASCII letters, use \pL or \p{XPosixAlpha} instead of [A-Za-z]:
$bday =~ s/(\pL[\pL0-9_\[\]]*)\h+(\d+(?:\h+\d+)*)(?!\S)\s*/$1:$2\n/g;
I have this code in Perl: $str =~ s/([^\w ])/'%'.unpack('H2', $1)/eg; I don't understand what value will be stored in $str.
Assuming $str is encoded using UTF-8, and assuming the code you provided is followed by $str =~ s/ /+/g, the result is a URL-encoded string that is safe to use in URLs.
Specifically, the line of code in question replaces every non-word character except the space with a three-character sequence: % followed by two hex digits representing the character's byte value.
For example,
foo's ⇒ foo%27s
20% ⇒ 20%25
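The examples above can be checked with a short script. This is a sketch: percent_encode is a hypothetical helper name that simply wraps the substitution from the question, and the final s/ /+/g step is the follow-up substitution assumed in the answer.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the substitution from the question.
sub percent_encode {
    my ($str) = @_;
    # Replace every character that is neither a word character nor a space
    # with '%' followed by the two hex digits of its (first) byte.
    $str =~ s/([^\w ])/'%'.unpack('H2', $1)/eg;
    return $str;
}

print percent_encode("foo's"), "\n";   # foo%27s
print percent_encode("20%"), "\n";     # 20%25

# The assumed follow-up step turns the remaining spaces into '+'.
my $q = percent_encode("fish & chips");
$q =~ s/ /+/g;
print "$q\n";                          # fish+%26+chips
```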
A better solution would be to use uri_escape (for strings encoded using UTF-8) or uri_escape_utf8 (for strings of Unicode Code Points aka decoded strings) from URI::Escape.
The provided line of code modifies $str according to the substitution s/([^\w ])/'%'.unpack('H2', $1)/eg.
How it works:
1. [^\w ] - look in $str for a character that is neither \w nor a space ([^...] is a negated character class)
2. \w - represents [A-Za-z0-9_] plus, under Unicode rules, other alphabetic characters, digits, connector punctuation and marks; see perlre
3. ([^\w ]) - capture the found character, storing it in $1
4. the regex modifier e evaluates '%'.unpack('H2',$1) as Perl code to produce the substitution string
5. unpack('H2',$1) - unpack $1 with the template 'H2' (the hex representation of the first byte of $1)
6. take '%' and concatenate it with the unpacked result
7. use the result from step 6 as the replacement string
8. the regex modifier g applies this operation to every occurrence in $str
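To illustrate the unpack part in isolation, the sketch below shows what the 'H2' template returns for a few sample characters. The inputs are arbitrary, and it also shows that only the first byte is used when the string is longer.

```perl
use strict;
use warnings;

# 'H2' = two hex digits, high nybble first, taken from the first byte only.
my $comma = unpack('H2', ',');     # "2c"
my $quote = unpack('H2', "'");     # "27"
my $first = unpack('H2', "ab");    # "61" - only the first byte is used

# Concatenating '%' gives the escape that the substitution inserts.
my $escaped = '%' . $comma;        # "%2c"

print "$comma $quote $first $escaped\n";
```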
Without knowing the initial value of $str, it is impossible to predict the final result.
If the initial value is known, you can try the matching part of the pattern at https://regex101.com/.
Nothing speaks louder than sample code demonstrating the transformation:
use feature 'say';
my $msg = "Date: Mar 6 2020, Msg: soon Alex's birthday";
$msg =~ s/([^\w ])/'%'.unpack('H2', $1)/eg;
say $msg;
Output
Date%3a Mar 6 2020%2c Msg%3a soon Alex%27s birthday
The following code shows what "Hello World!\n" looks like in hex (for Dada).
use feature 'say';
my $msg = "Hello World!\n";
print $msg;
my $hex = unpack('H*', $msg);   # 'H*' = hex digits for every byte
say $hex;
Output
Hello World!
48656c6c6f20576f726c64210a
You could start by trying it out and seeing if that gives you a hint.
$ perl -E'$str = "&*("; $str =~ s/([^\w ])/"%".unpack("H2", $1)/eg; say $str'
%26%2a%28
So, we have a substitution operator that looks like this:
s/PATTERN/REPLACEMENT/OPTIONS
Our pattern is ([^\w ]), which means "match every individual character that isn't a word character or a space, and capture that character in $1".
The replacement string is "%".unpack('H2', $1), which means "the character % followed by the result of running unpack('H2', $1)". unpack() is used here to convert each character to the hexadecimal value of its byte (its ASCII code, for ASCII characters): "H" means "convert to hex" and "2" means "produce two hex digits".
The options are /e which means "run this code and use the output as the replacement string" and /g which means "do this for every match in the input string".
Putting that all together, you have code that:
Looks for non-word characters
Converts them to their hexadecimal escape code
Replaces them in the string
Using URI::Escape is probably a better approach.
I am trying to add a newline in a variable after a certain number of words. For example, if we have a variable:
$x = "This a variable, start a new line here, This is a new line.";
If I print the above variable
print $x;
I should get the below output:
This is a variable,
start a new line here,
This is a new line.
How can I achieve this in Perl from the variable itself?
I don't agree with the framing "after a certain number of words".
Note that the first target line has 4 words, whereas the remaining 2 have 5 words each.
Actually, you need to replace each comma and the following sequence of spaces (if any) with a comma and \n.
So the intuitive way to do it is:
$x =~ s/,\s*/,\n/g;
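A minimal runnable version of this substitution, using the sentence from the expected output as sample input:

```perl
use strict;
use warnings;

my $x = "This is a variable, start a new line here, This is a new line.";

# Replace each comma plus any following whitespace with a comma and a newline.
$x =~ s/,\s*/,\n/g;

print "$x\n";
```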
The simplest way is to split the string on a comma followed by a space and then join the word groups with a comma followed by a newline.
my $x = "This a variable, start a new line here, This is a new line.";
print join(",\n", split /, /, $x) . "\n";
output
This a variable,
start a new line here,
This is a new line.
For solving the general problem "how do I reformat this string with line breaks after n columns?", use the Text::Wrap module (as suggested by @ikegami):
use Text::Wrap;
my $x = "The quick brown fox jumped over the lazy dog.";
$Text::Wrap::columns = 15;
# wrap() takes a list of text fragments; here, the individual words
my @words = split /\s+/, $x;
# Initial tab, subsequent tab values set to '' (think indent amount)
print wrap('', '', @words) . "\n";
output
The quick
brown fox
jumped over
the lazy dog.
You probably want to use regular expressions. You can do this:
$x =~ s/^(\S+\s+){3}\K/\n/;
Or if this is about the commas and not the spaces:
$x =~ s/^([^,]+,+){2}\K\s*/\n/;
(in this case I also remove any space that follows the comma)
You can also configure how many words or commas you want separately, by putting the count in a variable:
my $nbwords = 7; # add a line after the 7th word
$x =~ s/^(\S+\s+){$nbwords}\K/\n/;
Now, that would keep the last space so you may want to do this:
my $nbwords = 7; # add a line after the 7th word
$nbwords--; # becomes 6 because there is another word after that we match as well
$x =~ s/^(\S+\s+){$nbwords}\S+\K\s+/\n/;
You should probably learn regular expressions, but just to explain the above:
\s is any whitespace character (space, tab, line feed, etc.)
\S (uppercase) is any character except a whitespace character
+ means one or more of whatever precedes it. So \s+ means one or more consecutive whitespace characters.
{123} means exactly 123 repetitions of whatever precedes it
{3,80} means 3 to 80 times. So + is equivalent to {1,} (one to unlimited)
\K means that whatever was matched before it is kept, not replaced; only what is matched after it is replaced.
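Both ideas can be tried on the question's string. The sketch below breaks after a configurable word count (3 here, chosen only for the example) and, separately, after the second comma. The comma variant puts \s* after \K so that the space following the comma is consumed. Each result is stored in a copy so that the two can be compared.

```perl
use strict;
use warnings;

my $x = "This a variable, start a new line here, This is a new line.";

# Break after the 3rd word: \K keeps everything matched before it,
# so only the whitespace after the last counted word is replaced.
my $nbwords = 3;
my $before  = $nbwords - 1;    # the final \S+ matches the last word itself
(my $by_words = $x) =~ s/^(\S+\s+){$before}\S+\K\s+/\n/;

# Break after the 2nd comma, dropping the whitespace that follows it.
(my $by_commas = $x) =~ s/^([^,]+,+){2}\K\s*/\n/;

print "$by_words\n---\n$by_commas\n";
```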
I found out how to split a string on whitespace, but that only takes a single character into account. In my case, I have comments pasted into a file that include newlines and whitespace. I have them separated by this string: [|]
So I need to split my $string into an array for example, where $string =
This is a comment.
This is a newline.
This is the end[|]This is second comment.
This is second newline.
[|]Last comment
It should be split into $array[0], $array[1] and $array[2], which keep the newlines and whitespace, with [|] as the separator.
Every example I find on the web uses a single character, such as space or newline, to split strings. In my case I have to use a more specific identifier, which is why I selected [|] but having troubles splitting it by this.
I have tried to limit it to parse by a single '|' character with this code:
my @words = split /|/, $string;
foreach my $thisline (@words) {
print "This line = '" . $thisline . "'\n";
}
But this seems to split the entire string, character by character, into @words.
[, |, and ] are all special characters in regular expressions -- | is used to separate options, and […] are used to specify character sets. Using an unquoted | makes the expression match the empty string (more specifically: the empty string or the empty string), causing it to match and split on every character boundary. These characters must be escaped to use them literally in an expression:
my @words = split /\[\|\]/, $string;
Since all the slashes, backslashes and bars make this visually confusing, you may prefer m{} quotes instead of //, and \Q…\E to quote a run of characters instead of a separate backslash for each one. (This is functionally identical; it's just a little easier to read.)
my @words = split m{\Q[|]\E}, $string;
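A self-contained sketch using the sample comments from the question. It first shows why an unescaped | splits between every character, then splits on the literal [|] separator with \Q…\E.

```perl
use strict;
use warnings;

my $string = "This is a comment.\nThis is a newline.\nThis is the end[|]"
           . "This is second comment.\nThis is second newline.\n[|]Last comment";

# An unescaped | is an empty alternation, so split breaks between every character.
my @chars = split /|/, "abc";    # ("a", "b", "c")

# \Q...\E quotes [, | and ] so the three-character separator is taken literally.
my @words = split m{\Q[|]\E}, $string;

for my $thisline (@words) {
    print "This line = '$thisline'\n";
}
```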
I want to replace all of the line returns (\n's) in a single string (not an entire file, just one string in the program) with spaces, and all commas in the same string with semicolons.
Here is my code:
$str =~ s/"\n"/" "/g;
$str =~ s/","/";"/g;
This will do it. You don't need to use quotations around them.
$str =~ s/\n/ /g;
$str =~ s/,/;/g;
Explanation of modifier options for the Substitution Operator (s///)
e Forces Perl to evaluate the replacement pattern as an expression.
g Replaces all occurrences of the pattern in the string.
i Ignores the case of characters in the string.
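m Treats the string as multiple lines (^ and $ also match at embedded newlines).
o Compiles the pattern only once (interpolated variables are evaluated only the first time).
s Treats the string as a single line (. also matches a newline).
x Lets you use extended regular expressions (unescaped whitespace and # comments in the pattern are ignored).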
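The sketch below exercises several of these modifiers on small sample strings that were made up for illustration. Each substitution works on a fresh copy, so the results can be compared independently.

```perl
use strict;
use warnings;

my $text = "one\ntwo\nthree";

# /e: the replacement is Perl code; its result is substituted.
(my $doubled = "a1 b22") =~ s/(\d+)/$1 * 2/eg;                       # "a2 b44"

# /i: case-insensitive matching.
(my $pets = "Cat cat CAT") =~ s/cat/dog/gi;                          # "dog dog dog"

# /m: ^ also matches right after each embedded newline.
(my $quoted = $text) =~ s/^/> /mg;                                   # "> one\n> two\n> three"

# /s: . also matches "\n", so o.+t can span lines.
(my $joined = $text) =~ s/o.+t/X/s;                                  # "Xhree"

# /x: whitespace in the pattern is ignored, so it can be spaced out.
(my $date = "2024-01-05") =~ s/ (\d+) - (\d+) - (\d+) /$3.$2.$1/x;   # "05.01.2024"

print join("\n---\n", $doubled, $pets, $quoted, $joined, $date), "\n";
```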
You don't need quotes in your search and replace. The replacement part of s/// already behaves like a double-quoted string, so any quotes you add are inserted literally; a space is written simply as a space (as in / /).
$str =~ s/\n/ /g;
$str =~ s/,/;/g;
I'd use tr:
$str =~ tr/\n,/ ;/;
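For comparison, here is a small sketch with an arbitrary sample string. It shows that the two s/// substitutions and the single tr/// produce the same result. tr/// does not use regular expressions; it maps each character in the first list to the character at the same position in the second list.

```perl
use strict;
use warnings;

my $str = "a,b\nc,d\n";

# Two substitutions.
(my $with_s = $str) =~ s/\n/ /g;
$with_s =~ s/,/;/g;

# One transliteration: "\n" -> " " and "," -> ";".
(my $with_tr = $str) =~ tr/\n,/ ;/;

print "[$with_s]\n[$with_tr]\n";
```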
If I had:
$foo= "12."bar bar bar"|three";
how would I insert the text ".." after the text "12." in the variable?
Perl allows you to choose your own quote delimiters. If you find you need to use a double quote inside an interpolating string (i.e. "") or a single quote inside a non-interpolating string (i.e. ''), you can use a quote operator to specify a different character to act as the delimiter for the string. Delimiters come in two forms: bracketed and unbracketed. Bracketed delimiters have different beginning and ending characters: [], {}, (), and <>. All other characters* are available as unbracketed delimiters.
So your example could be written as
$foo = qq(12."bar bar bar"|three);
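To show that the delimiter choice doesn't change the resulting string, this sketch builds the same value with several bracketed delimiters, the non-interpolating q, and an unbracketed delimiter (!, chosen arbitrarily). It then compares each one against the backslash-escaped form.

```perl
use strict;
use warnings;

my $expected = "12.\"bar bar bar\"|three";    # backslash-escaped quotes

my @variants = (
    qq(12."bar bar bar"|three),    # bracketed: ( )
    qq{12."bar bar bar"|three},    # bracketed: { }
    qq<12."bar bar bar"|three>,    # bracketed: < >
    q[12."bar bar bar"|three],     # q: non-interpolating, [ ]
    qq!12."bar bar bar"|three!,    # unbracketed delimiter
);

print "$_\n" for @variants;
```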
Inserting text after "12." can be done in many ways (TIMTOWTDI). A common solution is to use a substitution to match the text you want to replace.
$foo =~ s/^(12[.])/$1../;
The ^ means match at the start of the string, the () means capture this text into the variable $1, the 12 just matches the string "12", and the [] means match any one of the characters inside the brackets. The brackets are used because . has a special meaning in regexes in general, but not inside a character class (the []). An alternative to the character class is to escape the special meaning of . with \, but many people find that uglier than the character class.
$foo =~ s/^(12\.)/$1../;
Another way to insert text into a string is to assign the value to a call to substr. This highlights one of Perl's fairly unusual features: many of its functions can act as lvalues. That is, they can be assigned to like variables.
substr($foo, 3, 0) = "..";
If you did not already know where "12." exists in the string you could use index to find where it starts, length to find out how long "12." is, and then use that information with substr.
Here is a fully functional Perl script that contains the code above.
#!/usr/bin/perl
use strict;
use warnings;
my $foo = my $bar = qq(12."bar bar bar"|three);
$foo =~ s/(12[.])/$1../;
my $i = index($bar, "12.") + length "12.";
substr($bar, $i, 0) = "..";
print "foo is $foo\nbar is $bar\n";
* That is, all characters except whitespace characters (space, tab, carriage return, line feed, vertical tab, and form feed).
If you want to use double quotes in a string in Perl you have two main options:
$foo = "12.\"bar bar bar\"|three";
or:
$foo = '12."bar bar bar"|three';
The first option escapes the quotes inside the string with backslash.
The second option uses single quotes, so the double quotes are treated as part of the string. However, in single quotes almost everything is literal (only \\ and \' are special), so $var or @array isn't treated as a variable. For example:
$myvar = 123;
$mystring = '"$myvar"';
print $mystring;
> "$myvar"
But:
$myvar = 123;
$mystring = "\"$myvar\"";
print $mystring;
> "123"
There are also a large number of other Quote-like Operators you could use instead.
$foo = "12.\"bar bar bar\"|three";
$foo =~s/12\./12\.\.\./;
print $foo; # prints 12..."bar bar bar"|three