How to extract a substring from a string with special characters using Perl

I need to extract the part of a string that starts after a given start word and ends before a given end word, and store it in a string variable.
Example: pre-wrap">test-for??maths/camp
I need to fetch the subset.
Expected output: test-for??maths
After: pre-wrap"> (or possibly starting at: test)
and before: /camp
I have no clue how to achieve this in Perl.
Here is the code I tried. The output is not what I expected:
#!/usr/bin/perl
use warnings;
use strict;
my $string = 'pre-wrap">test-for??maths/camp';
my $quoted_substring = quotemeta($string);
my ($quoted_substring1) = split('/camp*', $quoted_substring);
my (undef, $substring2) = split('>\s*', $quoted_substring1);
print $string, "\n";
print $substring2, "\n";
Output:
$ perl test.pl
pre-wrap">test-for??maths/camp
test\-for\?\?maths\ # but why this \ is coming

The following code extracts the part between $before and $after (which may contain regex metacharacters, they are treated as pure characters inside the \Q...\E expressions):
my $string = 'pre-wrap">test-for??maths/camp';
my $before = 'pre-wrap">';
my $after = '/camp';
if ($string =~ /\Q$before\E(.*?)\Q$after\E/) {
print $1; # prints 'test-for??maths'
}
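The stray backslash in the question's output comes from applying quotemeta to the data rather than to the delimiters: every non-word character in the string, including the / before camp, gets a backslash, and the later split leaves that one behind. Below is a minimal sketch of the same approach as the answer above, wrapped in a helper. The name extract_between and its parameters are hypothetical, chosen only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original posts): returns the text between
# the first occurrence of $before and the nearest following $after,
# or undef if the delimiters are not found.
sub extract_between {
    my ($string, $before, $after) = @_;
    # \Q...\E escapes only the delimiters; the data itself is left untouched.
    return $string =~ /\Q$before\E(.*?)\Q$after\E/s ? $1 : undef;
}

my $string = 'pre-wrap">test-for??maths/camp';
my $subset = extract_between($string, 'pre-wrap">', '/camp');
print defined $subset ? "$subset\n" : "no match\n";
```

Returning undef on failure lets the caller distinguish "no match" from an empty match between adjacent delimiters.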

Assuming the line pre-wrap">test-for??maths/camp is stored in a file named d:
perl -ne '/((?<=pre-wrap">)|(?<=>)(?=test))\S+(?=\/camp)/ ; print $&' d

Related

Split, insert and join

Here's what I want to achieve: split a comma-separated one-liner, append @domain.com to each item, then join it back into a comma-separated string.
The one-liner contains something like:
username1,username2,username3
and I want it to become:
username1@domain.com,username2@domain.com,username3@domain.com
Here is the Perl script I tried, which doesn't work properly:
my $var ='username1,username2,username3';
my @tkens = split /,/, $var;
my @user;
foreach my $tken (@tkens) {
    push (@user, "$tken\@domain.com");
}
my $to = join(',',@user);
Is there a shortcut for this in Perl? Please post a sample. Thanks
Split, transform, stitch:
my $var ='username1,username2,username3';
print join ",", map { "$_\@domain.com" } split(",", $var);
# ==> username1@domain.com,username2@domain.com,username3@domain.com
You could also use a regular expression substitution:
#!/usr/bin/perl
use strict;
use warnings;
my $var = "username1,username2,username3";
# Replace every comma (and the end of the string) with @domain.com followed by a comma
$var =~ s/$|,/\@domain.com,/g;
# Remove extra comma after last item
chop $var;
print "$var\n";
You already have good answers, so I will just explain why your script seems not to work. I don't see any print or say statement in your code, so nothing is ever output. The last line of your program is not needed either. You can simply append @domain.com to each value, push it onto an array, and print the array with join.
#!/usr/bin/perl
use strict;
use warnings;
my $var = 'username1,username2,username3';
my @tkens = split ',', $var;
my @user;
foreach my $tken (@tkens)
{
    push @user, $tken."\@domain.com"; # `.` after `$tken` for concatenation
}
print join(',', @user), "\n";
Output:
username1@domain.com,username2@domain.com,username3@domain.com
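All of the answers above depend on the same detail: inside a double-quoted string, @domain would be interpolated as an array, so the @ needs a backslash or single quotes. Here is a minimal sketch of the split/map/join approach wrapped in a sub. The name append_domain and its two parameters are hypothetical, not something from the original posts.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: appends "@<domain>" to every item of a
# comma-separated list and returns the re-joined string.
sub append_domain {
    my ($list, $domain) = @_;
    # '@' is single-quoted, so no array interpolation is attempted.
    return join ',', map { $_ . '@' . $domain } split /,/, $list;
}

print append_domain('username1,username2,username3', 'domain.com'), "\n";
```

Passing the domain as a parameter keeps the @ out of any double-quoted string, which avoids the escaping question altogether.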

Removing the $ in first part of the data

I have a string
my $string = $14.275; ## where i need to remove the $
I have tried by using the below code
$y = substr($string , 1, index($string));
The output should be 14.275
First, put the string value in single quotes so that $14 is not treated as a variable, then remove the leading $ with a substitution:
#!/usr/bin/perl
use warnings;
use strict;
my $string = '$14.275';
$string =~ s/^\$//;
print "$string\n";
Output:
14.275
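Since the question attempted this with substr and index, here is a minimal sketch of a working version along those lines. The helper name strip_leading_dollar is hypothetical. The input must be single-quoted, because in a double-quoted string $14 would be interpolated as a variable.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: removes one leading dollar sign, if present.
sub strip_leading_dollar {
    my ($amount) = @_;
    # index() returns 0 only when the string starts with '$';
    # substr($amount, 1) then returns everything after the first character.
    return index($amount, '$') == 0 ? substr($amount, 1) : $amount;
}

print strip_leading_dollar('$14.275'), "\n";
```

Strings without a leading $ are returned unchanged, which matches the behavior of s/^\$//.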

Matching in Perl

I am trying to get text in between two dots of a line, but my program returns the entire line.
For example: I have text which looks like:
My sampledata 1,2 for perl .version 1_1.
I used the following match statement
$x =~ m/(\.)(.*)(\.)/;
My output for $x should be version 1_1, but I am getting the entire line as my match.
In your code, the value of $x will not change after the match.
When $x is successfully matched with m/(\.)(.*)(\.)/, your three capture groups will contain '.', 'version 1_1' and '.' respectively (in the order given). $2 will give you 'version 1_1'.
Since you probably only want the part 'version 1_1', you need not capture the two dots. This code will give you the same result:
$x =~ m/\.(.*)\./;
print $1;
Try this:
my $str = "My sampledata 1,2 for perl .version 1_1.";
$str =~ /\.\K[^.]+(?=\.)/;
print $&;
The period must be escaped outside a character class.
\K discards everything matched so far from the reported match (you can replace it with a lookbehind (?<=\.))
[^.] means any character except a period.
For several results, you can do this:
my $str = "qwerty .target 1.target 2.target 3.";
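my @matches = ($str =~ /\.\K[^.]+(?=\.)/g);
print join("\n", @matches);
If you don't want a period to be used twice (as the closing delimiter of one match and the opening delimiter of the next), consume the closing period as well. With this input, that returns only 'target 1' and 'target 3':
my $str = "qwerty .target 1.target 2.target 3.";
my @matches = ($str =~ /\.([^.]+)\./g);
print join("\n", @matches)."\n";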
It should be simple enough to do something like this:
#!/usr/bin/perl
use warnings;
use strict;
my @tests = (
    "test one. get some stuff. extra",
    "stuff with only one dot.",
    "another test line.capture this. whatever",
    "last test . some data you want.",
    "stuff with only no dots",
);
for my $test (@tests) {
    # For this example, I skip $test if the match fails;
    # otherwise, I move on to do stuff with $want
    next if $test !~ /\.(.*)\./;
    my $want = $1;
    print "got: $want\n";
}
Output
$ ./test.pl
got:  get some stuff
got: capture this
got:  some data you want

How to decode &deg; to the degree character in Perl

I have tried using :
my $nomIHMBloc = $1;
print decode_entities($nomIHMBloc), "\n";
$nomIHMBloc = decode_entities($nomIHMBloc), "\n";
but no luck. Is there any thing wrong? I got error:
Undefined subroutine &main::decode_entities called at "same perl file"
Thanks for your help.
PS:
exact code goes here:
while($blocVars =~ m/\[(.*?)\]/g){
binmode STDOUT, ':utf8';
my $nomIHMBloc = $1;
print decode_entities($nomIHMBloc), "\n";
$nomIHMBloc = decode_entities($nomIHMBloc);
print "nomIHMBloc::::::::$nomIHMBloc=============$1\n";
print "insert into ASSOC_VAR_BLOC (ID_BLOC, ID_VAR, DOC_ID_MAQUETTAGE) VALUES ($id_bloc, (SELECT ID_VAR FROM VARIABLE WHERE NOM_IHM='$nomIHMBloc'),'$docId')\n";
}
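Works fine here once HTML::Entities is loaded. The "Undefined subroutine &main::decode_entities" error means the module that exports decode_entities was never imported with use HTML::Entities: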
#!/usr/bin/env perl
use strict;
use warnings;
use open ':locale';
use HTML::Entities;
# example text
'42&deg;' =~ /(.*)/; # 42&deg;
# your code
my $nomIHMBloc = $1;
print decode_entities($nomIHMBloc), "\n";
#$nomIHMBloc = decode_entities($nomIHMBloc), "\n";
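Your last line does not do what you intended: the trailing , "\n" is evaluated by the comma operator and then discarded, so no newline is appended. If you want to append a newline while assigning to a scalar, use the string concatenation operator (.):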
$nomIHMBloc = decode_entities($nomIHMBloc) . "\n";
It works on print because it's a list operator and takes a list of arguments, then joins them with the output field separator $, (see perlvar), which contains the empty string by default and acts like a simple string concatenation. However, output is
42°

How to loop over each word in an argument for perl

I'd like to have a perl program that I can call with something like:
perl myProgram --input="This is a sentence"
And then have perl print the output to terminal, in this format
word1 = This
word2 = is
word3 = a
word4 = sentence
I'm usually a c/c++/java programmer, but I've been looking at perl recently, and I just can't fathom it.
Use Getopt::Long and split.
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Long;
my $input = '';
GetOptions( 'input=s' => \$input );
my $count = 0;
for (split ' ', $input) {
printf("word%d = %s\n", ++$count, $_);
}
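Splitting on a literal single space, as in split / /, doesn't handle excess leading, trailing, and embedded spaces (the special form split ' ' shown above does skip them). Another option is a repeated match over non-space characters, m{\S+}gso; the /s and /o modifiers have no effect on this particular pattern and can be dropped.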
The first command-line parameter is $ARGV[0]. Putting that together we have:
#! /usr/bin/perl
use strict;
use warnings;
my @words = $ARGV[0] =~ m{\S+}gso;
for (my $i = 0; $i < @words; $i++) {
    print "word", $i + 1, " = ", $words[$i], "\n";
}
(I've iterated over the array using an index only because the question was originally framed in terms of emitting a rising value with each line. Ordinarily we would want to just use for or foreach to iterate over a list directly.)
Calling as:
perl test.pl ' This is a sentence '
prints:
word1 = This
word2 = is
word3 = a
word4 = sentence
If you explicitly want to pick up input on a double-dash long option name, then use Getopt::Long as described in the answer above.
Please have a look at the documentation for split (perldoc -f split).
foreach my $word (split (/ /, 'This is a sentence'))
{
print "word is $word\n";
}
Edit: Added parentheses around the split call.
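The answers above differ mainly in how they treat extra whitespace. The sketch below uses the special split ' ' form to produce the numbered output the question asks for, then runs split / / on the same input for comparison. The helper name number_words and the sample sentence are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns one "wordN = X" line per
# whitespace-separated word in $input.
sub number_words {
    my ($input) = @_;
    my $count = 0;
    # split ' ' (a one-space string, not a regex) skips leading
    # whitespace and splits on runs of whitespace.
    return map { sprintf 'word%d = %s', ++$count, $_ } split ' ', $input;
}

my $sentence = ' This  is a   sentence ';
print "$_\n" for number_words($sentence);

# For comparison: a literal one-space regex produces empty fields
# for the leading space and for each repeated space.
my @naive = split / /, $sentence;
print scalar(@naive), " fields with split / /\n";
```

In a real script the sentence would come from Getopt::Long or $ARGV[0], as in the answers above, rather than from a hard-coded string.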