How to get rid of control characters in Perl, specifically [GS]?

my code is as follows
my $string = $cells[71];
print $string;
This prints the string, but where spaces should be there is a box with 01 10 in it. I opened it in Notepad++ and the box turned into a black GS (which I am assuming is group separator).
I looked online and it said to use:
s/[^[:print:]]+//g
but when i set the string to:
my $string =~s/[^[:print:]]+//g
and I run the program i get:
4294967295
How do I resolve this?
I did what HOBBS said and it worked... thanks :)
Is there any way I could print a newline where each of these characters is (the box with 1001)?

When doing a regex match, you need to be careful to write $var =~ /pattern/, not $var = ~ /pattern/. When you use the second one, you're doing /pattern/, which is a regex match against $_, returning a number in scalar context. Then you do ~, which takes the bitwise inverse of that number, then ($var =) you assign that result to $var. Not what you wanted at all.

You have to assign the variable first, then do the substitution:
my $string = $cells[71];
$string =~ s/[^[:print:]]+//g;
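For the follow-up about printing a line break where each box appears, here is a minimal sketch. It assumes the character really is the ASCII group separator (0x1D) that Notepad++ reported; the sample string and the name $with_newlines are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample data: three words joined by GS (0x1D) characters.
my $string = "alpha\x1Dbeta\x1Dgamma";

# Copy the string, then turn every group separator into a newline
# instead of deleting it.
(my $with_newlines = $string) =~ s/\x1D/\n/g;

print "$with_newlines\n";
```

If the separator turns out to be some other control character, only the \x1D in the pattern needs to change.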

Related

How do you match \'

I need a regex to match \' <---- literally backslash apostrophe.
my $line = '\'this';
$line =~ s/(\o{134})(\o{047})/\\\\'/g;
$line =~ s/\\'/\\\\'/g;
$line =~ s/[\\][']/\\\\'/g;
printf('%s',$line);
print "\n";
All I get out of this is
'this
When what I want is
\\'this
This occurs whether the string is declared using ' or ". This was a test script for tracking down a file parsing bug. I wanted to confirm that the regex was working as expected.
I don't know if when the backslash apostrophe is parsed by the regex it is not treated as 2 characters, but is instead treated as an escaped apostrophe.
Either way, what is the best way to match \' and print out \\'? I don't want to escape any other backslashes or apostrophes, and I can't change the text I am parsing, just the way it is handled and output.
s/\\'/\\\\'/g
All three of your patterns match a backslash followed by a quote, the above being the simplest.
Your testing was in vain because your string doesn't contain any backslashes. Both string literals "\'this" (from earlier edit) and '\'this' (from later edit) produce the string 'this.
say "\'this"; # 'this
say '\'this'; # 'this
To produce the string \'this, you could use either of the following string literals (among others):
"\\'this"
'\\\'this'
say "\\'this"; # \'this
say '\\\'this'; # \'this
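To check the substitution against a string that really does contain a backslash, something like the following sketch could be used. The variable name $escaped is only for illustration.

```perl
use strict;
use warnings;

# Double-quoted literal: "\\" is one backslash, so this string is \'this
my $line = "\\'this";

# The pattern \\' matches a backslash followed by an apostrophe;
# the replacement \\\\' produces two backslashes and an apostrophe.
(my $escaped = $line) =~ s/\\'/\\\\'/g;

print "$line\n";     # \'this
print "$escaped\n";  # \\'this
```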
The answer is, of course
s/[\\][']/\\\\'/g
This will match
\'this
And substitute with this
\\'this
This was the only way I could get it to work.
Too much "regexing" in your snippet. Try:
my $line = '\'this';
$line =~ s/'/\\\\\'/g;
printf('%s',$line);
print "\n";
# \\'this
or... if you want another mode:
my $line = '\'this';
$line =~ s/'/\\'/g;
printf('%s',$line);
print "\n";
# \'this

How do I convert an escaped t into a tab character

I have a variable that contains a backslash and a t.
my $var = "\\t";
I want to convert that to a tab. How do I do that?
use Data::Dumper;
use Term::ReadLine;
my $rl = Term::ReadLine->new();
my $var = $rl->readline( 'Enter \t:' );
print Dumper $var;
The following is the simplest solution:
$var = "\t" if $var eq "\\t";
If you want to do this no matter where the sequence appears in the string, you could use
$var =~ s/\\t/\t/g;
But it sounds like you're not asking the right question. Nothing supports \t and nothing else. At the very least, I would also expect \\ to produce \. Are you perhaps trying to parse JSON? If so, there are a number of other escape sequences you need to worry about.
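If more than one escape sequence has to be handled, a lookup table and a single left-to-right pass is one way to do it. This is only a sketch: the %unescape table and the set of escapes it covers (\t, \n and \\) are assumptions, not a complete parser for JSON or any other format.

```perl
use strict;
use warnings;

# Hypothetical lookup table: escape letter => replacement character.
my %unescape = (
    't'  => "\t",
    'n'  => "\n",
    '\\' => "\\",
);

# Single-quoted literal, so this string is the seven characters a\tb\\t
my $input = 'a\tb\\\\t';

# One pass: each backslash consumes exactly the character after it,
# so the doubled backslash becomes one backslash and the final t stays a t.
(my $output = $input) =~ s/\\([tn\\])/$unescape{$1}/g;

print "$output\n";
```

Doing this in one pass matters: running s/\\t/\t/g on its own would also turn the "\t" at the end of "\\t" into a tab.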

Finding index of white space in Perl

I'm trying to find the index of white space in a string in Perl.
For example, if I have the string
stuff/more stuffhere
I'd like to select the word "more" with a substring method. I can find the index of "/" but haven't figured out how to find the index of white space. The length of the substring I'm trying to select will vary, so I can't hard code the index. There will only be one white space in the string (other than those after the end of the string).
Also, if anybody has any better ideas of how to do this, I'd appreciate hearing them. I'm fairly new to programming so I'm open to advice. Thanks.
Just use index:
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };
my $string = 'stuff/more stuffhere';
my $index_of_slash = index $string, '/';
my $index_of_space = index $string, ' ';
say "Between $index_of_slash and $index_of_space.";
The output is
Between 5 and 10.
Which is correct:
0         1
01234567890123456789
stuff/more stuffhere
If by "whitespace" you also mean tabs or whatever, you can use a regular expression match and the special variables @- and @+:
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };
my $string = "stuff/more\tstuffhere";
if ($string =~ m{/.*(?=\s)}) {
say "Between $-[0] and $+[0]";
}
The (?=\s) means "is followed by a whitespace character", but the character itself is not part of the match, so you don't need to do any maths on the returned values.
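If all you need is the offset of the first whitespace character of any kind, matching /\s/ and reading $-[0] also works. A small sketch, with the variable names chosen only for illustration:

```perl
use strict;
use warnings;

my $string = "stuff/more\tstuffhere";

my $slash = index $string, '/';    # offset of the slash
my ($space, $word);

if ($string =~ /\s/) {
    $space = $-[0];                # offset where the whitespace match starts
    # Take everything strictly between the slash and the whitespace.
    $word = substr $string, $slash + 1, $space - $slash - 1;
    print "Whitespace at $space, word is '$word'\n";
}
```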
As you stated, you want to select the word between the first / and the first space following it. If that is the case, you may not need any index at all (you just need the word). A regex is a good tool for finding something in text.
Look at the following code:
$txt = 'stuff/more stuffxx here';
if ($txt =~ /\/(.+?) /) {
print "Match: $1.\n";
}
The regex used tries to match: a slash; a non-empty sequence of any chars, enclosed in a capturing group (note the ?, which selects the reluctant version); and a space.
So after the match $1 contains what was captured by the first
capturing group, i.e. "your" word.
But if for any reason you were interested in starting and ending
offsets to this word, you can read them from $-[1]
and $+[1] (starting / ending indices of the first capturing group).
The arrays @- (@LAST_MATCH_START) and @+ (@LAST_MATCH_END) give the offsets of the start and end of the last successful submatches. See Regex related variables in perlvar.
You can capture your real target, and then read off the offset right after the match with $+[0]
@+
This array holds the offsets of the ends of the last successful submatches in the currently active dynamic scope. $+[0] is the offset into the string of the end of the entire match. This is the same value as what the pos function returns when called on the variable that was matched against.
Example
my $str = 'target and target with spaces';
while ($str =~ /(target)\s/g)
{
say "Position after match: $+[0]"
}
prints
Position after match: 7
Position after match: 18
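These are the offsets right after each match. Since the match includes the \s, each one points just past the space that follows 'target'; the space itself sits at $+[1], the end of the capture group (6 and 17 here).
Or you can capture \s instead and read the offset of the space from $-[1].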
You can use
my $str = "stuff/more stuffhere";
if ($str =~ m{/\K\S+}) {
... substr($str, $-[0], $+[0] - $-[0]) ...
}
But why substr? That's very weird there. Maybe if you told us what you actually wanted to do, we could provide better alternatives. Here are three cases:
Data extraction:
my $str = "stuff/more stuffhere";
if ( my ($word) = $str =~ m{/(\S+)} ) {
say $word; # more
}
Data replacement:
my $str = "stuff/more stuffhere";
$str =~ s{/\K\S+}{REPLACED};
say $str; # stuff/REPLACED stuffhere
Data replacement (dynamic):
my $str = "stuff/more stuffhere";
$str =~ s{/\K(\S+)}{ uc($1) }e;
say $str; # stuff/MORE stuffhere

conditional substitution using hashes

I'm trying for substitution in which a condition will allow or disallow substitution.
I have a string
$string = "There is <tag1>you can do for it. that dosen't mean <tag2>you are fool.There <tag3>you got it.";
Here are two hashes which are used to check condition.
my %tag = ('tag1' => '<you>', 'tag2'=>'<do>', 'tag3'=>'<no>');
my %data = ('you'=>'<you>');
Here is actual substitution in which substitution is allowed for hash tag values not matched.
$string =~ s{(?<=<(.*?)>)(you)}{
if($tag{"$1"} eq $data{"$2"}){next;}
"I"
}sixe;
In this code I want to substitute 'you' with something, on the condition that its value in %data is not equal to the hash value given in %tag.
Can I use next in a substitution?
The problem is that I can't use the /g modifier, and after using next I can't go on to the next substitution.
Also, I can't modify the expression while matching, and with next it doesn't go on to the second match; it stops there.
You can't use a variable length look behind assertion. The only one that is allowed is the special \K marker.
With that in mind, one way to perform this test is the following:
use strict;
use warnings;
while (my $string = <DATA>) {
$string =~ s{<([^>]*)>\K(?!\1)\w+}{I}s;
print $string;
}
__DATA__
There is <you>you can do for it. that dosen't mean <notyou>you are fool.
There is <you>you can do for it. that dosen't mean <do>you are fool.There <no>you got it.
Output:
There is <you>you can do for it. that dosen't mean <notyou>I are fool.
There is <you>you can do for it. that dosen't mean <do>I are fool.There <no>you got it.
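As for next inside the replacement: it is a loop control statement, so it does not move on to the following match. One alternative is to let the /e code return the matched text unchanged whenever the condition says "skip". The sketch below reuses the string and hashes from the question; the comparison rule (replace unless the two hash values are equal) is my reading of what was asked, and it needs Perl 5.10 or later for the // operator.

```perl
use strict;
use warnings;

my $string = "There is <tag1>you can do for it. that dosen't mean <tag2>you are fool.There <tag3>you got it.";
my %tag  = (tag1 => '<you>', tag2 => '<do>', tag3 => '<no>');
my %data = (you => '<you>');

# With /e the replacement is Perl code; its value becomes the new text.
# Returning $2 unchanged plays the role of "skip this match", so /g can
# carry on with the remaining matches.
$string =~ s{<([^>]*)>\K(you)}{
    ($tag{$1} // '') eq ($data{$2} // '') ? $2 : 'I'
}ge;

print "$string\n";
```

Here only the "you" after <tag1> is kept, because $tag{tag1} and $data{you} are both '<you>'.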
It was simple, but it took me two days to think of it. I just wrote another substitution that ignores the previous tag, which was skipped by next;
$string = "There is <tag1>you can do for it. that dosen't mean <tag2>you are fool.There <tag3>you got it.";
my %tag = ('tag1' => '<you>', 'tag2'=>'<do>', 'tag3'=>'<no>');
my %data = ('you'=>'<you>');
my $notag;
$string =~ s{(?<=<(.*?)>)(you)}{
$notag = $2;
if($tag{"$1"} eq $data{"$2"}){next;}
"I"
}sie;
$string =~ s{(?<=<(.*?)>)(?<!$notag)(you)}{
"I"
}sie;

How to extract a number from a string in Perl?

I have
print $str;
abcd*%1234$sdfsd..#d
The string will always have only one continuous stretch of digits, like 1234 in this case. Everything else will be either letters or other special characters.
How can I extract the number (1234 in this case) and store it back in str?
This page suggests that I should use \d, but how?
If you don't want to modify the original string, you can extract the numbers by capturing them in the regex, using subpatterns. In list context, a regular expression returns the matches defined in the subpatterns.
my $str = 'abc 123 x456xy 789foo';
my ($first_num) = $str =~ /(\d+)/; # 123
my @all_nums = $str =~ /(\d+)/g; # (123, 456, 789)
$str =~ s/\D//g;
This removes all nondigit characters from the string. That's all that you need to do.
EDIT: if Unicode digits in other scripts may be present, a better solution is:
$str =~ s/[^0-9]//g;
If you wanted to do it the destructive way, this is the fastest way to do it.
$str =~ tr/0-9//cd;
This translates every character in the complement of 0-9 to nothing, i.e. deletes it.
The one caveat to this approach, and Phillip Potter's, is that were there another group of digits further down the string, they would be concatenated with the first group of digits. So it's not clear that you would want to do this.
The surefire way to get one and only one group of digits is
( $str ) = $str =~ /(\d+)/;
The match, in a list context returns a list of captures. The parens around $str are simply to put the expression in a list context and assign the first capture to $str.
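A slightly more defensive sketch of the same idea, using the string from the question. The temporary variable $num is only for illustration; it keeps $str from being set to undef when the string contains no digits at all.

```perl
use strict;
use warnings;

my $str = 'abcd*%1234$sdfsd..#d';

# In list context a failed match returns an empty list, so the
# assignment counts zero elements and the condition is false.
if (my ($num) = $str =~ /(\d+)/) {
    $str = $num;
}
else {
    warn "no digits found\n";
}

print "$str\n";
```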
Personally, I would do it like this:
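print $1 if $s =~ /([0-9]+)/;
$1 will contain the first group matched by the given regular expression (the part in round brackets). Only read $1 after a successful match; otherwise it still holds the value from an earlier match.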