How do I insert a lot of whitespace in Perl?

How do I insert a lot of whitespace in Perl? - perl

I need to buff out a line of text with a varying but large number of whitespace. I can figure out a janky way of doing a loop and adding whitespace to $foo, then splicing that into the text, but it is not an elegant solution.

I need a little more info. Are you just appending to some text or do you need to insert it?
Either way, one easy way to get repetition is perl's 'x' operator, eg.
" " x 20000
will give you 20K spaces.
If have an existing string ($s say) and you want to pad it out to 20K, try
$s .= (" " x (20000 - length($s)))
BTW, Perl has an extensive set of operators - well worth studying if you're serious about the language.
UPDATE: The question as originally asked (it has since been edited) asked about 20K spaces, not a "lot of whitespace", hence the 20K in my answer.

If you always want the string to be a certain length you can use sprintf:
For example, to pad out $var with white space so it 20,000 characters long use:
$var = sprintf("%-20000s",$var);

use the 'x' operator:
print ' ' x 20000;

Related

Generate all combinations from list of characters

I am busy implementing a lab for pen testers to create MD5 hashes from 4 letter words. I need the words to have a combination of lower and uppercase letters as well as numeric and special characters, but I just do not seem to find out how to combine any given characters in all orders. So currently I have this:
my $str = 'aaaa';
print $str++, $/ while $str le 'dddd';
Which will do:
aaaa
aaab
aaac
aaad
...
...
dddd
There is no way however how I can make it do:
Aaaa
AAaa
aAaa
...
dddD
Not even to mention adding numbers and special characters. What I really wanted to do was to make the characters to create words based on a given list. So if I feel I want to use abeDod## it should create all combinations from those characters.
Edit to clarify.
Let's say I give the characters aBc# I need it to give it a a count to say it must have maximum of 4 letters per word and with combination of all the given characters, like:
aBc#
Bac#
caB#
#Bca
...
I hope that clarifies the question.

Use a list of integers that are ASCII codes for the characters you accept, to sample from it using your favorite (pseudo-)random number generator. Then convert each to its character using chr and concatenate them.
Like
perl -wE'$rw .= chr( 32+(int rand 126-32) ) for 1..4; say $rw'
Notes
I use a one-liner merely for easy copy-paste testing. Write this nicely in a script, please
I use the sketchy rand, good for shuffling things a bit. Replace with a better one if needed
Glueing four (pseudo-)random numbers does not build a good distribution; even as each letter on its own does, the whole thing does not. But the four should satisfy most needs.
If not, I think that you'd need to produce a far longer list (range of allowed chars repeated four times perhaps) and randomize it, then draw four-letter subsequences. A lot more work
I need to tap dance a little to produce (random-ish) integers from 32 to 126 using rand, since it takes only the end of range. Also, this takes all of them from that range, likely not what you want; so specify subranges, or specific lists that you want to draw from

Why are ##, #!, #, etc. not interpolated in strings?

First, please note that I ask this question out of curiosity, and I'm aware that using variable names like ## is probably not a good idea.
When using doubles quotes (or qq operator), scalars and arrays are interpolated :
$v = 5;
say "$v"; # prints: 5
$# = 6;
say "$#"; # prints: 6
#a = (1,2);
say "#a"; # prints: 1 2
Yet, with array names of the form #+special char like ##, #!, #,, #%, #; etc, the array isn't interpolated :
#; = (1,2);
say "#;"; # prints nothing
say #; ; # prints: 1 2
So here is my question : does anyone knows why such arrays aren't interpolated? Is it documented anywhere?
I couldn't find any information or documentation about that. There are too many articles/posts on google (or SO) about the basics of interpolation, so maybe the answer was just hidden in one of them, or at the 10th page of results..
If you wonder why I could need variable names like those :
The -n (and -p for that matter) flag adds a semicolon ; at the end of the code (I'm not sure it works on every version of perl though). So I can make this program perl -nE 'push#a,1;say"#a"}{say#a' shorter by doing instead perl -nE 'push#;,1;say"#;"}{say#', because that last ; convert say# to say#;. Well, actually I can't do that because #; isn't interpolated in double quotes. It won't be useful every day of course, but in some golfing challenges, why not!
It can be useful to obfuscate some code. (whether obfuscation is useful or not is another debate!)

Unfortunately I can't tell you why, but this restriction comes from code in toke.c that goes back to perl 5.000 (1994!). My best guess is that it's because Perl doesn't use any built-in array punctuation variables (except for #- and #+, added in 5.6 (2000)).
The code in S_scan_const only interprets # as the start of an array if the following character is
a word character (e.g. #x, #_, #1), or
a : (e.g. #::foo), or
a ' (e.g. #'foo (this is the old syntax for ::)), or
a { (e.g. #{foo}), or
a $ (e.g. #$foo), or
a + or - (the arrays #+ and #-), but not in regexes.
As you can see, the only punctuation arrays that are supported are #- and #+, and even then not inside a regex. Initially no punctuation arrays were supported; #- and #+ were special-cased in 2000. (The exception in regex patterns was added to make /[\c#-\c_]/ work; it used to interpolate #- first.)
There is a workaround: Because #{ is treated as the start of an array variable, the syntax "#{;}" works (but that doesn't help your golf code because it makes the code longer).

Perl's documentation says that the result is "not strictly predictable".
The following, from perldoc perlop (Perl 5.22.1), refers to interpolation of scalars. I presume it applies equally to arrays.
Note also that the interpolation code needs to make a decision on
where the interpolated scalar ends. For instance, whether
"a $x -> {c}" really means:
"a " . $x . " -> {c}";
or:
"a " . $x -> {c};
Most of the time, the longest possible text that does not include
spaces between components and which contains matching braces or
brackets. because the outcome may be determined by voting based on
heuristic estimators, the result is not strictly predictable.
Fortunately, it's usually correct for ambiguous cases.

Some things are just because "Larry coded it that way". Or as I used to say in class, "It works the way you think, provided you think like Larry thinks", sometimes adding "and it's my job to teach you how Larry thinks."

Getting Error of Modification of a read-only value attempted

I am trying to select the below value from database:
Reporting that one of #its many problems had been the recent# extended
sales slump in women's apparel, the seven-store retailer said it would
start a three-month liquidation sale in all of its stores.~(A) its
many problems had been the recent~(B) its many problems has been the
recently~(C) its many problems is the recently~(D) their many problems
is the recent~(E) their many problems had been the recent~
i am selecting this value in variable $ques and then selecting a text as below:
$ques=~s/^(.*?)\#(.*?)\#(.*?)$/$2/;
Now, while replacing the ~ character in the string by
$3=~s/~/\n/g; ---->line 171
and running the script, I am getting one error as:
Modification of a read-only value attempted at main.pl line 171
I want to replace all the ~ character with '\n' and print the final value. Please suggest how to do it.
*I have researched this on net, but got confused that how to handle these read only variables.

You've already got a good explanation of the problem from José Castro. But there's another solution if you're using a recent-ish version of Perl (Update: having checked more carefully, I find that means 5.14+). The /r argument to the substitution operator will copy your string, make the substitution on the copy and then return that altered value.
So you could write:
my $new_value = $3 =~ s/~/\n/rg;

It sounds like what you really want in this case is split rather than regular expression capture groups:
my #parts = split(/#/, $ques);
$parts[2] =~ s/~/\n/g;
It makes the intent of your code clearer since you are, in fact, splitting on # symbols.

Just like you say, the special variables $1, $2, etc., are read-only, and that means that you can't perform that substitution on them.
Performing the substitution on $ques will do what you need:
$ques =~ s/~/\n/g;
print $ques;
Do note that in the earlier substitution that you're performing on $ques you're getting rid of all the ~ characters.

validate 32bit integer with regex

I'm trying to come up with a regex that will match anything that is not a 32bit integer. My eventual goal is to match lines that are not in the following format
Integer\tInteger\tInteger\tInteger\tInteger\tInteger\tInteger
(7 32bit integers and 1 tab in between each integer)
So far I've come up with this
#!/usr/bin/perl -w
use strict;
while ( my $line = <> ) {
if ( $line =~ /^(429496729[0-6]|42949672[0-8]\d|4294967[01]\d{2}|429496[0-6]\d{3}|42949[0-5]\d{4}|4294[0-8]\d{5}|429[0-3]\d{6}|42[0-8]\d{7}|4[01]\d{8}|[1-3]\d{9}|[1-9]\d{8}|[1-9]\d{7}|[1-9]\d{6}|[1-9]\d{5}|[1-9]\d{4}|[1-9]\d{3}|[1-9]\d{2}|[1-9]\d|\d)$/ ) {
print "Match at line $.\n";
print "$line"
}
}
But I can't even get to the first step of having the regex match a 32bit numbers (once I tackle that problem I can tackle having the tabs be the way they need to be)
Am I solving this problem the right way? Any thoughts?

Am I solving this problem the right way?
Assuming validation is actually needed, my first approach would be to split on tabs, check the number of fields, check each field but not by using a regex. Doing a range check in a regex is silly! (Padding using sprintf then doing a string compare would solve overflow problems.)
Other issues:
\d matches far more than just 0-9. Use /\d/a or /[0-9]/ if you want to match just 0-9.
What about negative numbers? 32-bit integers can also be used to store 2147483647..-2147483648.
What about leading zeros and leading plus or minus signs?
What about thousand separators?
Is 10.0 an integer? Mathematically speaking, it is. Perl would also store that as an integer.

I would say no, this is not the correct way - it's very hard to try and follow that regex; while it can be done, consider if it'll make sense tomorrow. Or how hard it will be to alter if the range changes or a slight variation to the format is required :)
Here are my suggestions:
Read Is it a Number? to find out how to tell if a value is a number and, if so, extract it as one. That is, get a real numeric value, not a string. Additional checks can be done at this stage if desired to restrict what "valid" numbers are; don't restrict the range, just the format.
Use a simple range check for the extracted number - between 0 and 232-1 in this case?

You could do it all in a regex, but it's better to treat them as numbers and use math.
# Split it into fields.
my #fields = split /\t/, $line;
# Scan for fields which do not look like integers
# or are outside the unsigned 32 bit integer range
my $valid_line = !grep { /[^0-9]/ || ($_ < 0) || (2**32-1 < $_) } #fields;
All the caveats in the other answers about "what is a 32 bit integer" still apply. Is "+10" valid? "10.0"? Can't answer that without knowing why you're filtering for these numbers, adjust the logic as necessary.
And just to throw in a perl5i plug...
use perl5i::2;
my $valid_line = !grep { $_->is_integer && ($_ < 0) || (2**32-1 < $_) } #fields;

How does this Perl one-liner actually work?

So, I happened to notice that last.fm is hiring in my area, and since I've known a few people who worked there, I though of applying.
But I thought I'd better take a look at the current staff first.
Everyone on that page has a cute/clever/dumb strapline, like "Is life not a thousand times too short for us to bore ourselves?". In fact, it was quite amusing, until I got to this:
perl -e'print+pack+q,c*,,map$.+=$_,74,43,-2,1,-84, 65,13,1,5,-12,-3, 13,-82,44,21, 18,1,-70,56, 7,-77,72,-7,2, 8,-6,13,-70,-34'
Which I couldn't resist pasting into my terminal (kind of a stupid thing to do, maybe), but it printed:
Just another Last.fm hacker,
I thought it would be relatively easy to figure out how that Perl one-liner works. But I couldn't really make sense of the documentation, and I don't know Perl, so I wasn't even sure I was reading the relevant documentation.
So I tried modifying the numbers, which got me nowhere. So I decided it was genuinely interesting and worth figuring out.
So, 'how does it work' being a bit vague, my question is mainly,
What are those numbers? Why are there negative numbers and positive numbers, and does the negativity or positivity matter?
What does the combination of operators +=$_ do?
What's pack+q,c*,, doing?

This is a variant on “Just another Perl hacker”, a Perl meme. As JAPHs go, this one is relatively tame.
The first thing you need to do is figure out how to parse the perl program. It lacks parentheses around function calls and uses the + and quote-like operators in interesting ways. The original program is this:
print+pack+q,c*,,map$.+=$_,74,43,-2,1,-84, 65,13,1,5,-12,-3, 13,-82,44,21, 18,1,-70,56, 7,-77,72,-7,2, 8,-6,13,-70,-34
pack is a function, whereas print and map are list operators. Either way, a function or non-nullary operator name immediately followed by a plus sign can't be using + as a binary operator, so both + signs at the beginning are unary operators. This oddity is described in the manual.
If we add parentheses, use the block syntax for map, and add a bit of whitespace, we get:
print(+pack(+q,c*,,
map{$.+=$_} (74,43,-2,1,-84, 65,13,1,5,-12,-3, 13,-82,44,21,
18,1,-70,56, 7,-77,72,-7,2, 8,-6,13,-70,-34)))
The next tricky bit is that q here is the q quote-like operator. It's more commonly written with single quotes:
print(+pack(+'c*',
map{$.+=$_} (74,43,-2,1,-84, 65,13,1,5,-12,-3, 13,-82,44,21,
18,1,-70,56, 7,-77,72,-7,2, 8,-6,13,-70,-34)))
Remember that the unary plus is a no-op (apart from forcing a scalar context), so things should now be looking more familiar. This is a call to the pack function, with a format of c*, meaning “any number of characters, specified by their number in the current character set”. An alternate way to write this is
print(join("", map {chr($.+=$_)} (74, …, -34)))
The map function applies the supplied block to the elements of the argument list in order. For each element, $_ is set to the element value, and the result of the map call is the list of values returned by executing the block on the successive elements. A longer way to write this program would be
#list_accumulator = ();
for $n in (74, …, -34) {
$. += $n;
push #list_accumulator, chr($.)
}
print(join("", #list_accumulator))
The $. variable contains a running total of the numbers. The numbers are chosen so that the running total is the ASCII codes of the characters the author wants to print: 74=J, 74+43=117=u, 74+43-2=115=s, etc. They are negative or positive depending on whether each character is before or after the previous one in ASCII order.
For your next task, explain this JAPH (produced by EyesDrop).
''=~('(?{'.('-)#.)#_*([]#!#/)(#)#-#),#(##+#)'
^'][)#]`}`]()`#.#]#%[`}%[#`#!##%[').',"})')
Don't use any of this in production code.

The basic idea behind this is quite simple. You have an array containing the ASCII values of the characters. To make things a little bit more complicated you don't use absolute values, but relative ones except for the first one. So the idea is to add the specific value to the previous one, for example:
74 -> J
74 + 43 -> u
74 + 42 + (-2 ) -> s
Even though $. is a special variable in Perl it does not mean anything special in this case. It is just used to save the previous value and add the current element:
map($.+=$_, ARRAY)
Basically it means add the current list element ($_) to the variable $.. This will return a new array with the correct ASCII values for the new sentence.
The q function in Perl is used for single quoted, literal strings. E.g. you can use something like
q/Literal $1 String/
q!Another literal String!
q,Third literal string,
This means that pack+q,c*,, is basically pack 'c*', ARRAY. The c* modifier in pack interprets the value as characters. For example, it will use the value and interpret it as a character.
It basically boils down to this:
#!/usr/bin/perl
use strict;
use warnings;
my $prev_value = 0;
my #relative = (74,43,-2,1,-84, 65,13,1,5,-12,-3, 13,-82,44,21, 18,1,-70,56, 7,-77,72,-7,2, 8,-6,13,-70,-34);
my #absolute = map($prev_value += $_, #relative);
print pack("c*", #absolute);

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

How do I insert a lot of whitespace in Perl? - perl

I need to buff out a line of text with a varying but large number of whitespace. I can figure out a janky way of doing a loop and adding whitespace to $foo, then splicing that into the text, but it is not an elegant solution.

If you always want the string to be a certain length you can use sprintf: For example, to pad out $var with white space so it 20,000 characters long use: $var = sprintf("%-20000s",$var);

use the 'x' operator: print ' ' x 20000;

Related

Generate all combinations from list of characters

Why are ##, #!, #, etc. not interpolated in strings?

Getting Error of Modification of a read-only value attempted

validate 32bit integer with regex

How does this Perl one-liner actually work?

Categories

Resources