string search with user defined mismatch perl - perl

I am trying to search a substring in a string with certain mismatch (mismatch allowed is taken from user). I came across String::Approx but I cant seem to understand how it work and your input is appreciated.
use String::Approx 'aindex';
$a="Aperlisaparlhardobjectproduced...";
$b="paal";
print aindex($b, ["I0", "D0", "S1"], $a);
When I search with 1 substitution, it works fine (I0 and D0 because I want to search for same length match) and gives output as 8. When I search with S2 or S3 it still gives 8 instead of position 1 (should have matched perl). Is it intended result?
Thank you!

Related

Why does this line return sum of integers 1-10?

I'd like to understand how unpack is returning the sum in the given perl one-liner.
I've looked at pack man page and mostly understood that it is simply formatting the given array into a scalar of ten doubles.
However, I couldn't find proper documentation for unpack with %123. Looking for help here.
print unpack "%123d*" , pack( "d*", (1..10));
This line correctly outputs 55 which is 1+2+3+...+10.
From perldoc -f unpack:
In addition to fields allowed in pack(), you may prefix a field with a % to indicate that you want a <number>-bit checksum of the items instead of the items themselves.
Thus %123d* means to add all the input integers 1..10 and then take the first 123 bit of this result in order to construct the "<number>-bit checksum". Note that %8d* or just %d* (which is equivalent to %16d*) would suffice too given that the sum is small enough.

How to take a substring with the endpoint being a carriage return and/or line feed?

How do I take a substring where I don't know the length of the thing I want, but I know that the end of it is a CR/LF?
I'm communicating with a server trying to extract some information. The start point of the substring is well defined, but the end point can be variable. In other scripting languages, I'd expect there to be a find() command, but I haven't found one in PowerShell yet. Most articles and SE questions refer to Get-Content, substring, and Select-String, with the intent to replace a CRLF rather than just find it.
The device I am communicating with has a telnet-like command structure. It starts out with it's model as a prompt. You can give it commands and it responds. I'm trying to grab the hostname from it. This is what a prompt, command, and response look like in a terminal:
TSS-752>hostname
Host Name: ThisIsMyHostname
TSS-752>
I want to extract the hostname. I came across IndexOf(), which seems to work like the find command I am looking for. ":" is a good start point, and then I want to truncate it to the next CRLF.
NOTE: I have made my code work to my satisfaction, but in the interest of not receiving anymore downvotes (3 at the time of this writing) or getting banned again, I will not post the solution, nor delete the question. Those are taboo here. Taking into account the requests for more info from the comments has only earned me downvotes, so I think I'm just stuck in the SO-Catch-22.
You could probably have found the first 20 examples in c# outlining this exact same approach, but here goes with PowerShell examples
If you want to find the index at which CR/LF occurs, use String.IndexOf():
PS C:\> " `r`n".IndexOf("`r`n")
2
Use it to calculate the length parameter argument for String.Substring():
$String = " This phrase starts at index 4 ends at some point`r`nand then there's more"
# Define the start index
$Offset = 4
# Find the index of the end marker
$CRLFIndex = $string.IndexOf("`r`n")
# Check that the end marker was actually found
if($CRLFIndex -eq -1){
throw "CRLF not found in string"
}
# Calculate length based on end marker index - start index
$Length = $CRLFIndex - $Offset
# Generate substring
$Substring = $String.Substring($Offset,$Length)

Getting Error of Modification of a read-only value attempted

I am trying to select the below value from database:
Reporting that one of #its many problems had been the recent# extended
sales slump in women's apparel, the seven-store retailer said it would
start a three-month liquidation sale in all of its stores.~(A) its
many problems had been the recent~(B) its many problems has been the
recently~(C) its many problems is the recently~(D) their many problems
is the recent~(E) their many problems had been the recent~
i am selecting this value in variable $ques and then selecting a text as below:
$ques=~s/^(.*?)\#(.*?)\#(.*?)$/$2/;
Now, while replacing the ~ character in the string by
$3=~s/~/\n/g; ---->line 171
and running the script, I am getting one error as:
Modification of a read-only value attempted at main.pl line 171
I want to replace all the ~ character with '\n' and print the final value. Please suggest how to do it.
*I have researched this on net, but got confused that how to handle these read only variables.
You've already got a good explanation of the problem from José Castro. But there's another solution if you're using a recent-ish version of Perl (Update: having checked more carefully, I find that means 5.14+). The /r argument to the substitution operator will copy your string, make the substitution on the copy and then return that altered value.
So you could write:
my $new_value = $3 =~ s/~/\n/rg;
It sounds like what you really want in this case is split rather than regular expression capture groups:
my #parts = split(/#/, $ques);
$parts[2] =~ s/~/\n/g;
It makes the intent of your code clearer since you are, in fact, splitting on # symbols.
Just like you say, the special variables $1, $2, etc., are read-only, and that means that you can't perform that substitution on them.
Performing the substitution on $ques will do what you need:
$ques =~ s/~/\n/g;
print $ques;
Do note that in the earlier substitution that you're performing on $ques you're getting rid of all the ~ characters.

How does this Perl one-liner actually work?

So, I happened to notice that last.fm is hiring in my area, and since I've known a few people who worked there, I though of applying.
But I thought I'd better take a look at the current staff first.
Everyone on that page has a cute/clever/dumb strapline, like "Is life not a thousand times too short for us to bore ourselves?". In fact, it was quite amusing, until I got to this:
perl -e'print+pack+q,c*,,map$.+=$_,74,43,-2,1,-84, 65,13,1,5,-12,-3, 13,-82,44,21, 18,1,-70,56, 7,-77,72,-7,2, 8,-6,13,-70,-34'
Which I couldn't resist pasting into my terminal (kind of a stupid thing to do, maybe), but it printed:
Just another Last.fm hacker,
I thought it would be relatively easy to figure out how that Perl one-liner works. But I couldn't really make sense of the documentation, and I don't know Perl, so I wasn't even sure I was reading the relevant documentation.
So I tried modifying the numbers, which got me nowhere. So I decided it was genuinely interesting and worth figuring out.
So, 'how does it work' being a bit vague, my question is mainly,
What are those numbers? Why are there negative numbers and positive numbers, and does the negativity or positivity matter?
What does the combination of operators +=$_ do?
What's pack+q,c*,, doing?
This is a variant on “Just another Perl hacker”, a Perl meme. As JAPHs go, this one is relatively tame.
The first thing you need to do is figure out how to parse the perl program. It lacks parentheses around function calls and uses the + and quote-like operators in interesting ways. The original program is this:
print+pack+q,c*,,map$.+=$_,74,43,-2,1,-84, 65,13,1,5,-12,-3, 13,-82,44,21, 18,1,-70,56, 7,-77,72,-7,2, 8,-6,13,-70,-34
pack is a function, whereas print and map are list operators. Either way, a function or non-nullary operator name immediately followed by a plus sign can't be using + as a binary operator, so both + signs at the beginning are unary operators. This oddity is described in the manual.
If we add parentheses, use the block syntax for map, and add a bit of whitespace, we get:
print(+pack(+q,c*,,
map{$.+=$_} (74,43,-2,1,-84, 65,13,1,5,-12,-3, 13,-82,44,21,
18,1,-70,56, 7,-77,72,-7,2, 8,-6,13,-70,-34)))
The next tricky bit is that q here is the q quote-like operator. It's more commonly written with single quotes:
print(+pack(+'c*',
map{$.+=$_} (74,43,-2,1,-84, 65,13,1,5,-12,-3, 13,-82,44,21,
18,1,-70,56, 7,-77,72,-7,2, 8,-6,13,-70,-34)))
Remember that the unary plus is a no-op (apart from forcing a scalar context), so things should now be looking more familiar. This is a call to the pack function, with a format of c*, meaning “any number of characters, specified by their number in the current character set”. An alternate way to write this is
print(join("", map {chr($.+=$_)} (74, …, -34)))
The map function applies the supplied block to the elements of the argument list in order. For each element, $_ is set to the element value, and the result of the map call is the list of values returned by executing the block on the successive elements. A longer way to write this program would be
#list_accumulator = ();
for $n in (74, …, -34) {
$. += $n;
push #list_accumulator, chr($.)
}
print(join("", #list_accumulator))
The $. variable contains a running total of the numbers. The numbers are chosen so that the running total is the ASCII codes of the characters the author wants to print: 74=J, 74+43=117=u, 74+43-2=115=s, etc. They are negative or positive depending on whether each character is before or after the previous one in ASCII order.
For your next task, explain this JAPH (produced by EyesDrop).
''=~('(?{'.('-)#.)#_*([]#!#/)(#)#-#),#(##+#)'
^'][)#]`}`]()`#.#]#%[`}%[#`#!##%[').',"})')
Don't use any of this in production code.
The basic idea behind this is quite simple. You have an array containing the ASCII values of the characters. To make things a little bit more complicated you don't use absolute values, but relative ones except for the first one. So the idea is to add the specific value to the previous one, for example:
74 -> J
74 + 43 -> u
74 + 42 + (-2 ) -> s
Even though $. is a special variable in Perl it does not mean anything special in this case. It is just used to save the previous value and add the current element:
map($.+=$_, ARRAY)
Basically it means add the current list element ($_) to the variable $.. This will return a new array with the correct ASCII values for the new sentence.
The q function in Perl is used for single quoted, literal strings. E.g. you can use something like
q/Literal $1 String/
q!Another literal String!
q,Third literal string,
This means that pack+q,c*,, is basically pack 'c*', ARRAY. The c* modifier in pack interprets the value as characters. For example, it will use the value and interpret it as a character.
It basically boils down to this:
#!/usr/bin/perl
use strict;
use warnings;
my $prev_value = 0;
my #relative = (74,43,-2,1,-84, 65,13,1,5,-12,-3, 13,-82,44,21, 18,1,-70,56, 7,-77,72,-7,2, 8,-6,13,-70,-34);
my #absolute = map($prev_value += $_, #relative);
print pack("c*", #absolute);

regexpression [R] "About 52,883,038 results"

I want to parse an html webpage (Specifically a Google Search Results Page)
Looking for the specific counter string
"About *many results"
where *many can range from 0 to 999,999,999,999 results
grep("About [0-9] results",file)
I can't figure out how to incorporate the range of numbers (including commas) into the regular expression. Can anyone clarify? I've looked for similar questions posted, but their codes do not work for this task.
I'm guessing introduce some kind of wildcard "." but I don't think I'm using it correctly
The structure I had in mind was
Any#Times { { Any#Times( [0-9] ) },}
Solved own question...
didn't have to be fancy at all
"About .* results"
works fine
Depending on the content of the page then your .* works, but could get a very long and incorrect string.
If you want to make sure that you get only numbers, try:
"About ([0-9]+|[0-9]{1,3}(,[0-9]{3})*) results"
I've tested it with grep -E and it'll give you ungrouped numbers:
About 10000000 results
as well as grouped numbers using British/English conventions:
About 100,000 results
but not non-numbers:
About a bajillion results
or badly grouped numbers:
About 100,0 results