perl to hardcode a static value in a field - perl

I am still learning Perl and have almost got a program written. My question, simple as it may be: if I want to hardcode a string into a field, would the line below do that? Thank you :).
$out[45]="VUS";
In the other lines I use the code below to set the values that are passed into the @out array; the one in question is hardcoded, while the others come from a split.
my @vals = split/\t/; # this splits the line at tabs
my @mutations=split/,/,$vals[9]; # splits on comma to create an array of mutations
my ($gene,$transcript,$exon,$coding,$aa);
for (@mutations)
{
($gene,$transcript,$exon,$coding,$aa) = split/\:/; # this takes col AB and splits it at colons
grep {$transcript eq $_} keys %nms or next;
}
my @out=($.,@colsleft,$_,@colsright);
$out[2]=$gene;
$out[3]=$nms{$transcript};
$out[4]=$transcript;
$out[15]=$coding;
$out[17]=$aa;

Your line of code, $out[45]="VUS";, is correct: it sets the 46th element of the array @out (index 45) to the string "VUS". However, I am trying to understand from your code why you would want to do that. It is usually better practice to avoid hardcoding where possible, so that the program stays as flexible as it can be.
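One side effect worth knowing about, shown as a minimal sketch: if @out has fewer than 46 elements, the assignment grows the array and fills the gap with undef. The three-column starting row below is hypothetical and only stands in for the real @out.

```perl
use strict;
use warnings;

# Hypothetical short row standing in for the real @out.
my @out = ('chr1', '12345', 'A');

# Assigning past the end grows the array; the gap is filled with undef.
$out[45] = 'VUS';

my $count   = scalar @out;                   # total number of elements
my $defined = grep { defined $_ } @out;      # how many actually hold a value

# Turn undef into empty strings before joining, to avoid warnings.
my $line = join "\t", map { $_ // '' } @out;

print "$count elements, $defined defined\n";
```

If the row is later written out tab-separated, the map with // (Perl 5.10 or later) keeps the empty columns in place without "uninitialized value" warnings.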

Related

Perl $1 giving uninitialized value error

I am trying to extract a part of a string and put it into a new variable. The string I am looking at is:
maker-scaffold_26653|ref0016423-snap-gene-0.1
(inside a $gene_name variable)
and the thing I want to match is:
scaffold_26653|ref0016423
I'm using the following piece of code:
my $gene_name;
my $scaffold_name;
if ($gene_name =~ m/scaffold_[0-9]+\|ref[0-9]+/) {
$scaffold_name = $1;
print "$scaffold_name\n";
}
I'm getting the following error when trying to execute:
Use of uninitialized value $scaffold_name in concatenation (.) or string
I know that the pattern is right, because if I use $' instead of $1 I get
-snap-gene-0.1
I'm at a bit of a loss: why will $1 not work here?
If you want to use a value from the match, you have to put () around the part of the regex you want to capture.
To expand on Jens' answer, () in a regex signifies an anonymous capture group. The content matched by each capture group is stored in $1, $2, $3 and so on, numbered from left to right, so for example,
/(..):(..):(..)/
on an HH:MM:SS time string will store hours, minutes, and seconds in $1, $2, $3 respectively. Naturally this begins to become unwieldy and is not self-documenting, so you can assign the results to a list instead:
my ($hours, $mins, $secs) = $time =~ m/(..):(..):(..)/;
So your example could bypass the numbered variables by doing direct assignment:
my ($scaffold_name) = $gene_name =~ m/(scaffold_[0-9]+[|]ref[0-9]+)/;
# $scaffold_name now contains 'scaffold_26653|ref0016423'
You can even get rid of the ugly =~ binding by using for as a topicalizer:
my $scaffold_name;
for ($gene_name) {
($scaffold_name) = m/(scaffold_\d+[|]ref\d+)/;
print $scaffold_name;
}
If things start to get more complex, I prefer to use named capture groups (introduced in Perl v5.10.0):
$gene_name =~ m{
(?<scaffold_name> # ?<name> creates a named capture group
scaffold_\d+? # 'scaffold' and its trailing digits
[|] # Literal pipe symbol
ref\d+ # 'ref' and its trailing digits
)
}xms; # The x flag lets us write more readable regexes
print $+{scaffold_name}, "\n";
The results of named capture groups are stored in the magic hash %+. Access is done just like any other hash lookup, with the capture group names as the keys. %+ is locally scoped in the same way the numbered variables ($1, $2, ...) are, so it can be used as a drop-in replacement for them in most situations.
It's overkill for this particular example, but as regexes start to get larger and more complicated, this saves you the trouble of either having to scroll all the way back up and count anonymous capture groups from left to right to find which of those darn numbered variables is holding the capture you wanted, or scan across a long list assignment to find where to add a new variable to hold a capture that got inserted in the middle.
My personal rule of thumb is to assign the results of anonymous captures to descriptively named, lexically scoped variables for 3 or fewer captures, then switch to using named captures, comments, and indentation in regexes when more are necessary.
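To make the original problem concrete, here is a small sketch that runs the question's pattern with and without parentheses on the sample string. The variable names $without and $with are only for illustration.

```perl
use strict;
use warnings;

my $gene_name = 'maker-scaffold_26653|ref0016423-snap-gene-0.1';

# No parentheses: the match succeeds, but nothing is captured,
# so $1 is undef inside the block.
my $without;
if ($gene_name =~ m/scaffold_[0-9]+\|ref[0-9]+/) {
    $without = $1;
}

# Same pattern wrapped in a capture group: $1 holds the matched text.
my $with;
if ($gene_name =~ m/(scaffold_[0-9]+\|ref[0-9]+)/) {
    $with = $1;
}

print defined $without ? "captured: $without\n" : "no capture group, so \$1 is undef\n";
print "captured: $with\n";
```

The first branch is what produced the "uninitialized value" warning in the question: the pattern matched, but there was no group for $1 to refer to.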

Perl get array count so can start foreach loop at a certain array element

I have a file that I am reading in. I'm using Perl to reformat the data. It is a comma-separated file. In one of the files, I know that element.0 is a zipcode and element.1 is a counter. Each row can have 1 to n cities. I need to know the number of elements from element.3 to the end of the line so that I can reformat them properly. I wanted to use a foreach loop starting at element.3 to format the other elements into a single string.
Any help would be appreciated. Basically I am trying to read in a csv file and create a cpp file that can then be compiled on another platform as a plug-in for that platform.
Best Regards
Michael Gould
you can do something like this to get the fields from a line:
my @fields = split /,/, $line;
To access all elements from 3 to the end, do this:
foreach my $city (@fields[3..$#fields])
{
#do stuff
}
(Note, based on your question I assume you are using zero-based indexing. Thus "element 3" is the 4th element).
Alternatively, consider Text::CSV to read your CSV file, especially if you have things like escaped delimiters.
Well if your line is being read into an array, you can get the number of elements in the array by evaluating it in scalar context, for example
my $elems = @line;
or to be really sure
my $elems = scalar(@line);
Although in that case the scalar is redundant, it's handy for forcing scalar context where it would otherwise be list context. You can also find the index of the last element of the array with $#line.
After that, if you want to get everything from element 3 onwards you can use an array slice:
my @threeonwards = @line[3 .. $#line];
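Putting the pieces from both answers together, a minimal sketch might look like this. The sample row and the '|' separator are made up for illustration; the real file layout may differ.

```perl
use strict;
use warnings;

# Hypothetical input row: zipcode, counter, one more field, then cities.
my $line   = '30301,2,GA,Atlanta,Decatur';
my @fields = split /,/, $line;

my $total    = scalar @fields;            # number of elements in the row
my $last     = $#fields;                  # index of the last element
my @cities   = @fields[3 .. $#fields];    # slice: element 3 to the end
my $n_cities = @cities;                   # array in scalar context gives its count
my $joined   = join '|', @cities;         # the cities as a single string

print "$total fields, last index $last, $n_cities from element 3 on: $joined\n";
```

If a row has fewer than four fields, the range 3 .. $#fields is empty, so the slice is simply an empty list and the loop body never runs.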

Transform data to array with Perl

How do I transform my data to an array with Perl?
Here is my data:
my $data =
"203.174.38.128203.174.38.129203.174.38.1" .
"30203.174.38.131203.174.38.132203.174.38" .
".133203.174.38.134173.174.38.135203.174." .
"38.136203.174.38.137203.174.38.142";
And I want to transform it to be array like this
my @array= (
"203.174.38.128",
"203.174.38.129",
"203.174.38.130",
"203.174.38.131",
"203.174.38.132",
"203.174.38.133",
"203.174.38.134",
"173.174.38.135",
"203.174.38.136",
"203.174.38.137",
"203.174.38.142"
);
Anyone know how to do that with Perl?
If the first part of IP logged is always 203, it's kinda easy:
my @arr = split /(?<=\d)(?=203\.)/, $data;
In the example given it's not, but the first part is always 3-digit, and the second part is always 174, so it's enough to do...
my @arr = split /(?<=\d)(?=\d{3}\.174\.)/, $data;
... to get the correct result.
But please understand that it's close to impossible to give a more generic (and bulletproof) solution here - when these 'marker' parts are... too dynamic. For example, take this string...
11.11.11.22222.11.11.11
The question is, where to split it? Should it be 11.11.11.22; 222.11.11.11? Or 11.11.11.222; 22.11.11.11? Both are quite valid IPs, if you ask me. And it could get even worse, with trying to split '2222' part (can be '2; 222', '22; 22' and even '222; 2').
You can, for example, make a rule: "split each sequence of > 3 digits followed by a dot sign so that the second part of this split would always start from 3 digits":
my @arr = split /(?<=\d)(?=\d{3}\.)/, $data;
... but this will obviously fail to work properly in the ambiguous cases mentioned earlier IF there are IPs with two- or even one-digit first octet in your datastring.
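As a rough illustration of both points, the sketch below runs the lookaround split on a shortened copy of the sample data, then applies the generic "three digits and a dot" rule to the ambiguous string. The shortened $data and the names $ambiguous and @guess are assumptions made for the example.

```perl
use strict;
use warnings;

# Shortened version of the sample data (three addresses run together).
my $data = '203.174.38.128203.174.38.129173.174.38.135';

# Split at every point that follows a digit and is followed by
# three digits, a dot, and the fixed second octet 174.
my @arr = split /(?<=\d)(?=\d{3}\.174\.)/, $data;
print "$_\n" for @arr;

# The ambiguous string from above: the generic rule just picks one reading.
my $ambiguous = '11.11.11.22222.11.11.11';
my @guess     = split /(?<=\d)(?=\d{3}\.)/, $ambiguous;
print join(' ; ', @guess), "\n";
```

The second split yields 11.11.11.22 and 222.11.11.11, which is only one of the two valid readings; the regex has no way to know which one was intended.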
If you write a regex that will match any valid value for one of the numbers in the quartet then you can just search for them all and recombine them in sets of four. This
/25[0-5]|2[0-4]\d|1\d\d|[1-9]\d|\d/
matches 250-255 or 200-249 or 100-199 or 10-99 or 0-9, and a program to use it is shown below.
There is no way to know which option to take if there is more than one way to split the string, and this solution assigns the longest value to the first of the two ip addresses. For instance, 1.1.1.1234.1.1.1 will split as 1.1.1.123 and 4.1.1.1
use strict;
use warnings;
use feature 'say';
my $data =
"203.174.38.128203.174.38.129203.174.38.1" .
"30203.174.38.131203.174.38.132203.174.38" .
".133203.174.38.134173.174.38.135203.174." .
"38.136203.174.38.137203.174.38.142";
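my $byte = qr/25[0-5]|2[0-4]\d|1\d\d|\d\d|\d/; # one octet, longest alternatives first
my @bytes = $data =~ /($byte)/g; # every octet in order, ignoring the dots
my @addresses;
push @addresses, join('.', splice(@bytes, 0, 4)) while @bytes; # regroup in fours
say for @addresses;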
output
203.174.38.128
203.174.38.129
203.174.38.130
203.174.38.131
203.174.38.132
203.174.38.133
203.174.38.134
173.174.38.135
203.174.38.136
203.174.38.137
203.174.38.142
Using your sample, it looks like you have 3 digits for the first and last node. That would prompt using this pattern:
/(\d{3}\.\d{1,3}\.\d{1,3}\.\d{3})/
Add that with a /g switch and it will pull every one.
However, if you have a larger and divergent set of data than what you show for your sample, somebody should have separated the ips before dumping them into this string. If they are separate data points, they should have some separation.

How does this Perl one-liner actually work?

So, I happened to notice that last.fm is hiring in my area, and since I've known a few people who worked there, I thought of applying.
But I thought I'd better take a look at the current staff first.
Everyone on that page has a cute/clever/dumb strapline, like "Is life not a thousand times too short for us to bore ourselves?". In fact, it was quite amusing, until I got to this:
perl -e'print+pack+q,c*,,map$.+=$_,74,43,-2,1,-84, 65,13,1,5,-12,-3, 13,-82,44,21, 18,1,-70,56, 7,-77,72,-7,2, 8,-6,13,-70,-34'
Which I couldn't resist pasting into my terminal (kind of a stupid thing to do, maybe), but it printed:
Just another Last.fm hacker,
I thought it would be relatively easy to figure out how that Perl one-liner works. But I couldn't really make sense of the documentation, and I don't know Perl, so I wasn't even sure I was reading the relevant documentation.
So I tried modifying the numbers, which got me nowhere. So I decided it was genuinely interesting and worth figuring out.
So, 'how does it work' being a bit vague, my question is mainly,
What are those numbers? Why are there negative numbers and positive numbers, and does the negativity or positivity matter?
What does the combination of operators +=$_ do?
What's pack+q,c*,, doing?
This is a variant on “Just another Perl hacker”, a Perl meme. As JAPHs go, this one is relatively tame.
The first thing you need to do is figure out how to parse the perl program. It lacks parentheses around function calls and uses the + and quote-like operators in interesting ways. The original program is this:
print+pack+q,c*,,map$.+=$_,74,43,-2,1,-84, 65,13,1,5,-12,-3, 13,-82,44,21, 18,1,-70,56, 7,-77,72,-7,2, 8,-6,13,-70,-34
pack is a function, whereas print and map are list operators. Either way, a function or non-nullary operator name immediately followed by a plus sign can't be using + as a binary operator, so both + signs at the beginning are unary operators. This oddity is described in the manual.
If we add parentheses, use the block syntax for map, and add a bit of whitespace, we get:
print(+pack(+q,c*,,
map{$.+=$_} (74,43,-2,1,-84, 65,13,1,5,-12,-3, 13,-82,44,21,
18,1,-70,56, 7,-77,72,-7,2, 8,-6,13,-70,-34)))
The next tricky bit is that q here is the q quote-like operator. It's more commonly written with single quotes:
print(+pack(+'c*',
map{$.+=$_} (74,43,-2,1,-84, 65,13,1,5,-12,-3, 13,-82,44,21,
18,1,-70,56, 7,-77,72,-7,2, 8,-6,13,-70,-34)))
Remember that the unary plus is a no-op (it has no effect on its operand; here it only lets the tokens be written without spaces), so things should now be looking more familiar. This is a call to the pack function, with a format of c*, meaning “any number of characters, specified by their number in the current character set”. An alternate way to write this is
print(join("", map {chr($.+=$_)} (74, …, -34)))
The map function applies the supplied block to the elements of the argument list in order. For each element, $_ is set to the element value, and the result of the map call is the list of values returned by executing the block on the successive elements. A longer way to write this program would be
my @list_accumulator = ();
for my $n (74, …, -34) {
$. += $n;
push @list_accumulator, chr($.);
}
print(join("", @list_accumulator));
The $. variable contains a running total of the numbers. The numbers are chosen so that the running total is the ASCII codes of the characters the author wants to print: 74=J, 74+43=117=u, 74+43-2=115=s, etc. They are negative or positive depending on whether each character is before or after the previous one in ASCII order.
For your next task, explain this JAPH (produced by EyesDrop).
''=~('(?{'.('-)@.)@_*([]@!@/)(@)@-@),@(@@+@)'
^'][)@]`}`]()`@.@]@%[`}%[@`@!@#%[').',"})')
Don't use any of this in production code.
The basic idea behind this is quite simple. You have an array containing the ASCII values of the characters. To make things a little bit more complicated you don't use absolute values, but relative ones except for the first one. So the idea is to add the specific value to the previous one, for example:
74 -> J
74 + 43 -> u
74 + 43 + (-2) -> s
Even though $. is a special variable in Perl it does not mean anything special in this case. It is just used to save the previous value and add the current element:
map($.+=$_, ARRAY)
Basically it means: add the current list element ($_) to the variable $. and return the new total. map collects those totals, so the result is a new list holding the correct ASCII values for the sentence.
The q function in Perl is used for single quoted, literal strings. E.g. you can use something like
q/Literal $1 String/
q!Another literal String!
q,Third literal string,
This means that pack+q,c*,, is basically pack 'c*', ARRAY. The c* template tells pack to treat every value in the list as a character code and to emit the corresponding character.
It basically boils down to this:
#!/usr/bin/perl
use strict;
use warnings;
my $prev_value = 0;
my @relative = (74,43,-2,1,-84, 65,13,1,5,-12,-3, 13,-82,44,21, 18,1,-70,56, 7,-77,72,-7,2, 8,-6,13,-70,-34);
my @absolute = map($prev_value += $_, @relative);
print pack("c*", @absolute);
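To check the running-total idea by hand, here is a small sketch that decodes only the first twelve numbers and prints the intermediate codes. It uses chr and join instead of pack, which is an alternative spelling for illustration, not what the original one-liner does.

```perl
use strict;
use warnings;

# First twelve deltas from the one-liner.
my @relative = (74, 43, -2, 1, -84, 65, 13, 1, 5, -12, -3, 13);

# Running total: each element becomes the sum of everything up to it.
my $sum   = 0;
my @codes = map { $sum += $_ } @relative;

# Turn each ASCII code into its character.
my $text = join '', map { chr($_) } @codes;

print "@codes[0 .. 3]\n";   # 74 117 115 116
print "$text\n";            # Just another
```

The negative deltas are simply the places where the next character has a lower ASCII code than the previous one, such as the drop from 't' (116) to the space (32).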

(3 lines) from bash to perl?

I have these three lines in bash that work really nicely. I want to add them to some existing perl script but I have never used perl before ....
could somebody rewrite them for me? I tried to use them as they are and it didn't work
note that $SSH_CLIENT is a run-time parameter you get if you type set in bash (linux)
users[210]=radek # where 210 is the last octet from my mac's IP
octet=($SSH_CLIENT) # split the value on spaces
somevariable=$users[${octet[0]##*.}] # extract the last octet from the ip address
These might work for you. I noted my assumptions with each line.
my %users = ( 210 => 'radek' );
I assume that you wanted a sparse array. Hashes are the standard implementation of sparse arrays in Perl.
my @octet = split ' ', $ENV{SSH_CLIENT}; # split the value on spaces
I assume that you still wanted to use the environment variable SSH_CLIENT
my ( $some_var ) = $octet[0] =~ /\.(\d+)$/;
You want the last set of digits from the '.' to the end.
The parens around the variable put the assignment into list context.
In list context, a match creates a list of all the "captured" sequences.
Assigning that list to a parenthesized list of scalars takes only as many values as there are scalars on the left; here that is just the first capture.
As for your question in the comments, you can get the variable out of the hash, by:
$db = $users{ $some_var };
# OR--this one's kind of clunky...
$db = $users{ [ $octet[0] =~ /\.(\d+)$/ ]->[0] };
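The three translated lines can be tried together in a short sketch. The SSH_CLIENT value is hard-coded here as a hypothetical example (client IP, client port, server port) so that it runs outside an SSH session, and the 'unknown' fallback is an addition for illustration.

```perl
use strict;
use warnings;

my %users = (210 => 'radek');

# Hypothetical SSH_CLIENT value; normally this comes from $ENV{SSH_CLIENT}.
my $ssh_client = '192.168.1.210 52311 22';
my @octet      = split ' ', $ssh_client;

# List context (note the parens): the match returns the captured text.
my ($last_octet) = $octet[0] =~ /\.(\d+)$/;

# Scalar context: the same match only reports success (1) or failure.
my $matched = $octet[0] =~ /\.(\d+)$/;

# Look the user up; fall back to a placeholder if the octet is not listed.
my $user = $users{$last_octet} // 'unknown';

print "$last_octet -> $user (scalar-context result: $matched)\n";
```

Leaving out the parentheses around $last_octet is an easy mistake to make: the variable would then hold 1 instead of 210.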
Say you have already gotten your IP in a string,
$macip = "10.10.10.123";
@s = split /\./ , $macip;
print $s[-1]; #get last octet
If you don't know Perl and you are required to use it for work, you will have to learn it. Surely you are not going to come to SO and ask every time you need it in Perl right?