Convert the position of a character in a string to account for "gaps" (i.e., non alphanumeric characters in the string) - perl

In a nutshell
I have a string that looks something like this ...
---MNTSDSEEDACNERTALVQSESPSLPSYTRQTDPQHGTTEPKRAGHT--------LARGGVAAPRERD
And I have a list of positions and corresponding characters that looks something like this...
position character
10 A
12 N
53 V
54 A
This position/character key doesn't account for hyphen (-) characters in the string. So for example, in the given string the first letter M is in position 1, the N in position 2, the T in position 3, etc. The T preceding the second chunk of hyphens is position 47, and the L after that hyphen chunk is position 48.
I need to convert the list of positions and corresponding characters so that the position accounts for hyphen characters. Something like this...
position character
13 A
15 N
64 V
65 A
I think there should be a simple enough way to do this, but I am fairly new so I am probably missing something obvious, sorry about that! I am doing this as part of a bigger script, so if anyone has a way to accomplish this using Perl that would be amazing. Thank you so much in advance and please let me know if I can clarify anything or provide more information!
What I tried
At first, I took a substring of characters equal to the position value, counted the number of hyphens in that substring, and added the hyphen count onto the original position. So for the first position/character in my list, take the first 10 characters, and then there are 3 hyphens in that substring, so 10+3 = 13 which gives the correct position. This works for most of my positions, but fails when the original position falls within a bunch of hyphens like for positions 53 and 54.
I also tried grabbing the character by taking out the hyphens and then using the original position value like this...
my @array = ($string =~ /\w/g);
my $character = $array[$position];
which worked great, but then I was having a hard time using this to convert the position to include the hyphens because there are too many matching characters to match the character I grabbed here back to the original string with hyphens and find the position in that (this may have been a dumb thing to try from the start).

The actual character seems not to be relevant. It's enough to count the non-hyphens:
use strict;
use warnings;
use Data::Dumper;
my $s = '---MNTSDSEEDACNERTALVQSESPSLPSYTRQTDPQHGTTEPKRAGHT--------LARGGVAAPRERD';
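my @positions = (10, 12, 53, 54);   # ungapped positions, in ascending order
my @transformed = ();
my $start = 0;
for my $loc (@positions) {
    # number of non-hyphen characters still to skip since the previous position
    my $dist = $loc - $start;
    while ($dist) {
        # each successful /g match advances pos($s) past one non-hyphen character
        $dist-- if ($s =~ m/[^-]/g);
    }
    my $pos = pos($s);
    push @transformed, $pos;
    $start = $loc;
}
print Dumper \@transformed;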
prints:
$VAR1 = [
          13,
          15,
          64,
          65
        ];
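For comparison, the same idea (count only the non-hyphen characters) can be sketched in Python. This is a minimal sketch, not the answer's method: the function name `gapped_positions` and its `gap` parameter are made up for illustration, and positions are assumed to be 1-based as in the question. Unlike the loop above, it does not require the positions to be sorted.

```python
def gapped_positions(aligned, positions, gap="-"):
    """Map 1-based ungapped positions to 1-based positions in the gapped string."""
    # index_of[k] is the 1-based gapped position of the (k+1)-th non-gap character
    index_of = [i for i, ch in enumerate(aligned, start=1) if ch != gap]
    return [index_of[p - 1] for p in positions]


s = "---MNTSDSEEDACNERTALVQSESPSLPSYTRQTDPQHGTTEPKRAGHT--------LARGGVAAPRERD"
print(gapped_positions(s, [10, 12, 53, 54]))  # [13, 15, 64, 65]
```

Building the lookup list once costs one pass over the string, after which each position is a plain index lookup.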

Related

How to check if the text is in the right format? Flutter

I have a condition that a particular text must be in a particular format. I get the text from a scanner and need to check whether it is in the right format.
The format is as follows:
The first character must be e or E, the next character may be any letter from a to z or A to Z, the next 9 characters must be digits, and the last two characters must be letters from a to z or A to Z.
I tried something like
if (_scannedCode.startsWith('e|E') && _scannedCode[1].startsWith('a-zA-Z') && _scannedCode.substring(2, 10))
but got stuck.
The answers helped me get the conditions right, but I am stuck on one more and wanted to check whether it is correct:
RegExp emo = new RegExp(r'[0-9]{6}(EM|em|eM|Em){2}[0-9]{10}$');
I need the first 6 characters to be digits, the next two characters to be the letters em (in any case), and the remaining 10 characters to be digits.
// Note [a-zA-Z], not [a-zA-z]: the range A-z also matches [ \ ] ^ _ and `
final regex = RegExp(r'(e|E)[a-zA-Z]\d{9}[a-zA-Z]{2}');
if (_scannedCode.length == 13 && regex.hasMatch(_scannedCode)) {
  // your code
}
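As a language-neutral illustration, both formats from the question can be sketched in Python with `re.fullmatch`, which only succeeds when the whole string matches, so no separate length check is needed. The names `CODE_RE`, `EM_RE`, `is_valid_code` and `is_valid_em_code` are hypothetical. Note that `(EM|em|eM|Em){2}` in the question would require the two-letter pair twice; the sketch below matches it once.

```python
import re

# e or E, one letter, nine digits, two letters (13 characters in total)
CODE_RE = re.compile(r"[eE][a-zA-Z][0-9]{9}[a-zA-Z]{2}")

# six digits, "em" in any case, ten digits (18 characters in total)
EM_RE = re.compile(r"[0-9]{6}[eE][mM][0-9]{10}")


def is_valid_code(text):
    return CODE_RE.fullmatch(text) is not None


def is_valid_em_code(text):
    return EM_RE.fullmatch(text) is not None


print(is_valid_code("EA123456789ab"))          # True
print(is_valid_code("E_123456789ab"))          # False
print(is_valid_em_code("123456em1234567890"))  # True
```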

Extracting the digits before the decimal point with Powershell

I am using Powershell and need to extract the digits before the decimal point so that I can evaluate the number extracted
So with $Interest_Rate = 15.5
I have tried the following, but neither attempt works:
$Interest_RatePart1 = "{0:N0}" -f $Interest_Rate
It rounds the value to 16
$Interest_RatePart1 = ($Interest_Rate -split '.')[0].trim()
It returns a blank.
I just want to return 15
Formatting the number will cause rounding away from zero.
Use [Math]::Truncate(), which always rounds towards zero, instead:
$Interest_RatePart1 = [Math]::Truncate($Interest_Rate)
FWIW, the reason your last attempt returns nothing is that -split defaults to regular expressions, and . means any character in regex.
Either escape the . with \:
$Interest_RatePart1 = ($Interest_Rate -split '\.')[0].Trim()
or specify that it shouldn't use regex:
$Interest_RatePart1 = ($Interest_Rate -split '.', 2, 'SimpleMatch')[0].Trim()
or use the String.Split() method instead:
$Interest_RatePart1 = "$Interest_Rate".Split('.')[0].Trim()
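The regex pitfall described above is not specific to PowerShell. As a small Python sketch (the variable name `rate` is just for illustration), an unescaped dot treats every character as a delimiter, while an escaped dot or a plain string split behaves as intended:

```python
import re

rate = "15.5"

# Unescaped dot: every character is a delimiter, so only empty pieces remain
print(re.split(".", rate))    # ['', '', '', '', '']

# Escaped dot: split on the literal decimal point
print(re.split(r"\.", rate))  # ['15', '5']

# Plain string split: no regex involved
print(rate.split(".")[0])     # 15
```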
Mathias' [Math]::Truncate is correct. Here are some other options, though; pay attention to Floor, as it differs from Truncate when working with negative numbers.
Cast to int (can round up)
[int]$Interest_Rate
Use [Math]::Floor (always rounds down; same result as Truncate for non-negative numbers)
[Math]::Floor($Interest_Rate)
Use [Math]::Round with 0 decimal places. (can round up)
[Math]::Round($Interest_Rate, 0)
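The difference between truncating, flooring and rounding only shows up for negative numbers and for midpoints. Here is a rough Python sketch of the same distinctions, using Python's `math` module rather than the .NET methods above; note that Python's own `int()` truncates, unlike the PowerShell `[int]` cast.

```python
import math

for x in (15.5, -15.5):
    # trunc drops the fraction (towards zero); floor always goes down
    print(x, math.trunc(x), math.floor(x), round(x))

# 15.5 15 15 16
# -15.5 -15 -16 -16

# Python's round() sends midpoints to the nearest even integer
print(round(14.5))  # 14
```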

How to read a specific number (or word) from an answer

I have an .nc file I'm reading in matlab, and getting info out of the time variable.
the code looks like this
>> ncreadatt(model_list{3},'T','units')
ans =
'months since 1850-01-01'
what I want to do is get just the '1850' out of the answer.
Regular expressions are a very powerful tool for parsing and manipulating strings.
MATLAB has the regexp command:
line = 'months since 1850-01-01';
res = regexp( line, '\s(\d+)-', 'tokens', 'once');
year = str2double(res{1})
And the result is:
year =
1850
The regular expression used, '\s(\d+)-', means:
'\s' - look for a single whitespace character (the space before 1850).
'(\d+)' - look for one or more digits ('\d+'); the parentheses mean that all characters matching here will be saved as a "token".
'-' - look for a single '-' after the digits.
You can play with it on ideone.
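The same pattern works unchanged in other regex flavours. As a minimal Python sketch (variable names are illustrative only), the capture group plays the role of the MATLAB "token":

```python
import re

line = "months since 1850-01-01"

# \s(\d+)- : a whitespace character, a captured run of digits, then a hyphen
match = re.search(r"\s(\d+)-", line)
year = int(match.group(1)) if match else None
print(year)  # 1850
```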

Function to split string in matlab and return second number

I have a string and I need two characters to be returned.
I tried with strsplit but the delimiter must be a string and I don't have any delimiters in my string. Instead, I always want to get the second number in my string. The number is always 2 digits.
Example: 001a02.jpg I use the fileparts function to delete the extension of the image (jpg), so I get this string: 001a02
The expected return value is 02
Another example: 001A43a . Return values: 43
Another one: 002A12. Return values: 12
All the filenames are in a matrix 1002x1. Maybe I can use textscan but in the second example, it gives "43a" as a result.
(Just so this question doesn't remain unanswered, here's a possible approach:)
One way to go about this is to split with regular expressions, using MATLAB's strsplit, which you mentioned:
str = '001a02.jpg';
C = strsplit(str,'[a-zA-Z.]','DelimiterType','RegularExpression');
Results in:
C =
'001' '02' ''
In older versions of MATLAB, before strsplit was introduced, similar functionality was achieved using regexp(...,'split').
If you want to learn more about regular expressions (abbreviated as "regex" or "regexp"), there are many online resources (JGI..)
In your case, if you only need to take the 5th and 6th characters from the string you could use:
D = str(5:6);
... and if you want to convert those into numbers you could use:
E = str2double(str(5:6));
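The regex-splitting approach can also be sketched in Python. This is only an illustration under one assumption: the `+` quantifier is added to the pattern so that a run of delimiter characters such as `.jpg` counts as a single delimiter, which reproduces the three-element result shown above.

```python
import re

name = "001a02.jpg"

# Split on runs of letters and dots; the digit groups are what remains
parts = re.split(r"[a-zA-Z.]+", name)
print(parts)     # ['001', '02', '']
print(parts[1])  # 02
```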
If your number is always at a certain position in the string, you can simply index this position.
In the examples you gave, the number is always the 5th and 6th characters in the string.
filename = '002A12';
num = str2num(filename(5:6));
Otherwise, if the formatting is more complex, you may want to use a regular expression. There is a similar question, "matlab - extracting numbers from (odd) string". Modifying the code found there, you can do the following:
all_num = regexp(filename, '\d+', 'match'); %Find all numbers in the filename
num = str2num(all_num{2}) %Convert second number from str
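For readers outside MATLAB, the "find all numbers, take the second" approach might look like this in Python. It is a sketch only; the helper name `second_number` is hypothetical, and it returns the digits as a string so that the leading zero in 02 is preserved.

```python
import re


def second_number(name):
    """Return the second run of digits in name, or None if there are fewer than two."""
    numbers = re.findall(r"\d+", name)
    return numbers[1] if len(numbers) >= 2 else None


for name in ("001a02", "001A43a", "002A12"):
    print(name, second_number(name))

# 001a02 02
# 001A43a 43
# 002A12 12
```

Unlike fixed indexing with (5:6), this keeps working if the first number or the separating letters change length.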

Trouble padding a Perl string array without increasing array length

I have an integer value my $reading = 1200;.
I have an array my @DigitField = "000000000";
I want to replace the right-hand 4 elements of the array with $reading's value, and I want to do this programmatically using Perl's length function as shown below.
I've tried.
my @DigitField = "000000000";
my $reading = 1200;
splice @DigitField, length(@DigitField) + 1, length $reading, $reading;
print @DigitField;
but I'm getting
0000000001200
and I want the string to remain nine characters wide.
What are some other ways to replace part of a Perl string array?
I think you are possibly confused - the @ sigil indicates @DigitField is an array variable. A string is not an array.
I think you want to format the number:
my $reading = 1200;
my $digitfield = sprintf('%09d', $reading);
print $digitfield, "\n";
I added a \n to the end of the print; this adds a newline. Depending on the context of your program, you may or may not want this in the final version.
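The same fixed-width zero padding can be sketched in Python, both with a format specification (the counterpart of '%09d') and by overwriting the right-hand end of an existing template string, which is closer to what the question describes. The names `template` and `overlaid` are illustrative only, and the overlay assumes the reading has at least one and at most nine digits.

```python
reading = 1200

# Format specification: pad with zeros to a width of 9
digit_field = f"{reading:09d}"
print(digit_field)  # 000001200

# Overlay: replace the right-hand characters of a nine-character template
template = "000000000"
digits = str(reading)
overlaid = template[:-len(digits)] + digits
print(overlaid)     # 000001200
```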