How to convert binary to hex in Batch or Powershell?

I am wondering if there is a way to convert binary to hexadecimal in Batch or PowerShell.
Example:
10000100 to 84
01010101 to 55
101111111111 to BFF
A simple way would be best, as I'm not very good at Batch or PowerShell.
I would appreciate any kind of information.

Converting a binary string to an integer is pretty straightforward:
$number = [Convert]::ToInt32('10000100', 2)
Now we just need to convert it to hexadecimal:
$number.ToString('X')
or
'{0:X}' -f $number
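For comparison, the same two-step idea (parse the string as base 2, then format the number as base 16) can be sketched in Python. The function name bin_to_hex is made up for this illustration and is not part of the answer above.

```python
def bin_to_hex(bits: str) -> str:
    """Parse a string of 0s and 1s as base 2 and format it as uppercase hex."""
    return format(int(bits, 2), "X")


# The three samples from the question.
for sample in ("10000100", "01010101", "101111111111"):
    print(sample, "->", bin_to_hex(sample))
```

Unlike ToInt32, Python integers are not limited to 32 bits, so longer bit strings work in this sketch as well.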

(pure batch)
@ECHO OFF
SETLOCAL
CALL :CONVERT 10000100
CALL :CONVERT 101111111111
CALL :CONVERT 1111111111
GOTO :EOF
:: Convert %1 to hex
:CONVERT
SET "data=%1"
SET "result="
:cvtlp
:: If there are no characters left in `data` we are finished
IF NOT DEFINED data ECHO %1 ----^> %result%&GOTO :EOF
:: Get the last 4 characters of `data` and prefix with "000"
:: This way, if there are only say 2 characters left (xx), the result will be
:: 000xx. we then use the last 4 characters only
SET "hex4=000%data:~-4%"
SET "hex4=%hex4:~-4%"
:: remove last 4 characters from `data`
SET "data=%data:~0,-4%"
:: now convert to hex
FOR %%a IN (0 0000 1 0001 2 0010 3 0011 4 0100 5 0101 6 0110 7 0111
8 1000 9 1001 A 1010 B 1011 C 1100 D 1101 E 1110 F 1111
) DO IF "%%a"=="%hex4%" (GOTO found) ELSE (SET "hex4=%%a")
:found
SET "result=%hex4%%result%"
GOTO cvtlp
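This solution uses a parsing trick in the for %%a loop. %hex4% is expanded once, when the whole FOR statement is parsed, so the IF always compares against the original four-bit group. Each time the comparison fails, the value just tested is assigned to hex4, so when a match is found, hex4 still holds the previously tested value, which is the hex digit listed just before the matching bit pattern.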
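The nibble-by-nibble approach of the batch script can be sketched in Python as follows. This is an illustration only: the names NIBBLE_TO_HEX and bin_to_hex_by_nibbles are invented here, and a dictionary lookup stands in for the FOR-loop parsing trick.

```python
HEX_DIGITS = "0123456789ABCDEF"
# Lookup table: "0000" -> "0", "0001" -> "1", ..., "1111" -> "F"
NIBBLE_TO_HEX = {format(i, "04b"): HEX_DIGITS[i] for i in range(16)}


def bin_to_hex_by_nibbles(bits: str) -> str:
    """Convert a bit string to hex by consuming four bits at a time from the right."""
    result = ""
    data = bits
    while data:
        # Take the last (up to) four characters and left-pad with zeros to four.
        group = ("000" + data[-4:])[-4:]
        data = data[:-4]
        # Prepend the hex digit, since we are working right to left.
        result = NIBBLE_TO_HEX[group] + result
    return result


for sample in ("10000100", "101111111111", "1111111111"):
    print(sample, "---->", bin_to_hex_by_nibbles(sample))
```

Because the string is processed as text, there is no integer size limit, which is the same property the batch version has.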

Related

Verifying NSEC3 records

I'm fiddling with DNSSEC, and I'd like to try to verify NSEC3 records generated by dnssec-signzone from bind9-utils (which I presume are valid). This is my zone file:
$ORIGIN dnssectest.mvolfik.tk.
$TTL 120
@ SOA dnssectestns.mvolfik.tk. email.example.com. 15 259200 3600 300000 3600
A 192.168.0.101
s3c A 192.168.0.101
$INCLUDE zsk.key
$INCLUDE ksk.key
ZSK and KSK are generated with dnssec-keygen -a ECDSAP256SHA256 dnssectest.mvolfik.tk. (add -f KSK respectively)
I then signed it using the command dnssec-signzone -3 deadbeef -H 5 -o dnssectest.mvolfik.tk -k ksk.key zonefile zsk.key (use NSEC3 with deadbeef hex salt, 5 iterations)
I got the following NSEC3 records in the zonefile.signed: (omitted RRSIG and DNSKEY as irrelevant; A and SOA didn't change)
0 NSEC3PARAM 1 0 5 DEADBEEF
F66KKS17FM851AVA4EARFHS55I3TOO85.dnssectest.mvolfik.tk. 3600 IN NSEC3 1 0 5 DEADBEEF (
D60TA5J5RS4JD5AQK25B1BCUAHGP4DHC
A SOA RRSIG DNSKEY NSEC3PARAM )
D60TA5J5RS4JD5AQK25B1BCUAHGP4DHC.dnssectest.mvolfik.tk. 3600 IN NSEC3 1 0 5 DEADBEEF (
F66KKS17FM851AVA4EARFHS55I3TOO85
A RRSIG )
Now that I know that the only domains in this zone are s3c.dnssectest.mvolfik.tk. and dnssectest.mvolfik.tk., I assume that the following Python script would get me the same hashes as in the signed zone file above (from the pseudocode in RFC 5155):
import hashlib
def ih(salt, x, k):
    if k == 0:
        return hashlib.sha1(x + salt).digest()
    return hashlib.sha1(ih(salt, x, k-1) + salt).digest()
print(ih(bytes.fromhex("deadbeef"), b"s3c.dnssectest.mvolfik.tk.", 5).hex())
print(ih(bytes.fromhex("deadbeef"), b"dnssectest.mvolfik.tk.", 5).hex())
However, I instead got b58374998347ba833ab33f15332829a589a80d82 and 545e01397a776ee73aa0372aea015408cc384574. What am I doing wrong?

So I looked into the dnspython source code and found the nsec3_hash function. It turns out that the name must be in wire format: the dots are removed and each label is prefixed with a length byte, with a null byte at the end (\x03s3c\x0adnssectest\x07mvolfik\x02tk\x00). The result is also encoded with base32 using the 0-9A-V alphabet, not hex. It is probably easier to just use the dnspython library, but here's the full (slightly naive) code:
import hashlib, base64
b32_trans = str.maketrans(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567", "0123456789ABCDEFGHIJKLMNOPQRSTUV"
)
def ih(salt, x, k):
    if k == 0:
        return hashlib.sha1(x + salt).digest()
    return hashlib.sha1(ih(salt, x, k - 1) + salt).digest()
def nsec3(salt, name, k):
    if not name.endswith("."):
        name += "."
    labels = name.split(".")
    name_wire = b"".join(len(l).to_bytes(1, "big") + l.lower().encode() for l in labels)
    digest = ih(bytes.fromhex(salt), name_wire, k)
    return base64.b32encode(digest).decode().translate(b32_trans)
print(nsec3("deadbeef", "dnssectest.mvolfik.tk.", 5))
print(nsec3("deadbeef", "s3c.dnssectest.mvolfik.tk.", 5))
This gets the correct hashes seen in the NSEC3 records
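To make the wire-format step easier to inspect on its own, here is a small sketch that only builds the length-prefixed name. The helper name to_wire is invented for this illustration; it mirrors the name_wire expression above and does not handle labels longer than 63 bytes or non-ASCII names.

```python
def to_wire(name: str) -> bytes:
    """Encode a domain name as length-prefixed, lowercased labels ending in a zero byte."""
    if not name.endswith("."):
        name += "."
    # The trailing dot yields a final empty label, which becomes the closing b"\x00".
    labels = name.lower().split(".")
    return b"".join(bytes([len(label)]) + label.encode("ascii") for label in labels)


wire = to_wire("s3c.dnssectest.mvolfik.tk.")
print(wire.hex(" "))
```

Note that the length byte for the ten-character label dnssectest is 0x0a, which Python's repr of a bytes object displays as \n.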

A code to take an amount of others in a string of lines

Before I begin, I would include this link to the problem in a word document, with highlighted texts, so the problem would be much clearer
https://archive.org/download/batfile10112019/bat%20file%2010112019.rar
*0000000000003000345800483854651180013732112019 0
*000000000010004466170000003000083BOUBADJA SAFIA 1
*000000000010010346810000003110730BOUKHEMKHEM NABILA 1
*000000000010010694160000000000806ROUIBAH MESSAOUDA 1
*000000000010014708210000000000999SETILA AFAF 1
*000000000010024010600000003176161ZAITER EP BOUHAROUD SOUAD 1
*000000000010054726551524653176161BOULASSEL NORA 1
Let’s suppose I have the above text file that contains lines, the length of each line is 62 (a combination of characters and spaces, you could verify by placing the cursor before the « * » character and count till the last character). I want to keep the header as it is, but for the other lines, I want a batch file (.bat) that will do the following :
Keep the header as it is ( as I mentioned above).
It will take 10.00 (unit of money, whether it’s euro, dollar etc…) out of each amount in the lines, the amount of each line begins from position « 22 » to position « 34 », so the amount of :
The second line is : 30000.83
The third line is : 31107.30
The fourth line is : 8.06
The fifth line is : 9.99
The sixth line is : 31761.61
The seventh line is : 15246531761.61
We can’t take 10.00 (dollars or euro or whatever…) out of the amounts of the fourth and the fifth lines which are 8.06 and 9.99 respectively, so the batch file will keep them as they are.
But for the amounts of the second, third, sixth and the seventh line will be changed as follows:
The second line is : 29990.83
The third line is : 31097.30
The fourth line is : 8.06
The fifth line is : 9.99
The sixth line is : 31751.61
The seventh line is : 15246531751.61
So the output file will look like this :
*0000000000003000345800483854651180013732112019 0
*000000000010004466170000002999083BOUBADJA SAFIA 1
*000000000010010346810000003109730BOUKHEMKHEM NABILA 1
*000000000010010694160000000000806ROUIBAH MESSAOUDA 1
*000000000010014708210000000000999SETILA AFAF 1
*000000000010024010600000003175161ZAITER EP BOUHAROUD SOUAD 1
*000000000010054726551524653175161BOULASSEL NORA 1
I have another problem when I deal with larger text files (15,000 lines).
A friend helped me, but there were some errors in the code. That's why I included the link above: the Word document shows the error message that appears with text files of more than 10,000 lines.
The code is:
@echo off
setlocal enableextensions enabledelayedexpansion
chcp 28591 >nul
set nouveau=modified.txt
echo. > %nouveau%
for /f "usebackq delims=" %%A in ("original file.txt") do (
set "line=%%A"
set "index=!line:~-1!"
if !index! EQU 1 (
set "account=!line:~0,21!"
set "amount=!line:~21,13!"
set "number=!line:~21,9!"
set "cut=!line:~30,4!"
set "client=!line:~34!
call :zeros amount
if !amount! GEQ 1000 (
set /a cut=!cut!-1000
set cut=000!cut!
set cut=!cut:~-4!
)
echo.!account!!number!!!cut!!client!
) else (echo.!line!)
) >> %nouveau%
exit
:zeros
set "chaine=!%1!"
for /L %%E in (0,1,12) do (
if not "!chaine:~%%E,1!"=="0" (set "%1=!chaine:~%%E!" & goto :eof)
)
goto :eof
I hope that I can subtract any amount I want from any line (10.00 in this example).
If I want to change it to 5.00, is it possible to simply change the value 10.00 to 5.00 in the provided code?
Thanks in advance for any help.

You have a logical flaw (using the last four digits only for calculation). Probably you did this to work around the INT32 limit of set /a, but it will cause false results in some cases. You have to calculate with the whole amount. As cmd isn't able to do this, use the help of another language (I chose PowerShell here). The downside is poor performance because PowerShell has to be loaded for each calculation.
@echo off
setlocal enableextensions enabledelayedexpansion
chcp 28591 >nul
set nouveau=modified.txt
break> %nouveau%
(for /f "usebackq delims=" %%A in ("original file.txt") do (
set "line=%%A"
set "index=!line:~-1!"
if !index! EQU 1 (
set "account=!line:~0,21!"
set "amount=!line:~21,13!"
set "client=!line:~34!"
REM strip leading zeros:
for /f "tokens=* delims=0" %%a in ("!amount!") do set cut=%%a
if !cut! geq 1000 (
for /f %%b in ('powershell "if (!cut! -ge 1000) {!cut!-1000} else {!cut!}"') do set "cut=0000000000000%%b"
set cut=!cut:~-13!
) else set cut=!amount!
echo !account!!cut!!client!
) else echo !line!
))>"%nouveau%"
goto :eof

Your original code is not just wrong, but also very inefficient. To perform the subtraction within the 9-digit limit of the set /A command, you can split the operation into two parts: the low-order (rightmost) 7 digits of the number and the remaining high-order (leftmost) 6 digits. The result is a pure Batch file that should run fast even over a file 15000 lines long.
@echo off
setlocal EnableDelayedExpansion
rem "subtract" may have maximum 7 digits including two decimal digits
set "subtract=1000"
set "nouveau=modified.txt"
set "subtract=0000000%subtract%"
set "subtract=1%subtract:~-7%"
set /P "header=" < "original file.txt"
(
echo %header%
for /F "skip=1 usebackq delims=" %%A in ("original file.txt") do (
set "line=%%A"
set "high=1!line:~21,6!" & set "low=1!line:~27,7!"
set /A "lowN=low-subtract"
if !lowN! geq 0 (
set /A "low=10000000+lowN"
) else (
set /A "highN=high-1000001"
if !highN! geq 0 (
set /A "high=1000000+highN, low=20000000+lowN"
)
)
echo !line:~0,21!!high:~1!!low:~1!!line:~34!
)
) > "%nouveau%"
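The high/low split with a manual borrow may be easier to follow outside of Batch. The sketch below reproduces the arithmetic in Python; the function name subtract_split is invented here, and Python itself would not need the split, since its integers are unbounded. It is shown only to illustrate the borrow logic.

```python
def subtract_split(amount: str, subtract: int) -> str:
    """Subtract `subtract` (in cents, at most 7 digits) from a 13-digit amount string.

    The amount is handled as a 6-digit high part and a 7-digit low part,
    the way the batch file does to stay within 32-bit arithmetic.
    """
    high, low = int(amount[:6]), int(amount[6:])
    low_n = low - subtract
    if low_n >= 0:
        # No borrow needed: only the low part changes.
        low = low_n
    elif high >= 1:
        # Borrow one unit (10**7) from the high part.
        high -= 1
        low = 10_000_000 + low_n
    # Otherwise the amount is smaller than `subtract` and stays unchanged.
    return f"{high:06d}{low:07d}"


print(subtract_split("0000003000083", 1000))
print(subtract_split("0000010000500", 1000))  # exercises the borrow branch
```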

If you're okay with using powershell, then you could probably just use a simple .ps1 script:
$Minus = 10.00
$LineNo = 1
Get-Content ".\original file.txt" | ForEach {
If ($LineNo -Eq 1) {$_} ElseIf ($LineNo -GT 1) {
[Decimal]$Decimal = $_.Substring(21,11)+"."+$_.Substring(32,2)
If ($Decimal-$Minus -LT 0) {$Result = $_.Substring(21,13)} Else {
$Result = (100*($Decimal-$Minus)).ToString("0000000000000")}
$_.SubString(0,21)+$Result+$_.SubString(34)}
$LineNo++} | Set-Content ".\modified.txt"
Just adjust $Minus as necessary.
Note: Get-Content may not be the quickest method to use if your files are very large.
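The same whole-amount approach can be sketched in Python. This is an illustration under a few assumptions: the names adjust_line and adjust_lines are invented, the amount is treated as an integer number of cents rather than a decimal, and the first line is assumed to be the header.

```python
def adjust_line(line: str, minus_cents: int = 1000) -> str:
    """Subtract minus_cents from the 13-digit amount at columns 22-34 (1-based)."""
    amount = int(line[21:34])
    if amount < minus_cents:
        # Not enough to subtract from: keep the line as it is.
        return line
    return f"{line[:21]}{amount - minus_cents:013d}{line[34:]}"


def adjust_lines(lines, minus_cents: int = 1000):
    """Yield the header unchanged, then every other line adjusted."""
    for number, line in enumerate(lines):
        yield line if number == 0 else adjust_line(line, minus_cents)


sample = [
    "*0000000000003000345800483854651180013732112019 0",
    "*000000000010004466170000003000083BOUBADJA SAFIA 1",
    "*000000000010010694160000000000806ROUIBAH MESSAOUDA 1",
]
for out in adjust_lines(sample):
    print(out)
```

To subtract 5.00 instead of 10.00, pass minus_cents=500.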

CMD start .exe with just one of multiple parameters

I would like my batch script to randomly choose one parameter on its own (from around 70 parameters eg. param1 - param70), without my input.
In addition to the random param, the exe has more parameters which always stay the same.
I dont know how to put this in code.
Here's an example of my thought:
param1=--abc
param2=--mno
param3=--xyz
./example.exe --hello --world --(param1 OR param2 OR param3)
which equals to:
./example.exe --hello --world --abc
or
./example.exe --hello --world --mno
or
./example.exe --hello --world --xyz

This can work in batch. You need to set each param, though.
set /a numb=%random% %% 3
goto :param%numb%
:param0
Set "var=abc"
Goto :execute
:param1
Set "var=mno"
Goto :execute
:param2
Set "var=xyz"
Goto :execute
:execute
.\example.exe --hello --%var%
For 70 params you need to change %% 3 to %% 70

In powershell:
$params = "abc","mno","xyz"
& example.exe --hello --world --$(Get-Random -InputObject $params -Count 1)
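For readers who prefer to see the idea outside a shell, here is a minimal Python sketch of picking one parameter at random and appending it to the fixed ones. The function name build_command is invented for this illustration, and ./example.exe is simply the placeholder from the question.

```python
import random


def build_command(params, rng=random):
    """Return the argument list with one randomly chosen parameter appended."""
    chosen = rng.choice(params)
    return ["./example.exe", "--hello", "--world", f"--{chosen}"]


print(build_command(["abc", "mno", "xyz"]))
```

The resulting list could be handed to subprocess.run to actually start the program; that step is left out here so the sketch runs without the executable.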

Handling 70 parameters Gerhard's way will get tedious. I'd build a parameter array and get a random one.
:: Q:\Test\2018\04\27\SO_50059458.cmd
@Echo off&SetLocal EnableExtensions EnableDelayedExpansion
Rem Build param[] array and count params
Set Cnt=-1&Set "param= abc bcd cde def efg fgh ghi hij ijk jkl klm lmn mno"
Set "param=%param: ="&Set /a Cnt+=1&Set "param[!Cnt!]=%"
:: show array
Set param
:: get random index in 0..Cnt (Cnt is the last index, so there are Cnt+1 params)
Set /a "Rnd=%Random% %% (Cnt+1)"
echo Random %Rnd% out of %Cnt%
Echo .\example.exe --hello --!param[%Rnd%]!
Sample output:
> Q:\Test\2018\04\27\SO_50059458.cmd
param[0]=abc
param[10]=klm
param[11]=lmn
param[12]=mno
param[1]=bcd
param[2]=cde
param[3]=def
param[4]=efg
param[5]=fgh
param[6]=ghi
param[7]=hij
param[8]=ijk
param[9]=jkl
Random 10 out of 12
.\example.exe --hello --klm

Remove leading zeroes binary

I basically want to remove the leading zeroes. When I print out a number, for example 17, I get 0000 0000 0000 0000 0000 0000 0001 0001, but I want to remove those leading zeroes. That is what is printed on the SPARC machine, and I need to do this using some sort of loop, logic, or shift function.
This is my pseudocode for printing the binary:
store input, %l1 ! store my decimal number in l1
move 1,%l2 !move 1 into l2 register
shift logical left l2,31,l2 !shift my mask 1 31 times to the left
loop:
and l2,l1,l3 ! do and logic between l1 and l2 and put this in l3
compare l3,0 compare l3 zero
bne print 1 !branch not equal to zero, to print 1
if equal to 0
print zero
print 1:
print a 1
go: increment counter
compare counter 32
if counter less than 32 return to loop
shift l2 to the right to continue comparison
So this is what is being done. Say my input in l1 is 17:
0000 0000 0000 0000 0000 0000 0001 0001
1000 0000 0000 0000 0000 0000 0000 0000 and this is my mask, 1 shifted left 31 times
This pseudocode prints my decimal input in binary. But how can I make it remove the leading zeroes?
Because on the SPARC, the input 17 is stored in the machine as
0000 0000 0000 0000 0000 0000 0001 0001

You create the labels, like go and print 1 (more commonly done in all caps and without spaces, FYI). So, starting with bne you should always be printing 1, or falling through to see if it needs to print the 0:
! same initialization
mov 0, l4 ! Initialize a flag to avoid printing
LOOP:
and l2, l1, l3 ! do and logic between l1 and l2 and put this in l3
cmp l3, 0 ! Is this a 0 digit?
bne ALWAYS_PRINT ! If it's not 0, then it must be 1 (was "bne print 1")
cmp l4, 1 ! Should we be printing the 0?
be PRINT_VALUE ! Yes, we need to print the 0 because we have seen a 1
ba INCREMENT ! We should not be printing the 0, so check the next
! digit (ba is "branch always")
ALWAYS_PRINT: !
mov 1, %l4 ! Note that we want to always print for the
! rest of the loop
PRINT_VALUE: ! Do whatever you're doing to print values
print value in l3 ! Always print the value
INCREMENT: ! Formerly your "go:" label
! same logic
! AFTER LOOP IS DONE LOGIC
cmp l4, 0 ! If the flag was never set, then the value is 0
! Alternatively, you could just compare the value to 0
! and skip the loop entirely, only printing 0 (faster)
bne DO_NOT_PRINT ! If it was set (1), then do nothing
print zero ! If it wasn't set, then print the 0
DO_NOT_PRINT:
To walk through it a little, you need to continue to initialize your values and shift the bits to figure out what the current digit is for each iteration. Since you will need another flag, then you need to use another register that is initialized to an expected value (I chose 0, which commonly represents false).
Get current digit into l3 (0 or 1)
See if it is 0
If it's not 0, then it must be 1. So go remember that we found a 1, for later, then print the value and increment/loop.
If it's 0, then see if we have found a 1 before. If so, then print the value and increment/loop. If not, then increment/loop.
For actually printing, I have no idea what you are actually doing. However, you can avoid a second comparison by using the labels. For example, ALWAYS_PRINT will always be used when the value is 1, so you can just set the flag and immediately print 1, then jump to INCREMENT. If you did that, then PRINT_VALUE would only be used to print 0, which could then fall through to INCREMENT.
From a high level language's perspective, you want:
int l2 = // value...
bool seenOneAlready = false;
if (l2 != 0)
{
    // MSB first
    for (int i = 31; i > -1; --i)
    {
        int l3 = (l2 >> i) & 1;
        if (l3 == 1)
        {
            seenOneAlready = true;
            printf("1");
        }
        else if (seenOneAlready)
        {
            printf("0");
        }
    }
}
else
{
    printf("0");
}
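The flag-based logic can also be tried out quickly in Python. The sketch below follows the mask-shifting style of the question's pseudocode (a single bit that starts at position 31 and moves right); the function name to_binary is invented for this illustration, and it collects the digits in a list instead of printing them one by one.

```python
def to_binary(value: int) -> str:
    """Return the bits of a 32-bit unsigned value without leading zeroes."""
    mask = 1 << 31          # start at the most significant bit
    seen_one = False        # plays the role of the flag in l4
    digits = []
    while mask:
        if value & mask:
            seen_one = True
            digits.append("1")
        elif seen_one:
            # Zeroes are only emitted once a 1 has been seen.
            digits.append("0")
        mask >>= 1
    # If no 1 was ever seen, the value is zero: emit a single "0".
    return "".join(digits) or "0"


print(to_binary(17))
```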

How can I searching for different variants of bioinformatics motifs in string, using Perl?

I have a program output with one tandem repeat in different variants. Is it possible to search (in a string) for the motif and to tell the program to find all variants with maximum "3" mismatches/insertions/deletions?

I will take a crack at this with the very limited information supplied.
First, a short friendly editorial:
<editorial>
Please learn how to ask a good question and how to be precise.
At a minimum, please:
Refrain from domain specific jargon such as "motif" and "tandem repeat" and "base pairs" without providing links or precise definitions;
Say what the goal is and what you have done so far;
Important: Provide clear examples of input and desired output.
It is not helpful when potential helpers on SO have to play 20 questions in comments to try to understand your question! I spent more time trying to figure out what you were asking than answering it.
</editorial>
The following program generates an array of 1,000 random strings, each built from 5,429 two-character pairs (10,858 characters). I realize it is more likely that you will be reading these from a file, but this is just an example. Obviously you would replace the random strings with your actual data from whatever source.
I do not know if 'AT','CG','TC','CA','TG','GC','GG' that I used are legitimate base pair combinations or not. (I slept through biology...) Just edit the map block pairs to legitimate pairs and change the 7 to the number of pairs if you want to generate legitimate random strings for testing.
If the substring at the offset point is 3 differences or less, the array element (a scalar value) is stored in an anonymous array in the value part of a hash. The key part of the hash is the substring that is a near match. Rather than array elements, the values could be file names, Perl data references or other relevant references you want to associate with your motif.
While I have just looked at character by character differences between the strings, you can put any specific logic that you need to look at by replacing the line foreach my $j (0..$#a1) { $diffs++ unless ($a1[$j] eq $a2[$j]); } with the comparison logic that works for your problem. I do not know how mismatches/insertions/deletions are represented in your string, so I leave that as an exercise to the reader. Perhaps Algorithm::Diff or String::Diff from CPAN?
It is easy to modify this program to have keyboard input for $target and $offset or have the string searched beginning to end rather than several strings at a fixed offset. Once again: it was not really clear what your goal is...
use strict; use warnings;
my @bps;
push(@bps,join('',map { ('AT','CG','TC','CA','TG','GC','GG')[rand 7] }
    0..5428)) for(1..1_000);
my $len=length($bps[0]);
my $s_count= scalar @bps;
print "$s_count random strings generated $len characters long\n" ;
my $target="CGTCGCACAG";
my $offset=832;
my $nlen=length $target;
my %HoA;
my $diffs=0;
my @a2=split(//, $target);
substr($bps[-1], $offset, $nlen)=$target; #guarantee 1 match
substr($bps[-2], $offset, $nlen)="CATGGCACGG"; #anja example
foreach my $i (0..$#bps) {
    my $cand=substr($bps[$i], $offset, $nlen);
    my @a1=split(//, $cand);
    $diffs=0;
    foreach my $j (0..$#a1) { $diffs++ unless ($a1[$j] eq $a2[$j]); }
    next if $diffs > 3;
    push (@{$HoA{$cand}}, $i);
}
foreach my $hit (keys %HoA) {
    my @a1=split(//, $hit);
    $diffs=0;
    my $ds="";
    foreach my $j (0..$#a1) {
        if($a1[$j] eq $a2[$j]) {
            $ds.=" ";
        } else {
            $diffs++;
            $ds.=$a1[$j];
        }
    }
    print "Target: $target\n",
          "Candidate: $hit\n",
          "Differences: $ds $diffs differences\n",
          "Array element: ";
    foreach (@{$HoA{$hit}}) {
        print "$_ " ;
    }
    print "\n\n";
}
Output:
1000 random strings generated 10858 characters long
Target: CGTCGCACAG
Candidate: CGTCGCACAG
Differences: 0 differences
Array element: 999
Target: CGTCGCACAG
Candidate: CGTCGCCGCG
Differences: CGC 3 differences
Array element: 696
Target: CGTCGCACAG
Candidate: CGTCGCCGAT
Differences: CG T 3 differences
Array element: 851
Target: CGTCGCACAG
Candidate: CGTCGCATGG
Differences: TG 2 differences
Array element: 986
Target: CGTCGCACAG
Candidate: CATGGCACGG
Differences: A G G 3 differences
Array element: 998
..several cut out..
Target: CGTCGCACAG
Candidate: CGTCGCTCCA
Differences: T CA 3 differences
Array element: 568 926
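The program above only counts character-by-character mismatches. Since the question also asks about insertions and deletions, one possible replacement for the comparison step is an edit-distance (Levenshtein) calculation. The Python sketch below shows both measures side by side; the function names are invented here, and whether edit distance is the right model for the asker's data is an assumption.

```python
def hamming(a: str, b: str) -> int:
    """Count positions where two equal-length strings differ (mismatches only)."""
    if len(a) != len(b):
        raise ValueError("hamming distance needs strings of equal length")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a: str, b: str) -> int:
    """Minimum number of substitutions, insertions and deletions turning a into b."""
    previous = list(range(len(b) + 1))   # distances from "" to each prefix of b
    for i, x in enumerate(a, start=1):
        current = [i]                    # distance from a[:i] to ""
        for j, y in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,             # delete x
                current[j - 1] + 1,          # insert y
                previous[j - 1] + (x != y),  # match or substitute
            ))
        previous = current
    return previous[-1]


target = "CGTCGCACAG"
print(hamming(target, "CATGGCACGG"))   # mismatches only
print(levenshtein("ACGT", "ACT"))      # one deletion
```

A candidate would then count as a hit when levenshtein(target, candidate) is at most 3, instead of the mismatch count used above.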

I believe that there are routines for this sort of thing in BioPerl.
In any case, you might get better answers if you asked this over at BioStar, the bioinformatics stack exchange.

When I was in my first couple years of learning perl, I wrote what I now consider to be a very inefficient (but functional) tandem repeat finder (which used to be available on my old job's company website) called tandyman. I wrote a fuzzy version of it a couple years later called cottonTandy. If I were to re-write it today, I would use hashes for a global search (given the allowed mistakes) and utilize pattern matching for a local search.
Here's an example of how you use it:
#!/usr/bin/perl
use Tandyman;
$sequence = "ATGCATCGTAGCGTTCAGTCGGCATCTATCTGACGTACTCTTACTGCATGAGTCTAGCTGTACTACGTACGAGCTGAGCAGCGTACgTG";
my $tandy = Tandyman->new(\$sequence,'n'); #Can't believe I coded it to take a scalar reference! Prob. fresh out of a cpp class when I wrote it.
$tandy->SetParams(4,2,3,3,4);
#The parameters are, in order:
# repeat unit size
# min number of repeat units to require a hit
# allowed mistakes per unit (an upper bound for "mistake concentration")
# allowed mistakes per window (a lower bound for "mistake concentration")
# number of units in a "window"
while(@repeat_info = $tandy->FindRepeat())
{print(join("\t",@repeat_info),"\n")}
The output of this test looks like this (and takes a horrendous 11 seconds to run):
25 32 TCTA 2 0.87 TCTA TCTG
58 72 CGTA 4 0.81 CTGTA CTA CGTA CGA
82 89 CGTA 2 0.87 CGTA CGTG
45 51 TGCA 2 0.87 TGCA TGA
65 72 ACGA 2 0.87 ACGT ACGA
23 29 CTAT 2 0.87 CAT CTAT
36 45 TACT 3 0.83 TACT CT TACT
24 31 ATCT 2 1 ATCT ATCT
51 59 AGCT 2 0.87 AGTCT AGCT
33 39 ACGT 2 0.87 ACGT ACT
62 72 ACGT 3 0.83 ACT ACGT ACGA
80 88 ACGT 2 0.87 AGCGT ACGT
81 88 GCGT 2 0.87 GCGT ACGT
63 70 CTAC 2 0.87 CTAC GTAC
32 38 GTAC 2 0.87 GAC GTAC
60 74 GTAC 4 0.81 GTAC TAC GTAC GAGC
23 30 CATC 2 0.87 CATC TATC
71 82 GAGC 3 0.83 GAGC TGAGC AGC
1 7 ATGC 2 0.87 ATGC ATC
54 60 CTAG 2 0.87 CTAG CTG
15 22 TCAG 2 0.87 TCAG TCGG
70 81 CGAG 3 0.83 CGAG CTGAG CAG
44 50 CATG 2 0.87 CTG CATG
25 32 TCTG 2 0.87 TCTA TCTG
82 89 CGTG 2 0.87 CGTA CGTG
55 73 TACG 5 0.75 TAGCTG TAC TACG TACG AG
69 83 AGCG 4 0.81 ACG AGCTG AGC AGCG
15 22 TCGG 2 0.87 TCAG TCGG
As you can see, it allows indels and SNPs. The columns are, in order:
Start position
Stop position
Consensus sequence
The number of units found
A quality metric out of 1
The repeat units separated by spaces
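Since the Tandyman package is not available, here is a much simpler Python sketch that finds only exact tandem repeats with a regular expression backreference. It is not the author's algorithm: it allows no indels or SNPs, it reports only four of the columns listed above (1-based start, stop, repeat unit, number of units), and the function name find_exact_tandem_repeats is invented for this illustration.

```python
import re


def find_exact_tandem_repeats(sequence: str, unit_size: int, min_units: int = 2):
    """Return (start, stop, unit, count) for exact back-to-back repeats of a unit."""
    # ([ACGT]{n}) captures one unit; \1{m,} requires at least m further copies of it.
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_size, min_units - 1))
    hits = []
    for match in pattern.finditer(sequence.upper()):
        units = len(match.group(0)) // unit_size
        # start is reported 1-based, stop is the 1-based position of the last base
        hits.append((match.start() + 1, match.end(), match.group(1), units))
    return hits


print(find_exact_tandem_repeats("GGATCTATCTGG", 4))
```

Unlike the tool described above, this sketch does not filter internal repeats, so a run such as TTTTTTTT would be reported with TTTT as its unit.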
Note, that it's easy to supply parameters (as you can see from the output above) that will output junk/insignificant "repeats", but if you know how to supply good params, it can find what you set it upon finding.
Unfortunately, the package is not publicly available. I never bothered to make it available since it's so slow and not amenable to even prokaryotic-sized genome searches (though it would be workable for individual genes). In my novice coding days, I had started to add a feature to take a "state" as input so that I could run it on sections of a sequence in parallel and I never finished that once I learned hashes would make it so much faster. By that point, I had moved on to other projects. But if it would suit your needs, message me, I can email you a copy.
It's just shy of 1000 lines of code, but it has lots of bells & whistles, such as the allowance of IUPAC ambiguity codes (BDHVRYKMSWN). It works for both amino acids and nucleic acids. It filters out internal repeats (e.g. does not report TTTT or ATAT as 4nt consensuses).
