Binary to DNA encoding - matlab

I have an 8 bit binary sequence. I need to encode this 8 bit binary sequence into DNA sequence.
E.g., I have 10011100, the encoding rule I'm following is,
A=00;T=11;G=10;C=01,
So I want the result to be GCTA, i.e. a 4-character DNA sequence for each 8-bit input.
I need to do this for a 256 * 256 matrix where each element is an 8 bit binary sequence.
I've created the matrix using the following code
a=imread('C:\Users\Desktop\lena.png');
disp(a);
imshow(a);
for i=1:1:256
for j=1:1:256
b{i,j,1} = dec2bin(a(i,j),8);
end
end
disp(b)

Here's a no for loop approach for you. We can actually do this in three lines.
You already have the first step, which is to take each 8-bit number in your image and convert it into its binary representation. Take note that b is a 2D cell array that is the same size as the image you used for this conversion. Each cell holds the binary representation of one pixel as an 8-character string.
All you really need to do now is create a lookup, then use this lookup to generate four characters per location in a new 2D cell array. I would use the containers.Map() class to create a key-value lookup where each pair of bits gets mapped to a single character. Once we have this, we can use cellfun to iterate over each 8-character string in your cell array, break up the bits into 2-character strings, and use these as keys into our lookup. Each lookup returns 4 separate cells, so we'll need cell2mat to join them back into a single string. As such, try doing this:
codebook = containers.Map({'00','11','10','01'},{'A','T','G','C'}); %// Lookup
outputCell = cellfun(@(x) values(codebook, {x(1:2),x(3:4),x(5:6),x(7:8)}), ...
b, 'uni', 0);
finalOutput = cellfun(@cell2mat, outputCell, 'uni', 0);
As an example, let's say we had this 2 x 2 matrix of cell elements:
b = {'11111111', '10101010'; '11001100', '00001101'}
b =
'11111111' '10101010'
'11001100' '00001101'
Running through the above code, this is what we get:
finalOutput =
'TTTT' 'GGGG'
'TATA' 'AATC'

Similar to rayryeng's solution using a lookup table, but imho containers.Map() is overkill:
codebook = 'ACGT';
output = cellfun(@(x) codebook(bin2dec(reshape(x, 2, 4)') + 1), b, 'UniformOutput', false)
I don't think it gets much shorter if the input consists of "binary numbers" in the sense of 8-character 0/1-strings. reshape breaks the strings into 4 portions of 2 characters each, bin2dec transforms these into four numbers in the range 0 to 3, codebook(... + 1) translates these into the characters ACGT.
If the input consists of actual 8-bit binary numbers, e.g. the uint8 data a that you get from reading in that Lena image, you can save the detour through 0/1-strings and use base 4 from the start:
output = reshape(cellstr(codebook(dec2base(a, 4, 4) - '0' + 1)), size(a))
Here dec2base(a, 4, 4) represents each number as a 4-character string of the characters '0' to '3' (the third argument pads to 4 digits, which matters if every value in a happens to be below 64), - '0' is a trick to get the numbers 0 to 3, then comes the lookup as before, and finally cellstr and reshape put everything into the cell-array-of-strings format.
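As a quick sanity check, here is a minimal sketch comparing the two routes on a tiny made-up matrix. The 2 x 2 values are an assumption chosen for illustration (156 is the 10011100 from the question), not data from the Lena image:

```matlab
% Hypothetical 2x2 stand-in for the image; 156 is 10011100 in binary
a = uint8([156 255; 0 13]);
codebook = 'ACGT';                 % positions 1..4 <-> bit pairs 00, 01, 10, 11

% Route 1: go through 8-character 0/1 strings
b = cellfun(@(v) dec2bin(v, 8), num2cell(a), 'UniformOutput', false);
viaStrings = cellfun(@(x) codebook(bin2dec(reshape(x, 2, 4)') + 1), ...
    b, 'UniformOutput', false);

% Route 2: go straight to base-4 digits, padded to 4 digits
digits = dec2base(double(a(:)), 4, 4) - '0';
viaBase4 = reshape(cellstr(codebook(digits + 1)), size(a));

disp(viaBase4)
disp(isequal(viaStrings, viaBase4))
```

Both routes should give 'GCTA' for 156, which matches the example in the question.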

Related

How to split cell array values into two columns in MATLAB?

I have data in a cell array as shown in the variable viewer here:
{[2.13949546690144;56.9515770543056],
[1.98550875192835;50.4110852121618],
...}
I want to split it into two columns with two decimal-point numbers as:
2.13 56.95
1.98 50.41
by removing opening and closing braces and semicolons such as [;]
(to do as like "Text to columns" in Excel).
If your N-element cell array C has 2-by-1 numeric data in each cell, you can easily convert that into an N-by-2 numeric matrix M like so (using the round function to round each element to 2 decimal places):
M = round([C{:}].', 2);
The syntax C{:} creates a comma-separated list of the contents of C, equivalent to C{1}, C{2}, ... C{N}. These are all horizontally concatenated using [ ... ], then the result is transposed using .'.
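A minimal sketch of that one-liner on the two cells shown in the question. Note that rounding turns 2.139... into 2.14, while the question lists 2.13; if truncation is what is wanted, the fix-based line is one possible way to get it (that reading of the question is an assumption):

```matlab
% The two cells from the question
C = {[2.13949546690144; 56.9515770543056], ...
     [1.98550875192835; 50.4110852121618]};

M = round([C{:}].', 2);           % rounds: first row becomes 2.14 56.95
Mtrunc = fix([C{:}].' * 100) / 100;   % truncates instead: 2.13 56.95

disp(M)
disp(Mtrunc)
```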
% let's build a matching example...
c = cell(2,1);
c{1} = [2.13949546690144; 56.9515770543056];
c{2} = [1.98550875192835; 50.4110852121618];
% convert your cell array to a double array...
m = cell2mat(c);
% take the odd rows and place them to the left
% take the even rows and place them to the right
m = [m(1:2:end,:) m(2:2:end,:)];
% round the whole matrix to two decimal digits
m = round(m,2);
Depending on your environment settings, you may still see a lot of trailing zeros after the first two decimal digits... but don't worry, everything is ok (on the precision point of view). If you want to display only the "real" digits of your numbers, use this command:
format short g;
you should use cell2mat
A={2.14,1.99;56.95,50.41};
B=cell2mat(A);
As for the rounding, you can do:
B=round(100*B)/100;

How to perform XOR in a recursive scenario

I have a 1x5 char matrix. I need to perform a bitwise XOR operation on all the elements in the matrix.If T is the char matrix , I need a matrix T' such that
T'(i) = T(i) XOR T'(i-1) for all i > 1
T'(1) = T(1)
Let the char matrix be T
T=['0000000000110111' '0000000001000001' '0000000001001010' '0000000010111000' '0000000000101111']
T'=['0000000000110111' '0000000001110110' '0000000000111100' '0000000010000100' '0000000010101011']
ie; Leaving the first element as such , I need to XOR all the other elements with the newly formed matrix. I tried the following code but I'm unable to get the correct result.
Yxor1d = [T(1) cellfun(@(a,b) char((a ~= b) + '0'), T(2:end), T'(1:end-1), 'UniformOutput', false)]
I need to perform the XOR operation such that , for obtaining the elements of T'
T' (2)= T(2) XOR T' (1)
T' (3)= T(3) XOR T' (2)
It'll be really helpful to know where I went wrong.Thanks.
cellfun expects a cell array as its input, but you are passing a character array. Placing those 5 strings inside [] concatenates them into one long character array, so T(1) is a single character rather than the first 16-bit string.
You probably don't want that. To fix this, all you have to do is make T a cell array by using curly braces ({}) instead of square brackets ([]) when declaring it:
T={'0000000000110111' '0000000001000001' '0000000001001010' '0000000010111000' '0000000000101111'};
Because you have edited your post after I provided my answer, my previous answer using cellfun is now incorrect. Because you are using a recurrence relation where you are referring to the previous output rather than input, you can no longer use cellfun. You'll need to use a for loop. There are probably more elegant ways to do it, but this is the easiest if you want to get something working.
As such, initialize an output cell array that is the same size as the input cell array like above, then you'll need to initialize the first cell to be the first cell of the input, then iterate through each pair of input and output elements yourself.
So do something like this:
Yxor1d = cell(1,numel(T));
Yxor1d{1} = T{1};
for idx = 2 : numel(T)
Yxor1d{idx} = char(double(T{idx} ~= Yxor1d{idx-1}) + '0');
end
For each index i, we XOR the current input T{i} with the previous output T'{i-1}.
Using the above code with your input cell array T, we get:
Yxor1d =
Columns 1 through 3
'0000000000110111' '0000000001110110' '0000000000111100'
Columns 4 through 5
'0000000010000100' '0000000010101011'
This matches with your specifications in your modified post.
Edit: There is a solution without a loop:
T=['0000000000110111';'0000000001000001';'0000000001001010';'0000000010111000' ;'0000000000101111'];
Yxor = dec2bin(bi2de(mod(cumsum(de2bi(bin2dec(T))),2)),16)
Yxor =
0000000000110111
0000000001110110
0000000000111100
0000000010000100
0000000010101011
This uses the fact that you effectively want a cumulative xor operation on the elements of your array.
For N booleans, the chained XOR is true exactly when an odd number of them are true. So if you take the cumulative sum of each bit column, the running XOR is 1 wherever that sum is odd and 0 wherever it is even.
The one liner above can be decomposed like that:
Y = bin2dec(T) ; %// convert char array T into decimal numbers
Y = de2bi( Y ) ; %// convert decimal array Y into array of "bit"
Y = cumsum(Y) ; %// do the cumulative sum on each bit column
Y = mod(Y,2) ; %// convert all "even" numbers to '0', and 'odd' numbers to '1'
Y = bi2de(Y) ; %// re-assemble the bits into decimal numbers
Yxor = dec2bin(Y,16) ; %// get their string representation
Note that if you are happy to handle arrays of bits (boolean) instead of character arrays, you can shave off a few lines from above ;-)
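For instance, here is a minimal sketch of the same cumulative-XOR idea that works on the bits directly. It skips the decimal round trip, so it does not need de2bi and bi2de (which, as far as I know, ship with the Communications Toolbox). It assumes T is a char matrix with one 0/1 string per row, as declared above:

```matlab
T = ['0000000000110111'; '0000000001000001'; '0000000001001010'; ...
     '0000000010111000'; '0000000000101111'];

bits = T - '0';                      % char matrix -> numeric 0/1 matrix
parity = mod(cumsum(bits, 1), 2);    % running XOR down each bit column
Yxor = char(parity + '0');           % back to a char matrix

disp(Yxor)
```

Each row of Yxor is the XOR of the corresponding row of T with the previous output row, so the first row is left unchanged.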
Initial answer (simpler to grasp, but with a loop):
You can use the bitxor function, but you have to convert your char array in numeric value first:
T=['0000000000110111';'0000000001000001';'0000000001001010' ;'0000000010111000' ;'0000000000101111'];
Tbin = bin2dec(T) ; %// convert to numeric values
Ybin = Tbin ; %// pre-assign result, then loop ...
for idx = 2 : numel(Tbin)
Ybin(idx) = bitxor( Ybin(idx) , Ybin(idx-1) ) ;
end
Ychar = dec2bin(Ybin,16) %// convert back to 16bit char array representation if necessary
Ychar =
0000000000110111
0000000001110110
0000000000111100
0000000010000100
0000000010101011
edited answer after you redefined your problem

How to convert a String to a Matrix Matlab

I'm trying to convert a string into a matrix, using a coding like a=1, b=2, ..., "Space"=28, etc.
My question is how would I convert a string to a matrix?
aka..
abc=[1,2,3]
Tried a for loop, which does convert the string into numbers.
Here is where I try to make it into a Matrix
String1=char(string)
String2=reshape(String1,[10,14]);
The error I get is
"To RESHAPE the number of elements must not change"
on the line
String2=reshape(String1,[10,14]);
If you need a general coding from characters into numbers (not necessarily ASCII):
Define the coding by means of a string, such that the character that appears first corresponds to number 1, etc.
Use ismember to do the "reverse indexing" operation.
Code:
coding = 'abcdefghijklmnñopqrstuvwxyz .,;'; %// define coding: 'a' is 1, 'b' is 2 etc
str = 'abc xyz'; %// example text
[~, result] = ismember(str, coding);
In this example,
result =
1 2 3 28 25 26 27
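Two things worth knowing, sketched below under some assumptions: characters that are not in the coding come back as 0, and reshape only works when the number of elements equals rows times columns, so the vector has to be padded first. The alphabet, the example text and the 2 x 3 target size are made up for illustration:

```matlab
coding = 'abcdefghijklmnopqrstuvwxyz ';   % assumed coding: 'a' is 1, ..., space is 27
str = 'ab?c';                             % '?' is not part of the coding

[found, result] = ismember(str, coding);  % result is 0 where found is false

% Pad with zeros so the element count matches the target size, then reshape
nRows = 2;
nCols = 3;
padded = [result, zeros(1, nRows*nCols - numel(result))];
M = reshape(padded, nRows, nCols);        % filled column by column

disp(result)
disp(M)
```

Checking found before using result is a cheap way to catch characters the coding does not cover.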

Bitwise XOR operation to scramble two character matrices by generating a truth table

I need to perform the XOR operation for four characters where each of them have a bit representation as follows:
A = 00
G = 01
C = 10
T = 11
I need to create a table that XORs two characters together which gives the values for all combinations of XORing pairs of characters in the following way.
XOR A G C T
A A G C T
G G A T C
C C T A G
T T C G A
To obtain the output, you need to convert each character into its bit representation, XOR the bits, then use the result and convert it back into the right character. For example, consulting the third row and second column of the table, by XORing C and G:
C = 10
G = 01
C XOR G = 10 XOR 01 = 11 --> T
I would ultimately like to apply this rule to scrambling characters in a 5 x 5 matrix.
As an example:
A = 'GATT' 'AACT' 'ACAC' 'TTGA' 'GGCT'
'GCAC' 'TCAT' 'GTTC' 'GCCT' 'TTTA'
'AACG' 'GTTA' 'ACGT' 'CGTC' 'TGGA'
'CTAC' 'AAAA' 'GGGC' 'CCCT' 'TCGT'
'GTGT' 'GCGG' 'GTTT' 'TTGC' 'ATTA'
B = 'ATAC' 'AAAT' 'AGCT' 'AAGC' 'AAGT'
'TAGG' 'AAGT' 'ATGA' 'AAAG' 'AAGA'
'TAGC' 'CAGT' 'AGAT' 'GAAG' 'TCGA'
'GCTA' 'TTAC' 'GCCA' 'CCCC' 'TTTC'
'CCAA' 'AGGA' 'GCAG' 'CAGC' 'TAAA'
I would like to generate a matrix C such that each element of A gets XORed with its corresponding element in B.
For example, considering the first row and first column:
A{1,1} XOR B{1,1} = GATT XOR ATAC = GTTG
How can I do this for the entire matrix?
Looks like you're back for some more!
First, let's define the function letterXOR that takes two 4-character strings and XORs them according to the table that you have. Recalling from our previous post, let's set up a lookup table where a unique two-bit string corresponds to a letter. We can use the containers.Map class to help us do this. We will also need the inverse lookup table, again using containers.Map, where given a letter we produce a two-bit string. We need this because you want to convert each letter into its two-bit representation. After that, we XOR the bits individually, then use the forward lookup table to get back to letters. As such:
function [out] = letterXOR(A,B)
codebook = containers.Map({'00','11','10','01'},{'A','T','G','C'}); %// Lookup
invCodebook = containers.Map({'A','T','G','C'},{'00','11','10','01'}); %// Inv-lookup
lettersA = arrayfun(@(x) x, A, 'uni', 0); %// Split up each letter into a cell
lettersB = arrayfun(@(x) x, B, 'uni', 0);
valuesA = values(invCodebook, lettersA); %// Obtain the binary bit strings
valuesB = values(invCodebook, lettersB);
%// Convert each into a matrix
valuesAMatrix = cellfun(@(x) double(x) - 48, valuesA, 'uni', 0);
valuesBMatrix = cellfun(@(x) double(x) - 48, valuesB, 'uni', 0);
%// XOR the bits now
XORedBits = arrayfun(@(x) bitxor(valuesAMatrix{x}, valuesBMatrix{x}), 1:numel(A), 'uni', 0);
%// Convert each bit pair into a string
XORedString = cellfun(@(x) char(x + 48), XORedBits, 'uni', 0);
%// Access lookup, then concatenate as a string
out = cellfun(@(x) codebook(x), XORedString);
Let's go through the above code slowly. The inputs into letterXOR are expected to be a character array of letters that are composed of A, T, G and C. We first define the forward and reverse lookups. We then split up each character of the input strings A and B into a cell array of individual characters, as looking up multiple keys in your codebook requires it to be this way. We then figure out what the bits are for each character in each string. These bits are actually strings, and so what we need to do is convert each string of bits into an array of numbers. We simply cast the string to double and subtract by 48, which is the ASCII code for 0. By converting to double, you'll either get 48 or 49, which is why we need to subtract with 48.
As such, each pair of bits is converted into a 1 x 2 array of bits. We then take each 1 x 2 array of bits between A and B, use bitxor to XOR the bits. The outputs at this point are still 1 x 2 arrays. As such, we need to convert each array into a string of bits, then use our forward lookup table to look up the character equivalent of these bits. After this, we concatenate all of the characters together to make the final string for the output.
Make sure you save the above in a file called letterXOR.m. Once we have this, we simply need one arrayfun call that XORs each pair of four-character strings in your cell arrays and outputs our final matrix. The input into arrayfun will be a 5 x 5 matrix of column-major indices. We do this because MATLAB can access elements in a 2D array using a single value, which is the column-major (linear) index of the element in the matrix. We define a vector that goes from 1 to 25, then use reshape to get this into the right 2D form. The reason we need to do this is that we want the output matrix (which is C in your example) to be structured in the same way as the inputs. As such:
ind = reshape(1:25, 5, 5); %// Define column major indices
C = arrayfun(@(x) letterXOR(A{x},B{x}), ind, 'uni', 0); %// Get our output matrix
Our final output C is:
C =
'GTTG' 'AACA' 'ATCG' 'TTAC' 'GGTA'
'CCGT' 'TCGA' 'GACC' 'GCCC' 'TTCA'
'TATT' 'TTCT' 'ATGA' 'TGTT' 'ATAA'
'TGTC' 'TTAC' 'ATTC' 'AAAG' 'AGCG'
'TGGT' 'GTAG' 'AGTC' 'GTAA' 'TTTA'
Good luck!
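If the Map lookups feel heavy, here is a minimal sketch of an alternative that treats each letter as a number from 0 to 3 and lets bitxor do the work. This is not the method above, just a different route to the same table. It uses the bit assignment from the question (A=00, G=01, C=10, T=11); the names alphabet, lut, code and xorLetters are made up for illustration:

```matlab
alphabet = 'AGCT';                    % position minus 1 is the 2-bit code
lut = zeros(1, 256);
lut(double(alphabet)) = 0:3;          % character code -> 2-bit value
code = @(s) lut(double(s));
xorLetters = @(p, q) alphabet(bitxor(code(p), code(q)) + 1);

% Full truth table: rows and columns are ordered A, G, C, T
[r, c] = ndgrid(0:3);
truthTable = alphabet(bitxor(r, c) + 1);
disp(truthTable)

% Element-wise over two cell arrays of strings (small hypothetical inputs)
A = {'GATT', 'AACT'};
B = {'ATAC', 'AAAT'};
C = cellfun(xorLetters, A, B, 'UniformOutput', false);
disp(C)
```

Because cellfun walks both cell arrays in step and keeps the shape of its inputs, the same call should work unchanged on the 5 x 5 cell arrays.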

Matlab binary encoding

I have a vector containing a series of integers, and what I want to do is take all numbers, convert them into their corresponding binary forms, and concatenate all of the resulting binary values together. Is there any easy way to do this?
e.g. a=[1 2 3 4] --> b=[00000001 00000010 00000011 00000100] --> c=00000001000000100000001100000100
Try:
b = dec2bin(a)
As pointed out by the other answers, the function DEC2BIN is one option that you have to solve this problem. However, as pointed out by this other SO question, it can be a very slow option when converting a large number of values.
For a faster solution, you can instead use the function BITGET as follows:
a = [1 2 3 4]; %# Your array of values
nBits = 8; %# The number of bits to get for each value
nValues = numel(a); %# The number of values in a
c = zeros(1,nValues*nBits); %# Initialize c to an array of zeroes
for iBit = 1:nBits %# Loop over the bits
c(iBit:nBits:end) = bitget(a,nBits-iBit+1); %# Get the bit values
end
The result c will be an array of zeroes and ones. If you want to turn this into a character string, you can use the function CHAR as follows:
c = char(c+48);
Yes, use dec2bin, followed by string concatenation.
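A minimal sketch of that dec2bin route, assuming 8 bits per value as in the example from the question:

```matlab
a = [1 2 3 4];              % values from the question
nBits = 8;                  % assumed fixed width per value

b = dec2bin(a, nBits);      % one row of '0'/'1' characters per value
c = reshape(b.', 1, []);    % transpose first so the rows are read in order

disp(c)
```

The transpose matters because reshape reads column by column; without it the bits of different values would be interleaved.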