In Matlab, I want to make a seqlogo plot of an amino acid sequence profile. But instead of scaling the heights of the plot columns by entropy, I want all the columns to be the same height.
I'm in the process of modifying the code from the answers to this question, but I wonder if there is a parameter to seqlogo or some other function that I have missed that will make the column heights uniform.
Alternatively, is there a statistical transformation I can apply to the sequence profile to hack the desired output? (column heights uniform, height of each letter linearly proportional to its probability in the seqprofile)
Probably the easiest way around this problem is to directly modify the code for the Bioinformatics Toolbox function SEQLOGO (if possible). In R2010b, you can do:
edit seqlogo
And the code for the function will be shown in the editor. Next, find the following lines (lines 267-284) and either comment them out or remove them entirely:
S_before = log2(nSymbols);
freqM(freqM == 0) = 1; % log2(1) = 0
% The uncertainty after the input at each position
S_after = -sum(log2(freqM).*freqM, 1);
if corrError
% The number of sequences correction factor
e_corr = (nSymbols -1)/(2* log(2) * numSeq);
R = S_before - (S_after + e_corr);
else
R = S_before - S_after;
end
nPos = (endPos - startPos) + 1;
for i =1:nPos
wtM(:, i) = wtM(:, i) * R(i);
end
Then put this line in their place:
wtM = bsxfun(@times,wtM,log2(nSymbols)./sum(wtM));
You will probably want to save the file under a new name, like seqlogo_norm.m, so you can still use the original unmodified SEQLOGO function. Now you can create sequence profile plots with all the columns normalized to the same height. For example:
S = {'LSGGQRQRVAIARALAL',... %# Sample amino acid sequence
'LSGGEKQRVAIARALMN',...
'LSGGQIQRVLLARALAA',...
'LSGGERRRLEIACVLAL',...
'FSGGEKKKNELWQMLAL',...
'LSGGERRRLEIACVLAL'};
seqlogo_norm(S,'alphabet','aa'); %# Use the modified SEQLOGO function
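To see what the replacement line does without opening SEQLOGO, here is a minimal sketch on a made-up 4-symbol profile. The names freqM, wtM, and nSymbols mirror the snippet above, the numbers are invented, and it assumes wtM holds the raw per-position frequencies at that point in the function.

```matlab
% Hypothetical 4-symbol profile: rows are symbols, columns are positions.
freqM = [0.7 0.25 0.1;
         0.1 0.25 0.2;
         0.1 0.25 0.3;
         0.1 0.25 0.4];
nSymbols = size(freqM, 1);
wtM = freqM;                     % assumed starting point: raw frequencies

% Uniform scaling: every column is stretched to sum to log2(nSymbols) bits.
wtM = bsxfun(@times, wtM, log2(nSymbols) ./ sum(wtM));

colHeights = sum(wtM);                          % total height of each column
relHeights = bsxfun(@rdivide, wtM, colHeights); % letter heights as fractions

disp(colHeights)   % every column has the same total height
disp(relHeights)   % fractions match the original frequencies
```

Within each column the letters keep the same proportions as the input frequencies; only the total column height changes.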
OLD ANSWER:
I'm not sure how to transform the sequence profile information to get the desired output from the Bioinformatics Toolbox function SEQLOGO, but I can show you how to modify the alternative seqlogo_new.m that I wrote for my answer to the related question you linked to. If you change the line that initializes bitValues from this:
bitValues = W{2};
to this:
bitValues = bsxfun(@rdivide,W{2},sum(W{2}));
Then you should get each column scaled to a height of 1. For example:
S = {'ATTATAGCAAACTA',... %# Sample sequence
'AACATGCCAAAGTA',...
'ATCATGCAAAAGGA'};
seqlogo_new(S); %# After applying the above modification
For now, my workaround is to generate a bunch of fake sequences that match the sequence profile, then feed those sequences to http://weblogo.berkeley.edu/logo.cgi . Here is the code to make the fake sequences:
function flatFakeSeqsFromPwm(pwm, letterOrder, nSeqsToGen, outFilename)
%translates a pwm into a bunch of fake seqs with the same probabilities
%for use with http://weblogo.berkeley.edu/
%pwm should be a 4xn or a 20xn position weight matrix. Each col must sum to 1
%letterOrder = e.g. 'ARNDCQEGHILKMFPSTWYV' for my data
%nSeqsToGen should be >= the # of pixels tall you plan to make your chart
[height windowWidth] = size(pwm);
assert(height == length(letterOrder));
assert(isequal(abs(1-sum(pwm)) < 1.0e-10, ones(1, windowWidth))); %assert all cols of pwm sum to 1.0
fd = fopen(outFilename, 'w');
for i = 0:nSeqsToGen-1
for seqPos = 1:windowWidth
acc = 0; %accumulator
idx = 0;
while i/nSeqsToGen >= acc
idx = idx + 1;
acc = acc + pwm(idx, seqPos);
end
fprintf(fd, '%s', letterOrder(idx));
end
fprintf(fd, '\n');
end
fclose(fd);
end
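The same quantile idea can be sketched in memory, without writing a file. This is an illustrative variant, not the function above: the 4 x 3 profile is made up, and the inner while loop is replaced by a comparison against the cumulative sum of each column.

```matlab
% Hypothetical 4 x 3 DNA profile; every column sums to 1.
pwm = [0.5 0.1 0.25;
       0.5 0.2 0.25;
       0.0 0.3 0.25;
       0.0 0.4 0.25];
letterOrder = 'ACGT';
nSeqsToGen = 20;

cdf = cumsum(pwm, 1);                 % cumulative probability per column
q = (0:nSeqsToGen-1) / nSeqsToGen;    % same thresholds as i/nSeqsToGen

seqs = repmat(' ', nSeqsToGen, size(pwm, 2));
for seqPos = 1:size(pwm, 2)
    % Index of the first letter whose cumulative probability exceeds q.
    idx = sum(bsxfun(@ge, q, cdf(:, seqPos)), 1) + 1;
    idx = min(idx, numel(letterOrder));   % guard against rounding at the top
    seqs(:, seqPos) = letterOrder(idx);
end
disp(seqs)
```

Each row of seqs is one fake sequence, and the letter counts in a column approximate nSeqsToGen times the corresponding probabilities.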
I have image matrix 420x700, and I want to delete a specific value in each row, changing the image dimensions. It is like deleting a column from it, but not in a straight line, to become 420x699 image. I should keep the values before the deleted value horizontally and shift all the values after it back by 1 position.
RGB = imread('image.jpg');
I1 = RGB(:,:,1);
How do I do that?
This is a good question, and I cannot think of a way to do this without a for-loop.
Let M be the nr-by-nc matrix from which you want to remove a column, and R the nr-by-1 vector with the column index of the element to be removed on each row.
The following code creates a new matrix A with the "column" removed from M, and vector B with the elements that were removed:
[nr,nc] = size(M);
A = zeros(nr,nc-1,'like',M);
B = zeros(nr,1,'like',M);
for k = 1:nr
r = R(k);
t = [ 1:r-1, r+1:nc ];
A(k,:) = M(k,t);
B(k) = M(k,r);
end
@beaker and @Cris are correct, but just to add some flavor to this, I've attempted to demonstrate an alternate method using linear indexing, which can teach an interesting lesson on column-major indexing of 2D arrays in MATLAB.
Another point to note is that this kind of process is what's followed in the seam carving algorithm, where we remove a vertical seam in this manner.
Load a test image to run this on, and crop it so the result is easier to inspect.
I = imread('peppers.png');
I = I(100:100+9, 100:100+19, :);
figure, imshow(I)
Create a mask indicating which pixels are to be removed. This simulates the condition which I think you're pointing to - in this case, we choose random column indices for each row to be removed. You'd likely have this information as an input.
mask = false(size(I, 1), size(I, 2)); % zeros() does not accept 'logical' as a class
for idx = 1:size(mask, 1)
randidx = randi(size(mask, 2));
mask(idx, randidx) = true;
end
figure, imshow(mask)
Use the column-major linear indexing trick to do the removal faster! Since we're removing one element from each row, we rotate the image 90 degrees, which turns the problem into removing one element from each column. MATLAB indexes 'vertically', so we can then use linear indexing to remove the masked pixels all at once (rather than one row/column at a time), restore the shape using reshape, and finally rotate back to the original orientation.
It = rot90(I);
maskt = rot90(mask);
% Preallocate output
Ioutput = zeros([size(I, 1), size(I, 2) - 1, size(I, 3)], 'like', I);
for nchannel = 1:3
Icropped = It(:, :, nchannel);
% MATLAB indexes column wise - so, we can use linear indexing to make
% this computation simpler!
Icropped = Icropped(maskt(:) == 0);
Icropped = reshape(Icropped, [size(maskt, 1) - 1, size(maskt, 2)]);
% Restore the correct orientation after removing element!
Icropped = rot90(Icropped, 3);
Ioutput(:, :, nchannel) = Icropped;
end
figure, imshow(Ioutput)
I've cropped the 'peppers' image to demonstrate this, so that you can convince yourself that this is doing it right. This method should work similarly for larger images as well.
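For a check that needs no image, the same column-major idea can be sketched on a small numeric matrix. This variant uses a transpose instead of rot90, and the column indices in R are made up for illustration; the loop result serves as the reference.

```matlab
M = magic(5);                 % small test matrix
R = [1; 3; 5; 2; 4];          % hypothetical column to drop in each row
[nr, nc] = size(M);

% Logical mask of the elements to keep (false at the removed positions).
keep = true(nr, nc);
keep(sub2ind([nr nc], (1:nr).', R)) = false;

% Transpose so each row of M becomes a column, pick the kept elements in
% column-major order, then reshape and transpose back.
Mt = M.';
A = reshape(Mt(keep.'), nc - 1, nr).';

% Reference result from the row-by-row loop.
Aloop = zeros(nr, nc - 1, 'like', M);
for k = 1:nr
    Aloop(k, :) = M(k, [1:R(k)-1, R(k)+1:nc]);
end
disp(isequal(A, Aloop))
```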
I have created this code to generate one set of lottery numbers, but I am trying to make it so that the user can enter how many sets they want (input n), and it will print them out as one long matrix of size n-by-6. I was messing around with a few options from online suggestions, but to no avail. I put the initial for i=1:1:n at the beginning, but I do not know how to store each run into a growing matrix. Right now it still generates just 1 set.
function lottery(n)
for i=1:1:n
xlow=1;
xhigh=69;
m=5;
i=1;
while (i<=m)
lottonum(i)=floor(xlow+rand*(xhigh-xlow+1));
flag=0;
for j=1:i-1
if (lottonum(i)==lottonum(j))
flag=1;
end
end
if flag==0
i=i+1;
end
end
ylow=1;
yhigh=26;
m=1;
lottonum1=floor(ylow+rand*(yhigh-ylow+1));
z = horzcat(lottonum, lottonum1);
end
disp('The lotto numbers picked are')
fprintf('%g ',z)
disp (' ')
The problem is that you are not storing or displaying the newly generated numbers, only the last set. To solve this, initialize z with NaNs or zeros, and later index z to store each set in a row of z, by using z(i,:) = lottonum.
However, you are using i as iterator in the while loop already, so you should use another variable, e.g. k.
You can also set z as an output of the function, so you can use this matrix in some other part of a program.
function z = lottery(n)
% init z
z = NaN(n,6);
for k = 1:n
xlow=1;
xhigh=69;
m=5;
i=1;
while (i<=m)
lottonum(i)=floor(xlow+rand*(xhigh-xlow+1));
flag=0;
for j=1:i-1
if (lottonum(i)==lottonum(j))
flag=1;
end
end
if flag==0
i=i+1;
end
end
ylow=1;
yhigh=26;
lottonum1 = floor(ylow+rand*(yhigh-ylow+1));
z(k,:) = horzcat(lottonum, lottonum1); % put the numbers in a row of z
end
disp('The lotto numbers picked are')
disp(z) % prettier display than fprintf in this case.
disp (' ')
end
The nice answer from rinkert corrected your basic mistakes (such as modifying the loop iterator i from within the loop, which does not work) and answered your question on how to store all your results.
This leaves you with working code; however, I'd like to propose a different way to look at it.
The proposed architecture divides the tasks into separate functions:
One function, draw_numbers, which draws N numbers randomly (and does only that)
One function, draw_lottery, which calls the previous function as many times as needed (your n), collects the results, and displays them.
draw_lottery
This architecture has the benefit of greatly simplifying your main function. It can now be as simple as:
function Draws = draw_lottery(n)
% define your draw parameters
xmin = 1 ; % minimum number drawn
xmax = 69 ; % maximum number drawn
nballs = 5 ; % number of balls to draw
% pre allocate results
Draws = zeros( n , nballs) ;
for iDraw=1:1:n
% draw "nballs" numbers
thisDraw = draw_numbers(xmin,xmax,nballs) ;
% add them to the result matrix
Draws(iDraw,:) = thisDraw ;
end
disp('The lotto numbers picked are:')
disp (Draws)
disp (' ')
end
draw_numbers
Instead of using an intricate set of if conditions and several iterators (i/m/k) to branch the program flow, I made the function recursive. This means the function may call itself a number of times until a condition is satisfied. In our case, the condition is to have a set of nballs unique numbers.
The function:
(1) draws N integer numbers randomly, using randi.
(2) removes duplicate numbers (if any), using unique.
(3) counts how many unique numbers are left, Nu.
(4a) if Nu = N => exits the function.
(4b) if Nu < N => calls itself again, sending the existing Nu numbers and asking to draw an additional N-Nu numbers to add to the collection. Then back to step (2).
In code, it looks like this:
function draw = draw_numbers(xmin,xmax,nballs,drawn_set)
% check if we received a partial set
if nargin == 4
% if yes, adjust the number of balls to draw
n2draw = nballs - numel(drawn_set) ;
else
% if not, make a full draw
drawn_set = [] ;
n2draw = nballs ;
end
% draw "n2draw" numbers between "xmin" and "xmax"
% and concatenate these new numbers with the partial set
d = [drawn_set , randi([xmin xmax],1,n2draw)] ;
% Remove duplicate
drawn_set = unique(d) ;
% check if we have some more balls to draw
if numel(drawn_set) < nballs
% draw some more balls
draw = draw_numbers(xmin,xmax,nballs,drawn_set) ;
else
% we're good to go, assign output and exit function
draw = drawn_set ;
end
end
You can put both functions in the same file if you want.
I encourage you to look at the documentation of a couple of Matlab built-in functions used:
randi
unique
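As a further alternative to both the duplicate-checking loop and the recursion, randperm can sample without replacement directly. This is a swapped-in technique rather than either answer's method; the ranges 1..69 and 1..26 come from the question, and the value of n here is arbitrary.

```matlab
n = 4;                              % number of sets to draw (arbitrary)
z = zeros(n, 6);                    % one row per set
for k = 1:n
    z(k, 1:5) = randperm(69, 5);    % 5 distinct numbers from 1..69
    z(k, 6)   = randi(26);          % independent extra number from 1..26
end
disp('The lotto numbers picked are')
disp(z)
```

Unlike unique, randperm does not sort its output, so the five numbers appear in the order they were drawn.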
I wrote this MATLAB code to concatenate the results of integrating all the columns of a matrix extracted from a multi-matrix array.
"datimf" is a matrix composed of 100 matrices, each 224*640, vertically concatenated.
In the first loop I select every single matrix.
In the second loop I integrate every single column of the selected matrix,
obtaining a row of 640 elements.
The third loop must vertically concatenate all the rows previously calculated.
However, I always get a problem with the third loop. Where is the error?
singleframe = zeros(224,640);
int_frame_all = zeros(1,640);
conc = zeros(100,640);
for i=0:224:(22400-224)
for j = 1:640
for k = 1:100
singleframe(:,:) = datimf([i+1:(i+223)+1],:);
int_frame_all(:,j) = trapz(singleframe(:,j));
conc(:,k) = vertcat(int_frame_all);
end
end
end
An alternate way to do this without using any explicit loops (edited in response to rayryeng's comment below. It's also worth noting that using cellfun may not be more efficient than explicitly looping.):
nmats = 100;
nrows = 224;
ncols = 640;
datimf = rand(nmats*nrows, ncols);
% convert to an nmats x 1 cell array containing each matrix
cellOfMats = mat2cell(datimf, ones(1, nmats)*nrows, ncols);
% Apply trapz to the contents of each cell
cellOfIntegrals = cellfun(@trapz, cellOfMats, 'UniformOutput', false);
% concatenate the results
conc = cat(1, cellOfIntegrals{:});
Taking inspiration from user2305193's answer, here's an even better "loop-free" solution, based on reshaping the matrix and applying trapz along the appropriate dimension:
datReshaped = reshape(datimf, nrows, nmats, ncols);
solution = squeeze(trapz(datReshaped, 1));
% verify solutions are equivalent:
all(solution(:) == conc(:)) % ans = true
I think I understand what you want. The third loop is unnecessary as both the inner and outer loops are 100 elements long. Also the way you have it you are assigning singleframe lots more times than necessary since it does not depend on the inner loops j or k. You were also trying to add int_frame_all to conc before int_frame_all was finished being populated.
On top of that the j loop isn't required either since trapz can operate on the entire matrix at once anyway.
I think this is closer to what you intended:
datimf = rand(224*100,640);
singleframe = zeros(224,640);
int_frame_all = zeros(1,640);
conc = zeros(100,640);
for i=1:100
idx = (i-1)*224+1;
singleframe(:,:) = datimf(idx:idx+223,:);
% for j = 1:640
% int_frame_all(:,j) = trapz(singleframe(:,j));
% end
% The loop is unnecessary as trapz can operate on the entire matrix at once.
int_frame_all = trapz(singleframe,1);
%I think this is what you really want...
conc(i,:) = int_frame_all;
end
It looks like you're processing frames in a video.
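The most efficient approach in my experience would be to reshape datimf to be 3-dimensional. This can be achieved with the reshape and permute commands.
Because the frames are stacked vertically and MATLAB stores arrays column by column, a plain reshape(datimf,224,640,[]) would mix columns from different frames. Something along the lines of vid=permute(reshape(datimf,224,[],640),[1 3 2]); should get you far in this regard, where the 3rd dimension is time. vid(:,:,1) is then the first frame of the video.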
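A small sketch of the 3-D layout, using stand-in dimensions instead of 224 x 640 x 100 so the result is easy to check. It assumes the frames are stacked vertically as described in the question, and relies on MATLAB storing arrays in column-major order, which is why a permute follows the reshape.

```matlab
nrows = 4; ncols = 3; nmats = 5;          % small stand-in dimensions
datimf = rand(nmats*nrows, ncols);        % frames stacked vertically

% Split the stacked rows into (row, frame), then move frames to dimension 3.
vid = permute(reshape(datimf, nrows, nmats, ncols), [1 3 2]);

k = 2;                                    % any frame index
frameFromRows = datimf((k-1)*nrows + (1:nrows), :);
disp(isequal(vid(:, :, k), frameFromRows))

% Integrate each column of every frame; the result is nmats-by-ncols.
conc = squeeze(trapz(vid, 1)).';
disp(size(conc))
```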
When using a binary image with several lines I know that this code displays the longest line:
lineStats = regionprops(imsk, {'Area','PixelIdxList'});
[length, index] = max([lineStats.Area]);
longestLine = zeros(size(imsk));
longestLine(lineStats(index).PixelIdxList)=1;
figure
imshow(longestLine)
Is there a way to display the second longest line? I need to display a line that is a little shorter than the longest line in order to connect them.
EDIT: Is there a way to display both lines on the binary image figure?
Thank you.
I would set the longest line to zero and use max again, after I copy the original vector.
lineStats = regionprops(imsk, {'Area','PixelIdxList'});
[length, index] = max([lineStats.Area]);
lineAreas = [lineStats.Area]; %copy all lineStats.Area values into a new vector
lineAreas(index) = NaN; %remove the longest line by setting it to not-a-number
[length2, index2] = max(lineAreas);
EDIT: Response to new question
sort may be a more straightforward approach for multiples, but you can still use max.
lineAreas = [lineStats.Area]; %copy all lineStats.Area values into a new vector
% add a for loop that iteratively stores the desired indices
nLines = 3;
index = zeros(1,nLines);
for iLines = 1:nLines
[~, index(iLines)] = max(lineAreas);
lineAreas(index(iLines)) = NaN; %remove the current longest line by setting it to not-a-number
end
longestLine = zeros(size(imsk));
% I cannot be certain this will work since your example is not reproducible.
% The pixel lists are column vectors of different lengths, so stack them vertically.
longestLine(vertcat(lineStats(index).PixelIdxList)) = 1;
figure
imshow(longestLine)
Instead of using max use sort in descending order and take the second element. Like max, sort also provides the indexes of the returned values, so the two functions are pretty compatible.
lineStats = regionprops(imsk, {'Area','PixelIdxList'});
[length, index] = sort([lineStats.Area], 'descend');
longestLine = zeros(size(imsk));
longestLine(lineStats(index(2)).PixelIdxList)=1; % here take the second largest
figure
imshow(longestLine)
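To show both the longest and second-longest line together, the sorted indices can be used to combine the pixel lists. The sketch below uses a hand-built struct array as a stand-in for the regionprops output, so it runs without an image; the areas and pixel indices are invented for illustration.

```matlab
% Hypothetical stand-in for regionprops output: three "lines" in a 5x5 image.
lineStats = struct('Area', {2, 4, 3}, ...
                   'PixelIdxList', {[1; 2], [6; 7; 8; 9], [21; 22; 23]});
imsize = [5 5];

[~, order] = sort([lineStats.Area], 'descend');

% Pixel lists are column vectors of different lengths, so stack them vertically.
topTwo = vertcat(lineStats(order(1:2)).PixelIdxList);

twoLongest = false(imsize);
twoLongest(topTwo) = true;
disp(twoLongest)
```

With real data, imshow(twoLongest) would display the two lines on one binary image.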
As an alternative with focus on performance and ease of use, here's one approach using bwlabel instead of regionprops -
[L, num] = bwlabel(imsk, 8);
count_pixels_per_obj = sum(bsxfun(@eq,L(:),1:num));
[~,sidx] = sort(count_pixels_per_obj,'descend');
N = 3; % Shows N biggest objects/lines
figure,imshow(ismember(L,sidx(1:N))),title([num2str(N) ' biggest blobs'])
On the performance aspect, here's one post that does some benchmarking on snowflakes and coins images from MATLAB's image gallery.
Sample run -
imsk = im2bw(imread('coins.png')); %%// Coins photo from MATLAB Library
(Result figures for N = 2 and N = 3.)
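The counting and selection steps can be tried without the Image Processing Toolbox by writing a label matrix by hand. In this sketch L is a made-up stand-in for the bwlabel output, and accumarray is used for the per-label pixel counts in place of the bsxfun comparison; that substitution is mine, not part of the answer.

```matlab
% Hypothetical label matrix, as bwlabel would return (0 = background).
L = [1 1 0 2;
     0 0 0 2;
     3 0 0 2;
     3 3 0 2];
num = max(L(:));

counts = accumarray(L(L > 0), 1, [num 1]).';   % pixels per label
[~, sidx] = sort(counts, 'descend');

N = 2;                                          % keep the N largest objects
biggest = ismember(L, sidx(1:N));               % logical mask of those objects
disp(counts)
disp(biggest)
```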
Elements of a column matrix of non-sequential numbers (sourceData) should have their values incremented if their index positions lie between certain values as defined in a second column matrix (triggerIndices) which lists the indices sequentially.
This can be easily done with a for-loop but can it be done in a vectorized way?
%// Generation of example data follows
sourceData = randi(1e3,100,1);
%// sourceData = 1:1:1000; %// Would show more clearly what is happening
triggerIndices = randperm(length(sourceData),15);
triggerIndices = sort(triggerIndices);
%// End of example data generation
%// Code to be vectorized follows
increment = 75;
addOn = 100;
for index = 1:1:length(triggerIndices)-1
sourceData(triggerIndices(index):1:triggerIndices(index+1)-1) = ...
sourceData(triggerIndices(index):1:triggerIndices(index+1)-1) + addOn;
addOn = addOn + increment;
end
sourceData(triggerIndices(end):1:end) = ...
sourceData(triggerIndices(end):1:end) + addOn;
%// End of code to be vectorized
How about replacing everything with:
vals = sparse(triggerIndices, 1, increment, numel(sourceData), 1);
vals(triggerIndices(1)) = addOn;
sourceData(:) = sourceData(:) + cumsum(vals);
This is basically a variant of run-length decoding shown here.
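A small worked check of the cumsum idea against the original loop. The data and trigger positions are chosen by hand so the result is easy to read, and a dense zeros vector stands in for the sparse one; the logic is otherwise the same.

```matlab
sourceData = (1:20).';                 % small, easy-to-read example data
triggerIndices = [3 8 15];             % sorted trigger positions
increment = 75;
addOn = 100;

% Loop version, kept as the reference.
expected = sourceData;
step = addOn;
for index = 1:numel(triggerIndices)-1
    span = triggerIndices(index):triggerIndices(index+1)-1;
    expected(span) = expected(span) + step;
    step = step + increment;
end
expected(triggerIndices(end):end) = expected(triggerIndices(end):end) + step;

% Vectorized version: place the jumps at the triggers, then accumulate them.
vals = zeros(numel(sourceData), 1);
vals(triggerIndices) = increment;
vals(triggerIndices(1)) = addOn;
result = sourceData + cumsum(vals);

disp(isequal(result, expected))
```

Elements before the first trigger are unchanged, and each later trigger raises the added offset by increment.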