Multiple sequence alignment of 12 species - matlab

i need to perform MSA( multiple sequence alignment on nucleotide sequences of 12 wheat varieties. all these varieties have different length bps(base pairs).I followed this documentation of MATLAB http://www.mathworks.in/help/bioinfo/ref/multialign.html. But when i type this "
ma = multialign(p53,tree,'ScoringMatrix',...
{'pam150','pam200','pam250'})
showalignment(ma)"
i get an error :
??? Out of memory. Type HELP MEMORY for
your options.
Error in ==> profalign>affinegap at 648
F = zeros(n+1,m+1,numStates);
Error in ==> profalign at 426
[F, pointer] =
affinegap(prof1,len1,prof2,len2,SM,go1,go2,ge1,ge2,wg1,wg2);
Error in ==> multialign at 655
[profs{rootInd} h1 h2] =
profalign(profs{[i,rootInd]},...
Please help

This is a hard problem to debug, because it is highly dependent on your specific settings. As mentioned in the comments, Matlab is saying that it ran out of memory. This might be because of the way you have Matlab configured or because your computer doesn't have enough RAM (or maybe you were using too much RAM for other things at the time). It's also possible that you just gave it more data than it can handle. However, assuming that the sequences aren't unreasonably long, 12 sequences should be pretty manageable for a progressive alignment algorithm, which multalign seems to be.
Given all of those variables, the simplest solution is just to avoid trying to run it on your computer. There are websites where you can submit your data to be aligned on a server that will definitely have sufficient RAM. The most popular of such websites is ClustalOmega, the successor to ClustalW. These sites will generally return results fairly quickly.

Related

Multiplication of matrices when cannot loaded into memory at once

I've read some similar posts, while none of them actually tackled my problem.
I need to do a series of multiplication-similar operations for A, B, specifically calculating kernel matrices, on Windows Platform. While, the problem is both of A, B could be really large, let us say, 20000-by-360000. While, my server can only provide 96 GB memory. It may seem infeasible to have them in memory at the same time and do the calculation. So is there any good way to efficiently handle such a large multiplication? Btw, The size of result, which is 20000-by-20000, is much less than the multiplier and can fit in the memory properly.
Because I do it on Windows, it may be not feasible to call functions like mmap2.
I wonder whether converting them into sparse matrix is a good option. However, it may heavily depend on the properties of data.
Another solution I've come up with is to partition the origin matrix into blocks. Then do the calculation block-by-block.
Is there any other better solution? Any practical suggestions would be really appreciated.
Best regards,
Peiyun
If I where you I'd look into the block processing function:
B = blockproc(filename,[M N],fun)
and use the Destination parameter to allow saving the results without overflowing your memory.

how to display only select powers of expansion maple

Say I have a super long polynomial of multiple variables. far too long to display on-screen, or print out, so collect http://www.maplesoft.com/support/help/Maple/view.aspx?path=collect is unlikely to help. I would like to tell maple to display only terms that contain a specific variable to one select power. I am sure there must be a simple way to do this. And no, I haven't looked into this extensively. feel free to just provide a link to the answer, if it already exists online.
thank's...
If you care about speed -- perhaps because you need to do similar queries for other powers, of possibly other variables -- then consider using the coeff command. Eg, for polynomial f, the terms with x^2 could be obtained with the command,
x^2*coeff(f,x,2);
For a trivariate dense polynomial of about 1000000 terms as in the following example, the coeff command is several hundred times faster in Maple 16 and 17 than is the has command approach as shown below.
restart:
f:=expand(randpoly(x,degree=100,dense)
*randpoly(y,degree=100,dense)
*randpoly(z,degree=100,dense)):
nops(f); # number of terms
990000
sol1:=CodeTools:-Usage( select(has,f,x^2) ):
memory used=105.36MiB, alloc change=58.22MiB, cpu time=842.00ms, real time=843.00ms
sol2:=CodeTools:-Usage( x^2*coeff(f,x,2) ):
memory used=156.84KiB, alloc change=0 bytes, cpu time=0ns, real time=4.00ms
expand(sol1-sol2);
0
# Check that the timing difference was not just due to the order in which
# the two approaches were done, by a simple repeat.
CodeTools:-Usage( select(has,f,x^2) ):
memory used=105.30MiB, alloc change=23.11MiB, cpu time=733.00ms, real time=691.00ms
CodeTools:-Usage( x^2*coeff(f,x,2) ):
memory used=156.81KiB, alloc change=0 bytes, cpu time=0ns, real time=3.00ms
That was all done in Maple 17 64bit on Windows 7, and timings are pretty similar in Maple 16. This is in stark contrast to Maple 15 and earlier, where the coeff approach is about 3 times slower than that has approach. Those improvements relate to major work done in handling polynomial structures in Maple 16 and 17. See here and here.
Let's say that you want to see all terms of polynomial poly with x^2. Then do select(has, poly, x^2);

'Out of memory' in Matlab. A slow but a permanent solution?

I am wondering if my suggestion to 'Out of Memory' problem is impossible. Here is my suggestion:
The idea is seamlessly saving huge matrices (say BIG = rand(10^6)) to HDD as a .mat(-v7.3) file when it is not possible to keep it in memory and call it seamlessly whenever required. Then, when you want to use it like:
a = BIG(3678,2222);
s = size(BIG);
, it seamlessly does this behind the scene:
m = matfile('BIG.m');
a = m.BIG(3678,2222);
s = size(m,'BIG');
I know that speed is important but suppose that I have enough time but not enough memory. And also its better to write an memory efficient program but again suppose that I need to use someone else's function which cant be optimized. I do actually have some more related questions: Can this be implemented using objects? Or does it require a infrastructural change in Matlab?
Seems to me like this is certainly possible, as this essentially what many operating systems do in the form of paging.
Moreover, something similar is provided by MATLAB Distributed Computing Server. This allows you (among other things) to store the data for a single matrix on multiple machines, and access it seamlessly in the way that you propose.
IMHO, allowing for data to be paged to file/swap should be a setting in MATLAB. Unfortunately, that is not how MATLAB's memory model works, and I suspect it is very difficult to implement this on their side. Plus, when this setting is enabled, users will not be protected anymore against making silly mistakes like zeros(1e7) instead of zeros(1e7,1); it will simply seem to hang the system, as MATLAB is busy filling your entire drive with zeros.
Anyway, I think it is possible using MATLAB classes. But I wouldn't recommend it. Note that implementing a proper subsref and subsasgn is *ahum* challenging, plus, it is likely that you'll have to re-implement many algorithms (like mldivide). This will most likely mean you'll lose a great deal of performance; think factors of in the thousands.
Here's an interesting random relevant paper I found while googling around a bit.
What could probably be a solution to your problem is memory mapped io (which matlab supports).
There a file is mapped to the memory and all read/writes to that memory address are actually read/writes to the file.
This only reserves/blocks memory addresses, it does not consume physical memory.
However, I would only advise it with 64 bit matlab, since with 32 bit matlab the address space is simply not large enough to use ram for data, code for matlab and dlls, and memory mapped io.
Check out the examples for the documentation page of memmapfile(), e.g.,
m = memmapfile('records.dat', ...
'Offset', 1024, ...
'Format', {'uint32' [4 10 18] 'x'});
A = m.Data(1).x;
whos A
Name Size Bytes Class
A 4x10x18 2880 uint32 array
Note that, accesses to m.Data(1).x redirect to file IO, i.e. no memory is consumed. So it provides efficient random access to parts of possibly very large data files residing on disk. Also note, that more complex data structures as in the example can be realized.
It seems to me that this provides the "implementation with objects" you had in mind.
Unfortunately, this does not allow to memmap MATfiles directly, which would be really useful. Probably this is difficult because of the compression.
Write a function:
function a=BIG(x,y)
m = matfile('BIG.mat');
a = m.BIG(x,y);
end
Every time you write BIG(a,b) the function is called.

Fastest language for FOR loops

I'm trying to figure out the best programming language for an analytical model I'm building. Primary consideration is speed at which it will run FOR loops.
Some detail:
The model needs to perform numerous (~30 per entry, over 12 cycles) operations on a set of elements from an array -- there are ~300k rows, and ~150 columns in the array. Most of these operations are logical in nature, e.g., if place(i) = 1, then j(i) = 2.
I've built an earlier version of this model using Octave -- to run it takes ~55 hours on an Amazon EC2 m2.xlarge instance (and it uses ~10 GB of memory, but I'm perfectly happy to throw more memory at it). Octave/Matlab won't do elementwise logical operations, so a large number of for loops are needed -- I'm relatively certain that I've vectorized as much as possible -- the loops that remain are necessary. I've gotten octave-multicore to work with this code, which makes some improvement (~30% speed reduction when I get it running on 8 EC2 cores), but ends up being unstable with file locking, etc.
+I'm really looking for a step change in run-time -- I know that actually using Matlab might get me as much as a 50% improvement from looking at some benchmarks, but that is cost-prohibitive. The original plan when starting this was to actually run a Monte Carlo with this, but at 55 hours a run, that's completely impractical.
The next version of this is going to be a complete rebuild from the ground up (for IP reasons I won't get in to if nothing else), so I'm completely open to any programming language. I'm most familiar with Octave/Matlab, but have dabbled in R, C, C++, Java. I'm also proficient w/ SQL if the solution involves storing the data in a database. I'll learn any language for this -- these aren't complicated functionality we're looking for, no interfacing with other programs, etc., so not too concerned about learning curve.
So with all that said, what's the fastest programming language specifically for FOR loops? From a search of SO and Google, Fortran and C bubble to the top, but looking for some more advice before diving in to one or the other.
Thanks!
This for loop looks no more complex than this when it hits the CPU:
for(int i = 0; i != 1024; i++) translates to
mov r0, 0 ;;start the counter
top:
;;some processing
add r0, r0, 1 ;;increment the counter by 1
jne top: r0, 1024 ;;jump to the loop top if we havn't hit the top of the for loop (1024 elements)
;;continue on
As you can tell, this is sufficiently simple you can't really optimize it very well[1]... Refocus towards the algorithm level.
The first cut at the problem is to look at cache locality. Look up the classic example of matrix multiplication and swapping the i and j indexes.
edit: As a second cut, I would suggest evaluating the algorithm for data-dependencies between iterations and data dependency between localities in your 'matrix' of data. It may be a good candidate for parallelization.
[1] There are some micro-optimizations possible, but those will not produce the speedsups you're looking for.
~300k * ~150 * ~30 * ~12 = ~16G iterations, right?
This number of primitive operations should complete in a matter of minutes (if not seconds) in any compiled language on any decent CPU.
Fortran, C/C++ should do it almost equally well. Even managed languages, such as Java and C#, would only fall behind by a small margin (if at all).
If you have a problem of ~16G iterations running 55 hours, this means that they are very far from being primitive (80k per second? this is ridiculous), so maybe we should know more. (as was already suggested, is disk access limiting performance? is it network access?)
As #Rotsor said, 16G operations / 55 hours is about 80,000 operations per second, or one operation every 12.5 microseconds. That's a lot of time per operation.
That means your loops are not the cause of poor performance, it's what's in the innermost loop that's taking the time. And Octave is an interpreted language. That alone means an order of magnitude slowdown.
If you want speed, you first need to be in a compiled language. Then you need to do performance tuning (aka profiling) or, just single step it in a debugger at the instruction level. That will tell you what it is actually doing in its heart of hearts. Once you've got it to where it's not wasting cycles, fancier hardware, cores, CUDA, etc. will give you a parallelism speedup. But it's silly to do that if your code is taking unnecessarily many cycles. (Here's an example of performance tuning - a 43x speedup just by trimming the fat.)
I can't believe the number of responders talking about matlab, APL, and other vectorized languages. Those are interpreters. They give you concise source code, which is not at all the same thing as fast execution. When it comes down to the bare metal, they are stuck with the same hardware as every other language.
Added: to show you what I mean, I just ran this C++ code, which does 16G operations, on this moldy old laptop, and it took 94 seconds, or about 6ns per iteration. (I can't believe you baby-sat that thing for 2 whole days.)
void doit(){
double sum = 0;
for (int i = 0; i < 1000; i++){
for (int j = 0; j < 16000000; j++){
sum += j * 3.1415926;
}
}
}
In terms of absolute speed, probably Fortran followed by C, followed by C++. In practical application, well written code in any of the three, compiled with a descent compiler should be plenty fast.
Edit- generally you are going to see much better performance with any kind of looped or forking (e.g.- if statements) code with a compiled language, versus an interpreted language.
To give an example, on a recent project I was working on, the data sizes and operations were about 3/4 the size of what you're talking about here, but like your code, had very little room for vectorization, and required significant looping. After converting the code from matlab to C++, runtimes went from 16-18 hours, down to around 25 minutes.
For what you're discussing, Fortran is probably your first choice. The closest second place is probably C++. Some C++ libraries use "expression templates" to gain some speed over C for this kind of task. It's not entirely certain those will do you any good, but C++ can be at least as fast as C, and possibly somewhat faster.
At least in theory, there's no reason Ada couldn't be competitive as well, but it's been so long since I used it for anything like this that I hesitate to recommend it -- not because it isn't good, but because I just haven't kept track of current Ada compilers well enough to comment on them intelligently.
Any compiled language should perform the loop itself on roughly equal terms.
If you can formulate your problem in its terms, you might want to look at CUDA or OpenCL and run your matrix code on the GPU - though this is less good for code with lots of conditionals.
If you want to stay on conventional CPUs, you may be able to formulate your problem in terms of SSE scatter/gather and bitmask operations.
Probably the assembly language for whatever your platform is. But compilers (especially special-purpose ones that specifically target a single platform (e.g., Analog Devices or TI DSPs)) are often as good as or better than humans. Also, compilers often know about tricks that you don't. For example, the aforementioned DSPs support zero-overhead loops and the compiler will know how to optimize code to use those loops.
Matlab will do element-wise logical operations and they are generally quite fast.
Here is a quick example on my computer (AMD Athalon 2.3GHz w/3GB) :
d=rand(300000,150);
d=floor(d*10);
>> numel(d(d==1))
ans =
4501524
>> tic;d(d==1)=10;toc;
Elapsed time is 0.754711 seconds.
>> numel(d(d==1))
ans =
0
>> numel(d(d==10))
ans =
4501524
In general I've found matlab's operators are very speedy, you just have to find ways to express your algorithms directly in terms of matrix operators.
C++ is not fast when doing matrixy things with for loops. C is, in fact, specifically bad at it. See good math bad math.
I hear that C99 has __restrict pointers that help, but don't know much about it.
Fortran is still the goto language for numerical computing.
How is the data stored? Your execution time is probably more effected by I/O (especially disk or worse, network) than by your language.
Assuming operations on rows are orthogonal, I would go with C# and use PLINQ to exploit all the parallelism I could.
Might you not be best with a hand-coded assembler insert? Assuming, of course, that you don't need portability.
That and an optimized algorithm should help (and perhaps restructuring the data?).
You might also want to try multiple algorithms and profile them.
APL.
Even though it's interpreted, its primitive operators all operate on arrays natively, therefore you rarely need any explicit loops. You write the same code, whether the data is scalar or array, and the interpreter takes care of any looping needed internally, and thus with the minimum overhead - the loops themselves are in a compiled language, and will have been heavily optimised for the specific architecture of the CPU it's running on.
Here's an example of the simplicity of array handling in APL:
A <- 2 3 4 5 6 8 10
((2|A)/A) <- 0
A
2 0 4 0 6 8 10
The first line sets A to a vector of numbers.
The second line replaces all the odd numbers in the vector with zeroes.
The third line queries the new values of A, and the fourth line is the resulting output.
Note that no explicit looping was required, as scalar operators such as '|' (remainder) automatically extend to arrays as required. APL also has built-in primitives for searching and sorting, which will probably be faster than writing your own loops for these operations.
Wikipedia has a good article on APL, which also provides links to suppliers such as IBM and Dyalog.
Any modern compiled or JITted language is going to render down to pretty much the same machine language code, giving a loop overhead of 10 nano seconds or less, per iteration, on modern processors.
Quoting #Rotsor:
If you have a problem of ~16G iterations running 55 hours, this means that they are very far from being primitive (80k per second? this is ridiculous), so maybe we should know more.
80k operations per second is around 12.5 microseconds each - a factor of 1000 greater than the loop overhead you'd expect.
Assuming your 55 hour runtime is single threaded, and if your per item operations are as simple as suggested, you should be able to (conservatively) achieve a speedup of 100x and cut it down to under an hour very easily.
If you want to run faster still, you'll want to look at writing multi-threaded solution, in which case a language that provides good support would be essential. I'd lean towards PLINQ and C# 4.0, but that's because I already know C# - YMMV.
what about a lazy loading language like clojure. it is a lisp so like most lisp dialects lacks a for loop but has many other forms that operate more idiomatically for list processing. It might help your scaling issues as well because the operations are thread safe and because the language is functional it has fewer side effects. If you wanted to find all the items in the list that were 'i' values to operate on them you might do something like this.
(def mylist ["i" "j" "i" "i" "j" "i"])
(map #(= "i" %) mylist)
result
(true false true true false true)

redundant encoding?

This is more of a computer science / information theory question than a straightforward programming one, so if anyone knows of a better site to post this, please let me know.
Let's say I have an N-bit piece of data that will be sent redundantly in M messages, where at least M-1 of those messages will be received successfully. I am interested in different ways of encoding the N-bit piece of data in fewer bits per message. (this is similar to RAID but at a much smaller level, where N = 8 or 16 or 32)
Example: suppose N = 16 and M = 4. Then I could use the following algorithm:
1st and 3rd message: send "0" + bits 0-7
2nd and 4th message: send "1" + bits 8-15
If I can guarantee that 3 messages of the 4 will get through, then at least one message from each group will get through. Thus I can make this work with 9 bits or less, there's probably a way to do this with fewer total bits but I'm not sure how.
Are there some simple encoding/decoding algorithms to do this kind of thing? Does this problem have a name? (if I know what it's called, I can google it!)
note: in my particular case, the messages either arrive correctly or do not arrive at all (no messages arrive with errors).
(edit: moved 2nd part to a separate question)
(Incomplete answer follows. I may add more later.)
The term you may be interested in is channel coding: adding redundancy to a source in order to make it robust during transmission over a noisy channel. In information theory, the complementary problem to channel coding is source coding: reducing the redundancy in a source to represent it using fewer bits. (The combination of these two problems is called joint source-channel coding.)
Your first question asks to find a channel code. The simple example you give is similar to a repetition code, i.e., you send the same message more than twice (usually an odd number of times), and then the message which is received most often is accepted as the original message.
This code is inefficient. To use standard notation, let k = number of bits in original message, and n = number of bits in the transmitted message. For your example, k = 16 and n = 36. A measure of coding efficiency is k/n, where higher means more efficient. In your case, k/n = 0.44. This is low.
The repetition code is a simple kind of block code, i.e., redundancy is added to each block of k bits to create a codeword of n bits. So are the Hamming and Reed-Solomon codes as others mentioned. Hamming codes are relatively easy to understand with some basic linear algebra.
These should be enough terms for you to search on your own. Good luck.
I'm not sure if I understood all the details of your question correctly, but your problem is definitely aboud designing some kind of error correcting code. This is a vast area of computer science and thick tomes have been written about it. Start with wikipedia and see if you can get any simple schemes (like Hamming or Reed-Solomon codes) to work in your case.
If you want to deal not only with symbol corruption, but also deletion of symbols, you should look at erasure codes, this is definitely a more difficult task but good methods exist in many cases.
EDIT: This material from hackersdelight.org seems a nice introduction.
See erasure codes.
You're looking for a packet erasure code. There are only two useful packet erasure codes that are not totally encumbered by patents, and there's only one open-source library to implement those. Find it here: http://planete-bcast.inrialpes.fr/rubrique.php3?id_rubrique=5
Here's a trivially simple scheme that's almost twice as efficient as your example.
You chopped the message into blocks of (N/M)*2 bits. Instead, chop it into N/(M-1)-bit blocks. (Round it up if necessary.) The first block, src[0], encodes as itself: enc[0]=src[0]. The same for the last block: enc[M-1]=src[M-1]. Each of the other blocks gets XORed with its left neighbor: enc[i]=src[i-1]^src[i].
Prefix each encoded block with a log(M)-bit sequence number, essentially as you did, so the receiver can tell which was dropped. (If you can be sure that whichever blocks arrive will arrive in order, then a 1-bit sequence number will do. Just alternate 0 and 1.)
To decode, successively XOR from the left and the right until you hit the dropped block. E.g. src[1] == enc[0]^enc[1]. (Dropping one of the endpoint blocks isn't a special case -- e.g. if the first block is dropped, the scan from the right recovers it, and the scan from the left is of length 0.)