getblast error: unavailable - MATLAB

I'm writing MATLAB code that's supposed to iterate over a list of sequences and BLAST each one in turn. Here is the relevant part of the code:
%blast the seq
[res, ROTE] = blastncbi(seq, 'blastn');
res1 = getblast(res, 'WaitTime',ROTE);
resName = res1.Hits(1).Name
For some sequences it worked, but then for the last one it gave me this error message:
Error using getblast (line 176)
BLAST V7EBUE0901R is unavailable - try later.
Please note that I've defined ROTE as the 'WaitTime' value, as suggested in the documentation of this function.
The script must iterate over lots and lots of genes, so I can't let it crash every 5 minutes!

The RTOE returned by blastncbi (which you've stored in ROTE) is an estimate of how long the search will take. Perhaps the estimate is sometimes simply too short.
Two simple ways to deal with this could be waiting longer, or trying it twice:
res1 = getblast(res, 'WaitTime',ROTE*10);
or
try
    res1 = getblast(res, 'WaitTime',ROTE);
catch
    % if the report wasn't ready in time, ask for it once more
    res1 = getblast(res, 'WaitTime',ROTE);
end
Of course, this assumes that the information you request is actually available.
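If the script has to survive many queries in a row, a retry loop with an increasing wait is another option. This is only a rough sketch under the same assumptions as above; the number of attempts, the wait multiplier, and the pause length are illustrative, not recommendations:
maxAttempts = 3;                        % illustrative retry budget
for attempt = 1:maxAttempts
    try
        res1 = getblast(res, 'WaitTime', ROTE*attempt);   % wait longer on each retry
        break;                          % success, stop retrying
    catch err
        if attempt == maxAttempts
            rethrow(err);               % give up after the last attempt
        end
        pause(30);                      % brief pause before asking NCBI again
    end
end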

Displaying Data as a String in Simulink RT Display Port

My issue involves using the RS-232 Simulink RT blocks.
A model is uploaded to the target PC (xPC) and it transmits and receives data from a variable frequency drive (VFD) that controls a motor. The issue arises on the receiving end when I take data and try to send that data to a display block in my model as a string. Code would be helpful here:
disp = uint8(zeros(1,24));
display = uint8(zeros(1,length(disp)));
cmd = 0;
status = stat_lb;
%% Start-Up
% Initialization Period
if (status == 0 || status == 1)
    cmd = 0;
    msg = uint8('Start up');
    display = [msg uint8(zeros(1, length(disp) - length(msg)))]; % pad message to fixed width
end
...
%Multiple status cases with unique displays.
...
disp = display
So, here the cmd portion functions as expected. As noted above, I want to display the display string on a display block in my Simulink model. As you can see, though, it is of type uint8, so I need to convert it to type string; however, when I pass it through either the ascii2str Simulink block or just place it in the function call (e.g. display = ascii2str(display)) I get the following error message:
Executing the 'CheckData' command produced the following error: Invalid parameter/value pair arguments
My thought is that this has something to do with the fact that I am using MEX and this function (ascii2str) is not supported. Anyway, I am wondering if anyone knows why I receive this error and whether there is anything I can do to resolve it.
Oh, and one last thing: I can get the display to work if I just remove the ascii2str; the only problem is that the output is then in uint8 form and not really helpful. So, if there is any other way to decode the uint8 data to a string, I am all ears.
Thanks!
I have found that there is no support for this feature in Simulink RT. One option is to use external functions, but I found it better for my application to simply output a number and have a table in the simulation that explained what each number meant.
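For illustration, here is a minimal sketch of that numeric-code workaround, assuming a MATLAB Function block; the function name, the status values, and the code numbers below are hypothetical, and the code-to-meaning table lives outside the model:
function code = status_to_code(status)  % hypothetical MATLAB Function block
%#codegen
% 0 = unknown, 1 = start up, 2 = running (meanings documented in an external table)
if status == 0 || status == 1
    code = 1;
elseif status == 2
    code = 2;
else
    code = 0;
end
end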

MATLAB 'Gather' on Tall Array Never Terminating

I am running R2017a on Windows 10, using a tall array constructed from a datastore object (which was, itself, constructed from tall arrays built on MATLAB matrices of doubles).
What really bugs me about the issue I'm having now is that all my code used to work fine. One day it simply started hanging when I tried to run it.
This is the block in question:
%load
tallTraindat = tall(datastore);
sz = size(tallTraindat);
sz = gather(sz);
numExamples = sz(1);
exampleLen = sz(2);
My tall array is an M x ~1700 single array, constructed from this datastore of smaller single arrays. In a loop:
write(fname,tallEx); % tallEx constructed by tall(someSingles)
folderNames{end+1} = fname;
and then:
ds = datastore(folderNames,'Type','tall');
As you can see, this is about as vanilla as it gets. But the operation 'sz = gather(sz)' simply hangs forever; it never finishes or returns. My parallel pool has started properly, and the gather operation gets to the point where it prints 'Evaluation 100% complete', but it goes nowhere from there. If I pause execution, I'm always taken to a point in RemoteSpmdExecutor, line 129, 'obj.RemoteSpmdController.drainIO( false );'. This line apparently never returns.
EDIT: When I woke up today, it started failing with an error message instead: 'Error using parallel.FevalOnAllFuture/fetchOutputs (line 69)
fetchOutputs could not concatenate the OutputArguments. Set 'UniformOutput' to false.
Cell contents reference from a non-cell array object.'
EDIT: Re-created my default local parallel pool a couple times. Now it's back to hanging on that same line of code.
Based on all my tests, it seems that my issue occurs whenever I construct a tall datastore from more than one tall-array folder, and then construct a tall array off of that datastore.
Tearing my hair out over this one. If anyone even has a suspicion where to look for the source of this problem, I'd appreciate it. I'll try to respond quickly to requests for more info.
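For reference, here is a condensed, self-contained version of the workflow described above, in case someone wants to try reproducing it (the array sizes, number of parts, and folder names are illustrative placeholders, not my real data):
folderNames = {};
for k = 1:3
    someSingles = rand(1000, 1700, 'single');             % placeholder data
    fname = fullfile(tempdir, sprintf('tallpart%d', k));  % illustrative folder name
    write(fname, tall(someSingles));                      % write one tall-array folder
    folderNames{end+1} = fname;                           %#ok<AGROW>
end
ds = datastore(folderNames, 'Type', 'tall');
tallTraindat = tall(ds);
sz = gather(size(tallTraindat));                          % this is the call that hangs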

Spark::KMeans calls takeSample() twice?

I have a lot of data, and I have experimented with partitions of cardinality in the [20k, 200k+] range.
I call it like this:
from pyspark.mllib.clustering import KMeans, KMeansModel
C0 = KMeans.train(first, 8192, initializationMode='random', maxIterations=10, seed=None)
C0 = KMeans.train(second, 8192, initializationMode='random', maxIterations=10, seed=None)
and I see that initRandom() calls takeSample() once.
The takeSample() implementation doesn't appear to call itself recursively, so I would expect KMeans() to call takeSample() only once. Why, then, does the monitor show two takeSample()s per KMeans()?
Note: I run several KMeans() calls and they all invoke two takeSample()s, regardless of whether the data is .cache()'d or not.
Moreover, the number of partitions doesn't affect how many times takeSample() is called; it's constant at 2.
I am using Spark 1.6.2 (and I cannot upgrade) and my application is in Python, if that matters!
I brought this to the Spark developers' mailing list, so I am updating:
Details of 1st takeSample():
Details of 2nd takeSample():
where one can see that the same code is executed.
As suggested by Shivaram Venkataraman on Spark's mailing list:
I think takeSample itself runs multiple jobs if the amount of samples collected in the first pass is not enough. The comment and code path at GitHub should explain when this happens. Also you can confirm this by checking if the logWarning shows up in your logs.
// If the first sample didn't turn out large enough, keep trying to take samples;
// this shouldn't happen often because we use a big multiplier for the initial size
var numIters = 0
while (samples.length < num) {
  logWarning(s"Needed to re-sample due to insufficient sample size. Repeat #$numIters")
  samples = this.sample(withReplacement, fraction, rand.nextInt()).collect()
  numIters += 1
}
However, as one can see, the comment says this shouldn't happen often, yet for me it happens every time, so if anyone has another idea, please let me know.
It was also suggested that this was only a UI problem and that takeSample() was actually called once, but that turned out not to be the case.

Trouble with running a parallelized script from a driver script

I'm trying to parallelize my code, and I finally got the parfor loops set up so that MATLAB doesn't crash every time. However, I've now got an error that I can't seem to figure out.
I have a driver script (Driver12.m) that calls the script I'm trying to parallelize (Worker12.m). If I run Worker12.m directly, it usually finishes with no problem. However, every time I try to run it from Driver12.m, it either 1) causes MATLAB to crash, or 2) throws a strange error at me. Here's some of my code:
%Driver script
run('(path name)/Worker12.m');
%Relevant worker script snippet
parfor q = 1:number_of_ranges
    timenumber = squeeze(new_TS(q,:,:));
    timenumber_shift = circshift(timenumber, [0 1]);
    for m = 1:total_working_channels
        timenumberm = timenumber(m,:);
        for n = 1:total_working_channels
            R_P(m,n,q) = mean(timenumberm.*conj(timenumber(n,:)),2);
            R_V(m,n,q) = mean(timenumberm.*conj(timenumber_shift(n,:)),2);
        end
    end
end
Outcome #1: "Matlab has encountered an unexpected error and needs to close."
Outcome #2: "An UndefinedFunction error was thrown on the workers for ''. This might be because the file containing '' is not accessible on the workers. Use addAttachedFiles(pool, files) to specify the required files to be attached. See the documentation for 'parallel.Pool/addAttachedFiles' for more details. Caused by: Undefined function or variable ""."
However, if I run Worker12.m directly, it works fine. It's only when I run it from the driver script that I get issues. Obviously, this error message from Outcome #2 isn't all that useful. Any suggestions?
Edit: So I created a toy example that reproduces an error, but now both my toy example and the original code are giving me a new, 3rd error. Here's the toy example:
%Driver script
run('parpoolexample.m')
%parpoolexample.m
clear all
new_TS = rand([1000,32,400]);
[number_of_ranges,total_working_channels,~] = size(new_TS);
R_P = zeros(total_working_channels,total_working_channels,number_of_ranges);
R_V = zeros(total_working_channels,total_working_channels,number_of_ranges);
parfor q = 1:number_of_ranges
    timenumber = squeeze(new_TS(q,:,:));
    timenumber_shift = circshift(timenumber, [0 1]);
    for m = 1:total_working_channels
        timenumberm = timenumber(m,:);
        for n = 1:total_working_channels
            R_P(m,n,q) = mean(timenumberm.*conj(timenumber(n,:)),2);
            R_V(m,n,q) = mean(timenumberm.*conj(timenumber_shift(n,:)),2);
        end
    end
end
Outcome #3: "Index exceeds matrix dimensions (line 7)."
So, at the 'parfor' line, it's saying that I'm exceeding the matrix dimensions, even though I believe that should not be the case. Now I can't even get my original script to recreate Outcomes #1 or #2.
Don't use run with parallel language constructs like parfor and spmd. Unfortunately it doesn't work very well. Instead, use cd or addpath to let MATLAB see your script.
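For example, a minimal sketch of what the driver could do instead (the '(path name)' placeholder below stands for wherever Worker12.m actually lives, as in the question):
addpath('(path name)');   % make Worker12.m visible on the MATLAB path
Worker12;                 % call the script by name instead of run('(path name)/Worker12.m')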

Index error when filling in an array with a randomly generated array

I am getting an index error when running this bit of code. I have run through the logic several times and have yet to catch my error, and I am thinking it is in the way I coded this section. Any help would be greatly appreciated. Please let me know if I am missing any information vital to this bit of code.
index_pairs = [1,12661;12662,46147;46148,52362];
group_class_count = [10137,2524;127448,20738;1570,4645];
group_count = 3;
cross_sections = 10;
for j = 1:group_count
    % Creates an index of random rows for the current group.
    rand_index = randsample(index_pairs(j,1):index_pairs(j,2), (group_class_count(j,1)+group_class_count(j,2)), true);
    cross_size(j) = floor(size(rand_index,2)/cross_sections);
    for k = 1:cross_sections
        cross_rand_indices(j,k) = {rand_index(cross_size*(k-1)+1:cross_size*(k))};
    end
end
error: Index exceeds matrix dimensions. Error in cross_rand_indices(j,k)={rand_index(cross_size*(k-1)+1:cross_size*(k))};
If you change
cross_rand_indices(j,k)={rand_index(cross_size*(k-1)+1:cross_size*(k))};
to
cross_rand_indices(j,k)={rand_index(cross_size(j)*(k-1)+1:cross_size(j)*(k))};
the error will disappear.
I assume this is in line with your intent, since you save a per-group chunk size into cross_size(j) in the outer loop.
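To spell it out: without the (j) index, cross_size becomes a vector once j > 1, so the colon range can end up built from the wrong group's chunk size and run past the end of rand_index. A corrected inner loop, written out step by step (idxStart and idxEnd are just names added here for readability):
for k = 1:cross_sections
    idxStart = cross_size(j)*(k-1) + 1;   % first element of the k-th chunk for group j
    idxEnd   = cross_size(j)*k;           % last element of the k-th chunk for group j
    cross_rand_indices{j,k} = rand_index(idxStart:idxEnd);
end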