I have a very long list of x,y,z coordinates of items (200K-800K rows by 3 columns) that I need to search through to find the items nearest to a particular point. The resulting list always has at least 1 item and usually fewer than 10.
I've tried a few simple search methods to find this list but I've hit a bit of a limit. Here are my best methods to date:
Method 1 - find Indexing
xInd = find(PositionsList(:,1) > (searchPoint(i,1) - searchRad) & PositionsList(:,1) < (searchPoint(i,1) + searchRad));
yInd = find(PositionsList(xInd,2) > (searchPoint(i,2) - searchRad) & PositionsList(xInd,2) < (searchPoint(i,2) + searchRad));
xyInd = xInd(yInd);
zInd = find(PositionsList(xyInd,3) > (searchPoint(i,3) - searchRad) & PositionsList(xyInd,3) < (searchPoint(i,3) + searchRad));
xyzInd = xyInd(zInd);
Method 2 - Brute force distance search
neighbours = sqrt(sum(bsxfun(@minus,searchPoint(i,:),PositionsList).^2,2)) <= searchRad;
xyzInd = find(neighbours == 1);
Method 3 - logical Indexing
xInd = PositionsList(:,1) > (searchPoint(i,1) - searchRad) & PositionsList(:,1) < (searchPoint(i,1) + searchRad);
newlist = PositionsList(xInd==1,:);
yzInd = newlist(:,2) > (searchPoint(i,2) - searchRad) & newlist(:,2) < (searchPoint(i,2) + searchRad)...
& newlist(:,3) > (searchPoint(i,3) - searchRad) & newlist(:,3) < (searchPoint(i,3) + searchRad);
xyzInd = newlist(yzInd==1,:);
For my data method 1 is much quicker - for a small list of 20000 particles it runs in about 25s whereas method 2 runs in about 170s, but method 2 is slightly more accurate - it has dubious neighbours (outlier on edge of search area) much less often.
My code calls this search several thousand times so I'm keen to save as much time on it as possible - it currently makes up about 85% of my run-time. I've read that a mex implementation may be much quicker but I'm not familiar with mex. I've also tried a 3rd method using logical indexing rather than find, but it is slower at 35s.
Can someone help with making this search faster? Maybe a mex function?
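For what it's worth, the speed of Method 1 and the accuracy of Method 2 can be combined: use the cheap box test to shortlist candidates, then apply the exact distance test only to that shortlist. A minimal sketch on made-up random data (the data, the single search point and the radius are assumptions for illustration, not the real particle list):

```matlab
% Made-up data standing in for the real particle list
rng(1);
PositionsList = rand(20000, 3);
searchPoint   = [0.5 0.5 0.5];
searchRad     = 0.05;

% Step 1: cheap axis-aligned box test (as in Method 1)
lo = searchPoint - searchRad;
hi = searchPoint + searchRad;
inBox = PositionsList(:,1) >= lo(1) & PositionsList(:,1) <= hi(1) & ...
        PositionsList(:,2) >= lo(2) & PositionsList(:,2) <= hi(2) & ...
        PositionsList(:,3) >= lo(3) & PositionsList(:,3) <= hi(3);
boxInd = find(inBox);

% Step 2: exact distance test (as in Method 2), only on the shortlist.
% Comparing squared distances avoids the sqrt.
d2 = sum(bsxfun(@minus, PositionsList(boxInd,:), searchPoint).^2, 2);
xyzInd = boxInd(d2 <= searchRad^2);
```

The box corners are what produce the "dubious neighbours" in Method 1; the second step removes them while touching only a handful of rows.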
Following on from obchardon's suggestion, I had a look at k-d trees for searching, found "kd-tree for matlab" on the File Exchange, and began testing it with my data.
Now I can complete a search in under 5s for 500K coordinates, compared with my previous best of 25s for 20K sets of coordinates. Huge improvement.
The only downside is that the order in which it returns the neighbours seems to be random, which means I have some slight "caressing" of the results to do, but with the time saved this is more than acceptable.
Thanks for the great suggestion!!
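For readers without the File Exchange package, a similar radius query can be sketched with KDTreeSearcher and rangesearch. Note the assumptions: this requires the Statistics and Machine Learning Toolbox, it is not the File Exchange kd-tree used above, and the data below is random and made up. Sorting the returned indices is one way to deal with the ordering issue.

```matlab
% Made-up data standing in for the real coordinate list
rng(0);
PositionsList = rand(5000, 3);
searchPoint   = [0.5 0.5 0.5];
searchRad     = 0.1;

% Build the tree once, then reuse it for every query point
tree = KDTreeSearcher(PositionsList);

% rangesearch returns a cell array with one index vector per query point
idx = rangesearch(tree, searchPoint, searchRad);

% Put the neighbour indices in ascending row order
xyzInd = sort(idx{1}(:));
```

Building the tree is the expensive part, so it pays off when the same point list is queried thousands of times.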
I am trying to get the connected components of an image and then run OCR on each connected component. This is my code:
clc
image=imread('im.png');
image=imcomplement(image);
[imx imy]=size(image);
n1=zeros(imx,imy);
symb=zeros(imx,imy);
lin=zeros(imx,imy);
L = bwlabel(image,8) ;%Calculating connected components
mx=max(max(L));
for i=1:mx
[r,c] = find(L==i);
n1=zeros(imx,imy);
rc = [r c];
[sx sy]=size(rc);
for j=1:sx
x1=rc(j,1);
y1=rc(j,2);
n1(x1,y1)=1;
end
figure,imshow(n1);title('components');
r = ocr(n1,'TextLayout','Word')
n=strtrim(r.Text);
end
This is my input image-
One of the connected components which I get is this-
I get this when I display the components in the 4th last line. But in the next line I don't get any result from ocr for this component. So my question is: why am I not getting OCR output for this component, whereas all other components give some result?
If, instead of im.png, I use this component as the input in the very first line of the code, I do get an OCR result for it. Why is this happening?
Edit: if I use this component as the input image, I get the OCR result.
I don't know exactly what you want to achieve (it was not very clear from your post). But if what you want is to extract the letters in a chemical formula the below code will do the trick.
I = imread('6oua6.png');
s = regionprops(~I,{'BoundingBox'});
for ii=1:numel(s)
bb = s(ii).BoundingBox;
if bb(4)<30 % enforce a limit to discard non-letters
ocr(I,s(ii).BoundingBox, 'TextLayout', 'Block' )
rectangle('Position',s(ii).BoundingBox+[-1 -1 2 2 ],'Edgecolor','y')
pause
end
end
The ocr will identify the letters correctly (at least from the image you supplied).
Combining them further will just require you to construct some rules.
enjoy.
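A side note on building the per-component images: the inner loop in the question that fills n1 pixel by pixel can be replaced by a logical comparison on the label matrix. A small sketch on a made-up 3x5 binary image (the image is an assumption for illustration, and bwlabel needs the Image Processing Toolbox):

```matlab
% Made-up binary image with two 8-connected components
bw = logical([1 1 0 0 0;
              1 1 0 0 1;
              0 0 0 1 1]);

[L, num] = bwlabel(bw, 8);   % label matrix and number of components

masks = cell(1, num);
for k = 1:num
    masks{k} = (L == k);     % logical mask containing only component k
end
```

Each masks{k} has the same size as the input image, so it can be displayed with imshow or passed on to further processing in place of n1.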
Alrighty everybody, it's the time of the week where I learn how to do weird things with MATLAB. This week it's DJing. What I need to do is figure out how to make my function output the name of the song whose length is closest to the time left. For instance, if I'm showing off my DJing skills and I have 3:22 left, I have to pick a song whose length is closest to the time left (can be shorter or longer). I'm given a .txt file to choose from.
Test Case
song1 = pickSong('Funeral.txt', '3:13')
song1 => 'Neighborhood #2 (Laika)'
The file for this looks like:
1. Neighborhood #1 (Tunnels) - 4:48
2. Neighborhood #2 (Laika) - 3:33
3. Une annee sans lumiere - 3:40
4. Neighborhood #3 (Power Out) - 5:12
5. Neighborhood #4 (7 Kettles) - 4:49
6. Crown of Love - 4:42
7. Wake Up - 5:39
8. Haiti - 4:07
9. Rebellion (Lies) - 5:10
10. In the Backseat - 6:21
I have most of it planned out; what I'm having an issue with is populating my cell array. It only puts in the last song, and then changes it to a -1 after my loop runs. I've tried doing it three different ways, the last one being the most complex (and gross looking, sorry). Once I get the cell array into its proper form (as the full song list and not just -1) I should be in the clear.
function[song] = pickSong(file_name,time_remain)
Song_list = fopen(file_name, 'r'); %// Opens the file
Song_names = fgetl(Song_list); %// Retrieves the lines, or song names here
Songs_in = ''; %// I had this as a cell array first, but tried to populate a string this time
while ischar(Songs) %// My while loop to pull out the song names
Songs_in = {Songs_in, Songs};
Songs = fgetl(Song_list);
if ischar(Songs_in) %//How I was trying to populate my string
song_info = [];
while ~isempty(Songs_in)
[name, time] = strtok(Songs_in);
song_info = [song_info {name}];
end
end
end
[songs, rest] = strtok(Songs, '-');
[minutes, seconds] = strtok(songs, ':');
[minutes2, seconds2] = strtok(time_remain, ':')
all_seconds = (minutes*60) + seconds; %// Converting the total time into seconds
all_seconds2 = (minutes2*60) + seconds2;
song_times = all_seconds;
time_remain = all_seconds2
time_remain = min(time_remain - song_times);
fclose(file_name);
end
Please and thank you for the help :)
A troublesome case:
song3 = pickSong('Resistance.txt', '3:57')
song3 => 'Exogenesis: Symphony Part 2 (Cross-Pollination)'
1. Uprising - 5:02
2. Resistance - 5:46
3. Undisclosed Desires - 3:56
4. United States of Eurasia (+Collateral Damage) - 5:47
5. Guiding Light - 4:13
6. Unnatural Selection - 6:54
7. MK ULTRA - 4:06
8. I Belong to You (+Mon Coeur S'ouvre a Ta Voix) - 5:38
9. Exogenesis: Symphony Part 1 (Overture) - 4:18
10. Exogenesis: Symphony Part 2 (Cross-Pollination) - 3:57
11. Exogenesis: Symphony Part 3 (Redemption) - 4:37
Here is my implementation:
function song = pickSong(filename, time_remain)
% read songs file into a table
t = readSongsFile(filename);
% query song length (in seconds)
len = str2double(regexp(time_remain, '(\d+):(\d+)', ...
'tokens', 'once')) * [60;1];
% find closest match
[~,idx] = min(abs(t.Duration - len));
% return song name
song = t.Title(idx);
end
function t = readSongsFile(filename)
% read the whole file (as a cell array of lines)
fid = fopen(filename,'rt');
C = textscan(fid, '%s', 'Delimiter',''); C = C{1};
fclose(fid);
% parse lines of the form: "0. some name - 00:00"
C = regexp(C, '^(\d+)\.\s+(.*)\s+-\s+(\d+):(\d+)$', 'tokens', 'once');
C = cat(1, C{:});
% extract columns and create a table
t = table(str2double(C(:,1)), ...
strtrim(C(:,2)), ...
str2double(C(:,3:4)) * [60;1], ...
'VariableNames',{'ID','Title','Duration'});
t.Properties.VariableUnits = {'', '', 'sec'};
end
We should get the expected results on the test files:
>> pickSong('Funeral.txt', '3:13')
ans =
'Neighborhood #2 (Laika)'
>> pickSong('Resistance.txt', '3:57')
ans =
'Exogenesis: Symphony Part 2 (Cross-Pollination)'
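The `* [60;1]` trick used in both functions may deserve a short illustration: the regexp tokens are converted to a 1x2 numeric row [minutes seconds], and multiplying by the column [60;1] yields the total number of seconds. A minimal sketch with the time string from the first test case:

```matlab
% Capture minutes and seconds as two tokens: {'3', '13'}
tok = regexp('3:13', '(\d+):(\d+)', 'tokens', 'once');

% str2double on a cell array gives the numeric row [3 13]
parts = str2double(tok);

% [3 13] * [60; 1] = 3*60 + 13*1 = 193 seconds
len = parts * [60; 1];
```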
Note: The code above uses MATLAB tables to store the data, which allows for easy manipulation. For example:
>> t = readSongsFile('Funeral.txt');
>> t.Minutes = fix(t.Duration/60); % add minutes column
>> t.Seconds = rem(t.Duration,60); % add seconds column
>> sortrows(t, 'Duration', 'descend') % show table sorted by duration
ans =
ID Title Duration Minutes Seconds
__ _____________________________ ________ _______ _______
10 'In the Backseat' 381 6 21
7 'Wake Up' 339 5 39
4 'Neighborhood #3 (Power Out)' 312 5 12
9 'Rebellion (Lies)' 310 5 10
5 'Neighborhood #4 (7 Kettles)' 289 4 49
1 'Neighborhood #1 (Tunnels)' 288 4 48
6 'Crown of Love' 282 4 42
8 'Haiti' 247 4 7
3 'Une annee sans lumiere' 220 3 40
2 'Neighborhood #2 (Laika)' 213 3 33
% find songs that are at least 5 minutes long
>> t(t.Minutes >= 5,:)
% songs with the word "Neighborhood" in the title
>> t(~cellfun(@isempty, strfind(t.Title, 'Neighborhood')),:)
I'm going to write an answer using most of what you have already written, instead of suggesting something completely different. Though regexp is a powerful tool (and I like regular expressions), I find that it is too advanced for what you have learned so far, so let's scrap it for now.
This way, you get to learn what was wrong with your code, as well as how awesome of a debugger I am (just kidding). What you have when reading in the text file almost works. You made a good choice in creating a cell array to store all of the strings.
I'm also going to borrow MrAzzaman's logic in calculating the time in seconds through strtok (awesome job btw).
In addition, I'm going to change your logic a bit so that it makes sense to me on how I would do it. Here's the basic algorithm:
1. Open up the file and read the first line (song) as you did in your code.
2. Initialize a cell array that contains the first song in the text file.
3. Until we reach the end of the text file, read in the entire line and add it into the cell array. You've also noticed that as soon as you hit a -1, we don't have any more songs to read, so break out of the loop.
4. Now that we have our songs in a cell array, which include the track number, song and the time for each song, we are going to create two more cell arrays. The first one will store just the times of the songs as strings, with both the minutes and the seconds delimited by :. The next one will just contain the names of the songs themselves. Now, we go through each element in our cell array that we created from Step #3.
(a) To populate the first cell array, I use strfind to find all occurrences of where the - character occurs. Once I find where these occur, I choose the last location of where the - occurs. I use this to index into our song string, and skip over 2 characters to skip over the - character and the space character. We extract all of the characters from this point until the end of the line to extract our times.
(b) To populate the second cell array, I again use strfind, but then I figure out where the spaces occur, and choose the index of where the first space happens. This corresponds to the gap in between the song number and the title of the song. Using my result of the index from (a), I extract the song title by skipping one character from the index of the first space to the index two characters before the last - character to successfully get the song. This is because there will probably be a space in between the last word of the song title and the - character, so we want to remove that space.
5. Next, for each song time in the first cell array computed in Step #4, I use strtok like you have used and split up the string by the :. MrAzzaman has used this as well and I'm going to borrow his logic on computing the total amount of seconds that each time takes.
6. Finally, we figure out which time is the closest to the time remaining. Note that we also need to convert the time remaining into seconds like we did in Step #5. As MrAzzaman has said, you can use the min function in MATLAB, and use the second output of the function. This tells you where in the array the minimum occurred. As such, we simply search for the minimum difference between the time remaining and the length of each song. Take note that you said you don't care whether you go over or under the time remaining. You just want the closest time. In that case, you need to take the absolute value of the time differences. Let's say you had a song that took 3:59 and another song that was 6:00, and the time remaining was 4:00. Assuming that there is no song that is 4:00 long in your track list, you would want to choose the song that is at 3:59. However, if you just subtract the length of the longer track (6:00) from the time remaining, you would get a negative difference, and min would return this track... not the song at 3:59. This is why you need to take the absolute value, so that it does not matter whether you're over or under the time remaining.
7. Once we figure out which song to choose, return the song name that gives us the minimum. Make sure you close the file too!
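To make the point about the absolute value concrete, here is the 3:59 / 6:00 / 4:00 example worked through as a tiny sketch (the variable names wrong and right are just for illustration):

```matlab
song_seconds   = [3*60 + 59, 6*60];   % 3:59 -> 239 s, 6:00 -> 360 s
seconds_remain = 4*60;                % 4:00 -> 240 s

% Without abs: differences are [1, -120], so min picks the 6:00 song
[~, wrong] = min(seconds_remain - song_seconds);

% With abs: differences are [1, 120], so min picks the 3:59 song
[~, right] = min(abs(seconds_remain - song_seconds));
```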
Without further ado, here's the code:
function [song] = pickSong(file_name, time_remain)
% // Open up the file
fid = fopen(file_name, 'r');
%// Read the first line
song_name = fgetl(fid);
%// Initialize cell array
song_list = {song_name};
%// Read in the song list and place
%// each entry into a cell array
while ischar(song_name)
song_name = fgetl(fid);
if song_name == -1
break;
end
song_list = [song_list {song_name}];
end
%// Now, for each entry in our song list, find all occurrences of the '-'
%// with strfind, and choose the last index that '-' occurs at
%// Make sure you skip over by 2 spaces to remove the '-' and the space
song_times = cell(1,length(song_list));
song_names = cell(1,length(song_list));
for idx = 1 : length(song_list)
idxs = strfind(song_list{idx}, '-');
song_times{idx} = song_list{idx}(idxs(end)+2:end);
idxs2 = strfind(song_list{idx}, ' ');
%// Figure out the index of where the first space is, then extract
%// the string that starts from 1 over, to two places before the
%// last '-' character
song_names{idx} = song_list{idx}(idxs2(1)+1 : idxs(end)-2);
end
%// Now we have a list of times for each song. Tokenize by the ':' to
%// separate the minutes and times, then calculate the number of seconds
%// Logic borrowed from MrAzzaman
song_seconds = zeros(1,length(song_list));
for idx = 1 : length(song_list)
[minute_str, second_str] = strtok(song_times{idx}, ':');
song_seconds(idx) = str2double(minute_str)*60 + str2double(second_str(2:end));
end
%// Now, calculate how much time is remaining from the input
[minute_str, second_str] = strtok(time_remain, ':');
seconds_remain = str2double(minute_str)*60 + str2double(second_str(2:end));
%// Now, choose the song that is closest to the amount of time
%// elapsed
[~,song_to_choose] = min(abs(seconds_remain - song_seconds));
%// Return the song you want
song = song_names{song_to_choose};
%// Close the file
fclose(fid);
end
With your two example cases you've shown above, this is the output I get. I've taken the liberty in creating my own text files with your (awesome taste in) music:
>> song1 = pickSong('Funeral.txt', '3:13')
song1 =
Neighborhood #2 (Laika)
>> song2 = pickSong('Resistance.txt', '3:57')
song2 =
Exogenesis: Symphony Part 2 (Cross-Pollination)
You can manage this with textscan, as follows:
function[song,len] = pickSong(file_name,time_remain)
fid = fopen(file_name);
%// Each line: everything up to the first '-', then minutes:seconds
toks = textscan(fid,'%[^-] - %d:%d');
fclose(fid);
songs = toks{1};
song_len = double(toks{2}*60 + toks{3});
[min_rem, sec_rem] = strtok(time_remain, ':');
time_rem = str2double(min_rem)*60 + str2double(sec_rem(2:end));
[len,i] = min(abs(time_rem - song_len));
song = songs{i};
Note that this will only work if none of your song names have a '-' character in them.
EDIT: Here's a solution that (should) work on any song titles:
function[song,len] = pickSong(file_name,time_remain)
file = fileread(file_name);
toks = regexp(file,'\d+. (.*?) - (\d+):(\d+)\n','tokens');
songs = cell(1,length(toks));
song_lens = zeros(1,length(toks));
for i=1:length(toks)
songs{i} = toks{i}{1};
song_lens(i) = str2double(toks{i}{2})*60 + str2double(toks{i}{3});
end
[min_rem, sec_rem] = strtok(time_remain, ':');
time_rem = str2double(min_rem)*60 + str2double(sec_rem(2:end));
[len,i] = min(abs(time_rem - song_lens));
song = songs{i};
regexp is a MATLAB function that runs regular expressions on a string (in this case your file of song names). The string '\d+. (.*?) - (\d+):(\d+)\n' scans each line, extracting the name and length of each song. \d+ matches one or more digits, while .*? matches any characters, but as few as possible. The parentheses mark the groups that are captured in the output. So, we have:
match n digits, followed by a (string), followed by (n-digits):(n-digits)
Everything in parentheses is returned as a cell array to the toks variable. The for loop is just extracting the song names and lengths from the resulting cell array.
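To see what the tokens look like, here is a small sketch that runs the same kind of pattern on a made-up three-line string instead of a file (the string is an assumption for illustration, and the dot after the track number is escaped here, which differs slightly from the pattern above):

```matlab
% Three lines in the same layout as the song files, each ending in a newline
txt = sprintf(['1. Uprising - 5:02\n', ...
               '2. MK ULTRA - 4:06\n', ...
               '3. Exogenesis: Symphony Part 2 (Cross-Pollination) - 3:57\n']);

% One cell per matched line, each holding {title, minutes, seconds}
toks = regexp(txt, '\d+\. (.*?) - (\d+):(\d+)\n', 'tokens');

first_title   = toks{1}{1};
first_seconds = str2double(toks{1}{2})*60 + str2double(toks{1}{3});
last_title    = toks{3}{1};
```

The hyphen inside "Cross-Pollination" does not confuse the match, because the pattern only accepts a hyphen that is surrounded by spaces and followed by a time.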
I have the following problem:
I have over 20 different models which I want to simulate one after another but I want to change the simulation directory each time.
Right now I'm manually changing directory after each simulation (from ./ModelOne to ./ModelTwo) and I'd like to know if there's a way to change it automatically when I initialize or translate the new model.
Regards
Nev
I think the best way is to write a script:
pathOfSave = {"E:\\work\\modelica\\SimulationResult\\Model1\\","E:\\work\\modelica\\SimulationResult\\Model2\\"};
nbSim = 2;
pathOfMod = { "MyModel.",
"MyModel."};
modelsToSimulate = { "Model1",
"Model2"};
//If equdistant=true: ensure that the same number of data points is written in all result files
//store variables at events is disabled.
experimentSetupOutput(equdistant=false, events=false);
//Keep in the plot memory the last nbSim results
experimentSetupOutput(equdistant=false, events=false);
for i in 1:nbSim loop
//delete the result file if it already exists
Modelica.Utilities.Files.removeFile(pathOfSave[i] + modelsToSimulate[i]);
//translate models
translateModel(pathOfMod[i]+modelsToSimulate[i]);
// simulate
simulateModel(
pathOfMod[i]+modelsToSimulate[i],
method="dassl",
stopTime=186350,
numberOfIntervals=nbOfPoi,
resultFile=pathOfSave[i] + modelsToSimulate[i]);
end for;
You can also put the command cd("mynewpath") in the initial algorithm section, if you want it to be attached to the model.
model example
Real variable;
protected
parameter String currDir = Modelica.Utilities.System.getWorkDirectory();
initial algorithm
cd("C:\\Users\\xxx\\Documents\\Dymola\\MyModelFolder");
equation
variable = time;
when terminal() then
cd(currDir);
end when;
end example;
In any case, you can find all Dymola commands in manual one, under the section "builtin commands".
I hope this helps,
Marco
I have a CSV file 1.6 GB large, that I need to feed into matlab. I will have to do this frequently and I need it to run quickly. The file is of the form:
20111205 00:00.2 99.18 6 E
20111205 00:00.2 99.18 5 E
20111205 00:00.2 99.18 1 E
20111205 00:00.2 99.195 5 E
20111205 00:00.2 99.195 5 E
20111205 01:27.0 99.19 5 E
20111205 02:01.4 99.185 1 E
20111205 02:01.4 99.185 1 E
20111205 02:01.4 99.185 1 E
20111205 02:01.4 99.185 1 E
The code I have right now is the following:
tic;
format long g
fid = fopen('C:\Program Files\MATLAB\R2013a\EDU13.csv','r');
[c] = fscanf(fid, '%d,%d:%d.%d,%f,%d,%c');
c = reshape(c, 7, length(c)/7)
toc;
But this is far too slow. I would appreciate a method of getting this CSV file into matlab in the most efficient manner possible. Thank you!
Consider using a binary file format. Binary files are much smaller and don't need to be converted by MATLAB into the binary format. Hence they are much faster to read and write. They may also be more accurate (precision may be higher).
http://www.mathworks.com.au/help/matlab/ref/fread.html
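As a rough illustration of the idea, here is a minimal round trip through a binary file with fwrite and fread. The small matrix and the temporary file name are made up for the example; the real data would first have to be parsed from the CSV once.

```matlab
% Made-up numeric data standing in for the parsed CSV columns
data = [99.18  6;
        99.195 5;
        99.19  1];

fname = [tempname '.bin'];       % temporary file, for illustration only

% Write the matrix as raw doubles (column-major order)
fid = fopen(fname, 'w');
fwrite(fid, data, 'double');
fclose(fid);

% Read it back; the row count must be known to restore the shape
fid = fopen(fname, 'r');
back = fread(fid, [size(data,1) Inf], 'double');
fclose(fid);

delete(fname);                   % clean up the temporary file
```

Because no text parsing is involved, the values come back bit for bit identical.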
The recommended syntax is textscan (http://www.mathworks.com/help/matlab/ref/textscan.html)
Your code would look like this:
fid = fopen('C:\Program Files\MATLAB\R2013a\EDU13.csv','r');
c = textscan(fid, '%d,%d:%d.%d,%f,%d,%c');
fclose(fid);
You end up with a cell array... whether it's worth converting that to another shape really depends on how you want to access the data afterwards.
It is quite likely that this would be faster if you include a loop that allows you to use a smaller, fixed amount of memory for much of the operation. One problem with reading large files is the fact that you don't know ahead of time how big it will be - and that very likely means that Matlab guesses the amount of memory it needs, and frequently has to rescale. That is a very slow operation - if it happens every 1MB, say, then it copies 1 MB once, next 2 MB, then again 3 MB, etc - as you can see it is quadratic in the size of the array.
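The reallocation cost is easy to see on a toy example: the first loop below grows the array on every iteration, the second writes into memory that was allocated once. This is only a sketch with a made-up size n; actual timings depend on the machine and the MATLAB version, so none are quoted here.

```matlab
n = 2e4;   % made-up size, large enough to see a difference

% Growing: MATLAB may have to reallocate and copy as the array gets longer
tic
grown = [];
for k = 1:n
    grown(end+1) = k;
end
tGrown = toc;

% Preallocated: the memory is reserved once, the loop only fills it
tic
pre = zeros(1, n);
for k = 1:n
    pre(k) = k;
end
tPre = toc;
```

Both loops produce the same array; comparing tGrown and tPre shows how much the resizing costs on your system.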
If instead you allocate a fixed amount of memory for the final result, and process in smaller batches, you avoid all that overhead. I'm pretty sure it will be much faster - but you would have to experiment a bit with the block size. That would look something like this:
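block  = 1000;   % lines per batch; experiment with this value
Nlines = 35E6;   % upper bound on the number of lines in the file
fmt    = '%d,%d:%d.%d,%f,%d,%c';
fid = fopen('C:\Program Files\MATLAB\R2013a\EDU13.csv','r');
nums = zeros(Nlines, 6);         % preallocate storage for the six numeric fields
flag = repmat(' ', Nlines, 1);   % preallocate storage for the trailing character
c_offset = 0;
while ~feof(fid)
    temp = textscan(fid, fmt, block);
    bt = numel(temp{1});   % rows read - should be `block`, except for last loop
    if bt == 0, break; end % nothing parsed: stop rather than loop forever
    rows = c_offset + (1:bt);
    for k = 1:6
        nums(rows, k) = double(temp{k});
    end
    flag(rows) = temp{7};
    c_offset = c_offset + bt;
end
fclose(fid);
nums = nums(1:c_offset, :);   % trim the unused preallocated rows
flag = flag(1:c_offset);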
Inspired by @Axon's answer, I implemented a "fast" C program to convert the file to binary, then read it in using Matlab's fread function. Spoiler alert: reading is then 20x faster... although the initial conversion takes a little bit of time.
To make the job in Matlab easier, and the file size smaller, I am converting each of the number fields into an int16 (short integer). For the first field - which looks like a yyyymmdd field - that involves splitting into two smaller numbers; similarly the decimal numbers are converted to two short integers (given the apparent range I think that is valid). All this is recognizing that "to really optimize, you must really know your problem" - so if assumptions are invalid, the results will be too.
Here is the C code:
#include <stdio.h>
int main(){
FILE *fp, *fo;
long int ld1;
int d2, d3, d4, d5, d6, d7;
short int buf[9];
char c8;
int n;
short int year, monthday;
fp = fopen("bigdata.txt", "r");
fo = fopen("bigdata.bin", "wb");
if (fp == NULL || fo == NULL) {
printf("unable to open file\n");
return 1;
}
while ((n = fscanf(fp, "%ld %d:%d.%d %d.%d %d %c\n",
&ld1, &d2, &d3, &d4, &d5, &d6, &d7, &c8)) == 8) {
// split yyyymmdd into year and mmdd so that each part fits in a short
year = ld1 / 10000;
monthday = ld1 - 10000 * year;
// move everything into buffer for single call to fwrite:
buf[0] = year;
buf[1] = monthday;
buf[2] = d2;
buf[3] = d3;
buf[4] = d4;
buf[5] = d5;
buf[6] = d6;
buf[7] = d7;
buf[8] = c8;
fwrite(buf, sizeof(short int), 9, fo);
}
fclose(fp);
fclose(fo);
return 0;
}
The resulting file is about half the size of the original - which is encouraging and will speed up access. Note that it would be a good idea if the output file could be written to a different disk than the input file - it really helps keep data streaming without a lot of time wasted in seek operations.
Benchmark: using a file of 2 M lines as input, this ran in about 2 seconds (same disk). The resulting binary file is read in Matlab with the following:
tic
fid = fopen('bigdata.bin');
d = fread(fid, 'int16');
d = reshape(d, 9, []);
toc
Of course, now if you want to recover the numbers as floating point numbers, you will have to do a little bit of work; but I think it's worth it. One possible problem you will have to solve is the situation where the value after the decimal point has a different number of digits: converting (a,b) into float isn't as simple as "a + b/100" when b > 100... "exercise for the student"?
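One possible take on that exercise, as a sketch only: it assumes every price in the file has either two or three decimal digits, so a fractional part below 100 is read as hundredths and anything else as thousandths. That assumption is mine, based on the sample lines; a value such as 99.005 or 99.1 would be decoded incorrectly, because the number of digits is lost when the fraction is stored as an integer.

```matlab
% Integer and fractional parts as stored by the converter (made-up values
% corresponding to 99.18, 99.195 and 99.19)
a = [99 99 99];
b = [18 195 19];

% ASSUMPTION: two decimal digits unless the fraction needs three
scale = 100 * ones(size(b));
scale(b >= 100) = 1000;

price = a + b ./ scale;
```

A more robust fix would be to have the C program write the number of decimal digits (or a fixed-point value in thousandths) instead of the raw fraction.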
A little benchmarking: The above code took about 0.4 seconds. By comparison, my first suggestion with textscan took about 9 seconds on the same file; and your original code took a little over 11 seconds. The difference may get bigger when the file gets bigger.
If you do this a lot (as you said), it clearly is worth converting your files once to binary format, and using them that way. Especially if the file needs to be converted only once, and read many times, the savings will be considerable.
update
I repeated the benchmark with a 13M line file. The conversion took 13 seconds, the binary read < 3 seconds. By contrast each of the other two methods took over a minute (textscan: 61s; fscanf: 77s). It seems that things are scaling linearly (file size 470M text, 240M binary)