Look for multiple values in a cell array at the same time in Matlab - matlab

I have CellArray1 with 50 unique strings and CellArray2 with 2000 unique strings (50 of which are the same as the ones in CellArray1). Is there a way to find the positions of all 50 unique strings from the first cell array in the second cell array without using loops?

Yes - the following code demonstrates this:
cellArray1 = {'hello', 'world'};
cellArray2 = {'good', 'morning', 'world'};
overlap = find(ismember(cellArray2, cellArray1)};
This will return the value 3 in overlap since cellArray2{3} appears in cellArray1.
UPDATE
The above code returns the indices, but not in the order of the original. If you need the original order, you can do the following
overlap = cellfun(#(x)find(ismember(cellArray2, x)), cellArray1, 'uniformOutput', false);
overlapSorted = cell2mat(overlap);
It could be argued that cellfun actually has an implicit loop in it (but then all vector operations have implicit loops, really); but one of these constructions will do what you asked for. If you don't need it sorted, the first will be significantly faster, I imagine.

Related

Most efficient way to store dictionaries in Matlab

I have a set of IDs associated with costs which is just a double value. IDs are integers and unique. Two IDs may have same costs. I stored them as:-
a=containers.Map('KeyType','uint32','ValueType','double');
a(1)=7.3
a(2)=8.4
a(3)=7.3
Now i want to find the minimum cost.
b=[];
c=values(a);
b=[b,c{:}];
cost_min=min(b);
Now i want to find all IDs associated i.e. 1 and 3 with the minimum cost i.e. 7.3. I can collect all the keys into an array and then do a for loop over this array. Is there a better way to do this entire thing in Matlab so that for loops are not required?
sparse matrix can work as a hashmap, just do this:
a= sparse(1:3,1,[7.3 8.4 7.3])
find(a == min(nonzeros(a))
There are methods which can be used on maps for this kind of operations
http://se.mathworks.com/help/matlab/ref/containers.map-class.html
The approach finding minimum values and minimum keys can be done something like this,
a=containers.Map('KeyType','uint32','ValueType','double');
a(1)=7.3;
a(3)=8.4;
a(4)=7.3;
minval = inf;
minkeys = -1;
for k = keys(a)
val = a.values(k);
val = val{1};
if (val < minval(1))
minkeys = k;
minval = val;
elseif (val == minval(1))
minkeys = [minkeys,k];
end
end
disp(minval);
disp(minkeys);
This is not efficient though and value search is clumsy for maps. This is not what they are intended for. Maps is supposed to do efficient key lookup. In case you are going to do a lot of lookups and this is what takes time, then use a map. If you need to do a lot of value searches, I would recommend that you use a matrix (or two arrays) for this instead.
idx = [1;3;4];
val = [7.3,8.3,7.3];
minval = min(val);
minidx = idx(val==minval);
disp(minval);
disp(minidx);
There is also another post with an example where it is shown how a sparse matrix can be used as a hashmap. Let the index become the key. This will take about 3 times the memory as all non-zero elements an ordinary array, but a map uses more memory than an array as well.

I'm having trouble shuffling a deck of cards in Matlab. Need help to see where I went wrong

I currently have the deck of cards coded, but it is unshuffled. This is for programming the card game of War if it helps. I need to shuffle the deck, but whenever I do, it will only shuffle together the card numbers and the suits, not the full card. For example, I have A identified as an ace and the suits come after each number. A normal card would be "AH" (an ace of hearts) or "6D" (a six of diamonds). Instead, it will output "5A" as one of the cards, as in a 5 of aces. I don't know how to fix this, but the code that I currently have is this:
card_nums = ('A23456789TJQK')';
card_suits = ('HDSC')';
unshuffled_deck = [repmat(card_nums,4,1),repmat(card_suits,13,1)];
disp(unshuffled_deck)
shuffled_deck = unshuffled_deck(randperm(numel(unshuffled_deck)));
disp(shuffled_deck)
I would appreciate any help with this, and thank you very much for your time!
You're creating a random permutation of all of the elements from both columns of unshuffled_deck combined. Instead you need to create a random permutation of the rows of unshuffled_deck:
shuffled_deck = unshuffled_deck(randperm(size(unshuffled_deck,1)),:);
The call to size gives you the number of rows in the deck array, then we get a random permutation of the row indices, and copy the row (value, suit) as a single entity.
Here's a version using a structure array in response to #Carl Witthoft's comment. I was afraid it would add too much complexity to the solution, but it really isn't bad:
card_nums = ('A23456789TJQK')';
card_suits = ('HDSC')';
deck_nums = repmat(card_nums,4,1);
deck_suits = repmat(card_suits,13,1);
cell_nums = cellstr(deck_nums).'; %// Change strings to cell arrays...
cell_suits = cellstr(deck_suits).'; %// so we can use them in struct
%// Construct a struct array with fields 'value' and 'suit'
unshuffled_deck = struct('value',cell_nums,'suit',cell_suits);
disp('unshuffled deck:');
disp([unshuffled_deck.value;unshuffled_deck.suit]);
%// Shuffle the deck using the number of elements in the structure array
shuffled_deck = unshuffled_deck(randperm(numel(unshuffled_deck)));
disp('shuffled deck:');
disp([shuffled_deck.value; shuffled_deck.suit]);
Here's a test run:
unshuffled deck:
A23456789TJQKA23456789TJQKA23456789TJQKA23456789TJQK
HDSCHDSCHDSCHDSCHDSCHDSCHDSCHDSCHDSCHDSCHDSCHDSCHDSC
shuffled deck:
4976TT93KTJQJATK953A75QA82Q6226K5J784J4A3372486K859Q
CHSSSHCDSCSSHDDCDSHHCDHSDDCDHCCHHCHHHDDCSCDSSCHDSCSD
To access an individual card, you can do:
>> shuffled_deck(2)
ans =
scalar structure containing the fields:
value = 9
suit = H
Or you can access the individual fields:
>> shuffled_deck(2).value
ans = 9
>> shuffled_deck(2).suit
ans = H
Unfortunately, I don't know of any way to simply index the struct array and get, for instance, 9H as you would in a regular array using disp(shuffled_deck(2,:)). In this case, the only option I know of is to explicitly concatenate each field:
disp([shuffled_deck(2).value,shuffled_deck(2).suit]);

find value in a string of cell considering some margin

Suppose that I have a string of values corresponding to the height of a group of people
height_str ={'1.76000000000000';
'1.55000000000000';
'1.61000000000000';
'1.71000000000000';
'1.74000000000000';
'1.79000000000000';
'1.74000000000000';
'1.86000000000000';
'1.72000000000000';
'1.82000000000000';
'1.72000000000000';
'1.63000000000000'}
and a single height value.
height_val = 177;
I would like to find the indices of the people that are in the range height_val +- 3cm.
To find the exact match I would do like this
[idx_height,~]=find(ismember(cell2mat(height_str),height_val/100));
How can I include the matches in the previous range (174-180)?
idx_height should be = [1 5 6 7]
You can convert you strings into an numeric array (as #Divakar mentioned) by
height = str2num(char(height_str))*100; % in cm
Then just
idx_height = find(height>=height_val-3 & height<=height_val+3);
Assuming that the precision of heights stays at 0.01cm, you can use a combination of str2double and ismember for a one-liner -
idx_height = find(ismember(str2double(height_str)*100,[height_val-3:height_val+3]))
The magic with str2double is that it works directly with cell arrays to get us a numeric array without resorting to a combined effort of converting that cell array to a char array and then to a numeric array.
After the use of str2double, we can use ismember as you tried in your problem to get us the matches as a logical array, whose indices are picked up with find. That's the whole story really.
Late addition, but for binning my first choice would be to go with bsxfun and logical operations:
idx_height = find(bsxfun(#le,str2double(height_str)*100,height_val+3) & ...
bsxfun(#ge,str2double(height_str)*100,height_val-3))

Is there a quick way to assign unique text entries in an array a number?

In MatLab, I have several data vectors that are in text. For example:
speciesname = [species1 species2 species3];
genomelength = [8 10 5];
gonometype = [RNA DNA RNA];
I realise that to make a plot, arrays must be numerical. Is there a quick and easy way to assign unique entries in an array a number, for example so that RNA = 1 and DNA = 2? Note that some arrays might not be binary (i.e. have more than two options).
Thanks!
So there is a quick way to do it, but im not sure that your plots will be very intelligible if you use numbers instead of words.
You can make a unique array like this:
u = unique(gonometype);
and make a corresponding number array is just 1:length(u)
then when you go through your data the number of the current word will be:
find(u == current_name);
For your particular case you will need to utilize cells:
gonometype = {'RNA', 'DNA', 'RNA'};
u = unique(gonometype)
u =
'DNA' 'RNA'
current = 'RNA';
find(strcmp(u, current))
ans =
2

How to extract year from a dates cell array in MATLAB?

i have a cell array as below, which are dates. I am wondering how can i extract the year at the last 4 digits? Could anyone teach me how to locate the year in the string? Thank you!
'31.12.2001'
'31.12.2000'
'31.12.2004'
'31.12.2003'
'31.12.2002'
'31.12.2000'
'31.12.1999'
'31.12.1998'
'31.12.1997'
'31.12.2005'
'31.12.2004'
'31.12.2003'
'31.12.2002'
'31.12.2001'
'31.12.2000'
'31.12.1999'
'31.12.1998'
'31.12.2005'
'31.12.2004'
'31.12.2003'
'31.12.2002'
'31.12.2005'
Example cell array:
A = {'31.12.2001'; '31.12.2002'; '31.12.2003'};
Apply some regular expressions:
B = regexp(A, '\d\d\d\d', 'match')
B = [B{:}];
EDIT: I never realized that matlab will "nest" an extra layer of cells until I tested this. I don't like this solution as much now that I know the second line is necessary. Here is an alternative approach that gets you the years in numeric form:
C = datevec(A, 'dd.mm.yyyy');
C = C(:, 1);
SECOND EDIT: Suprisingly, if your cell array has less than 10000 elements, the regexp approach is faster on my machine. But the output of it is another cell array (which takes up much more memory than a numeric matrix). You can use B = cell2mat(B) to get a character array instead, but this brings the two approaches to approximately equal efficiency.
Just to add a fun answer, designed to take the OP to the stranger regions of Matlab:
C = char(C);
y = (D(:,7:end)-'0') * 10.^(3:-1:0).'
which is an order of magnitude faster than anything posted in the other answers :)
Or, to stay a bit closer to home,
y = cellfun(#(x)str2double(x(7:end)),C);
or, yet another regexp variation:
y = str2num(char(regexprep(C, '\d+\.\d+\.','')));
Assuming your matrix with dates is M or a cell array C:
In case your data is in a cell array start with
M = cell2mat(C)
Then get the relevant part
Y=M(:,end-4:end)
If required you can even make the year a number
Year = str2num(Y)
Using regexp this will works also with dates with slightly different formats, like 1.1.2000, which can mess with you offsets
res = regexp(dates, '(?<=\d+\.\d+\.)\d+', 'match')