Is there a quick way to assign unique text entries in an array a number? - matlab

In MatLab, I have several data vectors that are in text. For example:
speciesname = [species1 species2 species3];
genomelength = [8 10 5];
gonometype = [RNA DNA RNA];
I realise that to make a plot, arrays must be numerical. Is there a quick and easy way to assign unique entries in an array a number, for example so that RNA = 1 and DNA = 2? Note that some arrays might not be binary (i.e. have more than two options).
Thanks!

So there is a quick way to do it, but im not sure that your plots will be very intelligible if you use numbers instead of words.
You can make a unique array like this:
u = unique(gonometype);
and make a corresponding number array is just 1:length(u)
then when you go through your data the number of the current word will be:
find(u == current_name);
For your particular case you will need to utilize cells:
gonometype = {'RNA', 'DNA', 'RNA'};
u = unique(gonometype)
u =
'DNA' 'RNA'
current = 'RNA';
find(strcmp(u, current))
ans =
2

Related

Most efficient way to store dictionaries in Matlab

I have a set of IDs associated with costs which is just a double value. IDs are integers and unique. Two IDs may have same costs. I stored them as:-
a=containers.Map('KeyType','uint32','ValueType','double');
a(1)=7.3
a(2)=8.4
a(3)=7.3
Now i want to find the minimum cost.
b=[];
c=values(a);
b=[b,c{:}];
cost_min=min(b);
Now i want to find all IDs associated i.e. 1 and 3 with the minimum cost i.e. 7.3. I can collect all the keys into an array and then do a for loop over this array. Is there a better way to do this entire thing in Matlab so that for loops are not required?
sparse matrix can work as a hashmap, just do this:
a= sparse(1:3,1,[7.3 8.4 7.3])
find(a == min(nonzeros(a))
There are methods which can be used on maps for this kind of operations
http://se.mathworks.com/help/matlab/ref/containers.map-class.html
The approach finding minimum values and minimum keys can be done something like this,
a=containers.Map('KeyType','uint32','ValueType','double');
a(1)=7.3;
a(3)=8.4;
a(4)=7.3;
minval = inf;
minkeys = -1;
for k = keys(a)
val = a.values(k);
val = val{1};
if (val < minval(1))
minkeys = k;
minval = val;
elseif (val == minval(1))
minkeys = [minkeys,k];
end
end
disp(minval);
disp(minkeys);
This is not efficient though and value search is clumsy for maps. This is not what they are intended for. Maps is supposed to do efficient key lookup. In case you are going to do a lot of lookups and this is what takes time, then use a map. If you need to do a lot of value searches, I would recommend that you use a matrix (or two arrays) for this instead.
idx = [1;3;4];
val = [7.3,8.3,7.3];
minval = min(val);
minidx = idx(val==minval);
disp(minval);
disp(minidx);
There is also another post with an example where it is shown how a sparse matrix can be used as a hashmap. Let the index become the key. This will take about 3 times the memory as all non-zero elements an ordinary array, but a map uses more memory than an array as well.

I'm having trouble shuffling a deck of cards in Matlab. Need help to see where I went wrong

I currently have the deck of cards coded, but it is unshuffled. This is for programming the card game of War if it helps. I need to shuffle the deck, but whenever I do, it will only shuffle together the card numbers and the suits, not the full card. For example, I have A identified as an ace and the suits come after each number. A normal card would be "AH" (an ace of hearts) or "6D" (a six of diamonds). Instead, it will output "5A" as one of the cards, as in a 5 of aces. I don't know how to fix this, but the code that I currently have is this:
card_nums = ('A23456789TJQK')';
card_suits = ('HDSC')';
unshuffled_deck = [repmat(card_nums,4,1),repmat(card_suits,13,1)];
disp(unshuffled_deck)
shuffled_deck = unshuffled_deck(randperm(numel(unshuffled_deck)));
disp(shuffled_deck)
I would appreciate any help with this, and thank you very much for your time!
You're creating a random permutation of all of the elements from both columns of unshuffled_deck combined. Instead you need to create a random permutation of the rows of unshuffled_deck:
shuffled_deck = unshuffled_deck(randperm(size(unshuffled_deck,1)),:);
The call to size gives you the number of rows in the deck array, then we get a random permutation of the row indices, and copy the row (value, suit) as a single entity.
Here's a version using a structure array in response to #Carl Witthoft's comment. I was afraid it would add too much complexity to the solution, but it really isn't bad:
card_nums = ('A23456789TJQK')';
card_suits = ('HDSC')';
deck_nums = repmat(card_nums,4,1);
deck_suits = repmat(card_suits,13,1);
cell_nums = cellstr(deck_nums).'; %// Change strings to cell arrays...
cell_suits = cellstr(deck_suits).'; %// so we can use them in struct
%// Construct a struct array with fields 'value' and 'suit'
unshuffled_deck = struct('value',cell_nums,'suit',cell_suits);
disp('unshuffled deck:');
disp([unshuffled_deck.value;unshuffled_deck.suit]);
%// Shuffle the deck using the number of elements in the structure array
shuffled_deck = unshuffled_deck(randperm(numel(unshuffled_deck)));
disp('shuffled deck:');
disp([shuffled_deck.value; shuffled_deck.suit]);
Here's a test run:
unshuffled deck:
A23456789TJQKA23456789TJQKA23456789TJQKA23456789TJQK
HDSCHDSCHDSCHDSCHDSCHDSCHDSCHDSCHDSCHDSCHDSCHDSCHDSC
shuffled deck:
4976TT93KTJQJATK953A75QA82Q6226K5J784J4A3372486K859Q
CHSSSHCDSCSSHDDCDSHHCDHSDDCDHCCHHCHHHDDCSCDSSCHDSCSD
To access an individual card, you can do:
>> shuffled_deck(2)
ans =
scalar structure containing the fields:
value = 9
suit = H
Or you can access the individual fields:
>> shuffled_deck(2).value
ans = 9
>> shuffled_deck(2).suit
ans = H
Unfortunately, I don't know of any way to simply index the struct array and get, for instance, 9H as you would in a regular array using disp(shuffled_deck(2,:)). In this case, the only option I know of is to explicitly concatenate each field:
disp([shuffled_deck(2).value,shuffled_deck(2).suit]);

Complementary array Matlab

We've got an array of values, and we would like to create another array whose values are not in the first one.
Example:
load('internet.mat')
The first column contains the values in MBs, we have thought in something like:
MB_no = setdiff(v, internet(:,1))
where v is a 0 vector whose length equals to the number of rows in internet.mat. But it just doesn't work.
So, how do we do this?
You need to specify the range of possible values to define what values are not in internet . Say the range is v = 1:10 then setdiff(v,internet(:,1)) will give you the values in 1:10 that are not in the first column of internet.
It seems as if you don't want the first column.
You can simply do:
MB_no=internet(:,2:end);
assuming internet(:,1) has only positive integers and you wish to find which are the integers in [1,...,max( internet(:,1) )] that do not appear in that range you can simply do
app = [];
app( internet(:,1) ) = 1;
MB_no = find( app == 0 );
This is somewhat like bucket sort.

How to extract year from a dates cell array in MATLAB?

i have a cell array as below, which are dates. I am wondering how can i extract the year at the last 4 digits? Could anyone teach me how to locate the year in the string? Thank you!
'31.12.2001'
'31.12.2000'
'31.12.2004'
'31.12.2003'
'31.12.2002'
'31.12.2000'
'31.12.1999'
'31.12.1998'
'31.12.1997'
'31.12.2005'
'31.12.2004'
'31.12.2003'
'31.12.2002'
'31.12.2001'
'31.12.2000'
'31.12.1999'
'31.12.1998'
'31.12.2005'
'31.12.2004'
'31.12.2003'
'31.12.2002'
'31.12.2005'
Example cell array:
A = {'31.12.2001'; '31.12.2002'; '31.12.2003'};
Apply some regular expressions:
B = regexp(A, '\d\d\d\d', 'match')
B = [B{:}];
EDIT: I never realized that matlab will "nest" an extra layer of cells until I tested this. I don't like this solution as much now that I know the second line is necessary. Here is an alternative approach that gets you the years in numeric form:
C = datevec(A, 'dd.mm.yyyy');
C = C(:, 1);
SECOND EDIT: Suprisingly, if your cell array has less than 10000 elements, the regexp approach is faster on my machine. But the output of it is another cell array (which takes up much more memory than a numeric matrix). You can use B = cell2mat(B) to get a character array instead, but this brings the two approaches to approximately equal efficiency.
Just to add a fun answer, designed to take the OP to the stranger regions of Matlab:
C = char(C);
y = (D(:,7:end)-'0') * 10.^(3:-1:0).'
which is an order of magnitude faster than anything posted in the other answers :)
Or, to stay a bit closer to home,
y = cellfun(#(x)str2double(x(7:end)),C);
or, yet another regexp variation:
y = str2num(char(regexprep(C, '\d+\.\d+\.','')));
Assuming your matrix with dates is M or a cell array C:
In case your data is in a cell array start with
M = cell2mat(C)
Then get the relevant part
Y=M(:,end-4:end)
If required you can even make the year a number
Year = str2num(Y)
Using regexp this will works also with dates with slightly different formats, like 1.1.2000, which can mess with you offsets
res = regexp(dates, '(?<=\d+\.\d+\.)\d+', 'match')

Extract field of struct array to new array

I have a struct, which has 2 fields: time and pose. I have multiple instances of this struct composed in an array, so an example of this is:
poses(1)
-time = 1
-pose = (doesn't Matter)
poses(2)
-time = 2
-pose = (doesn't Matter)
poses(3)
-time = 3
-pose = (doesn't Matter)
...
Now when I print this:
poses.time
I get this:
ans =
1
ans =
2
ans =
3
How can I take that output and put it into a vector?
Use brackets:
timevec=[poses.time];
tricky matlab, I know I know, you'll just have to remember this one if you're working with structs ;)
For cases that the field values are vectors (of same size) and you need the result in a matrix form:
posmat = cell2mat({poses.pose}');
That returns each pose vector in a different row of posmat.