How to extract year from a dates cell array in MATLAB? - matlab

i have a cell array as below, which are dates. I am wondering how can i extract the year at the last 4 digits? Could anyone teach me how to locate the year in the string? Thank you!
'31.12.2001'
'31.12.2000'
'31.12.2004'
'31.12.2003'
'31.12.2002'
'31.12.2000'
'31.12.1999'
'31.12.1998'
'31.12.1997'
'31.12.2005'
'31.12.2004'
'31.12.2003'
'31.12.2002'
'31.12.2001'
'31.12.2000'
'31.12.1999'
'31.12.1998'
'31.12.2005'
'31.12.2004'
'31.12.2003'
'31.12.2002'
'31.12.2005'

Example cell array:
A = {'31.12.2001'; '31.12.2002'; '31.12.2003'};
Apply some regular expressions:
B = regexp(A, '\d\d\d\d', 'match')
B = [B{:}];
EDIT: I never realized that matlab will "nest" an extra layer of cells until I tested this. I don't like this solution as much now that I know the second line is necessary. Here is an alternative approach that gets you the years in numeric form:
C = datevec(A, 'dd.mm.yyyy');
C = C(:, 1);
SECOND EDIT: Suprisingly, if your cell array has less than 10000 elements, the regexp approach is faster on my machine. But the output of it is another cell array (which takes up much more memory than a numeric matrix). You can use B = cell2mat(B) to get a character array instead, but this brings the two approaches to approximately equal efficiency.

Just to add a fun answer, designed to take the OP to the stranger regions of Matlab:
C = char(C);
y = (D(:,7:end)-'0') * 10.^(3:-1:0).'
which is an order of magnitude faster than anything posted in the other answers :)
Or, to stay a bit closer to home,
y = cellfun(#(x)str2double(x(7:end)),C);
or, yet another regexp variation:
y = str2num(char(regexprep(C, '\d+\.\d+\.','')));

Assuming your matrix with dates is M or a cell array C:
In case your data is in a cell array start with
M = cell2mat(C)
Then get the relevant part
Y=M(:,end-4:end)
If required you can even make the year a number
Year = str2num(Y)

Using regexp this will works also with dates with slightly different formats, like 1.1.2000, which can mess with you offsets
res = regexp(dates, '(?<=\d+\.\d+\.)\d+', 'match')

Related

I'm having trouble shuffling a deck of cards in Matlab. Need help to see where I went wrong

I currently have the deck of cards coded, but it is unshuffled. This is for programming the card game of War if it helps. I need to shuffle the deck, but whenever I do, it will only shuffle together the card numbers and the suits, not the full card. For example, I have A identified as an ace and the suits come after each number. A normal card would be "AH" (an ace of hearts) or "6D" (a six of diamonds). Instead, it will output "5A" as one of the cards, as in a 5 of aces. I don't know how to fix this, but the code that I currently have is this:
card_nums = ('A23456789TJQK')';
card_suits = ('HDSC')';
unshuffled_deck = [repmat(card_nums,4,1),repmat(card_suits,13,1)];
disp(unshuffled_deck)
shuffled_deck = unshuffled_deck(randperm(numel(unshuffled_deck)));
disp(shuffled_deck)
I would appreciate any help with this, and thank you very much for your time!
You're creating a random permutation of all of the elements from both columns of unshuffled_deck combined. Instead you need to create a random permutation of the rows of unshuffled_deck:
shuffled_deck = unshuffled_deck(randperm(size(unshuffled_deck,1)),:);
The call to size gives you the number of rows in the deck array, then we get a random permutation of the row indices, and copy the row (value, suit) as a single entity.
Here's a version using a structure array in response to #Carl Witthoft's comment. I was afraid it would add too much complexity to the solution, but it really isn't bad:
card_nums = ('A23456789TJQK')';
card_suits = ('HDSC')';
deck_nums = repmat(card_nums,4,1);
deck_suits = repmat(card_suits,13,1);
cell_nums = cellstr(deck_nums).'; %// Change strings to cell arrays...
cell_suits = cellstr(deck_suits).'; %// so we can use them in struct
%// Construct a struct array with fields 'value' and 'suit'
unshuffled_deck = struct('value',cell_nums,'suit',cell_suits);
disp('unshuffled deck:');
disp([unshuffled_deck.value;unshuffled_deck.suit]);
%// Shuffle the deck using the number of elements in the structure array
shuffled_deck = unshuffled_deck(randperm(numel(unshuffled_deck)));
disp('shuffled deck:');
disp([shuffled_deck.value; shuffled_deck.suit]);
Here's a test run:
unshuffled deck:
A23456789TJQKA23456789TJQKA23456789TJQKA23456789TJQK
HDSCHDSCHDSCHDSCHDSCHDSCHDSCHDSCHDSCHDSCHDSCHDSCHDSC
shuffled deck:
4976TT93KTJQJATK953A75QA82Q6226K5J784J4A3372486K859Q
CHSSSHCDSCSSHDDCDSHHCDHSDDCDHCCHHCHHHDDCSCDSSCHDSCSD
To access an individual card, you can do:
>> shuffled_deck(2)
ans =
scalar structure containing the fields:
value = 9
suit = H
Or you can access the individual fields:
>> shuffled_deck(2).value
ans = 9
>> shuffled_deck(2).suit
ans = H
Unfortunately, I don't know of any way to simply index the struct array and get, for instance, 9H as you would in a regular array using disp(shuffled_deck(2,:)). In this case, the only option I know of is to explicitly concatenate each field:
disp([shuffled_deck(2).value,shuffled_deck(2).suit]);

find value in a string of cell considering some margin

Suppose that I have a string of values corresponding to the height of a group of people
height_str ={'1.76000000000000';
'1.55000000000000';
'1.61000000000000';
'1.71000000000000';
'1.74000000000000';
'1.79000000000000';
'1.74000000000000';
'1.86000000000000';
'1.72000000000000';
'1.82000000000000';
'1.72000000000000';
'1.63000000000000'}
and a single height value.
height_val = 177;
I would like to find the indices of the people that are in the range height_val +- 3cm.
To find the exact match I would do like this
[idx_height,~]=find(ismember(cell2mat(height_str),height_val/100));
How can I include the matches in the previous range (174-180)?
idx_height should be = [1 5 6 7]
You can convert you strings into an numeric array (as #Divakar mentioned) by
height = str2num(char(height_str))*100; % in cm
Then just
idx_height = find(height>=height_val-3 & height<=height_val+3);
Assuming that the precision of heights stays at 0.01cm, you can use a combination of str2double and ismember for a one-liner -
idx_height = find(ismember(str2double(height_str)*100,[height_val-3:height_val+3]))
The magic with str2double is that it works directly with cell arrays to get us a numeric array without resorting to a combined effort of converting that cell array to a char array and then to a numeric array.
After the use of str2double, we can use ismember as you tried in your problem to get us the matches as a logical array, whose indices are picked up with find. That's the whole story really.
Late addition, but for binning my first choice would be to go with bsxfun and logical operations:
idx_height = find(bsxfun(#le,str2double(height_str)*100,height_val+3) & ...
bsxfun(#ge,str2double(height_str)*100,height_val-3))

Is there a quick way to assign unique text entries in an array a number?

In MatLab, I have several data vectors that are in text. For example:
speciesname = [species1 species2 species3];
genomelength = [8 10 5];
gonometype = [RNA DNA RNA];
I realise that to make a plot, arrays must be numerical. Is there a quick and easy way to assign unique entries in an array a number, for example so that RNA = 1 and DNA = 2? Note that some arrays might not be binary (i.e. have more than two options).
Thanks!
So there is a quick way to do it, but im not sure that your plots will be very intelligible if you use numbers instead of words.
You can make a unique array like this:
u = unique(gonometype);
and make a corresponding number array is just 1:length(u)
then when you go through your data the number of the current word will be:
find(u == current_name);
For your particular case you will need to utilize cells:
gonometype = {'RNA', 'DNA', 'RNA'};
u = unique(gonometype)
u =
'DNA' 'RNA'
current = 'RNA';
find(strcmp(u, current))
ans =
2

MatLab cells date format conversion

I have a lot of dates in MatLab (over 2 millions). Al these dates are in a cell array in 'yyyymmdd' format, and I want to convert them to 'yyyy-mm-dd' format and put this result in a cell array (not in a char matrix).
I know that I can use
temp = datestr(datenum(datesArray,'yyyymmdd'),'yyyy-mm-dd'),
and then use
mat2cell(temp, ones(1,n),10),
where n is the number of rows of datesArray (in this case approximately 2 millions) in order to get my result, but this approach is very slow.
So, I want to know a different way to do that.
Regards.
You could avoid for loops by using cellfun, let's say your date cell array is
dates = {'20120101', '20120102', '20120103'}
You can then convert them to your format as
cellfun(#(x)[x(1:4),'-',x(5:6),'-',x(7:8)], dates, 'Uniform', false)
Hope that helps.
If your date format is always "yyyymmdd" and it's in a linear cell array called datesArray, you could maybe do it by accessing the strings in datesArray and transforming them by inserting hyphens and concatenating the string.
for i=1:length(datesArray)
newDatesArray{i} = [datesArray{i}(1:4), '-', datesArray{i}(5:6), '-', datesArray{i}(7:8)];
end
Transform your dates into serial one and keep them! However, here's a solution:
% Create dummy dates (takes 10 seconds on my pc)
tic;d = cellstr(datestr(now-2e5+1:now,'yyyymmdd'));toc
% Convert to char, then concatenate with '-' and back to `cellstr()` (1 sec):
c = char(d);
dash = repmat('-',2e5,1);
c = cellstr([c(:,1:4) dash c(:,5:6) dash c(:,7:8)]);
So here my solution, which i think is quite nice!
dates = {'20120101', '20120102', '20120103'}
And you can convert using this :
cellfun(#(x)regexprep(num2str(x), '(?<=\d{4})\d{2}', '-$0'),dates,'Uniform',false)
The answer is similar to radarhead, but it uses the regexprep function instead.

Look for multiple values in a cell array at the same time in Matlab

I have CellArray1 with 50 unique strings and CellArray2 with 2000 unique strings (50 of which are the same as the ones in CellArray1). Is there a way to find the positions of all 50 unique strings from the first cell array in the second cell array without using loops?
Yes - the following code demonstrates this:
cellArray1 = {'hello', 'world'};
cellArray2 = {'good', 'morning', 'world'};
overlap = find(ismember(cellArray2, cellArray1)};
This will return the value 3 in overlap since cellArray2{3} appears in cellArray1.
UPDATE
The above code returns the indices, but not in the order of the original. If you need the original order, you can do the following
overlap = cellfun(#(x)find(ismember(cellArray2, x)), cellArray1, 'uniformOutput', false);
overlapSorted = cell2mat(overlap);
It could be argued that cellfun actually has an implicit loop in it (but then all vector operations have implicit loops, really); but one of these constructions will do what you asked for. If you don't need it sorted, the first will be significantly faster, I imagine.