I have a cell that has string IDs. I need to replace them with integer IDs so that the cell can be transformed into a matrix. I especially need this to be a vectorized operation as the celldata is huge.
celldata = { 'AAPL' [0.1] ; 'GOOG' [0.643] ; 'IBM' [0.435] ; 'MMM' [0.34] ; 'AAPL' [0.12] ; 'GOOG' [1.5] ; 'IBM' [0.75] ; 'AAPL' [0.56] ; 'GOOG' [0.68] ; 'IBM' [0.97] ; };
I designed a sequential intID:
intIDs = {'AAPL' [1] ; 'GOOG' [2] ; 'IBM' [3] ; 'MMM' [4]};
intIDs contains ALL IDs that are possible in celldata. Also, celldata has IDs in sequential order, grouped together by date. The date column is not shown here.
Desired result:
celldata = {[1] [0.1] ; [2] [0.643] ; [3] [0.435] ; [4] [0.34] ; [1] [0.12] ; [2] [1.5] ; [3] [0.75] ; [1] [0.56] ; [2] [0.68] ; [3] [0.97] ;};
Thanks!
You can use the second output of the ismember function as an index into intIDs to achieve what you want.
[~,indx]=ismember(celldata(:,1),intIDs(:,1));
celldata(:,1)=intIDs(indx,2)
celldata =
[1] [0.1000]
[2] [0.6430]
[3] [0.4350]
[4] [0.3400]
[1] [0.1200]
[2] [1.5000]
[3] [0.7500]
[1] [0.5600]
[2] [0.6800]
[3] [0.9700]
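As a follow-up sketch, the lookup can be combined with a check that every ID was actually found, and with the final conversion to a numeric matrix that the question asks for. The shortened sample data and the variable names found and M are assumptions made for illustration:

```matlab
% Shortened sample data (illustrative subset of the question's celldata)
celldata = {'AAPL' 0.1; 'GOOG' 0.643; 'IBM' 0.435; 'MMM' 0.34; 'AAPL' 0.12};
intIDs   = {'AAPL' 1; 'GOOG' 2; 'IBM' 3; 'MMM' 4};

% found(k) is true if the k-th ID exists in intIDs; indx(k) is its row there
[found, indx] = ismember(celldata(:,1), intIDs(:,1));

% indx is 0 for unknown IDs, which would break the indexing below
assert(all(found), 'Some IDs in celldata are missing from intIDs');

celldata(:,1) = intIDs(indx, 2);   % replace string IDs with integer IDs
M = cell2mat(celldata);            % numeric matrix, one row per record
```

The assert is optional when intIDs is known to cover every ID, as stated in the question, but it turns a confusing indexing error into a clear message if that ever stops being true.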
I'm stuck on what's happening here. This is my understanding: C{1} holds a column of strings, something like this:
A231
A354
A356
A234
.
.
pattern continues until the end
ids then gets a copy of that column, and idmlh becomes the second element of the cell array, which is a matrix in this case. Then empty arrays are created for idsCo and idx. The loop then goes through every row of ids and checks whether that row's value is found in another data structure, parIDs, which has similar dimensions to ids. This is where the first confusion comes in:
If it isn't a member, it stores the index value in idx? And if it is a member, what happens exactly?
I'm most uncertain about this part:
else
[~,~,ii] = intersect(ids{cnt}, parIDs) ;
idsCo = [idsCo ; Lbl(ii) ] ;
end
end
ids(idx) = [] ;
idmlh(idx,:) = [] ;
Below is the full code:
ids = C{1} ;
idmlh = C{2} ;
idsCo = [] ;
idx = [] ;
for cnt=1:length(ids)
if ~ismember(strtrim(ids{cnt}), parIDs)
idx = [idx cnt] ;
else
[~,~,ii] = intersect(ids{cnt}, parIDs) ;
idsCo = [idsCo ; Lbl(ii) ] ;
end
end
ids(idx) = [] ;
idmlh(idx,:) = [] ;
It is essential to know what strtrim and intersect do: strtrim removes leading and trailing whitespace from a string, and intersect returns the values common to two arrays together with their indices in each array.
The loop runs over all elements of ids and checks whether the cnt-th element (with surrounding whitespace trimmed) is part of the parIDs array.
If it is not (~ismember evaluates to true), the current cnt value is appended to the idx array.
If it is a member (~ismember evaluates to false), intersect finds the elements that ids{cnt} (1x1) and parIDs (Nx1) have in common. The common value itself is ignored ([~,...), and so is its index in the first array (...,~,...). Its index in the second array is assigned to ii (...,ii]=).
Then the ii-th element of Lbl is appended to the idsCo array.
In the end, the elements of ids that aren't in parIDs, and the corresponding rows of idmlh, are deleted (assigning [] to indexed elements removes them).
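The loop described above can also be written without iteration by using both outputs of ismember. This is only a sketch: the sample values of C, parIDs and Lbl are hypothetical, and Lbl is assumed to be a numeric column vector aligned row by row with parIDs. Unlike the original loop, it trims the IDs once up front, so the membership test and the label lookup see the same strings:

```matlab
% Hypothetical sample data standing in for the question's C, parIDs and Lbl
C      = {{'A231 '; 'A354'; 'A999'; 'A234'}, [1 2; 3 4; 5 6; 7 8]};
parIDs = {'A231'; 'A234'; 'A354'};
Lbl    = [10; 20; 30];            % assumed: one label per row of parIDs

ids   = strtrim(C{1});            % strtrim also works on a cell array of strings
idmlh = C{2};

% isPar(k) is true if ids{k} is in parIDs; loc(k) is its position there (0 if absent)
[isPar, loc] = ismember(ids, parIDs);

idsCo = Lbl(loc(isPar));          % labels of the matching IDs, in order of ids
ids(~isPar)     = [];             % delete the non-matching IDs ...
idmlh(~isPar,:) = [];             % ... and their rows in idmlh
```

Here ~isPar plays the role of idx in the loop, and loc(isPar) collects the ii values that intersect returned one at a time.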
I have a 50000-by-2 cell array of numbers. Now I want to replace the second column, which has numbers ranging from 1 to 10, with corresponding strings like 'airplane' for 1, 'automobile' for 2 and so on. What is the most efficient method for this?
I first tried splitting the second column into a new cell array classes1, converting it to strings, and replacing the values with the code below:
classes1(strcmp('1',classes1))={'airplane'};
classes1(strcmp('2',classes1))={'automobile'};
classes1(strcmp('3',classes1))={'bird'};
classes1(strcmp('4',classes1))={'cat'};
classes1(strcmp('5',classes1))={'deer'};
classes1(strcmp('6',classes1))={'dog'};
classes1(strcmp('7',classes1))={'frog'};
classes1(strcmp('8',classes1))={'horse'};
classes1(strcmp('9',classes1))={'ship'};
classes1(strcmp('10',classes1))={'truck'};
But that was not successful. It only replaced '10' with 'truck'.
UPDATE: This code actually works. But in my case the string ' 1' has to be used instead of '1' (a leading space was missing).
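The padding problem from the update can be reproduced with a small sketch. It assumes the strings were produced with num2str on a column of numbers, which pads shorter numbers with leading spaces so that all rows have the same width; the variable names are illustrative:

```matlab
nums   = [1; 10; 3];
padded = cellstr(num2str(nums));   % num2str pads to a common width: ' 1', '10', ' 3'

hitsPadded  = strcmp('1', padded);            % no match: the string is ' 1', not '1'
hitsTrimmed = strcmp('1', strtrim(padded));   % matches once leading spaces are removed
```

Applying strtrim to classes1 before the strcmp calls would therefore let the original replacement code work with the unpadded strings '1' to '10'.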
Use this to extend to your big case -
%%// Create look up and numeral data cell arrays for demo
LOOKUP_CELL_ARRAY = {'airplane','automobile','chopper'};
IN_CELL_ARRAY = num2cell(round(1+2.*rand(10,2)))
%%// Replace the second column of data cell array with corresponding
%%// strings in the look up array
IN_CELL_ARRAY(:,2)= LOOKUP_CELL_ARRAY(cell2mat(IN_CELL_ARRAY(:,2)))
Output -
IN_CELL_ARRAY =
[2] [2]
[2] [2]
[2] [1]
[2] [2]
[3] [1]
[2] [3]
[1] [1]
[3] [3]
[2] [2]
[2] [3]
IN_CELL_ARRAY =
[2] 'automobile'
[2] 'automobile'
[2] 'airplane'
[2] 'automobile'
[3] 'airplane'
[2] 'chopper'
[1] 'airplane'
[3] 'chopper'
[2] 'automobile'
[2] 'chopper'
You can do it as follows with cellfun:
% replacement strings
R = {'airplane','automobile','bird','cat','deer', ...
'dog','frog','horse','ship','truck'};
% example data
nums = randi(10,100,1);
data(:,1) = num2cell(nums)
data(:,2) = cellstr(num2str(nums))
data =
[ 3] ' 3'
[ 1] ' 1'
[ 1] ' 1'
[ 8] ' 8'
[ 8] ' 8'
[ 8] ' 8'
[ 7] ' 7'
[ 9] ' 9'
[ 1] ' 1'
...
str2double(x) does not care whether it's '01' or '1':
% replace number strings with strings
data(:,2) = cellfun(@(x) R( str2double(x) ), data(:,2) )
data =
[ 3] 'bird'
[ 1] 'airplane'
[ 1] 'airplane'
[ 8] 'horse'
[ 8] 'horse'
[ 8] 'horse'
[ 7] 'frog'
[ 9] 'ship'
[ 1] 'airplane'
...
You can do it just with indexing:
data = {'aa' 1
'bb' 3
'cc' 2
'dd' 6
'ee' 1
'ff' 5}; %// example data: two-col cell array, 2nd col is numbers
str = {'airplane','automobile','bird','cat','deer', ...
'dog','frog','horse','ship','truck'}; %// replacement strings
data(:,2) = str(vertcat(data{:,2})); %// do the replacing
I have an array of words:
x=['ae' ; 'be' ; 'ce' ; 'de' ; 'ee' ; 'fe']
I would like to extract sets of characters. So assume each set has N = 2 words, how can I go about getting return values that look like this
'ae' 'be'
'be' 'ce'
'ce' 'de'
'de' 'ee'
'ee' 'fe'
So if N = 2, I get back a matrix where each row contains pairs of the current and previous characters. If N=3 i will get back current and previous 2 chars for each row. I want to avoid loops if possible.
Any ideas?
You can use the circulant matrix MATLAB provides, truncate it as needed and use it as an index matrix:
x = {'ae' ; 'be' ; 'ce' ; 'de' ; 'ee' ; 'fe'}
N = 3;
n = numel(x);
A = gallery('circul',n:-1:1)
B = fliplr( A(1:n-N+1,n-N+1:end) )
result = x(B)
or a little shorter:
A = fliplr( gallery('circul',n:-1:1) )
result = x( A(1:n-N+1,1:N) )
or another option using the Hankel matrix:
A = hankel(1:n,1:N)
result = x( A(1:n-N+1,:) )
gives:
result =
'ae' 'be' 'ce'
'be' 'ce' 'de'
'ce' 'de' 'ee'
'de' 'ee' 'fe'
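The same index matrix can also be built directly by adding the column offsets 0:N-1 to the row start positions, which may be easier to read than truncating a circulant or Hankel matrix. This is a sketch of an alternative technique, not one of the methods above; it uses bsxfun so that it also runs on MATLAB versions without implicit expansion:

```matlab
x = {'ae'; 'be'; 'ce'; 'de'; 'ee'; 'fe'};
N = 2;                       % window length
n = numel(x);

% Row k of idx is [k, k+1, ..., k+N-1]
idx = bsxfun(@plus, (1:n-N+1).', 0:N-1);

result = x(idx);             % (n-N+1)-by-N cell array of sliding windows
```

With N = 2 this produces the five pairs listed in the question; changing N to 3 gives the four triples shown above.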
I have a cell array generated by a program written in MATLAB, something like this:
'A' 'B' 'C' 'D' 'E'
[ 4] [ 1] [ 0.9837] [ 0.9928] [0.9928]
[ 4] [ 1] [ 0.9995] [ 0.9887] [0.9995]
[ 4] [ 1] [ 0.9982] [ 0.9995] [0.9995]
[ 4] [ 1] [ 0.9959] [ 0.9982] [0.9887]
I am trying to extract the column 'D' without the header 'D'.
I can put into a temporary variable and then extract the column data. But I am wondering, if it could be done in a single step.
Thanks
If your variable is data, then data(2:end,4) should do it.
Edit:
For example:
>> data
data =
'A' 'B' 'C' 'D' 'E'
[4] [1] [0.9837] [0.9928] [0.9928]
[4] [1] [0.9995] [0.9887] [0.9995]
[4] [1] [0.9982] [0.9995] [0.9995]
[4] [1] [0.9959] [0.9982] [0.9887]
>> data(2:end,4) %Extract the data as a cell array
ans =
[0.9928]
[0.9887]
[0.9995]
[0.9982]
>> cell2mat(data(2:end,4)) %Convert to a numeric (typical) array
ans =
0.9928
0.9887
0.9995
0.9982
I need to output a cell to an excel file. Before this I need to convert a date column of integers to datestrings. I know how to do this, but I am not able to put this new string array back into the cell -
mycell = { 'AIR' [780] [1] [734472] [0.01] ; ...
'ABC' [780] [1] [734472] [0.02]}
I did this -->
dates = datestr(cell2mat(mycell(:,4))) ;
What I need as an answer is:
{'AIR' [780] [1] '14-Dec-2010' [0.01] ;
'ABC' [780] [1] '23-Dec-2010' [0.03] ; }
so that I can now send it to an excel file using xlswrite.m
mycell = { 'AIR' 780 1 734472 0.01 ; ...
'ABC' 780 1 734472 0.02}
mycell(:,4) = cellstr(datestr(cell2mat(mycell(:,4))))
mycell =
'AIR' [780] [1] '30-Nov-2010' [0.01]
'ABC' [780] [1] '30-Nov-2010' [0.02]
One alternative that avoids the conversions is to use the function CELLFUN:
mycell(:,4) = cellfun(@datestr,mycell(:,4),'UniformOutput',false);
%# Or an alternative format...
mycell(:,4) = cellfun(@(d) {datestr(d)},mycell(:,4));
Both of the above give the following result for your sample cell array:
mycell =
'AIR' [780] [1] '30-Nov-2010' [0.0100]
'ABC' [780] [1] '30-Nov-2010' [0.0200]
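If the date numbers could ever carry a fractional (time-of-day) part, datestr's default output would also include the time. A hedged variant is to pass an explicit format string so every cell gets the same layout; the choice of 'dd-mmm-yyyy' is an assumption based on the format shown in the question:

```matlab
mycell = { 'AIR' 780 1 734472 0.01 ; ...
           'ABC' 780 1 734472 0.02 };

% Convert each serial date number in column 4 to a fixed-format date string
mycell(:,4) = cellfun(@(d) datestr(d, 'dd-mmm-yyyy'), mycell(:,4), ...
                      'UniformOutput', false);
```

After this step column 4 holds strings while the other columns are unchanged, so the cell array can be passed to xlswrite as planned in the question.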