Convert string array to cell - matlab

I need to write a cell array to an Excel file. Before that, I need to convert a column of integer date numbers to date strings. I know how to do the conversion, but I am not able to put the resulting string array back into the cell array:
mycell = { 'AIR' [780] [1] [734472] [0.01] ; ...
'ABC' [780] [1] [734472] [0.02]}
I did this -->
dates = datestr(cell2mat(mycell(:,4))) ;
What I need as an answer is:
{'AIR' [780] [1] '14-Dec-2010' [0.01] ;
'ABC' [780] [1] '23-Dec-2010' [0.03] ; }
so that I can now send it to an excel file using xlswrite.m

mycell = { 'AIR' 780 1 734472 0.01 ; ...
'ABC' 780 1 734472 0.02}
mycell(:,4) = cellstr(datestr(cell2mat(mycell(:,4))))
mycell =
'AIR' [780] [1] '30-Nov-2010' [0.01]
'ABC' [780] [1] '30-Nov-2010' [0.02]

One alternative that avoids the conversions is to use the function CELLFUN:
mycell(:,4) = cellfun(@datestr,mycell(:,4),'UniformOutput',false);
%# Or an alternative format...
mycell(:,4) = cellfun(@(d) {datestr(d)},mycell(:,4));
Both of the above give the following result for your sample cell array:
mycell =
'AIR' [780] [1] '30-Nov-2010' [0.0100]
'ABC' [780] [1] '30-Nov-2010' [0.0200]
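If the date numbers might carry a time-of-day fraction, datestr's default output would also include the time. A minimal sketch, assuming you always want the day-month-year form, passes an explicit format string; the 'dd-mmm-yyyy' format is my choice here, not something the question specified:

```matlab
% Sample cell array from the question.
mycell = {'AIR' 780 1 734472 0.01; ...
          'ABC' 780 1 734472 0.02};

% Convert column 4 to date strings with an explicit format (assumed
% 'dd-mmm-yyyy'), so the result does not depend on datestr's default.
mycell(:,4) = cellfun(@(d) datestr(d, 'dd-mmm-yyyy'), mycell(:,4), ...
                      'UniformOutput', false);

disp(mycell(:,4));
```

After this, every entry in column 4 is a character vector, so the cell array can be passed to xlswrite as-is.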

Related

What's the best way to build a dictionary (word count) for NLP in MATLAB?

I have a frequency-count dictionary, and I want to be able to look up the frequency count of a given word in it.
For example, if my input word is 'about', the output should be the count of 'about' in my dictionary, which is 139, so that I can calculate its probability.
139 about
133 according
163 accusing
244 actually
567 afternoon
175 again
156 ah
167 a-ha
165 ahh
I tried to do this with fopen, but I am not getting the result I want.
fid = fopen('dictionary.txt');
words = textscan(fid, '%s');
fclose(fid);
words = words{1};
I tried this as well, but got a different result:
countfunction = @(word) nnz(strcmp(word, words));
count = cellfun(countfunction, words);
tally = [words num2cell(count)];
sortrows(tally, 2);
The problem is that you're running countfunction for each instance of each word in the dictionary, rather than each unique word in the dictionary.
Here's how to incrementally improve your code:
words = {'hi' 'hi' 'the' 'hi' 'the' 'a'};
unique_words = unique(words(:));
countfunction = @(word) nnz(strcmp(word, words));
count = cellfun(countfunction, unique_words);
tally = [unique_words, num2cell(count)];
disp(sortrows(tally, 2));
'a' [1]
'the' [2]
'hi' [3]
However, I'd recommend using grpstats instead:
words = {'hi' 'hi' 'the' 'hi' 'the' 'a'};
[unique_words, count] = grpstats(ones(size(words)), words(:), {'gname', 'numel'});
tally = [unique_words, num2cell(count)];
disp(sortrows(tally, 2));
'a' [1]
'the' [2]
'hi' [3]
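If the file already stores one count and one word per line, as in the question, the counts do not need to be recomputed at all. The following is only a sketch of that reading: it parses the two columns with textscan and uses a containers.Map for lookup, which is a swapped-in technique not used in the answer above. The sample text stands in for the contents of dictionary.txt.

```matlab
% Sample text standing in for the contents of dictionary.txt (count, then word).
txt = sprintf('139 about\n133 according\n163 accusing\n167 a-ha\n');

% Parse two columns: a numeric count and a word.
% With a real file, pass the fopen file identifier instead of txt.
C = textscan(txt, '%f %s');
counts = C{1};
words = C{2};

% Map each word to its count for direct lookup.
dict = containers.Map(words, counts);

n = dict('about');        % count stored for 'about'
p = n / sum(counts);      % relative frequency among the listed words
```

Looking up a word that is not in the map raises an error, so isKey(dict, word) is worth checking first when the input word is arbitrary.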

Print all elements of a struct

I am just approaching Matlab, I have a function with a struct:
function [out] = struct1()
Account(1).name = 'John';
Account(1).number = 321;
Account(1).type = 'Current';
%.......2 to 9
Account(10).name = 'Denis';
Account(10).number = 123;
Account(10).type = 'Something';
for ii= 1:10
out=fprintf('%s\n','%d\n','%s\n',Account{ii}.name, Account{ii}.number,Account{ii}.type);
end
end
The above code gives me an error: "Cell contents reference from a non-cell array object."
How do I output all elements of such struct to get this output using "fprintf"?
name: 'John'
number: 321
type: 'Current'
...... 2 to 9
name: 'Denis'
number: 123
type: 'Something'
You are indexing the elements of the struct array with { and } which are only used for cell arrays. Simple ( and ) will work just fine.
Also, since you have the line breaks in the formatspec, you should just combine all three strings together.
Example:
formatspec = 'name: %s\nnumber: %d\ntype: %s\n';
for ii= 1:10
out=fprintf(formatspec,Account(ii).name,Account(ii).number,Account(ii).type);
end

extracting data from excel to matlab

Suppose I have an Excel file (data.xlsx) which contains the following data:
Name age
Tom 43
Dick 24
Harry 32
Now I want to extract the data from it and make two cell arrays (or matrices) which contain
name = ['Tom' ; 'Dick';'Harry'] age = [43;24;32]
I have used xlsread(data.xlsx), but it only extracts the numerical values, and I want to obtain both as shown above. Please help me out.
You have to use the additional output arguments of xlsread in order to get the text.
I created a dummy Excel file with your data and here is the output (never mind the NaNs):
[ndata, text, alldata] = xlsread('DummyExcel.xlsx')
ndata =
43
24
32
text =
'Name' 'Age'
'Tom' ''
'Dick' ''
'Harry' ''
alldata =
[NaN] 'Name' 'Age'
[NaN] 'Tom' [ 43]
[NaN] 'Dick' [ 24]
[NaN] 'Harry' [ 32]
Now if you use this:
text{2:end,1}
you get
ans =
Tom
ans =
Dick
ans =
Harry
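To store the names in a variable rather than just display them, index with parentheses instead of braces. This is a small sketch using literal stand-ins for the xlsread outputs shown above, so it runs without the Excel file. Note that the char matrix ['Tom';'Dick';'Harry'] from the question cannot be built directly, because the rows have different lengths; a cell array of strings is the usual substitute.

```matlab
% Stand-ins for the xlsread outputs shown above.
ndata = [43; 24; 32];
text  = {'Name' 'Age'; 'Tom' ''; 'Dick' ''; 'Harry' ''};

% Parentheses keep the result as a cell array; row 1 is the header.
name = text(2:end, 1);   % 3x1 cell array: 'Tom', 'Dick', 'Harry'
age  = ndata;            % 3x1 numeric column

% Individual entries are then available by index.
fprintf('%s is %d\n', name{2}, age(2));
```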
You can use the function called importdata.
Example:
%Import Data
filename = 'yourfilename.xlsx';
delimiterIn = ' ';
headerlinesIn = 1;
A = importdata(filename,delimiterIn,headerlinesIn);
This reads both the text and the numerical data: the text ends up in A.textdata and the numerical data in A.data.

MATLAB: textscan to parse irregular text, trouble debugging format specifier

I've been browsing stack overflow and the mathworks website trying to come up with a solution for reading an irregularly formatted text file into MATLAB using textscan but have yet to figure out a good solution.
The format of the text file looks as such:
// Reference=MNI
// Citation=Beauregard M, 1998
// Condition=Primed - Unprimed Semantic Category Decision
// Domain=Semantics
// Modality=Visual
// Subjects=13
-55 -25 -23
33 -9 -20
// Citation=Beauregard M, 1998
// Condition=Unprimed Semantic Category Decision - Baseline
// Domain=Semantics
// Modality=Visual
// Subjects=13
0 -73 9
-25 -59 47
0 -14 59
8 -18 63
-21 -90 -11
-24 -4 62
24 -93 -6
-21 15 47
-35 -26 -21
9 13 44
// Citation=Binder J R, 1996
// Condition=Words > Tones - Passive
// Domain=Language Perception
// Modality=Auditory
// Subjects=12
-58.73 -12.05 -4.61
I would like to end up with a cell array that looks like this
{nx3 double} {nx1 cellstr} {nx1 cellstr} {nx1 cellstr} {nx1 double}
Here the first element of the array holds the 3D coordinates, the second the citation, the third the condition, the fourth the domain, the fifth the modality, and the sixth the number of subjects.
I would then like to use this cell array to organize the data into a structure that allows easy indexing of the coordinates by each of the features I extracted from the text file.
I've tried a bunch of things but have only been able to extract the coordinates as strings and the features as a single cell array.
Here is how far I have gotten after searching through stack overflow and the mathworks website:
fid = fopen(fullfile(path2proj,path2loc),'r');
data = textscan(fid,'%s %s %s','HeaderLines',1,...
'delimiter',{...
sprintf('// '),...
'Citation=',...
'Condition=',...
'Domain=',...
'Modality=',...
'Subjects='});
I get the following output with this code:
data =
{16470x1 cell} {16470x1 cell} {16470x1 cell}
data{1}(1:20)
ans =
''
''
''
''
''
'-55 -25 -23'
'33 -9 -20'
''
''
''
''
''
'0 -73 9'
'-25 -59 47'
'0 -14 59'
'8 -18 63'
'-21 -90 -11'
'-24 -4 62'
'24 -93 -6'
'-21 15 47'
data{2}(1:20)
ans =
''
''
''
''
''
''
''
''
''
''
''
''
''
''
''
''
''
''
''
''
data{3}(1:20)
ans =
'Beauregard M, 1998'
'Primed - Unprimed Semantic Category Decision'
'Semantics'
'Visual'
'13'
''
''
'Beauregard M, 1998'
'Unprimed Semantic Category Decision - Baseline'
'Semantics'
'Visual'
'13'
''
''
''
''
''
''
''
''
Although I can work with the data in this format, it would be nice to understand how to correctly write a format specifier that extracts each piece of data into its own cell array. Does anyone have any ideas?
Assuming that Reference appears only in the first line, you could do the following to obtain the values you want from each Citation section.
% read the file and split it into sections based on Citation
filecontents = strsplit(fileread('data.txt'), '// Citation');
% iterate through section and extract desired info from each
% section. We start from i=2, as for i=1 we have 'Reference' line.
for i = 2:numel(filecontents)
lines = regexp(filecontents{i}, '\n', 'split');
% remove empty lines
lines(find(strcmp(lines, ''))) = [];
% get values of the fields
citation = lines{1};
condition = get_value(lines{2}, 'Condition');
domain = get_value(lines{3}, 'Domain');
modality = get_value(lines{4}, 'Modality');
subjects = get_value(lines{5}, 'Subjects');
coordinates = cellfun(@str2num, lines(6:end), 'UniformOutput', 0)';
% now you can save in some global cell,
% display or process the extracted values as you please.
end
where get_value is:
function value = get_value(line, search_for)
[tokens, ~] = regexp(line, [search_for, '=(.+)'],'tokens','match');
value = tokens{1};
Hope this helps.

Replace strings with integer IDs in a Cell - Matlab

I have a cell array that has string IDs. I need to replace them with integer IDs so that the cell array can be transformed into a matrix. I especially need this to be a vectorized operation, as celldata is huge.
celldata = { 'AAPL' [0.1] ; 'GOOG' [0.643] ; 'IBM' [0.435] ; 'MMM' [0.34] ; 'AAPL' [0.12] ; 'GOOG' [1.5] ; 'IBM' [0.75] ; 'AAPL' [0.56] ; 'GOOG' [0.68] ; 'IBM' [0.97] ; };
I designed a sequential intID:
intIDs = {'AAPL' [1] ; 'GOOG' [2] ; 'IBM' [3] ; 'MMM' [4]};
intIDs contains ALL IDs that are possible in celldata. Also, celldata has its IDs in sequential order, grouped together by date. The date column is not shown here.
Desired result:
celldata = {[1] [0.1] ; [2] [0.643] ; [3] [0.435] ; [4] [0.34] ; [1] [0.12] ; [2] [1.5] ; [3] [0.75] ; [1] [0.56] ; [2] [0.68] ; [3] [0.97] ;};
Thanks!
You can use the second output of the ismember function, which gives the row of intIDs matching each ID in celldata, and index with it to achieve what you want.
[~,indx]=ismember(celldata(:,1),intIDs(:,1));
celldata(:,1)=intIDs(indx,2)
celldata =
[1] [0.1000]
[2] [0.6430]
[3] [0.4350]
[4] [0.3400]
[1] [0.1200]
[2] [1.5000]
[3] [0.7500]
[1] [0.5600]
[2] [0.6800]
[3] [0.9700]
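Once the first column is numeric, the whole cell array can be turned into a matrix, which was the stated goal. A minimal sketch on a shortened version of the sample data follows; the check that every ID was found is my addition, guarding against an index of 0 if an ID were ever missing from intIDs:

```matlab
% Shortened sample data in the same layout as the question.
celldata = {'AAPL' 0.1; 'GOOG' 0.643; 'MMM' 0.34; 'AAPL' 0.12};
intIDs   = {'AAPL' 1; 'GOOG' 2; 'IBM' 3; 'MMM' 4};

% found(k) is true when the k-th ID exists in intIDs; indx(k) is its row.
[found, indx] = ismember(celldata(:,1), intIDs(:,1));
assert(all(found), 'celldata contains an ID that is missing from intIDs');

% Replace the string IDs, then convert the all-numeric cell array.
celldata(:,1) = intIDs(indx, 2);
M = cell2mat(celldata);   % 4x2 double: [ID, value]
```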