octave/matlab - convert string into matrix of unique words - matlab

In Octave, I want to convert a string into a matrix of strings. Say I have a string:
s = "one two three one one four five two three five four"
I want to split it into a matrix so that it looks like:
one
two
three
four
five
With duplicates removed.
This code:
words = strsplit(s, ",") %Split the string s using the delimiter ',' and return a cell string array of substrings
Just creates a matrix words into exactly the same as s.
How to I convert my string into a matrix of unique words?

The following will also accomplish that:
unique(regexp(string, '[A-z]*', 'match'))
or, alternatively,
unique(regexp(s, '\s', 'split'))
Basically the same as Werner's solution, but it saves a temporary and is more flexible when more complicated matches need to be made.

On matlab:
string = 'one two three one one four five two three five four'
% Convert it to a cell string:
cell_string = strread(string,'%s');
% Now get the unique values:
unique_strings=unique(cell_string,'stable')
If you want char array with the unique values separated with spaces, add the following lines:
unique_strings_with_spaces=cellfun(#(input) [input ' '],unique_strings,'UniformOutput',false) % Add a space to each cell
final_unique_string = cell2mat(unique_strings_with_spaces') % Concatenate cells
final_unique_string = final_unique_string(1:end-1) % Remove white space
Output:
'one two three four five'

words=unique(strsplit('one two three one one four five two three five four',' '))
words =
'five' 'four' 'one' 'three' 'two'

Related

Pull specific cells out of a cell array by comparing the last digit of their filename

I have a cell array of filenames - things like '20160303_144045_4.dat', '20160303_144045_5.dat', which I need to separate into separate arrays by the last digit before the '.dat'; one cell array of '...4.dat's, one of '...5.dat's, etc.
My code is below; it uses regex to split the file around the '.dat', reshapes a bit then regexes again to pull out the last number of the filename, builds a cell to store the filenames in then, and then I get a tad stuck. I have an array produced such as '1,0,1,0,1,0..' of required cell indexes which I thought might be trivial to pull out, but I'm struggling to get it to do what I want.
numFiles = length(sampleFile); %sampleFile is the input cell array
splitFiles = regexp(sampleFile,'.dat','split');
column = vertcat(splitFiles{:});
column = column(:,1);
splitNums = regexp(column,'_','split');
splitNums = splitNums(:,1);
column = vertcat(splitNums{:});
column = column(:,3);
column = cellfun(#str2double,column); %produces column array of values - 3,4,3,4,3,4, etc
uniqueVals = unique(column);
numChannels = length(uniqueVals);
fileNameCell = cell(ceil(numFiles/numChannels),numChannels);
for i = 1:numChannels
column(column ~= uniqueVals(i)) = 0;
column = column / uniqueVals(i); %e.g. 1,0,1,0,1,0
%fileNameCell(i)
end
I feel there should be an easier way than my hodge-podge of code, and I don't want to throw together a ton of messy for-loops if I can avoid it; I definitely believe I've overcomplicated this problem massively.
We can neaten your code quite a bit.
Take some example data:
files = {'abc4.dat';'abc5.dat';'def4.dat';'ghi4.dat';'abc6.dat';'def5.dat';'nonum.dat'};
You can get the final numbers using regexp and matching one or more digits followed by '.dat', then using strrep to remove the '.dat'.
filenums = cellfun(#(r) strrep(regexp(r, '\d+.dat', 'match', 'once'), '.dat', ''), ...
files, 'uniformoutput', false);
Now we can put these in a structure, using the unique numbers (prefixed by a letter because fields can't start with numbers) as field names.
% Get unique file numbers and set up the output struct
ufilenums = unique(filenums);
filestruct = struct;
% Loop over file numbers
for ii = 1:numel(ufilenums)
% Get files which have this number
idx = cellfun(#(r) strcmp(r, ufilenums{ii}), filenums);
% Assign the identified files to their struct field
filestruct.(['x' ufilenums{ii}]) = files(idx);
end
Now you have a neat output
% Files with numbers before .dat given a field in the output struct
filestruct.x4 = {'abc4.dat' 'def4.dat' 'ghi4.dat'}
filestruct.x5 = {'abc5.dat' 'def5.dat'}
filestruct.x6 = {'abc6.dat'}
% Files without numbers before .dat also captured
filestruct.x = {'nonum.dat'}

How to separate the elements of a matrix with comma in Matlab

I would like to separate each element in the matrix below with a comma.
1 2 3
4 5 6
7 8 9
Here's my attempt:
s= sprintf('%.17g,',matrix)
Ouput=1,2,3,4,5,6,7,8,9,
Desired output:
1, 2, 3
4, 5, 6
7, 8, 9
Thanks in advance for your suggestions.
You just need to specify the formatting of the entire first line:
s = sprintf('%.17g, %.17g, %.17g\n',matrix.')
MATLAB keeps re-using the formatting string as long as there are elements left in matrix.
To generalize this process, use the following expression:
s = sprintf([strjoin(repmat({'%.17g'},1,size(matrix,2)), ', ') '\n'], matrix.')
So there's a lot going on in this one line - let's unpack it from inside out:
repmat({'%.17g'},1,size(matrix,2))
This sub-expression takes a single cell array of size 1x1, containing the string %.17g, and duplicates it into a cell array with dimensions specified by the next two arguments. We want to construct a cell array with a single row (hence the argument 1) representing all the format specifiers (%...) we need. Since we want one instance of %.17g for each column, we use size(matrix,2) as the last argument to repmat, since that returns the number of columns of the matrix.
As an example, if you have 5 columns, you get this:
>> repmat({'%.17g'},1,5)
ans =
'%.17g' '%.17g' '%.17g' '%.17g' '%.17g'
Next, since you want columns delimited by commas and spaces, you can use strjoin():
>> strjoin(repmat({'%.17g'},1,5), ', ')
ans =
%.17g, %.17g, %.17g, %.17g, %.17g
Note the use of a comma and several spaces as the second argument (the delimiting string) to strjoin(). Adjust the number of spaces according to your display needs. We need one more thing to be able to print a multi-line matrix - a carriage return. To do this, we use the fact that two strings in square brackets [] are concatenated by MATLAB:
[strjoin(repmat({'%.17g'},1,size(matrix,2)), ', ') '\n']
This produces the final formatting string that we need. All that is left, is to add the sprintf and pass in the matrix argument. As Rijul Sudhir pointed out, you do have to transpose your matrix because MATLAB will walk down a column to pair the matrix elements with the format specifiers.
EDIT: Stewie Griffin was correct about the transpose operation (.') - code has been corrected.

How to display selected entries of an array of structures in MATLAB

Suppose we have an array of structure. The structure has fields: name, price and cost.
Suppose the array A has size n x 1. If I'd like to display the names of the 1st, 3rd and the 4th structure, I can use the command:
A([1,3,4]).name
The problem is that it prints the following thing on screen:
ans =
name_of_item_1
ans =
name_of_item_3
ans =
name_of_item
How can I remove those ans = things? I tried:
disp(A([1,3,4]).name);
only to get an error/warning.
By doing A([1,3,4]).name, you are returning a comma-separated list. This is equivalent to typing in the following in the MATLAB command prompt:
>> A(1).name, A(3).name, A(4).name
That's why you'll see the MATLAB command prompt give you ans = ... three times.
If you want to display all of the strings together, consider using strjoin to join all of the names together and we can separate the names by a comma. To do this, you'll have to place all of these in a cell array. Let's call this cell array names. As such, if we did this:
names = {A([1,3,4]).name};
This is the same as doing:
names = {A(1).name, A(3).name, A(4).name};
This will create a 1 x 3 cell array of names and we can use these names to join them together by separating them with a comma and a space:
names = {A([1,3,4]).name};
out = strjoin(names, ', ');
You can then show what this final string looks like:
disp(out);
You can use:
[A([1,3,4]).name]
which will, however, concatenate all of the names into a single string.
The better way is to make a cell array using:
{ A([1,3,4]).name }

How do I compute the number of times a character appears in a character array in MATLAB? [duplicate]

This question already has answers here:
how to count unique elements of a cell in matlab?
(2 answers)
Closed 7 years ago.
I want to determine the number of times a character appears in a character array, excluding the time it appears at the last position.
How would I do this?
In Matlab computing environment, all variables are arrays, and strings are of type char (character arrays). So your Character Array is actually a string (Or in reality the other way around). Which means you can apply string methods on it to achieve your results. To find total count of occurrence of a character except on last place in a String/Character Array named yourStringVar you can do this:
YourSubString = yourStringVar(1:end-1)
//Now you have substring of main string in variable named YourSubString without the last character because you wanted to ignore it
numberOfOccurrence = length(find(YourSubString=='Character you want to count'))
It has been pointed out by Ray that length(find()) is not a good approach due to various reasons. Alternatively you could do:
numberOfOccurrence = nnz(YourSubString == 'Character you want to count')
numberOfOccurrence will give you your desired result.
What you can do is map each character into a unique integer ID, then determine the count of each character through histcounts. Use unique to complete the first step. The first output of unique will give you a list of all possible unique characters in your string. If you want to exclude the last time each character occurs in the string, just subtract 1 from the total count. Assuming S is your character array:
%// Get all unique characters and assign them to a unique ID
[unq,~,id] = unique(S);
%// Count up how many times we see each character and subtract by 1
counts = histcounts(id) - 1;
%// Show table of occurrences with characters
T = table(cellstr(unq(:)), counts.', 'VariableNames', {'Character', 'Counts'});
The last piece of code displays everything in a nice table. We ensure that the unique characters are placed as individual cells in a cell array.
Example:
>> S = 'ABCDABABCDEFFGACEG';
Running the above code, we get:
>> T
T =
Character Counts
_________ ______
'A' 3
'B' 2
'C' 2
'D' 1
'E' 1
'F' 1
'G' 1

Function to split string in matlab and return second number

I have a string and I need two characters to be returned.
I tried with strsplit but the delimiter must be a string and I don't have any delimiters in my string. Instead, I always want to get the second number in my string. The number is always 2 digits.
Example: 001a02.jpg I use the fileparts function to delete the extension of the image (jpg), so I get this string: 001a02
The expected return value is 02
Another example: 001A43a . Return values: 43
Another one: 002A12. Return values: 12
All the filenames are in a matrix 1002x1. Maybe I can use textscan but in the second example, it gives "43a" as a result.
(Just so this question doesn't remain unanswered, here's a possible approach: )
One way to go about this uses splitting with regular expressions (MATLAB's strsplit which you mentioned):
str = '001a02.jpg';
C = strsplit(str,'[a-zA-Z.]','DelimiterType','RegularExpression');
Results in:
C =
'001' '02' ''
In older versions of MATLAB, before strsplit was introduced, similar functionality was achieved using regexp(...,'split').
If you want to learn more about regular expressions (abbreviated as "regex" or "regexp"), there are many online resources (JGI..)
In your case, if you only need to take the 5th and 6th characters from the string you could use:
D = str(5:6);
... and if you want to convert those into numbers you could use:
E = str2double(str(5:6));
If your number is always at a certain position in the string, you can simply index this position.
In the examples you gave, the number is always the 5th and 6th characters in the string.
filename = '002A12';
num = str2num(filename(5:6));
Otherwise, if the formating is more complex, you may want to use a regular expression. There is a similar question matlab - extracting numbers from (odd) string. Modifying the code found there you can do the following
all_num = regexp(filename, '\d+', 'match'); %Find all numbers in the filename
num = str2num(all_num{2}) %Convert second number from str