Prefix match in MATLAB - matlab

Hey guys, I have a very simple problem in MATLAB:
I have some strings which are like this:
Pic001
Pic002
Pic003
004
Not every string starts with the prefix "Pic". So how can I cut off the part "pic" that only the numbers at the end shall remain to have an equal format for all my strings?
Greets, poeschlorn

If 'Pic' only ever occurs as a prefix in your strings and nowhere else within the strings then you could use STRREP to remove it like this:
>> x = {'Pic001'; 'Pic002'; 'Pic003'; '004'}
x =
'Pic001'
'Pic002'
'Pic003'
'004'
>> x = strrep(x, 'Pic', '')
x =
'001'
'002'
'003'
'004'
If 'Pic' can occur elsewhere in your strings and you only want to remove it when it occurs as a prefix then use STRNCMP to compare the first three characters of your strings:
>> x = {'Pic001'; 'Pic002'; 'Pic003'; '004'}
x =
'Pic001'
'Pic002'
'Pic003'
'004'
>> for ii = find(strncmp(x, 'Pic', 3))'
x{ii}(1:3) = [];
end
>> x
x =
'001'
'002'
'003'
'004'

strings = {'Pic001'; 'Pic002'; 'Pic003'; '004'};
numbers = regexp(strings, '(PIC)?(\d*)','match');
for cc = 1:length(numbers);
fprintf('%s\n', char(numbers{cc}));
end;

Related

how to read broken numbers on two lines in a text file in matlab?

how to read broken numbers on two lines in matlab?
I am generating some results in text files that are being broken into two lines. Example:
text x = 1.
2345 text
What would a code look like to read the value x = 1.2345?
Suppose the value of x = 1.2345 is in file named name.txt.
When it doesn't break the values I'm looking for:
text x = 1.2345 text
I use the following (working) code:
buffer = fileread('name.txt') ;
search = 'x = ' ;
local = strfind(buffer, search);
xvalue = sscanf(buffer(local(1,1)+numel(search):end), '%f', 1);
You can remove line breaks (and other "white space", if needed) before parsing the string:
>> str = sprintf('text x = 1.\n2345 text')
str =
'text x = 1.
2345 text'
>> str = regexprep(str, '\n', '')
str =
'text x = 1.2345 text'

Excel Range object throws error 0x800A03EC because of a string longer than 255 characters

Using an ActiveX server from MATLAB, I am trying to highlight many cells in an Excel sheet at once. These are not in specific columns or rows so I use Range('A1,B2,...') to access them. However the string accepted by the Range object has to be less than 255 characters or an error:
Error: Object returned error code: 0x800A03EC
is thrown. The following code reproduces this error with an empty Excel file.
hActX = actxserver('Excel.Application');
hWB = hActX.Workbooks.Open('C:\Book1.xlsx');
hSheet = hWB.Worksheets.Item('Sheet1');
col = repmat('A', 100, 1);
row = num2str((1:100)'); %'
cellInd = strcat(col, strtrim(cellstr(row)));
str1 = strjoin(cellInd(1:66), ','); %// 254 characters
str2 = strjoin(cellInd(1:67), ','); %// 258 characters
hSheet.Range(str1).Interior.Color = 255; %// Works
hSheet.Range(str2).Interior.Color = 255; %// Error 0x800A03EC
hWB.Save;
hWB.Close(false);
hActX.Quit;
How can I get around this? I found no other relevant method of calling Range, or of otherwise getting the cells I want to modify.
If you start with a String, you can test its length to determine if Range() can handle it. Here is an example of building a diagonal range:
Sub DiagonalRange()
Dim BigString As String, BigRange As Range
Dim i As Long, HowMany As Long, Ln As String
HowMany = 100
For i = 1 To HowMany
BigString = BigString & "," & Cells(i, i).Address(0, 0)
Next i
BigString = Mid(BigString, 2)
Ln = Len(BigString)
MsgBox Ln
If Ln < 250 Then
Set BigRange = Range(BigString)
Else
Set BigRange = Nothing
arr = Split(BigString, ",")
For Each a In arr
If BigRange Is Nothing Then
Set BigRange = Range(a)
Else
Set BigRange = Union(BigRange, Range(a))
End If
Next a
End If
BigRange.Select
End Sub
For i = 10, the code will the the direct method, but if the code were i=100, the array method would be used.
The solution, as Rory pointed out, is to use the Union method. To minimize the number of calls from MATLAB to the ActiveX server, this is what I did:
str = strjoin(cellInd, ',');
isep = find(str == ',');
isplit = diff(mod(isep, 250)) < 0;
isplit = [isep(isplit) (length(str) + 1)];
hRange = hSheet.Range(str(1:(isplit(1) - 1)));
for ii = 2:numel(isplit)
hRange = hActX.Union(hRange, ...
hSheet.Range(str((isplit(ii-1) + 1):(isplit(ii) - 1))));
end
I used 250 in the mod to account for the cell names being up to 6 characters long, which is sufficient for me.

Compute the Frequency of bigrams in Matlab

I am trying to compute and plot the distribution of bigrams frequencies
First I did generate all possible bigrams which gives 1296 bigrams
then i extract the bigrams from a given file and save them in words1
my question is how to compute the frequency of these 1296 bigrams for the file a.txt?
if there are some bigrams did not appear at all in the file, then their frequencies should be zero
a.txt is any text file
clear
clc
%************create bigrams 1296 ***************************************
chars ='1234567890abcdefghijklmonpqrstuvwxyz';
chars1 ='1234567890abcdefghijklmonpqrstuvwxyz';
bigram='';
for i=1:36
for j=1:36
bigram = sprintf('%s%s%s',bigram,chars(i),chars1(j));
end
end
temp1 = regexp(bigram, sprintf('\\w{1,%d}', 1), 'match');
temp2 = cellfun(#(x,y) [x '' y],temp1(1:end-1)', temp1(2:end)','un',0);
bigrams = temp2;
bigrams = unique(bigrams);
bigrams = rot90(bigrams);
bigram = char(bigrams(1:end));
all_bigrams_len = length(bigrams);
clear temp temp1 temp2 i j chars1 chars;
%****** 1. Cleaning Data ******************************
collection = fileread('e:\a.txt');
collection = regexprep(collection,'<.*?>','');
collection = lower(collection);
collection = regexprep(collection,'\W','');
collection = strtrim(regexprep(collection,'\s*',''));
%*******************************************************
temp = regexp(collection, sprintf('\\w{1,%d}', 1), 'match');
temp2 = cellfun(#(x,y) [x '' y],temp(1:end-1)', temp(2:end)','un',0);
words1 = rot90(temp2);
%*******************************************************
words1_len = length(words1);
vocab1 = unique(words1);
vocab_len1 = length(vocab1);
[vocab1,void1,index1] = unique(words1);
frequencies1 = hist(index1,vocab_len1);
I. Character counting problem for a string
bsxfun based solution for counting characters -
counts = sum(bsxfun(#eq,[string1-0]',65:90))
Output -
counts =
2 0 0 0 0 2 0 1 0 0 ....
If you would like to get a tabulate output of counts against each letter -
out = [cellstr(['A':'Z']') num2cell(counts)']
Output -
out =
'A' [2]
'B' [0]
'C' [0]
'D' [0]
'E' [0]
'F' [2]
'G' [0]
'H' [1]
'I' [0]
....
Please note that this was a case-sensitive counting for upper-case letters.
For a lower-case letter counting, use this edit to this earlier code -
counts = sum(bsxfun(#eq,[string1-0]',97:122))
For a case insensitive counting, use this -
counts = sum(bsxfun(#eq,[upper(string1)-0]',65:90))
II. Bigram counting case
Let us suppose that you have all the possible bigrams saved in a 1D cell array bigrams1 and the incoming bigrams from the file are saved into another cell array words1. Let us also assume certain values in them for demonstration -
bigrams1 = {
'ar';
'de';
'c3';
'd1';
'ry';
't1';
'p1'}
words1 = {
'de';
'c3';
'd1';
'r9';
'yy';
'de';
'ry';
'de';
'dd';
'd1'}
Now, you can get the counts of the bigrams from words1 that are present in bigrams1 with this code -
[~,~,ind] = unique(vertcat(bigrams1,words1));
bigrams_lb = ind(1:numel(bigrams1)); %// label bigrams1
words1_lb = ind(numel(bigrams1)+1:end); %// label words1
counts = sum(bsxfun(#eq,bigrams_lb,words1_lb'),2)
out = [bigrams1 num2cell(counts)]
The output on code run is -
out =
'ar' [0]
'de' [3]
'c3' [1]
'd1' [2]
'ry' [1]
't1' [0]
'p1' [0]
The result shows that - First element ar from the list of all possible bigrams has no find in words1 ; second element de has three occurrences in words1 and so on.
Hey similar to Dennis solution you can just use histc()
string1 = 'ASHRAFF'
histc(string1,'ABCDEFGHIJKLMNOPQRSTUVWXYZ')
this checks the number of entries in the bins defined by the string 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' which is hopefully the alphabet (just wrote it fast so no garantee). The result is:
Columns 1 through 21
2 0 0 0 0 2 0 1 0 0 0 0 0 0 0 0 0 1 1 0 0
Columns 22 through 26
0 0 0 0 0
Just a little modification of my solution:
string1 = 'ASHRAFF'
alphabet1='A':'Z'; %%// as stated by Oleg Komarov
data=histc(string1,alphabet1);
results=cell(2,26);
for k=1:26
results{1,k}= alphabet1(k);
results{2,k}= data(k);
end
If you look at results now you can easily check rather it works or not :D
This answer creates all bigrams, loads in the file does a little cleanup, ans then uses a combination of unique and histc to count the rows
Generate all Bigrams
note the order here is important as unique will sort the array so this way it is created presorted so the output matches expectation;
[y,x] = ndgrid(['0':'9','a':'z']);
allBigrams = [x(:),y(:)];
Read The File
this removes capitalisation and just pulls out any 0-9 or a-z character then creates a column vector of these
fileText = lower(fileread('d:\loremipsum.txt'));
cleanText = regexp(fileText,'([a-z0-9])','tokens');
cleanText = cell2mat(vertcat(cleanText{:}));
create bigrams from file by shifting by one and concatenating
fileBigrams = [cleanText(1:end-1),cleanText(2:end)];
Get Counts
the set of all bigrams is added to our set (so the values are created for all possible). Then a value ∈{1,2,...,1296} is assigned to each unique row using unique's 3rd output. Counts are then created with histc with the bins equal to the set of values from unique's output, 1 is subtracted from each bin to remove the complete set bigrams we added
[~,~,c] = unique([fileBigrams;allBigrams],'rows');
counts = histc(c,1:1296)-1;
Display
to view counts against text
[allBigrams, counts+'0']
or for something potentially more useful...
[sortedCounts,sortInd] = sort(counts,'descend');
[allBigrams(sortInd,:), sortedCounts+'0']
ans =
or9
at8
re8
in7
ol7
te7
do6 ...
Did not look into the entire code fragment, but from the example at the top of your question, I think you are looking to make a histogram:
string1 = 'ASHRAFF'
nr = histc(string1,'A':'Z')
Will give you:
2 0 0 0 0 2 0 1 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0
(Got a working solution with hist, but as #The Minion shows histc is more easy to use here.)
Note that this solution only deals with upper case letters.
You may want to do something like so if you want to put lower case letters in their correct bin:
string1 = 'ASHRAFF'
nr = histc(upper(string1),'A':'Z')
Or if you want them to be shown separately:
string1 = 'ASHRaFf'
nr = histc(upper(string1),['a':'z' 'A':'Z'])
bi_freq1 = zeros(1,all_bigrams_len);
for k=1: vocab_len1
for i=1:all_bigrams_len
if char(vocab1(k)) == char(bigrams(i))
bi_freq1(i) = frequencies1(k);
end
end
end

nested for loop query

I am trying to do something rather simple, but can't seem to get it...
I have 3 cell-arrays with strings,
A = {'ConditionA'; 'ConditionB'; 'ConditionC'; 'ConditionD'};
B = {'Case1'; 'Case2'; 'Case3'; 'Case4'};
C = {'Rice'; 'Beans'; 'Carrots'; 'Cereal';'Tomato'; 'Cabbage';...
'Sugar'}
I want to produce a vector with the concatenated (strcat?) combinations, as it this were a "tree diagram", like:
strcat(A(1),B(1),C(1))
strcat(A(1),B(1),C(2))
strcat(A(1),B(1),C(3))
strcat(A(1),B(1),C(4))
strcat(A(1),B(1),C(5))
strcat(A(1),B(1),C(6))
strcat(A(1),B(1),C(7))
strcat(A(1),B(2),C(1))
So what the first elements I am trying to get are (in a column ideally):
ConditionACase1Rice
ConditionACase1Beans
ConditionACase1Carrots
ConditionACase1Cereal
ConditionACase1Tomato
ConditionACase1Cabbage
ConditionACase1Sugar
ConditionACase2Rice
etc etc etc...
I know that:
for i=1:length(A)
E(i) = strcat(A(i),B(1),C(1))
end
Works for one "level". I have tried:
for i=1:length(A)
for j=1:length(B)
for k=1:length(C)
P(i) = strcat(A(i),B(j),C(k));
end
end
end
But this doesn't work...
I would be really grateful if I could be helped with this.
Thanks in advance!
From what I understood, you want all possible combinations of the strings of the input arrays as specified. If so, simply replace your nested loops with the following:
P = cell(length(A)*length(B)*length(C),1);
t=1;
for i=1:length(A)
for j=1:length(B)
for k=1:length(C)
P(t) = strcat(A(i),B(j),C(k));
t = t+1;
end
end
end
For the input arrays,
>> A = {'ConditionA'; 'ConditionB'; 'ConditionC'; 'ConditionD'};
>> B = {'Case1'; 'Case2'; 'Case3'; 'Case4'};
>> C = {'Rice'; 'Beans'; 'Carrots'; 'Cereal';'Tomato'; 'Cabbage';'Sugar'};
The value of P would be:
>> P
P =
'ConditionACase1Rice'
'ConditionACase1Beans'
'ConditionACase1Carrots'
'ConditionACase1Cereal'
'ConditionACase1Tomato'
'ConditionACase1Cabbage'
'ConditionACase1Sugar'
'ConditionACase2Rice'
'ConditionACase2Beans'
'ConditionACase2Carrots'
'ConditionACase2Cereal'
'ConditionACase2Tomato'
'ConditionACase2Cabbage'
'ConditionACase2Sugar'
'ConditionACase3Rice'
'ConditionACase3Beans'
'ConditionACase3Carrots'
'ConditionACase3Cereal'
'ConditionACase3Tomato'
'ConditionACase3Cabbage'
'ConditionACase3Sugar'
'ConditionACase4Rice'
'ConditionACase4Beans'
'ConditionACase4Carrots'
'ConditionACase4Cereal'
'ConditionACase4Tomato'
'ConditionACase4Cabbage'
'ConditionACase4Sugar'
'ConditionBCase1Rice'
'ConditionBCase1Beans'
'ConditionBCase1Carrots'
'ConditionBCase1Cereal'
'ConditionBCase1Tomato'
'ConditionBCase1Cabbage'
'ConditionBCase1Sugar'
'ConditionBCase2Rice'
'ConditionBCase2Beans'
'ConditionBCase2Carrots'
'ConditionBCase2Cereal'
'ConditionBCase2Tomato'
'ConditionBCase2Cabbage'
'ConditionBCase2Sugar'
'ConditionBCase3Rice'
'ConditionBCase3Beans'
'ConditionBCase3Carrots'
'ConditionBCase3Cereal'
'ConditionBCase3Tomato'
'ConditionBCase3Cabbage'
'ConditionBCase3Sugar'
'ConditionBCase4Rice'
'ConditionBCase4Beans'
'ConditionBCase4Carrots'
'ConditionBCase4Cereal'
'ConditionBCase4Tomato'
'ConditionBCase4Cabbage'
'ConditionBCase4Sugar'
'ConditionCCase1Rice'
'ConditionCCase1Beans'
'ConditionCCase1Carrots'
'ConditionCCase1Cereal'
'ConditionCCase1Tomato'
'ConditionCCase1Cabbage'
'ConditionCCase1Sugar'
'ConditionCCase2Rice'
'ConditionCCase2Beans'
'ConditionCCase2Carrots'
'ConditionCCase2Cereal'
'ConditionCCase2Tomato'
'ConditionCCase2Cabbage'
'ConditionCCase2Sugar'
'ConditionCCase3Rice'
'ConditionCCase3Beans'
'ConditionCCase3Carrots'
'ConditionCCase3Cereal'
'ConditionCCase3Tomato'
'ConditionCCase3Cabbage'
'ConditionCCase3Sugar'
'ConditionCCase4Rice'
'ConditionCCase4Beans'
'ConditionCCase4Carrots'
'ConditionCCase4Cereal'
'ConditionCCase4Tomato'
'ConditionCCase4Cabbage'
'ConditionCCase4Sugar'
'ConditionDCase1Rice'
'ConditionDCase1Beans'
'ConditionDCase1Carrots'
'ConditionDCase1Cereal'
'ConditionDCase1Tomato'
'ConditionDCase1Cabbage'
'ConditionDCase1Sugar'
'ConditionDCase2Rice'
'ConditionDCase2Beans'
'ConditionDCase2Carrots'
'ConditionDCase2Cereal'
'ConditionDCase2Tomato'
'ConditionDCase2Cabbage'
'ConditionDCase2Sugar'
'ConditionDCase3Rice'
'ConditionDCase3Beans'
'ConditionDCase3Carrots'
'ConditionDCase3Cereal'
'ConditionDCase3Tomato'
'ConditionDCase3Cabbage'
'ConditionDCase3Sugar'
'ConditionDCase4Rice'
'ConditionDCase4Beans'
'ConditionDCase4Carrots'
'ConditionDCase4Cereal'
'ConditionDCase4Tomato'
'ConditionDCase4Cabbage'
'ConditionDCase4Sugar'
Let me know if you need further assistance.
I am not really familiar with matlab.. but maybe try something like this?
for A = {'ConditionA'; 'ConditionB'; 'ConditionC'; 'ConditionD'};
for B = {'Case1'; 'Case2'; 'Case3'; 'Case4'};
for C = {'Rice'; 'Beans'; 'Carrots'; 'Cereal';'Tomato'; 'Cabbage'; 'Sugar'}
P(i) = strcat(A(i),B(j),C(k));
end
end
end
{
x = 1;
for i=1:length(A)
for j=1:length(B)
for k=1:length(C)
P(x) = strcat(A(i),B(j),C(k));
x = x + 1;
end
end
end
}
Please basically check your code before posting to PO as this is a very simple debugging

Recursive concatenation of Matlab structs

Is it somehow possible to concatenate two matlab structures recursively without iterating over all leaves of one of the structures.
For instance
x.a=1;
x.b.c=2;
y.b.d=3;
y.a = 4 ;
would result in the following
res = mergeStructs(x,y)
res.a=4
res.b.c=2
res.b.d=3
The following function works for your particular example. There will be things it doesn't consider, so let me know if there are other cases you want it to work for and I can update.
function res = mergeStructs(x,y)
if isstruct(x) && isstruct(y)
res = x;
names = fieldnames(y);
for fnum = 1:numel(names)
if isfield(x,names{fnum})
res.(names{fnum}) = mergeStructs(x.(names{fnum}),y.(names{fnum}));
else
res.(names{fnum}) = y.(names{fnum});
end
end
else
res = y;
end
Then res = mergeStructs(x,y); gives:
>> res.a
ans =
4
>> res.b
ans =
c: 2
d: 3
as you require.
EDIT: I added isstruct(x) && to the first line. The old version worked fine because isfield(x,n) returns 0 if ~isstruct(x), but the new version is slightly faster if y is a big struct and ~isstruct(x).