Find two maximum points - matlab

I have a two arrays that I plot: A a 1x101 vector and B the same
A = [0.140673450903833 0.143148937279028 0.145430952171596 0.147474938627147 0.149581060870114 0.151187105347571 0.152646348246015 0.153892222566265 0.154913060187075 0.155701930397674 0.156253328260122 0.156562551841967 0.156625533585493 0.156438787610539 0.155999394209637 0.155304997895017 0.154353810555534 0.153144616301392 0.151676776486360 0.149950234280632 0.147965519042205 0.145723755995511 0.143229676241241 0.140476287800831 0.137475884805212 0.134228713435530 0.130738812449387 0.127010778531129 0.123049766057659 0.118861478234099 0.114452155847321 0.109828564345449 0.104997979409803 0.0999681710919947 0.0947473865690700 0.0893443315667412 0.0837681505026921 0.0780284054055798 0.0721350536699391 0.0660984247128685 0.0599291956061627 0.0536383657701255 0.0472372308396036 0.0407373558685518 0.0341505481893574 0.0274888307202605 0.0207644183921863 0.0139897104812690 0.00717740846258673 0.000358034181698980 0.00651349557333709 0.0133637955715171 0.0202018404602034 0.0270147225489001 0.0337898204971252 0.0405146415132138 0.0471768260462406 0.0537641715916784 0.0602646603043279 0.0666664873507057 0.0729580891146200 0.0791281709097673 0.0851657340195109 0.0910601019446384 0.0968009457657087 0.102378308539557 0.107782628657363 0.113004762097380 0.118036003510261 0.122868106079509 0.127493300104313 0.131904310257409 0.136094371477732 0.140057243469708 0.143787223810476 0.147279159770258 0.150528459504324 0.153531108836772 0.156280444813554 0.158783035106175 0.161027296288627 0.163014562505352 0.164743731117677 0.166214276765471 0.167426257040343 0.168380310331524 0.169077651806683 0.169520068571722 0.169709914896378 0.169650109087113 0.169344135453180 0.168796059816963 0.168010582212876 0.166993205517562 0.165750858213848 0.164295206012858 0.162692813100379 0.160590402150861 0.158550181408264 0.156271984944015 0.153800366335689]
B = [-2 -1.96000000000000 -1.92000000000000 -1.88000000000000 -1.84000000000000 -1.80000000000000 -1.76000000000000 -1.72000000000000 -1.68000000000000 -1.64000000000000 -1.60000000000000 -1.56000000000000 -1.52000000000000 -1.48000000000000 -1.44000000000000 -1.40000000000000 -1.36000000000000 -1.32000000000000 -1.28000000000000 -1.24000000000000 -1.20000000000000 -1.16000000000000 -1.12000000000000 -1.08000000000000 -1.04000000000000 -1 -0.960000000000000 -0.920000000000000 -0.880000000000000 -0.840000000000000 -0.800000000000000 -0.760000000000000 -0.720000000000000 -0.680000000000000 -0.640000000000000 -0.600000000000000 -0.560000000000000 -0.520000000000000 -0.480000000000000 -0.440000000000000 -0.400000000000000 -0.360000000000000 -0.320000000000000 -0.280000000000000 -0.240000000000000 -0.200000000000000 -0.160000000000000 -0.120000000000000 -0.0800000000000001 -0.0400000000000000 0 0.0400000000000000 0.0800000000000001 0.120000000000000 0.160000000000000 0.200000000000000 0.240000000000000 0.280000000000000 0.320000000000000 0.360000000000000 0.400000000000000 0.440000000000000 0.480000000000000 0.520000000000000 0.560000000000000 0.600000000000000 0.640000000000000 0.680000000000000 0.720000000000000 0.760000000000000 0.800000000000000 0.840000000000000 0.880000000000000 0.920000000000000 0.960000000000000 1 1.04000000000000 1.08000000000000 1.12000000000000 1.16000000000000 1.20000000000000 1.24000000000000 1.28000000000000 1.32000000000000 1.36000000000000 1.40000000000000 1.44000000000000 1.48000000000000 1.52000000000000 1.56000000000000 1.60000000000000 1.64000000000000 1.68000000000000 1.72000000000000 1.76000000000000 1.80000000000000 1.84000000000000 1.88000000000000 1.92000000000000 1.96000000000000 2];
Plotting these two plot(B,A) I get this
with two maximum points at B = -1.52 and B = +1.52
I want to add automatically a point as marker in the two maximum values, a horizontal line above the highest point and a two way row pointing from the line to the second peak like this
I tried to sort A and find the position of the two maximum
[val ind] = sort(A,'descend');
max_values = val(1:2)
index = ind(1:2)
r_max = A(ind(1:2))
but the second peak is not the the second position of val because I get this sort:
Columns 1 through 13
0.1697 0.1697 0.1695 0.1693 0.1691 0.1688 0.1684 0.1680 0.1674 0.1670 0.1662 0.1658 0.1647
Columns 14 through 26
0.1643 0.1630 0.1627 0.1610 0.1606 0.1588 0.1586 0.1566 0.1566 0.1564 0.1563 0.1563 0.1563
The first value 0.1697 (in this case) is the correct one, but the second peak is not in the second position but at the 22nd position.
Looking at the plot, how can I get easily the two maximum points?
Once I know the two coordinates I can easily add all the objects that I need.

Using findpeaks (requires Signal Processing Toolbox), yline (introduced in R2018b) and annotation :
A = [0.140673450903833 0.143148937279028 0.145430952171596 0.147474938627147 0.149581060870114 0.151187105347571 0.152646348246015 0.153892222566265 0.154913060187075 0.155701930397674 0.156253328260122 0.156562551841967 0.156625533585493 0.156438787610539 0.155999394209637 0.155304997895017 0.154353810555534 0.153144616301392 0.151676776486360 0.149950234280632 0.147965519042205 0.145723755995511 0.143229676241241 0.140476287800831 0.137475884805212 0.134228713435530 0.130738812449387 0.127010778531129 0.123049766057659 0.118861478234099 0.114452155847321 0.109828564345449 0.104997979409803 0.0999681710919947 0.0947473865690700 0.0893443315667412 0.0837681505026921 0.0780284054055798 0.0721350536699391 0.0660984247128685 0.0599291956061627 0.0536383657701255 0.0472372308396036 0.0407373558685518 0.0341505481893574 0.0274888307202605 0.0207644183921863 0.0139897104812690 0.00717740846258673 0.000358034181698980 0.00651349557333709 0.0133637955715171 0.0202018404602034 0.0270147225489001 0.0337898204971252 0.0405146415132138 0.0471768260462406 0.0537641715916784 0.0602646603043279 0.0666664873507057 0.0729580891146200 0.0791281709097673 0.0851657340195109 0.0910601019446384 0.0968009457657087 0.102378308539557 0.107782628657363 0.113004762097380 0.118036003510261 0.122868106079509 0.127493300104313 0.131904310257409 0.136094371477732 0.140057243469708 0.143787223810476 0.147279159770258 0.150528459504324 0.153531108836772 0.156280444813554 0.158783035106175 0.161027296288627 0.163014562505352 0.164743731117677 0.166214276765471 0.167426257040343 0.168380310331524 0.169077651806683 0.169520068571722 0.169709914896378 0.169650109087113 0.169344135453180 0.168796059816963 0.168010582212876 0.166993205517562 0.165750858213848 0.164295206012858 0.162692813100379 0.160590402150861 0.158550181408264 0.156271984944015 0.153800366335689];
B = [-2 -1.96000000000000 -1.92000000000000 -1.88000000000000 -1.84000000000000 -1.80000000000000 -1.76000000000000 -1.72000000000000 -1.68000000000000 -1.64000000000000 -1.60000000000000 -1.56000000000000 -1.52000000000000 -1.48000000000000 -1.44000000000000 -1.40000000000000 -1.36000000000000 -1.32000000000000 -1.28000000000000 -1.24000000000000 -1.20000000000000 -1.16000000000000 -1.12000000000000 -1.08000000000000 -1.04000000000000 -1 -0.960000000000000 -0.920000000000000 -0.880000000000000 -0.840000000000000 -0.800000000000000 -0.760000000000000 -0.720000000000000 -0.680000000000000 -0.640000000000000 -0.600000000000000 -0.560000000000000 -0.520000000000000 -0.480000000000000 -0.440000000000000 -0.400000000000000 -0.360000000000000 -0.320000000000000 -0.280000000000000 -0.240000000000000 -0.200000000000000 -0.160000000000000 -0.120000000000000 -0.0800000000000001 -0.0400000000000000 0 0.0400000000000000 0.0800000000000001 0.120000000000000 0.160000000000000 0.200000000000000 0.240000000000000 0.280000000000000 0.320000000000000 0.360000000000000 0.400000000000000 0.440000000000000 0.480000000000000 0.520000000000000 0.560000000000000 0.600000000000000 0.640000000000000 0.680000000000000 0.720000000000000 0.760000000000000 0.800000000000000 0.840000000000000 0.880000000000000 0.920000000000000 0.960000000000000 1 1.04000000000000 1.08000000000000 1.12000000000000 1.16000000000000 1.20000000000000 1.24000000000000 1.28000000000000 1.32000000000000 1.36000000000000 1.40000000000000 1.44000000000000 1.48000000000000 1.52000000000000 1.56000000000000 1.60000000000000 1.64000000000000 1.68000000000000 1.72000000000000 1.76000000000000 1.80000000000000 1.84000000000000 1.88000000000000 1.92000000000000 1.96000000000000 2];
plot(B,A)
% Find peaks.
[maxValuesY,isMaxY]=findpeaks(A);
maxValuesX = B(isMaxY);
% Plot horizontal line.
yline(maxValuesY(2));
% Create arrow.
ar = annotation('arrow');
ar.Parent = gca;
ar.X = [maxValuesX(1), maxValuesX(1)];
ar.Y = [maxValuesY(2), maxValuesY(1)];
ar.Color = 'black';
ar.HeadLength = 3;
Thanks to marsei for tip on the position of annotation.

If you specifically have such plots, you can go with the following solution, which just excludes n neighbours around the first found maximum.
% Input (copy from above...)
A = [ .. ];
B = [ .. ];
% Index of max value.
[max_val, max_idx] = max(A);
% Find second max value by excluding n neighbourhood.
n = 10;
AA = A;
AA(max_idx - n : max_idx + n) = [];
sec_max_val = max(AA);
sec_max_idx = find(A == sec_max_val);
% Output.
figure(1);
hold on;
% Graph.
plot(B, A);
% Black line.
plot([B(1) B(end)], [max_val max_val], 'k');
% Black arrow.
p1 = [B(sec_max_idx) B(sec_max_idx)];
p2 = [max_val sec_max_val];
dp = p2 - p1;
quiver(p1(1), p2(1), p1(2) - p1(1), p2(2) - p2(1), 0, 'k');
hold off;
You'll get such an output:

Related

Fixed size output data in MATLAB

I am trying to have a matrix which its elements doesn't have the same size .
Say the element_1 = 0.1234567 and the element_2 = 0.1 and I need the element_2 =0.1000000 so that both of them has the same size.
clc;clear all
a = rand(4,12);
COL_Names ={'This_is_Colu_No_1','This_is_Colu_No_2','This_is_Colu_No_3','This_is_Colu_No_4','This_is_Colu_No_5','This_is_Colu_No_6','This_is_Colu_No_7','This_is_Colu_No_8','This_is_Colu_No_9','This_is_Colu_No_10','This_is_Colu_No_11','This_is_Colu_No_12'};
rowNames = {'ROW1';'ROW2';'ROW3';'ROW4'};
T = array2table(a,'VariableNames',COL_Names,'RowNames',rowNames);
writetable(T,'Data.txt','Delimiter','\t','WriteRowNames',true);
type Data.txt ;
The OutPut is like this
Row This_is_Colu_No_1 This_is_Colu_No_2 This_is_Colu_No_3 This_is_Colu_No_4 This_is_Colu_No_5 This_is_Colu_No_6 This_is_Colu_No_7 This_is_Colu_No_8 This_is_Colu_No_9 This_is_Colu_No_10 This_is_Colu_No_11 This_is_Colu_No_12
ROW1 0.139740979774291 0.231035232035157 0.347778782863186 0.279682446566279 0.060995054119542 0.233212699943628 0.507599581908539 0.833087779293817 0.552819386888535 0.43251811668393 0.342580158122272 0.420574544492339
ROW2 0.00459708931895875 0.703626845695885 0.33064632159971 0.85782393462353 0 0.935097755896966 0.582441521353621 0.155241648807001 0.163717355897126 0.48985529896707 0.0134551766978835 0.810989133317225
ROW3 0.791254563282513 0.650747335567064 0.293769172888192 0.15110222627643 0.962791661993452 0.842147123142386 0.586462512126695 0.109349751268813 1 0.00525695361457879 0.700826048054212 0.989915984093474
ROW4 0.513993416249574 0.868158891144176 0.293769172888 0.552496163682282 0.301098948730568 0.779790450269442 0.420527994140777 0.523231514251179 0.0602548802340035 0.261436547849062 0.84923648156472 0.433189006269314
I think you need to do it manually using fprintf with formatSpec:
clc, clear, rng(3);
a = rand(4, 3);
colNames = {'This_is_Colu_No_9', 'This_is_Colu_No_10', 'This_is_Colu_No_11'};
rowNames = {'ROW98'; 'ROW99'; 'ROW100'; 'ROW101'};
formatSpecHead = '%-6s %-22s %-22s %-22s\n';
formatSpecRow = '%-6s %.20f %.20f %.20f\n';
fid = fopen('a.fwf', 'w');
fprintf(fid, formatSpecHead, 'Row', colNames{:}); % write header
for row = 1:size(a, 1)
fprintf(fid, formatSpecRow, rowNames{row}, a(row, :)); % write row
end
fclose(fid);
Then a.fwf looks like:
Row This_is_Colu_No_9 This_is_Colu_No_10 This_is_Colu_No_11
ROW98 0.55079790257457550418 0.89294695434765469777 0.05146720330082987793
ROW99 0.70814782261810482744 0.89629308893343806464 0.44080984365063646813
ROW100 0.29090473891294432729 0.12558531046383625274 0.02987621087856695556
ROW101 0.51082760519766301499 0.20724287813818675907 0.45683322439471107934

How can I generate 3d integer coordinates in order of distance from an origin

I want to generate a sequence of coordinates in order of distance from the origin. The sequence will obviously be infinite so just generating them all and sorting by distance will not work for me.
For those points that are the same distance I don't care about the order.
For example, here's some points, with their distance from the origin up to two steps away.
# d² = 0
(0,0,0)
# d² = 1
(0,0,-1)
(0,-1,0)
(-1,0,0)
(1,0,0)
(0,1,0)
(0,0,1)
# d² = 2
(0,-1,-1)
(-1,0,-1)
(1,0,-1)
(0,1,-1)
(-1,-1,0)
(1,-1,0)
(-1,1,0)
(1,1,0)
(0,-1,1)
(-1,0,1)
(1,0,1)
(0,1,1)
# d² = 3
(-1,-1,-1)
(1,-1,-1)
(-1,1,-1)
(1,1,-1)
(-1,-1,1)
(1,-1,1)
(-1,1,1)
(1,1,1)
# d² = 4
(0,0,-2)
(0,-2,0)
(-2,0,0)
(2,0,0)
(0,2,0)
(0,0,2)
# d² = 5
(0,-1,-2)
(-1,0,-2)
(1,0,-2)
(0,1,-2)
(0,-2,-1)
(-2,0,-1)
(2,0,-1)
(0,2,-1)
(-1,-2,0)
(1,-2,0)
(-2,-1,0)
(2,-1,0)
(-2,1,0)
(2,1,0)
(-1,2,0)
(1,2,0)
(0,-2,1)
(-2,0,1)
(2,0,1)
(0,2,1)
(0,-1,2)
(-1,0,2)
(1,0,2)
(0,1,2)
# d² = 6
(-1,-1,-2)
(1,-1,-2)
(-1,1,-2)
(1,1,-2)
(-1,-2,-1)
(1,-2,-1)
(-2,-1,-1)
(2,-1,-1)
(-2,1,-1)
(2,1,-1)
(-1,2,-1)
(1,2,-1)
(-1,-2,1)
(1,-2,1)
(-2,-1,1)
(2,-1,1)
(-2,1,1)
(2,1,1)
(-1,2,1)
(1,2,1)
(-1,-1,2)
(1,-1,2)
(-1,1,2)
(1,1,2)
# d² = 8
(0,-2,-2)
(-2,0,-2)
(2,0,-2)
(0,2,-2)
(-2,-2,0)
(2,-2,0)
(-2,2,0)
(2,2,0)
(0,-2,2)
(-2,0,2)
(2,0,2)
(0,2,2)
# d² = 9
(-1,-2,-2)
(1,-2,-2)
(-2,-1,-2)
(2,-1,-2)
(-2,1,-2)
(2,1,-2)
(-1,2,-2)
(1,2,-2)
(-2,-2,-1)
(2,-2,-1)
(-2,2,-1)
(2,2,-1)
(-2,-2,1)
(2,-2,1)
(-2,2,1)
(2,2,1)
(-1,-2,2)
(1,-2,2)
(-2,-1,2)
(2,-1,2)
(-2,1,2)
(2,1,2)
(-1,2,2)
(1,2,2)
# d² = 12
(-2,-2,-2)
(2,-2,-2)
(-2,2,-2)
(2,2,-2)
(-2,-2,2)
(2,-2,2)
(-2,2,2)
(2,2,2)
Starting from a solution aaa, then aab, then abc, you can retrieve all other "permutations" like aba, baa, -a-a-b, and so on. So you can keep a < b < c, positive numbers.
Geometrically : one eighth triangle on a globe. For a circle x²+y² one would iterate in a quarter over x and let y be retrieved from r² - x². Happens here to given an a.
Unfortunately the coding is too much to me for answering on a sunny Sunday.
(= hard enough).
Schematically in pseudo-code:
int distance = -1;
int a;
int b;
int c;
PermutationIterator perm = ...
Point next() {
if (perm.atEnd()) { // Initially true.
perm.nextDistance();
++distance;
a = distance;
b = a;
c = a;
// Will return Point(a, a, a);
}
return perm.nextPerm();
}

MATLAB fprintf Increase Number of Digits of Exponent

If we have
A=[100 -0.1 0];
B=[30 0.2 -2]; t1='text 1'; t2=text 2'
how to use fprintf so that the output saved in a file will look like that
100 -1.000E-0001 0.000E-0000 'text 1'
30 2.000E-0001 -2.000E-0000 'text 2'
I put together a "one-liner" (spread across several lines for better readability) that takes an array, a single number format, and a delimiter and returns the desired string. And while you found the leading blank-space flag, I prefer the + flag, though the function will work with both:
A=[-0.1 0];
B=[0.2 -2];
minLenExp = 4;
extsprintf = #(num,fmt,delim) ...
strjoin(cellfun(...
#(toks)[toks{1},repmat('0',1,max([0,minLenExp-length(toks{2})])),toks{2}],...
regexp(sprintf(fmt,num),'([+-\s][\.\d]+[eE][+-])(\d+)','tokens'),...
'UniformOutput',false),delim);
Astr = extsprintf(A,'%+.4E',' ');
Bstr = extsprintf(B,'%+.4E',' ');
disp([Astr;Bstr]);
Running this yields:
>> foo
-1.0000E-0001 +0.0000E+0000
+2.0000E-0001 -2.0000E+0000
(foo is just what the script file is called.)
Here's a more general approach that searches for the exponential format instead of assuming it:
A=[100 -0.1 0].';
B=[30 0.2 -2];
extsprintf = #(fmt,arr) ...
regexprep(...
sprintf(fmt,arr),...
regexprep(regexp(sprintf(fmt,arr),'([+-\s][\.\d]+[eE][+-]\d+)','match'),'([\+\.])','\\$1'),...
cellfun(#(match)...
cellfun(...
#(toks)[toks{1},repmat('0',1,max([0,minLenExp-length(toks{2})])),toks{2}],...
regexp(match,'([+-\s][\.\d]+[eE][+-])(\d+)','tokens'),...
'UniformOutput',false),...
regexp(sprintf(fmt,arr),'([+-\s][\.\d]+[eE][+-]\d+)','match')));
fmt = '%3d %+.4E %+.4e';
disp(extsprintf(fmt,A));
disp(extsprintf(fmt,B));
Outputs
>> foo
100 -1.0000E-0001 +0.0000e+0000
30 +2.0000E-0001 -2.0000e+0000

How can I modify a variable and it's corresponding dimension in a netcdf

I am running an idealized experiment with a atmospheric model with output in the format of netcdf. But the netcdf file header does not describe the variable and and dimension clearly. for example :
netcdf qc3d {
dimensions:
Time = UNLIMITED ; // (1453 currently)
bottom_top = 45 ;
south_north = 32 ;
west_east = 12288 ;
variables:
float Time(Time) ;
Time:axis = "Time" ;
Time:long_name = "Time" ;
Time:standard_name = "Time" ;
Time:units = "minutes since 1900-01-01 " ;
double zc(bottom_top) ;
zc:axis = "Z" ;
zc:long_name = "vertical height of model layers, MSL" ;
zc:standard_name = "altitude" ;
zc:units = "m" ;
zc:positive = "up" ;
double yc(south_north) ;
yc:axis = "Y" ;
yc:long_name = "y-coordinate of grid cell centers in Cartesian system" ;
yc:units = "m" ;
double xc(west_east) ;
xc:axis = "X" ;
xc:long_name = "x-coordinate of grid cell centers in Cartesian system" ;
xc:units = "m" ;
float qc(Time, bottom_top, south_north, west_east) ;
qc:long_name = "cloud water mixing ratio" ;
qc:standard_name = "cloud_water_mixing" ;
qc:units = "kg kg-1" ;
qc:_FillValue = 9.96921e+36f ;
qc:missing_value = 9.96921e+36f ;
// global attributes:
:model_tag = "CSU VVM" ;
:references = "http://kiwi.atmos.colostate.edu/pubs/joon-hee-tech_report.pdf" ;
:contact = "jung#atmos.colostate.edu" ;
:institution = "Colorado State University" ;
:VVM_casename = "GATE_PHASE_III " ;
}
there are 4 dimensions, Time(Time), zc(bottom_top), yc(south_north), xc(west_east) and 5 variable in which first 4 variables should be the dimension of the fifth variable, so called qc. But it seems not. The dimension is just a series number index from 1 to 45, 32 ...whatever else.
I would like to calculate some variable which is the function of pressure. in the CDO the script is like this
cdo expr,"pottemp=temp*((100000/clev(temp))^0.287);" ifile pottemp.nc
(this code is from here)
but I just obtain a series of index like 1 ,2 ,3 ... not the real pressure level when I use this function, clev(qv).
So how can I rewrite the variable dimension like qc from
qc(Time, bottom_top, south_north, west_east) ;
to
qc(Time, zc, yc, xc) ;
I think it's possible for me to finish this in matlab, but I just don't want to open this dataset because it's size is too large... so I am trying to find some tools like ncks or cdo, but still failed to do this.
thanks a lot.
With NCO try
ncap2 -s 'pottemp=temp*((100000/zc)^0.287)' in.nc out.nc
ncap2 documentation
extending answer to cover additional question below:
first use ncap2 to explicitly set zc to the values in the ascii file:
ncap2 -s 'zc[$zc]={1.5,900,2500,3500}' in.nc out.nc
(assuming zc is size 4 dimension). then define pottemp based on that zc.

Compute the Frequency of bigrams in Matlab

I am trying to compute and plot the distribution of bigrams frequencies
First I did generate all possible bigrams which gives 1296 bigrams
then i extract the bigrams from a given file and save them in words1
my question is how to compute the frequency of these 1296 bigrams for the file a.txt?
if there are some bigrams did not appear at all in the file, then their frequencies should be zero
a.txt is any text file
clear
clc
%************create bigrams 1296 ***************************************
chars ='1234567890abcdefghijklmonpqrstuvwxyz';
chars1 ='1234567890abcdefghijklmonpqrstuvwxyz';
bigram='';
for i=1:36
for j=1:36
bigram = sprintf('%s%s%s',bigram,chars(i),chars1(j));
end
end
temp1 = regexp(bigram, sprintf('\\w{1,%d}', 1), 'match');
temp2 = cellfun(#(x,y) [x '' y],temp1(1:end-1)', temp1(2:end)','un',0);
bigrams = temp2;
bigrams = unique(bigrams);
bigrams = rot90(bigrams);
bigram = char(bigrams(1:end));
all_bigrams_len = length(bigrams);
clear temp temp1 temp2 i j chars1 chars;
%****** 1. Cleaning Data ******************************
collection = fileread('e:\a.txt');
collection = regexprep(collection,'<.*?>','');
collection = lower(collection);
collection = regexprep(collection,'\W','');
collection = strtrim(regexprep(collection,'\s*',''));
%*******************************************************
temp = regexp(collection, sprintf('\\w{1,%d}', 1), 'match');
temp2 = cellfun(#(x,y) [x '' y],temp(1:end-1)', temp(2:end)','un',0);
words1 = rot90(temp2);
%*******************************************************
words1_len = length(words1);
vocab1 = unique(words1);
vocab_len1 = length(vocab1);
[vocab1,void1,index1] = unique(words1);
frequencies1 = hist(index1,vocab_len1);
I. Character counting problem for a string
bsxfun based solution for counting characters -
counts = sum(bsxfun(#eq,[string1-0]',65:90))
Output -
counts =
2 0 0 0 0 2 0 1 0 0 ....
If you would like to get a tabulate output of counts against each letter -
out = [cellstr(['A':'Z']') num2cell(counts)']
Output -
out =
'A' [2]
'B' [0]
'C' [0]
'D' [0]
'E' [0]
'F' [2]
'G' [0]
'H' [1]
'I' [0]
....
Please note that this was a case-sensitive counting for upper-case letters.
For a lower-case letter counting, use this edit to this earlier code -
counts = sum(bsxfun(#eq,[string1-0]',97:122))
For a case insensitive counting, use this -
counts = sum(bsxfun(#eq,[upper(string1)-0]',65:90))
II. Bigram counting case
Let us suppose that you have all the possible bigrams saved in a 1D cell array bigrams1 and the incoming bigrams from the file are saved into another cell array words1. Let us also assume certain values in them for demonstration -
bigrams1 = {
'ar';
'de';
'c3';
'd1';
'ry';
't1';
'p1'}
words1 = {
'de';
'c3';
'd1';
'r9';
'yy';
'de';
'ry';
'de';
'dd';
'd1'}
Now, you can get the counts of the bigrams from words1 that are present in bigrams1 with this code -
[~,~,ind] = unique(vertcat(bigrams1,words1));
bigrams_lb = ind(1:numel(bigrams1)); %// label bigrams1
words1_lb = ind(numel(bigrams1)+1:end); %// label words1
counts = sum(bsxfun(#eq,bigrams_lb,words1_lb'),2)
out = [bigrams1 num2cell(counts)]
The output on code run is -
out =
'ar' [0]
'de' [3]
'c3' [1]
'd1' [2]
'ry' [1]
't1' [0]
'p1' [0]
The result shows that - First element ar from the list of all possible bigrams has no find in words1 ; second element de has three occurrences in words1 and so on.
Hey similar to Dennis solution you can just use histc()
string1 = 'ASHRAFF'
histc(string1,'ABCDEFGHIJKLMNOPQRSTUVWXYZ')
this checks the number of entries in the bins defined by the string 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' which is hopefully the alphabet (just wrote it fast so no garantee). The result is:
Columns 1 through 21
2 0 0 0 0 2 0 1 0 0 0 0 0 0 0 0 0 1 1 0 0
Columns 22 through 26
0 0 0 0 0
Just a little modification of my solution:
string1 = 'ASHRAFF'
alphabet1='A':'Z'; %%// as stated by Oleg Komarov
data=histc(string1,alphabet1);
results=cell(2,26);
for k=1:26
results{1,k}= alphabet1(k);
results{2,k}= data(k);
end
If you look at results now you can easily check rather it works or not :D
This answer creates all bigrams, loads in the file does a little cleanup, ans then uses a combination of unique and histc to count the rows
Generate all Bigrams
note the order here is important as unique will sort the array so this way it is created presorted so the output matches expectation;
[y,x] = ndgrid(['0':'9','a':'z']);
allBigrams = [x(:),y(:)];
Read The File
this removes capitalisation and just pulls out any 0-9 or a-z character then creates a column vector of these
fileText = lower(fileread('d:\loremipsum.txt'));
cleanText = regexp(fileText,'([a-z0-9])','tokens');
cleanText = cell2mat(vertcat(cleanText{:}));
create bigrams from file by shifting by one and concatenating
fileBigrams = [cleanText(1:end-1),cleanText(2:end)];
Get Counts
the set of all bigrams is added to our set (so the values are created for all possible). Then a value ∈{1,2,...,1296} is assigned to each unique row using unique's 3rd output. Counts are then created with histc with the bins equal to the set of values from unique's output, 1 is subtracted from each bin to remove the complete set bigrams we added
[~,~,c] = unique([fileBigrams;allBigrams],'rows');
counts = histc(c,1:1296)-1;
Display
to view counts against text
[allBigrams, counts+'0']
or for something potentially more useful...
[sortedCounts,sortInd] = sort(counts,'descend');
[allBigrams(sortInd,:), sortedCounts+'0']
ans =
or9
at8
re8
in7
ol7
te7
do6 ...
Did not look into the entire code fragment, but from the example at the top of your question, I think you are looking to make a histogram:
string1 = 'ASHRAFF'
nr = histc(string1,'A':'Z')
Will give you:
2 0 0 0 0 2 0 1 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0
(Got a working solution with hist, but as #The Minion shows histc is more easy to use here.)
Note that this solution only deals with upper case letters.
You may want to do something like so if you want to put lower case letters in their correct bin:
string1 = 'ASHRAFF'
nr = histc(upper(string1),'A':'Z')
Or if you want them to be shown separately:
string1 = 'ASHRaFf'
nr = histc(upper(string1),['a':'z' 'A':'Z'])
bi_freq1 = zeros(1,all_bigrams_len);
for k=1: vocab_len1
for i=1:all_bigrams_len
if char(vocab1(k)) == char(bigrams(i))
bi_freq1(i) = frequencies1(k);
end
end
end