I want to calculate the accuracy of my k-means clustering without normalization and k-means clustering with normalization and want to compare the results.
My dataset looks like this:
age chol
63 233
37 250
41 204
56 236
57 354
57 192
56 294
44 263
52 199
57 168
54 239
48 275
49 266
64 211
58 283
50 219
58 340
66 226
43 247
69 239
59 234
44 233
42 226
61 243
40 199
71 302
59 212
51 175
65 417
53 197
41 198
65 177
44 219
54 273
51 213
46 177
54 304
54 232
65 269
65 360
51 308
48 245
45 208
53 264
39 321
52 325
44 235
47 257
53 216
53 234
How can I write the code for it in Matlab and plot it?
I'm new to Keras LSTM and I building a script for prediction based on time series.
I have the data set below which describe users' power consumption in a year
User mai/13 jun/13 jul/13 ago/13 set/13 out/13 nov/13 dez/13 jan/14 fev/14 mar/14 abr/14
x1 88 99 108 90 86 79 83 75 80 69 75 61
x2 0 143 163 127 174 199 177 140 100 100 130 119
x3 51 50 41 42 51 48 58 53 53 37 47 41
x4 181 211 166 105 128 123 125 98 114 58 4 0
x5 173 187 180 195 211 212 231 188 193 168 166 0
x6 343 321 293 272 400 397 467 392 409 306 408 353
x7 215 150 161 148 153 130 165 166 198 166 150 148
x8 100 140 132 119 121 125 148 135 144 123 124 16
x9 197 193 181 161 201 161 227 154 176 148 171 172
x10 356 371 347 347 363 377 423 373 395 338 337 302
For each user, my script split the data in windows with size n, i.e., if n=6, user x1 will have the following dataset
88 99 108 90 86 79
99 108 90 86 79 83
108 90 86 79 83 75
90 86 79 83 75 80
86 79 83 75 80 69
79 83 75 80 69 75
83 75 80 69 75 61
I want my net to learn from the patterns of each window, so when I'm going to test it, based on a new window, it predicts a value considering the previous values.
Considering these variables:
dataset_length => number of months
dataset_dim => number of users
sequence_length => window size
The net I'm thinking about is:
1st layer: LSTM with sequence_length units, input_shape(None, 1) and return_sequences set true
2nd layer: LSTM with sequence_length*2 units and return_sequences set false
3rd layer: Dense with 1 unit
Is there anything wrong on my net layout? I'm testing the net with a little bit more data and not getting a very good result (high loss). What can I change for a better prediction? Do I need more layers?
Please let me know if you need some piece of code or the test results.
I have sample code emulating my actual code. Where I have cell arrays outside the parfor loop. I have to perform computations on strings and numerical outputs will be stored in arrays which I can write to a csv file after each parfor loop. So I made dummy code. But I couldn't get it to execute. The error message is: "subscription mismatch at line 6".
ftemp=fopen('temp.csv','w');
march=cell(1,20);tc=0;
march={'ab' 'cd' 'ef' 'gh' 'ij' 'kl' 'mn' 'op' 'qr' 'st' 'uv' 'AB' 'CD' 'EF' 'GH' 'IJ' 'KL' 'MN' 'OP' 'QR'};
for i=1:10
matlabpool open 4;
parfor j=1:1:20
a(j,1)=randi(200,1,1);
b(j,2)=j+tc;
c(j,3)=march{1,j};
d(j,4)=(randi(200,1,1)/200);
end
fprintf(ftemp,'%d\t%d\t%s\t%f',a,b,c,d);
matlabpool close
clear a b c d;
tc=tc+20;
end
fclose(ftemp);
quit
The cause of the Error is that you are trying to assign a cell into an array in line 9.
I made some changes into your code, they are described in comments.
ftemp=fopen('temp.csv','w');
march=cell(1,20);tc=0;
march={'ab' 'cd' 'ef' 'gh' 'ij' 'kl' 'mn' 'op' 'qr' 'st' 'uv' 'AB' 'CD' 'EF' 'GH' 'IJ' 'KL' 'MN' 'OP' 'QR'};
for i=1:10
parpool local; % here in my version of Matlab 2015 there are no more "matlabpool" if it doesen't work change it back
parfor j=1:20
%i changed variable a(j,1) into a(j) and b(j,2) into b(j): may
%contain empty arrays , idem: for c and d
a(j)=randi(200,1,1);
b(j)=j+tc;
c{j}=march{1,j}; % changed c(j)= march{1,j}; : cause of error
d(j)=randi(200,1,1)/200;
end
fprintf(ftemp,'%d\t%d\t%c\t%d',a,b,char(c),d); % char(c) in order to convert cell array to array of strings
delete(gcp) % there isn't such thing "matlabpool close" , the right expression is "delete(gcp)"
clear a b c d;
tc=tc+20;
end
fclose(ftemp);
%quit : i remove because it closes matlab , please put it back if u really
%want to close matlab after operation
Note: if parpool doesent work for your version of matlab , please change it back to the expression you were using matlabpool open 4;
Well for the unexpected outputs and number of rows in the output , it's because the wrong use of fprintf . We need to print element by element , which means the fprintf has to be inside the parfor loop, so your code should look like this:
ftemp=fopen('temp.csv','w');
march=cell(1,20);tc=0;
march={'ab';'cd';'ef';'gh';'ij';'klm';'mn';'op';'qr';'st';'uv';'ABls';'CD';'E3F';'GH';'IJ';'dynaKL';'MN';'OP';'QR'};
matlabpool open 4;
for k=1:10
%parpool local; % here in my version of Matlab 2015 there are no more "matlabpool" if it doesen't work change it back
parfor j=1:20
%i changed variable a(j,1) into a(j) and b(j,2) into b(j): may
%contain empty arrays , idem: for c and d
a(j)=randi(200,1,1);
b(j)=j+tc;% indexing purposes to identify the order when parallel processing going on
if length(march{j})>2 % this kind of conditions and computations are there in my actual code
c{j}='skip';
else
c{j}=march{j};
end
%d(j)=randi(200,1,1)/200;
fprintf(ftemp,'%d\t%d\t%s\t\n',a(j),b(j),c{j}); % char(c) in order to convert cell array to array of strings
end
clear a b c;
tc=tc+20;
end
fclose(ftemp);
matlabpool close;
the output is:
9 1 ab
171 2 cd
7 3 ef
98 4 gh
102 5 ij
20 6 skip
53 7 mn
174 8 op
36 9 qr
30 10 st
130 11 uv
127 12 skip
133 13 CD
65 14 skip
118 15 GH
130 16 IJ
139 17 skip
168 18 MN
57 19 OP
25 20 QR
22 21 ab
39 22 cd
83 23 ef
22 24 gh
159 25 ij
40 26 skip
164 27 mn
78 28 op
194 29 qr
88 30 st
125 31 uv
1 32 skip
2 33 CD
112 34 skip
161 35 GH
170 36 IJ
55 37 skip
59 38 MN
18 39 OP
134 40 QR
80 41 ab
118 42 cd
108 43 ef
174 44 gh
97 45 ij
157 46 skip
85 47 mn
98 48 op
40 49 qr
11 50 st
171 51 uv
139 52 skip
90 53 CD
70 54 skip
173 55 GH
150 56 IJ
186 57 skip
155 58 MN
136 59 OP
96 60 QR
158 61 ab
118 62 cd
124 63 ef
127 64 gh
26 65 ij
124 66 skip
91 67 mn
186 68 op
63 69 qr
137 70 st
170 71 uv
98 72 skip
132 73 CD
80 74 skip
160 75 GH
20 76 IJ
156 77 skip
142 78 MN
110 79 OP
51 80 QR
18 81 ab
29 82 cd
40 83 ef
49 84 gh
102 85 ij
113 86 skip
96 87 mn
44 88 op
166 89 qr
90 90 st
21 91 uv
60 92 skip
44 93 CD
166 94 skip
103 95 GH
123 96 IJ
77 97 skip
163 98 MN
138 99 OP
111 100 QR
94 101 ab
133 102 cd
158 103 ef
13 104 gh
26 105 ij
117 106 skip
90 107 mn
58 108 op
156 109 qr
79 110 st
196 111 uv
168 112 skip
192 113 CD
160 114 skip
56 115 GH
129 116 IJ
191 117 skip
157 118 MN
170 119 OP
30 120 QR
137 121 ab
128 122 cd
12 123 ef
38 124 gh
38 125 ij
122 126 skip
97 127 mn
129 128 op
142 129 qr
154 130 st
99 131 uv
85 132 skip
129 133 CD
111 134 skip
108 135 GH
78 136 IJ
102 137 skip
48 138 MN
100 139 OP
89 140 QR
114 141 ab
12 142 cd
184 143 ef
145 144 gh
5 145 ij
48 146 skip
92 147 mn
64 148 op
87 149 qr
32 150 st
136 151 uv
103 152 skip
90 153 CD
73 154 skip
28 155 GH
191 156 IJ
63 157 skip
120 158 MN
104 159 OP
178 160 QR
148 161 ab
36 162 cd
129 163 ef
11 164 gh
172 165 ij
186 166 skip
145 167 mn
142 168 op
150 169 qr
185 170 st
14 171 uv
141 172 skip
22 173 CD
163 174 skip
48 175 GH
164 176 IJ
117 177 skip
25 178 MN
110 179 OP
111 180 QR
175 181 ab
60 182 cd
195 183 ef
44 184 gh
163 185 ij
4 186 skip
103 187 mn
95 188 op
127 189 qr
10 190 st
11 191 uv
182 192 skip
162 193 CD
179 194 skip
76 195 GH
104 196 IJ
153 197 skip
103 198 MN
4 199 OP
154 200 QR
Also the output has 200 rows.
ftemp=fopen('temp.csv','w');
march=cell(1,20);tc=0;
march={'ab';'cd';'ef';'gh';'ij';'klm';'mn';'op';'qr';'st';'uv';'ABls';'CD';'E3F';'GH';'IJ';'dynaKL';'MN';'OP';'QR'};
matlabpool open 4;
for k=1:10
%parpool local; % here in my version of Matlab 2015 there are no more "matlabpool" if it doesen't work change it back
parfor j=1:20
%i changed variable a(j,1) into a(j) and b(j,2) into b(j): may
%contain empty arrays , idem: for c and d
a(j)=randi(200,1,1);
b(j)=j+tc;% indexing purposes to identify the order when parallel processing going on
if length(march{j})>2 % this kind of conditions and computations are there in my actual code
c{j}='skip';
else
c{j}=march{j};
end
%d(j)=randi(200,1,1)/200;
end
fprintf(ftemp,'%d\t%d\t%s\n',a,b,c{:}); % char(c) in order to convert cell array to array of strings
clear a b c;
tc=tc+20;
end
fclose(ftemp);
matlabpool close;
but the output of the program i coudlnt get it
130 127 Abf5ȱ3³]À#¾
1 2
97 98 cd
101 102 gh
105 106 skip
109 110 op
113 114 st
117 118 skip
67 68 skip
71 72 IJ
115 107 ip
77 78 OP
81 82 168 57 ®$ «E�,vp|}¤
21 22 !"#$%&'(
97 98 cd
101 102 gh
105 106 skip
109 110 op
113 114 st
117 118 skip
67 68 skip
71 72 IJ
115 107 ip
77 78 OP
81 82 88 125 S([g 0'0Â"
41 42 +,-./0123456789:;<
97 98 cd
101 102 gh
105 106 skip
109 110 op
113 114 st
117 118 skip
67 68 skip
71 72 IJ
115 107 ip
77 78 OP
81 82 161 170 7Na�U;6´q¶nl®v¤
61 62 ?#ABCDEFGHIJKLMNOP
97 98 cd
101 102 gh
105 106 skip
109 110 op
113 114 st
117 118 skip
67 68 skip
71 72 IJ
115 107 ip
77 78 OP
81 82 171 139 ZFºPµ~R(`b[
81 82 STUVWXYZ[\]^_`abcd
97 98 cd
101 102 gh
105 106 skip
109 110 op
113 114 st
117 118 skip
67 68 skip
71 72 IJ
115 107 ip
77 78 OP
81 82 132 80 ||£)Á1[ªavb-
101 102 ghijklmnopqrstuvwx
97 98 cd
101 102 gh
105 106 skip
109 110 op
113 114 st
117 118 skip
67 68 skip
71 72 IJ
115 107 ip
77 78 OP
81 82 186 63 ª`neepw»fq°r1
121 122 {|}~�
97 98 cd
101 102 gh
105 106 skip
109 110 op
113 114 st
117 118 skip
67 68 skip
71 72 IJ
115 107 ip
77 78 OP
81 82 51 160 M(1´uOAF¦{¬©
141 142 ���
97 98 cd
101 102 gh
105 106 skip
109 110 op
113 114 st
117 118 skip
67 68 skip
71 72 IJ
115 107 ip
77 78 OP
81 82 90 21 <,uZgh�lu¨B,¦
0
161 162 £¤¥¦§¨©ª«¬®¯°±²³´
97 98 cd
101 102 gh
105 106 skip
109 110 op
113 114 st
117 118 skip
67 68 skip
71 72 IJ
115 107 ip
77 78 OP
81 82 192 160 o£¿^hyµÄ¨�Ä�O
181 182 ·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈ
97 98 cd
101 102 gh
105 106 skip
109 110 op
113 114 st
117 118 skip
67 68 skip
71 72 IJ
115 107 ip
77 78 OP
81 82
i mean to say why only 124 iterations took place instead of 200. and why these arbitrary outputs are there in the 3rd column
I have data set like:
1 14.8759
2 14.083
3 0.735268
4 18.2378
5 17.3748
6 4.07867
7 18.2032
8 15.6929
9 4.03338
10 19.0308
11 17.4139
12 17.4139
13 19.8453
14 4.91288
15 20.6746
16 16.578
17 14.8548
18 23.9831
19 19.0691
20 19.0777
21 3.24368
22 25.6457
23 -5.95598
24 32.3198
25 8.20419
26 22.3266
27 17.4016
28 9.0672
29 24.8722
30 24.8262
31 19.8966
32 34.7338
33 29.8088
34 33.1393
35 28.1402
36 35.6231
37 26.4872
38 3.2392
39 5.73463
40 26.4754
41 33.9667
42 27.3048
43 34.75
44 37.2759
45 15.6929
46 28.9686
47 44.6922
48 37.2799
49 25.699
50 45.4923
51 32.2579
52 25.699
53 29.7885
54 50.4719
55 20.6746
56 30.6061
57 38.0448
58 11.5342
59 52.9365
60 44.7128
61 38.0448
62 44.6621
63 13.1939
64 28.9542
65 46.3637
66 13.1939
67 10.7318
68 31.4318
69 29.7885
70 22.3399
71 29.7885
72 26.4754
73 55.4135
74 48.8326
75 42.2395
76 19.0174
77 7.4035
78 13.1939
79 33.9055
80 14.8935
81 27.3048
82 6.56548
83 64.4474
84 48.7848
85 59.5214
86 31.4915
87 59.5214
88 19.8966
89 57.0318
90 21.5631
91 20.7273
92 66.0889
93 58.6749
94 20.6803
95 52.1244
96 16.5242
97 51.3028
98 10.7037
99 12.3958
100 26.5265
101 30.6061
102 74.2826
103 50.4806
104 12.3958
105 17.354
106 40.5832
107 19.8514
108 63.6089
109 27.3559
110 9.06318
111 11.564
112 39.7561
113 29.8368
114 17.3615
115 19.0241
116 69.3539
117 35.6231
118 38.8777
119 34.7394
120 60.3455
121 25.6969
122 54.5637
123 25.6969
124 79.2023
125 31.4876
126 28.184
127 13.2268
128 34.7394
129 12.3602
130 29.0096
131 47.9604
132 82.4815
133 77.5533
134 14.8935
135 33.9055
136 16.5172
137 41.4113
138 34.7956
139 64.4558
140 29.8368
141 19.0108
142 26.5265
143 36.4452
144 50.4761
145 4.87781
146 83.3041
147 61.9694
148 26.5265
149 1.5427
150 71.8344
151 24.8158
152 94.7328
153 19.8915
154 36.4452
155 32.2504
156 26.5265
157 89.0202
158 29.8347
159 93.9223
160 87.3855
161 4.89738
162 88.1694
163 24.0448
164 51.2987
165 65.2679
166 89.8386
167 33.9055
168 67.7414
169 88.9942
170 19.0174
171 92.2651
172 49.6527
173 18.1971
174 19.0108
175 33.9667
176 92.2611
177 32.2789
178 92.2577
179 4.89738
180 102.898
181 34.7956
182 95.5292
183 28.9542
184 91.451
185 25.6457
186 74.2944
187 25.6516
188 47.1323
189 34.7338
190 94.7081
191 97.9775
192 105.334
193 89.812
194 93.8991
195 88.1756
196 10.7318
197 49.611
198 97.1618
199 2.40369
200 44.7128
201 35.6263
202 42.1795
203 53.7678
204 70.2067
205 28.9542
206 19.0241
207 111.849
208 19.8915
209 95.5218
210 38.8723
211 101.238
212 19.8393
213 92.2651
214 102.053
215 24.8221
216 116.713
217 88.9912
218 88.1756
219 115.102
220 58.6995
221 19.8393
222 27.3171
223 23.1511
224 53.7678
225 99.6138
226 120.79
227 32.2579
228 90.6265
229 38.0448
230 48.8284
231 111.054
232 112.608
233 66.9162
234 100.431
235 63.6317
236 19.8334
237 35.6263
238 17.3615
239 2.39774
240 29.7885
241 71.0225
242 66.9162
243 25.6457
244 128.908
245 12.3602
246 93.8991
247 123.218
248 24.8221
249 33.1393
250 110.194
251 31.4547
252 12.3958
253 92.2611
254 10.7037
255 90.6302
256 96.3458
257 102.053
258 37.2167
259 93.0788
260 19.0108
261 102.063
262 16.5617
263 49.611
264 135.388
265 117.522
266 92.2879
267 118.378
268 116.706
269 24.0448
270 128.941
271 132.182
272 137.009
273 48.7848
274 32.2789
275 137.826
276 137.009
277 117.522
278 54.5904
279 16.5172
280 141.064
281 63.6317
282 27.3559
283 108.587
284 38.8723
285 140.247
286 106.13
287 135.426
288 67.7371
289 19.8915
290 112.652
291 27.3227
292 117.522
and want to ignore/delete any Y value which is smaller than its previous value (and delete its corresponding X too) and put the new data set into a new file so that all resulted Y values would be in increasing order.
Thanks.
Assuming:
data = [1 14.8759
2 14.083
3 0.735268
... ... ];
You could do that:
keep = false(size(data, 1), 1);
largest = -Inf;
for i = 1:size(data, 1)
if data(i,2) > largest
largest = data(i,2);
keep(i) = true;
end
end
newdata = data(keep,:)
Result:
newdata =
1.0000 14.8759
4.0000 18.2378
10.0000 19.0308
13.0000 19.8453
15.0000 20.6746
18.0000 23.9831
22.0000 25.6457
24.0000 32.3198
32.0000 34.7338
36.0000 35.6231
44.0000 37.2759
47.0000 44.6922
50.0000 45.4923
54.0000 50.4719
59.0000 52.9365
73.0000 55.4135
83.0000 64.4474
92.0000 66.0889
102.0000 74.2826
124.0000 79.2023
132.0000 82.4815
146.0000 83.3041
152.0000 94.7328
180.0000 102.8980
192.0000 105.3340
207.0000 111.8490
216.0000 116.7130
226.0000 120.7900
244.0000 128.9080
264.0000 135.3880
272.0000 137.0090
275.0000 137.8260
280.0000 141.0640
If you've lot of data then it will be better to use vectorization. Removing for loops will make it faster.
Let's say 'A' is your second column (data).
A = 5 4 8 8 2 5 5 7 8 8;
Since your first column is just index we can leave it for now (Even if it's not you can copy second column to 'A' and proceed).
B = A - [-inf A(1:end-1)];
Aout = [find(B>=0);A(B>=0)];
If your first column is not just index copy it to say 'C' and change the last line to the following.
Aout = [C(B>=0);A(B>=0)];
Use bsxfun to compare each element with all the others, and from that generate a logical index that selects the desired rows:
result = data(~any(triu(bsxfun(#lt, data(:,2).', data(:,2)))), :);
I want to display a large matrix, but I don't like the words "Columns x to y" to show. How can I do this?
You can use the function NUM2STR to format a large 2-D matrix A into a character array and display that. For example:
>> A = magic(15); %# This would likely break up columns when displayed
>> num2str(A) %# This won't
ans =
122 139 156 173 190 207 224 1 18 35 52 69 86 103 120
138 155 172 189 206 223 15 17 34 51 68 85 102 119 121
154 171 188 205 222 14 16 33 50 67 84 101 118 135 137
170 187 204 221 13 30 32 49 66 83 100 117 134 136 153
186 203 220 12 29 31 48 65 82 99 116 133 150 152 169
202 219 11 28 45 47 64 81 98 115 132 149 151 168 185
218 10 27 44 46 63 80 97 114 131 148 165 167 184 201
9 26 43 60 62 79 96 113 130 147 164 166 183 200 217
25 42 59 61 78 95 112 129 146 163 180 182 199 216 8
41 58 75 77 94 111 128 145 162 179 181 198 215 7 24
57 74 76 93 110 127 144 161 178 195 197 214 6 23 40
73 90 92 109 126 143 160 177 194 196 213 5 22 39 56
89 91 108 125 142 159 176 193 210 212 4 21 38 55 72
105 107 124 141 158 175 192 209 211 3 20 37 54 71 88
106 123 140 157 174 191 208 225 2 19 36 53 70 87 104