Fill unknown data with the mode in MATLAB

Suppose we have the following MATLAB code, which reads a CSV file, converts the text columns to categorical and prints the first rows of the data:
data = readtable("fruit_data.csv");
data.fruit_name = categorical(data.fruit_name);
data.fruit_subtype = categorical(data.fruit_subtype);
head(data)
8×7 table
fruit_label    fruit_name    fruit_subtype    mass    width    height    color_score
___________    __________    _____________    ____    _____    ______    ___________
1              apple         granny_smith     192     8.4      7.3       0.55
1              apple         granny_smith     180     8        6.8       0.59
1              apple         granny_smith     176     7.4      7.2       0.6
2              mandarin      mandarin         86      6.2      4.7       0.8
2              mandarin      mandarin         84      6        4.6       0.79
2              mandarin      mandarin         80      5.8      4.3       0.77
2              mandarin      mandarin         80      5.9      4.3       0.81
2              mandarin      mandarin         76      5.8      4         0.81
tail(data)
which returns:
8×7 table
fruit_label    fruit_name    fruit_subtype    mass    width    height    color_score
___________    __________    _____________    ____    _____    ______    ___________
4              lemon         unknown          116     6        7.5       0.72
4              lemon         unknown          118     5.9      8         0.72
4              lemon         unknown          120     6        8.4       0.74
4              lemon         unknown          116     6.1      8.5       0.71
4              lemon         unknown          116     6.3      7.7       0.72
4              lemon         unknown          116     5.9      8.1       0.73
4              lemon         unknown          152     6.5      8.5       0.72
4              lemon         unknown          118     6.1      8.1       0.7
I want to fill the unknown values with the most frequent element (the so-called mode), as in pandas. I can select only the rows that contain unknown:
data(data.fruit_subtype=='unknown',:)
10×7 table
fruit_label    fruit_name    fruit_subtype    mass    width    height    color_score
___________    __________    _____________    ____    _____    ______    ___________
4              lemon         unknown          132     5.8      8.7       0.73
4              lemon         unknown          130     6        8.2       0.71
4              lemon         unknown          116     6        7.5       0.72
4              lemon         unknown          118     5.9      8         0.72
4              lemon         unknown          120     6        8.4       0.74
4              lemon         unknown          116     6.1      8.5       0.71
4              lemon         unknown          116     6.3      7.7       0.72
4              lemon         unknown          116     5.9      8.1       0.73
4              lemon         unknown          152     6.5      8.5       0.72
4              lemon         unknown          118     6.1      8.1       0.7
But when I write the following code:
data(data.fruit_subtype=='unknown',:) =mode(data.fruit_subtype)
the result is:
Right hand side of an assignment into a table must be another table or a cell array.
I have also tried the following:
data(data.fruit_subtype=='unknown',:) =cell(mode(data.fruit_subtype))
but I got:
Error using cell
Conversion to cell from categorical is not possible.
How can I fix this?
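For comparison with the pandas-style operation described above, here is a minimal Python sketch (standard library only) of the intended result: compute the most frequent subtype among the known values and write it into the subtype field only, for the rows marked "unknown". The row list is a hypothetical excerpt of the table, and excluding "unknown" itself from the mode is an assumption (otherwise, if "unknown" were the most common value, the mode would just be "unknown" again). In MATLAB terms the assignment would presumably need to target that single variable, e.g. data.fruit_subtype(idx) = ..., because data(idx,:) = ... addresses whole table rows, which is why the first error asks for a table or cell array on the right-hand side.

```python
from collections import Counter

# Hypothetical excerpt of the fruit table (only the two relevant columns).
rows = [
    {"fruit_name": "apple", "fruit_subtype": "granny_smith"},
    {"fruit_name": "apple", "fruit_subtype": "granny_smith"},
    {"fruit_name": "mandarin", "fruit_subtype": "mandarin"},
    {"fruit_name": "lemon", "fruit_subtype": "unknown"},
    {"fruit_name": "lemon", "fruit_subtype": "unknown"},
    {"fruit_name": "lemon", "fruit_subtype": "unknown"},
]

PLACEHOLDER = "unknown"


def fill_with_mode(records, column, placeholder=PLACEHOLDER):
    """Replace `placeholder` in `column` with the most frequent other value."""
    known = [r[column] for r in records if r[column] != placeholder]
    if not known:
        return [dict(r) for r in records]  # nothing to compute a mode from
    # most_common(1) returns [(value, count)]; ties go to the value seen first.
    mode_value = Counter(known).most_common(1)[0][0]
    # Change only the one column; every other field of the row is kept as is.
    return [
        {**r, column: mode_value} if r[column] == placeholder else dict(r)
        for r in records
    ]


filled = fill_with_mode(rows, "fruit_subtype")
for r in filled:
    print(r["fruit_name"], r["fruit_subtype"])
```

Note the design consequence of a single global mode: the lemons inherit an apple subtype. Computing the mode per fruit_name group would be an alternative, if that is closer to what is wanted.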

Related

Pivot table with multiple keyed columns

I have the following table:
t:(([]y:2001 2002) cross ([]m:5 6 7) cross ([]sector:`running`hiking`swimming`cycling)),'([]sales: 14 12 5 9 4 894 1 4 87 12 24 6 4 8 64 354 3 4 86 43 1053 2 43 4);
y    m sector   sales
---------------------
2001 5 running  14
2001 5 hiking   12
2001 5 swimming 5
2001 5 cycling  9
2001 6 running  4
2001 6 hiking   894
2001 6 swimming 1
2001 6 cycling  4
...
2002 5 running  4
2002 5 hiking   8
2002 5 swimming 64
2002 5 cycling  354
2002 6 running  3
...
I want to pivot the sales values by sector while keeping y and m as the first two columns, so that the resulting table looks like this:
y    m cycling hiking running swimming
--------------------------------------
2001 5 9       12     14      5
2001 6 4       894    4       1
2001 7 6       12     87      24
2002 5 354     8      4       64
2002 6 43      4      3       86
2002 7 4       2      1053    43
As per https://code.kx.com/v2/kb/pivoting-tables/ :
q) P:asc exec distinct sector from t;
q) exec P#(sector!sales) by y:y,m:m from t
You can unkey the result with () xkey if you need a normal (unkeyed) table.
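The q answer works in two steps: collect the sorted distinct sectors P, then, for each (y, m) group, build a sector-to-sales dictionary and take its values in the order of P. A rough Python analogue of the same idea (plain Python, not q, and not how kdb+ implements it) rebuilds the example table from the question and pivots it:

```python
from itertools import product

# Rebuild the example table t: y x m x sector, in the same order as the q cross.
years, months = [2001, 2002], [5, 6, 7]
sectors = ["running", "hiking", "swimming", "cycling"]
sales = [14, 12, 5, 9, 4, 894, 1, 4, 87, 12, 24, 6,
         4, 8, 64, 354, 3, 4, 86, 43, 1053, 2, 43, 4]
t = [
    {"y": y, "m": m, "sector": s, "sales": v}
    for (y, m, s), v in zip(product(years, months, sectors), sales)
]

# Step 1 (like P:asc exec distinct sector from t): sorted distinct sectors.
P = sorted({row["sector"] for row in t})

# Step 2 (like exec P#(sector!sales) by y:y,m:m from t):
# group rows by (y, m) and build a sector -> sales dictionary per group.
groups = {}
for row in t:
    groups.setdefault((row["y"], row["m"]), {})[row["sector"]] = row["sales"]

# Read each dictionary in the order of P; a sector missing from a group becomes None.
pivot = [(y, m, *[d.get(s) for s in P]) for (y, m), d in sorted(groups.items())]

print(("y", "m", *P))
for r in pivot:
    print(r)
```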

How do I take two separate tibbles (e.g. [[1]] and [[2]]) in data and merge them?

I have a multi-step problem:
1) Rename the "Total" column of [[1]] to Losing_Runs.
2) Rename the "Total" column of [[2]] to Winning_Runs.
3) Merge the "Loser" column of tibble [[1]] with the "Winner" column of tibble [[2]] into a new column named "Team".
4) The resulting tibble should be 30x3, with the columns "Team", "Winning_Runs" and "Losing_Runs".
The R output is below:
[[1]]
# A tibble: 30 x 2
   Loser              Total
   <chr>              <dbl>
 1 Baltimore Orioles    288
 2 Kansas City Royals   278
 3 Chicago White Sox    252
 4 Minnesota Twins      251
 5 Texas Rangers        236
 6 Detroit Tigers       233
 7 Miami Marlins        228
 8 Cincinnati Reds      224
 9 Pittsburgh Pirates   217
10 San Diego Padres     212
# ... with 20 more rows
[[2]]
# A tibble: 30 x 2
   Winner               Total
   <chr>                <dbl>
 1 Boston Red Sox         694
 2 Houston Astros         627
 3 New York Yankees       579
 4 Cleveland Indians      577
 5 Chicago Cubs           572
 6 Oakland Athletics      571
 7 Los Angeles Dodgers    568
 8 Atlanta Braves         543
 9 Washington Nationals   540
10 Milwaukee Brewers      528
# ... with 20 more rows
Thank you very much for any and all help!
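The four steps amount to renaming each Total column and then joining the two tables on the team name (in dplyr this would typically be rename() on each tibble followed by a join by Team). Below is a minimal sketch of that join logic in plain Python, not R. The three teams and their totals are hypothetical placeholders, since only the first 10 of the 30 rows are shown above; an outer join is used so that no team is dropped even if it appears on only one side.

```python
# Hypothetical stand-ins for data[[1]] (Loser, Total) and data[[2]] (Winner, Total).
losers = [("Team A", 288), ("Team B", 278), ("Team C", 252)]
winners = [("Team B", 694), ("Team C", 627), ("Team A", 579)]

# Steps 1-2: "rename" Total by storing each table as team -> runs under its new meaning.
losing_runs = {team: total for team, total in losers}
winning_runs = {team: total for team, total in winners}

# Step 3: merge on the shared team name (outer join: union of both team sets).
teams = sorted(losing_runs.keys() | winning_runs.keys())

# Step 4: one row per team with the columns Team, Winning_Runs, Losing_Runs.
merged = [
    {"Team": t, "Winning_Runs": winning_runs.get(t), "Losing_Runs": losing_runs.get(t)}
    for t in teams
]

for row in merged:
    print(row)
```

With the full 30-row tables, where every team appears once as Loser and once as Winner, this yields the 30x3 result described in step 4.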

Assign new matrices for a certain condition in MATLAB

I have an 8x8 matrix, DataFile. One of its columns (column 6, the "coarse event") can only be 0 or 1: it is 0 for a non-stable condition and 1 for a stable condition. For example:
DataFile = [ 11 5 66 1.2 14.1 0 -1 0.1;...
12 6 67 1.4 15.1 0 -1 0.1;...
13 7 68 1.6 16.1 1 -1 0.2;...
14 8 69 1.7 16.5 1 -2 0.1;...
15 9 68 1.6 16.2 0 -1 0.3;...
16 8 66 1.3 15.7 1 -2 0.0;...
17 5 65 1.5 16.1 1 0 0.0;...
18 6 66 1.2 16.6 0 1 1.0];
With slight changes from the code in the comments:
DataFile =[zeros(1,size(DataFile,2)); DataFile; zeros(1,size(DataFile,2))];
startInd = [find(diff(DataFile(:,6))==1)];
endInd = [find(diff(DataFile(:,6)) <0)];
B={};
for n=1:1:numel(endInd)
B(n)={DataFile(startInd(n):endInd(n),:)};
end
FirstBlock=B{1};
SecondBlock=B{2};
The result is 2 matrices (FirstBlock = 3x8, SecondBlock = 3x8), which wrongly include 0's in the 6th column. It should give two 2x8 matrices (dataIs1(1) and dataIs1(2)) with only 1's in the 6th column.
In general, I would like to get n matrices, one for each run of consecutive rows where the "coarse event" is 1. Thank you for the help!
The magic word is logical indexing:
If you have a matrix A:
A=[1 2 3 4 5;...
0 6 7 8 9;...
1 7 8 9 10]
you can extract rows 1 and 3 (the rows whose first element is 1) with:
B = A(A(:,1)==1, :)
Hope that's what you're looking for, have fun.
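The same row-mask idea can be sketched in plain Python for readers more used to that: build a boolean mask from the first column and keep the whole rows where it is true, which is what A(mask, :) does in MATLAB. This is only an illustration with the example matrix A above, not MATLAB code.

```python
A = [
    [1, 2, 3, 4, 5],
    [0, 6, 7, 8, 9],
    [1, 7, 8, 9, 10],
]

# Logical mask over the rows: True where the first column equals 1.
mask = [row[0] == 1 for row in A]

# Keep whole rows where the mask is True (like A(mask, :) in MATLAB).
B = [row for row, keep in zip(A, mask) if keep]
print(B)
```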
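To separate the groups, we need to know where they start and end. diff(A(:,1)) is 1 at the row just before a block of 1's begins and negative at the last row of a block, so (for this A, which starts and ends with a 1):
endInd = [find(diff(A(:,1)) < 0); size(A,1)]
startInd = [1; find(diff(A(:,1)) == 1) + 1]
Then assign the data to a cell array: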
B={};
for n=1:1:numel(endInd)
B(n)={A(startInd(n):endInd(n),:)};
end
Edit:
For your new DataFile (as defined in the question), I first add a row of zeros above and below, so that every block of 1's has a clear start and end:
DataFile =[zeros(1,size(DataFile,2)); DataFile; zeros(1,size(DataFile,2))]
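Now, as before, we look for the starts and ends of the blocks, this time in column 6. A 1 in the diff marks the row just before a block starts, hence the + 1:
startInd = find(diff(DataFile(:,6)) == 1) + 1;
endInd = find(diff(DataFile(:,6)) < 0);
Then assign each block to a cell of a cell array:
B = {};
for n = 1:numel(endInd)
B{n} = DataFile(startInd(n):endInd(n),:);
end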
If you want to retrieve, say, the second block:
secondBlock=B{2};
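The start/end logic can also be checked outside MATLAB. Below is a minimal Python sketch with plain lists and zero-based indices (so column 6 is index 5). It pads the flag column with a zero at each end, takes the differences, and slices out each run of 1's. With the DataFile from the question it returns the two 2x8 blocks the question expects.

```python
# DataFile from the question; column 6 (index 5) is the "coarse event" flag.
data_file = [
    [11, 5, 66, 1.2, 14.1, 0, -1, 0.1],
    [12, 6, 67, 1.4, 15.1, 0, -1, 0.1],
    [13, 7, 68, 1.6, 16.1, 1, -1, 0.2],
    [14, 8, 69, 1.7, 16.5, 1, -2, 0.1],
    [15, 9, 68, 1.6, 16.2, 0, -1, 0.3],
    [16, 8, 66, 1.3, 15.7, 1, -2, 0.0],
    [17, 5, 65, 1.5, 16.1, 1, 0, 0.0],
    [18, 6, 66, 1.2, 16.6, 0, 1, 1.0],
]
FLAG = 5  # zero-based index of column 6


def blocks_where_flag_is_one(rows, col=FLAG):
    """Split rows into runs of consecutive rows whose flag column is 1."""
    # Pad with a 0 at both ends, like the zero rows added in the answer.
    flags = [0] + [r[col] for r in rows] + [0]
    # d[i] = flags[i+1] - flags[i]; flags[i+1] is rows[i], so i is already a row index.
    d = [b - a for a, b in zip(flags, flags[1:])]
    starts = [i for i, v in enumerate(d) if v == 1]   # first row of each run
    ends = [i for i, v in enumerate(d) if v == -1]    # one past the last row of each run
    return [rows[s:e] for s, e in zip(starts, ends)]


blocks = blocks_where_flag_is_one(data_file)
for b in blocks:
    print(len(b), "x", len(b[0]))
```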

PostgreSQL: Summing info from Two Aggregated Tables

There is something wrong with my method or my logic here.
I am trying to sum all the data from both tables: where rows in the two tables correspond, add them up; where a row has no counterpart, still show its individual total, ending up with estimates per year in sequence.
I have tried LEFT JOINs, FULL JOINs and UNIONs. Nothing comes close to simply summing where possible and supplying the individual data otherwise.
The key point is that pb and th_year both hold the year for which the results are needed.
The separate aggregate queries produce the correct results; it's the combining of the two queries where I am going wrong, and I suspect the error is obvious.
I would appreciate advice on this.
CREATE VIEW public.cf_th_data_totals_by_year_by_wc_2
AS SELECT
a.owner,
a.region,
a.district,
a.plantation,
b.th_year,
a.pb,
a.wc,
sum(a.tcf_calcarea + b.tth_calcarea) AS area,
sum(a.tcf_total + b.tth_total) AS total,
sum(a.tcf_ws + b.tth_ws) AS ws,
sum(a.tcf_util + b.tth_util) AS util,
sum(a.tcf_s + b.tth_s) AS s,
sum(a.tcf_a + b.tth_a) AS a,
sum(a.tcf_b + b.tth_b) AS b,
sum(a.tcf_c + b.tth_c) AS c,
sum(a.tcf_d + b.tth_d) AS d
FROM
(SELECT
cfdata.owner,
cfdata.region,
cfdata.district,
cfdata.plantation,
cfdata.pb,
cfdata.wc,
sum(cfdata.calcarea)AS tcf_calcarea,
sum(cfdata._ba) AS tcf_ba,
sum(cfdata._total) AS tcf_total,
sum( cfdata._ws) AS tcf_ws,
sum( cfdata._util) AS tcf_util,
sum( cfdata._s) AS tcf_s,
sum( cfdata._a) AS tcf_a,
sum( cfdata._b) AS tcf_b,
sum( cfdata._c) AS tcf_c,
sum( cfdata._d) AS tcf_d
FROM cfdata
GROUP BY cfdata.owner, cfdata.region, cfdata.district, cfdata.plantation, cfdata.pb, cfdata.wc
ORDER BY cfdata.owner, cfdata.region, cfdata.district, cfdata.plantation, cfdata.pb, cfdata.wc) a
JOIN
(SELECT
thdata.owner,
thdata.region,
thdata.district,
thdata.plantation,
thdata.th_year,
thdata.wc,
sum(thdata.calcarea)AS tth_calcarea,
sum(thdata.th_ba) AS tth_ba,
sum(thdata.th_total) AS tth_total,
sum(thdata.th_ws) AS tth_ws,
sum(thdata.th_util) AS tth_util,
sum(thdata.th_s) AS tth_s,
sum(thdata.th_a) AS tth_a,
sum(thdata.th_b) AS tth_b,
sum(thdata.th_c) AS tth_c,
sum(thdata.th_d) AS tth_d
FROM thdata
GROUP BY thdata.owner, thdata.region, thdata.district, thdata.plantation, thdata.th_year, thdata.wc
ORDER BY thdata.owner, thdata.region, thdata.district, thdata.plantation, thdata.th_year, thdata.wc) b
ON a.owner = b.owner AND a.region = b.region AND a.district = b.district and a.plantation = b.plantation AND a.pb = b.th_year AND a.wc = b.wc
GROUP BY a.owner, a.region, a.district, a.plantation, a.pb, b.th_year, a.wc
ORDER BY a.owner, a.region, a.district, a.plantation, a.pb, b.th_year, a.wc
thdata sample:
owner region district plantation compartment calcarea wc plantdate th_year th_age th_dbh th_ht th_vtree th_sph th_ba th_total th_ws th_util th_s th_a th_b th_c th_d thdata_id
KeyProjects Northern Marshlands River Glen A27 14.02 PFN 01/08/2009 2017 8 12.3 7.3 0.0289 179 28 70 14 56 42 14 0 0 0 1
KeyProjects Northern Marshlands River Glen A28 2.1 ESN 01/12/2010 2012 2 4.5 4.2 0 479 2 0 0 0 0 0 0 0 0 2
KeyProjects Northern Marshlands River Glen A28 2.1 ESN 01/12/2010 2014 4 10.2 9.6 0.0188 250 4 11 0 8 4 6 0 0 0 3
KeyProjects Northern Marshlands River Glen A29 2.71 ESN 01/08/2009 2011 2 4.5 4.2 0 479 3 0 0 0 0 0 0 0 0 4
KeyProjects Northern Marshlands River Glen A29 2.71 ESN 01/08/2009 2013 4 10.2 9.6 0.0188 250 5 14 0 11 5 8 0 0 0 5
cfdata sample:
owner region district plantation compartment wc pb calcarea cfage dbh ht vtree sph _ba _total _ws _util _s _a _b _c _d tmai umai smai cfdata_id
KeyProjects Northern Marshlands River Glen A01 EF1 2021 5.27 10 14.5 20.4 0.1109 1004 90 585 21 564 84 401 79 0 0 11.1 10.7 1.5 1
KeyProjects Northern Marshlands River Glen A02 EF1 2021 36.1 10 14.5 20.4 0.1109 1004 614 4007 144 3863 578 2744 542 0 0 11.1 10.7 1.5 2
KeyProjects Northern Marshlands River Glen A03 EF1 2021 5.5 10 14.5 20.4 0.1109 1004 94 611 22 589 88 418 83 0 0 11.1 10.7 1.5 3
KeyProjects Northern Marshlands River Glen A04 EF1 2021 11.91 10 14.5 20.4 0.1109 1004 202 1322 48 1274 191 905 179 0 0 11.1 10.7 1.5 4
KeyProjects Northern Marshlands River Glen A05 EF1 2022 39.17 11 14.9 21.8 0.1286 1000 705 5053 157 4857 666 3486 744 0 0 11.7 11.3 1.7 5
expected result:
owner region district plantation th_year pb wc area total ws util s a b c d
KeyProjects Northern Marshlands River Glen 2008 2008 EF1 620.49 44176 1788 42389 7562 31953 2852 0 0
KeyProjects Northern Marshlands River Glen 2009 2009 EF1 635.65 44319 1778 42476 7634 31993 2852 0 0
KeyProjects Northern Marshlands River Glen 2010 2010 EF1 1202.31 87980 3453 84487 14906 63883 5704 0 0
KeyProjects Northern Marshlands River Glen 2011 2011 EF1 1948.37 132378 5275 127104 22662 95895 8556 0 0
KeyProjects Northern Marshlands River Glen 2012 2012 EF1 1378.61 87928 3429 84477 14878 63922 5704 0 0
OK, you have a few issues with your query:
In the main query, do not use sum(a.tcf_calcarea + b.tth_calcarea) AS area. You can simply add the two values, but you should substitute any NULL values with 0 first: write coalesce(a.tcf_calcarea, 0) + coalesce(b.tth_calcarea, 0) AS area instead, and do the same for all the other sum()s. This also means you are no longer aggregating at this level, so you should drop the final GROUP BY clause.
Next, make a FULL OUTER JOIN between the two sub-queries. This returns all rows from both sub-queries; where a corresponding row does not exist on one side, that side's columns are NULL.
It makes no sense to ORDER BY in a sub-query, because the planner will process the row set in whatever way it sees best. Order at the outer level only.
On matched rows the join condition guarantees b.th_year = a.pb, so you only need one year column. With a FULL JOIN, however, one of the two is NULL on unmatched rows, so output the merged value (coalesce(a.pb, b.th_year)) rather than either column on its own.
Some syntactical pointers:
Your sub-queries use only one table, so there is no need to qualify every column with the table name; that saves you a lot of typing.
More savings: use column positions in your GROUP BY clause, so you can write GROUP BY 1, 2, 3, 4, 5, 6. The same works for ORDER BY.
On the JOIN clause you can write USING (owner, region, district, plantation, wc). Besides being shorter, you then no longer need sub-query aliases in the main query for any of the USING columns, and with a FULL JOIN each USING column is merged from whichever side is not NULL. The year condition must stay in the join condition, though: a WHERE a.pb = b.th_year after the FULL JOIN would discard every row where one side is NULL and turn it back into an inner join. Since the two year columns have different names, the simplest option is to alias th_year AS pb in the second sub-query and add pb to the USING list.
All in all, this is what you get:
CREATE VIEW public.cf_th_data_totals_by_year_by_wc_2 AS
SELECT owner, region, district, plantation, pb AS th_year, wc,
coalesce(a.tcf_calcarea, 0) + coalesce(b.tth_calcarea, 0) AS area,
...
FROM (
SELECT owner, region, district, plantation, pb, wc,
sum(calcarea) AS tcf_calcarea,
...
FROM cfdata
GROUP BY 1, 2, 3, 4, 5, 6) a
FULL JOIN (
SELECT owner, region, district, plantation, th_year AS pb, wc,
sum(calcarea) AS tth_calcarea,
...
FROM thdata
GROUP BY 1, 2, 3, 4, 5, 6) b
USING (owner, region, district, plantation, pb, wc)
ORDER BY 1, 2, 3, 4, 5, 6;
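The difference between the original inner JOIN and a FULL JOIN with coalesce(..., 0) can be illustrated with a small Python sketch. The dictionaries are hypothetical stand-ins for the two aggregated sub-queries, keyed by a simplified (wc, year) tuple instead of all six grouping columns, with integer totals so the arithmetic is exact.

```python
# Hypothetical aggregated totals; keys stand in for (wc, year) groups.
cf_totals = {("EF1", 2021): 100, ("EF1", 2022): 40}   # cfdata side (year = pb)
th_totals = {("EF1", 2022): 10, ("PFN", 2017): 14}    # thdata side (year = th_year)


def inner_join_sum(left, right):
    """Plain JOIN: only keys present on both sides survive."""
    return {k: left[k] + right[k] for k in sorted(left.keys() & right.keys())}


def full_join_sum(left, right):
    """FULL JOIN with coalesce(l, 0) + coalesce(r, 0): every key survives."""
    # A key missing on one side plays the role of a NULL that coalesce turns into 0.
    return {k: left.get(k, 0) + right.get(k, 0) for k in sorted(left.keys() | right.keys())}


print(inner_join_sum(cf_totals, th_totals))
print(full_join_sum(cf_totals, th_totals))
```

Filtering the full-join result on "both years present" would reproduce the inner-join result, because the unmatched keys are exactly the rows where one side is NULL.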

Creating a Neural Network for Classification in Matlab

I am trying to do classification using neural network and I have written the following code. Is this the code required to perform the training and classification?
%n1 to s5(n1=147,n2=205,n3=166,n4=204,n5=167,b1=156,b2=172,b3=153,b4=151,b5=160,r1=133,r2=135,r3=190,r4=143,ru1=133,ru2=153,ru3=154,ru4=137,s1=132,165,130,136,148)
%code:
T = [n1,n2,n3,n4,n5,b1,b2,b3,b4,b5,r1,r2,r3,r4,r5,ru1,ru2,ru3,ru4,s1,s2,s3,s4,s5];
x = [0 0 0 0 0 1 1 1 1 1 2 2 2 2 2 3 3 3 3 4 4 4 4 4];
net1 = newff(minmax(T),[30 20 1],{'logsig','logsig','purelin'},'trainrp');
net1.trainParam.show = 1000;
net1.trainParam.lr = 0.04;
net1.trainParam.epochs = 7000;
net1.trainParam.goal = 1e-5;
[net1] = train(net1,T,x);
save net1 net1
Additionally, if I have more samples with more features, how should I represent them in T and x? For example:
sample 1 ..... 123 0.56 78 127 .......0
sample 2 .......127 0.89 56 132 ........0
sample3...... 134 0.72 65 140...1
sample4 156 0.55 69 145 .....1
sample 5 112 0.10 12 120 .......2
sample 6 123 0.15 24 99 .......2
sample 7 95 0.32 98 198 ....3
sample 8 90 0.45 90 200...... 3
Yes, your solution appears correct. Note, though, that newff is obsolete as of R2010b (NNET 7.0); R2010a (NNET 6.0.4) was the last release to use it. The recommended function is now feedforwardnet. The equivalent feedforwardnet setup would be as follows:
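net = feedforwardnet([30 20], 'trainrp');   % same training function as the newff call
net.layers{1}.transferFcn = 'logsig';       % set each hidden layer separately
net.layers{2}.transferFcn = 'logsig';
net.trainParam.show = 1000;
net.trainParam.lr = 0.04;
net.trainParam.epochs = 7000;
net.trainParam.goal = 1e-5;
net = train(net, T, x);
save net net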
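I'm not quite sure what you're trying to do with minmax, but this should be the basic structure of the feedforward network. (In the old newff syntax, minmax(T) only supplies the input ranges; feedforwardnet works them out from the data when train is called.)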
To structure T and x, say you have the following data:
[123 0.56 78 127] belongs to class 1
[127 0.89 56 132] belongs to class 1
[134 0.72 65 140] belongs to class 2
[156 0.55 69 145] belongs to class 2
[112 0.10 12 120] belongs to class 3
[123 0.15 24 99] belongs to class 3
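Each column of T must be one sample (features x samples), so write the samples as rows and transpose; x holds the class label of each sample, in the same order:
T = [123 0.56 78 127; 127 0.89 56 132; 134 0.72 65 140; 156 0.55 69 145; 112 0.10 12 120; 123 0.15 24 99]';
x = [1 1 2 2 3 3];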
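A minimal Python sketch of that layout conversion, using the six example samples above: it turns the sample-per-row list into a features-by-samples matrix and a matching row of labels. It only illustrates the matrix orientation that train expects; it does not call any toolbox function.

```python
# The six labelled samples from the answer, one sample per row as written there.
samples = [
    ([123, 0.56, 78, 127], 1),
    ([127, 0.89, 56, 132], 1),
    ([134, 0.72, 65, 140], 2),
    ([156, 0.55, 69, 145], 2),
    ([112, 0.10, 12, 120], 3),
    ([123, 0.15, 24, 99], 3),
]

# Inputs as features x samples: row i holds feature i of every sample,
# i.e. the transpose of the sample-per-row layout.
T = [list(col) for col in zip(*(features for features, _ in samples))]

# Targets as a single row: the class label of each sample, in the same order.
x = [label for _, label in samples]

print(len(T), "x", len(T[0]))  # prints: 4 x 6
```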