coefficients of correlation on AMPL - postgresql

I have a specific question and would deeply appreciate any help.
I am working on a project in AMPL (A Mathematical Programming Language):
I need to implement an objective function that minimizes the risk on the cost of a variable whose cost is given as a parameter, together with correlation coefficients.
The risk is estimated using the variance of the cost, and I have my correlation matrix data.
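For reference, the standard way to turn a correlation matrix into a cost variance (the quantity a mean-variance / MPT model minimizes) is the quadratic form below. Here x_i is the amount of technology i chosen by the model (for example its generation), c_i its uncertain fuel cost with mean \bar{c}_i, sigma_i the standard deviation of that cost, and rho_ij the entry of the correlation matrix. Reading cv_fuel_price as a coefficient of variation, so that sigma_i = cv_i * fuel_price_i, is an assumption.

```latex
\operatorname{Var}\Big(\sum_{i} c_i x_i\Big)
  = \sum_{i}\sum_{j} x_i\, x_j\, \sigma_i\, \sigma_j\, \rho_{ij},
\qquad \sigma_i = \mathrm{cv}_i \, \bar{c}_i
```

Because this expression is quadratic in the decision variables x, the solver called from AMPL has to support quadratic objectives.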
My correlation matrix looks like this:
Correlation coefficients, 2015 (rows and columns in the same order)

                            (1)   (2)   (3)  (4)  (5)  (6)   (7)   (8)
(1) Coal steam turbine      1     0.47  0    0    0    0.12  0.12  1
(2) Gas combustion turbine  0.47  1     0    0    0    0.06  0.06  0.47
(3) Wind                    0     0     1    0    0    0     0     0
(4) Central PV              0     0     0    1    0    0     0     0
(5) Hydro non pumped        0     0     0    0    1    0     0     0
(6) Nuclear GenIII          0.12  0.06  0    0    0    1     1     0.12
(7) Nuclear GenIV           0.12  0.06  0    0    0    1     1     0.12
(8) Coal steam turbine CCS  1     0.47  0    0    0    0.12  0.12  1
In my case, the cost risk I want to minimize comes from fuel prices: fuel types are correlated, the correlation coefficients vary from year to year, and fuel prices depend on the technology type, the province, and the year.
I need an efficient way to enter the correlation matrix in a table (in a PostgreSQL database, managed with pgAdmin/psql) and then use the appropriate arguments to read it and implement it in my objective function.
The table that I have so far looks like this:
table fuel_prices "inputs/fuel_prices.tab" IN:
[province, fuel, year], fuel_price, cv_fuel_price;
read table fuel_prices;
I need to modify it to add correlation coefficients.
# Table for the correlation coefficients
# table fuel_prices_corr "inputs/fuel_prices_corr.tab" IN:
# [province, year], fuel, correl_coeff1, correl_coeff2;
# read table fuel_prices_corr;
The technologies I am using are read from tables such as the following:
table generator_info "inputs/generator_info.tab" IN:
TECHNOLOGIES <- [technology], technology_id, fuel;
read table generator_info;
table gen_cap_cost "inputs/gen_cap_cost.tab" IN:
[technology, year], overnight_cost_yearly ~ overnight_cost, fixed_o_m_yearly ~ fixed_o_m, variable_o_m_yearly ~ variable_o_m;
read table gen_cap_cost;
table existing_plants "inputs/existing_plants.tab" IN:
EXISTING_PLANTS <- [project_id, province, technology],
ep_plant_name ~ plant_name, ep_carma_plant_id ~ carma_plant_id,
ep_capacity_mw ~ capacity_mw, ep_heat_rate ~ heat_rate, ep_cogen_thermal_demand ~ cogen_thermal_demand_mmbtus_per_mwh,
ep_vintage ~ start_year,
ep_overnight_cost ~ overnight_cost, ep_connect_cost_per_mw ~ connect_cost_per_mw, ep_fixed_o_m ~ fixed_o_m, ep_variable_o_m ~ variable_o_m,
ep_location_id;
read table existing_plants;
table new_projects "inputs/new_projects.tab" IN:
PROJECTS <- [project_id, province, technology], location_id, ep_project_replacement_id,
capacity_limit, capacity_limit_conversion, heat_rate, cogen_thermal_demand, connect_cost_per_mw;
read table new_projects;
My objective function looks like this, where pid is the project-specific id, a the province, t the technology, p the investment period (the start of an investment period, which is also the date when a power plant starts running), and h the study hour (a unique timepoint considered):
sum {(pid, a, t, p) in PROJECT} Gen[pid, a, t, p, h] * fuel_cost[pid, a, t, p]
Does anyone have a hint on this, or know of a project that uses MPT (modern portfolio theory) with correlated variables?

Here's an example of a table declaration for reading a two-dimensional parameter amt taken from here:
table dietAmts IN "ODBC" (ConnectionStr) "Amounts":
[NUTR, FOOD], amt;
In your case, you'll have the same set twice in the key section, something like [ENERGY_SOURCE, ENERGY_SOURCE], where ENERGY_SOURCE is a set of energy sources such as Coal steam turbine, etc. Since the matrix is symmetric you only need to store half of it.
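A minimal Python sketch of the half-matrix idea (an illustration only, not AMPL or SQL): each year's matrix is flattened into long-format rows (year, technology_i, technology_j, corr) with i <= j, which is the shape a database table keyed on two technology columns would have, and the missing half is mirrored back when reading. The function names and the row layout are hypothetical; the year column is there because the question says the coefficients vary yearly.

```python
# Technology order of the 2015 matrix in the question (rows == columns).
TECHS = [
    "Coal steam turbine", "Gas combustion turbine", "Wind", "Central PV",
    "Hydro non pumped", "Nuclear GenIII", "Nuclear GenIV", "Coal steam turbine CCS",
]
CORR_2015 = [
    [1, 0.47, 0, 0, 0, 0.12, 0.12, 1],
    [0.47, 1, 0, 0, 0, 0.06, 0.06, 0.47],
    [0, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 0, 0],
    [0.12, 0.06, 0, 0, 0, 1, 1, 0.12],
    [0.12, 0.06, 0, 0, 0, 1, 1, 0.12],
    [1, 0.47, 0, 0, 0, 0.12, 0.12, 1],
]


def to_rows(matrix, names, year):
    """Flatten one year's symmetric matrix into (year, tech_i, tech_j, corr) rows, i <= j."""
    n = len(names)
    return [(year, names[i], names[j], matrix[i][j]) for i in range(n) for j in range(i, n)]


def from_rows(rows, names, year):
    """Rebuild the full matrix for one year, mirroring each entry: corr[j][i] = corr[i][j]."""
    pos = {name: k for k, name in enumerate(names)}
    full = [[0.0] * len(names) for _ in names]
    for y, a, b, rho in rows:
        if y == year:
            i, j = pos[a], pos[b]
            full[i][j] = full[j][i] = rho
    return full


def cost_variance(x, sigma, corr):
    """Quadratic form sum_i sum_j x_i x_j sigma_i sigma_j rho_ij."""
    n = len(x)
    return sum(x[i] * x[j] * sigma[i] * sigma[j] * corr[i][j]
               for i in range(n) for j in range(n))


rows = to_rows(CORR_2015, TECHS, 2015)
print(len(rows))  # 36 rows instead of 64
assert from_rows(rows, TECHS, 2015) == CORR_2015
```

Storing n(n+1)/2 entries instead of n^2 is the saving the answer refers to; the cost is that whoever reads the table (the AMPL model or the query) must fill in the mirrored half.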


Neural network - exercise

I am teaching myself neural networks and am working through the very good online book at
http://neuralnetworksanddeeplearning.com/chap1.html
I did a few of the exercises, but there is one I really don't understand, or at least one step of it.
Task:
There is a way of determining the bitwise representation of a digit by adding an extra layer to the three-layer network above. The extra layer converts the output from the previous layer into a binary representation, as illustrated in the figure below. Find a set of weights and biases for the new output layer. Assume that the first 3 layers of neurons are such that the correct output in the third layer (i.e., the old output layer) has activation at least 0.99, and incorrect outputs have activation less than 0.01.
I also found the solution, shown in the second image.
I understand why the matrix has to have this shape, but I really struggle to understand the step where the author of the solution calculates
0.99 + 3*0.01
4*0.01
I really don't understand these two steps and would be very happy if someone could help me understand this calculation.
Thank you very much for your help.
The output of the previous layer, x, is 10x1. The weight matrix W is 4x10, so the new output layer is 4x1. Consider two cases. First:
x is 1 in exactly one row, e.g. x^T = [1 0 0 0 0 0 0 0 0 0]. If you multiply W by this vector, the output is y^T = [0 0 0 0], because the single 1 in x only picks out the 0th column of W, which is all zeros.
Second, what if the entries are no longer exactly 1 and 0? Instead, x can be x^T = [0.99 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01]. Multiplying the first row of W by this x gives 0.05 (I believe this is where the typo is). When x^T = [0.01 0.99 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01], multiplying the first row of W by x gives 1.03, because:
0.01*0 + 0.99*1 + 0.01*0 + 0.01*1 + 0.01*0 + 0.01*1 + 0.01*0 + 0.01*1 + 0.01*0 + 0.01*1 = 1.03
So I believe there is a typo: the author probably assumed four ones in the first row of W, which is not the case, since there are five (in columns 1, 3, 5, 7 and 9, counting from 0). With four ones in the first row, the results would indeed be 4*0.01 = 0.04 when the 0.99 is in the first row of x, and 0.99 + 3*0.01 = 1.02 when the 0.99 is in the second row of x.
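A small Python check of the arithmetic above, assuming (as in the answer's matrix) that row k of W has a 1 in column d exactly when bit k of digit d is set, so row 0 is the least significant bit, with ones in columns 1, 3, 5, 7 and 9. The helper names are hypothetical and the biases are taken as zero.

```python
def bitwise_weights(n_digits=10, n_bits=4):
    """Row k has a 1 in column d when bit k of digit d is set (row 0 = least significant bit)."""
    return [[(d >> k) & 1 for d in range(n_digits)] for k in range(n_bits)]


def matvec(W, x):
    """Plain matrix-vector product W @ x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def noisy_one_hot(digit, hi=0.99, lo=0.01, n=10):
    """Old output layer: 0.99 for the correct digit, 0.01 everywhere else."""
    return [hi if d == digit else lo for d in range(n)]


W = bitwise_weights()
print(W[0])  # [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]  -> five ones in the first row
print([round(v, 2) for v in matvec(W, noisy_one_hot(0))])  # [0.05, 0.04, 0.04, 0.02]
print([round(v, 2) for v in matvec(W, noisy_one_hot(1))])  # [1.03, 0.04, 0.04, 0.02]
```

The first entries, 0.05 = 5*0.01 and 1.03 = 0.99 + 4*0.01, match the answer's hand calculation; the book-style values 4*0.01 and 0.99 + 3*0.01 appear in the rows that have only four ones (rows 1 and 2).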

How to identify recurring patterns in time-series data in Matlab

I am calculating ENSO indices using MATLAB, and one condition is that I have to find anomalous sea surface temperatures. An El Niño event is characterised by sea surface temperatures that are 0.5 degrees above the normalised "0-value" for 5 months. I have got as far as making my monthly time series logical (i.e. "1" is a monthly value above 0.5 and "0" is a monthly value below 0.5), but I wanted to know whether there is a command in MATLAB that lets me identify when this value repeats 5 times or more.
As an example:
Monthly_data=[0 0 1 1 1 1 1 0 0 0 1 1 0 0 0 0 1 0 1 1 1 1 1 1 1 0]
I would ideally need a command that finds where at least five "1"s occur in a row. Does this exist?
If more information is needed please let me know; I am new to MATLAB, so I am not yet sure of the structure and syntax preferred for questions here.
Thank you!
Not sure this is what you need, but perhaps it gives you some direction.
> x = diff(Monthly_data);
> find(x==-1)-find(x==1)
ans =
5 2 1 7
These are the lengths of the runs of 1s. You may need to pad the front and end of the array with 0 so that runs touching either end of the array are also counted correctly.
To find the start index of the runs longer than 5:
> s=find(x==1);
> s(find(x==-1)-s>5)
ans = 18
or
> s(find(x==-1)-s>=5)
ans =
2 18
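Note that because of the diff lag, these values are one less than the array index of the first 1 in each run (the runs actually start at elements 3 and 19), so add 1 to get the MATLAB index, or read them as zero-based positions.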
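The same diff-based idea, with the zero padding applied, sketched in Python as a language-neutral illustration (the function name is hypothetical). Starts are reported as 1-based indices of the first 1 in each run, so they can be compared directly with MATLAB indexing.

```python
def runs_of_ones(data, min_len):
    """Return (start, length) for every run of 1s at least min_len long.

    Starts are 1-based, matching MATLAB indexing.
    """
    padded = [0] + list(data) + [0]  # close runs that touch either end
    d = [padded[k + 1] - padded[k] for k in range(len(padded) - 1)]
    starts = [k + 1 for k, v in enumerate(d) if v == 1]  # first 1 of each run
    ends = [k for k, v in enumerate(d) if v == -1]       # last 1 of each run
    return [(s, e - s + 1) for s, e in zip(starts, ends) if e - s + 1 >= min_len]


monthly_data = [0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0,
                1, 1, 1, 1, 1, 1, 1, 0]
print(runs_of_ones(monthly_data, 5))  # [(3, 5), (19, 7)]
```

Because of the padding, a run that begins at the first element or ends at the last one still produces both a +1 and a -1 in the difference, which is the boundary problem the answer mentions.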

FMINCON to schedule appliance usage to minimize total cost

I would like to write code to find the minimum cost of running a dishwasher. This depends on the power required, the hourly tariff rate, and the time of use. I am using fmincon for this, but the code below produces the following error message:
User supplied objective function must return a scalar value
My objective is to minimize (total cost * time), subject to: the total cost equals the sum of (hourly power)*(hourly cost) from hour 1 to 24, where the hourly power adds up to 0.8 kWh; the total cost must be greater than Ca; and the total run time for the day is one hour.
% Array showing the hourly electricity rates (cents per kwh)
R=zeros(24,1);
R(1:7,1)=6;
R(20:24,1)=6;
R(8:11,1)=9;
R(18:19,1)=9;
R(12:17,1)=13;
p_7 = transpose([0.8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]); %This is the power pattern of appliance (operates at 0.8 kWh for 1 hour daily)
P7 = zeros(24,24);
P7(:, 1) = p_7; % Column 1: the appliance runs in hour 1
for k=1:23
P7(:, k+1) = circshift(p_7,k); % This shows all the possible hours of operation
end
Total = P7*R; % This is the total cost per hour at different hourly tariffs
fun = @(x)Total.*(x);
x0 = [1];
A = Total;
%Ca = 0.5;
Ca = ones(1,24);
b = Ca;
Aeq = Total;
Daily_tot_7 = 2*ones(1,24);
beq = Daily_tot_7;
ub = 24;
lb = 1;
x = fmincon(fun,x0,A,b,Aeq,beq,lb,ub)
I believe my understanding of how to convert the constraints for fmincon is not correct, and that I may be missing vital constraints for this problem.
Your objective currently returns a vector. You stated that your cost function is the sum of the hourly elements, so your function definition should be
fun = @(x)sum(Total.*(x));
However, if I'm reading this right, you want to solve for each hour individually. In that case, you need to define your x0 variable as a 24x1 input:
x0 = ones(24,1);
If that is the case, you need to adjust your A, b, Aeq, and beq variables accordingly. But do you actually need them? You can simply leave them out by replacing them with [].
Finally, your p7 variable is likely better redefined as
p7 = R*.8;
My apologies if I misunderstood what you are trying to accomplish here.
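As a sanity check outside MATLAB, here is a brute-force sketch in Python (an alternative check, not the fmincon formulation discussed above). Because the appliance runs for exactly one hour at 0.8 kWh, the cost of each possible start hour is 0.8 times that hour's rate, which is what p7 = R*.8 computes; the objective has to collapse a whole schedule into a single number, which is why fmincon insists on a scalar. The names below are hypothetical.

```python
# Hourly tariff in cents/kWh, copied from the question's R vector (hours 1..24).
R = [6] * 7 + [9] * 4 + [13] * 6 + [9] * 2 + [6] * 5
ENERGY_KWH = 0.8  # the appliance uses 0.8 kWh in its single one-hour run


def schedule_cost(schedule):
    """Scalar objective: total cost in cents of a 24-entry on/off schedule."""
    return sum(ENERGY_KWH * rate * on for rate, on in zip(R, schedule))


# Cost of each possible start hour, i.e. the vector p7 = R*.8 from the answer.
cost_by_hour = [ENERGY_KWH * rate for rate in R]
cheapest = min(cost_by_hour)
best_hours = [h + 1 for h, c in enumerate(cost_by_hour) if abs(c - cheapest) < 1e-9]
print(best_hours)  # [1, 2, 3, 4, 5, 6, 7, 20, 21, 22, 23, 24]
```

With only 24 candidate start hours, enumerating them is exact; a continuous solver such as fmincon becomes worth it only once the schedule has more freedom than a single start time.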

proper way to normalize my distance matrices (matlab)

I have a question about a comparison I want to make between two distance matrices. Let's say that I have my ground truth matrix:
gt = [1 0 0 0 1;
0 1 0 0 1;
0 0 1 0 0;
0 0 0 1 0];
and then I have two other extracted matrices:
v1 = [0.6136 0.1012 0.1146 0.1647 0.7445;
0.2264 0.7457 -0.0015 -0.0093 1.0026;
-0.0107 0.1975 1.1219 0.1699 0.1926;
-0.0019 0.0564 0.1560 0.7723 0.0565];
v2 = [0.8209 0.1390 0.1538 0.0203 0.9997;
0.2295 0.7720 -0.0028 -0.0112 1.0329;
-0.0167 0.2593 0.8172 0.2227 0.2501;
-0.0000 0.0549 0.1561 1.2728 0.0569];
Then I want to compute the distance from each column of the above matrices to each column of the ground truth matrix gt. I am computing these distances with dist1 = pdist2(gt', v1', 'euclidean'); and dist2 = pdist2(gt', v2', 'euclidean');. However, the two resulting distance matrices are not directly comparable, right? Since the value ranges of v1 and v2 differ, I think I need to apply some kind of normalization before I can draw conclusions from the results (please correct me if I am wrong).
However, I am not sure whether this should happen before or after I compute the distance matrices, or what type of normalization to use. The negative values act as a penalty, so their effect should be preserved after normalization (that is why I think I might need to normalize after computing the distance matrices; otherwise my first choice would be to normalize v1 and v2 before computing their distance to gt).
Can you please give some feedback on how to normalize, and which type of normalization to apply?
Thanks
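For reference, pdist2(gt', v1', 'euclidean') compares every column of gt with every column of v1, so entry (i, j) of dist1 is the Euclidean distance between column i of gt and column j of v1. The Python sketch below (hypothetical helper names; math.dist needs Python 3.8+) reproduces that layout for gt and v1. It only illustrates what the distance matrix contains and does not pick a normalization.

```python
import math

gt = [[1, 0, 0, 0, 1],
      [0, 1, 0, 0, 1],
      [0, 0, 1, 0, 0],
      [0, 0, 0, 1, 0]]
v1 = [[0.6136, 0.1012, 0.1146, 0.1647, 0.7445],
      [0.2264, 0.7457, -0.0015, -0.0093, 1.0026],
      [-0.0107, 0.1975, 1.1219, 0.1699, 0.1926],
      [-0.0019, 0.0564, 0.1560, 0.7723, 0.0565]]


def columns(matrix):
    """Columns of a row-major matrix (what gt' and v1' turn into rows)."""
    return [list(col) for col in zip(*matrix)]


def pairwise_euclidean(a_rows, b_rows):
    """Entry [i][j] is the distance between a_rows[i] and b_rows[j], like pdist2."""
    return [[math.dist(a, b) for b in b_rows] for a in a_rows]


dist1 = pairwise_euclidean(columns(gt), columns(v1))
print(len(dist1), len(dist1[0]))  # 5 5
```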

Training a Decision Tree in MATLAB over binary train data

I want to train a decision tree in MATLAB for binary data. Here is a sample of data I use.
traindata <87*239> [array of data with 239 features]
1 0 1 0 0 0 1 1 0 0 1 0 1 0 1 1 1 1 1 0 0 0 1 1 0 ... [till 239]
1 1 1 0 0 0 1 0 0 0 1 0 1 0 1 1 0 0 1 0 0 0 1 0 1 ... [till 239]
....
The thing is that this data corresponds to a form that only has yes/no options. The outcome of the form is also binary and indicates whether or not a patient has a certain medical disorder. We have used a classification tree, and the classifier splits on double values: for example, it branches the first node on whether x137 is greater than 0.75. Since 0.75 never occurs in our data and has no yes/no meaning, we want a decision tree that suits our data better: one that is trained on Boolean variables rather than doubles, understands that the data is not continuous, and, instead of the representation above, shows whether x137 is yes or no (1 or 0). Can someone help me with this? If a Boolean decision tree is not applicable, I would also appreciate a way to map our data to double variables and features. I am currently using classregtree in MATLAB with <87*237> as training data and <87*1> as results.
classregtree has an optional input parameter, categorical. With this option you specify which of your input variables are categorical; classregtree takes this as a vector of column indices, so in your case (all 239 columns are categorical) it would be 1:239. The decision tree should then contain yes/no decisions rather than numerical thresholds.
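To see why a threshold such as x137 > 0.75 still acts as a yes/no question on 0/1 data, here is a tiny Python check (hypothetical names and sample values): for a feature that only takes the values 0 and 1, any threshold t with 0 <= t < 1 sends exactly the rows with value 1 one way and the rows with value 0 the other way. For a two-valued feature the only possible categorical split is also {0} versus {1}, so declaring the columns categorical changes how the split is reported rather than which rows it separates.

```python
def goes_right(values, threshold):
    """Rows sent to the 'greater than threshold' branch of a numeric split."""
    return [v > threshold for v in values]


x137 = [1, 0, 1, 1, 0, 0, 1]  # hypothetical yes/no answers coded as 1/0
is_yes = [v == 1 for v in x137]
for t in (0.0, 0.25, 0.5, 0.75, 0.99):
    assert goes_right(x137, t) == is_yes  # same partition as "x137 is yes"
```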
From the help of classregtree:
t = classregtree(X,y) creates a decision tree t for predicting the response y as a function of the predictors in the columns of X. X is an n-by-m matrix of predictor values. If y is a vector of n response values, classregtree performs regression. If y is a categorical variable, character array, or cell array of strings, classregtree performs classification.
What's the type of y in your case? It seems that classregtree is doing regression in your case but you want classification. So, y should be a categorical variable.
EDIT: To make your y categorical, you can try "nominal(y)".