Running out of memory when running a PARFOR loop in MATLAB

Some colleagues and I are running out of memory running the following function on a cluster, where F_scattered is a collection of interpolants generated using scatteredInterpolant. Can anyone suggest steps we can take to make the parfor loop less memory-intensive? I’m concerned that the code as written may be sending all of F_scattered to each worker, but I'm not certain how to check if that’s happening. If it helps, there’s more context at the bottom of this message. Thank you in advance for your help.
function [V1q,V2q] = interpolate(F_gridded,X_gridded,F_scattered,X_scattered)
% Fractional slice index for each query point: the integer part k is the
% lower bracketing slice, the fractional part w is the weight on slice k+1.
w = F_gridded(X_gridded);
k = fix(w);
w = w-k;
[n,m] = size(F_scattered);
assert(n==2)
F1 = cell(m,1);
F2 = cell(m,1);
parfor j=1:m
    kj = k==j;
    X = X_scattered(kj,:);
    if j<m
        W = w(kj);
        F1{j} = (1-W).*F_scattered{1,j}(X)+W.*F_scattered{1,j+1}(X);
        F2{j} = (1-W).*F_scattered{2,j}(X)+W.*F_scattered{2,j+1}(X);
    else
        F1{j} = F_scattered{1,j}(X);
        F2{j} = F_scattered{2,j}(X);
    end
end
% Preallocate both outputs before scattering the per-slice results back
V1q = nan(size(w));
V2q = nan(size(w));
for j=1:m
    kj = k==j;
    V1q(kj) = F1{j};
    V2q(kj) = F2{j};
end
end
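A minimal sketch of one way to keep the big inputs from being broadcast. Inside the loop above, F_scattered is indexed both as {1,j} and {1,j+1}, and X_scattered, k and w are indexed with a logical mask, so parfor cannot slice any of them and (as far as the documented sliced-variable rules go) sends each one whole to every worker; the MATLAB Code Analyzer usually marks such variables with a broadcast-variable warning. The sketch splits everything per slice on the client so that each cell is indexed only as {j} inside parfor. The names Fpair, Xj and Wj, and the toy function handles standing in for the scatteredInterpolant objects, are hypothetical. Each interpolant appears in up to two pairs, so up to two workers may receive a copy of it, which is still much less than every worker receiving all of F_scattered.

```matlab
% Hypothetical stand-ins: m slices, two "interpolants" per slice given as
% function handles of the 3 scattered coordinates (instead of scatteredInterpolant)
m = 4;
F_scattered = cell(2, m);
for j = 1:m
    F_scattered{1, j} = @(X) j + sum(X, 2);
    F_scattered{2, j} = @(X) j * X(:, 1);
end
rng(1);
nq = 50;
w = 1 + (m - 1) * rand(nq, 1);      % fractional slice index per query point
X_scattered = rand(nq, 3);
k = fix(w);
w = w - k;

% Pre-split per slice on the client, so each cell is indexed only as {j}
% inside parfor and can be sliced rather than broadcast
Fpair = cell(m, 1); Xj = cell(m, 1); Wj = cell(m, 1);
for j = 1:m
    kj = (k == j);
    Xj{j} = X_scattered(kj, :);
    Wj{j} = w(kj);
    Fpair{j} = F_scattered(:, j:min(j + 1, m));   % 2x2 cell, or 2x1 when j == m
end

F1 = cell(m, 1); F2 = cell(m, 1);
parfor j = 1:m
    X = Xj{j};
    W = Wj{j};
    Fp = Fpair{j};
    if size(Fp, 2) == 2
        F1{j} = (1 - W) .* Fp{1, 1}(X) + W .* Fp{1, 2}(X);
        F2{j} = (1 - W) .* Fp{2, 1}(X) + W .* Fp{2, 2}(X);
    else
        F1{j} = Fp{1, 1}(X);
        F2{j} = Fp{2, 1}(X);
    end
end

% Scatter the per-slice results back into query order (on the client)
V1q = nan(size(w)); V2q = nan(size(w));
for j = 1:m
    kj = (k == j);
    V1q(kj) = F1{j};
    V2q(kj) = F2{j};
end
```

The same splitting could be done once, before interpolate is called repeatedly with the same interpolants, if the query points do not change between calls.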
Additional context is as follows. Some colleagues and I are attempting a collocation on a seven-dimensional grid using an endogenous grid method. The relevant details are as follows:
• Our dimensions are (b1,dstate,omega,u,b2,b3,eps_pi). b1, b2, b3 and eps_pi are continuous; dstate, omega and u are discrete. eps_pi is scattered because of the endogenous grid algorithm, but the grids for the other variables were built from vectors using ndgrid.
• Because scatteredInterpolant accepts points in at most three dimensions, we construct our interpolants by “slicing” our collocation grid in (b1,dstate,omega,u), then running scatteredInterpolant on the (b2,b3,eps_pi) values associated with each slice.
• Since b1 is the only continuous variable in (b1,dstate,omega,u), we then interpolate by (i) finding the two slices that bracket the query values for (b1,dstate,omega,u), then (ii) running the scattered interpolants associated with those slices on the query values for (b2,b3,eps_pi), and then (iii) taking an appropriately weighted average of the outputs.
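The bracketing step (i) can be seen in a reduced two-dimensional version of substate_finder_TSnow: interpolating the slice numbers 1:n linearly returns a fractional index whose integer part is the lower slice and whose fractional part is the weight on the next slice. Because b1 is the first (fastest-varying) dimension of the column-major reshape, slices k and k+1 differ only in b1, which is why interpolate can pair F_scattered{:,j} with F_scattered{:,j+1} (at the top of the b1 grid the weight on k+1 is zero). The grids and the per-slice value 10*j below are hypothetical.

```matlab
% Hypothetical reduced grid: continuous b1 (8 points) x discrete dstate (2 values)
b1_grid = linspace(0, 1, 8);
dstate_grid = [1 2];
dims = [numel(b1_grid), numel(dstate_grid)];
% Same construction as substate_finder_TSnow: interpolate the slice numbers
slice_finder = griddedInterpolant({b1_grid, dstate_grid}, reshape(1:prod(dims), dims));
s = slice_finder(0.3, 2);   % query b1 = 0.3, dstate = 2
k = fix(s);                 % lower bracketing slice (here 11)
w = s - k;                  % weight on slice k+1 (here about 0.1)
% Step (iii): weighted average of hypothetical per-slice results f(j) = 10*j
f = @(j) 10 * j;
v = (1 - w) * f(k) + w * f(k + 1);
```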
Here are the key parts of the code in relatively minimal form:
%%% setting up the grid vectors
b1_t_grid = linspace(0,1,8);
dstate_t_grid = [-1:12];
omega_t_grid = [-2,2];
u_t_grid = [0,1];
b2_t_grid = linspace(0,1,5);
b3_t_grid = linspace(0,1,5);
quasidiff_t_grid = linspace(-9*0.05*4*2,9*0.05*4*2,8); % 'target' variable for the endogenous grid method
%%% separating sliced dimensions from the others
gridded_dims_TSnow = [numel(b1_t_grid),numel(dstate_t_grid),numel(omega_t_grid),numel(u_t_grid)];
n_gridded_substates_TSnow = prod(gridded_dims_TSnow);
scattered_dims_TSnow = [numel(b2_t_grid),numel(b3_t_grid),numel(quasidiff_t_grid)];
n_scattered_substates_TSnow = prod(scattered_dims_TSnow);
[b1_ts_TSnow,dstate_ts_TSnow,omega_ts_TSnow,u_ts_TSnow,b2_ts_TSnow,b3_ts_TSnow,quasidiff_ts_TSnow] = ndgrid(b1_t_grid,dstate_t_grid,omega_t_grid,u_t_grid,b2_t_grid,b3_t_grid,quasidiff_t_grid);
b1_ts_TSnow = reshape(b1_ts_TSnow ,n_gridded_substates_TSnow,n_scattered_substates_TSnow);
dstate_ts_TSnow = reshape(dstate_ts_TSnow ,n_gridded_substates_TSnow,n_scattered_substates_TSnow);
omega_ts_TSnow = reshape(omega_ts_TSnow ,n_gridded_substates_TSnow,n_scattered_substates_TSnow);
u_ts_TSnow = reshape(u_ts_TSnow ,n_gridded_substates_TSnow,n_scattered_substates_TSnow);
b2_ts_TSnow = reshape(b2_ts_TSnow ,n_gridded_substates_TSnow,n_scattered_substates_TSnow);
b3_ts_TSnow = reshape(b3_ts_TSnow ,n_gridded_substates_TSnow,n_scattered_substates_TSnow);
quasidiff_ts_TSnow = reshape(quasidiff_ts_TSnow,n_gridded_substates_TSnow,n_scattered_substates_TSnow);
grid_size_TSnow = numel(b1_ts_TSnow);
grid_dims_TSnow = size(b1_ts_TSnow);
%%% initial guess on the dimension to which we’ll ultimately apply the endogenous grid algorithm:
Gamma = @(yhat,omega) 0.05*(yhat-omega) + 0.10*(max(0,yhat-omega)).^2;
CP = @(omega,u) ((omega == 2).*(u == 1) - (omega == -2).*(u == 0))*0.05*4;
CP_ts_TSnow = CP(omega_ts_TSnow,u_ts_TSnow);
yhat_ts_TSnow_UNCnow = (ergoprob_L*omega_L + (1-ergoprob_L)*omega_H)*ones(grid_dims_TSnow);
eps_pi_ts_TSnow = quasidiff_ts_TSnow - Gamma(yhat_ts_TSnow_UNCnow,omega_ts_TSnow) - CP_ts_TSnow;
%%% pre-compute some stuff that will be useful for interpolation
substate_finder_TSnow = griddedInterpolant({b1_t_grid,dstate_t_grid,omega_t_grid,u_t_grid},reshape(1:n_gridded_substates_TSnow,gridded_dims_TSnow));
yhat_t_TSnow_fxns = cell(1,n_gridded_substates_TSnow); pihat_t_TSnow_fxns = cell(1,n_gridded_substates_TSnow);
parfor iii=1:n_gridded_substates_TSnow
    yhat_t_TSnow_fxns{iii} = scatteredInterpolant(b2_ts_TSnow(iii,:)',b3_ts_TSnow(iii,:)',eps_pi_ts_TSnow(iii,:)',yhat_ts_TSnow_UNCnow(iii,:)');
    pihat_t_TSnow_fxns{iii} = scatteredInterpolant(b2_ts_TSnow(iii,:)',b3_ts_TSnow(iii,:)',eps_pi_ts_TSnow(iii,:)',pihat_ts_TSnow_UNCnow(iii,:)');
end
%%% example of an interpolation, given an arbitrary grid of query points (b1s,dstates,omegas,us,b2s,b3s,epses)
yhats = NaN(size(b1s));
pihats= NaN(size(b1s));
[yhats(:),pihats(:)] = interpolate(substate_finder_TSnow,[b1s(:),dstates(:),omegas(:),us(:)],...
    [yhat_t_TSnow_fxns;pihat_t_TSnow_fxns],[b2s(:),b3s(:),epses(:)]);
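The reshape calls above rely on MATLAB's column-major ordering: because the four gridded dimensions come first in ndgrid, reshaping to n_gridded by n_scattered puts one gridded substate in each row, with all of that substate's scattered points along the row. A small hypothetical check with two gridded dimensions (a, b) and one scattered dimension (c):

```matlab
% Hypothetical small grid: two gridded dimensions (a, b), one scattered (c)
a = [10 20];
b = [1 2 3];
c = [0.1 0.2 0.3 0.4];
[A, B, C] = ndgrid(a, b, c);
nG = numel(a) * numel(b);   % number of gridded substates (rows)
nS = numel(c);              % number of scattered points per substate (columns)
A2 = reshape(A, nG, nS);
B2 = reshape(B, nG, nS);
C2 = reshape(C, nG, nS);
% Row r holds a single (a, b) pair, with a varying fastest down the rows;
% the columns of that row run over all values of c
```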

Related

Why do these linear inequality constraints work in Matlab but not Octave?

I have the following script performing a nonlinear optimization (NLP), which works in Matlab and hits MaxFunctionEvaluations after about 5 minutes on my machine:
% Generate sample consumption data (4 weeks)
x = 0:pi/8:21*pi-1e-1; %figure; plot(x, 120+5*sin(0.2*x).*exp(-2e-2*x) + 10*exp(-x))
y = 120 + 5*sin(0.2*x).*exp(-2e-2*x) + 10*exp(-x);
consumptionPerWeek = (y + [0; 11; -30; 4.5]).'; % in 168x4 format
consumptionPerHour = reshape(consumptionPerWeek, [], 1);
hoursPerWeek = 168;
hoursTotal = numel(consumptionPerHour);
daysTotal = hoursTotal/24;
weeksTotal = ceil(daysTotal/7);
%% Perform some simple calculations
q_M_mean = mean(consumptionPerHour);
dvsScalingPerWeek = mean(consumptionPerWeek)/q_M_mean;
%% Assumptions about reactor, hard-coded
V_liq = 5701.0; % m^3, main reactor; from other script
initialValue = 4.9298; % kg/m^3; from other script
substrates_FM_year = [676.5362; 451.0241];
total_DVS_year = [179.9586; 20.8867];
mean_DVS_conc = 178.1238; %kg/m^3
% Product yields (m^3 per ton DVS)
Y_M = 420;
Y_N = 389;
%% Test DVS model
DVS_hour = sum(total_DVS_year)/hoursTotal; % t/h
k_1 = 0.25; % 1/d
parameters = [k_1; Y_M; Y_N; V_liq];
%% Build reference and initial values for optimization
% Distribute feed according to demand (-24%/+26% around mean)
feedInitialMatrix = DVS_hour*ones(hoursPerWeek, 1)*dvsScalingPerWeek;
% Calculate states with reference feed (improved initials)
feedInitialVector = reshape(feedInitialMatrix, [], 1);
feedInitialVector = feedInitialVector(1:hoursTotal);
resultsRef = reactorModel1(feedInitialVector, initialValue, parameters, ...
mean_DVS_conc);
V_M_PS = 0 + cumsum(resultsRef(:,2)/24 - consumptionPerHour);
neededMStorage0 = max(V_M_PS) - min(V_M_PS);
%% Setup optimization problem (NLP): feed optimization with virtual product storage
% Objective function 1: Standard deviation of theoretical product storage volume
objFun1 = @(feedVector) objFunScalar(feedVector, initialValue, parameters, ...
mean_DVS_conc, consumptionPerHour);
% Bounds (lb <= x <= ub), i.e., decision variables can only range between 0 and 0.9*dailyDvsAmount
upperfeedLimitSlot = 0.90; % Limit DVS feed amount per *slot*
upperfeedLimitDay = 1.80; % Limit DVS feed amount per *day*
upperfeedLimitWeek = 1.37; % Limit DVS feed amount per *week*
lowerBound_nlp = zeros(1, hoursTotal);
upperBound_nlp = upperfeedLimitSlot*24*DVS_hour.*ones(1, hoursTotal);
% Equality Constraint 1: feed amount mean = constant
A_eq1_nlp = ones(1, hoursTotal);
b_eq1_nlp = DVS_hour*hoursTotal;
% Inequality Constraint 1: Limit max. daily amount
A_nlp1 = zeros(daysTotal, hoursTotal);
for dI = 1:daysTotal
A_nlp1(dI, (24*dI)-(24-1):(24*dI)) = 1;
end
b_nlp1 = upperfeedLimitDay*24*DVS_hour*ones(daysTotal, 1);
% Inequality Constraint 2: Limit max. weekly amount
A_nlp2 = zeros(weeksTotal, hoursTotal);
for wIi = 1:weeksTotal
A_nlp2(wIi, (168*wIi)-(168-1):(168*wIi)) = 1;
end
b_nlp2 = upperfeedLimitWeek*168*DVS_hour*ones(weeksTotal, 1);
% Summarize all inequality constraints
A_nlp = [A_nlp1; A_nlp2]; %sparse([A_nlp1; A_nlp2]);
b_nlp = [b_nlp1; b_nlp2]; %sparse([b_nlp1; b_nlp2]);
try
% Solver: fmincon (Matlab Optimization Toolbox) --> SQP-algorithm = best
optionen_GB = optimoptions('fmincon', 'Display', 'iter', 'FunctionTolerance', 1e-5, ...
'StepTolerance', 1e-4, 'MaxIterations', 2*hoursTotal, ...
'MaxFunctionEvaluations', 100*hoursTotal, 'HonorBounds', true, 'Algorithm', 'sqp');
catch
optionen_GB = optimset('Display', 'iter', 'TolFun', 1e-5, 'TolX', 1e-4, ...
'MaxIter', 2*hoursTotal, 'MaxFunEvals', 100*hoursTotal, 'Algorithm', 'sqp');
end
%% Solve gradient-based NLP
tic; [feedOpt, fval] = fmincon(@(feedVector) objFun1(feedVector), ...
feedInitialVector, A_nlp, b_nlp, A_eq1_nlp, b_eq1_nlp, lowerBound_nlp, upperBound_nlp, ...
[], optionen_GB); toc
%% Rerun model and calculate virtual storage volume with optimized input
resultsOpt = reactorModel1(feedOpt, initialValue, parameters, mean_DVS_conc);
q_M_Opt = resultsOpt(:,2)/24;
V_M_PS_opt = 0 + cumsum(q_M_Opt - consumptionPerHour);
neededMStorageOpt = max(V_M_PS_opt) - min(V_M_PS_opt);
sprintf('Needed product storage before optimization: %.2f m^3, \nafterwards: %.2f m^3. Reduction = %.1f %%', ...
neededMStorage0, neededMStorageOpt, (1 - neededMStorageOpt/neededMStorage0)*100)
%% Objective as separate function
function prodStorageStd = objFunScalar(dvs_feed, initialValues, parameters, mean_DVS_conc, ...
MConsumptionPerHour)
resultsAlgb = reactorModel1(dvs_feed(:, 1), initialValues, parameters, mean_DVS_conc);
q_M_prod = resultsAlgb(:,2)/24;
V_M_PS1 = 0 + cumsum(q_M_prod - MConsumptionPerHour);
prodStorageStd = std(V_M_PS1);
end
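The two inequality-constraint matrices built with loops above can also be assembled with kron, where each row of kron(eye(n), ones(1, L)) sums one consecutive block of L hours. A sketch assuming, as in this script, that hoursTotal is a whole number of weeks; the final assert checks the shape fmincon expects for A (one row per constraint, one column per decision variable, i.e. per element of feedInitialVector).

```matlab
hoursTotal = 672;                 % 4 weeks of hourly slots, as in the script
daysTotal = hoursTotal/24;
weeksTotal = ceil(daysTotal/7);
% Row d of A_nlp1 sums the 24 hours of day d; row w of A_nlp2 sums week w
A_nlp1 = kron(eye(daysTotal), ones(1, 24));    % daysTotal-by-hoursTotal
A_nlp2 = kron(eye(weeksTotal), ones(1, 168));  % weeksTotal-by-hoursTotal
A_nlp = [A_nlp1; A_nlp2];
% One row per constraint, one column per hourly decision variable
assert(isequal(size(A_nlp), [daysTotal + weeksTotal, hoursTotal]));
```

Wrapping the kron calls in sparse (or using speye instead of eye) keeps the matrix sparse, which matters more as the horizon grows.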
The external function reads like this:
function resultsArray = reactorModel1(D_feed, initialValue, parameters, D_in)
% Simulate production per hour with algebraic reactor model
% Feed is solved via a for-loop
hoursTotal = length(D_feed);
k_1 = parameters(1);
Y_M = parameters(2);
Y_N = parameters(3);
V_liq = parameters(4);
resultsArray = zeros(hoursTotal, 3);
t = 1/24;
liquid_feed = D_feed/(D_in*1e-3); % m^3/h
initialValue4Model0 = (initialValue*(V_liq - liquid_feed(1))*1e-3 ...
+ D_feed(1))*1e3/V_liq; % kg/m^3
resultsArray(1, 1) = initialValue4Model0*exp(-k_1*t);
% Simple for-loop with feed as vector per hour
for pHour = 2:hoursTotal
initialValue4Model = (resultsArray(pHour-1, 1)*(V_liq - liquid_feed(pHour))*1e-3 ...
+ D_feed(pHour))*1e3/V_liq; % kg/m^3
resultsArray(pHour, 1) = initialValue4Model*exp(-k_1*t);
end
resultsArray(:, 2) = V_liq*Y_M*k_1*resultsArray(:, 1)*1e-3; % m^3/d
resultsArray(:, 3) = V_liq*Y_N*k_1*resultsArray(:, 1)*1e-3; % m^3/d
end
When I execute the very same script in Octave (ver 5.1.0 with optim 1.6.0), I get:
error: linear inequality constraints: wrong dimensions
However, the following line (executed from the command prompt)
sum(A_nlp*feedInitialVector <= b_nlp)
returns 32 in both Octave and MATLAB, which shows that A_nlp*feedInitialVector and b_nlp have compatible dimensions.
Is this a bug? Or does Octave treat linear (in)equality constraints differently from MATLAB?
(Also, if you have tips on how to speed up this script, they would come in handy.)

I've debugged this a bit for you to get you started.
First enable debugging on error:
debug_on_error(1)
Then find the installation folder of optim and look at the file /private/__linear_constraint_dimensions__.m inside it.
(I found this by grepping for the exact error message you were getting. There is another one outside the private folder; you may want to look at that too.)
If you look at the lines triggering the errors, you will notice, for example, that an error is triggered if rm != o.np, where [rm, cm] = size(f.imc).
Now run your script and let it enter debug mode on error. You will see that:
debug> [rm, cm] = size(f.imc)
rm = 32
cm = 672
debug> o.np
ans = 672
debug> rm != o.np
ans = 1 % I.e. boolean test succeeds and triggers error
I have no idea exactly what these are; presumably r and c stand for rows and columns. In any case, you will see that rows are apparently being matched against columns and vice versa.
In other words, it looks like you may have passed your inputs in a transposed fashion at some point.
Even if this isn't exactly what's happening, it should be a decent starting point for tracking down the exact bug.
I don't know why MATLAB "works". Maybe there's a bug in your code and MATLAB works despite it (for better or worse).
Or there might be a bug in optim that transposes inputs by accident (or at least in a way that is incompatible with MATLAB).
If you feel after your debugging adventures that it's a bug in the optim package, feel free to file a bug report :)

How to improve curve fitting in MATLAB?

I am using MATLAB's lsqcurvefit function to fit the values calculated by a function to observed data, optimizing two parameters of that function. After running the code I get optimized parameter values, but the fit between the calculated (simulated) curve and the observed curve is quite bad, as can be seen here. I have tried both the Levenberg-Marquardt and the trust-region-reflective algorithms, and I have tried reducing the function tolerance, to no avail. What can I do to make the simulated curve match the observed curve more closely? Alternatively, is there curve-fitting software with a GUI that would let me adjust the simulated curve manually until it resembles the observed one?
The code I am using is
function wtfinal = fst(para,tes)
x = 45; k = para(1); b = 2; S = para(2); D = k*2/S;
tes = 1:998;   % note: overrides the tes argument
% The spreadsheet is re-read on every call, i.e. on every model evaluation
g_vecrow = (xlsread('signaal 1.xlsx','signal','D2:D999'))';
g_vec = g_vecrow-g_vecrow(1);
t_vec = tes.*5;
wt = zeros(1,998);   % preallocate; wt(1) stays 0
for i = 2:998
    t = t_vec(i);
    g = g_vec(i);
    tow = 0:5:t-1;
    f = g.*(t - tow).^(-3/2).*exp(-x^2./(4*D*(t - tow)));
    wt(i) = ((1/D)^(1/2)* x)/(2 * sqrt(pi))* trapz(tow,f);
end
wtfinal = wt + 147.902;
end
and using this function as
clear all; close all; clc;
ydata = (xlsread('signaal 1.xlsx','signal','C2:C999'))';
tes = 1:998;
x0 = [0.0327 0.00172];
lb = [];
ub = [];
opts = optimset('Algorithm', 'levenberg-marquardt');
[newpara,resnorm,~,exitflag,output]=lsqcurvefit(@fst,x0,tes,ydata,lb,ub,opts)
figure
plot(tes,ydata)
hold on
simulated=fst(newpara,tes);
plot(tes,simulated,'r')
The data file 'signaal 1' can be obtained from here
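One change that at least makes each iteration cheaper (it does not by itself change the fitted result) is to read the spreadsheet once and pass the fixed data into the model through an anonymous function, instead of calling xlsread inside fst on every evaluation. A minimal sketch with hypothetical synthetic data in place of the Excel columns; it minimizes the sum of squared residuals with base-MATLAB fminsearch so it runs without the Optimization Toolbox, but model(p, xdata) has the calling form lsqcurvefit expects.

```matlab
% Hypothetical stand-ins for the spreadsheet columns, created (read) once
t = (0:0.1:5)';
g = 1 + 0.1 * t;                               % plays the role of g_vec
p_true = [2; 0.7];
% The anonymous function captures g, so nothing is re-read per evaluation
model = @(p, tt) p(1) * exp(-p(2) * tt) .* g;
ydata = model(p_true, t);                      % synthetic "observed" data
% Sum of squared residuals, minimized with fminsearch (base MATLAB)
sse = @(p) sum((model(p, t) - ydata).^2);
opts = optimset('TolX', 1e-10, 'TolFun', 1e-14, 'MaxIter', 5000, 'MaxFunEvals', 5000);
p_fit = fminsearch(sse, [1; 1], opts);
% With the Optimization Toolbox the same handle could be passed as
% lsqcurvefit(model, [1; 1], t, ydata)
```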

separate 'entangled' vectors in Matlab

I have a set of three vectors (stored in a 3xN matrix) that are 'entangled' (e.g. some values in the second row should be in the third row and vice versa). I can see this entanglement by looking at the figure in which alpha2 is plotted. To separate the vectors I use a difference-based approach: I calculate the difference between one value and the three values in the next column (e.g. comparing (1,i) with (:,i+1)), then take the value with the smallest difference and store it. The method works to separate two of the three vectors, but not the last one.
I was wondering if you could share your ideas on how to solve this problem (if possible). I have added my code below.
Thanks in advance!
Problem in figures:
clear all; close all; clc;
%%
alpha2 = [-23.32 -23.05 -22.24 -20.91 -19.06 -16.70 -13.83 -10.49 -6.70;
-0.46 -0.33 0.19 2.38 5.44 9.36 14.15 19.80 26.32;
-1.58 -1.13 0.06 0.70 1.61 2.78 4.23 5.99 8.09];
%%% Original
figure()
hold on
plot(alpha2(1,:))
plot(alpha2(2,:))
plot(alpha2(3,:))
%%% Store start values
store1(1,1) = alpha2(1,1);
store2(1,1) = alpha2(2,1);
store3(1,1) = alpha2(3,1);
for i=1:size(alpha2,2)-1
for j=1:size(alpha2,1)
Alpha1(j,i) = abs(store1(1,i)-alpha2(j,i+1));
Alpha2(j,i) = abs(store2(1,i)-alpha2(j,i+1));
Alpha3(j,i) = abs(store3(1,i)-alpha2(j,i+1));
[~, I] = min(Alpha1(:,i));
store1(1,i+1) = alpha2(I,i+1);
[~, I] = min(Alpha2(:,i));
store2(1,i+1) = alpha2(I,i+1);
[~, I] = min(Alpha3(:,i));
store3(1,i+1) = alpha2(I,i+1);
end
end
%%% Plot to see if separation worked
figure()
hold on
plot(store1)
plot(store2)
plot(store3)

Solution using extrapolation via polyfit:
The idea is pretty simple: iterate over all positions i and, for each function, use polyfit to fit a polynomial through the most recent values F(:,i-(d+1)) up to F(:,i) (fewer near the start), where d is maxExtrapolationDegree and the polynomial degree is one less than the number of points, so it passes through them exactly. Use those polynomials to extrapolate the function values F(:,i+1). Then compute the permutation of the real values F(:,i+1) that best fits those extrapolations. This should work quite well if only a few functions are involved, since every permutation is checked. There is certainly some room for improvement, but for your simple setting it should suffice.
function F = untangle(F, maxExtrapolationDegree)
%// UNTANGLE(F) untangles the functions F(i,:) via extrapolation.
if nargin<2
maxExtrapolationDegree = 4;
end
extrapolate = @(f) polyval(polyfit(1:length(f),f,length(f)-1),length(f)+1);
extrapolateAll = @(F) cellfun(extrapolate, num2cell(F,2));
fitCriterion = @(X,Y) norm(X(:)-Y(:),1);
nFuncs = size(F,1);
nPoints = size(F,2);
swaps = perms(1:nFuncs);
errorOfFit = zeros(1,size(swaps,1));
for i = 1:nPoints-1
nextValues = extrapolateAll(F(:,max(1,i-(maxExtrapolationDegree+1)):i));
for j = 1:size(swaps,1)
errorOfFit(j) = fitCriterion(nextValues, F(swaps(j,:),i+1));
end
[~,j_bestSwap] = min(errorOfFit);
F(:,i+1) = F(swaps(j_bestSwap,:),i+1);
end
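The extrapolate-and-permute step can be checked on a small case where two curves cross. The script below repeats the loop of untangle inline (so it runs as a standalone script) on hypothetical data: three straight lines, two of which cross at t = 2, with every column after the first shuffled by a fixed permutation. As in untangle, column 1 serves as the reference ordering.

```matlab
% Hypothetical data: lines t and 4-t cross at t = 2; the third line stays apart
t = 0:5;
Ftrue = [t; 4 - t; 10 + 0.5 * t];
% Shuffle every column after the first with a fixed permutation
P = [1 2 3; 2 3 1; 3 1 2; 1 3 2; 2 1 3; 3 2 1];
F = Ftrue;
for i = 2:numel(t)
    F(:, i) = Ftrue(P(i, :), i);
end

% Same extrapolate-and-permute loop as untangle, written as a script
maxExtrapolationDegree = 2;
extrapolate = @(f) polyval(polyfit(1:length(f), f, length(f) - 1), length(f) + 1);
extrapolateAll = @(F) cellfun(extrapolate, num2cell(F, 2));
swaps = perms(1:size(F, 1));
errorOfFit = zeros(1, size(swaps, 1));
for i = 1:size(F, 2) - 1
    window = F(:, max(1, i - (maxExtrapolationDegree + 1)):i);
    nextValues = extrapolateAll(window);        % one prediction per row
    for j = 1:size(swaps, 1)
        errorOfFit(j) = norm(nextValues - F(swaps(j, :), i + 1), 1);
    end
    [~, jBest] = min(errorOfFit);               % permutation closest to predictions
    F(:, i + 1) = F(swaps(jBest, :), i + 1);
end
```

At the crossing column both candidate values are equal, so either assignment gives the same numbers; from the next column on, the linear extrapolation carries each line past the crossing.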
Initial solution: (not that pretty - Skip this part)
This is a similar solution that tries to minimize the sum of the absolute discrete derivatives (of orders 1 to 4) of the vector-valued function F = @(j) alpha2(:,j). It does so by stepping through the positions i and checking all possible permutations of the coordinates at i to get a minimal seminorm of the function F(1:i).
(I'm actually wondering right now if there is any canonical mathematical way to define the seminorm so we get our expected results... I initially was going for the H^1 and H^2 seminorms, but they didn't quite work...)
function F = untangle(F)
nFuncs = size(F,1);
nPoints = size(F,2);
seminorm = @(x,i) sum(sum(abs(diff(x(:,1:i),1,2)))) + ...
sum(sum(abs(diff(x(:,1:i),2,2)))) + ...
sum(sum(abs(diff(x(:,1:i),3,2)))) + ...
sum(sum(abs(diff(x(:,1:i),4,2))));
doSwap = @(x,swap,i) [x(:,1:i-1), x(swap,i:end)];
swaps = perms(1:nFuncs);
normOfSwap = zeros(1,size(swaps,1));
for i = 2:nPoints
for j = 1:size(swaps,1)
normOfSwap(j) = seminorm(doSwap(F,swaps(j,:),i),i);
end
[~,j_bestSwap] = min(normOfSwap);
F = doSwap(F,swaps(j_bestSwap,:),i);
end
Usage:
The command alpha2 = untangle(alpha2); will untangle your functions:
It should even work for more complicated data, like these shuffled sine-waves:
nPoints = 100;
nFuncs = 5;
t = linspace(0, 2*pi, nPoints);
F = bsxfun(@(a,b) sin(a*b), (1:nFuncs).', t);
for i = 1:nPoints
F(:,i) = F(randperm(nFuncs),i);
end
Remark: if you already know that your functions will be quadratic or have some other special form, RANSAC would probably be a better idea for a larger number of functions. It could also be useful if the functions are not sampled at equally spaced x-values.

Vectorize double for loops in Matlab

Here is my simple, working MATLAB code, followed by my questions:
tic
nrand1 = 10000;
nrand2 = 20000;
% Location matrix 1: [longitude, latitude, w1]
lmat1=[rand(nrand1,1)-75 rand(nrand1,1)+39 round(rand(nrand1,1)*1000)+1];
% Location matrix 2: [longitude, latitude, w2]
lmat2=[rand(nrand2,1)-75 rand(nrand2,1)+39 round(rand(nrand2,1)*100)+1];
% The number of rows of each matrix (nrand1 and nrand2, respectively)
nobs1 = size(lmat1,1);
nobs2 = size(lmat2,1);
% The number of pair-wise distances
% between L1 locations X L2 locations
ndist = nobs1*nobs2;
% Initialization: Distance vector and weight vector
hdist = zeros(ndist,1);
weight = zeros(ndist,1);
% Double for loop -- for calculating the pair-wise distances and weights
k=1;
for i=1:nobs1
for j=1:nobs2
% distances in kilometers.
lonH = sin(0.5*(lmat1(i,1)-lmat2(j,1))*pi/180.0)^2;
latH = sin(0.5*(lmat1(i,2)-lmat2(j,2))*pi/180.0)^2;
hdist(k) = 0.001*6372797.560856*2 ...
*asin(sqrt(latH+(cos(lmat1(i,2)*pi/180.0) ...
*cos(lmat2(j,2)*pi/180.0))*lonH));
weight(k) = lmat1(i,3)*lmat2(j,3);
k=k+1;
end
end
toc
The code calculates 10000 X 20000 distances and weights.
Elapsed time is 67.124844 seconds.
Is there a way to vectorize the double loop, or to use parallel computing? If there is no room for performance improvement in MATLAB, I may have to write the double loop in C and call it from MATLAB. I don't know how to call C from MATLAB, so I will ask about that in a separate question. Thanks!

Using bsxfun, you can eliminate the for loops and the need for calculating matrices for each combination (this should reduce memory usage). The following is about six times faster than your original code on my computer using R2014b:
nrand1 = 10000;
nrand2 = 20000;
% Location matrix 1: [longitude, latitude, w1]
lmat1=[rand(nrand1,1)-75 rand(nrand1,1)+39 round(rand(nrand1,1)*1000)+1];
% Location matrix 2: [longitude, latitude, w2]
lmat2=[rand(nrand2,1)-75 rand(nrand2,1)+39 round(rand(nrand2,1)*100)+1];
p180 = pi/180;
lonH = sin(0.5*bsxfun(@minus,lmat1(:,1).',lmat2(:,1))*p180).^2;
latH = sin(0.5*bsxfun(@minus,lmat1(:,2).',lmat2(:,2))*p180).^2;
hdist = 0.001*6372797.560856*2*asin(sqrt(latH+bsxfun(@times,cos(lmat1(:,2).'*p180),cos(lmat2(:,2)*p180)).*lonH));
hdist1 = hdist(:);
weight1 = bsxfun(@times,lmat1(:,3).',lmat2(:,3));
weight1 = weight1(:);
Note that by using the variable p180, the math is changed slightly so you won't get precisely the same values, but they will be very close.
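Since R2016b, MATLAB applies implicit expansion to element-wise operators, so the bsxfun calls can be written directly as a row vector against a column vector. A sketch on small hypothetical sizes, so that it can be compared against the original double loop. The result is nrand2-by-nrand1, and (:) then lists the lmat2 index fastest, which is the same order as k in the loop. At the full 10000 by 20000 size each such array holds 2·10^8 doubles (about 1.6 GB), so several of them at once may not fit in memory.

```matlab
rng(0);
nrand1 = 40; nrand2 = 60;      % small hypothetical sizes so a loop check is quick
lmat1 = [rand(nrand1,1)-75 rand(nrand1,1)+39 round(rand(nrand1,1)*1000)+1];
lmat2 = [rand(nrand2,1)-75 rand(nrand2,1)+39 round(rand(nrand2,1)*100)+1];
p180 = pi/180;
% A 1-by-nrand1 row combined with an nrand2-by-1 column expands to nrand2-by-nrand1
lonH = sin(0.5*(lmat1(:,1).' - lmat2(:,1))*p180).^2;
latH = sin(0.5*(lmat1(:,2).' - lmat2(:,2))*p180).^2;
hdist = 0.001*6372797.560856*2*asin(sqrt(latH + cos(lmat1(:,2).'*p180).*cos(lmat2(:,2)*p180).*lonH));
weight = lmat1(:,3).' .* lmat2(:,3);
% Column-major (:) runs over lmat2 first, matching k in the original loop
hdist = hdist(:);
weight = weight(:);
```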

The solution is that your inputs (lmat1 and lmat2) do not need to be matrices like you have them. Each one is really three vectors. Once you've broken out the vectors, you can create arrays that contain every combination of lmat1 and lmat2 rows (which is what your double loop is doing). At that point, you can do your math as single, fully vectorized operations...
%make your vectors
lmat1A = rand(nrand1,1)-75;
lmat1B = rand(nrand1,1)+39;
lmat1C = round(rand(nrand1,1)*1000)+1;
lmat2A = rand(nrand2,1)-75;
lmat2B = rand(nrand2,1)+39;
lmat2C = round(rand(nrand2,1)*100)+1;
%make every combination
lmat1A = lmat1A(:)*ones(1,nrand2);
lmat1B = lmat1B(:)*ones(1,nrand2);
lmat1C = lmat1C(:)*ones(1,nrand2);
lmat2A = ones(nrand1,1)*(lmat2A(:)');
lmat2B = ones(nrand1,1)*(lmat2B(:)');
lmat2C = ones(nrand1,1)*(lmat2C(:)');
%do your math
lonH = sin(0.5*(lmat1A-lmat2A)*pi/180.0).^2;
latH = sin(0.5*(lmat1B-lmat2B)*pi/180.0).^2;
hdist = 0.001*6372797.560856*2 ...
.*asin(sqrt(latH+(cos(lmat1B*pi/180.0) ...
.*cos(lmat2B*pi/180.0)).*lonH)); %use element-wise multiplication
weight = lmat1C.*lmat2C;
%reshape your outputs into vectors, as your original code does; transposing first
%makes the lmat2 index vary fastest, which matches the order of the original loop
hdist = hdist.';
hdist = hdist(:);
weight = weight.';
weight = weight(:);

Matlab - How to improve efficiency of two port matrix calculations?

I'm looking for a way to speed up some simple two port matrix calculations. See the below code example for what I'm doing currently. In essence, I create a [Nx1] frequency vector first. I then loop through the frequency vector and create the [2x2] matrices H1 and H2 (all functions of f). A bit of simple matrix math including a matrix left division '\' later, and I got my result pb as a [Nx1] vector. The problem is the loop - it takes a long time to calculate and I'm looking for way to improve efficiency of the calculations. I tried assembling the problem using [2x2xN] transfer matrices, but the mtimes operation cannot handle 3-D multiplications.
Can anybody please give me an idea how I can approach such a calculation without the need for looping through f?
Many thanks: svenr
% calculate frequency and wave number vector
f = linspace(20,200,400);
w = 2.*pi.*f;
% calculation for each frequency w
for i=1:length(w)
H1(i,1) = {[1, rho*c*k(i)^2 / (crad*pi); 0,1]};
H2(i,1) = {[1, 1i.*w(i).*mp; 0, 1]};
HZin(i,1) = {H1{i,1}*H2{i,1}};
temp_mat = HZin{i,1}*[1; 0];
Zin(i,1) = temp_mat(1,1)/temp_mat(2,1);
temp_mat= H1{i,1}\[1; 1/Zin(i,1)];
pb(i,1) = temp_mat(1,1); Ub(i,:) = temp_mat(2,1);
end

Assuming that length(w) == length(k), that rho, c, crad and mp are all scalars, and that the last line is meant to be Ub(i,1) = temp_mat(2,1) instead of Ub(i,:) = temp_mat(2,1), the loop can be replaced with cellfun calls:
temp = repmat(eye(2),[1 1 length(w)]);
% H1 pages: [1, rho*c*k^2/(crad*pi); 0, 1]
temp1 = temp;
temp1(1,2,:) = reshape(rho*c*(k.^2)/(crad*pi),1,1,[]);
% H2 pages: [1, 1i*w*mp; 0, 1]
temp2 = temp;
temp2(1,2,:) = reshape((1i.*w)*mp,1,1,[]);
H1 = permute(num2cell(temp1,[1 2]),[3 2 1]);
H2 = permute(num2cell(temp2,[1 2]),[3 2 1]);
HZin = cellfun(@(a,b)(a*b),H1,H2,'UniformOutput',0);
temp_cell = cellfun(@(a)(a*[1;0]),HZin,'UniformOutput',0);
Zin = cellfun(@(a)(a(1,1)/a(2,1)),temp_cell);
temp2_cell = arrayfun(@(z)[1;1/z],Zin,'UniformOutput',0);
temp3_cell = cellfun(@(a,b)(a\b),H1,temp2_cell,'UniformOutput',0);
temp4 = cell2mat(temp3_cell);
pb(:,1) = temp4(1:2:end-1);
Ub(:,1) = temp4(2:2:end);
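Because every matrix here is 2-by-2, the product and the left division can also be written entry by entry on N-element vectors, with no loop and no cell arrays; for a general 2-by-2 solve this is Cramer's rule. A sketch under the assumption that each matrix entry is available as an N-by-1 vector; the random entries below are hypothetical (the question's H1 and H2 are the special case with unit diagonal and a zero lower-left entry). Newer releases also provide pagemtimes for multiplying [2x2xN] arrays page by page.

```matlab
rng(0);
N = 400;
% Hypothetical entries: H1 = [a11 a12; a21 a22], H2 = [b11 b12; b21 b22] per frequency
a11 = 2 + rand(N,1); a12 = 1i*rand(N,1); a21 = rand(N,1); a22 = 2 + rand(N,1);
b11 = 2 + rand(N,1); b12 = 1i*rand(N,1); b21 = rand(N,1); b22 = 2 + rand(N,1);

% First column of HZin = H1*H2, i.e. temp_mat = HZin*[1; 0] = [c11; c21]
c11 = a11.*b11 + a12.*b21;
c21 = a21.*b11 + a22.*b21;
Zin = c11 ./ c21;

% Solve H1 \ [1; 1/Zin] for all frequencies at once with Cramer's rule
r1 = ones(N,1);
r2 = 1 ./ Zin;
detH1 = a11.*a22 - a12.*a21;
pb = (a22.*r1 - a12.*r2) ./ detH1;
Ub = (a11.*r2 - a21.*r1) ./ detH1;
```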