networkx graphing betweenness and eigenvector centrality - networkx

I am having trouble with the values coming out of the eigenvector and betweenness centralities when trying to plot their distributions. Currently the plot's x-axis shows negative values from around -0.4 to 0.0, even though there clearly aren't any negative values in the centralities. However, I am able to plot the degree centrality correctly. Below are images of what comes out currently.
The code looks like the following at the moment and I am unable to figure out why this problem persists:
degr_cent = nx.degree_centrality(G)
eigvec_cent = nx.eigenvector_centrality(G)
betw_cent = nx.betweenness_centrality(G)
degree_sequence = sorted((d for n,d in G.degree()), reverse=True)
fig = plt.figure("Degree of a random graph", figsize=(8, 8))
axgrid = fig.add_gridspec(5, 4)
ax = fig.add_subplot()
ax.bar(*np.unique(degree_sequence, return_counts=True))
ax.set_title("Degree histogram")
ax.set_xlabel("Degree")
ax.set_ylabel("# of Nodes")
plt.savefig('degree_distribution.png')
plt.show()
#degree centrality histogram
dc_degr_histogram = nx.degree_histogram(G)
dc_degrees = range(len(dc_degr_histogram))
plt.figure(figsize=(12,8))
plt.loglog(dc_degrees, dc_degr_histogram,'go-')
plt.xlabel('Degree')
plt.ylabel('Frequency')
plt.savefig('degree_distribution.png')
plt.show()
eigvec_values = eigvec_cent.values()
eigvec_sequence = sorted(eigvec_values, reverse=True)
fig = plt.figure("Degree of a random graph", figsize=(8, 8))
axgrid = fig.add_gridspec(5, 4)
ax = fig.add_subplot()
ax.bar(*np.unique(eigvec_sequence, return_counts=True))
ax.set_title("Eigenvec histogram")
ax.set_xlabel("Eigenvec")
ax.set_ylabel("# of Nodes")
plt.savefig('eigvec_distribution.png')
plt.show()
betw_cent_values = betw_cent.values()
betw_cent_values = sorted(betw_cent_values)
fig = plt.figure("Degree of a random graph", figsize=(8, 8))
axgrid = fig.add_gridspec(5, 4)
ax = fig.add_subplot()
ax.bar(*np.unique(betw_cent_values, return_counts=True))
ax.set_title("betweenness")
ax.set_xlabel("betw")
ax.set_ylabel("# of Nodes")
plt.savefig('betweenness_distribution.png')
plt.show()
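A possible explanation, offered as a sketch rather than a confirmed diagnosis: by default ax.bar draws each bar 0.8 units wide, centered on its x value. Degrees are integers, so that looks fine, but eigenvector and betweenness centralities are floats between 0 and 1, so a bar at x close to 0 spans roughly -0.4 to 0.4 and the axis appears to contain negative values. Binning the values first avoids this. The helper name bin_counts and the sample values below are hypothetical.

```python
def bin_counts(values, n_bins=10):
    """Count values in n_bins equal-width bins; return (left_edges, counts, width)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 if all values are equal
    counts = [0] * n_bins
    for v in values:
        # Clamp the index so the maximum value falls into the last bin
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    left_edges = [lo + i * width for i in range(n_bins)]
    return left_edges, counts, width


# Hypothetical centrality values standing in for eigvec_cent.values()
sample = [0.0, 0.01, 0.02, 0.05, 0.11, 0.12, 0.30, 0.38]
edges, counts, width = bin_counts(sample, n_bins=4)
print(edges, counts, width)
```

The result could then be drawn with ax.bar(edges, counts, width=width, align='edge'), or the whole step could be replaced by ax.hist(list(eigvec_cent.values()), bins=20).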

Related

How to use interpn?

I am trying to use interpn (in python using Scipy) to replicate results from Matlab using interp3. However, I am struggling to structure my arguments. I tried the following line:
f = interpn(blur_maps, fx, fy, pyr_level)
where blur_maps is a 600 x 800 x 7 array representing a grayscale image at seven levels of blur,
fx and fy are the pixel indices into the maps (both are 2D arrays), and pyr_level is a 2D array containing values from 1 to 7 that select the blur level to interpolate at.
Since I arranged the arguments incorrectly, how can I arrange them in a way that works? I tried to look up examples but didn't see anything similar. Here is an example of the data I am trying to interpolate:
import numpy as np
import cv2, math
from scipy.interpolate import interpn
levels = 7
img_path = '/Users/alimahdi/Desktop/i4.jpg'
img = cv2.cvtColor(cv2.imread(img_path), cv2.COLOR_BGR2GRAY)
row, col = img.shape
x_range = np.arange(0, col)
y_range = np.arange(0, row)
fx, fy = np.meshgrid(x_range, y_range)
e = np.exp(np.sqrt(fx ** 2 + fy ** 2))
pyr_level = 7 * (e - np.min(e)) / (np.max(e) - np.min(e))
blur_maps = np.zeros((row, col, levels))
blur_maps[:, :, 0] = img
for i in range(levels - 1):
    img = cv2.pyrDown(img)
    r, c = img.shape
    tmp = img
    for j in range(int(math.log(row / r, 2))):
        tmp = cv2.pyrUp(tmp)
    blur_maps[:, :, i + 1] = tmp
pixelGrid = [np.arange(x) for x in blur_maps.shape]
interpPoints = np.array([fx.flatten(), fy.flatten(), pyr_level.flatten()])
interpValues = interpn(pixelGrid, blur_maps, interpPoints.T)
finalValues = np.reshape(interpValues, fx.shape)
I am now getting the following error: ValueError: One of the requested xi is out of bounds in dimension 0. I know that the problem is in interpPoints, but I am not sure how to fix it. Any suggestions?
The documentation for scipy.interpolate.interpn states that the first argument is the grid of the data you are interpolating over (here just the integer pixel indices), the second argument is the data (blur_maps), and the third argument is the interpolation points with shape (npoints, ndims). So you would have to do something like:
import numpy as np
import scipy.interpolate
pixelGrid = [np.arange(x) for x in blur_maps.shape] # create grid of pixel numbers as per the docs
interpPoints = np.array([fx.flatten(), fy.flatten(), pyr_level.flatten()])
# interpolate
interpValues = scipy.interpolate.interpn(pixelGrid, blur_maps, interpPoints.T)
# now reshape the output array to get in the original format you wanted
finalValues = np.reshape(interpValues, fx.shape)
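As for the out-of-bounds error itself, a likely cause (an assumption based on the shapes stated in the question, not something confirmed in this thread) is the axis order: pixelGrid follows blur_maps.shape, i.e. (row, col, level), while interpPoints is ordered (x, y, level), so the column index fx lands in the row dimension. pyr_level also runs up to 7, whereas the last grid axis only spans 0 to 6. The helper axes_out_of_bounds below is hypothetical and only checks bounds; it does not call SciPy.

```python
def axes_out_of_bounds(grid_shape, points):
    """Return the set of axes on which any point leaves [0, size - 1]."""
    bad = set()
    for point in points:
        for axis, (coord, size) in enumerate(zip(point, grid_shape)):
            if not 0 <= coord <= size - 1:
                bad.add(axis)
    return bad


rows, cols, levels = 600, 800, 7   # shape of blur_maps as stated in the question
shape = (rows, cols, levels)

# Bottom-right pixel in (x, y, level) order, with the level going up to 7
xy_order = [(cols - 1, rows - 1, 7.0)]
# Same pixel in (row, col, level) order, with the level limited to levels - 1
rc_order = [(rows - 1, cols - 1, levels - 1.0)]

print(axes_out_of_bounds(shape, xy_order))
print(axes_out_of_bounds(shape, rc_order))
```

If this diagnosis applies, building the points as np.array([fy.flatten(), fx.flatten(), level.flatten()]), with level rescaled into the range 0 to levels - 1, would keep every point inside the grid.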

Kalman Filter (pykalman): Value for obs_covariance and model without intercept

I am looking at the KalmanFilter from pykalman shown in examples:
pykalman documentation
Example 1
Example 2
and I am wondering
observation_covariance=100,
vs
observation_covariance=1,
the documentation states
observation_covariance R: e(t)^2 ~ Gaussian (0, R)
How should the value be set here correctly?
Additionally, is it possible to apply the Kalman filter without intercept in the above module?
The observation covariance expresses how much error you assume to be in your input data. The Kalman filter works well on normally distributed data. Under this assumption you can use the 3-sigma rule to calculate the covariance (in this case the variance) of your observation based on the maximum error in the observation.
The values in your question can be interpreted as follows:
Example 1
observation_covariance = 100
sigma = sqrt(observation_covariance) = 10
max_error = 3*sigma = 30
Example 2
observation_covariance = 1
sigma = sqrt(observation_covariance) = 1
max_error = 3*sigma = 3
So you need to choose the value based on your observation data. The more accurate the observation, the smaller the observation covariance.
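The same arithmetic can be run in the other direction, from an assumed maximum error to the covariance value. This is only a sketch of the 3-sigma rule described above; the function name is hypothetical and it is not part of pykalman.

```python
def observation_covariance_from_max_error(max_error):
    """Apply the 3-sigma rule: treat max_error as three standard deviations."""
    sigma = max_error / 3.0
    return sigma ** 2


r_example_1 = observation_covariance_from_max_error(30)  # sigma = 10
r_example_2 = observation_covariance_from_max_error(3)   # sigma = 1
print(r_example_1, r_example_2)
```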
Another point: you can tune your filter by manipulating the covariance, but I think it's not a good idea. The higher the observation covariance value the weaker impact a new observation has on the filter state.
Sorry, I did not understand the second part of your question (about the Kalman Filter without intercept). Could you please explain what you mean?
You are trying to use a regression model and both intercept and slope belong to it.
---------------------------
UPDATE
I prepared some code and plots to answer your questions in detail. I used EWC and EWA historical data to stay close to the original article.
First of all, here is the code (pretty much the same as in the examples above, but with a different notation):
from pykalman import KalmanFilter
import numpy as np
import matplotlib.pyplot as plt
# reading data (quick and dirty)
Datum=[]
EWA=[]
EWC=[]
for line in open('data/dataset.csv'):
    f1, f2, f3 = line.split(';')
    Datum.append(f1)
    EWA.append(float(f2))
    EWC.append(float(f3))
n = len(Datum)
# Filter Configuration
# both slope and intercept have to be estimated
# transition_matrix
F = np.eye(2) # identity matrix because x_(k+1) = x_(k) + noise
# observation_matrix
# H_k = [EWA_k 1]
H = np.vstack([np.matrix(EWA), np.ones((1, n))]).T[:, np.newaxis]
# transition_covariance
Q = [[1e-4, 0],
[ 0, 1e-4]]
# observation_covariance
R = 1 # max error = 3
# initial_state_mean
X0 = [0,
0]
# initial_state_covariance
P0 = [[ 1, 0],
[ 0, 1]]
# Kalman-Filter initialization
kf = KalmanFilter(n_dim_obs=1, n_dim_state=2,
transition_matrices = F,
observation_matrices = H,
transition_covariance = Q,
observation_covariance = R,
initial_state_mean = X0,
initial_state_covariance = P0)
# Filtering
state_means, state_covs = kf.filter(EWC)
# Restore EWC based on EWA and estimated parameters
EWC_restored = np.multiply(EWA, state_means[:, 0]) + state_means[:, 1]
# Plots
plt.figure(1)
ax1 = plt.subplot(211)
plt.plot(state_means[:, 0], label="Slope")
plt.grid()
plt.legend(loc="upper left")
ax2 = plt.subplot(212)
plt.plot(state_means[:, 1], label="Intercept")
plt.grid()
plt.legend(loc="upper left")
# check the result
plt.figure(2)
plt.plot(EWC, label="EWC original")
plt.plot(EWC_restored, label="EWC restored")
plt.grid()
plt.legend(loc="upper left")
plt.show()
I could not retrieve data using pandas, so I downloaded them and read from the file.
Here you can see the estimated slope and intercept:
To test the estimated data I restored the EWC value from the EWA using the estimated parameters:
About the observation covariance value
By varying the observation covariance value you tell the Filter how accurate the input data is (normally you just describe your confidence in the observation using some datasheets or your knowledge about the system).
Here are estimated parameters and the restored EWC values using different observation covariance values:
You can see the filter follows the original function better with a bigger confidence in observation (smaller R). If the confidence is low (bigger R) the filter leaves the initial estimate (slope = 0, intercept = 0) very slowly and the restored function is far away from the original one.
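To see why a larger R slows the filter down, it can help to look at a single measurement update of a one-dimensional Kalman filter. The sketch below is a simplification (scalar state, observation factor h assumed to be 1) and is not how pykalman is implemented internally; the function name is hypothetical.

```python
def scalar_kalman_update(x_prior, p_prior, z, r, h=1.0):
    """One measurement update of a one-dimensional Kalman filter."""
    gain = p_prior * h / (h * p_prior * h + r)   # Kalman gain
    x_post = x_prior + gain * (z - h * x_prior)  # corrected estimate
    p_post = (1.0 - gain * h) * p_prior          # corrected variance
    return x_post, p_post, gain


# Same prior (estimate 0, variance 1) and same measurement z = 10
x_small_r, _, k_small_r = scalar_kalman_update(0.0, 1.0, 10.0, r=1.0)
x_large_r, _, k_large_r = scalar_kalman_update(0.0, 1.0, 10.0, r=100.0)
print(k_small_r, x_small_r)
print(k_large_r, x_large_r)
```

With R = 1 the gain is 0.5 and the estimate moves halfway to the measurement; with R = 100 the gain is 1/101, so the estimate barely leaves its initial value.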
About the frozen intercept
If you want to freeze the intercept for some reason, you need to change the whole model and all filter parameters.
In the normal case we had:
x = [slope; intercept] #estimation state
H = [EWA 1] #observation matrix
z = [EWC] #observation
Now we have:
x = [slope] #estimation state
H = [EWA] #observation matrix
z = [EWC-const_intercept] #observation
Results:
Here is the code:
from pykalman import KalmanFilter
import numpy as np
import matplotlib.pyplot as plt
# only slope has to be estimated (it will be manipulated by the constant intercept) - mathematically incorrect!
const_intercept = 10
# reading data (quick and dirty)
Datum=[]
EWA=[]
EWC=[]
for line in open('data/dataset.csv'):
    f1, f2, f3 = line.split(';')
    Datum.append(f1)
    EWA.append(float(f2))
    EWC.append(float(f3))
n = len(Datum)
# Filter Configuration
# transition_matrix
F = 1 # identity matrix because x_(k+1) = x_(k) + noise
# observation_matrix
# H_k = [EWA_k]
H = np.matrix(EWA).T[:, np.newaxis]
# transition_covariance
Q = 1e-4
# observation_covariance
R = 1 # max error = 3
# initial_state_mean
X0 = 0
# initial_state_covariance
P0 = 1
# Kalman-Filter initialization
kf = KalmanFilter(n_dim_obs=1, n_dim_state=1,
transition_matrices = F,
observation_matrices = H,
transition_covariance = Q,
observation_covariance = R,
initial_state_mean = X0,
initial_state_covariance = P0)
# Creating the observation based on EWC and the constant intercept
z = EWC[:] # copy the list (not just assign the reference!)
z[:] = [x - const_intercept for x in z]
# Filtering
state_means, state_covs = kf.filter(z) # the estimation for the EWC data minus constant intercept
# Restore EWC based on EWA and estimated parameters
EWC_restored = np.multiply(EWA, state_means[:, 0]) + const_intercept
# Plots
plt.figure(1)
ax1 = plt.subplot(211)
plt.plot(state_means[:, 0], label="Slope")
plt.grid()
plt.legend(loc="upper left")
ax2 = plt.subplot(212)
plt.plot(const_intercept*np.ones((n, 1)), label="Intercept")
plt.grid()
plt.legend(loc="upper left")
# check the result
plt.figure(2)
plt.plot(EWC, label="EWC original")
plt.plot(EWC_restored, label="EWC restored")
plt.grid()
plt.legend(loc="upper left")
plt.show()

Split dataset to test and train MATLAB [duplicate]

This question already has an answer here:
Matlab: How can I split my data matrix into two random subsets of column vectors while keeping the label information?
(1 answer)
Closed 5 years ago.
I want to split a very large dataset that I have (over one million observations) into a test set and a train set. As you can see, I have already managed to do something similar in the code below using dividerand.
What the code does: we have a very large set X; on every iteration we select N=1700 observations and split them in a 7/3 train/test ratio. What I would like to do instead is use specific counts rather than percentages with dividerand. For instance, split the data into mini-batches of size 2000, and then use 500 for test and 1500 for training. In the next loop we would select the data (2001:4000) and again split them into 500 test and 1500 train, etc.
Again, dividerand allows this with ratios, but I would like to use actual counts.
X = randn(10000,9);
mu_6 = zeros(510,613); % 390/802 - 450/695 - 510/613 - Test/Iterations
s2_6 = zeros(510,613);
nl6 = zeros(613,1);
RSME6 = zeros(613,1);
prev_batch = 0;
inf = @infGaussLik;
meanfunc = []; % empty: don't use a mean function
covfunc = @covSEiso; % Squared Exponential covariance
likfunc = @likGauss; % Gaussian likelihood
for k=1:613
    new_batch = k*1700;
    X_batch = X(1+prev_batch:new_batch,:);
    [train,~,test] = dividerand(transpose(X_batch),0.7,0,0.3);
    train = transpose(train);
    test = transpose(test);
    x_t = train(:,1:8); % Train batch we get 910 values
    y_t = train(:,9);
    x_z = test(:,1:8); % Test batch we get 390 values
    y_z = test(:,9);
    % Calculations for Gaussian process regression
    if k==1
        hyp = struct('mean', [], 'cov', [0 0], 'lik', -1);
    else
        hyp = hyp2;
    end
    hyp2 = minimize(hyp, @gp, -100, inf, meanfunc, covfunc, likfunc, x_t, y_t);
    [m4 s4] = gp(hyp2, inf, meanfunc, covfunc, likfunc, x_t, y_t, x_z);
    [nlZ4,dnlZ4] = gp(hyp2, inf, meanfunc, covfunc, likfunc, x_t, y_t);
    RSME6(k,1) = sqrt(sum(((m4-y_z).^2))/450);
    nl6(k,1) = nlZ4;
    mu_6(:,k) = m4;
    s2_6(:,k) = s4;
    % End of calculations
    prev_batch = new_batch;
    disp(k);
end
How about:
[~, idx] = sort([randn(2000,1)]);
group1_idx = idx(1:1500);
group2_idx = idx(1501:end);
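The idea above (shuffle the indices of one batch, then cut at a fixed position) can be applied batch by batch. Here is a rough Python sketch of that loop, assuming the batch size of 2000 and the 1500/500 split from the question; the function name split_batches and the seed are made up for illustration.

```python
import random


def split_batches(n_rows, batch_size=2000, n_test=500, seed=0):
    """Yield (train_idx, test_idx) index lists for consecutive mini-batches."""
    rng = random.Random(seed)
    # Only full batches are produced; a short final remainder is skipped
    for start in range(0, n_rows - batch_size + 1, batch_size):
        idx = list(range(start, start + batch_size))
        rng.shuffle(idx)  # random assignment within this batch
        yield idx[n_test:], idx[:n_test]


batches = list(split_batches(10000))
train_idx, test_idx = batches[1]  # second batch: rows 2000..3999 (0-based)
print(len(batches), len(train_idx), len(test_idx))
```

Every batch then has exactly 1500 training rows and 500 test rows, regardless of rounding, which is the difference from a ratio-based split.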

Using MATLAB to calculate offset between successive images

I'm taking images using a tunneling microscope. However, the scope is drifting between successive images. I'm trying to use MATLAB to calculate the offset between images. The code below runs in seconds for small images (e.g. 64x64 pixels), but takes >2 hrs to handle the 512x512 pixel images I'm dealing with. Do you have any suggestions for speeding up this code? Or do you know of better ways to track images in MATLAB? Thanks for your help!
%Test templates
template = .5*ones(32);
template(25:32,:) = 0;
template(:,25:64) = 0;
data_A = template;
close all
imshow(data_A);
template(9:32,41:64) = .5;
template(:,1:24) = 0;
data_B = template;
figure, imshow(data_B);
tic
[m n] = size(data_B);
z = [];
% Loop over all possible displacements
for x = -n:n
    for y = -m:m
        paddata_B = data_B;
        ax = abs(x);
        zerocols = zeros(m,ax);
        if x > 0
            paddata_B(:,1:ax) = [];
            paddata_B = [paddata_B zerocols];
        else
            paddata_B(:,(n-ax+1):n) = [];
            paddata_B = [zerocols paddata_B];
        end
        ay = abs(y);
        zerorows = zeros(ay,n);
        if y < 0
            paddata_B(1:ay,:) = [];
            paddata_B = vertcat(paddata_B, zerorows);
        else
            paddata_B((m-ay+1):m,:) = [];
            paddata_B = vertcat(zerorows, paddata_B);
        end
        % Full matrix sum after array multiplication
        C = paddata_B.*data_A;
        matsum = sum(sum(C));
        % Populate array of matrix sums for each displacement
        z(x+n+1, y+m+1) = matsum;
    end
end
toc
% Plot matrix sums
figure, surf(z), shading flat
% Find maximum value of z matrix
[max_z, imax] = max(abs(z(:)));
[xpeak, ypeak] = ind2sub(size(z),imax(1))
% Calculate displacement in pixels
corr_offset = [(xpeak-n-1) (ypeak-m-1)];
xoffset = corr_offset(1)
yoffset = corr_offset(2)
What you're calculating is known as the cross-correlation of the two images. You can calculate the cross-correlation of all offsets at once using Discrete Fourier Transforms (DFT or FFT). So try something like
z = ifft2( fft2(data_A) .* conj(fft2(data_B)) );
If you pad with zeros in the Fourier domain, you can even use this sort of math to get offsets in fractions of a pixel, and apply offsets of fractions of a pixel to an image.
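A NumPy sketch of the FFT approach, assuming both images have the same size and the shift is circular (wrap-around). It multiplies one spectrum by the complex conjugate of the other, which is what the cross-correlation theorem calls for. The function name fft_offset and the random test image are hypothetical.

```python
import numpy as np


def fft_offset(a, b):
    """Estimate the circular (row, col) shift that maps image a onto image b."""
    # Cross-correlation theorem: corr = IFFT( conj(FFT(a)) * FFT(b) )
    corr = np.fft.ifft2(np.conj(np.fft.fft2(a)) * np.fft.fft2(b)).real
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Peak indices above half the image size correspond to negative shifts
    return tuple(int(p) if p <= s // 2 else int(p) - s
                 for p, s in zip(peak, corr.shape))


rng = np.random.default_rng(0)
image = rng.random((64, 64))
shifted = np.roll(image, shift=(5, -3), axis=(0, 1))  # known offset
print(fft_offset(image, shifted))
```

This computes all offsets at once in O(N log N) instead of looping over every displacement. For real microscope images that do not wrap around, zero-padding both images before the transform would be the usual adjustment.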
A typical approach to this kind of problem is to use the fact that it works quickly for small images to your advantage. When you have large images, decimate them to make small images. Register the small images quickly and use the computed offset as your initial value for the next iteration. In the next iteration, you don't decimate the images as much, but you're starting with a good initial estimate of the offset so you can constrain your search for solutions to a small neighborhood near your initial estimate.
Although not written with tunneling microscopes in mind, a review paper that may be of some assistance is: "Mutual Information-Based Registration of Medical Images: A Survey" by Pluim, Maintz, and Viergever published in IEEE Transactions on Medical Imaging, Vol. 22, No. 8, p. 986.
The link below will help you find the transformation between two images and correct/recover the distorted one (in your case, the image with the offset):
http://in.mathworks.com/help/vision/ref/estimategeometrictransform.html
index_pairs = matchFeatures(featuresOriginal,featuresDistorted, 'unique', true);
matchedPtsOriginal = validPtsOriginal(index_pairs(:,1));
matchedPtsDistorted = validPtsDistorted(index_pairs(:,2));
[tform,inlierPtsDistorted,inlierPtsOriginal] = estimateGeometricTransform(matchedPtsDistorted,matchedPtsOriginal,'similarity');
figure; showMatchedFeatures(original,distorted,inlierPtsOriginal,inlierPtsDistorted);
inlierPtsDistorted and inlierPtsOriginal have a property called Location.
These are simply the matching locations of one image on the other. From that point it should be easy to calculate the offset.
The function below was my attempt to compute the cross-correlation of the two images manually. Something's not quite right though. Will look at it again this weekend if I have time. You can call the function with something like:
>> oldImage = rand(64);
>> newImage = circshift(oldImage, floor(64/2)*[1 1]);
>> offset = detectOffset(oldImage, newImage, 10)
offset =
32 -1
function offset = detectOffset(oldImage, newImage, margin)
    if size(oldImage) ~= size(newImage)
        offset = [];
        error('Test images must be the same size.');
    end
    [imageHeight, imageWidth] = size(oldImage);
    corr = zeros(2 * imageHeight - 1, 2 * imageWidth - 1);
    for yIndex = [1:2*imageHeight-1; ...
                  imageHeight:-1:1 ones(1, imageHeight-1); ...
                  imageHeight*ones(1, imageHeight) imageHeight-1:-1:1];
        oldImage = circshift(oldImage, [1 0]);
        for xIndex = [1:2*imageWidth-1; ...
                      imageWidth:-1:1 ones(1, imageWidth-1); ...
                      imageWidth*ones(1, imageWidth) imageWidth-1:-1:1];
            oldImage = circshift(oldImage, [0 1]);
            numPoint = abs(yIndex(3) - yIndex(2) + 1) * abs(xIndex(3) - xIndex(2) + 1);
            corr(yIndex(1),xIndex(1)) = sum(sum(oldImage(yIndex(2):yIndex(3),xIndex(2):xIndex(3)) .* newImage(yIndex(2):yIndex(3),xIndex(2):xIndex(3)))) * imageHeight * imageWidth / numPoint;
        end
    end
    [value, yOffset] = max(corr(margin+1:end-margin,margin+1:end-margin));
    [dummy, xOffset] = max(value);
    offset = [yOffset(xOffset)+margin-imageHeight xOffset+margin-imageWidth];

Is there a way to make a log 2d histogram in plotly?

Is there a way to make a log 2d histogram in plotly? I'm not sure if this is even possible.
Here is the matplotlib code that worked:
%matplotlib inline
import math
import numpy
import matplotlib.pyplot as plt
from matplotlib.colors import LogNorm
print(len(list_length_view))
%time xy = [(math.log10(x[0]), math.log10(x[1])) for x in list_length_view if x[0] > 0 and x[1] > 0]
print(len(xy))
## Why are there negative lengths or view counts!?!?
plt.hist2d([x[0] for x in list_length_view], [x[1] for x in list_length_view],
bins=(40, 60), range=numpy.array([(0, 10000), (0, 1000000)]),
norm=LogNorm(), cmap='Oranges')
cb1 = plt.colorbar()
plt.gca().set_xlabel(r'Clip Duration in Seconds')
plt.gca().set_ylabel(r'Views Per Clip')
cb1.set_label(r'Total Clips')
However, my original goal was to do it in plotly. Here is the code I tried to use so far to get it to work:
data4 = Data([
    Histogram2d(
        x=[x[0] for x in list_length_view],  # Durations of highlights
        y=[y[1] for y in list_length_view],  # Views of highlights
        nbinsx=10,
        nbinsy=10,
    )])
layout4 = dict(
    title='Public Highlight Analysis',
    yaxis=YAxis(
        title = 'Average Number of Views'),
    xaxis1=XAxis(
        title = "Duration of Highlight in Seconds")
)
fig3 = Figure(data=data4, layout=layout4)
py.iplot(fig3)
I also tried converting matplotlib to plotly.
def plot_mpl_fig():
    xy = [(math.log10(x[0]), math.log10(x[1])) for x in list_length_view if x[0] > 0 and x[1] > 0]
    plt.hist2d([x[0] for x in xy], [x[1] for x in xy],
               bins=200, range=numpy.array([(-1, 4), (-1, 6)]),
               norm=LogNorm(), cmap='Oranges')
    cb1 = plt.colorbar()
    plt.gca().set_xlabel(r'Log$_{10}$(Clip Duration in Seconds)')
    plt.gca().set_ylabel(r'Log$_{10}$(Views Per Clip)')
    cb1.set_label(r'Total Clips')
plot_mpl_fig()
mpl_fig1 = plt.gcf()
py_fig1 = tls.mpl_to_plotly(mpl_fig1, verbose=True)
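One library-independent workaround, sketched here as an assumption rather than a plotly feature, is to do the log transform yourself: bin the points on a log10 grid and take log10 of each bin count, then hand the resulting values to whatever heatmap-style plot is available. The function name log_binned_counts, the bins_per_decade parameter, and the sample data are hypothetical.

```python
import math
from collections import Counter


def log_binned_counts(pairs, bins_per_decade=2):
    """Bin positive (x, y) pairs on a log10 grid; return log10 of each bin count."""
    counts = Counter()
    for x, y in pairs:
        if x > 0 and y > 0:  # log10 is undefined for non-positive values
            key = (math.floor(math.log10(x) * bins_per_decade),
                   math.floor(math.log10(y) * bins_per_decade))
            counts[key] += 1
    return {key: math.log10(n) for key, n in counts.items()}


# Hypothetical (duration, views) pairs standing in for list_length_view
sample = [(11, 110), (12, 150), (1500, 5), (0, 7)]
result = log_binned_counts(sample)
print(result)
```

Each key is a pair of bin indices on the log grid, and each value is the log10 of the number of points in that bin, which plays the role LogNorm plays in the matplotlib version.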