scipy - how to make a matrix with specific rows and columns - scipy

I have the following code:
import scipy as sc
import matplotlib.pyplot as plt
....
MeanSquareDistance1D=lambda n,m: ((m*Lastpoint1d(n)**2).sum())/m
......
data=[]
for i in range(10,110,20):
    #mydata=list(sc.mat([[i],[MeanSquareDistance1D(i,2000)]]))
    #data.append(mydata)
    mydata=(sc.array([i,MeanSquareDistance1D(i,2000)])).tolist()  # I did it like this
    data.append(mydata)
plt.plot(data)
plt.show()
I want 'mydata' to be a matrix or, preferably, an array (I am converting it to a list in order to do the plot) with one row per value of i (5 rows) and 2 columns.
The first column should be 'i' and the second the value of MeanSquareDistance1D(i,2000).
I am receiving the error 'ValueError: x and y can be no greater than 2-D'.
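One way to get the 5 x 2 array directly is to compute the second column first and then stack the two columns with NumPy. This is only a sketch: Lastpoint1d is not shown in the question, so last_point_1d below is a hypothetical stand-in (end points of simple random walks), and the names step_counts, msd and rng are mine.

```python
import numpy as np


def last_point_1d(n_steps, n_walks, rng):
    """Hypothetical stand-in for Lastpoint1d: end points of n_walks
    one-dimensional random walks of n_steps unit steps each."""
    steps = rng.choice([-1, 1], size=(n_walks, n_steps))
    return steps.sum(axis=1)


def mean_square_distance_1d(n_steps, n_walks, rng):
    """Mean of the squared end points over all walks."""
    return np.mean(last_point_1d(n_steps, n_walks, rng) ** 2)


rng = np.random.default_rng(0)
step_counts = np.arange(10, 110, 20)  # 10, 30, 50, 70, 90

# one value per step count
msd = np.array([mean_square_distance_1d(n, 2000, rng) for n in step_counts])

# first column: i, second column: mean square distance -> shape (5, 2)
data = np.column_stack((step_counts, msd))
```

The array can then be plotted column against column, for example plt.plot(data[:, 0], data[:, 1]), without converting it to a list.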


Related

How do you predict an outcome using a single value in a multiple logistic regression using statsmodels?

#import the needed pandas module
import pandas as pd
import statsmodels.formula.api as smf
#Upload the contents of an excel file to a DataFrame
df= pd.read_excel("C:/Users/ME/OneDrive/Desktop/weather.xlsx")
#Create a multiple logistic regression model
logRegModel = smf.logit('sunny ~ temp + barom', data = df)
#Fit the data in df into the model
results = logRegModel.fit()
#Print the results summary
print(results.summary())
#plot the scatterplot with the actual data
z = df.sunny
x = df.temp
y = df.barom
#make a prediction for a given temp x and barometer y reading
prediction = results.predict(pd.DataFrame({'temp': [21],'barom':[12]})
prediction.summary_frame(alpha=0.05)
# Creating figure
from mpl_toolkits import mplot3d
import numpy as np
import matplotlib.pyplot as plt
fig = plt.figure(figsize = (10, 7))
ax = plt.axes(projection ="3d")
# Creating plot
ax.scatter3D(x, y, z, color = "blue")
plt.title("3D scatter plot")
# show plot
plt.show()
I ran the code above. Everything works fine until it hits the code for making a prediction using a single x and a single y value. When I run the code to include:
#make a prediction for a given temp x and barometer y reading
prediction = results.predict(pd.DataFrame({'temp': [21],'barom':[12]})
prediction.summary_frame(alpha=0.05)
I receive the following error:
File "<ipython-input-78-b26a4bf65d01>", line 36
from mpl_toolkits import mplot3d
^
SyntaxError: invalid syntax
This is incredibly odd. Why does it run perfectly without the two prediction lines above, yet when I include them it tells me that a simple import statement is a syntax error? From my reading of the statsmodels docs, to make a prediction with a multiple logistic regression model I need to pass a DataFrame to the predict function. Wasn't this done correctly above? My logistic regression is trying to predict whether a day is sunny from the temperature and barometer readings. When I comment out the import statement above and run the code, I receive another error on another import statement. This is so strange. Why does it not accept my import statements? I ran the code in multiple IDEs and got the same results. Thank you everyone in advance.
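A likely cause is that the results.predict(...) line opens two parentheses and closes only one. Python then keeps reading the following lines as part of the same expression, and older interpreters report the error at the next statement they cannot fit into it, here the import. The sketch below reproduces this with compile(), which only parses the text and never runs it, so neither statsmodels nor pandas is needed. The helper name syntax_error_line is mine.

```python
# The prediction line as posted: one ")" is missing at the end.
broken = (
    "prediction = results.predict(pd.DataFrame({'temp': [21], 'barom': [12]})\n"
    "from mpl_toolkits import mplot3d\n"
)

# The same two lines with the parentheses balanced.
fixed = (
    "prediction = results.predict(pd.DataFrame({'temp': [21], 'barom': [12]}))\n"
    "from mpl_toolkits import mplot3d\n"
)


def syntax_error_line(source):
    """Return the line number of the SyntaxError, or None if the source parses."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError as err:
        return err.lineno
    return None


print("broken:", syntax_error_line(broken))
print("fixed:", syntax_error_line(fixed))
```

Which line is blamed depends on the Python version, which is why commenting out one import only moves the error to the next statement. Separately, as far as I know predict returns the predicted probabilities directly, so calling summary_frame on its result may still fail once the parenthesis is fixed.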

PyTorch Geometric directed graph shows wrong number of edges when converted to NetworkX undirected format

I loaded the PyTorch Geometric dataset OGB_MAG, converted it into a homogeneous dataset, and then checked its number of nodes as follows:
import torch
import numpy as np
from torch_geometric.datasets import OGB_MAG
import os.path as osp
import networkx as nx
from torch_geometric.utils import to_networkx, to_undirected
path = osp.join('..', 'data', 'OGB_MAG')
dataset = OGB_MAG(path)
data = dataset[0]
homogeneous_data = data.to_homogeneous()
print(homogeneous_data)
--> Data(node_type=[1939743], edge_index=[2, 21111007], edge_type=[21111007])
print(homogeneous_data.has_isolated_nodes())
--> True
Then I converted it to NetworkX format and checked its number of edges:
nx_data = to_networkx(homogeneous_data)
print(nx_data.number_of_edges())
--> 21111007
nx_data = nx_data.to_undirected(reciprocal=False)
print(nx_data.number_of_nodes(), nx_data.number_of_edges())
--> (1939743, 21091072)
print(len(list(nx.isolates(nx_data))))
--> 0
The number of edges is different (21111007 with PyG vs 21091072 with NetworkX after converting to undirected). The number of nodes is the same though. We also see that PyG says there are isolated nodes, but NetworkX says there are none.
Any insight as to why I'm seeing this discrepancy?
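Two NetworkX behaviours may account for at least part of this, though I have not verified that they explain the exact numbers for OGB_MAG. First, converting a DiGraph to an undirected Graph collapses each reciprocal pair (u, v) and (v, u) into a single edge, so the edge count can only stay the same or drop. Second, nx.isolates reports nodes of degree zero, and a node whose only edge is a self-loop has degree 2, whereas PyG's check, as I understand it, ignores self-loops. The toy graph below is made up for illustration and is not taken from the dataset.

```python
import networkx as nx

# Toy directed graph: a reciprocal pair (0, 1)/(1, 0), a one-way edge (1, 2),
# and node 3 whose only edge is a self-loop.
directed = nx.DiGraph()
directed.add_edges_from([(0, 1), (1, 0), (1, 2), (3, 3)])

undirected = directed.to_undirected(reciprocal=False)

n_directed = directed.number_of_edges()      # counts (0, 1) and (1, 0) separately
n_undirected = undirected.number_of_edges()  # the reciprocal pair is now one edge

# Nodes of degree zero; a self-loop contributes to the degree.
isolates = list(nx.isolates(undirected))

# Nodes that have edges, but only self-loops.
self_loop_only = [
    n for n in undirected.nodes
    if undirected.degree(n) > 0 and all(nbr == n for nbr in undirected[n])
]

print(n_directed, n_undirected, isolates, self_loop_only)
```

Counting reciprocal pairs and self-loop-only nodes in the real graph in the same way would show whether they match the gap of 19935 edges and the isolated-node difference.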

Issue with drawmapscale from basemap

When I tried this drawmapscale example, I got exactly the result shown there:
map scale with correct size
But when I run the following:
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
m = Basemap(projection='spstere',boundinglat=-50,lon_0=180,lat_0=-90,
            resolution='l')
m.drawcoastlines()
m.fillcontinents(color='white',lake_color='aqua')
m.drawmapboundary(fill_color='lightblue')
m.drawmapscale(lon=0,lat=-60,lon0=180,lat0=-90,length=1000)
plt.show()
I get a map scale with the wrong format, like this one
Could anyone help me figure out what I'm doing wrong here?
Thanks,
I made the figure larger before plotting and got a map with a better scale bar.
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
fig = plt.figure()
fig.set_size_inches(10, 10)
m = Basemap(projection='spstere', boundinglat=-50,
            lon_0=180, lat_0=-90, resolution='l')
m.drawcoastlines()
m.fillcontinents(color='white', lake_color='aqua')
m.drawmapboundary(fill_color='lightblue')
m.drawmapscale(lon=0, lat=-60, lon0=180, lat0=-90, length=1000)
plt.show()
The resulting map:
Is this what you want?

how to create a sequence prediction in keras

I want to predict an output sequence with a Keras LSTM. I have 6 features and 6 output values. However, my code throws an error about my label values:
Error when checking model target: expected dense_1 to have shape (None, 1) but got array with shape (4000, 6)
import numpy as np
np.random.seed(seed=7)
import pandas as pd
numbers = pd.read_csv(r'C:\...\Desktop\LSTM.csv', sep=';')
from keras.models import Sequential
from keras.layers import LSTM
from keras.layers import Dense
from keras.layers import Dropout
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
numval = numbers.values.astype('float32')
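scaler = MinMaxScaler()
# keep the scaled values; fit_transform returns a new array
numval = scaler.fit_transform(numval)
X = numval[:4000,0:6]
y = numval[:4000,6:]
# test labels: remaining rows, label columns (mirrors X_test below)
y_test = numval[4000:,6:]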
X = np.reshape(X,(X.shape[0],1,X.shape[1]))
X_test = numval[4000:,0:6]
X_test = np.reshape(X_test,(X_test.shape[0],1,X_test.shape[1]))
print(X.shape)
model = Sequential()
model.add(LSTM(6,input_dim=6,stateful=True))
model.add(Dense(6))
model.compile(loss='sparse_categorical_crossentropy',optimizer='adam')
model.fit(X,y,batch_size=200,nb_epoch=100,verbose=2)
scores = model.evaluate(X_test,y_test,batch_size=32,verbose=1)
print(scores[1])
How can I get multiple label outputs? Thanks.
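The shape in the error message may come from the loss rather than from the data: sparse_categorical_crossentropy expects one integer class label per sample, which is what a target of shape (None, 1) describes. If the six outputs are continuous values, a regression loss such as 'mse' accepts a target of shape (samples, 6). The sketch below uses only NumPy and a random stand-in for LSTM.csv (the 12 columns and 4100 rows are assumptions, as are the names n_train and baseline) to show the shapes involved.

```python
import numpy as np

rng = np.random.default_rng(7)
# Stand-in for the CSV: 6 feature columns followed by 6 target columns.
numval = rng.random((4100, 12)).astype("float32")

n_train = 4000
X = numval[:n_train, 0:6]
y = numval[:n_train, 6:]        # shape (4000, 6): six targets per sample
X_test = numval[n_train:, 0:6]
y_test = numval[n_train:, 6:]   # shape (100, 6)

# LSTM layers expect (samples, timesteps, features); here one timestep.
X = X.reshape(X.shape[0], 1, X.shape[1])
X_test = X_test.reshape(X_test.shape[0], 1, X_test.shape[1])


def mse(y_true, y_pred):
    """Mean squared error over all samples and all six outputs."""
    return float(np.mean((y_true - y_pred) ** 2))


# Trivial baseline: always predict the training mean of each target column.
baseline = np.tile(y.mean(axis=0), (y_test.shape[0], 1))
baseline_mse = mse(y_test, baseline)

print(X.shape, y.shape, X_test.shape, y_test.shape, baseline_mse)
```

A Dense(6) output layer produces predictions with the same (samples, 6) shape as y here, so a loss that compares them element by element fits this layout.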

keep the scaling while drawing a weighted networkx graph

When I draw a weighted networkx graph, the drawing does not represent the real weights in terms of distance. I was curious whether there is a parameter I am missing or some other problem.
I started by making a simulated dataset as follows:
from pylab import plot,show
from numpy import vstack,array
from numpy.random import rand
from scipy.cluster.vq import kmeans,vq
from scipy.spatial.distance import euclidean
import networkx as nx
from scipy.spatial.distance import pdist, squareform, cdist
# data generation
data = vstack((rand(5,2) + array([12,12]),rand(5,2)))
a = pdist(data, 'euclidean')
def givexy(index1D, VectorLength):
    # integer division, as "/" behaved in the original Python 2 code
    return [index1D%VectorLength, index1D//VectorLength]
import matplotlib.pyplot as plt
plt.plot(data[:,0], data[:,1], 'o')
plt.show()
Then I calculate the Euclidean distance between all pairs and use the distance as the weight:
G = nx.empty_graph(1)
for cnt, item in enumerate(a):
    print(cnt)
    G.add_edge(givexy(cnt, 10)[0], givexy(cnt, 10)[1], weight=item, length=0)
pos = nx.spring_layout(G)
nx.draw_networkx(G, pos)
edge_labels=dict([((u,v,),"%.2f" % d['weight'])
                  for u,v,d in G.edges(data=True)])
nx.draw_networkx_edge_labels(G,pos,edge_labels=edge_labels)
#~ nx.draw(G,pos,edge_labels=edge_labels)
plt.show()
exit()
You might get a different plot; for some reason I don't understand, the layout is random. My main problem is the distance between nodes. For example, the distance between nodes 4 and 8 is 0.82, but it looks longer than the distance between nodes 7 and 0.
Any hints?
Thank you,
The spring layout doesn't explicitly use the weights as distances. Higher weight edges produce shorter edges in general.
Though if you want to specify the positions explicitly you can do that:
from itertools import combinations
from numpy import vstack, array
from numpy.random import rand
from scipy.spatial.distance import pdist
import networkx as nx
import matplotlib.pyplot as plt
# data generation
data = vstack((rand(5,2) + array([12,12]), rand(5,2)))
a = pdist(data, 'euclidean')
plt.plot(data[:,0], data[:,1], 'o')
G = nx.Graph()
# pdist returns the distances in the order (0,1), (0,2), ..., (8,9),
# which is the order produced by combinations(range(n), 2)
for (u, v), item in zip(combinations(range(len(data)), 2), a):
    G.add_edge(u, v, weight=item, length=0)
# use the data coordinates themselves as node positions
pos = {}
for node, row in enumerate(data):
    pos[node] = row
nx.draw_networkx(G, pos)
plt.savefig('drawing.png')
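As a sanity check on which pair each pdist entry belongs to: pdist returns a condensed vector ordered as (0,1), (0,2), ..., (0,n-1), (1,2), ..., which is the order itertools.combinations produces, so index % n and index / n do not recover the pair. The sketch below, with a fixed seed and names of my own choosing (pairs, edges, drawn_length), builds the weighted edge list that way and confirms that, with the coordinates used as positions, the drawn length of every edge equals its weight.

```python
from itertools import combinations

import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
# Two clusters of five points each, as in the question.
data = np.vstack((rng.random((5, 2)) + np.array([12, 12]), rng.random((5, 2))))
a = pdist(data, "euclidean")

# Entry k of the condensed vector belongs to the k-th pair (i, j) with i < j.
pairs = list(combinations(range(len(data)), 2))
edges = [(i, j, float(d)) for (i, j), d in zip(pairs, a)]

# Positions taken straight from the coordinates.
pos = {node: row for node, row in enumerate(data)}


def drawn_length(i, j):
    """Distance between two nodes as they would appear in the drawing."""
    return float(np.linalg.norm(pos[i] - pos[j]))


# Nodes 4 and 8 sit in different clusters, so their true distance is large.
weight_4_8 = next(w for i, j, w in edges if (i, j) == (4, 8))
print(len(edges), weight_4_8)
```

The edge list can be handed to G.add_weighted_edges_from(edges) and drawn with nx.draw_networkx(G, pos). With this pairing the weight between nodes 4 and 8 is far larger than 0.82, which suggests the label in the question belonged to a different pair.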