Spatial clustering with a maximum weight per cluster

I want to do spatial clustering, grouping data by spatial proximity measured along a network.
I have two shapefile layers as input:
A linear layer constituting the network, the one on which the proximity distance must be based.
A point layer, which is the one to be clustered.
The point layer also has a weight field. The clustering must take this weight into account so that the total sum of weights in a cluster does not exceed a given value, e.g. 200.
I wrote working code using DBSCAN. However, that algorithm does not take the weight into account, only the distance separating two entities for them to be in the same cluster.
In my case, the distance matters little, as long as each entity is grouped with its closest neighbors and the cap on the total weight per cluster is respected.
I have put the DBSCAN code below in case it helps; any pointers that could help me move this work forward are welcome.
Thanks in advance.
from sklearn.cluster import DBSCAN
import networkx as nx
import pandas as pd
import numpy as np
from qgis.core import QgsProject, QgsField, QgsFeatureIterator
from qgis.PyQt.QtCore import QVariant


class Clustering:
    """
    Logical grouping of points (populations) based on linear networks (road networks).
    """

    def __init__(self, copy_network_layer, node_nearest_point_layer) -> None:
        """
        :param copy_network_layer: name of the new network layer
        :param node_nearest_point_layer: name of the new node table
        """
        self.copy_network_layer = copy_network_layer
        self.node_nearest_point_layer = node_nearest_point_layer

    @staticmethod
    def formation_graph_in_network(feat_network_layer: QgsFeatureIterator) -> nx.Graph:
        """
        :param feat_network_layer: features of the network layer
        :return: graph whose edges connect the start and end points of each network line
        """
        # Build the network graph using networkx
        network_graph = nx.Graph()
        for feat_line in feat_network_layer:
            start_point = feat_line.geometry().asPolyline()[0]
            end_point = feat_line.geometry().asPolyline()[-1]
            network_graph.add_edge(start_point, end_point)
        return network_graph

    @staticmethod
    def association_node_closest_network(feat_points_layer: QgsFeatureIterator,
                                         network_graph: nx.Graph) -> tuple[np.ndarray, pd.DataFrame]:
        """
        Associate each point feature with the network node closest to it.
        :param feat_points_layer: features of the point layer
        :param network_graph: graph of the network (start and end nodes)
        :return: array of nearest nodes and a DataFrame of feature ids
        """
        # Retrieve the points using qgis.core
        points = []
        gid = []
        for feat_pt in feat_points_layer:
            point = feat_pt.geometry().asPoint()
            nearest_node = min(network_graph.nodes(),
                               key=lambda x: np.linalg.norm(np.array(x) -
                                                            np.array([point.x(),
                                                                      point.y()])))
            gid.append(feat_pt.id())
            points.append(nearest_node)
        data = pd.DataFrame({"gid": gid})
        return np.array(points), data

    def cluster_dbscan(self, field_weight: str) -> pd.DataFrame:
        """
        :param field_weight: name of the column created to hold the cluster id;
            features in the same cluster get the same number
        :return: each feature associated with its cluster number
        """
        # Load the shapefile layers using qgis.core
        network_layer = QgsProject.instance().mapLayersByName(self.copy_network_layer)[0]
        points_layer = QgsProject.instance().mapLayersByName(self.node_nearest_point_layer)[0]
        network_graph = self.formation_graph_in_network(network_layer.getFeatures())
        points, clustering = self.association_node_closest_network(points_layer.getFeatures(),
                                                                   network_graph)
        # Clustering with DBSCAN
        # eps: maximum distance between two samples for them to be considered part
        # of the same neighborhood; it is not the distance between all members of a cluster
        # min_samples: minimum number of samples required to form a dense region
        dbscan = DBSCAN(eps=500, min_samples=3, algorithm="auto")
        clustering[field_weight] = dbscan.fit_predict(points)
        # Add the clustering results to the point layer using qgis.core
        points_layer.dataProvider().addAttributes([QgsField("label",
                                                            QVariant.Int,
                                                            comment="Cluster value")])
        points_layer.updateExtents()
        points_layer.updateFields()
        # The field was created above if it did not already exist
        idx_label = points_layer.fields().indexFromName("label")
        # Write each feature's cluster id back to the layer
        for [gid, label] in clustering.values:
            attrs = {idx_label: int(label)}
            points_layer.dataProvider().changeAttributeValues({gid: attrs})
        return clustering


if __name__ == "__main__":
    cluster_instance = Clustering("network_layer", "point_layer")
    cluster_instance.cluster_dbscan("poids")
    print("FIN")

Related

Spark vs scikit-learn

I use PySpark for traffic classification with a decision tree model, and I measure the time required to train the model: it took 2 min 17 s. Then I perform the same task using scikit-learn; in that case the training takes 1 min 19 s. Why is Spark slower, given that it is supposed to perform the task in a distributed way?
This is the PySpark code:
df = (spark.read.format("csv")
      .option('header', 'true')
      .option("inferSchema", "true")
      .load("D:/PHD Project/Paper_3/Datasets_Download/IP Network Traffic Flows Labeled with 75 Apps/Dataset-Unicauca-Version2-87Atts.csv"))

from pyspark.ml.classification import DecisionTreeClassifier

dt = DecisionTreeClassifier(featuresCol='features', labelCol='label', maxDepth=10)
pModel = dt.fit(trainDF)
and in scikit-learn:
import warnings
warnings.filterwarnings('ignore')
import pandas as pd

path = 'D:/PHD Project/Paper_3/Datasets_Download/IP Network Traffic Flows Labeled with 75 Apps/Dataset-Unicauca-Version2-87Atts.csv'
df = pd.read_csv(path)
# df.info()

%%time
from sklearn.tree import DecisionTreeClassifier
model = DecisionTreeClassifier()
model.fit(X_train, y_train)
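One thing worth checking before blaming either engine (an observation, not part of the original post): the two runs do not train the same model. The PySpark tree is capped at maxDepth=10, while the scikit-learn call uses the default unlimited depth; scikit-learn also runs in-process with no job scheduling or JVM overhead, which tends to dominate on a single machine at this data size. A minimal sketch that matches the depth and times only the fit call, using synthetic stand-in data shaped like the Unicauca dataset (87 attributes, 75 labels):

import time
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the (elided) train split in the question.
rng = np.random.default_rng(0)
X_train = rng.random((100_000, 87))
y_train = rng.integers(0, 75, size=100_000)

model = DecisionTreeClassifier(max_depth=10)  # match PySpark's maxDepth=10
start = time.perf_counter()
model.fit(X_train, y_train)
print(f"fit took {time.perf_counter() - start:.1f} s")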

How can I overcome the _geoslib problem in this code?

This code visualizes CALIPSO satellite atmospheric profiles.
The input files are .HDF.
The code is copyrighted by The HDF Group.
At the beginning I struggled to install basemap; I finally installed it from a .whl file on my Windows 10 machine.
Now this error is raised when I run the script:
SystemError:
execution of module _geoslib raised unreported exception.
I have searched Google a lot, but found nothing that works.
Can you please help me?
Cheers
"Copyright (C) 2014-2019 The HDF Group
Copyright (C) 2014 John Evans
This example code illustrates how to access and visualize a LaRC CALIPSO file
in file in Python.
If you have any questions, suggestions, or comments on this example, please use
the HDF-EOS Forum (http://hdfeos.org/forums). If you would like to see an
example of any other NASA HDF/HDF-EOS data product that is not listed in the
HDF-EOS Comprehensive Examples page (http://hdfeos.org/zoo), feel free to
contact us at eoshelp#hdfgroup.org or post it at the HDF-EOS Forum
(http://hdfeos.org/forums).
Usage: save this script and run
$python CAL_LID_L2_VFM-ValStage1-V3-02.2011-12-31T23-18-11ZD.hdf.py
The HDF file must either be in your current working directory
or in a directory specified by the environment variable HDFEOS_ZOO_DIR.
Tested under: Python 2.7.15::Anaconda custom (64-bit)
Last updated: 2019-01-25
"""
import os
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits.basemap import Basemap
from matplotlib import colors

USE_NETCDF4 = False

def run(FILE_NAME):
    # Identify the data field.
    DATAFIELD_NAME = 'Feature_Classification_Flags'
    if USE_NETCDF4:
        from netCDF4 import Dataset
        nc = Dataset(FILE_NAME)
        # Subset the data to match the size of the swath geolocation fields.
        # Turn off autoscaling, we'll handle that ourselves due to presence of
        # a valid range.
        var = nc.variables[DATAFIELD_NAME]
        data = var[:, 1256]
        # Read geolocation datasets.
        lat = nc.variables['Latitude'][:]
        lon = nc.variables['Longitude'][:]
    else:
        from pyhdf.SD import SD, SDC
        hdf = SD(FILE_NAME, SDC.READ)
        # Read dataset.
        data2D = hdf.select(DATAFIELD_NAME)
        data = data2D[:, 1256]
        # Read geolocation datasets.
        latitude = hdf.select('Latitude')
        lat = latitude[:]
        longitude = hdf.select('Longitude')
        lon = longitude[:]

    # Subset data. Otherwise, all points look black.
    lat = lat[::10]
    lon = lon[::10]
    data = data[::10]

    # Extract Feature Type only through bitmask.
    data = data & 7

    # Make a color map of fixed colors.
    cmap = colors.ListedColormap(['black', 'blue', 'yellow', 'green', 'red',
                                  'purple', 'gray', 'white'])

    # The data is global, so render in a global projection.
    m = Basemap(projection='cyl', resolution='l',
                llcrnrlat=-90, urcrnrlat=90,
                llcrnrlon=-180, urcrnrlon=180)
    m.drawcoastlines(linewidth=0.5)
    m.drawparallels(np.arange(-90., 90, 45))
    m.drawmeridians(np.arange(-180., 180, 45), labels=[True, False, False, True])
    x, y = m(lon, lat)
    i = 0
    for feature in data:
        m.plot(x[i], y[i], 'o', color=cmap(feature), markersize=3)
        i = i + 1

    long_name = 'Feature Type at Altitude = 2500m'
    basename = os.path.basename(FILE_NAME)
    plt.title('{0}\n{1}'.format(basename, long_name))
    fig = plt.gcf()

    # Define the bins and normalize.
    bounds = np.linspace(0, 8, 9)
    norm = mpl.colors.BoundaryNorm(bounds, cmap.N)

    # Create a second axes for the colorbar.
    ax2 = fig.add_axes([0.93, 0.2, 0.01, 0.6])
    cb = mpl.colorbar.ColorbarBase(ax2, cmap=cmap, norm=norm,
                                   spacing='proportional', ticks=bounds,
                                   boundaries=bounds, format='%1i')
    cb.ax.set_yticklabels(['invalid', 'clear', 'cloud', 'aerosol', 'strato',
                           'surface', 'subsurf', 'no signal'], fontsize=5)
    # plt.show()
    pngfile = "{0}.py.png".format(basename)
    fig.savefig(pngfile)

if __name__ == "__main__":
    # If a certain environment variable is set, look there for the input
    # file, otherwise look in the current directory.
    hdffile = 'CAL_LID_L2_VFM-ValStage1-V3-02.2011-12-31T23-18-11ZD.hdf'
    try:
        # Note: the original script referenced the undefined name `ncfile`
        # here; it should be `hdffile`.
        fname = os.path.join(os.environ['HDFEOS_ZOO_DIR'], hdffile)
    except KeyError:
        fname = hdffile
    run(fname)
Please try Miniconda and install basemap from conda-forge:
conda install -c conda-forge basemap
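If the conda-forge build installs cleanly, a quick smoke test (a minimal sketch, not from the original answer) should run without the SystemError, since Basemap loads _geoslib internally:

# Smoke test: importing and using Basemap exercises _geoslib, so whether
# this succeeds tells you if the new build is healthy.
import matplotlib
matplotlib.use("Agg")  # headless backend, no display needed
from mpl_toolkits.basemap import Basemap

m = Basemap(projection='cyl', resolution='c')
m.drawcoastlines()
print("basemap OK")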

AssertionError: Torch not compiled with CUDA enabled (problem in torchvision)

I am trying to run my object detection program and keep getting the following error message:
AssertionError: Torch not compiled with CUDA enabled.
I don't understand why this happens. I have a 2017 MacBook Pro with an AMD GPU, so I have no CUDA-enabled GPU.
I added this statement to my code to make sure the device is set to 'cpu'; however, it looks as if the program keeps trying to run through a GPU even though one does not exist.
if torch.cuda.is_available():
    device = torch.device('cuda')
else:
    device = torch.device('cpu')
This is the place where the error happens (4th line):
for epoch in range(num_epochs):
    # train for one epoch, printing every 10 iterations
    print("Hey")
    train_one_epoch(model, optimizer, data_loader, device, epoch, print_freq=10)
    print("Hey")
    # update the learning rate
    lr_scheduler.step()
    # evaluate on the test dataset
    evaluate(model, data_loader_test, device=device)
It would be really great if anyone could help me with this issue!
Thanks everyone in advance!
PS: I already tried updating the PyTorch version, but the problem persists.
Full code:
import os
import pandas as pd
import torch
import torch.utils.data
import torchvision
from PIL import Image

import utils
from engine import train_one_epoch, evaluate
import transforms as T
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor


def parse_one_annot(path_to_data_file, filename):
    data = pd.read_csv(path_to_data_file)
    boxes_array = data[data["filename"] == filename][["xmin", "ymin", "xmax", "ymax"]].values
    return boxes_array


class RaccoonDataset(torch.utils.data.Dataset):
    def __init__(self, root, data_file, transforms=None):
        self.root = root
        self.transforms = transforms
        self.imgs = sorted(os.listdir(os.path.join(root, "images")))
        self.path_to_data_file = data_file

    def __getitem__(self, idx):
        # load images and bounding boxes
        img_path = os.path.join(self.root, "images", self.imgs[idx])
        img = Image.open(img_path).convert("RGB")
        box_list = parse_one_annot(self.path_to_data_file, self.imgs[idx])
        boxes = torch.as_tensor(box_list, dtype=torch.float32)
        num_objs = len(box_list)
        # there is only one class
        labels = torch.ones((num_objs,), dtype=torch.int64)
        image_id = torch.tensor([idx])
        area = (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0])
        # suppose all instances are not crowd
        iscrowd = torch.zeros((num_objs,), dtype=torch.int64)
        target = {}
        target["boxes"] = boxes
        target["labels"] = labels
        target["image_id"] = image_id
        target["area"] = area
        target["iscrowd"] = iscrowd
        if self.transforms is not None:
            img, target = self.transforms(img, target)
        return img, target

    def __len__(self):
        return len(self.imgs)


dataset = RaccoonDataset(root="./raccoon_dataset",
                         data_file="./raccoon_dataset/data/raccoon_labels.csv")
dataset.__getitem__(0)


def get_model(num_classes):
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    # replace the pre-trained head with a new one
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
    return model


def get_transform(train):
    transforms = []
    # converts the image, a PIL image, into a PyTorch Tensor
    transforms.append(T.ToTensor())
    if train:
        # during training, randomly flip the training images
        # and ground-truth for data augmentation
        transforms.append(T.RandomHorizontalFlip(0.5))
    return T.Compose(transforms)


def main():
    dataset = RaccoonDataset(root="./raccoon_dataset",
                             data_file="raccoon_dataset/data/raccoon_labels.csv",
                             transforms=get_transform(train=True))
    dataset_test = RaccoonDataset(root="./raccoon_dataset",
                                  data_file="raccoon_dataset/data/raccoon_labels.csv",
                                  transforms=get_transform(train=False))
    torch.manual_seed(1)
    indices = torch.randperm(len(dataset)).tolist()
    dataset = torch.utils.data.Subset(dataset, indices[:-40])
    dataset_test = torch.utils.data.Subset(dataset_test, indices[-40:])
    # define training and validation data loaders
    data_loader = torch.utils.data.DataLoader(dataset, batch_size=2, shuffle=True,
                                              num_workers=4, collate_fn=utils.collate_fn)
    data_loader_test = torch.utils.data.DataLoader(dataset_test, batch_size=1, shuffle=False,
                                                   num_workers=4, collate_fn=utils.collate_fn)
    print("We have: {} examples, {} are training and {} testing".format(
        len(indices), len(dataset), len(dataset_test)))
    if torch.cuda.is_available():
        device = torch.device('cuda')
    else:
        device = torch.device('cpu')
    num_classes = 2
    model = get_model(num_classes)
    # construct an optimizer
    params = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.SGD(params, lr=0.005, momentum=0.9, weight_decay=0.0005)
    # and a learning rate scheduler which decreases the learning rate by
    # 10x every 3 epochs
    lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=3, gamma=0.1)
    # let's train it for 10 epochs
    num_epochs = 10
    for epoch in range(num_epochs):
        # train for one epoch, printing every 10 iterations
        train_one_epoch(model, optimizer, data_loader, device, epoch, print_freq=10)
        # update the learning rate
        lr_scheduler.step()
        # evaluate on the test dataset
        evaluate(model, data_loader_test, device=device)
    os.mkdir("pytorch object detection/raccoon/")
    torch.save(model.state_dict(), "pytorch object detection/raccoon/model")


if __name__ == '__main__':
    main()
It turns out I had to reinstall torch and torchvision to make everything work.
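As a complementary check (a sketch, not part of the original fix): on a CPU-only machine it can also help to hard-code the device and move the model onto it explicitly, since creating the device variable alone does not relocate the model:

import torch

# Hard-code CPU; on a Mac with an AMD GPU, torch.cuda.is_available() is
# False anyway, but pinning the device removes any ambiguity.
device = torch.device('cpu')

model = get_model(num_classes=2)  # model builder from the question's code
model.to(device)                  # move the model itself, not just the inputs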

Combine Graphs in Networkx: Adding Graphs as Daughter Nodes

I have two Graphs.
Graph_1 is a Directed Acyclic Graph (DAG) which has the following edge list in df_1:
node_1 node_2
John Charity
John Constantine
Gordon John
Gordon Nick
Graph_1 = nx.from_pandas_edgelist(df_1, source="node_1",
                                  target="node_2", create_using=nx.DiGraph())
Graph_2 is a random stochastic graph which is generated as follows:
Graph_2 = nx.erdos_renyi_graph(1000, 0.1)
I would like to join Graph_2 to Graph_1 by making the node with the highest betweenness centrality in Graph_2 a child node of the "Nick" node in Graph_1.
Does anyone have any ideas on how I could do this?
The following should work:
import networkx as nx
import matplotlib.pylab as pl

edge_list = [
    ["John", "Charity"],
    ["John", "Constantine"],
    ["Gordon", "John"],
    ["Gordon", "Nick"],
]
Graph_1 = nx.from_edgelist(edge_list, create_using=nx.DiGraph())

# reduced the number for visualization
Graph_2 = nx.erdos_renyi_graph(10, 0.1)
node_with_highest_betweenness_centrality = max(nx.betweenness_centrality(Graph_2).items(),
                                               key=lambda x: x[1])[0]

joined_graph = nx.DiGraph(Graph_1)
joined_graph.add_edges_from(Graph_2.edges())
# not sure which direction you want
joined_graph.add_edge(node_with_highest_betweenness_centrality, "Nick")

nx.draw(joined_graph, with_labels=True)
pl.show()
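One caveat worth noting (not part of the original answer): erdos_renyi_graph labels its nodes 0, 1, 2, ..., so they cannot collide with Graph_1's string names here. If your real Graph_2 could share labels with Graph_1, relabeling it before computing centrality keeps the merge safe; the "g2_" prefix below is a hypothetical naming scheme:

# Relabel Graph_2's nodes with a hypothetical "g2_" prefix so they can
# never clash with Graph_1's node names when the graphs are merged.
Graph_2 = nx.relabel_nodes(Graph_2, {n: f"g2_{n}" for n in Graph_2.nodes()})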

Event-B Rodin platform, Modelling Sub-Sets relation

I'm a beginner with Event-B and I'm trying to model a machine where the set PERSONNE includes the set CLIENT, which in turn includes the set RESIDENT. I've searched Rodin's documentation but I haven't found anything.
Here is the context
context contexteHumain
sets PERSONNE CLIENT RESIDENT
axioms
#axm1: finite(PERSONNE)
#axm2: finite(CLIENT)
#axm3: finite(RESIDENT) // Definition of the three possible sets
and here is the machine
machine machineFunKeyHotel sees contexteHumain
variables
pers
reserv
cli
resid
chkin
chkout
invariants
#inv1: pers ⊆ PERSONNE
#inv2: cli ⊆ CLIENT
#inv3: resid ⊆ RESIDENT
// Defines the 3 set variables for persons, clients and residents
#inv4: reserv ∈ BOOL
#inv5: chkin ∈ BOOL
#inv6: chkout ∈ BOOL
// Boolean flags for whether the person has reserved, checked in or checked out
#inv7: CLIENT ⊆ PERSONNE
#inv8: RESIDENT ⊆ CLIENT
// And the relations between the different sets of humans
events
event INITIALISATION
begin
#act1: reserv ≔ FALSE
#act2: chkin ≔ FALSE
#act3: chkout ≔ FALSE
// These values are false: at the start nobody has reserved or checked in,
// let alone checked out
#act4: resid ≔ ∅
#act5: cli ≔ ∅
// At the start the numbers of clients and residents are zero
#act6: pers ≔ ∅ //???
// Define an almost infinite number of persons (world population estimated at
// 7,290,477,807 on Friday 3 April 2015 at 9:07:24 (GMT+1))
end
event réserver
// When any person has made a reservation, they are added
// to the set of clients
any potentiel_client
where
#gr1: potentiel_client ∈ PERSONNE
#gr2: reserv = TRUE
then
#act1: cli ≔ cli ∪ {potentiel_client}
end
event checkerin
// When a client has passed the check-in step, they are added
// to the set of residents
any futur_resident
where
#gr1: futur_resident ∈ CLIENT
#gr2: chkin = TRUE
then
#act1: resid ≔ resid ∪ {futur_resident}
end
event checkerout
// When a resident has checked out, they are removed
// from both the client set and the resident set
any resident_actuel
where
#gr1: resident_actuel ∈ RESIDENT
#gr2: chkout = TRUE
then
#act1: resid ≔ resid ∖ {resident_actuel}
#act2: cli ≔ cli ∖ {resident_actuel}
end
end
I think I've got the idea, but I can't work out how to solve the various errors I get:
Types CLIENT and PERSONNE do not match (3 times)
Types RESIDENT and CLIENT do not match (2 times)
There is a problem in your specification that is very common for beginners in Event-B. :)
You have introduced three deferred sets PERSONNE, CLIENT and RESIDENT. But a client or a resident is a person, too. And deferred sets are constants, so with this construction you are not able to modify your set of clients or residents.
I think the basic problem is the keyword SETS. You do not have to declare every set of your machine there. Think TYPES! You just introduce a new type (you only need PERSONNE here) whose carrier set contains all possible elements.
context contexteHumain
sets PERSONNE
So remove the sets CLIENT and RESIDENT. I would suggest removing all the axioms, too. Do you really have to assume that the set of all possible persons is finite?
Adapt your invariants:
invariants
#inv1: pers ⊆ PERSONNE
#inv2: cli ⊆ pers
#inv3: resid ⊆ cli
Remove inv7 and inv8. You probably want to add an invariant that the set of persons in your system is finite (in contrast to all possible persons in PERSONNE):
#inv9: finite(pers)
Accordingly, you would adapt your guards:
#gr1: futur_resident ∈ cli
and, respectively:
#gr1: resident_actuel ∈ resid