Importing agents and their attributes from CSV in mesa - python-3.7

My data is in .csv format: each row represents an agent and each column represents an attribute.
My question is: how do I assign agents and their attributes from a CSV file in Mesa?
Could anyone help me with how to import them into Mesa, please?
Thanks.

To import a .csv file and turn its rows into agent attributes, you read the file in and then pass each row into the agent class as you create it.
For example, you have 'people.csv':
agent, height, weight
bill, 72 in, 190lbs
anne, 70 in, 170lbs
You read it in as a list of lists:
import csv

people = []
with open('people.csv') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    line_count = 0
    for row in csv_reader:
        if line_count == 0:
            # skip the column headers
            line_count += 1
        else:
            people.append(row)
# This will look like [['bill', '72 in', '190lbs'], ['anne', '70 in', '170lbs']]
You pass a row into your agent instantiation; this would typically occur when you build your agent schedule in the model module:
for i in range(number_of_agents):  # number_of_agents would typically be len(people)
    a = myAgent(i, self, people[i])
    self.schedule.add(a)
You assign the row values to agent attributes:
class myAgent(Agent):
    def __init__(self, unique_id, model, row):
        super().__init__(unique_id, model)
        self.name = row[0]    # e.g. bill
        self.height = row[1]  # e.g. 72 in
        self.weight = row[2]  # e.g. 190 lbs
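As a variation (not from the original answer), you could read the file with csv.DictReader so each row becomes a dict keyed by column name. The class and model names below (PersonAgent, MyModel) are made up for illustration, and the RandomActivation scheduler matches older Mesa releases (the ones current around Python 3.7):
import csv
from mesa import Agent, Model
from mesa.time import RandomActivation

class PersonAgent(Agent):
    def __init__(self, unique_id, model, row):
        super().__init__(unique_id, model)
        self.name = row['agent']
        self.height = row['height']
        self.weight = row['weight']

class MyModel(Model):
    def __init__(self, csv_path='people.csv'):
        super().__init__()
        self.schedule = RandomActivation(self)
        with open(csv_path) as f:
            # skipinitialspace handles the spaces after the commas in people.csv
            reader = csv.DictReader(f, skipinitialspace=True)
            for i, row in enumerate(reader):
                self.schedule.add(PersonAgent(i, self, row))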

Related

Pandas to_csv hands on for data analysis

Question:
Create a series named heights_A with values 176.2, 158.4, 167.6, 156.2, and 161.4. These values represent heights of 5 students of class A.
Label each student as s1, s2, s3, s4, and s5.
Create another series named weights_A with values 85.1, 90.2, 76.8, 80.4, and 78.9. These values represent weights of 5 students of class A.
Label each student as s1, s2, s3, s4, and s5.
Create a dataframe named df_A, which contains the height and weight of five students namely s1, s2, s3, s4 and s5.
Label the columns as Student_height and Student_weight, respectively.
Write the contents of df_A to a CSV file named classA.csv.
Note: Use the to_csv method associated with a dataframe.
Verify that the file classA.csv exists in the present directory using the command ls -l.
You can also view the contents of the file using the command cat classA.csv.
My code:
import pandas as pd

heights_A = pd.Series([176.2, 158.4, 167.6, 156.2, 161.4])
heights_A.index = ["S1", "S2", "S3", "S4", "S5"]
weights_A = pd.Series([85.1, 90.2, 76.8, 80.4, 78.9])
weights_A.index = ["S1", "S2", "S3", "S4", "S5"]
df_A = pd.DataFrame({'Student_height': heights_A, 'Student_weight': weights_A}, index=weights_A.index)
df_A.to_csv("classA.csv")
When checking with ls -l and cat classA.csv I can see the expected contents, yet the checker does not allow me to proceed. Not sure where I am wrong.
Use lowercase letters for the labels s1, s2, ...:
import pandas as pd

heights_A = pd.Series([176.2, 158.4, 167.6, 156.2, 161.4])
heights_A.index = ["s1", "s2", "s3", "s4", "s5"]
print(heights_A[1])
weights_A = pd.Series([85.1, 90.2, 76.8, 80.4, 78.9])
weights_A.index = ["s1", "s2", "s3", "s4", "s5"]
df_A = pd.DataFrame({'Student_height': heights_A, 'Student_weight': weights_A}, index=weights_A.index)
df_A.to_csv("classA.csv")
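For reference, with the lowercase labels the resulting classA.csv should contain the following (values taken directly from the series above):
,Student_height,Student_weight
s1,176.2,85.1
s2,158.4,90.2
s3,167.6,76.8
s4,156.2,80.4
s5,161.4,78.9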
A fuller version of the same exercise (including the class B part) looks like this:
import os
import numpy as np
import pandas as pd
# Creating the Series
heights_A = pd.Series([176.2, 158.4, 167.6, 156.2, 161.4])
# Creating the row axis labels
heights_A.index = ['s1', 's2', 's3', 's4', 's5']

# Creating the Series
weights_A = pd.Series([85.1, 90.2, 76.8, 80.4, 78.9])
# Creating the row axis labels
weights_A.index = ['s1', 's2', 's3', 's4', 's5']

df_A = pd.DataFrame()
df_A['Student_height'] = heights_A
df_A['Student_weight'] = weights_A
# Display the shape of dataframe df_A
df_A.shape

df_A = pd.DataFrame({'Student_height': heights_A, 'Student_weight': weights_A}, index=weights_A.index)
df_A.to_csv("classA.csv")
os.system("cat classA.csv")

df_A2 = pd.read_csv("classA.csv")
print(df_A2)
df_A3 = pd.read_csv("classA.csv", index_col=0)
print(df_A3)

np.random.seed(100)
x = np.random.normal(loc=170.0, scale=25.0, size=5)
np.random.seed(100)
heights_B = pd.Series(x, index=['s1', 's2', 's3', 's4', 's5'])
np.random.seed(100)
y = np.random.normal(loc=75.0, scale=12.0, size=5)
weights_B = pd.Series(y, index=['s1', 's2', 's3', 's4', 's5'])

df_B = pd.DataFrame({'Student_height': heights_B, 'Student_weight': weights_B}, index=weights_B.index)
df_B.to_csv("classB.csv", index=False)
os.system("cat classB.csv")

df_B2 = pd.read_csv("classB.csv")
print(df_B2)
df_B3 = pd.read_csv("classB.csv", header=None)
print(df_B3)
df_B4 = pd.read_csv("classB.csv", header=None, skiprows=2)
print(df_B4)

Correct data loading, splitting and augmentation in Pytorch

The tutorial doesn't seem to explain how we should load, split, and apply proper augmentation.
Let's have a dataset consisting of cats and dogs. The folder structure would be:
data
    cat
        0101.jpg
        0201.jpg
        ...
    dogs
        0101.jpg
        0201.jpg
        ...
At first, I loaded the dataset with the datasets.ImageFolder function. ImageFolder has a transform argument where we can pass some augmentation transforms, but we don't want to apply augmentation to the test dataset! So let's stay with transform=None.
data = datasets.ImageFolder(root='data')
Apparently we don't have a train/test folder structure, so I assume a good approach would be to use the random_split function:
train_size = int(split * len(data))
test_size = len(data) - train_size
train_dataset, test_dataset = torch.utils.data.random_split(data, [train_size, test_size])
Now let's load the data the following way.
train_loader = torch.utils.data.DataLoader(train_dataset,
                                           batch_size=8,
                                           shuffle=True)
test_loader = torch.utils.data.DataLoader(test_dataset,
                                          batch_size=8,
                                          shuffle=True)
How can I apply transformations (data augmentation) to the "train_loader" images?
Basically I need to:
1. load data from the folder structure explained above,
2. split the data into test/train parts,
3. apply augmentations on the train part.
I am not sure if there is a recommended way of doing this, but this is how I would work around the problem:
Given that torch.utils.data.random_split() returns Subset objects, we cannot set different transforms on their inner datasets, because both subsets share the same underlying dataset (the only difference is in the indices). In this context, I would implement a simple wrapper class to apply transformations, something like this:
from torch.utils.data import Dataset

class ApplyTransform(Dataset):
    """
    Apply transformations to a Dataset

    Arguments:
        dataset (Dataset): A Dataset that returns (sample, target)
        transform (callable, optional): A function/transform to be applied on the sample
        target_transform (callable, optional): A function/transform to be applied on the target
    """
    def __init__(self, dataset, transform=None, target_transform=None):
        self.dataset = dataset
        self.transform = transform
        self.target_transform = target_transform
        # yes, you don't need these 2 lines below :(
        if transform is None and target_transform is None:
            print("Am I a joke to you? :)")

    def __getitem__(self, idx):
        sample, target = self.dataset[idx]
        if self.transform is not None:
            sample = self.transform(sample)
        if self.target_transform is not None:
            target = self.target_transform(target)
        return sample, target

    def __len__(self):
        return len(self.dataset)
And then use it before passing the dataset to the dataloader:
import torchvision.transforms as transforms

train_transform = transforms.Compose([
    transforms.ToTensor(),
    # ...
])
train_dataset = ApplyTransform(train_dataset, transform=train_transform)
# continue with DataLoaders...
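Putting the pieces together for the cat/dogs layout from the question (a sketch, assuming ApplyTransform is defined as above and that an 80/20 split is wanted; the particular augmentations are only examples):
import torch
from torchvision import datasets, transforms

# load without any transform, so the test subset stays un-augmented
data = datasets.ImageFolder(root='data')

train_size = int(0.8 * len(data))
test_size = len(data) - train_size
train_subset, test_subset = torch.utils.data.random_split(data, [train_size, test_size])

train_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
test_transform = transforms.ToTensor()

train_dataset = ApplyTransform(train_subset, transform=train_transform)
test_dataset = ApplyTransform(test_subset, transform=test_transform)

train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=8, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=8, shuffle=False)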
I think you can look at this: https://gist.github.com/kevinzakka/d33bf8d6c7f06a9d8c76d97a7879f5cb
import numpy as np
import torch
from torchvision import datasets, transforms
from torch.utils.data.sampler import SubsetRandomSampler

def get_train_valid_loader(data_dir,
                           batch_size,
                           augment,
                           random_seed,
                           valid_size=0.1,
                           shuffle=True,
                           show_sample=False,
                           num_workers=4,
                           pin_memory=False):
    """
    Utility function for loading and returning train and valid
    multi-process iterators over the CIFAR-10 dataset. A sample
    9x9 grid of the images can be optionally displayed.
    If using CUDA, num_workers should be set to 1 and pin_memory to True.
    Params
    ------
    - data_dir: path directory to the dataset.
    - batch_size: how many samples per batch to load.
    - augment: whether to apply the data augmentation scheme
      mentioned in the paper. Only applied on the train split.
    - random_seed: fix seed for reproducibility.
    - valid_size: percentage split of the training set used for
      the validation set. Should be a float in the range [0, 1].
    - shuffle: whether to shuffle the train/validation indices.
    - show_sample: plot 9x9 sample grid of the dataset.
    - num_workers: number of subprocesses to use when loading the dataset.
    - pin_memory: whether to copy tensors into CUDA pinned memory. Set it to
      True if using GPU.
    Returns
    -------
    - train_loader: training set iterator.
    - valid_loader: validation set iterator.
    """
    error_msg = "[!] valid_size should be in the range [0, 1]."
    assert ((valid_size >= 0) and (valid_size <= 1)), error_msg

    normalize = transforms.Normalize(
        mean=[0.4914, 0.4822, 0.4465],
        std=[0.2023, 0.1994, 0.2010],
    )

    # define transforms
    valid_transform = transforms.Compose([
        transforms.ToTensor(),
        normalize,
    ])
    if augment:
        train_transform = transforms.Compose([
            transforms.RandomCrop(32, padding=4),
            transforms.RandomHorizontalFlip(),
            transforms.ToTensor(),
            normalize,
        ])
    else:
        train_transform = transforms.Compose([
            transforms.ToTensor(),
            normalize,
        ])

    # load the dataset
    train_dataset = datasets.CIFAR10(
        root=data_dir, train=True,
        download=True, transform=train_transform,
    )
    valid_dataset = datasets.CIFAR10(
        root=data_dir, train=True,
        download=True, transform=valid_transform,
    )

    num_train = len(train_dataset)
    indices = list(range(num_train))
    split = int(np.floor(valid_size * num_train))

    if shuffle:
        np.random.seed(random_seed)
        np.random.shuffle(indices)

    train_idx, valid_idx = indices[split:], indices[:split]
    train_sampler = SubsetRandomSampler(train_idx)
    valid_sampler = SubsetRandomSampler(valid_idx)

    train_loader = torch.utils.data.DataLoader(
        train_dataset, batch_size=batch_size, sampler=train_sampler,
        num_workers=num_workers, pin_memory=pin_memory,
    )
    valid_loader = torch.utils.data.DataLoader(
        valid_dataset, batch_size=batch_size, sampler=valid_sampler,
        num_workers=num_workers, pin_memory=pin_memory,
    )

    # visualize some images
    if show_sample:
        sample_loader = torch.utils.data.DataLoader(
            train_dataset, batch_size=9, shuffle=shuffle,
            num_workers=num_workers, pin_memory=pin_memory,
        )
        data_iter = iter(sample_loader)
        images, labels = data_iter.next()
        X = images.numpy().transpose([0, 2, 3, 1])
        plot_images(X, labels)  # plot_images is a helper defined in the gist

    return (train_loader, valid_loader)
It seems that he uses sampler=train_sampler (and valid_sampler) to do the split.
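For example (my own illustrative call, using the signature above), the following would give an augmented training iterator and an un-augmented validation iterator over CIFAR-10, both drawn from the same training set via the two samplers:
train_loader, valid_loader = get_train_valid_loader(
    data_dir='./data',
    batch_size=64,
    augment=True,
    random_seed=42,
    valid_size=0.2,
)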

Duplicate values in read from file minibatches TensorFlow

I followed the tutorial about reading data with TF and made some attempts myself. Now, the problem is that my tests show duplicate data in the batches I create when reading data from a CSV file.
My code looks like this:
# -*- coding: utf-8 -*-
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import os
import collections
import numpy as np
from six.moves import xrange  # pylint: disable=redefined-builtin
import tensorflow as tf

class XICSDataSet:
    def __init__(self, height=20, width=195, batch_size=1000, noutput=15):
        self.depth = 1
        self.height = height
        self.width = width
        self.batch_size = batch_size
        self.noutput = noutput

    def trainingset_files_reader(self, data_dir, nfiles):
        fnames = [os.path.join(data_dir, "test%d" % i) for i in range(nfiles)]
        filename_queue = tf.train.string_input_producer(fnames, shuffle=False)
        reader = tf.TextLineReader()
        key, value = reader.read(filename_queue)
        record_defaults = [[.0], [.0], [.0], [.0], [.0]]
        data_tuple = tf.decode_csv(value, record_defaults=record_defaults, field_delim=' ')
        features = tf.pack(data_tuple[:-self.noutput])
        label = tf.pack(data_tuple[-self.noutput:])
        depth_major = tf.reshape(features, [self.height, self.width, self.depth])
        min_after_dequeue = 100
        capacity = min_after_dequeue + 30 * self.batch_size
        example_batch, label_batch = tf.train.shuffle_batch(
            [depth_major, label], batch_size=self.batch_size,
            capacity=capacity, min_after_dequeue=min_after_dequeue)
        return example_batch, label_batch

with tf.Graph().as_default():
    ds = XICSDataSet(2, 2, 3, 1)
    im, lb = ds.trainingset_files_reader(filename, 1)
    sess = tf.Session()
    init = tf.initialize_all_variables()
    sess.run(init)
    tf.train.start_queue_runners(sess=sess)
    for i in range(1000):
        lbs = sess.run([im, lb])[1]
        _, nu = np.unique(lbs, return_counts=True)
        if np.array_equal(nu, np.array([1, 1, 1])) == False:
            print('Not unique elements found in a batch!')
            print(lbs)
I tried with different batch sizes, different numbers of files, and different values of capacity and min_after_dequeue, but I always get the problem. In the end, I would like to be able to read data from only one file, create batches, and shuffle the examples.
My files, created ad hoc for this test, have 5 lines each, where each line represents a sample with 5 columns; the last column is meant to be the label for that sample. These are just random numbers. I'm using only 10 files just to test this out.
The default behavior for tf.train.string_input_producer(fnames) is to produce an infinite number of copies of the elements in fnames. Therefore, since your tf.train.shuffle_batch() capacity is larger than the total number of elements in your input files (5 elements per file * 10 files = 50 elements), and the min_after_dequeue is also larger than the number of elements, the queue will contain at least two full copies of the input data before the first batch is produced. As a result, it is likely that some batches will contain duplicate data.
If you only want to process each example once, you can set an explicit num_epochs=1 when creating the tf.train.string_input_producer(). For example:
def trainingset_files_reader(self, data_dir, nfiles):
    fnames = [os.path.join(data_dir, "test%d" % i) for i in range(nfiles)]
    filename_queue = tf.train.string_input_producer(
        fnames, shuffle=False, num_epochs=1)
    # ...
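One thing to keep in mind (my addition, not part of the original answer): with num_epochs set, the queue-based input pipeline creates a local variable for the epoch counter and signals the end of the data with an OutOfRangeError. Using the TF 1.x names (tf.global_variables_initializer / tf.local_variables_initializer; the question's code targets an older release where they were called differently), the driver loop would look roughly like this, with data_dir standing in for wherever your test files live:
with tf.Graph().as_default():
    ds = XICSDataSet(2, 2, 3, 1)
    im, lb = ds.trainingset_files_reader(data_dir, 10)
    with tf.Session() as sess:
        # num_epochs creates a local variable, so initialize locals as well
        sess.run([tf.global_variables_initializer(),
                  tf.local_variables_initializer()])
        coord = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(sess=sess, coord=coord)
        try:
            while not coord.should_stop():
                images, labels = sess.run([im, lb])
        except tf.errors.OutOfRangeError:
            pass  # the single epoch has been fully consumed
        finally:
            coord.request_stop()
            coord.join(threads)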

Confusion with classes and global variables

I've come to a halt in the making of my first project. I'm trying to make a timecard program. I decided to use a class to handle the variables locally, but I can't figure out how to create a class instance from user input.
import time
import datetime
import sqlite3

class Employee(object):
    def __init__(self, name, position, wage=0, totalpay=0, totalhours=0):
        self.name = name
        self.position = position
        self.wage = wage
        self.totalpay = totalpay
        self.totalhours = totalhours

    def HourlyPay(self):
        if self.position not in range(1, 4):
            return "%s is not a valid position" % self.position
        elif self.position == 1:
            self.wage = 105.00
        elif self.position == 2:
            self.wage = 112.50
        elif self.position == 3:
            self.wage = 118.50
        return "%s at position %i is making %i DKK per hour" % (self.name, self.position, self.wage)

    def Salary(self, hours):
        self.hours = hours
        self.totalpay += self.wage * self.hours
        self.totalhours += self.hours
        return "%s next salary will be %i DKK" % (self.name, self.totalpay)

# This is our Employee object
EmployeeObj = Employee('John Doe', 1)
EmployeeObj.HourlyPay()
EmployeeObj.Salary(43)  # Takes 'hours' as argument

# Temporary Database config and functions below
conn = sqlite3.connect('database.db')
c = conn.cursor()

# For setting up the database tables: name, position and total.
def Create_table():
    c.execute('CREATE TABLE IF NOT EXISTS EmployeeDb(name TEXT, position INTEGER, total REAL)')

# Run to update values given by our Employee object
def Data_entry():
    name = str(EmployeeObj.name)
    position = int(EmployeeObj.position)
    total = float(EmployeeObj.totalpay)
    c.execute('INSERT INTO EmployeeDb (name, position, total) VALUES (?, ?, ?)',
              (name, position, total))
    conn.commit()
    c.close()
    conn.close()
    return True
What I'm trying to achieve is to create this variable from user input:
EmployeeObj = Employee('John Doe', 1) # Our Employee object
Maybe you can do something like this:
name = input("Enter employee name:")
position = int(input("Enter employee position:"))
EmployeeObj = Employee(name, position)
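If you want to guard against non-numeric input for the position (a small sketch of my own, not part of the original answer), you can wrap the conversion in a try/except:
name = input("Enter employee name: ")
while True:
    try:
        position = int(input("Enter employee position (1-3): "))
        break
    except ValueError:
        print("Please enter a whole number.")

EmployeeObj = Employee(name, position)
print(EmployeeObj.HourlyPay())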

How do you order annotations by offset in brat?

When using the rapid annotation tool brat, the created annotations file presents the annotations in the order in which they were performed by the user. If you start at the beginning of a document and go to the end performing annotation, the annotations will naturally be in the correct offset order. However, if you need to go back earlier in the document and add another annotation, the offset order of the annotations in the output .ann file will be out of order.
How then can you rearrange the .ann file so that the annotations are in offset order when you are done? Is there some option within brat that allows you to do this, or does one have to write their own script to perform it?
Hearing nothing, I wrote a Python script to accomplish what I had set out to do. First, I reorder all annotations by begin offset. Second, I resequence the label numbers so that they are once again in ascending order.
import optparse, sys

splitchar1 = '\t'
splitchar2 = ' '

# for brat, overlapping is not permitted (or at least a warning is generated),
# so we can use this simplification and sort simply on begin. it is
# probably a good idea anyway.
class AnnotationRecord:
    label = 'T0'
    type = ''
    begin = -1
    end = -1
    text = ''

    def __repr__(self):
        return (self.label + splitchar1
                + self.type + splitchar2
                + str(self.begin) + splitchar2
                + str(self.end) + splitchar1 + self.text)

def create_record(parts):
    record = AnnotationRecord()
    record.label = parts[0]
    middle_parts = parts[1].split(splitchar2)
    record.type = middle_parts[0]
    record.begin = middle_parts[1]
    record.end = middle_parts[2]
    record.text = parts[2]
    return record

def main(filename, out_filename):
    fo = open(filename, 'r')
    lines = fo.readlines()
    fo.close()

    annotation_records = []
    for line in lines:
        parts = line.split(splitchar1)
        annotation_records.append(create_record(parts))

    # sort based upon begin
    sorted_records = sorted(annotation_records, key=lambda a: int(a.begin))

    # now relabel based upon the sorted order
    label_value = 1
    for sorted_record in sorted_records:
        sorted_record.label = 'T' + str(label_value)
        label_value += 1

    # now write the resulting file to disk
    fo = open(out_filename, 'w')
    for sorted_record in sorted_records:
        fo.write(sorted_record.__repr__())
    fo.close()

# format of .ann file is T# Type Start End Text
# args are input file, output file
if __name__ == '__main__':
    parser = optparse.OptionParser(formatter=optparse.TitledHelpFormatter(),
                                   usage=globals()['__doc__'],
                                   version='$Id$')
    parser.add_option('-v', '--verbose', action='store_true',
                      default=False, help='verbose output')
    (options, args) = parser.parse_args()
    if len(args) < 2:
        parser.error('missing argument')
    main(args[0], args[1])
    sys.exit(0)
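Assuming the script above is saved as, say, sort_ann.py (the name is mine), you would run it against an annotation file like this:
python sort_ann.py document.ann document_sorted.ann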