Why can't I extend a scipy rv_discrete class successfully?

I'm trying to extend the scipy rv_discrete class, since subclassing is supposed to work in every case. I just want to add a couple of instance attributes.
from scipy.stats import rv_discrete

class Distribution(rv_discrete):
    def __init__(self, realization):
        self._realization = realization
        self.num = len(realization)
        # stuff to obtain random alphabet and probabilities from realization
        super().__init__(values=(alphabet, probabilities))
This should allow me to do something like this:
realization = #some values
dist = Distribution(realization)
print(dist.mean())
Instead, I receive this error:
ValueError: rv_discrete.__init__(..., values != None, ...)
If I simply create a new rv_discrete object, as in the following line of code,
dist = rv_discrete(values=(alphabet, probabilities))
it works just fine.
Any idea why? Thank you for your help
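A likely explanation, judging by how recent scipy versions implement this: when you pass values directly to rv_discrete(...), rv_discrete.__new__ silently swaps in an instance of an internal rv_sample subclass, while rv_discrete.__init__ itself rejects a non-None values argument, so calling super().__init__(values=...) from a subclass hits that guard directly. One workaround is composition rather than inheritance: keep the fitted distribution as an attribute and delegate to it. A minimal sketch, assuming the alphabet and probabilities are empirical frequencies (the _estimate helper is hypothetical):
from collections import Counter
from scipy.stats import rv_discrete

class Distribution:
    def __init__(self, realization):
        self._realization = realization
        self.num = len(realization)
        alphabet, probabilities = self._estimate(realization)
        # constructing rv_discrete directly goes through __new__ and works
        self._dist = rv_discrete(values=(alphabet, probabilities))

    @staticmethod
    def _estimate(realization):
        # hypothetical helper: empirical alphabet and relative frequencies
        counts = Counter(realization)
        alphabet = sorted(counts)
        probabilities = [counts[a] / len(realization) for a in alphabet]
        return alphabet, probabilities

    def mean(self):
        # delegate to the wrapped distribution
        return self._dist.mean()

dist = Distribution([1, 2, 2, 3, 3, 3])
print(dist.mean())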


What does PyTorch's nn.Linear(x, y) return?

I am new to object orientation, and I am having trouble understanding the following:
import torch.nn as nn

class mynet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(20, 64)

    def forward(self, x):
        x = self.fc1(x)
        return x
The line self.fc1 = nn.Linear(20, 64) is supposed to create a member variable fc1 in my class, right? But what is the return value of nn.Linear(20, 64)?
According to the documentation, nn.Linear is defined as
class torch.nn.Linear(in_features: int, out_features: int, bias: bool = True).
However, in my basic OOP tutorial I have only seen something like class CLASSNAME(BASECLASS), so that the class CLASSNAME inherits from BASECLASS. What does the documentation mean by writing all that stuff in between the brackets?
Also, the line x = self.fc1(x) somehow makes it look as if fc1 were a function now.
I seem to lack OOP knowledge here... Any help appreciated!
First, let's take a look at this:
self.fc1 = nn.Linear(20, 64)
This part is probably familiar to anyone with a basic understanding of Python and OOP. Here we are simply creating a new instance of the nn.Linear class and initializing it with the positional arguments 20 and 64, corresponding to in_features and out_features respectively. The arguments listed in the documentation are the expected arguments of nn.Linear's __init__ method; that is, the documentation's way of writing things in between the brackets describes the constructor's parameters, not a base class.
Now for the part that's probably a little more confusing:
x = self.fc1(x)
The nn.Linear class is callable since its parent class, nn.Module, implements a special method named __call__. That means you can treat self.fc1 like a function and do things like x = self.fc1(x), which is equivalent to x = self.fc1.__call__(x).
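A toy illustration of the same mechanism, independent of PyTorch:
class Doubler:
    def __call__(self, x):
        # defining __call__ lets instances be used like functions
        return 2 * x

d = Doubler()
print(d(21))  # 42, exactly equivalent to d.__call__(21)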
You can run a little experiment:
import torch
import torch.nn as nn

fc1 = nn.Linear(20, 64)
print(fc1, type(fc1))

ret = fc1(torch.randn(20))
print(ret, type(ret), ret.shape)
Out:
Linear(in_features=20, out_features=64, bias=True) <class 'torch.nn.modules.linear.Linear'>
tensor([-0.2795, 0.8476, -0.8207, 0.3943, 0.1464, -0.2174, 0.6605, 0.6072,
-0.6881, -0.1118, 0.8226, 0.1515, 1.3658, 0.0814, -0.8751, -0.9587,
0.1310, 0.2539, -0.3072, -0.0225, 0.4663, -0.0019, 0.0404, 0.9279,
0.4948, -0.3420, 0.9061, 0.1752, 0.1809, 0.5917, -0.1010, -0.3210,
1.1910, 0.5145, 0.2254, 0.2077, -0.0040, -0.6406, -0.1885, 0.5270,
0.0824, -0.0787, 1.5140, -0.7958, 1.1727, 0.1862, -1.0700, 0.0431,
0.6849, 0.1393, 0.7547, 0.0917, -0.3264, -0.2152, -0.0728, -0.6441,
-0.1162, 0.4154, 0.3486, -0.1693, 0.6697, 0.0229, 0.0311, 0.1433],
grad_fn=<AddBackward0>) <class 'torch.Tensor'> torch.Size([64])
fc1 is of type <class 'torch.nn.modules.linear.Linear'>.
It needs some "juice" to work: in this case it needs the input tensor torch.randn(20) to produce an output of shape torch.Size([64]).
So fc1 is a class instance that you can call with (), in which case the forward() method of the nn.Linear class is invoked.
In most cases, when writing your own modules (like mynet in your case), you list the submodules in __init__ and then define what happens to the data (the behavior) in your module's forward.
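For example, a slightly fuller module following that pattern (the layer sizes here are just illustrative):
import torch
import torch.nn as nn

class MyNet(nn.Module):
    def __init__(self):
        super().__init__()
        # submodules are registered here
        self.fc1 = nn.Linear(20, 64)
        self.fc2 = nn.Linear(64, 1)

    def forward(self, x):
        # the behavior: how data flows through the submodules
        x = torch.relu(self.fc1(x))
        return self.fc2(x)

net = MyNet()
print(net(torch.randn(20)).shape)  # torch.Size([1])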
The three kinds of modules in PyTorch are:
Functional modules
Default modules
Custom modules
Custom modules like mynet you created typically use default modules:
nn.Identity()
nn.Embedding()
nn.Linear()
nn.Conv2d()
nn.BatchNorm2d() (BxHxW)
nn.LayerNorm() (CxHxW)
nn.Dropout()
nn.ReLU()
And many, many other modules that I haven't listed. But of course, you can create custom modules without any default modules, just by using nn.Parameter(); see the last example.
The third kind, functional modules, are defined in torch.nn.functional.
Also check the nn.Linear implementation; you may note that the F.linear() functional module is used inside it.
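The functional counterpart can be called directly, without constructing a module object; a quick sketch:
import torch
import torch.nn.functional as F

x = torch.randn(20)
weight = torch.randn(64, 20)  # same shape nn.Linear(20, 64) would create
bias = torch.zeros(64)
print(F.linear(x, weight, bias).shape)  # torch.Size([64])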
You may test the naive implementation of Linear from Fastai Book:
import math
import torch
import torch.nn as nn

class Linear(nn.Module):
    def __init__(self, n_in, n_out):
        super().__init__()
        # Kaiming-style initialization of the weight matrix
        self.weight = nn.Parameter(torch.randn(n_out, n_in) * math.sqrt(2 / n_in))
        self.bias = nn.Parameter(torch.zeros(n_out))

    def forward(self, x):
        return x @ self.weight.T + self.bias

fc = Linear(20, 64)
ret = fc(torch.randn(20))
print(ret.shape)  # torch.Size([64])
You may then try to understand the difference between this naive implementation and the one provided inside PyTorch.

Applying models to data - Inheriting or accessing the attributes of a root class (data) from a sub-class (model)

I am fitting a number of models to several different datasets. I would like to store the methods and attributes of the datasets (e.g. X, y, trainTestSplit(), etc.) in 'Dataset' objects, store the methods and attributes of the models (e.g. fit(), hyperparameters, scores, etc.) in 'Model' objects, and store the 'Models' in the 'Datasets' (several Models for each Dataset).
I have tried several different ways to make this work, including inheritance with the use of super(); indenting (nesting) the Model class inside the dataset class; and with functions in the Dataset class which can be called by the Model class.
This is about as close as I've come:
class Dataset:
    def __init__(self, X, y, attr):
        self.X = X
        self.y = y
        self.attr = attr

    def trainTestSplit(self, **kwargs):
        self.X_train, self.X_test, self.y_train, self.y_test = train_test_split(self.X, self.y, **kwargs)

class Model(Dataset):
    def __init__(self, regressor):
        self.reg = regressor
        super().__init__(self)

    def fit(self):
        self.reg.fit(X=self.X_train, y=self.y_train)

    def predict(self):
        self.yPredict = self.reg.predict(X=self.X_test)
In the above, the variables are not accessible to the Model class, and so it returns an error.
If this worked, I would expect to be executing the following function calls, as examples.
Creating an instance of a Dataset object:
dataset_1 = Dataset(X, y, 'string')
Splitting data into training and test sets:
dataset_1.trainTestSplit(test_size=0.3)
Creating an instance of a Model, and applying it to a dataset:
dataset_1.svr = Model(SVR(hyperParams))
Fitting a model:
dataset_1.svr.fit()
Actually, if I'm thinking about this in the right way, the fit() method of the Model class could be called as part of __init__, so that the model is fit to the dataset on instantiation.
Reading a training score:
dataset_1.svr.training_score_
Because I have so many datasets, and will be fitting numerous models to each, having the methods and attributes stored in this way seemed to make sense, but I'm not sure how to implement it.
Is there any way to instantiate a class (call it a sub-class), such that it inherits or has access to the attributes or objects contained in another class (say a root class), and so the sub-class is contained as an object within the root class? Or am I thinking about this in the wrong way?
So I might have been trying to use classes where a dictionary would be better suited.
The following seems to accomplish what I'm trying to do, which is to store a dataset and the results of models applied to it in the one object (other attributes such as training/CV/test scores etc. are still to be added in the answer below).
If anyone has any suggestions/comments on how this can be done better please comment.
Thanks.
from sklearn.model_selection import train_test_split

class Dataset(object):
    def __init__(self, X, y):
        self.X = X
        self.y = y
        self.models = {}

    def add_model(self, model, regressor):
        self.models[model] = {}
        self.models[model]['reg'] = regressor
        regressor.fit(self.X_train, self.y_train)
        yPredict = regressor.predict(self.X_test)
        self.models[model]['yPredict'] = yPredict

    def trainTestSplit(self, **kwargs):
        self.X_train, self.X_test, self.y_train, self.y_test = train_test_split(self.X, self.y, **kwargs)
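A hypothetical usage sketch, assuming scikit-learn and some feature/target arrays X and y already exist:
from sklearn.svm import SVR

dataset_1 = Dataset(X, y)                   # X and y assumed to be defined
dataset_1.trainTestSplit(test_size=0.3)     # must be called before add_model
dataset_1.add_model('svr', SVR())
print(dataset_1.models['svr']['yPredict'])  # predictions on the test set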

How to new a class inside an object?

I'm trying out Nak (a machine learning package for Scala). However, it doesn't provide easy access to basic methods like NaiveBayes or Maximum Entropy. I want to do it manually, but I failed to create an instance of the NaiveBayes class. Part of their NaiveBayes code looks like this:
object NaiveBayes {
  class Trainer[L, T](wordSmoothing: Double = 0.05, classSmoothing: Double = 0.01) extends Classifier.Trainer[L, Counter[T, Double]] {
    type MyClassifier = NaiveBayes[L, T]
    override def train(data: Iterable[Example[L, Counter[T, Double]]]) = {
      new NaiveBayes(data, wordSmoothing, classSmoothing)
    }
  }
}
I can't access the Trainer class...and I don't know why. The full code can be found here:
https://github.com/scalanlp/nak/blob/master/src/main/scala/nak/classify/NaiveBayes.scala
I try to write code like:
Trainer train = new Trainer() or NaiveBayes.Trainer train = new ...
It's just not working...
Trainer takes type parameters, so you have to specify them if they can't be inferred. For example:
val trainer = new NaiveBayes.Trainer[???,???]()
where the question marks should be replaced by type arguments for L and T. According to the comments in Classifier.scala, L should be the type of your labels, and T should be the type of your observations.

Trying to understand Precedence Rule 4 in Scala In Depth Book

I am trying to understand the 4th rule on precedence of bindings on page 93 of Joshua Suereth's book Scala in Depth.
According to this rule:
definitions made available by a package clause not in the source file where the definition occurs have lowest precedence.
It is this rule that I intend to test.
So off I went and tried to follow Josh's train of thought on page 94. He creates a source file called externalbindings.scala, and I did the same, with some changes to it, as below:
package com.att.scala

class extbindings {
  def showX(x: Int): Int = {
    x
  }

  object x1 {
    override def toString = "Externally bound obj object in com.att.scala"
  }
}
Next he asks us to create another file that will allow us to test the above rule. I created a file called precedence.scala:
package com.att.scala

class PrecedenceTest { // Josh has an object here instead of a class
  def testPrecedence(): Unit = { // Josh has a main method instead of this
    testSamePackage()
    //testWildCardImport()
    //testExplicitImport()
    //testInlineDefinition()
  }

  println("First statement of Constructor")
  testPrecedence
  println("Last statement of Constructor")

  def testSamePackage() {
    val ext1 = new extbindings()
    val x = ext1.showX(100)
    println("x is " + x)
    println(obj1) // Eclipse complains here
  }
}
Now, Josh is able to print out the value of the object in his example simply by calling <package-name>.<object-name>.testSamePackage in the REPL.
His output is:
Externally bound x object in package test
In my version, the files are in Eclipse and I use the embedded Scala interpreter.
Eclipse complains right at println(obj1); it says: not found: value obj1
Am I doing something obviously wrong in setting up the test files?
I would like to be able to test the rule I mentioned above and get the output:
Externally bound obj object in package com.att.scala
I haven't read the book, so I'm not really sure whether your code shows what the book wants to tell you.
Nevertheless, the error message is correct: obj1 is not found because it doesn't exist. In your code it is called x1. Because it is a member of extbindings, you have to access it as a member of that class:
println(ext1.x1)
If x1 is defined outside of class extbindings, in the scope of package com.att.scala, you can access it directly:
println(x1)
If it is defined in another package, you have to put the package name in front:
println(com.att.scala2.x1)
To simplify things, you can import x1:
import ext1.x1
println(x1)
Finally, a tip to improve your code: name types in UpperCamelCase, e.g. extbindings -> Extbindings, x1 -> X1.
If you replace a singleton object with a class, you will need to create an instance of that class.

Python generating Python

I have a group of objects, each of which I want to store as its own text file. I would really like to store each one as a Python class definition that subclasses the main class I am creating. So I did some poking around and found a Python code generator on effbot.org. I experimented with it, and here's what I came up with:
#
# a Python code generator backend
#
# fredrik lundh, march 1998
#
# fredrik@pythonware.com
# http://www.pythonware.com
#
# Code taken from http://effbot.org/zone/python-code-generator.htm
class CodeGeneratorBackend:
    def begin(self, tab="\t"):
        self.code = []
        self.tab = tab
        self.level = 0

    def end(self):
        return "".join(self.code)

    def write(self, string):
        self.code.append(self.tab * self.level + string)

    def indent(self):
        self.level = self.level + 1

    def dedent(self):
        if self.level == 0:
            raise SyntaxError("internal error in code generator")
        self.level = self.level - 1
class Point:
    """Defines a Point. Has x and y."""
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def dump_self(self, filename):
        self.c = CodeGeneratorBackend()
        self.c.begin(tab="    ")
        self.c.write("class Point{0}{1}(Point):\n".format(self.x, self.y))
        self.c.indent()
        self.c.write('"""Defines a Point. Has x and y"""\n')
        self.c.write('def __init__(self, x={0}, y={1}):\n'.format(self.x, self.y))
        self.c.indent()
        self.c.write('self.x = {0}\n'.format(self.x))
        self.c.write('self.y = {0}\n'.format(self.y))
        self.c.dedent()
        self.c.dedent()
        f = open(filename, 'w')
        f.write(self.c.end())
        f.close()

if __name__ == "__main__":
    p = Point(3, 4)
    p.dump_self('demo.py')
That feels really ugly, is there a cleaner/better/more pythonic way to do this? Please note, this is not the class I actually intend to do this with, this is a small class I can easily mock up in not too many lines. Also, the subclasses don't need to have the generating function in them, if I need that again, I can just call the code generator from the superclass.
We use Jinja2 to fill in a template. It's much simpler.
The template looks a lot like Python code with a few {{something}} replacements in it.
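A minimal sketch of that approach, assuming Jinja2 is installed (the template itself is illustrative):
from jinja2 import Template

# An illustrative template that generates a Point subclass definition.
template = Template(
    "class Point{{ x }}{{ y }}(Point):\n"
    "    def __init__(self):\n"
    "        super().__init__(x={{ x }}, y={{ y }})\n"
)
print(template.render(x=3, y=4))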
This is pretty much the best way to generate Python source code. However, you can also generate executable Python code at runtime using the ast library: build up an abstract syntax tree, pass it to compile() to turn it into a code object, then use eval() (or exec()) to run it.
I'm not sure whether there is a convenient way to save the compiled code for use later, though (i.e. in a .pyc file).
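A tiny sketch of that runtime route:
import ast

# Parse source into an abstract syntax tree, compile it, then evaluate it.
tree = ast.parse("3 * 4", mode="eval")
code = compile(tree, filename="<generated>", mode="eval")
print(eval(code))  # 12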
Just read your comment to wintermute - i.e.:

What I have is a bunch of planets that I want to store each as their own text files. I'm not particularly attached to storing them as python source code, but I am attached to making them human-readable.
If that's the case, then it seems like you shouldn't need subclasses but should be able to use the same class and distinguish the planets via data alone. And in that case, why not just write the data to files and, when you need the planet objects in your program, read in the data to initialize the objects?
If you needed to do stuff like overriding methods, I could see writing out code - but shouldn't you just be able to have the same methods for all planets, just using different variables?
The advantage of just writing out the data (it can include label type info for readability that you'd skip when you read it in) is that non-Python programmers won't get distracted when reading them, you could use the same files with some other language if necessary, etc.
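For instance, a sketch of that data-only approach using JSON for human-readable files (the field names are illustrative), reusing the Point class from the question:
import json

# Write the planet's data out in a human-readable form...
with open("planet_mars.txt", "w") as f:
    json.dump({"x": 3, "y": 4}, f, indent=2)

# ...and rebuild the object from the data when the program needs it.
with open("planet_mars.txt") as f:
    data = json.load(f)
mars = Point(data["x"], data["y"])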
I'm not sure whether this is especially Pythonic, but you could use operator overloading:
class CodeGenerator:
    def __init__(self, indentation='\t'):
        self.indentation = indentation
        self.level = 0
        self.code = ''

    def indent(self):
        self.level += 1

    def dedent(self):
        if self.level > 0:
            self.level -= 1

    def __add__(self, value):
        temp = CodeGenerator(indentation=self.indentation)
        temp.level = self.level
        temp.code = str(self) + ''.join([self.indentation for i in range(0, self.level)]) + str(value)
        return temp

    def __str__(self):
        return str(self.code)

a = CodeGenerator()
a += 'for a in range(1, 3):\n'
a.indent()
a += 'for b in range(4, 6):\n'
a.indent()
a += 'print(a * b)\n'
a.dedent()
a += '# pointless comment\n'
print(a)
This is, of course, far more expensive to implement than your example, and I would be wary of too much meta-programming, but it was a fun exercise. You can extend or use this as you see fit; how about:
adding a write method and redirecting stdout to an object of this class to print straight to a script file
inheriting from it to customise output
adding attribute getters and setters
Would be great to hear about whatever you go with :)
From what I understand you are trying to do, I would consider using reflection to dynamically examine a class at runtime and generate output based on that. There is a good tutorial on reflection (A.K.A. introspection) at http://diveintopython3.ep.io/.
You can use the dir() function to get a list of the names of the attributes of a given object. The docstring of an object is accessible via its __doc__ attribute. That is, if you want to look at the docstring of a function or class, you can do the following:
>>> def foo():
...     """A doc string comment."""
...     pass
...
>>> print(foo.__doc__)
A doc string comment.
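And, for instance, a quick look at dir() using the Point class from the question (the filter just hides the underscore attributes):
>>> p = Point(3, 4)
>>> [name for name in dir(p) if not name.startswith('_')]
['dump_self', 'x', 'y']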