Cannot unpickle some data just pickled with python 3 with a cycle

Cannot unpickle some data just pickled with python 3 with a cycle - hash

I have a quite complicated structure that I pickle fine but have problem unpickling.
Roughly, I pickle an object of class Grid that has a Cell (a dictionnary of cell). An object caterpillar includes a cell, and a cell has a list of caterpillar. Yes, it cycles.
Pickling un pickling goes fine until the bit of code tagged "ADDING THIS CODE CAUSES PROBLEM" is inserted.
In this case I get this error :
Traceback (most recent call last):
File "./issue.py", line 84, in <module>
main()
File "./issue.py", line 79, in main
grid2 = pickle.load(handle2)
File "./issue.py", line 25, in __hash__
return hash(self._position)
AttributeError: 'Cell' object has no attribute '_position'
I understand the problem comes from the cycle and the hashing, but I need both.
After perusing issues on stack overflow, I tried playing with :
object.__getstate__()
object.__setstate__(state)
for no change at all.
=> How I can I get my object to pickle/unpickle ?
Thanks for the help !
I simplified my code as much as possible to get to the point.
Below the code.
#!/usr/bin/env python3
import typing
import pickle
class Cell:
def __init__(self, position: int):
self._position = position
self._possible_occupants = set() # type: typing.Set['Caterpillar']
#property
def position(self) -> int:
return self._position
#property
def possible_occupants(self) -> typing.Set['Caterpillar']:
return self._possible_occupants
#possible_occupants.setter
def possible_occupants(self, possible_occupants: typing.Set['Caterpillar']) -> None:
self._possible_occupants = possible_occupants
def __hash__(self) -> int:
return hash(self._position)
def __eq__(self, other: 'Cell') -> bool: #type: ignore
return self._position == other.position
class Caterpillar:
def __init__(self, cell: typing.Optional[Cell] = None):
self._cells = frozenset([cell]) if cell else frozenset([])
class Grid:
def __init__(self, file_name: str):
self._cell_table = dict() # type: typing.Dict[int, Cell]
# create a cell
cell = Cell(0)
# put in grid
self._cell_table[0] = cell
# create caterpillar
caterpillar = Caterpillar(cell)
# put back link cell -> caterpillar
# ADDING THIS CODE CAUSES THE PROBLEM !!!
cell.possible_occupants.add(caterpillar)
#property
def cell_table(self) -> int:
""" property """
return self._cell_table
def __eq__(self, other):
return self._cell_table == other.cell_table
def main() -> None:
input_file = 'tiny_grid.csv'
pickle_file = input_file + ".pickle"
grid = Grid(input_file)
with open(pickle_file, 'wb') as handle:
pickle.dump(grid, handle, protocol=pickle.HIGHEST_PROTOCOL)
with open(pickle_file, 'rb') as handle2:
grid2 = pickle.load(handle2)
print(f"are equal : {grid2 == grid}")
if __name__ == '__main__':
main()

Ok, I eventually managed to solve this. (thanks Maks anyway)
It seems (anyone can correct me if I am wrong) that you cannot pickle (unpickle in fact) an object - or an object contaiing an object - that has a user defined hash function (from this experience)
The object was in a set, so needed to have an hash. However, I had defined an eq for the object. Seems the eq dispels the native hash, so I had to put my own hash, so ... I could not pickle...ans posted here.
I solved the issue by removing the eq and the hash so the equality was the identity (address) and changed the code to take care of that (by explicitly testing equality by value)
Note that defining equality, not only forces to define hash (with the inconvenience below) but also lowers the performances by introducing a new layer.
By the way, my real issue was not about pickling, but about multiprocessing. Since I had a pool of workers, the multiprocessing library seems to pickle my data to pass them other (or more probably back). I submitted here the pickling issue to reduce the problem.
Lesson learned : do not hash if you pickle !!!
(Little side note : makes a nice pun in french since "hash" sounds like drugs and "pickle" sounds like drinking alcohol)

Related

Why I cannot extend a scipy rv_discrete class successfully?

I'm trying to extend the rv_discrete scipy class, as it is supposed to work in every case while extending a class.
I just want to add a couple of instance attributes.
from scipy.stats import rv_discrete
class Distribution(rv_discrete):
def __init__(self, realization):
self._realization = realization
self.num = len(realization)
#stuff to obtain random alphabet and probabilities from realization
super().__init__(values=(alphabet,probabilities))
This should allow me to do something like this :
realization = #some values
dist = Distribution(realization)
print(dist.mean())
Instead, I receive this error
ValueError: rv_discrete.__init__(..., values != None, ...)
If I simply create a new rv_discrete object as in the following line of code
dist = rv_discrete(values=(alphabet,probabilities))
It works just fine.
Any idea why? Thank you for your help

Run a script and get each lines result

I want to write a simple Scala script that runs some methods that are defined in another file.
Each line that runs this method requires information that won't be available until runtime. For simplicity sakes, I want to abstract that portion out.
Thus, I want to use Currying to get the result of each line in the script, then run the result again with the extra data.
object TestUtil {
// "control" is not known until runtime
def someTestMethod(x: Int, y: Int)(control: Boolean): Boolean = {
if (control) {
assert(x == y)
x == y
} else {
assert(x > y)
x > y
}
}
}
someTestMethod is defined in my primary codebase.
// testScript.sc
import <whateverpath>.TestUtil
TestUtil.someTestMethod(2,1)
TestUtil.someTestMethod(5,5)
Each line should return a function, that I need to rerun with a Boolean.
val control: Boolean = true
List[(Boolean) -> Boolean) testFuncs = runFile("testScript.sc")
testFuncs.foreach(_(control)) // Run all the functions that testScripts defined
(Sorry if this is a weird example, it's the simplest thing I can think of)
So far I have figured out how to parse the script, and get the Tree. However at that point I can't figure out how to execute each individual tree object and get the result. This is basically where I'm stuck!
val settings = new scala.tools.nsc.GenericRunnerSettings(println)
settings.usejavacp.value = true
settings.nc.value = true
val interpreter: IMain = new IMain(settings)
val treeResult: Option[List[Tree]] = interpreter.parse(
"""true
| 5+14""".stripMargin)
treeResult.get.foreach((tree: Tree) => println(tree))
the result of which is
true
5.$plus(14)
where 5 + 14 has not been evaluated, and I can't figure out how, or if this is even a worthwhile route to pursure

In the worst case you could toString your modified tree and then call interpreter.interpret:
val results = treeResult.get.map { tree: Tree => interpreter.interpret(tree.toString) }
It seems like it would be better to have the other file evaluate to something that can then be interpreted passing control (e.g. using the scalaz Reader Monad and for/yield syntax). But I assume you have good reasons for wanting to do this via an extra scala interpreter.

Strongly typed access to csv in scala?

I would like to access csv files in scala in a strongly typed manner. For example, as I read each line of the csv, it is automatically parsed and represented as a tuple with the appropriate types. I could specify the types beforehand in some sort of schema that is passed to the parser. Are there any libraries that exist for doing this? If not, how could I go about implementing this functionality on my own?

product-collections appears to be a good fit for your requirements:
scala> val data = CsvParser[String,Int,Double].parseFile("sample.csv")
data: com.github.marklister.collections.immutable.CollSeq3[String,Int,Double] =
CollSeq((Jan,10,22.33),
(Feb,20,44.2),
(Mar,25,55.1))
product-collections uses opencsv under the hood.
A CollSeq3 is an IndexedSeq[Product3[T1,T2,T3]] and also a Product3[Seq[T1],Seq[T2],Seq[T3]] with a little sugar. I am the author of product-collections.
Here's a link to the io page of the scaladoc
Product3 is essentially a tuple of arity 3.

If your content has double-quotes to enclose other double quotes, commas and newlines, I would definitely use a library like opencsv that deals properly with special characters. Typically you end up with Iterator[Array[String]]. Then you use Iterator.map or collect to transform each Array[String] into your tuples dealing with type conversions errors there. If you need to do process the input without loading all in memory, you then keep working with the iterator, otherwise you can convert to a Vector or List and close the input stream.
So it may look like this:
val reader = new CSVReader(new FileReader(filename))
val iter = reader.iterator()
val typed = iter collect {
case Array(double, int, string) => (double.toDouble, int.toInt, string)
}
// do more work with typed
// close reader in a finally block
Depending on how you need to deal with errors, you can return Left for errors and Right for success tuples to separate the errors from the correct rows. Also, I sometimes wrap of all this using scala-arm for closing resources. So my data maybe wrapped into the resource.ManagedResource monad so that I can use input coming from multiple files.
Finally, although you want to work with tuples, I have found that it is usually clearer to have a case class that is appropriate for the problem and then write a method that creates that case class object from an Array[String].

You can use kantan.csv, which is designed with precisely that purpose in mind.
Imagine you have the following input:
1,Foo,2.0
2,Bar,false
Using kantan.csv, you could write the following code to parse it:
import kantan.csv.ops._
new File("path/to/csv").asUnsafeCsvRows[(Int, String, Either[Float, Boolean])](',', false)
And you'd get an iterator where each entry is of type (Int, String, Either[Float, Boolean]). Note the bit where the last column in your CSV can be of more than one type, but this is conveniently handled with Either.
This is all done in an entirely type safe way, no reflection involved, validated at compile time.
Depending on how far down the rabbit hole you're willing to go, there's also a shapeless module for automated case class and sum type derivation, as well as support for scalaz and cats types and type classes.
Full disclosure: I'm the author of kantan.csv.

I've created a strongly-typed CSV helper for Scala, called object-csv. It is not a fully fledged framework, but it can be adjusted easily. With it you can do this:
val peopleFromCSV = readCSV[Person](fileName)
Where Person is case class, defined like this:
case class Person (name: String, age: Int, salary: Double, isNice:Boolean = false)
Read more about it in GitHub, or in my blog post about it.

Edit: as pointed out in a comment, kantan.csv (see other answer) is probably the best as of the time I made this edit (2020-09-03).
This is made more complicated than it ought to because of the nontrivial quoting rules for CSV. You probably should start with an existing CSV parser, e.g. OpenCSV or one of the projects called scala-csv. (There are at least three.)
Then you end up with some sort of collection of collections of strings. If you don't need to read massive CSV files quickly, you can just try to parse each line into each of your types and take the first one that doesn't throw an exception. For example,
import scala.util._
case class Person(first: String, last: String, age: Int) {}
object Person {
def fromCSV(xs: Seq[String]) = Try(xs match {
case s0 +: s1 +: s2 +: more => new Person(s0, s1, s2.toInt)
})
}
If you do need to parse them fairly quickly and you don't know what might be there, you should probably use some sort of matching (e.g. regexes) on the individual items. Either way, if there's any chance of error you probably want to use Try or Option or somesuch to package errors.

I built my own idea to strongly typecast the final product, more than the reading stage itself..which as pointed out might be better handled as stage one with something like Apache CSV, and stage 2 could be what i've done. Here's the code you are welcome to it. The idea is to typecast the CSVReader[T] with type T .. upon construction, you must supply the reader with a Factor object of Type[T] as well. The idea here is that the class itself (or in my example a helper object) decides the construction detail and thus decouples this from the actual reading. You could use Implicit objects to pass the helper around but I've not done that here. The only downside is that each row of the CSV must be of the same class type, but you could expand this concept as needed.
class CsvReader/**
* #param fname
* #param hasHeader : ignore header row
* #param delim : "\t" , etc
*/
[T] ( factory:CsvFactory[T], fname:String, delim:String) {
private val f = Source.fromFile(fname)
private var lines = f.getLines //iterator
private var fileClosed = false
if (lines.hasNext) lines = lines.dropWhile(_.trim.isEmpty) //skip white space
def hasNext = (if (fileClosed) false else lines.hasNext)
lines = lines.drop(1) //drop header , assumed to exist
/**
* also closes the file
* #return the line
*/
def nextRow ():String = { //public version
val ans = lines.next
if (ans.isEmpty) throw new Exception("Error in CSV, reading past end "+fname)
if (lines.hasNext) lines = lines.dropWhile(_.trim.isEmpty) else close()
ans
}
//def nextObj[T](factory:CsvFactory[T]): T = past version
def nextObj(): T = { //public version
val s = nextRow()
val a = s.split(delim)
factory makeObj a
}
def allObj() : Seq[T] = {
val ans = scala.collection.mutable.Buffer[T]()
while (hasNext) ans+=nextObj()
ans.toList
}
def close() = {
f.close;
fileClosed = true
}
} //class
next the example Helper Factory and example "Main"
trait CsvFactory[T] { //handles all serial controls (in and out)
def makeObj(a:Seq[String]):T //for reading
def makeRow(obj:T):Seq[String]//the factory basically just passes this duty
def header:Seq[String] //must define headers for writing
}
/**
* Each class implements this as needed, so the object can be serialized by the writer
*/
case class TestRecord(val name:String, val addr:String, val zip:Int) {
def toRow():Seq[String] = List(name,addr,zip.toString) //handle conversion to CSV
}
object TestFactory extends CsvFactory[TestRecord] {
def makeObj (a:Seq[String]):TestRecord = new TestRecord(a(0),a(1),a(2).toDouble.toInt)
def header = List("name","addr","zip")
def makeRow(o:TestRecord):Seq[String] = {
o.toRow.map(_.toUpperCase())
}
}
object CsvSerial {
def main(args: Array[String]): Unit = {
val whereami = System.getProperty("user.dir")
println("Begin CSV test in "+whereami)
val reader = new CsvReader(TestFactory,"TestCsv.txt","\t")
val all = reader.allObj() //read the CSV info a file
sd.p(all)
reader.close
val writer = new CsvWriter(TestFactory,"TestOut.txt", "\t")
for (x<-all) writer.printObj(x)
writer.close
} //main
}
Example CSV (tab seperated.. might need to repair if you copy from an editor)
Name Addr Zip "Sanders, Dante R." 4823 Nibh Av. 60797.00 "Decker, Caryn G." 994-2552 Ac Rd. 70755.00 "Wilkerson, Jolene Z." 3613 Ultrices. St. 62168.00 "Gonzales, Elizabeth W." "P.O. Box 409, 2319 Cursus. Rd." 72909.00 "Rodriguez, Abbot O." Ap #541-9695 Fusce Street 23495.00 "Larson, Martin L." 113-3963 Cras Av. 36008.00 "Cannon, Zia U." 549-2083 Libero Avenue 91524.00 "Cook, Amena B." Ap
#668-5982 Massa Ave 69205.00
And finally the writer (notice the factory methods require this as well with "makerow"
import java.io._
class CsvWriter[T] (factory:CsvFactory[T], fname:String, delim:String, append:Boolean = false) {
private val out = new PrintWriter(new BufferedWriter(new FileWriter(fname,append)));
if (!append) out.println(factory.header mkString delim )
def flush() = out.flush()
def println(s:String) = out.println(s)
def printObj(obj:T) = println( factory makeRow(obj) mkString(delim) )
def printAll(objects:Seq[T]) = objects.foreach(printObj(_))
def close() = out.close
}

If you know the the # and types of fields, maybe like this?:
case class Friend(id: Int, name: String) // 1, Fred
val friends = scala.io.Source.fromFile("friends.csv").getLines.map { line =>
val fields = line.split(',')
Friend(fields(0).toInt, fields(1))
}

summation in python

I am getting an error....and I know what I'm doing wrong, but not sure how to fix it. I understand I can't add a string to a integer...Any ideas, I'd be grateful!
self.variables['gas'] = 'gas'+add
TypeError: Can't convert 'int' object to str implicitly
My code:
class Cars:
def __init__(self, **kwargs):
self.variables = kwargs
def set_Variable(self, k, v):
self.variables[k] = v
def get_Variable(self, k):
return self.variables.get(k, None)
def add_gas(self, add, gas):
self.variables['gas'] = gas+add
def main():
mercedes = Cars(gas = 3)
print (mercedes.get_Variable('gas'))
print(mercedes.add_gas(4))
if __name__ == "__main__": main()

Your method add_gas needs two arguments, you're only passing one to the function. Also you are printing the returning value though there is no return in the function.
This one worked for me:
class Cars:
def __init__(self, **kwargs):
self.variables = kwargs
def set_Variable(self, k, v):
self.variables[k] = v
def get_Variable(self, k):
return self.variables[k]
def add_gas(self, add):
self.variables['gas'] +=add
def main():
mercedes = Cars(gas = 3)
print (mercedes.get_Variable('gas'))
mercedes.add_gas(4)
print (mercedes.get_Variable('gas'))
print ('There are %s gallons of gas in the car.' % mercedes.get_Variable('gas'))
if __name__ == "__main__":
main()

So, I see two independent problems with your code:
In your error message you showed this line:
self.variables['gas'] = 'gas'+add
While in your code, the line is this:
self.variables['gas'] = gas+add
That are two different meanings. The first tries to concat the variable add to the string 'gas' which will fail as add is an integer and Python won’t convert integers to strings implicitely (the error told you that). So you would want to do 'gas' + str(add).
In your main function, you call the add_gas method like this:
mercedes.add_gas(4)
But you actually defined the method with two parameters, add and gas, so this will fail.
What I think is what you want to do is to increase the self.variables['gas'] value by add when the method is called, so you would want to write your method like this:
def add_gas(self, add):
self.variables['gas'] = self.variables['gas'] + add
Note that you can shorten that using the += operator to just this:
def add_gas(self, add):
self.variables['gas'] += add
And finally, I don’t know why you would want to use a dictionary for your locale instance variables, but unless you have a good reason to do that, why not just define a gas property for your car type?

Python generating Python

I have a group of objects which I am creating a class for that I want to store each object as its own text file. I would really like to store it as a Python class definition which subclasses the main class I am creating. So, I did some poking around and found a Python Code Generator on effbot.org. I did some experimenting with it and here's what I came up with:
#
# a Python code generator backend
#
# fredrik lundh, march 1998
#
# fredrik#pythonware.com
# http://www.pythonware.com
#
# Code taken from http://effbot.org/zone/python-code-generator.htm
import sys, string
class CodeGeneratorBackend:
def begin(self, tab="\t"):
self.code = []
self.tab = tab
self.level = 0
def end(self):
return string.join(self.code, "")
def write(self, string):
self.code.append(self.tab * self.level + string)
def indent(self):
self.level = self.level + 1
def dedent(self):
if self.level == 0:
raise SyntaxError, "internal error in code generator"
self.level = self.level - 1
class Point():
"""Defines a Point. Has x and y."""
def __init__(self, x, y):
self.x = x
self.y = y
def dump_self(self, filename):
self.c = CodeGeneratorBackend()
self.c.begin(tab=" ")
self.c.write("class {0}{1}Point()\n".format(self.x,self.y))
self.c.indent()
self.c.write('"""Defines a Point. Has x and y"""\n')
self.c.write('def __init__(self, x={0}, y={1}):\n'.format(self.x, self.y))
self.c.indent()
self.c.write('self.x = {0}\n'.format(self.x))
self.c.write('self.y = {0}\n'.format(self.y))
self.c.dedent()
self.c.dedent()
f = open(filename,'w')
f.write(self.c.end())
f.close()
if __name__ == "__main__":
p = Point(3,4)
p.dump_self('demo.py')
That feels really ugly, is there a cleaner/better/more pythonic way to do this? Please note, this is not the class I actually intend to do this with, this is a small class I can easily mock up in not too many lines. Also, the subclasses don't need to have the generating function in them, if I need that again, I can just call the code generator from the superclass.

We use Jinja2 to fill in a template. It's much simpler.
The template looks a lot like Python code with a few {{something}} replacements in it.

This is pretty much the best way to generate Python source code. However, you can also generate Python executable code at runtime using the ast library. You can build code using the abstract syntax tree, then pass it to compile() to compile it into executable code. Then you can use eval() to run the code.
I'm not sure whether there is a convenient way to save the compiled code for use later though (ie. in a .pyc file).

Just read your comment to wintermute - ie:
What I have is a bunch of planets that
I want to store each as their own text
files. I'm not particularly attached
to storing them as python source code,
but I am attached to making them
human-readable.
If that's the case, then it seems like you shouldn't need subclasses but should be able to use the same class and distinguish the planets via data alone. And in that case, why not just write the data to files and, when you need the planet objects in your program, read in the data to initialize the objects?
If you needed to do stuff like overriding methods, I could see writing out code - but shouldn't you just be able to have the same methods for all planets, just using different variables?
The advantage of just writing out the data (it can include label type info for readability that you'd skip when you read it in) is that non-Python programmers won't get distracted when reading them, you could use the same files with some other language if necessary, etc.

I'm not sure whether this is especially Pythonic, but you could use operator overloading:
class CodeGenerator:
def __init__(self, indentation='\t'):
self.indentation = indentation
self.level = 0
self.code = ''
def indent(self):
self.level += 1
def dedent(self):
if self.level > 0:
self.level -= 1
def __add__(self, value):
temp = CodeGenerator(indentation=self.indentation)
temp.level = self.level
temp.code = str(self) + ''.join([self.indentation for i in range(0, self.level)]) + str(value)
return temp
def __str__(self):
return str(self.code)
a = CodeGenerator()
a += 'for a in range(1, 3):\n'
a.indent()
a += 'for b in range(4, 6):\n'
a.indent()
a += 'print(a * b)\n'
a.dedent()
a += '# pointless comment\n'
print(a)
This is, of course, far more expensive to implement than your example, and I would be wary of too much meta-programming, but it was a fun exercise. You can extend or use this as you see fit; how about:
adding a write method and redirecting stdout to an object of this to print straight to a script file
inheriting from it to customise output
adding attribute getters and setters
Would be great to hear about whatever you go with :)

From what I understand you are trying to do, I would consider using reflection to dynamically examine a class at runtime and generate output based on that. There is a good tutorial on reflection (A.K.A. introspection) at http://diveintopython3.ep.io/.
You can use the dir() function to get a list of names of the attributes of a given object. The doc string of an object is accessible via the __doc__ attribute. That is, if you want to look at the doc string of a function or class you can do the following:
>>> def foo():
... """A doc string comment."""
... pass
...
>>> print foo.__doc__
A doc string comment.

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

Cannot unpickle some data just pickled with python 3 with a cycle - hash

Related

Why I cannot extend a scipy rv_discrete class successfully?

Run a script and get each lines result

Strongly typed access to csv in scala?

summation in python

Python generating Python

Categories

Resources