How can I run function after each fold in cross_val_score() from scikit-learn?

How can I run function after each fold in cross_val_score() from scikit-learn? - callback

I want to do cross-validation of my Keras neural networks with scikit-learn's cross_val_score() function.
The problem is that after each fold not only result is remembered, but also entire Keras model. So I would like to clear this model using K.clear_session() after each fold. But this are just details for context.
My main question is: How can I run custom function after each fold with cross_val_score() from scikit-learn? In other words: It is possible to run callback which should be run after each fold? Or there exists other workarounds?

You can probably create a custom callback and re write the on_train_end(self,logs={}) method of this callback. This new method will do stuff at the end of each training step. Something like that :
class CustomCall(Callback):
def __init__(self):
super(CustomCall, self).__init__()
def on_epoch_begin(self, epoch, logs={}):
return
def on_epoch_end(self, epoch, logs={}):
return
def on_batch_begin(self, batch, logs={}):
return
def on_train_end(self, logs={}):
# Stuff here
print('\n Delete previous trained model : ')
K.clear_session()
return

Related

Using fixtures at collect time in pytest

I use testinfra with ansible transport. It provides host fixture which has ansible, so I can do host.ansible.get_variables().
Now I need to create a parametrization of test based on value from this inventory.
Inventory:
foo:
hosts:
foo1:
somedata:
- data1
- data2
I want to write a test which tests each of 'data' from somedata for each host in inventory. 'Each host' part is handled by testnfra, but I'm struggling with parametrization of the test:
#pytest.fixture
def somedata(host):
return host.ansible.get_variables()["somedata"]
#pytest.fixture(params=somedata):
def data(request):
return request.param
def test_data(host, data):
assert 'data' in data
I've tried both ways:
#pytest.fixture(params=somedata) -> TypeError: 'function' object is not iterable
#pytest.fixture(params=somedata()) -> Fixture "somedata" called directly. Fixtures are not meant to be called directly...
How can I do this? I understand that I can't change the number of tests at test time, but I pretty sure I have the same inventory at collection time, so, theoretically, it can be doable...

After reading a lot of source code I have came to conclusion, that it's impossible to call fixtures at collection time. There are no fixtures at collection time, and any parametrization should happen before any tests are called. Moreover, it's impossible to change number of tests at test time (so no fixture could change that).
Answering my own question on using Ansible inventory to parametrize a test function: It's possible, but it requires manually reading inventory, hosts, etc. There is a special hook for that: pytest_generate_tests (it's a function, not a fixture).
My current code to get any test parametrized by host_interface fixture is:
def cartesian(hosts, ar):
for host in hosts:
for interface in ar.get_variables(host).get("interfaces",[]):
yield (host, interface)
def pytest_generate_tests(metafunc):
if 'host_interface' in metafunc.fixturenames:
inventory_file = metafunc.config.getoption('ansible_inventory')
ansible_config = testinfra.utils.ansible_runner.get_ansible_config()
inventory = testinfra.utils.ansible_runner.get_ansible_inventory(ansible_config, inventory_file)
ar = testinfra.utils.ansible_runner.AnsibleRunner(inventory_file)
hosts = ar.get_hosts(metafunc.config.option.hosts)
metafunc.parametrize("host_interface", cartesian(hosts, ar))

You should use helper function instead of fixture to parametrize another fixture. Fixtures can not be used as decorator parameters in pytest.
def somedata(host):
return host.ansible.get_variables()["somedata"]
#pytest.fixture(params=somedata()):
def data(request):
return request.param
def test_data(host, data):
assert 'data' in data
This assumes that the host is not a fixture.
If the host is a fixture, there is hacky way to get around the problem. You should write the parameters to a tmp file or in a environment variable and read it with a helper function.
import os
#pytest.fixture(autouse=True)
def somedata(host):
os.environ["host_param"] = host.ansible.get_variables()["somedata"]
def get_params():
return os.environ["host_param"] # do some clean up to return a list instead of a string
#pytest.fixture(params=get_params()):
def data(request):
return request.param
def test_data(host, data):
assert 'data' in data

Keras get model outputs after each batch

I'm using a generator to make sequential training data for a hierarchical recurrent model, which needs the outputs of the previous batch to generate the inputs for the next batch. This is a similar situation to the Keras argument stateful=True which saves the hidden states for the next batch, except it's more complicated so I can't just use that as-is.
So far I tried putting a hack in the loss function:
def custom_loss(y_true, y_pred):
global output_ref
output_ref[0] = y_pred[0].eval(session=K.get_session())
output_ref[1] = y_pred[1].eval(session=K.get_session())
but that didn't compile and I hope there's a better way. Will Keras callbacks be of any help?

Learned from here:
model.compile(optimizer='adam')
# hack after compile
output_layers = [ 'gru' ]
s_name = 's'
model.metrics_names += [s_name]
model.metrics_tensors += [layer.output for layer in model.layers if layer.name in output_layers]
class my_callback(Callback):
def on_batch_end(self, batch, logs=None):
s_pred = logs[s_name]
print('s_pred:', s_pred)
return
model.fit(..., callbacks=[my_callback()])

I use this in the Tensorflow version of Keras, but it should work in Keras without Tensorflow
import tensorflow as tf
class ModelOutput:
''' Class wrapper for a metric that stores the output passed to it '''
def __init__(self, name):
self.name = name
self.y_true = None
self.y_pred = None
def save_output(self, y_true, y_pred):
self.y_true = y_true
self.y_pred = y_pred
return tf.constant(True)
class ModelOutputCallback(tf.keras.callbacks.Callback):
def __init__(self, model_outputs):
tf.keras.callbacks.Callback.__init__(self)
self.model_outputs = model_outputs
def on_train_batch_end(self, batch, logs=None):
#use self.model_outputs to get the outputs here
model_outputs = [
ModelOutput('rbox_score_map'),
ModelOutput('rbox_shapes'),
ModelOutput('rbox_angles')
]
# Note the extra [] around m.save_output, this example is for a model with
# 3 outputs, metrics must be a list of lists if you type it out
model.compile( ..., metrics=[[m.save_output] for m in self.model_outputs])
model.fit(..., callbacks=[ModelOutputCallback(model_outputs)])

Future sequence to sequence of futures?

Suppose, I have an abstract "producer" instance:
trait Producer[T] {
def listObjectIds: Future[Seq[String]]
def getObject(id: String): Future[T]
}
and I need to apply some processing to each (or some) of the objects it yields.
So, I do something like:
producer
.listObjectIds
.map(maybeFilter)
.map(_.map(producer.getObject))
... and end up with Future[Seq[Future[T]]]
This is ok, but kinda cumbersome. I would like to get rid of the outer Future, and just have Seq[Future[T]], but can't think of a (non-blocking) transformation, that would let me do that.
Any ideas?

It's not possible to end up with a Seq[Future[T]]. See Reverse of Future.sequence.
But it is possible to end with a Future[Seq[T]]. Just call .flatMap(Future.sequence) on the Future[Seq[Future[T]].

How to convert an Akka Source to a scala.stream

I already have a Source[T], but I need to pass it to a function that requires a Stream[T].
I could .run the source and materialize everything to a list and then do a .toStream on the result but that removes the lazy/stream aspect that I want to keep.
Is this the only way to accomplish this or am I missing something?
EDIT:
After reading Vladimir's comment, I believe I'm approaching my issue in the wrong way. Here's a simple example of what I have and what I want to create:
// Given a start index, returns a list from startIndex to startIndex+10. Stops at 50.
def getMoreData(startIndex: Int)(implicit ec: ExecutionContext): Future[List[Int]] = {
println(s"f running with $startIndex")
val result: List[Int] = (startIndex until Math.min(startIndex + 10, 50)).toList
Future.successful(result)
}
So getMoreData just emulates a service which returns data by the page to me.
My first goal it to create the following function:
def getStream(startIndex: Int)(implicit ec: ExecutionContext): Stream[Future[List[Int]]]
where the next Future[List[Int]] element in the stream depends on the previous one, taking the last index read from the previous Future's value in the stream. Eg with a startIndex of 0:
getStream(0)(0) would return Future[List[0 until 10]]
getStream(0)(1) would return Future[List[10 until 20]]
... etc
Once I have that function, I then want to create a 2nd function to further map it down to a simple Stream[Int]. Eg:
def getFlattenedStream(stream: Stream[Future[List[Int]]]): Stream[Int]
Streams are beginning to feel like the wrong tool for the job and I should just write a simple loop instead. I liked the idea of streams because the consumer can map/modify/consume the stream as they see fit without the producer needing to know about it.

Scala Streams are a fine way of accomplishing your task within getStream; here is a basic way to construct what you're looking for:
def getStream(startIndex : Int)
(implicit ec: ExecutionContext): Stream[Future[List[Int]]] =
Stream
.from(startIndex, 10)
.map(getMoreData)
Where things get tricky is with your getFlattenedStream function. It is possible to eliminate the Future wrapper around your List[Int] values, but it will require an Await.result function call which is usually a mistake.
More often than not it is best to operate on the Futures and allow asynchronous operations to happen on their own. If you analyze your ultimate requirement/goal it is usually not necessary to wait on a Future.
But, iff you absolutely must drop the Future then here is the code that can accomplish it:
val awaitDuration = 10 Seconds
def getFlattenedStream(stream: Stream[Future[List[Int]]]): Stream[Int] =
stream
.map(f => Await.result(f, awaitDuration))
.flatten

Algorithm mixing

I have a class that extends Iterator and model an complex algorithm (MyAlgorithm1). Thus, the algorithm can advance step by step through the Next method.
class MyAlgorithm1(val c:Set) extends Iterator[Step] {
override def next():Step {
/* ... */
}
/* ... */
}
Now I want apply a different algorithm (MyAlgorithm2) in each pass of the first algorithm. The iterations of algorithm 1 and 2 should be inserted
class MyAlgorithm2(val c:Set) { /* ... */ }
How I can do this in the best way? Perhaps with some Trait?
UPDATE:
MyAlgorithm2 recives a set and transform it. MyAlgorithm1 too, but this is more complex and it's necessary runs step by step. The idea is run one step of MyAlgoirthm1 and then run MyAlgorithm2. Next step same. Really, MyAlgorithm2 simplifies the set and may be useful to simplify work of MyAlgorithm1.

As described, the problem can be solved with either inheritance or trait. For instance:
class MyAlgorithm1(val c:Set) extends Iterator[Step] {
protected var current = Step(c)
override def next():Step = {
current = process(current)
current
}
override def hasNext: Boolean = !current.set.isEmpty
private def process(s: Step): Step = s
}
class MyAlgorithm2(c: Set) extends MyAlgorithm1(c) {
override def next(): Step = {
super.next()
current = process(current)
current
}
private def process(s: Step): Step = s
}
With traits you could do something with abstract override, but designing it so that the result of the simplification gets fed to the first algorithm may be harder.
But, let me propose you are getting at the problem in the wrong way.
Instead of creating a class for the algorithm extending an iterator, you could define your algorithm like this:
class MyAlgorithm1 extends Function1[Step, Step] {
def apply(s: Step): Step = s
}
class MyAlgorithm2 extends Function1[Step, Step] {
def apply(s: Step): Step = s
}
The iterator then could be much more easily defined:
Iterator.iterate(Step(set))(MyAlgorithm1 andThen MyAlgorithm2).takeWhile(_.set.nonEmpty)

Extending Iterator is probably more work than you actually need to do. Let's roll back a bit.
You've got some stateful object of type MyAlgorithm1
val alg1 = new MyAlgorithm1(args)
Now you wish to repeatedly call some function on it, which will alter it's state and return some value. This is best modeled not by having your object implement Iterator, but rather by creating a new object that handles the iteration. Probably the easiest one in the Scala standard library is Stream. Here's an object which creates a stream of result from your algorithm.
val alg1Stream:Stream[Step] = Stream.continually(alg1.next())
Now if you wanted to repeatedly get results from that stream, it would be as easy as
for(step<-alg1Stream){
// do something
}
or equivalently
alg1Stream.forEach{
//do something
}
Now assume we've also encapsulated myAlgorithm2 as a stream
val alg2=new MyAlgorithm2(args)
val alg2Stream:Stream[Step] = Stream.continually(alg2.next())
Then we just need some way to interleave streams, and then we could say
for(step<-interleave(alg1Stream, algStream2)){
// do something
}
Sadly, a quick glance through the standard library, reveals no Stream interleaving function. Easy enough to write one
def interleave[A](stream1:Stream[A], stream2:Stream[A]):Stream[A] ={
var str1 = stream1
var str2 = stream2
var streamToUse = 1
Stream.continually{
if(streamToUse == 1){
streamToUse = 2
val out = str1.head
str1 = str1.tail
out
}else{
streamToUse = 1
val out = str2.head
str2 = str1.tail
out
}
}
}
That constructs a stream that repeatedly alternates between two streams, fetching the next result from the appropriate one and then setting up it's state for the next fetch. Note that this interleave only works for infinite streams, and we'd need a more clever one to handle streams that can end, but that's fine for the sake of the problem.

I have a class that extends Iterator and model an complex algorithm (MyAlgorithm1).
Well, stop for a moment there. An algorithm is not an iterator, so it doesn't make sense for it to extend Iterator.

It seems as if you might want to use a fold or a map instead, depending on exactly what it is that you want to do. This is a common pattern in functional programming: You generate a list/sequence/stream of something and then run a function on each element. If you want to run two functions on each element, you can either compose the functions or run another map.