Apply function to tf.keras.Model output to every sample in the batch - tf.keras

The problem which I am facing is that I am trying to apply a function to every sample in the model output of a keras model while not knowing what the batch size is.
To elaborate this further, I am creating a model on which I am going to use predict_on_batch for. Say the model looks like model = tf.keras.Model(x,y). The shape of y = (batch_size, output_shape), I am trying to apply a function to the output for every sample in the batch_size and return that as the model output

Related

Using Weka NaiveBayes with Matlab

I created a NaiveBayes model in Weka. I exported the model to disk. I now want to inject this model into MATLAB 2018, so that I can check how it performs via some data that I am receiving.
I load my model in MATLAB, by stating something like this:
loadedModel = weka.core.SerializationHelper.read('myweka.model');
I then create a Weka Instance object, and let it contain this data:
instance = infrequent,low,high,medium-high,high,medium,medium,low,low
If I run these two commands:
loadedModel.distributionForInstance(instance)
loadedModel.classifyInstance(instance)
I see the following output:
0.0001
0.9999
1
This is odd to me because if I observe the same record in WEKA ui, I see the same instance with probabilities 0.993 and 0.007, classified as '2'. (I can load the same model multiple times from disk in WEKA, and reproduce this behavior, which is correct) After further investigation, I noticed that regardless of the sequence of attributes my Instance object has, I always get the same probability output and the same classification by invoking the model via MATLAB.
There are some posts on the net that share the same problem, like these:
Always getting the same output
Weka - Classifier returns the same distribution for any input
However, the recommended solution to call 'instance.setClassMissing()' did not solve my issue. Is there anything I am missing, or can try to do in order to further troubleshoot the issue?
Does your test instance has same structure as your train set? If not, you need yo provide the same structure.
Weka indexes nominal attributes and stores the indices internally. So the nominal attributes order in train file is important. For example if your attribute is mapped as low=>0, high=>1 in training, you need to map them like this in your test set. Usually this is achieved by serializing the train header with the model.
Sample code for creating train header:
Instances trainHeader = new Instances(instances, 0);
trainHeader.setClassIndex(instances.classIndex());
When creating a new instance set its dataset:
Instance instance = ...
instance.setDataset(trainHeader);

How to Divide Drone Images dataset into Train & Test and Valid Parts for Faster R CNN in Matlab2018b

I have 297 Grayscale images and I would Like Divide Them into 3 parts (train-test and validation).
Ofcourse, I wrote some sample codes for example following codes from MathWorks (Object Detection Using Faster R-CNN Deep Learning)
% Split data into a training and test set.
idx = floor(0.6 * height(vehicleDataset));
trainingData = vehicleDataset(1:idx,:);
testData = vehicleDataset(idx:end,:);
But Matlab 2018a show the following error
Error:"Undefined function 'height' for input arguments of type
'struct'."
I would like to detect objects in images using "Faster R CNN" method and determine their locations in images.
Suppose your images are saved in the path "C:\Users\Student\Desktop\myImages"
First, create an imageDataStore object to manage a collection of image files.
datapath = "C:\Users\Student\Desktop\myImages";
imds = imageDatastore(datapath);%You may look at documentation for customizations.
[trainds,testds,valds] = splitEachLabel(imds,.6,.2);%Lets say 60% data for training, 20% for testing and 20% for validation
Now you have train data in the variable trainds and test data in the variable testds.
You can retrieve each images using readimage, say 5th image from train set as;
im = readimage(trainds,5);

Target Encoding in Python Model

I made a model in python and this uses target encoding. I used a dataset with 25000 rows and this gets divided into training and test data sets. The model is really working fine. However, I now want to run the model on totally fresh data - say just one row of data in an excel file. I need to know the code for it and will really appreciate it if someone can help. I am somewhat new to python. Here is the part of code I have written to create the training and test data sets from 25000 rows and train the model on training and predict on the test. However, I need the code that runs this model that uses target encoding to predict fresh data. If I need to post more code for greater clarity please let me know.
train_x, test_x, train_y, test_y = train_test_split(X, y, test_size=0.2)
rf = RandomForestClassifier(n_jobs=-1)
rf.fit(train_x.values, train_y.values)
pred_train = rf.predict(train_x.values)
pred = rf.predict(test_x.values)
Thanks
You might want to have have a look the comment section in this notebook-
here
"After we apply target encoding on the train data and target. We can have the result for one category like column A has a,b,c. Then we calculate the mean of each a,b,c in column A and apply it to the test data.We then apply it to test using the pd.merge function."

Using hidden activations in loss function

I want to create a custom loss function for a double-input double-output model in Keras that:
minimizes the reconstruction error of two autoencoders;
maximizes the correlation of the bottleneck features of the autoencoders.
For this I need to pass to the loss function:
both inputs;
both outputs / reconstructions;
output of intermediate layers for both (hidden activations).
I know I can pass both inputs and outputs to Model, but am struggling to find a way to pass the hidden activations.
I could create two new Models that have the output of the intermediate layers and pass that to loss, like:
intermediate_layer_model1 = Model(input=input1, output=autoencoder.get_layer('encoded1').output)
intermediate_layer_model2 = Model(input=input2, output=autoencoder.get_layer('encoded2').output)
autoencoder.compile(optimizer='adadelta', loss=loss(intermediate_layer_model1, intermediate_layer_model2))
But still, I would need to find a way to match the y_true in loss to the correct intermediate model.
What is the right way to approach this?
Edit
Here's an approach that I think should work. Simplified:
# autoencoder 1
input1 = Input(shape=(input_dim,))
encoded1 = Dense(encoding_dim, activation='relu', name='encoded1')(input1)
decoded1 = Dense(input_dim, activation='sigmoid', name='decoded1')(encoded1)
# autoencoder 2
input2 = Input(shape=(input_dim,))
encoded2 = Dense(encoding_dim, activation='relu', name='encoded2')(input2)
decoded2 = Dense(input_dim, activation='sigmoid', name='decoded2')(encoded2)
# merge encodings
merge_layer = merge([encoded1, encoded2], mode='concat', name='merge', concat_axis=1)
model = Model(input=[input1, input2], output=[decoded1, decoded2, merge_layer])
model.compile(optimizer='rmsprop', loss={
'decoded1': 'binary_crossentropy',
'decoded2': 'binary_crossentropy',
'merge': correlation,
})
Then in correlation I can split y_pred and do the calculations.
How about:
Defining a single model with a multiple outputs (be sure that you named a coding and reconstruction layer properly):
duo_model = Model(input=input, output=[coding_layer, reconstruction_layer])
Compiling your model with two different losses (or even performing a loss reweighting):
duo_model.compile(optimizer='rmsprop',
loss={'coding_layer': correlation_loss,
'reconstruction_layer': 'mse'})
Taking your final model as a:
encoder = Model(input=input, output=[coding_layer])
autoencoder = Model(input=input, output=[reconstruction_layer])
After proper compilation this should do the job.
When it comes to defining a proper correlation loss function there are two ways:
when coding layer and your output layer have the same dimension -
you could easly use predefinied cosine_proximity function from
Keras library.
when coding layer has different dimensonality -
you shoud first find embedding of coding vector and reconstruction vector to the same space and then - compute correlation there. Remember that this embedding should either be a Keras layer / function or Theano / Tensor flow operation (depending on which backend you are using). Of course you can compute both embedding and correlation function as a part of one loss function.

Partitioning dataset into train, test and validate subset (Matlab)

Im new to using Matlab and i'm trying to achieve the following situation:
I've one dataset of 7000+ entries. The goal is to train a classification tree (fitctree) on this data. I've seperated the data into a matrix with observations (predictors) and a matrix with classes(class). To partition the data i'm using cvpartition. Everything works fine up until this point.
Problem: I want to create three subsets with data: 1 training set, 1 validation set and 1 testing set. I want to train the tree using the training set, and validate its performance using the validation set. After tweeking the parameters I want to run the final test on the test data partition.
To partition the data I tried creating a cvpartition, which works, e.g.
cvpart = cvpartition(class, 'k', 10);
and then performing another cvpartition on that testing set, seperating this into another two sets:
cvpart2 = cvpartition(cvpart.TestSize, 'k', 10);
Sadly, when validating the performance of the tree this doesn't seem to work. When I skip the seond cvpartition, and validate the performance on the test set of cvpart, the model performs perfectly.
Update: after days I found that it seems to work when using it in this way:
cvpart2 = cvpartition(cvpart.TrainSize, 'k', 10);
Anyone care to explain me why it does work in this way, but not when using the test set?
Hope you guys can help me out;)
Kind regards.