pytorch lightning epoch_end/validation_epoch_end - neural-network

Could anybody break down this code and explain it to me? The parts that need explaining are marked with "# This part" comments. I would greatly appreciate any help, thanks!
def validation_epoch_end(self, outputs):
    batch_losses = [x["val_loss"] for x in outputs]  # This part
    epoch_loss = torch.stack(batch_losses).mean()
    batch_accs = [x["val_acc"] for x in outputs]  # This part
    epoch_acc = torch.stack(batch_accs).mean()
    return {'val_loss': epoch_loss.item(), 'val_acc': epoch_acc.item()}

def epoch_end(self, epoch, result):
    print("Epoch [{}], val_loss: {:.4f}, val_acc: {:.4f}".format(
        epoch, result['val_loss'], result['val_acc']))  # This part

Based on the structure, I assume you are using pytorch_lightning.
validation_epoch_end() collects the outputs of validation_step(), so outputs is a list of dicts whose length equals the number of batches in your validation dataloader. The first two lines marked # This part are therefore just unwrapping the per-batch results from your validation set.
epoch_end() then receives the result {'val_loss': epoch_loss.item(), 'val_acc': epoch_acc.item()} returned by validation_epoch_end(); a rough sketch of the whole flow is shown below.
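To make the call order concrete, here is a minimal sketch of how an evaluation loop typically ties these hooks together (this mirrors the common course-style training loop; evaluate, num_epochs, and val_loader are illustrative names, not part of any particular API):

def evaluate(model, val_loader):
    # one dict per validation batch ...
    outputs = [model.validation_step(batch) for batch in val_loader]
    # ... reduced to a single epoch-level dict
    return model.validation_epoch_end(outputs)

for epoch in range(num_epochs):
    # ... training steps for this epoch ...
    result = evaluate(model, val_loader)
    model.epoch_end(epoch, result)  # prints the epoch summary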

In your provided snippet, outputs is a list of dict elements, each of which seems to contain at least the keys "val_loss" and "val_acc". It is fair to assume they correspond to the validation loss and validation accuracy, respectively.
The two lines annotated with the # This part comment are list comprehensions iterating over the elements of the outputs list. The first gathers the value of the key "val_loss" for each element in outputs; the second does the same for the "val_acc" key.
A minimal example would be:
## before
outputs = [{'val_loss': tensor(a),  # element 0
            'val_acc': tensor(b)},
           {'val_loss': tensor(c),  # element 1
            'val_acc': tensor(d)}]

## after
batch_losses = [tensor(a), tensor(c)]
batch_accs = [tensor(b), tensor(d)]
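The subsequent torch.stack(...).mean() calls then collapse each list into one scalar tensor, and .item() turns that into a plain Python number. A runnable sketch with made-up values:

import torch

batch_losses = [torch.tensor(0.9), torch.tensor(0.7)]  # stand-ins for per-batch losses
stacked = torch.stack(batch_losses)                    # tensor([0.9000, 0.7000])
epoch_loss = stacked.mean()                            # tensor(0.8000)
print(epoch_loss.item())                               # ~0.8 as a plain Python float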

Related

Transforming `PCollection` with many elements into a single element

I am trying to convert a PCollection that has many elements into a PCollection that has one element. Basically, I want to go from:
[1,2,3,4,5,6]
to:
[[1,2,3,4,5,6]]
so that I can work with the entire PCollection in a DoFn.
I've tried CombineGlobally(lambda x: x), but only a portion of the elements get combined into an array at a time, giving me the following result:
[1,2,3,4,5,6] -> [[1,2],[3,4],[5,6]]
Or something to that effect.
This is the relevant portion of the script I'm trying to run:
import apache_beam as beam
from apache_beam.testing.test_pipeline import TestPipeline

raw_input = range(1024)

def run_test():
    # the with-block runs the pipeline on exit
    with TestPipeline() as test_pl:
        input = test_pl | "Create" >> beam.Create(raw_input)

        def combine(x):
            print(x)
            return x

        (
            input
            | "Global aggregation" >> beam.CombineGlobally(combine)
        )

run_test()
I figured out a pretty painless way to do this, which I missed in the docs:
The more general way to combine elements, and the most flexible, is
with a class that inherits from CombineFn.
CombineFn.create_accumulator(): This creates an empty accumulator. For
example, an empty accumulator for a sum would be 0, while an empty
accumulator for a product (multiplication) would be 1.
CombineFn.add_input(): Called once per element. Takes an accumulator
and an input element, combines them and returns the updated
accumulator.
CombineFn.merge_accumulators(): Multiple accumulators could be
processed in parallel, so this function helps merge them into a
single accumulator.
CombineFn.extract_output(): It allows you to do additional calculations
before extracting a result.
I suppose supplying a lambda function that simply passes through its argument to the "vanilla" CombineGlobally wouldn't do what I expected initially; that accumulate-and-merge behavior has to be specified by me (although I still think it's weird this isn't built into the API).
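(As an aside: unless I'm misreading the docs, the SDK does ship a built-in combiner for exactly this case. beam.combiners.ToList() gathers an entire PCollection into a single list element; a minimal sketch:)

import apache_beam as beam

with beam.Pipeline() as p:
    (
        p
        | beam.Create(range(6))
        | beam.combiners.ToList()  # emits one element: [0, 1, 2, 3, 4, 5]
        | beam.Map(print)
    )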
You can find more about subclassing CombineFn here, which I found very helpful:
A CombineFn specifies how multiple values in all or part of a
PCollection can be merged into a single value—essentially providing
the same kind of information as the arguments to the Python “reduce”
builtin (except for the input argument, which is an instance of
CombineFnProcessContext). The combining process proceeds as follows:
1. Input values are partitioned into one or more batches.
2. For each batch, the create_accumulator method is invoked to create a fresh initial "accumulator" value representing the combination of zero values.
3. For each input value in the batch, the add_input method is invoked to combine more values with the accumulator for that batch.
4. The merge_accumulators method is invoked to combine accumulators from separate batches into a single combined output accumulator value, once all of the accumulators have had all the input values in their batches added to them. This operation is invoked repeatedly, until there is only one accumulator value left.
5. The extract_output operation is invoked on the final accumulator to get the output value. Note: If this CombineFn is used with a transform that has defaults, apply will be called with an empty list at expansion time to get the default value.
So, by subclassing CombineFn, I wrote this simple implementation, Aggregated, that does exactly what I want:
import apache_beam as beam
from apache_beam.testing.test_pipeline import TestPipeline

raw_input = range(1024)

class Aggregated(beam.CombineFn):
    def create_accumulator(self):
        return []

    def add_input(self, accumulator, element):
        accumulator.append(element)
        return accumulator

    def merge_accumulators(self, accumulators):
        merged = []
        for a in accumulators:
            for item in a:
                merged.append(item)
        return merged

    def extract_output(self, accumulator):
        return accumulator

def run_test():
    # again, the with-block runs the pipeline on exit
    with TestPipeline() as test_pl:
        input = test_pl | "Create" >> beam.Create(raw_input)
        (
            input
            | "Global aggregation" >> beam.CombineGlobally(Aggregated())
            | "print" >> beam.Map(print)
        )

run_test()
You can also accomplish what you want with side inputs, e.g.
with beam.Pipeline() as p:
    pcoll = ...
    (p
     # Create a PCollection with a single element.
     | beam.Create([None])
     # This will process the singleton exactly once, with the entirety
     # of pcoll passed in as a second argument as a list.
     | beam.Map(
         lambda _, pcoll_as_side: ...,  # consume pcoll_as_side here
         pcoll_as_side=beam.pvalue.AsList(pcoll)))
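A self-contained sketch of that side-input pattern (the numbers and the print are only for illustration):

import apache_beam as beam

with beam.Pipeline() as p:
    pcoll = p | "Numbers" >> beam.Create([1, 2, 3, 4, 5, 6])
    (
        p
        | "Singleton" >> beam.Create([None])
        # the lambda runs exactly once; the whole pcoll arrives as one list
        | "Consume" >> beam.Map(
            lambda _, pcoll_as_side: print(pcoll_as_side),
            pcoll_as_side=beam.pvalue.AsList(pcoll))
    )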

How to use pattern matching callbacks in Dash for Julia

I'm new to Julia, but not so new to Dash; I'm trying to build my first app with Dash for Julia, but I can't seem to make a pattern matching callback work properly. Here's the part of the code that's giving me trouble:
callback!(
    app,
    Output((type= "filter_", index= ALL), "options"),
    Input("inputs", "data"),
    State((type= "filter_", index= ALL), "value"),
) do inputs, filters
    list_outs = []
    list_vals = []
    for i in 1:length(filters)
        push!(list_outs, [(label= input, value= input) for input in inputs])
    end
    return list_outs
end
What I'm trying to do here is to use the available inputs of the data set, already stored in "inputs", to set the filters' options, creating as many sets of options as there are dropdowns.
The problem here is, I guess, in the format of the output I'm returning: it says "Invalid number of output values for {"index":["ALL"],"type":"filter_"}.options. Expected 3, got 1"
Sadly, I found nothing of use about how to use pattern matching callbacks in Julia; I tried passing the output both as an array and as a tuple, but to no avail.
Any help is welcomed, thank you all!
This error is related to the fact that when a callback has a single Output, its return value is automatically wrapped in an array for uniform further processing, i.e., in your case, as [list_outs]. The fact that an Output with a match pattern is also treated as a single Output is my bug; I have filed an issue and will try to fix it in the near future.
Right now you can work around this problem by declaring the Output as an array:
using Dash
using DashHtmlComponents
using DashCoreComponents

app = dash()

app.layout = html_div() do
    dcc_input(id = "input", value = "A,B,C"),
    dcc_dropdown(id = (type="filter_", index = 1)),
    dcc_dropdown(id = (type="filter_", index = 2)),
    dcc_dropdown(id = (type="filter_", index = 3)),
    dcc_dropdown(id = (type="filter_", index = 4))
end

callback!(
    app,
    [Output((type= "filter_", index= ALL), "options")], # multiple outputs, in explicit form
    Input("input", "value"),
    State((type= "filter_", index= ALL), "value"),
) do input, filters
    inputs = split(input, ",")
    list_outs = []
    for i in 1:length(filters)
        push!(list_outs, [(label= input, value= input) for input in inputs])
    end
    return [list_outs] # accordingly, return the result inside an additional array
end

run_server(app, debug = true)

Read specific portions of an excel file based on string values in MATLAB

I have an Excel file and I need to read it based on string values in the 4th column. I have written the following, but it does not work properly:
[num,txt,raw] = xlsread('Coordinates','Centerville');
zn = {};
ctr = 0;
for i = 3:size(raw,1)
    tf = strcmp(char(raw{i,4}),char(raw{i-1,4}));
    if tf == 0
        ctr = ctr+1;
    end
    zn{ctr} = raw{i,4};
end
data = zeros(1,10); % 10 corresponds to the number of columns I want to read (herein, columns 'J' to 'S')
ctr = 0;
for j = 1:length(zn)
    for i = 3:size(raw,1)
        tf = strcmp(char(raw{i,4}),char(zn{j}));
        if tf == 1
            ctr = ctr+1;
            data(ctr,:,j) = num(i-2,10:19);
        end
    end
end
It gives me a "15129x10x22 double" and when I try to open it I get the message "Cannot display summaries of variables with more than 524288 elements". It might be obvious, but what I am trying to get as the output is N = length(zn) matrices, one for each distinct string in the 4th column (so I probably need a struct; I just don't know how to make it work). Any ideas on how I could fix this? Thanks!
Did not test it, but this should help you get going:
EDIT: corrected wrong indexing into the raw matrix. Also, depending on the format, you might want to restrict the rows of raw as well. From your question, I assume something like selector = raw(3:end,4); and data = raw(3:end,10:19); should be correct.
[~,~,raw] = xlsread('Coordinates','Centerville');
selector = raw(:,4);
data = raw(:,10:19);
[selector,~,grpidx] = unique(selector);
nGrp = numel(selector);
out = cell(nGrp,1);
for i = 1:nGrp
    idx = grpidx==i;
    out{i} = cell2mat(data(idx,:));
end
out is the output variable. The key here is the variable grpidx, an output of the unique function, which allows you to trace the unique values back to their positions in the original vector. Note that unique as used here may change the order of the string values. If that is an issue for you, use the setOrder parameter of the unique function and set it to 'stable'.

Convert dataset column to obsnames

I have many large dataset arrays in my workspace (loaded from a .mat file).
A minimal working example is like this:
>> disp(old_ds)
    Date      firm1    firm2    firm3    firm4
    734692    880,0    102,1    32,7     204,2
    734695    880,0    102,0    30,9     196,4
    734696    880,0    100,0    30,9     200,2
    734697    880,0    101,4    30,9     200,2
    734698    880,0    100,8    30,9     202,2
where the first row (the one with strings) already holds the headers of the dataset, that is, they are already displayed if I run old_ds.Properties.VarNames.
I'm wondering whether there is an easy and/or fast way to make the first column the ObsNames.
As a first approach, I've thought of "exporting" the data matrix (columns 2 to 5, in the example), the vector of dates and then creating a new dataset where the rows have names.
Namely:
>> mat = double(old_ds(:,2:5)); % taking the data, making it a matrix array
>> head = old_ds.Properties.VarNames % saving headers
>> head(1,1) = []; % getting rid of 'Date' from head
>> dates = dataset2cell(old_ds(:,1)); % taking dates as column cell array
>> dates(1) = []; % getting rid of 'Date' from dates
>> new_ds = mat2dataset(mat,'VarNames',head,'ObsNames',dates);
Apart from the fact that the last line returns the following error, ...
Error using setobsnames (line 25)
NEWNAMES must be a nonempty string or a cell array of nonempty strings.
Error in dataset (line 377)
a = setobsnames(a,obsnamesArg);
Error in mat2dataset (line 75)
d = dataset(vars{:},args{:});
...I would have found a solution, then created a function (so as to generalize the process for all 22 dataset arrays that I have) and then run the function 22 times (once for each dataset array).
To put things into perspective, each dataset has 7660 rows and a number of columns that ranges from 2 to 1320.
I have no idea about how I could (and if I could) make the dataset directly "eat" the first column as ObsNames.
Can anyone give me a hint?
EDIT: attached a sample file.
Actually, it should be quite easy (but the fact that I'm reading your question means that, having the same problem, I first googled it before looking up the documentation... ;)
When loading the dataset, use the following command (adjusted to your case of course):
cell_dat{1} = dataset('File', 'YourDataFile.csv', 'Delimiter', ';',...
'ReadObsNames', true);
The 'ReadObsNames' default is false. When set to true, it reads the first column as the observation names and saves the header of the first column in the file or range as the name of the first dimension in A.Properties.DimNames.
(see the Documentation, Section: "Name/value pairs available when using text files or Excel spreadsheets as inputs")
I can't download your sample file, but if you haven't solved the problem otherwise yet, just try the suggested solution and tell me if it works. Glad if I could help.
You are almost there: the error message you got is basically saying that ObsNames have to be strings. In your case the dates variable is a cell array containing doubles, so you just need to convert them to strings.
mat = double(piHU(:,2:end)); % taking the data, making it a matrix array
head = piHU.Properties.VarNames; % saving headers
head(1) = []; % getting rid of 'Date' from head
dates = dataset2cell(piHU(:,1)); % taking dates as a column cell array; the dates are doubles here (class(dates{2}) returns double)
dates(1) = []; % getting rid of 'Date' from dates
dates_str = cellfun(@(s) num2str(s), dates, 'UniformOutput', false); % convert dates to strings (class(dates_str{2}) now returns char)
new_ds = mat2dataset(mat,'VarNames',head,'ObsNames',dates_str); % construct the new dataset

Looping through documents in matlab

I am attempting to loop through the variable docs, which is a cell array that holds strings. I need to write a for loop that collects the terms into a cell array and then uses the commands lower and unique to create a dictionary.
Here is the code I've tried so far, and I just get errors:
docsLength = length(docs);
for C = 1:docsLength
    list = tokenize(docs, ' .,-');
    Mylist = [list;C];
end
I get these errors:
Error using textscan
First input must be of type double or string.
Error in tokenize (line 3)
C = textscan(str,'%s','MultipleDelimsAsOne',1,'delimiter',delimiters);
Error in tk (line 4)
list = tokenize(docs, ' .,-');
Generically, if you get a "must be of type" error, it means you are passing the wrong sort of input to a function. In this case you should look at the point in your code where this happens (here, in tokenize when textscan is called) and double-check that the input going in is what you expect it to be.
As tokenize is not a built-in MATLAB function, we can't say what those inputs should be unless you show us that code. However, as akfaz mentioned in the comments, it is likely that you want to pass docs{C} (a string) to tokenize instead of docs (a cell array). Otherwise there is no point in having a loop, as it just repeatedly passes the same input, docs, into the function.
There are additional problems with the loop:
Mylist = [list; C]; is overwritten on each iteration, ending up as the latest version of list plus C, which is just a number (the loop index). Depending on what the output of tokenize looks like, Mylist = [Mylist; list] may work, but you should initialise Mylist first:
Mylist = [];
for C = 1:length(docs)
    list = tokenize(docs{C}, ' .,-');
    Mylist = [Mylist; list];
end