Can't use object "aaa" outside of the cell where it's defined - datalore

I use the Datalore kernel at datalore.jetbrains.com. My notebook contains the following three cells (this is a minimal working example with which I was able to reproduce the error):
#%%
class MyClass:
    def __getattribute__(self, name):
        return 123
#%%
aaa = MyClass()
#%%
aaa
When I try to execute the third cell I get the error Can't use object "aaa" outside of the cell where it's defined. The message clearly implies that the variable aaa can only be used inside the second cell. But why does the Datalore kernel have such a limitation?

The short answer is that the Datalore kernel saves the runtime environment to disk after executing each cell.
Why does the Datalore kernel need to do this? Here comes the long answer. To understand the root cause of the issue, we need to know how the Datalore kernel executes cells.
It is easier to grasp if we forget everything we know about the Jupyter kernel. The Datalore kernel differs drastically from the Jupyter kernel because it is reproducible and incremental.
Reproducibility
Have you ever been in a situation where you needed to re-run all the cells in a notebook from the very beginning because you lost track of the order in which they were executed? Have you ever shared a notebook with somebody together with notes describing the cell execution order? With the Datalore kernel you don't need to do anything like that. It ensures that cells are always evaluated in exactly the same order, i.e. the order in which they are defined in the notebook. Whenever you execute the N-th cell, all the previous cells are automatically calculated by the Datalore kernel. You might think this must be extremely slow, but it's not. This brings us to the second key property of the kernel.
Incrementality
The Datalore kernel saves the result of every cell execution on disk. The result is simply a runtime environment, which is in fact just a dictionary of objects and their names. That's why the Datalore kernel doesn't need to recalculate unchanged cells: the result is already known because it's persisted on disk. So in the typical real-world situation, where you work on one cell and run it from time to time, the previous cells are not recalculated (only the first time). This property naturally imposes the following restriction: if you want to use an object in several cells, you need to make it serializable. Otherwise, you are limited to using the object within a single cell.
P.S. In this particular example the issue is caused by an incorrect implementation of the __getattribute__ method. Such an implementation means that every invocation of getattr(aaa, attr_name, None) returns 123, which obviously doesn't work in every case. That's why an error occurred on the attempt to serialize the object aaa, and therefore it was not saved on disk.

Related

A Grid of Clones

My goal is to build a 5x5 grid of images. In the following code, row, col and rowcol were created as variables local to the sprite, and newcol, newrow and cats are global. (By the way, is it possible to tell which variables are local and which are global? It's easy to forget or make mistakes.)
The result is a 5x1 grid only, as seen here.
I am unclear as to the order of execution of these statements. Does the when I start as a clone script get called before or after add_cat gets called the second time? My tentative conclusion is that it gets called afterwards, yet the clone's global variables seem to contain their values from beforehand instead.
When I attempted to debug it with ask and say and wait commands, the results varied wildly. Adding such pauses in some places fixed the problem completely, resulting in a 5x5 grid. In other places, they caused a 1x5 grid.
The main question is: How to fix this so that it produces a 5x5 grid?
Explanation
Unfortunately, the execution order in Scratch is a little bizarre. Whenever you edit a script (by adding or removing blocks, editing inputs, or dragging the entire script to a new location in the editor), it gets placed at the bottom of the list (so it runs last).
A good way to test this out is to create a blank project with the following scripts:
When you click the green flag, the sprite will either say "script one" or "script two", depending on which runs first. Try clicking and dragging one of the when green flag clicked blocks. The next time you click the green flag, the sprite will say whichever message corresponds to the script you just dragged.
This crazy order can make execution incredibly unpredictable, especially when using clones.
The solution
The only real solution is to write code that has a definite execution order built-in (rather than relying on the whims of the editor). For simpler scripts, this generally means utilizing the broadcast and wait block to run particular events in the necessary order.
For your specific project, I see two main solutions:
Procedural Solution
This is the most straightforward script, and it's probably what I would choose to go with:
(row and col are both sprite-only variables)
Because clones inherit all sprite-only variable values when they are created, each clone will be guaranteed to have the correct row and col when it is created.
Recursive Solution
This solution is a bit harder to understand than the first, so I would probably avoid it unless you're just looking for the novelty:

Adding Multiple Values to a Variable in MATLAB

I have to work with a lot of data and run the same MATLAB program more than once, and every time the program is run it stores the data in the same preset variables. The problem is that every time the program is run the values are overwritten and replaced, most likely because all the variables are of type double and are not matrices. I know how to make a variable that can store multiple values within a program, but only when the program is run once.
This is the code I am able to provide:
volED = reconstructVolume(maskAlignedED1,maskAlignedED2,maskAlignedED3,res)
volMean = (volED1+volED2+volES3)/3
strokeVol = volED-volES
EF = strokeVol/volED*100
The program I am running depends on a ton more MATLAB files that I cannot provide at this moment, however I believe the double variables strokeVol and EF are created at this instant. How do I create a variable that will store multiple values and keep adding the values every time the program is run?
The reason your variables are "overwritten" with each run is that every function (or standalone program) has its own workspace where its local variables live, and these local variables cease to exist when the function (or standalone program) returns/terminates. To preserve the value of a variable, you have to return it from your function. Since MATLAB passes its variables by value (rather than by reference), you have to explicitly provide a vector (or, more generally, an array) as both input and output of your function if you want to accumulate a set of data in your calling workspace. But it all depends on whether you have a function or a deployed program.
Assuming your program is a function
If your function is now declared as something like
function strokefraction(inputvars)
you can change its definition to
function [EFvec]=strokefraction(inputvars,EFvec)
    %... code here ...
    %volES initialized somewhere
    volED = reconstructVolume(maskAlignedED1,maskAlignedED2,maskAlignedED3,res);
    volMean = (volED1+volED2+volES3)/3;
    strokeVol = volED-volES;
    EF = strokeVol/volED*100;
    EFvec = [EFvec; EF]; %add EF to output (column) vector
Note that it's legal to use the same name for an input and an output variable. Now, each time you call your function (from MATLAB or from another function), you pass the vector along with the call, like this:
EFvec=[]; %initialize with empty vector
for k=1:ndata %simulate several calls
    inputvar=inputvarvector(k); %meaning that the input changes
    EFvec=strokefraction(inputvar,EFvec);
end
and you will see that the size of EFvec grows from call to call, saving the output from each run. If you want to save several variables or arrays, do the same (for arrays, you can always introduce an input/output array with one more dimension for this purpose, but you probably have to use explicit indexing instead of just shoving the next EF value to the bottom of your vector).
Note that if your input/output array eventually grows large, it will cost you a lot of time to keep allocating memory in small chunks. In that case you could preallocate the EFvec (or equivalent) array instead of initializing it to [], and introduce a counter variable telling you where to write the next data points.
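A minimal sketch of that preallocation idea, assuming an upper bound nmax on the number of runs and a hypothetical scalar-returning variant strokefraction_scalar (both names are illustrative, not part of the original code):
nmax  = 1000;                 % assumed upper bound on the number of runs
EFvec = nan(nmax, 1);         % preallocate once; NaN marks unused slots
count = 0;                    % counter telling you where to write next
for k = 1:ndata
    count = count + 1;
    EFvec(count) = strokefraction_scalar(inputvarvector(k)); % hypothetical variant returning a single EF value
end
EFvec = EFvec(1:count);       % trim the unused tail after the last run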
Disclaimer: what I said about the workspace of functions is only true for local variables. You could also declare a global EFvec in your function and in your base workspace, and then you wouldn't have to pass it in and out of the function. Since I have yet to see a problem that actually needed global variables, I would avoid this option. There are also persistent variables, which are essentially globals whose scope is limited to the function in which they are declared (run help global and help persistent in MATLAB if you'd like to know more; these help pages are surprisingly informative compared to the usual help entries).
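For illustration only, an accumulator based on a persistent variable could look like the hypothetical helper below (the name recordEF is made up; clear recordEF resets it):
function EFvec = recordEF(EF)
    % Hypothetical helper: keeps EF values across calls in a persistent
    % variable, which retains its value between calls to this function.
    persistent stored            % starts out as [] on the very first call
    if nargin > 0
        stored = [stored; EF];   % append the new value
    end
    EFvec = stored;              % return everything recorded so far
end
You would call recordEF(EF) inside strokefraction and later retrieve the whole history with allEF = recordEF();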
Assuming your program is a standalone (deployed) program
While I don't have any experience with standalone MATLAB programs, it seems to me that this would be hard to do there. A MathWorks Support answer suggests that you can pass variables to standalone programs, but only in the way you would pass them to a shell script: you have to pass file names or explicit numbers (which makes sense, as there is no MATLAB workspace in the first place). This implies that, in order to keep a cumulative set of output from your program, you would probably have to store it in a file. That might not be so painful: opening a file to append the next set of data is straightforward (I don't know about issues such as efficiency, but that all depends on how much data and how many runs of your program we're talking about).
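As a rough sketch of that file-based approach (the file name EF_results.txt is made up), appending one value per run could look like this at the end of the program:
fid = fopen('EF_results.txt', 'a');   % 'a' opens the file for appending
fprintf(fid, '%g\n', EF);             % write this run's EF value on its own line
fclose(fid);
% later, collect the values from all runs with: EFvec = load('EF_results.txt');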

Why does MATLAB give a "class has no property" error when running but not debugging?

I have a script that performs a bunch of experiments on different datasets, below is part of the logic to determine whether a dataset needs to be loaded. Basically we load the dataset if there is none already loaded, or if the currently loaded one doesn't match the one we need by name.
This example crashes with the error The class dataset has no property or method named 'name' at the if statement (the class does in fact have this property):
if(~exist('dataset','var')||~strcmp(dataset.name,datasets{datai}.id))
    loaded=load(datasets{datai}.filename);
    dataset=loaded.dataset;
end
If I debug and stop at that line, I can access dataset.name in the debugger without performing any further actions. I don't think the reason is that the dataset object doesn't exist: in the loop I ran, the first dataset was correctly loaded, but the second one (where the name check comes into play) wasn't.
This rewriting works:
if(~exist('dataset','var')||~strcmp(nom,datasets{datai}.id))
    loaded=load(datasets{datai}.filename);
    dataset=loaded.dataset;
    nom=dataset.name;
end
Why was I able to access dataset.name in the debugger, and why does the rewriting fix the issue?

Loading Variables into Functions from Structs in Matlab

Say I have a project which is comprised of:
A main script that handles all of the running of my simulation
Several smaller functions
A couple of structs containing the data
Within the script I access the functions many times inside for loops (some over a thousand times within the minute-long simulation). Each function also looks up data contained in these structs as part of its calculations; these are usually parameters that are fixed over the course of the simulation but need to be varied manually between runs to observe their effects.
Since these functions typically form the bulk of the runtime, I'm trying to save time; my simulation can't quite run in real time as it stands (the ultimate goal), and I lose a lot of time passing variables/parameters around between functions. So I've had three ideas for how to do this:
Load the structs in the main simulation, then pass each variable in turn to the function as part of a long argument list (the current solution).
Load the structs every time the function is called.
Define the structs as global variables.
In terms of both the efficiency of the system (most relevant as the project develops) and, since I'm no expert programmer, from a "good practice" perspective, what is the best solution for this? Is there another option that I have not considered?
As mentioned above in the comments, the first option is the best one.
Have you used the profiler to find out where your code spends most of its time?
profile on
% run your code
profile viewer
Note: if you are modifying your input struct in your child functions, this will take more time; if you are just reading from it, that should not be a problem.
MATLAB does what's known as a "lazy copy" (copy-on-write) when passing arguments between functions. This means that it passes a pointer to the data to the function, rather than creating a new instance of that data, which is very efficient memory- and speed-wise. However, if you make any alteration to that data inside the subroutine, then it has to make a new instance of that argument so as not to overwrite the argument's value in the calling function. Your response to matlabgui indicates you're doing just that, so the subroutine may be making an entirely new struct every time it's called, even though it only modifies a small part of that struct's values.
If your subroutine is altering a small part of the array, then your best bet is to just pass that small part to it, then assign your outputs. For instance,
[modified_array] = somesubroutine(struct.original_array);
struct.original_array=modified_array;
You can also do this in just one line, as shown below. Conceptually, the less data you pass to the subroutine, the smaller the memory footprint. I'd also recommend reading up on MATLAB's in-place operations, as they relate to this.
Also, as a general rule, don't use global variables in MATLAB. I have neither personally experienced nor read of an instance in which they were genuinely faster.
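For reference, the one-line form simply combines the call and the assignment (keeping the answer's variable names, although struct shadows a built-in function and is best renamed in real code):
struct.original_array = somesubroutine(struct.original_array); % pass only the field that changes and assign the result straight back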

Simulink: Synchronizing and timing

In order to simulate some processes, I have a problem getting a predefined execution order for my self-modeled blocks.
How can I make sure that, for example, block A has finished before blocks B and C start working?
The problem is that some blocks should run after certain others and some should not. I must admit I don't have much experience with doing time-dependent things in Simulink (although I have basic knowledge of Simulink).
For instance, this scenario should be realized:
A -> B, C -> D, E, F
The main point is that blocks A-F have no logical relation to each other; they all do different things. My aim is to make B and C start working after A has finished, and D/E/F after both B and C have finished.
In this case, "parallel" was the wrong word; this does not really have to be calculated in parallel. I just want to make sure that execution follows a predefined, fixed order.
Edit:
My new idea is to use the MATLAB workspace as a buffer, so that block A can push its results to the workspace (via the "To Workspace" block). But now I have to make sure that blocks B and C read A's results (with "From Workspace") only AFTER A has pushed its information to the workspace. How can I do this?
Edit2:
Here's a screenshot which should make some things clearer:
According to the documentation on "Sorted order", the setup seems to be OK (including the subsystems' timing). But unfortunately the problem still exists: the variable "simin" is loaded from the workspace before it is written :( As you can see, the display shows "1", which it shouldn't. On the very first run of the simulation I get an exception saying that the variable "simin" does not exist.
It would be nice if you could help me with my issue.
Greets, poeschlorn
So in your example, if you have block A connected with the same wire to both B and C, then when block A is finished, blocks B and C will work in parallel.
EDIT:
I am using the same blocks as you are, but it works for me. I think you are overcomplicating things. The way you are setting block priorities is no different from how Simulink runs the blocks without them. Below you can see my setup and the output on both binary displays.
The error you see on the first run is due to Simulink not creating the variable until the first time step executes. When Simulink builds the simulation, it sees that the variable used as input from the workspace has not been created yet.
If the connections between the blocks are not enough to set the order, you can use block priorities.
A tip for testing the execution order is to add a MATLAB Function block (formerly called the "Embedded MATLAB" block) with a disp command that displays the name of the block.
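A minimal sketch of such a tracing block body, i.e. a MATLAB Function block that simply passes its input through and prints a label each time it executes (the label text is just an example):
function y = fcn(u)
    % Print a message on every execution; the order in which the messages
    % appear in the Command Window reveals the block execution order.
    disp('Block A executed');
    y = u;
end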
It's not really clear what you're asking. When you say that block A must be finished, do you mean its Output function? The way simulation works in Simulink is that the blocks are run serially, so blocks B and C would never run until block A has finished its Output function.
I don't know of any obvious way of running blocks B and C in parallel currently in Simulink.