Return parameter passing in Julia - pass-by-reference

If a Julia function returns an array, is the reference returned or a copy?
function pass(A::Matrix)
return A
end
A real example is reshape:
reshape(A, dims)
Create an array with the same data as the given array, but with different dimensions. An implementation for a particular type of array may choose whether the data is copied or shared.
How does the implementation determine whether data is copied or shared?

The pass function above returns by reference, http://julia.readthedocs.org/en/latest/manual/arrays/ .
There is a bit more to the reshape example.
For full arrays the reshaped array is a new array object that shares the same data. But keep in mind that there are plenty of specialized array types. The docs warn you not to rely on that because for example for a future implementation of immutable fixed sized arrays a different reshape mechanism could be used.

Related

MATLAB arrays and containers

I've just started with MATLAB, having mainly played around with Python previously. I've just made my first associative array, and am a little confused with how it's dealing with commas and spaces. My array is:
co_comma=containers.Map({'Open University','UCL',' University of Edinburgh','Birkbeck'},{193835,21210,24525,17822})
I also made a second associative array, splitting using spaces:
co_space=containers.Map({'Open University' 'UCL' ' University of Edinburgh' 'Birkbeck'},{193835 21210 24525 17822})
They both give the following:
Map with properties:
Count: 4
KeyType: char
ValueType: double
But co_comma==co_space gives False:
ans =
logical
0
Questions:
how are these associative arrays different
what actually is a container? Although I've never thought of lists etc in this way, Python seems to have containers in the form of lists/tuples/general iterables - https://stackoverflow.com/a/11576019/11357695. So is a matlab string container vs a matlab char array different in the same way as (for example) python lists and python tuples are different?
Thanks :D
Many things mixed in here!
Clarifications:
A matlab container is only equivalent to a python dictionary, not a list/tuple/general iterable.
Both of your containers are created the same. You seem to be naming them comma and space, but this distinction does not even reach the definition of the container.
Both {'Open University','UCL',' University of Edinburgh','Birkbeck'} and {'Open University' 'UCL' ' University of Edinburgh' 'Birkbeck'} create the exact same cell array, so the input to container.map is the same. you are comparing a=[5]; and b=5, as a MATLAB-valid analogy. They are the same.
For objects, most programing languages (including python!) will give you false when you compare two objects that contain the same values, yet are different objects. Only basic variables tend to compare by value, and not by some sort of objecID. In your case, simply doing isequal(co_comma,co_space) will return true, so their values are the same (as we already know, from the previous point)
Containers are not generally used in MATLAB, unless you specifically want a dictionary.
So here's the deal with Matlab: Matlab is an oddball. In Matlab, everything is an array, and there is no real distinction between regular "values" and "containers" or "collections" like there is in many programming languages. In Matlab, any numeric value is a numeric array. Any string value is actually an array of strings. Every value is iterable and can be used in a for loop or other "list"-like context. And every type or class in Matlab must implement collection/container behaviors as well as its "plain" value semantics.
Scalar values in Matlab are actually degenerate cases of two-dimensional arrays that are 1-by-1 in size. The number 42? Actually a 1-long array of doubles. The string "foo"? Actually a 1-long array of strings. Everything in Matlab is actually like a list in Python (or, more accurately, a NumPy series or array).
A cell array is an array of things, which can contain any other type of array in each of its elements. This is used for heterogeneous collections.
A char array is a bit weird, because it is used to represent strings, but its elements are not themselves strings, but rather characters. Plain char arrays are used in a weird way to represent lists of strings: you make a 2-D char array, and read each row as a string that is padded with spaces on the right. Lists of strings that are different lengths used to be represented as "cellstrs", which are cell arrays that contain char row vectors in each element. It's weird; see my blog post about it. The new string array makes most uses of other string types obsolete. You might want to use strings here. In Matlab, string literals are double-quoted, and char literals are single-quoted.
The word "container" doesn't have a specific technical meaning in Matlab, and is not really a thing. There's the containers package, but the only thing in it is containers.Map; there are no other types of containers and no generalization of what it means to be a container. One of the Matlab developers had an idea for making containers like this, but it never really went anywhere. And as far as I can tell, containers.Map hardly gets used at all: containers.Map is a pass-by-reference "handle" object, whereas most Matlab types are pass-by-value, so Map can't be used easily with most Matlab code.
So, putting aside the weirdness of chars, everything in Matlab has array semantics, and is effectively an iterable container like Python lists or tuples. And in Matlab, most values and objects are immutable and pass-by-value, so they are more like Python tuples than Python lists.

How to access array/cell element in a uniform way

In MATLAB, the i-th element of an array is accessed by a(i), while the i-th element of a cell is accessed by a{i}. So my code has to do different things in terms of whether a is a cell or not.
Is there a better way to do it? so we can access the i-th element in a 'same'? way.
You can access the ith element of any array in the same way, a(i), and the behaviour is completely uniform: you get back an object of the same class as a. So if a is a 5x5 double array, a(i) is a 1x1 double. Similarly, if a is a cell array, then a(i) is a 1x1 cell. So far, so logical and consistent.
Now, if you want to look inside a cell, you use a{i} but that is a fundamentally different operation. Your question seems to imply that it would be "better" if these two fundamentally different operations had the same syntax. It wouldn't: if this were the case, then the ability to slice a cell array into arbitrary shapes (including 1x1) would be overshadowed.
But you can always write a custom function. Among the many Matlab workarounds I carry with me everywhere, I have the following pair of functions:
function a = ascell(a)
if ~iscell(a), a = {a}; end
and
function a = uncell(a)
if iscell(a) & numel(a) == 1, a = a{1}; end
With uncell.m on your path, b = uncell(a(i)); would give you the ith element of a with the cell wrapping, if any, stripped off.
It is good to have the call to uncell visible in the code because it alerts you (or another maintainer) to the possibility that a might legally be a cell or a non-cell—this is by no means necessarily true in everybody's coding strategy. Nor will my code necessarily follow the same convention as yours when it comes to interpreting the meaning and correct treatment of a cell array where a non-cell was expected, or a non-cell where a cell was expected (and this is another way of explaining why there's no common syntax). This leads me to the question: if the design of your application is such that a can by its nature contain elements with mismatched shapes or types, then why not simply decree that it is always a cell, never a non-cell, and always access the ith element as a{i}?

"Reverse" of arrayfun

I have an array of "object" structures, OBJECT_ARRAY, that I must often transform into individual arrays for each element of the object structures. This can be done using arrayfun. It's more tedious than simply refereeing to OBJECT_ARRAY(k).item1, but that's how The Mathworks chose to do it.
In this case today, I have used those individual arrays and calculated a corresponding derived value, newItem, for each element and I need to add this to the original array of structures. So I have an array of newItems.
Is there a straightforward way to do an assignment for each object in OBJECT_ARRAY so that (effectively) OBJECT_ARRAY(k).newItem = newItems(k) for every index k?
I am using version 2015a.
You shouldn't need arrayfun for any of this.
To get values out, you can simply rely on the fact that the dot indexing of a non-scalar struct or object yields a comma-separated list. If we surround it with [] it will horizontally concatenate all of the values into an array.
array_of_values = [OBJECT_ARRAY.item1];
Or if they are all different sizes that can't be concatenated, use a cell array
array_of_values = {OBJECT_ARRAY.item1};
To do assignment, you can again use the comma separated list on the left and right side of the assignment. We first stick the new values in a cell array so that we can automatically convert them to a comma-separated list using {:}.
items = num2cell(newitems);
[OBJECT_ARRAY.item1] = items{:};

error in array of struct matlab

I want to train data on various labels using svm and want svm model as array of struct. I am doing like this but getting the error:
Subscripted assignment between dissimilar structures.
Please help me out
model = repmat(struct(),size);
for i=1:size
model(i) = svmtrain(train_data,labels(:,i),'Options', options);
end
A structure array in MATLAB can only contain structures with identical fields. By first creating an array of empty structures1 and then trying to fill it with SVMStruct structures, you try to create a mixed array with some empty structures, and some SVMStruct structures. This is not possible, thus MATLAB complains about "dissimilar structures" (not-equal structures: empty vs. SVMStruct).
To allocate an array of structs, you will have to specify all fields during initialization and fill them all with initial values - which is rather inconvenient in this case. A very simple alternative is to drop this initialization, and run your loop the other way around2,3:
for ii=sizeOfLabels:-1:1
model(ii) = svmtrain(train_data,labels(:,ii),'Options', options);
end
That way, for ii=sizeOfLabels, e.g. ii=100, MATLAB will call model(100)=..., while model doesn't exist yet. It will then allocate all space needed for 100 SVMStructs and fill the first 99 instances with empty values. That way you pre-allocate the memory, without having to worry about initializing the values.
1Note: if e.g. size=5, calling repmat(struct(),size) will create a 5-by-5 matrix of empty structs. To create a 1-by-5 array of structs, call repmat(struct(),1,size).
2Don't use size as a variable name, as this is a function. If you do that, you can't use the size function anymore.
3i and j denote the imaginary unit in MATLAB. Using them as a variable slows the code down and is error-prone. Use e.g. k or ii for loops instead.

In-Place Quicksort in matlab

I wrote a small quicksort implementation in matlab to sort some custom data. Because I am sorting a cell-array and I need the indexes of the sort-order and do not want to restructure the cell-array itself I need my own implementation (maybe there is one available that works, but I did not find it).
My current implementation works by partitioning into a left and right array and then passing these arrays to the recursive call. Because I do not know the size of left and and right I just grow them inside a loop which I know is horribly slow in matlab.
I know you can do an in place quicksort, but I was warned about never modifying the content of variables passed into a function, because call by reference is not implemented the way one would expect in matlab (or so I was told). Is this correct? Would an in-place quicksort work as expected in matlab or is there something I need to take care of? What other hints would you have for implementing this kind of thing?
Implementing a sort on complex data in user M-code is probably going to be a loss in terms of performance due to the overhead of M-level operations compared to Matlab's builtins. Try to reframe the operation in terms of Matlab's existing vectorized functions.
Based on your comment, it sounds like you're sorting on a single-value key that's inside the structs in the cells. You can probably get a good speedup by extracting the sort key to a primitive numeric array and calling the builtin sort on that.
%// An example cell array of structs that I think looks like your input
c = num2cell(struct('foo',{'a','b','c','d'}, 'bar',{6 1 3 2}))
%// Let's say the "bar" field is what you want to sort on.
key = cellfun(#(s)s.bar, c) %// Extract the sort key using cellfun
[sortedKey,ix] = sort(key) %// Sort on just the key using fast numeric sort() builtin
sortedC = c(ix); %// ix is a reordering index in to c; apply the sort using a single indexing operation
reordering = cellfun(#(s)s.foo, sortedC) %// for human readability of results
If you're sorting on multiple field values, extract all the m key values from the n cells to an n-by-m array, with columns in descending order of precedence, and use sortrows on it.
%// Multi-key sort
keyCols = {'bar','baz'};
key = NaN(numel(c), numel(keyCols));
for i = 1:numel(keyCols)
keyCol = keyCols{i};
key(:,i) = cellfun(#(s)s.(keyCol), c);
end
[sortedKey,ix] = sortrows(key);
sortedC = c(ix);
reordering = cellfun(#(s)s.foo, sortedC)
One of the keys to performance in Matlab is to get your data in primitive arrays, and use vectorized operations on those primitive arrays. Matlab code that looks like C++ STL code with algorithms and references to comparison functions and the like will often be slow; even if your code is good in O(n) complexity terms, the fixed cost of user-level M-code operations, especially on non-primitives, can be a killer.
Also, if your structs are homogeneous (that is, they all have the same set of fields), you can store them directly in a struct array instead of a cell array of structs, and it will be more compact. If you can do more extensive redesign, rearranging your data structures to be "planar-organized" - where you have a struct of arrays, reading across the ith elemnt of all the fields as a record, instead of an array of structs of scalar fields - could be a good efficiency win. Either of these reorganizations would make constructing the sort key array cheaper.
In this post, I only explain MATLAB function-calling convention, and am not discussing the quick-sort algorithm implementation.
When calling functions, MATLAB passes built-in data types by-value, and any changes made to such arguments are not visible outside the function.
function y = myFunc(x)
x = x .* 2; %# pass-by-value, changes only visible inside function
y = x;
end
This could be inefficient for large data especially if they are not modified inside the functions. Therefore MATLAB internally implements a copy-on-write mechanism: for example when a vector is copied, only some meta-data is copied, while the data itself is shared between the two copies of the vector. And it is only when one of them is modified, that the data is actually duplicated.
function y = myFunc(x)
%# x was never changed, thus passed-by-reference avoiding making a copy
y = x .* 2;
end
Note that for cell-arrays and structures, only the cells/fields modified are passed-by-value (this is because cells/fields are internally stored separately), which makes copying more efficient for such data structures. For more information, read this blog post.
In addition, versions R2007 and upward (I think) detects in-place operations on data and optimizes such cases.
function x = myFunc(x)
x = x.*2;
end
Obviously when calling such function, the LHS must be the same as the RHS (x = myFunc(x);). Also in order to take advantage of this optimization, in-place functions must be called from inside another function.
In MEX-functions, although it is possible to change input variables without making copies, it is not officially supported and might yield unexpected results...
For user-defined types (OOP), MATLAB introduced the concept of value object vs. handle object supporting reference semantics.