Mapping Over a Specific Axis in Scala

If I have an Array[Array[Double]] in Scala, is there an idiomatic way to map over the second axis?
For instance, consider the following matrix:
val M : Array[Array[Double]] = Array(Array(1d,2d),Array(3d,4d),Array(5d,6d))
To normalize the rows I simply run:
M.map(x=>x.map(_/x.sum))
However, to normalize the columns it seems like I must execute:
M.transpose.map(x=>x.map(_/x.sum)).transpose
This is workable, but it becomes extremely tedious if I have more than two indices. In general, if I want to map over the last axis of a bunch of nested Arrays, i.e., Array[Array[...Array[Double]...]], then I need to bubble the last axis to the front via map and transpose, then map over it, then bubble it back to the back.
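To make the tedium concrete, here is the same dance for a three-axis case (T and its values are just an illustrative example, not part of my real data): to normalize along the middle axis of an Array[Array[Array[Double]]], every inner 2D block has to be transposed, mapped, and transposed back.
val T: Array[Array[Array[Double]]] =
  Array(
    Array(Array(1d, 2d), Array(3d, 4d)),
    Array(Array(5d, 6d), Array(7d, 8d)))
// Normalize along the middle axis: transpose each block, normalize its rows, transpose back.
val normalizedAlongMiddle =
  T.map(block => block.transpose.map(col => col.map(_ / col.sum)).transpose)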

Related

Creating graph from text file functionally using scala

I am new to the functional programming paradigm and to Scala. I am trying to solve a problem using Scala. I have a text file containing graph edges in the following format:
3, 5
4, 6
7, 8
where 3, 5 represents an edge from 3 to 5 in the graph.
I am using a type of Map[Vertex, List[Vertex]] to handle graphs. My approach is to read line by line using foreach and process it, which I think is not a functional way to do it. Any help with this is appreciated.
I will leave the file reading to you, as there are many ways to do it depending on your particular application. Here is one source you might find useful for it.
Assuming you've managed to read the file into an Array[(Int, Int)], i.e., an array of tuples, in your example Array((3,5), (4,6), (7,8)), we can turn it into the adjacency map you're looking for as follows:
arr.groupBy(_._1).mapValues(edges => edges.map(_._2))
Explanation:
We first group the tuples by their first element (._1). This produces a Map[Int, Array[(Int, Int)]], a map from each vertex to all of its edges.
Next, we transform the arrays so that they no longer contain the full edge information (u, v) but only the neighbour vertex v corresponding to each edge.
And we're done!
NB: This is assuming your graph is directed. If you want to turn it into an undirected graph, you can do this simply by adding (v,u) for every (u,v).
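In case it helps, here is a sketch of that undirected variant (assuming the same arr: Array[(Int, Int)] as above; the value name undirected is my own). It adds the reversed edge before grouping and converts the values to List to match the Map[Vertex, List[Vertex]] type from the question.
// Add the reversed edge (v, u) for every (u, v), then group as before.
val undirected: Map[Int, List[Int]] =
  (arr ++ arr.map { case (u, v) => (v, u) })
    .groupBy(_._1)
    .map { case (vertex, edges) => vertex -> edges.map(_._2).toList }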

Passing values to a sparse matrix in MATLAB

Might sound too simple to you, but I need some help in regard to doing all of the following in one shot instead of defining redundant variables, i.e. tmp_x, tmp_y:
X= sparse(numel(find(G==0)),2);
[tmp_x, tmp_y] = ind2sub(size(G), find(G == 0));
X(:)=[tmp_x, tmp_y];
(More info: G is a sparse matrix)
I tried:
X(:)=ind2sub(size(G), find(G == 0));
but that threw an error.
How can I achieve this without defining tmp_x, tmp_y?
A couple of comments about your code:
numel(find(G == 0)) is probably one of the worst ways to determine how many entries of your matrix are zero. I would personally do numel(G) - nnz(G): numel(G) determines how many elements are in G and nnz(G) determines how many non-zero values are in G, so subtracting the two gives the total number of elements that are zero.
What you are doing is first declaring X to be sparse... then, when you do the final assignment to X in the last line, the matrix is converted back to double. As such, the first statement is totally redundant.
If I understand what you are doing, you want to find the row and column locations of what is zero in G and place these into a N x 2 matrix. Currently with what MATLAB has available, this cannot be done without intermediate variables. The functions that you'd typically use (find, ind2sub, etc.) require intermediate variables if you want to capture the row and column locations. Using one output variable will give you the column locations only.
You don't have a choice but to use intermediate variables. However, if you want to make this more efficient, you don't even need to use ind2sub. Just use find directly:
[I,J] = find(~G);
X = [I,J];

MATLAB: apply a function to every n items in a vector

This related question How can I apply a function to every row/column of a matrix in MATLAB? seems to indicate one way to do this is using num2cell, which I kind of want to stay away from.
Here's what I want to do. I've got an index list for a triangle mesh, the indices index the vertex list.
I want to run func(a,b,c) on the first 3 indices, then the next three indices, and so on.
So I could do reshape(idxs,3,[]), and now I've got my data in triplets as column vectors. But arrayfun does not do what I want it to do.
Looking for something like a column-map operator.
First, get your func properly vectorized, if necessary, such that the arguments can be lists of equal length:
vec_func = @(a,b,c) arrayfun(@func, a, b, c);
Then, you can directly access every third element of idxs:
vec_func( idxs(1:3:end), idxs(2:3:end), idxs(3:3:end) )

optimal way of storing multidimensional array/tensor

I am trying to create a tensor (which can be conceived of as a multidimensional array) package in Scala. So far I have been storing the data in a 1D Vector and doing index arithmetic.
But slicing and subarrays are not so easy to get. One needs to do a lot of arithmetic to convert multidimensional indices to 1D indices.
Is there any optimal way of storing a multidimensional array? If not, i.e., if a 1D array is the best solution, how can one optimally slice arrays (some concrete code would really help me)?
The key to answering this question is: when is pointer indirection faster than arithmetic? The answer is pretty much never. In-order traversals can be about equally fast for 2D, and things get worse from there:
2D random access:
  Array of Arrays - 600 M / second
  Multiplication - 1.1 G / second
3D in-order:
  Array of Array of Arrays - 2.4 G / second
  Multiplication - 2.8 G / second
(etc.)
So you're better off just doing the math.
Now the question is how to do slicing. Initially, if you have dimensions of n1, n2, n3, ... and indices of i1, i2, i3, ..., you compute the offset into the array by
i = i1 + n1*(i2 + n2*(i3 + ... ))
where typically i1 is chosen to be the last (innermost) dimension (but in general it should be the dimension most often in the innermost loop). That is, if it were an array of arrays of (...), you would index into it as a(...)(i3)(i2)(i1).
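To make that concrete, here is a minimal sketch (the names flatIndex, idx and dims are mine, not part of the answer) of the offset computation for arbitrary rank, with dims(0) = n1 and idx(0) = i1 being the innermost axis:
// Horner-style accumulation of i1 + n1*(i2 + n2*(i3 + ...)).
def flatIndex(idx: Array[Int], dims: Array[Int]): Int = {
  var i = 0
  var d = idx.length - 1
  while (d >= 0) {
    i = i * dims(d) + idx(d)
    d -= 1
  }
  i
}
// e.g. for dims n1 = 3, n2 = 4, n3 = 5:
// flatIndex(Array(2, 1, 4), Array(3, 4, 5)) == 2 + 3*(1 + 4*4) == 53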
Now suppose you want to slice this. First, you might give an offset o1, o2, o3 to every index:
i = (i1 + o1) + n1*((i2 + o2) + n2*((i3 + o3) + ...))
and then you will have a shorter range on each (let's call these m1, m2, m3, ...).
Finally, if you eliminate a dimension entirely--let's say, for example, that m2 == 1, meaning that i2 == 0, you just simplify the formula:
i = (i1 + o1 + n1*o2) + n1*n2*((i3 + o3) + ...)
I will leave it as an exercise to the reader to figure out how to do this in general, but note that we can store new constants o1 + n1*o2 and n1*n2 so we don't need to keep redoing that math on the slice.
Finally, if you are allowing arbitrary dimensions, you just put that math into a while loop. This does, admittedly, slow it down a little bit, but you're still at least as well off as if you'd used a pointer dereference (in almost every case).
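As a rough sketch of that bookkeeping (the names Slice, stridesFor and sliceAt are mine, using the same innermost-first convention as above): fold the per-axis offsets into a single base offset and keep one stride per axis, so indexing into a slice is the same multiply-add loop with precomputed constants.
// A slice is a base offset plus one stride per remaining axis.
final case class Slice(base: Int, strides: Array[Int]) {
  def flatIndex(idx: Array[Int]): Int = {
    var i = base
    var d = 0
    while (d < idx.length) { i += idx(d) * strides(d); d += 1 }
    i
  }
}
// Strides for a full tensor with dims n1, n2, n3 (innermost first) are 1, n1, n1*n2.
def stridesFor(dims: Array[Int]): Array[Int] =
  dims.scanLeft(1)(_ * _).init
// Offsets o1, o2, ... only shift the base; eliminating an axis (m2 == 1 above)
// amounts to dropping that axis's stride from the loop.
def sliceAt(dims: Array[Int], offsets: Array[Int]): Slice = {
  val strides = stridesFor(dims)
  Slice(offsets.zip(strides).map { case (o, s) => o * s }.sum, strides)
}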
From my own general experience: If you have to write a multidimensional (rectangular) array class yourself, do not aim to store the data as Array[Array[Double]] but use a one-dimensional storage and add helper methods for converting the multidimensional access tuples to a simple index and vice versa.
When using lists of lists, you need to do much too much bookkeeping to ensure that all the lists are of the same size, and you need to be careful when assigning one sublist to another sublist (because this makes the assigned-to sublist identical to the source, and you will wonder why changing the item at (0,5) also changes (3,5)).
Of course, if you expect a certain dimension to be sliced much more often than another and you want to have reference semantics for that dimension as well, a list of lists will be the better solution, as you may pass around those inner lists as a slice to the consumer without making any copy. But if you don’t expect that, it is a better solution to add a proxy class for the slices which maps to the multidimensional array (which in turn maps to the one-dimensional storage array).
Just an idea: how about a map with Int-tuples as keys?
Example:
val twoDimMatrix = Map((1,1) -> -1, (1,2) -> 5, (2,1) -> 7.7, (2,2) -> 9)
and then you could
scala> twoDimMatrix.filterKeys{_._2 == 1}.values
res1: Iterable[AnyVal] = MapLike(-1, 7.7)
or
twoDimMatrix.filterKeys{tuple => { val (dim1, dim2) = tuple; dim1 == dim2}} //diagonal
this way the index arithmetic would be done by the map. I don't know how practical and fast this is, though.
As long as the number of dimensions is known at design time, you can use a collection of collections ... (n times) of collections. If you must be able to build a vector for any number of dimensions, then there's nothing convenient in the Scala API to do it (as far as I know).
You can simply store the information in a multidimensional array (e.g. Array[Array[Double]]).
If the tensors are small and can fit in cache, you can have a performance improvement with 1D arrays because of data memory locality. It should also be faster to copy the whole tensor.
As for the slicing arithmetic, it depends on what kind of slicing you require. I suppose you already have a function for extracting an element based on its indices. So write a basic slicing loop based on index iteration, manually insert the expression for extracting an element, and then try to simplify the whole loop. That is often simpler than writing a correct expression from scratch.
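As an illustrative example of that workflow (extract and sliceInner are assumed helper names, not anything from this thread): first write the slice loop in terms of the existing element extractor, then simplify the arithmetic once the loop is known to be correct.
// Element extractor for a 2D tensor stored innermost-first in a 1D array: index = i + n1*j.
def extract(data: Array[Double], dims: Array[Int], i: Int, j: Int): Double =
  data(i + dims(0) * j)
// Naive slice along the innermost axis, written via the extractor first ...
def sliceInner(data: Array[Double], dims: Array[Int], j: Int): Array[Double] =
  Array.tabulate(dims(0))(i => extract(data, dims, i, j))
// ... which simplifies to a single contiguous copy:
// data.slice(dims(0) * j, dims(0) * (j + 1))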

Using matlab and Time Series object (fints), how can I make an array of them?

I am getting stock prices from Yahoo and want each stock to have its own time series data structure, but I also don't want to have hundreds of variables, so naturally I would use an array. But when I do something like array = [stock1 stock2]; it actually merges the series together. How can I make a real array?
Thanks,
CP
The [x x] notation in MATLAB does not build an array of separate objects; it is concatenation into a vector, and it assumes that what you're putting together belongs together. What you probably want is a cell array, which is indexed with curly braces, i.e. myArray{1} = stock1; myArray{2} = stock2;. Reference here.
Ah, since you have row vectors, [stock1 stock2] is a concatenation. If you want to create a 2-by-x array instead, do something like [stock1; stock2], which will place one array above the other.
Joining vectors using [x y] has different results depending on whether your vectors are rows or columns. If they are rows, then joining them with [x y] makes a longer row vector, but if they are columns, you'll get an N-by-2 matrix. You should probably convert them to column vectors using the transpose operator, thus: [x' y']. Although you should check whether transpose means the same thing for Time Series objects as it does for regular vectors.