How to define neighborhoods in hexagonal Self-Organizing Maps (SOM) - scala

I'm trying to implement a two-dimensional SOM with shrinking neighborhoods. To avoid computing the neighborhood function for every neuron on each input, I want to define the neighbors of each neuron at lattice construction time. That is, when creating the SOM, I would add each neighbor to a list stored in the neuron, so that when a neuron is selected as BMU, I only have to apply the neighborhood function to the neurons in that BMU's list.
The problem is defining the topology of a hexagonal lattice within a two-dimensional array, which is the structure I'm using for the SOM, because to achieve the hexagonal distribution I would have to do something like this:
n1 | null | n2 | null | n3
null | n4 | null | n5 | null
n6 | null | n7 | null | n8
Is it correct to create the array like that, or is there a way to create a normal array and adjust the indexes?
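One common way to avoid the null padding (a sketch, not from the original post) is to keep a dense rows × cols array and encode the hexagonal topology purely in the neighbor computation, using "odd-r" offset coordinates: every cell has six neighbors, and the column offsets of the diagonal neighbors depend on whether the row index is even or odd. A minimal Scala sketch:

object HexNeighbors {
  // "odd-r" offset layout: odd rows are shifted half a cell to the right,
  // so the diagonal neighbours' column offsets depend on the row's parity.
  private val evenRowOffsets = Seq((-1, -1), (-1, 0), (0, -1), (0, 1), (1, -1), (1, 0))
  private val oddRowOffsets  = Seq((-1, 0), (-1, 1), (0, -1), (0, 1), (1, 0), (1, 1))

  // All in-bounds hexagonal neighbours of cell (row, col) in a rows x cols lattice.
  def neighbors(row: Int, col: Int, rows: Int, cols: Int): Seq[(Int, Int)] = {
    val offsets = if (row % 2 == 0) evenRowOffsets else oddRowOffsets
    offsets.map { case (dr, dc) => (row + dr, col + dc) }
           .filter { case (r, c) => r >= 0 && r < rows && c >= 0 && c < cols }
  }
}

With this scheme no cell is ever null: the neighbor lists can be precomputed once at construction time and stored in each neuron, exactly as the question proposes.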

Related

Compute similarity in pyspark

I have a CSV file containing some data, and I want to select the rows most similar to a given input.
My data looks like:
H1 | H2 | H3
---+----+----
A  | 1  | 7
B  | 5  | 3
C  | 7  | 2
The input point I want to find similar rows for is [6, 8].
That is, I want to find the rows whose H2 and H3 values are most similar to the input, and return their H1.
I want to use pyspark and a similarity measure such as Euclidean distance, Manhattan distance, cosine similarity, or a machine learning algorithm.
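The question asks for pyspark, but the approach is the same in any Spark API; since the rest of this page uses Scala, here is a hedged sketch in Scala (the file name and the number of rows returned are illustrative): compute the Euclidean distance to the input point as a derived column and sort by it.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, pow, sqrt}

val spark = SparkSession.builder.appName("similarity").getOrCreate()

// The CSV from the question and the query point [6, 8].
val df = spark.read.option("header", "true").option("inferSchema", "true").csv("data.csv")
val (qH2, qH3) = (6.0, 8.0)

// Euclidean distance of each row's (H2, H3) to the query point;
// the smallest distances correspond to the most similar rows.
val withDist = df.withColumn("dist", sqrt(pow(col("H2") - qH2, 2) + pow(col("H3") - qH3, 2)))
withDist.orderBy("dist").select("H1", "dist").show(3)

For Manhattan distance, replace the expression with abs(col("H2") - qH2) + abs(col("H3") - qH3) (abs is also in org.apache.spark.sql.functions).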

How do I implement dynamically sized data structures in MATLAB?

I am trying to implement a dynamically sized data structure in MATLAB.
I have a 2D plane with nodes. For each node I need to save the coordinates and the coordinates of the nodes around it, within a distance of e.g. 100.
Imagine a circle with a radius of 100 around each node. I want to store all nodes within this circle for every node.
For example:
[figure: a rectangular region with several nodes scattered inside, each marked 'x']
I tried to implement this as shown below. I create a NodeList which contains a NodeStruct for every node. Every NodeStruct contains the coordinates of its corresponding node, as well as an array of the nodes around it. The problem with the implementation I had in mind is that the variable NodeStruct.NextNode changes its size for every node.
I have an idea of how to find all the nodes; my problem is the data structure for storing all the necessary information.
NodeList = [];
NodeStruct.Coords = [];
NodeStruct.NextNode = [];
You can create a struct array that you index as follows:
NodeStruct(3).Coords = [x,y];
NodeStruct(3).NextNode = [1,2,6,10];
However, it is likely that this is better solved with an adjacency matrix. That is an NxN matrix, with N the number of nodes, and where adj(i,j) is true if nodes i and j are within the given distance of each other. In this case, the adjacency matrix is symmetric, but it doesn't need to be if you list, for example, the 10 nearest nodes for each node. That case can also be handled with the adjacency matrix.
Given an Nx2 matrix with coordinates coord, where each row is the coordinates for one node, you can write:
% Pairwise Euclidean distances via implicit expansion: reshape coord to
% Nx1x2 and 1xNx2, subtract, square, and sum over the coordinate dimension.
dist = sqrt(sum((reshape(coord,[],1,2) - reshape(coord,1,[],2)).^2, 3));
adj = dist < 100; % or whatever your threshold is

Cosine-similarity between columns in a Spark dataframe

I have data that looks like this...
+-----------+--------------------+
| searchterm| title|
+-----------+--------------------+
|red ball |A big red ball |
|red ball |A small blue ball |
|... |... |
+-----------+--------------------+
I'm trying to find the cosine similarity between the searchterm column and the title column in Scala. I'm able to tokenize each column without issue, but most similarity implementations I have found online operate across rows rather than across columns, i.e. they would compare 'a big red ball' with 'a small blue ball' rather than doing the cross-column comparison I actually want. Any ideas? I'm very new to Scala, but this is how I would do it in Python:
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def get_text_cosine_similarity(row):
    # Form the TF-IDF matrix over this row's two strings
    text_arr = row[['searchterm', 'title']].values
    tfidf_vectorizer = TfidfVectorizer()
    tfidf_matrix = tfidf_vectorizer.fit_transform(text_arr)
    # Get the cosine similarity 'score', assuming the keyword is at index 0
    similarity_scores = cosine_similarity(tfidf_matrix[0], tfidf_matrix)
    return pd.Series(similarity_scores[0][1:])

df[['title_cs']] = df.apply(get_text_cosine_similarity, axis=1)
Using sklearn.metrics.pairwise.cosine_similarity and sklearn.feature_extraction.text.TfidfVectorizer
You could transpose the matrix and then do the cosine similarity
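That answer is terse, so here is a hedged sketch of one way to do the row-wise, cross-column comparison in Scala with a UDF; it uses plain term-frequency vectors instead of TF-IDF so that it stays self-contained (df and the title_cs column name mirror the Python snippet above):

import org.apache.spark.sql.functions.{col, udf}

// Cosine similarity between the term-frequency vectors of two strings.
val cosineSim = udf { (a: String, b: String) =>
  def tf(s: String): Map[String, Double] =
    s.toLowerCase.split("\\s+").groupBy(identity).map { case (w, ws) => w -> ws.length.toDouble }
  val (ta, tb) = (tf(a), tf(b))
  val dot = ta.keySet.intersect(tb.keySet).toSeq.map(w => ta(w) * tb(w)).sum
  val norm = math.sqrt(ta.values.map(v => v * v).sum) * math.sqrt(tb.values.map(v => v * v).sum)
  if (norm == 0.0) 0.0 else dot / norm
}

val scored = df.withColumn("title_cs", cosineSim(col("searchterm"), col("title")))

To reproduce the TF-IDF weighting of the Python version, you could instead fit Spark ML's Tokenizer, HashingTF, and IDF stages and compare the resulting vectors, but the per-row TF sketch above already gives the cross-column score the question asks for.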

Postgresql Geometric Line - Finding the Y intercept

I have a model that can be represented by multiple linear segments, as below:
Y
|     _____________
|    /             \
|   /               \
|  /                 \
| /                   \
|/                     \
|________________________ X
I need to find the value of Y for a given value of X.
My initial thought was to store each segment as a relational line type {A, B, C}. However, I'm not sure what that would buy me in terms of finding a proper query to retrieve the Y value.
Since you are working with linear segments, you should use the lseg data type (the line data type represents a line of infinite length). Once you have your data in that format, you can intersect each segment with a vertical segment spanning a very large Y range at the desired value of X, and extract the Y coordinate of the intersection point.
CREATE TABLE segments (id int, seg lseg);
INSERT INTO segments VALUES
(1, '[(4,3), (12,15)]'), -- positively inclined line segment
(2, '[(2,19), (24,-4)]'), -- negatively inclined line segment
(3, '[(4,3), (12,3)]'), -- horizontal line segment
(4, '[(5,3), (5,15)]'), -- vertical line segment, collinear at X=5
(5, '[(4,3), (4,15)]'); -- vertical line segment, no intersection at X=5
and then:
test=# SELECT id, 5 AS x, (seg # '((5,-999999999), (5,999999999))'::lseg)[1] AS y
test-# FROM segments;
id | x | y
----+---+------------------
1 | 5 | 4.5
2 | 5 | 15.8636363636364
3 | 5 | 3
4 | 5 |
5 | 5 |
(5 rows)
As is obvious from the above, collinear line segments (i.e. vertical line segments with the same value for X) and segments without intersection return NULL for Y.

Fast way for solving symbolic system of equations in Matlab

I have a system of equations with 100001 variables (x1 through x100000, plus alpha) and exactly that many equations. Is there a computationally efficient way, in Matlab or otherwise, to solve this system? I know of the solve() command, but I'm wondering if there is something that will run faster. The equations are of the form:
1.) -x1 + alpha * (x4 + x872 + x9932) = 0
.
.
.
100000.) -x100000 + alpha * (x38772 + x95) = 0
In other words, the i-th equation has the variable xi with coefficient -1, plus alpha times a sum of some other variables, equal to 0. The final equation is just x1 + ... + x100000 = 1.
The Math Part
This system can always be brought to the canonical eigenvalue/eigenvector equation form:
A·x = λ·x
where A is your system's matrix and x = [x1; x2; ...; x100000]. Taking the example from this question, the system may be written down as:
[ 0 1 0 0 0 ]   [ x1 ]             [ x1 ]
[ 0 0 1 0 1 ]   [ x2 ]             [ x2 ]
[ 1 0 0 0 0 ] x [ x3 ] = (1/alpha) [ x3 ]
[ 0 0 1 0 0 ]   [ x4 ]             [ x4 ]
[ 0 1 0 1 0 ]   [ x5 ]             [ x5 ]
This means that your eigenvalues are λ = 1/α, i.e. α = 1/λ. Of course, you should beware of complex eigenvalues (unless you really want to take them into account).
The Matlab part
Well, this is much to your taste and skills. You can find the eigenvalues of a matrix with eig(), but note that eig() requires a full matrix; for a large sparse matrix (which is better for memory economy) use eigs() to compute a subset of the eigenvalues:
N = 100000;
A = sparse(N,N);
% Here's your code to set A's values
k = 10;                    % number of eigenvalues to compute with eigs()
A_lambda = eigs(A, k);     % use eig(full(A)) only for small N
ieps = 1e-6;               % below this threshold the imaginary part is considered null
alpha = real(1 ./ A_lambda(abs(imag(A_lambda)) < ieps));  % keep (approximately) real eigenvalues; alpha = 1/lambda
% Now do your stuff with alpha here
But mind this: numerically solving large eigenvalue problems might give you complex values where real ones are expected. Tweak ieps to a sensible value if you don't find anything at first.
To find the eigenvectors, take one variable out of the system and solve for the rest by means of Cramer's rule, or have the solver return them directly via [V, D] = eigs(A, k). Then normalize them to one if you wish.
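For comparison, here is a hedged sketch of the same α computation in Scala, assuming the Breeze linear-algebra library is on the classpath and using the 5×5 example matrix above (Breeze's eig returns the real and imaginary parts of the eigenvalues separately):

import breeze.linalg.{DenseMatrix, eig}

object AlphaFromEigenvalues extends App {
  // The 5x5 example matrix from the answer above.
  val A = DenseMatrix(
    (0.0, 1.0, 0.0, 0.0, 0.0),
    (0.0, 0.0, 1.0, 0.0, 1.0),
    (1.0, 0.0, 0.0, 0.0, 0.0),
    (0.0, 0.0, 1.0, 0.0, 0.0),
    (0.0, 1.0, 0.0, 1.0, 0.0)
  )

  val decomposition = eig(A)   // eigenvalues: real parts; eigenvaluesComplex: imaginary parts
  val ieps = 1e-6              // imaginary parts below this are treated as zero

  // Keep the (approximately) real, nonzero eigenvalues and invert them: alpha = 1/lambda.
  val alphas = (0 until A.rows).collect {
    case i if math.abs(decomposition.eigenvaluesComplex(i)) < ieps &&
              math.abs(decomposition.eigenvalues(i)) > ieps =>
      1.0 / decomposition.eigenvalues(i)
  }
  println(alphas.mkString(", "))
}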