Calculating the transpose of a tensor in ParaView

I am required to calculate the following in ParaView: the antisymmetric part of the velocity gradient tensor, Omega = (grad(u) - grad(u)^T) / 2.
How can I calculate the transpose used in the above formula? Basically, I would like to know how to calculate the transpose of a matrix in ParaView.

As suggested by @Nico Vuaille, you should make use of NumPy support in ParaView. Simply apply a Programmable Filter to the dataset of interest, and supply a script comparable to the following:
from paraview.vtk.numpy_interface import algorithms as algs
u = inputs[0].PointData['Velocity']
uGrad = algs.gradient(u)  # per-point gradient tensor of the velocity field
output.PointData.append(uGrad, 'Gradient')
EDIT: I have actually tried to reproduce your calculation with one of my datasets and realised that my answer and comments are not so helpful. Therefore, this is what I would suggest instead, which should work:
Load your dataset in ParaView
Apply a Gradient / Gradient Of Unstructured Dataset filter to your dataset and select the velocity field as the input field. (I used Gradient Of Unstructured Dataset, which also gives you the option to directly compute the divergence and vorticity fields.)
Apply a Programmable Filter to the dataset obtained in the previous step and supply the script below.
Script
import numpy as np
grad = inputs[0].PointData['Gradients']
# Antisymmetric part: (grad - grad^T) / 2, where axes=(0, 2, 1)
# transposes the 3x3 tensor at every point
omega = (grad - np.transpose(grad, axes=(0, 2, 1))) / 2
output.PointData.append(omega, 'Omega')
You should end up with another item in your ParaView pipeline that only contains the expected Omega.
EDIT 2: The input file uses the XDMF format. When loaded into ParaView, it is interpreted as a multi-block dataset. As a result, the script supplied to the Programmable Filter has to be updated to:
import numpy as np
import paraview.vtk.numpy_interface.dataset_adapter as dsa

for i in range(inputs[0].GetNumberOfBlocks()):
    data = dsa.WrapDataObject(inputs[0].GetBlock(i))
    grad = data.PointData['Gradients']
    omega = (grad - np.transpose(grad, axes=(0, 2, 1))) / 2
    data.PointData.append(omega, 'Omega')
    output.SetBlock(i, data.VTKObject)

I think this can be easily computed using the Python Calculator (no need for a Programmable Filter):
To compute the gradient, type:
gradient(u)
To compute the symmetric part of the tensor gradient(u):
strain(u)
To compute the non-symmetric part, Omega, of the gradient tensor:
gradient(u) - strain(u)
Note that the gradient(u) tensor can be written as the sum of its symmetric and antisymmetric parts: gradient(u) = strain(u) + Omega, with strain(u) = (gradient(u) + gradient(u)^T) / 2 and Omega = (gradient(u) - gradient(u)^T) / 2.
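As a quick sanity check of that decomposition, here is a minimal NumPy sketch (independent of ParaView, using a random 3x3 matrix as a stand-in for the gradient tensor at one point):
import numpy as np

g = np.random.rand(3, 3)   # stand-in for the gradient tensor at one point
strain = (g + g.T) / 2     # symmetric part
omega = (g - g.T) / 2      # antisymmetric part
assert np.allclose(g, strain + omega)
assert np.allclose(omega, g - strain)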

Related

Feature Selection in Multivariate Linear Regression

import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.linear_model import LinearRegression
from sklearn.datasets import make_regression

# 300 samples, 5 features, noisy linear target
X, y = make_regression(n_samples=300, n_features=5, noise=5)
df1 = pd.DataFrame(X)
df1 = pd.concat([df1, pd.DataFrame(y)], axis=1, ignore_index=True)
df1.rename(columns={0: "X1", 1: "X2", 2: "X3", 3: "X4", 4: "X5", 5: "Target"}, inplace=True)
sns.heatmap(df1.corr(), annot=True);
[Correlation matrix heatmap]
Now I can ask my question. How can I choose features that will be included in the model?
I am not that well-versed in python as I use R most of the time.
But it should be something like this:
# Create a model
model = LinearRegression()
# Call the .fit method and pass in your data
model.fit(Variables,Target)
# Or simply do
model = LinearRegression().fit(Variables,Target)
# So based on the dataset head provided, it should be
X = df1[['X1','X2','X3','X4','X5']]
Y = df1['Target']
model = LinearRegression().fit(X,Y)
In order to do feature selection, you need to fit the model first and then check the p-values. Typically, a p-value of 5% (.05) or less is a good cut-off point. If the p-value crosses the .05 threshold, the variable is insignificant and you can remove it from your model. You will have to do this manually. You can also look at the correlation matrix to see which variables have little correlation with the target. AFAIK, there is no lib that does this p-value-based pruning for you automatically; a minimal sketch of the manual procedure follows. In the end, statistics are just numbers. It is up to humans to interpret the results.
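Since sklearn's LinearRegression does not expose p-values, the sketch below uses statsmodels instead (my choice, not from the question); df1 is the frame built above:
import statsmodels.api as sm

X = df1[['X1', 'X2', 'X3', 'X4', 'X5']]
y = df1['Target']

# Fit OLS with an intercept and inspect per-coefficient p-values
ols = sm.OLS(y, sm.add_constant(X)).fit()
print(ols.summary())

# Candidates to drop: coefficients with a p-value above .05
weak = [name for name, p in ols.pvalues.items() if name != 'const' and p > 0.05]
print("candidates to drop:", weak)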

Orange3 concatenate tables with different targets

I have two input datafiles to use in Orange: one corresponds to the train set (with targets "A", "B" and "C") and the other to the unknown samples (with targets "D" and "E", so I can identify the unknown samples in the scatterplot of the first two principal components).
I have applied PCA to the train dataset and, through a Python script, I have reapplied the PCA transformation to the test dataset; however, the result has a ? in the target value for all entries in the unknown samples set.
I have tried to merge the train and unknown samples sets with the Merge Table widget, and apparently it does the same: all samples in train are correct, but the unknown samples have ? as targets.
The only way I managed to have this running properly is to have the unknown samples and the train set in the same input file, which is not practical for obvious reasons.
Is there any way to fix this?
Please note that I have tried to change the domain.class_var and the target value directly on the transformed unknown samples, but it also alters the domain of the train dataset. Apparently the new table just holds a reference to the domain of the original train data after PCA.
I have managed it by converting the data into NumPy arrays, concatenating them, and converting back to a Table.
Here is the code if anyone is interested:
import numpy
from Orange.data.table import Table
from Orange.data import Domain, DiscreteVariable, ContinuousVariable

# Known (train) data, already PCA-transformed, and the unknown samples
# mapped into the same PCA domain
trnsfrmd_knwn_data = numpy.array(in_object)
trnsfrmd_unkwn_data = numpy.array(Table(in_object.domain, in_data))

# Shift the class values of the unknown samples past the indices
# already used by the train set (the class is the last column)
ndx = list(set(trnsfrmd_knwn_data[:, -1].tolist()))[-1] + 1
trnsfrmd_unkwn_data[:, -1] = numpy.arange(len(trnsfrmd_unkwn_data)) + ndx

# Build a new domain whose class variable holds both sets of targets
targets = in_object.domain.class_var.values + in_data.domain.class_var.values
dm = Domain([ContinuousVariable(x.name) for x in in_object.domain.attributes],
            DiscreteVariable('region', values=targets))

# Concatenate and convert back to an Orange Table
out_data = Table.from_numpy(dm, numpy.append(trnsfrmd_knwn_data, trnsfrmd_unkwn_data, axis=0))

Using SparseTensor as a trainable variable?

I'm trying to use SparseTensor to represent weight variables in a fully-connected layer.
However, it seems that TensorFlow 0.8 doesn't allow using a SparseTensor as a tf.Variable.
Is there any way to go around this?
I've tried
import tensorflow as tf
a = tf.constant(1)
b = tf.SparseTensor([[0,0]],[1],[1,1])
print a.__class__ # shows <class 'tensorflow.python.framework.ops.Tensor'>
print b.__class__ # shows <class 'tensorflow.python.framework.ops.SparseTensor'>
tf.Variable(a) # Variable is declared correctly
tf.Variable(b) # Fail
By the way, my ultimate goal of using SparseTensor is to permanently mask some of the connections in dense form, so that these pruned connections are ignored while calculating and applying gradients.
In my current implementation of MLP, SparseTensor and its sparse form of matmul ops successfully reports inference outputs. However, the weights declared using SparseTensor aren't trained as training steps go.
As a workaround to your problem, you can provide a tf.Variable for the values of a sparse tensor. The sparsity structure has to be pre-defined in that case; the weights, however, remain trainable. Pass the variable directly up to TensorFlow v0.8, or wrapped in tf.identity from v0.9 on:
weights = tf.Variable(<initial-value>)
sparse_var = tf.SparseTensor(<indices>, weights, <shape>) # v0.8
sparse_var = tf.SparseTensor(<indices>, tf.identity(weights), <shape>) # v0.9
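For concreteness, a minimal sketch of this workaround (the 3x4 shape and the three nonzero positions are made-up values):
import tensorflow as tf

indices = [[0, 0], [1, 2], [2, 3]]        # fixed sparsity structure
weights = tf.Variable([0.1, 0.2, 0.3])    # trainable nonzero values
sparse_var = tf.SparseTensor(indices, tf.identity(weights), [3, 4])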
TensorFlow doesn't currently support sparse tensor variables. However, it does support sparse lookups (tf.embedding_lookup) and sparse gradient updates (tf.sparse_add) of dense variables. I suspect these two will suffice for your use case.
TensorFlow doesn't support training on sparse tensors yet. You can initialize a sparse tensor as you wish, then convert it into a dense tensor and create a variable from it, like this:
# You need to correctly initialize the sparse tensor with indices, values and a shape
b = tf.SparseTensor(indices, values, shape)
b_dense = tf.sparse_tensor_to_dense(b)
b_variable = tf.Variable(b_dense)
Now you have initialized a sparse tensor as a variable. You still need to take care of the gradient update; in other words, make sure the entries that started as 0 stay 0, since backpropagation would otherwise compute non-vanishing gradients for them when this is used naively.
In order to do this, TensorFlow optimizers have a method called tf.train.Optimizer.compute_gradients(loss, [list_of_variables]). This calculates all the gradients in the graph necessary to minimize the loss function but doesn't apply them yet. The method returns a list of tuples of the form (gradient, variable). You can modify these gradients freely, but in your case it makes sense to mask the unneeded gradients to 0 (i.e. by creating a mask tensor that is 1.0 where a weight in your network is present and 0.0 elsewhere).
After having modified them, you call the optimizer method tf.train.Optimizer.apply_gradients(grads_and_vars) to actually apply the gradients. Example code would look like this:
# Create optimizer instance
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001)
# Get the gradients for your weights
grads_and_vars = optimizer.compute_gradients(loss, [b_variable])
# Modify the gradients at will
# In your case it would look similar to this
modified_grads_and_vars = [(tf.multiply(gv[0], mask_tensor), gv[1]) for gv in grads_and_vars]
# Apply modified gradients to your model
optimizer.apply_gradients(modified_grads_and_vars)
This makes sure your entries stay 0 in your weight matrix and no unwanted connections are created. You need to take care of all the other gradients for all other variables later.
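A minimal sketch of how such a mask_tensor can be built (all names here are illustrative; indices is the (row, col) list defining the sparse structure, and n_in x n_out is the dense shape):
import numpy as np
import tensorflow as tf

indices = [[0, 0], [1, 2], [2, 3]]    # hypothetical sparsity pattern
n_in, n_out = 3, 4
mask_np = np.zeros([n_in, n_out], dtype=np.float32)
for i, j in indices:
    mask_np[i, j] = 1.0               # 1.0 where a connection exists
mask_tensor = tf.constant(mask_np)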
The gradient-masking code above works with a minor correction, e.g. wrapped up like this, with mask_tensor as a dictionary keyed by variable:
def optimize(loss, mask_tensor):
    optimizer = tf.train.AdamOptimizer(0.001)
    grads_and_vars = optimizer.compute_gradients(loss)
    modified_grads_and_vars = [
        (tf.multiply(gv[0], mask_tensor[gv[1]]), gv[1]) for gv in grads_and_vars
    ]
    return optimizer.apply_gradients(modified_grads_and_vars)
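Hypothetical usage, reusing b_variable and mask_tensor from above (note that compute_gradients without a var_list covers every trainable variable, so the dictionary needs a mask entry for each of them; an all-ones mask leaves a variable untouched):
train_op = optimize(loss, {b_variable: mask_tensor})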

Generating synthetic data of mixed type (numerical/categorical) in Matlab (or any other)

I'm trying to generate some synthetic data for experiments. When it comes to data sets with numerical features this is rather easy, I just use a Gaussian mixture (using Netlab, a package for Matlab) and that's done.
Noooww, I also need to generate some data sets with numerical and categorical features. The numerical part I can easily do using the above method; what about the categorical?
I was thinking to generate a categorical feature with (say) 3 categories with probabilities of 68.2% (+/- 1 sigma), 27.2% (between +/- 1 sigma and +/- 2 sigma), and 4.6% (the rest) within the objects with the same label.
And perhaps another categorical feature with 5 categories, with probabilities of 34.1%, 34.1%, 13.6%, 13.6%, 4.6% - again, within the objects with the same label.
Does that make sense to you guys? any thoughts?
I can easily write the code for the above, but if you know of any function that does it for me - please let me know.
Thanks!
It's easy to do in Python using numpy:
import numpy as np
np.random.multinomial(n=1, pvals=[.3,.3,.4], size=10)
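Each row of the result is a one-hot draw; argmax turns it into a category label. A sketch using the 3-category probabilities from the question (the sample size of 300 is an arbitrary choice):
import numpy as np

draws = np.random.multinomial(n=1, pvals=[0.682, 0.272, 0.046], size=300)
labels = draws.argmax(axis=1)   # integer category per sample
# Equivalently: np.random.choice(3, size=300, p=[0.682, 0.272, 0.046])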

Text Categorization datasets for MATLAB

I am looking for a reliable dataset for text categorization tasks in MATLAB format.
I want to run some experiments and don't want to spend too much time preprocessing the text and creating feature vectors. I need something ready so I can plug it into my algorithm. I found MATLAB files for the Reuters dataset here: link text
Everything is ready in there, but I want to use a subset of it. "fea" contains the feature vectors for each document; however, it seems that it is not a normal matrix. I want, for example, to select the top 1000 documents in "fea". If you download it and load it into MATLAB, you will see what I mean.
So, if possible, I need a solution for the above-mentioned dataset, or any alternative datasets.
Thanks in advance.
It is stored as a sparse matrix. Extract the first 1000 documents (rows), and if you have enough memory, you can convert them to a full dense matrix:
load Reuters21578.mat
TF = full( fea(1:1000,:) );
Let's check the variables we have:
>> whos
  Name          Size             Bytes     Class     Attributes

  TF         1000x18933      151464000     double
  fea        8293x18933        4749196     double    sparse
  gnd        8293x1              66344     double
  testIdx    2347x1              18776     double
  trainIdx   5946x1              47568     double
so you can see TF is now about 150 MB.
Other than that, the rest is self-explanatory:
fea: term-frequency matrix, rows are documents, columns are terms
gnd: category of each document, where numel(unique(gnd)) == 65
trainIdx/testIdx: split of instances (documents) for classification purposes, contains indices of rows, used as: tr = fea(trainIdx,:); tt = fea(testIdx,:);