In ParaView, I am working with a dataset that uses the value -99999 as a flag value. I'd like to be able to manipulate the dataset without these values causing issues with things like glyphs and colorbars. Ideally, I'd like the data to be "ignored".
A little about the data: I've got both scalar and vector point data, sitting on a fixed 2D spatial mesh at set temporal intervals.
Although -99999 is very far beyond the values the data might otherwise show, using a threshold filter isn't an option because the flag can occur at different places at different times. Because of the way ParaView's Threshold filter works, the point ID of a fixed point in space will change as the number of filtered points changes through time.
In case it matters, the data are in a netCDF file that is read in via an XMF header file and the XDMF Reader, since the CF reader doesn't work (possibly because of my unstructured triangular mesh). The netCDF data have the _FillValue attribute; however, this doesn't appear to be getting picked up by ParaView.
You could use a Programmable Filter to replace values at or below -99999 with NaN. Provided the data is not a vtkMultiBlockDataSet, you can use the following script in the Programmable Filter:
import numpy as np
from vtk.numpy_interface import dataset_adapter as dsa

# name of the array to clean
name = 'name'
# flag threshold: values at or below this become NaN
limit = -99999

# copy the input array, blank the flagged entries, and attach it to the output
array = inputs[0].PointData[name].copy()
array[array <= limit] = np.nan
out = dsa.WrapDataObject(self.GetOutput())
out.PointData.append(array, name)
Note: if the data of interest is cell data, replace PointData with CellData in the script.
Note 2: the script was tested on ParaView 5.6.
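If you need to blank every point array at once (the question mentions both scalar and vector data), looping over all arrays is a straightforward extension of the same script. This is an untested sketch along the same lines; it assumes all arrays are floating point, since NaN cannot be stored in integer arrays:
import numpy as np
from vtk.numpy_interface import dataset_adapter as dsa

limit = -99999
out = dsa.WrapDataObject(self.GetOutput())
for name in inputs[0].PointData.keys():
    array = inputs[0].PointData[name].copy()
    # for vector arrays the comparison masks component-wise
    array[array <= limit] = np.nan
    out.PointData.append(array, name)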
To save time, I'm trying to read the .mat file rather than simulate the model again.
I used scipy.io.loadmat but it didn't work well:
res = loadmat('ChatteringControl_result.mat')
res.keys()
['Aclass', 'dataInfo', 'name', 'data_2', 'data_1', 'description']
The keys are not variable names, and I don't know how to get the variable values.
Then I searched for solutions and found DyMat; it works well for other variables, but I cannot get the time:
res1 = DyMat.DyMatFile('ChatteringControl_result.mat')
T = res1['T']
t = res1['time']
KeyError: 'time'
So, how can I get all the results in JModelica (without opening MATLAB, of course)? Is there a built-in function in JModelica?
BIG THANKS!
To load the mat file using JModelica you can use this code:
from pyfmi.common.io import ResultDymolaBinary
res = ResultDymolaBinary("MyResult.mat")
var = res.get_variable_data("myVar")
var.t #Time trajectory
var.x #Variable trajectory
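A quick way to sanity-check the loaded result is to plot the trajectory (matplotlib is assumed here, it is not part of pyfmi):
import matplotlib.pyplot as plt

plt.plot(var.t, var.x)   # time on the x-axis, variable value on the y-axis
plt.xlabel('time')
plt.ylabel('myVar')
plt.show()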
https://openmodelica.org/doc/OpenModelicaUsersGuide/latest/technical_details.html#the-matv4-result-file-format describes the format. I think you can also look in a Dymola manual for more details.
As for DyMat, you rarely need the time trajectory on its own, because you typically look up what value a variable has at a certain time. The start and stop times are in the data_1 matrix as far as I remember (or you can get the time axis from the first trajectory in the data_2 matrix). Note that the data_2 matrix may be interpolated, so the time values stored in it may not reflect the actual steps taken internally by the solvers.
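That said, if I remember DyMat's API correctly, the time axis of a variable's data block is exposed through the abscissa() method rather than as a variable named 'time', which would explain the KeyError above:
import DyMat

res1 = DyMat.DyMatFile('ChatteringControl_result.mat')
T = res1['T']
# the time axis belongs to the data block the variable is stored in,
# so fetch it via abscissa() instead of res1['time']
t = res1.abscissa('T', valuesOnly=True)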
Unfortunately I am not too tech proficient and only have a basic MATLAB/programming background...
I have several csv data files in a folder, and would like to make a histogram plot of all of them simultaneously in order to compare them. I am not sure how to go about doing this. Some digging online gave a script:
d = dir('*.csv');               % return the list of csv files
for i = 1:length(d)
    m{i} = csvread(d(i).name);  % put each file's matrix into a cell array
end
The problem is that I cannot now simply write the histogram(m(i)) command, because m(i) is a cell type, not a csv-file type (I'm not sure I'm using this terminology correctly, but MATLAB definitely isn't accepting the former).
I am not quite sure how to proceed. In fact, I am not sure what exactly the nature of the elements m(i) is and what I can/cannot do with them. The histogram command wants a matrix input, so presumably I would need a 'vector of matrices' and a command which plots each of the vector elements (i.e. matrices) on a separate plot. I would have about 14 altogether, which is quite a lot and would take a long time to load, but I am not sure how to proceed more efficiently.
Generalizing the question:
I will later be writing a script to reduce the noise and smooth out the data in the csv file, and binarise it (the csv files are for noisy images with vague shapes, and I want to distinguish these shapes by setting a cut-off for the pixel intensity/value in the csv matrix, so as to create a binary image showing these shapes). Ideally, I would like to apply this to all of the images in my folder at once so I can sift out which images are best for analysis. So my question is: how can I run a script on all of the csv files in my folder so that I can compare them all at once? I presume whatever technique I use for the histogram plots can apply to this too, but I am not sure.
It would probably be better to write a script which:
-makes a histogram plot and/or runs the binarising script for each csv file in the folder
-and puts all of the images into a new, designated folder, so I can sift through these.
I would greatly appreciate pointers on how to do this. As I mentioned, I am quite new to programming and am getting overwhelmed when looking at suggestions, seeing various different commands used to apparently achieve the same thing: reading several files at once.
The function csvread natively returns a matrix. I am not sure, but it is possible that if some elements inside the csv file are not numbers, MATLAB automatically makes a cell array out of the output. Since I don't know the structure of your csv files, I recommend trying out some similar functions (readtable, xlsread):
M = readtable(d(i).name);  % reads table-like data, most recommended
M = xlsread(d(i).name);    % Excel-like structures, but also works on similar data
Try them out and let me know if it worked. If not please upload a file sample.
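Note that readtable returns a table object, which histogram will not accept directly; if all the columns are numeric, you can convert it first:
T = readtable(d(i).name);
histogram(table2array(T));  % convert the table to a numeric array first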
The function csvread(filename) always returns a numerical matrix M and will never return a cell array.
If you have textual data inside the .csv file, it will give you an error because the data are not purely numerical. The only reason I can see for using a cell array when reading the files is if the dimensions of the individual matrices read from each file are different; for example, the first .csv file contains data organised as 3xA and the second as 2xB, so you can place them all into a single structure.
However, it is still possible to use histogram on a cell array, by extracting the element as an array instead of as a cell element.
If M is a cell array, there are two options for extracting the data: M(i) and M{i}. M(i) gives you the cell element and cannot be used for histogram, whereas M{i} returns the element in its original form, which is a numerical matrix.
TL;DR use histogram(M{i}) instead of histogram(M(i)).
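Putting the pieces together, a minimal sketch of the batch script described in the question might look like this (the output folder name is a placeholder, and purely numeric csv files are assumed):
d = dir('*.csv');                   % all csv files in the current folder
outDir = 'histograms';              % hypothetical output folder
if ~exist(outDir, 'dir')
    mkdir(outDir);
end
for i = 1:length(d)
    M = csvread(d(i).name);         % numeric matrix from one file
    fig = figure('Visible', 'off'); % avoid flashing 14 figure windows
    histogram(M(:));                % histogram of all values in the file
    title(d(i).name, 'Interpreter', 'none');
    saveas(fig, fullfile(outDir, [d(i).name '.png']));
    close(fig);
end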
I have written an algorithm, storing the data points in a column. The results contain the mean and std. However, if I store the data points in a row instead, the results seem closer to the true value. Is there a proper explanation for this? In my opinion, the results should be independent of the way of storage and thus remain the same.
function [C] = test(n,m)
    rng default
    B = randn(m,n)
    rng default
    A = randn(n,m)
    C = A' - B
end
I am using this generator to do my calculations. My results turn out to be different depending on whether I use randn(m,n) or randn(n,m). (I am taking the m-th column or m-th row and summing them up cumulatively.)
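For what it's worth, a minimal illustration of why C is nonzero: randn draws numbers from a single stream and fills the requested matrix column by column, so with the same seed, randn(n,m)' places the same draws in different positions than randn(m,n):
rng default
B = randn(2,3)   % six draws filled down the columns of a 2x3 matrix
rng default
A = randn(3,2)   % the same six draws filled down the columns of a 3x2 matrix
A' - B           % nonzero: transposing does not undo the column-major fill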
I am running kmeans in Mahout and as an output I get folders clusters-x, clusters-x-final and clusteredPoints.
If I understood correctly, clusters-x are the centroid locations at each iteration, clusters-x-final are the final centroid locations, and clusteredPoints should be the points being clustered, with a cluster id and a weight which represents the probability of belonging to the cluster (depending on the distance between the point and its centroid). On the other hand, clusters-x and clusters-x-final contain the cluster centroids, number of elements, feature values of the centroid, and the radius of the cluster (the distance between the centroid and its farthest point).
How do I examine these outputs?
I used the cluster dumper successfully for clusters-x and clusters-x-final from the terminal, but when I used it on clusteredPoints, I got an empty file. What seems to be the problem?
And how can I get these values from code? I mean, the centroid values and the points belonging to clusters?
For clusteredPoints I used IntWritable as the key and WeightedPropertyVectorWritable as the value in a while loop, but it passes over the loop as if there are no elements in clusteredPoints.
This is even stranger because the file that I get with clusterDumper is empty.
What could be the problem?
Any help would be greatly appreciated!
I believe your interpretation of the data is correct (I've only been working with Mahout for ~3 weeks, so someone more seasoned should probably weigh in on this).
As far as linking points back to the input that created them, I've used NamedVector, where the name is the key for the vector. When you read one of the generated points files (clusteredPoints), you can convert each row (point vector) back into a NamedVector and retrieve the name using .getName().
Update in response to comment
When you initially read your data into Mahout, you convert it into a collection of vectors which you then write to a file (points) for use in the clustering algorithms later. Mahout gives you several Vector types which you can use, but it also gives you access to a Vector wrapper class called NamedVector which will allow you to identify each vector.
For example, you could create each NamedVector as follows:
NamedVector nVec = new NamedVector(
    new SequentialAccessSparseVector(vectorDimensions),
    vectorName
);
Then you write your collection of NamedVectors to file with something like:
SequenceFile.Writer writer = new SequenceFile.Writer(...);
VectorWritable writable = new VectorWritable();
// the next two lines will be in a loop, but I'm omitting it for clarity
writable.set(nVec);
writer.append(new Text(nVec.getName()), writable); // append the VectorWritable wrapper, not the raw vector
You can now use this file as input to one of the clustering algorithms.
After you have run one of the clustering algorithms with your points file, it will have generated yet another points file, this time in a directory named clusteredPoints.
You can then read in this points file and extract the name you associated with each vector. It'll look something like this:
IntWritable clusterId = new IntWritable();
WeightedPropertyVectorWritable vector = new WeightedPropertyVectorWritable();
while (reader.next(clusterId, vector)) {
    NamedVector nVec = (NamedVector) vector.getVector();
    // you now have access to the original name using nVec.getName()
}
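For completeness, reader above is a SequenceFile.Reader opened on the clusteredPoints output. A hypothetical setup (the exact part file name depends on your job) might be:
// uses org.apache.hadoop.conf.Configuration, org.apache.hadoop.fs.FileSystem,
// org.apache.hadoop.fs.Path and org.apache.hadoop.io.SequenceFile
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
SequenceFile.Reader reader = new SequenceFile.Reader(fs,
        new Path("output/clusteredPoints/part-m-00000"), conf); // hypothetical path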
Check the parameter named "clusterClassificationThreshold"; it should be 0.
You can check this thread: http://mail-archives.apache.org/mod_mbox/mahout-user/201211.mbox/%3C50B62629.5020700#windwardsolutions.com%3E
I am trying to load feature vectors into classifiers such as a k-nearest neighbors classifier.
I have my code for GLCM, so I get contrast, correlation, energy, homogeneity in numbers (feature vectors).
My question is: how can I save every set of feature vectors from all the training images? I have seen somewhere that people had a .set file to load into classifiers (maybe it is a special case for that particular classifier toolbox).
load 'mydata.set';
for example.
I suppose it does not have to be a .set file.
I'd just need a way to store all the feature vectors from all the training images in a separate file that can be loaded.
I've googled, and I found this, which may be useful, but I am not entirely sure.
Thanks for your time and help in advance.
Regards.
If you arrange your feature vectors as the columns of an array called X, then just issue the command
save('some_description.mat','X');
Alternatively, if you want the save file to be readable, say in ASCII, then just use this instead:
save('some_description.txt', 'X', '-ASCII');
Later, when you want to re-use the data, just say
var = {'X'}; % <-- You can modify this if you want to load multiple variables.
load('some_description.mat', var{:});
load('some_description.txt', var{:}); % <-- Use this if you saved to .txt file.
Then the variable named 'X' will be loaded into the workspace and its columns will be the same feature vectors you computed before.
You will want to replace the some_description part of each file name above and instead use something that allows you to easily identify which data set's feature vectors are saved in the file (if you have multiple data sets). Your array of feature vectors may also be called something besides X, so you can change the name accordingly.
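For instance, if each training image yields the four GLCM numbers mentioned in the question, X could be assembled along these lines (graycomatrix/graycoprops are the usual Image Processing Toolbox route; images and the file name are illustrative):
numImages = numel(images);            % images: cell array of grayscale images
X = zeros(4, numImages);
for k = 1:numImages
    glcm  = graycomatrix(images{k});  % grey-level co-occurrence matrix
    stats = graycoprops(glcm);        % contrast, correlation, energy, homogeneity
    X(:,k) = [stats.Contrast; stats.Correlation; stats.Energy; stats.Homogeneity];
end
save('glcm_features.mat', 'X');       % hypothetical file name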