How to save cluster assignments in output file using Weka clustering XMeans? - cluster-analysis

Context
I want to use the Weka clustering algorithm XMeans. However, I cannot figure out how to obtain the cluster assignments from the Weka GUI.
At the moment I can only see a list of cluster IDs along with the percentage of entries assigned to each cluster.
Question
Is there any way to save the cluster assignment for each entry in, e.g., CSV format?

Do everything in the "Preprocess" panel.
This is one way to do this:
Load the data file.
Remove any classification attribute or identifiers.
Choose Preprocess / Filter / Unsupervised attribute filter / AddCluster.
Click on the word "AddCluster", choose the XMeans clusterer, and click Apply.
This should add a new column "cluster" in the Attributes panel.
Click on the "Save..." button to export.
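Once the data has been exported, the per-entry assignments can be read back from the new column. The following is a minimal Python sketch, assuming the file was saved as CSV and that the added column is named "cluster" as described above; the rows and the other column names are made-up sample data, not real Weka output.

```python
import csv
import io
from collections import Counter

# Made-up stand-in for a CSV file saved from the Preprocess panel.
# In practice this would be open("exported.csv", newline="") with
# whatever file name was chosen in the "Save..." dialog.
exported = io.StringIO(
    "sepallength,sepalwidth,cluster\n"
    "5.1,3.5,cluster1\n"
    "6.7,3.1,cluster2\n"
    "4.9,3.0,cluster1\n"
)

# One cluster label per entry, in the same order as the rows of the file.
assignments = [row["cluster"] for row in csv.DictReader(exported)]

# Per-cluster totals, comparable to the summary the Cluster panel prints.
counts = Counter(assignments)

print(assignments)
print(dict(counts))
```

The list keeps one label per row, which is the per-entry information the Cluster panel summary does not show.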

Related

Share kusto variables between grafana panels

Well, let's say that I have the query from my previous question: How to do multi graph time series on Grafana with Kusto
Then I'd like to consume the tiemposCicloBruto variable from one panel to another in order to avoid repeating queries.
I saw: https://grafana.com/blog/2020/10/14/learn-grafana-share-query-results-between-panels-to-reduce-load-time/
But there isn't any way to share variables at all...
I also tried it as a dashboard variable, but it doesn't seem to support tabular expressions at all...
You can share only input variables across dashboard panels. Variables work as primitive text substitution in one direction (from dashboard to query), and do not take into account any context in your query language.
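To make "primitive text substitution" concrete, here is a minimal Python sketch. It is not Grafana's code: the table, column, and variable names are hypothetical, and Python's string.Template only stands in for the dashboard's substitution step.

```python
from string import Template

# Hypothetical Kusto query text containing a dashboard variable ($machine).
query_text = Template(
    "MyTable | where Machine == '$machine' "
    "| summarize avg(Duration) by bin(Timestamp, 1h)"
)

# The dashboard pastes the variable's text into the query before sending it.
# Nothing flows back: the variable never receives the query's tabular result.
rendered = query_text.substitute(machine="press-01")

print(rendered)
```

Because the value is pasted in as plain text before the query runs, a variable can carry an input such as a machine name, but not a table computed by another panel.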
The link you posted describes sharing the results of a query between different panels. If the exact same result set fits the needs of another panel, you can reuse it "for free", without putting extra load on the database. You don't need to save it into a variable; you just select the other panel as a pseudo-datasource and get its result immediately.
You can factor this feature into the design of your panels. Examples could be:
time series plus histogram visualizations of the same data;
time-series chart plus a panel with latest readings (or use other Grafana reduce expressions).

How to use the User Input node in SPSS modeler?

I am trying to make my SPSS flow a bit more dynamic. Although there is no way (that I am aware of) to take an input, is there a way to use the User Input node to take some parameter values and use these parameters in a Select node to perform tests on the data?
What I am trying to achieve is run the same flow with some minor parameter changes. I have the flow running for some static values used in select nodes. It would be great if there was a way to change these static select nodes, even if it's not with a User Input node.
I believe what you are looking for is to create a parameter in SPSS Modeler and then check its "Prompt?" option, which asks the user for input every time the stream is executed.
From SPSS Modeler 18.2 onwards, I was able to see the "Prompt?" option as the first item on the Parameters tab.

Best Practice to Store Simulation Results

Dear Anylogic Community,
I am struggling with finding the right approach for storing my simulation results. I have datasets created that keep track of every value I am interested in. They live in Main (see below)
My aim is to do a parameter variation experiment. In every run, I change the value for p_nDrones (see below)
After the experiment, I would like to store all the datasets in one excel sheet.
However, when I do the parameter variation experiment and afterwards check the log of the dataset (datasets_log), the changed values do not even show up (2 is the value I did set up in the normal simulation).
Now my question. Do I need to create another type of dataset if I want to track the values that are produced in the experiments? Why are they not stored after executing the experiment?
I really would appreciate if someone could share the best way to set up this export of experiment results. I would like to store the whole time series for every dataset.
Thank you!
Best option would be to write the outputs to some external file at the end of each model run.
You can use Excel if you want to, since it has a nice excelFile.writeDataSet() function, but I personally would not advise it.
I would rather write the data to a text file: you have much more control over the writing and over the file itself, it is thread-safe, and it is usable on many more platforms than Microsoft Excel.
See my example below:
Set up parameters of type TextFile in your model; these are the files the data will be written to at the end of each model run. Here I used the model's On destroy code to write out the data from the datasets.
Here you can immediately see the benefit of using the text file! You can add the number of drones we are simulating (or scenario name or any other parameter) in a column, whereas with Excel this would be a pain...
Now you can pass your specific text file to the model to use by adding it to the parameter variation page, providing it to the model through the parameters.
You will see that I also set up some headers for the text file in the Initial Experiment setup part, and then at the very end of the experiment, I close the text files in the After experiment section so that the text files can be used.
Here is the result if you simply right-click on the text files and open them in Excel. (Excel will always have a purpose, even if it is just to open text files ;-) )
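The file layout described above (headers written once, then the rows of every run appended with the run's parameter value in its own column) can be sketched independently of AnyLogic. This is a minimal Python illustration, not AnyLogic code: the function names, column names, and numbers are hypothetical, and an in-memory buffer stands in for the text file.

```python
import csv
import io

def write_header(out):
    """Write the column headers once, before the first run."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["n_drones", "time", "value"])

def append_run(out, n_drones, series):
    """Append one run's (time, value) series, tagging each row with n_drones."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for time, value in series:
        writer.writerow([n_drones, time, value])

# In-memory stand-in for the shared text file; made-up numbers for two runs.
out = io.StringIO()
write_header(out)
append_run(out, 2, [(0.0, 5.0), (1.0, 7.5)])
append_run(out, 3, [(0.0, 4.0), (1.0, 6.0)])

text = out.getvalue()
print(text)
```

Keeping the parameter value in a column gives one long table that holds the whole time series of every run and can be filtered by scenario after opening it in Excel or any other tool.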

Assigning bins to records in CHAID model

I built a custom CHAID tree in SPSS modeler. I would like to assign the particular terminal nodes to all of the records in the dataset. How would I go about doing this from within the software?
Assuming that you used the regular CHAID node: open the generated CHAID model (the diamond icon) and, on its settings tab, select the rule identifier option. The output will then include another variable called $RI-XXX that assigns every record to its terminal node. Just check that option and put a Table node after the model, and all the records will be classified.
You just need to apply the model to whatever data set you need; the inputs only need to be the same (type and, where relevant, storage).
The diamond contains the model, and you can disconnect it and connect it to whatever you want.
http://beyondthearc.com/blog/wp-content/uploads/2015/02/spss.png

Inconsistent output from DBSCAN implementation in Weka

I am using the DBSCAN implementation in Weka and it seems to be giving me different results based on whether I select "Use training set" or "Classes to clusters evaluation" as the 'Cluster mode'. As per the documentation here, selecting "Classes to clusters evaluation" should only change the metrics reported.
With DBSCAN however I actually see a different number of clusters. Here's a way to reproduce the problem:
Load the IRIS dataset: Select the "Preprocess" tab, click "Open file", go to the "data" folder inside your Weka installation and load the "iris" dataset.
Go over to the "Cluster" tab and choose DBSCAN. Set epsilon=0.5 and minpts=5.
In cluster mode, select the radio button "Use training set" and Start the clustering. Look for the string "Number of generated clusters" - this number is 3 for me.
Now set the cluster mode to "Classes to clusters evaluation" and re-run the clustering. I get 1 cluster now.
Is this expected behavior? Am I missing something?
What I was missing is that with the "Use training set" setting, all attributes, including the class label, are used. If I explicitly remove the class, the results match.
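The effect can be reproduced on a toy example. The sketch below is not Weka's DBSCAN: it simply counts groups of points chained together by links no longer than epsilon (which is what DBSCAN reduces to when minPts is 1), and it assumes the class label contributes to the distance like a 0/1 attribute. The data values are made up.

```python
import math

def eps_groups(points, eps):
    """Count groups of points connected by chains of links of length <= eps."""
    unvisited = set(range(len(points)))
    groups = 0
    while unvisited:
        stack = [unvisited.pop()]  # start a new group from any unvisited point
        groups += 1
        while stack:
            i = stack.pop()
            near = [j for j in unvisited
                    if math.dist(points[i], points[j]) <= eps]
            for j in near:
                unvisited.remove(j)
                stack.append(j)
    return groups

values = [0.0, 0.3, 0.6, 0.9]   # one numeric attribute
labels = [0, 0, 1, 1]           # class label of each point

without_class = [(v,) for v in values]
with_class = [(v, float(c)) for v, c in zip(values, labels)]

n_without = eps_groups(without_class, eps=0.35)
n_with = eps_groups(with_class, eps=0.35)

print(n_without, n_with)
```

With the label left out, neighbouring points are about 0.3 apart and form a single chain. With the label included, points from different classes are more than 1 apart, so the chain breaks at the class boundary and more groups appear, which is the same direction as the difference observed above.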