I am a new SPSS Modeler user. I have tried to build a simple decision-tree model for predicting safe births using only a couple of input variables. In the decision tree that gets produced, and in the resulting table, only the binary outcome variable seems to have been used. I have tried re-marking the input variables and the binary (flag) outcome variable, reading and saving them again, and re-running the whole model, but the model does not seem to change. I hope someone can point me to what I am overlooking.
I am trying to compare model versions and extract what is added/deleted/modified by querying the difference tree. The difference tree can be obtained programmatically through slxmlcomp (see its documentation):
Edits = slxmlcomp.compare(ModelA_before, ModelA_after)
Is there a neat way to compare a model against an empty model? For instance, I want to handle the case where a Simulink model file did not exist before and a new model file has now been added.
Current approach: the Edits object (which is read-only) has a left and a right tree. I need the left tree to contain practically nothing (or the root only). So far I have been comparing the added file against a copy with its contents deleted. This way of comparing still reports the configuration as modified rather than added.
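As a hedged sketch of one possibility, you could create a blank model programmatically and use that as the left side of the comparison. Note that even a blank model carries a default configuration set, which may be why the configuration still shows up as modified rather than added. The name 'EmptyBaseline' is hypothetical, and 'ModelA' stands for your newly added model:
% Sketch: compare the newly added model against a freshly created blank model.
baseline = 'EmptyBaseline';
new_system(baseline);                          % create a blank model in memory
save_system(baseline);                         % write EmptyBaseline.slx to disk
Edits = slxmlcomp.compare(baseline, 'ModelA'); % left tree is (nearly) empty
close_system(baseline);
delete([baseline '.slx']);                     % remove the temporary baseline file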
I've generated a PySpark Word2Vec model like so:
from pyspark.ml.feature import Word2Vec
w2v = Word2Vec(vectorSize=100, minCount=1, inputCol='words', outputCol='vector')
model = w2v.fit(df)
(The data that I used to train the model on isn't relevant; what's important is that it's all in the right format and successfully yields a pyspark.ml.feature.Word2VecModel object.)
Now I need to convert this model to a Gensim Word2Vec model. How would I go about this?
If you still have the training data, re-training the gensim Word2Vec model may be the most straightforward approach.
If you only need the word-vectors, perhaps PySpark's model can export them in the word2vec.c format that gensim can load with .load_word2vec_format().
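For instance, a hedged sketch of that export-and-reload route: it pulls the vectors out of the Word2VecModel with getVectors() and writes the plain-text word2vec.c format by hand (the file name is arbitrary):
# Sketch: dump the PySpark word-vectors to word2vec.c text format, then load in gensim.
# Assumes `model` is the fitted pyspark.ml.feature.Word2VecModel from the question.
rows = model.getVectors().collect()   # DataFrame with 'word' and 'vector' columns
with open('vectors.txt', 'w') as f:
    f.write('%d %d\n' % (len(rows), len(rows[0].vector.toArray())))  # header: vocab size, dims
    for row in rows:
        f.write(row.word + ' ' + ' '.join(str(x) for x in row.vector.toArray()) + '\n')

from gensim.models import KeyedVectors
kv = KeyedVectors.load_word2vec_format('vectors.txt', binary=False)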
The only reason to port the model would be to continue training. Such incremental training, while possible, involves considering a lot of tradeoffs in balancing the influence of the older and later training to get good results.
If you are in fact wanting to do this conversion in order to do more training in such a manner, it again suggests that using the original training data to reproduce a similar model could be the more plausible path.
But, if you have to convert the model, the general approach would be to study the source code and internal data structures of both model classes, to discover how each represents the key aspects of the model:
the known word-vectors (model.wv.vectors in gensim)
the known-vocabulary of words, including stats about word-frequencies and the position of individual words (model.wv.vocab in gensim)
the hidden-to-output weights of the model (model.trainables and its properties in gensim)
other model properties describing the model's modes & metaparameters
A reasonable interactive approach could be:
Write some acceptance tests that take models of both types, and test whether they are truly 'equivalent' for your purposes. (This is relatively easy for just checking if the vectors for individual words are present and identical, but nearly as hard as the conversion itself for verifying other ready-to-be-trained-more behaviors.)
Then, in an interactive notebook, load the source model, and also create a dummy gensim model with the same vocabulary size. Consulting the source code, write Python statements to iteratively copy/transform key properties over from the source into the target, repeatedly testing if they verify as equivalent.
When they do, take those steps you did manually and combine them into a utility method to do the conversion. Again verify its operation, then try using the converted model however you'd hoped, perhaps discovering overlooked info or other bugs in the process, and then improve the verification and conversion methods.
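As a concrete starting point, here is a hedged sketch of the easy first acceptance test mentioned above, checking that every word's vector is present and numerically identical; spark_model and gensim_kv are placeholder names for your two objects:
# Sketch: the easy acceptance test, word-for-word vector equality.
# spark_model: a pyspark.ml.feature.Word2VecModel; gensim_kv: a gensim KeyedVectors.
import numpy as np

def vectors_equivalent(spark_model, gensim_kv, atol=1e-6):
    spark_vecs = {r.word: r.vector.toArray()
                  for r in spark_model.getVectors().collect()}
    for word, vec in spark_vecs.items():
        if word not in gensim_kv:                         # word missing entirely
            return False
        if not np.allclose(vec, gensim_kv[word], atol=atol):
            return False
    return True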
It's possible that the PySpark model will be missing things the gensim model expects, which might require synthesizing workable replacement values.
Good luck! (But re-train the gensim model from the original data if you want things to just be straightforward and work.)
I'm working on an extensive Matlab based GUI that was created with GUIDE. Saving the full state of a GUI seems to be a laborious task as it is generally impossible to efficiently make a self-contained copy of the handles structure. From what I've gathered in my web searches, the current work-around is to manually create a new structure and store the necessary properties of all the uicontrols in the GUI in appropriately named fields. For example if there's a uitable in the GUI, you might want to include in the new structure a field called tabledata where you store the Data from the uitable. The idea is then to save this new structure to a .mat file and load the state of the GUI again by reading this file and doing the inverse exercise of manually copying fields.
I called the above a work-around instead of a solution because it's quite laborious for a large GUI. If anyone has a better/quicker/shorter/cleaner way to do this, please feel free to share! I've come up with a shorter and from some points of view cleaner way myself, but there are a few reasons why I might still prefer the above work-around. In any case my question is about that work-around.
The biggest problem with it is that your saving and loading code must be inclusive: every value and property that can be adjusted by the user should be saved into the newly created structure. For a large GUI it's a real pain, nigh on impossible, to find out which values and properties are adjustable by manually checking everything. Especially with the properties of one uicontrol possibly influencing certain properties of one or more other uicontrols. So my question is: is there a way to get an overview of all the values and properties that a user can influence for a given GUI?
Look into a function called uiinspect coded by Yair Altman. It produces a list of all methods, callbacks, and properties.
A full explanation of how the function works is available here.
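A minimal usage sketch, assuming uiinspect (a MATLAB File Exchange submission) is on your path; mygui is a hypothetical GUIDE-generated opening function:
hFig = mygui();                               % hypothetical: returns the GUI figure handle
uiinspect(hFig);                              % browse the figure's methods, callbacks, and properties
uiinspect(findobj(hFig, 'Type', 'uitable'));  % or inspect a single control, e.g. a uitable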
I have a Simulink model. The model has a sorted order (order of execution). When I save the model to an .mdl file, there is no information about the sorted order.
I tried saving it to an RTF file (File -> Reports -> System Design Description), but I would like a more parseable format.
Is there any way to save this order to a file?
Thanks
I don't think so, as the sorted order gets determined on compilation of the model. It isn't a property of the model as such, rather a consequence of how the model is constructed. You can't save that sort of information to a file.
EDIT:
I stand corrected. There is a way to access the information, but you need to use the Simulink debugger. For more details, see slist.
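For reference, a minimal command-line sketch ('myModel' is a placeholder name):
% Sketch: print the sorted order from the Simulink debugger.
sldebug('myModel')   % start the debugger; this compiles the model
% At the sldebug prompt, type:
%   slist   % prints the sorted list of blocks (the sorted order)
%   stop    % exits the debugger
% Wrapping the session in "diary sorted_order.txt" ... "diary off" captures
% the listing in a plain-text file that is easier to parse than the RTF report.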
I am having issues with 'data overload' while processing point cloud data in MATLAB. This is what I am currently doing:
I begin with my raw data files, each on the order of ~30 MB.
I then do initial processing on them to extract n individual objects and remove outlying points, which are all combined into a 1 x n structure, testset, saved into testset.mat (~100 MB).
So far so good. Now things become complicated:
For each point in each object in testset, I will compute one of a number of features, which ends up being a matrix of some size (for each point). The size of the matrix, and some other properties of the computation, are parameters of the calculations. I save these computed features in a 1 x n cell array, each cell of which contains an array of the matrices for each point.
I then save this cell array in a .mat file, whose name specifies the parameters, the name of the test data used, and the types of features extracted. For example:
testset_feature_type_A_5x5_0.2x0.2_alpha_3_beta_4.mat
Now for each of these files, I then do some further processing (using a classification algorithm). Again there are more parameters to set.
So now I am in a tricky situation, where each final piece of the initial data has come through some path, but the path taken (and the parameters set along that path) are not intrinsically held with the data itself.
So my question is:
Is there a better way to do this? Can anyone who has experience in working with large datasets in MATLAB suggest a way to store the data and the parameter settings more efficiently, and more integrally?
Ideally, I would be able to look up a certain piece of data without having to use regex on the file strings—but there is also an incentive to keep individually processed files separate to save system memory when loading them in (and to help prevent corruption).
The time taken for each calculation (some ~2 hours) prohibits computing data 'on the fly'.
For a similar problem, I have created a class structure that does the following:
Each object is linked to a raw data file
For each processing step, there is a property
The set method of each such property saves the data to file (in a directory with the same name as the raw data file), stores the file name, and updates a "status" property to indicate that this step is done.
The get method of the properties loads the data if the file name has been stored and the status indicates "done".
Finally, the objects can be saved/loaded, so that I can do some processing now, save the object, load it later, and immediately know how far along the particular data set is in the processing pipeline.
Thus, the only data in memory is the data that is currently being worked on, and you can easily know which data set is at which processing stage. Furthermore, if you set up your methods to accept arrays of objects, you can do very convenient batch processing.
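A minimal sketch of such a class, with one processing step ("features") shown; the class and property names are hypothetical, and the real thing would have one such property pair per processing step:
classdef ProcessedData < handle
    % Minimal sketch of the pattern described above; names are hypothetical.
    properties
        RawFile char                          % the raw data file this object is linked to
        Status = struct('features', false)    % which processing steps are done
    end
    properties (Dependent)
        Features                              % computed features, kept on disk, not in memory
    end
    properties (Access = private)
        FeaturesFile char                     % where the features were saved
    end
    methods
        function obj = ProcessedData(rawFile)
            obj.RawFile = rawFile;
        end
        function set.Features(obj, val)
            % save next to the raw file, in a directory named after it
            [folder, name] = fileparts(obj.RawFile);
            outDir = fullfile(folder, name);
            if ~exist(outDir, 'dir'), mkdir(outDir); end
            obj.FeaturesFile = fullfile(outDir, 'features.mat');
            save(obj.FeaturesFile, 'val');
            obj.Status.features = true;       % mark this step as done
        end
        function val = get.Features(obj)
            val = [];
            if obj.Status.features            % load only if this step was completed
                s = load(obj.FeaturesFile);
                val = s.val;
            end
        end
    end
end
Because the heavy data sits behind Dependent properties, saving the object itself (save('state.mat', 'obj')) writes out only the file names and status flags, which is what makes checkpointing the pipeline cheap.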
I'm not completely sure if this is what you need, but the save command allows you to store multiple variables inside a single .mat file. If your parameter settings are, for example, stored in an array, then you can save this together with the data set in a single .mat file. Upon loading the file, both the dataset and the array with parameters are restored.
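For example, a small sketch with hypothetical variable and file names modeled on your naming scheme:
% Sketch: store the computed features and the parameters that produced them together.
features = cell(1, 3);                        % stands in for your 1 x n cell array of feature matrices
params = struct('featureType', 'A', 'matrixSize', [5 5], ...
                'gridStep', [0.2 0.2], 'alpha', 3, 'beta', 4);
save('testset_features.mat', 'features', 'params');   % both go into one .mat file

s = load('testset_features.mat');             % s.features and s.params are restored together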
Or do you want to be able to load the parameters without loading the file? Then I would personally opt for the cheap solution of having a second set of files with just the parameters (but similar filenames).