Clustering - Which target variable for One-Hot-Encoding? - cluster-analysis

I would like to apply a clustering algorithm to my data frame, but some of my variables are nominal (categorical). I would therefore like to apply one-hot encoding so that I can also use, for example, k-means clustering. I'm aware that there are other, and maybe better, algorithms than k-means, but I want to start with it and use its results as a benchmark.
There are several ways to do this; the packages caret and recipes, for example, offer functions for it. However, these require the definition of a target variable, which then no longer appears in the encoded data frame. Although I do have a target variable in my data set, I would rather keep it as a predictor and overweight it, so that each cluster contains only one value of the target variable. Consequently, I would need to pick some other variable and specify it as the target variable in the formula interface.
I would therefore like to ask whether it matters which variable I choose for this, or whether I actually have to use my real target variable and can still weight it somehow afterwards.
I've also seen a method where no target variable is defined in the formula interface. Is this a recommended approach, or is it preferable to define a target variable?
I would be very happy about an answer!
Many greetings and thanks in advance!
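A minimal sketch, in plain Python rather than R, of the point behind the question: one-hot encoding by itself never looks at a target variable, so every nominal column can be encoded, the target can stay an ordinary numeric column, and it can be overweighted simply by multiplying it by a factor before clustering. The column names (`color`, `size`, `target`), the weight of 10 and the small hand-written k-means loop are hypothetical illustrations, not the caret or recipes API; with real data you would also standardize the numeric columns first.

```python
from typing import Dict, List, Sequence

Row = Dict[str, float]


def one_hot(rows: List[Dict[str, object]], categorical: Sequence[str]) -> List[Row]:
    """Expand each categorical column into 0/1 indicator columns; no target variable is involved."""
    levels: Dict[str, List[object]] = {col: [] for col in categorical}
    for row in rows:
        for col in categorical:
            if row[col] not in levels[col]:
                levels[col].append(row[col])  # remember levels in first-seen order
    encoded: List[Row] = []
    for row in rows:
        out: Row = {}
        for key, value in row.items():
            if key in levels:
                for level in levels[key]:
                    out[f"{key}_{level}"] = 1.0 if value == level else 0.0
            else:
                out[key] = float(value)  # numeric columns, including the target, pass through
        encoded.append(out)
    return encoded


def weight_column(rows: List[Row], column: str, weight: float) -> List[Row]:
    """Multiply one column by `weight` so it dominates the Euclidean distance."""
    return [{**row, column: row[column] * weight} for row in rows]


def kmeans(points: List[List[float]], k: int, max_iter: int = 100) -> List[int]:
    """Plain Lloyd's k-means, initialised with the first k points so the result is deterministic."""
    centers = [list(p) for p in points[:k]]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assign every point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Move each center to the mean of its members; keep it if the cluster is empty.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new_centers.append(
                [sum(col) / len(members) for col in zip(*members)] if members else centers[c]
            )
        if new_centers == centers:
            break
        centers = new_centers
    return labels


if __name__ == "__main__":
    rows = [
        {"color": "red", "size": 1.0, "target": 0},
        {"color": "blue", "size": 1.1, "target": 1},
        {"color": "red", "size": 0.9, "target": 1},
        {"color": "blue", "size": 1.0, "target": 0},
    ]
    encoded = one_hot(rows, ["color"])  # columns: color_red, color_blue, size, target
    plain = kmeans([list(r.values()) for r in encoded], k=2)
    weighted = kmeans([list(r.values()) for r in weight_column(encoded, "target", 10.0)], k=2)
    print(plain, weighted)
```

On these four toy rows the unweighted run groups the rows by color (`[0, 1, 0, 1]`), while weighting the target by 10 groups them by target (`[0, 1, 1, 0]`). Whether a weight of 10 is sensible for real data depends on the scale of the other columns.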

Related

integrate Modelica variable without influencing state selection

I want to integrate a Modelica variable over time, just for convenience in plotting and post-processing. The variable I want to integrate over time is the power of a compressor so that I get the total energy. The first idea would be to add these lines:
Modelica.Units.SI.Power P_comp;
Modelica.Units.SI.Energy E_comp;
equation
P_comp = der(E_comp);
Is that the recommended way, or are there (better?) alternatives? Is it expected to influence the selection of dynamic states?
Assuming that those two lines are the only ones using E_comp, that should work.
Basically E_comp will be part of its own separate state-selection block and changes there shouldn't influence anything else.
However, state selection consists of a number of algorithms and heuristics so it is difficult to formally guarantee that any change does not influence it.
I could imagine some strange possibilities that would break this, but I don't think anyone has implemented them - and I don't see a use-case for them (except to mess up cases like this).
If you want to differentiate a signal instead of integrating it, things get a lot messier.
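If you would rather not touch the model at all, a swapped-in alternative is to integrate the stored power trace after the simulation. A minimal sketch in Python, assuming the time and `P_comp` result columns have been exported as plain lists (how to export them is tool-specific and not shown). The trade-off: the trapezoidal sum is only as accurate as the output grid and can miss fast changes between stored samples, whereas `P_comp = der(E_comp)` is integrated by the solver with its own error control.

```python
from typing import List, Sequence


def cumulative_energy(time_s: Sequence[float], power_w: Sequence[float]) -> List[float]:
    """Running integral of a sampled power trace (W) over time (s), in J, using the trapezoidal rule."""
    if len(time_s) != len(power_w):
        raise ValueError("time and power traces must have the same length")
    if not time_s:
        return []
    energy = [0.0]
    for i in range(1, len(time_s)):
        dt = time_s[i] - time_s[i - 1]
        # Area of the trapezoid between two consecutive output samples.
        energy.append(energy[-1] + 0.5 * (power_w[i] + power_w[i - 1]) * dt)
    return energy


# Usage: a constant 1000 W over one hour gives 3.6e6 J (1 kWh).
print(cumulative_energy([0.0, 1800.0, 3600.0], [1000.0, 1000.0, 1000.0])[-1])  # 3600000.0
```

Returning the running sum rather than only the total keeps the result usable for plotting energy over time.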

A general question about Modelica initialization

How should I set values for all the variables that could possibly be used as iteration variables? For example, a heat exchanger includes a few connectors, and each connector includes a few variables. I can't know which variables will be used as iteration variables, so when dealing with initialization, do I need to set values for every variable so that, no matter which one is chosen as an iteration variable, it has a reasonable value?
Marvel,
I think that you are a bit on the wrong track for finding a solution: setting values for all variables that could possibly become iteration variables usually means setting far too many, which leads to errors and problems. But I think I can give you some useful advice in any case.
Alias variables: there are many alias variables in Modelica models. You should always try to set a start value on only one of them.
Feedback between start values and iteration variables: most Modelica tools prefer to select iteration variables that have start values set. Setting fewer start values can thus guide the algorithm towards selecting good ones. Therefore: don't overdo it.
General advice for selecting iteration variables: for a pure ODE, the states always form a complete set of variables to give start values to, even if sometimes not the best one. For a DAE, you can start with the following exercise: think of all equations that would result from a singular perturbation of the complete physics, i.e. as differential equations with states. For example, in a heat exchanger you need to consider the dynamic momentum balance rather than the commonly used static reduction to an algebraic pressure loss only, i.e. add the mass flow as a state. Similarly for chemical reactions: think in terms of kinetics, not equilibrium reactions. That gives you a pretty good starting point, even though often not the best one.
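A toy numerical illustration of the singular-perturbation idea for the pressure-loss example, written in Python rather than Modelica and using made-up numbers (pressure difference `dp`, loss coefficient `k`, inertance = length/area). The static model dp = k·m·|m| is the zero-inertance limit of the momentum balance inertance·dm/dt = dp − k·m·|m|; integrating the dynamic version from m = 0 settles on the same mass flow, which is why thinking dynamically is a robust way to find a start value for the mass-flow iteration variable. This is only a sketch of the reasoning, not what any Modelica tool does internally.

```python
import math


def static_mass_flow(dp: float, k: float) -> float:
    """Algebraic pressure-loss model dp = k*m*|m|, solved directly for the mass flow m."""
    return math.copysign(math.sqrt(abs(dp) / k), dp)


def dynamic_mass_flow(dp: float, k: float, inertance: float,
                      m0: float = 0.0, dt: float = 1e-3, t_end: float = 10.0) -> float:
    """Momentum balance inertance*dm/dt = dp - k*m*|m| with m as a state, explicit Euler from m0."""
    m = m0
    for _ in range(int(round(t_end / dt))):
        m += dt * (dp - k * m * abs(m)) / inertance
    return m


# Hypothetical numbers: dp = 1e4 Pa, k = 100, inertance = 1000 (length/area, 1/m).
print(static_mass_flow(1e4, 100.0), dynamic_mass_flow(1e4, 100.0, 1000.0))
```

A smaller inertance makes the transient faster (and stiffer for a fixed step size) but leaves the settled value unchanged, which is the sense in which the algebraic model is the limit case.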
If your troubles aren't resolved by that, I recommend that you contact us via www.modelon.com: we have advanced ways of dealing with hard initialization and steady-state problems in our Modelica tools. :-)
There is also a simpler way to answer your question, which works quite well with fluid models.
Given that you are using a dynamic model, what you need to initialize are the state variables of your system. To find the state variables, either you know the type of model you are working with, or you can look them up using options like 'List continuous time states selected' in Dymola (I do not know about other tools), which lists the state variables in the translation log.
In fluid models, these are most of the time pressure and energy (enthalpy or temperature). All other variables are calculated from them.
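To make "all other variables are calculated from the states" concrete, a minimal sketch in Python assuming an ideal gas with constant specific heat and enthalpy referenced to zero at 0 K (both simplifying assumptions; real media models such as those in Modelica.Media are more involved). Given the two states p and h, temperature and density follow directly, so only p and h need sensible start values.

```python
from dataclasses import dataclass

R_AIR = 287.0    # specific gas constant of air, J/(kg*K), rounded
CP_AIR = 1005.0  # specific heat at constant pressure, J/(kg*K), rounded and assumed constant


@dataclass(frozen=True)
class FluidState:
    p: float  # pressure in Pa (state variable)
    h: float  # specific enthalpy in J/kg (state variable)

    @property
    def temperature(self) -> float:
        # With constant cp and h = cp*T (h = 0 at T = 0 K), T follows from h alone.
        return self.h / CP_AIR

    @property
    def density(self) -> float:
        # Ideal gas law: rho = p / (R*T).
        return self.p / (R_AIR * self.temperature)


state = FluidState(p=1e5, h=CP_AIR * 300.0)
print(state.temperature, state.density)
```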
For complex (and sometimes not so complex) models, this approach shows its limits, which can sometimes be overcome by changing or correcting the structure of the model.
Static models are something else...
Hope this can help :)

Labview - Managing large numbers of constants

This is more of a formatting problem than code logic and probably seems silly (considering I've seen far more dense block diagrams). I'm working with a lot of numeric constants and they're starting to clutter my Block Diagram. Is there something I can use to group them nicely and compactly?
Preferably I would like to avoid clustering them because I would need to bundle and unbundle every time I needed access.
EDIT: Picture of code in question (code segment is used repeatedly, so would be nice to have a more compact case structure)
I think you should rethink how much of your block diagram you expect to devote to constants :-)
Using numbers directly in code, the equivalent of unlabelled constants on the LabVIEW block diagram, is a recognised anti-pattern. Unless the reason for the constant value is both obvious and fundamental to the operation being carried out, anyone looking at your code (including you, a couple of weeks after you wrote it) will not understand why the value was chosen. Therefore, you should make this clear by labelling the constant somehow (equivalent to assigning it to a name in a text language) and also make it easy to change the value if necessary.
It's usually clear what a 0 or 1 constant is doing there but in the code image you've posted you have two constants of 1000 and one of 999. Why is it 1000, and if I decide that it should be (say) 2000 instead, do I need to update the other two values as well? If so you should define it once, label it with a suitable name describing what it is (in your example it might be chunk size or something) and wire that value to wherever you need to use it. Where you have a constant 999 you could get that value with a Decrement function, or you could also change your Greater Than function to a Greater or Equal and compare directly with the 1000 value. In this way your initial constant definition will take up more space because of the label, but you'll save space and improve maintainability by wiring that value to wherever you need it rather than placing additional constants.
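LabVIEW is graphical, so here is a text-language analogue of the same advice, in Python with hypothetical names: define the value once under a descriptive name and derive everything else from it, either by switching the comparison from Greater Than to Greater or Equal, or by decrementing the named value instead of typing 999.

```python
CHUNK_SIZE = 1000  # defined once, with a name that says what it is


def is_chunk_complete(count: int, chunk_size: int = CHUNK_SIZE) -> bool:
    """True once `count` items have filled a chunk."""
    # Greater-or-equal against the one named constant replaces `count > 999`.
    return count >= chunk_size


def is_chunk_complete_decrement(count: int, chunk_size: int = CHUNK_SIZE) -> bool:
    """Equivalent for integer counts: derive the 999 from the named constant."""
    return count > chunk_size - 1
```

Changing `CHUNK_SIZE` to 2000 now updates every comparison at once, which is the maintainability gain described above.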
If you need to refer to the same constants in multiple places on your block diagram, you can place the constants (and just the constants, not any other program logic) in a subVI, with each constant wired to an indicator with a suitable label, and each indicator wired to a different output on the connector pane. When you hover the wiring tool over the SubVI's terminals you'll see the label in the tip-strip. Alternatively, especially if you need loads of different constant values, you can do the same thing but in your SubVI bundle the different constants into a named cluster (which you save as a typedef), and then use Unbundle by Name to access specific constant values from the cluster where you need them. Again this doesn't necessarily save block diagram space, but it does make your code more readable and maintainable.
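The typedef'd-cluster idea has a close text-language analogue: an immutable record whose fields are read by name, much like Unbundle by Name. A sketch in Python with a frozen dataclass; the field names and values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AcquisitionConstants:
    """One named, read-only bundle of constants (the analogue of a typedef'd cluster)."""
    chunk_size: int = 1000
    sample_rate_hz: float = 10_000.0
    timeout_ms: int = 5_000


# Single source of truth, created once (the analogue of the constants subVI).
CONSTANTS = AcquisitionConstants()


def chunk_duration_s(constants: AcquisitionConstants = CONSTANTS) -> float:
    # Fields are read by name, like Unbundle by Name, so the code documents itself.
    return constants.chunk_size / constants.sample_rate_hz
```

Because the dataclass is frozen, an accidental assignment such as `CONSTANTS.chunk_size = 1` raises an error instead of silently changing a shared constant.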
Simple answer was to reorganize my block diagram, making more space for the constants. Dave_St suggested creating subVIs for the case structures, for anyone looking for alternatives. Wanted to mark this as resolved regardless.

Define Model Parameter as Variable

I am attempting to define the parameter of a model (block) as a variable. For example:
Real WallThickness = 0.5;
Real WallConductance = 10*WallThickness;
Modelica.Thermal.HeatTransfer.Components.ThermalConductor TopPanelConductor(G=WallConductance);
I would like to define "G" so that it remains constant throughout the simulation but the coefficient is updated prior to the simulation based on the other variable "WallThickness". When defining the ThermalConductor parameter "G" as a variable in the model, which is being calculated elsewhere, I get the error message:
The variability of the definition equation:
TopPanelConductor.G = WallConductance;
is higher than the declared variability of the variables.
I would like to define the parameters of a model as variables. This allows me to create parametric definitions as the geometry of the wall changes. Is there a way I can make this definition work?
You mean the geometry changes during simulation? If so, you'll have to rewrite the ThermalConductor model to work with a variable G, because a variable cannot be assigned to a parameter. A variable may vary during the course of simulation. A parameter is fixed at the start of simulation, but can be changed from run to run without recompiling the model, which allows for quicker iteration/design work.
Note that you can also calculate a parameter from other parameters that you define, e.g. to calculate a heat transfer coefficient from a given wall thickness (which you vary from simulation run to simulation run).
An alternative to re-writing the component models is to make the parameter study/variation outside the simulation model. There are at least three approaches:
Export your system model as an FMU (Co-Simulation). Import it into Python with PyFMI and write for loops that vary the parameter value in each iteration. See for example http://www.jmodelica.org/assimulo_home/pyfmi_1.0/pyfmi.examples.html. This is not as complicated as it might sound.
Make the parameter variation loop in a Modelica Script (mos file). I don't have much experience with this though.
If you are varying geometrical parameters in order to find an optimum of some kind you can use the Optimization Library which is shipped with Dymola (as of version 2017 FD01).
Using one of the above suggestions you can reuse all the components from MSL out of the box.
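The structure of the first approach, a loop outside the model, can be sketched independently of any FMU tooling. Here a hypothetical stand-in function plays the role of one simulation run; with an FMU you would replace it with a function that loads or resets the FMU, sets the parameter and simulates (PyFMI provides `load_fmu` and model methods `set`, `simulate` and `reset` for this; check the exact signatures for your PyFMI version). The conductance formula simply mirrors `WallConductance = 10*WallThickness` from the question.

```python
from typing import Callable, Dict, Iterable


def sweep(run: Callable[[float], float], values: Iterable[float]) -> Dict[float, float]:
    """Call `run` once per parameter value and collect one scalar result per run."""
    return {value: run(value) for value in values}


def steady_heat_flow(wall_thickness: float, delta_t: float = 20.0) -> float:
    """Hypothetical stand-in for one simulation run with fixed parameters."""
    # G is computed before the "run" and stays constant during it,
    # which is exactly parameter variability in Modelica terms.
    wall_conductance = 10.0 * wall_thickness  # mirrors WallConductance = 10*WallThickness
    return wall_conductance * delta_t         # Q_flow = G * dT, as in a ThermalConductor


print(sweep(steady_heat_flow, [0.5, 1.0, 1.5]))
```

Keeping the loop outside the model is what lets the standard MSL components stay untouched: each run sees G as an ordinary parameter.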
Best regards,
Rene Just Nielsen
There is a hierarchy of variability for variables/parameters that restricts their use. As you are now aware, parameters are not permitted to vary within a simulation. Thus, you get the error stating that you are trying to define a parameter with a variable value or input variable.
If you need that functionality I would recommend duplicating the ThermalConductor and change the variable type:
parameter Modelica.SIunits.ThermalConductance G
"Constant thermal conductance of material";
to
input Modelica.SIunits.ThermalConductance G
"Constant thermal conductance of material" annotation (Dialog(group="Input Variables"));
That's all there is to it. Note the additional annotation on the input variable: by default, inputs do not show up in the parameter dialog, and the annotation makes them appear there just like parameters (be careful to clearly label it as an input variable rather than a parameter, though!).
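What turning G from a parameter into an input buys you can be seen in a small explicit-Euler sketch, in Python rather than Modelica: a lumped heat capacity losing heat through a conductor with Q_flow = G·(T − T_amb), where G is re-evaluated at every step, as an input would be, instead of being fixed before the run. The capacity, temperatures and G(t) profiles are made-up values.

```python
from typing import Callable


def simulate_cooling(g_of_t: Callable[[float], float], capacity: float = 1000.0,
                     temp0: float = 100.0, temp_amb: float = 20.0,
                     dt: float = 0.01, t_end: float = 100.0) -> float:
    """Explicit Euler for capacity*dT/dt = -G(t)*(T - T_amb); returns the final temperature."""
    temp = temp0
    t = 0.0
    for _ in range(int(round(t_end / dt))):
        q_flow = g_of_t(t) * (temp - temp_amb)  # same law as ThermalConductor: Q_flow = G*dT
        temp -= dt * q_flow / capacity
        t += dt
    return temp


constant_g = simulate_cooling(lambda t: 10.0)                      # G behaves like a parameter
switched_g = simulate_cooling(lambda t: 0.0 if t < 50.0 else 10.0)  # G changes during the run
print(constant_g, switched_g)
```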
There is work underway that completely redoes the Thermal library, but it is not yet released, so the most straightforward approach is probably to try what I have described here.

PID filter coefficient output minimum, maximum and parameter attributes

I am trying to find more information on making a custom PID block in MATLAB. I have most of it done but there are a few parameters that I don't really understand and as such I don't know what value to give them. NOTE I am NOT asking for help tuning PID gains.
They are all inside the filter coefficient block:
When I open the block I have to set a few parameters (output min/max, data type, parameter min/max, etc.). Can someone explain to me what these mean? I can't find good resources anywhere. The only thing that I've tried which works is setting each to [] (i.e. -inf) and the input/output data types to 'Inherit: Inherit via internal rule' but then my output goes to hell. If I copy paste the blocks from the PID block there are a bunch of variables which I haven't defined anywhere so the program won't even compile.
Can someone point out some good resources for this or else explain it? Thanks!
You should get your blocks from the standard Simulink library, not from under the PID block mask. The ones under the mask have been set-up to use variables that are passed from/through the mask, which you are not doing.
The block you have circled is just a gain block (from the Math library).
You most likely won't need to make any changes to the default settings of the block other than the gain value (which needs to be the N that you want to use in the approximation of the derivative term in your controller).
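For orientation, the derivative path that the filter coefficient belongs to is D·N/(1 + N/s) = D·N·s/(s + N): the gain N sets the bandwidth of a first-order low-pass applied to the derivative. As far as I understand the block diagram under the PID mask, this is built as the gain N inside a feedback loop around an integrator. The Python sketch below simulates that same structure with forward Euler (the step size and gains are arbitrary assumptions) and shows that a ramp input settles at D times the slope, while a constant input gives a derivative that decays to zero.

```python
from typing import Callable


def filtered_derivative(error: Callable[[float], float], d_gain: float, n: float,
                        dt: float, t_end: float) -> float:
    """Forward-Euler simulation of D*N/(1 + N/s): gain N in a feedback loop around an integrator."""
    x = 0.0  # state of the feedback integrator
    u = 0.0  # derivative-path output
    t = 0.0
    for _ in range(int(round(t_end / dt))):
        u = n * (d_gain * error(t) - x)  # filter coefficient N acting on D*e minus the fed-back state
        x += dt * u                      # integrator: dx/dt = u
        t += dt
    return u


# Ramp e(t) = t with D = 2, N = 100: the filtered derivative approaches 2.
print(filtered_derivative(lambda t: t, d_gain=2.0, n=100.0, dt=1e-4, t_end=1.0))
```

Forward Euler stays stable here only because dt·N is well below 2; a larger N means a faster filter and needs a smaller step.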
To answer your specific question about what the parameters are, some of them are used to specify data types (if you don't want to use the default double precision), some are only used in code generation, some others only for other specific tasks.
All of them are described (in more, or sometimes less, detail) in the documentation for the block, which you can open by pressing the Help button in the block's dialog.