How to perform simple linear regression with Orange

I'm learning Orange and I want to perform a super simple task: simple linear regression with made-up data points. I want to start from scratch, using data generated with the Paint Data tool. This is the workflow I have created.
What's wrong with this approach? Why do I get the error? It must somehow be related to the use of the Paint Data tool, which perhaps requires some sort of processing? I inspected the data with Scatter Plot and it looks as expected.

Reading the error message could help. :) It says "Training data requires a target variable". In more statistical terms, Paint Data outputs two independent variables; although they are named x and y, and y is the name we often use for the dependent variable, Test and Score won't do the guesswork.
Insert a "Select Columns" widget between Paint Data and Test and Score, and drag "y" to "Target variable".

Related

How to run parameter optimization for my model using the GPU

Essentially, I have a model that, given a set of parameters, is able to calculate different thermodynamic properties for different compounds, let's say liquid densities and vapor pressures for example.
When I want to regress the model parameters (e.g. a,b,c,d,e) by fitting data for different compounds, I usually do a lot of sequential operations whose efficiency I am sure I can easily improve. I am thinking about multiobjective optimization or, even better, using the GPU or multiple CPU cores. But I am a bit lost on where to start from reading just the documentation.
So within my objective function I usually have something like:
function [fval] = objective_function(a,b,c,d,e)
input_comp1=f(constants,a,b,c,d,e)
input_comp2=f(constants,a,b,c,d,e)
exp_density1=load some text file or so.
exp_density2=load some text file or so.
exp_vaporpressure1=load some text file or so.
exp_vaporpressure2=load some text file or so.
densities_1=density(a,b,c,d,e,input_comp1)
vapor_pressures1=vapor_pressures(a,b,c,d,e,input_comp1)
densities_2=density(a,b,c,d,e,input_comp2)
vapor_pressures2=vapor_pressures(a,b,c,d,e,input_comp2)
ARD_d1=expression for deviations between experimental and calculated values for density of comp.1
ARD_d2=...
ARD_p1=...
ARD_p2=...
fval= ARD_d1+ARD_d2+ARD_p1+ARD_p2
This is then minimized by something like fminsearch; I have also used other solvers in the past, but fminsearch worked best for me. When I do this for just one compound it works fast enough for my purpose (and I am a patient man). But now I've extended the model in a way that requires regressing parameters simultaneously from more than one compound, and it becomes impossibly slow.
I am quite sure this way of doing the calculations can be improved, because I can run the calculations for the different compounds simultaneously instead of sequentially, and then evaluate fval when the calculations for all compounds are done. But how?
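One way to restructure it, sketched here in Python (scipy's Nelder-Mead is the analogue of fminsearch; the toy model and data are stand-ins for the real density/vapor-pressure routines and text files):

import numpy as np
from multiprocessing import Pool
from scipy.optimize import minimize

# Toy experimental values, standing in for the loaded text files
EXP = {"comp1": 1.0, "comp2": 2.0}

def compound_deviation(task):
    # One compound's contribution to fval; the calculation is a placeholder
    # for the real density/vapor-pressure model
    params, name, exp_value = task
    a, b, c = params
    calc = a * np.exp(-b) + c * len(name)
    return abs((calc - exp_value) / exp_value)

def objective(params, pool):
    # All compounds are evaluated simultaneously, then summed into fval
    tasks = [(params, name, exp) for name, exp in EXP.items()]
    return sum(pool.map(compound_deviation, tasks))

if __name__ == "__main__":
    with Pool() as pool:
        res = minimize(objective, x0=[1.0, 0.5, 0.1], args=(pool,),
                       method="Nelder-Mead")
    print(res.x, res.fun)

In MATLAB itself, the same restructuring maps onto parfor or parfeval from the Parallel Computing Toolbox: compute each compound's deviation in parallel, then sum them into fval.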

How to plot all the streamlines in ParaView?

I am simulating the lid-driven cavity case and I am trying to get all the streamlines with the Stream Tracer filter of ParaView, but I only get the ones that intersect the reference line, and because of that there are vortices that are not visible. How can I see all the streamlines in the domain?
Thanks a lot in advance.
To add a little bit to Mathieu's answer, if you really want streamlines everywhere, then you can create a Stream Tracer With Custom Source (as Mathieu suggested) and set your data to both the Input and the Seed Source. That will create a streamline originating from every point in your dataset, which is pretty much what you asked for.
However, while you can do this, you will probably not be happy with the results. First of all, unless your data is trivially small, this will take a long time to compute and create a large amount of data. Even worse, the result will be so dense that you won't be able to see anything. You will get all those interesting streamlines through vortices, but they will be completely hidden by all the boring streamlines around them.
Thus, you are better off with trying to derive a data set that contains seed points that are likely to trace a stream through the vortices that you are interested in. One thing you might want to try is to compute the vorticity of your vector field (Gradient Of Unstructured Data Set when turning on advanced option Compute Vorticity), find the magnitude of that (Calculator), and then use the Threshold filter to pull out the cells with large vorticity. Then use that as your Seed Source.
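Scripted in pvpython, that pipeline might look roughly like this (a sketch: filter and property names shift between ParaView versions, e.g. Threshold's range properties, and the file name and threshold value are placeholders to tune):

from paraview.simple import *

data = OpenDataFile("cavity.foam")  # placeholder for your dataset

# Vorticity magnitude -> threshold -> use the surviving cells as seeds
grad = GradientOfUnstructuredDataSet(Input=data)
grad.ComputeVorticity = 1

mag = Calculator(Input=grad)
mag.Function = "mag(Vorticity)"
mag.ResultArrayName = "VortMag"

seeds = Threshold(Input=mag)
seeds.Scalars = ["POINTS", "VortMag"]
seeds.ThresholdRange = [10.0, 1e30]  # keep only high-vorticity regions

tracer = StreamTracerWithCustomSource(Input=data, SeedSource=seeds)
Show(tracer)
Render()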
Another (probably better) option if your data is 2D or you can extract an interesting surface along the flow of your data is to use the Surface LIC plugin. Details can be found at https://www.paraview.org/Wiki/ParaView/Line_Integral_Convolution.
You have to choose a representative seed source for your streamlines.
You could use a "Sphere Source" in the Stream Tracer properties.
If that fails, you can use a Stream Tracer With Custom Source and supply your own seed source, which you will have to create yourself first.

How to test the CNTK object detection example on a custom image?

I am trying to run the CNTK object detection example with the Pascal VOC pretrained dataset. I ran all the required scripts in fastrcnn and got the visual output for the test data defined in the dataset. Now I want to test the network on my own image; how can I do that?
For Fast R-CNN you need a library that generates candidate ROIs (regions of interest) for your test images, e.g. selective search.
If you want to evaluate a batch of images you can follow the description in the tutorial to generate the test mapping file and the ROI coordinates (see test.txt and test.rois.txt in the corresponding proc subfolder). If you want to evaluate a single image you would need to pass the image and the candidate ROI coordinates as inputs to cntk eval, similar to this example:
# compute model output
arguments = {loaded_model.arguments[0]: [hwc_format]}
output = loaded_model.eval(arguments)
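For a single image, the preprocessing that produces something like hwc_format might look roughly as follows (a sketch: the model path, image size, and BGR/CHW layout are assumptions to check against the tutorial, and for Fast R-CNN the candidate ROI coordinates still need to be passed as a second input as described above):

import numpy as np
from PIL import Image
from cntk import load_model

loaded_model = load_model("fastrcnn.model")  # placeholder path

img = Image.open("my_image.jpg").convert("RGB").resize((850, 850))
rgb = np.asarray(img, dtype=np.float32)
bgr = rgb[..., ::-1]                             # CNTK examples use BGR order
chw = np.ascontiguousarray(np.rollaxis(bgr, 2))  # HWC -> CHW

arguments = {loaded_model.arguments[0]: [chw]}
output = loaded_model.eval(arguments)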
For Fast R-CNN you need to first run your custom image through the Selective Search algorithm to generate ROIs (regions of interest) and then feed it to your model with something like this:
output = frcn_eval.eval({image_input: image_file, roi_proposals: roi_proposals})
You can find more details here: https://github.com/Microsoft/CNTK/tree/release/latest/Examples/Image/Detection/FastRCNN
Anyway, Fast R-CNN is not the most efficient way to do it because of its use of Selective Search, which is the real bottleneck here. If you want to improve performance you can try Faster R-CNN, as it gets rid of the Selective Search algorithm and replaces it with a Region Proposal Network, which performs much, much better.
If you're interested, you can check my repo on GitHub: https://github.com/karolzak/CNTK-Hotel-pictures-classificator

Gaussian Mixture Modelling in MATLAB

I'm using a Gaussian mixture model to estimate the log-likelihood function (the parameters are estimated by the EM algorithm), in MATLAB. My data is of size 17991402*1, i.e. 17991402 data points of one dimension.
When I run gmdistribution.fit(X,2) I get the desired output.
But when I run gmdistribution.fit(X,k) for k>2, the code crashes with an "OUT OF MEMORY" error. I have also tried an open-source implementation, which gives me the same problem. Can someone help me out here? I'm basically looking for code that will allow me to use different numbers of components on such a large dataset.
Thanks!
Is it possible for you to decrease the number of iterations? The default is 100.
OPTIONS = statset('MaxIter',50,'Display','final','TolFun',1e-6)
gmdistribution.fit(X,3,OPTIONS)
Or you may consider under-sampling the original data.
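To illustrate the under-sampling idea (sketched here in Python with scikit-learn rather than MATLAB; in MATLAB, indexing X with randperm before calling gmdistribution.fit achieves the same):

import numpy as np
from sklearn.mixture import GaussianMixture

X = np.random.randn(17991402, 1)  # stand-in for the real 17991402x1 data
idx = np.random.choice(len(X), size=200000, replace=False)  # under-sample

gmm = GaussianMixture(n_components=3, max_iter=50, tol=1e-6).fit(X[idx])
print(gmm.weights_, gmm.means_.ravel())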
A general solution to the out-of-memory problem is described in this document.

Training a neural network

I have a picture of 1200*1175 pixels. I want to train a net (MLP or Hopfield) to learn a specific part of it (201*111 pixels) and save its weights, so that a new net (with the same features) can find that specific part without being trained again. Now there are these questions: What kind of net is useful, MLP or Hopfield? If MLP, how many hidden layers? The trainlm function is unusable because of an "out of memory" error. I converted the picture to a binary image; is that useful?
What exactly do you need the solution to do? Find an object within an image (like "Where's Waldo")? Will the target object always be the same size and orientation? Might it look different because of lighting changes?
If you just need to find a fixed pattern of pixels within a larger image, I suggest using a straightforward correlation measure, such as normalized cross-correlation, to find it efficiently.
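For instance, with OpenCV's template matching (a Python sketch; MATLAB's normxcorr2 does the same job, and the file names are placeholders):

import cv2

image = cv2.imread("picture.png", cv2.IMREAD_GRAYSCALE)  # the 1200x1175 image
patch = cv2.imread("patch.png", cv2.IMREAD_GRAYSCALE)    # the 201x111 part

# Normalized cross-correlation; the peak marks the best match location
scores = cv2.matchTemplate(image, patch, cv2.TM_CCOEFF_NORMED)
_, best_score, _, top_left = cv2.minMaxLoc(scores)
print("best match %.3f at %s" % (best_score, top_left))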
If you need to contend with any of the issues mentioned above, then there are two basic solutions: 1. build a model using examples of the object in different poses, scalings, etc., so that the model will recognize any of them, or 2. develop a way to normalize the patch of pixels being examined to minimize the effect of those distortions (like Hu's invariant moments). If nothing else, you'll want to perform some sort of data reduction to get the number of inputs down. Technically, you could also try a model which is invariant to rotations, etc., but I don't know how well those work. I suspect they are more temperamental than traditional approaches.
I found AdaBoost to be helpful in picking out only the important bits of an image. That, plus resizing the image to something very tiny (like 40x30) using a Gaussian filter, will speed things up and put weight on larger areas of the photo rather than on tiny, insignificant pixels.