How to remove instance or row with missing values in Python Script in Orange Data Mining? - orange

I want to remove an instance or row with missing values.
It's easy to do with the Impute widget, but now I want to do it in the Python Script widget.
How do I do this?

Write this in Python Script widget:
import numpy as np
from Orange.preprocess import impute
drop_instances = impute.DropInstances()
var = in_data.domain.attributes[0]  # choose the variable you want to check
mask = drop_instances(in_data, var)
out_data = in_data[np.logical_not(mask)]
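If instead you want to drop every row that has a missing value in any attribute, the same masking idea applies directly to the feature matrix. A minimal NumPy-only sketch (in the Script widget, `in_data.X` holds the features as a NumPy array, and you would index `in_data` with the mask exactly as above):

```python
import numpy as np

# Stand-in for in_data.X; the second row has a missing value
X = np.array([[1.0, 2.0],
              [np.nan, 3.0],
              [4.0, 5.0]])

keep = ~np.isnan(X).any(axis=1)  # True for rows with no missing values
X_clean = X[keep]
print(X_clean.shape)  # (2, 2)
```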
If you need more information, feel free to ask in a comment below!

Related

Paraview - get current time_index in ProgrammableSource

I have an array that has exactly as many rows as there are time steps in the animation. Now I want the row associated with the current time step as a vtkTable output of a ProgrammableSource. My code (for the ProgrammableSource) so far looks like this:
import numpy as np
file = "table.csv"
tbl = np.genfromtxt(file, names=True, delimiter=",", autostrip=True)
for n in tbl.dtype.names:
    v = tbl[n][2]
    output.RowData.append(v, n)
This currently always writes out the third row (v = tbl[n][2]). Is there a way to use the current time step (index) in place of [2]?
The best approach is to make your Programmable Source timestep-aware.
Fill in the Information Script (an advanced property) to declare the list of available timesteps; then, in the main script, get the current timestep and output the corresponding data.
See this example from the doc.
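Once the timesteps are declared, the main script asks the pipeline executive for the requested time and maps it back to a row index. The ParaView-specific calls are shown only as comments (assumptions about your pipeline, untested here); the time-to-index lookup itself is plain NumPy:

```python
import numpy as np

# Timesteps as declared in the Information Script (hypothetical values)
timesteps = np.array([0.0, 0.5, 1.0, 1.5, 2.0])

def row_for_time(t, timesteps):
    """Index of the timestep closest to the requested time t."""
    return int(np.argmin(np.abs(timesteps - t)))

# In the main script of the Programmable Source you would obtain t with:
#   executive = self.GetExecutive()
#   outInfo = executive.GetOutputInformation(0)
#   t = outInfo.Get(executive.UPDATE_TIME_STEP())
# and then use tbl[n][row_for_time(t, timesteps)] instead of tbl[n][2].
print(row_for_time(1.1, timesteps))  # closest timestep is 1.0 -> index 2
```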

selecting a range of colums in SKlearn column transformer

I am encoding categorical data; many columns need to be selected. I have typed them in individually and it works, but there is obviously a more elegant way.
dataset =pd.read_csv('train.csv')
x = dataset.iloc[:,:-1].values
y = dataset.iloc[:, -1].values
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(),[2,5,6,7,8,9,10,11,12,13,14,15,16,21,22,23,24,25,27,28,29,30,31,32,33,34,35,39,40,41,42,53,54,55,56,57,58,60,63,64,65,72,73,74,78,79])], remainder='passthrough')
x = np.array(ct.fit_transform(x))
I have tried using (23:34) and I have tried using slice, but that does not work as the data is not that type.
Which method should I use for selecting a range of columns?
Also, what datatype is the data at the point where I am selecting the columns?
I searched but was not able to find a solution for this exact question.
Finally, is this an efficient way to encode categorical data, or should I be looking at an alternative method?
Thanks!
You can use the following workaround (note it requires importing OrdinalEncoder, and selects a contiguous block of columns by label via pandas):
from sklearn.preprocessing import OrdinalEncoder

ct = ColumnTransformer(
    transformers=[
        ("ordinal_enc", OrdinalEncoder(), data.loc[:, "col1":"col100"].columns)
    ])
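If you want to keep the original OneHotEncoder call with integer positions, you can also build the index list from ranges instead of typing every column. A small sketch; the exact ranges below are illustrative, not copied from the question, and remember that range's end is exclusive:

```python
# Concatenate ranges (end-exclusive) and stragglers into one list of column indices
cols = list(range(2, 17)) + list(range(21, 36)) + [39, 40, 41, 42]

# These indices can then be passed to the transformer, e.g.:
# ct = ColumnTransformer(
#     transformers=[('encoder', OneHotEncoder(), cols)],
#     remainder='passthrough')
print(cols[:5])  # [2, 3, 4, 5, 6]
```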

How to writeback to dataframe using transform_df in palantir foundry?

I created a library for updating the column descriptions of an input dataset. The function takes three parameters (input_dataset, output_dataset, config file) and writes the descriptions back to the output dataset. We now want to import this library across various use cases. How do I handle cases where the Spark transformation takes its inputs through transform_df, since there we can't assign to an output variable? How can I call my description library function in that situation in Palantir Foundry? Any suggestions?
This isn't currently supported with the @transform_df decorator; you'll have to use the @transform decorator for now.
The reasoning: the @transform decorator already allows broader access to metadata APIs, so it seemed more in line with that pattern to keep this capability there, since @transform_df is inherently higher-level.
You can always simply move over your transformations from...
from transforms.api import transform_df, Input, Output

@transform_df(
    Output("/my/output"),
    my_input=Input("/my/input"),
)
def my_compute_function(my_input):
    df = my_input
    # ... logic ...
    return df
...to...
from transforms.api import transform, Input, Output

@transform(
    my_output=Output("/my/output"),
    my_input=Input("/my/input")
)
def my_compute_function(my_input, my_output):
    df = my_input.dataframe()
    # ... logic ...
    my_output.write_dataframe(df)
...in which only 6 lines of code need be changed.

In Spyder, is there some way to plot a dataframe by right-clicking it like you can with arrays?

I've just recently discovered that you can right-click an array in Spyder and get a quick plot of the data. With sample data like this:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Some numbers in a data frame
nsample = 440
x1 = np.linspace(0, 100, nsample)
y = np.sin(x1)
dates = pd.date_range('2016-01-01', periods=nsample).tolist()  # pd.datetime is deprecated
df = pd.DataFrame({'dates':dates, 'x1':x1, 'y':y})
df = df.set_index(['dates'])
df.index = pd.to_datetime(df.index)
you can go to the Variable explorer, right-click y, and get a plot directly in the console.
The same option does not seem to be available for a pandas dataframe:
Sure, you could easily go for df.plot():
But I really like the right-click option to check whether the variables and dataframes look the way I expect them to when I'm messing around with a lot of data. So, is there any library I'd have to import? Or maybe something in the settings? I've also noticed that what happens in the console is this little piece of magic: %varexp --plot y, but I can't seem to find an equivalent for data frames.
Thank you for any suggestions!
(Spyder developer here) This is just a bit of missing functionality for Dataframes, but it's very easy to implement.
Please open an issue in our issue tracker, so we don't forget to do it in a future release.

How to use python-defined variables in javascript code within ipython notebook?

Say I made complex numeric calculations with Scipy factories, using the Ipython notebook. Now, I want to call variables resulting from calculations with Scipy from code in Javascript (still within IPYNB).
Below is a simplistic illustration of what I am willing to accomplish:
# Get a vector of 4 normal random numbers using numpy - the variable 'rnd'
import numpy as np
mu, sig = 0.05, 0.2
rnd = np.random.normal(loc=mu, scale=sig, size=4)
Now, I want to use the variable rnd above in Javascript, for illustrative purpose:
%%javascript
element.append(rnd);
The lines above returns a message error: ReferenceError: rnd is not defined.
Then, how can one use a python variable in javascript code within the Ipython Notebook?
It may not be possible to do this with the %%javascript cell magic. However, you can use the IPython.display.Javascript(...) method to inject Python strings into the browser output area. Here's a modification of your code fragment that seems to answer your question.
from IPython.display import Javascript
import numpy as np
mu, sig = 0.05, 0.2
rnd = np.random.normal(loc=mu, scale=sig, size=4)
## Now, I want to use the variable rnd above in Javascript, for illustrative purpose:
javascript = 'element.append("{}");'.format(str(rnd))
Javascript(javascript)
Paste this code into an input cell and each time you execute the cell a new and different array of
random numbers will be displayed in the output cell.
(Code was tested with IPython version 2.2)
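A variation on the approach above, in case you want the injected string to be valid JavaScript rather than a printed NumPy repr: serialize the list with json.dumps, which produces a JavaScript array literal. (element.append targets the notebook's output area, as in the answer above; the rest is an illustrative sketch.)

```python
import json
import numpy as np

mu, sig = 0.05, 0.2
rnd = np.random.normal(loc=mu, scale=sig, size=4)

# json.dumps of a Python list is a valid JS array literal, e.g. [0.12, -0.03, ...]
payload = json.dumps(rnd.tolist())
javascript = 'element.append({});'.format(payload)
print(javascript)
# then display it with: from IPython.display import Javascript; Javascript(javascript)
```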
Arbitrary Python (including retrieving the value of variables) can be executed from the JavaScript side of things in IPython, although it is a bit messy. The following code works for me in IPython 3.1 and Python 2.7:
%%javascript
IPython.notebook.kernel.execute(
    "<PYTHON CODE TO EXECUTE HERE>",
    {
        iopub: {
            output: function(response) {
                // Print the return value of the Python code to the console
                console.log(response.content.data["text/plain"]);
            }
        }
    },
    {
        silent: false,
        store_history: false,
        stop_on_error: true
    }
);