Creating Custom Jupyter Widgets

I'm trying to create a custom Jupyter widget that takes a pandas.DataFrame as input and simply renders a modified HTML version of the DataFrame as output. I'm stuck at the very start: defining a DataFrame as the input for the widget.
I have tried to follow the online examples, and I think I would be fine with most string inputs to a widget, but I'm lost when trying a DataFrame as an input.
I'd just like to be able to pass a DataFrame into my custom widget and validate that it is a DataFrame.

You can do this using jp_proxy_widget. In fact it is almost implemented in this notebook:
https://nbviewer.jupyter.org/github/AaronWatters/jp_doodle/blob/master/notebooks/misc/In%20place%20html%20table%20update%20demo.ipynb
The implementation is more complex than you requested because it supports
in-place updates of the table.
Please see https://github.com/AaronWatters/jp_proxy_widget
The example notebook is from https://github.com/AaronWatters/jp_doodle
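If the goal is just validating the input and rendering customized HTML (without the in-place-update machinery from the notebook above), a minimal sketch using pandas plus Jupyter's `_repr_html_` display hook can also work. The class name `DataFrameHTML` and the CSS class are my own illustrative choices, not from any library:

```python
import pandas as pd

class DataFrameHTML:
    """Minimal display object: validates a DataFrame and renders it as HTML."""

    def __init__(self, df):
        # Validate the input up front, as asked in the question.
        if not isinstance(df, pd.DataFrame):
            raise TypeError(f"expected a pandas.DataFrame, got {type(df).__name__}")
        self.df = df

    def _repr_html_(self):
        # Jupyter calls _repr_html_ automatically when the object is the last
        # expression in a cell, so returning modified HTML here is enough.
        return self.df.to_html(classes="my-custom-table", border=0)

# Usage in a notebook cell: the object renders as a styled table.
widget = DataFrameHTML(pd.DataFrame({"a": [1, 2], "b": [3, 4]}))
```

For two-way communication (JavaScript back to Python) you would still need a real widget class built on traitlets/ipywidgets, but for one-way rendering this keeps the dependency surface small.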

Related

How to update dataframe without using loop

I have two source dataframes:
Storeorder: {columns=Store, Type_of_carriers, No_of_carriers, Total_space_required}
Fleetplanner: {columns=Store, Truck_Type, Truck_space, Route}
The requirement is:
Create a list with {Store, Type_of_carriers, No_of_carriers, Route}
In the Fleetplanner data, one Store can have more than one Truck_Type and
Route. Also, one Route can have multiple Stores or stops associated with it.
Each time I take a record from Storeorder, I have to assign how many carriers will go to which route.
At the same time, I have to update the Fleetplanner data with the space left for the next stores.
I've done this in pandas using a loop, and it takes a huge amount of time.
Can anyone please suggest an alternate way to solve this problem in Spark?
I've solved the problem using pandas, but I want to parallelize it in Spark.

Custom UDAF not working (KSQL: Confluent)

I am facing issues while creating a custom UDAF in KSQL. The use case is to find the "first" and "last" value of a column in a tumbling window. There is no such built-in UDAF (https://docs.confluent.io/current/ksql/docs/syntax-reference.html#aggregate-functions), so I am trying to create a custom one.
I performed the following steps based on this document: https://www.confluent.io/blog/write-user-defined-function-udf-ksql/
i. Created the UDAF and AggregateFunctionFactory and registered it in FunctionRegistry as follows:
addAggregateFunctionFactory(new MyAggFunctionFactory());
ii. Built the ksql-engine jar and replaced the existing one in the Confluent package at the following path: $CONFLUENT_HOME/share/java/ksql.
iii. Restarted ksql-server.
However, it seems that the function is not registered. Any suggestions?
Confluent Version: 4.1.0
Note: I tried creating a simple UDF, and that works well. The issue is only with the UDAF.
The issue was that I had named the function 'First', which seems to be a reserved keyword. After changing the function name, it worked.

How to achieve an output from an input in the DataStage tool

I have an input file with data
GGN,IBM
BNGLR, IBM
GGN,HCL
NOIDA,HCL
BNGLR,HCL
I want output like
IBM,GGN,BNGLR
HCL,GGN,NOIDA,BNGLR
using the DataStage tool.
Thanks in advance.
You've not given us many details to work with, so I'm making a few assumptions here about the job type you're using (server/parallel) and your DataStage version. In the job design I've named the first of your columns "Value" and the second "Key".
Here is a basic job design, notice the partitioning: Job design image
Here is the first transformer setup. I know it's inefficient to add a second transformer just for a trim, but a limitation of the LastRowInGroup() function is that it can only accept columns as parameters, so any transforms to the column it uses must be done before it's passed into the function: first transformer image
Here is the second transformer setup. The stage variable order matters; don't forget the constraint: Second transformer image
In the second transformer, be sure to set the partitioning and constraint as detailed in the picture: second transformer properties image
Your output data will look like this: output stage data image
Hope that helps and is clear; look through the images closely. I'm using them because they say more than words.
Regards,
Sam Gribble
#InforgeAcademy
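For comparison outside DataStage, the same key-wise concatenation (including the trim that handles the stray space in "BNGLR, IBM") can be sketched in a few lines of plain Python:

```python
# Input rows as they appear in the question; note the stray space in "BNGLR, IBM".
lines = ["GGN,IBM", "BNGLR, IBM", "GGN,HCL", "NOIDA,HCL", "BNGLR,HCL"]

groups = {}  # dicts preserve insertion order, so keys keep first-seen order
for line in lines:
    value, key = (field.strip() for field in line.split(","))  # trim whitespace
    groups.setdefault(key, []).append(value)

# One output line per key: the key first, then its values in arrival order.
output = [",".join([key] + values) for key, values in groups.items()]
# output == ["IBM,GGN,BNGLR", "HCL,GGN,NOIDA,BNGLR"]
```

This mirrors what the two transformers do: the first handles the trim, the second groups by key and concatenates.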

Input two datasets into Orange Python Script widget?

Is there a way to input two data tables (test & train sets) into Orange3's Python Script widget?
Yes. Since Orange 3.6.0, the Python Script widget accepts multiple inputs of the same type: connect multiple data sources to the widget's "Data" input and access them through the "in_datas" variable (a list of data tables).
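Inside the widget the unpacking is ordinary Python. A sketch, with plain stand-in objects in place of the Orange.data.Table instances that Orange actually binds to `in_datas`:

```python
# Stand-ins for the two tables delivered on the "Data" input; inside the real
# Python Script widget, Orange populates `in_datas` for you.
in_datas = ["train_table", "test_table"]

# Pick the two tables out of the list (verify which is which by inspecting them).
train, test = in_datas[0], in_datas[1]

# Whatever is assigned to out_data is sent onward on the widget's output.
out_data = train
```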

Eclipse IDE: How to create a view of a subset of my code's variables when debugging?

I'm using Eclipse (Neon.3 Release (4.6.3)) with PyDev plugin for Python.
The code I'm debugging has a large number of variables, many of which are nested within other variables. I'd like to select a subset of these variables to be included in a separate view so I can bypass having to drill down into the variables at each step, which is often a tedious process.
The primary data structure being used is a pandas DataFrame containing numerous columns, and I typically need to see only a small portion of the values from a few of the DataFrame's columns.
For example, let's say I have a DataFrame 'df' with a column named 'X'. Whenever I debug this code I want to see the values of df.X between indices i and j (i.e. df.X[i:j+1]). i and j may change from time to time as they're also variables in the code but not in 'df'. So how can I create a streamlined tab/view of variables that only includes df.X._values[i:j+1], preferably separate from the standard Variables view?
Thanks in advance for any suggestions or feedback in general.
This can be done using the 'Expressions' view within the Debug perspective.
For the example in the question above, I can add the following expression to see only what I want:
list(df.X._values[i:j+1])
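The expression can be sanity-checked in a plain Python session first. Note that `_values` is a private pandas attribute; the `iloc`-based form below is a public-API equivalent that is less likely to break across pandas versions (the sample data here is illustrative):

```python
import pandas as pd

df = pd.DataFrame({"X": [10, 20, 30, 40, 50]})
i, j = 1, 3  # watch indices i through j inclusive

# The watch expression from the answer above:
watch = list(df.X._values[i:j+1])

# Public-API equivalent using positional slicing:
watch_public = df["X"].iloc[i:j+1].tolist()

assert watch == watch_public == [20, 30, 40]
```

Either form can be pasted into the Expressions view; it re-evaluates the expression at each step, so the window tracks i and j as they change.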