Adding Input Operator Dynamically to a running Apache Apex application

Is it possible to add an input operator for a different source to a running Apex application?
For example: in a production environment, I am running an Apex application that reads text files from an input source, and I want to add a Kafka source with its input operator to the same DAG.

Priyanshu,
You can have multiple input operators. Just add a Kafka input operator to your DAG.
http://docs.datatorrent.com/library_operators/
Amol
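
For illustration, a rough sketch of what that could look like in populateDAG, using the Kafka input operator from the Apex Malhar library. The operator class, setter names, and port names are assumptions that depend on your Malhar version, and the broker, topic, and downstream console sink are placeholders:

    import com.datatorrent.api.{DAG, StreamingApplication}
    import com.datatorrent.lib.io.ConsoleOutputOperator
    import org.apache.apex.malhar.kafka.KafkaSinglePortInputOperator
    import org.apache.hadoop.conf.Configuration

    class FileAndKafkaApp extends StreamingApplication {
      override def populateDAG(dag: DAG, conf: Configuration): Unit = {
        // ... your existing file input operator and its downstream operators stay as they are ...

        // Add a second input operator that reads from Kafka (Apex Malhar kafka module)
        val kafkaInput = dag.addOperator("KafkaInput", new KafkaSinglePortInputOperator)
        kafkaInput.setClusters("broker-host:9092") // placeholder broker list
        kafkaInput.setTopics("my-topic")           // placeholder topic name

        // Wire the Kafka messages into a downstream operator (a console sink here, purely for illustration)
        val console = dag.addOperator("Console", new ConsoleOutputOperator)
        dag.addStream("kafkaMessages", kafkaInput.outputPort, console.input)
      }
    }

Both the file input operator and the Kafka input operator can live in the same DAG; each source simply gets its own stream into whatever downstream operators you need.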

Related

How to input CSV file data (imported as data sets) to test Drools rules?

I want to test my rules in Drools. I have used Postman and a Java client to test a single row of values, and it shows that my rules work. However, I want to use an entire CSV file and check the rules for each row. Is there any way to achieve this in Drools?
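
One straightforward approach is to read the CSV outside of Drools, turn each row into a fact, insert the facts into a KieSession, and fire the rules. A rough sketch, assuming a classpath KIE container; the session name "ksession-rules", the file name, and the Applicant fact class are placeholders for whatever your project actually defines:

    import org.kie.api.KieServices
    import scala.io.Source

    // Stand-in for the fact class your rules actually match on
    case class Applicant(name: String, age: Int)

    object CsvRuleRunner {
      def main(args: Array[String]): Unit = {
        val kieServices = KieServices.Factory.get()
        val kieContainer = kieServices.getKieClasspathContainer
        // "ksession-rules" is a placeholder: use the session name from your kmodule.xml
        val kieSession = kieContainer.newKieSession("ksession-rules")

        // Read the CSV and insert one fact per row (dropping the header line)
        val rows = Source.fromFile("input.csv").getLines().drop(1)
        rows.foreach { line =>
          val cols = line.split(",").map(_.trim)
          kieSession.insert(Applicant(cols(0), cols(1).toInt))
        }

        // Fire the rules once against all inserted facts
        val fired = kieSession.fireAllRules()
        println(s"Rules fired: $fired")
        kieSession.dispose()
      }
    }

Whether you fire the rules once for all rows or once per row (insert, fire, then retract the fact) depends on whether your rules need to reason across rows.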

Passing a value back to ADF from a Databricks Python node

In Azure Data Factory I run Python code using a Databricks Python node. What I am trying to do is return a value from the Python script (e.g. print(value_to_return)) and use it to set the value of a variable that will be used by another node (Web).
Is there any way of returning a value from a Python script that is run from an ADF pipeline? I know I can do it using the Databricks Notebook node and adding dbutils.notebook.exit() at the end, but I am really trying to achieve this using the Python node.
The recurring solution I see is to write the value to a file or DB table and then read it back and set the variable value that way.
The answer is no. You can only call a Python script; it cannot return a custom value to the output of the Python node in ADF to be used as an input to another node in the pipeline.

Apache Flink dump to multiple files by key (group)

I'm doing some processing on data, and I want to dump the processed data to multiple files, based on the group.
Example of the data:
A,123
B,200
A,400
B,400
So my desired output is:
file 1:
A,123
A,400
file 2:
B,200
B,400
(The number of files is based on the number of groups).
So basically, a simple example for exampleData:
exampleData.groupBy(0).sortGroup(1, Order.ASCENDING)
The type is now GroupedDataSet. I want to output each group to a different CSV file. How can I do this? I tried using reduceGroup so I can work with each group individually, but I couldn't make it work.
I'm using Scala version 2.11.12, and Flink version 1.11.0
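
One approach that works with the DataSet API (which, as far as I know, has no built-in "one file per group" sink) is to collect the distinct keys on the client and define one filtered, sorted sink per key. This only makes sense when the number of groups is small; the sample input, output path, and job name below are placeholders:

    import org.apache.flink.api.scala._
    import org.apache.flink.api.common.operators.Order
    import org.apache.flink.core.fs.FileSystem.WriteMode

    object SplitByKey {
      def main(args: Array[String]): Unit = {
        val env = ExecutionEnvironment.getExecutionEnvironment
        val exampleData: DataSet[(String, Int)] = env.fromElements(
          ("A", 123), ("B", 200), ("A", 400), ("B", 400))

        // Fetch the distinct keys first (fine when the number of groups is small)
        val keys = exampleData.map(_._1).distinct().collect()

        // One sink per key: filter the rows for that key, sort them, write them to their own file
        keys.foreach { k =>
          exampleData
            .filter(_._1 == k)
            .sortPartition(1, Order.ASCENDING).setParallelism(1)
            .writeAsCsv(s"/tmp/out/$k.csv", writeMode = WriteMode.OVERWRITE) // placeholder path
            .setParallelism(1)
        }
        env.execute("split by key")
      }
    }

Each writeAsCsv call becomes its own sink, so the single env.execute() at the end runs all of them in one job; collect() itself triggers a separate small job just to fetch the keys.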

Capture Console output of Spark Action Node in Oozie as variable across the Oozie Workflow

Is there a way to capture the console output of a spark job in Oozie? I want to use the specific printed value in the next action node after the spark job.
I was thinking I could use ${wf:actionData("action-id")["Variable"]}, but it seems that Oozie cannot capture output from a Spark action node. In a Shell action you could just use echo "var=12345" and then invoke wf:actionData to use it as an Oozie variable across the workflow.
I want this because I want to capture the number of records processed, store it as an Oozie variable, and use it in the next action nodes in the workflow, without having to store that data outside the workflow (for example by saving it to a table or setting a system variable from inside the Spark Scala program).
Any help would be thoroughly appreciated since I'm still a novice spark programmer. Thank you very much.
As the Spark action does not support capture-output, you'll have to write the data to a file on HDFS.
This post explains how to do that from Spark.
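For example, a minimal sketch of writing the record count to a small HDFS file from the Spark job, so that a later action (say, a Shell action that reads the file and echoes var=...) can expose it via wf:actionData. The input path, output path, and application name are placeholders:

    import org.apache.hadoop.fs.{FileSystem, Path}
    import org.apache.spark.sql.SparkSession

    object CountToHdfs {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("count-to-hdfs").getOrCreate()

        // Placeholder for whatever your job actually processes
        val processed = spark.read.parquet("/data/input")
        val recordCount = processed.count()

        // Write the count to a small, well-known HDFS file that a later Oozie action can read
        val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
        val out = fs.create(new Path("/user/oozie/wf-output/record_count.txt"), true)
        out.writeBytes(recordCount.toString)
        out.close()

        spark.stop()
      }
    }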

Jmeter - Can I change a variable halfway through runtime?

I am fairly new to JMeter so please bear with me.
I need to understand whether, while running a JMeter script, I can change the variable holding the details of "DB1" so that it then points to "DB2".
The reason for this is that I want to throw load at one MongoDB and then switch to another DB at a certain time (hotdb/colddb).
The easiest way is just defining 2 MongoDB Source Config elements pointing to the separate database instances and giving them 2 different MongoDB Source names.
Then in your script you will be able to manipulate the MongoDB Source parameter value in the MongoDB Script test element or in JSR223 Samplers, so your queries will hit either hotdb or colddb.
See the How to Load Test MongoDB with JMeter article for detailed information.
How about reading the value from a file in a Beanshell/JavaScript sampler on each iteration and storing it in a variable, then editing/saving the file when you want to switch? It's ugly, but it would work.