Conditional routing in Apache NiFi

I'm using NiFi to get data from an Oracle database and put some of this data into Kafka (using the PutKafka processor).
Example: only if the attribute "id" contains "aaabb" should the data go to Kafka.
Is that possible in Apache NiFi? How can I do it?

This should definitely be possible, the flow might be something like this...
1) ExecuteSQL or QueryDatabaseTable to get the data from the database; both produce Avro
2) ConvertAvroToJSON to convert the Avro to JSON
3) EvaluateJsonPath to extract the id field into an attribute
4) RouteOnAttribute to route flow files where the id attribute contains "aaabb"
5) PutKafka to deliver any of the matching results from RouteOnAttribute
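For steps 3 and 4 the configuration is just a couple of dynamic properties. A rough sketch (the property names and the JSONPath below are examples; adjust them to your actual JSON structure):

EvaluateJsonPath (Destination = flowfile-attribute), add a dynamic property:
    id = $.id

RouteOnAttribute (Routing Strategy = Route to Property name), add a dynamic property:
    matched = ${id:contains('aaabb')}

Then connect the "matched" relationship of RouteOnAttribute to PutKafka and handle (or auto-terminate) "unmatched".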

To add on to Bryan's example flow, I wanted to point you to some great documentation that should help introduce you to Apache NiFi.
Firstly, I would suggest checking out the NiFi documentation. It is very good and should help a lot. In addition to providing details on each of the processors Bryan mentioned, it also has general documentation for every type of user.
For a basic introduction to build a NiFi flow check out this video.
For example templates, check out this repo. It has an Excel file at its root level with a description and a list of processors for each template.

Related

What are the advantages of Confluent's Kafka Avro serializers?

I can't seem to find it clearly stated in the docs: what are the advantages of using AvroKafkaSerializer (which has schema support) versus serializing the objects "manually" in code and sending them as bytes/strings?
Maybe a schema check when producing a new message? What are the others?
A message schema is a contract between the group of client applications producing and consuming messages. Schema validation is required when you have many independent applications that need to agree on a specific format in order to exchange messages reliably.
If you also add a Schema Registry into the picture, you don't need to bundle the schema in every service or in every single message; instead you get it from the common registry, with the additional support of schema evolution and validation rules (e.g. backward compatibility, versioning, syntax validation). It is one of the fundamental components in event-driven architectures (EDA).
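As a rough illustration of what that buys you on the producing side, here is a minimal sketch of a producer relying on a registry-backed Avro serializer. It assumes Confluent's KafkaAvroSerializer and local broker/registry addresses; the topic, schema and field names are made up:

import java.util.Properties;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class AvroProducerSketch {
    public static void main(String[] args) {
        // Example schema; with a Schema Registry this is registered once and shared.
        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
          + "{\"name\":\"id\",\"type\":\"long\"},{\"name\":\"name\",\"type\":\"string\"}]}");

        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");                 // placeholder broker
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "io.confluent.kafka.serializers.KafkaAvroSerializer");  // registry-aware serializer
        props.put("schema.registry.url", "http://localhost:8081");        // placeholder registry

        GenericRecord user = new GenericData.Record(schema);
        user.put("id", 42L);
        user.put("name", "Dennis");

        try (KafkaProducer<String, GenericRecord> producer = new KafkaProducer<>(props)) {
            // The serializer checks the record against the schema and embeds only the
            // schema id in the message, so consumers fetch the schema from the registry.
            producer.send(new ProducerRecord<>("users", "42", user));
        }
    }
}

The practical difference from manual serialization is that a record which doesn't conform to the schema fails at produce time, instead of surfacing later as a deserialization error in some consumer.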

Dynamic routing to IO sink in Apache Beam

Looking at the example for ClickHouseIO for Apache Beam the name of the output table is hard coded:
pipeline
    .apply(...)
    .apply(
        ClickHouseIO.<POJO>write("jdbc:clickhouse:localhost:8123/default", "my_table"));
Is there a way to dynamically route a record to a table based on its content?
I.e. if the record contains table=1, it is routed to my_table_1, table=2 to my_table_2 etc.
Unfortunately ClickHouseIO is still in development and does not support this. BigQueryIO does support Dynamic Destinations, so it is possible with Beam.
The limitation in the current ClickHouseIO is around transforming data to match the destination table schema. As a workaround, if your destination tables are known at pipeline creation time, you could create a ClickHouseIO per table and then use the data to route each record to the correct instance of the IO.
You might want to file a feature request in the Beam bug tracker for this.
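A minimal sketch of that workaround, assuming the set of tables is known up front; MyRecord, the JDBC URL and the table names are placeholders, and the record type needs a Beam schema so ClickHouseIO can map it to the table columns:

import java.io.Serializable;

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.clickhouse.ClickHouseIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.schemas.JavaFieldSchema;
import org.apache.beam.sdk.schemas.annotations.DefaultSchema;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.Filter;
import org.apache.beam.sdk.values.PCollection;

public class RoutePerTable {

    // Hypothetical record type carrying a "table" discriminator.
    @DefaultSchema(JavaFieldSchema.class)
    public static class MyRecord implements Serializable {
        public int table;
        public String payload;
        public MyRecord() {}
        public MyRecord(int table, String payload) { this.table = table; this.payload = payload; }
    }

    public static void main(String[] args) {
        Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

        PCollection<MyRecord> records =
            pipeline.apply(Create.of(new MyRecord(1, "a"), new MyRecord(2, "b")));

        // One ClickHouseIO instance per known destination table, selected with Filter.
        for (int t = 1; t <= 2; t++) {
            final int table = t;
            records
                .apply("FilterTable" + table, Filter.by(r -> r.table == table))
                .apply("WriteTable" + table,
                       ClickHouseIO.<MyRecord>write(
                           "jdbc:clickhouse:localhost:8123/default",   // placeholder URL
                           "my_table_" + table));
        }

        pipeline.run();
    }
}

Each Filter/Write pair becomes its own branch of the pipeline, so this only scales to a small, fixed set of tables; truly data-driven destinations would need Dynamic Destinations support in ClickHouseIO itself.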

PDI Metadata Injection for JSON Input

Actually, I need to create a transformation which will read a JSON file from the system directory and rename the JSON fields (keys) based on the metadata inputs. Finally, it should write the modified JSON into a '.js' file using the JSON Output step. This conversion must be done using the ETL Metadata Injection step.
Since I am new to the Pentaho Data Integration tool, can anyone help me with sample '.ktr' files for the above scenario?
Thanks in advance.
The same use case is covered in the official Pentaho documentation here, except it is done with Excel files rather than JSON objects.
Now, the Metadata Injection step requires building some rather sophisticated machinery, whereas JSON is simple enough to transform with a small piece of JavaScript. So, where do you get the "dictionary" (source field name -> target field name) from?
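Just to illustrate what that injected metadata ultimately drives: whichever step does the work (Metadata Injection or a JavaScript step), it boils down to applying a field-name dictionary to each JSON object. This is not PDI code, only the rename-by-dictionary idea in plain Java with Jackson, and the mapping values are made up:

import java.util.Iterator;
import java.util.Map;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;

public class RenameJsonKeys {
    public static void main(String[] args) throws Exception {
        // The "dictionary": source field name -> target field name (example values).
        Map<String, String> dictionary = Map.of("Id", "customer_id", "Name", "customer_name");

        ObjectMapper mapper = new ObjectMapper();
        ObjectNode source = (ObjectNode) mapper.readTree("{\"Id\":100000,\"Name\":\"Dennis\"}");

        ObjectNode target = mapper.createObjectNode();
        Iterator<Map.Entry<String, JsonNode>> fields = source.fields();
        while (fields.hasNext()) {
            Map.Entry<String, JsonNode> field = fields.next();
            // Use the mapped name when present, otherwise keep the original key.
            target.set(dictionary.getOrDefault(field.getKey(), field.getKey()), field.getValue());
        }

        System.out.println(mapper.writeValueAsString(target));
    }
}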

Load multiple records into marklogic server

How can I upload multiple records from a file into MarkLogic Server using the REST API?
I tried to insert a simple JSON-format file:
[{"Id":100000,"Name":"Dennis"},
{"Id":100001,"Name":"Andrea"},
{"Id":100002,"Name":"Robert"},
{"Id":100003,"Name":"Sara"}]
But it gets stored as one single record.
How do I convert this into 4 different records?
Thanks in advance,
Y.Prithvi
There isn't an out-of-the-box way to do that split at the moment. Your best bet is to do a client-side split and then do a bulk-write POST with multiple JSON items to /v1/documents.
For the client-side split, you might use something like underscore_cli to do the splitting.
As Dave points out, the easiest approach is to split out the documents on the client and send a multipart/mixed payload.
The alternative is to write a resource service extension to do the split. In MarkLogic 7, the service must be implemented in XQuery. In MarkLogic 8, you will also be able to implement a service in JavaScript.
The Java API bundles an example that illustrates the basic idea of a service that splits documents:
scripts/docsplit.xqy
com.marklogic.client.example.extension.DocumentSplitter
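For a rough idea of what the client-side split plus bulk write looks like with the MarkLogic Java Client API, here is a minimal sketch; the connection details, document URIs and auth style are placeholders, so adjust them to your MarkLogic version:

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.marklogic.client.DatabaseClient;
import com.marklogic.client.DatabaseClientFactory;
import com.marklogic.client.document.DocumentWriteSet;
import com.marklogic.client.document.JSONDocumentManager;
import com.marklogic.client.io.StringHandle;

public class BulkSplitWrite {
    public static void main(String[] args) throws Exception {
        // Placeholder host, port and credentials.
        DatabaseClient client = DatabaseClientFactory.newClient(
            "localhost", 8000,
            new DatabaseClientFactory.DigestAuthContext("admin", "admin"));

        String payload = "[{\"Id\":100000,\"Name\":\"Dennis\"},"
                       + "{\"Id\":100001,\"Name\":\"Andrea\"},"
                       + "{\"Id\":100002,\"Name\":\"Robert\"},"
                       + "{\"Id\":100003,\"Name\":\"Sara\"}]";

        JSONDocumentManager docMgr = client.newJSONDocumentManager();
        DocumentWriteSet batch = docMgr.newWriteSet();

        // Split the array client-side: one document per array element.
        for (JsonNode item : new ObjectMapper().readTree(payload)) {
            batch.add("/people/" + item.get("Id").asLong() + ".json",
                      new StringHandle(item.toString()));
        }

        docMgr.write(batch);   // one bulk request, like a multipart POST to /v1/documents
        client.release();
    }
}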

Apache Camel and Drools Fusion Integration

Has anyone tried integrating Apache Camel with Drools Fusion, or just Drools?
Following is my use case.
Get data from an external service using REST.
Filter the data (using rules defined in Drools).
The data from the external service could also be a stream of information (e.g., a Twitter feed, or the real-time location of a user).
Any help or pointers would be appreciated.
Thanks.
Drools has a Camel component. Using it is not much different from using any other Camel component.
source: https://github.com/droolsjbpm/droolsjbpm-integration/tree/master/drools-camel
binary (in the droolsjbpm-integration bundle): http://www.jboss.org/drools/downloads.html
The only thing to be "aware" of is that Drools can treat Camel messages as:
(1) commands
(2) regular facts
(3) as-is objects, re-routing them
Some articles:
http://blog.athico.com/search?q=camel
Unfortunately, the documentation only describes the "command" (1) use case:
http://docs.jboss.org/drools/release/5.4.0.Beta2/droolsjbpm-integration-docs/html/ch01.html
Some test cases you can use as examples for the use cases (2) and (3) above:
https://github.com/droolsjbpm/droolsjbpm-integration/tree/master/drools-camel/src/test/java/org/drools/camel/component
Hope this helps.
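If it helps, here is a very rough sketch of the "regular facts" style. It does not use the drools-camel endpoint itself, just a plain Camel route that hands each message to a Drools 5 stateful session as a fact; the timer period, REST endpoint and rule file name are made up:

import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.impl.DefaultCamelContext;
import org.drools.KnowledgeBase;
import org.drools.KnowledgeBaseFactory;
import org.drools.builder.KnowledgeBuilder;
import org.drools.builder.KnowledgeBuilderFactory;
import org.drools.builder.ResourceType;
import org.drools.io.ResourceFactory;
import org.drools.runtime.StatefulKnowledgeSession;

public class CamelDroolsSketch {
    public static void main(String[] args) throws Exception {
        // Build a knowledge base from a classpath rule file (hypothetical filter-rules.drl).
        KnowledgeBuilder kbuilder = KnowledgeBuilderFactory.newKnowledgeBuilder();
        kbuilder.add(ResourceFactory.newClassPathResource("filter-rules.drl"), ResourceType.DRL);
        KnowledgeBase kbase = KnowledgeBaseFactory.newKnowledgeBase();
        kbase.addKnowledgePackages(kbuilder.getKnowledgePackages());
        final StatefulKnowledgeSession session = kbase.newStatefulKnowledgeSession();

        DefaultCamelContext context = new DefaultCamelContext();
        context.addRoutes(new RouteBuilder() {
            @Override
            public void configure() {
                // Poll the external REST service and insert each response body as a fact;
                // the rules then decide what passes the filter.
                from("timer:poll?period=5000")
                    .to("http://example.org/api/data")     // placeholder REST endpoint
                    .process(exchange -> {
                        session.insert(exchange.getIn().getBody(String.class));
                        session.fireAllRules();
                    });
            }
        });
        context.start();
    }
}

The drools-camel component linked above replaces the explicit processor with a Drools endpoint; the test cases in the last link show the fact (2) and as-is (3) variants.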