Kafka Connect Filter Transformer

I am using Kafka Connect, and I want to filter out some messages.
This is what a single message in Kafka looks like:
{
  "code": 2001,
  "time": 11111111,
  "requestId": "123456789",
  "info": [
    {
      "name": "dan",
      "value": 21
    }
  ]
}
And this is what my transformer configuration looks like:
transforms: FilterByCode
transforms.FilterByCode.type: io.confluent.connect.transforms.Filter$Value
transforms.FilterByCode.filter.condition: $[?(#.code == 2000)]
transforms.FilterByCode.filter.type: include
I want to filter out the messages whose code value is not 2000.
I tried a few different syntaxes for the condition, but could not find one that works.
All the messages pass through, and none are filtered.
Any ideas on how I should write the filtering condition?

The # is not valid JsonPath; the current element is referenced with @. If you want to filter out all non-2000 codes, try [?(@.code != 2000)] with filter.type: exclude, or equivalently keep filter.type: include and match [?(@.code == 2000)].
You may test it here - http://jsonpath.herokuapp.com/
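For reference, the full transform block would then look something like this (a sketch based on the question's configuration, keeping only the code 2000 messages):
transforms: FilterByCode
transforms.FilterByCode.type: io.confluent.connect.transforms.Filter$Value
transforms.FilterByCode.filter.condition: $[?(@.code == 2000)]
transforms.FilterByCode.filter.type: include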

Related

REST related/nested objects URL standard

If /wallets returns a list of wallets and each wallet has a list of transactions, what is the standard OpenAPI/REST way to expose the transactions?
For example,
http://localhost:8000/api/wallets/ gives me
{
  "count": 1,
  "next": null,
  "previous": null,
  "results": [
    {
      "user": 1,
      "address": "3E8ociqZa9mZUSwGdSmAEMAoAxBK3FNDcd",
      "balance": "2627199.00000000"
    }
  ]
}
http://localhost:8000/api/wallets/3E8ociqZa9mZUSwGdSmAEMAoAxBK3FNDcd/ gives me
{
  "user": 1,
  "address": "3E8ociqZa9mZUSwGdSmAEMAoAxBK3FNDcd",
  "balance": "2627199.00000000"
}
If I wanted to add a transactions list, what is the standard way to form this?
http://localhost:8000/api/wallets/3E8ociqZa9mZUSwGdSmAEMAoAxBK3FNDcd/transactions ?
http://localhost:8000/api/wallets/3E8ociqZa9mZUSwGdSmAEMAoAxBK3FNDcd/transactions?offset=100 for pagination
REST doesn't care what spelling conventions you use for your resources. What it does expect is that your representations include links between resources, along with metadata describing the nature of each link.
So this schema is fine.
/api/wallets/3E8ociqZa9mZUSwGdSmAEMAoAxBK3FNDcd
/api/wallets/3E8ociqZa9mZUSwGdSmAEMAoAxBK3FNDcd/transactions
And this schema is also fine.
/api/wallets/3E8ociqZa9mZUSwGdSmAEMAoAxBK3FNDcd
/api/transactions/3E8ociqZa9mZUSwGdSmAEMAoAxBK3FNDcd
As far as I can tell, OpenAPI also gives you the freedom to design your resource model the way that works best for you (it just tells you one possible way to document the resource model you have selected).
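For example, a wallet representation could carry links to its related resources. Here is a sketch using HAL-style _links, which is one common convention rather than anything REST itself mandates:
{
  "user": 1,
  "address": "3E8ociqZa9mZUSwGdSmAEMAoAxBK3FNDcd",
  "balance": "2627199.00000000",
  "_links": {
    "self": { "href": "/api/wallets/3E8ociqZa9mZUSwGdSmAEMAoAxBK3FNDcd" },
    "transactions": { "href": "/api/wallets/3E8ociqZa9mZUSwGdSmAEMAoAxBK3FNDcd/transactions" }
  }
}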

Why does Data Flow Sink Cache not have all Data Preview results?

I'm seeing a significant discrepancy in Data Flow results when using a Cache Sink vs a Data Set Sink. I recreated a simple example to demonstrate.
I uploaded a simple JSON file to Azure Data Lake Storage Gen 2:
{
  "data": [
    {
      "id": 123,
      "name": "ABC"
    },
    {
      "id": 456,
      "name": "DEF"
    },
    {
      "id": 789,
      "name": "GHI"
    }
  ]
}
I created a simple Data Flow that loads this JSON file, flattens it out, then returns it via a Sink. I'm primarily interested in using a Cache Sink because the output is small and I will ultimately need the output for the next pipeline step. (Write to activity output is checked.)
The Data Preview shows all 3 rows. (I have two sinks in this example simply because I'm illustrating that the two outputs do not match.)
Next, I created a pipeline to run the data flow.
Now, when I debug it, the Data Flow output only shows 1 record:
"output": {
"TestCacheSink": {
"value": [
{
"id": 123,
"name": "ABC"
}
],
"count": 1
}
},
However, the second Data Set Sink contains all 3 records:
{"id":123,"name":"ABC"}
{"id":456,"name":"DEF"}
{"id":789,"name":"GHI"}
I expected the output from the Cache Sink to also have 3 records. Why is there a discrepancy?
When you choose cache as a sink, you are not allowed to use logging; validation fails with an error before debug if logging is enabled.
The catch is that when you select "none" for logging, the "first row only" property is automatically checked, and that is what causes only the first row to be written to the cache sink. You just have to manually uncheck it before running the debug.
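With "first row only" unchecked, the cache sink's debug output should contain all three rows; the expected shape, mirroring the question's data, would be something like:
"output": {
  "TestCacheSink": {
    "value": [
      { "id": 123, "name": "ABC" },
      { "id": 456, "name": "DEF" },
      { "id": 789, "name": "GHI" }
    ],
    "count": 3
  }
},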

How to restrict Resources count in GET request in AzureRM

How do I limit the number of resources returned by a GET request in AzureRM?
For example, List Virtual Machines in a subscription returns all the VMs in the subscription.
But I need to get only 10 VMs, ideally in ascending or some other sorted order. Is there a filter available for that?
If the sorting order does not matter to you, you can limit the result to the top 10 VMs with the GET request below.
I've tried this request while tweaking the resource count in the filter, and it worked as expected:
https://management.azure.com/subscriptions/{subscriptionId}/resources?$filter=resourceType eq 'Microsoft.Compute/virtualmachines'&$top=10&api-version={apiVersion}
Sample response is like below:
{
  "value": [
    {
      "id": "/subscriptions/{subscriptionId}/resourceGroups/{resourceGroup}/providers/Microsoft.Compute/virtualMachines/{vm}",
      "name": "{vm}",
      "type": "Microsoft.Compute/virtualMachines",
      "location": "{location}"
    },
    {
      "id": "/subscriptions/{subscriptionId}/resourceGroups/{resourceGroup}/providers/Microsoft.Compute/virtualMachines/{vm}",
      "name": "{vm}",
      "type": "Microsoft.Compute/virtualMachines",
      "location": "{location}"
    }
  ]
}
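If you are calling this from code rather than a REST client, here is a minimal Java sketch using the built-in java.net.http client; the subscription id, API version, and bearer token are placeholders you would supply yourself:
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ListTopVms {
    public static void main(String[] args) throws Exception {
        String subscriptionId = "{subscriptionId}";  // placeholder
        String apiVersion = "{apiVersion}";          // placeholder
        String token = System.getenv("AZURE_TOKEN"); // bearer token obtained elsewhere

        // $filter restricts the listing to VMs; $top caps the result count at 10
        String url = "https://management.azure.com/subscriptions/" + subscriptionId
                + "/resources?$filter=resourceType eq 'Microsoft.Compute/virtualmachines'"
                + "&$top=10&api-version=" + apiVersion;

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(url.replace(" ", "%20"))) // encode the spaces in the $filter value
                .header("Authorization", "Bearer " + token)
                .GET()
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}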
Hope this helps.
You could use the following API.
https://management.azure.com/subscriptions/**********/providers/Microsoft.Compute/virtualmachines?api-version=2017-12-01&top=10
Use $top=10 to return only the top 10 results.

How to query Cloudant documents for a specific field

I am new to Cloudant and I need to run a simple query on a specific document field. My documents have the following structure, and I need to get only the documents with status=SIGNED:
{
  "_id": "3ddb4058f3b24a7a9c585f997e30ff78",
  "_rev": "3-757c82c48f4e7c333911be6859aff74e",
  "fileName": "Generali Architects",
  "status": "SIGNED",
  "user": "italy",
  "_attachments": {
    "Generali Architects": {
      "content_type": "application/pdf",
      "revpos": 3,
      "digest": "md5-9hqSif7CzQ2yvKxSSbj+dw==",
      "length": 323653,
      "stub": true
    }
  }
}
Reading the Cloudant documentation, I created a design document with a related view, which returns exactly what I expected.
Then from my Java application I use the following code
String cloudantView = "_design/signedDocs/status-signed-iew";
List<SignDocDocument> docs =
    db.view(cloudantView).includeDocs(true).query(SignDocDocument.class);
which always fails with "org.lightcouch.NoDocumentException: Object Not Found".
Any idea what mistake I am making here?
Thank you very much
Is it the typo in "_design/signedDocs/status-signed-iew"? That should presumably be "_design/signedDocs/status-signed-view" (depending on how your Java library resolves view paths...).
Always worth checking the view by direct access in your browser, too, just to make sure it's returning the expected data.
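For reference, a design document for this kind of view might look something like the sketch below; the map function is an assumption based on the question, not the asker's actual design document:
{
  "_id": "_design/signedDocs",
  "views": {
    "status-signed-view": {
      "map": "function (doc) { if (doc.status === 'SIGNED') { emit(doc._id, null); } }"
    }
  }
}
Also note that some Java clients (LightCouch, for example) expect the view id without the _design/ prefix, i.e. db.view("signedDocs/status-signed-view"), so it is worth checking what format your library expects.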

CouchDB Group Level and Key Range

Can anyone explain to me why the following doesn't work?
Assuming the following document structure:
{
  "_id": "520fb089a6cb538b1843cdf3cca39a15",
  "_rev": "2-f96c27d19bf6cb10268d6d1c34799931",
  "type": "nosql",
  "location": "AZ",
  "date": "2012/03/01 00:00:00",
  "amount": 1500
}
And a Map function defined like so:
function(doc) {
  var saleDate = new Date(doc.date);
  emit([doc.location, saleDate.getFullYear(), saleDate.getMonth() + 1], doc.amount);
}
And using the built-in _sum function as the reducer.
When you execute this (with group=true) you get results like this:
{"rows":[
{"key":["AZ",2012,2],"value":224},
{"key":["AZ",2012,3],"value":1500},
{"key":["WA",2011,12],"value":1965},
{"key":["WA",2012,1],"value":358}
]}
Now if you change the query to something like this:
http://127.0.0.1:5984/default/_design/nosql/_view/nosql_test?group_level=2
You get results like this:
{"rows":[
{"key":["AZ",2012],"value":1724},
{"key":["WA",2011],"value":1965},
{"key":["WA",2012],"value":358}
]}
So with that in mind, if I wanted to find all sales in 2011 for "WA", could I not execute something like this?
http://127.0.0.1:5984/default/_design/nosql/_view/nosql_test?group_level=2&key=["WA",2011]
This example has been taken from the useful videos over at NoSQL tapes.
http://nosqltapes.com/video/understanding-mapreduce-with-mike-miller
You always need to give a range of keys here, because filtering is done on the map results, not on the reduced results.
For example, the following parameters should work (if properly url-encoded):
?group_level=2&startkey=["WA",2011]&endkey=["WA",2011,{}]
You can read about view collation to understand how it works.
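In CouchDB's collation order, objects sort after all other value types, so the {} in the endkey acts as a "highest possible" sentinel: the range covers every key that starts with ["WA",2011]. For the sample data above, the query should return something like:
{"rows":[
  {"key":["WA",2011],"value":1965}
]}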