I'm new to Apache Kafka, building some applications, and I'm stuck on a specific problem.
I'll try my best to explain my use case.
I have an external application, a kind of ticket manager, that I would like to pull data from. It has a paginated REST API where I can get ticket data per client. I would like to loop through this API until the last page and send the data to Kafka, where my Sink Connectors would send it on to three DBs.
Q) Is my best option to create some kind of Python script to fetch the data and POST it to the Kafka REST Proxy API?
I don't think you really have any good option here.
Pages imply ordering; if you have N pages and attempt to send N requests, any one of your producer requests could fail and retry, causing loss of information and random ordering.
Two options to fix that:
send "page count" and "current page" along with each message and reshuffle the data at some downstream system (see the sketch after this list)
don't produce any message until you have iterated over all pages, keeping in mind that Kafka has a maximum request size
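A minimal sketch of the first option, assuming the kafka-python client (any producer client would do); the ticket endpoint, topic name, and response fields are hypothetical:

```python
# Sketch: produce each ticket with "current_page" and "page_count" so a
# downstream consumer can detect gaps and reorder. All names are assumptions.
import json
import requests
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",            # assumed broker address
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

page = 1
while True:
    # Hypothetical paginated endpoint and response shape
    resp = requests.get("https://ticket-vendor.example/api/tickets",
                        params={"client": "acme", "page": page})
    resp.raise_for_status()
    body = resp.json()

    for ticket in body["results"]:
        producer.send("tickets", {
            "ticket": ticket,
            "current_page": page,
            "page_count": body["total_pages"],
        })

    if page >= body["total_pages"]:
        break
    page += 1

producer.flush()
```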
Problem with either approach: what happens if another page is added to the API while you're producing or writing to the database? Or if existing pages change? How will you detect which pages you might need to request again or overwrite?
POST it to the Kafka REST Proxy API?
If the REST Proxy is the only way you're able to get data into the cluster, then sure, but it'll be less performant than a producer
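If you do go the REST Proxy route, the v2 API accepts batches of records per topic. A minimal sketch, assuming a proxy reachable at rest-proxy:8082 and a topic named tickets:

```python
# Post records through the Confluent Kafka REST Proxy (v2 JSON format).
# Proxy URL, topic name, and payload are assumptions for illustration.
import requests

records = {"records": [
    {"value": {"ticket_id": 123, "current_page": 1, "page_count": 10}}
]}

resp = requests.post(
    "http://rest-proxy:8082/topics/tickets",
    headers={"Content-Type": "application/vnd.kafka.json.v2+json"},
    json=records,
)
resp.raise_for_status()
print(resp.json())   # offsets assigned by the broker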
Related
I am planning to develop an application where a request can be made using a REST client (SpringBoot REST), e.g. to get the list of various services available for a given geographical code.
I publish this request as a message to Kafka. There is a consumer (Python code) that listens for this message and, on its arrival, looks up various stores (NoSQL/HDFS) and gets the list of services.
I've got this concept working up to the point where I have received the request, consumed it, and produced the results.
What I'm not sure about is how the RestController/RestService can know when the aggregation is complete and the results are ready, so that it can send the response (with the total size) back to the browser.
Any ideas or suggestions on this, please?
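For reference, a rough sketch of the consume → look up → produce flow described above, assuming kafka-python as the client; the topic names, message fields, and lookup function are hypothetical:

```python
# Consumer side of the flow described in the question. All names are assumptions.
import json
from kafka import KafkaConsumer, KafkaProducer

consumer = KafkaConsumer(
    "service-requests",                          # assumed request topic
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def lookup_services(geo_code):
    """Hypothetical lookup against the NoSQL/HDFS stores."""
    return [{"service": "example", "geo_code": geo_code}]

for msg in consumer:
    services = lookup_services(msg.value["geo_code"])
    # Publish the aggregated result to a results topic; how the REST layer
    # learns that this result is ready is exactly the open question above.
    producer.send("service-results", {"request_id": msg.value.get("request_id"),
                                      "services": services})
```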
The vendor publishes frequent Async responses to Kafka Topic which resides in the vendor DC.
We need to get that info into our company.
What we don't want to do is:
to write a (polling) Kafka consumer service to read off their Kafka Topic
We want them to call our REST API (callback URL) to publish the information.
Are there any options to configure a trigger on (their) Kafka topic to call a (external) REST API as and when a message is written into the topic?
We would like them to call our API so we can route it thru our API Gateway and handle all the crosscutting concerns.
If you want them to call your service, or have them write a secondary event to anywhere, then you need to ask them to do that... No real alternatives there.
If you have access to their Kafka service, I see no reason why embedding a consumer into your API would be an issue.
Or you could use MirrorMaker/Replicator to copy their Kafka to your own local Kafka, but you still need some consumer to get data over to a REST action.
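As a rough illustration of "some consumer to get data over to a REST action", here is a hedged sketch using kafka-python; the vendor topic, broker address, security settings, and callback URL are all assumptions:

```python
# Consume from the vendor's topic and forward each message to your own
# callback URL behind the API gateway. Names and settings are hypothetical.
import json
import requests
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "vendor-async-responses",                      # assumed vendor topic
    bootstrap_servers="vendor-kafka.example:9092",
    security_protocol="SSL",                       # assumed; depends on vendor setup
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

for msg in consumer:
    # Re-publish the event into your company via the existing REST callback
    resp = requests.post("https://api.mycompany.example/vendor-callback",
                         json=msg.value, timeout=10)
    resp.raise_for_status()
```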
Assume I have an API that gives a JSON response that returns an id and a name.
In a mobile application, normally I would make an HTTP GET request to get this data in a one-time connection with the server and display the results in the app. However, if the data changes over time and I want to keep listening to this data whenever it changes, how is that possible?
I have read about sockets and seen the socket_io_client and socket_io packages, but I haven't got my head around them yet. Is using sockets the only way to achieve this scenario, or is it possible to do it with a normal HTTP request?
Thanks for your time
What you need is not an API but a Webhook:
An API can be used from an app to communicate with myapi.com. Through that communication, the API can List, Create, Edit or Delete items. The API needs to be given instructions, though.
Webhooks, on the other hand, are automated calls from myapi.com to an app. Those calls are triggered when a specific event happens on myapi.com. For example, if a new user signs up on myapi.com, the automated call may be configured to ask the app to add a new item to a list.
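To make the idea concrete, here is a purely illustrative webhook receiver written with Flask; the endpoint path, payload shape, and "new user" event are hypothetical, standing in for whatever myapi.com would be configured to call:

```python
# Minimal webhook receiver: myapi.com POSTs here when the event fires.
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/webhooks/new-user", methods=["POST"])
def new_user():
    event = request.get_json()
    # e.g. add the new user to a list, notify clients, etc.
    print("new user signed up:", event.get("name"))
    return jsonify({"received": True}), 200

if __name__ == "__main__":
    app.run(port=5000)
```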
Is using sockets the only way to achieve this scenario, or is it possible to do it with a normal HTTP request?
Sockets are only one of the ways to achieve your goal. It is possible to do it using a normal HTTP request. Here, for example, the docs explain how to update data over the internet using HTTP.
From the flutter docs:
In addition to normal HTTP requests, you can connect to servers using WebSockets. WebSockets allow for two-way communication with a server without polling.
You'll find what you need under the networking section.
You should also take a look at the Stream and StreamBuilder classes.
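The listening pattern itself looks the same in any language. A small sketch using Python's `websockets` package (an assumption, purely to show the idea; in Flutter you would use the packages mentioned above), with a hypothetical server URL and message format:

```python
# Two-way connection: the server pushes updates whenever the data changes,
# so the client never has to poll. URL and message shape are placeholders.
import asyncio
import json
import websockets

async def listen():
    async with websockets.connect("wss://example.com/items") as ws:
        # Optionally tell the server what we're interested in
        await ws.send(json.dumps({"subscribe": "items"}))
        # Then just receive pushed updates as they arrive
        async for message in ws:
            item = json.loads(message)
            print(item["id"], item["name"])

asyncio.run(listen())
```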
I don't mind if you use an example from another API that is not Adobe Analytics'. I just need to know the pattern I have to follow in order to successfully convert a Postman request into a NiFi request.
After successfully creating requests to pull reports from Adobe Analytics via Postman, I'm having difficulties migrating these Postman requests to NiFi. I haven't been able to find concrete use cases that explicitly explain how to do this kind of task step by step.
I'm trying to build a backend on top of NiFi to handle multiple data extracts from Adobe Analytics in an efficient and robust way, instead of having to create all the required scripts myself. Yet there is more documentation about REST APIs with Postman than about REST APIs with NiFi.
In the screenshot below we can see what the Postman request looks like. It takes 3 headers and 1 temporary header that includes the authorization value (Bearer token). This temporary header is generated automatically after filling in the OAuth 2.0 authorization form in the Authorization tab, as shown here.
Then we have the body of the request. This JSON text is generated automatically by debugging Adobe Analytics' workspaces, as shown here.
I'd like to know the following in a step-by-step manner with screenshots if possible:
Which processor(s) should I use in NiFi to obtain a similar response as the one I got in Postman?
Which properties should I add/remove from the processor to make this work?
How should I name these properties?
Is there a default property whose value/name I should modify?
As you can see, the question mainly refers to properties setup in NiFi, as well as Processor selection. I already tried to configure some processors but I don't seem to get the correct properties setup, or maybe I'm selecting the wrong processors.
I'm using NiFi v1.6.0 and Postman v7.8.0
This is most likely an easy task for users already familiar with NiFi and API requests, but it has proven challenging to me. Hopefully this will help other users looking to build more robust pipelines by using NiFi instead of doing it manually.
Thanks.
It only takes 3 NiFi processors to replicate a REST API request that works in Postman. In this solution we use a request that contains a nested JSON body. The advantage of this simple approach is that it reduces the amount of configuration required to obtain a successful response from the API, even if you are using a complex JSON request. In this case the body of the JSON request is passed in through the GenerateFlowFile processor, without needing any other processor to parse or format the request.
Step #1. Create a GenerateFlowFile processor. The only property you will have to modify is Custom Text. Paste in there your whole JSON request, just as it was in Postman. In this case I'm using the very same JSON shown in the question above. It's a good idea to set the Yield Duration to 10 seconds or more.
Step #2. Create an InvokeHTTP processor. Then modify the 6 properties shown in the screenshots below. Use the same authorization details you used in Postman; make sure to copy the Bearer token from Postman after it has been tested. Also, don't forget to set the HTTP Method, Remote URL and Content-Type as well.
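As a rough guide to those properties, here is a sketch expressed as a Python dict purely for readability; the values are placeholders, so copy the real URL, headers and Bearer token from the working Postman request (in InvokeHTTP, user-added dynamic properties are sent as request headers):

```python
# Not runnable configuration, just a summary of the InvokeHTTP setup.
invokehttp_properties = {
    "HTTP Method": "POST",
    "Remote URL": "https://analytics.adobe.io/api/<company_id>/reports",  # placeholder
    "Content-Type": "application/json",
    # Dynamic (user-added) properties become HTTP request headers,
    # so add one per header used in Postman, e.g.:
    "Authorization": "Bearer <token copied from Postman>",
}
```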
Step #3. Finally, add a couple of LogAttribute processors to store the output of InvokeHTTP. One of these LogAttribute processors should store successful responses; the other one can be used for the Failure, Original, Retry and No-Retry relationships, or you can create a LogAttribute processor for each of these outputs.
Step #4. Now, connect the processors and Start your data flow! You should start seeing data populate the Successful LogAttribute. Then you can use the Data Provenance option to review the incoming data and confirm that this is exactly the same result you previously obtained from Postman.
Note: This is a simple, straightforward, "for starters" solution to replicate a Postman API request using a nested static JSON. There are more solutions in StackOverflow that tackle more complex cases, like dynamic JSON. Here's a list of some other posts:
nifi invokehttp post complex json
In NiFi processor 'InvokeHTTP' where do you write body of POST request?
Configuring HTTP POST request from Nifi
I want to clear the 'response' queue and any other queues if a processor is stopped because of a failure (I stop it with a 'template', which works similarly to the REST API).
I have read this: https://nifi.apache.org/docs/nifi-docs/rest-api/index.html
but I have no idea how I can use it to fulfil my idea.
I mean, it would be perfect if I could clear the response queue whenever I have at least 1 flowfile in the failure queue. Is that possible?
Can I use a PUT request for deleting queues? I mean, is there any state for flowfiles in queues that would let me set them as empty or deleted?
Using your browser's Developer Tools window, use the UI to clear a queue while monitoring the network tab. Everything the Apache NiFi UI does is performed via the REST API. You will be able to see exactly what requests are sent to the server to clear the connection queue and can recreate that programmatically.
The specific API endpoint you want in this case is POST /flowfile-queues/{id}/drop-requests where {id} is the connection ID.
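A hedged sketch of driving that endpoint from Python with requests; the host, port and connection id are placeholders, and the follow-up GET/DELETE calls assume the standard drop-request lifecycle in the NiFi REST API:

```python
# Clear a connection's queue via the NiFi REST API drop-request endpoint.
import requests

nifi = "http://localhost:8080/nifi-api"          # placeholder host/port
connection_id = "<connection-uuid>"              # placeholder connection id

# Ask NiFi to drop everything currently queued in the connection
drop = requests.post(f"{nifi}/flowfile-queues/{connection_id}/drop-requests").json()
request_id = drop["dropRequest"]["id"]

# Check the drop request's status, then clean it up
status = requests.get(
    f"{nifi}/flowfile-queues/{connection_id}/drop-requests/{request_id}").json()
print(status["dropRequest"]["state"], status["dropRequest"]["dropped"])
requests.delete(f"{nifi}/flowfile-queues/{connection_id}/drop-requests/{request_id}")
```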