I have an Informatica BDM system (note Big Data Management, not PowerCenter) and am having a problem with connections dropping when communicating with a third-party web service. This fails the REST Web Service Consumer transformation, which in turn kills our batch job.
Rather than fail the entire job, I would like each REST call to potentially be retried several times first.
I looked at the documentation, but I see no option to set a retry on the REST Web Service Consumer transformation. Did I miss it? Or does one have to construct a retry loop around it in some other way?
Bad news, but Informatica BDM has yet to come up with an option to retry. The need for this feature has already been raised by many users, so the option may appear in an upcoming release.
For now, we can only keep track of all the requests and responses manually and hope for the best.
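If you do end up building the retry outside the transformation (for example, in a small wrapper service sitting between the mapping and the third-party endpoint, which is an assumption on my part rather than a BDM feature), a minimal retry-with-backoff sketch could look like this; the URL, attempt count and backoff values are illustrative only:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RetryingRestCall {

    // Illustrative values only; tune to your own SLA.
    private static final int MAX_ATTEMPTS = 3;
    private static final long BASE_BACKOFF_MS = 500;

    public static String callWithRetry(String url) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();

        Exception lastFailure = null;
        for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
            try {
                HttpResponse<String> response =
                        client.send(request, HttpResponse.BodyHandlers.ofString());
                if (response.statusCode() < 500) {
                    return response.body();      // success, or a client error we won't retry
                }
            } catch (Exception e) {
                lastFailure = e;                 // dropped connection, timeout, etc.
            }
            Thread.sleep(BASE_BACKOFF_MS * attempt);   // simple linear backoff between attempts
        }
        throw new RuntimeException("REST call failed after " + MAX_ATTEMPTS + " attempts", lastFailure);
    }
}
```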
Related
Is it proper programming practice/software design to have a REST API call another REST API? If not, what would be the recommended way of handling this scenario?
If I understand your question correctly, then YES, it is extremely common.
You are describing the following, I presume:
Client makes an API call to Server-1, which in the process of servicing this request makes another request to API Server-2, takes the response from Server-2, does some reformatting or data extraction, and packages that up to respond back to the Client?
This sort of thing happens all the time. The downside is that unless the connection between Server-1 and Server-2 is very low latency (e.g. they are on the same network) and the bandwidth used is small, the Client will have to wait quite a while for the response. Obviously there can be caching between the two back-end servers to help mitigate this.
It is pretty much the same as Server-1 making a SQL query to a database in order to answer the request.
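To make the synchronous case concrete, a rough sketch of Server-1's handler is below; the endpoint, class and method names are invented purely for illustration:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class Server1Handler {

    private final HttpClient http = HttpClient.newHttpClient();

    // Hypothetical handler: Server-1 enriches its own response with data from Server-2.
    public String handleClientRequest(String itemId) throws Exception {
        HttpRequest toServer2 = HttpRequest.newBuilder(
                URI.create("https://server-2.example.com/items/" + itemId))   // made-up endpoint
                .GET()
                .build();

        HttpResponse<String> server2Response =
                http.send(toServer2, HttpResponse.BodyHandlers.ofString());

        // Reformat / extract whatever the Client actually needs before responding.
        return "{\"item\": " + server2Response.body() + "}";
    }
}
```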
An alternative interpretation of your question might be that the Client is asking Server-1 to queue an operation that Server-2 would pick up and execute asynchronously. This also is very common (it's how Google crawls your website, for instance). This scenario would have Server-1 respond to Client immediately without needing to wait for the results of the operation undertaken by Server-2. A message queue or database table is usually used as an intermediary between servers in this case.
Another approach is to have your REST API (1) store the request details in a queue table, and a backend process that checks that queue table every, let's say, 100 milliseconds. That backend is the one that calls the other REST API (2).
In your REST API (1), just create a loop that checks whether the queued transaction has been processed. If it has, get the result and return it to the client; if not, keep looping until the processing is done.
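A minimal sketch of that polling loop is below; QueueStore and its methods are hypothetical placeholders for however you read and write the queue table:

```java
// Minimal sketch of REST API (1) polling the queue table. QueueStore is a
// hypothetical abstraction over the queue table; all names are illustrative.
interface QueueStore {
    String enqueue(String payload);            // insert a row, return its transaction id
    boolean isProcessed(String transactionId); // has the backend finished this transaction?
    String resultFor(String transactionId);    // fetch the processed result
}

class QueuedRequestHandler {

    private final QueueStore queueStore;
    private static final long POLL_INTERVAL_MS = 100;   // the "every 100 ms" check
    private static final long TIMEOUT_MS = 30_000;      // don't loop forever

    QueuedRequestHandler(QueueStore queueStore) {
        this.queueStore = queueStore;
    }

    String handle(String requestPayload) throws InterruptedException {
        String transactionId = queueStore.enqueue(requestPayload);

        long deadline = System.currentTimeMillis() + TIMEOUT_MS;
        while (System.currentTimeMillis() < deadline) {
            if (queueStore.isProcessed(transactionId)) {
                return queueStore.resultFor(transactionId);   // hand the result back to the client
            }
            Thread.sleep(POLL_INTERVAL_MS);
        }
        throw new IllegalStateException("Timed out waiting for transaction " + transactionId);
    }
}
```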
I have a system that exposes a REST API with a rich set of CRUD endpoints to manage different resources.
The REST API is used also by a front-end application that executes calls by using Ajax.
I would like to make some of these calls asynchronous and add reliability.
The obvious choice seems a message broker (ActiveMQ, RabbitMQ, etc...).
I have never used message brokers before and I am wondering whether they can be "put in front of" the REST API without having to rewrite the endpoints.
I do not want to access the REST API only through the messaging system: for some endpoints, a call must always be synchronous and reliability is less important (mainly because in case of error the user receives immediate feedback).
Would a full ESB be a better option for this use case?
If I understand your question, you would like to "register" an API endpoint as a subscriber so that it could receive the messages sent to a given queue.
I do not think that a message broker can be configured to do this.
If you want to use a JMS message broker, for example, both your producers and subscribers need to use the JMS API.
A possible solution could be to implement a subscriber that executes the corresponding API call. In that case, however, reliability is compromised because the message will be dequeued before the API call is executed. It can make sense if the subscriber runs in the same process as the API, but then it is not clear why you would use a REST API instead of a library.
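To illustrate that idea (and its reliability gap), here is a rough sketch of a JMS subscriber that replays each dequeued message as a REST call; the queue name and endpoint URL are placeholders, and the concrete ConnectionFactory would come from whatever broker you choose (ActiveMQ, RabbitMQ via a JMS client, etc.):

```java
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.MessageConsumer;
import javax.jms.Queue;
import javax.jms.Session;
import javax.jms.TextMessage;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical bridge: a JMS subscriber that replays each dequeued message as a REST call.
public class QueueToRestBridge {

    public static void start(ConnectionFactory factory) throws Exception {
        Connection connection = factory.createConnection();
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        Queue queue = session.createQueue("resource.commands");       // placeholder queue name
        MessageConsumer consumer = session.createConsumer(queue);
        HttpClient http = HttpClient.newHttpClient();

        consumer.setMessageListener(message -> {
            try {
                String body = ((TextMessage) message).getText();
                HttpRequest request = HttpRequest.newBuilder(
                        URI.create("http://localhost:8080/api/resources"))   // placeholder endpoint
                        .header("Content-Type", "application/json")
                        .POST(HttpRequest.BodyPublishers.ofString(body))
                        .build();
                // Note: with AUTO_ACKNOWLEDGE the message is considered consumed before we
                // know the API call succeeded -- exactly the reliability gap mentioned above.
                http.send(request, HttpResponse.BodyHandlers.ofString());
            } catch (Exception e) {
                e.printStackTrace();
            }
        });

        connection.start();
    }
}
```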
IMO @EligioEleuterioFontana, you have a misunderstanding of the roles of:
a RESTful API
a message broker
These are two different subsystems which provide different services.
Now, let's explain their roles with respect to your requirements:
You have clients (desktop browsers, mobile phone browsers or apps) which need to get/push data to your system. (Assumption from the REST API mention).
Requests from the clients are using HTTP/HTTPS (that's the REST API part of your requirement).
For any data that is pushed, you wish to make the handling more responsive, quicker and more reliable.
If I've gotten that right, then I would answer it as:
All clients need to push requests to a REST API because this does more than just simple CRUD. The API also handles things like security (authentication and authorization), caching, possibly even request throttling, etc.
The REST API should always be the front end to clients, as this also 'hides' the subsystems that the API uses. Users should never see/know about any of your subsystem choices (e.g. what DB you are using; are you caching? if so, with what? etc.).
Message Brokers are great for offloading the work that was requested now and handling the work later. There's heaps of ways this can be done (queues or pub/sub, etc) but the point here is this is a decision the clients should never see or know about.
MB's are also great for resilience (as you noted). If something fails, the message on a queue would be re-attempted after 'x' time ... etc. (no, I'm not going to mention poison queues, dead letter queue, etc).
You can have some endpoints of the API that are synchronous. Sure! Then have others that leverage some eventual consistency (i.e. for that request, I'll deal with it later, even if 'later' is only 5 seconds later, and just return a response to the client saying "thanks! got it! I'll do it soon"). This is the asynchronous workflow you are after; see the sketch after these points.
The API endpoints need to be simple, concise and hopefully pretty stable. What you do behind the scenes as you change things should hopefully be hidden away from the clients. This includes the use of message brokers.
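To make that "accepted" style endpoint concrete, here is a rough sketch; an in-memory queue stands in for the message broker purely for illustration (in a real system the handler would publish to RabbitMQ/ActiveMQ/etc. instead), and all names are made up:

```java
import java.util.UUID;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class AcceptedStyleEndpoint {

    private final BlockingQueue<String> workQueue = new LinkedBlockingQueue<>();

    public AcceptedStyleEndpoint() {
        // Background worker that picks the job up later, asynchronously.
        Thread worker = new Thread(() -> {
            while (true) {
                try {
                    String job = workQueue.take();
                    process(job);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        });
        worker.setDaemon(true);
        worker.start();
    }

    // Called by the HTTP layer; returns immediately with an id the client can poll on.
    public String handlePost(String payload) {
        String jobId = UUID.randomUUID().toString();
        workQueue.add(jobId + ":" + payload);
        return "{\"status\":\"accepted\",\"jobId\":\"" + jobId + "\"}";
    }

    private void process(String job) {
        // The eventually-consistent work happens here.
        System.out.println("processing " + job);
    }
}
```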
Anyway, that's my take on how I see REST APIs and message brokers and how they relate to each other.
It might be worth looking into Google Cloud Pub/Sub:
https://cloud.google.com/pubsub/docs/overview
Recently I worked on different client-side APIs, such as an HTTP REST client, a messaging client, and a database client.
In each case, the same concerns sprang up, which are the following:
Connection pooling
Async and non-blocking I/O with clean error handling
Request retrying with backoff policy implementation (this is more the case for ReST and messaging)
Request batching (this is more the case for databases)
The way I see it, the above concerns can be abstracted from the underlying request in a separate API. Furthermore, due to the complexity of coding the above concerns, it makes sense not to pay the cost multiple times.
Therefore, I would have expected to have a generic client helper API which would permit me to retry and batch any sort of request, all while performing all requests asynchronously.
It would be kind of a task executor API, but without the other complexities (such as scheduling, since there is only one task that needs to be executed).
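Roughly, I imagine something along these lines (everything here is a placeholder I made up to show the shape of such a helper, not an existing library):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Rough sketch of a generic helper: retry any asynchronous request with a backoff
// policy, regardless of whether it is a REST call, a messaging send, or a DB query.
public class RetryingExecutor {

    private final ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);

    public <T> CompletableFuture<T> withRetry(Supplier<CompletableFuture<T>> request,
                                              int maxAttempts,
                                              long baseDelayMs) {
        CompletableFuture<T> result = new CompletableFuture<>();
        attempt(request, maxAttempts, baseDelayMs, 1, result);
        return result;
    }

    private <T> void attempt(Supplier<CompletableFuture<T>> request,
                             int maxAttempts,
                             long baseDelayMs,
                             int attemptNo,
                             CompletableFuture<T> result) {
        request.get().whenComplete((value, error) -> {
            if (error == null) {
                result.complete(value);
            } else if (attemptNo >= maxAttempts) {
                result.completeExceptionally(error);
            } else {
                long delay = baseDelayMs * (1L << (attemptNo - 1));   // exponential backoff
                scheduler.schedule(
                        () -> attempt(request, maxAttempts, baseDelayMs, attemptNo + 1, result),
                        delay, TimeUnit.MILLISECONDS);
            }
        });
    }
}
```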
Hence my question: is there such an API, or am I missing something?
I would say to keep them separate. My guess is that you'll find third-party solutions for each of these, but I don't know of any library that would do all three.
I'm not sure if you're programming in Java, but I think the Apache project has done a good job of segmenting utilities into its commons-* libraries. You may want to draw some inspiration from there.
https://commons.apache.org/
We need to send XML messages between a point-of-sale system and a Java web service (outside of our network). The messages contain very sensitive data. The messaging has to be secure, transactional and highly available (24/7) with failover. The solution requires the development of a broker that does the following:
Poll messages from the POS system (3 types of messages)
do some transformation to the messages
forward part of the message to the java webservice
store part of the message in a database
notify the POS system of the result
Based on these somewhat simplified requirements, do you believe that BizTalk would be overkill? Would MSMQ/WCF do the trick here?
Thank you for your help
Amine
IMO if you have the ability to receive and deliver messages asynchronously, then MSMQ (or other Message Oriented Middleware) would be an obvious choice for reliable, transactional transport, irrespective of the rest of the solution. MSMQ's journalling can also be used for audit and debugging purposes (but you will need a strategy for archiving the journal).
For the Polling, Routing, Mapping / Broker and Auditing requirements you then have the choice of BizTalk, other ESB and EAI products, or a DIY solution.
As you've suggested, it is difficult to justify the cost and learning curve of BizTalk on a single message exchange scenario such as this - you could probably knock up a .NET Windows Service (e.g. using WCF, Workflow Foundation, Transaction Scopes, some XSLT for mapping and a data access layer) in a few days.
However, if this isn't a one-off integration scenario and the need for additional integration arises (more applications to integrate, more services, additional listeners, different communications technologies etc), then it would be advisable for your company to take a long term view on EAI and ESB technologies. IMO the main challenge in integration isn't the initial development work, but is instead the ongoing operational management requirements - e.g. security, auditing, failover, monitoring, handling of bad messages and other exceptions - where products such as BizTalk are really worth the outlay.
Do you want to, and have the bandwidth to, develop, monitor, and maintain your own custom solution? If you don't mind doing that, then going the route of a custom .NET-based MSMQ/WCF solution might work well.
BizTalk will also cover all of the requirements you have listed. There is a learning curve, but it is certainly not insurmountable. The initial ramp-up may be lengthier than with a custom-code solution, but there are considerable benefits, particularly the benefit of having all your requirements reliably met:
secure
transactional
reliable (messages aren't lost)
highly available (24/7)
failover
adapter architecture (includes polling adapters)
transformations
working with external web services
returning correlated responses back to the source system (i.e., orchestrating the end-to-end process)
use a broker (you specifically listed this, and BizTalk is a broker; custom MSMQ and WCF means using no broker)
If BizTalk needs to poll the POS system, then you do not need to worry about using MSMQ. BizTalk can handle transferring messages reliably (they're persisted to SQL Server, while MSMQ persists messages to disk).
Note too that the only way to make MSMQ highly available is to cluster it. So either way you'll need to cluster something.
A BizTalk solution will be easier to maintain over time, particularly if you just want to update your transformations. With versioning you can do so in a way that doesn't require downtime. It'll be tough to update a custom solution without downtime.
Some people have had difficulty in the past with monitoring BizTalk for failed messages, but I have found it to be easier, especially with a tool like SCOM or BizTalk 360, than trying to monitor message queues, which often requires even more custom work to monitor. Just make sure to include monitoring in your cost estimates for the life of your solution.
If you do need auditing, then BizTalk also has you covered. MSMQ Journaling will keep a copy of each message for you, but without significant transaction details and with no out-of-the-box way to search through or archive the data.
Building your own .NET client code to work with a Java web service will likely take a good bit of work regardless of which way you go. With BizTalk that means running a wizard against the endpoint or against the WSDL. With WCF it means doing everything by hand or with the assistance of the svcutil tool.
You should go with MSMQ transport either way.
If you use MSMQ from .NET you should know its limitation: 4 MB per message.
BizTalk, on the other hand, has an MSMQ adapter which overcomes this limitation (if a second BizTalk server listens on the other side of the channel). On top of that, BizTalk gives you features like easily configurable message tracking and visual transformation maps. It can be set up in a cluster too (Enterprise edition only).
But the question is whether you can (or want to) afford BizTalk licenses and the hardware for its servers (it's slower than a custom .NET solution).
I want client machines on the internet that subscribe to my server to donate their idle CPU cycles (like SETI@home).
They would take jobs (work units) from the server to process and send the results back to the server. (This is the simplest description.) The framework I need should allow me to define a job/task. The rest, like communication, job execution/tracking, client binary updates, etc., should be managed by the framework.
I evaluated Alchemi.NET a bit, but it's not actively maintained and seems half-baked.
BOINC has an API in C, but I want a .NET or Java framework.
I am looking at Manjrasoft's ANEKA, but it seems to work only for LAN clouds.
There must be some such frameworks available. I need expert recommendations!
I'm hardly an expert, but I do have a little experience with distributed computing using MPI (with C). What you're talking about does not sound like grid computing, but rather a master/slave system: your server is the master and it directs all the clients (slaves).
I know very little about .NET programming, so I'll speak in general terms.
There are a lot of web frameworks out there and probably most have the facilities you'll need. Your client will be able to upload files with the content it has gathered (or it could just use HTTP GET/POST), and because you don't have to worry about UI issues you can probably handle everything through one action (assuming an action-based web framework). The server can then return a response as JSON or XML, which the client can use to take further direction. JSON is the right choice if the system is very simple and is probably a good choice for prototyping.
For the application upgrade I would consider this a separate issue (although it should be a simple matter for the server to return this to the client).
BOINC is the framework that most naturally meets your volunteer computing requirements, and it's stable and highly scalable -- I'd make sure you've considered it completely before ruling it out.
If you need to deliver something to a short deadline then I'd consider working up a simple supervisor (or scheduler) - worker pattern, just to get the application off the ground. The supervisor would be responsible for chunking up the data and making it available over http. Workers (your client app) would connect to a supervisor server; download a chunk of work; complete the chunk; and upload the results to the supervisor.
The main trick is to get the state machine thrashed out properly, so that you can accurately track what state each work chunk is in. I'd have the supervisor persist state in a database in the background.
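A bare-bones sketch of the worker side of that pattern is below; the supervisor endpoints are made up for illustration, and the real protocol would be whatever your supervisor exposes:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Bare-bones worker loop for the supervisor/worker pattern described above.
public class Worker {

    private static final String SUPERVISOR = "https://supervisor.example.com";   // placeholder

    public static void main(String[] args) throws Exception {
        HttpClient http = HttpClient.newHttpClient();

        while (true) {
            // 1. Ask the supervisor for the next chunk of work.
            HttpResponse<String> chunk = http.send(
                    HttpRequest.newBuilder(URI.create(SUPERVISOR + "/chunks/next")).GET().build(),
                    HttpResponse.BodyHandlers.ofString());

            if (chunk.statusCode() == 204) {        // no work available right now
                Thread.sleep(60_000);
                continue;
            }

            // 2. Do the actual computation on the chunk.
            String result = compute(chunk.body());

            // 3. Upload the result; the supervisor marks the chunk complete in its state machine.
            http.send(HttpRequest.newBuilder(URI.create(SUPERVISOR + "/chunks/result"))
                            .POST(HttpRequest.BodyPublishers.ofString(result))
                            .build(),
                    HttpResponse.BodyHandlers.ofString());
        }
    }

    private static String compute(String chunk) {
        return "processed:" + chunk;   // stand-in for the real work
    }
}
```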
Does your first release need to be internal, or is it for public consumption?
While that's working I'd get started on looking at getting up to speed with BOINC and planning a migration.
My recommendation
Work dist:
Have a receiver of requests, that places messages in a message queue, like rabbit mq
Have a host of workers, listening to the same queue, taking work from it and acking it when done.
When done, send a message on another queue, containing a URI to a known location, such as your network drive. The target is your parsed data.
The receiver listens to these "completed" messages. Fetches the data from the URI.
Done.
RabbitMQ comes with great CLR APIs (and an official Java client as well).
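Since the question mentioned .NET or Java, here is a minimal worker sketch using the RabbitMQ Java client; the host, queue names and result URI are placeholders:

```java
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.DeliverCallback;

// Minimal worker using the RabbitMQ Java client, following the recommendation above:
// take work from one queue, publish a "completed" message carrying a URI, then ack.
public class QueueWorker {

    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("rabbit.example.com");          // placeholder host

        Connection connection = factory.newConnection();
        Channel channel = connection.createChannel();
        channel.queueDeclare("work", true, false, false, null);
        channel.queueDeclare("completed", true, false, false, null);

        DeliverCallback onWork = (consumerTag, delivery) -> {
            String job = new String(delivery.getBody(), "UTF-8");

            String resultUri = process(job);            // e.g. a path on your network drive

            // Tell the receiver where the parsed data landed, then ack the work message.
            channel.basicPublish("", "completed", null, resultUri.getBytes("UTF-8"));
            channel.basicAck(delivery.getEnvelope().getDeliveryTag(), false);
        };

        channel.basicConsume("work", false, onWork, consumerTag -> { });
    }

    private static String process(String job) {
        return "file://server/share/" + job + ".out";   // stand-in for the real work
    }
}
```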
The same reasoning works well with Microsoft Azure and its AppFabric Queue. A plus is that it scales extremely well.
Hot Versioning
http://topshelf-project.com/
It gives you a folder where you can drop binaries, which are then run. It manages versioning of these as well as running them as Windows services.
Deployment
You can deploy the binaries with robocopy/xcopy and "net use Q: \\server\share password", "net use Q: /delete".
Continuous Integration
TeamCity
After working with MsBuild extensively, I would recommend scripting it with psake and running the build with PowerShell. If you get advanced with PowerShell you also have WinRM available to you from your build scripts, which is really smooth.
Use the git/subversion commit number as the x in a 0.0.0.x version number, and you will have automatic versioning that is "shared" across "Debug"/"Production" builds.
The Azure way
Work dist:
Same as above but with AppFabric Queue instead of RabbitMQ.
Hot Versioning
By swapping "Staging" and "Production" instances around, you avoid the downtime.
Deployment
You can either tap into the Azure Tools for Visual Studio's MsBuild tasks as can be read about here or you could use the PowerShell AzureSnapIns with a similar setup as above for Continuous Integration.
Continuous Integration
Same as above.
How about .NET's ClickOnce installer to manage auto-updated client binaries?
http://msdn.microsoft.com/en-us/library/t71a733d.aspx
I'm not sure of a "jobs framework" per se, but perhaps Microsoft's Sync Framework could support rolling your own job syncing with clients?
http://msdn.microsoft.com/en-us/sync/default