Handling third-party API requests in End-to-End testing - rest

I want to test my REST API with end-to-end tests. As I understand it, the difference from integration tests is that we don't use an in-memory system configuration, but a real test DB and real network requests.
But I can't understand how to handle third-party API requests (like the GitHub or Bitbucket API).
Is it a normal practice to create a fake GitHub account with fake data that would be fetched by my tests?
And what should I do about access tokens? Not all services are public, and even public services can fail because of rate limits.

Is it a normal practice to create a fake GitHub account with fake data that would be fetched by my tests?
Yes. The purpose of an E2E test (vs an integration test) is to verify that the full system works with all the real system components in place, both the ones you control and the ones you don't. This can be hard to set up and a pain to maintain, but many of those pain points will expose real potential issues in your production service. How your service responds to that instability is itself a feature to be tested: does your system crash and burn, or does it gracefully present an error message and support good retry handling?
This also nets you a type of coverage that mocks cannot provide: If the third party API you're using is naughty and introduces some sort of breaking change, your E2E tests will catch it. This is a decent reason to continually run your E2E suite; not just during deploys.
The next level of this sort of testing is chaos engineering where not only do you test your production systems, but you purposefully introduce faults (yes, into prod) in order to ensure that your service can really handle the pressure.
And what to do with access tokens, not all services are public and even public services can fail with rate limit.
Your staging environment should be configured with separate sandbox accounts for external services. I'm not sure what you mean by "not all services are public" but just strive to keep your staging environment (or test users on prod) as identical to a real prod user as possible. For services that don't support multiple access tokens, you can get creative and try to clearly delineate your test data within their system.
Rate limits can be annoying, but if you're getting close enough that your tests push you over the limit, then you should be pursuing a strategy to address that anyway (negotiating with the service, getting multiple accounts, ...).

Running your tests against 3rd party services can result in slow and flaky tests when the service is down or when network latency triggers certain testing timeouts. Not to mention you run the risk of triggering API rate limits depending on the 3rd party service you're hitting. Your tests should ideally be deterministic, not failing randomly, and not needing conditional logic to handle errors within a particular test case. If you expect to need to handle errors, then there should be a specific test case to cover those errors that runs in every build, not waiting for non-deterministic failures to come in from the 3rd party.
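To make the "specific test case that runs in every build" idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: RateLimited, repo_summary and AlwaysRateLimitedClient are hypothetical names, not part of any real client library. The point is only that the failure is injected deterministically instead of waiting for the real service to misbehave.

```python
import unittest


class RateLimited(Exception):
    """Hypothetical error raised by our own wrapper around the third-party API."""


def repo_summary(client, user):
    """Hypothetical application code: degrade gracefully when rate limited."""
    try:
        names = client.list_repo_names(user)
    except RateLimited:
        return {"ok": False, "message": "Upstream is rate limiting us, try again later"}
    return {"ok": True, "repos": names}


class AlwaysRateLimitedClient:
    """Test double that fails the same way on every call."""

    def list_repo_names(self, user):
        raise RateLimited()


class RepoSummaryErrorTest(unittest.TestCase):
    def test_rate_limit_is_reported_not_raised(self):
        result = repo_summary(AlwaysRateLimitedClient(), "octocat")
        self.assertFalse(result["ok"])
        self.assertIn("rate limiting", result["message"])
```

Run it with python -m unittest like any other test; it never touches the network, so it cannot fail randomly.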
One argument people will make is that your tests should notify you if the 3rd party API breaks for one reason or another. Generally speaking, though, most major 3rd party APIs are extremely stable and are unlikely to make breaking changes. Even if it does happen, this is an awkward and confusing way to find out that the API is broken, and in all likelihood, your tests aren't going to be the first place you hear it from. More likely your customers and your production error tracker will notify you. If you want to track when these services change or go down, it makes sense to have a regular production check of some sort to verify it.
As for how to write tests around these situations, that's a little more tricky. There are tools such as VCR in Ruby which work well for stubbing out your language's internet connections and allowing you to stub out, record, and customize responses (there's a list of similar implementations in other languages further down in their readme). That doesn't work for when your browser connects to those resources in automated end-to-end tests, though. There are tools that proxy your browser's web connection such as Puffing Billy in Ruby, but it's a pretty involved process to set up, including managing security certificates. This seems pretty brittle and hard to debug when something isn't working quite right.
Your best bet for writing tests that are deterministic and maintainable may be to fake out the service in test mode. thoughtbot has a pretty decent video on this and here's a high-level article from CircleCI. Essentially, you swap in an adapter in test mode that stands in for your 3rd party service integration. Maybe what you can do on your local machine is make it possible to optionally use the real service or the adapter via an environment variable in order to verify that the tests run the same against both. You could also set up a daily build to run against the real thing so that it would verify that the tests still work alright without introducing a lot of flakiness to your more frequent builds. One issue I've run into, though, is that even if I set up a test account on that 3rd party service, the results will change over time as I add or modify information for the sake of testing new functionality, such as adding new repos, modifying issues, etc. It requires additional consideration for maintaining your test account as a set of fixtures for all of your tests.
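A rough sketch of that adapter swap in Python, under some assumptions: the environment variable name USE_REAL_GITHUB, the class names and the fixture data are all made up for illustration. The only real thing is GitHub's public "list repositories for a user" endpoint, which returns a JSON list of objects with a "name" field.

```python
import json
import os
import urllib.request
from typing import List, Protocol


class RepoHost(Protocol):
    """The interface the rest of the app depends on."""

    def list_repo_names(self, user: str) -> List[str]: ...


class GitHubRepoHost:
    """Real adapter: talks to the third-party service over the network."""

    def list_repo_names(self, user: str) -> List[str]:
        url = f"https://api.github.com/users/{user}/repos"
        with urllib.request.urlopen(url, timeout=10) as resp:
            return [repo["name"] for repo in json.load(resp)]


class FakeRepoHost:
    """Test adapter: fixed in-memory data, so results never drift."""

    def __init__(self, repos_by_user):
        self._repos_by_user = repos_by_user

    def list_repo_names(self, user: str) -> List[str]:
        return list(self._repos_by_user.get(user, []))


def make_repo_host(env=os.environ) -> RepoHost:
    # USE_REAL_GITHUB is a hypothetical switch; a daily build could set it to "1".
    if env.get("USE_REAL_GITHUB") == "1":
        return GitHubRepoHost()
    return FakeRepoHost({"octocat": ["hello-world", "spoon-knife"]})
```

Frequent builds call make_repo_host() with the variable unset and get the fake; the occasional build against the real thing flips the variable and runs the same tests.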
One additional tool I've come across that may be helpful is the likes of ngrok-tunnel (Ruby again). This is only relevant in cases where you need the 3rd party service to contact your app, since they can't send requests across the web to localhost:3000. If you've configured some sort of webhooks, services like this can make testing a lot more straightforward.

Related

How do you ensure that a distributed app is working as expected?

Imagine a very simple user creation flow in an online marketplace:
Service A (user service) receives the request, creates a user object and sends an async request to services B and C (e.g. via Kafka)
Service B (notification service) receives the request and sends an email to the newly created user
Service C (referral service) receives the request and credits some funds to the referrer
While this design might be laid out correctly in a design doc, it is only implicitly defined in code because the services talk to each other. How would you:
Ensure that the services are talking to each other in the correct order when implementing the user creation flow (integration tests might not suffice here since they generally test a very narrow set of paths)?
Define and enforce SLO guarantees between services in production?
Debug which service is to blame when the flow breaks down?
This is a great question, and I think this scenario is a great fit for an orchestrator. A microservices orchestrator platform such as Netflix Conductor is designed to handle exactly this kind of scenario.
With Conductor we can decouple the flow and dependencies from the underlying functions themselves, and each function can be designed to do one simple thing such as saving the user, notifying via email, crediting referrals etc. We can then use the orchestration engine to assemble the required flow.
Such flows execute quickly, and the added latency is easily offset by the benefits you get:
Order - the flow is defined as a workflow (this means the order can be controlled using the definition)
SLO guarantees - you can monitor for execution delays and failed transactions, and retry or replay them as required. The latency added by an orchestrator is negligible
Debugging - with Conductor you get a UI where you can load each transaction and look at what happened, which server executed it etc.
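To illustrate the general idea (this is NOT Conductor's API, just a toy sketch in Python with made-up names), the flow can be declared as data, executed in the declared order, and every step can leave a status and a duration behind for debugging:

```python
import time


def run_workflow(steps, context):
    """Run (name, function) steps in order; stop at the first failure.

    Returns an execution log with one entry per attempted step.
    """
    log = []
    for name, func in steps:
        started = time.monotonic()
        try:
            func(context)
            status = "COMPLETED"
        except Exception as exc:  # record the failure instead of crashing
            status = f"FAILED: {exc}"
        log.append({
            "step": name,
            "status": status,
            "seconds": time.monotonic() - started,
        })
        if status != "COMPLETED":
            break
    return log


# Hypothetical stand-ins for the three services in the question.
def create_user(ctx):
    ctx["user_id"] = 42


def send_welcome_email(ctx):
    ctx["emailed"] = ctx["user_id"]


def credit_referrer(ctx):
    ctx["credited"] = True


USER_CREATION_FLOW = [
    ("create_user", create_user),
    ("send_welcome_email", send_welcome_email),
    ("credit_referrer", credit_referrer),
]
```

Calling run_workflow(USER_CREATION_FLOW, {}) returns the per-step log, which is the in-miniature version of "which step failed, and how long did each one take". A real orchestrator adds persistence, retries and parallel branches on top of this.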
To explain these concepts better, I defined your use case here using some dummy APIs (this is a sandbox environment for Netflix Conductor):
https://play.orkes.io/workflowDef/simple_user_creation_flow
And you can see an execution of this definition here:
https://play.orkes.io/execution/5095b5ef-3e2d-11ed-9d7b-1a5314838fe6
(For clarity - I work at https://orkes.io which offers a managed service for Netflix Conductor)

how high frequency trading system connects to exchange

I'm trying to study high frequency trading systems. What mechanism does an HFT system use to connect to the exchange, and what is the procedure? (Does it have to go through a broker or is it direct access? If it's direct access, what sort of connection information do I require?)
Thanks in advance for your answers.
Understand that there are two different "connections" in an HFT engine. The first is the connection to a market data source. The second is to a clearing resource. As mentioned in kpavlov's answer, a very expensive COLO (co-location) is needed to get as close to the data source/target as possible. Depending on their nominal latency these COLO resources cost thousands of dollars per month.
With both connections, your trading engine must be certified by the provider (ICE, CME, etc.) to comply with their requirements. With CME the certification process is automated; with ICE it involves human review. In any case, the certification requires that your software demonstrate conformance to standards and freedom from undesirable network side effects.
You must also subscribe to your data source(s) and clearing service. Neither is inexpensive, and pricing varies over a pretty wide range. During the subscription process you'll gain access to the service provider's technical data specification(s) -- a critical part of designing your trading engine. Using old data that you find on the Internet for design purposes is a recipe for problems later. Subscription also gets you access to the provider's test sites. It is on these test sites that you test and debug your engine.
After you think your engine is ready for deployment, you begin connecting to the data/clearing production servers. This connection will get you into a place of shadows -- port roulette. Not every port at the provider's network edge has the same latency. Here you'll learn that you can have the shortest latency yet seldom have orders filled first. Traditional load balancing does little to help this, and CME has begun deploying FPGA-based systems to ensure correct temporal sequencing of inbound orders, but the rollout is still in its early stages.
Once you're running, you then get to learn that mistakes can be very expensive. If you place an order prior to a market pre-open event, the order is automatically rejected. Do it too often and the clearing provider will charge you a very stiff penalty. Other things can also get you penalized or even kicked off the service, for example if your systems are determined to be implementing strategies to block others from access.
All the major exchanges' web sites have links to public data and educational resources to help you decide if HFT is "for you" and how to go about it.
Access from outside usually requires approval from the exchange. They protect their servers with firewalls, so your server/network needs to be authorized for access.
A special certification procedure with a technician (by phone) is usually required before they authorize you.
Most liquidity providers use the FIX protocol or custom APIs. You may consider starting by implementing your connector with QuickFIX, but it may become a bottleneck later as your traffic grows.
The information you need to connect via FIX is:
Server IP
Server port
FIX protocol credentials:
SenderCompID
TargetCompID
Username
Password
Other fields
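To show where those values end up, here is a small Python sketch that assembles a FIX Logon message (MsgType 35=A) by hand. Assumptions: FIX.4.4 as the version, no encryption (98=0), a 30 second heartbeat, and Username/Password sent in tags 553/554; your counterparty's specification decides all of that, and the function name build_logon is my own. In practice an engine like QuickFIX builds this for you from its session config.

```python
from datetime import datetime, timezone

SOH = "\x01"  # FIX field delimiter


def build_logon(sender_comp_id, target_comp_id, username, password,
                heartbeat_secs=30, seq_num=1, sending_time=None,
                begin_string="FIX.4.4"):
    """Return a FIX Logon message as a string, with BodyLength and CheckSum."""
    if sending_time is None:
        sending_time = datetime.now(timezone.utc).strftime("%Y%m%d-%H:%M:%S")
    body_fields = [
        (35, "A"),              # MsgType = Logon
        (49, sender_comp_id),   # SenderCompID
        (56, target_comp_id),   # TargetCompID
        (34, seq_num),          # MsgSeqNum
        (52, sending_time),     # SendingTime (UTC)
        (98, 0),                # EncryptMethod = none (assumption)
        (108, heartbeat_secs),  # HeartBtInt
        (553, username),        # Username
        (554, password),        # Password
    ]
    body = "".join(f"{tag}={value}{SOH}" for tag, value in body_fields)
    # BodyLength (9) counts the bytes after its own field, up to the CheckSum field.
    head = f"8={begin_string}{SOH}9={len(body.encode('ascii'))}{SOH}"
    # CheckSum (10) is the byte sum of everything before it, modulo 256, 3 digits.
    checksum = sum((head + body).encode("ascii")) % 256
    return f"{head}{body}10={checksum:03d}{SOH}"
```

You would send the result (encoded as ASCII) over a TCP connection to the server IP and port from the list above, and the counterparty answers with its own Logon if the credentials and CompIDs are accepted.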

Simulating Virtual Users for Smartphone App based Service

Apologies if something similar has been asked before, but my search didn't return anything that I would consider directly related.
I am trying to implement a service with its backend on AWS EC2/S3 and its front end on the iPhone. The service is more or less a todo list. This is not a novel idea, but it will help me in a class I teach about IT infrastructure.
Unfortunately I have access to only my own iPhone and I cannot demonstrate scalability over AWS, etc.
Is there a way/software tool/framework to simulate virtual users for this app that can send requests to the AWS servers pretending to be from different accounts/apps?
The simulator should send requests just like my actual iPhone app would if I were to add, delete or edit an item in the list.
I understand stress testing is a well established topic but here I want to just simulate multiple users and demonstrate scalability instead of trying to push the Web service to its limits. Neither am I sure if this completely overlaps with traffic simulation.
Any help will be deeply appreciated.
You might be able to do it using Apache JMeter. That depends on what you have going on on the backend. But it supports the following server types:
Web - HTTP, HTTPS
SOAP
Database via JDBC
LDAP
JMS
Mail - SMTP(S), POP3(S) and IMAP(S)
Native commands or shell scripts
You should be able to wire something together with that.
http://jmeter.apache.org/
http://www.opensourcetesting.org/performance.php
I've used it at various points to simulate VERY heavy loads for my services running in AWS/EC2.
ApacheBench (ab) is a very convenient tool for doing HTTP load testing -- you can have it make concurrent requests to simulate multiple users. Its main advantage over other tools is that it's simple and easy to get started with. If your backend listens on HTTP, it might be worth trying ab before investing any time in something more complex.
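If you would rather script the virtual users yourself for the class demo, a few lines of Python can do it. This is only a sketch under assumptions: the /items endpoint and the JSON payload shape are invented, so adjust make_http_sender to whatever your iPhone app actually sends. Each virtual user runs in its own thread and adds a few items under its own user id.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def make_http_sender(base_url):
    """Build a sender that POSTs JSON to a hypothetical /items endpoint."""
    def send(user_id, action, item):
        payload = json.dumps(
            {"user": user_id, "action": action, "item": item}
        ).encode("utf-8")
        req = urllib.request.Request(
            f"{base_url}/items",
            data=payload,
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status
    return send


def simulate_users(n_users, items_per_user, send):
    """Run n_users concurrent virtual users; return {user_id: [results]}."""
    def one_user(user_id):
        results = [
            send(user_id, "add", f"item-{i}") for i in range(items_per_user)
        ]
        return user_id, results

    with ThreadPoolExecutor(max_workers=n_users) as pool:
        return dict(pool.map(one_user, range(n_users)))
```

For example, simulate_users(50, 10, make_http_sender("http://your-ec2-host")) would behave like 50 phones adding 10 items each (the host name is a placeholder). Because the sender is passed in, you can also plug in a fake one to dry-run the script without a server.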

how can i measure stress testing for the iPhone app?

How can I measure stress testing for an iPhone app?
I need stress testing, not performance testing: for example, 100 users accessing the app's database on the server at the same time.
Any help?
thanks in advance
First, you need to decide if you need to test the client-side (iPhone) app, the server-side code, or both.
Testing ONLY the server side might make this much easier, especially if the app uses HTTP to communicate with the server and exchanges data in a text-based format (XML, JSON, etc). There are many web load testing tools available which can handle this scenario. Using our Load Tester product, for example, you would configure the proxy settings on your iPhone to point to our software running on a local machine. Then start a recording and use the application. Load Tester will record the messages exchanged with the server. You can then replay the scenario, en masse, to simulate many users hitting your server simultaneously. The process, at a high level, is the same with most of the web load testing tools.
Of course, the requests to the server can't be replayed exactly as recorded - they'll need to be customized to accurately simulate multiple users. How much customization is needed will depend on the kind of data being exchanged, the complexity of the scenario and the ability of the tool to automatically configure dynamic fields (and this is one area where the abilities of the tools vary greatly).
Hope that helps!
A basic simulation would involve running your unit tests on OS X, using many simultaneous unit test processes (with unique simulated users, and other variables).
If you need more 'stress', add machines - you'll likely end up hitting I/O or network limits from one machine relatively early on.

Framework recommendation needed on .NET or JAVA for Volunteer computing (internet nodes)

I want client machines on the internet that subscribe to my server to donate their idle CPU cycles (like SETI@home).
They would take jobs (work units) from the server to process, and send the results back to the server. (This is the simplest description.) The framework I need should allow me to define a job/task. The rest, such as communication, job execution/tracking, client binary updates etc., should be managed by the framework.
I evaluated Alchemi.NET a bit, but it's not actively maintained and seems half-baked.
BOINC has an API in C, but I want a .NET or Java framework.
I am looking at Manjrasoft's ANEKA, but it seems to work only for LAN clouds.
There must be some such frameworks available. I need expert recommendations!
I'm hardly an expert but I do have a little experience with distributed computing using MPI (with C). What you're talking about does not sound like grid computing, but rather a master/slave system. That is, your server is the master and it directs all the clients (slaves).
I know very little about .NET programming so I'll speak in general.
There are a lot of web frameworks out there and most probably have the facilities you'll need. That is, your client will be able to upload files with the content it has gathered (or it could just use HTTP GET/POST). Because you don't have to worry about UI issues, you can probably handle everything through one action (assuming an action-based web framework). Then the server can return a response via JSON or XML which the client can use to take further direction. JSON is the right choice if the system is very simple, and probably a good choice for prototyping.
For the application upgrade I would consider this a separate issue (although it should be a simple matter for the server to return this to the client).
BOINC is the framework that most naturally meets your volunteer computing requirements, and it's stable and highly scalable -- I'd make sure you've considered it completely before ruling it out.
If you need to deliver something to a short deadline then I'd consider working up a simple supervisor (or scheduler) - worker pattern, just to get the application off the ground. The supervisor would be responsible for chunking up the data and making it available over http. Workers (your client app) would connect to a supervisor server; download a chunk of work; complete the chunk; and upload the results to the supervisor.
The main trick is to get the state machine thrashed out properly, so that you can accurately track what state each work chunk is in. I'd have the supervisor persist state in a database in the background.
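As a starting point for that state machine, here is a hedged Python sketch. The state names and the allowed transitions are my own assumptions about what such a supervisor would need, and the in-memory dict stands in for the database table you would actually persist to.

```python
from enum import Enum


class ChunkState(Enum):
    PENDING = "pending"      # waiting for a worker
    ASSIGNED = "assigned"    # handed to a worker, result not back yet
    COMPLETED = "completed"  # result uploaded and accepted
    FAILED = "failed"        # worker reported an error


# Which moves are legal; anything else is a bug or a misbehaving client.
ALLOWED = {
    ChunkState.PENDING: {ChunkState.ASSIGNED},
    # ASSIGNED -> PENDING covers a worker that vanished and timed out.
    ChunkState.ASSIGNED: {ChunkState.COMPLETED, ChunkState.FAILED, ChunkState.PENDING},
    ChunkState.FAILED: {ChunkState.PENDING},
    ChunkState.COMPLETED: set(),
}


class ChunkTracker:
    """Tracks the state of every work chunk (in memory, for illustration)."""

    def __init__(self, chunk_ids):
        self._states = {cid: ChunkState.PENDING for cid in chunk_ids}

    def state(self, chunk_id):
        return self._states[chunk_id]

    def transition(self, chunk_id, new_state):
        current = self._states[chunk_id]
        if new_state not in ALLOWED[current]:
            raise ValueError(f"{chunk_id}: {current.value} -> {new_state.value} not allowed")
        self._states[chunk_id] = new_state

    def next_pending(self):
        """Return the first chunk still waiting for a worker, or None."""
        for cid, state in self._states.items():
            if state is ChunkState.PENDING:
                return cid
        return None
```

The supervisor would call next_pending() when a worker asks for work, move the chunk to ASSIGNED, and move it to COMPLETED or FAILED when the upload comes back. Rejecting illegal transitions is what lets you trust the bookkeeping when clients on the open internet disappear or send duplicates.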
Does your first release need to be internal, or is it for public consumption?
While that's working I'd get started on looking at getting up to speed with BOINC and planning a migration.
My recommendation
Work dist:
Have a receiver of requests, that places messages in a message queue, like rabbit mq
Have a host of workers, listening to the same queue, taking work from it and acking it when done.
When done, send a message on another queue, containing an URI to a known location, such as your network drive. The target is your parsed data.
The receiver listens to these "completed" messages. Fetches the data from the URI.
Done.
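The shape of that pipeline, sketched in Python with the standard library's in-process queue standing in for RabbitMQ (so this shows the pattern, not RabbitMQ's client API). One simplification to note: the "completed" message here carries the result itself rather than a URI to it, and run_workers is a name I made up.

```python
import queue
import threading


def run_workers(jobs, process, n_workers=4):
    """Push jobs on a work queue, let workers drain it, collect completions."""
    work_q = queue.Queue()   # receiver -> workers
    done_q = queue.Queue()   # workers -> receiver ("completed" messages)
    for job in jobs:
        work_q.put(job)

    def worker():
        while True:
            try:
                job = work_q.get_nowait()
            except queue.Empty:
                return  # nothing left to do
            try:
                outcome = ("ok", process(job))
            except Exception as exc:
                outcome = ("error", str(exc))
            done_q.put((job, outcome))
            work_q.task_done()  # the "ack": this job is finished

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    work_q.join()  # wait until every job has been acked
    for t in threads:
        t.join()

    results = {}
    while not done_q.empty():
        job, outcome = done_q.get()
        results[job] = outcome
    return results
```

With a real broker the workers would be separate processes on other machines and the ack would go back to the broker, which is what lets an unacked job be redelivered if a worker dies.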
RabbitMQ comes with great CLR APIs.
The same reasoning works well with Microsoft Azure and their AppFabric Queue. A plus is that it scales extremely well.
Hot Versioning
http://topshelf-project.com/
It gives a folder where you can drop binaries, which are then run. It manages versioning of these as well as running them as windows services.
Deployment
You can deploy the binaries with robocopy/xcopy, mapping the share first with "net use Q: \\server\share pwd" and unmapping it afterwards with "net use Q: /delete".
Continuous Integration
Teamcity
After working with MsBuild extensively, I would recommend scripting it with psake and running the build with PowerShell. If you get advanced with PowerShell you also have WinRM available to you from your build scripts, which is really smooth.
Use the git/Subversion commit number as the x in a 0.0.0.x version number, and you will have automatic versioning that is "shared" across "Debug"/"Production" builds.
The Azure way
Work dist:
Same as above but with AppFabric Queue instead of RabbitMQ.
Hot Versioning
By swapping "Staging" and "Production" instances around, you avoid the downtime.
Deployment
You can either tap into the Azure Tools for Visual Studio's MsBuild tasks as can be read about here or you could use the PowerShell AzureSnapIns with a similar setup as above for Continuous Integration.
Continuous Integration
Same as above.
How about .NET's ClickOnce installer to manage auto-updated client binaries?
http://msdn.microsoft.com/en-us/library/t71a733d.aspx
I'm not sure of a "jobs framework" per se, but how about Microsoft's Sync Framework to support rolling your own job syncing with clients?
http://msdn.microsoft.com/en-us/sync/default