I have started using Cypress in Docker Compose on GitLab.
How can I run tests in parallel in Docker Compose?
I use this command in docker:
npx cypress run -b chrome
What do I need in order to run the tests in parallel and combine the test results?
You will still need some type of "dashboard" to coordinate, record and parallelize your tests.
You can either use Cypress.io dashboard, or run your own service that coordinates parallelization.
For example (disclosure: I am the author): https://github.com/agoldis/sorry-cypress
Cypress.io used to provide free parallelization, but I recently received this message:
This Thursday, July 11, 2019, we will begin enforcing limits on test recordings and based on your organization's recent usage of the Cypress Dashboard, your account will be impacted.
After exceeding 100% of your plan's test recording limit, parallelization will be disabled and new test recordings will be hidden from the dashboard.
Magnolia from Customer Support
Cypress.io provides free access to their dashboard for Open Source projects, and there's also a "free" plan with limited recording and parallelization capacity (as of Sep 24, 2019).
See https://www.cypress.io/pricing/
The Cypress Dashboard is required to run tests in parallel across machines.
Here's how it works:
The Cypress Dashboard keeps track of how long each of your tests takes every time you run them.
When your CI initially spins up a bunch of machines, they will reach out to the Cypress Dashboard and ask, "hey, what tests should I run?"
The Cypress dashboard knows how many machines you've spun up, and it will delegate tests to them such that the testing completes at roughly the same time on each one.
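The balancing idea described above can be illustrated with a small sketch. This is not Cypress's actual algorithm, only a greedy "longest spec first, least-loaded machine next" approximation; the function name balance_specs, the spec file names and the durations are all made up for illustration.

```python
import heapq


def balance_specs(durations, machine_count):
    """Assign specs to machines, longest first, always to the least-loaded machine."""
    # Heap entries are (total_seconds, machine_index); the smallest total pops first.
    heap = [(0.0, index) for index in range(machine_count)]
    heapq.heapify(heap)
    assignments = [[] for _ in range(machine_count)]
    longest_first = sorted(durations.items(), key=lambda item: item[1], reverse=True)
    for spec, seconds in longest_first:
        total, index = heapq.heappop(heap)
        assignments[index].append(spec)
        heapq.heappush(heap, (total + seconds, index))
    return assignments


if __name__ == "__main__":
    # Hypothetical recorded durations in seconds from earlier runs.
    recorded = {
        "login.spec.js": 40.0,
        "cart.spec.js": 30.0,
        "search.spec.js": 20.0,
        "profile.spec.js": 10.0,
    }
    for machine, specs in enumerate(balance_specs(recorded, 2)):
        print(machine, specs)
```

With these made-up numbers both machines end up with 50 seconds of work, which is the "completes at roughly the same time" goal.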
The Cypress Dashboard parallelization is free for all plans, including the free tier.
Diagram of the above from the parallelization docs
I've had some issues with tests timing out randomly, usually on CircleCI but sometimes locally. Based on Kent Dodds' suggestion to write fewer, longer tests, I now have more tests with multiple clicks and multiple network requests (mocking fetch too). These tests seem to time out. CircleCI recently added a Resources tab to the pipeline with some interesting metrics. When a test times out, RAM usage on the 4GB machine clearly sits at 100% for an extended time, and the test fails. On a passing test, RAM usage stays mostly below 100%.
Failed test (4GB):
Passed test (4GB):
Updated Resource_class to 8GB
I ran a single experiment, updating my CircleCI config so that resource_class is set to large (8GB). The test passed, with even better CPU usage.
So, does React Testing Library take up a lot of horsepower?
Is our default 4GB RAM docker image ok?
I have a test suite that I run with
python3 -mpytest --log-cli-level=DEBUG ...
on the build server. The live logs are useful to troubleshoot if the tests get stuck or are slow for some reason (the tests use external resources).
To speed things up, it is possible to run them with e.g.
python3 -mpytest -n 4 --log-cli-level=DEBUG ...
to have four parallel test runners. Speedup is almost linear with number of processes, which is great, but unfortunately the parent process swallows all live logs. I get the captured logs in case of a test failure, but I need the live logs as well to understand what is going on in real time. I understand that the output from all four parallel runs will be intermixed and that is fine. The purpose is for the committer to just check the build server output and know roughly what is going on.
I am currently using pytest-xdist, but use none of the more advanced features from it (just the multiprocessing).
I have a Python app that builds a dataset for a machine learning task on GCP.
Currently I have to start an instance of a VM that we have, and then SSH in, and run the app, which will complete in 2-24 hours depending on the size of the dataset requested.
Once the dataset is complete, the VM needs to be shut down so we don't incur additional charges.
I am looking to streamline this process as much as possible, so that we have a "1 click" or "1 command" solution, but I'm not sure the best way to go about it.
From what I've read so far, it seems like containers might be a good way to go, but I'm inexperienced with Docker.
Can I set up a container that will pip install the latest app from our private GitHub and execute the dataset build before shutting down? How would I pass information to the container, such as where to get the config file? It's conceivable that we will have multiple datasets being generated at the same time based on different config files.
Is there a better gcloud feature that suits our purpose more effectively than containers?
I'm struggling to find information on these basic questions; container tutorials seem to be dominated by web apps.
It would be useful to have a batch-like container service that runs a container until its process completes. I'm unsure whether such a service exists. I'm most familiar with Google Cloud Platform and this provides a wealth of compute and container services. However -- to your point -- these predominantly scale by (HTTP) requests.
One possibility may be Cloud Run, triggering jobs using Cloud Pub/Sub. I see there are async capabilities too, and these may be interesting (I've not explored them).
Another runtime for you to consider is Kubernetes itself. While Kubernetes requires some overhead in having Google, AWS or Azure manage a cluster for you (I strongly recommend you don't run Kubernetes yourself) and some inertia in the capacity of the cluster's nodes vs. the needs of your jobs, as you scale the number of jobs, you will smooth these needs. A big advantage with Kubernetes is that it will scale (nodes|pods) as you need them. You tell Kubernetes to run X container jobs, it does it (and cleans-up) without much additional management on your part.
I'm biased and approach the container vs image question mostly from a perspective of defaulting to container-first. In this case, you'd receive several benefits from containerizing your solution:
reproducible: the same image is more likely to produce the same results
deployability: container run vs. manage OS, app stack, test for consistency etc.
maintainable: smaller image representing your app, less work to maintain it
One (beneficial!?) workflow change if you choose to use containers is that you will need to build your images before using them. Something like Knative combines these steps, but I'd stick with doing this yourself initially. A common solution is to trigger builds (Docker, GitHub Actions, Cloud Build) from your source code repo. Commonly you would run tests against the images that are built, but you may also run your machine-learning tasks this way.
Your containers would contain only your code. When you build your container images, you would pip install, perhaps pip install --requirement requirements.txt, to pull the appropriate packages. Your data (models?) are better kept separate from your code when this makes sense. When your runtime platform runs containers for you, you provide configuration information (environment variables and|or flags) to the container.
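As a rough sketch of how the app inside the container could accept that configuration, the entry point below reads a flag and falls back to an environment variable. The names parse_settings, --config, --output, DATASET_CONFIG and DATASET_OUTPUT are hypothetical and not taken from the asker's app.

```python
import argparse
import os


def parse_settings(argv=None, environ=None):
    """Read the config location from a flag, falling back to an environment variable."""
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser(description="Build a dataset from a config file.")
    # DATASET_CONFIG and DATASET_OUTPUT are hypothetical variable names.
    parser.add_argument(
        "--config",
        default=environ.get("DATASET_CONFIG"),
        help="path or URL of the dataset config file",
    )
    parser.add_argument(
        "--output",
        default=environ.get("DATASET_OUTPUT", "./dataset"),
        help="where to write the finished dataset",
    )
    args = parser.parse_args(argv)
    if not args.config:
        parser.error("provide --config or set DATASET_CONFIG")
    return args
```

Because a flag overrides the environment variable, the same image can be started several times with different config files, which fits the case of multiple datasets being generated at once.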
The use of a startup script seems to better fit the bill compared to containers. The instance always executes startup scripts as root, thus you can do anything you like, as the command will be executed as root.
A startup script will perform automated tasks every time your instance boots up. Startup scripts can perform many actions, such as installing software, performing updates, turning on services, and any other tasks defined in the script.
Keep in mind that a startup script cannot stop an instance but you can stop an instance through the guest operating system.
This would be the ideal solution for the question you posed. It would require a small change in your Python app so that the operating system shuts down when the dataset is complete.
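A minimal sketch of that change, assuming a Linux guest where shutdown -h now is available: the wrapper run_then_shutdown and its parameters are hypothetical, and build stands in for whatever function builds the dataset.

```python
import subprocess


def run_then_shutdown(
    build,
    shutdown_command=("sudo", "shutdown", "-h", "now"),
    runner=subprocess.run,
):
    """Run build(), then power off the guest OS whether or not build() raised."""
    try:
        return build()
    finally:
        # Assumption: a Linux guest; "sudo" is redundant when already running as root.
        # check=False so a failing shutdown command does not mask the build's own error.
        runner(list(shutdown_command), check=False)
```

Shutting down in a finally block is a design choice: a crashed build also stops the instance, so a failure does not leave the VM running and billing. Drop the finally if you would rather keep the machine up to inspect a failed run.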
Q1) Can I setup a container that will pip install the latest app from our private GitHub and execute the dataset build before shutting down?
A1) Medium has a great article on installing a package from a private git repo inside a container. You can execute the dataset build before shutting down.
Q2) How would I pass information to the container such as where to get the config file etc?
A2) You can use ENV to set an environment variable. These will be available within the container.
You may consider looking into the Docker documentation for more information about containers.
I followed the tutorial for setting up JupyterHub on an AWS EMR cluster at this link: https://aws.amazon.com/blogs/big-data/running-jupyter-notebook-and-jupyterhub-on-amazon-emr/
I got the cluster up and running, but now my question is how do I stress/load test? (i.e. simulate 100 users running through the notebooks simultaneously).
In a classroom setting, I had about 30 users SSHed into my cluster and working through the notebook exercises, but there was a huge slowdown when more people started executing the code blocks in the notebooks. Some Python library imports took forever, and some exercises stopped working or just hung. CloudWatch showed that there was a network bottleneck.
Basically what I'm asking is how can I go about debugging something like that? What's the best way to simulate multiple users sshing into the EMR cluster, opening up jupyter notebooks and running the code blocks concurrently?
You should look at (and contribute to?) projects like this one, which are meant to load-test JupyterHub and should migrate to the JupyterHub organisation once they are more polished.
Note that in your case you are not really wishing to test JupyterHub, you are testing your cluster; just run N scripts in parallel importing your library and you have your load test.
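One way to sketch "N scripts in parallel" is shown below. It starts N fresh Python interpreters at once, each importing a module, and reports how long each took. The helper names time_import and load_test are made up, and json is only a stand-in for the library that was slow to import.

```python
import subprocess
import sys
import time
from concurrent.futures import ThreadPoolExecutor


def time_import(module_name):
    """Start a fresh interpreter that imports module_name; return (exit_code, seconds)."""
    start = time.perf_counter()
    result = subprocess.run(
        [sys.executable, "-c", f"import {module_name}"],
        capture_output=True,
    )
    return result.returncode, time.perf_counter() - start


def load_test(module_name, users):
    """Run `users` imports concurrently; threads suffice since each waits on a subprocess."""
    with ThreadPoolExecutor(max_workers=users) as pool:
        return list(pool.map(time_import, [module_name] * users))


if __name__ == "__main__":
    # "json" is a placeholder; substitute the library your notebooks import.
    for exit_code, seconds in load_test("json", users=4):
        print(f"exit={exit_code} took={seconds:.2f}s")
```

Running this on the cluster with increasing values of users should show at what point import times start to climb, without involving JupyterHub at all.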
We need to use Jenkins to test some web apps that each need:
a database (postgres in our case)
a search service (ElasticSearch in our case, but only sometimes)
a cache server, such as redis
So far, we've just had these services running on the Jenkins master, but this causes problems when we want to upgrade Postgres, ES or Redis versions. Not all apps can move in lock step, and we want to run the tests on new versions before committing to move an app in production.
What we'd like to do is have these services provided on a per-job-run basis, each one running in its own container.
What's the best way to orchestrate these containers?
How do you start up these ancillary containers and tear them down, regardless of whether the job succeeds or not?
How do you prevent port collisions between, say, the database in a run of a job for one web app and the database in the job for another web app?
Check docker-compose and write a docker-compose file for your tests.
The latest network features of Docker (private network) will help you to isolate builds running in parallel.
However, start by learning docker-compose as if you only had one build running at a time. Once you are confident with this, look further into the advanced Docker documentation on networking.
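As a sketch of how a job could wrap its tests, the helper below gives every run its own Compose project name with -p, so containers and networks from parallel builds stay separate, and it always tears the services down afterwards. It assumes a docker-compose.yml in the working directory that does not publish fixed host ports, and that Jenkins sets BUILD_TAG; the function names are hypothetical.

```python
import os
import re
import subprocess


def project_name(environ=None):
    """Derive a Compose project name that is unique per Jenkins build."""
    environ = os.environ if environ is None else environ
    tag = environ.get("BUILD_TAG", "local")
    # Keep only lowercase letters, digits, hyphens and underscores.
    return re.sub(r"[^a-z0-9_-]", "-", tag.lower())


def compose_command(project, *args):
    """Build a docker-compose command line scoped to one project."""
    return ["docker-compose", "-p", project, *args]


def run_with_services(test_command, project=None, runner=subprocess.run):
    """Start the services, run the tests, and tear down even if the tests fail."""
    project = project or project_name()
    try:
        runner(compose_command(project, "up", "-d"), check=True)
        return runner(test_command, check=False)
    finally:
        # "down -v" also removes the project's volumes so no state leaks between runs.
        runner(compose_command(project, "down", "-v"), check=False)
```

If the tests themselves run in a container of the same Compose project, they can reach the database by its service name and no host port needs to be published, which is what avoids the collisions.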