Can buildbot be forced to run a build on multiple nodes?

I have a project that I want to build on multiple nodes (several different architectures and operating systems). I then want to create packages on each node (debs and RPMs). Because of the different architectures and operating systems, I want buildbot to schedule a build for this project on several nodes at the same time.
Can that be done? What's the best way? Creating separate builders for each operating system / architecture combination?

So yes, this is possible, but not in the way I thought it would be.
In order to make buildbot run a build on multiple nodes (in my case, to generate RPMs and debs), you create multiple groups of workers, and a scheduler that lists one builder per group in its builder names.
You then create a BuildFactory that builds your artifacts, and finally you create multiple BuilderConfigs that map that factory to your workers.
I hope that if you read this looking for an answer, you will find it useful.
Feel free to ping this question for more config snippets; a rough sketch of the key master.cfg pieces is below.
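To make the shape of that concrete, here is a minimal sketch of the wiring, assuming a recent Buildbot release that uses the worker terminology. The worker names, repository URL, and build command are placeholders, not my actual config:

```python
# master.cfg sketch: one factory, one builder per node type, and one scheduler
# that fans the same change out to all of them. All names and commands below
# are placeholders.
from buildbot.plugins import schedulers, steps, util, worker

c = BuildmasterConfig = {}

# One worker per OS/architecture combination.
c['workers'] = [
    worker.Worker("deb-amd64-worker", "workerpass"),
    worker.Worker("rpm-x86-64-worker", "workerpass"),
]
c['protocols'] = {'pb': {'port': 9989}}

# A single factory: check out the source and build the package.
factory = util.BuildFactory()
factory.addStep(steps.Git(repourl="https://example.com/myproject.git",
                          mode="incremental"))
factory.addStep(steps.ShellCommand(command=["make", "package"]))

# One builder per node type, all sharing the same factory.
c['builders'] = [
    util.BuilderConfig(name="build-deb",
                       workernames=["deb-amd64-worker"],
                       factory=factory),
    util.BuilderConfig(name="build-rpm",
                       workernames=["rpm-x86-64-worker"],
                       factory=factory),
]

# A single scheduler listing every builder, so one commit triggers a build
# on every node at the same time.
c['schedulers'] = [
    schedulers.SingleBranchScheduler(
        name="all-platforms",
        change_filter=util.ChangeFilter(branch="master"),
        builderNames=["build-deb", "build-rpm"],
    ),
]
```

Because both BuilderConfigs share the same BuildFactory, the build steps only have to be defined once; adding another architecture is just another worker, another BuilderConfig, and one more entry in builderNames.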

How can I compactly store a shared configuration with Kubernetes Kustomize?

First, I'm not sure this question is specific enough for Stack Overflow. Happy to remove or revise if someone has any suggestions.
We use Kubernetes to orchestrate our server side code, and have recently begun using Kustomize to modularize the code.
Most of our backend services fit nicely into that data model. For our main transactional system we have a base configuration that we overlay with tweaks for our development, staging, and different production flavors. This works really well and has helped us clean things up a ton.
We also use TensorFlow Serving to deploy machine learning models, each of which is trained and, at this point, deployed separately for each of our many clients. The only ways these configurations differ are in the name and metadata annotations (e.g., we might have one called classifier-acme and another called classifier-bigcorp) and in the bundle of weights pulled from our blob storage (e.g., one would pull from storage://models/acme/classifier and another from storage://models/bigcorp/classifier). We also assign different namespaces to separate development, production, etc.
From what I understand of the Kustomize system, we would need a different base and set of overlays for every one of our customers if we wanted to encode the entire state of our current cluster in Kustomize files. This seems like a huge number of directories, as we have many customers. If we have 100 customers and five different deployment environments, that's 500 directories with a kustomization.yaml file.
Is there a tool or technique to encode this repetition with Kustomize? Or is there another tool that would help us generate Kubernetes configurations in a more systematic and compact way?
You can have more complex overlay structures than just a straight matrix approach. For example, for one app you might have apps/foo-base, plus apps/foo-dev and apps/foo-prod which both list ../foo-base in their bases; those in turn are pulled in by overlays/us-prod, overlays/eu-prod, and whatnot.
But if every combo of customer and environment really does need its own setting then you might indeed end up with a lot of overlays.
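If every combination really is needed and the only per-customer differences are the name, the annotations, and the model path, another option is to generate the overlay directories from your customer list instead of maintaining them by hand. Here's a minimal sketch of that idea; the directory layout, namespace scheme, and annotation key are illustrative assumptions, not a description of your setup:

```python
# Sketch: stamp out one kustomization.yaml per (customer, environment) pair.
# Assumes a shared base at <root>/base; layout and field values are examples.
import os

CUSTOMERS = ["acme", "bigcorp"]            # in practice, load this from a file
ENVIRONMENTS = ["development", "staging", "production"]

KUSTOMIZATION_TEMPLATE = """\
namespace: {env}
bases:
  - ../../../base
namePrefix: classifier-{customer}-
commonAnnotations:
  example.com/model-path: storage://models/{customer}/classifier
"""

def write_overlays(root: str) -> None:
    """Write overlays/<customer>/<env>/kustomization.yaml under root."""
    for customer in CUSTOMERS:
        for env in ENVIRONMENTS:
            overlay_dir = os.path.join(root, "overlays", customer, env)
            os.makedirs(overlay_dir, exist_ok=True)
            path = os.path.join(overlay_dir, "kustomization.yaml")
            with open(path, "w") as f:
                f.write(KUSTOMIZATION_TEMPLATE.format(customer=customer, env=env))

if __name__ == "__main__":
    write_overlays("deploy")
```

Each generated directory stays a few lines long because it only overrides what actually differs, and re-running the generator when a customer is added or removed keeps those 500 directories from being hand-maintained.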

Running Netlogo standalone - Corporate Environment

I've used NetLogo to explain the power of agent based modelling to people a number of times and I have found it to be very effective.
I have a particular business problem at work where I think that ABM, and NetLogo in particular, could be useful for generating consensus on a way forward between two groups that have entrenched and opposing views.
What I would like to do is to demonstrate models and alter parameters. Even better, if possible, I would like them to see me add to the model.
However, this is a corporate environment. I cannot install software on my machine, or indeed on any machine that I can connect to their network.
Is there a way that I can, say, run this all off a stick, in the cloud, or simply as a download/unzip without an install?
It could be worse: my work laptop does have Java installed (JRE 6).
"off a stick": yes, http://ccl.northwestern.edu/netlogo/docs/faq.html#runcd
"download/unzip without an install": Are you on Windows? I'll assume you are. We don't make a zip archive available for this purpose, but you can make one yourself by installing NetLogo on another machine and zipping up the resulting folder. Once you unzip on your work machine, the only thing you'll be missing is the association of the .nlogo and .nlogo3d file suffixes with the NetLogo application. (So yes, JenB's suggestion should work.)

Should actors/services be split into multiple projects?

I'm testing out Azure Service Fabric and started adding a lot of actors and services to the same project. Is this okay to do, or will I lose any Service Fabric features such as failover, scalability, etc.?
My preference here is clearly 1 actor/1 service = 1 project. The big win with a platform like this is that it allows you to write proper microservice-oriented applications at close to no cost, at least compared to the implementation overhead of doing something similar on other, comparable platforms.
I think it defies the point of an architecture like this to build services or actors that span multiple concerns. It makes sense (to me at least) to use these imaginary constraints to force you to keep the area of responsibility of these services as small as possible - and rather depend on/call other services in order to provide functionality outside of the responsibility of the project you are currently implementing.
With regard to scaling, it seems you'll still be able to scale your services/actors independently even though they are part of the same project - at least that's implied by the application manifest format. What you will not be able to do, though, is update services/actors within your project independently. As an example, if your project has two different actors and you make a change to one of them, you will still need to deploy an update to both of them, since they are part of the same code package and share a version number.

Jenkins: dealing with many build configurations

The shop I work for is using Jenkins for continuous integration and its Promoted Builds plugin to deploy build artifacts. However, we're having trouble managing this setup as the number of configurations grows. So my question is:
How can I set up a handy CI system from which I can deploy various artifacts in various configurations without manually scripting every possible combination?
Some more details:
Let's say I have build configurations (i.e. branches) A, B and C. There are three deployment targets I, J and K (say for various clients or consumers). Finally, each deployed instance has various services X, Y and Z (e.g. web-site, background tasks and data-service). The various services are usually promoted together; but sometimes, particularly to get hotfixes out, they're not.
Currently, we have promotions for each of these combinations. So to install a typical build I'd need to run promotions J/X, J/Y and J/Z on config C. The number of services is unfortunately rising, and getting all those configurations into Jenkins without making any errors, and furthermore ensuring that none of the components are forgotten or mixed up when deployment comes around, is getting tricky. And of course, there are more than three build configs and more than three targets, so it's all getting out of hand.
Some options that don't quite work:
Parametrized promotions to disable various components. Jenkins allows parametrized promotions, but the values are fixed the first time you promote. I could remove a degree of freedom by just promoting J and setting some parameters, but if a later version breaks, I can't roll back only the component that broke; I need to roll back the entire deployment.
Dependent, parametrized builds. Jenkins doesn't seem to support parameters to choose which build to depend on, and if you manually code the options then of course the "run" selection parameter can't work.
What I'd really want:
After a build is manually accepted as ready for deployment, it should be marked as such, with an argument for which target and arguments for which components.
The installation history should be logged per-component per-target, not (only) per-build.
There may be some plugins to help, but you also may be approaching the point where looking at commercial tools is appropriate. I work for a build/deploy vendor (UrbanCode), so by all means take this with a giant grain of salt.
We generally see people have different build types (or branches) for a single project and group those as a single 'project' with multiple 'build workflows', using the same basic configuration with some per-workflow parameterization. Really simple reuse of process.
You wrote: "The number of services is unfortunately rising, and getting all those configurations into Jenkins without making any errors, and furthermore ensuring that none of the components are forgotten or mixed up when deployment comes around, is getting tricky. And of course, there are more than three build configs and more than three targets, so it's all getting out of hand."
If the challenge here is that you have multiple web services and promotions (especially to production) involve pushing lots of stuff, at specific versions, in a coordinated manner, then you're hitting the standard use case for our application release automation tool, uDeploy. It's conveniently integrated with Jenkins. It also has really nice tracking of what version of what went to what deployment target, and who ran that process.

Version-control in a large SSIS ETL project

We're about to do a data transformation from one system to another using SSIS. We are four people who will be working on this continuously for two years, and therefore we need some sort of versioning system. We cannot use Team Foundation. We're currently configuring an SVN server, but digging into it I've seen some big risks.
It seems that a solution is stored in one huge XML file. This must be a huge problem in a combined code/drag-and-drop environment like SSIS, as it will be impossible for SVN to merge the changes correctly, and whenever we get an error when committing we will have to look inside that huge XML file and correct the mistakes manually.
One way to solve this problem is to create many solution projects in SSIS. However, this is not really the setup we want, as we are creating one big monster which will take 2 days to execute and we want to follow its progress as it executes. If we have to create several solutions, are there ways to link their execution and still have a visual view of what's going on and how well the execution is doing?
Has anyone had similar problems and/or do you have any suggestions as to how to solve them?
Just how many packages are you talking about? If it is hundreds of packages, then what is the specific problem you are trying to avoid? Here are a few things you might be trying to avoid based on your post:
Slow solution and project load time at startup in BIDS. I suppose this could be irritating from time to time. But if you keep BIDS open all day, that seems like a once-a-day cost.
Slow solution and project load time when you get latest solution definition from your version control system. Again, I suppose this could be irritating from time to time, but how frequently do you need to refresh the whole solution? If you break the solution into separate projects, then you only need to refresh a project. You would only need to refresh the whole solution if you want to get access to a new project within the solution.
What do you mean by "one huge XML file"? The solution file is an XML file that keeps track of the projects. Each project file is an XML file that keeps track of its SSIS packages. So if you have 1,000 SSIS packages evenly distributed across 10 projects in 1 solution, then each file would have no more than 100 objects to track. I can tell you from experience that I've had Reporting Services projects with more RDL files than this and it only took seconds to load the solution properly in BIDS. And as #revelator pointed out, the actual SSIS packages are their own individual XML files. Any version control system should track each of these as separate files and won't combine them into "one huge XML file". If you clarify what you mean by this point, then I think you will get better help on the question.
Whether you are running one package or 1,000 packages, you won't be doing this interactively from BIDS. You will probably deploy the packages to a server first and then have the server run the packages. If that's the case, then you will probably need to call the packages with a SQL Server Agent job. Whether you chain the packages by making each package call another package, or by having the job call each package as a separate job step, you can still track where you are in the chain with logging. If you are calling the packages with jobs, then you can track it with job steps too. I run a data warehouse that has scores of packages, and I primarily rely on separating processes into jobs that each contain one or more packages. I also chain jobs with start job commands so that I can more easily monitor performance of logical groups of loads. Also, each package shows its execution time in the job history at the step level. Furthermore, I have custom logging in each stored procedure and package that shows how many seconds and rows an individual data load or stored procedure took, so that I can troubleshoot performance bottlenecks.
Whatever you do, don't rely on running packages interactively as a way to track performance! You won't get optimal performance running ETL on your machine, let alone running it with a GUI. Run packages in jobs on servers, not desktops. Running packages interactively is just there to help build and troubleshoot individual packages, not to administer daily ETL.
If you are building generic packages that change their targets and sources based on parameters, then you probably need to build a control table in a database that tracks progress. If you are simply moving data from one large system to another as a one-time event, then you are probably going to divide the load into small sets of packages and have separate jobs for each, so that you can more easily manage recovering from failures. If you intend to build something that runs regularly to move data, then how could 2 days of constant running for one process even make sense? It sounds like the underlying data will change on you within 2 days...
If you are concerned about which version control system to use for managing SSIS package projects, then I can say that just about any will do. I've used Visual SourceSafe and Perforce at different companies and both have the same basic features of checking in and checking out individual packages. I'm sure just about any version control system that integrates with Visual Studio will do this for you.
Hope you find something useful in the above and good luck with your project.
Version control makes it possible to have multiple people developing together and working on the same project. If I am working on something, a fellow ETL developer will not be able to check it out and make changes to it until I am finished with my changes and check them back in. This addresses the common situation where one developer's project artifact and code changes clobber those of another developer by accident.
http://blog.sqlauthority.com/2011/08/10/sql-server-who-needs-etl-version-control/
Most ETL projects I work on use SVN as the source control repository. The best method I have found is to break each project or solution down into smaller, distinct (and often independently runnable) packages. So, for example, say you had a process called ManufacturingImport; this could be your project. Within this you would have a Master package, which then calls other packages as required. This means that members of the team can work on distinct packages or pieces of work, rather than everyone trying to edit the same package and getting into troublesome situations with merging.