How can I make Service Fabric package sizes practical?

I'm working on a Service Fabric application that is deployed to Azure. It currently consists of only 5 stateless services. The zipped archive weighs in at ~200MB, which is already becoming problematic.
By inspecting the contents of the archive, I can see the primary problem is that many files are required by all services. An exact duplicate of those files is therefore present in each service's folder. However, the zip compression format does not do anything clever with respect to duplicate files within the archive.
As an experiment, I wrote a little script to find all duplicate files in the deployment and delete all but one copy of each. Then I zipped the result, and it comes in at a much more practical 38MB.
I also noticed that system libraries are bundled, including:
System.Private.CoreLib.dll (12MB)
System.Private.Xml.dll (8MB)
coreclr.dll (5MB)
These are all big files, so I'd be interested to know if there is a way for me to bundle them only once. I've tried removing them altogether, but then Service Fabric fails to start the application.
Can anyone offer any advice as to how I can drastically reduce my deployment package size?
NOTE: I've already read the docs on compressing packages, but I am very confused as to why their compression method would help. Indeed, I tried it and it didn't. All it does is zip each subfolder inside the primary zip; there is no de-duplication of files involved.

There is a way to reduce the package size, though I would say it isn't a good way, or the way things should be done; still, I think it can be of use in some cases.
Please note: This approach requires target machines to have all prerequisites installed (including .NET Core Runtime etc.)
When building a .NET Core app there are two deployment models: self-contained and framework-dependent.
In self-contained mode, all required framework binaries are published alongside the application binaries, while in framework-dependent mode only the application binaries are published.
By default, if the project has a runtime specified in the .csproj (e.g. <RuntimeIdentifier>win7-x64</RuntimeIdentifier>), the publish operation is self-contained - that is why each of your services copies the entire framework.
To turn this off, simply add the <SelfContained>false</SelfContained> property to every service project you have.
Here is an example from a new .NET Core stateless service project:
<PropertyGroup>
  <TargetFramework>netcoreapp2.2</TargetFramework>
  <AspNetCoreHostingModel>InProcess</AspNetCoreHostingModel>
  <IsServiceFabricServiceProject>True</IsServiceFabricServiceProject>
  <ServerGarbageCollection>True</ServerGarbageCollection>
  <RuntimeIdentifier>win7-x64</RuntimeIdentifier>
  <TargetLatestRuntimePatch>False</TargetLatestRuntimePatch>
  <SelfContained>false</SelfContained>
</PropertyGroup>
I did a small test and created a new Service Fabric application with five services. The uncompressed package size in Debug was around ~500 MB. After I modified all the projects, the package size dropped to ~30 MB.
The deployed application worked well on the local cluster, which demonstrates that this approach is a workable way to reduce package size.
In the end, I will highlight the warning one more time:
Please note: This approach requires target machines to have all prerequisites installed (including .NET Core Runtime etc.)

You usually don't want to know which node runs which service, and you want to deploy service versions independently of each other, so sharing binaries between otherwise independent services creates a very unnatural run-time dependency. I'd advise against that, except for platform binaries like ASP.NET and .NET, of course.
However, did you read about creating differential packages? https://learn.microsoft.com/en-us/azure/service-fabric/service-fabric-application-upgrade-advanced#upgrade-with-a-diff-package That would reduce the size of upgrade packages after the initial 200MB hit.
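To illustrate the idea with a sketch (the type name, package names and versions below are made up, not taken from your application): in a diff package you still ship the complete, fully versioned application manifest, but on disk you only include the code/config packages that actually changed, e.g.:
<ApplicationManifest ApplicationTypeName="MyAppType"
                     ApplicationTypeVersion="1.1.0"
                     xmlns="http://schemas.microsoft.com/2011/01/fabric">
  <!-- Every service is still declared; only the changed service's package folders ship in the diff package -->
  <ServiceManifestImport>
    <ServiceManifestRef ServiceManifestName="ChangedServicePkg" ServiceManifestVersion="1.1.0" />
  </ServiceManifestImport>
  <ServiceManifestImport>
    <ServiceManifestRef ServiceManifestName="UnchangedServicePkg" ServiceManifestVersion="1.0.0" />
  </ServiceManifestImport>
</ApplicationManifest>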

Here's another option: app trimming, described at https://devblogs.microsoft.com/dotnet/app-trimming-in-net-5/, enabled with these project properties:
<SelfContained>True</SelfContained>
<PublishTrimmed>True</PublishTrimmed>
From a quick test just now, trimming one app reduced the package size from ~110 MB to ~70 MB (compared to ~25 MB with SelfContained=false).
The trimming process took several minutes for a single application though, and the project I work on has 10-20 apps per Service Fabric project. I also suspect that trimming isn't safe when your code relies heavily on dependency injection, since types that are only reached via reflection can be trimmed away.
For debug builds we use SelfContained=false, because developers will have the required runtimes on their machines, but not for release deployments.
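One way to express that split is with conditional property groups in each service's .csproj; this is just a sketch, assuming the standard Debug/Release configuration names:
<PropertyGroup Condition="'$(Configuration)' == 'Debug'">
  <!-- Framework-dependent: developers already have the runtime installed -->
  <SelfContained>false</SelfContained>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)' == 'Release'">
  <!-- Self-contained and trimmed for release deployments -->
  <SelfContained>true</SelfContained>
  <PublishTrimmed>true</PublishTrimmed>
</PropertyGroup>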
As a final note, since the OP mentioned file upload being a particular bottleneck:
A large proportion of the deployment time is just zipping and uploading the package
I noticed recently that we were using the deprecated Publish Build Artifacts task when uploading artifacts during our build pipeline. It was taking 20 minutes to upload 2GB of files. I switched over to the suggested Publish Pipeline Artifact task and it took our publish step down to 10-20 seconds. From what I can tell, this newer task uses all kinds of tricks under the hood to speed up uploads (and downloads), including file deduplication. I suspect that zipping up build artifacts yourself at that point would actually hurt your upload times.

Related

Is it possible to utilize the same service worker for two projects?

I have an issue with a service worker. I have two different projects that are on the same server but in different folders, and I want to precache the files of project number 2 using my service worker (my service worker is already working on project number 1). My question is: is it possible to do this? Is there any other way I can attack this? Any help is very much appreciated.
In general, yes, as long as the service worker is hosted at a URL that is at the same level as (or "higher" than) the root of each of those projects. That would ensure that each project is within the scope of the service worker.
I'm assuming that one of the challenges you're asking about relates to creating a precache manifest within that service worker that contains build artifacts from both projects. There are a few different ways to tackle that, but I think the most straightforward would be to ensure that you always run the build process for each project at the same time, and then when you use Workbox's build tooling to create the precache manifest, you ensure that you grab all the assets that were output by each of the projects.
The specifics of configuring that build process depends on what you're currently using. You mention that there's a service worker (presumably using Workbox's precaching) already in place for the first project, so I think just using the same build setup, with tweaks to pick up the additional assets, would be easiest.

TFS Intranet Automated Deploy Strategy

I have introduced branching/merging to my team and have talked before about how it would be great to automatically build and deploy code checked into the staging/master branches, but I'm a junior dev, not very ops-y.
The trouble I'm having is that we create intranet applications and store them on our own VMs which we have access to, but we also have load balancing, which is causing me grief!
I can get a build to automate (well, I haven't got all the bugs figured out but I'm working my way through them) - and I can even get the build to automatically create a zip file ready for deployment.
Is it possible to configure several servers for deployment?
I.E
1) I check in some code to stage
***Automatically***
2) Code builds
3) Build completes, unit tests run and pass
4) Code is packaged into a .zip
5) The .zip is deployed across the three load-balanced servers (all with the same file path).
***
Maybe worth noting: we currently have Visual Studio running on our TFS server, so the code is built on the same server where it is all stored, but this is not the server we run live code from.
Any help or tutorials specific to my setup would be GREATLY appreciated, I really want to turn this departments releasing strategies around!
I am going to address only the deployment aspect. There are a lot of different ways that this can be handled, such as:
Customizing the build template
Writing custom .Net code and inserting it into the build template (which would also involve customizing the template)
Creating a Batch or Powershell script set to run after the build completes
Using a separate tool such as Octopus Deploy or Release Management to handle the deployments
The first thing you need to do is separate the build and deployment steps in your head. While they are tightly coupled in your model, they are two totally different tasks that need to be handled different ways.
The second thing is to stop thinking like a developer when it comes to the deployment portion. While there will likely be a programmatic solution, you'll need to identify the manual steps first.
You stated that you're not very ops-y, by which I assume you mean you're more of a developer than a systems analyst. If that is the case, then the third thing you'll need to do is get someone who is ops-focused involved, such as your current release team.
There are 3 major things that need to be done then:
EVERYTHING needs to be standardized. If you can't standardize something, then standardize the way that it's non-standard (example: You have a bulk list of servers you need to deploy to, and you need to figure out which ones to deploy to based on their name, which can be anything. In that case, a rule needs to be put in place that all QA servers need to have QA in their name, User Acceptance servers need UAT, Production need PROD, etc.).
Figure out how you're going to communicate from the build to the deployment: which builds are going to be deployed, to which servers, and where the code is going to be picked up from.
You need to document every manual step, and every exception to those steps, and every exception to those exceptions.
Once you have all those pieces in place, you need to then go through each manual step and automate it, whether that's through Batch, Powershell, or a custom-built application. Once you have all the steps automated, you'll have both the build and deploy pieces complete.
After you're able to execute a single "manual" automated deployment to a single environment, you're then ready to figure out how you want to run it for multiple environments. This can range from an XML file that is iterated through, to simply calling the same command multiple times with different parameters.
A quick summary of how I've done this at my current job (where using a third-party deployment tool was not an option):
Created a tool using .Net WinForms to allow us to "manually" run automated builds (we use the interface to determine the input parameters, and the custom classes under the hood do all the heavy lifting. These custom classes are in a separate project that builds to its own DLL. This also allows us to test tweaks and changes to the process in a testing environment before we roll it out to our production build server)
Set up an XML file for each environment (QA, UAT, Prod, etc.) that contains all of the servers that need to be deployed to in that environment, including destination paths, scheduled tasks, and Windows Services (a sketch of such a file appears after this list)
Customize the TFS build template and include the custom classes created for the custom tool, which will read the XML file and iterate through each server entry to perform the deployments
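For illustration only, a per-environment file might look something like this (every name and path here is hypothetical):
<Environment Name="QA">
  <Server Name="QAWEB01">
    <DestinationPath>\\QAWEB01\d$\Apps\MyIntranetApp</DestinationPath>
    <WindowsService>MyIntranetAppService</WindowsService>
    <ScheduledTask>MyIntranetAppNightlyCleanup</ScheduledTask>
  </Server>
  <Server Name="QAWEB02">
    <DestinationPath>\\QAWEB02\d$\Apps\MyIntranetApp</DestinationPath>
    <WindowsService>MyIntranetAppService</WindowsService>
  </Server>
</Environment>
The deployment code then loads the file for the chosen environment and loops over each Server entry.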
I'm more than happy to help with more specific examples and assistance; I look at things a bit differently than most people, and that helps when it comes to release management.

Symbol server vs. including pdb files in package

What are the benefits of using a symbol server instead of simply including the pdb files in the nuget package?
The main advantage is that the package size is much smaller, so your restores, disk utilization, and potentially deployments are smaller by default.
There is also automatic matching of the PDBs to your DLLs if you deploy an application without the PDBs, whereas otherwise you would have to match them manually.
For example:
The package developer creates a package with pdb files in it.
The app developer can debug with the pdbs in the package. So far so good.
When the app developer deploys the app, he omits the pdbs (because they are large and not necessary).
Several versions of the app have been deployed.
Now the app developer (or another person using the app) hits a problem in production or on a client machine.
By adding the symbol server URL to Visual Studio, the symbols are resolved automatically, and the app developer does not have to bring over the right set of PDBs.
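For reference, on the package-author side the usual way to feed a symbol server today is to produce a separate symbols package rather than putting PDBs in the main package. With SDK-style projects that comes down to roughly these pack properties (a sketch, assuming dotnet pack and a symbol server that accepts .snupkg):
<PropertyGroup>
  <!-- Emit a separate .snupkg containing the PDBs alongside the main .nupkg -->
  <IncludeSymbols>true</IncludeSymbols>
  <SymbolPackageFormat>snupkg</SymbolPackageFormat>
</PropertyGroup>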

Prevent user from changing ClickOnce application files after installation

I'm developing a WPF application that I deploy with ClickOnce to a network share on the intranet from which clients can install it.
I need to make sure that the user can't modify any of the application files (especially DLLs and the main executable) on their machine. That is, if any of the application files have changed, the application should refuse to run. I was under the impression that, when using ClickOnce, this was available out of the box and that the application would refuse to start if the file hashes didn't match the manifest.
However, I tried to manually replace the executable or a DLL with a slightly different version after installation and the application still ran fine (executing the modified code).
Does ClickOnce provide what I'm looking for?
How can I enable the functionality?
I'm using a level 2 StartSSL code-signing certificate to sign the application manifest if this matters.
P.S.: just to be sure: I'm talking about the installed application files, not the installation files.
You can sign AND strong-name each one of your DLLs to prevent tampering, but doing so has its own pain points when it comes to upgrades and distribution in general. Note that even doing so doesn't entirely prevent someone from injecting code into your running process. It's a sticky subject.
I recommend going through this question, which already discusses these points in detail: Does code-signing without strong-naming leave your app open to abuse?
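For completeness, strong naming is normally switched on per project with properties like these (the key file name here is hypothetical):
<PropertyGroup>
  <!-- Strong-name the assembly with the given key file -->
  <SignAssembly>true</SignAssembly>
  <AssemblyOriginatorKeyFile>MyCompany.snk</AssemblyOriginatorKeyFile>
</PropertyGroup>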
I think it will be a fairly manual process.
It doesn't look like the VS2013 deployment tools handle code obfuscation, but they do support signing and app permissions. Start with that; then you might have to take the generated manifest as a starting point to build your own with obfuscated assemblies.
MS docs break it into 3 steps: 1. obfuscate, 2. build manifest, 3. manually publish
Here is what MS docs say...
Securing ClickOnce Applications
Deploying Obfuscated Assemblies
You might want to obfuscate your application by using Dotfuscator to prevent others from reverse engineering the code. However, assembly obfuscation is not integrated into the Visual Studio IDE or the ClickOnce deployment process. Therefore, you will have to perform the obfuscation outside of the deployment process, perhaps using a post-build step. After you build the project, you would perform the following steps manually, outside of Visual Studio:
Perform the obfuscation by using Dotfuscator.
Use Mage.exe or MageUI.exe to generate the ClickOnce manifests and sign them. For more information, see Mage.exe (Manifest Generation and Editing Tool) and MageUI.exe (Manifest Generation and Editing Tool, Graphical Client).
Manually publish (copy) the files to your deployment source location (Web server, UNC share, or CD-ROM).

How to version control the build tools and libraries?

What are the recommendations for including your compiler, libraries, and other tools in your source control system itself?
In the past, I've run into issues where, although we had all the source code, building an old version of the product was an exercise in scurrying around trying to get the exact correct configuration of Visual Studio, InstallShield and other tools (including the correct patch version) used to build the product. On my next project, I'd like to avoid this by checking these build tools into source control, and then build using them. This would also simplify things in terms of setting up a new build machine -- 1) install our source control tool, 2) point at the right branch, and 3) build -- that's it.
Options I've considered include:
Copying the install CD ISO to source control - although this provides the backup we need if we have to go back to an older version, it isn't a good option for "live" use (each build would need to start with an install step, which could easily turn a 1 hour build into 3 hours).
Installing the software to source control. ClearCase maps your branch to a drive letter; we could install the software under this drive. This doesn't take into account the non-file parts of installing your tools, like registry settings.
Installing all the software and setting up the build process inside a virtual machine, storing the virtual machine in source control, and figuring out how to get the VM to do a build on boot. While we capture the state of the "build machine" with ease, we get the overhead of a VM, and it doesn't help with the "make the same tools available to developers" issue.
It seems such a basic idea of configuration management, but I've been unable to track down any resources for how to do this. What are the suggestions?
I think the VM is your best solution. We always used dedicated build machines to get consistency. In the old COM DLL Hell days, there were dependencies (COMCAT.DLL, anyone?) on non-development software being installed (Office). Your first two options don't solve anything involving shared COM components. If you don't have any shared-component issues, maybe they will work.
There is no reason the developers couldn't take a copy of the same VM to be able to debug in a clean environment. Your issues would be more complex if there are a lot of physical layers in your architecture, like mail server, database server, etc.
This is something that is very specific to your environment. That's why you won't see a guide to handle all situations. All the different shops I've worked for have handled this differently. I can only give you my opinion on what I think has worked best for me.
Put everything needed to build the application on a new workstation under source control.
Keep large applications out of source control, stuff like IDEs, SDKs, and database engines. Keep these in a directory as ISO files.
Maintain a text document, with the source code, that has a list of the ISO files that will be needed to build the app.
I would definitely consider the legal/licensing issues surrounding the idea. Would it be permissible according to the various licenses of your toolchain?
Have you considered ghosting a fresh development machine that is able to build the release, if you don't like the idea of a VM image? Of course, keeping that ghosted image running as hardware changes might be more trouble than it's worth...
Just a note on the versioning of libraries in your version control system:
it is a good solution, but it implies packaging (i.e. reducing the number of files of that library to a minimum)
it does not solve the 'configuration aspect' (that is, "what specific set of libraries does my '3.2' project need?").
Do not forget that the set will evolve with each new version of your project. UCM and its 'composite baseline' might give the beginning of an answer for that.
The packaging aspect (minimum number of files) is important because:
you do not want to access your libraries through the network (like through a dynamic view), because the compilation times are much longer than when you use locally accessed library files.
you do want to get those libraries onto your disk, meaning a snapshot view, meaning downloading those files... and this is where you will appreciate the packaging of your libraries: the fewer files you have to download, the better off you are ;)
My organisation has a "read-only" filesystem, where everything is put into releases and versions. Releaselinks (essentially symlinks) point to the version being used by your project. When a new version comes along it is just added to the filesystem and you can swing your symlink to it. There is full audit history of the symlinks, and you can create new symlinks for different versions.
This approach works great on Linux, but it doesn't work so well for Windows apps that tend to like to use things local to the machine such as the registry to store things like configuration.
Are you using a build/continuous integration (CI) tool like NAnt to do your builds?
As a .Net example, you can specify specific frameworks for each build.
Perhaps the popular CI tool for whatever you're developing in has options that will allow you to avoid storing several IDEs in your version control system.
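As a very rough sketch of what that can look like in NAnt (the framework value, file names, and paths are assumptions, and the exact task attributes may vary between NAnt versions):
<project name="MyApp" default="build">
  <!-- Pin the target framework for this build instead of relying on the machine default -->
  <property name="nant.settings.currentframework" value="net-2.0" />
  <target name="build">
    <csc target="library" output="build/MyApp.dll">
      <sources>
        <include name="src/**/*.cs" />
      </sources>
      <references>
        <include name="lib/*.dll" />
      </references>
    </csc>
  </target>
</project>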
In many cases, you can force your build to use compilers and libraries checked into your source control rather than relying on global machine settings that won't be repeatable in the future. For example, with the C# compiler, you can use the /nostdlib switch and manually /reference all libraries to point to versions checked in to source control. And of course check the compilers themselves into source control as well.
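A sketch of what that can look like in an MSBuild project file (the relative paths to the checked-in compiler and framework assemblies are hypothetical):
<PropertyGroup>
  <!-- /nostdlib: do not implicitly reference the machine-installed mscorlib -->
  <NoStdLib>true</NoStdLib>
  <!-- Use the csc.exe checked into source control instead of the machine-wide one -->
  <CscToolPath>..\third_party\compilers\csc</CscToolPath>
</PropertyGroup>
<ItemGroup>
  <!-- Explicitly reference the framework assemblies that are checked in alongside the code -->
  <Reference Include="mscorlib">
    <HintPath>..\third_party\clr\mscorlib.dll</HintPath>
  </Reference>
  <Reference Include="System">
    <HintPath>..\third_party\clr\System.dll</HintPath>
  </Reference>
</ItemGroup>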
Following up on my own question, I came across this posting referenced in the answer to another question. Although it is more of a discussion of the issue than an answer, it does mention the VM idea.
As for "figuring out how to build on boot": I've developed using a build farm system custom-created very quickly by one sysadmin and one developer. Build slaves query a taskmaster for suitable queued build requests. It's pretty nice.
A request is 'suitable' for a slave if its toolchain requirements match the toolchain versions on the slave - including what OS, since the product is multi-platform and a build can include automated tests. Normally this is "the current state of the art", but doesn't have to be.
When a slave is ready to build, it just starts polling the taskmaster, telling it what it's got installed. It doesn't have to know in advance what it's expected to build. It fetches a build request, which tells it to check certain tags out of SVN, then run a script from one of those tags to take it from there. Developers don't have to know how many build slaves are available, what they're called, or whether they're busy, just how to add a request to the build queue. The build queue itself is a fairly simple web app. All very modular.
Slaves needn't be VMs, but usually are. The number of slaves (and the physical machines they're running on) can be scaled to satisfy demand. Slaves can obviously be added to the system at any time, or nuked if the toolchain crashes. That's actually the main point of this scheme, rather than your problem with archiving the state of the toolchain, but I think it's applicable.
Depending how often you need an old toolchain, you might want the build queue to be capable of starting VMs as needed, since otherwise someone who wants to recreate an old build has to also arrange for a suitable slave to appear. Not that this is necessarily difficult - it might just be a question of starting the right VM on a machine of their choosing.