I'm working on a Service Fabric application that is deployed to Azure. It currently consists of only 5 stateless services. The zipped archive weighs in at ~200MB, which is already becoming problematic.
By inspecting the contents of the archive, I can see the primary problem is that many files are required by all services. An exact duplicate of those files is therefore present in each service's folder. However, the zip compression format does not do anything clever with respect to duplicate files within the archive.
As an experiment, I wrote a little script to find all duplicate files in the deployment and delete all but one copy of each. Then I tried zipping the result, and it comes in at a much more practical 38MB.
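For illustration, a de-duplication pass like this can be sketched in a few lines of PowerShell (the .\pkg folder name and the choice to keep the first copy in each group are assumptions, not my exact script):
# Rough sketch: hash every file in the unzipped package, group by hash, keep only the first copy of each group.
Get-ChildItem -Path .\pkg -Recurse -File |
    Get-FileHash -Algorithm SHA256 |
    Group-Object -Property Hash |
    Where-Object { $_.Count -gt 1 } |
    ForEach-Object { $_.Group | Select-Object -Skip 1 | ForEach-Object { Remove-Item -Path $_.Path } }
# Re-zip the de-duplicated folder to compare sizes.
Compress-Archive -Path .\pkg\* -DestinationPath .\pkg-dedup.zip -Force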
I also noticed that system libraries are bundled, including:
System.Private.CoreLib.dll (12MB)
System.Private.Xml.dll (8MB)
coreclr.dll (5MB)
These are all big files, so I'd be interested to know if there is a way to bundle them only once. I've tried removing them altogether, but then Service Fabric fails to start the application.
Can anyone offer any advice as to how I can drastically reduce my deployment package size?
NOTE: I've already read the docs on compressing packages, but I am very confused as to why their compression method would help. Indeed, I tried it and it didn't. All it does is zip each subfolder inside the primary zip; there is no de-duplication of files involved.
There is a way to reduce the size of the package, but I would say it isn't a good way or the way things should be done. Still, I think it can be of use in some cases.
Please note: This approach requires target machines to have all prerequisites installed (including .NET Core Runtime etc.)
When building a .NET Core app there are two deployment models: self-contained and framework-dependent.
In self-contained mode, all required framework binaries are published alongside the application binaries, while in framework-dependent mode only the application binaries are published.
By default, if the project has a runtime specified - <RuntimeIdentifier>win7-x64</RuntimeIdentifier> in the .csproj - then the publish operation is self-contained. That is why each of your services copies all the framework files.
To turn this off, simply add the SelfContained=false property to every service project you have.
Here is an example from a new .NET Core stateless service project:
<PropertyGroup>
<TargetFramework>netcoreapp2.2</TargetFramework>
<AspNetCoreHostingModel>InProcess</AspNetCoreHostingModel>
<IsServiceFabricServiceProject>True</IsServiceFabricServiceProject>
<ServerGarbageCollection>True</ServerGarbageCollection>
<RuntimeIdentifier>win7-x64</RuntimeIdentifier>
<TargetLatestRuntimePatch>False</TargetLatestRuntimePatch>
<SelfContained>false</SelfContained>
</PropertyGroup>
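If you publish from the command line rather than from Visual Studio, the same switch can be passed to dotnet publish (a sketch - adjust the configuration and runtime identifier to match your projects):
dotnet publish -c Release -r win7-x64 --self-contained false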
I did a small test and created a new Service Fabric application with five services. The uncompressed package size in Debug was around ~500 MB. After I modified all the projects, the package size dropped to ~30 MB.
The deployed application worked well on the local cluster, which demonstrates that this approach is a workable way to reduce package size.
To finish, I will highlight the warning one more time:
Please note: This approach requires target machines to have all prerequisites installed (including .NET Core Runtime etc.)
You usually don't want to know which node runs which service and you want to deploy service versions independently of each other, so sharing binaries between otherwise independent services creates a very unnatural run-time dependency. I'd advise against that, except for platform binaries like AspNet and DotNet of course.
However, have you read about creating differential packages? https://learn.microsoft.com/en-us/azure/service-fabric/service-fabric-application-upgrade-advanced#upgrade-with-a-diff-package That would reduce the size of upgrade packages after the initial 200 MB hit.
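For what it's worth, building the diff package is mostly a matter of copying the full package and then deleting the code/config/data packages whose versions haven't changed, while keeping every manifest. A rough PowerShell sketch (the folder names and the list of unchanged packages are assumptions):
# Rough sketch: derive a diff package from a full application package.
# Assumes the full package sits in .\FullPkg and that Service1's code package is unchanged.
$unchanged = @('Service1Pkg\Code')
Copy-Item -Path .\FullPkg -Destination .\DiffPkg -Recurse
foreach ($rel in $unchanged) {
    # Remove unchanged packages; the manifests must stay so the versions still line up.
    Remove-Item -Path (Join-Path .\DiffPkg $rel) -Recurse -Force
}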
Here's another option:
https://devblogs.microsoft.com/dotnet/app-trimming-in-net-5/
<SelfContained>True</SelfContained>
<PublishTrimmed>True</PublishTrimmed>
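The same properties can also be applied at publish time without editing every project - a sketch, using the standard MSBuild property switches:
dotnet publish -c Release -r win-x64 /p:SelfContained=true /p:PublishTrimmed=true
If trimming strips something your code only reaches via reflection, assemblies can be explicitly preserved with TrimmerRootAssembly items, though that is extra per-project bookkeeping.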
From a quick test just now, trimming one app reduced the package size from ~110 MB to ~70 MB (compared to ~25 MB with SelfContained=false).
The trimming process took several minutes for a single application though, and the project I work on has 10-20 apps per Service Fabric project. I also suspect that trimming isn't safe when your code relies heavily on dependency injection, since the trimmer can't see types that are only resolved via reflection.
For debug builds we do use SelfContained=false, because developers will have the required runtimes on their machines - but not for release deployments.
As a final note, since the OP mentioned file upload being a particular bottleneck:
A large proportion of the deployment time is just zipping and uploading the package
I noticed recently that we were using the deprecated Publish Build Artifacts task when uploading artifacts during our build pipeline. It was taking 20 minutes to upload 2GB of files. I switched over to the suggested Publish Pipeline Artifact task and it took our publish step down to 10-20 seconds. From what I can tell, this newer task uses all kinds of tricks under the hood to speed up uploads (and downloads), including file deduplication. I suspect that zipping up build artifacts yourself at that point would actually hurt your upload times.
In setting up Sitecore 7.2 at my organization for our public-facing .com site, I have run into a hiccup while trying to implement proper CI, Release Management, and Deployment Management. I am able to, using MSBuild, compile my Sitecore MVC code, compile .update packages from TDS, and package each of these in .nupkg files for Octopus Deploy.
What I am running into is that once I have deployed the MVC code, I must also deploy the Sitecore Structure/Content, which requires me to install the .update packages. I have tried the solution provided at https://github.com/adoprog/Sitecore-Deployment-Helpers but, for a fairly lightweight site, this is timing out around 20 minutes within Octopus Deploy for only my System package, let alone Structure or Content.
I am looking for a way to install these packages, preferably through PowerShell (not necessarily the Sitecore PowerShell Extensions that are built into the Sitecore web interface after installing that package). Using SPE would be acceptable if, and only if, I can use SPE's cmdlets from Octopus Deploy's PowerShell workflow.
Please Advise.
Jason Bert has a great series of blogs on using Octopus Deploy with TeamCity and TDS for deploying to Sitecore instances:
http://www.jasonbert.com/2013/11/03/continuous-integration-deployment-with-sitecore/
You can also use TDS itself to deploy the items in the solution, but this uses direct calls to a web service on the target Sitecore instance, which may not meet your requirements.
Also, are you deploying the entire System tree? 20 minutes to deploy changes made to the System tree seems unusual, unless you've made a LOT of changes in there (for example, the Dictionary). Even then, you shouldn't be source-controlling author content, only the elements crucial to the solution that are owned by development.
You can install the update package via the Sitecore utility at /sitecore/admin/UpdateInstallationWizard.aspx
If you find that installing the package this way takes a lot of time, you might want to modify the Deployment Property Manager settings for the TDS project.
You can do this by right clicking your TDS project in Visual Studio and selecting "Deployment Property Manager".
Once the Deployment Property Manager window opens up, set the Deploy property to Once for every node which does not need to be updated. For any items which are to be updated, mark them as Always.
This will drastically reduce the time required to install the package.
I have set up a small test Nuget private repository on my machine following this guide.
Everything is working perfectly and I can publish packages, update versions, download them etc. The only problem is that the DownloadCount of my packages is always 0 regardless of how many times I download it.
I downloaded the NuGet source but could not find a place where this value is updated. Moreover, NuGet does not seem to use any database technology, so the feed is probably just generated on demand from the contents of the Packages folder.
Does anyone have any idea if this is a known issue or if it's a problem in my setup or if I should just add some code to the server to record downloads myself?
Thanks!
NuGet.Server based web sites are simply a front-end exposing an OData feed on top of a file share. There's no real database behind it, no indexing, no auditing, tracing, metrics or statistics, or any of that kind of stuff.
You could build it yourself, or take a look at alternatives such as MyGet, ProGet, Artifactory, etc.
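If all you need is a rough count, one low-tech option is to mine the IIS request logs for the package download URLs instead of modifying NuGet.Server itself. A PowerShell sketch - the log folder and the /api/v2/package/{id}/{version} URL shape are assumptions about a default install:
# Rough sketch: count package downloads from the IIS W3C logs.
# cs-uri-stem (the requested URL) is usually the 5th space-separated field in the default log format.
Get-ChildItem 'C:\inetpub\logs\LogFiles\W3SVC1\*.log' |
    Get-Content |
    Where-Object { $_ -match '/api/v2/package/' } |
    ForEach-Object { ($_ -split ' ')[4] } |
    Group-Object |
    Sort-Object Count -Descending |
    Format-Table Count, Name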
I've been using Inno Setup for deploying and registering a DLL, but all the setups generated with Inno Setup have a minimum size of about 500 KB, while my DLL is only around 40 KB.
I don't want to use a packer such as UPX because I don't like the way they work.
Is there another free app to create smaller setups for deploying dlls?
Try the Nullsoft installer (NSIS). Its feature list claims an overhead of only 34 KB.
You can also write some WiX scripts to minimize the features included in your installer and bring down its size, while still using the MSI engine. (As a side note, you can also use ClickThrough to have your DLL update automatically.)
What are the recommendations for including your compiler, libraries, and other tools in your source control system itself?
In the past, I've run into issues where, although we had all the source code, building an old version of the product was an exercise in scurrying around trying to get the exact correct configuration of Visual Studio, InstallShield and other tools (including the correct patch version) used to build the product. On my next project, I'd like to avoid this by checking these build tools into source control, and then build using them. This would also simplify things in terms of setting up a new build machine -- 1) install our source control tool, 2) point at the right branch, and 3) build -- that's it.
Options I've considered include:
Copying the install CD ISO to source control - although this provides the backup we need if we have to go back to an older version, it isn't a good option for "live" use (each build would need to start with an install step, which could easily turn a 1 hour build into 3 hours).
Installing the software to source control. ClearCase maps your branch to a drive letter; we could install the software under this drive. This doesn't take into account the non-file parts of installing your tools, like registry settings.
Installing all the software and setting up the build process inside a virtual machine, storing the virtual machine in source control, and figuring out how to get the VM to do a build on boot. While we capture the state of the "build machine" with ease, we get the overhead of a VM, and it doesn't help with the "make the same tools available to developers" issue.
It seems such a basic idea of configuration management, but I've been unable to track down any resources for how to do this. What are the suggestions?
I think the VM is your best solution. We always used dedicated build machines to get consistency. In the old COM DLL Hell days, there were dependencies (COMCAT.DLL, anyone?) on non-development software being installed (Office). Your first two options don't solve anything involving shared COM components. If you don't have any shared-component issues, maybe they will work.
There is no reason the developers couldn't take a copy of the same VM to be able to debug in a clean environment. Your issues would be more complex if there are a lot of physical layers in your architecture, like mail server, database server, etc.
This is something that is very specific to your environment. That's why you won't see a guide that handles all situations. All the different shops I've worked for have handled this differently. I can only give you my opinion on what I think has worked best for me:
Put everything needed to build the application on a new workstation under source control.
Keep large applications out of source control - stuff like IDEs, SDKs, and database engines. Keep these in a directory as ISO files.
Maintain a text document, with the source code, that lists the ISO files that will be needed to build the app.
I would definitely consider the legal/licensing issues surrounding the idea. Would it be permissible according to the various licenses of your toolchain?
Have you considered ghosting a fresh development machine that is able to build the release, if you don't like the idea of a VM image? Of course, keeping that ghosted image running as hardware changes might be more trouble than it's worth...
Just a note on the versioning of libraries in your version control system:
it is a good solution, but it implies packaging (i.e. reducing the number of files in that library to a minimum)
it does not solve the 'configuration' aspect (that is, "what specific set of libraries does my '3.2' project need?").
Do not forget that this set will evolve with each new version of your project. UCM and its 'composite baselines' might give the beginning of an answer to that.
The packaging aspect (minimum number of files) is important because:
you do not want to access your libraries over the network (as with a dynamic view), because compilation times are much longer than when you use locally accessible library files.
you do want to get those libraries onto your disk, meaning a snapshot view, meaning downloading those files... and this is where you will appreciate having packaged your libraries: the fewer files you have to download, the better off you are ;)
My organisation has a "read-only" filesystem, where everything is put into releases and versions. Releaselinks (essentially symlinks) point to the version being used by your project. When a new version comes along it is just added to the filesystem and you can swing your symlink to it. There is full audit history of the symlinks, and you can create new symlinks for different versions.
This approach works great on Linux, but it doesn't work so well for Windows apps that tend to like to use things local to the machine such as the registry to store things like configuration.
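On Windows you can get part of the way there with directory symbolic links, though the registry point still stands. A tiny sketch (the paths are made up, and creating symlinks may require an elevated prompt):
# Point a 'current' link at a tool version; swing it when a new version is added.
New-Item -ItemType SymbolicLink -Path 'D:\tools\compiler\current' -Target 'D:\tools\compiler\3.2.1'
# Later, swing the link to a newer version:
(Get-Item 'D:\tools\compiler\current').Delete()   # deletes only the link, not its target
New-Item -ItemType SymbolicLink -Path 'D:\tools\compiler\current' -Target 'D:\tools\compiler\3.3.0'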
Are you using a build or continuous integration (CI) tool like NAnt to do your builds?
As a .Net example, you can specify specific frameworks for each build.
Perhaps the popular CI tool for whatever you're developing in has options that will allow you to avoid storing several IDEs in your version control system.
In many cases, you can force your build to use compilers and libraries checked into your source control rather than relying on global machine settings that won't be repeatable in the future. For example, with the C# compiler, you can use the /nostdlib switch and manually /reference all libraries to point to versions checked in to source control. And of course check the compilers themselves into source control as well.
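As a rough illustration of that approach with the C# compiler (the paths are invented - the point is that both the compiler and every referenced assembly come from your own tree):
# Rough sketch: build against only the checked-in compiler and libraries (paths are invented).
& .\tools\csc\csc.exe /nostdlib /noconfig `
    /reference:.\libs\mscorlib.dll `
    /reference:.\libs\System.dll `
    /out:.\build\MyApp.exe `
    .\src\*.cs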
Following up on my own question, I came across this posting referenced in the answer to another question. Although it is more of a discussion of the issue than an answer, it does mention the VM idea.
As for "figuring out how to build on boot": I've developed using a build farm system custom-created very quickly by one sysadmin and one developer. Build slaves query a taskmaster for suitable queued build requests. It's pretty nice.
A request is 'suitable' for a slave if its toolchain requirements match the toolchain versions on the slave - including what OS, since the product is multi-platform and a build can include automated tests. Normally this is "the current state of the art", but doesn't have to be.
When a slave is ready to build, it just starts polling the taskmaster, telling it what it's got installed. It doesn't have to know in advance what it's expected to build. It fetches a build request, which tells it to check certain tags out of SVN, then run a script from one of those tags to take it from there. Developers don't have to know how many build slaves are available, what they're called, or whether they're busy, just how to add a request to the build queue. The build queue itself is a fairly simple web app. All very modular.
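The slave side of such a scheme can be quite small. A very rough sketch of the polling loop - the taskmaster URL, the JSON fields, and the build script name are all invented for illustration:
# Very rough sketch of a build slave's polling loop (URL, fields and paths are invented).
while ($true) {
    # Tell the taskmaster what this slave has installed and ask for a matching job.
    $job = Invoke-RestMethod -Uri 'http://taskmaster/next-job?toolchain=vs2019&os=win10'
    if ($job) {
        # Check out the requested tag and hand control to the script stored alongside it.
        svn checkout "$($job.repoUrl)/tags/$($job.tag)" "C:\work\$($job.id)"
        & "C:\work\$($job.id)\build\run-build.ps1"
    }
    Start-Sleep -Seconds 30
}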
Slaves needn't be VMs, but usually are. The number of slaves (and the physical machines they're running on) can be scaled to satisfy demand. Slaves can obviously be added to the system at any time, or nuked if the toolchain crashes. That's actually the main point of this scheme, rather than your problem with archiving the state of the toolchain, but I think it's applicable.
Depending how often you need an old toolchain, you might want the build queue to be capable of starting VMs as needed, since otherwise someone who wants to recreate an old build has to also arrange for a suitable slave to appear. Not that this is necessarily difficult - it might just be a question of starting the right VM on a machine of their choosing.