How to make webpack generate consistent hashes across multiple builds - deployment

Background
In a Webpack configuration, you can specify the naming convention for emitted files as in [name]-[hash].js. I use this in combination with the html-webpack-plugin to generate .html.erb partials for use in a Rails app to include correct assets on deployment. Every Webpack build produces a unique fingerprint in filenames, which works great ... except for when you scale your app to multiple servers, where Webpack is part of the build process (a fresh new build for each server). Rails does a similar fingerprinting of precompiled assets.
github.css
** becomes **
github-448c90f2e2f181cd43b943786ee6f.css
Problem
Because the app is scaled to multiple servers behind a load balancer (using Elastic Beanstalk), the builds must be exactly the same on each deploy. As Webpack generates a unique hash per build, we get 404s on page loads, as the generated assets are not in sync.
Question
Has anyone figured out how to get the same hash across multiple builds? Possibly based on the git commit hash? That's what I'm thinking, but lots of searching has yielded no results. Not above building it myself.

I had the same problem as Kyle: Using Elastic Beanstalk with multiple servers, each server using Webpack generates a different hash.
First, I tried using [contenthash]. I thought this would work because, unlike [hash], it is based on the content of a file. It didn't work. My suspicion is that each server is using a different salt.
I think you could resolve this by specifying the salt with output.hashSalt, however I have not tested this, as I have since eliminated the need to use a hash in the filename.

The hashes are deterministic and as long as the content of the included files is the same, the hash will be the same as well. This also includes dependencies, so it's important to have the exact same dependencies.
Yarn uses a yarn.lock file to guarantee that the dependencies installed are identical on every install, this makes it very simple to have the exact same build every time on every machine. With npm you can use npm shrinkwrap to lock down the versions of the dependencies, but this is usually quite tedious to manage (one of the reasons Yarn was created and why it uses a lockfile).
You might also want to read Guides - Caching of the offical docs.

Related

Reusing swift package binaries across project configurations (to save disk space)?

I have an iOS project with multiple (around 10) configurations - apart from Debug and Release, there are more configurations which only differ in adding more compilation conditions (like 'simulate free user', 'simulate paid user', 'update more often').
Unfortunately, this causes the swift packages to be rebuilt for each configuration, and as eg. Realm's derived data are about 3 GB, this takes up a lot of space (and build time when rebuilding a configuration I have't needed for a while).
Is there a way to tell Xcode to reuse packages for DEBUG configuration for all other configurations containing DEBUG (eg. DEBUG FASTUPDATES)?
I think swift packages should not be getting my compilation conditions anyway, is that right?
Well... it's hard for Xcode to detect that two build configurations, with different build settings, would result in the exact same binaries.
Having different user-defined settings values, as you have, can easily lead to different builds in any of the packages, and Xcode can't know that unless it goes through every package and checks if the diff between the build settings results if a diff between the compiled binaries, which would be even slower than simply rebuilding the files.
Cocoapods had support for this, i.e. you could specify which build configuration from the Pods project matched the one from your project. You could try doing a similar setup, however, this would require a non-trivial amount of manual work.
The other alternative would be to extract those user-defined build settings into other forms: user defaults, settings pane, etc. But this also depends on the architecture of your project, and can also be time-consuming/risky.

How to create VCS-like conflict-merging file?

I'm trying to generate "unresolved-conflict"-like files with no luck.
I checked diff manpage and googled about diff, merging etc... but I only found information about how to handle these files, but not about how to actually generate them.
To be clear, what I am trying to do is, having two similar files, generate single automatically merged one similar to that most VCS systems like Git or Subversion generate over files in "conflict" status.
The main goal is be able to rapidly edit it to manually resolve all differences just as I do in Git or Subversion but without having them in any VCS system.
I "almost" successfully generated full diffs with diff -C 1000000command (because I won't have too large files that context limit is pretty acceptable).
...but resulting file comes with ALL rows modified. That is: prepended by "-" or "+" (depending of if it comes from first or second file) or " " (space) for common rows.
What I would obtain is an "almost unchanged" file with sections like following example emphasizing differences:
<<<<<<<< File1
Section from File1
Foo
========
Section from File2
Bar
>>>>>>>> File2
EDIT:
Answering #s.m. comment, I explain here what is my exact goal (because it is too long to explain in a comment):
I'm working on a server to allocate multiple PostgreSQL clusters acting as hot-stanby of distinct masters.
I already successfully implemented binary full/incremental backups (bacula) over production servers and also have a helper script to configure hot-standby servers.
But nowadays we have to setup (and mantain -and ideally periodically check-) all of them one by one.
To make it simpler, we are planning to create single (or possibly multiple) "Super"-hot-standby server(s) containing multiple clusters replicating different master servers.
My goal is to have a single script to create new standby cluster easily without too complicated tunning and not having to bother about backup setup (because all clusters will be backed up at once).
I almost successfully implemented that script: It creates a new cluster in a free port, adjust needed configuration parameters and put it in sync with master.
These adjustmens are made over "default" configuration files but some masters may have special configuration parameters (specially memory adjustments) that must be replicated in standby because, otherwise, it could be unable to replicate some operations of the master). And there is too the pg_haba.conf which defines which users/servers are allowed to connect to, which we also want to replicate on standby (for an eventual failover).
So, to make it easier (and less error prone) to merge both configuration files, I implemented a bash function to retrieve configuration files from masters and, now, my goal is to merge it with forementioned "default-tuned" one.
This way, adding new standby would be as easy as executing our script providing master's network name and reviewing automatically merged files to manually solve the few differences encountered in merge.
EDIT 2:
To be clear, what I were trying to do in preference order is:
Approach it by just using GNU Diff (like #s.m. pointed in his comment) even by using complex arguments or piping to external tools usually available on most unix* systems so I can wrap it in a bash function and use it in my script without no dependencies.
Use some existing tool (but not reinvent the wheel).
Implement my own tool and use it.
Without better solution, I finally tried to implement my own tool (which I called 'humandiff') to approach it.
I published it in Github and uploaded as npm package so I can now install it from npm in producion servers.
Even thought it needs a little setup to be installed. That is:
Install NodeJS and NPM (sudo apt-get install nodejs-legacy npm in debian-like systems).
Install humandiff itself (sudo npm install -g humandiff).
Usage and output examples can be found in README file so I do'nt extend myself anymore.
I post this answer just in case someone happens to have the same problem but, anyway, better solutions would be welcomed too.
Edit: I missed to say, even it's pretty obvious, that in fact I didn't implement any diff algorithm at all. I just noticed that having position and offset metatata provided by GNU diff and one of the original files is possible to construct the other or that merged file a were searching for so I simply implemented a wrapper to de so. But, instead of calling GNU Diff binary, I found an also named "diff" module in the NPM repository that served to me for the same task.

Flyway Oracle Deployment

I just started working on a new project. We are building a new application from scratch. Team started with a brand new schema. I wanted to automate the database build process, so I started looking for the options. Flyway seems to be a good one. I have been playing around a bit and found some limitations of the tool. Perhaps, someone will be able to help.
We have the following directory structure for SQL files:
SQL
-- DDL
-- DML
-- PACKAGES
We are doing agile development, so file names are based on the sprint number. The file naming convention we are using is:
Sprint#_script#_userstory#_description
For example:
S1_01_US123_CreateNewTable.sql
S1_02_US123_AddConstraint.sql
Next sprint:
S2_01_US456_AddColumn.sql
And so on...
I setup the JDBC parameter and I am able to connect. I tested basic things like: clean, repair, info and migrate with couple of test scripts and that worked like a charm. I started to run into issues when I tried deploying all the scripts. Issues like:
- It didn't like single underscore.
- It didn't like the file names starting with S1_01_*, rest of the file name is different and they are in different folders.
I have the following questions:
Can I build using Flyway without having to rename the files?
How can I get it to deploy in this order:
DDLs
DMLs
Packages (everytime I deploy). And we have a separate header and body files, so deploy header first as well.
Can I change the structure of schema_version table?
Can I do selective clean? Like flag some of the objects to not to be dropped?
My main concern is running DDLs before everything else. If I can accomplish that, then I can start using Flyway and learn as I go.
Thanks in advance.
Harbinder
Can I build using Flyway without having to rename the files?
Maybe. Experiment with the flyway.sqlMigrationSeparator property. Try "_US" which will break after the script number. You'll also need to set flyway.sqlMigrationPrefix=S.
How can I get it to deploy in this order: DDLs, DMLs, Packages (everytime I deploy). And we have a separate header and body files, so deploy header first as well.
Specify multiple locations (separated with comma) and ensure the version numbering ordering makes sense as if these files were all in the same directory. If running from the command line, turn on debug with -X to see how flyway collects the migrations.
Additionally, if possible, you should consider renaming your packages as a Repeatable migration (default:R) so that you just need to change the contents of the file for flyway to pick it up.
Can I change the structure of schema_version table?
No. This is managed by flyway.
Can I do selective clean? Like flag some of the objects to not to be dropped?
No. In this situation it might be best to set flyway.cleanDisabled=true to stop accidental mistakes. There are callbacks before and after clean if you wish to do extra cleaning but I don't think you can restrict clean itself without delving into the code.
Good luck!

Using ClickOnce for multiple deployment configurations

I have a ClickOnce deployment that has different web service endpoints and strings that need to be changed in Settings.Settings. Right now I am only having to deal with on localized development version being done in house and one version that i push out to the customer for their UAT. Now i need 4 versions of this application. in house dev and testing, customer testing and production. I also need these 4 deployments to be able to be installed along side each other. I have discovered that i can change the name (i.e. APP -- INTERNAL -- TEST, APP -- INTERNAL -- DEV, APP -- CUST -- TEST, APP -- CUST -- PROD) and that will allow them all to be installed alongside each other. But, having to remember every place a string needs changed in the various settings.setting of each build, swapping the end points, changing the application names, changing the certificate, changing the deploy addreess and the url for each different build is time consuming and cumbersome. Is there a way to just say "Publish internal test build" and have it do the right thing? I was going to just write various mage scripts but I dont thing that gets me around having to mess with the settings.settings stuff. i didnt write this application nor maintain it but I suppose i could go in and use some sort of conditional logic, but the connections strings for instance are wired to reports and table adapter etc... P.S. I hate ClickOnce
Ok, for a useful answer and not a critique of my writing style. mage.exe is severly lacking in options on what it can an cannot do, it is also poorly documented and does not work as advertised. In order to accomplish what I wanted, I had to download sed for windows and write .bat files to manually rename files to .deploy. I used sed to edit the manifest files and flip options on and off and keep track of the different deployments. So in short write a batch file using mage.exe and sed and have a very good understanding of the contents of a manifest file. Feel free to contact me and I can send scripts that will automate multiple ClickOnce deployments, add the .deploy extension, require a specific version number before start up etc... none of these are possible using the tools MSFT provides.

Storing third-party framework/middleware into source control that needs to alter your compiler/IDE

I know there are posts that ask how one stores third-party libraries into source control (such as this and this). While those are great answers, I still can't find the answer to this:
How do you store third-party middleware/frameworks binaries that need to alter your compiler / IDE for the library to work properly? Note: for my needs, I don't need to store the middleware source, I only store header files / lib / JAR ..so that it's ready to be linked.
Typically, you simply link libraries to your app, and you are good. But what about middleware / frameworks that need more?
Specific examples:
Qt moc pre-processor.
ZeroC Ice Slice (ice) compiler (similar to CORBA IDL preprocessor).
Basically these frameworks/middleware need to generate their own code before your application can link to it.
From the point of view of the developer, ideally he wants to just checkout, and everything should be ready to go. But then my IDE/compiler will not be setup properly yet, so the compilation will fail..
What do you think?
Backup everything including the setup of the IDE, operating system, etc. This is what i do
1) Store all 3rd party libraries in source control. I have a branch for all the libraries.
2) Backup the entire tool chain which was used to build. This includes every tool. Each tool is installed into the same directory on each developers computer, so this makes it simple to setup a developers machine remotely.
3) This is the most hardcore, but prepare 1 perfect developer IDE setup which is clean, then make a VMWare / VirtualPC image out of it. This will be useful when you cant seem to get the installers to work in future.
I learned this lesson the painful way because I often have to wade through visual studio 6 code which don't build properly.
I think that a better solution is to make sure that the build is self-contained and downloads all necessary software for itself unless you tell it otherwise. This is the way maven works, and it is really handy. The downside is that it sometimes needs to download a application server or similar, which is highly unpractical, but at least the build succeeds and it becomes the new developers responsibility to improve the build if needed.
This does of course not work great if your software needs attended installs, but I would try to avoid any such dependencies in any case. You can add alternative routes (e.g the ant script compiles the code if eclipse hasn't done it yet). If this is not feasible, an alternative option is to fail with a clear indication of what went wrong (e.g 'CORBA_COMPILER_HOME' not set, please set and try again').
All that said, the most complete solution is of course to ship everything with your app (i.e OS, IDE, the works), but I doubt that that is applicable in the general case, how would you feel about that type of requirements to build a software product? It also limits people who want to adapt your software to new platforms.
What about adding 1 step.
A nant script which is started with a bat file. The developer would only have to execute one .bat file, the bat file could start nant, and the Nant script could be made to do anything you need.
This is actually a pretty subtle question. You're talking about how to manage features of the environment which are necessary in order to allow your build to proceed. In this case it's the top level of your code toolchain, but the problem can be generalised to include the entire toolchain, and even key aspects of the operating system.
In my place of work, we have various requirements of the underlying operating system before our code will successfully run. This includes machine-specific configurations as well as ensuring correct versions of system libraries and language runtimes are present. We've dealt with this by maintaining a standard generic build machine image which contains the toolchain requirements we need. We can push this out to a virgin machine and get a basic environment that contains the complete toolchain and any auxiliary programs.
We then use fsvs to version control any additional configuration, which can be layered on to specific groups of machines as needed.
Finally, we use custom scripts hooked in to our CI server (we use Hudson) to perform any pre-processing steps required for specific projects.
The main advantages for us of this approach is:
We can build and deploy developer and production machines very easily (and have IT handle this side of the problem).
We can easily replace failed machines.
We have a known environment for testing (we install everything to a simulated 'production server' before going live).
We (the software team) version control critical configuration details and any explicit pre-processing steps.
I would outsource the task of building the midleware to a specialized build server and only include the binary output as regular 3rd party dependencies under source control.
If this strategy can be successfully applied depends on whether all developers need to be able to change midleware code and recompile it frequently. But this issue could also be solved via a Continous Integration Server like Teamcity that allows to create private builds.
Your build process would look like the following:
Middleware repo containing middleware code
Build server, building middleware
Push middleware build output to project repository as 3rd party references
Update: This doesn't really answer how to modify the IDE. It's just a sort-of Maven replacement thingy for C++/Python/Java. You shouldn't need to modify the IDE to build stuff, if so, you need a different IDE or a system that generates/modifies IDE files for you. (See CMake for a cross-platform c/c++ project file generator.)
I've written a system (first in Ant/Beanshell at two different places, then rewrote it in Python at my current job) where third-partys are compiled separately (by someone), stored and shared via HTTP.
Somewhat hurried description follows:
Upon start, the build system looks through all modules in repo, executes each module's setup target, which downloads the specific version of a third-party lib or app that the current code revision uses. These are then unzipped, PATH/INCLUDE etc are added to (or, for small libs, copy them to a single directory for the current repo) and then launches Visual Studio with /useenv.
Each module's file check for stuff that it needs, and if it needs installing and licensing, such as Visual Studio, Matlab or Maya, that must be on the local computer. If that's not there, the cmd-file will fail with a nice error message. This way, you can also check that the correct version is in there
So there are a number of directories on the local disk involved. %work% needs to be set using an global environment variable, preferrable on a different disk than system or source-checkout, at least if doing heavy C++.
%work% <- local store for all temp files, unzip, and for each working copy's temp files
%work%/_cache <- downloaded zips (2 gb)
%work%/_local <- local zips (for development or retrieved in other manners while travvelling)
%work%/_unzip <- unzips of files in _cache (10 gb)
%work%&_content <- textures/3d models and other big files (syncronized manually, this is 5 gb today, not suitable for VC either)
%work%/D_trunk/ <- store for working copy checked out to d:/trunk
%work%/E_branches/v2 <- store for working copy checked out to e:/branches/v2
So, if trunk uses Boost 1.37 and branches/v2 uses 1.39, both boost-1.39 and boost-1.37 reside in /_cache/ (as zips) and /_unzip/ (as raw files).
When starting visual studio using bat files from d:/trunk/BuildSystem/Visual Studio.cmd, INCLUDE points to /_unzip/boost-1.37, while if runnig e:/branches/v2/BuildSystem/Visual Studio.cmd, INCLUDE points to /_unzip/boost-1.39.
In the repo, only a small set of bootstrap binaries need to be stored (i.e. wget and 7z).
We currently download about 2 gb of packed data, which is unzipped to 10 gb (pdb files are huge!), so keeping this out of source control is essential. Having this system allows us to keep the repo size small enough to use DVCS such as Mercurial (or Git) instead of SVN, which is very nice. (I'm thinking of using Mercurials bigfiles extension or file sharing instead of a separately http-served directory.)
It work flawlessly. Developers need only to check out, set an enviroment variable for their local cache, then run Visual Studio via a specific batch-file in the repo. No unzipping or compiling or stuff. A new developer can set up his computer in no time. (Installing Visual Studio takes the order of a magnitude more time.)
First time on a new computer takes some time, but then it's fast, only a few seconds. Downloads/unzips are shared on the local computer, do checking out additional branches/versions does not occupy more space. Working offline is also possible, you just need to get the zip files manually if new ones have been uploaded. (This mechanism is essential to test new versions/compilations of third-party libraries.)
The basics are in a repo on bitbucket but it needs more work before it's ready for the public. Apart from doc and polish, I plan to:
extend it to use cmake instead of raw
vcproj-files, to make it more
cross-platform.
script the entire
process from checkout/download of
third-party packages to building and
zipping them (including storing the
download in a local repo) ... currently that's on my dev computer. Not good. Will fix. :)
As for moc, we use Qt's Visual Studio add-in, which stores this in the .vcproj files. Works well. I do think that CMake is one of the best answers for this though