Jib - customized entrypoint can only remove classes at runtime, but will fail 3PP vulnerability check before deployment

I am using Jib to pull a base image, add my wrapper Java code to it, and build my image on top of that. Due to the widely known log4j CVE from December 2021, we are looking for a way to remove the vulnerable classes. (More CVEs have been found in 2022, one of them with a score of 10.0, the highest possible. See https://www.cvedetails.com/vulnerability-list/vendor_id-45/product_id-37215/Apache-Log4j.html)
The base image is near EOL, so the provider answered that they will not release a new version; besides, log4j 1.x itself reached EOL long ago. The current situation is that we have no plan to upgrade the base image to the next version, so removing the classes seems to be the only way now.
The base image uses /opt/amq/bin/launch.sh as its entrypoint, and I have found that I can use a customized entrypoint to run a script beforehand that removes the classes: <entrypoint>/opt/amq/bin/my_script.sh</entrypoint>, where my_script.sh runs run_fix.sh && /opt/amq/bin/launch.sh.
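For reference, a minimal sketch of what that wrapper could look like (the exact jar path and class name are assumptions from my setup, not anything Jib prescribes):

    #!/bin/sh
    # /opt/amq/bin/my_script.sh - wrapper entrypoint (hypothetical)
    # run_fix.sh strips the vulnerable classes in place, e.g. something like:
    #   zip -q -d /opt/amq/lib/log4j-*.jar org/apache/log4j/net/JMSAppender.class
    # and then control passes to the original entrypoint.
    /opt/amq/bin/run_fix.sh && exec /opt/amq/bin/launch.sh "$@"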
Then I realized that even though this works by mitigating the risk when the application is actually running, the vulnerability scan (part of our security process) will still raise alarms while examining the image binary: scanning is a static process done before the image is uploaded to the Docker registry for production, well before the application runs. The classes can only be removed at the moment the application starts, i.e. at runtime.
Can Jib pre-process the base image during the Maven build (mvn clean install -Pdocker-build), instead of only allowing removal at runtime? From what I have read, I understand it's a big NO, and there's no plugin for it yet.

By the design of container images, it is impossible for anyone or any tool to physically remove files from an already existing container image. Images are immutable. The best you can do is "mark deletion" with a special "whiteout" file (.wh.xyz), which tells a container runtime to hide the target files at runtime.
However, I am not sure if your vulnerability scanner will take the effect of whiteout files into account during scanning. Hopefully it does. If it doesn't, the only option I can think of is to re-create your own base image.
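To illustrate with a hypothetical Dockerfile (paths invented), deleting a file from a base image only records a whiteout in the new layer; the bytes stay in the base layer underneath:

    FROM some-base-image:1.0
    # This does not shrink or alter the base layer; it only adds a whiteout
    # entry (.wh.log4j-1.2.17.jar) to the new layer. A scanner that inspects
    # each layer's tarball individually will still find the jar below.
    RUN rm /opt/amq/lib/log4j-1.2.17.jar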
Take a look at this Stack Overflow answer for more details.

Related

Patch a plugin with a single class?

This is my situation: we have a third-party feature in our Eclipse environment. The feature contains several plugins. One of the plugins contains a bunch of classes, and one of the classes contains a bug.
We have been able to find a solution to the bug, so we have a working version of the buggy class.
Unfortunately this plugin is covered by a 55-page EULA (by IBM), so I think it's pretty safe to assume that decompiling the jar, exchanging class files, recompiling, and distributing is legally out of the question. I'm no legal expert, but I'd guess we cannot tamper with the jar files in any way.
So this means I have a single class file that I want to be loaded instead of a class in a plugin, is this at all possible?
This page suggests using fragments, but this requires modifying the manifest in the plugin.
This question has the same problem as me, but in that case there is access to the source code and he is able to build a plugin.
This blog post shows how to use feature patches, but they require that I'm able to build my own plugin, which I can't, since I have just the one class.
I would not try using a fragment. In my experience, the cleanest thing to do is to use a feature patch. I have successfully used feature patches to do exactly what you are looking for. It's not simple, but it is robust. You need to do the following:
Create a plugin that encapsulates your single class file.
Create a feature patch that includes your new plugin and that patches the feature you are targeting (see the sketch below).
Export your feature patch and create the p2 metadata (to create an update site).
Install it into your Eclipse using the install manager.
Rejoice!
(optional) Feature patches by default only target a single version of the target feature. So, if the target feature bumps up its version number, the feature patch will silently no longer be applied. However, it is possible to relax the version constraints on the feature patch. This process is described in detail here: http://aniefer.blogspot.com/2009/06/patching-features-part-2.html
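For illustration, the heart of a feature patch is a feature.xml whose feature import is marked patch="true"; all ids and versions below are hypothetical, and the imported version must exactly match the feature version you are patching:

    <feature id="com.example.mypatch" label="Patch for Target Feature" version="1.0.0">
       <requires>
          <!-- patch="true" turns this feature into a patch for the imported feature -->
          <import feature="com.ibm.target.feature" version="3.2.0" patch="true"/>
       </requires>
       <!-- the plugin that wraps your single fixed class -->
       <plugin id="com.example.mypatch.plugin" version="1.0.0"
               download-size="0" install-size="0" unpack="false"/>
    </feature>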
More information:
http://aniefer.blogspot.com/2009/06/patching-features-with-p2.html
http://aniefer.blogspot.com/2009/06/patching-features-part-2.html
The benefit of using a feature patch over a fragment is that anyone can install the patch and get the patch working, but things are more difficult with a fragment in that end users must muck with manifests.
So this means I have a single class file that I want to be loaded instead of a class in a plugin, is this at all possible?
Your first sentence is the answer. You can use a fragment, but that requires modifying the manifest in the plugin. Otherwise, Eclipse would have no idea which class to load.
My suggestion is that you write IBM with all of this information, including the patch. IBM should be able to release a maintenance fix which would solve your problem.
In the meantime, you could pursue the fragment option, which would require you to unpack the jar, add your fragment, modify the manifest, and repackage the jar. Whether or not this is legal is beyond my ability to determine.
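For completeness, the fragment's own MANIFEST.MF is tiny (all names below are hypothetical). Keep in mind that the host bundle's own copy of a class normally wins over a fragment's, which is why the plugin's manifest has to be touched at all:

    Manifest-Version: 1.0
    Bundle-ManifestVersion: 2
    Bundle-Name: Single Class Patch Fragment
    Bundle-SymbolicName: com.example.patch.fragment
    Bundle-Version: 1.0.0
    Fragment-Host: com.ibm.target.plugin;bundle-version="[3.2.0,3.3.0)"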

Should I put included code under SCM?

I'm developing a web app.
If I include a jQuery plugin (or the jQuery file itself), this has to be put under my static directory, which is under SCM, to be served correctly.
Should I gitignore it, or add it, even if I don't plan on modifying anything from it?
And what about binary files (graphic resources) that might come with it?
Thanks in advance for any advice!
My view is that everything you need for your application to run correctly needs to be managed. This includes third-party code.
If you don't put it under SCM, how is it going to get deployed correctly on your production systems? If you have other ways of ensuring that, that's fine, but otherwise you run the risk that successful deployment is a matter of people remembering to do all the right things, rather than some automated low-risk "push the button" procedure.
If you don't manage it under SCM or something similar, how do you ensure that the versions you develop against and test against are the same? And that they're the same as production? Debugging an issue caused by a version difference you don't notice can be horrible.
I generally add external resources to my project directly. Doing so facilitates deployment and ensures that if someone changes the version of this file in your project, you have a clear audit history of what happened in case it causes issues in the code that you've written. Developers should know not to modify these external resources.
You could use something like git submodules, I suppose, but I haven't felt that this is worth the hassle in the past.
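If you do try submodules, the setup is only a couple of commands (the URL and path here are placeholders):

    git submodule add https://github.com/jquery/jquery.git static/vendor/jquery
    git commit -m "Track jQuery as a submodule"
    # collaborators must remember to run this after cloning:
    git submodule update --init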
Binary files from external sources can be checked in to the project as well, although if they're extremely large you may want to consider a different approach.
There aren't a lot of reasons not to put external resources like jQuery into your repo:
If you pull it down from jQuery every time you check out or deploy, you have less control over which version you're using. This holds true for most third-party libraries; you probably don't want to upgrade your libraries without testing with your code to see if it breaks something.
You'll always have a complete copy of your site when you check out your repository and you won't need to go seeking resources that may have become unavailable.
For small (in terms of filesize) things like jQuery and images, I'd just add them unless you're really, really concerned about space.
It depends.
These arguments relate to keeping a copy of the library on your system rather than pulling it from its original location.
Arguments in favour:
It will ensure that everything needed for your project can be found in one place when someone else joins your development team. I've lost count of the number of times I've had to scramble around looking for the right versions of libraries in order to be able to get something working.
If you make any modifications to the library you can make these changes to the source controlled version so when a new version comes out you use the source control's merging tools to ensure your edits don't go missing.
Arguments against:
It could mean everyone has a copy of the library locally - unless you map the 3rd party tools to a central server.
Deploying could be problematic - again, unless you map the 3rd party tools to a central server and don't include them in the deploy script.

What to put under version control?

Almost any IDE creates lots of files that have nothing to do with the application being developed; they are generated and maintained by the IDE so it knows how to build the application, where the version control repository is, and so on.
Should those files be kept under version control along with the files that really have something to do with the application (source code, application configuration files, ...)?
The thing is: in some IDEs, if you create a new project and then import it into the version control repository using the client/commands embedded in the IDE, all those files are sent to the repository. And I'm not sure that's right: what if two different developers working on the same project want to use two different IDEs?
I want to keep this question agnostic avoiding references to any particular IDE, programming language or version control system. So this question is not exactly the same as these:
SVN and binaries - but this talks about binaries and SVN
Do you keep your build tools in version control? - but this talks about build tools (e.g. putting the jdk under version control)
What project files shouldn't be checked into SVN - but this talks about SVN and DLLs
Do you keep your project files under version control? - very similar (haven't found it before), thanks VonC
Rules of thumb:
Include everything which has an influence on the build result (compiler options, file encodings, ASCII/binary settings, etc.)
Include everything to make it possible to open the project from a clean checkout and being able to compile/run/test/debug/deploy it without any further manual intervention
Don't include files which contain absolute paths
Avoid including personal preferences (tab size, colors, window positions)
Follow the rules in this order.
[Update] There is always the question what should happen with generated code. As a rule of thumb, I always put those under version control. As always, take this rule with a grain of salt.
My reasons:
Versioning generated code seems like a waste of time. It's generated right? I can get it back at a push of a button!
Really?
If you had to bite the bullet and generate the exact same version of some previous release without fail, how much effort would it be? When generating code, you not only have to get all the input files right, you also have to turn back time for the code generator itself. Can you do that? Always? As easy as it would be to check out a certain version of the generated code if you had put it under version control?
And even if you could, could you ever be sure that didn't miss something?
So on the one hand, putting generated code under version control makes sense, since it makes it dead easy to do what a VCS is meant for: go back in time.
Also it makes it easy to see the differences. Code generators are buggy, too. If I fix a bug and have 150'000 files generated, it helps a lot when I can compare them to the previous version to see that a) the bug is gone and b) nothing else changed unexpectedly. It's the unexpected part which you should worry about. If you don't, let me know and I'll make sure you never work for my company ever :-)
The major pain point of code generators is stability. It won't do for your code generator to spit out a random mess of bytes every time you run it (well, unless you don't care about quality). Code generators need to be stable and deterministic: run them twice with the same input, and the output must be identical down to the least significant bit.
So if you can't check in generated code because every run of the generator creates differences that aren't there, then your code generator has a bug. Fix it. Sort the code when you have to. Use hash maps that preserve order. Do everything necessary to make the output non-random. Just like you do everywhere else in your code.
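As a trivial sketch in Java (a hypothetical generator snippet, not from any particular tool): iterating a HashMap gives no guaranteed order, so sort the keys before emitting anything:

    import java.util.Map;
    import java.util.TreeMap;

    public final class StableEmitter {
        // Emits fields in sorted key order so two runs with the same input
        // produce byte-identical output; iterating the input map directly
        // would make the output order unpredictable.
        public static String emitFields(Map<String, String> fields) {
            StringBuilder out = new StringBuilder();
            for (Map.Entry<String, String> e : new TreeMap<>(fields).entrySet()) {
                out.append("private String ").append(e.getKey())
                   .append(" = \"").append(e.getValue()).append("\";\n");
            }
            return out.toString();
        }
    }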
Generated code that I might not put under version control would be documentation. Documentation is somewhat of a soft target. It doesn't matter as much when I regenerate the wrong version of the docs (say, it has a few typos more or less). But for releases, I might do that anyway so I can see the differences between releases. Might be useful, for example, to make sure the release notes are complete.
I also don't check in JAR files. Since I have full control over the whole build, full confidence that I can get back any version of the sources in a minute, and the knowledge that I have everything necessary to build it without any further manual intervention, what would I need the executables for? Again, it might make sense to put them into a special release repo, but then it's better to keep a copy of the last three years on your company's web server for download. Think: comparing binaries is hard and doesn't tell you much.
I think it's best to put anything under version control that helps developers to get started quickly, ignoring anything that may be auto-generated by an IDE or build tools (e.g. Maven's eclipse plugin generates .project and .classpath - no need to check these in). Especially avoid files that change often, that contain nothing but user preferences, or that conflict between IDEs (e.g. another IDE that uses .project just like eclipse does).
For eclipse users, I find it especially handy to add code style (.settings/org.eclipse.jdt.core.prefs - auto formatting on save turned on) to get consistently formatted code.
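As a sketch, an ignore file following this advice might look like the following (git syntax assumed; note the .settings/* form, so the negation below can take effect):

    # user-specific or regenerable Eclipse files - do not version
    .classpath
    .project
    .settings/*

    # but do version the shared formatter settings
    !.settings/org.eclipse.jdt.core.prefs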
Everything that can be automatically generated from the source and configuration files should not be under version control! It only causes problems and limitations (like the one you stated - different programmers wanting to use two different project files).
It's true not only for IDE "junk files" but also for intermediate files (like .pyc in Python, .o in C, etc.).
This is where build automation and build files come in.
For example, you can still build the project (the two developers will need the same build software, obviously), but they could in turn use two different IDEs.
As for the 'junk' that gets generated, I tend to ignore most of it. I know this is meant to be language-agnostic, but consider Visual Studio: it generates user files (user settings, etc.) that should not be under source control.
On the other hand, project files (used by the build process) most certainly should. I should add that if you are on a team and have all agreed on an IDE, then checking in IDE specific files is fine providing they are global and not user specific and/or not needed.
Those other questions do a good job of explaining what should and shouldn't be checked into source control, so I won't repeat them.
In my opinion it depends on the project and environment. In a company environment where everybody is using the same IDE, it can make sense to add the IDE files to the repository, though this depends a bit on the IDE, as some include absolute paths to things.
For a project which is developed in different environments it doesn't make sense, and it will be a pain in the long run, as the project files aren't maintained by all developers and make it harder to find the "relevant" things.
Anything that would be devastating if it were lost, should be under version control.
In my opinion, anything needed to build the project (code, make files, media, databases with required program info, etc.) should be in the repository. I realise that especially for media/database files this is controversial, but to me, if you can't branch and then hit build, the source control's not doing its job. This goes double for distributed systems with cheap branch creation/merging.
Anything else? Store it somewhere different. Developers should choose their own working environment as much as possible.
From what I have seen of version control, it seems that most things should go into it - e.g. source code and so on. However, the problem many VCSs run into is handling large files, typically binaries, and at times things like audio and graphics files. Therefore, my personal approach is to put the source code under version control, along with generally small-sized graphics, and leave any binaries to other systems of management. If it is a binary that I created myself using the IDE's build system, then it can definitely be ignored, because it is going to be regenerated on every build. For dependency libraries, this is where dependency package managers come in.
As for IDE-generated files (I am assuming these are ones that aren't generated during the build process, such as the solution files for Visual Studio) - well, I think it depends on whether or not you are working alone. If you are working alone, then go ahead and add them - they will allow you to revert settings in the solution or whatever you make. The same goes for other non-solution files as well. However, if you are collaborating, then my recommendation is no - most IDE-generated files tend to be user-specific: they work on your machine, but not necessarily on others'. Hence, you may be better off not including IDE-generated files in that case.
tl;dr: you should put most things that relate to your program into version control, excluding dependencies (things like libraries, graphics, and audio should be handled by some other dependency management system). As for things directly generated by the IDE - it depends on whether you are working alone or with other people.

Do you version "derived" files?

Using online interfaces to a version control system is a nice way to have a published location for the most recent versions of code. For example, I have a LaTeX package here (which is released to CTAN whenever changes are verified to actually work):
http://github.com/wspr/pstool/tree/master
The package itself is derived from a single file (in this case, pstool.tex) which, when processed, produces the documentation, the readme, the installer file, and the actual files that make up the package as it is used by LaTeX.
In order to make it easy for users who want to download this stuff, I include all of the derived files mentioned above in the repository itself as well as the master file pstool.tex. This means that I'll have double the number of changes every time I commit because the package file pstool.sty is a generated subset of the master file.
Is this a perversion of version control?
@Jon Limjap raised a good point:
Is there another way for you to publish your generated files elsewhere for download, instead of relying on your version control to be your download server?
That's really the crux of the matter in this case. Yes, released versions of the package can be obtained from elsewhere. So it does really make more sense to only version the non-generated files.
On the other hand, @Madir's comment that:
the convenience, which is real and repeated, outweighs cost, which is borne behind the scenes
is also rather pertinent in that if a user finds a bug and I fix it immediately, they can then head over to the repository and grab the file that's necessary for them to continue working without having to run any "installation" steps.
And this, I think, is the more important use case for my particular set of projects.
We don't version files that can be automatically generated using scripts included in the repository itself. The reason is that after a checkout, these files can be rebuilt with a single click or command. In our projects we always try to make this as easy as possible, thus preventing the need to version these files.
One scenario I can imagine where this could be useful is 'tagging' specific releases of a product, for use in a production environment (or any non-development environment) where the tools required for generating the output might not be available.
We also use targets in our build scripts that can create and upload archives with a released version of our products. These can be uploaded to a production server, or to an HTTP server for downloading by users of your products.
I am using Tortoise SVN for small system ASP.NET development. Most code is interpreted ASPX, but there are around a dozen binary DLLs generated by a manual compile step. Whilst it doesn't make a lot of sense to have these source-code versioned in theory, it certainly makes it convenient to ensure they are correctly mirrored from the development environment onto the production system (one click). Also - in case of disaster - the rollback to the previous step is again one click in SVN.
So I bit the bullet and included them in the SVN archive - the convenience, which is real and repeated, outweighs cost, which is borne behind the scenes.
Not necessarily, although best practices for source control advise that you do not include generated files, for obvious reasons.
Is there another way for you to publish your generated files elsewhere for download, instead of relying on your version control to be your download server?
Normally, derived files should not be stored in version control. In your case, you could build a release procedure that created a tarball that includes the derived files.
As you say, keeping the derived files in version control only increases the amount of noise you have to deal with.
In some cases we do, but it's more of a sysadmin type of use case, where the generated files (say, DNS zone files built from a script) have intrinsic interest in their own right, and the revision control is more linear audit trail than branching-and-tagging source control.

Best build process solution to manage build versions

I run a rather complex project with several independent applications. These use however a couple of shared components. So I have a source tree looking something like the below.
My Project
    Application A
    Shared1
    Shared2
    Application B
    Application C
All applications have their own MSBuild script that builds the project and all the shared resources it needs. I also run these builds on a CruiseControl controlled continuous integration build server.
When the applications are deployed they are deployed on several servers to distribute load. This means that it’s extremely important to keep track of what build/revision is deployed on each of the different servers (we need to have the current version in the DLL version, for example “1.0.0.68”).
It's equally important to be able to recreate a revision/build that has been built, to be able to roll back if something didn't work out as intended (oh yes, that happens...). Today we're using SourceSafe for source control, but that's possible to change if we can present good reasons for it (SS is actually working OK for us so far).
Another principle that we try to follow is that only code that has been built and tested by the integration server is deployed further.
"CrusieControl Build Labels" solution
We had several ideas for solving the above. The first was to have the continuous integration server build and locally deploy the project and test it (it does that now). As you probably know, a successful build in CruiseControl generates a build label, and I guess we could somehow use that to set the DLL version of our executables (so build label 35 would create a DLL like "1.0.0.35")? The idea was also to use this build label to label the complete source tree. Then we could probably check out by that label and recreate the build later on.
The reason for labeling the complete tree is to include not only the actual application code (which is in one place in the source tree) but also all the shared items (which are in different places in the tree). So a successful build of "Application A" would label the whole tree with the label "ApplicationA35", for example.
There might however be an issue when trying to recreate this build and setting the DLL version before deploying, as we then no longer have access to the CruiseControl-generated build label. If all CruiseControl build labels were unique across all projects, we could use just the number for labeling, but that's not the case (both application A and B could be on build 35 at the same time), so we have to include the application name in the label - hence the SourceSafe label "ApplicationA35". How can I then recreate build 34 and set 1.0.0.34 as the DLL version number once we've built build 35?
"Revision number" solution
Someone told me that Subversion, for example, creates a revision number for the entire source tree on every check-in - is this the case? Does SourceSafe have something similar? If this is correct, the idea is to grab that revision number when getting the latest code to build on the CruiseControl server. The revision number could then be used to set the DLL version number (to, for example, "1.0.0.5678"). I guess we could then get that specific revision from Subversion if needed, and it would include the application and all the shared items, making it possible to recreate a specific version from the past. Would that work, and could this also be achieved using SourceSafe?
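If Subversion does work that way, I imagine grabbing the number in the build would be a one-liner, something like (untested sketch):

    # print the working copy's revision, e.g. "5678", which the build
    # could then splice into the DLL version as 1.0.0.5678
    svnversion -n .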
Summarize
So the two main requirements are:
Be able to track build/revision number of the build and deployed DLL.
Be able to rebuild a past revision/build, set the old build/revision number on the executables of that build (to comply with requirement 1).
So how would you solve this? What would be your preferred approach, and how would you implement it (or do you have a totally different idea)? Please give detailed answers.
Bonus question: What is the difference between a revision number and a build number, and when would one really need both?
Your scheme is sound and achievable in VSS (although I would suggest you consider an alternative; VSS is really an outdated product).
For your "CI" Build - you would do the Versioning take a look at MSBuild Community Tasks Project which has a "Version" tasks. Typically you will have a "Version.txt" in your source tree and the MSBuild task will increment the "Release" number while the developers control the Major.Minor.Release.Revision numbers (that's how a client of mine wanted it). You can use revision if you prefer.
You then would have a "FileUpdate" tasks to edit the AssemblyInfo.cs file with that version, and your EXE's and "DLL's" will have the desired version.
Finally the VSSLabel task will label all your files appropriately.
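A sketch of what the CI versioning target could look like (task and attribute names from the MSBuild Community Tasks as I remember them - verify against the project's documentation):

    <Import Project="MSBuild.Community.Tasks.Targets"/>
    <Target Name="StampVersion">
      <!-- reads Version.txt and increments the chosen component -->
      <Version VersionFile="Version.txt" BuildType="Increment">
        <Output TaskParameter="Major" PropertyName="Major"/>
        <Output TaskParameter="Minor" PropertyName="Minor"/>
        <Output TaskParameter="Build" PropertyName="Build"/>
        <Output TaskParameter="Revision" PropertyName="Revision"/>
      </Version>
      <!-- rewrites AssemblyVersion so the built DLLs/EXEs carry the number -->
      <FileUpdate Files="Properties\AssemblyInfo.cs"
                  Regex="AssemblyVersion\(&quot;.*&quot;\)"
                  ReplacementText="AssemblyVersion(&quot;$(Major).$(Minor).$(Build).$(Revision)&quot;)"/>
    </Target>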
For your "Rebuild" Build - you would modify your "Get" to get files from that Label, obviously not execute the "Version" task (as you are SELECTING a version to build) and then the FileUpdate tasks would use that version number.
Bonus question:
These are all "how you want to use them" - I would use the build number for, well, the build number; that is what I'd increment. If you are using CI you'll have very many builds, the vast majority with no intention of ever being deployed anywhere.
The major and minor are self-evident - but I've always used revision as a "hotfix" indicator. Say I intend to have a "1.3" release, which in reality would be a product with, say, version 1.3.1234.0. While working on 1.4 I find a bug and need a hotfix, released as 1.3.2400.1. Then, when 1.4 is ready, it would be, say, 1.4.3500.0.
I need more space than responding as comments directly allows...
Thanks! Good answer! What would be the difference, what would be better solving this using Subversion for example? - Richard Hallgren (15 hours ago)
The problems with VSS have nothing to do with this example (although the "Labeling" feature I believe is implemented inefficiently...)
Here are a few of the issues with VSS
1) Branching is basically impossible
2) Shared checkout is generally not used (I know of a few people who have had success with it)
3) Performance is very poor - it is extremely "chatty"
4) Unless you have a very small repository, it is completely unreliable, to the point that for most shops it's a ticking time bomb.
For 4 - the problem is that VSS is implemented with the entire repository represented as "flat files" in the file system. When the repository gets over a certain size (I believe 4GB, but I'm not confident in that figure), there is a chance of "corruption". As the size increases, the chances of corruption grow until it becomes an almost certainty.
So take a look at your repository size - and if you are getting into the Gigabytes - I'd strongly recommend you begin planning on replacing VSS.
Regardless - a Google search for "VSS Sucks" gives 30K hits... I think if you did start using an alternative, you would realize it's well worth the effort.
Have CC.net label the successful builds.
Have each project in the solution link to a common SolutionInfo.cs file which contains the assembly and file version attributes (removed from each project's AssemblyInfo.cs).
Before the build, have CruiseControl run an MSBuild regex replace (from MSBuild Community Tasks) to update the version information using the CC.net build label (passed in as a parameter to the MSBuild task).
Build the solution, run tests, FxCop, etc.
Optionally revert the solution info file.
The result is that all assemblies in the CC.net-published build have the same version numbers, which conform to a label in the source code repository.
UppercuT can do all of this with a custom packaging task to split the applications up. And to get the version number of the source, you might think about Subversion.
It's also insanely easy to get started.
http://code.google.com/p/uppercut/
Some good explanations here: UppercuT