Greetings,
This question is about VS Code itself, the Microsoft build. I would
like to know all of its dependencies, recursively, until I reach the
transitive closure. I also need to know the version and license of
each of these dependencies. By dependencies I mean any piece of
software it requires to run, whether it is something installed at the
OS level, a bundled component/library, a Node.js/yarn package,
a VS Code internal extension, etc. Essentially, I want to know all the
dependencies regardless of the "layer" at which these dependencies get
used.
I guess we can start with the OSS dependencies, given that the
commercial build will have a superset of those. In that regard, I see
at least four files in the GitHub repo which look like partial views of
this total set of dependencies:
https://github.com/microsoft/vscode/blob/main/ThirdPartyNotices.txt
https://github.com/microsoft/vscode/blob/main/package.json
https://github.com/microsoft/vscode/blob/main/cglicenses.json
https://github.com/microsoft/vscode/blob/main/cgmanifest.json
The first question I have is whether any of the files above is
already recursive, meaning it also includes the dependencies of the
dependencies of the dependencies, etc. The second question is
how to assemble the total list of dependencies/versions/licenses I am
after.
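For the npm layer alone, I imagine a recursive view can be pulled from a local checkout with an off-the-shelf tool such as license-checker (the tool and flags below are just one possible approach, and they would still miss OS-level and bundled components):
    # assumption: a local clone with its npm dependencies installed
    git clone https://github.com/microsoft/vscode && cd vscode
    yarn    # or npm install, depending on what the checkout uses
    # list every installed package, including transitive ones, with version and license
    npx license-checker --json > npm-dependencies.json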
Thank you in advance and kudos for the great tool.
Related
This may or may not be a duplicate of How to use chisel module as package.
Again, for Scala/sbt/Maven experts this may be obvious; for old-school ASIC designers it is not:
I have a project PROJ with the standard directory structure PROJ/src/main/scala/myproj.scala. How do I use some Chisel code from an external library LIB, e.g. from /usr/libs/LIB/src/main/scala/{stuff}.scala?
(Not a full answer, more a warning than an answer)
"Search path" sounds a bit concerning, so I'd just want to make sure that you don't expect something like a C/C++ build that's searching for some files on some file systems.
Before proceeding, it might be helpful to ponder on the thought that the entire scala / java / kotlin / maven / sbt / gradle / ... ecosystem is "internet-centric", not "file-system-centric". It essentially assumes that all packages are available under a globally unique identifier in some online repository (even when they are not, local installation will make them look as if they came from a public repository, see below). Local file systems are used only as temporary local caches (and it is assumed that you as a human will not look into those caches without a good reason). In general, it tries really hard not to depend on the machine on which it's built: everything it needs is specified in the build.sbt, presence or absence of any files in /usr/lib is irrelevant.
If you want to use a package, you have to declare it as a dependency in your build config (the SBT documentation explains how to do this; Maven Central even provides a helpful little text field from which you can copy the correctly formatted pieces of the config).
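For example, if LIB were published to a repository, the declaration in PROJ's build.sbt would be a single line (the coordinates below are made up):
    // PROJ/build.sbt -- hypothetical coordinates and version for LIB
    libraryDependencies += "com.example" %% "lib" % "1.0.0"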
If your package does not come from any public repository, you'll first have to install it locally (the SBT documentation also explains how to "install a package locally"; it's a short SBT command that copies the package into the local cache on your file system, so that other projects that depend on it can pretend it came from a repository).
If you have just the src/foo/bar/baz/stuff.scala files but no build.sbt file, you'll probably first want to convert LIB into a proper sbt project, then build it, then install it locally (you need a JAR; adding .scala files to the CLASSPATH won't buy you anything, they must be compiled first, and doing anything to the CLASSPATH manually is essentially hopeless anyway; the only way is to let SBT take care of everything).
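A minimal sketch of that conversion, assuming LIB has no build definition yet (organization, name and versions are placeholders):
    // /usr/libs/LIB/build.sbt
    ThisBuild / organization := "com.example"
    ThisBuild / version      := "1.0.0"
    ThisBuild / scalaVersion := "2.13.12"

    // a real Chisel library would also declare its chisel3 dependency here
    lazy val lib = (project in file(".")).settings(name := "lib")

    // then, from /usr/libs/LIB:   sbt publishLocal
    // and in PROJ/build.sbt:      libraryDependencies += "com.example" %% "lib" % "1.0.0"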
We are building all the solutions to a shared bin directory. Having different projects reference different versions of the same dependency is not healthy for our build.
So, we consolidated the dependencies - great. But now the versions start to drift again. We do not want to consolidate them manually every now and then - we want to prevent the drift completely.
Why do we not want to use Paket? The main reason is that it seems we would lose the ability to migrate the NuGet package dependencies to the new PackageReference items in the projects. So, currently we have packages.config files, but we plan to replace them with the respective PackageReferences. That means we will use the NuGet support built into MSBuild, which seems to leave no place for Paket.
Now, I assume we are not unique in this world and others have the same problem as we do. How do you solve it?
EDIT 1
We have our internal NuGet repo, but we use it for dependencies which are not available on NuGet.org and for sharing our own internal packages.
One approach is to consume only from the internal NuGet repository. This has challenges, like:
Who uploads the dependencies there? Developers? But then how do we make sure they do not upload different versions of the same dependency? Dedicated people? Then they become a bottleneck.
A small thing, but we need to block commits to the central NuGet.config.
Uploading a dependency to the internal NuGet repo is not immediate. You cannot just download it from NuGet.org and upload it to the internal one, because that would miss any transitive dependencies. So a process would have to be built around it.
It is all possible, but I am reluctant to go down that route ... there must be a better way.
EDIT 2
While we do plan to migrate to PackageReference, it will take time. And unfortunately, as long as we have Silverlight (another year, at least), a whole bunch of projects in the dedicated Silverlight solution (80+) will not be migrated to PackageReference, because doing so makes it impossible to debug the code with VS 2015.
Next, suppose we do migrate ALL the projects and then externalize all the PackageReference items to a single targets file imported by all the projects. This is feasible when using a shared bin directory, as we plan to do. But when inspected in VS 2017, this setup paints the misleading picture that every single project depends on the entire set of NuGet dependencies.
I would rather avoid this.
Once you move to PackageReference, you can take advantage of MSBuild. For example, you can have an MSBuild file that contains all your dependency versions. It could be a file that you <Import ... /> in all your csproj files, or you could use Directory.Build.props. Then, in each of your projects, change the version number in every <PackageReference> to an MSBuild property you defined in that file. Most of Microsoft's open source repositories use this technique, with minor variations in file names and in whether the file is imported automatically via Directory.Build.props or with an explicit <Import ... />.
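A minimal sketch of that setup (the package and property names are only examples):
    <!-- Directory.Build.props, next to the solution; imported automatically by MSBuild (VS 2017 and later) -->
    <Project>
      <PropertyGroup>
        <NewtonsoftJsonVersion>12.0.3</NewtonsoftJsonVersion>
      </PropertyGroup>
    </Project>

    <!-- in each .csproj that needs the package -->
    <ItemGroup>
      <PackageReference Include="Newtonsoft.Json" Version="$(NewtonsoftJsonVersion)" />
    </ItemGroup>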
While you can still use the Package Manager UI in Visual Studio to check for updates, you won't be able to update the package versions with it (at least, it won't preserve how and where the versions are defined). However, as long as the MSBuild file that defines the versions is part of your solution, you can trivially open it in Solution Explorer and type the new version number in. Adding new package references is slightly more effort, but it is generally not done often, and it is still very easy with SDK-style projects, since Visual Studio lets you edit the csproj while the project is still loaded.
Since you didn't accept the other solution, maybe you could take a look at Paket.
It's a package manager for .NET that (among other features) maintains a solution-wide dependency lock file. It is very customizable, and while it solves lots of problems, like any tool it creates some new ones. In my experience, the new ones are far less infuriating :)
I used to package my various Eclipse RCP products with PDE, for years.
With my latest upgrade attempt, to Eclipse Oxygen, I got some strange new resolution errors which I could not solve, and I decided it really was time to give Tycho a try. I followed Lars Vogel's excellent article about Tycho, and after a bit of tweaking it worked well (and I was not stuck on the same resolution errors as in PDE! Yay!).
But it was indeed a simple test: I created a folder for my bundles, another for my features, created my poms, and so on. Now I look at the degree of automation in my PDE setup and see quite a huge gap.
In PDE, there is a build.properties where you give your master feature file and a map file, and the process will, seemingly:
parse the master feature
parse the features in it (recursively)
parse the plugins in them
find in the map file the plugins to be packaged (the other dependencies are supposed to be in the target platform)
download the relevant git repos
move the relevant plugins/features to the working directories
launch compile, p2 and so on
(Note: the git part requires that you provide the egit fetchfactory.)
Now in Tycho, I have to create poms, but that is not the problem. I have to create some master poms, and for the individual plugin poms I can use either the pomless option or the pom generator. The pom generator also seems to have the advantage of creating the parent pom which contains all the plugins as modules. So far so good.
But I have to fill the features and plugins folders, and I'm stuck here.
I do not have PSFs for my products, because I never needed them: in PDE, map + product definition does the trick.
Does that mean I will have to maintain PSFs from now on, or is there another Tycho solution I did not find? (Tycho documentation is quite scarce, in my opinion.) Maintaining PSFs seems redundant to me because I already have the product and the map, and also because I have many products and many plugins, many of which are common to several products.
(Indeed, a basic solution would be to take the git repositories mentioned in the map file, dump them all, and launch Tycho. Tycho would compile all the plugins, and the p2 part would then package only the product-relevant ones. The problem is that I have plenty of different products that rely on plenty of different repositories. And even within a given git repo, I have plugins that may or may not be relevant to a given product. Thus, I would compile hundreds of useless plugins in the process.)
My need is to copy into the Tycho folders only the plugins and features which are referenced in my product and which are not already in my target platform. Generating a PSF from my product and my map would just be shifting the problem.
Indeed, I can code this, and I will if needed.
But given that all of this is already automated in PDE, are there at least some parts of the process that could be automated with some Tycho plugins I did not uncover?
After some time of digging, here is the solution I finally chose.
In order to fetch the relevant features and plugins, I used... PDE! I dug into PDE and found the various steps in its process. The first one is the fetch (an Ant task named eclipse.fetch). I externalized this part; my script launches it, then generates the master poms by scanning the names of the fetched features and plugins, then adds the other Tycho configurations, and then launches Tycho.
In the end, granted, it is not a pure Tycho solution but a hybrid PDE + Tycho one. But it works like a charm: the build/package process is Tycho, only the initial fetch is delegated to PDE. (Anyway, the PDE build/package process does not work in my case, as stated initially.)
I've been using sbt-assembly to generate a standalone JAR file for my Scala project. However, I would like to reduce the size of my JAR file (it's currently around 150 MB, and there's definitely room for improvement there).
I used the following command to list the contents of the JAR file that's produced:
jar tf <JAR file>
This revealed that there are lots of classes in the generated JAR file that are not used in the project. I believe these classes get included as part of third-party JARs.
Questions
(a) Is there an option that I can use to instruct sbt-assembly to generate a minimal JAR file that does not include the third-party classes that are not used in my project?
(b) I could use AssemblyStrategy to manually specify which files need to be excluded. Is this a sound strategy? I'm a bit concerned that with this approach the JAR file might end up throwing unexpected ClassNotFound exceptions.
Thanks in advance.
It's not easy to say what's used in your project and what is not. If you include a dependency in your project, it might bring in a few others. Those child dependencies might require their own dependencies, and so on.
By default, if you include a dependency in your project, you intend to use it. The author of that dependency usually did the same thing. Thus there is usually not much you can throw away; it's there for a reason. There are a couple of cases when this is not true:
The dependency's author includes additional dependencies that are only used in some settings, and those settings do not apply to your project.
You are using a mega-dependency when you actually need only one of its libraries/features.
There are counterexamples to this as well: ScalaTest does not ship pegdown for generating HTML test reports because you usually don't need it, but it is needed if you use the -h flag to generate HTML reports.
Imagine you use Apache Tika for PDF parsing. It wraps PDFBox to do the actual parsing. In that case you don't need the bloat of all the other libraries that parse MS Office documents. The best thing to do is not to exclude files manually via sbt exclude or sbt-assembly rules, because there is a risk you get it wrong and hit a class-loading exception at run time. Instead, use the right dependency, such as PDFBox directly. Unfortunately, figuring out all the dependencies you actually need is often a lot of manual work, so it's your choice: easy and fat JAR, or painful and lean.
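As a rough build.sbt illustration (the version numbers are arbitrary):
    // heavy: tika-parsers pulls in parsers for many formats you may never touch
    libraryDependencies += "org.apache.tika" % "tika-parsers" % "1.24.1"

    // leaner: depend only on the PDF library you actually use
    libraryDependencies += "org.apache.pdfbox" % "pdfbox" % "2.0.24"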
There are two ways to exclude dependencies:
Exclude transitive dependencies with exclude; see the sbt docs.
Don't use the top-level dependency, and manually add its subdependencies as you need them.
OK, one more, less fun, option: use provided and make sure the libraries are copied to your target environment and are on the classpath there. If you have many JARs using the same libraries, this helps to share them.
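Sketches of the corresponding build.sbt syntax (the coordinates are illustrative):
    // dropping a specific transitive dependency of a library
    libraryDependencies += ("com.example" %% "big-lib" % "1.0.0").exclude("com.example", "unwanted-module")

    // keeping a dependency out of the assembled JAR; it must already be
    // on the classpath of the target environment at run time
    libraryDependencies += "com.example" %% "shared-lib" % "1.0.0" % Provided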
You can visualize your dependency tree with this plugin: https://github.com/jrudolph/sbt-dependency-graph. It's very helpful when trying to figure out what you are using and what you can remove. There are some tools like Tattletale and LooseJar that people suggest, but I haven't tried them. If anyone has experience with those, please share.
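To try it, something like this in project/plugins.sbt should be enough (check the plugin's README for the current version; the one below is only indicative):
    // project/plugins.sbt
    addSbtPlugin("net.virtual-void" % "sbt-dependency-graph" % "0.9.2")

    // then, from the sbt shell:
    //   dependencyTree          -- ASCII tree of all transitive dependencies
    //   dependencyBrowseGraph   -- interactive graph opened in a browser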
What you might want to look at are tree shakers.
For Java there is the following (I have not tried/used it):
http://proguard.sourceforge.net/
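For illustration, a minimal ProGuard configuration for shrinking an assembled JAR might look roughly like this (the jar names and the entry point are placeholders):
    # myapp.pro -- keep the main class, strip everything unreachable from it
    -injars  myapp-assembly.jar
    -outjars myapp-min.jar
    -libraryjars <java.home>/lib/rt.jar
    # (on Java 9+, point -libraryjars at <java.home>/jmods/... instead of rt.jar)
    -dontobfuscate
    -keep public class com.example.Main {
        public static void main(java.lang.String[]);
    }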
For logging, my code uses log4j, but other JARs my code depends on use slf4j instead, so both JARs must be on the build path. Unfortunately, it is now possible for my code to directly use (depend on) slf4j, either via content assist or some other developer's changes. I would like any use of slf4j to show up as an error, but my application (and tests) will still need it on the classpath when running.
explanation:
I'd like to find out if this is possible in Eclipse. This scenario happens often for me. I'll have a large project that uses a lot of third-party libraries, and of course those third-party JARs have their own dependencies as well. So I have to include all the dependencies in the classpath ("build path" in Eclipse) for the application and its tests to compile and run (from within Eclipse).
But I don't want my code to use all of those JARs, just the few direct dependencies I've decided upon myself. So if my code accidentally uses a dependency of a dependency, I want it to show up as a compilation error. Ideally as "class not found", but any error would do.
I know I can manually configure the classpath when running outside of Eclipse, and even within Eclipse I can modify the classpath for a specific class I'm running (in the run configurations), but that's not manageable if you run a lot of individual test cases or have a lot of main() classes.
It sounds like your project has enough dependency relationships that you might consider structuring it with OSGi bundles (plug-ins). Each bundle gets its own classloader and gets to specify what bundles (and optionally what version ranges, etc.) it depends on, what packages it exports, whether it re-exports stuff from its dependencies, etc.
Eclipse itself is structured out of Eclipse plug-ins and fragments, which are just OSGi bundles with an optional tiny bit of additional Eclipse wiring (plugin.xml, which is used to declare Eclipse "extension points" and "extensions") attached. Eclipse thus has fairly good tooling for creating and managing bundles built-in (via the Plug-in Development Environment). Much of what you find out there may lead you to conflate "OSGi bundle" with "plug-in that extends the Eclipse IDE", but the two concepts are quite separable.
The Eclipse tooling does distinguish rather clearly (and sometimes annoyingly, but in the "helpful medicine" way) between the bundles in your build environment vs. the bundles that a particular run configuration includes.
After a few years of living in OSGi land, the default Java "flat classpath" feels weird and even kind of broken to me, largely because (as you've experienced) it throws all JARs into one giant arena and hopes they can sort of work things out. The OSGi environment gives me a lot more control over dependency relationships, and as a "side effect" also naturally demands clarification of those relationships. Between these clear declarations and the tooling's enforcement of them, the project's structure is more obvious to everyone on the team.
if my code accidentally uses a dependency of a dependency, I want it to show up as a compilation error. Ideally, as class not found, but any error would do.
Put your code in one plug-in, your direct dependencies in other plug-ins, their dependencies in other plug-ins, etc. and declare each plug-in's dependencies. Eclipse will immediately do exactly what you want. You won't be offered dependencies' dependencies' contents in autocompletes; you'll get red squiggles and build errors; etc.
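For illustration, each bundle declares its dependencies in its MANIFEST.MF, roughly like this (the names and versions are made up):
    Manifest-Version: 1.0
    Bundle-ManifestVersion: 2
    Bundle-SymbolicName: com.example.myapp
    Bundle-Version: 1.0.0.qualifier
    Require-Bundle: org.apache.log4j;bundle-version="1.2.15"
Anything not listed there (slf4j, say) is simply not visible to the bundle's classes, so an accidental reference fails to compile.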
Why not use access rules to keep your code clean?
It looks like this would be better managed with Maven, integrated into Eclipse with m2eclipse.
That way, you can execute only part of the Maven build lifecycle, and you can manage a separate set of dependencies per build step.
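Maven can also report accidental use of transitive dependencies; for example, the analyze goal of the maven-dependency-plugin (a rough sketch of the idea, not a full setup):
    # reports classes your code uses from dependencies you never declared directly,
    # i.e. exactly the accidental dependency-of-a-dependency usage
    mvn dependency:analyze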
In my experience it helps to be more restrictive. I made the team fill out (paper) forms stating why each JAR was needed and under what license...
and they would rather type in a few lines of code than drag along 20 JARs just to open a file using only one line of code, or some other fancy 'feature'.
Using Maven can help for a while, but when you first spot JARs with names like nightly-build or snapshot, you will know you're in JAR hell.
Conclusion: choose your dependencies well.
Would using the slf4j-log4j12 binding JAR be useful? It allows code written against the slf4j API to have its actual logging go to log4j.