Making a shared library that "re-exports" symbols from other shared libraries - interface

Suppose that there is a simple binary that depends on three libraries: libA.so, libB.so, and libC.so. In the usual case, these three dependencies would show up as NEEDED entries in readelf output. However, I am curious whether it is possible to make a shared library libABC.so that does absolutely nothing but act as an interface to the three actual libraries by "redirecting" the symbols. This way, perhaps one could have multiple versions of libABC.so that in turn point to different versions of the three dependencies, and the binary could "depend" on just libABC.so. Is this possible with ELF?
Another possible use case is the inverse, when the binary already depends on an existing library libABC.so that just so happens to have become split up into three individual libraries.
Note that I do not necessarily have a practical or actual use case for this. Whether or not the above examples are practical, I am merely curious about the possibility.
Re-export Shared Library Symbols from Other Library (OS X / POSIX) has a promising title, but the answers seem either Darwin-specific, or do not quite answer this question.

That kind of works with ELF because of its flat symbol namespace: if you depend on one library, you usually get access to the symbols of its dependencies at the same time (the exception being when dlopen() is used).
But most link editors (ld) no longer do that by default, because otherwise unneeded libraries would end up among the dependencies. In GNU ld the feature is controlled by the --as-needed flag, which, if I remember correctly, was turned on by default around 10 years ago.
You should be able to force the behaviour you're looking for with GNU ld by linking (e.g. via the GCC frontend) with gcc yourprogram.c -Wl,--no-as-needed -lABC -Wl,--as-needed. That will force linking to libABC.so whether or not the program uses any of its exported symbols.
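If you want to check what actually ended up in the dynamic section, readelf -d libABC.so will list the NEEDED entries. Purely as an illustration, here is a minimal Python sketch doing the same check with the third-party pyelftools package (assuming pip install pyelftools and that a libABC.so already exists in the current directory):

# Sketch: list the DT_NEEDED entries of a shared object with pyelftools,
# roughly equivalent to `readelf -d libABC.so | grep NEEDED`.
from elftools.elf.elffile import ELFFile

with open("libABC.so", "rb") as f:               # the "umbrella" library from the question
    elf = ELFFile(f)
    dynamic = elf.get_section_by_name(".dynamic")
    for tag in dynamic.iter_tags():
        if tag.entry.d_tag == "DT_NEEDED":
            print("NEEDED:", tag.needed)         # should show libA.so, libB.so, libC.so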
I have written extensively about the feature on my blog, because it solved many problems for distributions at the time, if you want to look into its practicalities.

Related

How can I have 2 versions of Gensim for summarization in one Jupyter notebook?

I want to have 2 versions of Gensim so that I can use the summarization and keyword functions from the old Gensim.
How can I set up this scenario?
In general, a single Jupyter notebook is backed by a single Python interpreter/environment, and a package can only be installed once at its 'official' installation path.
There are a few hackish workarounds suggested in answers like:
Installing multiple versions of a package with pip
However, each workaround presents operational problems.
One approach is to install the older package to a non-standard path (directory) that's still found by Python's importing logic (controlled by PYTHONPATH). For example, put/move the older copy of Gensim into a gensim_old package directory. But: this is only likely to work well with very simple (single-.py-file) packages.
With any significant library (like Gensim) which cross-imports a lot of things from its own utility modules using the standard paths, lots of things are likely to break unless you dig into all the individual files involved to change their import paths. That's kind of kludgey & hard to maintain. (Though, to the extent you're just using one old version, say gensim-3.8.3 for the removed summarization feature, perhaps it'd be worth fighting through this process once, then keeping the changes around.)
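For concreteness, a minimal sketch of that workaround (the directory and the gensim_old name are hypothetical, and the renamed copy's internal "from gensim import ..." lines would still need editing to match):

# Hypothetical sketch: an old gensim-3.8.3 tree copied & renamed to gensim_old/,
# placed under /path/to/vendored, alongside a normal (newer) gensim install.
import sys

sys.path.insert(0, "/path/to/vendored")             # directory containing gensim_old/
from gensim_old.summarization import summarize      # old 3.8.3 summarizer (renamed copy)
import gensim                                       # the regular, current install

print(gensim.__version__)                           # e.g. 4.x
some_long_text = open("article.txt").read()         # hypothetical input document
print(summarize(some_long_text))                    # old-style extractive summary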
Another approach is to create a totally-separate Python environment with the alternate version, and only use that other environment from the notebook by a system-call – via either something in Python-code like subprocess.call(), or the notebook-cell ! or !! magic-escapes to run a shell command. That is, you give up the ability to run individual interactive lines of Python in that alt environment - but could still send it batches of data, and either capture the console output or observe its output files to continue processing in your notebook.
I'd expect this to be a better option – cleaner & more-maintainable – provided that either the old-version-functionality (summarization) or new-version-functionality (whatever else) can be condensed into one (or a few) single-step scripts.
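A minimal sketch of that pattern (the interpreter path and the summarize_cli.py helper are hypothetical; the helper would just read stdin, call the old gensim.summarization.summarize(), and print the result):

# Hypothetical sketch: run a helper script inside a separate environment that has
# gensim==3.8.3 installed, and capture its output for further processing here.
import subprocess

OLD_ENV_PYTHON = "/opt/envs/gensim383/bin/python"   # interpreter of the old-Gensim env

def summarize_via_old_gensim(text):
    result = subprocess.run(
        [OLD_ENV_PYTHON, "summarize_cli.py"],       # helper script living in that env
        input=text,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()

summary = summarize_via_old_gensim(open("article.txt").read())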
Another option would be to try to completely copy the gensim.summarization source code files to some new location inside your own project – performing whatever (few, minor) edits are necessary to ensure it works from the alternate location.
One of the reasons that functionality was removed was that its approach to things like tokenization was not consistent/integrated with other Gensim practices – which actually means it's likely to be a little easier to keep it working (given its use of its own idiosyncratic approaches) separately.
Personally, I'd rank these three options' desirability as:
(best) Section off the summarization tasks to be run via subprocess executions in a separate Python environment, which has only the older package installed.
(maybe ok) Copy the 10 .py files that implement gensim.summarization into your own local module. Edit lightly as necessary to ensure they still work. (That should mainly be updating import lines, but might require a few other adaptations to other Python 3.x/Gensim 4.x changes.)
(probably too messy) Install the whole old package to a non-standard directory, edit lots of files to ensure anything you're using still works.
Finally, note that the main reason the feature was removed is that it did not offer very impressive or adaptable results. While I've seen some people say it's worked OK for their applications, I've never seen even so much as a demo where its practices/algorithm – which can only extract some subset of important sentences, never paraphrase – gave impressive results.
So unless you already know that its approach works well for your needs, don't get your hopes up! Good luck.

How does a disassembler work and how is it different from a decompiler?

I'm looking into installing a disassembler (or decompiler) on my Linux Mint 17.3 OS and I wanted to know what the difference is between a disassembler and a decompiler. I have a rough idea of what they are (the names are fairly self-explanatory), but they are still a bit confusing.
I've read that a disassembler turns a program into assembly language, which I don't know, so it seems kind of useless to me. I've also read that a decompiler turns a 'binary file' into its source code. What exactly is a binary file?
Apparently, decompilers cannot decompile to C, only Python and other similar languages. So how can I turn a program into its original C source code?
A disassembler is a pretty straightforward application that translates machine code into assembly language statements. This is the reverse of the operation that an assembler performs, and it is straightforward because there is a strict one-to-one relationship between machine code and assembly. A disassembler targets a specific CPU; the original assembler that was used to create the executable is only of minor relevance.
A decompiler aims at recreating a compiled high-level-language program from machine code in its original form, thus attempting the reverse of what a C or Forth compiler does (these being popular languages for which decompilers exist). Because there are so many high-level languages, and thus so many ways in which the original high-level constructs could be expressed in machine code (with many different strategies for the same language and construct, even within the same compiler, and different strategies depending on compiler mode and situation), this operation is much more complex and very dependent on the original compiler (and maybe even on the command line that was used, its chosen optimization level and the compiler version).
Even if all of that matches, much of a decompiler's work is educated guessing, and it will most probably never reach the point where it can reconstruct the original program's source code 100%. Rather, it will end up with a version of source code that could have been the original program.
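To make that one-to-one mapping concrete, here is a tiny sketch using the Capstone disassembly library's Python bindings (assuming pip install capstone); the byte string is just a few hand-picked x86-64 instructions:

# Sketch: disassemble a few raw x86-64 machine-code bytes with Capstone.
from capstone import Cs, CS_ARCH_X86, CS_MODE_64

CODE = b"\x55\x48\x89\xe5\x5d\xc3"      # push rbp; mov rbp, rsp; pop rbp; ret

md = Cs(CS_ARCH_X86, CS_MODE_64)        # a disassembler targets one specific CPU
for insn in md.disasm(CODE, 0x1000):    # 0x1000 is an arbitrary load address
    print("0x%x:  %s  %s" % (insn.address, insn.mnemonic, insn.op_str))

Each byte sequence maps back to exactly one instruction; recovering the original C source, by contrast, is the guessing game described above.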

Is It Better Practice To Use package-name:symbol In Code Or :use :package-name In A DEFPACKAGE?

This is, I suspect, a matter of style and/or personal taste, but I thought I'd ask anyway.
I have been in the habit of defining packages thus:
(defpackage :wibble
  (:use :cl :drakma)
  (:export :main))
Once I have executed IN-PACKAGE (:wibble, in this case), I can then use the symbols in DRAKMA unadorned:
(http-request ...
Then I recently read that seasoned Lisp hackers would rather not :use but:
(drakma:http-request ...
Just wondered what the consensus of opinion was on here and whether there were any pros or cons (not that type of CONS :) ) either way?
Cheers,
Peter
When you use a package, there are a couple subtle ways things might go wrong if the used package changes.
First, the package might export more symbols in the future. If, for example, the package starts exporting a new symbol library:rhombus and you're already using myapp::rhombus to name something, you are suddenly using the inherited symbol, with all its possible attachments (e.g. classes, defuns, macros, etc.), with sometimes strange results. If you use qualified symbol names, you will not get any more or any less than the symbols you want.
Second, the package might stop exporting symbols in the future. So if, for example, library:with-rhombus disappears, your call to (with-rhombus (42 42 42) ...) will suddenly get an error for an invalid function call (42 ...) rather than something that points directly to the source of the problem, the "missing" symbol. If you use qualified symbol names, you will get an error along the lines of Symbol WITH-RHOMBUS is not exported from the LIBRARY package which is clearer.
Importing symbols (with :import-from or :shadowing-import-from or import) is not without its own trouble. Importing works on any symbol, regardless of whether it's external or not. So it could be the case that the symbol is now library::rhombus, i.e. not intended for public consumption any more, but importing will still work with no errors.
Which option you use depends on your comfort level with the source package. Do you control it, and you will not make any conflicting changes without thorough testing? Go ahead and import or use to your heart's content. Otherwise, be careful about checking for unintended side-effects as library package interfaces change.
This is more a style issue, so it's impossible to categorize it in black and white, but here are the pros and cons:
Using package-qualified symbols.
Avoids symbol conflicts.
Lets you clearly distinguish foreign symbols.
Lets you easily search, replace, copy, ... uses of a certain symbol from the external library (for refactoring, extracting the code to some other place, etc.).
Makes code uglier, but only when library names are too long. (For example, I add the nickname re to cl-ppcre, and the code using it is even better than without qualification: think re:scan.)
Importing the whole package
Basically, the opposite of the previous case. But I tend to use it with utility libraries, because using qualified names often defeats their whole purpose of making code more concise and clear :)
:import-from package symbol
This is one option you forgot to mention. I think it may be useful when you use one or two very distinct symbols from a certain package very frequently. It also lets you import unexported symbols.
Good answers so far.
Another view is that a package and its symbols make up a language. If you think a symbol should be a part of this language, then you should make it available without the need to qualify it with another package - when programming in this language.
For example in the CLIM implementation there is a CLIM-LISP package which sets up the implementation language. It is a variant of the COMMON-LISP package. Then there are packages like CLIM-SYS (resources, processes, locks, ...), CLIM-UTILS (various utilities and extensions of Common Lisp) and CLIM itself. Now in a new package SILICA (an abstract window system) these four packages are used. The implementation of Silica thus is implemented in a language which is built as a union of two languages (the Common Lisp variant CLIM-LISP and the UI commands of CLIM) plus two utility packages which extend CLIM-LISP with some facilities.
In the above example it makes sense to use the packages, since they extend each other to form a new language, and the implementation in that new package makes heavy use of them.
If you had a package which needs conflicting packages, then it would not make sense to use them. For example, a package could use drawing commands tailored to a GUI and drawing commands for PostScript output. They would have similar names, and using both would lead to conflicts. You also want to make clear to the human reader of the source code where these symbols come from. Is it a line-drawing command from a PostScript or a GTK+ library? It would be great if you could find that out easily, even though the function names are the same.
As a rule of thumb, I :use packages that extend the general language, but use qualified symbols for packages that have some special application. For example, I'd always :use alexandria, but refer fully qualified to symbols from Hunchentoot. When in doubt, I use qualified names.

Why is "package" keyword sometimes separated by a comment from the package name?

Analyzing sources of CPAN modules I can see something like this:
...
package # hide from PAUSE
Try::Tiny::ScopeGuard;
...
Obviously, it's taken from Try::Tiny, but I have seen this kind of comments between package keyword and package identifier in other modules too.
Why is this technique used? What is its goal, and what benefits does it have?
It is indeed a hack to hide a package from PAUSE's indexer.
When a distribution is uploaded to PAUSE, the indexer will examine each file in the upload, looking for the names of packages that are included in the distribution. Any indexed packages can show up in CPAN search results.
There are many reasons for not wanting the indexer to discover your packages. Your distribution may have many small or insignificant packages that would clutter up the search results for your module. You may have packages defined in your t (test) directory or some other non-standard directory that are not meant to be installed as part of the distribution. Your distribution may include files from a completely different distribution (that somebody else wrote).
The hack works because the indexer strictly looks for the keyword package and an expression that looks like a package name on the same line.
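Purely as an illustration (this is not PAUSE's actual code), a line-oriented scan like the following finds the one-line declaration but misses the one split across two lines:

# Illustration only, not PAUSE's real indexer: a line-by-line package scan.
import re

pattern = re.compile(r"^\s*package\s+([A-Za-z_][\w:]*)\s*;", re.MULTILINE)  # hypothetical

one_line = "package Try::Tiny::ScopeGuard;\n"
split_up = "package # hide from PAUSE\n    Try::Tiny::ScopeGuard;\n"

print(pattern.findall(one_line))  # ['Try::Tiny::ScopeGuard']
print(pattern.findall(split_up))  # [] -- the name is no longer on the `package` line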
Nowadays, you can include a META.yml file with your distribution. The PAUSE indexer will look for and respect a no_index specification in this file. But this is a relatively new capability of the indexer so older modules and old-timer CPAN contributors will still use the line break hack.
Here's an example of a no_index spec from Forks::Super
no_index:
  directory:
    - t
    - inc
  package:
    - Sys::CpuAffinity
    - Signals::XSIG
    - Signals::XSIG::Default
    - Signals::XSIG::TieArray56
Sys::CpuAffinity and Signals::XSIG are separate distributions that are also packaged with Forks::Super. Some of the test scripts contain package declarations (e.g., Arbitrary::Test::Package) that shouldn't be indexed.
Okay, here's another shot at this phenomenon... I've been whacky-hacking Perl for a dozen years, and I've rarely seen this packy hack; possibly I simply ignored it and never bothered to investigate. One thing seems clear, though. There's some hackish processing going on at PAUSE that's been crafted in the good ol' Perl'n'UNIX school of thought, and it without a shadow of a doubt involves line-oriented text parsing. So they parse those Perl files, possibly even using grep rather than perl itself, who knows, to extract package names and then kick off some procedure or gather some stats or whatnot. To trip up this procedure and hack around its ways, the author splits the package declaration across two lines, so the hacky packy grep job doesn't have a clue that there's a package declared right under its nose, the programmer is happy about his hacky skills, and the PAUSE stats or whatever it is they're cobbling together are as they should be. Does that make sense?

LISP In Small Pieces - best LISP environment to run code in?

Christian Queinnec has written a masterpiece called LISP In Small Pieces, which features eleven Lisp Interpreters and two Lisp compilers.
When you go to download the code from the website here - it has the comment:
The programs of this book are available on the net.
These programs used to run with some Scheme systems around 1994.
Any idea:
(a) What Scheme systems these ran on at the time, and more importantly;
(b) What Scheme systems these would run on today?
There are a lot of programs in there. I did a few tests to see how well I could answer this without having to try them individually. There are 131 files in the tarball with extension ".scm", but there appear to be Scheme programs with other extensions such as .bgl as well. So I did a search for files containing 'L i S P' in the first five lines, which yields 173 files. I tried running all of these on my preferred Scheme implementation (Guile). 31 of them run without error, and almost all of those are in the "src" directory, so the implementation-specific programs really do seem implementation-specific. Let's look at one of the src/ files that failed, "chap9z.scm". It chokes on define-abbreviation. I don't know the origin of this symbol, but it's not defined anywhere in Guile; however, all of its uses could be handled by Guile's syntax-rules.
Some Scheme implementations that existed in 1994 are still around and maintained: Scheme 48, Chez Scheme, Gambit, Bigloo, MIT Scheme and SCM.
Probably the code from LiSP will run in other modern Scheme systems such as Guile or Larceny.
Personally, I would recommend using Racket. Most likely, much of the code will run in #lang racket with no changes, and there's no requirement to use [] (though they may make your code easier to read :). Things that don't work are probably easy to fix, and you can also use the R5RS language implementation provided by Racket, which will likely work for all of the code.
(a) What Scheme systems these ran on at the time
The Makefile in the source tarball from the author's website has targets for running the code under bigloo, elk, gambit, mit-scheme, scheme2c, and scm.
The Makefile mentions SCM 4e1 and Bigloo 1.9d as known working versions, though I haven't tested them myself. I didn't find any mention of specific versions for the other schemes.
(b) What Scheme systems these would run on today?
The code in this github repo has been updated so that almost all of the tests in the included test suite pass with current (as of 06/2014) versions of bigloo, gambit, and mit-scheme.
If you just want to be able to run the code and follow along with the book, one of those schemes should work for you.
[full disclosure: I'm the owner of the repo and I'm a Scheme noob. The code in the repo is WOMM certified, but your mileage may vary.]
If, on the other hand, you're not content to use bigloo / gambit / mit-scheme, it shouldn't be too hard to add support for guile / racket / insert-favorite-scheme-here. Use one of the book.* files as a starting point, e.g. gambit/book.scm or mitscheme/book.mit. If you can get a version of book.scm to load in your favorite scheme, then have a look at the test.interpreters make target, and finally the grand.test target to verify things are working as expected.
The included README file states:
These files were tested with a Scheme interpreter augmented with
a test-suite driver (tester.scm),
the define-syntax and define-abbreviation macros (using
Dybvig's syntax-case package),
and an object system: Meroonet (meroonet.scm).
Bigloo, Scheme->C, Gambit, Elk or SCM can be used. The first three are
better since a specialized interpreter may be built that contains a
compiled Meroonet and compiled hygienic macros.
Apparently Appleby has posted an updated version of the source code. Racket is missing though )=
See https://github.com/appleby/Lisp-In-Small-Pieces