How can I get poppler to use the extra encoding data in a non-standard directory?

I'm testing pdftotext as part of poppler. It came pre-installed on the shared host that I'm using. I'd like to add the poppler encoding data, which contains the language packs needed to fix errors such as "Missing language pack for 'Adobe-GB1' mapping".
Since it doesn't appear that I have permission to install the encoding data in the recommended directory on my shared host (/usr/share/poppler), how can I tell poppler where I've placed the data so that it will get used?
The pdftotext man page doesn't appear to describe any flags that would point to the data packages.
pdftotext resides at /usr/bin/pdftotext

Sorry, that seems not to be possible.
Assuming that your hoster uses Linux, the source file relevant to reading the encoding data is GlobalParams.cc. Lines 170 and 677 deal with initializing the base data directory and the paths for encodings, and they use a compile-time hard-coded value for the base directory. There is no facility to pass such a path to poppler at runtime (and, as a consequence, to any program depending on it), and there is no user-writable configuration file to change that behavior, as there is for many other Linux software packages.
Your best chance is to ask your hosting provider to install the encoding data for you. Many smaller providers respond favorably to such requests. Otherwise, you'll have to change providers.

Related

Configure dependencies in RPM

I have built an RPM package for CentOS 6.6 that is installed on a customer's machine.
This package contains our own software, customized for the specific use case, but it also uses the open-source package HAProxy.
HAProxy (RPM version 1.5.4-2.el6_7.1) comes with a default configuration in /etc/haproxy/haproxy.conf, and it cannot be customized without changing this file.
But I want the configuration to be part of my generated package. RPM throws an error if /etc/haproxy/haproxy.conf is in my package, because it is also part of the haproxy package.
I have worked around this problem by providing a custom upstart script which starts HAProxy with a different config file, but this does not seem to be the right way to do it.
Is there a preferred way to handle such customizations?
In cases like this, I've created an RPM which installs configuration files into a different subdirectory, and in its %post and %preun scriptlets modifies the uncooperative package's config-files:
when installing, I renamed the original config-files, and made symbolic links from those pathnames to the overwriting config-files, and
when uninstalling, the package removed the symbolic links and restored the original package's files.
Doing it that way of course meant that my config-RPM was dependent on the original RPM. A little awkward to describe, but it works.
In follow-up, the issue of updating was mentioned. Updating an RPM requires special handling to avoid uninstalling things. The rpm program passes a parameter $1 which you can test in the %pre and %preun scriptlets to notice that this is an upgrade and that there is no need to save the original config-files (or restore them). The rest of the scriptlet stays the same: it copies the new versions of your config-files over the others.
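For concreteness, here is a minimal sketch of such %post/%preun scriptlets; the paths and the overriding package's layout are hypothetical, not taken from the setup above:

# in the config RPM's spec file (sketch; adjust paths to your package)
%post
if [ "$1" -eq 1 ] ; then                        # 1 = fresh install, 2+ = upgrade
    mv /etc/haproxy/haproxy.conf /etc/haproxy/haproxy.conf.orig
    ln -s /etc/mypkg/haproxy.conf /etc/haproxy/haproxy.conf
fi

%preun
if [ "$1" -eq 0 ] ; then                        # 0 = full removal, 1 = upgrade
    rm -f /etc/haproxy/haproxy.conf
    mv /etc/haproxy/haproxy.conf.orig /etc/haproxy/haproxy.conf
fi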
Further reading:
Defining installation scripts (shows the use of $1)
RPM upgrade uninstalls the RPM
Your approach is correct. On EL6 with SysV init there is no choice other than creating a custom haproxy package, a custom haproxy service, or a script which the customer runs after installation. I see creating another service as the best option.
Note that on EL7 with SystemD you have a much better option, as you can use SystemD's drop-in feature. For more information see:
https://coreos.com/os/docs/latest/using-systemd-drop-in-units.html
https://wiki.archlinux.org/index.php/systemd#Drop-in_snippets
https://wiki.archlinux.org/index.php/Systemd/User#Service_example
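As a minimal sketch of the drop-in idea (the file name and the overriding config path are hypothetical), a package could ship something like:

# contents of /etc/systemd/system/haproxy.service.d/override.conf (hypothetical)
[Service]
ExecStart=
ExecStart=/usr/sbin/haproxy -f /etc/mypkg/haproxy.cfg

The empty ExecStart= line clears the value from the original unit before setting the new one; after adding a drop-in, run systemctl daemon-reload so SystemD picks up the change.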
The usual way this is done is to have a drop-in configuration directory, e.g. /etc/httpd/conf.d/, where your package would drop its configuration, and you would tell the other daemon, e.g. httpd, to do a graceful restart in your %post/%postun.
I don't know anything about HAProxy, but a quick search implies that they do not support this configuration directory concept that has been around for many years. A few people have hacked it in, but unless it is out-of-the-box, you will run into your original problem again.
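For a daemon that does support such a directory, the packaging side is small. A hedged sketch for the Apache example above (the file name is illustrative):

# in your package's spec file
%files
/etc/httpd/conf.d/mypkg.conf

%post
apachectl graceful >/dev/null 2>&1 || :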

Packing a Tcl program for deployment in Linux

We are getting ready to deploy a Tcl application, but I'm having trouble figuring out how to do it. Currently, I'm experimenting with tclkit and sdx.kit. I can pack a single Tcl file and run it, but the whole application contains folders, images, and C files that work together with the Tcl code. I have two folders containing a bunch of C files, Tcl files, and other stuff. How would I go about wrapping the whole thing? What tool do you guys recommend other than tclkit, and why?
The main way that you're recommended to distribute applications is as a tclkit. There are a few alternatives (e.g., TOBE, ActiveState's commercial tooling) but they're pretty similar as they all build on top of Tcl's virtual filesystem layer. (NB. This isn't the same as the Linux VFS stuff; this is a VFS in a single application.) Indeed, the ActiveState tooling actually is a rebadged tclkit (plus some other stuff like code obfuscation). I believe that TOBE uses ZIP archives instead of metakit databases.
The advantage of using a VFS-based solution is that lots of things work inside, particularly including both source (for getting another .tcl file in) and load (for getting a binary library). In fact, you can put your application, the packages it depends on, and the resources (images, etc.) inside the VFS and be fairly sure that things will work. About the only things that we know run into real problems are where you want to exec something in the archive (the VFS mount is process-local; you have to copy the subsidiary file out if you want it to be seen in subprocesses) and if you're wanting to load certificates or private keys with the tls package (because the underlying OpenSSL library doesn't delegate to Tcl to handle that part of its I/O for some reason, AIUI).
When you're building these things, effectively you make a directory (and its subdirectories) that have everything laid out right. Then you run the packager (sdx for tclkits) and it builds the overall application for you. Attach the result to a runtime (the standard tclkit) and you're ready to test and deploy.
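A rough sketch of that workflow with sdx (all names are placeholders; check the documentation of your sdx copy for the exact options):

# lay out the application: myapp.vfs/main.tcl, myapp.vfs/lib/..., images, etc.
sdx wrap myapp.kit                       # wrap myapp.vfs into a starkit
tclkit myapp.kit                         # test it against a tclkit runtime
sdx wrap myapp -runtime tclkit-linux     # or build a self-contained starpack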
We don't generally do tool recommendations here on Stack Overflow, but the ActiveState Tcl Dev Kit is actually rather widely used. Many other people use sdx/tclkit. TOBE is quite a lot rarer. (There are other packaging techniques, but I wouldn't recommend them these days; a packaged VFS works very well indeed.)

Internal CPAN - what module

I want to set up an in-house CPAN for distributing our internal code.
So I was looking at CPAN::Mini, as recommended here. But it looks like there are other options such as CPAN::Site, CPAN::Dark, Dist::Zilla ...
I'm a little bit overwhelmed by all these options. What do people mostly use/recommend?
What I need is a way to push internal modules to a repository which can be accessed from several machines.
The quick answer is that you want to use CPAN::Mini to create a local mirror of all that is current on the CPAN, and then CPAN::Mini::Inject to add your own distributions to it.
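A minimal sketch of that combination, assuming the minicpan and mcpani command-line tools from those distributions are installed and configured (module name, author ID, and paths are placeholders):

minicpan -l /srv/mycpan -r http://www.cpan.org/       # build/refresh the local mirror
mcpani --add --module Local::Module --authorid MYCO \
       --modversion 0.01 --file Local-Module-0.01.tar.gz
mcpani --inject                                       # merge it into the mirror's index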
The long answer is that it helps to understand how a CPAN mirror is constructed. Broadly speaking, it is simply a directory that contains two sub-directories.
The 'modules' directory contains in turn two files. One is 03modlist.data.gz, whose contents are ignored by modern CPAN clients, but there's legacy code that assumes the file exists, so just copy it from an existing mirror. The other is 02packages.details.txt.gz, which I shall describe later.
The 'authors' directory contains a file '01mailrc.txt.gz' which is another relic of the past whose contents can be ignored, so just copy it from another mirror, and it contains the 'id' directory. This in turn contains sub-directories and distributions, whose names follow a pattern. For example, my PAUSE id is DCANTRELL, and one of my distributions is XML-Tiny-2.06.tar.gz, so that file lives at .../authors/id/D/DC/DCANTRELL/XML-Tiny-2.06.tar.gz.
The 02packages.details.txt.gz file is the index that maps module names to distributions, and this must be up to date for your mirror to work properly. It consists of a few header lines, which must be present and correct, followed by a blank line, followed by one line for each module. Those lines are three fields separated by spaces:
module name
module version
distribution filename
eg
XML::Tiny 2.06 D/DC/DCANTRELL/XML-Tiny-2.06.tar.gz
(you may also see .tgz, .zip, and a coupla others)
A distribution may appear in several lines, once for each module it contains. eg
XML::Tiny::DOM 1.1 D/DC/DCANTRELL/XML-Tiny-DOM-1.1.tar.gz
XML::Tiny::DOM::Element 1.1 D/DC/DCANTRELL/XML-Tiny-DOM-1.1.tar.gz
In a normal CPAN mirror, there may be several versions of a distribution, and several versions of a module - for example, the current version and a few older ones, or the current stable one and a dev one. The index file contains the most recent stable version. You can tell dev versions of distributions because they have an underscore in their version, or contain the string '-TRIAL'.
So, knowing all that, you can construct a CPAN-a-like that contains only your code. But using CPAN::Mini and CPAN::Mini::Inject to add your stuff to a "real" CPAN is less work.
Once you've created your CPAN-a-like, you can either expose it over HTTP and access it using any client as normal, or you can just keep it in the filesystem and configure the CPAN client to access it using a file:/// URL.
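For example (the path is illustrative), a client can be pointed at a filesystem-local mirror like this:

cpanm --mirror file:///home/me/mycpan --mirror-only Some::Module
# or, in the CPAN.pm shell:
#   o conf urllist push file:///home/me/mycpan
#   o conf commit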
You might also consider Pinto. Pinto allows you to curate your own stable CPAN repository, which can contain any number of both public and private distributions. Pinto also helps you to manage change as your dependencies evolve over time.
DrHyde gave a very nice answer to the question. But if you don't want to maintain a CPAN mirror, you can use MyCPAN::App::DPAN together with MyCPAN::Indexer.
Caveat: Both distributions are under development. Not all combinations will work. What I use is the latest version of MyCPAN::App::DPAN on GitHub (1.28_11) and MyCPAN::Indexer version 1.28_10 (later versions don't work with MyCPAN::App::DPAN).
MyCPAN::App::DPAN will create a CPAN-like directory structure on your local disk from the distributions you feed it. You will need to create a config file for it (say, .dpanrc):
# contents of .dpanrc
indexer_id Edward Baudrez <my.email.address@example.org>
dpan_dir /home/ebaudrez/rsync.net/dpan
merge_dirs /home/ebaudrez/rsync.net/dpan/dists
report_dir /home/ebaudrez/rsync.net/dpan/indexer_reports
Put your distribution tarballs in the directory merge_dirs (I think there's no reason that directory should reside under dpan_dir, but I'm too lazy to figure it out right now). Then call dpan:
dpan -f $HOME/.dpanrc
dpan will create a CPAN-like structure in dpan_dir (containing, in particular, authors and modules). This directory can then be used with cpanm (for instance):
cpanm --mirror $HOME/rsync.net/dpan --mirror http://search.cpan.org/CPAN
Note that I use real CPAN as fallback, because the DarkPAN is by definition incomplete. If you also happen to have a mini CPAN mirror, you can also use it here:
cpanm --mirror $HOME/rsync.net/dpan --mirror $HOME/mirrors/minicpan --mirror-only
Do note that, for this scheme to work, you will need to create distribution tarballs from your source code. I like and use Dist::Zilla, but note that you can also generate tarballs from Makefile.PL, so you definitely don't need to use Dist::Zilla. But it takes care of a lot of the details.
Creating a real distribution from your source code may seem like a lot of work, but Dist::Zilla helps lift the burden, and the transition to a real CPAN module, some day in the future ;-), is also simplified a lot when you already have a distribution.
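As a sketch of that flow (the distribution name is hypothetical), building a tarball with Dist::Zilla and feeding it to dpan looks roughly like:

cd My-Module/
dzil build                                      # produces My-Module-0.01.tar.gz
cp My-Module-0.01.tar.gz ~/rsync.net/dpan/dists/
dpan -f $HOME/.dpanrc                           # re-index the DarkPAN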

Automating Solaris custom software deployment and configuration for multiple nodes

Essentially, the question I'd like to ask is related to the automation of software package deployments on Solaris 10.
Specifically, I have a set of software components in tar files that run as daemon processes after being extracted and configured in the host environment. Pretty much like any server side software package out there, I need to ensure that a list of prerequisites are met before extracting and running the software. For example:
Checking that certain users exist and are associated with one or more user groups. If not, create them and their group associations.
Checking that target application folders exist and if not, then create them with pre-configured path values defined when the package was assembled.
Checking that such folders have the appropriate access control level and ownership for a certain user. If not, then set them.
Checking that a set of environment variables is defined in /etc/profile, points to predefined path locations, is added to the general $PATH environment variable, and is finally exported into the user's environment. Other files involved include /etc/services and /etc/system.
Obviously, doing this for many boxes (the goal in question) by hand will certainly be slow and error prone.
I believe a better alternative is to somehow automate this process. So far I have thought about the following options, and discarded them for one reason or another.
Traditional shell scripts. I've only troubleshot these before and don't really have much experience writing them. They would be my last resort.
Python scripts using the pexpect library for analyzing system command output. This was my initial choice, since the target Solaris environments have it installed. However, I want to make sure that I'm not reinventing the wheel again :P.
Ant or Gradle scripts. They may be an option since the boxes also have Java 1.5 enabled, and the fileset abstractions can be very useful. However, they may fall short when dealing with user and folder permissions checking/setting.
It seems obvious to me that I'm not the first person in this situation, but I don't seem to find a utility framework geared towards this purpose. Please let me know if there's a better way to accomplish this.
I thank you for your time and help.
Most of those steps sound like things handled by use of a packaging system to install your package. On Solaris 10, that would be the SVR4 packaging system included with the OS.
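A minimal sketch of an SVR4 package build (every name and path below is a placeholder; see pkgmk(1) and pkgadd(1M) for details). The preinstall script is where you would create users/groups and check prerequisites:

# contents of pkginfo (package metadata)
PKG=MYdaemon
NAME=My daemon package
VERSION=1.0
ARCH=sparc
CATEGORY=application
BASEDIR=/opt/mydaemon

# contents of prototype (file list plus install scripts)
i pkginfo
i preinstall
d none bin 0755 daemonuser daemongrp
f none bin/mydaemon 0755 daemonuser daemongrp

# build the package, then install it on each node
pkgmk -o -d /var/spool/pkg
pkgadd -d /var/spool/pkg MYdaemon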

How do I use rpm to update/replace existing files?

I have several applications that I wish to deploy using rpm. Some of the files in my application deployments override files from other deployed packages. Simply including the new files in the deployment package will cause rpm conflicts.
I am looking for the proper way to use rpm to update/replace already installed files.
I have already come up with a few solutions but nothing seems quite right.
Maintain custom versions of the rpms containing the original files.
This seems like a large amount of work for a relatively small reward even though it feels less like a hack than some of the other possible solutions.
Include the files in the rpm with another name and copy them over in the post section.
This would work but will mean littering the system with multiple copies of the files. Also it means additional maintenance in the rpm build spec for each file.
Use wget in the post section to replace the original files from some known server.
This is similar to the copy technique but the files wouldn't even live in the rpm. This might act like a nice central configuration authority though.
Deploy the files as new files, then use symlinks to override the originals.
This is also similar to the copy technique but with less clutter. The problem here is that some files don't behave well as symlinks.
To the best of my knowledge, RPM is not designed to permit updating / replacing existing files, so anything that you do is going to be a hack.
Of the options you list, I'd choose #1 as the least bad hack if the target systems are systems that I admin (as you say, it's more work but is the cleanest solution) and a combination of #2 and #4 (symlinks where possible, copies where not) if I'm creating the RPMs for others' systems (to avoid having to distribute a bunch of RPMs, but I'd make it very clear in the docs what I'm doing).
You haven't described which files need to be updated or replaced and how they need to be updated. Depending on the answers to those questions, you may have a couple of other options:
Many programs are designed to use a single default configuration file and also to grab configuration files from a .d subdirectory. For example, Apache uses /etc/httpd/conf/httpd.conf and /etc/httpd/conf.d/*.conf, so your RPMs could drop files under /etc/httpd/conf.d instead of modifying /etc/httpd/conf/httpd.conf. And if the files that you need to modify are config files that don't follow this pattern but could be made to, you can suggest to the package maintainers that they add this capability; this wouldn't help you immediately but would make future releases easier.
For command-line utilities like sendmail and lpr that can be provided by multiple packages, the alternatives system (see man alternatives) permits more than one RPM that provides these utilities to be installed side by side (see the sketch after this list). Again, if the files that you need to modify are command-line utilities that don't follow this pattern but could be made to, you can suggest to the package maintainers that they add this capability.
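As a hedged illustration of the alternatives mechanism (the paths and priorities are examples, not taken from any particular package):

# register two providers of the 'mta' alternative behind /usr/sbin/sendmail
alternatives --install /usr/sbin/sendmail mta /usr/sbin/sendmail.postfix 30
alternatives --install /usr/sbin/sendmail mta /usr/sbin/sendmail.exim 20
alternatives --config mta        # interactively pick which one wins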
Config file changes on systems that you administer are better managed through a tool like Cfengine or Puppet rather than through custom RPMs. I think that Red Hat favors Puppet.
If I were creating the RPMs for systems I don't administer, I'd consider using a third-party tool like Bitrock and dumping all of my stuff under /opt just so I wouldn't have to stomp on files installed by other admins' RPMs.
Edit (2019): Nowadays, Software Collections offers a useful alternative. You can create packages that install somewhere under /opt, and the Software Collections tools offer a standardized way for users to opt in to using those instead of whatever's normally installed under /usr. Red Hat uses this to distribute newer versions of tools for their otherwise stable and long-lived (i.e., older) Red Hat Enterprise Linux distributions.
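On the consumer side, opting in to a collection is a one-liner; for example (the collection name is hypothetical):

scl enable mycollection bash                  # start a shell with the collection's paths in effect
scl enable mycollection -- mydaemon --flags   # or run a single command under it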
You can also execute rpm -U --replacefiles --replacepkgs ..., which will give you what you want.
See here for more info on RPM %files directives:
http://www.rpm.org/max-rpm/s1-rpm-inside-files-list-directives.html
You can use the argument ($1) passed to the %pre, %post, %preun, and %postun scriptlets to determine whether you are installing, upgrading, or removing the package.
If $1 is 0, we're removing old stuff: zero instances of the package will remain installed.
If $1 is 1, we're installing new stuff: a total of one instance will be installed.
If $1 is 2 or more, we're upgrading: $1 counts the instances installed so far, including the one being installed.
These sections help with managing files among the versions.
Keep track of what you're doing between versions and consider what one might do if they were to skip a version or two.
Have consideration for these things and you should be good to go!
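A minimal sketch of how these tests typically look in a spec file (the cleanup path is hypothetical):

%post
if [ "$1" -ge 2 ] ; then
    : # upgrade: migrate anything the previous version left behind
fi

%postun
if [ "$1" -eq 0 ] ; then
    rm -f /var/lib/mypkg/state    # last instance removed: clean up runtime state
fi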