Config file handling in Perl

Config file handling in Perl - perl

There are plenty of Modules in the Config:: Namespace on CPAN, but they are all limited in ond way or another.
I'm currently using Config::Std, which is fine most of the time, however it makes certain things difficult:
more than two levels of nested directives
handling of multiple values per key
conf.d directories, i.e. multiple config files which are merged into one big config hash
Config::Std generates a blessed hashref after parsing the config, so all my applications are coded to use a hashref for configuration. I'd prefer not having to change this.
What I am looking for is a universal, lightweight Config Module that produces a hashref.
My Question is: Which Config Modules should I consider for replacing Config::Std?

Config::Any (for loading several files and flattening to a hash) and its Config::General backend (for arbitrarily nested configuration items and multiple values per key à la Apache httpd)

You didn't state where your data is coming from. Are you reading in a configuration file and running into the limit of the configuration file itself?
Config::Std is a great module. However, it was meant to read and write Windows Config/INI files, and Windows Config/INI files are very flat and simple formats. Thus, I wouldn't expect Config::Std to do much more.
If you're using Windows Config/INI files right now, but may need to read more complex data structures in the future, Config::Any is a good way to go. It'll handle Windows Config/INI files and using the same programming interface, read and write XML, YAML, and JSON file structures too.
If you're merely trying to keep a complex data structure in your program and don't care about reading and writing configuration files, I would recommend looking at XML::Simple for the very simple reason that it is ...well... simple and can handle all sorts of data structures. Plus, XML::Simple is a very commonly used module, so there's lots of help on the Internet if you have any questions about the module, and it is actively supported.
You could use Config::Any, but I find it more complex to use, and harder to configure. In fact, you have to install XML::Simple (or a similar module) in order to use it. The advantage of Config::Any is that it is a single interface for all sorts of configuration file formats. That way, you don't have to hack through your program if you decide to switch form Windows Config/INI to XML or YAML.
So, if you're working with Windows Config/INI files now, and need a more complex data structure: Look at Config::Any.
If you're merely wanting a simple way to track complex data structures, look at XML::Simple.

YAML will handle that and more.
And here's the website for the protocol.

Related

How can I have 2 verions of Gensim for summarization in one Jupyter notebook?

I want to have 2 versions of Gensim for using summarization and keyword function from old Gensim.
How can I setup this senario?

In general, a single Jupyter notebook is backed by a single Python interpreter/environment, and popular packages at their 'official' installation paths can only be installed once.
There are a few hackish workarounds suggested in answers like:
Installing multiple versions of a package with pip
However, each workaround presents operational problems.
One approach is to install the older package to a non-standard path (directory) that's still found by Python importing logic (controlled by PYTHONPATH). For example, put/move the older copy of Gensim to a gensim_old package directory. But: this is only likely to work well with very sime (single-.py-file) packages.
With any signficant library (like Gensim) which cross-imports a lot of things from its own utility modules, using the standard paths, lots of things are likely to break unless you dig into all involved individual files to change their import paths. That's kind of kludgey & hard-to-maintain. (Though, to the extent you're just using one old version, say gensim-3.8.3 for the removed summarization feature, perhaps it'd be worth fighting through this process once, then keeping the changes around.)
Another approach is to create a totally-separate Python environment with the alternate version, and only use that other environment from the notebook by a system-call – via either something in Python-code like subprocess.call(), or the notebook-cell ! or !! magic-escapes to run a shell command. That is, you give up the ability to run individual interactive lines of Python in that alt environment - but could still send it batches of data, and either capture the console output or observe its output files to continue processing in your notebook.
I'd expect this to be a better option – cleaner & more-maintainable – provided that either the old-version-functionality (summarization) or new-version-functionality (whatever else) can be condensed into one (or a few) single-step scripts.
Another option would be to try to completely copy the gensim.summarization source code files to some new location inside your own project – performing whatever (few, minor) edits are necessary to ensure it works from the alternate location.
One of the reasons that functionality was removed was that its approach to things like tokenization was not consistent/integrated with other Gensim practices – which actually means it's likely to be a little easier to keep it working (given its use of its own idiosyncratic approaches) separately.
Personally I'd rank these three options desirability as:
(best) Section off the summarization tasks to be run via subprocess executions in a separate Python environment, which has only the older package installed.
(maybe ok) Copy the 10 .py files that implement the gensim.summarization' to your own local module. Edit lightly as necessary to ensure they still work. (That should mainly be updating import` lines, but might reuire a few other adaptations to other Python 3.x/Gensim 4.x changes.)
(probably too messy) Install the whole old package to a non-standard directory, edit lots of files to ensure anything you're using still works.
Finally, note that the main reason the feature was removed is that it did not offer very impressive or adaptable results. While I've seen some people say it's worked OK for their applications, I've never seen even so much as a demo where its practices/algorithm – which can only extract some subset of important sentences, never paraphrase – gave impressive results.
So unless you already know that its approach works well for your needs, don't get your hopes up! Good luck.

Embed perl inside YAML file

I want to create YAML config. files inside which I can put in perl code to reduce the effort of writing the same blocks multiple times. Or for any other operation. I would like to have something like this inside my YAML file:
---
`my $a = 1;`
`if($a){`
foo: bar
`}else{`
foo: baz
`}
...
As is obvious, I want key 'foo' to be assigned the value 'bar' in this example.
Is this a good or a bad idea? If latter, what's the best way to write the file and the pre-processor for such a file?

This is a not a good idea, but CAN be made better.
You have two approaches:
Treat your YAML / JSON file as a template for one of standard Perl templating systems.
Embperl is a good candidate in my opinion, but that's because it's what I'm very familiar with. Any templating framework that supports control flow will fit.
This way, you don't need to write your own language/syntax; AND you get a produced final YAML/XML/JSON in pure form so there's no need for a custom parser.
Create a Perl-based format.
E.g. your config is a chunk of Perl code that is in essence an eval-able (or do-able) code chunk resulting in a config hash.
The result hash structure ca be anything you want, but if you wish it to be, it can mirror the original JSON/YAML, and can be converted to one using standard conversion tools.

How to split long Perl code into several files without too much manual editing?

How do I split a long Perl script into two or more different files that can all access the same variables - without having to rename all shared variables from e.g. $count to $::count (or $main::count which is the same)?
In other words, what's the best and simplest way to split the Perl script into several files without having to import a lot of variables/functions and/or do a lot of manual editing.
I assume it has something to do with making the code part of the same package/scope/namespace, but my experiments so far have failed.
I am not sure it makes a difference, but the script is used for web/CGI purposes and will be running under mod_perl.
EDIT - Background:
I kind of knew I would get that response. The reason I want to split up the file is the following:
Currently I have a single very old and very long Perl file. I know it is not following Perl best practices but it works.
The problem is, I need to distribute the data files it uses between different web servers, first of all for performance reasons. There will be one "master" server and one or several "slaves".
About 20% of the mentioned Perl file contains shared functions, 40% has the code need to run on the master server and 40% on the slave servers. Therefore, I would like to split the code into three files: 1. shared, 2. master-only, 3. slave-only. On the master server, 1 and 2 will be loaded, on the slaves, 1 and 3 will be loaded.
I assume this approach would use less process RAM and, more importantly, I would minimize the risk of not splitting the code correctly (e.g. a slave process calling a master data file). I don't see a great need for modularization, as the system works and the code does not need a lot of changes or exchanges with other projects.
EDIT 2 - Solution:
Found the solution I was looking for here:
http://www.perlmonks.org/?node_id=95813
In cases where the main package is in ownership of the variable, the
actual word 'main' can be ommitted to yield something like: $::var
It is possible to get around having to fully qualify variable names
when strict is in use. Applying a simply use vars to your script, with
the variable names as it arguments will get around explicit package
names.
Actually, I ended up repeating the our ($count, etc...) statement for the needed variables instead of use vars ();
Do let me know if I am missing something vital - apart from not going with modules! :)
#Axeman, Thanks, I will accept your answer, both for your effort and for sending me in the right direction.

Unless you put different package statements in their files, they will all be treated as if they had package main; at the top. So assuming that the scripts use package variables, you shouldn't have to do anything. If you have declared them with my (that is, if they are lexically scoped variables) then you would have to make sure that all references to the variables are in the same file.
But splitting scripts up for length is a rotten substitute for modularization. Yes, modularization helps keep code length down, but modularization if the proper way to keep code length down--for all the reasons that you would want to keep code-length down, modularization does it best.
If chopping the files by length could really work for you, then you could create a script like this:
do '/path/to/bin/part1.pl';
do '/path/to/bin/part2.pl';
do '/path/to/bin/part3.pl';
...
But I kind of suspect that if the organization of this code is as bad as you're--sort of--indicating, it might suffer from some of the same re-inventing the wheel that I've seen in Perl-ignorant scripts. Just offhand (I might be wrong) but I'm thinking you would be surprised how much could be chopped from the length by simply substituting better-tested Perl library idioms than for-looping and while-ing everything.

parsing different files of the same grammar and calculating file to file similarities

I've got a bunch of ACPI Source Language files and I want to calculate file to file similarities between them. I thought of using something like Perl's Parse::RecDescent
but I am stuck at:
1) Translating the ACPI Grammar (www.acpi.info/DOWNLOADS/ACPIspec40a.pdf) to something Parse::RecDescent would understand
2) Have a metric to compare 2 parsed files
Any ideas?

To get started with Parse::RecDescent you may look at Pro Perl Parsing, Ch. 5 or
at Advanced Perl Programming, Ch. 2
Xml Diff tools should be appropriate for comparing hierarchically structured data; perhaps you can apply such a tool to ASTs saved in XML format

So you have two problems:
Parsing ACPI to build an AST. This has the usual troubles of ensuring that you have a well defined grammar, that your parsing machinery can parse according to that grammar (often you have to bend a good grammar definition to enable the parsing machiney to process it), and building a corresponding AST. You will have these troubles with Perl parsing machinery, simply because it is a parsing engine.
Comparing the structure of the ASTs and producing a sensible answer. What you are likely to find here is that there is some literature describing roughtly how to do this (using e.g. Levenshtein distance), but that the details for ASTs matter. (Change distilling: Tree differencing for fine-grained source code change extraction Finally, having determined the distance, you need to print out the deltas in some readable form.
However, AFAIK, my company is the only one that has reduced this to practice. See our Smart Differencer tool. THe SmartDifferencers parse, build ASTs, and report changers in terms of ASTs elements moved, inserted, deleted, replaced, or modifiied by consistent identifier substitition. They depend on any underlying very strong GLR parsing engine which minimized the problems of accepting new grammars. They work for many common languages but not presently for ACPI.

How can I combine Catalyst and ngettext?

I'm trying to get my head around i18n with Catalyst. As far as I understood the matter, there are two ways to make translations with Perl: Maketext and Gettext. However, I have a requirement to support gettext's .po format so basically I'm going with gettext.
Now, I've found Catalyst::Plugin::I18n and thus Locale::Maketext::Lexicon, which does what I want most of the time. However, it doesn't generate proper pluralization forms, i.e. properly writing msgid_plural and msgstr[x] into the .pot file. This happens probably because Maketext depends on its bracket notation [quant,_1...] and thus has to have the same notation in the translation.
Yet another solution might be using some direct gettext port like Locale::Messages, however this would mean rewriting C::P::I18n.
Does anybody have a proper solution for this problem apart from rewriting several modules? Anything that combines proper gettext with all its features and Catalyst will do.

You will probably get a better answer on the mailing list:
http://lists.scsys.co.uk/cgi-bin/mailman/listinfo/catalyst
I assume you've also read this:
http://www.catalystframework.org/calendar/2006/18