Embed perl inside YAML file

Embed perl inside YAML file - perl

I want to create YAML config. files inside which I can put in perl code to reduce the effort of writing the same blocks multiple times. Or for any other operation. I would like to have something like this inside my YAML file:
---
`my $a = 1;`
`if($a){`
foo: bar
`}else{`
foo: baz
`}
...
As is obvious, I want key 'foo' to be assigned the value 'bar' in this example.
Is this a good or a bad idea? If latter, what's the best way to write the file and the pre-processor for such a file?

This is a not a good idea, but CAN be made better.
You have two approaches:
Treat your YAML / JSON file as a template for one of standard Perl templating systems.
Embperl is a good candidate in my opinion, but that's because it's what I'm very familiar with. Any templating framework that supports control flow will fit.
This way, you don't need to write your own language/syntax; AND you get a produced final YAML/XML/JSON in pure form so there's no need for a custom parser.
Create a Perl-based format.
E.g. your config is a chunk of Perl code that is in essence an eval-able (or do-able) code chunk resulting in a config hash.
The result hash structure ca be anything you want, but if you wish it to be, it can mirror the original JSON/YAML, and can be converted to one using standard conversion tools.

Related

Parsing yaml in perl --> Code: YAML_LOAD_ERR_BAD_MAP_ELEMENT

I have a yaml file which is generated from another source as shown below.
connect1:
connect2:
- { level1 : name, level2: age,
level3: gender}
My code looks something like this --?
use YAML::Tiny qw(LoadFile);
use YAML;
use YAML::Loader;
use YAML::Syck;
use YAML qw(LoadFile);
use Data::Dumper;
my $data = LoadFile("file.yaml");
my #config = $data->{connect1}->{connect2};
print Dumper(#config);
I'm getting this following error -->
YAML Error: Invalid element in map
Code: YAML_LOAD_ERR_BAD_MAP_ELEMENT
Line: 3
Document: 1
If level1 , level2 and leve3 are in the same line then I dont see this issue.
I see the issue because of indentation I think.
But there is no way that I can change this file.yaml.
So is there a way in perl that I can still parse this yaml file without modifying the file.yaml?

The state of YAML in Perl is a bit unfortunate, because there are several modules which support different features.
YAML::Syck, YAML::XS and YAML::PP can parse your example. YAML::XS is probably a good choice right now.
YAML.pm was the first perl module for YAML, and it was written for YAML 1.0.
YAML::Syck is based on libsyck, which was written for YAML 1.0. It can parse more than YAML.pm though.
YAML::XS is based on libyaml, written for YAML 1.1. You should be able to parse most YAML with it, and libyaml is used in (or was ported to) many other languages.
YAML::Tiny just supports a subset of YAML, which does not include flow collections { ... }, [ ... ] and aliases/anchors (&x, *x)
YAML::PP is pretty new and already can parse a lot, but it's also not complete yet. It aims to parse YAML 1.2 (and it will also partially support 1.1 in the future) Disclaimer: I'm the author
Here you can find my slides from the London Perl Workshop 2017:
https://perlpunk.github.io/slides.lpw2017/yaml-where-and-how-to-use/
Starting at slide 24 you'll find a quick overview over the 5 modules.

Looking at the cpan page
In exchange for this adding this extreme flexibility, it provides support
for only a limited subset of YAML. But the subset supported contains most
of the features for the more common uses of YAML.
So I think you might need to try a more complete YAML parser
If you didn't fancy that then you could investigate a command line utility to do the parsing, such as yq or another YAML to JSON conversion and process as JSON.

How to split long Perl code into several files without too much manual editing?

How do I split a long Perl script into two or more different files that can all access the same variables - without having to rename all shared variables from e.g. $count to $::count (or $main::count which is the same)?
In other words, what's the best and simplest way to split the Perl script into several files without having to import a lot of variables/functions and/or do a lot of manual editing.
I assume it has something to do with making the code part of the same package/scope/namespace, but my experiments so far have failed.
I am not sure it makes a difference, but the script is used for web/CGI purposes and will be running under mod_perl.
EDIT - Background:
I kind of knew I would get that response. The reason I want to split up the file is the following:
Currently I have a single very old and very long Perl file. I know it is not following Perl best practices but it works.
The problem is, I need to distribute the data files it uses between different web servers, first of all for performance reasons. There will be one "master" server and one or several "slaves".
About 20% of the mentioned Perl file contains shared functions, 40% has the code need to run on the master server and 40% on the slave servers. Therefore, I would like to split the code into three files: 1. shared, 2. master-only, 3. slave-only. On the master server, 1 and 2 will be loaded, on the slaves, 1 and 3 will be loaded.
I assume this approach would use less process RAM and, more importantly, I would minimize the risk of not splitting the code correctly (e.g. a slave process calling a master data file). I don't see a great need for modularization, as the system works and the code does not need a lot of changes or exchanges with other projects.
EDIT 2 - Solution:
Found the solution I was looking for here:
http://www.perlmonks.org/?node_id=95813
In cases where the main package is in ownership of the variable, the
actual word 'main' can be ommitted to yield something like: $::var
It is possible to get around having to fully qualify variable names
when strict is in use. Applying a simply use vars to your script, with
the variable names as it arguments will get around explicit package
names.
Actually, I ended up repeating the our ($count, etc...) statement for the needed variables instead of use vars ();
Do let me know if I am missing something vital - apart from not going with modules! :)
#Axeman, Thanks, I will accept your answer, both for your effort and for sending me in the right direction.

Unless you put different package statements in their files, they will all be treated as if they had package main; at the top. So assuming that the scripts use package variables, you shouldn't have to do anything. If you have declared them with my (that is, if they are lexically scoped variables) then you would have to make sure that all references to the variables are in the same file.
But splitting scripts up for length is a rotten substitute for modularization. Yes, modularization helps keep code length down, but modularization if the proper way to keep code length down--for all the reasons that you would want to keep code-length down, modularization does it best.
If chopping the files by length could really work for you, then you could create a script like this:
do '/path/to/bin/part1.pl';
do '/path/to/bin/part2.pl';
do '/path/to/bin/part3.pl';
...
But I kind of suspect that if the organization of this code is as bad as you're--sort of--indicating, it might suffer from some of the same re-inventing the wheel that I've seen in Perl-ignorant scripts. Just offhand (I might be wrong) but I'm thinking you would be surprised how much could be chopped from the length by simply substituting better-tested Perl library idioms than for-looping and while-ing everything.

Config file handling in Perl

There are plenty of Modules in the Config:: Namespace on CPAN, but they are all limited in ond way or another.
I'm currently using Config::Std, which is fine most of the time, however it makes certain things difficult:
more than two levels of nested directives
handling of multiple values per key
conf.d directories, i.e. multiple config files which are merged into one big config hash
Config::Std generates a blessed hashref after parsing the config, so all my applications are coded to use a hashref for configuration. I'd prefer not having to change this.
What I am looking for is a universal, lightweight Config Module that produces a hashref.
My Question is: Which Config Modules should I consider for replacing Config::Std?

Config::Any (for loading several files and flattening to a hash) and its Config::General backend (for arbitrarily nested configuration items and multiple values per key à la Apache httpd)

You didn't state where your data is coming from. Are you reading in a configuration file and running into the limit of the configuration file itself?
Config::Std is a great module. However, it was meant to read and write Windows Config/INI files, and Windows Config/INI files are very flat and simple formats. Thus, I wouldn't expect Config::Std to do much more.
If you're using Windows Config/INI files right now, but may need to read more complex data structures in the future, Config::Any is a good way to go. It'll handle Windows Config/INI files and using the same programming interface, read and write XML, YAML, and JSON file structures too.
If you're merely trying to keep a complex data structure in your program and don't care about reading and writing configuration files, I would recommend looking at XML::Simple for the very simple reason that it is ...well... simple and can handle all sorts of data structures. Plus, XML::Simple is a very commonly used module, so there's lots of help on the Internet if you have any questions about the module, and it is actively supported.
You could use Config::Any, but I find it more complex to use, and harder to configure. In fact, you have to install XML::Simple (or a similar module) in order to use it. The advantage of Config::Any is that it is a single interface for all sorts of configuration file formats. That way, you don't have to hack through your program if you decide to switch form Windows Config/INI to XML or YAML.
So, if you're working with Windows Config/INI files now, and need a more complex data structure: Look at Config::Any.
If you're merely wanting a simple way to track complex data structures, look at XML::Simple.

YAML will handle that and more.
And here's the website for the protocol.

Removing comments using Perl

Something I keep doing is removing comments from a file as I process it. I was was wondering if there a module to do this.
Sort of code I keep writing time and again is
while(<>) {
s/#.*// ;
next if /^ \s+ $/x ;
**** do something useful here ****
}
Edit Just to clarify, the input is not Perl. It is a text file of my own making that might have data I want to process in some way or other. I want to beable to place comments that are ignored by my programs

Unless this is a learning experience I suggest you use Regexp::Common::comment instead of writing your own regular expressions.
It supports quite a few languages.

The question does not make clear what type of file it is. Are we dealing with perl source files? If so, your approach is not entirely correct - see gbacon's comment. Perl source files are notoriously difficult (impossible?) to parse with regex. In that case, or if you need to deal with several types of files, use Regexp::Common::comment as suggested by Niffle. Otherwise, if you think your regex logic is correct for your scenario, then I personally prefer to write it explicitly, it's just a pair of strighforward lines, there is little to be gained by using a module (and you introduce a dependency).

Why are Perl source filters bad and when is it OK to use them?

It is "common knowledge" that source filters are bad and should not be used in production code.
When answering a a similar, but more specific question I couldn't find any good references that explain clearly why filters are bad and when they can be safely used. I think now is time to create one.
Why are source filters bad?
When is it OK to use a source filter?

Why source filters are bad:
Nothing but perl can parse Perl. (Source filters are fragile.)
When a source filter breaks pretty much anything can happen. (They can introduce subtle and very hard to find bugs.)
Source filters can break tools that work with source code. (PPI, refactoring, static analysis, etc.)
Source filters are mutually exclusive. (You can't use more than one at a time -- unless you're psychotic).
When they're okay:
You're experimenting.
You're writing throw-away code.
Your name is Damian and you must be allowed to program in latin.
You're programming in Perl 6.

Only perl can parse Perl (see this example):
#result = (dothis $foo, $bar);
# Which of the following is it equivalent to?
#result = (dothis($foo), $bar);
#result = dothis($foo, $bar);
This kind of ambiguity makes it very hard to write source filters that always succeed and do the right thing. When things go wrong, debugging is awkward.
After crashing and burning a few times, I have developed the superstitious approach of never trying to write another source filter.
I do occasionally use Smart::Comments for debugging, though. When I do, I load the module on the command line:
$ perl -MSmart::Comments test.pl
so as to avoid any chance that it might remain enabled in production code.
See also: Perl Cannot Be Parsed: A Formal Proof

I don't like source filters because you can't tell what code is going to do just by reading it. Additionally, things that look like they aren't executable, such as comments, might magically be executable with the filter. You (or more likely your coworkers) could delete what you think isn't important and break things.
Having said that, if you are implementing your own little language that you want to turn into Perl, source filters might be the right tool. However, just don't call it Perl. :)

It's worth mentioning that Devel::Declare keywords (and starting with Perl 5.11.2, pluggable keywords) aren't source filters, and don't run afoul of the "only perl can parse Perl" problem. This is because they're run by the perl parser itself, they take what they need from the input, and then they return control to the very same parser.
For example, when you declare a method in MooseX::Declare like this:
method frob ($bubble, $bobble does coerce) {
... # complicated code
}
The word "method" invokes the method keyword parser, which uses its own grammar to get the method name and parse the method signature (which isn't Perl, but it doesn't need to be -- it just needs to be well-defined). Then it leaves perl to parse the method body as the body of a sub. Anything anywhere in your code that isn't between the word "method" and the end of a method signature doesn't get seen by the method parser at all, so it can't break your code, no matter how tricky you get.

The problem I see is the same problem you encounter with any C/C++ macro more complex than defining a constant: It degrades your ability to understand what the code is doing by looking at it, because you're not looking at the code that actually executes.

In theory, a source filter is no more dangerous than any other module, since you could easily write a module that redefines builtins or other constructs in "unexpected" ways. In practice however, it is quite hard to write a source filter in a way where you can prove that its not going to make a mistake. I tried my hand at writing a source filter that implements the perl6 feed operators in perl5 (Perl6::Feeds on cpan). You can take a look at the regular expressions to see the acrobatics required to simply figure out the boundaries of expression scope. While the filter works, and provides a test bed to experiment with feeds, I wouldn't consider using it in a production environment without many many more hours of testing.
Filter::Simple certainly comes in handy by dealing with 'the gory details of parsing quoted constructs', so I would be wary of any source filter that doesn't start there.
In all, it really depends on the filter you are using, and how broad a scope it tries to match against. If it is something simple like a c macro, then its "probably" ok, but if its something complicated then its a judgement call. I personally can't wait to play around with perl6's macro system. Finally lisp wont have anything on perl :-)

There is a nice example here that shows in what trouble you can get with source filters.
http://shadow.cat/blog/matt-s-trout/show-us-the-whole-code/
They used a module called Switch, which is based on source filters. And because of that, they were unable to find the source of an error message for days.

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse