I have a YAML file which is generated from another source, as shown below.
connect1:
    connect2:
        - { level1 : name, level2: age,
level3: gender}
My code looks something like this:
use YAML::Tiny qw(LoadFile);
use YAML;
use YAML::Loader;
use YAML::Syck;
use YAML qw(LoadFile);
use Data::Dumper;
my $data = LoadFile("file.yaml");
my @config = $data->{connect1}->{connect2};
print Dumper(@config);
I'm getting the following error:
YAML Error: Invalid element in map
Code: YAML_LOAD_ERR_BAD_MAP_ELEMENT
Line: 3
Document: 1
If level1, level2 and level3 are on the same line, I don't see this issue.
I think the issue is caused by the indentation.
But there is no way that I can change file.yaml.
So is there a way in Perl to parse this YAML file without modifying file.yaml?
The state of YAML in Perl is a bit unfortunate, because there are several modules which support different features.
YAML::Syck, YAML::XS and YAML::PP can parse your example. YAML::XS is probably a good choice right now.
YAML.pm was the first Perl module for YAML, and it was written for YAML 1.0.
YAML::Syck is based on libsyck, which was written for YAML 1.0. It can parse more than YAML.pm though.
YAML::XS is based on libyaml, written for YAML 1.1. You should be able to parse most YAML with it, and libyaml is used in (or was ported to) many other languages.
YAML::Tiny supports only a subset of YAML, which does not include flow collections ({ ... }, [ ... ]) or aliases/anchors (&x, *x).
YAML::PP is pretty new and can already parse a lot, but it's not complete yet. It aims to parse YAML 1.2 (and it will also partially support 1.1 in the future). Disclaimer: I'm the author.
Here you can find my slides from the London Perl Workshop 2017:
https://perlpunk.github.io/slides.lpw2017/yaml-where-and-how-to-use/
Starting at slide 24 you'll find a quick overview of the five modules.
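As a minimal sketch of the YAML::XS suggestion above: the YAML is inlined in a heredoc so the snippet is self-contained, and the indentation of the continuation line is an assumption, since the original file is not available here. For a real file, YAML::XS also exports LoadFile. Note that connect2 holds an array reference, so it has to be dereferenced before being assigned to an array.

```perl
use strict;
use warnings;
use YAML::XS qw(Load);

# Sample document modelled on the question; the indentation of the
# continuation line is assumed, not taken from the real file.yaml
my $yaml = <<'END_YAML';
connect1:
  connect2:
    - { level1 : name, level2: age,
        level3: gender}
END_YAML

my $data = Load($yaml);

# connect2 is an array reference, so dereference it to get the list
my @config = @{ $data->{connect1}{connect2} };

# Print the keys and values of the first (and only) list entry
print "$_ => $config[0]{$_}\n" for sort keys %{ $config[0] };
```

This assumes YAML::XS is installed from CPAN; it is not a core module.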
Looking at the CPAN page for YAML::Tiny:
In exchange for this adding this extreme flexibility, it provides support
for only a limited subset of YAML. But the subset supported contains most
of the features for the more common uses of YAML.
So I think you might need to try a more complete YAML parser.
If you don't fancy that, you could look into a command-line utility to do the parsing, such as yq, or convert the YAML to JSON with another tool and process it as JSON.
I have a bunch of repeated text in my perlpod documentation. I could of course create a separate section and reference it, but I was wondering if there's a way to enter the text once somewhere and have it inserted in multiple places?
I don't think this is possible, but thought I'd ask to make sure I'm not missing anything.
Or - perhaps there's a better perl documentation technique?
As you've realised, Pod is (deliberately) a very simple markup language. It doesn't have any particularly complicated features, and one of the things it is missing is a way to embed text from another source.
I'd suggest moving the repeated text to its own section and linking to that section (using L<...>) whenever you want to reference that text.
While Pod markup is meant to be very basic, we don't have to literally type it all by hand.
Text for documentation can be processed like any other text in Perl, using its extensive set of tools, to create a string with pod-formatted text. That string can then be displayed using core Pod::Usage, via a file (that can be removed or kept), or directly by using core Pod::Simple.
Display the Pod string by writing a file
use warnings;
use strict;
use feature 'say';

use Path::Tiny;  # convenience, to "spew" a file

my $man = shift;
show_pod() if $man;

say "done $$";

sub show_pod {
    require Pod::Usage; Pod::Usage->import('pod2usage');

    my $pod_text = make_pod();
    my $pod_file = $0 =~ s/\.pl$/.pod/r;
    path($pod_file)->spew($pod_text);

    pod2usage( -verbose => 2, -input => $pod_file );  # exits by default
}

sub make_pod {
    my $repeated = q(lotsa text that gets repeated);

    my $doc_text = <<EOD;
=head1 NAME

$0 - demo perldoc

=head1 SYNOP...

Text that is also used elsewhere: $repeated...

=cut
EOD

    return $doc_text;
}
The .pod file can be removed: add -exitval => 'NOEXIT' to pod2usage arguments so that it doesn't exit and then remove the file. Or, rather, keep it available for other tools and uses.
I've split the job into two subs as a hint, since it can be useful to be able to only write a .pod file, which can then also be used and viewed in other ways and formats.† This isn't needed to just show docs, and all Pod business can be done in one sub.
Display the Pod string directly
If there is no desire to keep a .pod file around then we don't have to make it:
sub show_pod {  # The rest of the program same as above
    my $pod_text = make_pod();
    require Pod::Simple::Text;
    Pod::Simple::Text->filter( \$pod_text );  # doesn't exit so add that
}
The ->filter call is a shortcut for creating an object, setting a filehandle, and processing the contents. See docs for a whole lot more.
Either of these approaches provides you with far more flexibility.
Also, while one can indeed solve the repeated text by referencing a separate section with that text, that of course doesn't allow us to use variables or do any Perl processing, all of which is available when you write a Pod string that is then passed on to a Pod formatter to display.
Note: Use of .pod files affects perldoc. Thanks to @briandfoy for a comment.
For smaller documentation that gains no particular benefit from separate .pod files, I recommend the second approach, as hinted in the answer. It only differs in how the docs text is organized in the file, while still allowing one to process it as any text is normally processed with Perl.
For use cases where .pod files are of good value I still find it an acceptable trade-off, but that is my call. Be aware that perldoc is affected and assess how much that matters in your project.
† I use a system like this in a large project, with .pod files in their directory. There is also a simple separate script for overall documentation management, that invokes individual programs with options to write/update their .pods, in HTML with CPAN's style-file, for the main web page. Any program can also simply display its docs in a desired format.
I want to create YAML config files in which I can put Perl code to reduce the effort of writing the same blocks multiple times, or for any other operation. I would like to have something like this inside my YAML file:
---
`my $a = 1;`
`if($a){`
foo: bar
`}else{`
foo: baz
`}`
...
As is obvious, I want key 'foo' to be assigned the value 'bar' in this example.
Is this a good or a bad idea? If the latter, what's the best way to write the file and the pre-processor for such a file?
This is not a good idea, but it CAN be made better.
You have two approaches:
Treat your YAML / JSON file as a template for one of standard Perl templating systems.
Embperl is a good candidate in my opinion, but that's because it's what I'm very familiar with. Any templating framework that supports control flow will fit.
This way, you don't need to write your own language/syntax; AND you get a produced final YAML/XML/JSON in pure form so there's no need for a custom parser.
Create a Perl-based format.
E.g. your config is a chunk of Perl code that is in essence an eval-able (or do-able) code chunk resulting in a config hash.
The resulting hash structure can be anything you want, but if you wish, it can mirror the original JSON/YAML and can be converted to one using standard conversion tools.
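A minimal sketch of the second approach, assuming the config file is plain Perl whose last expression is a hash reference. The file content, the $use_bar variable and the temporary file are all made up for the illustration; in practice the config would live in its own file.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical config file content: ordinary Perl whose last
# expression is a hashref. The leading + marks the braces as a
# hash constructor rather than a code block.
my $config_source = <<'END_CONFIG';
my $use_bar = 1;
+{
    foo => $use_bar ? 'bar' : 'baz',
};
END_CONFIG

# Write the config to a temporary file so the example is self-contained
my ($fh, $config_file) = tempfile( SUFFIX => '.pl', UNLINK => 1 );
print {$fh} $config_source;
close $fh or die "Cannot close $config_file: $!";

# "do FILE" compiles and runs the file and returns the value of its
# last expression, which here is the config hashref
my $config = do $config_file;
die "Cannot parse $config_file: $@" if $@;
die "Cannot read $config_file: $!"  unless defined $config;

print "foo = $config->{foo}\n";   # prints "foo = bar"
```

Keep in mind that a config file loaded this way is executed as code, so it should only ever come from a trusted source.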
Analyzing sources of CPAN modules I can see something like this:
...
package # hide from PAUSE
Try::Tiny::ScopeGuard;
...
Obviously, it's taken from Try::Tiny, but I have seen this kind of comment between the package keyword and the package identifier in other modules too.
Why is this procedure used? What is its goal and what benefits does it have?
It is indeed a hack to hide a package from PAUSE's indexer.
When a distribution is uploaded to PAUSE, the indexer will examine each file in the upload, looking for the names of packages that are included in the distribution. Any indexed packages can show up in CPAN search results.
There are many reasons for not wanting the indexer to discover your packages. Your distribution may have many small or insignificant packages that would clutter up the search results for your module. You may have packages defined in your t (test) directory or some other non-standard directory that are not meant to be installed as part of the distribution. Your distribution may include files from a completely different distribution (that somebody else wrote).
The hack works because the indexer strictly looks for the keyword package and an expression that looks like a package name on the same line.
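To illustrate why the line break matters, here is a deliberately simplified, line-oriented scanner. It is not the actual PAUSE indexer code; the regex is an assumption made up for this sketch and only mimics the "keyword and name on the same line" rule described above.

```perl
use strict;
use warnings;

# Sample source: one ordinary declaration and one "hidden" declaration
my $source = <<'END_SOURCE';
package Try::Tiny;
package # hide from PAUSE
  Try::Tiny::ScopeGuard;
END_SOURCE

my @indexed;
for my $line ( split /\n/, $source ) {
    # Simplified assumption: the keyword and the package name must
    # appear together on a single line to be recognised
    push @indexed, $1
        if $line =~ /^\s*package\s+([A-Za-z_]\w*(?:::\w+)*)\s*;/;
}

print "indexed: @indexed\n";   # prints "indexed: Try::Tiny"
```

The second declaration is perfectly valid Perl, but a scanner that works one line at a time never sees the keyword and the name together, so it is skipped.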
Nowadays, you can include a META.yml file with your distribution. The PAUSE indexer will look for and respect a no_index specification in this file. But this is a relatively new capability of the indexer so older modules and old-timer CPAN contributors will still use the line break hack.
Here's an example of a no_index spec from Forks::Super:
no_index:
    directory:
        - t
        - inc
    package:
        - Sys::CpuAffinity
        - Signals::XSIG
        - Signals::XSIG::Default
        - Signals::XSIG::TieArray56
Sys::CpuAffinity and Signals::XSIG are separate distributions that are also packaged with Forks::Super. Some of the test scripts contain package declarations (e.g., Arbitrary::Test::Package) that shouldn't be indexed.
Okay, here's another shot at this phenomenon ... I've been whacky-hacking Perl for a dozen years and I've rarely seen this packy hack, and I possibly simply ignored it and never bothered to investigate. One thing seems clear, though. There's some hackish processing going on at PAUSE, crafted in the good ol' Perl'n'UNIX school of thought, that without the shadow of a doubt involves line-oriented text parsing. So they parse those Perl files, possibly even using grep, but more likely perl itself, who knows, to extract package names and then kick off some procedure or get some stats or whatnot. To trip up this procedure and hack around its ways, the author splits the package declaration across two lines, so the hacky packy grep job doesn't have a clue that there's a package declared right under its nose; the programmer is happy about his hacky skills, and the PAUSE stats, or whatever it is they're cobbling together, are as they should be. Does that make sense?
What's the purpose of the use statement below, which I stumbled across in some Perl 6 module?
use CGI:from<perl5>;
...
...
The rest of the code is just mundane usage of the Perl 5 CGI module, as far as I can tell.
Is the ":from" suffix used to invoke some kind of Perl 5 compatibility layer? I can't seem to find any documentation about it.
Look at the Perl 6 Synopsis 11: Modules:
The use statement allows an external language to be specified in addition to (or instead of) an authority, so that you can use modules from other languages. The from adverb also parses any additional parts as short-form arguments. For instance:
use Whiteness:from<perl5>:name<Acme::Bleach>:auth<cpan:DCONWAY>:ver<1.12>;
use Whiteness:from<perl5 Acme::Bleach cpan:DCONWAY 1.12>; # same thing
So indeed, it's a scheme to support "other languages", perl5 in this instance.
There are plenty of modules in the Config:: namespace on CPAN, but they are all limited in one way or another.
I'm currently using Config::Std, which is fine most of the time, however it makes certain things difficult:
more than two levels of nested directives
handling of multiple values per key
conf.d directories, i.e. multiple config files which are merged into one big config hash
Config::Std generates a blessed hashref after parsing the config, so all my applications are coded to use a hashref for configuration. I'd prefer not having to change this.
What I am looking for is a universal, lightweight config module that produces a hashref.
My question is: which config modules should I consider for replacing Config::Std?
Config::Any (for loading several files and flattening to a hash) and its Config::General backend (for arbitrarily nested configuration items and multiple values per key à la Apache httpd).
You didn't state where your data is coming from. Are you reading in a configuration file and running into the limit of the configuration file itself?
Config::Std is a great module. However, it was meant to read and write Windows Config/INI files, and Windows Config/INI files are very flat and simple formats. Thus, I wouldn't expect Config::Std to do much more.
If you're using Windows Config/INI files right now, but may need to read more complex data structures in the future, Config::Any is a good way to go. It'll handle Windows Config/INI files and, using the same programming interface, read and write XML, YAML, and JSON file structures too.
If you're merely trying to keep a complex data structure in your program and don't care about reading and writing configuration files, I would recommend looking at XML::Simple for the very simple reason that it is ...well... simple and can handle all sorts of data structures. Plus, XML::Simple is a very commonly used module, so there's lots of help on the Internet if you have any questions about the module, and it is actively supported.
You could use Config::Any, but I find it more complex to use, and harder to configure. In fact, you have to install XML::Simple (or a similar module) in order to use it. The advantage of Config::Any is that it is a single interface for all sorts of configuration file formats. That way, you don't have to hack through your program if you decide to switch from Windows Config/INI to XML or YAML.
So, if you're working with Windows Config/INI files now, and need a more complex data structure: Look at Config::Any.
If you're merely wanting a simple way to track complex data structures, look at XML::Simple.
YAML will handle that and more.
And here's the website for the protocol.