Perl module that works like Data::Dumper but allows data manipulation

Is there a popular Perl module that works like Data::Dumper but allows the user to write hooks to manipulate the data inside a complex structure or object?
A few modules show up on Google, such as Data::Visitor or Data::Structure::Util, that might do the job, but I'm not sure whether they are the popular ones.

I've written Data::Dmap to do this, but as mentioned, Data::Rmap, Data::Transformer and Data::Visitor are also relevant.
The basic idea of Data::Dmap is that it lets you transform anything in a nested data structure while still trying to behave like the built-in map function.

I am not sure it is what you mean, but Data::Dump supports hooks to filter dumped data. Similar hooks are also possible in Data::Printer.
Edit: If you need editing, I would look at Data::Rmap or Data::Transformer. Also, if your structure is simple (say only scalars, hashes and arrays), you can write a simple recursive traversal yourself.
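A hand-rolled traversal of the kind mentioned above could look roughly like this. It is only a sketch using core Perl: the name transform and the hook signature are made up for illustration, and it assumes the structure contains nothing but plain scalars, hash references and array references (no objects, no cycles).

```perl
use strict;
use warnings;

# Hypothetical helper: walk a structure made only of scalars, hash refs
# and array refs, and return a transformed copy. $hook is called once
# for every leaf value and its return value replaces that leaf.
sub transform {
    my ($data, $hook) = @_;
    my $type = ref $data;

    if ($type eq 'HASH') {
        my %copy;
        $copy{$_} = transform($data->{$_}, $hook) for keys %$data;
        return \%copy;
    }
    if ($type eq 'ARRAY') {
        my @copy;
        push @copy, transform($_, $hook) for @$data;
        return \@copy;
    }

    # Anything else is treated as a leaf and handed to the hook.
    return $hook->($data);
}

my $data  = { name => 'alice', tags => [ 'a', 'b' ], nested => { n => 1 } };
my $upper = transform($data, sub { my ($v) = @_; defined $v ? uc $v : $v });
```

Because the sketch builds a copy instead of editing in place, $data is left untouched and $upper holds the upper-cased version.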

YAML is a nice serialization format, easy to edit string values and such. It might not handle all your objects, but it's worth a try, and it both serializes and reloads things easily.

Related

Parsing HTML which is not valid XML

I need to parse a website which has a lot of nested <div>s all over. I tried XML::Simple to get a nice tree structure, but the parse fails all the time because there seem to be two or three unclosed <p> tags somewhere. I tried HTML::Parser, but that only lets me define handler functions that give me the right tags, not their nested elements.
Is there any way to get XML::Simple to accept non-valid XML, or to get HTML::Parser to give me a handy tree structure?
The HTML::TreeBuilder builds nice trees and gives tons of handy methods to traverse it.
An alternative to something based on HTML::TreeBuilder is XML::LibXML->load_html(...).
But is it valid HTML? If so, XML::LibXML will do a marvelous job if you use the HTML parsing functions. It is lightning fast and provides a great interface. It should even be able to handle some bad HTML using the recover option.
Alternatively, HTML::Parser (often used via HTML::TreeBuilder or HTML::TreeBuilder::XPath) is renowned for handling bad HTML. It won't be as fast, though.

Perl Need to Compare Two Data Structures and Return Differences

I have two data structures with a mix of hashes and arrays. How can I compare the two data structures and return their differences, something like Perl's Test::Harness module, but without actually running a unit test? ...or is there a way to run Test::Harness without actually running a unit test?
Perl Monks says that Test::Deep, Data::Compare, and Data::Match are your friends. Those packages don't seem to be geared towards producing detailed diffs but you might be able to hack in a callback to keep track of the precise differences.
Test::Deep::NoTest (from Test::Deep) might be what you are looking for, giving the functions of Test::Deep outside a test script (eq_deeply, cmp_deeply, etc). Look at using deep_diag() to see what the differences are.
Data::Compare also gives functions returning boolean responses (rather like using cmp on the command line for files), but (from memory) it is harder to get it to report what the differences are.
I used the former most recently, probably to get the deep_diag() details that Data::Compare didn't provide, but I haven't tried parsing the response.
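If neither module reports differences the way you want, a small recursive walk can collect them. The following is only a sketch in core Perl. It is not how Test::Deep or Data::Compare work internally, the name diff_paths and the message format are invented for illustration, and it assumes the structures contain only scalars, hash references and array references.

```perl
use strict;
use warnings;

# Hypothetical helper: return one message per place where $x and $y
# differ, each prefixed with the path to that place.
sub diff_paths {
    my ($x, $y, $path) = @_;
    $path = '$data' unless defined $path;

    return sprintf('%s: different types', $path) if ref($x) ne ref($y);

    if (ref($x) eq 'HASH') {
        my (%seen, @out);
        $seen{$_} = 1 for keys %$x, keys %$y;
        for my $k (sort keys %seen) {
            my $sub = $path . '{' . $k . '}';
            if    (!exists $x->{$k}) { push @out, sprintf('%s: only in second', $sub) }
            elsif (!exists $y->{$k}) { push @out, sprintf('%s: only in first', $sub) }
            else                     { push @out, diff_paths($x->{$k}, $y->{$k}, $sub) }
        }
        return @out;
    }

    if (ref($x) eq 'ARRAY') {
        my $max = @$x > @$y ? $#$x : $#$y;
        my @out;
        for my $i (0 .. $max) {
            my $sub = $path . '[' . $i . ']';
            if    ($i > $#$x) { push @out, sprintf('%s: only in second', $sub) }
            elsif ($i > $#$y) { push @out, sprintf('%s: only in first', $sub) }
            else              { push @out, diff_paths($x->[$i], $y->[$i], $sub) }
        }
        return @out;
    }

    # Leaves: compare as strings, treating undef as its own value.
    return () if !defined $x && !defined $y;
    return sprintf('%s: defined vs undef', $path) if defined($x) xor defined($y);
    return $x eq $y ? () : sprintf('%s: %s vs %s', $path, $x, $y);
}

my @diffs = diff_paths(
    { a => 1, b => [ 1, 2 ] },
    { a => 1, b => [ 1, 3 ], c => 0 },
);
print "$_\n" for @diffs;
```

An empty return list means the two structures matched under these simplified rules.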

Should I use the function-oriented or object-oriented CGI interfaces?

I've been learning about the CGI module lately, and the book I'm using shows there are two ways you can use CGI, function-oriented or object-oriented. They say the benefit of having object-oriented is only to be able to create two CGI objects. First of all is this true, and are there any other benefits, and secondly what example is there for using two CGI objects?
When I need to put together a very simple CGI script, I use the CGI module's OO interface.
I use the OOP interface because the standard, imperative interface imports a ton of symbols that may conflict with my own symbols. I don't like this, so I always prevent symbol importation. I don't use CGI;. Instead, I use CGI ();.
I also limit my use to generating the header and parsing parameters. I always generate HTML as HTML or, better yet, use a template module like Template Toolkit.
I strictly avoid CGI's HTML generation functions. Why?
I (along with many other people) already know HTML, and I see no benefit in learning CGI's pseudo-html interface.
When a script grows up and needs to be used in another environment, it is easier to extract the HTML blocks or templates and reuse them.
Don't interpret what I've written as a blanket condemnation of CGI.pm. There's plenty to love about CGI.pm. It gets content type generation right. It makes parameter parsing trivial. It is a core module. It makes command line debugging and testing easy.
I think I have found the answer to my question
http://perldoc.perl.org/CGI.html#PROGRAMMING-STYLE
Reading through the FAQ, an example given for using multiple CGI objects is that I can save a CGI object and load previously saved CGI objects, which is quite useful.
Beyond the advantages you cite I'd also point out that OOP usage of CGI.pm is much cleaner to read (at least for me) and manage than the functional version.
I also suspect it is more common so people who have to maintain your code after you (including you six months from now) will find it easier to maintain.

Signal/Slot mechanism for Perl

I'm wondering if there is an equivalent to Qt's signal/slot mechanism for Perl. I have looked into POE, but since it's huge, I couldn't find anything useful.
Thank you in advance,
Perhaps you are looking for something like Object::Event, an API for registering and emitting events, mostly for AnyEvent, but I imagine you could use it elsewhere. Gtk2 also has a mechanism similar to Qt's, especially combined with Glade XML, which lets you automatically map event slots|signals to Perl object methods or functions. AnyEvent is a generic event loop which supports Gtk/Glib and POE, amongst others, and is much easier to grok than the large set of modules that is POE.
The concept is generally called Publish/Subscribe. The search result for pubsub on CPAN gives you what you want.
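To make the publish/subscribe idea concrete, here is a minimal sketch in plain Perl with no event loop. The class name Emitter and its subscribe/publish methods are hypothetical and do not correspond to any CPAN module. Callbacks play the role of slots and named events play the role of signals.

```perl
use strict;
use warnings;

package Emitter;    # hypothetical class, for illustration only

sub new {
    my ($class) = @_;
    return bless { slots => {} }, $class;
}

# Register a callback (the "slot") for a named event (the "signal").
sub subscribe {
    my ($self, $signal, $callback) = @_;
    push @{ $self->{slots}{$signal} }, $callback;
    return $self;
}

# Call every callback registered for the event, in registration order.
sub publish {
    my ($self, $signal, @args) = @_;
    $_->(@args) for @{ $self->{slots}{$signal} || [] };
    return $self;
}

package main;

my @log;
my $button = Emitter->new;
$button->subscribe(clicked => sub { push @log, "first saw @_" });
$button->subscribe(clicked => sub { push @log, "second saw @_" });
$button->publish(clicked => 'x=1', 'y=2');
```

Publishing an event that nobody subscribed to is a no-op in this sketch. A real module would typically also offer a way to unsubscribe.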

What's the best way to make a deep copy of a data structure in Perl?

Given a data structure (e.g. a hash of hashes), what's the clean/recommended way to make a deep copy for immediate use? Assume reasonable cases, where the data's not particularly large, no complicated cycles exist, and readability/maintainability/etc. are more important than speed at all costs.
I know that I can use Storable, Clone, Clone::More, Clone::Fast, Data::Dumper, etc. What's the current best practice?
Clone is much faster than Storable::dclone, but the latter supports more data types.
Clone::Fast and Clone::More are pretty much equivalent if memory serves me right, but less feature complete than even Clone, and Scalar::Util::Clone supports even less but IIRC is the fastest of them all for some structures.
With respect to readability these should all work the same; they are virtually interchangeable.
If you have no specific performance needs I would just use Storable's dclone.
I wouldn't use Data::Dumper for this simply because it's so cumbersome and roundabout. It's probably going to be very slow too.
For what it's worth, if you ever want customizable cloning then Data::Visitor provides hooking capabilities and fairly feature complete deep cloning is the default behavior.
My impression is that Storable::dclone() is somewhat canonical.
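For reference, a dclone call looks roughly like this. Storable ships with Perl and dclone takes a reference and returns a reference to an independent copy. The %original data is made up for the example.

```perl
use strict;
use warnings;
use Storable qw(dclone);

# Example data, invented for illustration.
my %original = (
    name  => 'config',
    hosts => [ 'a.example', 'b.example' ],
    opts  => { retries => 3 },
);

# dclone takes a reference and returns a reference to a deep copy.
my $copy = dclone(\%original);

# Changes to the copy's nested structures do not touch the original.
push @{ $copy->{hosts} }, 'c.example';
$copy->{opts}{retries} = 5;
```

After this runs, $original{hosts} still has two entries and $original{opts}{retries} is still 3.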
Clone is probably what you want for that. At least, that's what all the code I've seen uses.
Try fclone from Panda::Lib, which seems to be the fastest one (it is written in XS).
A quick and dirty hack if you're already dealing with JSON and using the JSON module in your code: convert the structure to a JSON string and then convert the string back to a structure:
use strict;
use warnings;
use JSON;
my %hash = (
    obj => {},
    arr => [],
);
# Serialize to a JSON string, then parse it back into a brand-new structure.
my $hash_ref_to_hash_copy = from_json(to_json(\%hash));
The only possible downside is having to deal with a hash reference instead of a plain hash, but still, this has come in handy a few times for me.