How do you get elements in a header/footer of a odt doc?
for example I have:
use OpenOffice::OODoc;
my $doc = odfDocument(file => 'whatever.odt');
my $t=0;
while (my $table = $doc->getTable($t))
{
print "Table $t exists\n";
$t++;
}
When I check the tables they are all from the body. I can't seem to find elements for anything in the header or footer?
I found sample code here which led me to the answer:
#! /usr/local/bin/perl
use OpenOffice::OODoc;
my $file='asdf.odt';
# odfContainer is a representation of the zipped odf file
# and all of its parts.
my $container = odfContainer("$file");
# We're going to look at the 'style' part of the container,
# because that's where the header is located.
my $style = odfDocument
(
container => $container,
part => 'styles'
);
# masterPageHeader takes the style name as its argument.
# This is not at all clear from the documentation.
my $masterPageHeader = $style->masterPageHeader('Standard');
my $headerText = $style->getText( $masterPageHeader );
print "$headerText\n"
The master page style defines the look and feel of the document -- think CSS. Apparently 'Standard' is the default name for the master page style of a document created by OpenOffice... that was the toughest nut to crack... once I found the example code, that fell out in my lap.
Related
My code is to enter an actor name and the program, via the given actor's filmography in IMDB, lists on a hash table all the cinematic genres of the movies he has acted in as well as their frequency. However, I have a problem: When I type a name like "brad pitt" or "bruce willis" after running the program at the prompt, execution takes indefinitely. How do you know what the problem is?
Another problem: when I type "nicolas bedos" (an actor name that I entered from the beginning), it works but it seems that the index is only made for a single movie selected in the #url_links list. Should the look_down function of the TreeBuilder module within a foreach loop be adapted? I was telling myself that the #genres list was overwritten on each iteration so I added a push () but the result remains the same.
use LWP::Simple;
use PerlIO::locale;
use HTML::TreeBuilder;
use WWW::Mechanize;
binmode STDOUT, ':locale';
use strict;
use warnings;
print "Enter the actor's name:";
my $acteur1 = <STDIN>; # the user enters the name of the actor
print "We will analyze the filmography of the actor $actor1 by genre\n";
#we put the link with the given actor in Mechanize variable in order to browse the internet links
my $lien1 = "https://www.imdb.com/find?s=nm&q=$acteur1";
my $mech = WWW::Mechanize->new();
$mech->get($lien1); #we access the search page with the get function
$mech->follow_link( url_regex => qr/nm0/i ); #we access the first result using the follow_link function and the regular expression nm0 which is in the URL
my #url_links= $mech->find_all_links( url_regex => qr/title\/tt/i ); #owe insert in an array all the links having as regular expression "title" in their URL
my $nb_links = #url_links; #we record the number of links in the list in this variable
my $tree = HTML::TreeBuilder->new(); #we create the TreeBuilder module to access a specific text on the page via the tags
my %index; #we create a hashing table
my #genres = (); #we create the genre list to insert all the genres encountered
foreach (#url_links) { #we make a loop to browse all the saved links
my $mech2 = WWW::Mechanize->new();
my $html = $_->url(); #we take the url of the link
if ($html =~ m=^/title=) { #if the url starts with "/title"
$mech2 ->get("https://www.imdb.com$html"); #we complete the link
my $content = $mech2->content; #we take the content of the page
$tree->parse($content); #we access the url and we use the tree to find the strings that interest us
#genres = $tree->look_down ('class', 'see-more inline canwrap', #We have as criterion to access the class = "see-more .."
sub {
my $link = $_[0]->look_down('_tag','a'); #new conditions: <a> tags
$link->attr('href') =~ m{genres=}; #autres conditions: "genres" must be in the URL
}
);
}
}
my #genres1 = (); #we create a new list to insert the words found (the genres of films)
foreach my $e (#genres){ #we create a loop to browse the list
my $genre = $e->as_text; #the text of the list element is inserted into the variable
#genres1 = split(/[à| ]/,$genre); #we remove the unnecessary characters that are spaces, at and | which allow to keep that the terms of genre cine
}
foreach my $e (#genres1){ #another loop to filter listing errors (Genres: etc ..) and add the correct words to the hash table
if ($e ne ("Genres:" or "") ) {
$index{$e}++;
}
}
$tree->delete; #we delete the tree as we no longer need it
foreach my $cle (sort{$index{$b} <=> $index{$a}} keys %index){
print "$cle : $index{$cle}\n"; #we display the hash table with the genres and the number of times that appear in the filmography of the given actor
}
Thank you in advance for your help,
wobot
The IMDB Conditions of Use say this:
Robots and Screen Scraping: You may not use data mining, robots, screen scraping, or similar data gathering and extraction tools on this site, except with our express written consent as noted below.
So you might want to reconsider what you're doing. Perhaps you could look at the OMDB API instead.
I am developing a TYPO3 6.0 plugin that shows the subpages of the current page as tabs. For example, on the following pages my plugin is inserted on TabRoot:
If TabRoot is requested, the plugin's ActionController looks up the database for the subpage titles and contents and passes all gathered data to a Fluid template. The page is then rendered like the following:
With JS in place I always hide/show content below based on the selection. My problem is that I want to show the translated content of the subpages based on the current language selection. How am I able to do this? I've tried it with several methods, but neither of them was flawless. These are the methods I've tried:
Using RECORDS This method is not affected by the selected language, it always returns the content in the default language:
//Get the ids of the parts of the page
$select_fields = "uid";
$from_table = "tt_content";
$where_clause = 'pid = ' . $pageId;
$res = $GLOBALS['TYPO3_DB']->exec_SELECTquery(
$select_fields,
$from_table,
$where_clause,
$groupBy='',
$orderBy='sorting',
$limit=''
);
$ids = '';
$firstIteration = true;
while ( $row = $GLOBALS['TYPO3_DB']->sql_fetch_assoc( $res ) ) {
if (!$firstIteration) $ids .= ",";
$ids .= $row ['uid'];
$firstIteration = false;
}
$GLOBALS['TYPO3_DB']->sql_free_result( $res );
//Render the parts of the page
$conf ['tables'] = 'tt_content';
$conf ['source'] = $ids;
$conf ['dontCheckPid'] = 1;
$content = $this->cObj->cObjGetSingle ( 'RECORDS', $conf );
Using CONTENTS According to TYPO3: How to render localized tt_content in own extension, this is the way to do it, however for me this also returns the content rendered with the default language. It is not affected by a language change.
$conf = array(
'table' => 'tt_content',
'select.' => array(
'pidInList' => $pageId,
'orderBy' => 'sorting',
'languageField' => 'sys_language_uid'
)
);
$content = $this->cObj->cObjGetSingle ( 'CONTENT', $conf );
Using VHS: Fluid ViewHelpers I installed the vhs extension and tried to render the content with <v:content.render />. The result is the same as with CONTENTS; it only works with the default language.
{namespace v=Tx_Vhs_ViewHelpers}
...
<v:content.render column="0" order="'sorting'" sortDirection="'ASC'"
pageUid="{pageId}" render="1" hideUntranslated="1" />
Using my own SQL query I've tried to get the bodytext fields of the page and then render those with \TYPO3\CMS\Frontend\Plugin\AbstractPlugin::pi_RTEcssText(). This method returns the content based on the current language, however the problem is that bodytext's do not contain the complete content (images, other plugins, etc).
$select_fields = "bodytext";
$from_table = "tt_content";
$where_clause = 'pid = ' . $pageId
. ' AND sys_language_uid = ' . $GLOBALS ['TSFE']->sys_language_uid;
$res = $GLOBALS['TYPO3_DB']->exec_SELECTquery(
$select_fields,
$from_table,
$where_clause,
$groupBy='',
$orderBy='sorting',
$limit=''
);
$content = '';
while ( $row = $GLOBALS['TYPO3_DB']->sql_fetch_assoc( $res ) ) {
$content .=
\TYPO3\CMS\Frontend\Plugin\AbstractPlugin::pi_RTEcssText( $row ['bodytext'] );
}
$GLOBALS['TYPO3_DB']->sql_free_result( $res );
What am I missing? Why isn't the content rendered with the current language in the case of the CONTENTS method?
Easiest way is to use the cObject viewhelper to render right from TypoScript.
And inside your TypoScript template provide the configuration:
lib.myContent = CONTENT
lib.myContent {
...
}
BTW, you are bypassing the TYPO3 CMS API. Please do not do so. Always use the API methods to query for data.
e.g. \TYPO3\CMS\core\Database\DatabaseConnection is always available at GLOBALS['TYPO3_DB']->. Do not use the the mysql function.
On top of that, I believe that you can archive whatever you are trying to do with pure TypoScript, without the need to program anything. Feel free to ask a new questions to get help on this.
In TYPO3 4.x you could use the following methods to load the translated record:
t3lib_pageSelect->getRecordOverlay
t3lib_pageSelect->getPageOverlay
They are also available at $GLOBALS['TSFE']->sys_page->getRecordOverlay().
I need some help getting a bugzilla extension off the ground.
I want to hook into bug_format_comment (FWIW: to change some plain text automatically added as a comment when I commit to SVN to links to the respective SCM commit.)
Right now, nothing seems to happen when I manually add a comment to a bug.
Is there anything special I need to do to make the extension run, besides putting it in the /extensions/my-ext-name/ dir ?
How can I test if the extension is called at all?
I use an old version of Bugzilla (3.2.x). Is that hook even supported? (I can't find that info in the documentation).
Here's my complete Extension.pm file (I have no experience in Perl. I took the example of the hook from the example extension and ran from there)
package Bugzilla::Extension::Websvn-scmbug-autolink;
use strict;
use base qw(Bugzilla::Extension);
# This code for this is in ./extensions/Websvn-scmbug-autolink/lib/Util.pm
use Bugzilla::Extension::Websvn-scmbug-autolink::Util;
use URI::Escape;
our $VERSION = '0.01';
# See the documentation of Bugzilla::Hook ("perldoc Bugzilla::Hook"
# in the bugzilla directory) for a list of all available hooks.
sub install_update_db {
my ($self, $args) = #_;
}
sub bug_format_comment {
my ($self, $args) = #_;
my $regexes = $args->{'regexes'};
# push(#$regexes, { match => qr/\bfoo\b/, replace => 'bar' });
# 6665 --> 6666
# CTUFramework:trunk/CTUCsharpRuntime/CtuFramework/text1-renamed.txt
#my $bar_match = qr/\b(bar)\b/;
my $bar_match = qr/(?:^|\r|\n)(\d+|NONE) (-->) (\d+|NONE)[ \r\n\t]+([^:]+):(.*?)[\r\n]/s; #/s - treat as single line
push(#$regexes, { match => $bar_match, replace => \&_replace_bar });
my $scm_match2 = qr/(?:^|\r|\n)(\d+|NONE) (-->) (\d+|NONE)[ \r\n\t]+([^:]+):(.*?)[\r\n]/s; #/s - treat as single line
push(#$regexes, { match => $scm_match2, replace => \&_replace_bar });
}
# Used by bug_format_comment--see its code for an explanation.
sub _replace_bar {
my $args = shift;
my $scmFromVer = $args->{matches}->[0];
my $scmToVer = $args->{matches}->[1];
my $scmArrow = $args->{matches}->[2];
my $scmProject = $args->{matches}->[3];
my $scmFile = $args->{matches}->[4];
# Remember, you have to HTML-escape any data that you are returning!
my $websvnRoot = "http://devlinux/websvn";
my $websvnRepo = uri_escape($scmProject); #maybe do a mapping
my $websvnFilePath = uri_escape("/".$scmFile);
my $fromRevUrl = sprintf("%s/revision.php?repname=%s&rev=%s",
$websvnRoot, $websvnRepo, $scmFromVer);
my $toRevUrl = sprintf("%s/revision.php?repname=%s&rev=%s",
$websvnRoot, $websvnRepo, $scmToVer);
my $diffUrl = sprintf("%s/diff.php?repname=%s&path=%s&rev=%s",
$websvnRoot, $websvnRepo, $websvnFilePath, $scmToVer);
my $fileUrl = sprintf("%s/filedetails.php?repname=%s&path=%s&rev=%s",
$websvnRoot, $websvnRepo, $websvnFilePath, $scmToVer);
# TODO no link for 'NONE'
my $fromRevLink = sprintf(qq{%s}, $fromRevUrl, $scmFromVer);
my $toRevLink = sprintf(qq{%s}, $toRevUrl, $scmToVer);
my $diffLink = sprintf(qq{%s}, $diffUrl, $scmArrow);
my $fileLink = sprintf(qq{%s}, $fileUrl, $scmFilePath);
# $match = html_quote($match);
return "$fromRevLink $diffLink $toRevLink:$fileLink";
};
__PACKAGE__->NAME;
As written, your extension won't even load: dashes are not valid in Perl package names.
Change the name from Websvn-scmbug-autolink to Websvn_scmbug_autolink.
I searched the Bugzilla sources, and found that the hook is simply not supported in version 3.2.x. The hook was introduced in Bugzilla 3.6: http://bzr.mozilla.org/bugzilla/3.6/revision/6762
PS. I hacked the regex replaces right in the template script for comments. Hacky, but it works.
I am currently attempting to create a Perl webspider using WWW::Mechanize.
What I am trying to do is create a webspider that will crawl the whole site of the URL (entered by the user) and extract all of the links from every page on the site.
But I have a problem with how to spider the whole site to get every link, without duplicates
What I have done so far (the part im having trouble with anyway):
foreach (#nonduplicates) { #array contain urls like www.tree.com/contact-us, www.tree.com/varieties....
$mech->get($_);
my #list = $mech->find_all_links(url_abs_regex => qr/^\Q$urlToSpider\E/); #find all links on this page that starts with http://www.tree.com
#NOW THIS IS WHAT I WANT IT TO DO AFTER THE ABOVE (IN PSEUDOCODE), BUT CANT GET WORKING
#foreach (#list) {
#if $_ is already in #nonduplicates
#then do nothing because that link has already been found
#} else {
#append the link to the end of #nonduplicates so that if it has not been crawled for links already, it will be
How would I be able to do the above?
I am doing this to try and spider the whole site to get a comprehensive list of every URL on the site, without duplicates.
If you think this is not the best/easiest method of achieving the same result I'm open to ideas.
Your help is much appreciated, thanks.
Create a hash to track which links you've seen before and put any unseen ones onto #nonduplicates for processing:
$| = 1;
my $scanned = 0;
my #nonduplicates = ( $urlToSpider ); # Add the first link to the queue.
my %link_tracker = map { $_ => 1 } #nonduplicates; # Keep track of what links we've found already.
while (my $queued_link = pop #nonduplicates) {
$mech->get($queued_link);
my #list = $mech->find_all_links(url_abs_regex => qr/^\Q$urlToSpider\E/);
for my $new_link (#list) {
# Add the link to the queue unless we already encountered it.
# Increment so we don't add it again.
push #nonduplicates, $new_link->url_abs() unless $link_tracker{$new_link->url_abs()}++;
}
printf "\rPages scanned: [%d] Unique Links: [%s] Queued: [%s]", ++$scanned, scalar keys %link_tracker, scalar #nonduplicates;
}
use Data::Dumper;
print Dumper(\%link_tracker);
use List::MoreUtils qw/uniq/;
...
my #list = $mech->find_all_links(...);
my #unique_urls = uniq( map { $_->url } #list );
Now #unique_urls contains the unique urls from #list.
I'm trying to submit a form by post method using WWW::Mechanize perl module.
use WWW::Mechanize;
my $mech = WWW::Mechanize->new();
...
$mech->get($url);
...
my $response = $mech->submit_form(
form_name => $name,
fields => {
$field_name => $field_value
},
button => 'Button'
);
$field_name is generally speaking a text field (though the type is not specified explicitly in the form), which has a preset value.
$field_name => $field_value in $mech->submit_form on whatever reason does not replace the value, instead $field_value is added into the form after the original value:
{submitted_field_value} = {original_value},{provided_value}
How to replace {original_value} with {provided_value} in the form to be submitted ?
What happens if you add this single line to your code before calling $mech->submit_form():
$mech->field( $name, [$field_value], 1 );
This makes sure that the first value is added, or overwritten if it already exists.
1 is the number parameter (or position index)
See the documentation of WWW::Mechanize:
$mech->field( $name, \#values, $number )
Given the name of a field, set its value to the value specified. [...]
The optional $number parameter is used to distinguish between two
fields with the same name. The fields are numbered from 1.
It's important to remember WWW::Mechanize is better thought of as a 'headless browser' as opposed to say LWP or curl, which only handle all the fiddly bits of http requests for you. Mech keeps its state as you do things.
You'll need to get the form by using $mech->forms or something similar (its best to decide from the documentation. I mean there so many ways to do it.), and then set the input field you want to change, using the field methods.
I guess the basic way to do this comes out as so:
$mech->form_name($name);
$mech->field($field_name, $field_value);
my $response = $mech->click('Button');
Should work. I believe it will also work if you get the field and directly use that (ie my $field = $mech->form_name($name); then use $field methods instead of $mech.
I managed to make it working at my will. Thanks Timbus and knb for your suggestions. Though my case may not be completely general (I know the preset value) but I'd share what I've found (by trails & errors).
my $mech = WWW::Mechanize->new();
$mech->get($url);
$mech->form_name( $name );
my $fields = $mech->form_name($name);
foreach my $k ( #{$fields->{inputs}}){
if ($k->{value} eq $default_value){
$k->{value}=$field_value;
}
}
my $response = $mech->click('Button_name');