Extracting Text in body that is not part of tag with HTML::TreeBuilder - perl

I have some ugly html that is emailed to my program that looks like:
<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type" />
</head>
<body>
Saved search results.<br>
<br>
Name: 'Some splunk search' <br>
Query Terms: 'tag=foo NOT BAR=\"Boom\"' <br>
Link to results: <a href="https://foo/search/blahblahblah">
https://foo/search/blahblahblah</a>
<br>
<br>
<table border="1">
...snipped the rest for brevity.
I am able to pull the table elements out using HTML::TreeBuilder, but I can't figure out how to
pull the "Name:" and "Query Terms:" values from above without resorting to other means.
A $root->dump of the above looks like:
<html> #0
<head> #0.0
<meta content="text/html; charset=utf-8" http-equiv="Content-Type" /> #0.0.0
<body> #0.1
<p> #0.1.0 (IMPLICIT)
" Saved search results. "
<br /> #0.1.0.1
<br /> #0.1.0.2
" Name: 'Some splunk search' "
<br /> #0.1.0.4
" Query Terms: 'tag=foo NOT BAR=\"Boom\""
So is there a way to get the naked text between #0.1.0.2 and #0.1.0.4?
Thanks!
Todd

If there is a pattern to the text, it might be easier to combine HTML parsing with regular expressions. Assuming $body holds the <body> element (for example, from $root->find_by_tag_name('body')):
my $body_text = $body->as_text(skip_dels => 1);
my ($name) = ($body_text =~ m#Name: '([^']+)'#s);
my ($query_terms) = ($body_text =~ m#Query Terms: '([^']+)'#s);
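The same idea, sketched with Python's standard library rather than Perl purely for illustration: flatten the document to its text nodes, then pull the fields out with regular expressions. The names TextCollector and extract_fields and the sample string are hypothetical, and the patterns assume the values are always wrapped in single quotes, as in the email above.

```python
import re
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect the text nodes of a document and ignore every tag."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def extract_fields(html):
    """Return (name, query_terms) found in the flattened text, or None for each miss."""
    collector = TextCollector()
    collector.feed(html)
    collector.close()
    # Collapse runs of whitespace so line breaks in the email do not matter.
    text = " ".join(" ".join(collector.chunks).split())
    name = re.search(r"Name: '([^']+)'", text)
    query = re.search(r"Query Terms: '([^']+)'", text)
    return (name.group(1) if name else None,
            query.group(1) if query else None)


# Hypothetical sample modelled on the email in the question.
sample = r"""<body>
Saved search results.<br>
<br>
Name: 'Some splunk search' <br>
Query Terms: 'tag=foo NOT BAR=\"Boom\"' <br>
</body>"""

name, query_terms = extract_fields(sample)
print(name)
print(query_terms)
```

This trades structural precision for simplicity: it works only as long as the labels and quoting stay stable, which is the "pattern to the text" the answer relies on.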

Related

How to add meta tags from widget in Orchard?

I'm trying to add some meta tags from a widget (in the /views/parts/ folder) that gets its data from a database outside of Orchard. I need to put them in the head section, and frankly I have no idea how to achieve that.
I tried:
using (Script.Head())
{
<meta property="description" content="ABC">
}
SetMeta("ABC", "description");
But none of these work :-(
Edit: our document.cshtml code:
@using Orchard.Mvc.Html;
@using Orchard.UI.Resources;
@{
RegisterLink(new LinkEntry { Type = "image/x-icon", Rel = "shortcut icon", Href = Url.Content("~/modules/orchard.themes/Content/orchard.ico") });
string title = Convert.ToString(Model.Title);
string siteName = Convert.ToString(WorkContext.CurrentSite.SiteName);
string classForPage = "static " + Html.ClassForPage();
}
<!DOCTYPE html>
<!--[if lt IE 7]>
<html lang="@WorkContext.CurrentCulture" class="@classForPage no-js lt-ie9 lt-ie8 lt-ie7"> <![endif]-->
<!--[if IE 7]>
<html lang="@WorkContext.CurrentCulture" class="@classForPage no-js lt-ie9 lt-ie8"> <![endif]-->
<!--[if IE 8]>
<html lang="@WorkContext.CurrentCulture" class="@classForPage no-js lt-ie9"> <![endif]-->
<!--[if gt IE 8]><!-->
<html lang="@WorkContext.CurrentCulture" class="@classForPage no-js"> <!--<![endif]-->
<head>
<meta charset="utf-8" />
<title>@Html.Title(title, siteName)</title>
<meta name="viewport" content="width=device-width">
@{
Display(Model.Head);
}
<meta property="og:title" content="@Layout.Title - @Convert.ToString(WorkContext.CurrentSite.SiteName)">
<meta property="og:site_name" content="@Convert.ToString(WorkContext.CurrentSite.SiteName)">
<meta property="og:url" content="@Request.Url">
<meta property="og:type" content="article">
<script>(function(d){d.className="dyn"+d.className.substring(6,d.className.length);})(document.documentElement);</script>
</head>
<body>
@Display(Model.Body)
@Display(Model.Tail)
</body>
</html>
Does anybody know how to achieve that?
IResourceManager provides the necessary methods.
To use it in a view:
var resourceManager = WorkContext.Resolve<Orchard.UI.Resources.IResourceManager>();
resourceManager.SetMeta(new Orchard.UI.Resources.MetaEntry
{
Name = "description",
Content = "ABC"
});
But it can also be used in other places (e.g. a part driver).
Edit
Using SetMeta("description", "ABC") in a view gives the same result.
I used the following code in my layout.cshtml file before the header tag and it worked.
@using (Script.Head())
{
<meta name="description" content="<your description>"/>
<meta name="keywords" content="<your keywords here>"/>
}
Enjoy!!!

Invalid Session Flow

I am trying to submit a form with Perl. I have managed to submit the form, but I get an HTML page showing "Invalid Session Flow" after the submission. If I submit from a browser, the new page contains another form.
I couldn't find the reason for that message. Is it possible to troubleshoot this without any access to the server, or does it have to be checked on the server side?
My Code:
my $url = "https://MY_URL";
my $Browser = new LWP::UserAgent();
$Browser->ssl_opts(verify_hostname => 0,SSL_verify_mode => 0x00);
my $page = $Browser->get($url);
my $content = HTML::TreeBuilder->new_from_content($page->decoded_content) or die $!;
my $match = $content->find_by_attribute('name' => 'token');
my $token = $match->attr('value');
chomp($token);
my %fileds = ("DATA" => "STD111","token" => $token);
my $Page = $Browser->request(POST $url,\%fileds);
if ($Page->is_success){
print $Page->status_line."\n";
print $Page->content."\n";
}else{
print $Page->status_line."\n";
print $Page->message;
}
Below are the page sources as viewed in Firefox
Initial Page:
<html>
<head>
<meta http-equiv="Content-type" content="text/html; charset=iso-8859-1">
<title>Website Title</title>
</head>
<body>
<form method="post" action="/">
<input type="hidden" name="token" value="5f75b4fb68ed">
<input name="stdname">
<input type="submit" value="Submit">
</form>
</body>
</html>
The output I am getting:
200 OK
<html>
<head>
<meta http-equiv="Content-type" content="text/html; charset=iso-8859-1">
<title>Website Title</title>
</head>
<body>
<form method="get" action="/">
ERROR: Invalid session flow<br>
<input type="submit" value="Relogin">
</form>
</body>
</html>
The Actual Landing page when submitted via any browser:
<html>
<head>
<meta http-equiv="Content-type" content="text/html; charset=iso-8859-1">
<title>Website Title</title>
</head>
<body>
<form method="post" action="/">
<input type="hidden" name="token" value="5f75b4fb68ed">
<input type="password" name="stdpass">
<input type="submit" value="Submit">
</form>
</body>
</html>
It's likely that the browser is sending other headers that your LWP program is omitting. When faced with a situation like this, I find the best approach is to use a browser plugin (I like Live HTTP Headers for Firefox) that traces the actual HTTP transaction, and then change my program to get as close to that as possible.
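As a rough sketch of "getting as close to the browser as possible", here is the shape of such a request in Python's standard library (used only for illustration; the question's code is Perl). Which headers the server actually checks is unknown, so the cookie jar, User-Agent and Referer below are assumptions to be confirmed against the traced browser request. The URL is a placeholder, and the helper names are hypothetical. One thing visible in the page source above is that the form's input is named stdname, whereas the posted hash uses DATA, so the field names are worth comparing too.

```python
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar


def build_opener_with_cookies(user_agent):
    """Return an opener that keeps cookies between requests, as a browser does."""
    jar = CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    opener.addheaders = [("User-Agent", user_agent)]
    return opener, jar


def build_form_post(url, fields, referer):
    """Build a POST that mirrors a browser's form submission."""
    body = urllib.parse.urlencode(fields).encode("ascii")
    request = urllib.request.Request(url, data=body, method="POST")
    request.add_header("Content-Type", "application/x-www-form-urlencoded")
    request.add_header("Referer", referer)
    return request


# Placeholder values: copy the real ones from the traced browser request.
URL = "https://example.invalid/"
opener, jar = build_opener_with_cookies("Mozilla/5.0 (placeholder)")
request = build_form_post(
    URL,
    {"token": "5f75b4fb68ed", "stdname": "STD111"},  # field names from the form source
    referer=URL,
)
print(request.get_method(), request.data)
# opener.open(URL) followed by opener.open(request) would send both steps
# through the same cookie jar; no network call is made in this sketch.
```

Fetching the form and posting it through the same opener matters if the server ties the token to a session cookie, which is one plausible reading of "Invalid session flow" but not something the question confirms.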

PHP 5.3 write contents to plain text

I am trying to make a PHP script write into a plain text file. I have done this before and it worked just fine. But it's not working this time for some reason.
Here is the HTML I am using:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8" />
<link rel="stylesheet" type="text/css" href="/css/feedback.css" >
<title>Questions, Comments, Suggestions</title>
</head>
<body>
<p class="title">Questions, comments, and suggestions here!</p>
<form method="post" name="userFeedback" action="/submit.php">
<textarea id="comments" placeholder="Leave a comment or review here..."></textarea>
<textarea id="name" placeholder="Your name here"></textarea>
<textarea id="contact" placeholder="Put any means of contact you want to here (optional)"></textarea>
<br>
<input class="enter" type="submit" value="Enter">
</form>
</body>
</html>
All I want to do with this is to print out whatever is entered onto a plain .txt file with PHP 5.3. Here is the code:
$data = ($_POST["comments"] ." || ". $_POST["name"] ." || ". $_POST["contact"]);
$data = strip_tags($data);
$file = "feedback.txt";
$f = fopen($file, "a+");
fwrite($f, $data . "\n" . "\n");
fclose($f);
header ( 'Location: index.html' );
Please remember that I am using 5.3. I'm sure there's a simple error in here somewhere. Can someone help me with this? Thank you in advance!
We got it! It turns out that PHP's $_POST array is keyed by each field's "name" attribute, not its "id", so the textareas above need name attributes.
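A quick way to see the problem is to list which controls in a form carry a name attribute, since only those are sent with the submission. The sketch below does that in Python (chosen only for illustration; the FormFieldAudit class and audit function are hypothetical names), using a trimmed copy of the form from the question and a corrected variant.

```python
from html.parser import HTMLParser


class FormFieldAudit(HTMLParser):
    """Sort form controls by whether they carry a name attribute."""

    CONTROLS = {"input", "textarea", "select"}

    def __init__(self):
        super().__init__()
        self.named = []    # controls whose value can reach $_POST
        self.unnamed = []  # controls with no name (id shown if present, else the tag)

    def handle_starttag(self, tag, attrs):
        if tag not in self.CONTROLS:
            return
        attributes = dict(attrs)
        if attributes.get("name"):
            self.named.append(attributes["name"])
        else:
            self.unnamed.append(attributes.get("id", tag))


def audit(html):
    parser = FormFieldAudit()
    parser.feed(html)
    parser.close()
    return parser.named, parser.unnamed


original_form = """
<form method="post" name="userFeedback" action="/submit.php">
<textarea id="comments"></textarea>
<textarea id="name"></textarea>
<textarea id="contact"></textarea>
<input class="enter" type="submit" value="Enter">
</form>
"""

fixed_form = """
<form method="post" name="userFeedback" action="/submit.php">
<textarea id="comments" name="comments"></textarea>
<textarea id="name" name="name"></textarea>
<textarea id="contact" name="contact"></textarea>
<input class="enter" type="submit" value="Enter">
</form>
"""

print(audit(original_form))  # no named controls, so nothing to read from $_POST
print(audit(fixed_form))     # the three textareas are now named
```

With the name attributes in place, the existing $_POST["comments"], $_POST["name"] and $_POST["contact"] lookups in submit.php have matching keys.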

Why does Lift escape this <lift:bind> value?

I have (roughly) this LIFT-ified HTML in my default template:
<html>
<head>
<title>FooBar Application | <lift:bind name="page-title"/></title>
</head>
<body>
<h1><lift:bind name="page-title" /></h1>
<div id="page-content">
<lift:bind name="page-content" />
</div>
</body>
</html>
...and then this in my main template:
<lift:surround at="page-content">
<lift:bind-at name="page-title">Home</lift:bind-at>
</lift:surround>
...which gives me this in the generated HTML:
<html>
<head>
<title>FooBar Application | &lt;lift:bind name="page-title"/&gt;</title>
</head>
<body>
<h1>Home</h1>
<div id="page-content">
</div>
</body>
</html>
Why is the <lift:bind> tag in the <title> getting escaped, and the one in the <body><h1> not? And how do I prevent that from happening?
Are the pages defined through SiteMap? As was mentioned before, <title> can be a special case, and it is interpreted in several places - some of which might be doing the escaping. If you can, I'd try setting the page title in one of two ways:
Through the Sitemap you can use the Title Loc Param as referenced here: Dynamic title with Lift
You can also have something like <title data-lift="PageTitle"></title> to have it invoke a snippet called PageTitle, where the snippet would be something like:
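import net.liftweb.util.Helpers._ // brings the #> CSS selector transform into scope

class PageTitle {
  // Replace the contents of the <title> element with the full page title
  def render = "*" #> "FooBar Application | Home"
}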

Update og:title

I used this tutorial https://developers.facebook.com/docs/opengraph/tutorial/ to make my first app, but I want to use a PHP variable for my og:title, like this: <meta property="og:title" content="<?php echo $title; ?>" />. This PHP variable changes every time the page loads, but when I post my action, it appears on Facebook with the old title!
Here's the code:
<?php
$title = ' Hello world';
// <---- This PHP variable changes every time the page loads,
// but the title is not being recognised
?>
<html xmlns="http://www.w3.org/1999/xhtml" dir="ltr" lang="en-US"
xmlns:fb="https://www.facebook.com/2008/fbml">
<head prefix="og: http://ogp.me/ns# fb: http://ogp.me/ns/fb# fitnessgod: http://ogp.me/ns/fb/fitnessgod#">
<meta property="og:locale" content="en_US" />
<meta property="fb:app_id" content="My app Id" />
<meta property="og:type" content="fitnessgod:news" />
<meta property="og:url" content="https://www.fitness-god.com/share-facebook.php" />
<meta property="og:title" content="<?php echo $title; ?>" />
<meta property="og:description" content="let's do sport" />
<meta property="og:image" content="https://www.fitness-god.com/images/sport dinamic.png" />
<script type="text/javascript">
function postCook()
{
FB.api('/me/fitnessgod:share' +
'?news=https://www.fitness-god.com/share-facebook.php','post',
function(response) {
if (!response || response.error) {
alert('Error occured');
} else {
alert('Post was successful! Action ID: ' + response.id);
}
});
}
</script>
</head>
<body>
<div id="fb-root"></div>
<script src="https://connect.facebook.net/en_US/all.js"></script>
<script>
FB.init({
appId:'My App id', cookie:true,
status:true, xfbml:true, oauth:true
});
</script>
<fb:add-to-timeline></fb:add-to-timeline>
<h3>
<font size="30" face="verdana" color="grey">
Stuffed Cookies
</font>
</h3>
<p>
<img title="Sports News"
src="https://www.fitness-god.com/images/sport dinamic.png"
width="550"/><br />
</p>
<form>
<input type="button" value="Share news" onClick="postCook()" />
</form>
<fb:activity actions="fitnessgod:share"></fb:activity>
</body>
</html>
You need to correct the meta tags definition. Use attribute name, not property:
E.g.:
<meta name="og:title" content="<?php echo $title; ?>" />
You can use the debugger to test your code:
http://developers.facebook.com/tools/debug
Also note that when you click share, the content of your page might be cached. The debugger, however, always fetches the latest content.
It appears you are experiencing "freezing" of og:title that Facebook performs after a number of actions – 50 likes, shares and/or comments – were performed with that link.
One of the main reasons why Facebook does this, is to prevent many people sharing a seemingly safe link and then having the site owner replace the title with offensive content and have it display on people's timelines.
For additional information, see this link.