I want to extract (parse) a portion of an HTML document from an external website using PHP.
For example, to extract news from Yahoo, I tried using the Simple HTML DOM Parser from SourceForge:
<?php
$url="http://news.yahoo.com/einsteins-brain-now-interactive-ipad-app-071441969.html";
include('simple_html_dom.php');
$html=new simple_html_dom();
$html->load_file($url);
$xxx=$html->find('title')->innertext;
echo $xxx;
?>
Fatal error: Call to a member function find() on a non-object in
/home/a1234bc/public_html/simple_html_dom.php on line 1113
Then I tried to echo the loaded HTML:
<?php
$url="http://news.yahoo.com/einsteins-brain-now-interactive-ipad-app-071441969.html";
include('simple_html_dom.php');
$html=new simple_html_dom();
$html->load_file($url);
echo $html;
?>
Now I get:
Fatal error: Call to a member function innertext() on a non-object in
/home/a1234bc/public_html/simple_html_dom.php on line 1688
I also tried using DOMDocument() with file_get_contents():
<?php
$url="http://news.yahoo.com/einsteins-brain-now-interactive-ipad-app-071441969.html";
$content = file_get_contents($url);
// echo $content works perfect
$doc = new DOMDocument();
$doc->loadHTML($content);
$jjj=$doc->getElementsByTagName('title')->item(0);
echo $jjj;
?>
This throws a very long list of warnings, so I will paste only the first 10:
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]:
htmlParseEntityRef: no name in Entity, line: 166 in
/home/a1234bc/public_html/simple_html_dom.php on line 37
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]:
htmlParseEntityRef: expecting ';' in Entity, line: 166 in
/home/a1234bc/public_html/simple_html_dom.php on line 37
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]:
htmlParseEntityRef: no name in Entity, line: 256 in
/home/a1234bc/public_html/simple_html_dom.php on line 37
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]:
htmlParseEntityRef: expecting ';' in Entity, line: 256 in
/home/a1234bc/public_html/simple_html_dom.php on line 37
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: Tag
fb:login-button invalid in Entity, line: 256 in
/home/a1234bc/public_html/simple_html_dom.php on line 37
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]:
htmlParseEntityRef: expecting ';' in Entity, line: 275 in
/home/a1234bc/public_html/simple_html_dom.php on line 37
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]:
htmlParseEntityRef: expecting ';' in Entity, line: 287 in
/home/a1234bc/public_html/simple_html_dom.php on line 37
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]:
htmlParseEntityRef: expecting ';' in Entity, line: 292 in
/home/a1234bc/public_html/simple_html_dom.php on line 37
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]:
htmlParseEntityRef: expecting ';' in Entity, line: 311 in
/home/a1234bc/public_html/simple_html_dom.php on line 37
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: Attribute
class redefined in Entity, line: 325 in
/home/a1234bc/public_html/simple_html_dom.php on line 37
Can someone please point me in the right direction?
I got the same error when using the object-oriented way shown in the manual:
// Create a DOM object
$html = new simple_html_dom();
// Load HTML from a string
$html->load('<html><body>Hello!</body></html>');
// Load HTML from a URL
$html->load_file('http://www.google.com/');
// Load HTML from a HTML file
$html->load_file('test.htm');
I got rid of the error and got my script working when I switched to the quick way shown in the manual:
// Create a DOM object from a string
$html = str_get_html('<html><body>Hello!</body></html>');
// Create a DOM object from a URL
$html = file_get_html('http://www.google.com/');
// Create a DOM object from a HTML file
$html = file_get_html('test.htm');
After this, $html->find worked just fine!
The PHP Simple HTML DOM Parser manual can be found here: http://simplehtmldom.sourceforge.net/manual.htm
Hope this helps!
SimpleXML is designed for parsing XML, not HTML, and DOMDocument::loadHTML() emits warnings like the ones above when it meets malformed real-world markup. An alternative is to use file_get_contents to get the HTML into a string and then use string manipulation functions to get the portion you need. preg_match_all would be a good place to start.
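As a rough illustration of the string-matching approach suggested above, here is a minimal sketch written in Python rather than PHP. The sample page, the function name extract_title and the pattern are all assumptions made for the example; in PHP a comparable pattern could be passed to preg_match with the i and s modifiers. A regular expression like this only suits small, well-delimited pieces such as the title, and is not a general HTML parser.

```python
import html
import re

# Hypothetical sample page; a real script would fetch the HTML first.
SAMPLE_PAGE = """<html><head><TITLE lang="en">
  Einstein's brain &amp; the iPad
</TITLE></head><body><fb:login-button></fb:login-button></body></html>"""

# Match the first <title>...</title> pair, ignoring case and allowing
# attributes on the opening tag and newlines inside the element.
TITLE_RE = re.compile(r"<title\b[^>]*>(.*?)</title\s*>", re.IGNORECASE | re.DOTALL)


def extract_title(page):
    """Return the text of the first <title> element, or None if there is none."""
    match = TITLE_RE.search(page)
    if match is None:
        return None
    # Decode entities such as &amp; and collapse runs of whitespace.
    return " ".join(html.unescape(match.group(1)).split())


if __name__ == "__main__":
    print(extract_title(SAMPLE_PAGE))
```

Unknown tags such as fb:login-button do not matter here, because the pattern never looks at anything outside the title element.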
I took a crash dump of my application when I got the "An item with the same key has already been added" exception. I need help finding which object caused this exception. I could print the exception but couldn't figure out how to find the exact key that caused the exception.
This is likely the state you have:
[...]
(3250.7ec): CLR exception - code e0434352 (!!! second chance !!!)
[...]
0:000> .loadby sos clr
0:000> !pe
Exception object: 030c31e8
Exception type: System.ArgumentException
Message: An item with the same key has already been added.
InnerException: <none>
StackTrace (generated):
SP IP Function
010FEE1C 6045F705 mscorlib_ni!System.ThrowHelper.ThrowArgumentException(System.ExceptionResource)+0x35
010FEE2C 609410C7 mscorlib_ni!System.Collections.Generic.Dictionary`2[[System.__Canon, mscorlib],[System.__Canon, mscorlib]].Insert(System.__Canon, System.__Canon, Boolean)+0xc6af67
010FEE60 5FD4B310 mscorlib_ni!System.Collections.Generic.Dictionary`2[[System.__Canon, mscorlib],[System.__Canon, mscorlib]].Add(System.__Canon, System.__Canon)+0x10
010FEE68 017004F5 KeyAlreadyAdded!KeyAlreadyAdded.Program.Main()+0x45
[...]
In the native call stack, you can see the call to Dictionary.Add() again, but with the additional information for the frame number:
0:000> k
# ChildEBP RetAddr
00 010fecb0 618fac03 KERNELBASE!RaiseException+0x62
01 010fed4c 618fae08 clr!RaiseTheExceptionInternalOnly+0x27c
02 010fee14 6045f705 clr!IL_Throw+0x141
03 010fee24 609410c7 mscorlib_ni!System.ThrowHelper.ThrowArgumentException(System.ExceptionResource)$##6000335+0x35
04 010fee50 5fd4b310 mscorlib_ni![COLD] System.Collections.Generic.Dictionary`2[System.__Canon,System.__Canon].Insert(System.__Canon, System.__Canon, Boolean)$##6003922+0x87
05 010fee68 6181ebe6 mscorlib_ni!System.Collections.Generic.Dictionary`2[System.__Canon,System.__Canon].Add(System.__Canon, System.__Canon)$##6003915+0x10
[...]
At the Insert() method, you can use the ebx register to get the key:
0:000> .frame /r 4
04 010fee50 5fd4b310 mscorlib_ni![COLD] System.Collections.Generic.Dictionary`2[System.__Canon,System.__Canon].Insert(System.__Canon, System.__Canon, Boolean)$##6003922+0x87
eax=010fec58 ebx=030c2364 ecx=00000005 edx=00000000 esi=030c23b0 edi=030c2364
[...]
0:000> !do 030c2364
Name: System.String
MethodTable: 5fdefd60
EEClass: 5f9c4e90
Size: 22(0x16) bytes
File: C:\Windows\Microsoft.Net\assembly\GAC_32\mscorlib\v4.0_4.0.0.0__b77a5c561934e089\mscorlib.dll
String: this
[...]
So in this case, the duplicate key being added is the string "this". Here's the code:
using System.Collections.Generic;

namespace KeyAlreadyAdded
{
    class Program
    {
        static Dictionary<string, string> dict = new Dictionary<string, string> {{"this", "was already inside"}};

        static void Main()
        {
            dict.Add("that", "goes in easily");
            dict.Add("this", "however, causes a duplicate key exception");
        }
    }
}
After updating to tns-core-modules 2.2.0 and Angular RC4 (the versions officially released by Telerik), my app can no longer make HTTP calls to a server. I keep getting this error:
com.tns.NativeScriptException:
Calling js method onClick failed
EXCEPTION: Error in /data/data/org.nativescript.EatSafe/files/app/pages/login/login.html:5:75
ORIGINAL EXCEPTION: Error: not implemented
ORIGINAL STACKTRACE:
Error: not implemented
at NativeScriptDomAdapter.Parse5DomAdapter.getCookie (/data/data/org.nativescript.EatSafe/files/app/tns_modules/@angular/platform-server/src/parse5_adapter.js:619:68)
at CookieXSRFStrategy.configureRequest (/data/data/org.nativescript.EatSafe/files/app/tns_modules/@angular/http/src/backends/xhr_backend.js:150:82)
at XHRBackend.createConnection (/data/data/org.nativescript.EatSafe/files/app/tns_modules/@angular/http/src/backends/xhr_backend.js:165:28)
at httpRequest (/data/data/org.nativescript.EatSafe/files/app/tns_modules/@angular/http/src/http.js:22:20)
at Http.post (/data/data/org.nativescript.EatSafe/files/app/tns_modules/@angular/http/src/http.js:78:16)
at UserService.signin (/data/data/org.nativescript.EatSafe/files/app/shared/services/user.service.js:13:27)
at LoginComponent.login (/data/data/org.nativescript.EatSafe/files/app/pages/login/login.component.js:31:27)
at DebugAppView._View_LoginComponent0._handle_tap_8_0 (LoginComponent.template.js:355:28)
at /data/data/org.nativescript.EatSafe/files/app/tns_modules/@angular/core/src/linker/view.js:375:24
at /data/data/org.nativescript.EatSafe/files/app/tns_modules/nativescript-angular/renderer.js:204:26
ERROR CONTEXT:
[object Object]
File: "/data/data/org.nativescript.EatSafe/files/app/tns_modules/@angular/core/src/linker/view.js, line: 365, column: 16
StackTrace:
Frame: function:'DebugAppView._rethrowWithContext', file:'/data/data/org.nativescript.EatSafe/files/app/tns_modules/@angular/core/src/linker/view.js', line: 365, column: 17
Frame: function:'', file:'/data/data/org.nativescript.EatSafe/files/app/tns_modules/@angular/core/src/linker/view.js', line: 378, column: 23
Frame: function:'', file:'/data/data/org.nativescript.EatSafe/files/app/tns_modules/nativescript-angular/renderer.js', line: 204, column: 26
Frame: function:'ZoneDelegate.invoke', file:'/data/data/org.nativescript.EatSafe/files/app/tns_modules/zone.js/dist/zone-node.js', line: 290, column: 29
Frame: function:'NgZoneImpl.inner.inner.fork.onInvoke', file:'/data/data/org.nativescript.EatSafe/files/app/tns_modules/@angular/core/src/zone/ng_zone_impl.js', line: 53, column: 41
Frame: function:'ZoneDelegate.invoke', file:'/data/data/org.nativescript.EatSafe/files/app/tns_modules/zone.js/dist/zone-node.js', line: 289, column: 35
Frame: function:'Zone.run', file:'/data/data/org.nativescript.EatSafe/files/app/tns_modules/zone.js/dist/zone-node.js', line: 183, column: 44
Frame: function:'NgZoneImpl.runInner', file:'/data/data/org.nativescript.EatSafe/files/app/tns_modules/@angular/core/src/zone/ng_zone_impl.js', line: 84, column: 71
Frame: function:'NgZone.run', file:'/data/data/org.nativescript.EatSafe/files/app/tns_modules/@angular/core/src/zone/ng_zone.js', line: 235, column: 66
Frame: function:'zonedCallback', file:'/data/data/org.nativescript.EatSafe/files/app/tns_modules/nativescript-angular/renderer.js', line: 203, column: 24
Frame: function:'Observable.notify', file:'/data/data/org.nativescript.EatSafe/files/app/tns_modules/data/observable/observable.js', line: 174, column: 23
Frame: function:'Observable._emit', file:'/data/data/org.nativescript.EatSafe/files/app/tns_modules/data/observable/observable.js', line: 193, column: 18
Frame: function:'_android.setOnClickListener.android.view.View.OnClickListener.onClick', file:'/data/data/org.nativescript.EatSafe/files/app/tns_modules/ui/button/button.js', line: 33, column: 32
at com.tns.Runtime.callJSMethodNative(Native Method)
at com.tns.Runtime.dispatchCallJSMethodNative(Runtime.java:862)
at com.tns.Runtime.callJSMethodImpl(Runtime.java:727)
at com.tns.Runtime.callJSMethod(Runtime.java:713)
at com.tns.Runtime.callJSMethod(Runtime.java:694)
at com.tns.Runtime.callJSMethod(Runtime.java:684)
at com.tns.gen.android.view.View_OnClickListener.onClick(View_OnClickListener.java:11)
at android.view.View.performClick(View.java:5233)
at android.view.View$PerformClick.run(View.java:21209)
at android.os.Handler.handleCallback(Handler.java:739)
at android.os.Handler.dispatchMessage(Handler.java:95)
at android.os.Looper.loop(Looper.java:152)
at android.app.ActivityThread.main(ActivityThread.java:5507)
at java.lang.reflect.Method.invoke(Native Method)
at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:726)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:616)
I have been looking online for a release log of the update to see if there are any breaking changes, but to no avail. Does anyone have any pointers on how to make HTTP calls with the new NativeScript Angular updates?
Thank you
I got this fixed by importing the following in app.module.ts:
import {NativeScriptHttpModule} from 'nativescript-angular/http';
And then adding it to the module's imports:
imports: [NativeScriptHttpModule]
You need this code in your main.ts file:
import {Parse5DomAdapter} from '@angular/platform-server/src/parse5_adapter';
(<any>Parse5DomAdapter).prototype.getCookie = function (name) { return null; };
I am reading an XML file that has multiple records but no root tag through Spring Batch, and it throws an error. It works fine with a root tag, but I want to parse the file without one.
Below is the exception:
caused by: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[640,2]
Message: The markup in the document following the root element must be well-formed.
at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:601)
at com.sun.xml.internal.stream.XMLEventReaderImpl.peek(XMLEventReaderImpl.java:276)
at org.springframework.batch.item.xml.stax.DefaultFragmentEventReader.nextEvent(DefaultFragmentEventReader.java:114)
at org.springframework.batch.item.xml.stax.DefaultFragmentEventReader.markFragmentProcessed(DefaultFragmentEventReader.java:184)
... 33 more
My fragmentRootElementName is RTT:
<property name="fragmentRootElementName" value="RTT" />
Below is the XML:
<RTT>
<transaction> </transaction>
<data></data>
</RTT>
<RTT>
<transaction> </transaction>
<data></data>
</RTT>
<RTT>
<transaction> </transaction>
<data></data>
</RTT>
It works fine if I put the records inside a root element:
<RttData> <RTT></RTT><RTT></RTT> <RTT></RTT> <RTT></RTT></RttData>
I am performing a transformation and getting the following error:
ERROR 2013-10-02 12:38:19,763 [[vistaesb].VistaESBFlow1.stage1.04] org.mule.exception.DefaultMessagingExceptionStrategy:
Message : Failed to transform from "json" to "personal_information"
Code : MULE_ERROR-109
Exception stack is:
1. Unrecognized field "phone_number" (Class personal_information), not marked as ignorable
at [Source: java.io.InputStreamReader@ac7e4af; line: 2, column: 21] (through reference chain: personal_information["phone_number"]) (org.codehaus.jackson.map.exc.UnrecognizedPropertyException)
org.codehaus.jackson.map.exc.UnrecognizedPropertyException:53 (null)
2. Failed to transform from "json" to "personal_information" (org.mule.api.transformer.TransformerException)
org.mule.module.json.transformers.JsonToObject:136 (http://www.mulesoft.org/docs/site/current3/apidocs/org/mule/api/transformer/TransformerException.html)
My configuration is simple enough:
<flow name="VistaESBFlow1" doc:name="VistaESBFlow1">
<jdbc-ee:inbound-endpoint queryKey="personal_information" responseTimeout="1000" encoding="UTF-8" mimeType="text/plain" queryTimeout="-1" pollingFrequency="10000" connector-ref="applyVista_dev" doc:name="Data Entry Point">
</jdbc-ee:inbound-endpoint>
<json:object-to-json-transformer doc:name="Object to JSON"/>
<data-mapper:transform config-ref="new_mapping_grf" doc:name="DataMapper"/>
<json:json-to-object-transformer doc:name="JSON to Object" encoding="utf8" returnClass="personal_information" mimeType="text/plain"/>
<file:outbound-endpoint path="C:\Users\abrowning\Desktop\test" responseTimeout="10000" doc:name="File" encoding="utf8" mimeType="text/plain"/>
</flow>
There is a similar problem described here (109 Error), but I don't think this has to do with my endpoint.
I'm guessing a 109 is a bush-league error, so any help is appreciated.
The source of my issue was a data mismatch in my get/set methods. After writing a lot of PHP over the last year, I overlooked it.
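To show how a naming mismatch can produce an "unrecognized field" error of this kind, here is a small Python analogy, not Jackson or Mule code. The class, its fields other than phone_number, and the sample JSON are all hypothetical; the answer above does not say what the actual mismatch was. The point is only that a strict JSON-to-object mapping rejects a key that has no matching property on the target class.

```python
import json
from dataclasses import dataclass


@dataclass
class PersonalInformation:
    # Hypothetical fields. "phoneNumber" deliberately does not match
    # the JSON key "phone_number" used in the payload below.
    first_name: str = ""
    phoneNumber: str = ""


def to_object(text):
    """Map a JSON object onto PersonalInformation, rejecting unknown keys."""
    return PersonalInformation(**json.loads(text))


if __name__ == "__main__":
    payload = '{"first_name": "Ada", "phone_number": "555-0100"}'
    try:
        to_object(payload)
    except TypeError as error:
        # The message names the key that has no matching field.
        print("mapping failed:", error)
```

Renaming the field to phone_number, or the JSON key to phoneNumber, makes the mapping succeed, which mirrors aligning the getter and setter names with the incoming field.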
I can't get the SOAP server working in a Zend Framework 2 module. I am not completely sure, but I believe the problem is the WSDL file. I try to create the WSDL file via AutoDiscover, which is provided by Zend Framework. Here is the error.log:
[Fri Apr 19 20:39:29 2013] [error] [client 172.23.31.109] PHP Warning: SoapServer::SoapServer(): I/O warning : failed to load external entity "http-LINK/services?wsdl" in /PATH/public_html/vendor/zendframework/zendframework/library/Zend/Soap/Server.php on line 749
[Fri Apr 19 20:39:29 2013] [error] [client 172.23.31.109] PHP Fatal error: SOAP-ERROR: Parsing WSDL: Couldn't load from 'http-LINK/services?wsdl' : failed to load external entity "http-LINK/services?wsdl"\n in /PATH/public_html/vendor/zendframework/zendframework/library/Zend/Soap/Server.php on line 749
I added my own module for this services test. This is its structure; the module is called "Services":
-Services
--config
---module.config.php
--src
---Services
----API
-----1.0
------servicesAPI.php
---Controller
----ServicesController.php
--view
---services
----serivces
-Module.php
-autoload_classmap.php
This is my file "servicesAPI.php":
class servicesAPI {
    /**
     * This method takes a value and gives back the md5 hash of the value
     *
     * @param String $value
     * @return String
     */
    public function md5Value($value) {
        return md5($value);
    }
}
And this is ServicesController.php:
namespace Services\Controller;
ini_set("soap.wsdl_cache_enabled", 0);
use Zend\Mvc\Controller\AbstractActionController;
use Zend\Soap\AutoDiscover;
use Zend\Soap\Server;
require_once __DIR__ . '/../API/1.0/servicesAPI.php';
class ServicesController extends AbstractActionController {
    private $_options;
    private $_URI = "http-LINK/services";
    private $_WSDL_URI = "http-LINK/services?wsdl";
    public function indexAction() {
        if (isset($_GET['wsdl'])) {
            $this->handleWSDL();
        } else {
            $this->handleSOAP();
        }
    }
    private function handleWSDL() {
        $autodiscover = new AutoDiscover();
        $autodiscover->setClass('servicesAPI')
                     ->setUri($this->_URI);
        $autodiscover->handle();
    }
    private function handleSOAP() {
        $soap = new Server($this->_WSDL_URI);
        $soap->setClass('servicesAPI');
        $soap->handle();
    }
}
So when I deploy this and open http-LINK/services in the browser, it gives me the following:
<?xml version="1.0" encoding="UTF-8"?>
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
<SOAP-ENV:Body>
<SOAP-ENV:Fault>
<faultcode>WSDL</faultcode>
<faultstring>
SOAP-ERROR: Parsing WSDL: Couldn't load from 'http-LINK/services?wsdl' : failed to load external entity "http-LINK/services?wsdl"
</faultstring>
<detail/>
</SOAP-ENV:Fault>
</SOAP-ENV:Body>
</SOAP-ENV:Envelope>
This call also writes the PHP errors shown above to the error log.
If I try to open services?wsdl in the browser, it shows me this (Chrome and Safari):
This page contains the following errors:
error on line 3 at column 1: Extra content at the end of the document
Below is a rendering of the page up to the first error.
This method takes a value and gives back the md5 hash of the value
But if I inspect the element, it looks completely ok:
<?xml version="1.0" encoding="utf-8"?>
<definitions xmlns="http://schemas.xmlsoap.org/wsdl/" xmlns:tns="http-LINK/services" xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/" xmlns:soap12="http://schemas.xmlsoap.org/wsdl/soap12/" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:soap-enc="http://schemas.xmlsoap.org/soap/encoding/" xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/" name="servicesAPI" targetNamespace="http-LINK/services"><types><xsd:schema targetNamespace="http-LINK/services"/></types><portType name="servicesAPIPort"><operation name="md5Value"><documentation>This method takes a value and gives back the md5 hash of the value</documentation><input message="tns:md5ValueIn"/><output message="tns:md5ValueOut"/></operation></portType><binding name="servicesAPIBinding" type="tns:servicesAPIPort"><soap:binding style="rpc" transport="http://schemas.xmlsoap.org/soap/http"/><operation name="md5Value"><soap:operation soapAction="http-LINK/services#md5Value"/><input><soap:body use="encoded" encodingStyle="http://schemas.xmlsoap.org/soap/encoding/" namespace="http-LINK/services"/></input><output><soap:body use="encoded" encodingStyle="http://schemas.xmlsoap.org/soap/encoding/" namespace="http-LINK/services"/></output></operation></binding><service name="servicesAPIService"><port name="servicesAPIPort" binding="tns:servicesAPIBinding"><soap:address location="http-LINK/services"/></port></service><message name="md5ValueIn"><part name="value" type="xsd:string"/></message><message name="md5ValueOut"><part name="return" type="xsd:string"/></message></definitions>
I can validate this xml with any xml validator, it seems to be valid.
I read all the posts concerning this on Stack Overflow and searched Google, but none of the solutions helped me. Here is a short list of what else I tried:
Following https://bugs.php.net/bug.php?id=48216, I saved the WSDL XML to a file and loaded it from that file when starting the SOAP server. This failed.
I wrapped the AutoDiscover and SoapServer statements in try/catch to catch any exceptions. Nothing was caught.
I tried echoing the WSDL through toXML() and other output methods. This failed.
I used XMLReader::open and isValid to make sure that the XML is valid (it is).
Some more information:
PHP Version 5.3.23
Ubuntu server 11.04
php-soap module is loaded
Zend Framework version 2.1.4
Any help or hints are appreciated. Thanks in advance.
Try instantiating the SOAP server class this way:
...
private function handleSOAP() {
    $soap = new Server(
        null, array(
            'wsdl' => 'http-LINK/services?wsdl',
        )
    );
    $soap->setClass('servicesAPI');
    $soap->handle();
}
...
You should also add this line to the end of your indexAction():
return $this->getResponse();
Returning the response object disables the layout, which is likely the extra content the browser reports after the end of the XML document.