PHP Simple HTML DOM Parser Manual

FAQ

Problem with finding

Top
Q: Element not found in such case:
$html->find('div[style=padding: 0px 2px;] span[class=rf]');

A: If there is blank in selectors, quote it!  
$html->find('div[style="padding: 0px 2px;"] span[class=rf]');

Problem with hosting

Top
Q: On my local server everything works fine, but when I put it on my external server it doesn't work. 

A: The "file_get_dom" function is a wrapper of "file_get_contents" function,  you must set "allow_url_fopen" as TRUE in "php.ini" to allow accessing files via HTTP or FTP. However, some hosting venders disabled PHP's "allow_url_fopen" flag for security issues... PHP provides excellent support for "curl" library to do the same job, Use curl to get the page, then call "str_get_dom" to create DOM object. 

Example: 
 
$curl = curl_init(); 
curl_setopt($curl, CURLOPT_URL, 'http://????????');  
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);  
curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 10);  
$str = curl_exec($curl);  
curl_close($curl);  
 
$html= str_get_html($str); 
... 

Behind a proxy

Top
Q: My server is behind a Proxy and i can't use file_get_contents b/c it returns a unauthorized error.

A: Thanks for Shaggy to provide the solution: 
 
// Define a context for HTTP. 
$context = array

       'http' => array
       ( 
              'proxy' => 'addresseproxy:portproxy', // This needs to be the server and the port of the NTLM Authentication Proxy Server. 
              'request_fulluri' => true, 
       ), 
); 

$context = stream_context_create($context); 
 
$html= file_get_html('http://www.php.net', false, $context); 
...

Memory leak!

Top
Q: This script is leaking memory seriously... After it finished running, it's not cleaning up dom object properly from memory.. 

A: Due to php5 circular references memory leak, after creating DOM object, you must call $dom->clear() to free memory if call file_get_dom() more then once. 

Example: 

$html = file_get_html(...); 
// do something... 
$html->clear(); 
unset($html);

Author: S.C. Chen (me578022@gmail.com)
Original idea is from Jose Solorzano's HTML Parser for PHP 4.
Contributions by: Yousuke Kumakura, Vadim Voituk, Antcs