EzineArticles scraper


A piece of code that queries EzineArticles (EZA) for a given string and grabs a random article from the first page of results. Again, this is slow as fuck (two blocking page fetches and two full DOM parses per run) and shouldn’t be used on production sites.

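<?php
// Collect libxml parse warnings quietly instead of muting every PHP error.
libxml_use_internal_errors(true);

$scrapeURL = 'http://ezinearticles.com/search/?q=';
$baseURL   = 'http://ezinearticles.com';
$query     = 'project management';

// Fetch the first results page; urlencode() turns the space into "+".
$dom = new DOMDocument();
if (!$dom->loadHTMLFile($scrapeURL . urlencode($query))) {
	die('Could not load the search results page.');
}

// Collect the article links from every <div class="srch_title">.
$articles = array();
foreach ($dom->getElementsByTagName('div') as $div) {
	if ($div->getAttribute('class') == 'srch_title') {
		foreach ($div->getElementsByTagName('a') as $a) {
			$articles[] = $a->getAttribute('href');
		}
	}
}
if (!$articles) {
	die('No articles found.');
}

// Pick one result at random; the hrefs are treated as site-relative.
$articleURL = $baseURL . $articles[array_rand($articles)];

// Fetch the article page and pull out its title and body text.
if (!$dom->loadHTMLFile($articleURL)) {
	die('Could not load the article page.');
}
$h1   = $dom->getElementsByTagName('h1')->item(0);
$body = $dom->getElementById('body');

$article = array(
	'title' => $h1 ? $h1->textContent : '',
	'body'  => $body ? $body->textContent : '',
);

// Escape the dump so scraped text can't inject markup into the page.
print "<pre>";
print htmlspecialchars(print_r($article, true));
print "</pre>";

?>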
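
The nested loops in the script above could probably be collapsed into a single XPath query. Here's a rough sketch of that idea, not a drop-in replacement: the function name extractArticleLinks() and the sample markup are made up for illustration, and it assumes the results page still wraps each link in a div with the class srch_title. It works on an HTML string rather than a live URL, so you can try it without hitting EZA.

```php
<?php
// Hypothetical helper (not part of the original script): returns the href of
// every <a> inside a <div class="srch_title"> found in the given HTML string.
function extractArticleLinks($html)
{
	// Keep libxml parse warnings quiet, then restore the previous setting.
	$previous = libxml_use_internal_errors(true);
	$dom = new DOMDocument();
	$dom->loadHTML($html);
	libxml_clear_errors();
	libxml_use_internal_errors($previous);

	$xpath = new DOMXPath($dom);
	$links = array();
	// Exact class match, like the == comparison in the loop version.
	// Anchors without an href attribute are skipped.
	foreach ($xpath->query('//div[@class="srch_title"]//a[@href]') as $a) {
		$links[] = $a->getAttribute('href');
	}
	return $links;
}

// Made-up sample markup standing in for a real results page.
$sample = '<html><body>'
	. '<div class="srch_title"><a href="/first-article">First</a></div>'
	. '<div class="other"><a href="/about">About</a></div>'
	. '<div class="srch_title"><a href="/second-article">Second</a></div>'
	. '</body></html>';

print_r(extractArticleLinks($sample));
```

With the sample markup, this picks up the two srch_title links and ignores /about. It won't make the scraper any faster, since the page fetches are still the slow part; it just trims the link-collecting code.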
