I don't know about you, but when I am searching for video game cheats and stuff, the first place I check is gamefaqs.com. If there is any other place that you personally go to for video game codes, leave a comment below and share it with everyone. For now we will use gamefaqs.com as our main source of information for this video game database.
Of course, GameFAQs lists every single video game platform that they have cheats for. They have more than enough listed to make our database the largest video game database out there. What we need from that page is a list of those platforms and their respective links. So let's get that first.
Create your database table first.
CREATE TABLE IF NOT EXISTS `systems` (
  `id` int(11) NOT NULL auto_increment,
  `url` text NOT NULL,
  `name` text NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=1;
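<?php
// Database credentials: replace these with your own.
$host = "localhost";
$username = "username";
$password = "password";
$database = "database name";
// The old mysql_* functions were removed in PHP 7, so connect with mysqli instead.
$connect = mysqli_connect($host, $username, $password, $database) or die(mysqli_connect_error());

// Fetch the systems page with curl.
$url = "http://www.gamefaqs.com/systems.html";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$html = curl_exec($ch);
if ($html === false) {
    die(curl_error($ch));
}

// Find every <ul class="systems"> block on the page.
preg_match_all("/<ul class=\"systems\">([^`]*?)<\/ul>/", $html, $matches);
// A prepared statement keeps quotes in a URL or name from breaking the query.
$insert = mysqli_prepare($connect, "INSERT INTO systems (url, name) VALUES (?, ?)") or die(mysqli_error($connect));

// Loop over however many lists were found (22 when this was written).
// The old test "$j <= 22" would read one element past the end.
$total = count($matches[0]);
for ($j = 0; $j < $total; $j++) {
    // Pull every link out of this list.
    preg_match_all("/<a href([^`]*?)<\/a>/", $matches[0][$j], $match);
    $list = $match[1];
    $size = sizeof($list);
    for ($i = 0; $i < $size; $i++) {
        // Each entry looks like ="/path">Name, so split it on ">".
        $pieces = explode(">", $list[$i]);
        $theUrl = $pieces[0];
        $theUrl = rtrim($theUrl, '"');
        $theUrl = ltrim($theUrl, '="');
        $name = $pieces[1];
        print "<br/>" . $theUrl . " " . $name;
        mysqli_stmt_bind_param($insert, "ss", $theUrl, $name);
        mysqli_stmt_execute($insert) or die(mysqli_stmt_error($insert));
    }
}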
The code above grabs all of the game systems listed on gamefaqs.com and saves the URL and name of each one in the systems table.
We begin with our variable $url, which holds the URL string. Next we initialize curl into $ch and set it up with curl_setopt. We're going to use only the bare essentials for this scraping. Of course, another way of grabbing the content of a page is file_get_contents().
Our first preg_match_all scans through the page we just grabbed with curl and finds all of the occurrences of the systems class. Then it stores them in an array called $matches: the full matches go into $matches[0] and the captured inner HTML goes into $matches[1].
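If you want to see what that first preg_match_all hands back, here is a small sketch that runs the same pattern against a made-up page. The $sampleHtml string is an assumption for illustration only; it is not the real gamefaqs.com markup.

```php
<?php
// Hypothetical sample markup standing in for the downloaded page (assumed data).
$sampleHtml = '<ul class="systems"><li><a href="/nes">NES</a></li></ul>'
            . '<p>other content</p>'
            . '<ul class="systems"><li><a href="/snes">SNES</a></li></ul>';

// Same pattern as the tutorial: everything from the opening tag up to the first closing </ul>.
preg_match_all("/<ul class=\"systems\">([^`]*?)<\/ul>/", $sampleHtml, $matches);

// $matches[0] holds the full matches, $matches[1] holds only the captured inner HTML.
$fullMatches = $matches[0];
$innerHtml = $matches[1];

print count($fullMatches) . " lists found\n";
print $innerHtml[0] . "\n";
```

On this sample the pattern finds two lists, and the first captured piece is the li element for the NES link.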
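Next we enter our first for loop. We initialize $j to zero, check that there are still elements of $matches[0] left to process, run our code, and increment $j. Why the number 22? When this was written, $matches[0] held 22 elements, one for each list on the page. With 22 elements the valid indexes run from 0 to 21, so the test should be $j < 22 rather than $j <= 22, and counting the elements with count($matches[0]) is safer than hard-coding the number.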
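Since the number of lists can change whenever the site adds a platform, it helps to let PHP do the counting. Here is a minimal sketch of that idea. The $matches array is hand-built sample data, an assumption standing in for the real result of preg_match_all.

```php
<?php
// Hand-built stand-in for the result of preg_match_all (assumed data).
$matches = array(
    array('<ul class="systems">A</ul>', '<ul class="systems">B</ul>', '<ul class="systems">C</ul>'),
    array('A', 'B', 'C'),
);

// Valid indexes run from 0 to count - 1, so the test is "<", not "<=".
$total = count($matches[0]);
$visited = array();
for ($j = 0; $j < $total; $j++) {
    $visited[] = $matches[1][$j];
}

// foreach does the same walk without any index bookkeeping.
$viaForeach = array();
foreach ($matches[1] as $inner) {
    $viaForeach[] = $inner;
}

print implode(",", $visited) . "\n";
```

Either loop visits each element exactly once, no matter how many lists the page has.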
Then we have yet another preg_match_all to pull the actual URLs and anchor text from those elements. The rest is just a bit of clean up work: we split each match on ">" with explode, then trim the leading =" and the trailing quote off the URL, making sure that all of the information is ready to go before it is inserted.
That's it for Part 1. If you have any questions, leave a comment below.
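The clean up can also be folded into the pattern itself by capturing the URL and the anchor text in separate groups. This is only a sketch, not the method used above: the helper name extractLinks and the sample markup are assumptions, and it only handles plain links with a double-quoted href and nothing else inside the tag.

```php
<?php
// Hypothetical helper: returns array('url' => ..., 'name' => ...) for each link found.
function extractLinks($listHtml)
{
    $links = array();
    // Group 1 is the href value, group 2 is the anchor text.
    preg_match_all('/<a href="([^"]*)">([^<]*)<\/a>/', $listHtml, $found, PREG_SET_ORDER);
    foreach ($found as $link) {
        $links[] = array('url' => $link[1], 'name' => trim($link[2]));
    }
    return $links;
}

// Made-up sample list, not the real gamefaqs.com markup.
$sample = '<li><a href="/nes">NES</a></li><li><a href="/snes">Super Nintendo</a></li>';
foreach (extractLinks($sample) as $link) {
    print $link['url'] . " " . $link['name'] . "\n";
}
```

With the groups doing the splitting, there is no need for explode, rtrim, or ltrim afterwards.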



