Carnegie Bosch Professor of Business Technologies and Marketing
Distinguished Fellow, INFORMS Information Systems Society
Director, PNC Center for Financial Services Innovation
David A. Tepper School of Business
Carnegie Mellon University
Pittsburgh, PA 15213
Phone: 412-268-3585
  
Resources for PhD Students
How to download data from the Web?
    The ability to download data from the web is very helpful in business research. I personally benefited from the code that Professor Ravi Bapna (University of Minnesota) once posted on his website to help PhD students learn how to download data. Since that webpage is no longer available, I am providing this resource for PhD students who may benefit from similar code.
    Here are the basic steps you will need to follow.
    1. Download and install XAMPP from http://www.apachefriends.org/en/xampp.html 
    You need three packages (Apache, PHP, and MySQL) to download data and store it in a database. Installing an Apache web server by hand is not easy, and it gets harder if you also want to add MySQL, PHP, and Perl. XAMPP is free, pre-packaged software that bundles these packages into a single installer.
2. To download data from the web you will need to write a PHP script. Save it in “c:\xampp\htdocs” and make sure the file extension is “.php”.
3. Download a PHP editor such as EditPlus. Get the community edition, which is free.
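Before writing the first script, it may help to confirm that PHP is running under XAMPP and has the features the scripts on this page lean on. The following is a minimal sketch, not an official XAMPP check: checkSetup() is a name made up for this page, and it assumes the only requirements are URL-aware fopen (the allow_url_fopen setting) and the mysqli extension for talking to MySQL. One way to try it is to save it as check.php in htdocs and open localhost/check.php.

```php
<?php
// Sketch: report whether this PHP installation has what the scripts on this page rely on.
// checkSetup() is a hypothetical helper name, not part of PHP or XAMPP.
function checkSetup()
{
    return array(
        'php_version'     => PHP_VERSION,
        // must be on for fopen() or file_get_contents() to open http:// URLs
        'allow_url_fopen' => filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN),
        // extension used to talk to MySQL from PHP
        'mysqli'          => extension_loaded('mysqli'),
    );
}

// Print one line per check, e.g. "mysqli: true"
foreach (checkSetup() as $name => $value) {
    echo $name . ': ' . var_export($value, true) . "<br>\n";
}
```

If allow_url_fopen shows false, the setting can be changed in php.ini, followed by a restart of Apache from the XAMPP control panel.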
    Here is an example script that downloads the webpage http://www.amazon.com/dp/0691010188 and saves it as file1.html in “C:/”. Copy and save the following code as example1.php in htdocs. To execute it, start XAMPP, type localhost/example1.php in your browser, and press Enter.
<?php
    // fopen() can only open URLs when allow_url_fopen is enabled in php.ini.
    function project()
    {
        // $urltoget is the URL we want to download
        $urltoget = "http://www.amazon.com/dp/0691010188";
        $uhandle = fopen($urltoget, 'r') or die("Could not open $urltoget");
        // $filetosave is the name of the file which will be saved.
        // You can choose the directory where you want to save it.
        $filetosave = "c:/file1.html";
        $prhandle = fopen($filetosave, 'w') or die("Could not open $filetosave for writing");
        // The while loop below reads from $urltoget line by line
        while (!feof($uhandle)) {
            $gread = fgets($uhandle, 4096);
            if ($gread === false) {
                break; // nothing more to read
            }
            fwrite($prhandle, $gread);
        }
        fclose($uhandle);
        fclose($prhandle);
    }
    project();
    ?>
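The same download can also be written without an explicit read loop. The sketch below is one possible variant, assuming allow_url_fopen is enabled: savePage() is a hypothetical helper name, and the 30-second timeout and the User-Agent string are placeholder values chosen for illustration, not settings any particular site requires.

```php
<?php
// Sketch: fetch a URL in one call and save it to disk.
// savePage() is a hypothetical helper; the timeout and User-Agent are assumed values.
function savePage($url, $filetosave)
{
    $context = stream_context_create(array(
        'http' => array(
            'timeout'    => 30,                        // give up after 30 seconds
            'user_agent' => 'research-downloader/0.1', // placeholder string
        ),
    ));
    // @ suppresses the warning so the failure can be handled below instead
    $html = @file_get_contents($url, false, $context);
    if ($html === false) {
        return false; // download failed (network error, blocked request, bad URL)
    }
    // file_put_contents returns the number of bytes written, or false on failure
    return file_put_contents($filetosave, $html);
}
```

Calling savePage("http://www.amazon.com/dp/0691010188", "c:/file1.html") would mirror the script above. A return value of false means either the download or the write failed, so it is worth checking before moving on to the parsing step.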
4. Getting information from file1.html.
    Here is an example script that gets the title of the book and prints it out. Copy and save the following code as example2.php in htdocs. To execute it, start XAMPP, type localhost/example2.php in your browser, and press Enter.
<?php
    function project()
    {
        $file = "c:/file1.html";
        $prhandle = fopen($file, 'r') or die("Could not open $file");
        // By looking at the HTML file I find that the title of the book is preceded by a unique identifier, <title>.
        // Below I use this information to identify the line which includes this identifier.
        while (!feof($prhandle)) {
            $gread = fgets($prhandle, 4096);
            if ($gread !== false && preg_match('/<title>/i', $gread)) {
                echo strip_tags($gread);
                // If you wish to split the title further, use preg_split to do it.
            }
        }
        fclose($prhandle);
    }
    project();
    ?>
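The script above mentions preg_split as a way to break the title into pieces. A minimal sketch of that step follows. splitTitle() is a hypothetical helper name, the colon separator is only an assumption about how a page title might be laid out, and the sample string is made up, so the pattern should be checked against the actual contents of file1.html.

```php
<?php
// Sketch: split a page title into parts with preg_split.
// splitTitle() is a hypothetical helper; the ':' separator is an assumption.
function splitTitle($title)
{
    // Split on a colon plus any surrounding whitespace, dropping empty pieces
    return preg_split('/\s*:\s*/', trim($title), -1, PREG_SPLIT_NO_EMPTY);
}

// Made-up title string, used only to show the mechanics
$parts = splitTitle("Example Store: An Example Book: Some Author");
print_r($parts);
```

Each element of $parts can then be stored or inspected on its own, for instance to keep only the piece that holds the book name.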
5. If you want to save the title in a database, you can do the following. Because we will use MySQL, it is useful to have a GUI for it. I use SQLyog (https://code.google.com/p/sqlyog/downloads/list). Use the community edition, which is free. The GUI is very straightforward.
Create a database “mydata” in MySQL. Then create a table “booktitle”. It will have only one column, called “title”. I set the field type to text. Once you have created the table, you can run the following script to store the title in it.
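For readers who prefer to create the database and table from PHP rather than a GUI, the sketch below shows one way it might be done. schemaStatements() and createSchema() are hypothetical helper names, and the TEXT column type is an assumption; any MySQL text type long enough for a title should work.

```php
<?php
// Sketch: create the database and table from PHP instead of a GUI.
// schemaStatements() and createSchema() are hypothetical helper names.
function schemaStatements()
{
    return array(
        "CREATE DATABASE IF NOT EXISTS mydata",
        // TEXT is an assumed column type for the single "title" column
        "CREATE TABLE IF NOT EXISTS mydata.booktitle (title TEXT)",
    );
}

// $link is expected to be an open mysqli connection
function createSchema($link)
{
    foreach (schemaStatements() as $sql) {
        if (!mysqli_query($link, $sql)) {
            die("Schema setup failed: " . mysqli_error($link));
        }
    }
}
```

Running createSchema(mysqli_connect("localhost", "root", "")) once from a script in htdocs would set things up, assuming XAMPP's default root account with an empty password. Because both statements use IF NOT EXISTS, running it a second time does no harm.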
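<?php
    // Connect with mysqli and select the mydata database
    $link = mysqli_connect("localhost", "root", "", "mydata") or die("Could not connect to MySQL");
    function project($link)
    {
        $file = "c:/file1.html";
        $prhandle = fopen($file, 'r') or die("Could not open $file");
        // A prepared statement keeps quotes in the title from breaking the query
        $stmt = mysqli_prepare($link, "INSERT INTO booktitle (title) VALUES (?)") or die(mysqli_error($link));
        while (!feof($prhandle)) {
            $gread = fgets($prhandle, 4096);
            if ($gread !== false && preg_match('/<title>/i', $gread)) {
                $title = trim(strip_tags($gread));
                echo $title;
                mysqli_stmt_bind_param($stmt, 's', $title);
                mysqli_stmt_execute($stmt);
            }
        }
        mysqli_stmt_close($stmt);
        fclose($prhandle);
    }
    project($link);
    mysqli_close($link);
    ?>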
Additional Resources.
    If you want to read information from the database and use it in your PHP file, you can do the following. Suppose we have a database called movies with a table called movielist, which has columns called imdbid and name.
    $link = mysqli_connect("localhost", "root", "", "movies") or die("Could not connect to MySQL");
    $query = mysqli_query($link, "select distinct imdbid, name from movielist order by imdbid");
    while ($cr = mysqli_fetch_assoc($query))
    {
        // in the while loop we are reading the data from the table row by row
        $moviename = trim($cr['name']);
        $movieid = trim($cr['imdbid']);
        // you can write the code which uses $moviename or $movieid within the while loop.
    }
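As one illustration of what could go inside that loop, the sketch below turns each row into a URL to fetch and a local file name to save it under. Everything specific here is an assumption: planDownloads() is a hypothetical helper, the base URL is a placeholder rather than a real site layout, the c:/pages directory is made up, and the sample rows stand in for what the fetch loop above would yield.

```php
<?php
// Sketch: turn rows like those read from movielist into (url, file) pairs.
// planDownloads() is a hypothetical helper, and the base URL is a placeholder.
function planDownloads($rows, $baseUrl, $dir)
{
    $plan = array();
    foreach ($rows as $cr) {
        $movieid = trim($cr['imdbid']);
        $plan[] = array(
            // rawurlencode keeps unusual characters from breaking the URL
            'url'  => $baseUrl . rawurlencode($movieid),
            // replace anything that is unsafe in a file name with an underscore
            'file' => $dir . '/' . preg_replace('/[^A-Za-z0-9_-]/', '_', $movieid) . '.html',
        );
    }
    return $plan;
}

// Made-up rows standing in for the rows read from the table
$rows = array(
    array('imdbid' => ' tt0000001 ', 'name' => 'First Example'),
    array('imdbid' => 'tt0000002',   'name' => 'Second Example'),
);
foreach (planDownloads($rows, 'http://www.example.com/title/', 'c:/pages') as $job) {
    echo $job['url'] . ' -> ' . $job['file'] . "<br>\n";
    // a download step like the one in example1.php would go here,
    // ideally followed by sleep() so that requests are spaced out
}
```

Building the list first and downloading second makes it easy to inspect the URLs before sending any requests.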