20-760: Web-Based Information Architectures

[ Home | Schedule | Announcements ]


Web-Based Information Management entails the design, creation, instrumentation and usage of web sites and related indexing and searching software. The course focuses first on web-based search engines: how to use them optimally, how to design e-commerce sites that maximize customer attraction via search engines, how to analyze competition, and how to architect both topological and key-term page access paths in service of successful e-commerce infrastructures. The course then turns to the key technological underpinnings, primarily the hands-on creation of a search engine, including inverted indexing, partial matching, query expansion and spidering technology. Subsequently, the course addresses related issues in web-based information architectures, including automated text categorization (e.g., indexing web pages into Yahoo-like taxonomies or auction-site catalogs), information extraction from web pages, and a glimpse into larger-scale text and data mining methods. Time permitting, the course will survey issues such as multilingual web access and distributed information retrieval.

Course Information

Location: GSIA 152

Time: 8:30am - 9:50am Tue/Thu 

Instructor: Prof. Jaime G. Carbonell (augmented with other expert guest lecturers)
Office: NSH 4519

Teaching assistant: Yan Liu
Office: NSH 4506
Office Hours: 4:00pm-5:00pm Thu

Course secretary: TBA
Office: NSH 4517

Prerequisite Skills
  • Basic programming skills (preferably Java)
  • Familiarity with the Web (HTML, browsing, etc.)
  • Fundamentals of Web Programming

Course Materials
  • Required: Class notes and handouts
  • Required: Understanding Search Engines: Mathematical Modeling and Text Retrieval
    by Michael W. Berry, Murray Browne
    Also available at http://www.siam.org/ or by phone at 1-800-447-7426
  • Optional course materials:
    1. Understanding Search Engines: Mathematical Modeling and Text Retrieval (chapters 1-3)
    2. Large-Scale, Component-Based Development (chapter 2)
    3. Databases and Transaction Processing: An Application-Oriented Approach (chapter 4)
    4. The Digital Economy Fact Book (chapter 5)
    There are two copies placed on reserve in the Hunt Library.
  • Optional: Advances in Information Retrieval
    Edited by Croft, Kluwer Academic Publishers, 2000
    [A more detailed state-of-the-art IR book]
  • Optional: Machine Learning
    by Tom M. Mitchell, WCB McGraw-Hill
    [Tools for text categorization and data mining]

Grading
  • 30% homework (2 programming assignments)
  • 30% mini-project (optional presentation for extra credit)
  • 15% midterm (closed book; calculator OK; no laptops; you may bring up to 5 pages of notes)
  • 25% final exam (closed book; no laptops; you may bring up to 10 pages of notes)

Last Modified: Friday, October 24, 2003