WHEN & WHERE
Spring 2007, 3:00-4:15pm M W, Room 325, Harrelson Hall
DESCRIPTION
The focus is on the mathematical technology involved in building various kinds of
information retrieval systems and search engines. The material will include classical
methods such as latent semantic indexing and various document clustering schemes, and
will proceed through more recent techniques based on nonnegative matrix factorizations.
A primary facet involves studying web search, with particular emphasis on the Google
technology. It is hoped that upon successful completion of the course students will
have amassed enough knowledge to build their own search engines. In addition to the
text, Google's PageRank and Beyond: The Science of Search Engine Rankings, material
will be drawn from a variety of recent research papers. The class will be conducted
"seminar style," with students giving presentations throughout the term.
GRADING
Grades are determined on the basis of projects and presentations.
PRESENTATIONS
Shaina Race will speak March 19 on the paper by Patrik O. Hoyer titled
"Non-negative Matrix Factorization with Sparseness Constraints."
Anjela Govan will speak March 21 on material from the book "Algorithms for Clustering
Data" by Anil K. Jain and Richard C. Dubes.
Teresa Selee will speak March 26 on the paper by Desmond J. Higham and Milla Kibble titled
"A Unified View of Spectral Clustering."
Hui Xie will speak March 28 on the paper by Litvak titled
"The Effect of New Links on Google PageRank."
Rebecca Wills will speak April 2 on the paper by Gyongyi, Garcia-Molina, and
Pedersen titled "Combating Web Spam with TrustRank."
Russell Albright and Jim Cox from SAS will speak on April 9 (during the SAS field trip
described below). Russ will speak on predictive modeling concepts and algorithms, and
Jim will address issues in text mining.
Barbara Ball from the College of Charleston will speak April 12 on "Recent Developments
in Clustering."
Mike Marshall from Fortune Interactive will speak April 16 on Search Engine Optimization.
SAS INSTITUTE FIELD TRIP
The class is scheduled to take a field trip to SAS Institute on Monday, April 9, where
students will be given a presentation and demo of SAS technology involving data and
text mining. Below are directions and times.
TEXTBOOK
Google's PageRank and Beyond: The Science of Search Engine Rankings,
by Amy Langville and Carl Meyer
PREREQUISITES
Probability at the MA421 level and Linear Algebra at the MA523 or MA520 level
SCHEDULE
FIRST CLASS: January 10 (Wednesday)
HOLIDAY: (MLK Day) January 15 (Monday)
SPRING BREAK: March 5-9 (Mon-Fri)
LAST DAY TO DROP: March 21 (Wednesday)
HOLIDAY: (Easter Vacation) April 6 (Friday)
LAST CLASS: All projects due on April 27 (Friday)
FINAL EXAM: There is no written final exam.
OUTLINE OF THE WEB SEARCH COMPONENT
Introduction to Web Search Engines
Crawling, Indexing, and Query Processing
Ranking Webpages by Popularity
The Mathematics of Google's PageRank
Parameters in the PageRank Model
The Sensitivity of PageRank
The PageRank Problem as a Linear System
Issues in Large-Scale Implementation of PageRank
Accelerating the Computation of PageRank
Updating the PageRank Vector
The HITS Method for Ranking Webpages
Other Link Methods for Ranking Webpages
The Future of Web Information Retrieval
REFERENCE BOOKS
Matrix Analysis and Applied Linear Algebra by C. D. Meyer
Understanding Search Engines: Mathematical Modeling and Text Retrieval, Second Edition,
by Michael Berry and Murray Browne