Graduate School of Arts and Sciences Bulletin of Yale University
 
   

Statistics

24 Hillhouse, 432.0666
M.A., Ph.D.

Chair
Andrew Barron

Director of Graduate Studies
John Hartigan (Rm 207, 24 Hillhouse, john.hartigan@yale.edu)

Professors
Donald Andrews (Economics), Andrew Barron, Joseph Chang, John Hartigan, Theodore Holford (Epidemiology & Public Health; Biostatistics), Peter Phillips (Economics), David Pollard, Edward Tufte (Political Science; Computer Science)

Associate Professors
Heping Zhang (Epidemiology & Public Health; Biostatistics)

Assistant Professors
Hani Doss (Visiting), John Emerson, Hannes Leeb

Fields of Study
Fields comprise the main areas of statistical theory (with emphasis on foundations, Bayes theory, decision theory, nonparametric statistics), probability theory (stochastic processes, asymptotics, weak convergence), information theory, econometrics, classification, statistical computing, and graphical methods.

Special Admissions Requirements
GRE scores for the General Test and for the Subject Test in the area of the undergraduate major should accompany an application. All applicants should have a strong mathematical background, including advanced calculus, linear algebra, elementary probability theory, and at least one course providing an introduction to mathematical statistics. An undergraduate major may be in statistics, mathematics, computer science, or in a subject in which significant statistical problems may arise. For those whose native language is not English, Test of English as a Foreign Language (TOEFL) scores are required.

Special Requirements for the Ph.D. Degree
There is no foreign language requirement. Normally during the first two years, fourteen term courses in this and other departments are taken to prepare students for research and practice of statistics. These include courses devoted to case studies and practical work, for which students prepare a written report and give an oral presentation. The qualifying examination consists of three parts: a written report on an analysis of a data set, a written examination on theoretical statistics, and an oral examination. The examination is taken no later than the date scheduled by the department in the middle of the second year, with provision for one subsequent reexamination of one or more parts in the event that a student does not pass the first time. All parts of the qualifying examination must be completed before the beginning of the third year. A prospectus for the dissertation should be submitted no later than the first week of March in the third year. The prospectus must be accepted by the department before the end of the third year if the student is to register for a fourth year. Upon successful completion of the qualifying examination and the prospectus (and meeting of Graduate School requirements), the student is admitted to candidacy.

Master's Degrees
M.A. (en route to the Ph.D.). This degree may be awarded upon completion of eight term courses and two terms of residence.

Master's Degree Program. Students are also admitted directly to a terminal master's degree program. To qualify for the M.A., the student must successfully complete eight term courses, chosen in consultation with the director of graduate studies. Full-time students must take a minimum of three courses per term. Part-time students are also accepted into the master's degree program.

Program materials are available upon request to the Director of Graduate Studies, Department of Statistics, Yale University, PO Box 208290, New Haven CT 06520-8290; e-mail, susan.jackson-mack@yale.edu.

Courses
STAT 501–506, Introduction to Statistics.
A basic introduction to statistics, including numerical and graphical summaries of data, probability, hypothesis testing, confidence intervals, and regression. Each course focuses on applications to a particular field of study and is taught jointly by two instructors, one specializing in statistics and the other in the relevant area of application. The Tuesday lecture, which introduces general concepts and methods of statistics, is attended by all students in STAT 501–506 together. The course separates for Thursday lectures (sections), which develop the concepts with examples and applications. Computers are used for data analysis. These courses are alternatives; they do not form a sequence and only one may be taken for credit.

STAT 501au, Introduction to Statistics: Life Sciences.  John Hartigan, Günter Wagner. TTh 1–2.15
Statistical and probabilistic analysis of biological problems presented with a unified foundation in basic statistical theory. Problems are drawn from genetics, ecology, epidemiology, and bioinformatics. Also E&EB 510au.

STAT 502au, Introduction to Statistics: Political Science.  John Hartigan, Rose Razaghian. TTh 1–2.15
Statistical analysis of politics and quantitative assessments of public policies. Problems presented with reference to a wide array of examples: public opinion, campaign finance, racially motivated crime, and health policy.

STAT 503au, Introduction to Statistics: Social Sciences.  John Hartigan, Jonathan Reuning-Scherer. TTh 1–2.15
Introduction to probability and statistics with emphasis on experimental design and data analysis. Survey of many of the great experiments in social science. Topics include obedience to authority, conformity to social pressure, and susceptibility to perceptual distortions.

STAT 504au, Introduction to Statistics in Psychology.  Jonna Kwiatkowski, John Hartigan. TTh 1–2.15
Statistical and probabilistic analysis of psychological problems presented with a unified foundation in basic statistical theory. The problems are drawn from studies of sensory processing and perceptions, development, learning, and psychopathology.

STAT 505au, Introduction to Statistics: Medicine.  John Hartigan, Marek Chawarski. TTh 1–2.15
Statistical methods relied upon in medicine and medical research. Practice in reading medical literature competently and critically, as well as practical experience performing statistical analysis of medical data.

STAT 506au, Introduction to Statistics: Data Analysis.  John Hartigan and staff. TTh 1–2.15
An introduction to probability and statistics, with emphasis on data analysis.

STAT 530bu, Introductory Data Analysis.  David Pollard. MW 2.30–3.45
Survey of statistical methods: plots, transformations, regression, analysis of variance, clustering, principal components, contingency tables, and time series analysis. S-PLUS and Web data sources are used. After or concurrent with STAT 501a.

STAT 538au, Probability and Statistics for Scientists.  Joseph Chang. MWF 2.30–3.20
Fundamental principles and techniques that help scientists think probabilistically, develop statistical models, and analyze data. Essentials of probability: conditional probability, random variables, distributions, law of large numbers, central limit theorem, Markov chains. Statistical inference with emphasis on the Bayesian approach: parameter estimation, likelihood, prior and posterior distributions, Bayesian inference using Markov chain Monte Carlo. Introduction to regression and linear models. Computers are used throughout for calculations, simulations, and analysis of data. After MATH 118a or b or 120a or b. Some acquaintance with matrix algebra and computing assumed.

STAT 541au, Probability Theory.  Hannes Leeb. MWF 9.30–10.20
A first course in probability theory: probability spaces, random variables, expectations and probabilities, conditional probability, independence, some discrete and continuous distributions, central limit theorem, Markov chains, probabilistic modeling. After or concurrent with MATH 120a or b or the equivalent.

STAT 542bu, Theory of Statistics.  Hannes Leeb. MWF 9.30–10.20
Principles of statistical analysis: maximum likelihood, sampling distributions, estimation; confidence intervals; tests of significance; regression; analysis of variance; and the method of least squares. Some statistical computing. After STAT 541a and concurrently with or after MATH 222a or b or 225a or b or the equivalent.

STAT 551bu, Stochastic Processes.  David Pollard. MW 1–2.15
Introduction to the study of random processes, including Markov chains, Markov random fields, martingales, random walks, Brownian motion, and diffusions. Techniques in probability such as coupling and large deviations. Applications to image reconstruction, Bayesian statistics, finance, probabilistic analysis of algorithms, genetics, and evolution. After STAT 541a or the equivalent.

STAT 600bu, Advanced Probability.  David Pollard. TTh 2.30–3.45
Measure theoretic probability, conditioning, laws of large numbers, convergence in distribution, characteristic functions, central limit theorems, martingales. Some knowledge of real analysis is assumed.

STAT 603a, Stochastic Calculus.  Joseph Chang. HTBA
Martingales in discrete and continuous time, Brownian motion, sample path properties, predictable processes, stochastic integrals with respect to Brownian motion and semimartingales, stochastic differential equations. Applications mostly to counting processes and finance. Knowledge of measure-theoretic probability at the level of STAT 600b is a prerequisite for the course, although some key concepts, such as conditioning, are reviewed. After STAT 600b.

STAT 606a, Markov Chain Monte Carlo.  Hani Doss. HTBA
Markov chain Monte Carlo is a simulation method for estimating distributions and expectations that are analytically intractable. This course discusses theory and applications of the method. Topics include the Metropolis-Hastings algorithm and the Gibbs sampler; applications in survival analysis, hierarchical models, and nonparametric Bayes problems; convergence theorems; convergence diagnostics.

STAT 610a, Statistical Inference.  Andrew Barron. HTBA
A systematic development of the mathematical theory of statistical inference covering methods of estimation, hypothesis testing, and confidence intervals. An introduction to statistical decision theory. Undergraduate probability at the level of STAT 541a assumed.

STAT 612au, Linear Models.  Hani Doss. TTh 9–10.15
The geometry of least squares; distribution theory for normal errors; regression, analysis of variance, and designed experiments; numerical algorithms (with particular reference to S-plus); alternatives to least squares. Generalized linear models. Linear algebra and some acquaintance with statistics assumed.

STAT 625a, Case Studies.  John Emerson.
Thorough study of some large data sets on such topics as second-hand smoking, crashes in small cars, reticulate evolution, bloc voting, and Connecticut educational standards.

STAT 626b, Practical Work.  John Emerson.
Individual one-term projects, with students working on studies outside the department, under the guidance of a statistician.

STAT 627b, Statistical Consulting.  John Emerson.
Statistical consulting and collaborative research projects usually require statisticians to explore new topics outside their area of expertise. This course exposes students to real problems, requiring them to draw on their expertise in probability, statistics, and data analysis. Students complete the course with individual projects outside the department, under the guidance of a statistician.

STAT 645b, Statistical Methods in Genetics and Bioinformatics.  Hongyu Zhao. HTBA
Stochastic modeling and statistical methods applied to problems such as mapping quantitative trait loci, analyzing gene expression data, sequence alignment, and reconstructing evolutionary trees. Statistical methods include maximum likelihood, Bayesian inference, Markov chain Monte Carlo, and some methods of classification and clustering. Models introduced include variance components, hidden Markov models, Bayesian networks, and coalescent. Recommended background: STAT 541a, STAT 542b. Prior knowledge of biology is not required. Times to be arranged at organizational meeting.

STAT 660b, Multivariate Statistical Methods for the Social Sciences. Jonathan Reuning-Scherer. HTBA
An introduction to the analysis of multivariate data. Topics include principal components analysis, factor analysis, cluster analysis (hierarchical clustering, k-means), discriminant analysis, multidimensional scaling, and structural equations modeling. Emphasis is placed on practical application of multivariate techniques to a variety of examples in the social sciences. Students complete extensive computer work using either SAS or SPSS. Prerequisites: knowledge of basic inferential procedures, experience with linear models (regression and ANOVA). Experience with some statistical package and/or familiarity with matrix notation is helpful but not required. Requirements: regular assignments and a final project. Also F&ES 844b.

STAT 661bu, Data Analysis.  John Hartigan. MW 2.30–3.45
A selection of statistical topics is studied by analyzing data sets using the S-plus statistical computing language: linear and nonlinear models, maximum likelihood, resampling methods, curve estimation, model selection, classification, and clustering. Weekly sessions are held in the Social Sciences Statistical Laboratory. After STAT 542a and MATH 222a or b or 225a or b or the equivalents.

STAT 664bu, Information Theory.  Andrew Barron. TTh 9–10.15
Foundations of information theory in communications, statistical inference, statistical mechanics, probability, and algorithm complexity. Quantities of information and their properties: entropy, conditional entropy, divergence, mutual information, channel capacity. Basic theorems of data compression and coding for noisy channels. Applications in statistics, communication networks, and finance. After STAT 541a.

STAT 665bu, Data Mining and Machine Learning.  Hannes Leeb. MW 11.30–12.45
Techniques for data mining and machine learning are covered from both a statistical and a computational perspective, including support vector machines, bagging, boosting, neural networks, and other nonlinear and nonparametric regression methods. The course gives the basic ideas and intuition behind these methods, a more formal understanding of how and why they work, and opportunities to experiment with machine learning algorithms and apply them to data. After STAT 542b.

STAT 674au, Analysis of Spatial and Time Series Data.  Staff. TTh 1–2.15
Study of statistical models that are useful for describing data collected over space or time. Models include frequency domain and time domain analysis of time series; state space models and Kalman filters; point processes; Gibbs processes and random fields.

STAT 676a, Portfolio Estimation for Compounding Wealth.  Andrew Barron. HTBA
A study of distributional properties of compounded wealth in repeated gambling and in stock market investment. Wealth concentration inequalities. Strategies of highest concentrated wealth. Normal theory for log-wealth. Relationship to maximum likelihood theory in statistics and to the asymptotic equipartition property in physics and information theory. Greedy strategies. Universal portfolios and their relationship to Bayes methodology. The ratio of idealized wealth (best with hindsight) to actual wealth and the properties of this ratio, both for stochastic stock price sequences and its minimax behavior for arbitrary price sequences. Fast algorithms for universal portfolios. Times to be arranged at organizational meeting.

STAT 683a, Asymptotics.  John Hartigan. HTBA
Consistency, normality, martingales, cumulants, Edgeworth expansions. M-estimates. Matching Bayes and frequentist procedures. Resampling. Asymptotic admissibility using elliptical partial differential equations. After STAT 600a and 610b.

STAT 695a, Internship in Statistical Research.  John Hartigan.
The internship is designed to give students an opportunity to gain practical exposure to problems in the analysis of statistical data, as part of a research group within industries such as medical and pharmaceutical research, finance, information technologies, telecommunications, and public policy. The internship experience often serves as a basis for the Ph.D. dissertation. Students work with the director of graduate studies and other faculty advisers to select suitable placements. Students submit a one-page description of their internship plans to the DGS by May 1; the plans are evaluated by the DGS and other faculty advisers by May 15. Upon completion of the internship, students submit a written report of their work to the DGS no later than October 1. The internship is graded on a Satisfactory/Unsatisfactory basis, with the grade based on the student’s written report and an oral presentation. This course is an elective requirement for the Ph.D. degree. Prerequisite: completion of one semester of the Ph.D. program.

STAT 700, Departmental Seminar.
Important activity for all members of the department. See weekly seminar announcements.
