Statistics
24 Hillhouse, 432.0666
M.A., Ph.D.
Chair
Andrew Barron
Director of Graduate Studies
John Hartigan (Rm 207, 24 Hillhouse,
john.hartigan@yale.edu)
Professors
Donald Andrews (Economics), Andrew Barron, Joseph Chang, John
Hartigan, Theodore Holford (Epidemiology & Public Health;
Biostatistics), Peter Phillips (Economics), David Pollard,
Edward Tufte (Political Science; Computer Science)
Associate Professors
Heping Zhang (Epidemiology & Public Health; Biostatistics)
Assistant Professors
Hani Doss (Visiting), John Emerson, Hannes Leeb
Fields of Study
Fields comprise the main areas of statistical theory
(with emphasis on foundations, Bayes theory, decision theory,
nonparametric statistics), probability theory (stochastic
processes, asymptotics, weak convergence), information theory,
econometrics, classification, statistical computing, and graphical
methods.
Special Admissions Requirements
GRE scores for the General Test and for the Subject
Test in the area of the undergraduate major should accompany
an application. All applicants should have a strong mathematical
background, including advanced calculus, linear algebra, elementary
probability theory, and at least one course providing an introduction
to mathematical statistics. An undergraduate major may be
in statistics, mathematics, computer science, or in a subject
in which significant statistical problems may arise. For those
whose native language is not English, Test of English as a
Foreign Language (TOEFL) scores are required.
Special Requirements for the Ph.D. Degree
There is no foreign language requirement. Normally
during the first two years, fourteen term courses in this
and other departments are taken to prepare students for research
and practice of statistics. These include courses devoted
to case studies and practical work, for which students prepare
a written report and give an oral presentation. The qualifying
examination consists of three parts: a written report on an
analysis of a data set, a written examination on theoretical
statistics, and an oral examination. The examination is taken
no later than the date scheduled by the department in the middle
of the second year, with provision for one subsequent reexamination
of one or more parts in the event that a student does not
pass the first time. All parts of the qualifying examination
must be completed before the beginning of the third year.
A prospectus for the dissertation should be submitted no later
than the first week of March in the third year. The prospectus
must be accepted by the department before the end of the third
year if the student is to register for a fourth year. Upon
successful completion of the qualifying examination and the
prospectus (and meeting of Graduate School requirements), the
student is admitted to candidacy.
Master's Degrees
M.A. (en route to the Ph.D.). This degree may
be awarded upon completion of eight term courses and two terms
of residence.
Master's Degree Program. Students are also admitted
directly to a terminal master's degree program. To qualify
for the M.A., the student must successfully complete eight
term courses, chosen in consultation with the director of
graduate studies. Full-time students must take a minimum of
three courses per term. Part-time students are also accepted
into the master's degree program.
Program materials are available upon request to the Director of Graduate Studies,
Department of Statistics, Yale University, PO Box 208290, New Haven CT 06520-8290;
e-mail, susan.jackson-mack@yale.edu.
Courses
STAT 501–506, Introduction to Statistics.
A basic introduction to statistics, including numerical
and graphical summaries of data, probability, hypothesis testing,
confidence intervals, and regression. Each course focuses
on applications to a particular field of study and is taught
jointly by two instructors, one specializing in statistics
and the other in the relevant area of application. The Tuesday
lecture, which introduces general concepts and methods of
statistics, is attended by all students in STAT 501–506
together. The course separates for Thursday lectures (sections),
which develop the concepts with examples and applications.
Computers are used for data analysis. These courses are alternatives;
they do not form a sequence and only one may be taken for
credit.
STAT 501au, Introduction to Statistics: Life Sciences. John
Hartigan, Günter Wagner.
TTh 1–2.15
Statistical and probabilistic analysis of biological
problems presented with a unified foundation in basic statistical
theory. Problems are drawn from genetics, ecology, epidemiology,
and bioinformatics. Also E&EB 510au.
STAT 502au, Introduction to Statistics: Political Science. John
Hartigan, Rose Razaghian. TTh 1–2.15
Statistical analysis of politics and quantitative assessments
of public policies. Problems presented with reference to a
wide array of examples: public opinion, campaign finance,
racially motivated crime, and health policy.
STAT 503au, Introduction to Statistics: Social Sciences. John
Hartigan, Jonathan Reuning-Scherer. TTh 1–2.15
Introduction to probability and statistics with emphasis
on experimental design and data analysis. Survey of many of
the great experiments in social science. Topics include obedience
to authority, conformity to social pressure, and susceptibility
to perceptual distortions.
STAT 504au, Introduction to Statistics in Psychology. Jonna
Kwiatkowski, John Hartigan. TTh 1–2.15
Statistical and probabilistic analysis of psychological
problems presented with a unified foundation in basic statistical
theory. The problems are drawn from studies of sensory processing
and perceptions, development, learning, and psychopathology.
STAT 505au, Introduction to Statistics: Medicine. John
Hartigan, Marek Chawarski. TTh 1–2.15
Statistical methods relied upon in medicine and medical
research. Practice in reading medical literature competently
and critically, as well as practical experience performing
statistical analysis of medical data.
STAT 506au, Introduction to Statistics: Data Analysis. John
Hartigan and staff. TTh 1–2.15
An introduction to probability and statistics, with emphasis
on data analysis.
STAT 530bu, Introductory Data Analysis. David
Pollard. MW 2.30–3.45
Survey of statistical methods: plots, transformations,
regression, analysis of variance, clustering, principal components,
contingency tables, and time series analysis. S-PLUS and Web
data sources are used. After or concurrent with STAT 501a.
STAT 538au, Probability and Statistics for Scientists. Joseph
Chang. MWF 2.30–3.20
Fundamental principles and techniques that help scientists
think probabilistically, develop statistical models, and analyze
data. Essentials of probability: conditional probability,
random variables, distributions, law of large numbers, central
limit theorem, Markov chains. Statistical inference with emphasis
on the Bayesian approach: parameter estimation, likelihood,
prior and posterior distributions, Bayesian inference using
Markov chain Monte Carlo. Introduction to regression and linear
models. Computers are used throughout for calculations, simulations,
and analysis of data. After MATH 118a or b or 120a or b. Some
acquaintance with matrix algebra and computing assumed.
STAT 541au, Probability Theory. Hannes Leeb. MWF 9.30–10.20
A first course in probability theory: probability spaces,
random variables, expectations and probabilities, conditional
probability, independence, some discrete and continuous distributions,
central limit theorem, Markov chains, probabilistic modeling.
After or concurrent with MATH 120a or b or the equivalent.
STAT 542bu, Theory of Statistics. Hannes Leeb. MWF 9.30–10.20
Principles of statistical analysis: maximum likelihood,
sampling distributions, estimation; confidence intervals;
tests of significance; regression; analysis of variance; and
the method of least squares. Some statistical computing. After
STAT 541a and concurrently with or after MATH 222a or b or
225a or b or the equivalent.
STAT 551bu, Stochastic Processes. David Pollard. MW 1–2.15
Introduction to the study of random processes, including
Markov chains, Markov random fields, martingales, random walks,
Brownian motion, and diffusions. Techniques in probability
such as coupling and large deviations. Applications to image
reconstruction, Bayesian statistics, finance, probabilistic
analysis of algorithms, genetics, and evolution. After STAT
541a or the equivalent.
STAT 600bu, Advanced Probability. David Pollard. TTh 2.30–3.45
Measure theoretic probability, conditioning, laws of
large numbers, convergence in distribution, characteristic
functions, central limit theorems, martingales. Some knowledge
of real analysis is assumed.
STAT 603a, Stochastic Calculus. Joseph Chang.
HTBA
Martingales in discrete and continuous time, Brownian motion,
sample path properties, predictable processes, stochastic
integrals with respect to Brownian motion and semimartingales,
stochastic differential equations. Applications mostly to
counting processes and finance. Knowledge of measure-theoretic
probability at the level of STAT 600b is a prerequisite for
the course, although some key concepts, such as conditioning,
are reviewed. After STAT 600b.
STAT 606a, Markov Chain Monte Carlo. Hani
Doss. HTBA
Markov chain Monte Carlo is a simulation method for estimating
distributions and expectations that are analytically intractable.
This course discusses theory and applications of the method.
Topics include the Metropolis-Hastings algorithm and the Gibbs
sampler; applications in survival analysis, hierarchical models,
and nonparametric Bayes problems; convergence theorems; convergence
diagnostics.
STAT 610a, Statistical Inference. Andrew Barron.
HTBA
A systematic development of the mathematical theory of
statistical inference covering methods of estimation, hypothesis
testing, and confidence intervals. An introduction to statistical
decision theory. Undergraduate probability at the level of
STAT 541a assumed.
STAT 612au, Linear Models. Hani Doss. TTh 9–10.15
The geometry of least squares; distribution theory for
normal errors; regression, analysis of variance, and designed
experiments; numerical algorithms (with particular reference to
S-PLUS); alternatives to least squares. Generalized linear
models. Linear algebra and some acquaintance with statistics
assumed.
STAT 625a, Case Studies. John Emerson.
Thorough study of some large data sets on such topics
as second-hand smoking, crashes in small cars, reticulate
evolution, bloc voting, and Connecticut educational standards.
STAT 626b, Practical Work. John Emerson.
Individual one-term projects, with students working on
studies outside the department, under the guidance of a statistician.
STAT 627b, Statistical Consulting. John Emerson.
Statistical consulting and collaborative research projects
usually require statisticians to explore new topics outside
their area of expertise. This course exposes students to real
problems, requiring them to draw on their expertise in probability,
statistics, and data analysis. Students complete the course
with individual projects outside the department, under the
guidance of a statistician.
STAT 645b, Statistical Methods in Genetics and Bioinformatics. Hongyu
Zhao. HTBA
Stochastic modeling and statistical methods applied to
problems such as mapping quantitative trait loci, analyzing
gene expression data, sequence alignment, and reconstructing
evolutionary trees. Statistical methods include maximum likelihood,
Bayesian inference, Markov chain Monte Carlo, and some methods
of classification and clustering. Models introduced include
variance components, hidden Markov models, Bayesian networks,
and the coalescent. Recommended background: STAT 541a, STAT 542b.
Prior knowledge of biology is not required. Times to be arranged
at organizational meeting.
STAT 660b, Multivariate Statistical Methods for the Social
Sciences. Jonathan
Reuning-Scherer. HTBA
An introduction to the analysis of multivariate data.
Topics include principal components analysis, factor analysis,
cluster analysis (hierarchical clustering, k-means), discriminant
analysis, multidimensional scaling, and structural equations
modeling. Emphasis is placed on practical application of multivariate
techniques to a variety of examples in the social sciences.
Students complete extensive computer work using either SAS
or SPSS. Prerequisites: knowledge of basic inferential procedures,
experience with linear models (regression and ANOVA). Experience
with some statistical package and/or familiarity with matrix
notation is helpful but not required. Requirements: regular
assignments and a final project. Also F&ES 844b.
STAT 661bu, Data Analysis. John Hartigan. MW 2.30–3.45
Data sets are analyzed using the S-PLUS statistical computing
language to study a selection of statistical topics: linear
and nonlinear models, maximum likelihood, resampling methods,
curve estimation, model selection, classification, and clustering.
Weekly sessions are held in the Social Sciences Statistical
Laboratory. After STAT 542b and MATH 222a or b or 225a or
b or the equivalents.
STAT 664bu, Information Theory. Andrew Barron. TTh 9–10.15
Foundations of information theory in communications,
statistical inference, statistical mechanics, probability,
and algorithm complexity. Quantities of information and their
properties: entropy, conditional entropy, divergence, mutual
information, channel capacity. Basic theorems of data compression
and coding for noisy channels. Applications in statistics,
communication networks, and finance. After STAT 541a.
STAT 665bu, Data Mining and Machine Learning. Hannes
Leeb. MW 11.30–12.45
Techniques for data mining and machine learning are covered
from both a statistical and a computational perspective, including
support vector machines, bagging, boosting, neural networks,
and other nonlinear and nonparametric regression methods.
The course gives the basic ideas and intuition behind these
methods, a more formal understanding of how and why they work,
and opportunities to experiment with machine learning algorithms
and apply them to data. After STAT 542b.
STAT 674au, Analysis of Spatial and Time Series Data. Staff. TTh 1–2.15
Study of statistical models that are useful for describing
data collected over space or time. Models include frequency
domain and time domain analysis of time series; state space
models and Kalman filters; point processes; Gibbs processes
and random fields.
STAT 676a, Portfolio Estimation for Compounding Wealth. Andrew
Barron. HTBA
A study of distributional properties of compounded wealth
in repeated gambling and in stock market investment. Wealth
concentration inequalities. Strategies of highest concentrated
wealth. Normal theory for log-wealth. Relationship to maximum
likelihood theory in statistics and to the asymptotic equipartition
property in physics and information theory. Greedy strategies.
Universal portfolios and their relationship to Bayes methodology.
The ratio of idealized wealth (best with hindsight) to actual
wealth and the properties of this ratio, both its behavior for
stochastic stock price sequences and its minimax behavior for
arbitrary price sequences. Fast algorithms for universal portfolios.
Times to be arranged at organizational meeting.
STAT 683a, Asymptotics. John Hartigan.
HTBA
Consistency, normality, martingales, cumulants, Edgeworth
expansions. M-estimates. Matching Bayes and frequentist procedures.
Resampling. Asymptotic admissibility using elliptical partial
differential equations. After STAT 600b and 610a.
STAT 695a, Internship in Statistical Research. John
Hartigan.
The internship is designed to give students an opportunity
to gain practical exposure to problems in the analysis of
statistical data, as part of a research group within industries
such as medical and pharmaceutical research, finance, information
technologies, telecommunications, and public policy.
The internship experience often serves as a basis for the
Ph.D. dissertation. Students work with the director of graduate
studies and other faculty advisers to select suitable placements.
Students submit a one-page description of their internship
plans to the DGS by May 1; the plans are evaluated by the
DGS and other faculty advisers by May 15. Upon completion
of the internship, students submit a written report of their
work to the DGS no later than October 1. The internship is
graded on a Satisfactory/Unsatisfactory basis, based
on the student’s written report and an oral presentation.
This course is an elective requirement for the Ph.D. degree.
Prerequisite: completion of one semester of the Ph.D. program.
STAT 700, Departmental Seminar.
Important activity for all members of the department.
See weekly seminar announcements.