Statistical Modeling for Biomedical Researchers – online resources and class notes

Blog Post by William D. Dupont, PhD, Professor of Biostatistics and Preventive Medicine, Vanderbilt University School of Medicine, Nashville, Tennessee, USA

I teach a course in intermediate-level biostatistics as part of the Master of Public Health program at Vanderbilt University. This program is targeted at clinical fellows who are interested in academic careers in population-based medicine. Class notes for this course are posted online in both PDF and MS-PowerPoint formats. The course web site also contains the data files used in this course and log files illustrating the analyses performed in the lecture notes. These notes are based on my text, Statistical Modeling for Biomedical Researchers. The goal of both this text and these notes is to provide hands-on instruction in modern multivariable statistical analysis while using a minimum of mathematics. The text also has its own web page.

This text will enable biomedical researchers to use a number of advanced statistical methods that have proven valuable in medical research. The past thirty years have seen an explosive growth in the development of biostatistics. As with so many aspects of our world, this growth has been strongly influenced by the development of inexpensive, powerful computers and the sophisticated software that has been written to run them. This has allowed the development of computationally intensive methods that can effectively model complex biomedical data sets. It has also made it easy to explore these data sets, to discover how variables are interrelated and to select appropriate statistical models for analysis. Indeed, just as the microscope revealed new worlds to the eighteenth century, modern statistical software permits us to see interrelationships in large complex data sets that would have been missed in previous eras. Also, modern statistical software has made it vastly easier for investigators to perform their own statistical analyses. Although very sophisticated mathematics underlies modern statistics, it is not necessary to understand this mathematics to properly analyze your data with modern statistical software. What is necessary is to understand the assumptions required by each method, how to determine whether these assumptions are adequately met for your data, how to select the best model, and how to interpret the results of your analyses. The goal of this text is to allow investigators to effectively use some of the most valuable multivariate methods without requiring an understanding of more than high school algebra. Much mathematical detail is avoided by focusing on the use of a specific statistical software package.

This text grew out of the second-semester biostatistics course that I teach in our Master of Public Health program at the Vanderbilt University Medical School. All of the students take introductory courses in biostatistics and epidemiology before taking mine. Although this text is self-contained, I strongly recommend that readers acquire good introductory texts in biostatistics and epidemiology as companions to this one. Many excellent texts are available on these topics. At Vanderbilt we are currently using Katz (2006) for biostatistics and Gordis (2004) for epidemiology.

The statistical software used in this text is Stata, version 10 (2007). It was chosen for the breadth and depth of its statistical methods, for its ease of use, and for its excellent graphics and documentation. There are several other excellent packages available on the market. However, the aim of this text is to teach biostatistics through a specific software package, and length restrictions make it impractical to use more than one package. If you have not yet invested a lot of time learning a different package, Stata is an excellent choice for you to consider. If you are already attached to a different package, you may still find it easier to learn Stata than to master or teach the material covered here from other textbooks.

The topics covered in this text are linear regression, logistic regression, Poisson regression, survival analysis, and analysis of variance. Each topic is covered in two chapters: one introduces the topic with simple univariate examples and the other covers more complex multivariate models. The text makes extensive use of a number of real data sets. All of them may be downloaded from my web site, which also contains complete log files of all analyses discussed in this text. Both the class notes and the text web site have been updated to take advantage of new features of Stata, version 11. The notes also contain extensive screenshots that explain how to execute Stata commands using Stata’s point-and-click interface.

Statistical Modeling for Biomedical Researchers: A Simple Introduction to the Analysis of Complex Data, 2nd Edition, by William D. Dupont, is published by Cambridge University Press.

