United States Mortality DataBase

Overview

Directly inspired by the Human Mortality Database (www.mortality.org), the United States Mortality DataBase (USMDB) contains original calculations of death rates and life tables for the United States resident population as well as for the 4 US Census regions, the 9 US Census divisions, the 50 states and the District of Columbia. The input data used to construct the mortality series are death counts and birth counts from the US vital statistics system, as well as census counts and population estimates from the US Census Bureau.

Scope and basic principles

We will update this collection to include more recent data as the input information becomes available. If we can secure additional resources, we also plan to expand the database to include cause-of-death information as well as county-level estimates.

The main goal of the United States Mortality DataBase is to monitor geographic disparities in mortality across the United States and to foster research into the causes and consequences of mortality inequalities. As much as possible, we have followed four guiding principles in creating this database: comparability, flexibility, accessibility, and reproducibility.

We have tried to provide complete documentation of the data available through this site. Users may start by reading a brief summary of how individual data sets are constructed. A complete description of our methodology is contained in the Human Mortality Database methods protocol version 5 (available on the HMD website, at http://v5.mortality.org/Public/Docs/MethodsProtocol.pdf). Where the HMD Methods Protocol has been adapted to the particular format of the United States mortality and population data, details are provided in the Background and Documentation File [direct link here].

All users are welcome to download and analyze any data provided here free of charge. However, to gain full access to the database, you must first become a registered user, which requires accepting our user agreement and answering a few questions. Once we receive this information, we will send you a link for direct access to all of the database series.

We are still actively developing this database. Although we have been very careful in assembling and manipulating the data presented here, it is possible that some errors remain, and we would appreciate your help in identifying any inaccuracies. If you have comments or questions, or trouble accessing the database, please write to usmdb@mortality.org.

Computing death rates and life tables

The USMDB death rates and life tables have been computed as in the Human Mortality Database, in six steps, one for each of six data types. Here is an overview of the process:

Births. Annual counts of live births by sex have been collected for each population for all years since 1941. These counts have been used mainly for making population estimates at younger ages.
Deaths. Death tabulations were built from the NCHS Mortality Files at the finest level of detail available and uniform methods were implemented to estimate death counts by completed age (i.e., age-last-birthday at time of death), calendar year of death, and calendar year of birth.
Population size. July 1st annual estimates of population size were obtained from the US Census Bureau for years since 1970 and inter-censal population estimates were constructed for years 1941-1969 from a combination of US Census data, birth and death counts.
Exposure-to-risk. The population exposed to the risk of death during each age-time interval was estimated from the annual population estimates, after correcting for the timing of deaths within each interval.
Death rates. Death rates were computed as the ratio of the death count for a given age-time interval to the estimate of the exposure-to-risk in the same interval.
Life tables. Probabilities of death were computed from death rates and used to construct all other life table functions, including life expectancies and other useful indicators of mortality and longevity.
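
As a rough illustration of the last two steps, the following Python sketch computes death rates from death counts and exposures and then derives a simplified life table. It is not the USMDB or HMD implementation: the function names are hypothetical, and it assumes single-year age intervals, deaths occurring on average halfway through each interval (a_x = 0.5), and a final open age interval closed with a_x = 1/m_x. The full methods protocol treats infant ages and the oldest ages with more care.

```python
from typing import Dict, List


def death_rates(deaths: List[float], exposures: List[float]) -> List[float]:
    """Central death rate m_x = D_x / E_x for each single-year age interval."""
    return [d / e for d, e in zip(deaths, exposures)]


def life_table(mx: List[float], radix: float = 100000.0) -> List[Dict[str, float]]:
    """Build a simplified life table from single-year death rates.

    Illustrative assumptions (not the full HMD protocol):
    - a_x = 0.5 in every closed interval, including age 0;
    - the last interval is open-ended, with q_x = 1 and a_x = 1 / m_x;
    - 0 < m_x < 2 in closed intervals and m_x > 0 in the open interval.
    """
    n = len(mx)
    ax = [0.5] * n
    # Convert death rates to probabilities of death.
    qx = [m / (1.0 + (1.0 - a) * m) for m, a in zip(mx, ax)]
    # Close the table: everyone alive at the last age dies in the open interval.
    qx[-1] = 1.0
    ax[-1] = 1.0 / mx[-1]

    # Survivors at exact age x, starting from the radix.
    lx = [radix]
    for q in qx[:-1]:
        lx.append(lx[-1] * (1.0 - q))

    # Life table deaths and person-years lived in each interval.
    dx = [l * q for l, q in zip(lx, qx)]
    Lx = [l - (1.0 - a) * d for l, a, d in zip(lx, ax, dx)]

    # Person-years remaining above age x (reverse cumulative sum of Lx).
    Tx = [0.0] * n
    running = 0.0
    for i in range(n - 1, -1, -1):
        running += Lx[i]
        Tx[i] = running

    return [
        {"age": float(i), "mx": mx[i], "qx": qx[i], "lx": lx[i],
         "dx": dx[i], "Lx": Lx[i], "ex": Tx[i] / lx[i]}
        for i in range(n)
    ]
```

Calling life_table(death_rates(deaths, exposures)) on sex-specific inputs would return one row per age, with life expectancy at each age in the "ex" field.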

Corrections to the data

The data presented here have been corrected for gross errors (e.g., a processing error whereby 3,800 becomes 38,000 in a published statistical table would be obvious in most cases, and it would be corrected). However, we have not attempted to correct the data for systematic age misstatement (misreporting of age) or coverage errors (over- or under-enumeration of people or events).

Some available studies assess the completeness of census coverage or death registration for various years in the United States, but more work is needed before we can use them to produce more accurate mortality estimates in a way that is consistent over the whole period since 1941.

Age misreporting

We know that the coverage of the US census and vital registration system is high, so fruitful analyses by both specialists and non-specialists should be possible with these data. Nevertheless, there is evidence of both age heaping (over-reporting of ages ending in "0" or "5") and age exaggeration in these data.

In general, the degree of age heaping varies over time and is more severe in some areas than in others, but it is rarely a serious obstacle to scientific analysis. In most cases, it is sufficient to analyze data in five-year age groups in order to avoid the false impressions created by this particular form of age misstatement.
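
For users working with single-year series, the grouping described above could be done along the following lines. This is only a sketch: the function name is hypothetical, and it assumes that the input lists are indexed by single year of age starting at age 0, so that groups run 0-4, 5-9, and so on. Rates for each group are obtained by summing deaths and exposures separately before dividing, rather than by averaging single-year rates.

```python
from typing import List, Tuple


def grouped_death_rates(
    deaths: List[float], exposures: List[float], width: int = 5
) -> List[Tuple[int, float]]:
    """Aggregate single-year deaths and exposures into age groups.

    Returns a list of (starting age of group, death rate for the group).
    Assumes index 0 corresponds to age 0 (an illustrative convention).
    """
    if len(deaths) != len(exposures):
        raise ValueError("deaths and exposures must have the same length")
    grouped = []
    for start in range(0, len(deaths), width):
        # Sum counts within the group first, then take the ratio.
        d = sum(deaths[start:start + width])
        e = sum(exposures[start:start + width])
        grouped.append((start, d / e))
    return grouped
```

A spike in reported deaths at an age ending in "0" or "5" is averaged with the neighboring ages in its group, which smooths much of the artificial jaggedness seen in single-year rates.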

Age exaggeration, on the other hand, is a more insidious problem. Our approach is guided by the conventional wisdom that age reporting in death registration systems is typically more reliable than in census counts or official population estimates. For this reason, we derive population estimates at older ages from the death counts themselves, by implementing extinct cohort methods. Such methods eliminate some, but certainly not all, of the biases in old-age mortality estimates due to age exaggeration.
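
The core idea of an extinct cohort method can be sketched in a few lines of Python. The example below is a simplified illustration, not the USMDB code: the function name and the layout of the input dictionary are hypothetical, and it assumes that the cohort has completely died out, that all of its deaths are recorded, and that migration at these ages is negligible. Under those assumptions, the size of a birth cohort on January 1 of a given year equals the sum of all deaths it experiences in that year and in later years.

```python
from typing import Dict, Tuple


def extinct_cohort_population(
    deaths: Dict[Tuple[int, int], float], cohort: int, year: int
) -> float:
    """Estimate the January 1 population of a birth cohort in a given year.

    `deaths` maps (year of birth, year of death) to a death count.
    The estimate is the sum of the cohort's deaths in `year` or later,
    which is valid only if the cohort is extinct and migration is ignored.
    """
    return sum(
        count
        for (birth_year, death_year), count in deaths.items()
        if birth_year == cohort and death_year >= year
    )
```

Because the estimate relies only on death records, it does not inherit age errors from census counts, although any age exaggeration in the death records themselves still carries through.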

We plan to continue exploring alternative approaches to further improve the estimation of old-age mortality in the USMDB.

Uniform set of procedures

A key goal of this project is to follow a uniform set of procedures for each population. This approach is intended to ensure that we have not introduced biases by our own manipulations. Our desire for uniformity had to face the challenge that raw data come in a variety of formats (for example, 1-year versus 5-year age groups). Our general approach to this problem is to use the available raw data first to estimate two quantities: 1) the number of deaths by completed age, year of birth, and year of death; and 2) population estimates by single years of age on January 1 of each year. For each population, these calculations are performed separately by sex. From these two pieces of information, we compute death rates and life tables in a variety of age-time configurations.

It is reasonable to ask whether a single procedure is the best method for treating the data over a long time period. The HMD methodology is based on procedures that were developed separately, though following similar principles, in various countries and by different researchers. Earlier methods were synthesized by choosing what was considered the best among alternative procedures and by eliminating superficial inconsistencies.

Although we adhere strictly to a uniform procedure, the data for each time period also receive significant individualized attention. Data have been checked against other available sources to ensure a high level of quality, but assistance from database users in identifying problems is always appreciated!