Using a National Database
Leading Author(s): Yolanda Hagar
Helpful Tips, Hints and Things to Consider
Using a national database can be very advantageous. The primary advantage is that there is no data collection cost. Additionally, good national data sets tend to be of high quality because they have been planned carefully by many experts. Finally, these data sets are designed so that the individuals sampled represent the entire population, making the results applicable to the entire United States or other group of interest.
However, using a national database also poses a few challenges, and these must be considered before deciding whether the data set will work for you. Below is a list of questions and topics to consider before beginning any analysis:
Availability and General Documentation for the Data Set
National databases are generally very large and complex, and there are many aspects to consider before using one. The first step is to be sure that your data set has useful documentation available to guide you through the rest of the process. Good documentation is one of the most important aids to understanding your data set and using it correctly.
1. Does the data set have an associated handbook or online tutorial? How do you obtain and read the handbook, and who created it?
Examples of helpful handbooks:
2. How thorough is the website? Does it clearly present the information you need and answer questions you have?
Answering Your Question of Interest
Before you can be sure that the data set is right for you, you need to be sure that you can answer your question(s) of interest. Some things to consider:
- Are there any biases in the way the data were obtained that bear on the study question? (You need to know the sampling methods, how missing observations were handled, what the dropout rate is, etc.)
- Are the outcomes of interest included in the data set?
- Are the covariates of interest included in the data set?
- Are there enough people of interest available in the data set?
This information should be available in the handbook and online. Examples include:
Supplemental information (types of variables available, number of people, etc.) may be available in handbooks. If you cannot readily find this kind of information, the data set may be too difficult to work with.
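As a rough illustration of the checks listed in this section, the sketch below scans a small data extract for the variables a question requires, counts missing values, and counts the people in a subgroup of interest. The column names (`age`, `sbp`, `diabetes`, `income`) and the inline sample rows are hypothetical stand-ins, not variables from any real national data set; substitute the names given in your data set's handbook.

```python
import csv
import io

# Hypothetical extract for illustration only; a real file would be opened
# with open(path, newline="") instead of io.StringIO.
SAMPLE = """id,age,sbp,diabetes
1,34,118,0
2,61,,1
3,47,132,0
4,70,141,1
5,29,,0
"""

def screen_extract(handle, needed, subgroup):
    """Report absent variables, missing-value counts, and subgroup size.

    needed   -- variable names the research question requires
    subgroup -- function taking a row dict, returning True for people of interest
    """
    reader = csv.DictReader(handle)
    present = reader.fieldnames or []
    absent = [name for name in needed if name not in present]
    checked = [name for name in needed if name in present]
    missing = {name: 0 for name in checked}
    n_rows = 0
    n_subgroup = 0
    for row in reader:
        n_rows += 1
        for name in checked:
            # Treat empty or absent cells as missing observations.
            if row[name] is None or row[name].strip() == "":
                missing[name] += 1
        if subgroup(row):
            n_subgroup += 1
    return {"rows": n_rows, "absent": absent,
            "missing": missing, "subgroup": n_subgroup}

report = screen_extract(
    io.StringIO(SAMPLE),
    needed=["sbp", "diabetes", "income"],
    subgroup=lambda row: row["diabetes"] == "1",
)
print(report)
```

A report like this only tells you what is in the file. Whether the remaining sample is large enough, and whether the missing values introduce bias, still has to be judged against the documentation and your study question.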
Obtaining and Storing the Data Set
Once you have decided that this data set will work and it has the proper documentation, the next step is to consider how you will obtain and store the data set.
Learning to Use the Data Set
Once you have obtained and stored the data set, you will need to prepare your data set for analysis. Do you know what questions to ask yourself in advance? Do you know what is involved in the preparation of the data set you need to answer your question(s) of interest?
Important Estimation Issues to Consider
Because national data sets are often very large and have a complex design, the estimation procedures can differ from what you have done previously. It is extremely important to find out which procedures are different.
- What is the appropriate software for this analysis? Will my current software work or do I need to learn something new?
- How do I read in the data? Do I need to know a lot of code to get the data into the software I want to use? Can I easily import the data? An example of this type of information can be found at: http://www.cdc.gov/nchs/tutorials/Nhanes/Preparing/Download/intro.htm
- What kind of study design was used to obtain the data? Do I need to account for weighting?
- What code should be used, and how does this code vary among different types of software? This information may be available in the handbook or online tutorial, and it is important to consider.
- Are there special scripts/macros/functions that must be used to work with the data set? Who provides them? Are there any public forums on which people have posted this kind of information?
- Have I taken into consideration all the proper statistical analyses? Do I need to consult a professional for certain parts of my project?
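To see why the weighting question above matters, the following minimal sketch compares an unweighted mean with a weighted one. The values and weights are invented for illustration; real sampling weights come from the data set's own documentation. The sketch also covers only the point estimate: standard errors under a complex design (strata, clusters) need survey-specific software or a statistician's input, which is not attempted here.

```python
def weighted_mean(values, weights):
    """Point estimate of a population mean using sampling weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total_weight

# Hypothetical data: the last two respondents come from an oversampled group,
# so each of them represents fewer people in the population (smaller weight).
values = [120.0, 130.0, 150.0, 160.0]
weights = [3000.0, 3000.0, 500.0, 500.0]

unweighted = sum(values) / len(values)
weighted = weighted_mean(values, weights)
print(f"unweighted mean: {unweighted:.1f}")
print(f"weighted mean:   {weighted:.1f}")
```

In this made-up example the unweighted mean is pulled toward the oversampled group, while the weighted mean gives each respondent influence in proportion to the number of people they represent.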