Data Sets

All health outcome data in our project came from the 500 Cities Dataset. In addition, we worked with two demographic datasets and several environmental datasets to measure exposure to natural disasters across the United States. Each dataset had its own strengths and limitations when combined with the 500 Cities data. You can read more about our data exploration below and in the Project Diaries.

  • 500 Cities Dataset

    Source:

    500 Cities Dataset

    Description: 

    All health outcome data we included in our project were from the 500 Cities Dataset. 

    Dates: 

    2016, 2015 model-based estimates

    Variables used:

    place2010, tract2010, st, placename, plctract10, plctrpop10, stateabbr, population, access2_cr, access2__1, arthritis_, arthritis1, binge_crud, binge_cr_1, bphigh_cru, bphigh_c_1, bpmed_crud, bpmed_cr_1, cancer_cru, cancer_c_1, casthma_cr, casthma__1, chd_crudep, chd_crude9, checkup_cr, checkup__1, cholscreen, cholscre_1, colon_scre, colon_sc_1, copd_crude, copd_cru_1, corem_crud, corem_cr_1, corew_crud, corew_cr_1, csmoking_c, csmoking_1, dental_cru, dental_c_1, diabetes_c, diabetes_1, highchol_c, highchol_1, kidney_cru, kidney_c_1, lpa_crudep, lpa_crude9, mammouse_c, mammouse_1, mhlth_crud, mhlth_cr_1, obesity_cr, obesity__1, paptest_cr, paptest__1, phlth_crud, phlth_cr_1, sleep_crud, sleep_cr_1, stroke_cru, stroke_c_1, teethlost_, teethlost1, geolocatio, lat, long

    Our team found it helpful to create a working 500 Cities data codebook, which can be downloaded from the following link: ICHS 500 Cities Data Codebook

  • US Census Dataset 

    Source: 

    NHGIS Data Finder

    Description:

    The US decennial census collects data to inform federal funding distribution. The US Census dataset includes age, race, counts of households, and the relationships of the people within them.

    Dates: 

    2010

    Variables used:

    dp0010001, dp0010002, dp0010003, dp0010004, dp0010005, dp0010006, dp0010007, dp0010008, dp0010009, dp0010010, dp0010011, dp0010012, dp0010013, dp0010014, dp0010015, dp0010016, dp0010017, dp0010018, dp0010019, dp0010020, dp0010021, dp0010022, dp0010023, dp0010024, dp0010025, dp0010026, dp0010027, dp0010028, dp0010029, dp0010030, dp0010031, dp0010032, dp0010033, dp0010034, dp0010035, dp0010036, dp0010037, dp0010038, dp0010039, dp0010040, dp0010041, dp0010042, dp0010043, dp0010044, dp0010045, dp0010046, dp0010047, dp0010048, dp0010049, dp0010050, dp0010051, dp0010052, dp0010053, dp0010054, dp0010055, dp0010056, dp0010057, dp0020001, dp0020002, dp0020003, dp0030001, dp0030002, dp0030003, dp0040001, dp0040002, dp0040003, dp0050001, dp0050002, dp0050003, dp0060001, dp0060002, dp0060003, dp0070001, dp0070002, dp0070003, dp0080001, dp0080002, dp0080003, dp0080004, dp0080005, dp0080006, dp0080007, dp0080008, dp0080009, dp0080010, dp0080011, dp0080012, dp0080013, dp0080014, dp0080015, dp0080016, dp0080017, dp0080018, dp0080019, dp0080020, dp0080021, dp0080022, dp0080023, dp0080024, dp0090001, dp0090002, dp0090003, dp0090004, dp0090005, dp0090006, dp0100001, dp0100002, dp0100003, dp0100004, dp0100005, dp0100006, dp0100007, dp0110001, dp0110002, dp0110003, dp0110004, dp0110005, dp0110006, dp0110007, dp0110008, dp0110009, dp0110010, dp0110011, dp0110012, dp0110013, dp0110014, dp0110015, dp0110016, dp0110017, dp0120001, dp0120002, dp0120003, dp0120004, dp0120005, dp0120006, dp0120007, dp0120008, dp0120009, dp0120010, dp0120011, dp0120012, dp0120013, dp0120014, dp0120015, dp0120016, dp0120017, dp0120018, dp0120019, dp0120020, dp0130001, dp0130002, dp0130003, dp0130004, dp0130005, dp0130006, dp0130007, dp0130008, dp0130009, dp0130010, dp0130011, dp0130012, dp0130013, dp0130014, dp0130015, dp0140001, dp0150001, dp0160001, dp0170001, dp0180001, dp0180002, dp0180003, dp0180004, dp0180005, dp0180006, dp0180007, dp0180008, dp0180009, dp0190001, dp0200001, dp0210001, dp0210002, 
    dp0210003, dp0220001, dp0220002, dp0230001, dp0230002

  • University of South Carolina (UofSC) Hazards & Vulnerability Research Institute Social Vulnerability Index

    Source: 

    SoVI® for the United States

    Description: 

    The Social Vulnerability Index (SoVI®) 2010-14 synthesizes 29 socioeconomic variables which contribute to vulnerability to environmental hazards.

    Dates: 

    2010-2014

    Variables used:

    SOVI0610, SoVI0610_5CL, SoVI0610_3CL

  • Small Business Administration (SBA) Disaster Loan Database

    Source: 

    SBA Disaster Loan Database

    Description: 

    The SBA Disaster Loan Database provides verified residential losses by zip code. 

    Dates:

    10/01/2000 - 09/30/2015

    Variables used:

    Damaged Property Zip Code, Total Verified Loss

    Procedure:

    The SBA data were concatenated into one file. Four rows with NaN or 99999 values for zip code were removed (zero-based row indices 40817, 41890, 39428, and 39103). The SBA dataset was then merged with the HUD zip code to census tract dataset on zip code, and the merged file was grouped by TRACT and the values summed. The grouped dataset was then merged with the v9 project data, matching ‘tract2010’ (v9) to ‘TRACT’ (grouped dataset). This process, except for the concatenation, was repeated for each individual year.
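
    The steps above can be sketched in pandas as follows. This is a minimal illustration rather than the project's actual script: the function name, the `how` arguments, and the zip-code normalization are assumptions, and only the column names ‘Damaged Property Zip Code’, ‘Total Verified Loss’, ‘ZIP’, ‘TRACT’, and ‘tract2010’ come from the text. Rows are filtered by value here instead of by the row indices listed above.

```python
import pandas as pd


def allocate_losses_to_tracts(sba_frames, crosswalk, project):
    """Sum SBA verified losses per census tract and attach them to project data.

    Hypothetical helper: the name and argument layout are illustrative only.
    """
    # Concatenate the separate SBA tables into one.
    sba = pd.concat(sba_frames, ignore_index=True)

    # Drop rows whose zip code is missing or is the 99999 placeholder.
    zips = pd.to_numeric(sba["Damaged Property Zip Code"], errors="coerce")
    keep = zips.notna() & (zips != 99999)
    sba = sba.loc[keep].copy()
    # Assumption: zip codes are normalized to 5-character strings before merging.
    sba["ZIP"] = zips[keep].astype(int).astype(str).str.zfill(5)

    # Merge on zip code; a zip that spans several tracts yields one row per tract.
    merged = sba.merge(crosswalk[["ZIP", "TRACT"]], on="ZIP", how="inner")

    # Group by tract and sum the verified losses.
    by_tract = merged.groupby("TRACT", as_index=False)["Total Verified Loss"].sum()

    # Attach to the project data: 'tract2010' on the left, 'TRACT' on the right.
    return project.merge(by_tract, left_on="tract2010", right_on="TRACT", how="left")
```

    Note that a plain merge like this attributes a zip code's full loss to every tract it overlaps; the description above does not say whether any weighting was applied, so none is shown.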

  • Housing and Urban Development (HUD) Zip Code Dataset

    Source: 

    HUD USPS Zip Code Crosswalk Files 

    Description: 

    The HUD USPS Zip Code Crosswalk Files allocate zip codes to census tracts. 

    Dates:

    4th Quarter 2015

    Variables used:

    TRACT, ZIP

    Procedure:

    Because zip codes and census tracts have different geometries, a zip code can be allocated to multiple census tracts. This dataset was used to allocate SBA data to specific census tracts. See the SBA dataset above for details.
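
    To see how often one zip code maps to several tracts, a quick check along these lines may help. It is only a sketch: the function name is hypothetical and the ZIP and TRACT values in the sample table are made up for illustration.

```python
import pandas as pd


def tracts_per_zip(crosswalk):
    """Count the distinct census tracts associated with each zip code."""
    return crosswalk.groupby("ZIP")["TRACT"].nunique()


# Made-up sample rows in the shape of the crosswalk (ZIP, TRACT).
crosswalk = pd.DataFrame({
    "ZIP": ["00601", "00601", "00602"],
    "TRACT": ["72001956700", "72001956800", "72003959500"],
})

counts = tracts_per_zip(crosswalk)
# Zip codes that fall in more than one tract.
multi = counts[counts > 1]
```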

  • Federal Emergency Management Agency (FEMA) National Flood Hazard Layer

    Source:

    National Flood Hazard Layer

    Description: 

    The National Flood Hazard Layer contains flood hazard mapping data from FEMA’s National Flood Insurance Program (NFIP).

    Dates: 

    Flood data are updated on a continuous basis, zone by zone.

    Variables used:

    FLD_ZONE, SHAPE_Area, plctract10

    Procedure:

    The FEMA Special Flood Hazard Area (SFHA) shapefiles were grouped in a geodatabase at the state level. The analysis was initially run at state resolution to cut down on processing time and memory use. The SFHA shapefiles are multipart shapefiles whose shapes describe the different flood zones (A, AE, AO, V, VE, X, etc.). To identify the flood zones within each tract, the 500 Cities tract shapefile was split into individual tracts, and each tract was used to 'clip' the flood data from the SFHA shapefile. The ‘clipped’ flood zones within each tract were then joined to the tract using the ‘Intersect’ tool in ArcGIS. For each 500 Cities tract, an Excel file was created containing each flood zone clipped from the FEMA SFHA shapefile, with attributes identifying the tract, the flood type, and the area. Because the output could contain multiple areas of the same flood type, a script was written to sum the area of each flood type and to differentiate between coastal and non-coastal flooding. For each tract CSV file analysed, the output contained the following data: tract identifier, proportion of flooding for each flood type, proportion of flooding overall, proportion of coastal flooding, and proportion of non-coastal flooding.
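
    The summing step could look roughly like the sketch below. It is not the project's script. The zone sets are assumptions (zones V and VE treated as coastal; the listed A and V zones treated as flooding; zone X excluded from the overall total), as are the function name, the output keys, and the `tract_area` argument, which stands in for however the total tract area was obtained.

```python
import pandas as pd

# Assumption: which FLD_ZONE codes count as flooding, and which as coastal.
FLOOD_ZONES = {"A", "AE", "AH", "AO", "AR", "A99", "V", "VE"}
COASTAL_ZONES = {"V", "VE"}


def flood_proportions(clipped, tract_area):
    """Summarize the clipped flood polygons of one tract.

    clipped: DataFrame with columns plctract10, FLD_ZONE, SHAPE_Area
    tract_area: total tract area, in the same units as SHAPE_Area
    """
    # Several polygons can share a zone, so sum the area per zone first.
    area_by_zone = clipped.groupby("FLD_ZONE")["SHAPE_Area"].sum()
    share_by_zone = area_by_zone / tract_area

    is_flood = share_by_zone.index.isin(FLOOD_ZONES)
    is_coastal = share_by_zone.index.isin(COASTAL_ZONES)

    result = {"plctract10": clipped["plctract10"].iloc[0]}
    # One proportion per flood type present in the tract.
    result.update({f"prop_{zone}": share for zone, share in share_by_zone.items()})
    result["prop_flood"] = share_by_zone[is_flood].sum()
    result["prop_coastal"] = share_by_zone[is_coastal].sum()
    result["prop_noncoastal"] = share_by_zone[is_flood & ~is_coastal].sum()
    return result
```

    Calling this once per tract file and collecting the returned dictionaries into a DataFrame would give one row per tract.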

  • National Oceanic and Atmospheric Administration (NOAA) Storm Database

    Source:

    NOAA Storm Database

    Description: 

    The NOAA Storm Database consists of reported “significant weather phenomena”, “rare, unusual, weather phenomena”, or “other significant meteorological events.”

    Dates:

    1996 – 2015

    Variables used:

    The data are collected from many sources and are not necessarily verified by the National Weather Service (NWS), but the NWS tries to ensure that the most reliable information source is used. A description of the variables and criteria is available at: https://www.nws.noaa.gov/directives/sym/pd01016005curr.pdf.

    Procedure:

    Storm events were grouped by state and county FIPS codes. The EVENT_ID rows in each county were counted and categorized into three storm types: 1) tropical storms/hurricanes; 2) severe weather, tornadoes, flooding; 3) wildfires/heat-related disasters.
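
    A minimal sketch of this counting step is shown below. Only EVENT_ID is named in the text; the other column names (STATE_FIPS, CZ_FIPS, EVENT_TYPE), the category labels, and the mapping from event type to category are assumptions, and the mapping is deliberately partial. The sketch also assumes every row is county-based.

```python
import pandas as pd

# Assumption: an illustrative, incomplete mapping of event types to the
# three storm categories described above.
CATEGORY_BY_EVENT_TYPE = {
    "Hurricane (Typhoon)": "tropical",
    "Tropical Storm": "tropical",
    "Tornado": "severe",
    "Thunderstorm Wind": "severe",
    "Flash Flood": "severe",
    "Flood": "severe",
    "Wildfire": "fire_heat",
    "Heat": "fire_heat",
    "Excessive Heat": "fire_heat",
}


def count_events_by_county(events):
    """Count storm events per (state FIPS, county FIPS) and storm category."""
    events = events.copy()
    events["category"] = events["EVENT_TYPE"].map(CATEGORY_BY_EVENT_TYPE)
    # Event types outside the mapping are dropped in this sketch.
    events = events.dropna(subset=["category"])
    return (
        events.groupby(["STATE_FIPS", "CZ_FIPS", "category"])["EVENT_ID"]
        .count()
        .unstack(fill_value=0)  # one column per category, 0 where none occurred
    )
```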

  • United States Geological Survey (USGS) Coastal Vulnerability Index

    Source: 

    USGS Coastal Vulnerability Index

    Description: 

    The Coastal Vulnerability Index synthesizes six variables that describe the vulnerability of different U.S. coastal environments to future sea-level rise.

    Dates: 

    2001 model-based index

    Variables used:

    None