The data for this project came from the Virginia Department of Transportation and the Federal Highway Administration. The VDOT data came as a series of HTML pages; the federal data came as an Excel spreadsheet.
We copied the VDOT data from the HTML tables and pasted it into Excel. In Excel, we cleared the formatting to get rid of any HTML. No such changes were necessary for the federal data, as it came pre-packaged in an Excel spreadsheet.
We cleaned the VDOT data in Excel. For one thing, the data set included both culverts and bridges – a total of more than 20,000 rows of information. We narrowed this down to around 13,000 by filtering out the culverts and only looking at bridges.
Then we imported the bridge data into Microsoft Access, a more powerful database manager. With Access, we ran dozens of “queries” on the data.
A query is a way of asking Access specific questions related to the data. Our questions included:
¶ How many deficient and obsolete bridges are in each VDOT region?
¶ What is the average daily traffic statewide over bad bridges?
¶ What is the average age of Virginia’s bridges?
We copied some of the query results – such as the number of deficient and obsolete bridges, along with average daily traffic over these bridges – into Excel. Here, we created simple formulas to calculate sums and percentages.
The federal data already existed in Excel, but that doesn’t mean it was simple to analyze. The complicating factor was that the data was spread over dozens of separate worksheets.
One worksheet had summary data for all 50 states, but there was also a separate worksheet for each state, detailing that state’s counties. All of the data sets listed the total number of bridges, the number of structurally deficient and obsolete bridges, the total number of bridges needing work, and the area of the bridge that was deficient.
We hid some of the columns, such as the area of the bridge that was deficient, to zero in on the most relevant. We created a formula in Excel that calculated the percent of bridges that were deficient and obsolete using the numbers in the tables.
– William Lineberry and Jeannette Porter