The next challenge that I will tackle in the Cambridge Open Data inventory is the ‘Assessing Building Information FY2015’ dataset, which has this problem statement:
‘How can we use this data to improve housing availability and affordability in Cambridge?’
This is arguably the biggest topic of discussion among residents of Cambridge, and legislators continue to search for solutions to the alarming rise in housing prices. The Cambridge City Council election was a few months ago, and when I went to a public candidates’ forum, every candidate emphasized affordable housing. There are obvious reasons for the increase in prices (more tech and biotech companies coming to Cambridge/Boston, bringing higher wages and increasing demand), but data can still be used to develop novel and specific solutions, as well as to highlight areas of opportunity.
At the time of writing, the dataset contains 25,612 entries and 25 columns. With this many columns, I decided to start with an exploratory analysis to get a better understanding of the dataset and the meaning of its values.
Luckily, the Cambridge Open Data Portal provides a document with short descriptions of some of the columns, which can serve as our data dictionary. For the sake of our research questions, I’ll pick certain columns that could be used as features for a potential model:
Actual Year Built
We would have to take into account different types of residences, so we can use the ‘Building Type’ variable for that grouping.
There are also various geographic variables that can be used for mapping.
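As a first exploratory step, the grouping described above can be sketched in a few lines of Python. This is only a minimal illustration, not the analysis used in this article: the column names ‘Building Type’ and ‘Actual Year Built’ come from the dataset, but the rows below are toy values invented for the example (including the building type labels), and the function name is hypothetical. In practice you would pass an open file handle for the CSV exported from the portal instead of the in-memory sample.

```python
import csv
import io
from collections import defaultdict
from statistics import median

# Toy rows for illustration only; these are NOT values from the Cambridge dataset,
# and the building type labels are assumed.
SAMPLE_CSV = """Building Type,Actual Year Built
CONDO,1920
CONDO,2005
SINGLE FAMILY,1890
SINGLE FAMILY,
THREE FAMILY,1910
"""


def summarize_by_building_type(csv_file):
    """Return {building_type: (row_count, median_year_built_or_None)}."""
    counts = defaultdict(int)
    years = defaultdict(list)
    for row in csv.DictReader(csv_file):
        building_type = (row.get("Building Type") or "").strip() or "UNKNOWN"
        counts[building_type] += 1
        raw_year = (row.get("Actual Year Built") or "").strip()
        # Skip blank or non-numeric years rather than guessing a value.
        if raw_year.isdigit() and int(raw_year) > 0:
            years[building_type].append(int(raw_year))
    return {
        t: (counts[t], median(years[t]) if years[t] else None)
        for t in counts
    }


summary = summarize_by_building_type(io.StringIO(SAMPLE_CSV))
for building_type, (count, median_year) in sorted(summary.items()):
    print(building_type, count, median_year)
```

Rows with a missing year still count toward the building type total but are left out of the median, so a sparse column does not silently skew the summary.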
Going back to the problem statement, if we want to address issues of housing availability and affordability, then we need to develop research questions of our own, and then manipulate the data to answer those questions. Some questions I have right now are:
a) Where is housing availability a concern?
b) Where is housing affordability a concern?
c) What is the trend of housing construction, and how is this trend going to impact residents of Cambridge?
d) How can we relate housing types and costs to the residents in the area?
I will start by exploring these questions, and then build on what I find.
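For question (c), one simple way to look at the construction trend is to count buildings by the decade in which they were built. The sketch below assumes the ‘Actual Year Built’ values have already been parsed into integers; the function name and the toy list of years are made up for illustration and are not taken from the dataset.

```python
from collections import Counter


def buildings_per_decade(years_built):
    """Count buildings by the decade in which they were built."""
    decades = Counter()
    for year in years_built:
        if year is None or year <= 0:
            continue  # skip missing or placeholder years
        decades[(year // 10) * 10] += 1
    return dict(sorted(decades.items()))


# Toy input for illustration only; not real values from the dataset.
toy_years = [1890, 1895, 1910, 2005, 2009, None, 0]
trend = buildings_per_decade(toy_years)
print(trend)
```

Plotting these counts over time would show whether construction has been speeding up or slowing down, which is a starting point for asking how that trend affects residents.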
Where is housing availability a concern?
The first two questions deal with geography, so I first used Tableau to plot the housing data on a map of Cambridge and visualize the housing density:
Tableau offers some nice map layers that you can use to add context to your dashboards. I used three of them (Population Growth % 2018-2023, Household Growth % 2018-2023, Housing Unit Growth % 2018-2023) to project where housing availability might be a concern in the future. The finest granularity that Tableau offers for these layers is Block Group, so that’s the best that we can do for now.