Conference is now cancelled
Spatial data on steroids: The Million Deaths Study
Patrick E Brown (Department of Statistical Sciences, University of Toronto)
When I was learning Spatial Statistics from Peter 25 years ago, spatial data as rich and extensive as the Million Deaths Study (MDS) didn't exist. This large health survey from India has one million data points from 12,000 spatial locations spanning 14 years, with cause of death, dates of birth and death, sex, smoking status, drinking, and vegetarianism recorded for each observation. In the early days of Model-Based Geostatistics, 'not enough information in the data' was often the main impediment to getting publishable results, and data as extensive as the MDS might be expected to solve all of a Spatial Statistician's problems. There is, unfortunately, no free lunch.
This talk will explore the complications that arise when applying the Generalized Linear Geostatistical Model to the MDS data, focusing on the effect of satellite-derived air quality on mortality. It will discuss why more data is not the panacea one might expect, and describe possible methods for accommodating uncertainty in the denominator, unmeasured confounding, and potential spatial variation in the effect sizes. More data brings a desire (or an imperative?) to answer more complex questions, and the talk will conclude with possible directions for Spatial Statistics in the big data world.