Why promoting open data increases economic opportunities
During the 2016 Collision Conference held in New Orleans, our Content Strategist Cecilia Haynes interviewed conference speaker Dr. Tyrone Grandison. At the time of the interview, he was the Deputy Chief Data Officer at the U.S. Department of Commerce. Tyrone is currently the Chief Information Officer for the Institute for Health Metrics and Evaluation.
Tyrone had just come off his talk on "Data science, apps and civic responsibility", and Cecilia was thrilled to chat with him about the democratization of data and how open data can help anyone build innovative products and services.
Issues with Data Ownership and Privacy
Cecilia: Thanks for meeting with me! I saw your talk and I thought you would be the perfect person to reach out to. Since you're in government, you're approaching data in a different way than the business or tech world. What is your take on open data?
Tyrone: Data within startups and companies is proprietary. I have this big issue with data ownership, data privacy, and data security, and with many companies feeling that because they collected and are stewards of data, they immediately have ownership rights.
For example, who does the data belong to if you were in a hospital and the hospital takes down your information for an evaluation? When a hospital generates data on your condition in the process of delivering care, you likely believe that that data is still yours. However, hospitals don’t assume that.
Cecilia: I actually didn't know that. That's troubling.
Tyrone: I mean it's basically their proprietary intellectual property at that point where they now have the right to sell it based upon the terms and conditions that you actually agreed to.
It's the same thing that happens when you use something like a Fitbit.
*Note that Cecilia was wearing a Fitbit...
I looked at your hand and was just like, "That data is not yours."
The US Government's Approach to Data
"We want to reduce the barrier of entry for people working on and with data."
Cecilia: What is the government’s approach to data?
Tyrone: So the government is more focused on the power of open data and on how we can actually increase its accessibility and usability.
This includes exploring how to enable public-private data partnerships, and, in the process, help government be more data-driven in how it’s run. What I've observed is that the Department of Commerce, for example, has highly valuable data sets.
A quick example is NOAA, which is the National Oceanic and Atmospheric Administration. Commerce has twelve bureaus and NOAA is a bureau within Commerce. NOAA provides information for the weather industry globally.
It's all free, but no one really knows this. It is technically all open, but it's very difficult to find and it's very difficult to actually understand.
And there are some companies that have leveraged this information by investing in understanding it and making it clean and accessible. That's why you have theweather.com, that's why you have The Weather Channel. That's all NOAA. Even worse, NOAA collects around 20-30 terabytes of data per day. They even have satellites monitoring the sun’s surface. They have sensors monitoring sounds underwater. You name it, they monitor it. That's up to 30 terabytes a day, but they only actually release 2 terabytes of that data, and it's only a fraction of those 2 terabytes that funds the world’s weather system.
Cecilia: Oh, so is that why weather predictions are unreliable?
Tyrone: No, no, that's not the data's fault. That is on the analytical models on top of the data.
If you had access to more data, and you had a better understanding of the nuances of collection like what you have to filter out and what overlaps, then you can actually get better models. The prediction models are actually better now than they were like three years ago and current three-day predictions are pretty spot on. If we go farther than that, then okay, not so reliable…
Using Open Data to Find Targeted Demographics
Cecilia: What other data sources can benefit companies?
Tyrone: The Census Bureau has this thing called the American Community Survey which basically documents the daily lives of all Americans. So, if you want to know anything, they have tens of thousands of features, which means tens of thousands of descriptors on the lives of Americans.
Every single study that you see that actually talks about how Americans are living, or whatever else, that's all from the Census Bureau. These studies don't recognize the Census, and they don't give attribution back to the Census. But there is nowhere else the data can come from.
Say I wanted to get access to senior citizens over 65 who collected social security benefits and who used to commute 10 miles to their job. Almost any attribute you could actually think of, you could find this demographic right now with open data.
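As a rough illustration of the kind of query Tyrone describes, the sketch below filters a handful of made-up survey records for people over 65 who collect social security benefits and used to commute at least 10 miles. The record layout, the field names, and the "at least 10 miles" reading are assumptions for this example only. They are not actual American Community Survey variable names or data.

```python
from dataclasses import dataclass


@dataclass
class Respondent:
    # Hypothetical fields, not real American Community Survey variables.
    age: int
    receives_social_security: bool
    former_commute_miles: float


def matches_target(r: Respondent) -> bool:
    """Over 65, collects social security, used to commute 10+ miles (assumed reading)."""
    return r.age > 65 and r.receives_social_security and r.former_commute_miles >= 10


def select_target(respondents: list[Respondent]) -> list[Respondent]:
    """Keep only the respondents that match every attribute of the target group."""
    return [r for r in respondents if matches_target(r)]


# Made-up sample records, for illustration only.
SAMPLE = [
    Respondent(age=70, receives_social_security=True, former_commute_miles=12.0),
    Respondent(age=66, receives_social_security=False, former_commute_miles=15.0),
    Respondent(age=64, receives_social_security=True, former_commute_miles=20.0),
    Respondent(age=81, receives_social_security=True, former_commute_miles=10.0),
    Respondent(age=72, receives_social_security=True, former_commute_miles=3.5),
]

if __name__ == "__main__":
    for person in select_target(SAMPLE):
        print(person)
```

With real open data the same idea applies, but the attribute names and encodings would come from the data set's own documentation.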
Cecilia: And it's all completely available?
Tyrone: It's all open. There is a project called the Commerce Data Usability Project that we're doing at the department where we produce tutorials that show you:
Here's a valid data set, here is a story as to why you should care about the data set, here is how to get it, here is how it’s processed, here is how to actually make some visualizations from it, here is how to actually analyze it. Go.
Tools to Support the Democratization of Data
Cecilia: The democratization of data is such a big deal to us as well. It’s why we open source our software and products, and why we made an open source visual scraper, so that anyone can engage with web data.
One of our goals is to enable data journalists, data scientists, everyday people to be able to use our tools to seek out the information they need.
Tyrone: Commerce is really dedicated to this goal as well. That’s why we have a startup now within Commerce called Commerce Data Service whose mission it is to support all the bureaus on their data initiatives.
We want to fundamentally and positively change the way citizens and businesses interact with the data products from Commerce. We recognize the problems are tied to marketing, access, and usability.
The Data Service commits to having everything in the open, everything transparent as much as possible. If you want to see everything we're working on right now, it's on github.com/commercedataservice.
Take a look at the Data Usability Project since we have a bunch of tutorials on everything from census data to data from NIST, which is the standards organization that has everything from internet security standards to time standards, you name it.
We also have satellite information. So there is a satellite that was launched, I think it was October 2011, called the Delta 2. It had on it this device called the Visible Infrared Imaging Radiometer Suite, VIIRS, which actually monitors all human activity as it goes around.
So a bunch of scientists have been looking at this VIIRS data set that no one knows exists and figured out that it's a really good proxy for a lot of amazing stuff. For example, you could actually use satellite imagery to predict population very simply. You could even use it to predict the number of mental health related arrests in San Francisco. You could also use it to figure out economic activity in a particular place.
Machine Learning for Data Analysis
Cecilia: So do you incorporate machine learning into analyzing the data?
Tyrone: We've got the platform and we have examples that show you how to use machine learning with the data sets. If you want to use machine learning algorithms on a data set, you can find everything you need. If you want to use the data sets with something else in a really straightforward way to do straight mapping, for example, then you have it on our platform.
Cecilia: This is actually really helpful to me because we have partnerships with BigML, a company that specializes in predictive analytics, along with MonkeyLearn, a machine learning company that works on text analysis.
We’re always looking for new ways to highlight our collaboration, so we’ll have to check out VIIRS.
Using Open Data to Create Economic Opportunity
Cecilia: What is your role in the Department of Commerce?
Tyrone: I'm the Deputy Chief Data Officer. I'm one of three people that leads this Commerce Data Service and the office itself is the lead for the data pillar across Commerce.
The Secretary has a strategic plan that has five initiatives that everyone has to tie into. Data is one of them, and we're responsible for making sure that data is successful.
Cecilia: Have you found it really challenging so far?
Tyrone: The support from the Secretary and the senior staff at Commerce has been amazing. The challenge has actually been that we are not in the private sector, since delivering products in government is a little bit different than it is in private industry.
In private industry, you're focused on clicks, and buys, and elastic problems where it's all about growing and shrinking some base. Whereas with government, it's more of the hardest, most difficult problems that can be considered baseline needs like, “I'd like to have health care. I'd like not to be homeless.”
These are problems that, you know, no company will actually tackle because there is no profit motive, but these should be basic intrinsic rights for anyone who lives in the US. These are the problems that the government has to handle, and we have to produce amazing data products to make sure that we approach them in the right way.
Cecilia: So your goal is to create products that allow people to access data more easily?
Tyrone: Our approach is two-fold: One, we're building the products to help people engage with the services that we're offering. And two, we're building a platform that's an enabler. I hope that the platform is something that citizens can use to help solve local issues.
The Commerce Department's mission is to create conditions for economic growth and opportunity. We want to empower citizens to take this data and build businesses and create more jobs.
That’s why we want to open as much data as possible and just encourage and engage with people so that they can build great things.
No Such Thing as Bad Data
Cecilia: So open data is a critical part of your strategy?
Tyrone: The more data you have, the more you can shed light on issues. However, you can’t let the data speak for itself because you have to recognize that there is bias in data. If you recognize the bias first, you can try and filter out for it, and if you can't, chuck it and use a different data source.
It’s important to have a data source that is real, legitimate, and sound so you can find a signal and get meaningful information out of it. It’s helpful if you have a purpose, or a direction, or a question you're asking. Then you can actually say, "I want to see spending trends. I want to see who's spending X on Y." And just do an analysis of this one feature over time.
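A minimal sketch of what "an analysis of this one feature over time" could look like is shown below. It totals spending for a single category per year and then computes the year-over-year change. The records, the category names, and the function names are all hypothetical and exist only to illustrate starting from a question ("who's spending X on Y?") and following one feature through time.

```python
from collections import defaultdict

# Made-up records for illustration only: (year, category, amount_spent).
PURCHASES = [
    (2013, "groceries", 120.0),
    (2013, "groceries", 80.0),
    (2013, "fuel", 60.0),
    (2014, "groceries", 250.0),
    (2014, "fuel", 55.0),
    (2015, "groceries", 150.0),
    (2015, "groceries", 160.0),
]


def yearly_totals(records, category):
    """Sum the amount spent on one category for each year, ordered by year."""
    totals = defaultdict(float)
    for year, cat, amount in records:
        if cat == category:
            totals[year] += amount
    return dict(sorted(totals.items()))


def year_over_year_change(totals):
    """Difference between each year's total and the previous year's total."""
    years = list(totals)
    return {
        later: totals[later] - totals[earlier]
        for earlier, later in zip(years, years[1:])
    }


if __name__ == "__main__":
    grocery_totals = yearly_totals(PURCHASES, "groceries")
    print(grocery_totals)
    print(year_over_year_change(grocery_totals))
```

Keeping the question narrow (one category, one measure, one time axis) makes it easier to notice when a trend comes from how the data was collected rather than from real behavior.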
Cecilia: How do you determine the difference between good data vs bad data?
Tyrone: There is no good or bad since data is a product of the collection process and the people that handle it. It’s more about the people that clean, process, and provide it.
Cecilia: So there is a lot of importance in having a reliable group who gathers the data?
Tyrone: There is a lot of value in having the people responsible for ETL (Extract, Transform, Load) who can create a data set that is a gold standard. They reduce biases as much as possible, and they minimize errors as much as possible.
It’s important that they're honest with the upstream consumer about the problems with any data sets they provide. If you're really honest about it, then somebody else can know what are the right techniques to use on the data. Otherwise people might just use it willy-nilly and not know that it shouldn’t be used for that purpose or in a particular way.
So the good and bad thing, there is no dichotomy, it's all data and the interpretation of it.
Advice for Getting Started with Using Open Data
Cecilia: Do you have any advice for people who are looking to get into open data or data security in this industry?
Tyrone: I'd say just go in with a problem or a question, something that's burning in your heart that you want to solve. And then figure out what data sets you can use.
You have a backpack of methods and technologies available and it all starts with the question or problem you're fundamentally trying to solve. You need to understand the user, the problem, and the context in which you have to deliver something.
That determines what tools you need to actually use to solve that problem, not the other way around. Don’t approach this with, “I have a hammer, I'm going to smash everything with it.”
This was one of my favorite interviews of the entire conference. Thank you again for meeting with me, Tyrone!