Big Data has the potential to speed up our response to health crises, provide our soldiers with superior information capabilities and enhance interagency intelligence. How are the different departments and agencies coordinating a cohesive effort to ensure these benefits are realized?
Big Data has a lot of potential in terms of accelerating government processes and making the public sector smarter and more agile on the whole. But so far, agencies have been slow to adopt it.
The challenges for government, however, differ from those in the commercial sector. Because Big Data changes security and infrastructure requirements and has a big impact on budgets and solutions, it has to be implemented while upholding regulations and maintaining people’s privacy and confidentiality.
Although some organizations have handled large amounts of data in the past, new sources of data must be evaluated to determine whether they bring value, how they can be used, whether they can save agencies money, whether they will be reliable, and whether they will be accepted by the public. Further, all of this ultimately depends on policy, your internal and external customers’ data needs and your ability to manage that data.
For an organization like the Post Office, Big Data has to have a return that is measurable in terms of revenue and cost reduction. Improving logistics and monitoring service will help bolster customer value, but managing that remains the biggest challenge.
Like other industries, the government can most benefit from data that is provisioned in real-time. In this regard, some agencies have already seen significant benefits, but it requires a large investment in order to manage an agency’s entire inventory. Partnering agencies and citizens alike benefit from these improved processes, but this technology needs to make processes less expensive and swifter for it to be realistically adopted.
Interconnectivity, then, is a crucial element that needs to be built up between agencies. Most connections between federal, state and local government organizations are not yet in place, and larger agencies lack access to valuable data held at the smaller, more localized level. Big Data will allow information to be transferred securely and confidentially.
Effort also has to be put in to ensure that all this data is verifiable, often requiring surveys and statistical models to remove bias. The confidence in the technology and the data itself is there, but bringing together legacy systems with siloed data sources is the more systemic problem.
Government cannot rely on old data sources that tell it what it already knows. In the long run, new data sources, and new ways to use that data to create revenue, have to be imagined.
Sanjog Aul: Welcome listeners, this is Sanjog Aul, your host, and the topic for conversation is “Smarter Government through Big Data.” Our guests for today’s show are Cavan Capps, who is the Chief of DataWeb Systems with the US Census Bureau, and Ellis Burgoyne, who is the CIO and Executive VP of the US Postal Service. As we already know, Big Data is in vogue across industries, but we also know how much potential Big Data has for government and government institutions. This is a huge topic, because we want to look at what benefits it might have, why adoption has been so slow, and how government has challenges and business values that are unique in comparison to the private sector. So when we talk about Big Data in a government context, what is it specifically that government is really hoping to get out of it?
Cavan Capps: I think different organizations look at Big Data in different ways. It came out of the private sector, out of Google, Twitter and Facebook, basically to do marketing, and government has a lot of functions: some of it is enforcement, some of it is processing, like what the Postal Service does. Statistical agencies have their own particular needs; we produce information for policy, and for businesses and individuals, that people can use. And we have to do that while maintaining people’s confidentiality and privacy. When you think of a statistical agency, privacy is our prime directive. So we don’t look at Big Data the way the commercial sector does; we have unique problems.
Sanjog: It’s clear government has its own needs, but at the same time, Big Data is not a mature technology, and there is much at stake in terms of doing things in a predictable and secure manner. Is this the reason why people are holding back?
Ellis Burgoyne: There are a number of challenges that come with Big Data, especially for government. Agencies like the Postal Service are in a commercial space as well as the public policy space on the government side. I think what holds back a lot of organizations is the scope and the amount of data you have to process and store, primarily because it changes a lot about your IT infrastructure, not only hardware but software. It also changes security requirements. It changes a lot about the business, and it changes our relationship with our customers as well. The broad scope of what Big Data can do and the broad scope of the requirement for Big Data have a big impact on budgets, both government and commercial, and also on the solutions that need to be brought to market to make Big Data successful.
“The broad scope of what Big Data can do and the broad scope of the requirement for Big Data have a big impact on budgets both government and commercial.”
Sanjog: If this is too big a bite to chew, why not divide it into smaller chunks, so that it doesn’t have that one-time big impact on how government does things and how it approaches a new paradigm or technology? That would make it manageable, and perhaps adoption would be faster. What do you think?
Cavan: In the statistical agencies, our needs are a little different from other agencies like law enforcement. They actually look for individuals; we try to make sure nobody can identify an individual company or person. So for us, we’ve always handled large amounts of data. We are looking at new data from new sources, because the European countries are often looking at data from the digital information ecosystem. Right now we’ve got to evaluate whether these new data sources and electronic transactions can give us good information more cheaply and give us more specific small-area estimates without breaking confidentiality. Right now the issue is not so much the technology, not what kind of hardware or software to use; that challenge will come later. Right now we’ve got to find out, is there value here? How do we use it? How do we save money? Is it reliable in the long run and will the public accept it?
“Right now we’ve got to find out, is there value here? How do we use it? How do we save money? Is it reliable in the long run and will the public accept it?”
Ellis: To answer your question about how you manage this much data within your environment: there are a number of platform technologies we are looking at that help us scale horizontally to manage this data. As you invest in data, you start to see more and more potential down the road, and you have a system that you can scale, as opposed to buying systems that may only suit your data needs in the short term. That’s where some of the solution technologies are, and where the opportunity is for us going forward.
Sanjog: Do you think Big Data by design is no longer a technology issue and people who have a better handle on people and processes will be more successful?
Ellis: It’s multifaceted; it’s a number of issues. As data grows, especially in the government space, there is a greater emphasis on security and how you secure that data, and also on the stakeholders that help define how you use data, whether internally, a marketing group or an operations group, or externally, commercial or residential customers, and what their needs for data are. Sorting through your external and internal customers’ data needs is a real challenge. Once you have that, the bigger challenge is data management, so you can provision data to your customers in the right way, satisfy them and give them the results they are looking for. You have to have a very good data management system that keeps the data accurate, so that it stays accurate and useful for the customer in the long term.
Sanjog: Cavan, in your world, the volume of data you handle has never been small, so Big Data may not be a big or new element in your mix. Where do you see the rest of the world scrambling, and where have you aced this art?
Cavan: We are looking at some of the horizontal scaling as well. The thing is, right now we produce the geography data that the United States, and the rest of the world, lives off of. We’ve been doing that for years, so we are handling it now. But we see potentially new data sources coming on the horizon, with large administrative data sets, where we are actually trying to produce new estimates for things we couldn’t afford to do before. Some of these surveys cost a lot of money, $25-50 million to do, so if we can reduce our costs by using other transactional data sources, we will. But again, that depends on policy. In the interim, we are studying horizontal technologies like Hadoop and Spark that allow us to do recommendation systems, and systems associated with analytics that will help us reduce our survey processing costs. But this is all at the beginning right now, and we haven’t committed to anything yet.
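The horizontal scaling Cavan mentions rests on the map/shuffle/reduce pattern that frameworks like Hadoop and Spark distribute across a cluster. As a minimal sketch, the pattern can be simulated in plain Python with two in-memory partitions; the data and function names here are illustrative, not any agency’s system:

```python
from collections import defaultdict

# An illustration of the map/shuffle/reduce pattern that frameworks
# like Hadoop and Spark distribute across many machines. Here the
# "cluster" is simulated with two in-memory partitions.

def map_phase(partition):
    # Emit (key, 1) pairs, e.g. tallying survey response codes.
    return [(record, 1) for record in partition]

def shuffle(mapped_partitions):
    # Group intermediate pairs by key, as the framework would across nodes.
    groups = defaultdict(list)
    for pairs in mapped_partitions:
        for key, value in pairs:
            groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Combine each key's values into a final result.
    return {key: sum(values) for key, values in groups.items()}

# Two "partitions" of data, as if split across two worker nodes.
partitions = [["yes", "no", "yes"], ["no", "no", "yes"]]
counts = reduce_phase(shuffle([map_phase(p) for p in partitions]))
print(counts)  # {'yes': 3, 'no': 3}
```

In a real Hadoop or Spark deployment the partitions live on different nodes and the shuffle moves data over the network, which is what lets the same three-phase logic scale horizontally as data grows.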
Sanjog: Is Big Data distracting us from our main goal by causing us to chase any sort of new data source that can produce analytics and has perceived value? Are we chasing a ghost?
Ellis: Well, I think you can get distracted and chase data solutions for customers that don’t get you or the customer anywhere. I think that can be very dysfunctional, especially in an environment like government or an agency like ours. We don’t have unlimited resources in the government space for producing and provisioning that data. So you have to use your IT dollar and your Big Data dollar well, and you have to pick and choose what is the best investment for the data management strategy and the data capture strategy; determining that is an ongoing challenge. That’s where you have to have a very good relationship with your business customer, to be able to sift through the better Big Data projects as opposed to those that are just nice to have in terms of data.
“You have to use your IT dollar and your Big Data dollar well, and you have to pick and choose what is the best investment for the data management strategy and the data capture strategy.”
Sanjog: What has opened the business side’s eyes to Big Data and triggered them to go after these initiatives?
Cavan: There is a lot of potential for Big Data, at least in the statistical space we work in. It has the potential to save a lot of money and to produce data and estimates faster, so that we’re not talking about lag. The issue, though, is that you can look back over 30 years in IT and we’ve had fads that were overpromised, going all the way back to artificial intelligence. What we have to do here is look at the business, see what we need to do, what the real value is and try to focus on the pieces of the Big Data space that will actually solve some of these problems. A lot of that means we will have to understand these problems better before we start making big investments.
“What we have to do here is look at the business, see what we need to do, what the real value is and try to focus on the pieces of the Big Data space that will actually solve some of these problems.”
Sanjog: What are the top three things we can inventory that could make for smarter government thanks to the application of Big Data?
Ellis: We are a little unique in the sense that we’re a revenue-generating organization not supported by tax dollars. So when we build IT solutions or Big Data solutions, they have to have a return for us, something you can measure in terms of revenue and cost reduction. The big thing we are working on right now is creating an inventory of every piece of mail. Think about the challenge of that: we process over half a billion pieces of mail every day. For each piece, we capture a number of elements, including the address, the return address and the postage, and we apply what we call a license plate, a bar code, so we can track it through our system. Unfortunately, we can really only inventory about half of our mail right now, because we don’t have a bar code on every piece, and that bar code is the intelligence we need to generate data. With it, we can measure the mail and do what most companies do around Big Data logistics: improve revenue and reduce costs. But in January, we hope to have bar codes on almost 100 percent of mail pieces, and that’s going to roughly double the amount of data we have to manage within our system. It also gives us a great opportunity to look for new revenue, to look at logistics and how we can improve costs there, and, most importantly, to improve service.
By improving logistics and by monitoring service, we are able to generate better customer value that will help us grow revenue going into the future. That’s our biggest challenge right now for Big Data: how we manage that. The most important challenge going forward is that we have to manage that data in real time; that’s our goal. In the past, we managed data by what happened yesterday. Now we are trying to manage all that data, almost 23 petabytes, and provision it in real time. That’s a big challenge for us.
“By improving logistics and by monitoring service, we are able to generate better customer value that will help us grow revenue going into the future. That’s our biggest challenge right now for Big Data: how we manage that.”
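Ellis’s per-piece inventory can be pictured as scan events accumulating against each barcode, so that any piece can be located in real time. A minimal sketch, with invented names and data (this is not a USPS system or API):

```python
from dataclasses import dataclass, field

# Hypothetical sketch of the "license plate" idea: a barcode identifies
# each mail piece, and scan events accumulate against it so the piece
# can be located at any moment. All identifiers here are illustrative.

@dataclass
class MailPiece:
    barcode: str
    scans: list = field(default_factory=list)  # (facility, timestamp) events

    def record_scan(self, facility, timestamp):
        self.scans.append((facility, timestamp))

    def last_seen(self):
        # The most recent scan tells you where the piece is right now.
        return self.scans[-1] if self.scans else None

inventory = {}  # barcode -> MailPiece: the per-piece inventory

piece = MailPiece(barcode="LP00042")
inventory[piece.barcode] = piece
piece.record_scan("ORIGIN_PLANT", "2013-09-01T06:00")
piece.record_scan("DEST_DELIVERY_UNIT", "2013-09-02T05:30")
print(inventory["LP00042"].last_seen())  # ('DEST_DELIVERY_UNIT', '2013-09-02T05:30')
```

At half a billion pieces a day, the real engineering problem is not this record structure but ingesting and querying the scan stream in real time, which is where the horizontally scaled platforms discussed earlier come in.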
Sanjog: That’s a very good set of areas where you say this will help, but do we have any guarantee that it will help to the degree you claim?
Ellis: Well, I think there is a guarantee from the service and logistics perspective in terms of reducing cost. In both the public and the private sectors, it’s really a principle of business now: managing your inventory is how companies and organizations have been able to reduce their costs. By being able to manage 50, and soon 100, percent of our inventory in real time, we’ve already seen significant improvements in cost and in service, especially in an environment where we are downsizing our infrastructure and our networks. You really have to rely on Big Data to manage all those moving parts, so that as you consolidate operations, you don’t impact service. So we’ve already seen some significant benefits from Big Data.
“We’ve been managing data in the past by what happened yesterday. Now we are trying to manage all that data, almost 23 petabytes of data, and provision it in real time.”
Sanjog: Cavan, having experience dealing with large amounts of data in your industry, what is your advice for how agencies like the US Postal Service and others could get started with something like this?
Cavan: We have logistical problems too, primarily when we are doing surveys like the decennial, where we are collecting questionnaires, processing the information and obtaining geography data. Like the Postal Service, we’re trying to get that closer to real time. A lot of our survey data, for example poverty data, comes out a year or a year and a half after people actually tell us what happened.
In a lot of statistical agencies, the data comes out later than we’d like. Retail sales, for example, is one of the major indicators of how well the economy is doing and of how the stock market should be making its investments, but it’s only a national number. We’d love to be able to get state numbers. Atlanta, Chicago and New York are all different economies; they actually compete against each other, and we’d love to be able to get retail sales data for each of the metropolitan areas rather than waiting for small-area retail sales numbers to come out much later.
These are all challenges we have in the statistical agencies. The question is, can we get the technology, and can we get the data? We’d like to do it some way other than these very expensive surveys. Our surveys are probably the best in the world, but we don’t ask 1,000 people; for the employment rate, for example, we ask 64,000 households a month. So we ask 64,000 different families every month whether they were looking for work, and we process that information.
But that’s very expensive, and the question for us is, can we make it less expensive? We want to do things along the lines of the Post Office: if we want to process anything quicker, we want technology that allows us to do it.
Ellis: The census is a good example of this. We’ve benefited from the revenue that comes our way when the census comes around, and we understand that revenue, in terms of hard copy surveys, will decline. So we’re looking for ways to partner with agencies and other companies to tap into the vast amount of data that we have.
We go to every household every day, and we know where individuals live. We are now pairing that information with the amount of mail they get, and we’re doing that in real time. We’re also looking to add GPS technology to every household in America in terms of address mapping, which gives us an opportunity. We’re not quite sure where that will go in the future, but being able to map latitude and longitude coordinates to every physical mailing address in the United States, adding that to our database and being able to manage how people move around with a change of address, that’s a valuable amount of data. We’re looking forward to seeing how we can help other agencies with their missions, or other organizations with their marketing strategies, while protecting our mandate around privacy.
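The address mapping Ellis describes, attaching latitude/longitude to each delivery point and tracking moves, can be sketched as follows. The addresses, coordinates and field names are all invented for illustration:

```python
import math

# Hypothetical sketch of geocoded mailing addresses with change-of-address
# handling. Nothing here reflects an actual USPS data model.

addresses = {
    "101 MAIN ST": {"lat": 38.8951, "lon": -77.0364, "resident": "A. Smith"},
    "9 OAK AVE":   {"lat": 33.7490, "lon": -84.3880, "resident": None},
}

def move(resident, old_addr, new_addr):
    # A change of address transfers the resident between geocoded records.
    addresses[old_addr]["resident"] = None
    addresses[new_addr]["resident"] = resident

def distance_km(a, b):
    # Great-circle (haversine) distance between two geocoded addresses.
    lat1, lon1 = math.radians(a["lat"]), math.radians(a["lon"])
    lat2, lon2 = math.radians(b["lat"]), math.radians(b["lon"])
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * math.asin(math.sqrt(h))

move("A. Smith", "101 MAIN ST", "9 OAK AVE")
print(addresses["9 OAK AVE"]["resident"])  # A. Smith
print(round(distance_km(addresses["101 MAIN ST"], addresses["9 OAK AVE"])), "km moved")
```

Once every address carries coordinates, derived measures like the distance of a move become simple computations over the address database, which is part of what makes the geocoded file valuable to partner agencies.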
Sanjog: That’s a very interesting example about the GPS, Ellis. But as a consumer, I get my mail at 2:30; how does it matter to me as an end customer if it comes at 2:00? Is there another value proposition that helps somebody else?
Ellis: It matters to you because right now you don’t know when your mail gets to you unless you’re home waiting for it. What we are looking at in the future is being able to do a couple of things. One is predictive delivery. We’ll be able to tell you every day what time you can expect your mail, within an hour. Many companies only predict delivery by eight o’clock tonight; we’ll be able to predict within an hour when you can expect your mail.
“We’ll be able to tell you every day what time you can expect your mail, within an hour. Many companies only predict delivery by eight o’clock tonight. We’ll be able to predict within an hour when you’ll be able to expect mail.”
That may mean a lot to some customers who are expecting something pretty valuable; they might want to come home for it, or they might want to wait for it. And on the sender side, it’s important for a lot of commercial mailers to know when mail reaches your household, so they can follow up with emails or texts about savings or sales you might be able to take advantage of later. The other piece is that you’ll not only know where your mail is, you’ll eventually be able to see online what that mail looks like, an actual image of it, and decide how important it is to you and whether you want to come home for it. So it’s not just what time you get the mail; it’s also knowing what kind of mail it is before you even get home, so you understand what’s in your mailbox.
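In a toy model, the predictive-delivery idea reduces to projecting an arrival time from a household’s position on the carrier’s route and quoting a window around it. A hedged sketch; the route start time, per-stop average and stop number are invented figures:

```python
from datetime import datetime, timedelta

# Hypothetical sketch of a one-hour predicted delivery window: project an
# ETA from the carrier's route start, an average time per stop, and the
# household's stop number, then quote a window around that ETA.

def delivery_window(route_start, seconds_per_stop, stop_number, window_minutes=60):
    eta = route_start + timedelta(seconds=seconds_per_stop * stop_number)
    half = timedelta(minutes=window_minutes / 2)
    return eta - half, eta + half

start = datetime(2013, 9, 1, 9, 0)        # carrier leaves the delivery unit
lo, hi = delivery_window(start, 45, 400)  # 400th stop, 45 s per stop on average
print(lo.strftime("%H:%M"), "-", hi.strftime("%H:%M"))  # 13:30 - 14:30
```

A production system would feed real-time scan data back in to tighten the estimate during the day, but the core calculation, position on route times pace, is this simple.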
Cavan: One thing we’ve depended on the Postal Service for over the years is our address files. Anything the Postal Service does to improve those files, so we don’t go to houses that are abandoned, or that are under construction and have an address but no one living there, saves us millions of dollars when we’re doing something like the decennial. So we are interested in continuing to partner with organizations like the Postal Service to improve the data we use internally for decisions like address targeting.
Sanjog: Would it make more sense for Big Data implementation to be a cooperative arrangement across agencies rather than each agency having their own initiative and trying to integrate it later? How are you guys thinking in your respective agencies? Is there one cohesive effort?
Ellis: We’re trying to build a platform around addressing, mail and logistics that would benefit the commercial space and also the government space, like the Census Bureau and other organizations that could benefit from that information, and also consumers. In the new world we’re looking at in terms of Big Data, customers will be able to tap into their own data, to know what’s coming, to look at their own inventory of mail usage, and to expand and create dashboards.
“Dashboards we normally think of in terms of the commercial space or the operational space. But we have the opportunity to bring dashboards and the benefits of Big Data to an individual end user or consumer.”
Dashboards we normally think of in terms of the commercial space or the operational space. But we have the opportunity to bring dashboards and the benefits of Big Data to an individual end user or consumer. Those are some of the platforms we’re trying to build, and I think it’s an ecosystem that has tremendous potential as we go forward.
Sanjog: I’ll ask the same question to you Cavan: How would you envision creating one cohesive effort?
Cavan: If we want to build up this ecosystem Ellis was speaking about, one of the big things is to improve interconnectivity between these different organizations. Ellis has a much richer reason to develop mailing addresses than we do, but we use them all the time. If we could get the changes in real time, for example through a services architecture transmitting updates on a continuous basis, we could save money, at least in the space we’re operating in. One of the big issues within government is the connections to state and local government. Housing permit data, for example, we don’t get continuously; we get bulk downloads on a regular basis.
“One of the biggest features of Big Data systems is getting data from other organizations and using it with your data in a secure and confidential way.”
If there’s a way we can develop interfaces so the different parts of government can talk together more freely, more securely and more in real time, we can save a huge amount of money all over the place. Because as these Big Data systems get going, one of their biggest features is getting data from other organizations and using it with your own data in a secure and confidential way.
Sanjog: We can build a strong infrastructure, but how do we make sure the quality of that data is strong? If data quality is the weakest link, what is the plan to resolve it in a predictable and cohesive manner?
Cavan: Well, it’s an issue we have to deal with all the time, because we produce official statistics, and as a statistical agency we are required to be transparent in ways that private organizations producing statistics are not. So before we take any data, we do cost evaluations of it and see if it’s reliable. Most Big Data sources are statistically biased, so we have to run a survey against them and find statistical models that take the bias out. Again, what we are dealing with is trying to predict aggregate trends, not, as Ellis is doing, actually pulling up individual data for individual people. We want to tell you whether retail sales are really going up in Atlanta, for example, or nationally, and we want that to be reliable. So it’s a different problem than the classical marketing problem.
Sanjog: In terms of reliability and veracity of the Big Data outputs, how much confidence do we have in it today, given where we stand with it?
Ellis: The data that we use, we have strong confidence in the results. One of the problems we are having is that, as we move into these new uses for data, we have to bring together legacy systems and legacy data sources that have been siloed over many, many years and were built separately. When we try to get a number, like a revenue number for a large commercial customer, the commercial customer may have 10 or 12 different sources where they outsource actual work, and that revenue transaction isn’t always transparent. So we struggle to rationalize all of these different elements within systems that have been around for many years and, as I mentioned, were built for different purposes, in order to extract new information. Mapping all of those old systems into new data systems has been a real challenge for us, and when we do it, looking at a number, whether a revenue number, a cost number, a statistic or a predictive element, gives you pause: is that data accurate? So it’s that new data set, that new data source, that we worry about the most.
Sanjog: Cavan, if you were Ellis’s best friend, how would you suggest he solve this problem?
Cavan: This is a problem we have been dealing with in the IT community for years: we typically develop requirements for a closed-in system, what I would call a brittle system. The issue is that systems need to change over time, they need to be organic, and part of that is that we need to build communication into them. When we solve one problem, we often don’t generalize it enough so that it evolves gracefully into the next problem. I think all of us have done this; we’ve all developed some stovepipes, and to some degree they will continue, simply because it’s easier to do. It’s easier to understand a limited set of requirements.
“When we solve one problem, we often don’t generalize it enough so that it evolves gracefully into the next problem.”
But when we build a system, we need people who can connect the dots, as Ellis is doing; he is connecting a lot of dots. He is seeing a lot of questions he can answer from his data, and one of the big issues is that we have to find people who can bring that insight, connecting those dots to bring out new products for less money. When we do that, it means we are going to have to connect systems more and more, in ways that we didn’t in the past. So I think that from the get-go, we have to try to think of how we make systems communicate for requirements that we don’t have today. It’s going to be an ongoing challenge.
Sanjog: We’ve got enough problems we need to solve today. Why would we want to invest in something for a problem that will come in the future?
Cavan: Well, take it as a general principle that it may cost you millions and millions of dollars to rebuild a system, and it may take you 10 years to rebuild it. Then you have a situation where you say, I have data in this batch system over here that has a lot of information in it, and somebody like Ellis says, I can save millions of dollars if I can get the information out of that batch-oriented stovepipe, combine it with another system, like my customer addresses, and make new insights that I can sell. Those are the kinds of things you can’t predict and can’t afford to rebuild for later. So building in the generality to communicate these things, maybe even to subset the data and put it into another sandbox for analytics inexpensively, is something we are going to be constantly pushed to do, and I think we will do it more and more as we move to the cloud.
“From the get-go, we have to try to think of how we make systems communicate for requirements that we don’t have today. It’s going to be an ongoing challenge.”
Ellis: I think Cavan is exactly right. You can’t rest on old data sources in an environment like ours, where some of the old data just tells us what we already know and predicts what we probably could have predicted already, especially as hard copy mail declines. You need new data sources, and you have to think about new ways to use data to create new revenue sources. What can you do with hard copy mail to make it sexier and more appealing, to help bend that curve of decline we have been experiencing because of the disruptive technology that Big Data has helped to create?
“You can’t rest on old data sources in an environment like ours, where some of the old data just tells us what we already know.”
So we have to think about things like augmented reality and putting QR codes on mail. You have to think of new technology and embed it into mail; you can make mail more interesting, more marketable and more commercially viable. But that also creates the need for bigger, better and more predictive data for marketing, to be able to make the right decisions about where to invest, which markets to target, what information you provision to commercial mailers, and what information consumers want and how they want it provisioned to them.
Sanjog: So this is about what we can do and what will create value. The same is true of speed: we all want agility, and if we want agility, that means we don’t want to wait 10 years to reap the benefits of this. So how could you make this faster?
Cavan: For us, right now, we are looking at new data sources beyond the traditional mold of just doing a survey and giving you the answers. It may mean combining data sources in ways we haven’t before. Right now, the big issue before we can really move forward on a major scale is public acceptance and scientific acceptance. We are handcuffed by something Ellis isn’t: he has the ability to say, I can combine this data, I see new insight, I can map it to my customers and have a value-added proposition. We have two problems. First, do we have somebody as bright as Ellis who can connect those dots and see those new products? And second, since we are a source of official statistics for the nation, does the nation accept those methods? We have to work through both of those processes. It would be much simpler if we could just say, we are going to combine this data, and here is the new estimate for the journey to work, how long it takes to get to work every day. We can’t do that without going through the proper channels.
“Right now, the big issue for us before we can really move forward on a major scale is public acceptance and scientific acceptance.”
Sanjog: What changes would you like to see in compliance, privacy and regulatory issues? Or are these becoming a scapegoat for agencies that have been slow to adopt?
Ellis: It’s complicated, because we rely heavily on our brand, which is a secure brand; traditionally it’s been what’s called the sanctity of the mail. We protect what’s in your mail, and we protect who is sending it and who is receiving it. That’s been a large part of our brand, and it’s legally required. That’s a differentiator for us in the market space: people can trust us as we move into the digital space. As we move into Big Data, we want to carry that trusted brand, so that your data, your payment card information, your address and the contents of the message, whether digital or hard copy, are protected. That’s a brand value we want to protect, but at the same time it inhibits us from a marketing perspective; we can’t leverage a lot of information that could benefit commerce itself.
“As we move into Big Data, we want to carry that trusted brand into your data.”
That said, I don’t think it’s really a barrier to doing more with Big Data. I don’t think we have to violate our current privacy laws to provide better information to commercial mailers and better opportunity for consumers and marketers as well. But it’s a grey area, and we have to work with the legal department, along with the public, to decide what the barriers are in this new Big Data space and how much opportunity we have to provide information we aren’t currently providing while we protect the privacy statutes we are required to maintain.
“What are the barriers, and how much opportunity do we have to provide information we aren’t currently providing while we protect the privacy statutes we are required to maintain?”
Sanjog: What is the need for a role such as a data scientist or Chief Data Officer within government? How acute is the shortage of talent based on demand? Has government created the right incentives and environment to attract that talent?
Cavan: As with all new technologies, we probably benefit by having the seasoned, older folks who have seen a lot of production systems and a lot of problems, along with the younger folks who come in and look at things differently. I think we can use both. In terms of a data scientist, I think you need to break that into three different parts. The first is the people who understand the business well enough to see new opportunities by bringing data together and making new decisions. The second group is the analytical group, the statisticians who may be looking at statistics differently than we have in the past. When you went to school or grad school, you had a hypothesis, you ran a model and then you tested it. Today, we may say, let’s put machine algorithms to work, come up with a million different algorithms, let the machine select among them and then let the analyst pick which ones make sense.
So it may be turning the traditional way of doing analytics on its head. And finally, there are the people who really understand data management. In these systems, it’s not just making decisions, and it’s not just bringing data together and making connections. It’s also: how do the data flow, how do you manage them, how do you have the technology so that you know when it’s real time and when it’s near time, and how do you integrate traditional relational enterprise databases with these new NoSQL databases for analytics?
Sanjog: Is it worth bringing in someone like a Chief Data Officer to work shoulder to shoulder with you, someone who would handle the appropriate information while your group takes care of the plumbing?
Ellis: Currently we are working on a new organization, a data management organization, that will handle most of that work. But I think there is always going to be a strong relationship between how we architect data and how we manage data in terms of data quality. I’m not quite sure what that structure might look like organizationally going forward, but in the short term, it’s going to be managed within our IT infrastructure: the data management, along with the system architecture and system management that link to Big Data. So right now it’s a collaborative relationship.
“There is always going to be a strong relationship between how we architect data and then how we manage data in terms of data quality.”
Sanjog: Where are the gaps in the workforce, and how is that causing lag and keeping you from going at full steam?
Ellis: The biggest gap is really in gathering requirements from our business owners while trying to manage these legacy systems to provide data that will satisfy our business customers and our external customers as well. That’s where the gaps exist for us, and I think it’s a dual responsibility between data management and the IT infrastructure.
Sanjog: If you were to paint a picture of the Holy Grail for government in terms of Big Data, what would it look like?
Cavan: I think it’s the same as it is for business: we are doing things close to real time, and we are giving more information to people when they need it in a secure and confidential way. Those are pretty big requirements. We’re not even sure we all agree on what confidentiality is. LinkedIn, for example, looks at it differently, and Facebook looks at it differently than the Census Bureau does. How do we share data among agencies quickly, in a real-time way? How do we share data between different levels of government? State and local governments have a lot of great data out there; they collect a lot of information, sales tax for example. That would be great if you actually want to find out how regional economies are doing. If you wanted to link that to the national statistical data, that would be great. Right now, we don’t have the protocols to do that; we don’t have the confidentiality protocols or the security protocols. So I see it taking a while before we actually integrate things, and part of that integration is going to be the accountability part: security, confidentiality and the things Ellis talked about.
“How do we share data among agencies quickly in a real time way? How do we share data between different levels of government?”
Sanjog: What is your appeal and message to leaders in other agencies trying to grapple with these challenges, put Big Data to use and achieve the label of smarter government?
Ellis: The real advice would be to know your customer, because they are the ones you are trying to serve with this data. Knowing your customer and understanding their requirements are critical to building effective data systems. And also collaboration, not only within the government space but also with the private sector, which has new ideas on how to manage Big Data and a lot of success stories. That helps you avoid reinventing the wheel and spending a lot of money on investments that maybe you wouldn’t have to. I think those are probably the two key takeaways.