Getting Value from Your Data Series: It’s All About Your Data Ecosystem
October 12, 2015 at 8:41 am Mary Ludloff
By Marilyn Craig and Mary Ludloff
We’re back with the fourth post in our series on how to get value from your data, including how to ensure that new “data” and “analytics” products are designed for successful delivery to new and existing customers.
In the previous posts in this series, we discussed our methodology and what is required in terms of understanding your target customer—who they are and what they need—as well as making sure you have the right Team in place to work on the project. In this post, we are going to discuss how you build your Data Ecosystem:
- What is needed to ensure that data processes will support the new product(s)?
- How do you identify appropriate data partners and enhancements?
- What privacy- and security-related issues must you be aware of and address?
Do You Have the Data You Need?
The first question to consider about your data is: Do you have what you need? In other words, what data is required to deliver the analytics, algorithms, and/or models that your Customers will find valuable? Let’s look back at the examples we laid out earlier in the series for Acme Corporation, a manufacturer of “smart” (aka IP-enabled) parking meters. In each case, the customer segments will benefit from the treasure trove of operational data Acme wants to monetize, but those analytics will be even more valuable when they are enhanced with new sets of data from a public source or partners.
- The UPS store owner wants to correlate his current marketing programs to take better advantage of the shopping traffic near his store. He could build behavioral clustering models to help characterize his customers’ shopping patterns: what does parking behavior say about the other errands they are running when they visit his store? He could build co-operative marketing programs with other stores to help create new demand. He could build predictive models that let him know when traffic is slumping, which signals that new demand generation is needed. These analytics could potentially be sold to the national UPS franchise office as well.
- A restaurant owner in the area wants to offer location-based mobile coupons that strictly follow the peak (or non-peak) parking hours. Typical weather patterns could be correlated with the natural flow of traffic through the area, and staffing hours could be refined to better match this predictive model, saving her from over-staffing on slow, bad-weather days. She could also monitor parking and freeway traffic and decide in real time that it is worth it to stay open an extra hour that night.
- The City wants to use meter data to notify citizens of a health concern. Say there is evidence of a salmonella outbreak from a street vendor in the area. City health officials could use individual PII data from parking meters to contact those who were in the area and encourage them to seek medical attention. If real-time weather information were mashed up with other data, the city could predict (based on high temperatures or other indicators) when a food-driven health concern might occur and then work with food operators to limit or prevent it.
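To make the UPS store owner's "traffic is slumping" signal concrete, here is a minimal sketch in Python. It is not Acme's method: the function name, the seven-day window, and the 80% threshold are all assumptions chosen for illustration, and a real predictive model would account for seasonality, weather, and holidays.

```python
from statistics import mean

def find_slump_days(daily_counts, window=7, threshold=0.8):
    """Flag days where recent parking traffic has dropped well below normal.

    daily_counts: number of parking transactions per day, oldest first.
    A day is flagged when the average of the trailing `window` days is less
    than `threshold` times the average of all days before that window.
    Both parameters are illustrative assumptions, not tuned values.
    """
    slump_days = []
    # Start once there are `window` days of history before the trailing window.
    for end in range(2 * window, len(daily_counts) + 1):
        recent = mean(daily_counts[end - window:end])
        baseline = mean(daily_counts[:end - window])
        if recent < threshold * baseline:
            slump_days.append(end - 1)  # index of the last day in the window
    return slump_days

# Two steady weeks followed by a weak week (made-up numbers).
counts = [100] * 14 + [60] * 7
print(find_slump_days(counts))
```

The same comparison could be run per meter or per block, which is where the metadata discussed later in this post comes in.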
To successfully deliver analytics products to meet those customer requirements, Acme needs to build a Data Ecosystem that creates and supports not only their own internal data processes, but also those data assets that will come from public or partner sources. The Data Ecosystem must:
- Meet the data requirements of the analytics product(s).
- Support and protect the data processes and metadata.
- Deliver the right data quality, frequency and timing.
- Provide the level of privacy and data security demanded by the Customer, partners, other sources of data, as well as regulatory requirements.
The first step to evaluating a Data Ecosystem, or building a new one, is to think through a “Day in the Life” of your data sources. For Acme, that means thinking about how consumers using the parking meters interact with them as well as how the transactional data is created and then passed through the collection process.
You need to ensure that the data processes are scalable and capable of sustaining a product that will regularly be sold to the target customer segment(s). Vulnerabilities in those processes can impact your ability to consistently create or obtain the needed data. If new data will be coming from partners, consider the “ownership” issues (legal, political, organizational) of using that data in the new product(s) and their impact on the data creation and communication processes. Remember, you are creating an analytics product; this is not a one-off, ad hoc project. The dynamic process of creating new data and analytics for your data product is as important as a reliable supply chain is to a manufacturer of physical goods. The supply chain for your data can’t just suddenly fail, or you won’t have a data product to sell.
In addition, Acme needs to think about the metadata needed for the analytics their customers are looking for: How is the metadata created and maintained? Metadata can also drive aggregations and groupings. What ongoing processes are needed to ensure that metadata creation is maintained?
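As a small illustration of metadata driving aggregations and groupings, the sketch below rolls up parking minutes by any attribute stored in a meter metadata table. The meter IDs, attribute names, and the `aggregate_minutes` helper are hypothetical; the source does not describe Acme's actual schema.

```python
from collections import defaultdict

# Hypothetical metadata, keyed by meter ID. In practice this table needs an
# owner and an update process, or the groupings below silently go stale.
METER_METADATA = {
    "M1": {"zone": "downtown", "street": "Main St"},
    "M2": {"zone": "downtown", "street": "Oak Ave"},
    "M3": {"zone": "waterfront", "street": "Pier Rd"},
}

def aggregate_minutes(transactions, metadata, group_by):
    """Sum parked minutes per value of the metadata attribute `group_by`.

    transactions: iterable of (meter_id, minutes) pairs.
    Meters missing from the metadata are grouped under "UNKNOWN" so that
    gaps in metadata maintenance are visible rather than dropped.
    """
    totals = defaultdict(int)
    for meter_id, minutes in transactions:
        attributes = metadata.get(meter_id)
        key = attributes[group_by] if attributes else "UNKNOWN"
        totals[key] += minutes
    return dict(totals)

transactions = [("M1", 30), ("M2", 60), ("M3", 45), ("M9", 15)]
print(aggregate_minutes(transactions, METER_METADATA, "zone"))
```

A growing "UNKNOWN" bucket is one simple signal that the metadata process is not keeping up with newly installed meters.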
A small aside on “dirty data”: all data is dirty. You need to understand the quality of the data coming in (how dirty is it?) and evaluate it constantly, because assessing quality is an iterative process, and you must be ready to recover from any process breakdowns that affect that quality. It is not impossible to build analytics product(s) from imperfect data, but companies like Acme need to understand the data processes intimately and evaluate them regularly to ensure that their definition of data quality is met.
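One way to keep an eye on how dirty the incoming data is would be a recurring check like the sketch below. The record fields (`meter_id`, `start`, `end`, `amount`) and the rules themselves are assumptions for illustration; each company has to write down its own definition of data quality.

```python
from datetime import datetime

def check_record(record):
    """Return a list of problems found in one parking transaction record."""
    problems = []
    if not record.get("meter_id"):
        problems.append("missing meter_id")
    try:
        start = datetime.fromisoformat(record["start"])
        end = datetime.fromisoformat(record["end"])
        if end <= start:
            problems.append("end not after start")
    except (KeyError, TypeError, ValueError):
        problems.append("bad or missing timestamp")
    amount = record.get("amount")
    if not isinstance(amount, (int, float)) or amount < 0:
        problems.append("invalid amount")
    return problems

def dirty_rate(records):
    """Fraction of records with at least one problem (0.0 for no records)."""
    if not records:
        return 0.0
    dirty = sum(1 for record in records if check_record(record))
    return dirty / len(records)

batch = [
    {"meter_id": "M1", "start": "2015-10-12T08:00:00",
     "end": "2015-10-12T09:00:00", "amount": 2.5},
    {"meter_id": "M2", "start": "2015-10-12T10:00:00",
     "end": "2015-10-12T09:30:00", "amount": 1.0},
    {"meter_id": "", "start": "2015-10-12T11:00:00",
     "end": "2015-10-12T12:00:00", "amount": -1},
]
print(dirty_rate(batch))
```

Tracking this rate per batch over time makes a process breakdown show up as a jump rather than as a surprise in a customer's report.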
Privacy and Security: You Must Assess, Identify, and Mitigate
All data projects must be evaluated for privacy risks at the beginning and throughout the development lifecycle. Privacy Impact Assessments, or PIAs, should be standard operating procedure for any data project:
“A privacy impact assessment (PIA) is a tool for identifying and assessing privacy risks throughout the development life cycle of a program or system.”
While in theory a PIA should be performed whenever personally identifiable information (PII) is collected and used, you should always consider privacy risks any time that data is a product (or project) regardless of whether it meets the definition of PII. In addition to assessing the privacy and security risks to PII, you should also consider the following:
- Do any of the data sets you are using include de-identified personal information? Any time you deal with a de-identified data set, you risk re-identification and you should assess the risks and figure out what you can do to mitigate them.
- Could the aggregation of the data sets you are working with result in the identification of individuals? Any time you aggregate data sets be aware that you might also be providing enough information, on aggregate, to identify individuals. Again, assess the risks and figure out what to do to mitigate them.
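A rough way to assess the re-identification risk described in the two points above is to count how many records share each combination of seemingly harmless attributes (a k-anonymity style check). The sketch below is an illustration only: the column names and sample rows are invented, and passing such a check does not by itself prove a data set is safe.

```python
from collections import Counter

def smallest_group_size(rows, quasi_identifiers):
    """Size of the smallest group of rows sharing the same values for the
    given columns. A result of 1 means at least one row is unique."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values()) if groups else 0

def risky_rows(rows, quasi_identifiers, k):
    """Rows whose combination of values is shared by fewer than k rows."""
    def key(row):
        return tuple(row[q] for q in quasi_identifiers)
    groups = Counter(key(row) for row in rows)
    return [row for row in rows if groups[key(row)] < k]

# Invented, de-identified rows: no names, yet combinations can still be unique.
rows = [
    {"zone": "downtown", "hour": 9, "card": "visa"},
    {"zone": "downtown", "hour": 9, "card": "amex"},
    {"zone": "downtown", "hour": 22, "card": "visa"},
    {"zone": "downtown", "hour": 22, "card": "amex"},
]
print(smallest_group_size(rows, ["zone"]))
print(smallest_group_size(rows, ["zone", "hour"]))
print(smallest_group_size(rows, ["zone", "hour", "card"]))
```

Each added column shrinks the groups, which is the aggregation risk in miniature: every data set joined in can turn a crowd into an individual.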
It is safe to say that when you are working with data, you will encounter privacy issues, so understand your exposure, assess and identify privacy-related risks, and then figure out how to remediate those risks:
- Determine the project’s scope. How much personal information is handled? Is the information sensitive? Will the information be handled internally or outsourced? Is the information new or is it existing information? Will it be aggregated in databases? How will it be used? How will it be stored?
- Map the information flows. Describe and map the personal information flows: What information is collected, used, and disclosed? How will it be held and protected? Who has access to it? What is the process for ensuring its quality?
- Identify and assess privacy-related risks. What is the impact on the individual as a result of how PII is handled? How do the risks affect the project’s goals? Does the project have an effect on the individual’s choices about who has access to their personal information? Is there compliance with privacy law? Is there compliance with the organization’s privacy policies and privacy notice?
- Provide a remediation plan. How can you reduce or eliminate the risks identified in the previous step? What changes need to be made in terms of policies, procedures, or features to reduce or eliminate the risks?
Now, let’s revisit what data Acme is collecting. As we stated in the first part of our series: Acme collects a high volume of data from the parking transactions that each of its meters produces. A majority of the parking meters are also credit card-enabled, which means that PII is certainly available through those transactions.
Let’s say that Acme was able to obtain anonymized credit card data for those transactions in order to aggregate that data set with other parking meter data they have as well as other data sets they purchased. One of the risks you would need to assess is the possibility that those credit card customers could be identified once that data is aggregated with parking meter location and time data, as studies have shown that people can be identified through their credit card use. In this case, your remediation plan might include preventing your data customers from getting direct access to that data, so that they cannot identify individuals, as well as updating Acme’s privacy policy to prohibit re-identification of anonymized data for any purpose.
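One possible way to keep data customers away from transaction-level records is to publish only aggregate counts and to withhold any cell that covers too few transactions. The sketch below assumes hypothetical `zone` and `hour` fields and an arbitrary minimum cell size of 10. It illustrates the idea of small-cell suppression and is not a complete privacy control on its own.

```python
from collections import Counter

def publishable_counts(transactions, min_cell_size=10):
    """Count transactions per (zone, hour) cell and drop small cells.

    Cells with fewer than `min_cell_size` transactions are withheld, since a
    handful of transactions at a known place and time can point to a person.
    The threshold is an assumption; choose it as part of your PIA.
    """
    counts = Counter((t["zone"], t["hour"]) for t in transactions)
    return {cell: n for cell, n in counts.items() if n >= min_cell_size}

# Made-up example: a busy morning cell and a nearly empty late-night cell.
transactions = (
    [{"zone": "downtown", "hour": 9} for _ in range(12)]
    + [{"zone": "waterfront", "hour": 23} for _ in range(2)]
)
print(publishable_counts(transactions))
```

Only the downtown morning cell survives here; the two late-night waterfront transactions are withheld rather than exposed.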
This is just one example of how to look at the collection and use of data from a privacy perspective. Of course, within that assessment process, you must also consider the security of that data and ensure that your remediation plan addresses how to prevent customers from getting direct access as well as how you are going to administer, monitor, and enforce compliance with your privacy policy changes.
Now that you’ve built your data ecosystem, it’s time to start looking at the technology Acme will need to successfully build their new analytics products. Stay tuned for the next post where we will look at options and tradeoffs when picking your analytics platform.