This chapter examines a simple methodology and approach for developing analytics solutions. When I first started analyzing networking data, I used many spreadsheets, and I had a lot of data access, but I did not have a good methodology to approach the problems. You can only sort, filter, pivot, and script so much when working with a single data set in a spreadsheet. You can spend hours, days, or weeks diving into the data, slicing and dicing, pivoting this way and that…only to find that the best you can do is show the biggest and the smallest data points. You end up with no real insights. When you share your findings with glassy-eyed managers, the rows and columns of data are a lot more interesting to you than they are to them. I have learned through experience that you need more.
Analytics solutions look at data to uncover stories about what is happening now or what will be happening in the future. In order to be effective in a data science role, you must step up your storytelling game. You can show the same results in different ways—sometimes many different ways—and to be successful, you must get the audience to see what you are seeing. As you will learn in Chapter 5, “Mental Models and Cognitive Bias,” people have biases that impact how they receive your results, and you need to find a way to make your results relevant to each of them—or at least make your results relevant to the stakeholders who matter.
You have two tasks here. First, you need to find a way to make your findings interesting to nontechnical people. You can make data more interesting to nontechnical people with statistics, top-n reporting, visualization, and a good storyline. I always call this the “BI/BA of analytics,” or the simple descriptive analytics. Business intelligence (BI)/business analytics (BA) dashboards are a useful form of data presentation, but they typically rely on the viewer to find insight. This has value and is useful to some extent but generally tops out at cool visualizations that I call “Sesame Street analytics.”
If you are from my era, you grew up with the Sesame Street PBS show, which had a segment that taught children to recognize differences in images and had the musical tagline “One of these things is not like the others.” Visualizations with anomalies identified in contrasting colors immediately help the audience see how “one of these things is not like the others,” and you do not need a story if you have shown this properly. People look at your visualization or infographic and just see it.
Your second task is to make the data interesting to the technical people, your new data science friends, your peers. You do this with models and analytics, and your visualization and storytelling must be at a completely new level. If you present “Sesame Street analytics” to a technical audience, you are likely to hear “That’s just visualization; I want to know why it is an outlier.” You need to do more—with real algorithms and analytics—to impress this audience. This chapter starts your journey toward impressing both audiences.
Model Building and Model Deployment
As mentioned in Chapter 1, “Getting Started with Analytics,” when it comes to analytics models, people often overlook a very important distinction between developing or building a model and implementing or deploying it. The ability of your model to be usable outside your own computer is a critical success factor, and you need to know how to both build and deploy your analytics use cases. It is often the case that you build models centrally and then deploy them at the edge of a network or at many edges of corporate or service provider networks. Where do you think the speech recognition models on your mobile phone were built? Where are they ultimately deployed? If your model is going to have impact in your organization, you need to develop workflows that use your model to benefit the business in some tangible way.
Many models are developed or built from batches of test data, perhaps with data from a lab or a big data cluster, built on users’ machines or inside an analytics package of data science algorithms. This data is readily available, cleaned, and standardized, and it has no missing values. Experienced data science people can easily run through a bunch of algorithms to visualize and analyze the data in different ways to glean new and interesting findings. With this captive data, you can sometimes run through hundreds of algorithms with different parameters, treating your model like a black box and viewing only the results. Sometimes you get very cool-looking results that are relevant. In the eyes of management or people who do not understand the challenges in data science, such development activity looks like the simple layout in Figure 2-1, where data is simply combined with data science to develop a solution. Say hello to your nontechnical audience. This is not a disparaging remark; some people—maybe even most people—prefer to just get to the point, and nothing gets to the point better than results. These people do not care about the details that you needed to learn in order to provide solutions at this level of simplicity.
Figure 2-1 Simplified View of Data Science
Once you find a model, you bring in more data to further test and validate that the model’s findings are useful. You need to prove beyond any reasonable doubt that the model you have on your laptop shows value. Fantastic. Then what? How can you bring all the data from across your company to your computer so that you can run it through the model you built?
At some point in the process, you will deploy your analytics to a production system, with real data, meaning that an automated system is set up to run new data, in batches or streaming, against your new model. This often involves working with a development team, whose members may or may not be experts in analytics. In some cases, you do not need to deploy into production at all because the insight is learned, and no further understanding is required. In either case, you then need to use your model against new batches of data to extend the value beyond the data you originally used to build and test it.
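To make the build-versus-deploy distinction concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the "model" is a deliberately trivial threshold learned from a clean batch of numbers, and the function names (build_model, save_model, load_model, score_batch), the JSON file format, and the sample values are all hypothetical choices made for this example, not a method prescribed by this book. The point is the shape of the workflow: the model is built once from captive data, saved as an artifact, and then loaded elsewhere to run against new batches.

```python
import json
import os
import statistics
import tempfile


def build_model(training_values, k=3.0):
    """Build step: learn a simple threshold from a cleaned batch of data.

    The 'model' is just the training mean, the sample standard deviation,
    and a multiplier k (an assumed default of 3.0 for illustration).
    """
    return {
        "mean": statistics.mean(training_values),
        "stdev": statistics.stdev(training_values),
        "k": k,
    }


def save_model(model, path):
    """Persist the model so it can leave the machine where it was built."""
    with open(path, "w") as f:
        json.dump(model, f)


def load_model(path):
    """Deploy side: load the saved model wherever new data arrives."""
    with open(path) as f:
        return json.load(f)


def score_batch(model, new_values):
    """Deploy step: flag each new value that lies more than k standard
    deviations from the mean learned at build time."""
    limit = model["k"] * model["stdev"]
    return [abs(v - model["mean"]) > limit for v in new_values]


if __name__ == "__main__":
    # Build centrally, from a small made-up batch of clean data.
    model = build_model([10, 12, 11, 13, 9, 11, 10, 12])
    path = os.path.join(tempfile.mkdtemp(), "model.json")
    save_model(model, path)

    # Deploy: a separate process would load the artifact and score new data.
    deployed = load_model(path)
    print(score_batch(deployed, [11, 30, 8]))
```

In a real deployment, the scoring step would typically be wrapped in whatever automation feeds it new batches or streams, and the model itself would be far richer than a mean and a standard deviation. Keeping the build code and the scoring code separate, with a saved artifact between them, is what lets the same model run in places other than the laptop where it was developed.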
Because I am often the one with models on my computer, and I have learned how to deploy those models as part of useful applications, I share my experiences in turning models into useful tools in later chapters of this book, as we go through actual use cases.