
Approaches for Analytics and Data Science

Chapter Description

In this sample chapter from Data Analytics for IT Networks: Developing Innovative Use Cases, you will explore model building and deployment, analytics methodology and approach, the distinction between the use case and the solution, and logical models for data science and data.

Analytics Methodology and Approach

How you approach an analytics problem is one of the factors that determine how successful your solution will be in solving the problem. For analytics problems, you can use two broad approaches, or methodologies, to get to insightful solutions. Depending on your background, you likely have some predetermined bias toward one way of approaching problems. The ultimate goal is to convert data to value for your company. You get to that value by finding insights that solve technical or business problems. The two broad approaches, shown in Figure 2-2, are the "explore the data" approach and the "solve the business problem" approach.


Figure 2-2 Two Approaches to Developing Analytics Solutions

These are the two main approaches that I use, and there is a large body of literature describing more granular, systematic methodologies that support some variation of each of them. Most analytics literature guides you to the problem-centric approach. If you know your data well but are not sure how to use it to solve problems, you may find yourself starting in the statistically centered exploratory data analysis (EDA) space that is most closely associated with statistician John Tukey. This approach often yields quick wins along the way, because the data rollups and visualizations used to explore the data frequently reveal statistical value on their own.

Most domain data experts tend to start with EDA because it helps them understand the data and get the quick wins that let them throw a bone to the stakeholders while digging into the more time-consuming part of the analysis. Your stakeholders often have hypotheses (and some biases) related to the data. Early findings from this side often sound like "You can see that issue X is highly correlated with condition Y in the environment; therefore, you should address condition Y to reduce the number of times you see issue X." Most of my early successes in developing tools and applications for Cisco Advanced Services were absolutely data first and based on statistical findings instead of analytics models. There were no heavy algorithms involved, there was no machine learning, and there was no real data science. Sometimes, statistics are just as effective at telling interesting stories.

Figure 2-3 compares these two processes side by side. There is no right or wrong side on which to start; depending on your analysis goals, either direction is valid. Note that this model includes data acquisition; data transport; data storage, sharing, or streaming; and secure access to that data. All of these must be considered if the model is to be implemented on a production data flow (or "operationalized"). The simpler model that combines only data and data science (refer to Figure 2-1) still applies for exploring a static data set, or a stream that you can play back and analyze using offline tools.


Figure 2-3 Exploratory Data Versus Problem Approach Comparison

Common Approach Walkthrough

While many believe that analytics is done only by math PhDs and statisticians, general analysts and industry subject matter experts (SMEs) now commonly use software to explore, predict, and preempt business and technical problems in their areas of expertise. You and other "citizen data scientists" can use a variety of software packages available today to find interesting insights and build useful models. Once you recognize that both approaches are valid, you can start from either side. The important thing to understand is that many of the people you work with may be starting at the other end of the spectrum, and you need to be aware of this as you start sharing your insights with a wider audience. Then, when either audience asks, "What problem does this solve for us?" you can present relevant findings.

Let's begin on the data side. During model building, you skip over the transport, store, and secure phases as you grab a batch of useful data, based on your assumptions, and test a hypothesis about it. Perhaps through some grouping and clustering of your trouble ticket data, you have seen excessive issues on network routers running a specific software version. In this case, you can build an analysis to test the hypothesis that the problems are indeed related to the software version running on the suspect routers. In the data-first approach, you still need to decide which problems you want to solve, but you let the data, together with your knowledge of the environment, guide you toward what is possible.
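The grouping step that first points to a suspect software version can be sketched in a few lines. This minimal example assumes a hypothetical list of trouble tickets tagged with router and software version, plus a hypothetical count of routers per version from inventory; it uses simple grouping and normalization, not clustering.

```python
from collections import Counter

# Hypothetical trouble tickets: (router hostname, software version at ticket time).
tickets = [
    ("rtr1", "15.2"), ("rtr2", "15.2"), ("rtr1", "15.2"),
    ("rtr3", "16.9"), ("rtr4", "15.2"), ("rtr5", "17.3"),
]
# Hypothetical inventory: number of routers running each version.
routers_per_version = {"15.2": 3, "16.9": 4, "17.3": 5}

# Count tickets per version.
tickets_per_version = Counter(version for _, version in tickets)

# Normalize by installed base so a widely deployed version is not flagged
# simply because more routers run it.
ticket_rate = {
    version: tickets_per_version.get(version, 0) / count
    for version, count in routers_per_version.items()
}

for version, rate in sorted(ticket_rate.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{version}: {rate:.2f} tickets per router")
```

Normalizing by the number of routers per version is a design choice worth keeping: raw ticket counts tend to rank versions by popularity rather than by problem rate.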

What do you need in this suspect routers example? Obviously, you must get data about the network routers from the time they showed the issue, as well as data about routers of the same type that have not had the issue. You need both types of information in order to find the underlying factors that may or may not have contributed to the issue you are researching. Finding these factors is a form of inference: you want to draw conclusions about all of your routers by comparing a set of devices that exhibit the issue with a set of devices that do not. You will later use the same analytics model for prediction.
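One common way to compare devices with and without an issue is a 2x2 contingency table tested with a chi-square statistic. The sketch below is only one possible technique, not necessarily the one the author used; the router counts are hypothetical, and 3.841 is the standard chi-square critical value for 1 degree of freedom at a 0.05 significance level.

```python
def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square statistic (no continuity correction) for the table:

                     issue   no issue
    suspect version    a        b
    other versions     c        d
    """
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("every row and column total must be nonzero")
    return n * (a * d - b * c) ** 2 / denominator


# Hypothetical router counts.
a, b = 30, 70    # suspect version: 30 routers with the issue, 70 without
c, d = 10, 190   # all other versions: 10 with the issue, 190 without

chi2 = chi_square_2x2(a, b, c, d)
CRITICAL_0_05_DF1 = 3.841  # chi-square critical value, df = 1, alpha = 0.05

print(f"issue rate on suspect version: {a / (a + b):.1%}")
print(f"issue rate on other versions:  {c / (c + d):.1%}")
print(f"chi-square = {chi2:.1f}, significant at 0.05: {chi2 > CRITICAL_0_05_DF1}")
```

A significant result supports an association between the software version and the issue; other factors that differ between the two groups (hardware model, role, traffic load) can still confound it.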

You can often skip the "production data" acquisition and transport parts of the model building phase. Although in this case you already have a data set to analyze, you should still consider how to automate data acquisition, how to transport the data, and where it will live if you plan to put your model into a fully automated production state so that it can notify you of devices in the network that meet these criteria. On the other hand, full production state is not always necessary. Sometimes you can just grab a batch of data and analyze it on your own machine to find insights; this is valid and common. Sometimes you can collect enough data about a problem to solve it, gaining insight without having to implement a full production system.

Starting at the other end of this spectrum, a common analyst approach is to start with a known problem and figure out what data is required to solve it. Here you often need to search for things you did not know to look for. Consider this example: Perhaps you have customers with service-level agreements (SLAs), and you find that you are giving them discounts because they are having voice issues over the network and you are not meeting the SLAs. This is costing your company money. You research what you need to analyze in order to understand why this happens, perhaps using voice drop and latency data from your environment. When you finally get these data, you build a proposed model showing that higher latency, combined with specific software versions on network routers, is common on devices in the network path of customers who are asking for refunds. You then deploy the model to flag these "SLA suckers" in your production systems and validate its effectiveness by confirming that the SLA issues go away. In this case, deploy means that your model watches your daily inventory data and looks for devices that match the parameters you found to be problematic. What may have been a very complex model has a simple deployment.
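To show how simple such a deployment can be, here is a minimal sketch of a daily check against inventory data. The `Device` record, `flag_devices` function, suspect version set, and latency threshold are all hypothetical placeholders standing in for whatever parameters your model building actually produced.

```python
from dataclasses import dataclass


@dataclass
class Device:
    hostname: str
    software_version: str
    avg_latency_ms: float


# Hypothetical parameters found to be problematic during model building.
SUSPECT_VERSIONS = {"15.2"}
LATENCY_THRESHOLD_MS = 150.0


def flag_devices(inventory: list[Device]) -> list[str]:
    """Return hostnames of devices that match the problematic profile."""
    return [
        d.hostname
        for d in inventory
        if d.software_version in SUSPECT_VERSIONS
        and d.avg_latency_ms > LATENCY_THRESHOLD_MS
    ]


# Hypothetical daily inventory snapshot.
daily_inventory = [
    Device("edge-01", "15.2", 210.0),  # suspect version and high latency
    Device("edge-02", "16.9", 220.0),  # high latency, but a different version
    Device("core-01", "15.2", 40.0),   # suspect version, but latency is fine
]
print(flag_devices(daily_inventory))
```

In a production setting, a scheduler would run this check against each new inventory export and route the flagged hostnames to an alerting or ticketing system.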

Whether starting at data or at a business problem, ultimately solving the problem represents the value to your company and to you as an analyst. Both of these approaches follow many of the same steps on the analytics journey, but they often use different terminology. They are both about turning data into value, regardless of starting point, direction, or approach. Figure 2-4 provides a more detailed perspective that illustrates that these two approaches can work in the same environment on the same data and the very same problem statement. Simply put, all of the work and due diligence needs to be done to have a fully operational (with models built, tested, and deployed), end-to-end use case that provides real, continuous value.


Figure 2-4 Detailed Comparison of Data Versus Problem Approaches

There is a wide variety of detailed approaches and frameworks available in industry today, such as CRISP-DM (cross-industry standard process for data mining) and SEMMA (Sample, Explore, Modify, Model, and Assess), and they all generally follow these same principles. Pick something that fits your style and roll with it. Regardless of your approach, the primary goal is to create useful solutions in your problem space by combining the data you have with data science techniques to develop use cases that bring insights to the forefront.

Distinction Between the Use Case and the Solution

Let's slow down a bit and clarify a few terms. A use case is simply a description of a problem that you solve by combining data and data science and applying analytics. The underlying algorithms and models make up the actual analytics solution. In the case of Amazon, for example, the use case is getting you to spend more money. Amazon does this by showing you what other people bought along with the same item that you are purchasing. The intuition behind this is that you will buy more things because other people like you needed those things when they purchased the same item that you did. The model is there to uncover that and remind you that you may also need to purchase those other things. Very helpful, right?

From the exploratory data approach, Amazon might want to do something with the data it has about what people are buying online. It can then identify frequently occurring sets of items that are purchased together. Then, for purchases that are close to one of these patterns but missing just a few items, Amazon may assume that those people just "forgot" to purchase something they needed, because everyone else purchased the entire "item set" found in the data. Amazon might then use a software implementation to find the people who "forgot" and remind them that they might need the other common items. Finally, Amazon can validate the effectiveness by tracking purchases of items that the model suggested.
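The "forgotten item" idea can be sketched with a brute-force frequent itemset count. This is an illustrative sketch only, not Amazon's actual method: the baskets, the support threshold `MIN_SUPPORT`, the fixed itemset size, and the `forgotten_items` helper are all hypothetical, and algorithms such as Apriori or FP-Growth are the usual way to prune this search on real data.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase history: each basket is a set of item names.
baskets = [
    {"camera", "memory card", "camera bag"},
    {"camera", "memory card", "camera bag", "tripod"},
    {"camera", "memory card", "camera bag"},
    {"laptop", "mouse"},
]
MIN_SUPPORT = 3   # hypothetical: an itemset must appear in at least 3 baskets
ITEMSET_SIZE = 3  # only look at 3-item sets to keep the sketch small

# Count every 3-item combination contained in each basket.
counts = Counter()
for basket in baskets:
    for itemset in combinations(sorted(basket), ITEMSET_SIZE):
        counts[itemset] += 1

frequent_itemsets = [set(s) for s, n in counts.items() if n >= MIN_SUPPORT]


def forgotten_items(basket: set[str]) -> set[str]:
    """Items that would complete a frequent itemset missing exactly one item."""
    suggestions = set()
    for itemset in frequent_itemsets:
        missing = itemset - basket
        if len(missing) == 1:
            suggestions |= missing
    return suggestions


print(forgotten_items({"camera", "memory card"}))
```

Here the only frequent 3-item set is camera, memory card, and camera bag, so a basket holding just the first two yields a reminder for the camera bag.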

From a business problem approach, Amazon might start with the goal of increasing sales, and it might assume (or find research suggesting) that, if reminded, people often purchase companion items that commonly go with what they are currently viewing or have in their shopping carts. In order to implement this, Amazon might collect buying pattern data to determine these companion items. The company might then suggest that people may also want to purchase these items. Amazon can then validate the effectiveness by tracking purchases of suggested items.
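From the problem side, the companion-item step can be sketched as ranking items by how often they were bought together with the item being viewed. This is a hypothetical illustration rather than Amazon's implementation; the order history and the `companion_items` helper are made up for the example.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical order history.
orders = [
    {"phone", "case", "charger"},
    {"phone", "case"},
    {"phone", "screen protector", "case"},
    {"charger", "cable"},
]

# For each item, count how often every other item appears in the same order.
co_counts = defaultdict(Counter)
for order in orders:
    for x, y in combinations(order, 2):
        co_counts[x][y] += 1
        co_counts[y][x] += 1


def companion_items(item: str, cart: set[str], k: int = 2) -> list[str]:
    """Top-k items most often co-purchased with `item`, excluding the cart."""
    # Sort by descending count, then by name, so ties are ordered deterministically.
    ranked = sorted(co_counts.get(item, Counter()).items(), key=lambda kv: (-kv[1], kv[0]))
    return [other for other, _ in ranked if other not in cart][:k]


print(companion_items("phone", cart={"phone"}))
```

Raw co-occurrence counts favor popular items; normalized measures such as confidence or lift are common refinements when that bias matters.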

Do you see how both of these approaches reach the same final solution?

The Amazon case is about increasing sales of items. In predictive analytics, the use case may be about predicting home values or car values. Stated more generally, the use case may be the ability to predict a continuous number from historical numbers. No matter the use case, you can view analytics as simply the application of data and data science to the problem domain. You can choose how to approach finding and building the solutions, either by using the data as a guide or by dissecting the stated problem.

