It goes without saying that organizations could benefit from embedding real-time analytics in their workflows and decision making, but how are we currently approaching building or buying this capability? Innovative platforms such as in-memory computing could be effective enablers of real-time analytics, but have these platforms been successful? If so, what would be the tenets of such a platform for it to create real value within the organization, and how far are we from adopting it? And since we wouldn't want something so powerful to be seen as a privileged resource, would it make sense to offer it as a service?
When it comes to areas such as cyber security, fraud protection or any other analytics-driven endeavor, making split-second decisions is a necessity to stay competitive or even to mitigate damage. Thus the concept of real-time analytics, in which decision makers are notified instantly, has gained a lot of traction as new technology has made it feasible.
However, many organizations struggle with accumulating the data they need in order to make that decision in real time. Many records, recursive variables and transaction histories can complicate the data and the ability to make a decision, and multiple data threads and transactions being processed at once can produce "contention". So while new technology has solved the velocity problem of Big Data, it falls to practitioners to summarize that information into a single transaction profile.
It's important, then, for IT to help enable this greater and easier insight. From an IT perspective, the big part of driving this is making sure the right architecture for these systems is understood. Essential here is a very strong knowledge of the options for very low-latency access to data, and an intimate understanding of those systems, to help make the right decisions.
In-Memory Computing can help. What used to take strenuous lab effort 15 to 20 years ago can now be completed in seconds. IMC will also be critical in cases where models must be "self-learning" because no historical data is available, and it will be essential in helping organizations understand the business value of the data they're collecting, why they're making a decision and the importance of responding rapidly to a customer inquiry. Though not all organizations fully recognize the competitive advantage just yet, the application of real-time analytics via IMC is going to make a big difference in how customers feel they're being accommodated and how well their needs are being addressed.
Sanjog Aul: Welcome listeners, this is Sanjog Aul, your host, and the topic for today's conversation is "Enabling Real-Time Analytics." Joining us today is Scott Zoldi. Scott is the Vice President of Analytic Science for FICO. Hello Scott, thank you for joining us. Now as part of our series on In-Memory Computing, we have talked extensively about its potential and whether or not it's ready for the mainstream, and yet one of the most interesting use cases involves something that goes beyond just technology or architecture. And that's enabling real-time analytics. So IMC or not, this is something a lot of organizations would like to tackle, and today we want to discuss how they might approach that and what immediate impact doing so could have. So Scott, the first question for you is: where have organizations struggled so far to attain genuinely real-time analytics?
Scott Zoldi: Generally, organizations taking on real-time analytics attack it as a technology problem. One of the things they've had trouble with in the past is whether or not they can accumulate the data they need to make that real-time decision in an efficient fashion. As an example, FICO developed a set of models called fraud management models, and in those models we leverage something called transaction profiles, where data is efficiently summarized and condensed into small records containing recursive variables, essentially condensed transaction histories. This is a product that's been protecting cardholders in the US for two decades, and it leverages this concept of a small, summarized transaction data record. Those companies that don't do that, that try to have real-time, on-demand access to the entire transaction history to make a decision, struggle with the ability to decide in real time because of the data access requirements.
“Companies like FICO concentrate only on what we call Streaming Analytics. That’s really this concept of summarizing data very efficiently in terms of a single transaction profile that allows us to get access to the entire history of transactions in a very summarized fashion.”
So when we look at real time analytics, companies like FICO concentrate only on what we call Streaming Analytics. That’s really this concept of summarizing data very efficiently in terms of a single transaction profile that allows us to get access to the entire history of transactions in a very summarized fashion, and that enables real-time analytics. And those that don’t use tricks like that typically do struggle.
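To make the idea of a "transaction profile" with recursive variables concrete, here is a minimal, hypothetical sketch in Python. The decayed average below is one illustrative example of a recursive variable: each update depends only on the previous summary and the current transaction, so scoring never has to re-read the raw history. This is not FICO's actual model; the class, field names and the `alpha` decay weight are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class TransactionProfile:
    """A small, fixed-size summary of a cardholder's entire history.

    Hypothetical illustration of a 'streaming analytics' profile:
    the whole transaction history is folded into a few numbers.
    """
    decayed_avg_amount: float = 0.0
    txn_count: int = 0
    alpha: float = 0.1  # decay weight; an assumed tuning parameter

    def update(self, amount: float) -> None:
        # Recursive update: the new summary depends only on the old
        # summary and the current transaction, not on stored history.
        if self.txn_count == 0:
            self.decayed_avg_amount = amount
        else:
            self.decayed_avg_amount = (
                self.alpha * amount
                + (1 - self.alpha) * self.decayed_avg_amount
            )
        self.txn_count += 1

    def anomaly_ratio(self, amount: float) -> float:
        # A simple feature a real-time fraud score might consume.
        if self.decayed_avg_amount == 0:
            return 1.0
        return amount / self.decayed_avg_amount


profile = TransactionProfile()
for amt in [20.0, 25.0, 22.0]:       # typical small purchases
    profile.update(amt)
print(profile.anomaly_ratio(500.0))  # a $500 charge stands out
```

Because the profile is a handful of floats rather than a query over raw history, it can be fetched and updated within a millisecond-scale decision budget.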
Sanjog: When we look at the phrase “real time”, how close are we to actual real time, and how close to real time are the majority in getting insight from Big Data analytics and the like?
Scott: In terms of real time, for the fraud solutions out there in the market today that the banks leverage, a real-time response from an analytic model would be about 10 milliseconds. With the newer technologies around NoSQL databases, it's really possible to make sub-millisecond decisions. So that's roughly the time span.
So for example, here at FICO when we develop a fraud model, we're returning real-time decisions and scores in tenths of milliseconds. And when we look at areas such as cyber security, it's very important that those decisions get made in a timely fashion so that the people who need to make a decision or monitor what's happening at the edges of networks can be notified of abnormal events and then try to remedy those situations.
Sanjog: Now what have been the challenges in real time analytics implementation thus far, whether you see it within FICO or outside, and what’s the need to accelerate that process even more?
Scott: I think some of the challenges have really been around adoption of newer technology. Today there are things such as Storm and Spark and other in-memory databases that are available for making decisions in real time. Typically what happens is you have a situation where you need to decide the level of complexity of the scoring or the calculations that need to occur.
For example, we develop models that maintain a number of these summarized transaction profiles, and that leads to issues of what we call "contention", where multiple threads are all trying to process a large number of transactions in real time, all utilizing shared memory objects. That means there's contention for the pieces of information being summarized. Addressing it can be a tuning exercise: if you are leveraging something like a NoSQL database coupled with Spark or Storm, you need to look at having the right number of compute nodes and the right remedies in place. Alternatively, one moves away from some of the Big Data architectures toward in-memory fabrics that allow this data to be accessed completely in memory, without the NoSQL databases that may combine a cache with disk persistence.
But that's generally going to be the issue for the types of real-time analytics and real-time decisions that are relatively complex from an analytics perspective, because there's typically going to be a lot of shared data in these decisions and contention for data resources in the very short period of time in which the real-time decision needs to be made.
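One common way to reduce the kind of contention described above is lock striping: instead of a single global lock that every scoring thread fights over, the shared profiles are sharded across several independent locks, so only threads touching the same shard ever block each other. The sketch below is a simplified illustration under assumed names (`NUM_SHARDS`, `update_profile`), not a description of any particular vendor's architecture.

```python
import threading

# Hypothetical sketch: shard shared profile objects across N locks
# ("lock striping") so unrelated cardholders never contend.

NUM_SHARDS = 8
shard_locks = [threading.Lock() for _ in range(NUM_SHARDS)]
profiles = {}  # card_id -> summarized profile value

def update_profile(card_id: str, amount: float) -> None:
    # Pick the lock for this card's shard; threads working on
    # different shards proceed fully in parallel.
    lock = shard_locks[hash(card_id) % NUM_SHARDS]
    with lock:
        prev = profiles.get(card_id, 0.0)
        # Same recursive-summary idea as a transaction profile.
        profiles[card_id] = 0.9 * prev + 0.1 * amount

threads = [
    threading.Thread(target=update_profile, args=(f"card-{i}", 100.0))
    for i in range(32)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(profiles))  # all 32 profiles updated without one global lock
```

The same idea shows up in practice as partitioned state in Storm or Spark Streaming, or as key-based partitioning in an in-memory data grid: the partitioning keeps the shared-memory hot spots from serializing the whole pipeline.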
“When we look at areas such as cyber security, it’s very important that those decisions get made in a timely fashion so that people that need to make a decision or monitor what’s happening at the edges of networks can be notified of abnormal events and then try to remedy those situations.”
Sanjog: So we understand the value of real time analytics and perhaps it might be providing some incremental value, but do you think this can be a source of competitive advantage?
Scott: Absolutely. Real-time analytics is certainly a competitive advantage. In the fraud space, for example, you have an opportunity to stop a fraudulent payment-card transaction before it goes through. In a cyber application, you have an opportunity to see that a breach is occurring, or that command-and-control messages are passing between your organization and, let's say, a botnet, and to stop that activity before it's too late, versus looking at an analysis later and learning that your data has already left the organization.
“The more that we can bring those decisions, offers or notifications right to a customer or protect our defenses from a fraud or cyber security threat at the right time, that’s the competitive advantage.”
There are also many use cases where we want to make a real-time marketing offer to customers walking through a store based on their most recent spending activity. The more we can bring those decisions, offers or notifications right to a customer, or protect our defenses from a fraud or cyber security threat at the right time, that's the competitive advantage. If we don't, then we run into situations where we see fraud being committed and have to deal with it after it's already occurred, or situations such as breaches where data has been stolen from an organization and the opportunity to stop that from happening was missed because there weren't real-time analytics that could detect the event.
Sanjog: Now to that end, how can IT help? How can IT help drive this advantage?
Scott: From an IT perspective, the big part of driving this is making sure we have the right architecture for these systems understood. As an example, take contention for certain memory associated with an analytic model. That might be an exercise requiring very specialized knowledge of how to leverage, let's say, Storm, or it might be a case where we say, "We are not going to look at distributed computing; we need an architecture that involves a shared memory fabric, because we have pieces of information or calculations that must have very low latency." So from an IT perspective, what is really essential is a very strong knowledge of the options, whether that's very low-latency access to data or more complicated NoSQL databases with efficient memory caches coupled with a disk persistence layer, and an intimate understanding of those systems so the right decisions can be made around data-access latency in a real-time scoring application. That's probably where most applications struggle. But there are lots of options out there, and it comes down to getting sufficient experience with the pros and cons of the different types of technologies.
"When we look at our daily lives and all the data that we produce, an analytics system that can make a real-time decision and provide the right sort of relevant feedback to a user is going to be more and more expected by anyone who uses PCs and mobile devices."
Sanjog: Is the financial industry the one most likely to see benefits from this, or will everyone be able to benefit?
Scott: Well, certainly the financial industry has a head start with things like credit card fraud. But other industries will certainly benefit. One of the most interesting things I've been involved with in the last year has been the concept of cyber analytics, potentially bringing a different breed of analytics to look for cyber security issues, and this is one that applies to everyone and every company out there in terms of protection from cyber threats. In those situations today, a lot of the signature-based methods in cyber security are failing, and this is where more advanced analytics, similar to what we do in the fraud space, are really going to take hold and be a differentiator in protecting companies, and that affects every one of us.
In addition to that, the same technology we're talking about here is key to the whole problem of velocity. We talk about the V's of data, velocity being the speed at which data is sent through systems. When we look at our daily lives and all the data we produce, an analytics system that can make a real-time decision and provide the right sort of relevant feedback to a user is going to be more and more expected by anyone who uses PCs and mobile devices. So I think we're going to see it take hold very significantly, and I think a lot of newer technologies around self-learning analytics and similar topics will really become part of the next decade of analytic advancements in real-time decision making.
Sanjog: How far are most organizations from adopting real-time analytics into their organization, and are these adoption challenges due to any infrastructure shortcomings or other issues? If so, would you think something like In-Memory Computing would be a viable fix?
Scott: For many organizations, I think there's a mind shift that has to occur. The fraud area, for example, was a natural fit for real-time analytics. When you swipe your credit card, an authorization has to occur, and it has to occur in a short period; we don't want to be sitting in front of a payment device waiting to purchase our groceries.
"There's still work to be done within organizations around making sure they understand the business value… It's going to be a differentiator in terms of how customers feel they're being accommodated and how well their needs are being addressed."
In other areas, I think there's still work to be done within organizations around making sure they understand the business value of some of these decisions: "Why do I need to collect this data? Why do I have to return a decision or send a tweet? Why would I want to respond so rapidly to that customer inquiry?" So I think part of it is making sure people understand that this is going to be a differentiator for their business, a differentiator in terms of how customers feel they're being accommodated and how well their needs are being addressed.
From an IT perspective then and from a data collection perspective, there are challenges in terms of making sure that the frequency at which data is received by organizations is sped up. Things like an hourly batch are not going to be sufficient if you need a real-time decision. So there are some of those infrastructure changes that have to occur. I think those businesses that do that first will obviously be the innovators in the space and will have more competitive projects and offerings, and I think that will be part of the transformational seed. But one of the first steps is to get organizations and businesses to think about how they can use that as a differentiator.
Sanjog: So, would you think that In-Memory, this new paradigm of computing, could perhaps be a savior or provide that assistance in making this a reality?

Scott: Absolutely I do. With a lot of the open source we have today in terms of In-Memory Computing and Big Data infrastructures, it becomes much, much easier to very quickly bring up proofs of concept of how these systems would work, and I think that's going to be one of the big drivers. Things that used to be challenges, like working in a high-performance computing lab maybe 15 or 20 years ago, are now pretty routine in terms of the tools that we have from a Big Data perspective.
"Things that used to be challenges, like working in a high-performance computing lab maybe 15 or 20 years ago, are now pretty routine in terms of the tools that we have from a Big Data perspective."
And that's great, because that ease of implementing and trying these things out from an in-memory and real-time analytics perspective is really what's going to drive innovation at this stage. Things that were very difficult before Big Data and some of the open-source tools are now relatively easy, and that's going to open a huge landscape for innovation and new creative solutions using in-memory computing and real-time analytics.
Sanjog: Now, taking to the next level, would you think that Analytics as a Service is something you can foresee becoming a viable option in the future?
Scott: Absolutely. There's a lot of focus on Analytics as a Service, and there's historical precedent. Falcon continues to be one of the oldest and most successful real-time analytics solutions. These are systems you would install on a customer's premises, but now we're bringing more and more of this real-time analytics onto cloud infrastructures to enable real-time analytics as a service.
So, I think that's a natural next step. It's in fact something FICO is actively looking at: taking some of the same technologies we've developed in Falcon and in our cyber security solution, bringing them onto our own analytic cloud and enabling them as real-time analytics services. So we should see more and more of that, and I think that's another way we're going to see innovation in this space, because it further removes restrictions from people trying out the technologies and running proofs of concept around using real-time analytics to change their businesses.
Sanjog: So, when you look at the types of provider or vendor solutions that are available, what specific conflicts or challenges are surfacing? How should an organization go about identifying or determining whether they're a good candidate for such a solution?
Scott: The different solutions out there have some level of pedigree in terms of the analytic techniques, and one has to weigh the pros and cons of using them. For example, there are a number of companies and services that will allow you to access data persisted in-memory. But even if I have access to the transaction history in-memory, it still may not be fast enough for the decisions I need to make.
"Things that were very difficult before Big Data and some of the open-source tools are now relatively easy, and that's going to open a huge landscape for innovation and new creative solutions using in-memory computing and real-time analytics."
But I think the key to differentiation, as one looks at the technologies and vendors out there, is to clearly define the problem we want to solve from a real-time analytics perspective. If it's an analytic score, then there's a lot more one needs to do with respect to a vendor's pedigree of streaming analytics experience and effective use cases.
Without that, and without looking at the reputation and credibility of the vendor, it's going to be more difficult than one might expect. Projects get deemed failures, but they really aren't; they wouldn't have failed if different technology decisions had been made, or if different vendors had been chosen with the pedigree to apply the right streaming analytics techniques. That's probably the biggest piece: finding the right vendor that has that experience, that has solutions in production today and that has real-time analytics applications that fit or mirror your use cases, along with successes in the real-time analytics space.
Sanjog: How, so far, has real-time analytics through the use of In-Memory Computing been manifested, and what advice would you have for someone looking to adopt something similar?
Scott: Today, the biggest use cases have been around addressing the velocity piece of Big Data. These are really interesting areas because there are situations where the data you see in real time may or may not resemble historical data. You may not have the outcomes you'd need to develop a traditional model, or you may decide the data is simply too non-stationary. That's a whole class of problems where in-memory and real-time analytics are essential, because the model needs to self-learn: we're not going to have time to pore over loads of historical data to build the static models or rule sets that would be used in these production environments. Instead, it's a class of problems where the analytics are learning relationships in real time because the data is shifting so rapidly.
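The self-learning idea described above can be sketched with a simple online estimator. The example below uses Welford's algorithm for a running mean and variance, scoring each event against what has been learned so far and then updating, so the model adapts as data streams by with no offline training pass. This is a minimal illustration of online learning in general, not FICO's actual method; the class and threshold are hypothetical.

```python
import math

class OnlineAnomalyDetector:
    """Hedged sketch of a self-learning model: a running mean and
    variance (Welford's algorithm) updated one event at a time, so
    there is no offline pass over historical data."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations

    def score(self, x: float) -> float:
        # Score first, against what has been learned so far...
        if self.n < 2:
            z = 0.0
        else:
            std = math.sqrt(self.m2 / (self.n - 1))
            z = abs(x - self.mean) / std if std > 0 else 0.0
        # ...then learn from the event, so the model tracks drift.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return z

detector = OnlineAnomalyDetector()
for x in [10.0, 11.0, 9.0, 10.5, 9.5]:  # ordinary traffic
    detector.score(x)
print(detector.score(50.0) > 5.0)  # prints True: the outlier stands out
```

Because state is a few floats per entity, this kind of estimator fits naturally in an in-memory profile and can score and update within a real-time decision budget.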
I think the best use cases for that today are the cyber security challenges we have with respect to command-and-control activity and malware. Given the very advanced adversaries we face in those spaces, in terms of their technical prowess and the amount of change we see there, those are going to be the best sorts of use cases for adoption: scenarios where you may not have historical data to build these applications, or where you may need the application to learn in production as data streams by.