Earlier this year, we announced Artemis, Endgame's chat interface for expediting complex analysis, detection, and response within networks. Bots have been all the rage over the last few years, but they had yet to make a splash in security. On the one hand, tech trends come and go, and it is essential to avoid jumping on the latest fad and forcing the wrong solution onto a specific problem. At the same time, we could see the value in leveraging natural language processing in a conversational interface to reimagine search and discovery. And so we started on our journey into security bots. In this first of two posts, we'll walk through our research problem and use case, the lessons learned as we dug deep into bot research, and the challenges of implementing a paradigm change in the way analysts and operators are accustomed to interacting with data. I mean, we're talking about a glorified chatbot, right? How hard can this be? Surely it wouldn't require design discussions with engineering, product, data science, front end, UX, and technical writers...
Our Challenge
As data scientists working in security, we are well aware of the ever-growing data challenges, and of the complaints of analysts forced to find creative ways to work around the clunkiness of their current toolset. Their workflow is manually intensive, and any automation requires scripting or programming knowledge. They needed a way to automate some of the rote processes while integrating analytics, such as anomaly detection, that otherwise take too long to be operationally impactful. Often, when analysts are given tools meant to help in this area, the tools either don't fit the workflow or require learning some kind of proprietary language. In some cases, the tools actually make life harder, not easier, for the analysts.
Even if the automation challenge is overcome, most security teams still lack the resources to keep pace with the growing challenges of the data and threat environments. This lack of resources spans three distinct areas: time, money, and personnel. It is no secret that, on average, the time to discovery and detection of an attack lags significantly behind the time it takes for an attacker to achieve their objective. Consider the OPM breach, where the attackers had access to US personnel data for over a year before discovery. Unfortunately, most security teams lack the personnel and funding to shorten this gap.
Finally, while other tech industries have embraced and prioritized user experience for their broad and diverse user bases, the security industry continues to build tools for itself. These tools generally demand a level of expertise that simply exceeds the talent pipeline, which in turn deters many from entering the field.
It is against this backdrop that we started to explore a solution that addresses these three core challenges security teams encounter:
Insufficient resources
Lack of automated tools
Usability of security platforms
Why Build A Bot?
We explored different ways to augment our users’ ability to tackle these three challenges by focusing on enhancing their ability to discover and remediate alerts. This remains a core pain point, with alert fatigue and false positives impeding their workflow. We wanted a solution that provided analysts an interface to ask questions of their data, automate collection, and bring back only the relevant information from an endpoint. When exploring our solution, we had two primary users to consider: Tier 1 and Tier 3 analysts.
As the graphic above illustrates, even though these individuals work together as a part of a security team, their workflows could not be more different. But maybe, if done correctly, we could augment Tier 1 analysts while providing Tier 3s a non-obtrusive (and even helpful!) interface for their day-to-day. Our solution: An intelligent assistant or a chatbot.
A bot allowed us to build a service that leverages natural language processing to determine user intentions and help automate workflows. By mimicking conversation, we could provide users an interface that let them be as expressive or as terse as they wanted while inquiring about everything from alerts to endpoint data.
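As a deliberately toy sketch of what "determining user intentions" means in practice, the snippet below scores a message against a handful of intents by keyword overlap. The intent names and keyword sets are hypothetical, ours for illustration only; Artemis's actual pipeline, covered in the next post, uses a proper NLP model rather than keyword matching.

```python
# Toy intent recognition: pick the intent whose keyword set best
# overlaps the user's message. Intents and keywords are hypothetical.
from typing import Optional

INTENT_KEYWORDS = {
    "search_process": {"find", "search", "process", "processes"},
    "list_alerts": {"show", "list", "alert", "alerts"},
    "pull_memory": {"pull", "dump", "memory", "pid"},
}

def classify_intent(message: str) -> Optional[str]:
    """Return the best-overlapping intent, or None if nothing matches."""
    tokens = set(message.lower().split())
    best = max(INTENT_KEYWORDS, key=lambda i: len(tokens & INTENT_KEYWORDS[i]))
    return best if tokens & INTENT_KEYWORDS[best] else None

print(classify_intent("find processes with the name svchost.exe"))  # search_process
```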
Rule #1 in Bot Design: Manage Expectations
When we pitched the idea to our team, we were mindful of tossing around words like Artificial Intelligence. After all, we weren't building JARVIS from Iron Man, and at no point was Scarlett Johansson going to start talking to SOC analysts about the latest network threats. Moreover, we emphasized that this feature would not be the abrasive hell that was MS Clippy. Instead, we kept it simple. We posited that a simple conversational interface could encourage users to dig deep into endpoint data by eliminating the need for query syntax or complex click-through user interfaces. It cannot be overstated that managing expectations was paramount to getting the project off the ground. Once we were given the go-ahead, a bigger question arose… "How are we going to build this thing?"
Challenges with Open Source Bots
The rise of bots and intelligent assistants is largely due to an increase in Bot Development Kits, or BDKs. Companies like Wit.ai (acquired by Facebook), API.ai (acquired by Google), and Chatfuel provide a simple UI for developing closed-domain, rule-based, goal-oriented bots, requiring no real programming skills to successfully launch a bot. In addition, Microsoft and Amazon have released their own dev kits that let users quickly integrate an app into their platforms. These kits are treasure troves of knowledge on how best to build a bot to suit your needs. We would have loved to use any one of them, but we were trying to provide a bot within our EDR platform to interact with complex endpoint data, using a diverse, industry-specific vocabulary that none of these frameworks supported.
For instance, large companies have generally focused on building general assistant bots and deploying them on third-party platforms like Slack or Facebook Messenger. Our assistant had to be structured to enable users in a SOC to perform their duties safely and efficiently on our platform, which made using a third-party bot kit service impossible. The specificity of the security domain meant that we often weren't simply translating a form's buttons into a bot. Instead, the entities (an NLP term we'll dive into in the next post) we extract are often semistructured at best, and the intents range from "find bad things" to "pull memory from this running process with PID 4231".
After reviewing the current state of the art, it was clear we had our work cut out for us. Where and how would we acquire the dialog data necessary to build our bot? How could we account for and properly capture the diversity in vocabulary across the information security domain? How could we build a tool that helps train Tier 1 analysts while preventing Tier 3 analysts from being shackled by rigid, scripted routines? It was on us to figure out what makes a bot tick and then implement it ourselves.
Designing Artemis
The engine for Artemis needed to fulfill certain design goals. For starters, bot architectures can be rule-based or driven by machine learning, and the choice depends primarily on the amount of training data at your disposal. The goal is to produce a task-oriented dialog tree similar to the one pictured below. Each branch of the tree represents a question the user must answer to progress to the next node or state.
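Conceptually, such a tree can be represented as a small state machine. Below is a minimal sketch in Python; the node names and questions are hypothetical placeholders, not Artemis's actual states.

```python
# A minimal task-oriented dialog tree. Internal nodes ask a question and
# branch on the answer; leaves name a fully specified task to execute.
DIALOG_TREE = {
    "root": {
        "question": "What would you like to do? (search/collect)",
        "branches": {"search": "choose_object", "collect": "collect_artifact"},
    },
    "choose_object": {
        "question": "Search processes or files?",
        "branches": {"processes": "search_process", "files": "search_file"},
    },
    # Leaves have no branches; reaching one completes the dialog.
    "search_process": {},
    "search_file": {},
    "collect_artifact": {},
}

def walk(tree, node="root", ask=input):
    """Descend the tree until a leaf (an executable task) is reached."""
    while tree[node].get("branches"):
        answer = ask(tree[node]["question"] + " ").strip().lower()
        node = tree[node]["branches"].get(answer, node)  # unknown answer: re-ask
    return node
```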
To move toward this functionality, the bot needs to handle missing values for a given intent. For example, a user might want to search for a process by name or PID to determine whether a specific piece of unwanted software is running on multiple endpoints. The user might write "find processes with the name svchost.exe on all endpoints" or "search processes on all endpoints". In the second example, the intent classifier can classify "search processes" as a 'search_process' intent and then note that the intent carries an ordered list of required and optional values. When Artemis is faced with a known intent but lacks needed values, it simply asks the user to enter the missing value, as sketched below. Simple, but it works.
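A minimal version of that missing-value flow might look like the following, assuming a hypothetical 'search_process' intent with an ordered slot list; the slot names and prompt wording are illustrative, not Artemis's actual schema.

```python
# Slot filling: prompt, in order, for each required value the NLP layer
# failed to extract from the user's utterance.
REQUIRED_SLOTS = {
    "search_process": ["process_name", "endpoints"],
}

PROMPTS = {
    "process_name": "Which process name or PID should I search for?",
    "endpoints": "Which endpoints should I search? (e.g. 'all endpoints')",
}

def fill_slots(intent, extracted, ask=input):
    """Return the slot dict with every required value present."""
    for slot in REQUIRED_SLOTS[intent]:
        if slot not in extracted:
            extracted[slot] = ask(PROMPTS[slot] + " ").strip()
    return extracted

# "search processes on all endpoints": intent known, process name missing,
# so the bot asks one follow-up question before executing the search.
query = fill_slots("search_process", {"endpoints": "all endpoints"})
```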
In addition to handling missing entities, some intents require multiple values in a single user input, such as username, process, and endpoint. Some values a user enters can be recognized by a pattern-matching grammar, but some value types are hard to tell apart from one another. The best we can hope for here is disambiguation. Within a certain context, "Admin" is likely a username, but it may also be a file name. In this case, Artemis proposes one interpretation based on context and offers the user the opportunity to change it to file name.
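The sketch below illustrates the idea: regular-expression grammars catch unambiguous types (a PID is purely numeric; a file name carries a known extension), while ambiguous tokens get a context-based best guess the user can override. The patterns, type names, and confidence values are illustrative assumptions, not Artemis's actual grammar.

```python
# Entity typing with disambiguation: unambiguous patterns first,
# then a crude context check for tokens like "Admin".
import re

PATTERNS = [
    (re.compile(r"^\d+$"), "pid"),                             # digits only
    (re.compile(r"^\S+\.(exe|dll|sys)$", re.I), "file_name"),  # known extension
]

def interpret(token, context):
    """Return a (value_type, confidence) guess for an extracted token."""
    for pattern, value_type in PATTERNS:
        if pattern.match(token):
            return value_type, 1.0  # grammar match: unambiguous
    # Ambiguous token: propose the type the surrounding context makes most
    # likely, and let the UI offer the alternative for the user to correct.
    if "user" in context.lower():
        return "user_name", 0.6
    return "file_name", 0.5

print(interpret("Admin", context="show logins for user Admin"))  # ('user_name', 0.6)
```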
Another finer point of implementing dialog systems, explored in detail in this Microsoft Research paper, is implicit confirmation. Once the user has entered the search query string, they shouldn't be forced to tell Artemis to continue to endpoint selection. Instead, Artemis must implicitly confirm the query string and move the user on to endpoint selection and execution of the operation. Although this may seem obvious, early draft designs of Artemis left the user in limbo: after entering one entity, the bot would reply that it "got it" but then wouldn't move on to the next step.
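A sketch of the fix: fold the acknowledgement into the prompt for the next slot, so the dialog never stalls on a standalone "got it". The state and field names below are hypothetical.

```python
# Implicit confirmation: advance the dialog state immediately and echo
# the captured value inside the next question.
def on_query_string_entered(session: dict, query_string: str) -> str:
    session["query_string"] = query_string
    session["state"] = "select_endpoints"  # advance; no dead-end "got it" state
    return f'Searching for "{query_string}". Which endpoints should I run this on?'
```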
This is why we developed Artemis in such close collaboration with both practitioners and our user experience team, to ensure we accounted for as many of these 'gotchas' as possible. We drew on the wide range of subject matter expertise at Endgame (incident responders, hunters, designers, and malware analysts) to build custom workflows and recommendations for what to do next during an investigation.
Next Steps
Having explored the state of the art in bots, properly scoped our requirements, and integrated subject matter and design expertise, we were ready to finally throw some data science at it and make Artemis a reality. But before we could provide anything useful, we first had to teach Artemis to understand what the end user typed in the window. No. Easy. Task. In our next post, we'll get into the technical details of our natural language processing (NLP) approach, including the basic tenets of NLP that are relevant to bots and the necessity of coordinating with user experience designers to truly address our users' key pain points. Combining user-centric design with our natural language understanding pipeline proved to be the essential factor in delivering cutting-edge data science capabilities within our user-friendly platform.