Is This the Beginning of the End of “Duel”-track Foreign Policy?
by Andrea Little Limbago
The Iranian nuclear negotiations occupy a persistent spot in the foreign policy news cycle. The Associated Press recently reported that Iran has agreed to a list of nuclear concessions. Although still improbable, the likelihood of even minimal collaboration between the United States and Iran appears greater now than in recent memory. Unless, of course, you happened to stumble upon the revelations of Operation Cleaver, which has been largely ignored by all but the tech media outlets. The report highlights an alleged widespread Iranian cyber campaign targeted at critical infrastructure in about a dozen countries, including the United States. Just as we’re seeing the first glimpses of potential US and Iran cooperation in the nuclear realm, the opposite is happening in cyberspace. This uncomfortable reality highlights the modern age of diplomacy, wherein diplomacy in the physical world and in the virtual world is completely orthogonal.
Congressman Mike Rogers, Chairman of the House Intelligence Committee, is one of the few policymakers who has actually noted this potential relationship between policy in the physical and virtual worlds, stating that if the nuclear negotiations fail, Iran could resume cyber activity. Unfortunately, as Operation Cleaver highlights, Iranian cyber activity targeting physical infrastructure has likely been escalating, not de-escalating, over the last two years. Operation Cleaver is perhaps the timeliest example of the Janus-faced nature of foreign policy, which has persisted for well over a decade and is not unique to Iranian-US relations. Take the recent APEC meeting, for example, where the US and China brokered a deal to counter climate change. This occurred within weeks of an FBI warning of a widespread Chinese cyber campaign targeting both the US private sector and government agencies, and within days of the announcement of a September breach at the US National Weather Service. This, too, has been linked to China. Similarly, the US and Russia continued the START nuclear arms-reduction negotiations earlier this year just as cyber-attacks escalated, some of which were targeted at US federal agencies. Of course, both states have actually increased their deployed nuclear forces since this past March, but nevertheless the two countries are still on track for additional negotiations in 2015. It’s not unusual for states to pursue divergent relationships across distinct areas of foreign policy. Cooperation vacillates between the various arenas, but rarely does it take on the dueling nature we see occurring between the physical and virtual worlds.
The Director of National Intelligence, James Clapper, and many, many other leaders have described the modern era as containing an unprecedented array of diverse and dynamic threats. This brings new challenges, of course, but perhaps one of the most striking challenges remains largely unspoken. Foreign policy in the modern era has thus far differentiated relationships in the physical and virtual worlds. Will this remain a distinct, modern foreign policy challenge? With the continued trend of the private sector surfacing foreign nation-state cyber campaigns, it seems 2014 may mark the beginning of the end of dueling foreign policies. The ongoing series of revelations of alleged foreign states and their affiliates targeting the US public and private sectors (e.g. China’s PLA Unit 61398 and Axiom group, Russian association with the JP Morgan breaches, North Korea with Sony, and now Operation Cleaver) is likely indicative of the future “outing” of cyber behavior by the private sector. In the future, the US is likely to leverage disclosures made by the private sector, which in turn provides the government the luxury of concealing or revealing its own information, and can even assist negotiations across the diplomatic spectrum.
This period of disparate US policies in the physical and virtual worlds will be increasingly difficult to juggle in light of publicized revelations of cyber campaigns conducted against US federal agencies and corporations. At some point, public opinion will reach a tipping point and demand a more coordinated response and defense against cyber campaigns by foreign states. It will be increasingly difficult to maintain a two-track foreign policy as new revelations occur. That tipping point may still be in the distant future, as the US public remains largely unaware of many of these campaigns because they are not broadly publicized. In fact, many of these foreign-sponsored cyber campaigns – especially if targeted against federal agencies – remain publicized only by tech-focused media outlets. We’ll spend some time examining this particular trend in more detail in a future post.
Blurred Lines: Dispelling the False Dichotomy between National & Corporate Security
by Andrea Little Limbago and Cody Pierce
Several US government agencies have experienced targeted cyber attacks over the last few months. Many believe China is responsible for cyber attacks on the Office of Personnel Management, the US Postal Service and National Weather Service. Russia has been linked to many recent breaches including those on the White House and State Department unclassified networks. Given the national security implications of such breaches, these attacks should have monopolized the news cycle. However, they have barely registered a blip. Conversely, the data breaches at large companies such as Sony, Home Depot, Target and Neiman Marcus have dominated the news and have led many Americans to rank concern over hacking higher than any other criminal activity. But characterizing these events as solely private sector or public sector breaches oversimplifies the state of cyber security today. Many of the private sector intrusions are linked to Russia, China, Iran, and now even North Korea. While the Sony breach remains contested, a North Korean spokesman claimed it was part of the larger struggle against US imperialism. In fact, many of these private sector breaches have been directly linked to or are considered retaliation for various aspects of US foreign policy. Drawing a rigid line between public and private sector breaches is not only erroneous, but it also masks the reality of the complex cyber challenges the US faces.
From Unity Against a Common Threat to Disunity Against a Hydra
In the late 1980s and early 1990s, Japan was perceived as a greater threat to US security than the Soviet Union. The private sector was quite vocal during this time, providing evidence of dumping and unfair trade practices, while supporting voluntary export restraints and a series of other protectionist measures for the US domestic sector. While one can question the success of the policies (and assessment of the threat!), it is clear that a unified understanding of a common threat among private and public sectors greatly enhanced the efficiency with which the US was able to respond. It is this common understanding between the two groups that is still missing today.
Russia, China, Iran and many, many other groups have been escalating cyber attacks on the federal government and private sector for well over a decade. China has been wielding cyber attacks against federal agencies since at least 1999, when it targeted the National Park Service and the Departments of Energy and Interior. However, this is no longer a government-to-government problem, with the rise of non-state actors as both perpetrators (e.g. Syrian Electronic Army) and victims (e.g. multi-national corporations). Each kind of attack – regardless of state or non-state actor involvement – has both national security and economic implications. For instance, Target’s profits and reputation have taken a big hit following last year’s credit card breach. Home Depot faces similar economic risk over the loss of customers following its data breach. It’s too soon to tell exactly how much financial and reputational damage the breach at Sony will cause. These private sector breaches also have national security implications, especially when targeted at the financial sector and critical infrastructure, which is increasingly a target of cyber-attacks by foreign governments (e.g. Operation Cleaver). Despite these similarities in adversaries, there remains a stark disconnect in the portrayal and general contextualization of breaches in the private and public sectors.
Technical Similarities
These private and public sector breaches exhibit not only similar threat profiles, but also technical similarities. These attacks are indicative of the larger tactics, techniques and procedures (TTPs) of adversaries as they conduct reconnaissance and trust-building intrusions that lead to major attacks such as the Sony breach. In many cases, the initial access to the target systems was through third party contractors, both government and commercial, as well as through targeted spear phishing and watering hole attacks. In each case, the commonality is leveraging trust. From an attacker’s standpoint, every breach of trust enables more opportunity. Successful spear phishing campaigns gather enough information about their targets to properly craft the most effective message to entice a click. In the case of recent federal agency breaches, it is important to remember that adversaries conduct reconnaissance of networks prior to major attacks, often beginning with lower value targets before escalating to higher value ones. Every seemingly harmless intrusion must be viewed as a first step toward a larger attack, and not an end in and of itself. If an attacker compromises a government office, what information does that office have that could be used to further compromise both government and commercial organizations? Something seemingly innocuous, like email addresses of contractors, could be used to launch a new targeted operation. At some point, people make mistakes, and attackers thrive on mistakes. They have the benefit of time and information to make the best decision about how to expand their trusted access until critical systems and information have been infiltrated. In short, the TTPs – especially the exploitation of trust to conduct ever-greater intrusions – are very similar in private and public sector breaches.
Could More Convergence Lead to a Unified Response?
Last week, the Senate Banking Committee discussed cybersecurity in the financial sector, including the Cybersecurity Information Sharing Act. Clearly, this is an important step. However, absent from this discussion were some of the major stakeholders in the financial industry, further perpetuating the divide between the public and private sectors. Only when there is a common understanding of the threats and challenges of cyberspace can the two sides come together and provide more holistic and effective responses. The cyber attacks on federal agencies and the private sector must finally be elevated within popular discourse and be understood for what they are – reconnaissance and trust-building intrusions, increasingly by the same foreign adversaries. As news of another cyber attack on a federal agency or private company breaks, it would be far more helpful to place it in its larger context as a targeted, national security breach. A unified response by the US first requires a unified understanding of the threat. Absent a coherent and integrated understanding of the threat, attacks against banks, corporations and federal agencies will only continue to grow.
Understanding Crawl Data at Scale (Part 1)
by John Munro
A couple of years ago, in an effort to better understand technology trends, we initiated a project to identify typical web site characteristics for various geographic regions. We wanted to build a simple query-able interface that would allow our analysts to interact with crawl data and identify nonobvious trends in technology and web design choices. At first this may sound like a pretty typical analysis problem, but we have faced numerous challenges and gained some pretty interesting insights over the years.
Aside from the many challenges present in crawling the Internet and processing that data, at the end of the day, we end up with hundreds of millions of records, each with hundreds of features. Identifying “normal trends” over such a large feature set can be a daunting task. Traditional statistical methods break down at this point: they work well for one or two variables but become pretty useless once you hit more than 10. This is why we have chosen to use cluster analysis in our approach to the problem.
Machine learning algorithms, the Swiss army knife of a data scientist’s toolbox, break down into three broad categories: supervised learning, unsupervised learning, and reinforcement learning. Although mixed approaches are common, each of the three lends itself to different tasks. Supervised learning is great for classification problems where you have a lot of labeled training data and you want to identify appropriate labels for new data points. Unsupervised techniques help to determine the shape of your data, categorizing data points into groups by mathematical similarity. Reinforcement learning includes a set of behavioral models for agent-based decision-making in environments where the rewards (and penalties) are only given out on occasion (like candy!). Cluster analysis fits well within the realm of unsupervised learning but can take advantage of supervised learning (making it semi-supervised learning) in a lot of scenarios, too.
So what is cluster analysis and why do we care? Consider web sites and features of those sites. Some sites will be large, others small. Some will have lots of images; others will have lots of words. Some will have lots of outbound links, and others will have lots of internal links. Some web sites will use Angular; others will prefer React. If you look at each feature individually, you may find that the average web site has 11 pages, 4 images and 347 words. But what does that get you? Not a whole lot. Instead, let’s sit back and think about why some sites may have more images than others or choose one JavaScript library over another. Each website was built for a purpose, be it to disseminate news, create a community forum, or blog about food. The goals of the web site designer will often guide his or her design decisions. Cluster analysis applies #math to a wide range of features and attempts to cluster websites into groups that reflect similar design decisions.
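To make the #math a bit more concrete, here is a minimal sketch of what this kind of clustering can look like in Python with scikit-learn. The feature names and values are invented for illustration; they are not our actual crawl features or tooling.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Hypothetical per-site feature vectors:
# [pages, images, words, outbound links, internal links]
sites = np.array([
    [11, 4, 347, 12, 30],          # small brochure-style site
    [250, 900, 15000, 40, 2000],   # image-heavy community forum
    [40, 10, 52000, 800, 150],     # text- and link-heavy news site
    [9, 2, 210, 5, 22],
    [300, 1100, 18000, 55, 2400],
])

# Scale the features so that word counts don't dominate page counts
scaler = StandardScaler().fit(sites)
X = scaler.transform(sites)

# Group sites whose design decisions look mathematically similar
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)   # cluster assignment for each site
```

In practice the input matrix has millions of rows and hundreds of columns rather than five of each, but the shape of the workflow is the same.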
Once you have your groups, generated by #math, you’ve just made your life a whole lot simpler. A few minutes (or hours) ago you had potentially thousands or millions of items to compare across hundreds of fields. Now you’ve got tens of groups that you can compare in aggregate. Additionally, you now know what makes each group a group and how it distinguishes itself from one or more other groups. Instead of looking at each website or field individually, now you’re looking at everything holistically. Your job just got a whole lot easier!
Cluster analysis gives you some additional bonus wins. Now that you have normal groups of websites, you can identify outliers within the set - those that are substantially dissimilar from the bulk of their assigned group. You can also use these clusters as labels in a classifier and determine in which group of sites a new one fits best.
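Continuing the hypothetical scikit-learn sketch above, both of those bonus wins take only a few lines; the distance threshold here is arbitrary and purely illustrative.

```python
# Distance from each site to its nearest cluster center
distances = kmeans.transform(X).min(axis=1)

# Flag sites that sit unusually far from the bulk of their assigned group
threshold = distances.mean() + 2 * distances.std()
outliers = np.where(distances > threshold)[0]
print("outlier indices:", outliers)

# Assign a newly crawled site (same hypothetical features) to its best-fitting group
new_site = scaler.transform([[15, 6, 500, 20, 45]])
print("best-fitting cluster:", kmeans.predict(new_site)[0])
```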
In coming posts, we will go into more detail about how we cluster and visualize web crawl data. Stay tuned!
The Fog of (Cyber) War: The Attribution Problem and Jus ad Bellum
by Andrea Little Limbago
The Sony Pictures Classics film The Fog of War is a comprehensive and seemingly unfiltered examination of former Secretary of Defense Robert McNamara, highlighting the key lessons he learned during his time as a central figure in US national security from WWII through the Cold War. The documentary calls particular attention to jus ad bellum – the criteria for engaging in conflict. Over a decade later, Sony itself is now at the center of a national security debate. As the US government ponders a “proportional response” – a key tenet of Just War theory– in retribution for the Sony hack, and many in the security community continue to question the government’s attribution of the breach to North Korea, it is time to return to many of McNamara’s key lessons and consider how the difficulty of cyber attribution – and the prospect of misattribution – can only exacerbate the already tenuous decision-making process in international relations.
- Misperception: The misperception and miscalculation that stem from incomplete information are perhaps the most pervasive instigators across all forms of conflict. McNamara addresses this through the notion that “seeing and belief” are often wrong. Similarly, given the difficulty of positively attributing a cyber attack, victims and governments often resort to confirmation bias, selecting the circumstantial evidence which best confirms their beliefs. Cyber attacks aggravate the misguided role of incomplete information, leaving victims to formulate a response without fully knowing: 1) the financial and national security magnitude of the breach; 2) what the perpetrator will do with the information; 3) the perpetrator’s identity. Absent this information, a victim may respond disproportionately and target the wrong adversary in response.
- Empathize with your Enemy: McNamara’s lesson draws from Sun Tzu’s “know thy enemy” and describes the need to evaluate an adversary’s intent by seeing the situation through their eyes. Understanding the adversary and their incentives is an effective way to help identify the perpetrator, given the technical challenges with attribution. To oversimplify, code can be recycled from previous attacks, purchased through black markets for malware, and deliberately crafted to deflect investigations toward other actors. Moreover, states can outsource the attack to further redirect suspicions. A technical approach can limit the realm of potential actors responsible, such as to nation-states due to the scope and complexity of the malware. But it is even more beneficial to marry the technical approach with an understanding of adversarial intent to help gain greater certainty in attribution.
- Proportionality: Proportionality is a key component both of jus ad bellum, as well as jus in bello (criteria for behavior once in war). Given his role in the firebombing of Japan, McNamara somewhat surprisingly stresses the role of a proportional response. President Obama’s promise of a proportional response to the Sony breach draws specifically on this Just War mentality. But the attribution problem coupled with misperception and incomplete information make it exceedingly difficult to formulate a proportional response to a cyber attack. Clearly, a response would be more straightforward if there were a kinetic effect of a cyber attack, such as was recently revealed in the Turkey attack that occurred six years ago. But even this raises the question of what a proportional response looks like after so many years. It could similarly be years before the complete magnitude of the Sony breach is realized, or exactly what that ‘red line’ might be that would trigger a kinetic or non-kinetic response to a cyber attack.
- Rational choice: A key theory in international relations, rational choice theory assumes actors logically make decisions based on weighing potential costs and benefits of an action. While this continues to be debated, McNamara notes that with the advent of nuclear weapons, human error can lead to unprecedented destruction despite rational behavior. This is yet again magnified in the cyber domain, especially if misattribution leads to retaliation against the wrong adversary, or human error in a cyber response has unintended consequences. Rational choice decisions are only as good as the data at hand, and therefore seemingly “rational” decisions can inadvertently produce unintended outcomes due to limited data or misguided data interpretations. Moreover, similar to the nuclear era, human error can also lead to unprecedented destruction in the cyber domain. However, cyber retaliatory responses are not limited to a select few high-level officials, but rather the capabilities are much more dispersed across agencies and leadership levels, expanding the scope for potential human error.
- Data-driven Analyses: McNamara’s decision to bring in a team of quants to take a more innovative approach to national security analysis is a milestone in international relations. However, like all forms of analyses, quantitative and computational analyses must not be accepted at face value, but rather must be subjected to rigorous inspection of the data and methodologies employed to produce the findings. The last few weeks have seen a range of analyses used to either validate or add skepticism to the attribution of North Korea to the Sony breach. These clearly range significantly in the level of analytic rigor, but many are plagued by limited data, which produces analytic problems such as: 1) a small N, meaning results are not statistically significant and should be met with skepticism; 2) natural language processing analyses using models that are trained on different language structures and so do not travel well to coding languages; 3) selection bias wherein the sample of potential actors analyzed is not a representative sample; 4) poor data sampling, wherein analysis of different subsets of the data leads to differing conclusions. Because of these different analytic hurdles, various analyses point with seeming certainty to actors as diverse as North Korea, the Lizard Squad, Russia, Guardians of Peace, and an insider threat. Clearly, attributing the attack is a key goal of the analyses, but limited data makes it all too easy to simply confirm prior beliefs. Data-driven analyses provide solid footing when making claims, but the various forms of data gaps inherent in cyber make them much more vulnerable to misinterpretation.
Beyond a Cold War Framework: Each of these lessons highlights how the digital age amplifies the already complex and opaque circumstances surrounding jus ad bellum. As we begin another year, we are yet again reminded not only of the seemingly cyclical nature of history, but also of just how distinct the modern era is from its predecessors. It’s time for a framework that builds upon past knowledge while also adapting to the realities of the cyber domain. Too often, decision-making remains relegated to Cold War frameworks built around conventional warfare, mutually assured destruction, and a known adversary. It would be devastating if the complexity of the cyber domain led to misattribution and a response against the wrong adversary – and all of the unintended consequences that would entail. If nothing else, let’s hope the Sony breach serves as a wake-up call for a new policy framework rigorous enough to handle the fog of cyber war.
The Year Ahead in Cyber: Endgame Perspectives on 2015
From the first CEO of a major corporation resigning in the wake of a cyber attack, to NATO incorporating the cyber realm into Article 5, to the still fresh-in-our-minds Sony attack, 2014 was certainly a year to remember in cyber security. As we begin another year, here’s what some of us at Endgame predict, anticipate, or hope 2015 will bring for cyber:
Lyndon Brown, Enterprise Product Manager
In 2014, security teams were blind to most of the activity that happened within their networks and on their devices. While the majority of this activity was benign, security breaches and other malicious activity went unnoticed. These incidents often exposed corporate data and disrupted business operations.
2015 is the year that CISOs must decide that this reality is unsustainable. Motivated, in part, by high-profile breaches, security heads will adjust their strategy and manifest this shift in their 2015 budgets. On average, CISOs will increasingly fund threat detection and incident response initiatives. As the top security executive of a leading technology company pointedly stated, “we’ve finally accepted that any of our systems are or can be compromised”.
Since security budgeting is usually a zero-sum game, spending on preventive controls (such as anti-virus products) will stay stagnant or decline. As security buyers evaluate new products, they will prioritize solutions that leverage context and analysis to make advanced security judgments, and that see all security-relevant behavior – not just what is available in logs.
Rich Seymour, Senior Data Scientist (@rseymour)
The world of computer security will no doubt see some harrowing attacks this year, but I remain more hopeful than in years past. Burgeoning work in electronic communication—secure, encrypted, pseudo-anonymized and otherwise (like Pond, ssh-chat, bitmessage, DIME, etc)—won’t likely move into the mainstream in 2015, but it’s always neat to see which projects gain traction. The slowly paced rollout of sorely needed secure open voting systems will continue, which is awesome, and includes California’s SB360 allowing certification of open source voting systems, LA County’s work in revamping its election experience, Virginia’s online voter registration, and the OSET foundation’s work, just to name a few.
I hope that this year’s inevitable front-page security SNAFUs will lead more people to temper their early adoption with a measure of humorous cynicism. Far on the other side of the innovation adoption graph, let’s hope that those same security SNAFUs lead the behemoth tech laggards to pull the plug on dubious legacy systems and begin a blunt examination of their infrastructural vulnerabilities. As a data scientist at Endgame, I don’t want to make any predictions in that domain, lest I get thrown to the wolves on twitter for incorrectly predicting that 2015 will be the year a convolutional deep learning network will pre-attribute an attack before the first datagram hits the wire. Let’s not kid ourselves—that’s not happening until 2016 at the earliest.
Jason Rodzik, Director of CNO Software Engineering
In 2015, I expect to see companies—and maybe even the public as a whole—taking computer security much more seriously than they have previously. 2014 ended with not only a number of high-profile breaches, but also unprecedented fallout from those breaches, including the replacement of a major corporation’s (Target’s) CEO and CIO, increased interest in holding companies legally responsible if they fail to secure their systems, and most drastically, a chilling effect on artistic expression and speech (in addition to the large financial damages) with the reactions resulting from the Sony hack. Historically, it’s been hard for anyone looking at financial projections to justify spending money on a security department when it doesn’t generate revenue, but the cost associated with poor security is growing to the point where more organizations will have to be much more proactive in strengthening their security posture.
Douglas Raymond, Vice President
One area where cybersecurity products will change in 2015 is in the application of modern design principles to the user interfaces. There’s a shortage of skilled operators everywhere in the industry, and there is neither the time nor the resources to train them. Companies must solve their challenges with small staffs that have a diversity of responsibilities and not enough time to learn how to integrate a multitude of products. The cost of cognitive overload is high. Examples such as the shooting down of MH17 over Ukraine, the U.S. bombing of the Chinese Embassy in Belgrade, and the Target data breach, to cite a well-known cybersecurity example, demonstrate the real costs of presenting operators with too much information in a poorly designed interface. Data science isn’t enough—cyber companies in 2015 will synthesize data and control interfaces to provide operators with only the most critical information they need to solve the immediate security challenge.
Andrea Little Limbago, Principal Social Scientist (@limbagoa)
This year will be characterized by the competing trends of diversity and stagnation. The diversity of actors, targets, activities, and objectives in cyberspace will continue to well outpace the persistent dearth of a strategic understanding of the causes and repercussions of computer network operations. A growing number of state and non-state actors will seek creative means to use information technology to achieve their objectives. These will range from nation-state sponsored cyber attacks that may result in physical damage on the one extreme, to the use of cyber statecraft to advance political protest and social movements (e.g. potentially a non-intuitive employment of DDoS attacks) and give a voice to those censored by their own governments on the other. Furthermore, there will be greater diversity in the actors involved in international computer network operations. With the transition away from resources and population to knowledge-based capabilities within cyberspace, there will be a “rise of the rest” similar to economic forecasts of the BRICs (Brazil, Russia, India, China, and later South Africa) a decade and a half ago. Just like those forecasts, some of the rising actors will succeed, and some will falter. In fact, the BRIC countries will be key 2015 cyber actors, simultaneously using computer network operations internally to achieve domestic objectives, and externally to further geopolitical objectives. Additionally, those actors new to the cyber domain – from rising states to multinational corporations to nongovernment organizations – may subsequently expose themselves to retaliation for which they are ill prepared.
However, despite this diversity, we’ll continue to witness the imposition of theoretical models from previous eras onto the cyber domain. From a Cold War framework to the last decade’s counter-terrorism models, many will attempt to simplify the complexities of cyberspace by merely placing it in the context of previous doctrine and theory. This “square peg in a round hole” problem will continue to plague the public and private sectors, and hinder the appropriate institutional changes required for the modern cyber landscape. Most actors will continue to respond reactively instead of proactively, with little understanding of the strategic repercussions of the various aspects of tactical computer network operations.
Could a Hollywood Breach and Some Tweets Be the Tipping Point for New Cyber Legislation?
by Andrea Little Limbago
Two months ago, near-peer cyber competitors breached numerous government systems. During this same time, China debuted its new J-31 stealth fighter jet, which has components that bear a remarkable resemblance to the F-35 thanks to the cyber-theft of data from Lockheed Martin and subcontractors. One might think that this string of cyber breaches into a series of government systems and emails, coupled with China’s display of the fighter jet, would raise public alarm about the increasing national security impact of cyber threats. But that didn’t happen. Instead, it took the breach of an entertainment company, and the cancellation of a movie, to dramatically increase public awareness and media coverage of these threats. While the Sony breach ultimately had minimal direct national security implications, it nevertheless marks a dramatic turning point in the level of attention and public concern over cybersecurity.
Whereas the hack of a combatant command’s Twitter feed a month ago would not have garnered much attention, this week it was considered breaking news and covered by all major news outlets - despite the fact that the Twitter account is not hosted on government servers, and the Department of Defense noted that although it was a nuisance, it does not have direct operational impact. Media coverage consistently reflects public interest. The high-profile coverage of these two latest events, which exhibit tertiary links to national security, reflects the sharp shift in public interest toward cybersecurity and a potentially greater demand for government involvement in the cybersecurity domain. In all likelihood, the Sony breach will not be remembered for its vast financial and reputational impact, but rather for its impact on the public discourse. This discourse, in turn, may well be the impetus that the government requires to finally emerge from a legislative stasis and enable Congress and the President to pursue the comprehensive cyber legislation and response strategies that have been lacking for far too long.
The widespread reporting and interest in the Sony breach may in fact spark a sharp change from an incremental approach to public policy toward a much more dramatic shift. In social and organizational theory, this is known as punctuated equilibrium, whereby events occur that instigate major policy changes. While it is disconcerting - but not shocking - that the Sony breach may be just this event, the recent intense media focus on CENTCOM’s Twitter feed (which some go so far as to call a security threat) signals that the discourse has dramatically changed. This is great timing for President Obama, as he speaks this week about private-public information sharing and partnerships prior to highlighting cyber threats within his State of the Union speech next week. In fact, he is using these recent events to validate his emphasis on cybersecurity in next week’s address, noting, “With the Sony attack that took place, with the Twitter account that was hacked by Islamist jihadist sympathizers yesterday, it just goes to show how much more work we need to do both public and private sector to strengthen our cyber security.” Clearly, these events - which on the national security spectrum of breaches over the last few years are relatively mundane - have triggered a tipping point in the discourse of cybersecurity threats such that cyber legislation may actually be possible.
These recent events provide a “rally around the flag” effect, fostering a public environment that encourages greater government involvement in the cybersecurity realm (and is a notably stark contrast to the public discourse post-Snowden in 2013). Of course, while there is reason for optimism that 2015 may be the year of significant cybersecurity legislation, even profound public support for greater government involvement in cybersecurity cannot fix a divided Congress. With previous cybersecurity measures enacted through an Executive Order only after legislation failed to pass Congress, there is little reason to believe there won’t be similar roadblocks this time around. In addition to the institutional hurdles, legislators will also have to strike the balance between freedom of speech, privacy and security - a debate that has divided the policy and tech communities for years. European leaders just released a Joint Statement, which calls for greater emphasis on efforts to “combat terrorist propaganda and the misleading messages it conveys”. Doing this effectively without stepping on freedom of speech will be challenging to say the least. However, despite these potential roadblocks, the environment is finally ripe for cyber legislation thanks to the cancellation of a movie over the holiday season and a well-timed hack of a COCOM Twitter feed. Now that the public is paying more attention, cybersecurity policy and legislation may finally move beyond an incremental shift and closer to the dramatic change that is ultimately in sync with the realities of the cyber threat landscape.
Understanding Crawl Data at Scale (Part 2)
by Richard Xie
Effective analysis of cyber security data requires understanding the composition of networks and the ability to profile the hosts within them according to the large variety of features they possess. Cyber-infrastructure profiling can generate many useful insights. These can include: identification of general groups of similar hosts, identification of unusual host behavior, vulnerability prediction, and development of a chronicle of technology adoption by hosts. But cyber-infrastructure profiling also presents many challenges because of the volume, variety, and velocity of data. There are roughly one billion Internet hosts in existence today. Hosts may vary from each other so much that we need hundreds of features to describe them. The speed of technology change can also be astonishing. We need a technique to address these rapid changes and enormous feature sets that will save analysts and security operators time and provide them with useful information faster. In this post and the next, I will demonstrate some techniques in clustering and visualization that we have been using for cyber security analytics.
To deal with these challenges, the data scientists at Endgame leverage the power of clustering. Clustering is one of the most important analytic methodologies used to boil down a big data set into groups of smaller sets in a meaningful way. Analysts can then gain further insights using the smaller data sets.
I will continue the use case given in Understanding Crawl Data at Scale (Part 1): the crawled data of hosts. At Endgame, we crawl a large, global set of websites and extract summary statistics from each. These statistics include technical information like the average number of javascript links or image files per page. We aggregate all statistics by domain and then index these into our local Elasticsearch cluster for browsing through the results. The crawled data is structured into hundreds of features including both categorical features and numerical features. For the purpose of illustration, I will only use 82 numerical features in this post. The total number of data points is 6668.
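As a rough illustration of what browsing those results looks like programmatically, the sketch below pulls aggregated per-domain statistics out of an Elasticsearch index into a pandas DataFrame. The host, index name, and document layout are placeholders, not our actual schema.

```python
import pandas as pd
from elasticsearch import Elasticsearch
from elasticsearch.helpers import scan

es = Elasticsearch("http://localhost:9200")

# Stream every aggregated domain document out of a hypothetical "crawl-stats" index
docs = scan(es, index="crawl-stats", query={"query": {"match_all": {}}})
rows = [doc["_source"] for doc in docs]

# Keep only the numerical summary statistics for the analysis below
df = pd.DataFrame(rows).select_dtypes(include="number")
print(df.shape)   # e.g. (6668, 82)
```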
First, I’ll cover how we use visualization to reduce the number of features. In a later post, I’ll talk about clustering and the visualization of clustering results.
Before we actually start clustering, we should first try to reduce the dimensionality of the data. A very basic EDA (Exploratory Data Analysis) step for numerical features is to plot them on a scatter matrix graph, as shown in Figure 1. It is an 82 by 82 plot matrix. Each cell in the matrix, except the ones on the diagonal line, is a two-variable scatter plot, and the plots on the diagonal are the histograms of each variable. Given the large number of features, we can hardly see anything from this busy graph. An analyst could spend hours trying to decipher this and derive useful insights:
Figure 1. Scatter Matrix of 82 Features
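For readers who want to try this on their own data, a scatter matrix is a one-liner in pandas. The sketch below assumes the numerical features are already in a DataFrame like the hypothetical one loaded above, and it only plots a handful of columns because the full 82 by 82 grid is unreadable.

```python
import matplotlib.pyplot as plt
from pandas.plotting import scatter_matrix

# Even six features produce a dense 6 x 6 grid; 82 features are hopeless to eyeball
scatter_matrix(df.iloc[:, :6], figsize=(10, 10), diagonal="hist", alpha=0.2)
plt.show()
```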
Of course, we can try to break up the 82 variables into smaller sets and develop a scatter matrix for each set. However, there is a better visualization technique available for handling high-dimensional data: the Self-Organizing Map (SOM).
The basic idea of a SOM is to place similar data points closely on a (usually) two dimensional map by training the weight vector of each cell on the map with the given data set. A SOM can also be applied to generate a heat map for each of the variables, like in Figure 2. In that case, a one-variable data set is used for creating each subplot in the component plane.
Figure 2. SOM Component Plane of 82 Features
By color-coding the magnitude of a variable, as shown in Figure 2, we can readily identify those variables whose plots are mostly blue. These variables have low entropy values, which, in information theory, implies that the amount of information is low. We can safely remove those variables and only keep the ones whose heat maps are more colorful. The component plane can also be used to identify similar or linearly correlated variables, such as the image at cell (2,5) and the one at cell (2,6). These cells represent the internal HTML pages count and HTML files count variables, respectively.
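As one possible way to build such a component plane in Python, the sketch below uses the MiniSom package; the library choice, grid size, and iteration count are my assumptions for illustration, not necessarily what produced Figure 2.

```python
import matplotlib.pyplot as plt
import numpy as np
from minisom import MiniSom

data = df.values.astype(float)
# Normalize each feature to [0, 1] so no single scale dominates the training
data = (data - data.min(axis=0)) / (np.ptp(data, axis=0) + 1e-9)

som = MiniSom(20, 20, data.shape[1], sigma=1.5, learning_rate=0.5, random_seed=0)
som.train_random(data, 10000)

# One heat map per feature: each subplot shows that feature's trained weights
weights = som.get_weights()            # shape: (20, 20, n_features)
fig, axes = plt.subplots(3, 4, figsize=(12, 9))
for k, ax in enumerate(axes.flat):     # first 12 component planes
    ax.pcolor(weights[:, :, k].T, cmap="jet")
    ax.set_title(df.columns[k], fontsize=7)
    ax.set_xticks([])
    ax.set_yticks([])
plt.show()
```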
Based on Figure 2, 29 variables stood out as potentially high-information variables. This is a data-driven heuristic for distilling the data, without needing to know anything about information gain, entropy, or standard deviation.
However, 29 variables may still be too many, as we can see that some of them are pretty similar. It would be great to sort the 29 variables based on their similarities, and that can be done with a SOM. Figure 3 is an ordered SOM component plane of the 29 variables, in which similar features are placed close to each other. Again, the benefit of creating this sorted component plane is that any analyst, without the requirement of strong statistical training, can safely look at the graph and hand pick similar features out of each feature group.
Figure 3. Ordered SOM Component Plane
So far, I have demonstrated how to use visualization, specifically a SOM, to help reduce the dimensionality of the data set. Please note that dimensionality reduction is another very rich research topic (besides clustering) in data science. Here I have only touched the tip of the iceberg, using a SOM component plane to visually select a subset of features. One more important point about the SOM is that it not only helps reduce the number of features, but also brings down the number of data points for analysis by generating a set of codebook data points that summarize the original larger data set according to some criteria.
In Part 3 of this series on Understanding Crawl Data at Scale, I’ll show how we use codebook data to visualize clustering results.
Understanding Crawl Data at Scale (Part 3)
by Richard Xie
In Understanding Crawl Data at Scale (Part 2), I demonstrated using a SOM to visualize a high-dimensional dataset and to help reduce its dimensionality. As you may remember, this technique is a time-saver for analysts who are dealing with large data sets consisting of hundreds of features. In this section, I will briefly show the process of clustering and the visualization of its results using a few classical clustering methods. As we know, it is difficult for humans to visually digest any information with more than three dimensions, so I would like to start by illustrating the clustering process using a 2-D data set. The two sort-of-arbitrarily-chosen features come from the data set used before, namely the minimum number of image files and the total number of HTML files.
The 2-D data set can be easily drawn as a scatter plot, as shown in Figure 1. Most of the data are located in a small region at the lower-left corner, while sparser points stretch far out along both dimensions. Intuitively, just by looking at the scatter plot, we may come up with 3 clusters, something like the shaded areas below. How true is that? We can use a SOM to get a better idea.
Figure 1. Scatter Plot of the 2-D Data Set
A SOM places similar data points close to each other, or even together in the same cell, on a given map. The populated cells form a codebook data set, which serves as a representation of the original data set with a much smaller number of data points. After the placement is done, the extent of dissimilarity (or distance) between the cells can be computed, and the results can be plotted as a unified distance matrix plot (or U-Mat plot).
Figure 2 shows the U-Mat plot for our 2-D data set, with two possible splits of clusters overlaid on the SOM. Darker regions indicate lower distance values and bright red color usually indicates a separation of clusters. Legitimate guesses of the number of clusters might be three or four on the given SOM U-Mat plot, and we are confident that it won’t be more than that.
Figure 2. U-Mat Plot with Possible Separation of Clusters
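Continuing the MiniSom sketch from Part 2 (again an assumption about tooling, and here assuming the map has been retrained on just the two normalized features), the unified distance matrix comes straight from the trained map:

```python
import matplotlib.pyplot as plt

# Mean distance from each cell's weight vector to its neighbors;
# bright regions mark likely boundaries between clusters
u_matrix = som.distance_map()
plt.figure(figsize=(6, 6))
plt.pcolor(u_matrix.T, cmap="jet")
plt.colorbar()
plt.show()
```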
Now that we have a good idea of how many clusters we would like to try, we can use the K-Means method to group the data points into 3 or 4 clusters (K = 3 or 4 respectively). It is also always a good practice to normalize the data in each dimension before clustering takes place. Here I normalized the data into the range of [0, 1] in both dimensions so that they are comparable.
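A minimal sketch of this normalize-then-cluster step with scikit-learn, assuming the two features live in the DataFrame from Part 2 under placeholder column names:

```python
from sklearn.cluster import KMeans
from sklearn.preprocessing import MinMaxScaler

# Placeholder column names for the two features used in this example
X2 = df[["min_image_files", "total_html_files"]].values

# Scale both dimensions to [0, 1] so neither dominates the distance calculation
X2 = MinMaxScaler().fit_transform(X2)

labels3 = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X2)
labels4 = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X2)
```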
Figure 3 and Figure 4 show the color-coded clustering results with 3 and 4 clusters. With 3 clusters, 90.5% of data points are assigned to cluster 2, 9% to cluster 1, and 0.5% to cluster 3. Apparently the data points in cluster 3 are outliers in this data set.
Figure 3. Three-Cluster Split Using K-Means
The four-cluster split is a bit different. Cluster 1 now takes 17.5% of the data points, cluster 2, 3.4%, cluster 3, 78.6%, cluster 4, 0.4%.
The contours in both Figure 3 and Figure 4 indicate the areas where data points may have the same level of membership likelihood. In the area where data points are dense, the contours change much more rapidly than those in sparse areas because the clustering is sensitive to the distance of data points to the cluster centers.
Figure 4. Four-Cluster Split Using K-Means
Thanks to the low dimensionality of this two-feature data set, the split in each case is clear-cut. We can visualize the two different labeling systems using the codebook data placed on a SOM map, as in Figure 5.
The U-Mat plot on the left side of Figure 5 is the same as those in Figure 2, only drawn on a map of a slightly different size. On the right side of Figure 5, the codebook data are plotted with 3-cluster or 4-cluster labeling. Both of them seem to make sense, and the choice of which to use is really up to the analyst.
Figure 5. Labeled Codebook Data with K-Means(3) and K-Means(4), 2-D Dataset
The world of high dimensionality is much blurrier. After showing that we can get a satisfactory clustering result with 2-D data, let’s move up to the high-dimensional space, with the same data set but containing 29 variables.
The SOM U-Mat of the 29-feature data set is shown on the left side of Figure 6. The separation of clusters is much less obvious than that in 2-D space. Although the 29 features include the 2 features we used in the 2-D data set, there are many other features adding noise to the once-clear split of the data. The consequence is the somewhat mixed-up labels shown in the labeled codebook plots on the right side of Figure 6.
Figure 6. Labeled Codebook Data with K-Means(3) and K-Means(4), 29-D Dataset
We also may want to use the same technique to visually compare the results from different clustering methods. Even with more rigorous measurements of clustering evaluation available, visualization remains a very powerful way for analysts (who may not necessarily be statisticians or data scientists) to gauge the performance of a variety of clustering methods. That being said, a rigorous validation of clustering is always encouraged whenever it is possible, but we won’t be going into the details of this today.
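For readers who do want a number to go with the pictures, one such rigorous measurement is the silhouette coefficient; a brief sketch with scikit-learn, reusing the 2-D labels from the K-Means example above:

```python
from sklearn.metrics import silhouette_score

# Values near 1 indicate tight, well-separated clusters; values near 0 indicate overlap
print("3 clusters:", silhouette_score(X2, labels3))
print("4 clusters:", silhouette_score(X2, labels4))
```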
Figure 7 shows the results from two other clustering methods, K-Medoid with K = 4 and Fuzzy C-Means with C = 4.
Figure 7. Labeled Codebook Data with K-Medoid(4) and Fuzzy C-Means(4), 29-D Dataset
Lastly, I’d like to close this post with hierarchical clustering using the codebook data. When dealing with a very large amount of data, directly clustering might not be a feasible solution. In that case, vector quantization (VQ) will be a handy tool to reduce the data set. SOM’s are one kind of such VQ method. By training the weight vector of each cell in a map, some or all of the cells will resemble a portion of the original data set. The weight vectors associated with those cells are the codebook data. Clustering on the codebook data becomes much less computationally expensive because of the dramatic reduction in data set size.
Figure 8 shows the K-Means clustering (K=10) on the codebook data. K=10 is a sort of arbitrarily chosen large number. With the K-Means clustering result, we can do agglomerative hierarchical clustering.
Figure 8. Codebook Data Grouped into 10 Clusters Created with K-Means
Figure 9 shows the dendrograms of two hierarchical clustering results. The difference lies in the choice of how the distance between two clusters is computed. The x-axis of each dendrogram shows the clusters being agglomerated, and the y-axis shows the distance at which two clusters are merged. By choosing a threshold of cluster distance, one can cut off the linkage and identify a number of separate clusters. More clusters are generated as the threshold decreases.
Figure 9. Dendrograms of Hierarchical Clustering with Single Linkage (top) and Complete Linkage (bottom)
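A sketch of this agglomerative step with SciPy, assuming the codebook is simply the trained SOM's weight vectors reshaped to one row per cell (the tooling, as before, is my assumption rather than what produced Figure 9):

```python
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

# Flatten the SOM weight grid into a (n_cells, n_features) codebook matrix
weights = som.get_weights()
codebook = weights.reshape(-1, weights.shape[-1])

# Single vs. complete linkage, as in Figure 9
for method in ("single", "complete"):
    Z = linkage(codebook, method=method)
    plt.figure(figsize=(10, 4))
    dendrogram(Z, no_labels=True)
    plt.title("%s linkage" % method)
plt.show()

# Cutting the tree at a chosen distance threshold yields flat cluster labels
labels = fcluster(Z, t=0.5, criterion="distance")
```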
In summary, this post only highlights some of the ways for visualizing high-dimensional data and the clustering results. It certainly cannot cover everything related to multivariate clustering and visualization. We didn’t even mention projection methods, such as PCA (Principal Component Analysis), MDS (Multi-Dimensional Scaling), and Sammon Mapping. However, I hope that this post provides some interesting ideas for data enthusiasts on clustering and visualization, two of the techniques that are extremely useful in data science. Although the example data is taken from a cybersecurity context, the same practice can be successfully applied to other industries, such as credit risk, customer segmentation, biology, finance, and more.
A Martian's Take on Cyber in the National Security Strategy
by Andrea Little Limbago
In the recent New York Times bestselling book, The Martian, Andy Weir depicts a future world where space travel to Mars is feasible. Through an unfortunate string of events, the book’s hero, Mark Watney, becomes stranded on Mars, unable to communicate with anyone on Earth. After watching Friday’s release of the National Security Strategy (NSS) and the way in which it mimicked last month’s State of the Union (SOTU) address, I wondered how Watney (if he makes it back to Earth – not giving the ending away!) would interpret the major foreign policy challenges depicted in those speeches. If someone were to land on Earth after being away for years, what would they think of the state of international relations if they only based it on the NSS and SOTU? When it comes to the cyber domain, the rhetoric seems completely misaligned with the realities of the global system. But what would Watney think? Let’s imagine Watney’s interpretation of the NSS and SOTU after having been away from Earth for years…
Log Entry: Day 8
I’m not sure which is worse – being on the brink of death everyday thanks to the inhospitable environment on Mars, or coming home and learning about the various threats present in the inhospitable international environment. Sure, this isn’t really my area, but I’m dying to focus on anything besides botany and engineering for once. The good news is that, despite the laundry list of challenges, we’re in it together with China. Both the SOTU and the NSS give big props to China for its great cooperation in helping battle climate change. Well, that’s a huge relief! We certainly can’t reverse this most existential of threats without the support of the world’s most populous country and second (phew, we’re still number 1!) largest economy.
But here’s what I don’t understand. Sure, I get that climate change is important, but what does it have to do with Ebola and cyber? In each address, those three are grouped together, apparently because they all rely on international norms and cooperation and aren’t considered geopolitical. It’s strange to think that cyber doesn’t belong in the discussion about foreign adversaries like Russia, Iran and North Korea, but I’m simply an engineer, what do I know about that? I guess when it comes to cyber the key concern is privacy and individual rights. I haven’t had privacy for years, so no biggie there. I’m just glad we’re friends with China. I’d hate to relive those days of major power rivalries and espionage.
Log Entry: Day 10
At first I was thankful to finally have something to read besides Agatha Christie novels, but I’ll tell you what, this NSS is even more of a mystery when it comes to cyber. I wasn’t planning on reading it, since Friday’s release simply provided a bit more detail on the list of foreign policy challenges elucidated in the SOTU. But here’s what is interesting. If you actually read the document, there’s a single, yet important line in there that would go completely unnoticed if you listened only to the speeches. On page 24, the NSS states, “On cybersecurity, we will take necessary actions to protect our businesses and defend our networks against cyber-theft of trade secrets for commercial gain whether by private actors or the Chinese government.” What? Where did this come from? This is the concluding sentence in a paragraph that actually talks about concern over China’s military mobilization and the potential for miscalculation. So wait, China is a major cyber threat and has been stealing from us? Where did this come from? I thought we were BFFs. This is so confusing. So I went back and looked at some previous doctrine, just for the heck of it. China isn’t mentioned explicitly by name in the 2011 International Strategy for Cyberspace, and the 2010 NSS is all about seeking cooperation with China. Sooo….the speeches say one thing, the document says another, and this latest NSS takes one big step forward in surfacing China’s espionage within a strategic document. Foreign policy is not for me. I’d much rather deal with the certainty of the plant world instead of these competing narratives. In my world, any miscalculations are entirely my fault and are much more predictable than those in the foreign policy world.
Five Thoughts from the White House Summit on Cybersecurity and Consumer Protection
by Nate Fick
The Obama Administration deserves credit for putting together the first-ever White House summit on cybersecurity on Friday and – contrary to what some media coverage may lead you to believe – the U.S. private sector mostly deserves credit for showing up.
Rather than offer yet another perspective on how to structure the Cyber Threat Intelligence Integration Center (CTIIC), or speculate on what it means that this or that CEO didn’t attend, I thought I’d just share a few thoughts from a day at Stanford that was packed with conversations with colleagues from across the government, the security industry, and the nation’s critical infrastructure.
1. More than most industries, the security community really is a community and must be bound by trust. Examples of this oft-overlooked reality were abundant: government officials pledging that “the U.S. government will not leave the private sector to fend for itself” and that our actions should be guided by “a shared approach” as a basic, guiding principle; Palo Alto Networks CEO Mark McLaughlin plugging the much-needed Cyber Threat Alliance, a voluntary network of security companies sharing threat intelligence for the good of all; Facebook CISO Joe Sullivan stressing the importance of humility, of talking openly about security failures, and about information security as a field that’s ultimately about helping people. Many of the day’s conversations kept coming back to trust – both the magnitude of what we can accomplish when we have it, and the paralyzing effect of its absence.
2. All companies are now tech companies. Home Depot doesn’t just sell hammers, and even small businesses have learned the great lesson of the past decade’s dev-ops revolution: outsource any software you don’t write yourself by moving it to the cloud and putting the security responsibility on the vendor. An interesting corollary to this is whether, as larger companies get more capable with their security, we will see hackers moving down-market to target smaller companies in increasingly sophisticated ways. This is sobering because scoping the magnitude of the challenge before us leads to the conclusion that it includes…well…everything.
3. Our adversaries will continue getting better partly because we will continue getting better. There’s a nuance here that isn’t captured in the simple notion that higher walls only beget taller ladders. An example from the military world is that Iraq’s insurgents became vastly more capable between 2003 and 2007 because they spent those four years sharpening their blades on a very hard stone: us. So consider, for example, the challenge facing new payments companies today: you’re fighting the guys who cut their teeth against PayPal fifteen years ago, and you’re doing it with a tiny number of defenders since you’re only a start-up, not with the major resources of PayPal’s current security team. Submitting to an “arms race” mentality—or quitting the race altogether—isn’t the answer. But this reality does put the security bar higher and higher for new ventures, and suggests that competition for experienced security talent will only grow more heated.
4. Too many policy-makers are still a long way from basic fluency in this field. That’s intended more as observation than criticism. It takes time to build a deep reservoir of talent in any field of endeavor – across the whole pipeline from funding basic research in science and technology, through nurturing the ecosystem of analysts and writers who can inform a robust conversation about occasionally arcane topics, to reaping the benefits of multi-generational experience where newer practitioners can learn from the battle scars of those who came before them. The traditional defense community has this, as do tax policy, health care policy, and most other major areas of public-private collaboration. It’ll come in the cyber arena too. What worries me, though, is that too many policy makers, when they refer to “the private sector” in this context, seem to imply either that it’s less important than the government, or even (bizarrely) that it’s smaller than the government. The government has a massively important role in cyber security, but it isn’t the whole game, and it probably isn’t even most of the game.
5. Information sharing is only a means to an end. If one of the day’s two major themes was “trust,” then the other was “information sharing.” Yes, our security is only as good as the data we have. Yes, there can be a “neighborhood watch-like” network effect in sharing threat intelligence. Yes, the sharing needs to happen across multiple axes: public to public, public to private, and private to private. But all of that sharing will be for naught if it doesn’t lead to some kind of effective action– across people, process, and technology. (Remember that “Bin Laden Determined to Strike in U.S.” was the heading of the President’s daily briefing from the CIA on August 6, 2001…) The Summit was one action, and the security community needs to take many, many more.
Streaming Data Processing with PySpark Streaming
by Rich Seymour
Streaming data processing has existed in our computing lexicon for at least 50 years. The ideas Doug McIlroy presented in 1964 regarding what would become UNIX pipes have been revisited, reimagined, and reengineered countless times. As of this writing, the Apache Software Foundation has Samza, Spark and Storm for processing streaming data… and those are just the projects beginning with S! Since we use Spark and Python at Endgame, I was excited to try out the newly released PySpark Streaming API when it was announced for Apache Spark 1.2. I recently gave a talk on this at the Washington DC Area Apache Spark Interactive Meetup. The slides for the talk are available here. What follows in this blog post is an in-depth look at some PySpark functionality that some early adopters might be interested in playing with.
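For anyone who hasn’t played with PySpark Streaming yet, the snippets below assume the usual boilerplate is already in place. Here is a minimal, hypothetical skeleton (the socket source, app name, and port are made up for illustration, not code from the talk) showing roughly what that looks like:

# Minimal PySpark Streaming skeleton (an illustrative sketch, not code from the talk).
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext(appName="StreamingSketch")
ssc = StreamingContext(sc, 1)  # 1-second micro-batches

# Hypothetical source: lines of text arriving on a local socket.
lines = ssc.socketTextStream("localhost", 9999)

# A classic word count over each micro-batch.
counts = (lines.flatMap(lambda line: line.split())
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))
counts.pprint()

ssc.start()
ssc.awaitTermination()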
Using updateStateByKey in PySpark Streaming
In the meetup slides, I present a rather convoluted method for calculating CPU percentage use from the Docker stats API using PySpark Streaming. updateStateByKey is a better way to calculate such information on a stream, but the Python documentation was a bit lacking. Also, the lack of type signatures can make PySpark programming a bit frustrating. To make sure my code worked, I took a cue from one of the attendees (thanks Jon) and did some test-driven development. TDD works so well that I would highly suggest it for your PySpark transforms, since you don’t have a type system protecting you from returning a tuple when you should be returning a list of tuples.
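To make the TDD point concrete, here is a small, hypothetical example of my own (not from the talk or the Spark test suite): the updater you hand to updateStateByKey is just a Python function, so its contract can be pinned down with plain unittest before a SparkContext ever enters the picture.

# A tiny, hypothetical unit test for an updateStateByKey-style updater function.
# No SparkContext is required; the updater is plain Python.
import unittest

def updater(new_values, state):
    # Accumulate every value seen so far for a key.
    state = state or []
    state.extend(new_values)
    return state

class UpdaterTest(unittest.TestCase):
    def test_accumulates_values_across_batches(self):
        state = None
        for batch in [[0], [1], [2]]:
            state = updater(batch, state)
        self.assertEqual(state, [0, 1, 2])

if __name__ == '__main__':
    unittest.main()

The official Spark test below exercises the same idea, but drives the updater through an actual DStream.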
Let’s dig in. Here is the unit test for updateStateByKey from https://github.com/apache/spark/blob/master/python/pyspark/streaming/tests.py#L344-L359:
from itertools import chain, tee, izip

def test_update_state_by_key(self):
    def updater(vs, s):
        if not s:
            s = []
        s.extend(vs)
        return s

    input = [[('k', i)] for i in range(5)]

    def func(dstream):
        return dstream.updateStateByKey(updater)

    expected = [[0], [0, 1], [0, 1, 2], [0, 1, 2, 3], [0, 1, 2, 3, 4]]
    expected = [[('k', v)] for v in expected]
    self._test_func(input, func, expected)
This test code tells us, if we play around a bit, that for the input:
[[('k',0)],[('k',1)],[('k',2)],[('k',3)],[('k',4)]]
we expect the output:
[[('k',[0])],[('k',[0,1])],[('k',[0,1,2])],[('k',[0,1,2,3])],[('k',[0,1,2,3,4])]]
updateStateByKey allows you to maintain state by key. This test is fine, but if you ran it in production you’d end up with an out-of-memory error, as s will extend without bounds. In a unit test with a fixed input it’s fine, though. For my presentation, I wanted to pull out the time in nanoseconds that a given container had used the CPUs of my machine and divide it by the time in nanoseconds that the system CPU had used. For those of you thinking back to calculus, I want to do a derivative on a stream.
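To make that concrete with some made-up numbers (my own illustration, not real output from the Docker stats API):

# Hypothetical cumulative CPU nanoseconds for one container and for the whole
# system, sampled at two points in time.
container_ns = [200, 260]
system_ns = [1000, 1200]

# Delta over delta: the share of system CPU time this container consumed
# between the two samples.
cpu_fraction = float(container_ns[1] - container_ns[0]) / (system_ns[1] - system_ns[0])
print(cpu_fraction)  # 0.3, i.e. the container used 30% of the CPU in that interval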
How do I do that and keep it continuous? Well, one idea is to keep a limited number of these delta x’s and delta y’s around and then calculate it. In the presentation slides, you’ll see that’s what I did by creating multiple DStreams, joining them, and doing differences in lambda functions. It was overly complicated, but it worked.
In this blog I want to present a different idea that I cooked up after the meetup. First the code:
def test_complex_state_by_key(self):
    def pairwise(iterable):
        "s -> (s0,s1), (s1,s2), (s2, s3), ..."
        a, b = tee(iterable)
        next(b, None)
        return izip(a, b)

    def derivative(s, x, y):
        "({'x':2,'y':1},{'x':6,'y':2}) -> derivative(_,'x','y') -> float(1)/4 -> 0.25"
        return float(s[1][y] - s[0][y]) / (s[1][x] - s[0][x])

    def updater(vs, s):
        # vs is the input stream, s is the state
        if s and s.has_key('lv'):
            _input = [s['lv']] + vs
        else:
            _input = vs
        d = [derivative(p, 'x', 'y') for p in pairwise(_input)]
        if s and s.has_key('d'):
            d = s['d'] + d
        last_value = vs[-1]
        if len(d) > len(_input):
            d = d[-len(_input):]  # trim to length of _input
        state = {'d': d, 'lv': last_value}
        return state

    input = [[('k', {'x': 2, 'y': 1})], [('k', {'x': 3, 'y': 2})], [('k', {'x': 5, 'y': 3})]]

    def func(dstream):
        return dstream.updateStateByKey(updater)

    expected = [[('k', {'d': [], 'lv': {'x': 2, 'y': 1}})],
                [('k', {'d': [1.0], 'lv': {'x': 3, 'y': 2}})],
                [('k', {'d': [1.0, 0.5], 'lv': {'x': 5, 'y': 3}})]]
    self._test_func(input, func, expected)
Here’s an explanation of what I’m trying to do here. I pulled in the pairwise function from the itertools recipe page. Then I crafted a very specific derivative method that takes a pair of dictionaries and two key names and returns the slope of the line: rise over run. You can plug this code into the PySpark streaming tests and it passes. It can be used as an unoptimized recipe for keeping a continuous stream of derivatives, although I can imagine a few nice changes for usability/speed. The state keeps d, which is the list of differences between pairs of the input, and lv, which is the last value of the data stream. That should allow this to work on a continuous stream of values. Integrating this into the demo I did in the presentation is left as an exercise for the reader. ;) Comments, questions, and code review welcome at @rseymour. If you find these sorts of problems and their applications to the diverse world of cyber security interesting, you might like to work with the data science team here at Endgame.
Repression Technology: An Authoritarian Whole of Government Approach to Digital Statecraft
by Andrea Little Limbago
Last week, as discussions of striped dresses and llamas dominated the headlines, academia and policy coalesced in a way that rarely happens. On February 25th, Director of National Intelligence James Clapper addressed the Senate Armed Services Committee to provide the annual worldwide threat assessment. In addition to highlighting the rampant instability, Director Clapper specified Russia as the number one threat in the cyber domain. He noted, “the Russian cyber threat is more severe than we’ve previously assessed.” Almost simultaneously, the Journal of Peace Research, a preeminent international relations publication, pre-released its next issue that focuses on communication, technology and political conflict. Within this issue, an article contends that internet penetration in authoritarian states leads to greater repression, not greater freedoms. Social media quickly was abuzz, with the national security community focusing on Russia’s external relations, while international relations academics were debating the internal relations of authoritarian states, like Russia. And thus, within twenty-four hours, policy and academia combined to present a holistic, yet rarely addressed, perspective on the threat – the domestic and international authoritarian whole of government approach when it comes to controlling the cyber domain.
First, Director Clapper made headlines when he elevated the Russian cyber threat above that of the Chinese. Both are still the dominant threats, part of a select group to which he also adds Iran and North Korea – responsible, most prominently, for the attacks on the Las Vegas Sands Casino Corporation and Sony, respectively. This authoritarian quartet stands out for its advanced digital techniques and its targeting of numerous foreign sectors and states. Director Clapper highlighted the sophistication of Russian capabilities, while also noting China’s persistent espionage campaign. Clearly, this perspective should predominate in a worldwide threat assessment.
At the same time, the Department of State calls this the “Internet Moment in Foreign Policy”, reinforcing former Secretary of State Hillary Clinton’s push for internet freedoms to promote freedom of speech and civil liberties. However, what is often overlooked in her speech from five years ago is the double-edged sword of any form of information technology. Clinton warned, “technologies with the potential to open up access to government and promote transparency can also be hijacked by governments to crush dissent and deny human rights.” She succinctly describes the liberation versus repression technology hypotheses around internet penetration. While the liberation technology view is the one largely espoused by the tech community and diplomats in a rare moment of agreement, the actual impact of internet penetration in authoritarian regimes has never been empirically tested – until now. Espen Geelmuyden Rod and Nils B Weidmann provide the first empirical analysis to test the liberation versus repression technology debate by analyzing the impact of internet penetration on censorship within authoritarian regimes. They find that, contrary to popular perceptions, there is a statistically significant association between internet penetration and repression technology, even after controlling for a variety of domestic indicators and temporal lags. The authoritarian regimes in the sample reflect the authoritarian quartet Clapper references – a group that clearly employs digital statecraft both domestically and internationally to achieve national objectives.
These two distinct perspectives together provide the yin and the yang of authoritarian regime behavior in cyberspace. Instead of being viewed in isolation from one another, the international and domestic use of digital instruments of power reflects a whole of government strategy pursued by China, Russia, and other authoritarian states to varying degrees. As I wrote last year, internet censorship globally is increasing, but it is clearly more pronounced in authoritarian regimes. For instance, since the time of that post, China has begun to crack down on VPN access as part of an even more concerted internet crackdown. In February, Russia declared that it too might follow suit, cracking down not only on VPN access, but also on Tor. When focusing on US national interests, it may seem like only the foreign behavior of these states matters. However, that is a myopic assumption and ignores one of the most prevalent aspects of international relations – the necessity of understanding the adversary. While the US was extraordinarily well informed about Soviet capabilities at home and abroad, the same is no longer true for this larger and more diverse threatscape, especially as it pertains to the cyber domain. This gap could be ameliorated through an integrated perspective on the domestic and international digital statecraft of adversaries.
The confluence of this worldwide threat assessment to Congress and the academic publication is striking, and should be more than an esoteric exercise. It simultaneously reinforced the current gap between academia and policy in matters pertaining to the cyber domain, while also demonstrating that the academic perspective can and should help augment the dialogue when it comes to digital statecraft. However, perhaps even more pertinent is the way in which the article and the Congressional remarks reflect two pieces of the whole. Governments pursue national interests domestically and internationally. It is time we viewed these high priority authoritarian regimes through this bifocal lens. There are many insights to be gained about adversarial centralization of power, regime durability, and technological capabilities by also looking at the domestic digital behavior of authoritarian regimes. Coupling the international perspective with the domestic cyber behavior into threat assessments can help provide great insights into the capabilities, targets, and intent of adversaries.
Hacking the Glass Ceiling
As we approach International Women’s Day this week and edge closer to the 100th anniversary of women’s suffrage (okay, four years to go, but still, a remarkable moment), and as news and current events sometimes focus on the negative facts and statistics about women in technology, and especially women and venture capital, I feel particularly grateful to be working at Endgame, a technology company that has an amazing cast of phenomenal women, from our developers to our scientists to our business minds. Our team, not just our leadership but our entire company, is dynamic and diverse. Of course, Endgame is not alone. At the Montgomery Summit, a technology conference that takes place March 9th-11th in Los Angeles, there is a session devoted to Female Founders of technology companies. I am thrilled to be taking part in this event, which highlights a group of remarkable women who have founded and are leading tech companies in a diverse set of industries.
As a prelude to the conference and the celebration of International Women’s Day, and in hopes of encouraging more girls to embrace the STEM disciplines in school and pursue a career in technology, I want to highlight some amazing women who have dedicated their lives to making a difference—as technologists and as entrepreneurs, because there is true cause for inspiration.
The list of technology heroines is long and hard to winnow. So many have dedicated their lives and technical genius to service and to solving some of our hardest problems, especially in the fields of cyber, information, and national security. Many will never be acknowledged publicly, but below are a few who can be:
• Professor Dorothy Denning is not only teaching and working with the next generation of security vanguards at the Naval Postgraduate School, but is also credited with the original idea of the intrusion detection system (IDS) back in 1986.
• Chien-Shiung Wu, the first female professor in Princeton’s physics department, earned a reputation as a pioneer of experimental physics, not only by disproving a “law” of nature (the Law of Conservation of Parity), but also in her work on the Manhattan Project. Wu’s discoveries earned her colleagues the Nobel Prize in physics.
• Lene Hau is a Danish physicist who literally stopped light in its tracks. This critical process of manipulating coherent optical information carried in light form has important implications for quantum encryption and quantum computing.
• There are many visionary entrepreneurs like Sandy Lerner, co-founder of Cisco, Joan Lyman, co-founder of SecureWorks, and Helen Greiner, co-founder of iRobot and CEO of CyPhyWorks, who work tirelessly and brilliantly to deliver the solutions necessary to keep the world, and the people in it, safe.
• Window Snyder, a security and privacy specialist at Apple, Inc., significantly reduced the attack surface of Windows XP during her tenure at Microsoft, which led to a new way of thinking about threat modeling. She has many contemporaries who have also broken with stereotype and are having tremendous impact in making the technologies we interact with safer. Women like Jennifer Lesser Henley who heads up security operations at Facebook, and Katie Moussouris, Chief Policy Officer at HackerOne.
If we look further back in history, the list of amazing women in technology gets even longer. Many of the names may even surprise you:
• Ada Lovelace: The world’s first computer programmer and Lord Byron’s daughter (“She walks in Beauty, like the night/ Of cloudless climes and starry skies;/ And all that’s best of dark and bright/ Meet in her aspect and her eyes”), she has a day, a medal, a competition, and, most notably, a Department of Defense language named after her. Ada, the computer language, is a high-level programming language used for mission-critical applications in defense and commercial markets where there is low tolerance for bugs. And herein lies the admittedly tenuous connection to security: despite being a cumbersome language in some ways, “Ada churns out less buggy code,” and buggy code remains the Achilles’ heel of security.
• Hedy Lamarr: A contract star during MGM’s Golden Age, Hedy Lamarr was “the most beautiful woman in films,” an actress, dancer, singer, and dazzling goddess. She was also joint owner of US Patent 2,292,387, a secret communication system (frequency hopping) that serves as the basis for spread-spectrum communication technology, secure military communications, and mobile phone technology (CDMA). Famous for her quote, “Any girl can look glamorous. All you have to do is stand still and look stupid,” Hedy Lamarr’s legacy is that of a stunningly beautiful woman who refused to stand still. Thankfully, her refusal to accept society’s chosen role for her resulted in a very significant contribution to secure mobile communications.
• Rear Admiral Grace Hopper: Also known as the Grand Lady of Software, Amazing Grace, Grandma COBOL, and Admiral of the Cyber Sea, say hello to Rear Admiral Grace Hopper, a “feisty old salt who gave off an aura of power.” She was a pioneer in information technology and computing before anyone knew what that meant. Embracing the unconventional, Admiral Grace believed the most damaging phrase in the English language is “We’ve always done it this way,” and to bring the point home, the clock in her office ran counterclockwise. Grace Hopper invented the first machine-independent computer language and literally discovered the first computer “bug.” Hopper began her career in the Navy as the first programmer of the Mark I computer, the mechanical miracle of its day. The Mark I was a five-ton, fifty-foot-long, glass-encased behemoth — a scientific miracle at the time, made of vacuum tubes, relays, rotating shafts, and clutches, with memory for 72 numbers and the ability to perform 23-digit multiplication in four seconds. It contained over 750,000 components and was described as sounding like a “roomful of ladies knitting.” Unable to balance a checkbook (as she jokingly described herself), Hopper changed the computer industry by developing COBOL (common-business-oriented language), which made it possible for computers to respond to words rather than numbers. Admiral Hopper is also credited with coining the term “bug” when she traced an error in the Mark II to a moth trapped in a relay. The bug was carefully removed and taped to a daily log book, and hence the term “computer bug” was born.
There is also a group of women who helped save the world with the work they did in cryptology/cryptanalysis during World War I and World War II. There were thousands of female scientists and thinkers who helped ensure Allied victory. I will only highlight a few, but they were emblematic of the many.
• Agnes Meyer Driscoll: Born in 1889, the “first lady of cryptology” studied mathematics and physics in college, when it was very atypical for a woman to do so. Miss Aggie, as she was known, was responsible for breaking a multitude of Japanese naval manual codes (the Red Book Code of the ‘20s, the Blue Book Code of the ‘30s, and the JN-25 Naval codes in the ‘40s) as well as a developer of early machine systems, such as the CM cipher machine.
• Elizebeth Friedman: Another cryptanalyst pioneer, with minimal mathematical training, she was able to decipher coded messages regardless of the language or complexity. During her career, she deciphered messages from ships at sea (during the Prohibition era, she deciphered over 12,000 rum-runner messages in a three-year period) to Chinese drug smugglers. An impatient, opinionated Quaker with a disdain for stupidity, she spent the early part of her career working as a hairdresser, a seamstress, a fashion consultant, and a high school principal. Her love of Shakespeare took her to Riverbank Laboratories, the only U.S. facility capable of exploiting and solving enciphered messages. There she worked on a project to prove that Sir Francis Bacon had authored Shakespeare’s plays and sonnets using a cipher that was supposed to have been contained within. She eventually went to work for the US government where she deciphered innumerable coded messages for the Coast Guard, the Bureau of Customs, the Bureau of Narcotics, the Bureau of Prohibition, the Bureau of Internal Revenue, and the Department of Justice.
• Genevieve Grotjan: Another code breaker, Genevieve Grotjan changed the course of history in September 1940 when she discovered a correlation in a series of intercepted Japanese coded messages, allowing the U.S. Navy to build a “Purple” analog machine to decode Japanese diplomatic messages. This allowed Allied forces to continue reading coded Japanese missives throughout World War II. Prior to her success, the Purple Code had proved so hard to break that William Friedman, the chief cryptologist at the US Army Signal Corps (and Elizebeth Friedman’s husband), suffered a breakdown trying to break it.
So as we approach International Women’s Day and as we reflect on the many amazing women who have made a difference throughout history, I hope everyone joins me in celebrating these stories, finding inspiration, and most importantly, sharing that inspiration with the next generation in the hopes that they, too, might find themselves in the position of using their intellect, their skills, and their spirit to change the world for the better.
Endgame Lands FireEye Chief Architect to Head Research Team
A pioneer in adversary intelligence, James Butler brings unparalleled experience and innovation to Endgame’s cyber threat detection and remediation solutions
Arlington, VA– March 11, 2015 – Endgame, Inc., a leading provider of cybersecurity solutions that deliver rapid response to advanced cyber threats, today announced that James Butler, former FireEye Chief Architect and Chief Researcher at Mandiant, has joined the company as Chief Scientist. In this role, Butler will lead Endgame’s research on advanced threats, vulnerabilities and attack patterns, ensuring that the company’s endpoint detection and response (EDR) solutions give customers the earliest warning, fastest detection and most efficient response to prevent damage and loss.
“As constant changes in technology lead to an increasingly dynamic cybersecurity landscape and more advanced threats, research into the origins, nature and implications of these threats is more important than ever before—yet is too often overlooked by traditional solutions,” said Endgame CEO Nate Fick. “At Endgame, we understand the critical role this research plays in our ability to protect national and commercial interests. Jamie has proven that he has the expertise and forward-looking approach to make significant contributions to the security community. With Jamie at the helm of our unparalleled security research team, we will be able to take our threat intelligence capabilities and our threat detection solutions to the next level.”
Butler has directed research teams at some of the most prominent and successful security companies of the last decade. A recognized leader in attack and detection techniques, he has over 17 years of experience and knowledge in operating system security that will augment Endgame’s ability to identify attack patterns and help its customers better defend against advanced threats. In addition to his roles at FireEye and Mandiant, Butler was a computer scientist at the National Security Agency and co-authored the bestseller Rootkits: Subverting the Windows Kernel. Butler is also a frequent speaker at the foremost computer security conferences and serves as a Review Board member for Black Hat. He co-developed and instructs the popular security courses “Advanced Memory Forensics in Incident Response,” “Advanced 2nd Generation Digital Weaponry,” and “Offensive Aspects of Rootkit Technology.”
“Having worked on some of the most influential security solutions in recent history, I’ve seen a lot of innovative ideas for solving the security problems that continue to plague businesses. The ones that are successful take into account that today’s adversaries more often than not know the network they are attacking as well as the IT department,” explained Butler. “It is easy to see that the approach Endgame is taking to enterprise security addresses this dynamic and will raise the bar for the industry. Today’s intrusions have far-reaching global and economic consequences. Endgame is in a unique position to build solutions that minimize those consequences, and I want to be a part of bringing that vision to fruition.”
About Endgame
Endgame protects national security and commercial interests with a comprehensive Cyber Operations Platform that delivers rapid response to advanced cyber threats. Endgame delivers early warning, detection and remediation of these threats by leveraging the Endgame Intelligence Cloud, which combines our proprietary global threat intelligence with enterprise user, system and network behavior to deliver superior defense for federal and commercial customers. Endgame’s technology and techniques are proven in the most extreme environments—from defending U.S. national interests against the world’s most notorious cyber adversaries to protecting social websites experiencing explosive growth.
Endgame was founded in 2008 and has offices in Washington, DC, San Francisco, CA, San Antonio, TX and Melbourne, FL.