Channel: Endgame's Blog

The More Things Change...Espionage in the Digital Age

Last week, Der Spiegel reported that the BND – Germany’s foreign intelligence agency – had accidentally intercepted calls of U.S. government officials while collecting intelligence on Turkey. For many, this was an example of hypocrisy in international relations, as German Chancellor Angela Merkel was one of the most vocal critics following the Snowden Affair, which strained relations between the U.S. and Germany. But one can’t help but be struck by the media’s surprise that a country that so vocally spoke out against cyber-espionage also conducts it. The real story is not the collection itself (accidental or not) between allies, but the evolving nature of state behavior in light of technological change. Each technological revolution in history has altered not only every aspect of warfare, but also peacetime behavior between states. One current manifestation of this adaptation is the creation of state-sponsored cyber units, potentially for cyber offense and defense alike.

First, although it almost seems ridiculous to note, recent events warrant it: espionage is not a new phenomenon. Espionage and intelligence gathering have likely existed for as long as organized societies have, and were certainly factors in many ancient civilizations, including Egypt, Greece and Rome. Just like today, spying was not purely a characteristic of Western behavior; in fact, Sun Tzu devoted an entire chapter of The Art of War to spying and intelligence collection. As technology changed, the modes of espionage evolved from eavesdropping, to binoculars, to pigeons rigged with cameras, to aircraft, to satellites, to today’s hot term: cyber-espionage. While this oversimplifies the evolution of espionage, the point is that throughout history, each technological change has similarly reshaped collection procedures.

Moreover, technological innovations in both war and peace simply cannot remain indefinitely under the purview of a single actor. After a technology is first used, other actors eventually imitate it and sometimes leapfrog ahead. In fact, a striking feature of the Digital Age is how little time it now takes to replicate technological innovation. Where it once took years to copy the technological capabilities of other actors, the fast pace of technological change in the modern era has dramatically shortened this lag. And while some states may once have held onto anachronistic technologies, even governments of closed societies are increasingly tech savvy, leveraging the cyber domain to achieve domestic and international objectives.

Knowing that espionage is not a new phenomenon, and that technological copycats have appeared throughout history, the obvious question becomes: Is Germany indicative of other states that have organizations devoted solely to cyber security? Below are just a few examples. This list is by no means comprehensive, but it illustrates a growing trend as states adjust to the realities of the Digital Age. The roles of the United States and Germany have been covered in significant detail elsewhere, as have the cyber units of major global and regional powers such as Russia, China, Israel and Great Britain. Like most behavior in the international system, the scale and scope of these cyber units vary enormously based on the opportunity (i.e. resources) and willingness of each individual state:

  • Australia: The Australian Cyber Security Centre, formerly known as the Cyber Security Operations Centre, is scheduled to open later this year with a large focus on domestic cyber security. The Australian Signals Directorate is noted for its close ties with foreign signals intelligence organizations.
  • Brazil: The Center of Cyber Defense (CDCiber) brings together the Brazilian Army, Air Force and Navy, but is predominantly led by the Army.
  • France: Some note that France lags behind Western counterparts, but it has established the Centre d’Analyse en Lutte Informatique Defensive (CALID). According to Reuters, while this year’s increased spending will go toward infrastructure, a large part of it will also be allocated toward, “building up web monitoring and personal data collection.”
  • Nigeria: In light of cyber attacks from Boko Haram, Nigeria is stepping up its cyber security capabilities. Recently proposed legislation focuses mainly on combating cyber crime, and includes provisions for intercepting various forms of electronic communication.
  • North Korea: Has had cyber units since 1998, the most prominent of which is Unit 121. Just this summer, it was reported that North Korea has doubled its cyber military force to 5,900 cyber warriors. The General Bureau of Reconnaissance is likely home to this growing group of hackers.
  • Philippines: Despite some delays in its legal system, the Philippine military has created a cybersecurity operations center, called C4ISTAR. This move followed a series of attacks against Philippine government websites and heightened tensions in the South China Sea.
  • Rwanda: Perhaps the most unlikely case, Rwanda has had a cyber unit for quite some time. This summer the Rwandan government announced plans to strengthen the cyber unit’s capabilities.
  • South Africa: Maintains a National Cyber Security Advisory Council, and as of last year intends to create a cyber security hub based on its National Cyber Security Policy Framework.
  • South Korea: Has had a cyber command since 2010, a likely response to increased cyber attacks from North Korea and elsewhere.
  • Even intergovernmental organizations (IGOs) are getting in on the action – NATO has strategically placed a cyber unit in Estonia, called the cyber polygon base, where it has already carried out several cyber exercises.

In short, similar to what we’ve seen in previous eras, states are altering their behavior and organizational features in light of technological disruption. This quick overview by no means makes a normative claim about whether the rise of state-sponsored cyber organizations is bad or good for society, but instead highlights a growing trend in international relations. The latest disclosure on German collection efforts is likely indicative of things to come. But how states respond to this trend will vary greatly. As with any technology, some will embrace it and others will reject it. Germany’s suggestion of adopting typewriters (and not the electronic kind) to protect sensitive information and counter cyber-espionage is just one example of how reactionary measures by states may risk sending them back to the technological ice age. What a great way to protect information - because after all, everyone knows that espionage didn’t exist before the Digital Age!

Andrea Little Limbago

Working Across the Aisle: The Need for More Tech-Policy Convergence

Last week, the White House confirmed that Todd Park is stepping down from his position as the country’s second Chief Technology Officer to move back to Silicon Valley, though he’ll remain connected to the administration. The latest news indicates that Google executive Megan Smith is a top candidate for his replacement. This Silicon Valley/Washington, DC crisscrossing, although rare, is a welcome development, and it comes at a time when the Washington establishment – Republicans and Democrats alike – is becoming increasingly known for its lack of technical acumen. The divide between those who are tech savvy and those who are politically savvy is not only geographic, but is also perpetuated by industry and academic stovepipes. This is especially true in the cyber realm, an area that has vast technical and political implications but where the two communities remain separated by geography, disciplinary jargon, and insular professional networks.

It’s a safe bet that I was the only person to attend both Black Hat in Las Vegas and this past weekend’s American Political Science Association Annual Conference in Washington, DC (perhaps best known now for the disruptive fire at the main conference hotel). If anyone else attended both, I’d love to talk to them. For the most part, I was struck by how little attention APSA gave to cyber, and how little Black Hat addressed the foreign and domestic policy issues that will greatly shape the future of the cyber domain. Each conference should continue to focus on its core expertise and audience, but the increasing interconnection of cyber and policy can’t continue to be brushed aside. For its part, Black Hat had exactly three policy-related presentations out of roughly 120, and that is based on their own categorization, which seemed accurate. APSA didn’t do any better – three panels had ‘cyber’ in their title, three had ‘Internet’, and maybe two dozen had ‘digital’, although these really only used it as a synonym for the modern era and had nothing to do with technology. To put this in context, APSA generally has over 1000 panels, and the theme this year was “Politics after the Digital Revolution.”

Why does this even matter? During one of the APSA panels (one of the three that addressed cyber), an audience member asked what political science has to do with cyber studies. I viewed this question as similar to someone in the 1950s asking what political science has to do with nuclear engineering. Clearly, they are distinct domains with distinct experts, but policymakers (and those informing them) cannot simply ignore major technological breakthroughs. Especially in the cyber domain, policymakers currently employ cyber statecraft as both sticks and carrots, but lack a body of literature that explicates the impact of these relatively new foreign policy tools. Similarly, engineers and computer scientists focusing on cyber security may find themselves increasingly affected by legal and political decisions. In fact, based on the surprisingly large attendance I saw at the panels on policy at Black Hat, the tech community seems quite aware of the large impact that policy decisions can have on them.

There is room for cautious optimism. At Black Hat, I attended a panel on “Governments as Malware Authors”. It was an interesting, tactical overview of various malware attacks by a range of governments, many of which were not the usual suspects. Similarly, the APSA panel on “Internet Politics in Authoritarian Contexts” provided a great overview of the myriad ways in which authoritarian regimes employ a diverse range of cyber tools to achieve their objectives, including censorship and DDoS attacks. These two panels covered many similar topics, but with strikingly different methodological approaches and data. It would be phenomenal to see these two groups on one panel. I’d argue that panel would produce exactly the kind of information policymakers could actually use.

Similarly, at the beginning of an APSA panel on “Avoiding Cyber War”, I met one of the panel members. When he learned I worked at a tech company, he quietly admitted, “I’m not a political scientist, I’m really a hacker.” To that, I responded, “I’m not really a hacker, I’m a political scientist.” It would be wonderful to see these two perspectives increasingly collaborate and explore each other’s main venues for intellectual innovation. This small but impactful step could finally provide policymakers the insights and technological information that could help remedy the dearth of tech acumen within the policy domain. The tech community also must be increasingly willing to contribute to the national debate on all of the technology issues that will continue to impact their lives and businesses.

Next year, APSA will be in San Francisco, which presents an exciting opportunity for this kind of collaboration. It would be great to see more panels featuring technology, and more specifically, new analyses of cyberspace and statecraft. Of course, short of a miracle, APSA will have to work on some marketing for that to happen. I, for one, welcome the day when APSA abandons the nylon Cambridge University Press bags it gives away in favor of an APSA sticker or decal that political scientists (and maybe even an engineer or two) can proudly exhibit on their Macs.

Andrea Little Limbago

Cyber Defense: Four Lessons from the Field

In cyberspace, as in more traditional domains, it’s essential to understand both your enemy and yourself. A comprehensive defensive strategy requires a better understanding of an adversary’s motivations and intent, but this is exceedingly difficult given the challenges of attribution and the complex nature of cyberspace. It’s safe to say that most organizations lack the tools, knowledge, mission set, scalability or authority to incorporate analysis of the adversary into their cybersecurity frameworks. But in my experience, thinking like an adversary and analyzing your own network and assets are both essential components of cyber defense.

Recently, my colleague Jason and I attended and presented at the 2014 Malware Technical Exchange Meeting (MTEM). MTEM is the annual malware technical exchange event that brings together practitioners and researchers from industry, the FFRDCs (federally funded research and development centers), academia, and government to present and discuss all things malware. MTEM presentations typically focus on malware analysis at scale, incident response, trend analysis, and research, but this year’s theme was more specific: “Evolving Adversaries: Stories from the Field”. The goal was to exchange information on technical and policy issues related to evolving threats, with a focus on presenting new methods for analyzing malware more quickly and effectively and on sharing success stories from the field. Below are four key insights that I’ve gained from my experience at conferences like MTEM and from cyber exercises:

1. Know your network: Today’s cyber defenders must know their network. They need visibility into all assets, including operating systems, users, endpoints, mobile devices as well as knowledge of normal network behavior. Unfortunately, this isn’t always the case. There are some organizations where the defenders and incident responders have extremely limited access/visibility into their own network. They are mostly blind, relying solely on anti-virus software, firewalls, and host-based detection systems. A situation like this could have detrimental consequences. For example, if the defenders only saw sensor-detected “known bads”, an attacker could leverage that by deploying low-level, easily detectable malware that would keep the defenders occupied while the attackers carried out their most nefarious acts. In order to proactively defend against the adversary in real-time, defenders must seek and obtain ubiquitous presence within their own protected cyber space.

2. Think like the adversary: Defenders must also think like an adversary, which goes above and beyond just monitoring anti-virus tools, IDS alerts, and firewall logs. To truly protect themselves, defenders must understand the aggressor tactics that adversaries will use. For example, once an attacker gains access to a victim network, they’ll most likely conduct reconnaissance to learn the lay of the land. This could reveal some of the defensive tools deployed, enabling the attacker to circumvent them. Additionally, the attackers’ recon mission could reveal additional credentials, allowing an attacker to burrow further into the network. Defenders also have to remember that an attacker is not static; the most aggressive attackers will evolve and try new methods to find the most valuable assets. To effectively defend the most critical data networks and level the playing field, defenders must truly think like the adversary. Our MTEM presentation focused on this theme of an evolving adversary and drew on experiences from a recent cyber exercise. The presentation included various network and defender techniques, demonstrating the utility of thinking like the adversary to proactively deter intrusions.

3. Prioritize: A good defense requires organizations to prioritize their most valuable assets, considering both what is most valuable to the organization and what an adversary may deem most valuable. Realistic defensive teams will categorize all of their assets, from the “crown jewels” all the way down to the “postcards at the checkout stand”. To set this in motion, simply put yourself in the mindset of the attacker and ask, “What do I really want or need from this organization?” The answer is most likely where the attacker will try to land. Armed with this information, efforts can be implemented to protect that data and/or alert a defender when someone (or something) tries to access it.

4. Automation & Contextualization: Automation is an essential component of defense, but alone it is not enough. At the same time, since today’s attackers use automated techniques to expedite their attacks, manual defensive measures alone will also probably prove to be an inadequate defense in most cases. Automated technologies that incorporate contextual awareness are key to maintaining situational awareness and strong cyber defense.
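Lessons 1, 3 and 4 can be combined in a few lines of Python: an inventory says what exists, value tiers say what matters, and automated triage uses that context to rank alerts. This is a minimal sketch under stated assumptions, not a production design: the tier names, the simple z-score baseline, the doubling for hosts already seen doing reconnaissance, and every host name are hypothetical. It replays the decoy scenario from lesson 1, where a loud "known bad" on a low-value kiosk competes with a quiet anomaly on a high-value database.

```python
from dataclasses import dataclass
from enum import IntEnum
from statistics import mean, stdev


class Tier(IntEnum):
    """Hypothetical value tiers, from 'postcards at the checkout stand' to 'crown jewels'."""
    POSTCARD = 1
    INTERNAL = 2
    SENSITIVE = 3
    CROWN_JEWEL = 4


@dataclass(frozen=True)
class Asset:
    name: str
    value_to_org: Tier
    value_to_adversary: Tier  # the defender's estimate of what an attacker wants

    @property
    def priority(self) -> Tier:
        # Lesson 3: an asset matters as much as whichever side values it more.
        return max(self.value_to_org, self.value_to_adversary)


@dataclass
class Alert:
    host: str
    signature: str
    severity: int  # sensor-assigned, 1 (low) to 5 (high)


def is_anomalous(history, today, threshold=3.0):
    """Lesson 1: flag a metric far from the host's own baseline (simple z-score)."""
    if len(history) < 2:
        return True  # too little history to call anything normal
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return today != mu  # perfectly steady history: any change is worth a look
    return abs(today - mu) / sigma > threshold


def contextual_score(alert, inventory, recon_hosts):
    """Lesson 4: weight raw sensor severity by context the sensor cannot see."""
    asset = inventory.get(alert.host)
    # Hosts missing from the inventory (a visibility gap) are treated as worst case.
    weight = asset.priority if asset else Tier.CROWN_JEWEL
    score = alert.severity * weight
    if alert.host in recon_hosts:
        score *= 2  # activity on a host already seen doing reconnaissance is escalated
    return score


def triage(alerts, inventory, recon_hosts):
    """Return alerts ordered by contextual score, highest first."""
    return sorted(alerts, key=lambda a: contextual_score(a, inventory, recon_hosts), reverse=True)


inventory = {
    "kiosk-3": Asset("kiosk-3", Tier.POSTCARD, Tier.POSTCARD),
    "hr-db": Asset("hr-db", Tier.SENSITIVE, Tier.CROWN_JEWEL),
}
alerts = [Alert("kiosk-3", "known-bad adware", severity=5)]
# A quiet database whose query volume jumps far above its baseline raises a low-severity alert.
if is_anomalous([100, 110, 95, 105, 90], today=900):
    alerts.append(Alert("hr-db", "unusual query volume", severity=2))

for a in triage(alerts, inventory, recon_hosts={"hr-db"}):
    print(a.host, contextual_score(a, inventory, {"hr-db"}))
```

Run as written, the database anomaly (2 × 4 × 2 = 16) outranks the louder kiosk alert (5 × 1 = 5), which is exactly the ordering the decoy tactic counts on defenders getting wrong. Treating unknown hosts as worst case is one design choice among several; it makes gaps in the inventory surface loudly rather than quietly.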

And before I sign off, I’d like to leave you with one more thought, something a lieutenant general told a group of us analysts 10 years ago. Regarding counterterrorism, he said, “We have to throw a strike with every pitch while terrorists only need a single hit.” I believe this same sentiment holds true in the world of cyber defense. An attacker only needs a single success to produce catastrophic results for a victimized network or organization. In cyberspace, a good defense requires the ability to anticipate the adversary and continually evolve your defense accordingly.

Casey Gately

Article 5.0: A Cyber Attack on One is an Attack on All (Part 1)

NATO leaders gathered in Wales in early September to address a variety of security challenges, culminating on September 5th with the Wales Summit Declaration. It is no wonder that the summit of an alliance formed 65 years ago did not garner much media attention. With all of the current crises monopolizing the news cycle – the expanding powerbase of ISIS in Iraq and Syria, the Ebola outbreak in West Africa, and the tenuous ceasefire between Ukraine and Russia – little attention has been devoted to a potentially major policy shift within NATO that could have long-term global implications. For the first time, NATO has determined that cyber attacks can trigger collective defense. This shift is particularly important now since offensive cyber behavior is on the rise in Eastern Europe, and Georgia and Ukraine are still being considered for NATO expansion.

NATO’s influence and even existence have been questioned since the dissolution of the Soviet Union in 1991. During more than a decade in Afghanistan, NATO largely shifted its focus to counterinsurgency capabilities, leaving the collective defense aspect of NATO virtually obsolete. As resources went to Afghanistan rather than Europe, NATO members stopped prioritizing the alliance, which now suffers from old and decrepit infrastructure. Article 5 provides the bedrock of the alliance, explicating the notion of collective defense – an attack on one is an attack on all. As the below map demonstrates, over the last 50 years NATO collective defense has slowly crept toward the Russian borders, and now includes former Soviet states Estonia, Latvia, and Lithuania. This creeping expansion is often cited as inciting Russia to engage in a series of conflicts in Estonia, Georgia, and now Ukraine. Others also believe that Russian President Vladimir Putin and his megalomaniac infatuation with rebuilding the Russian empire fuel his expansionist appetite, including his wide use of the cyber domain to achieve political objectives. With the rising tensions and realpolitik emerging between Russia and several former Soviet states and satellites, NATO leaders have come to the realization that the modern international system now includes an entirely new domain that can’t be ignored – cyberspace.

Russia’s current adventures into Ukraine likely influenced this timing, but the increased use of offensive cyber statecraft in Eastern Europe over the past several years has clearly reached a tipping point, and policy is slowly catching up to the realities of international relations. The inclusion of cyber as a catalyst for collective defense brings to the forefront a series of technical and policy issues that must be parsed out in order to truly give this newest addition to Article 5 some teeth. On the policy front, the Wales Summit Declaration notes, “A decision as to when a cyber attack would lead to the invocation of Article 5 would be taken by the North Atlantic Council on a case-by-case basis.” This extraordinarily vague criterion must be made more specific not only to assuage concerns of NATO’s Eastern European members, but also to signal externally what kind of cyber behavior may actually incur a kinetic response.

Signaling is just as important today as it was during the Cold War, and for policies to be taken seriously, there must be some sign of credible commitment on behalf of member states. The cyber domain is fraught with attribution issues, making the practical aspects of this even more challenging. The Russian group CyberVor has been linked to the theft of passwords and usernames, while a group dubbed DragonFly is possibly responsible for producing the malware Energetic Bear. Energetic Bear was created as a cyber-weapon, crafted to monitor energy usage and disrupt or destroy wind turbines, gas pipelines and powerplants. Energetic Bear, similar to other offensive cyber behavior in the region, exhibits characteristics that lead many to infer it is state sponsored, but proving that is extraordinarily difficult in cyberspace. It is important to note that Energetic Bear, unlike many more publicized examples of Russian state-sponsored cyber attacks, mainly targeted Western European countries. The notion of NATO collective defense against cyber is not solely an Eastern European problem.

All of this raises the question: Is it technically possible for NATO to create a cyber umbrella of collective defense around its members, just as the nuclear umbrella protected them during the Cold War? We’ll tackle this question in two additional blogs that address the technical difficulties associated with the cyber aspect of the Wales Summit Declaration. NATO’s inclusion of cyber attacks has long-term implications for the international system, signaling a return to major power politics and realpolitik. Instead of billiard balls crashing in an anarchic world system, we may now be moving to a world where power politics means binaries crashing in cyberspace.

Andrea Little Limbago & John Herren

Article 5.0: A Cyber Attack on One is an Attack on All (Part 2): Technical Challenges to a Mobile Cyber Umbrella

Mobile phone networks are prime targets for a cyber attack, and governments large and small are in a particularly powerful position to execute such an attack on another country. Given the September 5th NATO Wales summit resolution declaring that cyber attacks can now trigger collective defense, how will the “cyber umbrella” extend to the mobile and telecommunications domain? This is not a hypothetical scenario; it already has significant precedent. State actors have conducted cyber-espionage (most recently in Hong Kong) and cyber-sabotage of critical infrastructure. We’ve never seen a significant sabotage operation against mobile phones themselves, but the phone network has long been an infrastructure target in traditional wars going back to the age of the telegraph. Just this spring Russia conducted denial of service attacks targeting the Ukrainian government’s phone system. So what are the technical challenges associated with the NATO Wales summit resolution?

Some Background

Mobile phone networks are culturally distinct and radically different from the Internet. First, there is an asymmetrical relationship between phone and Internet communication because there are no web servers on the phone network. While phones can reach the Internet, computers on the Internet cannot directly touch your phone. This provides one of the many additional layers of complexity when dealing with malicious cyber activity in the mobile domain.

Second, there is also an asymmetry of markets. The Internet marketplace is often referred to as the “Wild West,” while the mobile network marketplace is extraordinarily oligopolistic. Thousands of Internet service providers exist and anyone can build their own network with some Ethernet cable and a router. The Internet is a wild and chaotic place: e-commerce sites are as easy to connect to as social networks, travel-booking sites, and even your bank. Nobody trusts anybody because any stranger can connect to any web site from anywhere in the world. This is why online accounts have passwords and corporations employ VPNs for additional security. Conversely, only a few companies build telecommunications equipment, which communicates over a collection of esoteric protocols. There are also only a few hundred national and regional phone networks, which are owned and managed by about fifty multinational companies. Phone networks only connect to other phone networks, so only other providers have access. The networks connect to each other through tightly controlled connections managed internally or through trusted third parties.

This oligopoly not only can result in higher prices, but has also produced a complex web of protocols and technologies that further differentiate mobile networks from the Internet. To oversimplify, the Internet uses HTTP over TCP/IP, while telecommunications networks communicate via SS7 over SCTP/IP. The reality is far more complicated than this – SIP should eventually replace SS7, but that’s going to be a very long process and new protocols are drafted every year. The situation has been in flux since the 1990s, and that won’t change in the near future.

Finally, because of the proprietary nature of mobile networks, phone network security is an under-researched area: few researchers can get their hands on telecommunications equipment. The stuff is expensive, rare, horribly complicated to use, and its sale and distribution are heavily regulated. So how is this related to cyber-warfare? National governments are closely tied to the phone network infrastructure and providers. They regulate the providers, some of which are state-owned enterprises, and they commonly operate their own massive internal phone networks. Governments themselves are de facto telecommunications providers, and wield an outsized advantage over non-state actors in instigating malicious mobile network behavior.

Article 5: In Theory and in Reality

In the Wales summit resolution, NATO did not adopt any language that spells out what a cyber attack is, opting instead to say that a decision on when to invoke Article 5 would be made on a “case-by-case basis.”

So let’s see how this would play out in a hypothetical situation. Country X wants to knock country Y offline. X is a de facto provider, and it has a connection into the private global phone network. From its trusted position in the network, X can exploit a vulnerability in Y’s provider and knock it offline. By tunneling through a third-party provider, X can make the attack appear to come from another country.

If the target is hit in the right place, the impact can be enormous. In summer 2012, a failure in the Home Location Register (or HLR - one of hundreds of crucial components in a core network) caused the collapse of the provider Orange in France. For twelve hours, 26 million people had no phone or data service. That same year O2 in the UK experienced a similar outage due to an HLR failure.

X could target the HLR in Y’s network or one of several other choke points. Components that affect large numbers of subscribers include the HLR, MSC, HSS, SMSC, MSS, and GGSN. After the initial attack, X would simply wait until the network service was restored, and knock it back down. Lather, rinse, and repeat.
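The dependency logic behind these choke points can be sketched in a few lines. This is a defender's-eye illustration under loudly stated assumptions, not a model of any real operator: the map of which services need which components is hypothetical and heavily simplified (the HSS plays the HLR's role in LTE cores, and production networks add redundancy). It ranks components by how many services depend on them and converts the Orange outage figures above into subscriber-hours.

```python
# Hypothetical, heavily simplified dependency map for a 2G/3G-style core network.
# Real cores have many more components, redundancy, and interdependencies.
DEPENDENCIES = {
    "voice calls": {"HLR", "MSC"},
    "SMS": {"HLR", "MSC", "SMSC"},
    "mobile data": {"HLR", "GGSN"},
}


def services_lost(dependencies, failed):
    """Services that stop working when the component `failed` goes down."""
    return sorted(svc for svc, needs in dependencies.items() if failed in needs)


def choke_points(dependencies):
    """Rank components by how many services depend on them (ties broken by name)."""
    counts = {}
    for needs in dependencies.values():
        for comp in needs:
            counts[comp] = counts.get(comp, 0) + 1
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def subscriber_hours(subscribers, hours):
    """Outage impact expressed in subscriber-hours."""
    return subscribers * hours


print(choke_points(DEPENDENCIES))
print(services_lost(DEPENDENCIES, "HLR"))
print(subscriber_hours(26_000_000, 12))  # 2012 Orange France outage figures
```

With this toy map, the HLR is the only component every service needs, which is why a single HLR failure can take a whole provider down; the Orange outage works out to 26 million × 12 hours = 312 million subscriber-hours, and each repeated knockdown adds to that total.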

If NATO is serious about Article 5, it needs to be aware that an attack on a telecommunications network could be the catalyst for invoking it. This isn’t simply some hypothetical, futuristic scenario, but has serious precedent. The core mobile network infrastructure is particularly vulnerable, and the perpetrator of such an attack will likely have the access, skills, and resources of a nation state. NATO member states need to prepare now for how they might respond to a mobile black-out – a case-by-case strategy simply won’t suffice when an entire population is disconnected from their smartphones.

Adam Harder

Article 5.0: A Cyber Attack on One is an Attack on All (Part 3): Why Private Companies Should Care About Geopolitics

In the month since our first post on NATO, the extent and reach of the Sandworm virus have become increasingly publicized. Sandworm is believed to be a Russian cyber-espionage campaign focused on extracting content and emails that reference Ukraine. NATO was among its many targets. To some, this may just appear to be power politics playing out in cyberspace, with only the government sector truly affected. That would be an extraordinarily myopic perspective. Private companies are increasingly entangled in the world of cyber geopolitics and must be wary of how geopolitical developments can impact their own cyber security. When it comes to the cyber realm, the line is increasingly blurred between state and non-state actors. For the private sector, the geopolitical situation may become just as relevant to assessing cyber risk as international markets are to assessing economic risk.

As we’ve noted, there are significant hurdles to implementing NATO’s collective cyber defense, and the challenges in enforcing it will only grow. But the expansion of Article 5 to include cyber is just one tool the West can use to push back against Russian influence. NATO’s adoption of cyber complements the sanctions employed by the US and EU against select (mainly state-owned) Russian companies. US sanctions against Russia largely target the financial, energy, defense, and transportation sectors. Similarly, the Sandworm virus targeted, in addition to NATO and other Western government entities, several energy, telecommunications and defense companies. It also targeted an academic institution due to Ukrainian research by one of the professors. The JP Morgan data breach (and that of a dozen other banks) similarly is largely hypothesized to trace back to Russia, with some viewing it as retaliation for US sanctions.

The permeation of geopolitics into the private cyber domain is not limited to the Russian example. Last year, the Syrian Electronic Army (SEA) attacked several Western media outlets, the most prominent of which was the New York Times website. The timing of the attacks coincided with the Obama administration’s claims that Bashar al-Assad used chemical weapons against his population. The SEA is believed to have targeted anti-government/pro-rebel media outlets. The success of the SEA has led some to wonder whether the Islamic State of Iraq and the Levant (ISIL) is capable of mounting a similar attack.

Geopolitics in cyberspace is certainly not limited to attacks against the West. The recent wave of cyber attacks on mobile phones in Hong Kong is likely an attempt by the Chinese government to quell the pro-democracy demonstrations. In response, Anonymous has vowed to retaliate against the Chinese government. Anonymous is not the only non-state actor fighting back against state-sponsored cyber attacks.

The line between state and non-state actors in cyberspace is becoming increasingly blurred. Members of the private sector, whether companies or individuals, are increasingly likely to be targets of cyber attacks–not because of their own behavior, but because of the growing impact of geopolitics on the private sector. Unlike the seemingly apolitical breaches of companies such as Target or Neiman Marcus, private sector companies may become the targets of retaliatory behavior by foreign governments (or their non-state extensions). Rather than being the result of actions by specific companies, these targeted attacks will more likely be spillover effects of the greater geopolitical tensions between states. Saudi Aramco knows full well just how quickly the business sector (or a state-owned enterprise, in its case) can become a victim of grander power politics. This is likely to become the norm, not the exception, as states continue to play out disputes in the anonymizing domain of cyberspace. Private sector companies, especially those in energy, finance or defense, are particularly likely to be targeted by entities affiliated with foreign governments.

Article 5.0: A Cyber Attack on One is an Attack on All (Part 3): Why Private Companies Should Care About Geopolitics

Andrea Little Limbago

Fixing America’s Strategic Analysis Gap Without Creating Another Institution

In his recent Washington Post article “America Needs a Council of International Strategy”, David Laitin accurately makes the case for “better analysis of data, trends, and context…” to help policy makers within the government make more informed international policy decisions. He recommends the creation of a “team of strategic analysts capable of using the statistical models of the forecasters…” so that policy makers can explore global events and related policy options. As someone who helped build exactly that kind of analytic team within the government–and then saw it eliminated–I learned plenty of lessons about obstacles to implementation. Instead of creating yet another analytic organization, we should focus on refining current analytic institutions within the Intelligence Community and Department of Defense (which I’ll refer to as the Community) to fill the gap Dr. Laitin identifies.

As Dr. Laitin rightly notes, there does not exist (to my knowledge) a place within the Community that contains a group of government civilian scholars focused on quantitative modeling to inform strategic level decisions. While there are pockets of these capabilities, they are small and disparate. The first reason for this gap is a bias against quantitative social science analyses. This can be partially traced to the false promise of many quantitative models that proved either to be incongruent with the operational pace of the Community or to be based on faulty or obsolete theory and data. These models often relied on proprietary black-box modeling techniques, and thus were impossible to fully comprehend. Because of this, quantitative analyses that truly accommodate academia’s technical rigor as well as the Community’s expedited analytic pace continue to be met with skepticism. I still recall a task in which our team responded to a question from the highest levels of military leadership. Our quantitatively-derived findings – which at the time were counterintuitive but have since been proven out – were never included in the final presentation to leadership. Quite simply, domain expertise trumped quantitative analyses then, and it still does today.

Second, there is a bias in the Community for tactical, real-time analysis over longer-term strategic thinking. This is partly due to an incentive structure that focuses on inputs into daily briefs and quick-turn responses in lieu of longer-term, strategic analyses. This is not a surprise given real-world demand for real-time responses. However, in my experience talking to various levels of leadership within the Community, there is also demand for strategic insights. In fact, as you move up the leadership chain, these kinds of analyses become ever more important to help inform global resource allocation, effects assessments, and planning.

Academia is equally culpable for the gap Dr. Laitin identifies. First, there remains a faulty assumption that scholarship can only be rigorous or policy relevant, and not both. This was evident at this year’s American Political Science Association (APSA) Annual Meeting. To provide just one thematic example, cyber analyses across the soft/hard power spectrum were practically non-existent among the over one thousand panels. Academic leaders, just like their government counterparts, similarly need to adapt the discipline for the challenges of the modern era.

Finally, there needs to be greater academic support outside of the beltway for rigorous, policy relevant research and career tracks. Given the academic job market over the last decade, academic leadership should also encourage, not deter, graduate students from pursuing non-academic positions, including in the government analytic community.

I wholeheartedly agree with Dr. Laitin that policy makers require greater access to independent, quantitatively driven and probabilistic policy alternatives. However, the best way to fill this gap is from the inside, not from the creation of yet another new organization. Let’s complement and refine the extant analytic institutions such that they too can conduct and make relevant the quantitatively-derived strategic analyses Dr. Laitin describes.

Andrea Little Limbago

Malware with a Personal Touch

Over the summer, a friend sent me some malware samples that immediately grabbed my attention. The malware was intriguing because each binary was named after a person or a user ID (for example, bob.exe, bjones.exe, bob_jones.exe, etc.). Cool, I thought at first – but after some more detailed analysis, I realized that the malware actually contained hardcoded user information, implying that each binary was crafted to target that particular user. Unlike more prominent instances of malware, these samples contained binaries specifically aimed at a pre-generated list of email addresses. No longer is malware targeting only randomized email addresses - this sample indicates a different variety of malware that has a more “personal touch”.

After digging around a bit, it became apparent that this type of malware has been around for a while. Malware of this type was actually circulating during early 2013, but recent open source research revealed there was also a malicious Facebook campaign earlier this year, in May 2014, that delivered similar malware. In the 2014 reports I read, the malware contained embedded clear text bible scripture. While the samples I received from my friend didn’t contain any bible scripture, there were enough similarities (such as obfuscation techniques and reach back communications) to suggest that my variants may have been from the same campaign. In typical phishing fashion, the May campaign began with an email like this:

So it’s been around for a little while and there are some other excellent analytical reports on this piece of malware - some of which delve more into the math behind the malware, which is quite interesting. However, in this post, I’ll be focusing on the personalized nature of the malware, which sets it apart from many I have previously analyzed.

Regardless of the malware’s genesis, what really amazes me is the number of people who will receive an email, download the zip file, then open it using the password provided. They then inadvertently run the malware and receive a fake MessageBox notification created by the malware. This means that while the user probably thought everything was okay, behind the scenes the malware was off and running. Similar to other types of malware, the binaries are triggered by user behavior and continue to run unbeknownst to the infected user. However, unlike many other types, this sample truly contained a personal touch – leveraging social engineering to fool the user into believing that the malware was a personalized, benign message. The following section provides a technical walk-through of the various aspects of the malware.

Malware Execution: A Step by Step Overview

Upon execution, the affected user will see an error MessageBox with the following:

This would probably lead the unsuspecting user to think the program didn’t work, and there’s a good chance the user would just go about their business. If so, they would never realize their computer had just been compromised, and the seemingly innocent MessageBox would have been the only visual sign of something gone awry.

Now let’s take a closer look at the malware’s footprint on a victim host. It self-replicates to two binaries, both located in a malware-spawned folder within the affected user’s %AppData% folder. The name of the malware folder and the names of the self-replicated binaries are decoded during run time. What’s interesting about this is that while their naming conventions appear random, they were actually static and quite unique to that particular binary. The names of the folder and each binary are hardcoded, but obfuscated, within the binary. In other words, the malware file structure will be the same each time it is run. Pasted below are three different examples that illustrate this. Note: The literal file names have been intentionally changed to protect the identity of the affected users.

At first glance, this file structure reminded me of several ZeuS 2.1 and ZeuS-Licat variants I analyzed several years back, but the ZeuS file structure was not static in any way.

The communication flow looks like this. Within the span of 16 seconds, the infected host will connect out to 85 different domains, each with the same GET request.

Immediately, a pattern of anomalies can be seen regarding the reach back domains. Like children’s mix and match clothing, the reach back domains are mixed and matched combinations consisting of two English words. Subsequent reversing revealed the malware binary contained an obfuscated list of 384 words, ranging from 6 to 12 letters as follows:

6 letter words = 97 

7 letter words = 152 

8 letter words = 82 

9 letter words = 38 

10 letter words = 10 

11 letter words = 4 

12 letter words = 1

The reach back domains are dynamically generated using a Domain Generation Algorithm (DGA) that merges two of the 384 words into a single domain name with the “.net” top level domain (TLD). The infected host will connect out to 85 domains, and the list of 85 domains will remain constant for 8 minutes and 32 seconds (512 seconds). This means that if the malware is restarted during the same eight-and-a-half-minute period, the same domains would be requested.
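To make the window behavior concrete, here is a minimal Python sketch of a two-word DGA. It rests on stated assumptions. The placeholder word list and the `_pick` mixing function are hypothetical, and so is keying the list to a 512-second slot counter derived from Unix time. None of this is the sample's actual arithmetic. What the sketch does reproduce is the observed shape:

- 85 `.net` domains, each built from two words;
- a list that stays fixed within one 8:32 window;
- a list that shifts by one position when the next window begins, so domains 2-85 of one window become domains 1-84 of the next.

```python
WINDOW_SECONDS = 8 * 60 + 32   # 512 s: observed lifetime of one domain list
DOMAINS_PER_RUN = 85

# Hypothetical placeholder words; the real sample decodes 384 words of 6-12 letters.
WORDS = ["shoulder", "finger", "against", "beyond", "journey", "thickness", "classic", "whatever"]


def _pick(slot: int, salt: int, size: int) -> int:
    """Hypothetical mixing function mapping a slot number to a word index."""
    x = (slot * 2654435761 + salt) & 0xFFFFFFFF   # multiplicative hash, kept to 32 bits
    x ^= x >> 16
    return x % size


def domain_for_slot(slot: int, words: list[str]) -> str:
    """Join two words into a .net domain; a given slot always yields the same domain."""
    first = words[_pick(slot, 0x1F2E3D4C, len(words))]
    second = words[_pick(slot, 0x5A6B7C8D, len(words))]
    return f"{first}{second}.net"


def generate_domains(unix_time: float, words: list[str], count: int = DOMAINS_PER_RUN) -> list[str]:
    """Return `count` domains, starting at the slot for the current 512-second window."""
    start = int(unix_time) // WINDOW_SECONDS
    return [domain_for_slot(start + i, words) for i in range(count)]


t = 1_000_000_000  # an arbitrary Unix timestamp
current = generate_domains(t, WORDS)
next_window = generate_domains(t + WINDOW_SECONDS, WORDS)
print(len(current))                      # 85
print(next_window[:-1] == current[1:])   # True: the list slides by one slot
```

Because each domain depends only on its absolute slot number, a restart inside the same window regenerates an identical list. Moving to the next window simply drops the first domain and appends a new one at the end, matching the 3:59am/4:00am example.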

For demonstration purposes, if the malware was run at 3:59am on 16 Sep, the domain generation will begin with classshoulder.net, followed by thickfinger.net, etc. (as shown above for connections 1-4). At 4:00am, however, the first domain requested will be thickfinger.net, followed by classfinger.net. Eight minutes later, at 4:09am, the first domain will be classfinger.net, followed by againstbeyond.net. This means that domains 2-85 at 4:00am are identical to domains 1-84 from 3:59am. Below is a representation of the first five domains used between 03:59 and 04:09 AM on 16 September, where the first domain can be seen dropping off after 8 minutes.

Referring back to the four connections above, you can see the affected user’s email address is contained within the outbound “GET” via port 80. This too was hardcoded, but obfuscated within the binary and decoded during run time. In essence, this means each binary was crafted for that particular user. Another interesting aspect is that of the variants examined, the domain name list and the affected user information were encoded with different keys.

Now let’s take a quick peek at the malware in action. Once executed, both self-replicated binaries are running processes as shown below:

What really intrigued me about this was that these are bit-for-bit copies of each other, and despite this, they are hooked to one another as shown above. This seemed quite odd, but diving in a little deeper revealed a more interesting side of the story. Basically the first spawned process (ovwcntv.exe) is the primary running process and the second spawned process (ctrdyelvupa.exe) is a slave process, but more on that later. For now, let’s check out a brief chronological snippet of the malware and its spawned processes during run time.

Notice how the original binary (person1.exe) wrote the first copy (ovwcntv.exe) to disk, then spawned its process. Yet ovwcntv.exe was the binary that wrote the second copy (ctrdyelvupa.exe) to disk, subsequently spawning the ctrdyelvupa.exe process. This daisy chain actually acts as a persistence mechanism.

The original binary (person1.exe) is launched via a command line argument. Once running, the binary decodes a string equating to “WATCHDOGPROC”, which the running process looks for. If the “WATCHDOGPROC” string is part of the same command line string as the path for either binary, that particular process is launched. If the “WATCHDOGPROC” string isn’t contained within the same command line string as the binary path, the running process will not launch the additional process. Below are stack excerpts to help demonstrate this.

Will launch:

ASCII "WATCHDOGPROC "C:\Documents and Settings\user\Application Data\nripohbdhnewia\ovwcntv.exe""

Won’t launch:

ASCII "WATCHDOGPROC"
ASCII "C:\Documents and Settings\user\Application Data\nripohbdhnewia\ovwcntv.exe"

As stated above, the ovwcntv.exe binary is the active running process while ctrdyelvupa.exe acts as a safeguarding (or slave) process. Using the WATCHDOGPROC procedure, if ovwcntv.exe is terminated, ctrdyelvupa.exe immediately restarts it. If ctrdyelvupa.exe is terminated, ovwcntv.exe will restart it as well.
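For defenders, the command-line shape in the stack excerpts above suggests a simple triage check for process-creation logs. The sketch below is a hypothetical heuristic, not a reconstruction of the malware's own parsing code. It assumes the marker appears first and is followed by a single quoted path to an .exe, exactly as in the "Will launch" excerpt. The function and pattern names are illustrative.

```python
import re
from typing import Optional

MARKER = "WATCHDOGPROC"

# Hypothetical pattern: the marker, whitespace, then one quoted path ending in .exe,
# all inside the same command-line string (the "Will launch" shape above).
_WATCHDOG_CMDLINE = re.compile(
    rf'^\s*{re.escape(MARKER)}\s+"([^"]+\.exe)"\s*$', re.IGNORECASE
)


def watchdog_target(cmdline: str) -> Optional[str]:
    """Return the quoted .exe path if cmdline pairs the marker with a path, else None."""
    match = _WATCHDOG_CMDLINE.match(cmdline)
    return match.group(1) if match else None


will_launch = r'WATCHDOGPROC "C:\Documents and Settings\user\Application Data\nripohbdhnewia\ovwcntv.exe"'
print(watchdog_target(will_launch))     # prints the ovwcntv.exe path
print(watchdog_target("WATCHDOGPROC"))  # None: marker without a path
```

A marker-only or path-only string, as in the "Won’t launch" excerpt, returns None. Since the file and folder names differ per victim, matching on the decoded marker rather than on a specific path is the more portable choice.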

WATCHDOGPROC, in its encoded form, is a 13-byte block embedded in the original binary at offset 00022990. During run time, those bytes are used to populate the ECX register. Each byte is then XOR decoded with the first 13 bytes contained in the EAX register, resulting in the ASCII string “WATCHDOGPROC”. This is demonstrated below.

My initial interest in this particular piece of malware, however, was the hardcoded user’s email information that was obfuscated within the binary. I was equally interested in the word list used by the DGA. I wanted to find their embedded location inside the original binary. It was a bit of a trek to get there, but persistence paid off in the end. So let’s begin with the encoded user information.

Within the person1.exe binary, the encoded email address was located at offset 00023611, preceded by the encoded URI string (at offset 00023600).

During runtime, this data block was stored in the ESI register as shown below:

Additionally, a similarly sized block of data was dynamically generated and stored in EAX as shown below. In essence, this was the decoding key.

Each byte of ESI and EAX was then run through the following incremental XOR loop…

…producing the decoded URI which included the victim user’s email address as shown below.

XOR EXAMPLE 

40 XOR 6F = 2F (‘/’) 

6F XOR 09 = 66 (‘f’) 

6F XOR 00 = 6F (‘o’) 

56 XOR 24 = 72 (‘r’)

Note: once I isolated the user data (or email address) within the original binary, along with its key, I patched the binary so it would reflect ‘nottherealuser’ vice the name of the actual victim user. This patched binary was then used to obtain the previous examples.

Next, let’s look at the domain name generation. It followed the same scheme as above. A 2800-byte block of hardcoded data from the original binary was stored in ESI. Then an equally sized block of data was dynamically generated and stored in EAX. These two data blocks were run through the same XOR loop producing a list of 384 words. To demonstrate this, the first 48 bytes of the applicable registers are shown below.

XOR EXAMPLE 

FB XOR 91 = 6A (‘j’) 

4A XOR 25 = 6F (‘o’) 

0A XOR 7F = 75 (‘u’) 

30 XOR 42 = 72 (‘r’) 

4C XOR 22 = 6E (‘n’) 

3F XOR 5A = 65 (‘e’) 

46 XOR 3F = 79 (‘y’)
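In both examples, decoding is a position-by-position XOR of the hardcoded bytes (ESI) with the generated key bytes (EAX). The minimal Python sketch below assumes the key has already been recovered, because how the malware generates it at run time is not reproduced here. It also reduces the loop's "incremental" bookkeeping to pairing byte i with byte i. The byte values come from the two XOR examples in this post.

```python
def xor_decode(encoded: bytes, key: bytes) -> bytes:
    """XOR each encoded byte with the key byte at the same position."""
    if len(key) < len(encoded):
        raise ValueError("key must be at least as long as the encoded data")
    return bytes(e ^ k for e, k in zip(encoded, key))


# First bytes of the encoded URI (ESI) and its key (EAX), from the first example.
uri_start = xor_decode(bytes.fromhex("406F6F56"), bytes.fromhex("6F090024"))
print(uri_start)   # b'/for'

# First word of the DGA word list, from the second example.
first_word = xor_decode(bytes.fromhex("FB4A0A304C3F46"), bytes.fromhex("91257F42225A3F"))
print(first_word)  # b'journey'
```

The same operation, applied to the 13-byte block at offset 00022990 and the first 13 key bytes, is what yields the WATCHDOGPROC string described earlier.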

The interesting part about this binary was that while it appeared packed, it wasn’t. Just about everything within the binary (API calls, strings, etc.) was obfuscated and decoded on the fly during run time as needed. Also, it resisted being debugged directly. This became apparent while stepping through the binary and hitting a point where EIP was set to 00000000. To overcome this, the binary was patched at that particular offset by changing the opcode to a jump (EB FE) so that it would loop back to itself during run time. The patched binary was then saved and executed again, causing it to run in an infinite loop. While it was running, a debugger was attached to the binary. The jump opcode (EB FE) was then changed back to its original opcode (FF 15 in this case), at which time the intended location of that call (address 00409C50) appeared as can be seen in the following debugger excerpt:

At this point, the binary was patched with a call to the newly identified offset by replacing “CALL DWORD PTR DS:[42774C]” (shown above) with “CALL 409C50”. After this, the binary was saved to a new binary (e.g. binary1.exe).

Next, binary1.exe was loaded into a debugger and a break point was set for CreateProcessA. The binary was then run which generated the first copy of the original binary (in this case ovwcntv.exe), but for simplicity’s sake, we’ll call this binary2.exe.

Binary2.exe was then loaded into a debugger and, as we did previously, the opcode at the initial point of user code (E8 48 in this case) was changed to EB FE, changing the initial command from CALL to JMP.

Binary2.exe was then saved as itself, overwriting the original binary2.exe. It also created a backup copy (.bak), which was deleted. Then the debugger was closed.

After this, a debugger was reopened, but it wasn’t attached to anything just yet. Returning to the still-open debugger for binary1.exe, Alt F9 was pressed in order to execute the binary until user code. This caused binary2.exe to run in a loop (due to the aforementioned patch). The newly opened debugger was then attached to binary2.exe, opening the binary in the ntdll module. From the debugger for binary2.exe, Alt F9 was pressed in order to run it until user code. At this point, the opcode EB FE was changed back to its original opcode (in this case E8 48). A breakpoint was then placed on LoadLibraryA and the binary was run again. Stepping through the binary back into the user code led to all the deobfuscation discussed earlier.
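The EB FE trick used twice above works because EB is the short JMP opcode and FE is a displacement of -2. The jump therefore lands on its own first byte, and the thread spins in place until a debugger can attach. The sketch below shows only the file-patching half of that workflow in Python, and several things in it are assumptions. The helper name `patch_bytes` and the sample bytes are hypothetical. The offsets are raw file offsets; translating a debugger address such as 00409C50 into a file offset requires walking the PE section table, which is not shown. In this post, the patching itself was done from within the debugger.

```python
INFINITE_LOOP = b"\xEB\xFE"  # JMP short, displacement -2: jumps back onto itself


def patch_bytes(image: bytearray, offset: int, new: bytes) -> bytes:
    """Overwrite len(new) bytes at a raw file offset; return the bytes replaced."""
    end = offset + len(new)
    if offset < 0 or end > len(image):
        raise IndexError("patch would fall outside the image")
    original = bytes(image[offset:end])
    image[offset:end] = new
    return original


# Hypothetical bytes: a NOP followed by a 5-byte CALL rel32 whose encoding starts E8 48.
image = bytearray(bytes.fromhex("90E848000000"))
saved = patch_bytes(image, 1, INFINITE_LOOP)   # saved == b"\xe8\x48"
# ... write the image to disk, run it, and attach a debugger while it spins ...
patch_bytes(image, 1, saved)                   # put the original CALL bytes back
```

Only the first two bytes of the CALL are overwritten. The remaining displacement bytes are never reached while the loop spins, so restoring the two saved bytes recovers the original instruction intact. To work on an actual file, the image could be loaded with `bytearray(pathlib.Path(p).read_bytes())` and written back with `write_bytes`.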

Lastly, below is the complete list of words used for the creation of reach back domains, in order of the list’s creation (reading left to right):

 

Casey Gately

INSA Whitepaper: Operational Cyber Intelligence

Endgame Principal Social Scientist Andrea Little Limbago is a coauthor of the Intelligence and National Security Alliance’s (INSA) latest whitepaper, Operational Cyber Intelligence. The paper is part of an INSA series that addresses the multiple levels of cyber intelligence. While much focus has been devoted to the tactical level, the strategic and operational levels of analysis have not garnered equal attention. This latest whitepaper discusses the role of operational cyber intelligence, the key bridge between the tactical and strategic levels, and between the non-technical and technical domains. It examines the role of operational cyber intelligence in assessing the operating environment and in forecasting and assessing adversarial behavior, and concludes with business and mission requirements to develop operational cyber intelligence capabilities.

Visit INSA to read the full whitepaper.

Endgame Contributes Data and Analysis to "Operation SMN" Report

Today, Novetta and a coalition of cyber security companies released the report “Operation SMN: Axiom Threat Actor Group Report,” which details the characteristics of a threat actor group believed to act on behalf of a Chinese government intelligence apparatus. Endgame provided extensive proprietary threat data and analytical processing capabilities that allowed the coalition to gain deeper insight into compromised network footprints.

Read the full report here.

Bestiary of Cyber Intelligence

Welcome to the First Annual Endgame Halloween Blog! Inspired by the recently released Bestiary of Intelligence masterpiece, we have built upon this model with a Bestiary of Cyber Intelligence 2014: Top 10 Creatures. These beasts represent common clichés, terms, or phrases that get over-used, misused, or simply abused through the course of cyber intelligence writings. As you read it, we’re certain other specimens will come to mind. Start keeping track of them now for potential inclusion into our bestiary collection for 2015!

  • Viral meme: Viral memes are self-replicating, cacophonous creatures that can diffuse globally in the blink of an eye. No one knows how they emerge or why they so abruptly disappear. Viral memes rarely stand up well to historical scrutiny, and analysts have yet to clearly identify why some viral memes endure so long even after all normal rationality would predict their demise.

  • Low-hanging fruit: Contrary to conventional wisdom, this animal is not edible. In fact, it can be quite poisonous, lulling analysts into complacency and forcing them to gravitate toward the easiest, simplest solution achieved with minimal effort. The low-hanging fruit has a very short life span, rotting quickly and being readily replaced once another low-hanging fruit is discovered.

  • Hacking back: Once thought to be a thing of fantasy, there have been increased sightings of the hacking back over the last few years. Most describe it similarly to a triceratops, with a shielded head to protect itself and large teeth and horns to attack. Many in the scientific community deny its existence, and there are divergent descriptions by those who have seen it.

  • Malicious attack: Distant cousin and foe of the heartwarming smiley emoticon, malicious attacks are moody creatures that often can hide for days or weeks unbeknownst to their owners, only to emerge once discovered by adept inspection by a cyber analyst. These dark, slimy creatures have elongated, strong appendages, enabling them to surmount any defense. Because of their notorious reputation, when any data is breached, a malicious attack is the first to get blamed.

  • Compromised systems: These unfortunate beasts are quite fragile, often come in groups and are easily swayed by external forces. However, they camouflage easily into the vast IT infrastructure and thus are quite difficult to see with the naked eye. Similar to the golden snitch in Quidditch, analysts compete with each other in an attempt to be the first to discover a compromised system.

  • Big data: This aquatic creature prefers extreme weather situations - floods, deluges, storms. Big data exhibit long mandibles and a broad head, steam rolling everything in their path. Analysts must be very careful with big data, as it is impossible for them to comprehend yet simultaneously holds the solution to every plausible analytic question ever pondered.

  • Trolls: It wouldn’t be Halloween without a troll, but this isn’t just any grumpy old troll. These trolls are quick, dark creatures, slithering quickly and quietly in and out of forums. Trolls sow discord wherever they go, popping in and out of conversations. At times confused with a devil’s advocate, trolls don’t generally start arguments to help improve decision-making, but instead seek to create disputes.

  • TLDR: The TLDR is every analyst’s worst nightmare. A very complex, multi-faceted beast, the TLDR requires constant nurturing and support to help it grow to adulthood. Analysts spend significant amounts of time growing the TLDR. This unusual beast, however, has the ability to dismember itself upon reaching adulthood, and can divide and metamorphose into creatures that sadly become unrecognizable to analysts.

  • The Cloud: The cyber holy grail, digital heaven, and the closest thing to cyber religion. The cloud is a loyal, fix-all beast that is everything and nothing at once. More importantly, if an analyst can’t find the data, they just need to look in the cloud. If there’s a problem, the cloud can fix it. The one technology that will never fail, and similar to choosing C in multiple choice exams, if a cyber practitioner doesn’t know the technical solution to a problem, just recommend the cloud.

  • Attribution problem: A close companion of the Cloud, the attribution problem is an analyst’s go-to friend if they come across inconclusive findings. The attribution problem gets blamed for many analytic hurdles, which makes it one of the most melancholy beasts. Seriously, if the data fails to yield interesting insights, it’s generally because of the attribution problem. If an analyst can’t find the root cause of malicious attacks or compromised systems, the attribution problem quickly becomes the scapegoat.

Graphics Credit: Anne Harper

Andrea Little Limbago

To Forecast Global Cyber Alliances, Just Follow the Money (Part 3): Moving Toward a Cyber Curtain - APEC and the Implications of a Potential Sino-Russian Cyber Agreement

Next week’s APEC summit may, in addition to providing great insight into economic collaborative trends, serve as a harbinger of subsequent cyber collaboration. If the economic trends carry over, a Sino-Russian cyber agreement may well provide the impetus that pushes many countries toward closer relations with the US, especially if it addresses joint cyber operations. The Sino-Russian cyber agreement can plausibly be viewed as part of a response to last year’s Snowden disclosures, which also strained relations between the US and its partners across the globe. However, in light of a Sino-Russian cyber accord, these strained relations could dissipate when states are left choosing between two greatly distinct approaches to the Internet. On the one hand, although the US certainly must continue to mend global relations, it nevertheless still promotes an open, transparent, and universal approach to the Internet. From the beginning, the US has encouraged Internet expansion and integration, providing economies of scale for access to information across the globe.

In contrast, between the Great Firewall of China and Russia’s increased censorship, a Sino-Russian pact symbolizes in many ways the modern version of the Iron Curtain. Just as the Iron Curtain epitomized the sharp divide between closed and open societies, a Sino-Russian accord could signify the start of a ‘Cyber Curtain’, reflecting a sharp divide between two very different approaches to Internet freedoms, access to information, and even the role of the government. Despite all of the past year’s controversy over the Snowden disclosures, the US still has soft power on its side as a key proponent of universal Internet expansion and information access. This soft power will likely be much more attractive than the censored and disconnected approach offered by China and Russia.

China will certainly continue to flex its economic muscles during the APEC summit. However, keep an eye out for a Sino-Russian cyber agreement that may sneak under the radar due to the summit’s focus on economic issues. China’s ongoing provocations across the South China Sea, coupled with Russia’s cyber and military expansion into Eastern Europe, have already induced uncertainty and concern among the other players in each region. This uncertainty has already begun to push neighbors and rivals together to counter the provocations. Similarly, a Sino-Russian cyber agreement may inadvertently cause many countries in both Europe and Asia to rethink their stance and push them toward greater cyber collaboration with the US. This would create a cyber curtain reflecting two very distinct approaches to the cyber landscape – one championed by the US and one by Russia and China. Just as the pre-World War I Gold Standard and the Cold War Iron Curtain signified a sharp contrast between global integration and nationalistic isolation, the current global structure may soon reflect a cyber divide between cyber-nationalism and cyber-integration, reflecting the patterns of cyber cooperation.

To get a head start on understanding this emergent cyber security cooperation, policymakers would do well to look at how economic regionalism might help them better forecast the cyber future. If the economic cooperative landscape is any indicator, the US may finally move beyond the tensions sparked by the Snowden revelations and mend cyber relations with the rest of the global community. It’s ironic that Russia and China may play the determining hand in creating that outcome.

Andrea Little Limbago

To Forecast Global Cyber Alliances, Just Follow the Money (Part 2): Cooperation in the Cyber Domain - A Little-Noticed Global Trend That is Mirroring Economic Regionalism

This latest development in the realm of cyber cooperation is by no means unique. In fact, the US has signed its own cyber security agreement with Russia (although it is not as comprehensive as the potential Sino-Russian one), as well as agreements with India and the EU, one with Australia as part of a defense treaty, and a cyber security action plan with Canada. Similarly, the EU has formal cyber agreements with Japan, and the UK with Israel, while Japan and Israel have also formed their own bilateral cyber security agreement. India has cyber security agreements with countries as diverse as Kazakhstan and Brazil. RTAs are also being augmented with the inclusion of cyber. The African Union, the Shanghai Cooperation Organization, and the Council of Europe’s Budapest Convention are all examples of this. This pattern parallels one found in the economic arena, with cooperative agreements often closely following geopolitical affinities.

To better understand the impact of future cooperative cyber security agreements, policymakers should revisit the economic models and RTAs of the last quarter century – looking especially at the divergent perspectives that RTAs would either be building blocs or stumbling blocs of a global international order. The building bloc camp believes the RTAs are merely a stepping-stone toward global integration. The stumbling bloc camp believes that RTAs are a new form of neo-mercantilism, which would lead to protectionist walls built around member-states. These camps have theoretical equivalents in today’s cyber domain. The stumbling bloc argument has profound parallels to discussions around the Balkanization of the Internet (i.e. the Splinternet), while the building bloc camp is representative of those suggesting a global diffusion of the Internet. In fact, these two perspectives greatly mirror the divergent ways in which China and Russia approach the Internet (i.e. cyber-nationalism) as opposed to the US approach (i.e. global integration).

While cyberspace will continue to be portrayed as a combative domain as long as attacks persist, policymakers cannot ignore the cooperative aspects of cyber, which increasingly reflect the larger geopolitical and economic landscape. Beijing and Moscow have been expanding collaboration on a range of economic issues. While it’s convenient to point to Sino-Soviet tensions during the Cold War to discount any trans-Asian partnerships by these two giants, such a heuristic not only would be erroneous but it would be detrimental to understanding global cyber trends. These two countries are increasingly aligned diplomatically, and even more so economically. This past spring, Russia and China signed an agreement between their largest banks to exchange in local currencies, bypassing the historic role of the dollar. This summer, the two countries signed a more comprehensive agreement to further trade in local currencies, again eliminating the need for the US dollar. If the latest rumors are correct, next week Russia and China will sign a cyber security agreement at–of all places–the Asia Pacific Economic Cooperation (APEC) summit.

APEC will provide a global forum for China to assert an agenda of greater economic integration in the region, including a push for the Asian Infrastructure Investment Bank (AIIB). The AIIB is viewed as a Chinese attempt to restructure the post-World War II economic order established by the US and Europe. The US has openly challenged the creation of the AIIB for precisely this reason, as well as the possibility that it could emerge as a competitor to the World Bank (which was created at the Bretton Woods conference as one of the three pillars of the new Western-dominated global order). While China pushes forward with the AIIB, the US continues to press for the Trans-Pacific Partnership (TPP), a proposed free-trade agreement among a dozen states in the Asia-Pacific region that currently excludes China. China claims the TPP is a US attempt to contain it regionally and has been promoting its own alternatives, such as the AIIB and the Shanghai Cooperation Organization. Now, with a potential cyber agreement between Russia and China, this tit-for-tat behavior is likely to manifest overtly in the cyber domain.

To Forecast Global Cyber Alliances, Just Follow the Money (Part 2): Cooperation in the Cyber Domain - A Little-Noticed Global Trend That is Mirroring Economic Regionalism

Andrea Little Limbago

To Forecast Global Cyber Alliances, Just Follow the Money (Part 1): Understanding a Sino-Russian Cyber Agreement through Economic Regionalism

Former Secretary of Defense Leon Panetta called cyberspace “the battlefield of the future,” and this characterization of the cyber domain has only intensified as cyber attacks grow more prevalent and disruptive. But this militarization of the cyber domain often masks the cooperation occurring simultaneously with rising geopolitical friction. Rumors of a Sino-Russian cyber agreement have sparked alarm, and are a reminder that both cooperation and conflict are natural outcomes as states jockey for power in cyberspace.

The rumored Sino-Russian cyber agreement is just the latest in a global trend of states signaling diplomatic preferences and commitments via formalized cooperative cyber security agreements. Cooperation in cyberspace in the modern era is reminiscent of the transition to economic cooperation in the post-World War II era and the military cooperation that dominated the earlier eras. In each case, states rely upon those distinct domains to signal affinities and exert power. Since the latter part of the 20th century, economic regionalism has become the defining mode of cooperation among states, in many instances replacing the role alliances once played. With that in mind, policymakers should look to the economic cooperative landscape as a foundation for forecasting the future of cyber security cooperation.

Sino-Russian collaboration across the monetary, commercial, and investment space reveals ever-tighter integration between the two countries, so a cyber agreement should come as no surprise to those who follow global economic relations. However, the real insights may come from using economic regionalism to assess the implications of this rumored agreement. While a Sino-Russian agreement could be extraordinarily disruptive to the global order, it may have unintended positive ramifications for the US. In fact, such an agreement may encourage other countries across the globe to ease the tensions with the US that have persisted since the Snowden disclosures. Given the current divergent approaches to the role of the Internet, most states are likely to find a universal approach to the Internet much more appealing than the model of censorship and control that Russia and China represent. A quick review of economic regionalism illustrates the role of agreements, and soft power, in shaping global geopolitical partnerships.

Economic regionalism constitutes the range of economic relations between states, the most prevalent of which are regional trade agreements (RTAs). RTAs proliferated rapidly beginning with the end of the Cold War and the subsequent global economic liberalization. According to the World Trade Organization, there are currently 379 RTAs in force. In many cases, these RTAs have taken on military cooperative aspects, as with West Africa’s Economic Community of West African States (ECOWAS). In fact, with the rise of globalization, RTAs often serve as the preferred mode of cooperation as formal alliances have declined. Similarly, cyber security cooperative agreements may soon become the modus operandi for power politics cooperation across the globe, superseding or augmenting the role of economic agreements.

While the impact of today’s RTA-influenced global economic order has been debated considerably, it is clear that cooperation in cyberspace is following a similar structure to that of cooperation in the commercial domain over the last 25 years. In a seminal overview of global political economy, Robert Gilpin notes that, “Important factors in the spread of economic regionalism include the emergence of new economic powers, intensification of international economic competition, and rapid technological developments…Economic regionalism is also driven by the dynamics of an economic security dilemma.” It’s easy to foresee a future wherein “cyber” replaces “economic” in Gilpin’s analysis. In fact, it’s not a stretch to imagine a cyber security dilemma emerging in response to a Sino-Russian cyber security agreement.

Andrea Little Limbago

Back to the Future: Leveraging the Delorean to Secure the Information Superhighway

In the cult classic trilogy Back to the Future, Doc claims, “Where we’re going, we don’t need roads.” He’s referencing 2015, and his assertion reminds us just how difficult it is to forecast the future of modern technology. The movies also remind us how tempting it can be to reflect on how things might have been. The current cyber security landscape is ripe for such reflection. What if you could go back in time, knowing what you know today, and alter the armed forces’ approach to cyber security? This was the focus of a dinner I recently had the privilege of attending at the United States Naval Academy Foundation (USNAF), which addressed the specific question,

“Knowing what you know now about cyber threats, cyber espionage, etc., if you could go back to the year 1999 (15 years ago), what advice would you give the armed forces regarding what is needed to prepare for the future…which is now. And how are we doing compared to what you would have said?”

Below are some of the key themes that emerged from this lively discussion, which brought together a diverse range of military, academic and industry perspectives, though unfortunately without the assistance of a DeLorean to implement the recommendations. But it’s never too late, and many of these themes and recommendations can help inform future capabilities and the structure of the cyber workforce:

Cyber-safe as a Precondition, Not an Afterthought 

For the last fifteen years, cyber security has been treated as a luxury, not a necessity. This has created a technical debt that is difficult but essential to overcome. The acquisition process, warts and all, is a critical component for implementing cyber-safe requirements and ensuring that everything is built to a predefined minimum standard of cyber-safety. Cyber-safe as a precondition would have produced many unforeseen but beneficial externalities beyond the obvious one of improved cyber security. For example, users who demand modern web experiences but are currently stuck using archaic web applications would have greatly benefited from this approach. Too often, analytic solutions must be compatible with a five-year-old web browser (not naming names) that no longer receives patches. A key challenge in the cyber domain, and really across the analytic spectrum, is creating modern applications for the community that are on par with users’ experiences in the unclassified environment. But in a world with cyber-safe as a requirement, users could benefit from modern web applications and all of the user-experience features and functionality that accompany modern web browsers. Data storage, indexing, processing, and many other areas well beyond data analysis would benefit from an a priori cyber-safe requirement for all technologies. Cyber-safe should not be viewed as an afterthought, and the armed forces must overcome significant technical debt to achieve greater cyber security.

Revolutionary, not Evolutionary, Changes to the Cyber Mindset 

In addition to the technology itself, cyber practitioners are equally essential for successful cyber security. During the discussion, we debated the opportunities and challenges associated with greater inclusion of cyber experts who may follow what are currently viewed as non-traditional career tracks (i.e., little or no formal computer science experience). Including these non-traditional experts would require closing significant gaps in both pay and culture to attract many of the best and brightest in cyber security. While this may be a longer-term solution, several near-term and more tangible recommendations also emerged. The notion of a military version of the Black Hat conference (which I wrote about here) gained some traction within the group. This type of forum could bring together cyber practitioners across the military, academic and industry spectrum to highlight innovative research and thought leadership and ideally bridge the gap between these communities. There was also interest in formulating analogies between the cyber domain and current practices and doctrine. These would likely be geared more toward tactical application and technical training, but would be pertinent at the strategic and policy level as well. Frameworks and analogies are useful heuristics, and should be emphasized to help evolve our thinking within the cyber domain.

Redefining Cyberwarriors 

The US government has not been shy about its plans to dramatically expand its cadre of cyberwarriors. However, this usually entails an emphasis on STEM-centric training applied to information security. This is the bedrock of a strong cyber security foundation, but it is not enough. Everyone, regardless of discipline, must become cyber competent. The USNA has already started down this path ahead of most other academic institutions. Upon graduation, every student will have completed two core cyber courses, many will have taken additional interdisciplinary cyber electives, and this year will be the second in which graduates can major in cyber operations. We discussed the need to further expand upon this core, especially in areas such as law that will enable graduates to navigate the complicated legal hurdles encountered within the cyber domain.

As expected with any paradigm shift, there has been resistance to this approach. Nevertheless, the USNA continues to push forward with dual cyber tracks: one for cyber operations majors, and another for other majors to maintain cyber competency. This will pay great dividends in both the short and long term. Having now spent a significant amount of time with diverse groups of people from engineering, humanities and social science backgrounds, it is clear that linguistic and cultural divisions exist among these groups. Bridging this divide has longer-term implications for cyber competency both at the policy and tactical levels, and it can also spark innovation in the cyber security domain. It will ensure that cyber security technologists understand how their work fits into the larger mission, while similarly elevating technical cyber competency among military leaders and decision makers.

Expanding the notion of what constitutes a cyber warrior may in fact be one of the most important recommendations we discussed. Cyber can no longer be relegated to a niche competency only required for a small percentage of the workforce. The situation reminds me of quite possibly my favorite quote. When releasing the iPad a few years back, Steve Jobs noted, “It’s in Apple’s DNA that technology alone is not enough. It’s technology married with liberal arts, married with the humanities, that yields the results that make our hearts sing.” Knowing what we know now about the great potential for innovation in solutions that draw from technology as well as other disciplines, perhaps this same sort of cross-disciplinary competency can be applied equally to cyber challenges, which will only become more complex and pose even greater challenges to our national interests.

Andrea Little Limbago

Challenges in Data-Driven Security

DEFCON 22 was a great learning experience for me. My goal was to soak up as much information security knowledge as possible to complement my existing data science experience. I grew more excited as each new talk taught me more security domain knowledge. But as Alex Pinto began his talk, this excitement turned to terror.

I knew exactly where he was going with this. And I also knew that any of those marketing blurbs about behavioral analysis, mathematical models, and anomalous activity could easily have been from Endgame. I had visions of being named, pointed out, and subsequently laughed out of the room. None of that happened, of course. Between Alex’s talk and a quick Google search, I determined that none of those blurbs were from my company. But that wasn’t really the point. They could have been.

That’s because we at Endgame are facing the same challenges that Alex describes in that talk. We are building products that use machine learning and statistical models to help solve security problems. Anyone doing that is entering a field littered with past failures. To try and avoid the same fate, we’ve made sure to educate ourselves about what’s worked and what hasn’t in the past.

Alex’s talk at DEFCON was part of that education. He talked about the curse of dimensionality, adversaries gaming any statistical solution, and algorithms detecting operational rather than security concerns. This paper by Robin Sommer and Vern Paxson is another great resource that enumerates the problems past attempts have run up against. It discusses general challenges facing unsupervised anomaly detection, the high cost of false-positive and false-negative misclassifications, the extreme diversity of network traffic data, and the lack of open and complete data sets to train on. Another paper critiques the frequent use of an old DARPA dataset for testing intrusion detection systems, and in doing so reveals many of the challenges facing machine learning researchers looking for data to train on.
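
To make the cost of false positives concrete, here is a minimal worked sketch of the base-rate effect in Python. The rates below are hypothetical assumptions, not figures from the papers mentioned above: one malicious event per 10,000, and a detector that flags 99% of attacks while mislabeling 1% of benign events. Bayes’ rule then gives the fraction of alerts that are real.

```python
def alert_precision(base_rate: float, true_positive_rate: float,
                    false_positive_rate: float) -> float:
    """Fraction of raised alerts that correspond to real attacks (Bayes' rule)."""
    true_alerts = true_positive_rate * base_rate            # attacks correctly flagged
    false_alerts = false_positive_rate * (1.0 - base_rate)  # benign events wrongly flagged
    return true_alerts / (true_alerts + false_alerts)


# Hypothetical rates: 1 attack per 10,000 events, 99% detection, 1% false alarms.
precision = alert_precision(base_rate=1e-4, true_positive_rate=0.99,
                            false_positive_rate=0.01)
print(f"{precision:.2%} of alerts are real attacks")
```

Under these assumed rates, roughly 1 alert in 100 is a real attack, which shows how even a small false-positive rate can swamp analysts when malicious activity is rare.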

Despite all that pessimism, there have been successes using data science techniques to solve security problems. For years here at Endgame, we’ve successfully clustered content found on the web, provided data exploration tools for vulnerability researchers, and used large-scale computing resources to analyze malware. We’ve been able to do this by engaging our customers in a conversation about the opportunities (and the limitations) presented by data science for security. The customers tell us what problems they have, and we tell them what data science techniques can and cannot do for them. This very rarely results in an algorithm that will immediately identify attackers or point out the exact anomalies you’d like it to. But it does help us create tools that enable analysts to do their jobs better.

There is a trove of other success stories included in this blog post by Jason Trost. One of these papers describes Polonium, a graph algorithm that classifies files as malware or not based on the reputations of the systems they are found on. This system avoids many of the pitfalls mentioned above. Trustworthy labeled malware data from Symantec allows the system to bootstrap its training. The large-scale, reputation-based algorithm makes the system difficult to game by any means other than file obfuscation.
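
The intuition behind reputation-based classification can be sketched in a few lines. This is a deliberately simplified stand-in, not Polonium itself: Polonium uses belief propagation over a machine-file graph, while the sketch below uses plain iterative averaging. The machine IDs, file names, reputation priors, and the 50/50 blending weight are all hypothetical.

```python
from collections import defaultdict


def propagate_scores(edges, machine_reputation, known_file_labels, iterations=10):
    """Simplified guilt-by-association on a machine-file bipartite graph.

    edges: (machine_id, file_id) pairs meaning "file was seen on machine".
    machine_reputation: machine_id -> prior in [0, 1] (1 = trustworthy);
        every machine in edges must have an entry.
    known_file_labels: file_id -> 1.0 (known good) or 0.0 (known malware).
    Returns file_id -> score in [0, 1]; low scores suggest malware.
    """
    files_on = defaultdict(set)
    machines_with = defaultdict(set)
    for machine, f in edges:
        files_on[machine].add(f)
        machines_with[f].add(machine)

    machine_score = dict(machine_reputation)
    file_score = {f: known_file_labels.get(f, 0.5) for f in machines_with}

    for _ in range(iterations):
        # Unlabeled files take the average score of the machines they appear on.
        for f, machines in machines_with.items():
            if f not in known_file_labels:
                file_score[f] = sum(machine_score[m] for m in machines) / len(machines)
        # Machines blend their prior with the average score of the files they host.
        for m, files in files_on.items():
            avg = sum(file_score[f] for f in files) / len(files)
            machine_score[m] = 0.5 * machine_reputation[m] + 0.5 * avg
    return file_score


# Hypothetical toy graph: m3 is low-reputation and hosts known malware.
edges = [("m1", "a.exe"), ("m2", "a.exe"), ("m3", "x.exe"),
         ("m3", "bad.exe"), ("m1", "b.exe")]
reputation = {"m1": 0.9, "m2": 0.8, "m3": 0.2}
scores = propagate_scores(edges, reputation, known_file_labels={"bad.exe": 0.0})
print(scores)
```

In this toy graph, x.exe, which appears only on the low-reputation machine that also hosts known malware, ends up with a much lower score than a.exe, which appears on two trusted machines.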

Success stories like these show that data-driven approaches can help solve information security problems. When developing those solutions, it’s important to understand the challenges that have tested past approaches and always be cognizant of how your approach will avoid them.

We’ll use this blog over the next few months to share some of the successes and failures we here at Endgame have had in this area. Our next post will focus on our application of unsupervised clustering for visualizing large, high dimensional data sets. Stay tuned!

Phil Roth

Soft Power is Hard: The World Internet Conference Behind the Great Firewall

For three days, Chinese citizens are able to tweet at will and access Google, Facebook, and other forms of social media and traditionally censored content, but only if they are in the historic town of Wuzhen, where China is currently hosting the World Internet Conference. During this temporary reprieve from Internet censorship in Wuzhen, the rest of the country experienced a surge in censorship targeted at blocking access to several media outlets such as The Atlantic and the content delivery network Edgecast. The conference appears to have been put together in response to a similar series of conferences on global cyber norms led by the UK, South Korea, Hungary and the Netherlands, and it’s just the latest effort that China has made to influence and structure 21st century cyberspace norms. However, just as China failed to conceal the pollution during last week’s APEC summit, it seems that the government is encountering similar challenges in its attempt to simultaneously disguise the Great Firewall and promote Internet freedoms. The conference illuminates the stark contrast between China’s version of a state-controlled Internet within sovereign borders and the free and open Internet promoted by democratic states across the globe.

The goal of the World Internet Conference is to “give a panoramic view for the first time of the concept of the development of China’s Internet and its achievements,” according to Lu Wei, the minister of China’s new Cyberspace Administration. However, the conference may inadvertently highlight the hypocrisy of an uncensored Internet conference occurring within one of the most censored countries in the world. In fact, much of the world seems absent from what was meant to be a global conference, understanding full well the cognitive dissonance that seems to have escaped the Chinese leadership when organizing it. Only a handful of the speakers are non-Chinese, and in general the world’s biggest Internet players are absent from the discussion.

For several years, China has leveraged its clout in an attempt to shape global cyberspace norms. China and Russia jointly proposed the International Code of Conduct for Information Security to the United Nations, ironically calling for a free and open global Internet while their domestic censorship continues to expand. Just as rising powers exerted their influence to shape the post-World War II international order, China is similarly leaning on extant institutions, and also introducing new international institutions, to shape the cyber norms of the 21st century global order. However, China fails to grasp the importance of soft power in shaping global norms of any kind. Power can be achieved via coercion, payment, or attraction. Soft power occupies the realm of attraction, and of promoting values that are attractive to others. As Joseph Nye explained last year, China (and Russia, for that matter) is failing miserably at soft power because it fails to account for the attraction component of the equation. The World Internet Conference makes this abundantly clear.

As China continues to exert influence over the global cyber commons, there is certainly cause for concern that it might extend its sphere of influence and encourage others to limit Internet freedoms. As William Nee of Amnesty International notes, “Now China appears eager to promote its own domestic Internet rules as a model for global regulation. This should send a chill down the spine of anyone that values online freedom.” While concern is warranted, it would be myopic to overreact and ignore the vital component of attraction within soft power. What China fails to understand is that its bid for soft power will struggle without that attraction. Soft power is only truly effective when it promotes universal values such as freedom and openness, not dictatorial control over access to information.

Andrea Little Limbago

Is This the Beginning of the End of “Duel”-track Foreign Policy?

The Iranian nuclear negotiations occupy a persistent spot in the foreign policy news cycle. The Associated Press recently reported that Iran has agreed to a list of nuclear concessions. Although still improbable, even minimal collaboration between the United States and Iran appears more likely now than at any point in recent memory. Unless, of course, you happened to stumble upon the revelations of Operation Cleaver, which has been largely ignored by all but the tech media outlets. The report highlights an alleged widespread Iranian cyber campaign targeted at critical infrastructure in about a dozen countries, including the United States. Just as we’re seeing the first glimpses of potential US and Iran cooperation in the nuclear realm, the opposite is happening in cyberspace. This uncomfortable reality highlights the modern age of diplomacy, wherein diplomacy in the physical world and diplomacy in the virtual world are completely orthogonal.

Congressman Mike Rogers, Chairman of the House Intelligence Committee, is one of the few policymakers who has actually noted this potential relationship between policy in the physical and virtual worlds, stating that if the nuclear negotiations fail, Iran could resume cyber activity. Unfortunately, as Operation Cleaver highlights, Iranian cyber activity targeting physical infrastructure has likely been escalating, not de-escalating, over the last two years. Operation Cleaver is perhaps the timeliest example of the Janus-faced nature of foreign policy, which has persisted for well over a decade and is not unique to Iranian-US relations. Take the recent APEC meeting, for example, where the US and China brokered a deal to counter climate change. This occurred within weeks of an FBI warning of a widespread Chinese cyber campaign targeted at both the US private sector and government agencies, and within days of the announcement of a September breach at the US National Weather Service. This, too, has been linked to China. Similarly, the US and Russia continued the START nuclear arms reduction negotiations earlier this year just as cyber-attacks escalated, some of which were targeted at US federal agencies. Of course, both states have actually increased their deployed nuclear forces since this past March, but the two countries are nevertheless still on track for additional negotiations in 2015. It’s not unusual for states to pursue divergent relationships across distinct areas of foreign policy. Cooperation vacillates between the various arenas, but rarely does it take on the dueling nature we see occurring between the physical and virtual worlds.

The Director of National Intelligence, James Clapper, and many, many other leaders have described the modern era as containing an unprecedented array of diverse and dynamic threats. This brings new challenges, of course, but perhaps one of the most striking challenges remains largely unspoken. Foreign policy in the modern era has thus far differentiated relationships in the physical and virtual worlds. Will this remain a distinct, modern foreign policy challenge? With the continued trend of the private sector surfacing foreign nation-state cyber campaigns, it seems 2014 may mark the beginning of the end of dueling foreign policies. The ongoing series of revelations of alleged foreign states and their affiliates targeting the US public and private sectors (e.g. China’s PLA Unit 61398 and Axiom group, Russian association with the JP Morgan breaches, North Korea with Sony, and now Operation Cleaver) is likely indicative of the future “outing” of cyber behavior by the private sector. In the future, the US is likely to leverage disclosures made by the private sector, which in turn provides the government the luxury of concealing or revealing its own information, and can even assist negotiations across the diplomatic spectrum.

This period of disparate US policies in the physical and virtual worlds will be increasingly difficult to juggle in light of publicized revelations of cyber campaigns conducted against US federal agencies and corporations. At some point, public opinion will reach a tipping point and demand a more coordinated response and defense against cyber campaigns by foreign states. It will be increasingly difficult to maintain a two-track foreign policy as new revelations occur. That tipping point may still be in the distant future, as the US public remains largely unaware of many of these campaigns because they are not broadly publicized. In fact, many of these foreign-sponsored cyber campaigns, especially those targeted against federal agencies, are covered only by tech-focused media outlets. We’ll spend some time examining this particular trend in more detail in a future post.

Andrea Little Limbago

Blurred Lines: Dispelling the False Dichotomy between National & Corporate Security

Several US government agencies have experienced targeted cyber attacks over the last few months. Many believe China is responsible for cyber attacks on the Office of Personnel Management, the US Postal Service and National Weather Service. Russia has been linked to many recent breaches, including those on the White House and State Department unclassified networks. Given the national security implications of such breaches, these attacks should have monopolized the news cycle. However, they have barely registered. Conversely, the data breaches at large companies such as Sony, Home Depot, Target and Neiman Marcus have dominated the news and have led many Americans to rank concern over hacking higher than any other criminal activity. But characterizing these events as solely private sector or public sector breaches oversimplifies the state of cyber security today. Many of the private sector intrusions are linked to Russia, China, Iran, and now even North Korea. While attribution of the Sony breach remains contested, a North Korean spokesman claimed it was part of the larger struggle against US imperialism. In fact, many of these private sector breaches have been directly linked to or are considered retaliation for various aspects of US foreign policy. Drawing a rigid line between public and private sector breaches is not only erroneous, but it also masks the reality of the complex cyber challenges the US faces.

From Unity Against a Common Threat to Disunity Against a Hydra

In the late 1980s and early 1990s, Japan was perceived as a greater threat to US security than the Soviet Union. The private sector was quite vocal during this time, providing evidence of dumping and unfair trade practices, while supporting voluntary export restraints and a series of other protectionist measures for the US domestic sector. While one can question the success of the policies (and assessment of the threat!), it is clear that a unified understanding of a common threat among private and public sectors greatly enhanced the efficiency with which the US was able to respond. It is this common understanding between the two groups that is still missing today.

Russia, China, Iran and many, many other groups have been elevating cyber-attacks on the federal government and private sector for well over a decade. China has been wielding cyber attacks against federal agencies since at least 1999, when it targeted the National Park Service and the Departments of Energy and Interior. However, this is no longer a government-to-government problem, given the rise of non-state actors as both perpetrators (e.g., Syrian Electronic Army) and victims (e.g., multi-national corporations). Each kind of attack, regardless of state or non-state actor involvement, has both national security and economic implications. For instance, Target’s profits and reputation have taken a big hit following last year’s credit card breach. Home Depot faces similar economic risk over the loss of customers following its data breach. It’s too soon to tell exactly how much financial and reputational damage the breach at Sony will cause. These private sector breaches also have national security implications, especially when targeted at the financial sector and critical infrastructure, which is increasingly a target of cyber-attacks by foreign governments (e.g., Operation Cleaver). Despite these similarities in adversaries, there remains a stark disconnect in the portrayal and general contextualization of breaches in the private and public sectors.

Technical Similarities

These private and public sector breaches exhibit not only similar threat profiles, but also technical similarities. These attacks are indicative of the larger tactics, techniques and procedures (TTPs) of adversaries as they conduct reconnaissance and trust-building intrusions that lead to major attacks such as the Sony breach. In many cases, the initial access to the target systems was through third party contractors, both government and commercial, as well as through targeted spear phishing and watering hole attacks. In each case, the commonality is leveraging trust. From an attacker’s standpoint, every breach of trust enables more opportunity. Successful spear phishing campaigns gather enough information about their targets to craft the message most likely to entice a click. In the case of recent federal agency breaches, it is important to remember that adversaries conduct reconnaissance of networks prior to escalating to major attacks, and they often begin with lower value targets before escalating to higher value targets. Every seemingly harmless intrusion must be viewed as a first step toward a larger attack, and not an end in and of itself. If an attacker compromises a government office, what information does that office have that could be used to further compromise both government and commercial organizations? Something seemingly innocuous, like email addresses of contractors, could be used to launch a new targeted operation. At some point, people make mistakes, and attackers thrive on mistakes. They have the benefit of time and information to make the best decision about how to increase their trust until critical systems and information have been infiltrated. In short, the TTPs, especially the exploitation of trust to conduct ever-greater intrusions, are very similar in private and public sector breaches.

Could More Convergence Lead to a Unified Response?

Last week, the Senate Banking Committee discussed cybersecurity in the financial sector, including the Cybersecurity Information Sharing Act. Clearly, this is an important step. However, some of the major stakeholders in the financial industry were absent from this discussion, further perpetuating the divide between the public and private sectors. Only when there is a common understanding of the threats and challenges of cyberspace can the two sides come together and provide more holistic and effective responses. The cyber attacks on federal agencies and the private sector must finally be elevated within popular discourse and be understood for what they are: reconnaissance and trust-building intrusions, increasingly by the same foreign adversaries. As news breaks of another cyber attack on a federal agency or private company, it would be much more helpful if it were placed in the larger context as a targeted national security breach. A unified response by the US first requires a unified understanding of the threat. Absent a coherent and integrated understanding of the threat, attacks against banks, corporations and federal agencies will only continue to grow.

Blurred Lines: Dispelling the False Dichotomy between National & Corporate Security

Andrea Little Limbago & Cody Pierce

Understanding Crawl Data at Scale (Part 1)


A couple of years ago, in an effort to better understand technology trends, we initiated a project to identify typical web site characteristics for various geographic regions. We wanted to build a simple queryable interface that would allow our analysts to interact with crawl data and identify non-obvious trends in technology and web design choices. At first this may sound like a pretty typical analysis problem, but we have faced numerous challenges and gained some pretty interesting insights over the years.

Aside from the many challenges present in crawling the Internet and processing that data, at the end of the day we end up with hundreds of millions of records, each with hundreds of features. Identifying “normal trends” across such a large feature set can be a daunting task, and traditional statistical methods really break down at this point: they work well for one or two variables at a time but are rendered pretty useless once you hit more than 10 variables. This is why we have chosen to use cluster analysis in our approach to the problem.

Machine learning algorithms, the Swiss Army knife of a data scientist’s toolbox, fall into three broad categories: supervised learning, unsupervised learning, and reinforcement learning. Although mixed approaches are common, each of the three lends itself to different tasks. Supervised learning is great for classification problems where you have a lot of labeled training data and want to identify appropriate labels for new data points. Unsupervised techniques help determine the shape of your data, grouping data points by mathematical similarity. Reinforcement learning includes a set of behavioral models for agent-based decision-making in environments where rewards (and penalties) are handed out only occasionally (like candy!). Cluster analysis fits squarely within unsupervised learning but, in many scenarios, can also take advantage of supervised learning (making it semi-supervised learning).

So what is cluster analysis and why do we care? Consider web sites and features of those sites. Some sites will be large, others small. Some will have lots of images; others will have lots of words. Some will have lots of outbound links, and others will have lots of internal links. Some web sites will use Angular; others will prefer React. If you look at each feature individually, you may find that the average web site has 11 pages, 4 images and 347 words. But what does that get you? Not a whole lot. Instead, let’s sit back and think about why some sites may have more images than others or choose one JavaScript library over another. Each webpage was built for a purpose, be it to disseminate news, create a community forum, or blog about food. The goals of the web site designer will often guide his or her design decisions. Cluster analysis applies #math to a wide range of features and attempts to cluster websites into groups that reflect similar design decisions.
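The post does not say which clustering algorithm is used, so purely as an illustration, here is a minimal k-means sketch in plain Python (standard library only). It assumes a hypothetical feature set of (pages, images, words) per site and invented toy data. Features are standardized first so that word counts do not swamp page counts, and centroids are seeded deterministically with a farthest-point rule rather than randomly. Real crawl data with hundreds of features would call for a library implementation and a principled choice of k.

```python
from statistics import mean, pstdev

# Hypothetical toy data: one row per site, features = (pages, images, words).
SITES = [
    (40, 10, 900), (35, 12, 850), (45, 8, 950),   # text-heavy, news-like
    (12, 120, 80), (10, 140, 60), (15, 110, 90),  # image-heavy, gallery-like
    (5, 3, 300), (6, 4, 320), (4, 2, 280),        # small, blog-like
]


def standardize(rows):
    """Rescale each feature to zero mean and unit variance (z-scores)."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # avoid division by zero
    return [[(x - m) / s for x, m, s in zip(row, mus, sds)] for row in rows]


def dist2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_centroids(points, k):
    """Deterministic farthest-point seeding: start from the first point, then
    repeatedly add the point farthest from every centroid chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(dist2(p, c) for c in centroids)))
    return [list(c) for c in centroids]


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assigning points to the nearest centroid
    and moving each centroid to the mean of its assigned points."""
    centroids = init_centroids(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: dist2(p, centroids[j])) for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # leave an empty cluster's centroid where it is
                centroids[j] = [mean(col) for col in zip(*members)]
    return labels, centroids


labels, centroids = kmeans(standardize(SITES), k=3)
print(labels)
```

Sites with similar design profiles end up sharing a label, so the nine rows collapse into three groups that can be compared in aggregate.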

Once you have your groups, generated by #math, you’ve just made your life a whole lot simpler. A few minutes (or hours) ago you had potentially thousands or millions of items to compare across hundreds of fields. Now you’ve got tens of groups that you can compare in aggregate. Additionally, you now know what makes each group a group and how it distinguishes itself from one or more other groups. Instead of looking at each website or field individually, now you’re looking at everything holistically. Your job just got a whole lot easier!
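Once every site carries a cluster label, comparing groups reduces to per-group averages. The sketch below assumes labels have already been produced by some clustering step (they are hard-coded here), uses hypothetical (pages, images, words) features and toy data, and measures how far each cluster's mean sits from the overall mean in units of the overall standard deviation. Calling the largest such gap a group's “defining feature” is our own simple heuristic, not something the post specifies.

```python
from statistics import mean, pstdev

# Hypothetical feature names and toy rows; labels are assumed to come from
# an earlier clustering step (hard-coded here for illustration).
FEATURES = ("pages", "images", "words")
SITES = [(40, 10, 900), (35, 12, 850), (45, 8, 950),
         (12, 120, 80), (10, 140, 60), (15, 110, 90),
         (5, 3, 300), (6, 4, 320), (4, 2, 280)]
LABELS = [0, 0, 0, 1, 1, 1, 2, 2, 2]


def profile_clusters(rows, labels):
    """Summarize each cluster by its size, feature means, and the feature
    whose mean deviates most from the overall mean (heuristic)."""
    cols = list(zip(*rows))
    g_mean = [mean(c) for c in cols]
    g_sd = [pstdev(c) or 1.0 for c in cols]  # avoid division by zero
    profiles = {}
    for lab in sorted(set(labels)):
        members = [r for r, l in zip(rows, labels) if l == lab]
        c_mean = [mean(c) for c in zip(*members)]
        # Distance of each cluster mean from the global mean, in global SDs.
        scores = {f: (cm - gm) / sd
                  for f, cm, gm, sd in zip(FEATURES, c_mean, g_mean, g_sd)}
        profiles[lab] = {
            "size": len(members),
            "means": dict(zip(FEATURES, c_mean)),
            "defining_feature": max(scores, key=lambda f: abs(scores[f])),
        }
    return profiles


for lab, prof in profile_clusters(SITES, LABELS).items():
    print(lab, prof["size"], prof["defining_feature"])
```

With this toy data, the gallery-like cluster is singled out by its image count and the small blog-like cluster by its low page count: a compact answer to what makes each group a group.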

Cluster analysis gives you a few bonus wins as well. Now that you have normal groups of websites, you can identify outliers within the set: sites that are substantially dissimilar from the bulk of their assigned group. You can also use these clusters as labels in a classifier and determine which group a new site fits best.
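Both bonus wins can be sketched with cluster centroids. The outlier rule below (flag a site whose distance to its centroid exceeds three times its cluster's median distance) and the nearest-centroid classifier are illustrative choices of our own, not the authors' method. The data are hypothetical two-feature points assumed to be standardized already, and the labels are assumed to come from an earlier clustering step.

```python
import math
from statistics import mean, median


def centroids_from_labels(points, labels):
    """Mean feature vector of each cluster."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    return {lab: [mean(col) for col in zip(*ps)] for lab, ps in groups.items()}


def find_outliers(points, labels, centroids, factor=3.0):
    """Indices of points lying more than `factor` times their cluster's
    median point-to-centroid distance away from that centroid."""
    dists = [math.dist(p, centroids[lab]) for p, lab in zip(points, labels)]
    outliers = []
    for lab in centroids:
        idx = [i for i, l in enumerate(labels) if l == lab]
        typical = median(dists[i] for i in idx)
        outliers.extend(i for i in idx if dists[i] > factor * typical)
    return sorted(outliers)


def assign(point, centroids):
    """Nearest-centroid classification of a new, unseen site."""
    return min(centroids, key=lambda lab: math.dist(point, centroids[lab]))


# Hypothetical, already-standardized two-feature data with assumed labels.
points = [(0.1, 0.0), (-0.1, 0.1), (0.0, -0.1), (0.05, 0.05), (1.5, 1.5),
          (5.0, 5.1), (4.9, 5.0), (5.1, 4.9), (5.0, 5.0)]
labels = ["A"] * 5 + ["B"] * 4
centroids = centroids_from_labels(points, labels)
print(find_outliers(points, labels, centroids))
print(assign((4.8, 5.2), centroids))
```

With this toy data, the point at (1.5, 1.5) is flagged as an outlier of cluster A, and a new site at (4.8, 5.2) is assigned to cluster B. A median-based threshold is used because, unlike a mean-and-standard-deviation rule, it is not dragged upward by the very outlier it is trying to detect.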

In coming posts, we will go into more detail about how we cluster and visualize web crawl data. Stay tuned!


John Munro