Social media sites are frequently used for stealthy malware command and control (C2). Because many hosts on most networks communicate with popular social media sites regularly, it is very easy for a C2 channel hiding in this traffic to appear normal. Further, there are often rich APIs for communicating via social media sites allowing a malware author to easily and flexibly use the services for malicious purposes. Blocking the HTTP and HTTPS connections to these sites generally is infeasible since it would likely cause a revolt amongst the workforce. Security researchers have discovered multiple malware campaigns that have used social media for C2 capabilities. For example, Twitter has been used to direct downloaders to websites for installing other malware or for controlling botnets. Further, the information posted to a social media site may be obfuscated or Base64 encoded. Even if the malicious social media content is discovered, it would require an astute defender to recognize the intent.
Separately, a few malware strains (e.g., ZeusVM banking trojan) have used image steganography to very effectively disguise information required to operate malware. The stego image represents an effective decoy, with information subtly encoded within something that is seemingly innocuous. This can make malicious intent difficult to uncover. In the case of ZeusVM, an encoded image contained a list of financial institutions that the malware was targeting. However, even a close look at the image would not reveal the presence of a payload. The image payloads were discovered only because the stego images were retrieved on a server containing other malicious files.
We combine these two methods for hiding in plain sight and demonstrate "Instegogram", wherein we hide C2 messages in digital images posted to the social media site Instagram. We presented this research earlier this month at Defcon’s Crypto Village, as part of our larger research efforts that leverage our knowledge of offensive techniques to build more robust defenses. Since our research aims to help inform and strengthen defenses, we conclude with a discussion of some simple approaches for preventing steganographic C2 channels on social media sites, such as bit jamming methods and analysis of user account behaviors.
A Brief History of Steganography
Steganography is the art of hiding information in plain sight and has been used by spies for centuries to conceal information within other text or data. Recently, digital image steganography has been used to obfuscate configuration information for malware. The following timeline contains a brief history of steganography used in actual malware case studies:
JPEG-Robust Image Steganography Techniques & Instagram
Digital image steganography involves altering bits within an image to conceal a message payload. Let’s say Alice encodes a message in some of the bits of a cover image and sends the stego image (with message payload) to Bob, who decodes the message using a secret key that he shares with Alice. If Eve intercepts the image in transit, she is oblivious to the fact that the stego image contains any message at all since the image appears to be totally legitimate both digitally and to the human eye.
A simple steganography approach demonstrates how this is possible. Alice wants to encode the bits 0100 into an image whose first 4 pixels are 0x81, 0x80, 0x7f, 0x7e. Alice and Bob agree in advance (this agreement acts as their shared secret key) that the message will be encoded in row-major order in the image, using the least significant bit (LSB) of each pixel to carry one message bit. Since the LSBs of the first 4 pixels are 1010, Alice must flip the LSBs of the first three pixels so that the LSBs are equal to the desired message bits. Since modifying the LSB changes a pixel's intensity by only 1/255 of the full range, the stego image appears identical to the original cover image.
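The worked example above can be reproduced in a few lines. This is a minimal sketch of plain LSB embedding on a flat list of pixel values, not the code used in the POC; the function names `embed_lsb` and `extract_lsb` are our own illustrative choices.

```python
def embed_lsb(pixels, bits):
    """Return a copy of pixels whose first len(bits) LSBs equal the message bits."""
    if len(bits) > len(pixels):
        raise ValueError("message is longer than the cover")
    stego = list(pixels)
    for i, bit in enumerate(bits):
        # Clear the least significant bit, then set it to the message bit.
        stego[i] = (stego[i] & ~1) | bit
    return stego


def extract_lsb(pixels, n_bits):
    """Read the message back from the LSBs, in the same (row-major) order."""
    return [p & 1 for p in pixels[:n_bits]]


cover = [0x81, 0x80, 0x7F, 0x7E]   # first four pixels from the example
message = [0, 1, 0, 0]             # bits Alice wants to send
stego = embed_lsb(cover, message)
print([hex(p) for p in stego])     # ['0x80', '0x81', '0x7e', '0x7e']
print(extract_lsb(stego, 4))       # [0, 1, 0, 0]
```

Three of the four pixels change, each by exactly one intensity level, which matches the description above.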
Additionally, since Instagram re-encodes uploaded images in JPEG/JFIF form, the steganography encoding for Instegogram must be robust against JPEG compression artifacts.
JPEG images are compressed by quantizing coefficients of a 2D block discrete cosine transform (DCT). The DCT transformation is applied after a color transformation from RGB to YCbCr color space which consists of a luminance (Y) channel, and two chrominance channels, blue (Cb) and red (Cr). The DCT transformation on each channel has the effect of condensing most of the image block’s information into a few upper-left (low frequency) coefficients in the DCT domain. The lossy data compression step in JPEG comes in by applying an element-wise quantization to each coefficient in the DCT coefficient image, with the amount of quantization per coefficient determined by a quantization table specified in the JPEG file. The resulting quantized coefficients (integers) are then compactly encoded to disk.
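To make the quantization step concrete, here is a small pure-Python sketch of the 8x8 DCT followed by element-wise quantization. The quantization table used here is a made-up one whose step size simply grows with frequency; it is not a standard JPEG table and not Instagram's. Real encoders also perform the color transform, chroma subsampling, and entropy coding, which this sketch omits.

```python
import math

N = 8


def dct2(block):
    """2D DCT-II of an 8x8 block, with the same scaling JPEG uses."""
    def c(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = c(u) * c(v) * s
    return out


def quantize(coeffs, table):
    """Element-wise quantization: divide by the table entry and round."""
    return [[round(coeffs[u][v] / table[u][v]) for v in range(N)]
            for u in range(N)]


# A smooth horizontal gradient, level-shifted by 128 as JPEG does.
block = [[100 + 4 * y - 128 for y in range(N)] for _ in range(N)]
# Hypothetical table: coarser steps for higher frequencies.
table = [[8 + 4 * (u + v) for v in range(N)] for u in range(N)]

quantized = quantize(dct2(block), table)
nonzero = sum(1 for row in quantized for k in row if k != 0)
print(quantized[0])
print("nonzero coefficients:", nonzero, "of", N * N)
```

For a smooth block like this one, only a handful of low-frequency coefficients survive quantization, which is where both the compression and the information loss come from.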
One method of encoding a message in a JPEG image is to encode the message bits in the quantized DCT coefficients rather than the raw image pixels, since there are no subsequent lossy steps. But for use with Instagram, an additional step is required. Upon upload, Instagram standardizes images by resizing and re-encoding them using the JPEG file format. This presents two pitfalls in which the message can be clobbered in the quantized DCT coefficients: (1) if Alice's stego image is resized, the raw DCT coefficients can change, since the 8x8 block in the original image may map to a different rectangle in the resized image; (2) if Alice's stego image is recompressed using a different quantization table, this so-called double-compression can change the LSBs that may contain the secret message.
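The double-compression pitfall can be illustrated with a toy model. The sketch below dequantizes a few coefficients and quantizes them again, first with the same step and then with a different one. It is a simplification: it uses a single step size instead of a full table and ignores the pixel-domain rounding and clipping that happen between decompression and recompression. The step sizes 10 and 14 are arbitrary.

```python
def requantize(levels, q_old, q_new):
    """Dequantize integer levels with step q_old, then quantize with step q_new."""
    return [round(k * q_old / q_new) for k in levels]


def lsbs(levels):
    return [k & 1 for k in levels]


levels = [12, -7, 5, 3, -2, 1]     # quantized coefficients carrying one bit each
sent = lsbs(levels)

same_table = requantize(levels, 10, 10)
other_table = requantize(levels, 10, 14)

print("sent bits:        ", sent)
print("same step size:   ", lsbs(same_table))
print("different step:   ", lsbs(other_table))
```

With a matching step the integer levels, and therefore the hidden bits, survive unchanged. With a mismatched step the levels are rescaled and re-rounded, and the LSBs no longer match what was sent.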
To prevent the effects of resizing, all cover images can be resized to a size and aspect ratio that Instagram will accept without resizing. To prevent the double-compression problem, the quantization table can be extracted from an existing Instagram image. In our monitoring of the tables during this research, the same quantization table appeared to be used across all Instagram images. Messages are then encoded in images that use those same quantization tables.
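Quantization tables are stored in a JPEG file's DQT segments (marker 0xFFDB), so extracting them is a matter of walking the segment headers. The following is a minimal sketch of such a parser, assuming a well-formed baseline JPEG; the function name `read_quant_tables` is our own, and the values are returned in the zigzag order in which the file stores them. The same function is also useful defensively, for checking whether an image's tables match those a site is expected to produce.

```python
import struct


def read_quant_tables(data: bytes) -> dict:
    """Return {table_id: [64 values in zigzag order]} from a JPEG's DQT segments."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG (missing SOI marker)")
    tables = {}
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        marker = data[pos + 1]
        if marker == 0xFF:            # fill byte before a marker
            pos += 1
            continue
        if marker in (0xDA, 0xD9):    # start of scan / end of image
            break
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        segment = data[pos + 4:pos + 2 + length]
        if marker == 0xDB:            # DQT: one or more tables per segment
            i = 0
            while i < len(segment):
                precision, table_id = segment[i] >> 4, segment[i] & 0x0F
                i += 1
                if precision == 0:    # 8-bit entries
                    values = list(segment[i:i + 64])
                    i += 64
                else:                 # 16-bit entries
                    values = list(struct.unpack(">64H", segment[i:i + 128]))
                    i += 128
                tables[table_id] = values
        pos += 2 + length
    return tables


# Synthetic stand-in for a real file: SOI, one DQT segment, EOI.
demo = (b"\xff\xd8" + b"\xff\xdb" + struct.pack(">H", 67)
        + b"\x00" + bytes(range(1, 65)) + b"\xff\xd9")
print(read_quant_tables(demo)[0][:8])
```

With real files, one would read the bytes with `open(path, "rb").read()` and compare the resulting dictionaries for two images directly.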
A number of other standard practices can be used to provide additional robustness against image manipulations that might occur upon upload to Instagram. First, pre-encoding the message payload using error correcting coding allows one to retrieve the message even when some bits become corrupted. For small messages, the encoded message can be repeated in the image bits until all image bits have been used. A message storing format that includes a header to specify the message length allows the receiver to determine when the message ends and a duplicate copy begins. Finally, for a measure of secrecy, simple methods for generating a permutation of pixel locations (for example, a simple linear congruential generator with shared seeds between Alice and Bob) can communicate to Bob the order in which the message bits are arranged in the image.
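The practices in this paragraph can be sketched together as follows. This is an illustration under several assumptions, not the POC's scheme: the carrier is a flat list of integers, a 2-byte length header precedes the message, simple repetition with majority voting stands in for a proper error correcting code, and the position permutation is a Fisher-Yates shuffle driven by a linear congruential generator with well-known textbook constants. Note that the header is read from the first copy only, so it is not protected here.

```python
def lcg_permutation(n, seed):
    """Shuffle range(n) using a linear congruential generator and a shared seed."""
    state = seed
    order = list(range(n))
    for i in range(n - 1, 0, -1):
        state = (1664525 * state + 1013904223) % 2**32
        j = (state >> 16) % (i + 1)   # use the higher-quality upper bits
        order[i], order[j] = order[j], order[i]
    return order


def to_bits(data):
    return [(byte >> k) & 1 for byte in data for k in range(7, -1, -1)]


def from_bits(bits):
    return bytes(sum(b << (7 - k) for k, b in enumerate(bits[i:i + 8]))
                 for i in range(0, len(bits), 8))


def encode(carrier, message, seed):
    """Write header + message into the LSBs, repeated until the carrier is full."""
    frame = to_bits(len(message).to_bytes(2, "big") + message)
    if len(frame) > len(carrier):
        raise ValueError("message does not fit in the carrier")
    order = lcg_permutation(len(carrier), seed)
    stego = list(carrier)
    for n, pos in enumerate(order):
        stego[pos] = (stego[pos] & ~1) | frame[n % len(frame)]
    return stego


def decode(stego, seed):
    """Read the length header, then majority-vote each bit across all copies."""
    order = lcg_permutation(len(stego), seed)
    bits = [stego[pos] & 1 for pos in order]
    length = int.from_bytes(from_bits(bits[:16]), "big")
    frame_len = 16 + 8 * length
    copies = len(bits) // frame_len
    if copies == 0:
        raise ValueError("header is corrupt or message is truncated")
    voted = []
    for i in range(16, frame_len):
        ones = sum(bits[c * frame_len + i] for c in range(copies))
        voted.append(1 if 2 * ones > copies else 0)
    return from_bits(voted)


carrier = [128] * 600
stego = encode(carrier, b"whoami", seed=1234)
print(decode(stego, seed=1234))
```

Without the shared seed, a receiver does not know the order in which to read the bits, and isolated bit errors in individual copies are outvoted by the remaining copies.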
Instegogram Overview
Digital image steganography can conceal any message. Stego appeals to malware authors as a C2 mechanism because of its inherent stealthiness. Hosting C2 on social media sites is appealing for the same reason: the traffic is hard to filter out or to identify as malicious. Instegogram combines these two capabilities - digital image steganography and the use of social media for C2 - to mirror the use of social networks for C2, which has increased for years, while exploring the feasibility of using stego on one particular site, Instagram.
The delivery mechanism for our proof of concept (POC) malware was chosen based on today’s most commonly used infiltration methods - a successful spearphish that causes the user to open a document and run a malicious macro. The remote access trojan (RAT) is configured to communicate with specific Instagram accounts that we control, and to which we post images containing messages encoded with our steganographic scheme. The malware includes a steganographic decoder that extracts a payload from each downloaded image, allowing arbitrary command execution on the remote system.
The malicious app continuously checks the Instagram account feed for the next command image, decodes the shell command, and executes it to trigger whatever nefarious behavior is requested in the command. Results are embedded steganographically in another image which is posted to the same account.
As with any steganographic scheme, there is a limited number of characters which can be sent through the channel. In this simple POC, 40 characters could be reliably transmitted in the JPEG stego images. The capacity can be increased using more robust coding techniques as discussed previously.
In short, once the remote system is compromised, encoded images can be posted from the command machine using Instagram’s API. The remote system will download the image, decode it, execute the encoded commands, encode the results in another image, and post back to Instagram. This process can be repeated at will. This attack flow is depicted in the graphic below.
Our Instegogram POC was built on Mac OS X, specifically as an uncertified macOS app developed in Objective-C. To execute our POC, we needed to bypass Apple’s built-in Gatekeeper protection, which enforces code signing requirements on downloaded applications and thereby makes it more difficult for adversaries to launch a malicious application on an endpoint. We discovered a Gatekeeper bypass, disclosed it to Apple, and are currently working with Apple on a fix.
Instagram API and Challenges
Instagram's API is only partially public. Instagram encourages third-party apps to use likes, subscriptions, requests for images, and similar actions, but the specifics of calls for uploads are only provided via iPhone hooks and Android Intents. The webapp is a pared-down version of the mobile app and doesn't allow uploads. This is likely intended to deter spam, bots, and maybe malware C2 architectures.
To work as an effective C2 system, we need to be able to integrate our system with Instagram's to automate uploads, downloads, and comments. Creating an image with an embedded message, transferring it to a cell phone, and then uploading it via the official Instagram app is too cumbersome to be useful. We needed to reverse the upload API.
With the help of some open source efforts and a little web-proxy work, we identified the required fields and formats for all the necessary API calls. Charles Proxy was used to sniff out the payloads of API requests coming from the phone app itself. After the general structure was determined, we used a fake Android user-agent, crafted the body of the request, and were in business. The code demonstrating this is in the "api_access" section of Endgame’s GitHub repo.
It is worth noting that the steganographic encode/decode capabilities and capability to programmatically interact with the Instagram API are not specific to a malware use case. They are tools that can be used for other purposes such as secure hidden messaging.
Detecting & Preventing Instegogram
As previously mentioned, this research is motivated by the requirement to strengthen defenses. Therefore, after successfully implementing the malware, we identified the most robust means to detect and prevent Instegogram. There is a range of measures for detecting and preventing C2 via image steganography as well as additional non-steganography measures that can be implemented.
First, a complementary field of research to steganography is steganalysis, in which a defender aims to: (a) predict whether an image may contain a steganographic payload, and if detected, (b) attempt to recover the message from the stego image. However, using statistical techniques to detect whether an image carries a hidden payload may be infeasible for very small message payloads relative to the image size. Payload recovery is a difficult cryptanalysis problem in its own right, but is of interest only for forensic analysis of a detected C2 channel. Given the challenges in successfully implementing these steganalysis techniques through per-image or per-user defensive strategies, we don’t recommend a steganalysis approach.
In contrast, a much simpler set of measures can be implemented by the social media site owner via site-wide policies that effectively jam potential stego traffic. For example, one can make the channel too noisy for steganographic C2, with minimal visual distortion, through one or more of the following methods:
(1) Regularly and pseudorandomly change the site-wide JPEG quantization table used for re-encoding images. This can induce double-compression problems for simple stego routines like our POC that rely on a specific quantization table. Introducing a quantization table mismatch can reduce the channel capacity for communication, and force sophisticated attackers to resort to more robust and advanced information hiding methods.
(2) Apply other minor transformations that are visually imperceptible but digitally alter the image, such as cropping the image by a few boundary pixels, randomly shifting the image by a few pixels, and/or randomly flipping the LSBs of quantized coefficients. While this does not qualitatively alter the image, it creates a challenging environment for transmitting information via stego C2.
(3) Institute a policy that requires mandatory application of an Instagram filter, which represents a nonlinear transformation in the image domain. While this qualitatively changes the image, it represents a “visually appealing” change, whilst also providing an effective attack on possible stego C2.
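The effect of the LSB-flipping variant of method (2) on a naive LSB channel can be estimated with a short simulation. This sketch works on a generic list of integers standing in for quantized coefficients; the 25% flip fraction and the fixed random seed are arbitrary choices for illustration, not recommended settings.

```python
import random


def jam_lsbs(values, flip_fraction, rng):
    """Flip the LSB of each value independently with probability flip_fraction."""
    return [v ^ 1 if rng.random() < flip_fraction else v for v in values]


def bit_error_rate(sent, received):
    """Fraction of positions whose LSB differs between sent and received."""
    return sum((a ^ b) & 1 for a, b in zip(sent, received)) / len(sent)


rng = random.Random(0)
stego = [rng.randrange(256) for _ in range(10_000)]   # stand-in for stego data
jammed = jam_lsbs(stego, 0.25, rng)

print("bit error rate:", bit_error_rate(stego, jammed))
print("largest change:", max(abs(a - b) for a, b in zip(stego, jammed)))
```

Each value changes by at most one level, so the visual impact is negligible, while roughly a quarter of the hidden bits arrive corrupted. An attacker can compensate with heavier error correction, but only at the cost of channel capacity.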
In addition to the stego-focused mitigations, the social media provider can attempt to detect accounts which may be hosting C2. For example, the provider could look for account access anomalies such as a single account being used in rapid succession from geographically distant locations. Anomalies may also be visible in account creation or other factors.
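One way to look for the access anomaly described above is an "impossible travel" check: flag consecutive accesses to the same account whose implied travel speed is implausible. The sketch below assumes each access event has already been geolocated (for example from its source IP address); the 1000 km/h threshold and the event format are assumptions for illustration.

```python
import math
from datetime import datetime


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def impossible_travel(events, max_kmh=1000.0):
    """events: (timestamp, lat, lon) tuples for one account.

    Returns (t1, t2, km) for consecutive accesses that imply a speed above max_kmh.
    """
    events = sorted(events)
    flagged = []
    for (t1, la1, lo1), (t2, la2, lo2) in zip(events, events[1:]):
        hours = (t2 - t1).total_seconds() / 3600
        dist = haversine_km(la1, lo1, la2, lo2)
        if dist > 0 and (hours == 0 or dist / hours > max_kmh):
            flagged.append((t1, t2, round(dist)))
    return flagged


events = [
    (datetime(2016, 8, 1, 12, 0), 40.71, -74.01),   # New York
    (datetime(2016, 8, 1, 12, 10), 55.76, 37.62),   # Moscow, ten minutes later
    (datetime(2016, 8, 2, 9, 0), 55.76, 37.62),     # Moscow, next day
]
for t1, t2, km in impossible_travel(events):
    print(t1, "->", t2, "covers about", km, "km")
```

A C2 account accessed by both an operator and infected hosts in different regions would tend to trip this check, although VPN and proxy use by legitimate users will also produce false positives.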
Non-steganography based security policies can also be implemented by a local admin in an attempt to defend against this attack:
(1) Limit access to third-party websites: As with any other remote access trojan or backdoor, you want a policy that limits connections to the command and control server. The simplest technical solution would be to block third-party websites entirely if they are not related to the mission. As mentioned earlier, although it is a simple solution, it may be infeasible due to workforce demands.
(2) Outliers in network behavior: Network monitoring can be configured to detect anomalous network behavior, such as responses from multiple infected machines that utilize a single Instagram account for C2.
(3) Android string detection: The Instagram API access code we provided uses an Android network configuration. If your network is primarily Windows endpoints, a user agent string containing Android strings would be an obvious anomaly.
(4) Disable VBA Macros: If not needed, Microsoft Office VBA macros should be disabled to avoid spearphishing campaigns utilizing the infiltration technique adopted by the POC.
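Items (2) and (3) lend themselves to simple log analysis. The sketch below assumes proxy log records are available as dictionaries with `src`, `host`, `user_agent`, and `account` fields. These field names, the sample records, and the user agent strings are all hypothetical, and seeing the user agent or account for HTTPS traffic assumes a TLS-inspecting proxy.

```python
from collections import defaultdict


def is_instagram(host):
    return host == "instagram.com" or host.endswith(".instagram.com")


def flag_android_user_agents(records, windows_hosts):
    """Instagram requests with an Android user agent from known Windows endpoints."""
    return [rec for rec in records
            if rec["src"] in windows_hosts
            and is_instagram(rec["host"])
            and "android" in rec["user_agent"].lower()]


def accounts_shared_by_hosts(records, min_hosts=2):
    """Instagram accounts contacted by at least min_hosts distinct internal hosts."""
    hosts_by_account = defaultdict(set)
    for rec in records:
        if is_instagram(rec["host"]):
            hosts_by_account[rec["account"]].add(rec["src"])
    return {acct: hosts for acct, hosts in hosts_by_account.items()
            if len(hosts) >= min_hosts}


# Hypothetical records for illustration only.
records = [
    {"src": "10.0.0.5", "host": "i.instagram.com",
     "user_agent": "ExampleClient (Android; hypothetical)", "account": "acct_a"},
    {"src": "10.0.0.6", "host": "i.instagram.com",
     "user_agent": "ExampleClient (Android; hypothetical)", "account": "acct_a"},
    {"src": "10.0.0.7", "host": "www.instagram.com",
     "user_agent": "Mozilla/5.0 (Windows NT 10.0)", "account": "acct_b"},
]
windows_hosts = {"10.0.0.5", "10.0.0.6", "10.0.0.7"}

print(len(flag_android_user_agents(records, windows_hosts)), "suspicious requests")
print(accounts_shared_by_hosts(records))
```

Neither check is conclusive on its own: an attacker can change the user agent, and popular accounts are legitimately viewed by many hosts, so results are best treated as leads for further investigation.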
Conclusion
We accomplished our goal of creating a proof of concept malware that utilizes a C2 architecture for nearly untraceable communications via social media and encoded images. We demonstrated a small message channel with the simplest of steganography algorithms. By combining digital image steganography with a prominent trend in malware research - social media for C2 - our research reflects the ongoing vulnerabilities of social media, as well as the novel and creative means of exploiting these vulnerabilities against which modern defenses must be hardened.
There are numerous defensive measures that can be taken to detect and prevent malware similar to Instegogram. As noted, applying Instagram filters or identifying preprocessed images based on quantization tables could be a powerful prevention strategy. Account anomalies can also point at potential issues. However, this is really only possible on the provider side, not the local defender side.
This type of attack is difficult to defend against within a given enterprise. Blocking social media services will be infeasible in many organizations. Local defenders can look at policy-based hardening and more general anomaly detection to defend against attacks like our proof of concept. These steps will help defend against an even broader set of malware.
Malware Timeline Sources
- November 21, 2013 - ZeusVM found retrospectively by Xylibox: http://www.xylibox.com/2014/04/zeusvm-and-steganography.html
- December 11, 2013 (actual VirusTotal date), first reported April 2014 - Lurk Downloader, reported by Dell SecureWorks: https://www.secureworks.com/research/malware-analysis-of-the-lurk-downloader
- January 31, 2014 - ZeusVM first reported by Jerome Segura: https://twitter.com/jeromesegura/status/423180548236771328 (blog: https://blog.malwarebytes.org/threat-analysis/2014/02/hiding-in-plain-sight-a-story-about-a-sneaky-banking-trojan/). The use of stego was discovered by French researcher Xylitol: https://twitter.com/xylit0l
- Late 2014 - Gozi Neverquest/Vawtrak, reported by Dell SecureWorks: https://www.secureworks.com/research/stegoloader-a-stealthy-information-stealer
- June 15, 2015 - Stegoloader first reported by Dell SecureWorks: https://www.secureworks.com/research/stegoloader-a-stealthy-information-stealer
- June 26, 2015 - Evidence of KINS copying ZeusVM, reported by Xylit0l and unixfreaxjp: http://blog.malwaremustdie.org/2015/07/mmd-0036-2015-kins-or-zeusvm-v2000.html
- November 2015 - AdGholas discovered by Proofpoint: https://www.proofpoint.com/us/threat-insight/post/massive-adgholas-malvertising-campaigns-use-steganography-and-file-whitelisting-to-hide-in-plain-sight