New Open Source Repositories for Data Scientists in Infosec

Over the past few years, we have published numerous posts on the benefits and challenges of machine learning in infosec in an effort to help fellow practitioners and customers separate hype from reality. We also believe that contributing to the larger open source community is an essential component of this outreach. In conjunction with Black Hat, DefCon and BSidesLV, we have released two GitHub repositories, each a playground for data scientists in information security.

gym-malware: An OpenAI Gym for Malware Manipulation

First, our research team last week released gym-malware, an open source OpenAI Gym environment for manipulating Windows PE binaries to evade next-gen AV models. The “gym” allows data scientists in information security to simulate realistic black-box evasion attacks against their own machine learning models by training a reinforcement learning agent to compete against them. In contrast to other approaches for attacking machine learning models, this approach is agnostic to the architecture of the model under attack and requires only API access to it. The reinforcement learning agent can probe the model to retrieve a malicious or benign label for any query. By learning through tens of thousands of competitive rounds, the agent can begin to produce, with modest success, functional malware that evades the model under attack.

Data scientists may use and modify this framework to answer questions such as:

How sensitive is my model to evasion attacks for ransomware (or another category)?
Which mutations tend to evade my model most often?
How can I create a killer reinforcement learning agent to bypass my model?

The repository contains a toy machine learning malware model and some preliminary agents (but bring your own malware!) that data scientists can use as a starting point to improve and optimize.
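To make the black-box setup concrete, here is a minimal sketch of the idea in plain Python. It does not use gym-malware's actual API: the toy model, the three byte-appending mutations, the seed sample, and every function name below are hypothetical stand-ins, and the agent is a simple epsilon-greedy bandit rather than the agents shipped in the repository. The only thing the agent ever sees is the malicious or benign label.

```python
import random

# Toy stand-ins only: these byte-appending "mutations" are hypothetical and
# make no attempt to keep a real PE file valid or functional.
def append_padding(sample: bytes) -> bytes:
    return sample + b"\x00" * 64

def append_benign_strings(sample: bytes) -> bytes:
    return sample + b"Copyright Microsoft Corporation " * 4

def append_noise(sample: bytes) -> bytes:
    return sample + b"\xcc" * 64

MUTATIONS = [append_padding, append_benign_strings, append_noise]

def toy_model_label(sample: bytes) -> int:
    """Black-box oracle: returns only a label, 1 (malicious) or 0 (benign)."""
    suspicious = 4 * sample.count(b"EVIL")       # bytes covered by "bad" markers
    reassuring = 9 * sample.count(b"Microsoft")  # bytes covered by "good" markers
    return 1 if (suspicious - reassuring) / len(sample) > 0.2 else 0

def run_episode(sample, values, rng, epsilon=0.2, max_turns=5):
    """Mutate one sample until the model says benign or turns run out."""
    taken = []
    for _ in range(max_turns):
        if rng.random() < epsilon:
            action = rng.randrange(len(MUTATIONS))  # explore
        else:
            action = max(range(len(MUTATIONS)), key=values.__getitem__)  # exploit
        sample = MUTATIONS[action](sample)
        taken.append(action)
        if toy_model_label(sample) == 0:
            return taken, 1.0  # evasion succeeded
    return taken, 0.0

def train(seed_sample, episodes=500, seed=0):
    """Epsilon-greedy bandit: track each mutation's average episode reward."""
    rng = random.Random(seed)
    values = [0.0] * len(MUTATIONS)
    counts = [0] * len(MUTATIONS)
    for _ in range(episodes):
        taken, reward = run_episode(seed_sample, values, rng)
        for action in taken:
            counts[action] += 1
            values[action] += (reward - values[action]) / counts[action]
    return values

SEED_SAMPLE = b"EVIL" * 50 + b"\x90" * 200  # flagged by the toy model

if __name__ == "__main__":
    for mutation, value in zip(MUTATIONS, train(SEED_SAMPLE)):
        print(f"{mutation.__name__}: {value:.2f}")
```

In this toy setup, appending benign strings is the only mutation that can flip the label within five turns, so its estimated value should end up highest. A real agent faces a much harder problem, since its mutations must leave the binary functional.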

You Are Special, But Your Model Probably Isn’t

On a lighter note, at BSidesLV I presented “Your model isn’t that special: zero to malware model in not much code, and where the real work lies”. A GitHub repo accompanies this talk, and contains a series of Jupyter notebooks that demonstrate building deep neural networks for Windows PE malware classification. The playground includes code (bring your own data!) for creating:

A multilayer perceptron using hand-crafted features (feature extraction code included);
An end-to-end convolutional deep learning network for malware detection;
A slightly silly re-work of ResNet for malware, which I’ve named MalwaResNet, for even deeper end-to-end convolutional deep learning for malware detection.

The talk and the notebooks aim to demonstrate that one cannot always simply port sophisticated deep learning models from computer vision and expect them to work immediately for malware classification. Architectures developed to identify cats in images may not be optimally designed for finding malicious content in raw bytes. Deep learning models do require work, and training them can be a challenge. These notebooks point practitioners in the right direction, but the toy demonstrations also highlight some shortcomings. For example, because the notebooks are intended to run on a modest computer, the models are trained on far too little data, for too few epochs, and with untuned optimization parameters. In fact, the simple multilayer perceptron with hand-crafted features and careful attention to the data (bring your own!) can actually produce a decent Windows PE malware model. Each of the model architectures in the repository can be instructive to those who are interested in getting started with feature-based and end-to-end deep learning models in infosec.
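For readers wondering what “hand-crafted features” might look like in practice, here is a small sketch using only the Python standard library. It is not the feature extraction code from the repository: the function names and the particular features (a normalized byte histogram, byte entropy, and log file size) are assumptions chosen for illustration. The resulting fixed-length vector is the kind of input a multilayer perceptron could consume.

```python
import math
from collections import Counter
from typing import List

def byte_histogram(data: bytes) -> List[float]:
    """Fraction of the file taken up by each of the 256 possible byte values."""
    if not data:
        return [0.0] * 256
    counts = Counter(data)
    return [counts.get(value, 0) / len(data) for value in range(256)]

def byte_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (0 for constant data, 8 for uniform)."""
    return -sum(p * math.log2(p) for p in byte_histogram(data) if p > 0)

def extract_features(data: bytes) -> List[float]:
    """Hypothetical 258-dimensional feature vector for one file's raw bytes."""
    return byte_histogram(data) + [byte_entropy(data), math.log1p(len(data))]

if __name__ == "__main__":
    # Bring your own data: any bytes object works, for example a file's contents.
    features = extract_features(b"MZ" + bytes(range(256)) * 4)
    print(len(features), round(features[256], 3))
```

Features like these discard byte order entirely, which is exactly the trade-off the end-to-end models try to avoid by learning from raw bytes.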

A Deeper Look at Machine Learning in Infosec

Machine learning has become an important tool in security for detecting and preventing unknown threats, in large part because of its ability to generalize. However, all machine learning models have blind spots that present an attack surface for motivated and sophisticated adversaries. These open source packages help demystify machine learning for malware, and allow others in security to understand, attack, and harden their own machine learning models. Especially in security, a rising tide lifts all boats. At Endgame, we continuously work to improve our models for malware and other threat detection and prevention, and share our insights and lessons learned to support others in the community.
