This is Brian Chapman and Ateeq Sharfuddin, and in this blog post we discuss a project we are open sourcing: an in-memory embedding of CPython. We give a brief overview of this research and share our results with the community. A paper [1] on this research was accepted at the USENIX Workshop on Offensive Technologies (WOOT 2021), which was co-located with the IEEE Security and Privacy Workshops this year.
At SCYTHE, our goal is to assist security researchers and enterprise Red Teams. Enterprise Red Teams are responsible for emulating numerous classes of threat actors. In many cases, they want to repurpose existing open source scripts and tools instead of writing new internal tools from scratch. Many open source security research scripts are written in Python, given its simplicity and popularity.
Let's say you "land" in a process on a device, and you want to run a Python script on this device. If the device does not have Python pre-installed, you would have to transmit a copy of Python and the script to the device and save them somewhere on disk. This is noisy, and decent enterprise security products will detect these unauthorized files. When that happens, the device will be quarantined and taken offline within minutes.
We wanted to investigate whether we could develop an embedding of CPython that runs entirely from memory, to help our customers deploy Python scripts that emulate threat-actor capabilities. This improves the user experience because customers do not need to write native modules in C, for example.
Our research asked three questions: whether we could produce an embedding of CPython that runs entirely from memory (without touching the disk), whether a Python script that is also only in memory can execute on this embedding, and what limitations such an embedding has.
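To make "a script that is only in memory" concrete, here is a minimal sketch in plain Python. It is not the embedding itself, which does this work through the CPython C API from the host process. The helper name run_script_from_memory and the "<memory>" label are our own illustrative choices.

```python
import types


def run_script_from_memory(source_bytes, name="__main__"):
    """Compile and run a script held in a bytes buffer; no file is opened."""
    # "<memory>" is only a label shown in tracebacks, not a real path.
    code = compile(source_bytes, "<memory>", "exec")
    # Run the code inside a fresh module namespace (hypothetical module name).
    module = types.ModuleType(name)
    exec(code, module.__dict__)
    return module.__dict__


script = b"result = sum(range(5))\n"
namespace = run_script_from_memory(script, name="in_memory_script")
print(namespace["result"])  # 10
```

The script itself never touches the disk here, but any import it performs would still go to the normal on-disk standard library, which is the part the rest of this post addresses.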
Our embedding of CPython involves generating a special CPython core shared library (cpython38.dll, in this case), freezing our custom zip module loader into it, bundling all the stock CPython modules into ZIP files, and embedding these ZIP files into the shared library.
We reuse the in-memory module loader we shared previously [2] to load this CPython core shared library. We call PyConfig_InitIsolatedConfig to retrieve a configuration structure that ignores environment variables, global configuration variables, command line arguments, and the user site directory. We then call Py_InitializeFromConfig. Our frozen custom zip module loader is loaded, and it registers itself as the third module loader, after Builtin and Frozen. Modules such as encodings, codecs, and abc, which are normally loaded from files on disk, are now loaded by this custom zip module loader instead.
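The "third module loader" ordering can be illustrated from Python with sys.meta_path, where BuiltinImporter and FrozenImporter normally come first. The sketch below is an assumption about how such a registration could look, not the code frozen into our shared library. PlaceholderFinder and install_after_frozen are hypothetical names, and the placeholder declines every import so that normal imports keep working.

```python
import importlib.abc
import importlib.machinery
import sys


class PlaceholderFinder(importlib.abc.MetaPathFinder):
    """Hypothetical stand-in finder that declines every import."""

    def find_spec(self, fullname, path=None, target=None):
        return None  # fall through to the next finder on sys.meta_path


def install_after_frozen(finder):
    """Insert `finder` right after the Builtin and Frozen importers."""
    position = 0
    for index, entry in enumerate(sys.meta_path):
        if (entry is importlib.machinery.BuiltinImporter
                or entry is importlib.machinery.FrozenImporter):
            position = index + 1
    sys.meta_path.insert(position, finder)
    return position


finder = PlaceholderFinder()
position = install_after_frozen(finder)
# Show the resulting finder order; the exact list varies by platform.
print([getattr(entry, "__name__", type(entry).__name__)
       for entry in sys.meta_path])
```

Placing the finder ahead of the path-based finder means it is consulted before any lookup on the file system is attempted.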
The custom zip module loader (cba_zipimport) exposes a function, install_cba_metafinder(package_name, package_zip_bytes). Given a package name and a virtual address containing the bytes of a zip file that holds Python packages, cba_zipimport finds modules in this zip file and loads them as needed. Dependencies that are not available in a stock CPython installation can be placed inside this zip file alongside your package. The custom zip module loader can also load Python C extensions.
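The general technique behind such a loader can be sketched with the standard importlib and zipfile modules. This is a simplified illustration, not the cba_zipimport source: the class InMemoryZipFinder, the helper build_demo_zip, and the demo_pkg package are hypothetical. It takes a bytes object rather than a virtual address, handles only pure-Python modules (no C extensions), and appends itself to the end of sys.meta_path for simplicity.

```python
import importlib.abc
import importlib.util
import io
import sys
import zipfile


class InMemoryZipFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Hypothetical finder/loader serving pure-Python modules from zip bytes."""

    def __init__(self, zip_bytes):
        self._archive = zipfile.ZipFile(io.BytesIO(zip_bytes))
        self._names = set(self._archive.namelist())

    def _locate(self, fullname):
        """Return (archive_path, is_package), or None if not in the archive."""
        base = fullname.replace(".", "/")
        if base + "/__init__.py" in self._names:
            return base + "/__init__.py", True
        if base + ".py" in self._names:
            return base + ".py", False
        return None

    def find_spec(self, fullname, path=None, target=None):
        located = self._locate(fullname)
        if located is None:
            return None  # not ours; let the next finder try
        archive_path, is_package = located
        return importlib.util.spec_from_loader(
            fullname, self,
            origin="<memory>/" + archive_path, is_package=is_package)

    def create_module(self, spec):
        return None  # use the default module creation

    def exec_module(self, module):
        archive_path, _ = self._locate(module.__name__)
        source = self._archive.read(archive_path)
        code = compile(source, module.__spec__.origin, "exec")
        exec(code, module.__dict__)


def build_demo_zip():
    """Build a small package archive entirely in memory (demo data only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("demo_pkg/__init__.py",
                         "from .helpers import greet\n")
        archive.writestr("demo_pkg/helpers.py",
                         "def greet(name):\n    return 'hello ' + name\n")
    return buffer.getvalue()


sys.meta_path.append(InMemoryZipFinder(build_demo_zip()))
import demo_pkg

print(demo_pkg.greet("woot"))  # hello woot
```

Because the finder answers for every name it finds in the archive, dependencies bundled next to the package resolve the same way, as the relative import of demo_pkg.helpers shows.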
There are two important considerations:
SCYTHE modules can be written in Python. Since 2020, SCYTHE has offered a Python runtime environment [3] on the client, which runs entirely in memory. A zip file encapsulating the module and its dependencies can be deployed onto the runtime and loaded by calling install_cba_metafinder on the custom zip module loader. SCYTHE's Python runtime was built on top of this CPython embedding research.
Source code is available on GitHub at:
https://github.com/scythe-io/in-memory-cpython
A harness with precompiled examples is available at https://doi.org/10.5281/zenodo.4638251
In this post, we highlighted our research on embedding CPython entirely in memory and its support in SCYTHE, covered the steps involved, and shared the source code and the test harness. Our WOOT 2021 paper delves deeper into this research, and we encourage you to review it.
SCYTHE provides an advanced attack emulation platform for the enterprise and cybersecurity consulting market. The SCYTHE platform enables Red, Blue, and Purple teams to build and emulate real-world adversarial campaigns in a matter of minutes. Customers are in turn enabled to validate the risk posture and exposure of their business and employees and the performance of enterprise security teams and existing security solutions. Based in Arlington, VA, the company is privately held and is funded by Gula Tech Adventures, Paladin Capital, Evolution Equity, and private industry investors.