An In-memory Embedding of CPython with SCYTHE

We are Brian Chapman and Ateeq Sharfuddin, and in this post we discuss a project we are open sourcing: an in-memory embedding of CPython. We give a brief overview of the research and share our results with the community. A paper [1] on this research was accepted at the USENIX Workshop on Offensive Technologies (WOOT 2021), which was co-located with the IEEE Security and Privacy Workshops this year.

Introduction

At SCYTHE, our goal is to assist security researchers and enterprise Red Teams. Enterprise Red Teams are responsible for emulating numerous classes of threat actors. In many cases, they want to repurpose existing open source scripts and tools instead of writing new internal tools from scratch. Many open source security research scripts are written in Python because of its simplicity and popularity.

Let's say you "land" in a process on a device, and you want to run a Python script on this device. If the device does not have Python pre-installed, you would have to transmit a copy of Python and the script to the device and save both somewhere on disk. This is noisy, and decent enterprise security products will detect these unauthorized files. When that happens, the device is typically quarantined and taken offline within minutes.

We wanted to investigate whether we could develop an embedding of CPython that runs entirely from memory, to help our customers deploy Python scripts that emulate threat-actor capabilities. This also improves the user experience, because customers do not need to write native modules in C, for example.

Our Research

Our research investigated three questions: whether we could produce an embedding of CPython that runs entirely from memory (without touching the disk), whether we could execute a Python script that also resides only in memory on this embedding, and what limitations such an embedding would have.

Our embedding of CPython involves three steps: generating a special CPython core shared library (cpython38.dll, in this case), freezing our custom zip module loader into it, and bundling all the stock CPython modules into ZIP files that are embedded in this shared library.
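The bundling step can be sketched in Python as follows. This is a minimal illustration of packing module sources into a ZIP archive that lives only in memory. The helper name build_zip_bytes and the package demo_pkg are hypothetical, and the sketch does not show how the resulting bytes are embedded into the shared library.

```python
import io
import zipfile


def build_zip_bytes(sources):
    """Pack {archive_path: source_text} into a ZIP archive held only in memory."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for path, text in sources.items():
            archive.writestr(path, text)
    return buffer.getvalue()


# Hypothetical package contents, used only for illustration.
demo_zip = build_zip_bytes({
    "demo_pkg/__init__.py": "GREETING = 'hello from memory'\n",
    "demo_pkg/util.py": "def double(x):\n    return 2 * x\n",
})

# The archive can be inspected without ever writing it to disk.
with zipfile.ZipFile(io.BytesIO(demo_zip)) as archive:
    bundled_names = sorted(archive.namelist())
```

The returned bytes object is the kind of payload a memory-only module loader would consume.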

An In-memory Embedding of CPython

We reuse the in-memory module loader we shared previously [2] to load this CPython core shared library. We call PyConfig_InitIsolatedConfig to retrieve a configuration structure that ignores environment variables, global configuration variables, command line arguments, and the user site directory. We then call Py_InitializeFromConfig. Our frozen custom zip module loader is loaded and registers itself as the third module loader, after Builtin and Frozen. Modules such as encodings, codecs, and abc, which are normally loaded from files on disk, are now loaded by this custom zip module loader instead.
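To make the "third module loader" ordering concrete, here is a small Python sketch of placing a finder directly after the Builtin and Frozen importers on sys.meta_path. PlaceholderFinder and install_after_frozen are hypothetical names for illustration only. The actual loader is frozen into the shared library and registers itself during initialization, which this sketch does not reproduce.

```python
import importlib.abc
import importlib.machinery
import sys


class PlaceholderFinder(importlib.abc.MetaPathFinder):
    """Hypothetical stand-in finder that never claims a module."""

    def find_spec(self, fullname, path, target=None):
        return None  # defer to the finders that follow


def install_after_frozen(finder):
    """Insert finder right after FrozenImporter and return its index."""
    try:
        position = sys.meta_path.index(importlib.machinery.FrozenImporter) + 1
    except ValueError:
        position = 0  # assumption: fall back to the front if Frozen is absent
    sys.meta_path.insert(position, finder)
    return position


finder = PlaceholderFinder()
position = install_after_frozen(finder)
```

On a stock interpreter, sys.meta_path starts with BuiltinImporter and FrozenImporter, so the new finder is consulted before the path-based finder that reads from disk.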

Executing your Python script from memory

The custom zip module loader (cba_zipimport) exposes a function, install_cba_metafinder(package_name, package_zip_bytes). Given a package name and the virtual address of the bytes of a zip file containing Python packages, cba_zipimport finds modules in this zip file and loads them as needed. Dependencies that are not available in a stock CPython installation can be placed inside this zip file alongside your package. The custom zip module loader can also load Python C Extensions.
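The general idea of importing from a zip file held in memory can be sketched with the standard importlib machinery. This is not the cba_zipimport implementation: InMemoryZipFinder and memdemo_pkg are hypothetical names, the sketch takes a bytes object rather than a virtual address, and it handles pure-Python modules only (no C Extensions).

```python
import importlib.abc
import importlib.util
import io
import sys
import zipfile


class InMemoryZipFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Hypothetical finder/loader for pure-Python modules in an in-memory ZIP."""

    def __init__(self, zip_bytes):
        self._archive = zipfile.ZipFile(io.BytesIO(zip_bytes))
        self._names = set(self._archive.namelist())

    def _locate(self, fullname):
        """Return (archive_path, is_package), or None if the module is absent."""
        base = fullname.replace(".", "/")
        if base + "/__init__.py" in self._names:
            return base + "/__init__.py", True
        if base + ".py" in self._names:
            return base + ".py", False
        return None

    def find_spec(self, fullname, path=None, target=None):
        located = self._locate(fullname)
        if located is None:
            return None  # let the next finder on sys.meta_path try
        return importlib.util.spec_from_loader(
            fullname, self, origin="memory:" + located[0], is_package=located[1]
        )

    def create_module(self, spec):
        return None  # use the default module object

    def exec_module(self, module):
        archive_path, _ = self._locate(module.__name__)
        source = self._archive.read(archive_path)
        exec(compile(source, "memory:" + archive_path, "exec"), module.__dict__)


# Build a small archive in memory (hypothetical package, for illustration).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("memdemo_pkg/__init__.py", "GREETING = 'hello from memory'\n")
    archive.writestr("memdemo_pkg/util.py", "def double(x):\n    return 2 * x\n")

sys.meta_path.append(InMemoryZipFinder(buffer.getvalue()))

import memdemo_pkg.util

print(memdemo_pkg.GREETING, memdemo_pkg.util.double(21))  # hello from memory 42
```

Appending the finder to the end of sys.meta_path keeps the sketch unobtrusive. A loader meant to replace disk lookups would be registered earlier in the list.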

Special cases

There are two important considerations:

  1. Because the shared library was not loaded by the Windows loader, calls to GetModuleHandle or GetModuleHandleEx will not return the value the calling code expects. Instead, a handle to where the DLL is loaded in memory is stored in sys.dllhandle. Some C Extensions require this address, and in those cases the code needs to be updated to handle a CPython shared library that does not reside on disk.
  2. Before running Python code, call PyGILState_Ensure to acquire the global interpreter lock (GIL) and save the state it returns. Once the code has finished, call PyGILState_Release with that saved state to restore the thread state and release the GIL.
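The Ensure/Release pairing in the second consideration can be illustrated from Python through ctypes.pythonapi. This is only a sketch of the call order: in a real embedding these calls are made from native code, and the wrapper run_with_gil is a hypothetical name. From a thread that already holds the GIL, as here, the pair is effectively a balanced no-op.

```python
import ctypes

# Bind the two C API functions exported by the running interpreter.
_api = ctypes.pythonapi
_api.PyGILState_Ensure.restype = ctypes.c_int    # PyGILState_STATE is an enum
_api.PyGILState_Ensure.argtypes = []
_api.PyGILState_Release.restype = None
_api.PyGILState_Release.argtypes = [ctypes.c_int]


def run_with_gil(func, *args):
    """Hypothetical wrapper: acquire the GIL, run func, then release."""
    state = _api.PyGILState_Ensure()      # acquire the GIL, remember prior state
    try:
        return func(*args)                # run the Python code
    finally:
        _api.PyGILState_Release(state)    # always pair Release with Ensure


result = run_with_gil(sum, [1, 2, 3])
```

The try/finally mirrors what native callers should guarantee: every PyGILState_Ensure is matched by exactly one PyGILState_Release, even when the Python code fails.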

Application in SCYTHE

SCYTHE modules can be written in Python. Since 2020, SCYTHE has offered a Python runtime environment [3] on the client, which runs entirely in memory. A zip file encapsulating the module and its dependencies can be deployed onto the runtime and loaded by the custom zip module loader through a call to install_cba_metafinder. SCYTHE's Python runtime was built on top of this CPython embedding research.


Source code

Source code is available on GitHub at:

https://github.com/scythe-io/in-memory-cpython

A harness with precompiled examples is available at https://doi.org/10.5281/zenodo.4638251

Conclusion

We have highlighted our research on embedding CPython entirely in memory and its support in SCYTHE. We covered the necessary steps and shared the source code and the test harness. Our WOOT 2021 paper delves deeper into this research, and we encourage you to review it.

References

  1. Ateeq Sharfuddin, Brian Chapman, Chris Balles. An In-memory Embedding of CPython for Offensive Use. In 2021 IEEE Security and Privacy Workshops (SPW), San Francisco, CA, US, 2021, pp. 352-360.
  2. Jonathan Lim and Ateeq Sharfuddin. SCYTHE, Inc. Loading Capabilities from Memory: Open Sourcing SCYTHE's Windows C In-memory Module Loader, 2021. https://www.scythe.io/library/loading-capabilities-from-memory-open-sourcing-scythes-windows-c-in-memory-module-loader
  3. Ateeq Sharfuddin. SCYTHE, Inc. Under the Hood: SCYTHE Architectural Overview (Part 1), 2020. https://www.scythe.io/library/under-the-hood-scythe-architectural-overview-part-1

About SCYTHE

SCYTHE provides an advanced attack emulation platform for the enterprise and cybersecurity consulting market. The SCYTHE platform enables Red, Blue, and Purple teams to build and emulate real-world adversarial campaigns in a matter of minutes. Customers are in turn enabled to validate the risk posture and exposure of their business and employees and the performance of enterprise security teams and existing security solutions. Based in Arlington, VA, the company is privately held and is funded by Gula Tech Adventures, Paladin Capital, Evolution Equity, and private industry investors.