ByteCoat - Protect your Python code

Using Python

The Python programming language is a powerful and flexible tool that can be used in a wide range of problem domains. It is a dynamically typed, interpreted language that allows rapid application development and flexible software design. Its comprehensive standard library offers ready-to-use solutions for a wide variety of applications.

The language uses modules and packages to organize application source code, its own standard library and third-party libraries. A file suffixed with .py is a source code module. The Python interpreter compiles such modules to an intermediate byte code format and stores the results in files suffixed with .pyc for later use. Python packages are plain directories containing an __init__.py file that is executed when the package is imported.
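The module and package mechanics described above can be tried out with a minimal sketch that uses only the standard library. The names greeting.py, demo_pkg and compile_and_import_demo are hypothetical and chosen for illustration only.

```python
import importlib
import pathlib
import py_compile
import sys
import tempfile


def compile_and_import_demo():
    """Compile a throwaway module to byte code and import a throwaway package."""
    with tempfile.TemporaryDirectory() as tmp:
        root = pathlib.Path(tmp)

        # A source module is a plain file suffixed with .py.
        module_path = root / "greeting.py"
        module_path.write_text("def hello():\n    return 'hello'\n")

        # Compile it explicitly; the interpreter normally does this on import.
        pyc_path = pathlib.Path(py_compile.compile(str(module_path), doraise=True))

        # A package is a directory containing an __init__.py file.
        package_dir = root / "demo_pkg"
        package_dir.mkdir()
        (package_dir / "__init__.py").write_text("LOADED = True\n")

        sys.path.insert(0, str(root))
        try:
            importlib.invalidate_caches()
            # Importing the package executes its __init__.py.
            package = importlib.import_module("demo_pkg")
            loaded = package.LOADED
        finally:
            sys.path.remove(str(root))
            sys.modules.pop("demo_pkg", None)

        return pyc_path.suffix, loaded


if __name__ == "__main__":
    print(compile_and_import_demo())
```

The function returns the suffix of the generated byte code file together with the value set by the package's __init__.py, which shows that the file was executed on import.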

As with most interpreted, dynamically typed languages, Python modules run slowly compared to code written in languages like C or C++. Python therefore makes it possible to write user-defined C code that is compiled as a shared library and transparently imported by the interpreter like a normal module.
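To see how the interpreter treats source modules and compiled modules alike, the following sketch asks the import system where a module would be loaded from. The helper name module_kind is hypothetical. The list of shared library suffixes comes from importlib.machinery and differs between platforms.

```python
import importlib.machinery
import importlib.util

# File suffixes the running interpreter accepts for C extension modules,
# for example ".so" on Linux or ".pyd" on Windows.
EXTENSION_SUFFIXES = tuple(importlib.machinery.EXTENSION_SUFFIXES)


def module_kind(name):
    """Classify a top-level module by the origin reported by the import system."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return "not found"
    origin = spec.origin or ""
    if origin == "built-in":
        return "built-in"
    if origin.endswith(EXTENSION_SUFFIXES):
        return "C extension"
    if origin.endswith(".py"):
        return "source"
    return "other"


if __name__ == "__main__":
    for name in ("json", "sys"):
        print(name, module_kind(name))
```

The same import statement works for every kind, which is what allows a compiled module to stand in for a source module.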

Security Issues

Figure: The standard Python execution flow.

There are several ways to distribute proprietary Python software:

  • The source code modules are distributed directly as source files, exposing all details of the software to the user. This is exactly the motivation of open source software, but it is highly undesirable for most business models.
  • Only the compiled byte code modules are distributed. This seems like a safe way to prevent users from seeing proprietary source code. However, nearly all of the source code can be recovered from compiled byte code. Mature commercial tools exist for this purpose, such as decompyle (www.crazy-compilers.com).
  • Especially on the Microsoft Windows platform, it is desirable to distribute not individual modules or packages, which require a compatible Python interpreter on the target platform, but a single executable file that the user can click on and run without installing third-party software. This is achieved by projects such as freeze, py2exe and PyInstaller. While this is a convenient way to dramatically reduce the number of installation steps, it does not solve the security problem mentioned above, because all of these solutions pack the compiled byte code together with a Python interpreter into the executable file. The source code can therefore easily be extracted from the embedded byte code.
  • A fourth solution is to translate Python to a second language that does not allow source code extraction. Several open source projects are working on that (PyPy, Pyrex, etc.). However, their main objective is not security but execution speed. To achieve that, they are all restricted to a subset of Python that can be represented in a statically typed language like C or C++. Those restrictions limit such approaches to individual performance-critical modules and do not allow complete software packages to be distributed as compiled C code.
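The weakness of byte code distribution can be illustrated with a short sketch using only the standard library. It does not decompile anything. It only shows that a compiled code object still carries function names, variable names and literal constants in readable form. The license_ok function and its constant are made up for the example.

```python
import dis
import marshal

# Hypothetical proprietary source that a vendor would like to keep private.
SOURCE = '''
def license_ok(key):
    secret = "s3cr3t-key"
    return key == secret
'''


def recover_details(source):
    """Compile source, round-trip it through marshal, and inspect the result."""
    module_code = compile(source, "<proprietary>", "exec")

    # The body of a .pyc file is a marshal-serialized code object like this one.
    restored = marshal.loads(marshal.dumps(module_code))

    # The function body is stored as a nested code object among the constants.
    func_code = next(c for c in restored.co_consts if hasattr(c, "co_code"))

    listing = dis.Bytecode(func_code).dis()
    return func_code.co_name, func_code.co_varnames, func_code.co_consts, listing


if __name__ == "__main__":
    name, variables, constants, listing = recover_details(SOURCE)
    print(name, variables, constants)
    print(listing)
```

Everything printed here is available to anyone who receives the byte code alone, which is the information a decompiler builds on.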

ByteCoat Solution

Figure: Python execution flow used by Python ByteCoat.

To help distribute Python modules containing proprietary software, we developed a method to automatically convert byte code modules to corresponding C extension modules that can transparently replace the original module.

Our method produces exactly one extension module for each Python source module.

Advantages of ByteCoat:

  • Extracting Python source code is nearly as hard as decompiling machine code to C.
  • 99% compatibility - very few restrictions apply to what can be done with a compiled module.
  • In most cases the generated module is even faster than the original code, although much slower than optimized C code.

Disadvantages of ByteCoat:

  • Modules are compiled only for one given platform and Python version.
  • A few restrictions apply:
    • The Python byte code disassembler (module dis) does not work. This is intended and prevents access to the byte code.
    • Python ByteCoat uses replacements for some of Python's internal types.

Find answers to frequently asked questions.