{ End Bracket }

Phoenix Rising

Guy Eddon

Phoenix is neither a compiler nor a JITer, but will eventually transform both. It is the codename for an internal Microsoft project that provides an extensible framework for the analysis, optimization, and modification of code during compilation. Think of Phoenix as a tool that allows you to peek into the black box at the heart of compilers in order to see and modify their internals. Current uses of Phoenix inside Microsoft include code obfuscation, visualization, and custom assembly rewriting. Phoenix is the result of a joint endeavor by Microsoft Research and the Developer Division, and is currently available in pre-beta form for noncommercial use to qualified researchers affiliated with recognized institutions (see www.research.microsoft.com/phoenix).

The Phoenix toolkit provides a prototype that will, in time, replace the Microsoft standard compiler back end (c2.exe), as well as the Microsoft® .NET Framework just-in-time (JIT) compiler (mscorjit.dll) and pre-JITer (ngen.exe). The Phoenix c2 back end can generate code for two distinct target instruction sets: Intel machine code (both 32- and 64-bit) and Microsoft intermediate language (IL). Furthermore, using these Phoenix-enabled compilers, it is easy to build custom compiler plug-ins simply by extending the Phx.PlugIn and Phx.Phase classes and then overriding the Phase.Execute method with your own code.

Indeed, plug-ins have complete access to all the compiler's internal data structures. For example, the following code fragment implements a compiler phase that displays the name of each method processed. (A more sophisticated compiler plug-in might implement an advanced optimization.)

public class DisplayNamesPhase : Phx.Phase {
   protected override void Execute(Phx.Unit unit) {
      // The compiler invokes each phase for every compilation
      // unit; only function units have a method name to display.
      Phx.FuncUnit func = unit as Phx.FuncUnit;
      if (func == null) return;
      Console.WriteLine("Function name: {0}", func.NameString);
   }
}
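
To run inside the compiler, a phase like this also needs a plug-in class that registers it. The following sketch shows the idea; the BuildPhases hook, PhaseConfiguration type, and AppendPhase method are my assumptions about the pre-beta registration API, not documented signatures.

public class DisplayNamesPlugIn : Phx.PlugIn {
   // BuildPhases, PhaseConfiguration, and AppendPhase are assumed
   // names for the pre-beta registration API, shown for illustration.
   public override void BuildPhases(Phx.PhaseConfiguration config) {
      // Append the custom phase so it runs after the standard phases.
      config.PhaseList.AppendPhase(new DisplayNamesPhase());
   }
}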

Thus, Phoenix makes it straightforward to implement compiler extensions without writing a new compiler from scratch; in effect, you build on the engineering already invested in Phoenix and write extensions that focus solely on the problem you're trying to solve.

Phoenix uses an internal intermediate representation (IR) as the basis for code analysis and transformation. For a program to be rewritten, it first needs to be converted to the Phoenix IR by one of the supplied readers (native code, IL, and abstract syntax tree readers are provided with Phoenix, and others can be written for nonstandard code formats). Once the IR has been instrumented, it can be written out to an executable. From this perspective, the back-end compiler provided by Phoenix is really just an application written on top of the Phoenix framework. It uses Phoenix readers to convert input code to IR, and invokes any requested Phoenix plug-ins to transform the code; finally, it writes out the modified code in the desired format (native or IL). Phoenix programs can therefore be written either as compiler plug-ins or as meta-programming tools run post-compilation.
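
For instance, a standalone rewriting tool built on the framework reduces to a read-transform-write skeleton like the one below. The Phx.PEModuleUnit.Open and WriteBinary calls are my approximation of the pre-beta reader and writer entry points; treat the exact names as assumptions.

class RewriteTool {
   static void Main(string[] args) {
      // Lift the input binary into the Phoenix IR (assumed API).
      Phx.PEModuleUnit module = Phx.PEModuleUnit.Open(args[0]);

      // ... analyze or transform the IR here, for example by
      // running a phase such as DisplayNamesPhase ...

      // Write the (possibly modified) binary back out; Phoenix
      // recomputes addresses to account for any inserted code.
      module.WriteBinary(args[1]);
   }
}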

For a real-life example of Phoenix's utility, suppose you want to analyze what fraction of your code users actually execute on a daily basis by building an instrumented version of your program that collects and records such data. Traditionally, you'd need to write a program that reads in a Portable Executable (PE) binary file, figures out what's code and what's data, identifies each function, basic block, and instruction, inserts the instrumentation code, recomputes all the addresses to adjust for the added code, and then writes out a valid PE binary. With Phoenix, by contrast, you just need to write a program that adds the instrumentation at the appropriate places in the binary; all the grunt work of reading and writing the PE file, identifying elements in the binary code, and recomputing addresses is done for you. Due to its extensible nature, Phoenix has been attracting the interest of computer scientists involved in programming language research.
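
The core of such a coverage tool might be a phase along the following lines. The flow-graph members used here (FlowGraph, BasicBlocks, FirstInstr, InsertBefore) are illustrative assumptions about the Phoenix object model rather than verified signatures.

public class CoveragePhase : Phx.Phase {
   protected override void Execute(Phx.Unit unit) {
      Phx.FuncUnit func = unit as Phx.FuncUnit;
      if (func == null) return;
      // Prepend a probe to every basic block; Phoenix handles
      // reading the PE file and recomputing addresses on write.
      foreach (Phx.BasicBlock block in func.FlowGraph.BasicBlocks) {
         Phx.IR.Instr probe = BuildCounterCall(func, block);
         block.FirstInstr.InsertBefore(probe);
      }
   }

   private Phx.IR.Instr BuildCounterCall(Phx.FuncUnit func,
                                         Phx.BasicBlock block) {
      // Construction omitted: the probe would call a runtime helper
      // that increments a counter keyed by function and block.
      throw new System.NotImplementedException();
   }
}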

I've recently been using Phoenix to integrate a transaction processing model with object-oriented programming languages like C# and Java. As in databases, the goal is to make a block of statements succeed or fail as an atomic unit. If code in an atomic block throws an exception, the transaction is rolled back and all mutations must be undone. Before Phoenix, I would have had to write a new compiler or modify an existing one in order to test these ideas. With a framework like Phoenix, however, I can simply implement this code transformation as a compiler plug-in and rely on a commercial-quality compiler.
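
Conceptually, the plug-in rewrites each atomic block so that every mutation is logged before it happens and replayed in reverse if the block throws. The hand-written C# below illustrates the intent of the generated code; the UndoLog helper and the Withdraw example are hypothetical.

using System;
using System.Collections.Generic;

// Hypothetical runtime helper: records compensating actions and
// replays them in reverse order to undo a failed atomic block.
class UndoLog {
   private readonly Stack<Action> undo = new Stack<Action>();
   public void Record(Action restore) { undo.Push(restore); }
   public void RollBack() { while (undo.Count > 0) undo.Pop()(); }
}

class Account {
   public decimal Balance;

   // Illustration of what the plug-in might generate for an
   // atomic block containing the single statement Balance -= amount.
   public void Withdraw(decimal amount) {
      UndoLog log = new UndoLog();
      try {
         decimal oldBalance = Balance;            // capture prior state
         log.Record(() => Balance = oldBalance);  // schedule the undo
         Balance -= amount;                       // the original mutation
      }
      catch {
         log.RollBack();  // undo every recorded mutation
         throw;           // the block fails as an atomic unit
      }
   }
}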

Guy Eddon is a PhD candidate in Computer Science at Brown University where his research focuses on autonomic computing. Prior to graduate school, he worked for over a decade in IT and wrote about component software for Microsoft Systems Journal and Microsoft Press.