Being a voyeur,
I love peeking in on technologies to see what they do when they think no one is
looking. As a friend of mine likes to say, your effectiveness as a programmer is
directly proportional to how well you understand the layers of technology underneath
you. ATL programmers never made it very far without understanding CLSIDs and QueryInterface.
MFC programmers before them eventually had to face the music and learn about WNDPROCs.
And that's not to mention what it takes to work effectively with device drivers.
When programming in these environments you have
to remember precise incantations or things tend to go haywire. This is because Windows®
(like all operating systems) just loads up an EXE module and jumps in headfirst; it
doesn't really care what code you try to run, or if it's even valid code at all.
It just jumps in, counting on the compiler to do the hard work of generating code
that will run reasonably on the process.
.NET seems so much simpler: just write
assemblies filled with IL and metadata and let the runtime worry about the rest
while you go drink piña coladas in your newfound spare time. This apparent simplicity
is actually a well-choreographed illusion that would put the Wizard of Oz to shame,
maintained by the common language runtime (CLR) madly dashing about behind the curtain
grinding out and executing code while juggling memory and security management at
the same time.
This extra work performed by the CLR is what
makes it a managed code environment. Rather than simply loading code into a process
and then mostly ignoring it, the CLR treats code as a fluid and dynamic entity.
Code is loaded from the intermediate format, as shown in the following code:
    .method public hidebysig static void Main() cil managed
    {
      .entrypoint
      ldstr "Hello, World!"
      call void [mscorlib]System.Console::WriteLine(string)
      ret
    }
The native assembly code shown in Figure 1 isn't generated until it's actually needed.
The implication is that the runtime has a much more intimate relationship with your
code than you are probably used to. Understanding how your code is intertwined with
that of the CLR will ultimately help you write better programs.
The Common Language Infrastructure
The Holy Writ of managed code execution is laid
out in the set of specifications known as the Common Language Infrastructure, or
CLI (see http://msdn.microsoft.com/net/ecma/).
Like many sacred documents, the CLI spec is huge, defining everything from the intermediate
language (IL) instruction set to the format of .NET assemblies and metadata. If
you are working with .NET today, you owe it to yourself to become familiar with
what the CLI outlines because other implementations are starting to emerge. Microsoft®
is building at least one other implementation (the .NET Compact Framework for Windows
CE) and various other groups are working on implementations of their own.
The CLI shares a feature with many sacred documents: it
tends to be a bit tedious and abstract. Specifications are important, but for me
they are not nearly as satisfying as stepping through live code in a debugger. While
I can't get my hands on the source code to the CLR, I have something that's almost
as good: the Shared Source CLI implementation, which you're also likely to hear
referred to by its internal Microsoft codename: "Rotor."
A Grab Bag of Goodies
When I first heard about the Shared Source CLI,
I expected it to be the smallest and feeblest implementation Microsoft could get
away with, so I was amazed when I finally got to see it. The Shared Source CLI isn't
some one-off implementation built on the cheap. Rather, the Shared Source CLI code
base is derived from the source code to the CLR and is practically identical in
most important respects. The JIT compiler and garbage collection are simplified
from the commercial product, but the Shared Source CLI is still very capable. The
execution engine includes source and support for all of the familiar CLR idioms,
including remoting, the context model, threading, and code access security. All
of this adds up to one big code base: the entire install is composed of over
9,000 source files. This makes Shared Source CLI the largest source base ever released
by Microsoft for public consumption. By comparison, MFC consists of a puny 13MB
of code spread across 300 or so files, and ATL (my old stomping ground) looks practically
insignificant at about 1MB of code.
Most of these features are mandated by the CLI
specs, so I wasn't too surprised to see them. I was floored to see the rich surrounding
infrastructure, though. The core execution engine is supplemented with the version-aware
assembly loader used in the CLR (known colloquially as "Fusion"), a global
assembly cache that is physically separate from the CLR Global Assembly Cache (GAC),
and a full complement of its own .config files, including security.config and machine.config.
The only things that are missing from the execution
engine are Windows-specific technologies like COM Interop and support for mixed-mode
assemblies containing x86 native code (known as "It just works" or IJW).
These are useful things to have in Windows, but they don't make much sense outside
that environment and their presence would further obscure what is already some fairly
gnarly code.
Having the source code to a solid execution
engine implementation like this is great, but it's hard to get anything interesting
done with just a raw execution engine. In addition to the execution engine itself,
the Shared Source CLI includes source code to practically every .NET tool that comes
with the CLR (and some that don't!) including complete C# and JScript® compilers
and portable versions of all the familiar .NET tools including ILASM, ILDASM, cordbg.exe,
sn.exe, gacutil.exe, al.exe, caspol.exe, and peverify.exe. The Shared Source CLI
also includes source code for numerous debugging and diagnostic tools.
Even if you never run a single executable through
the Shared Source CLI, the availability of these tools makes it much easier to understand
many of the core processes of the CLR like assembly signing, verification, and loading.
The source code for these tools should also provide great starting points for all
kinds of other tools, like debuggers and profilers. I hope to see them begin to
appear very soon.
Class Libraries and the Shared Source CLI
Having an execution engine and related tools
is theoretically enough, but in practice the Shared Source CLI would border on useless
if it didn't ship with some class libraries. The CLI specification takes into account
the fact that various platforms have different resource requirements, so it outlines
two standard implementation profiles developers can use. Developers who want to
build absolute minimum functionality implementations can adhere to the Kernel Profile.
This profile is minimalist to the extreme; it doesn't even include floating-point
math. The Kernel Profile is so puny that all of the CLI implementations I am aware
of implement at least the next step up, the Compact Profile.
The Compact Profile mandates additional functionality
in the XML, networking, and reflection namespaces, which start to make the platform
reasonably useful. These two standard profiles and their relationships (outlined
in Partition IV of the CLI specifications) are shown in Figure 2.
Figure 2 CLI Profiles
Any functionality beyond the Compact Profile
is up to the implementer. Naturally the CLR goes way beyond the Compact Profile
and provides oodles more classes than the CLI spec includes for things like Windows
Forms, ADO.NET, and ASP.NET.
The Shared Source CLI implementation doesn't
go quite as far. It conforms pretty closely to the Compact Profile with a few extra
goodies thrown in. The Shared Source CLI ships with source code to five assemblies:
Mscorlib.dll, System.dll, System.xml.dll, System.runtime.remoting.dll, and System.runtime.serialization.formatters.soap.dll.
These libraries don't contain every class featured
in the CLR versions of the libraries, but they contain everything that matters: you'll
find source code for about 1300 public classes to play with. A summary of the differences
between the Shared Source CLI and the commercial CLR is shown in Figure 3.
Since the Shared Source CLI is intended to be
a demonstration of the CLR and not a commercial product, you won't find services
like Windows Forms, ADO.NET, or Web Services. I tried to get some of these libraries
to load and run in the Shared Source CLI environment, but all of them rely on classes
that aren't in the current Shared Source CLI distribution. This seems to me like
a great place to dabble; I'm optimistic that ADO.NET and Windows Forms can probably
be made to run if the current Shared Source CLI implementation is augmented with
the appropriate classes.
The Shared Source CLI, Windows, and FreeBSD
One thing that the CLI specification does not
mandate is that managed code has to run on Windows. To prove this point, Microsoft
built the Shared Source CLI to compile and run on FreeBSD Unix as well as Windows
XP. Since the Shared Source CLI is derived from the CLR, and the CLR is built on
top of Win32®, some sort of layer is needed to abstract the OS details from
the bulk of the code base. Rather than invent yet another abstraction layer, the
Shared Source CLI code punts and relies on a subset of Win32. This set of functions
provides the foundation for everything in the Shared Source CLI and is collectively
known as the Platform Abstraction Layer, or PAL.
The PAL is implemented as a pair of libraries
(rotor_pal.dll and rotor_palrt.dll) that together export approximately 500 functions.
These libraries are the only ones in the entire Shared Source CLI distribution that
have any dependencies on the underlying OS. Everything else (including the compilers,
tools, and runtime) sits on top of the two DLLs that make up the PAL. This state
of affairs is shown in Figure 4.
Figure 4 Dependencies in the Shared Source CLI
The PAL is completely defined by a header file
called rotor_pal.h and documented in the pal_guide.html in the docs\techinfo directory.
The current Shared Source CLI distribution includes two implementations, one
for Win32 and one for FreeBSD, located under the pal directory. As you can imagine,
the Win32 PAL is by far the simpler of the two; it amounts to about 1MB of source
files that are, for the most part, thin wrappers on top of the Win32 or common runtime
equivalents.
The FreeBSD PAL is considerably more complex
as it has to map Win32 functions to FreeBSD as best as it can. Some of these functions
(like lstrcpy) are pretty straightforward to port, but others (like CreateFileMapping)
require the PAL to do some real work. The resulting FreeBSD PAL consists of 2.5MB
of source code and makes pretty interesting reading.
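The idea of a platform abstraction layer can be sketched in a few lines. The following is only an illustration, not code from rotor_pal.h: the function name ToyPalGetCurrentProcessId is invented here, and the real PAL exposes its own Win32-shaped entry points. It shows one declaration backed by a thin wrapper on Windows and by a mapping onto the POSIX equivalent everywhere else.

```cpp
#include <cassert>
#include <cstdint>

#if defined(_WIN32)
#include <windows.h>
#else
#include <unistd.h>
#endif

// Hypothetical PAL-style entry point. Code above the abstraction layer would
// call only this function and never touch the OS headers directly.
std::uint32_t ToyPalGetCurrentProcessId() {
#if defined(_WIN32)
    // Windows flavor: a thin wrapper, because Win32 already offers this call.
    return static_cast<std::uint32_t>(::GetCurrentProcessId());
#else
    // Unix flavor: map the Win32-shaped call onto the POSIX equivalent.
    return static_cast<std::uint32_t>(::getpid());
#endif
}
```

Calling ToyPalGetCurrentProcessId() from any other source file compiles unchanged on both platforms, which is the property the rest of the code base relies on.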
Porting the Shared Source CLI to a new platform
involves more than just porting the PAL, however, because of the use of JIT compilation.
JIT compilers are by their very nature tied to a particular processor family in
addition to any OS dependencies underneath them. The Shared Source CLI JIT compiler
gets around this problem with an ingenious system of macros that emits x86 machine
code when compiled on Intel platforms and defaults to a set of ANSI C functions
otherwise. As of this writing, you can't run the Shared Source CLI on non-Intel
CPU architectures like PowerPC, but I don't think it will be long before you can.
Runtime Architecture
The Shared Source CLI runtime environment has
to execute three main tasks: loading and executing code, memory management,
and security management. This sounds pretty straightforward, but in practice each
of these tasks is more than a little complicated. Making it more difficult is the
fact that these pieces interact in complex ways to influence one another. Information
gleaned from the JIT compiler helps the garbage collector in its hunt for unused
objects. In turn, the amount of memory available can influence how and when code
is JIT compiled, and security has to be sprinkled throughout. The mechanics get
complicated pretty quickly, but many of the more complex interactions in the CLR
have been smoothed out in the Shared Source CLI, making it easier to follow.
It's still a lot of code: the core runtime
implementation is similar in scale to the entire MFC 4.2 framework. Luckily, most
of the interesting code lives in or around a few core classes. The code for the
runtime lives in sscli\clr\src\vm. The JIT compiler (roughly the size and complexity
of ATL) lives separately in the sscli\clr\src\fjit directory. The heart of the runtime
consists of the following three classes:
- ClassLoader (found in clsload.h) manages the loading of IL and metadata and supervises
when and how the JIT compiler will run. Most of ClassLoader's work is involved with
maintaining instances of EEClass objects (take a look at both class.h and class.cpp).
- FJit (fjit.h) is the class that reads IL and produces x86 machine code for execution.
JIT compilation code lives off by itself in the sscli\clr\src\fjit directory. (I suspect the
class is called FJit due to its origins as the "Fast JIT" compiler that was
mentioned in early betas of the CLR.)
- GCHeap (gc.h) heads up the memory management and garbage collection. GCHeap manages
allocation and collection as well as controlling finalization.
The high-level relationship of these classes
is shown in Figure 5. I don't have enough room to cover all three of
them here, so I'm going to focus on the main transformation of metadata to running
code and leave the topics of garbage collection and security management for another
time.
Figure 5 Core Classes in the Runtime
Getting Started with the Shared Source CLI
First you need to get the Shared Source CLI
code base from the official Shared Source CLI home page (http://msdn.microsoft.com/net/sscli/).
The distribution is a 10MB zipped TAR file that you can easily extract with WinZip.
To play with all these goodies you'll have to start by setting up the Shared Source
CLI build environment. Due to the complexity of the build you can't just open a
Visual Studio® project and press Ctrl-Shift-B.
The first step on Windows or FreeBSD is to set
up the environment variables used by the build process. This is easily accomplished
by running the env.bat batch file found in the root of the install. (Unix programmers
can use either env.sh or env.csh depending on their preferred shell.) Env.bat basically
does two things: it sets up environment variables that control the build process
and sets up the path to make it convenient to use its implementations of tools like
csc.exe or sn.exe.
When building the Shared Source CLI code base,
you can choose whether to include runtime sanity checks and the degree of code optimization
via an optional parameter to env.bat.
Env.bat configures the build flavor via one
of three optional parameters: free, fastchecked, or checked. Passing this parameter
mainly modifies the defined symbols and flags passed into the compiler. Checked
builds include internal consistency checks in the generated code and disable all
compiler optimizations. Fastchecked builds (the default) include the consistency
checks but enable compiler code optimizations. Fastchecked builds run noticeably
faster than checked builds but can be unpredictable in a debugger. As you may expect,
free builds lose the consistency checks and build the fastest binaries possible.
In any case, env.bat will dump build information like so:
    Setting environment for using Microsoft Visual Studio .NET tools.
    (If you also have Visual C++ 6.0 installed and wish to use its tool from the
    command line, run vcvars32.bat for Visual C++ 6.0.)
    CLR environment (C:\sscli\clr\bin\rotorenv.bat)
    Building for Operating System - NT32
    Processor Family - x86
    Processor - i386
    Build Environment = C:\sscli\rotorenv
    Build Type - checked
After setting up the build environment you can
kick off a build by running the buildall.cmd file in the root of the install or
the buildall shell script on Unix. Buildall.cmd is a pretty simple cmd file; there
is an intricate system of makefiles underneath it that do the heavy lifting.
Driving the entire process is the program build.exe.
Build.exe is familiar to DDK developers but not to most other programmers (and certainly
wasn't to me). If you plan on doing any modifications to the Shared Source CLI build
process, I recommend checking out John Robbins' January 1999 Bugslayer column, mentioned
in the Related Articles list at the beginning of this article, for details on how
to use Build. The distribution includes more detailed notes in the docs\buildtools
directory.
Once you've kicked off a build, consider getting
some coffee or taking a walk; building all those tools takes some time. (On
my Pentium 1.1GHz with 512MB RAM it takes 20-30 minutes to do a full build.) The
result is a new directory tree under \build that contains around 30 DLLs and 23
EXEs along with miscellaneous supporting files. This directory structure contains
pieces familiar to CLR developers, including a mini-GAC, a config directory (containing
a machine.config file), and, of course, the implementation of the Shared Source
CLI execution engine. I've already shown the runtime dependencies in Figure 4; note
that there are no dependencies on any OS-specific pieces besides rotor_pal.dll and
rotor_palrt.dll.
Executing "Hello, World!"
The best way to really understand the interaction
of the various facets of the runtime is to follow a simple program as it is compiled
and executed by the Shared Source CLI runtime. Remember, this runtime interacts
with code almost identically to the commercial CLR, so most things you learn spelunking
here transfer directly to the commercial product.
You don't need a complicated program to see
the important pieces of the Shared Source CLI runtime swing into action. In fact,
I'm going to use practically the simplest program known to man:
    class Hello
    {
        static void Main()
        {
            System.Console.WriteLine("Hello, World!");
        }
    }
I'm going to start by compiling with the shiny
new Shared Source CLI C# compiler shipped with the distribution. It's not strictly
necessary to use this compiler; code compiled under other environments will
run unmodified under the Shared Source CLI (and vice versa) as long as they don't
rely on libraries that aren't shipped with the Shared Source CLI. Running dumpbin
/imports on the resulting binary (or any managed EXE for that matter) shows a single
dependency on the _CorExeMain function in mscoree.dll:
    Dump of file hello.exe
    File Type: EXECUTABLE IMAGE
      Section contains the following imports:
        mscoree.dll
          402000 Import Address Table
          402374 Import Name Table
               0 time date stamp
               0 Index of first forwarder reference
               0 _CorExeMain
This clearly won't work under the Shared Source CLI, as it doesn't even ship with
a file called mscoree.dll. The Shared Source CLI's execution engine lives in sscoree.dll.
Yet this compiled EXE has a dependency on the commercial CLR. What gives?
Every time you compile an executable for .NET,
the compiler emits a tiny assembly stub into the EXE. In Windows this stub allows
managed executables to sneak past the loader by masquerading as ordinary Windows
programs. When Windows starts a managed app, the stub calls _CorExeMain (hence the
dependency I just mentioned), which is really a thin wrapper that delegates to
_CorExeMain2 (which does the real work). Environments that are aware of managed
code (like the Shared Source CLI) ignore this stub; they may use it as a sanity
check to verify that the program is indeed a managed executable, but they use their
own loaders to get the code up and running.
Since the Shared Source CLI uses its own loader,
it doesn't even export _CorExeMain; it just exports _CorExeMain2 (shown in Figure 6).
Anyone wanting to run code in the Shared Source
CLI environment needs to memory-map the assembly they want to run, load sscoree.dll,
and pass it to _CorExeMain2. The Shared Source CLI ships with a tiny program called
clix.exe which is intended for exactly this purpose. Clix doesn't do much beyond
load sscoree.dll and call _CorExeMain2, so it's very simple: about 400 lines
of C++. To use it, just pass it the name of your managed executable, like so:
    clix hello.exe <arguments>
Controlling Execution
The Shared Source CLI offers a lot of room for
experimentation and debugging even if you never write a single line of code. Hundreds
of places in the code base query a class called EEConfig (in eeconfig.h) for configuration
settings used to influence their execution. Over the course of running "Hello,
World," the runtime consults no fewer than 105 different settings controlling
all manner of execution. I've listed some of the more interesting ones in Figure 7. At the moment the vast majority of settings
are not documented anywhere, but I found that you can get a pretty good idea of
the available options by looking through the EEConfig class. As a last resort, searching
through the code base for references to a setting usually turns up pretty obvious
usage.
EEConfig uses a helper class called REGUTIL
which looks in three places for configuration settings, in the following order:
environment variables, a Rotor.ini file in the %USERPROFILE% directory, and finally
in a Rotor.ini file in the configuration directory for the build (for example, the
config directory might be build\v1.x86checked.rotor\rotor\rotor.ini).
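The three-step search can be pictured with a small model. This is a sketch under stated assumptions, not the REGUTIL implementation: the three maps stand in for the process environment and the [Rotor] sections of the two Rotor.ini files, the name LookupSetting is invented for illustration, and the only detail borrowed from the runtime is that environment variable names carry a COMPlus_ prefix while INI keys do not.

```cpp
#include <cassert>
#include <map>
#include <string>

typedef std::map<std::string, std::string> Settings;

// Toy model of the lookup order: environment first, then the user's
// Rotor.ini, then the build's Rotor.ini. Returns true and fills `value`
// when the setting is found in any of the three sources.
bool LookupSetting(const std::string& name,
                   const Settings& environment,
                   const Settings& userIni,
                   const Settings& buildIni,
                   std::string& value) {
    // 1. Environment variables are looked up with the "COMPlus_" prefix.
    Settings::const_iterator it = environment.find("COMPlus_" + name);
    if (it != environment.end()) { value = it->second; return true; }

    // 2. The Rotor.ini in the user's profile directory (no prefix).
    it = userIni.find(name);
    if (it != userIni.end()) { value = it->second; return true; }

    // 3. The Rotor.ini in the build's config directory (no prefix).
    it = buildIni.find(name);
    if (it != buildIni.end()) { value = it->second; return true; }

    return false;  // not configured anywhere
}
```

Because the first hit wins, a value set in the environment shadows the same setting in either INI file, which makes environment variables convenient for one-off experiments.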
Rotor.ini should be formatted like standard
Windows INI files with your settings in a [Rotor] section. If you use environment
variables to configure settings be sure to prefix the setting names with the string
"COMPlus_"; the INI setting "foo=bar" is equivalent to setting
"COMPlus_foo=bar" in the environment.
I find that the most generally useful switches
are the ones controlling log output. The runtime makes extensive use of the nicely
flexible logging code in utilcode\log.cpp that allows you to choose how much information
to dump and where you want it to go. The master switch for log output is the configuration
setting LogEnable. Setting this value to 1 turns on logging, but you still won't
see any output until you choose where you want it to go. You can specify output
targets by setting one or more of the environment variables shown in Figure 8 to 1.
You can control how much information gets logged
by setting the LogLevel setting to a value between 1 and 10. These values map to
#defines in log.h. I tried each of the settings while executing "Hello, World"
to see how they differed in output and got the results shown in Figure 9.
As you can see, the higher settings can generate
a tremendous amount of data. These giant dump files aren't really all that interesting
in and of themselves, but their output can really help make sense of what's going
on when you're stepping through unfamiliar source code in the debugger. I like to
turn on the LogToDebugger variable any time I'm debugging code; the messages give
you an idea of what the code was doing right before your breakpoint was hit.
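How the logging switches combine can be summarized in a few lines. This is a guess at the gating logic rather than a transcription of log.cpp: it assumes a message is written when its level is less than or equal to LogLevel, and it lumps every output target other than LogToDebugger into a single placeholder flag. The struct and function names are invented for the sketch.

```cpp
#include <cassert>

// Toy view of the logging configuration described above.
struct ToyLogConfig {
    bool logEnable;      // master switch (LogEnable)
    int  logLevel;       // verbosity threshold, 1 through 10 (LogLevel)
    bool logToDebugger;  // the one output target named in the text
    bool logToOther;     // placeholder for the remaining output targets
};

// Assumed rule: a message is emitted only if logging is enabled, at least
// one output target is selected, and the message is not more verbose than
// the configured level.
bool ShouldEmit(const ToyLogConfig& cfg, int messageLevel) {
    assert(messageLevel >= 1 && messageLevel <= 10);
    const bool hasTarget = cfg.logToDebugger || cfg.logToOther;
    return cfg.logEnable && hasTarget && messageLevel <= cfg.logLevel;
}
```

Under this model, turning on LogEnable without choosing a target produces nothing, which matches the behavior described earlier.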
Debugging with Visual Studio .NET
There are a number of debuggers you can use
to step through the Shared Source CLI source code. On Windows your options are WinDBG,
ntsd, and Visual Studio .NET. Which one you choose depends on what you value more,
power or ease of use. WinDBG and ntsd are the most powerful debuggers. The Shared
Source CLI ships with a WinDBG extension called SOS or "Son of Strike"
(another internal codename) that allows you to inspect some of the trickier constructs
like what sorts of objects the Shared Source CLI is maintaining on the call stack.
If you are already familiar with WinDBG you will probably want to check it out; SOS
is documented in the docs\techinfo folder.
I may be a voyeur, but I'm no masochist, so
I usually stick with the Visual Studio .NET debugger. I find that it gets me 98
percent of what I want, and it's an order of magnitude easier to use than either
WinDBG or ntsd. It's certainly adequate for my purposes here, so I will stick with
it for the rest of this article.
The Shared Source CLI doesn't ship with .sln
projects for clix.exe, so you can't just load it up and press F5. You can get pretty
close to this, though. My first attempt at debugging with Visual Studio .NET was
to create a C# project and change the Start Application in the project properties
to clix.exe. This didn't work very well. First, you have to remember to turn off
managed debugging, as the managed debugger has a tendency to deadlock when debugging
under the Shared Source CLI (and who could blame it?). Second, stepping over code
is slow: each statement takes about five seconds to step over on my machine,
which quickly drove me to try to find a better solution. The clincher is that launching
Visual Studio from the desktop makes it awkward to set up the configuration environment
variables I talked about earlier.
While poking around for better ways to use Visual
Studio .NET to spy on the runtime, I stumbled onto the optional /debugexe argument
to devenv.exe. This turned out to be the easiest way to go. Simply typing devenv /debugexe \sscli\build\<buildversion>\clix.exe hello.exe
at a command prompt from the hello directory creates a Visual Studio .NET debugging
project around clix.exe. Pressing F11 (Step Into) will step you right into the main
function of clix. Start stepping over the code and soon you will find yourself at
the gate of the runtime, as shown in
Figure 10.
Ironically, the one place you cannot set a breakpoint
is directly on the Console.WriteLine statement in Hello.cs. Remember that you are
using the Visual Studio unmanaged debugger, which will blithely ignore any breakpoints
you set in C# or Visual Basic® .NET files. Currently there is no debugger that
lets you simultaneously do debugging and variable inspection of your managed code
at the virtual machine level.
So how does the Hello.exe assembly get turned
into code? _CorExeMain2 starts the process by getting the execution engine ready
and then handing it hello.exe. Before starting the code, it verifies the signature
on the assembly it's about to execute by calling StrongNameSignatureVerification.
Assuming that the image being executed checks out okay, it starts a chain of initialization
code through CoInitializeEE that eventually winds up in the EEStartup function in
ceemain.cpp. You can lose yourself in this function for days; EEStartup starts
up practically all supporting services needed to execute your code, including the
garbage collector, the JIT compiler, and the thread pool manager.
After all the services in the execution engine
have been initialized, CoInitializeEE returns to _CorExeMain2. At this point the
runtime is primed and raring to go. _CorExeMain2 unleashes it by calling SystemDomain::ExecuteMainMethod
and the execution engine is off and running.
Loading Classes
Every managed program begins with a call to
the main entry point, which for Hello.exe is, of course, Main. At this point the
Hello class exists solely as a smattering of IL and entries in some metadata tables
in Hello.exe. Before the runtime can call Main (or anything else for that matter),
ExecuteMainMethod has to read these tables and assemble an in-memory representation.
This job is big enough that the Shared Source CLI has an entire class devoted to
it: ClassLoader.
ClassLoader has a pretty formidable job. It
has to figure out the number and types of fields on the class so that the runtime
can create instances later. It has to figure out the relationship between the class
being loaded and its parent classes because base classes will have their own members
and fields, some of which may be overridden by the class being loaded. All this
work is spread out over about 15,000 lines of code in clsload.cpp, class.cpp, and
method.cpp. The pivotal method that accomplishes all of this work is
ClassLoader::LoadTypeHandleFromToken:
    HRESULT ClassLoader::LoadTypeHandleFromToken(
        [in] Module *pModule,
        [in] mdTypeDef cl,
        [out] EEClass** ppClass,
        [inout] OBJECTREF *pThrowable)
LoadTypeHandleFromToken takes a {Module, token}
pair to identify the desired class. It then walks the metadata and uses it to build
an instance of class EEClass, which is returned to the caller. Every class used
by the Shared Source CLI execution engine is ultimately loaded through LoadTypeHandleFromToken
so it's a well-trafficked bit of code. Setting a breakpoint will let you watch every
single class get loaded. This is fun, once. When you settle down to do some
real work you will probably want to stop and see when your particular class is being
loaded. You can make this easier on yourself by taking advantage of the configuration
settings I mentioned earlier. Set BreakOnClassBuild to the name of the class you're
interested in and ClassLoader will present an ASSERT dialog box just before LoadTypeHandleFromToken
runs. Remember that you need to do this prior to firing up the Visual Studio debugger,
and don't forget to include the entire name, for example System.Object instead of
Object. This allows you to step through the class loading process and watch the
EEClass object being born.
Another helpful setting is ShouldDumpOnClassLoad.
If you set this to the name of your class, then details about the constructed EEClass
will be dumped to the logs. Dumping the Hello class this way gives the output shown
in Figure 11.
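Part of what the class loader must work out, the fields a class declares plus those it inherits, can be modeled in miniature. The following is a toy, not the EEClass layout algorithm: ToyClass, ToyField, and InstanceSize are invented names, and the sketch ignores alignment, object headers, and static fields, all of which a real runtime has to handle.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct ToyField {
    std::string name;
    std::size_t size;  // size in bytes of one instance field
};

struct ToyClass {
    std::string name;
    const ToyClass* parent;        // nullptr for the root of the hierarchy
    std::vector<ToyField> fields;  // fields declared by this class only
};

// An instance must have room for every field declared by the class and by
// each of its ancestors, so walk the parent chain and add everything up.
std::size_t InstanceSize(const ToyClass& cls) {
    std::size_t total = 0;
    for (const ToyClass* c = &cls; c != nullptr; c = c->parent) {
        assert(c->parent != c);  // guard against a class listed as its own parent
        for (std::size_t i = 0; i < c->fields.size(); ++i)
            total += c->fields[i].size;
    }
    return total;
}
```

The point of the model is only that a class cannot be laid out until its parents are known, which is why loading one class tends to pull in its whole ancestry.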
Unlike COM class objects, EEClass objects are
central to the operation of the runtime. Every class loaded by the CLR is represented
at runtime by an instance of class EEClass. If you've ever used the System.Reflection
namespace, you have already interacted with it because all the useful properties
and methods on System.Type and its friends like MethodInfo and FieldInfo are implemented
by looking up the desired information on the corresponding EEClass. The EEClass
for the Hello class is shown in the diagram in Figure 12.
Figure 12 Class Objects in Memory
Executing Managed Code
Once the Hello class is loaded, the runtime
is ready to execute the main method. Only one problem remains: Hello.Main still
exists only as IL, as I discussed earlier:
    .method public hidebysig static void Main() cil managed
    {
      .entrypoint
      ldstr "Hello, World!"
      call void [mscorlib]System.Console::WriteLine(string)
      ret
    }
To execute this function the runtime first needs to run the JIT compiler to generate
the x86 machine code you saw in
Figure 1.
The Shared Source CLI, like the CLR, doesn't
compile methods until they are called for the first time. To help keep the overhead
associated with JIT compilation down, the Shared Source CLI never explicitly tells
the JIT compiler to run. Rather it sets up a clever data structure to do it automatically.
As shown in Figure 12, each EEClass object holds a table of MethodDesc
objects in its m_pMethodTable member. Managed execution begins with a call through
one of these entries: in the case of Hello, ClassLoader calls the Call function on the
MethodDesc entry for Hello::Main.
Figure 13 shows the method table
in a little more detail. Each entry in the table holds a member called m_CodeOrIL
that is initially set to refer to the IL routine in the hosting assembly. Each entry
is also associated with a tiny stub of live executable code in the five bytes immediately
preceding the address of the MethodDesc. MethodDesc::Call uses a tiny assembly shim
to call into the stub and execute whatever code it contains. You may have noticed
the addresses dumped alongside each of the methods in the dump just shown; these
are the addresses of the stubs associated with the methods.
Figure 13 Before JIT Compilation
As shown in Figure 13, the stub
initially contains an x86 CALL instruction that transfers into a global routine
I call the Universal JIT Thunk. If you dereference the dumped addresses you will
find the value 0xE8, which is the opcode for a relative function call on an x86.
You can easily verify this by pointing the disassembler to the same location.
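The five-byte stub is easy to reproduce by hand. On x86, opcode 0xE8 is followed by a 32-bit little-endian displacement measured from the address of the next instruction, that is, from the stub address plus five. The sketch below encodes and decodes such a stub; the function names and any addresses you feed it are illustrative only and are not taken from the runtime.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Encode "CALL rel32" for a 5-byte stub that lives at stubAddress.
std::array<std::uint8_t, 5> EncodeCallRel32(std::uint32_t stubAddress,
                                            std::uint32_t target) {
    // The displacement is relative to the instruction that follows the CALL.
    // Unsigned arithmetic wraps modulo 2^32, so backward calls work too.
    const std::uint32_t rel = target - (stubAddress + 5);
    std::array<std::uint8_t, 5> bytes = {{
        0xE8,
        static_cast<std::uint8_t>(rel & 0xFF),
        static_cast<std::uint8_t>((rel >> 8) & 0xFF),
        static_cast<std::uint8_t>((rel >> 16) & 0xFF),
        static_cast<std::uint8_t>((rel >> 24) & 0xFF)}};
    return bytes;
}

// Recover the call target from an encoded stub, as a disassembler would.
std::uint32_t DecodeCallTarget(std::uint32_t stubAddress,
                               const std::array<std::uint8_t, 5>& bytes) {
    const std::uint32_t rel =
        static_cast<std::uint32_t>(bytes[1]) |
        (static_cast<std::uint32_t>(bytes[2]) << 8) |
        (static_cast<std::uint32_t>(bytes[3]) << 16) |
        (static_cast<std::uint32_t>(bytes[4]) << 24);
    return stubAddress + 5 + rel;
}
```

Since every not-yet-compiled method calls the same routine, decoding the stubs of several such methods should produce the same target even though the raw displacement bytes differ from stub to stub.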
The Universal JIT Thunk is a small assembly
routine that transfers control to the JIT compiler. The JIT compiler looks up the
IL in m_CodeOrIL and generates native code. When it finishes with that, both m_CodeOrIL
and the stub are updated to point to the newly generated code so that the next time
Call is invoked, control will jump right into the native routine. This particular
state of affairs is illustrated in Figure 14.
Figure 14 After JIT Compilation
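The compile-on-first-call trick can be modeled without any machine code. The sketch below is a stand-in, not the runtime's mechanism: a function pointer plays the role of the patched stub, "compiling" merely records the length of the IL text, and ToyMethod, ToyJitThunk, ToyNativeBody, and ToyCall are all invented names.

```cpp
#include <cassert>
#include <string>

struct ToyMethod;
typedef int (*ToyCode)(ToyMethod&);

struct ToyMethod {
    std::string il;     // stands in for the method's IL body
    ToyCode entry;      // what callers always jump through (the "stub")
    int timesCompiled;  // how often the toy JIT ran for this method
    int cachedResult;   // stands in for the generated native code
};

// The "native code": runs without touching the compiler.
int ToyNativeBody(ToyMethod& m) {
    return m.cachedResult;
}

// The "JIT thunk": compile once, patch the entry, then run the result.
int ToyJitThunk(ToyMethod& m) {
    ++m.timesCompiled;
    m.cachedResult = static_cast<int>(m.il.size());  // pretend compilation
    m.entry = &ToyNativeBody;                        // patch the stub
    return m.entry(m);
}

// Callers never care which of the two states the method is in.
int ToyCall(ToyMethod& m) {
    assert(m.entry != nullptr);
    return m.entry(m);
}
```

A method starts life with entry pointing at ToyJitThunk. The first ToyCall pays for compilation and repoints entry; every later call goes straight to ToyNativeBody, so timesCompiled never rises above one.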
This subtle technique allows JIT compilation
to cascade in complex programs. Whenever the JIT compiler emits native code that
calls other methods it generates CALL instructions that call the corresponding stub,
and the process repeats itself. In Figure 15 class Foo has methods A
and B that have already been compiled.
Figure 15 JIT Compilation Example
When Foo.B was compiled, the call to Bar.C was
resolved to the stub being held by the Bar EEClass. The same is true for Bar.C's
call to Bar.D. Bar.D has yet to be compiled, so when Bar.C makes a call to Bar.D
control will be passed into the JIT compiler. This results in something similar
to the call stack shown in
Figure 16.
In this example, the Universal JIT Thunk is
installed around address 0x008fcd5. The first call to MethodDesc::Call has called
a native method that got vectored through the JIT compiler, which called a second
method that also needed to be JIT-compiled. As the application matures, it will
stop pestering the JIT compiler, freeing up CPU cycles to execute the actual application
code.
Once the JIT compiler finishes running, it returns
to the stub it has just modified, which jumps to the native code. At long last,
the assembly code in Figure 1
is executed.
Wrap-up
The JitHalt and JitBreak configuration settings
make it easy to watch methods get compiled. Both of them serve the same purpose
of helping you zero in on a method that interests you. JitBreak throws an assertion
deep in FJit::CompileMethod right before the fun begins, allowing you to attach
and watch the function get built. JitHalt tells the JIT compiler to emit a DebugBreak
(INT 3) as the first instruction in the emitted native body, causing the debugger
to stop every time the method is called. Both variables should be initialized with
a string of the format class::method. For example, JitBreak=Hello::Main
will cause a dialog box like that shown in Figure 17 to open when Hello::Main
gets compiled. Both variables also support wildcards, so the statement JitBreak=Hello::*
breaks every time any method implemented by class Hello gets compiled. You can easily
keep track of which functions are being JIT-compiled by setting JitTrace=1 and you
can even control what sort of code the JIT compiler will emit with the JitOptimizeType
setting. JitOptimizeType takes a value from 0-4 corresponding to the values of an
enum defined in eeconfig.h:
    enum {
        OPT_BLENDED,
        OPT_SIZE,
        OPT_SPEED,
        OPT_RANDOM,
        OPT_DEFAULT = OPT_BLENDED
    };
After spending some time playing with these options you will find that you're ready
(and eager) to explore the myriad intricacies of managed code.
Figure 17 Breaking into JIT Compilation
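The class::method patterns accepted by JitBreak and JitHalt suggest a simple matching rule. The function below is an assumption about that rule, not the code in the JIT: it treats "*" as matching a whole name on either side of the "::" separator, although only the method-side wildcard is demonstrated above, and MatchesJitPattern is an invented name.

```cpp
#include <cassert>
#include <string>

// Does a "class::method" pattern select the given method?
// Assumed rule: "*" matches any whole class name or any whole method name.
bool MatchesJitPattern(const std::string& pattern,
                       const std::string& className,
                       const std::string& methodName) {
    const std::string::size_type sep = pattern.find("::");
    if (sep == std::string::npos)
        return false;  // not in class::method form

    const std::string classPart = pattern.substr(0, sep);
    const std::string methodPart = pattern.substr(sep + 2);

    const bool classOk = (classPart == "*") || (classPart == className);
    const bool methodOk = (methodPart == "*") || (methodPart == methodName);
    return classOk && methodOk;
}
```

With this rule, the pattern Hello::Main selects exactly one method, while Hello::* selects every method on Hello and nothing on any other class.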
Managed code is the most important concept to
come out of Redmond in a long time, and it implies a tighter relationship between
the platform and your code than ever before. Understanding this tight relationship
will help you write better programs. The Shared Source CLI is a terrific tool for
doing this, and as a bonus its workings are very similar to the CLR. You should
be able to use the techniques I showed you in this article to begin spelunking through
the code base with some understanding of how the big pieces fit together. Working
with the Shared Source CLI will deepen your knowledge of the CLR in ways that would
be impossible without the source code. So pull back that curtain, have a look, and
have fun!