Vrooooom!

How .NET and C# Drove an Entry in the DARPA Grand Challenge

John Hind

This article discusses:

  • Autonomous vehicle control with the .NET Framework
  • An extensible real-time control architecture based on a whiteboard metaphor
  • Implementation of an accurate GPS-synchronized timer component for .NET
This article uses the following technologies:
C#, .NET, Windows XP

Code download available at: DARPAChallenge.exe (141 KB)

Contents

Background
Real Time
Design Patterns
Whiteboard
Tips for Implementing Windows Services
A Design Pattern for Add-Ins
Device Implementation
Messaging
Telemetry
Implementation of RS-232 Devices
Mission Clock
Resilience
Further Developments
Conclusion

In 2003, I was asked to help engineer software for an autonomous vehicle to be entered in an off-road race, the DARPA Grand Challenge, which you may have read about last spring. The opportunity to be involved in a team on the cutting edge of robotics was irresistible, though I was more than a little daunted: the race called for an average speed of over 25 mph sustained for up to 10 hours, we had less than a year to design the vehicle, and the off-road course would not be known in advance. The first prize was a million dollars.

Figure 1 The Carnegie Mellon Red Team Car


A vehicle called Sandstorm (see Figure 1), built by the Red Team from Carnegie Mellon University, led the field by completing 7.4 miles. My team, Team Spirit of Las Vegas, built a vehicle called Autoquad. Although our vehicle (shown in Figure 2) did not make it through the qualification stage, we learned valuable lessons that may give us another shot at the still-unclaimed prize.

Figure 2 Team Spirit of Las Vegas Car


Background

Once I joined the team, I reviewed the work already done and the design decisions made by the team. A Honda Rincon one-person ATV, donated by Honda R&D Americas Inc., had been chosen as the base vehicle and had been automated using electric servo motors and string-pots. The control architecture includes a computer vision system with a gyroscopic-stabilized video camera feeding an Apple Power Mac G5 running OS X, a choice dictated by the experience and code base of the engineers working on this part of the project. There is a GPS receiver from Novatel Inc., a second GPS receiver from Swiss company u-blox AG, a three-axis orientation sensor from MicroStrain, a radio telemetry system, and four BasicX microcontroller units (BX-24s) from NetMedia Inc. The BX-24s control the servo motors and the remaining sensors. A PC running Windows® XP integrates the systems using Ethernet with the G5 and uses point-to-point RS-232 links with the other devices. That same computer also provides a platform for the navigation, command, and vehicle management systems. An Edgeport eight-port USB-to-serial converter/multiplexer from Inside Out Networks was used to provide sufficient RS-232 ports. Most of the hardware was given or lent to the team by sponsoring manufacturers (see Figure 3 for a complete diagram of the system).

Figure 3 Autonomous Vehicle Control System


Real Time

The computer-generated imagery rendering for the latest Hollywood blockbuster demands high performance, but it does not need to be real time. A system that spits out frames at random intervals over an hour, sometimes producing 10 frames in one second and sometimes cogitating for a minute over a single frame, is as good as one that churns out frames like clockwork. On the other hand, a video game must sense the controller states and produce 20 or more new frames a second or the illusion is broken. This is a real-time system; if a frame cannot be rendered on time it is better to move on to the next one or to compromise quality in order to get the frame out. Driving very definitely presents real-time problems. At 30 mph (44 feet per second), a driver has less than a second to respond to an obstacle detected 40 feet off, and this response must include both planning and executing the evasive maneuver.

My role in this project was real-time integration: providing a reliable platform for running artificial intelligence (AI) code, inputting and outputting data from sensors and actuators, and routing data between diverse platforms. Several considerations pushed me into writing the "glue" code using the .NET Framework. I have found C# programming to be highly robust, an essential attribute when you are working with nearly impossible deadlines. Furthermore, it was clear to me that much of the code written by robotics and navigation experts would be in various dialects of Basic. I could easily convert this code to Visual Basic® .NET and present it back to its authors in a form they could still understand. This ease of mixing languages is generally limited to the .NET environment and is of particular benefit in volunteer "garage engineering" projects where keeping application experts inside their comfort zone is more important than architectural purity.

However, I was quite aware of the potential downside: .NET was not primarily designed for real-time applications and most commercial operating systems, including Windows XP, are not designed to be real-time systems.

Design Patterns

The code I present in this article is a mixture of XML and C#. Please note that this code is not shipping code, and that most design decisions were made to support an evolving application. I've used design patterns (generic templates) for some of the code. Text-pasting tokens are shown as uppercase labels enclosed in braces with no embedded white space (for example, {NAME}). The entire token, braces included, needs to be replaced by text appropriate to the application. For example, {NAME}Driver might become GpsDriver.

As a practical example, I've used the singleton design pattern several times, which I've defined as follows:

    public class {CLASS-NAME}
    {
        private {CLASS-NAME}() {}
        static {CLASS-NAME} _singleton = new {CLASS-NAME}();
        public static {CLASS-NAME} Singleton { get { return _singleton; } }
        ...
    }

I used this for the MissionClock class by substituting for CLASS-NAME and converting it to code as follows:

    public class MissionClock
    {
        private MissionClock() {}
        static MissionClock _singleton = new MissionClock();
        public static MissionClock Singleton { get { return _singleton; } }
        ...
    }

I used the singleton design pattern to create a class of which there is always exactly one instance. The static reference ensures that one instance is created automatically by the time I first need to use it, and the private constructor ensures that no more instances can be erroneously created. This one-and-only instance is referenced using MissionClock.Singleton.

Whiteboard

The whiteboard metaphor came from the image of a group of academics collaborating on an analysis around a shared whiteboard. Each writes what they know about the problem on the board, and all can see and respond to the collective knowledge of the group. Some of the academics are in radio contact with students making new observations in the field; they update data on the board in response to changing conditions of the phenomenon being analyzed. This system works on the basis of verifiable trust—anyone can see, use, and change any data on the whiteboard, but everyone can also see who sourced or changed a particular piece of data should it prove unreliable.

I decided to organize the whiteboard metaphor into a table of parameters, each of which has an address and a state. I defined devices and parameters required for a particular application using an XML file. I built devices as independently compiled components (assemblies); they communicate with one another entirely through the whiteboard. I implemented the whiteboard on one computer running Windows XP, but devices could be implemented on the same computer, or even on different computers, with potentially heterogeneous architectures. This design, which I call Virtual Instrument and Control Panel (VICP), provides an extremely flexible platform for experimental artificial intelligence, since I could easily add new sensors, actuators, and processing modules, move them between multiple platforms and enable or disable them at run time. I could build communications, data logging, fault tolerance and human-machine interfaces (HMI) generically to work with the well-defined parameter scheme.

Here is the XML that defines the latitude and longitude parameters for the current vehicle position as reported by a GPS sensor:

    <INT32 address="102" source="5" log="Y" scale="7">GPS:Latitude</INT32>
    <INT32 address="103" source="5" log="Y" scale="7">GPS:Longitude</INT32>

Here I hold the whiteboard table in memory and parse the XML once on startup. This XML results in two 32-bit signed integer table slots at addresses 102 and 103. The whiteboard table records that the source of these parameters is device 5, one of the GPS devices, and that changes to the state should be logged. The text and the scale attribute are not recorded in the whiteboard. I separately parse these in the HMI process for display formatting and labeling. Any device can write any parameter; the log records device number, time of change and new state. I identify the source device to the whiteboard for automated fault handling and for update rate management (an optional move, which I describe later). In addition to the normal range of states a parameter of the specified type can take, all VICP parameters have an additional named state: unknown. I set all parameters to unknown at startup until they are written by a device. If the device identified as the source fails, the parameter is automatically changed back to unknown. Furthermore, devices can explicitly write a parameter to unknown; for example, if the GPS antenna can't see enough satellites to generate a position solution. In this way, a device depending on this parameter can never be fooled by outdated or invalid data.
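To make the unknown state concrete, here is a minimal sketch of what a single whiteboard slot could look like. The class and member names (Parameter, Put, TryGet, OnDeviceFailed) are hypothetical and are not taken from the VICP source. The per-slot lock is an assumption about how thread safety might be handled, and logging is omitted.

```csharp
using System;

// Hypothetical sketch of one whiteboard slot; not the actual VICP code.
public sealed class Parameter
{
    private readonly object _lock = new object(); // assumed per-parameter lock
    private int _state;
    private bool _known;       // false represents the "unknown" state
    private int _lastWriter;   // device that last wrote the slot

    public readonly int Address;
    public readonly int SourceDevice;

    public Parameter(int address, int sourceDevice)
    {
        Address = address;
        SourceDevice = sourceDevice;
        _known = false;        // every parameter starts out unknown
        _lastWriter = -1;
    }

    // Any device may write any parameter; the writer is remembered.
    public void Put(int writerDevice, int state)
    {
        lock (_lock) { _state = state; _known = true; _lastWriter = writerDevice; }
    }

    // A device may explicitly declare the value unknown.
    public void PutUnknown(int writerDevice)
    {
        lock (_lock) { _known = false; _lastWriter = writerDevice; }
    }

    // Returns false while the parameter is unknown.
    public bool TryGet(out int state)
    {
        lock (_lock) { state = _state; return _known; }
    }

    // When the source device fails, the slot reverts to unknown.
    public void OnDeviceFailed(int device)
    {
        if (device == SourceDevice) PutUnknown(device);
    }
}
```

A reader that checks the return value of TryGet before using the state cannot act on a value whose source has gone away, which is the property described above.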

Here are a few more examples of parameter definitions:

    <INT16 address="88" min="0" max="1023" valmax="12.0">Battery Volts</INT16>
    <UINT2 address="58" states="F|N|R">Transmission Gear</UINT2>

These create similar structures in the whiteboard, but provide different data interpretation in the HMI. Parameter 88 is interpreted as values in the range 0 to 12.0, scaled linearly from the state codes 0 to 1023 (valmin could also be specified, but when omitted is assumed to be the same as min). Parameter 58 is interpreted as having three named states: F, N, and R (forward, neutral and reverse), represented by state codes 0, 1, and 2. As always, each also has an additional named state: unknown.
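The two interpretations can be sketched as follows. This is only an illustration of the arithmetic; the helper names (ParamDisplay, ToValue, StateName) are hypothetical and the HMI code itself is not shown in this article.

```csharp
using System;

// Hypothetical display helpers; not the actual HMI code.
public static class ParamDisplay
{
    // Maps a state code in [min, max] linearly onto [valMin, valMax].
    public static double ToValue(int code, int min, int max,
                                 double valMin, double valMax)
    {
        return valMin + (code - min) * (valMax - valMin) / (max - min);
    }

    // Looks up a named state from a "F|N|R" style attribute.
    public static string StateName(string states, int code)
    {
        string[] names = states.Split('|');
        if (code < 0 || code >= names.Length)
            throw new ArgumentOutOfRangeException("code");
        return names[code];
    }
}
```

With the Battery Volts definition, ToValue(1023, 0, 1023, 0.0, 12.0) gives the full-scale reading of 12.0, and StateName("F|N|R", 2) gives "R".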

To keep the communications architecture lean and mean, I decided on a limited range of data types, focusing on minimizing storage size. Each BX-24 microcontroller had to communicate over slow RS-232 links where every byte transmitted and every byte of precious RAM used for buffering had to be justified. (The MPU chips had only 400 bytes of RAM available. Yes, I did say bytes!) The parameters could be 2-bit or 8-bit unsigned integers or 16-bit or 32-bit signed integers. I chose the 2-bit minimum size simply because there was a spare bit available in the RS-232 message layout. This parameter type had five possible states (including unknown), but mostly only three were used for a Boolean parameter (true, false, and unknown). I found a few uses for the extra states, though, as in the vehicle transmission example.

Since the VICP is a multithreaded application, data locking was vital for thread safety. I implemented a very simple mechanism at the parameter level. Thus it's important that parameters remain independent of one another. Latitude and longitude are independent dimensions, but degrees and minutes of latitude and the hemisphere marker (north or south) are dependent, which is why I stored latitude in one parameter as signed degrees scaled by 10^7 (the scale="7" attribute in the XML). It is easiest to see this problem with time and date: if I stored these as separate parameters, the machine's chronology would jump by 24 hours in the interval between the time and the date being updated each midnight.
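As a rough sketch of the scaling, the conversion between degrees and the 32-bit state could be written like this. The GeoScale class and its method names are hypothetical. Note that 180 degrees scaled by ten million is 1,800,000,000, which still fits in a signed 32-bit integer, and that one count corresponds to about a centimeter of latitude.

```csharp
using System;

// Hypothetical helper; illustrates the scale="7" convention only.
public static class GeoScale
{
    public const double Scale = 1e7; // ten to the power of the scale attribute

    // Converts signed degrees (south and west negative) to the INT32 state.
    public static int ToState(double degrees)
    {
        // checked: a value outside the INT32 range throws instead of wrapping
        return checked((int)Math.Round(degrees * Scale));
    }

    // Converts the INT32 state back to signed degrees.
    public static double ToDegrees(int state)
    {
        return state / Scale;
    }
}
```

Because degrees, minutes, and hemisphere are folded into one signed number, a single parameter write updates all of them atomically.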

I see these restrictions as advantages; in computing, particularly when dealing with the real world through sensors and actuators, it's good to remember these things are really just finite state machines because thinking of them this way means fewer mistakes. Mathematicians and theologians can play about with infinity; engineers and scientists should forget these theoretical fantasies and deal with reality. In most real-world measurements, if you are processing more than 10 to 16 bits, you are processing noise. In the present application, the only parameters that can be measured with enough resolution to justify the 32-bit parameter type are latitude, longitude, and time.

Tips for Implementing Windows Services

As the VICP will run on a machine on an unattended vehicle, I was clear from the outset that it ought to be implemented as a Windows service. This provides built-in support for boot startup and fault restart, and makes it easier to ensure that the application will never hang while awaiting a response from a nonexistent operator. It is very easy to create a service in Visual Studio® .NET, but not as easy to thoroughly debug it. By far the easiest solution I found is to test and debug the application as a console app and then convert it to a service for release, although there are other options available. I quickly found that this conversion had to be two-way, since I needed to switch back to the console version frequently to debug add-in application code. In the end, I refactored the code into three assemblies:

  • A singleton class VICPServer in a class library assembly (VICPServer.dll), which also contains the singleton class I called Reporter.
  • A .NET assembly (VICPConsole.exe) containing a simple harness to run VICPServer as a console application.
  • A .NET assembly (VICPService.exe) containing a simple harness to run VICPServer as a Windows service.

Now I could run the latest version of the application either as a service or as a console application without having to recompile.

In VICPServer, I trap all exceptions and use my own Reporter class. In the service environment, Reporter records messages in the system event log. In the console environment, the same messages are written to the console window.

I used the Windows service project type to generate VICPService and filled out some of the stub methods to hook up the application singleton, as you see here:

    static void Main()
    {
        System.ServiceProcess.ServiceBase.Run(VICPServer.Singleton);
    }

    private void InitializeComponent()
    {
        components = new System.ComponentModel.Container();
        this.ServiceName = "VICPService";
        Reporter.Singleton.SetEventLog(EventLog);
    }

At startup I needed to extract the XML configuration file name from the command line and then delegate the rest of the process:

    protected override void OnStart(string[] args)
    {
        if (args.Length > 0)
            VICPServer.Singleton.ConfigFile = args[0];
        VICPServer.Singleton.Start();
    }

The routines that Windows calls when the service is stopped or the system shuts down are delegated immediately:

    protected override void OnStop()
    {
        VICPServer.Singleton.Stop();
    }

    protected override void OnShutdown()
    {
        VICPServer.Singleton.Stop();
    }

Similarly, I used the console application project type to generate VICPConsole, and I changed the Main routine as follows:

    [STAThread]
    static void Main(string[] args)
    {
        if (args.Length > 0)
            VICPServer.Singleton.ConfigFile = args[0];
        VICPServer.Singleton.Start();
        Console.WriteLine("Hit <enter> to exit.");
        Console.ReadLine();
        VICPServer.Singleton.Stop();
    }

The Reporter singleton is quite simple and is shown in Figure 4.

Figure 4 Reporter Class

    using System;
    using System.Diagnostics;

    public class Reporter
    {
        private Reporter() {}
        static Reporter _singleton = new Reporter();
        public static Reporter Singleton { get { return _singleton; } }

        // Writes to the console until an event log has been supplied
        public void Report(string s)
        {
            if (_eventLog == null)
                Console.WriteLine(s);
            else
                _eventLog.WriteEntry(s, EventLogEntryType.Error);
        }

        public void SetEventLog(EventLog e) { _eventLog = e; }

        private static EventLog _eventLog;
    }

A Design Pattern for Add-Ins

I have implemented devices and several other types of XML configured runtime add-ins using a common design pattern. The XML looks like this:

    <{ADDIN-TYPE}s>
      <{CLASS-NAME} ident="{IDENT-NUM}" enabled="Y" assy="{ASSY-NAME}" ...>
        label
      </{CLASS-NAME}>
      ...
    </{ADDIN-TYPE}s>

Then I define two interfaces in an API assembly specific to the add-in type, like this:

    public interface I{ADDIN-TYPE}Host {...}
    public interface I{ADDIN-TYPE}Client {...}

In the host application, I create an object to act as a proxy for each add-in listed in the XML and store it in a collection. In this proxy object I implement the IHost and hold a reference to the IClient of the add-in object itself:

    public class {ADDIN-TYPE}Proxy : I{ADDIN-TYPE}Host
    {
        // The add-in is held through its client interface
        private I{ADDIN-TYPE}Client _client;
        private uint _id;
        public I{ADDIN-TYPE}Client Client { get { return _client; } }
        ...
    }

The classes for the add-ins themselves (the clients) may be defined in the host assembly (built-in) or may be in another assembly that references the API assembly. This is completely flexible; many add-in classes may be defined in one assembly or each may have its own assembly. Many instances of the same add-in class may be used in an application. The host assembly does not need to be recompiled to make a new add-in class available; installation merely requires that you put the assembly in the correct directory. I used this design pattern for the add-in classes:

    public class {CLASS-NAME}{ADDIN-TYPE} : I{ADDIN-TYPE}Client
    {
        private I{ADDIN-TYPE}Host _host;

        public {CLASS-NAME}{ADDIN-TYPE}(I{ADDIN-TYPE}Host hif, string spec)
        {
            _host = hif;   // keep the host interface for later callbacks
            ...
        }
        ...
    }

Here I pass a reference to the host interface to the constructor, which stores it for later use. I also pass the entire XML tag in string form. The latter enables additional attributes to be passed to the add-in at startup, and they can be parsed using an XmlTextReader in the constructor.

In the host program, I use an XmlTextReader to create a proxy object for each add-in specified in the XML file (see Figure 5). I then execute the LoadClient method of the proxy object to create and reference the add-in object (see Figure 6).

Figure 5 Loading Add-ins from an XML File

    private void Load{ADDIN-TYPE}s()
    {
        string val;
        uint i;
        int depth;
        XmlTextReader tr;
        tr = new XmlTextReader("{XML-FILENAME}");
        try
        {
            tr.Normalization = true;
            tr.WhitespaceHandling = WhitespaceHandling.None;
            _{ADDIN-TYPE}s = new {ADDIN-TYPE}Proxy[{MAX-ADDINS}];
            while (!tr.EOF)
            {
                if (tr.MoveToContent() == XmlNodeType.Element
                    && tr.Name == "{ADDIN-TYPE}s")
                {
                    depth = tr.Depth;
                    tr.Read();
                    tr.MoveToContent();
                    // Each child element describes one add-in
                    while (tr.Depth > depth)
                    {
                        val = tr.GetAttribute("ident");
                        i = uint.Parse(val);
                        val = tr.ReadOuterXml();
                        _{ADDIN-TYPE}s[i] = new {ADDIN-TYPE}Proxy();
                        _{ADDIN-TYPE}s[i].LoadClient(val);
                        tr.MoveToContent();
                    }
                    break;
                }
                else tr.Read();
            }
        }
        finally
        {
            if (tr != null) tr.Close();
        }
    }

Figure 6 Loading an Add-in from Its Proxy

    // Called by the host loader in Figure 5, so it cannot be private
    public void LoadClient(string spec)
    {
        Assembly assy = GetType().Assembly;
        string val;
        _client = null;
        _id = uint.MaxValue;   // sentinel: ident not yet read (_id is unsigned)
        XmlTextReader tr;
        tr = new XmlTextReader(spec, XmlNodeType.Element, null);
        try
        {
            tr.Normalization = true;
            tr.WhitespaceHandling = WhitespaceHandling.None;
            tr.Read();

            // Get the ID of the add-in
            val = tr.GetAttribute("ident");
            if (val == null) throw new Exception("ident not stated.");
            _id = uint.Parse(val);

            // Find and load the add-in's assembly
            val = tr.GetAttribute("assy");
            if (val != null) assy = Assembly.Load(val);

            // Create the add-in instance
            _client = (I{ADDIN-TYPE}Client)assy.CreateInstance(
                tr.Name + "{ADDIN-TYPE}", false, 0, null,
                new object[] {this, spec}, null, null);
        }
        catch (Exception e)
        {
            throw new Exception("Failed to create " + _id, e);
        }
        finally
        {
            if (tr != null) tr.Close();
        }
    }

Now, from the host application, I can use the client interface held in the proxy object to control the add-in. Many of the add-ins used in the VICP are multithreaded and respond to asynchronous events occurring in the real world. I could use the host interface reference stored in the add-in client to update the host application asynchronously. The host interface is of little use in single-threaded add-ins and I omit it if it isn't required.

Device Implementation

To implement devices in the VICP, I used the add-in design pattern just shown. Figure 7 illustrates the generic architecture. I implemented three general-purpose devices built into the VICP: RS-232, TCP, and LOCAL. I built the application-specific devices such as GPS into separate .NET assemblies. This means that the VICP code is generic, so it does not need to be recompiled for each application. The entire application configuration is in the XML file, which defines the devices and parameters.

Figure 7 Abstract Device Architecture


In the current application, I create an instance of the GPS driver with the following XML:

<GPS ident="5" assy="GpsDriver" port="COM10" baud="19200" type="UBLOX" rate="8" block="100" timefix="Y">UBLOX GPS</GPS>

This creates device 5 using a class GpsDriver in assembly GpsDriver.dll. This custom device interfaces with the GPS hardware and uses it to update a set of parameters including latitude and longitude in the VICP (see Figure 8). Most GPS sensors use an ASCII protocol called NMEA 0183 over RS-232 and send update messages (called sentences in the protocol) once per second or faster. The GpsDriver assembly parses these sentences, scales the values appropriately, and uses the host interface to update the VICP.
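The parsing and scaling step might look something like the sketch below for a latitude field. It assumes the usual NMEA 0183 layout in which latitude arrives as ddmm.mmmm followed by a separate N or S field, and it scales the result by ten million to match the scale="7" parameter definition. The NmeaLatitude class is hypothetical and is not the GpsDriver source.

```csharp
using System;
using System.Globalization;

// Hypothetical sketch; not the actual GpsDriver parsing code.
public static class NmeaLatitude
{
    // field: "ddmm.mmmm" as sent in the sentence; hemisphere: "N" or "S".
    // Returns signed degrees scaled by 1e7, ready for an INT32 parameter.
    public static int ToScaledDegrees(string field, string hemisphere)
    {
        double raw = double.Parse(field, CultureInfo.InvariantCulture);
        int degrees = (int)(raw / 100);          // leading "dd"
        double minutes = raw - degrees * 100;    // trailing "mm.mmmm"
        double value = degrees + minutes / 60.0;
        if (hemisphere == "S") value = -value;   // southern latitudes negative
        return (int)Math.Round(value * 1e7);
    }
}
```

For example, the field 4807.038 with hemisphere N is 48 degrees plus 7.038 minutes, or 48.1173 degrees, which is stored as 481173000.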

Figure 8 RS-232 Devices


As with all devices, I pass an XML extract to the constructor as a text string. I then extract some device-specific attributes to configure the driver. In this device, the rate attribute controls the rate at which the GPS position fix is updated (every eight ticks of the 32 Hz clock, or four times a second). The device writes to a contiguous block of parameters starting at the address specified in the block attribute. The timefix attribute controls whether this GPS device will be used to correct the mission clock.

The XmlTextReader class makes it very easy to unpack the XML fragment passed to the device class in the constructor (see the code in Figure 9).

Figure 9 GpsDriver

    public GpsDriver(IDriverHost hif, string spec)
    {
        _host = hif;
        XmlTextReader tr = new XmlTextReader(spec, XmlNodeType.Element, null);
        try
        {
            tr.Normalization = true;
            tr.WhitespaceHandling = WhitespaceHandling.None;
            tr.Read();

            // The port attribute is mandatory
            string val = tr.GetAttribute("port");
            if (val == null)
                throw new Exception("GPS Device must specify a port");
            _port = val;

            // The timefix attribute is optional and defaults to off
            _timeFix = false;
            val = tr.GetAttribute("timefix");
            if (val != null) _timeFix = (val == "Y");
            ...
        }
        finally
        {
            if (tr != null) tr.Close();
        }
    }

Similarly, I used instances of the built-in RS-232 device to integrate the four BX-24s used in the application (see Figure 8):

<RS232 ident="4" port="COM7" baud="19200">BX-24 D</RS232>

Unlike with the GPS, I had control of both ends of the MPU link and was able to design a highly robust and efficient binary messaging protocol. I implemented the PC end of this protocol in the RS-232 device and the BX-24 end using a state machine coded in Basic. In each BX-24, I created and synchronized a copy of selected parts of the whiteboard in Basic variables.

I decided to treat application code as devices in the VICP (see Figure 10). Like the hardware devices, application code simply reads and writes the whiteboard, but the local and TCP device classes add generic threading and messaging functionality that makes writing application code simpler. The local device class communicates with managed applications on the VICP machine. The TCP device communicates over TCP with both managed and unmanaged applications running on either the VICP or a remote machine. I coded a generic TCP application harness that runs either as a standalone executable or as a Windows service on the remote machine. Managed applications written to my specification are interchangeable between the local device and the TCP harnesses, enabling code to be migrated either way for performance rebalancing or other reasons. Unmanaged applications have to be written directly to a documented TCP interface and connected via the TCP device. This approach enabled integration of the video processing system, which is written in C and runs in an OS X environment.

Figure 10 Application Code Devices


Figure 10 uses the following XML to run an application class called Servo from the assembly DriveServo.dll in its own thread under the VICP process on the VICP machine:

<LOCAL ident="8" codeassy="DriveServo" codeclass="Servo"> Drive Servo</LOCAL>

Alternatively, I use the following command line to run the same application class on machine 192.168.0.2 and listen for TCP contact from a VICP on machine 192.168.0.1 using port 1020:

>TcpAppConsole.exe 192.168.0.1 1020 DriveServo Servo

Then I use the following XML to tell the VICP to connect to this client:

<TCP ident="8" client="192.168.0.2" clientport="1020">Drive Servo</TCP>

Neither the application nor the VICP can see any difference between these modes of operation.

Messaging

The messaging system I chose to use between the VICP and the devices is symmetrical, and essentially the same messages are used internally to the managed code, over RS-232, and over TCP. I use the messages shown in Figure 11.

Figure 11 Messaging System Commands

Command | Description
--- | ---
PUT_UNKNOWN, PUT_UINT2, PUT_UINT8, PUT_INT16, PUT_INT32 | Carries the current value of a parameter. Received by a device, it is notification of the parameter state; at the VICP it is a command to change the state.
ASK_POLL | Requests the VICP or the device to respond with one immediate PUT message for a specified parameter.
ASK_ONCHANGE | Requests the VICP or the source device to send a PUT message every time a specified parameter changes state.
ASK_PERIODIC | Requests the VICP or the source device to send a PUT message for a specified parameter at a specified rate ranging from 32 times per second to once every 220 seconds.
ASK_CANCEL | Requests the VICP or the source device to cancel any ASK_ONCHANGE or ASK_PERIODIC against a specified parameter.

I do not require a device to respond to ASK messages at all, and I allow it to ignore PUT messages, but the VICP responds to PUT and ASK_POLL messages for all parameters. For the other ASK messages, I require a routing table entry to have been configured for the specified parameter against the requesting device. I do this with the following XML in the parameter definition:

    <UINT2 address="58" source="3">ATV Transmission Gear
      <ROUTING destination="1" onchange="Y"/>
      <ROUTING destination="2" rate="32"/>
    </UINT2>

This allows the VICP to route PUT messages for parameter 58 to device 1 and device 2. I require the initial routing policy to be set in the XML file, but once the table entry exists, the VICP responds to ASK_ONCHANGE and ASK_PERIODIC messages from the destination devices. I have attempted to intelligently manage the rate at which the source device sends updates to the VICP to match the fastest outward routing by sending ASK_PERIODIC messages to the identified source device.

I make no linkage between the PUT message size and the parameter type. For efficiency, the smallest message size that is capable of carrying the actual state is always used. For example, a PUT_UINT2 would be used to set an INT32 parameter to 0, 1, 2, or 3. For any attempt to PUT a state value that is outside the allowed range for a parameter, I set the state to unknown.
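The size selection can be sketched as a simple range test on the state being sent. The enum and class names here are hypothetical; only the rule itself (use the smallest message that can carry the value) comes from the description above.

```csharp
using System;

// Hypothetical names; mirrors the PUT_* message family.
public enum PutKind { UInt2, UInt8, Int16, Int32 }

public static class PutSizer
{
    // Picks the smallest PUT message able to carry the given state,
    // regardless of the declared type of the target parameter.
    public static PutKind SmallestFor(int state)
    {
        if (state >= 0 && state <= 3) return PutKind.UInt2;
        if (state >= 0 && state <= 255) return PutKind.UInt8;
        if (state >= short.MinValue && state <= short.MaxValue)
            return PutKind.Int16;
        return PutKind.Int32;
    }
}
```

Negative values skip the two unsigned sizes, so a state of -1 travels as a 16-bit message even though it is numerically small.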

I used a simple message packing scheme for the internal and TCP representation of the messages; the RS-232 packing had some points of interest. To minimize the number of bytes in the messages, I decided to use a binary packet with variable packet size. The problem with this approach is that it's difficult for the receiver to know which byte is which in the stream of incoming messages. One approach is to have a checksum at the end of each message. The receiver then tests the hypothesis that each byte is the first byte in a message by testing the checksum. The checksum computes correctly when, and only when, the correct start byte has been located. After this, messages can be processed continuously, and reframing is only required if there is a subsequent checksum error. This is very efficient in minimizing the transmitted messages, but it requires a large buffer and a good deal of code on the receiver, which is not ideal for resource-limited BX-24s.

My alternative method uses a reserved message start byte; various coding tricks ensure this byte could not ever occur within a message. This required more transmitted bytes (one extra per message plus on average a small fraction for pattern avoidance coding), but was much easier to decode since the receiver simply unconditionally resets its state machine whenever it sees the start byte. The chart in Figure 12 shows this message-coding scheme.

Figure 12 RS-232 Message Coding Scheme

Byte (Bits) | Value | Meaning
--- | --- | ---
0 | 0xFF | Frame start.
1 | | Parameter address bits 0..7. Parameter addresses in which this byte would be 0xFF are illegal. Such addresses are substituted with defined values in the range 4001..4095.
2 (0..3) | | Parameter address bits 8..11.
2 (4..7) | 00xx | PUT message UINT2 with the data bits in xx.
 | 0100 | PUT message indicating parameter unavailable.
 | 0101 | PUT message UINT8 with one follow-on byte.
 | 0110 | PUT message INT16 with two follow-on bytes.
 | 0111 | PUT message INT32 with four follow-on bytes.
 | 1000 | ASK_ONCHANGE (requests ongoing PUT messages whenever the parameter changes).
 | 1010 | ASK_PERIODIC (requests ongoing PUT messages at a rate specified in the single follow-on byte).
 | 1011 | ASK_CANCEL (cancel any ASK_ONCHANGE or ASK_PERIODIC).
 | 1111 | Illegal.
3..10 | | Follow-on bytes (0..4) as specified. For multi-byte integers, the MSB is transmitted first. If a byte should be 0xFF or 0xAA, it is transmitted as the two-byte sequence 0xAA followed by the complement of the data byte.
Last | | The sum of the bytes in the message before escaping and excluding the frame start byte; the bytes are added as unsigned and the result truncated to 8 bits; if the byte would be 0xFF, it is transmitted as 0x55 instead.
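To show how the pieces of Figure 12 fit together, here is a minimal encoder sketch for the PC end of the link. It is written from the table alone, not from the VICP source, and makes several assumptions: the class and method names are hypothetical, the 4-bit message code is passed in as a number (for example 0x6 for a PUT INT16), addresses whose low byte is 0xFF are simply rejected because the substitution values are not given here, and the checksum byte is altered only in the 0xFF case that the table describes.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical encoder sketch based on the Figure 12 table.
public static class Rs232Frame
{
    public const byte FrameStart = 0xFF;
    public const byte Escape = 0xAA;

    // command:  4-bit code from the table, placed in bits 4..7 of byte 2.
    // followOn: 0..4 data bytes, most significant byte first.
    public static byte[] Encode(int address, int command, byte[] followOn)
    {
        if (address < 0 || address > 0x0FFF)
            throw new ArgumentOutOfRangeException("address");
        if ((address & 0xFF) == 0xFF)
            throw new ArgumentException("Address must be substituted first.");
        if (command < 0 || command > 0xE)          // 1111 is illegal
            throw new ArgumentOutOfRangeException("command");
        if (followOn == null || followOn.Length > 4)
            throw new ArgumentException("Expected 0..4 follow-on bytes.");

        byte b1 = (byte)(address & 0xFF);                  // address bits 0..7
        byte b2 = (byte)((command << 4) | (address >> 8)); // code + bits 8..11

        List<byte> frame = new List<byte>();
        frame.Add(FrameStart);
        frame.Add(b1);
        frame.Add(b2);

        int sum = b1 + b2;                 // checksum uses unescaped bytes
        foreach (byte b in followOn)
        {
            sum += b;
            if (b == FrameStart || b == Escape)
            {
                frame.Add(Escape);         // escape marker
                frame.Add((byte)(b ^ 0xFF)); // complement of the data byte
            }
            else
            {
                frame.Add(b);
            }
        }

        byte check = (byte)(sum & 0xFF);   // truncate to 8 bits
        frame.Add(check == 0xFF ? (byte)0x55 : check);
        return frame.ToArray();
    }
}
```

Because 0xFF can never appear after the first byte, a receiver can restart its state machine on every 0xFF it sees without any lookahead, which is what keeps the BX-24 side small.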

Telemetry

Telemetry is not a core requirement for an autonomous vehicle (in fact, it was specifically not allowed during the actual race). However, it is very useful during testing, and I always had ambitions for the VICP code beyond this specific application. The whiteboard system and the efficient, symmetrical messaging structure lend themselves quite well to telemetry for remotely controlled systems.

As illustrated in Figure 13, I simply ran the VICP service on both the vehicle and the bunker machines with an identical parameter configuration but with most devices omitted at the bunker end. Both ends have an identical telemetry device; in this case it was interfacing a point-to-point RS-232 radio modem (wired or wireless TCP or even cellular or satellite telephony could also be used).

Figure 13 Telemetry


The telemetry device is a transparent message channel. All the infrastructure for configuring and synchronizing the two parameter tables was already available. I simply built in the necessary routing tables using XML. Then the telemetry rates could be dynamically changed at run time, on an individual parameter basis, using ASK messages.

Implementation of RS-232 Devices

I built the generic RS-232 device and the specific devices for RS-232 sensors such as GPS using the CommBase .NET RS-232 library described in my article "Use P/Invoke to Develop a .NET Base Class Library for Serial Device Communications". The code in this library was the only unmanaged code I needed for this app (other instances of unmanaged code were used for reasons of programmer familiarity and preference rather than any inability of .NET to handle the requirements). I was pleased that the library easily handled the requirements of this demanding project (a total of eight simultaneous RS-232 ports with an aggregate throughput of over 20,000 bytes per second).

Two issues did come up, however, and I have addressed both of them in an updated code download for the 2002 article. The first issue is that, as I discovered, in most versions of Windows the usual port name syntax does not work for double-digit port numbers; for example, you cannot open "COM10:". Fortunately, when I was banging my head on this problem, a correspondent e-mailed me with the solution (which is given in Knowledge Base article 115831). You have to specify "\\.\COM10" (it will not work with a trailing colon).

To make this simpler I built automatic handling of this issue right into the library. With the latest version, you can now specify the port name either as "COM10" or "COM10:" just as you can for single digit port numbers. I made the library first try to open the port name exactly as specified; if that fails, strip any trailing colon, prefix "\\.\" and try again.
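
The retry rule is easy to express as a list of names to try in order. The following is a minimal sketch, not the CommBase code: the class and method names are hypothetical, and the check that avoids adding the prefix twice is my own assumption.

```csharp
using System;

// Hypothetical helper showing the fallback order described above.
public static class PortNames
{
    private const string DevicePrefix = @"\\.\";

    // Returns the names to try: first exactly as given, then with any
    // trailing colon stripped and the device prefix added.
    public static string[] Candidates(string portName)
    {
        string stripped = portName.TrimEnd(':');
        // Assumption: leave the name alone if the caller already
        // supplied the prefix.
        string fallback = stripped.StartsWith(DevicePrefix, StringComparison.Ordinal)
            ? stripped
            : DevicePrefix + stripped;
        return new[] { portName, fallback };
    }
}
```

A caller would attempt to open each candidate in turn and stop at the first success, so single-digit names such as "COM1:" continue to open on the first attempt.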

Second, a more complex problem arose with the buffering strategy used for transmission. When testing the BX-24 link I found to my surprise that I could reduce the baud rate while maintaining the message transmission rate. A quick calculation showed that messages could not possibly be transmitted at this rate. The problem was that Windows XP provides a dynamic buffer for serial transmission that grows more or less indefinitely, so the messages received by the BX-24 get older and older as time goes on—not a good situation in a real-time system.
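
The quick calculation is simply a comparison of offered load against link capacity. Here is a sketch of that arithmetic, assuming 8N1 framing (10 bits on the wire per byte); the class name and the example figures in the note that follows are illustrative and are not the actual VICP message sizes or rates.

```csharp
using System;

// Hypothetical helper for checking whether a serial link can carry
// a given message load.
public static class SerialBudget
{
    // Bytes per second the link can carry. bitsPerByte defaults to 10,
    // which assumes one start bit, eight data bits, and one stop bit.
    public static double MaxBytesPerSecond(int baudRate, int bitsPerByte = 10)
    {
        return (double)baudRate / bitsPerByte;
    }

    // True when messages are queued faster than the link can send them,
    // so an unbounded transmit buffer would keep growing.
    public static bool Overloaded(int baudRate, int messageBytes, double messagesPerSecond)
    {
        return messageBytes * messagesPerSecond > MaxBytesPerSecond(baudRate);
    }
}
```

For example, 8-byte messages sent 32 times per second need 256 bytes per second. A 9600 baud link (960 bytes per second) copes, but a 2400 baud link (240 bytes per second) cannot, even though the sender sees no error.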

Detecting this situation required some ingenuity and lateral thinking. Using overlapped I/O in the library, I "fire and forget" messages into the transmission buffer. A separate process within the operating system takes bytes from the buffer and transmits them as fast as possible given the configured baud rate. I could count bytes into the buffer and count them out again, but this would require a lot of complex housekeeping. However, there is a way of receiving a notification, within the reception thread, when the transmission buffer becomes empty. To exploit this, I implemented two flags: dataQueued is set when a message is queued, and empty is set in the reception thread when the transmission buffer is reported empty. A new function in the library, IsCongested, calculates a return value that is true if dataQueued is true and empty is false, and then clears both flags. This function is called at regular, relatively long intervals. If data is being transmitted faster than it is being queued, the queue will become empty at least once during the interval. Alternatively, if data is accumulating in the queue, it will never become empty and IsCongested will return true.

I used a handy trick for making a flag (or any other non-object variable) thread safe. I declared it as a single element array:

private bool[] _empty = new bool[1];

Now it is an object that can be locked, as in the implementation of IsCongested from the CommBase library:

protected bool IsCongested() {
    bool e;
    if (!_dataQueued) return false;
    lock(_empty) {
        e = _empty[0];
        _empty[0] = false;
    }
    _dataQueued = false;
    return !e;
}

Mission Clock

I needed a clock that ticks 32 times per second for the VICP. My first attempt used the Timer component from .NET. I set the interval to 31ms for 3 ticks out of 4 and to 32 for the fourth tick to correct for the fractional shortfall. I counted these ticks over a period of time and compared the result with the real-time clock. My tick accumulator lost time at a rate of between 5 and 20 percent, with the worst losses occurring during periods of heavy disk activity. At first I was seriously worried: if Windows XP and .NET could not even keep up with 32 processing slots per second without any application code, what hope was there? After some experimentation I resolved the problem into two issues: first, significant disk activity did seem to prevent the timer code from running for longer than a whole tick (31ms); second, there seemed to be an uncertain interval between the timer expiring and the attached event code actually running. For accurate timekeeping, I could derive closed-loop corrections from the system real-time clock. Code execution slots would still be lost occasionally, but I decided I could live with this as long as I minimized disk activity on the run-time system.
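
The interval pattern in that first attempt follows from 1000 / 32 = 31.25 ms per tick: three intervals of 31 ms plus one of 32 ms total 125 ms, which is exactly four ticks. The short sketch below only checks that arithmetic; the names are hypothetical and it is not part of the final MissionClock design.

```csharp
using System;

// Hypothetical illustration of the 31/31/31/32 ms interval pattern.
public static class TickIntervals
{
    // Interval in milliseconds to program before the given tick:
    // every fourth interval is 32 ms, the rest are 31 ms.
    public static int IntervalMs(int tickIndex)
    {
        return (tickIndex % 4 == 3) ? 32 : 31;
    }

    // Nominal elapsed time after the given number of ticks.
    public static int TotalMs(int ticks)
    {
        int total = 0;
        for (int i = 0; i < ticks; i++) total += IntervalMs(i);
        return total;
    }
}
```

On paper, 32 ticks therefore take exactly 1000 ms, which is why the measured loss of 5 to 20 percent pointed at scheduling delays rather than at the interval arithmetic.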

I built a new MissionClock component from the ground up for this application entirely in managed code (C#). It was not a designer component, but was implemented as an ordinary class using the singleton design pattern.

I passed the MissionClock two delegates, one of which executes 32 times per second (if possible; remember that the operating system's scheduler is under no obligation to run my threads at any particular time) and the other every second. I also used the component to keep track of time parameters to give a consistent time to the entire system. These time parameters are locked to the system time, with provision for a correction to a time observation (GPS time in this application). A closed-loop system ensures that these time measures are kept accurate even if the system is unable to keep up with the 32 times per second tick rate. The time parameters are shown in Figure 14.

Figure 14 Time Parameters

Time Parameter
Mission time: 32-bit count incrementing 32 times per second from 0 at the start of mission
Universal time: Seconds since the start of the current half-century at the Prime Meridian
Local time: Seconds since midnight in the local time zone
Mission day: Count of local midnights since the start of the mission

I sample and store the local time zone and the half-century base for Universal Time when a mission is started to avoid discontinuities during the mission. I persist this information to an XML file when a mission is started, enabling me to reconstitute the parameters after any reboot. Figure 15 shows the closed-loop algorithm, which runs on a high-priority thread. The key is to keep track of the expected value of the computer clock in the accumulator. When I have done everything I need to do in the current time slot, I increment the accumulator by the tick interval and sleep the thread for the difference between this and the current value of the computer clock. This way, errors are corrected and do not build up. When I increment the accumulator, I check to see if it is actually greater than the current time; if not, a tick must have been missed and I loop around adding increments until caught up. In this event, the tick delegate is not run repeatedly, but the time counters are incremented. I also increment a "lost ticks" counter for diagnostic purposes. Because I like to cover my bases, I provided similar catch-up logic for the seconds counter, though I don't really expect it to ever be invoked.

Figure 15 Mission Clock Algorithm

private void Ticker() {
    long c;
    long n;
    bool s = false;
    bool t = false;
    bool m = false;
    uint mt = 0;
    try {
        while (true) {
            // Process stored actions:
            if (t) tr(mt);
            if (s) sr(mt);
            if (m) MissionOverflow();
            s = m = t = false;
            lock(this) {
                // Increase mission time by one tick:
                mt = ++_missionTime;
                t = ((tr != null) && (!_disableDelegates));
                // Second divider (self-correcting even with
                // large tick gaps):
                if (--_secondCounter <= 0) {
                    while (_secondCounter <= 0) {
                        _secondCounter += 32;
                        _utcTime++;
                        _localTime++;
                        // Local midnight transition:
                        if (_localTime >= _secondsPerDay) {
                            _localTime -= (uint)_secondsPerDay;
                            m = (++_missionDay >= _maxMissionDays);
                        }
                    }
                    s = ((sr != null) && (!_disableDelegates));
                }
                // The target accumulator value for this cycle:
                n = DateTime.UtcNow.Ticks + _currentOffset;
                // Determine the correction to apply to synchronize:
                c = _targetOffset - _currentOffset;
                if (c != 0) {
                    if (c > _maxAdjust) c = _maxAdjust;
                    if (c < (_maxAdjust * (-1))) c = _maxAdjust * (-1);
                    _currentOffset += c;
                }
                // Advance the accumulator correcting for missed ticks:
                _accumulator += _tickDuration;
                while (_accumulator < n) {
                    _accumulator += _tickDuration;
                    _lostTicks++;
                    _missionTime++;
                    _secondCounter--;
                }
            }
            // Sleep for the remainder of the tick duration
            Thread.Sleep((int)((_accumulator - n)/_dateTimeMillisecond));
        }
    }
    // Any exception ends the loop and lets the thread exit.
    catch (Exception e){}
}

I use a second closed loop to apply corrections between observed GPS time (when available) and the computer clock. A public method (not shown here) takes an observed time and uses it to calculate targetOffset from the current value of the PC clock. In the Ticker algorithm, I compare this value with the currentOffset value and apply a magnitude-limited correction to bring the two values closer together. This process allows me to avoid sudden jumps in time as a correction is applied: the duration of a tick becomes just slightly longer or shorter over a number of ticks until the correction is absorbed.
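
To see how the magnitude-limited correction behaves in isolation, here is a small sketch that mirrors the clamp in the Ticker listing. The class and method names are hypothetical, and the numbers in the note that follows are made-up values, not the settings used on the vehicle.

```csharp
using System;

// Hypothetical stand-alone version of the offset correction step.
public static class OffsetSlew
{
    // Moves current toward target by at most maxAdjust per call.
    public static long Step(long current, long target, long maxAdjust)
    {
        if (maxAdjust <= 0) throw new ArgumentOutOfRangeException("maxAdjust");
        long c = target - current;
        if (c > maxAdjust) c = maxAdjust;
        if (c < -maxAdjust) c = -maxAdjust;
        return current + c;
    }

    // Counts how many ticks it takes to absorb the whole correction.
    public static int TicksToConverge(long current, long target, long maxAdjust)
    {
        int ticks = 0;
        while (current != target)
        {
            current = Step(current, target, maxAdjust);
            ticks++;
        }
        return ticks;
    }
}
```

DateTime ticks are 100 ns each, so a 50 ms correction is 500,000 ticks. With an assumed limit of 10,000 ticks (1 ms) per clock tick, TicksToConverge(0, 500000, 10000) gives 50, meaning the correction would be absorbed in about 1.6 seconds at 32 ticks per second.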

I use a lock block to provide mutually exclusive access to the private variables between the Ticker thread and other (possibly multiple other) threads that access the public methods. This makes the class fully thread safe.

Resilience

Any system required to operate for many hours in a hostile environment without human intervention demands a structured and thorough approach to software resilience. My first step was to ensure that the system could be safely rebooted at any time during the mission and then be capable of recovering all information and state to continue the mission intelligently. I designed the hardware system so that any of the BX-24s could detect main computer failure and cycle the power to reboot. I persisted very little information to disk, but what is persisted is updated on disk as soon as it changes to ensure it will be available on reboot. The parameter values are not persisted: on reboot they are set to the unknown state until the device sourcing them comes up and provides good data. The machine must know how far along the mission timeline it is; I keep track of this value by persisting the value of the PC clock and the local time zone offset at mission start and GPS time correction whenever it changes. On startup, I restore the mission clock immediately using the persisted information and the current value of the PC clock.
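
The restore step can be sketched as follows. This is an illustration under stated assumptions, not the MissionClock code: the names are hypothetical, and I assume the GPS correction is simply added to the current PC clock before subtracting the persisted start value.

```csharp
using System;

// Hypothetical reconstruction of mission time after a reboot.
public static class MissionRestore
{
    public const int TicksPerSecond = 32;

    // startUtc: PC clock persisted at mission start.
    // gpsCorrection: last persisted GPS time correction (assumed additive).
    // nowUtc: PC clock read at startup.
    public static uint MissionTime(DateTime startUtc, TimeSpan gpsCorrection, DateTime nowUtc)
    {
        TimeSpan elapsed = (nowUtc + gpsCorrection) - startUtc;
        if (elapsed < TimeSpan.Zero) elapsed = TimeSpan.Zero;
        // Convert 100 ns DateTime ticks into 1/32-second mission ticks.
        return (uint)(elapsed.Ticks * TicksPerSecond / TimeSpan.TicksPerSecond);
    }
}
```

Because only the start value and the correction are read back from disk, nothing needs to be written on every tick, which fits with keeping disk activity to a minimum.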

Rebooting the entire system is a last resort. I have exploited the system of XML-configured devices to provide much finer-grained recovery. Recall that devices are subsystems of application functionality as well as sensor and actuator subsystems. I require all devices to respond to an ASK-POLL message against a well-known pseudo-parameter by PUT_UINT2 with the named state OK. I made the VICP send this message every five seconds; if it gets either no response or a state other than OK, the device is treated as failed and the VICP sets all parameters that the device is registered as sourcing to the unknown state. I then attempt to stop and restart the device up to four times. If this fails, I retire the device object for garbage collection and create and initialize a brand new one.
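
The escalation policy can be sketched independently of the real device classes, which are not shown here. In this sketch the device is represented by delegates, and every name (DeviceSupervisor, CheckResult, and the four callbacks) is hypothetical.

```csharp
using System;

public enum CheckResult { Healthy, Restarted, Recreated }

// Hypothetical supervisor illustrating poll, restart, and replace.
public class DeviceSupervisor
{
    public const int MaxRestarts = 4;

    private readonly Func<bool> _poll;      // true if the device reports OK
    private readonly Action _restart;       // stop and restart the device
    private readonly Action _recreate;      // discard and build a new device
    private readonly Action _markUnknown;   // set sourced parameters to unknown

    public DeviceSupervisor(Func<bool> poll, Action restart,
                            Action recreate, Action markUnknown)
    {
        _poll = poll;
        _restart = restart;
        _recreate = recreate;
        _markUnknown = markUnknown;
    }

    // Intended to be called on the polling interval (five seconds above).
    public CheckResult Check()
    {
        if (SafePoll()) return CheckResult.Healthy;
        _markUnknown();
        for (int attempt = 0; attempt < MaxRestarts; attempt++)
        {
            try { _restart(); }
            catch (Exception) { continue; }   // a failed restart uses up an attempt
            if (SafePoll()) return CheckResult.Restarted;
        }
        _recreate();
        return CheckResult.Recreated;
    }

    // No response (modeled here as an exception) counts as a failure.
    private bool SafePoll()
    {
        try { return _poll(); }
        catch (Exception) { return false; }
    }
}
```

Marking the parameters unknown before attempting recovery means that consumers of the whiteboard never act on stale values while the device is being restarted.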

The RS-232 device can reset a crashed BX-24 by pulsing the RS-232 CTS signal that I specified to be hardwired to a reset generator in the BX-24 hardware. The hardware was carefully designed to ensure PC failure modes could not hold the BX-24 in reset.

Further Developments

I am convinced that the future for robotic control systems lies in the symbiosis of many single-chip microcontrollers with a few PC class machines acting in a supervisory role. Microcontrollers, particularly the latest models with digital signal processor (DSP) class mathematical capabilities, have the ability to execute sophisticated closed-loop control algorithms in real time. The PCs bring large-scale data handling capacity and the convenience of a human interface to the party. However, the programming systems available for microcontrollers and DSPs are primitive in comparison to the latest generation of PC systems represented by .NET, and are point solutions not adapted to targeting multiprocessor heterogeneous hardware designs. I am working on developing these ideas into a state machine-based distributed application platform for real-time control and AI applications.

Conclusion

The main lesson I take away from this project is one of humility on behalf of the profession of software engineering. We really have a long way to go to bridge the gap between what we can do ourselves and what we can teach machines to do. Ironically, the part we find hard (navigation) is relatively easily automated, while driving, a skill that can be acquired in a few days by the most intellectually challenged teenager, remains elusive. The team I was involved with tried, and has so far failed, to do the job with roughly human-equivalent sensors. The more successful teams used expensive sensors that were much more capable than human senses. There must be some big, powerful signal processing trick that nature has discovered and we're still missing.

On the upside, I have demonstrated an effective, flexible, scalable and robust platform for experimenting with this and other related problems. The whiteboard structure combined with symmetrical messaging is a simple but extremely powerful solution to a wide range of complex control applications. With some understood limitations, notably related to the handling of rotating media, the .NET Framework on Windows XP has demonstrated more than adequate performance combined with excellent programmer productivity and flexibility.

John Hind is an independent author and consultant living and working in London, England. He specializes in microcontroller applications and control solutions. Get in touch with John at John.Hind@zen.co.uk.