Display Device Performance Profile Analysis (Windows Embedded CE 6.0)

1/6/2010

The following topic examines the data obtained from initial profiling run in How to Profile and Optimize a Display Driver.

Monte Carlo Profiling Data

After running ProfApp.exe the data in your Debug window should resemble the following output.

Note

For clarity, the tick count, process IDs, and thread IDs have been removed from the following output. Also, the specific timer values vary from run to run because of the manual process of controlling the profiling process.

Kernel Profiler: Gathering MonteCarlo data in buffered mode
ProfileStart() : Allocated 13960 kB for Profiler Buffer (0x48000000)
Starting profile timer at 200 uS rate
ProfApp:  Took 34297 ms to perform blts. 
Kernel Profiler: Looking up symbols for 503297 hits.
.
.
(Additional lines omitted for clarity.)
.
.
Total samples recorded = 503297
Module        Hits        Percent
------------  ----------  -------
nk.exe            332367     66.0
ddi_flat.dll      170117     33.8
gwes.dll             662      0.1
coredll.dll          108      0.0
ProfApp.exe           25      0.0
relfsd.dll             3      0.0
filesys.exe            2      0.0
shell.exe              2      0.0
explorer.exe           2      0.0
pm.dll                 1      0.0
UNKNOWN                8      0.0

Hits       Percent Address  Module       Routine
---------- ------- -------- ------------:---------------------
    328544    65.2 802351c2 nk.exe      :_IDLE_STATE
    101277    20.1 03db634d ddi_flat.dll:?EmulatedBlt_Internal
     64291    12.7 03db7c7f ddi_flat.dll:?MaskedSrcToMaskedDst
      3864     0.7 03db2068 ddi_flat.dll:?CursorOn
      1351     0.2 8025a52f nk.exe      :_NE2000_WRITE_PORT_UCHAR
      1181     0.2 8025a53b nk.exe      :_NE2000_READ_PORT_UCHAR
       345     0.0 03db2268 ddi_flat.dll:?CursorOff
       162     0.0 80229a7f nk.exe      :_PerfCountSinceTick
       153     0.0 8025a54b nk.exe      :_NE2000_READ_PORT_USHORT
       150     0.0 0003d906 gwes.dll    :?dwRealizeColor
        98     0.0 8025ad06 nk.exe      :_HandleReceive

(Additional lines omitted for clarity.)

         1     0.0 0001f663 explorer.exe:?ItemCount
         1     0.0 00029696 explorer.exe:_GetSysColorBrush
        33     0.0                      :<UNACCOUNTED FOR>

DispPerf.exe Data

The following tables show the single set of data from running ProfApp.displayed in several smaller tables.

The following table shows the overall summary of the number of times each raster operation was called.

For information about converting the ROP codes reported by DispPerf to the ROP codes listed in Ternary Raster Operations, see Display Driver Performance Profiling.

RopCode cTotal

0x0000CCCC

1063

0x0000F0F0

6222

0x00008888

59

0x00006666

52

0x0000AAF0

4537

0x0000EEEE

7

0xFEFEFFF1

49

0x0000E2E2

1

0x00005555

378

The following table shows the profiling results from all the ROPs performed by GPE functions.

RopCode cGPE dwGPETime Avg.GPETime

0x0000CCCC

1045

40923908

39161

0x0000F0F0

668

21737

32

0x00008888

59

41647

705

0x00006666

52

28160

541

0x0000AAF0

3957

80123

20

0x0000EEEE

0

0

0

0xFEFEFFF1

49

133721

2729

0x0000E2E2

1

355

355

0x00005555

0

0

0

The following table shows the profiling results from all the ROPs performed emulation functions. For more information, see BitBlT Emulation Library Functions.

RopCode cEmul dwEmulTime Avg.EmulTime

0x0000CCCC

18

94149

5230

0x0000F0F0

5554

486315

87

0x00008888

0

0

0

0x00006666

0

0

0

0x0000AAF0

580

63332

109

0x0000EEEE

7

13033

1861

0xFEFEFFF1

0

0

0

0x0000E2E2

0

0

0

0x00005555

378

42907

113

The DispPerf results do not show profiling results for hardware calls because the settings in How to Profile and Optimize a Display Driver are based on the FLAT driver, which is a general purpose driver that does not make use of advanced hardware capabilities.

Analysis of the Profiling Results

The Monte Carlo profiling results reveal the functions in which the most time was spent during the profiling period.

Specifically, these results show that the system spent the most time in the idle state, most likely waiting for the manual interaction required to start and stop the profiler.

Idle time is both expected and unavoidable with this method, so this result is neither surprising nor interesting for optimizing the display driver.

The second and third lines are more significant for optimization.

The second line shows that a very large percentage of time — well over 50 percent of the total when the idle time is factored out — is spent in the driver function EmulatedBlt_Internal.

The next most time-consuming function is MaskedSrcToMaskedDst, taking up nearly 40 percent of the execution time once the idle time is factored out.

The DispPerf.exe results show that the driver spent two orders of magnitude more time performing the 0x0000CCCC (SRCCOPY) raster operation in GPE than during any other operation. This highlights it as a prime candidate for optimization.

A search through the source code shows that the two functions are defined in files located in %_WINCEROOT%\Public\Common\OAK\Drivers\Display\GPE. EmulatedBlt_Internal is defined in Swblt.cpp and MaskedSrcToMaskedDst is defined in Swconvert.cpp.

Examining this code to see how the specific ROP is processed reveals that both of these functions are involved in converting color from the RGB555 format used in the application to the RGB565 format used by the display hardware. EmulatedBlt_Internal is a GPE function called for the blit operation.

It then calls MaskedSrcToMaskedDst for the format conversion.

The function EmulatedBlt_Internal is of particular concern because it is a standard GPE function.

This indicates that not only is there no hardware support for this conversion in the graphics device, there also is no emulation library function for the conversion either.

See Also

Concepts

How to Profile and Optimize a Display Driver