
Optimized GPE Emulation Function Analysis (Windows Embedded CE 6.0)

1/6/2010

After you implement the changes described in Optimizing a GPE Emulation Function, as part of How to Profile and Optimize a Display Driver, and collect a new set of performance profiling data, examine the data to determine whether the optimization improved performance.

With the optimized code in place, the profiling results should resemble the following output.

Note:
For clarity, the tick count, process IDs, and thread IDs have been removed from the following output. Also, the specific timer values vary from run to run because profiling is started and stopped manually.
Kernel Profiler: Gathering MonteCarlo data in buffered mode
ProfileStart() : Allocated 13946 kB for Profiler Buffer (0x48000000)
Starting profile timer at 200 uS rate
ProfApp:  Took 3666 ms to perform blts.
Kernel Profiler: Looking up symbols for 42478 hits.
.
.
(Additional lines omitted for clarity.)
.
.
.
Total samples recorded = 42478
Module        Hits        Percent
------------  ----------  -------
nk.exe             23679     55.7
ddi_flat.dll       18037     42.4
gwes.dll             645      1.5
coredll.dll           87      0.2
ProfApp.exe           16      0.0
fsdmgr.dll             4      0.0
relfsd.dll             2      0.0
kbdmouse.dll           1      0.0
UNKNOWN                7      0.0

Hits       Percent Address  Module       Routine
---------- ------- -------- ------------:---------------------
     22683    53.3 802351c2 nk.exe      :_IDLE_STATE
     14065    33.1 03dbac20 ddi_flat.dll:?EmulatedBltSrcCopy1616Convert
      3340     7.8 03db2068 ddi_flat.dll:?CursorOn
       300     0.7 03db2268 ddi_flat.dll:?CursorOff
       182     0.4 0003d906 gwes.dll    :?dwRealizeColor
       176     0.4 8025a53b nk.exe      :_NE2000_READ_PORT_UCHAR
       171     0.4 8025a52f nk.exe      :_NE2000_WRITE_PORT_UCHAR
       164     0.3 80229a7f nk.exe      :_PerfCountSinceTick
        89     0.2 03dbb680 ddi_flat.dll:?EmulatedBltFill16
        56     0.1 8024855c nk.exe      :_ObjectCall
        55     0.1 80243470 nk.exe      :_ZeroPage

(Additional lines omitted for clarity.)

         1     0.0 03db8e90 ddi_flat.dll:?ScanLine
         1     0.0 03c4265f kbdmouse.dll:_KeybdDriverVKeyToUnicode
        23     0.0                      :<UNACCOUNTED FOR>

For more information, see Monte Carlo Profiling.

The following tables show the DispPerf data from the second run of ProfApp.exe. The data is a single set, broken into several smaller tables for readability.

The following table shows the overall summary of the number of times each raster operation (ROP) was called.

For information about converting the ROP codes reported by DispPerf to the ROP codes listed in Ternary Raster Operations, see Display Driver Performance Profiling.

RopCode     cTotal
----------  ------
0x0000CCCC    1063
0x0000F0F0    6337
0x00008888      59
0x00006666      52
0x0000AAF0    4628
0x0000EEEE       7
0xFEFEFFF1      49
0x0000E2E2       1
0x00005555     438

The following table shows the profiling results from all the ROPs performed by GPE functions.

RopCode     cGPE  dwGPETime (µs)  Avg.GPETime (µs)
----------  ----  --------------  ----------------
0x0000CCCC    47           67082              1427
0x0000F0F0   754           23428                31
0x00008888    59           51740               876
0x00006666    52           28191               542
0x0000AAF0  4056           40499                 9
0x0000EEEE     0               0                 0
0xFEFEFFF1    49          145383              2967
0x0000E2E2     1             353               353
0x00005555     0               0                 0

The following table shows the profiling results from all the ROPs performed by emulation functions. For more information, see BitBlT Emulation Library Functions.

RopCode     cEmul  dwEmulTime (µs)  Avg.EmulTime (µs)
----------  -----  ---------------  -----------------
0x0000CCCC   1016          4412339               4342
0x0000F0F0   5583           435198                 77
0x00008888      0                0                  0
0x00006666      0                0                  0
0x0000AAF0    572            36957                 64
0x0000EEEE      7            23902               3414
0xFEFEFFF1      0                0                  0
0x0000E2E2      0                0                  0
0x00005555    438            48659                111

The DispPerf results do not show profiling results for hardware calls because the settings in How to Profile and Optimize a Display Driver are based on the FLAT driver, which is a general purpose driver that does not make use of advanced hardware capabilities.
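
Because no ROP was handled by hardware, each ROP's cTotal should equal cGPE plus cEmul, and each reported average should match its total time divided by its call count. The following Python sketch cross-checks the three tables. The values are copied from the tables; the helper names are hypothetical, and the truncating (integer) average is an assumption inferred from the numbers, not documented DispPerf behavior.

```python
# DispPerf rows copied from the three tables in this topic:
# rop -> (cTotal, cGPE, dwGPETime, Avg.GPETime, cEmul, dwEmulTime, Avg.EmulTime)
ROWS = {
    0x0000CCCC: (1063, 47, 67082, 1427, 1016, 4412339, 4342),
    0x0000F0F0: (6337, 754, 23428, 31, 5583, 435198, 77),
    0x00008888: (59, 59, 51740, 876, 0, 0, 0),
    0x00006666: (52, 52, 28191, 542, 0, 0, 0),
    0x0000AAF0: (4628, 4056, 40499, 9, 572, 36957, 64),
    0x0000EEEE: (7, 0, 0, 0, 7, 23902, 3414),
    0xFEFEFFF1: (49, 49, 145383, 2967, 0, 0, 0),
    0x0000E2E2: (1, 1, 353, 353, 0, 0, 0),
    0x00005555: (438, 0, 0, 0, 438, 48659, 111),
}


def truncated_average(total_time, calls):
    """Integer average per call; 0 when the code path was never taken.

    Assumption: averages are truncated, which is what the table values suggest.
    """
    return total_time // calls if calls else 0


def mismatched_rops(rows):
    """Return the ROP codes whose counts or averages do not reconcile."""
    bad = []
    for rop, (c_total, c_gpe, t_gpe, a_gpe, c_emul, t_emul, a_emul) in rows.items():
        consistent = (
            c_total == c_gpe + c_emul
            and a_gpe == truncated_average(t_gpe, c_gpe)
            and a_emul == truncated_average(t_emul, c_emul)
        )
        if not consistent:
            bad.append(rop)
    return bad


print("Mismatched ROPs:", [f"0x{rop:08X}" for rop in mismatched_rops(ROWS)])
```

An empty list means the three tables agree with one another. A non-empty list on a hardware-accelerated driver would most likely point to calls that were counted in a hardware column not shown here.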

The Monte Carlo results for the optimized driver show that the function MaskedSrcToMaskedDst is no longer a factor in the driver's performance because it is never called. The tasks that were handled by this function are now handled by the new emulated GPE function EmulatedBltSrcCopy1616Convert instead.

When reviewing Monte Carlo profiling results, remember that the results only show the proportion of time, not the absolute amount of time, spent in that function.

The second set of profiling results shows that about 91 percent of the non-idle profiling time was spent in the driver (18,037 of the 19,795 samples that were not in _IDLE_STATE landed in ddi_flat.dll), whereas the original run spent about 98 percent of its non-idle time in the driver.

Do not interpret this drop as an equivalent increase in performance, because there is no way to know the absolute times behind these percentages when analyzing Monte Carlo results.
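
The driver's share of non-idle samples can be recomputed from the module table in the profiler output. The following Python sketch does so; the hit counts are copied from that output, and the function name is hypothetical. The result is a proportion of samples only and says nothing about absolute time.

```python
# Hit counts per module, copied from the Monte Carlo output in this topic.
MODULE_HITS = {
    "nk.exe": 23679,
    "ddi_flat.dll": 18037,
    "gwes.dll": 645,
    "coredll.dll": 87,
    "ProfApp.exe": 16,
    "fsdmgr.dll": 4,
    "relfsd.dll": 2,
    "kbdmouse.dll": 1,
    "UNKNOWN": 7,
}

# Samples attributed to nk.exe:_IDLE_STATE in the per-routine listing.
IDLE_HITS = 22683


def non_idle_share(module_hits, idle_hits, module):
    """Fraction of non-idle samples that landed in the given module."""
    total = sum(module_hits.values())
    return module_hits[module] / (total - idle_hits)


share = non_idle_share(MODULE_HITS, IDLE_HITS, "ddi_flat.dll")
print(f"ddi_flat.dll share of non-idle samples: {share:.1%}")
```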

The results from DispPerf.exe do show absolute times spent in the various ROPs. Whereas the original run spent a combined total of nearly 42 seconds for all of the ROPs, the optimized run spent a combined total of just over 5 seconds.

The most dramatic difference is in ROP 0x0000CCCC (SRCCOPY), which was the focus of the optimization efforts.

Although the time spent for this ROP in the emulation libraries rose significantly, as expected, from an original total of 0.09 seconds to 4.4 seconds, this was more than offset by the reduction in time spent in GPE calls, which decreased from 40.9 seconds to about 0.07 seconds.
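
The combined total for the optimized run can be reproduced by summing the dwGPETime and dwEmulTime columns. The following Python sketch assumes the DispPerf times are in microseconds, which is consistent with the 4.4-second figure quoted for the 4412339 entry; the variable and function names are hypothetical.

```python
# Time columns copied from the DispPerf tables in this topic.
# Assumption: the values are microseconds.
GPE_TIME_US = {
    0x0000CCCC: 67082,
    0x0000F0F0: 23428,
    0x00008888: 51740,
    0x00006666: 28191,
    0x0000AAF0: 40499,
    0x0000EEEE: 0,
    0xFEFEFFF1: 145383,
    0x0000E2E2: 353,
    0x00005555: 0,
}
EMUL_TIME_US = {
    0x0000CCCC: 4412339,
    0x0000F0F0: 435198,
    0x00008888: 0,
    0x00006666: 0,
    0x0000AAF0: 36957,
    0x0000EEEE: 23902,
    0xFEFEFFF1: 0,
    0x0000E2E2: 0,
    0x00005555: 48659,
}


def total_seconds(*tables):
    """Sum every entry of the given time tables and convert to seconds."""
    total_us = sum(sum(table.values()) for table in tables)
    return total_us / 1_000_000


print(f"GPE total:       {total_seconds(GPE_TIME_US):.2f} s")
print(f"Emulation total: {total_seconds(EMUL_TIME_US):.2f} s")
print(f"Combined total:  {total_seconds(GPE_TIME_US, EMUL_TIME_US):.2f} s")
```

Most of the remaining time sits in the emulation path for 0x0000CCCC, so that entry is the natural place to look for any further gains.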

ProfApp.exe is written to report its execution time in the debug output. This allows for an independent check against the results from the profiling tools.

The first run of ProfApp.exe spent 34,297 milliseconds (ms) blitting to the screen.

After optimizing the color format conversion in the driver, the second run of ProfApp.exe spent 3,666 ms blitting to the screen.

This is a reduction of roughly 89 percent in total blit time, or a speedup of about 9.4 times, after optimization.
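
The comparison of the two ProfApp.exe timings can be worked out as follows. This Python sketch uses the two values reported above; the function name is hypothetical. It reports both the fractional reduction in elapsed time and the speedup factor, because the two are easy to confuse.

```python
def improvement(before_ms, after_ms):
    """Return (fractional time reduction, speedup factor) for two timings."""
    reduction = (before_ms - after_ms) / before_ms
    speedup = before_ms / after_ms
    return reduction, speedup


# Blit times reported by ProfApp.exe in the debug output.
FIRST_RUN_MS = 34297
SECOND_RUN_MS = 3666

reduction, speedup = improvement(FIRST_RUN_MS, SECOND_RUN_MS)
print(f"Time reduction: {reduction:.1%}")
print(f"Speedup:        {speedup:.2f}x")
```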

© 2015 Microsoft