Log Analysis and Tuning with Microsoft Speech Server 2004
Microsoft Corporation
January 2005
Applies to:
Microsoft Speech Server 2004
Summary: This white paper discusses log analysis and tuning of speech applications deployed on Microsoft Speech Server (MSS) 2004. It covers the processes from design through deployment and beyond. It discusses what data to log and how to analyze it using the Microsoft Speech Application Log Analysis Tools, including the Call Viewer and the Speech Application Reports. It also includes sample usage scenarios, best practices for tuning, and guidelines on log and database management.
Note: Speech Server administration, monitoring, and operations are not discussed in this paper. For more information on these topics, please see the Speech Server documentation.
Contents
Overview: Log Analysis and Tuning
Selecting Events to Log
Analyzing Speech Server Logs
Common Log Analysis Scenarios
Best Practices for Speech Application Tuning
Best Practices for Database and Log File Management
Conclusion
Overview: Log Analysis and Tuning
This section provides an overview of analysis and tuning scenarios and summarizes the Microsoft Speech Server (MSS) 2004 logging infrastructure and the Microsoft Speech Application Log Analysis Tools.
Log Analysis and Tuning Scenarios
Log analysis and tuning are a major part of the speech application development cycle. The most successful speech applications in deployment are those that have been tuned on the basis of data collected from real callers to the system. Using real data, speech developers can significantly improve the user experience and overall performance of their application.
For example, during the design phase developers can try to predict the full range of input that users are likely to speak in response to a given prompt. After collecting data from a user trial, however, developers can examine what users really said to the system and update the grammars accordingly.
Predeployment Trials
Many developers roll out their applications in gradual phases to ensure the highest possible quality when it is time for full deployment. After design, debugging, and initial installation are complete, a phased deployment model such as the following is typically used:
- Initial pilot. The phone number of the system is given to colleagues, friends, and family with instructions to try out some tasks.
- Trial phase. A broader set of trial users is selected (either a subset of the expected caller set, or a more extensive range of acquaintances), and more data is gathered.
- Further trials. One or more trials may be carried out with a broader or different set of users.
- Full rollout. This is followed by continual monitoring.
After each phase different parts of the application can be tuned, based on the user behavior observed, and/or on explicit user feedback about the system. When performing application tuning, care should be taken to distinguish between the actions of users who are trying the system out of curiosity (also known as "tire-kickers"), and the actions of users who are behaving as real end users will. The behaviors of the tire-kickers are usually not representative of the behaviors of real users. Tuning should always be performed on samples that are representative of the general caller population.
What Should Be Analyzed and Tuned?
System log analysis offers the developer firsthand experience of users' interactions with the system. All user-facing components of the application can be tuned to improve the system. Not only can big problems be uncovered—such as bugs that did not surface during the developer's own testing with the Microsoft Speech Application Software Development Kit (SASDK) Telephony Application Simulator (TASim)—but considerable improvements can be made to the user experience:
- Prompts can be made clearer.
- Grammars can be tuned to cover the most likely input.
- Confidence thresholds can be optimized to minimize unnecessary confirmations.
- Dialogue flow can be tuned toward more efficient task completion.
- Commands can be enabled where users naturally expect them (and removed where they do not).
- Many other parts of the speech application can be analyzed and tuned to improve the user experience.
The tuning process is a crucial phase of the application life cycle that can produce considerable gains in usability and overall system performance.
Introduction to Microsoft Speech Application Log Analysis Tools
There are two main components of the Microsoft Speech Application Log Analysis Tools:
- Call Viewer for analysis of event flows within calls.
- Speech Application Reports for statistical analysis of key data.
The Call Viewer is designed to filter one or more calls from the logs, and to present views of those calls for analysis. This allows the developer to analyze important event flows, including those reflecting user experiences, platform latencies, and speech recognition (SR) performance. The primary reason to use this tool is to find and diagnose problem calls. Flexible querying and viewing mechanisms are provided to discover problems that manifest in a number of different ways in the event logs. Typical analytical tasks may include the following:
- Selecting a set of calls based on any of the following parameters: time period, number dialed, caller's number, presence of onnoreco events, high or low number of QAs, and so forth.
- Within each call, evaluating the event flow at any level: dialogue turn by dialogue turn, telephony event by telephony event, SALT object by SALT object, and so forth.
The Speech Application Reports enable the analysis of statistical data across multiple calls. These can be used by developers as well as business users. A set of Default Reports covering common speech reporting scenarios is provided with the tools, and custom reports also can be built. The Default Reports are designed to enable snapshot views of the system at a number of general levels: call volume, user behavior, task usage, error volume, and more. Parameters can be set by date/time and phone number (CalledDevice/CallingDevice), and in some cases by task or QA. Custom reports are more likely to be specific to a particular application or server configuration, and for these, any kind of reporting query across the event database is supported.
The Call Viewer and the Speech Application Reports are discussed in more detail in the Analyzing Speech Server Logs section of this white paper. Lower-level data-extraction utilities supplied with the Microsoft Speech Application Log Analysis Tools are also discussed in the Analyzing Speech Server Logs section.
MSS Logging Infrastructure
Figure 1 illustrates the MSS logging framework and the main analysis tools infrastructure:

Figure 1. Logging and analysis tools framework
MSS logs events through the Enterprise Instrumentation Framework; the output of this logging is a set of trace logs in the Windows Event Trace Log (.etl) file format.
Data from the .etl files is imported into Microsoft SQL Server databases using the Data Transformation Services (DTS) import packages provided with the Speech Application Log Analysis Tools.
Once in the SQL Server databases, the data is ready for viewing in the analysis tools: the Call Viewer and the Speech Application Reports. The Call Viewer establishes a direct connection with the SQL Server database. The Speech Application Reports are built on Microsoft SQL Server 2000 Reporting Services, the reporting infrastructure for SQL Server 2000.
The following sections discuss how to analyze and tune speech applications using the Microsoft Speech Application Log Analysis Tools.
Selecting Events to Log
Many events are logged automatically by Speech Server and will appear in the log files if they are enabled by the logging configuration filters (see the Configuring Log Event Profiles section below). Other log events are raised by the application. To record these additional events in the log files, they must be enabled in the logging configuration filters and also explicitly scripted in the application.
The log event schema can be found in the Log Analysis Tools help file (LogAnalysis.chm) in the topic Event Logging Class Hierarchy. It describes all the events that may be logged by Speech Server. In general, only a subset of events is necessary for a given deployment. The most useful event types for common tuning operations are outlined below.
These are a few of the useful log events that are automatically raised by Speech Server or the Telephony Interface Manager (TIM) software:
- CallStatusEvent. This class of event enables the logging of important call information, such as CallStarted, CallEnded, and so on. These are vital for analyzing call volume and other statistics.
- QASummary. This event wraps the important information about a QA into a single event. It contains the prompt that was spoken, the input that was recognized, the grammars that were used, the semantic items that were updated, the History of how the QA ended, and more.
- AudioLogEvent, ListenElementFiredEvent. The combination of these events enables the logging of recognition audio and its association with a QA for audio playback in Call Viewer.
- UserPerceivedLatencyEvent. This event class enables logging of user-perceived latencies, and is useful for determining where the user experience suffers from latencies above a given threshold.
These are some of the useful log events that are raised by the application developer:
- TaskStart, TaskProgress, TaskComplete. These events are vital for mapping parts of the dialog into the services and transactions of the application. (See the Using Task Events section of this document for more details.)
- SALTLogMessage. This event is useful for logging messages that reflect individual application events. (A sketch of raising application events from client-side script follows this list.)
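For illustration, here is a minimal sketch of raising these application events from a client-side script block. The task event calls follow the Raise pattern used in the examples later in this document; the SALTLogMessage.Raise signature and the handler names are assumptions for illustration only (the exact client API is described in the SASDK documentation).
// Hypothetical handlers showing where application-raised log events fit.
function OnOrderStarted() {
    TaskStart.Raise("placeOrder");                            // entry to the task
    SALTLogMessage.Raise("order", "back-end session opened"); // assumed signature
}
function OnOrderFinished(succeeded) {
    // Log the completion state so the Tasks Report can compute success rates.
    TaskComplete.Raise("placeOrder", succeeded ? "success" : "failure");
}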
Configuring Log Event Profiles
The main scenarios that encompass most monitoring, tuning, and administration tasks are encapsulated in MSS log configuration filters. For example, the CallStatistics filter enables basic reporting scenarios, using the task and call status events along with an operational message set. The CallAnalysis filter adds latency events, QASummary, and SALTLogMessage. Other filters also are available.
Logging configurations are set on the server using the MSSLogConfig script. For more details, see the Configuring Event Logging topic in the Log Analysis help file.
Using Task Events
The task events TaskStart, TaskProgress, and TaskComplete are used for tracking users' progress through the application. Using these events, developers can map parts of the dialogue to the services of the application. This enables the server to log rich data that signals users' attempts to carry out individual transactions, as well as success or other completion states.
The task events are used directly by the Speech Application Reports to generate charts of the use of an application's services and the success rates of those services. Speech developers are strongly encouraged to make full use of the task events in their applications to reap the full benefit of application analysis and tuning at these levels.
Task events can be raised in any client-side script block. The TaskStart, TaskProgress, and TaskComplete events implement a task model that is very flexible—tasks can be nested or overlapped. Nested task modeling allows subtasks that make up larger tasks to be logged within the larger super-task. Overlapped tasks enable the completion of some tasks to be deferred while other tasks are carried out.
Examples
An example of a typical use of the task log events follows. Given an application that offers same-day movie ticket sales, task events can be used to signal the ticket transaction and/or any of its component subtasks.
TaskStart.Raise("movieTickets");
QA: System: Which movie would you like to see?
User: 2001 A Space Odyssey.
QA: System: And how many tickets?
User: Two.
QA: System: Did you say 2 tickets?
User: Yes.
TaskComplete.Raise("movieTickets", "success");
The two task events log, respectively, the entry to the task and its completion with a status of "success." With these events in the logs, the default Tasks Report can be used for a statistical breakdown of all users who attempted and completed a given task.
The following example shows the same task, but with an unsuccessful completion. The application logs a TaskProgress event for each recognition failure, followed by a completion with status "failure" after the third attempt.
TaskStart.Raise("movieTickets");
QA: System: Which movie would you like to see?
User: [noreco].
TaskProgress.Raise("movieTickets", "notUnderstood", 1);
QA: System: Sorry, I didn't catch that. Which movie would you like to see?
User: [silence].
TaskProgress.Raise("movieTickets", "notUnderstood", 2);
QA: System: Please say the name of a current movie, or 'help' for more instructions.
User: [noreco].
TaskProgress.Raise("movieTickets", "notUnderstood", 3);
QA: System: I'm sorry, passing you to an agent for assistance.
TaskComplete.Raise("movieTickets", "failure");
A more complex task model to get credit card details is illustrated in the following sample.
TaskStart.Raise("creditCard");
TaskStart.Raise("cardType");
QA: System: Now I need your credit card details. What type of card?
User: American Express.
TaskProgress.Raise("cardType", "gotValue", "AMEX");
TaskStart.Raise("cardNumber");
QA: System: What is the number?
User: 1111 222333 44444
TaskProgress.Raise("cardNumber", "gotValue", 111122233344444);
TaskStart.Raise("cardDate");
QA: System: And what is the expiration date?
User: January 2007
TaskProgress.Raise("cardDate", "gotValue", "0701");
QA: System: OK, I will charge your American Express card ending 4444 and expiring January 2007. Shall I go ahead?
User: Yes please.
TaskComplete.Raise("cardType", "success");
TaskComplete.Raise("cardNumber", "success");
TaskComplete.Raise("cardDate", "success");
TaskComplete.Raise("creditCard", "success");
In this sample, not only is the creditCard task composed of the three subtasks cardType, cardNumber, and cardDate, but these subtasks overlap in time because the confirmation is deferred until the end. Interim TaskProgress events are used in this case to log the status of the unconfirmed values.
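As a sketch of how such an overlapped model might be scripted, the Raise calls can be grouped into handlers so that each subtask starts as the previous one yields a value, while every completion waits for the single deferred confirmation. The handler names here are hypothetical:
function OnCardTypeValue(value) {
    TaskProgress.Raise("cardType", "gotValue", value); // value still unconfirmed
    TaskStart.Raise("cardNumber");                     // next subtask overlaps
}
function OnConfirmationAccepted() {
    // No subtask completes until the deferred confirmation succeeds.
    TaskComplete.Raise("cardType", "success");
    TaskComplete.Raise("cardNumber", "success");
    TaskComplete.Raise("cardDate", "success");
    TaskComplete.Raise("creditCard", "success");
}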
Analyzing Speech Server Logs
This section describes the process of Speech Server log analysis in more detail. It covers data import, discusses using the Call Viewer and Speech Application Reports to find relevant information, and outlines the lower-level log extraction utilities.
Importing Log File Data
As described above, Speech Server logs data in the form of binary ETL files. To view this data using the Speech Application Log Analysis Tools, it is necessary to import it into a SQL Server database. The import process can be run using a DTS import package, or executed using command-line tools. For more details on importing data, see the Importing Log Files topic in the Log Analysis help file.
Choosing a Log Analysis Tool
The Call Viewer is aimed at the application developer and/or system administrator. It is used for analyzing the event flows of individual calls. It allows rapid access to the data by filtering calls by high-level call criteria or by lower-level event properties. For each call selected, it displays step-by-step views of logged events at a level of interest determined by the user.
The Speech Application Reports are aimed at any user in the enterprise who is interested in the performance of the system over time, including business users, application developers, and system administrators. The Default Reports provide statistical analysis of the log data. They display graphs, charts, and tables of information about the behavior of the system, which portray its users, tasks, QAs, prompts, errors, and other statistics. For intelligence that goes beyond the generic information shown in the Default Reports, the SQL Server Reporting Services framework allows the creation of custom reports. In this way, enterprise or application-specific reports can be generated and deployed alongside or in place of the Default Reports.
Using the Call Viewer
Call Filtering
The Call Viewer is designed to help users find and examine problematic calls or other interesting event flows. The upper pane is a query designer used for selecting calls and narrowing down the sets of events viewed. The lower pane shows the details returned by the query.
Queries can be made on the summary characteristics of calls using the Where clause. Parameters for Where include any of the properties shown in the columns of the Call Summary tab in the lower pane: From, To, and Placed At, for example.
For example, the following query selects all calls answered between 11:30 a.m. and noon on February 20, 2004.
Where AnsweredAt >= 2/20/2004 11:30 And Where AnsweredAt < 2/20/2004 12:00
Queries also can be made to find calls containing events with certain properties. This is accomplished with the Containing and Not Containing clauses.
For example, the following query selects calls containing QASummary events that show NoReco in the History.
Containing QASummary with History like NoReco
To run the query, press F5 at any time. The lower pane shows all of the calls returned by the query, with some basic details about those calls. If certain values are unavailable or cannot be calculated based on the events in the log, they are shown as <NULL>.
Call Details Tab
The Call Details tab lists the events associated with a call. The sets of events displayed are configured through the Event Types and Advanced Event Filtering mechanisms described in the corresponding sections of this document.
The main data associated with the events are also shown in the at-a-glance view of the Call Details tab in the left pane. For example, a TaskStart event shows the name of the task and the time it was started. A QASummary event lists the text of the prompt and the recognized user input for that QA. Here is a sample view of TaskStart and QASummary in the Call Details tab:
TaskStart     AskContactName    2/20/2004 11:18:37.018
QASummary     Who would you like to contact?    david hamilton
When a particular event is selected, all of its properties are shown in the properties pane on the right. In general, properties that are specific to individual events can be found below the common properties in this pane. For example, QASummary includes information such as prompt text, recognized SML, Commands spoken, and History. This can be found by scrolling down in this properties pane. Selecting an individual property displays its value in the text box at the foot of this pane.
Some QASummary and OnListenFired events (OnListenComplete, for example) hold audio that can be played back by clicking the Play Audio button that appears when those events are selected.
Figure 2 shows a sample view of the Call Details tab:
Figure 2. The Call Details tab
Event Types Tab
The Event Types tab can be used to filter the set of events that are displayed in the Call Details tab. Typically, users will choose individual events or sets of related events according to the analytical task at hand.
The events are categorized into broad profiles, such as Basic Dialog Events, which contains the highest-level dialog events, and Basic Task Events, which contains the task events and SALTLogMessage events described earlier in this document.
Certain profiles used in the configuration of MSS logging are provided here as well, such as Call Statistics, Call Analysis, and Detailed Call Analysis. There is also a category containing All Events. Individual events can be selected or deselected within these categories, so custom profiles can be created for fine-grained analytical needs.
Advanced Event Filtering
The Advanced Event Filtering tab can be used for even more detailed control over the events displayed in the Call Details tab. The Advanced Event Filtering tab allows the user to specify certain event properties as conditions. For example, as shown in Figure 3, a filter can be created to show only task events where the CompletionState holds Fail within its string property:
Figure 3. The Advanced Event Filtering tab
The events in the Advanced Event Filtering tab are organized according to the hierarchy in the event log schema. For more details on the hierarchy, including complete definitions of the events and their properties, please see the event log schema documentation in the Log Analysis Tools help file.
Queries defined in the Advanced Event Filtering tab can be used not only to filter the event types displayed but also to filter the set of calls that are displayed. To apply such a condition to call filtering, simply check the "Hide all calls..." checkbox on the Call Filtering pane. This will display only calls that contain events relevant to any specified advanced filters.
Sample Queries
A number of sample queries are supplied with the Call Viewer. These queries provide starting points for common tasks of speech application diagnosis. These can be built on to create customized queries for specific applications:
- AllCallsInMay2004.cvq. This query demonstrates time-based call filtering to narrow the range of interest. In this case, calls answered during May 2004 are selected (uses the Answered At property of the call).
- CallsContainingErrors.cvq. This query can be used to find important error events. It selects calls containing error events (uses fatal error event types).
- CallsContainingQANoReco.cvq. This query can be used to find dialog turns where the speech recognition engine did not produce a result (a clear indicator of misrecognition). It selects calls that contain a NoReco event for a dialog turn (uses the History property of the QASummary event).
- CallsContainingQASilence.cvq. This query can be used to find places where the user was expected to speak, but the recognizer detected no speech (perhaps due to a confusing prompt). It selects calls that contain a user silence (uses the History property of the QASummary event).
- CallsContainingConfidenceBelow0.3.cvq. This query can be used to find low-confidence recognitions (a typical symptom of misrecognition). It selects calls that contain QAs with low confidence (uses the Confidence property of the QASummary event).
- CallsWhereUserAskedForHelp.cvq. This query can be used for finding places where users are not sure what to do. It selects calls in which the user gave a help command (uses the Command property of the QASummary event).
- CallsWhereUserHungUp.cvq. This query can be used to find calls that were ended by the user rather than by the system or by transfer. It selects calls where the call was ended by a user hang up (uses the Ended property of the call).
- ShowQANoRecoOnly.cvq. This query not only selects calls that contain QAs with NoReco, but also displays only those QAs in the Call Details tab (and not other QAs). It does this by using event filtering to specify a noreco value for the History property of the QASummary event, and by selecting "Hide all calls..." to apply the filter at the levels of both call filtering and event view filtering.
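For example, building on CallsContainingQANoReco.cvq and CallsContainingConfidenceBelow0.3.cvq, a customized variant might restrict the low-confidence search to calls made to a single number. The clause style below follows the query samples in this document; the exact operator spellings may differ in the query designer:
Where To like 5553929
Containing QASummary with Confidence < 0.3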
Using Default Reports
Here is an overview of the Default Reports included with the Microsoft Speech Application Log Analysis Tools, with an explanation of their primary purposes:
- Call Volume examines overall call volume for the reported period, including total calls, graphs by period, call duration averages, disconnection statistics, and transfer details.
- Call Aggregates finds recurring usage patterns by analyzing call volume across hours of the day, days of the week and month, and months of the year.
- Server Statistics provides a lower-level view of performance through analysis of call volume by machine, board, trunk, and channel; checks overall channel time in use and maximum simultaneous channels; and examines averages for answering time and user-perceived latency.
- Tasks applies task-level analysis of what users attempted to do (tasks started) and how successful they were (tasks completed), and shows averages for task duration, number of QAs used per task, and the number of times tasks were attempted per call. This report can be parameterized at the individual task level.
- Dialog Overview provides an overall picture of user dialogue behavior through statistics on QA usage per call, averages for commands spoken, and recognition confidences.
- Turn Analysis checks the average number of dialogue turns per call, and turn success rates through the QA count and ending History values; analyzes the performance of individual QAs in terms of ending value and mode of input; and checks the duration of Web page dialogs through average time spent on a page. This report can be parameterized at the individual QA level.
- Prompts shows prompt playback counts and the breakdown of ending status (barged in/completed) in total and by name, checks average barge-in latencies and user-perceived latencies, and shows which prompts used prerecorded or TTS output.
- Messages shows SALT log message graphs and totals, and examines message content breakdown by name.
- Errors checks for errors and missing data: calls and tasks with missing start or end times, and SALT and platform error events by type.
The default report views are configurable by date and time, called device (DNIS), and calling device (ANI). Many reports also can be filtered by individual task, and the Turn Analysis Report can be filtered by individual QA.
By default, the date/time is set to the week prior to the day on which the user of the reports views them. As shown in Figure 4, to generate reports for different time periods, the user selects a different date range in the report header:

Figure 4. The date and time are set by default but can be changed in these fields.
A number of preset, relative periods are available as links on each report: for example, last 7 days, last 30 days, and last 90 days.
To select only those calls made to a particular CalledDevice ID, the user enters the ID in the relevant field. For example, Figure 5 shows a configuration that reports on all calls to telephone number 555-3929 that were made on April 20, 2004:

Figure 5. Choose the Called Device to focus further.
The reporting framework offers a number of administrative benefits, including scheduled operations, Web services management, numerous delivery options, and the ability to export to a variety of formats, including Web archive, PDF, Excel, HTML with Office Web Components, and XML.
Building Custom Reports
Some enterprises may want to extend (or replace) the Default Reports with reports targeted to specific business purposes: for example, integrating the speech data with the enterprise's business intelligence data to enable speech reports broken down by class of customer or product, or parameterizing a Server Statistics Report by a particular machine or channel to show the deployment characteristics of a particular installation.
For scenarios such as these, new reports can be generated using Report Definition Language (RDL), the report description format used by SQL Server Reporting Services. RDL can be written in standard XML or text editors, or it can be generated from the Visual Studio .NET design tools and wizards that come with SQL Server Reporting Services.
For more information about building custom reports, see the Creating Custom Reports topic in the Log Analysis Tools documentation.
Log Extraction Utilities
For advanced or occasional users of the ETL log files, a number of lower-level data extraction utilities are provided. These read data directly out of a single binary ETL file and do not import to the databases used by the Call Viewer and Speech Application Reports.
They include the following utilities:
- MSSLogToText serializes the events in the ETL file into a text document.
- MSSContentExtract extracts the binary audio and compiled grammar data into individual files.
- MSSUsageReport provides summary statistics at several levels.
For more details on these utilities, see the Log File Extraction Utilities topic in the Log Analysis Tools help file.
Common Log Analysis Scenarios
This section describes two common speech application analysis scenarios using the Microsoft Speech Application Log Analysis Tools: how to find and fix speech recognition problems, and how to perform business analysis.
Finding Speech Recognition Problems
A common problem developers discover in the early phases of deployment is that users are not understood at one or more points in the dialogue. This can result from one or more factors. This scenario walks through discovering the problem, identifying the faulty component, and applying a fix.
Discovering the Problem
The Call Viewer can be used to find QAs where the speech recognition engine did not recognize the user's speech. Misrecognitions can surface in a number of ways. The most common are as follows:
- QASummary events with History property of NoReco.
- QASummary events with History property of Silence.
- QASummary events with Confidence property lower than the confirmation threshold used by the application.
These criteria can easily be encoded as call filtering queries, and the display can be configured to show only these events (if desired). Some of these queries are already available among the sample CVQ files supplied with the Call Viewer.
These general queries may be supplemented by further information from direct caller feedback if it is available. For example, a caller who has provided explicit feedback might state any of the following extra details, allowing the log analyst to apply further filters to narrow down the instance of the misrecognition:
- Called from/to a particular number (use the To/From property on the call).
- Called at a particular date/time (use the Answered At property on the call).
- The system appeared not to hear any speech (look for QASummary events with History like Silence).
- Any other data that enables a more specific Call Viewer query.
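For example, if a caller to 555-3929 reported that the system seemed not to hear anything during a call answered on the morning of February 20, 2004, these clues might be combined into a single query (again following the clause style of the samples in this document; exact operator spellings may differ):
Where To like 5553929 And Where AnsweredAt >= 2/20/2004 8:00 And Where AnsweredAt < 2/20/2004 12:00
Containing QASummary with History like Silence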
Using the Speech Application Reports to Detect Speech Recognition Problems
If only a small amount of data is collected, it is feasible to directly examine in Call Viewer all calls that display any of these symptoms. However, if there is too much data for convenient analysis of individual calls, the Speech Application Reports can be used to identify by name the worst-performing QAs and tasks. These can then be treated as priorities for fixing. The Dialog Overview and the Turn Analysis Reports are useful for this. This information can then be used in the Call Viewer to target the search queries to examples of the problematic QAs and/or tasks.
Identifying and Fixing the Problem Component
Having identified the calls containing the problematic events, the developer can then look at the prompts, listen to the audio, and determine exactly where the problem lies. The following is a list of the most common sources of misrecognition problems.
Audio
The audio of the caller's input can be played back using the Play Audio button available when viewing QASummary and certain other events in the Call Details tab. If the recorded speech is faint or inaudible to the human listener, there could be a problem with the MSS installation, or with the caller's phone or environment. The developer should listen to other audio examples to see whether the problem is replicated across a wider variety of callers. If all calls display the same symptoms, the developer can check the MSS installation, in particular the TIM configuration (see Microsoft Speech Server Help (the MSS.chm file) for details) and any manual audio connections. If only a few calls are affected, the problem is likely specific to the audio of those individual calls.
In general, problems with the caller's audio cannot be fixed by application tuning. If there is a lot of background noise, the issue is likely to lie with the user's environment, for example a car or other noisy place. There may also be problems with the caller's telephone; mobile phones and speaker phones can often degrade voice quality. If the system prompt can be heard in the background of the faulty QA (and if this is clearly the cause of the misrecognition), the developer should also consider disabling barge-in, as this will mitigate misrecognition due to prompt echo.
Grammars
If the user's speech is clearly audible, but the speech recognition engine did not recognize it correctly, the developer should make a note of what the user really said and see if the spoken phrase is covered by the grammar. The grammars used by the QA in question can be seen as a property of the QASummary event. Coverage can be checked manually, or by using the Grammar Editor.
Out-of-Grammar Phrases
If the caller's words are "out of grammar" but they provide a valid response to the prompt in question, then the grammar can be updated with the response. In most cases this is likely to be an instant improvement. Broader-coverage grammars generally work better than narrower-coverage grammars. Further, the more likely the new response is, the greater the improvement to the system. However, phonetic confusion issues may sometimes be introduced when similar-sounding phrases are present in the same grammar, so caution is in order when adding new vocabulary to a grammar. (For more information about this, see Other Problems.)
If a phrase is not a valid response at some point in the dialogue, the developer should think about what the user is trying to do. Is it a command that is available in other parts of the application? If so, maybe the user is trying a command that he or she is familiar with, and it may be worth considering enabling such a command to meet the expectation of consistency. If not, the developer should check the prompt; if the phrase appears to be quite random, it is probably safe to ignore.
Other Problems
If a phrase is clearly audible and "in grammar," yet the QASummary still registers a NoReco or low-confidence recognition, the developer should check the following:
- Pronunciation. The developer should check to see if a word may not be in the recognizer's default dictionary (for example, the name of a person, place, or company). If a word is not in the dictionary, the recognizer may be using its own pronunciation that does not match the pronunciation of some callers. To fix this, the developer can add a custom pronunciation to the grammar item in question. (See the Grammar Editor documentation in SASDK Help for details.)
- Phonetic confusability. The developer should check to see if there is a similar-sounding word or phrase within any of the grammars used at the problem point in the dialogue ("replay" and "reply" in a unified messaging application, for example). If this is the case, there may be increased potential for misrecognitions and lower confidence for the similar-sounding words. If possible, the developer should remove one of the phrases from the grammar, or replace/extend the phrase with one that has a similar meaning (such as "play message again" instead of "replay"). Any prompts that encourage the use of that word should be amended to remove the phrase or encourage the new one (for example, "You can also play the message again").
- Confidence thresholds. If on several occasions the recognizer is recognizing words correctly but rejecting them or attempting to confirm them unnecessarily, then the confidence threshold may be set too high. The developer should try lowering the threshold to cover most of these examples, but still reject true misrecognitions. Conversely, if a large number of misrecognitions are being accepted by the application, then the threshold is probably set too low. If there is enough data, the developer can look at other examples of the same QA to determine an optimal manual setting. (A sketch of this accept/confirm/reject logic follows this list.)
- General problems. Sometimes users can be confused by prompts that do not give them a clear idea of how to respond, or they are otherwise unclear about what state the dialog is in. It is usually a good idea to use the prompt to set clear expectations about what users can say (for example, "Please give me your phone number, beginning with the area code," or "Please say a city name, or 'help' for more instructions"). Clear prompts that encourage constrained user responses are essential to successful speech applications.
- Dialogue problems. In some cases, users have expectations about the dialogue and what is possible to do at any point. If these expectations are not met by the grammars and the possible dialogue flow, then misrecognitions will occur and/or users may be taken down paths of the dialogue that they do not wish to follow. If the logs show a large number of tasks that do not end in successful completion, or show QAs with high numbers of Help or Cancel commands, then confusing dialogue flow may be the problem.
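The following sketch illustrates the accept/confirm/reject decision that confidence thresholds control, as discussed in the Confidence thresholds item above. The function and constant names are illustrative only; in a real application the thresholds are set as properties of the dialogue controls rather than hand-coded:
var REJECT_THRESHOLD = 0.2;  // below this, treat the result as a NoReco
var CONFIRM_THRESHOLD = 0.7; // between the thresholds, confirm before accepting
function ClassifyRecognition(confidence) {
    if (confidence < REJECT_THRESHOLD) return "reject";   // too risky to use
    if (confidence < CONFIRM_THRESHOLD) return "confirm"; // ask "Did you say...?"
    return "accept";                                      // use the value directly
}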
Business Analysis
The Speech Application Reports are designed to address many common analytical scenarios of viewers with a business interest in the speech application. The following is a list of questions that an enterprise analyst may likely ask, grouped by the themes of the reports holding the corresponding information:
High-Level System Usage
- How many calls is the system getting?
- How many people are being transferred?
- How long does an average call take?
(See the Call Volume Report.)
Usage Breakdown by Period
- What are the busiest hours of the day?
- What are the busiest days of the week?
- What are the busiest times by weekday?
- What are the busiest months of the year?
(See the Call Aggregates Report.)
Service Usage and Success Rates
- Which services are callers attempting?
- How successful are callers at completing the services?
- How long are callers taking to accomplish the tasks?
- How many times are users trying to repeat a task on the same call?
(See the Tasks Report. Remember that applications need to instrument tasks for this report.)
Server Performance
- What is the overall server usage?
- Which channels are getting the activity?
- What is the average time before the server answers the phone?
- What is the average latency experienced by the user between dialog turns?
(See the Server Statistics Report.)
Enterprise users can generate any business reporting based on speech data, potentially integrated with other business data, by building custom reports.
Best Practices for Speech Application Tuning
When conducted on a regular basis, log analysis can provide significant insights into caller behavior, and tuning on the basis of such analysis can provide considerable enhancement of the user experience. Here are some guidelines on getting the most out of analysis and tuning:
- Phase the rollout to end users. Many big problems with the user experience are discovered during the initial phases of a deployment. It is important to minimize the number of real users affected by these problems. See the Predeployment Trials section of this document for an example of a rollout plan. This methodology also should be followed as an application evolves to take on new services, or if big changes are made to the dialogue flow or other major parts of the user interface.
- Ensure the quality and quantity of the data collected. The more data that is gathered, the more likely it is that big problems will surface and that the data on which the analysis is conducted is representative of the end-user population. It also is important to distinguish between motivated end users (those who attempt to carry out real tasks) and curious, or test, users (those who have less interest in carrying out real tasks) when applying updates. (It is more useful to give test users a range of specific tasks to carry out than to have them test blindly.)
- Prioritize the components to analyze and tune. In general, solving higher-level problems achieves much greater gains than tweaking lower-level components. High-level problems include confusing prompts, poor-coverage grammars, flawed dialogue flow, and in some cases, incomplete pronunciation lexicons and inappropriate confidence thresholds. Lower-level issues include the speech recognition engine itself, certain configuration parameters, and tweaking the prompt database.
- Be sure that each update implements an improvement to the system. In some cases, changes made on the basis of data collection can make the system worse. For example, as noted earlier, broadening the coverage of a grammar to accept different ways of saying a particular command or item may increase confusion with other phrases in the grammar, and may thereby introduce new recognition errors where there were none before. If running offline tests on previous data is impossible, the developer should consider applying tuning updates only when the same input occurs from multiple users (rather than from a single occurrence or user), and/or running small trials with the update in place to check that performance is no worse.
- Remember that less is sometimes more. Problems encountered by a single user or during a single call may be unrepresentative of the typical user experience. In general, a problem should be observed by several callers before a change is made, since fixes based on one caller may make the experience worse for typical users. For example, enabling a rich set of actions that users can take at every point of the dialogue but which are only rarely (or never) used by callers can increase grammar confusion and reduce overall task completion rates. Such rarely used options should be eliminated.
- Consider using transcriptions of utterances. When large amounts of data are collected, it may be more convenient and efficient for the application developer investigating misrecognitions to use written transcriptions of what the caller(s) said rather than the actual audio recordings of their utterances.
- Monitor logs regularly. As the deployment proceeds, continuous monitoring of the logs is necessary to check that all the components of the application (prompts, grammars, and confidence thresholds, for example) are performing well. Any changes to the application—for example, the addition or removal of services, or a change in dialog flow—should be accompanied by critical examination of the log data.
Best Practices for Database and Log File Management
This section offers guidelines on the maintenance of MSS log files and the log analysis databases.
To avoid performance issues with large databases and heavy log files, it is advisable to follow certain principles of MSS log data management. These may be summarized as follows:
- Log only data that is necessary for a given phase of deployment. In other words, configure the server to log only the events that are necessary for a given analysis scenario.
- Manage the log files on a different machine from the one on which MSS is installed.
- Use regular scheduling, both to move log files from MSS machines onto the maintenance machine during the server's off hours, and to import the data from the maintenance machine into the SQL Server databases.
- Keep Call Viewer data only as long as debugging and tuning are necessary. Speech Application Reports data is typically required for longer periods.
- Purge logs from the SQL Server database when necessary. For instructions on deleting data, see the topic Speech Application Log Analysis Tools Database Maintenance in the Log Analysis Tools help file.
Conclusion
Logging, reporting, and analysis are fundamental to the speech application developer or business manager wanting to evaluate, improve, and maintain a successful speech application deployment. With the extensible log analysis framework, tools, and reports provided by the Speech Application Log Analysis Tools, Microsoft Speech Server 2004 gives developers and business managers the ability to perform sophisticated analysis and troubleshooting of their application's performance and operations. And by virtue of its integration with Microsoft SQL Server Reporting Services, customized reports for in-depth business or development analyses are also possible. Using the tools, reports, and best practices discussed herein, speech application developers and managers can effectively and efficiently tune their applications to increase recognition accuracy, produce higher call completion rates, and improve the user experience.