Speak Up: Support Dictation With Text Services Framework

10/01/2019

Eric Brown

This article discusses:

Windows Speech Recognition and text services
Making apps TSF-aware
Text stores and document locks
Creating a TSF-aware control

This article uses the following technologies:
Windows Vista

Code download available at: Dictation2007_07.exe (1560 KB)

Contents

Making Apps TSF-Aware
Text Stores and Document Locks
Connecting to TSF
Implementation Details
Interaction between Scintilla and TSF
Advanced Topics
Conclusion

One of my favorite new features in Windows Vista™ is Windows® Speech Recognition, which allows you to operate your computer using only your voice, including dictating text into e-mail messages or other documents. Windows Speech Recognition uses the Text Services Framework (TSF) to insert, select, and correct dictated text. TSF is a scalable framework for the delivery of advanced text input technologies. It provides a standardized method for text services (such as voice recognition, handwriting recognition, spell checkers, and Japanese Input Method Editors) to communicate with applications and text controls. In particular, TSF allows bidirectional communication between applications and text services. This means that text services can read from and write to an application’s document, and an application can ask a text service to perform actions such as correcting text.

Windows Vista provides TSF support for Win32® standard edit controls, RichEdit controls, HTML editor controls, Windows Forms TextBox and RichTextBox controls, and Windows Presentation Foundation TextBox and RichTextBox controls. The edit controls inside Win32 and Windows Forms ComboBoxes, ListBoxes, ListView controls, and TreeView controls are also natively supported. Many Microsoft applications, such as Microsoft® Word and Publisher, also support TSF.

If an application or text control does not fully support TSF, Windows Speech Recognition can still support dictation via its Dictation Everywhere functionality. Dictation Everywhere does not require any assistance from your application or controls, but it can only insert text; it cannot select or correct text after the text has been inserted. To avoid inserting incorrect text, the user must verify that every single dictated utterance is correct.

TSF is responsible for managing keyboard layouts, displaying the language bar (used to switch keyboard layouts), managing and switching the active Input Method Editor, and most importantly for this discussion, providing an abstraction layer that allows text services continuous access to the application’s document text.

In order for text services to read the application’s document text, however, the application or control must implement a COM interface. Figure 1 shows the relationship between TSF, text services, and your application or control.

Figure 1 TSF and Text Services


TSF consists of over 100 interfaces. Luckily, you really only need to use 4 of these—ITfThreadMgr, ITfDocumentMgr, ITfContext, and ITextStoreACPSink—to support dictation, and the only interface your control must implement is ITextStoreACP. Figure 2 provides more detail on these interfaces.

Figure 2 Dictation-Related TSF Interfaces

Interface Name	Who Implements It?	What Does It Do?
ITfThreadMgr	Text Services Framework	Also known as the thread manager, this is the primary TSF interface. It must be created by all TSF-aware applications or controls. It manages focus and creates ITfDocumentMgr objects.
ITfDocumentMgr	Text Services Framework	Also known as the document manager, this interface manages a stack of one or (sometimes) two ITfContext objects.
ITfContext	Text Services Framework	This interface contains an instance of ITextStoreACP and manages the interactions between text services and text stores.
ITextStoreACPSink	Text Services Framework	Also known as the text store event handler, this interface handles change notifications from your application.
ITextStoreACP	Developer	Also known as the text store, this interface gets document text and properties, and sets document text.

Making Apps TSF-Aware

You make your application TSF-aware either by using only TSF-aware text controls or by integrating TSF support directly into it. In most cases, you would only add direct TSF support if your application is directly responsible for displaying and editing text (such as Microsoft Publisher or Microsoft Word). If your app uses a control to display and edit the text, then you can implement TSF support in the control (if that control doesn’t already support it).

If your application uses only unmodified (or subclassed) plain edit controls, RichEdit controls, or HTML editor controls, or if it is based on Windows Presentation Foundation, your application is already TSF-aware and dictation will work without modification to your application. If your application uses other controls, it may be possible to make these controls TSF-aware with very little effort.

If the control superclasses a Rich Edit control and it uses Rich Edit 4.1 or higher, call the following with your control’s window handle after it’s created:

SendMessage(hwnd, EM_SETEDITSTYLE, SES_USECTF, SES_USECTF);

If the control superclasses a plain edit control, or it superclasses a Rich Edit control that isn’t based on Rich Edit 4.1 or higher, you can identify your control as being compatible by adding a DWORD value to the following registry entry:

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\CTF\KnownClasses\<RealClassName> = <flags>

<RealClassName> is the class name for your control, and <flags> can be either 1 for controls that are superclassed plain edit controls or 2 for controls that are superclassed Rich Edit controls.

If your application uses a text control that isn’t based on a Rich Edit or plain edit control, or if your application is responsible for rendering text itself, you’ll have to implement TSF support in your text control or application manually. This is a lot more work, but it can be implemented in a straightforward manner.

Text Stores and Document Locks

TSF assumes that a text store is a uniform stream of 16-bit Unicode characters, starting at offset 0. If your document doesn’t match this model, you have a couple of options. First, you can hide some characters in the stream. For example, if your document model has embedded formatting characters (HTML markup, for example), then you can tag those characters as being hidden, and TSF will hide those characters from all text services.
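As a rough illustration of hiding characters, the following sketch splits a document containing HTML-style markup into alternating visible and hidden runs. This is the kind of information ITextStoreACP::GetText reports through its run-information array (TS_RUNINFO entries of type TS_RT_PLAIN or TS_RT_HIDDEN). RunType, RunInfo, and BuildRuns are hypothetical stand-ins defined locally so the sketch compiles without the Windows SDK, and the sketch assumes markup is delimited only by '<' and '>'.

```cpp
#include <string>
#include <vector>

// Local stand-ins for TsRunType and TS_RUNINFO from textstor.h, so this
// sketch builds without the Windows SDK. Plain ~ TS_RT_PLAIN,
// Hidden ~ TS_RT_HIDDEN.
enum class RunType { Plain, Hidden };

struct RunInfo {
    unsigned long uCount;  // number of characters in the run
    RunType type;          // visible to text services, or hidden
};

// Hypothetical helper: everything from '<' through the next '>' is
// reported as hidden; all other characters are reported as plain text.
std::vector<RunInfo> BuildRuns(const std::u16string& doc) {
    std::vector<RunInfo> runs;
    bool inTag = false;
    for (char16_t ch : doc) {
        if (ch == u'<') inTag = true;
        const RunType type = inTag ? RunType::Hidden : RunType::Plain;
        if (ch == u'>') inTag = false;  // the '>' itself is still hidden

        // Extend the current run when the type is unchanged.
        if (!runs.empty() && runs.back().type == type) {
            ++runs.back().uCount;
        } else {
            runs.push_back(RunInfo{1, type});
        }
    }
    return runs;
}
```

For the document a<b>c</b>, this yields four runs: one plain character, three hidden, one plain, and four hidden. The character offsets of the hidden runs still count toward positions in the stream, which is what lets the text store keep a single offset space.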

If your document model has multiple streams per logical document (such as Word documents), then you could create one text store per stream. If your document model has distinct editing areas (such as tables), you can separate each area with a region marker. Your application must notify the TSF manager whenever the document text changes, the selection changes, or the layout changes. This is discussed in more detail later in the article.

When you add TSF support, text services can modify your document via the ITextStoreACP methods. Since these calls can arrive at any time, interspersed with user input from the keyboard or mouse, you will need some sort of document locking so that your document is always in a consistent state.

The TSF manager will call ITextStoreACP::RequestLock when a text service wishes to read from or write to the document. For example, Microsoft Global Input Method Editors (IMEs) read the document to look at the insertion context, and write to the document in order to insert the new text. While the lock is granted, your application must not call ITextStoreACPSink methods; if it does, the TSF code will get very confused.

You will also need to make sure that external events do not change the document while the lock is granted. If your control has a COM interface, you must block or postpone any COM calls that would change the document.

ITextStoreACP::RequestLock has two parameters. The first parameter (dwLockFlags) contains the lock request. It indicates whether TSF is requesting a read-only (TS_LF_READ) or read/write (TS_LF_READWRITE) lock. It also indicates whether the lock request must be granted immediately (TS_LF_SYNC), or if it may be postponed until later. The second parameter (phrSession) contains the return code of the lock request.
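The sketch below decodes dwLockFlags into the two questions a RequestLock implementation has to answer: is a read/write lock being requested, and must it be granted immediately? The flag values mirror those in the SDK header textstor.h and are redefined here with a k prefix so the sketch builds without the SDK (check them against your own headers). LockRequest and DecodeLockFlags are hypothetical names.

```cpp
#include <cstdint>

// Lock flag values mirroring textstor.h (redefined so the sketch builds
// without the Windows SDK; verify against your SDK headers).
constexpr std::uint32_t kTS_LF_SYNC      = 0x1;
constexpr std::uint32_t kTS_LF_READ      = 0x2;
constexpr std::uint32_t kTS_LF_READWRITE = 0x6;

// Hypothetical decoded form of RequestLock's dwLockFlags parameter.
struct LockRequest {
    bool readWrite;  // true for TS_LF_READWRITE, false for TS_LF_READ
    bool sync;       // true if the lock must be granted before RequestLock returns
};

LockRequest DecodeLockFlags(std::uint32_t dwLockFlags) {
    LockRequest req{};
    // TS_LF_READWRITE includes the TS_LF_READ bit, so require all of its bits.
    req.readWrite = (dwLockFlags & kTS_LF_READWRITE) == kTS_LF_READWRITE;
    req.sync = (dwLockFlags & kTS_LF_SYNC) != 0;
    return req;
}
```

Testing for every bit of TS_LF_READWRITE, rather than any bit, keeps a plain TS_LF_READ request from being mistaken for a write request.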

Your application grants the lock by calling ITextStoreACPSink::OnLockGranted using the saved interface pointer that you got from ITextStoreACP::AdviseSink. Locks can be either synchronous (ITextStoreACPSink::OnLockGranted is called within this call to RequestLock) or asynchronous (ITextStoreACPSink::OnLockGranted is called sometime later).

There are only a few instances where your application must grant a synchronous lock request. Your application must grant a synchronous lock if your application (or control) asks a text service to do something such as correcting text (via ITfFunctionProvider, the use of which is outside the scope of this article). Your application must also grant a synchronous lock whenever you call any method on the ITfKeystrokeMgr interface. Your application must also grant a synchronous lock when you do not use ITfKeystrokeMgr, and the lock request comes from within GetMessage or PeekMessage.

Aside from these exceptions, your application is not required to grant a synchronous lock request; it is perfectly allowable to only grant asynchronous lock requests. It is also perfectly allowable to treat asynchronous lock requests as synchronous lock requests.

The MSDN® documentation is a little confusing when it describes the meaning of phrSession. Figure 3 should clarify what *phrSession should contain on return from ITextStoreACP::RequestLock.

Figure 3 *phrSession Return Values

Lock Type Requested	Lock Type Supported	*phrSession Should Contain
Synchronous	Synchronous	Return value from ITextStoreACPSink::OnLockGranted
Synchronous	Asynchronous	TS_E_SYNCHRONOUS
Asynchronous	Asynchronous	TS_S_ASYNC
Asynchronous	Synchronous	Return value from ITextStoreACPSink::OnLockGranted
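Figure 3 boils down to one rule: if the store grants the lock before RequestLock returns, *phrSession carries whatever OnLockGranted returned; otherwise a synchronous request fails with TS_E_SYNCHRONOUS and an asynchronous one is acknowledged with TS_S_ASYNC. The hypothetical helper below encodes that rule, using an enumeration in place of the real HRESULT values.

```cpp
// Hypothetical model of Figure 3. The enumerators stand in for the three
// things *phrSession can hold; the real values are HRESULTs from textstor.h.
enum class SessionResult {
    OnLockGrantedResult,  // the HRESULT returned by ITextStoreACPSink::OnLockGranted
    SynchronousRefused,   // TS_E_SYNCHRONOUS
    Queued                // TS_S_ASYNC
};

// syncRequested: TS_LF_SYNC was set in dwLockFlags.
// canGrantNow:   the store will call OnLockGranted before RequestLock returns.
SessionResult ResultForLockRequest(bool syncRequested, bool canGrantNow) {
    if (canGrantNow) {
        // Whether or not sync was requested, granting now means reporting
        // OnLockGranted's result.
        return SessionResult::OnLockGrantedResult;
    }
    // The lock can only be granted later: a synchronous request fails,
    // and an asynchronous one is queued.
    return syncRequested ? SessionResult::SynchronousRefused
                         : SessionResult::Queued;
}
```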

If your control or application supports multiple threads, or if it reenters the message pump (for example, it calls a COM method while processing a message), then you must handle TSF locks asynchronously. One way to handle the lock request is to post a message to a message queue inside ITextStoreACP::RequestLock and then grant the lock (call ITextStoreACPSink::OnLockGranted) once the document is stable. Otherwise, it’s likely going to be easier to support locking the document synchronously.
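One way to structure the post-a-message approach is sketched below: RequestLock records the request (and, in a real control, posts a private window message), and the handler for that message grants the queued locks once the document is stable. DeferredLockQueue, Enqueue, and ProcessPendingLocks are hypothetical names, and the grant callback stands in for a call to ITextStoreACPSink::OnLockGranted with TS_LF_READ or TS_LF_READWRITE.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical sketch of asynchronous lock handling. In a real control,
// RequestLock would also post a private window message (for example with
// PostMessage) so that ProcessPendingLocks runs after the current message
// has been fully handled.
class DeferredLockQueue {
public:
    // Stands in for ITextStoreACPSink::OnLockGranted.
    using GrantFn = std::function<long(bool readWrite)>;

    explicit DeferredLockQueue(GrantFn grant) : grant_(std::move(grant)) {}

    // Called from RequestLock for an asynchronous request: remember it.
    void Enqueue(bool readWrite) { pending_.push_back(readWrite); }

    // Called from the message handler once the document is consistent.
    // Grants each queued lock in order, holding the lock only for the
    // duration of the grant callback.
    void ProcessPendingLocks() {
        while (!pending_.empty()) {
            const bool readWrite = pending_.front();
            pending_.pop_front();
            locked_ = true;
            grant_(readWrite);
            locked_ = false;
        }
    }

    bool IsLocked() const { return locked_; }
    std::size_t PendingCount() const { return pending_.size(); }

private:
    GrantFn grant_;
    std::deque<bool> pending_;
    bool locked_ = false;
};
```

Because the grants happen from the control's own message handler, the document cannot be in the middle of another edit when a text service receives the lock.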

Connecting to TSF

To illustrate how the various TSF methods work together, I’m going to present a working example based on the Scintilla text editor component available at scintilla.org. In the figures that follow, the code I added to Scintilla is marked with comments that mention TSF.

The first thing your code must do is to register itself with TSF. This must be done at least once per thread; if your control can be created on separate threads, then it is preferable to do this once per control instance. You do this by creating an ITfThreadMgr object via CoCreateInstance and activating the object. Activating the thread manager marks your control as being TSF-aware and returns a client ID that you need in order to create a context object.

Once you have a thread manager, you also need to create ITfDocumentMgr and ITfContext objects. You need one instance of each object per active document. Since the Scintilla component is window-based, I create a thread manager, document manager, and context in the Initialise method, which gets called in the WM_CREATE handler, as shown in Figure 4.

Figure 4 Initializing the Component for TSF

void ScintillaWin::Initialise() {
    // Initialize COM. If the app has already done this it will have
    // no effect. If the app hasn't, we really shouldn't ask them to
    // call it just so this internal feature works.
    ::OleInitialize(NULL);

    // Register with TSF...
    HRESULT hr = CoCreateInstance(CLSID_TF_ThreadMgr, NULL, CLSCTX_INPROC_SERVER,
                                  IID_ITfThreadMgr, (void**)&cpThreadMgr);
    if (SUCCEEDED(hr)) {
        hr = cpThreadMgr->Activate(&tid);
    }

    // TSF: create context for this window
    if (SUCCEEDED(hr)) {
        hr = cpThreadMgr->CreateDocumentMgr(&cpDocMgr);
    }
    if (SUCCEEDED(hr)) {
        hr = cpDocMgr->CreateContext(tid, 0,
                                     reinterpret_cast<ITextStoreACP *>(&acp),
                                     &cpContext, &ecTextStore);
    }
    if (SUCCEEDED(hr)) {
        hr = cpDocMgr->Push(cpContext);
    }
}

When the window is destroyed, the thread manager, document manager, contexts, and other objects need to be released. I do this in the Finalise method, which gets called just before object destruction (see Figure 5).

Figure 5 Releasing TSF Objects

void ScintillaWin::Finalise() {
    ScintillaBase::Finalise();
    SetTicking(false);
    SetIdle(false);
    DestroySystemCaret();
    ::RevokeDragDrop(MainHWND());

    // TSF: Cleanup. Initialise stops at the first failure, so any of
    // these pointers may still be NULL; check before calling through them.
    if (!!cpDocMgr) {
        cpDocMgr->Pop(TF_POPF_ALL);
    }
    cpDocMgr = NULL;
    if (!!cpThreadMgr) {
        cpThreadMgr->Deactivate();
    }
    cpThreadMgr = NULL;
    cpContext = NULL;
    cpSinkUnk = NULL;
    cpTextStoreACPSink = NULL;
    ::OleUninitialize();
}

Implementation Details

I made a few important design decisions while implementing ITextStoreACP for Scintilla. My first decision was to not support display attributes. This simplified the code without affecting dictation, which doesn’t use display attributes. Not supporting embedded objects was also an obvious choice, as Scintilla doesn’t support embedded objects anyway.

The last decision was how to handle the mismatch between Scintilla’s 8-bit-per-character document stream and the 16-bit-per-character stream that ITextStoreACP expects. I resolved this by treating the trailing bytes of each multibyte character (DBCS trail bytes, or UTF-8 trailing bytes) as hidden white space within the 16-bit character stream. This complicated the implementation of ITextStoreACP::GetText, but greatly simplified the implementation of every other routine, because it kept a one-to-one mapping between Scintilla’s byte offsets and TSF’s character positions.
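To make the padding idea concrete for the UTF-8 case, the sketch below converts Scintilla bytes into UTF-16 so that every byte offset maps to exactly one UTF-16 position: each multibyte sequence becomes its UTF-16 code unit (or a surrogate pair for characters outside the Basic Multilingual Plane) followed by spaces for the remaining trailing-byte positions. ToPaddedUtf16 is a hypothetical name; it assumes well-formed UTF-8 input (a truncated final sequence becomes U+FFFD), and it does not cover the DBCS case, which would need a code-page conversion instead.

```cpp
#include <cstddef>
#include <string>

// Hypothetical sketch of the conversion GetText needs. An n-byte UTF-8
// sequence becomes its UTF-16 unit(s) plus padding spaces, so the output
// has exactly one UTF-16 unit per input byte.
std::u16string ToPaddedUtf16(const std::string& utf8) {
    std::u16string out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        // Sequence length from the lead byte (assumes well-formed UTF-8).
        std::size_t len = lead < 0x80 ? 1 : lead < 0xE0 ? 2 : lead < 0xF0 ? 3 : 4;
        char32_t cp;
        if (i + len > utf8.size()) {
            // Truncated sequence: emit U+FFFD and pad the remaining bytes.
            len = utf8.size() - i;
            cp = 0xFFFD;
        } else {
            cp = len == 1 ? lead
               : len == 2 ? (lead & 0x1F)
               : len == 3 ? (lead & 0x0F)
                          : (lead & 0x07);
            for (std::size_t k = 1; k < len; ++k) {
                cp = (cp << 6) | (static_cast<unsigned char>(utf8[i + k]) & 0x3F);
            }
        }
        std::size_t written;
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
            written = 1;
        } else {
            // Supplementary character: 4 bytes become a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
            written = 2;
        }
        // Pad the trailing-byte positions with spaces.
        out.append(len - written, u' ');
        i += len;
    }
    return out;
}
```

For example, the three bytes of the euro sign (E2 82 AC) become U+20AC followed by two spaces, so whatever follows it sits at the same offset in both streams.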

Once that decision was made, implementing the other ITextStoreACP methods was quite straightforward. You can find the entire implementation in the source code download for this article.

Scintilla already had routines that run whenever focus, the selection, the window position, or the document text changes, which are exactly the points at which TSF needs to be notified. All I had to do was add the appropriate TSF notification to each of these routines, as shown in Figure 6.

Figure 6 Adding TSF Notification Handlers

void ScintillaWin::NotifyFocus(bool focus) {
    ::SendMessage(::GetParent(MainHWND()), WM_COMMAND,
                  MAKELONG(GetCtrlID(), focus ? SCEN_SETFOCUS : SCEN_KILLFOCUS),
                  reinterpret_cast<LPARAM>(MainHWND()));
    //TSF: Update focus
    if (!!cpThreadMgr) {
        cpThreadMgr->SetFocus(focus ? cpDocMgr : NULL);
    }
}

void ScintillaWin::ClaimSelection() {
    //TSF: The selection has changed - notify the sink,
    // if there's no lock held.
    if (!!cpTextStoreACPSink && dwLock == NONE) {
        cpTextStoreACPSink->OnSelectionChange();
    }
}

sptr_t ScintillaWin::WndProc(unsigned int iMessage, uptr_t wParam, sptr_t lParam) {
    ... // lots of existing code not shown
    case WM_WINDOWPOSCHANGED:
        // TSF: Update position
        if (!!cpTextStoreACPSink && dwLock == NONE) {
            cpTextStoreACPSink->OnLayoutChange(TS_LC_CHANGE, 0);
        }
    ... // lots more existing code not shown
}

void ScintillaWin::NotifyParent(SCNotification scn) {
    scn.nmhdr.hwndFrom = MainHWND();
    scn.nmhdr.idFrom = GetCtrlID();
    ::SendMessage(::GetParent(MainHWND()), WM_NOTIFY,
                  GetCtrlID(), reinterpret_cast<LPARAM>(&scn));
    // TSF: update sinks
    if (!!cpTextStoreACPSink && dwLock == NONE) {
        if (scn.nmhdr.code == SCN_MODIFIED) {
            bool fNotify(false);
            static TS_TEXTCHANGE chg;
            // Lots of notifications get routed here.
            // We're only interested in insert/delete;
            // these operations have before/after
            // notifications, so we save up info.
            if (scn.modificationType & SC_MOD_BEFOREINSERT) {
                chg.acpStart = scn.position;
            }
            if (scn.modificationType & SC_MOD_INSERTTEXT) {
                fNotify = true;
                chg.acpOldEnd = scn.position;
                chg.acpNewEnd = scn.position + scn.length;
            }
            if (scn.modificationType & SC_MOD_BEFOREDELETE) {
                chg.acpStart = scn.position;
            }
            if (scn.modificationType & SC_MOD_DELETETEXT) {
                fNotify = true;
                chg.acpOldEnd = scn.position + scn.length;
                chg.acpNewEnd = scn.position;
            }
            if (fNotify) {
                cpTextStoreACPSink->OnTextChange(0, &chg);
            }
        } else if (scn.nmhdr.code == SCN_UPDATEUI) {
            cpTextStoreACPSink->OnLayoutChange(TS_LC_CHANGE, 0);
        }
    }
}

The last thing that needs to be considered is document locking. Keeping TSF from modifying the document while the user is working on it turned out to be trivial; Scintilla runs in a single-threaded apartment, and message dispatch is atomic (Scintilla doesn’t pump messages while processing a single message). This also meant I could handle lock requests synchronously—there was no way that I could get a TSF lock request while processing another message.

Making sure that TSF requests have the appropriate lock is also trivial. The lock can be checked at all entry points to ITextStoreACP. The hardest part is making sure that the user can’t modify the document while a text service is holding the lock. I chose to implement this lock by queuing incoming Windows messages while the lock was held, and then draining the queue after the lock was released.

The other things that need to be queued are the notifications that Scintilla generates while the lock is held. These notifications can cause very strange behavior if they are delivered to the host application while the lock is held.
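The locking scheme described in the last two paragraphs can be sketched as a small helper class. Every ITextStoreACP entry point checks HasLock (a real text store would return TS_E_NOLOCK when the check fails), and incoming window messages, along with the notifications Scintilla would send to its host, are routed through RunOrDefer so that they run only after the lock is released. TextStoreLock, LockState, and the method names are hypothetical; this is not the code from the article’s download.

```cpp
#include <deque>
#include <functional>
#include <utility>

// Hypothetical lock state held by the text store.
enum class LockState { None, Read, ReadWrite };

class TextStoreLock {
public:
    // Entry-point check: reads need any lock, writes need a read/write lock.
    bool HasLock(bool needWrite) const {
        if (state_ == LockState::None) return false;
        return !needWrite || state_ == LockState::ReadWrite;
    }

    // Runs an incoming message or outgoing notification now, or queues it
    // if a text service currently holds the lock.
    void RunOrDefer(std::function<void()> work) {
        if (state_ == LockState::None) {
            work();
        } else {
            deferred_.push_back(std::move(work));
        }
    }

    void Acquire(LockState s) { state_ = s; }

    // Releases the lock, then drains everything that arrived while it was held.
    void Release() {
        state_ = LockState::None;
        while (!deferred_.empty()) {
            auto work = std::move(deferred_.front());
            deferred_.pop_front();
            work();
        }
    }

private:
    LockState state_ = LockState::None;
    std::deque<std::function<void()>> deferred_;
};
```

Draining the queue only after the state is reset to None means a deferred message that edits the document sees an unlocked store and can send its ITextStoreACPSink notifications normally.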

Interaction between Scintilla and TSF

It may not be clear how events in the Windows Speech Recognition Speech Text Service (such as dictating some text) cause changes in Scintilla, or how events in Scintilla cause changes in the Speech Text Service. Let’s take a look at the interactions between the Windows Speech Recognition Speech Text Service, TSF, and Scintilla. Figure 7 shows how Scintilla creates the TSF objects and how the Speech Text Service becomes aware that Scintilla has a text store.

Figure 7 Creating the TSF Objects


Figure 8 shows how the Speech Text Service inserts text into a Scintilla document. This diagram only shows the objects that are related to Scintilla; there are other (TSF-managed) objects that are not shown here. The key insight here is that TSF insulates the application from the text service, and vice versa. The application controls when changes can be made by granting lock requests, and the text service gets notified when relevant changes are made.

Figure 8 Inserting Text into a Document


Figure 9 shows how Scintilla and TSF interact when the user types some text into a Scintilla document. Note that TSF locks the document after a text change. TSF does this in order to guarantee that TSF can atomically update its internal data structures. Again, TSF insulates the text service from the application while still allowing text services to be aware of document changes.

Figure 9 Locking a Document for Input


Advanced Topics

It is possible to wrap TSF support around a control that you don’t have source for. There are four areas that need special attention. First, you must be able to send ITextStoreACPSink notifications at the appropriate time. If the control doesn’t provide those notifications, it is much more difficult to add TSF support. Second, implementing ITextStoreACP::GetACPFromPoint requires that you know how the control lays out text; this can be tricky without source. Third, locking the document can also be challenging; the best way is to wrap the third-party control in an abstraction layer so that you can postpone or deny incoming change requests while the control is locked. Finally, if the control supports asynchronous changes (such as loading from a URL), wrapping TSF support around that control will be a real challenge.

It is possible to implement ITextStoreACP in managed code using P/Invoke to call the appropriate methods. However, since you are calling unmanaged code, you need to have full trust (or at least SecurityPermissionFlag.UnmanagedCode) to allow the calls.

Debugging a text service can be quite difficult on a single machine. If your program hits a breakpoint while running under the debugger, the Text Services Manager will attempt to call into your application (because of the status notification). Since your application is halted, the Text Services Manager halts, which blocks the entire system. I strongly recommend that you debug your text store implementation on another machine via remote debugging.

Conclusion

Adding dictation support to your application, even adding full TSF support, is not as difficult as it may seem at first. Windows Vista, via Windows Speech Recognition and TSF, does most of the heavy lifting.

The download for this article contains the complete edited code. For more information about the Scintilla control used as an example here, see www.scintilla.org. To learn more, see the definitive reference for TSF on the MSDN® Web site at msdn2.microsoft.com/ms629032.aspx.

Eric Brown has been a software developer for over 20 years. He has a passion for speech recognition and dictation. He blogs at blogs.msdn.com/tsfaware.
