Breaks a Unicode string into individually shapeable items.
HRESULT ScriptItemize(
  _In_     const WCHAR          *pwcInChars,
  _In_     int                  cInChars,
  _In_     int                  cMaxItems,
  _In_opt_ const SCRIPT_CONTROL *psControl,
  _In_opt_ const SCRIPT_STATE   *psState,
  _Out_    SCRIPT_ITEM          *pItems,
  _Out_    int                  *pcItems
);
- pwcInChars [in]
Pointer to a Unicode string to itemize.
- cInChars [in]
Number of characters in pwcInChars to itemize.
- cMaxItems [in]
Maximum number of SCRIPT_ITEM structures defining items to process.
- psControl [in, optional]
Pointer to a SCRIPT_CONTROL structure indicating the type of itemization to perform.
Alternatively, the application can set this parameter to NULL if no SCRIPT_CONTROL properties are needed. For more information, see the Remarks section.
- psState [in, optional]
Pointer to a SCRIPT_STATE structure indicating the initial bidirectional algorithm state.
Alternatively, the application can set this parameter to NULL if the script state is not needed. For more information, see the Remarks section.
- pItems [out]
Pointer to a buffer in which the function retrieves SCRIPT_ITEM structures representing the items that have been processed. The buffer should be (cMaxItems + 1) * sizeof(SCRIPT_ITEM) bytes in length. It is invalid to call this function with a buffer that holds fewer than two SCRIPT_ITEM structures. The function always adds a terminal item to the item analysis array so that the length of the item with zero-based index "i" is always available as:
pItems[i+1].iCharPos - pItems[i].iCharPos;
- pcItems [out]
Pointer to the number of SCRIPT_ITEM structures processed.
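The terminal-item convention described for pItems can be sketched as follows. So that the block builds without the Windows headers, it uses a hypothetical ItemStub type that mirrors only the iCharPos member of SCRIPT_ITEM; item_length is an illustrative helper and not part of Uniscribe.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the one SCRIPT_ITEM member used here.
   The real structure (usp10.h) also carries a SCRIPT_ANALYSIS member. */
typedef struct {
    int iCharPos;   /* offset of the item's first character in the string */
} ItemStub;

/* Length in characters of item i. Valid for 0 <= i < cItems because the
   array holds cItems + 1 entries, the last one being the terminal item. */
static int item_length(const ItemStub *items, int cItems, int i)
{
    assert(items != NULL && i >= 0 && i < cItems);
    return items[i + 1].iCharPos - items[i].iCharPos;
}
```

With real Uniscribe output, the same subtraction applies to the SCRIPT_ITEM array, using the count returned through pcItems as cItems.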
Returns 0 if successful. The function returns a nonzero HRESULT value if it does not succeed.
The function returns E_INVALIDARG if pwcInChars is set to NULL, cInChars is 0, pItems is set to NULL, or cMaxItems < 2.
The function returns E_OUTOFMEMORY if the value of cMaxItems is insufficient. As in all error cases, no items are fully processed and no part of the output array contains defined values. If the function returns E_OUTOFMEMORY, the application can call it again with a larger pItems buffer.
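The retry advice for E_OUTOFMEMORY can be sketched as a loop that doubles cMaxItems until the call succeeds. This is a minimal sketch under stated assumptions: ItemStub, the RC_* codes, the ItemizeFn callback type, itemize_with_retry, and fake_itemize are all hypothetical names introduced so the block builds without usp10.h. In a real application the callback would call ScriptItemize and map E_OUTOFMEMORY to RC_TOO_SMALL.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical stand-ins so the sketch builds without usp10.h. */
typedef struct { int iCharPos; } ItemStub;            /* mirrors SCRIPT_ITEM.iCharPos only */
enum { RC_OK = 0, RC_TOO_SMALL = 1, RC_FAILED = 2 };  /* RC_TOO_SMALL plays E_OUTOFMEMORY */

/* Hypothetical callback that performs one itemization attempt. */
typedef int (*ItemizeFn)(void *ctx, int cMaxItems, ItemStub *pItems, int *pcItems);

/* Calls `itemize` with a doubling buffer until it stops reporting RC_TOO_SMALL.
   Returns a malloc'd array of *pcItems + 1 entries (caller frees), or NULL. */
static ItemStub *itemize_with_retry(ItemizeFn itemize, void *ctx, int *pcItems)
{
    int cMaxItems = 2;                        /* smallest value the function accepts */
    assert(itemize != NULL && pcItems != NULL);
    for (;;) {
        /* One extra slot, following the (cMaxItems + 1) sizing rule. */
        ItemStub *items = malloc(((size_t)cMaxItems + 1) * sizeof *items);
        int rc;
        if (items == NULL)
            return NULL;
        rc = itemize(ctx, cMaxItems, items, pcItems);
        if (rc == RC_OK)
            return items;
        free(items);                          /* contents are undefined after an error */
        if (rc != RC_TOO_SMALL || cMaxItems > INT_MAX / 2)
            return NULL;
        cMaxItems *= 2;
    }
}

/* Test double: pretends the text needs *(int *)ctx items of 3 characters each. */
static int fake_itemize(void *ctx, int cMaxItems, ItemStub *pItems, int *pcItems)
{
    int needed = *(int *)ctx;
    int i;
    if (pItems == NULL || cMaxItems < 2)
        return RC_FAILED;
    if (needed > cMaxItems)
        return RC_TOO_SMALL;
    for (i = 0; i <= needed; ++i)             /* includes the terminal item */
        pItems[i].iCharPos = 3 * i;
    *pcItems = needed;
    return RC_OK;
}
```

Doubling is only one possible growth policy. Because no part of the output array is defined after an error, the sketch discards the buffer instead of trying to reuse partial results.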
See Displaying Text with Uniscribe for a discussion of the context in which this function is normally called.
The function delimits items by either a change of shaping engine or a change of direction.
The application can create multiple ranges, or runs that fall entirely within a single item, from each SCRIPT_ITEM structure retrieved by ScriptItemize. However, it should not combine multiple items into a single run. Later, when measuring or rendering, the application can call ScriptShape for each run and must pass the SCRIPT_ANALYSIS structure retrieved by ScriptItemize in the SCRIPT_ITEM structure.
If the text handled by an application can include any right-to-left content, the application uses the psControl and psState parameters in calling ScriptItemize. However, the application does not have to do this and can handle bidirectional text itself instead of relying on Uniscribe to do so. The psControl and psState parameters are useful in some strictly left-to-right scenarios, for example, when the fLinkStringBefore member of SCRIPT_CONTROL is not specific to right-to-left scripts. The application sets psControl and psState to NULL to have ScriptItemize break the Unicode string purely by character code.
The application can set all parameters to non-NULL values to have the function perform a full Unicode bidirectional analysis. To permit a correct Unicode bidirectional analysis, the SCRIPT_STATE structure should be initialized according to the reading order at paragraph start, and ScriptItemize should be passed the whole paragraph. In particular, the uBidiLevel member should be initialized to 0 for left-to-right and 1 for right-to-left.
The fRTL member of SCRIPT_ANALYSIS is referenced in SCRIPT_ITEM. The fNumeric member of SCRIPT_PROPERTIES is retrieved by ScriptGetProperties. These members together provide the same classification as the lpClass member of GCP_RESULTS, referenced by lpResults in GetCharacterPlacement.
European digits U+0030 through U+0039 can be rendered as national digits, as shown in the following table.
|SCRIPT_STATE.fDigitSubstitute||SCRIPT_CONTROL.fContextDigits||Digit shapes displayed for Unicode U+0030 through U+0039|
|TRUE||FALSE||As specified in uDefaultLanguage member of SCRIPT_CONTROL.|
|TRUE||TRUE||As prior strong text, defaulting to uDefaultLanguage member of SCRIPT_CONTROL.|
In context digit mode, one of the following actions occurs:
- If the script specified by uDefaultLanguage is in the same direction as the output, all digits encountered before the first letters are rendered in the language indicated by uDefaultLanguage.
- If the script specified by uDefaultLanguage is in the opposite direction from the output, all digits encountered before the first letters are rendered in European digits.
For example, if uDefaultLanguage indicates LANG_ARABIC, initial digits are in Arabic-Indic in a right-to-left embedding. However, they are in European digits in a left-to-right embedding.
For more information, see Digit Shapes.
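The digit-shape rules above, for digits that appear before the first letters, can be summarized in a small decision function. This is a sketch only: StateStub, ControlStub, DigitShape, and initial_digit_shape are hypothetical names, and the two direction arguments are supplied by the caller rather than derived from uDefaultLanguage. The case where fDigitSubstitute is FALSE is not listed in the table; the sketch assumes European digits there, consistent with the NODS entry in the control character table.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the two flags this sketch reads. */
typedef struct { unsigned fDigitSubstitute : 1; } StateStub;    /* cf. SCRIPT_STATE   */
typedef struct { unsigned fContextDigits   : 1; } ControlStub;  /* cf. SCRIPT_CONTROL */

typedef enum { DIGITS_EUROPEAN, DIGITS_DEFAULT_LANGUAGE } DigitShape;

/* Shape used for U+0030..U+0039 when they occur before any letters.
   languageIsRtl: direction of the script named by uDefaultLanguage.
   outputIsRtl:   direction of the embedding being rendered. */
static DigitShape initial_digit_shape(const StateStub *state,
                                      const ControlStub *control,
                                      int languageIsRtl, int outputIsRtl)
{
    assert(state != NULL && control != NULL);
    if (!state->fDigitSubstitute)
        return DIGITS_EUROPEAN;            /* assumption: no substitution requested */
    if (!control->fContextDigits)
        return DIGITS_DEFAULT_LANGUAGE;    /* as specified by uDefaultLanguage */
    /* Context digit mode: same direction gives the default language's digits,
       opposite direction gives European digits. */
    return (!languageIsRtl == !outputIsRtl) ? DIGITS_DEFAULT_LANGUAGE
                                            : DIGITS_EUROPEAN;
}
```

For the LANG_ARABIC example, passing languageIsRtl = 1 yields DIGITS_DEFAULT_LANGUAGE (Arabic-Indic) in a right-to-left embedding and DIGITS_EUROPEAN in a left-to-right embedding.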
The Unicode control characters and definitions, and their effects on SCRIPT_STATE members, are provided in the following table. For more information on Unicode control characters, see The Unicode Standard.
|Unicode control characters||Meaning||Effect on SCRIPT_STATE|
|NADS||Override European digits (NODS) with national digit shapes.||Set fDigitSubstitute.|
|NODS||Use nominal digit shapes, otherwise known as European digits. See NADS.||Clear fDigitSubstitute.|
|ASS||Activate swapping of symmetric pairs, for example, parentheses. For these characters, left and right are interpreted as opening and closing. This is the default. See ISS.||Clear fInhibitSymSwap.|
|ISS||Inhibit swapping of symmetric pairs. See ASS.||Set fInhibitSymSwap.|
|AAFS||Activate Arabic form shaping for Arabic presentation forms. See IAFS.||Set fCharShape.|
|IAFS||Inhibit Arabic form shaping, that is, ligatures and cursive connections, for Arabic presentation forms. Nominal Arabic characters are not affected. This is the default. See AAFS.||Clear fCharShape.|
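The table can be read as a small state machine over three SCRIPT_STATE flags. The sketch below illustrates that mapping; it is not how Uniscribe is implemented. StateStub and apply_control_char are hypothetical names, and the code points are taken from the Unicode Standard (General Punctuation block), not from this page.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in holding only the SCRIPT_STATE flags named in the table. */
typedef struct {
    unsigned fInhibitSymSwap  : 1;
    unsigned fCharShape       : 1;
    unsigned fDigitSubstitute : 1;
} StateStub;

/* Code points of the six control characters, per the Unicode Standard. */
enum {
    CH_ISS  = 0x206A,   /* INHIBIT SYMMETRIC SWAPPING  */
    CH_ASS  = 0x206B,   /* ACTIVATE SYMMETRIC SWAPPING */
    CH_IAFS = 0x206C,   /* INHIBIT ARABIC FORM SHAPING */
    CH_AAFS = 0x206D,   /* ACTIVATE ARABIC FORM SHAPING */
    CH_NADS = 0x206E,   /* NATIONAL DIGIT SHAPES */
    CH_NODS = 0x206F    /* NOMINAL DIGIT SHAPES  */
};

/* Applies the effect listed in the table for `ch`.
   Returns 1 if `ch` was one of the six control characters, otherwise 0. */
static int apply_control_char(StateStub *state, unsigned ch)
{
    assert(state != NULL);
    switch (ch) {
    case CH_NADS: state->fDigitSubstitute = 1; return 1;
    case CH_NODS: state->fDigitSubstitute = 0; return 1;
    case CH_ASS:  state->fInhibitSymSwap  = 0; return 1;
    case CH_ISS:  state->fInhibitSymSwap  = 1; return 1;
    case CH_AAFS: state->fCharShape       = 1; return 1;
    case CH_IAFS: state->fCharShape       = 0; return 1;
    default:      return 0;   /* not a control character from the table */
    }
}
```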
The fArabicNumContext member of SCRIPT_STATE supports the context-sensitive display of numerals in Arabic script text. It indicates if digits are rendered using native Arabic script digit shapes or European digits. At the beginning of a paragraph, this member should normally be initialized to TRUE for an Arabic locale, or FALSE for any other locale. The function updates the script state as it processes strong text.
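The paragraph-start values recommended in these remarks (uBidiLevel of 0 or 1 by reading order, fArabicNumContext by locale) can be set up as in the following sketch. StateStub is a hypothetical stand-in that declares only the two SCRIPT_STATE members involved, and its bit widths are an assumption. init_paragraph_state is an illustrative helper, and how the caller determines the reading order and locale is left open.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the two SCRIPT_STATE members set here.
   Bit widths are an assumption; see usp10.h for the real layout. */
typedef struct {
    unsigned uBidiLevel        : 5;
    unsigned fArabicNumContext : 1;
} StateStub;

/* Initializes the state for the start of a paragraph.
   paragraphIsRtl: nonzero when the paragraph reading order is right-to-left.
   localeIsArabic: nonzero for an Arabic locale. */
static void init_paragraph_state(StateStub *state, int paragraphIsRtl,
                                 int localeIsArabic)
{
    assert(state != NULL);
    memset(state, 0, sizeof *state);                      /* all other flags cleared */
    state->uBidiLevel        = paragraphIsRtl ? 1u : 0u;  /* 0 = LTR, 1 = RTL */
    state->fArabicNumContext = localeIsArabic ? 1u : 0u;
}
```

With the real structures, the same two assignments would be made on a zero-initialized SCRIPT_STATE before passing it as psState together with the whole paragraph.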
Minimum supported client
Windows 2000 Professional [desktop apps only]
Minimum supported server
Windows 2000 Server [desktop apps only]
Internet Explorer 5 or later on Windows Me/98/95
- Uniscribe Functions
- Displaying Text with Uniscribe