Syntax
| long = stringclassobject.isUTF8
Description
| Determines whether the contents of the StringClass instance form a valid UTF-8 byte sequence. isUTF8 returns True when the buffer as a whole conforms to the UTF-8 grammar defined in RFC 3629, including the required lead-byte and continuation-byte patterns for two-, three- and four-byte sequences, and returns False when any malformed sequence is found.
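The RFC 3629 grammar can be made concrete with a small portable validator. The following is a minimal C sketch for illustration only, not the StringClass implementation; the function name is_valid_utf8 is hypothetical. It rejects the classes of input the grammar excludes: stray continuation bytes, overlong encodings, UTF-16 surrogates, code points above U+10FFFF, and sequences truncated at the end of the buffer.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns true if buf[0..len) is well-formed UTF-8
   according to the RFC 3629 grammar, false otherwise. */
bool is_valid_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char b = buf[i];
        size_t need = 0;                    /* continuation bytes that must follow */
        unsigned char lo = 0x80, hi = 0xBF; /* allowed range of the FIRST continuation byte */

        if (b <= 0x7F) {                    /* one-byte sequence (ASCII) */
            i++;
            continue;
        } else if (b >= 0xC2 && b <= 0xDF) {
            need = 1;
        } else if (b >= 0xE0 && b <= 0xEF) {
            need = 2;
            if (b == 0xE0) lo = 0xA0;       /* reject overlong three-byte forms */
            else if (b == 0xED) hi = 0x9F;  /* reject surrogates U+D800..U+DFFF */
        } else if (b >= 0xF0 && b <= 0xF4) {
            need = 3;
            if (b == 0xF0) lo = 0x90;       /* reject overlong four-byte forms */
            else if (b == 0xF4) hi = 0x8F;  /* reject code points above U+10FFFF */
        } else {
            /* 0x80..0xBF: stray continuation byte; 0xC0/0xC1: overlong lead;
               0xF5..0xFF: never valid in UTF-8 */
            return false;
        }

        if (len - i - 1 < need)             /* sequence truncated at end of buffer */
            return false;
        if (buf[i + 1] < lo || buf[i + 1] > hi)
            return false;
        for (size_t k = 2; k <= need; k++) {
            if (buf[i + k] < 0x80 || buf[i + k] > 0xBF)
                return false;
        }
        i += need + 1;
    }
    return true;
}
```

Because every byte in the range 0x00 to 0x7F takes the first branch, pure ASCII input returns true, which matches the ASCII behaviour described for isUTF8.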
isUTF8 performs the check deterministically by calling the Windows API function MultiByteToWideChar with the MB_ERR_INVALID_CHARS flag set; with this flag the function fails when it encounters an invalid input sequence instead of converting it, so any byte sequence that is not strictly valid UTF-8 is rejected. Pure ASCII content (code points 0–127) is by definition also valid UTF-8, so isUTF8 returns True for it as well; this is the expected behaviour, since an ASCII string can be treated as UTF-8 without any conversion.
isUTF8 is intended as a reliable preflight check before calling fromUTF8, and the toString method also calls it internally to decide whether automatic decoding is required.
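A Windows-only C sketch of the API-based check follows. The helper name is_utf8_winapi, the use of the CP_UTF8 code page, and the decision to treat an empty buffer as valid are assumptions, since this page documents none of them. Passing NULL as the output buffer and 0 as its size asks MultiByteToWideChar only for the required UTF-16 length, so nothing is actually converted; the code is wrapped in `#ifdef _WIN32` so the file still compiles on other platforms.

```c
/* Windows-only sketch; compiled only when targeting Win32. */
#ifdef _WIN32
#include <windows.h>
#include <stdbool.h>

/* Hypothetical helper: returns true if buf[0..len) is valid UTF-8,
   as judged by MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, ...). */
bool is_utf8_winapi(const char *buf, int len)
{
    /* Assumption: an empty buffer counts as valid UTF-8. The API call
       itself fails for a zero-length input, so handle it up front. */
    if (len == 0)
        return true;

    /* With lpWideCharStr = NULL and cchWideChar = 0 the function only
       reports the required UTF-16 length. MB_ERR_INVALID_CHARS makes it
       return 0 on malformed input (GetLastError() then reports
       ERROR_NO_UNICODE_TRANSLATION). */
    int needed = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                     buf, len, NULL, 0);
    return needed != 0;
}
#endif /* _WIN32 */
```

A return value of 0 can also come from other failures such as ERROR_INVALID_PARAMETER; a stricter variant could call GetLastError() and treat only ERROR_NO_UNICODE_TRANSLATION as "not UTF-8".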
isUTF8 does not modify the contents of the StringClass instance. Earlier versions of StringClass relied on heuristic detection that could produce false positives or false negatives for certain character combinations in the Windows-1252 code page; the current API-based implementation is fully deterministic and replaces that heuristic. Determinism does not, however, identify the original encoding: Windows-1252 text whose bytes happen to form valid UTF-8, such as "Ã©" (bytes C3 A9), still returns True.
See Also
Example
| Sub Main |