Unicode® 6.2.0

Released: 2012 September 26 (Announcement)

Version 6.2.0 has been superseded by the latest version of the Unicode Standard.

Unicode 6.2.0 is a minor version of the Unicode Standard. This page summarizes the important changes for the Unicode Standard, Version 6.2.0. In the discussion below, Version 6.2.0 may be abbreviated as "Unicode 6.2" or "Version 6.2."


Contents of This Document

A. Summary
B. Version Information
C. Stability Policy Update
D. Textual Changes and Character Additions
E. Conformance Changes
F. Unicode Character Database Changes
G. Unicode Standard Annex Changes

A. Summary

Version 6.2 of the Unicode Standard is a special release dedicated to the early publication of the newly encoded Turkish lira sign. This version also rolls in various minor corrections for errata and other small updates for the Unicode Character Database. In addition, there are some significant changes to the Unicode algorithms for text segmentation and line breaking, including changes to the line break property to improve line breaking for emoji symbols.

For detailed property changes see Section F. Unicode Character Database Changes.

Two other important Unicode specifications are maintained in synchrony with the Unicode Standard, and have updates for Version 6.2:

This version of the Unicode Standard is synchronized with ISO/IEC 10646:2012, plus the accelerated publication of a single character: U+20BA TURKISH LIRA SIGN.
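
As a quick check of the character named above, the following sketch queries Python's standard unicodedata module for U+20BA. It assumes an interpreter whose bundled Unicode Character Database is at Version 6.2 or later; the variable names are illustrative only.

```python
import unicodedata

# U+20BA is the code point given above for the Turkish lira sign.
LIRA = "\u20ba"

# Assumption: the interpreter's bundled UCD is Version 6.2 or later.
# An older database raises ValueError here, because the code point
# would still be unassigned.
lira_name = unicodedata.name(LIRA)
lira_category = unicodedata.category(LIRA)  # General_Category value
lira_utf8 = LIRA.encode("utf-8")            # three bytes for this BMP code point

print(unicodedata.unidata_version, lira_name, lira_category, lira_utf8.hex())
```

Printing unicodedata.unidata_version alongside the result makes it clear which version of the database answered the query.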

B. Version Information

Version 6.2 of the Unicode Standard consists of the core specification, the delta and archival code charts for this version, the Unicode Standard Annexes, and the Unicode Character Database (UCD).

The core specification gives the general principles, requirements for conformance, and guidelines for implementers. The code charts show representative glyphs for all the Unicode characters. The Unicode Standard Annexes supply detailed normative information about particular aspects of the standard. The Unicode Character Database supplies normative and informative data for implementers to allow them to implement the Unicode Standard.

Version 6.2.0 of the Unicode Standard should be referenced as:

The Unicode Consortium. The Unicode Standard, Version 6.2.0, (Mountain View, CA: The Unicode Consortium, 2012. ISBN 978-1-936213-07-8)
http://www.unicode.org/versions/Unicode6.2.0/

A complete specification of the contributory files for Unicode 6.2 is found on the page Components for 6.2.0. That page also provides the recommended reference format for Unicode Standard Annexes.

The navigation bar on the left of this page provides links to the core specification, both as a single file and as individual chapters and appendices. Also provided are links to the code charts, the radical-stroke indices to CJK ideographs, the Unicode Standard Annexes, and the data files for Version 6.2 of the Unicode Character Database.

Code Charts

Several sets of code charts are available. They serve different purposes:

  • The latest set of code charts for the Unicode Standard is available online. Those charts are always the most current code charts available, and may be updated at any time. The charts are organized by scripts and blocks for easy reference. An online index by character name is also provided.

For Unicode 6.2.0 in particular, two additional sets of code chart pages are provided:

  • A set of delta code charts showing the block in which the Turkish lira sign was added for Unicode 6.2.0. That character is visually highlighted in the relevant chart. These delta code charts also include blocks which contain significant glyph changes to fix errata.
  • A set of archival code charts that represent the entire set of characters, names and representative glyphs at the time of publication of Unicode 6.2.0.

The delta and archival code charts are a stable part of this release of the Unicode Standard. They will never be updated.

Errata

Errata incorporated into Unicode 6.2 are listed by date in a separate table. For corrigenda and errata after the release of Unicode 6.2, see the list of current Updates and Errata.

C. Stability Policy Update

A property value constraint has been added to guarantee that no new characters will be added to the standard with Decomposition_Mapping values whose first character has a non-zero Canonical_Combining_Class. There are four exceptions, which were encoded long ago, prior to Unicode 2.1.
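
One way to see which existing characters the constraint concerns is to scan a character database for them. The sketch below does this with Python's unicodedata module. It rests on two assumptions that the paragraph above does not spell out: only canonical mappings are examined (compatibility mappings, which Python marks with a leading tag in angle brackets, are skipped), and only mappings of two or more characters are considered. The function name is hypothetical.

```python
import unicodedata

def nonstarter_led_decompositions():
    """Return code points whose multi-character canonical decomposition
    begins with a character of non-zero Canonical_Combining_Class."""
    found = []
    for cp in range(0x110000):
        mapping = unicodedata.decomposition(chr(cp))
        # Empty string: no decomposition recorded. A leading "<tag>"
        # marks a compatibility mapping, skipped here (assumption).
        if not mapping or mapping.startswith("<"):
            continue
        parts = mapping.split()
        if len(parts) < 2:
            continue  # singleton mappings are skipped (assumption)
        first = chr(int(parts[0], 16))
        if unicodedata.combining(first) != 0:
            found.append(cp)
    return found

exceptions = nonstarter_led_decompositions()
for cp in exceptions:
    print(f"U+{cp:04X} {unicodedata.name(chr(cp))}")
```

Because the policy is forward-looking, running the scan against any database at Version 6.2 or later should list only the long-standing exceptions and nothing newer.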

Note: The Unicode Character Encoding Stability Policy restricts possible future changes to the Unicode Standard, but is not formally a part of the standard itself.

D. Textual Changes and Character Additions

Textual changes are very minimal in this version, and are essentially limited to adding a description for the new Turkish lira sign.

Character Assignment Overview

One new character assignment was made in the BMP for the Unicode Standard, Version 6.2. This addition brings the total number of characters assigned in the standard to 110,117. (That is the traditional count, which totals up graphic and format characters, but omits surrogate code points, ISO control codes, noncharacters, and private-use allocations.)
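
The counting rule in the parenthetical can be approximated with General_Category values. The sketch below tallies every code point in Python's bundled database except those in the excluded categories. Mapping the four omitted groups to Cs, Cc, Co, and Cn is an assumption of this sketch, and the total it prints depends on the Unicode version the interpreter ships with, so a newer database reports a larger number than the one quoted above.

```python
import unicodedata
from collections import Counter

# General_Category values left out of the traditional count, following the
# parenthetical above: Cs = surrogate, Cc = control, Co = private use,
# Cn = unassigned (a category that also covers noncharacters).
EXCLUDED = {"Cs", "Cc", "Co", "Cn"}

def traditional_count():
    """Count code points that are graphic or format characters."""
    by_category = Counter(
        unicodedata.category(chr(cp)) for cp in range(0x110000)
    )
    return sum(n for cat, n in by_category.items() if cat not in EXCLUDED)

print(unicodedata.unidata_version, traditional_count())
```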

No new blocks are defined in Version 6.2.

E. Conformance Changes

There are no significant conformance changes in the core specification. However, there are minor changes to the text segmentation algorithms in UAX #14 and UAX #29.

F. Unicode Character Database Changes

The detailed listing of all changes to the contributory data files of the Unicode Character Database for Version 6.2 can be found in UAX #44, Unicode Character Database.

Segmentation properties (Grapheme_Cluster_Break, Word_Break, Line_Break) have been modified to improve the segmentation of regional indicator symbols. Other modifications have been made to the Line_Break property values for pictographic symbols, to enable better line breaking behavior. A number of small corrections have also been made for numeric, East Asian width, script, and Unihan properties, and one name alias correction has been added.

Starting with Version 6.2, the encoding for the Unicode names list file (NamesList.txt) has been changed from Latin-1 to UTF-8. This change became possible because of an update of the charting tools which use the names list file in the production of the Unicode code charts.
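
For tools that read NamesList.txt from several releases, the practical consequence is that the decoder has to be chosen by version. The sketch below shows one way to do that; the helper name and the sample string are hypothetical and are not taken from the file itself.

```python
def names_list_encoding(version):
    """Pick a text encoding for NamesList.txt from a (major, minor) version."""
    # Per the paragraph above: Latin-1 before Version 6.2, UTF-8 from 6.2 on.
    return "utf-8" if tuple(version) >= (6, 2) else "latin-1"

# Hypothetical sample text, not a line copied from NamesList.txt.
sample = "caf\u00e9"
utf8_bytes = sample.encode("utf-8")

# Decoding with the encoding that matches the release round-trips cleanly.
decoded_ok = utf8_bytes.decode(names_list_encoding((6, 2)))
# Decoding the same bytes as Latin-1 silently produces mojibake.
decoded_wrong = utf8_bytes.decode(names_list_encoding((6, 1)))

print(repr(decoded_ok), repr(decoded_wrong))
```

Latin-1 decoding never raises an error, so a parser that keeps the old encoding fails quietly instead of loudly; that is the main reason to key the choice on the version number.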

The U-Source data and glyphs associated with UAX #45 have been added to the Unicode Character Database.

The Script_Extensions property was changed from provisional to informative.

G. Unicode Standard Annex Changes

In Version 6.2, many of the Unicode Standard Annexes have had significant revisions. The most important of these changes are listed below. For the full details of all changes, see the Modifications section of each UAX, linked directly from the following list of UAXes.

Unicode Standard Annex Changes
  • UAX #9, Unicode Bidirectional Algorithm: No significant changes in this version.
  • UAX #11, East Asian Width: A note was added to definition ED3 in Section 4 to explain the East Asian Halfwidth property of U+20A9 WON SIGN.
  • UAX #14, Unicode Line Breaking Algorithm: The text was modified so that property values and rules prevent breaks between Regional Indicator (RI) characters. (Sequences of more than two RI characters should be separated by other characters, such as U+200B ZWSP.)
  • UAX #15, Unicode Normalization Forms: Additional equivalences were added to the Design Goals.
  • UAX #24, Unicode Script Property: The text was rewritten substantially to incorporate a fuller explanation of the Script_Extensions property and its property value assignments. A disclaimer was added about the stability of Script and Script_Extensions property values.
  • UAX #29, Unicode Text Segmentation: The text was modified so that property values and rules prevent breaks between Regional Indicator (RI) characters. (Sequences of more than two RI characters should be separated by other characters, such as U+200B ZWSP.) Regular expressions have been clarified in Table 1b, Combining Character Sequences and Grapheme Clusters.
  • UAX #31, Unicode Identifier and Pattern Syntax: No significant changes in this version.
  • UAX #34, Unicode Named Character Sequences: No significant changes in this version.
  • UAX #38, Unicode Han Database (Unihan): No significant changes in this version.
  • UAX #41, Common References for Unicode Standard Annexes: No significant changes in this version.
  • UAX #42, Unicode Character Database in XML: No significant changes in this version.
  • UAX #44, Unicode Character Database: The status of Script_Extensions was updated to informative and the type of Bidi_Mirroring was updated from String to Miscellaneous. The Unicode_1_Name property was marked as obsolete. A clarification was added regarding change control for normative and informative property values.
  • UAX #45, U-Source Ideographs: UAX #45 has been updated from a Unicode Technical Report to a Unicode Standard Annex for this version. The data files for UAX #45 have been added to the Unicode Character Database.
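
The advice in the UAX #14 and UAX #29 entries above, that longer runs of Regional Indicator characters should be separated by a character such as U+200B, can be sketched as a small preprocessing step. The function names are hypothetical, and grouping a run into consecutive pairs is this sketch's reading of that advice, not an algorithm taken from either annex.

```python
ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE, the separator named above

def is_regional_indicator(ch):
    # Regional indicator symbols occupy U+1F1E6..U+1F1FF.
    return 0x1F1E6 <= ord(ch) <= 0x1F1FF

def separate_ri_pairs(text):
    """Insert ZWSP after every second RI character inside a longer RI run."""
    out = []
    run = 0  # RI characters seen so far in the current uninterrupted run
    for ch in text:
        if is_regional_indicator(ch):
            # A new pair starts after 2, 4, 6, ... RI characters.
            if run and run % 2 == 0:
                out.append(ZWSP)
            run += 1
        else:
            run = 0
        out.append(ch)
    return "".join(out)

# Four RI symbols in a row: the letters F, R, D, E.
fr_de = "\U0001F1EB\U0001F1F7\U0001F1E9\U0001F1EA"
separated = separate_ri_pairs(fr_de)
print([f"U+{ord(c):04X}" for c in separated])
```

A run of exactly two RI characters passes through unchanged, so the step is safe to apply to text that already contains isolated pairs.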