Current PHP parser section ids are derived from heading content using an old, ad-hoc escaping scheme invented by yours truly to satisfy the unreasonable demands of HTML4/XHTML. This scheme percent-encodes the section id, and then replaces each percent sign with a full stop (.), so that an apostrophe, for example, becomes .27.
The purpose of changing this scheme is to satisfy the #3 wish on this year's Community Wishlist:
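As a rough illustration of the legacy scheme, here is a minimal JavaScript sketch. It is not the PHP parser's actual code: the function name `legacyEscapeId` is hypothetical, and the set of characters left unescaped (ASCII letters, digits, `_`, `-`, `.`) and the space-to-underscore step are assumptions inferred from the example URL in this document.

```javascript
// Hypothetical sketch of the old-style escaping: percent-encode the UTF-8
// bytes of the heading, but write '.' where percent-encoding would write '%'.
function legacyEscapeId(heading) {
  // Assumption: spaces become underscores before escaping.
  const bytes = new TextEncoder().encode(heading.trim().replace(/ /g, '_'));
  let id = '';
  for (const b of bytes) {
    const ch = String.fromCharCode(b);
    // Assumption: these ASCII characters pass through unescaped.
    if (/[A-Za-z0-9_.-]/.test(ch)) {
      id += ch;
    } else {
      id += '.' + b.toString(16).toUpperCase().padStart(2, '0');
    }
  }
  return id;
}

console.log(legacyEscapeId("can't be reached")); // can.27t_be_reached
console.log(legacyEscapeId('Café'));             // Caf.C3.A9
```

Non-Latin headings end up as long runs of `.XX` byte escapes, which is what makes the resulting anchors hard to read.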
"Non-Latin section headings are displayed terribly in URL anchors and can't be reached directly":
https://meta.wikimedia.org/wiki/2016_Community_Wishlist_Survey/Categories/Miscellaneous#Non-Latin_section_headings_are_displayed_terribly_in_URL_anchors_and_can.27t_be_reached_directly
HTML5 permits almost any Unicode character in an id (only whitespace is disallowed), so section ids need only minimal escaping, of the hash sign in particular. Switching to HTML5 section anchors would have several advantages:
- More readable section links, especially in non-ASCII languages.
- Simplified editing clients without a need to implement a legacy algorithm for deriving section ids from headings.
- Simplified / more accurate DOM spec documentation.
Disadvantages include:
- Breaks incoming links to sections from elsewhere on the internet.
- Potentially breaks internal links to sections.
Migration options
Client-side fall-back
Client-side JS can look at the URL hash and check whether a matching section was found. If none was found, it can encode each heading using the old-style escaping algorithm and check whether the hash matches any of the results. If it does, the script rewrites the hash to the matching new-style section id.
As a result, users would be encouraged to update their links to new-style section ids. Existing links (both internal and external) would continue to work as long as the fall-back JS is active.
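The fall-back described above could be sketched roughly as follows. This is an illustration under stated assumptions, not a proposed implementation: `resolveLegacyHash` and `legacyEscapeId` are hypothetical names, the old-style escaping is simplified (ASCII letters, digits, `_`, `-`, `.` assumed to pass through), and the matching logic is kept as a pure function so it can be exercised outside a browser.

```javascript
// Hypothetical, simplified old-style escaping of a new-style id.
function legacyEscapeId(id) {
  let out = '';
  for (const b of new TextEncoder().encode(id)) {
    const ch = String.fromCharCode(b);
    out += /[A-Za-z0-9_.-]/.test(ch)
      ? ch
      : '.' + b.toString(16).toUpperCase().padStart(2, '0');
  }
  return out;
}

// Return the new-style id the hash refers to, or null if nothing matches.
function resolveLegacyHash(hash, newStyleIds) {
  const fragment = hash.replace(/^#/, '');
  // Section found directly: nothing to rewrite.
  if (newStyleIds.includes(fragment)) {
    return fragment;
  }
  // Otherwise compare against the old-style encoding of every section id.
  for (const id of newStyleIds) {
    if (legacyEscapeId(id) === fragment) {
      return id;
    }
  }
  return null;
}

// In a browser, the result might be applied like this (assumed selector):
//   const ids = Array.from(
//     document.querySelectorAll('h1[id],h2[id],h3[id],h4[id],h5[id],h6[id]'),
//     (el) => el.id);
//   const target = resolveLegacyHash(location.hash, ids);
//   if (target !== null && '#' + target !== location.hash) {
//     location.replace('#' + target);
//   }
```

Rewriting the hash, rather than only scrolling to the section, is what surfaces the new-style id to anyone who later copies the URL.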
Automatic migration of internal references
Idea: Recognize the old-style escape pattern (/\.[0-9A-F]{2}/), rewrite each matching full stop to %, then percent-decode the result.
Issues:
- Chance of false positives and conversion failures.
- Only helps with internal links.
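A minimal sketch of this rewrite, which also shows both failure modes listed above. The function name `decodeLegacyId` is hypothetical, and returning `null` on a decoding failure is an assumption about how a migration script might flag links for manual review.

```javascript
// Matches the old-style escape pattern: a full stop plus two uppercase hex digits.
const LEGACY_ESCAPE = /\.([0-9A-F]{2})/g;

// Convert an old-style section id to a new-style one.
// Returns null when the rewritten string is not valid percent-encoded UTF-8.
function decodeLegacyId(oldId) {
  const percentEncoded = oldId.replace(LEGACY_ESCAPE, '%$1');
  try {
    return decodeURIComponent(percentEncoded);
  } catch (e) {
    // Conversion failure, e.g. a lone ".E3" is not a complete UTF-8 sequence.
    return null;
  }
}

console.log(decodeLegacyId('can.27t'));   // can't
console.log(decodeLegacyId('Caf.C3.A9')); // Café
console.log(decodeLegacyId('v1.2A'));     // v1*  (false positive: ".2A" was literal text)
console.log(decodeLegacyId('x.E3'));      // null (conversion failure)
```

The `v1.2A` case is the kind of false positive to expect: a heading that legitimately contains a full stop followed by two hex digits cannot be distinguished from an escaped byte.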
Emit both old-style and new-style section ids for a while
Pros:
- Keeps links working during the transition period.
Cons:
- Complicates the HTML.
- Does not encourage migration or surface the correct section id.
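To make the HTML cost concrete, here is one way the dual-id output might look. This is only a sketch: the helper names are hypothetical, and carrying the old-style id on an empty `<span>` inside the heading is an assumed markup choice, not something this proposal specifies.

```javascript
// Escape text for use in HTML content or a double-quoted attribute.
function escapeHtml(s) {
  return s
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Render a heading that answers to both the new-style and the old-style id.
// Assumption: the old id lives on an empty <span> inside the heading.
function renderHeading(text, newId, oldId) {
  const fallback = oldId !== newId
    ? `<span id="${escapeHtml(oldId)}"></span>`
    : '';
  return `<h2 id="${escapeHtml(newId)}">${fallback}${escapeHtml(text)}</h2>`;
}

console.log(renderHeading('Café', 'Café', 'Caf.C3.A9'));
// <h2 id="Café"><span id="Caf.C3.A9"></span>Café</h2>
```

Headings whose two ids coincide need no extra element, so the added markup is limited to headings that actually contained escaped characters.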