Digging for performance gold: finding hidden performance wins
Let’s talk about 1%. In practice, 1% is quite large. The core metric we use is “jank”: a noticeable delay between when the user gives input and when the software reacts to it. Chrome samples for jank every 30 seconds, so jank in 1% of samples for a given user means jank once every 50 minutes (100 samples × 30 seconds). To that user, Chrome feels slow at those moments. Now the problem: can we find and fix the root causes of all the ways Chrome can be momentarily slow for our users?
First, define a scenario. For this work, we focus on user-visible jank, which we measure in the field to systematically identify moments where Chrome feels slow.
Second, gather high actionability bug reports in the field. For this we rely on Chrome’s BackgroundTracing infrastructure to generate what we call Slow Reports. A subset of Canary users who have opted in to sharing anonymized metrics have circular-buffer tracing enabled to examine specific scenarios. If a preconfigured threshold on a metric of interest is hit, the trace buffer is captured, anonymized, and uploaded to Google servers.
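To make that concrete, here is a minimal sketch of the flight-recorder idea; the names and threshold below are hypothetical stand-ins, not Chrome’s actual BackgroundTracing API. Events are continuously recorded into a fixed-size ring buffer, and crossing a watched threshold snapshots the buffer for anonymization and upload.

#include <cstdint>
#include <deque>
#include <string>
#include <utility>
#include <vector>

// Illustrative sketch of circular-buffer ("flight recorder") tracing with a
// threshold trigger. All names here are hypothetical.
struct TraceEvent {
  int64_t timestamp_us;
  std::string name;
};

class RingBufferTracer {
 public:
  explicit RingBufferTracer(size_t capacity) : capacity_(capacity) {}

  // Called continuously; old events fall off the front, keeping memory bounded.
  void Record(TraceEvent event) {
    if (events_.size() == capacity_) events_.pop_front();
    events_.push_back(std::move(event));
  }

  // Called when a watched metric crosses its preconfigured threshold:
  // snapshot the recent past for anonymization and upload.
  std::vector<TraceEvent> SnapshotForUpload() const {
    return {events_.begin(), events_.end()};
  }

 private:
  size_t capacity_;
  std::deque<TraceEvent> events_;
};

void OnLatencySample(RingBufferTracer& tracer, int64_t latency_ms) {
  constexpr int64_t kJankThresholdMs = 100;  // hypothetical threshold
  if (latency_ms >= kJankThresholdMs) {
    std::vector<TraceEvent> slow_report = tracer.SnapshotForUpload();
    // ... anonymize and upload `slow_report` ...
  }
}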
Such a bug report might look like this: a trace whose buffer captures a long-running task under AutocompleteController, stalling the browser’s UI thread.
We have a culprit! Let’s optimize AutocompleteController? No! We don’t know why yet: keep assuming ignorance!
By augmenting BackgroundTracing with stack sampling, we were able to find a recurring stack under stalled AutoComplete events:
RegEnumValueWStub
base::win::RegistryValueIterator::Read()
gfx::`anonymous namespace'::CachedFontLinkSettings::GetLinkedFonts()
gfx::internal::LinkedFontsIterator::GetLinkedFonts()
gfx::internal::LinkedFontsIterator::NextFont(gfx::Font *)
gfx::GetFallbackFonts(gfx::Font const &)
gfx::RenderTextHarfBuzz::ShapeRuns(...)
gfx::RenderTextHarfBuzz::ItemizeAndShapeText(...)
gfx::RenderTextHarfBuzz::EnsureLayoutRunList()
gfx::RenderTextHarfBuzz::EnsureLayout()
gfx::RenderTextHarfBuzz::GetStringSizeF()
gfx::RenderTextHarfBuzz::GetStringSize()
OmniboxTextView::CalculatePreferredSize()
OmniboxTextView::ReapplyStyling()
OmniboxTextView::SetText(...)
OmniboxResultView::Invalidate()
OmniboxResultView::SetMatch(AutocompleteMatch const &)
OmniboxPopupContentsView::UpdatePopupAppearance()
OmniboxPopupModel::OnResultChanged()
OmniboxEditModel::OnCurrentMatchChanged()
OmniboxController::OnResultChanged(bool)
AutocompleteController::UpdateResult(bool,bool)
AutocompleteController::Start(AutocompleteInput const &)
(...)
And before we figure that out, how do we know this is the #1 root cause of our overall long-tail performance issue? We’ve only looked at one trace so far after all...
Slow Reports tell us what the problem is for a specific user but not how many users are affected. And while we can query our corpus of Slow Report traces, it comes with inherent biases that make it impossible to correlate 1:1 with metrics. For instance, because Chrome only reports the first instance of bad performance per-session and only for users of the Canary/Dev channel, there’s both a startup and a population bias.
This is the measurement conundrum. The more actionability (data) a tool provides, the fewer scenarios it captures and the more bias it incurs. Depth vs. breadth.
Tools that attempt to do both sit somewhere in the middle, where they use aggregation over a large dataset and risk showing aggregate results based on flawed input (e.g. circular buffer tracing having dropped the interesting portion and contributing to a biased aggregate).
Thus we scientifically opted for the least engineering-minded option: opening a bunch of Slow Report traces manually. This gave us the most actionability on a top-level issue we had already quantified.
After opening dozens of traces, it turned out that the great majority showed variations of the aforementioned fonts issue. While this didn’t give us a precise number of users affected, it was enough to convince us it was the main cause of the user pain seen in the metrics.
When text is rendered, it is first itemized into runs. If a run contains characters from a Unicode block that can’t be rendered by the given font, GetFallbackFont() is used to request the system-recommended fallback font for it. If that fails, GetFallbackFonts() is invoked to try all the linked fonts and determine the one that can best render the run; that second fallback is much slower.
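A simplified sketch of that two-tier fallback; the types and helper bodies below are illustrative stand-ins, not Chrome’s exact signatures.

#include <optional>
#include <vector>

struct Font { const char* name = "Segoe UI"; };
struct TextRun {};  // a run: a substring sharing one script/Unicode block

bool CanRender(const Font&, const TextRun&) { return false; }  // stub
// Tier 1 (fast): ask the system for its single recommended fallback.
std::optional<Font> GetFallbackFont(const Font&, const TextRun&) {
  return std::nullopt;  // stub: the failure case discussed below
}
// Tier 2 (slow): enumerate every linked font, registry reads included.
std::vector<Font> GetFallbackFonts(const Font&) {
  return {Font{"Arial"}, Font{"Tahoma"}};  // stub
}

Font ResolveFontForRun(const Font& base, const TextRun& run) {
  if (CanRender(base, run)) return base;

  // Fast path: should almost always succeed...
  if (std::optional<Font> fallback = GetFallbackFont(base, run))
    return *fallback;

  // ...slow path, only reached when the fast path fails. Iterating all
  // linked fonts is what showed up under the AutocompleteController jank.
  for (const Font& candidate : GetFallbackFonts(base)) {
    if (CanRender(candidate, run)) return candidate;
  }
  return base;  // nothing renders it: fall back to missing-glyph boxes
}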
GetFallbackFont() should never fail, but in practice it’s not that simple. The reliable way to pick a fallback font on Windows is to query DirectWrite; however, DirectWrite was only added in Windows 7, at a time when Chrome still supported Windows XP. The GetFallbackFont() logic was therefore forced to stick to a less reliable heuristic built on Uniscribe+GDI in order to work on both versions of the OS. Since things worked most of the time, no one noticed that this could have been cleaned up when Chrome later dropped support for Windows XP. With new tooling to investigate long-tail performance, this unnecessary invocation of GetFallbackFonts() turned out to be the number one cause of jank.
We fixed that, reducing the number of calls to GetFallbackFonts() by 4x.
Still not zero though, and we were still seeing instances of the aforementioned Autocomplete issue in our Slow Reports. Keep digging. DirectWrite’s GetFallbackFont() failing was unexpected, but since Slow Reports are anonymized, no user-generated strings can be uploaded, which made it tricky to find which code points were problematic. We teamed up with our privacy experts to instrument the Unicode block and script of text going through HarfBuzz, ensuring no leakage of Personally Identifiable Information.
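To illustrate the idea (a sketch, not Chrome’s actual instrumentation), ICU can map each code point to its coarse Unicode block and script, so only category histograms ever leave the machine, never the text itself:

#include <unicode/uchar.h>    // ublock_getCode()
#include <unicode/uscript.h>  // uscript_getScript()
#include <map>

// Aggregate only coarse categories of the text being shaped; no
// user-generated string or individual code point is ever reported.
struct TextMetrics {
  std::map<UBlockCode, int> block_histogram;
  std::map<UScriptCode, int> script_histogram;
};

void RecordCodePoint(TextMetrics& metrics, UChar32 code_point) {
  metrics.block_histogram[ublock_getCode(code_point)]++;

  UErrorCode status = U_ZERO_ERROR;
  UScriptCode script = uscript_getScript(code_point, &status);
  if (U_SUCCESS(status)) metrics.script_histogram[script]++;
}

Histograms like these are the kind of aggregate that can safely ride along with a Slow Report, e.g. “runs hitting the slow path were dominated by such-and-such block”.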
But wait, doesn’t Chrome support modern Unicode?! Indeed it does, in Blink, which renders the web content. But the browser-side logic was never updated to support modern emoji (with modifiers) because the browser UI didn’t use to draw emoji at all. It’s only when the browser UI (tab strip, bookmark bar, omnibox, etc.) was modernized to support Unicode circa 2018 that the legacy segmentation logic became an (invisible) problem.
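To show what correct segmentation looks like (a sketch assuming a recent ICU whose UAX #29 rules know about emoji ZWJ sequences), a grapheme-cluster break iterator keeps a joined emoji together where a legacy per-code-point splitter would cut it apart:

#include <unicode/brkiter.h>
#include <unicode/unistr.h>
#include <iostream>
#include <memory>

int main() {
  // U+1F3F3 U+FE0F U+200D U+1F308 (the rainbow-flag ZWJ sequence),
  // followed by a plain "A": 5 code points, but only 2 graphemes.
  icu::UnicodeString text(u"\U0001F3F3\uFE0F\u200D\U0001F308A");

  UErrorCode status = U_ZERO_ERROR;
  std::unique_ptr<icu::BreakIterator> iter(
      icu::BreakIterator::createCharacterInstance(icu::Locale::getRoot(),
                                                  status));
  if (U_FAILURE(status)) return 1;

  iter->setText(text);
  int clusters = 0;
  for (int32_t end = iter->next(); end != icu::BreakIterator::DONE;
       end = iter->next()) {
    ++clusters;  // one iteration per grapheme-cluster boundary
  }
  std::cout << clusters << " grapheme clusters\n";  // prints "2"
  return 0;
}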
On top of that, the caching logic did not cache on error, so trying to render a modifier on its own caused massive jank, every time, for users with a lot of fonts installed. Ironically, this cache had been added to amortize the cost of this misunderstood bottleneck when Unicode support was first added to browser UI. Diving deeper into the underlying implementation of our fonts logic, rather than stopping at the layer of the fonts APIs, not only fixed a major performance issue but also resulted in a correctness fix for other emoji. For instance, π³️π is the zero-width-joiner sequence U+1F3F3 (π³️) + U+FE0F + U+200D + U+1F308 (π); before the itemization fix, browser UI would incorrectly render this grapheme as two separate glyphs, π³️π.
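A minimal sketch of the caching fix (hypothetical names, not the actual gfx code): memoize failures as well as successes, so an unrenderable code point pays for the linked-fonts walk at most once.

#include <optional>
#include <unordered_map>

struct Font { int id = 0; };

// Stand-in for the expensive GetFallbackFonts() walk over linked fonts.
std::optional<Font> SlowLinkedFontsLookup(char32_t) { return std::nullopt; }

class FallbackFontCache {
 public:
  std::optional<Font> Get(char32_t code_point) {
    auto it = cache_.find(code_point);
    if (it != cache_.end()) return it->second;  // hit: success *or* failure

    std::optional<Font> result = SlowLinkedFontsLookup(code_point);
    // Store the result unconditionally: a cached nullopt means "we already
    // know no linked font renders this", so the jank is paid only once.
    cache_.emplace(code_point, result);
    return result;
  }

 private:
  std::unordered_map<char32_t, std::optional<Font>> cache_;
};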
Using this approach, we have reduced user-visible jank by 10x over the last 2.5 years and improved the long-tail performance of many features caught in the crossfire.
Data source for all statistics: Real-world data anonymously aggregated from Chrome clients.