Difference between revisions of "Release Management/Nightly Respin"

From MozillaWiki
m (Add a process card)
 
(10 intermediate revisions by 4 users not shown)
Line 1: Line 1:
{{DISPLAYTITLE:The Firefox Nightly respin process}}
+
{{DISPLAYTITLE:Nightly Usability}}
  
{{Processbox
+
<p style="font-size: larger; font-weight:bold;">This page details the process to back out a faulty patch and, if needed, get nightlies respun.</p>
  | Process name = Nightly respin
 
  | Purpose      = Update our users broken nightlies
 
  | Why          = A patch on mozilla-central broke Nightly badly (crash, broken UI…)
 
  | Goals        = Nightly population retention
 
  | People      = Relman, releng
 
}}
 
  
<p style="font-size: larger; font-weight:bold;">This page details the process to back out a faulty patch and, if needed, get nightlies respun.</p>
+
Maintaining Firefox Nightly usability as the user’s main browser is a responsibility shared by everyone contributing to Firefox development. Our goal is to keep the Firefox Nightly channel as stable and usable as possible for our community. If the browser crashes or faces severe usability issues, Nightly users may leave, reducing critical feedback during development. Nightly feedback comes through crash reports, bug reports, and various in-product metrics. Getting new users on the nightly channel is hard. Maintaining a sizable active population on this channel is more valuable than avoiding disruptions such as backing out a patch or pausing updates for a short period.
 +
When a severe regression is introduced in Firefox Nightly, the goal is to address the issue as quickly as possible. Depending on the scenario, this may require pausing Nightly updates, backing out the regressing patch, and creating new Nightly builds before resuming updates. The Release Management team ensures the balance between allowing expected bugs on this channel and maintaining the highest possible quality to retain a user base.
  
  
 
== Summarized process ==
 
== Summarized process ==
 +
# Identify the regression (e.g., crash spike, core functionality issue).
 +
# File/track the bug in Bugzilla, ensuring all necessary flags are set.
 +
# Pause updates if the regression impacts a large number of users.
 +
# Request a backout via the Sheriffs in Matrix.
 +
# Request new builds via the Sheriffs, if necessary.
 +
# Communicate with the community about the bug and its resolution.
  
# File a bug with as much details possible about the regression / crash if a bug hasn't been filed yet
+
==How to track in Bugzilla==
# Ask sheriffs to stop automatic nightly updates because of the bug filed in step 1
+
* If it is a crash spike, then file the bug via Socorro.
# Warn our users about the regression via our Twitter account and #nightly IRC channel, give the bug number.
+
** If the bug was already reported then ensure the bug has the correct crash signature.
# Investigate to find the faulty patch via mozregression or stack traces for crashes
+
* If the bug was already filed by a community member, then use it to track the regression. Add the nightly-community keyword if missing.
# Ask sheriffs for the back out of the patch and nightly respun, give the bug number as reference
+
* If it is a functional regression and no bug has been created yet, then create a bug.
# Mark the bug as blocking the bug referenced for the faulty patch
 
# Ask the patch author to investigate the regression (NeedInfo in Bugzilla)
 
# When updates are back announce that the fix is served on Twitter and IRC
 
  
== When should we back out a patch? ==
+
The bug will be used to track a resolution for the regression. Ensure the following are set on the bug, where N is the version number for Nightly:
 +
* ''status-firefoxN'' tracking flag is set as affected
 +
* ''tracking-firefoxN'' set as blocking
 +
* Target milestone set to ''mozillaN''
  
We want to back out a patch when a significant regression is identified. This is usually either a functional regression (browser unusable, content rendering broken) or a sudden spike of crashes on the Nightly channel.
+
==When should we pause automatic background updates and backout a patch?==
  
We want the nightly channel to be as stable and usable by our community as possible. If the browser is crashing or barely usable, our users leave and we need to make we have a sizable nightly community. Deciding on backing out a patch, blocking automatic updates and rebuilding nightlies after the back out is a balance between the fact that bugs are expected on this channel and the fact that we want this channel to be of the highest quality possible to have a user base. The Release Management team are in charge of finding this balance.
+
If you expect that many users will be impacted by a regression, then we should pause background updates.
  
== How to find the patch to back out ==
+
We should stop background updates and back out in the following scenarios:
 +
* New crash spike
 +
* Regression in core functionality without an intuitive workaround, for example:
 +
** Usability issue on a Top site
 +
** Awesome bar usability
 +
** Bookmarks are inaccessible or cannot be created
 +
** History is inaccessible or cannot be created/deleted
 +
** Possible data loss
 +
** Firefox Sync is not sending or receiving data
  
If it is a functional regression (reproducible case), then we should use [https://mozilla.github.io/mozregression/quickstart.html mozregression]. If it is a spike in crashes not necessarily reproducible (random crashes while surfing), then our crash analysis experts in the Release Management team should be contacted. The analysis of the stack trace combined with hg logs on mozilla-central often allow finding the bug number that introduced the instability.
+
There can be scenarios where the regression in core functionality is only experienced by users who have enabled an experimental feature via Firefox Labs. Though the workaround for the user is to disable the experimental feature, this may not be intuitive for users. We will still aim to backout the regressing patch.  
  
== Bug filing ==
+
==How to pause automatic background updates==
  
* If the bug was already filed by a community member, then use it to track the regression and qualify it. Add the nightly-community keyword if missing.
+
===Desktop===
* If it is a crash, get a Crash ID from the people that reported it and file a bug via Socorro.
+
Blocking automatic updates will not prevent new users from installing Firefox Nightly from [https://www.mozilla.org/firefox/channel/desktop/ mozilla.org] but it will mitigate the impact on our existing user base. Also, blocking automatic updates will not prevent users from checking manually for updates.
* If it is a functional regression and no bug was filed yet, file it.
 
  
Have the ''status-firefoxN'' tracking flag set as ''affected'', the ''tracking-firefoxN'' set as ''blocking'' and the target milestone set to ''mozillaN'' where N is the version number for Nightly.
+
Currently, Release Management cannot pause automatic updates only for specific platforms.
  
Once the back out is done, mark the bug as FIXED and change the ''status-firefoxN'' tracking flag from ''affected'' to ''fixed''.
+
Release management can pause Firefox Desktop Nightly automatic updates as follows:
 +
# View the [https://balrog.services.mozilla.com/rules?product=Firefox&channel=nightly Firefox: nightly] rule in Balrog
 +
# Select Disable Updates from the options menu
 +
# Add the regression bug number as the reason comment
  
The bug number will be used to track the work to fix the regression. Communicate to our community that a bug exists so as to avoid having many duplicate bugs filed.
+
When updates are paused the [https://whattrainisitnow.com/release/?version=nightly Nightly] page on [https://whattrainisitnow.com whattrainisitnow.com] will display a message and include a link to the regression bug.
  
== Stopping automatic background updates ==
+
===Mobile===
 +
Pausing Firefox Nightly updates on Mobile is not as flexible as Firefox Desktop due to app store restrictions.
  
If you think that a lot of people are going to be impacted by a regression, ask sheriffs to stop automatic updates. You can contact the on-call sheriff in the #sheriffs IRC channel. In case nobody is available, release engineering can do it (#releng channel).
+
Release Management manually submits Firefox Nightly mobile builds for app review.
 +
* If the severe regression is in a build that has not yet been pushed for app review, then do not push it.
 +
* If the severe regression is in a build that was pushed but is still pending review, then cancel the submission.
 +
If the issue is specific to a device or OS version, the app can be limited or blocked from those devices in the Google Play Store via the device catalog.
  
Blocking automatic updates will not prevent new users to install Firefox Nightly from mozilla.org but it will mitigate greatly the impact on our existing user base.
+
===How to find the patch to back out===
 +
If it is a crash spike, then the stack trace analysis combined with the patches that landed in mozilla-central can help find the bug that introduced the instability.
 +
* If it’s unclear then reach out on [https://mozilla.enterprise.slack.com/archives/C4D3JFF26 #firefox] in Slack or [https://matrix.to/#/#stability:mozilla.org #stability] on Matrix.
 +
If it is a functional regression bug with a reproducible case, then we can use [https://mozilla.github.io/mozregression/quickstart.html mozregression] to identify the regressing bug.
  
We can ask for automatic updates to be stoped for a specific OS.
+
If it is a functional regression bug without a reproducible case but there is an approximate window of when the bug was introduced, then the possible regressor could be found by looking through the patches that recently landed in Mozilla-central.
 +
* If it’s unclear then reach out on [https://mozilla.enterprise.slack.com/archives/C4D3JFF26 #firefox] in Slack.
  
Most of the major regressions are reported immediately via our @FirefoxNightly Twitter account followers, usually when more than 2 people report a similar regression there is a high chance that it will be serious and stopping automatic updates should be done rapidly.
+
It could be identified that the regression is caused by an experiment rollout. Check if any experiments were recently enabled and work with the responsible engineering team to disable the experiment.
  
'''Note:''' Some members of the release management team have the technical knowledge and permissions to stop automatic updates.
+
===How to request a backout===
 +
The Sheriffs manage backing out patches from mozilla-central. You can contact sheriffs in the [https://matrix.to/#/#sheriffs:mozilla.org #sheriffs] channel on Matrix. Let them know the bug you want backed out and the reason for the backout. The Sheriffs will reopen the regressor with a comment that includes a link to the backout commit, the reason for the backout, and a needinfo on the assignee.
  
== Asking for a back out of the patch  and new nightlies ==
+
Once the backout is done, you can resolve the regression bug as FIXED and change the ''status-firefoxN'' tracking flag from ''affected'' to ''fixed''.
  
You can contact sheriffs in the #sheriffs IRC channel to back out the patch that caused the regression when you have identified it. The back out commit will reference the bug number.
+
''Note'': Release managers manage backing out patches from mozilla-beta and mozilla-release.
  
If the next nightly is about to be built and the impact is moderate, it is not necessarily needed to ask for nightly builds to be respun after the back out.
+
===How to request new Nightly builds===
 +
The Sheriffs can trigger new Nightly builds. You can contact sheriffs in the [https://matrix.to/#/#sheriffs:mozilla.org #sheriffs] channel on Matrix.
  
'''Note:''' Some members of the release management team have the technical knowledge and permissions to back out patches.
+
New Nightly builds could be requested when requesting the Sheriffs to backout the regressing patches.
 +
* If the next nightly is about to build then it may be better to wait for the scheduled build.
 +
* If the regression is platform specific then you can request new nightlies for that platform only.
 +
  
 
== Communicating about the issue ==
 
== Communicating about the issue ==
Line 73: Line 97:
 
Communicating about major regressions in Nightly is also part of the informal social contract we have with our alpha testers, making sure they are informed of major technical issues impacting them helps keeping them engaged.
 
Communicating about major regressions in Nightly is also part of the informal social contract we have with our alpha testers, making sure they are informed of major technical issues impacting them helps keeping them engaged.
  
The main communication channels to communicate a regression are our @FirefoxNightly Twitter account, our #nightly IRC channel (bridged with our Telegram group) and potentially the #nightly-newbies slack channel if employees or NDAed volunteers reported it there.
+
The main communication channels to communicate a regression are our [https://twitter.com/firefoxnightly @FirefoxNightly] Twitter account, our [https://mozilla.social/@FirefoxNightly @FirefoxNightly@mozilla.social] Mastodon account and our [https://matrix.to/#/#nightly:mozilla.org #Nightly] chatroom on  Matrix/Element.
 +
 
 +
==Decision Matrix==
 +
{| class="wikitable"
 +
|-
 +
!  Scenario !!Disable Updates !! Backout Regressor !! Request New Nightlies !! Additional Info
 +
|-
 +
| '''New Top Crash''' || Yes || Yes || Yes ||
 +
|-
 +
| '''High severity''' bug impacting '''core functionality'''  || Yes || Yes || Yes ||
 +
|-
 +
| '''High severity''' bug only impacting '''experimental functionality''' exposed by the Firefox Labs UI  || No || No || No || Expected that engineering will prioritize a fix. If a fix is not expected within the next Nightly build window, then consider backing out the regressor.
 +
|-
 +
| '''Medium severity''' bug impacting '''core functionality''' || No || No || No || Expected that engineering will prioritize a fix.
 +
May consider backing out in beta if we don’t have a fix during Nightly.
 +
|-
 +
| '''Medium severity''' bug only impacting '''experimental functionality''' exposed by the Firefox Labs UI || No || No || No ||  Expected that engineering will prioritize a fix.
 +
|-
 +
| '''Low severity''' bug impacting '''core functionality''' || No || No || No || Expected that engineering will prioritize a fix.
 +
|-
 +
| '''Low severity''' bug only impacting '''experimental functionality''' exposed by the Firefox Labs UI || No || No || No || Expected that engineering will prioritize a fix.
 +
 
 +
 
 +
|}
  
 
[[Category:Release_Management]]
 
[[Category:Release_Management]]
 
[[Category:Release_Management:Processes|Nightly Respin]]
 
[[Category:Release_Management:Processes|Nightly Respin]]

Latest revision as of 16:19, 18 October 2024


This page details the process to back out a faulty patch and, if needed, get nightlies respun.

Keeping Firefox Nightly usable as the user’s main browser is a responsibility shared by everyone contributing to Firefox development. Our goal is to keep the Firefox Nightly channel as stable and usable as possible for our community. If the browser crashes or has severe usability issues, Nightly users may leave, which reduces critical feedback during development. Nightly feedback comes through crash reports, bug reports, and various in-product metrics. Getting new users on the Nightly channel is hard. Maintaining a sizable active population on this channel is more valuable than avoiding disruptions such as backing out a patch or pausing updates for a short period.

When a severe regression is introduced in Firefox Nightly, the goal is to address the issue as quickly as possible. Depending on the scenario, this may require pausing Nightly updates, backing out the regressing patch, and creating new Nightly builds before resuming updates. The Release Management team is responsible for balancing the bugs that are expected on this channel against the quality needed to retain a user base.


Summarized process

  1. Identify the regression (e.g., crash spike, core functionality issue).
  2. File/track the bug in Bugzilla, ensuring all necessary flags are set.
  3. Pause updates if the regression impacts a large number of users.
  4. Request a backout via the Sheriffs in Matrix.
  5. Request new builds via the Sheriffs, if necessary.
  6. Communicate with the community about the bug and its resolution.

How to track in Bugzilla

  • If it is a crash spike, then file the bug via Socorro.
    • If the bug was already reported then ensure the bug has the correct crash signature.
  • If the bug was already filed by a community member, then use it to track the regression. Add the nightly-community keyword if missing.
  • If it is a functional regression and no bug has been created yet, then create a bug.

The bug will be used to track a resolution for the regression. Ensure the following are set on the bug, where N is the version number for Nightly:

  • status-firefoxN tracking flag is set as affected
  • tracking-firefoxN set as blocking
  • Target milestone set to mozillaN
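
The flag checklist above can be sketched as a small helper. This is only an illustration: the function names are hypothetical, the dictionary keys reuse the flag names as written on this page rather than any Bugzilla API field names, and the version number in the usage note is an arbitrary example.

```python
def regression_flags(nightly_version: int) -> dict[str, str]:
    """Return the values this page asks for, where N is the Nightly version."""
    n = nightly_version
    return {
        f"status-firefox{n}": "affected",
        f"tracking-firefox{n}": "blocking",
        "target milestone": f"mozilla{n}",
    }


def missing_flags(bug_fields: dict[str, str], nightly_version: int) -> list[str]:
    """List the expected flags that are absent or set to another value.

    bug_fields is a hypothetical mapping of flag name to current value,
    filled in by hand or by whatever tooling you use to read the bug.
    """
    expected = regression_flags(nightly_version)
    return [name for name, value in expected.items()
            if bug_fields.get(name) != value]
```

For example, `missing_flags({"status-firefox133": "affected"}, 133)` reports that the tracking flag and the target milestone still need to be set.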

When should we pause automatic background updates and back out a patch?

If you expect that many users will be impacted by a regression, then we should pause background updates.

We should stop background updates and back out in the following scenarios:

  • New crash spike
  • Regression in core functionality without an intuitive workaround, for example:
    • Usability issue on a Top site
    • Awesome bar usability
    • Bookmarks are inaccessible or cannot be created
    • History is inaccessible or cannot be created/deleted
    • Possible data loss
    • Firefox Sync is not sending or receiving data

There can be scenarios where a regression in core functionality is only experienced by users who have enabled an experimental feature via Firefox Labs. Although the user can work around the problem by disabling the experimental feature, this may not be intuitive. We will still aim to back out the regressing patch.

How to pause automatic background updates

Desktop

Blocking automatic updates will not prevent new users from installing Firefox Nightly from mozilla.org but it will mitigate the impact on our existing user base. Also, blocking automatic updates will not prevent users from checking manually for updates.

Currently, Release Management cannot pause automatic updates for specific platforms only.

Release management can pause Firefox Desktop Nightly automatic updates as follows:

  1. View the Firefox: nightly rule in Balrog
  2. Select Disable Updates from the options menu
  3. Add the regression bug number as the reason comment

When updates are paused, the Nightly page on whattrainisitnow.com will display a message and include a link to the regression bug.

Mobile

Pausing Firefox Nightly updates on Mobile is not as flexible as Firefox Desktop due to app store restrictions.

Release Management manually submits Firefox Nightly mobile builds for app review.

  • If the severe regression is in a build that has not yet been pushed for app review, then do not push it.
  • If the severe regression is in a build that was pushed but is still pending review, then cancel the submission.

If the issue is specific to a device or OS version, the app can be limited or blocked from those devices in the Google Play Store via the device catalog.
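
The mobile guidance above can be summarized as a lookup from build state to action. This is a minimal sketch: the `BuildState` names and the `mobile_action` function are hypothetical, and the last branch only restates the device catalog option, since this page does not describe a way to pause a build that has already been published.

```python
from enum import Enum


class BuildState(Enum):
    NOT_SUBMITTED = "not yet pushed for app review"
    PENDING_REVIEW = "pushed, still pending review"
    PUBLISHED = "already published"  # assumption: not covered in detail above


def mobile_action(state: BuildState) -> str:
    """Map the state of the affected mobile Nightly build to the action above."""
    if state is BuildState.NOT_SUBMITTED:
        return "do not push the build"
    if state is BuildState.PENDING_REVIEW:
        return "cancel the submission"
    # Only applicable when the issue is specific to a device or OS version.
    return "limit or block affected devices via the device catalog"
```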

How to find the patch to back out

If it is a crash spike, then the stack trace analysis combined with the patches that landed in mozilla-central can help find the bug that introduced the instability.

If it is a functional regression bug with a reproducible case, then we can use mozregression to identify the regressing bug.
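
As an illustration of the reproducible case, the sketch below builds a mozregression command line from a last known good date and a first known bad date, using the tool's `--good` and `--bad` options. The helper name and the dates are hypothetical placeholders; substitute the dates observed for the actual regression.

```python
import shlex
from datetime import date


def mozregression_command(last_good: date, first_bad: date) -> list[str]:
    """Build the argument list for bisecting between two Nightly build dates."""
    if last_good >= first_bad:
        raise ValueError("the good build must be older than the bad build")
    return [
        "mozregression",
        "--good", last_good.isoformat(),
        "--bad", first_bad.isoformat(),
    ]


# Hypothetical dates, for illustration only.
cmd = mozregression_command(date(2024, 10, 10), date(2024, 10, 17))
print(shlex.join(cmd))  # mozregression --good 2024-10-10 --bad 2024-10-17
```

The resulting list can be passed to `subprocess.run(cmd)` on a machine where mozregression is installed.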

If it is a functional regression bug without a reproducible case but there is an approximate window of when the bug was introduced, then the possible regressor can often be found by looking through the patches that recently landed in mozilla-central.

  • If it’s unclear then reach out on #firefox in Slack.

The regression may turn out to be caused by an experiment rollout. Check whether any experiments were recently enabled and work with the responsible engineering team to disable the experiment.

How to request a backout

The Sheriffs manage backing out patches from mozilla-central. You can contact sheriffs in the #sheriffs channel on Matrix. Let them know the bug you want backed out and the reason for the backout. The Sheriffs will reopen the regressor with a comment that includes a link to the backout commit, the reason for the backout, and a needinfo on the assignee.

Once the backout is done, you can resolve the regression bug as FIXED and change the status-firefoxN tracking flag from affected to fixed.

Note: Release managers handle backing out patches from mozilla-beta and mozilla-release.

How to request new Nightly builds

The Sheriffs can trigger new Nightly builds. You can contact sheriffs in the #sheriffs channel on Matrix.

New Nightly builds can be requested at the same time as asking the Sheriffs to back out the regressing patches.

  • If the next nightly is about to be built, then it may be better to wait for the scheduled build.
  • If the regression is platform-specific, then you can request new nightlies for that platform only.


Communicating about the issue

We should promptly tell our community about the issue and reference the bug number, which minimizes the number of duplicate bugs. If the issue needs non-obvious steps to reproduce or a specific hardware/OS combination, having all communications centralized in a single bug helps.

We should also announce when the issue is resolved and urge people to update to the fixed Nightly, which reduces automatic crash reports and unneeded bug filings.

Communicating about major regressions in Nightly is also part of the informal social contract we have with our alpha testers. Making sure they are informed of major technical issues that affect them helps keep them engaged.

The main channels for communicating a regression are our @FirefoxNightly Twitter account, our @FirefoxNightly@mozilla.social Mastodon account and our #Nightly chatroom on Matrix/Element.

Decision Matrix

Scenario | Disable Updates | Backout Regressor | Request New Nightlies | Additional Info
New Top Crash | Yes | Yes | Yes |
High severity bug impacting core functionality | Yes | Yes | Yes |
High severity bug only impacting experimental functionality exposed by the Firefox Labs UI | No | No | No | Expected that engineering will prioritize a fix. If a fix is not expected within the next Nightly build window, then consider backing out the regressor.
Medium severity bug impacting core functionality | No | No | No | Expected that engineering will prioritize a fix. May consider backing out in beta if we don’t have a fix during Nightly.
Medium severity bug only impacting experimental functionality exposed by the Firefox Labs UI | No | No | No | Expected that engineering will prioritize a fix.
Low severity bug impacting core functionality | No | No | No | Expected that engineering will prioritize a fix.
Low severity bug only impacting experimental functionality exposed by the Firefox Labs UI | No | No | No | Expected that engineering will prioritize a fix.
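
The matrix above can be sketched as a small function. This is an illustration only: the `Actions` type, the `decide` function and the severity strings are hypothetical names, and the sketch captures just the Yes/No columns, not the judgment calls described under Additional Info.

```python
from typing import NamedTuple


class Actions(NamedTuple):
    disable_updates: bool
    backout_regressor: bool
    request_new_nightlies: bool


def decide(severity: str, core_functionality: bool,
           new_top_crash: bool = False) -> Actions:
    """Return the default actions from the decision matrix.

    severity is "high", "medium" or "low"; core_functionality is False
    when only experimental (Firefox Labs) functionality is affected.
    """
    if severity not in {"high", "medium", "low"}:
        raise ValueError(f"unknown severity: {severity!r}")
    # Only a new top crash or a high severity core regression gets all three.
    if new_top_crash or (severity == "high" and core_functionality):
        return Actions(True, True, True)
    # Every other row: engineering is expected to prioritize a fix.
    return Actions(False, False, False)
```

For example, `decide("high", core_functionality=False)` returns all three actions as `False`, matching the Firefox Labs row, while the Additional Info column may still lead to a backout if no fix is expected in time.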