Nothing Special   »   [go: up one dir, main page]

Closed Bug 1917242 Opened 2 months ago Closed 2 months ago

Chained MozPromise objects may destroy callbacks in wrong order

Categories

(Core :: XPCOM, defect, P2)

defect

Tracking

()

RESOLVED FIXED
132 Branch
Tracking Status
firefox-esr115 --- wontfix
firefox-esr128 --- fixed
firefox130 --- wontfix
firefox131 --- fixed
firefox132 --- fixed

People

(Reporter: janv, Assigned: janv)

References

(Regressed 1 open bug, Regression)

Details

(Keywords: regression)

Attachments

(2 files)

https://searchfox.org/mozilla-central/rev/cc01f11adfacca9cd44a75fd140d2fdd8f9a48d4/xpcom/threads/MozPromise.h#850-872

    void DoResolveOrRejectInternal(ResolveOrRejectValue& aValue) override {
      // Note: The usage of InvokeCallbackMethod here requires that
      // ResolveFunction/RejectFunction are capture-lambdas (i.e. anonymous
      // classes with ::operator()), since it allows us to share code more
      // easily. We could fix this if need be, though it's quite easy to work
      // around by just capturing something.
      if (aValue.IsResolve()) {
        InvokeCallbackMethod<SupportChaining>(
            mResolveFunction.ptr(), &ResolveFunction::operator(),
            MaybeMove(aValue.ResolveValue()), std::move(mCompletionPromise));
      } else {
        InvokeCallbackMethod<SupportChaining>(
            mRejectFunction.ptr(), &RejectFunction::operator(),
            MaybeMove(aValue.RejectValue()), std::move(mCompletionPromise));
      }

      // Destroy callbacks after invocation so that any references in closures
      // are released predictably on the dispatch thread. Otherwise, they would
      // be released on whatever thread last drops its reference to the
      // ThenValue, which may or may not be ok.
      mResolveFunction.reset();
      mRejectFunction.reset();
    }

The problem is that mResolveFunction and mRejectFunction are actually not destroyed right after invocation, they are destroyed after invocation and chaining completion promise https://searchfox.org/mozilla-central/rev/cc01f11adfacca9cd44a75fd140d2fdd8f9a48d4/xpcom/threads/MozPromise.h#662-672
If the current thread is suspended after calling ChainTo which can call Dispatch https://searchfox.org/mozilla-central/rev/cc01f11adfacca9cd44a75fd140d2fdd8f9a48d4/xpcom/threads/MozPromise.h#590
and if other threads processing next tasks are fast enough, callbacks may end up being destroyed in wrong order.

I'll submit a gtest which demonstrates the problem.

This is blocking bug 1917169 which blocks bug 1913679 and eventually bug 1671932.

We had a similar problem in bug 1892875 for which we added mozilla::dom::quota::RunAfterProcessingCurrentEvent in bug 1901745.

Investigation of this problem started when I was testing patches for OPFS:
https://treeherder.mozilla.org/logviewer?job_id=472923301&repo=try&lineNumber=1870

OPFS has two quite complex promise chains:
https://searchfox.org/mozilla-central/rev/cc01f11adfacca9cd44a75fd140d2fdd8f9a48d4/dom/fs/parent/datamodel/FileSystemDataManager.cpp#526-674

It turns out that Android is good at slowing down/suspending one thread while allowing other threads to proceed very fast.
So https://searchfox.org/mozilla-central/rev/cc01f11adfacca9cd44a75fd140d2fdd8f9a48d4/dom/fs/parent/datamodel/FileSystemDataManager.cpp#549 can be released only after all other captures are released. The last release of the manager then happens on the QM IO thread (but it should be done on the PBackground thread).

I think I found a changeset which regressed this https://hg.mozilla.org/mozilla-central/rev/3297397ead64

Keywords: regression
Regressed by: 1367679

Set release status flags based on info from the regressing bug 1367679

Pushed by jvarga@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/25f901c0c28e Add a gtest for expected callback destruction order; r=xpcom-reviewers,nika https://hg.mozilla.org/integration/autoland/rev/cede0737b371 Destroy callbacks before chaining completion promise; r=xpcom-reviewers,nika
Status: ASSIGNED → RESOLVED
Closed: 2 months ago
Resolution: --- → FIXED
Target Milestone: --- → 132 Branch

The patch landed in nightly and beta is affected.
:janv, is this bug important enough to require an uplift?

  • If yes, please nominate the patch for beta approval.
  • If no, please set status-firefox131 to wontfix.

For more information, please visit BugBot documentation.

Flags: needinfo?(jvarga)

Comment on attachment 9423202 [details]
Bug 1917242 - Destroy callbacks before chaining completion promise; r=#xpcom-reviewers

Beta/Release Uplift Approval Request

  • User impact if declined: Users might still experience "safe" main process crashes in some cases.
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): This fixes a regression and basically gets the code back to state before bug 1367679.
  • String changes made/needed: None
  • Is Android affected?: Yes
Flags: needinfo?(jvarga)
Attachment #9423202 - Flags: approval-mozilla-beta?

:janv does the gtest patch need to be uplifted too? Doesn't graft cleanly without it.

Flags: needinfo?(jvarga)

(In reply to Dianna Smith [:diannaS] from comment #9)

:janv does the gtest patch need to be uplifted too? Doesn't graft cleanly without it.

The gtest is not essential for the uplift, but let me quickly take a look why it doesn't graft cleanly.

Flags: needinfo?(jvarga)

(In reply to Jan Varga [:janv] from comment #10)

(In reply to Dianna Smith [:diannaS] from comment #9)

:janv does the gtest patch need to be uplifted too? Doesn't graft cleanly without it.

The gtest is not essential for the uplift, but let me quickly take a look why it doesn't graft cleanly.

Sorry, I forgot to remove it from a bigger patch stack, the two patches are now in a separate patch stack and they graft cleanly to current beta.

Unfortunately, I would still get an error because I'm not landing the patch stack. Im grafting the one change and the changes in TestMoxPromise.cpp include changes to the "TEST(MozPromise, ObjectDestructionOrder)" which doesnt exist in beta because it was added by the above gtest.

I can ignore the changes to that file during the conflict resolution if you would like?

Flags: needinfo?(jvarga)

Well, either land both patches or land just the second patch (D221321) and ignore this part:

diff --git a/xpcom/tests/gtest/TestMozPromise.cpp b/xpcom/tests/gtest/TestMozPromise.cpp
--- a/xpcom/tests/gtest/TestMozPromise.cpp
+++ b/xpcom/tests/gtest/TestMozPromise.cpp
@@ -810,17 +810,17 @@ TEST(MozPromise, MapErr)
       });
 
   NS_ProcessPendingEvents(nullptr);
 
   EXPECT_EQ(result, 2.0);
   EXPECT_EQ(ran_ok, false);
 }
 
-TEST(MozPromise, DISABLED_ObjectDestructionOrder)
+TEST(MozPromise, ObjectDestructionOrder)
 {
   AutoTaskQueue atq;
   RefPtr<TaskQueue> queue = atq.Queue();
 
   nsTArray<size_t> list;
 
   bool done = false;
 
Flags: needinfo?(jvarga)

Thanks, and sorry for the trouble.

Comment on attachment 9423202 [details]
Bug 1917242 - Destroy callbacks before chaining completion promise; r=#xpcom-reviewers

Approved for 131.0b8

Attachment #9423202 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Should we uplift this to ESR128 also? Please nominate if so.

Flags: needinfo?(jvarga)

Yeah, it's quite nasty bug, but it seems it doesn't cause problems very often, but who knows.
I'll nominate it for ESR128 ASAP.

Comment on attachment 9423202 [details]
Bug 1917242 - Destroy callbacks before chaining completion promise; r=#xpcom-reviewers

ESR Uplift Approval Request

  • If this is not a sec:{high,crit} bug, please state case for ESR consideration: The fix was originally targeted for a specific component (bug 1917169), but since MozPromise is a low-level helper class that's widely used across various components, they too can benefit from the fix.
  • User impact if declined: Users might still experience "safe" main process crashes in some cases.
  • Fix Landed on Version: 132 and 131
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): This fixes a regression and basically gets the code back to state before bug 1367679.
Flags: needinfo?(jvarga)
Attachment #9423202 - Flags: approval-mozilla-esr128?

Comment on attachment 9423200 [details]
Bug 1917242 - Add a gtest for expected callback destruction order; r=#xpcom-reviewers

Approved for 128.4esr.

Attachment #9423200 - Flags: approval-mozilla-esr128+
Attachment #9423202 - Flags: approval-mozilla-esr128? → approval-mozilla-esr128+
Regressed by: 1929184
No longer regressed by: 1929184
Regressions: 1929184
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: