Nothing Special   »   [go: up one dir, main page]

Page MenuHomePhabricator

Stop fragmenting ParserCache entries for mobile frontend
Closed, ResolvedPublic5 Estimated Story Points

Description

30% of ParserCache is just fragmentation of mobile frontend due to using "responsiveimages" in PC key. I would have been okay with fragmentation if that was intentional but it seems it's not (MF first reparsers the page and then applies a lot of transformations).

I hope turning responsiveimages into a transformation rather than PC split would be rather easy because it would improve response time of mobile (by reusing desktop cache) and drastically reduces size of parsercache.

Cache/DB QA

To be done by @Ladsgroup

  • Testing that fragmentation stops.

Performance QA

To be done by @nray, @Ladsgroup :

Consult with Performance team for any consistent and large spikes

Functional QA steps

To be done by @Edtadros

Clear the parser cache on both desktop and mobile sites by:

  1. Visiting https://en.wikipedia.beta.wmflabs.org/wiki/Dog?action=purge on desktop device and clicking the "Yes" button
  2. Visiting https://en.m.wikipedia.beta.wmflabs.org/wiki/Dog?action=purge on desktop device and clicking the "Yes" button

Using a mobile device:

  1. Visit https://en.m.wikipedia.beta.wmflabs.org/wiki/Dog
  2. Open each section and scroll down to the bottom of the page
  3. Verify that the images load correctly, and there are no visual regressions with the infobox
  4. Click on one of the thumbnails
  5. Verify that the image loads the lightbox correctly

@Edtadros will get in touch with @EAkinloose or @Ryasmeen for QA of DiscussionTools

Using a desktop device:

  1. Visit https://en.wikipedia.beta.wmflabs.org/wiki/Dog
  2. Scroll down the bottom of the page
  3. Verify that the images load correctly, and there are no visual regressions with the infobox
  4. Click on one of the thumbnails
  5. Verify that the image loads the lightbox correctly

Sign off steps

QA Results - Beta

QA Results - Prod

Related Objects

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

@Ladsgroup flagged this as a blocker to T320534: Put Parsoid output into the ParserCache on every edit, which I was hoping to close by the end of the month. @NHillard-WMF what do you think how long it would take your team to fix this?

Re: test cases, DiscussionTools does its own transforms on talk pages on mobile web, so Editing has some interest in keeping an eye on changes that affect this.

Hi @daniel -
Thanks for weighing in. I'm realizing I could have been a bit more clear in my original post as to specific timing, sequencing, and responsibility. I can look to clarify that for now:

Our current situation is that @Ladsgroup has proposed a patch, but the Web Team needs to do QTE on our side to check on potential functional fallout.

Specifically I'm proposing the following next steps:

  1. @Ladsgroup, can you write out some cache/db-related test cases for user-facing fallout that we need to be sure to test for as we evaluate this change?
  2. We'll be writing out our own functional tests cases over at https://phabricator.wikimedia.org/T328802 . This work will start next sprint, which begins next week.
  3. Once (2) is ready, we can comment back here that this fix is ready to merge - depending on timing this may be on beta in 1-2 weeks.
  4. We can then assign this ticket to @Edtadros for verification in beta ; @DLynch, for your part, maybe it makes sense for Editing spin out a separate ticket to handle validation on your side?

Notably though, if any of these steps are missed along the way, and/or if we identify functional regressions that we need to address with followup fixes, it could push out this timeline.

In summary though, if all of the above is able to happen, and if there are not known regressions, it seems we could have this ready to go by end of month, though given the need for coordination and the potential for necessary followup fixes it may push us out beyond this. We can keep everyone up to date here as this all evolves.

@NHillard-WMF thank you for the detailed response!

Hi,
Thanks for the notes. Detailed indeed.

I'm not 100% sure on the testing here. We have three aspects here:

  • Testing that fragmentation stops. This is quite internal to our systems but I can check and make sure this would be indeed the cases.
  • Testing that no user regression happens. After this change, the mobile frontend basically starts reusing the same ParserCache entry deskop builds and uses. This might have some inadvertent effects on the mobile frontend output (or desktop outputs if it starts to pollute the parser cache). This is quite rare but not impossible.
    • Testing this is quite easy, get the patch merged, go to a beta wiki, purge a page in both mobile and desktop and make sure the mobile transformations are only visible in mobile and not desktop (e.g. "page issues" or I think infoboxes are similar too, basically anything inside #bodyContent div.
    • There are already unit tests that make sure at least parts of this happen? e.g. transformations happen in mobile.
  • Testing the impact of getting rid of stripping responsive images in mobile.
    • This won't have any regression, just might be a small performance hit, it's hard to measure or test unless we go to production but I doubt anything major could happen. Images all are lazy loaded in browsers these days (AFAIK).

Also @nray will be web team point person going forward for anything relating to this change.

NBaca-WMF changed the task status from Open to In Progress.Feb 9 2023, 6:29 PM
NBaca-WMF reassigned this task from Ladsgroup to nray.
NBaca-WMF set the point value for this task to 5.

Hi all -
An update -- first, procedurally:

  1. We're going to close out https://phabricator.wikimedia.org/T328802
  2. Move this task to our board (https://phabricator.wikimedia.org/project/view/6366/), set priority and effort estimation to match that ticket, and assign to @nray to add functional regression test steps as had originally been in scope for that other ticket
  3. @Jdlrobson will add more parameters here about what is in and out of scope for these test cases.
  4. Once test cases have been written out, we will pass this back to @Ladsgroup to merge
  5. Once merged, we'll test and post test results back here.

Second, addressing prioritization angle (cc'ing @daniel and @DLynch) :
We talked through several issues just now with this ticket as a team:
a) When we estimated, we touched on the fact that this is in a part of the code (specifically MobileFrontend) that the team as a whole is generally less familiar with, and further, we haven't made changes in this area for some time. We consequently want to approach changes in this area very cautiously. Thus the 5 story points, which for us indicates roughly a week of development effort.
b) We are over capacity for our upcoming sprint, starting next week. We have not yet done sprint planning, but one likely possibility is that we would need to push this out to a following sprint, either the one running from Feb 27 - Mar 10 or afterward.

I realize you all have dependent work around this change, but for the reasons above, we figured this made sense to push this out in an abundance of caution. Please reach out if you'd like to talk further about this.

Re: test cases, DiscussionTools does its own transforms on talk pages on mobile web, so Editing has some interest in keeping an eye on changes that affect this.

@DLynch Thank you for bringing this up. Can you elaborate on how we could verify that this behavior is still working correctly? For example, what QA steps should we run through on beta?

Change 890127 had a related patch set uploaded (by Nray; author: Nray):

[mediawiki/extensions/MobileFrontend@master] Don't add srcset attribute to lazy loaded image element in data saver mode

https://gerrit.wikimedia.org/r/890127

FYI I analyzed the payload effects this change has on the top 100 most popular pages in January in T293303#8630702

Change 890509 had a related patch set uploaded (by Nray; author: Nray):

[operations/mediawiki-config@master] Add static "Cleopatra" page to facilitate synthetic testing of 885362

https://gerrit.wikimedia.org/r/890509

@nray you need to check a talk page on a wiki with DiscussionTools enabled for mobile (currently the default, and is turned on everywhere but enwiki) -- e.g. https://fr.m.wikipedia.org/wiki/Discussion:C%C3%B4te_d%27Ivoire. There should be:

  • metadata added to the top of the content about the last comment
  • collapsed-by-default sections that have been marked up with discussion metadata and subscription links
  • within those sections there should be "reply" links that trigger the DiscussionTools reply widget after every signature

There's also special behavior associated with the lede section of the page if it contains content on a talk page; if so, there should be a "Learn more about this page" button after that initial comment metadata, and clicking on it should pop up a dialog that shows that initial content. (Unless the lede section contains a comment, in which case it shouldn't be hidden after all.)

image.png (1×834 px, 201 KB)

If I had to guess about places where the responsive image change might hypothetically cause an issue, images in the lede section is where my mind jumps, but that's just a gut feeling...

Someone like @EAkinloose or @Ryasmeen might be better suited to give you a "what we do to comprehensively test DiscussionTools on mobile" overview, admittedly. 😅

Change 890773 had a related patch set uploaded (by Phedenskog; author: Phedenskog):

[performance/mobile-synthetic-monitoring-tests@master] Add temporary

https://gerrit.wikimedia.org/r/890773

Change 885362 merged by jenkins-bot:

[mediawiki/extensions/MobileFrontend@master] Completely get rid of responsiveimages removal

https://gerrit.wikimedia.org/r/885362

After QA is done, I'll backport this early next week to isolate the impact and ease the revert in case needed.

Change 890509 merged by jenkins-bot:

[operations/mediawiki-config@master] Add static "Cleopatra" page to facilitate synthetic testing of 885362

https://gerrit.wikimedia.org/r/890509

Mentioned in SAL (#wikimedia-operations) [2023-02-21T21:08:29Z] <catrope@deploy1002> Started scap: Backport for [[gerrit:890509|Add static "Cleopatra" page to facilitate synthetic testing of 885362 (T326147 T293303)]]

Mentioned in SAL (#wikimedia-operations) [2023-02-21T21:10:23Z] <catrope@deploy1002> catrope and nray: Backport for [[gerrit:890509|Add static "Cleopatra" page to facilitate synthetic testing of 885362 (T326147 T293303)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet

Mentioned in SAL (#wikimedia-operations) [2023-02-21T21:18:58Z] <catrope@deploy1002> Finished scap: Backport for [[gerrit:890509|Add static "Cleopatra" page to facilitate synthetic testing of 885362 (T326147 T293303)]] (duration: 10m 28s)

@Ladsgroup Regarding the possibility of parser cache issues between the mobile and desktop sites, @Edtadros found that the beta mobile site's infobox has a horizontal scrollbar on articles like "Dog":

2023-02-21_17-29-16 (1).gif (800×800 px, 1 MB)

It looked to me like the scrollbar happens as a result of border: 2px solid red; and min-width: 100%; styles on the tr elements. I don't think that's related to any parser cache issues, but can you confirm?

Change 890127 abandoned by Nray:

[mediawiki/extensions/MobileFrontend@master] Don't add srcset attribute to lazy loaded image element in data saver mode

Reason:

https://gerrit.wikimedia.org/r/890127

I'm OOO starting tomorrow Feb. 22 and will return Wednesday, March 1:

  • I've met and asked @Edtadros to QA the Functional QA steps listed in the task description, and to get in touch with @EAkinloose or @Ryasmeen for QA of DiscussionTools. Any QA issues can be discussed with @Ladsgroup
  • I've met and asked @Ladsgroup to review the performance dashboards after deploy
  • @Ladsgroup will handle the deploy of this change next week (aiming to backport Monday) after QA is finished

Change 890773 merged by jenkins-bot:

[performance/mobile-synthetic-monitoring-tests@master] Add temporary tests to measure potential impact of srcset.

https://gerrit.wikimedia.org/r/890773

Yup, it's coming from https://en.wikipedia.beta.wmflabs.org/wiki/Module:Infobox/styles.css which means it was broken before the patch :D Let me see if I can copy paste that from enwiki

The English Wikipedia version uses automatic taxoboxes, I copy pasted the resulting wikitext (using Special:ExpandTemplates) and pasted it there and it seems to be fine now. I guess it was using an old version of taxobox.

The only thing that I think needs double checking is that infoboxes are not collapsed and I think at least for a while they were collapsed in mobile (under "Quick facts" or something like that) but it doesn't happen anymore, nor in beta nor production English Wikipedia (maybe via css it got changed?) I tested it on the article of Obama just to be sure the infobox is known.

CPU times, including HTML parse times, from the synthetic test results (without srcset, with srcset) on a Moto G (5) phone look pretty similar:

2023-02-22_18-07-38 (1).png (1×3 px, 200 KB)

2023-02-22_18-08-55.png (1×3 px, 199 KB)

nray updated the task description. (Show Details)
nray updated the task description. (Show Details)

Test Result - Beta

Status: ✅ PASS
Environment: beta
OS: macOS Ventura
Browser: Chrome
Device: MBP
Emulated Device:NA

Test Artifact(s):

QA Steps

Clear the parser cache on both desktop and mobile sites by:

Visiting https://en.wikipedia.beta.wmflabs.org/wiki/Dog?action=purge on desktop device and clicking the "Yes" button
Visiting https://en.m.wikipedia.beta.wmflabs.org/wiki/Dog?action=purge on desktop device and clicking the "Yes" button

Using a mobile device:

Visit https://en.m.wikipedia.beta.wmflabs.org/wiki/Dog
Open each section and scroll down to the bottom of the page
✅ AC1: Verify that the images load correctly, and there are no visual regressions with the infobox
Everything looks good.
Click on one of the thumbnails
✅ AC2: Verify that the image loads the lightbox correctly

Screen Recording 2023-02-23 at 6.46.45 PM.mov.gif (848×390 px, 475 KB)

Using a desktop device:

Visit https://en.wikipedia.beta.wmflabs.org/wiki/Dog
Scroll down the bottom of the page
✅ AC3: Verify that the images load correctly, and there are no visual regressions with the infobox
Everything looks good.
Click on one of the thumbnails
✅ AC4: Verify that the image loads the lightbox correctly

Screen Recording 2023-02-23 at 6.51.14 PM.mov.gif (1×1 px, 1 MB)

@Ladsgroup, Everything looks good on my end. @Ryasmeen and @EAkinloose are looking into the discussion tools.

We are done testing Discussion Tools from our end. We didn't find anything broken for DT that's relevant to this change.

We are done testing Discussion Tools from our end. We didn't find anything broken for DT that's relevant to this change.

Looks good on this end too.

Change 891982 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/extensions/MobileFrontend@wmf/1.40.0-wmf.24] Completely get rid of responsiveimages removal

https://gerrit.wikimedia.org/r/891982

Change 891982 merged by jenkins-bot:

[mediawiki/extensions/MobileFrontend@wmf/1.40.0-wmf.24] Completely get rid of responsiveimages removal

https://gerrit.wikimedia.org/r/891982

Mentioned in SAL (#wikimedia-operations) [2023-02-27T13:32:06Z] <ladsgroup@deploy1002> ladsgroup: Completely get rid of responsiveimages removal, part I (T326147) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet

Mentioned in SAL (#wikimedia-operations) [2023-02-27T13:41:20Z] <ladsgroup@deploy1002> Synchronized php-1.40.0-wmf.24/extensions/MobileFrontend/extension.json: Completely get rid of responsiveimages removal, part I (T326147) (duration: 10m 48s)

Mentioned in SAL (#wikimedia-operations) [2023-02-27T13:50:32Z] <ladsgroup@deploy1002> ladsgroup: Completely get rid of responsiveimages removal, part II (T326147) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet

Mentioned in SAL (#wikimedia-operations) [2023-02-27T13:56:34Z] <ladsgroup@deploy1002> Synchronized php-1.40.0-wmf.24/extensions/MobileFrontend/includes/MobileFrontendHooks.php: Completely get rid of responsiveimages removal, part II (T326147) (duration: 07m 24s)

Mentioned in SAL (#wikimedia-operations) [2023-02-27T13:59:04Z] <ladsgroup@deploy1002> ladsgroup: Completely get rid of responsiveimages removal, part III (T326147) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet

Mentioned in SAL (#wikimedia-operations) [2023-02-27T14:05:26Z] <ladsgroup@deploy1002> Synchronized php-1.40.0-wmf.24/extensions/MobileFrontend/includes/MobileContext.php: Completely get rid of responsiveimages removal, part III (T326147) (duration: 07m 36s)

It is hard to measure the change's impact right now. Because when we deployed it, while it started to use desktop's pre-filled cache, it basically stopped using any PC cache that existed in mobile only. This cancelled a bit of the impact it had which takes some time to actually warm up and show itself (our current cache time for PC is 21 days, so in 21 days we should start measuring averages and compare them with a couple days ago).

With all of that, it still managed to reduce 9% from the 95th response time.

And just the binlog of parser cache hosts has gone 310GB to 240G \o/

Test Result - Prod

Status: ✅ PASS
Environment: enwiki
OS: macOS Ventura
Browser: Chrome
Device: MBP
Emulated Device:NA

Test Artifact(s):

QA Steps

Clear the parser cache on both desktop and mobile sites by:

Visiting https://en.wikipedia.org/wiki/Dog?action=purge on desktop device and clicking the "Yes" button
Visiting https://en.m.wikipedia.org/wiki/Dog?action=purge on desktop device and clicking the "Yes" button

Using a mobile device:

Visit https://en.m.wikipedia.org/wiki/Dog
Open each section and scroll down to the bottom of the page
✅ AC1: Verify that the images load correctly, and there are no visual regressions with the infobox
Everything looks good.
Click on one of the thumbnails
✅ AC2: Verify that the image loads the lightbox correctly

Screen Recording 2023-02-28 at 6.55.01 PM.mov.gif (846×392 px, 456 KB)

Using a desktop device:

Visit https://en.wikipedia.org/wiki/Dog
Scroll down the bottom of the page
✅ AC3: Verify that the images load correctly, and there are no visual regressions with the infobox
Everything looks good.
Click on one of the thumbnails
✅ AC4: Verify that the image loads the lightbox correctly

Screen Recording 2023-02-28 at 6.56.16 PM.mov.gif (1×1 px, 2 MB)

Hit ratio has had a bump since the deployment and still going up as it's getting warmed:

Nice!

  • Analyze FirstView android dashboard
  • Analyze ParserCache dashboard
  • Analyze NavigationTiming dashboard

I just checked these three dashboards and didn't notice any unexpected spikes. The synthetic test rendering metrics looked good. HTML transfer size slightly increased (~1 kB) on the Obama page, and I did not notice any consistent spikes in HTML parse time on any of the article pages we test. Image transfer size increased by ~30 kB on that page. Both of these increases were expected.

2023-03-01_13-48-47.png (1×1 px, 308 KB)

2023-03-01_13-56-53.png (736×4 px, 214 KB)

Thank you @Ladsgroup for the patch and handling the deploy, and @Edtadros, @Ryasmeen, @EAkinloose for the QA!

Alright, looking good! Resolving.

Change 893542 had a related patch set uploaded (by Nray; author: Nray):

[operations/mediawiki-config@master] Revert "Add static "Cleopatra" page to facilitate synthetic testing of 885362"

https://gerrit.wikimedia.org/r/893542

Change 893544 had a related patch set uploaded (by Nray; author: Nray):

[performance/mobile-synthetic-monitoring-tests@master] Revert "Add temporary tests to measure potential impact of srcset."

https://gerrit.wikimedia.org/r/893544

Alright, looking good! Resolving.

Thank you for resolving this quickly!

Change 885275 abandoned by Ladsgroup:

[mediawiki/extensions/MobileFrontend@master] Switch stripping reponsive image to post parse transform

Reason:

https://gerrit.wikimedia.org/r/885275

Change 893544 merged by jenkins-bot:

[performance/mobile-synthetic-monitoring-tests@master] Revert "Add temporary tests to measure potential impact of srcset."

https://gerrit.wikimedia.org/r/893544