Wikipedia:Bots/Requests for approval/PrimeBOT 17
- The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Approved.
Operator: Primefac
Time filed: 14:24, Saturday, May 27, 2017 (UTC)
Automatic, Supervised, or Manual: automatic
Programming language(s): AWB
Source code available: AWB
Function overview: Remove UTM parameters (Google analytics) from external links and references (i.e. resurrect Theo's Little Bot task #23)
Links to relevant discussions (where appropriate): Wikipedia:Bot requests/Archive 55#Remove Google Analytics tracking from external links
Edit period(s): Once a month
Estimated number of pages affected: 16000 in the initial run, and maybe 200 a month after that? Theo's task ran in batches of 500, which also works, but I couldn't then give a timeframe.
Exclusion compliant (Yes/No): Yes
Already has a bot flag (Yes/No): Yes
Function details: Straight-forward find-and-remove. Regex:
\??(?:&?utm_[^=]*?=[^&\s\]\|]*)+(?=]|\s|\|)|(?<=\?)(?:&?utm_[^=]*?=[^&\s\]\|]*)+& (test cases)
\??(?:&?utm_[^=\s]*?=[^&\s\]\|]*?)+(?=<|}|]|\s|\|)|(?<=\?)(?:&?utm_[^=\s]*?=[^&\s\]\|]*)+&|(?<=&)(?:&?utm_[^=\s]*?=[^&\s\]\|]*)+& (tests)
As near as I can tell, I've managed to cover all of the edge cases which were of concern in the original BRFA. The blue section (the second alternative, starting with (?<=\?)) covers the case where ?utm_ is followed by an & not followed by another utm_ (e.g. ?utm_example=1234&para=value). The red (the first alternative) hits everything else (i.e. where the utm_ term(s) are only at the end of the URL). Green (the third alternative, starting with (?<=&)) is when utm_ falls in between two other codes.
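The three cases described above can be tried outside AWB. What follows is only a minimal sketch: it assumes Python's re module treats this pattern the same way as the .NET engine AWB uses, it escapes } and ] in the lookahead for safety, and the strip_utm helper name and the example.com URLs are made up for illustration.

```python
import re

# Transcription of the second regex above, split into its three alternatives.
UTM_PATTERN = re.compile(
    # "red": utm_ term(s) at the end of the URL, up to a delimiter
    r"\??(?:&?utm_[^=\s]*?=[^&\s\]\|]*?)+(?=<|\}|\]|\s|\|)"
    # "blue": utm_ term(s) right after the ?, followed by a non-utm_ parameter
    r"|(?<=\?)(?:&?utm_[^=\s]*?=[^&\s\]\|]*)+&"
    # "green": utm_ term(s) in between two other parameters
    r"|(?<=&)(?:&?utm_[^=\s]*?=[^&\s\]\|]*)+&"
)


def strip_utm(wikitext: str) -> str:
    """Remove utm_ tracking parameters from URLs in a piece of wikitext."""
    return UTM_PATTERN.sub("", wikitext)


# Hypothetical sample links, one per case (plus a citation template).
SAMPLES = [
    "[http://example.com/page?utm_source=x&utm_medium=y Title]",
    "[http://example.com/page?utm_source=x&id=5 Title]",
    "[http://example.com/page?id=5&utm_source=x&ref=1 Title]",
    "{{cite web|url=http://example.com/a?utm_source=x&utm_medium=y|title=T}}",
]

if __name__ == "__main__":
    for sample in SAMPLES:
        print(sample)
        print("  ->", strip_utm(sample))
```

Note that the first alternative only fires when the URL is followed by a delimiter (whitespace, ], |, < or }), so a bare URL at the very end of a string is left untouched in this sketch.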
Discussion
- As a note, unlike the original bot run this will not be checking to see if the URLs are still valid. AWB doesn't do that. Primefac (talk) 14:24, 27 May 2017 (UTC)
- Approved for trial (50 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. Please post results here when done. — xaosflux Talk 14:27, 27 May 2017 (UTC)
In addition to the UTM parameters, there's also "?cmpid", and probably others. DS (talk) 16:14, 1 June 2017 (UTC)
- An easy addition, just replace utm_ with cmpid in the regex. Primefac (talk) 18:37, 1 June 2017 (UTC)
- Approved. Task approved. — xaosflux Talk 03:44, 6 June 2017 (UTC)
- Amended (00:29, 7 August 2017 (UTC)) to include ?mbid parameter cleanup as well; "speedily approved" in lieu of another task, as this is low volume. — xaosflux Talk 00:29, 7 August 2017 (UTC)
- The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.
- Amended to include removing tracking from New York Times URLs; see talk. — The Earwig (talk) 15:36, 25 March 2024 (UTC)