- What use case is the model going to support/resolve?
Related to https://phabricator.wikimedia.org/T371902
- Do you have a '''model card'''?
https://meta.wikimedia.org/wiki/Machine_learning_models/Proposed/Language-agnostic_reference_risk
- What team created/trained/etc.. the model? What tools and frameworks have you used?
- What kind of data was the model trained with, and what kind of data is the model going to need in production (for example, calls to internal/external services, special data sources for features, etc.)?
This model requires access to precomputed domain metadata for inference. This metadata is expected to be refreshed monthly by an Airflow DAG maintained by Research Engineering, which invokes the reference-quality pipeline to generate a new snapshot of domain features. The pipeline depends on the wmf.mediawiki_wikitext_history dataset in the data lake for retrieving historical information about a domain, and on wmf.mediawiki_wikitext_current for obtaining the perennial sources labels for domains.
To facilitate retrieval, these snapshots are exported as an SQLite database with the following table:
sqlite> .schema domains
CREATE TABLE IF NOT EXISTS "domains" (
  "wiki_db" TEXT,
  "domain" TEXT,
  "page_distinct_cnt" INTEGER,
  "add_user_distinct_cnt" INTEGER,
  "ref_max_lifespan_mean" REAL,
  "ref_max_lifespan_p25" REAL,
  "ref_max_lifespan_median" REAL,
  "ref_max_lifespan_p75" REAL,
  "ref_real_lifespan_mean" REAL,
  "ref_real_lifespan_p25" REAL,
  "ref_real_lifespan_median" REAL,
  "ref_real_lifespan_p75" REAL,
  "num_edits_mean" REAL,
  "num_edits_p25" INTEGER,
  "num_edits_median" INTEGER,
  "num_edits_p75" INTEGER,
  "sur_edit_ratio_mean" REAL,
  "sur_edit_ratio_p25" REAL,
  "sur_edit_ratio_median" REAL,
  "sur_edit_ratio_p75" REAL,
  "psl_local" TEXT,
  "psl_enwiki" TEXT,
  "snapshot" TEXT
);
CREATE INDEX "ix_domains_wiki_db_domain" ON "domains" ("wiki_db","domain");
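As a minimal sketch of how an inference service could look up a domain's precomputed features at request time, the snippet below creates a trimmed-down version of the table (a subset of the columns above, with fabricated values) and queries it by the indexed (wiki_db, domain) pair:

```python
import sqlite3

# Trimmed copy of the "domains" table: only a few of the schema's columns,
# with made-up values purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE IF NOT EXISTS "domains" (
        "wiki_db" TEXT, "domain" TEXT, "page_distinct_cnt" INTEGER,
        "ref_max_lifespan_mean" REAL, "psl_local" TEXT, "snapshot" TEXT
    )"""
)
# The composite index matching the lookup pattern used at inference time.
conn.execute(
    'CREATE INDEX "ix_domains_wiki_db_domain" ON "domains" ("wiki_db","domain")'
)
conn.execute(
    "INSERT INTO domains VALUES (?, ?, ?, ?, ?, ?)",
    ("enwiki", "example.com", 1200, 350.5, "generally_reliable", "2024-06"),
)

# Fetch the feature row for a single (wiki_db, domain) pair.
row = conn.execute(
    "SELECT ref_max_lifespan_mean, psl_local FROM domains "
    "WHERE wiki_db = ? AND domain = ?",
    ("enwiki", "example.com"),
).fetchone()
print(row)  # (350.5, 'generally_reliable')
```

Because the index covers (wiki_db, domain), this per-domain lookup stays fast even as the snapshot grows to millions of rows.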
This database is then uploaded to the feature-sets container in Swift, which is world-readable, i.e. has the read ACL '.r:*,.rlistings'. Available snapshots can be listed via:
$ curl 'https://thanos-swift.discovery.wmnet/v1/AUTH_research/feature-sets?prefix=reference-risk'
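Programmatically, the same listing can be consumed to pick the most recent snapshot. The helper below is hypothetical (it is not part of any existing codebase), and it assumes the container API returns a plain newline-separated listing of object names; the sample listing is fabricated:

```python
from urllib.parse import urlencode

# Base URL of the world-readable feature-sets container (internal hostname).
BASE = "https://thanos-swift.discovery.wmnet/v1/AUTH_research/feature-sets"


def listing_url(prefix="reference-risk"):
    """Build the Swift container listing URL for a given object prefix."""
    return f"{BASE}?{urlencode({'prefix': prefix})}"


def latest_snapshot(listing_text):
    """Pick the lexically greatest object name from a newline-separated
    listing; with date-stamped names this is the most recent snapshot."""
    names = [line for line in listing_text.strip().splitlines() if line]
    return sorted(names)[-1] if names else None


# Fabricated listing for illustration; real object names may differ.
sample = "reference-risk/2024-04.sqlite\nreference-risk/2024-05.sqlite\n"
print(latest_snapshot(sample))  # reference-risk/2024-05.sqlite
```

In production the listing text would come from an HTTP GET against listing_url(); no authentication is needed given the public read ACL.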
- If you have a minimal codebase that you used to run the first tests with the model, could you please share it?
The original source for the model lives in the reference-quality repo.
An adapted version of it that works with the generated sqlite databases has been added to knowledge-integrity and can be used by installing v0.8.3.
- State what team will own the model and please share some main point of contacts.
- What is the current latency and throughput of the model, if you have tested it? We don't need anything precise at this stage, just some ballpark numbers to figure out how the model performs with the expected inputs. For example, does the model take ms/seconds/etc. to respond to queries? How does it react when 1/10/20/etc. requests are made in parallel? If you don't have these numbers don't worry, open the task and we'll figure something out while we discuss next steps!
- Is there an expected frequency in which the model will have to be retrained with new data?
- What are the resources required to train the model and what was the dataset size?
- Have you checked if the output of your model is safe from a human rights point of view? Is there any risk of it being offensive for somebody? Even if you have any slight worry or corner case, please tell us!
- Everything else that is relevant in your opinion.