MunizaA (Muniza)
User

Projects

User does not belong to any projects.

User Details

User Since
Mar 31 2021, 5:59 AM (193 w, 3 d)
Availability
Available
LDAP User
Unknown
MediaWiki User
MunizaA [ Global Accounts ]

Recent Activity

Wed, Dec 11

MunizaA added a comment to T371344: [LLM] Use Flash attention 2 for GPU inference.

I tried to install the wheel from this in a new env and, although it installs, it can't be used:

ImportError: /home/isaranto/miniconda3/envs/flash251/lib/python3.11/site-packages/flash_attn_2_cuda.cpython-311-x86_64-linux-gnu.so: undefined symbol: _Z17fmha_fwd_appendkv24fmha_fwd_appendkv_traits22fmha_fwd_appendkv_argsRKN7ck_tile13stream_configE

I can reproduce this too. It turns out that running python3 -m build --no-isolation first builds an sdist from the source and then builds a wheel from the sdist, which results in the extensions not being built correctly. I did the following to get it to build the wheel directly from the source instead:

pip install -U ninja build
GPU_ARCHS=gfx90a PYTORCH_ROCM_ARCH=gfx90a python3 -m build --no-isolation --wheel .

This gives me a wheel that I can then install and use. We could also run python setup.py bdist_wheel but direct invocations of setup.py have been deprecated for some time now and this lets us avoid that.

Wed, Dec 11, 1:27 AM · Machine-Learning-Team

Tue, Dec 10

MunizaA added a comment to T377496: Phase 1: LLM inference - base metrics.
  • Added support for text classification (in addition to text generation) to the harness to be able to benchmark bert and longformer.
  • Ran experiments to observe the effects of input vs output tokens and model size on latency.
  • Ran experiments that simulate inference requests for the npov and peacock experiments with fixed input and output lengths and variable batch sizes as a rough proxy for concurrent requests.
  • Added a quickstart guide to the llmperf readme.
  • For phase 2:
    • Built flash attention 2, gptq, awq, bitsandbytes and deepspeed for ROCm on ML labs.
    • Ran experiments comparing latency for original and quantized models.
    • Ran experiments using tensor parallelism to study the effects of better hardware on latency.
  • Next steps include: finishing up remaining work for phase 2 (i.e. fine tuning) and adding and summarizing results from all experiments so far to the report appendix.
Tue, Dec 10, 8:33 AM · Research-engineering, Research

Fri, Nov 29

MunizaA added a comment to T371344: [LLM] Use Flash attention 2 for GPU inference.

The issues we are having seem to be related to hipcc, so I will download the original image to see what the HIP configuration looks like in there.

Fri, Nov 29, 3:32 PM · Machine-Learning-Team
MunizaA added a comment to T371344: [LLM] Use Flash attention 2 for GPU inference.

If you have miniconda installed, maybe you could try running the following? I just tried this again and was able to build CK FA2 from scratch:

Fri, Nov 29, 3:06 PM · Machine-Learning-Team
MunizaA added a comment to T371344: [LLM] Use Flash attention 2 for GPU inference.

The above sequence of actions failed again. The logs are available in this paste

I couldn't find an error in your paste, so I ran the sequence of commands from your comment above and it looks like this is the actual error:

In file included from /srv/pytorch-rocm/venv/lib/python3.11/site-packages/torch/
include/torch/csrc/api/include/torch/python.h:8:
In file included from /srv/pytorch-rocm/venv/lib/python3.11/site-packages/torch/
include/torch/csrc/Device.h:4:
/srv/pytorch-rocm/venv/lib/python3.11/site-packages/torch/include/torch/csrc/pyt
hon_headers.h:12:10: fatal error: 'Python.h' file not found
   12 | #include <Python.h>
      |          ^~~~~~~~~~
1 error generated when compiling for gfx90a.

There's more here but it seems that this file should either be under /usr/include/python3.11 or under .venv/include/python3.11 (but it's not). I realize that I didn't run into this because I was using a conda env with a python 3.11 installation and Python.h can be found here: /home/mnz/miniconda3/envs/flash-env/include/python3.11/Python.h.
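
A quick way to check whether a given interpreter ships the header is to ask it where its include directory is. This is a minimal sketch using only the standard library; the function name is made up for illustration, and it only reports what the running interpreter expects, not what the compiler is actually passed.

```python
import sysconfig
from pathlib import Path


def find_python_header() -> tuple[Path, bool]:
    """Return the expected Python.h path for this interpreter and whether it exists."""
    # "include" is the directory the interpreter reports for its C headers.
    include_dir = Path(sysconfig.get_paths()["include"])
    header = include_dir / "Python.h"
    return header, header.is_file()


if __name__ == "__main__":
    header, present = find_python_header()
    print(f"{header}: {'found' if present else 'missing'}")
```

Running this with the venv's python and again with the conda env's python should show which of the two has the development headers installed.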

Fri, Nov 29, 12:50 PM · Machine-Learning-Team

Thu, Nov 28

MunizaA added a comment to T371344: [LLM] Use Flash attention 2 for GPU inference.

Looking at your paste, it seems like it's loading hip from /usr:

In file included from /usr/include/hip/hip_fp16.h:29:

Can you run hipconfig to check if the HIP_PATH is /usr and if so, try setting it to /opt/rocm?

Thu, Nov 28, 10:45 AM · Machine-Learning-Team

Wed, Nov 27

MunizaA added a comment to T371344: [LLM] Use Flash attention 2 for GPU inference.

it isn't done building yet but the build has started successfully.

This finally finished running.

(flash-env) mnz@ml-lab1001:~/scratch/flash-attn-2$ pip show flash-attn
DEPRECATION: Loading egg at /srv/home/mnz/miniconda3/envs/flash-env/lib/python3.11/site-packages/flash_attn-2.7.0.post2-py3.11-linux-x86_64.egg is deprecated. pip 24.3 will enforce this behaviour change. A possible replacement is to use pip for package installation. Discussion can be found at https://github.com/pypa/pip/issues/12330
Name: flash_attn
Version: 2.7.0.post2
Summary: Flash Attention: Fast and Memory-Efficient Exact Attention
Home-page: https://github.com/Dao-AILab/flash-attention
Author: Tri Dao
Author-email: tri@tridao.me
License: 
Location: /home/mnz/miniconda3/envs/flash-env/lib/python3.11/site-packages/flash_attn-2.7.0.post2-py3.11-linux-x86_64.egg
Requires: einops, torch
Required-by:

Currently running pytest -v test_flash_attn_ck.py, about 20% of the way through.

Wed, Nov 27, 10:56 PM · Machine-Learning-Team
MunizaA added a comment to T371344: [LLM] Use Flash attention 2 for GPU inference.

It looks like you can override just invocations of nvcc or hipcc without overriding invocations of g++ or clang++ when building extensions (which is what CXX would do) by setting PYTORCH_NVCC (see https://github.com/pytorch/pytorch/blob/main/torch/utils/cpp_extension.py#L2363). I did export PYTORCH_NVCC=/usr/bin/hipcc and it isn't done building yet but the build has started successfully.

Wed, Nov 27, 6:01 PM · Machine-Learning-Team
MunizaA added a comment to T371344: [LLM] Use Flash attention 2 for GPU inference.

Also note that the output says HIP version : 5.2.21153-0 but I would've expected it to be something like 6.1.x ?

Wed, Nov 27, 4:45 PM · Machine-Learning-Team
MunizaA updated the task description for T371344: [LLM] Use Flash attention 2 for GPU inference.
Wed, Nov 27, 2:14 PM · Machine-Learning-Team
MunizaA added a comment to T371344: [LLM] Use Flash attention 2 for GPU inference.

I tried building FA2 from source on ml-lab1001 but ran into:

fatal error: cannot open file '/opt/rocm/amdgcn/bitcode/ocml.bc': Unknown attribute kind (86) (Producer: 'LLVM17.0.0git' Reader: 'LLVM 15.0.6')
Wed, Nov 27, 12:35 PM · Machine-Learning-Team

Fri, Nov 15

MunizaA added a comment to T377496: Phase 1: LLM inference - base metrics.
  • Finished running basic experiments for all models and added a notebook analyzing the results. Some observations from the analysis (see the notebook for more detailed metrics, experiment setup and comparison charts):
    • The fastest model is (not unexpectedly) the smallest model on the list: mistralai/Mistral-7B-Instruct-v0.3 with eager attention at 16.414 tokens/sec and 35.454 secs overall latency. The slowest model is the meta-llama/Llama-3.1-70B-Instruct at 0.127 tokens/sec and 4103.839 secs overall latency.
    • Eager vs SDPA attention:
      • Throughput: For smaller models, decoding throughput is slightly lower with SDPA than eager attention. For larger models, where we use CPU offloading, the throughput is almost 2x higher.
      • VRAM usage: For smaller models SDPA can result in as much as 2.7x lower peak VRAM usage. For larger models, things are a little more complicated: The way we load these models is by allocating a portion of the VRAM to model weights. Then, during inference, layers are onloaded from the RAM as needed. Eager attention tends to cause large spikes in VRAM usage so we can only allocate a very small portion of the VRAM upfront to account for these. On the flip side, this means that apart from these spikes, a large portion of the memory can go unused. VRAM usage with SDPA is much more stable, so we can allocate almost all of the 64 GiB upfront, resulting in less onloading and offloading and thus faster decoding. SDPA is supposed to be more memory efficient but these spikes are still strange and I need to investigate some more here.
    • For CPU usage, when running model.generate, the method used for autoregressive decoding, only a single core is used, pegged at almost 100%. I found this comment from a transformers maintainer that suggests that this is the python code that orchestrates instructions on the GPU, "not optimized in many segments of the model forward pass".
Fri, Nov 15, 10:58 PM · Research-engineering, Research

Nov 8 2024

MunizaA added a comment to T377496: Phase 1: LLM inference - base metrics.

Just a note that the Mixtral models @Trokhymovych is experimenting with are 8x7B and 7B (not 8x22). See: https://phabricator.wikimedia.org/T377425#10302406

Nov 8 2024, 6:24 PM · Research-engineering, Research
MunizaA added a comment to T377496: Phase 1: LLM inference - base metrics.
  • Experiments can now be configured using yaml files. The location of these files can be passed to the cli and experiments can be filtered using shell-style wildcards.
    • Added an isolated runner so that each experiment is launched in a separate process (sequentially) and resources such as vram are cleaned up between subsequent runs, when multiple experiments are passed to the cli.
    • Started running experiments on ml-labs. These experiments use a single gpu and fix the sequence length to 8192 tokens and generate exactly 512 new tokens. The variables are the model family (Llama 3.1 and Mixtral 8x7B and 8x22B), number of parameters, batch size, torch dtype and attention mechanisms. For models that don't fit on a single gpu, like the Llama 3.1 70B, additional variables are how much memory (vram and ram) to allocate to model weights when dispatching the model across gpu and cpu. So far I've run experiments for Llama 3.1 8B and Llama 3.1 70B (half precision, single gpu with cpu offloading). The bottleneck here is downloading the model weights from the HF hub (tops out at 10 Mbps) and inference with models that use cpu offloading is very slow (0.17 tokens/s for the experiment mentioned above) so experiments can take a long time to finish.
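
The wildcard filtering and per-experiment process isolation described above can be sketched roughly as follows. This is not the harness's code: the function names, the experiment names and the placeholder payload are all hypothetical, and a real runner would launch the benchmark entry point instead of a one-line script.

```python
import fnmatch
import subprocess
import sys


def select_experiments(names, patterns):
    """Keep experiment names that match at least one shell-style pattern."""
    return [n for n in names if any(fnmatch.fnmatchcase(n, p) for p in patterns)]


def run_isolated(name, code):
    """Run one experiment in a fresh interpreter so its memory is released on exit."""
    result = subprocess.run(
        [sys.executable, "-c", code, name],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.returncode, result.stdout.strip()


if __name__ == "__main__":
    # Hypothetical experiment names, for illustration only.
    names = ["llama-8b-eager", "llama-8b-sdpa", "mixtral-8x7b-sdpa"]
    for name in select_experiments(names, ["llama-*"]):
        # Placeholder payload standing in for the real benchmark entry point.
        print(run_isolated(name, "import sys; print('ran', sys.argv[1])"))
```

Launching each experiment sequentially in its own process means that anything the experiment allocated, including GPU memory held by that process, is freed when the process exits, before the next experiment starts.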
Nov 8 2024, 12:40 PM · Research-engineering, Research

Nov 4 2024

isarantopoulos awarded T377496: Phase 1: LLM inference - base metrics a Yellow Medal token.
Nov 4 2024, 9:06 AM · Research-engineering, Research

Nov 1 2024

MunizaA added a comment to T377496: Phase 1: LLM inference - base metrics.
  • Added support for VRAM and RAM monitoring to the benchmark harness. A monitor polls either rocm-smi (for VRAM) or the process running the benchmarks (for RAM) at configurable intervals, writes their memory usage to a csv file with timestamps and returns min and peak usage at the end of the monitoring window. You can also configure what blocks of code are monitored and which gpu(s) to consider.
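
A polling monitor of this kind could look roughly like the sketch below. It is an illustration under stated assumptions, not the harness's implementation: `monitor` and its parameters are hypothetical names, and `sample` is any callable that returns a memory reading (in the real harness that reading would come from rocm-smi or from the benchmark process).

```python
import csv
import threading
import time
from contextlib import contextmanager


@contextmanager
def monitor(sample, csv_path, interval=0.05):
    """Poll `sample()` in a background thread while the with-block runs.

    Each reading is written to `csv_path` with a timestamp; the yielded dict
    is filled with the min and peak readings once the block exits.
    """
    stats = {}
    readings = []
    stop = threading.Event()

    def poll():
        with open(csv_path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["timestamp", "bytes_used"])
            while True:
                value = sample()
                readings.append(value)
                writer.writerow([time.time(), value])
                # wait() returns True as soon as stop is set, ending the loop.
                if stop.wait(interval):
                    break

    thread = threading.Thread(target=poll, daemon=True)
    thread.start()
    try:
        yield stats
    finally:
        stop.set()
        thread.join()
        stats["min"] = min(readings)
        stats["peak"] = max(readings)
```

Wrapping only the code of interest in `with monitor(...) as stats:` is one way to choose which blocks are monitored; `stats["min"]` and `stats["peak"]` are available after the block exits.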
Nov 1 2024, 4:07 PM · Research-engineering, Research

Oct 25 2024

MunizaA added a comment to T377496: Phase 1: LLM inference - base metrics.
  • Added a basic benchmark harness for measuring latency and throughput of causal decoder-only models:
    • Works with only HuggingFace transformers CausalLM models at the moment but is written to be extensible to new backends.
    • Allows fiddling with input size, batch size, output size and model specific parameters like pretrained model name or path, whether to load the model on gpu or cpu etc. Measures latency and throughput in terms of tokens / second.
    • The way it currently works is by passing an input of size batch_size * sequence_length tokens to the model, initially generating a single token to get the prefill phase latency (i.e. the time it takes to initialize the Key Value cache with the input tokens and use it to generate the first token). The throughput is then measured using the time taken to autoregressively generate max_new_tokens afterwards i.e. the decode phase.
    • Example:
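
A minimal sketch of the prefill/decode timing described above, not the harness's actual code. `generate` is a hypothetical callable standing in for a model backend, and isolating the decode phase by subtracting the prefill latency from a second, full run is an assumption made for this illustration.

```python
import time


def benchmark(generate, batch_size, sequence_length, max_new_tokens):
    """Time prefill (first token) and decode (remaining tokens) separately.

    `generate(batch_size, sequence_length, n)` is assumed to produce n new
    tokens for each sequence in the batch.
    """
    # Prefill: build the KV cache from the input and emit the first token.
    start = time.perf_counter()
    generate(batch_size, sequence_length, 1)
    prefill_latency = time.perf_counter() - start

    # Full run: prefill plus autoregressive decoding of the remaining tokens.
    start = time.perf_counter()
    generate(batch_size, sequence_length, max_new_tokens)
    total_latency = time.perf_counter() - start

    # Assumption: decode time is whatever the full run spent beyond prefill.
    decode_time = max(total_latency - prefill_latency, 1e-9)
    decode_tokens = batch_size * (max_new_tokens - 1)
    return {
        "prefill_latency_s": prefill_latency,
        "total_latency_s": total_latency,
        "decode_tokens_per_s": decode_tokens / decode_time,
    }


if __name__ == "__main__":
    # Fake backend: a fixed prefill cost plus a small cost per generated token.
    def fake_generate(batch_size, sequence_length, n):
        time.sleep(0.01 + 0.001 * n)

    print(benchmark(fake_generate, batch_size=2, sequence_length=128, max_new_tokens=20))
```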
Oct 25 2024, 12:34 PM · Research-engineering, Research

Oct 18 2024

MunizaA updated the task description for T377159: [SDS 1.2.1 B] Test existing AI models for internal use-cases.
Oct 18 2024, 10:46 AM · Research (FY2024-25-Research-October-December)

Oct 10 2024

MunizaA added a comment to T360794: Implement stream of HTML content on mw.page_change event.

Hi, I took a stab at this and was able to put together a job that enriches page change events by retrieving the HTML from MW Rest API (the existing examples in mediawiki-event-enrichment really helped with this!). I've opened an MR against mediawiki-event-enrichment with a first draft and would really appreciate it if I could get someone to give me some feedback on it!

Oct 10 2024, 11:47 PM · Research, Data-Engineering, Event-Platform

Oct 4 2024

MunizaA added a comment to T368613: Essential work - Research tooling.
  • Opened an MR with an experimental implementation of YarnSession to be merged into research-common. YarnSession allows spinning up Skein applications on YARN to interactively run functions on the Hadoop cluster. You can adjust what resources get allocated to this application, whether it gets a gpu or not etc.
    • MR includes a notebook that demonstrates some of the things you can do with this e.g. training an ML model remotely.
Oct 4 2024, 11:24 AM · Essential-Work, Research-engineering, Research (FY2024-25-Research-July-September)

Oct 3 2024

MunizaA added a comment to T368614: Essential work - model quantization.

Update: Building on the work we did in this direction last quarter, I've been experimenting with applying post training quantization techniques to the reference-need model (version 0), which is a fine-tuned bert-base-multilingual-cased. These experiments focus on optimizing inference on CPU while making sure that the model's accuracy, precision, recall and f1 score stay the same.

Oct 3 2024, 10:02 PM · Research, Essential-Work, Research-engineering

Sep 26 2024

MunizaA added a comment to T357038: reference model engineering work.

Engineering work for the reference need and reference risk models has concluded. Both models have been added to knowledge-integrity. Additionally, a research-datasets pipeline for generating domain metadata snapshots used for inference and an airflow DAG for scheduling this pipeline and publishing the snapshots to swift have also been added.

Sep 26 2024, 9:45 PM · Research, Research-engineering
MunizaA updated the task description for T372405: Request to host the Reference Risk Model on LiftWing.
Sep 26 2024, 6:14 PM · Lift-Wing, Machine-Learning-Team

Sep 24 2024

MunizaA updated the task description for T372405: Request to host the Reference Risk Model on LiftWing.
Sep 24 2024, 6:39 PM · Lift-Wing, Machine-Learning-Team
MunizaA added a comment to T344016: Improvements to Annotool.

@diego the work needed to resolve this task is trivial but as far as I know, we're not running any campaigns on Annotool at the moment. If that is indeed the case, would it make sense to move this to the freezer and pick it back up once that changes? Thanks!

Sep 24 2024, 2:31 PM · Research

Sep 3 2024

MunizaA added a comment to T371902: Request to host the Reference Need Model on LiftWing.

Hi @isarantopoulos, the pytorch version was pinned in knowledge-integrity when the transformers dependency was added. I was under the impression that this was because transformers specifies an upper bound on pytorch and we can't upgrade transformers since it's not backward compatible and breaks models trained on older versions. But it looks like there's only a lower bound and so technically we should be able to upgrade to the latest version.

Sep 3 2024, 5:52 PM · Lift-Wing, Machine-Learning-Team
MunizaA updated subscribers of T372405: Request to host the Reference Risk Model on LiftWing.

@MunizaA Is this task for reference-risk? Is the title incorrect?

Sep 3 2024, 4:20 PM · Lift-Wing, Machine-Learning-Team
MunizaA renamed T372405: Request to host the Reference Risk Model on LiftWing from Request to host reference needed on Lift Wing to Request to host the Reference Risk Model on LiftWing.
Sep 3 2024, 4:14 PM · Lift-Wing, Machine-Learning-Team

Aug 26 2024

MunizaA renamed T371902: Request to host the Reference Need Model on LiftWing from Request to host Reference Quality Model on Lift Wing to Request to host the Reference Need Model on LiftWing.
Aug 26 2024, 12:13 PM · Lift-Wing, Machine-Learning-Team

Jul 17 2024

MunizaA closed T351118: [Research Engineering Request] Produce regular snapshots of all Wikipedia article topics as Resolved.
Jul 17 2024, 12:31 PM · Research-engineering, Research

Jul 15 2024

MunizaA added a comment to T351118: [Research Engineering Request] Produce regular snapshots of all Wikipedia article topics.

Thanks @Isaac and @Mayakp.wiki for your feedback!

  • I presume we're keeping the most recent snapshot and not storing prior runs? If so, that makes sense to me. I could see justification for storing maybe the previous snapshot too (just to be able to easily detect changes if desired) but I see no reason for storing the topics from older runs.
  • Sorry I didn't spot this earlier but can we align with the model currently being used by LiftWing (assuming this is the model used by the DAG)?
Jul 15 2024, 7:14 PM · Research-engineering, Research

Jun 28 2024

MunizaA closed T357316: Develop pipelines for research datasets - Q3/Q4, a subtask of T341817: Standardize research pipelines - Dataset generation, as Resolved.
Jun 28 2024, 6:56 PM · Research-engineering, Epic, Research
MunizaA closed T357316: Develop pipelines for research datasets - Q3/Q4 as Resolved.
Jun 28 2024, 6:56 PM · Research (FY2024-25-Research-July-September), Research-engineering
MunizaA added a comment to T357316: Develop pipelines for research datasets - Q3/Q4.

The pipeline for generating article topics and embeddings is in research-datasets now and the Airflow DAG has been merged and deployed. The DAG might require a few minor changes requested in T351118: [Research Engineering Request] Produce regular snapshots of all Wikipedia article topics but since this is being tracked there, I'll close this one out.

Jun 28 2024, 6:44 PM · Research (FY2024-25-Research-July-September), Research-engineering
MunizaA updated the task description for T357316: Develop pipelines for research datasets - Q3/Q4.
Jun 28 2024, 6:22 PM · Research (FY2024-25-Research-July-September), Research-engineering

Jun 25 2024

MunizaA added a comment to T367551: Cloud VPS "research-collaborations-api" project Buster deprecation.

I went ahead and created a new instance (wikinav-bookworm.research-collaborations-api) that's the same RAM/CPU but new OS

Thank you!

Jun 25 2024, 11:23 AM · Research, Cloud-VPS (Debian Buster Deprecation)

Jun 24 2024

MunizaA added a comment to T367551: Cloud VPS "research-collaborations-api" project Buster deprecation.

Hey @Isaac, since there are lots of moving parts to deploying this API (setting up nginx, installing dependencies, invoking gunicorn, setting up a cron job for the dumps etc.) I've containerized these and added a docker-compose.yml file (PR here) so that all this can be easily deployed on any instance that has docker and really only takes a single command to do so, though note that I haven't touched any application code.

Jun 24 2024, 11:32 AM · Research, Cloud-VPS (Debian Buster Deprecation)

Jun 19 2024

MunizaA updated the task description for T367757: Request to add mnz to analytics-research-admins.
Jun 19 2024, 8:02 PM · Patch-For-Review, SRE, SRE-Access-Requests

Jun 17 2024

MunizaA created T367757: Request to add mnz to analytics-research-admins.
Jun 17 2024, 2:13 PM · Patch-For-Review, SRE, SRE-Access-Requests
MunizaA closed T352839: RevertRisk model readiness for temporary accounts as Resolved.

@kostajh Liftwing is now running version 0.8.0 of Knowledge Integrity so this has been deployed to production and I think it'd be okay to resolve this but I agree with your observation that it is only a stopgap solution and will need to be revisited once we have more data on temporary accounts.

Jun 17 2024, 10:10 AM · Research, Moderator-Tools-Team, Temporary accounts, Trust and Safety Product Team, Machine-Learning-Team
MunizaA added a comment to T367551: Cloud VPS "research-collaborations-api" project Buster deprecation.

Hi @Isaac, thanks for the ping! There is a cron job that runs every month and imports the latest clickstream dump into sqlite but this could all use some documentation so I'll use this as an opportunity to flesh out the README for wikinav with instructions and will link that back here shortly.

Jun 17 2024, 9:25 AM · Research, Cloud-VPS (Debian Buster Deprecation)

Jun 14 2024

MunizaA added a comment to T351118: [Research Engineering Request] Produce regular snapshots of all Wikipedia article topics.

Hi @cchen, the May 2024 snapshot for the article topics dataset is available at hdfs:///tmp/research/article_topics/20240501_20240601. There's also an airflow DAG for this pipeline which will get deployed shortly allowing us to produce regular snapshots starting next month. Please let me know if you have any questions!

Jun 14 2024, 5:15 PM · Research-engineering, Research

Jun 11 2024

MunizaA added a comment to T362526: Model quantization (research infra).

We've done some initial experimentation here, focused on optimizing inference on CPU for batch pipelines. The model used in these experiments is the text simplification model which is a fine-tuned FLAN-T5 XL from huggingface used with the pytorch-based transformers and the baseline here is the time that inference on GPU takes. For this early exploration, we've experimented with dynamic quantization and conversion from pytorch to other formats like ONNX. A summary of the results is below and detailed numbers and code can be found in this notebook. An important observation here is that our best results are 2x faster than inference on CPU and 9x slower than inference on GPU which means batch inference pipelines can finish in half the time but it might still not be good enough for online inference.

Jun 11 2024, 10:14 AM · Research-engineering, Research (FY2023-24-Research-April-June)

Mar 11 2024

MunizaA added a comment to T355742: Assess runtime performance impact of pydantic data models in the RRLA model-server.

@kevinbazira this is very helpful, thank you!

Mar 11 2024, 10:28 AM · Patch-For-Review, Machine-Learning-Team

Jan 18 2024

MunizaA added a comment to T352839: RevertRisk model readiness for temporary accounts.

Ideally, by the time we are deploying to pilot wikis, the model will understand that revisions made by temp accounts should be scored differently than if those revisions came from full accounts. I am not sure how much you'll be able to do, though, without a lot of real world data of temp account edits?

I think we should consider them as equivalent for anonymous

Jan 18 2024, 8:49 PM · Research, Moderator-Tools-Team, Temporary accounts, Trust and Safety Product Team, Machine-Learning-Team

Nov 16 2023

MunizaA closed T350389: Upgrade xgboost in knowledge_integrity as Resolved.
Nov 16 2023, 9:54 PM · Research, Machine-Learning-Team
MunizaA closed T350389: Upgrade xgboost in knowledge_integrity, a subtask of T349844: Increased latencies with Kserve 0.11.1 (cgroups v2), as Resolved.
Nov 16 2023, 9:54 PM · Patch-For-Review, Machine-Learning-Team
MunizaA added a comment to T350389: Upgrade xgboost in knowledge_integrity.

Knowledge Integrity v0.5.0 has been released which now depends on xgboost 2.x. Upgrading xgboost also required serializing the classifier with the new version so the version for RevertRiskModel has also been bumped to v3. I've shared the model file with @achou. The SHA512 sum for RevertRiskModel v3 is:

fb6d76b105b7e8198cee47f779c69f1bd85be61075061665bdd0811e8d52e1d4f793dacb4a00fc3776ace8c518f1fa5f653879cec81777854cf235b0483156e7 *revert_risk_language_agnostic_model_v3.pkl
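
A checksum like the one above can be verified after downloading the model file. This is a minimal sketch using the standard library; the helper names are made up for illustration.

```python
import hashlib


def sha512_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large model files need not fit in memory."""
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Compare a file's SHA-512 digest with a published hex string."""
    return sha512_of(path) == expected_hex.strip().lower()
```

Calling `matches("revert_risk_language_agnostic_model_v3.pkl", "<published sum>")` should return True for an intact copy of the file.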

Going to resolve this now but please feel free to reopen if something does not look right!

Nov 16 2023, 9:52 PM · Research, Machine-Learning-Team

Nov 8 2023

MunizaA added a comment to T350389: Upgrade xgboost in knowledge_integrity.

Hi, I've opened an MR for dropping support for Python 3.7 in KI since this was already on the roadmap after its EOL in June 2023 and it also helps support this change (xgboost 2.x requires minimum Python 3.8).

Nov 8 2023, 12:47 PM · Research, Machine-Learning-Team

Oct 31 2023

MunizaA added a comment to T350061: [Annotool] Errors loading edits.

@diego, looking at the revisions in this project I see that the wiki_db is wikidatawiki when it should be wikidata. Can you try fixing the wiki and importing again?

Oct 31 2023, 1:03 PM · Research

Oct 16 2023

MunizaA updated the task description for T348822: Choose vector search framework.
Oct 16 2023, 5:41 AM · Research

Oct 2 2023

MunizaA updated the task description for T347330: Expand language support for Revert Risk Model.
Oct 2 2023, 11:08 AM · Machine-Learning-Team, Research
MunizaA created P52800 SHA-512 checksum for Revert Risk language agnostic model V2.
Oct 2 2023, 11:03 AM

Sep 29 2023

MunizaA closed T347330: Expand language support for Revert Risk Model as Resolved.

@achou thanks again for the review! I've released v0.4.0 for Knowledge Integrity which should help you pick up these new changes. Going to close this now but please feel free to reopen if something does not look right!

Sep 29 2023, 4:33 PM · Machine-Learning-Team, Research
MunizaA updated the task description for T347330: Expand language support for Revert Risk Model.
Sep 29 2023, 4:25 PM · Machine-Learning-Team, Research

Sep 27 2023

MunizaA updated the task description for T347330: Expand language support for Revert Risk Model.
Sep 27 2023, 6:40 PM · Machine-Learning-Team, Research

Sep 26 2023

MunizaA added a comment to T347330: Expand language support for Revert Risk Model.

@diego, it's possible I'm missing something but while updating these constants I noticed that the values for be-x-old and be-tarask are different and according to the information here, the former redirects to the latter. be-tarask is one of the new wikis that we're adding these values for, so I wanted to check with you if this is okay. Thanks!

Sep 26 2023, 2:34 PM · Machine-Learning-Team, Research
MunizaA updated the task description for T347330: Expand language support for Revert Risk Model.
Sep 26 2023, 2:17 PM · Machine-Learning-Team, Research

Sep 25 2023

MunizaA moved T347330: Expand language support for Revert Risk Model from Backlog to In Progress on the Research board.
Sep 25 2023, 6:13 PM · Machine-Learning-Team, Research
MunizaA set Due Date to Sep 28 2023, 7:00 AM on T347330: Expand language support for Revert Risk Model.
Sep 25 2023, 6:12 PM · Machine-Learning-Team, Research
MunizaA changed the status of T347330: Expand language support for Revert Risk Model from Open to In Progress.
Sep 25 2023, 6:10 PM · Machine-Learning-Team, Research
MunizaA created T347330: Expand language support for Revert Risk Model.
Sep 25 2023, 6:08 PM · Machine-Learning-Team, Research
MunizaA closed T344613: Improve knowledge gap project setup as Resolved.
Sep 25 2023, 5:38 PM · Research

Sep 5 2023

MunizaA moved T344613: Improve knowledge gap project setup from Staged to In Progress on the Research board.
Sep 5 2023, 3:51 PM · Research
MunizaA claimed T344613: Improve knowledge gap project setup.
Sep 5 2023, 3:44 PM · Research

Aug 30 2023

MunizaA added a comment to T343064: Expand types of edits for Wikidata revert risk model.

Hi @Miriam, I'll need some more time to work on this. I did some initial exploration here but then had to pause to work on Annotool bug fixes and improvements, which is the tool currently being used to collect annotations for this model. Thank you!

Aug 30 2023, 9:25 AM · Research

Aug 21 2023

MunizaA closed T344152: Allow users to edit annotations in Annotool, a subtask of T344016: Improvements to Annotool, as Resolved.
Aug 21 2023, 4:52 PM · Research
MunizaA closed T344152: Allow users to edit annotations in Annotool as Resolved.
Aug 21 2023, 4:52 PM · Research

Aug 14 2023

MunizaA moved T344152: Allow users to edit annotations in Annotool from FY2023-24-Research-July-September to In Progress on the Research board.
Aug 14 2023, 2:27 PM · Research
MunizaA created T344152: Allow users to edit annotations in Annotool.
Aug 14 2023, 12:47 PM · Research
MunizaA added a comment to T343973: Fix relative links for Qids in Annotool.

Hi @wolfgang8741, I deployed a patch Friday that fixes this but if you're still running into it, please let me know. Thanks!

Aug 14 2023, 12:37 PM · Research
MunizaA added a comment to T341820: Evaluate and improve the Revert Risk model for Wikidata..

Hi @Danny_Benjafield_WMDE, thanks for the feedback!

Aug 14 2023, 10:53 AM · Research (FY2023-24-Research-April-June)

Jul 28 2023

MunizaA added a comment to T340811: Index out of range in revert risk multi-lingual.

I rewrote some parts of RRML earlier this week to replace StructuredEditTypes with SimpleEditTypes in this MR, using pointers from your comments above. The changes are still being tested by @Trokhymovych, the original author of this code, to make sure there isn't a significant drift in the predictions made by the model

Jul 28 2023, 12:51 PM · Patch-For-Review, Research, Machine-Learning-Team

Jul 21 2023

MunizaA added a comment to T340811: Index out of range in revert risk multi-lingual.

I managed to separate the headers from sections in the code so it's much cleaner now and seems to run fine for your list of revisions with timeout set to True.

@Isaac that's awesome, thank you!

Jul 21 2023, 10:59 AM · Patch-For-Review, Research, Machine-Learning-Team

Jul 14 2023

MunizaA added a comment to T340811: Index out of range in revert risk multi-lingual.

@Isaac thanks so much for the pointers! It seems like this model is also using node edit info for some of the features but in any case, we should be able to simplify the text diffing code using functionality from mwedittypes.

Jul 14 2023, 6:52 PM · Patch-For-Review, Research, Machine-Learning-Team
MunizaA added a comment to T340811: Index out of range in revert risk multi-lingual.

Hi @elukey, the dependency constraint we have for mwedittypes in KI is "1.2.1" so unfortunately this new version is not a drop-in replacement. There are some minor API changes but more importantly, the diff processing code in get_edit_info will have to be modified in order to adapt to this new version. I can look into making these modifications but looking at the changelog for mwedittypes, it seems like there have also been some changes to how certain types of edits are captured since v1.2.1 and I'm not sure how this would impact the performance of the model so I'm discussing these changes with @Trokhymovych and will make the switch as soon as we're sure about its impact.

Jul 14 2023, 10:49 AM · Patch-For-Review, Research, Machine-Learning-Team

Feb 22 2023

MunizaA updated the task description for T330148: Support the Revert-Review API/tool on Toolforge.
Feb 22 2023, 3:54 PM · Machine-Learning-Team, Lift-Wing

Jan 24 2023

MunizaA added a comment to T323107: [M] Upgrade code base to Spark 3.

@xcollazo I think the reason why Spark 2 does not push this filter down is that it does not infer filters from generators: the optimizer rule InferFiltersFromGenerate in Spark 3.1.2 does not seem to exist in Spark 2.4.4.

Jan 24 2023, 9:28 AM · Structured-Data-Backlog (Current Work), Data Pipelines, Section-Topics

Jan 20 2023

MunizaA added a comment to T323107: [M] Upgrade code base to Spark 3.

I looked into this a little because I wasn't sure why Spark was pushing down the parse UDF. Looking at the query plan for Spark 3, I think what's happening is that because the job does an explode on the UDF's result, Spark pushes down the UDF to filter out any rows with nulls or empty arrays early:

Jan 20 2023, 7:57 PM · Structured-Data-Backlog (Current Work), Data Pipelines, Section-Topics

Dec 19 2022

MunizaA added a comment to T324321: Add option to imagerec/recommendation.py to exclude sections that already have images.

This has been implemented here and reviewed by @diego and @Cparle.

Dec 19 2022, 12:06 PM · Research, Structured-Data-Backlog, Section-Level-Image-Suggestions

Dec 5 2022

MunizaA added a comment to T323613: Test MultilingualRevertRiskModel inference service on ml-sandbox.

This is also related to T323023, but could it be that, since we're sharing a client session between requests, the host header is not getting updated correctly? Maybe we should also log successful request responses to check that we're getting the right language back. For example, all of the above revisions exist in de wikipedia except for the en ones, and if I query for those I get 'badrevids'.

Dec 5 2022, 11:24 AM · Lift-Wing, Machine-Learning-Team

Nov 2 2022

MunizaA added a comment to T321594: Deploy revert-risk-model to production.

Thanks a lot for sharing these results here, @achou! I do see more socket connect errors with increased connections. Is that something we should be concerned about? The wrk docs don't seem to say anything about these errors. Some issues on the repo mention that connect errors in particular can also occur when wrk runs out of file descriptors, but those reports involve opening hundreds of connections, so I'm not sure that's the case here.

Nov 2 2022, 3:38 PM · Machine-Learning-Team, Lift-Wing

Aug 4 2022

MunizaA created P32283 mnz SSH public key for WMF production.
Aug 4 2022, 1:35 PM

Apr 26 2022

MunizaA added a comment to T293511: Expand section aligment to more languages, and share dumps.

Hi @santhosh, I've restored the probability and rank columns for the database and uploaded the new version here. The directory also contains databases with lower threshold scores (0.5 - 0.8). Please let me know if you have any questions, thanks.

Apr 26 2022, 5:07 PM · SectionTranslation, Language-Team (Language-2022-April-June), Research (FY2021-22-Research-April-June)

Apr 3 2022

MunizaA added a comment to T293511: Expand section aligment to more languages, and share dumps.

The following results include the top 100 language pairs by number of section pairs tested. The precision here denotes the probability that, of all the aligned target sections for a source section in our extracted data, the cx dataset translation was among the top 5. Please note that any source sections occurring more than once per (source language, target language) in the cx dataset were counted as one pair and tested by checking if any of the corresponding targets ended up among the top 5.
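As a rough sketch of how the counting rule above could be applied, the Python below groups cx pairs by (source language, target language, source section) and counts a hit if any of the corresponding targets is among the top 5 aligned targets. The function name `top_k_hit_rate`, the data layout, and the choice to skip source sections that are missing from the extracted data are all assumptions made for illustration, not the code used to produce these results.

```python
from collections import defaultdict


def top_k_hit_rate(cx_pairs, aligned, k=5):
    """Share of tested source sections whose cx translation is in the top k.

    cx_pairs: iterable of (source_lang, target_lang, source_section,
              target_section) tuples from the cx dataset.
    aligned:  dict mapping (source_lang, target_lang, source_section) to a
              list of aligned target sections, best match first.
    """
    # A source section occurring more than once per language pair is
    # collapsed into a single entry holding all of its cx targets.
    targets = defaultdict(set)
    for src_lang, tgt_lang, src, tgt in cx_pairs:
        targets[(src_lang, tgt_lang, src)].add(tgt)

    tested = hits = 0
    for key, expected in targets.items():
        ranked = aligned.get(key)
        if ranked is None:
            continue  # assumption: sections absent from our data are not tested
        tested += 1
        if expected & set(ranked[:k]):  # any cx target in the top k is a hit
            hits += 1
    return hits / tested if tested else 0.0


# Hypothetical toy data: "History" appears twice but is counted once.
cx_pairs = [
    ("en", "es", "History", "Historia"),
    ("en", "es", "History", "Antecedentes"),
    ("en", "es", "References", "Referencias"),
]
aligned = {
    ("en", "es", "History"): ["Historia", "Origen"],
    ("en", "es", "References"): [
        "Notas", "Bibliografía", "Enlaces externos",
        "Véase también", "Fuentes", "Referencias",
    ],
}
rate = top_k_hit_rate(cx_pairs, aligned)
```

In this toy data "History" is a hit and "References" is a miss because its cx translation is ranked sixth.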

Apr 3 2022, 5:15 PM · SectionTranslation, Language-Team (Language-2022-April-June), Research (FY2021-22-Research-April-June)

Jan 12 2022

MunizaA added a comment to T293511: Expand section aligment to more languages, and share dumps.

In order to assess the accuracy of our current language model, we tried to replicate the experiment that @diego had run with the FastText embeddings. This involved training a classifier on a portion of the ground truth and then using it to predict the similarity of the remaining section pairs in the ground truth. More specifically, we took our previously generated set of all possible section pairs for the 6 languages used in this experiment and for each pair extracted a bunch of features that describe that pair (the number of times the two sections in it occur together, how similar the links that they contain are on average etc.) which happen to be a subset of the features that Diego used. We then labelled all pairs that are found in the ground truth as 'True' and the rest as 'False'. A classifier using gradient boosting was trained on a portion of this data and was then used to classify the rest of it. We then dense ranked the results from this classifier to evaluate the probability of a pair from the ground truth ending up in the top 5 (precision @ 5).
The results from this experiment came out to be comparable to the previously documented ones. This means that we can use a multilingual model in place of FastText and expect similar results. FastText is monolingual (while similar words within a language have similar vectors, their translations in other languages do not), so using a multilingual model eliminates the need to align vectors from two languages in a single vector space before they can be compared.
The following image depicts the results from the experiment mentioned above. Empty boxes in the chart represent cases where we didn't have enough ground truth.

precision.png (432×720 px, 15 KB)
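The dense-ranking step described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: candidates are grouped per source section, ranked by the classifier's predicted probability, and tied probabilities share a rank. The function names and the toy predictions are hypothetical.

```python
def dense_rank(scores):
    """Return dense ranks (1 = highest score).

    Tied scores share a rank and no rank numbers are skipped.
    """
    distinct = sorted(set(scores), reverse=True)
    rank_of = {score: rank for rank, score in enumerate(distinct, start=1)}
    return [rank_of[score] for score in scores]


def precision_at_k(predictions, k=5):
    """Fraction of ground-truth pairs whose dense rank is at most k.

    predictions: dict mapping a source section to a list of
                 (candidate, probability, is_ground_truth) tuples.
    """
    total = hits = 0
    for candidates in predictions.values():
        ranks = dense_rank([prob for _, prob, _ in candidates])
        for (_, _, is_true), rank in zip(candidates, ranks):
            if is_true:
                total += 1
                hits += rank <= k
    return hits / total if total else 0.0


# Hypothetical classifier output for two source sections.
predictions = {
    "History": [("Historia", 0.9, True), ("Origen", 0.4, False)],
    "See also": [
        ("Notas", 0.9, False), ("Fuentes", 0.8, False),
        ("Bibliografía", 0.7, False), ("Referencias", 0.6, False),
        ("Enlaces externos", 0.5, False), ("Véase también", 0.4, True),
    ],
}
score = precision_at_k(predictions)
```

With dense ranking, tied candidates do not push later candidates further down, so a ground-truth pair counts as a top-5 hit whenever at most four distinct scores are higher than its own.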

Jan 12 2022, 9:51 PM · SectionTranslation, Language-Team (Language-2022-April-June), Research (FY2021-22-Research-April-June)

Dec 6 2021

MunizaA added a comment to T293511: Expand section aligment to more languages, and share dumps.

We experimented with multiple pre-trained models from sentence-transformers to find a multilingual model that can accurately and efficiently encode section headings. We've found that paraphrase-xlm-r-multilingual-v1 provides the most accurate and consistent results across multiple languages for our use case. It maps sentences to a 768-dimensional shared vector space, and the resulting vectors can then be used to calculate cosine similarity between co-occurring sections.
The following results were obtained by running the same model evaluation experiments for different language pairs. We evaluate models by first aligning articles in two languages using their wikidata id. We then take the sections from those aligned articles and generate all possible combinations. The selected model is then used to encode these section pairs and calculate their similarity. We then rank these pairs by similarity for each section and check the rank of the true section translation (the one that's in our dataset). Note that these results only contain language pairs for which we had more than 20 records in our dataset.
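The ranking step of this evaluation can be sketched as below. To keep the example self-contained it uses hand-written 3-dimensional toy vectors instead of real model output; in practice the vectors would be the 768-dimensional encodings of the section headings. The function names and the sample headings are assumptions for illustration only.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_of_true_translation(source_vec, candidates, true_heading):
    """1-based rank of the true translation among candidate target sections.

    candidates: dict mapping a target section heading to its vector.
    """
    ranked = sorted(
        candidates,
        key=lambda heading: cosine_similarity(source_vec, candidates[heading]),
        reverse=True,
    )
    return ranked.index(true_heading) + 1


# Toy vectors standing in for encoded headings (hypothetical values).
source = [1.0, 0.0, 0.0]  # e.g. the encoding of "History"
candidates = {
    "Historia": [0.9, 0.1, 0.0],
    "Referencias": [0.0, 1.0, 0.0],
    "Geografía": [0.5, 0.5, 0.0],
}
rank = rank_of_true_translation(source, candidates, "Historia")
```

Collecting this rank for every source section in a language pair gives the distribution that the per-pair results summarise.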

Dec 6 2021, 11:49 AM · SectionTranslation, Language-Team (Language-2022-April-June), Research (FY2021-22-Research-April-June)

Oct 13 2021

MunizaA added a comment to T292955: Requesting access to Analytic Cluster for Muniza.

@MunizaA can you confirm that this wikitech user is you? https://ldap.toolforge.org/user/mnz

Also would you rather have mnza0001@gmail.com (from that wikitech account) or munaslam001@gmail.com (from this ticket) associated with this shell account?

Oct 13 2021, 3:38 PM · SRE, SRE-Access-Requests
MunizaA added a comment to T292955: Requesting access to Analytic Cluster for Muniza.

@CDanis I've signed it now. Thanks!

Oct 13 2021, 9:13 AM · SRE, SRE-Access-Requests

Oct 12 2021

MunizaA updated the task description for T292955: Requesting access to Analytic Cluster for Muniza.
Oct 12 2021, 8:25 AM · SRE, SRE-Access-Requests
MunizaA created P17452 MunizaA SSH public key for WMF production.
Oct 12 2021, 8:21 AM

Mar 31 2021

MunizaA updated MunizaA.
Mar 31 2021, 6:12 AM