Rawal Khirodkar · Timur Bagautdinov · Julieta Martinez · Su Zhaoen · Austin James · Peter Selednik · Stuart Anderson · Shunsuke Saito
Sapiens offers a comprehensive suite for human-centric vision tasks (e.g., 2D pose, part segmentation, depth, and normal estimation). The model family is pretrained on 300 million in-the-wild human images and generalizes well to unconstrained conditions. The models are also designed for extracting high-resolution features: they are natively trained at a 1024 × 1024 image resolution with a 16-pixel patch size.
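To get a feel for what the native resolution implies, the arithmetic below counts how many patches a single image is split into. This is only a sketch: it assumes the image is tiled into non-overlapping square patches, and the variable names are illustrative rather than taken from the Sapiens code.

```shell
image_size=1024   # native training resolution in pixels (from the description above)
patch_size=16     # patch side length in pixels (from the description above)

# Assumption: non-overlapping square patches, so each side holds image_size / patch_size patches.
patches_per_side=$(( image_size / patch_size ))
num_patches=$(( patches_per_side * patches_per_side ))

echo "${patches_per_side} x ${patches_per_side} = ${num_patches} patches per image"
```

Under that assumption, each 1024 × 1024 input yields a 64 x 64 grid, i.e. 4096 patches.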
git clone https://github.com/facebookresearch/sapiens.git
export SAPIENS_ROOT=/path/to/sapiens
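Before moving on, it can help to confirm that `SAPIENS_ROOT` really points at the clone. The helper below is a hypothetical convenience function, not part of the Sapiens repository; it only checks that the variable is set and that the `_install` folder used by the installation step exists.

```shell
# Hypothetical helper (not shipped with Sapiens): returns 0 if SAPIENS_ROOT looks like a clone.
check_sapiens_root() {
    if [ -z "${SAPIENS_ROOT:-}" ]; then
        echo "SAPIENS_ROOT is not set" >&2
        return 1
    fi
    # The installation step expects an _install directory inside the clone.
    if [ ! -d "$SAPIENS_ROOT/_install" ]; then
        echo "No _install directory under $SAPIENS_ROOT" >&2
        return 1
    fi
    echo "SAPIENS_ROOT looks fine: $SAPIENS_ROOT"
}

check_sapiens_root || echo "Fix SAPIENS_ROOT before continuing."
```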
For users setting up their own environment primarily for running existing models in inference mode, we recommend the Sapiens-Lite installation.
This setup offers optimized inference (4x faster) with minimal dependencies (only PyTorch + numpy + cv2).
To replicate our complete training setup, run the provided installation script. This will create a new conda environment named sapiens and install all necessary dependencies.
cd $SAPIENS_ROOT/_install
./conda.sh
Please download the original checkpoints from Hugging Face. You can download only the checkpoints you are interested in.
Set $SAPIENS_CHECKPOINT_ROOT to the path of the sapiens_host folder. Place the checkpoints following this directory structure:
sapiens_host/
├── detector/
│   └── checkpoints/
│       └── rtmpose/
├── pretrain/
│   └── checkpoints/
│       ├── sapiens_0.3b/
│       │   └── sapiens_0.3b_epoch_1600_clean.pth
│       ├── sapiens_0.6b/
│       │   └── sapiens_0.6b_epoch_1600_clean.pth
│       ├── sapiens_1b/
│       └── sapiens_2b/
├── pose/
│   └── checkpoints/
│       └── sapiens_0.3b/
├── seg/
├── depth/
└── normal/
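The layout above can be prepared ahead of the download with a few `mkdir -p` calls. This is a minimal sketch: the default location `$HOME/sapiens_host` is an assumption (use whatever path you prefer), and only the folders and file names listed in the tree are created or checked.

```shell
# Assumption: checkpoints live under $HOME/sapiens_host unless the variable is already set.
export SAPIENS_CHECKPOINT_ROOT="${SAPIENS_CHECKPOINT_ROOT:-$HOME/sapiens_host}"

# Create the folders from the tree; they stay empty until checkpoints are downloaded.
mkdir -p "$SAPIENS_CHECKPOINT_ROOT/detector/checkpoints/rtmpose"
for size in sapiens_0.3b sapiens_0.6b sapiens_1b sapiens_2b; do
    mkdir -p "$SAPIENS_CHECKPOINT_ROOT/pretrain/checkpoints/$size"
done
mkdir -p "$SAPIENS_CHECKPOINT_ROOT/pose/checkpoints/sapiens_0.3b"
mkdir -p "$SAPIENS_CHECKPOINT_ROOT/seg" "$SAPIENS_CHECKPOINT_ROOT/depth" "$SAPIENS_CHECKPOINT_ROOT/normal"

# Report which of the pretrained checkpoint files named in the tree are not in place yet.
for f in sapiens_0.3b/sapiens_0.3b_epoch_1600_clean.pth \
         sapiens_0.6b/sapiens_0.6b_epoch_1600_clean.pth; do
    if [ ! -f "$SAPIENS_CHECKPOINT_ROOT/pretrain/checkpoints/$f" ]; then
        echo "missing: pretrain/checkpoints/$f"
    fi
done
```

Since you can be selective about downloads, a "missing" line is only a reminder, not an error.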
We finetune Sapiens for multiple human-centric vision tasks. Please check out the list below.
Finetuning our models is super-easy! Here is a detailed training guide for the following tasks.
We would like to acknowledge the work by OpenMMLab which this project benefits from.
For any questions or issues, please open an issue in the repository.
See contributing and the code of conduct.
This project is licensed under LICENSE.
Portions derived from open-source projects are licensed under Apache 2.0.
If you use Sapiens in your research, please consider citing us.
@misc{khirodkar2024_sapiens,
title={Sapiens: Foundation for Human Vision Models},
author={Khirodkar, Rawal and Bagautdinov, Timur and Martinez, Julieta and Zhaoen, Su and James, Austin and Selednik, Peter and Anderson, Stuart and Saito, Shunsuke},
year={2024},
eprint={2408.12569},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2408.12569}
}