This repository serves the ALREADYME model with FastAPI.
- torch
- fastapi[all]
- omegaconf
- transformers
- loguru
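The dependencies above can be installed directly; if the repository ships a requirements file, prefer that instead. The quotes keep your shell from expanding the brackets:

$ pip install torch "fastapi[all]" omegaconf transformers loguru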
Before starting the server, the fine-tuned model weights are required. Since building a transformers pipeline from scratch is extremely slow, we pickle the fully initialized pipeline to speed up server startup. Because of that, a one-time conversion is needed:
import torch
from transformers import pipeline

# Build the pipeline once, then pickle the whole object for fast restarts.
pipe = pipeline("text-generation", "bloom-1b7-finetuned-readme-270k-steps", torch_dtype=torch.float16, device=0)
torch.save(pipe, "bloom-1b7-finetuned-readme-270k-steps/pipeline.pt")
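At startup, the server can then restore the pickled pipeline in a single call instead of rebuilding it. A minimal sketch of the loading side, assuming the file has been moved under app/resources (the path is illustrative; use the one set in your config):

import torch

# Unpickle the whole pipeline; map_location places its tensors on the GPU.
pipe = torch.load("app/resources/pipeline.pt", map_location="cuda:0")
print(pipe("# my-project", max_new_tokens=16)[0]["generated_text"])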
Move the transformer model to app/resources and change the path in app/resources/config.yaml.
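Since omegaconf is among the dependencies, the config is presumably loaded with it. A quick way to verify your path, assuming a hypothetical model_path key (match whatever key config.yaml actually defines):

from omegaconf import OmegaConf

# Load the YAML config; "model_path" is a hypothetical key name.
config = OmegaConf.load("app/resources/config.yaml")
print(config.model_path)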
We recommend building a Docker image rather than running the server locally. Still, it is worth running it once before building the image to catch any bugs in the code or in your fine-tuned model.
$ cd app
$ uvicorn main:app --host [your ip address] --port [your port]
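Once the server is up, you can smoke-test it without knowing the route names by fetching the OpenAPI schema that FastAPI serves by default (adjust host and port to yours):

$ curl http://[your ip address]:[your port]/openapi.json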
We do not provide any pre-built image yet. Build your own image with your custom fine-tuned model!
$ docker build -t alreadyme-ai-serving:v0.1.2 -f Dockerfile \
    --build-arg CUDA_VER=11.6.1 \
    --build-arg CUDNN_VER=8 \
    --build-arg UBUNTU_VER=18.04 \
    --build-arg PYTHON_VER=39 \
    .
You can change the versions of CUDA, cuDNN, Ubuntu, and Python, which can be useful for compatibility with different cloud environments. After building your image, run the container with:
$ docker run --gpus all -p 8080:80 alreadyme-ai-serving:v0.1.2
The Docker container launches the server on port 80, so you should bind it to your own port number (e.g. 8080).
alreadyme-ai-serving supports OpenAPI, so you can browse the API documentation served by your own instance. If the server is running locally, check out http://127.0.0.1:8080/docs for Swagger UI or http://127.0.0.1:8080/redoc for ReDoc.
For convenience, we host a free ReDoc documentation page. You may log in to see the details.
alreadyme-ai-serving is released under the Apache License 2.0. The license can be found here.