安装 SGLang

目录

安装 SGLang#

你可以使用以下任何方法安装 SGLang。

方法 1：使用 pip#

pip install --upgrade pip
pip install "sglang[all]"

# Install FlashInfer CUDA kernels
pip install flashinfer -i https://flashinfer.ai/whl/cu121/torch2.4/

方法 2：从源代码安装#

# Use the last release branch
git clone -b v0.3.0 https://github.com/sgl-project/sglang.git
cd sglang

pip install --upgrade pip
pip install -e "python[all]"

# Install FlashInfer CUDA kernels
pip install flashinfer -i https://flashinfer.ai/whl/cu121/torch2.4/

方法 3：使用 Docker#

Docker 镜像在 Docker Hub 上以 lmsysorg/sglang 的形式提供，构建自 Dockerfile。将下面的 <secret> 替换为你的 huggingface hub token。

docker run --gpus all \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server --model-path meta-llama/Meta-Llama-3.1-8B-Instruct --host 0.0.0.0 --port 30000

方法 4：使用 Docker Compose#

More

如果你打算将其作为服务提供，建议使用此方法。更好的方法是使用 k8s-sglang-service.yaml。

将 compose.yml 复制到你的本地机器
在你的终端中执行命令 docker compose up -d。

方法 5：在 Kubernetes 或云上使用 SkyPilot 运行#

More

要部署到 Kubernetes 或 12+ 云，你可以使用 SkyPilot。

安装 SkyPilot 并设置 Kubernetes 集群或云访问：请参阅 SkyPilot 文档。
使用单个命令在你的基础设施上部署并获取 HTTP API 端点：

SkyPilot YAML: sglang.yaml

# sglang.yaml
envs:
  HF_TOKEN: null

resources:
  image_id: docker:lmsysorg/sglang:latest
  accelerators: A100
  ports: 30000

run: |
  conda deactivate
  python3 -m sglang.launch_server \
    --model-path meta-llama/Meta-Llama-3.1-8B-Instruct \
    --host 0.0.0.0 \
    --port 30000

# Deploy on any cloud or Kubernetes cluster. Use --cloud <cloud> to select a specific cloud provider.
HF_TOKEN=<secret> sky launch -c sglang --env HF_TOKEN sglang.yaml

# Get the HTTP API endpoint
sky status --endpoint 30000 sglang

要使用自动缩放和故障恢复进一步扩展你的部署，请查看 SkyServe + SGLang 指南。

常见说明#

FlashInfer 是默认的注意力内核后端。它只支持 sm75 及更高版本。如果你在 sm75+ 设备（例如 T4、A10、A100、L4、L40S、H100）上遇到任何与 FlashInfer 相关的問題，请通过添加 --attention-backend triton --sampling-backend pytorch 切换到其他内核，并在 GitHub 上打开一个问题。
如果你只需要使用 OpenAI 后端，你可以使用 pip install "sglang[openai]" 来避免安装其他依赖项。