Install vLLM

```shell
# Install vLLM with CUDA 12.8.
# If you are using pip:
pip install vllm --extra-index-url https://download.pytorch.org/whl/cu128
# If you are using uv:
uv pip install vllm --torch-backend=auto
```
Download the Model

Before downloading, install ModelScope with the following command:

```shell
pip install modelscope
```
Download the model to a local directory (example: Qwen/Qwen2.5-32B-Instruct-AWQ) using the ModelScope community site:

- Create a ModelScope account.
- Open the model library, select the model you want, and click the download button to see the download command; choose the SDK download option.
- Below is a batch-download Python script; run it with `python download.py`.
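The original post does not show the script itself, so here is a minimal sketch of what `download.py` might look like, built on ModelScope's `snapshot_download` SDK call; the model list and target directory are assumptions you should adjust:

```python
# download.py - batch-download models from ModelScope.
# A minimal sketch; MODELS and TARGET_DIR are assumptions to adjust.

MODELS = [
    "Qwen/Qwen2.5-32B-Instruct-AWQ",
]
TARGET_DIR = "./models"

def download_all(models, target_dir):
    # Import inside the function so the file can be imported
    # even where modelscope is not installed.
    from modelscope import snapshot_download

    paths = []
    for model_id in models:
        # snapshot_download fetches all files of the repo into target_dir
        # and returns the local path of the downloaded model.
        paths.append(snapshot_download(model_id, cache_dir=target_dir))
    return paths

if __name__ == "__main__":
    for path in download_all(MODELS, TARGET_DIR):
        print("downloaded to", path)
```

The download only runs under `__main__`, so the file doubles as an importable helper.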
Serve the Model

- `--served-model-name`: model name exposed through the API
- `--port`: service port
- `--gpu-memory-utilization`: fraction of GPU memory to use
- `--api-key`: API key for the service
- `&`: launch in the background
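The full serve command is not shown in the original, so here is a sketch assembling the flags above; the local model path and utilization value are assumptions, while the name, port, and key mirror the curl tests below:

```shell
# A sketch of the vLLM serve command; adjust the model path to where
# download.py placed the weights.
vllm serve ./models/Qwen/Qwen2.5-32B-Instruct-AWQ \
  --served-model-name Qwen2.5-32B-Instruct-AWQ \
  --port 8000 \
  --gpu-memory-utilization 0.9 \
  --api-key token-abc123 &
```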
Test the Model

With an API key:
```shell
curl http://192.168.3.52:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer token-abc123" \
  -d '{
    "model": "Qwen2.5-32B-Instruct-AWQ",
    "messages": [
      {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
      {"role": "user", "content": "Who won the world series in 2020?"}
    ],
    "temperature": 0.7,
    "top_p": 0.8,
    "repetition_penalty": 1.05,
    "max_tokens": 512
  }'
```
Without an API key:
```shell
curl http://192.168.3.52:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen2.5-32B-Instruct-AWQ",
    "messages": [
      {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
      {"role": "user", "content": "Who won the world series in 2020?"}
    ],
    "temperature": 0.7,
    "top_p": 0.8,
    "repetition_penalty": 1.05,
    "max_tokens": 512
  }'
```
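For scripting, the same test can be run from Python with only the standard library. This sketch mirrors the curl examples; the host, port, key, and model name are the same assumed values as above:

```python
import json
import urllib.request

API_URL = "http://192.168.3.52:8000/v1/chat/completions"
API_KEY = "token-abc123"  # pass api_key=None against the no-key server

def chat(messages, api_key=None):
    # Same request body as the curl examples.
    payload = {
        "model": "Qwen2.5-32B-Instruct-AWQ",
        "messages": messages,
        "temperature": 0.7,
        "top_p": 0.8,
        "repetition_penalty": 1.05,
        "max_tokens": 512,
    }
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = "Bearer " + api_key
    req = urllib.request.Request(
        API_URL, data=json.dumps(payload).encode("utf-8"), headers=headers
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(chat([
        {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
        {"role": "user", "content": "Who won the world series in 2020?"},
    ], api_key=API_KEY))
```

The network call only happens under `__main__`, so the `chat` helper can be imported elsewhere.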