Access DeepSeek's state-of-the-art AI models for local deployment and integration into your applications.
Choose from our range of powerful AI models tailored for different use cases.
The latest version of our flagship model, featuring enhanced reasoning capabilities and improved multilingual support. Released on March 24, 2025, this model represents our most advanced AI system with superior performance across a wide range of tasks.
Model | Total Params | Activated Params | Context Length | Download |
---|---|---|---|---|
DeepSeek-V3-0324 | 660B | 37B | 128K | Download |
DeepSeek-V3-0324 uses the same base model as the previous DeepSeek-V3, with only improvements in post-training methods. For private deployment, you only need to update the checkpoint and tokenizer_config.json (tool call-related changes). The model has approximately 660B parameters, and the open-source version offers a 128K context length (while the web, app, and API provide 64K context).
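For example, assuming the Hugging Face CLI (`huggingface_hub`) is installed, the updated checkpoint can be fetched like this (the local directory path is illustrative):

```bash
# Fetch the DeepSeek-V3-0324 weights from Hugging Face (target directory is illustrative)
huggingface-cli download deepseek-ai/DeepSeek-V3-0324 --local-dir /path/to/DeepSeek-V3-0324
```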
Our powerful general-purpose AI model with exceptional reasoning, comprehension, and generation capabilities. DeepSeek-V3 excels at complex problem-solving and demonstrates strong performance in technical domains.
NOTE
The total size of DeepSeek-V3 models on Hugging Face is 685B, which includes 671B of the Main Model weights and 14B of the Multi-Token Prediction (MTP) Module weights.
To ensure optimal performance and flexibility, DeepSeek has partnered with open-source communities and hardware vendors to provide multiple ways to run the model locally. For step-by-step guidance, check out the "How to Run Locally" section below.
Specialized for advanced reasoning tasks, DeepSeek-R1 delivers outstanding performance in mathematics, coding, and logical reasoning challenges. Built with reinforcement learning techniques, it offers unparalleled problem-solving abilities.
DeepSeek-R1-Zero & DeepSeek-R1 are trained based on DeepSeek-V3-Base. For more details regarding the model architecture, please refer to the DeepSeek-V3 repository.
DeepSeek-R1-Distill models are fine-tuned based on open-source models, using samples generated by DeepSeek-R1. We slightly change their configs and tokenizers. Please use our settings to run these models.
Model | Base Model | Download |
---|---|---|
DeepSeek-R1-Distill-Qwen-1.5B | Qwen2.5-Math-1.5B | Download |
DeepSeek-R1-Distill-Qwen-7B | Qwen2.5-Math-7B | Download |
DeepSeek-R1-Distill-Llama-8B | Llama-3.1-8B | Download |
DeepSeek-R1-Distill-Qwen-14B | Qwen2.5-14B | Download |
DeepSeek-R1-Distill-Qwen-32B | Qwen2.5-32B | Download |
DeepSeek-R1-Distill-Llama-70B | Llama-3.3-70B-Instruct | Download |
DeepSeek models can be deployed locally using various hardware and open-source community software.
DeepSeek-V3 can be deployed locally using the following hardware and open-source community software:
Since FP8 training is natively adopted in our framework, we only provide FP8 weights. If you require BF16 weights for experimentation, you can use the provided conversion script to perform the transformation.
Here is an example of converting FP8 weights to BF16:
cd inference
python fp8_cast_bf16.py --input-fp8-hf-path /path/to/fp8_weights --output-bf16-hf-path /path/to/bf16_weights
NOTE
Hugging Face's Transformers is not yet directly supported.
NOTE
Linux with Python 3.10 only. Mac and Windows are not supported.
Dependencies:
torch==2.4.1
triton==3.0.0
transformers==4.46.3
safetensors==0.4.5
First, clone the DeepSeek-V3 GitHub repository:
git clone https://github.com/deepseek-ai/DeepSeek-V3.git
Navigate to the `inference` folder and install the dependencies listed in `requirements.txt`. The easiest way is to use a package manager such as `conda` or `uv` to create a new virtual environment and install the dependencies there.
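For example, a fresh environment created with `conda` might look like the following (the environment name is arbitrary; a `uv` virtual environment works just as well):

```bash
# Create and activate an isolated Python 3.10 environment (name is arbitrary)
conda create -n deepseek-v3 python=3.10 -y
conda activate deepseek-v3
```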
cd DeepSeek-V3/inference
pip install -r requirements.txt
Download the model weights from Hugging Face and place them in the `/path/to/DeepSeek-V3` folder.
Convert the Hugging Face model weights into the format expected by the demo inference code:
python convert.py --hf-ckpt-path /path/to/DeepSeek-V3 --save-path /path/to/DeepSeek-V3-Demo --n-experts 256 --model-parallel 16
Then you can chat with DeepSeek-V3:
torchrun --nnodes 2 --nproc-per-node 8 --node-rank $RANK --master-addr $ADDR generate.py --ckpt-path /path/to/DeepSeek-V3-Demo --config configs/config_671B.json --interactive --temperature 0.7 --max-new-tokens 200
Or batch inference on a given file:
torchrun --nnodes 2 --nproc-per-node 8 --node-rank $RANK --master-addr $ADDR generate.py --ckpt-path /path/to/DeepSeek-V3-Demo --config configs/config_671B.json --input-file $FILE
SGLang currently supports MLA optimizations, DP Attention, FP8 (W8A8), FP8 KV Cache, and Torch Compile, delivering state-of-the-art latency and throughput performance among open-source frameworks.
Notably, SGLang v0.4.1 fully supports running DeepSeek-V3 on both NVIDIA and AMD GPUs, making it a highly versatile and robust solution.
SGLang also supports multi-node tensor parallelism, enabling you to run this model on multiple network-connected machines.
Multi-Token Prediction (MTP) is in development, and progress can be tracked in the optimization plan.
For detailed launch instructions, please refer to the guidance published by the SGLang team.
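As a minimal single-node sketch (assuming SGLang is installed and mirroring the flags used for the distilled models below; adjust `--tp` to your GPU count and hardware):

```bash
# Serve DeepSeek-V3 with SGLang across 8 GPUs via tensor parallelism (adjust --tp for your setup)
python3 -m sglang.launch_server --model deepseek-ai/DeepSeek-V3 --tp 8 --trust-remote-code
```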
LMDeploy, a flexible and high-performance inference and serving framework tailored for large language models, now supports DeepSeek-V3. It offers both offline pipeline processing and online deployment capabilities, seamlessly integrating with PyTorch-based workflows.
For comprehensive step-by-step instructions on running DeepSeek-V3 with LMDeploy, please refer to the LMDeploy instructions.
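As a rough sketch of the online-serving path (flag names and defaults may differ between LMDeploy versions, so treat this as an assumption to verify against the LMDeploy docs):

```bash
# Start an OpenAI-compatible API server with LMDeploy's PyTorch backend (sketch; verify flags for your version)
lmdeploy serve api_server deepseek-ai/DeepSeek-V3 --backend pytorch --tp 8
```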
TensorRT-LLM now supports the DeepSeek-V3 model, offering precision options such as BF16 and INT4/INT8 weight-only. Support for FP8 is currently in progress and will be released soon. A custom branch of TensorRT-LLM with DeepSeek-V3 support is available if you want to try the new features directly.
vLLM v0.6.6 supports DeepSeek-V3 inference in FP8 and BF16 modes on both NVIDIA and AMD GPUs. Aside from standard techniques, vLLM offers pipeline parallelism, allowing you to run this model on multiple network-connected machines. For detailed guidance, please refer to the vLLM instructions, and feel free to follow the enhancement plan as well.
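For instance, a serving sketch that combines tensor and pipeline parallelism (the parallel sizes are illustrative and depend on your cluster):

```bash
# Serve DeepSeek-V3 with vLLM, sharding across GPUs (tensor parallel) and nodes (pipeline parallel)
vllm serve deepseek-ai/DeepSeek-V3 --tensor-parallel-size 8 --pipeline-parallel-size 2 --trust-remote-code
```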
In collaboration with the AMD team, DeepSeek has achieved Day-One support for AMD GPUs using SGLang, with full compatibility for both FP8 and BF16 precision. For detailed guidance, please refer to the SGLang instructions.
The MindIE framework from the Huawei Ascend community has successfully adapted the BF16 version of DeepSeek-V3. For step-by-step guidance on Ascend NPUs, please follow the MindIE instructions.
Please visit the DeepSeek-V3 deployment section above for more information about running DeepSeek-R1 locally.
NOTE
Hugging Face's Transformers is not yet directly supported.
DeepSeek-R1-Distill models can be utilized in the same manner as Qwen or Llama models.
For instance, you can easily start a service using vLLM:
vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-32B --tensor-parallel-size 2 --max-model-len 32768 --enforce-eager
You can also easily start a service using SGLang:
python3 -m sglang.launch_server --model deepseek-ai/DeepSeek-R1-Distill-Qwen-32B --trust-remote-code --tp 2
We recommend adhering to the following configurations when utilizing the DeepSeek-R1 series models, including benchmarking, to achieve the expected performance:

1. Set the temperature within the range of 0.5-0.7 (0.6 is recommended) to prevent endless repetitions or incoherent outputs.
2. Avoid adding a system prompt; all instructions should be contained within the user prompt.
3. For mathematical problems, it is advisable to include a directive in your prompt such as: "Please reason step by step, and put your final answer within \boxed{}."
4. When evaluating model performance, it is recommended to conduct multiple tests and average the results.
Additionally, we have observed that the DeepSeek-R1 series models tend to bypass the thinking pattern (i.e., outputting an empty `<think></think>` block) when responding to certain queries, which can adversely affect the model's performance. To ensure that the model engages in thorough reasoning, we recommend forcing the model to initiate its response with `<think>` at the beginning of every output.
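As an illustration, here is a request to an OpenAI-compatible endpoint exposed by one of the servers above (the host, port, and served model name are assumptions) using the recommended sampling settings and no system prompt:

```bash
# Query a locally served R1-series model with the recommended settings (host/port/model are assumptions)
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B",
        "messages": [
          {"role": "user", "content": "Please reason step by step, and put your final answer within \\boxed{}. What is 17 * 24?"}
        ],
        "temperature": 0.6,
        "top_p": 0.95,
        "max_tokens": 4096
      }'
```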
DeepSeek-V3-0324 uses the same base model as the previous DeepSeek-V3, with only improvements in post-training methods. For private deployment, you only need to update the checkpoint and tokenizer_config.json (tool call-related changes).
The deployment options and frameworks for DeepSeek-V3-0324 are identical to those for DeepSeek-V3 described above. All the same toolkits (SGLang, LMDeploy, TensorRT-LLM, vLLM) support DeepSeek-V3-0324 with the same configuration options.
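For example, the vLLM sketch above only needs the model ID swapped (parallelism sizes remain illustrative):

```bash
# Same serving command as DeepSeek-V3, pointed at the 0324 checkpoint
vllm serve deepseek-ai/DeepSeek-V3-0324 --tensor-parallel-size 8 --pipeline-parallel-size 2 --trust-remote-code
```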
Information about the licenses under which DeepSeek models are released
Consistent with DeepSeek-R1, our open-source repository (including model weights) uniformly adopts the MIT License, and allows users to leverage model outputs and distillation methods to train other models.
This code repository is licensed under the MIT License. The use of DeepSeek-V3 Base/Chat models is subject to the Model License. The DeepSeek-V3 series (including Base and Chat) supports commercial use.
This code repository and the model weights are licensed under the MIT License. The DeepSeek-R1 series supports commercial use and allows for any modifications and derivative works, including, but not limited to, distillation for training other LLMs. Please note that models like DeepSeek-R1-Distill-Qwen and DeepSeek-R1-Distill-Llama are derived from their respective base models with their original licenses.
DeepSeek models are provided "as is" without any express or implied warranties. Users should use the models at their own risk and ensure compliance with relevant laws and regulations. DeepSeek is not liable for any damages resulting from the use of these models.