Download DeepSeek AI Models

Access DeepSeek's state-of-the-art AI models for local deployment and integration into your applications.

Available Models

Choose from our range of powerful AI models tailored for different use cases.

DeepSeek-V3-0324

The latest version of our flagship model, featuring enhanced reasoning capabilities and improved multilingual support. Released on March 24, 2025, this model represents our most advanced AI system with superior performance across a wide range of tasks.

DeepSeek-V3-0324 Models

| Model | Total Params | Activated Params | Context Length | Download |
| --- | --- | --- | --- | --- |
| DeepSeek-V3-0324 | 660B | 37B | 128K | Download |

DeepSeek-V3-0324 uses the same base model as the previous DeepSeek-V3, with improvements only in post-training methods. For private deployment, you only need to update the checkpoint and tokenizer_config.json (changes related to tool calls). The model has approximately 660B parameters, and the open-source version offers a 128K context length (while the web, app, and API provide 64K context).

How to Run Locally

DeepSeek models can be deployed locally using various hardware and open-source community software.

1. DeepSeek-V3 Deployment

DeepSeek-V3 can be deployed locally using the following hardware and open-source community software:

  1. DeepSeek-Infer Demo: DeepSeek provides a simple and lightweight demo for FP8 and BF16 inference.
  2. SGLang: Fully supports the DeepSeek-V3 model in both BF16 and FP8 inference modes, with Multi-Token Prediction coming soon.[1]
  3. LMDeploy: Enables efficient FP8 and BF16 inference for local and cloud deployment.
  4. TensorRT-LLM: Currently supports BF16 inference and INT4/8 quantization, with FP8 support coming soon.
  5. vLLM: Supports the DeepSeek-V3 model in FP8 and BF16 modes for tensor parallelism and pipeline parallelism.
  6. AMD GPU: Enables running the DeepSeek-V3 model on AMD GPUs via SGLang in both BF16 and FP8 modes.
  7. Huawei Ascend NPU: Supports running DeepSeek-V3 on Huawei Ascend devices.

Since FP8 training is natively adopted in our framework, we only provide FP8 weights. If you require BF16 weights for experimentation, you can use the provided conversion script to perform the transformation.

Here is an example of converting FP8 weights to BF16:

cd inference
python fp8_cast_bf16.py --input-fp8-hf-path /path/to/fp8_weights --output-bf16-hf-path /path/to/bf16_weights

NOTE

Hugging Face's Transformers has not been directly supported yet.

1.1 Inference with DeepSeek-Infer Demo (example only)

System Requirements

NOTE

Linux with Python 3.10 only. Mac and Windows are not supported.

Dependencies:

torch==2.4.1
triton==3.0.0
transformers==4.46.3
safetensors==0.4.5

Model Weights & Demo Code Preparation

First, clone the DeepSeek-V3 GitHub repository:

git clone https://github.com/deepseek-ai/DeepSeek-V3.git

Navigate to the `inference` folder and install the dependencies listed in `requirements.txt`. The easiest way is to use a package manager such as `conda` or `uv` to create a new virtual environment and install the dependencies.

cd DeepSeek-V3/inference
pip install -r requirements.txt
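
If you prefer an isolated environment, a conda-based setup might look like the following (a sketch only; the environment name is illustrative, and `uv` works analogously):

conda create -n deepseek-v3 python=3.10 -y
conda activate deepseek-v3
pip install -r requirements.txt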

Download the model weights from Hugging Face and place them in the `/path/to/DeepSeek-V3` folder.
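
One way to fetch the weights is with the Hugging Face CLI (a sketch, assuming `huggingface_hub` is installed; the target directory is the same path you will pass to the conversion step below):

huggingface-cli download deepseek-ai/DeepSeek-V3 --local-dir /path/to/DeepSeek-V3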

Model Weights Conversion

Convert Hugging Face model weights to a specific format:

python convert.py --hf-ckpt-path /path/to/DeepSeek-V3 --save-path /path/to/DeepSeek-V3-Demo --n-experts 256 --model-parallel 16

Run

Then you can chat with DeepSeek-V3:

torchrun --nnodes 2 --nproc-per-node 8 --node-rank $RANK --master-addr $ADDR generate.py --ckpt-path /path/to/DeepSeek-V3-Demo --config configs/config_671B.json --interactive --temperature 0.7 --max-new-tokens 200

Or batch inference on a given file:

torchrun --nnodes 2 --nproc-per-node 8 --node-rank $RANK --master-addr $ADDR generate.py --ckpt-path /path/to/DeepSeek-V3-Demo --config configs/config_671B.json --input-file $FILE

1.2 Inference with SGLang (recommended)

SGLang currently supports MLA optimizations, DP Attention, FP8 (W8A8), FP8 KV Cache, and Torch Compile, delivering state-of-the-art latency and throughput performance among open-source frameworks.[1][2][3]

Notably, SGLang v0.4.1 fully supports running DeepSeek-V3 on both NVIDIA and AMD GPUs, making it a highly versatile and robust solution.[1]

SGLang also supports multi-node tensor parallelism, enabling you to run this model on multiple network-connected machines.[1]

Multi-Token Prediction (MTP) is in development, and progress can be tracked in the optimization plan.[1]

Launch instructions are provided by the SGLang team.[1]
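
As a rough sketch only (the official commands from the SGLang team take precedence), a single-node launch across 8 GPUs might look like this; the tensor-parallel degree is an assumption about your hardware:

python3 -m sglang.launch_server --model deepseek-ai/DeepSeek-V3 --tp 8 --trust-remote-code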

1.3 Inference with LMDeploy (recommended)

LMDeploy, a flexible and high-performance inference and serving framework tailored for large language models, now supports DeepSeek-V3. It offers both offline pipeline processing and online deployment capabilities, seamlessly integrating with PyTorch-based workflows.[1]

For comprehensive step-by-step instructions on running DeepSeek-V3 with LMDeploy, please refer to the LMDeploy guide.[1]
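
For orientation only (the LMDeploy guide referenced above is authoritative), an OpenAI-compatible server launch typically looks like the following; the tensor-parallel degree is an assumption:

lmdeploy serve api_server deepseek-ai/DeepSeek-V3 --tp 8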

1.4 Inference with TRT-LLM (recommended)

TensorRT-LLM now supports the DeepSeek-V3 model, offering precision options such as BF16 and INT4/INT8 weight-only. Support for FP8 is currently in progress and will be released soon. You can access the custom TRT-LLM branch with DeepSeek-V3 support to experience the new features directly.[1][2]

1.5 Inference with vLLM (recommended)

vLLM v0.6.6 supports DeepSeek-V3 inference in FP8 and BF16 modes on both NVIDIA and AMD GPUs. Aside from standard techniques, vLLM offers pipeline parallelism, allowing you to run this model on multiple network-connected machines. For detailed guidance, please refer to the vLLM instructions. Please feel free to follow the enhancement plan as well.[1][2][3]
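
As an illustrative sketch (the parallelism degrees here are assumptions that depend on your hardware), an OpenAI-compatible vLLM server can be started with:

vllm serve deepseek-ai/DeepSeek-V3 --tensor-parallel-size 8 --pipeline-parallel-size 2 --trust-remote-code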

1.6 Recommended Inference Functionality with AMD GPUs

In collaboration with the AMD team, DeepSeek has achieved Day-One support for AMD GPUs using SGLang, with full compatibility for both FP8 and BF16 precision. For detailed guidance, please refer to the SGLang instructions.[1]

1.7 Recommended Inference Functionality with Huawei Ascend NPUs

The MindIE framework from the Huawei Ascend community has successfully adapted the BF16 version of DeepSeek-V3. For step-by-step guidance on Ascend NPUs, please follow the MindIE instructions.[1][2]

2. DeepSeek-R1 Deployment

2.1 DeepSeek-R1 Models

Please visit the DeepSeek-V3 deployment section above for more information about running DeepSeek-R1 locally.

NOTE

Hugging Face's Transformers has not been directly supported yet.

2.2 DeepSeek-R1-Distill Models

DeepSeek-R1-Distill models can be utilized in the same manner as Qwen or Llama models.

For instance, you can easily start a service using vLLM:[1]

vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-32B --tensor-parallel-size 2 --max-model-len 32768 --enforce-eager

You can also easily start a service using SGLang:[1]

python3 -m sglang.launch_server --model deepseek-ai/DeepSeek-R1-Distill-Qwen-32B --trust-remote-code --tp 2

2.3 Usage Recommendations

We recommend adhering to the following configurations when utilizing the DeepSeek-R1 series models, including benchmarking, to achieve the expected performance:

  1. Set the temperature within the range of 0.5-0.7 (0.6 is recommended) to prevent endless repetitions or incoherent outputs.
  2. Avoid adding a system prompt; all instructions should be contained within the user prompt.
  3. For mathematical problems, it is advisable to include a directive in your prompt such as: 'Please reason step by step, and put your final answer within \boxed{}.'
  4. When evaluating model performance, it is recommended to conduct multiple tests and average the results.

Additionally, we have observed that the DeepSeek-R1 series models tend to bypass the thinking pattern (i.e., outputting an empty <think></think> block) when responding to certain queries, which can adversely affect the model's performance. To ensure that the model engages in thorough reasoning, we recommend forcing the model to initiate its response with <think> at the beginning of every output.
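
For illustration, here is how these settings map onto an OpenAI-compatible endpoint such as the vLLM server started above (the port, model name, and math problem are placeholders): temperature is set to 0.6, there is no system message, and the \boxed{} directive is placed in the user prompt.

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B",
    "temperature": 0.6,
    "messages": [
      {"role": "user", "content": "Solve 3x + 5 = 20. Please reason step by step, and put your final answer within \\boxed{}."}
    ]
  }'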

3. DeepSeek-V3-0324 Deployment

DeepSeek-V3-0324 uses the same base model as the previous DeepSeek-V3, with improvements only in post-training methods. For private deployment, you only need to update the checkpoint and tokenizer_config.json (changes related to tool calls).

The deployment options and frameworks for DeepSeek-V3-0324 are identical to those for DeepSeek-V3 described in section 1. All the same toolkits (SGLang, LMDeploy, TensorRT-LLM, vLLM) support DeepSeek-V3-0324 with the same configuration options.
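
In practice, any command from section 1 applies unchanged apart from the checkpoint name. For example, the SGLang launch sketched earlier would simply point at the new weights (the tensor-parallel degree remains an assumption):

python3 -m sglang.launch_server --model deepseek-ai/DeepSeek-V3-0324 --tp 8 --trust-remote-code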

License Information

Information about the licenses under which DeepSeek models are released

DeepSeek-V3-0324

MIT License

Consistent with DeepSeek-R1, our open-source repository (including model weights) uniformly adopts the MIT License and allows users to leverage model outputs and distillation methods to train other models.

View License

DeepSeek-V3

MIT License

This code repository is licensed under the MIT License. The use of DeepSeek-V3 Base/Chat models is subject to the Model License. DeepSeek-V3 series (including Base and Chat) supports commercial use.

View License

DeepSeek-R1

MIT License

This code repository and the model weights are licensed under the MIT License. The DeepSeek-R1 series supports commercial use and allows for any modifications and derivative works, including, but not limited to, distillation for training other LLMs. Please note that models such as DeepSeek-R1-Distill-Qwen and DeepSeek-R1-Distill-Llama are derived from their respective base models, which carry their own original licenses.

View License

Disclaimer

DeepSeek models are provided "as is" without any express or implied warranties. Users should use the models at their own risk and ensure compliance with relevant laws and regulations. DeepSeek is not liable for any damages resulting from the use of these models.