Model Provider Configuration Guide

Detailed guide on the three types of model providers supported by Dify: Original Providers, Aggregation Platforms, and Local Deployment.

Model Provider Configuration Guide

After entering Dify, click on your avatar in the top right corner, then select Settings > Model Provider.

Settings

On this page, you can choose any model provider to install and configure.

Model Providers

Here are some famous model providers:

Install Model

1. Original Model Providers (Foundational Model Providers)

Introduction: These providers are the direct trainers and owners of foundational models. They offer the most native API services, usually featuring the latest model versions, the most stable service quality, and official technical support. Integrating these models typically requires applying for an API Key directly from their official websites.

  • OpenAI: Creator of the ChatGPT series (GPT-4o, GPT-3.5), the industry benchmark.
  • Gemini (Google): Google's flagship multimodal model, featuring an ultra-large context window.
  • Anthropic (Claude): Known for safety and excellent natural language understanding. Claude, from version 3 to the current 4.5, has been a strong performer in coding and writing.
  • DeepSeek: A leading domestic (China) open-source and closed-source model provider, offering high cost-performance ratio and excellent performance in reasoning and coding capabilities.

Dify Integration Tip: Dify officially provides built-in support for these mainstream providers. After installing the plugin, simply enter the API Key in the "Model Provider" list.

Native Providers

2. Model Aggregation Providers (MaaS / API Aggregation Platforms)

Introduction: These platforms usually do not train models themselves but integrate various mainstream models worldwide (including closed-source GPT/Claude and open-source Llama/Qwen) into a unified interface.

Core Advantages:

  1. Unified Interface: Usually fully compatible with the OpenAI interface format, allowing you to switch between dozens of models with a single integration.
  2. No GPU Required: Allows users to use open-source large models at a very low price without expensive hardware.
  3. Convenient Payment: Solves issues like difficult model payments and account bans.

Representative Providers:

  • OpenRouter: A globally renowned model aggregation platform gathering almost all mainstream open-source and closed-source models. With transparent pricing, it allows using top-tier models at very low exchange rates, making it the "Model Supermarket" of choice for developers.
  • 302.ai: A comprehensive AI relay platform that aggregates not only LLMs but also painting, voice, and other models. It features a user-friendly interface and convenient management.
  • SiliconFlow: An emerging high-performance large model inference platform. It focuses on providing ultra-high concurrency and ultra-low latency open-source model services (such as DeepSeek, Kimi, etc.). Its features include extremely fast inference speed (high token generation rate) and often offers free quotas, making it very suitable for development and testing.

Dify Integration Tip: In Dify, you can integrate these via the "OpenAI-API-Compatible" method or find the corresponding model provider plugin. Fill in the platform's Base URL and API Key.

Aggregation Providers

3. Self-hosted Models (Local/Private Cloud Inference)

Introduction: Run models on local servers or private clouds using your own hardware (GPU/CPU).

Core Scenarios:

  1. Data Privacy: Data never leaves the local environment, suitable for handling sensitive business.
  2. Cost Control: May be cheaper than APIs for long-term high-volume calls.
  3. Offline Operation: Use AI capabilities in offline environments.

Representative Tools:

  • Ollama: Currently the most popular local model running tool. Its biggest feature is extreme simplicity, as easy as installing Docker. Supports macOS, Linux, and Windows; download and run models like Llama3, Qwen2 with a single command. Very suitable for individual developers and lightweight applications.
  • vLLM: A high-performance inference engine for production environments. It uses PagedAttention technology to greatly improve video memory utilization and throughput. If you need to provide high-concurrency services for a large number of users on a private server, vLLM is a more professional choice than Ollama.

Dify Integration Tip: Dify supports integration via Ollama and LocalAI. You need to ensure that the Dify container can access the IP address of your local model (for example, using host.docker.internal or the LAN IP).

Self-hosted Models