LLM Provider Configuration Guide

This guide covers configuring different Large Language Model (LLM) providers with the Mattermost Agents plugin. Each provider has specific configuration requirements and capabilities.

Supported Providers

The Mattermost Agents plugin currently supports these LLM providers:

  • Local models via OpenAI-compatible APIs (Ollama, vLLM, etc.)

  • OpenAI

  • Anthropic

  • AWS Bedrock

  • Cohere

  • Mistral

  • Scale AI

  • Azure OpenAI

  • Google Gemini

  • Google Vertex AI

General Configuration Concepts

For any LLM provider, you’ll need to:

  • Configure API authentication (keys, tokens, or other authentication methods)

  • Select models for different use cases

  • Set parameters such as context length and token limits

  • Verify connectivity from your Mattermost deployment to the provider’s endpoints

Local Models (OpenAI Compatible)

The OpenAI Compatible option allows integration with any OpenAI-compatible LLM provider, such as Ollama:

Configuration

  1. Deploy your model, for example, on Ollama

  2. Select OpenAI Compatible in the AI Service dropdown

  3. Enter the URL to your AI service, as reachable from your Mattermost deployment, in the API URL field. Be sure to include the port, and append /v1 to the end of the URL if you’re using Ollama (e.g., http://localhost:11434/v1 for Ollama; other services typically use the base URL, such as http://localhost:11434/)

  4. If using Ollama, leave the API Key field blank

  5. Specify your model name in the Default Model field
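
Before saving, it can help to confirm that the endpoint is reachable from the Mattermost host and speaks the OpenAI-compatible API. A minimal check, assuming a local Ollama instance on its default port:

# Run from the Mattermost server host
curl http://localhost:11434/v1/models
# A JSON model list confirms the OpenAI-compatible API is responding;
# the returned "id" values are valid entries for the Default Model field.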

Configuration Options

| Setting | Required | Description |
|---|---|---|
| API URL | Yes | The endpoint URL for your OpenAI-compatible API |
| API Key | No | API key if your service requires authentication |
| Default Model | Yes | The model to use by default |
| Organization ID | No | Organization ID if your service supports it |
| Send User ID | No | Whether to send user IDs to the service |
| Use Responses API | No | Defaults to enabled. Uses the OpenAI Responses API when supported. Turn off for legacy Chat Completions compatibility with endpoints that do not implement the Responses API. |

Special Considerations

Ensure your self-hosted solution has sufficient compute resources, and test it for compatibility with the Mattermost plugin. Some advanced features may not be available with all compatible providers, and you may need to adjust token limits to match your deployment’s capabilities. If you need OpenAI-compatible behavior without the Responses API, use the OpenAI Compatible service type with Use Responses API disabled rather than the OpenAI service type.
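
Example service configuration

A sketch of a completed configuration for a local Ollama deployment. The model name is illustrative; substitute whatever your server actually serves.

# System Console > Agents > LLM Services > Add Service
Service:           OpenAI Compatible
API URL:           http://localhost:11434/v1   # include the port; append /v1 for Ollama
API Key:                                       # blank for Ollama
Default Model:     llama3.1                    # illustrative; match your deployed model
Use Responses API: Disabled                    # disable if the endpoint lacks the Responses API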

OpenAI

Authentication

Obtain an OpenAI API key, then select OpenAI in the Service dropdown and enter your API key. In the Default Model field, specify a model name that matches the model identifier used by the API. If your API key belongs to an OpenAI organization, you can optionally specify your Organization ID.

Direct OpenAI services always use the OpenAI Responses API. There is no System Console setting to disable the Responses API for this service type.

Configuration Options

| Setting | Required | Description |
|---|---|---|
| API Key | Yes | Your OpenAI API key |
| Organization ID | No | Your OpenAI organization ID |
| Default Model | Yes | The model to use by default (see OpenAI’s model documentation) |
| Send User ID | No | Whether to send user IDs to OpenAI |
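
Example service configuration

A sketch of a completed OpenAI configuration. The key and model shown are placeholders; confirm the model name against OpenAI’s model documentation.

# System Console > Agents > LLM Services > Add Service
Service:           OpenAI
API Key:           sk-...       # your OpenAI API key
Organization ID:   org-...      # optional; only if your key belongs to an organization
Default Model:     gpt-4o       # placeholder; use the model identifier from the API
Send User ID:      Disabled     # optional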

Anthropic (Claude)

Authentication

Obtain an Anthropic API key, then select Anthropic in the Service dropdown and enter your API key. In the Default Model field, specify a model name that matches the model identifier used by the API.

Configuration Options

| Setting | Required | Description |
|---|---|---|
| API Key | Yes | Your Anthropic API key |
| Default Model | Yes | The model to use by default (see Anthropic’s model documentation) |
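
Example service configuration

A sketch of a completed Anthropic configuration. The key and model are placeholders; confirm the model name against Anthropic’s model documentation.

# System Console > Agents > LLM Services > Add Service
Service:           Anthropic
API Key:           sk-ant-...           # your Anthropic API key
Default Model:     claude-sonnet-4-6    # placeholder; use a current Anthropic model ID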

AWS Bedrock

AWS Bedrock provides access to foundation models from Anthropic (Claude), Amazon (Nova, Titan), and other providers via a unified API. For full setup instructions — including IAM policy configuration and Anthropic-specific Claude requirements — see the AWS Bedrock Setup Guide.

Authentication

The plugin uses the AWS SDK default credential chain. For Mattermost servers running on EC2, attach an IAM instance profile to your instance and leave all credential fields blank — the SDK discovers credentials automatically. For non-EC2 deployments, enter an AWS Access Key ID and AWS Secret Access Key, or a short-term Bedrock console API key.

Configuration Options

| Setting | Required | Description |
|---|---|---|
| AWS Region | Yes | AWS region where Bedrock is available (e.g., us-east-1, us-west-2, eu-central-1) |
| Custom Endpoint URL | No | Optional custom endpoint for VPC endpoints or proxies. Leave blank for standard AWS endpoints. |
| AWS Access Key ID | No | IAM user access key ID for long-term credentials. Takes precedence over API Key if both are set. |
| AWS Secret Access Key | No | IAM user secret access key. Required if AWS Access Key ID is provided. |
| API Key | No | Bedrock console API key (base64 encoded, format: ABSKQm...). If the IAM credentials above are set, they take precedence. |
| Default Model | Yes | The Bedrock model ID to use (e.g., us.anthropic.claude-sonnet-4-6). See the AWS Bedrock model IDs documentation for the full list of available models and their IDs. Model availability varies by AWS region. |
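
Example service configuration

A sketch of a completed Bedrock configuration for a Mattermost server on EC2 with an attached instance profile, so all credential fields stay blank. The region and model ID are taken from the examples above.

# System Console > Agents > LLM Services > Add Service
Service:                AWS Bedrock
AWS Region:             us-east-1                        # must offer your chosen model
Custom Endpoint URL:                                     # blank = standard AWS endpoints
AWS Access Key ID:                                       # blank; instance profile supplies credentials
AWS Secret Access Key:                                   # blank
API Key:                                                 # blank
Default Model:          us.anthropic.claude-sonnet-4-6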

Cohere

Authentication

Obtain a Cohere API key, then select Cohere in the Service dropdown and enter your API key. In the Default Model field, specify a model name that matches the model identifier used by the API.

Configuration Options

| Setting | Required | Description |
|---|---|---|
| API Key | Yes | Your Cohere API key |
| Default Model | Yes | The model to use by default (see Cohere’s model documentation) |
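
Example service configuration

A sketch of a completed Cohere configuration. The model is a placeholder; confirm against Cohere’s model documentation.

# System Console > Agents > LLM Services > Add Service
Service:           Cohere
API Key:           ...                # your Cohere API key
Default Model:     command-r-plus     # placeholder; use a current Cohere model name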

Mistral

Authentication

Obtain a Mistral API key, then select Mistral in the Service dropdown and enter your API key. In the Default Model field, specify a model name that matches the model identifier used by the API.

Configuration Options

| Setting | Required | Description |
|---|---|---|
| API Key | Yes | Your Mistral API key |
| Default Model | Yes | The model to use by default (see Mistral’s model documentation) |
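
Example service configuration

A sketch of a completed Mistral configuration. The model is a placeholder; confirm against Mistral’s model documentation.

# System Console > Agents > LLM Services > Add Service
Service:           Mistral
API Key:           ...                      # your Mistral API key
Default Model:     mistral-large-latest     # placeholder; use a current Mistral model name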

Scale AI

Overview

Scale AI (including ScaleGov) provides access to LLMs through an OpenAI-compatible API with custom authentication: Scale uses the x-api-key and x-selected-account-id headers instead of the standard Authorization header.

Authentication

Obtain your Scale AI API key and account ID from your Scale AI or ScaleGov dashboard, then select Scale AI in the Service dropdown. Enter your API key and the API URL for your Scale endpoint (e.g., https://sgp-api.scalegov.com/v5). If using ScaleGov, enter your account ID in the Account ID field.

Configuration Options

| Setting | Required | Description |
|---|---|---|
| API Key | Yes | Your Scale AI API key (sent as the x-api-key header) |
| API URL | Yes | Your Scale API endpoint (e.g., https://sgp-api.scalegov.com/v5) |
| Account ID | No | Your Scale account ID (sent as the x-selected-account-id header; required for ScaleGov) |
| Default Model | Yes | The model to use by default, in vendor/model-name format (e.g., openai/gpt-4o) |

Models

Models use the vendor/model-name format (e.g., openai/gpt-4o). For the full list of available models, see the Scale AI documentation.
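
Example service configuration

A sketch of a completed ScaleGov configuration, using the endpoint and model format from the tables above. The key and account ID are placeholders.

# System Console > Agents > LLM Services > Add Service
Service:           Scale AI
API Key:           ...                               # sent as the x-api-key header
API URL:           https://sgp-api.scalegov.com/v5
Account ID:        ...                               # sent as x-selected-account-id; required for ScaleGov
Default Model:     openai/gpt-4o                     # vendor/model-name format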

Azure OpenAI

Authentication

For more details about integrating with Microsoft Azure’s OpenAI services, see the official Azure OpenAI documentation.

  1. Provision sufficient access to Azure OpenAI for your organization, then sign in to the Azure portal

  2. If you do not already have one, deploy an Azure AI Hub resource within Azure AI Studio

  3. Once the deployment is complete, navigate to the resource and select Launch Azure AI Studio

  4. In the side navigation pane, select Deployments under Shared resources

  5. Select Deploy model then Deploy base model

  6. Select your desired model and select Confirm

  7. Select Deploy to start your model

  8. In Mattermost, select Azure in the Service dropdown

  9. In the Endpoint panel for your new model deployment, copy the base URI of the Target URI (everything up to and including .com) and paste it in the API URL field in Mattermost

  10. In the Endpoint panel for your new model deployment, copy the Key and paste it in the API Key field in Mattermost

  11. In the Deployment panel for your new model deployment, copy the Model name and paste it in the Default Model field in Mattermost

Configuration Options

| Setting | Required | Description |
|---|---|---|
| API Key | Yes | Your Azure OpenAI API key |
| API URL | Yes | Your Azure OpenAI endpoint |
| Default Model | Yes | The model to use by default (see Azure OpenAI’s model documentation) |
| Send User ID | No | Whether to send user IDs to Azure OpenAI |
| Use Responses API | No | Defaults to enabled. Uses the OpenAI Responses API when your Azure deployment supports it. Turn off for legacy Chat Completions compatibility if your endpoint or deployment does not support the Responses API. |
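
Example service configuration

A sketch of a completed Azure configuration. The resource name and model are placeholders; use the values copied from your deployment’s Endpoint and Deployment panels.

# System Console > Agents > LLM Services > Add Service
Service:           Azure
API Key:           ...                                    # Key from the Endpoint panel
API URL:           https://my-resource.openai.azure.com   # base URI of the Target URI, up to and including .com
Default Model:     gpt-4o                                 # the deployment’s Model name (placeholder)
Use Responses API: Enabled                                # disable if the deployment lacks Responses API support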

Google Gemini

Google Gemini uses the Generative Language API (AI Studio), which authenticates with a single API key. Both Gemini and Vertex AI route through Bifrost, the plugin’s unified LLM gateway. If you need enterprise GCP authentication, project/region scoping, or VPC-SC, use Google Vertex AI instead.

When to choose Gemini vs. Vertex AI

| Use Google Gemini when… | Use Google Vertex AI when… |
|---|---|
| You want the simplest setup (a single API key from AI Studio) | You need GCP-scoped billing, IAM, or audit logging |
| You’re testing models or running a small team | You need region pinning, VPC-SC, or private connectivity |
| You don’t have a GCP project | You already have a GCP project and want to centralize spend |
| It’s fine to call generativelanguage.googleapis.com directly | You need Anthropic Claude models served through Vertex |

Setup

  1. Sign in to Google AI Studio and create an API key at aistudio.google.com/apikey.

  2. In Mattermost, open System Console > Agents > LLM Services and add a new service (or open an existing Gemini service).

  3. Select Google Gemini in the Service dropdown.

  4. Paste your AI Studio key into the API Key field.

  5. Enter a model identifier in the Default Model field (for example, gemini-2.5-pro or gemini-2.5-flash). Use Google’s model catalog to confirm the exact ID.

  6. Save the service. There is no API URL field in the System Console for the Gemini service type; however, the underlying apiURL config field is accepted by the API and, if non-empty, forwarded to Bifrost as the base URL. This enables egress-proxy use cases for operators who configure services programmatically.

Configuration Options

Setting

Required

Description

API Key

Yes

Your Gemini API key from AI Studio. Stored as a secret.

Default Model

Yes

The model to use by default (see Gemini model documentation). Common choices: gemini-2.5-pro, gemini-2.5-flash.

Input Token Limit

No

Optional override for the maximum input context size, in tokens. Leave blank to use the model’s default.

Output Token Limit

No

Optional override for the maximum output (max_tokens) the plugin will request.

The Send User ID and Use Responses API toggles are not exposed for the Google Gemini service type. Bifrost automatically switches to the Responses API path when you enable a native Google tool or when an agent or feature requires native web search; in all other cases the Chat path is used.

Reasoning (thinking) and model-version mapping

Gemini supports provider-native reasoning (“thinking”) through Bifrost. Enable Reasoning on the agent (System Console > Agents > select agent > Config tab) to turn it on. The configuration controls behave differently across model generations:

| Model generation | What Thinking Budget does | What Reasoning Effort does |
|---|---|---|
| Gemini 2.5 (Pro, Flash, …) | Sets thinkingConfig.thinkingBudget directly (token count). | Translated into an estimated thinkingBudget; the effort selector (“minimal” / “low” / “medium” / “high”) is not natively supported on 2.5, so Bifrost maps it to a budget. |
| Gemini 3.0+ | Sets thinkingConfig.thinkingBudget directly. | Sets thinkingConfig.thinkingLevel (the native 3.0+ field) using the same minimal/low/medium/high values. |

When both Thinking Budget and Reasoning Effort are set, the explicit thinking budget wins for both Chat and Responses API paths. The default effort when nothing is set is "medium".

Reasoning works on both the Chat Completions and Responses API paths; on the Responses path Bifrost also enables a reasoning summary so the provider streams reasoning text back as reasoning_summary events.

Bifrost detects thinking support by model name: any model containing ‘thinking’, any gemini-2.5-* model, or any Gemini 3.0+ model. Earlier 2.0 non-thinking models are silently skipped — thinkingConfig is not sent for them.

Bifrost uses https://generativelanguage.googleapis.com/v1beta. Egress proxies must whitelist paths starting with /v1beta/models/.

Native Google tools

Native Google tools are off by default for Gemini, matching the same posture as Cohere, Mistral, and Bedrock. Enable them per agent in the Config tab under Native Google Tools.

| Tool | What it does | Notes |
|---|---|---|
| Web Search | Grounds responses in Google Search results (web search + citations) via Bifrost’s Responses API. | When enabled, Bifrost auto-switches the request to the Responses API path. The agent’s other function tools and MCP tools continue to work. Grounding citations are normalized to the same url_citation annotation format used by the OpenAI and Anthropic providers; the webapp renders them identically. |

Other native tools advertised by OpenAI (file search, code interpreter) are not available in the System Console for Google providers — only Web Search is offered.

If a feature surface uses NativeWebSearchAllowed at request time (for example, the in-product web-search shortcut), Bifrost adds web search to the Gemini Responses request even if the agent has not explicitly checked Web Search under Native Google Tools. This is how the plugin lets a single feature toggle web search on without forcing every agent to enable it manually.

Example service configuration

# System Console > Agents > LLM Services > Add Service
Service:           Google Gemini
API Key:           ABcDef...   # from https://aistudio.google.com/apikey
Default Model:     gemini-2.5-pro
Output Token Limit: 8192       # optional
# Per-agent (System Console > Agents > <agent> > Config)
Reasoning:                Enabled
Thinking Budget (tokens): 4096           # optional; wins over effort
Reasoning Effort:         medium         # used when budget is blank
Native Google Tools:
  Web Search:             Enabled        # off by default

Google Vertex AI

Vertex AI provides access to Gemini and other Google models through Google Cloud’s enterprise AI platform, with project-scoped billing, regional deployment, and IAM-based access control. Like the direct Gemini integration, Vertex AI is routed through Bifrost.

Prerequisites

Before configuring the service in Mattermost, complete the following in GCP:

  1. Have or create a Google Cloud project. Note the Project ID (slug, e.g., my-project-123) and the Project Number (numeric, e.g., 123456789012).

  2. Enable the Vertex AI API for that project: gcloud services enable aiplatform.googleapis.com --project <PROJECT_ID> or the equivalent in the GCP Console.

  3. Choose a region that has the Vertex models you need (for example, us-central1, us-east5, europe-west4). Model availability varies by region — check Vertex AI model availability before committing.

  4. Decide your authentication mode — see below.

Authentication

The plugin supports two authentication modes via Bifrost:

  • Application Default Credentials (ADC) — recommended when the plugin runs on GCP (GKE, GCE, Cloud Run) with an attached service account, or when the host’s GOOGLE_APPLICATION_CREDENTIALS environment variable points at a service-account key file. Leave the Service Account JSON field blank to use ADC.

  • Service Account JSON — paste the full contents of a service-account key JSON into the Service Account JSON field. The plugin validates that the field contains valid JSON before saving. The service account needs roles/aiplatform.user (or a custom role with the aiplatform.endpoints.predict permission) in your project.

To create a service-account key for the JSON path:

  1. In the GCP Console, go to IAM & Admin > Service Accounts and create a service account (or pick an existing one).

  2. Grant the account roles/aiplatform.user on the project.

  3. Open the service account, go to Keys > Add Key > Create new key, choose JSON, and download the file.

  4. Open the downloaded file, copy its full contents (including braces), and paste into the Service Account JSON field in Mattermost. Treat this file as a secret — anyone with it can call Vertex against your project.
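
The console steps above can also be scripted. A minimal gcloud sketch, assuming the hypothetical service-account name mattermost-vertex and project my-project-123:

# 1. Create the service account
gcloud iam service-accounts create mattermost-vertex --project my-project-123

# 2. Grant it the Vertex AI user role on the project
gcloud projects add-iam-policy-binding my-project-123 \
  --member "serviceAccount:mattermost-vertex@my-project-123.iam.gserviceaccount.com" \
  --role "roles/aiplatform.user"

# 3. Create and download a JSON key, then paste its full contents into Service Account JSON
gcloud iam service-accounts keys create key.json \
  --iam-account mattermost-vertex@my-project-123.iam.gserviceaccount.com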

Bifrost stores the JSON as a credential. When Service Account JSON is empty, Bifrost falls back to ADC at request time — no key material is held in plugin config.

Setup

  1. Complete the prerequisites above.

  2. In Mattermost, open System Console > Agents > LLM Services and add a new service.

  3. Select Google Vertex AI in the Service dropdown.

  4. Fill in GCP Project ID, GCP Region, and (optionally) GCP Project Number.

  5. For ADC: leave Service Account JSON blank. For key-based auth: paste the full JSON.

  6. Enter a model identifier in Default Model (for example, gemini-2.5-pro). Use the Vertex AI model catalog to confirm the exact Vertex model ID for your region.

  7. Save the service.

Configuration Options

| Setting | Required | Description |
|---|---|---|
| GCP Project ID | Yes | Your Google Cloud project ID slug (e.g., my-project-123). |
| GCP Project Number | No | Numeric project number (e.g., 123456789012). Required only when using a numeric endpoint ID for a fine-tuned model; for standard publisher models (Gemini, Claude, Mistral) it is not consumed and can be left blank. |
| GCP Region | Yes | Vertex AI region (e.g., us-central1, europe-west4). Model availability is region-specific. |
| Service Account JSON | No | Full service-account key JSON. Validated as JSON on save. Leave blank to use ADC or an attached IAM role. |
| Default Model | Yes | The Vertex model ID to use (see the Vertex AI model documentation). |
| Input Token Limit | No | Optional override for the maximum input context size, in tokens. |
| Output Token Limit | No | Optional override for the maximum output (max_tokens) the plugin will request. |

The API Key, Send User ID, and Use Responses API toggles do not apply to the Google Vertex AI service type. The Responses API path is auto-enabled when a native Google tool is in use; otherwise the Chat path is used.

Reasoning (thinking) and model-version mapping

For Gemini models running on Vertex AI, reasoning is configured the same way as direct Gemini and follows the same model-version mapping (Gemini 2.5 vs. 3.0+). Enable Reasoning on the agent and set either Thinking Budget (tokens) or Reasoning Effort (minimal / low / medium / high). When both are set, the explicit budget wins.

Anthropic Claude on Vertex. When you select an Anthropic model ID (for example, claude-sonnet-4-6@20260101) on a Vertex service, the agent uses Anthropic-style extended thinking instead of thinkingConfig. The Thinking Budget field still applies; the Reasoning Effort selector is ignored. Model IDs follow the format {claude-model-name}@{YYYYMMDD} where the date is the Anthropic snapshot version. Check the Anthropic on Vertex AI documentation for current versions.

Native Google tools

Native Google tools are off by default for Vertex AI, matching the same posture as Cohere, Mistral, and Bedrock. Enable per agent under Native Google Tools.

| Tool | What it does | Notes |
|---|---|---|
| Web Search | Grounds responses with Google Search via the Vertex Responses API. | When enabled, Bifrost switches to the Responses API. Citations are returned alongside the response. If requests fail with grounding enabled, confirm the Vertex AI API is enabled, the selected model supports grounding, and your Google Cloud project satisfies the Vertex AI grounding prerequisites. |

OpenAI-style file_search and code_interpreter are not available in the System Console for Google providers — only Web Search is offered.

Example service configuration

# System Console > Agents > LLM Services > Add Service
Service:               Google Vertex AI
GCP Project ID:        my-project-123
GCP Project Number:    123456789012   # optional
GCP Region:            us-central1
Service Account JSON:  ""             # blank = ADC
Default Model:         gemini-2.5-pro
Output Token Limit:    8192           # optional
# Per-agent (System Console > Agents > <agent> > Config)
Reasoning:                Enabled
Thinking Budget (tokens): 4096           # optional; wins over effort on Gemini
Reasoning Effort:         medium         # mapped to thinkingLevel on Gemini 3.0+
Native Google Tools:
  Web Search:             Enabled        # off by default

Troubleshooting

  • Saving fails with “invalid service account JSON”. The plugin validates that the Service Account JSON field contains valid JSON before saving. Re-copy the full contents of the key file, including the surrounding { }, and check there are no truncated or escaped characters.

  • Requests fail with PERMISSION_DENIED. Confirm the service account has roles/aiplatform.user on the project, and that the Vertex AI API is enabled. For ADC deployments, confirm the bound principal (workload identity, instance SA, etc.) has the same role.

  • Model not found in region. Vertex model IDs are region-scoped. Check the model is available in your GCP Region, or switch to a region that has it.

  • Web search returns no citations. Confirm Web Search is checked under Native Google Tools for the agent, verify the selected model supports grounding, and make sure your Google Cloud project meets the current Vertex AI grounding prerequisites.