# LLM Provider Configuration Guide
This guide covers configuring different Large Language Model (LLM) providers with the Mattermost Agents plugin. Each provider has specific configuration requirements and capabilities.
## Supported Providers
The Mattermost Agents plugin currently supports these LLM providers:
- Local models via OpenAI-compatible APIs (Ollama, vLLM, etc.)
- OpenAI
- Anthropic
- AWS Bedrock
- Cohere
- Mistral
- Scale AI
- Azure OpenAI
- Google Gemini
- Google Vertex AI
## General Configuration Concepts
For any LLM provider, you'll need to configure API authentication (keys, tokens, or other methods), select models for different use cases, set parameters such as context length and token limits, and ensure connectivity to the provider's endpoints.
## Local Models (OpenAI Compatible)
The OpenAI Compatible option allows integration with any OpenAI-compatible LLM provider, such as Ollama:
### Configuration
1. Deploy your model, for example, on Ollama.
2. Select OpenAI Compatible in the AI Service dropdown.
3. Enter the URL to your AI service from your Mattermost deployment in the API URL field. Be sure to include the port, and append `/v1` to the end of the URL if using Ollama (e.g., `http://localhost:11434/v1` for Ollama; otherwise `http://localhost:11434/`).
4. If using Ollama, leave the API Key field blank.
5. Specify your model name in the Default Model field. (A quick endpoint check follows these steps.)
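Before saving, you can confirm the endpoint answers OpenAI-style requests. A minimal sketch in Python; the host, port, and model name `llama3` are assumptions for a local Ollama install, so substitute your own values:

```python
# Smoke test for an OpenAI-compatible endpoint.
# Assumes a local Ollama install with a model named "llama3" pulled;
# adjust the URL and model to match your deployment.
import requests

resp = requests.post(
    "http://localhost:11434/v1/chat/completions",
    json={
        "model": "llama3",  # must match the Default Model you configure
        "messages": [{"role": "user", "content": "Say hello in one word."}],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```

If this returns a completion, the same URL (including `/v1`) and model name should work in the plugin.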
### Configuration Options

| Setting | Required | Description |
|---|---|---|
| API URL | Yes | The endpoint URL for your OpenAI-compatible API |
| API Key | No | API key if your service requires authentication |
| Default Model | Yes | The model to use by default |
| Organization ID | No | Organization ID if your service supports it |
| Send User ID | No | Whether to send user IDs to the service |
| Use Responses API | No | Defaults to enabled. Uses the OpenAI Responses API when supported. Turn off for legacy Chat Completions compatibility with endpoints that do not implement the Responses API. |
### Special Considerations
Ensure your self-hosted solution has sufficient compute resources, and test it for compatibility with the Mattermost plugin. Some advanced features may not be available with all compatible providers, and you may need to adjust token limits to match your deployment's capabilities. If you need OpenAI-compatible behavior without the Responses API, use OpenAI Compatible with Use Responses API disabled instead of the OpenAI service type.
## OpenAI

### Authentication
Obtain an OpenAI API key, then select OpenAI in the Service dropdown and enter your API key. Specify a model name in the Default Model field that corresponds with the model’s label in the API. If your API key belongs to an OpenAI organization, you can optionally specify your Organization ID.
Direct OpenAI services always use the OpenAI Responses API. There is no System Console setting to disable the Responses API for this service type.
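To confirm a key is valid before saving, one option is to list the models it can access. A sketch (the key value is a placeholder):

```python
# List the models an OpenAI API key can access; a 401 means the key is bad.
import requests

resp = requests.get(
    "https://api.openai.com/v1/models",
    headers={"Authorization": "Bearer sk-..."},  # placeholder key
    timeout=30,
)
resp.raise_for_status()
print(sorted(m["id"] for m in resp.json()["data"])[:10])
```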
### Configuration Options

| Setting | Required | Description |
|---|---|---|
| API Key | Yes | Your OpenAI API key |
| Organization ID | No | Your OpenAI organization ID |
| Default Model | Yes | The model to use by default (see OpenAI's model documentation) |
| Send User ID | No | Whether to send user IDs to OpenAI |
## Anthropic (Claude)

### Authentication
Obtain an Anthropic API key, then select Anthropic in the Service dropdown and enter your API key. Specify a model name in the Default Model field that corresponds with the model’s label in the API.
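A quick way to confirm the key and model name together is a minimal messages request. A sketch; the model ID is a placeholder, so confirm it against Anthropic's model documentation:

```python
# Minimal Anthropic messages request; a 401/403 points at the key,
# a model-not-found error points at the Default Model value.
import requests

resp = requests.post(
    "https://api.anthropic.com/v1/messages",
    headers={
        "x-api-key": "sk-ant-...",          # placeholder key
        "anthropic-version": "2023-06-01",  # required version header
    },
    json={
        "model": "claude-sonnet-4-5",  # placeholder; confirm against the docs
        "max_tokens": 32,
        "messages": [{"role": "user", "content": "Say hello in one word."}],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["content"][0]["text"])
```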
### Configuration Options

| Setting | Required | Description |
|---|---|---|
| API Key | Yes | Your Anthropic API key |
| Default Model | Yes | The model to use by default (see Anthropic's model documentation) |
## AWS Bedrock
AWS Bedrock provides access to foundation models from Anthropic (Claude), Amazon (Nova, Titan), and other providers via a unified API. For full setup instructions — including IAM policy configuration and Anthropic-specific Claude requirements — see the AWS Bedrock Setup Guide.
### Authentication
The plugin uses the AWS SDK default credential chain. For Mattermost servers running on EC2, attach an IAM instance profile to your instance and leave all credential fields blank — the SDK discovers credentials automatically. For non-EC2 deployments, enter an AWS Access Key ID and AWS Secret Access Key, or a short-term Bedrock console API key.
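To check that the credential chain resolves the way you expect (instance profile, environment, or explicit keys), one option is to list foundation models with the same region you plan to configure. A sketch using boto3; the region is an assumption:

```python
# Verifies that the AWS SDK default credential chain resolves and that
# Bedrock is reachable in the chosen region.
import boto3

client = boto3.client("bedrock", region_name="us-east-1")  # your plugin region
models = client.list_foundation_models()
print([m["modelId"] for m in models["modelSummaries"]][:10])
```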
### Configuration Options

| Setting | Required | Description |
|---|---|---|
| AWS Region | Yes | AWS region where Bedrock is available (e.g., `us-east-1`) |
| Custom Endpoint URL | No | Optional custom endpoint for VPC endpoints or proxies. Leave blank for standard AWS endpoints. |
| AWS Access Key ID | No | IAM user access key ID for long-term credentials. Takes precedence over API Key if both are set. |
| AWS Secret Access Key | No | IAM user secret access key. Required if AWS Access Key ID is provided. |
| API Key | No | Bedrock console API key (base64 encoded) |
| Default Model | Yes | The Bedrock model ID to use (e.g., `anthropic.claude-3-5-sonnet-20240620-v1:0`) |
## Cohere

### Authentication
Obtain a Cohere API key, then select Cohere in the Service dropdown and enter your API key. Specify a model name in the Default Model field that corresponds with the model’s label in the API.
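One way to sanity-check the key is Cohere's list-models endpoint. A sketch (the key is a placeholder):

```python
# Lists the models a Cohere API key can use; a 401 means the key is bad.
import requests

resp = requests.get(
    "https://api.cohere.com/v1/models",
    headers={"Authorization": "Bearer <YOUR_COHERE_API_KEY>"},
    timeout=30,
)
resp.raise_for_status()
print([m["name"] for m in resp.json()["models"]])
```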
### Configuration Options

| Setting | Required | Description |
|---|---|---|
| API Key | Yes | Your Cohere API key |
| Default Model | Yes | The model to use by default (see Cohere's model documentation) |
## Mistral

### Authentication
Obtain a Mistral API key, then select Mistral in the Service dropdown and enter your API key. Specify a model name in the Default Model field that corresponds with the model’s label in the API.
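Mistral's API follows OpenAI conventions, so the same list-models check applies. A sketch (the key is a placeholder):

```python
# Lists the models a Mistral API key can use.
import requests

resp = requests.get(
    "https://api.mistral.ai/v1/models",
    headers={"Authorization": "Bearer <YOUR_MISTRAL_API_KEY>"},
    timeout=30,
)
resp.raise_for_status()
print([m["id"] for m in resp.json()["data"]])
```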
### Configuration Options

| Setting | Required | Description |
|---|---|---|
| API Key | Yes | Your Mistral API key |
| Default Model | Yes | The model to use by default (see Mistral's model documentation) |
## Scale AI

### Overview
Scale AI (including ScaleGov) provides access to LLMs through an OpenAI-compatible API with custom authentication. Scale uses `x-api-key` and `x-selected-account-id` headers for authentication instead of the standard `Authorization` header.
### Authentication
Obtain your Scale AI API key and account ID from your Scale AI or ScaleGov dashboard, then select Scale AI in the Service dropdown. Enter your API key and the API URL for your Scale endpoint (e.g., `https://sgp-api.scalegov.com/v5`). If using ScaleGov, enter your account ID in the Account ID field.
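Because Scale authenticates with custom headers, a direct request is the clearest way to see the wiring. A sketch against the example endpoint above; the exact path suffix, account ID, and model are assumptions based on the OpenAI-compatible API, so substitute your own values:

```python
# Chat completion against a Scale/ScaleGov OpenAI-compatible endpoint using
# Scale's custom auth headers instead of Authorization.
import requests

resp = requests.post(
    "https://sgp-api.scalegov.com/v5/chat/completions",  # assumed path suffix
    headers={
        "x-api-key": "<YOUR_SCALE_API_KEY>",
        "x-selected-account-id": "<YOUR_ACCOUNT_ID>",  # ScaleGov only
    },
    json={
        "model": "openai/gpt-4o",  # vendor/model-name format
        "messages": [{"role": "user", "content": "Say hello in one word."}],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```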
### Configuration Options

| Setting | Required | Description |
|---|---|---|
| API Key | Yes | Your Scale AI API key (sent as the `x-api-key` header) |
| API URL | Yes | Your Scale API endpoint (e.g., `https://sgp-api.scalegov.com/v5`) |
| Account ID | No | Your Scale account ID (sent as the `x-selected-account-id` header) |
| Default Model | Yes | The model to use by default, in `vendor/model-name` format |
### Models
Models use the `vendor/model-name` format (e.g., `openai/gpt-4o`). For the full list of available models, see the Scale AI documentation.
## Azure OpenAI

### Authentication
For more details about integrating with Microsoft Azure’s OpenAI services, see the official Azure OpenAI documentation.
1. Provision sufficient access to Azure OpenAI for your organization and access your Azure portal.
2. If you do not already have one, deploy an Azure AI Hub resource within Azure AI Studio.
3. Once the deployment is complete, navigate to the resource and select Launch Azure AI Studio.
4. In the side navigation pane, select Deployments under Shared resources.
5. Select Deploy model, then Deploy base model.
6. Select your desired model and select Confirm.
7. Select Deploy to start your model.
8. In Mattermost, select Azure in the Service dropdown.
9. In the Endpoint panel for your new model deployment, copy the base URI of the Target URI (everything up to and including `.com`) and paste it in the API URL field in Mattermost.
10. In the Endpoint panel for your new model deployment, copy the Key and paste it in the API Key field in Mattermost.
11. In the Deployment panel for your new model deployment, copy the Model name and paste it in the Default Model field in Mattermost. (An endpoint check follows these steps.)
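With the endpoint, key, and deployment name in hand, you can confirm them together before saving. A sketch; the endpoint, deployment name, and `api-version` are placeholders, so use values your resource supports:

```python
# Minimal Azure OpenAI chat completion against a model deployment.
import requests

ENDPOINT = "https://<your-resource>.openai.azure.com"  # base URI from Target URI
DEPLOYMENT = "<your-deployment-name>"                  # your model deployment

resp = requests.post(
    f"{ENDPOINT}/openai/deployments/{DEPLOYMENT}/chat/completions",
    params={"api-version": "2024-02-01"},  # pick a version your resource supports
    headers={"api-key": "<YOUR_AZURE_OPENAI_KEY>"},
    json={"messages": [{"role": "user", "content": "Say hello in one word."}]},
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```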
### Configuration Options

| Setting | Required | Description |
|---|---|---|
| API Key | Yes | Your Azure OpenAI API key |
| API URL | Yes | Your Azure OpenAI endpoint |
| Default Model | Yes | The model to use by default (see Azure OpenAI's model documentation) |
| Send User ID | No | Whether to send user IDs to Azure OpenAI |
| Use Responses API | No | Defaults to enabled. Uses the OpenAI Responses API when your Azure deployment supports it. Turn off for legacy Chat Completions compatibility if your endpoint or deployment does not support the Responses API. |
## Google Gemini
Google Gemini uses the Generative Language API (AI Studio), which authenticates with a single API key. Both Gemini and Vertex AI route through Bifrost, the plugin’s unified LLM gateway. If you need enterprise GCP authentication, project/region scoping, or VPC-SC, use Google Vertex AI instead.
### When to choose Gemini vs. Vertex AI

| Use Google Gemini when… | Use Google Vertex AI when… |
|---|---|
| You want the simplest setup: a single API key from AI Studio | You need GCP-scoped billing, IAM, or audit logging |
| You're testing models or running a small team | You need region pinning, VPC-SC, or private connectivity |
| You don't have a GCP project | You already have a GCP project and want to centralize spend |
| Fine to call the public `generativelanguage.googleapis.com` endpoint | You need Anthropic Claude models served through Vertex |
### Setup
1. Sign in to Google AI Studio and create an API key at aistudio.google.com/apikey.
2. In Mattermost, open System Console > Agents > LLM Services and add a new service (or open an existing Gemini service).
3. Select Google Gemini in the Service dropdown.
4. Paste your AI Studio key into the API Key field. (You can verify the key beforehand; see the check after these steps.)
5. Enter a model identifier in the Default Model field (for example, `gemini-2.5-pro` or `gemini-2.5-flash`). Use Google's model catalog to confirm the exact ID.
6. Save the service. There is no API URL field in the System Console for the Gemini service type. The underlying `apiURL` config field is accepted by the API and forwarded to Bifrost as the base URL if non-empty, enabling egress-proxy use cases for operators who configure services programmatically.
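To confirm the key before saving, you can list models against the same `v1beta` base URL Bifrost uses. A sketch (the key is a placeholder):

```python
# Lists models visible to an AI Studio key via the Generative Language API.
import requests

resp = requests.get(
    "https://generativelanguage.googleapis.com/v1beta/models",
    params={"key": "<YOUR_AI_STUDIO_KEY>"},
    timeout=30,
)
resp.raise_for_status()
print([m["name"] for m in resp.json()["models"]][:10])
```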
### Configuration Options

| Setting | Required | Description |
|---|---|---|
| API Key | Yes | Your Gemini API key from AI Studio. Stored as a secret. |
| Default Model | Yes | The model to use by default (see Gemini model documentation). Common choices: `gemini-2.5-pro`, `gemini-2.5-flash`. |
| Input Token Limit | No | Optional override for the maximum input context size, in tokens. Leave blank to use the model's default. |
| Output Token Limit | No | Optional override for the maximum output (completion) size, in tokens. Leave blank to use the model's default. |
The Send User ID and Use Responses API toggles are not exposed for the Google Gemini service type. Bifrost automatically switches to the Responses API path when you enable a native Google tool or when an agent or feature requires native web search; in all other cases the Chat path is used.
### Reasoning (thinking) and model-version mapping
Gemini supports provider-native reasoning ("thinking") through Bifrost. Enable Reasoning on the agent (System Console > Agents > select the agent > Config tab). The configuration controls behave differently across model generations:
| Model generation | What Thinking Budget does | What Reasoning Effort does |
|---|---|---|
| Gemini 2.5 (Pro, Flash, …) | Sets the `thinkingBudget` in `thinkingConfig` directly, in tokens | Translated into an estimated token budget |
| Gemini 3.0+ | Sets the thinking budget directly, in tokens | Sets `thinkingLevel` (minimal / low / medium / high) |
When both Thinking Budget and Reasoning Effort are set, the explicit thinking budget wins for both Chat and Responses API paths. The default effort when nothing is set is "medium".
Reasoning works on both the Chat Completions and Responses API paths; on the Responses path Bifrost also enables a reasoning summary so the provider streams reasoning text back as reasoning_summary events.
Bifrost detects thinking support by model name: any model containing ‘thinking’, any gemini-2.5-* model, or any Gemini 3.0+ model. Earlier 2.0 non-thinking models are silently skipped — thinkingConfig is not sent for them.
Bifrost uses `https://generativelanguage.googleapis.com/v1beta`. Egress proxies must whitelist paths starting with `/v1beta/models/`.
### Native Google tools
Native Google tools are off by default for Gemini, matching the posture of Cohere, Mistral, and Bedrock. Enable them per agent in the Config tab under Native Google Tools.
| Tool | What it does | Notes |
|---|---|---|
| Web Search | Grounds responses in Google Search results (web search + citations) via Bifrost's Responses API. | When enabled, Bifrost auto-switches the request to the Responses API path. The agent's other function tools and MCP tools continue to work. Grounding citations are normalized to the same citation format used by other providers. |
Other native tools advertised by OpenAI (file search, code interpreter) are not available in the System Console for Google providers — only Web Search is offered.
If a feature surface uses `NativeWebSearchAllowed` at request time (for example, the in-product web-search shortcut), Bifrost adds web search to the Gemini Responses request even if the agent has not explicitly checked Web Search under Native Google Tools. This is how the plugin lets a single feature toggle web search on without forcing every agent to enable it manually.
### Example service configuration
```
# System Console > Agents > LLM Services > Add Service
Service: Google Gemini
API Key: ABcDef...              # from https://aistudio.google.com/apikey
Default Model: gemini-2.5-pro
Output Token Limit: 8192        # optional

# Per-agent (System Console > Agents > <agent> > Config)
Reasoning: Enabled
Thinking Budget (tokens): 4096  # optional; wins over effort
Reasoning Effort: medium        # used when budget is blank
Native Google Tools:
  Web Search: Enabled           # off by default
```
## Google Vertex AI
Vertex AI provides access to Gemini and other Google models through Google Cloud’s enterprise AI platform, with project-scoped billing, regional deployment, and IAM-based access control. Like the direct Gemini integration, Vertex AI is routed through Bifrost.
### Prerequisites
Before configuring the service in Mattermost, complete these steps in GCP:

1. Have or create a Google Cloud project. Note the Project ID (slug, e.g., `my-project-123`) and the Project Number (numeric, e.g., `123456789012`).
2. Enable the Vertex AI API for that project: run `gcloud services enable aiplatform.googleapis.com --project <PROJECT_ID>`, or use the equivalent in the GCP Console.
3. Choose a region that has the Vertex models you need (for example, `us-central1`, `us-east5`, `europe-west4`). Model availability varies by region; check Vertex AI model availability before committing.
4. Decide your authentication mode (see below).
### Authentication
The plugin supports two authentication modes via Bifrost:
- Application Default Credentials (ADC): recommended when the plugin runs on GCP (GKE, GCE, Cloud Run) with an attached service account, or when the host's `GOOGLE_APPLICATION_CREDENTIALS` environment variable points at a service-account key file. Leave the Service Account JSON field blank to use ADC.
- Service Account JSON: paste the full contents of a service-account key JSON into the Service Account JSON field. The plugin validates that the field contains valid JSON before saving. The service account needs `roles/aiplatform.user` (or a custom role with the `aiplatform.endpoints.predict` permission) in your project.
To create a service-account key for the JSON path:
1. In the GCP Console, go to IAM & Admin > Service Accounts and create a service account (or pick an existing one).
2. Grant the account `roles/aiplatform.user` on the project.
3. Open the service account, go to Keys > Add Key > Create new key, choose JSON, and download the file.
4. Open the downloaded file, copy its full contents (including braces), and paste them into the Service Account JSON field in Mattermost. Treat this file as a secret; anyone with it can call Vertex against your project.
Bifrost stores the JSON as a credential. When Service Account JSON is empty, Bifrost falls back to ADC at request time — no key material is held in plugin config.
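If you plan to rely on ADC, it helps to confirm credentials resolve on the host before leaving the field blank. A minimal sketch using the google-auth package:

```python
# Checks whether Application Default Credentials resolve on this host.
# Raises DefaultCredentialsError if no ADC source is found.
import google.auth

credentials, project = google.auth.default()
print(f"ADC resolved; detected project: {project}")
```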
### Setup
1. Complete the prerequisites above.
2. In Mattermost, open System Console > Agents > LLM Services and add a new service.
3. Select Google Vertex AI in the Service dropdown.
4. Fill in GCP Project ID, GCP Region, and (optionally) GCP Project Number.
5. For ADC: leave Service Account JSON blank. For key-based auth: paste the full JSON.
6. Enter a model identifier in Default Model (for example, `gemini-2.5-pro`). Use the Vertex AI model catalog to confirm the exact Vertex model ID for your region.
7. Save the service. (An end-to-end request check follows these steps.)
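For an end-to-end check of project, region, model, and credentials together, you can call the Vertex `generateContent` REST endpoint directly. A sketch using ADC; the project, region, and model are placeholders, and it requires the google-auth and requests packages:

```python
# End-to-end Vertex AI request using Application Default Credentials.
import google.auth
import google.auth.transport.requests
import requests

PROJECT, REGION, MODEL = "my-project-123", "us-central1", "gemini-2.5-pro"

credentials, _ = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"]
)
credentials.refresh(google.auth.transport.requests.Request())  # fetch a token

url = (
    f"https://{REGION}-aiplatform.googleapis.com/v1/projects/{PROJECT}"
    f"/locations/{REGION}/publishers/google/models/{MODEL}:generateContent"
)
resp = requests.post(
    url,
    headers={"Authorization": f"Bearer {credentials.token}"},
    json={"contents": [{"role": "user", "parts": [{"text": "Say hello."}]}]},
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["candidates"][0]["content"]["parts"][0]["text"])
```

A `PERMISSION_DENIED` response here points at the same IAM issues covered in Troubleshooting below.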
### Configuration Options

| Setting | Required | Description |
|---|---|---|
| GCP Project ID | Yes | Your Google Cloud project ID slug (e.g., `my-project-123`) |
| GCP Project Number | No | Numeric project number (e.g., `123456789012`) |
| GCP Region | Yes | Vertex AI region (e.g., `us-central1`) |
| Service Account JSON | No | Full service-account key JSON. Validated as JSON on save. Leave blank to use ADC or an attached IAM role. |
| Default Model | Yes | The Vertex model ID to use (see Vertex AI model documentation). |
| Input Token Limit | No | Optional override for the maximum input context size, in tokens. |
| Output Token Limit | No | Optional override for the maximum output (completion) size, in tokens. |
The API Key, Send User ID, and Use Responses API toggles do not apply to the Google Vertex AI service type. The Responses API path is auto-enabled when a native Google tool is in use; otherwise the Chat path is used.
### Reasoning (thinking) and model-version mapping
For Gemini models running on Vertex AI, reasoning is configured the same way as direct Gemini and follows the same model-version mapping (Gemini 2.5 vs. 3.0+). Enable Reasoning on the agent and set either Thinking Budget (tokens) or Reasoning Effort (minimal / low / medium / high). When both are set, the explicit budget wins.
Anthropic Claude on Vertex. When you select an Anthropic model ID (for example, `claude-sonnet-4-6@20260101`) on a Vertex service, the agent uses Anthropic-style extended thinking instead of `thinkingConfig`. The Thinking Budget field still applies; the Reasoning Effort selector is ignored. Model IDs follow the format `{claude-model-name}@{YYYYMMDD}`, where the date is the Anthropic snapshot version. Check the Anthropic on Vertex AI documentation for current versions.
### Native Google tools
Native Google tools are off by default for Vertex AI, matching the posture of Cohere, Mistral, and Bedrock. Enable them per agent under Native Google Tools.
| Tool | What it does | Notes |
|---|---|---|
| Web Search | Grounds responses with Google Search via the Vertex Responses API. | When enabled, Bifrost switches to the Responses API. Citations are returned alongside the response. If requests fail with grounding enabled, confirm the Vertex AI API is enabled, the selected model supports grounding, and your Google Cloud project satisfies Vertex AI grounding prerequisites. |
OpenAI-style `file_search` and `code_interpreter` are not available in the System Console for Google providers; only Web Search is offered.
### Example service configuration
```
# System Console > Agents > LLM Services > Add Service
Service: Google Vertex AI
GCP Project ID: my-project-123
GCP Project Number: 123456789012  # optional
GCP Region: us-central1
Service Account JSON: ""          # blank = ADC
Default Model: gemini-2.5-pro
Output Token Limit: 8192          # optional

# Per-agent (System Console > Agents > <agent> > Config)
Reasoning: Enabled
Thinking Budget (tokens): 4096    # optional; wins over effort on Gemini
Reasoning Effort: medium          # mapped to thinkingLevel on Gemini 3.0+
Native Google Tools:
  Web Search: Enabled             # off by default
```
### Troubleshooting
- Saving fails with "invalid service account JSON". The plugin validates that the Service Account JSON field contains valid JSON before saving. Re-copy the full contents of the key file, including the surrounding `{ }`, and check there are no truncated or escaped characters.
- Requests fail with `PERMISSION_DENIED`. Confirm the service account has `roles/aiplatform.user` on the project, and that the Vertex AI API is enabled. For ADC deployments, confirm the bound principal (workload identity, instance SA, etc.) has the same role.
- Model not found in region. Vertex model IDs are region-scoped. Check the model is available in your GCP Region, or switch to a region that has it.
- Web search returns no citations. Confirm Web Search is checked under Native Google Tools for the agent, verify the selected model supports grounding, and make sure your Google Cloud project meets the current Vertex AI grounding prerequisites.