Models

llms

The llms section contains the configuration for each model. Each model is defined by a unique key (e.g. azure-uk-gpt35) and the following properties:

deployment_name: The name of the deployment in the model provider’s system.
source: The model provider (e.g. AzureOpenAI). List of supported providers can be found here.
type: The kind of model served by the source provider (text, image, video, …). List of supported providers and types can be found here.
url: The URL of the model provider.
api_key: The API key for the model provider. For more information on how to store secrets, see the API keys section.
cost_profile: The cost profile that defines the cost of the model. It must be defined in the cost_profiles section.
priority (optional): Priorities allow you to prefer certain models over others in the selected group. See the load balancing section for more details.

llm_groups

The llm_groups section contains the configuration for the model groups. Each group is defined by a unique key (e.g. gpt35) and the following properties:

models: The list of models that belong to the group. Models must be defined in the llms section.
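
The requirement that every group member is defined in llms can be checked before a configuration is deployed. The following is a minimal sketch, assuming the configuration has already been loaded into a plain Python dict. The function name find_undefined_group_models and the key azure-sweden-gpt35 are hypothetical and used only for illustration.

```python
def find_undefined_group_models(config: dict) -> dict:
    """Return, per group, the model keys that are missing from the llms section."""
    defined = set(config.get("llms", {}))
    missing = {}
    for group_name, group in config.get("llm_groups", {}).items():
        unknown = [m for m in group.get("models", []) if m not in defined]
        if unknown:
            missing[group_name] = unknown
    return missing


# Hypothetical configuration: the group references one model that is not defined.
config = {
    "llms": {"azure-uk-gpt35": {}},
    "llm_groups": {
        "gpt35": {"models": ["azure-uk-gpt35", "azure-sweden-gpt35"]},
    },
}

print(find_undefined_group_models(config))
# → {'gpt35': ['azure-sweden-gpt35']}
```

An empty result means every group refers only to models that exist in llms.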

End users select a model by specifying a group; the group is then resolved to one of the models in its list. For example, the following code selects a model from the gpt35 group:

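# create_azure_openai_client() is a helper that is not shown here; it is assumed
# to return an OpenAI-compatible client object.
client = create_azure_openai_client()
completion = client.chat.completions.create(
    model="gpt35",  # key of a group from llm_groups
    messages=[
        {
            "role": "user",
            "content": "Who is General Hammond from Stargate?",
        },
    ],
)
# Print the text of the first returned choice.
print(completion.choices[0].message.content)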

cost_profiles

The cost_profiles section contains the configuration for the cost profiles. Each cost profile is identified by a unique id (e.g. azure-gpt35-turbo) and has the following properties:

id: The unique identifier of the cost profile.
usd_per_1k_input_tokens: The cost of 1000 input tokens in USD.
usd_per_1k_output_tokens: The cost of 1000 output tokens in USD.
usd_per_1k_cached_input_tokens (optional): The cost of 1000 cached input tokens in USD (currently tracked for monitoring purposes only).
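
To illustrate how the per-1000-token prices translate into the cost of a single request, here is a minimal sketch. The function name estimate_cost_usd and the token counts are hypothetical. The sketch assumes a simple linear formula over input and output tokens and leaves out usd_per_1k_cached_input_tokens, since that value is described as tracked for monitoring only. How cost is actually computed internally is not stated here.

```python
def estimate_cost_usd(profile: dict, input_tokens: int, output_tokens: int) -> float:
    """Estimate request cost from a cost profile (assumed linear pricing)."""
    input_cost = input_tokens / 1000 * profile["usd_per_1k_input_tokens"]
    output_cost = output_tokens / 1000 * profile["usd_per_1k_output_tokens"]
    return input_cost + output_cost


# Prices taken from the azure-gpt-4.1 profile in the example configuration.
profile = {
    "id": "azure-gpt-4.1",
    "usd_per_1k_input_tokens": 0.0022,
    "usd_per_1k_output_tokens": 0.0088,
}

# Hypothetical request: 1500 input tokens and 500 output tokens.
cost = estimate_cost_usd(profile, input_tokens=1500, output_tokens=500)
print(f"{cost:.4f}")
# → 0.0077
```

Here 1.5 × 0.0022 = 0.0033 USD for input plus 0.5 × 0.0088 = 0.0044 USD for output gives 0.0077 USD.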

Example of a models configuration file:

llms:
  azure-uk-gpt-4.1:
    deployment_name: gpt-4.1
    type: AzureOpenAIText
    source: AzureOpenAI
    url: "https://bss-llm-gateway-model-models-cognitive-account-uk.openai.azure.com"
    api_key: "azure:https://bss-llm-gateway-test.vault.azure.net/secrets/bss-llm-gateway-model-uk"
    cost_profile: azure-gpt-4.1
    api_version: "2025-01-01-preview"

  azure-sweden-gpt-4.1:
    deployment_name: gpt-4.1
    type: AzureOpenAIText
    source: AzureOpenAI
    url: "https://bss-llm-gateway-model-models-cognitive-account-uk.openai.azure.com"
    api_key: "azure:https://bss-llm-gateway-test.vault.azure.net/secrets/bss-llm-gateway-model-sweden"
    cost_profile: azure-gpt-4.1
    api_version: "2025-01-01-preview"

  azure-us-text-embedding-3-large:
    deployment_name: text-embedding-3-large
    type: AzureOpenAIText
    source: AzureOpenAI
    url: "https://models-cognitive-account-eastus2.openai.azure.com"
    api_key: "env:AZURE_OPENAI_API_KEY"
    cost_profile: azure-text-embedding-3-large

llm_groups:
  gpt-4.1:
    models:
      - azure-uk-gpt-4.1
      - azure-sweden-gpt-4.1
  text-embedding-3-large:
    models:
      - azure-us-text-embedding-3-large

cost_profiles:
  - id: azure-gpt-4.1
    usd_per_1k_input_tokens: 0.0022
    usd_per_1k_output_tokens: 0.0088
    usd_per_1k_cached_input_tokens: 0.0011  # 50% of input price
  - id: azure-text-embedding-3-large
    usd_per_1k_input_tokens: 0.00013
    usd_per_1k_output_tokens: 0.00013
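
Since each model refers to a cost profile by id, a configuration like the one above can be checked for dangling references. The following is a minimal sketch, assuming the file has been parsed into a Python dict with the same shape (llms as a mapping, cost_profiles as a list of entries with an id). The function name find_unknown_cost_profiles is hypothetical, and the dict below is an abbreviated copy of the example.

```python
def find_unknown_cost_profiles(config: dict) -> dict:
    """Return {model key: cost_profile} for models whose profile id is not defined."""
    known_ids = {p["id"] for p in config.get("cost_profiles", [])}
    return {
        name: model.get("cost_profile")
        for name, model in config.get("llms", {}).items()
        if model.get("cost_profile") not in known_ids
    }


# Abbreviated copy of the example configuration (only the fields needed here).
config = {
    "llms": {
        "azure-uk-gpt-4.1": {"cost_profile": "azure-gpt-4.1"},
        "azure-sweden-gpt-4.1": {"cost_profile": "azure-gpt-4.1"},
        "azure-us-text-embedding-3-large": {
            "cost_profile": "azure-text-embedding-3-large"
        },
    },
    "cost_profiles": [
        {"id": "azure-gpt-4.1"},
        {"id": "azure-text-embedding-3-large"},
    ],
}

print(find_unknown_cost_profiles(config))
# → {}
```

An empty dict means every model points at a defined cost profile. A model with a missing or misspelled cost_profile would appear in the result together with the value it refers to.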