Release Notes

v2.4.0 (UPCOMING - second half of February)

Breaking Changes ⚠️

  • Removed support for API keys in config

Features

  • Full Entra ID integration
    • Ability to import groups in Admin portal
    • Ability to authorize via user Entra ID OID
    • Sample client application using Entra ID SSO for authentication
  • Enhanced token pricing support
    • Full support for cached tokens pricing for Azure, AWS, OpenAI LLMs
    • Correct support for image generation tokens pricing
  • Admin portal improvements
    • Limit model access per project
    • Disallow editing default spend limits when group spend limit is set
    • Support for importing Entra ID groups
    • Better warning message when deleting project with API keys
    • Allow deletion of expired API keys
  • Swagger documentation
    • Samples for LLM requests
    • Common error responses
  • Documentation
    • Improved customer documentation with better start guide for LLMGW users.

v2.3.0 (26.1.2026)

Breaking Changes ⚠️

  • Removed support for service bus integration - Configuration is no longer propagated from storage via service bus. After updating the configuration, the container must be restarted.

  • LLMGW environment variables type change - The following variables were changed from *_NAME style to concrete values:

    • SESSION_SECRET_KEY_NAME → SESSION_SECRET_KEY
    • ADMIN_API_KEY_NAME → ADMIN_API_KEY
    • DB_CONNECTION_STRING_SECRET_NAME → DB_CONNECTION_STRING_SECRET
    • ADMIN_SSO_AUTH_CREDENTIAL_NAME → ADMIN_SSO_AUTH_CREDENTIAL
    • LLMGW_API_OIDC_CLIENT_SECRET_NAME → LLMGW_API_OIDC_CLIENT_SECRET

    These will no longer be retrieved dynamically from a secrets manager by the application code, but should be provided at deployment time.

  • New/updated variables - see envconfig

    • ADMIN_PORTAL_ variables to configure admin portal
    • CONFIG_SOURCES, STORAGE and AZURE_BLOB_STORAGE_ACCOUNT_URL - to configure the location of config files; please review the environment variable documentation for updates
    • SECRET_STORE to configure secret store
  • Spend limits per group - When a spend limit is configured for a group in the Admin portal, only groups with configured limits have access and no default limit applies.
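
For the environment variable type change above, deployments now have to supply the secret values themselves instead of secret names. The following is a minimal sketch of such a migration step, not part of LLMGW: migrate_env and the resolve_secret callback are hypothetical helpers, and only the old-to-new name mapping is taken from these notes.

```python
from typing import Callable, Dict

# Old -> new variable names, as listed in the v2.3.0 breaking changes.
RENAMED_VARS = {
    "SESSION_SECRET_KEY_NAME": "SESSION_SECRET_KEY",
    "ADMIN_API_KEY_NAME": "ADMIN_API_KEY",
    "DB_CONNECTION_STRING_SECRET_NAME": "DB_CONNECTION_STRING_SECRET",
    "ADMIN_SSO_AUTH_CREDENTIAL_NAME": "ADMIN_SSO_AUTH_CREDENTIAL",
    "LLMGW_API_OIDC_CLIENT_SECRET_NAME": "LLMGW_API_OIDC_CLIENT_SECRET",
}


def migrate_env(
    old_env: Dict[str, str],
    resolve_secret: Callable[[str], str],
) -> Dict[str, str]:
    """Build a v2.3.0-style environment from a pre-2.3.0 one.

    resolve_secret is a hypothetical callback that looks up a secret value
    by its name in whatever secret store the deployment uses.
    """
    new_env: Dict[str, str] = {}
    for key, value in old_env.items():
        if key in RENAMED_VARS:
            # The old variable held a secret *name*; the new one holds the value.
            new_env[RENAMED_VARS[key]] = resolve_secret(value)
        else:
            new_env[key] = value
    return new_env
```

In a real deployment the lookup would typically happen in the deployment tooling (for example a pipeline step), so that the container only ever receives the concrete values.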

Features

  • AWS deployment support - LLMGW can now be deployed with AWS services (documentation in progress)
  • Native Bedrock models - Use AwsBedrockNative models for native Bedrock integration (e.g., via Claude Code)
  • Entra ID integration (Phase 1)
    • Requests can now be authorized with Entra ID SSO token
    • Endpoints to retrieve Entra ID available groups
  • Enhanced spend limits - Project access can be limited to specific groups. Default user limit only applies if no project-user limit is set.
  • Admin portal improvements
    • Configure Admin portal branding (logo, colors)
    • Alphabetic ordering in filters for API keys and user-based tokens screens.
    • Better UX for spend limits
    • Better details in audit log (shows name of deleted entity)
  • Improved security - New NGINX security headers and app-based host header validation. Set ALLOWED_HOSTS for LLMGW container and ENABLE_HSTS for Frontend container. See LLMGW env vars reference and Frontend env vars reference for details.
  • Flexible NGINX configuration - NGINX now accepts additional PROXY_PORT and BACKEND_PROXY_PROTO environment variables
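
The app-based host header validation mentioned above can be pictured roughly as follows. This is an illustrative sketch only: the comma-separated format assumed for ALLOWED_HOSTS and the helper names parse_allowed_hosts and is_host_allowed are assumptions, not LLMGW's actual implementation. The env vars reference remains the authority on the real format.

```python
from typing import Set


def parse_allowed_hosts(raw: str) -> Set[str]:
    """Split an ALLOWED_HOSTS-style value into a set of lower-cased host names.

    Assumption: hosts are comma-separated; the real format may differ.
    """
    return {part.strip().lower() for part in raw.split(",") if part.strip()}


def is_host_allowed(host_header: str, allowed: Set[str]) -> bool:
    """Check a request's Host header (optionally with a port) against the allow-list."""
    host = host_header.strip().lower()
    # Drop a trailing ":port" for plain host names and IPv4 addresses.
    # Bracketed IPv6 literals are compared as-is in this sketch.
    if not host.startswith("[") and ":" in host:
        host = host.rsplit(":", 1)[0]
    return host in allowed
```

A request whose Host header is not in the allow-list would then be rejected before reaching any endpoint.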

Improvements

  • OpenAI /responses API fixes
    • Support for reasoning responses/tokens
    • Support for custom named models on backend
  • Better error messages - Improved error messages for invalid request entities or misconfigured spend limits

Grafana

  • Dashboard versioning - Better version management for Grafana dashboards
  • Report data setup - Proper versioning for Dockerfile.reporting-data-setup following the LLMGW versioning
  • Cached tokens - Dashboards to display cached tokens used by LLM and approximate spend savings

v2.2.0 (10.11.2025)

Features

  • Responses API - Added support for the OpenAI /responses endpoint. See details in OpenAI client
  • Specific limit priority - A specific limit (e.g. user limit on project) always wins over the default entity limit
  • OTEL metric emitter config - Configure how often metrics are pushed to OTEL via OTEL_PERIODIC_EXPORT_INTERVAL_MILLIS
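
The limit priority rule above amounts to a simple fallback. A minimal sketch, assuming limits are plain numbers and that "no limit configured" is represented by None; effective_limit is a hypothetical name, not an LLMGW function:

```python
from typing import Optional


def effective_limit(
    specific_limit: Optional[float],
    default_limit: Optional[float],
) -> Optional[float]:
    """Return the specific limit when one is set, otherwise the default.

    The check uses "is not None" so that a specific limit of 0 still
    overrides a non-zero default.
    """
    return specific_limit if specific_limit is not None else default_limit
```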

Bug fixes

  • Admin portal fixes - Show the correct number of active tokens, better default ordering
  • Azure Key Vault - Handle closing SSL connections properly

Grafana

  • Fixes and improvements to Grafana dashboards
  • NOTE: To fix data aggregation for projects, the latest reporting-data-setup image must be pulled and run again

v2.0.5 (19.10.2025)

Features

  • Grafana views - Container job template to create psql views for reporting

Improvements

  • Key vault caching - Better caching for key vault secrets
  • Admin portal improvements - Logout, sorting, currency indicator
  • Logging improvements - Fixed exporting logs to OTEL, simplified logging

v2.0.1

Features

  • New database/backend model for managing entities, API keys, and user-based tokens
  • New spend entity model enabling spend limit management across entities
  • Admin portal UI for managing entities, spend limits, API keys, and user tokens
  • SSO support for Admin Portal endpoints (see documentation for environment variables)
  • AWS Bedrock support via boto3 integration
  • New embedding model - Added LLModelType (AzureOpenAIEmbedding) with cost computation support

Improvements

  • Enhanced Grafana setup
  • Streamlined configuration and settings
  • Improved Docker setup
  • Refactored token model

Breaking Changes

⚠️ Migration Required

  • ENVIRONMENT variable is now mandatory - Allowed values: local, dev, test, prd
  • Removed ALLOW_NON_CONFIGURED_ENTITIES - Boolean variable no longer supported
  • Spend limits configuration - No longer configurable via configuration file; manual migration required
  • Default entity types - Must be defined in configuration file
  • API keys deprecation - Configuration file support deprecated; manual migration to new model required
  • Key Vault secret cache - Configure the Key Vault secret cache time via AZURE_ENV_SERVICE_CACHE_EXPIRATION_SECONDS to ensure the expected Admin portal behaviour
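
As an illustration of the now-mandatory ENVIRONMENT variable, a startup check could look roughly like this. It is a sketch under assumptions: read_environment is a hypothetical helper, and the error handling LLMGW actually applies is not described in these notes. Only the variable name and the allowed values come from the list above.

```python
import os
from typing import Mapping, Optional

# Allowed values as stated in the v2.0.1 breaking changes.
ALLOWED_ENVIRONMENTS = ("local", "dev", "test", "prd")


def read_environment(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the validated ENVIRONMENT value or raise ValueError."""
    source = os.environ if env is None else env
    value = source.get("ENVIRONMENT")
    allowed = ", ".join(ALLOWED_ENVIRONMENTS)
    if not value:
        raise ValueError(f"ENVIRONMENT is mandatory; allowed values: {allowed}")
    if value not in ALLOWED_ENVIRONMENTS:
        raise ValueError(f"Unsupported ENVIRONMENT {value!r}; allowed values: {allowed}")
    return value
```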

v1.7.5 (18.08.2025)

Features

  • Add support for Entra ID token authentication

Bug fixes

  • Remove model from Custom Header Mapping

v1.7.4 (06.08.2025)

  • refactoring auth layer
  • add index for llmgw_spend_metrics to improve grafana performance

v1.7.3 (31.07.2025)

  • new endpoints for managing entities configuration
  • remove support for id regex and soft spend limit in configuration
  • scripts for generating the OpenAPI spec

v1.7.2 (22.07.2025)

  • security fix: sqlite version bump

v1.7.1 (22.07.2025)

  • More models graphs in grafana
  • Add project daily usage to grafana
  • Fix tracking response codes for errors

v1.7.0 (09.07.2025)

  • Add support for user based tokens

v1.6.7 (07.07.2025)

  • Better config validation
  • Retain spend metrics longer
  • Fix token usage count for azure models

v1.6.5 (02.07.2025)

  • Automatic cert refresh
  • Support for project/user API keys

v1.6.4 (23.06.2025)

  • Fix keyvault caching
  • Fix claude streaming
  • Fix token usage count for Bedrock models

v1.6.2 (19.06.2025)

  • Split and improve Grafana dashboards

v1.6.0 (31.03.2025)

  • Added support for alternative names for spend entities.
  • Added regex support for validating spend entity names.
  • Filtered out NonConfiguredSpendEntity objects from the database metadata.

v1.5.1 (31.03.2025)

  • All user spend entities can now have a default spend limit
  • Multiple spend entities are accepted in the request, e.g., llmgw-group=group_1,group_2
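
A value carrying multiple spend entities, such as the group_1,group_2 example above, can be split along these lines. This is a minimal sketch assuming a plain comma-separated value; parse_spend_entities is a hypothetical helper and says nothing about how LLMGW itself parses the request.

```python
from typing import List


def parse_spend_entities(raw_value: str) -> List[str]:
    """Split a comma-separated spend entity value into individual names.

    Surrounding whitespace and empty segments are dropped; order is kept.
    """
    return [part.strip() for part in raw_value.split(",") if part.strip()]
```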

v1.5.0 (31.03.2025)

  • AWS Bedrock Claude support
  • Text to speech model support

v1.4.0 (07.03.2025)

  • Use loguru for logging.
  • Better model response status handling.
  • Unify currencies to Dollars.
  • Docs update.
  • Grafana dashboard update.
  • Add checking both HTTP headers and query parameters for spend entities.
  • DeepSeek model support.
  • AWS Bedrock support.

v1.3.0 (22.12.2024)

  • OTEL config update.
  • Grafana Dashboard tuning.
  • Use logging_utils everywhere.
  • Poetry update.
  • Grafana SSO login and provisioning update.
  • Spend metrics update.
  • GPT-4 version update.
  • New llmgw headers and dynamic spend entities.
  • Flow-id propagation.

v1.2.0 (26.11.2024)

  • Metrics updates and fixes.
  • Grafana dashboards.
  • Load balancing based on rewards.
  • Non-text models support.
  • Provisioned Throughput Units (PTU) support.

v1.1.2 (29.10.2024)

  • Bug fixing.
  • Metrics refactoring.

v1.1.1 (24.10.2024)

  • Smoke test update.

v1.1.0 (24.10.2024)

  • Response time modeling experiments.
  • Bug fixing.
  • Streaming metrics.
  • Deployment pipeline update.
  • Doc update.
  • Model Heartbeat.
  • Spend limit metrics.
  • Docker image vulnerability scan.

v1.0.0 (14.10.2024)

Initial release of the LLM Gateway (LLMGW).