Release Notes
v2.4.0 (UPCOMING - second half of February)
Breaking Changes ⚠️
- Removed support for API keys in config
Features
- Full Entra ID integration
- Ability to import groups in Admin portal
- Ability to authorize via user Entra ID OID
- Sample client application using Entra ID SSO for authentication
- Enhanced token pricing support
- Full support for cached-token pricing for Azure, AWS, and OpenAI LLMs
- Correct pricing support for image-generation tokens
- Admin portal improvements
- Limit model access per project
- Disallow editing default spend limits when group spend limit is set
- Support for importing Entra ID groups
- Better warning message when deleting a project with API keys
- Allow deletion of expired API keys
- Swagger documentation
- Samples for LLM requests
- Common error responses
- Documentation
- Improved customer documentation with a better getting-started guide for LLMGW users.
v2.3.0 (26.01.2026)
Breaking Changes ⚠️
- Remove support for service bus integration - Configuration is no longer propagated from storage via service bus. If you update the configuration, the container must be restarted.
- LLMGW environment variable changes - The following variables were changed from `*_NAME` style to concrete values (see the sketch after this list):
  - `SESSION_SECRET_KEY_NAME` → `SESSION_SECRET_KEY`
  - `ADMIN_API_KEY_NAME` → `ADMIN_API_KEY`
  - `DB_CONNECTION_STRING_SECRET_NAME` → `DB_CONNECTION_STRING_SECRET`
  - `ADMIN_SSO_AUTH_CREDENTIAL_NAME` → `ADMIN_SSO_AUTH_CREDENTIAL`
  - `LLMGW_API_OIDC_CLIENT_SECRET_NAME` → `LLMGW_API_OIDC_CLIENT_SECRET`

  These values are no longer retrieved dynamically from a secrets manager by the application code; they must be provided at deployment time.
- New/updated variables - see the env config documentation:
  - `ADMIN_PORTAL_*` variables to configure the Admin portal
  - `CONFIG_SOURCES`, `STORAGE`, and `AZURE_BLOB_STORAGE_ACCOUNT_URL` to configure the config files location; please review the environment variable documentation for updates
  - `SECRET_STORE` to configure the secret store
- Spend limits per group - When a spend limit is configured for a group in the Admin portal, only groups with configured limits have access and no default limit applies.
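Following the environment variable changes above, a minimal sketch of a pre-deployment check in Python; the `check_env` helper is hypothetical and only the variable names come from this release:

```python
import os
import sys

# Renamed variables (old *_NAME style -> concrete values), per the list above.
RENAMED = {
    "SESSION_SECRET_KEY_NAME": "SESSION_SECRET_KEY",
    "ADMIN_API_KEY_NAME": "ADMIN_API_KEY",
    "DB_CONNECTION_STRING_SECRET_NAME": "DB_CONNECTION_STRING_SECRET",
    "ADMIN_SSO_AUTH_CREDENTIAL_NAME": "ADMIN_SSO_AUTH_CREDENTIAL",
    "LLMGW_API_OIDC_CLIENT_SECRET_NAME": "LLMGW_API_OIDC_CLIENT_SECRET",
}


def check_env() -> None:
    """Hypothetical pre-deployment check: flag obsolete names, require the new ones."""
    for old, new in RENAMED.items():
        if old in os.environ:
            print(f"warning: {old} is obsolete; provide the value in {new} instead")
        if new not in os.environ:
            print(f"error: {new} must be set at deployment time", file=sys.stderr)


if __name__ == "__main__":
    check_env()
```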
Features
- AWS deployment support - LLMGW can now be deployed with AWS services (documentation in progress)
- Native Bedrock models - Use `AwsBedrockNative` models for native Bedrock integration (e.g., via Claude Code)
- Entra ID integration (Phase 1)
- Requests can now be authorized with an Entra ID SSO token (see the sketch after this list)
- Endpoints to retrieve available Entra ID groups
- Enhanced spend limits - Project access can be limited to specific groups. Default user limit only applies if no project-user limit is set.
- Admin portal improvements
- Configure Admin portal branding (logo, colors)
- Alphabetic ordering in filters for API keys and user-based tokens screens.
- Better UX for spend limits
- Better details in audit log (shows name of deleted entity)
- Improved security - New NGINX security headers and app-based host header validation. Set `ALLOWED_HOSTS` for the LLMGW container and `ENABLE_HSTS` for the Frontend container. See the LLMGW env vars reference and Frontend env vars reference for details.
- Flexible NGINX configuration - NGINX now accepts additional `PROXY_PORT` and `BACKEND_PROXY_PROTO` environment variables
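A minimal sketch of the Entra ID SSO authorization mentioned above, assuming an OpenAI-compatible chat completions path and a standard Bearer header; the base URL, path, and model name are placeholders rather than confirmed LLMGW values:

```python
import requests

LLMGW_BASE_URL = "https://llmgw.example.com"  # placeholder gateway URL
entra_id_token = "<token acquired via your Entra ID sign-in flow>"

response = requests.post(
    f"{LLMGW_BASE_URL}/v1/chat/completions",  # assumed OpenAI-compatible path
    headers={"Authorization": f"Bearer {entra_id_token}"},
    json={
        "model": "gpt-4o",  # placeholder model name
        "messages": [{"role": "user", "content": "Hello"}],
    },
    timeout=30,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```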
Improvements
- OpenAI `/responses` API fixes
  - Support for reasoning responses/tokens
  - Support for custom-named models on backend
- Better error messages - Improved error messages for invalid request entities or misconfigured spend limits
Grafana
- Dashboard versioning - Better version management for Grafana dashboards
- Report data setup - Proper versioning for `Dockerfile.reporting-data-setup` following the LLMGW versioning
- Cached tokens - Dashboards to display cached tokens used by LLMs and approximate spend savings
v2.2.0 (10.11.2025)
Features
- Responses API - Add support for the OpenAI `/responses` endpoint. See details in the OpenAI client documentation (a usage sketch follows this list)
- Specific limit priority - A specific limit (e.g., a user limit on a project) always wins over the default entity limit
- OTel metric emitter config - Allow configuring how often metrics are pushed to OTel via `OTEL_PERIODIC_EXPORT_INTERVAL_MILLIS`
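A minimal usage sketch for the new `/responses` support, assuming the gateway exposes an OpenAI-compatible base URL and accepts a project API key; both values below are placeholders:

```python
from openai import OpenAI

# Point the standard OpenAI Python client at the gateway (placeholder URL and key).
client = OpenAI(
    base_url="https://llmgw.example.com/v1",
    api_key="<your LLMGW project API key>",
)

# The gateway now proxies the OpenAI Responses API.
result = client.responses.create(
    model="gpt-4o",  # placeholder model name
    input="Summarize the v2.2.0 release notes in one sentence.",
)
print(result.output_text)
```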
Bug fixes
- Admin portal fixes - Show correct number of active tokens, better default ordering
- Azure Key Vault - Handle closing SSL connections properly
Grafana
- Fixes and improvements to Grafana dashboards
- NOTE: to fix data aggregation for projects, the latest `reporting-data-setup` image must be pulled and run again
v2.0.5 (19.10.2025)
Features
- Grafana views - Container job template to create PostgreSQL views for reporting
Improvements
- Key Vault caching - Better caching for Key Vault secrets
- Admin portal improvements - Logout, sorting, currency indicator
- Logging improvements - Fix exporting logs to OTel, simplify logging
v2.0.1
Features
- New database/backend model for managing entities, API keys, and user-based tokens
- New spend entity model enabling spend limit management across entities
- Admin portal UI for managing entities, spend limits, API keys, and user tokens
- SSO support for Admin Portal endpoints (see documentation for environment variables)
- AWS Bedrock support via boto3 integration
- New embedding model - Added `LLModelType` (`AzureOpenAIEmbedding`) with cost computation support
Improvements
- Enhanced Grafana setup
- Streamlined configuration and settings
- Improved Docker setup
- Refactored token model
Breaking Changes
⚠️ Migration Required
- `ENVIRONMENT` variable is now mandatory - Allowed values: `local`, `dev`, `test`, `prd` (see the sketch after this list)
- Removed `ALLOW_NON_CONFIGURED_ENTITIES` - Boolean variable no longer supported
- Spend limits configuration - No longer configurable via configuration file; manual migration required
- Default entity types - Must be defined in configuration file
- API keys deprecation - Configuration file support deprecated; manual migration to new model required
- Ensure the Key Vault secret cache time is configured via `AZURE_ENV_SERVICE_CACHE_EXPIRATION_SECONDS` for expected Admin portal behaviour
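A minimal sketch of a migration check for the new mandatory setting, written as a standalone Python script; the check itself is illustrative and not part of the gateway:

```python
import os

ALLOWED_ENVIRONMENTS = {"local", "dev", "test", "prd"}

environment = os.environ.get("ENVIRONMENT")
if environment not in ALLOWED_ENVIRONMENTS:
    raise SystemExit(
        f"ENVIRONMENT is mandatory and must be one of {sorted(ALLOWED_ENVIRONMENTS)}, got {environment!r}"
    )

# Recommended for expected Admin portal behaviour (value in seconds).
if os.environ.get("AZURE_ENV_SERVICE_CACHE_EXPIRATION_SECONDS") is None:
    print("note: AZURE_ENV_SERVICE_CACHE_EXPIRATION_SECONDS is not set")
```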
v1.7.5 (18.08.2025)
Features
- Add support for Entra ID token authentication
Bug fixes
- Remove model from Custom Header Mapping
v1.7.4 (06.08.2025)
- Refactor auth layer
- Add index for `llmgw_spend_metrics` to improve Grafana performance
v1.7.3 (31.07.2025)
- New endpoints for managing entity configuration
- Remove support for ID regex and soft spend limit in configuration
- Scripts for generating the OpenAPI spec
v1.7.2 (22.07.2025)
- Security fix: SQLite version bump
v1.7.1 (22.07.2025)
- More model graphs in Grafana
- Add project daily usage to Grafana
- Fix tracking of response codes for errors
v1.7.0 (09.07.2025)
- Add support for user-based tokens
v1.6.7 (07.07.2025)
- Better config validation
- Retain spend metrics longer
- Fix token usage count for Azure models
v1.6.5 (02.07.2025)
- Automatic cert refresh
- Support for project/user API keys
v1.6.4 (23.06.2025)
- Fix Key Vault caching
- Fix Claude streaming
- Fix token usage count for Bedrock models
v1.6.2 (19.06.2025)
- Split and improve Grafana dashboards
v1.6.0 (31.03.2025)
- Added support for alternative names for spend entities.
- Added regex support for validating spend entity names.
- Filtered out NonConfiguredSpendEntity objects from the database metadata.
v1.5.1 (31.03.2025)
- All user spend entities can now have a default spend limit
- Multiple spend entities are accepted in the request, e.g., `llmgw-group=group_1,group_2` (see the sketch below)
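A minimal sketch of passing multiple spend entities on a request via the `llmgw-group` header shown above; the endpoint path, auth header, and payload are placeholders:

```python
import requests

LLMGW_BASE_URL = "https://llmgw.example.com"  # placeholder gateway URL

response = requests.post(
    f"{LLMGW_BASE_URL}/v1/chat/completions",  # assumed OpenAI-compatible path
    headers={
        "api-key": "<your API key>",          # placeholder auth header
        "llmgw-group": "group_1,group_2",     # multiple spend entities, comma-separated
    },
    json={"model": "gpt-4", "messages": [{"role": "user", "content": "Hi"}]},
    timeout=30,
)
print(response.status_code)
```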
v1.5.0 (31.03.2025)
- AWS Bedrock Claude support
- Text-to-speech model support
v1.4.0 (07.03.2025)
- Use loguru for logging.
- Better model response status handling.
- Unify currencies to Dollars.
- Docs update.
- Grafana dashboard update.
- Add checking both HTTP headers and query parameters for spend entities.
- DeepSeek model support.
- AWS Bedrock support.
v1.3.0 (22.12.2024)
- OTEL config update.
- Grafana Dashboard tuning.
- Use `logging_utils` everywhere.
- Poetry update.
- Grafana SSO login and provisioning update.
- Spend metrics update.
- GPT-4 version update.
- New llmgw headers and dynamic spend entities.
- Flow-id propagation.
v1.2.0 (26.11.2024)
- Metrics updates and fixes.
- Grafana dashboards.
- Load balancing based on rewards.
- Non-text models support.
- Provisioned Throughput Units (PTU) support.
v1.1.2 (29.10.2024)
- Bug fixing.
- Metrics refactoring.
v1.1.1 (24.10.2024)
- Smoke test update.
v1.1.0 (24.10.2024)
- Response time modeling experiments.
- Bug fixing.
- Streaming metrics.
- Deployment pipeline update.
- Doc update.
- Model Heartbeat.
- Spend limit metrics.
- Docker image vulnerability scan.
v1.0.0 (14.10.2024)
Initial release of the LLM Gateway (LLMGW).