Release Notes
v2.4.0 (UPCOMING - second half of February)
Breaking Changes ⚠️
- Removed support for API keys in config
Features
- Full Entra ID integration
- Ability to import groups in Admin portal
- Ability to authorize via user Entra ID OID
- Sample client application using Entra ID SSO for authentication
- Enhanced token pricing support
- Full support for cached-token pricing for Azure, AWS, and OpenAI LLMs
- Correct pricing support for image-generation tokens
- Admin portal improvements
- Limit model access per project
- Disallow editing default spend limits when group spend limit is set
- Support for importing Entra ID groups
- Better warning message when deleting a project with API keys
- Allow deletion of expired API keys
- Swagger documentation
- Samples for LLM requests
- Common error responses
- Documentation
- Improved customer documentation with a better getting-started guide for LLMGW users.
v2.3.0 (26.01.2026)
Breaking Changes ⚠️
- Remove support for service bus integration - Configuration is no longer propagated from storage via service bus. If you update the configuration, the container must be restarted.
- LLMGW environment variable changes - The following variables were changed from `*_NAME` style to concrete values (see the sketch after this list):
  - `SESSION_SECRET_KEY_NAME` → `SESSION_SECRET_KEY`
  - `ADMIN_API_KEY_NAME` → `ADMIN_API_KEY`
  - `DB_CONNECTION_STRING_SECRET_NAME` → `DB_CONNECTION_STRING_SECRET`
  - `ADMIN_SSO_AUTH_CREDENTIAL_NAME` → `ADMIN_SSO_AUTH_CREDENTIAL`
  - `LLMGW_API_OIDC_CLIENT_SECRET_NAME` → `LLMGW_API_OIDC_CLIENT_SECRET`

  These values are no longer retrieved dynamically from a secrets manager by the application code; they must be provided at deployment time.
- New/updated variables - see the env config documentation:
  - `ADMIN_PORTAL_*` variables to configure the Admin portal
  - `CONFIG_SOURCES`, `STORAGE`, and `AZURE_BLOB_STORAGE_ACCOUNT_URL` to configure the config files location; please review the environment variable documentation for updates
  - `SECRET_STORE` to configure the secret store
- Spend limits per group - When a spend limit is configured for a group in the Admin portal, only groups with configured limits have access and no default limit applies.
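Following the environment variable changes above, a minimal sketch of a pre-deployment check in Python; the `check_env` helper is hypothetical and only the variable names come from this release:

```python
import os
import sys

# Renamed variables (old *_NAME style -> concrete values), per the list above.
RENAMED = {
    "SESSION_SECRET_KEY_NAME": "SESSION_SECRET_KEY",
    "ADMIN_API_KEY_NAME": "ADMIN_API_KEY",
    "DB_CONNECTION_STRING_SECRET_NAME": "DB_CONNECTION_STRING_SECRET",
    "ADMIN_SSO_AUTH_CREDENTIAL_NAME": "ADMIN_SSO_AUTH_CREDENTIAL",
    "LLMGW_API_OIDC_CLIENT_SECRET_NAME": "LLMGW_API_OIDC_CLIENT_SECRET",
}


def check_env() -> None:
    """Hypothetical pre-deployment check: flag obsolete names, require the new ones."""
    for old, new in RENAMED.items():
        if old in os.environ:
            print(f"warning: {old} is obsolete; provide the value in {new} instead")
        if new not in os.environ:
            print(f"error: {new} must be set at deployment time", file=sys.stderr)


if __name__ == "__main__":
    check_env()
```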
Features
- AWS deployment support - LLMGW can now be deployed with AWS services (documentation in progress)
- Native Bedrock models - Use `AwsBedrockNative` models for native Bedrock integration (e.g., via Claude Code)
- Entra ID integration (Phase 1)
- Requests can now be authorized with an Entra ID SSO token (see the sketch after this list)
- Endpoints to retrieve available Entra ID groups
- Enhanced spend limits - Project access can be limited to specific groups. Default user limit only applies if no project-user limit is set.
- Admin portal improvements
- Configure Admin portal branding (logo, colors)
- Alphabetic ordering in filters for API keys and user-based tokens screens.
- Better UX for spend limits
- Better details in audit log (shows name of deleted entity)
- Improved security - New NGINX security headers and app-based host header validation. Set `ALLOWED_HOSTS` for the LLMGW container and `ENABLE_HSTS` for the Frontend container. See the LLMGW env vars reference and Frontend env vars reference for details.
- Flexible NGINX configuration - NGINX now accepts additional `PROXY_PORT` and `BACKEND_PROXY_PROTO` environment variables
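A minimal sketch of the Entra ID SSO authorization mentioned above, assuming an OpenAI-compatible chat completions path and a standard Bearer header; the base URL, path, and model name are placeholders rather than confirmed LLMGW values:

```python
import requests

LLMGW_BASE_URL = "https://llmgw.example.com"  # placeholder gateway URL
entra_id_token = "<token acquired via your Entra ID sign-in flow>"

response = requests.post(
    f"{LLMGW_BASE_URL}/v1/chat/completions",  # assumed OpenAI-compatible path
    headers={"Authorization": f"Bearer {entra_id_token}"},
    json={
        "model": "gpt-4o",  # placeholder model name
        "messages": [{"role": "user", "content": "Hello"}],
    },
    timeout=30,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```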
Improvements
- OpenAI `/responses` API fixes
  - Support for reasoning responses/tokens
  - Support for custom-named models on backend
- Better error messages - Improved error messages for invalid request entities or misconfigured spend limits
Grafana
- Dashboard versioning - Better version management for Grafana dashboards
- Report data setup - Proper versioning for `Dockerfile.reporting-data-setup` following the LLMGW versioning
- Cached tokens - Dashboards to display cached tokens used by LLMs and approximate spend savings
v2.2.0 (10.11.2025)
Features
- Responses API - Add support for the OpenAI `/responses` endpoint. See details in the OpenAI client documentation (a usage sketch follows this list)
- Specific limit priority - A specific limit (e.g., a user limit on a project) always wins over the default entity limit
- OTel metric emitter config - Allow configuring how often metrics are pushed to OTel via `OTEL_PERIODIC_EXPORT_INTERVAL_MILLIS`
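A minimal usage sketch for the new `/responses` support, assuming the gateway exposes an OpenAI-compatible base URL and accepts a project API key; both values below are placeholders:

```python
from openai import OpenAI

# Point the standard OpenAI Python client at the gateway (placeholder URL and key).
client = OpenAI(
    base_url="https://llmgw.example.com/v1",
    api_key="<your LLMGW project API key>",
)

# The gateway now proxies the OpenAI Responses API.
result = client.responses.create(
    model="gpt-4o",  # placeholder model name
    input="Summarize the v2.2.0 release notes in one sentence.",
)
print(result.output_text)
```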
Bug fixes
- Admin portal fixes - Show correct number of active tokens, better default ordering
- Azure Key Vault - Handle closing SSL connections properly
Grafana
- Fixes and improvements to Grafana dashboards
- NOTE: to fix data aggregation for projects, the latest `reporting-data-setup` image must be pulled and run again
v2.0.5 (19.10.2025)
Features
- Grafana views - Container job template to create PostgreSQL views for reporting
Improvements
- Key Vault caching - Better caching for Key Vault secrets
- Admin portal improvements - Logout, sorting, currency indicator
- Logging improvements - Fix exporting logs to OTel, simplify logging
v2.0.1
Features
- New database/backend model for managing entities, API keys, and user-based tokens
- New spend entity model enabling spend limit management across entities
- Admin portal UI for managing entities, spend limits, API keys, and user tokens
- SSO support for Admin Portal endpoints (see documentation for environment variables)
- AWS Bedrock support via boto3 integration
- New embedding model - Added `LLModelType` (`AzureOpenAIEmbedding`) with cost computation support
Improvements
- Enhanced Grafana setup
- Streamlined configuration and settings
- Improved Docker setup
- Refactored token model
Breaking Changes
⚠️ Migration Required
- `ENVIRONMENT` variable is now mandatory - Allowed values: `local`, `dev`, `test`, `prd` (see the sketch after this list)
- Removed `ALLOW_NON_CONFIGURED_ENTITIES` - Boolean variable no longer supported
- Spend limits configuration - No longer configurable via configuration file; manual migration required
- Default entity types - Must be defined in configuration file
- API keys deprecation - Configuration file support deprecated; manual migration to new model required
- Ensure the Key Vault secret cache time is configured via `AZURE_ENV_SERVICE_CACHE_EXPIRATION_SECONDS` for expected Admin portal behaviour
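A minimal sketch of a migration check for the new mandatory setting, written as a standalone Python script; the check itself is illustrative and not part of the gateway:

```python
import os

ALLOWED_ENVIRONMENTS = {"local", "dev", "test", "prd"}

environment = os.environ.get("ENVIRONMENT")
if environment not in ALLOWED_ENVIRONMENTS:
    raise SystemExit(
        f"ENVIRONMENT is mandatory and must be one of {sorted(ALLOWED_ENVIRONMENTS)}, got {environment!r}"
    )

# Recommended for expected Admin portal behaviour (value in seconds).
if os.environ.get("AZURE_ENV_SERVICE_CACHE_EXPIRATION_SECONDS") is None:
    print("note: AZURE_ENV_SERVICE_CACHE_EXPIRATION_SECONDS is not set")
```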
v1.7.5 (18.08.2025)
Features
- Add support for Entra ID token authentication
Bug fixes
- Remove model from Custom Header Mapping
v1.7.4 (06.08.2025)
- Refactor auth layer
- Add index for `llmgw_spend_metrics` to improve Grafana performance
v1.7.3 (31.07.2025)
- New endpoints for managing entity configuration
- Remove support for ID regex and soft spend limit in configuration
- Scripts for generating the OpenAPI spec
v1.7.2 (22.07.2025)
- Security fix: SQLite version bump
v1.7.1 (22.07.2025)
- More model graphs in Grafana
- Add project daily usage to Grafana
- Fix tracking of response codes for errors
v1.7.0 (09.07.2025)
- Add support for user-based tokens
v1.6.7 (07.07.2025)
- Better config validation
- Retain spend metrics longer
- Fix token usage count for Azure models
v1.6.5 (02.07.2025)
- Automatic cert refresh
- Support for project/user API keys
v1.6.4 (23.06.2025)
- Fix Key Vault caching
- Fix Claude streaming
- Fix token usage count for Bedrock models
v1.6.2 (19.06.2025)
- Split and improve Grafana dashboards
v1.6.0 (31.03.2025)
- Added support for alternative names for spend entities.
- Added regex support for validating spend entity names.
- Filtered out NonConfiguredSpendEntity objects from the database metadata.
v1.5.1 (31.03.2025)
- All user spend entities can now have a default spend limit
- Multiple spend entities are accepted in the request, e.g., `llmgw-group=group_1,group_2` (see the sketch below)
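A minimal sketch of passing multiple spend entities on a request via the `llmgw-group` header shown above; the endpoint path, auth header, and payload are placeholders:

```python
import requests

LLMGW_BASE_URL = "https://llmgw.example.com"  # placeholder gateway URL

response = requests.post(
    f"{LLMGW_BASE_URL}/v1/chat/completions",  # assumed OpenAI-compatible path
    headers={
        "api-key": "<your API key>",          # placeholder auth header
        "llmgw-group": "group_1,group_2",     # multiple spend entities, comma-separated
    },
    json={"model": "gpt-4", "messages": [{"role": "user", "content": "Hi"}]},
    timeout=30,
)
print(response.status_code)
```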
v1.5.0 (31.03.2025)
- AWS Bedrock Claude support
- Text-to-speech model support
v1.4.0 (07.03.2025)
- Use loguru for logging.
- Better model response status handling.
- Unify currencies to Dollars.
- Docs update.
- Grafana dashboard update.
- Add checking both HTTP headers and query parameters for spend entities.
- DeepSeek model support.
- AWS Bedrock support.
v1.3.0 (22.12.2024)
- OTEL config update.
- Grafana Dashboard tuning.
- Use `logging_utils` everywhere.
- Poetry update.
- Grafana SSO login and provisioning update.
- Spend metrics update.
- GPT-4 version update.
- New llmgw headers and dynamic spend entities.
- Flow-id propagation.
v1.2.0 (26.11.2024)
- Metrics updates and fixes.
- Grafana dashboards.
- Load balancing based on rewards.
- Non-text models support.
- Provisioned Throughput Units (PTU) support.
v1.1.2 (29.10.2024)
- Bug fixing.
- Metrics refactoring.
v1.1.1 (24.10.2024)
- Smoke test update.
v1.1.0 (24.10.2024)
- Response time modeling experiments.
- Bug fixing.
- Streaming metrics.
- Deployment pipeline update.
- Doc update.
- Model Heartbeat.
- Spend limit metrics.
- Docker image vulnerability scan.
v1.0.0 (14.10.2024)
Initial release of the LLM Gateway (LLMGW).