Response Headers

Adastra LLMGW adds custom response headers that carry metadata about each request, such as its cost, the model that served it, and the number of attempts. All LLMGW-specific headers are prefixed with `x-llmgw-` to distinguish them from standard HTTP headers.

Available Headers

`x-llmgw-request-id`

Type: String (UUID)
Description: A unique identifier for the request that can be used for tracking and debugging purposes.
Example: 3cb26481-d869-4923-8093-3feb92f8d9fc

`x-llmgw-model-id`

Type: String
Description: The actual model ID that was used to process the request. This may differ from the requested model if load balancing or failover occurred.
Example: azure-us-gpt35

`x-llmgw-attempts`

Type: Integer (as string)
Description: The number of attempts LLMGW made to obtain a successful response. A value above 1 means the request was retried, which helps in understanding retry behavior and service reliability.
Example: 2
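If you want the number of retries rather than attempts, a minimal sketch follows. It assumes the header counts the initial attempt, so `1` means no retry; `retry_count` is a hypothetical helper, not part of LLMGW.

```python
def retry_count(headers: dict[str, str]) -> int:
    """Return how many retries LLMGW made, derived from x-llmgw-attempts.

    Assumes the header counts the initial attempt, so "1" means no retries.
    Returns 0 when the header is missing.
    """
    raw = headers.get("x-llmgw-attempts")
    if raw is None:
        return 0
    attempts = int(raw)  # the value is an integer encoded as a string
    return max(attempts - 1, 0)


print(retry_count({"x-llmgw-attempts": "2"}))  # 1
```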

`x-llmgw-cost`

Type: Decimal (as string)
Description: The cost of the request in US cents, useful for tracking spending. The value may be written in scientific notation, as in the example below.
Example: 4e-05 (0.00004 cents)
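Because the value is expressed in cents and can use scientific notation, parsing it with Python's `decimal` module avoids floating-point rounding when costs are converted or summed. A minimal sketch, with `cost_in_usd` as a hypothetical helper name:

```python
from decimal import Decimal


def cost_in_usd(headers: dict[str, str]) -> Decimal:
    """Convert x-llmgw-cost (US cents, possibly in scientific notation) to dollars.

    A missing header is treated as zero cost.
    """
    cents = Decimal(headers.get("x-llmgw-cost", "0"))  # accepts "4e-05" and "0.00012"
    return cents / 100


usd = cost_in_usd({"x-llmgw-cost": "4e-05"})
print(f"{usd:f}")  # 0.0000004
```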

`x-llmgw-remaining-limits`

Type: String (formatted)
Description: Information about remaining usage limits for the requesting entity. The format is `{entity_name}/{limit_id}/{remaining_amount}`. Multiple limits may be returned as separate headers.
Example: user/daily-tokens/9500
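A sketch of a parser for this format, assuming entity names and limit IDs never contain `/` or `,`. It takes a list of header values because the header can repeat, and it also splits a single comma-separated value, since some HTTP clients (for example httpx, which the OpenAI Python client uses) merge repeated headers that way when you read them as a dict. `RemainingLimit` and `parse_remaining_limits` are hypothetical names, and treating the remaining amount as a decimal number is an assumption.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class RemainingLimit:
    entity: str
    limit_id: str
    remaining: Decimal


def parse_remaining_limits(values: list[str]) -> list[RemainingLimit]:
    """Parse x-llmgw-remaining-limits values of the form entity/limit_id/remaining.

    Handles several separate header values as well as values that a client
    has merged into one comma-separated string.
    """
    limits: list[RemainingLimit] = []
    for value in values:
        for item in value.split(","):
            item = item.strip()
            if not item:
                continue
            entity, limit_id, remaining = item.split("/")  # ValueError if malformed
            limits.append(RemainingLimit(entity, limit_id, Decimal(remaining)))
    return limits


merged = "user/daily-tokens/9500, user/daily-requests/4999"
for limit in parse_remaining_limits([merged]):
    print(limit.entity, limit.limit_id, limit.remaining)
# user daily-tokens 9500
# user daily-requests 4999
```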

Accessing Headers

With OpenAI Client

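To access response headers when using the OpenAI Python client, call the endpoint through the `with_raw_response` accessor. It returns the raw HTTP response; call `.parse()` on it to get the usual completion object:

```python
import json

# `client` is assumed to be an openai.OpenAI (or openai.AzureOpenAI) client
# already configured to point at your LLMGW endpoint.
raw_response = client.chat.completions.with_raw_response.create(
    model="gpt-35-turbo",
    messages=[{"role": "user", "content": "Hello!"}],
)
completion = raw_response.parse()  # the regular ChatCompletion object

# Extract all LLMGW headers (httpx reports header names in lowercase)
llmgw_headers = {
    key: value
    for key, value in raw_response.headers.items()
    if key.startswith("x-llmgw-")
}
print(json.dumps(llmgw_headers, indent=2))
```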

With LangChain

When using LangChain, set `include_response_headers=True` in your model configuration; the headers are then available under `headers` in the response's `response_metadata`:

```python
import json

from langchain_openai import AzureChatOpenAI

model = AzureChatOpenAI(
    # ... other configuration
    include_response_headers=True,
)

response = model.invoke("Hello!")

# Access headers from response metadata
headers = response.response_metadata.get("headers", {})
llmgw_headers = {
    key: value
    for key, value in headers.items()
    if key.startswith("x-llmgw-")
}
print(json.dumps(llmgw_headers, indent=2))
```

Example Response Headers

A typical response might include headers like this:

```json
{
  "x-llmgw-request-id": "3cb26481-d869-4923-8093-3feb92f8d9fc",
  "x-llmgw-model-id": "azure-us-gpt35",
  "x-llmgw-attempts": "1",
  "x-llmgw-cost": "0.00012",
  "x-llmgw-remaining-limits": "user/daily-requests/4999"
}
```

Use Cases

These headers are particularly useful for:

Cost Tracking: Monitor spending using the `x-llmgw-cost` header
Debugging: Use `x-llmgw-request-id` to trace specific requests in logs
Load Balancing Analysis: Check `x-llmgw-model-id` to see which model actually processed your request
Reliability Monitoring: Monitor `x-llmgw-attempts` to understand retry patterns
Usage Management: Track remaining limits with `x-llmgw-remaining-limits`
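As an illustration of the cost tracking, load balancing, and reliability use cases, the sketch below aggregates header dictionaries collected from several responses. `summarize` is a hypothetical helper, the second sample record is made up from the example values on this page, and a missing `x-llmgw-attempts` header is assumed to mean a single attempt.

```python
from collections import Counter
from decimal import Decimal


def summarize(header_dicts: list[dict[str, str]]) -> dict:
    """Aggregate x-llmgw-* headers collected from many responses."""
    total_cost_cents = Decimal(0)
    models: Counter = Counter()
    retried = 0
    for headers in header_dicts:
        total_cost_cents += Decimal(headers.get("x-llmgw-cost", "0"))
        models[headers.get("x-llmgw-model-id", "unknown")] += 1
        if int(headers.get("x-llmgw-attempts", "1")) > 1:
            retried += 1
    return {
        "requests": len(header_dicts),
        "total_cost_cents": total_cost_cents,
        "requests_per_model": dict(models),
        "retried_requests": retried,
    }


records = [
    {"x-llmgw-model-id": "azure-us-gpt35", "x-llmgw-attempts": "1", "x-llmgw-cost": "0.00012"},
    {"x-llmgw-model-id": "azure-us-gpt35", "x-llmgw-attempts": "2", "x-llmgw-cost": "4e-05"},
]
summary = summarize(records)
print(summary["total_cost_cents"])  # 0.00016
print(summary["retried_requests"])  # 1
```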

Important Notes

Headers are returned for both successful and failed requests (when possible)
The `x-llmgw-remaining-limits` header may appear multiple times if multiple limits apply
All cost values are in US cents
Request IDs are unique across all requests and can be used for support inquiries
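Since headers can also come back on failed requests, you may want to capture them in your error handling. With the OpenAI Python client, HTTP error responses raise `openai.APIStatusError`, which keeps the underlying HTTP response on its `response` attribute; errors raised before any response arrives (such as connection errors) have nothing to read. A minimal sketch with hypothetical helper names:

```python
def llmgw_headers(headers) -> dict[str, str]:
    """Keep only x-llmgw-* entries from any mapping-like headers object."""
    return {
        key.lower(): value
        for key, value in headers.items()
        if key.lower().startswith("x-llmgw-")
    }


def llmgw_headers_from_error(exc: BaseException) -> dict[str, str]:
    """Return the x-llmgw-* headers attached to a failed request, if any.

    openai.APIStatusError carries the HTTP response on `exc.response`;
    exceptions without a response yield an empty dict.
    """
    response = getattr(exc, "response", None)
    if response is None:
        return {}
    return llmgw_headers(response.headers)


# Usage with the OpenAI client (not executed here):
# try:
#     client.chat.completions.create(model="gpt-35-turbo", messages=[...])
# except openai.APIStatusError as exc:
#     print(exc.status_code, llmgw_headers_from_error(exc))
```

Logging the extracted `x-llmgw-request-id` alongside the status code gives support a handle on the failed request.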