Response Headers

Adastra LLMGW adds custom response headers that carry metadata about each request: its ID, the model that served it, the number of attempts, its cost, and the remaining usage limits. All LLMGW-specific headers are prefixed with x-llmgw- to distinguish them from standard HTTP headers.

Available Headers

x-llmgw-request-id

Type: String (UUID)
Description: A unique identifier for the request that can be used for tracking and debugging purposes.
Example: 3cb26481-d869-4923-8093-3feb92f8d9fc

x-llmgw-model-id

Type: String
Description: The actual model ID that was used to process the request. This may differ from the requested model if load balancing or failover occurred.
Example: azure-us-gpt35

x-llmgw-attempts

Type: Integer (as string)
Description: The number of attempts made to get a successful response. A value greater than 1 indicates that the request was retried, which makes this header useful for understanding retry behavior and service reliability.
Example: 2

x-llmgw-cost

Type: Decimal (as string)
Description: The cost of the request in cents (USD). The value may be written in scientific notation, as in the example below.
Example: 4e-05 (0.00004 cents)
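
Because the value can arrive in scientific notation, it is safer to parse it with a decimal type than to treat it as display text. The following is a minimal sketch, not part of LLMGW itself; the helper name cost_in_dollars is hypothetical.

```python
from decimal import Decimal


def cost_in_dollars(header_value: str) -> Decimal:
    """Convert an x-llmgw-cost value (cents, as a string) to US dollars."""
    # Decimal accepts both plain ("0.00012") and scientific ("4e-05") notation.
    cents = Decimal(header_value)
    return cents / Decimal(100)


cost = cost_in_dollars("4e-05")
print(f"{cost:f}")  # fixed-point rendering instead of scientific notation
```

Using Decimal rather than float avoids rounding drift when many very small per-request costs are summed.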

x-llmgw-remaining-limits

Type: String (formatted)
Description: Information about remaining usage limits for the requesting entity. The format is {entity_name}/{limit_id}/{remaining_amount}. Multiple limits may be returned as separate headers.
Example: user/daily-tokens/9500
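
The {entity_name}/{limit_id}/{remaining_amount} format can be split into its three fields. The sketch below is illustrative only: RemainingLimit and parse_remaining_limit are hypothetical names, and it assumes that the remaining amount is numeric and that neither the limit ID nor the amount contains a slash.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class RemainingLimit:
    entity_name: str
    limit_id: str
    remaining: Decimal


def parse_remaining_limit(value: str) -> RemainingLimit:
    """Parse one x-llmgw-remaining-limits value into its three fields."""
    # Split from the right so that a slash inside the entity name (an
    # assumption, not something the format guarantees) does not shift fields.
    parts = value.strip().rsplit("/", 2)
    if len(parts) != 3:
        raise ValueError(f"unexpected remaining-limits value: {value!r}")
    entity_name, limit_id, remaining = parts
    return RemainingLimit(entity_name, limit_id, Decimal(remaining))


limit = parse_remaining_limit("user/daily-tokens/9500")
print(limit.entity_name, limit.limit_id, limit.remaining)
```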

Accessing Headers

With OpenAI Client

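To access response headers when using the OpenAI Python client, make the call through the with_raw_response accessor. The returned object exposes the HTTP headers, and its parse() method returns the usual completion object: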

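import json

# `client` is an OpenAI client that is already configured for the gateway
raw_response = client.chat.completions.with_raw_response.create(
    model="gpt-35-turbo",
    messages=[{"role": "user", "content": "Hello!"}],
)

# The parsed completion is still available from the raw response
completion = raw_response.parse()

# Extract all LLMGW headers
llmgw_headers = {
    key: value
    for key, value in raw_response.headers.items()
    if key.lower().startswith("x-llmgw-")
}
print(json.dumps(llmgw_headers, indent=2))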

With LangChain

When using LangChain, set include_response_headers=True in your model configuration so that the headers are copied into the response metadata:

import json

from langchain_openai import AzureChatOpenAI

model = AzureChatOpenAI(
    # ... other configuration
    include_response_headers=True,
)

response = model.invoke("Hello!")

# Access headers from response metadata; the "headers" key is only
# populated when include_response_headers=True
if hasattr(response, "response_metadata"):
    headers = response.response_metadata.get("headers", {})
    llmgw_headers = {
        key: value
        for key, value in headers.items()
        if key.lower().startswith("x-llmgw-")
    }
    print(json.dumps(llmgw_headers, indent=2))

Example Response Headers

A typical response might include headers like this:

{
  "x-llmgw-request-id": "3cb26481-d869-4923-8093-3feb92f8d9fc",
  "x-llmgw-model-id": "azure-us-gpt35",
  "x-llmgw-attempts": "1",
  "x-llmgw-cost": "0.00012",
  "x-llmgw-remaining-limits": "user/daily-requests/4999"
}

Use Cases

These headers are particularly useful for:

  • Cost Tracking: Monitor spending using the x-llmgw-cost header
  • Debugging: Use x-llmgw-request-id to trace specific requests in logs
  • Load Balancing Analysis: Check x-llmgw-model-id to see which model actually processed your request
  • Reliability Monitoring: Monitor x-llmgw-attempts to understand retry patterns
  • Usage Management: Track remaining limits with x-llmgw-remaining-limits
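
Several of these use cases can be combined in a small client-side aggregator. The following is a sketch under stated assumptions rather than a feature of LLMGW: UsageTracker is a hypothetical class, it takes the filtered header dictionary from the examples above, and it treats a missing cost as 0 and a missing attempt count as 1.

```python
from collections import Counter
from decimal import Decimal
from typing import Mapping


class UsageTracker:
    """Accumulates cost, retries, and model usage from LLMGW headers."""

    def __init__(self) -> None:
        self.requests = 0
        self.retried_requests = 0
        self.total_cost_cents = Decimal(0)
        self.requests_per_model = Counter()

    def record(self, headers: Mapping[str, str]) -> None:
        self.requests += 1
        # Assumption: a response without a cost header is counted as free.
        self.total_cost_cents += Decimal(headers.get("x-llmgw-cost", "0"))
        # More than one attempt means the request was retried.
        if int(headers.get("x-llmgw-attempts", "1")) > 1:
            self.retried_requests += 1
        model_id = headers.get("x-llmgw-model-id")
        if model_id:
            self.requests_per_model[model_id] += 1


tracker = UsageTracker()
tracker.record({
    "x-llmgw-model-id": "azure-us-gpt35",
    "x-llmgw-attempts": "1",
    "x-llmgw-cost": "0.00012",
})
tracker.record({
    "x-llmgw-model-id": "azure-us-gpt35",
    "x-llmgw-attempts": "2",
    "x-llmgw-cost": "4e-05",
})
print(tracker.total_cost_cents, tracker.retried_requests)
```

In practice, record() would be called once per response with the llmgw_headers dictionary, and the totals exported to whatever monitoring system is in use.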

Important Notes

  • Headers are returned for both successful and failed requests (when possible)
  • The x-llmgw-remaining-limits header may appear multiple times if multiple limits apply
  • All cost values are in cents (USD)
  • Request IDs are unique across all requests and can be used for support inquiries
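
Because x-llmgw-remaining-limits may appear more than once, a plain dictionary of headers can hide some of the values: depending on the HTTP client, repeated headers are either merged into one comma-separated string or reduced to a single entry. The sketch below collects every limit entry from a list of (name, value) pairs. The function name collect_remaining_limits is hypothetical, and the code assumes that individual limit entries never contain a comma.

```python
from typing import Iterable

LIMITS_HEADER = "x-llmgw-remaining-limits"


def collect_remaining_limits(header_pairs: Iterable[tuple[str, str]]) -> list[str]:
    """Return every remaining-limits entry, whether repeated or comma-joined."""
    limits: list[str] = []
    for name, value in header_pairs:
        # Header names are case-insensitive in HTTP.
        if name.lower() != LIMITS_HEADER:
            continue
        # A client may already have merged repeated headers with commas.
        limits.extend(part.strip() for part in value.split(",") if part.strip())
    return limits


pairs = [
    ("x-llmgw-request-id", "3cb26481-d869-4923-8093-3feb92f8d9fc"),
    ("x-llmgw-remaining-limits", "user/daily-tokens/9500"),
    ("X-LLMGW-Remaining-Limits", "user/daily-requests/4999"),
]
print(collect_remaining_limits(pairs))
```

With the OpenAI client, raw_response.headers is an httpx Headers object, so passing raw_response.headers.multi_items() to this function should yield each occurrence separately.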