AWS Bedrock

This guide shows how to use the AWS Bedrock client with Adastra LLMGW, giving you access to models such as Anthropic's Claude.

Currently, only the InvokeModel and InvokeModelWithResponseStream Bedrock APIs are supported.

Setup

We need the boto3 package, which provides the AWS SDK for Python. Install it via:

$ pip install boto3

Set your endpoint and API key to use the AWS Bedrock client:

LLMGW_API_ENDPOINT = "https://<llmgw-deployment-url>/aws-bedrock/"
LLMGW_API_KEY = "<YOUR_LLMGW_TOKEN>"
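
To avoid hardcoding the token, you might load both values from environment variables instead (a minimal sketch; the variable names are assumptions, adjust them to your deployment):

import os

# Hypothetical environment variable names -- adapt to your setup.
LLMGW_API_ENDPOINT = os.environ["LLMGW_API_ENDPOINT"]
LLMGW_API_KEY = os.environ["LLMGW_API_KEY"]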

Next, we need to create a client for the AWS Bedrock service.

Unfortunately, there is currently no simple way to set custom headers on every boto3 Bedrock operation. To include, for example, the Authorization, project, and user headers, register custom boto3 client event handlers:

import json
import boto3
from botocore import UNSIGNED
from botocore.config import Config

# Attach the LLMGW authentication and metadata headers to the outgoing request.
def add_boto3_llmgw_headers(model, params, **kwargs):
    params.setdefault("headers", {})
    params["headers"]["Authorization"] = f"Bearer {LLMGW_API_KEY}"
    params["headers"]["llmgw-project"] = "shop-assistant"
    params["headers"]["llmgw-user"] = "alice"

client = boto3.client(
    'bedrock-runtime',
    endpoint_url=LLMGW_API_ENDPOINT,
    aws_access_key_id="",
    config=Config(
        signature_version=UNSIGNED,
        retries={"max_attempts": 2},
    ),
    region_name="eu-south-2",
)
# sets headers for `.invoke_model()` method
client.meta.events.register(
    "before-call.bedrock-runtime.InvokeModel",
    add_boto3_llmgw_headers,
)
# sets headers for `.invoke_model_with_response_stream()` method
client.meta.events.register(
    "before-call.bedrock-runtime.InvokeModelWithResponseStream",
    add_boto3_llmgw_headers,
)

Making Requests

Now, let’s make a request using the client to generate a completion with Claude.

# Prepare the request body for Claude
request_body = json.dumps({
    "anthropic_version": "bedrock-2023-05-31",  # required by Claude's Messages API on Bedrock
    "max_tokens": 150,
    "messages": [
        {
            "role": "user",
            "content": [{"type": "text", "text": "What is the best number in the world according to Sheldon from The Big Bang Theory?"}]
        }
    ]
})

response = client.invoke_model(
    modelId="anthropic.claude-sonnet-4-5-20250929-v1:0",
    body=request_body,
    contentType="application/json",
    accept="application/json",
)

# Parse the response
response_body = json.loads(response['body'].read())
print(response_body['content'][0]['text'])

In this example:

  • For the list of valid modelId values, see config.yaml and look for deployment_name; it refers to a model group configured in LLMGW.
  • The request body format follows Anthropic’s Claude API specification for Bedrock.
  • The event handler registration is required before making any requests to ensure proper authentication and metadata.
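
If the gateway or Bedrock rejects a request, boto3 raises botocore's ClientError. A minimal handling sketch (the exact error codes depend on your LLMGW deployment):

from botocore.exceptions import ClientError

try:
    response = client.invoke_model(
        modelId="anthropic.claude-sonnet-4-5-20250929-v1:0",
        body=request_body,
        contentType="application/json",
        accept="application/json",
    )
except ClientError as err:
    # Error code and message as returned by the gateway / Bedrock.
    print(err.response["Error"]["Code"], err.response["Error"]["Message"])
    raise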

Accessing Response Metadata

For more detailed information, such as the request cost and the model used, you can inspect the response metadata headers. LLMGW includes custom headers prefixed with x-llmgw.

response_metadata = response.get('ResponseMetadata', {})
headers = response_metadata.get('HTTPHeaders', {})

llmgw_headers = {key: value for key, value in headers.items() if key.startswith('x-llmgw')}
print(json.dumps(llmgw_headers, indent=2))

The output may look like this:

{
  "x-llmgw-cost": "4e-05",
  "x-llmgw-request-id": "3cb26481-d869-4923-8093-3feb92f8d9fc",
  "x-llmgw-model-id": "azure-us-gpt35",
  "x-llmgw-attempts": "2"
}

  • x-llmgw-cost - The cost of the request in cents.
  • x-llmgw-request-id - The unique id assigned to the request.
  • x-llmgw-model-id - The model id used for the request.
  • x-llmgw-attempts - The number of attempts made to get the response.
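
For example, you could accumulate the reported cost across requests (a hypothetical helper; extract_llmgw_cost is not part of LLMGW or boto3):

def extract_llmgw_cost(response):
    # Return the cost reported by LLMGW for this request, or None if absent.
    headers = response.get("ResponseMetadata", {}).get("HTTPHeaders", {})
    cost = headers.get("x-llmgw-cost")
    return float(cost) if cost is not None else None

total_cost_cents = 0.0
cost = extract_llmgw_cost(response)
if cost is not None:
    total_cost_cents += cost
print(f"Total cost so far: {total_cost_cents} cents")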

Streaming Responses

The standard approach blocks until the entire response is ready, which can take a while for longer completions. An alternative is to consume the completion in streaming mode, rendering pieces of the response as soon as they are generated, much like the ChatGPT user interface does.
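
Since LLMGW supports the InvokeModelWithResponseStream API, a minimal sketch is possible (this assumes the Anthropic Messages streaming event format; adapt the chunk parsing to your model):

response = client.invoke_model_with_response_stream(
    modelId="anthropic.claude-sonnet-4-5-20250929-v1:0",
    body=request_body,
    contentType="application/json",
    accept="application/json",
)

# The body is an event stream; each event carries a JSON-encoded chunk.
for event in response["body"]:
    chunk = json.loads(event["chunk"]["bytes"])
    # Text deltas arrive as content_block_delta events.
    if chunk.get("type") == "content_block_delta":
        print(chunk["delta"].get("text", ""), end="", flush=True)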

For full streaming functionality, we recommend using the OpenAI or Azure OpenAI clients, which have complete streaming support with LLMGW.

Please see the AWS Bedrock documentation for more information on the streaming mode and different model formats.