OpenAI

This guide shows how to use the standard OpenAI client with Adastra LLMGW. This is useful when you want to use OpenAI’s API format without Azure-specific configurations.

Setup

We need the openai package installed, which provides a Python client for the OpenAI API. Install it via:

$ pip install openai

Set your endpoint and API key to use the OpenAI client:

LLMGW_API_ENDPOINT = "https://<llmgw-deployment-url>/openai/"
LLMGW_API_KEY = "<YOUR_LLMGW_TOKEN>"
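
In practice, you may prefer to read these values from the environment rather than hard-coding them. A minimal sketch (the environment variable names are illustrative assumptions, not LLMGW requirements):

import os

# Variable names below are illustrative; use whatever your deployment defines.
LLMGW_API_ENDPOINT = os.environ["LLMGW_API_ENDPOINT"]
LLMGW_API_KEY = os.environ["LLMGW_API_KEY"]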

Next, we need to create a client pointed at the LLMGW endpoint.

import json
from openai import OpenAI


def create_openai_client():
    return OpenAI(
        base_url=LLMGW_API_ENDPOINT,
        api_key=LLMGW_API_KEY,
        default_headers={"llmgw-project": "shop-assistant", "llmgw-user": "alice"},
    )

Note that in order to use the newer APIs (e.g., the Responses API), you should append v1 to the base_url, i.e., LLMGW_API_ENDPOINT = "https://<llmgw-deployment-url>/openai/v1".
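
For example, a minimal Responses API call could look like the sketch below (this assumes your LLMGW deployment proxies the Responses API; the prompt is illustrative):

from openai import OpenAI

client = OpenAI(
    base_url="https://<llmgw-deployment-url>/openai/v1",
    api_key=LLMGW_API_KEY,
    default_headers={"llmgw-project": "shop-assistant", "llmgw-user": "alice"},
)
# The Responses API takes an `input` instead of a `messages` array.
response = client.responses.create(model="gpt-4.1", input="Say hello!")
print(response.output_text)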

The default_headers parameter allows you to associate metadata such as the project name and user with each request, which may be required based on your configuration. Check with your administrator for specific header requirements.
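
If a single request needs different metadata, the openai client also accepts an extra_headers argument on each call, which is merged with (and overrides) the client defaults. A sketch with illustrative header values:

client = create_openai_client()
completion = client.chat.completions.create(
    model="gpt-4.1",
    messages=[{"role": "user", "content": "Hi!"}],
    # Overrides the default llmgw-user header for this request only.
    extra_headers={"llmgw-user": "bob"},
)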

Making Requests

Now, let’s make a request using the client to generate a completion.

client = create_openai_client()
completion = client.chat.completions.create(
    model="gpt-4.1",
    messages=[
        {
            "role": "user",
            "content": "What is the best number in the world according to Sheldon from The Big Bang Theory?",
        },
    ],
)
print(completion.choices[0].message.content)

In this example:

  • For the list of valid values for the model parameter, see config.yaml and look for deployment_name; it refers to a model group configured in LLMGW.
  • The messages array contains the conversation, with each message having a role (e.g., “user”) and content (see the multi-turn sketch below).
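
Multi-turn conversations work the same way; here is a sketch with an illustrative system prompt and conversation history:

completion = client.chat.completions.create(
    model="gpt-4.1",
    messages=[
        # Roles follow the standard Chat Completions convention.
        {"role": "system", "content": "You are a terse assistant."},
        {"role": "user", "content": "What is 2 + 2?"},
        {"role": "assistant", "content": "4."},
        {"role": "user", "content": "And doubled?"},
    ],
)
print(completion.choices[0].message.content)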

Accessing Response Metadata

For more detail, such as the request cost and the model used, you can inspect the response metadata in the headers. LLMGW includes custom headers prefixed with x-llmgw.

raw_response = client.chat.with_raw_response.completions.create(
    model="gpt-4.1"
    messages=[
        {
            "role": "user",
            "content": "What is the best number in the world according to Sheldon from The Big Bang Theory?",
        },
    ],
)

llmgw_headers = {key: value for key, value in raw_response.headers.items() if key.startswith('x-llmgw')}
print(json.dumps(llmgw_headers, indent=2))

The output may look like this:

{
    "x-llmgw-cost": "4e-05",
    "x-llmgw-request-id": "3cb26481-d869-4923-8093-3feb92f8d9fc",
    "x-llmgw-model-id": "azure-us-gpt35",
    "x-llmgw-attempts": "2"
}

  • x-llmgw-cost - The cost of the request in cents.
  • x-llmgw-request-id - The unique id assigned to the request.
  • x-llmgw-model-id - The model id used for the request.
  • x-llmgw-attempts - The number of attempts made to get the response.
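
A small helper (a sketch, not part of LLMGW) can return both the parsed completion and the gateway metadata in one call, using the openai client's raw-response API:

def chat_with_metadata(client, **kwargs):
    """Return (completion, llmgw_headers) for a Chat Completions request."""
    raw = client.chat.completions.with_raw_response.create(**kwargs)
    headers = {k: v for k, v in raw.headers.items() if k.startswith("x-llmgw")}
    # .parse() yields the same ChatCompletion object a regular call returns.
    return raw.parse(), headers


completion, meta = chat_with_metadata(
    client,
    model="gpt-4.1",
    messages=[{"role": "user", "content": "Hi!"}],
)
print(meta)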

Streaming Responses

The standard approach blocks until the entire response is ready, which can take a while for longer outputs. An alternative is to consume the completion in streaming mode, rendering pieces of the response as soon as they are generated, as in the ChatGPT user interface. Below is sample code demonstrating this approach.

client = create_openai_client()
completion = client.chat.completions.create(
    model="gpt-4.1",
    messages=[
        {
            "role": "user",
            "content": "Please tell me a very long and boring joke.",
        },
    ],
    stream=True,
)
for chunk in completion:
    for choice in chunk.choices:
        content = choice.delta.content
        if content:
            print(content, end='')

Please see the OpenAI documentation for more information on the streaming mode.

Calling non-OpenAI models using the OpenAI client

The OpenAI client can also be used to call other models through the Chat Completions interface. Currently, only AWS Bedrock models are supported.

A simple example looks like a regular Chat Completions call, but with an AWS Bedrock Claude model id:

from openai import OpenAI

BASE_URL = "https://myserver/openai"
API_KEY = "myapikey"

client = OpenAI(
    base_url=BASE_URL,
    api_key=API_KEY,
    default_headers={
        "llmgw-project": "openai-client-aws-test",
        "llmgw-user": "openai-client-aws-test",
    },
)

MODEL = "anthropic.claude-sonnet-4-20250514-v1:0"
resp = client.chat.completions.create(
    model=MODEL,
    messages=[
        {
            "role": "user",
            "content": "Tell me a joke, please!",
        },
    ],
    max_tokens=32,
)
print(resp)

Internally, LLMGW converts the Chat Completions request to an AWS-compatible request, sends it to AWS Bedrock, and converts the response back to the Chat Completions format.
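
Because the response comes back in the Chat Completions format, you can read it exactly as in the earlier examples:

print(resp.choices[0].message.content)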