Azure OpenAI

This guide shows how to use the Azure OpenAI client with Adastra LLMGW.

Setup

We need the openai package, which provides a Python client for the Azure OpenAI API. Install it via:

$ pip install openai

Set your endpoint and API key to use the Azure OpenAI client:

LLMGW_API_ENDPOINT = "https://<llmgw-deployment-url>/azure-open-ai/"
LLMGW_API_KEY = "<YOUR_LLMGW_TOKEN>"
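
Hard-coding the values inline is shown here for brevity; in practice you may prefer to read them from environment variables. A minimal sketch (the variable names are illustrative, not mandated by LLMGW):

import os

# Environment variable names are illustrative; pick whatever fits your setup.
LLMGW_API_ENDPOINT = os.environ["LLMGW_API_ENDPOINT"]
LLMGW_API_KEY = os.environ["LLMGW_API_KEY"]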

Next, we need to create a client for the Azure OpenAI service.

import json
from openai import AzureOpenAI


def create_azure_openai_client():
    """Create an AzureOpenAI client that talks to LLMGW instead of Azure directly."""
    return AzureOpenAI(
        # https://learn.microsoft.com/azure/ai-services/openai/reference#rest-api-versioning
        api_version="2025-01-01-preview",
        azure_endpoint=LLMGW_API_ENDPOINT,
        api_key=LLMGW_API_KEY,
        # Metadata headers attached to every request made through this client.
        default_headers={"llmgw-project": "shop-assistant", "llmgw-user": "alice"},
    )

The default_headers parameter lets you attach metadata, such as the project name and user, to every request; depending on your configuration, these headers may be required. Check with your administrator for the specific header requirements.
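
If you need different metadata for a single call (for example, attributing one request to another user), the openai client also accepts extra_headers per request. A minimal sketch, assuming the same llmgw-user header name applies in your deployment:

client = create_azure_openai_client()
completion = client.chat.completions.create(
    model="gpt-4.1",
    messages=[{"role": "user", "content": "Hello!"}],
    # Per-request headers are merged with default_headers and take precedence.
    # The header name is a deployment-specific assumption; confirm it with
    # your administrator.
    extra_headers={"llmgw-user": "bob"},
)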

Making Requests

Now, let’s make a request using the client to generate a completion.

client = create_azure_openai_client()
completion = client.chat.completions.create(
    model="gpt-4.1",
    messages=[
        {
            "role": "user",
            "content": "What is the best number in the world according to Sheldon from The Big Bang Theory?",
        },
    ],
)
print(completion.choices[0].message.content)

In this example:

  • For the list of valid model values, see config.yaml and look for deployment_name; it refers to a model group configured in LLMGW. If the name does not match a configured group, the request fails (see the error-handling sketch after this list).
  • The messages array contains the user input, with each message having a role (e.g., “user”) and content.
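
Since a mistyped model name (or an unreachable gateway) surfaces as an exception, you may want to wrap the call defensively. A minimal sketch using the openai package's standard exception types; the exact status codes returned for an unknown model group depend on your LLMGW deployment:

import openai

client = create_azure_openai_client()
try:
    completion = client.chat.completions.create(
        model="gpt-4.1",
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(completion.choices[0].message.content)
except openai.APIStatusError as exc:
    # Raised for non-2xx responses, e.g. an unknown model group.
    print(f"Request failed with status {exc.status_code}: {exc.message}")
except openai.APIConnectionError as exc:
    # Raised when the gateway cannot be reached at all.
    print(f"Could not reach the gateway: {exc}")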

Accessing Response Metadata

For more detailed information, such as the request cost and the model that served the request, you can inspect the response metadata in the headers. LLMGW includes custom headers prefixed with x-llmgw.

raw_response = client.chat.with_raw_response.completions.create(
    model="gpt-4.1",
    messages=[
        {
            "role": "user",
            "content": "What is the best number in the world according to Sheldon from The Big Bang Theory?",
        },
    ],
)

llmgw_headers = {
    key: value for key, value in raw_response.headers.items() if key.startswith("x-llmgw")
}
print(json.dumps(llmgw_headers, indent=2))

The output may look like this:

{
  "x-llmgw-cost": "4e-05",
  "x-llmgw-request-id": "3cb26481-d869-4923-8093-3feb92f8d9fc",
  "x-llmgw-model-id": "azure-us-gpt35",
  "x-llmgw-attempts": "2"
}

  • x-llmgw-cost - The cost of the request in cents.
  • x-llmgw-request-id - The unique id assigned to the request.
  • x-llmgw-model-id - The id of the model that served the request.
  • x-llmgw-attempts - The number of attempts made to get the response.
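
Note that using with_raw_response does not discard the response body; the typed completion can still be recovered from the same raw response object:

# The parsed completion is still available from the raw response.
completion = raw_response.parse()
print(completion.choices[0].message.content)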

Streaming Responses

The standard approach blocks until the entire response is ready, which may take a while for longer responses. An alternative is streaming mode, which renders pieces of the response as soon as they are generated, as in the ChatGPT user interface. Below is sample code demonstrating this approach.

client = create_azure_openai_client()
completion = client.chat.completions.create(
    model="gpt-4.1",
    messages=[
        {
            "role": "user",
            "content": "Please tell me a very long and boring joke.",
        },
    ],
    stream=True,
)
for chunk in completion:
    for choice in chunk.choices:
        content = choice.delta.content
        if content:
            print(content, end='')

Please see the OpenAI documentation for more information on streaming mode.
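
The openai package also ships an AsyncAzureOpenAI client. If your application is asynchronous, a minimal streaming sketch, reusing the endpoint and key from above, might look like this:

import asyncio
from openai import AsyncAzureOpenAI


async def main():
    client = AsyncAzureOpenAI(
        api_version="2025-01-01-preview",
        azure_endpoint=LLMGW_API_ENDPOINT,
        api_key=LLMGW_API_KEY,
    )
    stream = await client.chat.completions.create(
        model="gpt-4.1",
        messages=[{"role": "user", "content": "Please tell me a very long and boring joke."}],
        stream=True,
    )
    # Chunks arrive as they are generated, just as in the synchronous example.
    async for chunk in stream:
        for choice in chunk.choices:
            if choice.delta.content:
                print(choice.delta.content, end="")


asyncio.run(main())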

Azure OpenAI v1 API

Azure currently provides two API versions: the legacy deployments API and the newer v1 API. The biggest difference is that the v1 API supports the OpenAI Responses API, while the deployments API does not.

If you wish to use the newer v1 API (see the v1 API docs), you should switch to the OpenAI client (see the OpenAI client section), as the AzureOpenAI client does not currently support it.

This also means that calling v1 endpoints (e.g., the OpenAI Responses API) via the /azure-open-ai endpoint is not supported.
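
For reference, switching to the plain OpenAI client might look like the following sketch. The base_url path below is a placeholder assumption; consult the OpenAI client section for the actual v1 endpoint exposed by your LLMGW deployment:

from openai import OpenAI

# The base_url path is a placeholder assumption; check the OpenAI client
# section of this documentation for the actual v1 endpoint.
client = OpenAI(
    base_url="https://<llmgw-deployment-url>/open-ai/v1/",
    api_key=LLMGW_API_KEY,
)

# The Responses API is only available via the v1 API.
response = client.responses.create(
    model="gpt-4.1",
    input="What is the best number in the world?",
)
print(response.output_text)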