Langchain (Azure Chat OpenAI)

This guide shows how to use Langchain’s AzureChatOpenAI client with Adastra LLMGW. Langchain provides additional abstractions and tools for building AI applications on top of the base OpenAI functionality.

Client Setup

Install the Langchain OpenAI package, which provides Langchain integrations for OpenAI models:

pip install langchain-openai

Set your endpoint to use the Azure OpenAI client through Langchain:

LLMGW_API_ENDPOINT = "https://<llmgw-deployment-url>/azure-open-ai/"
LLMGW_API_KEY = "<YOUR_LLMGW_TOKEN>"

Note: LLMGW_API_ENDPOINT is the same endpoint you would use with the plain Azure OpenAI client.
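
In practice, you might load these values from the environment rather than hard-coding the token. A minimal sketch; the variable names are illustrative:

import os

# Fall back to the documented endpoint; fail fast if the token is not set.
LLMGW_API_ENDPOINT = os.environ.get(
    "LLMGW_API_ENDPOINT", "https://<llmgw-deployment-url>/azure-open-ai/"
)
LLMGW_API_KEY = os.environ["LLMGW_API_KEY"]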

Next, create the Langchain AzureChatOpenAI model:

import json
from langchain_openai import AzureChatOpenAI

model = AzureChatOpenAI(
    azure_deployment="gpt-4.1",            # model group configured in LLMGW (see below)
    api_version="2025-01-01-preview",
    azure_endpoint=LLMGW_API_ENDPOINT,
    api_key=LLMGW_API_KEY,
    model_kwargs={
        # Custom metadata headers forwarded to LLMGW with every request.
        "extra_headers": {"llmgw-project": "shop-assistant", "llmgw-user": "alice"},
    },
    include_response_headers=True,         # expose response headers in response_metadata
)

The extra_headers entry in model_kwargs lets you attach metadata such as the project name and user to each request; depending on your configuration, some of these headers may be required. Check with your administrator for the specific header requirements.
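
Depending on your langchain_openai version, you may be able to pass the same headers through the client-level default_headers parameter instead of model_kwargs. A sketch, not verified against every release:

model = AzureChatOpenAI(
    azure_deployment="gpt-4.1",
    api_version="2025-01-01-preview",
    azure_endpoint=LLMGW_API_ENDPOINT,
    api_key=LLMGW_API_KEY,
    # default_headers is handed to the underlying OpenAI client and sent with every request.
    default_headers={"llmgw-project": "shop-assistant", "llmgw-user": "alice"},
    include_response_headers=True,
)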

Making Requests

Now let’s make a request using Langchain’s simplified interface:

question = "What is the best number in the world according to Sheldon from The Big Bang Theory?"
response = model.invoke(question)
print(response.content)

In this example:

  • The azure_deployment parameter should match one of the deployment_name values in your LLMGW config.yaml; it refers to a model group configured in LLMGW.
  • Langchain’s invoke() method provides a simplified interface compared to the raw OpenAI client. It also accepts a list of message objects, as shown below.
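
Beyond a plain string, invoke() accepts a list of Langchain message objects, which is useful for adding a system prompt. A small example; the prompt text is only an illustration:

from langchain_core.messages import HumanMessage, SystemMessage

messages = [
    SystemMessage(content="You answer in one short sentence."),
    HumanMessage(content="What is the best number in the world?"),
]
response = model.invoke(messages)
print(response.content)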

Accessing Response Metadata

For more detail, such as the request cost and the model that served it, you can inspect the response metadata. Because we set include_response_headers=True, LLMGW’s custom headers, prefixed with x-llmgw, are included there.

headers = response.response_metadata.get('headers', {})
llmgw_headers = {key: value for key, value in headers.items() if key.startswith('x-llmgw')}
print(json.dumps(llmgw_headers, indent=2))

The output may look like this:

{
  "x-llmgw-cost": "4e-05",
  "x-llmgw-request-id": "3cb26481-d869-4923-8093-3feb92f8d9fc",
  "x-llmgw-model-id": "azure-us-gpt35",
  "x-llmgw-attempts": "2"
}

  • x-llmgw-cost - The cost of the request in cents.
  • x-llmgw-request-id - The unique id assigned to the request.
  • x-llmgw-model-id - The id of the model that served the request.
  • x-llmgw-attempts - The number of attempts made to get the response.
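
If you track spend, the cost header can be parsed and accumulated across requests. A minimal sketch; the helper is ours, and the unit is whatever your gateway reports:

total_cost = 0.0

def request_cost(response):
    # Pull the LLMGW cost header out of the response metadata (illustrative helper).
    headers = response.response_metadata.get('headers', {})
    return float(headers.get('x-llmgw-cost', 0.0))

total_cost += request_cost(response)
print(f"Total cost so far: {total_cost}")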

Streaming Responses

Langchain also supports streaming responses for real-time output. You can use the stream() method to get response chunks as they are generated.

question = "Please tell me a very long and boring joke."
for chunk in model.stream(question):
    content = chunk.content
    if content:
        print(content, end='', flush=True)  # flush so tokens appear immediately

This provides the same streaming experience as the raw OpenAI client but through Langchain’s interface.
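
In async applications, the model also exposes astream(), the asynchronous counterpart to stream(). A short sketch:

import asyncio

async def main():
    # Chunks arrive as they are generated, without blocking the event loop.
    async for chunk in model.astream("Please tell me a very long and boring joke."):
        if chunk.content:
            print(chunk.content, end='', flush=True)

asyncio.run(main())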

Advanced Langchain Features

With Langchain, you can leverage additional features like:

  • Prompt templates for consistent message formatting,
  • Chains for complex multi-step operations (a minimal sketch combining these two follows the list),
  • Agents for autonomous task execution,
  • Memory for conversation persistence.
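
As a taste of the first two, a prompt template can be piped into the model using Langchain’s LCEL syntax. A minimal sketch; the prompt content is illustrative:

from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful shop assistant."),
    ("human", "{question}"),
])

# The pipe operator composes prompt and model into a single runnable chain.
chain = prompt | model
response = chain.invoke({"question": "Which of your products is the cheapest?"})
print(response.content)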

For more information on these advanced features, see the Langchain documentation.