OpenAI
This guide shows how to use the standard OpenAI client with Adastra LLMGW. This is useful when you want to use OpenAI’s API format without Azure-specific configurations.
Setup
We need the `openai` package installed, which provides a Python client for the OpenAI API. Install it via:
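```bash
pip install openai
```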
Set your endpoint and API key to use the OpenAI client:
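A minimal sketch, assuming the API key is available as an environment variable and the endpoint follows the pattern shown in the note below:

```python
import os

# Placeholder endpoint; substitute your actual LLMGW deployment URL.
LLMGW_API_ENDPOINT = "https://<llmgw-deployment-url>/openai"
# Assumed to be set in the environment; avoid hard-coding secrets.
LLMGW_API_KEY = os.environ["LLMGW_API_KEY"]
```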
Next, we need to create a client for the LLMGW service.
Note that in order to use the newer APIs (e.g., the Responses API), you should append `v1` to the `base_url`, i.e., `LLMGW_API_ENDPOINT = "https://<llmgw-deployment-url>/openai/v1"`.
The `default_headers` parameter allows you to associate metadata such as the project name and user with each request, which may be required based on your configuration. Check with your administrator for specific header requirements.
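A minimal sketch of the client setup; the metadata header names below are hypothetical, since the required headers depend on your deployment:

```python
from openai import OpenAI

client = OpenAI(
    base_url=LLMGW_API_ENDPOINT,
    api_key=LLMGW_API_KEY,
    # Hypothetical header names; confirm the exact ones with your administrator.
    default_headers={
        "x-project": "my-project",
        "x-user": "jane.doe@example.com",
    },
)
```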
Making Requests
Now, let’s make a request using the client to generate a completion.
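A minimal sketch, assuming a deployment named "gpt-4o" is configured in LLMGW:

```python
# "gpt-4o" is a hypothetical deployment name; use one from your config.yaml.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "Write a haiku about API gateways."},
    ],
)
print(response.choices[0].message.content)
```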
In this example:
- For the list of valid `model` parameters, see `config.yaml` and look for `deployment_name`. It refers to a model group configured in LLMGW.
- The `messages` array contains the user input, with each message having a role (e.g., “user”) and content.
Accessing Response Metadata
For more detailed information, such as the request cost and the model used, you can inspect the response metadata in the headers, as the sketch below shows. LLMGW includes custom headers prefixed with `x-llmgw`.
LLMGW sets the following headers:
- `x-llmgw-cost` - The cost of the request in cents.
- `x-llmgw-request-id` - The request id used for the request.
- `x-llmgw-model-id` - The model id used for the request.
- `x-llmgw-attempts` - The number of attempts made to get the response.
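A minimal sketch reading these headers via the SDK's raw-response wrapper (the deployment name is again hypothetical):

```python
# with_raw_response exposes the underlying HTTP response, including headers.
raw = client.chat.completions.with_raw_response.create(
    model="gpt-4o",  # hypothetical deployment name
    messages=[{"role": "user", "content": "Hello!"}],
)
for name, value in raw.headers.items():
    if name.startswith("x-llmgw"):
        print(f"{name}: {value}")

# .parse() still returns the usual ChatCompletion object.
completion = raw.parse()
```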
Streaming Responses
The standard approach blocks until the entire response is ready, which may take time for longer responses. An alternative is to consume the completion in streaming mode, rendering pieces of the response as soon as they are generated, as in the ChatGPT user interface, for example. Below is a sample demonstrating this approach (the deployment name is a placeholder):
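```python
# Stream the completion and print each piece as it arrives.
stream = client.chat.completions.create(
    model="gpt-4o",  # hypothetical deployment name
    messages=[{"role": "user", "content": "Tell me a short story."}],
    stream=True,
)
for chunk in stream:
    # Each chunk carries a delta holding the next piece of the response.
    if chunk.choices and chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```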
Please see the OpenAI documentation for more information on streaming mode.
Calling Non-OpenAI Models Using the OpenAI Client
The OpenAI client can also be used to call other models through the Chat Completions interface. Currently, only AWS Bedrock models are supported.
A simple example looks like a regular Chat Completions call, but with an AWS Bedrock Claude model id (the id below is illustrative; use a model group configured in your LLMGW):
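```python
# The Bedrock model id is illustrative; match it to your LLMGW configuration.
response = client.chat.completions.create(
    model="anthropic.claude-3-5-sonnet-20240620-v1:0",
    messages=[{"role": "user", "content": "Hello, Claude!"}],
)
print(response.choices[0].message.content)
```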
Internally, LLMGW converts the Chat Completions request into an AWS-compatible request, retrieves the response, and converts it back to the Chat Completions format.