# AI Coding Assistants
This section covers how to set up and configure AI coding assistants to work with LLM Gateway.
## Approved Assistants
The following AI coding assistants have been tested and approved for use with LLMGW:
| Assistant | Description | Guide |
|---|---|---|
| Claude Code | Anthropic’s official CLI for Claude | Setup Guide |
| Claude Code VS Code Extension | Claude Code extension for VS Code (Beta) | Setup Guide |
| Cline | VS Code extension for AI-assisted coding | Setup Guide |
## Prerequisites
Before setting up any AI coding assistant, ensure you have:
- **LLMGW API Token** - Request your personal user-based token from your PM or administrator. Note that user-based tokens are different from API keys in LLMGW.
- **LLMGW Endpoint** - The base URL depends on your assistant:
  - Claude Code (Bedrock): `https://<llmgw-deployment-url>/aws-bedrock`
  - Cline (OpenAI-compatible): `https://<llmgw-deployment-url>/openai`
- **Project Assignment** - Your token must be associated with a project that has appropriate spend limits configured.
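As a sketch of how these prerequisites come together for Claude Code, the environment variables below follow Claude Code's documented Bedrock configuration; verify the exact names against your installed version, and treat the token and URL values as placeholders. (Cline is configured through its VS Code settings UI rather than environment variables.)

```shell
# Claude Code via the LLMGW Bedrock endpoint (variable names per Claude
# Code's Bedrock documentation; check them against your installed version)
export CLAUDE_CODE_USE_BEDROCK=1
export ANTHROPIC_BEDROCK_BASE_URL="https://<llmgw-deployment-url>/aws-bedrock"
# LLMGW authenticates the request itself, so AWS credential signing is skipped
export CLAUDE_CODE_SKIP_BEDROCK_AUTH=1
export ANTHROPIC_AUTH_TOKEN="<your-llmgw-user-token>"
```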
## How It Works
AI coding assistants connect to LLMGW using the OpenAI-compatible API endpoint (`/openai`) or the AWS Bedrock endpoint (`/aws-bedrock`). This allows assistants that support these API formats to work seamlessly with LLMGW.
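For instance, a raw request to the OpenAI-compatible endpoint looks like any other OpenAI chat-completions call. The `/v1/chat/completions` path suffix and the `gpt-4o` model name below follow the OpenAI convention and are assumptions; substitute a model group your project actually has access to.

```shell
# Minimal chat-completions request through LLMGW's OpenAI-compatible endpoint
# (URL, token, and model name are placeholders)
curl -s "https://<llmgw-deployment-url>/openai/v1/chat/completions" \
  -H "Authorization: Bearer <your-llmgw-user-token>" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "gpt-4o",
        "messages": [{"role": "user", "content": "Hello"}]
      }'
```

If this call succeeds, any assistant that speaks the OpenAI API format should work with the same base URL and token.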
### Endpoint Model Access
Different endpoints provide access to different model providers:
- `/openai` - OpenAI-compatible endpoint providing access to:
  - Azure OpenAI models (GPT-4.1 Turbo, GPT-4o, GPT-5 Nano, etc.)
  - Azure DeepSeek models
- `/aws-bedrock` - AWS Bedrock endpoint providing access to:
  - Anthropic Claude models (Claude Sonnet 4.5, Claude Sonnet 4, Claude Haiku 4.5, etc.)
  - Other AWS Bedrock-supported models
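To check which models your token can reach on the OpenAI-compatible endpoint, the standard OpenAI model-listing route can be queried directly. The `/v1/models` path is the OpenAI convention, not confirmed LLMGW behavior; URL and token are placeholders.

```shell
# List models visible to your token (path per the OpenAI API convention)
curl -s "https://<llmgw-deployment-url>/openai/v1/models" \
  -H "Authorization: Bearer <your-llmgw-user-token>"
```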
## What LLMGW Provides
LLMGW handles:
- **Authentication** - Validates your API token
- **Spend Tracking** - Monitors usage against your budget limits
- **Load Balancing** - Routes requests to optimal model instances
- **Model Selection** - Allows you to use different models via model groups
## Monitoring Your Usage
For details on how to check your current spend and remaining budget limits from the command line, see the internal MS Teams channel “Access token-API requests”.
## Available Models
For details on available models, their pricing, and recommended configurations, see the internal MS Teams channel “Access token-API requests”.