AI as a Service (AIaaS)

AI Coding Assistants Integration

Introduction

The CERIT-SC AI infrastructure exposes Large Language Models (LLMs) through standard API protocols, enabling you to integrate powerful AI assistance directly into your local development environment. By connecting your tools to our backend, you can leverage high-performance models (such as qwen3-coder or gpt-oss-120b) for coding tasks without running them on your own hardware or relying on external commercial providers.

This guide explains how to configure several popular tools to communicate with our API.

Prerequisite: Before proceeding, ensure you have generated an API key from the AI Chat WebUI. You will need this key to authenticate your client.
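All of the tools below authenticate with this same key, so it can be convenient to keep it in one shell variable and reference it when configuring each client. A minimal sketch (the variable name `EINFRA_API_KEY` is only a local convention, not something any of the tools require):

```shell
# Keep the key in one place; replace sk-... with your actual key
EINFRA_API_KEY="sk-..."
export EINFRA_API_KEY

# Confirm it is set without echoing the full secret
echo "Key starts with: $(printf '%s' "$EINFRA_API_KEY" | cut -c1-3)"
```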

Claude Code

Claude Code is also integrated into Jupyter Notebook.

Claude Code can be deployed and configured to work with our models by pointing it to our API endpoint.

Installation

Install Claude Code for your operating system by following the official instructions in the upstream repository.

Make sure the claude CLI is available in your $PATH after installation.

Linux Installation (including Windows WSL)

These instructions apply to both native Linux and Windows Subsystem for Linux (WSL).

Install Claude Code

Use the official installation script to install Claude Code:

curl -fsSL https://claude.ai/install.sh | bash

After installation completes successfully, you should see output similar to the following:

Setting up Claude Code...

✔ Claude Code successfully installed!

  Version: 2.1.5
  Location: ~/.local/bin/claude

  Next: Run claude --help to get started

✅ Installation complete!
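If the shell cannot find the binary afterwards, check whether the install location is on your `PATH`. A quick sketch (the `~/.local/bin` location matches the installer output above):

```shell
# Check whether the claude binary is reachable from the current shell
if command -v claude > /dev/null 2>&1; then
  claude --version
else
  echo 'claude not found; try: export PATH="$HOME/.local/bin:$PATH"'
fi
```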
Start Claude and Exit During Onboarding

Run Claude for the first time:

claude
  • Proceed through the syntax scheme selection.
  • When you reach the "Select login method" screen, exit the application by pressing Ctrl+C three times.

This step generates the initial configuration file without completing onboarding.

Manually Complete Onboarding

Open the Claude configuration file:

vim ~/.claude.json

At the end of the file, add the following property:

"hasCompletedOnboarding": true
  • Ensure the property that was previously last now ends with a comma.
  • The JSON must remain valid.

Example of a correctly updated ~/.claude.json file:

{
  "installMethod": "native",
  "autoUpdates": false,
  "cachedGrowthBookFeatures": {
    "tengu_1p_event_batch_config": {
      "scheduledDelayMillis": 5000,
      "maxExportBatchSize": 200,
      "maxQueueSize": 8192
    },
    "tengu_mcp_tool_search": false,
    "tengu_scratch": false,
    "tengu_log_segment_events": false,
    "tengu_log_datadog_events": true,
    "tengu_event_sampling_config": {},
    "tengu_tool_pear": false,
    "tengu_thinkback": false,
    "tengu_sumi": false
  },
  "userID": "xxx",
  "firstStartTime": "2026-01-12T12:59:53.117Z",
  "sonnet45MigrationComplete": true,
  "opus45MigrationComplete": true,
  "thinkingMigrationComplete": true,
  "changelogLastFetched": 1768222793309,
  "autoUpdatesProtectedForNative": true,
  "hasCompletedOnboarding": true
}

Save the file and exit the editor.
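A hand-edited JSON file is easy to break. Before restarting Claude, you can validate it with a quick check (a sketch; assumes `python3` is installed):

```shell
# Validate the edited config; json.tool exits non-zero on a syntax error
if python3 -m json.tool "$HOME/.claude.json" > /dev/null 2>&1; then
  echo "config JSON is valid"
else
  echo "config JSON is invalid or missing"
fi
```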

Run Claude Normally

Start Claude again:

claude

Claude should now launch without triggering the onboarding flow.

Configuration

Claude Code is configured using environment variables. Export the following variables in your shell:

export ANTHROPIC_BASE_URL="https://llm.ai.e-infra.cz/"
export ANTHROPIC_AUTH_TOKEN="sk-..."
export ANTHROPIC_MODEL="glm-4.7"
export ANTHROPIC_DEFAULT_OPUS_MODEL="glm-4.7"
export ANTHROPIC_DEFAULT_SONNET_MODEL="glm-4.7"
export ANTHROPIC_DEFAULT_HAIKU_MODEL="gpt-oss-120b"
export CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1

Alternatively, you can define these environment variables in the settings file ~/.claude/settings.json:

settings.json
{
  "permissions": {
    "defaultMode": "acceptEdits"
  },
  "env": {
    "ANTHROPIC_BASE_URL": "https://llm.ai.e-infra.cz/",
    "ANTHROPIC_AUTH_TOKEN": "sk-...",
    "ANTHROPIC_MODEL": "glm-4.7",
    "ANTHROPIC_DEFAULT_OPUS_MODEL": "glm-4.7",
    "ANTHROPIC_DEFAULT_SONNET_MODEL": "glm-4.7",
    "ANTHROPIC_DEFAULT_HAIKU_MODEL": "gpt-oss-120b",
    "CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC": "1"
  }
}

Variable description:

  • ANTHROPIC_BASE_URL – Base URL of our LLM API.
  • ANTHROPIC_AUTH_TOKEN – Your API key obtained from https://chat.ai.e-infra.cz.
  • ANTHROPIC_MODEL – Default model to use when running Claude Code.
  • ANTHROPIC_DEFAULT_OPUS_MODEL – Default model to use when running Claude Code for reasoning and complex tasks.
  • ANTHROPIC_DEFAULT_SONNET_MODEL – Default model to use when running Claude Code for reasoning and moderately complex tasks.
  • ANTHROPIC_DEFAULT_HAIKU_MODEL – Default model to use for simple tasks.
  • CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC – Disables telemetry and other nonessential reporting traffic, which serves no purpose when connecting to a non-Anthropic backend.
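Exported variables only last for the current shell session. To persist them across sessions, append the exports to your shell profile (a sketch assuming bash and `~/.bashrc`; adjust the file for zsh or other shells, and replace sk-... with your key):

```shell
# Persist the configuration for future shells (bash assumed)
cat >> ~/.bashrc <<'EOF'
export ANTHROPIC_BASE_URL="https://llm.ai.e-infra.cz/"
export ANTHROPIC_AUTH_TOKEN="sk-..."
export ANTHROPIC_MODEL="glm-4.7"
EOF

# Load them into the current session
. ~/.bashrc
```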

Running Claude Code

Once the environment variables are set, start Claude Code with:

claude [project-dir]

You should now be able to interact with Claude Code using our backend and selected model.

You can choose any of our available models (e.g., qwen3-coder). However, not all models are guaranteed to work correctly with Claude Code.

If Claude Code stops responding or terminates unexpectedly, the most common cause is that the model’s context size has been exceeded. To resolve this, switch to a different model with a larger context window or reduce the amount of text being processed at once.


Codex

Codex can be deployed and configured to work with our models by pointing it to our API endpoint.

Installation

Install Codex for your operating system by following the official instructions in the upstream repository.

Ensure the codex CLI is available in your $PATH after installation.

For Linux or Windows, it is recommended to visit the Releases page and download the appropriate precompiled binary for your platform.

Configuration

Set these environment variables:

export OPENAI_BASE_URL=https://llm.ai.e-infra.cz
export OPENAI_API_KEY=sk-...

Replace sk-... with your API key obtained from: https://chat.ai.e-infra.cz
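Before launching, you can quickly confirm that both variables are visible to the shell. A small sketch (this check is not part of Codex itself):

```shell
# Fail-safe check that the Codex environment is configured
if [ -n "${OPENAI_BASE_URL:-}" ] && [ -n "${OPENAI_API_KEY:-}" ]; then
  echo "Codex environment is configured"
else
  echo "OPENAI_BASE_URL and/or OPENAI_API_KEY are not set"
fi
```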

Run the application with the following command:

codex --model qwen3-coder --full-auto

When prompted to sign in:

  1. Select Provide your own API key.
  2. Confirm Use your own OpenAI API key for usage-based billing.

After this, the setup is complete and ready to use.


Open Code

Open Code can be deployed and configured to work with our models by pointing it to our API endpoint.

Installation

Install Open Code for your operating system by following the official instructions in the upstream repository.

Ensure the opencode CLI is available in your $PATH after installation.

Configuration

  1. Save the configuration below to the file ~/.config/opencode/opencode.json:
~/.config/opencode/opencode.json
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "litellm": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "LiteLLM",
      "options": {
        "baseURL": "https://llm.ai.e-infra.cz/v1"
      },
      "models": {
        "deepseek-r1": {
          "name": "deepseek-r1"
        },
        "gpt-oss-120b": {
          "name": "gpt-oss-120b"
        },
        "qwen3-coder": {
          "name": "qwen3-coder"
        },
        "glm-4.7": {
          "name": "glm-4.7"
        }
      }
    }
  }
}
  2. Start the application by running:

    opencode
  3. Inside opencode, type:

    /connect
  4. From the list of available providers, scroll to the end and select LiteLLM.

  5. When prompted, paste your API key obtained from: https://chat.ai.e-infra.cz

  6. Select the model:

    glm-4.7

Once completed, the setup is ready to use.

  • The model gpt-oss-120b has partial support and may not work as expected.
  • The model qwen3-coder should work as reliably as glm-4.7.
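To double-check which models your local configuration exposes, the file can be inspected with a short one-liner (a sketch; assumes `python3` and the config path used above):

```shell
# Print the model names declared in the opencode provider config
python3 -c '
import json, os
path = os.path.expanduser("~/.config/opencode/opencode.json")
try:
    cfg = json.load(open(path))
    print("\n".join(cfg["provider"]["litellm"]["models"]))
except (OSError, KeyError, ValueError):
    print("config not found or invalid")
'
```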

Visual Studio Code

Integrate AI chatbots with Visual Studio Code using third-party extensions such as Continue or Roo Code. These extensions enable AI assistance in various roles while coding, including simple chat, agent mode, and autocomplete. Chat provides a familiar conversational interface, agent mode analyzes or edits files in your project, and autocomplete suggests code as you write.

Install the Continue extension

With Visual Studio Code running:

  1. Open the Extensions tab (Ctrl+Shift+X) and search for Continue
  2. Click the extension and then click Install
  3. After installation, access the Continue extension by clicking the Continue icon in the left sidebar

Configure the Continue extension

Configure the Continue extension by editing the config.yaml file:

  1. Access the Continue extension within Visual Studio Code using the icon in the left sidebar

  2. Click Open settings in the top-right corner of the Continue extension window.

  3. Click Configs.

  4. Click the Open configuration icon at the end of the line labeled Local Config.

  5. Once the config.yaml is opened, use the following configuration with your own <api-key> (generated as described in the prerequisite at the top of this guide):

%YAML 1.1
---
name: Local Assistant
version: 1.0.0
schema: v1
model_defaults: &model_defaults
  provider: openai
  apiKey: <api-key>
  apiBase: https://llm.ai.e-infra.cz/v1
models:
  - name: autocomplete-coder
    <<: *model_defaults
    model: qwen3-coder
    promptTemplates:
      autocomplete: '<|fim_prefix|>{{{ prefix }}}<|fim_suffix|>{{{ suffix }}}<|fim_middle|>'
    autocompleteOptions:
      transform: false
    defaultCompletionOptions:
      temperature: 0.6
      maxTokens: 512
    roles:
      - autocomplete
  - name: chat-coder
    <<: *model_defaults
    model: qwen3-coder
    env:
      useLegacyCompletionsEndpoint: false
    roles:
      - chat
      - edit
context:
  - provider: code
  - provider: docs
  - provider: diff
  - provider: terminal
  - provider: problems
  - provider: folder
  - provider: codebase
  6. Save the file. The new configuration should apply immediately.
  7. Verify that FIM autocomplete is being used by checking the Continue button in the Visual Studio Code status bar (bottom-right corner). It should display Continue, not Continue (NE). If Continue (NE) is shown, click this button and select Use FIM autocomplete over Next Edit.

Usage of AI in Visual Studio Code

  • Chat: Access chat by clicking the Continue icon in the left sidebar
  • Agent mode: In chat, engage agent mode by asking to analyze, explain, or edit files in your current project. This requires additional permissions such as read/write access to related files; you must grant these permissions for the agent to perform requested actions
  • Autocomplete: The autocomplete feature continuously suggests new code as you write. Press Tab to accept suggestions. Responsiveness depends on model speed. You can change the model from qwen3-coder to gpt-oss-120b in the autocomplete configuration section for faster responses, though the default model generally performs better on coding tasks

Roo Code

Roo Code is a Visual Studio Code extension offering AI assistance with enhanced agent and code understanding capabilities.

Install the Roo Code extension

With Visual Studio Code running:

  1. Open the Extensions tab (Ctrl+Shift+X)
  2. Search for Roo Code
  3. Select the extension and click Install
  4. After installation, access Roo Code via its icon in the left sidebar

Configure the Roo Code extension

To use self-hosted models:

  1. Open Roo Code using the sidebar icon
  2. Click the ⚙️ Settings icon (top-right corner)
  3. In the Providers section:
    • API Provider: OpenAI Compatible
    • Base URL: https://llm.ai.e-infra.cz/v1
    • API Key: Your key from AI Chat WebUI
    • Model: qwen3-coder (or alternative)
  4. Set Context Window Size using this table
  5. Save settings

Using Roo Code

  • Select agents via the button in the extension’s bottom-left corner
  • After submitting a prompt, agents work autonomously until intervention is needed
  • For advanced usage, consult the Roo Code documentation

Codebase Indexing

Index the codebase using one of our embedding models to help agents better search and understand your codebase. To store the index, set up a local Qdrant database. Here is an example docker-compose.yaml:

services:
  qdrant:
    image: qdrant/qdrant
    ports:
      - "6333:6333"
    volumes:
      - qdrant_storage:/qdrant/storage
volumes:
  qdrant_storage:

Configure codebase indexing by following these steps:

  1. Launch a new database instance by running docker compose up -d in the directory containing the docker-compose.yaml file
  2. In the Roo Code extension interface, click on the database icon in the bottom right corner.
  3. Open the Setup section.
  4. Set the following values:
    • Embedder Provider: OpenAI Compatible
    • Base URL: https://llm.ai.e-infra.cz/v1
    • API Key: Your API key from AI Chat WebUI
    • Model: One of our embedding models, e.g., qwen3-embedding-4b
    • Model Dimension: The embedding vector size of the selected model, e.g., 2560 for qwen3-embedding-4b.
    • Qdrant URL: http://localhost:6333
    • Qdrant API Key: Leave empty if you used the provided Docker Compose configuration.
  5. Create the index by clicking on Start Indexing.
  6. After the indexing is finished, the agents will have access to the database and be able to better search and understand the codebase.
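If indexing fails to start, first confirm the Qdrant container is actually reachable. A sketch of a quick readiness check (the root endpoint returns version information once the service is up):

```shell
# Check whether Qdrant answers on the port mapped in docker-compose.yaml
if curl -s --max-time 5 http://localhost:6333/ > /dev/null 2>&1; then
  echo "Qdrant is reachable"
else
  echo "Qdrant is not reachable; is the container running?"
fi
```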

Caveats

Disable all other Visual Studio Code extensions that provide AI autocomplete features. Otherwise, the Continue extension may not work properly.

