Introduction
About AI as a Service
CERIT-SC, a core component of e-INFRA CZ, is developing an on-premise AI platform that provides researchers with secure, high-performance, and interoperable AI tools. The platform runs on cutting-edge NVIDIA DGX H100/B200/B300-class systems, delivering the compute required for both efficient large-scale model training and high-speed inference. It hosts a curated selection of open large language and generative models, easily accessible via the Open WebUI interface or standard OpenAI-compatible APIs.
Difference between AI Training and Inference

Training is learning; inference is applying that knowledge.
| Concept | What It Does | Data Flow | Our Context |
|---|---|---|---|
| AI Training | The “Education” phase: the model is fed massive datasets, learns patterns, adjusts its weights, and minimizes error through backpropagation. | Forward & backward passes (backpropagation). | Done by the teams who created the open-source trained models we host. |
| AI Inference | The “Application” phase: the trained model takes new, unseen data and applies its fixed knowledge to make real-time predictions or decisions. | Forward pass only. | Our job: providing the models and the applications built on top of them. |
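The distinction above can be illustrated with a toy one-weight model (a didactic sketch only, unrelated to the platform's hosted models): training runs forward and backward passes to adjust the weight, while inference is a forward pass alone.

```python
# Toy illustration of training (forward + backward) vs. inference (forward only).
# Purely didactic; real LLM training works on billions of weights the same way in principle.

def forward(w: float, x: float) -> float:
    """Forward pass: apply the model's current weight."""
    return w * x

def train(data: list[tuple[float, float]], epochs: int = 100, lr: float = 0.01) -> float:
    """Training: repeat forward and backward passes to minimize squared error."""
    w = 0.0
    for _ in range(epochs):
        for x, y in data:
            pred = forward(w, x)        # forward pass
            grad = 2 * (pred - y) * x   # backward pass: gradient of (pred - y)^2
            w -= lr * grad              # weight update
    return w

# "Education" phase: learn the mapping y = 3x from examples.
w = train([(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)])

# "Application" phase: inference on new, unseen input is a forward pass only.
print(round(forward(w, 4.0), 2))  # ~12.0
```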
Key Features (Inference)
- Secure, on‑premise LLM & generative‑AI platform – Our models run securely on the e-INFRA CZ infrastructure. Queries and responses are not logged by external providers, ensuring that your research and sensitive data remain within our environment. With the exception of internet searching, nothing leaves our local infrastructure.
- Supports privacy‑sensitive research – complies with institutional and legal requirements, which makes our services well suited for handling sensitive data.
What the Platform Provides
| Category | Highlights |
|---|---|
| Compute | NVIDIA DGX‑H100 (Hopper) and DGX‑B200 (Blackwell). Petaflop‑class GPU performance. |
| Key Models | Full-sized DeepSeek R1 0528 (685B): high-performance reasoning model, excellent for complex tasks and the best for Czech language processing. Operates without internet access for high security. GPT-OSS-120B: general reasoning model, comparable to OpenAI's smaller (“mini”) models. Qwen3 Coder 480B Instruct: specialized instruction model optimized for programming and code generation. Additional community models can be added on request. |
| Access | MUNI students and employees, and MetaCentrum users |
| External usage | OpenAI‑compatible REST API – use existing OpenAI client code, LangChain, or other integrations without modification |
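Because the endpoint speaks the standard OpenAI protocol, a chat completion request can be sketched with the Python standard library alone. The base URL, model identifier, and token below are placeholders, not the platform's actual values; consult the API documentation for the real endpoint.

```python
import json
import urllib.request

# Placeholder values -- substitute the real endpoint, model name, and
# API token from the platform's API documentation.
BASE_URL = "https://api.example.e-infra.cz/v1"  # hypothetical endpoint
API_KEY = "YOUR_API_TOKEN"
MODEL = "deepseek-r1"                           # hypothetical model identifier

def build_chat_request(prompt: str) -> urllib.request.Request:
    """Build a standard OpenAI-compatible /chat/completions request."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request("Summarize the difference between training and inference.")
# Uncomment to actually send the request:
# with urllib.request.urlopen(req) as response:
#     print(json.load(response)["choices"][0]["message"]["content"])
```

Equivalently, the official `openai` Python client or LangChain can be pointed at the service simply by overriding the client's base URL, since the wire format is identical.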
Key AI Services (Inference)
| Service | Link | Description |
|---|---|---|
| Chat (Open-WebUI) | WebUI chat | A full-featured conversational interface (similar to ChatGPT) offering advanced features and explicit language model selection. Text work: translations, summaries, analysis, and generation of program code. Multimodality: image generation (including editing) and content recognition in images (e.g., extracting a serial number from a photo). Tools: searching the internet, GitHub, and arXiv, plus a Python sandbox for running code and performing data analysis in the browser. RAG (knowledge): searching within documents attached to the chat works very well. |
| Call the API | Using AI models – API docs | Use the OpenAI‑compatible endpoint to integrate AI into your scripts, pipelines, or services. |
| DeepSite | DeepSite (vibe‑coding) | A generative tool that creates webpages and applications (HTML/CSS/JS) based on a simple text description. Excellent for design proposals, mockups, or quick web concepts. |
| AI in Jupyter Notebooks | JupyterHub integration | Integration of an AI Assistant directly into the Jupyter Lab environment (Notebook Intelligence). Used for fixing code, generating new snippets, and conversational assistance within your coding projects. |
| Documentation ChatBot | docs.e-infra.cz and other documentation sites | A Retrieval-Augmented Generation (RAG) system implemented across e-INFRA CZ documentation. Answers specific questions and acts as a problem solver based on our knowledge base (e.g., “How to run MATLAB?”). |
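The chatbot's Retrieval-Augmented Generation pattern can be sketched in a few lines. The snippets and keyword-overlap scoring below are illustrative assumptions; a production system would use embedding-based retrieval over the real documentation corpus.

```python
import re

# Illustrative stand-in for an indexed documentation corpus.
DOCS = {
    "matlab": "MATLAB is available as a licensed module; load it before running jobs.",
    "jupyter": "JupyterHub offers notebook environments with GPU support.",
    "storage": "Project data should be stored on the shared filesystem.",
}

def retrieve(question: str, k: int = 1) -> list[str]:
    """Rank snippets by naive keyword overlap with the question (retrieval step)."""
    words = set(re.findall(r"\w+", question.lower()))
    scored = sorted(
        DOCS.items(),
        key=lambda kv: len(words & set(re.findall(r"\w+", (kv[0] + " " + kv[1]).lower()))),
        reverse=True,
    )
    return [text for _, text in scored[:k]]

def build_prompt(question: str) -> str:
    """Augment the user question with retrieved context before calling the LLM."""
    context = "\n".join(retrieve(question))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

print(build_prompt("How to run MATLAB?"))
```

The LLM then answers from the retrieved context rather than from its frozen training data, which is what keeps the chatbot's answers grounded in the current documentation.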
Reference
Read more details on our e-INFRA Blog at https://blog.e-infra.cz/
Image used from https://blogs.nvidia.com/blog/difference-deep-learning-training-inference-ai/