Introduction
About AI as a Service
CERIT-SC, a core component of e-INFRA CZ, is developing an on-premise AI platform that provides researchers with secure, high-performance, and interoperable AI tools. The platform runs on cutting-edge NVIDIA DGX H100/B200/B300-class systems, delivering the compute required for both efficient large-scale model training and high-speed inference. It hosts a curated selection of open large language and generative models, easily accessible via the Open WebUI interface or standard OpenAI-compatible APIs.
Difference between AI Training and Inference

Training is learning; inference is applying that knowledge.
| Concept | What It Does | Data Flow | Our Context |
|---|---|---|---|
| AI Training | The “Education” phase: the model is fed massive datasets, learns patterns, adjusts its weights, and minimizes error through backpropagation. | Forward & backward passes (backpropagation). | Done by the teams who created the open-source trained models we host. |
| AI Inference | The “Application” phase: the trained model takes new, unseen data and applies its fixed knowledge to make real-time predictions or decisions. | Forward pass only. | Our job: providing the models and the applications built on top of them. |
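The distinction above can be illustrated with a toy one-weight model (a didactic sketch only, unrelated to the platform's hosted models): training runs forward and backward passes to adjust the weight, while inference is a forward pass alone.

```python
# Toy illustration of training (forward + backward) vs. inference (forward only).
# Purely didactic; real LLM training works on billions of weights the same way in principle.

def forward(w: float, x: float) -> float:
    """Forward pass: apply the model's current weight."""
    return w * x

def train(data: list[tuple[float, float]], epochs: int = 100, lr: float = 0.01) -> float:
    """Training: repeat forward and backward passes to minimize squared error."""
    w = 0.0
    for _ in range(epochs):
        for x, y in data:
            pred = forward(w, x)        # forward pass
            grad = 2 * (pred - y) * x   # backward pass: gradient of (pred - y)^2
            w -= lr * grad              # weight update
    return w

# "Education" phase: learn the mapping y = 3x from examples.
w = train([(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)])

# "Application" phase: inference on new, unseen input is a forward pass only.
print(round(forward(w, 4.0), 2))  # ~12.0
```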
Key Features (Inference)
- Secure, on‑premise LLM & generative‑AI platform – Our models run securely on the e-INFRA CZ infrastructure. Queries and responses are not logged by external providers, ensuring that your research and sensitive data remain within our environment. With the exception of internet searching, nothing leaves our local infrastructure.
- Supports privacy‑sensitive research – complies with institutional and legal requirements, which makes our services well suited for handling sensitive data.
What the Platform Provides
| Category | Highlights |
|---|---|
| Compute | NVIDIA DGX‑H100 (Hopper) and DGX‑B200 (Blackwell). Petaflop‑class GPU performance. |
| Key Models | Full-sized DeepSeek R1 0528 (685B): high-performance reasoning model, excellent for complex tasks and the best for Czech language processing. Operates without internet access for high security. GPT-OSS-120B: general reasoning model, comparable to OpenAI's smaller (“mini”) models. Qwen3 Coder 480B Instruct: specialized instruction model optimized for programming and code generation. Additional community models can be added on request. |
| Access | MUNI students and employees, and MetaCentrum users |
| External usage | OpenAI‑compatible REST API – use existing OpenAI client code, LangChain, or other integrations without modification |
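Because the endpoint speaks the standard OpenAI protocol, a chat completion request can be sketched with the Python standard library alone. The base URL, model identifier, and token below are placeholders, not the platform's actual values; consult the API documentation for the real endpoint.

```python
import json
import urllib.request

# Placeholder values -- substitute the real endpoint, model name, and
# API token from the platform's API documentation.
BASE_URL = "https://api.example.e-infra.cz/v1"  # hypothetical endpoint
API_KEY = "YOUR_API_TOKEN"
MODEL = "deepseek-r1"                           # hypothetical model identifier

def build_chat_request(prompt: str) -> urllib.request.Request:
    """Build a standard OpenAI-compatible /chat/completions request."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request("Summarize the difference between training and inference.")
# Uncomment to actually send the request:
# with urllib.request.urlopen(req) as response:
#     print(json.load(response)["choices"][0]["message"]["content"])
```

Equivalently, the official `openai` Python client or LangChain can be pointed at the service simply by overriding the client's base URL, since the wire format is identical.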
Key AI Services (Inference)
| Service | Link | Description |
|---|---|---|
| Chat (Open-WebUI) | WebUI chat | A full-featured conversational interface (similar to ChatGPT) offering advanced features and explicit language model selection. Text work: translations, summaries, analysis, and generation of program code. Multimodality: image generation (including editing) and content recognition in images (e.g., extracting a serial number from a photo). Tools: searching the internet, GitHub, and arXiv, plus a Python sandbox for running code and performing data analysis in the browser. RAG (knowledge): searching within documents attached to the chat works very well. |
| Call the API | Using AI models – API docs | Use the OpenAI‑compatible endpoint to integrate AI into your scripts, pipelines, or services. |
| DeepSite | DeepSite (vibe‑coding) | A generative tool that creates webpages and applications (HTML/CSS/JS) based on a simple text description. Excellent for design proposals, mockups, or quick web concepts. |
| AI in Jupyter Notebooks | JupyterHub integration | Integration of an AI Assistant directly into the Jupyter Lab environment (Notebook Intelligence). Used for fixing code, generating new snippets, and conversational assistance within your coding projects. |
| Documentation ChatBot | docs.e-infra.cz and other documentation sites | A Retrieval-Augmented Generation (RAG) system implemented across e-INFRA CZ documentation. Answers specific questions and acts as a problem solver based on our knowledge base (e.g., “How to run MATLAB?”). |
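The chatbot's Retrieval-Augmented Generation pattern can be sketched in a few lines. The snippets and keyword-overlap scoring below are illustrative assumptions; a production system would use embedding-based retrieval over the real documentation corpus.

```python
import re

# Illustrative stand-in for an indexed documentation corpus.
DOCS = {
    "matlab": "MATLAB is available as a licensed module; load it before running jobs.",
    "jupyter": "JupyterHub offers notebook environments with GPU support.",
    "storage": "Project data should be stored on the shared filesystem.",
}

def retrieve(question: str, k: int = 1) -> list[str]:
    """Rank snippets by naive keyword overlap with the question (retrieval step)."""
    words = set(re.findall(r"\w+", question.lower()))
    scored = sorted(
        DOCS.items(),
        key=lambda kv: len(words & set(re.findall(r"\w+", (kv[0] + " " + kv[1]).lower()))),
        reverse=True,
    )
    return [text for _, text in scored[:k]]

def build_prompt(question: str) -> str:
    """Augment the user question with retrieved context before calling the LLM."""
    context = "\n".join(retrieve(question))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

print(build_prompt("How to run MATLAB?"))
```

The LLM then answers from the retrieved context rather than from its frozen training data, which is what keeps the chatbot's answers grounded in the current documentation.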
Reference
Read more details on our e-INFRA Blog at https://blog.e-infra.cz/
Image used from https://blogs.nvidia.com/blog/difference-deep-learning-training-inference-ai/