v11.3.3651 (build: May 13 2026)

LLM-server

Some BOSS-Offline reports use generative AI based on a large language model (LLM). To use these reports, configure the settings on this page.

You can configure a local server, a cloud server, or both.
If both are configured, the local server takes priority, except when neutral data (data that contains no confidential or personal information) is being transmitted.
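
The priority rule above can be read as a small decision function. The sketch below is only an interpretation of that rule, not the product's actual code: the function name and return values are hypothetical, and it assumes that neutral data goes to the cloud server when both are configured and that a single configured server handles all requests.

```python
from typing import Optional


def choose_server(local_configured: bool,
                  cloud_configured: bool,
                  data_is_neutral: bool) -> Optional[str]:
    """Return "local", "cloud", or None (hypothetical illustration)."""
    if local_configured and cloud_configured:
        # Both configured: local wins unless the data is neutral
        # (assumption: neutral data is then sent to the cloud server).
        return "cloud" if data_is_neutral else "local"
    if local_configured:
        return "local"
    if cloud_configured:
        # Assumption: with only a cloud server, all requests go to it.
        return "cloud"
    return None  # nothing configured


print(choose_server(True, True, data_is_neutral=False))  # local
print(choose_server(True, True, data_is_neutral=True))   # cloud
```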

For a local server, the Ollama framework is supported; for a cloud server, ChatGPT, YandexGPT, and Gemini are supported.

Server URL
Specify the http or https URL of the server where Ollama is installed.
Usually this is http on port 11434.
Example:
http://192.168.0.111:11434
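
Before saving the URL, it can help to confirm that an Ollama server actually answers at that address. The sketch below is one way to do this with the Python standard library. It assumes the server exposes Ollama's `/api/version` endpoint; the helper names are illustrative and not part of BOSS-Offline.

```python
import json
import urllib.request
from urllib.parse import urlsplit


def version_url(server_url: str) -> str:
    """Validate the server URL and return its /api/version address."""
    parts = urlsplit(server_url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"expected an http(s) URL, got {server_url!r}")
    return server_url.rstrip("/") + "/api/version"


def ollama_version(server_url: str, timeout: float = 5.0) -> str:
    """Ask the server for its version; raises OSError if it is unreachable."""
    with urllib.request.urlopen(version_url(server_url), timeout=timeout) as resp:
        return json.load(resp)["version"]
```

Calling `ollama_version("http://192.168.0.111:11434")` returns the version string if the server is reachable and raises `OSError` otherwise. If the check works on the server itself but fails from another machine, the server may be listening on localhost only.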

API-key
ChatGPT: create an API key and copy it here.
YandexGPT: create a billing account, then obtain an OAuth token and copy it here.
Gemini: create an API key, link billing to it, top up the balance, and then copy the key here.

Model
Ollama: specify the loaded model to use. Currently, models from the qwen3 or deepseek-r1 families are recommended.
For example:
deepseek-r1:14b
deepseek-r1:32b
qwen3:14b
qwen3:32b
You need to specify the exact model that is downloaded and installed in Ollama. The complete list of models is available on the Ollama website.
ChatGPT:
gpt-4o
o4-mini
gpt-4.1
gpt-4.1-mini
gpt-5
gpt-5-mini
gpt-5.1
and others
YandexGPT:
gpt://<folder_ID>/yandexgpt
gpt://<folder_ID>/yandexgpt/latest
gpt://<folder_ID>/yandexgpt-lite
Gemini:
gemini-2.5-flash
gemini-2.5-flash-lite
gemini-2.5-pro
gemini-3.1-pro-preview
gemini-3-flash-preview
gemini-flash-latest
gemini-pro-latest
and others
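
For Ollama, the model name must match an installed model exactly. One way to check this is to compare the name against the list returned by Ollama's `/api/tags` endpoint. The sketch below assumes that endpoint's usual response shape (a `models` array whose entries have a `name` field) and that a name without a tag refers to `:latest`; the function names are illustrative.

```python
import json
import urllib.request


def model_names(tags_response: dict) -> list[str]:
    """Extract model names from a parsed /api/tags response."""
    return [entry["name"] for entry in tags_response.get("models", [])]


def installed_models(server_url: str, timeout: float = 5.0) -> list[str]:
    """Fetch the names of the models installed on an Ollama server."""
    url = server_url.rstrip("/") + "/api/tags"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return model_names(json.load(resp))


def is_installed(model: str, names: list[str]) -> bool:
    """True if `model` is among `names`; a bare name means ':latest'."""
    wanted = model if ":" in model else model + ":latest"
    return wanted in names


# Example with a hand-written response instead of a live server:
sample = {"models": [{"name": "qwen3:32b"}, {"name": "deepseek-r1:14b"}]}
print(is_installed("qwen3:32b", model_names(sample)))  # True
print(is_installed("qwen3:14b", model_names(sample)))  # False
```

Against a live server, `is_installed("qwen3:32b", installed_models("http://192.168.0.111:11434"))` performs the same check.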


Ollama:
- a GPU with CUDA support is not required, but is highly recommended, because performance will be an order of magnitude higher even compared with multi-core CPU servers;
- the model must fit completely into video memory or RAM;
- the larger the model, the better the quality, but the lower the speed;
- several GPUs can be used (if the video memory of one GPU is not enough to hold the entire model);
- when a GPU is used, CPU and RAM resources can be minimal (for example, 2 CPUs and 4 GB of RAM are quite enough).
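
To judge whether a model will fit, a rough lower bound on its memory footprint is the number of parameters times the bits stored per weight. The sketch below shows only that arithmetic. The function name and the bit widths are assumptions for illustration, and the real footprint is higher because of the context window and runtime overhead, so the size listed for a model on the Ollama website is the more reliable figure.

```python
def approx_weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GB (10**9 bytes)."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# A 32B-parameter model stored at 4 bits per weight (illustrative value):
print(approx_weights_gb(32, 4))   # 16.0
# The same model at 16 bits per weight:
print(approx_weights_gb(32, 16))  # 64.0
```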


Example of installing Ollama on Ubuntu Linux (assuming the GPU drivers are already installed):
curl -fsSL https://ollama.com/install.sh | sh
To allow access from hosts other than localhost and to increase the allowed model loading time, additional settings are recommended:
sudo nano /etc/systemd/system/ollama.service
Add the following lines to the [Service] section:
Environment="OLLAMA_HOST=0.0.0.0"
Environment="OLLAMA_LOAD_TIMEOUT=60m"
Then save the file and run:
sudo systemctl daemon-reload
sudo systemctl restart ollama
Then download and install the model, for example qwen3:32b:
ollama run qwen3:32b


Attention! If you see the error message "model requires more system memory than is available" while loading a model, even though there is enough VRAM to hold the model, the most likely cause is that Ollama's default context window is fairly large (64K or more), which takes up additional VRAM. In this case, disregard the literal text of the error and check the "Context window size" setting on this page.
This parameter means "how much information the model can hold in memory at once during a request" and is specified in tokens. The larger the value, the more VRAM is required and the larger the request to the neural network can be. If the context window is too small, the response quality drops, because the neural network may "forget" or not see part of the query.
For current system tasks, 16384 is generally sufficient (if VRAM allows, you can set it higher), and the minimum recommended value is 4096. If you set it to 0, the Ollama framework determines the value itself based on the loaded model. Do not set the value too high either: it must not exceed the maximum for the given model (see the description of the specific model).
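
For reference, Ollama's HTTP API accepts the context window per request as `num_ctx` inside `options`. The sketch below builds such a request body for `POST /api/generate`. It does not show how BOSS-Offline itself talks to Ollama; the helper name and the way 0 is mapped to "send no option" are assumptions that mirror the setting described above.

```python
import json

MIN_RECOMMENDED_CTX = 4096  # minimum recommended on this page


def build_generate_body(model: str, prompt: str, num_ctx: int = 16384) -> dict:
    """Build a JSON body for POST /api/generate with a given context window."""
    if num_ctx < 0:
        raise ValueError("num_ctx must be 0 or a positive number of tokens")
    body = {"model": model, "prompt": prompt, "stream": False}
    if num_ctx > 0:
        if num_ctx < MIN_RECOMMENDED_CTX:
            print(f"warning: num_ctx={num_ctx} is below {MIN_RECOMMENDED_CTX}")
        body["options"] = {"num_ctx": num_ctx}
    # num_ctx == 0: send no option, so Ollama picks its own value (assumption)
    return body


print(json.dumps(build_generate_body("qwen3:32b", "Hello", 16384)))
```

Sending a body like this with a small and then a larger `num_ctx` while watching VRAM usage is one way to see how much memory the context window adds for a given model.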

The current VRAM usage can usually be checked with the command:
nvidia-smi
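
When the numbers are needed in a script rather than on screen, nvidia-smi can print them as CSV. The sketch below uses the `--query-gpu` and `--format=csv,noheader,nounits` options and parses the result into MiB values per GPU. The function names are illustrative, and the parser assumes every field is numeric (some devices report `[N/A]` instead).

```python
import subprocess

QUERY = ["nvidia-smi",
         "--query-gpu=index,memory.used,memory.total",
         "--format=csv,noheader,nounits"]


def parse_gpu_memory(csv_text: str) -> list[tuple[int, int, int]]:
    """Parse lines like '0, 1234, 24576' into (index, used_mib, total_mib)."""
    rows = []
    for line in csv_text.strip().splitlines():
        index, used, total = (int(field.strip()) for field in line.split(","))
        rows.append((index, used, total))
    return rows


def gpu_memory() -> list[tuple[int, int, int]]:
    """Run nvidia-smi and return per-GPU memory usage in MiB."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_memory(out.stdout)


# Parsing a hand-written sample instead of live output:
for index, used, total in parse_gpu_memory("0, 18432, 24576\n1, 512, 24576\n"):
    print(f"GPU {index}: {used} / {total} MiB used, {total - used} MiB free")
```

On a machine with NVIDIA drivers installed, `gpu_memory()` returns the same structure from live data.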

© KICKIDLER DLP