v11.3.3651 (build: May 13 2026)
LLM-server

Some BOSS-Offline reports use generative AI based on an LLM neural network, so to use them you need to configure the settings on this page.

You can configure a local server, a cloud server, or both. If both are configured, the local server takes priority, except when neutral data (data that contains no confidential or personal information) is being transmitted. The Ollama framework is supported for a local server; ChatGPT, YandexGPT, and Gemini are supported for a cloud server.

Server URL
Specify the http or https URL of the server on which Ollama is installed. Usually this is http on port 11434.
Example: http://192.168.0.111:11434

API-key
ChatGPT: create an API key and copy it into this field.
YandexGPT: create a billing account, then obtain an OAuth token and copy it into this field.
Gemini: create an API key, connect billing to it, top up the balance, and then copy the key into this field.

Model
Ollama: specify the loaded model to use. Currently, models from the qwen3 or deepseek-r1 series are recommended, for example:
    deepseek-r1:14b
    deepseek-r1:32b
    qwen3:14b
    qwen3:32b
You need to specify the exact model that has been downloaded and installed in Ollama. The complete list of models is available on the Ollama website.
ChatGPT: gpt-4o, o4-mini, gpt-4.1, gpt-4.1-mini, gpt-5, gpt-5-mini, gpt-5.1, and others
YandexGPT: gpt://<folder_ID>/yandexgpt, gpt://<folder_ID>/yandexgpt/latest, gpt://<folder_ID>/yandexgpt-lite
Gemini: gemini-2.5-flash, gemini-2.5-flash-lite, gemini-2.5-pro, gemini-3.1-pro-preview, gemini-3-flash-preview, gemini-flash-latest, gemini-pro-latest, and others

Ollama:
- a GPU with CUDA support is not required for operation, but is highly recommended, because performance will be an order of magnitude higher even compared with multi-core CPU servers;
- the model must fit completely into video memory or RAM;
- the larger the model, the better the quality, but the lower the speed;
- several GPUs can be used (if the video memory of one GPU is not enough to hold the entire model);
- when a GPU is used, CPU and RAM resources can be minimal (for example, 2 CPUs and 4 GB of RAM are quite enough).

Example of installing Ollama on Linux Ubuntu (it is assumed that the GPU drivers are already installed):
    curl -fsSL https://ollama.com/install.sh | sh

For non-localhost access and to increase the allowed model loading time, additional settings are recommended:
    sudo nano /etc/systemd/system/ollama.service
Add the following lines to the [Service] section:
    Environment="OLLAMA_HOST=0.0.0.0"
    Environment="OLLAMA_LOAD_TIMEOUT=60m"
Then save the file and run:
    sudo systemctl daemon-reload
    sudo systemctl restart ollama

Then download and install the model, for example qwen3:32b:
    ollama run qwen3:32b

Attention! If you see the error message "model requires more system memory than is available" while loading a model, even though there is enough VRAM to hold the model, the most likely cause is that Ollama's default context window is set fairly large (64K or more), which takes up additional VRAM. In this case, ignore the error and check the "Context window size" setting on this page.

This parameter literally means "how much information the model can hold in memory simultaneously during a request" and is specified in tokens. The larger the value, the more VRAM is required and the larger the request to the neural network can be. If the context window is too small, the response will be of lower quality, because the neural network may "forget" or not see part of the query. For current system tasks, 16384 is generally sufficient (if VRAM allows, you can set it higher), and the minimum recommended value is 4096.
If you set it to 0, the Ollama framework itself determines the value based on the loaded model. Setting the value too high is not recommended either: it should not exceed the maximum for the given model (see the description of the specific model).

To check the current VRAM usage, it is usually convenient to run:
    nvidia-smi
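Before entering the server URL and model name on this page, it can help to confirm from another machine that the Ollama server is reachable and that the model is actually installed. The following is a minimal sketch, not part of the product: the variable `OLLAMA_URL` and the function names are illustrative assumptions, and the address is the example one from above. It relies on Ollama's own HTTP API (`/api/tags` to list installed models, `/api/generate` with the `num_ctx` option to set the context window for one request). Whether the "Context window size" setting on this page is passed to Ollama as `num_ctx` is an assumption.

```shell
#!/bin/sh
# Sketch only: OLLAMA_URL and the function names are illustrative assumptions.
# Adjust the address to the server on which Ollama is installed.
OLLAMA_URL="${OLLAMA_URL:-http://192.168.0.111:11434}"

# Fetch the JSON list of installed models from Ollama's /api/tags endpoint.
list_models() {
    curl -fsS --max-time 10 "$OLLAMA_URL/api/tags"
}

# Read a /api/tags response on stdin; succeed if model tag $1 is listed.
# This is a rough text match, not a JSON parser.
has_model() {
    grep -Eq "\"name\"[[:space:]]*:[[:space:]]*\"$1\""
}

# Build a /api/generate request body with an explicit context window.
# $1 = model, $2 = prompt (must not contain quotes or backslashes),
# $3 = num_ctx in tokens (defaults to 16384).
generate_body() {
    printf '{"model":"%s","prompt":"%s","stream":false,"options":{"num_ctx":%s}}' \
        "$1" "$2" "${3:-16384}"
}

# Example use (requires a reachable server):
#   list_models | has_model qwen3:32b && echo "model installed"
#   curl -fsS "$OLLAMA_URL/api/generate" -d "$(generate_body qwen3:32b 'Say hello' 4096)"
```

If the test request fails with a memory error at a large `num_ctx` but succeeds at 4096, that points to the context window rather than the model weights as the cause of the VRAM shortage.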
© KICKIDLER DLP