Releases: ollama/ollama
v0.12.6
What's Changed
- Ollama's app now supports searching when running DeepSeek-V3.1, Qwen3 and other models that support tool calling.
- Flash attention is now enabled by default for Gemma 3, improving performance and memory utilization
- Fixed issue where Ollama would hang while generating responses
- Fixed issue where `qwen3-coder` would act in raw mode when using `/api/generate` or `ollama run qwen3-coder <prompt>`
- Fixed `qwen3-embedding` providing invalid results
- Ollama will now evict models correctly when `num_gpu` is set (see the sketch after this list)
- Fixed issue where `tool_index` with a value of `0` would not be sent to the model
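As a hedged illustration (not part of the release notes): `num_gpu` controls how many layers are offloaded to the GPU and is passed per request through the API's documented `options` field, with `0` keeping the model entirely on CPU. The model name and prompt below are only examples.

```shell
curl http://localhost:11434/api/generate -d '{
  "model": "qwen3-coder",
  "prompt": "Write a haiku about GPUs.",
  "options": { "num_gpu": 0 }
}'
```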
Experimental Vulkan Support
Experimental support for Vulkan is now available when building locally from source. This enables additional AMD and Intel GPUs that are not currently supported by Ollama. To build locally, install the Vulkan SDK, set VULKAN_SDK in your environment, and follow the developer instructions. In a future release, Vulkan support will be included in the binary release as well. Please file issues if you run into any problems.
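A minimal sketch of that flow, assuming a Linux shell and the Vulkan SDK unpacked under `~/vulkan-sdk`; exact steps vary by platform, so treat the developer instructions as authoritative:

```shell
export VULKAN_SDK=$HOME/vulkan-sdk/x86_64   # assumed install location
cmake -B build && cmake --build build       # build the GPU runners per the developer docs
go run . serve                              # run the server from source
```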
New Contributors
- @yajianggroup made their first contribution in #12377
- @inforithmics made their first contribution in #11835
- @sbhavani made their first contribution in #12619
Full Changelog: v0.12.5...v0.12.6
v0.12.5
What's Changed
- Thinking models now support structured outputs when using the `/api/chat` API (see the sketch after this list)
- Ollama's app will now wait until Ollama is running to allow for a conversation to be started
- Fixed issue where `"think": false` would show an error instead of being silently ignored
- Fixed `deepseek-r1` output issues
- macOS 12 Monterey and macOS 13 Ventura are no longer supported
- AMD gfx900 and gfx906 (MI50, MI60, etc) GPUs are no longer supported via ROCm. We're working to support these GPUs via Vulkan in a future release.
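A hedged sketch of combining thinking with structured outputs: `think` and `format` (a JSON schema) are documented `/api/chat` request fields, while the model and schema here are illustrative.

```shell
curl http://localhost:11434/api/chat -d '{
  "model": "deepseek-r1",
  "messages": [{"role": "user", "content": "Name a country and its capital."}],
  "think": true,
  "format": {
    "type": "object",
    "properties": {
      "country": {"type": "string"},
      "capital": {"type": "string"}
    },
    "required": ["country", "capital"]
  },
  "stream": false
}'
```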
New Contributors
- @shengxinjing made their first contribution in #12415
Full Changelog: v0.12.4...v0.12.5-rc0
v0.12.4
What's Changed
- Flash attention is now enabled by default for Qwen 3 and Qwen 3 Coder
- Fixed minor memory estimation issues when scheduling models on NVIDIA GPUs
- Fixed an issue where `keep_alive` in the API would accept different values for the `/api/chat` and `/api/generate` endpoints
- Fixed tool calling rendering with `qwen3-coder`
- More reliable and accurate VRAM detection
- `OLLAMA_FLASH_ATTENTION` can now be overridden to `0` for models that have flash attention enabled by default (see the sketch after this list)
- macOS 12 Monterey and macOS 13 Ventura are no longer supported
- Fixed a crash that occurred when templates were not correctly defined
- Fixed memory calculations on NVIDIA iGPUs
- AMD gfx900 and gfx906 (MI50, MI60, etc) GPUs are no longer supported via ROCm. We're working to support these GPUs via Vulkan in a future release.
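A hedged sketch of the override: `OLLAMA_FLASH_ATTENTION` is read by the server at startup, and `keep_alive` now accepts the same values on both endpoints. Model names and the duration below are illustrative.

```shell
# Disable flash attention even for models that enable it by default:
OLLAMA_FLASH_ATTENTION=0 ollama serve

# The same keep_alive value (e.g. "10m") works on both endpoints:
curl http://localhost:11434/api/generate \
  -d '{"model": "qwen3", "prompt": "hi", "keep_alive": "10m"}'
curl http://localhost:11434/api/chat \
  -d '{"model": "qwen3", "messages": [{"role": "user", "content": "hi"}], "keep_alive": "10m"}'
```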
Full Changelog: v0.12.3...v0.12.4-rc3
v0.12.3
New models
- DeepSeek-V3.1-Terminus: a hybrid model that supports both thinking mode and non-thinking mode. It delivers more stable and reliable outputs across benchmarks compared to the previous version.
  Run on Ollama's cloud: `ollama run deepseek-v3.1:671b-cloud`
  Run locally (requires 500GB+ of VRAM): `ollama run deepseek-v3.1`
- Kimi-K2-Instruct-0905: the latest, most capable version of Kimi K2. It is a state-of-the-art mixture-of-experts (MoE) language model, featuring 32 billion activated parameters and a total of 1 trillion parameters.
  Run on Ollama's cloud: `ollama run kimi-k2:1t-cloud`
What's Changed
- Fixed issue where tool calls provided as stringified JSON would not be parsed correctly
- `ollama push` will now provide a URL to follow to sign in
- Fixed issues where `qwen3-coder` would output Unicode characters incorrectly
- Fixed issue where loading a model with `/load` would crash
Full Changelog: v0.12.2...v0.12.3
v0.12.2
Web search
A new web search API is now available in Ollama. Ollama provides a generous free tier of web searches for individuals to use, and higher rate limits are available via Ollama's cloud. This web search capability can augment models with the latest information from the web to reduce hallucinations and improve accuracy.
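A hedged sketch of calling it, assuming an API key from your Ollama account is exported as `OLLAMA_API_KEY`; the endpoint shape follows Ollama's web search documentation, and the query is illustrative:

```shell
curl https://ollama.com/api/web_search \
  -H "Authorization: Bearer $OLLAMA_API_KEY" \
  -d '{"query": "what is ollama?"}'
```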
What's Changed
- Models with the Qwen 3 architecture, including MoE variants, now run in Ollama's new engine
- Fixed issue where built-in tools for gpt-oss were not being rendered correctly
- Support multi-regex pretokenizers in Ollama's new engine
- Ollama's new engine can now load tensors by matching a prefix or suffix
Full Changelog: v0.12.1...v0.12.2
v0.12.1
New models
- Qwen3 Embedding: state of the art open embedding model by the Qwen team
What's Changed
- Qwen3-Coder now supports tool calling (see the sketch after this list)
- Ollama's app will no longer show "connection lost" errors when connecting to cloud models
- Fixed issue where Gemma3 QAT models would not output correct tokens
- Fixed issue where `&` characters in Qwen3-Coder would not be parsed correctly during function calling
- Fixed issues where `ollama signin` would not work properly on Linux
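A hedged sketch of tool calling via the documented `tools` field of `/api/chat`; the `get_weather` function is hypothetical and only illustrates the request shape:

```shell
curl http://localhost:11434/api/chat -d '{
  "model": "qwen3-coder",
  "messages": [{"role": "user", "content": "What is the weather in Toronto?"}],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Get the current weather for a city",
      "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"]
      }
    }
  }],
  "stream": false
}'
```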
Full Changelog: v0.12.0...v0.12.1
v0.12.0
Cloud models
Cloud models are now available in preview, allowing you to run a group of larger models with fast, datacenter-grade hardware.
To run a cloud model, use:
ollama run qwen3-coder:480b-cloud
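As a hedged aside, once signed in, the same local API serves cloud models like any other model; the request below is illustrative:

```shell
curl http://localhost:11434/api/chat -d '{
  "model": "qwen3-coder:480b-cloud",
  "messages": [{"role": "user", "content": "Explain goroutines briefly."}]
}'
```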
What's Changed
- Models with the BERT architecture now run on Ollama's engine
- Models with the Qwen 3 architecture now run on Ollama's engine
- Fixed issue where older NVIDIA GPUs would not be detected if newer drivers were installed
- Fixed issue where models would not be imported correctly with `ollama create`
- Ollama will skip parsing the initial `<think>` if provided in the prompt for `/api/generate` by @rick-github
New Contributors
- @egyptianbman made their first contribution in #12300
- @russcoss made their first contribution in #12280
Full Changelog: v0.11.11...v0.12.0
v0.11.11
What's Changed
- Support for CUDA 13
- Improved memory usage when using gpt-oss in Ollama's app
- Better scrolling in Ollama's app when submitting long prompts
- Cmd +/- will now zoom and shrink text in Ollama's app
- Assistant messages can now be copied in Ollama's app
- Fixed error that would occur when attempting to import safetensors files by @rick-github in #12176
- Improved memory estimates for hybrid and recurrent models by @gabe-l-hart in #12186
- Fixed error that would occur when batch size was greater than context length
- Flash attention & KV cache quantization validation fixes by @jessegross in #12231
- Add `dimensions` field to embed requests by @mxyng in #12242 (see the sketch after this list)
- Enable new memory estimates in Ollama's new engine by default by @jessegross in #12252
- Ollama will no longer load split vision models in the Ollama engine by @jessegross in #12241
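A hedged sketch of the new field: `dimensions` requests embeddings of a given size from `/api/embed` where the model supports it; the model name and size below are illustrative.

```shell
curl http://localhost:11434/api/embed -d '{
  "model": "embeddinggemma",
  "input": "The quick brown fox",
  "dimensions": 256
}'
```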
New Contributors
- @KashyapTan made their first contribution in #12188
- @carbonatedWaterOrg made their first contribution in #12230
- @fengyuchuanshen made their first contribution in #12249
Full Changelog: v0.11.10...v0.11.11
v0.11.10
New models
- EmbeddingGemma: a new open embedding model that delivers best-in-class performance for its size
What's Changed
- Support for EmbeddingGemma
Full Changelog: v0.11.9...v0.11.10
v0.11.9
What's Changed
- Improved performance via overlapping GPU and CPU computations
- Fixed issues where an unrecognized AMD GPU would cause an error
- Reduced crashes due to unhandled errors in some Mac and Linux installations of Ollama
New Contributors
- @alpha-nerd-nomyo made their first contribution in #12129
- @pxwanglu made their first contribution in #12123
Full Changelog: v0.11.8...v0.11.9-rc0