A lightweight, single-binary Rust inference server providing 100% OpenAI-API compatible endpoints for local GGUF models.
Shimmy addresses vendor lock-in and privacy concerns by letting developers run language models locally with their existing OpenAI SDKs and tools: a drop-in replacement that requires no code changes.
Developers and teams building AI applications who want privacy, cost control, and reliability by running language models locally instead of using cloud APIs. It's particularly valuable for those using tools like VSCode Copilot, Cursor, or Continue.dev with local models.
Developers choose Shimmy for its zero-dependency deployment, automatic GPU detection, and perfect OpenAI API compatibility that works with existing tools. Its unique selling point is being a single binary that's 142x smaller than alternatives like Ollama while offering advanced features like Mixture of Experts support for large models.
⚡ Python-free Rust inference server — OpenAI-API compatible. GGUF + SafeTensors, hot model swap, auto-discovery, single binary. FREE now, FREE forever.
Pre-built binaries bundle all GPU backends for automatic detection; as the quick start highlights, getting started takes only downloading the binary and running it, with no dependencies.
Provides 100% OpenAI-compatible endpoints, so tools like VSCode Copilot and Cursor work instantly after changing only the API base URL; the README includes code examples for the Python and Node.js SDKs.
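To sketch the drop-in idea, the snippet below builds an OpenAI-style `/v1/chat/completions` request using only the Python standard library. The host/port (`localhost:11435`) and model name are placeholder assumptions, not confirmed Shimmy defaults; in practice you would point your existing OpenAI SDK's base URL at your local Shimmy instance.

```python
import json
import urllib.request

# Placeholder address: substitute whatever host/port your Shimmy instance binds to.
BASE_URL = "http://localhost:11435/v1"

def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build (without sending) an OpenAI-compatible chat completion request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("my-local-model", "Hello!")
print(req.full_url)  # http://localhost:11435/v1/chat/completions
```

Because the request shape matches OpenAI's API, the same change of base URL is all an existing SDK needs.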
The single binary is 4.8 MB, starts in under a second, and uses about 50 MB of memory, making it 142x smaller than alternatives like Ollama, per the performance comparison table.
Supports Mixture-of-Experts (MoE) models, running 70B+ parameter models on consumer hardware through intelligent CPU/GPU hybrid processing, enabled with flags like --cpu-moe.
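The intuition behind the CPU/GPU hybrid can be shown with back-of-envelope arithmetic: in MoE models most parameters sit in expert layers, so if those stay in CPU RAM, VRAM only needs to hold the remaining dense layers. The expert fraction and 2-bytes-per-parameter figure below are illustrative assumptions, not Shimmy measurements.

```python
def hybrid_vram_estimate_gb(total_params_b: float,
                            expert_fraction: float,
                            bytes_per_param: float = 2.0) -> float:
    """Rough VRAM estimate (GB) when expert weights are offloaded to CPU RAM.

    total_params_b  : model size in billions of parameters
    expert_fraction : assumed share of parameters living in expert layers
    bytes_per_param : assumed storage per parameter (2.0 ~ fp16)
    """
    dense_params_b = total_params_b * (1.0 - expert_fraction)
    # Billions of params * bytes each = GB of VRAM for the dense layers.
    return dense_params_b * bytes_per_param

# Example: a 70B MoE model where ~90% of weights are expert layers
# would need VRAM for only ~7B dense parameters.
print(hybrid_vram_estimate_gb(70, 0.9))
```

Under these assumed numbers a 70B-class MoE model fits in roughly 14 GB of VRAM, which is how CPU offloading brings such models within reach of consumer GPUs.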
CPU-based vision processing is 5-10x slower than GPU, with the README warning of 15-45 seconds per image versus 2-8 seconds on GPU, limiting usability for vision tasks without acceleration.
Auto-discovery focuses on GGUF files from sources like Hugging Face, which may exclude newer or proprietary models not distributed in that open format.
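The auto-discovery behavior this bullet refers to can be pictured as a recursive scan for `.gguf` files. This is an illustrative sketch only; the search roots and matching logic are assumptions, not Shimmy's actual implementation.

```python
from pathlib import Path

def find_gguf_models(*roots: Path) -> list[Path]:
    """Recursively collect *.gguf files under the given directories.

    Illustrative only: Shimmy's real discovery roots and ordering may differ.
    Non-GGUF files (e.g. .safetensors) are simply not matched, which is why
    models outside the GGUF format are invisible to this kind of scan.
    """
    found: list[Path] = []
    for root in roots:
        if root.is_dir():
            found.extend(root.rglob("*.gguf"))
    return sorted(found)
```

For example, scanning a local models directory would surface `llama.gguf` but skip a sibling `llama.safetensors`.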
Building from source requires advanced setup, including C++ compilers and GPU SDKs; the README notes dependencies like LLVM on Windows and recommends the pre-built binaries to avoid these issues.