  • Completely depends on your laptop hardware, but generally:

    • TabbyAPI (exllamav2/exllamav3)
    • ik_llama.cpp, and its openai server
    • kobold.cpp (or kobold.cpp rocm, or croco.cpp, depends)
    • An MLX host with one of the new distillation quantizations
    • Text-gen-web-ui (slow, but supports a lot of samplers and some exotic quantizations)
    • SGLang (extremely fast for parallel calls, if that’s what you want)
    • Aphrodite Engine (lots of samplers, and fast at the expense of some VRAM usage)

    I use text-gen-web-ui at the moment only because TabbyAPI is a little broken with exllamav3 (which is utterly awesome for Qwen3); otherwise I’d almost always stick with TabbyAPI.
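    One nice thing about most of the servers above (TabbyAPI, ik_llama.cpp’s server, kobold.cpp, text-gen-web-ui, SGLang, Aphrodite) is that they all expose the same OpenAI-compatible chat API, so your client code doesn’t care which backend you pick. A minimal sketch, assuming a server on `localhost:5000` (the port and model name are placeholders, adjust them for whatever your backend reports):

    ```python
    import json
    import urllib.request

    def build_chat_request(prompt, model="local-model", temperature=0.7):
        """Build the JSON body for POST /v1/chat/completions."""
        return {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
            "temperature": temperature,
        }

    body = build_chat_request("Hello, who are you?")

    # To actually send it (requires one of the servers above running locally):
    # req = urllib.request.Request(
    #     "http://localhost:5000/v1/chat/completions",
    #     data=json.dumps(body).encode("utf-8"),
    #     headers={"Content-Type": "application/json"},
    # )
    # with urllib.request.urlopen(req) as resp:
    #     print(json.load(resp)["choices"][0]["message"]["content"])
    ```

    Swapping backends is then just a matter of changing the port (and maybe the model name), which makes it cheap to benchmark a few of them on your own hardware.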

    Tell me (vaguely) what your system has, and I can be more specific.