Description:
Self-hosted, multi-model AI inference server that runs LLMs, TTS, STT, embeddings, and image generation behind a single OpenAI-compatible API, with GPU memory budgeting and a plugin system.
Keep Calm and Read the Friendly Manual :-)
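Because the server exposes an OpenAI-compatible API, standard chat-completions requests should work against whichever LLM it hosts. A minimal sketch of such a request follows; the base URL, port, and model name here are assumptions for illustration, not values defined by this project.

```python
import json

# Hypothetical endpoint; the actual host/port depend on your deployment.
BASE_URL = "http://localhost:8000/v1"

# An OpenAI-compatible server accepts the standard chat-completions
# payload shape: a model name plus a list of role/content messages.
payload = {
    "model": "llama-3-8b",  # assumed model name for illustration
    "messages": [{"role": "user", "content": "Hello!"}],
}
body = json.dumps(payload).encode("utf-8")

# With the server running, the request would be sent like this:
# import urllib.request
# req = urllib.request.Request(
#     f"{BASE_URL}/chat/completions",
#     data=body,
#     headers={"Content-Type": "application/json"},
# )
# print(urllib.request.urlopen(req).read().decode())

# Here we only verify the payload round-trips as valid JSON.
print(json.loads(body)["model"])
```

The same endpoint convention extends to the other modalities the server describes (embeddings, STT, TTS, images), each behind its own OpenAI-style route.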