Recommended Models
ServerAssistantAI supports a wide range of language models from different providers, including premium and free options, so server owners can choose the model that best fits their needs. The recommendations below are ordered from most to least recommended.
Embedding Models
Embedding models are used to convert text data into numerical representations called embeddings. These embeddings capture the semantic meaning and relationships between different pieces of text. When the documents/ directory is updated, the content is sent to the embedding API. The resulting embeddings are saved to the cache/ directory, allowing the AI to find relevant context efficiently without reprocessing or making new API requests for each query.
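To make the flow concrete, here is a minimal Python sketch of the same idea: hash each document, skip anything already cached, and only send new or changed content to the embedding API. ServerAssistantAI handles all of this internally, so the sketch is purely illustrative; the OpenAI client, the text-embedding-3-small model, and the cache file layout are assumptions chosen for the example, while the documents/ and cache/ paths mirror the directories described above.

```python
import hashlib
import json
from pathlib import Path

from openai import OpenAI  # assumed provider SDK, for illustration only

client = OpenAI()  # reads OPENAI_API_KEY from the environment

DOCS_DIR = Path("documents")
CACHE_DIR = Path("cache")
CACHE_DIR.mkdir(exist_ok=True)

def embed_documents(model: str = "text-embedding-3-small") -> None:
    """Embed each document once and cache the result, keyed by content hash."""
    for doc in DOCS_DIR.glob("*.txt"):
        text = doc.read_text(encoding="utf-8")
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        cache_file = CACHE_DIR / f"{doc.stem}-{digest}.json"
        if cache_file.exists():
            continue  # unchanged document: reuse the cached embedding, no API call
        response = client.embeddings.create(model=model, input=text)
        cache_file.write_text(json.dumps({
            "text": text,
            "embedding": response.data[0].embedding,
        }))
```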
text-embedding-004
Free & Paid
Google's state-of-the-art embedding model for semantic search and classification tasks.
text-embedding-3-large
Paid
OpenAI's large embedding model for text similarity and retrieval tasks.
embed-english-v3.0
Free & Paid
Cohere's English embedding model, strong on noisy data and enabling better retrieval for RAG.
embed-multilingual-v3.0
Free & Paid
Cohere's multilingual embedding model, strong on noisy data and enabling better retrieval for RAG.
text-embedding-3-small
Paid
OpenAI's small embedding model for text similarity and retrieval tasks.
Large Language Models
Large Language Models (LLMs) are powerful AI models that can understand and generate human-like text based on the input they receive. In ServerAssistantAI, when a user asks a question, the system retrieves relevant cached context from the embedding API results. This context, along with the user's question, is sent to the LLM to generate accurate and context-aware responses.
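As a rough, illustrative sketch (not the plugin's actual code): the cached embeddings are ranked against the embedded question, and the best matches are passed to the chat model alongside the question. The cosine-similarity ranking, the gpt-4o-mini model choice, and the cache file format are assumptions carried over from the sketch above.

```python
import json
import math
from pathlib import Path

from openai import OpenAI  # assumed provider SDK, for illustration only

client = OpenAI()
CACHE_DIR = Path("cache")  # same cache layout as the embedding sketch above

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

def answer(question: str, top_k: int = 3) -> str:
    # Embed the question and rank cached document embeddings by similarity.
    q_emb = client.embeddings.create(
        model="text-embedding-3-small", input=question
    ).data[0].embedding
    scored = []
    for cache_file in CACHE_DIR.glob("*.json"):
        entry = json.loads(cache_file.read_text())
        scored.append((cosine(q_emb, entry["embedding"]), entry["text"]))
    context = "\n\n".join(text for _, text in sorted(scored, reverse=True)[:top_k])

    # Send the retrieved context plus the user's question to the LLM.
    chat = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": f"Answer using this server documentation:\n{context}"},
            {"role": "user", "content": question},
        ],
    )
    return chat.choices[0].message.content
```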
gpt-4o
Paid
OpenAI's most advanced model. Same intelligence as GPT-4 Turbo but 2x faster and 50% cheaper.
gpt-4o
Free & Paid
OpenAI's most advanced model. Same intelligence as GPT-4 Turbo but 2x faster and 50% cheaper.
claude-opus-4.1-20250805
Paid
Anthropic's upgraded Opus model with improved performance on agentic tasks, real-world coding, and reasoning.
claude-opus-4-20250514
Paid
Anthropic's most intelligent model with hybrid reasoning and 200K context window, excelling at coding and working autonomously for hours.
gemini-2.5-flash
Free & Paid
Google's fast thinking model optimized for speed and cost-efficiency with controllable thinking budgets for high-volume tasks.
gpt-4.1
Paid
OpenAI's enhanced model with major coding improvements and 1 million token context window.
gpt-4.1
Free & Paid
OpenAI's enhanced model with major coding improvements and 1 million token context window.
deepseek-r1-distill-llama-70b
Free & Paid
DeepSeek's efficient 70B reasoning model, distilled from DeepSeek-R1 onto a Llama 70B base.
claude-sonnet-4-20250514
Paid
Anthropic's high-performance model with 200K context window, achieving 72.7% on SWE-bench with superior instruction following.
gpt-4.1-mini
Paid
OpenAI's efficient model competitive with GPT-4o but 50% faster and 83% cheaper with 1 million token context window.
gpt-4.1-mini
Free & Paid
OpenAI's efficient model competitive with GPT-4o but 50% faster and 83% cheaper with 1 million token context window.
gemini-2.0-flash-001
Free & Paid
Google's next-generation flash model with native tool use, superior speed, and 1M token context window for multimodal applications.
gemma-3-27b-it
Free
Google's open-source 27B parameter model optimized for instruction following and conversational AI with efficient performance.
gemini-1.5-pro
Free & Paid
Gemini 1.5 Pro offers long-context understanding, with a context window of up to 1 million tokens.
qwen/qwen3-32b
Free & Paid
Alibaba's hybrid model switching between thinking and fast modes, supporting 119 languages and competing with top proprietary models.
command-a-03-2025
Free & Paid
Cohere's most performant 111B model with 256K context, delivering 150% higher throughput than its predecessor.
claude-3-sonnet-20240229
Paid
Claude 3 Sonnet balances intelligence and speed, making it well suited for scaled enterprise workloads.
gpt-4o-mini
Paid
OpenAI's most cost-efficient small model, smarter and cheaper than GPT-3.5 Turbo.
gpt-4o-mini
Free & Paid
OpenAI's most cost-efficient small model, smarter and cheaper than GPT-3.5 Turbo.
llama-3.1-70b-versatile
Free & Paid
Meta's Llama 3.1 70B instruction-tuned model outperforms most open-source chat models and competes with closed-source models; running on Groq, it reaches inference speeds of 250+ tokens per second.
gpt-4-turbo
Paid
OpenAI's GPT-4 Turbo model with a 128K context window, more recent knowledge, and stronger performance than GPT-4.
gemini-1.5-flash
Free & Paid
Gemini 1.5 Flash has higher rate limits than Gemini 1.5 Pro and is Google's fastest, most cost-efficient model.
claude-3-opus-20240229
Paid
Anthropic's most capable Claude 3 model, with strong performance on highly complex tasks.
command-r-plus-08-2024
Free & Paid
Updated version of Command R+, an advanced RAG model with 128k context, multilingual support, and tool use capabilities.
llama3-70b-8192
Free & Paid
Meta's Llama 3 70B instruction-tuned model outperforms almost all open-source chat models; running on Groq, it reaches inference speeds of 300+ tokens per second.
command-r-plus
Free & Paid
Advanced RAG model with 128k context, multilingual support, and tool use capabilities.
claude-3-haiku-20240307
Paid
Claude 3 Haiku is Anthropic's fastest model for near-instant responsiveness.
command-r
Free & Paid
Cohere's scalable generative model for RAG and tool use in enterprise applications.
meta-llama/Meta-Llama-3-8B-Instruct
Free
Meta's Llama 3 8B instruction-tuned model, which outperforms many open-source chat models.
01-ai/Yi-1.5-34B-Chat
Free
01.AI's Yi-1.5 model, offering strong performance in coding, math, reasoning, and instruction-following.
mistralai/Mixtral-8x7B-Instruct-v0.1
Free
Mistral AI's 8x7B instruction-following model, version 0.1.
Please note that each model may have specific requirements or considerations for optimal performance. When selecting a model, consider factors such as:
System prompt configuration: Different models may require adjustments to the system prompt to achieve the best results. Check the documentation provided by the model's creators for tips on prompt engineering and configuration.
Hallucination risk: Free, open-source LLMs are more likely to hallucinate than paid models.