Recommended Models

ServerAssistantAI supports a wide range of language models from different providers, including both paid and free options, so server owners can choose the model that best fits their needs. Our recommendations are listed below, from most to least recommended.

Embedding Models

Embedding models are used to convert text data into numerical representations called embeddings. These embeddings capture the semantic meaning and relationships between different pieces of text. When the documents/ directory is updated, the content is sent to the embedding API. The resulting embeddings are saved to the cache/ directory, allowing the AI to find relevant context efficiently without reprocessing or making new API requests for each query.
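To make "numerical representations that capture semantic meaning" more concrete, here is a minimal Python sketch that compares embeddings with cosine similarity, one common way to measure how close two vectors are. The 3-dimensional vectors are made-up values for illustration (real embedding models return hundreds or thousands of dimensions), and whether ServerAssistantAI uses this exact metric is an assumption.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: near 1.0 means similar meaning."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0  # treat a zero vector as "no similarity"

# Hypothetical toy embeddings; real models produce much longer vectors.
embeddings = {
    "How do I claim land?":       [0.90, 0.10, 0.00],
    "How can I protect my base?": [0.80, 0.20, 0.10],
    "What do diamonds sell for?": [0.05, 0.10, 0.95],
}

q = embeddings["How do I claim land?"]
sim_related = cosine_similarity(q, embeddings["How can I protect my base?"])
sim_unrelated = cosine_similarity(q, embeddings["What do diamonds sell for?"])
print(f"related: {sim_related:.2f}, unrelated: {sim_unrelated:.2f}")  # related: 0.98, unrelated: 0.06
```

Two questions that mean roughly the same thing end up with nearly parallel vectors, which is what lets the AI find relevant context even when a player's wording differs from the documents.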

Provider

Embedding Model

Pricing

Description

Free & Paid

Google's state-of-the-art embeddings for semantic search and classification tasks.

Paid

OpenAI's large embedding model for text similarity and retrieval tasks.

Free & Paid

English embedding model that handles noisy data well, enabling better retrieval for retrieval-augmented generation (RAG).

Free & Paid

Multilingual embedding model that handles noisy data well, enabling better retrieval for RAG.

Paid

OpenAI's small embedding model for text similarity and retrieval tasks.

ServerAssistantAI only sends information to the embedding API when changes are made to the documents/ directory. If no changes are detected, the plugin will use the previously cached embeddings to reduce API calls.
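One way to implement "only re-embed when documents/ changes" is to store a content hash and compare it on startup. The minimal Python sketch below shows that idea; the SHA-256 fingerprint, the `fingerprint.json` file name, and the helper names are hypothetical, since the plugin's actual change-detection method is not described here.

```python
import hashlib
import json
import tempfile
from pathlib import Path

def fingerprint(documents_dir: Path) -> str:
    """SHA-256 over every file's relative path and bytes, visited in sorted order."""
    digest = hashlib.sha256()
    for path in sorted(p for p in documents_dir.rglob("*") if p.is_file()):
        digest.update(path.relative_to(documents_dir).as_posix().encode("utf-8"))
        digest.update(b"\0")  # separator between fields
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()

def needs_reembedding(documents_dir: Path, cache_dir: Path) -> bool:
    """True on the first run or whenever documents/ differs from the stored fingerprint."""
    marker = cache_dir / "fingerprint.json"  # hypothetical file name
    if not marker.exists():
        return True
    return json.loads(marker.read_text())["sha256"] != fingerprint(documents_dir)

def record_fingerprint(documents_dir: Path, cache_dir: Path) -> None:
    """Store the current fingerprint; call only after new embeddings were saved."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    marker = cache_dir / "fingerprint.json"
    marker.write_text(json.dumps({"sha256": fingerprint(documents_dir)}))

# Demo in a throwaway directory.
results = []
with tempfile.TemporaryDirectory() as tmp:
    docs, cache = Path(tmp, "documents"), Path(tmp, "cache")
    docs.mkdir()
    (docs / "rules.txt").write_text("No griefing.")
    results.append(needs_reembedding(docs, cache))  # first run: call the embedding API
    record_fingerprint(docs, cache)
    results.append(needs_reembedding(docs, cache))  # unchanged: reuse cached embeddings
    (docs / "rules.txt").write_text("No griefing. No spam.")
    results.append(needs_reembedding(docs, cache))  # edited: embed again
print(results)  # [True, False, True]
```

Recording the fingerprint only after the embeddings are saved means a failed API request is not mistaken for an up-to-date cache on the next run.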

Large Language Models

Large Language Models (LLMs) are powerful AI models that can understand and generate human-like text based on the input they receive. In ServerAssistantAI, when a user asks a question, the system retrieves the most relevant context from the cached embeddings. This context, along with the user's question, is sent to the LLM to generate accurate, context-aware responses.
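The retrieval step described above can be sketched in Python as follows: rank cached chunks by similarity to the question's embedding, keep the best few, and combine them with the question into a single prompt. This is a minimal sketch under stated assumptions: cosine similarity for ranking, the hypothetical helpers `top_k_chunks` and `build_prompt`, `k=2`, toy embeddings, and a made-up prompt layout. ServerAssistantAI's actual retrieval settings and prompt template are not documented here.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k_chunks(query_vec: list[float], cache: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Return the text of the k cached chunks most similar to the query embedding."""
    ranked = sorted(cache, key=lambda item: cosine_similarity(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

def build_prompt(system_prompt: str, question: str, context_chunks: list[str]) -> str:
    """Combine the system prompt, retrieved context, and the user's question."""
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return f"{system_prompt}\n\nContext:\n{context}\n\nQuestion: {question}"

# Hypothetical (chunk text, embedding) pairs, as if loaded from cache/.
cache = [
    ("Use /spawn to return to spawn.", [0.9, 0.1, 0.0]),
    ("Use /home to teleport to your bed.", [0.8, 0.3, 0.1]),
    ("Diamonds sell for 50 coins each.", [0.0, 0.1, 0.9]),
]
query_vec = [0.85, 0.2, 0.05]  # hypothetical embedding of the player's question
chunks = top_k_chunks(query_vec, cache, k=2)
prompt = build_prompt("You are a helpful Minecraft server assistant.",
                      "How do I get back to spawn?", chunks)
print(prompt)
```

Sending only the top few chunks, rather than every document, keeps each LLM request small while still supplying the context the model needs to answer.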

Provider

Large Language Model

Pricing

Description

Paid

OpenAI's most advanced model, with the same intelligence as GPT-4 Turbo but 2x faster and 50% cheaper.

Paid

Claude 3.5 Sonnet delivers enhanced intelligence and speed, ideal for advanced tasks.

Paid

OpenAI's most cost-efficient small model, smarter and cheaper than most other paid models.

Free & Paid

Gemini 1.5 Pro offers long-context understanding, with a context window of up to 1 million tokens.

Free

Meta's Llama 3.2 90B text model is similar to Llama 3.1 70B but performs even better. On the Groq platform, it reaches an impressive inference speed of 300+ tokens per second!

Free

Meta's Llama 3.1 70B instruction-tuned model outperforms many open-source and even closed-source chat models. On Groq, its inference speed reaches 250+ tokens per second!

Paid

OpenAI's GPT-4 Turbo model, with a 128K context window, a more recent knowledge cutoff, and more capability than GPT-4.

Free & Paid

Gemini 1.5 Flash has higher rate limits than Gemini 1.5 Pro and is Google's fastest, most cost-efficient model.

Paid

Anthropic's most intelligent model with great performance on highly complex tasks.

Free & Paid

Updated version of Command R+, an advanced RAG model with a 128K context window, multilingual support, and tool use capabilities.

Free

Meta's Llama 3 70B instruction-tuned model, which outperforms most open-source chat models. On Groq, its inference speed reaches 300+ tokens per second!

Free & Paid

Advanced RAG model with a 128K context window, multilingual support, and tool use capabilities.

Paid

Claude 3 Haiku is Anthropic's fastest model for near-instant responsiveness.

Free & Paid

Scalable generative model for RAG and tool use in enterprise applications.

meta-llama/Meta-Llama-3-8B-Instruct

Free

Meta's Llama 3 8B instruction-tuned model, which outperforms many open-source chat models.

01-ai/Yi-1.5-34B-Chat

Free

01.AI's Yi-1.5 model, offering strong performance in coding, math, reasoning, and instruction-following.

mistralai/Mixtral-8x7B-Instruct-v0.1

Free

Mistral AI's 8x7B instruction-following model, version 0.1.

Paid

OpenAI's GPT-3.5 model with a larger context size of 16K tokens.

All Large Language Models and Embedding Models from the Hugging Face Inference API are completely free to use!

Please note that each model may have specific requirements or considerations for optimal performance. When selecting a model, consider factors such as:

System prompt configuration: Different models may require adjustments to the system prompt to achieve the best results. Check the documentation provided by the model's creators for tips on prompt engineering and configuration.
Hallucination risk: Free open-source LLMs are more likely to hallucinate than paid models.
