# Recommended Models

ServerAssistantAI supports a wide range of language models from different providers, including premium and free options, allowing server owners to choose the model that best fits their needs. Here's what we recommend, listed from **most recommended to least**.

## Embedding Models

Embedding models are used to convert text data into numerical representations called embeddings. These embeddings capture the semantic meaning and relationships between different pieces of text. When the `documents/` directory is updated, the content is sent to the embedding API. The resulting embeddings are saved to the `cache/` directory, allowing the AI to find relevant context efficiently without reprocessing or making new API requests for each query.

<table data-full-width="false"><thead><tr><th width="141">Provider</th><th width="223">Embedding Model</th><th width="100">Pricing</th><th>Description</th></tr></thead><tbody><tr><td><a href="https://aistudio.google.com/app/apikey">Google-AIStudio</a></td><td><a data-footnote-ref href="#user-content-fn-1">text-embedding-004</a></td><td>Free &#x26; Paid</td><td>Google's state-of-the-art embeddings for semantic search and classification tasks.</td></tr><tr><td><a href="https://platform.openai.com/api-keys">OpenAI</a></td><td><a data-footnote-ref href="#user-content-fn-2">text-embedding-3-large</a></td><td>Paid</td><td>OpenAI's large embedding model for text similarity and retrieval tasks.</td></tr><tr><td><a href="https://github.com/settings/personal-access-tokens">Github-Models</a></td><td><a data-footnote-ref href="#user-content-fn-3">openai/text-embedding-3-large</a></td><td>Free &#x26; Paid</td><td>OpenAI's large embedding model for text similarity and retrieval tasks.</td></tr><tr><td><a href="https://dashboard.cohere.ai/api-keys">Cohere</a></td><td><a data-footnote-ref href="#user-content-fn-4">embed-english-v3.0</a></td><td>Free &#x26; Paid</td><td>English embedding model great for noisy data, enabling better retrievals for RAG.</td></tr><tr><td><a href="https://dashboard.cohere.ai/api-keys">Cohere</a></td><td><a data-footnote-ref href="#user-content-fn-5">embed-multilingual-v3.0</a></td><td>Free &#x26; Paid</td><td>Multilingual embedding model great for noisy data, enabling better retrievals for RAG.</td></tr><tr><td><a href="https://platform.openai.com/api-keys">OpenAI</a></td><td><a data-footnote-ref href="#user-content-fn-2">text-embedding-3-small</a></td><td>Paid</td><td>OpenAI's small embedding model for text similarity and retrieval tasks.</td></tr></tbody></table>

{% hint style="info" %}
ServerAssistantAI only sends information to the embedding API when changes are made to the `documents/` directory. If no changes are detected, the plugin will use the previously cached embeddings to reduce API calls.
{% endhint %}
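
In practice, this caching flow amounts to content-hash invalidation: fingerprint each document, and only call the embedding API when the fingerprint is new. The sketch below is a hedged illustration of that idea, not ServerAssistantAI's actual code; the `embed_fn` callback and the cache file layout are assumptions.

```python
import hashlib
import json
from pathlib import Path

DOCUMENTS_DIR = Path("documents")  # mirrors the plugin's documents/ folder
CACHE_DIR = Path("cache")          # mirrors the plugin's cache/ folder

def embed_document(path: Path, embed_fn) -> list:
    """Return the embedding for one document, hitting the API only on change.

    The cache key is a hash of the file's content, so an unchanged file
    always resolves to an existing cache entry and costs zero API calls.
    """
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    cache_file = CACHE_DIR / f"{digest}.json"
    if cache_file.exists():                                # no change detected
        return json.loads(cache_file.read_text())          # reuse cached vectors
    vectors = embed_fn(path.read_text(encoding="utf-8"))   # one API request
    CACHE_DIR.mkdir(exist_ok=True)
    cache_file.write_text(json.dumps(vectors))
    return vectors
```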

## Large Language Models

Large Language Models (LLMs) are powerful AI models that can understand and generate human-like text based on the input they receive. In ServerAssistantAI, when a user asks a question, the system retrieves relevant cached context from the embedding API results. This context, along with the user's question, is sent to the LLM to generate accurate and context-aware responses.
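
As a rough sketch, this retrieve-then-generate step can be pictured as ranking the cached chunks by cosine similarity against the question and prepending the best matches to the prompt. Everything below (the `embed_fn`/`chat_fn` wrappers, the system prompt wording, `top_k`) is a hedged illustration under assumed names, not the plugin's actual implementation.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def answer(question, embed_fn, chat_fn, cache, top_k=3):
    """Retrieve the most relevant cached chunks, then ask the LLM.

    `cache` maps chunk text -> embedding vector (loaded from cache/),
    so only the question itself needs a fresh embedding call.
    """
    q_vec = embed_fn(question)
    ranked = sorted(cache, key=lambda chunk: cosine(q_vec, cache[chunk]), reverse=True)
    context = "\n\n".join(ranked[:top_k])  # best-matching documentation chunks
    return chat_fn([
        {"role": "system", "content": "Answer using only the provided server documentation."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ])
```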

<table><thead><tr><th width="142">Provider</th><th width="232">Large Language Model</th><th width="98">Pricing</th><th>Description</th></tr></thead><tbody><tr><td><a href="https://platform.openai.com/api-keys">OpenAI</a></td><td><a data-footnote-ref href="#user-content-fn-6">gpt-4o</a></td><td>Paid</td><td>OpenAI's most advanced model. Same intelligence as GPT-4 Turbo but 2x faster and 50% cheaper.</td></tr><tr><td><a href="https://github.com/settings/personal-access-tokens">Github-Models</a></td><td><a data-footnote-ref href="#user-content-fn-7">openai/gpt-4o</a></td><td>Free &#x26; Paid</td><td>OpenAI's most advanced model. Same intelligence as GPT-4 Turbo but 2x faster and 50% cheaper.</td></tr><tr><td><a href="https://console.anthropic.com/settings/keys">Anthropic</a></td><td><a data-footnote-ref href="#user-content-fn-8">claude-opus-4.1-20250805</a></td><td>Paid</td><td>Anthropic's upgraded model with enhanced agentic tasks, real-world coding, and reasoning capabilities.</td></tr><tr><td><a href="https://cloud.cerebras.ai/">CerebrasAI</a></td><td><a data-footnote-ref href="#user-content-fn-9">qwen-3-235b-a22b-instruct-2507</a></td><td>Free &#x26; Paid</td><td>Alibaba’s frontier-scale model served on Cerebras. Delivers state-of-the-art non-reasoning performance with increadible speed.</td></tr><tr><td><a href="https://console.anthropic.com/settings/keys">Anthropic</a></td><td><a data-footnote-ref href="#user-content-fn-8">claude-opus-4-20250514</a></td><td>Paid</td><td>Anthropic's most intelligent model with hybrid reasoning and 200K context window, excelling at coding and working autonomously for hours.</td></tr><tr><td><a href="https://aistudio.google.com/app/apikey">Google-AIStudio</a></td><td><a data-footnote-ref href="#user-content-fn-10">gemini-2.5-flash</a></td><td>Free &#x26; Paid</td><td>Google's fast thinking model optimized for speed and cost-efficiency with controllable thinking budgets for high-volume tasks.</td></tr><tr><td><a href="https://platform.openai.com/api-keys">OpenAI</a></td><td><a data-footnote-ref href="#user-content-fn-6">gpt-4.1</a></td><td>Paid</td><td>OpenAI's enhanced model with major coding improvements and 1 million token context window.</td></tr><tr><td><a href="https://github.com/settings/personal-access-tokens">Github-Models</a></td><td><a data-footnote-ref href="#user-content-fn-7">openai/gpt-4.1</a></td><td>Free &#x26; Paid</td><td>OpenAI's enhanced model with major coding improvements and 1 million token context window.</td></tr><tr><td><a href="https://console.groq.com/keys">Groq</a></td><td><a data-footnote-ref href="#user-content-fn-11">deepseek-r1-distill-llama-70b</a></td><td>Free &#x26; Paid</td><td>DeepSeek's efficient 70B <strong>reasoning</strong> model distilled from larger models.</td></tr><tr><td><a href="https://console.anthropic.com/settings/keys">Anthropic</a></td><td><a data-footnote-ref href="#user-content-fn-8">claude-sonnet-4-20250514</a></td><td>Paid</td><td>Anthropic's high-performance model with 200K context window, achieving 72.7% on SWE-bench with superior instruction following.</td></tr><tr><td><a href="https://platform.openai.com/api-keys">OpenAI</a></td><td><a data-footnote-ref href="#user-content-fn-6">gpt-4.1-mini</a></td><td>Paid</td><td>OpenAI's efficient model competitive with GPT-4o but 50% faster and 83% cheaper with 1 million token context window.</td></tr><tr><td><a href="https://github.com/settings/personal-access-tokens">Github-Models</a></td><td><a data-footnote-ref 
href="#user-content-fn-12">openai/gpt-4.1-mini</a></td><td>Free &#x26; Paid</td><td>OpenAI's efficient model competitive with GPT-4o but 50% faster and 83% cheaper with 1 million token context window.</td></tr><tr><td><a href="https://aistudio.google.com/app/apikey">Google-AIStudio</a></td><td><a data-footnote-ref href="#user-content-fn-13">gemini-2.0-flash-001</a></td><td>Free &#x26; Paid</td><td>Google's next-generation flash model with native tool use, superior speed, and 1M token context window for multimodal applications.</td></tr><tr><td><a href="https://aistudio.google.com/app/apikey">Google-AIStudio</a></td><td><a data-footnote-ref href="#user-content-fn-14">gemma-3-27b-it</a></td><td>Free</td><td>Google's open-source 27B parameter model optimized for instruction following and conversational AI with efficient performance.</td></tr><tr><td><a href="https://aistudio.google.com/app/apikey">Google-AIStudio</a></td><td><a data-footnote-ref href="#user-content-fn-15">gemini-1.5-pro</a></td><td>Free &#x26; Paid</td><td>Gemini 1.5 Pro offers long-context understanding, with a context window of up to 1 million tokens.</td></tr><tr><td><a href="https://console.groq.com/keys">Groq</a></td><td><a data-footnote-ref href="#user-content-fn-16">qwen/qwen3-32b</a></td><td>Free &#x26; Paid</td><td>Alibaba's hybrid model switching between thinking and fast modes, supporting 119 languages and competing with top proprietary models.</td></tr><tr><td><a href="https://dashboard.cohere.ai/api-keys">Cohere</a></td><td><a data-footnote-ref href="#user-content-fn-17">command-a-03-2025</a></td><td>Free &#x26; Paid</td><td>Cohere's most performant 111B model with 256K context, delivering 150% higher throughput.</td></tr><tr><td><a href="https://console.anthropic.com/settings/keys">Anthropic</a></td><td><a data-footnote-ref href="#user-content-fn-18">claude-3-sonnet-20240307</a></td><td>Paid</td><td>Claude 3.5 Sonnet delivers enhanced intelligence and speed, ideal for advanced tasks.</td></tr><tr><td><a href="https://platform.openai.com/api-keys">OpenAI</a></td><td><a data-footnote-ref href="#user-content-fn-19">gpt-4o-mini</a></td><td>Paid</td><td>OpenAI's most cost-efficient small model that’s smarter and cheaper than mostly all other paid models.</td></tr><tr><td><a href="https://github.com/settings/personal-access-tokens">Github-Models</a></td><td><a data-footnote-ref href="#user-content-fn-12">openai/gpt-4o-mini</a></td><td>Free &#x26; Paid</td><td>OpenAI's most cost-efficient small model that’s smarter and cheaper than mostly all other paid models.</td></tr><tr><td><a href="https://console.groq.com/keys">Groq</a></td><td><a data-footnote-ref href="#user-content-fn-11">llama-3.1-70b-versatile</a></td><td>Free &#x26; Paid</td><td>Meta's Llama 3.1 70B instruction-tuned model outperforms all open-source chat models and even closed-source models and using Groq, the inference speed is 250+ Tokens per second!</td></tr><tr><td><a href="https://platform.openai.com/api-keys">OpenAI</a></td><td><a data-footnote-ref href="#user-content-fn-19">gpt-4-turbo</a></td><td>Paid</td><td>OpenAI's GPT-4-Turbo model with 128K context, newer knowledge and more powerful than GPT-4.</td></tr><tr><td><a href="https://aistudio.google.com/app/apikey">Google-AIStudio</a></td><td><a data-footnote-ref href="#user-content-fn-20">gemini-1.5-flash</a></td><td>Free &#x26; Paid</td><td>Gemini 1.5 Flash has higher rate limits than Gemini 1.5 Pro and is Google's fastest, most cost-efficient model.</td></tr><tr><td><a 
href="https://console.anthropic.com/settings/keys">Anthropic</a></td><td><a data-footnote-ref href="#user-content-fn-21">claude-3-opus-20240307</a></td><td>Paid</td><td>Anthropic's most intelligent model with great performance on highly complex tasks.</td></tr><tr><td><a href="https://console.groq.com/keys">Groq</a></td><td><a data-footnote-ref href="#user-content-fn-11">llama3-70b-8192</a></td><td>Free &#x26; Paid</td><td>Meta's Llama 3 70B instruction-tuned model which outperforms almost all open-source chat models and using Groq, the inference speed is 300+ Tokens per second!</td></tr><tr><td><a href="https://console.anthropic.com/settings/keys">Anthropic</a></td><td><a data-footnote-ref href="#user-content-fn-22">claude-3-haiku-20240307</a></td><td>Paid</td><td>Claude 3 Haiku is Anthropic's fastest model for near-instant responsiveness.</td></tr><tr><td><a href="https://huggingface.co/settings/tokens">HuggingFace</a></td><td>meta-llama/Meta-Llama-3-8B-Instruct</td><td>Free</td><td>Meta's Llama 3 8B instruction-tuned model and outperforming many open-source chat models.</td></tr><tr><td><a href="https://huggingface.co/settings/tokens">HuggingFace</a></td><td>01-ai/Yi-1.5-34B-Chat</td><td>Free</td><td>01.AI's Yi-1.5 model, offering strong performance in coding, math, reasoning, and instruction-following.</td></tr><tr><td><a href="https://huggingface.co/settings/tokens">HuggingFace</a></td><td>mistralai/Mixtral-8x7B-Instruct-v0.1</td><td>Free</td><td>Mistral AI's 8x7B instruction-following model, version 0.1.</td></tr><tr><td><a href="https://platform.openai.com/api-keys">OpenAI</a></td><td><a data-footnote-ref href="#user-content-fn-23">gpt-3.5-turbo</a></td><td>Paid</td><td>OpenAI's GPT-3.5 model with a larger context size of 16K tokens.</td></tr></tbody></table>

{% hint style="warning" %}
Please note that each model may have specific requirements or considerations for optimal performance. When selecting a model, consider factors such as:

* System prompt configuration: Different models may require adjustments to the system prompt to achieve the best results. Check the documentation provided by the model's creators for tips on prompt engineering and configuration; a minimal comparison sketch follows this hint.
* Free open-source LLMs have a higher chance of hallucinating compared to paid models.
{% endhint %}
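
When trying out a candidate model, it can help to compare a few system prompt variants side by side before settling on one. The harness below is only a sketch, reusing the same hypothetical `chat_fn` wrapper assumed in the earlier examples:

```python
def compare_prompts(chat_fn, question, variants):
    """Print a candidate model's reply under each system prompt variant,
    making it easier to pick the best-behaved wording for your config."""
    for prompt in variants:
        reply = chat_fn([
            {"role": "system", "content": prompt},
            {"role": "user", "content": question},
        ])
        print(f"--- {prompt!r}\n{reply}\n")

# Example: check whether a stricter prompt reduces off-topic answers.
# compare_prompts(chat_fn, "How do I claim land?", [
#     "You are a helpful Minecraft server assistant.",
#     "Answer only from the server documentation; say you don't know otherwise.",
# ])
```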

[^1]: **Free** Rate Limits:

    * 2 Requests Per Minute
    * 32,000 Tokens Per Minute
    * 50 Requests Per Day

    **Paid** Rate Limits:

    * 360 Requests Per Minute
    * 2,000,000 Tokens Per Minute
    * 10,000 Requests Per Day

[^2]: **Tier 1** Rate Limits:

    * 3,000 Requests Per Minute
    * 1,000,000 Tokens Per Minute

[^3]: **Free Rate Limits (Low Tier):**

    * 15 Requests Per Minute
    * 150 Requests Per Day
    * 64,000 Input Tokens Per Request
    * 5 Concurrent Requests

    **Paid Rate Limits:**

    * Production-grade limits (see Azure documentation)

[^4]: **Free** Rate Limits:

    * 5 Requests Per Minute
    * 1,000 Total Requests Per Month
    * 100,000 Total Tokens Per Minute

    **Paid** Rate Limits:

    * 10,000 Requests Per Minute

[^5]: **Free** Rate Limits:

    * 5 Requests Per Minute
    * 1,000 Total Requests Per Month
    * 100,000 Total Tokens Per Minute

    **Paid** Rate Limits:

    * 10,000 Requests Per Minute

[^6]: **Tier 1** Rate Limits:

    * 500 Requests Per Minute
    * 30,000 Tokens Per Minute
    * No Daily Limits

    **Tier 2** Rate Limits:

    * 5,000 Requests Per Minute
    * 450,000 Tokens Per Minute
    * No Daily Limits

[^7]: **Free Rate Limits (High Tier):**

    * 10 Requests Per Minute
    * 50 Requests Per Day
    * 8,000 Input Tokens Per Request
    * 4,000 Output Tokens Per Request

    **Paid Rate Limits:**

    * Production-grade limits (see Azure documentation)

[^8]: **Tier 1** Rate Limits:

    * 50 Requests Per Minute
    * 30,000 Input Tokens Per Minute
    * 8,000 Output Tokens Per Minute

    **Tier 2** Rate Limits:

    * 1,000 Requests Per Minute
    * 450,000 Input Tokens Per Minute
    * 90,000 Output Tokens Per Minute

[^9]: **Free** Rate Limits:

    * 30 Requests Per Minute
    * 60,000 Tokens Per Minute
    * 14,400 Requests Per Day

    **Paid** Rate Limits:

    * Exploration Tier (see CerebrasAI documentation)

[^10]: **Free** Rate Limits:

    * 10 Requests Per Minute
    * 250,000 Tokens Per Minute
    * 250 Requests Per Day

    **Tier 1** Rate Limits:

    * 1,000 Requests Per Minute
    * 1,000,000 Tokens Per Minute
    * 10,000 Requests Per Day

    **Tier 2** Rate Limits:

    * 2,000 Requests Per Minute
    * 3,000,000 Tokens Per Minute
    * 100,000 Requests Per Day

[^11]: **Free** Rate Limits:

    * 30 Requests Per Minute
    * 6,000 Tokens Per Minute
    * 1,000 Requests Per Day

    **Paid** Rate Limits:

    * 1,000 Requests Per Minute
    * 300,000 Tokens Per Minute
    * 500,000 Requests Per Day

[^12]: **Free Rate Limits (Low Tier):**

    * 15 Requests Per Minute
    * 150 Requests Per Day
    * 8,000 Input Tokens Per Request
    * 4,000 Output Tokens Per Request

    **Paid Rate Limits:**

    * Production-grade limits (see Azure documentation)

[^13]: **Free** Rate Limits:

    * 15 Requests Per Minute
    * 1,000,000 Tokens Per Minute
    * 200 Requests Per Day

    **Tier 1** Rate Limits:

    * 2,000 Requests Per Minute
    * 4,000,000 Tokens Per Minute
    * No daily limit

    **Tier 2** Rate Limits:

    * 10,000 Requests Per Minute
    * 10,000,000 Tokens Per Minute
    * No daily limit

[^14]: **Free** Rate Limits:

    * 30 Requests Per Minute
    * 15,000 Tokens Per Minute
    * 14,400 Requests Per Day

[^15]: **Free** Rate Limits:

    * 2 Requests Per Minute
    * 32,000 Tokens Per Minute
    * 50 Requests Per Day

    **Paid** Rate Limits:

    * 360 Requests Per Minute
    * 2,000,000 Tokens Per Minute
    * 10,000 Requests Per Day

[^16]: **Free** Rate Limits:

    * 60 Requests Per Minute
    * 6,000 Tokens Per Minute
    * 1,000 Requests Per Day

    **Paid** Rate Limits:

    * 1,000 Requests Per Minute
    * 300,000 Tokens Per Minute
    * 500,000 Requests Per Day

[^17]: **Free** Rate Limits:

    * 20 Requests Per Minute
    * 1,000 Total Requests Per Month

    **Paid** Rate Limits:

    * 500 Requests Per Minute

[^18]: **Tier 1** Rate Limits:

    * 50 Requests Per Minute
    * 40,000 Tokens Per Minute
    * 1,000,000 Tokens Per Day

[^19]: **Tier 1** Rate Limits:

    * 500 Requests Per Minute
    * 30,000 Tokens Per Minute

[^20]: **Free** Rate Limits:

    * 15 Requests Per Minute
    * 1,000,000 Tokens Per Minute
    * 1,500 Requests Per Day

    **Paid** Rate Limits:

    * 1,000 Requests Per Minute
    * 2,000,000 Tokens Per Minute

[^21]: **Tier 1** Rate Limits:

    * 50 Requests Per Minute
    * 20,000 Tokens Per Minute
    * 1,000,000 Tokens Per Day

[^22]: **Tier 1** Rate Limits:

    * 50 Requests Per Minute
    * 50,000 Tokens Per Minute
    * 5,000,000 Tokens Per Day

[^23]: **Tier 1** Rate Limits:

    * 3,500 Requests Per Minute
    * 60,000 Tokens Per Minute
    * 10,000 Requests Per Day
