    Synthetic

    Synthetic offers both subscription and usage-based pricing. Choose the plan that works best for you.

    Standard

    $20/month
    Perfect for individuals just starting out.
    ✓Access to all always-on models
    ✓Both UI and API access
    ✓Cancel anytime
    ✓Standard rate limits: 135 requests every five hours
    ✓3x higher rate limits than Claude's $20/month plan

    Pro

    $60/month
    For professionals and avid LLM users.
    ✓Access to all always-on models
    ✓Both UI and API access
    ✓Cancel anytime
    ✓10x higher rate limits: 1,350 requests every five hours
    ✓6x higher rate limits than Claude's $100/month plan
    ✓50% higher rate limits than Claude's $200/month plan

    Usage-based

    For enterprise users and custom models.
    ✓Pay for what you use
    ✓Both UI and API access
    ✓Always-on models are pay-per-token
    ✓On-demand models are pay-per-minute


    Small requests (fewer than 2048 input tokens and fewer than 2048 output tokens) count as only 0.2 requests, and tool call messages count for even less: just 0.1 requests each (stretching your five-hour limit by up to 12x).
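    As a quick sketch of the arithmetic above (the helper below is illustrative, not an official API; the thresholds and weights are the ones stated on this page):

```python
# Sketch of the request-weighting rules described above. Values come
# from this page; the function name is illustrative, not an official API.

def request_weight(input_tokens, output_tokens, is_tool_call=False):
    """Return how many 'requests' a single call counts as."""
    if is_tool_call:
        return 0.1  # tool call messages count as 0.1 requests
    if input_tokens < 2048 and output_tokens < 2048:
        return 0.2  # small requests count as 0.2 requests
    return 1.0      # everything else counts as a full request

# Example: 100 small requests plus 50 tool-call messages against the
# Standard plan's 135-requests-per-five-hours budget.
used = (100 * request_weight(1000, 500)
        + 50 * request_weight(1000, 200, is_tool_call=True))
# used is about 25.0, well under the 135-request window
```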


    Always-on models

    All always-on models are included in your subscription. There's no additional charge for using any of these models.

    All-inclusive pricing

    With your subscription, all always-on models are included for one flat monthly price. No per-token billing—just simple, predictable pricing.

    Switch to "Pay per Use" to see token-based pricing for when you don't need a subscription.

    Here's the list of all always-on models included in your subscription:

    Model | Context length | Status
    deepseek-ai/DeepSeek-R1-0528 | 128k tokens | ✓ Included
    deepseek-ai/DeepSeek-V3 | 128k tokens | ✓ Included
    deepseek-ai/DeepSeek-V3-0324 | 128k tokens | ✓ Included
    deepseek-ai/DeepSeek-V3.2 | 159k tokens | ✓ Included
    meta-llama/Llama-3.3-70B-Instruct | 128k tokens | ✓ Included
    MiniMaxAI/MiniMax-M2.1 | 192k tokens | ✓ Included
    moonshotai/Kimi-K2-Instruct-0905 | 256k tokens | ✓ Included
    moonshotai/Kimi-K2-Thinking | 256k tokens | ✓ Included
    moonshotai/Kimi-K2.5 | 256k tokens | ✓ Included
    nvidia/Kimi-K2.5-NVFP4 | 256k tokens | ✓ Included
    openai/gpt-oss-120b | 128k tokens | ✓ Included
    Qwen/Qwen3-235B-A22B-Thinking-2507 | 256k tokens | ✓ Included
    Qwen/Qwen3-Coder-480B-A35B-Instruct | 256k tokens | ✓ Included
    zai-org/GLM-4.7 | 198k tokens | ✓ Included
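    Every plan includes API access. Based on the "OpenAI Compatible" setup described for embeddings later on this page, a chat request to an always-on model would presumably be shaped like the sketch below; the base URL is a placeholder (check your account settings for the real endpoint), and carrying the hf: model prefix over from the embedding instructions is an assumption:

```python
import json

# Hedged sketch of an OpenAI-compatible chat request to an always-on
# model. BASE_URL is a placeholder, not the real endpoint.

BASE_URL = "https://example.invalid/v1"  # placeholder
API_KEY = "YOUR_API_KEY"

payload = {
    "model": "hf:moonshotai/Kimi-K2-Thinking",  # any always-on model from the table
    "messages": [{"role": "user", "content": "Hello!"}],
}
headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}

# POST body for {BASE_URL}/chat/completions
body = json.dumps(payload)
```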

    LoRA models

    What's a LoRA?

    Low-rank adapters — called "LoRAs" — are small, efficient fine-tunes that run on top of existing models. They can modify a model to be much more effective at specific tasks.
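    To make "small and efficient" concrete, here's a back-of-the-envelope sketch (the hidden size below is a made-up illustrative number, not a property of any particular model): a rank-r adapter on a d-by-d weight matrix adds roughly 2·r·d parameters instead of the full d².

```python
# Why LoRAs are small: a rank-r adapter on a weight matrix of shape
# (d_out, d_in) adds only r*(d_in + d_out) parameters instead of
# d_in*d_out. The dimension below is illustrative.

def lora_params(d_in, d_out, rank):
    # Two low-rank factors: A has shape (rank, d_in), B has shape (d_out, rank)
    return rank * d_in + d_out * rank

d = 8192                          # hypothetical hidden size of one adapted layer
full = d * d                      # 67,108,864 parameters in the full matrix
adapter = lora_params(d, d, 64)   # 1,048,576 parameters at rank-64 (~1.6%)
```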

    All LoRAs for the following base models are included in your subscription:

    Model | Context length | Status
    meta-llama/Llama-3.2-1B-Instruct | 128k tokens | ✓ Included
    meta-llama/Llama-3.2-3B-Instruct | 128k tokens | ✓ Included
    meta-llama/Meta-Llama-3.1-8B-Instruct | 128k tokens | ✓ Included
    meta-llama/Meta-Llama-3.1-70B-Instruct | 128k tokens | ✓ Included
    LoRA sizes are measured in "ranks," starting at rank-8; we keep LoRAs up to rank-64 always-on and run them in FP8 precision. The rank is set during the fine-tuning process: if you create your own LoRA, you can choose exactly the rank you want using your training framework's standard configuration.

    Embedding models

    What are embeddings?

    Embedding models convert text into numerical coordinates, placing more-similar texts closer together and less-similar texts farther apart; these coordinates are called "embeddings." Embedding models are often used by AI-enabled tools for tasks like codebase indexing and search.
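    As a toy illustration of "closer means more similar," embeddings are commonly compared with cosine similarity. The three-dimensional vectors below are made up for the example; real embedding models produce much longer vectors:

```python
import math

# Cosine similarity: 1.0 for identical directions, lower for dissimilar
# ones. The vectors here are invented, not actual model output.

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

cat     = [0.9, 0.1, 0.2]   # pretend embedding of "cat"
kitten  = [0.85, 0.15, 0.25]  # related text: points in a similar direction
invoice = [0.1, 0.9, 0.7]   # unrelated text: points elsewhere

# "cat" is more similar to "kitten" than to "invoice"
closer = cosine_similarity(cat, kitten) > cosine_similarity(cat, invoice)
```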

    The following embedding models are included in your subscription. There's no additional charge for using embeddings, and embeddings requests don't count against your subscription rate limit.

    Model | Context length | Status
    nomic-ai/nomic-embed-text-v1.5 | 8k tokens | ✓ Included
    Since embedding models aren't full LLMs and can't be used for chat (they only produce embedding coordinates), these models are accessible via the API only.

    Roo & KiloCode setup

    In the Codebase Indexing setup, select the "OpenAI Compatible" embedding provider and paste in your API key. Set "Model Dimension" to the embedding model's default dimension: in the case of nomic-ai/nomic-embed-text-v1.5, use 768.

    Make sure to copy the model string with an "hf:" prefix (short for "Hugging Face," the open-source model repository where these models are hosted); for example, hf:nomic-ai/nomic-embed-text-v1.5.

    Your configuration should look roughly like so:

    [Screenshot: Codebase Indexing configuration for integrating with KiloCode and Roo Code]
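    Under the hood, an "OpenAI Compatible" indexer sends embedding requests shaped roughly like the sketch below. This is a hedged illustration of the setup described above: the payload follows the OpenAI embeddings request shape, the model string uses the hf: prefix, and 768 is the dimension the instructions give for nomic-ai/nomic-embed-text-v1.5:

```python
import json

# Sketch of an OpenAI-compatible embeddings request, matching the
# Roo/KiloCode setup described above. The real base URL comes from your
# account settings; only the payload shape is shown here.

payload = {
    "model": "hf:nomic-ai/nomic-embed-text-v1.5",  # note the hf: prefix
    "input": "def hello():\n    print('hi')",       # text (or code) to embed
}
body = json.dumps(payload)

# The returned embedding vector should have 768 entries, matching the
# "Model Dimension" value the Codebase Indexing setup asks for.
EXPECTED_DIMENSION = 768
```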