free-llm-api-resources

A list of free LLM inference resources accessible via API.

<!--- WARNING: DO NOT EDIT THIS FILE DIRECTLY. IT IS GENERATED BY src/pull_available_models.py --->

Free LLM API resources

This lists various services that provide free access or credits towards API-based LLM usage.

[!NOTE]
Please don't abuse these services, or we might lose them.

[!WARNING]
This list explicitly excludes any services that are not legitimate (e.g. reverse engineering an existing chatbot).

Free Providers

OpenRouter

Limits:

  • 20 requests/minute
  • 50 requests/day
  • 1,000 requests/day with a $10 lifetime top-up

Models share a common quota.
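
OpenRouter exposes an OpenAI-compatible API, so the standard openai Python SDK can be pointed at it. A minimal sketch, assuming the model ID shown below (free variants carry a ":free" suffix on OpenRouter's model list):

```python
# Minimal sketch against OpenRouter's OpenAI-compatible endpoint.
# The model ID is an assumption; check OpenRouter's model list for ":free" variants.
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_KEY",
)

response = client.chat.completions.create(
    model="meta-llama/llama-3.3-70b-instruct:free",  # assumed free-tier model ID
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```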

Google AI Studio

Data is used for training when the service is used outside of the UK/CH/EEA/EU.

<table><thead><tr><th>Model Name</th><th>Model Limits</th></tr></thead><tbody> <tr><td>Gemini 2.5 Pro</td><td>6,000,000 tokens/day<br>250,000 tokens/minute<br>100 requests/day<br>5 requests/minute</td></tr> <tr><td>Gemini 2.5 Flash</td><td>250,000 tokens/minute<br>250 requests/day<br>10 requests/minute</td></tr> <tr><td>Gemini 2.0 Flash</td><td>1,000,000 tokens/minute<br>200 requests/day<br>15 requests/minute</td></tr> <tr><td>Gemini 2.0 Flash-Lite</td><td>1,000,000 tokens/minute<br>200 requests/day<br>30 requests/minute</td></tr> <tr><td>Gemini 2.0 Flash (Experimental)</td><td>250,000 tokens/minute<br>50 requests/day<br>10 requests/minute</td></tr> <tr><td>Gemini 1.5 Flash</td><td>250,000 tokens/minute<br>50 requests/day<br>15 requests/minute</td></tr> <tr><td>Gemini 1.5 Flash-8B</td><td>250,000 tokens/minute<br>50 requests/day<br>15 requests/minute</td></tr> <tr><td>LearnLM 2.0 Flash (Experimental)</td><td>1,500 requests/day<br>15 requests/minute</td></tr> <tr><td>Gemma 3 27B Instruct</td><td>15,000 tokens/minute<br>14,400 requests/day<br>30 requests/minute</td></tr> <tr><td>Gemma 3 12B Instruct</td><td>15,000 tokens/minute<br>14,400 requests/day<br>30 requests/minute</td></tr> <tr><td>Gemma 3 4B Instruct</td><td>15,000 tokens/minute<br>14,400 requests/day<br>30 requests/minute</td></tr> <tr><td>Gemma 3 1B Instruct</td><td>15,000 tokens/minute<br>14,400 requests/day<br>30 requests/minute</td></tr> <tr><td>text-embedding-004</td><td rowspan="2">150 batch requests/minute<br>1,500 requests/minute<br>100 content/batch<br>Shared Quota</td></tr> <tr><td>embedding-001</td></tr> </tbody></table>
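
A minimal sketch using the google-generativeai Python package; the model name is an assumption, and any model from the table above can be substituted:

```python
# Minimal sketch using the google-generativeai package
# (pip install google-generativeai). Model name is an assumption.
import google.generativeai as genai

genai.configure(api_key="YOUR_AI_STUDIO_KEY")
model = genai.GenerativeModel("gemini-2.0-flash")
response = model.generate_content("Explain rate limits in one sentence.")
print(response.text)
```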

NVIDIA NIM

Phone number verification required.
Models tend to have limited context windows.

Limits: 40 requests/minute

Mistral (La Plateforme)

  • The free tier (Experiment plan) requires opting in to data training.
  • Requires phone number verification.

Limits (per-model): 1 request/second, 500,000 tokens/minute, 1,000,000,000 tokens/month
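
A minimal sketch calling La Plateforme's chat completions endpoint directly over HTTP; the model ID is an assumption, so check the platform for currently available models:

```python
# Minimal sketch of Mistral's chat completions endpoint via plain HTTP.
# The model ID is an assumption.
import requests

resp = requests.post(
    "https://api.mistral.ai/v1/chat/completions",
    headers={"Authorization": "Bearer YOUR_MISTRAL_KEY"},
    json={
        "model": "mistral-small-latest",  # assumed model ID
        "messages": [{"role": "user", "content": "Say hello."}],
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```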

Mistral (Codestral)

  • Currently free to use
  • Monthly subscription-based
  • Requires phone number verification

Limits: 30 requests/minute, 2,000 requests/day

  • Codestral
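
A minimal sketch of Codestral's fill-in-the-middle endpoint on the dedicated codestral.mistral.ai host; the endpoint path and model ID are assumptions based on Mistral's documentation, and a separate Codestral API key is required:

```python
# Minimal sketch of Codestral fill-in-the-middle (FIM) completion.
# Endpoint path and model ID are assumptions; a dedicated Codestral key is needed.
import requests

resp = requests.post(
    "https://codestral.mistral.ai/v1/fim/completions",
    headers={"Authorization": "Bearer YOUR_CODESTRAL_KEY"},
    json={
        "model": "codestral-latest",  # assumed model ID
        "prompt": "def fibonacci(n):\n",
        "suffix": "\nprint(fibonacci(10))",
        "max_tokens": 128,
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json())  # inspect the returned JSON for the generated middle section
```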

HuggingFace Inference Providers

HuggingFace Serverless Inference is limited to models smaller than 10GB. Some popular models are supported even if they exceed 10GB.

Limits: $0.10/month in credits

  • Various open models across supported providers
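
A minimal sketch using huggingface_hub's InferenceClient, which routes chat requests through the Inference Providers; the model ID is an assumption:

```python
# Minimal sketch using huggingface_hub's InferenceClient
# (pip install huggingface_hub). Model ID is an assumption.
from huggingface_hub import InferenceClient

client = InferenceClient(token="YOUR_HF_TOKEN")
response = client.chat_completion(
    model="meta-llama/Llama-3.1-8B-Instruct",  # assumed model ID
    messages=[{"role": "user", "content": "Say hello."}],
    max_tokens=64,
)
print(response.choices[0].message.content)
```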

Vercel AI Gateway

Routes to various supported providers.

Limits: $5/month

Cerebras

Free tier is restricted to an 8K context window.

<table><thead><tr><th>Model Name</th><th>Model Limits</th></tr></thead><tbody> <tr><td>Qwen 3 32B</td><td>30 requests/minute<br>60,000 tokens/minute<br>900 requests/hour<br>1,000,000 tokens/hour<br>14,400 requests/day<br>1,000,000 tokens/day</td></tr> <tr><td>Llama 4 Scout</td><td>30 requests/minute<br>60,000 tokens/minute<br>900 requests/hour<br>1,000,000 tokens/hour<br>14,400 requests/day<br>1,000,000 tokens/day</td></tr> <tr><td>Llama 3.1 8B</td><td>30 requests/minute<br>60,000 tokens/minute<br>900 requests/hour<br>1,000,000 tokens/hour<br>14,400 requests/day<br>1,000,000 tokens/day</td></tr> <tr><td>Llama 3.3 70B</td><td>30 requests/minute<br>60,000 tokens/minute<br>900 requests/hour<br>1,000,000 tokens/hour<br>14,400 requests/day<br>1,000,000 tokens/day</td></tr> </tbody></table>

Groq

<table><thead><tr><th>Model Name</th><th>Model Limits</th></tr></thead><tbody> <tr><td>Allam 2 7B</td><td>7,000 requests/day<br>6,000 tokens/minute</td></tr> <tr><td>DeepSeek R1 Distill Llama 70B</td><td>1,000 requests/day<br>6,000 tokens/minute</td></tr> <tr><td>Distil Whisper Large v3</td><td>7,200 audio-seconds/minute<br>2,000 requests/day</td></tr> <tr><td>Gemma 2 9B Instruct</td><td>14,400 requests/day<br>15,000 tokens/minute</td></tr> <tr><td>Groq compound-beta</td><td>200 requests/day<br>70,000 tokens/minute</td></tr> <tr><td>Groq compound-beta-mini</td><td>200 requests/day<br>70,000 tokens/minute</td></tr> <tr><td>Llama 3 70B</td><td>14,400 requests/day<br>6,000 tokens/minute</td></tr> <tr><td>Llama 3 8B</td><td>14,400 requests/day<br>6,000 tokens/minute</td></tr> <tr><td>Llama 3.1 8B</td><td>14,400 requests/day<br>6,000 tokens/minute</td></tr> <tr><td>Llama 3.3 70B</td><td>1,000 requests/day<br>12,000 tokens/minute</td></tr> <tr><td>Llama 4 Maverick 17B 128E Instruct</td><td>1,000 requests/day<br>6,000 tokens/minute</td></tr> <tr><td>Llama 4 Scout Instruct</td><td>1,000 requests/day<br>30,000 tokens/minute</td></tr> <tr><td>Mistral Saba 24B</td><td>1,000 requests/day<br>6,000 tokens/minute</td></tr> <tr><td>Qwen QwQ 32B</td><td>1,000 requests/day<br>6,000 tokens/minute</td></tr> <tr><td>Whisper Large v3</td><td>7,200 audio-seconds/minute<br>2,000 requests/day</td></tr> <tr><td>Whisper Large v3 Turbo</td><td>7,200 audio-seconds/minute<br>2,000 requests/day</td></tr> <tr><td>meta-llama/llama-guard-4-12b</td><td>14,400 requests/day<br>15,000 tokens/minute</td></tr> <tr><td>meta-llama/llama-prompt-guard-2-22m</td><td></td></tr> <tr><td>meta-llama/llama-prompt-guard-2-86m</td><td></td></tr> <tr><td>qwen/qwen3-32b</td><td>1,000 requests/day<br>6,000 tokens/minute</td></tr> </tbody></table>
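
A minimal sketch using the groq Python SDK; the model ID is an assumption, so check Groq's console for current IDs and their limits:

```python
# Minimal sketch using the groq SDK (pip install groq). Model ID is an assumption.
from groq import Groq

client = Groq(api_key="YOUR_GROQ_KEY")
completion = client.chat.completions.create(
    model="llama-3.3-70b-versatile",  # assumed model ID
    messages=[{"role": "user", "content": "Say hello."}],
)
print(completion.choices[0].message.content)
```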

Together (Free)

Limits: Up to 60 requests/minute

Cohere

Limits:

  • 20 requests/minute
  • 1,000 requests/month

Models share a common quota.

  • Command-A
  • Command-R7B
  • Command-R+
  • Command-R
  • Aya Expanse 8B
  • Aya Expanse 32B
  • Aya Vision 8B
  • Aya Vision 32B
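
A minimal sketch using the cohere Python SDK's v2 client; the model ID is an assumption, and any of the models listed above can be substituted:

```python
# Minimal sketch using the cohere SDK's v2 client (pip install cohere).
# Model ID is an assumption.
import cohere

co = cohere.ClientV2(api_key="YOUR_COHERE_KEY")
response = co.chat(
    model="command-r-plus-08-2024",  # assumed model ID
    messages=[{"role": "user", "content": "Say hello."}],
)
print(response.message.content[0].text)
```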

GitHub Models

Extremely restrictive input/output token limits.

Limits: Dependent on Copilot subscription tier (Free/Pro/Pro+/Business/Enterprise)

  • AI21 Jamba 1.5 Large
  • AI21 Jamba 1.5 Mini
  • Codestral 25.01
  • Cohere Command A
  • Cohere Command R 08-2024
  • Cohere Command R+ 08-2024
  • Cohere Embed v3 English
  • Cohere Embed v3 Multilingual
  • DeepSeek-R1
  • DeepSeek-R1-0528
  • DeepSeek-V3-0324
  • Grok 3
  • Grok 3 Mini
  • JAIS 30b Chat
  • Llama 4 Maverick 17B 128E Instruct FP8
  • Llama 4 Scout 17B 16E Instruct
  • Llama-3.2-11B-Vision-Instruct
  • Llama-3.2-90B-Vision-Instruct
  • Llama-3.3-70B-Instruct
  • MAI-DS-R1
  • Meta-Llama-3.1-405B-Instruct
  • Meta-Llama-3.1-8B-Instruct
  • Ministral 3B
  • Mistral Large 24.11
  • Mistral Medium 3 (25.05)
  • Mistral Nemo
  • Mistral Small 3.1
  • OpenAI GPT-4.1
  • OpenAI GPT-4.1-mini
  • OpenAI GPT-4.1-nano
  • OpenAI GPT-4o
  • OpenAI GPT-4o mini
  • OpenAI Text Embedding 3 (large)
  • OpenAI Text Embedding 3 (small)
  • OpenAI o1
  • OpenAI o1-mini
  • OpenAI o1-preview
  • OpenAI o3
  • OpenAI o3-mini
  • OpenAI o4-mini
  • Phi-3-medium instruct (128k)
  • Phi-3-medium instruct (4k)
  • Phi-3-mini instruct (128k)
  • Phi-3-mini instruct (4k)
  • Phi-3-small instruct (128k)
  • Phi-3-small instruct (8k)
  • Phi-3.5-MoE instruct (128k)
  • Phi-3.5-mini instruct (128k)
  • Phi-3.5-vision instruct (128k)
  • Phi-4
  • Phi-4-Reasoning
  • Phi-4-mini-instruct
  • Phi-4-mini-reasoning
  • Phi-4-multimodal-instruct
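
A minimal sketch against the GitHub Models OpenAI-compatible endpoint, authenticated with a GitHub personal access token; the endpoint URL and model ID are assumptions, so check the GitHub Models documentation for current values:

```python
# Minimal sketch of GitHub Models via its OpenAI-compatible endpoint.
# Endpoint URL and model ID are assumptions; auth uses a GitHub PAT.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://models.inference.ai.azure.com",  # assumed endpoint
    api_key=os.environ["GITHUB_TOKEN"],
)
response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumed model ID
    messages=[{"role": "user", "content": "Say hello."}],
)
print(response.choices[0].message.content)
```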

Chutes

Distributed, decentralized crypto-based compute.
Data is sent to individual hosts.
Limits: 200 requests/day. Requires a one-time $5 top-up to access the free tier.

  • Various open models

Cloudflare Workers AI

Limits: 10,000 neurons/day

  • DeepSeek R1 Distill Qwen 32B
  • Deepseek Coder 6.7B Base (AWQ)
  • Deepseek Coder 6.7B Instruct (AWQ)
  • Deepseek Math 7B Instruct
  • DiscoLM German 7B v1 (AWQ)
  • Falcon 7B Instruct
  • Gemma 2B Instruct (LoRA)
  • Gemma 3 12B Instruct
  • Gemma 7B Instruct
  • Gemma 7B Instruct (LoRA)
  • Hermes 2 Pro Mistral 7B
  • Llama 2 13B Chat (AWQ)
  • Llama 2 7B Chat (FP16)
  • Llama 2 7B Chat (INT8)
  • Llama 2 7B Chat (LoRA)
  • Llama 3 8B Instruct
  • Llama 3 8B Instruct
  • Llama 3 8B Instruct (AWQ)
  • Llama 3.1 8B Instruct (AWQ)
  • Llama 3.1 8B Instruct (FP8)
  • Llama 3.2 11B Vision Instruct
  • Llama 3.2 1B Instruct
  • Llama 3.2 3B Instruct
  • Llama 3.3 70B Instruct (FP8)
  • Llama 4 Scout Instruct
  • Llama Guard 3 8B
  • LlamaGuard 7B (AWQ)
  • Mistral 7B Instruct v0.1
  • Mistral 7B Instruct v0.1 (AWQ)
  • Mistral 7B Instruct v0.2
  • Mistral 7B Instruct v0.2 (LoRA)
  • Mistral Small 3.1 24B Instruct
  • Neural Chat 7B v3.1 (AWQ)
  • OpenChat 3.5 0106
  • OpenHermes 2.5 Mistral 7B (AWQ)
  • Phi-2
  • Qwen 1.5 0.5B Chat
  • Qwen 1.5 1.8B Chat
  • Qwen 1.5 14B Chat (AWQ)
  • Qwen 1.5 7B Chat (AWQ)
  • Qwen 2.5 Coder 32B Instruct
  • Qwen QwQ 32B
  • SQLCoder 7B 2
  • Starling LM 7B Beta
  • TinyLlama 1.1B Chat v1.0
  • Una Cybertron 7B v2 (BF16)
  • Zephyr 7B Beta (AWQ)
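
A minimal sketch of the Workers AI REST endpoint; the account ID, API token, and model slug are assumptions, and usage is metered in neurons against the daily limit above:

```python
# Minimal sketch of Cloudflare Workers AI's REST endpoint.
# Account ID, API token, and model slug are assumptions.
import requests

ACCOUNT_ID = "YOUR_ACCOUNT_ID"
MODEL = "@cf/meta/llama-3.1-8b-instruct"  # assumed model slug

resp = requests.post(
    f"https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/run/{MODEL}",
    headers={"Authorization": "Bearer YOUR_CLOUDFLARE_API_TOKEN"},
    json={"messages": [{"role": "user", "content": "Say hello."}]},
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["result"]["response"])
```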

Google Cloud Vertex AI

Google Cloud requires very stringent payment verification.

<table><thead><tr><th>Model Name</th><th>Model Limits</th></tr></thead><tbody> <tr><td><a href="https://console.cloud.google.com/vertex-ai/publishers/meta/model-garden/llama-3-2-90b-vision-instruct-maas" target="_blank">Llama 3.2 90B Vision Instruct</a></td><td>30 requests/minute<br>Free during preview</td></tr> <tr><td><a href="https://console.cloud.google.com/vertex-ai/publishers/meta/model-garden/llama-3-1-405b-instruct-maas" target="_blank">Llama 3.1 70B Instruct</a></td><td>60 requests/minute<br>Free during preview</td></tr> <tr><td><a href="https://console.cloud.google.com/vertex-ai/publishers/meta/model-garden/llama-3-1-405b-instruct-maas" target="_blank">Llama 3.1 8B Instruct</a></td><td>60 requests/minute<br>Free during preview</td></tr> <tr><td><a href="https://console.cloud.google.com/vertex-ai/publishers/deepseek-ai/model-garden/deepseek-r1-0528-maas" target="_blank">DeepSeek R1-0528</a></td><td>60 requests/minute<br>Free during preview</td></tr> </tbody></table>

Providers with trial credits

Together

Credits: $1 when you add a payment method

Models: Various open models

Fireworks

Credits: $1

Models: Various open models

Baseten

Credits: $30

Models: Any supported model - pay by compute time

Nebius

Credits: $1

Models: Various open models

Novita

Credits: $0.50 for 1 year; $10 for 3 months for LLMs with a referral code and a connected GitHub account

Models: Various open models

AI21

Credits: $10 for 3 months

Models: Jamba family of models

Upstage

Credits: $10 for 3 months

Models: Solar Pro/Mini

NLP Cloud

Credits: $15

Requirements: Phone number verification

Models: Various open models

Alibaba Cloud (International) Model Studio

Credits: 1 million tokens/model

Models: Various open and proprietary Qwen models

Modal

Credits: $5/month upon sign up, $30/month with payment method added

Models: Any supported model - pay by compute time

Inference.net

Credits: $1, plus $25 for responding to an email survey

Models: Various open models

nCompass

Credits: $1

Models: Various open models

Kluster

Credits: $5

Models:

  • BGE-M3
  • DeepSeek-R1-0528
  • DeepSeek-V3-0324
  • Gemma 3 27B
  • Magistral Small
  • Meta Llama 3.1 8B
  • Meta Llama 3.3 70B
  • Meta Llama 4 Maverick
  • Meta Llama 4 Scout
  • Mistral NeMo
  • Mistral Small
  • Qwen2.5-VL 7B
  • Qwen3-235B-A22B
  • kluster reliability check

Hyperbolic

Credits: $1

Models:

  • DeepSeek V3
  • DeepSeek V3 0324
  • Hermes 3 Llama 3.1 70B
  • Llama 3 70B Instruct
  • Llama 3.1 405B Base
  • Llama 3.1 405B Base (FP8)
  • Llama 3.1 405B Instruct
  • Llama 3.1 70B Instruct
  • Llama 3.1 8B Instruct
  • Llama 3.2 3B Instruct
  • Llama 3.3 70B Instruct
  • Pixtral 12B (2409)
  • Qwen QwQ 32B
  • Qwen QwQ 32B Preview
  • Qwen2.5 72B Instruct
  • Qwen2.5 Coder 32B Instruct
  • Qwen2.5 VL 72B Instruct
  • Qwen2.5 VL 7B Instruct

SambaNova Cloud

Credits: $5 for 3 months

Models:

  • E5-Mistral-7B-Instruct
  • Llama 3.1 8B
  • Llama 3.3 70B
  • Llama-4-Maverick-17B-128E-Instruct
  • Qwen/Qwen3-32B
  • Whisper-Large-v3
  • deepseek-ai/DeepSeek-R1
  • deepseek-ai/DeepSeek-R1-0528
  • deepseek-ai/DeepSeek-R1-Distill-Llama-70B
  • deepseek-ai/DeepSeek-V3-0324

Scaleway Generative APIs

Credits: 1,000,000 free tokens

Models:

  • BGE-Multilingual-Gemma2
  • DeepSeek R1 Distill Llama 70B
  • Gemma 3 27B Instruct
  • Llama 3.1 70B Instruct
  • Llama 3.1 8B Instruct
  • Llama 3.3 70B Instruct
  • Mistral Nemo 2407
  • Mistral Small 3.1 24B Instruct 2503
  • Pixtral 12B (2409)
  • Qwen2.5 Coder 32B Instruct

Repository

cheahjs/free-llm-api-resources

Created: July 4, 2024
Updated: July 7, 2025
Language: Python
Category: AI