A list of free LLM inference resources accessible via API.
This lists services that provide free access or free credits for API-based LLM usage.
> [!NOTE]
> Please don't abuse these services, else we might lose them.

> [!WARNING]
> This list explicitly excludes any services that are not legitimate (e.g. services that reverse engineer an existing chatbot).
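Most providers on this list expose an OpenAI-compatible chat completions endpoint, so one client covers most of them. Below is a minimal sketch using the `openai` Python package; the base URL, environment variable, and model name are placeholders to swap for whichever provider you pick.

```python
import os

from openai import OpenAI

# Placeholder values: substitute the base URL, API key, and model name
# for whichever provider from the list below you are using.
client = OpenAI(
    base_url="https://example-provider.com/v1",  # provider's OpenAI-compatible endpoint
    api_key=os.environ["PROVIDER_API_KEY"],      # key issued by the provider
)

response = client.chat.completions.create(
    model="provider/model-name",  # model identifier from the provider's docs
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
    max_tokens=64,
)
print(response.choices[0].message.content)
```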
Limits: 20 requests/minute, 50 requests/day, 1,000 requests/day with a $10 lifetime top-up
Models share a common quota.
Data is used for training when the API is used outside of the UK/CH/EEA/EU.
<table><thead><tr><th>Model Name</th><th>Model Limits</th></tr></thead><tbody> <tr><td>Gemini 2.5 Pro</td><td>6,000,000 tokens/day<br>250,000 tokens/minute<br>100 requests/day<br>5 requests/minute</td></tr> <tr><td>Gemini 2.5 Flash</td><td>250,000 tokens/minute<br>250 requests/day<br>10 requests/minute</td></tr> <tr><td>Gemini 2.0 Flash</td><td>1,000,000 tokens/minute<br>200 requests/day<br>15 requests/minute</td></tr> <tr><td>Gemini 2.0 Flash-Lite</td><td>1,000,000 tokens/minute<br>200 requests/day<br>30 requests/minute</td></tr> <tr><td>Gemini 2.0 Flash (Experimental)</td><td>250,000 tokens/minute<br>50 requests/day<br>10 requests/minute</td></tr> <tr><td>Gemini 1.5 Flash</td><td>250,000 tokens/minute<br>50 requests/day<br>15 requests/minute</td></tr> <tr><td>Gemini 1.5 Flash-8B</td><td>250,000 tokens/minute<br>50 requests/day<br>15 requests/minute</td></tr> <tr><td>LearnLM 2.0 Flash (Experimental)</td><td>1,500 requests/day<br>15 requests/minute</td></tr> <tr><td>Gemma 3 27B Instruct</td><td>15,000 tokens/minute<br>14,400 requests/day<br>30 requests/minute</td></tr> <tr><td>Gemma 3 12B Instruct</td><td>15,000 tokens/minute<br>14,400 requests/day<br>30 requests/minute</td></tr> <tr><td>Gemma 3 4B Instruct</td><td>15,000 tokens/minute<br>14,400 requests/day<br>30 requests/minute</td></tr> <tr><td>Gemma 3 1B Instruct</td><td>15,000 tokens/minute<br>14,400 requests/day<br>30 requests/minute</td></tr> <tr><td>text-embedding-004</td><td rowspan="2">150 batch requests/minute<br>1,500 requests/minute<br>100 content/batch<br>Shared Quota</td></tr> <tr><td>embedding-001</td></tr> </tbody></table>
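For the Gemini models above, a minimal sketch using the `google-generativeai` Python package could look like the following; the exact model ID string (`gemini-2.0-flash` here) is an assumption and should be checked against the current model list.

```python
import os

import google.generativeai as genai

# Configure with an API key from Google AI Studio.
genai.configure(api_key=os.environ["GEMINI_API_KEY"])

# Model ID is an assumption; confirm the exact name against the table above.
model = genai.GenerativeModel("gemini-2.0-flash")
response = model.generate_content("Summarise the benefits of free API tiers in one sentence.")
print(response.text)
```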
Phone number verification required.
Models tend to be context-window limited.
Limits: 40 requests/minute
Limits (per-model): 1 request/second, 500,000 tokens/minute, 1,000,000,000 tokens/month
Limits: 30 requests/minute, 2,000 requests/day
HuggingFace Serverless Inference is limited to models smaller than 10GB; some popular models are supported even if they exceed 10GB (see the sketch below).
Limits: $0.10/month in credits
Routes to various supported providers.
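As referenced above, here is a minimal sketch of calling HuggingFace inference through the `huggingface_hub` client; the model ID is only an example and may not be available on the free serverless tier at any given time.

```python
import os

from huggingface_hub import InferenceClient

# Example model ID; availability on the free serverless tier is not guaranteed.
client = InferenceClient(
    model="mistralai/Mistral-7B-Instruct-v0.3",
    token=os.environ["HF_TOKEN"],
)

response = client.chat_completion(
    messages=[{"role": "user", "content": "What is serverless inference?"}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```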
Limits: $5/month
Free tier restricted to 8K context.
<table><thead><tr><th>Model Name</th><th>Model Limits</th></tr></thead><tbody> <tr><td>Qwen 3 32B</td><td>30 requests/minute<br>60,000 tokens/minute<br>900 requests/hour<br>1,000,000 tokens/hour<br>14,400 requests/day<br>1,000,000 tokens/day</td></tr> <tr><td>Llama 4 Scout</td><td>30 requests/minute<br>60,000 tokens/minute<br>900 requests/hour<br>1,000,000 tokens/hour<br>14,400 requests/day<br>1,000,000 tokens/day</td></tr> <tr><td>Llama 3.1 8B</td><td>30 requests/minute<br>60,000 tokens/minute<br>900 requests/hour<br>1,000,000 tokens/hour<br>14,400 requests/day<br>1,000,000 tokens/day</td></tr> <tr><td>Llama 3.3 70B</td><td>30 requests/minute<br>60,000 tokens/minute<br>900 requests/hour<br>1,000,000 tokens/hour<br>14,400 requests/day<br>1,000,000 tokens/day</td></tr> </tbody></table>
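With stacked per-minute, per-hour, and per-day caps like those above, it helps to throttle on the client side instead of waiting for 429 responses. The sketch below is a simple sliding-window limiter for the requests/minute cap only; the 30 requests/minute threshold is just an example value taken from the table.

```python
import time
from collections import deque


class MinuteRateLimiter:
    """Block until a request can be made without exceeding `max_per_minute`."""

    def __init__(self, max_per_minute: int = 30):  # example cap from the table above
        self.max_per_minute = max_per_minute
        self.timestamps = deque()

    def wait(self) -> None:
        now = time.monotonic()
        # Drop timestamps older than 60 seconds.
        while self.timestamps and now - self.timestamps[0] >= 60:
            self.timestamps.popleft()
        if len(self.timestamps) >= self.max_per_minute:
            # Sleep until the oldest request in the window falls out of it.
            time.sleep(60 - (now - self.timestamps[0]))
        self.timestamps.append(time.monotonic())


limiter = MinuteRateLimiter(max_per_minute=30)
# Call limiter.wait() immediately before each API request.
```

Hourly and daily caps would need separate tracking; this only smooths out the per-minute limit.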
Limits: Up to 60 requests/minute
Limits: 20 requests/minute, 1,000 requests/month
Models share a common quota.
Extremely restrictive input/output token limits.
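When a provider enforces very small input/output token limits, it is worth estimating prompt size before sending. The sketch below uses `tiktoken` as a rough approximation; the provider's own tokenizer may count differently, and the 4,096-token budget is only an example.

```python
import tiktoken

# cl100k_base is a generic approximation; the provider's tokenizer may differ.
encoding = tiktoken.get_encoding("cl100k_base")


def truncate_prompt(prompt: str, max_tokens: int = 4096) -> str:
    """Trim a prompt to roughly `max_tokens` tokens before sending it."""
    tokens = encoding.encode(prompt)
    if len(tokens) <= max_tokens:
        return prompt
    return encoding.decode(tokens[:max_tokens])


print(len(encoding.encode("How many tokens is this sentence?")))
```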
Limits: Dependent on Copilot subscription tier (Free/Pro/Pro+/Business/Enterprise)
Distributed, decentralized crypto-based compute.
Data is sent to individual hosts.
Limits: 200 requests/day. Requires a one-time $5 top-up to access the free tier.
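Daily request caps like these eventually show up as HTTP 429 responses. Below is a minimal retry-with-backoff sketch using the `openai` client; the retry count and delays are arbitrary example values, and the base URL and model name are placeholders as in the earlier sketch.

```python
import os
import time

from openai import OpenAI, RateLimitError

client = OpenAI(
    base_url="https://example-provider.com/v1",  # placeholder endpoint
    api_key=os.environ["PROVIDER_API_KEY"],
)


def chat_with_retry(messages, model: str, retries: int = 3):
    """Retry a chat completion with exponential backoff on 429 responses."""
    for attempt in range(retries):
        try:
            return client.chat.completions.create(model=model, messages=messages)
        except RateLimitError:
            if attempt == retries - 1:
                raise
            time.sleep(2 ** attempt)  # 1s, 2s, 4s, ...


reply = chat_with_retry(
    [{"role": "user", "content": "Hello!"}],
    model="provider/model-name",  # placeholder model ID
)
print(reply.choices[0].message.content)
```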
Limits: 10,000 neurons/day
Very stringent payment verification for Google Cloud.
<table><thead><tr><th>Model Name</th><th>Model Limits</th></tr></thead><tbody> <tr><td><a href="https://console.cloud.google.com/vertex-ai/publishers/meta/model-garden/llama-3-2-90b-vision-instruct-maas" target="_blank">Llama 3.2 90B Vision Instruct</a></td><td>30 requests/minute<br>Free during preview</td></tr> <tr><td><a href="https://console.cloud.google.com/vertex-ai/publishers/meta/model-garden/llama-3-1-405b-instruct-maas" target="_blank">Llama 3.1 70B Instruct</a></td><td>60 requests/minute<br>Free during preview</td></tr> <tr><td><a href="https://console.cloud.google.com/vertex-ai/publishers/meta/model-garden/llama-3-1-405b-instruct-maas" target="_blank">Llama 3.1 8B Instruct</a></td><td>60 requests/minute<br>Free during preview</td></tr> <tr><td><a href="https://console.cloud.google.com/vertex-ai/publishers/deepseek-ai/model-garden/deepseek-r1-0528-maas" target="_blank">DeepSeek R1-0528</a></td><td>60 requests/minute<br>Free during preview</td></tr> </tbody></table>
Credits: $1 when you add a payment method
Models: Various open models
Credits: $1
Models: Various open models
Credits: $30
Models: Any supported model - pay by compute time
Credits: $1
Models: Various open models
Credits: $0.50 for 1 year; $10 for 3 months for LLMs with a referral code and a connected GitHub account
Models: Various open models
Credits: $10 for 3 months
Models: Jamba family of models
Credits: $10 for 3 months
Models: Solar Pro/Mini
Credits: $15
Requirements: Phone number verification
Models: Various open models
Credits: 1 million tokens/model
Models: Various open and proprietary Qwen models
Credits: $5/month upon sign-up; $30/month with a payment method added
Models: Any supported model - pay by compute time
Credits: $1; $25 for responding to an email survey
Models: Various open models
Credits: $1
Models: Various open models
Credits: $5
Models:
Credits: $1
Models:
Credits: $5 for 3 months
Models:
Credits: 1,000,000 free tokens
Models: