This post details how to get Cloudflare Workers AI working with LiteLLM and OpenWebUI. Cloudflare gives you 10,000 Neurons for free every day, which should be sufficient for basic tasks. The pricing page is linked in the table below.
This is a continuation of this post:
Important URLs
The links below are for further reading; I will take you through the setup.
| Resource | URL |
| --- | --- |
| Cloudflare API | https://developers.cloudflare.com/api/resources/ai/methods/run/ |
| Cloudflare Model Playground | https://playground.ai.cloudflare.com/ |
| Cloudflare Models | https://developers.cloudflare.com/workers-ai/models/ |
| Cloudflare Model Pricing (and model names) | https://developers.cloudflare.com/workers-ai/platform/pricing/ |
Cloudflare Setup
Before we can proceed, you will need the following:
- Your Cloudflare Account ID.
- A Cloudflare API token with the "Workers AI Write" and "Workers AI Read" permissions.
The endpoint we are interested in, documented at the API link above, is:

```
https://api.cloudflare.com/client/v4/accounts/$ACCOUNT_ID/ai/run/$MODEL_NAME
```
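Before wiring anything into LiteLLM, it is worth calling this endpoint directly to confirm your account ID and token work. A minimal sketch with curl — the account ID, token, and model are placeholders you must replace with your own values:

```shell
# Placeholders: substitute your own account ID and API token.
ACCOUNT_ID="your-account-id"
CF_API_TOKEN="your-api-token"
MODEL="@cf/meta/llama-3-8b-instruct"

# Build the run endpoint from the pieces above.
URL="https://api.cloudflare.com/client/v4/accounts/${ACCOUNT_ID}/ai/run/${MODEL}"
echo "$URL"

# Only fire the request once real credentials are filled in.
if [ "$ACCOUNT_ID" != "your-account-id" ]; then
  curl -s -X POST "$URL" \
    -H "Authorization: Bearer ${CF_API_TOKEN}" \
    -H "Content-Type: application/json" \
    -d '{"messages":[{"role":"user","content":"Hello"}]}'
fi
```

A successful call returns a JSON envelope with `"success": true` and the model's reply in the `result` field; an auth failure here means the token or account ID needs fixing before LiteLLM will work either.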
LiteLLM Config
I am adding two models from Cloudflare; they are the last two entries in the config below.
The "model_name" can be anything, it doesn't have to match what I have defined below. The values in the environment variables further down are what matter.
```yaml
model_list:
  - model_name: azure-gpt-5-mini
    litellm_params:
      model: os.environ/AZURE_MODEL
      api_base: os.environ/AZURE_API_BASE
      api_key: os.environ/AZURE_API_KEY
      api_version: os.environ/AZURE_API_VERSION
  - model_name: openai-gpt-4o
    litellm_params:
      model: os.environ/OPENAI_MODEL
      api_key: os.environ/OPENAI_API_KEY
  - model_name: deepseek-reasoner
    litellm_params:
      model: os.environ/DEEPSEEK_MODEL_REASONER
      api_key: os.environ/DEEPSEEK_API_KEY
  - model_name: deepseek-chat
    litellm_params:
      model: os.environ/DEEPSEEK_MODEL_CHAT
      api_key: os.environ/DEEPSEEK_API_KEY
  - model_name: deepseek-coder
    litellm_params:
      model: os.environ/DEEPSEEK_MODEL_CODER
      api_key: os.environ/DEEPSEEK_API_KEY
  - model_name: cloudflare/llama-3-8b-instruct
    litellm_params:
      model: os.environ/CLOUDFLARE_MODEL_LLAMA
      api_base: os.environ/CLOUDFLARE_API_BASE
      api_key: os.environ/CLOUDFLARE_API_KEY
  - model_name: cloudflare/gemma-3-12b-it
    litellm_params:
      model: os.environ/CLOUDFLARE_MODEL_GEMA
      api_base: os.environ/CLOUDFLARE_API_BASE
      api_key: os.environ/CLOUDFLARE_API_KEY
```
Environment Variables
I’m using Portainer, so I defined these environment variables on my stack for the Cloudflare models. The LiteLLM config above reads its values from them.

```
CLOUDFLARE_MODEL_LLAMA=cloudflare/@cf/meta/llama-3-8b-instruct
CLOUDFLARE_MODEL_GEMA=cloudflare/@cf/google/gemma-3-12b-it
CLOUDFLARE_API_BASE=https://api.cloudflare.com/client/v4/accounts/<ACCOUNT_ID>/ai/run/
CLOUDFLARE_API_KEY=xxxxxxx
```
I am using llama-3-8b-instruct and gemma-3-12b-it because they are not too expensive in terms of Neurons. I don’t want to go above the free daily allowance.
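Once LiteLLM restarts with the new config, you can verify the Cloudflare models through its OpenAI-compatible endpoint. A sketch, assuming the proxy listens on localhost:4000 and `LITELLM_MASTER_KEY` holds your proxy key — both are assumptions, so adjust them for your deployment:

```shell
# Assumed proxy address; change to wherever your LiteLLM stack listens.
LITELLM_URL="http://localhost:4000/v1/chat/completions"

# The request body targets the model_name defined in the LiteLLM config.
BODY='{"model":"cloudflare/llama-3-8b-instruct","messages":[{"role":"user","content":"Hello"}]}'
echo "$BODY"

# Only send the request if the proxy is actually reachable.
if curl -s -o /dev/null --connect-timeout 2 "$LITELLM_URL"; then
  curl -s "$LITELLM_URL" \
    -H "Authorization: Bearer ${LITELLM_MASTER_KEY}" \
    -H "Content-Type: application/json" \
    -d "$BODY"
fi
```

If this returns a normal chat completion, OpenWebUI (which talks to the same endpoint) should pick the models up as well.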

Track Neuron Usage on Cloudflare
Log into Cloudflare and you can see your Neuron usage on the screen below. Two models appear in this graph because I configured and used two models.
