This post will detail how to get Cloudflare AI working with LiteLLM and OpenWebUI. Cloudflare gives you 10,000 Neurons for free every day, which should be sufficient for basic tasks. The pricing page is linked in the Important URLs section below.

This is a continuation of this post:


Important URLs

The links below are for further reading; I will take you through the setup.

  • Cloudflare API: https://developers.cloudflare.com/api/resources/ai/methods/run/
  • Cloudflare Model Playground: https://playground.ai.cloudflare.com/
  • Cloudflare Models: https://developers.cloudflare.com/workers-ai/models/
  • Cloudflare Model Pricing (and model names): https://developers.cloudflare.com/workers-ai/platform/pricing/


Cloudflare Setup

Before we can proceed you will need the following:

  • Your Cloudflare Account ID.
  • A Cloudflare API key with the “Workers AI Write” and “Workers AI Read” permissions.

The endpoint we are interested in, documented in the API link above, is:
https://api.cloudflare.com/client/v4/accounts/$ACCOUNT_ID/ai/run/$MODEL_NAME
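You can test this endpoint directly before wiring it into LiteLLM. Below is a minimal sketch using only the Python standard library; the account ID, API key, and prompt are placeholders you must substitute with your own values.

```python
import json
import urllib.request

# Placeholders - substitute your own Cloudflare account ID and API key.
ACCOUNT_ID = "your-account-id"
API_KEY = "your-api-key"
MODEL_NAME = "@cf/meta/llama-3-8b-instruct"


def build_run_url(account_id: str, model_name: str) -> str:
    """Build the Workers AI run endpoint URL for a given account and model."""
    return (
        f"https://api.cloudflare.com/client/v4/accounts/"
        f"{account_id}/ai/run/{model_name}"
    )


def run_model(prompt: str) -> dict:
    """POST a chat-style payload to the Workers AI run endpoint."""
    req = urllib.request.Request(
        build_run_url(ACCOUNT_ID, MODEL_NAME),
        data=json.dumps({"messages": [{"role": "user", "content": prompt}]}).encode(),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

A successful response contains the model output under the `result` key; an invalid token or account ID returns an error payload instead.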


LiteLLM Config

I am adding two models from Cloudflare. They are the last two entries in the config below.

The “model_name” can be anything really; it doesn’t have to be what I have defined below. The values of the environment variables further down are what matter.

model_list:
  - model_name: azure-gpt-5-mini
    litellm_params:
      model: os.environ/AZURE_MODEL
      api_base: os.environ/AZURE_API_BASE
      api_key: os.environ/AZURE_API_KEY
      api_version: os.environ/AZURE_API_VERSION
  - model_name: openai-gpt-4o
    litellm_params:
      model: os.environ/OPENAI_MODEL
      api_key: os.environ/OPENAI_API_KEY
  - model_name: deepseek-reasoner
    litellm_params:
      model: os.environ/DEEPSEEK_MODEL_REASONER
      api_key: os.environ/DEEPSEEK_API_KEY
  - model_name: deepseek-chat
    litellm_params:
      model: os.environ/DEEPSEEK_MODEL_CHAT
      api_key: os.environ/DEEPSEEK_API_KEY
  - model_name: deepseek-coder
    litellm_params:
      model: os.environ/DEEPSEEK_MODEL_CODER
      api_key: os.environ/DEEPSEEK_API_KEY
  - model_name: cloudflare/llama-3-8b-instruct
    litellm_params:
      model: os.environ/CLOUDFLARE_MODEL_LLAMA
      api_base: os.environ/CLOUDFLARE_API_BASE
      api_key: os.environ/CLOUDFLARE_API_KEY
  - model_name: cloudflare/gemma-3-12b-it
    litellm_params:
      model: os.environ/CLOUDFLARE_MODEL_GEMA
      api_base: os.environ/CLOUDFLARE_API_BASE
      api_key: os.environ/CLOUDFLARE_API_KEY
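Once the proxy is running with this config, any OpenAI-compatible client can call the Cloudflare models by the “model_name” you chose. The sketch below assumes a LiteLLM proxy on localhost:4000 and a hypothetical proxy key; adjust both to your deployment.

```python
import json
import urllib.request

# Assumptions: LiteLLM proxy listening on localhost:4000, authenticated
# with a hypothetical key - change both to match your own setup.
LITELLM_BASE = "http://localhost:4000"
LITELLM_KEY = "sk-your-litellm-key"


def build_payload(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat completion payload."""
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}


def chat(model: str, prompt: str) -> dict:
    """Send a chat completion request to the LiteLLM proxy."""
    req = urllib.request.Request(
        f"{LITELLM_BASE}/v1/chat/completions",
        data=json.dumps(build_payload(model, prompt)).encode(),
        headers={
            "Authorization": f"Bearer {LITELLM_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

For example, `chat("cloudflare/llama-3-8b-instruct", "Hello")` routes through the proxy to Cloudflare using the model name from the config above.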

Environment Variables

I’m using Portainer, so I defined these environment variables on my stack for the Cloudflare models. The LiteLLM config above reads these environment variables.

CLOUDFLARE_MODEL_LLAMA=cloudflare/@cf/meta/llama-3-8b-instruct
CLOUDFLARE_MODEL_GEMA=cloudflare/@cf/google/gemma-3-12b-it
CLOUDFLARE_API_BASE=https://api.cloudflare.com/client/v4/accounts/<ACCOUNT_ID>/ai/run/
CLOUDFLARE_API_KEY=xxxxxxx
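To make the wiring concrete: any config value prefixed with `os.environ/` is resolved from the environment when LiteLLM loads the config. The snippet below is an illustration of that convention, not LiteLLM’s actual implementation.

```python
import os

# Illustrative only: mimics how a value like "os.environ/NAME" in the
# LiteLLM config maps to an environment variable at load time.
os.environ["CLOUDFLARE_MODEL_LLAMA"] = "cloudflare/@cf/meta/llama-3-8b-instruct"


def resolve(value: str) -> str:
    """Resolve an 'os.environ/NAME' config value from the environment."""
    prefix = "os.environ/"
    if value.startswith(prefix):
        return os.environ[value[len(prefix):]]
    return value


print(resolve("os.environ/CLOUDFLARE_MODEL_LLAMA"))
```

So `model: os.environ/CLOUDFLARE_MODEL_LLAMA` in the config ends up as `cloudflare/@cf/meta/llama-3-8b-instruct` at runtime, which is why the environment variable values are what matter, not the “model_name” labels.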

I am using llama-3-8b-instruct and gemma-3-12b-it because they are not too expensive in terms of Neurons; I don’t want to go above the free daily allowance.
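A quick back-of-envelope check shows how far the free allowance stretches. The Neuron rate below is a hypothetical example; read the real per-model costs off the Cloudflare pricing page linked above.

```python
# Back-of-envelope budget check. The per-token rate is HYPOTHETICAL -
# substitute the real Neuron cost for your model from the pricing page.
FREE_NEURONS_PER_DAY = 10_000               # Cloudflare's free daily allowance
NEURONS_PER_MILLION_TOKENS = 2_000          # hypothetical example rate


def daily_token_budget(free_neurons: int, neurons_per_million_tokens: int) -> int:
    """Tokens per day that fit inside the free Neuron allowance."""
    return free_neurons * 1_000_000 // neurons_per_million_tokens


print(daily_token_budget(FREE_NEURONS_PER_DAY, NEURONS_PER_MILLION_TOKENS))
```

With these hypothetical numbers the free tier covers millions of tokens per day, which is why cheaper models like these are a comfortable fit.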

llama 3.1-8b instruct neuron cost on cloudflare


Track Neuron Usage on Cloudflare

Log in to Cloudflare and you can see your Neuron usage on the screen below. Two models appear in this graph because I configured and used two models.

Track cloudflare AI neuron cost and usage

necrolingus

Tech enthusiast and home labber