Custom LLM Pricing
Overview
LiteLLM provides flexible cost tracking and pricing customization for all LLM providers:
- Custom Pricing - Override default model costs or set pricing for custom models
- Cost Per Token - Track costs based on input/output tokens (most common)
- Cost Per Second - Track costs based on runtime (e.g., Sagemaker)
- Provider Discounts - Apply percentage-based discounts to specific providers
- Base Model Mapping - Ensure accurate cost tracking for Azure deployments
By default, the response cost is accessible in the logging object via kwargs["response_cost"] on success (sync + async).
LiteLLM already has pricing for 100+ models in our model cost map.
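As a minimal sketch of reading that cost from a callback: the function below follows the four-argument shape used by LiteLLM's custom callback functions, but the `spend_by_model` accumulator and the simulated calls are illustrative assumptions, and registration with LiteLLM is only mentioned in a comment so the snippet runs without the library installed.

```python
# Hypothetical accumulator, not part of LiteLLM.
spend_by_model: dict[str, float] = {}


def track_cost_callback(kwargs, completion_response, start_time, end_time):
    """Add kwargs["response_cost"] to a per-model running total."""
    cost = kwargs.get("response_cost")
    if cost is None:
        # No cost was computed for this call, so there is nothing to record.
        return
    model = kwargs.get("model", "unknown")
    spend_by_model[model] = spend_by_model.get(model, 0.0) + cost


# In a real setup the function would be registered with LiteLLM, e.g.
# `litellm.success_callback = [track_cost_callback]`.
# Here two successful calls are simulated with hand-built kwargs.
track_cost_callback({"model": "azure-model", "response_cost": 0.0021}, None, None, None)
track_cost_callback({"model": "azure-model", "response_cost": 0.0004}, None, None, None)

print(spend_by_model)
```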
Cost Per Second (e.g. Sagemaker)
Usage with LiteLLM Proxy Server
Step 1: Add pricing to config.yaml
model_list:
  - model_name: sagemaker-completion-model
    litellm_params:
      model: sagemaker/berri-benchmarking-Llama-2-70b-chat-hf-4
    model_info:
      input_cost_per_second: 0.000420
  - model_name: sagemaker-embedding-model
    litellm_params:
      model: sagemaker/berri-benchmarking-gpt-j-6b-fp16
    model_info:
      input_cost_per_second: 0.000420 
Step 2: Start proxy
litellm --config /path/to/config.yaml
Step 3: View Spend Logs
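To give a feel for the numbers that end up in the spend logs, here is a rough sketch of per-second pricing. It assumes the cost is simply the request's runtime multiplied by `input_cost_per_second`; the helper name and the 2.5-second runtime are made up for illustration.

```python
def cost_from_runtime(response_time_seconds: float, input_cost_per_second: float) -> float:
    """Per-second pricing: runtime in seconds times the configured rate."""
    return response_time_seconds * input_cost_per_second


# Rate taken from the config above; the runtime is an assumed value.
cost = cost_from_runtime(response_time_seconds=2.5, input_cost_per_second=0.000420)
print(f"{cost:.6f}")
```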
Cost Per Token (e.g. Azure)
Usage with LiteLLM Proxy Server
model_list:
  - model_name: azure-model
    litellm_params:
      model: azure/<your_deployment_name>
      api_key: os.environ/AZURE_API_KEY
      api_base: os.environ/AZURE_API_BASE
      api_version: os.environ/AZURE_API_VERSION
    model_info:
      input_cost_per_token: 0.000421 # ONLY to track cost per token
      output_cost_per_token: 0.000520 # ONLY to track cost per token
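The arithmetic behind per-token pricing can be sketched as follows. The rates come from the config above; the helper name and the token counts are assumptions chosen for illustration.

```python
def cost_from_tokens(
    prompt_tokens: int,
    completion_tokens: int,
    input_cost_per_token: float,
    output_cost_per_token: float,
) -> float:
    """Per-token pricing: input and output tokens are billed at separate rates."""
    return prompt_tokens * input_cost_per_token + completion_tokens * output_cost_per_token


# 1000 prompt tokens and 200 completion tokens are assumed usage numbers.
cost = cost_from_tokens(
    prompt_tokens=1000,
    completion_tokens=200,
    input_cost_per_token=0.000421,
    output_cost_per_token=0.000520,
)
print(f"{cost:.4f}")
```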
Provider-Specific Cost Discounts
Apply percentage-based discounts to specific providers (e.g., negotiated enterprise pricing).
Usage with LiteLLM Proxy Server
Step 1: Add discount config to config.yaml
# Apply a 5% discount to all Vertex AI, Gemini, and OpenRouter costs
cost_discount_config:
  vertex_ai: 0.05  # 5% discount
  gemini: 0.05     # 5% discount
  openrouter: 0.05 # 5% discount
  # openai: 0.10   # 10% discount (example)
Step 2: Start proxy
litellm --config /path/to/config.yaml
The discount will be automatically applied to all cost calculations for the configured providers.
How Discounts Work
- Discounts are applied after all other cost calculations (tokens, caching, tools, etc.)
- The discount is a percentage (0.05 = 5%, 0.10 = 10%, etc.)
- Discounts only apply to the configured providers
- Original cost, discount amount, and final cost are tracked in cost breakdown logs
- Discount information is returned in response headers:
  - x-litellm-response-cost - Final cost after discount
  - x-litellm-response-cost-original - Cost before discount
  - x-litellm-response-cost-discount-amount - Discount amount in USD
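The rules above can be sketched in a few lines. This is an illustration of the arithmetic only, not LiteLLM's implementation: the `apply_discount` helper and its return keys are hypothetical, and the 0.02 USD starting cost is an assumed value.

```python
# Mirrors the cost_discount_config shown earlier.
cost_discount_config = {"vertex_ai": 0.05, "gemini": 0.05, "openrouter": 0.05}


def apply_discount(provider: str, original_cost: float, config: dict) -> dict:
    """Apply the provider's fractional discount after the base cost is known."""
    discount = config.get(provider, 0.0)  # unconfigured providers get no discount
    discount_amount = original_cost * discount
    return {
        "original_cost": original_cost,      # cost before discount
        "discount_amount": discount_amount,  # discount amount in USD
        "final_cost": original_cost - discount_amount,
    }


discounted = apply_discount("vertex_ai", 0.02, cost_discount_config)
untouched = apply_discount("openai", 0.02, cost_discount_config)
print(discounted)
print(untouched)
```

The three values correspond, in spirit, to the original cost, discount amount, and final cost reported in the headers listed above.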
 
Supported Providers
You can apply discounts to all LiteLLM supported providers. Common examples:
- vertex_ai - Google Vertex AI
- gemini - Google Gemini
- openai - OpenAI
- anthropic - Anthropic
- azure - Azure OpenAI
- bedrock - AWS Bedrock
- cohere - Cohere
- openrouter - OpenRouter
See the full list of providers in the LlmProviders enum.
Override Model Cost Map
You can override our model cost map with your own custom pricing for a mapped model.
Just add a model_info key to your model in the config, and override the desired keys.
Example: Override Anthropic's model cost map for the prod/claude-3-5-sonnet-20241022 model.
model_list:
  - model_name: "prod/claude-3-5-sonnet-20241022"
    litellm_params:
      model: "anthropic/claude-3-5-sonnet-20241022"
      api_key: os.environ/ANTHROPIC_PROD_API_KEY
    model_info:
      input_cost_per_token: 0.000006
      output_cost_per_token: 0.00003
      cache_creation_input_token_cost: 0.0000075
      cache_read_input_token_cost: 0.0000006
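To see how the four overridden keys might combine into one request cost, here is a small worked sketch. It assumes the prompt is split into uncached, cache-creation, and cache-read tokens, each billed at its own rate; the token counts and the helper name are illustrative, and the exact accounting inside LiteLLM may differ.

```python
# Rates copied from the model_info block above.
model_info = {
    "input_cost_per_token": 0.000006,
    "output_cost_per_token": 0.00003,
    "cache_creation_input_token_cost": 0.0000075,
    "cache_read_input_token_cost": 0.0000006,
}


def cost_with_caching(uncached_in: int, cache_write: int, cache_read: int, out: int, info: dict) -> float:
    """Sum the four token buckets, each at its configured rate."""
    return (
        uncached_in * info["input_cost_per_token"]
        + cache_write * info["cache_creation_input_token_cost"]
        + cache_read * info["cache_read_input_token_cost"]
        + out * info["output_cost_per_token"]
    )


# Assumed usage: 1000 uncached input, 2000 cache-write, 5000 cache-read, 500 output tokens.
cost = cost_with_caching(1000, 2000, 5000, 500, model_info)
print(f"{cost:.4f}")
```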
Additional Cost Keys
There are other keys you can use to specify costs for different scenarios and modalities:
- input_cost_per_token_above_200k_tokens - Cost for input tokens when context exceeds 200k tokens
- output_cost_per_token_above_200k_tokens - Cost for output tokens when context exceeds 200k tokens
- cache_creation_input_token_cost_above_200k_tokens - Cache creation cost for large contexts
- cache_read_input_token_cost_above_200k_tokens - Cache read cost for large contexts
- input_cost_per_image - Cost per image in multimodal requests
- output_cost_per_reasoning_token - Cost for reasoning tokens (e.g., OpenAI o1 models)
- input_cost_per_audio_token - Cost for audio input tokens
- output_cost_per_audio_token - Cost for audio output tokens
- input_cost_per_video_per_second - Cost per second of video input
- input_cost_per_video_per_second_above_128k_tokens - Video cost for large contexts
- input_cost_per_character - Character-based pricing for some providers
These keys evolve based on how new models handle multimodality. The latest version can be found at https://github.com/BerriAI/litellm/blob/main/model_prices_and_context_window.json.
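As a hedged illustration of how the `_above_200k_tokens` keys could be used, the sketch below switches to the higher rates once the prompt exceeds 200k tokens and falls back to the base rates when no tiered key is set. Two things are assumptions here: that the higher rate applies to the whole request rather than only the excess tokens, and the prices themselves, which are placeholders rather than real provider pricing.

```python
TIER_THRESHOLD = 200_000

# Placeholder prices for illustration only.
model_info = {
    "input_cost_per_token": 1e-06,
    "output_cost_per_token": 4e-06,
    "input_cost_per_token_above_200k_tokens": 2e-06,
    "output_cost_per_token_above_200k_tokens": 6e-06,
}


def rate(info: dict, key: str, prompt_tokens: int) -> float:
    """Pick the tiered rate for large prompts if one is configured."""
    tiered_key = f"{key}_above_200k_tokens"
    if prompt_tokens > TIER_THRESHOLD and tiered_key in info:
        return info[tiered_key]
    return info[key]


def tiered_cost(prompt_tokens: int, completion_tokens: int, info: dict) -> float:
    return (
        prompt_tokens * rate(info, "input_cost_per_token", prompt_tokens)
        + completion_tokens * rate(info, "output_cost_per_token", prompt_tokens)
    )


small = tiered_cost(100_000, 1_000, model_info)  # below the threshold
large = tiered_cost(250_000, 1_000, model_info)  # above the threshold
print(f"{small:.3f} {large:.3f}")
```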
Set 'base_model' for Cost Tracking (e.g. Azure deployments)
Problem: Azure returns gpt-4 in the response when azure/gpt-4-1106-preview is used. This leads to inaccurate cost tracking.
Solution: Set base_model in your config so LiteLLM uses the correct model when calculating Azure cost.
Get the base model name from the model cost map.
Example config with base_model
model_list:
  - model_name: azure-gpt-3.5
    litellm_params:
      model: azure/chatgpt-v-2
      api_base: os.environ/AZURE_API_BASE
      api_key: os.environ/AZURE_API_KEY
      api_version: "2023-07-01-preview"
    model_info:
      base_model: azure/gpt-4-1106-preview
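The effect of base_model can be pictured with a small lookup sketch. Everything here is illustrative: the `resolve_pricing_key` helper is hypothetical, and the cost map holds placeholder prices rather than real Azure pricing. The point is only that the price is looked up under base_model when it is set, instead of under the model name Azure reports.

```python
# Placeholder prices for illustration only.
cost_map = {
    "azure/gpt-4": {"input_cost_per_token": 3e-05},
    "azure/gpt-4-1106-preview": {"input_cost_per_token": 1e-05},
}


def resolve_pricing_key(provider: str, response_model: str, model_info: dict) -> str:
    """Prefer the configured base_model over the model name in the response."""
    return model_info.get("base_model") or f"{provider}/{response_model}"


# Without base_model, the response's "gpt-4" selects the wrong price entry.
without_base = resolve_pricing_key("azure", "gpt-4", {})
# With base_model set as in the config above, the intended entry is used.
with_base = resolve_pricing_key("azure", "gpt-4", {"base_model": "azure/gpt-4-1106-preview"})

print(without_base, cost_map[without_base]["input_cost_per_token"])
print(with_base, cost_map[with_base]["input_cost_per_token"])
```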
Debugging
If your custom pricing is not being used or you're seeing errors, check the following:
- Run the proxy with LITELLM_LOG="DEBUG" or the --detailed_debug CLI flag
litellm --config /path/to/config.yaml --detailed_debug
- Check logs for this line:
LiteLLM:DEBUG: utils.py:263 - litellm.acompletion
- Check if 'input_cost_per_token' and 'output_cost_per_token' appear as top-level keys in the logged acompletion call.
acompletion(
  ...,
  input_cost_per_token: my-custom-price, 
  output_cost_per_token: my-custom-price,
)
If these keys are not present, LiteLLM will not use your custom pricing.
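The top-level check can also be expressed as a tiny helper, for example when inspecting call arguments captured from the debug log. The helper name and the sample dictionaries are assumptions for illustration; the sketch only tests key presence at the top level, so pricing nested under another key is reported as missing.

```python
REQUIRED_PRICING_KEYS = ("input_cost_per_token", "output_cost_per_token")


def missing_pricing_keys(call_kwargs: dict) -> list:
    """Return the custom pricing keys that are absent at the top level."""
    return [key for key in REQUIRED_PRICING_KEYS if key not in call_kwargs]


# Pricing passed at the top level: nothing is missing.
ok_call = {"model": "azure/my-deployment", "input_cost_per_token": 0.000421, "output_cost_per_token": 0.000520}
# Pricing nested one level down: both keys are reported as missing.
nested_call = {"model": "azure/my-deployment", "model_info": {"input_cost_per_token": 0.000421}}

print(missing_pricing_keys(ok_call))
print(missing_pricing_keys(nested_call))
```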
If the problem persists, please file an issue on GitHub.