[ENHANCEMENT] - Add support for bedrock service tiers #9874

@Smartsheet-JB-Brown

Description

### Problem (one or two sentences)

## 🚀 AWS Bedrock Inference Service Tiering (Flex, Standard, Priority)

AWS Bedrock introduced new inference service tiers (Flex and Priority), alongside the existing Standard tier, allowing users to optimize generative AI workloads based on the desired balance of cost and latency.

## ⚖️ Service Tier Characteristics

| Tier Name | Performance Characteristics | Price Characteristics | Ideal Use Cases |
| :--- | :--- | :--- | :--- |
| **Priority** | Premium performance and preferential processing. Can realize up to 25% better Output Tokens Per Second (OTPS) latency compared to Standard. Requests get processing priority during high demand. | Premium price (highest cost). | Mission-critical applications, real-time user interactions, high-traffic chat assistants. |
| **Standard** | Consistent, reliable performance at regular rates. Requests are processed with reliable service but no special priority. | Standard rate (regular cost per token). | Everyday AI tasks, content generation, text analysis, routine document processing. |
| **Flex** | Designed for workloads that can tolerate longer latency. Requests receive lower priority during high demand. | Discounted price (lowest cost). | Non-time-critical applications like model evaluations, large-scale summarization, labeling/annotation, and asynchronous agentic workflows. |

## 💻 Invoking a Specific Tier via the Bedrock API

Engineers specify the desired service tier by including the `service_tier` parameter in the JSON body of model invocation API calls (e.g., `InvokeModel`, `Converse`).

Example (using the `InvokeModel` operation for the PRIORITY tier):

```json
{
    "modelId": "anthropic.claude-3-sonnet-20240229-v1:0",
    "contentType": "application/json",
    "accept": "application/json",
    "body": "{\"prompt\": \"<prompt text>\", \"max_tokens_to_sample\": 200, \"service_tier\": \"PRIORITY\"}"
}
```

* Use "service_tier": "PRIORITY" for the highest latency performance.
* Use "service_tier": "FLEX" for the lowest cost/most delayed performance.
* The Standard tier is used by default if the parameter is omitted.
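The same request can be issued programmatically. Below is a minimal sketch using the AWS SDK for JavaScript v3 (`@aws-sdk/client-bedrock-runtime`). The `service_tier` field inside the request body follows the example above as described in this issue; the exact field name and placement in the released Bedrock API may differ, so treat it as an assumption.

```typescript
import {
  BedrockRuntimeClient,
  InvokeModelCommand,
} from "@aws-sdk/client-bedrock-runtime";

async function main(): Promise<void> {
  const client = new BedrockRuntimeClient({ region: "us-east-1" });

  // InvokeModel call requesting the Priority tier. The "service_tier"
  // key in the body mirrors the issue's example and is an assumption,
  // not confirmed API shape.
  const response = await client.send(
    new InvokeModelCommand({
      modelId: "anthropic.claude-3-sonnet-20240229-v1:0",
      contentType: "application/json",
      accept: "application/json",
      body: JSON.stringify({
        prompt: "<prompt text>",
        max_tokens_to_sample: 200,
        service_tier: "PRIORITY",
      }),
    })
  );

  // The response body is returned as bytes; decode and parse it.
  const payload = JSON.parse(new TextDecoder().decode(response.body));
  console.log(payload);
}

main().catch(console.error);
```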

## 💰 Sample Model Tiering Costs (per 1 Million Tokens)

Costs vary by model, region, and input/output type. The table below provides a **directional example** of how costs compare across tiers for illustrative purposes (always check the official AWS Bedrock pricing page for current rates).

| Model (Example) | Token Type | Standard Tier (Cost per 1M Tokens) | Priority Tier (Cost per 1M Tokens) | Flex Tier (Cost per 1M Tokens) |
| :--- | :--- | :--- | :--- | :--- |
| **Anthropic Claude 3 Haiku** | Input | $0.25 | ~$0.30 - $0.35 | ~$0.20 - $0.22 |
| | Output | $1.25 | ~$1.50 - $1.75 | ~$1.00 - $1.15 |
| **Amazon Titan Text Express** | Input | $0.80 | ~$0.96 - $1.12 | ~$0.64 - $0.72 |
| | Output | $1.60 | ~$1.92 - $2.24 | ~$1.28 - $1.44 |
| **Meta Llama 3.2 Instruct (11B)** | Input | $0.35 | ~$0.42 - $0.49 | ~$0.28 - $0.32 |
| | Output | $0.35 | ~$0.42 - $0.49 | ~$0.28 - $0.32 |
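For the pricing display described below, a tier-aware cost calculation is essentially a lookup into a per-tier price table. Here is a minimal illustrative sketch in TypeScript; the prices are the directional Claude 3 Haiku figures from the table above (not authoritative AWS rates), and the helper names are hypothetical.

```typescript
type ServiceTier = "STANDARD" | "PRIORITY" | "FLEX";

// Illustrative per-1M-token prices (USD) keyed by tier -- directional
// figures from the table above, not real AWS pricing.
const haikuPricing: Record<ServiceTier, { input: number; output: number }> = {
  STANDARD: { input: 0.25, output: 1.25 },
  PRIORITY: { input: 0.33, output: 1.6 },
  FLEX: { input: 0.21, output: 1.1 },
};

// Hypothetical helper: cost of a single request at a given tier.
function requestCost(
  tier: ServiceTier,
  inputTokens: number,
  outputTokens: number
): number {
  const p = haikuPricing[tier];
  return (
    (inputTokens / 1_000_000) * p.input +
    (outputTokens / 1_000_000) * p.output
  );
}

// e.g. 10k input + 2k output tokens on the Flex tier
console.log(requestCost("FLEX", 10_000, 2_000).toFixed(6));
```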

### Context (who is affected and when)

Roo Code users of the Amazon Bedrock provider currently have no way to choose a service tier. As a result, they may be paying more than necessary, or receiving slower responses than they need with no option to select a more performant tier.

### Desired behavior (conceptual, not technical)

Roo Code's Bedrock provider for chat should support selecting a service tier on the chat settings page when a Bedrock model supports service tiers, apply that tier in calls to Bedrock, and use that tier's specific pricing in the chat's token usage and cost display.
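Conceptually the provider change is small: read the selected tier from settings, attach it to the request only when the model supports tiers, and use the matching price entry for the usage display. A hypothetical sketch follows; none of these types or fields exist in Roo Code today, and the `service_tier` request field is the issue's assumed API shape.

```typescript
// Hypothetical shapes -- not actual Roo Code types.
interface BedrockProviderSettings {
  modelId: string;
  serviceTier?: "STANDARD" | "PRIORITY" | "FLEX";
}

interface BedrockModelInfo {
  supportsServiceTiers: boolean;
  // Per-tier prices (USD per 1M tokens) used for the cost display.
  pricingByTier: Record<string, { input: number; output: number }>;
}

// Build the InvokeModel body, adding the tier only when supported;
// otherwise Bedrock's default (Standard) behavior applies.
function buildRequestBody(
  settings: BedrockProviderSettings,
  model: BedrockModelInfo,
  prompt: string
): string {
  const body: Record<string, unknown> = {
    prompt,
    max_tokens_to_sample: 200,
  };
  if (model.supportsServiceTiers && settings.serviceTier) {
    body.service_tier = settings.serviceTier; // field name per the example above (assumed)
  }
  return JSON.stringify(body);
}
```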


### Constraints / preferences (optional)

_No response_

### Request checklist

- [x] I've searched existing Issues and Discussions for duplicates
- [x] This describes a specific problem with clear context and impact

### Roo Code Task Links (optional)

_No response_

### Acceptance criteria (optional)

_No response_

### Proposed approach (optional)

_No response_

### Trade-offs / risks (optional)

_No response_
