Open
Labels: Issue/PR - Triage, enhancement
Description
### Problem (one or two sentences)
## 🚀 AWS Bedrock Inference Service Tiering (Flex, Standard, Priority)
AWS Bedrock introduced new inference service tiers (Flex and Priority), alongside the existing Standard tier, allowing users to optimize generative AI workloads based on the desired balance of cost and latency.
## ⚖️ Service Tier Characteristics
| Tier Name | Performance Characteristics | Price Characteristics | Ideal Use Cases |
|---|---|---|---|
| Priority | Premium Performance and preferential processing. Can realize up to 25% better Output Tokens Per Second (OTPS) latency compared to Standard. Requests get processing priority during high demand. | Premium Price (highest cost). | Mission-critical applications, real-time user interactions, high-traffic chat assistants. |
| Standard | Consistent, reliable performance at regular rates. Requests are processed with reliable service but no special priority. | Standard Rate (regular cost per token). | Everyday AI tasks, content generation, text analysis, routine document processing. |
| Flex | Designed for workloads that can tolerate longer latency. Requests receive lower priority during high demand. | Discounted Price (lowest cost). | Non-time-critical applications like model evaluations, large-scale summarization, labeling/annotation, and asynchronous agentic workflows. |
## 💻 Invoking a Specific Tier via the Bedrock API
Engineers specify the desired service tier by including the `service_tier` parameter in the JSON body of model invocation API calls (e.g., `InvokeModel`, `Converse`).
Example (using the `InvokeModel` operation for the PRIORITY tier):
{
"modelId": "anthropic.claude-3-sonnet-20240229-v1:0",
"contentType": "application/json",
"accept": "application/json",
"body": "{\"prompt\": \"<prompt text>\", \"max_tokens_to_sample\": 200, \"service_tier\": \"PRIORITY\"}"
}
* Use `"service_tier": "PRIORITY"` for the lowest latency (best performance) at the highest cost.
* Use `"service_tier": "FLEX"` for the lowest cost, at the expense of higher latency.
* The Standard tier is used by default if the parameter is omitted.
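The bullets above can be sketched as a small TypeScript helper that builds the `body` string for an `InvokeModel` request and adds `service_tier` only when a non-default tier is requested. This is a sketch under the assumption, taken from this issue, that the tier is passed as a `service_tier` field in the JSON body; the exact field name, casing, and placement should be confirmed against the Bedrock API reference. `ServiceTier` and `buildInvokeBody` are hypothetical names, and the `prompt` / `max_tokens_to_sample` fields simply mirror the example above.

```typescript
// Hypothetical tier names, taken from the values used in this issue.
type ServiceTier = "PRIORITY" | "STANDARD" | "FLEX";

// Build the JSON body for an InvokeModel call.
// Assumption: the tier travels as a "service_tier" field in the body,
// as described in this issue (not verified against the AWS API reference).
function buildInvokeBody(prompt: string, maxTokens: number, tier?: ServiceTier): string {
  const body: Record<string, unknown> = {
    prompt,
    max_tokens_to_sample: maxTokens,
  };
  // Standard is the default tier, so the field is omitted for it.
  if (tier !== undefined && tier !== "STANDARD") {
    body.service_tier = tier;
  }
  return JSON.stringify(body);
}

// Usage: one request on the Priority tier, one on the default tier.
console.log(buildInvokeBody("<prompt text>", 200, "PRIORITY"));
console.log(buildInvokeBody("<prompt text>", 200));
```

Omitting the field for the Standard tier keeps requests unchanged for models that do not support tiers.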
## 💰 Sample Model Tiering Costs (per 1 Million Tokens)
Costs vary by model, region, and input/output type. The table below is a **directional example** of how costs compare across tiers and is for illustration only; the Priority and Flex figures are approximate ranges. Always check the official AWS Bedrock pricing page for current rates.
| Model (Example) | Token Type | Standard Tier (Cost per 1M Tokens) | Priority Tier (Cost per 1M Tokens) | Flex Tier (Cost per 1M Tokens) |
| :--- | :--- | :--- | :--- | :--- |
| **Anthropic Claude 3 Haiku** | Input | $0.25 | ~$0.30 - $0.35 | ~$0.20 - $0.22 |
| | Output | $1.25 | ~$1.50 - $1.75 | ~$1.00 - $1.15 |
| **Amazon Titan Text Express** | Input | $0.80 | ~$0.96 - $1.12 | ~$0.64 - $0.72 |
| | Output | $1.60 | ~$1.92 - $2.24 | ~$1.28 - $1.44 |
| **Meta Llama 3.2 Instruct (11B)** | Input | $0.35 | ~$0.42 - $0.49 | ~$0.28 - $0.32 |
| | Output | $0.35 | ~$0.42 - $0.49 | ~$0.28 - $0.32 |
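To illustrate how tier-specific pricing would feed into a cost display, here is a minimal TypeScript sketch that estimates the cost of a request from per-1M-token prices. The price values are illustrative only: they use the Standard row and the lower bound of the Priority and Flex ranges for Claude 3 Haiku from the table above, not official rates. `TokenPrices`, `haikuPrices`, and `estimateCostUsd` are hypothetical names.

```typescript
// Price per 1 million tokens, in USD.
interface TokenPrices {
  inputPer1M: number;
  outputPer1M: number;
}

// Illustrative values only, copied from the directional table above
// (Standard row, and the lower bound of the Priority and Flex ranges).
const haikuPrices: Record<"standard" | "priority" | "flex", TokenPrices> = {
  standard: { inputPer1M: 0.25, outputPer1M: 1.25 },
  priority: { inputPer1M: 0.3, outputPer1M: 1.5 },
  flex: { inputPer1M: 0.2, outputPer1M: 1.0 },
};

const TOKENS_PER_MILLION = 1_000_000;

// Estimate the cost of one request from its token counts and the tier's prices.
function estimateCostUsd(inputTokens: number, outputTokens: number, prices: TokenPrices): number {
  return (
    (inputTokens / TOKENS_PER_MILLION) * prices.inputPer1M +
    (outputTokens / TOKENS_PER_MILLION) * prices.outputPer1M
  );
}

// Usage: compare the same workload across the three tiers.
for (const [tier, prices] of Object.entries(haikuPrices)) {
  console.log(tier, estimateCostUsd(2_000_000, 500_000, prices).toFixed(4));
}
```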
### Context (who is affected and when)
Amazon Bedrock users currently cannot choose a service tier in Roo Code. As a result, they may pay more than necessary for latency-tolerant work, or get slower responses than they want with no way to opt into a more performant tier.
### Desired behavior (conceptual, not technical)
Roo Code's Bedrock provider for chat should let users select a service tier on the chat settings page whenever the chosen Bedrock model supports service tiers. It should then apply that tier in calls to Bedrock and use the tier's specific pricing in the chat's token usage and pricing display.
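One possible shape for this behavior, as a rough TypeScript sketch: per-model metadata lists the tiers a model supports, the settings page shows a selector only when that list is non-empty, and the provider falls back to the Standard tier when the requested tier is unsupported. All names here (`TierId`, `ModelTierInfo`, `resolveTier`, `showTierSelector`, and the example model IDs) are hypothetical and are not Roo Code's actual types.

```typescript
type TierId = "priority" | "standard" | "flex";

// Hypothetical per-model metadata; not Roo Code's actual model info type.
interface ModelTierInfo {
  modelId: string;
  // Undefined or empty means the model has no service tier support.
  supportedTiers?: TierId[];
}

// Whether the settings page should offer a tier selector for this model.
function showTierSelector(model: ModelTierInfo): boolean {
  return (model.supportedTiers ?? []).length > 0;
}

// Honor the user's choice only when the model supports it; otherwise use Standard.
function resolveTier(model: ModelTierInfo, requested?: TierId): TierId {
  const supported = model.supportedTiers ?? [];
  return requested !== undefined && supported.includes(requested) ? requested : "standard";
}

// Usage with two made-up models.
const tiered: ModelTierInfo = { modelId: "example-tiered-model", supportedTiers: ["priority", "standard", "flex"] };
const untiered: ModelTierInfo = { modelId: "example-untiered-model" };
console.log(showTierSelector(tiered), resolveTier(tiered, "flex"));
console.log(showTierSelector(untiered), resolveTier(untiered, "flex"));
```

The resolved tier would then select both the request parameter and the price table used for the cost display, so the two cannot drift apart.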
### Constraints / preferences (optional)
_No response_
### Request checklist
- [x] I've searched existing Issues and Discussions for duplicates
- [x] This describes a specific problem with clear context and impact
### Roo Code Task Links (optional)
_No response_
### Acceptance criteria (optional)
_No response_
### Proposed approach (optional)
_No response_
### Trade-offs / risks (optional)
_No response_