Understanding inference profiles in AWS Bedrock
If you’re looking to invoke a foundation model (e.g. Claude, Titan, Nova, Llama, etc.) in AWS Bedrock, there are 3 concepts you should know:
AWS Foundation Models
AWS Cross-Region Inference Profiles
Application Inference Profiles
Let’s visualize the relationship between them:
AWS Foundation Models
arn:aws:bedrock:<region>::foundation-model/<model_id>
You’ll find the full list of models with their available regions and model IDs on AWS’s docs.
This is the meat of Bedrock; the actual models running in different AWS regions. Calling models directly using these ARNs is possible, but a response isn’t guaranteed if that model is saturated with requests in the region you requested.
AWS Cross Region Inference Profiles
arn:aws:bedrock:<region>:<account_id>:inference-profile/<regional_model_id>
You’ll find the full list of regional model IDs on AWS’s docs.
Cross region inference profiles allow you to call a model from your desired region but have the request automatically “load balanced” to a nearby region if your desired region is not available or over capacity.
For example, making a request to arn:aws:bedrock:us-east-1:<account_id>:inference-profile/us.anthropic.claude-3-5-sonnet-20241022-v2:0
could be handled by Claude 3.5 Sonnet v2 running in us-east-1, us-east-2, or us-west-2.
Refer to the same doc to see all the source/destination region mappings per foundation model.
Also, note how the model ID is not the same as the model IDs for foundation models. Cross region inference profiles use model IDs prefixed with a country/region code (e.g. us.
).
Application Inference Profiles
arn:aws:bedrock:<region>:<account_id>:application-inference-profile/<profile_id>
These are created by you and used for tracking cost and model usage (via AWS Cost Explorer and Budgets); great for enterprise customers.
When creating an application inference profile, you will need to specify a source model ARN. This can either be the ARN of a foundation model or a cross-region inference profile.
In my next two posts, we’ll go over how to provision inference profiles that can be used cross-account and tracking using AWS Budgets and Cost Explorer.