Apportioning Amazon Bedrock invocation costs
A day after I published this blog post, AWS announced support for Cost Allocation Tags for Amazon Bedrock model invocations by using application inference profiles. I’ve written a new post on how to get started with this new feature. Check it out here!
The AWS Well-Architected Framework's Cost Optimization pillar defines a best practice that states "Add organization information to cost and usage" (COST03-BP02). Typically, you would do this by tagging resources and activating Cost Allocation Tags in the Billing and Cost Management console. With tags activated, you can filter and group by their values in Cost Explorer, which lets you attach organisation-specific information to cost and usage.
But what about when you can't tag something - for example, API calls to Amazon Bedrock? You've got two choices: apportion cost equally amongst consumers, or set up an AWS account for each consumer. The former doesn't feel particularly fair or useful; the latter adds complexity that's not necessarily desired.
Enough preamble - let's get to the meat of this article and look at an example of how we can apportion cost for Amazon Bedrock invocation requests.
Prerequisites
Before we get started, you’ll need to do a couple of things.
First, make sure you’ve got the AWS Serverless Application Model (SAM) CLI installed. You can find instructions here.
Secondly, you’ll need to request access to a model (or models) in Amazon Bedrock. The Bedrock API only returns input/output tokens (which we’ll use to calculate cost) for certain models. Of the applicable models, we’ll focus on the Claude 3 Haiku and Claude 3 Sonnet models. To keep costs down I’d recommend the ‘Claude 3 Haiku’ model. If you need a helping hand, check here. I’d do this in eu-west-2
(London).
1. Bootstrapping an API
To start, we’re going to bootstrap a simple SAM application that when deployed will create a Python Lambda function fronted by an API Gateway REST API.
1 - AWS Quick Start Templates
and then 1 - Hello World Example
. Opt for using the most popular runtime and package type. For the purposes of this example, you can select the default values (N
) for the questions about X-Ray, CloudWatch Application Insights, and Structured Logging. On the final step, I’ve opted to call my SAM app bedrock-cost-apportionment
but you’re free to be creative!
You’ll now have a new directory created with the required template and source files. We’re going to deploy it to AWS in its current state, just to make sure everything’s set up successfully.
Change into the new directory (e.g. `cd bedrock-cost-apportionment`) and then run `sam build` followed by `sam deploy --guided`. Before you do so, make sure you've got active AWS credentials set up. I use AWSume, but you can also append `--profile <ProfileName>` to the commands if you've set your local credentials up a different way.
When you run the latter command, you'll be guided through a set of questions. You can accept the defaults for most, but make sure you set the `AWS Region` to your desired region (I recommend `eu-west-2`) and answer `Y` to `HelloWorldFunction has no authentication. Is this okay?`.
The deployment will successfully complete after a short while. If you open the AWS Management Console for the account where you’ve deployed the application, you’ll be able to see a newly created API and a Lambda function.
Back in your terminal window, there'll be three outputs from the deployed CloudFormation stack. Look for the output named `HelloWorldApi`; it'll have a URL as its value. Copy this value and make a GET request to it (with curl, for example). You should get back the template's sample response: `{"message": "hello world"}`.
So far, so good! We’ve got a sample API deployed and are ready to start building our integration.
2. Building an MVP proxy
Let’s move on to integrating our application with Amazon Bedrock. The first thing we’ll do is rename the hello_world
directory to bedrock_proxy
. Then create a new directory called functions
and move the bedrock_proxy
folder inside it. You should end up with your Lambda handler located at functions/bedrock_proxy/app.py
.
Updating the function
To make handling and responding to API Gateway events easier, we'll use the AWS Lambda Powertools library. To do this, we need to add it to our `requirements.txt` file. Update your file with the following contents.
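At a minimum that means adding Powertools, so the file looks something like this (pinning versions is up to you, and boto3 technically ships with the Lambda runtime if you'd rather leave it out):

```
aws-lambda-powertools
boto3
```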
With this file updated, we can now focus on the Python code that handles the proxying of requests from API Gateway to Bedrock. Replace the contents of your `app.py` file with the code below.
The updated code accomplishes the following:
- Using AWS Lambda Powertools, define a handler for the `POST /invoke-model` route.
- Take the body of the API Gateway request and pass it through to the Bedrock `InvokeModel` API.
- Return the response from Bedrock.
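As a rough guide, a minimal version of the handler might look something like this sketch. Powertools' `APIGatewayRestResolver` does the routing; the convention of sending `modelId` alongside the native payload in the request body is an assumption of this sketch rather than a requirement.

```python
import json

import boto3
from aws_lambda_powertools.event_handler import APIGatewayRestResolver

app = APIGatewayRestResolver()
bedrock_runtime = boto3.client("bedrock-runtime")


@app.post("/invoke-model")
def invoke_model() -> dict:
    # Assumption: the caller sends {"modelId": "...", "body": {...}}, where
    # "body" is the native payload you'd normally pass to InvokeModel.
    request = app.current_event.json_body
    response = bedrock_runtime.invoke_model(
        modelId=request["modelId"],
        body=json.dumps(request["body"]),
        contentType="application/json",
        accept="application/json",
    )
    # InvokeModel returns a streaming body; read it and hand it back to the caller.
    return json.loads(response["body"].read())


def lambda_handler(event: dict, context) -> dict:
    return app.resolve(event, context)
```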
Updating the template
With a new function defined in a new location in the repository, we need to update the AWS SAM template. At the same time, we’ll set up some resources that’ll be used later.
The full template is in the GitHub repository linked at the end of this post, but I'll run through the changes we're making here.
- Add a parameter for the API stage name, so we have the option in future to deploy multiple versions of the same SAM application (e.g. Dev/Stg/Prod).
- Add an explicit API Gateway. We'll want to add authentication to our API later, and it's easier to do so when we've got an explicit resource for the API Gateway rather than the implicit resource that SAM can create.
- Update the function definition. We're updating the properties that define the function, including the timeout, location of code, IAM policies, and events, to support the Bedrock proxy use case.
- Update the outputs. With our explicit API resource now defined, we also need to update the CloudFormation stack output to reference it (and our new parameter).
Re-deploy it
Now that we’re up to date, we can re-deploy the SAM application and test it out.
Using a tool like Postman, make a request to `POST /invoke-model`. The body of the request should be in the same format that you'd normally use to invoke the Bedrock `InvokeModel` API.
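If you'd rather test from code than Postman, a short script with the `requests` library does the job. A sketch is below - the URL and stage path are placeholders for your own stack output, and the `modelId`-in-the-body shape matches the assumption made in the earlier handler sketch.

```python
import requests  # pip install requests

# Replace with the invoke URL output by your CloudFormation stack.
API_URL = "https://<api-id>.execute-api.eu-west-2.amazonaws.com/<stage>"

payload = {
    "modelId": "anthropic.claude-3-haiku-20240307-v1:0",
    "body": {
        # Native Anthropic Messages payload, exactly as you'd pass to InvokeModel.
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 256,
        "messages": [
            {"role": "user", "content": [{"type": "text", "text": "Tell me a fact about London."}]}
        ],
    },
}

response = requests.post(f"{API_URL}/invoke-model", json=payload, timeout=60)
print(response.status_code, response.json())
```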
Note that at this stage our API still doesn’t have any authentication and so I strongly recommend deleting the CloudFormation stack when you’re not using it.
3. Adding authentication
API Gateway has a really neat mechanism for handling custom authentication - Lambda authorizers - and combined with the Lambda Powertools library it's easy to implement. Our authentication for this API will be fairly simple, driven by a DynamoDB table that contains a username and a hashed API key.
The first resource we’ll want to define is the DynamoDB table that contains this data. This is defined with the ApiKeyTable
resource below. We’re using the hashed API key as the primary key of this table as it’ll be a unique value and allows us to store metadata about the associated user as part of the same item easily.
Next, we’ll define the Lambda function that carries out the authentication checks. This is defined with the LambdaAuthorizerFunction
resource. We’re passing in an environment variable with the table name and also giving it access to read from the table. Don’t worry that the folder it’s referencing doesn’t exist - we’ll get to that in a moment.
Last of all, we need to add the authorizer to the API Gateway. The `Auth` property needs to be added to the `BedrockProxyApi` resource to set this up for all API routes.
With the SAM template taken care of, we can now add the function that handles the authorization logic. Create a new folder at `functions/lambda_authorizer` containing two files: `requirements.txt` and `app.py`.
We’ll start with the Pip requirements file. This function will make use of AWS Lambda Powertools as it provides some nice helpers for working as an API Gateway Authorizer, boto3 for interacting with the DynamoDB table and Keycove for handling the hashing of the API key for comparison with that stored in the table.
Moving on to the Lambda authorizer itself: it's a very simple authentication check that takes the authorisation token passed in the headers and checks the DynamoDB table for an item whose key matches the hashed version of the token.
If an item doesn’t exist, the DENY_ALL_RESPONSE
constant is returned - a helper from the Lambda Powertools library that prevents access to all routes. If an item does exist, a response is built up with details of the user that’s made the request (we’ll use this for cost apportionment later) and a policy to allow access to all routes of the API.
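For a flavour of what that logic looks like, here's a sketch assuming a REQUEST-type authorizer. It swaps Keycove's hashing for the standard library's SHA-256 to keep the example self-contained, and the `API_KEY_TABLE_NAME` environment variable and `hashed_key`/`username` attribute names are assumptions of the sketch.

```python
import hashlib
import os

import boto3
from aws_lambda_powertools.utilities.data_classes import event_source
from aws_lambda_powertools.utilities.data_classes.api_gateway_authorizer_event import (
    DENY_ALL_RESPONSE,
    APIGatewayAuthorizerRequestEvent,
    APIGatewayAuthorizerResponse,
)

# Assumed environment variable name for the DynamoDB table.
table = boto3.resource("dynamodb").Table(os.environ["API_KEY_TABLE_NAME"])


@event_source(data_class=APIGatewayAuthorizerRequestEvent)
def lambda_handler(event: APIGatewayAuthorizerRequestEvent, context) -> dict:
    # Hash the presented token and look it up in the table (stand-in for Keycove).
    token = event.headers.get("Authorization", "")
    hashed_key = hashlib.sha256(token.encode()).hexdigest()
    item = table.get_item(Key={"hashed_key": hashed_key}).get("Item")

    if not item:
        # No matching key - deny access to every route.
        return DENY_ALL_RESPONSE

    # Build an allow-all policy, passing the username through so the proxy
    # function can attribute usage to it later.
    arn = event.parsed_arn
    policy = APIGatewayAuthorizerResponse(
        principal_id=item["username"],
        context={"username": item["username"]},
        region=arn.region,
        aws_account_id=arn.aws_account_id,
        api_id=arn.api_id,
        stage=arn.stage,
    )
    policy.allow_all_routes()
    return policy.asdict()
```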
We’ll now re-deploy the SAM application again. Once you’ve done that, if you make the same request as you did previously to test the proxy, you should find that you’re not successful and get an Unauthorized
response.
You might be asking: how can I authenticate without an API key? Fear not - we'll now implement a script to generate one for you. Start by setting up a `requirements.txt` file within a new `scripts` directory. Here we'll define our dependencies for the script: `boto3` for interacting with AWS resources, and `keycove` for generating an API key.
Time to implement the script. It's a fairly simple script that parses arguments from the command line - namely the DynamoDB table name and the username that you're generating an API key for. It uses keycove to generate an API key and then stores the hashed version of that key, along with the username, in the specified DynamoDB table.
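A sketch of such a script is below, with the standard library's `secrets` and `hashlib` standing in for keycove so the example is self-contained; the flag names, file name, and the `hashed_key`/`username` attributes are assumptions and need to match whatever your authorizer expects. You'd run it along the lines of `python scripts/generate_api_key.py --table-name <YourTableName> --username alice`.

```python
"""Generate an API key for a user and store its hash in DynamoDB."""
import argparse
import hashlib
import secrets

import boto3


def main() -> None:
    parser = argparse.ArgumentParser(description="Generate and store an API key")
    parser.add_argument("--table-name", required=True, help="Name of the DynamoDB API key table")
    parser.add_argument("--username", required=True, help="Username to associate with the key")
    args = parser.parse_args()

    # Generate a random API key and hash it; only the hash is stored server-side.
    api_key = secrets.token_urlsafe(32)
    hashed_key = hashlib.sha256(api_key.encode()).hexdigest()

    table = boto3.resource("dynamodb").Table(args.table_name)
    table.put_item(Item={"hashed_key": hashed_key, "username": args.username})

    # The plaintext key is only shown once - pass it to the user and keep it safe.
    print(api_key)


if __name__ == "__main__":
    main()
```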
With these files defined, you'll next need to retrieve your DynamoDB table name from the AWS console. Once you've got it, install the script's dependencies and run the script, passing in the table name and the username you're creating a key for. The output of the script will be your API key.
Retry the request to the Bedrock proxy API that we've got deployed, but this time add a header called `Authorization` and set its value to the token output by the script. You should find that your request succeeds and returns the expected response from Bedrock - result!
4. Recording model usage
To record our model usage, we’re going to use CloudWatch Metrics. Publishing custom metrics to CloudWatch allows us to easily create dashboards and alarms based on usage. We can also gather metrics on multiple dimensions (e.g. user name, model identifier).
First, we’ll make some changes to our SAM template in order to pass the required information to the Lambda function and give its execution role permission to store metric data in CloudWatch. We’re adding a new parameter (pCloudWatchCustomMetricNamespace
) with a sensible default to allow a custom namespace to be used. We then pass that namespace to the Lambda function as an environment variable and make a change to the IAM policy attached to the execution role to allow the cloudwatch:PutMetricData
action within this custom namespace.
Next, we need a way to calculate the input/output tokens, and the corresponding usage in dollars, from a Bedrock response. Given that each model responds in a different format, and pricing differs per model and per region, we need an approach that scales as new models are supported in future.
The `get_model_usage.py` file implements the manager and driver pattern to enable writing custom logic for each model without cluttering up a single class. Read the code to get an understanding of what's happening; at a high level:
- A driver can be retrieved from the driver manager based upon the model ID.
- The retrieved driver extends the `AbstractModelUsageDriver` class, which makes clear what methods are required to be implemented.
- The retrieved driver returns a response that contains the number of input/output tokens used and the cost associated with them.
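To make the pattern concrete, here's a pared-down sketch. `AbstractModelUsageDriver` and `ModelUsageDriverManager` are the names referenced above; the method names, the `ModelUsage` dataclass, and the per-1,000-token prices are this sketch's own (the prices are illustrative - check the Bedrock pricing page for current, region-specific figures).

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class ModelUsage:
    input_tokens: int
    output_tokens: int
    input_cost_usd: float
    output_cost_usd: float


class AbstractModelUsageDriver(ABC):
    """Each supported model gets a driver that knows how to read its responses."""

    input_price_per_1k: float
    output_price_per_1k: float

    @abstractmethod
    def get_usage(self, response_body: dict) -> ModelUsage:
        """Extract token counts from a model response and price them."""


class Claude3HaikuUsageDriver(AbstractModelUsageDriver):
    # Illustrative on-demand prices per 1,000 tokens - these will drift.
    input_price_per_1k = 0.00025
    output_price_per_1k = 0.00125

    def get_usage(self, response_body: dict) -> ModelUsage:
        # Anthropic Messages API responses include a "usage" block with token counts.
        usage = response_body["usage"]
        return ModelUsage(
            input_tokens=usage["input_tokens"],
            output_tokens=usage["output_tokens"],
            input_cost_usd=usage["input_tokens"] / 1000 * self.input_price_per_1k,
            output_cost_usd=usage["output_tokens"] / 1000 * self.output_price_per_1k,
        )


class ModelUsageDriverManager:
    """Maps model IDs to the driver that understands their response format."""

    _drivers: dict[str, AbstractModelUsageDriver] = {
        "anthropic.claude-3-haiku-20240307-v1:0": Claude3HaikuUsageDriver(),
    }

    @classmethod
    def get_driver(cls, model_id: str) -> AbstractModelUsageDriver:
        return cls._drivers[model_id]
```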
Finally, we make changes to the Lambda function's `app.py` file. All we're doing here is making use of the `ModelUsageDriverManager` that we just implemented, retrieving the username of the person making the request, and sending the results to CloudWatch Metrics.
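The metric-publishing piece boils down to a `PutMetricData` call along these lines. The metric names, the environment variable name, and the choice to publish each figure once per dimension (so usage can be summed by principal and by model independently) are assumptions of this sketch.

```python
import os

import boto3

cloudwatch = boto3.client("cloudwatch")
NAMESPACE = os.environ["CLOUDWATCH_CUSTOM_METRIC_NAMESPACE"]  # assumed env var name


def publish_usage_metrics(principal_id: str, model_id: str, usage) -> None:
    # Publish each figure under each single dimension - this is also why
    # 4 metrics x 2 dimensions are counted as 8 chargeable metrics later on.
    metric_values = {
        "InputTokens": usage.input_tokens,
        "OutputTokens": usage.output_tokens,
        "InputTokenCost": usage.input_cost_usd,
        "OutputTokenCost": usage.output_cost_usd,
    }
    dimension_sets = [
        [{"Name": "PrincipalId", "Value": principal_id}],
        [{"Name": "ModelId", "Value": model_id}],
    ]
    cloudwatch.put_metric_data(
        Namespace=NAMESPACE,
        MetricData=[
            {"MetricName": name, "Dimensions": dimensions, "Value": value}
            for name, value in metric_values.items()
            for dimensions in dimension_sets
        ],
    )
```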
If you now make a request with your API key, you'll find that metrics relating to your model usage are stored in CloudWatch. You can verify this by going to CloudWatch Metrics and exploring the custom namespace. Metrics are recorded on the `PrincipalId` and `ModelId` dimensions to make it easy to analyse which principal (a username, in this case) is responsible for the most usage.
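As a taste of the apportionment this enables, you can sum one of the cost metrics for a single principal with a few lines of boto3 - the namespace, metric name, and principal value below are placeholders matching the earlier sketches.

```python
from datetime import datetime, timedelta, timezone

import boto3

cloudwatch = boto3.client("cloudwatch")

now = datetime.now(timezone.utc)
response = cloudwatch.get_metric_statistics(
    Namespace="BedrockCostApportionment",  # whatever your namespace parameter is set to
    MetricName="InputTokenCost",
    Dimensions=[{"Name": "PrincipalId", "Value": "alice"}],
    StartTime=now - timedelta(days=30),
    EndTime=now,
    Period=86400,  # one datapoint per day
    Statistics=["Sum"],
)

# Add up the daily sums to get this principal's input-token spend for the month.
total = sum(datapoint["Sum"] for datapoint in response["Datapoints"])
print(f"Input token spend for alice over the last 30 days: ${total:.4f}")
```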
Conclusion
In conclusion, we’ve covered end-to-end how to implement a proxy for invoking Amazon Bedrock models whilst recording model usage to improve cost apportionment functionality. I’d love to see AWS implement this concept as a first-party solution.
It is worth noting that CloudWatch Metrics isn't free to use. In the `eu-west-2` region, a custom metric costs $0.30 per month for the first 10,000 metrics (it gets cheaper beyond that), pro-rated hourly, so you're only charged in the hours that you publish metrics. Each metric name and dimension combination counts as a separate metric - so in our case, where we're publishing 4 metrics on 2 dimensions each, 8 metrics are counted as used. You can see how, as this solution scales, it could get expensive.
Let’s imagine a scenario where 100 engineers are using the proxy, across 2 Bedrock models. Let’s assume that they’re only using for 8 hours per day, and that there’s 20 working days in the month. This equates to 160 hours of usage. AWS charge $0.30 per month, equating to $0.0004109589041 hourly. Pro-rating this for our working hours, it results in a per-metric monthly cost of $0.06575342466. 100 engineers, using 2 models equates to 1600 metrics (100 engineers * 2 models * (4 metrics * 2 dimensions)
). This works out to be $105.20 per month in CloudWatch Metrics charges. Whilst CloudWatch Metrics definitely makes it easy to publish, visualise and alert on data, it’s also an expensive way to do so. I’ll be doing a follow-up post in future that looks at some more cost-effective ways to store and analyse the metrics without increasing complexity too much.
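For clarity, here's that back-of-the-envelope calculation as a few lines of Python (using AWS's 730-hour month for the hourly pro-rating).

```python
price_per_metric_month = 0.30               # first 10k custom metrics
hourly_rate = price_per_metric_month / 730  # ~$0.000411 per metric-hour

active_hours = 8 * 20                         # 8 hours/day, 20 working days
cost_per_metric = hourly_rate * active_hours  # ~$0.0658 per metric

engineers, models = 100, 2
metric_count = engineers * models * (4 * 2)   # 4 metric names x 2 dimensions each

print(metric_count, round(metric_count * cost_per_metric, 2))  # 1600 metrics, ~$105
```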
I hope you’ve found this post useful. If you wish to explore the code in its entirety, a GitHub repository is available.