Apportioning Amazon Bedrock invocation costs
A day after I published this blog post, AWS announced support for Cost Allocation Tags for Amazon Bedrock model invocations by using application inference profiles. I’ve written a new post on how to get started with this new feature. Check it out here!
The AWS Well-Architected Framework's Cost Optimization pillar defines a best practice that states "Add organization information to cost and usage" (COST03-BP02). Typically, you would do this by tagging resources and activating Cost Allocation Tags in the Billing and Cost Management console. With tags activated, you can filter and group by their values in Cost Explorer, which lets you attach organisation-specific information to cost and usage.
But what about when you can't tag something - for example, API calls to Amazon Bedrock? You've got two choices: apportion cost equally amongst consumers, or set up an AWS account for each consumer. The former doesn't feel particularly fair or useful; the latter adds complexity that's not necessarily desired.
Enough preamble - let's get to the meat of this article and look at an example of how we can apportion cost for Amazon Bedrock invocation requests.
Prerequisites
Before we get started, you’ll need to do a couple of things.
First, make sure you’ve got the AWS Serverless Application Model (SAM) CLI installed. You can find instructions here.
Secondly, you’ll need to request access to a model (or models) in Amazon Bedrock. The Bedrock API only returns input/output tokens (which we’ll use to calculate cost) for certain models. Of the applicable models, we’ll focus on the Claude 3 Haiku and Claude 3 Sonnet models. To keep costs down I’d recommend the ‘Claude 3 Haiku’ model. If you need a helping hand, check here. I’d do this in eu-west-2
(London).
1. Bootstrapping an API
To start, we’re going to bootstrap a simple SAM application that when deployed will create a Python Lambda function fronted by an API Gateway REST API.
1 - AWS Quick Start Templates
and then 1 - Hello World Example
. Opt for using the most popular runtime and package type. For the purposes of this example, you can select the default values (N
) for the questions about X-Ray, CloudWatch Application Insights, and Structured Logging. On the final step, I’ve opted to call my SAM app bedrock-cost-apportionment
but you’re free to be creative!
You’ll now have a new directory created with the required template and source files. We’re going to deploy it to AWS in its current state, just to make sure everything’s set up successfully.
Change into the new directory (e.g. `cd bedrock-cost-apportionment`) and then run `sam build` followed by `sam deploy --guided`. Before you do so, make sure you've got active AWS credentials set up. I use AWSume, but you can also append `--profile <ProfileName>` to the commands if you've set your local credentials up a different way.
When you run the latter command, you'll be guided through a set of questions. You can accept the defaults for most, but make sure you set the `AWS Region` to your desired region (I recommend `eu-west-2`) and answer `Y` to `HelloWorldFunction has no authentication. Is this okay?`.
The deployment will successfully complete after a short while. If you open the AWS Management Console for the account where you’ve deployed the application, you’ll be able to see a newly created API and a Lambda function.
Back in your terminal window, there'll be three outputs from the deployed CloudFormation stack. Look for the output named `HelloWorldApi`; it'll have a URL as its value. Copy this value and make a GET request to it (with curl, for example). You should get back the template's sample response: `{"message": "hello world"}`.
So far, so good! We’ve got a sample API deployed and are ready to start building our integration.
2. Building an MVP proxy
Let’s move on to integrating our application with Amazon Bedrock. The first thing we’ll do is rename the hello_world
directory to bedrock_proxy
. Then create a new directory called functions
and move the bedrock_proxy
folder inside it. You should end up with your Lambda handler located at functions/bedrock_proxy/app.py
.
Updating the function
To make handling and responding to API Gateway events easier, we'll use the AWS Lambda Powertools library. To do this, we need to add it to our `requirements.txt` file. Update your file with the following contents.
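At a minimum that means adding Powertools, so the file looks something like this (pinning versions is up to you, and boto3 technically ships with the Lambda runtime if you'd rather leave it out):

```
aws-lambda-powertools
boto3
```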
With this file updated, we can now focus on the Python code that handles the proxying of requests from API Gateway to Bedrock. Replace the contents of your `app.py` file with the code below.
The updated code accomplishes the following:
- Using AWS Lambda Powertools, define a handler for the `POST /invoke-model` route.
- Take the body of the API Gateway request and pass it through to the Bedrock `InvokeModel` API.
- Return the response from Bedrock.
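As a rough guide, a minimal version of the handler might look something like this sketch. Powertools' `APIGatewayRestResolver` does the routing; the convention of sending `modelId` alongside the native payload in the request body is an assumption of this sketch rather than a requirement.

```python
import json

import boto3
from aws_lambda_powertools.event_handler import APIGatewayRestResolver

app = APIGatewayRestResolver()
bedrock_runtime = boto3.client("bedrock-runtime")


@app.post("/invoke-model")
def invoke_model() -> dict:
    # Assumption: the caller sends {"modelId": "...", "body": {...}}, where
    # "body" is the native payload you'd normally pass to InvokeModel.
    request = app.current_event.json_body
    response = bedrock_runtime.invoke_model(
        modelId=request["modelId"],
        body=json.dumps(request["body"]),
        contentType="application/json",
        accept="application/json",
    )
    # InvokeModel returns a streaming body; read it and hand it back to the caller.
    return json.loads(response["body"].read())


def lambda_handler(event: dict, context) -> dict:
    return app.resolve(event, context)
```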
Updating the template
With a new function defined in a new location in the repository, we need to update the AWS SAM template. At the same time, we’ll set up some resources that’ll be used later.
The full template is in the GitHub repository linked at the end of this post, but I'll run through the changes we're making here.
- Add a parameter for the API stage name, so we have the option in future to deploy multiple versions of the same SAM application (e.g. Dev/Stg/Prod).
- Add an explicit API Gateway. We'll want to add authentication to our API later, and it's easier to do so when we've got an explicit resource for the API Gateway rather than the implicit resource that SAM can create.
- Update the function definition. We're updating the properties that define the function, including the timeout, location of code, IAM policies, and events, to support the Bedrock proxy use case.
- Update the outputs. With our explicit API resource now defined, we also need to update the CloudFormation stack output to reference it (and our new parameter).
Re-deploy it
Now that we’re up to date, we can re-deploy the SAM application and test it out.
Using a tool like Postman, make a request to `POST /invoke-model`. The body of the request should be in the same format that you'd normally use to invoke the Bedrock `InvokeModel` API.
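If you'd rather test from code than Postman, a short script with the `requests` library does the job. A sketch is below - the URL and stage path are placeholders for your own stack output, and the `modelId`-in-the-body shape matches the assumption made in the earlier handler sketch.

```python
import requests  # pip install requests

# Replace with the invoke URL output by your CloudFormation stack.
API_URL = "https://<api-id>.execute-api.eu-west-2.amazonaws.com/<stage>"

payload = {
    "modelId": "anthropic.claude-3-haiku-20240307-v1:0",
    "body": {
        # Native Anthropic Messages payload, exactly as you'd pass to InvokeModel.
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 256,
        "messages": [
            {"role": "user", "content": [{"type": "text", "text": "Tell me a fact about London."}]}
        ],
    },
}

response = requests.post(f"{API_URL}/invoke-model", json=payload, timeout=60)
print(response.status_code, response.json())
```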
Note that at this stage our API still doesn’t have any authentication and so I strongly recommend deleting the CloudFormation stack when you’re not using it.
3. Adding authentication
API Gateway has a really neat mechanism for handling custom authentication - Lambda authorizers - and combined with the Lambda Powertools library it's easy to implement. Our authentication for this API will be fairly simple, driven by a DynamoDB table that contains a username and a hashed API key.
The first resource we’ll want to define is the DynamoDB table that contains this data. This is defined with the ApiKeyTable
resource below. We’re using the hashed API key as the primary key of this table as it’ll be a unique value and allows us to store metadata about the associated user as part of the same item easily.
Next, we’ll define the Lambda function that carries out the authentication checks. This is defined with the LambdaAuthorizerFunction
resource. We’re passing in an environment variable with the table name and also giving it access to read from the table. Don’t worry that the folder it’s referencing doesn’t exist - we’ll get to that in a moment.
Last of all, we need to add the authorizer to the API Gateway. The `Auth` property needs to be added to the `BedrockProxyApi` resource to set this up for all API routes.
With the SAM template taken care of, we can now add the function that handles the authorization logic. Create a new folder at `functions/lambda_authorizer` containing two files: `requirements.txt` and `app.py`.
We’ll start with the Pip requirements file. This function will make use of AWS Lambda Powertools as it provides some nice helpers for working as an API Gateway Authorizer, boto3 for interacting with the DynamoDB table and Keycove for handling the hashing of the API key for comparison with that stored in the table.
Moving on to the Lambda authorizer itself: it's a very simple authentication check that takes the authorisation token passed in the headers and checks the DynamoDB table for an item whose key matches the hashed version of the token.
If an item doesn’t exist, the DENY_ALL_RESPONSE
constant is returned - a helper from the Lambda Powertools library that prevents access to all routes. If an item does exist, a response is built up with details of the user that’s made the request (we’ll use this for cost apportionment later) and a policy to allow access to all routes of the API.
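For a flavour of what that logic looks like, here's a sketch assuming a REQUEST-type authorizer. It swaps Keycove's hashing for the standard library's SHA-256 to keep the example self-contained, and the `API_KEY_TABLE_NAME` environment variable and `hashed_key`/`username` attribute names are assumptions of the sketch.

```python
import hashlib
import os

import boto3
from aws_lambda_powertools.utilities.data_classes import event_source
from aws_lambda_powertools.utilities.data_classes.api_gateway_authorizer_event import (
    DENY_ALL_RESPONSE,
    APIGatewayAuthorizerRequestEvent,
    APIGatewayAuthorizerResponse,
)

# Assumed environment variable name for the DynamoDB table.
table = boto3.resource("dynamodb").Table(os.environ["API_KEY_TABLE_NAME"])


@event_source(data_class=APIGatewayAuthorizerRequestEvent)
def lambda_handler(event: APIGatewayAuthorizerRequestEvent, context) -> dict:
    # Hash the presented token and look it up in the table (stand-in for Keycove).
    token = event.headers.get("Authorization", "")
    hashed_key = hashlib.sha256(token.encode()).hexdigest()
    item = table.get_item(Key={"hashed_key": hashed_key}).get("Item")

    if not item:
        # No matching key - deny access to every route.
        return DENY_ALL_RESPONSE

    # Build an allow-all policy, passing the username through so the proxy
    # function can attribute usage to it later.
    arn = event.parsed_arn
    policy = APIGatewayAuthorizerResponse(
        principal_id=item["username"],
        context={"username": item["username"]},
        region=arn.region,
        aws_account_id=arn.aws_account_id,
        api_id=arn.api_id,
        stage=arn.stage,
    )
    policy.allow_all_routes()
    return policy.asdict()
```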
We’ll now re-deploy the SAM application again. Once you’ve done that, if you make the same request as you did previously to test the proxy, you should find that you’re not successful and get an Unauthorized
response.
You might be asking: how can I authenticate without an API key? Fear not - we'll now implement a script to generate one for you. Start by setting up a `requirements.txt` file within a new `scripts` directory. Here we'll define our dependencies for the script: `boto3` for interacting with AWS resources, and `keycove` for generating an API key.
Time to implement the script. It's a fairly simple script that parses arguments from the command line - namely the DynamoDB table name and the username that you're generating an API key for. It uses keycove to generate an API key and then stores the hashed version of that key, along with the username, in the specified DynamoDB table.
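A sketch of such a script is below, with the standard library's `secrets` and `hashlib` standing in for keycove so the example is self-contained; the flag names, file name, and the `hashed_key`/`username` attributes are assumptions and need to match whatever your authorizer expects. You'd run it along the lines of `python scripts/generate_api_key.py --table-name <YourTableName> --username alice`.

```python
"""Generate an API key for a user and store its hash in DynamoDB."""
import argparse
import hashlib
import secrets

import boto3


def main() -> None:
    parser = argparse.ArgumentParser(description="Generate and store an API key")
    parser.add_argument("--table-name", required=True, help="Name of the DynamoDB API key table")
    parser.add_argument("--username", required=True, help="Username to associate with the key")
    args = parser.parse_args()

    # Generate a random API key and hash it; only the hash is stored server-side.
    api_key = secrets.token_urlsafe(32)
    hashed_key = hashlib.sha256(api_key.encode()).hexdigest()

    table = boto3.resource("dynamodb").Table(args.table_name)
    table.put_item(Item={"hashed_key": hashed_key, "username": args.username})

    # The plaintext key is only shown once - pass it to the user and keep it safe.
    print(api_key)


if __name__ == "__main__":
    main()
```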
With these files defined, you'll next need to retrieve your DynamoDB table name from the AWS console. Once you've got it, install the script's dependencies and run the script, passing in the table name and the username you're creating a key for. The output of the script will be your API key.
Retry the request to the Bedrock proxy API that we've got deployed, but this time add a header called `Authorization` and set its value to the token output by the script. You should find that your request succeeds and returns the expected response from Bedrock - result!
4. Recording model usage
To record our model usage, we’re going to use CloudWatch Metrics. Publishing custom metrics to CloudWatch allows us to easily create dashboards and alarms based on usage. We can also gather metrics on multiple dimensions (e.g. user name, model identifier).
First, we’ll make some changes to our SAM template in order to pass the required information to the Lambda function and give its execution role permission to store metric data in CloudWatch. We’re adding a new parameter (pCloudWatchCustomMetricNamespace
) with a sensible default to allow a custom namespace to be used. We then pass that namespace to the Lambda function as an environment variable and make a change to the IAM policy attached to the execution role to allow the cloudwatch:PutMetricData
action within this custom namespace.
Next, we need a way to calculate the input/output tokens, and the corresponding usage in dollars, from a Bedrock response. Given that each model responds in a different format, and pricing differs per model and per region, we need an approach that scales as new models are supported in future.
The `get_model_usage.py` file implements the manager and driver pattern to enable writing custom logic for each model without cluttering up a single class. Read the code to get an understanding of what's happening; at a high level:
- A driver can be retrieved from the driver manager based upon the model ID.
- The retrieved driver extends the `AbstractModelUsageDriver` class, which makes clear what methods are required to be implemented.
- The retrieved driver returns a response that contains the number of input/output tokens used and the cost associated with them.
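To make the pattern concrete, here's a pared-down sketch. `AbstractModelUsageDriver` and `ModelUsageDriverManager` are the names referenced above; the method names, the `ModelUsage` dataclass, and the per-1,000-token prices are this sketch's own (the prices are illustrative - check the Bedrock pricing page for current, region-specific figures).

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class ModelUsage:
    input_tokens: int
    output_tokens: int
    input_cost_usd: float
    output_cost_usd: float


class AbstractModelUsageDriver(ABC):
    """Each supported model gets a driver that knows how to read its responses."""

    input_price_per_1k: float
    output_price_per_1k: float

    @abstractmethod
    def get_usage(self, response_body: dict) -> ModelUsage:
        """Extract token counts from a model response and price them."""


class Claude3HaikuUsageDriver(AbstractModelUsageDriver):
    # Illustrative on-demand prices per 1,000 tokens - these will drift.
    input_price_per_1k = 0.00025
    output_price_per_1k = 0.00125

    def get_usage(self, response_body: dict) -> ModelUsage:
        # Anthropic Messages API responses include a "usage" block with token counts.
        usage = response_body["usage"]
        return ModelUsage(
            input_tokens=usage["input_tokens"],
            output_tokens=usage["output_tokens"],
            input_cost_usd=usage["input_tokens"] / 1000 * self.input_price_per_1k,
            output_cost_usd=usage["output_tokens"] / 1000 * self.output_price_per_1k,
        )


class ModelUsageDriverManager:
    """Maps model IDs to the driver that understands their response format."""

    _drivers: dict[str, AbstractModelUsageDriver] = {
        "anthropic.claude-3-haiku-20240307-v1:0": Claude3HaikuUsageDriver(),
    }

    @classmethod
    def get_driver(cls, model_id: str) -> AbstractModelUsageDriver:
        return cls._drivers[model_id]
```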
Finally, we make changes to the Lambda function's `app.py` file. All we're doing here is making use of the `ModelUsageDriverManager` that we just implemented, retrieving the username of the person making the request, and sending the results to CloudWatch Metrics.
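The metric-publishing piece boils down to a `PutMetricData` call along these lines. The metric names, the environment variable name, and the choice to publish each figure once per dimension (so usage can be summed by principal and by model independently) are assumptions of this sketch.

```python
import os

import boto3

cloudwatch = boto3.client("cloudwatch")
NAMESPACE = os.environ["CLOUDWATCH_CUSTOM_METRIC_NAMESPACE"]  # assumed env var name


def publish_usage_metrics(principal_id: str, model_id: str, usage) -> None:
    # Publish each figure under each single dimension - this is also why
    # 4 metrics x 2 dimensions are counted as 8 chargeable metrics later on.
    metric_values = {
        "InputTokens": usage.input_tokens,
        "OutputTokens": usage.output_tokens,
        "InputTokenCost": usage.input_cost_usd,
        "OutputTokenCost": usage.output_cost_usd,
    }
    dimension_sets = [
        [{"Name": "PrincipalId", "Value": principal_id}],
        [{"Name": "ModelId", "Value": model_id}],
    ]
    cloudwatch.put_metric_data(
        Namespace=NAMESPACE,
        MetricData=[
            {"MetricName": name, "Dimensions": dimensions, "Value": value}
            for name, value in metric_values.items()
            for dimensions in dimension_sets
        ],
    )
```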
If you now make a request with your API key, you'll find that metrics relating to your model usage are stored in CloudWatch. You can verify this by going to CloudWatch Metrics and exploring the custom namespace. Metrics are recorded on the `PrincipalId` and `ModelId` dimensions to make it easy to analyse which principal (a username, in this case) is responsible for the most usage.
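As a taste of the apportionment this enables, you can sum one of the cost metrics for a single principal with a few lines of boto3 - the namespace, metric name, and principal value below are placeholders matching the earlier sketches.

```python
from datetime import datetime, timedelta, timezone

import boto3

cloudwatch = boto3.client("cloudwatch")

now = datetime.now(timezone.utc)
response = cloudwatch.get_metric_statistics(
    Namespace="BedrockCostApportionment",  # whatever your namespace parameter is set to
    MetricName="InputTokenCost",
    Dimensions=[{"Name": "PrincipalId", "Value": "alice"}],
    StartTime=now - timedelta(days=30),
    EndTime=now,
    Period=86400,  # one datapoint per day
    Statistics=["Sum"],
)

# Add up the daily sums to get this principal's input-token spend for the month.
total = sum(datapoint["Sum"] for datapoint in response["Datapoints"])
print(f"Input token spend for alice over the last 30 days: ${total:.4f}")
```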
Conclusion
In conclusion, we’ve covered end-to-end how to implement a proxy for invoking Amazon Bedrock models whilst recording model usage to improve cost apportionment functionality. I’d love to see AWS implement this concept as a first-party solution.
It is worth noting that CloudWatch Metrics isn't free to use. In the `eu-west-2` region, a custom metric costs $0.30 per month for the first 10,000 metrics (it gets cheaper beyond that), pro-rated hourly, so you're only charged in the hours that you publish metrics. Each metric name and dimension combination counts as a separate metric - so in our case, where we're publishing 4 metrics on 2 dimensions each, 8 metrics are counted as used. You can see how, as this solution scales, it could get expensive.
Let’s imagine a scenario where 100 engineers are using the proxy, across 2 Bedrock models. Let’s assume that they’re only using for 8 hours per day, and that there’s 20 working days in the month. This equates to 160 hours of usage. AWS charge $0.30 per month, equating to $0.0004109589041 hourly. Pro-rating this for our working hours, it results in a per-metric monthly cost of $0.06575342466. 100 engineers, using 2 models equates to 1600 metrics (100 engineers * 2 models * (4 metrics * 2 dimensions)
). This works out to be $105.20 per month in CloudWatch Metrics charges. Whilst CloudWatch Metrics definitely makes it easy to publish, visualise and alert on data, it’s also an expensive way to do so. I’ll be doing a follow-up post in future that looks at some more cost-effective ways to store and analyse the metrics without increasing complexity too much.
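For clarity, here's that back-of-the-envelope calculation as a few lines of Python (using AWS's 730-hour month for the hourly pro-rating).

```python
price_per_metric_month = 0.30               # first 10k custom metrics
hourly_rate = price_per_metric_month / 730  # ~$0.000411 per metric-hour

active_hours = 8 * 20                         # 8 hours/day, 20 working days
cost_per_metric = hourly_rate * active_hours  # ~$0.0658 per metric

engineers, models = 100, 2
metric_count = engineers * models * (4 * 2)   # 4 metric names x 2 dimensions each

print(metric_count, round(metric_count * cost_per_metric, 2))  # 1600 metrics, ~$105
```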
I hope you’ve found this post useful. If you wish to explore the code in its entirety, a GitHub repository is available.