AWS HTTP API and Python-Based Lambda Integration

I have been using AWS API Gateway with AWS Lambda integrations for many years, but only recently tried deploying them with SAM, using the newer HTTP API.

It is surprisingly much easier than it used to be, and in this blog post I hope to give a quick introduction to the SAM method of defining and deploying your API. I will also go into a little more detail on the Lambda side about how to extract POST data, as there are several ways in which your Lambda function may receive data.

Starting with a Simple Example

API Gateways can quickly become rather complex, so to keep things simple I will walk through an example that handles an HTTP POST request coming in via the API Gateway, which is then routed to AWS Lambda by means of a proxy integration.

There are a couple of very important points to note with this example:

So, at this point you may wonder: why even bother? The problem is that there are so many different use cases that it is very hard to cover all these concepts in one example and still call it "a simple example". However, I believe the template and example code provide a good starting point for anyone who wants to get their feet wet with a quick example to deploy and play around with. In that context, the example does its job well: it can be deployed in less than 2 minutes, and within 5 minutes you should have tested it and viewed the generated logs.

I have enabled a fair amount of logging so that you can see what a typical request looks like.

Keep in mind that this is an HTTP API using a proxy integration. There are various ways in which requests and data can be routed, but I believe this type of integration exposes all the information available at the time of the request to the Lambda function: the request headers, information about the client, the query string (if present), the HTTP method and any data submitted with the request. As mentioned, stage variables are also available.
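As a rough illustration of what that looks like in code, the sketch below pulls those pieces out of a payload format 2.0 event. The helper name describe_request() and the trimmed sample event are hypothetical; the key names follow the event logged further down in this post, and queryStringParameters is read defensively because the logged event (with an empty query string) does not contain it.

```python
from typing import Any, Dict


def describe_request(event: Dict[str, Any]) -> Dict[str, Any]:
    """Collect the commonly used parts of an HTTP API (payload format 2.0) event."""
    http = event.get("requestContext", {}).get("http", {})
    return {
        "method": http.get("method"),
        "path": http.get("path"),
        "source_ip": http.get("sourceIp"),
        "user_agent": http.get("userAgent"),
        # Header names appear lower-cased in the logged events (e.g. 'content-type')
        "headers": event.get("headers", {}),
        # Absent from the logged event, where the query string was empty
        "query": event.get("queryStringParameters") or {},
        "stage_variables": event.get("stageVariables") or {},
        "body": event.get("body"),
        "is_base64_encoded": event.get("isBase64Encoded", False),
    }


if __name__ == "__main__":
    # Hypothetical, trimmed-down event modelled on the logged example
    sample_event = {
        "version": "2.0",
        "routeKey": "POST /example",
        "rawQueryString": "",
        "headers": {"content-type": "application/json"},
        "requestContext": {
            "http": {
                "method": "POST",
                "path": "/sandbox/example",
                "sourceIp": "203.0.113.10",
                "userAgent": "curl/7.68.0",
            },
            "stage": "sandbox",
        },
        "stageVariables": {"StageVar": "Value"},
        "body": '{"Message": "Test123"}',
        "isBase64Encoded": False,
    }
    print(describe_request(sample_event))
```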

SAM Template

I used SAM in a previous blog post, so that should provide some additional background if required. Also consult the AWS Documentation if needed.

For this example, you can use the following SAM template:

AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: Example Template for AWS HTTP API to Python Lambda Function

Parameters:
  StageNameParameter:
    Type: String
    Description: The API Gateway Stage Name
    Default: sandbox

Resources:

  HttpApi:
    Type: AWS::Serverless::HttpApi
    Properties:
      StageName: !Ref StageNameParameter
      Tags:
        Tag: Value
      AccessLogSettings:
        DestinationArn: !GetAtt HttpApiAccessLogs.Arn
        Format: $context.stage $context.integrationErrorMessage $context.identity.sourceIp $context.identity.caller $context.identity.user [$context.requestTime] "$context.httpMethod $context.resourcePath $context.protocol" $context.status $context.responseLength $context.requestId $context.extendedRequestId
      DefaultRouteSettings:
        ThrottlingBurstLimit: 200
      RouteSettings:
        "POST /example":
          ThrottlingBurstLimit: 500 # overridden in HttpApi Event
      StageVariables:
        StageVar: Value
      FailOnWarnings: true

  HttpApiAccessLogs:
    Type: AWS::Logs::LogGroup
    Properties:
      RetentionInDays: 90

  ApiFunctionLogs:
    Type: AWS::Logs::LogGroup
    Properties:
      LogGroupName: !Sub /aws/lambda/${ApiFunction}
      RetentionInDays: 7

  ApiFunction: # Adds a POST endpoint at "/example" to HttpApi via an HttpApi event
    Type: AWS::Serverless::Function
    Properties:
      Events:
        ExplicitApi: # warning: creates a public endpoint
          Type: HttpApi
          Properties:
            ApiId: !Ref HttpApi
            Method: POST
            Path: /example
            TimeoutInMillis: 30000
            PayloadFormatVersion: "2.0"
            RouteSettings:
              ThrottlingBurstLimit: 600
      Runtime: python3.8
      Handler: function.handler
      CodeUri: path/to/src

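A note about the access log format: all the fields available for use are listed in the AWS documentation. I have loosely based the configuration on the Apache Common Log Format, but with some additional fields included. Not every field is populated for every request; in the sample access log further down, $context.integrationErrorMessage, $context.identity.caller, $context.identity.user and $context.resourcePath all appear as "-".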
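To see how the Format string maps onto an actual log line, here is a small parsing sketch. It assumes the exact field order configured in the template above and treats "-" as an empty value; parse_access_log_line() is a hypothetical helper, and an integration error message containing a "[" character would likely confuse it.

```python
import re
from typing import Dict, Optional

# Field order mirrors the AccessLogSettings Format in the SAM template (assumption:
# the template is used unchanged). The error message may contain spaces, so it is
# matched lazily up to the three tokens that precede the "[" of the timestamp.
ACCESS_LOG_PATTERN = re.compile(
    r'^(?P<stage>\S+) (?P<integration_error>.*?) '
    r'(?P<source_ip>\S+) (?P<caller>\S+) (?P<user>\S+) '
    r'\[(?P<request_time>[^\]]+)\] '
    r'"(?P<http_method>\S+) (?P<resource_path>\S+) (?P<protocol>\S+)" '
    r'(?P<status>\d{3}) (?P<response_length>\S+) '
    r'(?P<request_id>\S+) (?P<extended_request_id>\S+)$'
)


def parse_access_log_line(line: str) -> Optional[Dict[str, Optional[str]]]:
    """Split one access log line into named fields; '-' becomes None."""
    match = ACCESS_LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    return {name: (None if value == "-" else value)
            for name, value in match.groupdict().items()}


if __name__ == "__main__":
    sample = ('sandbox - NNN.NNN.NNN.NNN - - [11/Aug/2022:04:34:24 +0000] '
              '"POST - HTTP/1.1" 200 17 AAAAAAAAAAAAAAA= AAAAAAAAAAAAAAA=')
    print(parse_access_log_line(sample))
```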

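You will have to adapt CodeUri to the directory containing your Lambda source code. With Handler set to function.handler, Lambda looks for a handler() function in a file named function.py inside that directory.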

Lambda Template (Python)

The following Python code is rather long, but provides a good baseline template that you can adapt to suit your needs:

import base64
import json
import logging
import sys
from urllib.parse import parse_qs

# Methods whose requests are expected to carry a body
# (see https://developer.mozilla.org/en-US/docs/Web/HTTP/Methods)
BODY_METHODS = ('POST', 'PUT', 'DELETE')


def extract_post_data(event) -> str:
    """Return the request body as a string, or "" if there is none."""
    method = event.get('requestContext', {}).get('http', {}).get('method', '')
    if method.upper() not in BODY_METHODS:
        return ""
    body = event.get('body')
    if body is None:
        return ""
    # isBase64Encoded tells us whether API Gateway base64-encoded the body
    if event.get('isBase64Encoded') is True:
        body = base64.b64decode(body)
    if isinstance(body, bytes):
        body = body.decode('utf-8')
    return str(body)


def decode_data(event, body: str):
    """Decode JSON or form-encoded bodies; return any other body unchanged."""
    content_type = event.get('headers', {}).get('content-type', '').lower()
    if 'json' in content_type:
        return json.loads(body)
    if 'x-www-form-urlencoded' in content_type:
        # parse_qs() maps every field name to a list of values
        return parse_qs(body)
    return body


def get_logger(level=logging.INFO):
    """Configure the root logger with a single stdout handler."""
    logger = logging.getLogger()
    # Iterate over a copy: removing handlers from the live list would skip some
    for h in list(logger.handlers):
        logger.removeHandler(h)
    formatter = logging.Formatter('%(funcName)s:%(lineno)d -  %(levelname)s - %(message)s')
    ch = logging.StreamHandler(sys.stdout)
    ch.setLevel(level)
    ch.setFormatter(formatter)
    logger.addHandler(ch)
    logger.setLevel(level)
    return logger


def handler(event, context):
    logger = get_logger(level=logging.DEBUG)
    result = dict()
    # Response structure expected by the API Gateway proxy integration
    return_object = {
        'statusCode': 200,
        'headers': {
            'x-custom-header': 'my custom header value',
            'content-type': 'application/json',
        },
        'body': '',  # filled in with a JSON string below
        'isBase64Encoded': False,
    }

    logger.info('HANDLER CALLED')
    logger.debug('DEBUG ENABLED')
    logger.info('event={}'.format(event))

    # Raw body first, then decoded according to the content-type header
    body = extract_post_data(event=event)
    logger.info('body={}'.format(body))
    data = decode_data(event=event, body=body)
    logger.info('data={}'.format(data))

    result['message'] = 'ok'
    return_object['body'] = json.dumps(result)
    logger.info('HANDLER DONE')
    logger.info('result={}'.format(result))
    logger.info('return_object={}'.format(return_object))
    return return_object

There are a couple of steps to walk through.

When the API Gateway sends the proxy request to the Lambda function, all the request data is included in the event dictionary.

A basic return object is then set up with the structure the API Gateway expects: statusCode, headers, body and isBase64Encoded. Since this is a JSON API example, the function also returns JSON.
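If you end up returning responses from several places, it can help to build that structure in one spot. The helper below is a hypothetical sketch (make_response() is not part of the template); it mirrors the return object used in the handler, with the payload serialized via json.dumps() just as the handler does before returning.

```python
import json
from typing import Any, Dict, Optional


def make_response(status_code: int, payload: Any,
                  extra_headers: Optional[Dict[str, str]] = None) -> Dict[str, Any]:
    """Build a proxy integration response with a JSON-encoded body."""
    headers = {"content-type": "application/json"}
    if extra_headers:
        headers.update(extra_headers)
    return {
        "statusCode": status_code,
        "headers": headers,
        # The body is sent back as a string, so serialize the payload here
        "body": json.dumps(payload),
        "isBase64Encoded": False,
    }


if __name__ == "__main__":
    print(make_response(200, {"message": "ok"},
                        {"x-custom-header": "my custom header value"}))
```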

Next, the raw body data is extracted. This can be a little tricky at times, so the extract_post_data() function checks a number of things (the HTTP method, whether a body is present and whether it is base64-encoded) to do its best to extract any body data that may be available.

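Once the body data is available, the decode_data() function converts it based on the content-type header: a JSON body becomes a dict, while web form (URL-encoded) data is parsed with parse_qs(), which maps each field to a list of values. Any other content type is returned unchanged as a string.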
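Because parse_qs() wraps every value in a list (visible in the logged data={'param1': ['value1'], ...} line further down), you may prefer to collapse single values. A minimal sketch, where flatten_form() is a hypothetical helper and keeping lists only for repeated fields is a design choice rather than anything the API Gateway requires:

```python
from typing import Dict, List, Union
from urllib.parse import parse_qs


def flatten_form(parsed: Dict[str, List[str]]) -> Dict[str, Union[str, List[str]]]:
    """Collapse single-item lists from parse_qs() into plain strings."""
    return {key: values[0] if len(values) == 1 else values
            for key, values in parsed.items()}


form = parse_qs("param1=value1&param2=value2&tag=a&tag=b")
print(form)                # {'param1': ['value1'], 'param2': ['value2'], 'tag': ['a', 'b']}
print(flatten_form(form))  # {'param1': 'value1', 'param2': 'value2', 'tag': ['a', 'b']}
```

dict(parse_qsl(body)) is a shorter alternative, but it silently keeps only the last value of a repeated field.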

No further processing is done; the data objects are simply logged so that you can see the results. It is certainly possible to break the function, as there is very little in terms of error checking: an empty or malformed JSON body, for example, makes json.loads() raise an exception. If you do get an error when testing, the API Gateway access logs should show the error message, and the Lambda function's log group will contain the Python traceback.
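One way to add some error checking is to catch decoding failures and answer with a 400 response instead of letting the function fail. The sketch below assumes that behaviour is what you want; decode_body_safely() and error_response() are hypothetical names, and the content-type checks mirror decode_data().

```python
import json
from typing import Any, Dict, Optional, Tuple
from urllib.parse import parse_qs


def error_response(status_code: int, message: str) -> Dict[str, Any]:
    """Build a JSON error response in the same shape as the handler's return object."""
    return {
        "statusCode": status_code,
        "headers": {"content-type": "application/json"},
        "body": json.dumps({"message": message}),
        "isBase64Encoded": False,
    }


def decode_body_safely(content_type: str, body: str) -> Tuple[Any, Optional[Dict[str, Any]]]:
    """Return (data, None) on success or (None, error_response) on bad JSON."""
    content_type = (content_type or "").lower()
    try:
        if "json" in content_type:
            return json.loads(body), None
        if "x-www-form-urlencoded" in content_type:
            return parse_qs(body), None
    except json.JSONDecodeError as exc:
        return None, error_response(400, "invalid JSON body: {}".format(exc.msg))
    # Other content types are passed through unchanged
    return body, None
```

Inside the handler this could be used as data, error = decode_body_safely(event.get('headers', {}).get('content-type', ''), body), returning error straight away when it is not None.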

Deployment and Testing

Deployment is done with the following commands, which you can adjust to suit your needs:

# You need to set this
export AWS_PROFILE="..."

# You need to set this, for example eu-central-1
export AWS_REGION="..."

# You need to set this - example: sandbox
export DEPLOYMENT_ENV="..."

sam build

sam deploy --profile $AWS_PROFILE \
    --region $AWS_REGION \
    --stack-name myApiTest \
    --capabilities CAPABILITY_IAM \
    --parameter-overrides "ParameterKey=StageNameParameter,ParameterValue=$DEPLOYMENT_ENV" \
    --disable-rollback \
    --no-confirm-changeset \
    --config-env $DEPLOYMENT_ENV

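Since the Lambda function can parse both JSON and form data, you can call your endpoint with either of the following examples; just replace nnnnnnnn with your own API Gateway ID. Note that curl -d sends the data as application/x-www-form-urlencoded unless you set a different Content-Type header: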

curl -d "param1=value1&param2=value2" -X POST  https://nnnnnnnn.execute-api.eu-central-1.amazonaws.com/sandbox/example
{"message": "ok"}

curl -d '{"Message": "Test123"}' -H "Content-Type: application/json" -X POST https://nnnnnnnn.execute-api.eu-central-1.amazonaws.com/sandbox/example
{"message": "ok"}
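If you would rather test from Python than with curl, the standard library can send the same two requests. This is a sketch: API_URL keeps the nnnnnnnn placeholder (plus the eu-central-1 region and sandbox stage from the examples), the helper names are hypothetical, and the form request sets its Content-Type explicitly to match what curl -d sends by default.

```python
import json
import urllib.parse
import urllib.request

# Placeholder: replace nnnnnnnn (and the region/stage, if different) with your own values
API_URL = "https://nnnnnnnn.execute-api.eu-central-1.amazonaws.com/sandbox/example"


def form_request(url: str) -> urllib.request.Request:
    """POST param1/param2 as a web form, like the first curl example."""
    data = urllib.parse.urlencode({"param1": "value1", "param2": "value2"}).encode("utf-8")
    return urllib.request.Request(
        url, data=data, method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def json_request(url: str) -> urllib.request.Request:
    """POST a small JSON document, like the second curl example."""
    data = json.dumps({"Message": "Test123"}).encode("utf-8")
    return urllib.request.Request(
        url, data=data, method="POST",
        headers={"Content-Type": "application/json"},
    )


def send(request: urllib.request.Request) -> str:
    """Send the request and return the decoded response body."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8")
```

Once API_URL points at your deployment, print(send(form_request(API_URL))) and print(send(json_request(API_URL))) should each print the same {"message": "ok"} response as the curl calls.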

The access log for the requests should look something like this:

sandbox - NNN.NNN.NNN.NNN - - [11/Aug/2022:04:34:24 +0000] "POST - HTTP/1.1" 200 17 AAAAAAAAAAAAAAA= AAAAAAAAAAAAAAA=

The Lambda function log for the first request should look something like this:

START RequestId: aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa Version: $LATEST
handler:133 -  INFO - HANDLER CALLED
handler:135 -  INFO - event={'version': '2.0', 'routeKey': 'POST /example', 'rawPath': '/sandbox/example', 'rawQueryString': '', 'headers': {'accept': '*/*', 'content-length': '27', 'content-type': 'application/x-www-form-urlencoded', 'host': 'nnnnnnnn.execute-api.eu-central-1.amazonaws.com', 'user-agent': 'curl/7.68.0', 'x-amzn-trace-id': 'Root=xxx', 'x-forwarded-for': 'NNN.NNN.NNN.NNN', 'x-forwarded-port': '443', 'x-forwarded-proto': 'https'}, 'requestContext': {'accountId': '000000000000', 'apiId': 'nnnnnnnn', 'domainName': 'nnnnnnnn.execute-api.eu-central-1.amazonaws.com', 'domainPrefix': 'nnnnnnnn', 'http': {'method': 'POST', 'path': '/sandbox/example', 'protocol': 'HTTP/1.1', 'sourceIp': 'NNN.NNN.NNN.NNN', 'userAgent': 'curl/7.68.0'}, 'requestId': 'AAAAAAAAAAAAAAA=', 'routeKey': 'POST /example', 'stage': 'sandbox', 'time': '11/Aug/2022:04:33:57 +0000', 'timeEpoch': 1660192437357}, 'stageVariables': {'StageVar': 'Value'}, 'body': 'cGFyYW0xPXZhbHVlMSZwYXJhbTI9dmFsdWUy', 'isBase64Encoded': True}
handler:138 -  INFO - body=param1=value1&param2=value2
handler:140 -  INFO - data={'param1': ['value1'], 'param2': ['value2']}
handler:144 -  INFO - HANDLER DONE
handler:145 -  INFO - result={'message': 'ok'}
handler:146 -  INFO - return_object={'statusCode': 200, 'headers': {'x-custom-header': 'my custom header value', 'content-type': 'application/json'}, 'body': '{"message": "ok"}', 'isBase64Encoded': False}
END RequestId: aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa
REPORT RequestId: aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa  Duration: 19.34 ms  Billed Duration: 20 ms  Memory Size: 128 MB Max Memory Used: 52 MB  Init Duration: 254.68 ms    
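In this first logged event the form body arrived base64-encoded (isBase64Encoded is True), whereas the JSON body in the second request arrives as plain text. Decoding the logged body by hand shows what extract_post_data() and decode_data() do with it; the two values at the top are copied straight from the log.

```python
import base64
from urllib.parse import parse_qs

# 'body' and 'isBase64Encoded' copied from the first logged event
logged_body = "cGFyYW0xPXZhbHVlMSZwYXJhbTI9dmFsdWUy"
is_base64_encoded = True

text = base64.b64decode(logged_body).decode("utf-8") if is_base64_encoded else logged_body
print(text)            # param1=value1&param2=value2
print(len(text))       # 27, matching the content-length header in the event
print(parse_qs(text))  # {'param1': ['value1'], 'param2': ['value2']}
```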

The second request log entry looks like this:

START RequestId: aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa Version: $LATEST
handler:133 -  INFO - HANDLER CALLED
handler:135 -  INFO - event={'version': '2.0', 'routeKey': 'POST /example', 'rawPath': '/sandbox/example', 'rawQueryString': '', 'headers': {'accept': '*/*', 'content-length': '22', 'content-type': 'application/json', 'host': 'nnnnnnnn.execute-api.eu-central-1.amazonaws.com', 'user-agent': 'curl/7.68.0', 'x-amzn-trace-id': 'Root=xxx', 'x-forwarded-for': 'NNN.NNN.NNN.NNN', 'x-forwarded-port': '443', 'x-forwarded-proto': 'https'}, 'requestContext': {'accountId': '000000000000', 'apiId': 'nnnnnnnn', 'domainName': 'nnnnnnnn.execute-api.eu-central-1.amazonaws.com', 'domainPrefix': 'nnnnnnnn', 'http': {'method': 'POST', 'path': '/sandbox/example', 'protocol': 'HTTP/1.1', 'sourceIp': 'NNN.NNN.NNN.NNN', 'userAgent': 'curl/7.68.0'}, 'requestId': 'AAAAAAAAAAAAAAA=', 'routeKey': 'POST /example', 'stage': 'sandbox', 'time': '11/Aug/2022:04:34:24 +0000', 'timeEpoch': 1660192464915}, 'stageVariables': {'StageVar': 'Value'}, 'body': '{"Message": "Test123"}', 'isBase64Encoded': False}
handler:138 -  INFO - body=
{
    "Message": "Test123"
}

handler:140 -  INFO - data={'Message': 'Test123'}
handler:144 -  INFO - HANDLER DONE
handler:145 -  INFO - result={'message': 'ok'}
handler:146 -  INFO - return_object={'statusCode': 200, 'headers': {'x-custom-header': 'my custom header value', 'content-type': 'application/json'}, 'body': '{"message": "ok"}', 'isBase64Encoded': False}
END RequestId: aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa
REPORT RequestId: aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa  Duration: 10.44 ms  Billed Duration: 11 ms  Memory Size: 128 MB Max Memory Used: 52 MB  

Where to from here...

This only scratches the surface. APIs have a lot of options, and some of the other topics you may want (or need) to consider include:

I hope this quick introduction gave you a point of reference to start exploring on your own!

Tags

aws, lambda, python, api, proxy, integration