Introduction to AWS State Machines for decoupling Lambda functions

If you’re familiar with AWS (Amazon Web Services), you have probably come across AWS Lambda functions, where you can run serverless code. For heavy workloads, however, Lambda functions are often not the best choice because of their limits on memory and execution time. You might therefore be tempted to deploy your code on EC2 (or ECS) instances for the extra power they provide. But even if that code is logically decoupled (or at least loosely coupled), it is still physically coupled, because all of it runs on the same machine.

If you can illustrate your algorithm in a more decoupled manner, for instance as a flowchart diagram, here is your chance to use AWS Step Functions.

What is a Step Function?

AWS Step Functions is a service that lets you coordinate your distributed micro-services (e.g. Lambda functions) as a workflow that can easily be represented visually.

So, we can describe a Step Function as a set of properly arranged micro-services that still run independently while working towards a common end goal. (You can think of them as small building blocks, like LEGO bricks.)

Here are some of the features and benefits you gain by using Step Functions:

- You can visualize your algorithm/process as a diagram consisting of different steps/states.
- You don’t have to explicitly call the next step(s) from your micro-service code.
  - This lets you plug in new micro-services easily and scale faster.
  - It also makes your micro-services more independent and individually testable.
- Instead of (or in addition to) monitoring individual micro-services, you can now monitor them as parts of a single process.
  - This means you can monitor the final outcome of the process along with the individual micro-services.

When to use a Step Function?

When you can clearly see interdependent micro-services calling one another creating a logical flow of work.
When you can clearly see that your micro-services consist of several independent functions which can be separated into new micro-services.

When not to use a Step Function

When you don’t see a logical dependency between the input/outputs of your micro-services.

For more information on Step Functions, please follow the below links.

Getting Started

To ease the explanation, let’s think of an example scenario where you can apply a Step Function.

Simple Image Classifier

Introduction

We need to build a system which will classify a given image into one of the categories Human, Animal or Building. It will do some dark magic (i.e. Image Processing and Machine Learning) to assign a category to the image, along with a probability.

If the probability is less than 75%, we should fail.
If it’s a Human, it should process the image further and try to detect whether it’s Male, Female or Unidentifiable.
If the image is of an Animal, it should provide the closest animal type (e.g. Cat, Dog, Horse etc.).
If it’s a Building, no further processing is needed.
If the given image doesn’t fall into any of those three categories, it should throw an error.
Upon successfully extracting the information, it should store the information in an RDS instance.
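
The rules above can be sketched as a single routing function before any AWS wiring is involved. This is only an illustration: the field names image_category and probability match the ones used in the state machine definition later in this article, while the function name next_step and the returned step labels are assumptions made for this sketch.

```python
# Sketch of the business rules as plain Python (no AWS involved).
# `next_step` and the step labels are hypothetical names for illustration.

MIN_PROBABILITY = 0.75  # classifications below this threshold are rejected


def next_step(classification: dict) -> str:
    """Return the name of the step that should handle this classification."""
    if classification["probability"] < MIN_PROBABILITY:
        return "Fail"

    category = classification["image_category"]
    if category == "Human":
        return "GenderClassifier"
    if category == "Animal":
        return "AnimalClassifier"
    if category == "Building":
        return "Persist"  # no further processing needed
    return "Fail"  # unknown category


print(next_step({"image_category": "Animal", "probability": 0.9}))
```

Writing the rules down this way makes it easier to see that the routing decision depends only on two values, which is exactly what a Choice state can evaluate.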

Visual Representation

Whether or not you use a Step Function, it’s always helpful to illustrate this flow as a diagram.

Logical Data Flow of the Business Requirement

Note: We will NOT talk about how to implement the logic behind classifiers.

According to the above diagram, we can create four Lambda functions for the states.

Download the image & call the generic image classifier.
Call the gender classifier.
Call the animal classifier.
Store data to RDS.

However, the Fail state doesn’t need a code-level implementation, because it simply represents a failure state of the flow (State Machine).

AWS Step Function

We can consider the above diagram as a representation of an AWS Step Function, where each of the nodes represents a State and the relationship between them is defined using a State Machine.

State Machine

If you’re familiar with the concept of State Machines in Computer Science, you’ll already know the basics of how an AWS State Machine works. A State Machine (specifically, a Finite State Machine) will take an input and process it using different steps according to the value passed from the previous step.

ImageClassifierStatesExecutionRole:
  Type: "AWS::IAM::Role"
  Properties:
    AssumeRolePolicyDocument:
      Version: "2012-10-17"
      Statement:
        - Effect: "Allow"
          Principal:
            Service:
              - !Sub states.${AWS::Region}.amazonaws.com
          Action: "sts:AssumeRole"
    Path: "/"
    Policies:
      - PolicyName: StatesExecutionPolicy
        PolicyDocument:
          Version: "2012-10-17"
          Statement:
            - Effect: Allow
              Action:
                - "lambda:InvokeFunction"
              Resource: "*"
ImageClassifierStateMachine:
  Type: "AWS::StepFunctions::StateMachine"
  Properties:
    StateMachineName: image-classifier-state-machine
    DefinitionString:
      !Sub
        - |-
          {
            "Comment": "This state machine will perform image classification.",
            "StartAt": "InitialState",

            "States": {
              // states definition
            }
          }
        - {
              // variable imports for formatting
          }
    RoleArn: !GetAtt [ ImageClassifierStatesExecutionRole, Arn ]

To define the relationships between the states, AWS provides the Amazon States Language, a JSON-based language.

See the official documentation for the AWS State Machines from this link.

States

A state represents a single step in a state machine: it takes an input, performs some work with it, and then returns a value for the next state. A state can be one of the following types.

- Pass: Simply passes its input to its output, without doing any work.
- Task: Represents a single unit of work performed by a state machine.
- Choice: Adds branching to the logic, routing the flow according to the input.
- Wait: Delays the execution of the state machine by a specified duration.
- Succeed: Stops the execution of the state machine successfully.
- Fail: Terminates the execution of the state machine with an error.
- Parallel: Creates parallel branches of execution within the state machine.

For our task, we will need the state types of Task, Choice, Succeed and Fail.

State Machine Definition

Let’s imagine you already have four Lambda functions with the logical IDs:

GenericClassifierLambda
GenderClassifierLambda
AnimalClassifierLambda
FinalizeLambda

ImageClassifierStateMachine:
  Type: "AWS::StepFunctions::StateMachine"
  Properties:
    StateMachineName: image-classifier-state-machine
    DefinitionString:
      !Sub
        - |-
          {
            "Comment": "This state machine will perform image classification.",
            "StartAt": "InitialState",

            "States": {
              "InitialState": {
                "Type" : "Task",
                "Resource": "${GenericClassifierLambdaArn}",
                "Next": "RouteState"
              },

              "RouteState": {
                "Type" : "Choice",
                "Choices": [
                  {
                    "Variable": "$.probability",
                    "NumericLessThan": 0.75,
                    "Next": "FailState"
                  },
                  {
                    "And" : [
                      {
                        "Variable": "$.image_category",
                        "StringEquals": "Human"
                      },
                      {
                        "Variable": "$.probability",
                        "NumericGreaterThanEquals": 0.75
                      }
                    ],
                    "Next": "GenderClassifierState"
                  },
                  {
                    "And" : [
                      {
                        "Variable": "$.image_category",
                        "StringEquals": "Animal"
                      },
                      {
                        "Variable": "$.probability",
                        "NumericGreaterThanEquals": 0.75
                      }
                    ],
                    "Next": "AnimalClassifierState"
                  },
                  {
                    "And" : [
                      {
                        "Variable": "$.image_category",
                        "StringEquals": "Building"
                      },
                      {
                        "Variable": "$.probability",
                        "NumericGreaterThanEquals": 0.75
                      }
                    ],
                    "Next": "PersistState"
                  }
                ],
                "Default": "FailState"
              },

              "GenderClassifierState": {
                "Type" : "Task",
                "Resource": "${GenderClassifierLambdaArn}",
                "Next": "PersistState"
              },

              "AnimalClassifierState": {
                "Type" : "Task",
                "Resource": "${AnimalClassifierLambdaArn}",
                "Next": "PersistState"
              },

              "PersistState": {
                "Type": "Task",
                "Resource": "${FinalizeLambdaArn}",
                "Next": "SuccessState"
              },

              "SuccessState": {
                "Type": "Succeed"
              },

              "FailState": {
                "Type": "Fail",
                "Error": "FailStateError",
                "Cause": "Image couldn't be classified!"
              }
            }
          }
        - {
            GenericClassifierLambdaArn: {"Fn::ImportValue" : "GenericClassifierLambdaArn"},
            GenderClassifierLambdaArn: {"Fn::ImportValue" : "GenderClassifierLambdaArn"},
            AnimalClassifierLambdaArn: {"Fn::ImportValue" : "AnimalClassifierLambdaArn"},
            FinalizeLambdaArn: {"Fn::ImportValue" : "FinalizeLambdaArn"}
          }
    RoleArn: !GetAtt [ ImageClassifierStatesExecutionRole, Arn ]

NOTE: The Lambda ARNs referenced by the Fn::ImportValue functions must be exported from the CloudFormation stack that defines the Lambda functions.
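
Because every transition in the definition above is just a string, a misspelled state name is easy to miss until deployment. As a rough local sanity check (not a replacement for the validation AWS performs), the sketch below lists every transition that points at an undefined state. The helper name find_dangling_transitions is hypothetical, the sample definition is a trimmed-down variant with a deliberate mistake, and the check only covers StartAt, Next, Default and top-level Choices, not Parallel branches or error handlers.

```python
# Local sanity check for a state machine definition (a sketch, not AWS validation).
# `find_dangling_transitions` is a hypothetical helper name.


def find_dangling_transitions(definition: dict) -> list:
    """Return (source, target) pairs whose target state is not defined."""
    states = definition["States"]
    transitions = [("StartAt", definition["StartAt"])]

    for name, state in states.items():
        if "Next" in state:
            transitions.append((name, state["Next"]))
        if "Default" in state:
            transitions.append((name, state["Default"]))
        # Only top-level choice rules carry a "Next"; rules nested in "And" do not.
        for rule in state.get("Choices", []):
            transitions.append((name, rule["Next"]))

    return [(src, dst) for src, dst in transitions if dst not in states]


# A trimmed-down variant of the definition, with one deliberate mistake.
definition = {
    "StartAt": "InitialState",
    "States": {
        "InitialState": {"Type": "Task", "Resource": "arn:placeholder", "Next": "RouteState"},
        "RouteState": {
            "Type": "Choice",
            "Choices": [
                {"Variable": "$.probability", "NumericLessThan": 0.75, "Next": "FailState"}
            ],
            "Default": "PersistState",  # not defined below, so it is reported
        },
        "FailState": {"Type": "Fail"},
    },
}

print(find_dangling_transitions(definition))  # [('RouteState', 'PersistState')]
```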

Visual Representation of the State Machine in the AWS Web Console

Let’s assume the input we get to the State Machine is as below.

{
    "url" : "http://somewebsite.com/random_image.jpg"
}

The starting state will receive this value wrapped in an “input” JSON key.

{
    "input" : {
        "url" : "http://somewebsite.com/random_image.jpg"
    }
}

However, only the body of “input” is passed to the Lambda function as its event. If the handler for your InitialState is named handle, the event will contain only the JSON object with url.

import logging


def handle(event: dict, context) -> dict:
    logging.debug("Input : %s", event)
    # Do some dark magic and get the image classification.
    # _get_classification is a placeholder for your own classifier logic.
    image_classification = _get_classification(event)
    return image_classification

It’s best for image_classification to be a dict, since the State Machine will pass it on to the next step as that step’s input.
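
To make this hand-off concrete, the sketch below chains two handlers locally in the same way the State Machine would: the dict returned by the first becomes the event of the second. Both handler bodies are hypothetical stand-ins (the hard-coded classification and the gender field are assumptions), since the classifier logic is out of scope here.

```python
# Local illustration of how output flows from one state to the next.
# Both handlers are hypothetical stand-ins with hard-coded results.


def generic_handle(event: dict, context) -> dict:
    # A real implementation would download event["url"] and classify the image.
    return {"url": event["url"], "image_category": "Human", "probability": 0.92}


def gender_handle(event: dict, context) -> dict:
    # `event` is exactly what the previous state returned.
    result = dict(event)
    result["gender"] = "Unidentifiable"  # placeholder for a real classifier
    return result


first_output = generic_handle({"url": "http://somewebsite.com/random_image.jpg"}, None)
second_output = gender_handle(first_output, None)
print(second_output["image_category"], second_output["gender"])
```

Neither handler calls the other; the chaining at the bottom only mimics what the State Machine does between states.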

As you can see, calling the Lambda functions is handled by the State Machine, and no Lambda function is aware of the others, which keeps the dependencies between them to a minimum.

Parameter Filtering

If you want to filter the input/output values between the states, you can add Paths to the states. All of the below paths default to $, which represents the entire input/output.

- InputPath: Selects the portion of the input that is passed to the state.
- ResultPath: Defines the node in the state’s input under which the returned value is added.
- OutputPath: Selects the portion of the result that is passed to the next state.
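
The interplay of the three paths can be emulated locally. The following is a simplified sketch, assuming only plain dotted paths such as $ or $.a.b (not full JSONPath); the helper names _get, _set and run_state are hypothetical, and the sample task result is made up.

```python
import copy


def _get(data, path):
    """Resolve a simple reference path such as "$" or "$.a.b"."""
    if path == "$":
        return data
    node = data
    for key in path[2:].split("."):
        node = node[key]
    return node


def _set(data, path, value):
    """Return a copy of `data` with `value` placed at `path` ("$" replaces everything)."""
    if path == "$":
        return value
    result = copy.deepcopy(data)
    node = result
    keys = path[2:].split(".")
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value
    return result


def run_state(raw_input, task, input_path="$", result_path="$", output_path="$"):
    task_input = _get(raw_input, input_path)             # InputPath: what the task sees
    task_result = task(task_input)                       # the work itself
    combined = _set(raw_input, result_path, task_result) # ResultPath: merge into the state input
    return _get(combined, output_path)                   # OutputPath: what the next state sees


raw_input = {"url": "http://somewebsite.com/random_image.jpg"}
classify = lambda event: {"image_category": "Human", "probability": 0.9}

print(run_state(raw_input, classify))
print(run_state(raw_input, classify, result_path="$.classification"))
```

With the defaults, the task result replaces the input entirely; with a ResultPath of $.classification, the original url survives next to the result, which is useful when a later state still needs it.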

Check the AWS documentation on input/output processing for more information.

Make sure that the input each state/task expects matches the output of the previous state.

Testing the Step Function

After you deploy your Lambdas and then the State Machine, testing is pretty straightforward. You can either use the AWS Web Console or any SDKs to call a Step Function.

In the web console, go to the Step Function page, where you will see a “Start Execution” button. When you click it, you will be prompted to provide the desired input; then click “Start Execution”.

Executing a Step Function
(source: https://docs.aws.amazon.com/step-functions/latest/dg/tutorial-creating-lambda-state-machine.html#create-lambda-state-machine-step-5)

You should provide a unique Execution ID for the execution, or you can leave it blank and AWS will generate a unique ID for you.

If you’re using an SDK (e.g. boto3), you can use a stepfunctions client to start an execution.

import json

import boto3

client = boto3.client('stepfunctions')
client.start_execution(
    stateMachineArn="ARN_for_the_step_function",
    # The input must be a JSON string, not a dict.
    input=json.dumps({"url": "http://somewebsite.com/random_image.jpg"})
)
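
Starting an execution is asynchronous, so a test usually also needs to wait for the result. One possible approach, sketched below, is to poll describe_execution until the execution reaches a terminal status. The function name run_and_wait, the polling interval and the timeout are assumptions for this sketch; the client is passed in as a parameter so the function can be exercised with a stub in unit tests.

```python
import json
import time

# Statuses after which an execution will no longer change.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "TIMED_OUT", "ABORTED"}


def run_and_wait(client, state_machine_arn, payload, poll_seconds=2.0, timeout_seconds=300.0):
    """Start an execution and poll until it finishes (hypothetical helper)."""
    started = client.start_execution(
        stateMachineArn=state_machine_arn,
        input=json.dumps(payload),  # the API expects a JSON string
    )
    execution_arn = started["executionArn"]
    deadline = time.monotonic() + timeout_seconds

    while True:
        description = client.describe_execution(executionArn=execution_arn)
        if description["status"] in TERMINAL_STATUSES:
            return description
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Execution {execution_arn} did not finish in time")
        time.sleep(poll_seconds)


# Usage (requires AWS credentials, so it is left as a comment):
# import boto3
# result = run_and_wait(
#     boto3.client("stepfunctions"),
#     "ARN_for_the_step_function",
#     {"url": "http://somewebsite.com/random_image.jpg"},
# )
# print(result["status"])
```

For a successful execution, the returned description also carries the final output of the State Machine as a JSON string under the output key.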

See more on the stepfunctions client in the official documentation.

Summary

Step Functions are a powerful way of coordinating your micro-services (Lambdas). If you can visualize your business flow as a state machine, the best approach is to separate the responsibilities into different Lambdas, if you haven’t already, and then declare a State Machine to coordinate them. This gives you the flexibility to replace a Lambda function easily without affecting the others, and lets you test each Lambda individually.