Using a custom model for ML inference with Amazon SageMaker

Recently, we started working on a project at trivago that requires CNNs (Convolutional Neural Networks) to make predictions about images. Since the already available models did not give acceptable results, we used a modified version of VGG16 instead. In this article, I will explain how we set up Amazon SageMaker to serve a custom model with the custom inference code we created.


What we will not discuss here:

  • What the use case is, because I won’t 😛
  • How to customize the model
  • How to train your dragon 😛
  • How to create a custom Docker image

There are three main components you need to create to make this work.

  1. Model
    This is the pre-trained model we want to use for our inferencing.
  2. Endpoint Config
    This defines the configurations for the SageMaker hosting service where your model is hosted.
  3. Endpoint
    This defines the entry point for the model, which external services can call to get the inferences.


Behind the scenes

When the components are set up as above, the Amazon SageMaker service takes care of the following:

  1. Read the Endpoint configuration and decide the required ML Instance Type and the initial instance count.
  2. Create the ML instance(s) and load the specified Docker image into them.
  3. Download the specified model (which MUST be a tar.gz file) and extract it to the /opt/ml/model/ directory.
  4. Run docker run image_id serve; your image MUST be able to handle that.
    • Your docker image should expose /invocations and /ping on port 8080.
  5. Then users/programs can use /ping for health checks and /invocations to get the inferences.

See their official documentation for more/less information.
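
The container contract described above can be sketched with nothing but the Python standard library. This is only an illustration of the /ping and /invocations routes and the serve argument, not the stack we actually used (that was nginx, gunicorn and flask); the predict function and its response format are hypothetical placeholders.

```python
import json
import sys
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(payload: bytes) -> dict:
    """Hypothetical stand-in for real inference code: reports the input size."""
    return {"received_bytes": len(payload)}


class InferenceHandler(BaseHTTPRequestHandler):
    def _reply(self, status: int, body: bytes = b"") -> None:
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        # Health check: answer /ping with HTTP 200 when the container is healthy.
        self._reply(200 if self.path == "/ping" else 404)

    def do_POST(self):
        if self.path != "/invocations":
            self._reply(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        result = predict(self.rfile.read(length))
        self._reply(200, json.dumps(result).encode("utf-8"))

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def make_server(port: int = 8080, host: str = "0.0.0.0") -> HTTPServer:
    """Build the HTTP server; SageMaker talks to the container on port 8080."""
    return HTTPServer((host, port), InferenceHandler)


def main(argv) -> None:
    # SageMaker starts the container as: docker run image_id serve
    if len(argv) > 1 and argv[1] == "serve":
        make_server().serve_forever()


if __name__ == "__main__":
    main(sys.argv)
```

For real traffic you would put a production server in front of the inference code, as we did.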

Configure using the AWS Console

SageMaker Model

This represents the custom inferencing module and has two main parts: the custom model and the Docker image containing the custom code.

The model should be hosted in an S3 bucket and it MUST be a tar.gz file containing a .hd5 file. For example, if your model is named predict_fifa_18_v1.hd5, make sure that you have a predict_fifa_18_v1.tar.gz in your S3 bucket which contains your model file.

But why?

Because that’s how SageMaker is built: it reads the specified file, untars it and copies the contents into the /opt/ml/model/ directory.
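
As a rough sketch, the archive can be produced with Python's tarfile module. The function names here are made up for illustration, and the file names simply reuse the example above.

```python
import tarfile
from pathlib import Path


def package_model(model_path: str, archive_path: str) -> str:
    """Write the model file into a gzip-compressed tar archive."""
    model = Path(model_path)
    with tarfile.open(archive_path, "w:gz") as archive:
        # arcname keeps only the file name, so the model extracts directly
        # into /opt/ml/model/ instead of into a nested directory.
        archive.add(str(model), arcname=model.name)
    return archive_path


def list_archive(archive_path: str) -> list:
    """Return the member names, to double-check the archive before uploading."""
    with tarfile.open(archive_path, "r:gz") as archive:
        return archive.getnames()


# Usage (paths are examples):
# package_model("predict_fifa_18_v1.hd5", "predict_fifa_18_v1.tar.gz")
# print(list_archive("predict_fifa_18_v1.tar.gz"))
```

Listing the archive before uploading it to S3 is a cheap way to catch a wrong file name early.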

If you’re creating a model using the AWS console GUI, this is pretty simple.

  1. Log in to your AWS console account (and make sure you’re in the desired region).
  2. Go to Amazon SageMaker (via the Services menu).
    Since we’re trying to build our own inferencing model, we can ignore the Notebook and Training sections and go straight to the Inference section.

    [Screenshot: Amazon SageMaker Dashboard]
  3. Under the Inference section, click on Models and you will be directed to the SageMaker models view. It should be empty if you haven’t already created any models (if you have, why are you even reading this article? 😛 ).
  4. Click on the Create model button on the top-right corner.
  5. Now you have to give the correct parameters to create a model.

    [Screenshot: Create SageMaker Model window]

    Let’s have a look at the parameters it requires.

    • Model Settings
      • Model Name
        Nothing fancy, just give a recognizable name to your model.
      • IAM Role
        This defines the IAM role which is used to operate this model. If you already have a role which can do so, you can select it from the list. But ideally, it’s always better to create one for each model.

        [Screenshot: Select the IAM Role]

        When you click on Create a new role, it will give you a pop-up window to configure your new role.

        [Screenshot: Create IAM Role]

        As you can see, you can restrict the role to a particular S3 bucket. Specifying the name of the bucket where your custom model is hosted is a good practice. The other options are self-explanatory.
        We will talk about the exact permissions needed when we get to the CloudFormation template for this.

    • Network
      • VPC  (Virtual Private Cloud)
        If you have an AWS VPC configured and you want to use it with this model, you can specify the VPC you want. If you don’t (in most cases), you can ignore it.
    • Primary Container
      This is where we specify the details about our custom (docker) image and the model to use.

      • Location of inference code image
        This is the name of your (Docker) image in AWS ECR. To know more about how to create the custom inference image, please read the “Creating the Custom Inference Image” section. To get your image ID, go to ECS from the Services menu and select Repositories under Amazon ECR from the left pane.
        An image ID has the form <account-id>.dkr.ecr.<region>.amazonaws.com/<repository-name>:1.0, where 1.0 is the tag of the Docker image you want SageMaker to deploy. (The Docker documentation has more on tags.)
      • Location of model artifacts – optional
        This is the location of your custom model, which usually is an S3 link.
        You can get the link from the S3 bucket.

        [Screenshot: Link for the model in the S3 bucket]

        Copy the link and paste it as the location. Make sure that the file is a tar.gz file and that it contains the model (.hd5) file (with the same file name).

      • Container hostname – optional
        If you need to specify a separate DNS, you can do it here. But you can ignore it for now.
    • Tags
      You can specify the Tags for this model which will be useful when searching the models and for billing (if you have many models).
  6. After specifying everything, hit Create model and it will start creating your model (template).

Endpoint Configuration

After your model is created successfully, you will get a notification at the top of the screen, prompting you to Create Endpoint straightaway for your model. You can either click on that link, which takes you to the Create endpoint window where you can create a new Endpoint Configuration as well, or click on Endpoint configurations on the left pane and then on the Create endpoint configuration button in the top-right corner.

Either way, when you reach the Create endpoint configuration window, it will look like this.

[Screenshot: Create endpoint configuration]
  • New endpoint configuration
    • Endpoint configuration name
      Nothing fancy, just enter a recognizable name for your endpoint configuration.
    • Encryption key – optional
      If you want to encrypt your data on the storage volume attached to the ML compute instance (that means your Docker container), you can specify an AWS KMS key in here.
  • Production variants
    In here, you can specify the SageMaker Model(s) that you will be using with this Endpoint Configuration, by clicking the Add model link.
    The nice thing about having multiple models is that you can split the traffic between them.
    But why?
    Imagine you already have a working model and (after making some improvements to it) you want to C-Test a new model. For this, you can deploy the new model as a new SageMaker Model and divert 10% or 20% of the traffic to it (while serving 90% or 80% from the older one). This way, you’re not migrating to the new model all at once, but gradually, by changing the weights of the load balancing. (See InitialVariantWeight in the ProductionVariant documentation)
  • Tags
    This serves the same purpose as the tags you added for the model.

Finally, click Create endpoint configuration and it will create the configuration for you.
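
To make the weights concrete: each variant receives its weight divided by the sum of all weights. The sketch below computes those shares for two variants; the variant and model names are hypothetical.

```python
def traffic_shares(variants):
    """Map each variant name to its fraction of the traffic (weight / total weight)."""
    total = sum(v["InitialVariantWeight"] for v in variants)
    return {v["VariantName"]: v["InitialVariantWeight"] / total for v in variants}


# Hypothetical production variants: 90% of the traffic stays on the current model.
variants = [
    {"VariantName": "current-model", "ModelName": "predict-fifa-18-v1", "InitialVariantWeight": 9.0},
    {"VariantName": "new-model", "ModelName": "predict-fifa-18-v2", "InitialVariantWeight": 1.0},
]

for name, share in traffic_shares(variants).items():
    print(f"{name}: {share:.0%}")
```

Only the ratio matters, so weights of 9 and 1 behave the same as 0.9 and 0.1.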


Endpoint

This acts as an endpoint to your inference service, which is running on your Docker image and exposes the /invocations and /ping endpoints to the external world.

Whether you selected Create endpoint after creating the model (from the notification you get), or you click on Endpoints on the left pane and then on the Create endpoint button at the top-right, you will reach the Create endpoint window, which looks like the one below.

[Screenshot: Create and configure endpoint]
  • Endpoint
    • Endpoint name
      You know the drill!
  • Attach endpoint configuration
    In here, you have to specify the SageMaker Endpoint Configuration you want to use with this Endpoint.

    • Use an existing endpoint configuration
      If you have already created a configuration (from the previous step or you somehow have an existing configuration), you can select it from the list below.
    • Create a new endpoint configuration
      If you have skipped the Create endpoint configuration step (either by directly clicking the Create endpoint link after creating the model or for some other reason), you can do it here as well. It’s similar to creating an endpoint configuration as explained in the previous section. To proceed, you should select the desired configuration and click on Select endpoint configuration.
  • Tags
    Same as the previously mentioned tags.

After you click on the Create endpoint button, you have enough time to make yourself a coffee and come back, because it will take around 10 minutes. Unfortunately, if your configurations are wrong, the errors will only be thrown at the end of this timeframe. So, be prepared to test your patience. And don’t forget to check the “Common Errors” section of this article to reduce the number of errors you get.

If you did everything right, you will see the status of the endpoint as InService. If it failed, you can check the logs, which might give you some idea about what went wrong.
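
If you would rather not keep refreshing the console, the status can also be polled through the SDK. The helper below is a sketch: it expects a boto3 SageMaker client to be passed in, and the function name, polling interval and retry count are arbitrary choices of mine.

```python
import time


def wait_for_endpoint(client, endpoint_name, poll_seconds=30, max_polls=60):
    """Poll describe_endpoint until the endpoint is InService or has failed."""
    for _ in range(max_polls):
        description = client.describe_endpoint(EndpointName=endpoint_name)
        status = description["EndpointStatus"]
        if status == "InService":
            return description
        if status == "Failed":
            # FailureReason is filled in by SageMaker when creation fails.
            raise RuntimeError(description.get("FailureReason", "unknown failure"))
        time.sleep(poll_seconds)
    raise TimeoutError(f"{endpoint_name} was not InService after {max_polls} polls")


# Usage (requires boto3 and AWS credentials; the endpoint name is an example):
# import boto3
# wait_for_endpoint(boto3.client("sagemaker"), "my-endpoint")
```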

You CAN NOT use Postman or a similar tool (as far as I know/tried) to test your endpoint. So, you need to call it through the AWS SDK (e.g. AWS boto).
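
A call through boto3 could look like the sketch below. It takes the sagemaker-runtime client as an argument; the endpoint name, payload and content type are assumptions, since the request format is whatever your own inference code expects.

```python
def invoke(client, endpoint_name, payload, content_type="application/json"):
    """Send payload to the endpoint's /invocations route and return the raw response bytes."""
    response = client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType=content_type,
        Body=payload,
    )
    # The response body is a stream; read() returns the bytes your container sent back.
    return response["Body"].read()


# Usage (requires boto3 and AWS credentials; names and payload are examples):
# import boto3
# runtime = boto3.client("sagemaker-runtime")
# print(invoke(runtime, "my-endpoint", b'{"image_url": "..."}'))
```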

Create the infrastructure using a CloudFormation template

If you want your infrastructure to be well documented and have more flexibility in fine-tuning, you can create a CloudFormation template for the above components (resources). Please note that I’m using YAML as my definition language, which I personally believe is cleaner and more readable than JSON.

The order of the template would be as below.

    Endpoint
    Endpoint Configuration
    Model
    Execution Role

Let’s take a look at how we can define each of the resources.


Endpoint

MyEndpoint:
  Type: "AWS::SageMaker::Endpoint"
  Properties:
    EndpointName: !Sub ${Domain}--${Function}--predict-fifa-18-endpoint--${Environment}
    EndpointConfigName: !GetAtt MyEndpointConfig.EndpointConfigName
    Tags:
      - Key: Environment
        Value: !Sub ${Environment}
      - Key: Domain
        Value: !Sub ${Domain}
      - Key: Function
        Value: !Sub ${Function}
      - Key: Project
        Value: !Sub ${Domain}

Nothing fancy goes here. It’s just defining a name for our endpoint and specifying the Endpoint Configuration to use for it (which should be defined as MyEndpointConfig afterwards).

If you’re new to CloudFormation and/or YAML, please read the YAML documentation and how to use it with CloudFormation. If you are already familiar with writing CloudFormation templates using JSON, you can use a YAML to JSON converter.

But for now, keep in mind that ${Parameter} is how you reference a pre-defined parameter (from the Parameters section) and !Sub is the YAML short form of the CloudFormation function which specifies that the following value is an expression to be substituted.
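
To see what the substitution does, here is a toy re-implementation in Python. It is only an illustration, not how CloudFormation works internally, and it ignores everything !Sub can do beyond plain ${Name} parameters; the parameter values are made up.

```python
import re


def sub(template, parameters):
    """Replace every ${Name} in template with the matching parameter value."""
    return re.sub(r"\$\{(\w+)\}", lambda match: parameters[match.group(1)], template)


# Hypothetical parameter values, as they might be passed to the stack.
parameters = {"Domain": "example", "Function": "demo", "Environment": "dev"}
name = sub("${Domain}--${Function}--predict-fifa-18-endpoint--${Environment}", parameters)
print(name)  # example--demo--predict-fifa-18-endpoint--dev
```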

Endpoint Configuration

MyEndpointConfig:
  Type: "AWS::SageMaker::EndpointConfig"
  Properties:
    EndpointConfigName: !Sub ${Domain}--${Function}--predict-fifa-18-config--${Environment}
    ProductionVariants:
      - InitialInstanceCount: 1
        InitialVariantWeight: 1.0
        InstanceType: !Sub ${InstanceType}
        ModelName: !GetAtt MyModel.ModelName
        VariantName: !GetAtt MyModel.ModelName
    Tags:
      - Key: Environment
        Value: !Sub ${Environment}
      - Key: Domain
        Value: !Sub ${Domain}
      - Key: Function
        Value: !Sub ${Function}
      - Key: Project
        Value: !Sub ${Domain}

You should pay attention to the ProductionVariants section, because that is where the model(s) for this Endpoint Configuration are defined. Each item in it represents a model, which we should define afterwards. For each variant, be mindful about selecting a proper AWS ML compute instance by considering the size and the complexity of the model. More information about the ML compute instance types is available from AWS.

To learn more about ProductionVariants, please read the documentation.


Model

MyModel:
  Type: "AWS::SageMaker::Model"
  Properties:
    ModelName: !Sub ${Domain}--${Function}--predict-fifa-18--${Environment}
    PrimaryContainer:
      ModelDataUrl: !Sub "https://s3-${Region}.amazonaws.com/${S3Bucket}/models/${ModelNamePrefix}--v${ModelVersion}.hdf5.tar.gz"
      Image: !Sub "${AccountID}.dkr.ecr.${Region}.amazonaws.com/${ECRImageName}:${ModelVersion}"
    ExecutionRoleArn: !GetAtt ExecutionRole.Arn
    Tags:
      - Key: Environment
        Value: !Sub ${Environment}
      - Key: Domain
        Value: !Sub ${Domain}
      - Key: Function
        Value: !Sub ${Function}
      - Key: Project
        Value: !Sub ${Domain}

The most important thing to focus on here is the PrimaryContainer section, where we define the key attributes of the model: ModelDataUrl and Image. As you may already know, these specify the S3 location of the model that should be loaded and the Docker image which should be deployed in the ML compute instance, respectively.

You should also define the Execution Role which this model uses. It’s recommended to create a new role, which we will do after this block.

Execution Role

Now we should define the execution role which should be used with this SageMaker setup.

ExecutionRole:
  Type: "AWS::IAM::Role"
  Properties:
    RoleName: !Sub ${Domain}--${Function}--predict-fifa-18-role--${Environment}
    ManagedPolicyArns:
      - "arn:aws:iam::aws:policy/CloudWatchFullAccess"
      - "arn:aws:iam::aws:policy/AmazonSageMakerFullAccess"
    AssumeRolePolicyDocument:
      Version: "2012-10-17"
      Statement:
        - Effect: "Allow"
          Principal:
            Service:
              - "sagemaker.amazonaws.com"
          Action:
            - "sts:AssumeRole"
    Path: "/"
    Policies:
      - PolicyName: !Sub ${Domain}--${Function}--predict-fifa-18-policy--${Environment}
        PolicyDocument:
          Version: "2012-10-17"
          Statement:
          - Action:
            - s3:ListBucket
            Effect: Allow
            Resource:
            - !Sub arn:aws:s3:::${S3Bucket}
          - Action:
            - s3:GetObject
            Effect: Allow
            Resource:
            - !Sub arn:aws:s3:::${S3Bucket}/*

When we create the execution role from the AWS web console, it does all the hard work for us. But since we took on the burden of writing a CloudFormation template, we have to do it ourselves.

We came across many problems related to the permissions needed for different tasks such as reading from S3, writing logs, reading from ECR etc. After struggling for hours, we figured out that we can use already existing IAM policies for our new role. We can specify them under the ManagedPolicyArns section.

We added CloudWatchFullAccess to our role so that it has full access to CloudWatch logs. Otherwise, even though the endpoint would be created successfully, no log streams would be created. (See AWS Managed (Predefined) Policies for CloudWatch for more information.)

And we added AmazonSageMakerFullAccess, which covers almost all the permissions required to work with SageMaker and the other related services. (See Amazon SageMaker Roles for more information)

We also created a custom policy for our role to restrict it from accessing other S3 buckets. If your S3 bucket name contains “sagemaker”, this custom policy has no effect, because AmazonSageMakerFullAccess gives full access to all the S3 buckets with “sagemaker” in the name.

You have to create a CloudFormation template with these resources and the required parameters, and then you can import it into CloudFormation as a new stack.
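
Instead of importing the template through the console, the stack can also be created with boto3. The sketch below takes a CloudFormation client as an argument; the stack name, file name and parameter values are placeholders. Because the template gives the IAM role a custom name, CloudFormation needs the CAPABILITY_NAMED_IAM capability to be acknowledged.

```python
def create_stack(client, stack_name, template_path, parameters):
    """Create a CloudFormation stack from a local template file."""
    with open(template_path, encoding="utf-8") as handle:
        template_body = handle.read()
    return client.create_stack(
        StackName=stack_name,
        TemplateBody=template_body,
        # Template parameters such as Domain, Function and Environment.
        Parameters=[
            {"ParameterKey": key, "ParameterValue": value}
            for key, value in parameters.items()
        ],
        # Needed because the template creates an IAM role with a custom name.
        Capabilities=["CAPABILITY_NAMED_IAM"],
    )


# Usage (requires boto3 and AWS credentials; all values are examples):
# import boto3
# create_stack(boto3.client("cloudformation"), "my-inference-stack",
#              "template.yaml", {"Environment": "dev"})
```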

Your final CloudFormation template should look similar to this one.

Now everything is set and you can use the sample code or your complex application(s) to call the endpoint and get the predictions!

It was a long read indeed. Here is a cookie for you!



Creating the Custom Inference Image

We created a Docker container with Ubuntu 16 running and we used nginx, gunicorn and flask to create /invocations and /ping endpoints on port 8080. In your inference code, make sure that you’re opening the model from /opt/ml/model folder, otherwise your endpoint will fail!
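
One way to make a wrong path fail loudly is to resolve the model file explicitly when the container starts. This is a sketch, not our actual inference code; the helper name is hypothetical and the file name reuses the earlier example.

```python
from pathlib import Path

MODEL_DIR = "/opt/ml/model"  # where SageMaker extracts the tar.gz


def find_model_file(file_name, model_dir=MODEL_DIR):
    """Return the path of the model file, or raise an error listing what is actually there."""
    path = Path(model_dir) / file_name
    if not path.is_file():
        available = sorted(entry.name for entry in Path(model_dir).glob("*"))
        raise FileNotFoundError(f"{path} not found; {model_dir} contains: {available}")
    return path


# Usage inside the container (file name is an example):
# model_path = find_model_file("predict_fifa_18_v1.hd5")
```

The error message lists the directory contents, which makes a mismatched file name inside the archive easy to spot in the logs.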

Common Errors

We faced numerous errors during the process. Here are some significant ones. Forgive me if these are not the exact error messages; rather, they describe the situations in which you get errors.

  • File you specified is not an archive
    It should be evident for you now that your model on S3 MUST be a tar.gz file containing your model.
  • Couldn’t find the file (probably thrown from your inference code)
    Most probably, your tar.gz file doesn’t contain the model file, or the file has a different name from the one you’re using in the code. And make sure that you’re reading the model from the /opt/ml/model folder.
  • Not enough permissions
    This happens when your execution role doesn’t have enough permissions to perform all the required tasks. For example, it should be able to read from the S3 bucket, read from ECR, create new EC2 instances (ML compute instances), write to logs etc. Make sure you give it enough privileges; we used the AmazonSageMakerFullAccess policy to do so.
  • No log streams are created
    That’s because your execution role doesn’t have enough privileges to create log streams and write to logs. Make sure you give enough privileges to the role, such as CloudWatchFullAccess.
  • Endpoint timeout or not available in some cases
    This happens when your ML compute instances don’t have enough resources. Make sure the ML Instance Type you selected is capable of handling multiple requests at the same time, considering the size of the model and the complexity of your inference code.



