create_endpoint

SageMaker.Client.create_endpoint(**kwargs)

Creates an endpoint using the endpoint configuration specified in the request. SageMaker uses the endpoint to provision resources and deploy models. You create the endpoint configuration with the CreateEndpointConfig API.

Use this API to deploy models using SageMaker hosting services.

Note

Do not delete an EndpointConfig while a live endpoint is using it, or while an UpdateEndpoint or CreateEndpoint operation is in progress on that endpoint. To update an endpoint, you must create a new EndpointConfig.

The endpoint name must be unique within an Amazon Web Services Region in your Amazon Web Services account.

When it receives the request, SageMaker creates the endpoint, launches the resources (ML compute instances), and deploys the model(s) on them.

Note

When you call CreateEndpoint, SageMaker reads from DynamoDB to verify that your endpoint configuration exists. Because a DynamoDB table that supports eventually consistent reads might not yet reflect a recently completed write, the response can include stale data. If the dependent entities are not yet in DynamoDB, the call fails with a validation error. Repeating the request after a short time should return the latest data, so retry logic is recommended. We also recommend calling DescribeEndpointConfig before calling CreateEndpoint to minimize the potential impact of a DynamoDB eventually consistent read.
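The retry advice in the note above could be sketched roughly as follows. This is a minimal illustration, not an official pattern: the helper name `create_endpoint_with_retry`, the backoff values, and the assumption that the stale-read failure surfaces as a `ClientError` with code `ValidationException` are all assumptions made for the example. The `client` argument is expected to behave like `boto3.client("sagemaker")`.

```python
import time


def create_endpoint_with_retry(client, endpoint_name, endpoint_config_name,
                               max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Create an endpoint, retrying while the endpoint config is not yet visible.

    Hypothetical helper: `client` should behave like boto3.client("sagemaker").
    """
    client_error = client.exceptions.ClientError
    for attempt in range(1, max_attempts + 1):
        try:
            # Read the config first, as the note recommends.
            client.describe_endpoint_config(EndpointConfigName=endpoint_config_name)
            return client.create_endpoint(
                EndpointName=endpoint_name,
                EndpointConfigName=endpoint_config_name,
            )
        except client_error as err:
            code = err.response.get("Error", {}).get("Code")
            # Assumption: the stale read shows up as ValidationException.
            # Genuine validation errors are also retried until attempts run out.
            if code != "ValidationException" or attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff
```

With a real client, a call might look like `create_endpoint_with_retry(boto3.client("sagemaker"), "my-endpoint", "my-endpoint-config")`, where both names are placeholders.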

When SageMaker receives the request, it sets the endpoint status to Creating. After it creates the endpoint, it sets the status to InService. SageMaker can then process incoming requests for inferences. To check the status of an endpoint, use the DescribeEndpoint API.
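The status check described above can be sketched as a simple polling loop over DescribeEndpoint. The function name, poll interval, and timeout below are illustrative assumptions; `EndpointStatus` and `FailureReason` are fields of the DescribeEndpoint response. boto3 also provides a built-in waiter, `client.get_waiter("endpoint_in_service")`, which may be preferable in practice.

```python
import time


def wait_until_in_service(client, endpoint_name, poll_seconds=30,
                          timeout_seconds=3600, sleep=time.sleep,
                          clock=time.monotonic):
    """Poll DescribeEndpoint until the endpoint leaves the Creating state.

    Hypothetical helper: `client` should behave like boto3.client("sagemaker").
    """
    deadline = clock() + timeout_seconds
    while True:
        desc = client.describe_endpoint(EndpointName=endpoint_name)
        status = desc["EndpointStatus"]
        if status == "InService":
            return desc
        if status != "Creating":
            # For example "Failed"; FailureReason is set when creation fails.
            reason = desc.get("FailureReason", "no reason given")
            raise RuntimeError(f"Endpoint {endpoint_name} is {status}: {reason}")
        if clock() >= deadline:
            raise TimeoutError(f"Endpoint {endpoint_name} is still Creating")
        sleep(poll_seconds)
```

The `sleep` and `clock` parameters exist only so the loop can be exercised without real delays; with a real client they can be left at their defaults.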

If any of the models hosted at this endpoint get model data from an Amazon S3 location, SageMaker uses Amazon Web Services Security Token Service to download model artifacts from the S3 path you provided. Amazon Web Services STS is activated in your Amazon Web Services account by default. If you previously deactivated Amazon Web Services STS for a region, you need to reactivate Amazon Web Services STS for that region. For more information, see Activating and Deactivating Amazon Web Services STS in an Amazon Web Services Region in the Amazon Web Services Identity and Access Management User Guide.

Note

To add the IAM role policies for using this API operation, go to the IAM console and choose Roles in the left navigation pane. Search for the IAM role that you want to grant access to the CreateEndpoint and CreateEndpointConfig API operations, and add one of the following policies to the role.

  • Option 1: For full SageMaker access, search for and attach the AmazonSageMakerFullAccess policy.

  • Option 2: To grant limited access to an IAM role, paste the following Action and Resource elements manually into the JSON policy of the IAM role:

    "Action": ["sagemaker:CreateEndpoint", "sagemaker:CreateEndpointConfig"],
    "Resource": [
        "arn:aws:sagemaker:region:account-id:endpoint/endpointName",
        "arn:aws:sagemaker:region:account-id:endpoint-config/endpointConfigName"
    ]

    For more information, see SageMaker API Permissions: Actions, Permissions, and Resources Reference.
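As an illustration of Option 2, the Action and Resource elements can be assembled into a complete policy document. The sketch below is an assumption-laden example: the helper name and the sample region, account ID, and resource names are placeholders, and the surrounding `Version`/`Statement`/`Effect` wrapper is the standard IAM policy structure rather than something shown in the text above.

```python
import json


def endpoint_policy_document(region, account_id, endpoint_name, endpoint_config_name):
    """Build a limited-access policy for CreateEndpoint and CreateEndpointConfig."""
    prefix = f"arn:aws:sagemaker:{region}:{account_id}"
    return {
        "Version": "2012-10-17",  # standard IAM policy language version
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "sagemaker:CreateEndpoint",
                    "sagemaker:CreateEndpointConfig",
                ],
                "Resource": [
                    f"{prefix}:endpoint/{endpoint_name}",
                    f"{prefix}:endpoint-config/{endpoint_config_name}",
                ],
            }
        ],
    }


# Placeholder values; substitute your own region, account ID, and names.
policy = endpoint_policy_document(
    "us-east-1", "123456789012", "my-endpoint", "my-endpoint-config"
)
print(json.dumps(policy, indent=2))
```

The printed JSON can then be pasted into the policy editor for the role.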

See also: AWS API Documentation

Request Syntax

response = client.create_endpoint(
    EndpointName='string',
    EndpointConfigName='string',
    DeploymentConfig={
        'BlueGreenUpdatePolicy': {
            'TrafficRoutingConfiguration': {
                'Type': 'ALL_AT_ONCE'|'CANARY'|'LINEAR',
                'WaitIntervalInSeconds': 123,
                'CanarySize': {
                    'Type': 'INSTANCE_COUNT'|'CAPACITY_PERCENT',
                    'Value': 123
                },
                'LinearStepSize': {
                    'Type': 'INSTANCE_COUNT'|'CAPACITY_PERCENT',
                    'Value': 123
                }
            },
            'TerminationWaitInSeconds': 123,
            'MaximumExecutionTimeoutInSeconds': 123
        },
        'RollingUpdatePolicy': {
            'MaximumBatchSize': {
                'Type': 'INSTANCE_COUNT'|'CAPACITY_PERCENT',
                'Value': 123
            },
            'WaitIntervalInSeconds': 123,
            'MaximumExecutionTimeoutInSeconds': 123,
            'RollbackMaximumBatchSize': {
                'Type': 'INSTANCE_COUNT'|'CAPACITY_PERCENT',
                'Value': 123
            }
        },
        'AutoRollbackConfiguration': {
            'Alarms': [
                {
                    'AlarmName': 'string'
                },
            ]
        }
    },
    Tags=[
        {
            'Key': 'string',
            'Value': 'string'
        },
    ]
)
Parameters:
  • EndpointName (string) –

    [REQUIRED]

    The name of the endpoint. The name must be unique within an Amazon Web Services Region in your Amazon Web Services account. The name is case-insensitive in CreateEndpoint, but the case is preserved and must be matched in InvokeEndpoint.

  • EndpointConfigName (string) –

    [REQUIRED]

    The name of an endpoint configuration. For more information, see CreateEndpointConfig.

  • DeploymentConfig (dict) –

    The deployment configuration for an endpoint, which contains the desired deployment strategy and rollback configurations.

    • BlueGreenUpdatePolicy (dict) –

      Update policy for a blue/green deployment. If this update policy is specified, SageMaker creates a new fleet during the deployment while maintaining the old fleet. SageMaker shifts traffic to the new fleet according to the specified traffic routing configuration. Only one update policy should be used in the deployment configuration. If no update policy is specified, SageMaker uses a blue/green deployment strategy with all-at-once traffic shifting by default.

      • TrafficRoutingConfiguration (dict) – [REQUIRED]

        Defines the traffic routing strategy to shift traffic from the old fleet to the new fleet during an endpoint deployment.

        • Type (string) – [REQUIRED]

          Traffic routing strategy type.

          • ALL_AT_ONCE: Endpoint traffic shifts to the new fleet in a single step.

          • CANARY: Endpoint traffic shifts to the new fleet in two steps. The first step is the canary, which is a small portion of the traffic. The second step is the remainder of the traffic.

          • LINEAR: Endpoint traffic shifts to the new fleet in n steps of a configurable size.

        • WaitIntervalInSeconds (integer) – [REQUIRED]

          The waiting time (in seconds) between incremental steps to turn on traffic on the new endpoint fleet.

        • CanarySize (dict) –

          Batch size for the first step to turn on traffic on the new endpoint fleet. Value must be less than or equal to 50% of the variant’s total instance count.

          • Type (string) – [REQUIRED]

            Specifies the endpoint capacity type.

            • INSTANCE_COUNT: The endpoint activates based on the number of instances.

            • CAPACITY_PERCENT: The endpoint activates based on the specified percentage of capacity.

          • Value (integer) – [REQUIRED]

            Defines the capacity size, either as a number of instances or a capacity percentage.

        • LinearStepSize (dict) –

          Batch size for each step to turn on traffic on the new endpoint fleet. Value must be 10-50% of the variant’s total instance count.

          • Type (string) – [REQUIRED]

            Specifies the endpoint capacity type.

            • INSTANCE_COUNT: The endpoint activates based on the number of instances.

            • CAPACITY_PERCENT: The endpoint activates based on the specified percentage of capacity.

          • Value (integer) – [REQUIRED]

            Defines the capacity size, either as a number of instances or a capacity percentage.

      • TerminationWaitInSeconds (integer) –

        Additional waiting time in seconds after the completion of an endpoint deployment before terminating the old endpoint fleet. Default is 0.

      • MaximumExecutionTimeoutInSeconds (integer) –

        Maximum execution timeout for the deployment. Note that the timeout value should be larger than the total waiting time specified in TerminationWaitInSeconds and WaitIntervalInSeconds.

    • RollingUpdatePolicy (dict) –

      Specifies a rolling deployment strategy for updating a SageMaker endpoint.

      • MaximumBatchSize (dict) – [REQUIRED]

        Batch size for each rolling step to provision capacity and turn on traffic on the new endpoint fleet, and terminate capacity on the old endpoint fleet. Value must be between 5% and 50% of the variant’s total instance count.

        • Type (string) – [REQUIRED]

          Specifies the endpoint capacity type.

          • INSTANCE_COUNT: The endpoint activates based on the number of instances.

          • CAPACITY_PERCENT: The endpoint activates based on the specified percentage of capacity.

        • Value (integer) – [REQUIRED]

          Defines the capacity size, either as a number of instances or a capacity percentage.

      • WaitIntervalInSeconds (integer) – [REQUIRED]

        The length of the baking period, during which SageMaker monitors alarms for each batch on the new fleet.

      • MaximumExecutionTimeoutInSeconds (integer) –

        The time limit for the total deployment. Exceeding this limit causes a timeout.

      • RollbackMaximumBatchSize (dict) –

        Batch size for rollback to the old endpoint fleet. This is the size of each rolling step that provisions capacity and turns on traffic on the old endpoint fleet, and terminates capacity on the new endpoint fleet. If this field is absent, the default value is 100% of total capacity, which brings up the whole capacity of the old fleet at once during rollback.

        • Type (string) – [REQUIRED]

          Specifies the endpoint capacity type.

          • INSTANCE_COUNT: The endpoint activates based on the number of instances.

          • CAPACITY_PERCENT: The endpoint activates based on the specified percentage of capacity.

        • Value (integer) – [REQUIRED]

          Defines the capacity size, either as a number of instances or a capacity percentage.

    • AutoRollbackConfiguration (dict) –

      Automatic rollback configuration for handling endpoint deployment failures and recovery.

      • Alarms (list) –

        List of CloudWatch alarms in your account that are configured to monitor metrics on an endpoint. If any alarms are tripped during a deployment, SageMaker rolls back the deployment.

        • (dict) –

          An Amazon CloudWatch alarm configured to monitor metrics on an endpoint.

          • AlarmName (string) –

            The name of a CloudWatch alarm in your account.

  • Tags (list) –

    An array of key-value pairs. You can use tags to categorize your Amazon Web Services resources in different ways, for example, by purpose, owner, or environment. For more information, see Tagging Amazon Web Services Resources.

    • (dict) –

      A tag object that consists of a key and an optional value, used to manage metadata for SageMaker Amazon Web Services resources.

      You can add tags to notebook instances, training jobs, hyperparameter tuning jobs, batch transform jobs, models, labeling jobs, work teams, endpoint configurations, and endpoints. For more information on adding tags to SageMaker resources, see AddTags.

      For more information on adding metadata to your Amazon Web Services resources with tagging, see Tagging Amazon Web Services resources. For advice on best practices for managing Amazon Web Services resources with tagging, see Tagging Best Practices: Implement an Effective Amazon Web Services Resource Tagging Strategy.

      • Key (string) – [REQUIRED]

        The tag key. Tag keys must be unique per resource.

      • Value (string) – [REQUIRED]

        The tag value.
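The parameter structure above can be assembled in plain Python before the call. The following is a minimal sketch of a canary blue/green configuration with optional auto-rollback alarms, plus a small tag converter. The helper names are hypothetical, and the local checks only mirror the constraints stated in the parameter descriptions; the service performs its own validation.

```python
def canary_deployment_config(canary_percent, wait_seconds, termination_wait_seconds,
                             timeout_seconds, alarm_names=()):
    """Build a DeploymentConfig dict that shifts traffic in two steps (CANARY)."""
    if not 0 < canary_percent <= 50:
        raise ValueError("CanarySize must be at most 50% of the variant's capacity")
    if timeout_seconds <= wait_seconds + termination_wait_seconds:
        raise ValueError("timeout should be larger than the total waiting time")
    config = {
        "BlueGreenUpdatePolicy": {
            "TrafficRoutingConfiguration": {
                "Type": "CANARY",
                "WaitIntervalInSeconds": wait_seconds,
                "CanarySize": {"Type": "CAPACITY_PERCENT", "Value": canary_percent},
            },
            "TerminationWaitInSeconds": termination_wait_seconds,
            "MaximumExecutionTimeoutInSeconds": timeout_seconds,
        }
    }
    if alarm_names:
        # Each alarm must already exist in CloudWatch in the same account.
        config["AutoRollbackConfiguration"] = {
            "Alarms": [{"AlarmName": name} for name in alarm_names]
        }
    return config


def tags_from_dict(tags):
    """Convert {"key": "value"} pairs into the Tags list shape."""
    return [{"Key": key, "Value": value} for key, value in tags.items()]


# Usage with a real client (names are placeholders):
# client.create_endpoint(
#     EndpointName="my-endpoint",
#     EndpointConfigName="my-endpoint-config",
#     DeploymentConfig=canary_deployment_config(10, 300, 120, 1800, ["my-alarm"]),
#     Tags=tags_from_dict({"environment": "test"}),
# )
```

Only the blue/green policy is set here, in line with the guidance that a deployment configuration should use a single update policy.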

Return type:

dict

Returns:

Response Syntax

{
    'EndpointArn': 'string'
}

Response Structure

  • (dict) –

    • EndpointArn (string) –

      The Amazon Resource Name (ARN) of the endpoint.

Exceptions

  • SageMaker.Client.exceptions.ResourceLimitExceeded
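The exception listed above is exposed on the client object itself, so it can be caught without importing botocore directly. A minimal sketch, assuming a hypothetical helper name and a simple "return None on quota errors" policy chosen only for illustration:

```python
def try_create_endpoint(client, endpoint_name, endpoint_config_name):
    """Return the new endpoint ARN, or None if an account quota was reached.

    Hypothetical helper: `client` should behave like boto3.client("sagemaker").
    """
    try:
        response = client.create_endpoint(
            EndpointName=endpoint_name,
            EndpointConfigName=endpoint_config_name,
        )
    except client.exceptions.ResourceLimitExceeded as err:
        # The error message from the service explains which limit was hit.
        message = err.response.get("Error", {}).get("Message", "")
        print(f"Quota reached, endpoint not created: {message}")
        return None
    return response["EndpointArn"]
```

Other failures, such as validation errors, are deliberately left to propagate to the caller.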