get_scaling_configuration_recommendation
- SageMaker.Client.get_scaling_configuration_recommendation(**kwargs)
Starts an Amazon SageMaker Inference Recommender autoscaling recommendation job. Returns recommendations for autoscaling policies that you can apply to your SageMaker endpoint.
See also: AWS API Documentation
Request Syntax
response = client.get_scaling_configuration_recommendation(
    InferenceRecommendationsJobName='string',
    RecommendationId='string',
    EndpointName='string',
    TargetCpuUtilizationPerCore=123,
    ScalingPolicyObjective={
        'MinInvocationsPerMinute': 123,
        'MaxInvocationsPerMinute': 123
    }
)
- Parameters:
InferenceRecommendationsJobName (string) –
[REQUIRED]
The name of a previously completed Inference Recommender job.
RecommendationId (string) –
The recommendation ID of a previously completed inference recommendation. This ID should come from one of the recommendations returned by the job specified in the InferenceRecommendationsJobName field.
Specify either this field or the EndpointName field.
EndpointName (string) –
The name of an endpoint benchmarked during a previously completed inference recommendation job. This name should come from one of the recommendations returned by the job specified in the InferenceRecommendationsJobName field.
Specify either this field or the RecommendationId field.
TargetCpuUtilizationPerCore (integer) –
The percentage of CPU utilization per core that you want an instance to reach before autoscaling. The default value is 50%.
ScalingPolicyObjective (dict) –
An object where you specify the anticipated traffic pattern for an endpoint.
MinInvocationsPerMinute (integer) –
The minimum number of expected requests to your endpoint per minute.
MaxInvocationsPerMinute (integer) –
The maximum number of expected requests to your endpoint per minute.
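The parameter rules above can be captured in a small helper that assembles the keyword arguments before the call. This is a minimal sketch, not part of the SDK: the function name `build_scaling_recommendation_request` is hypothetical, and it assumes that "specify either this field or the other" means exactly one of `RecommendationId` and `EndpointName`. Optional values left as `None` are omitted, since boto3 parameter validation rejects `None`.

```python
def build_scaling_recommendation_request(
    job_name,
    recommendation_id=None,
    endpoint_name=None,
    target_cpu_utilization_per_core=None,
    min_invocations_per_minute=None,
    max_invocations_per_minute=None,
):
    """Assemble kwargs for get_scaling_configuration_recommendation (hypothetical helper)."""
    if not job_name:
        raise ValueError("InferenceRecommendationsJobName is required")
    # Assumption: exactly one of the two identifiers may be given.
    if (recommendation_id is None) == (endpoint_name is None):
        raise ValueError("Specify exactly one of RecommendationId or EndpointName")

    request = {"InferenceRecommendationsJobName": job_name}
    if recommendation_id is not None:
        request["RecommendationId"] = recommendation_id
    else:
        request["EndpointName"] = endpoint_name

    # Omit optional parameters that were not supplied.
    if target_cpu_utilization_per_core is not None:
        request["TargetCpuUtilizationPerCore"] = target_cpu_utilization_per_core

    objective = {}
    if min_invocations_per_minute is not None:
        objective["MinInvocationsPerMinute"] = min_invocations_per_minute
    if max_invocations_per_minute is not None:
        objective["MaxInvocationsPerMinute"] = max_invocations_per_minute
    if objective:
        request["ScalingPolicyObjective"] = objective
    return request


# Usage (requires AWS credentials and a completed Inference Recommender job):
#   import boto3
#   client = boto3.client("sagemaker")
#   response = client.get_scaling_configuration_recommendation(
#       **build_scaling_recommendation_request("my-job", endpoint_name="my-endpoint")
#   )
```

The job and endpoint names in the usage comment are placeholders.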
- Return type:
dict
- Returns:
Response Syntax
{
    'InferenceRecommendationsJobName': 'string',
    'RecommendationId': 'string',
    'EndpointName': 'string',
    'TargetCpuUtilizationPerCore': 123,
    'ScalingPolicyObjective': {
        'MinInvocationsPerMinute': 123,
        'MaxInvocationsPerMinute': 123
    },
    'Metric': {
        'InvocationsPerInstance': 123,
        'ModelLatency': 123
    },
    'DynamicScalingConfiguration': {
        'MinCapacity': 123,
        'MaxCapacity': 123,
        'ScaleInCooldown': 123,
        'ScaleOutCooldown': 123,
        'ScalingPolicies': [
            {
                'TargetTracking': {
                    'MetricSpecification': {
                        'Predefined': {
                            'PredefinedMetricType': 'string'
                        },
                        'Customized': {
                            'MetricName': 'string',
                            'Namespace': 'string',
                            'Statistic': 'Average'|'Minimum'|'Maximum'|'SampleCount'|'Sum'
                        }
                    },
                    'TargetValue': 123.0
                }
            },
        ]
    }
}
Response Structure
(dict) –
InferenceRecommendationsJobName (string) –
The name of a previously completed Inference Recommender job.
RecommendationId (string) –
The recommendation ID of a previously completed inference recommendation.
EndpointName (string) –
The name of an endpoint benchmarked during a previously completed Inference Recommender job.
TargetCpuUtilizationPerCore (integer) –
The percentage of CPU utilization per core that you want an instance to reach before autoscaling, as specified in the request. The default value is 50%.
ScalingPolicyObjective (dict) –
An object representing the anticipated traffic pattern for the endpoint, as specified in the request.
MinInvocationsPerMinute (integer) –
The minimum number of expected requests to your endpoint per minute.
MaxInvocationsPerMinute (integer) –
The maximum number of expected requests to your endpoint per minute.
Metric (dict) –
An object with a list of metrics that were benchmarked during the previously completed Inference Recommender job.
InvocationsPerInstance (integer) –
The number of invocations sent to a model, normalized by InstanceCount in each ProductionVariant. 1/numberOfInstances is sent as the value on each request, where numberOfInstances is the number of active instances for the ProductionVariant behind the endpoint at the time of the request.
ModelLatency (integer) –
The interval of time taken by a model to respond as viewed from SageMaker. This interval includes the local communication times taken to send the request and to fetch the response from the container of a model and the time taken to complete the inference in the container.
DynamicScalingConfiguration (dict) –
An object with the recommended values for you to specify when creating an autoscaling policy.
MinCapacity (integer) –
The recommended minimum capacity to specify for your autoscaling policy.
MaxCapacity (integer) –
The recommended maximum capacity to specify for your autoscaling policy.
ScaleInCooldown (integer) –
The recommended scale-in cooldown time for your autoscaling policy.
ScaleOutCooldown (integer) –
The recommended scale-out cooldown time for your autoscaling policy.
ScalingPolicies (list) –
A list containing the scaling policies for each metric.
(dict) –
An object containing a recommended scaling policy.
Note
This is a Tagged Union structure. Only one of the following top level keys will be set: TargetTracking. If a client receives an unknown member it will set SDK_UNKNOWN_MEMBER as the top level key, which maps to the name or tag of the unknown member. The structure of SDK_UNKNOWN_MEMBER is as follows:
'SDK_UNKNOWN_MEMBER': {'name': 'UnknownMemberName'}
TargetTracking (dict) –
A target tracking scaling policy. Includes support for predefined or customized metrics.
MetricSpecification (dict) –
An object containing information about a metric.
Note
This is a Tagged Union structure. Only one of the following top level keys will be set: Predefined, Customized. If a client receives an unknown member it will set SDK_UNKNOWN_MEMBER as the top level key, which maps to the name or tag of the unknown member. The structure of SDK_UNKNOWN_MEMBER is as follows:
'SDK_UNKNOWN_MEMBER': {'name': 'UnknownMemberName'}
Predefined (dict) –
Information about a predefined metric.
PredefinedMetricType (string) –
The metric type. You can only apply SageMaker metric types to SageMaker endpoints.
Customized (dict) –
Information about a customized metric.
MetricName (string) –
The name of the customized metric.
Namespace (string) –
The namespace of the customized metric.
Statistic (string) –
The statistic of the customized metric.
TargetValue (float) –
The recommended target value to specify for the metric when creating a scaling policy.
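Because ScalingPolicies nests two tagged unions, reading the response takes a few key checks. The sketch below shows one way to flatten each policy and reshape it into the dictionary layout that the Application Auto Scaling `put_scaling_policy` call accepts as `TargetTrackingScalingPolicyConfiguration`. Both function names are hypothetical, and feeding the recommendation into Application Auto Scaling this way is an assumption rather than something this page prescribes. A customized metric may also need `Dimensions`, which the response does not provide.

```python
def summarize_scaling_policies(response):
    """Flatten DynamicScalingConfiguration.ScalingPolicies into simple dicts."""
    config = response.get("DynamicScalingConfiguration", {})
    summaries = []
    for policy in config.get("ScalingPolicies", []):
        # Outer tagged union: TargetTracking is the only documented member.
        if "TargetTracking" not in policy:
            unknown = policy.get("SDK_UNKNOWN_MEMBER", {})
            summaries.append({"kind": "unknown", "name": unknown.get("name")})
            continue
        tracking = policy["TargetTracking"]
        spec = tracking.get("MetricSpecification", {})
        # Inner tagged union: one of Predefined or Customized.
        if "Predefined" in spec:
            summary = {
                "kind": "predefined",
                "name": spec["Predefined"].get("PredefinedMetricType"),
            }
        elif "Customized" in spec:
            custom = spec["Customized"]
            summary = {
                "kind": "customized",
                "name": custom.get("MetricName"),
                "namespace": custom.get("Namespace"),
                "statistic": custom.get("Statistic"),
            }
        else:
            unknown = spec.get("SDK_UNKNOWN_MEMBER", {})
            summary = {"kind": "unknown", "name": unknown.get("name")}
        summary["target_value"] = tracking.get("TargetValue")
        summaries.append(summary)
    return summaries


def to_target_tracking_configurations(response):
    """Build one TargetTrackingScalingPolicyConfiguration-shaped dict per known policy."""
    config = response.get("DynamicScalingConfiguration", {})
    configurations = []
    for summary in summarize_scaling_policies(response):
        if summary["kind"] == "predefined":
            metric = {
                "PredefinedMetricSpecification": {
                    "PredefinedMetricType": summary["name"],
                }
            }
        elif summary["kind"] == "customized":
            # Assumption: Dimensions are added by the caller if required.
            metric = {
                "CustomizedMetricSpecification": {
                    "MetricName": summary["name"],
                    "Namespace": summary["namespace"],
                    "Statistic": summary["statistic"],
                }
            }
        else:
            continue  # Skip members this SDK version does not understand.
        configuration = {"TargetValue": summary["target_value"], **metric}
        for key in ("ScaleInCooldown", "ScaleOutCooldown"):
            if key in config:
                configuration[key] = config[key]
        configurations.append(configuration)
    return configurations
```

MinCapacity and MaxCapacity are not part of the policy configuration. In Application Auto Scaling they would be passed to `register_scalable_target` instead.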
Exceptions
SageMaker.Client.exceptions.ResourceNotFound
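A caller may prefer to treat a missing job or recommendation as an absent result rather than a failure. The following is a minimal sketch of catching the exception listed above through the client's `exceptions` attribute. The wrapper name `fetch_scaling_recommendation` is hypothetical, and returning `None` is a design choice for illustration only.

```python
def fetch_scaling_recommendation(client, **request):
    """Return the recommendation dict, or None when the resource is not found."""
    try:
        return client.get_scaling_configuration_recommendation(**request)
    except client.exceptions.ResourceNotFound:
        # The named job, recommendation, or endpoint could not be found.
        return None


# Usage (requires AWS credentials):
#   import boto3
#   client = boto3.client("sagemaker")
#   result = fetch_scaling_recommendation(
#       client,
#       InferenceRecommendationsJobName="my-job",
#       EndpointName="my-endpoint",
#   )
#   if result is None:
#       print("No such Inference Recommender job or endpoint")
```

Other errors, such as throttling or invalid parameters, still propagate to the caller.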