update_training_job

SageMaker.Client.update_training_job(**kwargs)

Updates a model training job to request a new Debugger profiling configuration or to change the warm pool retention length.

See also: AWS API Documentation

Request Syntax

response = client.update_training_job(
    TrainingJobName='string',
    ProfilerConfig={
        'S3OutputPath': 'string',
        'ProfilingIntervalInMilliseconds': 123,
        'ProfilingParameters': {
            'string': 'string'
        },
        'DisableProfiler': True|False
    },
    ProfilerRuleConfigurations=[
        {
            'RuleConfigurationName': 'string',
            'LocalPath': 'string',
            'S3OutputPath': 'string',
            'RuleEvaluatorImage': 'string',
            'InstanceType': 'ml.t3.medium'|'ml.t3.large'|'ml.t3.xlarge'|'ml.t3.2xlarge'|'ml.m4.xlarge'|'ml.m4.2xlarge'|'ml.m4.4xlarge'|'ml.m4.10xlarge'|'ml.m4.16xlarge'|'ml.c4.xlarge'|'ml.c4.2xlarge'|'ml.c4.4xlarge'|'ml.c4.8xlarge'|'ml.p2.xlarge'|'ml.p2.8xlarge'|'ml.p2.16xlarge'|'ml.p3.2xlarge'|'ml.p3.8xlarge'|'ml.p3.16xlarge'|'ml.c5.xlarge'|'ml.c5.2xlarge'|'ml.c5.4xlarge'|'ml.c5.9xlarge'|'ml.c5.18xlarge'|'ml.m5.large'|'ml.m5.xlarge'|'ml.m5.2xlarge'|'ml.m5.4xlarge'|'ml.m5.12xlarge'|'ml.m5.24xlarge'|'ml.r5.large'|'ml.r5.xlarge'|'ml.r5.2xlarge'|'ml.r5.4xlarge'|'ml.r5.8xlarge'|'ml.r5.12xlarge'|'ml.r5.16xlarge'|'ml.r5.24xlarge'|'ml.g4dn.xlarge'|'ml.g4dn.2xlarge'|'ml.g4dn.4xlarge'|'ml.g4dn.8xlarge'|'ml.g4dn.12xlarge'|'ml.g4dn.16xlarge',
            'VolumeSizeInGB': 123,
            'RuleParameters': {
                'string': 'string'
            }
        },
    ],
    ResourceConfig={
        'KeepAlivePeriodInSeconds': 123
    }
)

Parameters:
  • TrainingJobName (string) –

    [REQUIRED]

    The name of the training job whose Debugger profiling configuration you want to update.

  • ProfilerConfig (dict) –

    Configuration information for Amazon SageMaker Debugger system monitoring, framework profiling, and storage paths.

    • S3OutputPath (string) –

      Path to Amazon S3 storage location for system and framework metrics.

    • ProfilingIntervalInMilliseconds (integer) –

      The time interval, in milliseconds, for capturing system metrics. Available values are 100, 200, 500, 1000 (1 second), 5000 (5 seconds), and 60000 (1 minute). The default value is 500 milliseconds.

    • ProfilingParameters (dict) –

      Configuration information for capturing framework metrics. The available key strings for the profiling options are DetailedProfilingConfig, PythonProfilingConfig, and DataLoaderProfilingConfig. To learn more about how to configure the ProfilingParameters parameter, see Use the SageMaker and Debugger Configuration API Operations to Create, Update, and Debug Your Training Job.

      • (string) –

        • (string) –

    • DisableProfiler (boolean) –

      To turn off Amazon SageMaker Debugger monitoring and profiling while a training job is in progress, set to True.

  • ProfilerRuleConfigurations (list) –

    Configuration information for Amazon SageMaker Debugger rules for profiling system and framework metrics.

    • (dict) –

      Configuration information for profiling rules.

      • RuleConfigurationName (string) – [REQUIRED]

        The name of the rule configuration. It must be unique relative to other rule configuration names.

      • LocalPath (string) –

        Path to local storage location for output of rules. Defaults to /opt/ml/processing/output/rule/.

      • S3OutputPath (string) –

        Path to Amazon S3 storage location for rules.

      • RuleEvaluatorImage (string) – [REQUIRED]

        The Amazon Elastic Container Registry Image for the managed rule evaluation.

      • InstanceType (string) –

        The instance type on which to deploy a custom rule for profiling a training job.

      • VolumeSizeInGB (integer) –

        The size, in GB, of the ML storage volume attached to the processing instance.

      • RuleParameters (dict) –

        Runtime configuration for the rule container.

        • (string) –

          • (string) –

  • ResourceConfig (dict) –

    The ResourceConfig of the training job, used to update the warm pool retention length.

    • KeepAlivePeriodInSeconds (integer) – [REQUIRED]

      The new KeepAlivePeriodInSeconds value, in seconds, to set in the training job's ResourceConfig.
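
The parameters above can be assembled into a request dictionary before the call is made. The following is a minimal sketch, not part of the SDK: build_update_request, ALLOWED_INTERVALS_MS, and REQUIRED_RULE_KEYS are hypothetical names introduced for illustration, and the client-side checks only mirror the constraints stated in the parameter descriptions (the service performs its own validation).

```python
# Interval values listed in the ProfilingIntervalInMilliseconds description.
ALLOWED_INTERVALS_MS = {100, 200, 500, 1000, 5000, 60000}

# Keys marked [REQUIRED] for each ProfilerRuleConfigurations entry.
REQUIRED_RULE_KEYS = {"RuleConfigurationName", "RuleEvaluatorImage"}


def build_update_request(job_name, s3_output_path=None, interval_ms=None,
                         disable_profiler=False, rule_configs=None,
                         keep_alive_seconds=None):
    """Assemble keyword arguments for update_training_job (hypothetical helper)."""
    request = {"TrainingJobName": job_name}

    # Only include ProfilerConfig keys that the caller actually supplied.
    profiler = {}
    if s3_output_path is not None:
        profiler["S3OutputPath"] = s3_output_path
    if interval_ms is not None:
        if interval_ms not in ALLOWED_INTERVALS_MS:
            raise ValueError(f"unsupported profiling interval: {interval_ms}")
        profiler["ProfilingIntervalInMilliseconds"] = interval_ms
    if disable_profiler:
        profiler["DisableProfiler"] = True
    if profiler:
        request["ProfilerConfig"] = profiler

    if rule_configs:
        for rule in rule_configs:
            missing = REQUIRED_RULE_KEYS - set(rule)
            if missing:
                raise ValueError(f"rule configuration missing keys: {sorted(missing)}")
        request["ProfilerRuleConfigurations"] = list(rule_configs)

    # KeepAlivePeriodInSeconds is the only ResourceConfig key this operation accepts.
    if keep_alive_seconds is not None:
        request["ResourceConfig"] = {"KeepAlivePeriodInSeconds": keep_alive_seconds}

    return request
```

With a client created through boto3.client('sagemaker'), the result can be passed as client.update_training_job(**request). The job name, bucket path, and retention value are placeholders that you would replace with your own.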

Return type:

dict

Returns:

Response Syntax

{
    'TrainingJobArn': 'string'
}

Response Structure

  • (dict) –

    • TrainingJobArn (string) –

      The Amazon Resource Name (ARN) of the training job.
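
As an illustration of reading the response, the sketch below changes the warm pool retention length and returns the ARN from the response dictionary. The function name update_keep_alive is hypothetical; client is assumed to be a SageMaker client such as the one returned by boto3.client('sagemaker').

```python
def update_keep_alive(client, job_name, seconds):
    """Set a new warm pool retention length and return the training job ARN.

    Hypothetical helper: `client` is assumed to expose update_training_job
    with the request and response shapes documented on this page.
    """
    response = client.update_training_job(
        TrainingJobName=job_name,
        ResourceConfig={"KeepAlivePeriodInSeconds": seconds},
    )
    # The response dict carries a single documented key.
    return response["TrainingJobArn"]
```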

Exceptions

  • SageMaker.Client.exceptions.ResourceNotFound

  • SageMaker.Client.exceptions.ResourceLimitExceeded
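
The two exceptions listed above are exposed on the client's exceptions attribute and can be caught individually. The following is a minimal sketch under that assumption; safe_update and the string labels it returns are hypothetical and chosen only for illustration.

```python
def safe_update(client, **request):
    """Call update_training_job and map the documented exceptions to labels.

    Returns (arn, error_label); exactly one of the two is None.
    Hypothetical helper, not part of the SDK.
    """
    try:
        response = client.update_training_job(**request)
    except client.exceptions.ResourceNotFound:
        # The named training job could not be found.
        return None, "not_found"
    except client.exceptions.ResourceLimitExceeded:
        # An account or resource limit was hit.
        return None, "limit_exceeded"
    return response["TrainingJobArn"], None
```

Any other error raised by the call propagates unchanged, so callers can still handle it separately.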