get_evaluation_job

Bedrock.Client.get_evaluation_job(**kwargs)

Retrieves the properties associated with a model evaluation job, including the status of the job. For more information, see Model evaluation.

See also: AWS API Documentation

Request Syntax

response = client.get_evaluation_job(
    jobIdentifier='string'
)
Parameters:

jobIdentifier (string) –

[REQUIRED]

The Amazon Resource Name (ARN) of the model evaluation job.
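
As a minimal sketch of the call above, the helper below passes the job ARN as jobIdentifier and keeps a few top-level fields from the response. The function name summarize_evaluation_job and the shape of the returned summary are illustrative assumptions, not part of the API; the client is passed in so the helper does not depend on how it was created.

```python
from typing import Any, Dict


def summarize_evaluation_job(client: Any, job_arn: str) -> Dict[str, Any]:
    """Fetch one model evaluation job and keep a few top-level fields.

    `client` is expected to be a Bedrock client; this helper and its
    return shape are hypothetical and exist only for illustration.
    """
    response = client.get_evaluation_job(jobIdentifier=job_arn)
    return {
        "name": response["jobName"],
        "status": response["status"],
        # .get() is used defensively in case a field is absent.
        "type": response.get("jobType"),
        "output": response.get("outputDataConfig", {}).get("s3Uri"),
    }
```

With boto3 installed and credentials configured, a client created with boto3.client("bedrock") can be passed as the first argument.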

Return type:

dict

Returns:

Response Syntax

{
    'jobName': 'string',
    'status': 'InProgress'|'Completed'|'Failed'|'Stopping'|'Stopped'|'Deleting',
    'jobArn': 'string',
    'jobDescription': 'string',
    'roleArn': 'string',
    'customerEncryptionKeyId': 'string',
    'jobType': 'Human'|'Automated',
    'evaluationConfig': {
        'automated': {
            'datasetMetricConfigs': [
                {
                    'taskType': 'Summarization'|'Classification'|'QuestionAndAnswer'|'Generation'|'Custom',
                    'dataset': {
                        'name': 'string',
                        'datasetLocation': {
                            's3Uri': 'string'
                        }
                    },
                    'metricNames': [
                        'string',
                    ]
                },
            ]
        },
        'human': {
            'humanWorkflowConfig': {
                'flowDefinitionArn': 'string',
                'instructions': 'string'
            },
            'customMetrics': [
                {
                    'name': 'string',
                    'description': 'string',
                    'ratingMethod': 'string'
                },
            ],
            'datasetMetricConfigs': [
                {
                    'taskType': 'Summarization'|'Classification'|'QuestionAndAnswer'|'Generation'|'Custom',
                    'dataset': {
                        'name': 'string',
                        'datasetLocation': {
                            's3Uri': 'string'
                        }
                    },
                    'metricNames': [
                        'string',
                    ]
                },
            ]
        }
    },
    'inferenceConfig': {
        'models': [
            {
                'bedrockModel': {
                    'modelIdentifier': 'string',
                    'inferenceParams': 'string'
                }
            },
        ]
    },
    'outputDataConfig': {
        's3Uri': 'string'
    },
    'creationTime': datetime(2015, 1, 1),
    'lastModifiedTime': datetime(2015, 1, 1),
    'failureMessages': [
        'string',
    ]
}
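
Because the response carries a status field, one common pattern is to call this method repeatedly until the job stops changing. The sketch below is one way to do that under stated assumptions: treating Completed, Failed, and Stopped as final statuses is an assumption made here, not something this page specifies, and the helper name, polling interval, and poll limit are hypothetical.

```python
import time
from typing import Any, Callable, Dict

# Assumption: these statuses are treated as final for polling purposes.
TERMINAL_STATUSES = {"Completed", "Failed", "Stopped"}


def wait_for_evaluation_job(
    client: Any,
    job_arn: str,
    poll_seconds: float = 30.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll get_evaluation_job until the job reports a final status."""
    for attempt in range(max_polls):
        response = client.get_evaluation_job(jobIdentifier=job_arn)
        if response["status"] in TERMINAL_STATUSES:
            return response
        # Skip the pause after the last allowed poll.
        if attempt < max_polls - 1:
            sleep(poll_seconds)
    raise TimeoutError(f"{job_arn} not final after {max_polls} polls")
```

The sleep function is injectable so the loop can be exercised without real delays; when the returned status is Failed, the failureMessages list in the same response is the place to look for the reason.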

Response Structure

  • (dict) –

    • jobName (string) –

      The name of the model evaluation job.

    • status (string) –

      The status of the model evaluation job.

    • jobArn (string) –

      The Amazon Resource Name (ARN) of the model evaluation job.

    • jobDescription (string) –

      The description of the model evaluation job.

    • roleArn (string) –

      The Amazon Resource Name (ARN) of the IAM service role used in the model evaluation job.

    • customerEncryptionKeyId (string) –

      The Amazon Resource Name (ARN) of the customer managed key specified when the model evaluation job was created.

    • jobType (string) –

      The type of model evaluation job.

    • evaluationConfig (dict) –

      Contains details about the type of model evaluation job, the metrics used, the task type selected, the datasets used, and any custom metrics you defined.

      Note

      This is a Tagged Union structure. Only one of the following top level keys will be set: automated, human. If a client receives an unknown member it will set SDK_UNKNOWN_MEMBER as the top level key, which maps to the name or tag of the unknown member. The structure of SDK_UNKNOWN_MEMBER is as follows:

      'SDK_UNKNOWN_MEMBER': {'name': 'UnknownMemberName'}
      
      • automated (dict) –

        Used to specify an automated model evaluation job. See AutomatedEvaluationConfig to view the required parameters.

        • datasetMetricConfigs (list) –

          Specifies the required elements for an automatic model evaluation job.

          • (dict) –

            Defines the built-in prompt datasets, built-in metric names and custom metric names, and the task type.

            • taskType (string) –

              The task type you want the model to carry out.

            • dataset (dict) –

              Specifies the prompt dataset.

              • name (string) –

                Used to specify supported built-in prompt datasets. Valid values are Builtin.Bold, Builtin.BoolQ, Builtin.NaturalQuestions, Builtin.Gigaword, Builtin.RealToxicityPrompts, Builtin.TriviaQA, Builtin.T-Rex, Builtin.WomensEcommerceClothingReviews and Builtin.Wikitext2.

              • datasetLocation (dict) –

                For custom prompt datasets, you must specify the location in Amazon S3 where the prompt dataset is saved.

                Note

                This is a Tagged Union structure. Only one of the following top level keys will be set: s3Uri. If a client receives an unknown member it will set SDK_UNKNOWN_MEMBER as the top level key, which maps to the name or tag of the unknown member. The structure of SDK_UNKNOWN_MEMBER is as follows:

                'SDK_UNKNOWN_MEMBER': {'name': 'UnknownMemberName'}
                
                • s3Uri (string) –

                  The S3 URI of the S3 bucket specified in the job.

            • metricNames (list) –

              The names of the metrics used. For automated model evaluation jobs, valid values are "Builtin.Accuracy", "Builtin.Robustness", and "Builtin.Toxicity". In human-based model evaluation jobs, the array of strings must match the name parameter specified in HumanEvaluationCustomMetric.

              • (string) –

      • human (dict) –

        Used to specify a model evaluation job that uses human workers. See HumanEvaluationConfig to view the required parameters.

        • humanWorkflowConfig (dict) –

          The parameters of the human workflow.

          • flowDefinitionArn (string) –

            The Amazon Resource Name (ARN) of the flow definition.

          • instructions (string) –

            Instructions for the flow definition.

        • customMetrics (list) –

          A list of HumanEvaluationCustomMetric objects. Each contains the name of the metric, how the metric is to be rated, and an optional description.

          • (dict) –

            In a model evaluation job that uses human workers, you must define the name of the metric and how you want that metric rated (ratingMethod). You can also provide an optional description of the metric.

            • name (string) –

              The name of the metric. Your human evaluators will see this name in the evaluation UI.

            • description (string) –

              An optional description of the metric. Use this parameter to provide more details about the metric.

            • ratingMethod (string) –

              Choose how you want your human workers to evaluate your model. Valid values for rating methods are ThumbsUpDown, IndividualLikertScale, ComparisonLikertScale, ComparisonChoice, and ComparisonRank.

        • datasetMetricConfigs (list) –

          Used to specify the metrics, task, and prompt dataset to be used in your model evaluation job.

          • (dict) –

            Defines the built-in prompt datasets, built-in metric names and custom metric names, and the task type.

            • taskType (string) –

              The task type you want the model to carry out.

            • dataset (dict) –

              Specifies the prompt dataset.

              • name (string) –

                Used to specify supported built-in prompt datasets. Valid values are Builtin.Bold, Builtin.BoolQ, Builtin.NaturalQuestions, Builtin.Gigaword, Builtin.RealToxicityPrompts, Builtin.TriviaQA, Builtin.T-Rex, Builtin.WomensEcommerceClothingReviews and Builtin.Wikitext2.

              • datasetLocation (dict) –

                For custom prompt datasets, you must specify the location in Amazon S3 where the prompt dataset is saved.

                Note

                This is a Tagged Union structure. Only one of the following top level keys will be set: s3Uri. If a client receives an unknown member it will set SDK_UNKNOWN_MEMBER as the top level key, which maps to the name or tag of the unknown member. The structure of SDK_UNKNOWN_MEMBER is as follows:

                'SDK_UNKNOWN_MEMBER': {'name': 'UnknownMemberName'}
                
                • s3Uri (string) –

                  The S3 URI of the S3 bucket specified in the job.

            • metricNames (list) –

              The names of the metrics used. For automated model evaluation jobs, valid values are "Builtin.Accuracy", "Builtin.Robustness", and "Builtin.Toxicity". In human-based model evaluation jobs, the array of strings must match the name parameter specified in HumanEvaluationCustomMetric.

              • (string) –

    • inferenceConfig (dict) –

      Details about the models you specified in your model evaluation job.

      Note

      This is a Tagged Union structure. Only one of the following top level keys will be set: models. If a client receives an unknown member it will set SDK_UNKNOWN_MEMBER as the top level key, which maps to the name or tag of the unknown member. The structure of SDK_UNKNOWN_MEMBER is as follows:

      'SDK_UNKNOWN_MEMBER': {'name': 'UnknownMemberName'}
      
      • models (list) –

        Used to specify the models.

        • (dict) –

          Defines the models used in the model evaluation job.

          Note

          This is a Tagged Union structure. Only one of the following top level keys will be set: bedrockModel. If a client receives an unknown member it will set SDK_UNKNOWN_MEMBER as the top level key, which maps to the name or tag of the unknown member. The structure of SDK_UNKNOWN_MEMBER is as follows:

          'SDK_UNKNOWN_MEMBER': {'name': 'UnknownMemberName'}
          
          • bedrockModel (dict) –

            Defines the Amazon Bedrock model or inference profile and inference parameters you want used.

            • modelIdentifier (string) –

              The ARN of the Amazon Bedrock model or inference profile specified.

            • inferenceParams (string) –

              Each Amazon Bedrock model supports different inference parameters that change how the model behaves during inference.

    • outputDataConfig (dict) –

      The Amazon S3 location where output data is saved.

      • s3Uri (string) –

        The Amazon S3 URI where the results of the model evaluation job are saved.

    • creationTime (datetime) –

      When the model evaluation job was created.

    • lastModifiedTime (datetime) –

      When the model evaluation job was last modified.

    • failureMessages (list) –

      An array of strings that specify why the model evaluation job has failed.

      • (string) –
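
The tagged-union notes in the response structure mean that evaluationConfig carries exactly one of automated or human, or SDK_UNKNOWN_MEMBER when the client does not recognize the member. The sketch below shows one way to branch on that and flatten datasetMetricConfigs from whichever member is set. Both helper names and the row layout are hypothetical, chosen only for illustration.

```python
from typing import Any, Dict, List, Tuple


def active_union_member(union: Dict[str, Any]) -> Tuple[str, Any]:
    """Return (key, value) for the single member set on a tagged union."""
    if "SDK_UNKNOWN_MEMBER" in union:
        # The SDK did not recognize the member; only its name is available.
        return "SDK_UNKNOWN_MEMBER", union["SDK_UNKNOWN_MEMBER"]
    if len(union) != 1:
        raise ValueError(f"expected exactly one key, got {sorted(union)}")
    key = next(iter(union))
    return key, union[key]


def dataset_metric_rows(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten datasetMetricConfigs from whichever branch is set."""
    branch, config = active_union_member(response["evaluationConfig"])
    if branch == "SDK_UNKNOWN_MEMBER":
        return []  # nothing to read from a member the SDK cannot parse
    rows = []
    for item in config.get("datasetMetricConfigs", []):
        dataset = item.get("dataset", {})
        rows.append({
            "jobKind": branch,  # "automated" or "human"
            "taskType": item.get("taskType"),
            "dataset": dataset.get("name"),
            # Present only for custom prompt datasets stored in S3.
            "s3Uri": dataset.get("datasetLocation", {}).get("s3Uri"),
            "metrics": list(item.get("metricNames", [])),
        })
    return rows
```

The same active_union_member helper applies to the other tagged unions in the response, such as inferenceConfig and each entry of its models list.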

Exceptions

  • Bedrock.Client.exceptions.ResourceNotFoundException

  • Bedrock.Client.exceptions.AccessDeniedException

  • Bedrock.Client.exceptions.ValidationException

  • Bedrock.Client.exceptions.InternalServerException

  • Bedrock.Client.exceptions.ThrottlingException
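
The exception classes listed above are reachable through the client's exceptions attribute, so they can be caught by name. The sketch below is one possible handling policy, not a recommended one: it returns None when the job does not exist and retries with exponential backoff when throttled. The helper name, attempt count, and backoff values are assumptions for illustration; botocore also has its own retry configuration, which may make a manual loop like this unnecessary.

```python
import time
from typing import Any, Callable, Dict, Optional


def get_job_or_none(
    client: Any,
    job_arn: str,
    max_attempts: int = 3,
    backoff_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[Dict[str, Any]]:
    """Return the job, or None if it does not exist; retry when throttled."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return client.get_evaluation_job(jobIdentifier=job_arn)
        except client.exceptions.ResourceNotFoundException:
            return None
        except client.exceptions.ThrottlingException:
            if attempt == max_attempts:
                raise  # out of attempts: surface the throttling error
            # Exponential backoff: 1x, 2x, 4x ... the base delay.
            sleep(backoff_seconds * 2 ** (attempt - 1))
    return None  # not reached; every loop iteration returns or raises
```

AccessDeniedException, ValidationException, and InternalServerException are deliberately left to propagate here, since retrying them unchanged is unlikely to help in the first two cases.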