get_evaluation_job
- Bedrock.Client.get_evaluation_job(**kwargs)
Retrieves the properties associated with a model evaluation job, including the status of the job. For more information, see Model evaluation.
See also: AWS API Documentation
Request Syntax
response = client.get_evaluation_job(
    jobIdentifier='string'
)
- Parameters:
jobIdentifier (string) –
[REQUIRED]
The Amazon Resource Name (ARN) of the model evaluation job.
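A minimal sketch of calling this operation, assuming a Bedrock client created with boto3.client("bedrock") and credentials that are allowed to call it. The helper name summarize_evaluation_job and the placeholder ARN in the usage comment are illustrative and not part of the API.

```python
def summarize_evaluation_job(client, job_arn):
    """Fetch one model evaluation job and return a few commonly used fields.

    `client` is assumed to be a Bedrock client, e.g. boto3.client("bedrock").
    """
    response = client.get_evaluation_job(jobIdentifier=job_arn)
    return {
        "name": response["jobName"],
        "status": response["status"],
        "type": response["jobType"],
        # Read with .get() as a precaution, in case no description was set.
        "description": response.get("jobDescription"),
    }


# Usage (requires AWS credentials; the ARN is a placeholder):
#   import boto3
#   client = boto3.client("bedrock")
#   print(summarize_evaluation_job(client, "<evaluation-job-arn>"))
```

Passing the client in as an argument keeps the helper easy to exercise with a stub object in tests.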
- Return type:
dict
- Returns:
Response Syntax
{
    'jobName': 'string',
    'status': 'InProgress'|'Completed'|'Failed'|'Stopping'|'Stopped'|'Deleting',
    'jobArn': 'string',
    'jobDescription': 'string',
    'roleArn': 'string',
    'customerEncryptionKeyId': 'string',
    'jobType': 'Human'|'Automated',
    'evaluationConfig': {
        'automated': {
            'datasetMetricConfigs': [
                {
                    'taskType': 'Summarization'|'Classification'|'QuestionAndAnswer'|'Generation'|'Custom',
                    'dataset': {
                        'name': 'string',
                        'datasetLocation': {
                            's3Uri': 'string'
                        }
                    },
                    'metricNames': [
                        'string',
                    ]
                },
            ]
        },
        'human': {
            'humanWorkflowConfig': {
                'flowDefinitionArn': 'string',
                'instructions': 'string'
            },
            'customMetrics': [
                {
                    'name': 'string',
                    'description': 'string',
                    'ratingMethod': 'string'
                },
            ],
            'datasetMetricConfigs': [
                {
                    'taskType': 'Summarization'|'Classification'|'QuestionAndAnswer'|'Generation'|'Custom',
                    'dataset': {
                        'name': 'string',
                        'datasetLocation': {
                            's3Uri': 'string'
                        }
                    },
                    'metricNames': [
                        'string',
                    ]
                },
            ]
        }
    },
    'inferenceConfig': {
        'models': [
            {
                'bedrockModel': {
                    'modelIdentifier': 'string',
                    'inferenceParams': 'string'
                }
            },
        ]
    },
    'outputDataConfig': {
        's3Uri': 'string'
    },
    'creationTime': datetime(2015, 1, 1),
    'lastModifiedTime': datetime(2015, 1, 1),
    'failureMessages': [
        'string',
    ]
}
Response Structure
(dict) –
jobName (string) –
The name of the model evaluation job.
status (string) –
The status of the model evaluation job.
jobArn (string) –
The Amazon Resource Name (ARN) of the model evaluation job.
jobDescription (string) –
The description of the model evaluation job.
roleArn (string) –
The Amazon Resource Name (ARN) of the IAM service role used in the model evaluation job.
customerEncryptionKeyId (string) –
The Amazon Resource Name (ARN) of the customer managed key specified when the model evaluation job was created.
jobType (string) –
The type of model evaluation job.
evaluationConfig (dict) –
Contains details about the type of model evaluation job, the metrics used, the task type selected, the datasets used, and any custom metrics you defined.
Note
This is a Tagged Union structure. Only one of the following top-level keys will be set: automated, human. If a client receives an unknown member, it will set SDK_UNKNOWN_MEMBER as the top-level key, which maps to the name or tag of the unknown member. The structure of SDK_UNKNOWN_MEMBER is as follows: 'SDK_UNKNOWN_MEMBER': {'name': 'UnknownMemberName'}
automated (dict) –
Used to specify an automated model evaluation job. See AutomatedEvaluationConfig to view the required parameters.
datasetMetricConfigs (list) –
Specifies the required elements for an automatic model evaluation job.
(dict) –
Defines the built-in prompt datasets, built-in metric names and custom metric names, and the task type.
taskType (string) –
The task type you want the model to carry out.
dataset (dict) –
Specifies the prompt dataset.
name (string) –
Used to specify supported built-in prompt datasets. Valid values are Builtin.Bold, Builtin.BoolQ, Builtin.NaturalQuestions, Builtin.Gigaword, Builtin.RealToxicityPrompts, Builtin.TriviaQA, Builtin.T-Rex, Builtin.WomensEcommerceClothingReviews, and Builtin.Wikitext2.
datasetLocation (dict) –
For custom prompt datasets, you must specify the location in Amazon S3 where the prompt dataset is saved.
Note
This is a Tagged Union structure. Only one of the following top-level keys will be set: s3Uri. If a client receives an unknown member, it will set SDK_UNKNOWN_MEMBER as the top-level key, which maps to the name or tag of the unknown member. The structure of SDK_UNKNOWN_MEMBER is as follows: 'SDK_UNKNOWN_MEMBER': {'name': 'UnknownMemberName'}
s3Uri (string) –
The S3 URI of the S3 bucket specified in the job.
metricNames (list) –
The names of the metrics used. For automated model evaluation jobs, valid values are "Builtin.Accuracy", "Builtin.Robustness", and "Builtin.Toxicity". In human-based model evaluation jobs, the array of strings must match the name parameter specified in HumanEvaluationCustomMetric.
(string) –
human (dict) –
Used to specify a model evaluation job that uses human workers. See HumanEvaluationConfig to view the required parameters.
humanWorkflowConfig (dict) –
The parameters of the human workflow.
flowDefinitionArn (string) –
The Amazon Resource Name (ARN) for the flow definition.
instructions (string) –
Instructions for the flow definition.
customMetrics (list) –
A HumanEvaluationCustomMetric object. It contains the names of the metrics, how the metrics are to be evaluated, and an optional description.
(dict) –
In a model evaluation job that uses human workers, you must define the name of the metric and how you want that metric rated (ratingMethod). You can also provide an optional description of the metric.
name (string) –
The name of the metric. Your human evaluators will see this name in the evaluation UI.
description (string) –
An optional description of the metric. Use this parameter to provide more details about the metric.
ratingMethod (string) –
Choose how you want your human workers to evaluate your model. Valid values for rating methods are ThumbsUpDown, IndividualLikertScale, ComparisonLikertScale, ComparisonChoice, and ComparisonRank.
datasetMetricConfigs (list) –
Used to specify the metrics, task, and prompt dataset to be used in your model evaluation job.
(dict) –
Defines the built-in prompt datasets, built-in metric names and custom metric names, and the task type.
taskType (string) –
The task type you want the model to carry out.
dataset (dict) –
Specifies the prompt dataset.
name (string) –
Used to specify supported built-in prompt datasets. Valid values are Builtin.Bold, Builtin.BoolQ, Builtin.NaturalQuestions, Builtin.Gigaword, Builtin.RealToxicityPrompts, Builtin.TriviaQA, Builtin.T-Rex, Builtin.WomensEcommerceClothingReviews, and Builtin.Wikitext2.
datasetLocation (dict) –
For custom prompt datasets, you must specify the location in Amazon S3 where the prompt dataset is saved.
Note
This is a Tagged Union structure. Only one of the following top-level keys will be set: s3Uri. If a client receives an unknown member, it will set SDK_UNKNOWN_MEMBER as the top-level key, which maps to the name or tag of the unknown member. The structure of SDK_UNKNOWN_MEMBER is as follows: 'SDK_UNKNOWN_MEMBER': {'name': 'UnknownMemberName'}
s3Uri (string) –
The S3 URI of the S3 bucket specified in the job.
metricNames (list) –
The names of the metrics used. For automated model evaluation jobs, valid values are "Builtin.Accuracy", "Builtin.Robustness", and "Builtin.Toxicity". In human-based model evaluation jobs, the array of strings must match the name parameter specified in HumanEvaluationCustomMetric.
(string) –
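Because evaluationConfig is a tagged union, code that reads it usually has to check which member is present first. The following is a rough sketch of one way to do that on the returned dict. The helper names are hypothetical, and the "unknown:" prefix is only a convention chosen for this example.

```python
def evaluation_config_kind(evaluation_config):
    """Return (member_name, member_value) for the evaluationConfig union."""
    for key in ("automated", "human"):
        if key in evaluation_config:
            return key, evaluation_config[key]
    unknown = evaluation_config.get("SDK_UNKNOWN_MEMBER")
    if unknown is not None:
        # The SDK reports only the name of a member it does not recognise.
        return "unknown:" + unknown.get("name", ""), None
    raise ValueError("evaluationConfig has no recognised member")


def list_dataset_metrics(evaluation_config):
    """Flatten datasetMetricConfigs into (taskType, dataset name, metricNames) tuples."""
    _, member = evaluation_config_kind(evaluation_config)
    if member is None:
        return []
    return [
        (cfg["taskType"], cfg["dataset"]["name"], list(cfg.get("metricNames", [])))
        for cfg in member.get("datasetMetricConfigs", [])
    ]
```

Both the automated and human members carry a datasetMetricConfigs list with the same shape, so one flattening function can serve both job types.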
inferenceConfig (dict) –
Details about the models you specified in your model evaluation job.
Note
This is a Tagged Union structure. Only one of the following top-level keys will be set: models. If a client receives an unknown member, it will set SDK_UNKNOWN_MEMBER as the top-level key, which maps to the name or tag of the unknown member. The structure of SDK_UNKNOWN_MEMBER is as follows: 'SDK_UNKNOWN_MEMBER': {'name': 'UnknownMemberName'}
models (list) –
Used to specify the models.
(dict) –
Defines the models used in the model evaluation job.
Note
This is a Tagged Union structure. Only one of the following top-level keys will be set: bedrockModel. If a client receives an unknown member, it will set SDK_UNKNOWN_MEMBER as the top-level key, which maps to the name or tag of the unknown member. The structure of SDK_UNKNOWN_MEMBER is as follows: 'SDK_UNKNOWN_MEMBER': {'name': 'UnknownMemberName'}
bedrockModel (dict) –
Defines the Amazon Bedrock model and inference parameters you want used.
modelIdentifier (string) –
The ARN of the Amazon Bedrock model specified.
inferenceParams (string) –
Each Amazon Bedrock model supports different inference parameters that change how the model behaves during inference.
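As an illustration, the models in inferenceConfig could be listed as shown below. This sketch assumes that inferenceParams holds a JSON-encoded string, which the reference above does not state; when parsing fails, the raw string is kept instead. The helper name list_models is hypothetical.

```python
import json


def list_models(inference_config):
    """Return a list of (modelIdentifier, inference parameters) pairs."""
    models = []
    for entry in inference_config.get("models", []):
        bedrock_model = entry.get("bedrockModel")
        if bedrock_model is None:
            # Skip union members this code does not know about.
            continue
        raw = bedrock_model.get("inferenceParams")
        try:
            # Assumption: the parameters are serialized as JSON.
            params = json.loads(raw) if raw else {}
        except json.JSONDecodeError:
            params = {"raw": raw}
        models.append((bedrock_model["modelIdentifier"], params))
    return models
```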
outputDataConfig (dict) –
The Amazon S3 location where output data is saved.
s3Uri (string) –
The Amazon S3 URI where the results of the model evaluation job are saved.
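To read the results with other S3 calls, the URI typically needs to be split into a bucket name and a key prefix. A small sketch using only the standard library follows; split_s3_uri is a hypothetical helper name, and the bucket shown in the usage comment is a placeholder.

```python
from urllib.parse import urlparse


def split_s3_uri(s3_uri):
    """Split 's3://bucket/prefix' into (bucket, prefix)."""
    parsed = urlparse(s3_uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {s3_uri!r}")
    # Drop the leading slash so the prefix can be passed to S3 list calls.
    return parsed.netloc, parsed.path.lstrip("/")


# Usage with a response from get_evaluation_job:
#   bucket, prefix = split_s3_uri(response["outputDataConfig"]["s3Uri"])
```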
creationTime (datetime) –
When the model evaluation job was created.
lastModifiedTime (datetime) –
When the model evaluation job was last modified.
failureMessages (list) –
An array of strings that specify why the model evaluation job has failed.
(string) –
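The status and failureMessages fields can be combined into a simple polling loop. The sketch below treats Completed, Failed, and Stopped as terminal statuses, which is an assumption made for this example rather than something the reference states. The function name, polling interval, and attempt limit are likewise illustrative.

```python
import time

# Assumption: these statuses mean the job will not change any further.
TERMINAL_STATUSES = {"Completed", "Failed", "Stopped"}


def wait_for_evaluation_job(client, job_arn, poll_seconds=30,
                            max_attempts=120, sleep=time.sleep):
    """Poll get_evaluation_job until the job reaches a terminal status."""
    for _ in range(max_attempts):
        response = client.get_evaluation_job(jobIdentifier=job_arn)
        status = response["status"]
        if status in TERMINAL_STATUSES:
            if status == "Failed":
                reasons = "; ".join(response.get("failureMessages", []))
                raise RuntimeError(
                    "evaluation job failed: " + (reasons or "no failure message returned")
                )
            return response
        sleep(poll_seconds)
    raise TimeoutError("evaluation job did not reach a terminal status in time")
```

The sleep parameter exists only so that the wait can be replaced with a no-op in tests.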
Exceptions
Bedrock.Client.exceptions.ResourceNotFoundException
Bedrock.Client.exceptions.AccessDeniedException
Bedrock.Client.exceptions.ValidationException
Bedrock.Client.exceptions.InternalServerException
Bedrock.Client.exceptions.ThrottlingException
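The exceptions listed above are exposed on the client object, so they can be caught without extra imports. A minimal sketch, assuming a missing job should be reported as None and an access error re-raised as PermissionError; both choices, and the name get_job_or_none, belong to this example rather than to the API.

```python
def get_job_or_none(client, job_arn):
    """Return the job description, or None when the job does not exist."""
    try:
        return client.get_evaluation_job(jobIdentifier=job_arn)
    except client.exceptions.ResourceNotFoundException:
        return None
    except client.exceptions.AccessDeniedException as exc:
        raise PermissionError(f"not allowed to read {job_arn}") from exc
    # Validation, throttling, and internal server errors propagate unchanged.
```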