get_data_quality_ruleset_evaluation_run

Glue.Client.get_data_quality_ruleset_evaluation_run(**kwargs)

Retrieves a specific run where a ruleset is evaluated against a data source.

Request Syntax

response = client.get_data_quality_ruleset_evaluation_run(
    RunId='string'
)

Parameters:

RunId (string) –

[REQUIRED]

The unique run identifier associated with this run.
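
As an illustration of the request above, here is a minimal sketch of the call. It assumes a Glue client created elsewhere (for example with boto3.client("glue")); the helper name get_run and the run ID in the comment are hypothetical, not part of the API.

```python
def get_run(client, run_id):
    """Fetch one evaluation run by its RunId, the only (and required) parameter."""
    if not run_id:
        # Fail early and locally rather than sending an empty identifier.
        raise ValueError("RunId is required")
    return client.get_data_quality_ruleset_evaluation_run(RunId=run_id)


# Typical use, assuming AWS credentials and a region are configured:
#   import boto3
#   client = boto3.client("glue")
#   run = get_run(client, "dqrun-example")  # hypothetical run ID
#   print(run["Status"])
```

Passing the client in as an argument keeps the helper easy to exercise with a stand-in object in tests.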

Return type:

dict

Returns:

Response Syntax

{
    'RunId': 'string',
    'DataSource': {
        'GlueTable': {
            'DatabaseName': 'string',
            'TableName': 'string',
            'CatalogId': 'string',
            'ConnectionName': 'string',
            'AdditionalOptions': {
                'string': 'string'
            }
        },
        'DataQualityGlueTable': {
            'DatabaseName': 'string',
            'TableName': 'string',
            'CatalogId': 'string',
            'ConnectionName': 'string',
            'AdditionalOptions': {
                'string': 'string'
            },
            'PreProcessingQuery': 'string'
        }
    },
    'Role': 'string',
    'NumberOfWorkers': 123,
    'Timeout': 123,
    'AdditionalRunOptions': {
        'CloudWatchMetricsEnabled': True|False,
        'ResultsS3Prefix': 'string',
        'CompositeRuleEvaluationMethod': 'COLUMN'|'ROW'
    },
    'Status': 'STARTING'|'RUNNING'|'STOPPING'|'STOPPED'|'SUCCEEDED'|'FAILED'|'TIMEOUT',
    'ErrorString': 'string',
    'StartedOn': datetime(2015, 1, 1),
    'LastModifiedOn': datetime(2015, 1, 1),
    'CompletedOn': datetime(2015, 1, 1),
    'ExecutionTime': 123,
    'RulesetNames': [
        'string',
    ],
    'ResultIds': [
        'string',
    ],
    'AdditionalDataSources': {
        'string': {
            'GlueTable': {
                'DatabaseName': 'string',
                'TableName': 'string',
                'CatalogId': 'string',
                'ConnectionName': 'string',
                'AdditionalOptions': {
                    'string': 'string'
                }
            },
            'DataQualityGlueTable': {
                'DatabaseName': 'string',
                'TableName': 'string',
                'CatalogId': 'string',
                'ConnectionName': 'string',
                'AdditionalOptions': {
                    'string': 'string'
                },
                'PreProcessingQuery': 'string'
            }
        }
    }
}
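
Because Status moves through several values, callers often poll this operation until the run finishes. The following is a sketch only: treating STOPPED, SUCCEEDED, FAILED, and TIMEOUT as terminal is an assumption of this example, and the names wait_for_run, poll_seconds, and max_polls are hypothetical.

```python
import time

# Status values from the response syntax that this sketch treats as final.
TERMINAL_STATUSES = {"STOPPED", "SUCCEEDED", "FAILED", "TIMEOUT"}


def wait_for_run(client, run_id, poll_seconds=30, max_polls=120, sleep=time.sleep):
    """Poll the run until it reaches a terminal status, then return the response."""
    for _ in range(max_polls):
        run = client.get_data_quality_ruleset_evaluation_run(RunId=run_id)
        if run["Status"] in TERMINAL_STATUSES:
            return run
        # Still STARTING, RUNNING, or STOPPING: wait before asking again.
        sleep(poll_seconds)
    raise TimeoutError(f"run {run_id} not finished after {max_polls} polls")
```

The sleep function is injectable so the loop can be tested without real delays.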

Response Structure

(dict) –
- RunId (string) –
  
  The unique run identifier associated with this run.
- DataSource (dict) –
  
  The data source (an AWS Glue table) associated with this evaluation run.
  - GlueTable (dict) –
    
    An AWS Glue table.
    - DatabaseName (string) –
      
      A database name in the Glue Data Catalog.
    - TableName (string) –
      
      A table name in the Glue Data Catalog.
    - CatalogId (string) –
      
      A unique identifier for the Glue Data Catalog.
    - ConnectionName (string) –
      
      The name of the connection to the Glue Data Catalog.
    - AdditionalOptions (dict) –
      
      Additional options for the table. Currently there are two keys supported:
      - pushDownPredicate: to filter on partitions without having to list and read all the files in your dataset.
      - catalogPartitionPredicate: to use server-side partition pruning using partition indexes in the Glue Data Catalog.
      - (string) –
        
        (string) –
  - DataQualityGlueTable (dict) –
    
    An AWS Glue table for Data Quality operations.
    - DatabaseName (string) –
      
      A database name in the Glue Data Catalog.
    - TableName (string) –
      
      A table name in the Glue Data Catalog.
    - CatalogId (string) –
      
      A unique identifier for the Glue Data Catalog.
    - ConnectionName (string) –
      
      The name of the connection to the Glue Data Catalog.
    - AdditionalOptions (dict) –
      
      Additional options for the table. Currently there are two keys supported:
      - pushDownPredicate: to filter on partitions without having to list and read all the files in your dataset.
      - catalogPartitionPredicate: to use server-side partition pruning using partition indexes in the Glue Data Catalog.
      - (string) –
        
        (string) –
    - PreProcessingQuery (string) –
      
      A SQL query in SparkSQL format that can be used to pre-process the data for the table in the Glue Data Catalog before the Data Quality operation runs.
- Role (string) –
  
  An IAM role supplied to encrypt the results of the run.
- NumberOfWorkers (integer) –
  
  The number of G.1X workers to be used in the run. The default is 5.
- Timeout (integer) –
  
  The timeout for a run in minutes. This is the maximum time that a run can consume resources before it is terminated and enters TIMEOUT status. The default is 2,880 minutes (48 hours).
- AdditionalRunOptions (dict) –
  
  Additional run options you can specify for an evaluation run.
  - CloudWatchMetricsEnabled (boolean) –
    
    Whether or not to enable CloudWatch metrics.
  - ResultsS3Prefix (string) –
    
    The Amazon S3 prefix under which results are stored.
  - CompositeRuleEvaluationMethod (string) –
    
    The evaluation method for composite rules in the ruleset, either ROW or COLUMN.
- Status (string) –
  
  The status for this run.
- ErrorString (string) –
  
  The error strings that are associated with the run.
- StartedOn (datetime) –
  
  The date and time when this run started.
- LastModifiedOn (datetime) –
  
  A timestamp. The last point in time when this data quality ruleset evaluation run was modified.
- CompletedOn (datetime) –
  
  The date and time when this run was completed.
- ExecutionTime (integer) –
  
  The amount of time (in seconds) that the run consumed resources.
- RulesetNames (list) –
  
  A list of ruleset names for the run. Currently, this parameter takes only one ruleset name.
  - (string) –
- ResultIds (list) –
  
  A list of result IDs for the data quality results for the run.
  - (string) –
- AdditionalDataSources (dict) –
  
  A map of reference strings to additional data sources you can specify for an evaluation run.
  - (string) –
    - (dict) –
      
      A data source (an AWS Glue table) for which you want data quality results.
      - GlueTable (dict) –
        
        An AWS Glue table.
        - DatabaseName (string) –
          
          A database name in the Glue Data Catalog.
        - TableName (string) –
          
          A table name in the Glue Data Catalog.
        - CatalogId (string) –
          
          A unique identifier for the Glue Data Catalog.
        - ConnectionName (string) –
          
          The name of the connection to the Glue Data Catalog.
        - AdditionalOptions (dict) –
          
          Additional options for the table. Currently there are two keys supported:
          - pushDownPredicate: to filter on partitions without having to list and read all the files in your dataset.
          - catalogPartitionPredicate: to use server-side partition pruning using partition indexes in the Glue Data Catalog.
          - (string) –
            
            (string) –
      - DataQualityGlueTable (dict) –
        
        An AWS Glue table for Data Quality operations.
        - DatabaseName (string) –
          
          A database name in the Glue Data Catalog.
        - TableName (string) –
          
          A table name in the Glue Data Catalog.
        - CatalogId (string) –
          
          A unique identifier for the Glue Data Catalog.
        - ConnectionName (string) –
          
          The name of the connection to the Glue Data Catalog.
        - AdditionalOptions (dict) –
          
          Additional options for the table. Currently there are two keys supported:
          - pushDownPredicate: to filter on partitions without having to list and read all the files in your dataset.
          - catalogPartitionPredicate: to use server-side partition pruning using partition indexes in the Glue Data Catalog.
          - (string) –
            
            (string) –
        - PreProcessingQuery (string) –
          
          A SQL query in SparkSQL format that can be used to pre-process the data for the table in the Glue Data Catalog before the Data Quality operation runs.
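
The response structure is deeply nested, so it can help to flatten it into the handful of fields a caller usually needs. The sketch below does that under two assumptions: that a data source carries either GlueTable or DataQualityGlueTable, and that any top-level key other than RunId may be missing from a given response. The function names and the "primary" label are hypothetical.

```python
def table_ref(source):
    """Return (database, table) from a data source dict, or None if no table is set."""
    # Either member may be present; .get() avoids a KeyError on the absent one.
    table = source.get("GlueTable") or source.get("DataQualityGlueTable")
    if not table:
        return None
    return table["DatabaseName"], table["TableName"]


def summarize_run(run):
    """Collect commonly used fields from a run response into a flat dict."""
    # "primary" is a label chosen for this sketch, not an API value.
    sources = {"primary": table_ref(run.get("DataSource", {}))}
    for alias, source in run.get("AdditionalDataSources", {}).items():
        sources[alias] = table_ref(source)
    return {
        "run_id": run["RunId"],
        "status": run.get("Status"),
        "rulesets": run.get("RulesetNames", []),
        "result_ids": run.get("ResultIds", []),
        "execution_seconds": run.get("ExecutionTime"),
        "sources": sources,
    }
```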

Exceptions

Glue.Client.exceptions.EntityNotFoundException
Glue.Client.exceptions.InvalidInputException
Glue.Client.exceptions.OperationTimeoutException
Glue.Client.exceptions.InternalServiceException
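
The exceptions listed above are reachable through the client's exceptions attribute. As a sketch of one possible handling policy, the helper below (get_run_or_none, a hypothetical name) turns EntityNotFoundException into None and lets the other exceptions propagate; whether that policy suits a given caller is an assumption of this example.

```python
def get_run_or_none(client, run_id):
    """Return the run, or None when the service reports EntityNotFoundException."""
    try:
        return client.get_data_quality_ruleset_evaluation_run(RunId=run_id)
    except client.exceptions.EntityNotFoundException:
        # No run with this identifier exists; report absence instead of raising.
        return None
    # InvalidInputException, OperationTimeoutException, and
    # InternalServiceException are deliberately not caught here.
```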