start_data_quality_ruleset_evaluation_run
- Glue.Client.start_data_quality_ruleset_evaluation_run(**kwargs)
- Once you have a ruleset definition (either recommended or your own), you call this operation to evaluate the ruleset against a data source (Glue table). The evaluation computes results which you can retrieve with the GetDataQualityResult API.
- See also: AWS API Documentation
- Request Syntax

    response = client.start_data_quality_ruleset_evaluation_run(
        DataSource={
            'GlueTable': {
                'DatabaseName': 'string',
                'TableName': 'string',
                'CatalogId': 'string',
                'ConnectionName': 'string',
                'AdditionalOptions': {
                    'string': 'string'
                }
            }
        },
        Role='string',
        NumberOfWorkers=123,
        Timeout=123,
        ClientToken='string',
        AdditionalRunOptions={
            'CloudWatchMetricsEnabled': True|False,
            'ResultsS3Prefix': 'string',
            'CompositeRuleEvaluationMethod': 'COLUMN'|'ROW'
        },
        RulesetNames=[
            'string',
        ],
        AdditionalDataSources={
            'string': {
                'GlueTable': {
                    'DatabaseName': 'string',
                    'TableName': 'string',
                    'CatalogId': 'string',
                    'ConnectionName': 'string',
                    'AdditionalOptions': {
                        'string': 'string'
                    }
                }
            }
        }
    )

- Parameters:
- DataSource (dict) – [REQUIRED] The data source (Glue table) associated with this run.
  - GlueTable (dict) – [REQUIRED] A Glue table.
    - DatabaseName (string) – [REQUIRED] A database name in the Glue Data Catalog.
    - TableName (string) – [REQUIRED] A table name in the Glue Data Catalog.
    - CatalogId (string) – A unique identifier for the Glue Data Catalog.
    - ConnectionName (string) – The name of the connection to the Glue Data Catalog.
    - AdditionalOptions (dict) – Additional options for the table. Currently there are two keys supported:
      - pushDownPredicate: to filter on partitions without having to list and read all the files in your dataset.
      - catalogPartitionPredicate: to use server-side partition pruning using partition indexes in the Glue Data Catalog.
      - (string) –
        - (string) –
- Role (string) – [REQUIRED] An IAM role supplied to encrypt the results of the run.
- NumberOfWorkers (integer) – The number of G.1X workers to be used in the run. The default is 5.
- Timeout (integer) – The timeout for a run in minutes. This is the maximum time that a run can consume resources before it is terminated and enters TIMEOUT status. The default is 2,880 minutes (48 hours).
- ClientToken (string) – Used for idempotency and is recommended to be set to a random ID (such as a UUID) to avoid creating or starting multiple instances of the same resource. 
- AdditionalRunOptions (dict) – Additional run options you can specify for an evaluation run.
  - CloudWatchMetricsEnabled (boolean) – Whether or not to enable CloudWatch metrics.
  - ResultsS3Prefix (string) – Prefix for Amazon S3 to store results.
  - CompositeRuleEvaluationMethod (string) – Set the evaluation method for composite rules in the ruleset to ROW/COLUMN.
 
- RulesetNames (list) – [REQUIRED] A list of ruleset names.
  - (string) –
 
- AdditionalDataSources (dict) – A map of reference strings to additional data sources you can specify for an evaluation run.
  - (string) –
    - (dict) – A data source (a Glue table) for which you want data quality results.
      - GlueTable (dict) – [REQUIRED] A Glue table.
        - DatabaseName (string) – [REQUIRED] A database name in the Glue Data Catalog.
        - TableName (string) – [REQUIRED] A table name in the Glue Data Catalog.
        - CatalogId (string) – A unique identifier for the Glue Data Catalog.
        - ConnectionName (string) – The name of the connection to the Glue Data Catalog.
        - AdditionalOptions (dict) – Additional options for the table. Currently there are two keys supported:
          - pushDownPredicate: to filter on partitions without having to list and read all the files in your dataset.
          - catalogPartitionPredicate: to use server-side partition pruning using partition indexes in the Glue Data Catalog.
          - (string) –
            - (string) –
- Return type: dict
- Returns:
  - Response Syntax

        {
            'RunId': 'string'
        }

  - Response Structure
    - (dict) –
      - RunId (string) – The unique run identifier associated with this run.
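The request and response shapes above can be exercised with a small helper. This is a minimal sketch, not an official sample: `build_evaluation_request` and `start_evaluation` are hypothetical helper names, and the database, table, role, and ruleset values in the trailing comments are placeholders. The `client` argument is assumed to be a Glue client such as the one returned by `boto3.client("glue")`.

```python
import uuid


def build_evaluation_request(database, table, role, ruleset_names,
                             partition_predicate=None):
    """Assemble keyword arguments for start_data_quality_ruleset_evaluation_run.

    Hypothetical helper: only the required parameters plus ClientToken and an
    optional pushDownPredicate are filled in.
    """
    glue_table = {"DatabaseName": database, "TableName": table}
    if partition_predicate:
        # pushDownPredicate filters on partitions without listing and
        # reading all the files in the dataset.
        glue_table["AdditionalOptions"] = {
            "pushDownPredicate": partition_predicate
        }
    return {
        "DataSource": {"GlueTable": glue_table},
        "Role": role,
        "RulesetNames": list(ruleset_names),
        # A random UUID serves as the idempotency token.
        "ClientToken": str(uuid.uuid4()),
    }


def start_evaluation(client, **request):
    """Start the evaluation run and return the RunId from the response."""
    response = client.start_data_quality_ruleset_evaluation_run(**request)
    return response["RunId"]


# Example call against a real account (placeholder values, needs credentials):
#   import boto3
#   glue = boto3.client("glue")
#   request = build_evaluation_request(
#       "sales_db", "orders", "my-glue-dq-role", ["orders_rules"])
#   run_id = start_evaluation(glue, **request)
```

Keeping request construction separate from the call makes the payload easy to inspect or log before any run is started.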
 
 
- Exceptions
  - Glue.Client.exceptions.InvalidInputException
  - Glue.Client.exceptions.EntityNotFoundException
  - Glue.Client.exceptions.OperationTimeoutException
  - Glue.Client.exceptions.InternalServiceException
  - Glue.Client.exceptions.ConflictException
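The exceptions listed above are exposed on the client object as `client.exceptions.<Name>`, so they can be caught without extra imports. The sketch below is illustrative only: `start_run_safely` is a hypothetical helper, and treating `InvalidInputException` and `EntityNotFoundException` as non-retryable is an assumption about how a caller might want to react, not documented service behavior. All other exceptions propagate to the caller.

```python
def start_run_safely(client, **request):
    """Return the RunId, or None if the request is rejected.

    Hypothetical helper: `client` is assumed to be a Glue client, and
    `request` holds the keyword arguments for the operation.
    """
    try:
        response = client.start_data_quality_ruleset_evaluation_run(**request)
    except (client.exceptions.InvalidInputException,
            client.exceptions.EntityNotFoundException) as err:
        # Assumption: resending the same request would fail the same way,
        # so report the problem instead of retrying.
        print(f"Evaluation run was not started: {err}")
        return None
    return response["RunId"]
```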