start_data_quality_rule_recommendation_run

start_data_quality_rule_recommendation_run(**kwargs)

Starts a recommendation run, which generates rules for you when you don't know what rules to write. Glue Data Quality analyzes the data and recommends a potential ruleset. You can then review the generated ruleset and modify it as needed.

See also: AWS API Documentation

Request Syntax

response = client.start_data_quality_rule_recommendation_run(
    DataSource={
        'GlueTable': {
            'DatabaseName': 'string',
            'TableName': 'string',
            'CatalogId': 'string',
            'ConnectionName': 'string',
            'AdditionalOptions': {
                'string': 'string'
            }
        }
    },
    Role='string',
    NumberOfWorkers=123,
    Timeout=123,
    CreatedRulesetName='string',
    ClientToken='string'
)

Parameters

DataSource (dict) --
[REQUIRED]

The data source (Glue table) associated with this run.
- GlueTable (dict) -- [REQUIRED]
  A Glue table.
  - DatabaseName (string) -- [REQUIRED]
    A database name in the Glue Data Catalog.
  - TableName (string) -- [REQUIRED]
    A table name in the Glue Data Catalog.
  - CatalogId (string) --
    A unique identifier for the Glue Data Catalog.
  - ConnectionName (string) --
    The name of the connection to the Glue Data Catalog.
  - AdditionalOptions (dict) --
    Additional options for the table, given as a mapping of string keys to string values. Two keys are currently supported:
    - pushDownPredicate: filters on partitions without having to list and read all the files in your dataset.
    - catalogPartitionPredicate: uses server-side partition pruning based on partition indexes in the Glue Data Catalog.
Role (string) --
[REQUIRED]

An IAM role supplied to encrypt the results of the run.
NumberOfWorkers (integer) -- The number of G.1X workers to be used in the run. The default is 5.
Timeout (integer) -- The timeout for a run in minutes. This is the maximum time that a run can consume resources before it is terminated and enters TIMEOUT status. The default is 2,880 minutes (48 hours).
CreatedRulesetName (string) -- A name for the ruleset.
ClientToken (string) -- Used for idempotency. Setting it to a random ID (such as a UUID) is recommended, so that a retried request does not create or start multiple instances of the same resource.
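
As an illustration of how the parameters above fit together, the following is a minimal sketch that assembles the keyword arguments for a run. The helper name `build_recommendation_run_request`, the database, table, and role values, and the partition column `year` are all hypothetical. The predicate string is only an assumed example of a partition filter.

```python
import uuid


def build_recommendation_run_request(database, table, role,
                                     partition_filter=None, ruleset_name=None):
    """Assemble keyword arguments for start_data_quality_rule_recommendation_run.

    Hypothetical helper: it only builds a dict and does not call AWS.
    """
    glue_table = {"DatabaseName": database, "TableName": table}
    if partition_filter is not None:
        # pushDownPredicate filters on partitions without listing and
        # reading every file in the dataset.
        glue_table["AdditionalOptions"] = {"pushDownPredicate": partition_filter}

    request = {
        "DataSource": {"GlueTable": glue_table},
        "Role": role,
        # A random UUID as the idempotency token, as the ClientToken
        # description recommends.
        "ClientToken": str(uuid.uuid4()),
    }
    if ruleset_name is not None:
        request["CreatedRulesetName"] = ruleset_name
    # NumberOfWorkers and Timeout are omitted so the documented defaults apply.
    return request


# Hypothetical values, for illustration only.
request = build_recommendation_run_request(
    database="sales_db",
    table="orders",
    role="ExampleGlueDataQualityRole",
    partition_filter="year == '2024'",
    ruleset_name="orders_recommended_rules",
)
```

With a configured Glue client, the request could then be sent as `client.start_data_quality_rule_recommendation_run(**request)`.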

Return type

dict

Returns

Response Syntax

{
    'RunId': 'string'
}

Response Structure

(dict) --
- RunId (string) --
  
  The unique run identifier associated with this run.
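
The returned RunId is typically used to check on the run afterwards. The sketch below polls the companion operation `get_data_quality_rule_recommendation_run`, which is not described in this section. The set of terminal status names, the helper name `wait_for_recommendation_run`, and the polling defaults are assumptions made for illustration.

```python
import time

# Assumed terminal statuses; TIMEOUT is the status named in the Timeout
# parameter description, the others are assumed from the Glue API.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT"}


def wait_for_recommendation_run(client, run_id, poll_seconds=30, max_polls=120):
    """Poll a recommendation run until it reaches a terminal status.

    `client` is expected to be a Glue client (or any object exposing
    get_data_quality_rule_recommendation_run with the same shape).
    """
    for _ in range(max_polls):
        run = client.get_data_quality_rule_recommendation_run(RunId=run_id)
        if run["Status"] in TERMINAL_STATUSES:
            return run
        time.sleep(poll_seconds)
    raise TimeoutError(f"Run {run_id} did not finish after {max_polls} polls")
```

A caller might pass `response['RunId']` from the start call as `run_id` and inspect the `Status` field of the returned dict.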

Exceptions

Glue.Client.exceptions.InvalidInputException
Glue.Client.exceptions.OperationTimeoutException
Glue.Client.exceptions.InternalServiceException
Glue.Client.exceptions.ConflictException
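
One possible way to handle the exceptions listed above is sketched here. The helper name `start_run_or_none` is hypothetical. Treating OperationTimeoutException and InternalServiceException as retryable is an assumption, not documented behavior. ConflictException is left to propagate to the caller.

```python
def start_run_or_none(client, request):
    """Start a recommendation run and return its RunId.

    Returns None when the service reports a failure that this sketch
    assumes to be transient, so the caller can decide whether to retry.
    """
    try:
        response = client.start_data_quality_rule_recommendation_run(**request)
    except client.exceptions.InvalidInputException as err:
        # The request itself is malformed; retrying unchanged will not help.
        raise ValueError(f"Request rejected by Glue: {err}") from err
    except (client.exceptions.OperationTimeoutException,
            client.exceptions.InternalServiceException):
        # Assumed transient; reuse the same ClientToken if retrying.
        return None
    return response["RunId"]
```

Because the ClientToken is used for idempotency, retrying with the same request dict should avoid starting a duplicate run.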
