GlueDataBrew.Client.
create_ruleset
(**kwargs)¶Creates a new ruleset that can be used in a profile job to validate the data quality of a dataset.
See also: AWS API Documentation
Request Syntax
response = client.create_ruleset(
Name='string',
Description='string',
TargetArn='string',
Rules=[
{
'Name': 'string',
'Disabled': True|False,
'CheckExpression': 'string',
'SubstitutionMap': {
'string': 'string'
},
'Threshold': {
'Value': 123.0,
'Type': 'GREATER_THAN_OR_EQUAL'|'LESS_THAN_OR_EQUAL'|'GREATER_THAN'|'LESS_THAN',
'Unit': 'COUNT'|'PERCENTAGE'
},
'ColumnSelectors': [
{
'Regex': 'string',
'Name': 'string'
},
]
},
],
Tags={
'string': 'string'
}
)
[REQUIRED]
The name of the ruleset to be created. Valid characters are alphanumeric (A-Z, a-z, 0-9), hyphen (-), period (.), and space.
[REQUIRED]
The Amazon Resource Name (ARN) of a resource (dataset) that the ruleset is associated with.
[REQUIRED]
A list of rules that are defined with the ruleset. A rule includes one or more checks to be validated on a DataBrew dataset.
Represents a single data quality requirement that should be validated in the scope of this dataset.
The name of the rule.
A value that specifies whether the rule is disabled. Once a rule is disabled, a profile job will not validate it during a job run. Default value is false.
The expression which includes column references, condition names followed by variable references, possibly grouped and combined with other conditions. For example, (:col1 starts_with :prefix1 or :col1 starts_with :prefix2) and (:col1 ends_with :suffix1 or :col1 ends_with :suffix2)
. Column and value references are substitution variables that should start with the ':' symbol. Depending on the context, substitution variables' values can be either an actual value or a column name. These values are defined in the SubstitutionMap. If a CheckExpression starts with a column reference, then ColumnSelectors in the rule should be null. If ColumnSelectors has been defined, then there should be no column reference in the left side of a condition, for example, is_between :val1 and :val2
.
For more information, see Available checks
The map of substitution variable names to their values used in a check expression. Variable names should start with a ':' (colon). Variable values can either be actual values or column names. To differentiate between the two, column names should be enclosed in backticks, for example, ":col1": "`Column A`".
The threshold used with a non-aggregate check expression. Non-aggregate check expressions will be applied to each row in a specific column, and the threshold will be used to determine whether the validation succeeds.
The value of a threshold.
The type of a threshold. Used for comparison of an actual count of rows that satisfy the rule to the threshold value.
Unit of threshold value. Can be either a COUNT or PERCENTAGE of the full sample size used for validation.
List of column selectors. Selectors can be used to select columns using a name or regular expression from the dataset. Rule will be applied to selected columns.
Selector of a column from a dataset for profile job configuration. One selector includes either a column name or a regular expression.
A regular expression for selecting a column from a dataset.
The name of a column from a dataset.
Metadata tags to apply to the ruleset.
dict
Response Syntax
{
'Name': 'string'
}
Response Structure
(dict) --
Name (string) --
The unique name of the created ruleset.
Exceptions
GlueDataBrew.Client.exceptions.ConflictException
GlueDataBrew.Client.exceptions.ServiceQuotaExceededException
GlueDataBrew.Client.exceptions.ValidationException