create_matching_workflow
- EntityResolution.Client.create_matching_workflow(**kwargs)
Creates a MatchingWorkflow object, which stores the configuration of the data processing job to be run. A MatchingWorkflow with the same name must not already exist. To modify an existing workflow, use the UpdateMatchingWorkflow API.

See also: AWS API Documentation
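Because creation fails when the name is already taken, a caller may want a create-or-update wrapper. The following is a minimal sketch, not an official pattern: the helper name create_or_update_matching_workflow is hypothetical, it assumes that a name collision surfaces as ConflictException, and it assumes that update_matching_workflow does not accept tags.

```python
def create_or_update_matching_workflow(client, **request):
    """Create the workflow, or update it if the name is already in use.

    `client` is expected to behave like boto3.client("entityresolution").
    Assumption: a name collision is reported as ConflictException.
    """
    try:
        return client.create_matching_workflow(**request)
    except client.exceptions.ConflictException:
        # Assumption: the update call has no `tags` parameter, so drop it.
        update_request = {k: v for k, v in request.items() if k != "tags"}
        return client.update_matching_workflow(**update_request)
```

With a real client, this would be called as create_or_update_matching_workflow(boto3.client("entityresolution"), **request), where request holds the parameters described below.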
Request Syntax
response = client.create_matching_workflow(
    description='string',
    incrementalRunConfig={
        'incrementalRunType': 'IMMEDIATE'
    },
    inputSourceConfig=[
        {
            'applyNormalization': True|False,
            'inputSourceARN': 'string',
            'schemaName': 'string'
        },
    ],
    outputSourceConfig=[
        {
            'KMSArn': 'string',
            'applyNormalization': True|False,
            'output': [
                {
                    'hashed': True|False,
                    'name': 'string'
                },
            ],
            'outputS3Path': 'string'
        },
    ],
    resolutionTechniques={
        'resolutionType': 'RULE_MATCHING'|'ML_MATCHING',
        'ruleBasedProperties': {
            'attributeMatchingModel': 'ONE_TO_ONE'|'MANY_TO_MANY',
            'rules': [
                {
                    'matchingKeys': [
                        'string',
                    ],
                    'ruleName': 'string'
                },
            ]
        }
    },
    roleArn='string',
    tags={
        'string': 'string'
    },
    workflowName='string'
)
- Parameters:
description (string) – A description of the workflow.
incrementalRunConfig (dict) –
An object which defines an incremental run type and has only incrementalRunType as a field.
    incrementalRunType (string) –
    The type of incremental run. It takes only one value: IMMEDIATE.
inputSourceConfig (list) –
[REQUIRED]
A list of InputSource objects, each of which has the fields InputSourceARN and SchemaName and, optionally, ApplyNormalization.
    (dict) –
    An object containing InputSourceARN, SchemaName, and ApplyNormalization.
        applyNormalization (boolean) –
        Normalizes the attributes defined in the schema in the input data. For example, if an attribute has an AttributeType of PHONE_NUMBER, and the data in the input table is in a format of 1234567890, Entity Resolution will normalize this field in the output to (123)-456-7890.
        inputSourceARN (string) – [REQUIRED]
        A Glue table ARN for the input source table.
        schemaName (string) – [REQUIRED]
        The name of the schema to be retrieved.
outputSourceConfig (list) –
[REQUIRED]
A list of OutputSource objects, each of which contains the fields OutputS3Path, ApplyNormalization, and Output.
    (dict) –
    An OutputSource object containing KMSArn, ApplyNormalization, Output, and OutputS3Path.
        KMSArn (string) –
        Customer KMS ARN for encryption at rest. If not provided, the system uses an Entity Resolution managed KMS key.
        applyNormalization (boolean) –
        Normalizes the attributes defined in the schema in the input data. For example, if an attribute has an AttributeType of PHONE_NUMBER, and the data in the input table is in a format of 1234567890, Entity Resolution will normalize this field in the output to (123)-456-7890.
        output (list) – [REQUIRED]
        A list of OutputAttribute objects, each of which has the fields Name and Hashed. Each of these objects selects a column to be included in the output table, and whether the values of the column should be hashed.
            (dict) –
            An OutputAttribute object, which selects one column to include in the output table and states whether its values should be hashed.
                hashed (boolean) –
                Whether to hash the column values in the output.
                name (string) – [REQUIRED]
                The name of a column to be written to the output. This must be an InputField name in the schema mapping.
        outputS3Path (string) – [REQUIRED]
        The S3 path to which Entity Resolution will write the output table.
resolutionTechniques (dict) –
[REQUIRED]
An object which defines the resolutionType and the ruleBasedProperties.
    resolutionType (string) – [REQUIRED]
    The type of matching. There are two types of matching: RULE_MATCHING and ML_MATCHING.
    ruleBasedProperties (dict) –
    An object which defines the list of matching rules to run and has a field Rules, which is a list of rule objects.
        attributeMatchingModel (string) – [REQUIRED]
        The comparison type. You can choose either ONE_TO_ONE or MANY_TO_MANY as the AttributeMatchingModel. With MANY_TO_MANY, the system can match attributes across the sub-types of an attribute type. For example, if the value of the Email field of Profile A matches the value of the BusinessEmail field of Profile B, the two profiles are matched on the Email type. With ONE_TO_ONE, the system can only match if the sub-types are exact matches. For example, the two profiles are matched on the Email type only when the value of the Email field of Profile A matches the value of the Email field of Profile B.
        rules (list) – [REQUIRED]
        A list of Rule objects, each of which has the fields RuleName and MatchingKeys.
            (dict) –
            An object containing RuleName and MatchingKeys.
                matchingKeys (list) – [REQUIRED]
                A list of MatchingKeys. The MatchingKeys must have been defined in the SchemaMapping. Two records are considered to match according to this rule if all of the MatchingKeys match.
                    (string) –
                ruleName (string) – [REQUIRED]
                A name for the matching rule.
roleArn (string) –
[REQUIRED]
The Amazon Resource Name (ARN) of the IAM role. Entity Resolution assumes this role to create resources on your behalf as part of workflow execution.
tags (dict) –
The tags used to organize, track, or control access for this resource.
    (string) –
        (string) –
workflowName (string) –
[REQUIRED]
The name of the workflow. There cannot be multiple MatchingWorkflows with the same name.
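The parameters marked [REQUIRED] above can be assembled into a request as follows. This is a minimal sketch rather than a recommended configuration: the helper build_rule_matching_request is hypothetical, every ARN, bucket, schema, and column name is a placeholder, and the choice of RULE_MATCHING with ONE_TO_ONE is only one of the documented options.

```python
def build_rule_matching_request(workflow_name, role_arn, glue_table_arn,
                                schema_name, output_s3_path, output_columns,
                                rules, hashed_columns=()):
    """Assemble keyword arguments containing every [REQUIRED] field.

    `rules` maps a rule name to the list of matching keys for that rule.
    """
    hashed = set(hashed_columns)
    return {
        "workflowName": workflow_name,
        "roleArn": role_arn,
        "inputSourceConfig": [
            {"inputSourceARN": glue_table_arn, "schemaName": schema_name}
        ],
        "outputSourceConfig": [
            {
                "outputS3Path": output_s3_path,
                # One OutputAttribute per column written to the output table.
                "output": [
                    {"name": col, "hashed": col in hashed}
                    for col in output_columns
                ],
            }
        ],
        "resolutionTechniques": {
            "resolutionType": "RULE_MATCHING",
            "ruleBasedProperties": {
                "attributeMatchingModel": "ONE_TO_ONE",
                "rules": [
                    {"ruleName": name, "matchingKeys": list(keys)}
                    for name, keys in rules.items()
                ],
            },
        },
    }


# All values below are placeholders, not real resources.
request = build_rule_matching_request(
    workflow_name="example-workflow",
    role_arn="arn:aws:iam::123456789012:role/ExampleRole",
    glue_table_arn="arn:aws:glue:us-east-1:123456789012:table/exampledb/exampletable",
    schema_name="example-schema",
    output_s3_path="s3://example-bucket/output/",
    output_columns=["id", "email"],
    rules={"email-rule": ["email"]},
    hashed_columns=["email"],
)
```

The resulting dictionary can then be passed as client.create_matching_workflow(**request). Optional parameters such as description, incrementalRunConfig, and tags can be added to the dictionary before the call.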
- Return type:
dict
- Returns:
Response Syntax
{
    'description': 'string',
    'incrementalRunConfig': {
        'incrementalRunType': 'IMMEDIATE'
    },
    'inputSourceConfig': [
        {
            'applyNormalization': True|False,
            'inputSourceARN': 'string',
            'schemaName': 'string'
        },
    ],
    'outputSourceConfig': [
        {
            'KMSArn': 'string',
            'applyNormalization': True|False,
            'output': [
                {
                    'hashed': True|False,
                    'name': 'string'
                },
            ],
            'outputS3Path': 'string'
        },
    ],
    'resolutionTechniques': {
        'resolutionType': 'RULE_MATCHING'|'ML_MATCHING',
        'ruleBasedProperties': {
            'attributeMatchingModel': 'ONE_TO_ONE'|'MANY_TO_MANY',
            'rules': [
                {
                    'matchingKeys': [
                        'string',
                    ],
                    'ruleName': 'string'
                },
            ]
        }
    },
    'roleArn': 'string',
    'workflowArn': 'string',
    'workflowName': 'string'
}
Response Structure
(dict) –
description (string) –
A description of the workflow.
incrementalRunConfig (dict) –
An object which defines an incremental run type and has only incrementalRunType as a field.
    incrementalRunType (string) –
    The type of incremental run. It takes only one value: IMMEDIATE.
inputSourceConfig (list) –
A list of InputSource objects, each of which has the fields InputSourceARN and SchemaName and, optionally, ApplyNormalization.
    (dict) –
    An object containing InputSourceARN, SchemaName, and ApplyNormalization.
        applyNormalization (boolean) –
        Normalizes the attributes defined in the schema in the input data. For example, if an attribute has an AttributeType of PHONE_NUMBER, and the data in the input table is in a format of 1234567890, Entity Resolution will normalize this field in the output to (123)-456-7890.
        inputSourceARN (string) –
        A Glue table ARN for the input source table.
        schemaName (string) –
        The name of the schema to be retrieved.
outputSourceConfig (list) –
A list of OutputSource objects, each of which contains the fields OutputS3Path, ApplyNormalization, and Output.
    (dict) –
    An OutputSource object containing KMSArn, ApplyNormalization, Output, and OutputS3Path.
        KMSArn (string) –
        Customer KMS ARN for encryption at rest. If not provided, the system uses an Entity Resolution managed KMS key.
        applyNormalization (boolean) –
        Normalizes the attributes defined in the schema in the input data. For example, if an attribute has an AttributeType of PHONE_NUMBER, and the data in the input table is in a format of 1234567890, Entity Resolution will normalize this field in the output to (123)-456-7890.
        output (list) –
        A list of OutputAttribute objects, each of which has the fields Name and Hashed. Each of these objects selects a column to be included in the output table, and whether the values of the column should be hashed.
            (dict) –
            An OutputAttribute object, which selects one column to include in the output table and states whether its values should be hashed.
                hashed (boolean) –
                Whether to hash the column values in the output.
                name (string) –
                The name of a column to be written to the output. This must be an InputField name in the schema mapping.
        outputS3Path (string) –
        The S3 path to which Entity Resolution will write the output table.
resolutionTechniques (dict) –
An object which defines the resolutionType and the ruleBasedProperties.
    resolutionType (string) –
    The type of matching. There are two types of matching: RULE_MATCHING and ML_MATCHING.
    ruleBasedProperties (dict) –
    An object which defines the list of matching rules to run and has a field Rules, which is a list of rule objects.
        attributeMatchingModel (string) –
        The comparison type. You can choose either ONE_TO_ONE or MANY_TO_MANY as the AttributeMatchingModel. With MANY_TO_MANY, the system can match attributes across the sub-types of an attribute type. For example, if the value of the Email field of Profile A matches the value of the BusinessEmail field of Profile B, the two profiles are matched on the Email type. With ONE_TO_ONE, the system can only match if the sub-types are exact matches. For example, the two profiles are matched on the Email type only when the value of the Email field of Profile A matches the value of the Email field of Profile B.
        rules (list) –
        A list of Rule objects, each of which has the fields RuleName and MatchingKeys.
            (dict) –
            An object containing RuleName and MatchingKeys.
                matchingKeys (list) –
                A list of MatchingKeys. The MatchingKeys must have been defined in the SchemaMapping. Two records are considered to match according to this rule if all of the MatchingKeys match.
                    (string) –
                ruleName (string) –
                A name for the matching rule.
roleArn (string) –
The Amazon Resource Name (ARN) of the IAM role. Entity Resolution assumes this role to create resources on your behalf as part of workflow execution.
workflowArn (string) –
The ARN (Amazon Resource Name) that Entity Resolution generated for the MatchingWorkflow.
workflowName (string) –
The name of the workflow.
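The response is a plain dictionary, so the nested fields described above can be read with ordinary dictionary access. The sketch below is illustrative only: summarize_workflow_response is a hypothetical helper, and the sample response uses placeholder values shaped like the Response Syntax rather than real service output.

```python
def summarize_workflow_response(response):
    """Pull a few commonly needed fields out of the response dictionary."""
    techniques = response.get("resolutionTechniques", {})
    rule_props = techniques.get("ruleBasedProperties", {})
    # Collect the names of output columns whose values are hashed.
    hashed_columns = [
        attr["name"]
        for sink in response.get("outputSourceConfig", [])
        for attr in sink.get("output", [])
        if attr.get("hashed")
    ]
    return {
        "arn": response.get("workflowArn"),
        "name": response.get("workflowName"),
        "resolution_type": techniques.get("resolutionType"),
        "rule_names": [r["ruleName"] for r in rule_props.get("rules", [])],
        "hashed_columns": hashed_columns,
    }


# Placeholder response shaped like the Response Syntax above.
sample_response = {
    "workflowArn": "arn:aws:entityresolution:us-east-1:123456789012:matchingworkflow/example-workflow",
    "workflowName": "example-workflow",
    "outputSourceConfig": [
        {
            "outputS3Path": "s3://example-bucket/output/",
            "output": [
                {"name": "email", "hashed": True},
                {"name": "id", "hashed": False},
            ],
        }
    ],
    "resolutionTechniques": {
        "resolutionType": "RULE_MATCHING",
        "ruleBasedProperties": {
            "attributeMatchingModel": "ONE_TO_ONE",
            "rules": [{"ruleName": "email-rule", "matchingKeys": ["email"]}],
        },
    },
}
summary = summarize_workflow_response(sample_response)
```

Using .get with defaults keeps the helper tolerant of optional fields, such as ruleBasedProperties, being absent from the response.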
Exceptions
EntityResolution.Client.exceptions.ThrottlingException
EntityResolution.Client.exceptions.InternalServerException
EntityResolution.Client.exceptions.AccessDeniedException
EntityResolution.Client.exceptions.ExceedsLimitException
EntityResolution.Client.exceptions.ConflictException
EntityResolution.Client.exceptions.ValidationException
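The exception classes listed above are available on the client object as client.exceptions.<Name>, so they can be caught individually. The following sketch retries with exponential backoff; treating ThrottlingException and InternalServerException as the retryable ones is an assumption made for illustration, and create_with_retry is a hypothetical helper. Note that botocore has its own configurable retry behavior, which may already make such a wrapper unnecessary.

```python
import time


def create_with_retry(client, request, max_attempts=3, base_delay=1.0,
                      sleep=time.sleep):
    """Call create_matching_workflow, retrying on transient errors.

    Assumption: only throttling and internal server errors are worth
    retrying; every other exception propagates to the caller unchanged.
    """
    retryable = (
        client.exceptions.ThrottlingException,
        client.exceptions.InternalServerException,
    )
    for attempt in range(1, max_attempts + 1):
        try:
            return client.create_matching_workflow(**request)
        except retryable:
            if attempt == max_attempts:
                raise
            # Exponential backoff: base_delay, then 2x, then 4x, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

The sleep parameter exists only so the delay can be replaced in tests. With a real client, the call would be create_with_retry(boto3.client("entityresolution"), request).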