create_matching_workflow
- EntityResolution.Client.create_matching_workflow(**kwargs)
Creates a MatchingWorkflow object, which stores the configuration of the data processing job to be run. A MatchingWorkflow with the same name must not already exist. To modify an existing workflow, use the UpdateMatchingWorkflow API.
See also: AWS API Documentation
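The create-versus-update rule above can be sketched as a small helper. This is a minimal sketch, not an official pattern: `create_or_update_workflow` is a hypothetical name, it assumes the service reports a name collision as `ConflictException` (that exception can also signal other conflicts, such as a workflow that is currently running), and it assumes `update_matching_workflow` accepts the same arguments as the create call except `tags`.

```python
def create_or_update_workflow(client, request):
    """Create the workflow, or update it if the name is already taken.

    `client` is expected to be an EntityResolution client (or any object
    exposing the same methods); `request` is a dict of keyword arguments
    for create_matching_workflow.
    """
    try:
        return client.create_matching_workflow(**request)
    except client.exceptions.ConflictException:
        # Assumption: the conflict means the name already exists.
        # Assumption: the update call does not take `tags`, so drop them.
        update_args = {k: v for k, v in request.items() if k != "tags"}
        return client.update_matching_workflow(**update_args)
```

Whether silently updating is the right reaction to a conflict depends on the caller. In many pipelines it is safer to let the exception propagate and decide explicitly.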
Request Syntax
response = client.create_matching_workflow(
    description='string',
    incrementalRunConfig={
        'incrementalRunType': 'IMMEDIATE'
    },
    inputSourceConfig=[
        {
            'applyNormalization': True|False,
            'inputSourceARN': 'string',
            'schemaName': 'string'
        },
    ],
    outputSourceConfig=[
        {
            'KMSArn': 'string',
            'applyNormalization': True|False,
            'output': [
                {
                    'hashed': True|False,
                    'name': 'string'
                },
            ],
            'outputS3Path': 'string'
        },
    ],
    resolutionTechniques={
        'providerProperties': {
            'intermediateSourceConfiguration': {
                'intermediateS3Path': 'string'
            },
            'providerConfiguration': {...}|[...]|123|123.4|'string'|True|None,
            'providerServiceArn': 'string'
        },
        'resolutionType': 'RULE_MATCHING'|'ML_MATCHING'|'PROVIDER',
        'ruleBasedProperties': {
            'attributeMatchingModel': 'ONE_TO_ONE'|'MANY_TO_MANY',
            'matchPurpose': 'IDENTIFIER_GENERATION'|'INDEXING',
            'rules': [
                {
                    'matchingKeys': [
                        'string',
                    ],
                    'ruleName': 'string'
                },
            ]
        }
    },
    roleArn='string',
    tags={
        'string': 'string'
    },
    workflowName='string'
)
- Parameters:
description (string) – A description of the workflow.
incrementalRunConfig (dict) –
An object which defines an incremental run type and has only incrementalRunType as a field.
incrementalRunType (string) –
The type of incremental run. It takes only one value: IMMEDIATE.
inputSourceConfig (list) –
[REQUIRED]
A list of InputSource objects, which have the fields InputSourceARN and SchemaName.
(dict) –
An object containing InputSourceARN, SchemaName, and ApplyNormalization.
applyNormalization (boolean) –
Normalizes the attributes defined in the schema in the input data. For example, if an attribute has an AttributeType of PHONE_NUMBER, and the data in the input table is in the format 1234567890, Entity Resolution will normalize this field in the output to (123)-456-7890.
inputSourceARN (string) – [REQUIRED]
A Glue table Amazon Resource Name (ARN) for the input source table.
schemaName (string) – [REQUIRED]
The name of the schema to be retrieved.
outputSourceConfig (list) –
[REQUIRED]
A list of OutputSource objects, each of which contains the fields OutputS3Path, ApplyNormalization, and Output.
(dict) –
An object containing KMSArn, ApplyNormalization, Output, and OutputS3Path.
KMSArn (string) –
Customer KMS ARN for encryption at rest. If not provided, the system uses an Entity Resolution managed KMS key.
applyNormalization (boolean) –
Normalizes the attributes defined in the schema in the input data. For example, if an attribute has an AttributeType of PHONE_NUMBER, and the data in the input table is in the format 1234567890, Entity Resolution will normalize this field in the output to (123)-456-7890.
output (list) – [REQUIRED]
A list of OutputAttribute objects, each of which has the fields Name and Hashed. Each of these objects selects a column to be included in the output table, and whether the values of the column should be hashed.
(dict) –
An OutputAttribute object, which selects a column to be included in the output table and specifies whether the values of the column should be hashed.
hashed (boolean) –
Enables the ability to hash the column values in the output.
name (string) – [REQUIRED]
The name of a column to be written to the output. This must be an InputField name in the schema mapping.
outputS3Path (string) – [REQUIRED]
The S3 path to which Entity Resolution will write the output table.
resolutionTechniques (dict) –
[REQUIRED]
An object which defines the resolutionType and the ruleBasedProperties.
providerProperties (dict) –
The properties of the provider service.
intermediateSourceConfiguration (dict) –
The Amazon S3 location that temporarily stores your data while it processes. Your information won’t be saved permanently.
intermediateS3Path (string) – [REQUIRED]
The Amazon S3 location (bucket and prefix). For example: s3://provider_bucket/DOC-EXAMPLE-BUCKET
providerConfiguration (document) –
The required configuration fields to use with the provider service.
providerServiceArn (string) – [REQUIRED]
The ARN of the provider service.
resolutionType (string) – [REQUIRED]
The type of matching. There are three types of matching: RULE_MATCHING, ML_MATCHING, and PROVIDER.
ruleBasedProperties (dict) –
An object which defines the list of matching rules to run and has a field Rules, which is a list of rule objects.
attributeMatchingModel (string) – [REQUIRED]
The comparison type. You can choose either ONE_TO_ONE or MANY_TO_MANY as the attributeMatchingModel.
If you choose MANY_TO_MANY, the system can match attributes across the sub-types of an attribute type. For example, if the value of the Email field of Profile A and the value of the BusinessEmail field of Profile B match, the two profiles are matched on the Email attribute type.
If you choose ONE_TO_ONE, the system can only match attributes if the sub-types are an exact match. For example, for the Email attribute type, the system will only consider it a match if the value of the Email field of Profile A matches the value of the Email field of Profile B.
matchPurpose (string) –
An indicator of whether to generate IDs and index the data or not.
If you choose IDENTIFIER_GENERATION, the process generates IDs and indexes the data.
If you choose INDEXING, the process indexes the data without generating IDs.
rules (list) – [REQUIRED]
A list of Rule objects, each of which has the fields RuleName and MatchingKeys.
(dict) –
An object containing RuleName and MatchingKeys.
matchingKeys (list) – [REQUIRED]
A list of MatchingKeys. The MatchingKeys must have been defined in the SchemaMapping. Two records are considered to match according to this rule if all of the MatchingKeys match.
(string) –
ruleName (string) – [REQUIRED]
A name for the matching rule.
roleArn (string) –
[REQUIRED]
The Amazon Resource Name (ARN) of the IAM role. Entity Resolution assumes this role to create resources on your behalf as part of workflow execution.
tags (dict) –
The tags used to organize, track, or control access for this resource.
(string) –
(string) –
workflowName (string) –
[REQUIRED]
The name of the workflow. There can’t be multiple MatchingWorkflows with the same name.
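The required parameters above can be assembled into a request as follows. This is a minimal sketch for a RULE_MATCHING workflow: the helper name `build_rule_based_request`, every ARN, the bucket, the schema name, and the column names are hypothetical placeholders, and the choices of `ONE_TO_ONE` and `applyNormalization=True` are illustrative rather than recommended defaults.

```python
def build_rule_based_request(workflow_name, role_arn, glue_table_arn,
                             schema_name, output_s3_path, columns, rules):
    """Assemble keyword arguments for create_matching_workflow.

    columns: iterable of (column_name, hashed) pairs for the output table.
    rules:   iterable of (rule_name, matching_keys) pairs.
    """
    return {
        "workflowName": workflow_name,
        "roleArn": role_arn,
        "inputSourceConfig": [
            {
                "inputSourceARN": glue_table_arn,
                "schemaName": schema_name,
                "applyNormalization": True,
            }
        ],
        "outputSourceConfig": [
            {
                "outputS3Path": output_s3_path,
                "output": [
                    {"name": name, "hashed": hashed} for name, hashed in columns
                ],
            }
        ],
        "resolutionTechniques": {
            "resolutionType": "RULE_MATCHING",
            "ruleBasedProperties": {
                "attributeMatchingModel": "ONE_TO_ONE",
                "rules": [
                    {"ruleName": name, "matchingKeys": list(keys)}
                    for name, keys in rules
                ],
            },
        },
    }


# All values below are made-up placeholders.
request = build_rule_based_request(
    workflow_name="example-workflow",
    role_arn="arn:aws:iam::111122223333:role/ExampleEntityResolutionRole",
    glue_table_arn="arn:aws:glue:us-east-1:111122223333:table/exampledb/customers",
    schema_name="example-schema",
    output_s3_path="s3://example-bucket/output/",
    columns=[("email", True), ("phone", False)],
    rules=[("match-on-email", ["email"]), ("match-on-phone", ["phone"])],
)

# With a real client, the request would be sent as:
#   response = client.create_matching_workflow(**request)
```

Building the dictionary separately from the call keeps it easy to inspect or log before anything is sent to the service.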
- Return type:
dict
- Returns:
Response Syntax
{
    'description': 'string',
    'incrementalRunConfig': {
        'incrementalRunType': 'IMMEDIATE'
    },
    'inputSourceConfig': [
        {
            'applyNormalization': True|False,
            'inputSourceARN': 'string',
            'schemaName': 'string'
        },
    ],
    'outputSourceConfig': [
        {
            'KMSArn': 'string',
            'applyNormalization': True|False,
            'output': [
                {
                    'hashed': True|False,
                    'name': 'string'
                },
            ],
            'outputS3Path': 'string'
        },
    ],
    'resolutionTechniques': {
        'providerProperties': {
            'intermediateSourceConfiguration': {
                'intermediateS3Path': 'string'
            },
            'providerConfiguration': {...}|[...]|123|123.4|'string'|True|None,
            'providerServiceArn': 'string'
        },
        'resolutionType': 'RULE_MATCHING'|'ML_MATCHING'|'PROVIDER',
        'ruleBasedProperties': {
            'attributeMatchingModel': 'ONE_TO_ONE'|'MANY_TO_MANY',
            'matchPurpose': 'IDENTIFIER_GENERATION'|'INDEXING',
            'rules': [
                {
                    'matchingKeys': [
                        'string',
                    ],
                    'ruleName': 'string'
                },
            ]
        }
    },
    'roleArn': 'string',
    'workflowArn': 'string',
    'workflowName': 'string'
}
Response Structure
(dict) –
description (string) –
A description of the workflow.
incrementalRunConfig (dict) –
An object which defines an incremental run type and has only incrementalRunType as a field.
incrementalRunType (string) –
The type of incremental run. It takes only one value: IMMEDIATE.
inputSourceConfig (list) –
A list of InputSource objects, which have the fields InputSourceARN and SchemaName.
(dict) –
An object containing InputSourceARN, SchemaName, and ApplyNormalization.
applyNormalization (boolean) –
Normalizes the attributes defined in the schema in the input data. For example, if an attribute has an AttributeType of PHONE_NUMBER, and the data in the input table is in the format 1234567890, Entity Resolution will normalize this field in the output to (123)-456-7890.
inputSourceARN (string) –
A Glue table Amazon Resource Name (ARN) for the input source table.
schemaName (string) –
The name of the schema to be retrieved.
outputSourceConfig (list) –
A list of OutputSource objects, each of which contains the fields OutputS3Path, ApplyNormalization, and Output.
(dict) –
An object containing KMSArn, ApplyNormalization, Output, and OutputS3Path.
KMSArn (string) –
Customer KMS ARN for encryption at rest. If not provided, the system uses an Entity Resolution managed KMS key.
applyNormalization (boolean) –
Normalizes the attributes defined in the schema in the input data. For example, if an attribute has an AttributeType of PHONE_NUMBER, and the data in the input table is in the format 1234567890, Entity Resolution will normalize this field in the output to (123)-456-7890.
output (list) –
A list of OutputAttribute objects, each of which has the fields Name and Hashed. Each of these objects selects a column to be included in the output table, and whether the values of the column should be hashed.
(dict) –
An OutputAttribute object, which selects a column to be included in the output table and specifies whether the values of the column should be hashed.
hashed (boolean) –
Enables the ability to hash the column values in the output.
name (string) –
The name of a column to be written to the output. This must be an InputField name in the schema mapping.
outputS3Path (string) –
The S3 path to which Entity Resolution will write the output table.
resolutionTechniques (dict) –
An object which defines the resolutionType and the ruleBasedProperties.
providerProperties (dict) –
The properties of the provider service.
intermediateSourceConfiguration (dict) –
The Amazon S3 location that temporarily stores your data while it processes. Your information won’t be saved permanently.
intermediateS3Path (string) –
The Amazon S3 location (bucket and prefix). For example: s3://provider_bucket/DOC-EXAMPLE-BUCKET
providerConfiguration (document) –
The required configuration fields to use with the provider service.
providerServiceArn (string) –
The ARN of the provider service.
resolutionType (string) –
The type of matching. There are three types of matching: RULE_MATCHING, ML_MATCHING, and PROVIDER.
ruleBasedProperties (dict) –
An object which defines the list of matching rules to run and has a field Rules, which is a list of rule objects.
attributeMatchingModel (string) –
The comparison type. You can choose either ONE_TO_ONE or MANY_TO_MANY as the attributeMatchingModel.
If you choose MANY_TO_MANY, the system can match attributes across the sub-types of an attribute type. For example, if the value of the Email field of Profile A and the value of the BusinessEmail field of Profile B match, the two profiles are matched on the Email attribute type.
If you choose ONE_TO_ONE, the system can only match attributes if the sub-types are an exact match. For example, for the Email attribute type, the system will only consider it a match if the value of the Email field of Profile A matches the value of the Email field of Profile B.
matchPurpose (string) –
An indicator of whether to generate IDs and index the data or not.
If you choose IDENTIFIER_GENERATION, the process generates IDs and indexes the data.
If you choose INDEXING, the process indexes the data without generating IDs.
rules (list) –
A list of Rule objects, each of which has the fields RuleName and MatchingKeys.
(dict) –
An object containing RuleName and MatchingKeys.
matchingKeys (list) –
A list of MatchingKeys. The MatchingKeys must have been defined in the SchemaMapping. Two records are considered to match according to this rule if all of the MatchingKeys match.
(string) –
ruleName (string) –
A name for the matching rule.
roleArn (string) –
The Amazon Resource Name (ARN) of the IAM role. Entity Resolution assumes this role to create resources on your behalf as part of workflow execution.
workflowArn (string) –
The ARN (Amazon Resource Name) that Entity Resolution generated for the MatchingWorkflow.
workflowName (string) –
The name of the workflow.
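The response structure above is a plain dictionary, so the fields of interest can be pulled out directly. The sketch below shows one way to do that. `summarize_workflow` and the sample response are hypothetical, the sample values are invented, and `.get` is used for fields that only apply to rule-based workflows.

```python
def summarize_workflow(response):
    """Extract a few commonly needed fields from the response dict."""
    techniques = response["resolutionTechniques"]
    # ruleBasedProperties is only meaningful for rule-based matching,
    # so fall back to an empty dict when it is absent.
    rule_props = techniques.get("ruleBasedProperties", {})
    return {
        "name": response["workflowName"],
        "arn": response["workflowArn"],
        "resolution_type": techniques["resolutionType"],
        "rule_names": [rule["ruleName"] for rule in rule_props.get("rules", [])],
        "output_paths": [
            out["outputS3Path"] for out in response["outputSourceConfig"]
        ],
    }


# Invented sample shaped like the response syntax; not real service output.
sample_response = {
    "workflowName": "example-workflow",
    "workflowArn": "arn:aws:entityresolution:us-east-1:111122223333:matchingworkflow/example-workflow",
    "roleArn": "arn:aws:iam::111122223333:role/ExampleEntityResolutionRole",
    "inputSourceConfig": [
        {
            "inputSourceARN": "arn:aws:glue:us-east-1:111122223333:table/exampledb/customers",
            "schemaName": "example-schema",
        }
    ],
    "outputSourceConfig": [
        {
            "outputS3Path": "s3://example-bucket/output/",
            "output": [{"name": "email", "hashed": True}],
        }
    ],
    "resolutionTechniques": {
        "resolutionType": "RULE_MATCHING",
        "ruleBasedProperties": {
            "attributeMatchingModel": "ONE_TO_ONE",
            "rules": [{"ruleName": "match-on-email", "matchingKeys": ["email"]}],
        },
    },
}

summary = summarize_workflow(sample_response)
```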
Exceptions
EntityResolution.Client.exceptions.ThrottlingException
EntityResolution.Client.exceptions.InternalServerException
EntityResolution.Client.exceptions.AccessDeniedException
EntityResolution.Client.exceptions.ExceedsLimitException
EntityResolution.Client.exceptions.ConflictException
EntityResolution.Client.exceptions.ValidationException
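Of the exceptions listed, ThrottlingException is the one that usually merits a retry. The following is a minimal sketch of exponential backoff around the call. `create_with_backoff`, `max_attempts`, `base_delay`, and the injectable `sleep` argument are hypothetical names chosen for this illustration. The SDK also applies its own configurable retry behavior, so a wrapper like this may be unnecessary in practice.

```python
import time


def create_with_backoff(client, request, max_attempts=4, base_delay=1.0,
                        sleep=time.sleep):
    """Call create_matching_workflow, retrying only when throttled.

    Waits base_delay, then 2 * base_delay, then 4 * base_delay, and so on
    between attempts. Any other exception propagates immediately.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return client.create_matching_workflow(**request)
        except client.exceptions.ThrottlingException:
            if attempt == max_attempts:
                # Out of attempts: surface the throttling error to the caller.
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

Passing `sleep` as a parameter keeps the helper testable without real delays. The other listed exceptions (for example ValidationException or AccessDeniedException) generally indicate a problem with the request or permissions and are deliberately not retried here.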