create_matching_workflow

EntityResolution.Client.create_matching_workflow(**kwargs)
Creates a matching workflow that defines the configuration for a data processing job. The workflow name must be unique. To modify an existing workflow, use UpdateMatchingWorkflow.

Warning: For workflows where resolutionType is ML_MATCHING or PROVIDER, incremental processing is not supported.

See also: AWS API Documentation

Request Syntax

    response = client.create_matching_workflow(
        workflowName='string',
        description='string',
        inputSourceConfig=[
            {
                'inputSourceARN': 'string',
                'schemaName': 'string',
                'applyNormalization': True|False
            },
        ],
        outputSourceConfig=[
            {
                'outputS3Path': 'string',
                'KMSArn': 'string',
                'output': [
                    {
                        'name': 'string',
                        'hashed': True|False
                    },
                ],
                'applyNormalization': True|False
            },
        ],
        resolutionTechniques={
            'resolutionType': 'RULE_MATCHING'|'ML_MATCHING'|'PROVIDER',
            'ruleBasedProperties': {
                'rules': [
                    {
                        'ruleName': 'string',
                        'matchingKeys': [
                            'string',
                        ]
                    },
                ],
                'attributeMatchingModel': 'ONE_TO_ONE'|'MANY_TO_MANY',
                'matchPurpose': 'IDENTIFIER_GENERATION'|'INDEXING'
            },
            'ruleConditionProperties': {
                'rules': [
                    {
                        'ruleName': 'string',
                        'condition': 'string'
                    },
                ]
            },
            'providerProperties': {
                'providerServiceArn': 'string',
                'providerConfiguration': {...}|[...]|123|123.4|'string'|True|None,
                'intermediateSourceConfiguration': {
                    'intermediateS3Path': 'string'
                }
            }
        },
        incrementalRunConfig={
            'incrementalRunType': 'IMMEDIATE'
        },
        roleArn='string',
        tags={
            'string': 'string'
        }
    )

- Parameters:
- workflowName (string) – [REQUIRED] The name of the workflow. There can’t be multiple MatchingWorkflows with the same name.
- description (string) – A description of the workflow.
- inputSourceConfig (list) – [REQUIRED] A list of InputSource objects, which have the fields InputSourceARN and SchemaName.
  - (dict) – An object containing inputSourceARN, schemaName, and applyNormalization.
    - inputSourceARN (string) – [REQUIRED] An AWS Glue table Amazon Resource Name (ARN) for the input source table.
    - schemaName (string) – [REQUIRED] The name of the schema to be retrieved.
    - applyNormalization (boolean) – Normalizes the attributes defined in the schema in the input data. For example, if an attribute has an AttributeType of PHONE_NUMBER, and the data in the input table is in a format of 1234567890, Entity Resolution will normalize this field in the output to (123)-456-7890.
 
 
- outputSourceConfig (list) – [REQUIRED] A list of OutputSource objects, each of which contains the fields outputS3Path, applyNormalization, KMSArn, and output.
  - (dict) – An object containing outputS3Path, KMSArn, output, and applyNormalization.
    - outputS3Path (string) – [REQUIRED] The S3 path to which Entity Resolution will write the output table.
    - KMSArn (string) – Customer KMS ARN for encryption at rest. If not provided, the system uses an Entity Resolution managed KMS key.
    - output (list) – [REQUIRED] A list of OutputAttribute objects, each of which has the fields Name and Hashed. Each of these objects selects a column to be included in the output table, and whether the values of the column should be hashed.
      - (dict) – An object containing name and hashed.
        - name (string) – [REQUIRED] A name of a column to be written to the output. This must be an InputField name in the schema mapping.
        - hashed (boolean) – Enables the ability to hash the column values in the output.
    - applyNormalization (boolean) – Normalizes the attributes defined in the schema in the input data. For example, if an attribute has an AttributeType of PHONE_NUMBER, and the data in the input table is in a format of 1234567890, Entity Resolution will normalize this field in the output to (123)-456-7890.
 
 
- resolutionTechniques (dict) – [REQUIRED] An object which defines the resolutionType and the ruleBasedProperties.
  - resolutionType (string) – [REQUIRED] The type of matching workflow to create. Specify one of the following types:
    - RULE_MATCHING: Match records using configurable rule-based criteria
    - ML_MATCHING: Match records using machine learning models
    - PROVIDER: Match records using a third-party matching provider
 
  - ruleBasedProperties (dict) – An object which defines the list of matching rules to run and has a field rules, which is a list of rule objects.
    - rules (list) – [REQUIRED] A list of Rule objects, each of which has the fields RuleName and MatchingKeys.
      - (dict) – An object containing the ruleName and matchingKeys.
        - ruleName (string) – [REQUIRED] A name for the matching rule.
        - matchingKeys (list) – [REQUIRED] A list of MatchingKeys. The MatchingKeys must have been defined in the SchemaMapping. Two records are considered to match according to this rule if all of the MatchingKeys match.
          - (string) –
    - attributeMatchingModel (string) – [REQUIRED] The comparison type. You can choose ONE_TO_ONE or MANY_TO_MANY as the attributeMatchingModel.
      If you choose ONE_TO_ONE, the system can only match attributes if the sub-types are an exact match. For example, for the Email attribute type, the system will only consider it a match if the value of the Email field of Profile A matches the value of the Email field of Profile B.
      If you choose MANY_TO_MANY, the system can match attributes across the sub-types of an attribute type. For example, if the value of the Email field of Profile A and the value of the BusinessEmail field of Profile B match, the two profiles are matched on the Email attribute type.
    - matchPurpose (string) – An indicator of whether to generate IDs and index the data or not.
      If you choose IDENTIFIER_GENERATION, the process generates IDs and indexes the data.
      If you choose INDEXING, the process indexes the data without generating IDs.
 
  - ruleConditionProperties (dict) – An object containing the rules for a matching workflow.
    - rules (list) – [REQUIRED] A list of rule objects, each of which has the fields ruleName and condition.
      - (dict) – An object that defines the ruleCondition and the ruleName to use in a matching workflow.
        - ruleName (string) – [REQUIRED] A name for the matching rule. For example: Rule1
        - condition (string) – [REQUIRED] A statement that specifies the conditions for a matching rule.
          If your data is accurate, use an Exact matching function: Exact or ExactManyToMany.
          If your data has variations in spelling or pronunciation, use a Fuzzy matching function: Cosine, Levenshtein, or Soundex.
          Use operators if you want to combine (AND), separate (OR), or group matching functions (...).
          For example: (Cosine(a, 10) AND Exact(b, true)) OR ExactManyToMany(c, d)
 
 
 
  - providerProperties (dict) – The properties of the provider service.
    - providerServiceArn (string) – [REQUIRED] The ARN of the provider service.
    - providerConfiguration (document) – The required configuration fields to use with the provider service.
    - intermediateSourceConfiguration (dict) – The Amazon S3 location that temporarily stores your data while it processes. Your information won’t be saved permanently.
      - intermediateS3Path (string) – [REQUIRED] The Amazon S3 location (bucket and prefix). For example: s3://provider_bucket/DOC-EXAMPLE-BUCKET
 
 
 
- incrementalRunConfig (dict) – Optional. An object that defines the incremental run type. This object contains only the incrementalRunType field, which appears as “Automatic” in the console.
  Warning: For workflows where resolutionType is ML_MATCHING or PROVIDER, incremental processing is not supported.
  - incrementalRunType (string) – The type of incremental run. The only valid value is IMMEDIATE. This appears as “Automatic” in the console.
 
- roleArn (string) – [REQUIRED] The Amazon Resource Name (ARN) of the IAM role. Entity Resolution assumes this role to create resources on your behalf as part of workflow execution.
- tags (dict) – The tags used to organize, track, or control access for this resource.
  - (string) –
    - (string) –
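
The parameters above can be assembled into a request as in the following minimal sketch for a RULE_MATCHING workflow. The helper name build_rule_matching_request, the ARNs, the bucket, the schema name, and the column and key names are all hypothetical placeholders, not values from this page. The sketch only builds the keyword arguments; the actual client call is shown in a comment because it needs AWS credentials and existing resources.

```python
from typing import Any, Dict, List


def build_rule_matching_request(
    workflow_name: str,
    input_table_arn: str,
    schema_name: str,
    output_s3_path: str,
    role_arn: str,
    output_columns: List[str],
    rules: Dict[str, List[str]],
    incremental: bool = False,
) -> Dict[str, Any]:
    """Assemble keyword arguments for create_matching_workflow (hypothetical helper)."""
    if not rules:
        raise ValueError("at least one matching rule is required")
    request: Dict[str, Any] = {
        "workflowName": workflow_name,
        "inputSourceConfig": [
            {
                "inputSourceARN": input_table_arn,
                "schemaName": schema_name,
                "applyNormalization": True,
            }
        ],
        "outputSourceConfig": [
            {
                "outputS3Path": output_s3_path,
                # Each entry selects one output column; hashing is left off here.
                "output": [{"name": col, "hashed": False} for col in output_columns],
                "applyNormalization": True,
            }
        ],
        "resolutionTechniques": {
            "resolutionType": "RULE_MATCHING",
            "ruleBasedProperties": {
                "rules": [
                    {"ruleName": name, "matchingKeys": list(keys)}
                    for name, keys in rules.items()
                ],
                "attributeMatchingModel": "ONE_TO_ONE",
            },
        },
        "roleArn": role_arn,
    }
    if incremental:
        # IMMEDIATE is the only documented value; this sketch always uses
        # RULE_MATCHING, the one type for which incremental runs are supported.
        request["incrementalRunConfig"] = {"incrementalRunType": "IMMEDIATE"}
    return request


# Placeholder values for illustration only.
request = build_rule_matching_request(
    workflow_name="customer-dedup",
    input_table_arn="arn:aws:glue:us-east-1:123456789012:table/exampledb/customers",
    schema_name="customer-schema",
    output_s3_path="s3://DOC-EXAMPLE-BUCKET/output/",
    role_arn="arn:aws:iam::123456789012:role/ExampleEntityResolutionRole",
    output_columns=["id", "email"],
    rules={"EmailAndName": ["Email", "Name"]},
    incremental=True,
)

# With credentials configured, the request could then be sent like this:
#   import boto3
#   client = boto3.client("entityresolution")
#   response = client.create_matching_workflow(**request)
```

Keeping request construction separate from the call makes it possible to inspect or unit-test the payload without touching AWS.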
 
 
 
- Return type:
- dict 
- Returns:
Response Syntax

    {
        'workflowName': 'string',
        'workflowArn': 'string',
        'description': 'string',
        'inputSourceConfig': [
            {
                'inputSourceARN': 'string',
                'schemaName': 'string',
                'applyNormalization': True|False
            },
        ],
        'outputSourceConfig': [
            {
                'outputS3Path': 'string',
                'KMSArn': 'string',
                'output': [
                    {
                        'name': 'string',
                        'hashed': True|False
                    },
                ],
                'applyNormalization': True|False
            },
        ],
        'resolutionTechniques': {
            'resolutionType': 'RULE_MATCHING'|'ML_MATCHING'|'PROVIDER',
            'ruleBasedProperties': {
                'rules': [
                    {
                        'ruleName': 'string',
                        'matchingKeys': [
                            'string',
                        ]
                    },
                ],
                'attributeMatchingModel': 'ONE_TO_ONE'|'MANY_TO_MANY',
                'matchPurpose': 'IDENTIFIER_GENERATION'|'INDEXING'
            },
            'ruleConditionProperties': {
                'rules': [
                    {
                        'ruleName': 'string',
                        'condition': 'string'
                    },
                ]
            },
            'providerProperties': {
                'providerServiceArn': 'string',
                'providerConfiguration': {...}|[...]|123|123.4|'string'|True|None,
                'intermediateSourceConfiguration': {
                    'intermediateS3Path': 'string'
                }
            }
        },
        'incrementalRunConfig': {
            'incrementalRunType': 'IMMEDIATE'
        },
        'roleArn': 'string'
    }

Response Structure

- (dict) –
  - workflowName (string) – The name of the workflow.
  - workflowArn (string) – The ARN (Amazon Resource Name) that Entity Resolution generated for the MatchingWorkflow.
  - description (string) – A description of the workflow.
  - inputSourceConfig (list) – A list of InputSource objects, which have the fields InputSourceARN and SchemaName.
    - (dict) – An object containing inputSourceARN, schemaName, and applyNormalization.
      - inputSourceARN (string) – An AWS Glue table Amazon Resource Name (ARN) for the input source table.
      - schemaName (string) – The name of the schema to be retrieved.
      - applyNormalization (boolean) – Normalizes the attributes defined in the schema in the input data. For example, if an attribute has an AttributeType of PHONE_NUMBER, and the data in the input table is in a format of 1234567890, Entity Resolution will normalize this field in the output to (123)-456-7890.
 
 
  - outputSourceConfig (list) – A list of OutputSource objects, each of which contains the fields outputS3Path, applyNormalization, KMSArn, and output.
    - (dict) – An object containing outputS3Path, KMSArn, output, and applyNormalization.
      - outputS3Path (string) – The S3 path to which Entity Resolution will write the output table.
      - KMSArn (string) – Customer KMS ARN for encryption at rest. If not provided, the system uses an Entity Resolution managed KMS key.
      - output (list) – A list of OutputAttribute objects, each of which has the fields Name and Hashed. Each of these objects selects a column to be included in the output table, and whether the values of the column should be hashed.
        - (dict) – An object containing name and hashed.
          - name (string) – A name of a column to be written to the output. This must be an InputField name in the schema mapping.
          - hashed (boolean) – Enables the ability to hash the column values in the output.
      - applyNormalization (boolean) – Normalizes the attributes defined in the schema in the input data. For example, if an attribute has an AttributeType of PHONE_NUMBER, and the data in the input table is in a format of 1234567890, Entity Resolution will normalize this field in the output to (123)-456-7890.
 
 
  - resolutionTechniques (dict) – An object which defines the resolutionType and the ruleBasedProperties.
    - resolutionType (string) – The type of matching workflow to create. Specify one of the following types:
      - RULE_MATCHING: Match records using configurable rule-based criteria
      - ML_MATCHING: Match records using machine learning models
      - PROVIDER: Match records using a third-party matching provider
 
    - ruleBasedProperties (dict) – An object which defines the list of matching rules to run and has a field rules, which is a list of rule objects.
      - rules (list) – A list of Rule objects, each of which has the fields RuleName and MatchingKeys.
        - (dict) – An object containing the ruleName and matchingKeys.
          - ruleName (string) – A name for the matching rule.
          - matchingKeys (list) – A list of MatchingKeys. The MatchingKeys must have been defined in the SchemaMapping. Two records are considered to match according to this rule if all of the MatchingKeys match.
            - (string) –
      - attributeMatchingModel (string) – The comparison type. You can choose ONE_TO_ONE or MANY_TO_MANY as the attributeMatchingModel.
        If you choose ONE_TO_ONE, the system can only match attributes if the sub-types are an exact match. For example, for the Email attribute type, the system will only consider it a match if the value of the Email field of Profile A matches the value of the Email field of Profile B.
        If you choose MANY_TO_MANY, the system can match attributes across the sub-types of an attribute type. For example, if the value of the Email field of Profile A and the value of the BusinessEmail field of Profile B match, the two profiles are matched on the Email attribute type.
      - matchPurpose (string) – An indicator of whether to generate IDs and index the data or not.
        If you choose IDENTIFIER_GENERATION, the process generates IDs and indexes the data.
        If you choose INDEXING, the process indexes the data without generating IDs.
 
    - ruleConditionProperties (dict) – An object containing the rules for a matching workflow.
      - rules (list) – A list of rule objects, each of which has the fields ruleName and condition.
        - (dict) – An object that defines the ruleCondition and the ruleName to use in a matching workflow.
          - ruleName (string) – A name for the matching rule. For example: Rule1
          - condition (string) – A statement that specifies the conditions for a matching rule.
            If your data is accurate, use an Exact matching function: Exact or ExactManyToMany.
            If your data has variations in spelling or pronunciation, use a Fuzzy matching function: Cosine, Levenshtein, or Soundex.
            Use operators if you want to combine (AND), separate (OR), or group matching functions (...).
            For example: (Cosine(a, 10) AND Exact(b, true)) OR ExactManyToMany(c, d)
 
 
 
    - providerProperties (dict) – The properties of the provider service.
      - providerServiceArn (string) – The ARN of the provider service.
      - providerConfiguration (document) – The required configuration fields to use with the provider service.
      - intermediateSourceConfiguration (dict) – The Amazon S3 location that temporarily stores your data while it processes. Your information won’t be saved permanently.
        - intermediateS3Path (string) – The Amazon S3 location (bucket and prefix). For example: s3://provider_bucket/DOC-EXAMPLE-BUCKET
 
 
 
  - incrementalRunConfig (dict) – An object which defines an incremental run type and has only incrementalRunType as a field.
    - incrementalRunType (string) – The type of incremental run. The only valid value is IMMEDIATE. This appears as “Automatic” in the console.
      Warning: For workflows where resolutionType is ML_MATCHING or PROVIDER, incremental processing is not supported.
 
  - roleArn (string) – The Amazon Resource Name (ARN) of the IAM role. Entity Resolution assumes this role to create resources on your behalf as part of workflow execution.
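
As an illustration of reading the response structure above, the following sketch pulls a few fields out of a response dict. The function summarize_workflow and the sample response are hypothetical; the sample values are made up to match the documented shape and were not returned by the service. The sketch assumes optional fields such as incrementalRunConfig and hashed may be absent and reads them defensively.

```python
from typing import Any, Dict


def summarize_workflow(response: Dict[str, Any]) -> Dict[str, Any]:
    """Extract a compact summary from a create_matching_workflow response."""
    techniques = response["resolutionTechniques"]
    run_config = response.get("incrementalRunConfig") or {}
    outputs = response.get("outputSourceConfig", [])
    return {
        "name": response["workflowName"],
        "arn": response["workflowArn"],
        "resolution_type": techniques["resolutionType"],
        "output_paths": [out["outputS3Path"] for out in outputs],
        # Collect the columns whose values are written hashed.
        "hashed_columns": [
            attr["name"]
            for out in outputs
            for attr in out.get("output", [])
            if attr.get("hashed")
        ],
        "incremental": run_config.get("incrementalRunType") == "IMMEDIATE",
    }


# Hand-written sample in the documented shape (placeholder values only).
sample_response = {
    "workflowName": "customer-dedup",
    "workflowArn": "arn:aws:entityresolution:us-east-1:123456789012:matchingworkflow/customer-dedup",
    "inputSourceConfig": [
        {
            "inputSourceARN": "arn:aws:glue:us-east-1:123456789012:table/exampledb/customers",
            "schemaName": "customer-schema",
        }
    ],
    "outputSourceConfig": [
        {
            "outputS3Path": "s3://DOC-EXAMPLE-BUCKET/output/",
            "output": [
                {"name": "id", "hashed": False},
                {"name": "email", "hashed": True},
            ],
        }
    ],
    "resolutionTechniques": {"resolutionType": "RULE_MATCHING"},
    "roleArn": "arn:aws:iam::123456789012:role/ExampleEntityResolutionRole",
}

summary = summarize_workflow(sample_response)
```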
 
 
Exceptions

- EntityResolution.Client.exceptions.ThrottlingException
- EntityResolution.Client.exceptions.InternalServerException
- EntityResolution.Client.exceptions.AccessDeniedException
- EntityResolution.Client.exceptions.ExceedsLimitException
- EntityResolution.Client.exceptions.ConflictException
- EntityResolution.Client.exceptions.ValidationException
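
One possible way to handle the exceptions listed above is sketched below: retry with exponential backoff on ThrottlingException and InternalServerException, and let every other exception propagate to the caller. The retry policy, the function create_with_retry, and the StubClient class are assumptions made for illustration and are not part of boto3. The stub stands in for a real client so the sketch runs without AWS access; it mimics only the client.exceptions attribute access pattern shown in the list.

```python
import time
from typing import Any, Callable, Dict


def create_with_retry(
    client: Any,
    request: Dict[str, Any],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call create_matching_workflow, retrying transient failures (hypothetical helper)."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 1
    while True:
        try:
            return client.create_matching_workflow(**request)
        except (
            client.exceptions.ThrottlingException,
            client.exceptions.InternalServerException,
        ):
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
            attempt += 1


class _Throttled(Exception):
    """Stand-in for ThrottlingException in the stub below."""


class _InternalError(Exception):
    """Stand-in for InternalServerException in the stub below."""


class StubClient:
    """Minimal fake client that fails a fixed number of times, then succeeds."""

    class exceptions:
        ThrottlingException = _Throttled
        InternalServerException = _InternalError

    def __init__(self, failures: int) -> None:
        self.failures = failures
        self.calls = 0

    def create_matching_workflow(self, **kwargs: Any) -> Dict[str, Any]:
        self.calls += 1
        if self.calls <= self.failures:
            raise _Throttled("rate exceeded")
        return {"workflowName": kwargs["workflowName"]}


delays = []  # records the backoff delays instead of actually sleeping
stub = StubClient(failures=2)
result = create_with_retry(
    stub, {"workflowName": "demo"}, max_attempts=3, base_delay=0.5, sleep=delays.append
)
```

With a real client, the same function could be called as create_with_retry(boto3.client("entityresolution"), request). Exceptions such as ConflictException or ValidationException are deliberately not retried in this sketch, on the assumption that repeating the same request would not change the outcome.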