create_data_integration_flow

SupplyChain.Client.create_data_integration_flow(**kwargs)

Creates a data pipeline that ingests data from source systems, such as Amazon S3 buckets, into either a predefined Amazon Web Services Supply Chain dataset (for example, product or inbound_order) or a temporary dataset, applying the data transformation query supplied in the request.

Request Syntax

response = client.create_data_integration_flow(
    instanceId='string',
    name='string',
    sources=[
        {
            'sourceType': 'S3'|'DATASET',
            'sourceName': 'string',
            's3Source': {
                'bucketName': 'string',
                'prefix': 'string',
                'options': {
                    'fileType': 'CSV'|'PARQUET'|'JSON'
                }
            },
            'datasetSource': {
                'datasetIdentifier': 'string',
                'options': {
                    'loadType': 'INCREMENTAL'|'REPLACE',
                    'dedupeRecords': True|False
                }
            }
        },
    ],
    transformation={
        'transformationType': 'SQL'|'NONE',
        'sqlTransformation': {
            'query': 'string'
        }
    },
    target={
        'targetType': 'S3'|'DATASET',
        's3Target': {
            'bucketName': 'string',
            'prefix': 'string',
            'options': {
                'fileType': 'CSV'|'PARQUET'|'JSON'
            }
        },
        'datasetTarget': {
            'datasetIdentifier': 'string',
            'options': {
                'loadType': 'INCREMENTAL'|'REPLACE',
                'dedupeRecords': True|False
            }
        }
    },
    tags={
        'string': 'string'
    }
)
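
As a concrete illustration of the syntax above, the following sketch fills in a request that reads CSV files from an S3 prefix and loads them into a dataset through a SQL transformation. The helper names build_request and create_flow are hypothetical, and the flow name, bucket, prefix, query, and tag are placeholder assumptions rather than real resources.

```python
def build_request(instance_id, dataset_arn):
    """Assemble keyword arguments for create_data_integration_flow.

    Every literal value below is an illustrative placeholder.
    """
    return {
        "instanceId": instance_id,
        "name": "example-s3-to-dataset-flow",
        "sources": [
            {
                "sourceType": "S3",
                # sourceName doubles as the table alias in the SQL query.
                "sourceName": "raw_orders",
                "s3Source": {
                    "bucketName": "example-bucket",
                    "prefix": "orders/",
                    "options": {"fileType": "CSV"},
                },
            }
        ],
        "transformation": {
            "transformationType": "SQL",
            "sqlTransformation": {"query": "SELECT * FROM raw_orders"},
        },
        "target": {
            "targetType": "DATASET",
            "datasetTarget": {
                "datasetIdentifier": dataset_arn,
                "options": {"loadType": "REPLACE", "dedupeRecords": True},
            },
        },
        "tags": {"project": "example"},
    }


def create_flow(client, instance_id, dataset_arn):
    """Send the assembled request and return the response dict."""
    request = build_request(instance_id, dataset_arn)
    return client.create_data_integration_flow(**request)
```

With credentials configured, the client argument would typically come from boto3.client("supplychain"); it is passed in here so the request can be inspected without contacting the service.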

Parameters:

instanceId (string) –
[REQUIRED]

The Amazon Web Services Supply Chain instance identifier.
name (string) –
[REQUIRED]

Name of the DataIntegrationFlow.
sources (list) –
[REQUIRED]

The source configurations for DataIntegrationFlow.
- (dict) –
  
  The DataIntegrationFlow source parameters.
  - sourceType (string) – [REQUIRED]
    
    The DataIntegrationFlow source type.
  - sourceName (string) – [REQUIRED]
    
    The DataIntegrationFlow source name, which can be used as a table alias in the SQL transformation query.
  - s3Source (dict) –
    
    The S3 DataIntegrationFlow source.
    - bucketName (string) – [REQUIRED]
      
      The name of the S3 bucket that contains the source objects.
    - prefix (string) – [REQUIRED]
      
      The prefix of the S3 source objects.
    - options (dict) –
      
      Additional options for the S3 DataIntegrationFlow source.
      - fileType (string) –
        
        The Amazon S3 file type in S3 options.
  - datasetSource (dict) –
    
    The dataset DataIntegrationFlow source.
    - datasetIdentifier (string) – [REQUIRED]
      
      The ARN of the dataset.
    - options (dict) –
      
      The dataset DataIntegrationFlow source options.
      - loadType (string) –
        
        The dataset data load type in dataset options.
      - dedupeRecords (boolean) –
        
        The dataset load option to remove duplicates.
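
The source structure described above can be assembled with small helpers. This is a minimal sketch that assumes sourceType selects which nested dict to populate (s3Source for S3, datasetSource for DATASET); the reference lists both as optional without stating that rule. The helper names are hypothetical, and the allowed values are copied from the Request Syntax.

```python
S3_FILE_TYPES = {"CSV", "PARQUET", "JSON"}
LOAD_TYPES = {"INCREMENTAL", "REPLACE"}


def s3_source(source_name, bucket_name, prefix, file_type=None):
    """Build one entry of `sources` that reads from an S3 prefix."""
    source = {
        "sourceType": "S3",
        "sourceName": source_name,
        "s3Source": {"bucketName": bucket_name, "prefix": prefix},
    }
    if file_type is not None:
        if file_type not in S3_FILE_TYPES:
            raise ValueError(f"unsupported fileType: {file_type!r}")
        source["s3Source"]["options"] = {"fileType": file_type}
    return source


def dataset_source(source_name, dataset_arn, load_type=None, dedupe_records=None):
    """Build one entry of `sources` that reads from an existing dataset."""
    options = {}
    if load_type is not None:
        if load_type not in LOAD_TYPES:
            raise ValueError(f"unsupported loadType: {load_type!r}")
        options["loadType"] = load_type
    if dedupe_records is not None:
        options["dedupeRecords"] = bool(dedupe_records)

    source = {
        "sourceType": "DATASET",
        "sourceName": source_name,
        "datasetSource": {"datasetIdentifier": dataset_arn},
    }
    # Only send `options` when the caller actually set something.
    if options:
        source["datasetSource"]["options"] = options
    return source
```

Optional keys are left out entirely when not supplied, so the service applies whatever defaults it defines instead of values guessed by the caller.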
transformation (dict) –
[REQUIRED]

The transformation configurations for DataIntegrationFlow.
- transformationType (string) – [REQUIRED]
  
  The DataIntegrationFlow transformation type.
- sqlTransformation (dict) –
  
  The SQL DataIntegrationFlow transformation configuration.
  - query (string) – [REQUIRED]
    
    The body of the transformation SQL query, written in Spark SQL.
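
Because each sourceName can serve as a table alias, a query can join several sources. The sketch below shows one way to build the transformation dict. The helper names, the table names orders and products, and their columns are all hypothetical, and it is an assumption that sqlTransformation is simply omitted when the type is NONE.

```python
def sql_transformation(query):
    """Build a transformation dict that runs a Spark SQL query."""
    if not query.strip():
        raise ValueError("query must not be empty")
    return {
        "transformationType": "SQL",
        "sqlTransformation": {"query": query},
    }


def no_transformation():
    """Build a transformation dict with no SQL step.

    Assumption: sqlTransformation is left out when the type is NONE.
    """
    return {"transformationType": "NONE"}


# Hypothetical query: `orders` and `products` would have to match the
# sourceName values of two entries in `sources`.
JOIN_QUERY = """
SELECT o.order_id, o.quantity, p.description
FROM orders AS o
JOIN products AS p ON o.product_id = p.id
"""

transformation = sql_transformation(JOIN_QUERY)
```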
target (dict) –
[REQUIRED]

The target configurations for DataIntegrationFlow.
- targetType (string) – [REQUIRED]
  
  The DataIntegrationFlow target type.
- s3Target (dict) –
  
  The S3 DataIntegrationFlow target.
  - bucketName (string) – [REQUIRED]
    
    The name of the S3 bucket that receives the target objects.
  - prefix (string) – [REQUIRED]
    
    The prefix of the S3 target objects.
  - options (dict) –
    
    The S3 DataIntegrationFlow target options.
    - fileType (string) –
      
      The Amazon S3 file type in S3 options.
- datasetTarget (dict) –
  
  The dataset DataIntegrationFlow target.
  - datasetIdentifier (string) – [REQUIRED]
    
    The dataset ARN.
  - options (dict) –
    
    The dataset DataIntegrationFlow target options.
    - loadType (string) –
      
      The dataset data load type in dataset options.
    - dedupeRecords (boolean) –
      
      The dataset load option to remove duplicates.
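
A target dict can be checked locally before the request is sent. The following sketch assumes that targetType determines which nested dict must be present, and it tests only the fields marked [REQUIRED] above. The function name check_target and the wording of its messages are hypothetical, and the service performs its own validation regardless.

```python
# Nested key and its required fields for each targetType, per the list above.
REQUIRED_FIELDS = {
    "S3": ("s3Target", ("bucketName", "prefix")),
    "DATASET": ("datasetTarget", ("datasetIdentifier",)),
}


def check_target(target):
    """Return a list of problems found in a target dict (empty if none)."""
    target_type = target.get("targetType")
    if target_type not in REQUIRED_FIELDS:
        return [f"targetType must be one of {sorted(REQUIRED_FIELDS)}"]

    key, fields = REQUIRED_FIELDS[target_type]
    config = target.get(key)
    if config is None:
        return [f"{key} is missing for targetType {target_type}"]

    # Report every absent required field rather than stopping at the first.
    return [f"{key}.{field} is required" for field in fields if field not in config]
```

Returning a list instead of raising on the first problem lets a caller report everything that needs fixing in one pass.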
tags (dict) –
The tags of the DataIntegrationFlow to be created.
- (string) –
  - (string) –

Return type:

dict

Returns:

Response Syntax

{
    'instanceId': 'string',
    'name': 'string'
}

Response Structure

(dict) –

The response parameters for CreateDataIntegrationFlow.
- instanceId (string) –
  
  The Amazon Web Services Supply Chain instance identifier.
- name (string) –
  
  The name of the DataIntegrationFlow created.

Exceptions

SupplyChain.Client.exceptions.ServiceQuotaExceededException
SupplyChain.Client.exceptions.ThrottlingException
SupplyChain.Client.exceptions.ResourceNotFoundException
SupplyChain.Client.exceptions.AccessDeniedException
SupplyChain.Client.exceptions.ValidationException
SupplyChain.Client.exceptions.InternalServerException
SupplyChain.Client.exceptions.ConflictException
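
The exception classes listed above are reachable through client.exceptions, so they can be caught individually. The sketch below retries only ThrottlingException with exponential backoff and lets every other exception propagate. The function name, the attempt count, and the delay values are illustrative assumptions, not service recommendations.

```python
import time


def create_flow_with_retry(client, request, max_attempts=3, base_delay=1.0,
                           sleep=time.sleep):
    """Call create_data_integration_flow, retrying when throttled.

    `request` is the dict of keyword arguments for the call. `sleep` is
    injectable so the backoff can be exercised without real waiting.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    for attempt in range(1, max_attempts + 1):
        try:
            return client.create_data_integration_flow(**request)
        except client.exceptions.ThrottlingException:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the throttling error.
            # Wait 1x, 2x, 4x ... base_delay before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
```

The remaining exceptions, such as ValidationException or AccessDeniedException, are left uncaught here because repeating the same request is unlikely to change their outcome.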