create_training_dataset

CleanRoomsML.Client.create_training_dataset(**kwargs)

Defines the information necessary to create a training dataset. In Clean Rooms ML, a TrainingDataset is metadata that points to a Glue table; the table itself is read only when an AudienceModel is created.

Request Syntax

response = client.create_training_dataset(
    name='string',
    roleArn='string',
    trainingData=[
        {
            'type': 'INTERACTIONS',
            'inputConfig': {
                'schema': [
                    {
                        'columnName': 'string',
                        'columnTypes': [
                            'USER_ID'|'ITEM_ID'|'TIMESTAMP'|'CATEGORICAL_FEATURE'|'NUMERICAL_FEATURE',
                        ]
                    },
                ],
                'dataSource': {
                    'glueDataSource': {
                        'tableName': 'string',
                        'databaseName': 'string',
                        'catalogId': 'string'
                    }
                }
            }
        },
    ],
    tags={
        'string': 'string'
    },
    description='string'
)
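
As a concrete illustration of the syntax above, the following is a minimal sketch of how a request might be assembled and sent. The helper names (`build_training_dataset_request`, `create_training_dataset`), the argument layout, and all sample values are hypothetical and not part of the API; only the request keys and the column type values come from the syntax shown above. The client-side column type check is a convenience, and the service performs its own validation.

```python
from typing import Dict, List, Optional

# Column type values listed in the request syntax above.
ALLOWED_COLUMN_TYPES = {
    "USER_ID",
    "ITEM_ID",
    "TIMESTAMP",
    "CATEGORICAL_FEATURE",
    "NUMERICAL_FEATURE",
}


def build_training_dataset_request(
    name: str,
    role_arn: str,
    database_name: str,
    table_name: str,
    columns: Dict[str, List[str]],
    catalog_id: Optional[str] = None,
    description: Optional[str] = None,
    tags: Optional[Dict[str, str]] = None,
) -> dict:
    """Build keyword arguments for create_training_dataset (hypothetical helper)."""
    schema = []
    for column_name, column_types in columns.items():
        unknown = set(column_types) - ALLOWED_COLUMN_TYPES
        if unknown:
            raise ValueError(
                f"Unsupported column types for {column_name!r}: {sorted(unknown)}"
            )
        schema.append({"columnName": column_name, "columnTypes": list(column_types)})

    # catalogId is the only optional key inside glueDataSource.
    glue_source = {"tableName": table_name, "databaseName": database_name}
    if catalog_id is not None:
        glue_source["catalogId"] = catalog_id

    request = {
        "name": name,
        "roleArn": role_arn,
        "trainingData": [
            {
                "type": "INTERACTIONS",
                "inputConfig": {
                    "schema": schema,
                    "dataSource": {"glueDataSource": glue_source},
                },
            }
        ],
    }
    # Optional top-level parameters are only sent when provided.
    if description is not None:
        request["description"] = description
    if tags:
        request["tags"] = tags
    return request


def create_training_dataset(request: dict) -> str:
    """Send the request with boto3 and return the new dataset's ARN."""
    import boto3  # imported lazily so the builder works without boto3 installed

    client = boto3.client("cleanroomsml")
    response = client.create_training_dataset(**request)
    return response["trainingDatasetArn"]
```

Keeping the request construction separate from the network call makes it possible to inspect or log the payload before anything is sent.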

Parameters:

name (string) –
[REQUIRED]

The name of the training dataset. This name must be unique within your account and Region.
roleArn (string) –
[REQUIRED]

The ARN of the IAM role that Clean Rooms ML can assume to read the data referred to in the dataSource field of each dataset.

Passing a role across AWS accounts is not allowed. If you pass a role that isn’t in your account, you get an AccessDeniedException error.
trainingData (list) –
[REQUIRED]

An array of Dataset objects, each of which specifies the dataset type and the details of its location and schema. The role you provide must have read access to these tables.
- (dict) –
  
  Defines where the training dataset is located, what type of data it contains, and how to access the data.
  - type (string) – [REQUIRED]
    
    What type of information is found in the dataset.
  - inputConfig (dict) – [REQUIRED]
    
    A DatasetInputConfig object that defines the data source and schema mapping.
    - schema (list) – [REQUIRED]
      
      The schema information for the training data.
      - (dict) –
        
        Metadata for a column.
        
        columnName (string) – [REQUIRED]
        
        The name of a column.
        
        columnTypes (list) – [REQUIRED]
        
        The data type of the column.
        
        (string) –
    - dataSource (dict) – [REQUIRED]
      
      A DataSource object that specifies the Glue data source for the training data.
      - glueDataSource (dict) – [REQUIRED]
        
        A GlueDataSource object that defines the catalog ID, database name, and table name for the training data.
        
        tableName (string) – [REQUIRED]
        
        The Glue table that contains the training data.
        
        databaseName (string) – [REQUIRED]
        
        The Glue database that contains the training data.
        
        catalogId (string) –
        
        The Glue catalog that contains the training data.
tags (dict) –
The optional metadata that you apply to the resource to help you categorize and organize it. Each tag consists of a key and an optional value, both of which you define.

The following basic restrictions apply to tags:
- Maximum number of tags per resource - 50.
- For each resource, each tag key must be unique, and each tag key can have only one value.
- Maximum key length - 128 Unicode characters in UTF-8.
- Maximum value length - 256 Unicode characters in UTF-8.
- If your tagging schema is used across multiple services and resources, remember that other services may have restrictions on allowed characters. Generally allowed characters are: letters, numbers, and spaces representable in UTF-8, and the following characters: + - = . _ : / @.
- Tag keys and values are case sensitive.
- Do not use aws:, AWS:, or any other upper- or lowercase combination of it as a prefix for keys, because it is reserved for AWS use. You cannot edit or delete tag keys with this prefix. Values can have this prefix. If a tag value has aws as its prefix but the key does not, Clean Rooms ML considers it a user tag and counts it against the limit of 50 tags. Tags in which only the key has the aws prefix do not count against your tags-per-resource limit.
- (string) –
  - (string) –
description (string) – The description of the training dataset.
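
The tag restrictions listed above can be pre-checked on the client before a request is sent. The sketch below is one possible way to do that; `find_tag_problems` is a hypothetical helper, not part of boto3, and it covers only the count, length, and reserved-prefix rules. It measures length with Python's `len()`, which counts Unicode code points, on the assumption that this matches how the service counts characters. The service remains the authority on what is accepted.

```python
from typing import Dict, List

# Limits taken from the tag restrictions listed above.
MAX_TAGS = 50
MAX_KEY_LENGTH = 128
MAX_VALUE_LENGTH = 256


def find_tag_problems(tags: Dict[str, str]) -> List[str]:
    """Return human-readable problems found in a tags dict (empty if none)."""
    problems = []
    if len(tags) > MAX_TAGS:
        problems.append(f"{len(tags)} tags supplied; the maximum is {MAX_TAGS}")
    for key, value in tags.items():
        if len(key) > MAX_KEY_LENGTH:
            problems.append(f"key {key[:20]!r}... exceeds {MAX_KEY_LENGTH} characters")
        if len(value) > MAX_VALUE_LENGTH:
            problems.append(f"value for key {key[:20]!r} exceeds {MAX_VALUE_LENGTH} characters")
        # The aws: prefix is reserved for keys in any letter case; values may use it.
        if key.lower().startswith("aws:"):
            problems.append(f"key {key!r} uses the reserved aws: prefix")
    return problems
```

A caller could run this on the `tags` argument and refuse to send the request if the returned list is non-empty.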

Return type:

dict

Returns:

Response Syntax

{
    'trainingDatasetArn': 'string'
}

Response Structure

(dict) –
- trainingDatasetArn (string) –
  
  The Amazon Resource Name (ARN) of the training dataset resource.

Exceptions

CleanRoomsML.Client.exceptions.ConflictException
CleanRoomsML.Client.exceptions.ValidationException
CleanRoomsML.Client.exceptions.AccessDeniedException
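
The exceptions listed above are exposed on the client through its `exceptions` attribute, so they can be caught individually. The following is a minimal sketch of that pattern; `create_dataset_or_explain` is a hypothetical wrapper, and the explanations in the messages are assumptions drawn from the parameter descriptions on this page (the name uniqueness requirement and the cross-account role restriction), not an exhaustive list of causes.

```python
def create_dataset_or_explain(client, **request) -> str:
    """Call create_training_dataset and re-raise documented errors with context.

    `client` is expected to be a boto3 CleanRoomsML client, for example
    boto3.client("cleanroomsml"). Returns the ARN of the new training dataset.
    """
    try:
        response = client.create_training_dataset(**request)
    except client.exceptions.ConflictException as err:
        # Assumption: a likely cause is a name that is already in use.
        raise RuntimeError(
            "Conflict: the dataset name may already exist in this account and Region"
        ) from err
    except client.exceptions.ValidationException as err:
        raise RuntimeError("Validation failed: check the request parameters") from err
    except client.exceptions.AccessDeniedException as err:
        # The roleArn description notes that cross-account roles are rejected this way.
        raise RuntimeError(
            "Access denied: check that roleArn belongs to your own account"
        ) from err
    return response["trainingDatasetArn"]
```

The original exception stays attached as `__cause__`, so the service's own error message is not lost.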