create_table

Glue.Client.create_table(**kwargs)

Creates a new table definition in the Data Catalog.

See also: AWS API Documentation

Request Syntax

response = client.create_table(
    CatalogId='string',
    DatabaseName='string',
    TableInput={
        'Name': 'string',
        'Description': 'string',
        'Owner': 'string',
        'LastAccessTime': datetime(2015, 1, 1),
        'LastAnalyzedTime': datetime(2015, 1, 1),
        'Retention': 123,
        'StorageDescriptor': {
            'Columns': [
                {
                    'Name': 'string',
                    'Type': 'string',
                    'Comment': 'string',
                    'Parameters': {
                        'string': 'string'
                    }
                },
            ],
            'Location': 'string',
            'AdditionalLocations': [
                'string',
            ],
            'InputFormat': 'string',
            'OutputFormat': 'string',
            'Compressed': True|False,
            'NumberOfBuckets': 123,
            'SerdeInfo': {
                'Name': 'string',
                'SerializationLibrary': 'string',
                'Parameters': {
                    'string': 'string'
                }
            },
            'BucketColumns': [
                'string',
            ],
            'SortColumns': [
                {
                    'Column': 'string',
                    'SortOrder': 123
                },
            ],
            'Parameters': {
                'string': 'string'
            },
            'SkewedInfo': {
                'SkewedColumnNames': [
                    'string',
                ],
                'SkewedColumnValues': [
                    'string',
                ],
                'SkewedColumnValueLocationMaps': {
                    'string': 'string'
                }
            },
            'StoredAsSubDirectories': True|False,
            'SchemaReference': {
                'SchemaId': {
                    'SchemaArn': 'string',
                    'SchemaName': 'string',
                    'RegistryName': 'string'
                },
                'SchemaVersionId': 'string',
                'SchemaVersionNumber': 123
            }
        },
        'PartitionKeys': [
            {
                'Name': 'string',
                'Type': 'string',
                'Comment': 'string',
                'Parameters': {
                    'string': 'string'
                }
            },
        ],
        'ViewOriginalText': 'string',
        'ViewExpandedText': 'string',
        'TableType': 'string',
        'Parameters': {
            'string': 'string'
        },
        'TargetTable': {
            'CatalogId': 'string',
            'DatabaseName': 'string',
            'Name': 'string'
        }
    },
    PartitionIndexes=[
        {
            'Keys': [
                'string',
            ],
            'IndexName': 'string'
        },
    ],
    TransactionId='string'
)
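
Most fields in the request syntax above are optional. As a minimal sketch, assuming a comma-delimited text dataset stored in Amazon S3, the helper below builds the keyword arguments for an external table. The helper name, the database and table names, the S3 path, and the choice of input format, output format, and SerDe classes are illustrative assumptions, not values prescribed by this reference.

```python
def build_csv_table_request(database, table, location, columns):
    """Return keyword arguments for create_table (hypothetical helper).

    `columns` is a list of (name, type) pairs.
    """
    return {
        "DatabaseName": database,
        "TableInput": {
            "Name": table,
            "TableType": "EXTERNAL_TABLE",
            # An explicit empty list, as the PartitionKeys description
            # recommends for tables used by Amazon Athena.
            "PartitionKeys": [],
            "StorageDescriptor": {
                "Columns": [{"Name": n, "Type": t} for n, t in columns],
                "Location": location,
                # Assumed Hive classes for delimited text data.
                "InputFormat": "org.apache.hadoop.mapred.TextInputFormat",
                "OutputFormat": "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
                "SerdeInfo": {
                    "SerializationLibrary": "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
                    "Parameters": {"field.delim": ","},
                },
            },
        },
    }


# Example values are placeholders.
request = build_csv_table_request(
    "sales_db",
    "orders",
    "s3://example-bucket/orders/",
    [("order_id", "bigint"), ("customer", "string")],
)
```

Given a Glue client created with boto3.client('glue'), the request would be sent as client.create_table(**request).
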
Parameters:
  • CatalogId (string) – The ID of the Data Catalog in which to create the Table. If none is supplied, the Amazon Web Services account ID is used by default.

  • DatabaseName (string) –

    [REQUIRED]

    The catalog database in which to create the new table. For Hive compatibility, this name is entirely lowercase.

  • TableInput (dict) –

    [REQUIRED]

    The TableInput object that defines the metadata table to create in the catalog.

    • Name (string) – [REQUIRED]

      The table name. For Hive compatibility, this is folded to lowercase when it is stored.

    • Description (string) –

      A description of the table.

    • Owner (string) –

      The table owner. Included for Apache Hive compatibility. Not used in the normal course of Glue operations.

    • LastAccessTime (datetime) –

      The last time that the table was accessed.

    • LastAnalyzedTime (datetime) –

      The last time that column statistics were computed for this table.

    • Retention (integer) –

      The retention time for this table.

    • StorageDescriptor (dict) –

      A storage descriptor containing information about the physical storage of this table.

      • Columns (list) –

        A list of the Columns in the table.

        • (dict) –

          A column in a Table.

          • Name (string) – [REQUIRED]

            The name of the Column.

          • Type (string) –

            The data type of the Column.

          • Comment (string) –

            A free-form text comment.

          • Parameters (dict) –

            These key-value pairs define properties associated with the column.

            • (string) –

              • (string) –

      • Location (string) –

        The physical location of the table. By default, this takes the form of the warehouse location, followed by the database location in the warehouse, followed by the table name.

      • AdditionalLocations (list) –

        A list of locations that point to the path where a Delta table is located.

        • (string) –

      • InputFormat (string) –

        The input format: SequenceFileInputFormat (binary), or TextInputFormat, or a custom format.

      • OutputFormat (string) –

        The output format: SequenceFileOutputFormat (binary), or IgnoreKeyTextOutputFormat, or a custom format.

      • Compressed (boolean) –

        True if the data in the table is compressed, or False if not.

      • NumberOfBuckets (integer) –

        The number of buckets. Must be specified if the table contains any dimension columns.

      • SerdeInfo (dict) –

        The serialization/deserialization (SerDe) information.

        • Name (string) –

          Name of the SerDe.

        • SerializationLibrary (string) –

          Usually the class that implements the SerDe. An example is org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe.

        • Parameters (dict) –

          These key-value pairs define initialization parameters for the SerDe.

          • (string) –

            • (string) –

      • BucketColumns (list) –

        A list of reducer grouping columns, clustering columns, and bucketing columns in the table.

        • (string) –

      • SortColumns (list) –

        A list specifying the sort order of each bucket in the table.

        • (dict) –

          Specifies the sort order of a sorted column.

          • Column (string) – [REQUIRED]

            The name of the column.

          • SortOrder (integer) – [REQUIRED]

            Indicates that the column is sorted in ascending order (== 1), or in descending order (== 0).

      • Parameters (dict) –

        The user-supplied properties in key-value form.

        • (string) –

          • (string) –

      • SkewedInfo (dict) –

        The information about values that appear frequently in a column (skewed values).

        • SkewedColumnNames (list) –

          A list of names of columns that contain skewed values.

          • (string) –

        • SkewedColumnValues (list) –

          A list of values that appear so frequently as to be considered skewed.

          • (string) –

        • SkewedColumnValueLocationMaps (dict) –

          A mapping of skewed values to the columns that contain them.

          • (string) –

            • (string) –

      • StoredAsSubDirectories (boolean) –

        True if the table data is stored in subdirectories, or False if not.

      • SchemaReference (dict) –

        An object that references a schema stored in the Glue Schema Registry.

        When creating a table, you can pass an empty list of columns for the schema, and instead use a schema reference.

        • SchemaId (dict) –

          A structure that contains schema identity fields. Either this or the SchemaVersionId has to be provided.

          • SchemaArn (string) –

            The Amazon Resource Name (ARN) of the schema. One of SchemaArn or SchemaName has to be provided.

          • SchemaName (string) –

            The name of the schema. One of SchemaArn or SchemaName has to be provided.

          • RegistryName (string) –

            The name of the schema registry that contains the schema.

        • SchemaVersionId (string) –

          The unique ID assigned to a version of the schema. Either this or the SchemaId has to be provided.

        • SchemaVersionNumber (integer) –

          The version number of the schema.

    • PartitionKeys (list) –

      A list of columns by which the table is partitioned. Only primitive types are supported as partition keys.

      When you create a table used by Amazon Athena and you do not specify any PartitionKeys, you must at least set the value of PartitionKeys to an empty list. For example:

      "PartitionKeys": []

      • (dict) –

        A column in a Table.

        • Name (string) – [REQUIRED]

          The name of the Column.

        • Type (string) –

          The data type of the Column.

        • Comment (string) –

          A free-form text comment.

        • Parameters (dict) –

          These key-value pairs define properties associated with the column.

          • (string) –

            • (string) –

    • ViewOriginalText (string) –

      Included for Apache Hive compatibility. Not used in the normal course of Glue operations. If the table is a VIRTUAL_VIEW, this field holds certain Athena configuration encoded in base64.

    • ViewExpandedText (string) –

      Included for Apache Hive compatibility. Not used in the normal course of Glue operations.

    • TableType (string) –

      The type of this table. Glue will create tables with the EXTERNAL_TABLE type. Other services, such as Athena, may create tables with additional table types.

      Glue-related table types:

      • EXTERNAL_TABLE – Hive-compatible attribute; indicates a non-Hive managed table.

      • GOVERNED – Used by Lake Formation. The Glue Data Catalog understands GOVERNED.

    • Parameters (dict) –

      These key-value pairs define properties associated with the table.

      • (string) –

        • (string) –

    • TargetTable (dict) –

      A TableIdentifier structure that describes a target table for resource linking.

      • CatalogId (string) –

        The ID of the Data Catalog in which the table resides.

      • DatabaseName (string) –

        The name of the catalog database that contains the target table.

      • Name (string) –

        The name of the target table.

  • PartitionIndexes (list) –

    A list of partition indexes (PartitionIndex structures) to create in the table.

    • (dict) –

      A structure for a partition index.

      • Keys (list) – [REQUIRED]

        The keys for the partition index.

        • (string) –

      • IndexName (string) – [REQUIRED]

        The name of the partition index.

  • TransactionId (string) – The ID of the transaction.
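
The PartitionKeys and PartitionIndexes parameters described above can be combined in one request. The following is a sketch under stated assumptions: the helper name and all sample values are hypothetical, the client-side check that every index key names a partition key reflects an assumption about what the service accepts, and the names are lowercased locally only to mirror the Hive compatibility notes above.

```python
def build_partitioned_request(database, table, columns, partition_keys,
                              index_name, index_keys):
    """Return create_table keyword arguments for a partitioned table.

    `columns` and `partition_keys` are lists of (name, type) pairs.
    This is an illustrative helper, not part of boto3.
    """
    partition_names = [name for name, _ in partition_keys]
    # Assumption: a partition index may only reference partition keys.
    missing = [key for key in index_keys if key not in partition_names]
    if missing:
        raise ValueError(f"index keys are not partition keys: {missing}")

    return {
        # Lowercased here to match how the names are described as stored.
        "DatabaseName": database.lower(),
        "TableInput": {
            "Name": table.lower(),
            "StorageDescriptor": {
                "Columns": [{"Name": n, "Type": t} for n, t in columns],
            },
            "PartitionKeys": [{"Name": n, "Type": t} for n, t in partition_keys],
        },
        "PartitionIndexes": [
            {"Keys": list(index_keys), "IndexName": index_name},
        ],
    }


# Placeholder values for illustration.
partitioned_request = build_partitioned_request(
    "Analytics",
    "Events",
    columns=[("event_id", "string"), ("payload", "string")],
    partition_keys=[("year", "int"), ("month", "int")],
    index_name="by_year",
    index_keys=["year"],
)
```

Validating the index keys before the call is a design choice made here so that a mismatch surfaces as a local ValueError rather than as a service-side error.
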

Return type:

dict

Returns:

Response Syntax

{}

Response Structure

  • (dict) –

Exceptions

  • Glue.Client.exceptions.AlreadyExistsException

  • Glue.Client.exceptions.InvalidInputException

  • Glue.Client.exceptions.EntityNotFoundException

  • Glue.Client.exceptions.ResourceNumberLimitExceededException

  • Glue.Client.exceptions.InternalServiceException

  • Glue.Client.exceptions.OperationTimeoutException

  • Glue.Client.exceptions.GlueEncryptionException

  • Glue.Client.exceptions.ConcurrentModificationException

  • Glue.Client.exceptions.ResourceNotReadyException
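
Because a successful call returns an empty response, failures are reported only through the exceptions listed above. As a minimal sketch, the wrapper below treats AlreadyExistsException as a non-fatal outcome and lets every other exception propagate. The function name and its boolean return convention are assumptions made for illustration; `client` is expected to be a Glue client such as the one returned by boto3.client('glue').

```python
def create_table_if_absent(client, **request):
    """Create the table and report whether it was newly created.

    Returns True if the table was created, False if it already existed.
    Any other exception raised by the client is left to the caller.
    (Hypothetical helper, not part of boto3.)
    """
    try:
        client.create_table(**request)
    except client.exceptions.AlreadyExistsException:
        # The table definition is already in the Data Catalog.
        return False
    return True
```

A call would look like create_table_if_absent(client, DatabaseName='sales_db', TableInput={'Name': 'orders'}), where the database and table names are placeholders.
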