create_data_source

AgentsforBedrock.Client.create_data_source(**kwargs)

Sets up a data source to be added to a knowledge base.

Warning

You can’t change the chunkingConfiguration after you create the data source.

See also: AWS API Documentation

Request Syntax

response = client.create_data_source(
    clientToken='string',
    dataDeletionPolicy='RETAIN'|'DELETE',
    dataSourceConfiguration={
        's3Configuration': {
            'bucketArn': 'string',
            'bucketOwnerAccountId': 'string',
            'inclusionPrefixes': [
                'string',
            ]
        },
        'type': 'S3'
    },
    description='string',
    knowledgeBaseId='string',
    name='string',
    serverSideEncryptionConfiguration={
        'kmsKeyArn': 'string'
    },
    vectorIngestionConfiguration={
        'chunkingConfiguration': {
            'chunkingStrategy': 'FIXED_SIZE'|'NONE',
            'fixedSizeChunkingConfiguration': {
                'maxTokens': 123,
                'overlapPercentage': 123
            }
        }
    }
)
Parameters:
  • clientToken (string) –

    A unique, case-sensitive identifier to ensure that the API request completes no more than one time. If this token matches a previous request, Amazon Bedrock ignores the request, but does not return an error. For more information, see Ensuring idempotency.

    This field is autopopulated if not provided.

  • dataDeletionPolicy (string) – The data deletion policy assigned to the data source.

  • dataSourceConfiguration (dict) –

    [REQUIRED]

    Contains metadata about where the data source is stored.

    • s3Configuration (dict) –

      Contains details about the configuration of the S3 object containing the data source.

      • bucketArn (string) – [REQUIRED]

        The Amazon Resource Name (ARN) of the bucket that contains the data source.

      • bucketOwnerAccountId (string) –

        The account ID of the owner of the S3 bucket.

      • inclusionPrefixes (list) –

        A list of S3 prefixes that identify the objects to include in the data source. For more information, see Organizing objects using prefixes.

        • (string) –

    • type (string) – [REQUIRED]

      The type of storage for the data source.

  • description (string) – A description of the data source.

  • knowledgeBaseId (string) –

    [REQUIRED]

    The unique identifier of the knowledge base to which to add the data source.

  • name (string) –

    [REQUIRED]

    The name of the data source.

  • serverSideEncryptionConfiguration (dict) –

    Contains details about the server-side encryption for the data source.

    • kmsKeyArn (string) –

      The Amazon Resource Name (ARN) of the KMS key used to encrypt the resource.

  • vectorIngestionConfiguration (dict) –

    Contains details about how to ingest the documents in the data source.

    • chunkingConfiguration (dict) –

      Details about how to chunk the documents in the data source. A chunk refers to an excerpt from a data source that is returned when the knowledge base that it belongs to is queried.

      • chunkingStrategy (string) – [REQUIRED]

        How the knowledge base splits your source data into chunks. A chunk is an excerpt from a data source that is returned when the knowledge base that it belongs to is queried. The following options are available:

        • FIXED_SIZE – Amazon Bedrock splits your source data into chunks of the approximate size that you set in the fixedSizeChunkingConfiguration.

        • NONE – Amazon Bedrock treats each file as one chunk. If you choose this option, you may want to pre-process your documents by splitting them into separate files.

      • fixedSizeChunkingConfiguration (dict) –

        Configuration for fixed-size chunking. Omit this field if you set chunkingStrategy to NONE.

        • maxTokens (integer) – [REQUIRED]

          The maximum number of tokens to include in a chunk.

        • overlapPercentage (integer) – [REQUIRED]

          The percentage of overlap between adjacent chunks of a data source.
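The parameter rules above can be collected in a small request builder. This is a minimal sketch, not part of the SDK: the helper name `build_create_data_source_request` and the default values for `max_tokens` and `overlap_percentage` are assumptions chosen for illustration. It sends only the required fields plus the chunking configuration, and it leaves out fixedSizeChunkingConfiguration when the strategy is NONE.

```python
import uuid


def build_create_data_source_request(
    knowledge_base_id,
    name,
    bucket_arn,
    chunking_strategy="FIXED_SIZE",
    max_tokens=300,          # illustrative default, not an SDK default
    overlap_percentage=20,   # illustrative default, not an SDK default
    inclusion_prefixes=None,
    client_token=None,
):
    """Assemble keyword arguments for create_data_source (hypothetical helper)."""
    if chunking_strategy not in ("FIXED_SIZE", "NONE"):
        raise ValueError(f"unsupported chunkingStrategy: {chunking_strategy!r}")

    s3_configuration = {"bucketArn": bucket_arn}
    if inclusion_prefixes:
        s3_configuration["inclusionPrefixes"] = list(inclusion_prefixes)

    chunking = {"chunkingStrategy": chunking_strategy}
    # fixedSizeChunkingConfiguration is sent only with FIXED_SIZE; the
    # parameter description says to exclude it when the strategy is NONE.
    if chunking_strategy == "FIXED_SIZE":
        chunking["fixedSizeChunkingConfiguration"] = {
            "maxTokens": max_tokens,
            "overlapPercentage": overlap_percentage,
        }

    return {
        # A caller-supplied token makes retries of the same request idempotent.
        "clientToken": client_token or str(uuid.uuid4()),
        "knowledgeBaseId": knowledge_base_id,
        "name": name,
        "dataSourceConfiguration": {
            "type": "S3",
            "s3Configuration": s3_configuration,
        },
        "vectorIngestionConfiguration": {"chunkingConfiguration": chunking},
    }
```

With a client created through `boto3.client("bedrock-agent")`, the result can be passed as `client.create_data_source(**request)`. Because chunkingConfiguration cannot be changed after creation, it may be worth reviewing the built dictionary before sending it.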

Return type:

dict

Returns:

Response Syntax

{
    'dataSource': {
        'createdAt': datetime(2015, 1, 1),
        'dataDeletionPolicy': 'RETAIN'|'DELETE',
        'dataSourceConfiguration': {
            's3Configuration': {
                'bucketArn': 'string',
                'bucketOwnerAccountId': 'string',
                'inclusionPrefixes': [
                    'string',
                ]
            },
            'type': 'S3'
        },
        'dataSourceId': 'string',
        'description': 'string',
        'failureReasons': [
            'string',
        ],
        'knowledgeBaseId': 'string',
        'name': 'string',
        'serverSideEncryptionConfiguration': {
            'kmsKeyArn': 'string'
        },
        'status': 'AVAILABLE'|'DELETING'|'DELETE_UNSUCCESSFUL',
        'updatedAt': datetime(2015, 1, 1),
        'vectorIngestionConfiguration': {
            'chunkingConfiguration': {
                'chunkingStrategy': 'FIXED_SIZE'|'NONE',
                'fixedSizeChunkingConfiguration': {
                    'maxTokens': 123,
                    'overlapPercentage': 123
                }
            }
        }
    }
}

Response Structure

  • (dict) –

    • dataSource (dict) –

      Contains details about the data source.

      • createdAt (datetime) –

        The time at which the data source was created.

      • dataDeletionPolicy (string) –

        The data deletion policy for a data source.

      • dataSourceConfiguration (dict) –

        Contains details about how the data source is stored.

        • s3Configuration (dict) –

          Contains details about the configuration of the S3 object containing the data source.

          • bucketArn (string) –

            The Amazon Resource Name (ARN) of the bucket that contains the data source.

          • bucketOwnerAccountId (string) –

            The account ID of the owner of the S3 bucket.

          • inclusionPrefixes (list) –

            A list of S3 prefixes that identify the objects to include in the data source. For more information, see Organizing objects using prefixes.

            • (string) –

        • type (string) –

          The type of storage for the data source.

      • dataSourceId (string) –

        The unique identifier of the data source.

      • description (string) –

        The description of the data source.

      • failureReasons (list) –

        The detailed reasons why the data source could not be deleted.

        • (string) –

      • knowledgeBaseId (string) –

        The unique identifier of the knowledge base to which the data source belongs.

      • name (string) –

        The name of the data source.

      • serverSideEncryptionConfiguration (dict) –

        Contains details about the configuration of the server-side encryption.

        • kmsKeyArn (string) –

          The Amazon Resource Name (ARN) of the KMS key used to encrypt the resource.

      • status (string) –

        The status of the data source. The following statuses are possible:

        • AVAILABLE – The data source has been created and is ready for ingestion into the knowledge base.

        • DELETING – The data source is being deleted.

        • DELETE_UNSUCCESSFUL – The attempt to delete the data source failed. See failureReasons for details.

      • updatedAt (datetime) –

        The time at which the data source was last updated.

      • vectorIngestionConfiguration (dict) –

        Contains details about how to ingest the documents in the data source.

        • chunkingConfiguration (dict) –

          Details about how to chunk the documents in the data source. A chunk refers to an excerpt from a data source that is returned when the knowledge base that it belongs to is queried.

          • chunkingStrategy (string) –

            How the knowledge base splits your source data into chunks. A chunk is an excerpt from a data source that is returned when the knowledge base that it belongs to is queried. The following options are available:

            • FIXED_SIZE – Amazon Bedrock splits your source data into chunks of the approximate size that you set in the fixedSizeChunkingConfiguration.

            • NONE – Amazon Bedrock treats each file as one chunk. If you choose this option, you may want to pre-process your documents by splitting them into separate files.

          • fixedSizeChunkingConfiguration (dict) –

            Configuration for fixed-size chunking. This field is excluded when chunkingStrategy is NONE.

            • maxTokens (integer) –

              The maximum number of tokens to include in a chunk.

            • overlapPercentage (integer) –

              The percentage of overlap between adjacent chunks of a data source.
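As an illustration of reading the response structure above, the following sketch pulls out the fields a caller most often needs. The helper name `summarize_data_source` and the shape of its return value are assumptions made for this example. Optional keys are read with `.get` because the response may omit them.

```python
def summarize_data_source(response):
    """Extract the commonly used fields from a create_data_source response."""
    data_source = response["dataSource"]
    chunking = (
        data_source.get("vectorIngestionConfiguration", {})
        .get("chunkingConfiguration", {})
    )
    return {
        "id": data_source["dataSourceId"],
        "status": data_source["status"],
        # AVAILABLE means the data source is ready for ingestion.
        "ready": data_source["status"] == "AVAILABLE",
        # failureReasons describes why a delete attempt failed, if present.
        "failure_reasons": data_source.get("failureReasons", []),
        "chunking_strategy": chunking.get("chunkingStrategy"),
    }
```

A caller might store the returned `id` for later ingestion calls and check `ready` before starting one.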

Exceptions

  • AgentsforBedrock.Client.exceptions.ThrottlingException

  • AgentsforBedrock.Client.exceptions.AccessDeniedException

  • AgentsforBedrock.Client.exceptions.ValidationException

  • AgentsforBedrock.Client.exceptions.InternalServerException

  • AgentsforBedrock.Client.exceptions.ResourceNotFoundException

  • AgentsforBedrock.Client.exceptions.ConflictException

  • AgentsforBedrock.Client.exceptions.ServiceQuotaExceededException
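
The exceptions listed above are available on the client as `client.exceptions.<Name>`. The sketch below shows one possible way to handle them; the function name `create_data_source_with_retry`, the choice to treat ThrottlingException and InternalServerException as retryable, and the backoff values are assumptions for illustration. The SDK also applies its own retry behavior, so this is not required. The same clientToken is reused on every attempt so that a retried request is not applied twice.

```python
import time
import uuid


def create_data_source_with_retry(
    client, request, max_attempts=3, base_delay=1.0, sleep=time.sleep
):
    """Call create_data_source, retrying transient errors with one clientToken."""
    request = dict(request)  # copy so the caller's dict is not modified
    request.setdefault("clientToken", str(uuid.uuid4()))

    # Assumption: only these two errors are treated as transient here.
    retryable = (
        client.exceptions.ThrottlingException,
        client.exceptions.InternalServerException,
    )
    for attempt in range(1, max_attempts + 1):
        try:
            return client.create_data_source(**request)
        except retryable:
            if attempt == max_attempts:
                raise
            # Exponential backoff: base_delay, 2 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

Other errors, such as ValidationException or ConflictException, propagate to the caller unchanged, since repeating the same request is unlikely to resolve them.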