create_data_source
- AgentsforBedrock.Client.create_data_source(**kwargs)
Sets up a data source to be added to a knowledge base.
Warning
You can’t change the chunkingConfiguration after you create the data source.
See also: AWS API Documentation
Request Syntax
response = client.create_data_source(
    clientToken='string',
    dataDeletionPolicy='RETAIN'|'DELETE',
    dataSourceConfiguration={
        's3Configuration': {
            'bucketArn': 'string',
            'bucketOwnerAccountId': 'string',
            'inclusionPrefixes': [
                'string',
            ]
        },
        'type': 'S3'
    },
    description='string',
    knowledgeBaseId='string',
    name='string',
    serverSideEncryptionConfiguration={
        'kmsKeyArn': 'string'
    },
    vectorIngestionConfiguration={
        'chunkingConfiguration': {
            'chunkingStrategy': 'FIXED_SIZE'|'NONE',
            'fixedSizeChunkingConfiguration': {
                'maxTokens': 123,
                'overlapPercentage': 123
            }
        }
    }
)
- Parameters:
clientToken (string) –
A unique, case-sensitive identifier to ensure that the API request completes no more than one time. If this token matches a previous request, Amazon Bedrock ignores the request, but does not return an error. For more information, see Ensuring idempotency.
This field is autopopulated if not provided.
dataDeletionPolicy (string) – The deletion policy for the data source, either RETAIN or DELETE.
dataSourceConfiguration (dict) –
[REQUIRED]
Contains metadata about where the data source is stored.
s3Configuration (dict) –
Contains details about the configuration of the S3 object containing the data source.
bucketArn (string) – [REQUIRED]
The Amazon Resource Name (ARN) of the bucket that contains the data source.
bucketOwnerAccountId (string) –
The account ID for the owner of the S3 bucket.
inclusionPrefixes (list) –
A list of S3 prefixes that define the object containing the data sources. For more information, see Organizing objects using prefixes.
(string) –
type (string) – [REQUIRED]
The type of storage for the data source.
description (string) – A description of the data source.
knowledgeBaseId (string) –
[REQUIRED]
The unique identifier of the knowledge base to which to add the data source.
name (string) –
[REQUIRED]
The name of the data source.
serverSideEncryptionConfiguration (dict) –
Contains details about the server-side encryption for the data source.
kmsKeyArn (string) –
The Amazon Resource Name (ARN) of the KMS key used to encrypt the resource.
vectorIngestionConfiguration (dict) –
Contains details about how to ingest the documents in the data source.
chunkingConfiguration (dict) –
Details about how to chunk the documents in the data source. A chunk refers to an excerpt from a data source that is returned when the knowledge base that it belongs to is queried.
chunkingStrategy (string) – [REQUIRED]
Knowledge base can split your source data into chunks. A chunk refers to an excerpt from a data source that is returned when the knowledge base that it belongs to is queried. You have the following options for chunking your data. If you opt for NONE, then you may want to pre-process your files by splitting them up such that each file corresponds to a chunk.
FIXED_SIZE – Amazon Bedrock splits your source data into chunks of the approximate size that you set in the fixedSizeChunkingConfiguration.
NONE – Amazon Bedrock treats each file as one chunk. If you choose this option, you may want to pre-process your documents by splitting them into separate files.
fixedSizeChunkingConfiguration (dict) –
Configurations for when you choose fixed-size chunking. If you set the chunkingStrategy as NONE, exclude this field.
maxTokens (integer) – [REQUIRED]
The maximum number of tokens to include in a chunk.
overlapPercentage (integer) – [REQUIRED]
The percentage of overlap between adjacent chunks of a data source.
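The parameter rules above can be sketched as a small helper that assembles the keyword arguments. This is a minimal illustration, not part of the API: the helper name build_create_data_source_kwargs, the knowledge base ID, the bucket ARN, and the default chunk sizes (300 tokens, 20 percent overlap) are all assumptions made for the example. It leaves out fixedSizeChunkingConfiguration when the strategy is NONE, and generates a clientToken that you would reuse if you retried the same request.

```python
import uuid


def build_create_data_source_kwargs(
    knowledge_base_id,
    name,
    bucket_arn,
    chunking_strategy="FIXED_SIZE",
    max_tokens=300,          # illustrative default, not an API default
    overlap_percentage=20,   # illustrative default, not an API default
    client_token=None,
):
    """Assemble keyword arguments for create_data_source (hypothetical helper)."""
    if chunking_strategy not in ("FIXED_SIZE", "NONE"):
        raise ValueError("chunking_strategy must be 'FIXED_SIZE' or 'NONE'")

    chunking = {"chunkingStrategy": chunking_strategy}
    # fixedSizeChunkingConfiguration is only sent for FIXED_SIZE;
    # the reference says to exclude it when the strategy is NONE.
    if chunking_strategy == "FIXED_SIZE":
        chunking["fixedSizeChunkingConfiguration"] = {
            "maxTokens": max_tokens,
            "overlapPercentage": overlap_percentage,
        }

    return {
        # Reuse the same token when retrying so the request is not applied twice.
        "clientToken": client_token or str(uuid.uuid4()),
        "knowledgeBaseId": knowledge_base_id,
        "name": name,
        "dataSourceConfiguration": {
            "type": "S3",
            "s3Configuration": {"bucketArn": bucket_arn},
        },
        "vectorIngestionConfiguration": {"chunkingConfiguration": chunking},
    }


# Placeholder identifiers; substitute your own knowledge base ID and bucket ARN.
kwargs = build_create_data_source_kwargs(
    knowledge_base_id="KB12345678",
    name="example-docs",
    bucket_arn="arn:aws:s3:::example-bucket",
)
# With a configured client: response = client.create_data_source(**kwargs)
```

Because chunkingConfiguration cannot be changed after creation, settling the chunking values in one place before the call makes them easier to review.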
- Return type:
dict
- Returns:
Response Syntax
{
    'dataSource': {
        'createdAt': datetime(2015, 1, 1),
        'dataDeletionPolicy': 'RETAIN'|'DELETE',
        'dataSourceConfiguration': {
            's3Configuration': {
                'bucketArn': 'string',
                'bucketOwnerAccountId': 'string',
                'inclusionPrefixes': [
                    'string',
                ]
            },
            'type': 'S3'
        },
        'dataSourceId': 'string',
        'description': 'string',
        'failureReasons': [
            'string',
        ],
        'knowledgeBaseId': 'string',
        'name': 'string',
        'serverSideEncryptionConfiguration': {
            'kmsKeyArn': 'string'
        },
        'status': 'AVAILABLE'|'DELETING'|'DELETE_UNSUCCESSFUL',
        'updatedAt': datetime(2015, 1, 1),
        'vectorIngestionConfiguration': {
            'chunkingConfiguration': {
                'chunkingStrategy': 'FIXED_SIZE'|'NONE',
                'fixedSizeChunkingConfiguration': {
                    'maxTokens': 123,
                    'overlapPercentage': 123
                }
            }
        }
    }
}
Response Structure
(dict) –
dataSource (dict) –
Contains details about the data source.
createdAt (datetime) –
The time at which the data source was created.
dataDeletionPolicy (string) –
The deletion policy for the data source.
dataSourceConfiguration (dict) –
Contains details about how the data source is stored.
s3Configuration (dict) –
Contains details about the configuration of the S3 object containing the data source.
bucketArn (string) –
The Amazon Resource Name (ARN) of the bucket that contains the data source.
bucketOwnerAccountId (string) –
The account ID for the owner of the S3 bucket.
inclusionPrefixes (list) –
A list of S3 prefixes that define the object containing the data sources. For more information, see Organizing objects using prefixes.
(string) –
type (string) –
The type of storage for the data source.
dataSourceId (string) –
The unique identifier of the data source.
description (string) –
The description of the data source.
failureReasons (list) –
The details of the failure reasons related to the data source.
(string) –
knowledgeBaseId (string) –
The unique identifier of the knowledge base to which the data source belongs.
name (string) –
The name of the data source.
serverSideEncryptionConfiguration (dict) –
Contains details about the configuration of the server-side encryption.
kmsKeyArn (string) –
The Amazon Resource Name (ARN) of the KMS key used to encrypt the resource.
status (string) –
The status of the data source. The following statuses are possible:
Available – The data source has been created and is ready for ingestion into the knowledge base.
Deleting – The data source is being deleted.
updatedAt (datetime) –
The time at which the data source was last updated.
vectorIngestionConfiguration (dict) –
Contains details about how to ingest the documents in the data source.
chunkingConfiguration (dict) –
Details about how to chunk the documents in the data source. A chunk refers to an excerpt from a data source that is returned when the knowledge base that it belongs to is queried.
chunkingStrategy (string) –
Knowledge base can split your source data into chunks. A chunk refers to an excerpt from a data source that is returned when the knowledge base that it belongs to is queried. You have the following options for chunking your data. If you opt for NONE, then you may want to pre-process your files by splitting them up such that each file corresponds to a chunk.
FIXED_SIZE – Amazon Bedrock splits your source data into chunks of the approximate size that you set in the fixedSizeChunkingConfiguration.
NONE – Amazon Bedrock treats each file as one chunk. If you choose this option, you may want to pre-process your documents by splitting them into separate files.
fixedSizeChunkingConfiguration (dict) –
Configurations for when you choose fixed-size chunking. If you set the chunkingStrategy as NONE, exclude this field.
maxTokens (integer) –
The maximum number of tokens to include in a chunk.
overlapPercentage (integer) –
The percentage of overlap between adjacent chunks of a data source.
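As an illustration of reading the response structure above, the following sketch pulls out the fields a caller typically checks first. The helper name summarize_data_source and the sample values are assumptions for the example. The sample dictionary is hand-written to follow the documented shape and is not captured from a real call. Fields other than dataSourceId and status are read defensively, since this page does not say which ones are always present.

```python
from datetime import datetime, timezone


def summarize_data_source(response):
    """Extract a few commonly checked fields from a create_data_source response."""
    ds = response["dataSource"]
    chunking = ds.get("vectorIngestionConfiguration", {}).get(
        "chunkingConfiguration", {}
    )
    return {
        "id": ds["dataSourceId"],
        "status": ds["status"],
        # AVAILABLE means the data source is ready for ingestion.
        "ready": ds["status"] == "AVAILABLE",
        "chunking_strategy": chunking.get("chunkingStrategy"),
        "failure_reasons": list(ds.get("failureReasons", [])),
    }


# Hand-written sample following the documented response shape (placeholder values).
sample_response = {
    "dataSource": {
        "createdAt": datetime(2015, 1, 1, tzinfo=timezone.utc),
        "dataSourceId": "DS12345678",
        "knowledgeBaseId": "KB12345678",
        "name": "example-docs",
        "status": "AVAILABLE",
        "updatedAt": datetime(2015, 1, 1, tzinfo=timezone.utc),
        "vectorIngestionConfiguration": {
            "chunkingConfiguration": {"chunkingStrategy": "NONE"}
        },
    }
}

summary = summarize_data_source(sample_response)
```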
Exceptions
AgentsforBedrock.Client.exceptions.ThrottlingException
AgentsforBedrock.Client.exceptions.AccessDeniedException
AgentsforBedrock.Client.exceptions.ValidationException
AgentsforBedrock.Client.exceptions.InternalServerException
AgentsforBedrock.Client.exceptions.ResourceNotFoundException
AgentsforBedrock.Client.exceptions.ConflictException
AgentsforBedrock.Client.exceptions.ServiceQuotaExceededException
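A minimal sketch of handling some of the exceptions listed above, assuming the caller passes in an already configured client. The wrapper name create_data_source_safely and the choice of which exceptions to catch are illustrative. The remaining exceptions (for example ThrottlingException or AccessDeniedException) are deliberately left to propagate so the caller can decide how to retry or fail.

```python
def create_data_source_safely(client, **kwargs):
    """Call create_data_source and turn selected errors into (None, message).

    Hypothetical wrapper: returns (data_source_dict, None) on success.
    """
    try:
        response = client.create_data_source(**kwargs)
    except client.exceptions.ConflictException as err:
        return None, f"conflict: {err}"
    except client.exceptions.ValidationException as err:
        return None, f"invalid request: {err}"
    except client.exceptions.ResourceNotFoundException as err:
        return None, f"resource not found: {err}"
    return response["dataSource"], None
```

Returning a pair keeps the calling code free of try/except blocks for the expected failure cases; raising a custom exception instead would be an equally reasonable design.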