ingest_knowledge_base_documents

AgentsforBedrock.Client.ingest_knowledge_base_documents(**kwargs)

Ingests documents directly into the knowledge base that is connected to the data source. The dataSourceType specified in the content for each document must match the type of the data source that you specify in the header. For more information, see Ingest documents into a knowledge base in real-time in the Amazon Bedrock User Guide.

See also: AWS API Documentation

Request Syntax

response = client.ingest_knowledge_base_documents(
    clientToken='string',
    dataSourceId='string',
    documents=[
        {
            'content': {
                'custom': {
                    'customDocumentIdentifier': {
                        'id': 'string'
                    },
                    'inlineContent': {
                        'byteContent': {
                            'data': b'bytes',
                            'mimeType': 'string'
                        },
                        'textContent': {
                            'data': 'string'
                        },
                        'type': 'BYTE'|'TEXT'
                    },
                    's3Location': {
                        'bucketOwnerAccountId': 'string',
                        'uri': 'string'
                    },
                    'sourceType': 'IN_LINE'|'S3_LOCATION'
                },
                'dataSourceType': 'CUSTOM'|'S3',
                's3': {
                    's3Location': {
                        'uri': 'string'
                    }
                }
            },
            'metadata': {
                'inlineAttributes': [
                    {
                        'key': 'string',
                        'value': {
                            'booleanValue': True|False,
                            'numberValue': 123.0,
                            'stringListValue': [
                                'string',
                            ],
                            'stringValue': 'string',
                            'type': 'BOOLEAN'|'NUMBER'|'STRING'|'STRING_LIST'
                        }
                    },
                ],
                's3Location': {
                    'bucketOwnerAccountId': 'string',
                    'uri': 'string'
                },
                'type': 'IN_LINE_ATTRIBUTE'|'S3_LOCATION'
            }
        },
    ],
    knowledgeBaseId='string'
)
Parameters:
  • clientToken (string) –

    A unique, case-sensitive identifier to ensure that the API request completes no more than one time. If this token matches a previous request, Amazon Bedrock ignores the request, but does not return an error. For more information, see Ensuring idempotency.

    This field is autopopulated if not provided.

  • dataSourceId (string) –

    [REQUIRED]

    The unique identifier of the data source connected to the knowledge base that you’re adding documents to.

  • documents (list) –

    [REQUIRED]

    A list of objects, each of which contains information about the documents to add.

    • (dict) –

      Contains information about a document to ingest into a knowledge base and metadata to associate with it.

      • content (dict) – [REQUIRED]

        Contains the content of the document.

        • custom (dict) –

          Contains information about the content to ingest into a knowledge base connected to a custom data source.

          • customDocumentIdentifier (dict) – [REQUIRED]

            A unique identifier for the document.

            • id (string) – [REQUIRED]

              The identifier of the document to ingest into a custom data source.

          • inlineContent (dict) –

            Contains information about content defined inline to ingest into a knowledge base.

            • byteContent (dict) –

              Contains information about content defined inline in bytes.

              • data (bytes) – [REQUIRED]

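                The content as bytes. The service expects base64-encoded data; with boto3, binary parameters are normally passed as raw bytes and encoded by the SDK when the request is serialized.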

              • mimeType (string) – [REQUIRED]

                The MIME type of the content. For a list of MIME types, see Media Types. The following MIME types are supported:

                • text/plain

                • text/html

                • text/csv

                • text/vtt

                • message/rfc822

                • application/xhtml+xml

                • application/pdf

                • application/msword

                • application/vnd.ms-word.document.macroenabled.12

                • application/vnd.ms-word.template.macroenabled.12

                • application/vnd.ms-excel

                • application/vnd.ms-excel.addin.macroenabled.12

                • application/vnd.ms-excel.sheet.macroenabled.12

                • application/vnd.ms-excel.template.macroenabled.12

                • application/vnd.ms-excel.sheet.binary.macroenabled.12

                • application/vnd.ms-spreadsheetml

                • application/vnd.openxmlformats-officedocument.spreadsheetml.sheet

                • application/vnd.openxmlformats-officedocument.spreadsheetml.template

                • application/vnd.openxmlformats-officedocument.wordprocessingml.document

                • application/vnd.openxmlformats-officedocument.wordprocessingml.template

            • textContent (dict) –

              Contains information about content defined inline in text.

              • data (string) – [REQUIRED]

                The text of the content.

            • type (string) – [REQUIRED]

              The type of inline content to define.

          • s3Location (dict) –

            Contains information about the Amazon S3 location of the file from which to ingest data.

            • bucketOwnerAccountId (string) –

              The identifier of the Amazon Web Services account that owns the S3 bucket containing the content to ingest.

            • uri (string) – [REQUIRED]

              The S3 URI of the file containing the content to ingest.

          • sourceType (string) – [REQUIRED]

            The source of the data to ingest.

        • dataSourceType (string) – [REQUIRED]

          The type of data source that is connected to the knowledge base to which to ingest this document.

        • s3 (dict) –

          Contains information about the content to ingest into a knowledge base connected to an Amazon S3 data source.

          • s3Location (dict) – [REQUIRED]

            The S3 location of the file containing the content to ingest.

            • uri (string) – [REQUIRED]

              The location’s URI. For example, s3://my-bucket/chunk-processor/.

      • metadata (dict) –

        Contains the metadata to associate with the document.

        • inlineAttributes (list) –

          An array of objects, each of which defines a metadata attribute to associate with the content to ingest. You define the attributes inline.

          • (dict) –

            Contains information about a metadata attribute.

            • key (string) – [REQUIRED]

              The key of the metadata attribute.

            • value (dict) – [REQUIRED]

              Contains the value of the metadata attribute.

              • booleanValue (boolean) –

                The value of the Boolean metadata attribute.

              • numberValue (float) –

                The value of the numeric metadata attribute.

              • stringListValue (list) –

                An array of strings that define the value of the metadata attribute.

                • (string) –

              • stringValue (string) –

                The value of the string metadata attribute.

              • type (string) – [REQUIRED]

                The type of the metadata attribute.

        • s3Location (dict) –

          The Amazon S3 location of the file containing metadata to associate with the content to ingest.

          • bucketOwnerAccountId (string) –

            The identifier of the Amazon Web Services account that owns the S3 bucket containing the content to ingest.

          • uri (string) – [REQUIRED]

            The S3 URI of the file containing the content to ingest.

        • type (string) – [REQUIRED]

          The type of the source from which to add metadata.

  • knowledgeBaseId (string) –

    [REQUIRED]

    The unique identifier of the knowledge base to ingest the documents into.
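
As an illustration of how the nested parameters above fit together, the following sketch builds one documents entry for a custom data source with inline text and inline metadata attributes. The helper names (build_inline_text_document, attribute_value, build_request) are hypothetical and not part of boto3; only the dictionary keys and enum values come from the request syntax above. The mapping from Python types to attribute types is an assumption made for convenience.

```python
def attribute_value(value):
    """Map a Python value to the 'value' dict of an inline attribute (hypothetical helper)."""
    # bool must be checked before int/float because bool is a subclass of int.
    if isinstance(value, bool):
        return {"type": "BOOLEAN", "booleanValue": value}
    if isinstance(value, (int, float)):
        return {"type": "NUMBER", "numberValue": float(value)}
    if isinstance(value, str):
        return {"type": "STRING", "stringValue": value}
    if isinstance(value, (list, tuple)) and all(isinstance(v, str) for v in value):
        return {"type": "STRING_LIST", "stringListValue": list(value)}
    raise TypeError(f"Unsupported metadata value: {value!r}")


def build_inline_text_document(doc_id, text, attributes=None):
    """Build one 'documents' entry for a CUSTOM data source with inline text."""
    document = {
        "content": {
            "dataSourceType": "CUSTOM",
            "custom": {
                "customDocumentIdentifier": {"id": doc_id},
                "sourceType": "IN_LINE",
                "inlineContent": {
                    "type": "TEXT",
                    "textContent": {"data": text},
                },
            },
        }
    }
    if attributes:
        document["metadata"] = {
            "type": "IN_LINE_ATTRIBUTE",
            "inlineAttributes": [
                {"key": key, "value": attribute_value(value)}
                for key, value in attributes.items()
            ],
        }
    return document


def build_request(knowledge_base_id, data_source_id, documents, client_token=None):
    """Assemble keyword arguments for ingest_knowledge_base_documents."""
    request = {
        "knowledgeBaseId": knowledge_base_id,
        "dataSourceId": data_source_id,
        "documents": documents,
    }
    # clientToken is optional; it is autopopulated when omitted.
    if client_token is not None:
        request["clientToken"] = client_token
    return request
```

The resulting dictionary could then be passed as client.ingest_knowledge_base_documents(**request), where client is an Agents for Bedrock client created elsewhere.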

Return type:

dict

Returns:

Response Syntax

{
    'documentDetails': [
        {
            'dataSourceId': 'string',
            'identifier': {
                'custom': {
                    'id': 'string'
                },
                'dataSourceType': 'CUSTOM'|'S3',
                's3': {
                    'uri': 'string'
                }
            },
            'knowledgeBaseId': 'string',
            'status': 'INDEXED'|'PARTIALLY_INDEXED'|'PENDING'|'FAILED'|'METADATA_PARTIALLY_INDEXED'|'METADATA_UPDATE_FAILED'|'IGNORED'|'NOT_FOUND'|'STARTING'|'IN_PROGRESS'|'DELETING'|'DELETE_IN_PROGRESS',
            'statusReason': 'string',
            'updatedAt': datetime(2015, 1, 1)
        },
    ]
}

Response Structure

  • (dict) –

    • documentDetails (list) –

      A list of objects, each of which contains information about the documents that were ingested.

      • (dict) –

        Contains the details for a document that was ingested or deleted.

        • dataSourceId (string) –

          The identifier of the data source connected to the knowledge base that the document was ingested into or deleted from.

        • identifier (dict) –

          Contains information that identifies the document.

          • custom (dict) –

            Contains information that identifies the document in a custom data source.

            • id (string) –

              The identifier of the document to ingest into a custom data source.

          • dataSourceType (string) –

            The type of data source connected to the knowledge base that contains the document.

          • s3 (dict) –

            Contains information that identifies the document in an S3 data source.

            • uri (string) –

              The location’s URI. For example, s3://my-bucket/chunk-processor/.

        • knowledgeBaseId (string) –

          The identifier of the knowledge base that the document was ingested into or deleted from.

        • status (string) –

          The ingestion status of the document. The following statuses are possible:

          • STARTING – You submitted the ingestion job containing the document.

          • PENDING – The document is waiting to be ingested.

          • IN_PROGRESS – The document is being ingested.

          • INDEXED – The document was successfully indexed.

          • PARTIALLY_INDEXED – The document was partially indexed.

          • METADATA_PARTIALLY_INDEXED – You submitted metadata for an existing document and it was partially indexed.

          • METADATA_UPDATE_FAILED – You submitted a metadata update for an existing document but it failed.

          • FAILED – The document failed to be ingested.

          • NOT_FOUND – The document wasn’t found.

          • IGNORED – The document was ignored during ingestion.

          • DELETING – You submitted the delete job containing the document.

          • DELETE_IN_PROGRESS – The document is being deleted.

        • statusReason (string) –

          The reason for the status. Appears alongside the status IGNORED.

        • updatedAt (datetime) –

          The date and time at which the document was last updated.
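
Because the call returns a status per document rather than a single result, it can help to group the returned documents by status. The sketch below does this using only the response keys described above; the function names document_key and group_by_status are hypothetical, and the choice to key each document by its custom id or S3 URI is an assumption for illustration.

```python
from collections import defaultdict


def document_key(detail):
    """Return the custom id or S3 URI that identifies one documentDetails entry."""
    identifier = detail.get("identifier", {})
    if identifier.get("dataSourceType") == "CUSTOM":
        return identifier.get("custom", {}).get("id")
    return identifier.get("s3", {}).get("uri")


def group_by_status(response):
    """Group document identifiers from a response by their 'status' value."""
    groups = defaultdict(list)
    for detail in response.get("documentDetails", []):
        groups[detail.get("status")].append(document_key(detail))
    return dict(groups)
```

A caller might, for example, inspect the entries under FAILED or IGNORED and read the matching statusReason, while treating STARTING, PENDING, and IN_PROGRESS as still in flight.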

Exceptions

  • AgentsforBedrock.Client.exceptions.ThrottlingException

  • AgentsforBedrock.Client.exceptions.AccessDeniedException

  • AgentsforBedrock.Client.exceptions.ValidationException

  • AgentsforBedrock.Client.exceptions.InternalServerException

  • AgentsforBedrock.Client.exceptions.ResourceNotFoundException

  • AgentsforBedrock.Client.exceptions.ServiceQuotaExceededException
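
The exception classes listed above are exposed on the client object, so they can be caught as client.exceptions.<Name>. The following is a minimal sketch of retrying on ThrottlingException with exponential backoff; the function name ingest_with_retry, the attempt count, and the delay values are assumptions chosen for illustration, not recommended settings. All other exceptions are left to propagate.

```python
import time


def ingest_with_retry(client, request, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call ingest_knowledge_base_documents, retrying only when throttled."""
    for attempt in range(1, max_attempts + 1):
        try:
            return client.ingest_knowledge_base_documents(**request)
        except client.exceptions.ThrottlingException:
            # Give up after the last attempt and let the caller see the error.
            if attempt == max_attempts:
                raise
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
```

The sleep parameter is injectable only so the backoff can be exercised without real delays; in normal use the default time.sleep applies.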