describe_pii_entities_detection_job

Comprehend.Client.describe_pii_entities_detection_job(**kwargs)

Gets the properties associated with a PII entities detection job. For example, you can use this operation to get the job status.

See also: AWS API Documentation

Request Syntax

response = client.describe_pii_entities_detection_job(
    JobId='string'
)
Parameters
JobId (string) --

[REQUIRED]

The identifier that Amazon Comprehend generated for the job. The StartPiiEntitiesDetectionJob operation returns this identifier in its response.
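
As a minimal sketch of the call described above, the helper below looks up a job by its ID and returns the status together with any message. The function name get_job_status is hypothetical, and the client is assumed to be a Comprehend client created elsewhere (for example with boto3.client("comprehend")). Reading Message with .get() assumes the field may be absent when there is nothing to report.

```python
def get_job_status(client, job_id):
    """Return (JobStatus, Message) for a PII entities detection job.

    `client` is assumed to be a Comprehend client, e.g.
    boto3.client("comprehend"); it is passed in so the helper stays testable.
    """
    response = client.describe_pii_entities_detection_job(JobId=job_id)
    props = response["PiiEntitiesDetectionJobProperties"]
    # Assumption: Message can be missing, so read it defensively.
    return props.get("JobStatus"), props.get("Message")
```

A caller would typically branch on the returned status and log the message when the status is FAILED.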

Return type
dict
Returns
Response Syntax
{
    'PiiEntitiesDetectionJobProperties': {
        'JobId': 'string',
        'JobArn': 'string',
        'JobName': 'string',
        'JobStatus': 'SUBMITTED'|'IN_PROGRESS'|'COMPLETED'|'FAILED'|'STOP_REQUESTED'|'STOPPED',
        'Message': 'string',
        'SubmitTime': datetime(2015, 1, 1),
        'EndTime': datetime(2015, 1, 1),
        'InputDataConfig': {
            'S3Uri': 'string',
            'InputFormat': 'ONE_DOC_PER_FILE'|'ONE_DOC_PER_LINE',
            'DocumentReaderConfig': {
                'DocumentReadAction': 'TEXTRACT_DETECT_DOCUMENT_TEXT'|'TEXTRACT_ANALYZE_DOCUMENT',
                'DocumentReadMode': 'SERVICE_DEFAULT'|'FORCE_DOCUMENT_READ_ACTION',
                'FeatureTypes': [
                    'TABLES'|'FORMS',
                ]
            }
        },
        'OutputDataConfig': {
            'S3Uri': 'string',
            'KmsKeyId': 'string'
        },
        'RedactionConfig': {
            'PiiEntityTypes': [
                'BANK_ACCOUNT_NUMBER'|'BANK_ROUTING'|'CREDIT_DEBIT_NUMBER'|'CREDIT_DEBIT_CVV'|'CREDIT_DEBIT_EXPIRY'|'PIN'|'EMAIL'|'ADDRESS'|'NAME'|'PHONE'|'SSN'|'DATE_TIME'|'PASSPORT_NUMBER'|'DRIVER_ID'|'URL'|'AGE'|'USERNAME'|'PASSWORD'|'AWS_ACCESS_KEY'|'AWS_SECRET_KEY'|'IP_ADDRESS'|'MAC_ADDRESS'|'ALL'|'LICENSE_PLATE'|'VEHICLE_IDENTIFICATION_NUMBER'|'UK_NATIONAL_INSURANCE_NUMBER'|'CA_SOCIAL_INSURANCE_NUMBER'|'US_INDIVIDUAL_TAX_IDENTIFICATION_NUMBER'|'UK_UNIQUE_TAXPAYER_REFERENCE_NUMBER'|'IN_PERMANENT_ACCOUNT_NUMBER'|'IN_NREGA'|'INTERNATIONAL_BANK_ACCOUNT_NUMBER'|'SWIFT_CODE'|'UK_NATIONAL_HEALTH_SERVICE_NUMBER'|'CA_HEALTH_NUMBER'|'IN_AADHAAR'|'IN_VOTER_NUMBER',
            ],
            'MaskMode': 'MASK'|'REPLACE_WITH_PII_ENTITY_TYPE',
            'MaskCharacter': 'string'
        },
        'LanguageCode': 'en'|'es'|'fr'|'de'|'it'|'pt'|'ar'|'hi'|'ja'|'ko'|'zh'|'zh-TW',
        'DataAccessRoleArn': 'string',
        'Mode': 'ONLY_REDACTION'|'ONLY_OFFSETS'
    }
}
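
To illustrate how the fields in the response syntax above might be consumed, the sketch below condenses a PiiEntitiesDetectionJobProperties dict into a small summary. The function name summarize_job and the summary keys are hypothetical. It assumes any field may be missing (for example, EndTime while the job is still running), so every lookup is defensive.

```python
def summarize_job(props):
    """Condense a PiiEntitiesDetectionJobProperties dict into a short summary."""
    submit = props.get("SubmitTime")
    end = props.get("EndTime")
    # SubmitTime and EndTime are datetime objects; their difference is a timedelta.
    duration = (end - submit).total_seconds() if submit and end else None
    return {
        "job_id": props.get("JobId"),
        "status": props.get("JobStatus"),
        "mode": props.get("Mode"),
        "duration_seconds": duration,
        "output_s3_uri": props.get("OutputDataConfig", {}).get("S3Uri"),
    }
```

The input would be response['PiiEntitiesDetectionJobProperties'] from the call shown in the request syntax.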

Response Structure

  • (dict) --
    • PiiEntitiesDetectionJobProperties (dict) --

      Provides information about a PII entities detection job.

      • JobId (string) --

        The identifier assigned to the PII entities detection job.

      • JobArn (string) --

        The Amazon Resource Name (ARN) of the PII entities detection job. It is a unique, fully qualified identifier for the job. It includes the AWS account, Region, and the job ID. The format of the ARN is as follows:

        arn:<partition>:comprehend:<region>:<account-id>:pii-entities-detection-job/<job-id>

        The following is an example job ARN:

        arn:aws:comprehend:us-west-2:111122223333:pii-entities-detection-job/1234abcd12ab34cd56ef1234567890ab
      • JobName (string) --

        The name that you assigned the PII entities detection job.

      • JobStatus (string) --

        The current status of the PII entities detection job. If the status is FAILED, the Message field shows the reason for the failure.

      • Message (string) --

        A description of the status of a job.

      • SubmitTime (datetime) --

        The time that the PII entities detection job was submitted for processing.

      • EndTime (datetime) --

        The time that the PII entities detection job completed.

      • InputDataConfig (dict) --

        The input properties for a PII entities detection job.

        • S3Uri (string) --

          The Amazon S3 URI for the input data. The URI must be in the same Region as the API endpoint that you are calling. The URI can point to a single input file or it can provide the prefix for a collection of data files.

          For example, if you use the URI S3://bucketName/prefix and the prefix is a single file, Amazon Comprehend uses that file as input. If more than one file begins with the prefix, Amazon Comprehend uses all of them as input.

        • InputFormat (string) --

          Specifies how the text in an input file should be processed:

          • ONE_DOC_PER_FILE - Each file is considered a separate document. Use this option when you are processing large documents, such as newspaper articles or scientific papers.
          • ONE_DOC_PER_LINE - Each line in a file is considered a separate document. Use this option when you are processing many short documents, such as text messages.
        • DocumentReaderConfig (dict) --

          Provides configuration parameters to override the default actions for extracting text from PDF documents and image files.

          • DocumentReadAction (string) --

            This field defines the Amazon Textract API operation that Amazon Comprehend uses to extract text from PDF files and image files. Enter one of the following values:

            • TEXTRACT_DETECT_DOCUMENT_TEXT - The Amazon Comprehend service uses the DetectDocumentText API operation.
            • TEXTRACT_ANALYZE_DOCUMENT - The Amazon Comprehend service uses the AnalyzeDocument API operation.
          • DocumentReadMode (string) --

            Determines the text extraction actions for PDF files. Enter one of the following values:

            • SERVICE_DEFAULT - Amazon Comprehend uses the service defaults for PDF files.
            • FORCE_DOCUMENT_READ_ACTION - Amazon Comprehend uses the Textract API specified by DocumentReadAction for all PDF files, including digital PDF files.
          • FeatureTypes (list) --

            Specifies the type of Amazon Textract features to apply. If you chose TEXTRACT_ANALYZE_DOCUMENT as the read action, you must specify one or both of the following values:

            • TABLES - Returns information about any tables that are detected in the input document.
            • FORMS - Returns information and the data from any forms that are detected in the input document.
            • (string) --

              Specifies the type of Amazon Textract features to apply. If you chose TEXTRACT_ANALYZE_DOCUMENT as the read action, you must specify one or both of the following values:

              • TABLES - Returns additional information about any tables that are detected in the input document.
              • FORMS - Returns additional information about any forms that are detected in the input document.
      • OutputDataConfig (dict) --

        The output data configuration that you supplied when you created the PII entities detection job.

        • S3Uri (string) --

          When you use the PiiOutputDataConfig object with asynchronous operations, you specify the Amazon S3 location where you want to write the output data.

          For a PII entity detection job, the output file is plain text, not a compressed archive. The output file name is the same as the input file, with .out appended at the end.

        • KmsKeyId (string) --

          ID for the AWS Key Management Service (KMS) key that Amazon Comprehend uses to encrypt the output results from an analysis job.

      • RedactionConfig (dict) --

        Provides configuration parameters for PII entity redaction.

        This parameter is required if you set the Mode parameter to ONLY_REDACTION. In that case, you must provide a RedactionConfig definition that includes the PiiEntityTypes parameter.

        • PiiEntityTypes (list) --

          An array of the types of PII entities that Amazon Comprehend detects in the input text for your request.

          • (string) --
        • MaskMode (string) --

          Specifies whether the PII entity is redacted with the mask character or the entity type.

        • MaskCharacter (string) --

          A character that replaces each character in the redacted PII entity.

      • LanguageCode (string) --

        The language code of the input documents.

      • DataAccessRoleArn (string) --

        The Amazon Resource Name (ARN) of the AWS Identity and Access Management (IAM) role that grants Amazon Comprehend read access to your input data.

      • Mode (string) --

        Specifies whether the output provides the locations (offsets) of PII entities or a file in which PII entities are redacted.
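
Because JobStatus changes over the life of a job, a common pattern is to poll this operation until the job stops changing state. The sketch below is one way to do that, not a built-in waiter. The name wait_for_job, the poll interval, and the attempt limit are hypothetical, and treating COMPLETED, FAILED, and STOPPED as the terminal statuses is an assumption based on the status names listed above.

```python
import time

# Assumption: these are the statuses after which the job no longer changes.
TERMINAL_STATUSES = {"COMPLETED", "FAILED", "STOPPED"}


def wait_for_job(client, job_id, poll_seconds=30, max_attempts=120, sleep=time.sleep):
    """Poll the job until it reaches a terminal status, then return its properties.

    Raises TimeoutError if the job is still running after max_attempts polls.
    `sleep` is injectable so the loop can be exercised without real delays.
    """
    for attempt in range(max_attempts):
        response = client.describe_pii_entities_detection_job(JobId=job_id)
        props = response["PiiEntitiesDetectionJobProperties"]
        if props.get("JobStatus") in TERMINAL_STATUSES:
            return props
        # Skip the pause after the final attempt; there is nothing left to wait for.
        if attempt < max_attempts - 1:
            sleep(poll_seconds)
    raise TimeoutError(f"Job {job_id} did not finish after {max_attempts} polls")
```

When the returned status is FAILED, the Message field of the returned properties carries the reason.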

Exceptions

  • Comprehend.Client.exceptions.InvalidRequestException
  • Comprehend.Client.exceptions.JobNotFoundException
  • Comprehend.Client.exceptions.TooManyRequestsException
  • Comprehend.Client.exceptions.InternalServerException
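
The exceptions listed above can be caught through the client's exceptions attribute. The sketch below shows one possible handling policy, assuming that a missing job should map to None and that throttling should be retried with exponential backoff. The name describe_with_retry and the retry parameters are hypothetical. botocore also has its own configurable retry behavior, so the manual backoff here may be redundant in practice.

```python
import time


def describe_with_retry(client, job_id, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Describe a job, returning None if it does not exist.

    Retries TooManyRequestsException up to max_retries times with
    exponential backoff; other exceptions propagate to the caller.
    """
    for attempt in range(max_retries + 1):
        try:
            response = client.describe_pii_entities_detection_job(JobId=job_id)
            return response["PiiEntitiesDetectionJobProperties"]
        except client.exceptions.JobNotFoundException:
            # Assumed policy: an unknown job ID is reported as None, not an error.
            return None
        except client.exceptions.TooManyRequestsException:
            if attempt == max_retries:
                raise
            # Delays grow as base_delay * 1, 2, 4, ...
            sleep(base_delay * 2 ** attempt)
```

InvalidRequestException and InternalServerException are deliberately left unhandled in this sketch so that callers see them.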