By the end of this lesson, you will understand the risks associated with inadvertently feeding Personally Identifiable Information (PII) into AI systems, the potential consequences, and how to implement safeguards for protecting user privacy.
Personally Identifiable Information (PII) refers to any information that can be used to identify an individual, such as names, addresses, phone numbers, social security numbers, email addresses, or even IP addresses. In AI systems, PII may inadvertently slip into the training or operational data, potentially causing significant privacy concerns and compliance issues.
This lesson explores how PII can enter AI systems and the steps necessary to ensure that this information is handled securely.
Personally Identifiable Information (PII) in text form encompasses a wide range of sensitive data that can directly identify individuals. This includes information such as names, addresses, phone numbers, social security numbers, email addresses, financial account details, and more. Textual PII is commonly found in documents, emails, forms, messages, and databases, and its protection is paramount to safeguarding individuals’ privacy and preventing identity theft or fraud.
Common examples of textual PII include:
- Full names
- Home or mailing addresses
- Phone numbers
- Social Security numbers
- Email addresses
- Financial account and credit card details
- IP addresses
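As a rough illustration, the sketch below uses simple regular expressions to flag a few textual PII types. The patterns and sample text are assumptions for demonstration only; production-grade detectors typically combine regexes with named-entity recognition and validation logic.

```python
import re

# Illustrative patterns only -- real PII detection usually combines
# regexes with NER models and validation rules.
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "phone": re.compile(r"\b(?:\+?1[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}

def find_pii(text: str) -> dict[str, list[str]]:
    """Return every match for each PII pattern found in the text."""
    return {label: pattern.findall(text)
            for label, pattern in PII_PATTERNS.items()
            if pattern.search(text)}

if __name__ == "__main__":
    sample = "Contact Jane at jane.doe@example.com or 555-123-4567. SSN: 123-45-6789."
    print(find_pii(sample))
```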
Beyond text, PII can also exist in non-text media. Non-textual PII is sensitive data that can identify individuals through visual, biometric, or contextual elements in multimedia formats such as images, audio, and video. These identifiers include facial features, biometric data, body characteristics, unique physical traits like tattoos and scars, vehicle license plates, location details, and audiovisual cues. Non-textual PII poses unique challenges for privacy protection, requiring sophisticated tools and techniques to detect and anonymize personal information embedded in visual content.
Common examples of non-textual PII include:
- Facial features in photos and videos
- Biometric data
- Body characteristics and unique physical traits such as tattoos or scars
- Vehicle license plates
- Location details visible in imagery (e.g., street signs or landmarks)
- Audiovisual cues such as a person's voice in recordings
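For visual media, one common safeguard is to detect and blur faces before images are stored or used for training. The sketch below assumes the opencv-python package and an illustrative input file named photo.jpg; a real pipeline would use a more robust detector and also handle other identifiers such as license plates.

```python
import cv2

# Assumes opencv-python is installed; "photo.jpg" is a placeholder path.
face_detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

image = cv2.imread("photo.jpg")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Detect faces and blur each region so individuals are no longer identifiable.
for (x, y, w, h) in face_detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5):
    face_region = image[y:y + h, x:x + w]
    image[y:y + h, x:x + w] = cv2.GaussianBlur(face_region, (51, 51), 0)

cv2.imwrite("photo_anonymized.jpg", image)
```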
AI systems often rely on vast datasets that can contain personal data. This data may be collected from a variety of sources, such as customer interactions, web scraping, or third-party databases. Here’s how PII might “slip” into an AI system:
Training Data: AI models are often trained on large datasets, and if PII is present in these datasets, the model could learn and potentially memorize this information. For example, training on unfiltered text or customer interaction logs may result in the model associating certain phrases with individuals.
Data Collection: During interactions with users or systems, AI models may gather data that could include PII (e.g., names, email addresses, or locations). If not carefully managed, this data can be fed directly into the system.
Inadequate Anonymization or Pseudonymization: In some cases, data might be intended to be anonymized, but if the anonymization process is weak or flawed, PII may still remain identifiable (see the scrubbing and pseudonymization sketch after this list).
Data Sharing Across Systems: When data is shared between multiple AI models or third-party systems without sufficient safeguards, it increases the risk of exposing PII.
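As referenced above, here is a minimal sketch of scrubbing direct identifiers and pseudonymizing user IDs before records enter a training pipeline. The sample record, regex patterns, and keyed-hash secret are illustrative assumptions, not a complete anonymization solution.

```python
import hashlib
import hmac
import re

# Illustrative secret -- in practice this would come from a secrets manager.
PSEUDONYM_KEY = b"replace-with-a-managed-secret"

EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def redact(text: str) -> str:
    """Replace direct identifiers with placeholder tokens before training."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return SSN_RE.sub("[SSN]", text)

def pseudonymize(user_id: str) -> str:
    """Map a user ID to a stable pseudonym via keyed hashing (HMAC-SHA256)."""
    return hmac.new(PSEUDONYM_KEY, user_id.encode(), hashlib.sha256).hexdigest()[:16]

record = {"user_id": "alice@example.com",
          "message": "My SSN is 123-45-6789, email me at alice@example.com."}

clean_record = {"user_id": pseudonymize(record["user_id"]),
                "message": redact(record["message"])}
print(clean_record)
```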
When PII accidentally gets into an AI system, it can have serious legal, ethical, and operational consequences. Let’s explore the potential risks:
The most significant risk is the breach of user privacy. If an AI system inadvertently retains or reveals PII, it could expose individuals to harm, such as identity theft or unwanted surveillance.
Data protection regulations such as the EU's General Data Protection Regulation (GDPR), the California Consumer Privacy Act (CCPA), and others require organizations to protect PII and handle it with care. Breaching these regulations can lead to heavy fines, legal liability, and loss of customer trust.
A data breach or misuse of PII in AI systems can lead to severe reputational damage. Once trust is lost, regaining it can be a long and costly process.
Consider the case of a hospital AI system used for administrative and assistance tasks. If the AI model was trained or operated on a dataset that included PII collected without proper consent (such as individuals' faces, names, addresses, SSNs, or card numbers), the system could potentially identify and track people across different locations, violating their privacy rights. Beyond that, retaining this information without consent, or using it in any way to produce output, can be treated as a violation of HIPAA and expose the hospital to legal liability.
The facial recognition model was trained and used on images that were neither anonymized nor collected with consent.
The data governance protocols failed to properly manage and anonymize the sensitive information before it was used for AI training or inference.
Privacy violations occurred as individuals were unknowingly tracked.
The hospital faced fines under GDPR for mishandling data.
The public lost trust in the hospital's technology.
Revised Data Collection Processes: The hospital implemented stricter data collection standards and consent protocols.
Enhanced Privacy Measures: It incorporated differential privacy and anonymization techniques for future datasets (a minimal differential-privacy sketch follows this list).
Transparency with Users: The hospital improved its communication about how data would be used, allowing users to control what data was shared.
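As a rough sketch of the differential-privacy idea mentioned above, the example below applies the Laplace mechanism to a single count query. The epsilon value and the count are illustrative assumptions, and a real deployment would track a privacy budget across all released statistics.

```python
import numpy as np

def noisy_count(true_count: int, epsilon: float, sensitivity: float = 1.0) -> float:
    """Add Laplace noise scaled to sensitivity/epsilon (basic DP mechanism)."""
    return true_count + np.random.laplace(loc=0.0, scale=sensitivity / epsilon)

# Example: release how many records matched a query without exposing the exact count.
true_count = 42
print(noisy_count(true_count, epsilon=0.5))
```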
AI systems are powerful tools, but they come with the responsibility of ensuring that they respect privacy and data security. In the case of PII, careful attention must be paid to how data is collected, processed, and used in AI models. Implementing strong data governance practices can mitigate the risk of PII slipping into AI’s “mind” and ensure that these technologies are used ethically and in compliance with privacy laws.