Privacy & Data Protection

Public health depends on public trust. We employ privacy-preserving AI techniques that extract population insights without compromising individual privacy or data sovereignty.

Privacy-First Architecture

Our approach integrates privacy protection at every layer—from data collection through model deployment—using mathematically rigorous techniques that preserve analytical utility while protecting individual rights.

Differential Privacy

Mathematical Privacy Guarantee: Differential privacy provides a formal guarantee that the inclusion or exclusion of any single individual's data does not significantly affect aggregate model outputs or statistics.

How It Works: We add calibrated statistical noise to query results and model training processes. The noise magnitude is tuned to balance privacy protection (measured by the privacy budget epsilon, ε, where smaller values mean stronger privacy) against analytical accuracy. For public-facing dashboards, we typically use ε ≤ 1.0, ensuring strong privacy.

Example: A query for "How many people in ZIP code 12345 tested positive for influenza this week?" returns an approximate count (e.g., 47 ± 3) rather than an exact value. An adversary cannot determine whether any specific individual is included, even with auxiliary information.
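Below is a minimal sketch of how such a noisy count could be produced with the Laplace mechanism; the ε value and the count of 47 mirror the example above, while the function name and the sensitivity assumption (each person changes the count by at most 1) are illustrative, not a description of the production query engine.

```python
import numpy as np

def dp_count(true_count: int, epsilon: float, sensitivity: float = 1.0) -> float:
    """Return a differentially private count via the Laplace mechanism.

    Adding or removing one person changes the count by at most `sensitivity`,
    so noise drawn from Laplace(scale = sensitivity / epsilon) provides
    epsilon-differential privacy for this query.
    """
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise

# Illustrative values only: 47 positive influenza tests in a ZIP code this week.
released = dp_count(true_count=47, epsilon=1.0)
print(f"Released (noisy) count: {released:.0f}")
```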

Application: Used in all publicly released statistics, aggregated dashboards, and research data releases.

Federated Learning

Train Models Without Centralizing Data: Federated learning enables collaborative model development across multiple institutions (hospitals, health departments, countries) without any organization sharing raw patient data.

How It Works: Each participating institution trains a local copy of the model on their own data. Only model updates (parameter gradients) are shared with a central coordinator, which aggregates updates to improve the global model. Raw data never leaves the source institution.

Example: Five hospitals collaborate on a sepsis prediction model. Each hospital trains on its own EHR data (which remains on-site). The central server receives only encrypted parameter updates and aggregates them. The final model benefits from all five datasets without any hospital exposing PHI.
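A minimal sketch of the federated-averaging loop described above, using NumPy and a simple linear model; the three simulated sites, the unweighted averaging, and the absence of encryption and secure aggregation are simplifying assumptions for illustration, not a description of the production system.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_update(weights, X, y, lr=0.1, epochs=5):
    """One site's local training: a few gradient steps on its own data.
    Only the updated weights leave the site, never X or y."""
    w = weights.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)   # squared-error gradient
        w -= lr * grad
    return w

# Simulated local datasets for three sites (stand-ins for on-site EHR features).
sites = [(rng.normal(size=(100, 3)), rng.normal(size=100)) for _ in range(3)]

global_w = np.zeros(3)
for _ in range(10):
    # Each site trains locally; the coordinator only ever sees weight vectors.
    local_weights = [local_update(global_w, X, y) for X, y in sites]
    # Federated averaging: aggregate updates (here, a simple unweighted mean).
    global_w = np.mean(local_weights, axis=0)

print("Global model weights after 10 rounds:", global_w)
```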

Additional Protections: We apply secure aggregation (encrypted updates), differential privacy on gradients (to prevent membership inference), and federated dropout (preventing single-institution dominance).

Application: Multi-site clinical validation studies, cross-border surveillance systems, health system consortia.

Data Minimization

Collect Only What's Necessary: We adhere to the principle of data minimization—collecting, processing, and retaining only the minimum data required for specified public health purposes.

Practical Implementation (a code sketch follows this list):

  • Aggregation-First: Whenever possible, we work with pre-aggregated data (e.g., daily counts by ZIP code) rather than individual-level records.
  • Feature Engineering: Derive population-level features (e.g., "% unvaccinated") without storing individual identifiers.
  • On-the-Fly Computation: Generate insights via streaming analytics without persistent storage when feasible.
  • Automatic Deletion: Time-limited retention with automated purging of raw data after model training/validation completes.
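
A minimal sketch of the aggregation-first and on-the-fly computation ideas above, assuming a simple record stream; the field names and the small-cell threshold of 11 are illustrative.

```python
from collections import Counter
from typing import Iterable, Mapping

def aggregate_counts(records: Iterable[Mapping], min_cell: int = 11) -> dict:
    """Stream individual-level records, keeping only aggregate counts per ZIP.

    Records are consumed one at a time and never stored; cells smaller than
    `min_cell` are suppressed before release (a common small-cell rule).
    """
    counts: Counter = Counter()
    for rec in records:                      # streaming: no persistent storage
        if rec.get("test_result") == "positive":
            counts[rec["zip_code"]] += 1
    return {z: n for z, n in counts.items() if n >= min_cell}

# Illustrative stream (in practice, a message queue or database cursor).
stream = [{"zip_code": "12345", "test_result": "positive"}] * 15 + \
         [{"zip_code": "67890", "test_result": "positive"}] * 3
print(aggregate_counts(stream))  # {'12345': 15}; the 3-record cell is suppressed
```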

Application: All APHI systems default to minimal data collection. Expanded data access requires explicit justification and ethics board approval.

Purpose Limitation

Use Data Only for Stated Public Health Purposes: Data and models are used exclusively for the public health objectives for which they were collected or developed. No commercial use, no mission creep, no secondary exploitation.

Governance: All data use agreements specify permitted purposes. Any new use case requires renewed consent or legal authority, ethics review, and transparency reporting.

Examples of Prohibited Uses:

  • Sale or licensing of data to commercial entities
  • Use for immigration enforcement, law enforcement profiling, or employment screening
  • Marketing or advertising targeting
  • Insurance underwriting or eligibility determination
  • Re-identification attempts or linking with non-health datasets without IRB approval

Data Lifecycle & Protection

Privacy-Preserving Data Flow

1. Data Ingestion

Minimal Collection: Accept only aggregated or de-identified data streams where possible. For individual-level data (e.g., EHR feeds), enforce strict access controls and legal agreements (BAAs, DUAs).

Encryption in Transit: TLS 1.3 for all data transmission. Mutual TLS authentication for high-sensitivity sources.
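A minimal sketch of a mutually authenticated ingestion call using the requests library; the endpoint URL, payload, and certificate paths are placeholders, and the negotiated TLS version depends on the underlying TLS stack.

```python
import requests

# Placeholder endpoint and certificate paths; real values come from the
# data use agreement and the source institution's PKI.
INGEST_URL = "https://ingest.example.org/v1/aggregates"

payload = {"zip_code": "12345", "week": "2024-W02", "positive_count": 47}

resp = requests.post(
    INGEST_URL,
    json=payload,
    cert=("client.crt", "client.key"),  # client certificate for mutual TLS
    verify="ca-bundle.pem",             # trust only the pinned CA bundle
    timeout=10,
)
resp.raise_for_status()
```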

2. Storage & Access Control

Encryption at Rest: AES-256 encryption for all stored data. Encryption keys managed via hardware security modules (HSMs) with key rotation every 90 days.

Role-Based Access: Principle of least privilege. Data scientists access only de-identified datasets. Identifiable data restricted to authorized public health officials with legitimate need.

Audit Logging: Immutable logs of all data access, with automated anomaly detection for unusual query patterns.
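A minimal sketch of AES-256-GCM encryption for a stored record using the cryptography package; in production the key would be generated and held by an HSM/KMS rather than in process, and the record contents are placeholders.

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# Illustration only: a 256-bit key generated in process. In production the
# key lives in an HSM/KMS and is rotated on schedule.
key = AESGCM.generate_key(bit_length=256)
aesgcm = AESGCM(key)

record = b'{"patient_id": "de-identified-001", "result": "positive"}'
nonce = os.urandom(12)                      # unique 96-bit nonce per encryption
ciphertext = aesgcm.encrypt(nonce, record, b"flu-surveillance")

# Decryption requires the same key, nonce, and associated data.
plaintext = aesgcm.decrypt(nonce, ciphertext, b"flu-surveillance")
assert plaintext == record
```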

3. Model Training

Privacy-Preserving Techniques: Differential privacy applied to training process (DP-SGD). Gradient clipping and noise injection prevent memorization of individual records.
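A minimal NumPy sketch of the two DP-SGD ingredients named above, per-example gradient clipping and noise injection; the clip norm and noise multiplier are illustrative, and production training would typically rely on a dedicated library such as Opacus or TensorFlow Privacy.

```python
import numpy as np

def dp_sgd_step(weights, per_example_grads, lr=0.1, clip_norm=1.0, noise_mult=1.1):
    """One DP-SGD update: clip each example's gradient, average, add noise.

    Clipping bounds any single record's influence on the update; Gaussian
    noise scaled to the clip norm masks whatever influence remains.
    """
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))  # per-example clip
    mean_grad = np.mean(clipped, axis=0)
    noise = np.random.normal(0.0, noise_mult * clip_norm / len(clipped),
                             size=mean_grad.shape)
    return weights - lr * (mean_grad + noise)

# Illustrative gradients for a batch of 4 records and a 3-parameter model.
grads = [np.random.normal(size=3) for _ in range(4)]
w = dp_sgd_step(np.zeros(3), grads)
print("Updated weights:", w)
```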

Federated Approaches: When training across institutions, use federated learning to keep data decentralized.

Synthetic Data Augmentation: Supplement real data with synthetic examples (GANs, SMOTE) to reduce dependency on sensitive records.
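A minimal sketch of SMOTE-style augmentation using the imbalanced-learn package; the feature matrix and class labels are synthetic placeholders.

```python
import numpy as np
from imblearn.over_sampling import SMOTE

rng = np.random.default_rng(0)

# Placeholder data: 200 records, 5 features, with a rare positive class.
X = rng.normal(size=(200, 5))
y = np.array([1] * 20 + [0] * 180)

# SMOTE interpolates new minority-class examples between real neighbours,
# so downstream models lean less heavily on the few real sensitive records.
X_aug, y_aug = SMOTE(random_state=0).fit_resample(X, y)
print(X_aug.shape, int(y_aug.sum()))  # minority class oversampled to parity
```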

4. Model Deployment & Inference

No Individual Targeting: Models generate population-level insights or risk scores for public health action—never individual diagnoses or punitive measures.

Output Privacy: Apply differential privacy to model predictions when querying small subgroups to prevent re-identification.
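A minimal sketch of output-side protection for small subgroups: a subgroup rate is released only above a minimum group size, with Laplace noise on the numerator; the threshold and ε value are illustrative.

```python
from typing import Optional

import numpy as np

def release_subgroup_rate(positive: int, total: int, epsilon: float = 1.0,
                          min_group_size: int = 20) -> Optional[float]:
    """Release a noisy subgroup rate, or nothing if the group is too small.

    Suppression plus Laplace noise on the numerator keeps any one person's
    contribution from being inferable from the published rate.
    """
    if total < min_group_size:
        return None                              # suppress small cells outright
    noisy_positive = positive + np.random.laplace(0.0, 1.0 / epsilon)
    return max(0.0, min(1.0, noisy_positive / total))

print(release_subgroup_rate(positive=12, total=150))  # released with noise
print(release_subgroup_rate(positive=3, total=8))     # None: group too small
```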

Explainability Without Exposure: Provide aggregate feature importance and model explanations without revealing individual contributions.

5. Data Retention & Deletion

Time-Limited Retention: Raw individual-level data is retained only for the duration necessary for model development and validation (typically 12-24 months post-deployment).

Secure Deletion: Cryptographic erasure (destroying encryption keys) followed by multi-pass overwriting for physical media. Certificates of destruction are provided.
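A minimal sketch of the cryptographic-erasure idea, assuming each dataset is encrypted under its own key; the in-memory dictionary stands in for an HSM/KMS, and the dataset ID and record contents are placeholders.

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key_store = {}  # stand-in for an HSM/KMS, keyed by dataset ID

def encrypt_dataset(dataset_id, data):
    """Encrypt a dataset under its own 256-bit key."""
    key_store[dataset_id] = AESGCM.generate_key(bit_length=256)
    nonce = os.urandom(12)
    return nonce, AESGCM(key_store[dataset_id]).encrypt(nonce, data, None)

def crypto_erase(dataset_id):
    """Destroy the dataset's key; its ciphertext can no longer be decrypted."""
    del key_store[dataset_id]

nonce, blob = encrypt_dataset("flu-2024-raw", b"individual-level records ...")
crypto_erase("flu-2024-raw")
# Any later decryption attempt fails: the key required by AESGCM no longer exists.
```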

Aggregates Retained: De-identified, aggregated statistics may be retained indefinitely for long-term epidemiological research (with differential privacy).

Compliance Framework

When deploying AI systems in operational public health settings, APHI's architecture is designed to align with applicable regulatory frameworks including HIPAA (for protected health information), GDPR (for EU data subjects), and CCPA (for California residents). Our privacy-preserving techniques are built to exceed minimum compliance standards, ensuring data protection and individual rights are safeguarded throughout the AI lifecycle.

Health Data Protection Standards

Our framework supports compliance with HIPAA requirements for protected health information, implementing administrative, physical, and technical safeguards. Privacy-preserving techniques like differential privacy and federated learning enable analytics while minimizing data exposure risks.

Global Privacy Regulations

Systems are designed to align with GDPR principles (privacy by design, data minimization, subject rights) and CCPA requirements (transparency, opt-out rights, no sale of personal information). We implement privacy impact assessments for high-risk processing activities.

Data Interoperability Standards

We utilize HL7 FHIR standards and USCDI (United States Core Data for Interoperability) data elements to support interoperability mandates. APIs are designed to meet ONC security and privacy requirements for health information exchange.

Ethical Review & Oversight

All research and deployment activities are subject to ethical review processes. We follow CDC and WHO guidance on responsible AI development, emphasizing transparency, accountability, and continuous monitoring for fairness and safety.

Individual Rights & Transparency

Right to Information

Individuals have the right to know how their health data is used for public health AI. We provide plain-language privacy notices, model cards, and public documentation of our systems.

Opt-Out Mechanisms

Where legally permitted and technically feasible, individuals may opt out of inclusion in AI model training. For population surveillance (where opt-out may compromise public health), we apply maximum privacy protections.

Data Access Requests

Individuals can request information about data collected about them (subject to legal and technical constraints of de-identification). Requests are processed within 30 days, in accordance with applicable regulations.

Breach Notification

In the unlikely event of a data breach, we provide notification to affected individuals and regulators within 72 hours, as required by GDPR and other applicable laws. Mitigation steps are detailed on our Security page.

Privacy Questions or Concerns?

We welcome inquiries from individuals, institutions, and regulators about our privacy practices. Our Data Protection Officer is available to discuss compliance, data use agreements, or privacy impact assessments.

Contact Our Privacy Team