Every day there is a new report of a major personal data breach. The core issue is that user data is commonly spread across multiple systems that are increasingly difficult to fully secure, such as database user tables, data warehouses, and unstructured documents. In addition, cross-border transfers of user data are increasingly burdensome, and in several countries they are infeasible for certain types of data, such as health and financial services data.
Most enterprises are already running an incredibly secure system to manage core customer PII, commonly referred to as a Customer Identity and Access Management (CIAM) system. Since CIAMs manage user authentication, they are typically the most hardened user data systems that enterprises run, and they typically offer compliance with the ISO 27000 family, SOC 2, HIPAA, PCI DSS, and other critical standards.
Identity systems were already designed to store numerous PII fields and mask the fields for other systems. Over twenty years ago, I was the CTO of the Liberty Project, which developed the protocols that became SAML 2.0, the architecture at the core of CIAM systems. When we designed SAML 2.0, we built it so that identity data would be fully secure, and opaque tokens would be shared with other systems. Using tokens instead of actual user data is a core feature of identity software that can be used to fully secure user data across your applications.
Enterprises also commonly run an enterprise API gateway. When the gateway governs access to the CIAM, the combination provides a functional and secure way to protect all user PII fields and to centralize the masking and distribution of PII to other systems.
Storing additional PII fields in an identity system
CIAMs are typically the source of truth for user names, email addresses, and phone numbers, the latter two of which are commonly used for multi-factor authentication. Most CIAMs allow additional PII fields to be specified, so it is easy to add new fields like Social Security Numbers and physical addresses. Some CIAMs even support JSON fields for a fully flexible schema and BLOBs for file attachments.
Asgardeo CIAM user attribute editor
How to isolate PII from your environments
A great feature of CIAMs is that they often provide a full suite of user interface modules for users to directly enter and manage profile fields. Many CIAMs also offer extensive user interface workflow builders to design how users navigate the registration flow and profile editing flow.
Okta Auth0’s user interface components
The benefit of using your CIAM’s user interface component to manage additional PII fields is that the CIAM vendor assumes all responsibility for managing PII and your application does not transit, store, or manage PII. Using these features can also limit your application’s HIPAA and PCI DSS compliance requirements and delegate compliance to the CIAM.
Store PII data in a CIAM and non-PII data in the application backend
Governing access to the identity system
Key features of CIAMs include extensive logging, threat detection, high availability, and patch management. Provisioning and governing access to data are at the core of every identity system, with extensive user roles, permissions, and token assertion management capabilities. For example, Microsoft Entra provides over 100 pre-configured roles, including Application Developer and Application Administrator.
In addition to the identity system’s own governance, a key enterprise architectural component is an enterprise API manager that governs and logs API access to the identity system.
Managing PII data distribution, access, and processing
CIAMs typically have sophisticated APIs to manipulate user data, and many also support the SCIM 2.0 (System for Cross-domain Identity Management) standard, which can automatically sync identity data to other applications. Some CIAM systems can filter which fields are sent to each application and mask identity assertions so that identity data is anonymized.
This is a convenient way to provision customer accounts in CRM and customer service systems without unnecessarily exposing PII. For PII contained within customer applications such as Salesforce, it is critical to employ fine-grained access controls so that only entitled employees can access the PII data.
Push from Identity System to application with RegEx Masking
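The per-application filtering and regex masking described above can be sketched as follows. This is a minimal illustration, not any vendor's API: the `APP_POLICIES` table, the application names, and the flat attribute names are all hypothetical, and real SCIM 2.0 resources use a richer, nested schema.

```python
import re

# Hypothetical per-application policy: which attributes each application may
# receive, and which of those must be masked before leaving the identity system.
APP_POLICIES = {
    "crm": {
        "allowed": ["userName", "displayName", "phoneNumber"],
        "masked": ["phoneNumber"],
    },
    "support_desk": {
        "allowed": ["userName", "displayName"],
        "masked": [],
    },
}


def mask_all_but_last4(value: str) -> str:
    """Replace every word character that has at least four characters after it."""
    return re.sub(r"\w(?=.{4})", "*", value)


def filter_for_app(user: dict, app: str) -> dict:
    """Return only the attributes the application is entitled to, masked as required."""
    policy = APP_POLICIES[app]
    outbound = {name: user[name] for name in policy["allowed"] if name in user}
    for name in policy["masked"]:
        if name in outbound:
            outbound[name] = mask_all_but_last4(outbound[name])
    return outbound
```

With this policy, a CRM would receive the phone number `+1-555-867-5309` as `+*-***-***-5309`, and neither application would ever see a field such as an SSN that is not on its allow list.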
Rather than storing even anonymized PII in an application, an effective way to reduce PII sprawl is to route requests through an enterprise API gateway that dynamically fetches PII fields from the identity system. This works well when applications already use web services to access user data.
Request from Application via API Gateway with masking
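As a rough sketch of this request pattern, the function below plays the role of a gateway policy: it checks which fields a calling service is entitled to, returns them in the clear or masked, and logs the access. Everything here is an assumption for illustration: the in-memory `IDENTITY_STORE` stands in for a CIAM's user API, and the `CALLER_SCOPES` table and service names are hypothetical.

```python
from datetime import datetime, timezone

# Stand-in for the identity system; a real gateway would call the CIAM's API.
IDENTITY_STORE = {
    "u-1001": {"email": "jane.doe@example.com", "ssn": "078-05-1120"},
}

# Hypothetical entitlements: caller -> field -> "clear" or "masked".
# A field that is absent for a caller is denied.
CALLER_SCOPES = {
    "billing-service": {"email": "clear", "ssn": "masked"},
    "newsletter-service": {"email": "clear"},
}

ACCESS_LOG = []  # every request through the gateway is recorded here


def mask(value: str) -> str:
    """Keep the last four characters and replace the rest with '*'."""
    hidden = max(len(value) - 4, 0)
    return "*" * hidden + value[hidden:]


def gateway_get(caller: str, user_id: str, fields: list) -> dict:
    """Return the requested PII fields the caller is entitled to see."""
    scopes = CALLER_SCOPES.get(caller, {})
    record = IDENTITY_STORE[user_id]
    response = {}
    for field in fields:
        mode = scopes.get(field)
        if mode is None or field not in record:
            continue  # denied or unknown field: omit it from the response
        response[field] = record[field] if mode == "clear" else mask(record[field])
    ACCESS_LOG.append({
        "caller": caller,
        "user_id": user_id,
        "fields_returned": sorted(response),
        "at": datetime.now(timezone.utc).isoformat(),
    })
    return response
```

Because the application only ever holds the response for the duration of a request, no PII needs to be persisted outside the identity system.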
For bulk updates of PII data from an identity system, it’s critical that PII data be masked as much as possible.
ETL with masking from identity system to data warehouse
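One common way to mask bulk exports is to replace direct identifiers with keyed hashes, so warehouse tables can still be joined on a stable token without containing the raw value. The sketch below uses HMAC-SHA256 from the Python standard library. The row layout, the field names, and the choice to keep only country and birth year are assumptions for illustration, and in practice the key would come from a secrets manager rather than the code.

```python
import hashlib
import hmac


def pseudonymize(value: str, key: bytes) -> str:
    """Deterministic keyed hash: same input and key always give the same token."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def transform_row(row: dict, key: bytes) -> dict:
    """Build the warehouse row: tokenized ID, coarse attributes, no direct identifiers."""
    return {
        "user_token": pseudonymize(row["user_id"], key),
        "country": row["country"],
        # Generalize the birth date (assumed ISO format YYYY-MM-DD) to a year.
        "birth_year": int(row["birth_date"][:4]),
    }


def run_etl(rows: list, key: bytes) -> list:
    """Transform every exported identity row before it is loaded."""
    return [transform_row(row, key) for row in rows]
```

Name, email, and full birth date never reach the warehouse; analysts work with the token and the generalized fields only.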
A key benefit of identity systems is compliant lifecycle management. User records are created and deleted in a centralized manner based on retention requirements, and record deletions can be automatically synced to other systems with SCIM, API gateway, and ETL methods.
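The deletion sync described above can be pictured as a simple fan-out: when the identity system removes a user, every registered downstream system is asked to remove that user as well, and the outcome is recorded per system. The class and target names below are hypothetical; in a real deployment each callback would wrap a SCIM `DELETE`, a gateway call, or an ETL job.

```python
from typing import Callable, Dict


class DeletionFanout:
    """Propagates a user deletion to every registered downstream system."""

    def __init__(self) -> None:
        self._targets: Dict[str, Callable[[str], bool]] = {}

    def register(self, name: str, delete_fn: Callable[[str], bool]) -> None:
        """Register a callback that deletes a user and returns True on success."""
        self._targets[name] = delete_fn

    def delete_user(self, user_id: str) -> Dict[str, bool]:
        """Call every target and report which ones confirmed the deletion."""
        results = {}
        for name, delete_fn in self._targets.items():
            try:
                results[name] = bool(delete_fn(user_id))
            except Exception:
                results[name] = False  # failed targets can be retried later
        return results
```

Recording a result per target gives an audit trail that can be used to show that a retention or erasure requirement was met in every system.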
Masking PII fields
There are a variety of methods to mask PII fields depending on vendor capabilities:
- CIAM masking if available from vendor
- API gateway masking if available from vendor
- Serverless function with custom masking
- Data mask proxy server
A data mask proxy server is a convenient system to fully centralize masking capabilities, and create new fields that operationalize data without exposing PII, for example converting an age integer field into an “Over18” boolean.
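The “Over18” example can be sketched in a few lines. The profile layout and function name are hypothetical, and the sketch assumes that “Over18” means 18 or older.

```python
def derive_safe_fields(profile: dict) -> dict:
    """Return operational fields derived from PII without exposing the PII itself."""
    return {
        "user_id": profile["user_id"],
        # Assumption: "Over18" is true for users aged 18 or older.
        "Over18": profile["age"] >= 18,
    }
```

A downstream application that only needs to gate age-restricted content receives the boolean and never learns the user's actual age.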
A new set of companies has introduced services that combine the governance and field management features of an identity server with extensive field masking. However, these systems do not offer the authentication and authorization features of a CIAM, so it’s important to weigh their additional features against the additional threat surface they introduce for PII storage and permissions.
Centralizing PII in an identity system without cross-border data flows
It is increasingly difficult to transfer customer PII across borders due to regulatory restrictions and the compliance burden. One cumbersome option is to operate a separate identity system in each required country or region and federate the systems together.
InCountry’s solution adds Profile Data Residency in countries around the world with secure and compliant infrastructure. The solution offers full data sovereignty and isolation by proxying login and registration dialogs as well as SCIM 2.0 and custom API calls, and it supports email and SMS MFA.
Emerging AI systems increase the criticality of centralizing PII in an identity system and governing PII data distribution to other systems. Data should be automatically anonymized before training AI models, lifecycle management should automatically remove users from AI systems, and PII needs to be tightly filtered from LLM interactions.
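As one small illustration of filtering PII from LLM interactions, the sketch below replaces email addresses and US Social Security Numbers with placeholders before text is sent to a model. The patterns and the `redact` function are assumptions for illustration. Regular expressions only catch well-structured identifiers, so this is a first line of defense rather than a complete filter.

```python
import re

# Illustrative patterns only; names, addresses, and free-form PII need other techniques.
PII_PATTERNS = [
    ("EMAIL", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")),
    ("SSN", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
]


def redact(text: str) -> str:
    """Replace each recognized PII value with a bracketed placeholder."""
    for label, pattern in PII_PATTERNS:
        text = pattern.sub(f"[{label}]", text)
    return text
```

Applied to `Contact jane.doe@example.com, SSN 078-05-1120.`, this yields `Contact [EMAIL], SSN [SSN].`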
In conclusion, PII sprawl is an increasing liability for companies. The most secure, compliant, and flexible central data store to manage PII is the existing CIAM and API Gateway infrastructure that enterprises have already deployed and can now leverage to control all PII.