What is a Digital Footprint
As individuals, our digital footprint starts when we open our eyes, with sensitive data such as our DOB, full name, and birth certificate stored in the records of a healthcare system we aren’t yet aware of.
Over the years, our digital footprint continues to expand to include education, social media, employment, online services, medical records, financial information, and much more. It exists and grows long before we can even grasp the concept of trust, let alone vet the organizations handling this sensitive data. As our privacy awareness increases, we start to think about how our information is handled and who should be held accountable in the event of a breach involving compromised sensitive data.
In this article, we explore the business view of the PII (Personally Identifiable Information) footprint, how it relates to the individual PII footprint described above, and how to make that footprint more manageable and simplify compliance with privacy laws.
As with individuals, a business’s or institution’s digital PII footprint starts at inception, with the company’s founders and first employees, whose personal data is stored on the company’s systems. As the business grows and more employees join, the human resources (HR) PII footprint grows with it.
The organization’s PII footprint grows further as investors’, vendors’, and partners’ information accumulates in the organization’s systems. While these additions are routine, everything changes once customers’ personal information is collected to facilitate business needs, because customer data is highly regulated by evolving privacy laws. For years, customers’ PII has been collected without consent, but with the global growth in privacy awareness and regulations, that practice is becoming unacceptable and can come at the price of a hefty fine.
Whether it involves fines or not, businesses have a legal and moral obligation to protect PII, regardless of whom it belongs to. They are entrusted with sensitive information and expected to protect it.
Why Organizations Need To Know What Their Digital PII Footprint Is
We live in the “Data Age,” with data scientists, data analysts, and ML/AI algorithms working around the clock to analyze enormous quantities of data, including sensitive information. Data specialists constantly monitor data and make sense of it to facilitate business goals. This means that employees have access to sensitive data, including information that may identify data subjects, which creates a far-from-ideal privacy situation.
As a business evolves and its customer base grows, its customer PII footprint grows too. This can be beneficial but also challenging. On one hand, more information is available for analysis. On the other hand, the challenges of protecting and controlling PII compound.
Below are some of the questions that organizations should be able to answer continuously to ensure PII security at any point in time:
- What do I need to protect? Where is the PII at rest, in transit, and in use?
- Who needs to access the data? Consider both internal and external access needs.
- Is the data protected? The PII data protection strategy must be continuously validated.
- Which privacy laws are relevant to me?
“What do I need to protect?” may seem like a simple question, but it’s a significant undertaking for most organizations. As such, it shouldn’t be surprising that sensitive data discovery is one of the fastest-growing domains in the privacy market (estimated to reach a market size of $12.4 billion by 2026).
The customer PII footprint in organizations can include:
- Production and non-production databases, data lakes, and data warehouses
  - Structured data – Data that fits a predefined data model. It can be easily mapped into designated fields and is easily searchable (e.g., personal data in relational databases: name, address, SSN, etc.)
  - Unstructured data – Data that doesn’t have a predefined data model, so it’s not as easily categorized into the predefined tables and rows of a relational database (e.g., satellite imagery, audio files, video files, or even emails and documents)
- Operational footprint
  - Databases with customers’ data
  - Log files
  - On-prem/edge/cloud infrastructure
  - Data that was sent to or received from third-party vendors via APIs (e.g., SaaS)
  - Data warehouse for analytics use
- Organizational footprint
  - Document repositories
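As a rough illustration of what answering “What do I need to protect?” involves, the sketch below scans text (for example, a log file) for two kinds of PII using regular expressions. The pattern set, the `scan_text` helper, and the sample log are all hypothetical and deliberately simplistic; real sensitive data discovery tools rely on much richer detection than two regexes.

```python
import re

# Hypothetical, deliberately simple patterns for illustration only.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def scan_text(source_name, text):
    """Return one finding per match: (source, line number, PII type)."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for pii_type, pattern in PII_PATTERNS.items():
            for _ in pattern.finditer(line):
                findings.append((source_name, line_no, pii_type))
    return findings


# Made-up sample data standing in for a real log file.
sample_log = (
    "2024-01-01 user signup email=jane.doe@example.com\n"
    "2024-01-01 healthcheck ok\n"
    "2024-01-02 kyc check ssn=123-45-6789\n"
)
findings = scan_text("app.log", sample_log)
for source, line_no, pii_type in findings:
    print(f"{source}:{line_no} contains {pii_type}")
```

Note that the findings record only where PII was seen and what type it is, not the value itself, so the inventory does not become another copy of the sensitive data.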
PII Footprint and Privacy Regulations
While this article doesn’t take a deep dive into privacy regulations, it is still important to keep regulatory requirements in mind when choosing how to protect your PII footprint. The topic is covered by the GDPR, particularly in Articles 25 and 32, described below:
Article 25: Data protection by design and by default – Sets the stage for companies to consider data privacy and protection in every aspect of their business, including the product development and operations behind their services.
To meet the obligations of Article 25, companies must incorporate principles like data minimization and measures like pseudonymization that are designed to protect personal data.
To comply with Article 25, companies must also collect only the personal data they need.
Article 32: Security of processing – Requires controllers and processors to implement measures that ensure an appropriate level of security.
Every legal requirement in Articles 25 and 32 depends on full visibility of customer PII across the organization. Without that visibility, the risk of non-compliance increases, and the organization’s ability to protect all of the information properly is compromised.
GDPR User Rights
In addition to the above, the GDPR includes eight user rights. Without a clear view of each user’s PII footprint within the organization’s systems, facilitating these rights becomes a manual, tedious, lengthy, and costly process:
- The Right to Information
- The Right of Access
- The Right to Rectification
- The Right to Erasure (RTBF)
- The Right to Restriction of Processing
- The Right to Data Portability
- The Right to Object
- The Right to Avoid Automated Decision-Making
An individual (data subject) can submit a request to exercise one or more of those rights by filing a data subject access request (DSAR). For example, an individual can submit a DSAR to find out what information a company has collected on them.
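To show why a clear view of the PII footprint matters here, the sketch below handles an access request and an erasure request against a small registry of data stores. Everything in it is an assumption made for illustration: the `DATA_STORES` registry, the `subject_id` field, and both handler functions are hypothetical, and in-memory lists stand in for real systems.

```python
# Hypothetical in-memory stand-ins for real systems (CRM, billing, support).
DATA_STORES = {
    "crm": [{"subject_id": "u1", "name": "Jane Doe"}],
    "billing": [
        {"subject_id": "u1", "card_last4": "4242"},
        {"subject_id": "u2", "card_last4": "1111"},
    ],
    "support": [{"subject_id": "u2", "ticket": "Reset password"}],
}


def handle_access_request(subject_id):
    """Collect every record held about one data subject, grouped by store."""
    report = {}
    for store_name, records in DATA_STORES.items():
        matches = [r for r in records if r["subject_id"] == subject_id]
        if matches:
            report[store_name] = matches
    return report


def handle_erasure_request(subject_id):
    """Delete the subject's records everywhere; return per-store delete counts."""
    deleted = {}
    for store_name, records in DATA_STORES.items():
        kept = [r for r in records if r["subject_id"] != subject_id]
        deleted[store_name] = len(records) - len(kept)
        DATA_STORES[store_name] = kept
    return deleted


access_report = handle_access_request("u1")
erasure_counts = handle_erasure_request("u1")
```

The handlers are only this simple because every store is registered in one place and keyed by the same identifier. Any store missing from the registry is one the request silently skips, which is the visibility problem described above.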
How To Reduce Digital PII Footprint
The most direct way to reduce the PII footprint is to collect less, a principle known as data minimization. Businesses should collect only what they need, with the clear purpose of facilitating their service. Unfortunately, ‘collecting less’ has its challenges, as it requires visibility into what is collected and where (in the application) it is stored. In addition, collecting less might not be feasible if it has a negative business impact.
If data minimization doesn’t work for your organization, here are a few other techniques you can implement:
Data retention optimization
Ensure minimal personal information retention time: Identify the optimal data retention period that supports business needs, and ensure compliance with retention-related regulations and internal/external audit needs.
Implement robust time-to-live (TTL) functionality in the system and support customer right-to-be-forgotten (RTBF) requests.
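A minimal sketch of TTL-based retention is shown below. The retention periods, record categories, and the `purge_expired` helper are assumptions made for illustration; actual periods come from the business, regulatory, and audit requirements mentioned above.

```python
from datetime import datetime, timedelta, timezone

# Assumed retention periods per record category (illustrative values only).
RETENTION = {
    "support_ticket": timedelta(days=365),
    "marketing_lead": timedelta(days=90),
}


def purge_expired(records, now):
    """Split records into (kept, expired) based on each category's TTL."""
    kept, expired = [], []
    for record in records:
        ttl = RETENTION[record["category"]]
        if now - record["created_at"] > ttl:
            expired.append(record)
        else:
            kept.append(record)
    return kept, expired


# Made-up records; a fixed "now" keeps the example deterministic.
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
records = [
    {"id": 1, "category": "marketing_lead",
     "created_at": datetime(2024, 1, 1, tzinfo=timezone.utc)},
    {"id": 2, "category": "support_ticket",
     "created_at": datetime(2024, 1, 1, tzinfo=timezone.utc)},
]
kept, expired = purge_expired(records, now)
```

In a real system the expired records would be deleted from the underlying store on a schedule, and the same deletion path could serve RTBF requests.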
Implement privacy-enhancing technologies (PET)
Privacy-enhancing technologies embody fundamental data protection principles by minimizing personal data use, maximizing data security, and empowering individuals. GDPR Article 25(1) names pseudonymization as an example of a measure organizations can implement, and it also helps reduce the PII footprint.
What is data pseudonymization?
Pseudonymization replaces sensitive information with pseudonyms, aliases, or tokens. It de-identifies data but is reversible, allowing re-identification when necessary by using additional information that is kept separately. The formal definition can be found in GDPR Article 4(5).
When the data is pseudonymized, there is a much lower chance of exposing personal data, even to internal employees such as data scientists, since it makes sensitive information unidentifiable. Pseudonymization is a well-known data management technique highly recommended by GDPR.
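The sketch below illustrates the idea with a toy in-memory token store. The `TokenVault` class and its method names are hypothetical and not any product’s API; a real deployment would keep the mapping in a separate, access-controlled system.

```python
import secrets


class TokenVault:
    """Toy in-memory pseudonymization store (illustration, not production-grade)."""

    def __init__(self):
        self._token_by_value = {}
        self._value_by_token = {}

    def pseudonymize(self, value):
        """Return a stable random token for the value, creating one if needed."""
        if value not in self._token_by_value:
            token = "tok_" + secrets.token_hex(8)
            self._token_by_value[value] = token
            self._value_by_token[token] = value
        return self._token_by_value[value]

    def reidentify(self, token):
        """Map a token back to the original value (should be a restricted operation)."""
        return self._value_by_token[token]


vault = TokenVault()
row = {"email": "jane.doe@example.com", "plan": "pro"}

# The analytics copy carries a token instead of the email address.
safe_row = {**row, "email": vault.pseudonymize(row["email"])}
```

Because the same value always maps to the same token, analysts can still join and count by customer, while only code with access to the vault can call `reidentify`.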
The principle of “data minimization,” mentioned above, requires limiting the collection of personal information to what is directly relevant and necessary to accomplish a specific business goal. To accomplish this, there has to be clear traceability from business requirements to the sensitive information being collected across all of the organization’s applications.
Privacy by design incorporates privacy at the earliest possible stages of your software development cycle. Adopting this strategy, also known as the shift-left strategy, minimizes where PII resides in your system and, in essence, enables better control of your PII footprint by design. Below are some use cases in which incorporating privacy by design can lead to a smaller PII footprint:
- Masking or obfuscating PII in log files
- Minimizing PII sharing via API calls
- Implementing tokenization – Tokenization is a form of pseudonymization that replaces sensitive information with tokens generated by a token system, reducing direct usage of sensitive data as much as possible. It requires a lot of up-front design.
- Supporting automated DSAR/RTBF requests
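The first use case in the list, masking PII in log files, can be sketched with Python’s standard `logging` module. The `mask_pii` helper, the `PiiMaskingFilter` class, and the email-only pattern are assumptions for illustration; a real filter would cover more PII types.

```python
import logging
import re

# Simplistic email pattern, used here only for illustration.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def mask_pii(text):
    """Replace anything that looks like an email address with a placeholder."""
    return EMAIL_RE.sub("[REDACTED_EMAIL]", text)


class PiiMaskingFilter(logging.Filter):
    """Logging filter that masks PII in the fully formatted message."""

    def filter(self, record):
        # Format the message with its arguments first, then mask the result.
        record.msg = mask_pii(record.getMessage())
        record.args = ()  # arguments are already merged into msg
        return True


logger = logging.getLogger("app")
logger.addFilter(PiiMaskingFilter())
logger.warning("Password reset requested for %s", "jane.doe@example.com")
```

Attaching the filter to the logger masks messages before they reach any handler, so the address never lands in a log file in the first place.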
Training & Awareness
The human element is part of every cyber defense strategy and should be part of every privacy risk mitigation strategy. Whether it involves providing general privacy training for all employees or targeted training for data scientists, data analysts, and DevOps teams, a privacy-aware organization is more capable of mitigating and preventing privacy exposures.
For example, if employees think twice before including sensitive information in emails and documents, or developers evaluate more PET solutions, the PII footprint can be reduced and better controlled.
Privacy laws in the “Data Age” are complex and challenging. The volume of information organizations must process and handle to support growing business needs keeps increasing, while new privacy laws emerge and existing ones are constantly amended. It is easy to lose track of which laws currently apply while keeping an eye on the new privacy laws waiting around the bend.
The suggestions in this article are the result of interviews with many leading global organizations. PII centralization is a growing trend and necessity. Implementing dedicated data repositories that focus on PII as part of the organization’s defense strategy, in essence controlling the PII footprint through centralization, can help keep your data private.
Organizations must maintain near real-time visibility of their digital PII footprint. It is required for privacy compliance, security, and ensuring the company is trustworthy.
How Piiano Can Help
Piiano is designed and built on the premise of PII footprint reduction as a method to mitigate risk and increase control.
Our two products are complementary, and both can help reduce the PII footprint:
PII Vault – Centralizing PII in the vault increases visibility and enables maximum control, while the tokenization and encryption functionality can reduce the amount of PII stored in databases. Incorporating a PII Vault by design can support the organization’s shift-left strategy and privacy-by-design use cases.
Privacy code scanner – The Piiano code scanner scans the organization’s source code and locates where PII is being processed. This helps facilitate data minimization and provides organizations with a clear understanding of how their system processes PII.