Prioritizing a vault for customer data protection is crucial. In this post we explain the rationale behind that decision and delve into the reasons why a vault is a must-have component in today’s cloud architectures.
Every data type has a home
Vaults vs. Databases
While vaults and databases share the common purpose of data storage, their objectives differ. Databases prioritize making the data as accessible as possible, to support any business need. Vaults are built to support a limited set of business needs, with the emphasis on reducing access and exposure to a minimum. In short, vaults specialize in data protection, whereas databases excel in comprehensive data management and accessibility.
Vaults are built with a fortress mentality, focusing on keeping data secure and inaccessible to unauthorized parties. Databases, on the other hand, are designed to be data repositories that prioritize speed and usability.
KMS vs. Vault
Just as production environment secrets are stored in a secret manager such as AWS Secrets Manager, HashiCorp Vault, or even SSM, encryption keys can be stored in a key management system (KMS). Note that a vault for secrets differs from a vault for customer data, just as different database technologies serve different purposes. Read more about the differences between AWS KMS and Piiano Vault. Secret managers are also very pricey, and using them doesn’t make sense when you store your customers’ secrets rather than your own production environment secrets.
Choosing a database
When selecting a database engine, various options are available: the most common structured SQL databases, as well as document, graph, time-series, big data, and vector databases for AI. Each of these options offers distinct advantages tailored to specific use cases. However, it's crucial to recognize that the decision is primarily driven by technical requirements.
R&D teams typically consider a few key metrics when selecting their tech stack and database technology. They prioritize their decision-making process by asking the following questions:
- What is the specific use case at hand?
- Which database technology offers the best results and ease of use for the nature of my data?
- What are the maintenance requirements and scaling considerations for the chosen database?
By evaluating these factors, R&D teams can make informed decisions that align with their project requirements and long-term goals.
Sensitive Data Security
When using a data vault, we prioritize security above anything else!
Our primary objective remains to simplify the developer's task of protecting and managing data. We aim to enhance developers' productivity in achieving data protection rather than to compete with other databases, whose great capabilities we recognize.
It is widely acknowledged that not all data is created equal, and this fact has real consequences: specific data types must be prioritized for protection within a vault. Key data categories that require extra safeguarding include Personally Identifiable Information (PII, such as SSNs), Payment Card Industry (PCI) data, Protected Health Information (PHI), Know Your Customer (KYC) data, Automated Clearing House (ACH) data, customer secrets (like third-party service credentials, for example for accessing a customer’s Slack channel), and any information deemed confidential or sensitive by the specific business line.
Through this approach, we establish a clear distinction between the appropriate data types for a vault and a database. The rationale behind prioritizing security for the most sensitive information of our customers should be evident.
De-identified data is safer
Here’s why: when PII columns like name, phone, email, address, and other unique identifiers are stored alongside application-specific columns in a database, a data breach can result in significant privacy damage for customers. To mitigate this risk, it is crucial to segregate PII-related columns into a separate vault, so that they are harder to reach when the application database is compromised.
By doing so, even if the database is stolen, customer records will not contain PII, rendering them de-identified and difficult to link back to individuals, potentially avoiding fines and minimizing harm to customers!
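The separation described above can be sketched as follows. This is a minimal illustration, not Piiano Vault's actual API: `ToyVault`, `split_record`, the `vault_id` column, and the choice of PII fields are all hypothetical names introduced here.

```python
import uuid

# Assumed set of PII columns for this example only.
PII_FIELDS = {"name", "phone", "email", "address"}


class ToyVault:
    """In-memory stand-in for a vault; not a real vault client."""

    def __init__(self):
        self._objects = {}

    def store(self, pii: dict) -> str:
        # Return an opaque reference that reveals nothing about the person.
        object_id = str(uuid.uuid4())
        self._objects[object_id] = dict(pii)
        return object_id


def split_record(record: dict, vault: ToyVault) -> dict:
    """Move PII fields into the vault and return the de-identified row."""
    pii = {k: v for k, v in record.items() if k in PII_FIELDS}
    app_row = {k: v for k, v in record.items() if k not in PII_FIELDS}
    app_row["vault_id"] = vault.store(pii)
    return app_row


vault = ToyVault()
row = split_record(
    {"name": "Jane Doe", "email": "jane@example.com", "plan": "pro"},
    vault,
)
# `row` is what the application database would hold: no name, no email.
```

If the application database holding `row` is stolen, the attacker gets a plan name and an opaque reference, and still needs separate access to the vault to learn who the record belongs to.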
The triple data protection technique
Given the sensitive nature of social security numbers (SSNs) and the relative ease of identity fraud in the USA, it is common knowledge that they must be encrypted. However, all PII columns should be encrypted to raise the bar against data theft. Mainstream privacy regulations like GDPR and CCPA push the industry toward more data protection for a good reason: to fight the rising number of data breaches.
For example, in accordance with the PCI-DSS security standard, it is a mandatory requirement to encrypt and scope payment information (credit card numbers). The triple data protection technique is one of the best ways to harden any type of data:
- Field-level encryption - Any data stored by the vault is first encrypted at runtime, before reaching the database.
- Data tokenization - Replacing sensitive data with a non-sensitive, unique value that carries no risk. Accessing the original data requires extra permissions. The tokens table is also fully encrypted (see field-level encryption above).
- Isolation - The vault is the only mechanism through which you access the data, given you have the right permissions and context. Stored data is segregated and separated from anything else in your environment.
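To make the tokenization and isolation steps concrete, here is a minimal sketch. `TokenVault` and the `detokenize` permission name are hypothetical, and the toy keeps the original values in plain memory, whereas a real vault would also encrypt them at the field level.

```python
import secrets


class TokenVault:
    """Toy token store; the only path from a token back to the value."""

    def __init__(self):
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenize(self, value: str) -> str:
        # Reuse the existing token so the same value always maps to one token.
        if value in self._value_to_token:
            return self._value_to_token[value]
        # The token is random, so it carries no information about the value.
        token = "tok_" + secrets.token_hex(16)
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token: str, caller_permissions: set) -> str:
        # Reading the original value requires an extra, explicit permission.
        if "detokenize" not in caller_permissions:
            raise PermissionError("caller may not detokenize")
        return self._token_to_value[token]


vault = TokenVault()
token = vault.tokenize("078-05-1120")
original = vault.detokenize(token, {"detokenize"})
```

Application tables store only `token`. A caller without the `detokenize` permission gets a `PermissionError` instead of the SSN.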
This technique saved the day in Capital One’s 2019 data breach, in what they called a ‘post-compromise mitigation technique’, when a database with around 100M customer records, including tokenized SSNs, was stolen.
Therefore, the most effective way to mitigate the risks associated with collecting and storing sensitive customer data is to leverage the power of this triplet, which is slowly being adopted for other sensitive data types too. A vault encourages developers to work this way.
Database disadvantages when it comes to storing sensitive data, or where vault shines:
But first things first, we start with the reasons why databases are not a good fit for storing sensitive customer data.
1. Storing data in plaintext within a database significantly increases its susceptibility to unauthorized access and compromise. Malicious actors can readily exploit vulnerabilities to exfiltrate sensitive information, leading to severe data breaches. Learn more about why data encryption at rest is useless in cloud environments.
2. Databases are optimized for quick and easy access to data. For example, SQL injection arises naturally when query strings are built by string concatenation, which mixes the SQL command with unfiltered, malicious user input. Dumping all data with a WHERE clause of ‘1=1’ is an old-school attack, and today pagination is a must. Similarly, a ‘SELECT *’ that returns all fields in a row may return excessive sensitive information, without differentiating the sensitivity of the columns involved. Vault promotes secure usage behavior for its APIs.
3. A compromised database often means unrestricted access to all data. In contrast, vaults typically have additional safeguards, preventing unauthorized data extraction even if credentials are stolen. For instance, vaults often restrict simple "select all" commands, adding an extra layer of security. See more below about data access…
4. SQL databases often struggle with privacy, security, and access control. They are susceptible to threats like SQL injection and unauthorized data access. Managing complex access controls (ACLs) can also be challenging. Vault offers a more secure alternative by preventing these issues. It eliminates vulnerabilities like SQL injection and cross-tenant data snooping. Additionally, Vault provides granular control over data access, making it impossible to extract entire datasets with a single command.
5. Databases typically don't record detailed activity logs. Unlike vaults, which provide a complete history of actions, databases often keep only basic logs. Even when logs exist, they usually lack specifics about the data accessed or modified. As a result, working out which specific people were impacted ranges from challenging to impossible during a breach post-mortem analysis.
6. A backend service with permissions to access a database can be compromised and then used to dump all data from the database. This implicit trust between the database and the application is considered an unsafe design.
7. Databases usually store exactly what the user asks for. In today’s privacy-regulated world, however, more metadata needs to be stored alongside the data: the semantic data type of what we store (a varchar or string carries no meaning, which is why companies now try to label and tag their own data), the reason for each data access (which can help with consent management), and other data such as the time of access per record and retention tracking.
8. With a database you can only fetch the data. Technically, you can run some user-defined procedures to do local data processing within the database. With a vault you can use a proxy to pass on sensitive data (to keep you out of scope) or use tokens instead of the data.
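The SQL injection point in the list above can be reproduced with Python's built-in `sqlite3` module. The table, columns, and sample values are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT, ssn TEXT)")
conn.executemany(
    "INSERT INTO customers (email, ssn) VALUES (?, ?)",
    [("a@example.com", "078-05-1120"), ("b@example.com", "219-09-9999")],
)

# Attacker-controlled input that closes the string and adds an always-true clause.
user_input = "nobody@example.com' OR '1'='1"

# Unsafe: the input is concatenated into the SQL text and becomes part of the command.
unsafe_sql = "SELECT email, ssn FROM customers WHERE email = '" + user_input + "'"
leaked = conn.execute(unsafe_sql).fetchall()

# Safer: a parameterized query treats the input strictly as a value.
safe = conn.execute(
    "SELECT email, ssn FROM customers WHERE email = ?", (user_input,)
).fetchall()
```

Here `leaked` contains every row in the table, SSNs included, while `safe` is empty because no customer has that literal email address.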
Why vault is not yet another database with security marketing fluff
Vault takes responsibility for protecting data at various levels of the tech stack: the network, API, and storage levels.
A vault is a secure database with many built-in security primitives that actually thwart data theft, with privacy controls incorporated, designed from scratch for developers with ease of use and productivity in mind.
Having a management console that tracks and shows data access and helps configure everything is a great starting point, but let’s see what makes vaults so powerful, and attackers so miserable.
One of the main strengths of a vault is the ability to block attackers even if they managed to breach your cloud environment. Vault raises the bar by protecting the data directly: its threat model assumes a breached environment.
Encryption as a first-class citizen
Let us handle data encryption for you.
- Vault eliminates manual encryption key and certificate management.
- Vault APIs don’t bloat your code or client-side calls to the database when using encrypted data. By contrast, SQL requires you to change your queries to support encryption, which complicates even simple accesses.
- Just define your data schema and Vault will transparently operate data encryption at the field-level behind the scenes.
- Vault lets you rotate keys with the call of an API.
- Vault ensures the security of encryption keys by never letting them out from its secure environment. All keys are isolated and you never get to see them, making the system more secure.
- Unlike traditional databases where you must implement BYOK separately, Vault natively supports a simple form of BYOK, making it easier to integrate.
- Vault prevents unauthorized data access even if the server’s API key is compromised. Vault employs a zero-trust mechanism that returns data only upon end-user impersonation, significantly enhancing security against cross-tenant data attacks (IDOR, BOLA) and compromised web apps!
- Vault incorporates anti-tampering measures to safeguard data integrity. If an attacker changes the encrypted data in the underlying storage, it will be detected.
- For self-hosted vaults, integration with your cloud’s KMS is necessary for securing the data encryption keys. Vault employs key encryption keys (KEKs) and data encryption keys (DEKs) and secures them with the KMS.
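The anti-tampering point in the list above can be illustrated with a keyed integrity tag. The post does not say which mechanism Vault uses, so this is only one standard approach (HMAC-SHA256 over the stored bytes), and `seal` and `open_sealed` are hypothetical helper names. Authenticated encryption modes such as AES-GCM give the same guarantee as part of encryption itself.

```python
import hashlib
import hmac
import secrets

TAG_LEN = 32  # HMAC-SHA256 produces a 32-byte tag


def seal(integrity_key: bytes, ciphertext: bytes) -> bytes:
    """Append an HMAC tag to already-encrypted bytes before storing them."""
    tag = hmac.new(integrity_key, ciphertext, hashlib.sha256).digest()
    return ciphertext + tag


def open_sealed(integrity_key: bytes, blob: bytes) -> bytes:
    """Verify the tag and return the ciphertext, or raise if it was modified."""
    ciphertext, tag = blob[:-TAG_LEN], blob[-TAG_LEN:]
    expected = hmac.new(integrity_key, ciphertext, hashlib.sha256).digest()
    # compare_digest avoids leaking the mismatch position through timing.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("stored data was modified")
    return ciphertext


key = secrets.token_bytes(32)
blob = seal(key, b"opaque-ciphertext-bytes")

# Simulate an attacker flipping one bit directly in the underlying storage.
tampered = bytes([blob[0] ^ 1]) + blob[1:]
```

Calling `open_sealed(key, blob)` returns the original bytes, while `open_sealed(key, tampered)` raises `ValueError`, because the attacker cannot recompute the tag without the key.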
Data access
- Vault offers data access throttling to prevent excessive requests, thus reducing the risk posed by privileged users who can access all data.
- Vault's IP whitelisting capability blocks unauthorized IP addresses from accessing the system, making it more difficult for attackers to gain entry. This is useful when your web app’s servers work exclusively and directly with a vault.
- Vault does not allow direct data access to its underlying back store (where everything is encrypted anyway). Instead, it supports a "break the glass" approach for diagnostics during emergencies.
- Vault provides highly granular access controls through features like RBAC and ABAC. It also allows you to customize your own data access policies by running JavaScript code to condition access.
- You can leverage JavaScript code to perform unlimited data transformations when retrieving data from Vault, allowing for flexible data manipulation and customization (e.g. your own data masking and data tokenization formats).
- In some cases, such as PCI-DSS, you need to use a credit card number but are not allowed to access it directly because of scoping, compliance, or security issues. Instead, you use a proxy server to pass requests along with the sensitive data to another server. Vault supports this for any type of data and for approved third-party servers. For example, the data can be third-party secrets that you don’t want to read back to your web server, to reduce exposure.
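The throttling idea from the list above can be sketched as a sliding-window limit per caller. `ReadThrottle` and its parameters are hypothetical and only illustrate the concept, not how Vault implements it. The clock is injectable so the behavior can be demonstrated without waiting.

```python
import time
from collections import defaultdict, deque


class ReadThrottle:
    """Allow at most `max_reads` reads per caller within `window_seconds`."""

    def __init__(self, max_reads: int, window_seconds: float, clock=time.monotonic):
        self.max_reads = max_reads
        self.window_seconds = window_seconds
        self._clock = clock
        self._reads = defaultdict(deque)  # caller -> timestamps of recent reads

    def allow(self, caller: str) -> bool:
        now = self._clock()
        recent = self._reads[caller]
        # Forget reads that have fallen out of the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_reads:
            return False
        recent.append(now)
        return True


fake_now = [0.0]  # mutable fake clock for the demonstration
throttle = ReadThrottle(max_reads=3, window_seconds=60, clock=lambda: fake_now[0])
results = [throttle.allow("admin") for _ in range(4)]  # the fourth read is refused
fake_now[0] = 61.0
allowed_again = throttle.allow("admin")  # the window has passed
```

Even a privileged caller with valid credentials can then pull only a bounded number of records per window, which slows down bulk exfiltration.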
Data management and privacy
Vault's Privacy-Aware Data Management
- Vault supports privacy-aware data management models, eliminating the need for manual data categorization and labeling in the future. Semantic data types, data masking, and tokenization are built-in features. There is even a new tokenization type that can point to the latest version of the stored data record (just like pointers in programming languages).
- Vault supports consent management by recording access reasons as optional parameters in all data APIs. This data can later be used to allow or block access to specific fields, or for other manipulations.
Unprecedented Granularity in Vault's IAM
- Vault's IAM provides unparalleled granularity, enabling users to search for data without being able to read it. This is particularly useful when looking up records based on sensitive fields like Social Security Numbers (SSNs). Additionally, Vault can block the listing of data from a collection, preventing unauthorized dumping. It also allows for the retrieval of masked data only and grants permissions to access data based on its sensitivity level, ensuring that users with low privileges cannot access highly sensitive data within the same collection.
- Vault's Identity and Access Management (IAM) allows for the creation of users with different roles out of the box, such as CI/CD, Admin, WebApp, client-side, and many more.
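One common way to let a caller search by a sensitive field without reading it is a keyed "blind index". The post does not say how Vault implements this, so the sketch below is an assumption, and `SearchableCollection` and its methods are hypothetical. The toy also keeps the raw SSN in memory, which a real vault would store encrypted.

```python
import hashlib
import hmac


class SearchableCollection:
    """Toy collection that supports lookup by SSN and masked reads only."""

    def __init__(self, index_key: bytes):
        self._index_key = index_key
        self._records = {}
        self._ssn_index = {}  # keyed hash of SSN -> record id

    def _blind(self, value: str) -> str:
        # A keyed hash: useless for lookups without the index key.
        return hmac.new(self._index_key, value.encode(), hashlib.sha256).hexdigest()

    def add(self, record_id: str, ssn: str, name: str) -> None:
        self._records[record_id] = {"ssn": ssn, "name": name}
        self._ssn_index[self._blind(ssn)] = record_id

    def find_by_ssn(self, ssn: str):
        """Return the matching record id, or None; never returns the SSN."""
        return self._ssn_index.get(self._blind(ssn))

    def read_masked(self, record_id: str) -> dict:
        rec = self._records[record_id]
        return {"name": rec["name"], "ssn": "***-**-" + rec["ssn"][-4:]}


customers = SearchableCollection(index_key=b"demo-only-index-key")
customers.add("rec-1", "078-05-1120", "Jane Doe")
match = customers.find_by_ssn("078-05-1120")
masked = customers.read_masked(match)
```

A support agent who already knows a customer's SSN can locate the record, but the API only ever hands back the record id and a masked view.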
Privacy Compliance and Data Management Functionality
- Vault supports privacy compliance features such as Data Subject Access Request (DSAR) APIs.
- Vault provides data retention policies with full object life cycle management out of the box. It can automatically archive or delete data, eliminating the need for manual iteration over related data.
- Vault can delete the data associated with a specific person across various collections through a single API call (right to be forgotten, RTBF), streamlining the process.
- Vault supports a data hierarchy that will save you development time. The highest level is tenants (for example, for B2B2C companies), and it’s enforced at the API level, which helps achieve better isolation and avoid cross-tenant attacks.
- Vault supports storing files too and all data management applies to them automatically.
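The single-call deletion and retention behavior described in this list can be sketched with a toy in-memory store. `ToyStore`, `forget_person`, and `purge_expired` are hypothetical names for illustration and do not reflect Vault's actual API.

```python
from datetime import datetime, timedelta, timezone


class ToyStore:
    """In-memory stand-in for vault collections."""

    def __init__(self):
        self.collections = {}  # collection name -> list of records

    def add(self, collection, person_id, data, expires_at=None):
        self.collections.setdefault(collection, []).append(
            {"person_id": person_id, "data": data, "expires_at": expires_at}
        )

    def _drop(self, should_drop) -> int:
        # Walk every collection once and count what was removed.
        removed = 0
        for name in list(self.collections):
            records = self.collections[name]
            kept = [r for r in records if not should_drop(r)]
            removed += len(records) - len(kept)
            self.collections[name] = kept
        return removed

    def forget_person(self, person_id) -> int:
        """Right to be forgotten: one call covers all collections."""
        return self._drop(lambda r: r["person_id"] == person_id)

    def purge_expired(self, now) -> int:
        """Retention: delete records whose expiry time has passed."""
        return self._drop(
            lambda r: r["expires_at"] is not None and r["expires_at"] <= now
        )


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
store = ToyStore()
store.add("profiles", "p1", {"email": "jane@example.com"})
store.add("payments", "p1", {"card_token": "tok_123"})
store.add("profiles", "p2", {"email": "sam@example.com"})
store.add("sessions", "p2", {"ip": "203.0.113.7"}, expires_at=now - timedelta(days=1))

forgotten = store.forget_person("p1")  # removes p1 from profiles and payments
purged = store.purge_expired(now)      # removes the expired p2 session
```

The calling code never has to know which collections hold data about a person, which is what removes the manual iteration over related data.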
Direct Client-Side Object Manipulation
- Vault enables direct client-side (end user) object manipulation, saving time and effort with frontend integrations. It allows for the storage of customer data directly from the client side, enhancing the security of mobile and web applications.
- Vault supports embedding secure HTML iframes and forms for collecting data directly from the end user.
- Vault eliminates the need for PII or PCI data to touch your backend, providing an additional layer of protection by reducing the scope of data.
Scaling data security at the org level
- Vault offers application-level data security, allowing it to work with any data store and tech stack. This simplifies data security across multiple R&D teams and database technologies and eliminates the need to build security per database.
- Vault enables decoupling of data access policy from the code itself, so you don’t need to release a new version of your application if you want to change some policies. This gives AppSec teams more control over sensitive data.
- Vault automatically raises developers' awareness, making them mind how and where to store sensitive data securely. Over time it helps avoid sensitive data drift, and developers get used to the idea that sensitive data must be isolated and better protected.
- Vault acts as a critical data defense layer, enhancing the security of your applications and reducing data theft. Its use of secure infrastructure leads to increased security for your customers.
- Vault helps to confine sensitive data to specific scopes, such as PCI or PII, while projecting high-security standards on the rest of the architecture.
- In large organizations where different R&D teams choose their own tech stack, it can be challenging for the AppSec team to secure data access and maintain visibility. Vault can serve as a secure data access layer (SDAL) by incorporating encryption and tokenization, acting as a virtual layer.
- Protecting data can go beyond your transactional databases, including your data analytics environments where you won’t have to compromise privacy if you use tokens instead of PII.
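The point in this list about decoupling access policy from application code can be sketched by treating the policy as data. The JSON layout, the role names, and `read_field` are hypothetical and do not reflect Vault's policy format.

```python
import json

# Policy lives outside the code; in practice it would come from a config store.
POLICY_JSON = """
{
  "support": {"email": "masked", "ssn": "deny"},
  "billing": {"email": "plain", "ssn": "masked"}
}
"""


def mask(value: str) -> str:
    """Hide everything except the last four characters."""
    return "*" * max(len(value) - 4, 0) + value[-4:]


def read_field(policy: dict, role: str, field: str, value: str) -> str:
    # Anything not explicitly allowed by the policy is denied.
    rule = policy.get(role, {}).get(field, "deny")
    if rule == "plain":
        return value
    if rule == "masked":
        return mask(value)
    raise PermissionError(f"{role} may not read {field}")


policy = json.loads(POLICY_JSON)
billing_view = read_field(policy, "billing", "ssn", "078-05-1120")

# An AppSec change: edit the policy data only, no application release needed.
policy["support"]["ssn"] = "masked"
support_view = read_field(policy, "support", "ssn", "078-05-1120")
```

Because `read_field` never hard-codes who may see what, tightening or relaxing access is a policy edit rather than a code change.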
Conclusion
In this post, we explored the distinctions between a vault and a database and their unique characteristics and capabilities. We listed many differences between the technologies, as well as advantages and disadvantages of databases.
We emphasized how vaults empower developers to safeguard customer data effectively. We explained that vaults go beyond the traditional limitations of databases by offering enhanced security and privacy features, which enable developers to build applications that handle sensitive data with confidence, knowing that it is protected by robust security measures.
We even discussed a few points that go beyond technology and show how a vault can be useful in scaling data security across the organization and in making data protection part of the SDLC.
Building secure and private applications has never been easier.
Senior Product Owner