
Piiano Vault Architecture – 60K RPS, 300 million rows, low latency w/PostgreSQL


Piiano Vault is a data protection service that stores, tokenizes, and encrypts sensitive data. Unlike a traditional database, which is designed to make data accessible, Vault is built to encrypt and lock down data. Users can access and retrieve data only according to strict, granular access controls, with all access logged and audited.

We also designed Vault to be performant and scalable, and we have benchmarked it at 60K requests per second.

In this article, we delve into the architecture behind Piiano Vault and its use of the database plus a level-one cache to resolve cache inconsistency among multiple containers effectively. The architecture follows the principle of separating control and data functionality to support enhanced scalability. The design is particularly effective for high-throughput, high-data-volume, and low-latency applications, offering a flexible and efficient solution.

The requirements

In creating Piiano Vault, we considered these requirements:

  1. Dynamic customer schemas and data that aren’t known in advance, similar to any SQL database.
  2. Data atomicity in any API call.
  3. Low latency for data requests (e.g., a few milliseconds) in operational use cases without compromising high throughput for analytical use cases.
  4. The ability to easily scale up and out.
  5. Maximum uptime and robustness against failures. Even with a failure, the system must self-protect against data corruption.
  6. Fast bulk operations, such as inserting many objects or tokenizing many records in one API call.
  7. Reusing a trusted and proven SQL engine as the underlying data storage layer.
  8. All data is strongly encrypted by default.
  9. Easily deployed anywhere: any cloud, any on-prem environment.

The architecture challenges

Compared to purpose-specific applications, Piiano Vault faces a unique challenge.

Typically, applications use a static database schema that the developer optimizes for their use case. To be flexible and enable dynamic table creation (adding or removing columns based on the user’s requirements, even at runtime), Piiano Vault takes a different approach.

This approach is technically challenging as Vault needs to save and retrieve data while the data schema changes. Achieving this is particularly difficult when there are scale and low latency requirements. While a schema change is underway, we can't have the user experience undefined behavior or corruption. Consistency needs to be restored as soon as possible.

Therefore, our challenge was implementing a system that scales horizontally to easily support tens of thousands of requests per second yet handle data generically and provide for schema changes. And do this all while also field-encrypting every item of data.

Database selection

Vault is not a transparent data security solution or a proxy to an existing database. It is a standalone secure data store that uses PostgreSQL for persistence.

We selected PostgreSQL as the underlying storage engine because:

  1. We didn’t want to create a persistent storage solution (which would have meant more code and more bugs, and we had enough work to do on the upper layers anyway).
  2. It’s free, popular, battle-tested, and has a commercial-friendly open-source license.
  3. All cloud providers support it.
  4. It provides for backups and high availability.
  5. It makes it easier to gain customer trust when used as a building block.

The components of Piiano Vault

At the core of Piiano Vault are two internal components:

  • Control, which manages the collection schemas and the overall configuration of Piiano Vault.
  • Data, which facilitates the creation, updating, and accessing of data stored within Piiano Vault.

This split between control and data planes is a common architecture. It enables Piiano Vault to optimize for two different use cases: data operations that must be low latency and do not require complex transactions, compared with the slower control operations that may involve multiple SQL tables and need a transaction concept to keep the structure of the underlying data consistent.

This split has several operational advantages as it allows for:

  • Optimizing for fast data operations without being constrained by the slower, strongly consistent control transactions.
  • Independent scaling of each service, enabling the data service to be scaled up without the overhead of the control service.
  • Network segregation of the control operations, so they can be limited to a narrower, trusted network.

The Control component handles infrequent operations. It plays a critical role in interfacing with the underlying database to perform Data Definition Language (DDL) operations, enabling dynamic schema modifications. This flexibility allows applications to tailor data collections to meet their requirements.

Both components are accessible using HTTP REST services and can run independently, although they are connected and communicate with each other to synchronize control data.

The Vault architecture provides two entry points for APIs, and the two components interact with each other only through the database.

The only intersection between the state of the Control and Data components is in the data schema and encryption keys (given that keys might be rotated too). This means we need a mechanism to ensure that all instances are always synchronized.

The intersection between the state of the Control and Data components through the data schema.

Data schema implementation

In addition to the split, we implemented a method of persisting the user’s schema using PostgreSQL building blocks.

However, we chose to ditch SQL as the interface for our users. While popular, SQL isn’t friendly to privacy or security applications. Instead, we provide RESTful APIs. This approach means that a customer’s “collection” is mapped to a PostgreSQL table, a user's index on a specific property is based on a PostgreSQL index, and so on. When a data operation occurs, Vault translates from its APIs to the PostgreSQL APIs (aka SQL). This mechanism also means we have more control over the handling of collections and their data.
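As a rough illustration of this translation layer, here is a minimal Go sketch of how a REST-level object insert might be mapped onto a parameterized SQL statement. The function, the collection name, and the quoting are invented for illustration; Vault’s actual implementation is not shown here.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// buildInsert sketches the API-to-SQL translation: a "collection" maps to a
// table and the object's properties map to columns. Identifier quoting here
// is simplified; real code must validate names against the stored schema.
func buildInsert(collection string, object map[string]any) (string, []any) {
	cols := make([]string, 0, len(object))
	for c := range object {
		cols = append(cols, c)
	}
	sort.Strings(cols) // deterministic column order

	holders := make([]string, len(cols))
	args := make([]any, len(cols))
	for i, c := range cols {
		holders[i] = fmt.Sprintf("$%d", i+1)
		args[i] = object[c] // in Vault, values would be field-encrypted first
	}
	query := fmt.Sprintf("INSERT INTO %q (%s) VALUES (%s)",
		collection, strings.Join(cols, ", "), strings.Join(holders, ", "))
	return query, args
}

func main() {
	q, args := buildInsert("customers", map[string]any{
		"email": "jane@example.com", "ssn": "123-45-6789",
	})
	fmt.Println(q, args)
}
```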

The logic behind persisting the user’s schema using PostgreSQL is that:

  1. Updates using the Control component are processed like database transactions, safeguarding atomicity and integrity.
  2. We can optimize performance using the PostgreSQL constructs.
  3. The Data component can rely solely on the database state of the control structures (like the user's schema).
  4. No other communication channels are needed between the Data and Control components, simplifying the implementation (e.g., broadcasting to update cache).
  5. The database structure is self-explanatory and isn't a black box on a self-hosted installation. Understanding the structure makes debugging, maintenance, and DevOps work much easier.
  6. As PostgreSQL develops and gains new features, adopting them in Piiano Vault will be easier.

Data component design

We engineered the Data component for high throughput, with a stateless design and a schema-only in-memory cache that can handle a vast number of requests.

For example, under a properly configured test rig, without batching, Piiano Vault supports:

  • Hundreds of millions of records (200-300 million).
  • Tens of thousands of requests per second.
  • Latency below 10ms for most operations.

See the Benchmarking section for more details.

You can spin up multiple instances of Vault (as a container) on the same physical machine or different machines. The architecture doesn’t assume anything about the number of Vault instances or their placement.

This design of the Data component enables seamless scaling to manage varying traffic loads without the complications of synchronizing the global state between many instances. Using physical tables in PostgreSQL for schema storage, Piiano Vault maximizes query efficiency and data consistency, even under the strain of thousands of queries per second.

As the user’s schema is fixed, data write requests go through the system without any hurdles. A client-side API request to the data plane queries the database once. However, what happens when the schema is changed? There’s no easy way to signal to other instances that a change occurred. This challenge is where things become very interesting.

The Vault architecture showing how it can scale data and control instances separately, relying on the database to coordinate data and cache.

Cache behavior and consistency

Typically, in distributed systems, caching the data requires coordination that complicates the deployment of multiple instances. We wanted to avoid cache invalidation complexities (for a general-purpose vault, we can’t know schema and data use in advance and can’t optimize for them). We also wanted to ensure Piiano Vault was easy to run with a low total cost of ownership (TCO). Therefore, Piiano Vault only caches schema metadata, not the underlying collection data.

Handling consistency within the control layer

To avoid inconsistencies in the control layer, Vault doesn’t cache the schema at the Control component; instead, it uses the Serializable transaction isolation level. This is the highest level of transaction isolation in PostgreSQL and provides the strongest guarantee that updates are atomic and consistent. As control operations are few and far between, the performance penalty associated with this isolation level and lack of caching is negligible. Unfortunately, the same is not true for data operations, which must use a cached schema and a lower isolation level (Repeatable Read).
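For illustration, a control-plane change under this isolation level might look like the following Go sketch. The DDL, the schema_meta table, and the generation bump are assumptions based on the description in this post, not Vault’s actual code.

```go
package vault

import (
	"context"
	"database/sql"
)

// applySchemaChange sketches a control-plane operation: the DDL and the
// generation bump run in one SERIALIZABLE transaction, so every schema
// update is atomic and strictly ordered relative to other control changes.
func applySchemaChange(ctx context.Context, db *sql.DB) error {
	tx, err := db.BeginTx(ctx, &sql.TxOptions{Isolation: sql.LevelSerializable})
	if err != nil {
		return err
	}
	defer tx.Rollback() // no-op once Commit succeeds

	// PostgreSQL DDL is transactional, so the column and the generation
	// bump become visible together or not at all.
	if _, err := tx.ExecContext(ctx,
		`ALTER TABLE customers ADD COLUMN zip_code text`); err != nil {
		return err
	}
	if _, err := tx.ExecContext(ctx,
		`UPDATE schema_meta SET generation = generation + 1`); err != nil {
		return err
	}
	// A serialization failure (SQLSTATE 40001) surfaces here; callers retry.
	return tx.Commit()
}
```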

Handling consistency between control and data

Before talking about consistency, we need to discuss a critical aspect of scaling Piiano Vault: using a metadata cache of the control state to reduce database interactions.

This cache stores collection schema information in memory, enhancing performance by minimizing the database queries required to fetch the collection schema for every operation. The schema carries a “generation number” that is incremented on every schema change. Each Vault instance periodically polls the database (every 30 seconds by default), comparing its cached “generation number” with the latest one. If they differ, the new control state is fetched from the database and cached. If not, the cache keeps the existing metadata, and nothing is fetched from the database. This optimization avoids pulling the full schema every poll interval. However, maintaining this cache further complicates the consistency challenge.
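A minimal sketch of that polling loop might look like this in Go, reusing the assumed schema_meta table from the earlier sketch; the refresh body is a placeholder for illustration.

```go
package vault

import (
	"context"
	"database/sql"
	"sync"
	"time"
)

// schemaCache sketches the metadata cache: each data instance remembers the
// generation it last loaded and refetches the schema only when it changes.
type schemaCache struct {
	mu         sync.RWMutex
	generation int64
	schema     map[string]any // cached collection definitions
}

func (c *schemaCache) pollLoop(ctx context.Context, db *sql.DB) {
	ticker := time.NewTicker(30 * time.Second) // default refresh period
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			var latest int64
			if err := db.QueryRowContext(ctx,
				`SELECT generation FROM schema_meta`).Scan(&latest); err != nil {
				continue // transient error: keep serving the cached schema
			}
			c.mu.RLock()
			stale := latest != c.generation
			c.mu.RUnlock()
			if stale {
				c.refresh(ctx, db, latest) // only now pull the full schema
			}
		}
	}
}

func (c *schemaCache) refresh(ctx context.Context, db *sql.DB, gen int64) {
	// The real system would reload all collection definitions here;
	// this sketch only records the new generation number.
	c.mu.Lock()
	c.generation = gen
	c.mu.Unlock()
}
```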

For additional performance gains, Piiano Vault makes extensive use of PostgreSQL prepared statements. One side effect is that if a collection’s schema changes and the metadata cache becomes outdated, the query fails with an error. Piiano Vault captures this error and automatically refreshes the cache before retrying the request.

This handles most cases of an instance’s data operation working on a stale copy of the schema but is not bulletproof. Ensuring consistency requires more work.
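As a sketch of that capture-and-retry path: the error text below is what PostgreSQL emits when a prepared statement’s underlying table changes, but matching on the message, and the refresh callback, are simplifications for illustration (real code would inspect the SQLSTATE).

```go
package vault

import (
	"context"
	"database/sql"
	"strings"
)

// queryWithRefresh sketches the retry described above: if a prepared
// statement fails because the table behind it changed, refresh the
// metadata cache and retry the request once.
func queryWithRefresh(ctx context.Context, db *sql.DB,
	refresh func(context.Context) error,
	query string, args ...any) (*sql.Rows, error) {

	rows, err := db.QueryContext(ctx, query, args...)
	if err != nil && strings.Contains(err.Error(),
		"cached plan must not change result type") {
		if rerr := refresh(ctx); rerr != nil { // reload the cached schema
			return nil, rerr
		}
		rows, err = db.QueryContext(ctx, query, args...) // retry once
	}
	return rows, err
}
```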

The interaction between the Data service and the metadata cache when the cache is out of date.

The “ideal” solution

Technically, when a schema is changed and data is written or read to or from the database, we need to know if schema versions are outdated. Therefore, we considered wrapping all the queries to the database with this kind of logic:

    if schema1.generationId == $latestGenerationId: insert values (x, y, z) …

Or another example:

    if schema1.generationId == $latestGenerationId: select values (x, y, z) …

When there is no result, it is possible to detect that the operation failed due to a recently updated collection schema and trigger a refresh before retrying the original request. However, we discovered some serious drawbacks with this solution:

  1. It doesn’t play nicely with ORMs. It requires manually crafting all SQL queries, and we have some ORM usage that we wanted to keep.
  2. The “no result” conclusion may require additional PostgreSQL IF statements and functions.
  3. This complicates the SQL builder code significantly.
  4. It slows down the data channel on every request.
  5. It makes debugging more difficult and disrupts the use of standard tools that parse SQL queries, such as APMs and AWS performance insights.

The practical solution

We analyzed the situation and concluded that, to meet the business requirements and keep the product manageable, we should consider metadata changes as “eventually consistent.”

To support this decision, we analyzed all the scenarios and options that modify the schema and concluded that while errors may be received for conflicting requests, no corruption occurs. At a high level, the logic is supported by only allowing non-conflicting changes on a property. 

For corruption to occur, a property must be interpreted differently by two instances: one with the new schema and one with the old schema. Consider this case, one that can’t happen in Vault: instance A sees property X as an INTEGER, but instance B has changed it to BOOLEAN. If instance A now adds data, it may corrupt the stored values.

Vault doesn’t allow such property data type modifications, so the preceding case cannot happen. In general, only backward-compatible changes are allowed. You can add and remove indices or move from the more restrictive UNIQUE constraint to a non-UNIQUE constraint, but not vice versa. All changes to properties follow the design criterion of not breaking compatibility with the current schema.
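A hedged sketch of such a compatibility check might look like this; the property model and rules below are our reading of the constraints just described, not Vault’s actual code.

```go
package vault

// property is an illustrative model of a collection property.
type property struct {
	Type   string // e.g., "INTEGER", "BOOLEAN"
	Unique bool
}

// isBackwardCompatible encodes the rule described above: a property's type
// may never change, and a UNIQUE constraint may be relaxed but never added.
func isBackwardCompatible(old, proposed property) bool {
	if old.Type != proposed.Type {
		return false // stale instances would misinterpret the stored values
	}
	if !old.Unique && proposed.Unique {
		return false // tightening a constraint can invalidate in-flight writes
	}
	return true // no change, or relaxing UNIQUE to non-UNIQUE, is safe
}
```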

We found and implemented mitigations for three other potential causes of data corruption due to an out-of-date cache:

  1. Missing resource

A table or a property is dropped and an instance with an out-of-date schema tries to add data to it. In this case, the add operation fails, and the client receives an error message.

  2. Partial missing resource

A property is added to a table and an instance with an out-of-date schema adds an object without that new property. In this case, if the new property:

  • is nullable, the object property is added as null.
  • is not nullable, the add operation fails and the client receives an error.

  3. Replacing a resource

A resource (such as a property) is deleted and then added again before all other instances have synced, e.g., within the (30-second) refresh period. As Vault API calls use resource names, similar to SQL (select <property name> from <table name>), this scenario has the potential to lead to corruption. To prevent this issue, Vault uses a central list of instances and their generation numbers. Before adding any resource, Vault examines that list and rejects the change when the generation numbers of all instances are not identical. So, if all instances are in sync, you can delete a property. Then, if another control operation attempts to add a property of the same name while the instances aren’t yet in sync, the operation fails with a 409 (conflict) response, averting the issue. The CLI will provide convenience flags that delay the control operation until it is safe to perform, so the user won’t need to handle the error cases (coming soon). A minimal sketch of this guard follows.
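In this sketch, the instances table and the handler shape are assumptions for illustration, not Vault’s actual schema or API surface.

```go
package vault

import (
	"context"
	"database/sql"
	"net/http"
)

// allInstancesInSync reports whether every registered Vault instance has
// caught up to the same schema generation.
func allInstancesInSync(ctx context.Context, db *sql.DB) (bool, error) {
	var distinct int
	err := db.QueryRowContext(ctx,
		`SELECT COUNT(DISTINCT generation) FROM instances`).Scan(&distinct)
	return distinct <= 1, err
}

// rejectIfSyncing shows how a control handler might use the check: a
// conflicting re-add is refused with 409 until all instances are in sync.
func rejectIfSyncing(ctx context.Context, db *sql.DB, w http.ResponseWriter) bool {
	ok, err := allInstancesInSync(ctx, db)
	if err != nil {
		http.Error(w, "cannot verify instance sync", http.StatusInternalServerError)
		return true
	}
	if !ok {
		http.Error(w, "schema change in progress", http.StatusConflict)
		return true
	}
	return false
}
```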

Summary

We’ve engineered a solution that balances the performance of data operations, with their eventual consistency, against the control operations’ strict requirement for strong consistency.

Atomicity is always maintained: every data operation happens in a transaction. If you send conflicting requests simultaneously, the result is eventually consistent and converges on one of the two requests.

Stateless Mode

Piiano Vault also offers a stateless mode for users focused solely on encryption and decryption services. This mode operates without a centralized database, storing its configuration in memory. The absence of a database gives stateless mode exceptional scalability and performance. Even without a database, Piiano Vault still maintains features such as expiration, granular access to properties, and more. However, it still needs to connect to a KMS to create a key-encryption-key (KEK) and data-encryption-key (DEK).
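For context, here is a generic envelope-encryption sketch of the KEK/DEK pattern in standard-library Go. This is not Vault’s implementation; the KMS wrapping step is only indicated in a comment.

```go
package stateless

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
)

// encryptWithDEK demonstrates the KEK/DEK pattern: a locally generated
// data-encryption-key encrypts the payload with AES-GCM, and the DEK itself
// would then be wrapped by the key-encryption-key held in the KMS.
func encryptWithDEK(plaintext []byte) (ciphertext, dek []byte, err error) {
	dek = make([]byte, 32) // 256-bit data-encryption-key
	if _, err = rand.Read(dek); err != nil {
		return nil, nil, err
	}
	block, err := aes.NewCipher(dek)
	if err != nil {
		return nil, nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err = rand.Read(nonce); err != nil {
		return nil, nil, err
	}
	// Prepend the nonce to the ciphertext; store the KEK-wrapped DEK
	// alongside it so the data can later be unwrapped and decrypted.
	ciphertext = gcm.Seal(nonce, nonce, plaintext, nil)
	return ciphertext, dek, nil
}
```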

Benchmarking

Throughout the development of Piiano Vault, we performed regular benchmarking to confirm that we were achieving the performance we knew the product needed. Here, we present the results of a benchmark using Vault 1.8.1, the latest version as of writing this post.

We ran the benchmark using the following infrastructure:

  • Platform: AWS, within one VPC in us-east
  • DB: Aurora PostgreSQL 15.2
    Specification: db.r7g.8xlarge - 32 vCPU, 256 GiB RAM
  • Vault server: Vault 1.8.1
    Specification: One EC2 instance, c7g.16xlarge - 64 vCPU, 128 GiB RAM

The test was run on a powerful machine to get to high speeds, using a collection with four properties: SSN, Email, Phone Number, and Zip Code (in addition to the built-in properties). The collection contained 320M objects.

We then ran two tests:

  • Test 1: Queried a random object at approximately 60K requests per second (RPS).
  • Test 2: Across the 320M objects, random object queries were performed at 45K RPS, random object tokenization was performed at 5K RPS, and objects were added at 5K RPS.
    Note: This is approximately 20% "write" operations, which is relatively high compared to normal operations.

Results

Test #1: Average query latency was 3.34 ms, with P95 at 8.29 ms and P99 at 13.15 ms.

Results of Test #1 showing the RPS and request latency across the test period.

Test #2:

  1. Query: 2.83ms average
  2. Tokenize: 14.24ms average
  3. Add Object: 16.71ms average
    (Note: Both tokenization and adding objects are write operations, so they are disk IO-bound.)
Results of Test #2 showing the RPS and request latency across the test period.

Combined CPU utilization for Vault and database hovered around 70% during the tests.

Conclusion

With its distinct separation of control and data planes and control state cache, Piiano Vault's architecture offers a massively scalable and flexible solution for managing data in high-volume settings while avoiding data corruption or undefined behavior issues. Piiano Vault ensures robust performance and scalability while offering dynamic schema management, a stateless data handling model, and an efficient metadata caching mechanism.

Piiano Vault was tested at tens of thousands of requests per second, but its design allows it to handle hundreds of thousands of requests per second.

Whether you need to manage complex data schemas or require high-throughput data access, Piiano Vault's architecture efficiently meets these needs, making it an ideal choice for modern data management challenges.
