Following our previous post on column-level encryption, this post explores several implementation approaches and discusses their advantages and disadvantages.
To start, we introduce a simple example. We then look at how to implement manual encryption, add automation and encapsulation with a property, use an encryption library, and explore alternatives such as using a proxy and database-level encryption. Next, we review the security implications of these approaches and, finally, show how the Piiano Django ORM integration offers a simple and robust solution.
Example application
To aid this discussion, let's create a simple Django application and a trivial model for it: Person, with two fields, name and SSN (Social Security number, the national ID number in the USA). Let's say we want to encrypt the SSN field. Here is a very naive and simplified implementation without encryption.
models.py:
views.py:
Note: A proper Django implementation is likely to use forms and class-based or generic model views. For brevity, we avoid these for now.
Approach 1 – Manual encryption and decryption
For this approach, we use Python’s cryptography.fernet as the encryption library.
This approach is the most straightforward one to implement but the hardest to maintain. We include it here for completeness, as it’s usually not a practical approach.
Whenever Django views access the ssn field, they must encrypt the value before storing it and decrypt it after retrieving it.
We also need to decide how to acquire and use the encryption key. One approach is to store the key in an environment variable and read it when the process loads. A better approach is to use a secrets manager, such as AWS Secrets Manager, and read the key from there, either on load or every time it's used. In the latter case, a cache reduces the number of calls to the secrets manager.
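A sketch of the caching idea, assuming a generic fetch function. CachedSecret and fetch_key_from_env are hypothetical names, and the environment variable stands in for a real secrets manager call.

```python
import os
import time
from typing import Callable, Optional


class CachedSecret:
    """Caches a fetched secret for ttl_seconds to limit calls to the secrets manager."""

    def __init__(self, fetch: Callable[[], str], ttl_seconds: float = 300.0):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._value: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = time.monotonic()
        # Fetch on first use and again whenever the cached copy has expired.
        if self._value is None or now >= self._expires_at:
            self._value = self._fetch()
            self._expires_at = now + self._ttl
        return self._value


def fetch_key_from_env() -> str:
    # Stand-in for a secrets manager call (SSN_ENCRYPTION_KEY is hypothetical).
    # With AWS Secrets Manager, this could instead return
    # boto3.client("secretsmanager").get_secret_value(SecretId=...)["SecretString"].
    return os.environ["SSN_ENCRYPTION_KEY"]


ssn_key = CachedSecret(fetch_key_from_env, ttl_seconds=300)
```

The time-to-live is a trade-off: a longer one means fewer calls, while a shorter one picks up a rotated key sooner.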
However, there is more to key management. If we require key rotation, and we usually would, we should use MultiFernet instead of Fernet. MultiFernet encrypts with the first key in its list and tries each key in turn when decrypting, so we must also provide the previous keys and re-encrypt (rotate) stored values under the new key as necessary.
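The following sketch shows MultiFernet's rotation behavior with the cryptography library. The keys are generated in-process purely for the demonstration; in practice, they would come from the secrets manager.

```python
from cryptography.fernet import Fernet, MultiFernet

# Demo keys generated in-process; real keys would come from the secrets manager.
old_key = Fernet(Fernet.generate_key())
new_key = Fernet(Fernet.generate_key())

# A value stored before the rotation, encrypted under the old key.
legacy_token = old_key.encrypt(b"123-45-6789")

# After rotation, the new key goes first: MultiFernet encrypts with the first
# key and tries each key in order when decrypting.
keyring = MultiFernet([new_key, old_key])
print(keyring.decrypt(legacy_token))  # b'123-45-6789'

# rotate() decrypts with whichever key works and re-encrypts under the first key.
rotated_token = keyring.rotate(legacy_token)
print(new_key.decrypt(rotated_token))  # b'123-45-6789'
```

Once every stored value has been rotated, the old key can be removed from the list.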
The main disadvantage of this approach is that every access to the ssn column must account for encryption and, if relevant, key rotation. This leaves more room for errors and produces a lot of code that violates the don't-repeat-yourself (DRY) principle.
This example assumes that the encryption key is read from an environment variable and stored in the Django settings file.
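A framework-free sketch of the manual pattern: a dictionary stands in for the Person table, and two hypothetical handlers, create_person and get_person, stand in for the Django views. SSN_ENCRYPTION_KEY is an assumed variable name; a throwaway key is generated when it's unset, only so the sketch runs.

```python
import os

from cryptography.fernet import Fernet

# Key read from an environment variable (hypothetical name); the generated
# fallback exists only so this sketch runs on its own.
SSN_ENCRYPTION_KEY = os.environ.get("SSN_ENCRYPTION_KEY") or Fernet.generate_key()
fernet = Fernet(SSN_ENCRYPTION_KEY)

# Stand-in for the Person table: id -> row
people: dict = {}


def create_person(person_id: int, name: str, ssn: str) -> None:
    # Every write path must remember to encrypt...
    people[person_id] = {"name": name, "ssn": fernet.encrypt(ssn.encode())}


def get_person(person_id: int) -> dict:
    row = people[person_id]
    # ...and every read path must remember to decrypt.
    return {"name": row["name"], "ssn": fernet.decrypt(row["ssn"]).decode()}
```

Each new handler that touches ssn has to repeat the same encrypt or decrypt call.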
Approach 2 – Automation and encapsulation with a property
To improve on the previous approach, we create a property called ssn, backed by the functions get_ssn and set_ssn, that reads from and writes to the encrypted_ssn column.
To avoid decrypting the same data repeatedly, we can also store an additional member called decrypted_ssn as a cache for the ssn column (not shown in the example code).
This approach improves significantly over the previous one. The code accessing the ssn column doesn’t need to consider encryption, making it transparent. We can encapsulate key rotation inside the Person class or, even better, inside the encrypt() and decrypt() functions. That encapsulation means that the code in views.py is identical to the original without encryption and doesn’t need to change.
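A minimal sketch of the property approach, written as a plain Python class so it runs without a Django project. On a Django model, encrypted_ssn would be a database field (for example, a BinaryField) and the property could be declared the same way; that mapping and the module-level demo key are assumptions.

```python
from cryptography.fernet import Fernet

_fernet = Fernet(Fernet.generate_key())  # demo key; real code would load it securely


def encrypt(value: str) -> bytes:
    return _fernet.encrypt(value.encode())


def decrypt(token: bytes) -> str:
    return _fernet.decrypt(token).decode()


class Person:
    def __init__(self, name: str, ssn: str):
        self.name = name
        self.ssn = ssn  # goes through set_ssn, so only ciphertext is stored

    def get_ssn(self) -> str:
        return decrypt(self.encrypted_ssn)

    def set_ssn(self, value: str) -> None:
        self.encrypted_ssn = encrypt(value)

    ssn = property(get_ssn, set_ssn)
```

Callers keep reading and assigning person.ssn as before; only encrypted_ssn holds ciphertext.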
However, this approach is far from ideal. Encrypting another field repeats boilerplate code, and we also miss some features such as batching or passing additional parameters to decrypt() (e.g. to only get a masked version of the SSN). Supporting that would require a context manager and storing the desired transformation in a context variable. Here is an example use of such a context manager:
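A sketch of such a context manager, using Python's contextvars module. The names ssn_view and decrypt_ssn, the "masked" mode, and the masking format are hypothetical; the point is that callers select the transformation without passing extra arguments through the property.

```python
import contextvars
from contextlib import contextmanager

from cryptography.fernet import Fernet

_fernet = Fernet(Fernet.generate_key())  # demo key only

# Which transformation decrypt_ssn() applies; "plain" unless a caller asks otherwise.
_ssn_view = contextvars.ContextVar("ssn_view", default="plain")


@contextmanager
def ssn_view(mode: str):
    """Temporarily select how SSNs are returned, e.g. "masked"."""
    reset_token = _ssn_view.set(mode)
    try:
        yield
    finally:
        _ssn_view.reset(reset_token)


def decrypt_ssn(ciphertext: bytes) -> str:
    ssn = _fernet.decrypt(ciphertext).decode()
    if _ssn_view.get() == "masked":
        return "***-**-" + ssn[-4:]
    return ssn


class Person:
    def __init__(self, name: str, ssn: str):
        self.name = name
        self.encrypted_ssn = _fernet.encrypt(ssn.encode())

    @property
    def ssn(self) -> str:
        return decrypt_ssn(self.encrypted_ssn)


person = Person("Alice", "123-45-6789")
with ssn_view("masked"):
    print(person.ssn)  # ***-**-6789
print(person.ssn)  # 123-45-6789
```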
Approach 3 – Using an encryption library
To avoid code duplication, declaring a field as encrypted should be enough for the work we did previously to apply to it automatically.
There are libraries that do this for many languages and platforms. For Django, one example is django-encrypted-model-fields.
The advantages of using the library are that there’s little we need to implement apart from specifying which fields to encrypt. This approach also delivers a high level of automation and encapsulation.
Depending on the library, it may fetch the keys itself; otherwise, we must add code to fetch them from the secrets manager.
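To show how a library removes the per-field boilerplate, here is a framework-free sketch using a Python descriptor. EncryptedAttribute is a hypothetical class; it does not reproduce the API of django-encrypted-model-fields or any other library, which integrate with Django's model field system instead.

```python
from cryptography.fernet import Fernet

_fernet = Fernet(Fernet.generate_key())  # demo key; a library would load it from settings


class EncryptedAttribute:
    """Stores ciphertext under a private attribute and decrypts on read."""

    def __set_name__(self, owner, name):
        self.storage_name = f"_encrypted_{name}"

    def __get__(self, instance, owner=None):
        if instance is None:
            return self  # accessed on the class itself
        return _fernet.decrypt(getattr(instance, self.storage_name)).decode()

    def __set__(self, instance, value: str):
        setattr(instance, self.storage_name, _fernet.encrypt(value.encode()))


class Person:
    # Each encrypted field is now a one-line declaration.
    ssn = EncryptedAttribute()
    phone = EncryptedAttribute()

    def __init__(self, name: str, ssn: str, phone: str):
        self.name = name
        self.ssn = ssn
        self.phone = phone
```

The encryption logic lives in one place, so adding another encrypted field does not repeat it.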
Approach 4 – Using a proxy
Moving away from code changes, a network proxy is an alternative approach to field-level encryption. Notable examples include Evervault, Satori, and Fortanix. Some proxies, such as Evervault, are deployed between the browser and the backend of an application, guaranteeing that the backend only sees an encrypted version of the data. Other proxies, such as Satori, are deployed between the backend and database, making all encryption transparent to the backend.
There are advantages to using a proxy, especially in centralizing encryption and simplifying the application. There are also disadvantages: routing application traffic through a proxy can add latency, introduce a single point of failure, and create a scalability bottleneck. In addition, a proxy is normally maintained by another team rather than the app developers, and that team may not stay in sync with the engineering effort or know what's going on.
Also, depending on a proxy makes local testing and testing in the CI environment harder, since it's another component to set up and test. If it's left out of testing, the test environment diverges further from production.
Approach 5 – Using a database plug-in
We can use cryptographic functions in a database using, for example, PostgreSQL's pgcrypto module. A plug-in like this can help implement field-level encryption.
To use pgcrypto, we first need to install the extension in PostgreSQL. This might be as straightforward as running a single SQL command to create the extension (CREATE EXTENSION pgcrypto;). However, it may be more complex when using a managed service such as Amazon RDS or when setting up the database as part of a CI/CD workflow.
Once installed, we could use pgcrypto for field-level encryption in SQL queries, although that would be cumbersome, especially when using an ORM. Continuing our example, we use a library, such as django-pgcrypto-fields, to encrypt and decrypt values as needed. This approach is very similar to using a library. However, instead of the backend doing the encryption, the database does the work, and the library takes care of the differences in SQL queries.
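For comparison, here is roughly what calling pgcrypto directly from application code looks like. pgp_sym_encrypt and pgp_sym_decrypt are real pgcrypto functions; the person table (with ssn as a bytea column) and the helper names are hypothetical. The helpers accept any DB-API cursor that uses %s placeholders, such as a psycopg cursor.

```python
# Hypothetical table: person(id serial, name text, ssn bytea)
INSERT_SQL = (
    "INSERT INTO person (name, ssn) "
    "VALUES (%s, pgp_sym_encrypt(%s, %s))"
)
SELECT_SQL = (
    "SELECT name, pgp_sym_decrypt(ssn, %s) "
    "FROM person WHERE id = %s"
)


def insert_person(cursor, name: str, ssn: str, key: str) -> None:
    # Encryption happens inside PostgreSQL; the key is sent as a query parameter.
    cursor.execute(INSERT_SQL, (name, ssn, key))


def fetch_person(cursor, person_id: int, key: str):
    # Every query that reads ssn must remember to wrap it in pgp_sym_decrypt.
    cursor.execute(SELECT_SQL, (key, person_id))
    return cursor.fetchone()
```

Every query that touches ssn needs this treatment, which is why a library that generates the SQL is the more practical option with an ORM.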
Code using django-pgcrypto-fields is very similar to the previous approach of using a library to encrypt fields in the backend. If we write the SQL queries ourselves instead, supporting field-level encryption and key management requires significant changes to the application code. In either case, we must also make infrastructure changes to ensure that the database has pgcrypto installed and enabled.
Finally, the additional security gained from this approach is limited, as it doesn't protect against someone gaining access to the database: the keys and plaintext values pass through the database server with every query.
Security implications
So far, we’ve not discussed the challenges of working with cryptography libraries in applications. Aside from complicating the code, there are some issues to consider:
- Key Access – The application needs to access a key management system (KMS) to fetch the right key.
- Key Distribution – If an application is broken down into microservices, each service needs access to the key, and the key must be fetched securely.
- Key Rotation – Rotation is required to meet today's security standards, and it complicates code, backups, anti-tampering, and queries.
- Key Compromise – Each microservice can be compromised and leak the encryption key.
- Searchability – When we encrypt data safely (without leaking information through indexing), there's no good way to search it.
- App-Level Attacks – The application's SQL code can be susceptible to SQL injection and insecure direct object reference (IDOR) attacks. See the post OWASP Top 10 Vulnerabilities – A Guide for Pen-Testers & Bug Bounty Hunters for more information.
- Logs – Databases in production rarely record access to the data. If there's a breach, we may not know what happened.
The better approach – Using Piiano Vault’s Django ORM Integration
Piiano recently released a Django ORM integration (along with similar integrations for Hibernate in Java and TypeORM in TypeScript). The integration encrypts and decrypts values transparently, similar to using a library. In fact, the resulting code is almost identical to the unencrypted example at the start of this post.
With Piiano Vault, you don’t need to worry about key management, your app being compromised, or adding code complexity to deal with the encryption intricacies. You use the ORM and annotate your fields.
Piiano Vault, an infrastructure for protecting sensitive customer data, is built to make your life easy. It is designed to mitigate the security implications we've discussed and keeps encryption logic out of your code, so the code stays readable and focused on data operations.
The biggest advantage of using the Piiano integration is that, unlike pgcrypto for example, it gives your organization a centralized way to control sensitive data access. It also records all data access to logs, so if you ever want to do forensics, you have all the information needed.
There are several advantages to using Vault’s ORM integration:
- It requires very few code changes, making it almost platform independent.
- It supports several encryption types, such as deterministic encryption, where identical values always yield the same ciphertext, and randomized encryption, where two identical values can yield different ciphertexts.
- It enables you to specify the semantic data type for object properties, for example, SSN, address, or phone number, rather than just specifying a property as a string or needing to comply with a required format. Once specified, the data type unlocks more powerful features.
- It can mask values based on the data type and use. For example, it can mask SSNs in some cases and not others.
- You can search encrypted data without breaking the security of the encryption (for deterministic encryption and only for exact matches).
- It enables you to manage permissions at a broad range of granularity, from very detailed to highly generalized control levels:
  - You can set a data type to always be protected. For example, when an SSN is returned as part of an analytics query, it can always be masked.
  - You can take context into account, such as the reason for the query: is it for analytics, marketing, or app functionality? For example, you can set up your system so that people's addresses are masked unless strictly necessary, while business addresses are never masked.
- It matches your application's usage patterns: a batch of values can be decrypted with a single API call instead of one API call per row.
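To illustrate the difference between randomized and deterministic protection in general terms (this is not how Vault implements it), the sketch below uses Fernet, which is randomized, and an HMAC-SHA256 token as a simple deterministic stand-in that supports exact-match lookups. The index key, the search_token helper, and the in-memory rows are assumptions for the demonstration.

```python
import hashlib
import hmac

from cryptography.fernet import Fernet

fernet = Fernet(Fernet.generate_key())  # randomized encryption (demo key)
index_key = b"demo-index-key"  # hypothetical; keep it separate from the encryption key


def search_token(ssn: str) -> str:
    # Deterministic: the same input and key always produce the same token.
    return hmac.new(index_key, ssn.encode(), hashlib.sha256).hexdigest()


# Randomized: encrypting the same SSN twice yields different ciphertexts,
# so comparing stored ciphertexts cannot find a match.
token_1 = fernet.encrypt(b"123-45-6789")
token_2 = fernet.encrypt(b"123-45-6789")
print(token_1 == token_2)  # False

# Stand-in rows of (ciphertext, search token); exact-match lookup uses the token.
rows = [(fernet.encrypt(s.encode()), search_token(s)) for s in ["123-45-6789", "987-65-4321"]]
wanted = search_token("987-65-4321")
matches = [fernet.decrypt(c).decode() for c, t in rows if t == wanted]
print(matches)  # ['987-65-4321']
```

The deterministic token reveals when two stored values are equal, which is the price paid for exact-match search; only an exact match works, not a prefix or range query.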
Finally, remember that achieving privacy and security requires careful analysis and planning. Our hope is that with this post, you are better prepared to implement both in your project.