Imri Goldberg
August 1, 2023
Following our previous post on column-level encryption, this post explores several implementation approaches and discusses their advantages and disadvantages.
To start, we introduce a simple example. We then look at how to implement manual encryption, add automation and encapsulation with a property, use an encryption library, and explore alternatives such as using a proxy and database-level encryption. We then review the security implications of these approaches before, finally, showing how the Piiano Django ORM integration offers a simple and robust solution.
To aid this discussion, let’s create a simple Django application and a trivial model for it: Person, with two fields: name and ssn. Let’s say we want to encrypt the ssn field. Here is a very naive and simplified implementation without encryption.
models.py:
views.py:
Note: A proper Django implementation is likely to use forms and class-based or generic model views. For brevity, we avoid these for now.
For this approach, we use Fernet, from the Python cryptography package (cryptography.fernet), as the encryption library.
This approach is the most straightforward one to implement but the hardest to maintain. We include it here for completeness, as it’s usually not a practical approach.
Whenever Django views access the ssn field, they encrypt before storing and decrypt after retrieving.
We also need to consider how the encryption key is acquired and used. One approach is to store the key in an environment variable and read it when the process loads. A better approach is to keep the key in a secrets manager, such as AWS Secrets Manager, and read it from there. The key can be read from the secrets manager on load or every time it's used. Preferably, use a cache to reduce the number of calls to the secrets manager.
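As a rough illustration of the caching idea, the sketch below wraps a key-fetching function in a small time-based cache. The class name CachedKey, the five-minute default, and the fetch callable are assumptions made for this example; the actual call to the secrets manager is left to whatever client your platform provides.

```python
import time
from typing import Callable, Optional


class CachedKey:
    """Keeps a fetched key in memory and refetches it once ttl_seconds have passed."""

    def __init__(
        self,
        fetch: Callable[[], bytes],
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch          # hypothetical: calls the secrets manager
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so the cache can be tested
        self._value: Optional[bytes] = None
        self._expires_at = 0.0

    def get(self) -> bytes:
        now = self._clock()
        if self._value is None or now >= self._expires_at:
            # Cache miss or expired entry: go back to the secrets manager.
            self._value = self._fetch()
            self._expires_at = now + self._ttl
        return self._value
```

A short time-to-live keeps the number of calls low while still picking up a rotated key within a bounded delay.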
However, there is more to key management. If we require key rotation, and we usually would, we should use MultiFernet to encrypt the field. MultiFernet encrypts with the first key in its list and tries each key in turn when decrypting, so we must keep providing the previous keys until the stored values have been rotated to the new key.
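The following sketch shows how MultiFernet handles a rotation. The keys are generated on the spot purely for illustration; in practice they would come from wherever the keys are stored.

```python
from cryptography.fernet import Fernet, MultiFernet

old_key = Fernet.generate_key()
new_key = Fernet.generate_key()

# Data written before the rotation was encrypted with the old key only.
old_token = Fernet(old_key).encrypt(b"123-45-6789")

# MultiFernet encrypts with the first key and tries every key when
# decrypting, so the newest key goes first and retired keys follow.
keyring = MultiFernet([Fernet(new_key), Fernet(old_key)])
plaintext = keyring.decrypt(old_token)

# rotate() re-encrypts an existing token under the first (newest) key.
rotated_token = keyring.rotate(old_token)
```

Once every stored value has been rotated, the old key can be dropped from the list.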
The main disadvantage of this approach is that every access to the ssn column needs to account for encryption and, if relevant, key rotation. This leaves more room for errors and produces repetitive code that violates the don't-repeat-yourself (DRY) principle.
This example assumes that the encryption key is read from an environment variable and stored in the Django settings file.
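A minimal sketch of the manual approach might look like the following. The helper names encrypt_ssn and decrypt_ssn and the SSN_ENCRYPTION_KEY variable are hypothetical, and for brevity the key is read straight from the environment rather than through Django settings. A throwaway key is generated if the variable is unset, only so that the snippet runs on its own.

```python
import os

from cryptography.fernet import Fernet

# Assumption: the key is provided in an environment variable. The fallback to
# a freshly generated key exists only for this sketch; real code should fail
# instead, because data encrypted with a throwaway key cannot be recovered.
SSN_ENCRYPTION_KEY = os.environ.get("SSN_ENCRYPTION_KEY") or Fernet.generate_key()


def encrypt_ssn(ssn: str) -> str:
    """Encrypt a plaintext SSN into a text token that fits a text column."""
    return Fernet(SSN_ENCRYPTION_KEY).encrypt(ssn.encode("utf-8")).decode("ascii")


def decrypt_ssn(token: str) -> str:
    """Decrypt a token produced by encrypt_ssn back into the plaintext SSN."""
    return Fernet(SSN_ENCRYPTION_KEY).decrypt(token.encode("ascii")).decode("utf-8")


# Every view that touches the column has to remember to call the helpers:
#   person.ssn = encrypt_ssn(submitted_ssn)   # before saving
#   plaintext = decrypt_ssn(person.ssn)       # after loading
```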
To improve on the previous approach, we create a property called ssn, with the functions set_ssn and get_ssn, that stores and reads data to and from the encrypted_ssn column.
To avoid decrypting the same data repeatedly, we can also store an additional member, decrypted_ssn, as a cache for the ssn column (not shown in the example code).
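The property can be sketched as follows. To keep the snippet runnable on its own, Person is a plain Python class standing in for the Django model, encrypted_ssn is an ordinary attribute standing in for the stored column, and the key is a throwaway one; all of these are simplifications for illustration.

```python
from cryptography.fernet import Fernet

# Throwaway key for the sketch; a real application would load a persistent key.
_fernet = Fernet(Fernet.generate_key())


def encrypt(value: str) -> str:
    return _fernet.encrypt(value.encode("utf-8")).decode("ascii")


def decrypt(token: str) -> str:
    return _fernet.decrypt(token.encode("ascii")).decode("utf-8")


class Person:
    """Plain-Python stand-in for the model; encrypted_ssn mirrors the column."""

    def __init__(self, name: str, ssn: str) -> None:
        self.name = name
        self.encrypted_ssn = ""
        self.ssn = ssn  # assignment goes through set_ssn

    def get_ssn(self) -> str:
        return decrypt(self.encrypted_ssn)

    def set_ssn(self, value: str) -> None:
        self.encrypted_ssn = encrypt(value)

    # Callers read and write person.ssn and never see the ciphertext.
    ssn = property(get_ssn, set_ssn)
```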
This approach improves significantly over the previous one. The code accessing the ssn column doesn’t need to consider encryption, making it transparent. We can encapsulate key rotation inside the Person class or, even better, inside the encrypt() and decrypt() functions. That encapsulation means that the code in views.py is identical to the original without encryption and doesn’t need to change.
However, this approach is far from ideal. Encrypting another field repeats boilerplate code, and we also miss some features such as batching or passing additional parameters to decrypt() (e.g. to only get a masked version of the SSN). Supporting that would require a context manager and storing the desired transformation in a context variable. Here is an example use of such a context manager:
To prevent code duplication, when specifying a property as encrypted, the work we did previously should apply automatically to the new encrypted field.
Libraries that achieve this exist for many languages and platforms. For Django, one example is django-encrypted-model-fields.
The advantage of using a library is that there's little we need to implement apart from specifying which fields to encrypt. This approach also delivers a high level of automation and encapsulation.
Some libraries can fetch the keys themselves. Otherwise, we must add code to fetch the keys from the secrets manager.
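To illustrate what such a library automates, here is a plain-Python sketch of a reusable encrypted field written as a descriptor. It is not the API of django-encrypted-model-fields or of any other library; the EncryptedField class, the encrypted_ prefix, the passport_number field, and the throwaway key are all assumptions made for the example.

```python
from cryptography.fernet import Fernet

# Throwaway key for the sketch; a library would load it from configuration.
_fernet = Fernet(Fernet.generate_key())


class EncryptedField:
    """Stores ciphertext in `encrypted_<name>` and exposes plaintext as `<name>`."""

    def __set_name__(self, owner, name):
        # Called once per field when the owning class is created.
        self._storage = "encrypted_" + name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        token = getattr(instance, self._storage)
        return _fernet.decrypt(token.encode("ascii")).decode("utf-8")

    def __set__(self, instance, value):
        token = _fernet.encrypt(value.encode("utf-8")).decode("ascii")
        setattr(instance, self._storage, token)


class Person:
    # Declaring a field as encrypted is all that each new field requires.
    ssn = EncryptedField()
    passport_number = EncryptedField()  # hypothetical second sensitive field

    def __init__(self, name, ssn, passport_number):
        self.name = name
        self.ssn = ssn
        self.passport_number = passport_number
```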
Moving away from code changes, a network proxy is an alternative approach to field-level encryption. Notable examples include Evervault, Satori, and Fortanix. Some proxies, such as Evervault, are deployed between the browser and the backend of an application, guaranteeing that the backend only sees an encrypted version of the data. Other proxies, such as Satori, are deployed between the backend and database, making all encryption transparent to the backend.
There are advantages to using a proxy, especially in centralizing encryption and simplifying the app. There are also disadvantages: routing application traffic through a proxy can add latency, a single point of failure, and a scalability bottleneck. A proxy also normally means that another team (not the app developers) has to maintain it, and that team may not always be in sync with the engineering effort.
A proxy dependency also makes local testing and testing in the CI environment harder, as it's another component to set up and test. If the proxy is left out of testing, that introduces another difference between production and testing.
We can use cryptographic functions in the database itself using, for example, PostgreSQL's pgcrypto module. An extension like this can help implement field-level encryption.
To use pgcrypto, we first need to install the extension into PostgreSQL. This might be as straightforward as running a SQL command to create the extension. However, it may be more complex when using a managed service such as Amazon RDS or when setting up the database as part of a CI/CD workflow.
Once installed, we could use pgcrypto for field-level encryption in SQL queries, although that would be cumbersome, especially when using an ORM. Continuing our example, we use a library, such as django-pgcrypto-fields, to encrypt and decrypt values as needed. This approach is very similar to using a library. However, instead of the backend doing the encryption, the database does the work, and the library takes care of the differences in SQL queries.
The resulting code is very similar to the previous approach of using a library to encrypt fields in the backend. If we implement this with raw SQL queries instead, the application code needs significant changes to support field-level encryption and key management. In either case, we must also make the infrastructure changes that ensure the database has pgcrypto installed and enabled.
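To make the raw-SQL variant concrete, the sketch below builds parameterized queries that call pgcrypto's pgp_sym_encrypt and pgp_sym_decrypt functions. The person table, its bytea ssn column, the helper names, and the %s placeholder style (as used by psycopg) are assumptions for illustration; the helpers only build the queries and do not connect to a database.

```python
# One-time setup, run by a role that is allowed to create extensions.
ENABLE_PGCRYPTO = "CREATE EXTENSION IF NOT EXISTS pgcrypto;"

# Assumes a hypothetical table: person(name text, ssn bytea).
INSERT_PERSON = "INSERT INTO person (name, ssn) VALUES (%s, pgp_sym_encrypt(%s, %s))"
SELECT_SSN = "SELECT pgp_sym_decrypt(ssn, %s) FROM person WHERE name = %s"


def insert_person_query(name: str, ssn: str, key: str):
    """Return (sql, params) for storing a person with an encrypted SSN."""
    return INSERT_PERSON, (name, ssn, key)


def select_ssn_query(name: str, key: str):
    """Return (sql, params) for reading a person's decrypted SSN."""
    return SELECT_SSN, (key, name)


# With a DB-API cursor this would be used as:
#   cursor.execute(*insert_person_query("Alice", "123-45-6789", key))
#   cursor.execute(*select_ssn_query("Alice", key))
```

Note that the key travels to the database with every query, and every query touching the column has to be written this way.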
Finally, the additional security gained from this approach is limited, as it doesn’t protect against someone gaining access to the database.
So far, we’ve not discussed the challenges of working with cryptography libraries in applications. Aside from complicating the code, there are some issues to consider:
Piiano recently released a Django ORM integration (along with similar integrations for Hibernate for Java and TypeORM for TypeScript). The Django ORM integration encrypts and decrypts values transparently, similar to using a library. In fact, the resulting code is almost identical to the basic, unencrypted example at the start of this post.
With Piiano Vault, you don’t need to worry about key management, your app being compromised, or adding code complexity to deal with the encryption intricacies. You use the ORM and annotate your fields.
Piiano Vault, an infrastructure for the protection of sensitive customer data, is built to make your life easy. It mitigates all the security implications we've discussed and keeps encryption details out of your code, leaving it readable and focused on data operations.
The biggest advantage of using the Piiano integration is that, unlike pgcrypto for example, it gives your organization a centralized way to control sensitive data access. It also records all data access to logs, so if you ever want to do forensics, you have all the information needed.
There are several advantages to using Vault’s ORM integration:
Finally, remember that achieving privacy and security requires careful analysis and planning. Our hope is that with this post, you are better prepared to implement both in your project.
Tech Lead Advisor
Imri is a lifelong software engineer who loves Python and has taken on CTO roles as an entrepreneur. He has been writing code professionally for his entire life, and he is passionate about building SaaS companies.