Ariel Shiftan
September 5, 2023
Over the years, cyber threats have grown more novel, but many of our security mechanisms are still stuck in the past. It's time to move on and protect sensitive data better. Just as regulations push the industry to protect customer data, we need to push our security infrastructure forward in response. We need a forward-thinking data protection strategy, one that blends new security practices, tools, and methodologies, including careful handling of personally identifiable information (PII).
It all begins with the cloud, where applications are accessible to everyone; from the network's point of view, a user and an attacker look the same. Encrypting all data at rest and in transit might seem like a comprehensive approach, but these methods are no longer enough. For cloud-hosted applications, data-at-rest encryption does not provide the coverage one might expect: when someone queries a database, the database automatically decrypts the data as it reads it from disk. This once-great security mechanism won't stop an attacker who has direct access to the database, because the decryption happens automatically and transparently. Touché. And if the keys are readily accessible, the security benefits of such encryption are severely limited.
In this article, we delve deep into PII encryption: its significance, how it strengthens your overall customer data protection posture, and techniques for integrating it into your organization's security program.
PII, or personally identifiable information, is any data element that can identify an individual, either on its own or combined with other data. Examples range from names, email addresses, phone numbers, and postal addresses to social security numbers, bank details, and even digital footprints such as IP addresses. Because of what this information reveals, securing PII is critical.
As a second line of defense, PII encryption means encrypting only the PII fields, on top of the automatic encryption mechanisms already in place. The encryption method can vary with the data's state, whether it is at rest in a database, in transit across the internet, or even in use. In practice, enterprises should opt for simple and robust symmetric encryption, such as AES, which uses a single key to both encrypt and decrypt the data.
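To make the symmetric approach concrete, here is a minimal sketch of encrypting a single PII value with AES-256 in GCM mode, assuming Python and the third-party `cryptography` package (`pip install cryptography`). GCM is our choice here, not something prescribed above; it is an authenticated mode, so a wrong key or a tampered value fails loudly instead of decrypting to garbage. The function names and the `customers:42:email` context string are hypothetical.

```python
import base64
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM  # third-party: pip install cryptography

NONCE_SIZE = 12  # 96-bit nonce, the standard size for AES-GCM


def encrypt_field(key: bytes, plaintext: str, context: bytes) -> str:
    """Encrypt one PII value; `context` is authenticated but not encrypted."""
    nonce = os.urandom(NONCE_SIZE)  # a nonce must never repeat under the same key
    ciphertext = AESGCM(key).encrypt(nonce, plaintext.encode("utf-8"), context)
    # Store nonce and ciphertext together as one text value the database can hold.
    return base64.b64encode(nonce + ciphertext).decode("ascii")


def decrypt_field(key: bytes, stored: str, context: bytes) -> str:
    """Reverse encrypt_field; raises cryptography.exceptions.InvalidTag on a wrong key, context, or tampering."""
    raw = base64.b64decode(stored)
    nonce, ciphertext = raw[:NONCE_SIZE], raw[NONCE_SIZE:]
    return AESGCM(key).decrypt(nonce, ciphertext, context).decode("utf-8")


if __name__ == "__main__":
    key = AESGCM.generate_key(bit_length=256)  # one key both encrypts and decrypts
    context = b"customers:42:email"  # hypothetical table:row:column binding
    stored = encrypt_field(key, "jane@example.com", context)
    print(stored)  # what someone reading the database directly would see
    print(decrypt_field(key, stored, context))  # jane@example.com
```

The context bytes are passed as GCM associated data, which binds each ciphertext to its cell: a value copied into another row or column fails to decrypt.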
But PII encryption is not without operational complexity. Which data should be encrypted, where should the secret keys be stored, and when should data be decrypted? These vital questions don't have simple answers, but we will try to answer them.
The traditional approach of relying solely on data-at-rest encryption has its limitations. Although it encrypts the entire dataset, it doesn't provide the specific, targeted protection that enterprises require today. On the surface, encrypting everything sounds like it should be enough to stop attackers. In practice, once an attacker gains access to a database using stolen credentials, they can simply read all the data without being stopped: as soon as the database loads the data into memory to serve the application, the data is decrypted. Unfortunately, that is exactly how data-at-rest encryption is designed; its purpose is to mitigate physical theft of the data. It is a great security mechanism against stolen hard drives, and only that.
So why haven’t we seen field level encryption taking off until now?
Because, truth be told, it's harder to implement from scratch, and it requires expertise and, most importantly, awareness. And let's be honest (ahem ahem, looking at you, CISOs): everybody likes to hear that compliance only requires data-at-rest encryption. So why bother going above the bar?
Field level encryption has always been one of the best security mechanisms out there; perhaps the industry was simply busy solving higher-priority security problems. And compliance, in this regard, stalls everything. Either way, it's time to fix this too.
Data At Rest Encryption (DARE): A standard practice required by most compliance frameworks, this method encrypts the entire dataset at the storage level, so the data is unreadable unless it is read through the database engine, which decrypts it. While it provides a general level of security for all the data, it doesn't differentiate between sensitive and non-sensitive data, because at the storage layer that distinction doesn't exist. If a breach occurs and an attacker gains access to the database, all data is automatically decrypted as it is read, which makes DARE alone insufficient for protecting data. The mechanism is also very popular because it is easy to integrate: it is transparent to the application, which is why it is often implemented as transparent data encryption (TDE).
Field-Level Encryption (FLE): This technique encrypts selected data fields within a database or application. For instance, in a table with multiple columns, only the columns containing PII are encrypted, while the other columns remain in plaintext. The primary advantage of field-level encryption is its granularity: specific sensitive data is protected while the rest remains untouched.
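A sketch of what that column selection might look like in application code, under the same assumptions (Python, `AESGCM` from the `cryptography` package). `PII_COLUMNS`, `encrypt_row`, and `decrypt_row` are hypothetical names, and rows are assumed to be dicts with an `id` key.

```python
import base64
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM  # third-party: pip install cryptography

# Hypothetical schema annotation: the PII columns of the `customers` table.
PII_COLUMNS = {"full_name", "email", "phone"}
NONCE_SIZE = 12


def _aad(table: str, row_id: object, column: str) -> bytes:
    # Binds each ciphertext to its cell, so it cannot be copied into another row or column.
    return f"{table}:{row_id}:{column}".encode()


def encrypt_row(key: bytes, table: str, row: dict) -> dict:
    """Return a copy of `row` in which only the PII columns are encrypted."""
    aead = AESGCM(key)
    out = dict(row)  # non-PII columns stay plaintext: filterable, sortable, indexable
    for column in PII_COLUMNS:
        value = row.get(column)
        if value is None:
            continue
        nonce = os.urandom(NONCE_SIZE)
        ct = aead.encrypt(nonce, str(value).encode(), _aad(table, row["id"], column))
        out[column] = base64.b64encode(nonce + ct).decode()
    return out


def decrypt_row(key: bytes, table: str, row: dict) -> dict:
    """Inverse of encrypt_row; only the application holding `key` can do this."""
    aead = AESGCM(key)
    out = dict(row)
    for column in PII_COLUMNS:
        value = row.get(column)
        if value is None:
            continue
        raw = base64.b64decode(value)
        pt = aead.decrypt(raw[:NONCE_SIZE], raw[NONCE_SIZE:], _aad(table, row["id"], column))
        out[column] = pt.decode()
    return out
```

Because of `str(value)`, decrypted values come back as strings; a real implementation would need typed serialization for non-string PII.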
There are downsides to encrypting data at the field level:
However, the benefits outweigh these issues, which is why we target PII data at the very least. Ultimately, this mechanism ensures that even if a database is compromised, encrypted PII remains secure and unreadable to the attacker, because the database isn't responsible for encrypting or decrypting the data; the application is.
Rather than an either-or choice, DARE and FLE are complementary layers of security that together offer finer control and broader coverage. While DARE provides a foundational layer of security, FLE adds a targeted, strategic layer that safeguards the most valuable and sensitive data fields, like PII, while keeping the system performant and functional.
In the cloud era, where physical theft of hard drives from a data center is unlikely and the internet connects everything, PII field level encryption is becoming a must.
When handling data where PII security is crucial, field-level encryption offers a more tailored and rigorous protection mechanism, ensuring that specific data remains secure even in security incidents such as direct access to databases or backups.
In practice, if attackers manage to get their hands on the PII data in the database, it's indecipherable to them: they don't have the keys, and the data isn't automatically decrypted the way it is with data-at-rest encryption.
In the following picture, we show how encryption protects the data in the DARE and FLE cases; notice how the bad actor is either happy or sad depending on whether their effort yields readable data.
To summarize: PII field level encryption lets only the application access the data at runtime. Nothing else can decrypt the data without the secret keys. In the end, PII field level encryption is stacked on top of data-at-rest encryption. Anyone who accesses the database directly, with or without credentials, benign or malicious, will only see indecipherable data, which raises the bar for stealing sensitive data.
Contrary to the simplistic view of encryption as a silver bullet, integrating PII encryption into a security framework can be complex. Once you seriously consider adding encryption to your software, many questions surface:
The following approach can help answer some of these questions about integrating and implementing PII encryption:
Information is everywhere, but not all data carries the same weight. PII encompasses any subset of information that can be used on its own or together with other data to identify a specific individual. The ability to single out a data subject from a broader dataset is why PII deserves the utmost protection.
Check our PII By Design cheat sheet for more details.
PII is typically categorized into:
For businesses, understanding the complexity and full scope of PII is crucial. Strict privacy regulations and security standards govern it, and gaps in how it is protected can erode trust, damage reputation, and lead to severe financial penalties.
When you scale this to an organization managing enormous amounts of PII, the stakes rise significantly. For perspective, imagine PII being exposed because of weak encryption key management, or because just one of your services uses an outdated cryptography library.
There are two main reasons why we want to encrypt PII data using field level encryption:
This means we can leave everything in plaintext at the database level (where it is still protected by data-at-rest encryption), except for the PII fields, which are covered by field level encryption.
In other words, thanks to PII encryption, if cyber criminals manage to steal data from a database, the PII in it will be unreadable ciphertext, effectively de-identified. That makes the stolen customer data far less useful and reduces the impact of the breach.
The challenge then becomes covering all identifying information in a customer record stored in a transactional SQL database. It gets trickier if you also hold a lot of unstructured data about that customer, though in most businesses that isn't the case.
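One way to spot identifying information that slipped past the encryption list is a simple scan of records. The sketch below is a hypothetical helper with deliberately naive regular expressions (email, US SSN, phone, IPv4); real PII discovery needs much more than pattern matching, so treat it as an illustration only.

```python
import re

# Deliberately simple, illustrative patterns; they will miss PII and raise false positives.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def find_unprotected_pii(record: dict, encrypted_fields: set[str]) -> dict[str, list[str]]:
    """Report fields that look like PII but are not on the encryption list."""
    findings: dict[str, list[str]] = {}
    for field, value in record.items():
        if field in encrypted_fields or not isinstance(value, str):
            continue  # already covered, or not free text
        kinds = [kind for kind, pattern in PII_PATTERNS.items() if pattern.search(value)]
        if kinds:
            findings[field] = kinds
    return findings
```

Running this over a sample of records, for example in CI against a staging copy, can flag free-text columns such as notes or comments that quietly accumulate PII.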
But there's also a nuance to PII encryption: it's not just about data defense or selecting the strongest AES key size. It reflects an organization's ethos and a commitment that assures customers their digital identities are respected and protected.
Given the stakes, it's vital to choose the right protective strategy. While encryption and tokenization both aim to protect sensitive data, their mechanisms and use cases differ distinctly. The choice between them depends on the specific data access control requirements, the particular use cases, and the overarching goals of PII security within the organization.
Both are equally good for most data access control requirements, so in practice it all comes down to implementation and integration. In short, tokenization is far easier to implement than field level encryption, but it's somewhat harder to integrate into production-grade software.
We've already discussed data tokenization thoroughly, examining and comparing various techniques in detail. Still, let's review a few factors.
Which strategy to opt for is often debated.
Although there are contrarian views, neither encryption nor tokenization is generally superior; they serve different purposes. Tokenization reduces the footprint of sensitive data across the entire system, while encryption conceals the data where it is stored. The best approach comes from a careful evaluation of your data's nature, access patterns, and regulatory landscape.
For example, if you have a single database with structured data of your customer records, field level encryption for the PII fields is a great solution to raise the bar against data theft. However, if you have multiple locations holding the same PII data in the organization, tokenization might be a better choice, so you end up having a single copy of the sensitive data in exactly one place, and control it in a consolidated way from the data vault.
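To illustrate the single-copy idea, here is a minimal in-memory stand-in for a data vault. `TokenVault` is hypothetical: a production vault would persist and encrypt its contents, authorize and audit every detokenization, and run as a separate service. Tokens are random, so they reveal nothing about the original value, and the same value always maps to the same token so other systems can still join on it.

```python
import secrets


class TokenVault:
    """In-memory stand-in for a data vault: the only place that holds the real PII."""

    def __init__(self) -> None:
        self._token_to_value: dict[str, str] = {}
        self._value_to_token: dict[str, str] = {}

    def tokenize(self, value: str) -> str:
        # Reuse the existing token so the same email always maps to the same token.
        existing = self._value_to_token.get(value)
        if existing is not None:
            return existing
        token = "tok_" + secrets.token_urlsafe(16)  # random: carries no information about the value
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token: str) -> str:
        # In a real vault, this is where authorization and audit logging would happen.
        return self._token_to_value[token]


if __name__ == "__main__":
    vault = TokenVault()
    token = vault.tokenize("jane@example.com")
    print(token)  # what analytics, logs, and other services would store
    print(vault.detokenize(token))  # jane@example.com
```

Downstream stores keep only the `tok_...` strings; only the vault can map them back, which is what lets you control the sensitive data in one consolidated place.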
A symmetric encryption scheme uses a single cryptographic key for both encryption and decryption. This makes it simple to manage and fast, so it is commonly preferred for workloads, such as real-time applications, that need fast encryption and decryption. It also means these workloads (software services) need direct access to the keys. While simplicity is one of its major strengths, key management requires careful consideration.
For instance:
One solution to these challenges is a decentralized Key Management Service (KMS) that operates on a never-let-the-keys-out principle and handles keys securely without physically distributing them or centralizing risk. Rather than handing raw master keys to applications, the KMS generates keys and gives authorized services only what they need, for example data keys wrapped (encrypted) under a master key that never leaves the KMS. This helps protect the keys from being compromised if any single component is ever attacked. You can read more about the difference between a KMS and a data vault.
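The never-let-the-keys-out idea is commonly implemented as envelope encryption: a master key stays inside the KMS, and applications receive data keys that are stored only in wrapped (encrypted) form. The sketch below simulates this with a hypothetical `ToyKMS` class built on `AESGCM` from the `cryptography` package; a real KMS or HSM enforces the boundary in hardware or behind an API, whereas in Python the private attribute is only a convention.

```python
import os
from dataclasses import dataclass

from cryptography.hazmat.primitives.ciphers.aead import AESGCM  # third-party: pip install cryptography


class ToyKMS:
    """Stand-in for a KMS: the master key is generated inside and no method returns it."""

    def __init__(self) -> None:
        self.__master = AESGCM(AESGCM.generate_key(bit_length=256))

    def generate_data_key(self) -> tuple[bytes, bytes]:
        """Return (plaintext data key, data key wrapped under the master key)."""
        data_key = AESGCM.generate_key(bit_length=256)
        nonce = os.urandom(12)
        return data_key, nonce + self.__master.encrypt(nonce, data_key, b"data-key")

    def unwrap(self, wrapped: bytes) -> bytes:
        # A real KMS would authenticate, authorize, and audit the caller here.
        return self.__master.decrypt(wrapped[:12], wrapped[12:], b"data-key")


@dataclass
class EncryptedValue:
    wrapped_key: bytes  # safe to store next to the data: useless without the KMS
    nonce: bytes
    ciphertext: bytes


def encrypt_with_envelope(kms: ToyKMS, plaintext: bytes) -> EncryptedValue:
    data_key, wrapped_key = kms.generate_data_key()
    nonce = os.urandom(12)
    ciphertext = AESGCM(data_key).encrypt(nonce, plaintext, None)
    return EncryptedValue(wrapped_key, nonce, ciphertext)  # the plaintext data key is not persisted


def decrypt_with_envelope(kms: ToyKMS, value: EncryptedValue) -> bytes:
    data_key = kms.unwrap(value.wrapped_key)
    return AESGCM(data_key).decrypt(value.nonce, value.ciphertext, None)
```

Here every value gets a fresh data key for simplicity; real systems usually reuse a data key per record or per batch to limit round trips to the KMS.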
In all scenarios, your security posture is only as strong as the channel used to share keys, so make sure that channel remains uncompromised.
Recently, a friend told me that their team had implemented its own field level encryption for sensitive data; it made the system harder for the engineers to debug, and eventually the keys were found in a Slack channel!
So it's worth stating plainly: keys should never reach human users. Instead, design a secure, permission-based mechanism through which data can be accessed only in an auditable and monitored fashion.
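A permission-based, auditable access path can be sketched as a small gateway that services call instead of touching keys. Everything here is hypothetical: the role-to-field map, the `PIIAccessGateway` class, and the injected `decrypt` callable, which would wrap your KMS or vault call. Audit events are emitted as JSON lines so they can be shipped to a log pipeline or SIEM.

```python
import json
import time
from typing import Callable

# Hypothetical mapping of roles to the PII fields they may read.
ROLE_PERMISSIONS: dict[str, set[str]] = {
    "support_agent": {"email", "phone"},
    "billing_service": {"full_name", "billing_address"},
}


class PIIAccessGateway:
    """Callers never see keys; every read attempt, allowed or not, is audited."""

    def __init__(self, decrypt: Callable[[str, str], str], audit_sink: Callable[[str], None]) -> None:
        self._decrypt = decrypt  # wraps the KMS/vault call; the key stays behind it
        self._audit = audit_sink

    def read_field(self, principal: str, role: str, field: str, ciphertext: str) -> str:
        allowed = field in ROLE_PERMISSIONS.get(role, set())
        event = {"ts": time.time(), "principal": principal, "role": role, "field": field, "allowed": allowed}
        self._audit(json.dumps(event))  # log before deciding, so denials are recorded too
        if not allowed:
            raise PermissionError(f"role {role!r} may not read {field!r}")
        return self._decrypt(field, ciphertext)


if __name__ == "__main__":
    events: list[str] = []
    # Placeholder decrypt for the demo only; it performs no cryptography.
    gateway = PIIAccessGateway(lambda field, ct: f"<plaintext of {ct}>", events.append)
    print(gateway.read_field("alice", "support_agent", "email", "ct-123"))
    print(events[-1])
```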
Asymmetric encryption is a great solution for many use cases, mostly PKI, blockchain, or key exchange, which ultimately enables symmetric encryption over network protocols such as TLS. However, we don't find it a good fit for data encryption in production environments, mostly for performance reasons, as it is much slower. Some architects with limited cryptography experience might think that having two separate keys is a good design, splitting encryption and decryption into two services (which is normally smart for security by design, since it becomes harder to compromise both keys and therefore the data). But the hassle of maintaining more components isn't worth it in most cases. And to begin with, one should use a KMS or a vault that keys never leave.
Given the complexity and diversity of databases today, as well as the volume of the data and its many uses in production backends, encryption practices need an added layer of robustness to safeguard these systems.
While foundational techniques such as data-at-rest and data-in-transit encryption remain relevant, field-level encryption acknowledges that PII encryption is an ongoing, dynamic process rather than a one-time task, since it must be implemented at the application level. The big upside is that the data is far better protected than before.
The fragmented, database-centric approach to PII encryption has proven limited and often unscalable, and it frequently clutters the SQL queries that reach the database. Standardizing PII encryption and decoupling it from database-specific implementations keeps it an integral part of your overall data strategy. The main reason for this shift is that PII encryption must be agile, adaptive, and responsive to be widely adopted across the software development organization. More importantly, a standardized encryption approach ensures consistency across databases, data stores, and even teams, removing the need to manage keys and monitoring separately for each system.
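A minimal sketch of what a standardized, database-agnostic format could look like, assuming the `cryptography` package: every encrypted value is a self-describing string `v1.<key_id>.<payload>`, so any datastore that can hold text can hold it, and the key ID makes gradual key rotation possible. The format, the `FieldCipher` class, and its methods are hypothetical; key IDs are assumed not to contain dots.

```python
import base64
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM  # third-party: pip install cryptography


class FieldCipher:
    """One encryption format for every datastore: 'v1.<key_id>.<base64url(nonce + ciphertext)>'."""

    def __init__(self, keys: dict[str, bytes], active_key_id: str) -> None:
        self._keys = keys  # key_id -> 256-bit key, e.g. fetched from a KMS
        self._active = active_key_id  # new writes always use this key

    def encrypt(self, plaintext: str, aad: bytes = b"") -> str:
        nonce = os.urandom(12)
        ct = AESGCM(self._keys[self._active]).encrypt(nonce, plaintext.encode(), aad)
        return f"v1.{self._active}.{base64.urlsafe_b64encode(nonce + ct).decode()}"

    def decrypt(self, token: str, aad: bytes = b"") -> str:
        version, key_id, payload = token.split(".", 2)
        if version != "v1":
            raise ValueError(f"unsupported format {version!r}")
        raw = base64.urlsafe_b64decode(payload)
        # Older key IDs stay decryptable, which is what makes gradual rotation possible.
        return AESGCM(self._keys[key_id]).decrypt(raw[:12], raw[12:], aad).decode()

    def needs_rotation(self, token: str) -> bool:
        return token.split(".", 2)[1] != self._active
```

Rotation then becomes a background job: make a new key active, re-encrypt values for which `needs_rotation` returns true, and retire the old key once none remain.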
Homomorphic encryption is distinguished by its ability to perform computations on encrypted data without decrypting it; the result is itself encrypted. For PII, this means enhanced privacy even during data processing, making it a potential game-changer where data utility and privacy need to coexist. Despite the benefits, weigh its computational complexity, resource intensity, and slow processing speed, which can make it impractical for many real-time or large-scale applications. It is an especially good fit when data must be shared across businesses while preserving data subjects' confidentiality. Note that in some industries it is not a viable solution, because the business owner must know who its customers are.
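To show what computing on encrypted data means, here is a toy Paillier cryptosystem, a classic additively homomorphic scheme (fully homomorphic schemes support more operations but are far more complex). It uses only the Python standard library and deliberately tiny primes, so it is an illustration and offers no security whatsoever.

```python
import math
import secrets

# Toy Paillier parameters: tiny primes for readability only. NOT secure.
P, Q = 293, 433
N = P * Q
N_SQ = N * N
LAM = math.lcm(P - 1, Q - 1)  # Carmichael function lambda(N)
MU = pow(LAM, -1, N)  # modular inverse; valid because we use the generator g = N + 1


def encrypt(m: int) -> int:
    if not 0 <= m < N:
        raise ValueError("plaintext must be in [0, N)")
    while True:
        r = secrets.randbelow(N - 1) + 1  # fresh randomness: equal plaintexts give different ciphertexts
        if math.gcd(r, N) == 1:
            break
    # c = g^m * r^N mod N^2, with g = N + 1
    return (pow(N + 1, m, N_SQ) * pow(r, N, N_SQ)) % N_SQ


def decrypt(c: int) -> int:
    x = pow(c, LAM, N_SQ)
    return ((x - 1) // N) * MU % N  # L(x) * mu mod N, where L(x) = (x - 1) / N


def add_encrypted(c1: int, c2: int) -> int:
    # Multiplying ciphertexts adds the underlying plaintexts (mod N), without any key.
    return (c1 * c2) % N_SQ
```

A party holding only ciphertexts could compute `add_encrypted(c1, c2)` over, say, two salary figures, and only the key holder can decrypt the total.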
Regular audits are indispensable for keeping your encryption implementation resilient. By consistently evaluating your methods against current industry standards and emerging threats, you maintain a robust shield over your data. Keep in mind that even widely accepted encryption libraries sometimes turn out to contain bugs or vulnerabilities, so keep them up to date.
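Part of such an audit can be automated. The sketch below checks a hypothetical inventory of keys against an example policy (an algorithm allow-list and a maximum key age); `KeyRecord`, `audit_keys`, and the policy values are assumptions to replace with the standards your auditors actually reference.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Example policy only; align these values with your own standards.
APPROVED_ALGORITHMS = {"AES-256-GCM", "AES-128-GCM", "ChaCha20-Poly1305"}
MAX_KEY_AGE = timedelta(days=365)


@dataclass
class KeyRecord:
    key_id: str
    algorithm: str
    created: date
    service: str


def audit_keys(keys: list[KeyRecord], today: date) -> list[str]:
    """Return human-readable findings for keys that violate the policy."""
    findings = []
    for k in keys:
        if k.algorithm not in APPROVED_ALGORITHMS:
            findings.append(f"{k.key_id} ({k.service}): algorithm {k.algorithm} is not approved")
        if today - k.created > MAX_KEY_AGE:
            findings.append(f"{k.key_id} ({k.service}): older than {MAX_KEY_AGE.days} days, rotate it")
    return findings
```

A check like this catches the "one outdated service" scenario early, for example by running it on a schedule and alerting on any findings.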
When choosing a PII encryption tool, make sure it covers everything you need. To help you decide, below are the critical aspects your PII encryption tool should address:
Performance: The selected tool should not hamper your system's efficiency. Consider its impact on system resources, latency, and user experience, for instance the added latency during heavy data processing or the encryption/decryption throughput in a high-volume environment. A tool that provides encryption without degrading performance supports business continuity and sustains high productivity.
Granular control: Your encryption tool must allow fine-grained control over different data segments. This enables precise management of who can access specific PII, strengthening targeted protection without hindering access to non-sensitive data. In practice, this means only the relevant data is secured, minimizing unnecessary encryption overhead.
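Granular control often starts with a different key per data segment. A common technique is to derive per-field keys from one master key with HKDF (RFC 5869); the sketch implements HKDF-SHA256 with Python's `hmac` and `hashlib` so it runs without extra packages. `field_key` and the salt value are hypothetical choices.

```python
import hashlib
import hmac


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF (RFC 5869) with SHA-256, using only the standard library."""
    if length > 255 * 32:
        raise ValueError("HKDF-SHA256 output is limited to 8160 bytes")
    prk = hmac.new(salt or b"\x00" * 32, ikm, hashlib.sha256).digest()  # extract step
    okm, block, counter = b"", b"", 1
    while len(okm) < length:  # expand step: T(i) = HMAC(PRK, T(i-1) | info | i)
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def field_key(master_key: bytes, table: str, column: str) -> bytes:
    # A distinct 256-bit key per data segment: access to `email` does not imply access to `ssn`.
    return hkdf_sha256(master_key, salt=b"pii-field-keys-v1", info=f"{table}.{column}".encode())
```

Because each column gets its own key, a service can be given the key for `customers.email` without being able to decrypt `customers.ssn`. In a real deployment the master key would stay in the KMS and the derivation would happen there.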
Developer experience: Consider how simple the tool is to integrate and how well it fits the development team's needs. Choose a tool that offers developer-friendly interfaces, automation capabilities, and compatibility with existing systems. The goal is to maintain development velocity, promote innovation, and simplify PII encryption across application layers. The tool should make it quick and easy to encrypt PII in your databases, applications, and code, and it should support a variety of data types and formats, including structured, unstructured, and binary data.
Encryption strength: Consider the tool's support for strong, modern cryptographic standards such as AES-256. For instance, a tool that supports forward secrecy ensures that even if one session's key is compromised, past sessions remain secure. Assessing encryption strength isn't just about current robustness; it also includes the tool's ability to evolve with emerging cryptographic standards.
Access management and logging: In a real-world business environment, the ability to quickly edit permissions, access logs, and manage users is critical. A tool that provides intuitive controls, such as role-based access control (RBAC) or integration with SIEM solutions for logging, enhances security oversight and adapts to dynamic organizational needs.
Key management: This is intricate and crucial. Support for standards like KMIP and HSM-backed infrastructure helps ensure key safety through secure generation, rotation, and retirement.
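The generation, rotation, and retirement lifecycle can be illustrated with a small in-memory key ring. This is a hypothetical sketch of the states a KMS or HSM would manage for you (it does not speak KMIP or touch an HSM): one active key for new writes, older keys kept decrypt-only until their data is re-encrypted, then retired.

```python
import secrets
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class ManagedKey:
    key_id: str
    material: bytes = field(repr=False)  # keep raw key bytes out of reprs and logs
    created: datetime
    state: str = "active"  # active -> decrypt_only -> retired


class KeyRing:
    """Hypothetical lifecycle: one active key for writes, older keys decrypt-only until retired."""

    def __init__(self, rotation_period: timedelta, now: datetime) -> None:
        self.rotation_period = rotation_period
        self.keys: list[ManagedKey] = []
        self.rotate(now)  # generation: start with one active key

    def active(self) -> ManagedKey:
        return next(k for k in self.keys if k.state == "active")

    def rotate(self, now: datetime) -> ManagedKey:
        for k in self.keys:
            if k.state == "active":
                k.state = "decrypt_only"  # existing data stays readable until re-encrypted
        new = ManagedKey(f"k{len(self.keys) + 1}", secrets.token_bytes(32), now)
        self.keys.append(new)
        return new

    def rotate_if_due(self, now: datetime) -> bool:
        if now - self.active().created >= self.rotation_period:
            self.rotate(now)
            return True
        return False

    def retire(self, key_id: str) -> None:
        # Call only after every record encrypted under key_id has been re-encrypted.
        key = next(k for k in self.keys if k.key_id == key_id)
        if key.state == "active":
            raise ValueError("cannot retire the active key")
        key.state = "retired"
        key.material = b""  # drop our reference to the key material
```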
Compliance: Finally, align your choice with the compliance requirements of your industry or region. Whether you must adhere to GDPR, HIPAA, or other regulatory frameworks, the tool must help you meet your legal obligations. For instance, a tool that can generate evidence of encryption (e.g., logs showing access controls and encryption methods) helps with both compliance verification and audits.
The outlook for PII encryption in today's cybersecurity landscape and cloud hosting era is bright. While PII encryption is gaining momentum as a frontline defense against security threats, over-reliance on encryption alone shouldn't cause organizations to overlook underlying vulnerabilities. As we all know, a single data leak or a bug in a web API can still be fatal. Nevertheless, PII encryption both helps with compliance and effectively neutralizes a whole class of attacks that rely on penetrating the production network and, from there, reaching the database.
To actually access the protected PII, bad actors will now need to compromise the application itself, either by achieving remote code execution (RCE) or through SQL injection. The former is less likely, while the latter, although it still exists, is mostly (but not perfectly) mitigated by perimeter solutions like WAFs, or in the software by parameterized queries in database drivers and ORMs. Overall, the risk of exposing sensitive data is greatly reduced, and the privacy impact of an inevitable data breach on your customers is largely neutralized, because the data is effectively de-identified.
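Parameterized queries are the driver-level defense against SQL injection. The sketch uses Python's built-in `sqlite3` module and a hypothetical `customers` table whose email column holds ciphertext placeholders; it contrasts string-built SQL with a `?` placeholder.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email_enc TEXT, plan TEXT)")
conn.executemany(
    "INSERT INTO customers (email_enc, plan) VALUES (?, ?)",
    [("<ciphertext-1>", "pro"), ("<ciphertext-2>", "free")],  # placeholders for encrypted values
)


def find_by_plan_unsafe(plan: str) -> list:
    # Vulnerable: user input becomes part of the SQL text itself.
    return conn.execute(f"SELECT id, email_enc FROM customers WHERE plan = '{plan}'").fetchall()


def find_by_plan_safe(plan: str) -> list:
    # Parameterized: the driver passes `plan` as a value, never as SQL.
    return conn.execute("SELECT id, email_enc FROM customers WHERE plan = ?", (plan,)).fetchall()


payload = "x' OR '1'='1"
print(find_by_plan_unsafe(payload))  # [(1, '<ciphertext-1>'), (2, '<ciphertext-2>')]
print(find_by_plan_safe(payload))  # []
```

The injected query here only returns ciphertext, because decryption happens in the application; an injection that flows through the application's own decryption path is the residual risk described in the paragraph above.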
A good PII encryption infrastructure enables organizations to adapt quickly to GDPR, CCPA, and other regulations while achieving a high level of data security. And perhaps, for the first time, there will be a truce between application security teams and the R&D teams that have the final say on which database technology to use. The pursuit of data protection extends beyond any single solution, and the key to success is to remain vigilant, adaptive, and proactive.
CTO & Co-founder
Ariel, despite holding a PhD in Computer Science, doesn't strictly conform to the traditional academic archetype. His heart lies in the realm of hacking, a passion he has nurtured since his early years. As a proficient problem solver, Ariel brings unmatched practicality and resourcefulness to every mission he undertakes.
Key management challenges and how to address them:
Challenge: Increased complexity as the number of keys and systems grows.
Solution: Adopt a centralized key management solution, such as a Hardware Security Module (HSM) or a cloud-based KMS, to securely manage and control cryptographic keys at scale.
Challenge: Ensuring secure and timely key distribution and synchronization at scale.
Solution: Automate key rotation to keep keys synchronized, reduce human intervention, and minimize errors as the system grows.