PII Encryption: The Key to Protecting Your Customers’ Privacy

Over the years, we have seen more novel cyber threats, but our security mechanisms are still stuck in the past. It’s time to move on and protect the sensitive data in a better way. Just like regulations push the industry forward to protect the data of our customers, we need to push our security infrastructure in response. We need a forward-thinking data protection strategy - one that blends novel security practices, tools, and methodologies, including the nuanced handling of personally identifiable information (PII).

It all begins with the cloud, where applications are accessible to everyone, so from the application's point of view a user and an attacker look the same. Encrypting all data at rest and in transit might seem like a comprehensive approach, but these methods are not enough anymore. For cloud-hosted applications, data-at-rest encryption does not provide the coverage one might expect. When someone queries a database, the database engine automatically decrypts the data as it reads it from disk. This once-great security mechanism won't stop an attacker who has direct access to the database, because the decryption happens automatically and transparently. And if the keys are readily accessible, the security benefits of such encryption are severely limited.

In this article, we delve deep into the realm of PII encryption, discussing its significance, how it elevates the entirety of your customer data protection posture, and techniques to integrate it into your organization’s overall security program.

If you're looking for a quick way to protect the PII in your backend, create a Vault account now.

What is PII Encryption?

PII, or personally identifiable information, is any data element that can identify an individual, either on its own or in combination with other data. It typically ranges from names, email addresses, phone numbers, and addresses to social security numbers, bank details, or even digital footprints such as IP addresses. Because exposure of this information can harm the people it describes, securing PII is very important.

As a second line of defense, PII encryption means encrypting the PII fields themselves, on top of the automatic encryption mechanisms that are already in place. The encryption method can vary with the data's state: at rest in a database, in transit across the internet, or even in use. In practice, enterprises should opt for simple and robust symmetric encryption, such as AES, which uses a single key to both encrypt and decrypt the data.

But PII encryption is not without its complexities at the operational level. What to encrypt, how to choose which data to encrypt, where to store the secret keys, and when to decrypt the data are vital questions that don't yield simple answers - but we will try to answer them.

PII Field Level Encryption vs. Data-At-Rest Encryption

The traditional approach of relying solely on data-at-rest encryption has its limitations. Although it encrypts the entire dataset, it doesn't provide the specific, targeted protection that enterprises require today. On the surface, it sounds like encrypting everything should be sufficient to stop attackers. But in practice, once an attacker gains access to a database using stolen credentials, they can simply read all the data without being stopped. Once the database reads the data into memory to serve the application, the data is immediately decrypted. This is by design: data-at-rest encryption exists to mitigate physical theft of the data. It is a great security mechanism against physical theft of hard drives, and only that.

So why haven’t we seen field level encryption taking off until now?

Because, truth be told, it's harder to implement from scratch, and it requires expertise and, most importantly, awareness. In addition, nobody likes to hear (ahem ahem, looking at you CISOs) that compliance only requires data-at-rest encryption. So why bother going over the bar?

Field level encryption was always one of the best security mechanisms out there; perhaps the world was simply solving a higher-priority set of security problems. And compliance, in this regard, stalls everything. Anyway, now it's time to fix this too.

Data At Rest Encryption (DARE): A standard practice required by most compliance frameworks, this method encrypts the entire dataset at the storage level, making the data unreadable until the database engine itself decrypts it on read. While it provides a generalized level of security for all the data, it doesn't differentiate between sensitive and non-sensitive data, because the distinction doesn't matter at this layer. If a breach occurs and an attacker gains access to the database, all data is automatically decrypted as it is read, making DARE alone insufficient for protecting data. This mechanism is also very popular because it is very easy to integrate, which is why it is also called transparent data encryption (TDE).

Field-Level Encryption (FLE): This technique encrypts selected data fields within a database or application. For instance, in a database with multiple columns, only the column containing PII should be encrypted, leaving other columns in plain text. The primary advantage of field-level encryption is in its granularity; it allows specific, sensitive data pieces to be protected while the rest remains untouched.

There are downsides to encrypting data at the field level:

  1. More storage space is required per field
  2. Searching over encrypted data is harder to implement
  3. Encryption/decryption consumes more CPU time

However, the benefits outweigh these drawbacks, which is why we target PII data at a minimum. Ultimately, this mechanism ensures that even if a database is compromised, encrypted PII remains secure and unreadable to the attacker. This is because the database isn't responsible for encrypting or decrypting the data; the application itself is.
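
To make the division of labor concrete, here is a minimal sketch in Python of an application encrypting a single PII column before it reaches the database. It assumes the third-party `cryptography` package is installed, uses an in-memory SQLite table, and generates the key in-process purely for illustration; the table, column, and function names are hypothetical, and in production the key would live in a KMS or vault rather than in the application's memory.

```python
import os
import sqlite3

from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# Hypothetical key for this sketch; in production it would come from a KMS or vault.
KEY = AESGCM.generate_key(bit_length=256)


def encrypt_field(plaintext: str, key: bytes = KEY) -> bytes:
    """Encrypt one field; a fresh random nonce makes every ciphertext different."""
    nonce = os.urandom(12)  # 96-bit nonce, the usual size for AES-GCM
    return nonce + AESGCM(key).encrypt(nonce, plaintext.encode("utf-8"), None)


def decrypt_field(blob: bytes, key: bytes = KEY) -> str:
    """Split off the nonce, then authenticate and decrypt the rest."""
    nonce, ciphertext = blob[:12], blob[12:]
    return AESGCM(key).decrypt(nonce, ciphertext, None).decode("utf-8")


# The database only ever sees ciphertext in the PII column.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, plan TEXT, email BLOB)")
db.execute(
    "INSERT INTO customers (plan, email) VALUES (?, ?)",
    ("premium", encrypt_field("alice@example.com")),
)

# Someone reading the table directly gets an opaque blob for the PII field...
plan, stored_email = db.execute("SELECT plan, email FROM customers").fetchone()
# ...while the application, which holds the key, can recover the value.
recovered_email = decrypt_field(stored_email)
```

The non-sensitive `plan` column stays in plain text and remains queryable as usual, while the `email` column is unreadable to anyone who only has database access.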

Rather than viewing them as an either-or choice, consider them complementary layers of security that together offer finer control over data and broader coverage. While DARE provides a foundational layer of security, FLE adds a targeted, strategic layer to safeguard the most valuable and sensitive data fields, like PII, while keeping the system performant and functional.

In the cloud era, where physical thefts of hard drives in a data center are unlikely to happen anymore, and the internet connects everything - PII field level encryption is becoming a must.

How Field-Level Encryption Stops Attackers

When handling data where PII security is crucial, field-level encryption presents a more tailored and rigorous protection mechanism, ensuring that specific data pieces remain secure even in the face of security incidents like accessing databases or backups directly.

Technically, if attackers manage to get their hands on the PII data in the database, it's indecipherable to them: they don't have the keys, and the data isn't automatically decrypted as it is with data-at-rest encryption.

The following picture shows how encryption protects the data in the DARE and FLE cases. Notice how the bad actor ends up either happy or sad depending on whether their effort yields readable data.

Technical illustration on data-at-rest-encryption vs. PII encryption.

To summarize - PII field level encryption lets only the application access the data at runtime. Nothing else can decrypt the data without the secret keys. The end result is PII field level encryption stacked on top of data-at-rest encryption. Anyone who tries to access the database directly, with credentials or not, benign or malicious, will only be able to read indecipherable data, which raises the bar for stealing sensitive data.

Challenges and Considerations in PII Encryption

Contrary to a simplistic view of encryption as a silver bullet, integrating PII encryption into a security framework can be complex. When you seriously think about integrating encryption into your software, many issues surface:

  1. Which encryption algorithm to use?
  2. Which (micro)services have access to encrypt/decrypt the data?
  3. How do we distribute the keys safely to the services?
  4. What happens once a service that has access to the keys is compromised?
  5. How do we migrate existing data to be encrypted without breaking the system?
  6. How do we ensure performance of the encryption is up to our expectations?
  7. How do we search over encrypted data?
  8. Can we use a database’s driver to encrypt the data?
  9. How can we rotate keys?

The following approach can help answer some of these questions about the integration and implementation of PII encryption:

Stage: PII Identification
Purpose: Methodically recognize and categorize data fields/columns that qualify as PII or any other crown jewels like PHI or PCI data.
Approach: Harness DLP tools to systematically pinpoint the sensitive data within data repositories, or analyze your source code to see what data is persistently stored.

Stage: Encryption Algorithm Deployment
Purpose: Select the optimal encryption algorithm that ensures data confidentiality without compromising performance (much).
Approach: Symmetric Encryption: Utilize AES (preferably AES-256).

Stage: Encryption Integration to Code
Purpose: Integrate with an encryption library.
Approach: Use a well-known, professionally maintained library to encrypt the data. Make sure that encrypting the same input twice doesn't yield the same output; otherwise the protection of the data is severely limited. In the code:
- Encrypt data before storing it.
- Decrypt data after reading it.

Stage: Searchable Encrypted Data
Purpose: For the sake of functionality, some sensitive fields have to be searchable.
Approach: Normally you would implement a blind index for the encrypted data, using a secure keyed hash that is deterministic for the same input. It is an additional column next to the encrypted data that is used for lookups. For exact-match searches, securely hash the input and look it up against this column. For partial-text searches over encrypted data, you will have to index the data using membership-querying algorithms, and that's a whole topic by itself.

Stage: Key Management
Purpose: Ensure secure creation, distribution, and storage of keys, minimizing the risk of unauthorized access or key compromise.
Approach: Securely generate, distribute (if necessary), rotate, and retire encryption keys using the KMIP protocol and HSM-backed infrastructure. Employ a KMS to avoid distributing (and leaking) cryptographic keys whenever possible, keeping in mind that it might add latency.
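
The blind index from the searchable-data stage can be sketched in a few lines of Python using only the standard library. This is an illustration under assumptions, not a production design: the index key is generated in-process, the `rows` list stands in for a database table, the ciphertext values are placeholders, and names such as `blind_index` and `email_bidx` are hypothetical.

```python
import hashlib
import hmac
import secrets

# Hypothetical index key; it should be separate from the encryption key.
INDEX_KEY = secrets.token_bytes(32)


def normalize(value: str) -> str:
    """Canonicalize the input so equal values always hash to the same index."""
    return value.strip().lower()


def blind_index(value: str, key: bytes = INDEX_KEY) -> str:
    """Deterministic keyed hash (HMAC-SHA-256) used only for exact-match lookups."""
    return hmac.new(key, normalize(value).encode("utf-8"), hashlib.sha256).hexdigest()


# Each row keeps the (placeholder) ciphertext plus its blind index column.
rows = [
    {"email_encrypted": b"<ciphertext 1>", "email_bidx": blind_index("alice@example.com")},
    {"email_encrypted": b"<ciphertext 2>", "email_bidx": blind_index("bob@example.com")},
]


def find_by_email(email: str) -> list:
    """Hash the search term and compare it against the index column."""
    wanted = blind_index(email)
    return [row for row in rows if hmac.compare_digest(row["email_bidx"], wanted)]


matches = find_by_email("  Alice@Example.com ")
```

Using a keyed hash rather than a plain one matters here: without the key, an attacker who steals the table cannot rebuild the index by hashing a list of guessed email addresses.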

What Constitutes Personally Identifiable Information (PII)?

Illustration of what constitutes Personally Identifiable Information (PII). It includes direct identifiers, indirect identifiers, digital footprints and sensitive PII.

Information is everywhere, but not all data carries the same weight. PII encompasses any information subset that can be used in isolation or alongside other data to identify a specific individual. The ability to single out a data subject from a broader dataset underscores the importance of treating PII with utmost protection.

Check our PII By Design cheat sheet for more details.

PII is typically categorized into:

  • Direct Identifiers: Explicitly identify an individual. Examples include full name, email address, and passport number.
  • Indirect Identifiers: Data that can pinpoint an individual when combined with other data. Examples include a mix of data fields such as date of birth, gender, and postal code.
  • Digital Footprints: Elements like IP addresses, login IDs, digital images, or social media posts or handle names which, in today's interconnected world, are often tied back to individual identities.
  • Sensitive PII: Details that, if disclosed, could lead to harm, discrimination, or identity theft. This encompasses medical records, financial data, and other similar categories.

Why PII Encryption is Essential for Protecting Sensitive Data

For businesses, understanding the complexity and full scope of PII is crucial. Strict privacy regulations and security standards govern it, and gaps in its implementation can erode trust, damage reputation, and lead to severe financial penalties.

When you scale this to an organization managing enormous amounts of PII data, the stakes rise significantly. For perspective, imagine a scenario where PII is exposed due to weak encryption key management or the use of outdated cryptography libraries in just one of your services.

There are two main reasons why we want to encrypt PII data using field level encryption:

  1. Field level encryption has its operational drawbacks (we don't want to use it for everything), so we must selectively encrypt specific fields.
  2. GDPR, with some interpretation (see pseudonymization), suggests that if a stolen customer record was de-identified, a breach notification or a penalty may not be required.

This means that we can leave everything in plaintext at the database level (where it is still covered by data-at-rest encryption), except the PII fields, which are covered by field level encryption.

In other words, thanks to PII encryption, if cyber criminals manage to steal data from a database, it will be de-identified and thus useless to them, which reduces the impact of a breach of customer data.

The challenge then becomes covering all identifying information in a customer data record stored in a transactional database, such as a SQL database. It gets trickier if you also hold a lot of unstructured data about that customer, though in most businesses that's not the case.

But there's also a nuance to PII encryption - it's not just about data defense or selecting the strongest AES algorithm. It signifies an organization's ethos and a commitment that affirms to customers that their digital identities are respected and protected.

You can start protecting your customers' PII with our Piiano-hosted Vault using simple APIs, create an account right away here.

Encryption vs. Tokenization for Your PII Security

Given the stakes, it's vital to choose the right protective strategy. The choice between the approaches depends on the specific requirements of data access controls, the particular use cases, and the overarching goals of PII security within an organization. While encryption and tokenization both aim to protect sensitive data, the mechanisms employed and their use cases vary distinctly.

They are both equally good for most data access control requirements. So practically, it all comes down to implementation and integration. To summarize, tokenization is by far easier to implement than field level encryption, but it is a bit harder to integrate with at the software level in a production-grade system.

We've already discussed data tokenization extensively and thoroughly, examining and comparing various techniques in detail. However, let’s review a few factors.

Factor: Regulatory Compliance
Encryption: A general-purpose solution that is adaptable to multiple regulations.
Tokenization: Initially favored for PCI-DSS in payment card scenarios. Recently, it was mentioned in GDPR under pseudonymization.

Factor: Key Management vs. Data Vaulting
Encryption: Demands rigorous key management processes. All data is at risk if keys are compromised.
Tokenization: Relies on a secure data vault. A token vault breach exposes original data, but piecing it together with related data is complex. Surprisingly, a data vault must use field level encryption.

Factor: Scalability & Performance
Encryption: More compute-intensive but scales well across systems. Offers the same level of control and granularity for all the data that is encrypted under the same key.
Tokenization: Offers very high granularity of control - per token - but may grapple with performance when having hundreds of millions of tokens, particularly if using centralized vaults.

Factor: Network Latency
Encryption: To prevent key compromise, one should use a KMS or a separate, isolated service that holds the secret keys to do the encryption and decryption of data. Therefore a safer encryption implementation requires another component in the architecture, thus introducing network latency.
Tokenization: Using a secure data vault will also introduce network latency, as it is normally another component in the architecture.

Factor: Database-Level Breach
Encryption: Unauthorized access to the disk or database is rendered ineffective, as decryption typically requires application-level permission.
Tokenization: The vault's database must employ field level encryption, therefore the tokens' data is secure.

Factor: General Security Implications
Encryption: Security remains intact as long as the secret keys are secure.
Tokenization: Tokens in datasets are benign unless linked back to their original data en masse. Although isolated tokens remain protected, the risk increases if the linkage between tokens and original data is compromised.

Factor: Integration Complexity
Encryption: Simple - The stateless nature of encryption eliminates the need for complex transaction handling and offers a simpler integration pathway when working with a database. It allows encrypting a payload and then storing it anywhere, without the need to maintain a relationship between the encrypted data and the original data.
Tokenization: Harder - Since tokenization is stateful, integrating with it may require careful consideration. When tokenizing PII fields and storing everything in an adjunct database, error handling is harder and might require a two-phase commit.

Which of the strategies to opt for is often a debatable topic.

Although there are contrarian views, neither encryption nor tokenization is generally superior to the other; they are meant to serve different purposes. Tokenization reduces the scope of sensitive data across the system, while encryption conceals data. The best approach is a careful evaluation of your data's nature, access patterns, and regulatory landscape.

For example, if you have a single database with structured data of your customer records, field level encryption for the PII fields is a great solution to raise the bar against data theft. However, if you have multiple locations holding the same PII data in the organization, tokenization might be a better choice, so you end up having a single copy of the sensitive data in exactly one place, and control it in a consolidated way from the data vault.
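
The single-copy idea behind tokenization can be illustrated with a toy, in-memory vault in Python. Everything here is a hypothetical sketch: the `TokenVault` class, the `tok_` prefix, and the record layouts are made up for the example, and a real vault would persist its mapping in a database protected by field level encryption and access controls.

```python
import secrets


class TokenVault:
    """Toy in-memory vault: the only place where tokens map back to values."""

    def __init__(self) -> None:
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenize(self, value: str) -> str:
        # Reuse the existing token so the same value always maps to one token.
        if value in self._value_to_token:
            return self._value_to_token[value]
        token = "tok_" + secrets.token_hex(16)  # random, carries no information
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token: str) -> str:
        return self._token_to_value[token]


vault = TokenVault()

# Two separate systems reference the same customer, but only by token.
crm_record = {"customer": vault.tokenize("alice@example.com"), "plan": "premium"}
billing_record = {"customer": vault.tokenize("alice@example.com"), "balance": 42}

# Only a caller with access to the vault can get the original value back.
original = vault.detokenize(crm_record["customer"])
```

Note the contrast with encryption: the token is not derived from the value, so the vault has to keep state, which is what makes integration and error handling harder.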

PII Encryption Techniques and Best Practices

Symmetric Encryption

A symmetric encryption methodology employs a single cryptographic key for both encryption and decryption of data. This makes it simple to manage and fast to implement. As a result, it is commonly preferred for workloads (such as real-time applications) that need quicker encryption and decryption of data. Therefore these workloads (software services) need direct access to the keys. While its simplicity is one of its major strengths, managing the keys requires consideration.

  • Key Distribution: Symmetric encryption's strength and vulnerability both lie in its key: if the key is compromised, the data's safety is compromised too. This places immense importance on how these keys are distributed and who has access to them. Although there are multiple methods to distribute encryption keys, each has its flaws.

For instance:

  • A secure channel might offer rapid transmission, but if there's a flaw in the protocol, it might be vulnerable.
  • Physical couriers remove many digital vulnerabilities but introduce tangible risks (e.g., keys getting intercepted, lost or stolen during transit).
  • Key Distribution Centers (KDCs) are efficient and secure, but they centralize the distribution process, which means that if the KDC is compromised, a lot of data can be at risk.

A solution to these challenges is utilizing a decentralized Key Management Service (KMS) that operates on a never-let-the-keys-out principle and ensures secure key handling without the need for physical distribution or centralizing risks. Instead of handing keys out to services, the KMS generates the keys, keeps them internally, and performs encryption and decryption on behalf of authorized callers. This approach helps to protect the keys from being compromised when one of the services that uses them is attacked. You can read more about the difference between a KMS and a data vault.

In all scenarios, your security posture is only as strong as the medium over which a key travels. Ensure it remains uncompromised when sharing the key.

And recently I heard from a friend that they implemented their own sensitive data field level encryption, and it was harder for the engineers to debug the system, and eventually the keys were found in some Slack channel!

So it’s time to mention that keys should never reach human users; you should design an organized secure mechanism (based on permissions) to access the data in an auditable and monitored fashion only.
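
One way to picture such a mechanism is a small component that holds the key privately, checks a permission policy on every call, and records each attempt. The Python sketch below assumes the third-party `cryptography` package; the `FieldCryptoService` class, the service names, and the policy are all hypothetical. In a real system this would be a separate, isolated service (a KMS or vault), since a private attribute in the same process is not a real security boundary.

```python
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM


class FieldCryptoService:
    """Hypothetical stand-in for a KMS-like component: callers never see the key."""

    # Assumed policy for the sketch: which caller may perform which operation.
    POLICY = {"signup-service": {"encrypt"}, "support-service": {"encrypt", "decrypt"}}

    def __init__(self) -> None:
        self._aead = AESGCM(AESGCM.generate_key(bit_length=256))  # key stays inside
        self.audit_log = []

    def _authorize(self, caller: str, operation: str) -> None:
        allowed = operation in self.POLICY.get(caller, set())
        # Every attempt is recorded, whether it succeeds or not.
        self.audit_log.append((caller, operation, "allowed" if allowed else "denied"))
        if not allowed:
            raise PermissionError(f"{caller} may not {operation}")

    def encrypt(self, caller: str, plaintext: str) -> bytes:
        self._authorize(caller, "encrypt")
        nonce = os.urandom(12)
        return nonce + self._aead.encrypt(nonce, plaintext.encode("utf-8"), None)

    def decrypt(self, caller: str, blob: bytes) -> str:
        self._authorize(caller, "decrypt")
        return self._aead.decrypt(blob[:12], blob[12:], None).decode("utf-8")


service = FieldCryptoService()
blob = service.encrypt("signup-service", "+1-555-0100")
phone = service.decrypt("support-service", blob)

try:
    service.decrypt("signup-service", blob)  # not permitted by the policy
    denied = False
except PermissionError:
    denied = True
```

Because callers only ever pass data in and out, there is no key to paste into a chat channel, and the audit log shows who asked for what.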

  • Key Rotation: A frequently overlooked but critical practice, key rotation ensures that even if a key is compromised, the window of vulnerability remains small. This is because a compromised key only exposes the data encrypted under it, so an attacker would need to compromise every key in use before they could access all of the data. There are many strategies for managing which keys are actively used to encrypt data. Regularly updating and rotating encryption keys minimizes potential exposure and keeps the encrypted data dynamic in its defense (a la moving target).
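
As one possible rotation strategy, each ciphertext can carry the version of the key that produced it, so old data stays readable until it is re-encrypted and the old key is retired. The Python sketch below assumes the third-party `cryptography` package; the `KeyRing` class and its one-byte version header are hypothetical choices made for brevity, not a standard format.

```python
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM


class KeyRing:
    """Keeps versioned keys; new data is always encrypted with the newest one."""

    def __init__(self) -> None:
        self._keys = {}
        self.current_version = 0
        self.rotate()

    def rotate(self) -> int:
        """Create a new key version and make it the active one."""
        self.current_version += 1
        self._keys[self.current_version] = AESGCM.generate_key(bit_length=256)
        return self.current_version

    def retire(self, version: int) -> None:
        """Destroy an old key once nothing is encrypted under it anymore."""
        del self._keys[version]

    def encrypt(self, plaintext: str) -> bytes:
        version = self.current_version
        header = bytes([version])  # 1-byte key version, enough for a sketch
        nonce = os.urandom(12)
        # The header is authenticated (but not encrypted) as associated data.
        body = AESGCM(self._keys[version]).encrypt(nonce, plaintext.encode("utf-8"), header)
        return header + nonce + body

    def decrypt(self, blob: bytes) -> str:
        header, nonce, body = blob[:1], blob[1:13], blob[13:]
        return AESGCM(self._keys[header[0]]).decrypt(nonce, body, header).decode("utf-8")

    def reencrypt(self, blob: bytes) -> bytes:
        """Decrypt with whichever key wrote the blob, encrypt with the current one."""
        return self.encrypt(self.decrypt(blob))


ring = KeyRing()
old_blob = ring.encrypt("123-45-6789")  # written under key version 1
ring.rotate()                           # version 2 becomes the active key
new_blob = ring.reencrypt(old_blob)     # migrate the stored value
ring.retire(1)                          # version 1 can no longer decrypt anything
```

After retirement, a leaked copy of key version 1 is useless against the migrated data, which is the shrinking window of vulnerability described above.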

Asymmetric Encryption

Asymmetric encryption is a great solution for many use cases, mostly PKI, blockchain, or exchanging keys in order to eventually do symmetric encryption over network protocols like TLS. However, we don't find it a good fit for the requirements of data encryption in production environments, mostly for performance reasons, as it is considerably slower than symmetric encryption. Some architects with less cryptography experience might think that having two separate keys is a good design if you split them across two services (one for encryption and another for decryption, which is normally smart for security-by-design, because it becomes harder to compromise both keys and therefore the data). But the hassle of maintaining more components isn't worth it in most cases. And to begin with, one should use a KMS or a vault that the keys never leave.

PII Encryption in Production Backend Systems

Given the complexity and diversity of databases today, as well as the volume of the data and its many uses in production backends, encryption practices need an added layer of robustness to safeguard these systems.

  • Data-at-Rest Encryption: Even while static and stored, PII must be encrypted. Consider leveraging solutions like Transparent Data Encryption (TDE) to ensure that data sitting in databases is encrypted on disk. It's better than nothing, and it checks the compliance box.
  • Data-in-Transit Encryption: As PII journeys through an organization's network, it must remain cloaked. Employ techniques such as Transport Layer Security (TLS), and make sure that you authenticate the certificate of the host too!
  • Field-Level Encryption: This is where a truly consolidated and practical strategy comes into play. Beyond just encrypting data as it rests or moves, encrypting data at the application level — where it's initially created or first processed — adds another layer of defense.
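
For the data-in-transit item above, authenticating the host's certificate is mostly a matter of not switching off the defaults. Here is a minimal sketch using Python's standard `ssl` module; the helper name `open_verified_connection` is hypothetical, and the minimum protocol version is an assumption chosen for the example.

```python
import socket
import ssl

# create_default_context() turns on certificate and hostname verification.
context = ssl.create_default_context()
context.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse legacy protocol versions


def open_verified_connection(host: str, port: int = 443) -> ssl.SSLSocket:
    """Connect over TLS; the handshake fails if the certificate doesn't match."""
    raw = socket.create_connection((host, port), timeout=10)
    try:
        return context.wrap_socket(raw, server_hostname=host)
    except Exception:
        raw.close()  # don't leak the socket when verification fails
        raise
```

The common mistake is the opposite of this sketch: disabling `check_hostname` or setting `verify_mode` to `ssl.CERT_NONE` to silence an error, which leaves the traffic encrypted but sent to an unauthenticated peer.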

How to Make Data Protection Based on PII Encryption Robust

  • Centralized Control: As organizations adopt various databases and data stores, managing encryption keys and monitoring each system becomes unwieldy. Field-level encryption implemented by a single secure component (that holds the keys too) offers centralized control over access policies and, more importantly, acts as a unifying layer on top of any database used in the system.

  • Ease of Implementation: With different databases requiring specific expertise for encrypting columns, developers often neglect this critical task. Field-level encryption standardizes the process, making it more accessible and thus more likely to be prioritized and picked up by different R&D teams.

  • Adaptation to Evolving Needs: Over time, new sensitive fields may be collected, and existing ones may change in importance. Data may also move to a new database solution for a specific type of data processing. A rigid, database-specific encryption strategy can cause these evolving needs to be neglected. By contrast, a separate layer of field-level encryption allows for flexibility and scalability, enabling organizations to adapt to changing requirements without a complete overhaul.

  • Aligning Security and R&D: Standardized encryption at the field level eases the collaboration between security and development teams. It facilitates a shared understanding and approach, rather than leaving security teams to chase R&D for piecemeal, database-specific implementations that are never prioritized in the end. It also helps when R&D holds the veto right to choose its own databases and security isn't part of that process. Shifting security to field-level encryption is a great win for both sides.

Data Protection Culture and Awareness

While foundational techniques such as data-at-rest and data-in-transit encryption may remain relevant, field-level encryption acknowledges the reality that PII encryption is an ongoing, dynamic process, instead of being a one-time task, as it must be implemented at the application level. But the big plus side is that now data is by far more secure than ever before.

The fragmented, database-centric approach to PII encryption has shown itself to be limited and often unscalable, even often uglifying the SQL queries that need to reach the database. Standardizing and decoupling PII encryption from database-specific implementations ensures that it remains an integral part of your overall data strategy. The most important reason for this shift is the recognition that PII encryption must be agile, adaptive and responsive to be widely used by the software development organization. More importantly, a standardized encryption approach ensures consistency across different databases and data stores and even teams, eliminating the need to manage multiple keys and monitor specific systems. 

Homomorphic Encryption for PII

Homomorphic encryption is distinguished by its ability to execute computations on encrypted data without decrypting it; the result of the computation is itself encrypted. For PII, this means enhanced privacy even during data processing, making it a potential game-changer in areas where data utility and privacy need to coexist. It is an especially good fit when data must be shared across businesses while preserving the confidentiality of data subjects. Despite the benefits, be sure to weigh its computational complexity, resource-intensive nature, and slow processing speeds, which can render it impractical for many real-time or large-scale applications. Note that in some industries it is not a viable solution, as the business owner must know who the customers are.
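
To give a feel for what computing on encrypted data means, here is a toy Python sketch of the Paillier cryptosystem, which is additively homomorphic: multiplying two ciphertexts yields an encryption of the sum of the plaintexts. The choice of Paillier is an assumption made for illustration (the scheme is only partially homomorphic), and the primes are deliberately tiny, so this is not secure and only works for values smaller than `N`.

```python
import math
import secrets

# Toy primes for readability only; real deployments need large random primes.
P, Q = 293, 433
N = P * Q
N_SQUARED = N * N
G = N + 1
LAMBDA = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P - 1, Q - 1)


def _l(x: int) -> int:
    """Paillier's L function: L(x) = (x - 1) / N."""
    return (x - 1) // N


MU = pow(_l(pow(G, LAMBDA, N_SQUARED)), -1, N)  # modular inverse (Python 3.8+)


def encrypt(message: int) -> int:
    """Randomized encryption: c = g^m * r^N mod N^2, with r coprime to N."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (pow(G, message, N_SQUARED) * pow(r, N, N_SQUARED)) % N_SQUARED


def decrypt(ciphertext: int) -> int:
    return (_l(pow(ciphertext, LAMBDA, N_SQUARED)) * MU) % N


def add_encrypted(c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % N_SQUARED


# A third party can total two encrypted salaries without seeing either one.
encrypted_total = add_encrypted(encrypt(52000), encrypt(61000))
total = decrypt(encrypted_total)
```

Only the key holder can decrypt `encrypted_total`; the party doing the addition never handles a plaintext value.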

Regular Audits and Compliance Checks

Regular audits are indispensable and underscore the resilience of your encryption implementation. With consistent evaluation of your methods against current industry standards and recognizing looming threats, you maintain a robust shield over your data security. It is also important to note that even widely accepted encryption implementations can sometimes be found to contain bugs or vulnerabilities. Keep these libraries up to date.

How to Choose the Right PII Encryption Tool for Your Business

Illustration on how to choose the right PII encryption tool for your business.

When choosing a PII encryption tool, it is important you find one that has everything you need. To help you make this decision, below is a list of critical aspects you want to be part of your PII encryption tool:

Performance

The selected tool should not hamper your system's efficiency. Consider its impact on system resources, latency, and the potential influence on user experience. For instance, consider its effect on latency during heavy data processing or the encryption/decryption speed in a high-volume environment. A tool that assures encryption without affecting performance aligns well with business continuity and sustains high productivity levels.

Data Access Granularity

Your encryption tool must allow specific control over different data segments. This enables precise management of who can access specific PII, enhancing targeted protection without hindering the accessibility of non-sensitive data. Practically, this translates to more streamlined processes where only relevant data is secured, minimizing unnecessary encryption overhead.

Developer Agility

Consider the integration simplicity and alignment with the development team's needs. Choose a tool that offers developer-friendly interfaces, automation capabilities, and compatibility with existing systems. The goal is to maintain development velocity, promote innovation, and simplify PII encryption across various application layers. The chosen tool should be able to quickly and easily encrypt PII data in your database, applications, and code. In addition, the tool should also support a variety of data types and formats, including structured data, unstructured data, and binary data.

Encryption Strength

Consider the tool's adaptability to modern cryptographic methods like AES-256. For instance, a tool that supports forward secrecy can ensure that even if one session's key is compromised, past sessions remain secure. Assessing encryption strength isn't just about current robustness; it also includes the tool's ability to evolve with emerging cryptographic standards.

Controls 

In a real-world business environment, the need for rapidly editing permissions, accessing logs, or managing users is critical. A tool that provides intuitive controls, such as implementing role-based access controls (RBAC) or utilizing SIEM solutions for logging, can enhance security oversight and allow flexibility in accommodating dynamic organizational needs.

Safety of Keys 

Key management is intricate and crucial. A tool that supports standards like KMIP and uses HSM-backed infrastructure can ensure optimal key safety through secure generation, rotation, and retirement processes.

Compliance Support

Finally, align your choice with the compliance requirements specific to your industry or region. Whether adhering to GDPR, HIPAA, or other regulatory frameworks, the tool must facilitate adherence to legal obligations. For instance, a tool capable of generating evidence of encryption per legal obligations (e.g., logs showcasing access controls and encryption methodologies) assists both in compliance verification and potential audits. 

Conclusion

The outlook for PII encryption in the modern cybersecurity landscape and cloud hosting era is bright. While PII encryption is gaining momentum as the frontline defense against security threats, it must be noted that over-reliance on encryption alone shouldn't cause organizations to overlook underlying vulnerabilities. As we all know, a single data leak or a bug in a web API can still be fatal. However, PII encryption is good both for compliance and for seriously neutralizing a whole class of attacks that are based on network penetration into the production environment, and from there into the database.

To really access the protected PII data, bad actors will now have to compromise the application itself, either by achieving remote code execution (RCE) or through SQL injection. The former is less likely to happen, while the latter, though it still exists, is mostly (but not perfectly) taken care of by security perimeter solutions like WAFs or, in the software, by DB drivers (ORMs). Overall, the risk of exposing the sensitive data is now practically reduced, and the impact of an inevitable data breach on your customers' privacy is largely neutralized, as the data becomes de-identified.

A good PII encryption infrastructure will enable organizations to quickly adapt to GDPR, CCPA and others to achieve a great level of data security. And perhaps, for the first time, a truce will happen between application-security teams and the R&D teams that have the upper hand on deciding which database technology to use. The relentless pursuit of data protection extends beyond any single solution. And the key to success is to consistently remain vigilant, adaptive, and proactive.
