Why shouldn’t security teams solve data protection alone?
The short answer: because they can’t. But let’s examine what’s going on and why that’s the case.
We’re actually seeing more and more of a shift left (more on this below) toward developers. That is, privacy and security business requirements come all the way down to the developers of the products that companies build.
- Partly, that’s because people are starting to understand that privacy is a higher priority than before, given the tailwind of the GDPR and CCPA, and that priority affects product requirements and, therefore, engineers too.
- Put another way: GDPR and CCPA requirements trickle all the way down to the architecture of products, and only the developers can actually implement them; no one else can.
- Security teams are normally a fraction of the size of software engineering groups; security engineers are often outnumbered 50 to 1 by software engineers. Inevitably, the security engineers end up chasing the software engineers, and their throughput is limited.
- Developers move much faster than anybody else in the organization. The whole software development industry is built around velocity: CI/CD, one-click deployment to production, and so on. Naturally, security engineers are left behind.
- Security engineers have limited access to production environments.
- Cloud environments change constantly; deploying yet another data store without privacy and security awareness makes security engineers’ work much harder.
The bottom line is that security organizations are too small and have limited engineering capacity. And since software should (must, considering today’s standards and threats) be built with security and privacy by design, using best practices, developers must take part in the effort.
Wait, but what’s shift left?
It means doing something from the very beginning of a process. For example, when you develop software, you’d normally start by writing a product requirements document and then designing the architecture. In that process, you want to take security and privacy requirements into consideration as early as possible; that’s also called ‘by design.’ If you picture a Gantt chart, ‘left’ is the beginning of the project on the timeline axis. In our case, shifting left also means that developers are part of the security and privacy efforts and take responsibility for them too.
Why are developers the new data protectors?
For homegrown applications in production environments, there is nobody else who can take care of the security and privacy requirements but developers. If companies wish to protect sensitive data and mitigate breaches, only the engineers can embed security and privacy measures. Security organizations may have engineers too, but normally they aren’t the ones building the product.
What data is sensitive? Are there any special requirements around it?
It really depends on the business use case, and it varies between companies and industries. Generally, as we focus on security and privacy, PII is an important part of sensitive data, and payment and health information usually is too. Sometimes, the mere link between a person and an organization is sensitive in itself: for example, the fact that a person has signed up for a diabetes newsletter should be classified as sensitive, and the reason for a business trip can be confidential information and thus classified as sensitive too.
Data requirements from the GDPR might also be relevant (learn more here), such as retention policies defining when to delete sensitive data, and honoring a user’s consent, or lack thereof, to processing or sharing their data.
What’s wrong with how it’s protected today?
- Generally speaking, it’s not really protected today. There’s little awareness of security and privacy requirements for sensitive data; the regulators’ push toward privacy and data protection is a big part of what’s now shaping that awareness. Only a few big companies practice ‘privacy by design.’ Otherwise, sensitive data sprawl is a huge problem, as systems aren’t built to protect the data and use it responsibly; they are built to work with the data as fast as possible and make it accessible and monetizable. But by building privacy-aware systems, you can achieve all business objectives while also preserving individuals’ rights.
- Technically, even SQL column encryption requires a lot of work and expertise, and it then limits what you can do with the data (searching over it becomes extremely slow).
- For example, data masking and the granularity of access to data stores aren’t designed to meet today’s needs. If someone manages to compromise a web server, they will pretty much be able to fetch all the data from the database instead of only partial information. Think of a social security number or credit card number that can be read in full from the database now exposed to attackers. Instead, the web server should be able to read only the last 4 digits, so even if the server is compromised, the data isn’t.
- As security experts, we don’t consider ‘encryption at rest’ a practical mitigation against attackers, since most attacks aren’t about physically stealing a hard drive. They are about accessing data through the systems, which renders ‘encryption at rest’ largely ineffective.
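To make the masking point above concrete, here is a minimal Python sketch of exposing only the last 4 digits of a sensitive value to the web tier. The function names are illustrative assumptions; a real deployment would enforce this at the datastore or access-control layer rather than in application code.

```python
import re

def mask_ssn(ssn: str) -> str:
    """Return an SSN with everything but the last 4 digits masked."""
    digits = re.sub(r"\D", "", ssn)
    if len(digits) != 9:
        raise ValueError("expected a 9-digit SSN")
    return "***-**-" + digits[-4:]

def mask_card(pan: str) -> str:
    """Mask a payment card number, keeping only the last 4 digits."""
    digits = re.sub(r"\D", "", pan)
    if len(digits) < 12:
        raise ValueError("unexpectedly short card number")
    return "*" * (len(digits) - 4) + digits[-4:]
```

With this in place, a compromised web server sees `***-**-6789` rather than the full number, because the full value never crosses into the web tier.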
What are the best practices to protect sensitive data in apps?
We wrote an entire article about this, with the example of working with a particularly sensitive piece of data – social security numbers – check it out here.
But just to name a few techniques: granular access controls, native data masking (at the datastore level), field-level encryption, auditing of accesses, omitting PII from data warehouses and from HTTP GET params (which might get logged where they shouldn’t be), making sure you don’t log PII in plaintext inside your application, and more.
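As one concrete illustration of the ‘don’t log PII in plaintext’ practice, here is a minimal Python `logging` filter that redacts a couple of common PII patterns before messages reach any handler. The pattern list is an illustrative assumption and far from exhaustive.

```python
import logging
import re

# Patterns for common PII; a real deployment would need a broader set.
PII_PATTERNS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),          # US SSN
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "[EMAIL]"),  # email address
]

class PIIRedactingFilter(logging.Filter):
    """Redact PII from log messages before any handler emits them."""
    def filter(self, record: logging.LogRecord) -> bool:
        msg = record.getMessage()
        for pattern, replacement in PII_PATTERNS:
            msg = pattern.sub(replacement, msg)
        # Freeze the redacted message onto the record.
        record.msg, record.args = msg, None
        return True

logger = logging.getLogger("app")
logger.addFilter(PIIRedactingFilter())
```

Attaching the filter to the logger (or to a handler) means every log line is scrubbed centrally, instead of trusting each call site to remember.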
Why do I need to tokenize data?
Tokenizing sensitive data can be critical in the right situations and a real business enabler. But first, let’s understand its value: it allows you to work on data without revealing the person behind it, while still being able, when needed, to identify a specific person.
Suppose you need to manually analyze data for fraud detection without revealing who the data belongs to, so that no person can be identified by examining the data alone, thereby preserving the individual’s privacy rights. However, sometimes you would still need to contact a specific person behind that (tokenized) data. Tokenizing the identifiers before they go to the data warehouse is a great way to make sure data scientists and analysts, or anybody else, can’t see to whom the data belongs, while still being able to reverse the tokens in specific situations, like finding out who a person is in order to contact them.
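That flow can be sketched with a minimal in-memory token vault, assuming a simple dict store; a real vault would be a hardened, access-controlled, audited service. Identifiers are swapped for random tokens before data reaches the warehouse, and a separate, authorized path can reverse them.

```python
import secrets

class TokenVault:
    """Toy token vault: random tokens in, original values back out.

    Tokens carry no information about the value they replace, so
    warehouse rows holding only tokens can't identify anyone.
    """
    def __init__(self) -> None:
        self._token_to_value: dict[str, str] = {}
        self._value_to_token: dict[str, str] = {}

    def tokenize(self, value: str) -> str:
        # Stable token per value, so joins across rows still work.
        if value in self._value_to_token:
            return self._value_to_token[value]
        token = "tok_" + secrets.token_hex(16)  # random, unrelated to value
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token: str) -> str:
        # In production this path would be tightly authorized and audited.
        return self._token_to_value[token]
```

An analyst working on tokenized rows sees only `tok_…` strings; when fraud is confirmed, an authorized service calls `detokenize` to recover the contact details.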
When would I want to remove some PII, and why?
We’re talking about pseudonymizing data: once it’s done, the remaining data alone can’t identify the data subjects (the people behind it).
Tokenizing or omitting some data can therefore become a business enabler. For example, many companies let data scientists access data warehouses that also contain sensitive information such as PII, so the people behind the data can potentially be identified, which effectively fails to preserve their privacy.
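A minimal sketch of that idea, with illustrative field names and key handling (in production the key would be a managed secret): direct identifiers are dropped before events ship to the warehouse, and `user_id` is replaced with a keyed pseudonym so rows can still be grouped per user without naming the user.

```python
import hashlib
import hmac

# Illustrative set of direct identifiers to strip from warehouse rows.
PII_FIELDS = {"name", "email", "phone", "ssn"}
PSEUDONYM_KEY = b"demo-key"  # assumption: a managed secret in production

def pseudonymize(event: dict) -> dict:
    """Drop direct identifiers and replace user_id with a keyed pseudonym."""
    out = {k: v for k, v in event.items() if k not in PII_FIELDS}
    if "user_id" in out:
        out["user_id"] = hmac.new(PSEUDONYM_KEY,
                                  str(out["user_id"]).encode(),
                                  hashlib.sha256).hexdigest()[:16]
    return out
```

Analysts can still count events per pseudonymous user, but the warehouse rows alone no longer say who that user is.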
Handling Sensitive Data
If you want to learn more about why developers are the new data protectors, please watch the webinar with our friends from Permit.io.