“Your privacy is important to us,” says every privacy policy.
Yet we sometimes hear of companies that collect sensitive data (like social security numbers, credit card numbers, health information, PII, KYC info, etc.) and store it in plaintext in their databases. Oy vey. Never mind the separate question of whether there is a business justification for collecting it in the first place.
In this blog post we will discuss the attack vectors relevant to web applications that store sensitive data, going through the various layers of the tech stack. We will focus on application data security rather than general security. When building your own backend architecture, it’s better to know the relevant attack vectors and how to defend against them.
Application Security Essentials
Application level data security is the practice of protecting sensitive data from unauthorized access, use, modification, or destruction within a software application.
To build a secure system, we must understand security and stay aware of it. It all starts with security by design.
Security by Design
Designing a robust architecture is a challenge. Our goal is to make our system harder to penetrate, and to do that we have to understand the threats and attack vectors involved when designing a web application.
In a nutshell, security-by-design is the planning and implementation of a foundationally secure system, one that is harder to attack and damage. And that’s exactly what we’re going to explore in the next sections.
Security as a concept rests on a few principles that affect how a system is implemented, both at the code level and at the component-design level.
Throughout the rest of the article we will introduce more of these security-by-design principles.
Never Trust Any User Input
Never trust input received from any client. The input can be malicious and compromise your system. Sounds like sci-fi? Not really: a single SQL injection (read more below) can take down your system. And that’s only one example; there are many more, and we will go over them here.
Over the years, talking with engineers who are not security-savvy, I learned that they don’t see what’s so special about this principle, since they designed and implemented the client application or web app themselves. Then I realized the main difference: as security specialists we assume ‘no trust’, while they assume ‘full trust’ (because they built both sides). That assumption is what separates a secure system from an insecure one.
The point is this: anyone can impersonate your client and send you malicious payloads that harm your system or steal customer data.
It’s your job to sanitize, validate, or filter the input, or do whatever else is necessary, so that the rest of the system can trust it once it has been received at the endpoint. Sometimes this is technically very simple, but if you don’t know that you need to do it, you might leave the door open for intruders.
Remember: all sanitization and validation must happen on the server side! Otherwise malicious clients can simply bypass your defenses, if any exist.
Here’s a real story from a flight booking web app. You choose a flight and proceed to checkout to pay for it. However, the price on the checkout page was taken from the client side (from hidden HTML input parameters, because that was easier for the developer to implement), and the server never enforced it, not even a moment before charging the credit card. As a result, hackers could buy flights for $1. Check your input; never trust it!
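A minimal sketch of server-side validation in Python, assuming a hypothetical signup endpoint with `name` and `email` fields. The regular expressions are illustrative assumptions, not a complete definition of valid names or emails; the point is that the checks run on the server and that only allow-listed fields are passed on to the rest of the system.

```python
import re

# Hypothetical field rules for a signup endpoint; adapt them to your own schema.
NAME_RE = re.compile(r"[A-Za-z][A-Za-z '\-]{0,63}")
EMAIL_RE = re.compile(r"[^@\s]{1,64}@[^@\s]{1,255}")


class ValidationError(ValueError):
    """Raised when a client-supplied field fails server-side validation."""


def validate_signup(payload: dict) -> dict:
    """Validate untrusted input on the server and return only checked fields."""
    name = payload.get("name")
    email = payload.get("email")
    if not isinstance(name, str) or not NAME_RE.fullmatch(name):
        raise ValidationError("invalid name")
    if not isinstance(email, str) or not EMAIL_RE.fullmatch(email):
        raise ValidationError("invalid email")
    # Allow-list: any extra fields the client sent (e.g. "is_admin") are dropped.
    return {"name": name, "email": email}
```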
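The fix for the booking story is to treat any client-sent price as untrusted and recompute the amount on the server right before charging. A minimal sketch, assuming a hypothetical in-memory `FLIGHT_PRICES` catalog standing in for the real pricing source (normally the database):

```python
from decimal import Decimal

# Hypothetical server-side catalog; in a real app this comes from the database.
FLIGHT_PRICES = {"TLV-JFK-0915": Decimal("899.00"), "JFK-LHR-1230": Decimal("540.50")}


def charge_amount(flight_id: str, seats: int, client_price=None) -> Decimal:
    """Return the amount to charge, computed only from server-side data."""
    if flight_id not in FLIGHT_PRICES:
        raise ValueError("unknown flight")
    if not isinstance(seats, int) or not 1 <= seats <= 9:
        raise ValueError("invalid seat count")
    # client_price (e.g. from a hidden HTML input) is deliberately ignored:
    # the authoritative price is looked up again right before charging.
    return FLIGHT_PRICES[flight_id] * seats
```

A stricter variant could also reject the request (and log it) when `client_price` differs from the server's price, since a mismatch is a strong sign of tampering.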
Security Awareness
You can’t assume that nobody will learn (reverse engineer) your protocols or your undocumented-yet-public APIs and use them against you! That assumption is false.
Security by obscurity is the number one enemy of security by design. You probably know the ‘trust me, I’m an engineer’ t-shirt, so here it’s ‘trust me, I’m a security specialist’.
Once you accept this fact of life, you will look at your system and code differently forever. You will start to verify every input that reaches your software, whether it comes through API endpoints, files that need to be parsed, network protocols, or anything else from the outside world.
Storage Level Security
Data Encryption at Rest Isn’t Effective
Data encryption at rest is a security measure that is often used to protect sensitive data from unauthorized access. It is particularly effective at thwarting physical access attacks, such as when an attacker steals a database server or hard drive.
Unfortunately, it is still one of the most commonly required ‘check the box’ security features for compliance, and cloud-hosted databases provide it by default. However, it mostly misleads people into thinking their data is protected, while in truth it no longer adds much.
When data is encrypted at rest, it is scrambled using a cryptographic algorithm by the database engine itself. This makes it unreadable to anyone who does not have the decryption key. E.g. if an attacker steals the hard drive and accesses the database files directly, they will not be able to read the data without the decryption key.
The technical reason this protection mechanism is generally weak is that every time the database engine reads data from the disk, it automatically decrypts it and returns plaintext to the calling user (e.g. a web application). Anyone who can query the database therefore sees decrypted data.
This security feature is necessary, but definitely not sufficient; data encryption at rest is not a complete security solution. Attackers can still reach sensitive data by exploiting vulnerabilities in the database software (rare, but it happens), by compromising a database administrator account (which can still read the data), or through the web applications themselves.
Therefore, it is important to implement other security measures too, especially application-level encryption that can completely mitigate this weakness.
Database Level Security
Databases are servers like any other serving component in the tech stack. They have their own vulnerabilities and require upgrades from time to time too. Here, though, we want to concentrate on what actually lets attackers reach your customer data.
Credentials Theft
Obviously, if someone obtained the database credentials, it used to be game over (but not anymore, keep reading). The attacker could read all the data, or even modify and destroy it. Not to mention using a default configuration or exposing the database to the whole internet: researchers have reported tens of thousands of MongoDB databases open to the public with real customer data.
But that doesn’t have to be the case if we use application-level encryption techniques and reduce the permissions of database users (e.g. no user should be able to delete a table).
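Database servers express reduced permissions through role privileges (e.g. `GRANT`/`REVOKE` in PostgreSQL or MySQL). Python's built-in `sqlite3` has no users, so the sketch below only imitates the idea with `Connection.set_authorizer`, denying deletes and table drops on the connection handed to a hypothetical web app. Because SQLite is embedded, this check runs inside the application process; for the protection described here, the equivalent restriction must live in the database server itself.

```python
import sqlite3

# Operations the (hypothetical) web-app connection must never be allowed to run.
FORBIDDEN_ACTIONS = {sqlite3.SQLITE_DELETE, sqlite3.SQLITE_DROP_TABLE}


def web_app_authorizer(action, arg1, arg2, db_name, source):
    """sqlite3 authorizer callback: deny deletes and table drops, allow the rest."""
    if action in FORBIDDEN_ACTIONS:
        return sqlite3.SQLITE_DENY
    return sqlite3.SQLITE_OK


def open_web_app_connection(path: str = ":memory:") -> sqlite3.Connection:
    """Create a demo schema, then restrict the connection like a least-privilege user."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO customers (name) VALUES ('Dana')")
    # From here on, statements that delete rows or drop tables fail to prepare.
    conn.set_authorizer(web_app_authorizer)
    return conn
```

Reads keep working, while `DELETE FROM customers` or `DROP TABLE customers` raise `sqlite3.DatabaseError` ("not authorized").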
Network Level Security
Firewalls, web application firewalls (WAFs) and API security tools are all required technologies. But they will not stop attackers from trying to break into your web application: public APIs are designed to let untrusted payloads into your systems.
Networks have huge attack surfaces, from the public web application connected to the internet all the way to the backend's private network. Each segment requires its own threat modeling and should be isolated as much as possible.
Network Communications Between the Web Application and the Database
A man-in-the-middle (MITM) attack on the connection between a web application and the database is a type of attack in which an attacker intercepts and manipulates communication between the two systems. This can allow the attacker to steal sensitive data, such as user credentials and credit card numbers.
MITM attacks on database connections can be carried out in a number of ways. Common methods include DNS poisoning, ARP poisoning, or simply sniffing the traffic of an existing TCP connection.
Normally, this channel isn’t secured by default in the private networks supplied by public cloud providers.
There are a number of things that organizations can do to protect themselves from MITM attacks on database connections. One important step is to use a secure TLS connection between the web application and the database. TLS encrypts all traffic between the two systems, making it much more difficult for attackers to intercept and manipulate the data.
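Make sure TLS is configured correctly in your SQL driver and connection string: the client should verify the server's certificate and hostname, not just encrypt the traffic, otherwise a MITM can still present its own certificate.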
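A minimal sketch of what "configured correctly" can look like, assuming PostgreSQL and its libpq connection parameters (`sslmode`, `sslrootcert`); the host, database, user and certificate path are placeholders. With libpq, `sslmode=verify-full` checks both the server's certificate chain and its hostname, whereas `require` on its own does not guarantee that the server is who it claims to be. The `ssl` part shows equivalent settings for drivers that accept a Python `SSLContext`.

```python
import ssl


def strict_db_tls_context(ca_file=None) -> ssl.SSLContext:
    """Client-side TLS settings that authenticate the DB server, not just encrypt."""
    # create_default_context() sets CERT_REQUIRED and check_hostname=True;
    # ca_file would point at the CA that signed the database's certificate.
    ctx = ssl.create_default_context(cafile=ca_file)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def postgres_dsn(host: str, dbname: str, user: str, root_cert: str) -> str:
    """libpq-style connection string that refuses unverified servers."""
    return (
        f"host={host} dbname={dbname} user={user} "
        f"sslmode=verify-full sslrootcert={root_cert}"
    )
```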
Application Level Security
Application vulnerabilities are the worst, as there is no good way to fully automate their detection. Unfortunately, preventing them depends on the skills and awareness of developers, and writing secure code is hard. There are two lines of defense: one is a secure architecture (built on security-by-design principles) that reduces the collateral damage if someone manages to break in (they won't be able to hop from one service to another so easily); the other is code quality, since the code itself might have holes.
SQL Injections From Malicious Web-Application End-Users
“Over the last year, 5 percent of organizations had at least one exploitable SQL injection vulnerability.”
According to a recent report by Datadog, SQL injections are still prevalent in businesses.
SQL injection is a type of attack in which an attacker inserts malicious code into a SQL query. This can allow the attacker to steal sensitive data, modify data, or even destroy the database.
SQL injection stems from the fact that SQL is accessed programmatically by sending commands as text. Most of the time the web application concatenates strings and sends the final string to the database engine. An unaware developer takes an end-user’s input and inserts it into a pre-made query template without sanitizing or rejecting it. Given the implicit trust between the web application and the database, the database will then execute whatever operations the injected text describes.
# VULNERABLE: customerName and customerEmail come straight from the request
sql = "INSERT INTO users (name, email) VALUES ('%s', '%s')" % (customerName, customerEmail)
db.cursor().execute(sql)
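To see why the snippet above is dangerous, here is a minimal runnable sketch using Python's built-in `sqlite3` with an in-memory database and illustrative data. It uses a lookup query rather than an INSERT, because `sqlite3`'s `execute()` runs only one statement at a time, which makes a `SELECT` the clearer demo: a single quote in the input ends the string literal, and the rest of the input becomes SQL.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """In-memory table with two illustrative customers."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
    conn.executemany(
        "INSERT INTO users (name, email) VALUES (?, ?)",
        [("Alice", "alice@example.com"), ("Bob", "bob@example.com")],
    )
    return conn


def find_user_unsafe(conn: sqlite3.Connection, email: str) -> list:
    # VULNERABLE: the input is pasted into the SQL text, so quotes in it change the query.
    sql = "SELECT name FROM users WHERE email = '%s'" % email
    return conn.execute(sql).fetchall()


conn = make_demo_db()
print(find_user_unsafe(conn, "alice@example.com"))  # just Alice's row
print(find_user_unsafe(conn, "x' OR '1'='1"))  # every row: WHERE email = 'x' OR '1'='1'
```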
Once the attacker has injected malicious code into the SQL query, they can steal sensitive data, modify data, or even destroy the database. For example, the attacker could steal user credentials, credit card numbers, or other sensitive information. They could also modify data in the database, such as changing customer balances or deleting records. In some cases, the attacker could even destroy the database, making it unavailable to legitimate users.
Sanitizing and validating user input is the #1 defense against attackers when that input ends up in SQL queries; it’s easy, yet often overlooked. Alternatively, use prepared statements (parameterized queries). Even so, you may not want to store every character or payload you are given as is. (If you wonder why, to make a long story short: imagine you use the customer name in HTML pages and show it unfiltered to other users; an attacker can now inject a <script> tag into your pages and steal their cookies.)
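A sketch of both fixes, using `sqlite3` as a stand-in driver: parameterized queries keep the input out of the SQL text, and `html.escape` neutralizes it when it is rendered. `sqlite3` uses `?` placeholders; drivers such as psycopg use `%s`, but there the values are passed as a separate argument to `execute()`, never merged with the `%` operator as in the snippet above. The table and function names are hypothetical.

```python
import html
import sqlite3


def add_user(conn: sqlite3.Connection, name: str, email: str) -> None:
    # Placeholders: the driver sends the values separately from the SQL text,
    # so quotes or SQL keywords in the input are stored as plain data.
    conn.execute("INSERT INTO users (name, email) VALUES (?, ?)", (name, email))


def find_user(conn: sqlite3.Connection, email: str) -> list:
    return conn.execute("SELECT name FROM users WHERE email = ?", (email,)).fetchall()


def render_name(name: str) -> str:
    # Escape on output so a stored "<script>" is displayed as text, not executed.
    return "<span>" + html.escape(name) + "</span>"
```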
Missing Access Checks (OWASP’s BOLA/IDOR)
IDOR/BOLA attacks are a type of attack in which an attacker exploits weaknesses in access control mechanisms to gain unauthorized access to sensitive data. IDOR stands for insecure direct object reference, and BOLA stands for broken object level authorization. They are defined by the OWASP Foundation as a top vulnerability type for 2021.
These attacks can be carried out in a number of ways. For example, an attacker could tamper with the object ID of a resource in a request and gain access to that resource without having the proper permissions. Sometimes this takes nothing more than changing the relevant query parameter in a URL. In the example below, you can be logged in to a system, change the resourceId to another arbitrary value (by brute-forcing, or by iterating over sequential numbers), and the system returns information that belongs to another user, because the developer didn’t cross-check ownership between the active (session) user and the requested resource.
https://example.com/action.php?action=read&resourceId=100
Accessing resources that belong to other users is very dangerous, and the root cause is normally a lack of appropriate access checks. This type of attack is very common, and any public API serving mobile or web applications can be susceptible to it.
Mitigating this across the board is hard. Full coverage might require a data access layer (DAL) that enforces the checks in one place (à la defensive programming). Either way, employ object-level security techniques whenever you access the data.
Blocking such cross-user requests in your APIs is tricky, because doing it in the application code alone is not enough if you want to mitigate this attack hermetically. The next attack explains why.
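A minimal sketch of an object-level check, assuming a hypothetical `resources` table with an `owner_id` column and a user ID taken from the server-side session (never from the request). Putting the ownership condition in the same query as the ID lookup means there is no code path that fetches a row without checking who owns it, and answering "not found" for both missing and foreign rows avoids revealing which IDs exist.

```python
import sqlite3


class NotFound(Exception):
    """Raised for both 'does not exist' and 'not yours', so IDs can't be probed."""


def read_resource(conn: sqlite3.Connection, session_user_id: int, resource_id: int):
    # The row must match BOTH the requested ID and the session's user.
    row = conn.execute(
        "SELECT id, body FROM resources WHERE id = ? AND owner_id = ?",
        (resource_id, session_user_id),
    ).fetchone()
    if row is None:
        raise NotFound(resource_id)
    return row
```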
BTW - our data protection APIs have a mechanism to check data ownership at the right place in the code.
Remote Code Execution (RCE)
Remote code execution (RCE) is a type of attack in which an attacker gains the ability to execute arbitrary code on a victim's machine, here a web application server. In effect, the attacker gains complete control of the machine or application: the code runs with the same privileges as that system and can therefore access everything accessible to it by default. That is why it’s considered the worst of the attacks. It can be achieved by exploiting a vulnerability in the software application itself or in the operating system running the server.
Achieving RCE can be very simple (for example, against web applications written in PHP, Ruby, etc.), or it can require exploiting proprietary servers at the binary level (assembly, shellcode, etc.).
It’s true that RCEs don’t happen that easily, but they can also originate from the open source libraries or other frameworks you use. For example, to show the dumbest form of RCE, a developer might trust the end-user’s input and, for some reason, pass it to eval() on the server. Or your code might be vulnerable to a local file inclusion (LFI) attack that lets an attacker run arbitrary code on your web server.
RCE can also result from weak SSH passwords. Nowadays many crypto-miner bots scan the internet for accessible shell servers and try to guess their passwords. That should never happen in production environments, but you would be surprised.
There’s no easy way to mitigate RCEs in one move, as complicated systems have so many attack vectors. However, developers must apply one of the most important security principles: least privilege, that is, reducing what the web application is permitted to do (with data). Combined with (correctly implemented) object-level security, this can leave attackers unable to read much customer data from an operational database. Another powerful technique is to read masked data: when reading fields from the database, make sure the permissions allow reading only the masked version, so nobody can bypass the masking; often that form is also sufficient for displaying the data to the end-user.
We understand that it's hard to make your system immune to RCE: eventually no software is bug-free, and by the nature of software we can't prove that it is. Not to mention the amount of open source packages and the supply chain attacks we experience today. So we're left with the security principles of 'least privilege' and 'segregation of duties', both of which effectively reduce a component's access to resources. If attackers manage to compromise that component, they can't, for example, access all the data in the database to exfiltrate every end-user's information.
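Two small sketches of closing those particular doors, with hypothetical names: parse user-supplied structures with a data-only parser such as `json.loads` instead of `eval()`, and resolve any user-chosen file path against a fixed base directory before using it. `TEMPLATE_DIR` is an assumed location, and `Path.is_relative_to` requires Python 3.9+.

```python
import json
from pathlib import Path

# Hypothetical directory holding the only templates users may select.
TEMPLATE_DIR = Path("/srv/app/templates")


def parse_filter(raw: str) -> dict:
    """Parse a user-supplied filter as data; unlike eval(), json.loads never runs code."""
    value = json.loads(raw)  # raises ValueError (JSONDecodeError) on non-JSON input
    if not isinstance(value, dict):
        raise ValueError("filter must be a JSON object")
    return value


def safe_template_path(name: str, base: Path = TEMPLATE_DIR) -> Path:
    """Resolve a user-chosen file name and refuse anything outside the base directory."""
    root = base.resolve()
    # resolve() collapses ".." and follows symlinks; an absolute `name` replaces root.
    candidate = (root / name).resolve()
    if not candidate.is_relative_to(root):
        raise PermissionError(f"{name!r} escapes {root}")
    return candidate
```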
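A minimal sketch of masking that originates in the database, using an SQLite view over a hypothetical `people` table. In a server database you would then grant the web-app role `SELECT` on the view only, not on the base table, so the unmasked column is out of its reach; SQLite has no roles, so this sketch shows only the masking part.

```python
import sqlite3


def create_masked_schema(conn: sqlite3.Connection) -> None:
    conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, ssn TEXT)")
    # Masking happens inside the database: the view exposes only the last four digits.
    conn.execute(
        "CREATE VIEW people_masked AS "
        "SELECT id, name, '***-**-' || substr(ssn, -4) AS ssn FROM people"
    )


conn = sqlite3.connect(":memory:")
create_masked_schema(conn)
conn.execute("INSERT INTO people (name, ssn) VALUES (?, ?)", ("Dana", "123-45-6789"))
# The web app would only ever query the view.
print(conn.execute("SELECT name, ssn FROM people_masked").fetchall())  # [('Dana', '***-**-6789')]
```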
Zero Trust Architecture for DB & Web-App
No, it's not a buzzword. To deal with unknown attacks on your system, the less you trust and the more you verify at runtime, the safer it will be. For example, imagine an attacker managed to execute arbitrary code on your web server (via an RCE, or even a simple SQL injection). Normally, developers use a single SQL user that can execute any SQL query to read or write data in the database. This weak and unaware security design creates strong implicit trust between the web-app and the DB: the DB will serve whatever request it gets from the web-app, and it won't stop attacks or collateral damage.
Therefore, here's what you can do:
1. The connection between the web-app and the DB should first be hardened by tightening ACLs and reducing the operations the database user may perform (e.g. removing deletion permissions).
2. Use data masking that originates in the DB (so the web-app cannot bypass it) for the most sensitive data columns (like social security numbers).
3. Use row-level security to limit which data rows the web-app can read.
4. Use the end-user's JWT to scope which data rows the web-app can read, so an end-user can only ever access their own data rows.
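A minimal sketch of item 4, assuming HS256-signed tokens whose `sub` claim identifies the end-user, plus a hypothetical `orders` table with an `owner` column. Per the argument in this article, this check belongs in the data layer between the web-app and the data, not in the web-app itself. The hand-rolled verifier is for illustration only; production code should use a maintained JWT library.

```python
import base64
import hashlib
import hmac
import json
import sqlite3
import time


def _b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _b64url_decode(part: str) -> bytes:
    return base64.urlsafe_b64decode(part + "=" * (-len(part) % 4))


def make_hs256(claims: dict, key: bytes) -> str:
    """Demo helper that signs claims the way an identity provider would (HS256)."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    sig = hmac.new(key, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url_encode(sig)}"


def verify_hs256(token: str, key: bytes) -> dict:
    """Minimal HS256 check of algorithm, signature and expiry. Illustration only."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
    except ValueError:
        raise PermissionError("malformed token") from None
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise PermissionError("unexpected alg")
    expected = hmac.new(key, f"{header_b64}.{payload_b64}".encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise PermissionError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    if claims.get("exp", 0) < time.time():
        raise PermissionError("token expired")
    return claims


def orders_for_token(conn: sqlite3.Connection, token: str, key: bytes) -> list:
    claims = verify_hs256(token, key)
    # The row filter comes from the verified token, never from a request parameter.
    return conn.execute(
        "SELECT id, item FROM orders WHERE owner = ?", (claims["sub"],)
    ).fetchall()
```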
Protecting Data In Use
Use data tokens instead of the real data when appropriate and possible. Sometimes, even at runtime, unique personal identifiers can be fully replaced by a deterministic, non-sensitive string (a data token). A SQL join over two tables can then use these non-sensitive tokens, which effectively replace the original end-user identifier (a social security number, email, or phone number), without ever needing to fetch the real data. Configure the DB user's permissions so that it cannot fetch the sensitive column at all. In practice, this reduces a specific component's exposure to sensitive data!
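A minimal sketch of deterministic tokenization with an HMAC, assuming a hypothetical secret key held outside the database. The same identifier always yields the same token, so tables can be joined on tokens, while the token cannot be computed from the identifier without the key. Note that low-entropy identifiers such as SSNs become guessable if the key leaks, so the key needs at least the same protection as the data.

```python
import hashlib
import hmac
import sqlite3

# Hypothetical key; in practice it lives in a KMS or vault, never next to the data.
TOKEN_KEY = b"replace-with-a-real-secret"


def tokenize(value: str, key: bytes = TOKEN_KEY) -> str:
    """Deterministic token: the same identifier always maps to the same string."""
    normalized = value.strip().lower()
    digest = hmac.new(key, normalized.encode(), hashlib.sha256).hexdigest()
    return "tok_" + digest[:32]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_token TEXT PRIMARY KEY, tier TEXT)")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_token TEXT, total REAL)")
alice = tokenize("alice@example.com")
conn.execute("INSERT INTO customers VALUES (?, ?)", (alice, "gold"))
conn.execute(
    "INSERT INTO orders (customer_token, total) VALUES (?, ?)",
    (tokenize("Alice@Example.com"), 42.0),
)

# The join works on tokens alone; no email address is stored or fetched.
rows = conn.execute(
    "SELECT c.tier, o.total FROM orders o "
    "JOIN customers c ON c.customer_token = o.customer_token"
).fetchall()
print(rows)  # [('gold', 42.0)]
```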
Mitigations Must Be Implemented In The Server
Remember, everything has to be enforced on the server side. The client can never be trusted, because anybody can act as a client (attackers will reverse engineer your code and learn your protocols even if they're undocumented, not to mention that understanding REST APIs is super easy today).
Back to our zero-trust architecture for connecting to the DB: if the mitigations are not enforced by the database engine itself, an attacker who can run code will use the SQL connection directly to query whatever they wish. In other words, if the mitigation is implemented in the web-app itself, it breaks down; it becomes bypassable!
In every design, we need to understand who's the client and who's the server. In relation to a database, our web-app is the client. Therefore, if we want to limit access and exposure to sensitive data, the web-app can't be fully trusted!
To reduce trust in the web-app, all the permissions it uses should be lowered as much as possible. Being able to query a whole table with a single SQL query is not cool anymore, to say the least. And obviously, stripping deletion and schema modification permissions is non-negotiable.
The problem is that SQL databases don't provide this functionality out of the box in an easy way, and it might require you to develop it on top of it, and that doesn't scale when using multiple data store technologies.
Our data protection APIs support everything that is needed to really harden against the worst attacks. We support validating JWT ownership inside our engine.
Log Leaks
Another form of application-level privacy issue is debug logging that makes its way to production environments. These logs are very dangerous because they can leak customers' sensitive PII to other systems with wider exposure, breaking the boundary within which such data is supposed to be contained. Imagine the following line of code inside your application:
logger.log("Social security number of user (%s): %s", userId, userSSN);
And the log is picked up by Datadog APM (application performance monitoring), so now every employee with access to Datadog is exposed to this data.
Preventing this requires training developers and raising awareness about not logging customer data in production, as well as some means of finding such logging if it's forgotten in the code. There are automated ways to detect these leaks, like our very own Piiano Flows.
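Detection in the code is the main defense, but a runtime safety net can complement it. A minimal sketch using Python's standard `logging` module: a handler-level filter that masks SSN-shaped strings before they are written. The pattern covers only the `123-45-6789` format and is an assumption; this is not how Piiano Flows works, just an extra guard.

```python
import io
import logging
import re

# Assumed PII format: US SSNs written as 123-45-6789. Extend for your own data.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


class RedactSSNFilter(logging.Filter):
    """Replace SSN-shaped substrings in a record's final message before it is emitted."""

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()  # applies msg % args
        record.msg = SSN_RE.sub("***-**-****", message)
        record.args = None  # args are already merged into msg
        return True  # keep the record, now redacted


def make_logger(stream) -> logging.Logger:
    handler = logging.StreamHandler(stream)
    handler.addFilter(RedactSSNFilter())  # handler-level, so it also sees propagated records
    logger = logging.getLogger("app.demo")
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    logger.propagate = False
    return logger


buf = io.StringIO()
log = make_logger(buf)
log.info("Social security number of user (%s): %s", "u-17", "123-45-6789")
print(buf.getvalue(), end="")  # Social security number of user (u-17): ***-**-****
```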
Operating System Level Security
Everything runs on top of an operating system; the question is whether it’s transparent to you, not whether you care. If it isn’t transparent, you have extra legwork to do to keep your system up to date, and that is surprisingly hard to do at scale.
Patch Management
Operating systems are another important consideration, although they may be less relevant for fully managed software stacks like serverless/lambda or machines run by cloud providers. Operating systems still need to be maintained from time to time, whether it's to update the server that runs your application on top of it (such as Apache or Tomcat) or to update the OS itself.
In poorly designed systems, updating these layers can require downtime, which can lead to legacy systems that are out of date with security patches. These patches are essential for hardening the OS and closing vulnerabilities that are being exploited in the wild. Use modern tech stacks and this issue will vanish immediately.
Conclusion
Data breaches are hard to defend against. There are so many vulnerabilities that can eventually let attackers find their way into your customer data. New code, new bugs. However, with the right design, implementation and knowledge, it’s possible to build robust web applications that are immune to data theft, but that requires resourcefulness and prioritizing security engineering.
With our APIs, you don't need to be a data protection expert: you can start protecting your customer data in the backend with our data protection APIs. Sign up here to create your account now.
Note how data encryption is not enough by itself to thwart data theft. Controlling data is done by many levels and therefore requires stacking various protection mechanisms.
That’s why we designed Piiano Vault, our data protection APIs, to make all of this easy for backend and application developers: agnostic to your tech stack and databases, and built to address all of those attack vectors.
It all begins with the cloud, where applications are accessible to everyone, so technically there is no difference between a user and an attacker. Encrypting all data at rest and in transit might seem like a comprehensive approach, but these methods are not enough anymore. For cloud-hosted applications, data-at-rest encryption does not provide the coverage one might expect.
Senior Product Owner