Data security is a key priority for organizations of all sizes. The data we depend on is more vulnerable than ever because business and personal information is constantly being transmitted, stored, and accessed - exposing it to potential risks.
Companies and organizations must defend and protect their data from potential data leaks and breaches.
Just this month, General Electric confirmed that it had started an investigation into a data leak that exposed information related to confidential military projects that the company was working on. Earlier this year, Duolingo, the popular language learning app, suffered a big data leak that exposed the email addresses, usernames, names, and phone numbers of 2.6 million of its users.
Bahrain's flag carrier Gulf Air disclosed that it had been impacted by a data breach that may have exposed information from its client database, and the airline is still assessing the extent of the damage. In a different incident, more than 27,000 New York City Bar Association members and employees had their information exposed following a data breach of the organization's systems.
Companies face increasing pressure to be accountable for the data they collect and handle. At the same time, people are becoming more aware of their privacy, of the data they share and the risks associated with it, and they want that data protected.
Recent regulatory updates from the U.S. Federal Trade Commission and Securities and Exchange Commission have brought significant changes to the way businesses protect data. Their focus on transparency and stricter security measures has important practical implications for data storage, transfer, and use.
By recognizing the early signs of a data leak or data breach and taking proactive steps to prevent them, companies can protect their valuable assets and keep their customers’ information secure.
This article will explain the differences between a leak and a breach and will provide guidance on how to best prevent them.
What Is a Data Leak?
Data leaks are a prominent concern in the technology industry. Simply put, a data leak is the unintentional exposure of sensitive, protected, or confidential information outside its intended environment, without anyone breaking into the system. It occurs when valuable data, which might include personal, financial, or proprietary information, is exposed by mistake to individuals or entities who should not have access to it. Data leaks can result from human error, system vulnerabilities, faulty code, or misconfigured security settings.
Data leaks are commonly caused by organizations inadvertently exposing sensitive information without being hacked or breached: either an outside party pulls data that was left exposed, or the system itself sends it to the wrong external recipients.
A few weeks ago, a new report revealed that Microsoft's AI research team had accidentally leaked 38 TB of the company's private data. This happened when a Microsoft researcher shared a URL in a public Git repository. The exposed data included full backups of two employees' computers, which contained sensitive personal data, including passwords to Microsoft services, secret keys, and more than 30,000 internal Microsoft Teams messages from more than 350 Microsoft employees.
Businesses face a myriad of repercussions when a data leak occurs. It can damage customer trust and loyalty, resulting in lost revenue and a tarnished reputation. On top of all this, affected companies must invest significant resources (staff, time, money, etc.) to investigate the leak and repair the damage.
What Is a Data Breach?
A data breach is any security incident in which unauthorized parties act to gain access to sensitive data or confidential information, including personal data (Social Security numbers, bank account numbers, healthcare data) or corporate data (customer data records, intellectual property, financial information). It always starts with an attacker first breaking into the system and then stealing the data.
The term “data breach” is often used interchangeably with “cyberattack”, but not all cyberattacks are data breaches, and not all data breaches are cyberattacks.
The consequences of a data breach can be significant and wide-ranging (from financial losses to loss of customer trust). That is why in response to the growing threat of data breaches, organizations are increasingly implementing strong security measures, training employees on best practices, and regularly conducting risk assessments.
The cost of mitigation, including investigating the breach, providing credit monitoring services to affected customers, and implementing stronger security measures, can be exorbitant. According to IBM's Cost of a Data Breach 2022 report, the average cost of a data breach in the United States was USD 9.44 million, and 83% of the organizations surveyed had experienced more than one data breach. That figure includes costs such as customer notification, reputational damage, regulatory fines, and legal fees, and breaches involving the personal information of customers or employees can also carry serious legal consequences.
The difference between a Data Leak and a Data Breach
While data leaks and data breaches may seem similar, there are significant differences between the two. Understanding these distinctions is crucial to addressing these security incidents and mitigating their potential impact effectively.
Data leaks and data breaches both involve the unauthorized exposure of data, but the cause of the exposure determines which one it is. In general, a data leak is an accident, while a breach is usually intentional and malicious. Both can, however, stem from a deliberate attack on a system; what separates them is whether the attacker broke into the system (hence ‘breach’) or only managed to pull data while remaining on the outside (hence ‘leak’).
Data breaches are usually more sophisticated and intentional, with hackers aiming to steal, manipulate, or exploit sensitive information for personal gain or malicious purposes. These attacks can occur through various methods, including phishing attacks, malware, or exploiting vulnerabilities in network systems.
Prevention, prompt detection and response play a critical role in minimizing the impact of both data leaks and data breaches.
Where do Data Leaks and Data Breaches come from?
Several causes can potentially lead to compromised data and the severe consequences that follow. Understanding these causes is vital in preventing such incidents.
1. Human Error: The Costly Consequence of Imperfection
According to a study by IBM, 95% of cyber security breaches result from human error.
Employees are constantly bombarded with external noise that can divert focus away from critical tasks, resulting in oversights and mistakes.
2. Social Engineering Attacks: Phishing, Baiting and Pretexting
Many of us now work from home and use our work computers for personal activities, which is one reason the chances of a company being targeted by social engineering attacks (techniques for manipulating a target into revealing specific information for illegitimate purposes) are rapidly increasing. According to PurpleSec, 98% of cyberattacks rely on social engineering.
Unlike other cyberattacks that exploit software vulnerabilities, social engineering attacks exploit human psychology and trust. The most common forms of social engineering are phishing, baiting, and pretexting.
3. Ransomware Attacks: Holding your data hostage
Ransomware attacks have become a pervasive and costly threat to individuals, businesses, and organizations across the globe. This form of cyberattack involves malicious software that encrypts files on a victim's computer or servers, rendering them inaccessible until a ransom is paid to the attacker.
On October 25, for example, the personal information of employees was stolen in a ransomware attack targeting Yamaha Motor. Two weeks later, China's largest lender, the Industrial and Commercial Bank of China (ICBC), reportedly paid a ransom following a cyberattack by the ransomware group LockBit.
These attacks have proven to be a lucrative business for cybercriminals, fueling a wave of increasingly sophisticated techniques that often outpace the efforts of cybersecurity professionals. Companies must therefore proactively protect their most valuable asset, their customers’ sensitive data, by storing, managing, and protecting it with solutions that apply encryption, access control, and similar safeguards.
4. Weak Password Protection: your dog’s name is not good enough
Most of our sensitive information is stored online, so companies should ensure that only authorized employees can access sensitive data.
Strong password protection plays a crucial role in safeguarding that data.
One of the major issues with weak password protection is the use of simple and easily guessable passwords. Using common phrases or personal information like birth dates or names of loved ones also poses a significant risk, as hackers can easily guess them using publicly available information, such as social media profiles.
Another common mistake is reusing the same password across multiple accounts. If one service is breached, especially one that stores passwords un-salted or otherwise unprotected, attackers can recover the password and try it on every other account. This practice creates a domino effect, increasing the potential damage.
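On the storage side, salting means giving each stored password its own random salt before hashing. Salting does not stop attackers from reusing a password they already know, but it makes identical passwords produce different hashes, defeats precomputed lookup tables, and forces attackers to crack each stored hash separately. The following is a minimal sketch using only Python's standard library (`hashlib.pbkdf2_hmac`); the function names and the iteration count are illustrative assumptions, and a dedicated scheme such as scrypt or Argon2 may be preferable in practice.

```python
import hashlib
import hmac
import secrets

# Illustrative work factor (assumption): check current guidance, such as
# OWASP's Password Storage Cheat Sheet, before choosing a value.
ITERATIONS = 600_000


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest) to store; every call draws a fresh random salt."""
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(digest, expected)


if __name__ == "__main__":
    salt, stored = hash_password("correct horse battery staple")
    print(verify_password("correct horse battery staple", salt, stored))  # True
    print(verify_password("rex2015", salt, stored))  # False
```

Only the salt and digest are stored, never the password itself; `hmac.compare_digest` is used so the comparison time does not reveal how many bytes matched.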
Failing to update passwords regularly is another alarming issue. Many users tend to keep the same password for months or even years, making their accounts susceptible to password-cracking techniques.
5. Malicious Software: Infections in your system
Cybercriminals create and distribute malicious software (malware) to gain unauthorized access to computers, steal sensitive data, or disrupt digital systems. Common types of malware include viruses, worms, trojans, ransomware, and spyware. Once a system is infected, these programs can wreak havoc on unsuspecting users.
6. Insider Threats: Protecting Organizations from Hidden Dangers
Insider threats refer to the potential harm posed to an organization by individuals within its own ranks, including employees, contractors, or even trusted partners (Forrester predicted that insider incidents would cause 33% of data breaches in 2021).
Companies should be aware of the risks posed by internal sources such as disgruntled or malicious employees, and should have procedures in place to detect signs of insider threats and act before a breach occurs.
Understanding the three categories of insider threats can help organizations better prepare and protect themselves:
1. Accidental insiders - individuals who may unknowingly transfer confidential data or click on phishing emails that expose sensitive information
2. Negligent insiders - employees who often take cybersecurity measures lightly, fail to follow established protocols, and create vulnerabilities within the organization's infrastructure
3. Malicious insiders - individuals who intentionally exploit their insider position to cause harm to the organization
7. Unusual Activity in the Network: A Red Flag to Stay Vigilant
Internet-facing servers can be hacked and compromised, so monitoring unusual activity on the network and servers is crucial to safeguarding against potential breaches.
Another red flag is unauthorized access attempts. If company systems register numerous failed login attempts or suspicious login activities, it is crucial to take immediate action. Recognizing such attempts and implementing stronger access controls (like identity monitoring) can help thwart potential breaches.
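Spotting a burst of failed logins can be as simple as counting failures per source within a sliding time window. The following is a minimal sketch, assuming failed-login events arrive as a source identifier (for example an IP address) plus a timestamp in seconds; the class name and the default threshold of 5 failures in 60 seconds are hypothetical values to tune, not a recommendation.

```python
from collections import defaultdict, deque


class FailedLoginMonitor:
    """Flags a source that reaches `threshold` failed logins within `window` seconds.

    Hypothetical helper for illustration; the defaults are assumptions.
    """

    def __init__(self, threshold: int = 5, window: float = 60.0) -> None:
        self.threshold = threshold
        self.window = window
        # One queue of failure timestamps per source, oldest first.
        self._failures: defaultdict[str, deque] = defaultdict(deque)

    def record_failure(self, source: str, timestamp: float) -> bool:
        """Record a failed attempt; return True if the source should be flagged."""
        attempts = self._failures[source]
        attempts.append(timestamp)
        # Drop attempts that have fallen out of the sliding window.
        while attempts and timestamp - attempts[0] > self.window:
            attempts.popleft()
        return len(attempts) >= self.threshold

    def reset(self, source: str) -> None:
        """Clear the history for a source, e.g. after a successful login."""
        self._failures.pop(source, None)


if __name__ == "__main__":
    monitor = FailedLoginMonitor(threshold=3, window=60.0)
    for t in (0.0, 10.0, 20.0):
        flagged = monitor.record_failure("203.0.113.7", t)
    print(flagged)  # True: three failures within 60 seconds
```

A flagged source might then trigger an alert, a CAPTCHA, or a temporary lockout; which response fits depends on the system and is outside this sketch.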
8. Supply chain attack: A breach in the back door
This method of data breach is also known as a third-party attack, value-chain attack, or backdoor breach. It occurs when someone accesses a business’s network via third-party vendors or through the supply chain. Supply chains can be massive and complex, which is why some attacks are so difficult to trace.
Supply chain attacks are often overlooked, even though they can cause catastrophic damage over time and can be more difficult to detect and prevent if vendors do not maintain strict cybersecurity policies. Such was the case with the “SolarWinds hack”, which triggered a much larger supply chain incident that affected thousands of organizations, including the U.S. government, and went undetected for nearly 9 months.
When the leaks are part of the code
In addition to this long list of threats and problems, experienced developers know that faulty code can expose data to unauthorized parties without any obvious external indicators.
Data leaks that originate in code have become a recurring issue and can escalate into a full security incident, which makes leak prevention a key component of any data security strategy.
These leaks occur when developers unintentionally write buggy code. Developers are under heavy pressure to deliver new features and hit tight deadlines, and they are not necessarily security-savvy. As a result, they may inadvertently write code that leaks data in certain situations. That data then becomes accessible to unauthorized individuals or entities, affecting both the organization and its customers.
1. A common example is debug lines that write sensitive data to logging systems. These lines are then forgotten in the codebase and accidentally reach production, where they log sensitive customer data in live systems. This happened to GitHub with customer credentials.
2. Another example is unintended exposure of data by sharing it through third-party APIs. Developers integrate analytics, AI, or other services without the right approvals from stakeholders such as privacy, security, and legal teams. Such sharing may contradict the organization's privacy policy, or it may lack the specific user's consent to share her data with a third-party vendor.
3. A bigger problem lies in the public APIs that an internet-facing web application exposes to clients such as mobile apps and web pages. These APIs may lack the access checks that developers should have written in the first place, potentially leaking data that belongs to other customers (OWASP calls this Broken Access Control). The issue becomes more severe because external attackers can exploit it to expose customer data, as happened with Duolingo, mentioned earlier.
4. Last but not least are data leaks that happen when systems misbehave, for example when a software bug causes an automated email to one customer to include another customer's data. That is a data leak par excellence.
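Item 3 is easiest to see in code. The following is a minimal, framework-agnostic Python sketch of an object-level access check; the `Invoice` record, the in-memory `INVOICES` store, the `Forbidden` exception, and both function names are hypothetical stand-ins for a real API handler and database.

```python
from dataclasses import dataclass


@dataclass
class Invoice:
    invoice_id: int
    owner_id: int
    amount_cents: int


class Forbidden(Exception):
    """Raised when a caller asks for a record it may not read."""


# Hypothetical in-memory store standing in for a real database.
INVOICES = {
    1: Invoice(invoice_id=1, owner_id=100, amount_cents=4_999),
    2: Invoice(invoice_id=2, owner_id=200, amount_cents=12_000),
}


def get_invoice_unsafe(invoice_id: int) -> Invoice:
    # Leaky: any authenticated caller can read any invoice by trying IDs.
    return INVOICES[invoice_id]


def get_invoice(current_user_id: int, invoice_id: int) -> Invoice:
    invoice = INVOICES.get(invoice_id)
    # Same error for "missing" and "not yours", so callers cannot probe IDs.
    if invoice is None or invoice.owner_id != current_user_id:
        raise Forbidden(f"invoice {invoice_id} not available")
    return invoice


if __name__ == "__main__":
    print(get_invoice(current_user_id=100, invoice_id=1).amount_cents)  # 4999
    try:
        get_invoice(current_user_id=100, invoice_id=2)
    except Forbidden as exc:
        print("blocked:", exc)  # blocked: invoice 2 not available
```

The key design choice is that the ownership check runs on the server for every request, using the identity from the authenticated session rather than anything the client sends about itself.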
The most concerning aspect is that there are often no signs that a data leak has occurred, for example when data leaks through logs or through third-party APIs. These leaks can go undetected for extended periods, and this stealthy nature makes them attractive to cybercriminals seeking to exploit vulnerabilities or gain unauthorized access to sensitive information.
The problem with detection tools for such issues is that sampling data in logs or pipelines at runtime is technologically very challenging, not to mention expensive and prone to misses.
Conducting extensive code audits and penetration tests can help identify vulnerabilities before they are exploited. The challenge is that, by their nature, data leaks are very hard to spot, and finding them requires a savvy security engineer (often working manually). As data breaches and leaks continue to haunt companies, the hunt is on for a failsafe method to catch leaks before they explode into costly catastrophes.
An innovative technique for discovering leaks is to employ automated code analysis tools. These tools analyze the codebase for coding errors and security vulnerabilities, and they are designed to identify data exposure issues such as log leaks and third-party API leaks, even those that are hard to detect through code audits.
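To make the idea concrete, here is a deliberately small sketch of the kind of check such a tool might perform: it parses Python source with the standard `ast` module and flags logging or `print` calls whose arguments reference names that look sensitive. The hint list, the function names, and the name-matching heuristic are assumptions for illustration; real analyzers track how data flows across functions and files rather than matching identifiers, and this heuristic will produce both false positives and misses.

```python
import ast

# Hypothetical identifier fragments treated as sensitive (assumption).
SENSITIVE_HINTS = ("password", "passwd", "ssn", "token", "secret", "credit_card")
# Method names treated as log sinks, plus the builtin print.
LOG_METHODS = {"debug", "info", "warning", "error", "exception", "critical", "print"}


def _names_in(node: ast.AST) -> set[str]:
    """Collect variable and attribute names used anywhere inside an expression."""
    found = set()
    for child in ast.walk(node):
        if isinstance(child, ast.Name):
            found.add(child.id)
        elif isinstance(child, ast.Attribute):
            found.add(child.attr)
    return found


def find_log_leaks(source: str) -> list[tuple[int, str]]:
    """Return sorted (line, name) pairs where a log call receives a sensitive-looking name."""
    findings = set()
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Matches logger.info(...), logging.debug(...), and bare print(...).
        name = func.attr if isinstance(func, ast.Attribute) else getattr(func, "id", None)
        if name not in LOG_METHODS:
            continue
        for arg in [*node.args, *(kw.value for kw in node.keywords)]:
            for ident in _names_in(arg):
                if any(hint in ident.lower() for hint in SENSITIVE_HINTS):
                    findings.add((node.lineno, ident))
    return sorted(findings)


SAMPLE = '''
import logging
logger = logging.getLogger(__name__)

def login(user, password):
    logger.debug("login attempt %s %s", user.email, password)
    logger.info("user %s logged in", user.id)
    print(user.api_token)
'''

if __name__ == "__main__":
    print(find_log_leaks(SAMPLE))  # [(6, 'password'), (8, 'api_token')]
```

Run in a CI step or pre-commit hook, a check like this can catch a forgotten debug line before it reaches production, which is where the log leaks described earlier tend to slip through.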
By integrating these tools into the development process, companies and organizations can detect and fix leaks in code early on, thus minimizing the chances of them causing significant problems (and major cost) later. More importantly, they can stop being reactive regarding these issues and take control of their security strategy.
By implementing comprehensive monitoring systems, companies can track application performance, resource usage, and error logs in real-time. These systems provide valuable insights, enabling teams to identify potential leaks promptly. Analyzing logs and monitoring metrics also help in identifying patterns and trends, potentially before they escalate into larger problems.
What to do? Follow the data, continuously and proactively
To ensure that data security is taken seriously, organizations should have a comprehensive data security strategy in place. This strategy should encompass both preventive measures and policies for responding to an incident.
The following is a practical list of things you can do to proactively prevent leaks in your homegrown applications:
1. Use monitoring systems that can identify suspicious activity and traffic behavior in real time. In many companies this is still done ad hoc; one example is enabling log filtering for sensitive data in Datadog
2. Conduct regular code audits and penetration tests to detect weaknesses or vulnerabilities in public APIs
3. Use automated sensitive-data code analyzers to identify potential leaks early in the development process
4. Maintain an SBOM-style report showing data flows and data mapping, which helps developers understand how data moves through their systems and how to avoid potential leakage
5. Use backend infrastructure for accessing, storing, and protecting data that controls who has access to what data and how they are allowed to use it
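Item 1 mentions filtering sensitive data out of logs in Datadog. The same idea can also be applied inside the application, before log lines ever leave the process. The following is a minimal sketch using Python's standard `logging` module; it is not Datadog's feature, and the `RedactingFilter` class and its two regular expressions (email addresses and 13 to 16 digit card-like numbers) are assumptions chosen for illustration rather than a complete classifier.

```python
import logging
import re

# Illustrative patterns only (assumption); real deployments need broader,
# tested detection rules.
PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), "[REDACTED_EMAIL]"),
    (re.compile(r"\b(?:\d[ -]?){12,15}\d\b"), "[REDACTED_NUMBER]"),
]


class RedactingFilter(logging.Filter):
    """Rewrites each record's final message before a handler emits it."""

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()  # merges msg with its %-style args
        for pattern, replacement in PATTERNS:
            message = pattern.sub(replacement, message)
        # Store the sanitized text and drop args so it is not re-formatted.
        record.msg, record.args = message, None
        return True  # keep the record, just sanitized


if __name__ == "__main__":
    handler = logging.StreamHandler()
    # Attached to the handler, so records propagated from child loggers are covered too.
    handler.addFilter(RedactingFilter())
    logger = logging.getLogger("checkout")
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.info("order placed by %s with card %s", "jane@example.com", "4111 1111 1111 1111")
    # stderr: order placed by [REDACTED_EMAIL] with card [REDACTED_NUMBER]
```

Redacting at the source complements, rather than replaces, filtering in the logging platform: if a pattern is missed here, a downstream scanner still has a chance to catch it.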
By following these steps, companies can significantly reduce the risk of unintended data exposure in their own software and minimize the chance of a costly leak or breach that harms customers.
It all begins with the cloud, where applications are accessible to everyone; from the application's point of view, a user and an attacker look the same. Encrypting all data at rest and in transit might seem like a comprehensive approach, but these methods are no longer enough. For cloud-hosted applications, data-at-rest encryption does not provide the coverage one might expect: the application decrypts data transparently for every request it serves, so an attacker who abuses the application's own APIs still receives plaintext.
Senior Product Owner