Unfortunately, data breaches have become very common. In the first 6 months of 2019, there were more than 3,800 breaches exposing an estimated 4.1 billion records, an increase of 50% over the past 4 years.
So what is a data breach? The GDPR defines it as “a breach of security leading to the accidental or unlawful destruction, loss, alteration, unauthorised disclosure of, or access to, personal data transmitted, stored or otherwise processed”. In other words: any time an unauthorized outsider gains access to private data.
As a small step toward making our data more secure, we set out to find which types of email addresses appear most frequently in data breaches. When choosing an email address, is there anything you can do that makes you more vulnerable to a breach? For example, is a long or a short email address safer? What if you include numbers or your name in the address?
To explore these questions and more, we randomly selected a sample of 212,000 public email addresses and ran them through the free resource “Have I Been Pwned” (HIBP) to determine how many of them had been compromised in a data breach, and how many times each had been leaked.
Organizations such as Mozilla and 1Password trust HIBP to verify breaches. From there, we analyzed the results to identify trends by email provider and by the country/region associated with the address. We also looked at some more general questions, such as how the average number of breaches compares between long and short email addresses. Read on to learn more about what we found.
Comparing email domains: .com, .org, .net, etc.
The image shows the email domains most likely to be compromised
First, we compared email addresses by domain to determine the percentage of breached addresses for each. We found that .com email addresses have the highest percentage of breached addresses (80%) and the highest average number of breaches (33).
The second-highest domain is .uk, followed by .ca, with 63% and 59% of addresses breached respectively. The .us domain has the lowest share of breached addresses, at 29%.
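The two figures reported for each domain, the percentage of breached addresses and the average number of breaches, can be computed with a simple grouping step. The following is a minimal sketch, not the method actually used for the study: the input format (pairs of address and breach count), the function names, and the choice to average over all addresses rather than only breached ones are assumptions, and the sample data is made up.

```python
from collections import defaultdict


def tld_of(email: str) -> str:
    """Return the top-level domain of an address, e.g. 'com' for a@b.com."""
    domain = email.rsplit("@", 1)[1]
    return domain.rsplit(".", 1)[-1].lower()


def breach_stats_by_tld(records):
    """Group (email, breach_count) pairs by top-level domain.

    Returns {tld: (percent_breached, average_breaches)}.
    Assumption: the average is taken over every address in the group,
    including those with zero breaches.
    """
    counts = defaultdict(list)
    for email, breaches in records:
        counts[tld_of(email)].append(breaches)

    stats = {}
    for tld, values in counts.items():
        breached = sum(1 for v in values if v > 0)
        stats[tld] = (100 * breached / len(values), sum(values) / len(values))
    return stats


# Made-up sample data, for illustration only.
sample = [
    ("alice@example.com", 3),
    ("bob@example.com", 0),
    ("carol@example.co.uk", 1),
    ("dave@example.us", 0),
]
stats = breach_stats_by_tld(sample)
```

The same grouping works for email providers or country domains by swapping `tld_of` for a function that returns the provider or country code.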
Compare email providers: AOL, Gmail, Hotmail, MSN and Yahoo
The chart identifies the email providers most likely to be compromised
We looked at the 5 largest email providers: AOL, Gmail, Hotmail, MSN and Yahoo.
AOL users in the sample experienced by far the most breaches, and almost all AOL email addresses in the sample appeared in at least one breach.
Note that AOL predates Gmail by about 20 years, which may explain the distribution: older addresses have had more time to be caught up in breaches.
Gmail is the least affected, though about three-quarters of its email addresses still appeared in a breach.
How often each country/region appears in data breaches
The bar chart shows the country domains most likely to be compromised
We analyzed the domains of 10 different countries/regions: Australia (.au), Canada (.ca), China (.cn), Colombia (.co), Germany (.de), Ireland (.ie), New Zealand (.nz), Singapore (.sg), United Kingdom (.uk) and United States (.us).
Emails from German domains have the highest percentage of breached addresses (64%), followed by emails from the UK (63%). Emails from Chinese domains have the lowest percentage of breached addresses, only 8%, and an average of only 0.36 breaches.
Long emails or short emails: Which are more likely to appear in a breach?
For context, the average email address length in the sample is 9 characters, with a standard deviation of 3.6; we therefore set the standard range of email address length to 5 to 12 characters.
Email addresses with 1-4 characters are classified as short, and email addresses with 13 or more characters are classified as long. We found that short email addresses were breached more frequently: 71% of them were breached, compared with 62% of long addresses.
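The 5 to 12 range is consistent with taking the mean plus or minus one standard deviation (9 - 3.6 = 5.4 and 9 + 3.6 = 12.6) and rounding down, although that reading is an assumption. The sketch below reproduces the arithmetic and the short/standard/long classification. It also assumes that "length" means the part of the address before the @ sign; the function names are illustrative.

```python
import math

MEAN_LEN = 9    # average length reported for the sample
STD_LEN = 3.6   # standard deviation reported for the sample

# Assumption: the standard range is mean +/- one standard deviation,
# rounded down to whole characters.
LOW = math.floor(MEAN_LEN - STD_LEN)   # 5
HIGH = math.floor(MEAN_LEN + STD_LEN)  # 12


def local_part(email: str) -> str:
    """Return the part of the address before the @ sign."""
    return email.split("@", 1)[0]


def length_bucket(email: str, low: int = LOW, high: int = HIGH) -> str:
    """Classify an address as 'short', 'standard' or 'long'."""
    n = len(local_part(email))
    if n < low:
        return "short"
    if n > high:
        return "long"
    return "standard"


buckets = [
    length_bucket("bob@example.com"),              # 3 characters
    length_bucket("alice.smith@example.com"),      # 11 characters
    length_bucket("christopher.lee@example.com"),  # 15 characters
]
```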
Based on this, we can conclude that, in our sample, short email addresses are more likely to appear in breaches than long ones.
Male and female emails: Which are more likely to appear in breaches?
The graph shows which gender's emails are more likely to be breached
What about men and women? Which gender's emails are more likely to appear in breaches?
Approximately 55% of the emails in the sample contained names. We used a list of the top 1,000 male and female names to determine which category each email belongs to.
Analysis of the sample showed that email addresses with male names are more likely to be leaked, but the difference is only 1%.
Similarly, the average number of breaches for male and female email addresses is almost the same: 26.31 for men and 25.45 for women.
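One way to assign an address to a category from a name list is plain substring matching, sketched below. This is an illustration rather than the procedure used for the study: the tiny name sets stand in for the top-1,000 lists, the function name is hypothetical, and the rule that ambiguous addresses are left uncategorized is an assumption.

```python
# Placeholder lists; the study used the top 1,000 male and female names.
MALE_NAMES = {"james", "john", "robert"}
FEMALE_NAMES = {"mary", "linda", "patricia"}


def name_category(email: str):
    """Return 'male', 'female', or None for an email address.

    Naive substring matching on the part before the @ sign. Addresses
    matching both lists, or neither, are left uncategorized (assumption).
    """
    local = email.split("@", 1)[0].lower()
    has_male = any(name in local for name in MALE_NAMES)
    has_female = any(name in local for name in FEMALE_NAMES)
    if has_male and not has_female:
        return "male"
    if has_female and not has_male:
        return "female"
    return None


categories = [
    name_category("john.doe99@example.com"),
    name_category("mary_k@example.org"),
    name_category("xk42@example.com"),
]
```

Substring matching is crude: short names can match inside unrelated words, so a real analysis would likely need stricter rules.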
Looking further at the email addresses in the sample that contain names, we also identified the first names with the highest breach rate and the highest average number of breaches. The percentage of breached emails is the share of emails containing the name that appeared in at least one breach.
The average number of breaches is the average breach count across emails containing each name. We calculated these numbers for the most commonly repeated names (or parts of longer names) in the sample.
Email addresses containing the name “angels” have the highest percentage of breached emails, almost 95%. Those containing “Nehemiah” had the highest average number of breaches, at 180.
Please note that these results may be an artifact of data dredging and do not indicate any causality.
Role-based emails most likely to appear in breaches
The graph shows which roles have the most breached emails
Email addresses such as info@ or admin@ are usually used as general accounts for businesses, and multiple people in the organization typically access them on a regular basis. Does this make them more likely to be breached?
Overall, 62% of emails containing a role or category were breached, compared with 70% of emails without one.
The role-based email addresses support@ and admin@ have the lowest percentage of breached emails, both below 25%. Among role-based emails, the address with the highest breach rate is “acceptance rate@”, with 90% of these emails breached.
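The role versus non-role comparison can be sketched as a flag on each address followed by a breach-rate calculation per group. The role list, the exact-match rule on the part before the @ sign, and the function names below are all assumptions for illustration, and the sample data is made up.

```python
# Illustrative role list; the study's full list is not given in the text.
ROLE_NAMES = {"info", "admin", "support"}


def is_role_based(email: str) -> bool:
    """True if the part before the @ sign is exactly a known role name."""
    return email.split("@", 1)[0].lower() in ROLE_NAMES


def breach_rate(records, predicate):
    """Percentage of addresses satisfying `predicate` with at least one breach.

    records: iterable of (email, breach_count) pairs.
    Returns None when no address satisfies the predicate.
    """
    selected = [count for email, count in records if predicate(email)]
    if not selected:
        return None
    return 100 * sum(1 for c in selected if c > 0) / len(selected)


# Made-up sample data, for illustration only.
sample = [
    ("info@example.com", 2),
    ("admin@example.org", 0),
    ("jane.doe@example.com", 5),
    ("support@example.net", 0),
    ("sam77@example.com", 1),
]
role_rate = breach_rate(sample, is_role_based)
personal_rate = breach_rate(sample, lambda e: not is_role_based(e))
```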
Presence of numbers in email addresses
The chart shows the breach risk of including numeric strings in emails
Finally, we explored how the presence of numbers in an email address affects the percentage of emails that are leaked. We found that about 94% of emails with numbers have been compromised, compared with only 65% of addresses without numbers.
Stranger still: do well-known numeric strings such as 69, 420, and 123 affect your likelihood of being breached? In our sample, they did. Almost all email addresses in the sample containing one of these 3 numeric strings appeared in breaches, and their average number of breaches exceeds 40.
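Detecting digits and the three well-known strings is a simple pattern check. The sketch below assumes that only the part before the @ sign is inspected, which the text does not state; the function names are illustrative.

```python
import re

COMMON_STRINGS = ("69", "420", "123")  # the three strings named in the article


def _local(email: str) -> str:
    """Return the part of the address before the @ sign."""
    return email.split("@", 1)[0]


def has_digits(email: str) -> bool:
    """True if the part before the @ sign contains at least one digit."""
    return re.search(r"\d", _local(email)) is not None


def has_common_string(email: str) -> bool:
    """True if the part before the @ sign contains 69, 420 or 123."""
    local = _local(email)
    return any(s in local for s in COMMON_STRINGS)


checks = [
    (has_digits("sam123@example.com"), has_common_string("sam123@example.com")),
    (has_digits("sam7@example.com"), has_common_string("sam7@example.com")),
    (has_digits("sam@example123.com"), has_common_string("sam@example123.com")),
]
```

Note that a plain substring check also flags longer numbers that happen to contain one of the strings, such as a birth year ending in 69.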
In conclusion
What does all this mean for you? Our research offers several insights into how to design the “most secure” possible email address.
For example, the next time you set up an email account, you can avoid including numbers in your address, especially common numeric strings like 123. Although we cannot say with certainty, our findings suggest this may help reduce your chances of being caught up in a data breach.
Given the significant risks of data breaches, we hope the insights from this research will persuade Internet users around the world to be more cautious about how and where they share their email addresses.