Companies nowadays offer all kinds of services that use your personal data in order to provide personalised customer experiences. But to what extent is your personal data safe?
Systems can be breached and, when that happens, the data that was supposed to be private goes public and can result in all kinds of nefarious consequences.
The Ashley Madison website hack is the most recent case of leaked private data becoming a trending topic: everyone wants to know who the website's users are, what they desired and what they did or didn't do.
BinaryEdge, being a data and cybersecurity company, is naturally very interested in the topic of privacy (internally we have a motto: "Data is the new oil, privacy is the new currency").
In this article, we provide an analysis of the Ashley Madison data that uses data science to answer questions without jeopardising the privacy of the users of the A.M. website.
The Ashley Madison leak contained multiple ".dump" files. Each of these corresponded to a different table in the database.
The following is the description of the tables:
Our focus for this article was on the am_am_member table, because the aminno_member table felt very incomplete: many of its fields were essentially filled with NULL.
We also tried to focus on data points that have not already been analysed repeatedly by others, such as the email domains. You can look at our previous blog post for a good-looking visualisation of some of the data.
Limitations and assumptions
Some of these we imposed on ourselves; others were imposed by the data or by the actors involved.
None of our analysis may expose the users of Ashley Madison (at BinaryEdge, privacy is of the utmost importance and we do not condone the actions that were performed against the Ashley Madison website).
From reading other analyses, we knew that much of the data on female accounts might be tainted (we later confirmed this by looking at the emails and seeing that a subset of the female accounts used the "ashleymadison.com" domain).
The IP address fields quite often contained localhost or internal IP addresses (this also supported our previous assumption that female accounts were fake).
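Checks like these can be run on aggregate counts only, so no individual record ever needs to be printed. The following is a minimal sketch, assuming each record has been reduced to an (email, IP) pair. The helper names and the sample rows are hypothetical; only the "ashleymadison.com" domain comes from the data described above, and this is not necessarily how our own pipeline was written.

```python
import ipaddress
from collections import Counter

INTERNAL_DOMAIN = "ashleymadison.com"  # the domain mentioned in the text


def email_domain(email):
    """Return the lower-cased domain of an address, or None if malformed."""
    _, sep, domain = email.strip().rpartition("@")
    return domain.lower() if sep and domain else None


def is_non_public_ip(value):
    """True for loopback (localhost) or private/internal addresses."""
    try:
        ip = ipaddress.ip_address(value.strip())
    except ValueError:
        return False  # unparsable values are simply not counted
    return ip.is_loopback or ip.is_private


def summarise(records):
    """Count suspicious markers across (email, ip) pairs; aggregates only."""
    counts = Counter()
    for email, ip in records:
        counts["total"] += 1
        if email_domain(email) == INTERNAL_DOMAIN:
            counts["internal_email"] += 1
        if is_non_public_ip(ip):
            counts["non_public_ip"] += 1
    return counts


# Made-up rows for illustration, not taken from the leak.
sample = [
    ("x@ashleymadison.com", "127.0.0.1"),
    ("y@example.org", "8.8.8.8"),
    ("z@AshleyMadison.com", "10.0.0.5"),
    ("bad", "not-an-ip"),
]
summary = summarise(sample)
```

Because `summarise` returns only counters, the result can be shared or plotted without revealing which accounts were flagged.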
Below you will find the Ashley Madison growth over time across different regions:
We can observe that North America recorded slow growth until approximately 2008, at which point growth exploded and the website started to become more popular. In other regions, however, the website was pretty much unknown until around 2010, when it started to gain traction among users there.
Let's get to know the community that was active on Ashley Madison
The first question we posed was: what is the percentage of men vs. women?
This data means that there is a ratio of about 1 woman to 6 men in the Ashley Madison database.
Another important piece of data about the users of Ashley Madison website is the breakdown by age/region:
We can observe that the youngest users are from Oceania, more precisely from Australia and New Zealand.
We also found the average age for the males on the Ashley Madison website is 39 and for females 37.
There are also a couple of introductory pieces of data that are good to model so we get an idea of the physical characteristics of the users of the Ashley Madison website.
This is the weight, height and body type distribution of users:
As one can observe, an average male user claims to weigh 70-80 kg, to be 1.70-1.80 m tall and to have an average/medium body type. An average female user claims to weigh 50-60 kg, to be 1.60-1.70 m tall and also to have an average/medium body type.
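One way to arrive at ranges like these is to bucket each self-reported value into fixed-width bins and report the most common bin. The sketch below is illustrative only: the 10 kg and 10 cm bin widths, the `modal_bin` helper and the sample values are assumptions, not figures from the leaked data or a description of our exact method.

```python
from collections import Counter


def modal_bin(values, width):
    """Return (low, high) bounds of the most common fixed-width bin."""
    bins = Counter(int(v // width) * width for v in values)
    low, _ = bins.most_common(1)[0]
    return low, low + width


# Made-up self-reported values for illustration.
weights_kg = [72, 78, 65, 81, 75]
heights_cm = [172, 168, 179, 181, 176]

weight_range = modal_bin(weights_kg, 10)
height_range = modal_bin(heights_cm, 10)
```

Reporting only the modal bin keeps the output at the level of the whole population rather than any single user.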
One other variable in the database was ethnicity:
Although almost half of Ashley Madison users claim to be white, about 34.4% preferred not to reveal their ethnicity.
It is also important to look at the breakdown by world region:
It's clear that the majority of Ashley Madison users are North American.
One interesting field was the one that described the current "type of user":
In this graphic we can see that there is a much higher rate of "attached male seeking female" when compared to all other types, with the least common type being "male seeking males", part of the LGBT community. On the Ashley Madison website, for every 1 LGBT community member there are about 31 straight members. The LGBT community also constitutes 3.2% of the entire Ashley Madison community.
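A ratio and a percentage are two views of the same number, so the two figures can be cross-checked against each other. A ratio of 1 to 31 corresponds to a share of 1/32, or about 3.1%, which is close to the 3.2% quoted; the small gap is presumably due to rounding the ratio. The helper names below are hypothetical, and the sketch assumes every member falls into exactly one of the two groups.

```python
def share_from_ratio(minority, majority):
    """Convert a minority:majority ratio into the minority's share of the total."""
    return minority / (minority + majority)


def ratio_from_share(share):
    """Convert a share (0..1) into 'how many others per one member'."""
    return (1 - share) / share


lgbt_share = share_from_ratio(1, 31)       # share implied by the 1:31 ratio
others_per_lgbt = ratio_from_share(0.032)  # ratio implied by the 3.2% share
```

The same conversion applies to the gender ratio: about 1 woman to 6 men means women make up roughly one seventh of the accounts, again assuming only those two categories.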
A deeper look into the different choices of the users
We looked at what the different members were "Open to" on Ashley Madison. Here we look at the differences between single people and "attached" people, as one theory we have often read is that people go on these websites "when they are married and their partner is not willing to try different things" (we read this multiple times in some of the sentences when generating the word clouds for our previous blog post).
When the data science team first generated these visualisations, they sent them to some members of the company to try to understand how different people read them. We would like to do the same with our readers. Please share your opinions of these visualisations in the comments section and tell us exactly what you take from this data.
Internally we heard many different things such as:
"Asians are selfish"
"Aww Asian man are the most romantic" (bubblebath for 2 in Single Males)
"Aww Single males like kissing more than attached males"
And it went on and on.
The "Open to" data also paints some interesting pictures when broken down by country (in this case we picked only European countries).
One interesting comparison is single males vs. attached males on the "Nothing kinky" choice. Attached males seem more open to trying kinky things than single ones.
It's interesting to see how females in Luxembourg have some very specific choices.
Another perspective is to cross certain aspects of the "Open to" variable (without targeting a specific gender), like the ones seen here:
There is a good balance between those who like being or giving something and those who like receiving it.
Aside from preferences of "Open to", users could also select "Turns me on" and "Looking for" preferences. Below we can see these preferences of all users by world region.
“Sense of humor”, “Good personal hygiene”, “Disease free” and “Discretion/Secrecy” are what most excite Ashley Madison users. Another curious observation is that the “turns me on” preferences in Asia differ from those in other regions: users there prefer “Girl next door” and aren’t big fans of “body piercing”.
The top "looking for" preferences are "Travel" and "Music Lover" for all world regions. On the other hand, most Ashley Madison users are not fans of "On-line Games", "Cards" and "Opera". Furthermore, unlike users in the other world regions, Asian users seem to enjoy "Karaoke".
Another curious piece of data that was also released was the Ashley Madison transactions. We decided to plot their geographical distribution on a map over time:
The data leaked from the Ashley Madison "hook-up" website turned out to be a very interesting source of information. While it's important to remember that this analysis was based on data that any user could easily falsify, there are still some observations worth mentioning. For instance:
- The majority of the Ashley Madison users are North American attached males seeking females.
- The top "open to" preference among all Ashley Madison users is oral sex (both giving and receiving).
- Asia is the world region that exhibits the most differences in sexual preferences when compared to other world regions.
This kind of analysis, while very interesting from a sociological point of view, has as its main objective to remind companies to be very careful with how they look after their users' data, and to remind users to be even more careful about which data they give away and to whom... because once certain pieces of data get leaked, anyone can see them, take advantage of them or spread them at will.