BinaryEdge - Science and Technology




August 2015

Data, Technologies and Security - Part 1

by BinaryEdge

A lot of technologies present themselves as solutions for multiple challenges. At BinaryEdge, we are big believers in analyzing all the different technologies until we find the ones that fit our environment. From a security perspective, most of these technologies are not meant to be exposed to the internet.

The default settings of these technologies tend to have no configuration for authentication, encryption, authorization or any other type of security control that we take for granted. Some of them don't even have built-in access control.

This blogpost looks at the internet exposure of these technologies: how they are spread around the world, which versions are in use, and other information that we considered interesting.

So which technologies did we look at? Redis, MongoDB, Memcached and Elasticsearch.

We chose these technologies because they are a subset of the technologies that we use ourselves.


Redis is a key-value cache and store. It is a very well-known and widely used technology. Worldwide, we found 35,330 instances that answered our requests without requiring any type of authentication. Redis's default configuration doesn't set any type of authentication and listens on all network interfaces, as stated in the configuration file:


For Redis we looked at versions, amount of current memory use (current quantity of data available for access) and peak memory use (the most data that was exposed at a certain point in time):
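As a rough illustration of where these three values can come from, the sketch below parses the text reply of the Redis INFO command, which reports fields such as redis_version, used_memory and used_memory_peak. The post does not say exactly how the probes collected the data, so treat this as an assumption; the function names and the sample reply are hypothetical.

```python
def parse_redis_info(raw: str) -> dict:
    """Parse the text returned by the Redis INFO command into a dict."""
    fields = {}
    for line in raw.splitlines():
        line = line.strip()
        # Skip blank lines and section headers such as "# Memory".
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value
    return fields


def summarize(fields: dict) -> dict:
    """Keep only the three values discussed in the text."""
    return {
        "version": fields.get("redis_version"),
        "used_memory": int(fields.get("used_memory", 0)),
        "used_memory_peak": int(fields.get("used_memory_peak", 0)),
    }


# Hypothetical sample reply, not taken from any scanned server.
SAMPLE = (
    "# Server\r\n"
    "redis_version:2.8.4\r\n"
    "\r\n"
    "# Memory\r\n"
    "used_memory:1048576\r\n"
    "used_memory_peak:2097152\r\n"
)

summary = summarize(parse_redis_info(SAMPLE))
```

Summing used_memory and used_memory_peak over all responding instances would give totals of the kind shown in this section.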

Worldwide Redis is found distributed as follows:


Current memory (bytes):


Peak memory (bytes):


These numbers are represented in bytes, which means:

  • Current memory is 13.2133 Terabytes;
  • Peak Memory is 17.0801 Terabytes.

That is a lot of data.
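The conversion from bytes to Terabytes is a single division. The post does not state whether it uses decimal Terabytes (10^12 bytes) or binary ones (2^40 bytes), so the sketch below supports both and leaves the choice to the caller; the function name is hypothetical.

```python
def bytes_to_terabytes(n_bytes: int, binary: bool = False) -> float:
    """Convert a byte count to Terabytes.

    binary=False uses 10**12 bytes per Terabyte (decimal, SI).
    binary=True uses 2**40 bytes per Terabyte (often written TiB).
    """
    divisor = 2 ** 40 if binary else 10 ** 12
    return n_bytes / divisor


one_decimal_tb = bytes_to_terabytes(10 ** 12)
one_binary_tb = bytes_to_terabytes(2 ** 40, binary=True)
```

The two conventions differ by roughly 10% at this scale, which matters when comparing totals from different sources.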

We also looked at the spread of versions of Redis that were found in the wild - this is how it looks:


Two interesting facts are the most used version and the number of instances running versions lower than 2.6. If we look at the Redis website:


On top of that we have this.

Another interesting part is the following advice taken from redis security.


For information on how to securely configure Redis please read: redis security.
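As a small sketch of what checking such a configuration could look like, the function below scans the text of a redis.conf file for the two directives most relevant to the exposure described above: bind (which interfaces Redis listens on) and requirepass (the authentication password). The function name and the finding messages are hypothetical, and the "no bind directive" finding assumes the default behaviour described in this post, where Redis listens on all interfaces.

```python
def audit_redis_conf(conf_text: str) -> list:
    """Return a list of findings for a redis.conf given as text."""
    bind_addresses = []
    has_password = False
    for line in conf_text.splitlines():
        line = line.strip()
        # Commented-out directives do not count as configured.
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        directive, args = parts[0].lower(), parts[1:]
        if directive == "bind":
            bind_addresses.extend(args)
        elif directive == "requirepass" and args:
            has_password = True

    findings = []
    if not bind_addresses:
        findings.append("no bind directive: listens on all interfaces")
    elif any(addr in ("0.0.0.0", "::") for addr in bind_addresses):
        findings.append("bind includes a wildcard address")
    if not has_password:
        findings.append("requirepass is not set: no authentication")
    return findings


# Hypothetical configurations for illustration.
open_findings = audit_redis_conf("# bind 127.0.0.1\nport 6379\n")
closed_findings = audit_redis_conf("bind 127.0.0.1\nrequirepass s3cret\n")
```

A check like this only covers the configuration file; firewall rules and network placement still need to be reviewed separately.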


MongoDB is a NoSQL database that sells itself as highly scalable, performant and agile. While we've used it in the past and can agree with parts of that description, the number of MongoDB servers that are exposed is worrisome.

Worldwide, we found 39,134 MongoDB server instances that answered our requests without requiring any type of authentication. We also found 7,267 instances that did have some kind of authentication enabled.

For MongoDB, we looked at two data use measurements: "SizeOnDisk" (which is essentially data on server with compression) and "TotalSize" (data without any type of compression).

The worldwide spread for MongoDB looks as follows:


Total Size (bytes)


As usual, this number looks too big and complicated. This is what it means in a more human-readable form:

  • Total Size is 619.803 Terabytes

One important fact about this value is that MongoDB includes pre-allocated space in the total size, so in terms of real data the number should be a bit lower.

One interesting thing we looked at was database names:


In case it isn't clear in the image, there is one disturbing fact that can be extracted from this list. We will help you identify it by showing you the following image:


At the time of this writing, someone has been connecting to MongoDB servers and creating databases with the name "DELETED_BECAUSE_YOU_DIDNT_PASSWORD_PROTECT_YOUR_MONGODB". Specifically, 347 different IPs had a database with this name.
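Counting the affected IPs comes down to checking each server's database list for that name. The sketch below assumes the results are stored as a mapping from host to a reply shaped like the output of MongoDB's listDatabases command (a "databases" list whose entries carry "name" and "sizeOnDisk"); the storage layout, function name and sample data are hypothetical and use documentation-only IP addresses.

```python
MARKER = "DELETED_BECAUSE_YOU_DIDNT_PASSWORD_PROTECT_YOUR_MONGODB"


def hosts_with_marker(replies_by_host: dict) -> list:
    """Return the hosts whose database list contains the marker name."""
    flagged = []
    for host, reply in replies_by_host.items():
        names = {db["name"] for db in reply.get("databases", [])}
        if MARKER in names:
            flagged.append(host)
    return sorted(flagged)


# Hypothetical scan results, not real data.
SAMPLE = {
    "192.0.2.10": {"databases": [{"name": "local", "sizeOnDisk": 83886080}]},
    "192.0.2.11": {
        "databases": [
            {"name": "local", "sizeOnDisk": 83886080},
            {"name": MARKER, "sizeOnDisk": 0},
        ]
    },
    "198.51.100.7": {"databases": []},
}

flagged_hosts = hosts_with_marker(SAMPLE)
```

The length of the returned list is the per-IP count quoted above.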


Another interesting view is a breakdown of database sizes by IP.

(In the following image we do not show the IP addresses leaking the data as we do not reveal this type of information.)


As we can see, different IPs have different databases with different sizes. The first column (representing the first IP address) has fewer databases than the fourth column, but these are much bigger and expose a lot more data.

We can also look at the breakdown by version for MongoDB:


Just to end the MongoDB section, we can look at which countries are leaking the most data via MongoDB servers:


For information on how to securely configure MongoDB please read: MongoDB security.


Memcached is a general-purpose distributed memory caching system. It is often used to speed up dynamic database-driven websites by caching data and objects in RAM to reduce the number of times an external data source (such as a database or API) must be read. You can also find it alongside Couchbase installations.

We have found 118,574 instances of Memcached online.

Worldwide, the distribution of Memcached instances looks as follows:


The total data exposed by the Memcached servers is the following in bytes:


That is 11.347 Terabytes.
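One way to obtain a per-server figure like this is the reply to Memcached's text-protocol "stats" command, which returns lines of the form "STAT name value" terminated by "END". The post does not say which statistic was summed; the sketch below assumes the "bytes" stat (bytes currently used to store items) and uses a hypothetical sample reply.

```python
def parse_memcached_stats(raw: str) -> dict:
    """Parse a Memcached text-protocol 'stats' reply into a dict of strings."""
    stats = {}
    for line in raw.splitlines():
        parts = line.strip().split(" ", 2)
        # Only "STAT <name> <value>" lines carry data; "END" closes the reply.
        if len(parts) == 3 and parts[0] == "STAT":
            stats[parts[1]] = parts[2]
    return stats


# Hypothetical sample reply, not taken from any scanned server.
SAMPLE = (
    "STAT version 1.4.13\r\n"
    "STAT bytes 4096\r\n"
    "STAT curr_items 12\r\n"
    "END\r\n"
)

stats = parse_memcached_stats(SAMPLE)
stored_bytes = int(stats["bytes"])
```

The same reply also carries the "version" stat, which is enough to build a version distribution.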

Version-wise, we found the following distribution:

While it's good to see that at least a lot of Memcached installations are on 1.4.*, it's disappointing to see how many are exposed to the web and how much data is leaking.

For information on how to securely configure Memcached please read: How to get memcached secure.


Elasticsearch, as Wikipedia describes it, "is a search server based on Lucene. It provides a distributed, multitenant-capable full-text search engine with a RESTful web interface and schema-free JSON documents." At BinaryEdge, we are big fans of Elasticsearch and we use it regularly.

We found 8,990 instances of Elasticsearch online that replied to our probes.

Worldwide, the spread looks as follows:


For Elasticsearch this is the version spread:


One interesting fact that we can take from the version spread for Elasticsearch is that there are still a couple of servers out there older than version 1.4.3. This is worrying because all versions before 1.4.3 are vulnerable to CVE-2015-1427, which allows attackers to use the API to gain remote code execution on these platforms.
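The version rule above can be applied mechanically to the version strings collected from each instance. The sketch below follows the rule exactly as stated in this post (anything before 1.4.3 is flagged); the function names are hypothetical, and pre-release suffixes such as ".Beta1" are simply ignored, which is a simplifying assumption.

```python
FIXED_IN = (1, 4, 3)  # first version not flagged, per the rule in the text


def parse_version(version: str) -> tuple:
    """Turn '1.4.2' into (1, 4, 2), ignoring any non-numeric suffix parts."""
    numbers = []
    for part in version.strip().split("."):
        if not part.isdigit():
            break
        numbers.append(int(part))
    if not numbers:
        raise ValueError("unparseable version: %r" % version)
    return tuple(numbers)


def is_flagged(version: str) -> bool:
    """True if the version is older than 1.4.3."""
    return parse_version(version) < FIXED_IN


# Hypothetical version strings for illustration.
results = {v: is_flagged(v) for v in ["0.90.13", "1.4.2", "1.4.3", "1.5.0"]}
```

Comparing tuples of integers avoids the classic mistake of comparing version strings lexically, where "1.10.0" would sort before "1.4.3".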

In terms of data, Elasticsearch was also an interesting find. We found this many bytes exposed:


As before, after converting this to Terabytes, it comes down to 531.199 TB worth of data exposed.

With Elasticsearch, we also looked at other stats that may not be interesting from a security perspective, but here at BinaryEdge we love data of all different types.

Breakdown by CPU type:


As expected (ES is quite CPU intensive), most of the instances are running on Intel Xeon CPUs.

And breakdown by Java version used:


This last breakdown raises a list of concerns when cross-referenced with this list.

For information on how to securely configure Elasticsearch please look into: Elastic Shield.


  • A total close to 1,175 Terabytes (or 1.1 Petabytes) of data was found exposed online.
  • We got to this number by looking merely at 4 different technologies.
  • Versions installed are quite often old and not updated, which means that, in some cases, not only is data exposed but even servers can be compromised.
  • Companies are still figuring out how to use these technologies and by default they are not secure.
  • The mis-configured installations range from small companies to large top 500 companies.
  • Some of these technologies are used as cache servers, so their data is always changing and a multitude of client/company data can be looked at, for example, auth session information.
  • Our values might be missing some IP addresses. The reason for this is that certain companies/IP range owners email us and ask not to be scanned. When this happens, we add them to our blacklist and never scan them again. This means that, in reality, some numbers might be higher than what is seen in this blogpost.
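The headline total in the first point can be cross-checked by adding up the per-technology figures quoted earlier. The sketch below assumes the Redis contribution is the current memory figure rather than the peak, and reads the Elasticsearch figure as 531.199 TB; with those assumptions the sum lands on the total quoted above.

```python
# Per-technology totals in Terabytes, as quoted in this post.
EXPOSED_TB = {
    "Redis (current memory)": 13.2133,
    "MongoDB (total size)": 619.803,
    "Memcached": 11.347,
    "Elasticsearch": 531.199,
}

total_tb = sum(EXPOSED_TB.values())
total_pb = total_tb / 1000  # decimal Petabytes

print("Total exposed: %.1f TB (about %.2f PB)" % (total_tb, total_pb))
```

Using the Redis peak memory figure instead of the current one would raise the total by a few Terabytes.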

We are currently working on our platform, "Timelines", that will continue to monitor the amount of data exposed by these platforms.

In the next parts of this blogpost, we will continue to look at other technologies and bring in our data science team insight on the data we have gathered.

No specific company data or confidential data was collected by our probes, only statistical information for each technology. No data from this dataset will be made public. We are in the process of setting up an automated system that will alert companies of open technologies in their networks.

If you would like to know if your company is exposed or would like some information about the work that we do, contact us on [email protected].

If you would like to keep up to date with our analysis and posts, please consider following us on Twitter, Google+ and Facebook.

We would like to thank Claudio @clviper and Bruno @morisson for reviewing the article and giving us feedback.


BinaryEdge is a Swiss startup with a focus on DataScience and CyberSecurity.