Peering into the Muddy Waters of Pastebin
by Srdjan Matic, Aristide Fattori, Danilo Bruschi and Lorenzo Cavallaro
Thanks to advances in technology, cloud-oriented services are becoming increasingly popular with both legitimate users and cybercriminals. How frequently is sensitive information leaked to the public? And how easy is it to identify among the tangled maze of legitimate posts that are published daily? The underground economy and the trade in users' stolen information are once again rising to the surface and mutating into a bazaar in plain view. Do we have to worry about it, and can we do anything to stop it?
Pastebin applications, also known simply as “pastebins”, are the best-known information-sharing web applications on the Internet. They enable users to share information with others by creating a paste: users only need to submit the information to be shared, and the service provides a URL to retrieve it. In addition to being useful for sharing long messages while respecting length policies (eg Twitter) and netiquette (eg IRC chats), one of the main features that makes pastebins appealing is the possibility of anonymously sharing information with a potentially large crowd.
Unfortunately, along with the legitimate use of such services comes their inevitable exploitation for illegal activities. The first outbreak occurred in late 2009, when roughly 20,000 compromised Hotmail accounts were disclosed in a public post. Many other sensitive leaks followed shortly thereafter, but it is with the illegal activities of the hacker groups Anonymous and LulzSec that such security concerns reached a much wider audience.
To gain insight into the underground economy, we, Royal Holloway, University of London and University of Milan, jointly developed a framework that automatically monitors text-based, pastebin-like content-sharing applications to harvest and categorize (using pattern matching and machine learning) leaked sensitive information.
We monitored pastebin.com from late 2011 to early 2012, periodically downloading public pastes and following links to user-defined posts. We recorded a diverse range of categories of sensitive or malicious information leaked daily: lists of compromised accounts, database dumps, lists of compromised hosts (with backdoor accesses), stealer malware dumps, and lists of premium accounts.
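The pattern-matching side of such a categorization could be sketched as follows. This is only an illustration under stated assumptions: the regular expressions, the category labels, the `min_hits` threshold and the `categorize` function are hypothetical and are not the rules used by the framework described here, which also relies on machine learning.

```python
import re

# Hypothetical patterns, chosen only for illustration; the rules used by the
# actual framework are not given in this article.
CATEGORY_PATTERNS = {
    # Lines that look like "user@domain.tld:password".
    "compromised_accounts": re.compile(
        r"^\S+@\S+\.\w+\s*[:;|]\s*\S+$", re.MULTILINE
    ),
    # SQL statements that typically appear in database dumps.
    "database_dump": re.compile(r"\b(?:INSERT INTO|CREATE TABLE)\b", re.IGNORECASE),
    # URLs pointing at PHP scripts, as a crude proxy for web shells.
    "compromised_hosts": re.compile(r"https?://\S+\.php\b", re.IGNORECASE),
}


def categorize(paste_text, min_hits=3):
    """Return the categories whose pattern matches at least min_hits times."""
    labels = []
    for label, pattern in CATEGORY_PATTERNS.items():
        if len(pattern.findall(paste_text)) >= min_hits:
            labels.append(label)
    return labels


sample = (
    "alice@example.com:hunter2\n"
    "bob@example.org:letmein\n"
    "carol@example.net:qwerty\n"
)
print(categorize(sample))
```

Requiring several hits per paste is one simple way to avoid flagging a legitimate post that happens to contain a single matching line.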
The list of compromised accounts (ie username and password pairs) is the most commonly recorded type of stolen sensitive information (685 posts with 197,022 unique accounts). Such lists are often packed with references to where these accounts were stolen and the websites where they would be valid, giving miscreants (or just random curious readers) an easy shot. Such information enables us to shed some light on previous security trends and weaknesses (eg password strengths and credential reuse). For instance, more than 75% of such passwords were cracked in a negligible amount of time, indicating that users still rely on poorly chosen or weak passwords.
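Counting unique accounts across many posts amounts to de-duplicating the extracted pairs. A minimal sketch, assuming one `username:password` pair per line and case-insensitive usernames (both are assumptions made for this example, as is the `unique_accounts` name):

```python
def unique_accounts(posts):
    """Collect distinct (username, password) pairs from an iterable of posts."""
    accounts = set()
    for text in posts:
        for line in text.splitlines():
            # Split on the first colon only, so passwords may contain colons.
            user, sep, password = line.strip().partition(":")
            if sep and user and password:
                accounts.add((user.lower(), password))
    return accounts


posts = [
    "alice@example.com:hunter2\nbob:pw",
    "Alice@example.com:hunter2\nnot a credential",
]
print(len(unique_accounts(posts)))
```

The same account posted twice, even with different capitalization of the username, is counted once.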
Similarly, posts of leaked database dumps often include references to the attacked servers, precise information on the exploited vulnerability and clear indications of the tools used to perform the attack, providing interesting insights into the attackers’ methods.
Posts containing leaked information about compromised servers (104 posts with 5,011 unique URLs) include lists of URLs with recurring patterns (eg webdav, shell, dos). Our analysis shows that these shells, written in PHP, are generally aimed at performing UDP-based DoS attacks.
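A keyword filter over the URLs found in a post gives a rough idea of how such recurring patterns can be exploited. The sketch below is an assumption-laden example: the keywords come from the text above, while the URL regular expression and the `shell_urls` function are hypothetical.

```python
import re

# Simple URL matcher; real-world URL extraction needs more care.
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)

# Recurring substrings mentioned in the article.
KEYWORDS = ("webdav", "shell", "dos")


def shell_urls(text):
    """Return the set of distinct URLs containing any of the keywords."""
    found = set()
    for url in URL_RE.findall(text):
        lowered = url.lower()
        if any(keyword in lowered for keyword in KEYWORDS):
            found.add(url)
    return found


text = (
    "see http://a.example/webdav/shell.php and http://b.example/index.html "
    "http://a.example/webdav/shell.php"
)
print(shell_urls(text))
```

Plain substring matching is deliberately crude and will produce false positives (for instance, any URL that merely contains "dos"), so candidates would still need to be validated.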
Information leaked by malware was responsible for 121 posts with 12,036 unique accounts. Such posts report very precise information associated with the leaked credentials, ie the URL of the website for which the account is valid, the program from which they were stolen, an IP address, a computer name and a date.
Finally, posts of leaked premium website accounts contain lists of usernames and passwords used to access web applications that provide enhanced features for paying customers (892 posts with 239,976 unique accounts). Unsurprisingly, the two most common categories of premium accounts refer to pornography and file-sharing websites.
As previous researchers have done, we evaluated the potential value of this sensitive information on the black market; prices and values are reported in Table 1.
Table 1: Prices and values of goods on the black market
As outlined above, some leaked posts linked to shells installed on compromised servers. To better understand the threat posed by the public disclosure of such information, we estimated (using a geo-location database) the bandwidth these shells could generate in a DDoS attack. Out of more than a hundred shell-related posts, we extracted roughly 31,000 shell-related URLs (5,011 unique, 4,784 of which were valid). These shells are installed on servers located in 118 different countries (as shown in Figure 1), with the top five being the USA (1,074), Germany (629), the Netherlands (219), France (166), and the UK (164). The aggregate computed bandwidth is 23.3 Gbps, comparable to that of a small botnet.
Figure 1: Geographical distribution of shells.
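The article does not detail how the aggregate figure was derived from the geo-location data. One plausible approach, shown purely as a sketch, is to map each host to a country and sum a per-country uplink estimate. The per-country rates, the function names and the sample data below are made up for illustration and are not the values used in the study.

```python
from collections import Counter


def aggregate_bandwidth_gbps(host_countries, mbps_by_country, default_mbps=0.0):
    """Sum an assumed per-host uplink (in Mbps) and return the total in Gbps."""
    total_mbps = sum(
        mbps_by_country.get(country, default_mbps) for country in host_countries
    )
    return total_mbps / 1000.0


def top_countries(host_countries, n=5):
    """Return the n countries hosting the most shells, with their counts."""
    return Counter(host_countries).most_common(n)


# Hypothetical input: one country code per valid shell URL.
hosts = ["US", "US", "DE", "NL"]
# Hypothetical average uplink per host, in Mbps.
rates = {"US": 10.0, "DE": 5.0}

print(aggregate_bandwidth_gbps(hosts, rates))
print(top_countries(hosts, n=1))
```

Countries missing from the rate table contribute `default_mbps`, so the result is a lower bound when that default is left at zero.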
Our analysis reported 121 posts containing stealer malware dumps. We identified roughly 14,000 dumps (12,036 of which were unique). Owing to the structured nature of these dumps, it was possible to gather precise statistics (omitted from this article due to space constraints). Most of the websites involved were related to gaming, social networking, and file sharing.
In conclusion, our ongoing research effort shows that sensitive information is easily and publicly leaked on the Internet. The automatic identification of such information is not only an interesting research topic, as it offers insight into underground economy trends, but, if properly enforced, it may also allow us to detect and contain the damage caused by malicious leaks.
LulzSec, "Fox.com hack", http://pastebin.com/Q2xTKU2s, 2011
Brett Stone-Gross, Marco Cova, Lorenzo Cavallaro, Bob Gilbert, Martin Szydlowski, Richard Kemmerer, Christopher Kruegel, and Giovanni Vigna, "Your botnet is my botnet: analysis of a botnet takeover", Proceedings of the 16th ACM Conference on Computer and Communications Security, 2009
Symantec Corporation, "Symantec Internet Security Threat Report 2010", 2010
Royal Holloway, University of London
Tel: +44 1784 414381