by Peter Dorfinger, Carsten Schmoll and Felix Strohmeier
Collecting information as required for network operation also gathers personal information from users sending their data over the network. Existing network-monitoring applications do not take user privacy into consideration by design. We present a framework that allows these applications to operate in a privacy-preserving environment.
Network monitoring is a central component of a stable operational network. It is used, for example, to guarantee the security of the network infrastructure or to validate service-level agreements. To extract the necessary information, network operators use monitoring applications to collect data sent by individual users. Often the privacy of the users is not seriously taken into account, and the gathered information may contain a significant amount of personal content (Figure 1). Since IP addresses constitute private information, each captured packet contains private information. If the payload is also captured, then passwords (e-mail, FTP, etc.), e-mail content or VoIP calls can easily be reconstructed from the captured packets. Since the employees of the operator, at least those working in network operation, will generally have access to these traces, the privacy of the users is threatened. As user privacy legislation extends further into the Internet, network monitoring on the future Internet will only be able to take place if privacy can be guaranteed. Since network monitoring is necessary, the PRISM (PRIvacy-aware Secure Monitoring) project consortium has developed a framework that allows network monitoring to take place in a manner that ensures user privacy in the future Internet.
This solution is based on a two-tiered approach, as shown in Figure 2. The front-end block captures the traffic and processes it with the goal that no privacy-relevant information is handed over to the back end. Depending on the purpose of the monitoring application, individual fields of the packets may be, for example, deleted, randomized, aggregated or anonymized. The resulting information is sent to the back end, where it is stored. On request, or for on-the-fly monitoring without intermediate storage, the information from the front end is transformed into the input format of the external monitoring application and exported to it. Fields of the input data that are not received from the front end can be filled with random values. The application thus operates on a privacy-preserving, modified and reduced data set strictly tailored to its needs.
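The division of work between front end and back end can be illustrated with a minimal sketch. It is not the PRISM implementation: the policy table, the field names and the functions `front_end` and `back_end_export` are hypothetical, and keyed hashing (HMAC-SHA256) is only one assumed way to anonymize a field. Aggregation ("summing up") is omitted for brevity.

```python
import hashlib
import hmac
import secrets

# Hypothetical per-purpose policy: field name -> action applied in the front end.
POLICY = {
    "src_ip": "anonymize",
    "dst_ip": "anonymize",
    "protocol": "keep",
    "src_port": "delete",
    "dst_port": "delete",
    "length": "keep",
}

# Assumption: a secret key that never leaves the front end.
KEY = secrets.token_bytes(32)


def front_end(record, policy=POLICY, key=KEY):
    """Return a reduced copy of `record` that follows the policy."""
    out = {}
    for field, value in record.items():
        # Fields without a policy entry are dropped by default.
        action = policy.get(field, "delete")
        if action == "keep":
            out[field] = value
        elif action == "randomize":
            out[field] = secrets.randbits(16)
        elif action == "anonymize":
            digest = hmac.new(key, str(value).encode(), hashlib.sha256)
            out[field] = digest.hexdigest()[:16]
        # "delete": the field is simply not copied.
    return out


def back_end_export(stored, expected_fields):
    """Build the application's input record; withheld fields get random values."""
    return {f: stored.get(f, secrets.randbits(16)) for f in expected_fields}
```

A record such as `{"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2", "protocol": 17, "src_port": 4242, "dst_port": 53, "length": 120}` leaves the front end without port numbers and with pseudonymized addresses; `back_end_export` then restores the full field set expected by a legacy application by filling the missing ports with random values.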
Transforming a legacy monitoring application into a privacy-preserving one will result in different front-end processing for each individual monitoring purpose. One such example, the Skype traffic detection engine of TSTAT (TCP STatistic and Analysis Tool), is described here in detail. TSTAT can capture live packets or operate on a tracefile. It currently captures all the traffic on a link, meaning that any information transported on the link can be reconstructed from the packets, including information relevant to privacy. Furthermore, IP addresses are privacy-sensitive data, especially since they indicate who communicates with whom. Skype transports its data in packets with encrypted payload. Thus the first task in the front end after capturing the packets will be to filter for packets whose payload is encrypted. Encrypted payload can be handed over to an external application because the information in it cannot be reconstructed. All other payload must be randomized or stripped because, depending on the protocol, it may contain private information in plain text.
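The article does not state how the front end recognizes encrypted payload. As a hedged illustration only, the sketch below assumes a simple byte-entropy heuristic: payload whose normalized Shannon entropy is close to 1 is treated as encrypted and passed on, and everything else is replaced by random bytes of the same length. The function names, the threshold of 0.9 and the minimum length of 16 bytes are assumptions chosen for the example, not values from PRISM or TSTAT.

```python
import math
import os
from collections import Counter

ENTROPY_THRESHOLD = 0.9  # assumed cut-off on normalized entropy (0..1)
MIN_LENGTH = 16          # assumed: shorter payloads are never trusted


def shannon_entropy(payload: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte."""
    if not payload:
        return 0.0
    n = len(payload)
    counts = Counter(payload)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def looks_encrypted(payload: bytes) -> bool:
    """Heuristic: near-uniform byte distribution suggests encryption."""
    n = len(payload)
    if n < MIN_LENGTH:
        return False
    # The largest entropy a payload of n bytes can reach is log2(min(n, 256)).
    max_entropy = math.log2(min(n, 256))
    return shannon_entropy(payload) / max_entropy >= ENTROPY_THRESHOLD


def sanitize_payload(payload: bytes) -> bytes:
    """Keep payload that looks encrypted; randomize all other payload."""
    if looks_encrypted(payload):
        return payload
    return os.urandom(len(payload))  # packet length is preserved
```

Such a heuristic is conservative in one direction only: compressed data can also pass the test, so a real deployment would need a stricter classifier before any payload is released.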
One important aspect is the question of how to handle the common 5-tuple attributes (src/dst IP address, protocol and src/dst port number). Since the IP addresses are the most privacy-sensitive of these fields, they should be removed or at least remapped in a non-reversible way. It is recommended that src and dst IP addresses be mapped together; this means that each observed IP address is mapped to a different new value for each different src/dst IP address pair. In this way the mapping space is much larger and single addresses cannot be reverse-mapped easily by injection attacks. The protocol number is generally non-critical and can be kept. Port numbers could be removed (set to zero) in our example (with the exception of the well-known Skype port number 12340 for outgoing Skype traffic), effectively hiding all other port numbers since they are not needed for this analysis.
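The pair-dependent address mapping and the port handling described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's algorithm: the keyed hash (HMAC-SHA256), the truncation to 32 bits so that the result still looks like an IPv4 address, the sorting of the address pair so that both directions of a flow receive consistent pseudonyms, and the names `_pseudonym` and `anonymize_5tuple` are all choices made for this example. Only the port number 12340 is taken from the text.

```python
import hashlib
import hmac
import ipaddress

SKYPE_PORT = 12340  # port number named in the text; all other ports are zeroed


def _pseudonym(key: bytes, ip: str, pair: tuple) -> str:
    """Map `ip` to a pseudonym that depends on the whole address pair."""
    msg = "|".join((ip, pair[0], pair[1])).encode()
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    # Truncating to 32 bits keeps the IPv4 format; collisions are possible.
    return str(ipaddress.IPv4Address(int.from_bytes(digest[:4], "big")))


def anonymize_5tuple(key, src_ip, dst_ip, protocol, src_port, dst_port):
    """Return a 5-tuple with remapped addresses and hidden port numbers."""
    # Sorting makes the pair direction-independent (an assumption), so
    # A->B and B->A packets are still recognizable as one flow.
    pair = tuple(sorted((src_ip, dst_ip)))
    return (
        _pseudonym(key, src_ip, pair),
        _pseudonym(key, dst_ip, pair),
        protocol,  # generally non-critical, kept unchanged
        src_port if src_port == SKYPE_PORT else 0,
        dst_port if dst_port == SKYPE_PORT else 0,
    )
```

Because the pseudonym of an address changes with its communication partner, an attacker who injects packets from a known address learns only the mapping for the pairs involving that address, not for the pairs of other users. Without the secret key the mapping cannot be recomputed.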
The information is then sent to the back end and stored. On a request the back end generates a tracefile and hands it over to TSTAT. TSTAT can now perform Skype traffic detection on a tracefile that contains all Skype traffic and is free of sensitive information.
Future work in the PRISM project will comprise the implementation of the proposed framework and the adaptation of existing monitoring applications. The work of the ERCIM members involved focuses on the adaptation of monitoring applications and on traffic anonymization.
Links:
http://www.fp7-prism.eu
http://tstat.tlc.polito.it/skype.shtml
Please contact:
Peter Dorfinger
Salzburg Research Forschungsgesellschaft mbH/AARIT, Austria
Tel: +43 662 2288 452
E-mail: peter.dorfinger@salzburgresearch.at
Carsten Schmoll
Fraunhofer Institute for Open Communication Systems - FOKUS, Germany
Tel: +49 30 3463 7136
E-mail: carsten.schmoll@fokus.fraunhofer.de