Analyzing HTTPS Traffic for a Robust Identification of Operating System, Browser and Application
Desktops and laptops can be maliciously exploited to violate privacy. There are two main types of attack scenarios: active and passive. In this paper, we consider the passive scenario, in which the adversary does not interact with the device but can eavesdrop on its network traffic from the network side. Most Internet traffic is encrypted, which makes passive attacks challenging. We show that an external attacker can nonetheless robustly identify the operating system, browser and application behind encrypted HTTP (HTTPS) traffic. We provide a large dataset of more than 20,000 examples for this task. We present a comprehensive evaluation of traffic features, including new ones, and of machine learning algorithms. Our experiments show a classification accuracy of 96.06%. Because the problem is adaptive in nature, we also investigate robustness to changes in the features caused by different network conditions (e.g., VPN) at test time, as well as the effect of a small training set on accuracy. We show that our proposed solution is robust to these changes.
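To make the general idea concrete, the following is a minimal sketch of how a passive observer might turn an encrypted flow into side-channel features and classify it. It is not the feature set or the learning algorithm evaluated in this paper: the six features, the nearest-centroid classifier, the function names (`flow_features`, `train_centroids`, `predict`), the placeholder labels and the toy flows are all assumptions chosen for illustration. Only packet timing, size and direction are used, since these remain visible when the payload is encrypted.

```python
import math
from collections import defaultdict
from statistics import mean, pstdev


def flow_features(packets):
    """Summarize one flow as a numeric feature vector.

    packets: list of (timestamp_seconds, size_bytes, direction) tuples,
    where direction is +1 (client to server) or -1 (server to client).
    The chosen features are illustrative, not the paper's feature set.
    """
    sizes = [size for _, size, _ in packets]
    times = [t for t, _, _ in packets]
    # Inter-arrival gaps; fall back to a single zero for one-packet flows.
    gaps = [b - a for a, b in zip(times, times[1:])] or [0.0]
    forward_bytes = sum(size for _, size, d in packets if d > 0)
    backward_bytes = sum(size for _, size, d in packets if d < 0)
    return [
        float(len(packets)),
        mean(sizes),
        pstdev(sizes),
        mean(gaps),
        float(forward_bytes),
        float(backward_bytes),
    ]


def train_centroids(flows, labels):
    """Average the feature vectors of each label (nearest-centroid stand-in)."""
    grouped = defaultdict(list)
    for flow, label in zip(flows, labels):
        grouped[label].append(flow_features(flow))
    return {
        label: [mean(column) for column in zip(*rows)]
        for label, rows in grouped.items()
    }


def predict(centroids, flow):
    """Return the label whose centroid is closest in Euclidean distance."""
    x = flow_features(flow)
    return min(centroids, key=lambda label: math.dist(x, centroids[label]))


# Toy, hand-made flows with placeholder (OS, browser, application) labels.
LABEL_A = ("os_a", "browser_a", "app_a")
LABEL_B = ("os_b", "browser_b", "app_b")
flow_a = [(0.00, 200, 1), (0.05, 1400, -1), (0.10, 1400, -1)]
flow_b = [(0.00, 600, 1), (0.30, 300, -1)]
centroids = train_centroids([flow_a, flow_b], [LABEL_A, LABEL_B])

unseen = [(0.00, 220, 1), (0.04, 1400, -1), (0.09, 1380, -1)]
predicted = predict(centroids, unseen)
```

In practice the features would need scaling before any distance-based comparison, and a stronger classifier trained on many labeled flows would replace the centroid rule; the sketch only shows the shape of the pipeline.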