Analyzing HTTPS Traffic for a Robust Identification of Operating System, Browser and Application

Published on Mar 15, 2016 in arXiv: Cryptography and Security
Jonathan Muehlstein (estimated H-index: 2)
Yehonatan Zion (estimated H-index: 2)
+ 4 authors
Ofir Pele (estimated H-index: 9)
Desktops and laptops can be maliciously exploited to violate privacy. There are two main types of attack scenarios: active and passive. In this paper, we consider the passive scenario, where the adversary does not interact with the device but can eavesdrop on its network traffic from the network side. Most Internet traffic is encrypted, which makes passive attacks challenging. We show that an external attacker can nevertheless robustly identify the operating system, browser and application from encrypted HTTP traffic (HTTPS). We provide a large dataset of more than 20,000 examples for this task. We present a comprehensive evaluation of traffic features, including new ones, and of machine learning algorithms. Our experiments show a classification accuracy of 96.06%. Because the problem is adaptive in nature, we also investigate robustness and resilience to feature changes caused by different network conditions (e.g., VPN) at test time, as well as the effect of a small training set on accuracy. We show that our proposed solution is robust to these changes.
Cited By (1)
#1 Jan Kohout (CTU: Czech Technical University in Prague), H-Index: 6
#2 Tomas Pevny (CTU: Czech Technical University in Prague), H-Index: 11
Last: Tomáš Pevný (CTU: Czech Technical University in Prague), H-Index: 17
Many applications and communication protocols exhibit unique communication patterns that can be exploited to identify them in network traffic. This paper proposes a method to represent these patterns compactly, such that they can be used in different analytical tasks. The method treats each communication as a set of observations of a random variable with unknown probability distribution. This view allows us to derive the representation from a distance between two probability distributions used i...
4 Citations