Two Research Papers Presented at CSET'22: "The DARPA SEARCHLIGHT Dataset of Application Network Traffic" and "Generating Representative Video Teleconferencing Traffic"

On August 8, 2022, we presented two research papers at the 15th Cyber Security Experimentation and Test Workshop (CSET'22):

  1. The DARPA SEARCHLIGHT Dataset of Application Network Traffic by Calvin Ardi, Connor Aubry, Brian Kocoloski, Dave DeAngelis, Alefiya Hussain, Matt Troglia, and Stephen Schwab.

    Abstract: Researchers are in constant need of reliable data to develop and evaluate AI/ML methods for networks and cybersecurity. While Internet measurements can provide realistic data, such datasets lack ground truth about application flows. We present a ~750 GB dataset that includes ~2000 systematically conducted experiments and the resulting packet captures with video streaming, video teleconferencing, and cloud-based document editing applications. This curated and labeled dataset has bidirectional and encrypted traffic with complete ground truth that can be widely used for assessments and evaluation of AI/ML algorithms.

  2. Generating Representative Video Teleconferencing Traffic by David DeAngelis, Alefiya Hussain, Brian Kocoloski, Calvin Ardi, and Stephen Schwab.

    Abstract: Video teleconferencing (VTC) is a dominant network application, yet there is a dearth of tools to generate such traffic for systematic and reproducible experimentation. We present a framework to create representative video teleconferencing traffic and discuss our methodology for behavioral control of multiple bots to create human-like dialog coordination, including interactive talking and silence patterns. Our framework can be coupled with proprietary commercial VTC applications as well as deployed completely within a testbed environment to benchmark emerging networking technology and evaluate the next generation of traffic classification, quality of service (QoS) algorithms, and traffic engineering systems.

The tools we developed for these papers are open source and freely available at https://gitlab.com/mergetb/exptools.

The DARPA SEARCHLIGHT dataset is currently being released. A preview of the dataset, including graphs and text summaries, is available at https://stor0.lighthousetb.net/dataset-preview/.

This research was developed with funding from the Defense Advanced Research Projects Agency (DARPA). Work by USC/ISI was sponsored by Sandia National Laboratories (SNL) under PO2160586. SNL is a multimission laboratory managed and operated by National Technology & Engineering Solutions of Sandia, LLC for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA0003525. The views, opinions and/or findings expressed are those of the author and should not be interpreted as representing the official views or policies of the Department of Defense, Department of Energy, or the U.S. Government.