
Multi-level, multi-core distributed trace synchronization

Probes installed at the different software layers (hypervisor, operating system, virtual machine, system libraries, applications) can provide monitoring and tracing data. Each processor, with its own local clock, then generates a steady flow of events. These events, coming from multiple cores on each system and from several distributed systems, must be collected and stored efficiently. The events from the different cores must also be synchronized, and the tools must allow navigation through the resulting traces, which may be very large. The Linux Trace Toolkit Viewer, developed at Ecole Polytechnique, can already handle traces of several gigabytes or more. However, a new architecture is required to handle such traces while also collecting them from multiple systems and embedded devices, for both online and a posteriori (offline) analysis and viewing.
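To make the synchronization requirement concrete, the following is a minimal Python sketch of merging per-core event streams into a single time-ordered stream. It is not the Linux Trace Toolkit Viewer's implementation. It assumes that each core's events arrive sorted by local timestamp and that each core's clock relates to a common reference clock through a linear model with hypothetical `drift` and `offset` parameters. The names `Event`, `to_common_time`, and `merge_core_streams` are illustrative only.

```python
import heapq
from typing import Dict, Iterable, Iterator, NamedTuple, Tuple


class Event(NamedTuple):
    timestamp: float  # timestamp in the common reference time base
    core: int         # core that emitted the event
    name: str         # hypothetical event name


def to_common_time(local_ts: float, drift: float, offset: float) -> float:
    """Map a core-local timestamp to the reference clock.

    Assumes the hypothetical linear model local_ts = drift * t_ref + offset.
    """
    return (local_ts - offset) / drift


def _convert(core: int, stream: Iterable[Tuple[float, str]],
             drift: float, offset: float) -> Iterator[Event]:
    # Lazily convert one core's (local_ts, name) pairs into Events.
    for local_ts, name in stream:
        yield Event(to_common_time(local_ts, drift, offset), core, name)


def merge_core_streams(
    streams: Dict[int, Iterable[Tuple[float, str]]],
    clock_params: Dict[int, Tuple[float, float]],
) -> Iterator[Event]:
    """Merge per-core event streams into one stream ordered by reference time.

    Each input stream must be sorted by local timestamp. Because drift > 0,
    the conversion is monotonic, so each converted stream stays sorted and a
    k-way heap merge yields a globally ordered stream while holding only one
    pending event per core.
    """
    converted = []
    for core, stream in streams.items():
        drift, offset = clock_params[core]
        if drift <= 0:
            raise ValueError(f"core {core}: drift must be positive")
        converted.append(_convert(core, stream, drift, offset))
    return heapq.merge(*converted, key=lambda e: e.timestamp)
```

For example, with core 0 emitting `sched_switch` at 1.0 and `irq_entry` at 4.0, and core 1 (whose clock is assumed to run 10 units ahead, i.e. parameters `(1.0, 10.0)`) emitting `syscall_entry` at 12.0 and `syscall_exit` at 13.5, the merged stream yields `sched_switch`, `syscall_entry`, `syscall_exit`, `irq_entry`. A heap merge is used here because it consumes its inputs lazily. Memory therefore grows with the number of cores rather than with trace size, which matters for multi-gigabyte traces.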

More importantly, further work is required to develop algorithms that synchronize events coming from multiple nodes, multiple cores and even multiple virtual machines. Existing tracing tools for distributed systems often record coarse-grained events, for which differences between local clocks may not be a problem. Tools for tracing newer distributed real-time systems rely on synchronizing the local clocks, and they incur a significant loss of accuracy as a result. A posteriori synchronization of traces allows more accurate drift estimation, because variations in network delay are amortized over a large number of message exchanges and a long period of time.
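To illustrate why a posteriori processing helps, the following sketch estimates the linear relation between two nodes' clocks from many request/reply exchanges recorded in their traces. It uses a simple technique chosen for illustration: least-squares regression over round-trip midpoints, which assumes that one-way delays are symmetric on average. This is not necessarily the algorithm developed in this project. The names `RoundTrip`, `estimate_clock_relation`, `to_node_a_time`, and the timestamp fields are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RoundTrip:
    """One hypothetical request/reply exchange between node A and node B."""
    t1: float  # A sends the request (A's clock)
    t2: float  # B receives the request (B's clock)
    t3: float  # B sends the reply (B's clock)
    t4: float  # A receives the reply (A's clock)


def estimate_clock_relation(exchanges: List[RoundTrip]) -> Tuple[float, float]:
    """Fit t_B ~= drift * t_A + offset by least squares over round-trip midpoints.

    Each exchange contributes one sample: the midpoint of the round trip on
    A's clock paired with the midpoint of B's handling on B's clock. Assuming
    one-way delays are symmetric on average, delay jitter averages out as the
    number of exchanges and the time span they cover grow.
    """
    if len(exchanges) < 2:
        raise ValueError("need at least two exchanges to estimate drift")
    xs = [(e.t1 + e.t4) / 2 for e in exchanges]  # A-clock midpoints
    ys = [(e.t2 + e.t3) / 2 for e in exchanges]  # B-clock midpoints
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("exchanges must span a non-zero time interval")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    drift = sxy / sxx
    offset = mean_y - drift * mean_x
    return drift, offset


def to_node_a_time(t_b: float, drift: float, offset: float) -> float:
    """Convert a timestamp taken on B's clock into A's time base."""
    return (t_b - offset) / drift
```

Under this model, a constant asymmetry between the two one-way delays shifts only the estimated offset (by the drift times half the asymmetry) and leaves the drift unaffected. Random jitter in the delays is what the many exchanges average away, and the longer the traced period, the better the slope is constrained. More robust estimators exist for real traces; this sketch only demonstrates the amortization argument.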