At this year’s SIGGRAPH Conference in New Orleans, Carl Jacobson met with the host of Intel’s Visualize This! TV show, Arti Gupta, to discuss Cakewalk’s advancements in its software performance. Some of Arti’s questions were so technical that we thought it only fair to ask Cakewalk’s CTO Noel Borthwick to add his thoughts. Watch the video and check out Noel’s comments below:
AG: Cakewalk is a member of the Intel Software Partner Program. What challenges were you trying to solve?
NB: The bandwidth available to the typical modern DAW user on a modern CPU such as the Core i7 is astounding compared to what was available just a couple of years ago. Users expect our software to use every available CPU cycle to process their audio and mix. Cakewalk has been on the bleeding edge of technology for the last 15 years, taking advantage of cutting-edge capabilities of the operating system as well as available hardware resources. With multiprocessing and 64-bit computing rapidly becoming mainstream, it has become even more critical for our software to make efficient use of hardware resources.
For example, for efficient multiprocessing we try to optimize all the code paths that are used in asynchronously mixing audio. The goal is to present a multi-core machine with evenly distributed workloads, allowing the cores to work as hard as possible. To do this, we streamline the relevant code and minimize all high-latency instructions.
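The idea of handing each core an even share of the mix can be sketched as follows. This is a hypothetical illustration, not SONAR's actual engine: each worker thread mixes its own slice of the track list into a private accumulator (so the threads never contend on shared memory), and the partial mixes are summed at the end.

```cpp
#include <thread>
#include <vector>

// Hypothetical sketch of even workload distribution across cores.
// Names (mixTracks, etc.) are illustrative and not from SONAR's code.
std::vector<float> mixTracks(const std::vector<std::vector<float>>& tracks,
                             size_t frames, unsigned numWorkers)
{
    // One private accumulator per worker avoids contention on shared state.
    std::vector<std::vector<float>> partial(numWorkers,
                                            std::vector<float>(frames, 0.0f));
    std::vector<std::thread> workers;
    for (unsigned w = 0; w < numWorkers; ++w) {
        workers.emplace_back([&, w] {
            // Each worker takes every numWorkers-th track: an even split.
            for (size_t t = w; t < tracks.size(); t += numWorkers)
                for (size_t i = 0; i < frames; ++i)
                    partial[w][i] += tracks[t][i];
        });
    }
    for (auto& th : workers) th.join();

    // Combine the partial mixes on the calling thread.
    std::vector<float> out(frames, 0.0f);
    for (auto& p : partial)
        for (size_t i = 0; i < frames; ++i)
            out[i] += p[i];
    return out;
}
```

A real low-latency engine would of course use a persistent thread pool rather than spawning threads per buffer, but the partitioning idea is the same.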
Some typical areas where we try to improve our performance are:
• Multi-processor load scaling: How well does a controlled test project load across multiple CPU cores?
• 64-bit performance: How well does the 64-bit version of the application perform with multiple workloads?
• CPU use: How efficiently does SONAR play back a CPU-intensive project?
• High-bandwidth tests: How well does the application perform while streaming audio at high sample rates (192 kHz, 384 kHz, etc.) and bit depths (64-bit audio, etc.)?
• Low-latency performance: How well does the application perform streaming audio with very small audio buffer sizes (such as 1 ms buffers)?
AG: I understand you also improved your spinlock implementation through better use of threading. Can you talk about what the problem was and how you addressed it?
NB: We use spinlocks in various places in our application where we run time-critical, low-latency operations, such as in our mixer topology. The idea of using spinlocks is to avoid the dreaded OS context switching of our time-critical threads while locking read/write resources to maintain thread safety. A good spinlock implementation can be tricky, since it needs to support many behaviors of the client application such as reentrancy, deadlock protection, etc. A poor spinlock implementation can literally freeze the entire system.
A spinlock also needs to be efficient at “spinning,” which is one of the things it does a lot! One of the problems we had with an earlier version of our spinlock (in an earlier version of SONAR) was that it under-performed on hyper-threaded systems, causing worse performance than with hyper-threading disabled. We found that hyper-threaded machines seemed especially prone to high penalties with spin-wait loops. At the time, the initial workaround was to tell users not to use hyper-threading. Further research into this problem led us to an old Intel white paper which describes a workaround for this problem using the PAUSE instruction.
Most recently we have also optimized our implementation to minimize problems with false sharing as reported by Intel.
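False sharing happens when two variables updated by different threads happen to share a 64-byte cache line, so the line ping-pongs between cores even though the threads never touch each other's data. A common mitigation, sketched here as a hypothetical example (not Cakewalk's code), is to align and pad each per-thread slot to its own cache line:

```cpp
#include <atomic>

// Hypothetical example: per-worker counters padded to cache-line size.
// Without alignas(64), adjacent counters could share a cache line and
// contend even though each thread only writes its own slot.
struct alignas(64) PerCoreCounter {
    std::atomic<long> value{0};
    // alignas(64) pads the struct so each instance owns a full line.
};

PerCoreCounter counters[4];  // one slot per worker thread
```

The cost is a little wasted memory per slot; the benefit is that a hot write by one core no longer invalidates a line another core is using.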
AG: Besides using Intel hardware, you also used tools like the VTune Performance Analyzer and threading tools. How did these tools help you scale?
NB: SONAR is designed to be a highly scalable application. It supports as many CPUs and cores as you can throw at it and will automatically balance a workload to take advantage of all cores.
VTune and Intel Thread Checker have been useful for routine analysis to determine performance hotspots in our application. They help us make the best decisions about which areas would most benefit from performance optimizations. It’s easy to spend a lot of time on performance optimizations with marginal user benefit, so it’s critical for us to do a cost-benefit analysis before investing too many resources.
AG: Cakewalk has optimized SONAR 8 for the latest Intel platforms. What kind of performance improvements did you see?
NB: The primary improvements in SONAR 8 were in performance under low latency and high workloads. Many projects that would previously cause audio glitches now run smoothly on the same system.
The overall gains we achieved in SONAR 8 over SONAR 7 were in the range of 30-240% across a variety of systems surveyed.
AG: How has the relationship with Intel helped your development?
NB: We have a lot of back and forth with Intel on various issues and this helps us focus on the types of things that will impact the greatest number of our customers. We also provide information to Intel on the types of operations that are of most importance to our application, helping Intel improve its architecture to scale to our use cases.
Having access to early releases of new Intel platforms such as the Core i7 has also been very useful in helping us keep our applications scalable and designed for the future.