NVIDIA GTC 2014 Speaker Interview: Zuofu Cheng

Attention, gamers: geometric acoustics is a real thing. Wikipedia says so. More than that, though, this audio science applies to all the games you play, because it defines the way created sound in a game travels in a certain path, factoring in interference and diffusion. But how accurate is that sound? Is it a true simulation or defined by some other set of limits, and does it make a difference at the end of the day in the way sound is created and projected within the game environment?

Bring your lab coat and your headphones, because a GPU Technology Conference 2014 speaker, Zuofu Cheng, will be digging a little deeper into how GPU processing can create more accurate simulations of sound within a game.

Events for Gamers: Zuofu, tell us a little about your background in the educational and engineering side of audio and visuals.

Zuofu: Sure, my degrees (BS/MS/PhD) are all from U of I (University of Illinois at Urbana-Champaign) in Electrical Engineering. My original specialization was actually in music synthesis and control with Prof. Lippold Haken (who is the inventor of the very cool Continuum Fingerboard). My previous work was mostly in musical instruments and synthesis; I’ve done some work on a continuous-pitch electronic wind instrument, as well as analysis of some musical filters used in the Continuum Fingerboard.

E4G: Picking up where the first question leaves off, please share some of your background about Z-space and your role in its creation.

Zuofu: (As a side note, we’re no longer calling our engine zspace since another company has copyrighted the term for a VR display; for now, we are internally calling it avidengine, for our game AViD – A Voice in Darkness.)

I’m a big gamer, and I sort of came of age during the big 3D sound boom of the late 90s and early 2000s. One of my coolest memories was playing with the Aureal A3D occlusion and reflection demo (where you could move these virtual walls around your avatar and hear the difference) and being blown away, thinking this was the future, and then having that technology suddenly disappear as the sound card industry consolidated (and then basically disappeared completely).

When people first started doing general purpose compute on the GPU, I thought this was the perfect opportunity; the biggest problem with 3D sound was trying to convince gamers to buy hardware to make their games sound better – it just didn’t work like that. Gamers would presumably already have high end graphics cards. This was around 2009, and I would say that game audio at that point had a lot of good design lineage, but very little tech. I threw together a simple prototype in a week or so (hilariously, you couldn’t yet turn your head) and sent it to my sound-designer friend Sean, and he immediately started virtually ducking behind walls and it was really unlike anything that we’ve heard in games in a long time, since the character of the sound changes completely depending on where in the room the listener is. Having that visual to audio correspondence, you see yourself in a corner of the virtual room and it sounds like a corner; that was huge. Later we added stereo and eventually HRTF (head-related transfer function) support, so that additionally it sounds different depending on your virtual head orientation. I’ve been updating avidengine ever since, splitting my time between getting my degree, working on the tech, and working on our game AViD (more about that later).

E4G: What do you hope you will communicate to folks who attend your talk, “Making Games Sound as Good as They Look: Real-time Geometric Acoustics on the GPU” at NVIDIA GTC 2014?

Zuofu: I want to let people know that this technology is coming, though perhaps not as soon as people might think just hearing the demos. If I were to guess, I would say within about 8 years, which is about the time that the successors to the recently released PS4/Xbox One are expected to be released. I think John Carmack pointed out in a recent talk that we’re reaching the point where graphics are not limited necessarily by technology, but by the sheer volume and cost of the artistic assets. With the new VR boom, I think having immersive tech-driven sound is something that we’ll see migrate to AAA games eventually, whether it’s a full GA system, or something more of a hybrid system (traditional audio tracks + some geometrically derived parameters). Most of my talk is about some lessons I’ve learned in designing the system. I suspect that anyone who is engineering a similar system will run into the same problems/tradeoffs, and I wanted to give my take on approaches to solve those problems.

E4G: Since your focus is on the application of geometric acoustics (GA) in applications run on Fermi and Kepler GPUs, would the impact of GA be similar, or scaled back, when filtered through mass-market, less powerful GPUs?

Zuofu: One of the things holding back this technology is that there needs to be an experience that is good enough on most systems to justify the paradigm shift from the current “multi-track mixing console with effects” approach to sound to a simulated tech-driven approach. We saw a similar problem with physics engines: when they were first being brought to market, many of the titles using them were flawed because computers weren’t really fast enough to run an ‘uncompromised’ version of the tech and it actually looked worse than traditional key-frame or motion-captured animation. We’re conscious of the problem that having to scale-down the tech too much will in many cases make it sound worse than what’s already out there.

With that being said, there are a lot of unused compute resources, even on a gaming PC. For example, the integrated GPU is almost always idle when an external GPU is installed (although that extra thermal headroom helps the CPU reach the highest frequency bins). One of our hopes for the future is that the integrated GPU is going to be fast enough to do audio in an un-compromising way.

E4G: What do you feel are the ideal applications for geometric acoustics, whether a game or another application?

Zuofu: Generally the more ‘realistic’ the experience that the game is trying to convey, the better suited it is for a GA engine. Specifically, we imagine games like survival horror, tactical shooters, driving games, etc. One of the great examples of a current (or future) game that would benefit from our tech is DayZ, which is an open world multiplayer zombie survival game, since it tries to be immersive and realistic. However, the graphics don’t necessarily have to be realistic; Minecraft, for example, would benefit from a GA engine, since it requires a high level of spatial awareness despite the pixel-art graphics due to the dynamic geometry.

Another (perhaps surprising) genre of game/app that would benefit is virtual worlds such as Second Life. The reason is that voice over IP presents a unique challenge for sound designers. If the voices are simply mixed into the listener’s audio, it can sound quite chaotic as many voices (potentially strangers) sound like they are simultaneously speaking in the listener’s head. This makes it difficult to keep track of conversation threads.

On the other hand, if the sound is mixed in a diegetic way without GA (that is, in-world, from a listener’s perspective), you have artifacts due to the lack of geometric correspondence. For example, someone in a walled booth behind the listener sounds the same as someone in the same booth as the listener, by virtue of being the same distance away irrespective of the geometry. GA gives the listener some context and some spatial cues to separate listeners – this effect is actually well researched as the “cocktail party effect”. Our work-in-progress game, A Voice in Darkness, focuses on this effect in a multiplayer survival context. Players are presented a hostile world in near total darkness, and they must use audio cues as well as each other’s voices to navigate, communicate, and survive.
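
[Editor's sketch] The booth example above can be illustrated with a few lines of Python. This is a minimal toy under stated assumptions, not how avidengine works: the function names, the 2D layout, the inverse-distance falloff, and the `wall_loss` factor are all hypothetical. It compares a distance-only mix with one that checks whether a wall blocks the direct path. A real GA engine would also account for reflections and diffraction.

```python
import math

def segments_intersect(p1, p2, q1, q2):
    """Return True if 2D segment p1-p2 properly crosses segment q1-q2."""
    def orient(a, b, c):
        # Sign of the cross product (b - a) x (c - a)
        return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    d1 = orient(q1, q2, p1)
    d2 = orient(q1, q2, p2)
    d3 = orient(p1, p2, q1)
    d4 = orient(p1, p2, q2)
    return (d1 * d2 < 0) and (d3 * d4 < 0)

def distance_only_gain(listener, source):
    """Inverse-distance attenuation that ignores geometry entirely."""
    d = math.dist(listener, source)
    return 1.0 / max(d, 1.0)

def occlusion_aware_gain(listener, source, walls, wall_loss=0.25):
    """Same attenuation, scaled by an assumed loss factor for each wall
    that the direct listener-to-source path crosses."""
    gain = distance_only_gain(listener, source)
    for a, b in walls:
        if segments_intersect(listener, source, a, b):
            gain *= wall_loss
    return gain

# Hypothetical layout: one wall separates the listener's booth from the next.
listener = (0.0, 0.0)
talker_same_booth = (2.0, 0.0)    # 2 units away, no wall in between
talker_next_booth = (-2.0, 0.0)   # 2 units away, behind the wall
walls = [((-1.0, -3.0), (-1.0, 3.0))]

for name, pos in [("same booth", talker_same_booth),
                  ("next booth", talker_next_booth)]:
    print(name,
          distance_only_gain(listener, pos),
          occlusion_aware_gain(listener, pos, walls))
```

With distance-only mixing both talkers get the same gain, which is the artifact described above. The occlusion-aware version attenuates only the talker behind the wall.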

E4G: Taken to the end-user level, what impact would effective geometric acoustics have on their gameplay versus what they are presently experiencing in PC games?

Zuofu: As the Guerrilla Games people pointed out in their excellent GDC talk, it is possible to emulate some of the effects of our engine with existing systems and a talented sound designer who accounts for the most relevant psychoacoustic effects manually. But more broadly, what we hope our technology will do is encourage gamers to hone their critical listening skills as a gameplay mechanism. After all, this is why, as humans, we can detect sounds – it helps us survive by having an omnidirectional detector for nearby activity out of our field of view. We want to present an experience that is realistic (perhaps hyper-realistic) enough that the gamer needs to pay attention to sound to virtually survive. The way sound is presented in games right now is as an addition that makes the experience more exciting or cinematic; we want to get away from that model (for some games), into one that reflects sound’s purpose as a survival aid. That also opens the door to other types of gameplay, including VR and augmented reality.

Imagine a game, for example, where the sound is good and scary enough and it is simultaneously recording your reaction using the microphone; the player would have to stay quiet in reality to remain undetected in the virtual world.

E4G: Do you foresee a time when GA could be applicable, or useful, in mobile applications and games, say, if processed through the Kepler-based Tegra K1?

Zuofu: Absolutely, some of the best audio-only games (Papa Sangre, among others) are on the mobile platform, partially because of the proliferation of headphones and also because the lack of traditional controls on the platform encourages more experimental games rather than ports of AAA titles. GA could add another dimension to games like Papa Sangre (for example, being able to detect a wall in the near field using only sound). It would also be interesting for some of the more social situations we’ve already touched upon, like in a virtual world-type application. Mobile’s unique sensors (GPS, accelerometers, cameras, etc.) allow for games like virtual worlds to be integrated with augmented reality and VR, and GA adds another degree of immersion to these applications.

E4G: What other topics and sessions at GTC 2014 are you interested in checking out, from the perspective of both an engineer and a gamer?

Zuofu: Engineering wise, I’m always interested in looking at what other people are doing with signal processing; mostly for ideas to speed up avidengine. One of the related things that comes up all the time is the question: ‘when are games going to go to ray-traced rendering?’ It seems like every generation, ray-tracing gets faster, but then some smart person figures out how to get traditional (rasterized) graphics to do another graphical effect that we thought could only be done by ray-tracing. I find that sort of a parallel to my work in GA; every time we make another optimization, clever people figure out a way to make some more effects in a traditional context. I still think GA will get there, but there’s a reason why I said 8 years, even though the technology sounds quite good already.

Paul Philleo, Contributing Editor

Updated: March 25, 2014 — 6:31 am