Improving my VRChat performance audio
Now that I’m doing even more performing in VRChat and also making use of occasional backing tracks (instead of just doing everything totally live), it’s time for me to improve my audio setup. But Windows audio is super tricky and annoying, so it took me a bunch of iteration to figure out how I want to make it work now.
Old setup
Previously, I used my VR headset’s mic as my vocal mic, and routed my instruments through the line input on my computer’s motherboard. For mic-boosted performances, I used Voicemeeter to combine them into a single virtual input for VRChat. For streamed performances, I kept them separate and played backing tracks from a media source in OBS (with local monitoring).
This worked pretty well, except that Voicemeeter adds a bit of latency: it has its own mixing buffer, and that buffer picks up additional latency because it goes through Windows' mixing API. Since the backing tracks were being played and then looped back, my live components (voice and guitar) ended up delayed somewhat relative to the backing track on the Voicemeeter signal.
This was fine for streamed shows, but it sounded like absolute crap on mic-boosted shows, and unfortunately a lot of the shows I do are still mic-boosted.
There is a way to get Voicemeeter into a low-latency mode where everything lines up, but unfortunately that requires giving its mix output exclusive access to the headset’s headphone output. That meant I couldn’t hear anything else (including in-game audio or the show producers) unless I also routed it through Voicemeeter, which would then send it out through the stream or mic, causing feedback and/or a lot of audio being broadcast that I didn’t want broadcast.
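To get a feel for how stacked buffers add up, here’s a rough sketch of the arithmetic. The buffer sizes and sample rate are made-up illustrative values, not measurements from Voicemeeter or Windows, and `buffer_latency_ms` and `chain_latency_ms` are just helper names for this example.

```python
def buffer_latency_ms(frames: int, sample_rate_hz: int) -> float:
    """Time it takes to fill (or drain) one buffer of `frames` samples."""
    return 1000.0 * frames / sample_rate_hz


def chain_latency_ms(buffer_sizes: list[int], sample_rate_hz: int) -> float:
    """Total delay when a signal passes through several buffers in series."""
    return sum(buffer_latency_ms(frames, sample_rate_hz) for frames in buffer_sizes)


# Hypothetical numbers, for illustration only: a 512-frame mixer buffer
# followed by a 480-frame shared-mode buffer, both running at 48 kHz.
SAMPLE_RATE_HZ = 48_000
live_path = [512, 480]

print(f"{chain_latency_ms(live_path, SAMPLE_RATE_HZ):.1f} ms of added delay on the live signal")
```

Even a couple of modest buffers in series land in the tens of milliseconds, which is plenty to make a voice or guitar feel late against a backing track.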
Current setup
I have a Focusrite Scarlett 18i8 left over from my major studio upgrade, which I was just using as a fancy headphone amp/splitter on my office computer. This was complete overkill, as I wasn’t even using its inputs at all (most of the audio I do in there is on Discord and Zoom, which don’t get along with multichannel interfaces, so I just use my webcam mic).
So, I switched my office computer back to using my actual headphone amp/splitter, and put the 18i8 on my VR computer. And then I was reminded just how lousy Windows audio routing is.
After installing the Focusrite Control software on my Windows machine, I enabled the additional channels (tl;dr: open the system tray, click the big gray F icon, select “Expose / Hide Windows Channels,” and then enable all of the inputs and outputs you’re expecting to use).
Here’s my current audio setup:
- A decent studio mic on input 1
- My instrument’s signal chain on input 2
- My backing track player (VLC) set to play to Scarlett Output 1-2
- Scarlett Loopback set with the live mix of the above
- VRChat’s audio going to my VR headset
- OBS audio capture sources:
  - Scarlett loopback
  - Scarlett input 1-2
  - Scarlett input 3-4
  - Scarlett output 1-2 (backing track)
  - Scarlett output 3-4 (game audio)
  - Scarlett output 5-6 (control room audio)
  - Headset mic
  - Headset speakers
- Monitor outputs going to some headphones and a small guitar amp (which also serves as my pedalboard’s power source)
Next, I set up a live mix on the loopback device (in Focusrite Control), combining my mic, instruments, and Output 1-2.
Then I assigned the capture sources to OBS’s six audio tracks:
- Track 1: Stream output (live mix)
- Track 2: Input 1-2
- Track 3: Input 3-4
- Track 4: Backing track
- Track 5: Headset mic on left, control room on right
- Track 6: Headset speakers + Output 3-4 (game audio)
Now, when I’m doing a mic-boosted performance, I set VRChat’s mic input to the loopback device, and if I’m doing a streamed performance I keep it on the headset mic (so I get correct lipsync).
If I’m doing a mic-boost show that I’m also streaming to the outside (such as in one of my small solo shows) I can add track 6’s inputs to track 1 as well, so that the streaming audience can hear whatever audience I have in VRChat. However, for a streamed show it’s important to not do that, or else the in-game audience will hear themselves talking on a bit of a delay.
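The track layout and the mic-boost vs. streamed rules above can be written down as a small lookup, which makes it harder to mix up the two modes. This is only a sketch: the source names are shorthand labels rather than anything OBS or VRChat exposes, and `tracks_for` and `vrchat_mic` are hypothetical helpers.

```python
# Source names are shorthand labels for this sketch, not OBS identifiers.
BASE_TRACKS = {
    1: ["loopback (live mix)"],
    2: ["input 1-2"],
    3: ["input 3-4"],
    4: ["output 1-2 (backing track)"],
    5: ["headset mic (L)", "output 5-6 control room (R)"],
    6: ["headset speakers", "output 3-4 (game audio)"],
}


def tracks_for(show: str, also_streaming: bool = False) -> dict[int, list[str]]:
    """Return the track layout for a 'mic-boost' or 'streamed' show."""
    if show not in ("mic-boost", "streamed"):
        raise ValueError(f"unknown show type: {show!r}")
    # Copy each list so the base layout is never modified.
    tracks = {number: list(sources) for number, sources in BASE_TRACKS.items()}
    # Only a mic-boosted show that is also streamed outward gets the in-game
    # audience on the stream mix. On a streamed show the in-game audience
    # hears the stream, so this would play them back to themselves on a delay.
    if show == "mic-boost" and also_streaming:
        tracks[1] += tracks[6]
    return tracks


def vrchat_mic(show: str) -> str:
    """Which device VRChat should use as its microphone."""
    return "loopback (live mix)" if show == "mic-boost" else "headset mic"


print(tracks_for("mic-boost", also_streaming=True)[1])
print(vrchat_mic("streamed"))
```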
The recording of the headset mic on track 5 can then also be used to improve the vocal recording for a later edit, since it’ll be pristine and from a microphone that follows my head (unlike the mic on my stand, which my mouth doesn’t always stay perfectly in front of).
Since I’m using loopback for the live mix I don’t care about separating the mono channels out in OBS; if I want to use them as mono channels I can separate them out in my editing software, and the only in-OBS use I have for them is displaying waveforms and that plugin already lets me choose a mono channel from a stereo pair. But if you want to separate them out on the OBS side for some reason:
- Create an input capture source that points to its stereo pair
- Open “advanced audio properties”
- Set the source to Mono, and hard-pan it left (for input 1/3/5/7) or right (for input 2/4/6/8).
- To independently pan the mono sources, you’ll need to add an audio pan filter, because for some damn reason OBS doesn’t come with that built-in.
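For the “separate them out in my editing software” route, pulling the two mono signals out of a stereo pair is just de-interleaving. A minimal sketch, assuming the audio is already loaded as a flat list of interleaved float samples (the sample values here are made up, and `split_stereo` is a hypothetical helper):

```python
def split_stereo(interleaved: list[float]) -> tuple[list[float], list[float]]:
    """Split interleaved stereo samples [L0, R0, L1, R1, ...] into (left, right)."""
    if len(interleaved) % 2:
        raise ValueError("interleaved stereo needs an even number of samples")
    return interleaved[0::2], interleaved[1::2]


# Made-up sample values: mic on the left (input 1), instrument on the right (input 2).
pair = [0.1, -0.5, 0.2, -0.4, 0.3, -0.3]
mic, instrument = split_stereo(pair)
print(mic)         # [0.1, 0.2, 0.3]
print(instrument)  # [-0.5, -0.4, -0.3]
```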
Another nice thing about this setup is that I can put “control room” chatter (usually from Discord voice chat) on a separate output that isn’t captured as game audio, using this routing:
- Game audio on output 3-4
- Control room audio on output 5-6
- Track 5 gets headset mic on the left, and control room on the right (because both of those are mono sources)
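Sharing one stereo track between two mono sources is the reverse operation: one signal goes on the left channel and the other on the right. A small sketch of that packing, again with plain lists of float samples and a hypothetical `pack_stereo` helper (padding the shorter signal with silence is a choice made for this example):

```python
def pack_stereo(left: list[float], right: list[float]) -> list[float]:
    """Interleave two mono signals as [L0, R0, L1, R1, ...].

    The shorter signal is padded with silence so both channels line up.
    """
    length = max(len(left), len(right))
    left = left + [0.0] * (length - len(left))
    right = right + [0.0] * (length - len(right))
    return [sample for frame in zip(left, right) for sample in frame]


# Made-up sample values: headset mic on the left, control room on the right.
headset_mic = [0.25, 0.5]
control_room = [-0.75]
print(pack_stereo(headset_mic, control_room))  # [0.25, -0.75, 0.5, 0.0]
```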
Things I dislike about this setup
Unfortunately the Bigscreen headphones and microphone are USB and separate from the Scarlett. As a result, I can’t just mix the mic in with my “live mix” input, nor can I seamlessly transfer my monitoring over to the headphones, without incurring the latency I’m trying to get away from to begin with.
I have two choices for how to monitor my audio and also hear the game audio:
- Set my system audio output to Scarlett Output 3-4 and hear everything on the monitor headphones
- Set my system audio output to the Bigscreen Beyond headphones and put the monitor headphones over the Beyond headphones
Both of these are annoying because my monitor headphones' cord is much shorter than my headset cable, and anyway it’s obnoxious to have an extra cable running alongside my headset. The first approach at least gives me better-sounding audio and a better fit for my headphones (since I can swing the Bigscreen headphones out of the way), but it means there are extra steps to take when I go from being an audience member to getting ready for my performance.
Similarly, if I need to listen to the control room while I’m in audience mode (which I usually do, because that’s how the show runners inform me it’s time for my soundcheck), I have to remember to switch my Discord voice chat output from the Bigscreen headphones to Output 5-6 if I’m going to have it routed correctly.
And on the opposite end of things, the mic in the Bigscreen Beyond is actually quite good, and has the nice advantage of always being right over my mouth so I don’t have to worry about where a mic stand is situated. It would be really nice if I could just combine the Bigscreen mic with my other audio sources when I’m doing a mic-boosted performance, without incurring extra latency. It was being unable to do that which sent me down this path to begin with. And, I can do the zero-latency mix-in for a streamed performance (by routing the mic to the live mix track), but then I’d have to have separate setups for mic-boost vs. streamed performances (both in OBS and in Focusrite Control), which increases the potential for mistakes and oversights to happen.
Basically, it’d be really great if there were a low-latency way of overlaying multiple separate audio interfaces on Windows!
Alternatively, it’d be great if all of my performances were only done using a streaming setup, but open mics and private/solo shows are almost always mic-boosted, and as far as audio is concerned, performing for an audience on Zoom is essentially the same as a mic-boosted show. (And my next gig will be for a private audience over Zoom on Friday, so it’s super important that I have it working!)
This wouldn’t be an issue on Linux or macOS
PulseAudio+JACK on Linux has nearly zero latency for capture and audio routing/mixing.
Audio Hijack on macOS can also do this stuff with nearly zero latency (it’s what I’ve used when doing livestreams from my recording studio), and it has a super easy user interface to boot.
I feel like this is yet another reason why I should try again to move my VR system over to Linux. But gosh, there are always so many obnoxious things to deal with there, especially with a Bigscreen Beyond, which requires patching the kernel for silly reasons.
Unfortunately, the chances of VRChat ever becoming usable on a Mac are essentially zero. While it’s technically possible for all of the machinations that get it to work on Linux to also work on macOS, Linux has native, supported SteamVR, while the macOS version of SteamVR was done as a one-off beta in 2019 and doesn’t support any current headsets.