Archive for April, 2018

Oppen Do Down–first Web Audio API piece

In my previous post, I made notes about my reading of and preliminary understanding of Chris Wilson’s article on precision event scheduling in the Web Audio API–in preparation to create my first Web Audio API piece. I’ve created it. I’d like to share it with you and talk about it and the programming of it.

Oppen Do Down, an interactive audio piece

The piece is called Oppen Do Down. I first created it in the year 2000 with Director. It was viewable on the web via Shockwave, a Flash-like plugin–sort of Flash’s big brother. But hardly any contemporary browsers support the Shockwave plugin anymore–or any other plugins, for that matter–the trend is toward web apps that don’t use plugins at all but, instead, rely on newish native web technologies such as the Web Audio API, which requires no plugins to be installed before being able to view the content. The old Director version is still on my site, but nobody can view it anymore cuz of the above. I will, however, eventually release a bunch of downloadable desktop programs of my interactive audio work.

You can see the Director version of Oppen Do Down in a video I put together not long ago on Nio, Jig-Sound, and my other heap-based interactive audio work.

I sang/recorded/mixed the sounds in Oppen Do Down myself in 2000 using an old multi-track piece of recording software called Cakewalk. First I recorded a track of me snapping my fingers. Then I played that back over headphones, looping, while I recorded a looping vocal track. Then I’d play it back. If I liked it I’d keep it. Then I’d play the finger snapping and the vocal track back over headphones while I recorded another vocal track. Repeat that for, oh, probably about 60 or 70 tracks. Then I’d pick a few tracks to mix down into a loop. Most of the sounds in Oppen Do Down are multi-track.

As you can hear if you play Oppen Do Down, the sounds are synchronized. You click words to toggle their sounds on/off. The programming needs to be able to download a bunch of sound files, play them on command, and keep the ones that are playing synchronized. As you turn sounds on, the sounds are layered.

As it turns out, the programming of Oppen Do Down was easier in the Web Audio API than it was in Director. The reason for that is all to do with the relative deluxeness of the Web Audio API versus Director’s less featureful audio API.

Maybe the most powerful feature of the Web Audio API that Director didn’t offer is the high-performance clock. It’s high-performance in two ways. It has terrific resolution, apparently. It’s accurate to greater precision than 1 millisecond; you can use it to schedule events right down to the level of the individual sound sample, if you need that sort of accuracy. And the Web Audio API does indeed support getting your hands on the very data of the individual samples, if you need that sort of resolution. But the second way in which the high-performance clock is high-performance is that it stops for nothing. Which isn’t how it normally works with timers and clocks programmers use. They’re usually not the highest-priority processes in the computer, so they can get bumped by what the operating system or even the browser construes as more important processes. Which can result in inaccuracies. Often these inaccuracies are not big enough to notice. But in Oppen Do Down, we need pretty accurate rhythmic timing.

Director didn’t offer such a high-performance clock. What it had was the ability to insert cue-points into sounds. And define a callback handler that could execute when a cue-point was passed. That was how you could stay in touch with the actual physical state of the audio, in Director. The Web Audio API doesn’t let you insert cue-points in sounds, but you don’t need to. You can schedule events, like the playing of sounds, to happen in the time coordinate system of the high performance clock.

This makes synchronization more or less a piece of cake in the Web Audio API. Because you can look at the clock any time you want with great accuracy (AudioContext.currentTime is how you access the clock) and you can schedule sounds to start playing at time t and they indeed start exactly at time t. And the scheduling strategy Chris Wilson advocates, which I talked about in my previous post, whereby you schedule events a little in advance of the time they need to happen, works really well.

There are other features the Web Audio API has that Director didn’t. But, then, Director was actually started in 1987, whereas the Web Audio API has only been around for a few years as of this date in 2018. You can synthesize sounds in the browser, though that isn’t my interest; I’m more interested in recording vocal and other sounds and doing things with those recorded sounds. You can also process live input from the microphone, or from video, or from a remote stream. And you can create filters. And probably other things I don’t know anything about, at this point.

Anyway, Oppen Do Down links to two JavaScript files. One, oppen.js, is for this particular app. The other one, sounds.js, is the important one for understanding sound in Oppen Do Down. The sound.js file defines the Sounds constructor, from top to bottom of sound.js. In oppen.js, we create an instance of it:

gSounds=new Sounds([‘1.wav’,’2.wav’,’3.wav’,’4.wav’,’5.wav’,’6.wav’]);

Actually there are 14 sounds, not 6, but just to make it prettier on this page I deleted the extra 8. I used wav files in my Director work. I was happy to see that the Web Audio API could use them. They are uncompressed audio files. Also, unlike mp3 files, they do not pose problems for seamless looping; mp3 files insert silence at the ends of files. I hate mp3 files for that very reason. Well, I don’t hate them. It’s more like I show them the symbol of the cross when I see them.

The gSounds object will download the sounds 1.wav, etc, and will store those sounds, and offers an API for playing them.

‘soundsAreLoaded’ is a function in oppen.js that gets called when all the sounds have been downloaded and are ready to be played.

gSounds adds each sound (1.wav, 2.wav, … 14.wav) via its ‘add’ method, which creates an instance of the Sound (not Sounds) constructor for each sound. The newly created Sound object then downloads its sound and, when it’s downloaded, the ‘makeAvailable’ function puts the Sound object in the pAvailableSounds array.

When all the sounds have been downloaded, the gSounds object runs a function that notifies subscribers that the sounds are ready to be played. At that point, the program makes the screen clickable; the listener has to click the screen to initiate play.

It’s important that no sounds are played until the user clicks the screen. If it’s done this way, the program will work OK in iOS. iOS will not play any sound until the user clicks the screen. After that, iOS releases its death grip on the audio and sounds can be played. Apparently, at that point, if you’re using the Web Audio API, you can even play sounds that aren’t triggered by a user click. As, of course, you should be able to, unless Apple is trying to kill the browser as a delivery system for interactive multimedia.

I’ve tested Oppen Do Down on Android, the iPad, the iPhone, and on Windows under Chrome, Edge, Firefox and Opera. Under OSX, I’ve tested it with Chrome, Safari and Firefox. It runs on them all. The Web Audio API seems to be well-supported on all the systems I’ve tried it on.

When, after the sounds are loaded, the user clicks the screen to begin playing with Oppen Do Down, we find the sound we want to play initially. Its name is ‘1’. It’s the sound associated with the word ‘badly’. We turn the word ‘badly’ blue and we play sound ‘1’. We also make the opening screen invisible and display the main screen of Oppen Do Down (which is held in the div with id=’container’).

var badly=gSounds.getSound(‘1’);

The ‘’ method is, of course, crucial to the program cuz it plays the sounds.

It also checks to see if the web worker thread is working. This separate thread is used, as in Chris Wilson’s metronome program, to continually set a timer that times out just before sounds stop playing, so sounds can be scheduled to play. If the web worker isn’t working, ‘’ starts it working.Then it plays the ‘1’ sound.

Just before ‘1’ finishes playing–actually, pLookAhead milliseconds before it finishes–which is currently set to 25–the web worker’s timer times out and it sends the main thread a message to that effect. The main thread then calls the ‘scheduler’ function to schedule the playing of sounds which will start playing in pLookAhead milliseconds.

If the listener did nothing else, this process would repeat indefinitely. Play the sound. The worker thread’s timer ticks just before the sound finishes, and then sounds are scheduled to play.

But, of course, the listener clicks on words to start/stop sounds. When the listener clicks on a word to start the associated sound, ‘’ checks to see how far into the playing of a loop we are. And it starts the new sound so that it’s synchronized with the playing sound. Even if there are no sounds playing, the web worker is busy ticking and sending messages at the right time. So that new sounds can be started at the right time.

Anyway, that’s a sketch of how the programming in Oppen Do Down works.

Chris Joseph gave me some good feedback. He noticed that as he added sounds to the mix, the volume increased and distortion set in after about 3 or 4 sounds were playing. He suggested that I put in a volume control to control the distortion. He further suggested that each sound have a gain node and there also be a master gain node, so that the volume of each sound could be adjusted.

The idea is that as the listener adds sounds, the volume remains constant. Which is what the ‘adjustVolumes’ function is about. It works well.

I am happy with my first experiment with the Web Audio API. Onward and upward.

However, it’s hard to be happy with some of the uses that the Web Audio API is being put to. The same is true of the Canvas API and the Web RTC API. And these, to me, are the three most exciting new web technologies. But, of course, when new, interesting, powerful tools arise on the web, the forces of dullness will conspire to use them in evil ways. These are precisely the three technologies being used to ‘fingerprint‘ and track users on the web. This is the sort of crap that makes everything a security threat these days.

Event Scheduling in the Web Audio API

I’ve been reading about the Web Audio API concerning synchronization of layers and sequences of sounds. Concerning sound files, specifically. So that I can work with heaps of rhythmic music.

A heap is the term I use to describe a bunch of audio files that can be interactively layered and sequenced as in Nio and Jig Sound, which I wrote in Director, in Lingo. The music remains synchronized as the sound icons are interactively layered and sequenced. The challenge of this sort of programming is coming up with a way to schedule the playing of the sound files so as to maintain synchronization even when the user rearranges the sound icons. When I wrote Nio in 2000, I wrote an essay on how I did it in Nio; this essay became part of the Director documentation on audio programming. The approach to event scheduling I took in Nio is similar to the recommended strategy in the Web Audio API.

Concerning the Web Audio API, first, I tried basically the simplest approach. I wanted to see if I could get seamless looping of equal-duration layered sounds simply by waiting for a sound’s ‘end’ event. When the ‘end’ event occurred concerning a specific one of the several sounds, I played the sounds again. This actually worked seamlessly in Chrome, Opera and Edge on my PC. But not in Firefox. Given the failure of Firefox to support this sort of strategy, some other strategy is required.

The best doc I’ve encountered is A Tale of Two Clocks–Scheduling Web Audio With Precision by Chris Wilson of Google. I see that Chris Wilson is also one of the editors of the W3C spec on the Web Audio API. So the approach to event scheduling he describes in his article is probably not idiosyncratic; it’s probably what the architects of the Web Audio API had in mind. The article advocates a particular approach or strategy to event scheduling in the Web Audio API. I looked closely at the metronome he wrote to demonstrate the approach he advances in the article. The sounds in that program are synthesized. They’re not sound files. Chris Wilson answered my email to him in which I asked him if the same approach would work for scheduling the playing of sound files. He said the same approach would work there.

Basically Wilson’s strategy is this.

First, create a web worker thread. This will work in conjunction with the main thread. Part of the strategy is to use this separate thread that doesn’t have any big computation in it for a setTimeout timer X whose callback Xc regularly calls a schedule function Xcs, when needed, to schedule events. X has to be set to timeout sufficiently in advance of when sounds need to start that they can start seamlessly. Just how many milliseconds in advance it needs to be set will have to be figured out with trial and error.

But it’s desirable that the scheduling be done as late as feasibly possible, also. If user interaction necessitates recalculation and resetting of events and other structures, probably we want to do that as infrequently as possible, which means doing the scheduling as late as possible. As late as possible. And as early as necessary.

When we set a setTimeout timer to time out in x milliseconds, it doesn’t necessarily execute its callback in x milliseconds. If the thread or the system is busy, that can be delayed by 10 to 50 ms. Which is more inaccuracy than rhythmic timing will permit. That is one reason why timeout X needs to timeout before events need to be scheduled. Cuz if you set it to timeout too closely to when events need to be scheduled, it might end up timing out after events need to be scheduled, which won’t do—you’d have audible gaps.

Another reason why events may need to be scheduled in advance of when they need to happen is some browsers—such as Firefox—may require some time to get it together to play a sound. As I noted at the beginning, Firefox doesn’t support seamless looping via just starting sounds when they end. That means either that the end event’s callback happens quite a long time after the sound ends (improbable) or sounds require a bit of prep by Firefox before they can be played, in some situations.

OK, so we need to schedule events a little before those events have to happen. We regularly set a timer X (using setTimeout or setInterval) to timeout in our web worker thread. When it does, it posts a message to the main thread that it ticked, that it’s time to see if events need scheduling.
If some sounds do need to be scheduled to start, we schedule them now, in the main thread.

But to understand that process, it’s important to understand the AudioContext’s currentTime property. It’s measured in seconds from a 0 value when audio processing in the program begins. This is a high-precision clock. Regardless of how busy the system is, this clock keeps good time. Also, when you pause the program’s execution with the debugger, currentTime keeps changing. currentTime stops for nothing! The moral of the story is we want to schedule events that need rhythmic accuracy with currentTime.

That can be done with the .start(when, offset, duration) method. The ‘when’ parameter “should be specified in the same time coordinate system as the AudioContext’s currentTime attribute.” If we schedule events in that time coordinate system, we should be golden, concerning synchronization, as long as we allow for browsers such as Firefox needing enough prep time to play sounds. How much time do such browsers require? Well, I’ll find out in trials, when I get my code running.

The approach Chris Wilson recommends to event scheduling is similar to the approach I took in Nio and Jig Sound, which I programmed in Lingo. Again, it was necessary to schedule the playing of sounds in advance of the time when they needed to be played. And, again, that scheduling needed to be done as late as possible but as early as necessary. Also, it was important to not rely solely on timers but to ground the scheduling in the physical state of the audio. In the Web Audio API, that’s available via the AudioContext’s currentTime property. In Lingo, it was available by inserting a cuePoint in a sound and reacting to an event being triggered when that cuePoint was passed. In Nio and Jig-Sound, I used one and only one silent sound that contained a cuePoint to synchronize everything. That cuePoint let me ground the event scheduling in a kind of absolute time, physical time, which is what the Web Audio API currentTime gives us also.

I’ll revisit this stuff in later posts as I build my code.