Getting the 4-1-1

"City and state, please"

In 2003, Tellme sought to redefine the experience of calling 4-1-1 to include an Interactive Voice Response (IVR) system and, as necessary, a seamless transition to the operator by passing along a voice recording and pop-up GUI display on the operator terminal.

Desk Set (1957). Expert operators at a TV network's research department must adapt to new technology. Fabulous movie.

I was the second employee on the project that later became Tellme's biggest revenue generator. Verizon, AT&T Wireless/Cingular, Sprint and other carriers eventually jumped on board. Well over 50% of the 411 calls in the nation are now answered by Tellme's system (-> Microsoft -> 24/7). Billions of calls each year are answered by a system our team designed.

Callers pay to use 411 and expect fast, accurate, friendly service. They might call for a business number, a residential number, a reverse lookup, or a variety of other things - 6% called for the time of day. It's a user experience where the entire interaction may last 10-45 seconds, every second counts, and accuracy (and well-handled mistakes) translates into credibility, repeat callers, and revenue.

We were bound by legacy systems and legacy interactions. City and state needed to precede the listing name, for example, for listing lookups. The number readout was handled by another vendor. We needed to work with the vendor who provided the operators' expert system to determine what, and how much, we could relay to the operator (was the first listing utterance more useful, or the second?). The operator's seconds on the call - and the time the caller spends waiting - are expensive when multiplied by billions of calls.

Designing for audio is much like designing any other user experience (navigation flows, user scenarios, technology constraints and affordances) but, due to speech recognition accuracy, is also a probability tightrope between natural, effortless, human-like conversation and a repetitive, robotic nightmare.

We began with contextual inquiries at call centers all over the country. We learned the techniques of the best operators and saw what additional information they posted around their cubicles - frequently requested, hard-to-search phone numbers, maps of the local area, where to find current movie listings.

When we rolled out residential lookups, we user-tested over 15 alternatives for the phrase "Business or residence?", trying to arrive at a question that a) was easily answered, b) would result in utterances we'd recognize with high probability, and c) would avoid the system sounding dumb. Once we narrowed to our top three, we rolled them out as tightly controlled A/B tests.
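The mechanics of a prompt A/B test like this can be sketched as weighted random assignment with per-variant recognition tracking. Everything below is illustrative - the variant texts, weights, and class names are hypothetical, and Tellme's actual test infrastructure is not described here:

```python
import random
from collections import defaultdict

# Hypothetical prompt variants under test (illustrative only).
VARIANTS = {
    "A": "Business or residence?",
    "B": "Is this a business or a residential listing?",
    "C": "Would you like a business or a residential number?",
}

class PromptABTest:
    """Assign each call a prompt variant and track recognition outcomes."""

    def __init__(self, weights=None, seed=None):
        self.rng = random.Random(seed)
        self.weights = weights or {name: 1.0 for name in VARIANTS}
        self.plays = defaultdict(int)       # times each variant was played
        self.recognized = defaultdict(int)  # times the caller's reply was recognized

    def assign(self):
        """Pick a variant for this call, weighted by the rollout plan."""
        names = list(self.weights)
        variant = self.rng.choices(names, weights=[self.weights[n] for n in names])[0]
        self.plays[variant] += 1
        return variant, VARIANTS[variant]

    def record(self, variant, was_recognized):
        """Log whether the speech recognizer matched the caller's reply."""
        if was_recognized:
            self.recognized[variant] += 1

    def recognition_rate(self, variant):
        plays = self.plays[variant]
        return self.recognized[variant] / plays if plays else 0.0
```

In practice the recognition rate per variant is what decides the winner: the "best" wording is the one callers answer in ways the recognizer can actually match.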

As we chased the long tail trying to improve accuracy and automation, we learned that a big part of callers' confidence in the listing match we offered depended on our ability to pronounce the city name, listing name, or street name with the correct regional pronunciation. "Villa St, is that right?", for example, needs to be pronounced like a country estate - 'vill-uh' - in New England, and more like the Spanish 'vee-uh' in California, so that callers would be less likely to say no.
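A regional-pronunciation override like this amounts to a lookup keyed on (token, region) with a fallback to the speech engine's default. This is a minimal sketch under that assumption - the mapping, region names, and function are hypothetical, not the actual lexicon format used:

```python
# Hypothetical mapping from (token, region) to a pronunciation hint for the
# text-to-speech engine; real lexicon data would be far larger.
REGIONAL_PRONUNCIATIONS = {
    ("Villa", "new_england"): "VIL-uh",
    ("Villa", "california"): "VEE-uh",
}

def pronounce(token, region):
    """Return the regional pronunciation for a city/listing/street token,
    falling back to the token itself (i.e. the TTS engine's own guess)."""
    return REGIONAL_PRONUNCIATIONS.get((token, region), token)
```

The design point is the fallback: most tokens never need an override, so the table only has to cover the names where the default pronunciation would cost you a "no".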

Such a tightrope. So much fun!