Sunday, 23 February 2014

Displayless interfaces

Thanks to the tube strike the other week, I was forced to get some exercise. I walked the few miles from Liverpool St to the office where I work. Because I wasn’t familiar with the route and didn’t want to walk head-first into a lamp-post while looking at the map on my phone screen, I put an earphone in one ear, enabled voice guidance in Google Navigation, and left the phone in my pocket, so that I could concentrate on avoiding lamp-posts and not getting run over.

Disappointingly, the voice directions proved to be less than adequate. For some reason, the spoken instructions are simpler when Walk is selected instead of Drive. At all but the simplest of junctions, the terse “turn left” or “continue straight” message in my ear was ambiguous, and if I turned around to consider the options then any sense of direction was quickly lost.

What I needed was more context. “Turn left into Aldgate St,” instead of just, “Turn left,” would have been a good start. Better still would be the ability to ask questions such as, “Which road should I take?”, “Describe this junction,” or the kids’ favourite - “Are we nearly there yet?”

Google Glass could have put the map into my view, but who wants to look like part of the Borg Collective? And for that matter, I don’t much want to be seen talking to myself either, so give me an earphone for one ear and some buttons I can press without looking at them, and I’ll be happy.

I think Google may have missed a trick - Google Glass is expensive, but Google Ear could be a free downloadable app.

There’s a similar problem when using SatNav in my car. If I want to detour to get fuel, then using the touchscreen to zoom out and look around the local area is tricky, not to mention dangerous and illegal. There’s a trend among car manufacturers now to replace dashboard controls with a large touchscreen, expanding this problem to even more tasks, from turning up the fan to changing radio station. (At least someone is thinking about this, but the communication is not rich enough in that interface for what I want.)

If I were using Google Ear to navigate while driving, and the set of standard buttons I can operate without looking were mounted on the steering column, I could reroute without even looking at the screen, let alone touching it.

The key requirements are these:
  1. A set of buttons that can be operated without looking at them. Not too few, and not too many, and I’ll need a set I can use in my car, and a set I can use while walking, and perhaps a set at my desk.
  2. The set of buttons needs to be standardised, so that multiple manufacturers can supply them in various forms - a set on a bluetooth-connected key-fob, a set for the steering column, a set on the side of a bluetooth earpiece (killing two birds with one stone), ...
  3. An intuitive set of conventions for the use of said buttons, which remain the same in all contexts. Think of the buttons on a games console controller - the left and right buttons always mean the same thing, so their use quickly becomes second nature.
  4. Audible communication from the device. It wouldn’t all have to be spoken - short sounds can indicate status, progress, etc.
  5. Some conventions for the structure of the voice output, which fit in with the button input conventions, and aim for efficient interaction between device and user.  For example, if I search for something, the spoken response could start with a simple, "28 results," and then I can decide if I want it to start listing the results or if I will refine the search first.
  6. Voice input commands for situations where button input would be too complex - for example, trigger voice mode with the buttons and say, “Find nearby petrol stations.”
  7. It lives in my phone, so I have it with me at all times.
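To make requirement 5 concrete, here is a minimal sketch of the “summary first, then details on request” convention: a search speaks a terse count, then one button press decides whether to hear the list or refine. Everything here - the function names, the `Button` events, and printing as a stand-in for text-to-speech - is invented for illustration, not a real API.

```python
from enum import Enum, auto

class Button(Enum):
    CONFIRM = auto()   # "yes, continue" - e.g. start listing results
    CANCEL = auto()    # "no, go back" - e.g. refine the search instead

def speak(text):
    """Stand-in for text-to-speech output; here we just print."""
    print(f"[audio] {text}")

def announce_search(results, next_button):
    """Speak a terse summary first, then act on a single button press."""
    speak(f"{len(results)} results")
    if next_button is Button.CONFIRM:
        for i, r in enumerate(results, 1):
            speak(f"{i}. {r}")
    else:
        speak("Refine search")

announce_search(["Shell, Aldgate", "BP, Whitechapel"], Button.CONFIRM)
```

The point of the convention is that the cheap information (the count) always comes first, so the user can interrupt before the expensive information (the full list) is spoken.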

What do you think? Would you use it?


Miles Goodhew said...

Yup, I'd use it!

I've often thought of things like this in the past, e.g. for navigating audio-player menus using buttons and an earpiece only. (Have you seen some of the dinky little screens they put on "Non Apple" personal audio devices to this day?! - Ick!)

Using recognisable tone responses like the TiVo bleeps and bloops, plus a simple announcement, to step through a tree-structured menu with a 4-way controller seems quite plausible. It might need some way to control/identify an "input grab" or "modal shift" if an application requires all the buttons.
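That tree-menu idea could be sketched as follows - up/down move between siblings, right descends, left ascends, and every press is confirmed with a spoken label. The class names and the `print`-based `speak` stand-in are hypothetical, just to show the shape of the interaction:

```python
class MenuNode:
    def __init__(self, label, children=None):
        self.label = label
        self.children = children or []

def speak(text):
    """Stand-in for a tone plus spoken announcement."""
    print(f"[audio] {text}")

class MenuNavigator:
    """UP/DOWN move between siblings, RIGHT descends, LEFT ascends."""
    def __init__(self, root):
        self.path = [(root, 0)]  # stack of (node, selected child index)

    def _current(self):
        node, idx = self.path[-1]
        return node.children[idx]

    def press(self, button):
        node, idx = self.path[-1]
        if button == "DOWN":
            self.path[-1] = (node, (idx + 1) % len(node.children))
        elif button == "UP":
            self.path[-1] = (node, (idx - 1) % len(node.children))
        elif button == "RIGHT" and self._current().children:
            self.path.append((self._current(), 0))
        elif button == "LEFT" and len(self.path) > 1:
            self.path.pop()
        speak(self._current().label)  # announce where we are now

root = MenuNode("root", [
    MenuNode("Music", [MenuNode("Albums"), MenuNode("Playlists")]),
    MenuNode("Navigation"),
])
nav = MenuNavigator(root)
nav.press("RIGHT")  # descend into Music; announces "Albums"
nav.press("DOWN")   # announces "Playlists"
```

Because the whole state is a path through the tree, a "where am I?" button could simply read the path back, which covers the modal-shift worry: the user can always re-orient by ear.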

Another couple of applications I can think of are the Ingress game (which, AFAICT, requires the screen to be on to play) and a fitness interval-training timer (e.g. for running "eyes-off").

Rob Noble said...

Have you ever used the text editor, Vi? In command mode, there's a kind of "grammar" that combines keypresses to form "sentences". It takes a while to learn it, but the "verbs" and "nouns" become quite intuitive, and it makes the communication from human to machine quite rich. For example, the key sequence, "d3t/" means, "delete from here up until just before the 3rd '/'." and, "d10j" means, "delete this line and 10 below it." Context changes the meaning of the verb, "d", from "delete characters" to "delete lines", making the language more flexible.

The big question is, what set of verbs would be best for the buttons of a portable input device?
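To illustrate, a toy interpreter for a Vi-like "verb + count + noun" grammar, applied to spoken navigation instead of text editing - the particular verbs and nouns here are invented, just to show how a handful of buttons could compose into many commands:

```python
# Hypothetical verb and noun vocabularies for an eyes-free device.
VERBS = {"read": "speak aloud", "skip": "jump past"}
NOUNS = {"result": "search result", "turn": "upcoming turn"}

def interpret(verb, count, noun):
    """Compose a verb, a count, and a noun into one command."""
    if verb not in VERBS or noun not in NOUNS:
        raise ValueError("unknown verb or noun")
    plural = NOUNS[noun] + ("s" if count != 1 else "")
    return f"{VERBS[verb]} the next {count} {plural}"

print(interpret("read", 1, "turn"))    # -> "speak aloud the next 1 upcoming turn"
print(interpret("skip", 3, "result"))  # -> "jump past the next 3 search results"
```

With, say, 4 verbs, a repeat-count button, and 6 nouns, a user would have dozens of composable commands from ten-ish buttons - the same economy that makes Vi's grammar work.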

Jules May said...

Hey, Rob! Long time no see!

It seems to me that the problem comes about because a device speaks to you in a collection of languages much richer than the language you use to communicate with it. It can talk to you, it can show you maps (and of course, the device could say "Let me show you", just the same way that a human guide could), and it can even sound or buzz warnings if you're about to walk off a cliff.

But what do you do? You prod it. Whether you prod a soft reconfigurable screen or a standardised button-box, it seems to me that mode of interaction is fundamentally limited.

It also seems to me that devices are slowly attaining some facility with the mode of communication which humans find most natural - speaking. I mean, the most basic function of a phone, surely, is that you're going to speak into it. All of the dialogical examples which you used in your post were expressed in verbal terms. Now, you could, I suppose, introduce a translation layer whereby you represent those verbal interactions as yet more prods, but (on the assumption that phones can learn to understand you) why bother?

The answer, of course, is that my assumption is not yet manifest - phones are pretty poor at understanding people (or at least, they're pretty poor at understanding me!) But, if I'm being guided by a person, and they say "Turn left now", and I say "What's the street name?", and they say "Aldgate", then that seems pretty straightforward. If I'm driving and the naggybitch says "Take the third exit at the roundabout" and (assuming the naggybitch isn't sufficiently clever to know that I'm running low) I say "How far to the nearest petrol station?", it doesn't seem too much to expect that it would show me.

The problem, then, is not to introduce a button-box to talk to your phone. The problem is to get the phones to pay better attention to what you tell them. And that's already in-hand.

Rob Noble said...

Hey Jules, it's been too long.

I agree - I want to communicate to my device in a way that is natural to me, but I don't want to do it by talking when it's not obvious who (or what) I'm talking to.

Maybe it would be enough if I simply had to make the mock telephone gesture (thumb and little finger to the ear and mouth - hopefully you know what I mean) to activate it.

Jules May said...

Well, in Star Trek (TNG, to be precise) they tickled their nipples. I think touching the bluetooth in my ear seems like a more natural gesture :-)

As for the "phone me" gesture: been done:

Rob Noble said...

That's clever, but who wears gloves all the time?

No - on balance, I'm sticking with my request for buttons, but I would like the input sequence to be like a language of nouns and verbs, so that I can compose them in complex but intuitive ways.

I think I feel a prototype coming on...