For many decades now, touch and sight have been the senses we use to ‘do’ computing. The dominant interaction paradigm has involved our eyes and hands. Be it with punch cards, keyboards, touchscreens or mice, our hands have been at the heart of how we experience computing.

Hands need an opposable thumb to be really effective, and when it comes to computing, eyes have been the thumb to our four fingers. Since around 1973, the Graphical User Interface, or GUI, has been the defining feature of computing. This is now changing…

The rise of voice

Voice is becoming, or is anticipated to become, a key feature of our everyday computing experiences. Amazon Echo, Google Home and Apple’s Siri all put voice at the centre of the interaction. Voice control has long had a place in the Ubicomp vision, but now it promises to become a reality.

Voice at the heart of everyday computing is going to need some adjustments. We are going to have to become less reliant on the taken-for-granted features of our current computing paradigm. We are going to be flying blind.

My family and I have been living with Amazon Echo for over six months. (I have written before, in Learning to live with Alexa, about how we speak to Alexa and the potential for rudeness to creep into our interactions.) We use the Echo quite a bit. But, just as I observed with other users in research on digital assistants, we use her in limited ways. It may even be that we’re using her for fewer and fewer things. I think I know why that is.

Browsing without menus

I receive a weekly email from Amazon with a whole host of new ‘Skills’ that I can try out on Alexa. Skills are services or integrations: you can have Alexa order you a pizza from Domino’s, get her to update you on the latest polls in the US election or read you a haiku.

A week or so ago the email announced that Alexa now had 1,000 ‘Skills’. That’s a testament to Amazon’s smart developer and integration strategy. Skills make Alexa more useful.
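For readers who have not built one: under the hood a Skill is essentially a small web service, commonly an AWS Lambda function, that receives a JSON description of what the user said and returns text for Alexa to speak. The sketch below is a minimal, illustrative Python handler, not any real published skill; the “HaikuIntent” name and the spoken responses are placeholders I have chosen for the example.

```python
# A minimal sketch of an Alexa custom skill handler (AWS Lambda, Python).
# The "HaikuIntent" name and the response text are illustrative placeholders.

def lambda_handler(event, context):
    """Entry point Alexa calls with a JSON request describing what the user said."""
    request = event["request"]

    if request["type"] == "LaunchRequest":
        # The user opened the skill without asking for anything specific.
        return speak("Welcome. Ask me for a haiku.")

    if request["type"] == "IntentRequest":
        intent = request["intent"]["name"]
        if intent == "HaikuIntent":
            return speak("An old silent pond. A frog jumps into the pond. Splash! Silence again.")

    return speak("Sorry, I did not understand that.")


def speak(text, end_session=True):
    """Wrap plain text in the response envelope Alexa expects."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end_session,
        },
    }
```

The important point for what follows is that each Skill only responds when invoked by name; nothing on the device shows you that it exists.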

But I am beginning to wonder: how do I remember the 1,000 things she can do? Where is the menu? Where are the prompts? Where’s the list I can place near her? In short, I now realise that Alexa is great, and voice control is good, but I still need my eyes.

When I want to listen to music on Apple Music I use my eyes to browse ‘My Music’ or the ‘For You’ menu. My response to what my eyes see is either “Yes”, “No, what else?” or “Not that, but now I have seen that album I think I would like to listen to…”. When I stand in front of Alexa to cue up some music the task is quite different. I have to engage my brain to recall bands, composers, albums or playlists without any visual cues. I’m on my own. I’m paralysed.

Shopping without shelves

Retail is an important aspect of Echo. Alexa is a shop assistant in “The Everything Store”. But here again the issue of vision versus voice becomes something worth considering.

Shopping has always been a deeply visual (and indeed multi-sensory) experience. From department stores and supermarkets to online retail, vision is at the heart of the experience. We shop with our eyes. Standing in front of a supermarket fixture we look for familiar colours, shapes, offers and brand designs.

Standing in front of Alexa to shop for toothpaste or breakfast cereals I lack those visual cues. Do I stick with what I know? “Alexa, order me a box of 24 Weetabix” will work fine. But I’ve no idea what specific toothpaste I’m currently using. I might recall the brand, but was it Whitening, Super Whitening or Stain Remover Max? And in any event I tend to buy on price. What’s on BOGOF (buy one, get one free) right now? I need cues, stickers and other prompts to guide my eyes and choices. Shopping with the mind’s eye is more difficult than shopping with my eyes.

Doing computing with mind and voice but without any obvious visual inputs is going to require some adjustment.

Users will need to make decisions on what to do, or in this case say, without visual cues and prompts to help. Encouraging an increasing range of uses, not a diminishing set, won’t be easy. And if my experience with Echo is anything to go by, this will be a non-trivial challenge for designers.

We think that for brands and retailers shopping by voice is going to throw up an even bigger set of challenges:

How do we help people navigate large ranges and promotions in a non-visual universe?

How do we disrupt habitual brand relationships in voice-based scenarios?

What’s the equivalent of the Planogram in a voice-based retail world?

Does your brand or specific product have a good ‘bar call’ – i.e., is it memorable and easy to say?

Amara’s law states that “we tend to overestimate the effect of a technology in the short run and underestimate the effect in the long run”. We are not going to be doing computing, or shopping for that matter, with voice alone for a very long time, if ever. There are signs that text (plus bots) and voice is a more likely combination. However, voice is on the march, as evidenced by Siri’s arrival on the Mac.

As voice-based computing becomes a reality we will have to engage with how people navigate devices, software, shelves and services with mental maps and only half-remembered menus.

That will force some re-thinking of long-established ways of doing computing, and of how we do the shopping.

 
