The software applications we use every day have user interfaces that handle communication with users. For example, when a user submits a request through an application’s interface, the application captures the request, processes it, and passes the response back to the user interface. This is the general flow of communication in software applications, regardless of the technology behind them.

When it comes to improving user experiences, the right technology can make a winning difference. The most prominent recent example is “Voice First” technology. When users can communicate with machines by voice, the gap between human and machine interaction narrows, and adding voice to a user interface is widely reported to improve the user experience significantly. Voice is also a naturally faster communication method than text. Consider this example:


The average person types 45 to 50 words a minute, but can speak 120 to 150 words a minute.
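To put these figures in perspective, here is a rough calculation in Python. The 300-word message length is an assumption chosen only for illustration, and `minutes_needed` is a hypothetical helper name; the rates are the ranges quoted above.

```python
def minutes_needed(words, words_per_minute):
    """Time in minutes to produce `words` at a steady rate."""
    return words / words_per_minute

MESSAGE_WORDS = 300  # hypothetical message length, chosen for illustration

# Ranges quoted above: typing 45-50 wpm, speaking 120-150 wpm.
typing_slowest = minutes_needed(MESSAGE_WORDS, 45)
typing_fastest = minutes_needed(MESSAGE_WORDS, 50)
speaking_slowest = minutes_needed(MESSAGE_WORDS, 120)
speaking_fastest = minutes_needed(MESSAGE_WORDS, 150)

print(f"Typing:   {typing_fastest:.1f} to {typing_slowest:.1f} minutes")
print(f"Speaking: {speaking_fastest:.1f} to {speaking_slowest:.1f} minutes")
```

Under these assumptions, typing the message takes roughly 6 to 7 minutes, while speaking it takes 2 to 2.5 minutes.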


Speed is not the only reason for the success of voice; it also makes the interaction feel more human.

Traditional screen-based user experiences are made up of visual cues such as text, buttons, hyperlinks and animations, along with some audio elements. Designers create these elements to direct users to the services the software provides and to show them how to use those features.

Today we are realising the limitations that ‘graphics-led’ user interfaces impose on user interactions with software applications. Voice-first interfaces can open up a whole new arena in user experience and digital services.

Let’s take a look at a few popular “Voice-First” products available in the marketplace.

Amazon Echo with Alexa

Amazon Echo is a smart audio device equipped with an intelligent voice-controlled personal assistant called Alexa. Amazon Echo assists users with a number of services, such as playing music, making calls, setting musical alarms and timers, answering questions, controlling smart home devices and many more. All of these services can be activated instantly by asking Alexa a question or giving a voice command.

The user interacts with the Amazon Echo through voice commands. For the user to hear the device, it needs speakers, and for the device to hear the user, it needs a microphone. These speakers and microphones are highly accurate and are designed to cope with variations in the environment.


Google Home with Google Assistant

Similar to Amazon’s Echo, Google Home is a smart speaker developed by Google. It is also equipped with a voice-controlled digital personal assistant, named Google Assistant.

Some of the services provided by Google Home are voice-enabled communication with other connected Google Home devices, smart home control, reminders, hands-free calls, and so on.

Apple HomePod

Apple HomePod is another intelligent voice-controlled audio device, this one from Apple. Like Google Home and Amazon Echo, the HomePod helps people with their day-to-day tasks by responding to voice commands.

Future of machine-human interaction

Amazon, Google and Apple have been trying to move communication between users and computers from the traditional screen-oriented paradigm to a more human-friendly level.

Does this mean that screens will be useless in the future? Can voice be the only interaction method in the future?

That we do not know yet. There is no telling when or how technology will change.


However, the screens we use to interact with websites, mobile apps and device software are still valuable to us. For now, the best course of action is to take whatever value voice-first technology offers over screen-based interaction and use it to improve current screen-based user experiences.

Further analysis of voice-first technology
Human-focused interaction

A dialogue between two people includes not only questions and answers, but also facial expressions and body language, which help make exchanges more meaningful. Certain words used in a conversation can mean different things to different people.

For example, consider the words ‘big’, ‘giant’ and ‘large’. None of these words implies a specific size. The human brain makes sense of them according to the context of the dialogue.

In voice-first design, catering for such scenarios is a challenge. The interface should not restrict users to asking questions with a fixed set of words. It is the designer’s responsibility to anticipate as many scenarios as possible and train the system to assist users. Sampling and categorising synonyms is one solution to this problem. Other natural-language scenarios have to be addressed in voice-first technology in the same way.
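As a minimal sketch of synonym categorisation, the Python below maps several spoken words onto one canonical value. The synonym table, the category names and the function names are all hypothetical, and a production voice assistant would typically rely on a trained language-understanding model rather than a hand-written table.

```python
# Hypothetical synonym table: each canonical size maps to its spoken variants.
SIZE_SYNONYMS = {
    "large": {"large", "big", "giant", "huge"},
    "small": {"small", "little", "tiny"},
}

def build_lookup(categories):
    """Invert the category table into a word -> canonical value lookup."""
    lookup = {}
    for canonical, words in categories.items():
        for word in words:
            lookup[word.lower()] = canonical
    return lookup

def normalise_size(utterance, lookup):
    """Return the canonical size for the first known word, or None."""
    for token in utterance.lower().split():
        word = token.strip(".,!?")  # drop trailing punctuation
        if word in lookup:
            return lookup[word]
    return None

SIZE_LOOKUP = build_lookup(SIZE_SYNONYMS)
print(normalise_size("I want a giant pizza", SIZE_LOOKUP))
```

With a table like this, a user who says ‘big’, ‘giant’ or ‘large’ is understood in the same way, and words the table does not know return `None` so the application can ask a follow-up question.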


Individualised Interaction

In the context of traditional user interfaces, consistency is a key feature. In general, when user interfaces are consistent, it is easy for users to apply their general knowledge and familiarise themselves with the system. This rule is not applicable to voice-first user interactions.

The user interaction component of a voice-first interface needs to be flexible enough to support users in whatever way they want to interact. In this case, user interfaces do not always have to be visually consistent; rather, they can be variable.

Consider a scenario where a user wants to buy a T-shirt through an application. Normal shopping cart applications have a predefined set of actions to follow until checkout. This waterfall-like design has many steps, some of which may have to be repeated several times to arrive at the checkout page, and most of which could be skipped if the user interaction were properly streamlined. This kind of interaction is not particularly user-friendly.

A voice interface has no predefined flow. Instead, it is a set of actions designed to assist the user based on their requirements. The diagram below shows how a voice interaction can be both more efficient and more user-friendly, especially when the application has a set of questions that must be answered before proceeding to the next step.

This means the user can interact with the app using any form of question and any combination of questions. The application’s voice interaction is intelligent enough to carry on when only some of the questions have been answered, rather than requiring all of them.


Fig 1 - Voice interaction
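The flexible flow described above can be sketched in Python for the T-shirt scenario. This is only an illustration under stated assumptions: the slot names, the vocabulary and the functions `extract_slots` and `next_prompt` are hypothetical, and simple keyword matching stands in for real speech and language understanding.

```python
import re

# Hypothetical slots for the T-shirt purchase and the words that fill them.
SLOT_VALUES = {
    "colour": {"red", "blue", "black", "white"},
    "size": {"small", "medium", "large"},
}
REQUIRED_SLOTS = ("colour", "size", "quantity")
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3}

def extract_slots(utterance):
    """Pick out any slot values mentioned in one utterance, in any order."""
    found = {}
    for token in re.findall(r"[a-z0-9]+", utterance.lower()):
        for slot, values in SLOT_VALUES.items():
            if token in values:
                found[slot] = token
        if token.isdigit():
            found["quantity"] = int(token)
        elif token in NUMBER_WORDS:
            found["quantity"] = NUMBER_WORDS[token]
    return found

def next_prompt(order):
    """Ask only for what is still missing; return None when nothing is."""
    for slot in REQUIRED_SLOTS:
        if slot not in order:
            return f"What {slot} would you like?"
    return None

order = {}
for utterance in ["I want two red T-shirts", "Make it a large"]:
    order.update(extract_slots(utterance))
    print(order, "->", next_prompt(order))
```

Here the first utterance supplies the quantity and colour in one go, so the application only has to ask for the size. A screen-based flow would usually walk the user through each of these choices in a fixed order.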


Enhanced user interface with voice interaction

Voice alone is not fully capable of handling the applications we rely on daily.

Beyond audio-only devices, applications from other spheres of user interaction, such as shopping applications, may still require a graphical user interface alongside the voice interface, because shoppers may need to see a product before committing to a purchase.

The diagram below shows how voice interaction can be flexible and support graphical user interfaces (GUI). In such situations, we need a simple GUI to help users interact with the application, while voice can be used independently of it and at the same time.


Fig 2 - Voice supported graphical user experience
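One way to let voice and a GUI work side by side is to have both channels update the same underlying state, which the screen then reflects. The Python below is a minimal sketch of that idea; the `ProductView` class, its fields and the two handler functions are hypothetical names, and `render` merely returns a description of what a real screen would display.

```python
from dataclasses import dataclass, field

@dataclass
class ProductView:
    """Shared state behind the screen; both input channels update it."""
    filters: dict = field(default_factory=dict)
    history: list = field(default_factory=list)

    def apply(self, source, key, value):
        # Update the shared filters and record which channel made the change.
        self.filters[key] = value
        self.history.append((source, key, value))

    def render(self):
        # Stand-in for redrawing the GUI: describe what the screen would show.
        if not self.filters:
            return "Showing all products"
        parts = ", ".join(f"{k}={v}" for k, v in sorted(self.filters.items()))
        return f"Showing products where {parts}"

def on_voice_command(view, key, value):
    """Called when a spoken request has been understood."""
    view.apply("voice", key, value)

def on_gui_selection(view, key, value):
    """Called when the user taps or clicks a control on screen."""
    view.apply("gui", key, value)

view = ProductView()
on_voice_command(view, "colour", "red")
on_gui_selection(view, "size", "large")
print(view.render())
```

Because neither channel owns the state, a shopper can ask for red T-shirts by voice, pick a size on screen, and still see a single consistent product list.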



Takeaway

Going forward, in application development and advanced user interaction, voice-enabled interfaces are likely to make computer-human interaction seamless. Voice is a primary means of communication among humans, and enabling machines and software to communicate with users through voice will lead to more intuitive user interfaces.

We have already witnessed state-of-the-art voice-controlled products such as Amazon’s Echo with Alexa, Google Home and Apple’s HomePod leading the charge in bringing voice-controlled devices into consumers’ homes. Applied properly, voice-first technologies can help take other applications to the next level of human-computer interactivity.

Thank you for reading our latest Mitra Innovation blog post. We hope you found the lessons that we learned from our own startup story interesting, and you will continue to visit us for more articles in the field of computer sciences. To read more about our work please feel free to visit our blog.


Chathuraka Mallawaarachchi

Senior Software Engineer | Mitra Innovation
