Amazon Echo with a Camera

TL;DR

The Problem

One of my many frustrations with the Amazon Echo is having to say “Alexa…” before every request or query. It is frustrating because that is not how humans interact with each other.


Whenever we speak to each other as humans, we often start by saying the name of the person we want to address and carry on the conversation from there. We do not feel the need to repeat the person’s name before each sentence because that would be taxing and redundant. It is usually obvious who you are speaking to because you are looking at the person or standing very close to them [1].


However, this is not the case with Alexa and other conversational AI devices. You have to repeat their name every time you want to interact. This is extremely frustrating, and on multiple occasions I have seen people continue a conversation with Alexa after the first request, only to realise that the name has to be said every time.

•   •   •

The Solution

Add a camera that detects whenever a human is looking at an Amazon Echo.

Adding a camera to the top of an Amazon Echo mimics how humans interact and removes the need to keep saying “Alexa…” when you want to make multiple requests or have a conversation. The camera would recognise when a human is looking at it and light up, ready to receive a request or query. Consider the differences between these two conversations:


Without Camera

- Human: Alexa, who is the founder of Amazon?

- Alexa: Amazon was founded by Jeff Bezos

- Human: Alexa, how old is Jeff Bezos?

- Alexa: Jeff Bezos is 56 years old

- Human: Alexa, where was Jeff Bezos born?

- Alexa: Jeff Bezos was born in Albuquerque, New Mexico, in 1964

With Camera

- Human: Alexa, who is the founder of Amazon?

- Alexa: Amazon was founded by Jeff Bezos

- Human: How old is he?

- Alexa: Jeff Bezos is 56 years old

- Human: Where was he born?

- Alexa: He was born in Albuquerque, New Mexico, in 1964


In the ‘With Camera’ conversation, the human is able to carry on without saying “Alexa” every time.
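To make the idea concrete, here is a minimal sketch of the decision the Echo would have to make for each utterance. It is only an illustration of the concept, not how Alexa actually works: the function names are made up, and `user_is_looking` is assumed to come from some on-device gaze detector that is not shown here.

```python
from typing import Tuple

WAKE_WORD = "alexa"  # assumption: a single, fixed wake word


def strip_wake_word(utterance: str) -> Tuple[bool, str]:
    """Return (had_wake_word, request_text) for a transcribed utterance."""
    words = utterance.strip().split(maxsplit=1)
    if words and words[0].strip(",.!?").lower() == WAKE_WORD:
        # Drop the wake word and keep whatever follows it.
        return True, words[1] if len(words) > 1 else ""
    return False, utterance.strip()


def should_handle(utterance: str, user_is_looking: bool) -> Tuple[bool, str]:
    """Accept a request if it starts with the wake word OR the user is looking at the device.

    `user_is_looking` is a hypothetical flag supplied by the camera.
    """
    had_wake_word, request = strip_wake_word(utterance)
    accepted = (had_wake_word or user_is_looking) and bool(request)
    return accepted, request
```

With this rule, "How old is he?" is accepted while the user is looking at the device and ignored otherwise, whereas anything starting with "Alexa" is accepted either way.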


A photoshopped Camera on an Amazon Echo


Add a sensor to the Amazon Echo

Adding a sensor to the Amazon Echo that detects when a human is nearby can achieve the same effect as the camera. If a human comes close, the Echo can recognise their proximity and light up, ready to receive a request. However, so that the Echo is not confused by other conversations the human may be having with people in the background, the word “Alexa” should always be said first, before any follow-up requests are made.
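The sensor variant can be sketched as a small session tracker: the wake word opens a session, and follow-up requests are accepted without it for as long as the person stays nearby. This is a rough sketch under my own assumptions. The `ProximitySession` class and the `person_nearby` flag (imagined as the output of the proximity sensor) are hypothetical and not part of any Amazon API.

```python
WAKE_WORD = "alexa"  # assumption: a single, fixed wake word


def starts_with_wake_word(utterance: str) -> bool:
    """True if the first word of the utterance is the wake word."""
    words = utterance.strip().split(maxsplit=1)
    return bool(words) and words[0].strip(",.!?").lower() == WAKE_WORD


class ProximitySession:
    """Tracks whether follow-up requests may skip the wake word."""

    def __init__(self) -> None:
        self.active = False  # no session until the wake word is heard

    def on_utterance(self, utterance: str, person_nearby: bool) -> bool:
        """Return True if the device should treat the utterance as a request."""
        if not person_nearby:
            # The person walked away, so any open session ends.
            self.active = False
        if starts_with_wake_word(utterance):
            # The wake word always works; it opens a session only if someone is nearby.
            self.active = person_nearby
            return True
        # Without the wake word, only follow-ups in an open session count.
        return self.active
```

Background chatter before anyone has said “Alexa” is ignored even when someone is standing next to the device, which is the reason for requiring the wake word first.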


An Amazon Echo with a Sensor


Pushback: Privacy Concerns

One of the reasons this solution may not work is privacy concerns. Amazon Echo already faces some backlash over its “always listening” feature, and adding a camera to the device may not help its case. However, I believe these concerns can be tackled by not sending any of the camera data to the cloud for storage, i.e. all information received via the camera stays on the device.

I also do not think Amazon will shy away from putting a camera in the home, as it recently announced a new product called the “Always Home Cam” by Ring.

Amazon could also release the device with just the sensor instead of the camera.


•   •   •