Project 8:  Web Browsing By Voice

Due: Thursday, November 30th  *in class*

In this project, you’ll write a script that enables speech control of the web page on which it is included. It’s a little different this time because we will not be writing an extension: due to permission restrictions, we can’t easily access the microphone (and, thus, speech recognition) from an extension.

You can, however, access speech recognition relatively easily from a web page hosted on a secure server.

Hosting A Page On A Secure Server

We suggest hosting your HITs on Github Pages. The reason is that it’s free, relatively easy, and supports https, which Mechanical Turk requires. It’s not fancy web hosting. You’ll have ~zero control over the backend. But, as we’ll see, almost everything we’ll be doing is on the front-end anyway.

There’s a good walk-through of how to do this on the Github Pages page:

After these commands, I could visit: and see a web site that I created:

Adding files to your Github Pages site is as simple as adding, committing, and pushing using git.

Recognizing Speech

A great thing about modern web browsers is that they include speech recognition services that you can access using Javascript, either directly from a web page or from a browser extension.

Use the following to trigger the speech recognizer when the user presses a key (for instance, the spacebar) on any web page that they visit:

A Guide to the Chrome Speech API

Insert a little bar at the bottom of each page that is loaded, and display the recognized speech to the user.
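
As a starting point, here is a minimal sketch of the two steps above: listening when the spacebar is pressed, and showing the recognized text in a bar at the bottom of the page. It assumes Chrome's `webkitSpeechRecognition` (or the unprefixed `SpeechRecognition`) is available. The element id `voice-status-bar` and the function names `isTriggerKey`, `getStatusBar`, and `listenOnce` are made up for illustration and are not part of any API.

```javascript
// Sketch only: the id "voice-status-bar" and all function names are hypothetical.

// Decide whether a keydown event should start listening. The spacebar is
// ignored while the user is typing in a text field, and key-repeat events
// (from holding the key down) are ignored too.
function isTriggerKey(event) {
  if (event.code !== "Space" || event.repeat) return false;
  const tag = event.target && event.target.tagName;
  return tag !== "INPUT" && tag !== "TEXTAREA";
}

// Create (or reuse) a fixed bar at the bottom of the page for showing text.
function getStatusBar() {
  let bar = document.getElementById("voice-status-bar");
  if (!bar) {
    bar = document.createElement("div");
    bar.id = "voice-status-bar";
    bar.style.cssText =
      "position:fixed;left:0;right:0;bottom:0;padding:8px;" +
      "background:#222;color:#fff;font:14px sans-serif;z-index:99999";
    document.body.appendChild(bar);
  }
  return bar;
}

// Run one recognition pass and hand the transcript to onTranscript.
function listenOnce(onTranscript) {
  const Recognition = window.SpeechRecognition || window.webkitSpeechRecognition;
  const bar = getStatusBar();
  if (!Recognition) {
    bar.textContent = "Speech recognition is not available in this browser.";
    return;
  }
  const recognition = new Recognition();
  recognition.lang = "en-US";
  recognition.interimResults = false;
  recognition.onresult = function (event) {
    const transcript = event.results[0][0].transcript;
    bar.textContent = "Heard: " + transcript;
    onTranscript(transcript);
  };
  recognition.onerror = function (event) {
    bar.textContent = "Recognition error: " + event.error;
  };
  bar.textContent = "Listening...";
  recognition.start();
}

// Wire the spacebar to the recognizer, but only when running in a browser.
if (typeof document !== "undefined") {
  document.addEventListener("keydown", function (event) {
    if (!isTriggerKey(event)) return;
    event.preventDefault(); // keep the spacebar from scrolling the page
    listenOnce(function (transcript) { console.log(transcript); });
  });
}
```

The callback passed to `listenOnce` only logs the transcript here; in your script it is the natural place to hand the text to your command recognizer.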

Recognizing Commands

Your script will recognize three different actions (verbs):

  1. click [phrase describing the thing to click on]
  2. enter [text to enter]
  3. scroll [down/up]

To figure out which command the user said, you’ll use a regular expression.

The code that we wrote in class for regular expressions.
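
For reference, one way to split a transcript into a verb and its argument with a single regular expression is sketched below. This is not necessarily the code from class; the function name `parseCommand` and the shape of the returned object are assumptions made for this example.

```javascript
// Sketch only: parseCommand and its return shape are hypothetical.
// Returns { verb, argument } for a recognized command, or null otherwise.
function parseCommand(transcript) {
  // verb, at least one space, then the rest of the utterance (trimmed)
  const match = /^\s*(click|enter|scroll)\s+(.+?)\s*$/i.exec(transcript);
  if (!match) return null;

  const verb = match[1].toLowerCase();
  const argument = match[2];

  // "scroll" only makes sense with "up" or "down"
  if (verb === "scroll") {
    const direction = argument.toLowerCase();
    if (direction !== "up" && direction !== "down") return null;
    return { verb: verb, argument: direction };
  }
  return { verb: verb, argument: argument };
}
```

The `i` flag matters because the recognizer may capitalize the first word of the transcript, so "Click search" and "click search" should be treated the same.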

Click [phrase describing the thing to click on]

One of the challenges in clicking something is finding the element that was mentioned. Fortunately, only some elements on a page can be clicked, and we can go through and manually find those that match the phrase that was mentioned.

You can start with something like this:
$("input,a").each(function() {
  // is this the element that should be clicked?
});
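
To decide whether an element is "the one", you need some way to compare the spoken phrase against each element's text. The sketch below uses a simple word-overlap score, which is only one possible approach and an assumption of this example, not a required method. It uses plain DOM calls instead of jQuery so it stands alone, and the names `words`, `scoreLabel`, `bestMatch`, `labelFor`, and `clickByPhrase` are made up.

```javascript
// Sketch only: all function names here are hypothetical.

// Split text into lowercase words, dropping punctuation.
function words(text) {
  return (text || "").toLowerCase().match(/[a-z0-9]+/g) || [];
}

// Fraction of the spoken phrase's words that appear in the element's label.
function scoreLabel(phrase, label) {
  const spoken = words(phrase);
  if (spoken.length === 0) return 0;
  const labelWords = new Set(words(label));
  const hits = spoken.filter(function (w) { return labelWords.has(w); }).length;
  return hits / spoken.length;
}

// Pick the candidate with the highest score; null if nothing matches at all.
function bestMatch(phrase, candidates, getLabel) {
  let best = null;
  let bestScore = 0;
  candidates.forEach(function (candidate) {
    const score = scoreLabel(phrase, getLabel(candidate));
    if (score > bestScore) {
      best = candidate;
      bestScore = score;
    }
  });
  return best;
}

// Browser-only: gather the text a user might say to refer to an element.
function labelFor(el) {
  return [el.textContent, el.value, el.placeholder, el.getAttribute("aria-label")]
    .filter(Boolean)
    .join(" ");
}

// Browser-only: click the element that best matches the phrase.
// Returns false when no element shares any word with the phrase.
function clickByPhrase(phrase) {
  const candidates = Array.from(document.querySelectorAll("input, a"));
  const target = bestMatch(phrase, candidates, labelFor);
  if (target) target.click();
  return target !== null;
}
```

Scoring by the fraction of spoken words found tolerates filler such as "click the search button", at the cost of occasionally preferring a short label that happens to share one common word.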


Enter [text to enter]

Similar to the last assignment, keep track of the last item that accepts text that has been clicked, and enter text into that item when requested.
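
A minimal sketch of that bookkeeping follows. It assumes "accepts text" means a `textarea` or a text-like `input`; the list of input types and the names `acceptsText` and `createTextTracker` are choices made for this example, not requirements.

```javascript
// Sketch only: acceptsText and createTextTracker are hypothetical names.

// True if the element looks like something the user can type into.
function acceptsText(el) {
  if (!el || !el.tagName) return false;
  if (el.tagName === "TEXTAREA") return true;
  if (el.tagName !== "INPUT") return false;
  const type = (el.type || "text").toLowerCase();
  return ["text", "search", "email", "url", "tel", "password"].includes(type);
}

// Remember the last text field that was clicked and write text into it later.
function createTextTracker() {
  let last = null;
  return {
    // Call with each clicked element; non-text elements are ignored.
    note: function (el) {
      if (acceptsText(el)) last = el;
    },
    // Put the text into the remembered field; false if there is none yet.
    enter: function (text) {
      if (!last) return false;
      last.value = text;
      return true;
    }
  };
}

// Record clicks when running in a browser.
const tracker = createTextTracker();
if (typeof document !== "undefined") {
  document.addEventListener("click", function (event) {
    tracker.note(event.target);
  });
}
```

Because a click triggered from script also fires a click event, a field selected by a spoken "click" command would be remembered the same way as one clicked with the mouse.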

Scroll [down/up]

Scroll up or down as requested.
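
For example, a sketch that scrolls by most of one screen at a time might look like this. The 80% step size and the names `scrollOffset` and `scrollPage` are arbitrary choices for illustration.

```javascript
// Sketch only: the 0.8 step and the function names are assumptions.

// Pixels to scroll for a direction: positive is down, negative is up,
// and 0 for anything else.
function scrollOffset(direction, viewportHeight) {
  const step = Math.round(viewportHeight * 0.8);
  if (direction === "down") return step;
  if (direction === "up") return -step;
  return 0;
}

// Browser-only: scroll the page by one step in the given direction.
function scrollPage(direction) {
  window.scrollBy({
    top: scrollOffset(direction, window.innerHeight),
    behavior: "smooth"
  });
}
```

Scrolling slightly less than a full screen leaves a few lines of overlap, which helps the user keep their place.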


People will always request things that your system can’t handle, or have their valid commands garbled by the speech recognizer. In that case, give them an error message.
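
One way to organize this is to have a single function that either runs the command or returns a message to show the user. The sketch below is an assumption about structure, not a required design: `runCommand` and the `handlers` object (one function per verb, each returning true on success) are hypothetical.

```javascript
// Sketch only: runCommand and the handlers shape are hypothetical.
// handlers = { click(arg), enter(arg), scroll(arg) }, each returning true on success.
// Returns null when the command ran, or an error message to display.
function runCommand(transcript, handlers) {
  const match = /^\s*(click|enter|scroll)\b\s*(.*?)\s*$/i.exec(transcript || "");
  if (!match) {
    return 'Sorry, I did not understand "' + transcript + '". Try click, enter, or scroll.';
  }
  const verb = match[1].toLowerCase();
  const argument = match[2];
  if (argument === "") {
    return "Please say what to " + verb + ".";
  }
  if (!handlers[verb](argument)) {
    return "Could not " + verb + ' "' + argument + '".';
  }
  return null;
}
```

Including the transcript in the message lets the user see whether the recognizer misheard them or whether the command itself was unsupported.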


Main components of our grading will be:

Using the appropriate keyboard commands, does the following happen:

This page and contents are copyright Jeffrey P. Bigham except where noted.