Project 8: Web Browsing By Voice
Due: Thursday, November 30th *in class*
In this project, you’ll write a script that enables speech control of the web page on which it is included. It’s a little different this time because we will not be writing an extension. The reason is that, due to permission restrictions, we can’t easily access the microphone (and, thus, speech recognition) from an extension.
You can, however, access speech recognition relatively easily from a web page hosted on a secure server.
Hosting A Page On A Secure Server
We suggest hosting your HITs on Github Pages. The reason is that it’s free, relatively easy, and supports https, which Mechanical Turk requires. It’s not fancy web hosting. You’ll have ~zero control over the backend. But, as we’ll see, almost everything we’ll be doing is on the front-end anyway.
There’s a good walk-through of how to do this on the Github Pages page:
After these commands, I could visit:
https://jbigham.github.io/ and see a web site that I created:
Adding files to your Github Pages site is as simple as adding, committing, and pushing using git.
Use the following to trigger the speech recognizer when the user presses a button on any web page that they visit (for instance, the spacebar) --
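Here is a minimal sketch of one way to wire this up, assuming the Web Speech API's `SpeechRecognition` constructor (Chrome exposes it as `webkitSpeechRecognition`). The names `attachSpeechTrigger` and `onTranscript` are ours for illustration and are not part of any starter code. Passing in `win` instead of using the global `window` directly just makes the function easier to try out.

```javascript
// Sketch: start speech recognition when the user presses the spacebar.
// `attachSpeechTrigger` and `onTranscript` are illustrative names.
function attachSpeechTrigger(win, onTranscript) {
  // Chrome exposes the constructor with a webkit prefix.
  const Recognition = win.SpeechRecognition || win.webkitSpeechRecognition;
  if (!Recognition) {
    return false; // speech recognition is not available in this browser
  }

  let listening = false;

  win.addEventListener("keydown", (event) => {
    // Ignore other keys, repeat presses while listening, and typing in fields.
    const tag = event.target && event.target.tagName;
    if (event.code !== "Space" || listening || tag === "INPUT" || tag === "TEXTAREA") {
      return;
    }
    event.preventDefault(); // keep the spacebar from scrolling the page

    const recognizer = new Recognition();
    recognizer.lang = "en-US";
    recognizer.interimResults = false;
    // Hand the top transcript of the first result to the caller.
    recognizer.onresult = (e) => onTranscript(e.results[0][0].transcript);
    recognizer.onend = () => { listening = false; };
    listening = true;
    recognizer.start();
  });
  return true;
}

// In a page: log whatever was recognized.
if (typeof window !== "undefined") {
  attachSpeechTrigger(window, (text) => console.log("Heard:", text));
}
```

The browser will ask the user for microphone permission the first time `start()` runs, which is why the page needs to be served over https.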
Insert a little bar at the bottom of each page that is loaded, and display the recognized speech to the user.
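One possible way to build the bar is sketched below. It assumes the script is loaded at the end of the page's body (so `document.body` exists), and the name `createStatusBar`, the element id, and the styling are all illustrative choices.

```javascript
// Sketch: a fixed bar at the bottom of the page that shows recognized speech.
// `createStatusBar` and the element id are illustrative choices.
function createStatusBar(doc) {
  const bar = doc.createElement("div");
  bar.id = "voice-status-bar";
  Object.assign(bar.style, {
    position: "fixed",
    left: "0",
    right: "0",
    bottom: "0",
    padding: "8px 12px",
    background: "#222",
    color: "#fff",
    font: "14px sans-serif",
    zIndex: "2147483647", // stay above page content
  });
  bar.textContent = "Press the spacebar and speak a command.";
  doc.body.appendChild(bar);

  // Return a function that updates the text shown in the bar.
  return (message) => { bar.textContent = message; };
}

if (typeof document !== "undefined") {
  const showStatus = createStatusBar(document);
  // Later, once speech has been recognized:
  //   showStatus("Heard: " + transcript);
}
```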
Your script will recognize three different actions (verbs):
- click [phrase describing the thing to click on]
- enter [text to enter]
- scroll [down/up]
To figure out which command the user said, you’ll use a regular expression.
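As a rough sketch, one regular expression per verb is enough to split a transcript into a verb and its argument. The function name `parseCommand` and the shape of the returned object are assumptions made for this example.

```javascript
// Sketch: map a recognized transcript to one of the three verbs.
// Returns { verb, argument }, or null when nothing matches.
function parseCommand(transcript) {
  const text = transcript.trim();
  let match;
  if ((match = text.match(/^click\s+(.+)$/i))) {
    return { verb: "click", argument: match[1] };
  }
  if ((match = text.match(/^enter\s+(.+)$/i))) {
    return { verb: "enter", argument: match[1] };
  }
  if ((match = text.match(/^scroll\s+(up|down)$/i))) {
    return { verb: "scroll", argument: match[1].toLowerCase() };
  }
  return null;
}

console.log(parseCommand("Click the search button"));
// verb is "click", argument is "the search button"
console.log(parseCommand("scroll Down"));
// verb is "scroll", argument is "down"
console.log(parseCommand("open a new tab"));
// null
```

The `i` flag matters because the recognizer may capitalize the first word of what it hears.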
Click [phrase describing the thing to click on]
One of the challenges in clicking something is finding the element that was mentioned. Fortunately, only some elements on a page can be clicked, and we can go through and manually find those that match the phrase that was mentioned.
You can start with something like this:
// is this the element that should be clicked?
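One way to fill in that check is sketched below: score each clickable element by how many of the spoken words appear in its label, and keep the best one. The selector list and the helpers `labelOf`, `matchScore`, and `findClickTarget` are illustrative, and the word matching is deliberately crude (a plain substring test), so expect to tune it.

```javascript
// Sketch: choose the clickable element whose label best matches the phrase.
// `labelOf`, `matchScore`, and `findClickTarget` are illustrative helpers.
const CLICKABLE_SELECTOR =
  "a, button, input[type=button], input[type=submit], [role=button]";

function labelOf(el) {
  // Visible text first, then common fallbacks for inputs and icon buttons.
  const aria = el.getAttribute ? el.getAttribute("aria-label") : "";
  return (el.textContent || el.value || aria || "").trim().toLowerCase();
}

function matchScore(phrase, label) {
  // Fraction of the spoken words that appear somewhere in the label.
  const words = phrase.toLowerCase().split(/\s+/).filter(Boolean);
  if (words.length === 0 || label === "") return 0;
  const hits = words.filter((w) => label.includes(w)).length;
  return hits / words.length;
}

function findClickTarget(phrase, candidates) {
  let best = null;
  let bestScore = 0;
  for (const el of candidates) {
    // is this the element that should be clicked?
    const score = matchScore(phrase, labelOf(el));
    if (score > bestScore) {
      best = el;
      bestScore = score;
    }
  }
  return best; // null when no element shares a word with the phrase
}
```

In a page, the candidates could come from `document.querySelectorAll(CLICKABLE_SELECTOR)`, and calling `click()` on the returned element performs the click.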
Enter [text to enter]
Similar to the last assignment, keep track of the last item that accepts text that has been clicked, and enter text into that item when requested.
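A minimal sketch of that bookkeeping follows. It assumes "accepts text" means a `textarea` or an `input` with a text-like type; the name `createTextEntry` and the list of types are illustrative, and this sketch replaces the field's contents instead of appending to them.

```javascript
// Sketch: remember the last text field the user clicked, then type into it.
// `createTextEntry` is an illustrative name.
function createTextEntry(doc) {
  let lastField = null;

  function acceptsText(el) {
    if (!el || !el.tagName) return false;
    const tag = el.tagName.toUpperCase();
    if (tag === "TEXTAREA") return true;
    // Treat inputs without a type, or with a text-like type, as text fields.
    const type = (el.type || "text").toLowerCase();
    const textTypes = ["text", "search", "email", "url", "tel", "password"];
    return tag === "INPUT" && textTypes.includes(type);
  }

  // Listen in the capture phase so we see the click before page handlers do.
  doc.addEventListener("click", (event) => {
    if (acceptsText(event.target)) lastField = event.target;
  }, true);

  // Returns true if the text was entered, false if no field was clicked yet.
  return function enterText(text) {
    if (!lastField) return false;
    lastField.value = text;
    return true;
  };
}

if (typeof document !== "undefined") {
  const enterText = createTextEntry(document);
  // Later, for a command such as "enter hello world":
  //   enterText("hello world");
}
```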
Scroll [down/up]
Scroll the page up or down as requested.
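Scrolling can be sketched with the standard `window.scrollBy` call. The function name `scrollPage` and the choice to move 80% of the window height per command are arbitrary assumptions for this example.

```javascript
// Sketch: scroll most of one screen up or down.
// `scrollPage` is an illustrative name; 0.8 is an arbitrary fraction.
function scrollPage(direction, win) {
  const distance = Math.round(win.innerHeight * 0.8);
  if (direction === "down") {
    win.scrollBy(0, distance);
  } else if (direction === "up") {
    win.scrollBy(0, -distance);
  } else {
    return false; // not a direction we understand
  }
  return true;
}

// In a page: scrollPage("down", window);
```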
People will always request things that your system can’t handle, or have their valid commands garbled by the speech recognizer. In that case, give them an error message.
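As a sketch of how the error case might fit in, the function below runs a parsed command if a handler exists for its verb and otherwise reports that it did not understand. Everything here is hypothetical scaffolding: `runCommand`, the `{ verb, argument }` command shape, the `handlers` object, and the `showStatus` callback are names invented for the example, and the message wording is up to you.

```javascript
// Sketch: run a parsed command, or report that it was not understood.
// `handlers` maps verb names to functions that return true on success.
function runCommand(command, handlers, showStatus) {
  const known =
    command && Object.prototype.hasOwnProperty.call(handlers, command.verb);
  if (!known) {
    showStatus("Sorry, I did not understand that.");
    return false;
  }
  if (!handlers[command.verb](command.argument)) {
    showStatus(`Sorry, I could not ${command.verb} "${command.argument}".`);
    return false;
  }
  showStatus(`OK: ${command.verb} ${command.argument}`);
  return true;
}

// Example with a stand-in handler and console output.
const demoHandlers = { scroll: (direction) => direction === "up" || direction === "down" };
runCommand({ verb: "scroll", argument: "down" }, demoHandlers, console.log);
runCommand(null, demoHandlers, console.log);
```

Separating "did not understand" from "understood but could not do it" gives the user a better hint about whether to rephrase or try a different target.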
Main components of our grading will be:
Using the appropriate keyboard commands, does the following happen:
- Upon pressing a button, it allows the user to give a speech command
- If the command is to click, it clicks
- If the command is to enter text, it enters text
- If the command is to scroll, it scrolls
- If the command is something else, it displays a message saying it did not understand