DIY IVR for PVR

The title is a bit of a mouthful, but basically being able to set up recordings on my PVR by calling the my home phone number and just speaking.

This is one of the projects I wanted to play with after setting up my OBi110 and Asterisk PBX.

Setting up systems where you press digits on your phone to navigate menus in Asterisk is pretty simple, but systems that listen to what you say and then interpret that are a little trickier. To make it work you need 3 main parts:

  • A system to record the audio
  • A way to parse the audio and turn it into text
  • Something to extract the interesting bits from the text

Asterisk will record the audio if poked the right way which leaves the last two bits to sort out.

Some of the guys at work do voice recognition projects and pointed me at some of the open source toolkits1, but these normally involve lots of training to get things accurate and reasonably meaty boxes to run the systems on. Since I’m running Asterisk on a Raspberry Pi I was looking for something a little more light weight.

A bit of searching round turned up a project on Github that uses Google’s Voice to Text engine with Asterisk already. This looked spot on for what I needed. Getting hold of a Google Speech API key is a little tricky as it’s not really a public API, but support for it is build into the Chromium web browser so following these instructions helped. The API key is limited to 50 calls a day but that should be more than enough for this project.

Once installed the following flow in the Asterisk dialplan lets you dial extension 200 and it will record any speech until there are 3 seconds of silence then it forwards it on to the Google service, when it returns it puts the text into a dialplan variable called utterance, along with a value between 0 and 1 indicating how confident Google is in what it says in a variable called confidence.

exten => 200,1,Answer()
exten => 200,n,AGI(/opt/asterisk/asterisk-speech-recog/speech-recog.agi,en-GB,3,#,NOBEEP)
exten => 200,n,Verbose(1,The text you just said is: ${utterance})
exten => 200,n,Verbose(1,The probability to be right is: ${confidence})
exten => 200,n,Hangup()

An example output:

The text you just said is: record bbc1 at 9 p.m.
The probability to be right is: 0.82050169

Now I’ve got some text to work with I needed something to make sense of it and turn it into an action that can be followed. A simple way to do this would be with some regular expressions2 but I wanted to try something a little smarter that I could also use to add support for other bits of my home automation system. This means looking at some proper NLP and Text Analytics technology.

Dale has recently written about deploying some simple Text analytics tools to BlueMix which I used as a starting point along with this set of introductory tutorials for IBM Languageware Workbench.

Following the instructions I built a number of databases, the main one of television channel names to make them easy to match and to include multiple versions to help smooth out how the voice to text engine interprets things like “BBC One” which could easily end up being mapped to BBC 1 or BBC1 to name but two. Then a bunch of rules to match times. It’s a little long winded to go into here, if I get time I’ll do a separate post on writing UIMA rules. Once the rules were finished I exported them as a PEAR file and wrote a Java Servlet to feed text into the pipeline and extract the useful bits from the CAS. The source for the servlet can be found on Github here. When I get a bit more time I’ll do a more detailed post on how I actually created these rules.

Now that I had a web end point I could send text to and get it marked up with all the interesting bits I needed a way to forward text to it from within the Asterisk dialplan. I used the earlier Voice to Text example to put together this little bit of perl

#!/usr/bin/env perl


use warnings;
use strict;
use URI::Escape;
use LWP::UserAgent;
use JSON;

my %AGI;
my $ua;
my $url = "http://192.168.122.1:8080/PEARWebApp/Processor";
my $response;
my $temp;

# Store AGI input #
($AGI{arg_1}) = @ARGV;
while (<STDIN>) {
        chomp;
        last if (!length);
        $AGI{$1} = $2 if (/^agi_(w+):s+(.*)$/);
}

$temp = "text=" . uri_escape($AGI{arg_1});

$ua = LWP::UserAgent->new;
$ua->agent("ben");
$response = $ua->post(
	"$url",
	Content_Type => "application/x-www-form-urlencoded",
	Content => "$temp",
);
if (!$response->is_success) {
	print "VERBOSE "some error"n";
	checkresponse();
} else {
	print "SET VARIABLE "action" "$response->content"n";
	checkresponse();
}
exit;

sub checkresponse {
        my $input = <STDIN>;
        my @values;

        chomp $input;
        if ($input =~ /^200/) {
                $input =~ /result=(-?d+)s?(.*)$/;
                if (!length($1)) {
                        warn "action.agi Command failed: $inputn";
                        @values = (-1, -1);
                } else {
                        warn "action.agi Command returned: $inputn" if ($debug);
                        @values = ("$1", "$2");
                }
        } else {
                warn "action.agi Unexpected result: $inputn";
                @values = (-1, -1);
        }
        return @values;
}

The response looks like this which I then used to feed a script that uses the MythTV Services API to query the program guide for what is showing at that time on that channel then to schedule a recording.

{
  "time": "9:00 am",
  "action": "record",
  "channel": "BBC ONE"
}

And I included the script in the dialplan like this:

exten => 200,1,Answer()
exten => 200,n,AGI(/opt/asterisk/asterisk-speech-recog/speech-recog.agi,en-GB,3,#,NOBEEP)
exten => 200,n,Verbose(1,The text you just said is: ${utterance})
exten => 200,n,Verbose(1,The probability to be right is: ${confidence})
exten => 200,n,AGI(/opt/asterisk/uima/action.agi,"${utterance}")
exten => 200,n,AGI(opt/asterisk/mythtv/record.agi,"${action}")
exten => 200,n,Hangup()

I need to add some more code to include some confirmation in cases where the confidence in the extracted text is low and also once the program look up has happened to ensure we are recording the correct show.

Now I have the basics working I plan to add some more actions to control and query other aspects of my home automation system.

1 Kaldi seams to be one of the interesting ones recently.
2 did I really say simple and RegExp in the same sentence?