Copyright (C) 2001, Particle Corp., Alex S.
For a description of what's happening inside the program and how the database is organized, look through the description.html file.
This document gives an overview of what each file does, how to go about modifying Prof.Phreak, and how to get it running. It also mentions some of its known problems.
There are currently two database files (you can add more if you like). One is mostly for direct keyword matches, and the other is mostly for pattern matching. The split is not strict, though; I had to move some rules around to get it to give intelligent responses.
Reason for two databases: the search looks for the maximum-size match, i.e. the rule that matches the largest part of the input. Some of the more general rules, like "what (.*)", match rather long sequences and easily defeated specific rules like "what is the weather". When the user typed something like "what is the weather today", "what (.*)" matched all of the input, while "what is the weather" matched only part of it. So the more direct patterns were put into the first database, which is searched first. Only if no rule there (such as "what is the weather") matches does the search fall through to a general rule like "what (.*)" in the second database.
The file phreak_s.db holds all the simple, direct matches (and is searched first). The file phreak_t.db holds all the big template matches.
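A minimal sketch of the two-database search, written in Python rather than prof.pl's Perl. The in-memory lists stand in for phreak_s.db and phreak_t.db. Their (pattern, reply) layout and the reply strings are assumptions, not the real file format (description.html documents that). Within one database the rule with the largest match wins, and an earlier database always wins over a later one.

```python
import re

# Stand-ins for phreak_s.db (simple, searched first) and phreak_t.db (templates).
# The (pattern, reply) layout and the replies are assumptions for illustration.
SIMPLE_DB = [
    (r"what is the weather", "Hot. I hope it gets cooler soon."),
]
TEMPLATE_DB = [
    (r"what (.*)", "Why do you want to know what {0}?"),
]


def best_match(text, rules):
    """Return (match_length, reply) for the rule covering the most characters, or None."""
    best = None
    for pattern, reply in rules:
        m = re.search(pattern, text, re.IGNORECASE)
        if m is None:
            continue
        size = m.end() - m.start()
        if best is None or size > best[0]:
            # Fill {0}, {1}, ... in the reply with the captured groups.
            best = (size, reply.format(*m.groups()))
    return best


def respond(text, databases=(SIMPLE_DB, TEMPLATE_DB)):
    """Search each database in order; the first one with any match answers."""
    for rules in databases:
        hit = best_match(text, rules)
        if hit is not None:
            return hit[1]
    return "I see."


print(respond("what is the weather today"))  # the specific rule wins
print(respond("what time is it"))            # falls through to the template
```

If both databases were searched as one list, "what (.*)" would cover all 25 characters of "what is the weather today" and beat the 19-character weather rule. The split exists to avoid exactly that.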
Note: the convention has not held for some rules, so the current database is a big mess.
The primary code is in prof.pl, which is also the program used for development and for testing and modifying the database. You just run it, type in some input, and it prints a reply. Every time you give it input, it reloads all the databases. This lets you edit a database and try out a rule without restarting the program. It is *the* code.
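The reload-on-every-input loop can be sketched roughly like this. This is Python, not the actual prof.pl code. The one-rule-per-line "pattern<TAB>reply" file format, the simple first-match lookup, and the names load_rules, answer, and repl are all hypothetical.

```python
import re
from pathlib import Path


def load_rules(path):
    """Parse a hypothetical 'pattern<TAB>reply' file into (pattern, reply) pairs."""
    rules = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        pattern, _, reply = line.partition("\t")
        rules.append((pattern, reply))
    return rules


def answer(text, paths):
    """Re-read every database on each call, then return the first matching reply."""
    for path in paths:
        for pattern, reply in load_rules(path):  # fresh from disk every time
            if re.search(pattern, text, re.IGNORECASE):
                return reply
    return "I see."


def repl(paths=("phreak_s.db", "phreak_t.db")):
    """Read a line, answer it, repeat; edits to the files apply on the next line."""
    while True:
        try:
            line = input("> ")
        except EOFError:
            break
        print(answer(line, paths))
```

Because nothing is cached, changing a reply in the file changes the very next answer. That is convenient during development, but it costs a full re-read per input.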
It is not advisable to call this program from a CGI script, since that would spawn another process (the last thing you want inside a CGI script is an unnecessary extra process). Instead, there is a separate CGI script that includes all the code from prof.pl.
To provide a CGI interface, a MakeHTML file was created: profphreak_ss.cgi. You can use the MakeHTML processor to compile it into a fully functioning CGI script (you might have to modify it to make it look reasonable, though). You can get MakeHTML at http://www.theparticle.com/
Note: ALL the Prof.Phreak code in the CGI script is also in prof.pl. The MakeHTML file is just a wrapper that gives the code a nice web interface (you can easily create your own wrapper).
One idea I've been thinking about is to have a Java applet talk to a primitive CGI script for interaction, instead of using an HTML interface. An applet might be more user-friendly.
Obviously, the strength of this program lies in its database. The bigger and more accurate the database, the better the replies it gives. So reorganizing and extending the database is the primary way to improve the program.
(Tip: keep logs of conversations and read through them. Whenever Prof.Phreak goofed on an answer, add something to the database to correct it. This iterative approach works quite well.)
Modifying the program itself to handle more specific databases. If someone wants to talk about art, talk about art; if someone wants to talk about philosophy, talk about philosophy, etc. You can also keep the 'topic' of the conversation as an indicator of which database to search first. For example, if you last talked about the weather and then get a query like "how is it", you might reply "Hot. I hope it gets cooler soon." or something like that.
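A rough sketch of the topic idea, assuming hypothetical per-topic rule sets (prof.pl has no such split). The topic names, patterns, and replies are made up for illustration. The topic of the last rule that fired decides which rule set gets searched first.

```python
import re

# Hypothetical topic-specific rule sets; the topics, patterns and replies are made up.
TOPIC_DBS = {
    "weather": [
        (r"\bhow is it\b", "Hot. I hope it gets cooler soon."),
        (r"\bweather\b", "It's been hot lately."),
    ],
    "art": [
        (r"\bhow is it\b", "Beautiful, in a quiet way."),
        (r"\b(art|painting)\b", "I prefer the impressionists."),
    ],
}


class Conversation:
    def __init__(self):
        self.topic = None  # topic of the rule that fired last

    def reply(self, text):
        # Search the current topic's rules first, then every other topic.
        others = [t for t in TOPIC_DBS if t != self.topic]
        order = ([self.topic] if self.topic else []) + others
        for topic in order:
            for pattern, answer in TOPIC_DBS[topic]:
                if re.search(pattern, text, re.IGNORECASE):
                    self.topic = topic  # remember what we are talking about
                    return answer
        return "Tell me more."
```

The same vague "how is it" gets a different answer depending on whether the conversation was last about the weather or about painting.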
Storing specific information about the conversation. If someone told you their name, try to put it into some replies (e.g. "What do you think?" ... "Well, John, I think a lot about a lot of things. But I won't tell you what I think."). These things can be very convincing. You could also allow some rules to fire only if you possess certain information about the conversationalist.
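One way to sketch this uses a hypothetical Memory fact store and a rule format where each rule lists the facts it needs before it is allowed to fire. The name-capturing pattern and the replies are illustrative only.

```python
import re


class Memory:
    """Hypothetical per-conversation fact store."""

    def __init__(self):
        self.facts = {}

    def observe(self, text):
        # Pick up the user's name when they state it.
        m = re.search(r"\bmy name is (\w+)", text, re.IGNORECASE)
        if m:
            self.facts["name"] = m.group(1).capitalize()


# Each rule: (pattern, facts it requires, reply template filled from the facts).
RULES = [
    (r"what do you think", ("name",),
     "Well, {name}, I think a lot about a lot of things. But I won't tell you what I think."),
    (r"what do you think", (), "I think a lot about a lot of things."),
    (r"my name is", ("name",), "Nice to meet you, {name}."),
]


def reply(text, memory):
    memory.observe(text)
    for pattern, needs, template in RULES:
        if not re.search(pattern, text, re.IGNORECASE):
            continue
        # Skip rules whose required facts are still unknown.
        if all(key in memory.facts for key in needs):
            return template.format(**memory.facts)
    return "Go on."
```

The personalized rule is listed first, so it takes over as soon as the name is known. Until then, the generic rule below it answers.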
Storing the conversation, and replying not just to the 'current' sentence but to the whole conversation (concentrating most on the last statement).
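A sketch of replying to the whole conversation. It assumes each rule is scored against every past statement, with a weight that halves for each step back in time. The 0.5 decay and the (pattern, reply) rules are arbitrary assumptions. The newest statement counts most, but a subject from a few lines ago can still win when the latest input matches nothing.

```python
import re


def reply(history, rules, decay=0.5):
    """Pick the rule with the highest time-weighted match score over the conversation.

    history: user statements, oldest first.
    rules: hypothetical (pattern, reply) pairs.
    decay: weight multiplier per step back in time (an assumed value).
    """
    best_score, best_reply = 0.0, "Tell me more."
    for pattern, answer in rules:
        score, weight = 0.0, 1.0
        for statement in reversed(history):  # newest statement first, weight 1.0
            if re.search(pattern, statement, re.IGNORECASE):
                score += weight
            weight *= decay
        if score > best_score:  # ties keep the earlier rule
            best_score, best_reply = score, answer
    return best_reply


RULES = [(r"\bcat\b", "Cats are independent."), (r"\bdog\b", "Dogs are loyal.")]
print(reply(["I love my cat", "my cat is old", "what do you think?"], RULES))
```

The last statement mentions neither animal, but the earlier talk about the cat still steers the answer.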
Rewriting the program in another language (Java might be fun...)
Speeding up the database search. It is reasonably fast as it is, but it is extremely wasteful of resources.
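One possible speed-up, sketched under assumptions: keep compiled rules in memory and re-parse a database file only when its modification time changes. This keeps the edit-without-restart convenience. The RuleCache class and the one-rule-per-line "pattern<TAB>reply" format are hypothetical.

```python
import os
import re


class RuleCache:
    """Cache compiled rules per file; re-parse a file only when its mtime changes."""

    def __init__(self):
        self._cache = {}  # path -> (mtime_ns, [(compiled_regex, reply), ...])
        self.loads = 0    # how many times a file was actually parsed

    def rules(self, path):
        mtime = os.stat(path).st_mtime_ns
        cached = self._cache.get(path)
        if cached is not None and cached[0] == mtime:
            return cached[1]  # unchanged file: reuse the compiled rules
        compiled = []
        with open(path, encoding="utf-8") as f:
            for line in f:
                line = line.rstrip("\n")
                if not line.strip() or line.startswith("#"):
                    continue  # skip blank lines and comments
                pattern, _, reply = line.partition("\t")
                compiled.append((re.compile(pattern, re.IGNORECASE), reply))
        self._cache[path] = (mtime, compiled)
        self.loads += 1
        return compiled
```

A stat call per input is far cheaper than re-reading and recompiling every rule. Edits to a file are still picked up on the next input, as long as the edit changes the file's modification time.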
The major problem with Prof.Phreak is the database mess. It leads to a related problem: some rules in the database are unreachable. No matter what the user types, the input is always matched by some other rule first, so the nicer rule never gets hit.
This is a big problem. It needlessly increases the size of the database and wastes all the work that went into creating those unreachable rules in the first place. It also makes the program dumber, since your effective rule database is smaller than it appears.
One way to solve it is to write a test suite that includes every rule along with an input that should trigger it. Then, after every database modification, run the test suite and make sure that every rule in the database gets called (and that no rule obscures another). If one does, fix it so that it doesn't. You might have to get rid of the old rule, or make the new rule a bit more picky so that the old rule still gets called on the desired input, etc.
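A sketch of such a test suite. It assumes a hypothetical rule list where each rule has an id, a database tier (tier 0 is searched first), and a pattern. It also assumes the engine picks the longest match within the first tier that has any match. Each test case names the rule it expects to fire; any other outcome is reported.

```python
import re

# Hypothetical rules: (rule_id, tier, pattern). Tier 0 is searched first.
RULES = [
    ("weather", 0, r"what is the weather"),
    ("what-any", 1, r"what (.*)"),
    ("weather-today", 1, r"what is the weather today"),
]

# One sample input per rule that should make that rule fire.
TEST_SUITE = {
    "weather": "what is the weather",
    "what-any": "what time is it",
    "weather-today": "what is the weather today",
}


def fired_rule(text, rules):
    """Return the id of the rule picked: first tier with a hit, longest match within it."""
    for tier in sorted({t for _, t, _ in rules}):
        best_id, best_size = None, -1
        for rule_id, t, pattern in rules:
            if t != tier:
                continue
            m = re.search(pattern, text, re.IGNORECASE)
            if m and m.end() - m.start() > best_size:
                best_id, best_size = rule_id, m.end() - m.start()
        if best_id is not None:
            return best_id
    return None


def unreachable(rules, suite):
    """List (expected_rule, what_happened) for every failing or missing test case."""
    failures = []
    for rule_id, sample in suite.items():
        got = fired_rule(sample, rules)
        if got != rule_id:
            failures.append((rule_id, got))  # another rule obscures this one
    for rule_id, _, _ in rules:
        if rule_id not in suite:
            failures.append((rule_id, "no test case"))
    return failures


print(unreachable(RULES, TEST_SUITE))
```

Here "weather-today" is unreachable because the tier-0 weather rule always answers first. Moving it into tier 0, where the longest match wins, makes unreachable() return an empty list.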
This program is being released under the GNU General Public License (GPL). Read the gpl.txt file for details.