Copyright (C) 2001, Particle Corp., Alex S.
Note: the current Prof.Phreak is only a proof of concept (POC) for this type of technology and database.
The entire program is a relatively small (but complex) Perl script. The database contains keywords, phrases, and pattern-matching rules that make the whole thing work. It uses Perl regular expressions for easier (and more precise) string matching. A sample rule looks like this:
// simple regular expression match test
([a-z]+) hits ([a-z]+)
Why would $1 hit $2?
Can you prove that $1 hit $2?
I hope $2 didn't get hurt.
Is $1 violent?
Did $2 provoke $1?
Any rule can have multiple regular expression matches (the rule above has only one) and multiple replies (the rule above has five). Replies are used in sequence on the server, so any one client will not necessarily see them in that order.
The rules can also have parameters. In the case above, ([a-z]+) matches any word. So, the rule above can be more or less translated into:
SUBJECT hits OBJECT
Where both SUBJECT and OBJECT are words. We can then place them into replies... like:
Why would SUBJECT hit OBJECT?
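The substitution described above can be sketched in a few lines. This is a Python illustration, not the original Perl code: the `Rule` class, its `respond` method, and the use of `itertools.cycle` to rotate replies are all assumptions made for the example. It also assumes the input has already been lowercased.

```python
import re
from itertools import cycle


class Rule:
    """Hypothetical rule: one or more patterns plus a rotating list of replies."""

    def __init__(self, patterns, replies):
        self.patterns = [re.compile(p) for p in patterns]
        # Replies are handed out in sequence, wrapping around at the end.
        self._replies = cycle(replies)

    def respond(self, text):
        """Return the next reply with $1, $2, ... filled in, or None if no pattern matches."""
        for pattern in self.patterns:
            m = pattern.search(text)
            if m:
                template = next(self._replies)
                # Replace each $N with the text captured by group N.
                return re.sub(
                    r"\$(\d+)",
                    lambda ref: m.group(int(ref.group(1))) or "",
                    template,
                )
        return None


# The sample rule from the text, written as a Rule object.
hits = Rule(
    [r"([a-z]+) hits ([a-z]+)"],
    [
        "Why would $1 hit $2?",
        "Can you prove that $1 hit $2?",
        "I hope $2 didn't get hurt.",
        "Is $1 violent?",
        "Did $2 provoke $1?",
    ],
)
```

With this sketch, calling `hits.respond("bob hits alice")` fills SUBJECT and OBJECT with "bob" and "alice", and each later call moves on to the next reply in the list.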
These matching rules, combined with some hard-coded rules, make for some powerful database capabilities. There is also a second type of rule, similar to the one shown but not identical:
// you are (compliment)
(u|you|(you.*are)|(are.*you)).*(clever|fun|interesting|cool|smart|good)
How could you tell? Yes, I'm $4.
It must be my $4'nes. Everybody tells me that.
Yep. I'm $4.
Well, I admit it. I'm $4.
You got me. I'm definitely $4.
This newer type of rule is more flexible, since it can be applied to specific areas. For example, there is a general rule "you are (.*)", to which Prof.Phreak would reply "How did you know I am $1?" or something similar. That's not very nice when people make fun and say things like "you are a dork", and Prof.Phreak replies with "How did you know I am a dork?" With the more targeted matches (which are searched first), you can get a more precise match, so "you are a dork" is met with "You're the dork'est dork I've ever met!" This gives the impression that Prof.Phreak does not allow himself to be insulted (and understands what "dork" means).
There is another feature to this:
After all the matches have been completed, the program determines which rule matched the most. For example, we wouldn't want to use the rule "you are good" when the input contains "you are good bastard". Taking the maximum match almost guarantees that we get the largest (and usually correct) match. In this case, Prof.Phreak would catch the insult, even though it's disguised by the complimentary phrase "you are good".
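A minimal Python sketch of the maximum-match idea follows. It assumes that "matched the most" means the longest stretch of matched input text; the original script may measure this differently. The `RULES` table, its replies, and the `best_reply` function are hypothetical.

```python
import re

# Hypothetical (pattern, reply) pairs, used only for illustration.
RULES = [
    (r"you are good", "Thank you!"),
    (r"you are good bastard", "There is no need for insults."),
]


def best_reply(text):
    """Try every rule and keep the reply whose pattern matched the longest span."""
    best_len, best = 0, None
    for pattern, reply in RULES:
        m = re.search(pattern, text)
        if m:
            length = m.end() - m.start()
            if length > best_len:
                best_len, best = length, reply
    return best
```

Both patterns match the input "you are good bastard", but the second one covers more of it, so its reply wins. For the input "you are good" only the first pattern matches.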
That's not the end either! The program also pre-processes all input, converting things like "I'm" into "I am" and "don't" into "do not", which makes the database a bit smaller. Also, neither the number of spaces between words nor the letter case makes any difference. For example: "How are you?" is interpreted the same as "Howareyou?" and the same as "How ARE YOU?"
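The pre-processing step might look roughly like the Python sketch below. The `CONTRACTIONS` table and the `preprocess` function are assumptions; the real expansion list is presumably much longer. The sketch lowercases the input and collapses runs of whitespace, but it does not try to split run-together input such as "Howareyou?".

```python
import re

# Hypothetical expansion table with just the two examples from the text.
CONTRACTIONS = {"i'm": "i am", "don't": "do not"}


def preprocess(text):
    """Lowercase the input, expand contractions, and normalize spacing."""
    text = text.lower()
    for short, full in CONTRACTIONS.items():
        # Word boundaries keep the replacement from firing inside longer words.
        text = re.sub(r"\b" + re.escape(short) + r"\b", full, text)
    # split() with no arguments drops any run of whitespace.
    return " ".join(text.split())
```

Because every rule then sees only the expanded, lowercase form, the database needs one pattern for "do not" instead of separate patterns for each spelling.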
Finally, the program post-processes all output, so that words like "me" in the input become "you" in the output, and so on.
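One way to sketch this pronoun swap in Python is shown below. The `SWAPS` table and the `reflect` function are hypothetical, and the sketch assumes the swap is applied to lowercase text echoed back from the input. All replacements happen in a single pass, so a word that has just been swapped is not swapped back.

```python
import re

# Hypothetical swap table; the real script likely covers more words.
SWAPS = {
    "me": "you",
    "my": "your",
    "mine": "yours",
    "you": "me",
    "your": "my",
    "yours": "mine",
}

# One alternation over all keys, anchored on word boundaries.
_WORD = re.compile(r"\b(" + "|".join(map(re.escape, SWAPS)) + r")\b")


def reflect(text):
    """Swap first-person and second-person words in one pass."""
    return _WORD.sub(lambda m: SWAPS[m.group(1)], text)
```

For example, a captured fragment such as "my dog bit me" would be echoed back as "your dog bit you".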
I would like to give some credit to the thousands of people who wrote such programs and let other people play with them. Before writing Prof.Phreak I roamed the web through a lot of these types of programs, and obviously got many ideas from them.
The Prof.Phreak database is riddled with chunks from the databases of many other such programs. I've modified it enough to give it a distinct personality, but if you really look, I'm sure you can find replies which you've already seen elsewhere. The databases I used the most are:
Bob (Bob Over Bob) by Luka
ElizaApplet (Cheezy Eliza) by Shoel Perelman
The others are just my paranoia...