Tuesday, February 19, 2019

Chapter 7. PARSER: How Now Brown Cow (Part 1) - Data Structures

  • Arguments: None
  • Result: TRUE if command is valid, FALSE if command is not valid

7.1 Introduction

Probably the most intriguing feature of Infocom games has always been the parser. The minimal documentation available online covers the features of PARSER but never the mechanism behind it. Since Zork 1, the general approach to parsing remained relatively unchanged across successive Infocom games. Improvements were made, but they mainly expanded the syntaxes that the game would understand. Many games have special commands or ways to interact with the PLAYER that required modifications to PARSER. Finally, AMFV, the first EZIP game, added new parser commands such as OOPS.

7.2 Characters vs. Tokens

The ZIP language's READ routine takes a sequence of characters and stores it in the input buffer (INBUF). The first byte in that buffer holds the maximum number of characters that can be typed, and the typed characters follow it. The input ends with a zero byte and does not include the terminating character, such as a carriage return. READ then matches the words in the input buffer against the game's vocabulary and creates a separate buffer of values corresponding to the matched words (called tokenizing). Words are separated by spaces or by designated separator characters stored in the vocabulary.

For each word, a 4-byte block is built from three pieces of information. First, the Z-string of the word is created and matched against the words in the vocabulary. All ZIP 1 to 3 games were limited to the first six characters of a word; ZIP 4 and 5 could use up to nine characters. If a match is found, the address of that word in the vocabulary table is saved in the first two bytes of the block; if no match is found, 0 is used. The third byte of the block is the length of the token, and the last byte is its offset from the start of the input buffer. The token buffer (LEXV) starts with two bytes: the first is the maximum number of tokens allowed, and the second is the actual number of tokens in the buffer. The rest of the token buffer consists of 4-byte token blocks, one for each word in the input.
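The tokenizing steps above can be sketched in Python. This is not Infocom's or any interpreter's code; it assumes the version 3 dictionary format (six Z-characters packed into four bytes), encodes only the letters a-z (the alphabet shifts needed for digits and punctuation are left out), treats comma and period as the separator characters, and uses a plain dictionary from encoded words to hypothetical vocabulary addresses in place of the real Vocabulary table.

```python
from typing import Dict, List

PAD = 5  # Z-character used to pad words shorter than six characters


def encode_v3(word: str) -> bytes:
    """Pack the first six letters of `word` into a 4-byte V3 Z-string."""
    zchars = []
    for ch in word.lower()[:6]:
        if not "a" <= ch <= "z":
            raise ValueError(f"this sketch only encodes a-z, got {ch!r}")
        zchars.append(ord(ch) - ord("a") + 6)  # alphabet A0: a=6 ... z=31
    zchars += [PAD] * (6 - len(zchars))
    w1 = (zchars[0] << 10) | (zchars[1] << 5) | zchars[2]
    w2 = 0x8000 | (zchars[3] << 10) | (zchars[4] << 5) | zchars[5]  # bit 15 marks the end
    return w1.to_bytes(2, "big") + w2.to_bytes(2, "big")


def lookup_key(word: str) -> bytes:
    # Words made of a-z are matched by Z-string; separator tokens such as ","
    # are keyed by their raw byte instead (a simplification in this sketch).
    if word and all("a" <= c <= "z" for c in word):
        return encode_v3(word)
    return word.encode("ascii", "replace")


def tokenize(text: str, vocab: Dict[bytes, int], separators: str = ",.",
             max_tokens: int = 20) -> bytes:
    """Build a LEXV-style buffer: max count, actual count, then 4-byte blocks."""
    text = text.lower()
    blocks: List[bytes] = []
    i = 0
    while i < len(text) and len(blocks) < max_tokens:
        if text[i] == " ":
            i += 1
            continue
        end = i + 1
        if text[i] not in separators:  # a separator is a one-character token
            while end < len(text) and text[end] != " " and text[end] not in separators:
                end += 1
        word = text[i:end]
        addr = vocab.get(lookup_key(word), 0)  # 0 means "not in vocabulary"
        # Position is i + 1 because the typed text starts at byte 1 of INBUF.
        blocks.append(addr.to_bytes(2, "big") + bytes([len(word), i + 1]))
        i = end
    return bytes([max_tokens, len(blocks)]) + b"".join(blocks)
```

Using the two Vocabulary addresses from the WT? examples later in this chapter, `tokenize("Inflate raft, search", {encode_v3("inflate"): 0x4386, encode_v3("search"): 0x4967})` produces four token blocks. "inflate" matches because only "inflat" is compared, while "raft" and the comma get address 0 because they are not in this toy vocabulary.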

7.3 PARSER Variables and Grammatical Structure Definitions

According to “The Parser’s Role” section of “Learning ZIL”, PARSER takes the input and tries to identify the action number for PRSA and the object numbers for PRSO (parser direct object) and PRSI (parser indirect object). This is not quite correct. PARSER does set PRSA to the action referred to by the verb in the input, but it does not set PRSO or PRSI. The only exception is when PRSA is GO; then PRSO is set to the exit direction. Instead, the routine fills two tables (P-PRSO and P-PRSI) with all the direct and indirect objects requested in the command.
The basic grammar structure for a command is:

verb + prep + noun clause + prep + noun clause + end-of-command +
verb + prep + noun clause + prep + noun clause + end-of-command ...
Only the verb is required; all other parts are optional. A noun clause is a noun or a set of nouns connected by conjunctions (AND or commas). These nouns can be modified by adjectives, quantifiers (ALL, A, or ONE), or other special tokens that are not prepositions (OF, BUT, or EXCEPT). The entire noun clause is referred to as a direct or indirect object clause. Prepositions are not included in the noun clause.
For example:

DROP THE YELLOW BALL AND CROWBAR
INSERT A DOLLAR INTO THE RED SLOT 
TAKE ALL EXCEPT THE CANDLES
THE YELLOW BALL AND CROWBAR, A DOLLAR, THE RED SLOT, and ALL EXCEPT THE CANDLES are the noun clauses.
Individual commands can be connected with end-of-command tokens (THEN, AND, commas, or periods) that indicate where a command stops. So:

DROP THE YELLOW BALL AND TAKE THE CROWBAR
DROP THE YELLOW BALL. TAKE THE CROWBAR

are equivalent to

DROP THE YELLOW BALL THEN TAKE THE CROWBAR
Commas can separate multiple objects in a single noun clause or indicate the start of a new command. So:

DROP THE BALL, A RAKE AND SHOVEL
DROP THE BALL, TAKE THE SHOVEL

are processed differently depending on the word type after the comma. More on this in the details below.
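A simplified sketch of how end-of-command tokens can split the input, written over plain strings rather than LEXV blocks. The verb set is a hypothetical stand-in for a WT? verb check, and applying the same "is the next word a verb?" test to AND is an assumption drawn from the examples above, not a description of Infocom's actual code.

```python
from typing import List, Optional

# Hypothetical stand-ins for WT? checks; a real game consults the Vocabulary.
VERBS = {"drop", "take", "insert"}
END_TOKENS = {"then", "."}    # always end the current command
MAYBE_END = {"and", ","}      # end it only when a verb follows


def split_commands(words: List[str]) -> List[List[str]]:
    """Split a tokenized input into commands at end-of-command tokens."""
    commands: List[List[str]] = [[]]
    for i, word in enumerate(words):
        nxt: Optional[str] = words[i + 1] if i + 1 < len(words) else None
        if word in END_TOKENS or (word in MAYBE_END and nxt in VERBS):
            commands.append([])  # start a new command; drop the separator
        else:
            commands[-1].append(word)
    return [c for c in commands if c]
```

With this rule, `split_commands("drop the ball , take the shovel".split())` yields two commands, while "drop the ball , a rake and shovel" stays a single command because the words after the comma and AND are not verbs.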

7.4 PARSER Table (ITBL)

The main goal of PARSER is to extract the action (PRSA) and valid objects in the direct and indirect object clauses from the given command. To assist other routines in extracting this information, PARSER will store specific information related to the verb, prepositions, and location of noun clauses in a 10 word table, ITBL:
Word 0  Verb Number (VERB)
Word 1  Verb Table Address (VERBN)
Word 2  Prep Number (PREP1)
Word 3  Addr of Prep (PREP1N)
Word 4  Prep Number (PREP2)
Word 5  Addr of Prep (PREP2N)
Word 6  Start Addr of Direct Clause (NC1)
Word 7  End Addr of Direct Clause (NC1L)
Word 8  Start Addr of Indirect Clause (NC2)
Word 9  End Addr of Indirect Clause (NC2L)

The verb number is a unique value shared by verbs with similar meanings in the vocabulary. It is not the same as the action number. A verb has the same verb number no matter the context of its use, but it can have different action numbers. For example, LOOK has the verb number $E9 in all syntaxes, but the action number is $2D for LOOK FOR, $3F for LOOK IN, and $D0 for LOOK, corresponding to different types of actions. The verb table contains the same information about the verb as the token buffer: the verb's address (in the Vocabulary), its length, and its location in the input buffer (INBUF). The start and end addresses of a clause refer to locations in the token buffer (LEXV). Note that the end address actually points to the token AFTER the last token included in the clause.
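To make the end-address convention concrete, here is a hypothetical Python model of ITBL and of reading a noun clause back out of LEXV. It assumes each clause address points at a 4-byte token block and that the blocks start right after LEXV's two header bytes, so a token's index is (address - LEXV - 2) / 4. The LEXV base address and word list are invented for the example.

```python
from enum import IntEnum
from typing import List


class Itbl(IntEnum):
    """Word offsets in ITBL, following the table above."""
    VERB = 0
    VERBN = 1
    PREP1 = 2
    PREP1N = 3
    PREP2 = 4
    PREP2N = 5
    NC1 = 6
    NC1L = 7
    NC2 = 8
    NC2L = 9


LEXV_HEADER = 2  # max-token byte + token-count byte
TOKEN_SIZE = 4   # address (2 bytes), length, position


def token_index(lexv: int, addr: int) -> int:
    """Convert an address inside LEXV to a 0-based token number (assumed layout)."""
    return (addr - lexv - LEXV_HEADER) // TOKEN_SIZE


def clause_words(itbl: List[int], start: Itbl, end: Itbl,
                 lexv: int, words: List[str]) -> List[str]:
    """Return the words of a noun clause; the end address is exclusive."""
    return words[token_index(lexv, itbl[start]):token_index(lexv, itbl[end])]
```

For INSERT A DOLLAR INTO THE RED SLOT with a made-up LEXV at $1000, NC1/NC1L = $1006/$100E and NC2/NC2L = $1012/$101E select A DOLLAR and THE RED SLOT. Because NC1L already points at the INTO token, the clause can be sliced directly with no "- 1" adjustment.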

7.5 Checking Word Types with WT?

  • Arguments: Address, word type to match, word type to return (optional)
  • Result: ID value (or TRUE) if the word type matches, FALSE if no match
WT? is one of the most important routines in Infocom games. It checks whether the Vocabulary entry at the given address has the given word type. This is the primary word type as described in Section 2.6. If the primary word type does not match the given word type, WT? returns FALSE. If there is a match, the result depends on the third argument. If no third argument is given, the routine returns TRUE. If the third argument matches the secondary word type (as described in Section 2.6), the secondary ID is returned. Otherwise, the primary ID is returned, regardless of whether the third argument is a valid word type for the primary ID.
Primary word type (one flag per bit, bits 7-2):
  Bit 7  $80  Noun
  Bit 6  $40  Verb
  Bit 5  $20  Adjective
  Bit 4  $10  Direction
  Bit 3  $08  Preposition
  Bit 2  $04  Special
Secondary word type (value of bits 1-0):
  $00 = Noun
  $01 = Verb
  $02 = Adjective
  $03 = Direction
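The bit layout in the table above can be decoded mechanically. This short sketch only restates the table; the lowercase labels are names chosen for the example.

```python
from typing import List, Tuple

# Labels taken from the table above; bit masks for the primary word types.
PRIMARY_BITS = [(0x80, "noun"), (0x40, "verb"), (0x20, "adjective"),
                (0x10, "direction"), (0x08, "preposition"), (0x04, "special")]
# Bits 1-0 hold a single secondary word type.
SECONDARY = {0x00: "noun", 0x01: "verb", 0x02: "adjective", 0x03: "direction"}


def decode_word_type(flags: int) -> Tuple[List[str], str]:
    """Split a Vocabulary word-type byte into primary types and the secondary type."""
    primary = [name for bit, name in PRIMARY_BITS if flags & bit]
    return primary, SECONDARY[flags & 0x03]
```

`decode_word_type(0x62)` gives `(['verb', 'adjective'], 'adjective')`, matching the "inflat" entry, and `decode_word_type(0x41)` gives `(['verb'], 'verb')` for "search".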
For example, given this Vocabulary entry:

$4386:3A 6B C4 D9 62 B5 B4

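The first 4 bytes are the Z-string for “inflat”. The word type byte, $62, indicates that it is a verb and an adjective, with adjective ($02) as the secondary word type. The last two bytes are the secondary ID ($B5) and the primary ID ($B4). So,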

CALL WT?($4386, $40 or $20) will return TRUE.
CALL WT?($4386, $40 or $20, $02) will return the secondary ID of $B5.
CALL WT?($4386, $40 or $20, $00 or $01 or $03) will return the primary ID of $B4.
CALL WT?($4386, $10) will return FALSE as there is no match.
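The WT? behavior described above can be modeled in a few lines of Python. This is a sketch, not Infocom's routine: memory is a dictionary holding only the two 7-byte Vocabulary entries quoted in this section, byte 4 is the word type, byte 5 the secondary ID and byte 6 the primary ID, and the word-type test requires every bit of the given mask to be set.

```python
from typing import Dict, Optional, Union

# Hypothetical memory: the two 7-byte Vocabulary entries quoted in this section.
VOCAB: Dict[int, bytes] = {
    0x4386: bytes([0x3A, 0x6B, 0xC4, 0xD9, 0x62, 0xB5, 0xB4]),  # "inflat"
    0x4967: bytes([0x61, 0x46, 0xDD, 0x0D, 0x41, 0xE0, 0x00]),  # "search"
}
WORD_TYPE, SECONDARY_ID, PRIMARY_ID = 4, 5, 6  # byte offsets in an entry


def wt(vocab: Dict[int, bytes], addr: int, match: int,
       want: Optional[int] = None) -> Union[bool, int]:
    """Sketch of WT?: test the primary word type, then pick an ID byte."""
    entry = vocab[addr]
    flags = entry[WORD_TYPE]
    if (flags & match) != match:  # primary word type does not match
        return False
    if want is None:              # no third argument: just report the match
        return True
    if (flags & 0x03) == want:    # third argument matches the secondary type
        return entry[SECONDARY_ID]
    return entry[PRIMARY_ID]      # otherwise fall back to the primary ID
```

Calls such as `wt(VOCAB, 0x4386, 0x40, 0x02)` reproduce the four examples above, returning $B5, $B4, TRUE, or FALSE as listed. The same function returns $E0 for the "search" entry when called as `wt(VOCAB, 0x4967, 0x40, 0x01)`.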
Interestingly, Infocom games do not store the ID value for a direction, verb, or adjective in the primary ID slot when the token has only one word type. For example, the Vocabulary entry for “search” is:

$4967:61 46 DD 0D 41 E0 00

with the word type $41 (primary type is verb, secondary type is verb). The primary ID value, however, is $00. To get the verb number ($E0), you have to access the secondary ID by using $01 (verb) as the third argument:

        CALL WT?($4967, $40, $01)

Since a third argument is needed anyway to get an ID value, the designers simply stored it as the secondary ID value.

Almost every game uses the same WT? routine. Some games did not even use a separate routine but simply hard-coded the check where needed. LGOP added an extra check for the noun word type, $80: if it matched, the routine exited immediately (no value is returned for nouns) without checking secondary word types. Sherlock uses the newer compressed Vocabulary entry format. Since secondary word type checks were mainly used for prepositions, its WT? only supports prepositions when a secondary word type argument is given. It does this by searching the Preposition table; if a match is found, the preposition value is calculated from the token's position in that table. Internal Infocom notes mention a special WT? for The Lurking Horror in which three ID values were stored for each token, but no evidence of this routine can be found in the three known releases of the game.
