Basic Principles

For the language to be useful, it needs a suitable method for specifying linguistic features. In English, many linguistic distinctions are expressed by separate words. For example, "a dog", "a few dogs", "those dogs", "that dog", "all dogs" and "any dog" each specify a different sub-group of the class of all dogs. In other languages, however, these distinctions may be made by modifying the word for "dog". I'll discuss how the word can be modified later; for now, all I care about is how to represent the information that makes each meaning unique. First, what are the distinctions between those phrases?

So what can we deduce from this?  Not too much, but enough to notice that there are two features (number and specificity) which have different values for each of the phrases:

Feature Structures

The number and specificity features above are examples of a more general representation known as a feature structure.  A feature structure is a general purpose data structure which identifies and groups together individual features, each of which associates a name with one or more values. Because of the generality of feature structures, they can be used to represent many different kinds of linguistic information.

In the representation chosen, features are strongly typed and fall into two basic classes: simple features, which take typed values, and complex features, which consist of a set of sub-features. If a particular feature appears to need both a value and a set of sub-features, a simple sub-feature must be defined to hold the value.


<consonant +voiced>
<consonant -voiced>
<consonant <voiced true>>
<word <stem "dog"> <number paucal> <specificity generic>>
<word <stem "fish"> <number singular>>
<word <stem "fifth"> +ordinal <arabic_value 5>>
<word <stem "five"> +cardinal <arabic_value 5>>
<agreement <number {singular, dual, plural}> <person {first, second}> <gender animate>>

In the examples above, true is a reserved word; reserved words are highlighted in blue throughout these pages. Notice the use of <consonant +voiced> and <consonant <voiced true>> above. These specify exactly the same feature structure: +voiced and -voiced are simply shorthand for setting a boolean-valued feature to true or false.
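A minimal Python sketch of how the bracket notation above might be read into nested data. The (name, value) tuple and dict representation, and the helpers parse, parse_structure, parse_value and parse_atom, are assumptions made for illustration; they are not part of Lingual. The sketch treats +x as <x true> and, by assumption, -x as <x false>, so both spellings of a voiced consonant produce the same result. Negative numbers and restricted sets such as -{...} are not handled.

```python
import re

# Quoted strings, single punctuation marks, and words optionally prefixed by + or -.
TOKEN = re.compile(r'"[^"]*"|[<>{},]|[+-]?\w+')


def parse_atom(tok):
    """Turn one value token into a Python value (mapping assumed for illustration)."""
    if tok.startswith('"'):
        return tok[1:-1]            # string value
    if tok in ("true", "false"):
        return tok == "true"        # boolean reserved words
    if tok.isdigit():
        return int(tok)             # int value
    return tok                      # enum symbol such as paucal


def parse_value(tokens, pos):
    """Parse a single value or a {a, b, ...} set of values."""
    if tokens[pos] != "{":
        return parse_atom(tokens[pos]), pos + 1
    values = set()
    pos += 1
    while tokens[pos] != "}":
        if tokens[pos] != ",":
            values.add(parse_atom(tokens[pos]))
        pos += 1
    return frozenset(values), pos + 1


def parse_structure(tokens, pos):
    """Parse <name ...> starting at tokens[pos]; return ((name, value), next_pos)."""
    if tokens[pos] != "<":
        raise SyntaxError(f"expected '<' but found {tokens[pos]!r}")
    name, pos = tokens[pos + 1], pos + 2
    first = tokens[pos]
    if first not in ("<", ">") and first[0] not in "+-":
        # Simple feature: one value (or value set) followed by '>'.
        value, pos = parse_value(tokens, pos)
        if tokens[pos] != ">":
            raise SyntaxError(f"expected '>' after value of {name}")
        return (name, value), pos + 1
    # Complex feature: nested <...> sub-features and +x / -x shorthands.
    subs = {}
    while tokens[pos] != ">":
        tok = tokens[pos]
        if tok[0] in "+-":
            subs[tok[1:]] = tok[0] == "+"   # +x means <x true>, -x means <x false>
            pos += 1
        else:
            (sub_name, sub_value), pos = parse_structure(tokens, pos)
            subs[sub_name] = sub_value
    return (name, subs), pos + 1


def parse(text):
    """Parse one complete feature structure written in the notation above."""
    tokens = TOKEN.findall(text)
    result, pos = parse_structure(tokens, 0)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected trailing tokens {tokens[pos:]}")
    return result


print(parse("<consonant +voiced>") == parse("<consonant <voiced true>>"))  # True
```

A complex feature becomes a dict keyed by sub-feature name, so the order in which sub-features are written does not affect equality.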

The language is strongly typed, and features must be declared before they are used. By default, there are no restrictions on which sub-features a complex feature may contain.

Head element


Feature Declarations

A few example feature declarations are shown below:

feature word;
feature noun_phrase, verb_phrase;
bool voiced;
int arabic_value;
double weight is 0.045;
enum person { first, second, third };
enum number { 1, 2, 3 };
string stem;
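Because the language is strongly typed, each simple value can be checked against its declaration. The sketch below assumes a hypothetical DECLARATIONS table and accepts function that mirror a few of the declarations above; it checks types only and does not model what "is 0.045" means for weight.

```python
# Hypothetical declaration table mirroring some of the example declarations:
# feature name -> (declared type, enum values or None).
DECLARATIONS = {
    "noun_phrase": ("feature", None),
    "voiced": ("bool", None),
    "arabic_value": ("int", None),
    "weight": ("double", None),
    "person": ("enum", frozenset({"first", "second", "third"})),
    "stem": ("string", None),
}


def accepts(name, value):
    """Return True if value fits the declared simple type of feature `name`."""
    if name not in DECLARATIONS:
        raise NameError(f"feature {name!r} used before it was declared")
    kind, enum_values = DECLARATIONS[name]
    if kind == "feature":
        raise TypeError(f"{name!r} is a complex feature and takes sub-features")
    if kind == "bool":
        return isinstance(value, bool)
    if kind == "int":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, int) and not isinstance(value, bool)
    if kind == "double":
        return isinstance(value, float)
    if kind == "string":
        return isinstance(value, str)
    return value in enum_values     # enum: only the declared symbols


print(accepts("person", "third"), accepts("person", "fourth"))  # True False
```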

To restrict the allowable sub-features of a complex feature the following syntax is used:

feature word constraint {+stem, +agreement, person, case, -weight};

In a constraint, a "+" before a feature name means the feature is mandatory: the constrained feature must include an instance of it. Similarly, a "-" means the feature is forbidden: the constrained feature cannot include an instance of it. Feature names without a "+" or "-" are optional: the constrained feature may or may not include an instance of them. By default, a newly declared feature has no constraint and may include any sub-feature except itself. Once a constraint is applied, only the features listed in it are allowed.
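The rules above can be sketched as a single check. Here a constraint is assumed to be a list of strings such as "+stem" or "-weight" (or None when no constraint was declared), and check_constraint is a hypothetical helper that returns a list of violations rather than raising on the first one.

```python
def check_constraint(feature, constraint, present):
    """Check the sub-feature names in `present` against a constraint.

    feature:    name of the complex feature being checked
    constraint: None when no constraint was declared, otherwise a list of
                entries such as "+stem" (mandatory), "-weight" (forbidden)
                or "case" (optional)
    Returns a list of violation messages; an empty list means valid.
    """
    errors = []
    if feature in present:
        errors.append(f"{feature} cannot include itself")
    if constraint is None:
        return errors               # unconstrained: anything else is allowed
    mandatory, forbidden, optional = set(), set(), set()
    for entry in constraint:
        if entry.startswith("+"):
            mandatory.add(entry[1:])
        elif entry.startswith("-"):
            forbidden.add(entry[1:])
        else:
            optional.add(entry)
    for name in sorted(mandatory - present):
        errors.append(f"missing mandatory sub-feature {name}")
    for name in sorted(present & forbidden):
        errors.append(f"forbidden sub-feature {name}")
    # Anything not listed at all is also disallowed once a constraint exists.
    for name in sorted(present - mandatory - optional - forbidden):
        errors.append(f"unlisted sub-feature {name}")
    return errors


word_constraint = ["+stem", "+agreement", "person", "case", "-weight"]
print(check_constraint("word", word_constraint, {"stem", "case"}))
# ['missing mandatory sub-feature agreement']
```

Collecting every violation, instead of stopping at the first, makes it easier to report all problems with a feature structure at once.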

At first glance, "-" constraints may seem unnecessary: if only the listed features are allowed, then an unlisted feature is already forbidden. However, a feature may be declared as a subclass of another feature and need different constraints from its parent. For example, noun and verb could be treated as subclasses of a part_of_speech feature. If part_of_speech included a "tense" feature, tense would usually be forbidden for nouns but mandatory for verbs. The syntax for declaring a feature as a subclass of another is shown below:

feature part_of_speech constraint {stem, agreement, tense};
feature noun constraint {-tense} : part_of_speech;
feature verb constraint {+tense} : part_of_speech;
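One plausible reading of these declarations, sketched in Python: a subclass starts from its parent's constraint, and each of its own entries overrides the parent's entry for the same sub-feature. That override rule is an assumption, as are the names effective_constraint and DECLS. It shows why "-tense" is needed: without it, noun would inherit tense as an optional sub-feature.

```python
def effective_constraint(name, decls):
    """Merge a feature's constraint with those of its ancestors.

    decls maps a feature name to (constraint entries, parent name or None).
    Returns {sub_feature: mode}, where mode is "+", "-" or "" (optional).
    Assumption: an entry in the subclass overrides the parent's entry for
    the same sub-feature; everything else is inherited unchanged.
    """
    entries, parent = decls[name]
    merged = effective_constraint(parent, decls) if parent else {}
    for entry in entries:
        if entry[0] in "+-":
            merged[entry[1:]] = entry[0]
        else:
            merged[entry] = ""
    return merged


DECLS = {
    "part_of_speech": (["stem", "agreement", "tense"], None),
    "noun": (["-tense"], "part_of_speech"),
    "verb": (["+tense"], "part_of_speech"),
}
print(effective_constraint("noun", DECLS))
# {'stem': '', 'agreement': '', 'tense': '-'}
```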

Instances of a feature structure can also be declared.  These are useful when a particular feature structure needs to be repeated many times.

height #close_mid is <height -open -raised +mid +close>;
vowel #letter_e is <vowel #close_mid #front #rounded>;

The first declaration makes "#close_mid" shorthand for "<height -open -raised +mid +close>". The second uses #close_mid, together with #front and #rounded, to define "#letter_e" as a vowel.
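As a sketch, named instances can be treated like macros that are expanded before any other processing. Here a structure is assumed to be a (name, items) pair whose items are either (name, value) pairs or "#name" references; the NAMED table and the expand function are hypothetical. Only #close_mid is defined, because the contents of #front and #rounded are not given here.

```python
# Hypothetical table of named instances; only #close_mid is taken from the text.
NAMED = {
    "#close_mid": ("height", [("open", False), ("raised", False),
                              ("mid", True), ("close", True)]),
}


def expand(structure, named=NAMED):
    """Return a copy of `structure` with every #name reference expanded."""
    name, items = structure
    if not isinstance(items, list):
        return structure            # simple feature: nothing to expand
    out = []
    for item in items:
        if isinstance(item, str) and item.startswith("#"):
            if item not in named:
                raise KeyError(f"undeclared instance {item}")
            out.append(expand(named[item], named))  # instances may nest
        else:
            out.append(expand(item, named))
    return (name, out)


print(expand(("vowel", ["#close_mid"])))
```

Expanding recursively lets one named instance refer to another, as #letter_e refers to #close_mid.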

Simple Features

Lingual provides five types of simple feature: bool, int, double, enum and string. Bool and enum features are restricted to a finite set of values; int, double and string features have an unbounded number of possible values. Simple features may be either fully specified or partially specified.

A feature is fully specified when exactly one value is associated with it:

bool: <voiced true>
enum: <number singular>
int: <arabic_value 7>
string: <stem "dog">

A feature is partially specified when it is not associated with exactly one value. A partially specified feature may be unbound, constrained or restricted.

Unbound.  A feature that has not been bound to a value yet.

bool: <voiced _>
enum: <number _>

Constrained. A feature whose value must be one of a given set of values in a particular context.

enum: <number {singular, plural}>
int: <arabic_value {1, 2, 3}>
string: <stem {"dog", "fish"}>

Restricted. A feature for which certain values are excluded in a particular context.

int: <arabic_value -{0, 1, 2, 3}>
enum: <number -{paucal, dual}>
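The four kinds of value specification can be modelled, as a sketch, by two sets: the values allowed (or no limit) and the values excluded. The ValueSpec class and the fully, unbound, constrained and restricted helpers are hypothetical names chosen here. The sketch only answers whether a specification admits a given value; it does not model how two specifications combine.

```python
from dataclasses import dataclass
from typing import Any, FrozenSet, Optional


@dataclass(frozen=True)
class ValueSpec:
    """Hypothetical model of a simple feature's value specification."""
    allowed: Optional[FrozenSet[Any]] = None   # None: no positive restriction
    excluded: FrozenSet[Any] = frozenset()     # values ruled out in this context

    def admits(self, value):
        """True if `value` is compatible with this specification."""
        if self.allowed is not None and value not in self.allowed:
            return False
        return value not in self.excluded

    def is_fully_specified(self):
        """True when exactly one value remains possible."""
        return self.allowed is not None and len(self.allowed - self.excluded) == 1


def fully(value):                 # e.g. <number singular>
    return ValueSpec(allowed=frozenset({value}))


def unbound():                    # e.g. <number _>
    return ValueSpec()


def constrained(*values):         # e.g. <number {singular, plural}>
    return ValueSpec(allowed=frozenset(values))


def restricted(*values):          # e.g. <number -{paucal, dual}>
    return ValueSpec(excluded=frozenset(values))


print(restricted(0, 1, 2, 3).admits(7))   # True
```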


Complex Features

Copyright © 2005 Tony Jebson