Linguistic attributes (lemma, pos and features) for TEI

Author: Eduard Drenth. Date: 2023-01

Schema corpora_linguistics: Elements

<join>

<join>
Module linking
Attributes att.linguistic (@pos) att.features (@islemma, @abbr, @poss, @reflex, @prefix, @prontype, @case, @tense, @voice, @number, @person, @verbtype, @verbform, @polite, @numtype, @degree, @mood, @gender, @hyph, @prodrop, @clitic, @inflection, @suffix, @valency, @convertedfrom, @predicate, @construction)
Contained by
May contain Empty element

<m>

<m>
Module analysis
Attributes att.linguistic (@pos) att.features (@islemma, @abbr, @poss, @reflex, @prefix, @prontype, @case, @tense, @voice, @number, @person, @verbtype, @verbform, @polite, @numtype, @degree, @mood, @gender, @hyph, @prodrop, @clitic, @inflection, @suffix, @valency, @convertedfrom, @predicate, @construction)
Contained by
May contain Empty element

<TEI>

<TEI>
Attributes
linguisticsversion
Status Required
Datatype teidata.enumerated
Legal values are:
2
1
Contained by
May contain Empty element
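
The requirement above can be checked mechanically. The following is a minimal sketch, not part of the schema: it tests that a root <TEI> element carries @linguisticsversion with one of the two legal values. The namespace URI is the standard TEI one; the function name check_linguistics_version and the sample documents are illustrative assumptions, and the attribute is assumed to be un-namespaced.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
LEGAL_VERSIONS = {"1", "2"}  # legal values listed for @linguisticsversion


def check_linguistics_version(xml_text: str) -> bool:
    """Return True if the root is <TEI> with a legal @linguisticsversion."""
    root = ET.fromstring(xml_text)
    if root.tag != f"{{{TEI_NS}}}TEI":
        return False
    # Assumption: the attribute carries no namespace prefix.
    # A missing attribute yields None, which is not a legal value.
    return root.get("linguisticsversion") in LEGAL_VERSIONS


# Hypothetical sample documents, reduced to the root element.
ok = f'<TEI xmlns="{TEI_NS}" linguisticsversion="2"/>'
missing = f'<TEI xmlns="{TEI_NS}"/>'

print(check_linguistics_version(ok))       # True
print(check_linguistics_version(missing))  # False
```

Because the attribute is required, a missing value is treated the same way as an illegal one.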

<w>

<w>
Module analysis
Attributes att.features (@islemma, @abbr, @poss, @reflex, @prefix, @prontype, @case, @tense, @voice, @number, @person, @verbtype, @verbform, @polite, @numtype, @degree, @mood, @gender, @hyph, @prodrop, @clitic, @inflection, @suffix, @valency, @convertedfrom, @predicate, @construction)
Contained by
May contain Empty element
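
To summarize the element sections above, this small sketch records which attribute classes each element is listed with and derives the members of each class. It reflects only what this documentation lists; the names ELEMENT_CLASSES and elements_in_class are illustrative assumptions.

```python
# Attribute classes per element, as listed in the element sections above.
ELEMENT_CLASSES = {
    "join": ("att.linguistic", "att.features"),
    "m": ("att.linguistic", "att.features"),
    "w": ("att.features",),
}


def elements_in_class(att_class: str) -> list[str]:
    """Return the elements listed as carrying the given attribute class."""
    return sorted(
        name for name, classes in ELEMENT_CLASSES.items() if att_class in classes
    )


print(elements_in_class("att.linguistic"))  # ['join', 'm']
print(elements_in_class("att.features"))    # ['join', 'm', 'w']
```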

Schema corpora_linguistics: Attribute classes

att.features

att.features Developed by the Fryske Akademy, July 2019. Content adheres to https://universaldependencies.org.
Module analysis
Members join m w
Attributes
islemma https://universaldependencies.org/u/overview/morphology.html Boolean feature. Is this a base form?
Status Optional
Datatype teidata.enumerated
Legal values are:
yes
abbr https://universaldependencies.org/u/feat/Abbr.html Boolean feature. Is this an abbreviation?
Status Optional
Datatype teidata.enumerated
Legal values are:
yes
poss https://universaldependencies.org/u/feat/Poss.html Boolean feature. Is this word possessive?
Status Optional
Datatype teidata.enumerated
Legal values are:
yes
reflex https://universaldependencies.org/u/feat/Reflex.html Boolean feature, typically of pronouns or determiners. It tells whether the word is reflexive, i.e. refers to the subject of its clause.
Status Optional
Datatype teidata.enumerated
Legal values are:
yes
prefix https://universaldependencies.org/u/feat/Prefix.html Boolean feature. Is this a prefix word in a compound that usually cannot stand on its own?
Status Optional
Datatype teidata.enumerated
Legal values are:
yes
prontype https://universaldependencies.org/u/feat/PronType.html This feature typically applies to pronouns, pronominal adjectives (determiners), pronominal numerals (quantifiers) and pronominal adverbs.
Status Optional
Datatype teidata.enumerated
Legal values are:
prs
personal pronoun or determiner
rcp
reciprocal pronoun
art
Article is a special case of determiner that bears the feature of definiteness
int
interrogative pronoun, determiner, numeral or adverb
rel
relative pronoun, determiner, numeral or adverb
ind
indefinite pronoun, determiner, numeral or adverb
emp
Emphatic pro-adjectives (determiners) emphasize the nominal they depend on.
exc
exclamative determiner
dem
Demonstrative pronouns are often parallel to interrogatives.
case https://universaldependencies.org/u/feat/Case.html Case is usually an inflectional feature of nouns.
Status Optional
Datatype teidata.enumerated
Legal values are:
nom
nominative
acc
accusative
dat
dative
gen
genitive
ins
instrumental / instructive
par
partitive
tense https://universaldependencies.org/u/feat/Tense.html Tense is typically a feature of verbs.
Status Optional
Datatype teidata.enumerated
Legal values are:
past
past tense
pres
present tense
fut
future tense
voice https://universaldependencies.org/u/feat/Voice.html Voice is typically a feature of verbs.
Status Optional
Datatype teidata.enumerated
Legal values are:
act
The subject of the verb is the doer of the action (agent).
pass
The subject of the verb is affected by the action (patient).
number https://universaldependencies.org/u/feat/Number.html Number is usually an inflectional feature of nouns.
Status Optional
Datatype teidata.enumerated
Legal values are:
sing
A singular noun denotes one person, animal or thing.
plur
A plural noun denotes several persons, animals or things.
ptan
Plurale tantum, some nouns appear only in the plural form even though they denote one thing.
coll
Collective or mass or singulare tantum applies to words that use grammatical singular to describe sets of objects.
person https://universaldependencies.org/u/feat/Person.html Person is typically a feature of personal and possessive pronouns / determiners, and of verbs.
Status Optional
Datatype teidata.enumerated
Legal values are:
1
The first person refers just to the speaker / author and, in the plural, to the speaker plus one or more additional persons.
2
The second person refers to the addressee(s).
3
The third person refers to one or more persons that are neither speakers nor addressees.
verbtype https://universaldependencies.org/u/feat/VerbType.html Finer distinctions on top of the verb and aux part-of-speech tags.
Status Optional
Datatype teidata.enumerated
Legal values are:
mod
Verbs that take the infinitive of another verb as an argument and add various modes of possibility, necessity etc.
tense
Verb used to create periphrastic verb forms (tenses, passives etc.).
verbform https://universaldependencies.org/u/feat/VerbForm.html form of verb or deverbative.
Status Optional
Datatype teidata.enumerated
Legal values are:
inf
Infinitive is the citation form of verbs in many languages.
part
Participle is a non-finite verb form that shares properties of verbs and adjectives.
ger
Gerund is a non-finite verb form that shares properties of verbs and nouns.
conv
The converb, also called adverbial participle or transgressive, is a non-finite verb form that shares properties of verbs and adverbs.
polite https://universaldependencies.org/u/feat/Polite.html Various languages have various means to express politeness or respect.
Status Optional
Datatype teidata.enumerated
Legal values are:
infm
usually meant for communication with family members and close friends.
form
usually meant for communication with strangers and people of higher social status.
numtype https://universaldependencies.org/u/feat/NumType.html numeral type.
Status Optional
Datatype teidata.enumerated
Legal values are:
ord
ordinal number (first, second, ...)
card
cardinal number (one, two, many, ...)
degree https://universaldependencies.org/u/feat/Degree.html Degree of comparison is typically an inflectional feature of some adjectives and adverbs.
Status Optional
Datatype teidata.enumerated
Legal values are:
cmp
comparative, second degree
sup
superlative, third degree
dim
Added to features in universaldependencies. Diminutive.
mood https://universaldependencies.org/u/feat/Mood.html Mood is a feature that expresses modality and subclassifies finite verb forms.
Status Optional
Datatype teidata.enumerated
Legal values are:
imp
The speaker uses imperative to order or ask the addressee to do the action of the verb.
sub
The subjunctive mood is used under certain circumstances in subordinate clauses, typically for actions that are subjective or otherwise uncertain.
ind
A verb in indicative merely states that something happens, has happened or will happen.
gender https://universaldependencies.org/u/feat/Gender.html gender.
Status Optional
Datatype teidata.enumerated
Legal values are:
masc
masculine gender
fem
feminine gender
neut
neuter gender
com
Some languages do not distinguish masculine/feminine but they do distinguish neuter vs. non-neuter. The non-neuter is called common gender.
hyph https://universaldependencies.org/u/feat/all.html#hyph-hyphenated-compound-or-part-of-it Is this part of a hyphenated compound? Depending on tokenization, the compound may be one token or be split into several tokens; then the tokens need tags.
Status Optional
Datatype teidata.enumerated
Legal values are:
yes
prodrop Added for Frisian to MISC in universaldependencies. Pronoun drop: omission of pronouns because they can be inferred.
Status Optional
Datatype teidata.enumerated
Legal values are:
yes
clitic Added for Frisian to features in universaldependencies. Most personal pronouns have a clitic form, which is the result of either vowel deletion, vowel reduction, monophthongization or schwa deletion, while there are also cases of suppletion.
Status Optional
Datatype teidata.enumerated
Legal values are:
yes
inflection Not in universaldependencies. The modification of a word to express different grammatical categories such as tense, case, voice, aspect, person.
Status Optional
Datatype teidata.enumerated
Legal values are:
infl
Not in universaldependencies. inflected
uninf
Not in universaldependencies. uninflected
suffix Not in universaldependencies. Boolean feature. Is this a suffix word in a compound that usually cannot stand on its own?
Status Optional
Datatype teidata.enumerated
Legal values are:
yes
valency Not in universaldependencies. Verb valency or valence is the number of arguments controlled by a verbal predicate.
Status Optional
Datatype teidata.enumerated
Legal values are:
1
An intransitive verb takes one argument (no object)
2
A monotransitive verb takes two arguments (one of which is an object)
3
A ditransitive verb takes three arguments (including a direct and an indirect object)
convertedfrom Not in universaldependencies. Words belonging to one part of speech category used as another category.
Status Optional
Datatype teidata.enumerated
Legal values are:
adj
Not in universaldependencies. adjective used as another category
adv
Not in universaldependencies. adverb used as another category
ver
Not in universaldependencies. verb used as another category
num
Not in universaldependencies. numeral used as another category
pro
Not in universaldependencies. pronoun used as another category
part
Not in universaldependencies. participle (verbform part) used as another category
predicate Not in universaldependencies. Predicate.
Status Optional
Datatype teidata.enumerated
Legal values are:
yes
Not in universaldependencies. statement about the subject
construction Not in universaldependencies. Construction.
Status Optional
Datatype teidata.enumerated
Legal values are:
attr
Not in universaldependencies. attributive
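
The legal values listed for att.features lend themselves to a simple lookup table. The sketch below is one possible way to check a token against them, not a replacement for schema validation: the table FEATURES copies the values from the list above, while the function name feature_errors and the sample tokens are illustrative assumptions. Attributes that are not part of att.features are ignored.

```python
import xml.etree.ElementTree as ET

YES = {"yes"}  # Boolean features only allow "yes"; absence means "no"

# Legal values per att.features attribute, copied from the list above.
FEATURES = {
    "islemma": YES, "abbr": YES, "poss": YES, "reflex": YES, "prefix": YES,
    "prontype": {"prs", "rcp", "art", "int", "rel", "ind", "emp", "exc", "dem"},
    "case": {"nom", "acc", "dat", "gen", "ins", "par"},
    "tense": {"past", "pres", "fut"},
    "voice": {"act", "pass"},
    "number": {"sing", "plur", "ptan", "coll"},
    "person": {"1", "2", "3"},
    "verbtype": {"mod", "tense"},
    "verbform": {"inf", "part", "ger", "conv"},
    "polite": {"infm", "form"},
    "numtype": {"ord", "card"},
    "degree": {"cmp", "sup", "dim"},
    "mood": {"imp", "sub", "ind"},
    "gender": {"masc", "fem", "neut", "com"},
    "hyph": YES, "prodrop": YES, "clitic": YES,
    "inflection": {"infl", "uninf"},
    "suffix": YES,
    "valency": {"1", "2", "3"},
    "convertedfrom": {"adj", "adv", "ver", "num", "pro", "part"},
    "predicate": YES,
    "construction": {"attr"},
}


def feature_errors(element: ET.Element) -> list[str]:
    """List att.features attributes on the element whose value is not legal."""
    errors = []
    for name, value in element.attrib.items():
        legal = FEATURES.get(name)
        if legal is not None and value not in legal:
            errors.append(f"{name}={value!r}")
    return sorted(errors)


# Hypothetical sample tokens; the word forms are illustrative only.
good = ET.fromstring('<w tense="past" number="sing" person="3">walked</w>')
bad = ET.fromstring('<w number="dual" abbr="no">x</w>')

print(feature_errors(good))  # []
print(feature_errors(bad))   # ["abbr='no'", "number='dual'"]
```

Note that a Boolean feature such as @abbr cannot be set to "no": the only legal value is "yes", so a negative value is expressed by omitting the attribute.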

att.linguistic

att.linguistic 
Members join m
Attributes
pos https://universaldependencies.org/u/pos/index.html These tags mark the core part-of-speech categories.
Status Optional
Datatype teidata.enumerated
Legal values are:
adj
Adjectives are words that typically modify nouns and specify their properties or attributes.
adp
Adposition is a cover term for prepositions and postpositions.
adv
Adverbs are words that typically modify verbs for such categories as time, place, direction or manner.
aux
An auxiliary is a function word that accompanies the lexical verb of a verb phrase and expresses grammatical distinctions not carried by the lexical verb, such as person, number, tense, mood, aspect, voice or evidentiality.
cconj
A coordinating conjunction is a word that links words or larger constituents without syntactically subordinating one to the other and expresses a semantic relationship between them.
det
Determiners are words that modify nouns or noun phrases and express the reference of the noun phrase in context.
intj
An interjection is a word that is used most often as an exclamation or part of an exclamation.
noun
Nouns are a part of speech typically denoting a person, place, thing, animal or idea.
num
A numeral is a word, functioning most typically as a determiner, adjective or pronoun, that expresses a number and a relation to the number, such as quantity, sequence, frequency or fraction.
part
Particles are function words that must be associated with another word or phrase to impart meaning and that do not satisfy definitions of other universal parts of speech.
pron
Pronouns are words that substitute for nouns or noun phrases, whose meaning is recoverable from the linguistic or extralinguistic context.
propn
A proper noun is a noun (or nominal content word) that is the name (or part of the name) of a specific individual, place, or object.
punct
Punctuation marks are non-alphabetical characters and character groups used in many languages to delimit linguistic units in printed text.
sconj
A subordinating conjunction is a conjunction that links constructions by making one of them a constituent of the other.
sym
A symbol is a word-like entity that differs from ordinary words by form, function, or both.
verb
A verb is a member of the syntactic class of words that typically signal events and actions.
x
The tag X is used for words that for some reason cannot be assigned a real part-of-speech category.
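
The @pos values above are lowercase, whereas Universal Dependencies writes its part-of-speech tags in uppercase (NOUN, VERB and so on). The following sketch shows one way to map a UD tag to the value expected here and attach it to an <m> element. The function name pos_from_upos is an illustrative assumption and not part of the schema.

```python
import xml.etree.ElementTree as ET

# Legal @pos values, copied from the list above.
POS_VALUES = {
    "adj", "adp", "adv", "aux", "cconj", "det", "intj", "noun", "num",
    "part", "pron", "propn", "punct", "sconj", "sym", "verb", "x",
}


def pos_from_upos(upos: str) -> str:
    """Map an uppercase UD tag such as 'NOUN' to the lowercase @pos value."""
    value = upos.lower()
    if value not in POS_VALUES:
        raise ValueError(f"not a legal @pos value: {upos!r}")
    return value


# Build an <m> element carrying @pos; the tag comes from a hypothetical UD source.
m = ET.Element("m", {"pos": pos_from_upos("NOUN")})
print(m.get("pos"))  # noun
```

Raising an error on unknown tags, rather than falling back to "x", keeps typos in the source annotation visible; "x" stays reserved for words that genuinely cannot be assigned a category.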
Eduard Drenth. Date: 2023-01