Fryske Akademy

Linguistic attributes (lemma, pos and features) for TEI

Author: Eduard Drenth
Date: 2020-12-13

Schema corpora_linguistics: Elements

<join>

<join>
Module linking
Attributes att.linguistic (@pos), att.features (@islemma, @abbr, @poss, @reflex, @prefix, @suffix, @prontype, @case, @tense, @voice, @number, @person, @verbtype, @verbform, @polite, @numtype, @degree, @mood, @gender, @pronoun, @diminutive, @inflection, @valency, @construction, @convertedfrom, @predicate)
Contained by
May contain Empty element

<m>

<m>
Module analysis
Attributes att.linguistic (@pos)
Contained by
May contain Empty element

<w>

<w>
Module analysis
Attributes att.features (@islemma, @abbr, @poss, @reflex, @prefix, @suffix, @prontype, @case, @tense, @voice, @number, @person, @verbtype, @verbform, @polite, @numtype, @degree, @mood, @gender, @pronoun, @diminutive, @inflection, @valency, @construction, @convertedfrom, @predicate)
Contained by
May contain Empty element
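
As an illustration of how the elements above might carry their attributes, the following minimal Python sketch parses a small hand-written fragment and lists the attributes found on each <w> and <m>. The fragment itself, the enclosing <s> element, the use of the TEI namespace, and the assumption that the attributes appear without a namespace prefix are illustrative choices and are not taken from the schema.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical fragment: <w> carries att.features attributes, <m> carries @pos.
# The elements are left empty, matching "May contain Empty element" above.
SAMPLE = f"""
<s xmlns="{TEI_NS}">
  <w number="plur" gender="com"/>
  <w tense="past" person="third" number="sing"/>
  <m pos="noun"/>
</s>
"""

def read_attributes(xml_text):
    """Return (local element name, attribute dict) for each <w> and <m>."""
    root = ET.fromstring(xml_text)
    rows = []
    for elem in root.iter():
        local = elem.tag.split("}")[-1]  # drop the "{namespace}" part of the tag
        if local in ("w", "m"):
            rows.append((local, dict(elem.attrib)))
    return rows

for name, attrs in read_attributes(SAMPLE):
    print(name, attrs)
```

If a real customization places these attributes in their own namespace, the attribute keys returned by ElementTree would carry a "{namespace}" prefix and would need the same stripping as the element names.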

Schema corpora_linguistics: Attribute classes

att.features

att.features Developed by the Fryske Akademy, July 2019. Contents adhere to http://universaldependencies.org.
Module analysis
Members join w
Attributes
islemma https://universaldependencies.org/u/overview/morphology.html Boolean feature. Is this a base form?
Status Optional
Datatype teidata.enumerated
Legal values are:
yes
abbr http://universaldependencies.org/u/feat/Abbr.html Boolean feature. Is this an abbreviation?
Status Optional
Datatype teidata.enumerated
Legal values are:
yes
poss http://universaldependencies.org/u/feat/Poss.html Boolean feature. Is this word possessive?
Status Optional
Datatype teidata.enumerated
Legal values are:
yes
reflex http://universaldependencies.org/u/feat/Reflex.html Boolean feature, typically of pronouns or determiners. It tells whether the word is reflexive, i.e. refers to the subject of its clause.
Status Optional
Datatype teidata.enumerated
Legal values are:
yes
prefix https://universaldependencies.org/u/feat/Prefix.html Boolean feature. Is this a prefix word in a compound that usually cannot stand on its own?
Status Optional
Datatype teidata.enumerated
Legal values are:
yes
suffix Not in universaldependencies. Boolean feature. Is this a suffix word in a compound that usually cannot stand on its own?
Status Optional
Datatype teidata.enumerated
Legal values are:
yes
prontype http://universaldependencies.org/u/feat/PronType.html This feature typically applies to pronouns, pronominal adjectives (determiners), pronominal numerals (quantifiers) and pronominal adverbs.
Status Optional
Datatype teidata.enumerated
Legal values are:
prs
personal pronoun or determiner
rcp
reciprocal pronoun
art
Article is a special case of determiner that bears the feature of definiteness
int
interrogative pronoun, determiner, numeral or adverb
rel
relative pronoun, determiner, numeral or adverb
ind
indefinite pronoun, determiner, numeral or adverb
emp
Emphatic pro-adjectives (determiners) emphasize the nominal they depend on.
exc
exclamative determiner
dem
Demonstrative pronouns are often parallel to interrogatives.
case http://universaldependencies.org/u/feat/Case.html Case is usually an inflectional feature of nouns.
Status Optional
Datatype teidata.enumerated
Legal values are:
nom
nominative
acc
accusative
dat
dative
gen
genitive
ins
instrumental / instructive
par
partitive
tense http://universaldependencies.org/u/feat/Tense.html Tense is typically a feature of verbs.
Status Optional
Datatype teidata.enumerated
Legal values are:
past
past tense
pres
present tense
fut
future tense
voice http://universaldependencies.org/u/feat/Voice.html Voice is typically a feature of verbs.
Status Optional
Datatype teidata.enumerated
Legal values are:
act
The subject of the verb is the doer of the action (agent).
pass
The subject of the verb is affected by the action (patient).
number http://universaldependencies.org/u/feat/Number.html Number is usually an inflectional feature of nouns.
Status Optional
Datatype teidata.enumerated
Legal values are:
sing
A singular noun denotes one person, animal or thing.
plur
A plural noun denotes several persons, animals or things.
ptan
Plurale tantum: some nouns appear only in the plural form even though they denote one thing.
coll
Collective or mass or singulare tantum applies to words that use grammatical singular to describe sets of objects.
person http://universaldependencies.org/u/feat/Person.html Person is typically a feature of personal and possessive pronouns / determiners, and of verbs.
Status Optional
Datatype teidata.enumerated
Legal values are:
first
The first person refers just to the speaker / author and in plural one or more additional persons.
second
The second person refers to the addressee(s).
third
The third person refers to one or more persons that are neither speakers nor addressees.
verbtype http://universaldependencies.org/u/feat/VerbType.html Distinctions on top of verb and aux.
Status Optional
Datatype teidata.enumerated
Legal values are:
mod
Verbs that take infinitive of another verb as argument and add various modes of possibility, necessity etc.
tense
Verb used to create periphrastic verb forms (tenses, passives etc.).
verbform http://universaldependencies.org/u/feat/VerbForm.html Form of verb or deverbative.
Status Optional
Datatype teidata.enumerated
Legal values are:
inf
Infinitive is the citation form of verbs in many languages.
part
Participle is a non-finite verb form that shares properties of verbs and adjectives.
ger
Gerund is a non-finite verb form that shares properties of verbs and nouns.
conv
The converb, also called adverbial participle or transgressive, is a non-finite verb form that shares properties of verbs and adverbs.
polite https://universaldependencies.org/u/feat/Polite.html Various languages have various means to express politeness or respect.
Status Optional
Datatype teidata.enumerated
Legal values are:
infm
usually meant for communication with family members and close friends.
form
usually meant for communication with strangers and people of higher social status.
numtype http://universaldependencies.org/u/feat/NumType.html Numeral type.
Status Optional
Datatype teidata.enumerated
Legal values are:
ord
ordinal number (first, second, ...)
card
cardinal number (one, two, many, ...)
degree http://universaldependencies.org/u/feat/Degree.html Degree of comparison is typically an inflectional feature of some adjectives and adverbs.
Status Optional
Datatype teidata.enumerated
Legal values are:
cmp
comparative, second degree
sup
superlative, third degree
mood http://universaldependencies.org/u/feat/Mood.html Mood is a feature that expresses modality and subclassifies finite verb forms.
Status Optional
Datatype teidata.enumerated
Legal values are:
imp
The speaker uses imperative to order or ask the addressee to do the action of the verb.
sub
The subjunctive mood is used under certain circumstances in subordinate clauses, typically for actions that are subjective or otherwise uncertain.
ind
A verb in indicative merely states that something happens, has happened or will happen.
gender http://universaldependencies.org/u/feat/Gender.html Gender.
Status Optional
Datatype teidata.enumerated
Legal values are:
masc
masculine gender
fem
feminine gender
neut
neuter gender
com
Some languages do not distinguish masculine/feminine but they do distinguish neuter vs. non-neuter. The non-neuter is called common gender.
pronoun Not in universaldependencies. Pronoun drop or clitic.
Status Optional
Datatype teidata.enumerated
Legal values are:
drop
Not in universaldependencies. pronoun drop, omission of pronouns because they can be inferred
clitic
Not in universaldependencies. pronoun clitic, most personal pronouns have a clitic form, which is the result of either vowel deletion, vowel reduction, monophthongization or schwa deletion, while there are also cases of suppletion.
diminutive Not in universaldependencies. Diminutive.
Status Optional
Datatype teidata.enumerated
Legal values are:
dim
Not in universaldependencies. diminutive
inflection Not in universaldependencies. The modification of a word to express different grammatical categories such as tense, case, voice, aspect, person.
Status Optional
Datatype teidata.enumerated
Legal values are:
infl
Not in universaldependencies. inflected
uninf
Not in universaldependencies. uninflected
valency Not in universaldependencies. Verb valency or valence is the number of arguments controlled by a verbal predicate.
Status Optional
Datatype teidata.enumerated
Legal values are:
mtran
Not in universaldependencies. a monotransitive verb takes two arguments (of which one object)
tran
Not in universaldependencies. a transitive verb requires one or more objects
intran
Not in universaldependencies. an intransitive verb takes one argument (no object)
ditran
Not in universaldependencies. a ditransitive verb takes three arguments (of which a direct and an indirect object)
construction Not in universaldependencies. Construction.
Status Optional
Datatype teidata.enumerated
Legal values are:
attr
Not in universaldependencies. attributive
convertedfrom Not in universaldependencies. Words belonging to one part-of-speech category used as another category.
Status Optional
Datatype teidata.enumerated
Legal values are:
adj
Not in universaldependencies. adjective used as another category
adv
Not in universaldependencies. adverb used as another category
ver
Not in universaldependencies. verb used as another category
num
Not in universaldependencies. numeral used as another category
pro
Not in universaldependencies. pronoun used as another category
part
Not in universaldependencies. verbform part used as another category
predicate Not in universaldependencies. Predicate.
Status Optional
Datatype teidata.enumerated
Legal values are:
pred
Not in universaldependencies. statement about the subject
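
Because every attribute in att.features is an enumeration, attribute values can be checked against the legal values listed above. The sketch below does this in Python for a handful of the attributes. The function name check_features and the LEGAL_VALUES table are hypothetical helpers written for this illustration; the table is deliberately partial and only the values are copied from the entries above.

```python
# Legal values copied from a few att.features entries above.
# This table is partial on purpose; extend it for the remaining attributes.
LEGAL_VALUES = {
    "islemma": {"yes"},
    "abbr": {"yes"},
    "case": {"nom", "acc", "dat", "gen", "ins", "par"},
    "tense": {"past", "pres", "fut"},
    "number": {"sing", "plur", "ptan", "coll"},
    "person": {"first", "second", "third"},
    "gender": {"masc", "fem", "neut", "com"},
    "valency": {"mtran", "tran", "intran", "ditran"},
}

def check_features(attrs):
    """Return a list of problem descriptions for one element's attributes.

    Attributes missing from LEGAL_VALUES are skipped rather than rejected,
    because the table covers only part of att.features.
    """
    problems = []
    for name, value in attrs.items():
        allowed = LEGAL_VALUES.get(name)
        if allowed is not None and value not in allowed:
            problems.append(f"{name}={value!r} not in {sorted(allowed)}")
    return problems

print(check_features({"number": "plur", "tense": "past"}))
print(check_features({"number": "dual", "islemma": "no"}))
```

All attributes have Status Optional, so a missing attribute is never reported. For the Boolean features the only legal value is yes, which suggests that a negative value is expressed by leaving the attribute out.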

att.linguistic

att.linguistic 
Members join m
Attributes
pos http://universaldependencies.org/u/pos/index.html These tags mark the core part-of-speech categories.
Status Optional
Datatype teidata.enumerated
Legal values are:
adj
Adjectives are words that typically modify nouns and specify their properties or attributes.
adp
Adposition is a cover term for prepositions and postpositions.
adv
Adverbs are words that typically modify verbs for such categories as time, place, direction or manner.
aux
An auxiliary is a function word that accompanies the lexical verb of a verb phrase and expresses grammatical distinctions not carried by the lexical verb, such as person, number, tense, mood, aspect, voice or evidentiality.
cconj
A coordinating conjunction is a word that links words or larger constituents without syntactically subordinating one to the other and expresses a semantic relationship between them.
det
Determiners are words that modify nouns or noun phrases and express the reference of the noun phrase in context.
intj
An interjection is a word that is used most often as an exclamation or part of an exclamation.
noun
Nouns are a part of speech typically denoting a person, place, thing, animal or idea.
num
A numeral is a word, functioning most typically as a determiner, adjective or pronoun, that expresses a number and a relation to the number, such as quantity, sequence, frequency or fraction.
part
Particles are function words that must be associated with another word or phrase to impart meaning and that do not satisfy definitions of other universal parts of speech.
pron
Pronouns are words that substitute for nouns or noun phrases, whose meaning is recoverable from the linguistic or extralinguistic context.
propn
A proper noun is a noun (or nominal content word) that is the name (or part of the name) of a specific individual, place, or object.
punct
Punctuation marks are non-alphabetical characters and character groups used in many languages to delimit linguistic units in printed text.
sconj
A subordinating conjunction is a conjunction that links constructions by making one of them a constituent of the other.
sym
A symbol is a word-like entity that differs from ordinary words by form, function, or both.
verb
A verb is a member of the syntactic class of words that typically signal events and actions.
x
The tag X is used for words that for some reason cannot be assigned a real part-of-speech category.
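
The @pos values above are the lowercase counterparts of the Universal Dependencies part-of-speech tags, which are written in uppercase (ADJ, NOUN, VERB and so on). A minimal Python sketch of checking a @pos value and converting it to that notation follows. The names POS_VALUES and to_upos are hypothetical, and treating the conversion as a plain upper-casing is an assumption based on the list above, not something the schema states.

```python
# The @pos values listed above, in the same order.
POS_VALUES = (
    "adj", "adp", "adv", "aux", "cconj", "det", "intj", "noun", "num",
    "part", "pron", "propn", "punct", "sconj", "sym", "verb", "x",
)

def to_upos(pos):
    """Return the uppercase tag for a legal @pos value.

    Raises ValueError for anything that is not in the list above.
    """
    if pos not in POS_VALUES:
        raise ValueError(f"not a legal @pos value: {pos!r}")
    return pos.upper()

print(to_upos("noun"))
print(to_upos("cconj"))
```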
Eduard Drenth. Date: 2020-12-13