-
Here you can create a new private corpus.
A private corpus allows you to upload and search through your own data.
Corpora you create are not visible to others unless you explicitly share them, and they are restricted in their maximum size.
-
Select the format of the data you intend to upload to this corpus here.
Because annotated data can be structured in many different ways, you need to specify how the data in this corpus should be indexed.
Some of the better-known formats, such as TEI and FoLiA, are already supported.
If your data is in a format that's not in this list, it's possible to create your own custom format definition by clicking the new format
button at the bottom of the page.
The new format will then become available in this list.
-
Click here to delete this corpus. You can only delete your own corpora.
-
Click here to add some data to your corpus. It is currently not possible to remove data from the corpus.
Once indexing is finished, the new data is immediately available for searching.
-
Click here to share your corpus with other users. You will have to know their usernames.
-
Select the file(s) you want to add to the corpus here.
You should only select files appropriate for the corpus. A single invalid file will cause the entire upload to be rejected.
See the hint below this button for a reminder of the type(s) of files that can be added to this corpus.
-
If your files link to external metadata in different files, you should also upload those files here.
Support for external data has to be configured in a custom import format, so you usually won't need this option.
For more information on how to configure a format for using linked/external files,
see here.
-
If your corpus material is in a format that we don't support out of the box (yet), you can customize how your data is treated by creating a new format here.
After you've done so, you will need to create a new corpus that uses the format and add some files to it.
-
Formats can be written in either JSON or YAML.
Changing this setting will also change the syntax highlighting so you can more easily spot mistakes.
-
A good place to start writing a format is usually to download one of our presets, and edit it until it matches the structure of your corpus material.
Select a format to start with in the dropdown, then click download
to open it in the editor.
You can also download another user's format, if you know the name.
To do so, enter their username, followed by ':', followed by the name of the format (username:format),
in the box next to the download button.
When you load one of your own formats, its name will automatically be filled in, so any changes you save will overwrite the format.
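The username:format notation can be illustrated with a minimal sketch. The function name, the example username and the example format name below are all hypothetical, and this is not necessarily how the application itself interprets the input; it only shows how such a reference breaks down into its two parts.

```python
def split_format_reference(reference: str) -> tuple[str, str]:
    """Split a 'username:format' reference into (username, format_name).

    Hypothetical helper for illustration only. The text is split on the
    first ':' so that everything after it is treated as the format name.
    """
    username, separator, format_name = reference.partition(":")
    if not separator or not username or not format_name:
        raise ValueError(f"expected 'username:format', got {reference!r}")
    return username, format_name


# Example with a made-up user and format name.
owner, name = split_format_reference("jdoe:my-tei-variant")
print(owner)  # the user who owns the format
print(name)   # the name of the format to download
```

An entry without a colon, or with an empty part on either side of it, is rejected in this sketch because it does not identify both a user and a format.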
-
When you're done editing your format, save it by clicking here.
The format will be saved using the name you entered to the left.
If you already own a format with this name, the format will be overwritten.
If you save over a format that's already being used in one of your corpora,
then any new data you upload to that corpus will be indexed according to the updated format.
-
Edit your format here.
Information on how to write a format can be found
here.