Data Sources & Licenses

This page lists external lexical resources used to derive CEFR onboarding seed artifacts.

German Wiktionary

Lemma-level CEFR evidence (Wortschatz-Niveau conventions) from the German Wiktionary project.

dewiki-wordrank

German Wikipedia word-form frequency list used to weight lexical evidence.
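This page does not specify how frequency is turned into a weight. The sketch below assumes the list is plain text with one word form per line, ordered from most to least frequent, and maps rank to a weight with a 1 / log2(rank + 1) decay. The function names `load_wordrank` and `rank_weight`, the file layout, and the decay formula are all hypothetical illustrations, not the pipeline's actual method.

```python
import math
from collections.abc import Iterable


def load_wordrank(lines: Iterable[str]) -> dict[str, int]:
    """Build a form -> rank map (1 = most frequent).

    Assumption (hypothetical format): one word form per line, most frequent
    first. Blank lines are skipped and only the first occurrence of a form
    keeps its rank.
    """
    ranks: dict[str, int] = {}
    for line in lines:
        form = line.strip()
        if form and form not in ranks:
            ranks[form] = len(ranks) + 1
    return ranks


def rank_weight(form: str, ranks: dict[str, int]) -> float:
    """Hypothetical weighting: 1 / log2(rank + 1); unseen forms get 0.0."""
    rank = ranks.get(form)
    if rank is None:
        return 0.0
    return 1.0 / math.log2(rank + 1)


if __name__ == "__main__":
    ranks = load_wordrank(["der", "die", "und"])
    print(rank_weight("der", ranks))   # rank 1 -> weight 1.0
    print(rank_weight("und", ranks))   # rank 3 -> weight 0.5
    print(rank_weight("Haus", ranks))  # not in list -> 0.0
```

A logarithmic decay keeps very frequent forms from dominating while still separating common from rare forms; a linear or Zipf-based weight would be an equally plausible choice.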

TU Darmstadt A1 XML (optional anchor)

Optional A1 lexical anchor dataset used when available.

Morphological Mapping (optional)

An optional form-to-lemma mapping can be supplied as a local TSV or XML file; when present, it is used to increase word-form coverage.

  • Recommended source family: Morphy / LanguageTool German POS resources
  • License expectation: CC BY-SA 4.0 or a compatible license

Attribution and license obligations apply to source datasets and derivative artifacts.