MTAAC and CDLI Documentation

    Brute pre-processing

    This page covers brute pre-processing, which consists mainly of tokenization, lemmatization and brute stemming.

    Tokenization
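    As an illustration only, here is a minimal tokenizer sketch. It assumes C-ATF transliteration as input (the format used by CDLI), in which numbered text lines such as "1." or "3'." carry the transliteration, words are separated by whitespace, and lines starting with &, @, $ or # are headers, structure, state or comment lines. The names tokenize_atf, NON_TEXT_PREFIXES, LINE_LABEL and EDITORIAL_MARKS are hypothetical, and stripping editorial marks before splitting is an assumed, deliberately crude cleaning step.

```python
import re

# C-ATF lines starting with these characters carry no transliteration:
# '&' text header, '@' structure (surface, column), '$' state, '#' comment.
NON_TEXT_PREFIXES = ("&", "@", "$", "#")

# A text line begins with a label such as "1." or "3'." followed by whitespace.
LINE_LABEL = re.compile(r"^(\S+)\.\s+(.*)$")

# Editorial marks removed before splitting (a crude, assumed cleaning step):
# square and half brackets, damage '#', query '?', correction '!', collation '*'.
EDITORIAL_MARKS = re.compile(r"[\[\]⸢⸣#?!*]")


def tokenize_atf(atf: str) -> list[list[str]]:
    """Return one list of whitespace-separated word tokens per numbered text line."""
    lines = []
    for raw in atf.splitlines():
        line = raw.strip()
        if not line or line.startswith(NON_TEXT_PREFIXES):
            continue  # blank or non-transliteration line
        match = LINE_LABEL.match(line)
        if match is None:
            continue  # no line label, so not a text line
        text = EDITORIAL_MARKS.sub("", match.group(2))
        lines.append(text.split())
    return lines


# Illustrative input, not a real CDLI text.
sample = """@obverse
1. 1(disz) udu niga
2. lugal-e e2 mu-na#-du3
$ rest broken"""

print(tokenize_atf(sample))
# [['1(disz)', 'udu', 'niga'], ['lugal-e', 'e2', 'mu-na-du3']]
```

    Hyphens inside a word are kept, so each token is a whole written word; splitting further into signs would be a separate step.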

    Lemmatization
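    A brute lemmatizer can be sketched as a plain lookup from attested word forms to lemmas, applied to already tokenized words. This is a minimal sketch under stated assumptions: FORM_TO_LEMMA, lemmatize and coverage are hypothetical names, the table entries are toy examples written in a citation-form[guide-word]POS label style, and a real table would have to be harvested from texts that are already lemmatized. Unknown forms are returned with None so that they can be routed to manual annotation.

```python
from collections.abc import Mapping
from typing import Optional

# Hypothetical form-to-lemma table; entries are toy examples only,
# written in a citation-form[guide-word]POS label style.
FORM_TO_LEMMA: Mapping[str, str] = {
    "udu": "udu[sheep]N",
    "lugal": "lugal[king]N",
    "lugal-e": "lugal[king]N",
    "e2": "e2[house]N",
    "e2-a": "e2[house]N",
}


def lemmatize(
    tokens: list[str], table: Mapping[str, str] = FORM_TO_LEMMA
) -> list[tuple[str, Optional[str]]]:
    """Pair each token with its lemma, or None when the form is not in the table."""
    return [(token, table.get(token)) for token in tokens]


def coverage(pairs: list[tuple[str, Optional[str]]]) -> float:
    """Share of tokens that received a lemma (0.0 for an empty input)."""
    if not pairs:
        return 0.0
    return sum(lemma is not None for _, lemma in pairs) / len(pairs)


pairs = lemmatize(["lugal-e", "e2", "mu-na-du3"])
print(pairs)  # [('lugal-e', 'lugal[king]N'), ('e2', 'e2[house]N'), ('mu-na-du3', None)]
print(coverage(pairs))  # 0.6666666666666666
```

    Lookup only works for forms already seen; that limitation is the usual motivation for combining it with stemming.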

    Stemming
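    Brute stemming can be illustrated with blind suffix stripping on hyphenated C-ATF words. The sketch below is an assumption, not the project's method: brute_stem and SUFFIXES are hypothetical names, and the suffix set is a small, deliberately incomplete sample of Sumerian case-marker spellings (ergative -e, dative -ra, terminative -sze3, ablative -ta, comitative -da, locative -a). Because no morphology is consulted, a root-final syllable that merely looks like a suffix is stripped as well.

```python
# Hypothetical, deliberately incomplete set of Sumerian case-marker
# spellings in C-ATF: ergative -e, dative -ra, terminative -sze3,
# ablative -ta, comitative -da, locative -a.
SUFFIXES = frozenset({"e", "ra", "sze3", "ta", "da", "a"})


def brute_stem(token: str, suffixes: frozenset[str] = SUFFIXES) -> str:
    """Strip trailing hyphen-separated syllables that appear in `suffixes`.

    The first syllable is always kept, so a bare form such as 'e2' is never
    emptied. No morphology is consulted, which is what makes this 'brute'.
    """
    parts = token.split("-")
    while len(parts) > 1 and parts[-1] in suffixes:
        parts.pop()
    return "-".join(parts)


for word in ["lugal-e", "e2-a", "{d}en-lil2-ra", "mu-na-du3", "e2"]:
    print(word, "->", brute_stem(word))
# lugal-e -> lugal
# e2-a -> e2
# {d}en-lil2-ra -> {d}en-lil2
# mu-na-du3 -> mu-na-du3
# e2 -> e2
```

    The stem can then serve as a fallback lookup key when a full word form has no lemma of its own.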

    The content of this site has been released to the public domain except when noted otherwise. If information contained on this site is used in an academic context, authors must be acknowledged to avoid plagiarism. Authors of specific pages are mentioned at the bottom of each document.

    CC0

    The template remains Copyright (c) 2019 Michael Rose: Minimal Mistakes on GitHub.

    This documentation was prepared by team members of the MTAAC project: Émilie Pagé-Perron, Ilya Khait, Lucas Reckling, Jayanth Jayanth, Pouya Lajevardi, Prashant Rajput, Maria Sukhareva, Robert K. Englund, Heather D. Baker and Christian Chiarcos.

    © 2022 MTAAC and CDLI Documentation. Powered by Jekyll & Minimal Mistakes.