NLP and the Web
Natural Language Processing and the Web
Course content will be available at the NLPWeb Moodle site.
Requirements
To pass, each student has to take the written exam at the end of the semester.
The practice class includes assignments and a final project, all of which are required to pass the module.
Course content
The Web contains more than 10 billion indexable web pages, which can be retrieved via search queries. The lecture will present Natural Language Processing (NLP) methods to (1) automatically process large amounts of unstructured text from the Web and (2) analyse the use of Web data as a resource for other NLP tasks.
- Processing of unstructured web content
- Introduction
- NLP Basics - Tokenisation, Part-of-Speech Tagging, Chunking, Stemming, Lemmatisation
- NLP pipelines: principles and applications
- Data collection and Annotation
- Web contents and their characteristics - diverse genres of web contents, e.g. personal web sites, news sites, blogs, forums, wikis
- Web contents and their characteristics - continued
- Web as corpus - innovative use of the web as a very big, distributed, linked, growing and multilingual corpus
- Web as corpus - continued
- NLP applications for the web
- Sentiment and hate speech analysis - comments, reviews, social media content, hateful and abusive text
- Information retrieval - introduction to the basics of information retrieval
- Web information retrieval - natural language interfaces for web information retrieval
- Question answering
- Summarization
- Crossmodal Learning
- Mining Web 2.0 Sites, such as Wikipedia and Wiktionary
- Quality Assessment of Web Contents
- Machine Translation, Seq2Seq, Neural MT, Statistical MT
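To give a flavour of how the topics above fit together, the following is a minimal sketch, not course material, of a toy pipeline: tokenisation, a crude suffix-stripping step standing in for stemming, an inverted index, and boolean AND retrieval. All function names (`tokenise`, `crude_stem`, `build_index`, `search`) and the three sample documents are hypothetical and chosen only for illustration; a real system would use an established stemmer and ranking model.

```python
import re
from collections import defaultdict


def tokenise(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def crude_stem(token):
    """Strip a few common English suffixes (a toy stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "s"):
        # Keep at least three characters so short words are left alone.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def build_index(docs):
    """Map each stemmed term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenise(text):
            index[crude_stem(token)].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every query term (boolean AND)."""
    terms = [crude_stem(t) for t in tokenise(query)]
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical mini-collection covering a few web genres.
docs = {
    "blog": "Bloggers are reviewing new phones",
    "news": "News sites review the phone market",
    "wiki": "Wikis are edited by many users",
}
index = build_index(docs)
print(sorted(search(index, "phone reviews")))
```

Because both documents and queries pass through the same normalisation, the query "phone reviews" matches the documents containing "phones"/"reviewing" and "phone"/"review". The suffix stripper is deliberately naive (it also maps "news" to "new"), which is the kind of error a proper stemmer or lemmatiser is meant to reduce.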
Literature
- Kai-Uwe Carstensen, Christian Ebert, Cornelia Endriss, Susanne Jekat, Ralf Klabunde, Computerlinguistik und Sprachtechnologie. Eine Einführung, 3rd edition, Heidelberg: Spektrum-Verlag, March 2010.
- Adam Kilgarriff and Gregory Grefenstette, Introduction to the special issue on the web as corpus, Computational Linguistics, vol. 29, pp. 333-347, MIT Press, 2003.
- Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008.
- Sowmya Vajjala, Bodhisattwa Majumder, Anuj Gupta and Harshit Surana, Practical Natural Language Processing: A Comprehensive Guide to Building Real-World NLP Systems, 2020.