Web Interfaces for Language Processing Systems SS2017
The following systems were successfully completed as part of the master's project "Web Interfaces for Language Processing Systems" in the summer semester 2017.
I. AnonML
Machine Learning to anonymize court decisions.
Team
Mirco Franzek and Matthias Schildwächter
Purpose
- Court decisions in Germany can only be published in anonymized form.
- Anonymization is done by hand.
- Only a very small fraction of all decisions is published.
- Courts are not capable of anonymizing a larger share.
- There is little research about how the law is applied by lower-level courts.
Idea
- Create software that is able to anonymize court decisions automatically.
- Enable legal researchers and practitioners to better predict court decisions.
- Improve transparency and legitimacy – truly make the world a better place.
- Create the foundation for a better database and better search tools.
Requirements
- Recognize text fragments that have to be anonymized: names, addresses, locations, license plates, companies, identifiable descriptions etc.
- Possibly extract anonymization rules from already anonymized decisions.
- Replace these fragments with placeholders or delete them.
- Calculate confidence scores so that uncertain suggestions can be flagged for manual review.
- No legal knowledge required.
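The replacement and confidence requirements can be illustrated with a small sketch. It assumes some recognizer has already produced non-overlapping character spans, each with a label and a confidence score; the `Span` class, the `anonymize` function, the bracketed placeholder format and the 0.8 threshold are all hypothetical and are not taken from AnonML itself.

```python
from dataclasses import dataclass


@dataclass
class Span:
    start: int          # character offset where the fragment begins
    end: int            # character offset just past the fragment
    label: str          # hypothetical category name, e.g. "PERSON"
    confidence: float   # model score in [0, 1]


def anonymize(text, spans, threshold=0.8, delete=False):
    """Replace (or delete) detected fragments and flag uncertain ones.

    Assumes the spans do not overlap. Returns the anonymized text and the
    spans whose confidence is below `threshold`, i.e. the ones a human
    should double-check.
    """
    needs_review = [s for s in spans if s.confidence < threshold]
    parts = []
    cursor = 0
    for s in sorted(spans, key=lambda s: s.start):
        parts.append(text[cursor:s.start])  # untouched text before the fragment
        if not delete:
            parts.append(f"[{s.label}]")     # placeholder instead of the original
        cursor = s.end
    parts.append(text[cursor:])              # remaining text after the last fragment
    return "".join(parts), needs_review


if __name__ == "__main__":
    text = "Max Mustermann lives in Hamburg."
    spans = [Span(0, 14, "PERSON", 0.95), Span(24, 31, "LOCATION", 0.6)]
    result, review = anonymize(text, spans)
    print(result)                        # [PERSON] lives in [LOCATION].
    print([s.label for s in review])     # ['LOCATION']
```

Keeping the review list separate from the rewritten text means the placeholder step stays deterministic, while the threshold alone decides how much work is left to a human reviewer.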
Data
- A few hundred decisions from the European Court of Justice
Result
- The software suggests possible anonymizations which have to be accepted or declined.
- The user can add missing anonymizations.
- The application can be retrained with information from manually corrected decisions to improve its suggestions and speed up the process.
- Finally, the anonymized document can be exported.
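The retraining step depends on turning a user's review into new training labels. The sketch below is a hypothetical illustration of that bookkeeping, not AnonML's actual code. It represents spans as `(start, end, label)` tuples: accepted suggestions and manually added spans become gold annotations, declined ones are dropped, and suggestions the user never decided on are left out instead of being guessed. The function name and data layout are assumptions.

```python
def corrected_annotations(suggestions, decisions, added):
    """Merge one reviewed document into gold-standard spans for retraining.

    suggestions: list of (start, end, label) tuples proposed by the model
    decisions:   dict mapping a suggestion's list index to True (accepted)
                 or False (declined); undecided suggestions are left out
    added:       list of (start, end, label) tuples the user marked manually
    """
    accepted = [s for i, s in enumerate(suggestions) if decisions.get(i) is True]
    # Sort by start offset so the spans appear in document order.
    return sorted(accepted + list(added))


if __name__ == "__main__":
    suggestions = [(0, 14, "PERSON"), (24, 31, "LOCATION"), (40, 45, "DATE")]
    decisions = {0: True, 1: False}   # the DATE suggestion (index 2) was never reviewed
    added = [(50, 58, "PLATE")]
    print(corrected_annotations(suggestions, decisions, added))
    # [(0, 14, 'PERSON'), (50, 58, 'PLATE')]
```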
Documentation
The documentation containing the installation guide and general descriptions is available here.
Source Code
The source code is available here.
Demo
II. new/s/leak Extension
This project extends the graph and document processing functionality of the new/s/leak project.
Team
Alvin Fazrie and Thorben Wiese
Purpose
- Enable adding new entities and keywords into the system.
- Enable adding new entity types.
- Provide keyword graphs alongside the entity graph.
- Enhance entity and keyword blacklisting.
- Improve analyzability of connections between entities, keywords and tags.
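The description does not say how the keyword graph is built. One common approach, assumed here only for illustration, is a document-level co-occurrence graph. Two terms are linked when they appear in the same document, the edge weight counts how many such documents exist, and blacklisted terms are removed before any edges are created. The function name `cooccurrence_graph` and its input format are hypothetical and are not taken from new/s/leak.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_graph(docs, blacklist=frozenset()):
    """Build an undirected, weighted co-occurrence graph of terms.

    docs:      iterable of term collections (entities or keywords per document)
    blacklist: terms that are dropped before any edges are built
    Returns a Counter mapping alphabetically ordered (term_a, term_b) pairs
    to the number of documents in which both terms occur.
    """
    edges = Counter()
    for terms in docs:
        # Deduplicate within a document and apply the blacklist first.
        kept = sorted({t for t in terms if t not in blacklist})
        for a, b in combinations(kept, 2):
            edges[(a, b)] += 1
    return edges


if __name__ == "__main__":
    docs = [["Enron", "energy", "California"], ["Enron", "energy"], ["Enron", "FYI"]]
    for (a, b), weight in cooccurrence_graph(docs, blacklist={"FYI"}).items():
        print(a, "--", b, weight)
```

Because the blacklist is applied before pairs are formed, a blacklisted term disappears from the graph completely. It does not remain as an isolated node.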
Documentation
The documentation is available here.
Source Code
The source code is available here.
Demo
The demo, based on the Enron Email Dataset, is available here.
III. News-crawler
Team
Sönke Behrendt
Project Description
This project crawls news articles published daily and extracts and processes their content. The extracted content is indexed in Elasticsearch for further processing. The project also provides tooling to extract and preprocess the content for the NoD project.
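The description does not detail how articles reach Elasticsearch. The following sketch assumes documents are sent through the Elasticsearch Bulk API, whose request body is newline-delimited JSON: one action line followed by one document line per article, with a final trailing newline. The index name `news` and the article fields (`url`, `title`, `text`) are hypothetical. Deriving the document `_id` from a hash of the URL is also an assumption, meant to show one way of keeping re-crawled articles from being indexed twice. The sketch is Python for illustration and is not the crawler's own code.

```python
import hashlib
import json


def bulk_body(articles, index="news"):
    """Serialize articles into an NDJSON body for the Elasticsearch Bulk API."""
    lines = []
    for article in articles:
        # Stable id from the URL: re-indexing the same article overwrites it.
        doc_id = hashlib.sha1(article["url"].encode("utf-8")).hexdigest()
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(article, ensure_ascii=False))
    # The Bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n" if lines else ""


if __name__ == "__main__":
    articles = [{"url": "https://example.com/a", "title": "A", "text": "Hallo Welt"}]
    print(bulk_body(articles), end="")
```

The resulting string would be sent as the body of a `POST /_bulk` request with the `Content-Type: application/x-ndjson` header.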
Documentation and Source Code
Source code and documentation are available at: https://github.com/thesoenke/news-crawler
Demo
The demo is available here.