Web Interfaces for Language Processing Systems SS2017
The following systems were successfully completed as part of the master project "Web Interfaces for Language Processing Systems" in the summer semester 2017.
I. Machine Learning to Anonymize Court Decisions
Mirco Franzek and Matthias Schildwächter
- Court decisions in Germany can only be published in anonymized form.
- Anonymization is done by hand.
- Only a very small fraction of all decisions is published.
- Courts are not capable of anonymizing a larger share.
- There is little research about how the law is applied by lower-level courts.
- Create software that is able to anonymize court decisions automatically.
- Enable legal researchers and practitioners to better predict court decisions.
- Improve transparency and legitimacy – truly make the world a better place.
- Create the foundation for a better database and better search tools.
- Recognize text fragments that have to be anonymized: names, addresses, locations, license plates, companies, identifying descriptions, etc.
- Possibly extract anonymization rules from already anonymized decisions.
- Replace these fragments with placeholders or delete them.
- Calculate confidence scores to decide when manual review is required.
- No legal knowledge required.
- A few hundred decisions from the European Court of Justice
- The software suggests possible anonymizations which have to be accepted or declined.
- The user can add missing anonymizations.
- The application can be retrained with the information from manually corrected decisions to improve the suggestions and speed up the process.
- In the end the anonymized document can be exported.
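The suggest-and-review idea described above (replace detected fragments with placeholders, and route low-confidence suggestions to a user) can be sketched as follows. This is a minimal illustration, not the project's implementation: it assumes a separate model has already produced non-overlapping character-offset spans with a label and a confidence score, and the `Span` class, the `anonymize` function, the bracketed placeholder format and the 0.9 threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """A detected text fragment (hypothetical output of a recognition model)."""
    start: int          # character offset where the fragment begins
    end: int            # character offset just past the fragment
    label: str          # e.g. "PERSON", "LOCATION", "LICENSE_PLATE"
    confidence: float   # model confidence in [0, 1]


def anonymize(text: str, spans: list[Span], threshold: float = 0.9):
    """Replace every detected span with a placeholder such as "[PERSON]".

    Assumes the spans do not overlap. Returns the anonymized text and the
    spans whose confidence is below `threshold`; in a review interface these
    would be shown to the user to accept or decline.
    """
    needs_review = [s for s in spans if s.confidence < threshold]
    parts = []
    cursor = 0
    # Walk the spans left to right so all offsets refer to the original text.
    for span in sorted(spans, key=lambda s: s.start):
        parts.append(text[cursor:span.start])
        parts.append(f"[{span.label}]")
        cursor = span.end
    parts.append(text[cursor:])
    return "".join(parts), needs_review


if __name__ == "__main__":
    text = "Mr. Smith lives in Hamburg."
    spans = [Span(4, 9, "PERSON", 0.97), Span(19, 26, "LOCATION", 0.62)]
    result, review = anonymize(text, spans)
    print(result)                      # Mr. [PERSON] lives in [LOCATION].
    print([s.label for s in review])   # ['LOCATION']
```

Decisions the user corrects could then be collected as new training examples, which is one way to realize the retraining step mentioned above.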
The documentation containing the installation guide and general descriptions is available here.
The source code is available here.
II. new/s/leak Extension
This project extends the graph and document processing functionality of the new/s/leak project.
Alvin Fazrie and Thorben Wiese
- Enable adding new entities and keywords into the system.
- Enable adding new entity types.
- Provide keyword graphs alongside the entity graph.
- Enhance entity and keyword blacklisting.
- Improve analyzability of connections between entities, keywords and tags.
The documentation is available here.
The source code is available here.
This project crawls, extracts, indexes and processes the content of news articles published daily. The extracted content is indexed in Elasticsearch for further processing. The project also provides tooling to extract and preprocess the content for the NoD project.
Documentation and Source Code
The source code and documentation are available at: https://github.com/thesoenke/news-crawler
The demo is available here.