Florian Schneider

Name:	Florian Schneider
Position:	PhD Student
Email:	florian.schneider-1 (ät) uni-hamburg (dot) de
Phone:	+49 40 42883 2387
Office:	F-430
Address:	Informatikum Vogt-Kölln-Straße 30 22527 Hamburg

Hi, I'm Flo!

I'm a passionate computer scientist whose IT journey began back in 2005, when I first encountered the archaic MS-DOS. Sixteen years later, in the summer of 2021, I graduated with an M.Sc. from the University of Hamburg and started working as a research assistant in the university's Language Technology (LT) group under the direction of Prof. Dr. Chris Biemann. I'm also an associate of the D-WISE project.

During my studies, I developed a keen interest in Natural Language Processing and Computer Vision, which are now my active research fields. More specifically, my main interest lies in multi-modal visio-linguistic methods and models that achieve a more natural, human-like grounding and understanding of natural language through visual information, and vice versa.

While doing my M.Sc., I worked for the LT group as a student researcher, among other things as a scientific software developer in the CodeAnno / WebAnno project.

You can download my current CV here.

Research Interests:

Cross-Modal Information Retrieval and Representation Learning
Large Multi-Modal Models (LMMs) for low-resource languages and cultures
Scientific software tools for multi-modal data such as text, image, audio, and video
Machine Learning for the Digital Humanities (DH)

Awards:

GSCL 2023 Thesis Award for the best master's thesis. (news article)

Publications:

2025

Florian Schneider, Carolin Holtermann, Anne Lauscher (2025): "GIMMICK - Globally Inclusive Multimodal Multitask Cultural Knowledge Benchmarking" (preprint pdf)
Fabian David Schmidt*, Florian Schneider*, Chris Biemann, Goran Glavaš (2025): "MVL-SIB: A Massively Multilingual Vision-Language Benchmark for Cross-Modal Topical Matching" (preprint pdf)
Gregor Geigle*, Florian Schneider*, Carolin Holtermann, Chris Biemann, Radu Timofte, Anne Lauscher, Goran Glavaš (2025): "Centurio: On Drivers of Multilingual Ability of Large Vision-Language Model" (preprint pdf)
Carolin Holtermann, Florian Schneider, Anne Lauscher (2025): "SoS: Analysis of Surface over Semantics in Multilingual Text-To-Image Generation" (preprint pdf TBA)
Dementieva, D., Babakov, N., Ronen, A., Ayele, A. A., Rizwan, N., Schneider, F., Wang, X., Yimam, S. M., Moskovskiy, D. A., Stakovskii, E., Kaufman, E., Elnagar, A., Mukherjee, A., Panchenko, A. (2025): Multilingual and Explainable Text Detoxification with Parallel Corpora. Proceedings of the 31st International Conference on Computational Linguistics (COLING 2025), Abu Dhabi, UAE. (pdf)

2024

Schneider, F., and Sitaram, S. (2024): M5 - A Diverse Benchmark to Assess the Performance of Large Multimodal Models Across Multilingual and Multicultural Vision-Language Tasks. The 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP Findings), Miami, Florida, USA. Association for Computational Linguistics (ACL) (pdf)
Hinck, M., Holtermann, C.*, Lyle, M.*, Schneider, F.*, Yu, S., Bhiwandiwalla, A., Lauscher, A., Tseng, S., and Lal, V. (2024): Why do LLaVA Vision-Language Models Reply to Images in English? The 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP Findings), Miami, Florida, USA. Association for Computational Linguistics (ACL) (pdf)
Strich, J., Schneider, F., Nikishina, I., and Biemann, C. (2024): On Improving Repository-Level Code QA for Large Language Models. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop), pages 303–338, Bangkok, Thailand. Association for Computational Linguistics (ACL). (pdf)
Schneider, F., and Biemann, C. (2024): WISMIR3: A Multi-Modal Dataset to Challenge Text-Image Retrieval Approaches. In Proceedings of the 3rd Workshop on Advances in Language and Vision Research (ALVR), pages 1–6, Bangkok, Thailand. Association for Computational Linguistics (ACL). (pdf)
Anastasi, S., Schneider, F., Fischer, T., and Biemann, C. (2024). VIDA: The Visual Incel Data Archive. A Theory-oriented Annotated Dataset To Enhance Hate Detection Through Visual Culture. In Proceedings of the 8th Workshop on Online Abuse and Harms (WOAH 2024), pages 59–67, Mexico City, Mexico. Association for Computational Linguistics (ACL). (pdf)
Fischer, T., Schneider, F., Geislinger, R., Helfer, F., Koch, G., and Biemann, C. (2024): Concept Over Time Analysis: Unveiling Temporal Patterns for Qualitative Data Analysis. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 3: System Demonstrations), pages 148–157, Mexico City, Mexico. Association for Computational Linguistics (ACL). (pdf)
Fischer, T., Schneider, F., Haque, A., Koch, G., Biemann, C. (2024): Extending the Discourse Analysis Tool Suite with Whiteboards for Visual Qualitative Analysis. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 7017–7022, Torino, Italy. ELRA and ICCL. (pdf)

2023

Schneider, F., Dash, S., Bagchi, S., Mihindukulasooriya, N., Gliozzo, A. M. (2023): NLFOA: Natural Language Focused Ontology Alignment. In Proceedings of the 12th Knowledge Capture Conference (K-CAP 2023), Pensacola, Florida, USA. (pdf)
Fynn Petersen-Frey, Tim Fischer, Florian Schneider, Isabel Eiser, Gertraud Koch, and Chris Biemann (2023): From Qualitative to Quantitative Research: Semi-Automatic Annotation Scaling in the Digital Humanities. In Proceedings of the 19th Conference on Natural Language Processing (KONVENS 2023), pages 52–62, Ingolstadt, Germany. (pdf)
Anastasi, S., Fischer, T., Schneider, F., Biemann, C. (2023): IDA - Incel Data Archive: A Multimodal Comparable Corpus for Exploring Extremist Dynamics in Online Interaction. Proceedings of the 10th International Conference on CMC and Social Media Corpora for the Humanities (CMC 2023), Mannheim, Germany, vol. 10, pp. 30–35.
Schneider, F.*, Fischer, T.*, Petersen-Frey, F., Eiser, I., Koch, G., Biemann, C. (2023): The D-WISE Tool Suite: Multi-Modal Machine-Learning-Powered Tools Supporting and Enhancing Digital Discourse Analysis. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023), System Demonstrations Track, Toronto, Canada. (pdf)
Schneider, F. and Biemann, C. (2023): LT at SemEval-2023 Task 1: Effective Zero-Shot Visual Word Sense Disambiguation Approaches using External Knowledge Sources. In Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023), Toronto, Canada. Association for Computational Linguistics (ACL). (pdf)
Schneider, F., Yimam, S. M., Petersen-Frey, F., Biemann, C., von Nordheim, G., Kleinen-von Königslöw, K. (2023): CodeAnno: Extending WebAnno with Hierarchical Document Level Annotation and Automation. The 17th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2023), System Demonstrations Track, Dubrovnik, Croatia. (pdf)
Fischer, T., Eiser, I., Schneider, F., Petersen-Frey, F., Biemann, C., Koch, G. (2023): D-WISE - Digitale Wissensoziologische Diskursanalyse. Abstracts of DHd 2023: Open Humanities, Open Culture, Luxemburg/Trier. (preprint pdf)
Eiser, I., Fischer, T., Schneider, F., Koch, G., Biemann, C., Petersen-Frey, F. (2023): Open Science Prinzipien und interdisziplinäre Kollaboration in D-WISE: Zwischen Hermeneutik und Digitaler Methode in der Diskursanalyse. Abstracts of DHd 2023: Open Humanities, Open Culture, Luxemburg/Trier. (preprint pdf)

2022

Wiehe, A. O., Schneider, F., Blank, S., Wang, X., Zorn, H. P., Biemann, C. (2022): Language over Labels: Contrastive Language Supervision Exceeds Purely Label-Supervised Classification Performance on Chest X-Rays. The 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (AACL-IJCNLP 2022). (link)
Koch, G., Biemann, C., Eiser, I., Fischer, T., Schneider, F., Stumpf, T., García, A. (2022): D-WISE Tool Suite for the Sociology of Knowledge Approach to Discourse. In: Rauterberg, M. (eds) Culture and Computing. HCII 2022. Lecture Notes in Computer Science, vol 13324. Springer, Cham. (link)
Schneider, F., and Biemann, C. (2022): Golden Retriever: A Real-Time Multi-Modal Text-Image Retrieval System with the Ability to Focus. In Proceedings of The 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’22), System Demonstrations Track. ACM, New York, NY, USA, 5 pages. (link)
Wang, X.*, Schneider, F.*, Alaçam, Ö., Chaudhury, P., Biemann, C. (2022): MOTIF: Contextualized Images for Complex Words to Improve Human Reading. In Proceedings of the 2022 Language Resource and Evaluation Conference (LREC). (link)

2021

Schneider, F., Alaçam, Ö., Wang, X., Biemann, C. (2021): Towards Multi-Modal Text-Image Retrieval to improve Human Reading. NAACL 2021 Student Research Workshop, Mexico City, Mexico (online) (link)