Topic Tiling is a LDA based Text Segmentation algorithm. This algorithm is based on the well-known TextTiling algorithm, and segments documents using the Latent Dirichlet Allocation (LDA) topic model. TopicTiling performs the segmentation in linear time and thus is computationally less expensive than other LDA-based segmentation methods.
The software is released under GPL 3.0 at sourceforge: link
- Riedl, M., Biemann, C. (2012): Text Segmentation with Topic Models. Journal for Language Technology and Computational Linguistics (JLCL), Vol. 27, No. 1, pp. 47--70, August 2012 (pdf)
- Riedl M., Biemann C. (2012): How Text Segmentation Algorithms Gain from Topic Models, Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2012), Montreal, Canada. (pdf)
- Riedl M., Biemann C. (2012): TopicTiling: A Text Segmentation Algorithm based on LDA, Proceedings of the Student Research Workshop of the 50th Meeting of the Association for Computational Linguistics, Jeju, Republic of Korea. (pdf)
- Riedl M., Biemann C. (2012): Sweeping through the Topic Space: Bad luck? Roll again! In Proceedings of the Joint Workshop on Unsupervised and Semi-Supervised Learning in NLP held in conjunction with EACL 2012, Avignon, France (pdf)