Multi-document summarization is an automatic procedure for extracting information from multiple texts written about the same topic. Such systems complement news aggregators in helping readers cope with information overload. We aim to extract a summary from multiple documents that responds to a given query. The task is close to automatic answer generation systems such as IBM Watson, though much simpler.
This project is a great opportunity to dive into the full stack of information retrieval! The implementation can range from a feasibility prototype based on Bayesian inference between sentences to a fairly complex system that uses semantic graphs and entity disambiguation.
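
To make the task concrete, here is a minimal sketch of a query-focused extractive baseline. It is not the method prescribed for the project: it swaps in plain TF-IDF cosine similarity between each sentence and the query, which is simpler than the Bayesian or graph-based approaches mentioned above. The function names (`summarize`, `split_sentences`, and so on), the naive sentence splitter, and the smoothed IDF formula are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def split_sentences(document):
    """Naive splitter (assumption): break on whitespace after '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", document.strip())
    return [p for p in parts if p]


def tfidf_vector(tokens, idf):
    """Sparse TF-IDF vector as a dict; terms unseen in the corpus get weight 0."""
    counts = Counter(tokens)
    return {term: count * idf.get(term, 0.0) for term, count in counts.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def summarize(documents, query, max_sentences=2):
    """Return up to max_sentences sentences most similar to the query.

    Sentences from all documents are pooled, scored against the query,
    and the selected ones are returned in their original order.
    """
    sentences = [s for doc in documents for s in split_sentences(doc)]
    tokenized = [tokenize(s) for s in sentences]

    # Document frequency is counted over sentences, with smoothed IDF.
    n = len(sentences)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

    query_vec = tfidf_vector(tokenize(query), idf)
    scored = [
        (cosine(tfidf_vector(tokens, idf), query_vec), index)
        for index, tokens in enumerate(tokenized)
    ]

    # Keep the best-scoring sentences, dropping those unrelated to the query.
    best = sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:max_sentences]
    kept = sorted(index for score, index in best if score > 0.0)
    return [sentences[i] for i in kept]


if __name__ == "__main__":
    docs = [
        "The city council approved the new budget. Local teams won their games.",
        "The budget increases funding for schools. Weather was sunny.",
    ]
    for sentence in summarize(docs, "budget funding schools"):
        print(sentence)
```

A baseline like this ignores redundancy between selected sentences and has no notion of entities or word senses, which is where the more elaborate designs would come in.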