Fig. 2: Cascaded semantic search engine architecture. | npj Digital Medicine


From: COVID-19 information retrieval with deep-learning based semantic search, question answering, and abstractive summarization


a Indexing: Raw documents are processed into a searchable format. Documents are split into paragraphs and image captions, embedded with an SBERT deep learning model, and stored in an index. The raw documents are also embedded with two keyword-based models (TF-IDF and BM25).

b Retrieval and re-ranking: The system computes a linear combination of the TF-IDF and SBERT retrieval scores, then combines the result with the BM25 retrieval scores using reciprocal rank fusion [31] to generate a sorted candidate list. k-nearest-neighbor search is used for TF-IDF and SBERT, and the Lucene inverted index is used for BM25. The retrieved documents and the query are parsed using a question answering model and an abstractive summarizer prior to being re-ranked based on answer match, summarization match, and retrieval scores.
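The retrieval step in panel b can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy scores, the mixing weight of 0.5, and the smoothing constant k = 60 (the value commonly used with reciprocal rank fusion) are all assumptions chosen for illustration, since the caption does not state the actual values. The sketch first mixes TF-IDF and SBERT scores linearly, then fuses the resulting ranking with a BM25 ranking by summing 1 / (k + rank) over the two lists.

```python
from typing import Dict, List


def linear_combination(tfidf: Dict[str, float], sbert: Dict[str, float],
                       weight: float = 0.5) -> Dict[str, float]:
    """Weighted sum of two score maps; a document missing from one map scores 0 there.

    The weight of 0.5 is an assumed value, not taken from the paper.
    """
    docs = set(tfidf) | set(sbert)
    return {d: weight * tfidf.get(d, 0.0) + (1.0 - weight) * sbert.get(d, 0.0)
            for d in docs}


def ranking(scores: Dict[str, float]) -> List[str]:
    """Document ids sorted by descending score (ties broken by id)."""
    return sorted(scores, key=lambda d: (-scores[d], d))


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse ranked lists: each list contributes 1 / (k + rank) per document."""
    fused: Dict[str, float] = {}
    for ranked in rankings:
        for rank, doc in enumerate(ranked, start=1):
            fused[doc] = fused.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=lambda d: (-fused[d], d))


# Toy retrieval scores for three hypothetical documents.
tfidf_scores = {"d1": 0.9, "d2": 0.2, "d3": 0.4}
sbert_scores = {"d1": 0.3, "d2": 0.8, "d3": 0.7}
bm25_scores = {"d1": 1.0, "d2": 4.0, "d3": 7.5}

dense_sparse = ranking(linear_combination(tfidf_scores, sbert_scores))
candidates = reciprocal_rank_fusion([dense_sparse, ranking(bm25_scores)])
print(candidates)  # ['d3', 'd1', 'd2']
```

Because reciprocal rank fusion works on ranks rather than raw scores, the BM25 scores do not need to be normalized to the same range as the TF-IDF and SBERT scores before fusion.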
