Findings from the published literature are a key driver of decision making in the drug discovery and biotechnology industries. Most researchers turn to PubMed for published literature because it is a free resource for the dissemination of scientific information. Mining biomedical information from PubMed has been a long-standing problem for researchers across the globe. On average, more than 50,000 scientific abstracts are published in PubMed every month, and for a researcher it is highly time consuming to pick the most relevant abstracts every day and annotate them accurately for research purposes. Although many text mining and NLP (Natural Language Processing) engines are used to address this problem, manually annotated data has its own advantages in the accuracy and quality of the data captured. However, a major disadvantage of manual data mining remains its processing time and cost-intensive nature.
The need therefore arises for a system that can accurately process this vast repertoire of scientific information with a quick turnaround time. We have tried to address both of these problems with XTractor. XTractor provides researchers with the most relevant scientific facts, manually annotated (our analysis found them to be 12-35% more accurate than the output of text mining engines) and delivered within the shortest turnaround time. XTractor also comes with advanced analytical features such as Semantic Search, Concept Linking and comprehensive downloadable reports for faster analysis of biomedical data.
Currently, the XTractor knowledgebase (XTractor KB) is growing at a rate of 1000-1200 or more facts extracted from PubMed every day. The knowledgebase presently holds more than 150,000 scientific facts.
The data behind XTractor KB comes from manual annotation carried out by qualified scientists (Masters and PhDs in Life Sciences/Chemistry) at Molecular Connections, India (http://www.molecularconnections.com). Our scientists regularly handpick sentences from abstracts and categorize them into 13 defined categories, such as biomarker studies, knockouts, RNAi studies, mutations, disease mechanisms and pathways. They also annotate the most relevant sentences for proteins, drugs, diseases and biological processes using public repositories and functional ontologies, including Entrez Gene, UniProt, MeSH, Gene Ontology, PubChem and GenBank. XTractor is updated daily, and the latest facts are delivered within 10 days of publication in PubMed.
XTractor also comes with advanced analytical features, which include:
# Summary Search - obtain downloadable reports and graphs
# Semantic Search - ontology-based search for easy refining
# Bibliographic Search - track competition and the hot areas of research
# Concept Linking - associate concepts that were not previously related
# Watch List - track your favorite proteins or drugs as the latest data is added to XTractor
# Network Visualization - integrated with Cytoscape
# Save sessions and retrieve search history
# Downloadable reports in XML and PDF formats
Tracking common gene polymorphisms across multiple diseases using XTractor:
We searched XTractor Premium (http://www.xtractor.in/premium) for Rheumatoid arthritis (RA) in Summary Search. All search panels in XTractor come with an autocomplete feature that lets the researcher locate terms using any of their synonyms. We obtained more than 785 annotated sentences for RA.
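The synonym-aware autocomplete described above can be pictured with a small sketch. This is only an illustration under stated assumptions: the `SYNONYMS` table and the `autocomplete` function are hypothetical names, and the text does not describe how XTractor actually implements the feature.

```python
# Hypothetical synonym table (illustrative only, not XTractor data):
# canonical term -> alternative names a researcher might type.
SYNONYMS = {
    "Rheumatoid arthritis": ["RA", "arthritis, rheumatoid"],
    "Methotrexate": ["MTX", "amethopterin"],
    "MTHFR": ["methylenetetrahydrofolate reductase"],
}


def autocomplete(prefix):
    """Return canonical terms whose name or any synonym starts with the prefix."""
    p = prefix.strip().lower()
    if not p:
        return []
    matches = []
    for canonical, synonyms in SYNONYMS.items():
        candidates = [canonical] + synonyms
        # Case-insensitive prefix match against every known name of the term.
        if any(name.lower().startswith(p) for name in candidates):
            matches.append(canonical)
    return sorted(matches)


# Typing a synonym still resolves to the canonical entity.
suggestions = autocomplete("ameth")
```

In this sketch, typing a prefix of any synonym returns the canonical term, so a search for the abbreviation and a search for the full name would lead to the same entity.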
The Summary Search provides a quick report on all the entities (drugs, diseases, proteins and biological processes) associated with the searched entities. Clicking on any of these entities refines the search results. We were interested in locating the most prominently studied drugs for RA. Eighteen drugs were listed for RA, among which Methotrexate co-occurred with RA most often (Fig. 1).
We clicked on Methotrexate to narrow our search results down to 28 sentences and studied them. The XTractor abstract view also lets the user view the complete annotated abstract (which links out to PubMed) and save the relevant sentences.
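The search, report and refine steps above amount to counting co-occurring entities over annotated sentences and then filtering on one of them. The following is a minimal sketch of that idea, assuming a simple record layout; the toy `SENTENCES` records and the `search` and `summarize` helpers are hypothetical and do not reflect XTractor's actual schema, data or API.

```python
from collections import Counter

# Toy annotated sentences (invented for illustration; not XTractor content).
SENTENCES = [
    {"id": 1, "diseases": ["Rheumatoid arthritis"], "drugs": ["Methotrexate"]},
    {"id": 2, "diseases": ["Rheumatoid arthritis"], "drugs": ["Methotrexate", "Sulfasalazine"]},
    {"id": 3, "diseases": ["Rheumatoid arthritis"], "drugs": ["Infliximab"]},
    {"id": 4, "diseases": ["Breast cancer"], "drugs": ["Methotrexate"]},
]


def search(sentences, field, term):
    """Keep the sentences annotated with `term` in the given field."""
    return [s for s in sentences if term in s[field]]


def summarize(sentences, field):
    """Count how often each entity of `field` occurs in the result set."""
    counts = Counter()
    for s in sentences:
        counts.update(s[field])
    return counts.most_common()


# Step 1: search for the disease.
ra_hits = search(SENTENCES, "diseases", "Rheumatoid arthritis")
# Step 2: report the associated drugs, most frequent first.
drug_report = summarize(ra_hits, "drugs")
# Step 3: "click" on the top drug to refine the result set.
refined = search(ra_hits, "drugs", drug_report[0][0])
```

Refining is simply a second filter applied to the earlier result set, which is why each click narrows the list of sentences rather than starting a new search.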
We noted that the MTHFR polymorphisms C677T and A1298C are associated with toxic responses resulting from impaired Methotrexate clearance in RA patients [1].
Using the Category filter, we were able to retrieve all the studies on MTHFR polymorphisms.
In addition, the entity report in XTractor provides integrated information for each of the proteins, drugs, diseases and biological processes annotated in the sentences, drawn from 30 different public databases such as Wikipedia, USFDA, ClinicalTrials.gov, USPTO, WIPO and many more (Fig. 2).
One of the sentences in the RA analysis stated that MTHFR polymorphism is also associated with certain cancers [2]. This prompted us to extend our analysis to explore the role of MTHFR polymorphism in other disease types.
To explore this possibility we moved to Semantic Search. We added RA, Methotrexate and MTHFR to the query panel.
XTractor comes with an advanced feature called Concept Linking. This feature lets us find all other associations for MTHFR, such as associated diseases, drugs and proteins, without disturbing the flow of the earlier analysis (Fig. 3).
To enable Concept Linking, we selected MTHFR and searched for all other diseases with which MTHFR has been associated. We found 86 disease types in which MTHFR has been studied. On studying these sentences, we found that the MTHFR polymorphism C677T plays an important role in at least 20 major diseases, such as breast cancer, hypertension, colorectal adenoma, schizophrenia, bladder cancer, stroke and many more.
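Concept Linking, as used here, can be thought of as a side lookup: the current query stays in place while one of its entities is followed out to everything else it co-occurs with across the whole knowledgebase. The sketch below illustrates that idea under stated assumptions; the `RECORDS` data and the `run_query` and `link_concept` functions are hypothetical and are not XTractor's implementation.

```python
from collections import defaultdict

# Toy annotated sentences (invented for illustration; not XTractor content).
RECORDS = [
    {"id": 1, "diseases": ["Rheumatoid arthritis"], "drugs": ["Methotrexate"], "proteins": ["MTHFR"]},
    {"id": 2, "diseases": ["Breast cancer"], "drugs": [], "proteins": ["MTHFR"]},
    {"id": 3, "diseases": ["Hypertension"], "drugs": [], "proteins": ["MTHFR"]},
    {"id": 4, "diseases": ["Rheumatoid arthritis"], "drugs": ["Infliximab"], "proteins": ["TNF"]},
]


def run_query(records, required):
    """AND query: keep records that contain every (field, term) pair."""
    return [r for r in records if all(term in r[field] for field, term in required)]


def link_concept(records, field, term, target_field):
    """Map each entity in `target_field` to the ids of sentences where it co-occurs with `term`."""
    linked = defaultdict(list)
    for r in records:
        if term in r[field]:
            for entity in r[target_field]:
                linked[entity].append(r["id"])
    return dict(linked)


# The query panel: RA, Methotrexate and MTHFR together.
query = [
    ("diseases", "Rheumatoid arthritis"),
    ("drugs", "Methotrexate"),
    ("proteins", "MTHFR"),
]
current = run_query(RECORDS, query)

# Concept Linking on MTHFR: look across all records, leaving `current` untouched.
mthfr_diseases = link_concept(RECORDS, "proteins", "MTHFR", "diseases")
```

Because the linking step runs against the full set of records rather than the refined result set, it surfaces diseases outside the original RA query while the earlier results remain available.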
With a quick 15-20 minute analysis in XTractor, we were able to uncover some of the hidden facets of the published literature. Similar analyses can be performed in XTractor for target and drug reusability studies, biomarker-related analysis, comparing effects across protein families, and studying knockout/loss-of-function studies and correlating them with drug effects.
Above all, the platform comes at a very reasonable price with free updates for one complete year, so there is no renewal cost for one year of data.
The XTractor basic version (http://www.xtractor.in) is available free to the scientific community and is used by more than 2000 researchers from around 300 organizations across the globe.



