Project Ideas

ImageInterpret evaluation

ImageInterpret (http://www.imageinterpret.de/) is an image interpretation tool for the life sciences, based on decision trees and case-based reasoning. It provides high-content analysis for the retrieval and recognition of biological objects such as cells and fungi. The creators of the software would like an evaluation of its use in a New York State laboratory, where it interprets hepatitis cell images. This company is an excellent example of how research can create innovative jobs in the community.

Text clustering for scientific literature analysis

Interesting work has been emerging on tracking and meta-analyzing the literature, which becomes increasingly important as scientific research expands. One recently proposed approach takes software developed in bioinformatics for clustering species and uses it to cluster the papers of a scientific research area, in order to learn the major research themes, characterize each theme by a list of keywords, and identify the most influential papers. One such clustering algorithm is Ensemble Non-negative Matrix Factorization, and we propose to use this tool (available from http://mlg.ucd.ie/nmf) to analyze the research in case-based reasoning in the health sciences and to learn its major themes. The project also involves a comparison with the manual clustering and learning of research themes published by Dr. Bichindaritz at ICDM 2012 (http://www.springerlink.com/content/84pt267hu1hm304r/).
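To make the idea concrete, here is a minimal sketch of plain non-negative matrix factorization (Lee and Seung multiplicative updates) applied to a tiny paper-by-term count matrix. It is not the Ensemble NMF tool named above, only the single factorization step that such ensembles build on. The term list, the counts, and all function names are hypothetical and chosen for illustration.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def squared_error(v, w, h):
    """Squared Frobenius distance between V and the product W H."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))

def nmf(v, k, iterations=200, seed=0, eps=1e-9):
    """Factor V (papers x terms) into W (papers x themes) and H (themes x terms)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    # Strictly positive random start so the multiplicative updates can move.
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(k)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)]
             for i in range(n)]
    return w, h

# Hypothetical data: 4 papers, 4 terms, raw term counts.
terms = ["retrieval", "similarity", "diagnosis", "therapy"]
counts = [[2, 1, 0, 0],
          [1, 2, 0, 0],
          [0, 0, 2, 1],
          [0, 0, 1, 2]]

w, h = nmf(counts, k=2)
# Assign each paper to its strongest theme, and pick the top term per theme.
themes = [max(range(2), key=lambda t: row[t]) for row in w]
keywords = [terms[max(range(len(terms)), key=lambda j: h[t][j])] for t in range(2)]
print(themes, keywords)
```

The rows of W give each paper's weight on each theme, and the rows of H give each theme's weight on each term, which is how a theme can be characterized by a list of keywords.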

Tracking the evolution of scientific literature

A related aspect of the previous project is that we would like to track the evolution of ideas in the literature through citation analysis, much as one tracks the evolution of species in nature (see the very interesting Eigenfactor project at http://www.eigenfactor.org/). By developing, or more likely adapting, the tools developed for the Eigenfactor project, which are based on an algorithm similar to Google’s PageRank, this project aims to map the evolution of ideas in the literature on case-based reasoning in the health sciences, and possibly to compare it with the broader domain of case-based reasoning.
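As an illustration of the kind of citation analysis involved, the sketch below runs a basic PageRank power iteration over a small citation graph. It is plain PageRank, not the Eigenfactor algorithm itself, and the paper labels and citation links are hypothetical.

```python
def pagerank(citations, damping=0.85, iterations=50):
    """Rank papers in a citation graph.

    citations maps each citing paper to the list of papers it cites.
    """
    papers = sorted(set(citations) | {c for cited in citations.values() for c in cited})
    n = len(papers)
    rank = {p: 1.0 / n for p in papers}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in papers}
        for p in papers:
            cited = citations.get(p, [])
            if cited:
                # A paper passes its rank on to the papers it cites.
                share = damping * rank[p] / len(cited)
                for c in cited:
                    new_rank[c] += share
            else:
                # A paper that cites nothing spreads its rank evenly.
                share = damping * rank[p] / n
                for q in papers:
                    new_rank[q] += share
        rank = new_rank
    return rank

# Hypothetical citation graph: B, C and D all cite A.
citations = {"B": ["A"], "C": ["A", "B"], "D": ["A", "C"]}
rank = pagerank(citations)
for paper in sorted(rank, key=rank.get, reverse=True):
    print(paper, round(rank[paper], 3))
```

Running the same ranking on citation snapshots from successive years would be one simple way to watch influential papers, and the ideas they carry, rise and fade over time.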

Harvesting scientific literature

Computing disciplines are becoming more and more diffused into application domains such as the health sciences. Finding pertinent literature is therefore increasingly difficult, because few tools support interdisciplinary scientific literature search. This project would design a system that searches for literature across domains by means of ontologies and semantic mapping. It would be applied to the search for scientific literature in a domain entitled “case-based reasoning in the health sciences”, but would be generic and applicable to other domains of computer and information science and their respective literature in application domains. A previous system developed by a graduate student would be made available as a starting point.
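One simple reading of "search by use of ontologies and semantic mapping" is query expansion: each query term is mapped to the related terms that another discipline uses for the same concept. The sketch below assumes a toy ontology stored as a dictionary; the mapping, the documents, and the function names are hypothetical and do not describe the previous system mentioned above.

```python
def expand_query(terms, ontology):
    """Add to each query term the related terms the ontology maps it to."""
    expanded = set()
    for term in terms:
        key = term.lower()
        expanded.add(key)
        expanded.update(ontology.get(key, ()))
    return expanded

def search(documents, terms, ontology):
    """Return (title, matched terms) for documents containing any expanded term."""
    query = expand_query(terms, ontology)
    hits = []
    for title, text in documents:
        lowered = text.lower()
        matched = sorted(t for t in query if t in lowered)
        if matched:
            hits.append((title, matched))
    return hits

# Hypothetical ontology fragment linking a computing term to a clinical phrasing.
ontology = {"case-based reasoning": {"similar patient retrieval"}}
documents = [
    ("paper 1", "We apply case-based reasoning to triage."),
    ("paper 2", "Diagnosis by similar patient retrieval in oncology."),
    ("paper 3", "A compiler optimization study."),
]
print(search(documents, ["Case-Based Reasoning"], ontology))
```

A plain keyword search would miss the second paper; the ontology lookup is what bridges the two vocabularies.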

Learning research questions in the scientific literature

This project proposes to mine articles for the research questions and hypotheses they address. Given a dataset of articles in a specific domain (case-based reasoning in the health sciences), the task is to design an algorithm that extracts the research questions from each article. The algorithm would analyze the summary and define a set of templates to serve as a basis for determining the research question or hypothesis. Questions to address include: where the research question is stated, through which templates it is expressed, how to represent it in a simplified manner (very likely as a triple <concept, relationship, concept>), and how to use available ontologies to facilitate the search. More importantly, the system would aim to cooperate with a human user, yielding a robust semi-automatic system for learning research questions and hypotheses. Text analysis tools, ontologies, and a training dataset (with human-provided solutions) will be made available for this project.
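The template idea can be sketched with regular expressions that capture a <concept, relationship, concept> triple. The two templates and the verb list below are hypothetical examples, not the templates the project would end up defining, and a real system would need far more of them plus the human review step described above.

```python
import re

# Hypothetical templates; each names the three parts of the triple.
TEMPLATES = [
    re.compile(
        r"\bwhether (?P<subject>.+?) "
        r"(?P<relation>improves|reduces|predicts|affects) "
        r"(?P<object>[^.?]+)",
        re.IGNORECASE,
    ),
    re.compile(
        r"\bdoes (?P<subject>.+?) "
        r"(?P<relation>improve|reduce|predict|affect) "
        r"(?P<object>[^.?]+)",
        re.IGNORECASE,
    ),
]

def extract_triples(text):
    """Return every (concept, relationship, concept) triple matched by a template."""
    triples = []
    for template in TEMPLATES:
        for match in template.finditer(text):
            triples.append((
                match.group("subject").strip(),
                match.group("relation").strip(),
                match.group("object").strip(),
            ))
    return triples

summary = "We investigate whether case retrieval improves diagnostic accuracy."
print(extract_triples(summary))
```

Sentences that match no template return an empty list, which is where a semi-automatic system would hand the article to a human user.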

Learning research findings in the scientific literature

This project proposes to mine articles for the research findings they present. It will build on work performed by a former graduate student, whose results were very encouraging, analyze the remaining issues, and attempt to resolve them. A test dataset with human-determined research findings will be made available. In addition to the code developed by the former graduate student, natural language interpretation tools and ontologies will be made available to the student. This project complements the project entitled “Learning research questions in the scientific literature”.

Bioinformatics research

This project proposes to write a literature survey on bioinformatics research related to classification, prediction, and survival analysis. The paper will study which algorithms were used for these tasks, which datasets were used, whether these datasets are available or can be made available, and whether the algorithms are available as well. The paper will then propose a plan to evaluate a novel bioinformatics algorithm at a larger scale than in the previous literature, namely on all the datasets previously gathered and analyzed. The project will also analyze how stringent the evaluation was in these articles and propose guidelines for evaluating such research. The project may also carry out such an analysis to validate its evaluation approach; in particular, we are interested in how cross-validation, leave-one-out cross-validation, and independent training and test sets compare in terms of the stringency of the evaluation.
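The three evaluation protocols differ only in how the samples are split into training and test sets, which the following sketch makes explicit. It scores a one-nearest-neighbor classifier on a toy one-dimensional dataset; the data, the classifier choice, and the function names are assumptions made for illustration.

```python
import random

def k_fold_splits(n_samples, k):
    """Return k (train, test) index pairs; every sample is tested exactly once."""
    indices = list(range(n_samples))
    folds = [indices[i::k] for i in range(k)]
    return [([i for i in indices if i not in fold], fold) for fold in folds]

def leave_one_out_splits(n_samples):
    """Leave-one-out is k-fold cross-validation with k equal to the sample count."""
    return k_fold_splits(n_samples, n_samples)

def holdout_split(n_samples, test_fraction=0.3, seed=0):
    """Return a single independent (train, test) split."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = max(1, int(round(n_samples * test_fraction)))
    return [(indices[n_test:], indices[:n_test])]

def nearest_neighbor_predict(train_x, train_y, x):
    """Predict the label of the closest training point."""
    best = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[best]

def accuracy(splits, xs, ys):
    """Fraction of test samples classified correctly, pooled over all splits."""
    correct = total = 0
    for train, test in splits:
        train_x = [xs[i] for i in train]
        train_y = [ys[i] for i in train]
        for i in test:
            correct += nearest_neighbor_predict(train_x, train_y, xs[i]) == ys[i]
            total += 1
    return correct / total

# Hypothetical toy dataset with two well-separated classes.
xs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

print("4-fold:", accuracy(k_fold_splits(len(xs), 4), xs, ys))
print("leave-one-out:", accuracy(leave_one_out_splits(len(xs)), xs, ys))
print("holdout:", accuracy(holdout_split(len(xs)), xs, ys))
```

On real bioinformatics datasets the three estimates would generally differ, and comparing them across many datasets is one way to quantify how stringent each protocol is.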

Incremental approach to support-vector machine

This project proposes to study the feasibility of designing an incremental version of the support vector machine (SVM) algorithm. SVM is a very efficient machine learning algorithm used, for example, for classification, in particular in bioinformatics. The project will study this algorithm and try to devise an incremental version, which will be added as a possible strategy in a nearest-neighbor algorithm. The code for the nearest-neighbor and SVM algorithms will be provided, so little coding is expected. The project will also evaluate the incremental algorithm in comparison with either SVM or nearest-neighbor algorithms on bioinformatics datasets.
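To show what "incremental" means here, the sketch below trains a linear SVM one example at a time by stochastic subgradient descent on the regularized hinge loss. This is one well-known way to update an SVM online, used as a stand-in; it is not the incremental algorithm the project would design, and the class name, hyperparameters, and data are hypothetical.

```python
class IncrementalLinearSVM:
    """Linear SVM updated one example at a time (hinge loss, L2 penalty)."""

    def __init__(self, n_features, learning_rate=0.1, regularization=0.01):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate
        self.regularization = regularization

    def decision(self, x):
        return sum(w * xi for w, xi in zip(self.weights, x)) + self.bias

    def partial_fit(self, x, label):
        """Update the model with one example; label must be -1 or +1."""
        margin = label * self.decision(x)
        # The L2 penalty shrinks the weights at every step.
        shrink = 1.0 - self.learning_rate * self.regularization
        self.weights = [w * shrink for w in self.weights]
        if margin < 1.0:
            # The example violates the margin: move the boundary toward it.
            self.weights = [w + self.learning_rate * label * xi
                            for w, xi in zip(self.weights, x)]
            self.bias += self.learning_rate * label

    def predict(self, x):
        return 1 if self.decision(x) >= 0 else -1

# Hypothetical stream of labeled examples arriving one at a time.
stream = [((2, 2), 1), ((-2, -2), -1), ((3, 2), 1),
          ((-3, -2), -1), ((2, 3), 1), ((-2, -3), -1)]

model = IncrementalLinearSVM(n_features=2)
for x, label in stream:
    model.partial_fit(x, label)
print(model.weights, model.bias)
```

Because each update touches only the current example, the model never needs the full training set in memory, which is the property that makes it attractive as a strategy inside a case-based or nearest-neighbor system that keeps acquiring cases.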

Translational research

This project proposes to study recently proposed alternatives to randomized clinical trials to assess the effectiveness of biomedical research. One such promising method is comparative effectiveness. The project will reflect on how this method can be applied to the use of technology in clinical settings, and propose a plan to evaluate a decision-support system in a clinical environment in the context of an electronic medical record.

Translational research (II)

This project is based on the work of Swanson et al. (Swanson, Don (1988). "Migraine and Magnesium: Eleven Neglected Connections", Perspectives in Biology and Medicine 31 (4): 526–557), who designed methods to derive new research questions from literature analysis across domains. This pioneering work has made it possible to forecast important medical discoveries and to propose new ideas to investigate in medicine. The project proposes to design a similar system that links research in bioinformatics with research in medicine by finding genetic or metabolic markers of diseases. For example, if a certain gene is known in basic science to be associated with a particular chemical, we could infer that the same gene may play a role in a disease that is also linked to the presence of that chemical. Synonyms between terms could be resolved using knowledge derived from, for example, the Unified Medical Language System and the Gene Ontology. A simple website could be designed to query associations across domains through a choice of keywords.
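The gene, chemical, disease inference described above can be sketched as a join on the shared chemical after synonym resolution. In the sketch, the synonym table is a small dictionary standing in for a real terminology resource, and the gene names and associations are placeholders, not actual biological findings.

```python
def normalize(term, synonyms):
    """Map a term to its preferred form using a synonym table."""
    key = term.lower()
    return synonyms.get(key, key)

def infer_links(gene_chemical, disease_chemical, synonyms):
    """Propose (gene, chemical, disease) hypotheses that share a chemical."""
    chemical_to_genes = {}
    for gene, chemical in gene_chemical:
        chemical_to_genes.setdefault(normalize(chemical, synonyms), set()).add(gene)
    hypotheses = []
    for disease, chemical in disease_chemical:
        key = normalize(chemical, synonyms)
        for gene in sorted(chemical_to_genes.get(key, ())):
            hypotheses.append((gene, key, disease))
    return hypotheses

# Placeholder associations: one list per literature, never linked directly.
gene_chemical = [("GENE_A", "Magnesium"), ("GENE_B", "serotonin")]
disease_chemical = [("migraine", "Mg"), ("DISEASE_X", "dopamine")]
synonyms = {"mg": "magnesium"}  # hypothetical stand-in for a synonym lookup

print(infer_links(gene_chemical, disease_chemical, synonyms))
```

Each returned triple is only a candidate hypothesis for a researcher to examine; a keyword-driven website could expose the same lookup as a query form.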

