Research

Research areas

I am a Senior Lecturer in the Bioinformatics and Health Informatics Group of the Department of Computer Science, Aberystwyth University. My interests include data science, statistics, machine learning, genome and metagenome analysis, and microbial bioinformatics. I am interested in sequence analysis for genomics, time series and text processing, and in particular in what we can do with computers (algorithms, statistics, data structures and artificial intelligence) to help with this analysis.

PhD students

PhD or MPhil students I've supervised or co-supervised, past and present:

Current: Lily Major, Deborah Oladele, Jack Book
Completed: James Ravenscroft, Nick Dimonaco, Sam Nicholls, Emmanuel Isibor, Elizabeth Donkin, Michael Riley

Publications and preprints

Dimonaco, N., Clare, A. and Vickers, M. (2025) Genome Assemblies and Annotations Are Not Static and Need Support for Tracking Their Evolution. preprint
Oladele, D.B., Swain, M., Robinson, G., Clare, A., and Chalmers, R.M. (2025) A review of recent Cryptosporidium hominis and Cryptosporidium parvum gp60 subtypes, Current Research in Parasitology & Vector-Borne Diseases, 8, 100292
Major, L., Clare, A., Daykin, J. W., Mora, B., Zarges, C. (2025) Heuristics for the Run-length Encoded Burrows-Wheeler Transform Alphabet Ordering Problem. Journal of Heuristics 31 article 11 and arXiv preprint
Clare, A., Aubrey, W., Surette, M.G., and Dimonaco, N. (2024) Predicting coding regions on unassembled reads, how hard can it be? Poster at Genome Informatics 2024
Clare, A., Aubrey, W., Surette, M. and Dimonaco, N. (2024) Partial gene predictions on unassembled reads: evaluating the Good, the Bad and the slightly ORF. Poster at RECOMB-Seq
Major, L., Clare, A., Daykin, J. W., Mora, B., Zarges, C. (2024) A visualization tool to explore alphabet orderings for the Burrows-Wheeler Transform. arXiv preprint
Dimonaco, N. J., Clare, A., Kenobi, K., Aubrey, W. and Creevey, C. (2023) StORF-Reporter: Finding genes between genes. Nucleic Acids Research 51(21) p11504-11517, or bioRxiv preprint
Dimonaco, N. J., Aubrey, W., Kenobi, K., Clare, A. and Creevey, C. (2022) No one tool to rule them all: Prokaryotic gene prediction tool performance is highly dependent on the organism of study. Bioinformatics 38(5):1198-1207, or bioRxiv preprint
Ravenscroft, J., Cattan, A., Clare, A., Dagan, I., Liakata, M. (2021) CD²CR: Co-reference Resolution Across Documents and Domains. EACL 2021. (also see pre-submission arXiv preprint)
Nicholls, S. M., Aubrey, W., de Grave, K., Schietgat, L., Creevey, C. and Clare, A. (2021) On the complexity of haplotyping a microbial community. Bioinformatics 37(10):1360-1366, preprint
Ravenscroft, J., Clare, A. and Liakata, M. (preprint, 2020) Measuring prominence of scientific work in online news as a proxy for impact. arXiv preprint
Major, L., Clare, A., Daykin, J. W., Pena Gamboa, L., Mora, B., Zarges, C. (2020) Evaluation of a Permutation-Based Evolutionary Framework for Lyndon Factorizations. PPSN 2020. Accepted version of the paper.
Jozwik, A., Aubrey, W., Clare, A., Smallbone, W., Martin, K. (2019) Intelligent Decision Making Support for Water Quality Monitoring. Presentation OR61A156 at OR 61.
Nicholls, S. M., Aubrey, W., Edwards, A., de Grave, K., Schietgat, L., Huws, S., Soares, A., Creevey, C. and Clare, A. (preprint, 2019) Recovery of gene haplotypes from a metagenome. bioRxiv preprint.
Clare, A., Daykin, J. W., Mills, T. and Zarges, C. (2019) Evolutionary search techniques for the Lyndon factorization of biosequences. Workshop on Evolutionary Computation for Permutation Problems, GECCO 2019. Accepted version of the paper.
Clare, A. and Daykin, J. W. (2019) Enhanced string factoring from alphabet orderings. Information Processing Letters 143:4-7. Also arXiv preprint. See also the poster for Genome Science 2018.
Ravenscroft, J., Clare, A. and Liakata, M. (2018) HarriGT: A Tool for Linking News to Science. Proceedings of ACL 2018, System Demonstrations P18-4004, p19-24. Try out HarriGT.
Garland, O., Clare, A. and Aubrey, W. (preprint, 2018) GiraFFe Browse: A lightweight web based tool for inspecting GFF and FASTA data. bioRxiv preprint.
Nicholls, S. M., Aubrey, W., Edwards, A., de Grave, K., Schietgat, L., Huws, S., Soares, A., Creevey, C. and Clare, A. (preprint, 2018) Computational haplotype recovery and long-read validation identifies novel isoforms of industrially relevant enzymes from natural microbial communities. bioRxiv preprint.
Ravenscroft, J., Liakata, M., Clare, A. and Duma, D. (2017) Measuring Scientific Impact Beyond Academia: An assessment of existing impact metrics and proposed improvements. PLoS One doi:10.1371/journal.pone.0173152, blog post about measuring scientific impact, download the data
Donkin, E., Dennis, P., Ustalkov, A., Warren, J. and Clare, A. (2017) Replicating complex agent based models, a formidable task. Environmental Modelling and Software, 92:142-151
Veneman, J.B., Saetnan, E., Clare, A., Newbold, C. (2016). MitiGate; an online meta-analysis database for quantification of mitigation strategies for enteric methane emissions. Science of the Total Environment 572 pp. 1166-1174 doi: 10.1016/j.scitotenv.2016.08.029
Nicholls, S. M., Clare, A. and Randall, J. C. (2016) Goldilocks: a tool for identifying genomic regions that are 'just right'. Bioinformatics 32 (13): 2047-2049, doi: 10.1093/bioinformatics/btw116, blog post about Goldilocks.
Duma, D., Liakata, M., Clare, A., Ravenscroft, J., Klein, E. (2016) Rhetorical Classification of Anchor Text for Citation Recommendation. WOSP Workshop (5th Intl Workshop on Mining Scientific Publications). Full text of article in D-Lib Magazine.
Aubrey, W., Riley, M. C., Young, M., King, R. D., Oliver, S. G. and Clare, A. (2015) A Tool for Multiple Targeted Genome Deletions that Is Precise, Scar-Free, and Suitable for Automation. PLOS One 10(12): e0142494 doi: 10.1371/journal.pone.0142494, blog post about seamless gene deletion.
Sapstead, S., Daniel, I. and Clare, A. (2015) Automatically Geotagging Articles in the Welsh Newspapers Online Collection. In proceedings of AI 2015. doi: 10.1007/978-3-319-25032-8_28
Runciman, C., Clare, A. and Harkness, R. (2014) Laboratory automation in a functional programming language. Journal of Laboratory Automation 2014 Dec; 19(6):569-76. doi: 10.1177/2211068214543373. Blog post describing this article, github code and preprint pdf.
Riley, M. C., Aubrey, W., Young, M. and Clare, A. (2013) PD5: a general purpose library for primer design software. PLoS One, DOI: 10.1371/journal.pone.0080156. Get the code at the PD5 web site.
Ravenscroft, J., Liakata, M. and Clare, A. (2013) Partridge: An effective system for the automatic classification of the types of academic papers. In proceedings of AI 2013, Dec 2013. Try out the Partridge system!
Sparkes, A. and Clare, A. (2012) AutoLabDB: a substantial open source database schema to support a high-throughput automated laboratory. Bioinformatics 28(10) 1390-1397. doi: 10.1093/bioinformatics/bts140 (abstract, pdf).
Clare, A., Croset, A., Grabmueller, C., Kafkas, S., Liakata, M., Oellrich, A., Rebholz-Schuhmann, D. (2011) Exploring the Generation and Integration of Publishable Scientific Facts Using the Concept of Nano-publications. 1st International Workshop on Semantic Publication (SePublica 2011). pdf.
Alsberg, B. and Clare, A. (2010) Wiki based management of chemometric research projects. Journal of Chemometrics 24(7-8) p408-417
Sparkes, A., Aubrey, W., Byrne, E., Clare, A., Khan, M. N., Liakata, M., Markham, M., Rowland, J., Soldatova, L. N., Whelan, K. E., Young, M. and King, R. D. (2010) Towards Robot Scientists for autonomous scientific discovery. Automated Experimentation 2010, 2:1 doi:10.1186/1759-4499-2-1
Sparkes, A., King, R. D., Aubrey, W., Benway, M., Byrne, E., Clare, A., Liakata, M., Markham, M., Whelan, K. E., Young, M., Rowland, J. (2010) An Integrated Laboratory Robotic System for Autonomous Discovery of Gene Function. JALA 15(1) pages 33-40.
King, R. D., Rowland, J., Aubrey, W., Liakata, M., Markham, M., Soldatova, L. N., Whelan, K. E., Clare, A., Young, M., Sparkes, A., Oliver, S. G., Pir, P. (2009) The Robot Scientist Adam, IEEE Computer, vol. 42, no. 8, pp. 46-54, August, doi:10.1109/MC.2009.270
King, R. D., Rowland, J., Oliver, S. G., Young, M., Aubrey, W., Byrne, E., Liakata, M., Markham, M., Pir, P., Soldatova, L. N., Sparkes, A., Whelan, K. E., Clare, A. (2009) The Automation of Science. Science 324(5923):85-89, 3rd April 2009. (preprint pdf, before final corrections)
Soldatova, L., Aubrey, W., King, R. D. and Clare, A. (2008) The EXACT description of biomedical protocols. Bioinformatics 2008 24: i295-i303. Special issue for ISMB 2008. See also EXACT webpage.
Riley, M.C., Clare, A. and King, R. D. (2007) Locational distribution of gene functional classes in Arabidopsis thaliana. BMC Bioinformatics 8:112
Blockeel, H., Schietgat, L., Struyf, J., Dzeroski, S., Clare, A. (2006) Decision Trees for Hierarchical Multilabel Classification: A Case Study in Functional Genomics. In proceedings of PKDD 2006.
Soldatova, L., Clare, A., Sparkes, A. and King, R. D. (2006) An ontology for a robot scientist. Bioinformatics 2006 22: 464-471. Also in ISMB 2006. Archived in CADAIR here.
Clare, A., Karwath, A., Ougham, H. and King, R. D. (2006) Functional Bioinformatics for Arabidopsis thaliana. Bioinformatics 2006 22: 1130-1136
Struyf, J., Dzeroski, S., Blockeel, H. and Clare, A. (2005) Hierarchical Multi-classification with Predictive Clustering Trees in Functional Genomics. In proceedings of the EPIA 2005 CMB Workshop. Springer link
Clare, A. (2005) Integration of genomic and phenotypic data. Data Analysis and Visualization in Genomics and Proteomics, Eds. Francisco Azuaje and Joaquin Dopazo, Wiley, London. ISBN: 0-470-09439-7
Clare, A., Williams, H. E. and Lester, N. (2004) Scalable multi-relational association mining. In proceedings of the 4th IEEE International Conference on Data Mining (ICDM '04). p355-358. abstract, software
King, R. D., Wise, P. H. and Clare, A. (2004) Confirmation of Data Mining Based Predictions of Protein Function. Bioinformatics 20(7) 1110-1118, abstract, genepredictions.org
Clare, A. and King, R. D. (2003) Predicting gene function in Saccharomyces cerevisiae. ECCB 2003 (published as a journal supplement in Bioinformatics 19: ii42-ii49), abstract
Clare, A. (2003) Machine learning and data mining for yeast functional genomics. PhD thesis. University of Wales Aberystwyth. pdf (1Mb) This was a runner-up in the 2004 BCS Distinguished Dissertations Award.
Clare, A. and King, R.D. (2003) Data mining the yeast genome in a lazy functional language. In Practical Aspects of Declarative Languages (PADL'03) (won Best/Most Practical Paper award), abstract, pdf
Clare, A. and King, R.D. (2002) How well do we understand the clusters found in microarray data? In Silico Biol. 2, 0046, abstract, html, further data
Clare, A. and King, R.D. (2002) Machine learning of functional class from phenotype data. Bioinformatics 18(1) 160-166. abstract, pdf, further data
Clare, A. and King, R.D. (2001) Knowledge Discovery in Multi-Label Phenotype Data. In proceedings of ECML/PKDD 2001. abstract, pdf, further data, code
King, R.D., Karwath, A., Clare, A., & Dehaspe, L. (2001) The Utility of Different Representations of Protein Sequence for Predicting Functional Class. Bioinformatics 17(5) 445-454. abstract, pdf, further data
King, R.D., Karwath, A., Clare, A., & Dehaspe, L. (2000) Accurate prediction of protein functional class in the M. tuberculosis and E. coli genomes using data mining. Comparative and Functional Genomics 17 283-293 (nb: volume 1 of CFG was volume 17 of Yeast). actual article, preprint postscript, further data
King, R.D., Karwath, A., Clare, A., & Dehaspe, L. (2000) Genome scale prediction of protein functional class from sequence using data mining. In: The Sixth International Conference on Knowledge Discovery and Data Mining (KDD 2000). pdf, further data
Rose, T., Elworthy, D., Kotcheff, A., Clare, A., Tsonis, P. (2000) ANVIL: a system for the retrieval of captioned images using NLP techniques. In Challenge of Image Retrieval, Brighton, 2000. gzipped doc

Previous work

I held an RAEng/EPSRC Industrial Fellowship with Dŵr Cymru Welsh Water (2021-2022) on Advanced statistical process control for water treatment. This project investigated the automated detection of anomalous sensor readings in drinking water treatment, for early detection and risk management.

I held an RAEng/EPSRC Research Fellowship to "Engineer the Intelligent Scientific Laboratory" (2006-2011). This project involved work on the Robot Scientist project, where intelligent software created scientific hypotheses, designed experiments to distinguish between these hypotheses, controlled a lab robot to conduct these experiments, and then used the results to design the next round of experiments. There were many aspects to the work on this project, including data formalism, experimental protocols, data collection, inference and querying, planning and scheduling, and the practicalities of working in a real lab with real automation equipment.

Before this I held an 1851 Research Fellowship to investigate Grid-enabling lab robots for the Robot Scientist. This was a two-year project, from Oct 2004 to Sep 2006.

Previously, as a postdoc on a BBSRC-funded grant, and as a PhD student, I've used machine learning (including ILP) and data mining (particularly multi-relational associations) for functional genomics: elucidating the biological functions of the parts of a genome. When a genome is sequenced, and we have the predicted locations of the genes within the genome, the next stage is to work out the possible functions of these genes. We've been looking at genes in Saccharomyces cerevisiae (yeast) and Arabidopsis thaliana, the first plant genome to be sequenced. Detailed results for yeast and Arabidopsis are available. This has involved looking at ways to make use of different kinds of data, including microarray data, sequence statistics, homology data, predicted secondary structure, QTLs and phenotypic data. It has also involved finding ways to make use of background information and hierarchical information, and to take into account that proteins can have more than one function, which gives a classification problem where each item fits into more than one class.

I've also spent 3 months working with RMIT's Search Engine Group making a multi-relational data mining tool, Radar, based on inverted indexing.

Back to Amanda Clare