A successful PhD project starts with finding the right combination of supervisor and topic. If you intend to apply for a PhD programme, you are welcome and encouraged to formulate your own research proposal.
Nevertheless, this page suggests some research topics which may help you to quickly identify a PhD project of interest under my supervision.
If you wish to discuss a particular topic (included or not in the following list), please send an email to "G.DiFatta at reading ac uk".
List of research topics:
Big Data Analytics and Mining
Big Data refers to very large and complex data sets that are difficult to process using
traditional, sequential data processing applications. Data-intensive, parallel and distributed approaches
are typically employed, such as the MapReduce programming paradigm (Apache Hadoop). However, one of the most
difficult and interesting challenges is not the size of the data, but rather the insight and the
impact that its analysis can generate. From this perspective, providing effective and efficient
algorithms and tools for Big Data Analytics and Mining is a fundamental aspect. The potential of Big Data
lies in our ability to provide solutions to business problems, to create new business opportunities and to
facilitate data-driven discovery in Science. The project will investigate and test distributed formulations
of data mining algorithms that are suitable for the MapReduce paradigm and for other distributed computing approaches.
Keywords: Big Data, Data Analytics and Mining, Parallel and Distributed Computing
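As an illustration, the canonical MapReduce example, word count, can be sketched in plain Python by simulating the map, shuffle and reduce phases locally (no Hadoop cluster is assumed; a real deployment would distribute each phase across nodes):

```python
from collections import defaultdict

def map_phase(document):
    # Mapper: emit a (word, 1) pair for every word in the document.
    for word in document.lower().split():
        yield (word, 1)

def shuffle(mapped_pairs):
    # Shuffle: group all emitted values by key, as the framework would.
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reducer: sum the counts for each word.
    return {word: sum(counts) for word, counts in groups.items()}

documents = ["big data analytics", "data mining on big data"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = reduce_phase(shuffle(pairs))
print(counts["data"])  # 3
```

Because the mapper and reducer are pure functions over key-value pairs, the same logic parallelises naturally when the framework partitions the documents across workers.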
Data Integration, Processing, Analysis, Exploration and Visualisation
Open-source, user-friendly data mining workflow management environments are increasingly adopted as
platforms for data integration, processing, analysis, exploration and visualisation. The project will
contribute to widening a repository of algorithms and to enabling their composition in an intuitive way.
Such an environment can be extended and customised by means of the flexible meta-programming paradigm of Eclipse plug-ins.
Keywords: Data Mining, Knowledge Discovery in Databases, Intelligent Data Analysis
Frequent Pattern Mining
The identification of regular patterns in large sets of data can be formulated as
Association Rule Mining, Frequent Itemset Mining or Frequent Subgraph Mining, according to the particular
application domain and problem. These formulations share a combinatorial complexity and can be solved with
analogous algorithmic approaches. When patterns are naturally classified into two categories, one important
application is the identification of the features that discriminate one class from the other.
Highly scalable algorithms for the "Discriminative" Subgraph Mining problem can be applied, for example,
to the identification of candidates in the drug discovery process.
Keywords: Data Mining, Subgraph Mining, Knowledge Discovery in Life Science Repositories
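As a minimal sketch of the shared level-wise algorithmic approach, an Apriori-style frequent itemset miner can be written in a few lines of Python (the candidate-pruning step of the full Apriori algorithm is omitted for brevity):

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search for itemsets occurring in
    at least min_support transactions."""
    items = {item for t in transactions for item in t}
    frequent = {}
    k = 1
    candidates = [frozenset([i]) for i in items]
    while candidates:
        # Count the support of each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Generate (k+1)-candidates by joining frequent k-itemsets.
        k += 1
        keys = list(level)
        candidates = list({a | b for a, b in combinations(keys, 2) if len(a | b) == k})
    return frequent

transactions = [frozenset(t) for t in
                [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]]
result = frequent_itemsets(transactions, min_support=2)
```

The exponential size of the candidate space is exactly what makes scalable, distributed formulations of this search interesting.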
High Performance and Scalable Clustering
Clustering is a classical unsupervised machine learning problem: the identification of
groups of similar objects within a set. One of the most popular and influential algorithms in Data Mining is
k-Means. To date, the most efficient implementations of k-Means have been based on multi-dimensional trees (KD-Trees).
BSP-kMeans is an even more efficient and scalable k-Means variant, which can be applied to very large data
sets (millions of patterns) with high numbers of features and clusters.
Keywords: Data Mining, Clustering, Scalable Algorithms
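For reference, the basic Lloyd iteration of k-Means, the baseline that KD-Tree-based and BSP variants accelerate, can be sketched as follows (toy data, pure Python):

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Basic Lloyd iteration for k-Means (not the KD-Tree or BSP variants)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])))
            clusters[j].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for i, cluster in enumerate(clusters):
            if cluster:
                centroids[i] = tuple(sum(c) / len(cluster) for c in zip(*cluster))
    return centroids

points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9)]
centroids = kmeans(points, k=2)
```

The assignment step costs O(nk) distance computations per iteration; tree-based variants reduce this cost by pruning whole regions of points at once.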
Data Mining and Visualisation in High Dimensional Spaces
The mining and the visualisation of data in high-dimensional feature spaces require the
design of efficient algorithms. In high-dimensional data spaces, distance functions lose their discriminating
power, and optimisation techniques, Bayesian statistics, machine learning and data mining algorithms become
inefficient and ineffective. This problem is referred to as 'the curse of dimensionality' and is caused
by the exponential increase in volume associated with adding extra dimensions to a mathematical space.
In general, dimensionality reduction is a fundamental methodology for the success of the knowledge discovery
process in many real-world applications.
Keywords: Data Mining, Data Visualisation, Dimensionality Reduction
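As a minimal example of dimensionality reduction, Principal Component Analysis can be computed via the SVD of the centred data matrix (NumPy, with synthetic data for illustration):

```python
import numpy as np

def pca(X, n_components):
    """Project the rows of X onto the top principal directions,
    computed from the SVD of the centred data matrix."""
    Xc = X - X.mean(axis=0)             # centre each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T     # coordinates in the reduced space

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))          # 100 patterns in a 50-dimensional space
Z = pca(X, n_components=2)              # 2-D embedding, e.g. for visualisation
```

Linear projections such as PCA are only the simplest case; non-linear methods are often needed when the curse of dimensionality distorts pairwise distances.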
Epidemic Protocols for Fault-tolerant Extreme-scale Computing
Epidemic or Gossip-based protocols adopt a bio-inspired communication strategy based on
the same mathematical model as the exponential and uncontrollable spread of infectious diseases.
Epidemic protocols are suitable for large and extreme-scale, distributed and dynamic systems. They can be
adopted to disseminate information (broadcasting) in a large-scale distributed environment using randomised
communication. Their advantages over global communication schemes based on deterministic overlay networks
are their inherent robustness and scalability. Epidemic protocols can also be adopted to solve the data
aggregation problem in a fully decentralised manner. The project will focus on epidemic protocols and on
practical extreme-scale applications which can be built on them.
Keywords: Epidemic Protocols, Gossip-based Protocols, Extreme-scale Computing
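As a small illustration of decentralised data aggregation, the Push-Sum gossip protocol (Kempe, Dobra and Gehrke, 2003) can be simulated in a few lines of Python; every node converges to the global average using only randomised pairwise exchanges, with no central coordinator:

```python
import random

def push_sum(values, rounds=50, seed=0):
    """Push-Sum gossip: each node holds a (sum, weight) pair and, in every
    round, sends half of it to a randomly chosen peer. The ratio sum/weight
    at every node converges to the global average."""
    rng = random.Random(seed)
    n = len(values)
    s = list(values)        # running sums, one per node
    w = [1.0] * n           # running weights, one per node
    for _ in range(rounds):
        for i in range(n):
            j = rng.randrange(n)        # pick a random peer (self-sends are no-ops)
            s[i] *= 0.5; w[i] *= 0.5    # keep half locally...
            s[j] += s[i]; w[j] += w[i]  # ...and push the other half to the peer
    return [si / wi for si, wi in zip(s, w)]  # every node's estimate of the mean

estimates = push_sum([10.0, 20.0, 30.0, 40.0])
```

Mass conservation (the totals of `s` and `w` never change) is what guarantees the estimates converge to the true average despite the fully randomised communication.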
Large-scale Distributed Data Mining
Emerging challenges in ubiquitous networks and computing include the ability to
extract useful information from a vast amount of data which are intrinsically distributed.
Research on Distributed Data Mining (DDM) has focused on the formulation of data mining algorithms
for distributed computing environments, where each node processes its local data and contributes to
compute a global solution. In many applications the solution is required to be available at every node.
This is particularly important when considering applications in networked systems where each node is
autonomous and active, such as peer-to-peer systems, mobile ad hoc networks, vehicular ad hoc networks,
mobile social networks and wireless sensor networks. It is also desirable that the solutions at
different nodes are identical or within a bounded approximation error.
Keywords: Data Mining, Parallel and Distributed Computing
Bayesian Inference for modelling human decision making
Systems Engineering often involves computer modelling the behaviour of proposed systems
and their components. Where a component is human, fallibility can be modelled by a stochastic agent.
Bayesian inference can be applied to a set of past decisions to learn a model of decision-making over
quantifiable options. The model makes it possible to assess and predict skilled behaviour, such as human
expertise in problem solving and decision making. Typical application domains include: student performance
monitoring and assessment (intelligent tutoring systems and adaptive learning platforms), human operator
training, such as in laparoscopic (minimally invasive) surgery and air traffic control, and sports and games
(e.g. rating Chess players).
Keywords: Statistical inference, Bayes' rule, decision makers models
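As a toy illustration of the approach, with a conjugate Beta prior the probability that a decision maker chooses the better option can be updated in closed form from a record of past decisions (the counts below are hypothetical):

```python
def posterior_skill(successes, failures, alpha=1.0, beta=1.0):
    """Conjugate Beta-Binomial update: returns the posterior mean of the
    probability that a decision maker picks the better of two options,
    given counts of past correct and incorrect decisions."""
    a = alpha + successes   # prior pseudo-counts plus observed correct decisions
    b = beta + failures     # prior pseudo-counts plus observed errors
    return a / (a + b)      # posterior mean of the skill parameter

# Hypothetical record: 8 correct decisions out of 10, uniform Beta(1,1) prior.
skill = posterior_skill(successes=8, failures=2)
print(round(skill, 3))  # 0.75
```

Richer models of decision-making over many quantifiable options generalise this idea, but the same Bayesian update, prior plus evidence yielding a posterior, is at their core.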
Intelligent Data Analysis in Bioinformatics
(Quality Assessment of Protein Structure Models)
One of the most important research goals in bioinformatics is the prediction of the three-dimensional structure of a protein, the so-called tertiary structure, from its amino acid sequence. Protein structure prediction has made significant progress over the last decade due to advances in algorithms and the public availability of sequence and structure databases. When many alternative structure predictions are generated for a given sequence, it is important to perform a quality assessment of the prediction models. Estimating the accuracy, or quality, of a prediction model is crucial for its practical use in application domains such as biochemical experimental design, drug design and biotechnology, for example, in the design of novel enzymes. The aim of the proposed research project is the application of Intelligent Data Analysis to large repositories of amino acid sequences and protein tertiary structures to identify accurate quality assessment methods.
Keywords: Intelligent Data Analysis, Bioinformatics, Protein Tertiary Structure
Mobile and Cloud Computing for Global Data Mining
Emerging challenges in ubiquitous networks and computing include the ability to deploy
large-scale applications anywhere, anytime. Next-generation applications will be based on the integration of
lightweight mobile devices with on-demand storage and computing resources (Cloud Computing). Data Mining
applications will play a key role in this scenario. Data captured by smartphones can be stored
and processed by a Cloud-based expert system for intelligent analysis.
The project will focus on the customisation of an open-source Cloud computing toolkit using Cloud Computing
standards and on the integration of open Data Mining workflow management systems.
Keywords: Data Mining, Cloud Computing, Android
Opinion Leader: fully-decentralised online opinion polls
Surveys of public opinion are typically drawn from a very small sample of the entire population.
They also rely on a centralised server or service (e.g., a poll agency). Obvious issues are associated with the
centralised nature of this model. Does the (small) sample size provide sufficient guarantees for extrapolating
general conclusions? Will a centralised service run under private administrative control be unbiased and objective?
Would the results be available anytime and anywhere without the interference of policy makers and private interests?
Decentralised mobile applications do not rely on a server or a service provider: they rely on a voluntary,
collaborative peer-to-peer model. The project will implement "Opinion Leader", a fully decentralised online
application for opinion polls. Anyone can start an opinion poll or become the next opinion leader by initiating
a viral poll. No one can stop or interfere with a real-time global aggregation of opinions by means of an
epidemic communication protocol.
Keywords: Viral Computing, Opinion Polls, Data Mining, Android, Epidemic Protocols