Target Audience: KR researchers interested in machine learning (ML)
Prerequisites: Basic knowledge of first-order logic will be helpful but not essential as we will provide a brief refresher at the start of the tutorial.
Tutorial Outline
The main limitations of standard ML approaches include poor generalisation, a lack of interpretability, and a need for large numbers of training examples (Marcus 2018; Chollet 2019; Bengio et al. 2019). Although many researchers are unaware of it, recent work in Inductive Logic Programming (ILP) (Muggleton 1991; Cropper et al. 2022) has shown promise at addressing these issues. ILP is a form of AI that combines knowledge representation and machine learning. In other words, ILP combines logical reasoning and data-driven learning.
The goal of ILP is to induce a hypothesis (a logic program) that generalises training examples and background knowledge. The distinguishing feature of ILP is that it uses first-order logic to represent hypotheses, examples, and background knowledge. This tutorial will build on a recent JAIR paper (Cropper and Dumancic 2022) and an AAAI23 tutorial to provide an introduction to ILP. We will cover major recent breakthroughs (Cropper et al. 2022), notably in recursion and predicate invention. We will also emphasise the connections between ILP and the wider AI community, principally the KR community. We hope to bridge the gap between these communities and foster future research by highlighting these connections.
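For illustration, consider a toy kinship task (an assumed example, not taken from the cited papers). The learner is given background knowledge and labelled examples as logic programs, and must induce a hypothesis that, together with the background knowledge, entails the positive examples but not the negative ones:

    % Background knowledge: facts given to the learner.
    parent(ann, bob).
    parent(bob, carl).
    parent(bob, dana).

    % Positive examples: grandparent(ann, carl), grandparent(ann, dana).
    % Negative example:  grandparent(bob, ann).

    % A hypothesis an ILP system might induce:
    grandparent(A, B) :- parent(A, C), parent(C, B).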
Goals of the Tutorial
The two main goals of this tutorial are to (G1) introduce ILP to a KR audience and (G2) help bridge the gap between the ML and KR communities. For G1, recent breakthroughs in ILP have shown promise at addressing the major limitations of standard ML approaches (Cropper et al. 2022), chiefly through learning recursive theories and predicate invention (the automatic discovery of novel high-level concepts). Our goal is to disseminate these breakthroughs to a wider audience.
For G2, ILP is uniquely placed to attract a broad AI audience and help bridge the gap between ML and KR. Because it uses logic programming as a uniform representation for data, the tutorial will interest the logic programming and knowledge representation and reasoning communities. For instance, many recent ILP approaches use answer set programming (Corapi, Russo, and Lupu 2011; Law et al. 2020; Kaminski, Eiter, and Inoue 2019; Schüller and Benz 2018; Evans et al. 2021). Similarly, many recent advances in ILP come from using state-of-the-art techniques from constraint programming and Boolean satisfiability.
We want to bridge the gap between ILP and these communities. By doing so, ILP can greatly benefit from the ideas and expertise of researchers in these fields. Similarly, other communities can benefit from the challenges faced by ILP. By the end of the tutorial, we hope to have increased the likelihood of collaboration and interchange of ideas between the communities.
History
This tutorial is a revised and improved version of a tutorial run at AAAI23. The motivation for running a tutorial now is that ILP has recently turned 30 (Cropper et al. 2022). To acknowledge this milestone, two of the presenters recently published a JAIR paper (Cropper and Dumancic 2022) aimed at introducing ILP to a broad audience, which has thus far been well received. This tutorial will build on this paper.
Moreover, the past decade has seen major breakthroughs in ILP (Cropper et al. 2022), which have opened up ILP to many new and interesting applications, such as learning string transformation programs (Lin et al. 2014), understanding biological networks (Bain and Srinivasan 2018), and explaining sensory data (Evans et al. 2021).
Most Related Recent Tutorials:
• From Statistical Relational to Neuro-Symbolic Artificial Intelligence by L. De Raedt, S. Dumančić, R. Manhaeve and G. Marra (AAAI 2021, IJCAI 2021).
• Automated Synthesis: Towards the Holy Grail of AI by K. Meel, S. Chakraborty, S. Akshay, P. Golia, S. Roy (AAAI 2022, IJCAI 2022).
Content
We will split the tutorial into five parts.
We will start with a high-level introduction to ILP. We will use three different motivating scenarios/examples throughout the tutorial: (i) algorithm discovery, (ii) game playing, and (iii) scientific discovery. We will highlight the key advantages of ILP, principally explainability, knowledge transfer, and the ability to generalise from small numbers of examples. We will introduce the basic concepts of relational learning and logic programming. To do so, we will use standard textbooks (De Raedt 2008; Lloyd 2012; Gebser et al. 2012).
In part 2, we will explain that building an ILP system requires making several choices or assumptions, and that understanding these choices is key to understanding ILP.
Learning setting
A key decision is how to represent examples. Representations include Boolean concepts, input-output examples, interpretations, and transitions. The representation determines the learning setting, which in turn defines what it means to solve the ILP problem. We will introduce the various ILP learning settings.
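To make these settings concrete, the following sketch (an assumed toy example) contrasts two common ways of presenting examples for a sorting task:

    % Boolean concepts / learning from entailment: each example is a ground atom
    % labelled as positive or negative.
    pos(sorted([1,2,3])).
    neg(sorted([3,1,2])).

    % Input-output examples: each example pairs an input with the desired output
    % of the target program.
    pos(sort([2,3,1], [1,2,3])).
    pos(sort([4,2], [2,4])).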
Representation language
ILP represents data as logic programs. There are, however, many logic programming languages, each with strengths and weaknesses. For instance, Prolog is a Turing-complete logic programming language. Datalog is a syntactic subset of Prolog that sacrifices features (such as data structures) and expressivity (it is not Turing-complete) to gain efficiency and decidability. We will explain that choosing a suitable representation language is crucial in determining which problems a system can solve.
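The following fragments illustrate the trade-off (an assumed example). The first is a Prolog program that uses lists, a data structure unavailable in Datalog; the second is a Datalog program, whose clauses range only over constants and relations:

    % Prolog: function symbols and data structures such as lists are allowed.
    last([X], X).
    last([_|T], X) :- last(T, X).

    % Datalog: no function symbols, sacrificing expressivity for decidability.
    connected(A, B) :- edge(A, B).
    connected(A, B) :- edge(A, C), connected(C, B).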
Defining the hypothesis space
The fundamental ILP problem is to search the hypothesis space for a suitable hypothesis. The hypothesis space contains all possible programs that can be built in the chosen representation language. Unrestricted, the hypothesis space is infinite, so it is vital to restrict it to make the search feasible. As with all ML techniques, ILP restricts the hypothesis space by enforcing an inductive bias. For instance, a language bias enforces restrictions on hypotheses, such as how many variables or relations can be in a hypothesis. We will explain how choosing an appropriate language bias is necessary for efficient learning and is a major challenge. We will introduce the most commonly used language biases, namely modes (Muggleton 1995) and metarules (Muggleton, Lin, and Tamaddoni-Nezhad 2015).
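As a rough illustration (the exact syntax varies between systems), mode declarations in the style of Progol and Aleph and a metarule in the style of Metagol might look as follows:

    % Mode declarations: modeh restricts rule heads, modeb restricts rule bodies;
    % '+' marks an input argument and '-' an output argument.
    :- modeh(1, grandparent(+person, -person)).
    :- modeb(*, parent(+person, -person)).

    % A metarule is a rule template with second-order variables (P, Q, R) that the
    % learner instantiates with predicate symbols, e.g. the chain metarule:
    %   P(A, B) :- Q(A, C), R(C, B).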
Search method
Once the hypothesis space is defined, the problem is to search it efficiently. The traditional way to categorise approaches is by whether they use a top-down (Quinlan 1990; Blockeel and De Raedt 1998) or bottom-up (Muggleton and Feng 1990; Muggleton and Buntine 1988) search. However, a third approach, known as meta-level ILP (Inoue 2016; Cropper et al. 2022), has recently emerged. Most meta-level approaches delegate the search for a hypothesis to an off-the-shelf solver, such as an ASP solver (Law et al. 2020) or a constraint solver (Albarghouthi et al. 2017). We will explain the various search methods.
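For intuition, a top-down system might refine an overly general clause by repeatedly adding body literals, as in the following assumed sketch; a bottom-up system instead starts from specific clauses constructed from the examples and generalises them:

    % Top-down search: start general and specialise by adding literals.
    grandparent(A, B).                               % too general: covers negatives
    grandparent(A, B) :- parent(A, C).               % still too general
    grandparent(A, B) :- parent(A, C), parent(C, B). % consistent with the examples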
In part 3, we will focus on which features to support in an ILP system, such as whether to support noisy data. We will highlight the main features and the current approaches to support them.
Noise
Noise handling is important in ML. Most ILP systems support learning programs from noisy examples. For instance, ILASP (Law, Russo, and Broda 2014) uses the optimisation features of ASP solvers to handle noise. We will describe the main approaches to handling noise in ILP.
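As a simplified sketch of the general idea (not ILASP's actual encoding), noise can be handled in ASP by replacing hard coverage constraints with weak constraints, so that the solver minimises the number of misclassified examples rather than forbidding them outright:

    % Penalise each uncovered positive example and each covered negative example;
    % the ASP solver searches for a hypothesis minimising the total penalty.
    :~ pos(E), not covered(E). [1@1, E]
    :~ neg(E), covered(E). [1@1, E]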
Optimality
There are often multiple (sometimes infinitely many) hypotheses that solve the ILP problem. In such cases, which hypothesis should we choose? Many recent systems try to learn a textually optimal/minimal hypothesis (Corapi, Russo, and Lupu 2011; Law, Russo, and Broda 2014; Muggleton, Lin, and Tamaddoni-Nezhad 2015; Cropper and Morel 2021). Some recent approaches can even learn the most efficient program (Cropper and Muggleton 2019). We will cover these recent approaches.
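For instance, given suitable background knowledge, both of the following hypotheses may cover the examples, but a system that optimises for textual minimality prefers the second (an assumed illustration):

    % Hypothesis 1: separate rules for the maternal and paternal cases.
    grandparent(A, B) :- mother(A, C), parent(C, B).
    grandparent(A, B) :- father(A, C), parent(C, B).

    % Hypothesis 2: a single, textually smaller rule.
    grandparent(A, B) :- parent(A, C), parent(C, B).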
Recursion
Recursion is often crucial to learn general theories from small numbers of examples. For instance, consider learning the concept of reachability in a graph. Without recursion, an ILP system would need to learn a separate rule to define reachability at each different depth. By contrast, with recursion, an ILP system can learn a small two-rule program that generalises reachability to arbitrary depth. The ability to learn recursive programs has transformed ILP and has opened many applications, such as string transformations (Lin et al. 2014). We will cover these major breakthroughs.
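The reachability example illustrates the difference (a standard textbook-style illustration):

    % Without recursion: one rule per depth, so the program never generalises
    % beyond the longest path seen during training.
    reachable(A, B) :- edge(A, B).
    reachable(A, B) :- edge(A, C), edge(C, B).
    reachable(A, B) :- edge(A, C), edge(C, D), edge(D, B).
    % ... and so on for every depth.

    % With recursion: a two-rule program that generalises to arbitrary depth.
    reachable(A, B) :- edge(A, B).
    reachable(A, B) :- edge(A, C), reachable(C, B).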
Predicate invention
Russell (Russell 2019) argues that the automatic invention of new high-level concepts is the most important step needed to reach human-level AI. In ILP, the ability to invent new high-level concepts is known as predicate invention (PI). PI has repeatedly been identified as a major challenge (Muggleton et al. 2012). However, recent work has made major breakthroughs in PI (Muggleton, Lin, and Tamaddoni-Nezhad 2015; Hocquette and Muggleton 2020), which we will cover in this section.
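As a standard illustration (the invented symbol's name is arbitrary), suppose the background knowledge contains only mother/2 and father/2. A system with PI can invent an auxiliary predicate, in effect discovering the concept of a parent, rather than enumerating all four mother/father combinations:

    % inv_1 is an invented predicate, not present in the background knowledge.
    inv_1(A, B) :- mother(A, B).
    inv_1(A, B) :- father(A, B).
    grandparent(A, B) :- inv_1(A, C), inv_1(C, B).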
To help introduce ILP to a wide audience, we will cover two recent systems in detail, chosen for their strong connections with answer set programming and constraint solving.
ASPAL
ASPAL (Corapi, Russo, and Lupu 2011) was one of the first meta-level ILP systems. ASPAL is one of the simplest ILP systems to explain. It precomputes every possible rule that could be in a hypothesis. It then searches for a subset of the rules that generalises the training examples. To perform this search, ASPAL encodes the problem as an ASP problem. Although simple, ASPAL greatly influenced ILP and many approaches use the same rule selection approach (Law, Russo, and Broda 2014; Raghothaman et al. 2020; Evans and Grefenstette 2018; Kaminski, Eiter, and Inoue 2019; Si et al. 2019; Evans et al. 2021).
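The following ASP-style sketch conveys the rule-selection idea in simplified form (it is not ASPAL's exact encoding): a choice rule guesses a subset of the precomputed candidate rules, and constraints require that the chosen rules cover every positive example and no negative example:

    % Guess which precomputed candidate rules to include in the hypothesis.
    { selected(R) : candidate_rule(R) }.

    % covered/1 would be defined from the selected rules and the background knowledge.
    :- pos(E), not covered(E).
    :- neg(E), covered(E).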
POPPER
Similar to ASPAL, POPPER frames the ILP problem as a constraint satisfaction problem (CSP), where each solution to the CSP represents a hypothesis. However, whereas ASPAL and its descendants first precompute every possible rule in a hypothesis, POPPER (Cropper and Morel 2021) lazily generates programs. The key idea of POPPER is to discover constraints from smaller programs to rule out larger programs. For each hypothesis generated, POPPER tests it on the training examples. If the hypothesis is not a solution, then POPPER tries to explain why. It then builds constraints from the explanation and adds them to the CSP problem. Adding constraints eliminates solutions to the CSP, which in turn prunes the hypothesis space. POPPER has been shown to vastly outperform existing approaches and has been widely used by other researchers, for instance to learn explainable solutions to the Rubik's cube (Lakkaraju et al. 2022) and higher-order programs (Purgał, Cerna, and Kaliszyk 2022).
We will conclude the tutorial by discussing the main challenges faced in ILP and the opportunities they present for collaboration with the wider AI community. We briefly discuss two below.
Learning from raw data
Most ILP systems require data in symbolic form, i.e. as a logic program. However, much real-world data, such as images and speech, cannot easily be translated into a symbolic form. A grand challenge in ILP (and perhaps the whole of AI) is to jointly learn to perceive sensory input and to induce a symbolic program that explains it, such as learning to perform addition from MNIST digits. Developing better ILP techniques that can both perceive sensory input and learn complex relational programs would be a major breakthrough, not only for ILP but for the whole of AI.
Constraint solving
Many recent breakthroughs in ILP come from using constraint solvers. Almost all the recent approaches use ASP. However, ASP is only one form of constraint solving and has limitations for ILP, such as difficulty handling real numbers. Other constraint-solving approaches might be better suited to tackle the challenges faced by ILP. For instance, satisfiability modulo theories (SMT) solvers (de Moura and Bjørner 2008) naturally support arithmetic reasoning over infinite domains, yet have not been used in ILP. Likewise, many ILP problems can be directly translated into instances of the MaxSAT problem. For instance, many of the search problems in POPPER are ideally suited to incremental MaxSAT solvers, so experts in this topic could greatly improve and advance POPPER. Similarly, the organisers of the recent MaxSAT 2022 competition state that they need more challenging incremental MaxSAT problems and benchmarks, which ILP could provide. By running this tutorial, we hope to bridge the gap between these communities and foster future research.
Presenter CV
Andrew Cropper
Andrew is a research fellow in the computer science department at the University of Oxford. He runs the Logic and Learning (LoL) group. He is the principal investigator of the EPSRC-funded (£1.4m) Automatic Computer Scientist (AutoCS) project and another EPSRC-funded project on explainable drug design. He previously held an independent junior research fellowship (2018-2021) at the University of Oxford. He received his PhD in computer science from Imperial College London (2018) under the supervision of Stephen Muggleton.
Andrew works on ILP. He developed the ILP system Metagol and methods to learn higher-order (Cropper, Morel, and Muggleton 2020) and efficient (Cropper and Muggleton 2019) programs. Andrew has since developed a new area of ILP called learning from failures and the system POPPER (Cropper and Morel 2021; Cropper 2022). His work has won three best paper awards at the ILP conference (2014, 2018, 2019).
Andrew has taught courses on logic and formal proof at the University of Oxford and Stanford University and has given talks on ILP at various institutions, including MIT and Berkeley. With Sebastijan Dumančić, he has written a survey paper (Cropper et al. 2022) and a tutorial-level article (Cropper and Dumancic 2022) on ILP.