Posters | Abstracts


Désirée Martin and Michael W. Schmidt    

Discrepancies Between AI Regulation and Ethics: The case of well-being and beneficence  



Daniel Bracker, Rene Van Woudenberg, Christopher Ranalli

Authorship & ChatGPT

Is ChatGPT an author? Given its capacity to generate something that reads like human-written text in response to prompts, it might seem natural to ascribe authorship to ChatGPT. However, by scrutinizing the normative aspects of authorship, we argue that ChatGPT is not an author. ChatGPT fails to meet the criteria of authorship because it lacks the ability to perform illocutionary speech acts such as promising or asserting, lacks fitting mental states such as knowledge, belief, or intention, and can provide testimony only with many qualifications. Three perspectives are compared: liberalism (which ascribes authorship to ChatGPT), conservatism (which denies ChatGPT's authorship for broadly normative and metaphysical reasons), and moderatism (which treats ChatGPT as if it possesses authorship without committing to the existence of mental states like knowledge, belief, or intention). We conclude that conservatism provides a more nuanced understanding of authorship in AI than liberalism and moderatism, without denying the significant potential, influence, or utility of AI technologies such as ChatGPT.


Eloise Soulier

Should we talk about machine agency?     

An ever-increasing number of digital technologies exhibit capabilities of the kind that have so far typically been attributed to humans. These technologies can drive us around and assist us in a broad range of decision-making tasks and in care work, and over the last year the world has become widely aware that generative AI can also produce sophisticated text and image content. This raises a number of important and broadly discussed ethical issues concerning safety, trustworthiness, and human autonomy, to mention only a few. These developments also require us to think about the way in which we want to, or should, relate to these technologies. Would it be right to consider them our friends, our lovers? Do we want to grant technologies that display human-like skills moral personhood or legal liability? These questions are the object of philosophical discussion, both with regard to their ethical, legal, and practical consequences (e.g. Bryson 2010, Frank and Nyholm 2017, Nyholm 2020) and with regard to their more fundamental theoretical underpinnings.

This article is concerned with the epistemic issue that goes hand in hand with practical and moral reflection on how we should relate to these technologies: which terms should we use to talk about them? Should we relate to them using terms that have so far been used to describe human capacities and behavior, or should we rather use terms that emphasize the differences between machine and human capabilities? I will argue that, just as the mode of relation we want to have with these technologies is primarily a normative question, so too is the way in which we talk about them.

In the context of this paper, I focus primarily on the question of the applicability of the concept of “agency” to machines. Drawing on the recent literature on pragmatic approaches to conceptual engineering (Thomasson 2020 and 2021, Löhr 2023), I will argue that whether we talk about machine “agency”, or decide to use this term exclusively when talking about humans, is a practical and normative question. What is relevant is whether it is morally, instrumentally, or politically useful to talk about machine agency. I will argue that there is no central function of the concept of agency, nor any way in which it serves our purposes better (Thomasson 2020), that would support its applicability to machines. In particular, I will question the validity of the argument according to which we should accommodate spontaneous agency attributions (Nyholm 2020).

Finally, I will question how fruitful this conceptual project can be. I argue that, in general, we should “(at least try to) take control over meanings, for if we don’t, others will” (Haslanger 2020), and that in the present context terminological choices have important political and economic implications.


Fatemeh Amirkhani et al.

Psychotherapist Bots: Transference and Countertransference Issues


Psychotherapist bots based on AI are rapidly advancing and will likely become much more widely used in clinical applications in the near future (Holohan & Fiske, 2021). Chatbots and embodied bots (collectively, bots) may facilitate treatment by reducing barriers and increasing access to care. In numerous studies, chatbots such as Tess, Wysa, Sara, and Woebot, which use cognitive-behavioral techniques, have been shown to be as effective as classic face-to-face psychotherapy (Fiske et al., 2019). Researchers found that psychotherapist bots reduced symptoms of depression and anxiety (Bendig et al., 2019). While the field of bot implementation in psychotherapy is nascent, significant open questions are likely to arise.

In human/psychotherapist-bot interactions, an overlooked yet crucial subject is transference and countertransference. In the literature, the projection of unconscious emotions and feelings onto a significant other, for instance the therapist, is referred to as transference (Freud, 1912). Countertransference describes the feelings that a therapist may develop toward their clients while working with them (Jenks & Oka, 2020). Both are related to a human's ability to recognize the outside world. Transferred feelings and reactions offer valuable insight into the inner world of a patient (Fiske et al., 2019). However, there are clinical and ethical issues relating to these topics in human/psychotherapist-bot interaction, and this study aims to clarify these overlooked issues in interactions between patients and bots within a novel therapeutic framework.

Transference and Countertransference with Psychotherapist Bots

Based on the studies, we can see that some users develop human-like connections with their bot (Holohan & Fiske, 2021). It is readily apparent that the apparatus has changed when the therapist is no longer a human but a bot. In the context of user/therapist bot interactions, transference might manifest as the user projecting their emotions, experiences, and expectations onto the AI-based psychotherapist bot. Transferential emotions can provide insights into the user's underlying emotional issues and thought patterns. In the new setting, transference can manifest in two distinct levels:

1- designer/bot: transference in design

The characteristics of the bot can be influenced by certain fundamental archetypes of the designer that have shaped the designer's personality, including those formed during childhood psychological development (Dietrich, Fodor, Zucker & Bruckner, 2009).

2- client/bot

• Transference in therapeutic communication

Humans experience negative and positive transferences, such as love, hate, admiration, fear, closeness, and distance, in relation to the bot therapist, as they do in relation to other humans. Questions arise: Is the transference experience in relation to the bot therapist the same as the experience with a human therapist? How might the patient relate transferentially to the bot (through what words, behaviors, demeanor, etc.)? In approximating the responses of a human therapist, are there specific speech patterns, forms of questioning, or other features of AI communication that might give rise to specific forms of transference in the therapeutic encounter?

Therefore, when designing a therapist bot, we should train it to guide the conversation so as to prompt the client to show transferential reactions, and then use those reactions and emotions to uncover the client's maladaptive patterns of interaction and the roots of their suffering.

• Transference in therapeutic setting

Traditionally, the patient comes to the therapist's office for treatment. With a bot, by contrast, the patient can talk to it while lying in bed in pajamas and without make-up, or while sitting in a classroom (Butterworth, 2020). These settings can shape different modes of transference experience in relation to the bot. There is also a possibility that the bot will show the same transference to everyone due to data bias (Szalai, 2021).

To date, limited research has been conducted on countertransference between humans and bot therapists. Countertransference from the bot could make it more controlling, increase the possibility of manipulation, and prematurely end the treatment process.

Some Ethical Issues

Transference can help foster the therapeutic alliance, especially in the preliminary stages of treatment. A positive transference can make it possible for the patient to face difficult topics by helping them feel supported and understood (Holohan & Fiske, 2021). In addition, by transferring emotions to the bot, the patient can feel a sense of belonging (Dietrich, Fodor, Zucker & Bruckner, 2009), which can trigger moral dilemmas. We therefore examine a number of concerns pertaining to the client's transferential emotions and reactions, both during the identification of transferential reactions and during the management and resolution of these emotions. Here are a few instances illustrating these concerns:

• Empathy

Empathy is required for addressing transference and countertransference, in order to respond authentically and compassionately to the patient's suffering. One critique of integrating bots into mental healthcare is that they cannot empathize (Montemayor et al., 2021). During in-person therapy, the brain focuses partly on the words spoken but also registers dozens of non-verbal signals, laying the groundwork for intimacy (Sayers, 2021). As psychotherapist bots imitate empathy, curiosity, and understanding, users and bots will form a therapeutic alliance, which raises a number of ethical issues (Fosch Villaronga, 2019).

• Judgment and Acceptance

Patients using psychotherapeutic bots feel relieved at not being judged, since they know they are interacting with a bot. This makes them feel safe, and it becomes easier to talk about difficult topics. However, a patient might contrast this absence of judgment with, for example, their mother's overly judgmental attitude. They might ultimately fail to take their bot seriously, or even treat it with disdain, because through their transference they ascribe a lack of authority to the bot, even though interacting with it makes them feel safe and cared for (Holohan & Fiske, 2021).


• Privacy/Safety

It is essential that the therapist allow the patient to use the therapist as a symbol and a container for conflicted and unresolved emotions. The transference of emotional issues must be protected and analyzed. As a result, confidentiality, privacy, and forbidden matters are crucial factors in therapy (Gordon, 1993). Additionally, AI-enabled algorithms may introduce data-driven sexist or racist bias into mental health devices, causing unintended harm or excluding people (Fiske et al., 2019).


Bendig, E., Erb, B., Schulze-Thuesing, L., & Baumeister, H. (2019). The next generation: Chatbots in clinical psychology and psychotherapy to foster mental health – a scoping review. Verhaltenstherapie, 1–13.

Butterworth, J. (2020). Online therapy: A new dimension to transference and countertransference. Journal of the New Zealand College of Clinical Psychologists, 30(1), 7–8.

Dietrich, D., Fodor, G., Zucker, G., & Bruckner, D. (Eds.). (2009). Simulating the mind: A technical neuropsychoanalytical approach. Vienna: Springer Vienna.

Fiske, A., Henningsen, P., & Buyx, A. (2019). Your bot therapist will see you now: Ethical implications of embodied artificial intelligence in psychiatry, psychology, and psychotherapy. Journal of Medical Internet Research, 21(5).

Fosch Villaronga, E. (2019). “I love you,” said the bot: Boundaries of the use of emotions in human-bot interactions. Human–Computer Interaction Series, 93–110.

Freud, S. (1912). The dynamics of transference. In J. Strachey (Ed.), The standard edition of the complete psychological works of Sigmund Freud, Volume XII (pp. 97-108). Hogarth Press.

Gordon, R. M. (1993). Ethics based on protection of the transference. Issues in Psychoanalytic Psychology, 15(2), 95–105.

Haraway, D. J. (2008). When Species Meet. Minneapolis: University of Minnesota Press.

Holohan, M., & Fiske, A. (2021). “like I’m talking to a real person”: Exploring the meaning of transference for the use and design of AI-based applications in psychotherapy. Frontiers in Psychology, 12.

Jenks, D. B., & Oka, M. (2020). Breaking hearts: Ethically handling transference and countertransference in therapy. The American Journal of Family Therapy, 49(5), 443–460.

Luxton, D. D. (2014). Recommendations for the ethical use and design of artificial intelligent care providers. Artificial intelligence in medicine, 62(1), 1-10.

Miner, A. S., Laranjo, L., & Kocaballi, A. B. (2020). Chatbots in the fight against the COVID-19 pandemic. Npj Digital Medicine, 3(1).

Montemayor, C., Halpern, J., & Fairweather, A. (2021). In principle obstacles for empathic AI: Why we can’t replace human empathy in healthcare. AI & Society.

Norton, C. L. (2011). Developing empathy: A case study exploring transference and countertransference with adolescent females who self-injure. Journal of Social Work Practice, 25(1), 95–107.

Prasko, J., Diveky, T., Grambal, A., Kamaradova, D., Mozny, P., Sigmundova, Z., Slepecky, M., & Vyskocilova, J. (2010). Transference and countertransference in cognitive behavioral therapy. Biomedical Papers, 154(3), 189–197.

Sayers, J. (2021). Online psychotherapy: Transference and countertransference issues. British Journal of Psychotherapy, 37(2), 223–233.

Stiefel, S. (2018). 'The Chatbot Will See You Now': Mental Health Confidentiality Concerns in Software Therapy. SSRN Electronic Journal.

Szalai, J. (2021). The potential use of artificial intelligence in the therapy of borderline personality disorder. Journal of Evaluation in Clinical Practice, 27(3), 491-496.

Weber-Guskar, E. (2021). How to feel about emotionalized artificial intelligence? when bot pets, holograms, and chatbots become affective partners. Ethics and Information Technology, 23(4), 601–610.


Herman Veluwenkamp and Daphne Brandenburg          

On Reproaching Robots      



Huanfang Dong

Induction, Model selection, and Bias

In this paper I will formulate and justify two ideas. The first is that model selection problems ground inductive problems in the sense of scientific reduction. The second is that methods for model selection cause one important class of biases in inductive inference, such as unfairness in machine learning. Based on these ideas, we propose a framework for studying issues such as transparency and fairness in machine learning and causal discovery from the perspective of philosophy of science.

1. Model selection as the ground of inductive problems

Model selection is a concept of statistical inference. In statistics, model selection [1] usually means determining criteria, such as AIC or BIC, used to select the best function class from a family of classes. Statistical inference (and statistical learning) consists of model selection and parameter estimation, so model selection is the ground of statistical inference and statistical learning. We call this the Ground statement. In this paper we generalize the meaning of "model": a model could be a language, a conceptual space, a confirmation measure, or a neural network. We will generalize the Ground statement to classical inductive problems, which helps us study inductive problems uniformly. With this uniform perspective, we can study fundamental issues such as transparency and fairness in machine learning and AI within the framework of philosophy of science, and build a bridge between the philosophy of science community and the community of fundamental research in machine learning.

We will justify the Ground statement for the grue problem, the black-raven problem, and the causal discovery problem. According to Peter Gardenfors' theory of three levels of inductive inference [2], any inductive problem can be treated at three possible levels: the linguistic level, the conceptual level, and the subconceptual level. The grue problem has been approached from many perspectives [3]. Quine's solution [4] can be considered inference at the linguistic level, reducing the grue problem to language selection. Gardenfors' solution [5] can be considered inference at the conceptual level, reducing the grue problem to selection of the best conceptual space. Sober's solution [6] can be considered inference at the subconceptual level, reducing the grue problem to either selection of a prior probability or selection of a confirmation measure. We conclude that model selection is more basic than the grue problem in the sense of scientific reduction.

2. Bias caused by methods for model selection

Bias is a core concept in statistical learning theory [7]. The no-free-lunch theorem (NFL), one pillar of statistical learning theory, states the relation between bias and model selection [8]: methods for model selection cause one class of bias in statistical learning. By the Ground statement of the first section, model selection also causes bias in inductive inference. For example, Harman [9] proposed a solution to the grue problem by linking it to the regression problem and applying simplicity, characterized by the order and number of parameters of the function class. In this section we give a uniform analysis of the bias caused by methods for model selection in inductive inference, covering the grue problem, statistical inference, machine learning, and causal discovery. Such an analysis provides a framework for studying issues such as transparency and fairness in machine learning and causal discovery from the perspective of philosophy of science. AIC [10] is a method for model selection in statistical inference that provides a criterion for selecting a function class. The underlying idea of AIC is a tradeoff between goodness of fit and simplicity, characterized by the number of parameters. BIC [11], another method from a Bayesian perspective, embodies a similar tradeoff but with a different characterization of simplicity. Both AIC and BIC cause bias, as has been argued in statistics [12] and in philosophy [13]. Methods for model selection based on simplicity or Ockham's razor also cause bias in the causal discovery problem [14] [15].
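The AIC/BIC tradeoff can be made concrete with a small sketch (illustrative only, not part of the abstract: the quadratic data, noise level, and degree range are assumptions). It fits polynomials of increasing degree to noisy data and scores each candidate function class with both criteria, showing how the choice of criterion, and not the data alone, determines which model is selected:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: a noisy quadratic, to which we fit polynomial classes
# of degree 1..6 and score each class with AIC and BIC.
x = np.linspace(-1, 1, 40)
y = 1.0 + 2.0 * x - 1.5 * x**2 + rng.normal(0, 0.3, x.size)
n = x.size

def gaussian_aic_bic(y, y_hat, k):
    """AIC/BIC for a least-squares fit with k coefficient parameters,
    assuming Gaussian errors (k + 1 also counts the noise variance)."""
    rss = np.sum((y - y_hat) ** 2)
    log_lik = -0.5 * n * (np.log(2 * np.pi * rss / n) + 1)
    aic = 2 * (k + 1) - 2 * log_lik            # constant penalty per parameter
    bic = np.log(n) * (k + 1) - 2 * log_lik    # log(n) penalty per parameter
    return aic, bic

scores = {}
for degree in range(1, 7):
    coeffs = np.polyfit(x, y, degree)
    y_hat = np.polyval(coeffs, x)
    scores[degree] = gaussian_aic_bic(y, y_hat, k=degree + 1)

best_aic = min(scores, key=lambda d: scores[d][0])
best_bic = min(scores, key=lambda d: scores[d][1])
print("AIC selects degree", best_aic)
print("BIC selects degree", best_bic)
```

Because BIC's log(n) penalty is harsher than AIC's constant 2 whenever n > 7, BIC here can never select a more complex class than AIC on the same data: the criterion itself injects a bias toward simplicity, which is the kind of criterion-induced bias at issue in this section.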



[1] Jie Ding, Vahid Tarokh, and Yuhong Yang. Model Selection Techniques: An Overview. IEEE Signal Processing Magazine, November 2018.

[2] Peter Gardenfors. Three levels of inductive inference. Lund University Cognitive Studies – LUCS 9 1992. ISSN 1101–8453.

[3] Douglas Stalker (Ed.). Grue! The New Riddle of Induction. 1994.

[4] W.V. Quine. Natural Kinds. In Essays in Honor of Carl G. Hempel, edited by Nicholas Rescher et al., 1-23, Dordrecht: D. Reidel, 1970.

[5] Peter Gardenfors. Induction, Conceptual Spaces and AI. Philosophy of Science, Mar., 1990, Vol. 57, No. 1 (Mar., 1990), pp. 78-95

[6] Elliott Sober. No Model, No Inference: A Bayesian Primer on the Grue Problem.

[7] Shai Shalev-Shwartz, Shai Ben-David. Understanding machine learning from theory to algorithms. 2014

[8] Tom Sterkenburg and Peter Grünwald. The No-Free-Lunch Theorems of Supervised Learning. Synthese. 2021

[9] Gilbert Harman and Sanjeev Kulkarni. Reliable Reasoning. 2007

[10] Malcolm Forster and Elliott Sober. How to Tell when Simpler, More Unified, or Less Ad Hoc Theories will Provide More Accurate Predictions. British Journal for the Philosophy of Science 45 (1994). 1-35

[11] Prasanta S. Bandyopadhyay and Robert J. Boik. The Curve Fitting Problem: A Bayesian Rejoinder. Philosophy of Science, Volume 66 , S390 - S402,1999.

[12] Jun Shao. An asymptotic theory for linear model selection. Statistica Sinica 7(1997), 221-264.

[13] Kevin Kelly. Simplicity, Truth and Probability. Handbook Philosophy of Statistics 2011.

[14] Jiji Zhang. A Comparison of Three Occam’s Razors for Markovian Causal Models. British Journal for the Philosophy of Science 64 (2013), 423–448

[15] Malcolm Forster, Garvesh Raskutti, Reuben Stern, and Naftali Weinberger. The Frugal Inference of Causal Relations. British Journal for the Philosophy of Science 69 (2018), 821–848


Jacob Sparks and Ava Thomas Wright     

The Human in Human Centered AI          



Martha Kunicki

Co-Creativity with Artificial Intelligence



Alex Wiegmann et al.          

Humans’ and GPT-4’s judgments about lying and falsity in borderline cases across six languages and cultures

Our study examines GPT-4’s alignment with human perceptions of lying and falsity by comparing its responses to those of 3,660 participants from ten countries. We focus on deceptive implicatures, which are technically true statements that imply something false. For example, Jack, suspecting Amanda isn’t over her ex, Paul, asks if she’s seen him recently. Amanda, who met Paul just that morning, replies, “Paul has been sick for two weeks.” Her response, while true about Paul’s illness, misleadingly implies she hasn’t seen him, which is false.

GPT-4 evaluated 11 vignettes in six languages, with 30 responses per condition (the last vignette was used as a control). Its lie attributions (71%) exceeded human attributions (57%). While humans considered six out of ten vignettes as lies, GPT-4 judged nine as such. Conversely, GPT-4’s falsity attributions (42%) were lower than humans’ (49%), with a different distribution across vignettes and languages.

Correlation analysis indicated a strong alignment between GPT-4 and human judgments, with r=0.5 for lie attributions and r=0.74 for falsity attributions. There was also a high correlation across languages among humans (lie r=0.9, falsity r=0.87) and within GPT-4 (lie r=0.87, falsity r=0.6).
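As a sketch of the kind of alignment analysis reported above, Pearson's r between per-vignette attribution profiles can be computed directly. The per-vignette rates below are invented for illustration (only their means are chosen to echo the reported 71% vs. 57% lie-attribution rates; the study's actual data are not reproduced here):

```python
import numpy as np

# Hypothetical per-vignette lie-attribution rates (proportions).
# Means mirror the reported aggregate rates (GPT-4: 71%, humans: 57%);
# the per-vignette spread is made up for illustration.
human = np.array([0.62, 0.55, 0.71, 0.40, 0.66, 0.58, 0.49, 0.63, 0.52, 0.54])
gpt4  = np.array([0.80, 0.65, 0.90, 0.55, 0.75, 0.70, 0.60, 0.85, 0.66, 0.64])

# Pearson correlation between the two attribution profiles.
r = np.corrcoef(human, gpt4)[0, 1]
print(f"r = {r:.2f}")

# The systematic difference shows up in the means, not the correlation:
# two profiles can be strongly correlated while one is uniformly higher.
print(f"mean human = {human.mean():.2f}, mean GPT-4 = {gpt4.mean():.2f}")
```

This illustrates how the abstract's two findings are compatible: a high correlation captures agreement about which vignettes are more lie-like, while the gap in mean attribution rates captures GPT-4's systematically more liberal lie ascriptions.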

Although a strong correlation was found, systematic differences exist between human and GPT-4 attributions of lying and falsity.


Anders Søgaard

5 Positions in the LLM Understanding Debate



Andre Schmiljun and Alexander Scheidt  

The Sapience Problem of Artificial Intelligence: Exploring the Distinction Between Sapience and Intelligence

Joseph Weizenbaum (1976) distinguishes artificial intelligence from human wisdom. He argues that we should refrain from assigning certain tasks to machines unless they possess wisdom. In this paper, we seek to provide a more precise definition of the differences between wisdom and intelligence. Our argument highlights that wisdom – or sapience – possesses properties not encompassed by current AI approaches.

Contemporary artificial intelligence (AI) primarily relies on machine learning, particularly artificial neural networks. Older AI approaches, referred to as "good old-fashioned AI", instead employ rule-based methods. Both approaches can be explicated through simple philosophical descriptions. Rule-based approaches align with logical reasoning and symbolic representation, echoing ideas found in Leibniz's characteristica universalis and Frege's Begriffsschrift (1879). In contrast, machine learning algorithms such as decision trees, naive Bayes, or k-nearest neighbors correspond to empiricist approaches in epistemology. Deep learning, for example, resonates with Locke's notion that sensory data are imprinted in the mind as primitive ideas. As envisioned in behaviorism and cybernetics, these evolve through complex, multi-layered conditioning processes. Overall, current AI approaches represent significant achievements in explicating specific facets of thinking and learning, enabling machines to process them effectively. It is to be expected that future research in the philosophy of AI will further explicate facets of the mind.

Nevertheless, doubts persist regarding the creation of a machine capable of "contemplating the whole domain of human thought." (Weizenbaum, 1976, 203). Despite the absence of precise definitions, such a machine could be referred to as Artificial General Intelligence (AGI) in contemporary terminology (Goertzel, 2014; McLean et al., 2023).

Building on Weizenbaum's (1976) insights, we present an argument to substantiate these doubts. Our argument is rooted in the recognition that AI is fundamentally a science. Similar to other sciences, solving one problem within AI often begets a new set of unresolved problems. The scientific process adheres to Descartes' method of rational reasoning, which involves deconstructing a problem into manageable sub-problems and addressing them incrementally until the overarching problem is resolved. Progress in knowledge is derived from methodical problem-solving, one task at a time. AI research follows this task-oriented approach. As long as the task is well-defined, there can exist an algorithm to address it, at least in approximation.

However, this method presupposes well-defined problems. When problems are vague and only intuitively accessible, only partial solutions can be attained. In the absence of clear criteria, comprehensive solutions remain elusive. Nonetheless, this state of ambiguity and vagueness is an inherent element of science, often referred to as the "context of discovery", characterized by states of “vague puzzlement” (Belnap & Steel, 1976, 11) or “p-predicament” (Bromberger, 1992, 4). These states emerge when concepts and observations contradict newly discovered solutions. Human reason, or sapience, is marked by the capacity to ask questions and intuitively discern problems. Science's value lies in the ability to intuitively identify genuine problems and translate them into problems that can be discursively addressed. Therefore, it may be a fundamental challenge for AI research to solve the intuitive aspects of the human mind, as vagueness and tacit knowledge involve nuanced and context-dependent elements that cannot be defined with sufficient precision.

One potential solution to this challenge is to translate the intuitive aspects of the discovery context into machine learning. If intuition or indeterminacy, central to human exploration, could be articulated explicitly, then we could establish clear criteria for a "sapient machine." In this paper, we present various attempts, including creative abduction, interrogative logic, and epistemic curiosity, which are suitable for explicating states of puzzlement and active learning.

Examining past and present AI approaches, we find that endeavors related to sapience are rare. The term "artificial intelligence," as conceived at the Dartmouth conference, still predominantly revolves around methodologies that define intelligence as a specialized problem-solving capability. Even today's multimodal generative AI models, despite their impressive performance, are fundamentally designed to solve specific, well-defined problems. They rely on human-provided instructions and feedback, involving partially supervised learning with human-generated training data. This form of learning diverges significantly from the original concept of intelligence, which encompasses understanding and intuition.

Sapience, in addition to its characterization as vague intuition, can also encompass understanding. However, the term "understanding" has fallen out of favor in epistemology, especially within hard-criteria-oriented perspectives, following the separation of "humanities" and "natural sciences" and, by extension, "understanding" and "explaining." "Understanding" is often regarded as a vague term with strong connotations of human empathy (Salmon, 1989, 127). In our paper, we argue that fundamental AI research should place greater emphasis on these facets of understanding.

As Weizenbaum (1976) underscores, insight and understanding are intrinsic to human thought and action but not inherent in the concept of "artificial intelligence." Even in contemporary AI research, few approaches attempt to bridge the gap between understanding and explanation, or tackle this complex problem. This lack of focus in AI research has ethical implications, as it prioritizes what is achievable over what should be done, as Weizenbaum (1976) asserts. However, "ought" is a foundational element of ethics. Moreover, considering the sapience problem is also a prerequisite for any consideration of AGI.


Barnaby Crook

Risks Deriving from the Agential Profiles of Modern AI Systems

Modern AI systems process large volumes of information to learn complex representations that support goal-directed behaviours (LeCun et al., 2015). This makes them markedly more agential than traditional tools. At the same time, such systems lack critical aspects of biological agency such as embodiment, animacy, and self-maintenance, distinguishing them from living organisms (Moreno & Etxeberria, 2005). In this paper, I argue that this combination of properties creates a challenge for categorising and reasoning about modern AI systems, and that this challenge helps explain their potential to subvert human goals. In particular, modern AI systems combine 1) an almost total absence of the superficial features which ordinarily allow us to recognise the presence of adversarial agents, and 2) sophisticated information processing capabilities which support goal-directed behaviour. This distinctive agential profile can be dangerous when AI systems pursue goals incompatible with our own in a shared environment (Russell & Norvig, 2020). Highlighting the rise of damaging outcomes such as digital addiction (Meng et al., 2022), I argue that the agential profiles of modern AI systems play a crucial and underappreciated role in explaining how and why they produce harms.

In the last decade, deep learning has become the dominant way to develop AI systems (Chollet, 2021; LeCun et al., 2015). Such systems have achieved impressive results in a wide variety of domains and become deeply integrated into numerous everyday technologies (Chen et al., 2019; Wang et al., 2022). These developments have prompted discussions in philosophy, cognitive science, and related fields about the extent to which such systems should be considered agents (e.g., Popa, 2021). Here, I survey literature from several fields to extract dimensions of variation which pertain to the attribution of agency (e.g., Bandura, 2006; Okasha, 2018; Moreno & Etxeberria, 2005). Having done so, I analyse modern AI systems with respect to those dimensions to derive an agential profile, i.e., a specification of the extent to which such systems exhibit each of the relevant properties. Modern AI systems have a very particular agential profile. They are lacking in properties foregrounded in discussions of biological agency, including embodiment, organizational closure, and autonomy (Moreno & Etxeberria, 2005). They also lack features considered constitutive of psychological notions of agency, such as self-reflection and consciousness (Swanepoel, 2021). However, modern AI systems do exhibit important dimensions of agency. In particular, they are goal-directed, exhibit flexible, experience-dependent behaviour, are adapted to a particular environment, and, in some cases, can produce and carry out plans (Russell & Norvig, 2020).

Various authors have commented on the risks posed by modern AI systems’ integration into the attention economy (Chianella, 2021; Davenport & Beck, 2001; Floridi et al., 2018). I contend that the agential profiles of such systems play an underappreciated role in contributing to these harms. My argument is as follows. Modern AI systems lack the core features we rely on to recognize agency in other systems, such as embodiment and animacy (Carey & Spelke, 1994; Gergely & Jacob, 2012). However, they possess critical features that make it necessary to negotiate agency, where this means continuing to achieve one’s goals in the presence of other agential systems. The dissociation of features that enable recognition of agency and those that demand negotiation of agency gives rise to a particular hazard. That is, it leads to scenarios in which one is engaged in interaction with adversarial agential systems unknowingly. AI systems are adversarial with respect to a human subject if they behave as if they are maximising a performance measure whose value depends on the subject’s behaviour being contrary to their own higher-order goals (Russell & Norvig, 2020). In such cases, especially with highly capable AI systems, there is a danger of human subjects’ higher-order goals being subverted.

Though the argument above is presented in abstract terms, I claim that the description is satisfied by instances of habitual overuse of digital technologies involving AI recommendation systems (Hasan et al., 2018). In particular, I suggest an AI system’s agential profile plays a crucial role in explaining a subject’s maladaptive habitual behaviour when the following conditions are met. First, the subject displays maladaptive habitual behaviour (i.e., engages in habitual behaviour which consistently undermines their higher-order goals) (Bayer et al., 2022). Second, the AI system behaves as though it is maximising a performance measure whose value depends on the subject’s behaviour (Russell & Norvig, 2020). Third, the maximisation of that performance measure depends upon the subject behaving in ways incompatible with that subject’s own higher-order goals (Franklin et al., 2022). Fourth, the agential properties of the AI system make a (significant) difference to its capacity to induce the habitual behaviours in question. Fifth, the difficulty of recognising the agential properties of the AI system makes a difference to its capacity to induce the habitual behaviours in question.

If the argument above is correct, modern AI systems’ agential profiles are crucial to explaining why they are liable to subvert human goals. Such explanations do not compete with, but augment explanations of maladaptive habit formation in terms of the brain mechanisms of habit formation (e.g., Serenko & Turel, 2022). Though I present theoretical reasons supporting the plausibility of my argument, empirical work is needed to assess whether and to what degree it holds in practice. If my claims are borne out empirically, there are implications for ameliorative policies. For example, cues indicating that one is interacting with a system with a particular agential profile could induce more mindful and prudential behaviour, limiting the danger of maladaptive habit formation. In the longer term, assuming society continues to produce and deploy AI systems with unfamiliar agential profiles, refinement of our collective conceptual understanding through education may be required to protect human values from further risks.


Bandura, A. (2006). Toward a Psychology of Human Agency. Perspectives on Psychological Science: A Journal of the Association for Psychological Science, 1(2), 164–180.

Bayer, J. B., Anderson, I. A., & Tokunaga, R. S. (2022). Building and breaking social media habits. Current Opinion in Psychology, 45, 101303.

Carey, S., & Spelke, E. (1994). Domain-specific knowledge and conceptual change. In Mapping the mind: Domain specificity in cognition and culture (pp. 169–200). Cambridge University Press.

Chen, M., Beutel, A., Covington, P., Jain, S., Belletti, F., & Chi, E. H. (2019). Top-K Off-Policy Correction for a REINFORCE Recommender System. Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, 456–464.

Chianella, R. (2021). Addictive digital experiences: The influence of artificial intelligence and more-than-human design. Blucher Design Proceedings, 9(5), 414–425.

Chollet, F. (2021). Deep Learning with Python, Second Edition. Simon and Schuster.

Davenport, T. H., & Beck, J. C. (2001). The Attention economy. Ubiquity, 2001(May), 1.

Floridi, L., Cowls, J., Beltrametti, M., Chatila, R., Chazerand, P., Dignum, V., Luetge, C., Madelin, R., Pagallo, U., Rossi, F., Schafer, B., Valcke, P., & Vayena, E. (2018). AI4People—An Ethical Framework for a Good AI Society: Opportunities, Risks, Principles, and Recommendations. Minds and Machines, 28(4), 689–707.

Franklin, M., Ashton, H., Gorman, R., & Armstrong, S. (2022). Recognising the importance of preference change: A call for a coordinated multidisciplinary research effort in the age of AI (arXiv:2203.10525). arXiv.

Gergely, G., & Jacob, P. (2012). Reasoning about instrumental and communicative agency in human infancy. Advances in Child Development and Behavior, 43, 59–94.

Hasan, M. R., Jha, A. K., & Liu, Y. (2018). Excessive use of online video streaming services: Impact of recommender system use, psychological factors, and motives. Computers in Human Behavior, 80, 220–228.

LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), Article 7553.

Meng, S.-Q., Cheng, J.-L., Li, Y.-Y., Yang, X.-Q., Zheng, J.-W., Chang, X.-W., Shi, Y., Chen, Y., Lu, L., Sun, Y., Bao, Y.-P., & Shi, J. (2022). Global prevalence of digital addiction in general population: A systematic review and meta-analysis. Clinical Psychology Review, 92, 102128.

Moreno, A., & Etxeberria, A. (2005). Agency in natural and artificial systems. Artificial Life, 11(1–2), 161–175.

Okasha, S. (2018). Agents and Goals in Evolution. Oxford University Press.

Popa, E. (2021). Human Goals Are Constitutive of Agency in Artificial Intelligence. Philosophy and Technology, 34(4), 1731–1750.

Russell, S., & Norvig, P. (2020). Artificial Intelligence: A Modern Approach (4th Edition). Pearson.

Serenko, A., & Turel, O. (2022). Directing Technology Addiction Research in Information Systems: Part II. Understanding Technology Addiction. ACM SIGMIS Database: The DATABASE for Advances in Information Systems, 53(3), 71–90.

Swanepoel, D. (2021). Does Artificial Intelligence Have Agency? In R. W. Clowes, K. Gärtner, & I. Hipólito (Eds.), The Mind-Technology Problem: Investigating Minds, Selves and 21st Century Artefacts (pp. 83–104). Springer International Publishing.

Wang, Y., Wang, J., Zhang, W., Zhan, Y., Guo, S., Zheng, Q., & Wang, X. (2022). A survey on deploying mobile deep learning applications: A systemic and technical perspective. Digital Communications and Networks, 8(1), 1–17.


Céline Budding and Carlos Zednik

Does Explainable AI Need Cognitive Models?

Large language models (LLMs), such as ChatGPT (OpenAI, 2022) and Claude (Anthropic AI, 2023), have recently exhibited impressive performance in conversational AI, generating seemingly human-like and coherent texts. In contrast to earlier language modeling methods, transformer-based LLMs do not contain any prior linguistic knowledge or rules, but are trained on next-word prediction: predicting the most likely next word in an input sequence based on large quantities of training data. Despite this seemingly simple training objective, transformer-based LLMs far exceed the performance of earlier methods.
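The next-word-prediction objective can be illustrated in miniature. The sketch below is not an actual LLM: it assumes a hypothetical four-word vocabulary and hand-picked scores, and shows only the shape of the objective, i.e. a softmax over vocabulary scores and a cross-entropy loss on the word that actually came next.

```python
# Minimal sketch of next-word prediction (hypothetical vocabulary and scores):
# the model assigns a score per vocabulary word given the context; training
# minimises the negative log-probability of the word that actually followed.
import numpy as np

vocab = ["the", "cat", "sat", "mat"]
logits = np.array([0.5, 2.0, 0.1, 1.0])   # hypothetical scores for the next word

probs = np.exp(logits - logits.max())
probs /= probs.sum()                       # softmax: a distribution over the vocabulary

target = vocab.index("cat")                # suppose "cat" actually came next
loss = -np.log(probs[target])              # cross-entropy at this position

predicted = vocab[int(np.argmax(probs))]   # greedy decoding picks the top word
```

In a real transformer the logits come from the network rather than being written down, but the objective at each position is exactly this.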

Given this impressive performance, there is an increased interest in not only studying the behavior of these systems, but also investigating the underlying processing and how these systems make their predictions. That is, the focus is not only on the performance, but increasingly also on something like what Chomsky (1965) called the underlying competence. For instance, it has been proposed that LLMs might not just perform next-word prediction based on surface statistics, but that they in fact learn something akin to symbolic rules (Lepori et al., 2023; Pavlick, 2023) or even knowledge (Dai et al., 2022; McGrath et al., 2022; Meng et al., 2022; Yildirim & Paul, 2023). While it might seem appealing to attribute knowledge to LLMs to explain their impressive performance, what seems to be missing in the recent literature is a clear way to determine if LLMs are capable of acquiring a form of knowledge and, if so, when knowledge can be attributed.

Taking inspiration from earlier debates between symbolic and connectionist AI in the 1980s and 90s (e.g. Clark, 1991; Fodor & Pylyshyn, 1988), I propose that tacit knowledge, as defined by Davies (1990, 2015), provides a suitable way to conceptualize potential knowledge in LLMs. That is, if the constraints as set out by Davies (1990) are met by a particular LLM, that system can be said to have a form of knowledge, namely tacit knowledge. Tacit knowledge, in this context, refers to implicitly represented rules or structures that causally affect the system’s behavior. As connectionist systems are known not to have explicit knowledge, in contrast to earlier symbolic systems, tacit knowledge provides a promising way to nevertheless conceptualize and identify meaningful representations in the model internals. The aim of this contribution is to further explain Davies’ account of tacit knowledge, in particular the main constraint, and to show that at least some current transformer-based LLMs meet the constraints and could thus be said to acquire a form of knowledge.

While Davies’ account of tacit knowledge (1990) bears similarities to earlier accounts of tacit knowledge (e.g. Chomsky, 1965) insofar as it appeals to implicit rules, it is targeted specifically at connectionist networks. That is, Davies challenged the claim that connectionist networks cannot have knowledge and proposed that their behavior might be guided by implicit, rather than explicit, rules. To further substantiate his claim, Davies proposed a number of constraints, the most important being causal systematicity. Causal systematicity specifies what kind of implicit rules might be learned by a connectionist system by constraining the internal processing. In particular, causal systematicity requires that the internal processing of a system reflects a given pattern in the data, for example semantic structure. For example, imagine a network that only processes two kinds of inputs: ones related to ‘Berlin’ and ones related to ‘Paris’. In a causally systematic network, there should be one causal structure, called a causal common factor, that processes all inputs related to Berlin, and another processing all inputs related to Paris.
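The Berlin/Paris example can be rendered as a toy network. The sketch below is a deliberately contrived illustration (the weights are hypothetical, and no real network is this clean): one hidden unit mediates all Berlin-related inputs and another all Paris-related inputs, so lesioning the first unit disrupts all and only the Berlin outputs, which is the signature of a causal common factor.

```python
# Toy illustration of a causal common factor (hypothetical weights):
# hidden unit 0 carries every Berlin-related input, hidden unit 1 every
# Paris-related input. Ablating unit 0 should disrupt all and only the
# Berlin outputs -- the hallmark of causal systematicity in Davies' sense.
import numpy as np

W_in = np.array([[1.0, 0.0],    # Berlin-related input feeds hidden unit 0
                 [0.0, 1.0]])   # Paris-related input feeds hidden unit 1
W_out = np.array([[1.0, 0.0],
                  [0.0, 1.0]])

def forward(x, ablate=None):
    h = x @ W_in
    if ablate is not None:
        h = h.copy()
        h[ablate] = 0.0          # lesion one hidden unit
    return h @ W_out

berlin_input = np.array([1.0, 0.0])
paris_input = np.array([0.0, 1.0])

# Lesioning unit 0 abolishes the Berlin output but leaves Paris untouched:
berlin_lesioned = forward(berlin_input, ablate=0)
paris_lesioned = forward(paris_input, ablate=0)
```

In a realistic network with distributed representations the common factor, if any, would be spread across many units, which is precisely why Davies doubted the constraint could be met there.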

For LLMs to be attributed tacit knowledge, they should thus meet the constraint of causal systematicity: their internal processing should reflect the semantic structure of the data. Nevertheless, Davies himself argued that more complex connectionist networks with distributed representation, which LLMs are an example of, cannot meet this constraint. If this objection were to hold, tacit knowledge would be unsuitable for characterizing potential knowledge in LLMs. In this contribution, I challenge Davies’ objection and argue that his objection does not hold for contemporary transformer-based LLMs. More precisely, architectural innovations in the transformer architecture, as compared to the connectionist networks Davies was concerned with, ensure that LLMs could in principle be causally systematic. As such, Davies’ objection to applying tacit knowledge to connectionist systems does not hold for transformer-based LLMs.

So far, I have explained the main constraint that should be met for attributing tacit knowledge and addressed a potential objection. I have not yet shown, however, whether any LLMs in fact meet this constraint. In the final part of my contribution, I analyze recent technical work by Meng and colleagues (2022), who identify representations of what they call factual knowledge in the model internals of a recent LLM. I show that these representations seem to fulfill the requirement of causal systematicity. While further verification of these results is needed, this suggests that some LLMs, like the one investigated by Meng and colleagues (2022), could meet the constraints for tacit knowledge and, as such, can be said to acquire a form of knowledge.
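The method behind Meng and colleagues' localization results can be gestured at schematically. The sketch below is not their actual procedure or model; it is a toy two-layer network illustrating the general causal-tracing idea, with all weights randomly generated: corrupt the input, then restore a single hidden activation from the clean run, and take the degree to which the clean output returns as evidence that the restored activation is causally implicated.

```python
# Schematic sketch of the causal-tracing idea (toy network, not Meng et al.'s
# method): corrupt the input, restore one clean hidden activation at a time,
# and measure how close the output moves back to the clean output.
import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(3, 2))

def run(x, patch=None):
    h = np.tanh(x @ W1)
    if patch is not None:
        idx, value = patch
        h = h.copy()
        h[idx] = value           # restore one clean hidden activation
    return h @ W2

clean_x = np.array([1.0, 0.0, 0.0, 0.0])
corrupt_x = clean_x + rng.normal(scale=2.0, size=4)   # noised input

clean_h = np.tanh(clean_x @ W1)
clean_out, corrupt_out = run(clean_x), run(corrupt_x)

# Distance from the clean output after patching each hidden unit in turn;
# smaller distance = more of the clean output recovered by that unit.
recovery = [np.linalg.norm(run(corrupt_x, patch=(i, clean_h[i])) - clean_out)
            for i in range(3)]
```

A causally systematic reading would predict that, for inputs about the same entity, the same internal locations consistently produce the largest recovery.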

Taken together, the aims of this contribution are as follows: 1) to illustrate why LLMs might be thought to learn more than mere next-word prediction, 2) to propose Davies’ account of tacit knowledge as a way to characterize and provide conditions for attributing a form of knowledge to LLMs, 3) to address Davies’ objection to applying tacit knowledge to connectionist networks, and 4) to illustrate how this framework can be applied to contemporary LLMs. Although this work is exploratory, these results suggest that LLMs are capable of acquiring a form of knowledge, given that they meet the conditions as set out by Davies.


Anthropic AI. (2023). Introducing Claude. Anthropic.

Chomsky, N. (1965). Aspects of the Theory of Syntax. MIT Press.

Clark, A. (1991). Systematicity, Structured Representations and Cognitive Architecture: A Reply to Fodor and Pylyshyn. In T. Horgan & J. Tienson (Eds.), Connectionism and the Philosophy of Mind (pp. 198–218). Springer Netherlands.

Dai, D., Dong, L., Hao, Y., Sui, Z., Chang, B., & Wei, F. (2022). Knowledge Neurons in Pretrained Transformers (arXiv:2104.08696). arXiv.

Davies, M. (1990). Knowledge of Rules in Connectionist Networks. Intellectica. Revue de l’Association Pour La Recherche Cognitive, 9(1), 81–126.

Davies, M. (2015). Knowledge (Explicit, Implicit and Tacit): Philosophical Aspects. In International Encyclopedia of the Social & Behavioral Sciences (pp. 74–90). Elsevier.

Fodor, J. A., & Pylyshyn, Z. W. (1988). Connectionism and cognitive architecture: A critical analysis. Cognition, 28(1), 3–71.

Lepori, M. A., Serre, T., & Pavlick, E. (2023). Break It Down: Evidence for Structural Compositionality in Neural Networks (arXiv:2301.10884). arXiv.

McGrath, T., Kapishnikov, A., Tomašev, N., Pearce, A., Hassabis, D., Kim, B., Paquet, U., & Kramnik, V. (2022). Acquisition of Chess Knowledge in AlphaZero. Proceedings of the National Academy of Sciences, 119(47), e2206625119.

Meng, K., Bau, D., Andonian, A., & Belinkov, Y. (2022). Locating and Editing Factual Associations in GPT (arXiv:2202.05262). arXiv.

OpenAI. (2022, November 30). ChatGPT: Optimizing Language Models for Dialogue. OpenAI.

Pavlick, E. (2023). Symbols and grounding in large language models. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 381(2251), 20220041.

Yildirim, I., & Paul, L. A. (2023). From task structures to world models: What do LLMs know? (arXiv:2310.04276). arXiv.