Artificial intelligence (AI) is expected to occupy an increasingly important place in diagnostic tasks in health care.
The principles underlying learning are similar for human and artificial intelligences, but the respective approaches to diagnosis are markedly different.
Clinicians approach diagnosis in an intuitive and deductive manner, whereas AI is chiefly analytical and inductive.
The wholesale replacement of human intelligence by AI in diagnostic tasks is unlikely, apart from some highly targeted tasks; instead, AI should be considered as a tool to help clinicians in their reasoning.
Artificial intelligence (AI) is often presented as the future of medical practice. The concept of AI was developed in the 1950s and has been defined as “the use of a computer to model intelligent behaviour with minimal human intervention.”1 It is an alternative to human intelligence, particularly as a replacement for the diagnostic skill of physicians. For several years, the scientific literature and lay media have commented that nonhuman intelligence could equal or even exceed human intelligence in diagnostic tasks.2
Human intelligence is evident in the concept of clinical reasoning,3 which has been defined as “the internal mental processes that a physician uses when approaching clinical situations.”4 This central component of physicians’ competence, once honed, allows them to make diagnoses.3 In medicine, clinical reasoning is often understood from the perspective of cognitive psychology’s information processing theory.4 Artificial intelligence may refer to several different methods. Most AI diagnostics are based on machine learning algorithms that are “intelligent” enough to handle difficult and complex problems; these algorithms rely on human intelligence for their creation.5 Recently, substantial progress has been made in this field through the resurgence of neural networks (a family of machine learning methods), and particularly deep neural networks.6 Herein, we focus mainly on machine learning (specifically deep neural networks). We analyze the differences in the ways humans and AI approach diagnostic reasoning to argue that human reasoning will not become obsolete in medical diagnosis.
How do humans and AI perform diagnostic tasks and learn to make diagnoses?
Both humans and AI learn through repeated exposure to clinical cases, referred to as “experiences” for human intelligence and “examples” for AI. For both to develop, feedback based on the intervention of an expert is important. A physician solves most clinical problems in an intuitive and deductive way, whereas AI problem-solving depends on access to large quantities of data that relate to the case, which it processes analytically and inductively.
Deductive versus inductive; intuitive versus analytical
To learn to make diagnoses, medical students must organize their experiences of many clinical cases in long-term memory.4 However, in addition to broad-ranging experience, the development of expertise requires understanding of context and the way in which disease is presented in that context; this is crucial to being able to solve new cases through a generalization process.7 Immediate, appropriate feedback on decision-making consolidates knowledge and enables future clinical reasoning.7
Physicians mainly use a hypothetico-deductive approach to make diagnoses.8 After generating diagnostic hypotheses early, they spend most of their diagnostic time testing them by collecting more data. This approach is underpinned by cognitive processes that, according to the dual-process theory, can be either intuitive or analytical.7 Intuition — sometimes referred to as “pattern recognition” — is a process that works automatically and subconsciously.7,9 It allows humans to generate diagnostic hypotheses early by taking a few pieces of information, associating them and comparing the result with patterns stored in long-term memory.7 These patterns are built through academic and clinical learning experiences, particularly repeated confrontation with similar situations.8 Intuition allows humans to consider only a few solutions — the most likely in the context — among all those that could be considered given the available data. This approach is essential given the limited capacity of the human brain to process information. Most researchers agree that intuitive processes are the main source of generation of diagnostic hypotheses for humans.10
Machine learning, however, depends on the development of an algorithm that “learns” important features from a data set known as a “training set” to then make predictions about other unknown data.11 For the learning to occur, data used for training must be labelled according to their association with the solution; these data are referred to as the “ground truth.” For example, a patient’s physiologic data must be associated with a label indicating whether the patient is sick or healthy. The ground truth is provided by a human expert (most often a physician), either directly (e.g., image annotations) or through documents (e.g., clinical reports). Thus, unlike humans, who know thousands of small pieces of information (often referred to as “common sense”), AI is limited to the specific information provided for a specific task. Furthermore, for every new task, AI systems must usually start from scratch.
Artificial intelligence systems are composed of a model (representing the learned knowledge), a decision function (making it possible to answer the problem when a new input is given) and an evaluation metric (to evaluate the quality of the answer provided by AI compared with the ground truth). In AI, acquired knowledge can be stored in different ways. Deep neural networks are composed of layers of interconnected artificial neurons forming a “model.” The architecture of the network and the weights associated with each connection represent a “decision function.” From an input (e.g., a histopathological image), the neural network provides a prediction as an output (e.g., cancer or not cancer). To learn, the algorithm automatically optimizes its solution by calculating an evaluation metric function, which is basically the difference between the output proposed by the algorithm and the ground truth. In deep neural networks, the error computed by the evaluation metric is back-propagated through the layers of the network, and the algorithm modifies the weights of the connections between the neurons. The process is iterated until the algorithm proposes accurate outputs on the training set.
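The training loop described above can be illustrated with a minimal sketch in Python. It is not a clinical model: the two input features, the labelling rule standing in for expert-provided ground truth, the network size and the learning rate are all assumptions chosen for illustration, and the function names (`make_toy_data`, `init_model`, `forward`, `mean_loss`, `train`) are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def make_toy_data(n, rng):
    """Hypothetical labelled examples: two scaled measurements and a 0/1 label."""
    data = []
    for _ in range(n):
        x = [rng.random(), rng.random()]
        # Assumed rule standing in for the ground truth an expert would provide.
        y = 1.0 if x[0] + x[1] > 1.0 else 0.0
        data.append((x, y))
    return data


def init_model(n_in, n_hidden, rng):
    """The model: weights and biases of one hidden layer and one output neuron."""
    return {
        "W1": [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "W2": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "b2": 0.0,
    }


def forward(model, x):
    """Decision function: map an input to a prediction between 0 and 1."""
    hidden = [
        sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(model["W1"], model["b1"])
    ]
    output = sigmoid(sum(w * h for w, h in zip(model["W2"], hidden)) + model["b2"])
    return hidden, output


def mean_loss(model, data):
    """Evaluation metric: mean squared difference between output and ground truth."""
    return sum((forward(model, x)[1] - y) ** 2 for x, y in data) / len(data)


def train(model, data, epochs=200, lr=0.5):
    """Iterate over the training set, back-propagating the error to adjust weights."""
    for _ in range(epochs):
        for x, y in data:
            hidden, out = forward(model, x)
            # Error signal at the output neuron (squared error, up to a constant factor).
            d_out = (out - y) * out * (1.0 - out)
            # Back-propagate the error signal to each hidden neuron.
            d_hidden = [
                d_out * model["W2"][j] * hidden[j] * (1.0 - hidden[j])
                for j in range(len(hidden))
            ]
            # Modify the weights of the connections between the neurons.
            for j in range(len(hidden)):
                model["W2"][j] -= lr * d_out * hidden[j]
                for i in range(len(x)):
                    model["W1"][j][i] -= lr * d_hidden[j] * x[i]
                model["b1"][j] -= lr * d_hidden[j]
            model["b2"] -= lr * d_out
    return model


if __name__ == "__main__":
    rng = random.Random(0)
    training_set = make_toy_data(200, rng)
    net = init_model(2, 3, rng)
    print("loss before training:", mean_loss(net, training_set))
    train(net, training_set)
    print("loss after training:", mean_loss(net, training_set))
```

Real diagnostic networks differ in scale and architecture, but the same three parts appear: stored weights, a function that turns an input into a prediction, and a metric whose error drives the weight updates.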
Problem solving by AI is thus different from the hypothetico-deductive approach used by humans. Intuitive reasoning is difficult to model or simulate as it is based on experience that bypasses a conscious “orderly sequential analysis” of a situation, which is the core of an algorithm. Therefore, AI uses an analytical approach in an inductive mode (i.e., it systematically moves from data toward the solution).12 Although humans understand cause-and-effect relations, these are not yet modelled in AI. This subject has been studied for a long time in AI, but it is only recently that first attempts to define an AI that “thinks like a human” have been proposed.13
Data
Physicians need very few data (i.e., 2 to 4 pieces of contextual or clinical information) to generate diagnostic hypotheses through intuition.7,14 Subsequently, to verify the hypotheses generated, additional data guided by those hypotheses are collected through the interview, clinical examination and additional tests. Through “semantic transformation,” human intelligence converts data collected during the patient interview into a form that can be processed.15 For example, clinicians might transform “the first time” into “inaugural,” or “several episodes” into “iterative.”
Most AI systems do not model intuition and therefore require substantial data to make a relevant diagnosis.12 This is why AI is presently most effective in situations where all the data of the problem to be solved are immediately accessible, such as in medical imaging. Artificial intelligence also requires data transformation, but in AI this is a much more complex and time-consuming process. Through data integration or data preprocessing, the data must be transformed into a computable form, which means that all information needs to be digitized and categorized to be interpreted by the machine. This is one of AI’s great challenges.16
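As a rough illustration of what digitizing and categorizing information can involve, the sketch below converts a small patient record into a numeric vector using one-hot encoding, a common way to represent categories as numbers. The field names (`age_years`, `episode`), the example records and the helper functions are hypothetical and are not drawn from any cited system.

```python
def build_vocabulary(records, field):
    """Collect the distinct categories observed for one field, in a fixed order."""
    return sorted({record[field] for record in records})


def one_hot(value, vocabulary):
    """Represent a category as a list with a single 1 at the category's position."""
    return [1 if value == category else 0 for category in vocabulary]


def encode(record, vocabularies, numeric_fields):
    """Turn one record into a flat numeric vector a learning algorithm can read."""
    vector = [float(record[field]) for field in numeric_fields]
    for field, vocabulary in vocabularies.items():
        vector.extend(one_hot(record[field], vocabulary))
    return vector


if __name__ == "__main__":
    # Hypothetical records; the field names and values are illustrative only.
    records = [
        {"age_years": 54, "episode": "inaugural"},
        {"age_years": 31, "episode": "iterative"},
    ]
    vocabularies = {"episode": build_vocabulary(records, "episode")}
    for record in records:
        print(encode(record, vocabularies, ["age_years"]))
```

Even this toy case requires a decision about every field in advance, which hints at why preprocessing free-text interviews and heterogeneous records is so laborious.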
How do humans and AI misdiagnose?
The rate of diagnostic errors in medical practice is estimated at about 5%–15%, depending on the specialty.17 This translates into more than 12 million misdiagnoses annually in the United States alone.18 Cognitive biases are considered to be the cause of most diagnostic errors19 and many biases have been reported in the medical scientific literature.8 Premature closure bias (i.e., the tendency to stop considering other hypotheses after reaching a diagnosis) is considered to be the most common.20 Three other common biases are anchoring bias (the tendency to focus early on 1 or more salient features of the initial presentation of the problem and failure to change this first impression in the light of data gathered later), availability bias (the tendency to consider diagnoses that are easy to remember, often because they have recently been made, as more likely) and confirmation bias (the tendency to consider only confirmatory data in relation to the generated hypothesis, while ignoring or underestimating contradictory data).8
In most instances, the error rate for AI can be calculated accurately by comparing the results provided by the AI model to expected results (considered to be the truth).21 Errors in AI are not comparable to human errors as they mostly result from problems that arise during the learning step, usually poor training data quality or an irrelevant evaluation metric.22 Having a data set that expresses the entire variety of the data and the real associations between them, and that does not contain misclassified examples and does not present any bias that could lead the AI to learn false assumptions, is essential. Other sources of errors, imprecisions or uncertainty could include the use of an inappropriate model (e.g., unable to represent the knowledge to learn) or poor experimental design (e.g., stopping learning too early).
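The comparison of model outputs with expected results can be sketched as follows, assuming binary labels (1 for disease present, 0 for disease absent). The function names and the example labels are hypothetical. The sketch reports the overall error rate together with sensitivity and specificity, 2 standard measures of diagnostic accuracy.

```python
def confusion_counts(predicted, truth):
    """Count true/false positives and negatives for binary labels (1 = disease)."""
    tp = sum(1 for p, t in zip(predicted, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(predicted, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(predicted, truth) if p == 0 and t == 1)
    tn = sum(1 for p, t in zip(predicted, truth) if p == 0 and t == 0)
    return tp, fp, fn, tn


def diagnostic_metrics(predicted, truth):
    """Compare predictions with the ground truth and summarize the errors."""
    if len(predicted) != len(truth) or not truth:
        raise ValueError("predicted and truth must be non-empty and of equal length")
    tp, fp, fn, tn = confusion_counts(predicted, truth)
    return {
        # Share of all cases the model got wrong.
        "error_rate": (fp + fn) / len(truth),
        # Share of truly diseased cases the model detected.
        "sensitivity": tp / (tp + fn) if (tp + fn) else None,
        # Share of truly healthy cases the model cleared.
        "specificity": tn / (tn + fp) if (tn + fp) else None,
    }


if __name__ == "__main__":
    # Hypothetical model outputs and expert-provided labels for 8 cases.
    predicted = [1, 1, 0, 0, 1, 0, 0, 0]
    truth = [1, 0, 0, 0, 1, 1, 0, 0]
    print(diagnostic_metrics(predicted, truth))
```

These figures are only as trustworthy as the labels they are compared against, so misclassified examples in the ground truth distort the measured error rate as well as the training.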
What evidence supports the role of AI in medical diagnosis?
Artificial intelligence was shown to be capable of classifying skin cancers with a level of performance comparable to that of dermatologists when it was trained using a data set of nearly 130 000 images and then tested on its ability to distinguish between 2 common cancers and between a benign and a malignant lesion.2 Artificial intelligence was able to detect diabetic retinopathy just as well as 8 ophthalmologists, while providing more consistent interpretation, high sensitivity and specificity, and an instantaneous result, following training using a data set of nearly 130 000 retinal images and validation using 2 further data sets.23 In an evaluation of more than 30 deep-learning algorithms, 7 diagnostic algorithms were shown to be better than 11 histopathologists at diagnosing breast cancer metastases to lymph nodes in images of tissue sections when human specialists and AI were similarly time constrained.24 An AI algorithm trained on a data set of more than 100 000 images was better than specialist radiologists at detecting pneumonia using chest radiographs.25 A machine-learning framework was trained to perform better than emergency medical dispatchers in recognizing cardiac arrest in emergency phone calls.26
What are the criticisms of AI in medical diagnosis?
Many studies conducted in the field of medical AI have been criticized for lack of scientific rigour, an unsatisfactory evaluation process or insufficient information reported in the methods.27 Moreover, the scientific literature skews toward publishing successful projects, whereas failures, if they are reported at all, tend to appear only on blogs or in consumer articles. These concerns undermine trust in AI.
A recent article28 described 4 essential characteristics for trusting AI systems: fairness (training data and models must be free of bias to avoid unfair treatment of certain groups of patients), robustness (AI systems should be safe and secure), explainability (decisions provided by AI must be understandable by their users) and transparency (AI systems should include details of their development, deployment and maintenance). Explainability is perhaps the most challenging issue to solve. Although it is usually possible to explain physicians’ reasoning and the origin of their decisions, many of the most powerful AI methods (e.g., deep neural networks) are often criticized for being a “black box.”29 Currently, machine learning on medical data most often takes the form of retrospective analysis of large routinely collected data sets with careful scrutiny of the results proposed by the AI.
An active and fast-growing field of AI seeks to make AI decisions explainable and understandable by users, with many preliminary research studies being conducted to reach this goal.30–32 Another challenge is to propose robust machine-learning methods.33 Meta-learning34 and transfer learning35 are 2 promising avenues of research to help AI “remember” something and to learn “how to learn.”
Future directions
Several studies have shown the extent to which AI can be used to make and support diagnosis in medicine. Since current evidence supports the effectiveness of AI for only a small selection of diagnostic tasks and human experts remain able to learn and diagnose a wide array of conditions, human intelligence would seem to remain essential to diagnosis for now. However, the consistency with which AI can be trained to perform diagnoses when exposed to similar data independent of context — with errors fixable by improving the quality of data supplied for learning — supports the continued development of AI diagnostics. Physicians’ reasoning has been shown to be sensitive to factors such as fatigue, sleep deprivation, interruptions, cognitive overload, noise or psycho-emotional status,10 and to be influenced by cognitive biases,17 with human error impossible to eliminate entirely and even difficult to reduce substantially.8 AI is becoming, and will continue to develop to be, a useful tool to mitigate human error and improve quality in medical practice. Yet the idea that AI is able to learn on its own and will replace physicians is a myth that needs to be deconstructed.36,37 The potential of AI in medicine can be realized only if it is designed by the collaborative human intelligence of a physician and a data scientist.38
Because human and artificial intelligences are different and complementary, it is unlikely that AI will entirely replace the physician in the resolution of clinical problems. Artificial intelligence will be among the tools available to physicians seeking to make a diagnosis, to help with reasoning, reduce diagnostic uncertainty and augment shared decision-making, which also involves other health professionals and the patient. Diagnostic uncertainty is common in medical practice.39 Artificial intelligence can enable physicians to favour one diagnostic hypothesis over another or to generate hypotheses that they had not previously considered.
The tasks facing stakeholders in the development of AI, among whom physicians will play a central and essential role, will be improving the quality and accessibility of medical data that can be used as a source of learning for AI while carefully respecting ethical considerations; being able to explain the results produced by AI to human intelligence; overcoming physicians’ resistance related to fears of being downgraded when certain diagnostic tasks no longer rely solely on their intelligence; and training medical students early on in the integration of AI tools into their diagnostic practice, which implies extracting themselves from a historical and firmly rooted posture of the physician-centred diagnostic process.40 Under these conditions, AI can assume its place as a routine tool in medical practice.
Acknowledgement
This article is partly based on a lecture given by the first author at the congress of the French National Society of Internal Medicine on June 6, 2019, and published in the congress proceedings.
Footnotes
Competing interests: None declared.
This article has been peer reviewed.
Contributors: All of the authors contributed to the conception of the manuscript, drafted the manuscript, revised it for important intellectual content, gave final approval of the version to be published and agreed to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.