Challenges Posed by Big Data on the Foundations of Scientific Thinking

The 16th Berggruen seminar, Challenges Posed by Big Data on the Foundations of Scientific Thinking, was held at OWSpace in Beijing on September 28, 2021. Around 60 participants attended the event in person, while the number of viewers watching the livestream via Bilibili peaked at 14,000.

The event was moderated by Bai Shunong, 2020-2021 Berggruen Fellow and Professor at the School of Life Sciences at Peking University. The keynote presentation was given by Dr. Wu Jiarui, researcher at the Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences. Dr. Wu is also the current Executive Dean of the School of Life and Health Sciences at the Institute for Advanced Study, University of Chinese Academy of Sciences (UCAS) Hangzhou and the Director of the Key Laboratory of Systems Biology at the Chinese Academy of Sciences. Throughout his decades-long academic career, he has focused on how to combine the fruits of frontier research with philosophical inquiry. His research seeks to understand life and the world through the mutual inspiration and questioning that arise when science and philosophy meet.

In his view, science is founded on humankind’s pursuit of certainty. Its goal is to understand, explain, and control the physical world. This pursuit of certainty is expressed primarily as research into the causal relationships behind the occurrence and development of events, and its foundational thinking is built upon determinism. The arrival of the big data era, however, has posed huge challenges for determinism: all things are interconnected and mutually interacting, forming an open and borderless “Internet of Things.” Relationships among things, therefore, are correlative rather than causal. Causality is but a scientific illusion of the “small data era.” What big data brings is not the comprehensive and certain truth of the classical scientific era, but rather incomplete knowledge that can be repeatedly revised, or “iterated.”

Determinism: Foundations of Scientific Thinking
Pierre-Simon Laplace, the great 18th-century French mathematician, once said: “We may regard the present state of the universe as the effect of its past and the cause of its future. An intellect which at any given moment knew all of the forces that animate nature and the mutual positions of the beings that compose it, if this intellect were vast enough to submit the data to analysis…the future, just like the past, would be present before its eyes.” Dr. Wu believes that this determinism constitutes the foundational thinking underlying conventional scientific research.

At the ontological level, this determinism takes the form of reductionism. As Erwin Schrödinger pointed out in his 1944 treatise What is Life?, “the events that happen within [an organism] must obey strict physical laws.” This inspired a huge number of physicists and chemists, who realized that life is not mysterious: its foundations are simply molecules, namely genes and proteins, so physics and chemistry could be brought to bear on the science of life as well. As a result, the field of molecular biology emerged in the 1950s and quickly became a highly productive and prominent frontier approach to biological research. Protein structures were explained through chemistry, and the mechanisms by which viruses infect cells were subjected to biochemical analysis. Discovering the causes and mechanisms of physiological and pathological processes became the most important research direction in the life sciences, based on a reductionist, deterministic understanding of life.

At the epistemological level, the ultimate goal of modern biological research driven by determinism, as Francis Crick put it, was to use physics and chemistry to explain all biological phenomena. This turned biology, traditionally a descriptive science, into an experimental science driven by hypotheses. A closely related outcome was that reductionism assumed a dominant position: complex life systems were to be understood by deconstructing them into parts that were then analyzed individually. Put simply, throughout most of the twentieth century, life was regarded as a big machine. Once we knew how each part worked, life would no longer be mysterious and unknowable. The central task of the biologist was to propose hypotheses that would be tested through well-designed physical and chemical experimentation.

However, traditional molecular biology’s conceptions of life were challenged in the late 20th century by the emergence of life sciences research based on big data.

Big Data: Open Science
Karl Popper once said that all scientific propositions must be falsifiable; unfalsifiable theories cannot become scientific theories. Imre Lakatos later amended this theory and introduced the concept of “research programmes.” Molecular biology adhered to this principle. Guided by its research programme, it proposed a set of “hard core” principles that were not open to empirical refutation (for example, the central dogma, under which DNA is transcribed into RNA and RNA is translated into proteins), together with auxiliary hypotheses that could be tested, modified, and updated. This is a characteristic of life sciences methodology that is based on reductionism.

However, the Human Genome Project, which began in the 1990s, used another method. The researchers hoped to use big data to thoroughly understand the human genome, i.e. the genetic components of 46 chromosomes, and to decipher the mysteries of 3 billion base pairs. In 2001, the project published a draft of the human genome that was roughly 95% complete, a milestone in life sciences research. Even today, the project has yet to be fully completed. Because our understanding of chromosomes remains limited, research may continue indefinitely, iterating and improving so that knowledge keeps advancing. Big-data-based “iterative” methodology differs from traditional forms of research. It does not propose a core framework or principle, but instead relies on discoveries in the data to constantly update core perspectives, grasp the “continuity” of knowledge, and look towards open and unpredictable outcomes.

When the life sciences shifted from reductionism to “iterative” methodologies, they underwent what Thomas Kuhn called a “paradigm shift,” in which hypothesis-driven research paradigms give way to data-driven methodologies. The “Big Data+” research model is revamping our understanding of the relationships among diseases and of their pathologies, and is driving significant advances in the field.

Big Data: A Correlative World
Conventional biological research founded upon determinism is often geared towards seeking the molecular causes of pathological activity. However, in many circumstances, we can at best discover a sufficient condition for a physiological or pathological event. The pursuit of causal relationships that are certain, sufficient, and necessary is extremely challenging.

Meanwhile, contemporary life sciences research based on big data no longer seeks only specific causal relationships, but rather looks for broader and more precise descriptive correlations. One example is how diet affects the occurrence and development of disease: compared to those who do not, those who consume fruit daily have a 40% lower chance of dying from cardiovascular disease and a 34% lower chance of dying from coronary heart disease.

Views of life based on big data subscribe to a form of gridded causal inference. Each of our 20,000-odd genes may contribute, to a greater or lesser degree, to the formation of a disease. Every part of an organism participates in life processes holistically. Under this interpretation, correlation can be significantly extended, but causality is immensely difficult to pin down. Each of us is like a living network where all things are interconnected, while life is a complex, layered system constituted by genes, proteins, cells, organs, and, finally, the organism.

Dr. Wu argues that the biggest challenge faced by the life sciences comes from the conflict between researchers’ deterministic thinking and the daily occurrences of life. Humankind continues to hope for definite answers, causal relationships, and the interpretability of the world. But in reality, life is an open and complex system. There is no creator that mapped out each step in the evolution of life on earth four billion years in advance. Evolutionary processes are, for the most part, uncertain. Put differently, happenstance is the true driver of life. This view happens to coincide with the ideas that Professor Bai Shunong has espoused and advocated in the column Baihua.

After the keynote presentation, Professor Bai expanded upon Dr. Wu’s views and suggested that researchers should consider closely the origins of our preference for “certainty,” free themselves from a narrow view of life, and truly attempt to understand the physical and intellectual world from the perspective of randomness.

In the future, the Berggruen Research Center at Peking University will continue to explore scientific and philosophical conceptions of “life” in an era of change. We hope to leave behind one-sided interpretations and instead seek “uncertain” answers within a broader natural world, so that concepts can be iterated and ideas renewed in the era of big data.

Original article in Chinese by Li Zhilin
Translation: Intern, Berggruen Research Center at Peking University
Additional editing: Christopher Eldred and Sarah Gilman