Exclusive Interview with Professor Härdle at Fudan University

You can read the full transcript below:

1. About Machine Learning

Q1: First of all, it is a great honor to invite Professor Härdle to Fudan again. Looking back at your previous courses on machine learning and financial technology, what is the core content you hope students will learn from them?

Härdle : Thank you very much. In fact, fintech is essentially a data problem with very high dimensions and a large number of features, which requires researchers to devote their energy to intensive data analysis. For this reason, we need to use machine learning or, more broadly, quantitative methods for the non-parametric approximation of very high-dimensional objects to discuss fintech problems. The core content of our course is to convey the applicability of such statistical techniques, especially to clarify which areas in modern data science are more suitable for researchers to apply them to.

Q2: As an expert in semiparametric and nonparametric estimation methods, what do you think of machine learning and deep learning algorithms? Why did these methods not achieve the same success in applications in the 80s and 90s as they do today?

Härdle : The research methods of machine learning and deep learning have actually been around for a long time, but people didn’t call them that in the past. Everyone is talking about electric cars now, yet electric cars were already being produced in Berlin, Germany, about 100 years ago. The same is true for machine learning and deep learning techniques. For example, the neural network method in machine learning uses logistic regression and gradient descent, and logistic regression, as a member of the exponential family, was introduced into the corresponding research very early. As for the point you raised, machine learning and deep learning were not unsuccessful in the 80s and 90s. Statistical techniques were also well developed at that time, but now we have more data than in the past and get better results than before.
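
To illustrate the remark that a neural network builds on logistic regression and gradient descent, here is a minimal sketch, not taken from the course material. It treats a single sigmoid neuron as a logistic regression and fits it by plain gradient descent on the mean log-loss. The function names, the learning rate, the number of steps, and the toy data are all assumptions made for illustration.

```python
import math

def sigmoid(z):
    """Logistic function: maps a real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.5, steps=2000):
    """Fit p(y=1 | x) = sigmoid(w*x + b) by gradient descent on the mean log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Residuals (predicted probability minus label) drive both gradients.
        residuals = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(r * x for r, x in zip(residuals, xs)) / n
        grad_b = sum(residuals) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Made-up toy data: larger x makes y = 1 more likely, with some overlap
# so that the maximum likelihood estimate is finite.
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 1, 0, 1, 1]
w, b = fit_logistic(xs, ys)
prob_high = sigmoid(w * 2.0 + b)   # fitted probability at x = 2
prob_low = sigmoid(w * -2.0 + b)   # fitted probability at x = -2
```

Stacking many such units and composing them in layers gives a neural network; the update rule stays the same gradient step, only the gradient is computed through more layers.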

Q3: You emphasize in class that machine learning methods should be described as “algorithms” rather than “models.” What do you think is the difference between a “machine learning algorithm” and a “statistical model”?

Härdle : Algorithms and models are different. Models are theories based on certain assumptions. In models, it is necessary to consider how variables on both sides of the equation interact, whether there are confounding variables in the model, or whether instrumental variables are needed. In contrast, machine learning algorithms use mathematical methods to calibrate researchers’ alternative models or create models from data.

As for what kind of problems machine learning is suitable for solving, I think it is those problems that have high complexity and require flexibility. For example, Newton’s law of gravitation explains the fall of an apple: gravity is the force behind it. Such equations are relatively easy to calibrate. But social systems are much more complicated, for example the design of government policies on taxation or real estate. This may be where machine learning methods can play a useful role.

Q4: What are the application prospects of machine learning in economic research? How can it help economists solve problems in a better way?

Härdle : There are countless possibilities for the application of machine learning in economics. If machine learning methods can be effectively applied, there is an opportunity to provide key statistical references for government economic decision-making. However, the first problem for economists is to create models. We need to conduct in-depth data analysis to create models. Machine learning, as a model creation tool, can help with this.

Q5: With the development of data analysis technology represented by machine learning, the research paradigms of many disciplines have also changed. In addition to the integration with traditional science and engineering, computational social science, which combines big data technology with traditional social science, has also gradually emerged. What do you think about this?

Härdle : This question is closely related to the previous one. Let me reiterate that machine learning is a non-parametric or semi-parametric mathematical tool that simply goes by a specific name. The basic issues that scholars are concerned about remain unchanged, but machine learning gives us more insight into data analysis, such as which feature variable has the highest loading and which feature variable carries the most explanatory information. Current hot topics such as Shapley values and Explainable AI all need further research.

2. About the Big Data Era

Q6: What is the core content you hope students will learn from your course “Smart Data Analytics”?

Härdle : The course is called “Smart Data Analytics” because I want students to use their own judgment, combined with the statistical methods they have mastered and analytical thinking, to analyze data. I want students to be familiar with data analysis code, real-world economic problems (such as scoring or time series forecasting), and the relationship between the two. With this knowledge, students can conduct data analysis independently in their future careers.

Q7: Some people may think that all economic phenomena can be described by data. How do you view the impact of big data on econometrics? And in the era of big data, how do you view the relationship between statistics and econometrics?

Härdle : In my opinion, the biggest challenge facing economics is actually the heterogeneity of us humans in time and space. Unlike Newton’s apple in physics or other natural science experiments, the observational data in economics will never remain independent and identically distributed in time and space. I can’t imagine that my grandmother has the same preferences every day when buying bread, milk and butter. Therefore, since an independent and identically distributed world cannot be observed, microeconomic theory cannot really be proved by data analysis techniques. Of course, if we turn to group behavior preferences, some mass phenomena can be identified through information such as the emotions or tone expressed in social networks. The same mathematical techniques are used to detect fake news and to analyze the sentiment of text in news streams. Mass phenomena like these can be created and identified through machine learning techniques such as Markov chains. In my course at Fudan, I also introduced a case of using machine learning methods to analyze fake news. For example, an earlier analysis of the Twitter stream of former US President Trump was carried out using Markov chains.
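
As a rough illustration of how a Markov chain can be applied to a text stream, the sketch below estimates first-order word transition counts from a tiny made-up corpus and scores new word sequences by their smoothed log-likelihood. It is not the analysis from the course: the corpus, the function names `fit_chain` and `log_likelihood`, and the add-alpha smoothing are assumptions chosen only to show the idea that text which does not fit the learned transitions receives a lower score.

```python
import math
from collections import Counter, defaultdict

def fit_chain(sentences):
    """Count first-order word transitions (previous word -> next word)."""
    counts = defaultdict(Counter)
    for words in sentences:
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def log_likelihood(counts, words, vocab_size, alpha=1.0):
    """Sum of log transition probabilities, with add-alpha smoothing."""
    total = 0.0
    for prev, nxt in zip(words, words[1:]):
        row = counts.get(prev, Counter())
        prob = (row[nxt] + alpha) / (sum(row.values()) + alpha * vocab_size)
        total += math.log(prob)
    return total

# Hypothetical headline-style corpus, invented for this example.
corpus = [
    "markets rally on strong earnings".split(),
    "markets fall on weak earnings".split(),
    "markets rally on rate cut".split(),
]
vocab = {word for sentence in corpus for word in sentence}
chain = fit_chain(corpus)

typical = "markets rally on weak earnings".split()
unusual = "earnings on rally markets weak".split()
score_typical = log_likelihood(chain, typical, len(vocab))
score_unusual = log_likelihood(chain, unusual, len(vocab))
```

Here `score_typical` is higher than `score_unusual` because the first sequence follows transitions seen in the corpus. A real study would need a far larger corpus and a carefully chosen decision rule.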

Q8: With the widespread application of big data in the financial field, what changes will occur in the era of financial technology? What are the shortcomings of traditional financial statistics? What responses are needed for risk control in the era of big data?

Härdle : This is a very important issue in future practice. As the data pool of FinTech continues to expand, the first problem that arises is lag. If we only consider the process of collecting data, the problem of lag is not serious; but the more important issue is to process the data in an acceptable (short) time. Obtaining data and processing it in a short time has now become an interaction problem between IT practitioners and customers. Today’s mobile customers include a large group of young people who are impatient and want to get answers immediately after pressing a button. The development of Chinese FinTech to the extent that it can effectively provide instant feedback is admirable. Therefore, we need well-educated data engineers who can effectively integrate computer science, statistics, economics and modern data science. In fact, this is exactly what we hope to do in this course at Fudan: to train future data engineers.

Q9: As you said, FinTech is changing our world. In your opinion, what opportunities and challenges will it bring to the construction of the social credit system, ecological development, and urban innovation? And how should we respond?

Härdle : I completely agree that FinTech is changing the world. In fact, FinTech is also challenging our moral system. Each country or region has its own data protection rules, and may also have data protection concepts and technical solutions. In essence, the development model of FinTech is different in different parts of the world. Therefore, the change of the times brought by FinTech is indeed a global challenge. But although the situation in each country is different, we need to look at this issue in the long run. Perhaps a certain type of “social group” can control or even expand the data pool, which will not only help its own members, but also benefit the interests of members in neighboring countries. So FinTech will indeed change the way we provide financial services in the future, but it will challenge our moral system even more. There are still many possibilities for the future development of FinTech.

Q10: What do you think of the development of financial technology in China?

Härdle : China is very advanced in this regard. To be more specific, we visited the WeBank Technology Center in Shenzhen a few years ago. We were very impressed by WeBank’s technical level, customer understanding, and application of technical tools. If other countries are willing to copy China’s successful experience in this regard, there will be a bright future for all countries in the world. For example, social network-based scoring or the use of digital payment tools can be easily done on mobile devices in China, but in other countries this is very complicated. I admire China’s excellent system.

Q11: The 2021 Nobel Prize in Economics was awarded for “empirical research based on causal inference methods.” Could you explain what causal inference is and what its contributions are? Additionally, how does this empirical research method impact our examination of today’s complex global economic phenomena?

Härdle : Thank you, this is a very up-to-date question. Many scholars, including me, studied the problem of causal inference in economics in the 1980s and 1990s. Oliver Linton of Cambridge University in the UK, Professor Yang Lijian of Tsinghua University and I wrote many papers on local linear models in the 1990s. There are a lot of studies in economics on this wonderful method, which combines the complex non-parametric part of the model with a low-dimensional parametric representation. So technically, causal inference has existed since the 1930s and 1940s. But as the geometric mathematician Peter Scholze said, every 30 or 40 years, some old theories will be restated and extended. The basic ideas of those 19th-century mathematicians and statisticians are still driving the development of science in the 21st century.

3. Personal research and future recommendations

Q12: How do you define and recognize your identity? Is it mainly as a statistician, mathematician, or economist? What prompted you to switch from theoretical statistics to financial statistics research, and now become an expert in machine learning? What is the biggest challenge in your career transition? Do you have any advice for our students and young scholars who are just starting out?

Härdle : My academic identity depends on the context, of course, but also on the career stage or platform I am in. At the beginning of my academic career, I was actually a theoretical mathematician. I wanted to write my doctoral thesis in the field of number theory or algebraic theory, but my first job was actually in biostatistics. I needed to perform statistical analysis on the EEG of young children and classify their behavior using frequency analysis techniques such as spectral analysis. In this way, I became proficient in some statistical software and learned about mathematical tools such as Fast Fourier Transforms… all of which I had never heard of before. My next job was at a mathematical economics institute. The institute wanted to conduct a series of studies in the field of mathematical economics, but lacked scholars familiar with data analysis. So I was hired as a data scientist at the institute, and I also learned that logistic regression or binary response models in biostatistics go by a different name in economics (discrete choice models). Both use the same maximum likelihood method for estimation, and at their core, they are based on mathematics. There is a famous quote: “Mathematics is both the queen and the servant of science.” (This quote is from Eric Temple Bell’s 1951 book.) I think you should look at this issue this way.

For data scientists, it is not difficult to switch from one research area to another. When I was working at the Center for Economic Risk Research at Humboldt University in Berlin, we were studying neural data for economic analysis, using the so-called functional magnetic resonance imaging (fMRI) to deal with the risk-perceived investment decision (RPID) task. The question here is: How does the investor’s brain process signals? For example, when you see the Yahoo Finance curve on your browser, will you buy Yahoo shares? Will you take investment risks? For me, my research direction did not actually change. A banker in Berlin asked me for a consultation and wanted us to analyze dynamic implied volatility, etc. This may be the opportunity for me to become a fintech researcher. I am often invited to talk about data science in finance. In fact, I was just looking for the truth: looking for the structural properties of data, which made me a data scientist.

Q13: The courseware you use comes from Quantinar, a website founded by your team. The website has rich learning resources. Can you tell us about the creation of this website, why you created this open course knowledge platform, and the subject areas of the knowledge points on the platform?

Härdle : We have been working with Springer for 20 to 30 years. For 20 years, all the books we have published with the publisher have carried a Quantlet logo. If you have to rewrite the code every time you encounter a Black-Scholes equation or an integral equation, it is not only troublesome but also prone to errors. The purpose of our website Quantlet is to provide a transparent, vertical knowledge space, which contains not only courseware handouts but also the book materials our team has produced in cooperation with Springer. Each book provides a link to a web page where the corresponding program code is hosted, which is the structure of the Quantlet website. More than 20 years ago, we called this website Explore, hoping to express the meaning of “exploratory regression analysis”.

The word Quantinar is the abbreviation of Quantitative Seminar. The Quantinar website was developed during the COVID-19 pandemic, but the basic idea of the website already existed in the earlier Explore website. This idea is called “Autopilot Support System” (APSS). In fact, all the books we have written in collaboration with Springer have been converted to our website. There, users can browse the code and reproduce the code examples through the Java client. Quantinar was launched by my doctoral student Raul Bâg and Professor Cathy Yi-Huan Chen of the University of Glasgow. Our team has developed some core courses on the website, one of which is similar to the course I am teaching at Fudan University this time, called Digital Economy Decision Analytics (DEDA). With Quantinar, users can combine short courses very flexibly to build their own course system. Right now, the website is still in the introduction stage, so we do not charge for knowledge, but provide it to learners free of charge. But we plan to put the website on the blockchain in the future. If users upload learning materials or write code, they can get tokens, and downloading knowledge materials will require tokens.

Our website is fully ready for future blockchain deployment. We also upload other courses to this website, such as Clustering, Statistics of Financial Markets, or topics related to Financial Econometrics, among others. The short courses within Quantinar can cover all the interesting content in modern digital finance and fintech.
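
The point about not rewriting Black-Scholes code every time can be made concrete with a small reusable function. The sketch below is not taken from Quantlet or Quantinar; it is a minimal version of the standard Black-Scholes price of a European call on a non-dividend-paying asset, with the put obtained from put-call parity. The function and parameter names are assumptions made for illustration.

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(spot, strike, rate, vol, maturity):
    """Black-Scholes price of a European call (no dividends).

    rate and vol are annualized; maturity is in years and must be positive.
    """
    sqrt_t = math.sqrt(maturity)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol ** 2) * maturity) / (vol * sqrt_t)
    d2 = d1 - vol * sqrt_t
    return spot * norm_cdf(d1) - strike * math.exp(-rate * maturity) * norm_cdf(d2)

def bs_put(spot, strike, rate, vol, maturity):
    """European put price derived from the call price by put-call parity."""
    call = bs_call(spot, strike, rate, vol, maturity)
    return call - spot + strike * math.exp(-rate * maturity)

# Example inputs (made up): at-the-money option, 5% rate, 20% volatility, 1 year.
call_price = bs_call(100.0, 100.0, 0.05, 0.2, 1.0)
put_price = bs_put(100.0, 100.0, 0.05, 0.2, 1.0)
```

Keeping such a function in one shared, versioned place means every handout or book chapter that needs an option price calls the same tested code instead of a fresh copy.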

Q14: How do you view the relationship between theory and empirical analysis in economics? Few studies can combine theoretical analysis with empirical testing well. What kind of research do you think is good research?

Härdle : When I started my academic career, there was an important research topic called “learning from data”. That is, you have to use the characteristics of the data itself to make judgments and infer the appropriate model from the data characteristics. At the same time, the model can also tell researchers what data should be collected and where the limitations of data analysis are. Of course, we cannot specify whether good research is more theoretical or practical. In my opinion, research is a constantly changing job because data is constantly changing. Not only is the data itself not stationary, but the model itself will also be improved. Therefore, the underlying concept of research is the interactive balance between “what you see” and “how to interpret it”, which can also be called the “competition” between “data” and “model”. In research, we must always maintain a strong balance between these two research elements, both of which will drive academic research and social development forward.

Q15: You have trained dozens of outstanding doctoral students during your teaching career, and have also established stable academic cooperation relationships with many young talents from China. What kind of scientific research quality do you appreciate most in your students, and what kind of peers are you more willing to cooperate with?

Härdle : OK, let me first briefly review my academic path and research ideas. I was lucky to be involved in the research topic of nonparametric smoothing or nonparametric regression, which has great potential in both model analysis and data use. At the age of 35, I was very lucky to be appointed as a full-time professor in Belgium, and then to teach at Humboldt University in Berlin. I also held professorial appointments at Stanford University and the University of North Carolina at Chapel Hill, all of which focused on the interaction between data and models. The first research setting in which I encountered Chinese scholars was the Collaborative Research Center of Quantification and Simulation of Economic Processes. This research center aimed to study the economic synergy and convergence between East and West Germany, as well as the development changes in Europe over the past decades. I was fortunate to lead a team to study the issue of “economic risk”. It is undoubtedly helpful for Chinese students to be exposed to a large number of English-speaking scholars from the United States. My advice to young Chinese scholars, and especially to the Chinese teachers who teach these younger generations, is this: use more English in teaching and discussion. Only in this way can we create an academic atmosphere that is completely different from domestic teaching and that has a more interactive spirit.

Q16: For those students who are interested in doing econometrics and machine learning research, can students with a background in mathematics or computer science make more progress than those with a background in economics? So, for students with a background in economics, if they are interested in doing econometrics and machine learning research, what should they do to improve faster?

Härdle : This is an interesting question, but it has to be discussed separately for the different training programs in different countries. Take the Netherlands as an example. They have a great applied econometrics program, and almost every econometrician there can be called a computer scientist or mathematician. There are such programs in the United States as well, depending on which university you are in. But I believe you are more interested in the situation in China. It is difficult to predict future developments, but there will always be a tendency for an institution to be more theoretical or more practical. We need to reflect on and monitor questions such as: What kind of students do we want to train? Are their careers successful? Before giving clear suggestions and answers to your questions, teaching institutions must track students’ careers a little. As for my course, I can say that everyone likes it very much (at least from what I heard at lunch) because it fills a gap in the field of knowledge to some extent. As for what this gap is, I leave it to you to summarize it.

Q17: You attach great importance to the interaction with students in class, and you have been trying to use a lively and interesting teaching style to keep students focused and keep up with your pace. Do you think the interaction of students in class is important? Should most classroom teaching require students to actively participate in discussions?

Härdle : I think that learning has to be fun, and fun can come from jokes and so on, as you said. Through fun, you can get past certain difficulties without having to get too dry and persistent about these difficult problems. You can always come back to the main topic. This means that what I do is not just interactive teaching. I kind of jump forward a big step, then jump back, and then jump back and forth, so that you don’t just see the linear structure of the course handouts and programming, but you can see the whole picture. You not only have a “deep” understanding of the knowledge, but also a “broad” view of the subject we are teaching. This is why I like to have such interactions, to make sure that everyone is on the same level. I strongly believe that working in a team and producing collaborative results is the way to achieve excellence.

4. Easter Eggs

Q18: You have visited China many times and have a deep curiosity and love for Chinese culture, cuisine, and more. Could you share some interesting experiences you’ve had in China? What is the source of your love for China?

Härdle : That’s a good question. I can’t really say when I started to like Chinese culture and Chinese food. I come from a small city in the Black Forest region of Germany, and there are no Chinese restaurants there. I went to a Chinese restaurant for the first time when I was about 19 years old. Since then, I have liked Chinese food more and more. My Chinese friends and I cook Chinese food together, and I also cook Chinese food such as Peking duck at home. I have all the necessary kitchen utensils, including Chinese bowls and chopsticks. From time to time, we change up the taste and make some Chinese dishes. What I like about China is that it always shows great resilience despite the objective challenges that are everywhere and at all times. China has a strong interest in promoting scientific and social progress. Although I sometimes get bored of following all the administrative procedures, there are still many enthusiastic people who offer help in the end. I am very happy to come here often and hope to come again next year.

Q19: Among Chinese delicacies, you particularly like lamb, and even think that you can “eat lamb three times a day.” Why is that?

Härdle : When it comes to lamb, it’s like asking why you prefer the sun at 7 o’clock in the morning to the sun at 8 o’clock. I’m sorry, I can’t answer that, it’s a matter of personal taste.