2024.09.17

Training professionals who will push data science further into society

Data science brings together various disciplines, including statistics and mathematics, programming, and specialized knowledge in specific research fields, to analyze all kinds of data and extract meaningful knowledge from it. Its profile continues to rise as the spread of the Internet and IoT devices generates enormous amounts of data. In step with this, demand for experts well versed in data science is skyrocketing.

Professor Murakami explains the appeal of data science. “I often hear from students in the Department of Applied Mathematics who want to become data scientists. They want to work in all kinds of fields, from biology and economics to medicine, pharmaceutical sciences, and sociology.” 

The course introduced in this article is Data Processing, in which students learn about data analysis, the core of data science. The course is taken by third-year students, who already have the necessary background knowledge in mathematics, statistics, and other areas. They develop their ability to identify the characteristics of a data set, learn practical statistical methodologies, and then work on data analysis assignments using statistical analysis software, gaining the knowledge and skills needed to perform data analyses in the real world.

Tackling data analysis problems using statistical analysis software

Data analysis in the business world involves the use of some kind of statistical analysis software. The objective of having students use such software for their assignments is not, of course, to learn the software itself. Data scientists deal with many data types and data structures, and each calls for its own methods of analysis. The assignments are therefore meant to build the knowledge and skills needed to handle all these different cases.

Compared with the complexity of working in different programming languages, the course’s statistical analysis software is quite straightforward to use. For example, solving the multiple regression analysis problem given in the course last year required only a few lines of input.
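To give a sense of what such software does behind those few lines of input, the following is a minimal sketch of a multiple regression fitted by ordinary least squares in plain Python. The article does not name the course's software or show its assignment, so the function name `fit_multiple_regression` and the small data set are hypothetical, made up purely for illustration.

```python
# Illustrative sketch only: synthetic data, not the course's actual assignment.

def fit_multiple_regression(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    Returns [intercept, b1, b2, ...].
    """
    rows = [[1.0] + list(r) for r in X]  # prepend a column of 1s for the intercept
    p = len(rows[0])
    # Build the augmented matrix [X'X | X'y].
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        pivot = max(range(c, p), key=lambda k: abs(A[k][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for k in range(p):
            if k != c:
                f = A[k][c] / A[c][c]
                A[k] = [a - f * b for a, b in zip(A[k], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


# Hypothetical data generated exactly from y = 1 + 2*x1 + 3*x2 (no noise).
X = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in X]

coef = fit_multiple_regression(X, y)
print([round(c, 6) for c in coef])  # intercept, then one slope per predictor
```

A statistical package would typically hide all of this behind a single model-fitting command, which is why the input can be so short. Judging whether the fitted coefficients are meaningful still depends on the underlying theory.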

Professor Murakami explains: “Even if you know how to code various data analyses, without a solid understanding of the statistical theory on which the code is based, you may end up erroneously believing convenient results that appear by happenstance. Theoretical and applied knowledge go hand-in-hand, and you must have a firm grip on both before you can conduct effective analyses.” To correctly understand the data itself and manipulate it properly, one needs knowledge and experience fostered through systematic education.

Society needs people with the capacity to handle data correctly

Understanding the theory behind statistical methodologies is essential to analyzing data. Furthermore, data scientists must have the ability to handle data correctly. Professor Murakami explains why. “If a particular statistical method has worked on past data and you are now dealing with a similar data structure, it may be sufficient to apply the same methodology. However, when you face data with a completely different structure and lack knowledge of the theory, you’ll be stumped: you won’t know whether past methodologies will work, and you won’t know how to adapt them.”

There are many industries where you can find work as a data scientist, including manufacturing, finance, and IT. Each firm will naturally deal with different kinds of data and use different statistical analysis software packages and programming languages for data analysis. This is precisely why businesses are looking for people who have a firm grasp of statistical theory. Recognition of the importance of data analysis continues to grow.

Faculty of Science Division I, Department of Applied Mathematics
Professor Hidetoshi Murakami

■ Main research themes

Professor Murakami’s primary field is statistical science and his main research areas are nonparametric statistical methods and mathematical statistics. His research focuses on the development of new test statistics and the derivation of their approximate distributions.
