I want to create new technologies that become common in the world, making a significant impact on society.
Hello, this is Akaiwa from the Corporate Planning Division. This time, I spoke with Mr. Mori and Ms. Yin about the speech corpus "ReazonSpeech," which they released in January 2023. A speech corpus is a collection of audio paired with text, organized by utterance, which can be used, for example, in transcription features. Both of them work at the "Human Interaction Lab," Reazon Holdings' research institute. Even among Reazon employees, I think many aspects of what the "Human Interaction Lab" does and what it aspires to do in the future are not fully understood. There are plenty of interesting stories to share, including the behind-the-scenes story of how "ReazonSpeech" was born. I would be delighted if this article gives you a better understanding of them.
Graduated from Kyushu Institute of Technology, Department of Acoustic Design. Engaged in research and development of information retrieval technology at Nippon Telegraph and Telephone Corporation's Human Interface Laboratory. Developed full-text search software at Mirai Kensaku Brazil. Established the Human Interaction Lab at Reazon Holdings.
Completed the Language Technology Master's Program at Carnegie Mellon University before joining Reazon Holdings as a new graduate. Currently engaged in research on human interaction technology.
Creating an environment with a high degree of freedom where I can immerse myself in research.
First of all, I would like to ask you about your careers and experiences before joining Reazon. Could you tell us about your previous experiences and career paths?
Originally, at an NTT research laboratory, I developed a large-scale full-text search system that served as the core of an internet portal site. At that time, the internet industry was booming and demand for search engines like Google was very high, so I was writing search engine programs every day.
After leaving NTT, I developed open source full-text search engine software. This software (Senna/Groonga) became a library that could update indexes instantly at high speeds. It could be embedded in various database management systems such as MySQL and PostgreSQL, so it was very user-friendly, and many systems adopted it at that time.
You seem to have been consistently involved in search engine system development for a long time.
On the other hand, I understand that Yin joined Reazon as a new graduate in 2022. What did you study in your school days?
Back when I was in high school, automatic translation technology was not very advanced, and wondering how to make it more accurate got me interested in machine translation. I then became interested in machine learning, statistics, natural language processing, and related subjects while studying at Carnegie Mellon University, and I wanted to take part in research on them. I also had the opportunity to participate in a research workshop called "JSALT (*1)," where I was able to engage in speech recognition research.
So you were involved in research that would lead to your current job even during your student days.
Both of you have completely different career paths, but what led you to join Reazon?
For me, it all began with the good impression I got when I was invited to visit the office. The office was full of energy and of people with all kinds of talents. Near the entrance, a large wall-mounted display was showing a video of someone playing the guitar. There was no sound, but just from the player's movements I could tell how remarkably skilled they were. It turned out that the player had aspired to become a jazz musician, had graduated from a famous music school in the United States that every aspiring jazz musician dreams of, and was now in charge of the information systems department. I learned that many members with interesting backgrounds had gathered there, and I remember feeling that this was an interesting place, different from the companies I had worked for before.
Bringing together so many diverse members is one of Reazon's attractions and strengths, isn't it?
You were attracted to that culture, and that's why you decided to join Reazon. By the way, did a research-focused department like the Human Interaction Lab even exist before Mori joined?
When I joined Reazon, there was no R&D organization. I proposed conducting research in the field of human interaction, which I had long wanted to work on, and set up a research institute at the same time as I joined the company. I was very happy to be able to start research in a new field by joining Reazon.
Why did you want to do research in the field of human interaction?
I have been working in information retrieval for a long time, and I think that "search" is a valuable opportunity for people to encounter better information and a worthwhile, challenging field to pursue. However, I started to feel that the progress of search engine services had stalled across the industry. To overcome that, I thought it was essential for users to be able to convey what they are thinking smoothly, so I became interested in conducting research in the field of human interaction.
Having an environment that supports the challenges of what you want to do is also one of Reazon's attractions.
Yin also had something they wanted to do and decided to join Reazon, right?
Yes. When I was job hunting, I was mainly focused on working for Japanese companies and working on jobs related to data science that I was interested in.
I have always liked Japanese games and content, and I had many opportunities to experience Japanese culture, so I vaguely thought that I wanted to work for a Japanese company. However, new graduate recruitment at Japanese companies is often for general positions, and during my job hunt I had the impression that there was little freedom to choose my own profession. Reazon, in contrast, recruited separately for each job type. I found a position that matched my interests and would let me work on what I was genuinely interested in immediately after joining, so I decided to join.
*1. JSALT: A research workshop hosted by Johns Hopkins University in the summer, focusing on speech and language processing.
The Unimaginable Effort Behind the Release of "ReazonSpeech"
Both of you had a clear goal in mind and ended up at Reazon as a result of seeking an environment that would allow you to achieve it.
Have you been able to make progress on the research you wanted to do since joining Reazon?
Yes. Since joining the company, I have been able to focus consistently on human interaction research. There are various approaches to human interaction research, so we are working across a wide range of fields, such as measuring brain waves and muscle activity, tracking gaze, and evaluating speech recognition. Among them, speech recognition was the first to produce results.
Those speech recognition research results led to the recent release of the speech corpus "ReazonSpeech."
I would like to ask you more about "ReazonSpeech," but first of all, what is a speech corpus?
A speech corpus is a collection of speech data paired with text data at the utterance level. It is used as material for building speech recognition models, and its size and quality greatly influence the accuracy of speech recognition.
For "ReazonSpeech," the speech corpus is automatically extracted from recordings of one-segment broadcasts. It currently offers the highest-quality Japanese speech recognition model and, at 19,000 hours, the world's largest Japanese speech corpus.
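(Editor's illustration: the definition above, audio paired with text at the utterance level, can be pictured with a minimal Python sketch. The `Utterance` class, its field names, the file name, and the sample sentences are all hypothetical and chosen only for illustration; they are not the actual format of "ReazonSpeech.")

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Utterance:
    """One corpus entry: a span of audio paired with its transcript.

    All field names are hypothetical, not the ReazonSpeech schema.
    """
    audio_path: str    # recording that contains the utterance
    start_sec: float   # where the utterance starts in the recording
    end_sec: float     # where the utterance ends in the recording
    transcript: str    # text spoken during that span

    @property
    def duration_sec(self) -> float:
        return self.end_sec - self.start_sec


def total_hours(corpus: Iterable[Utterance]) -> float:
    """Corpus size in hours, the unit used when quoting corpus size."""
    return sum(u.duration_sec for u in corpus) / 3600.0


# Two made-up utterances cut from the same hypothetical recording.
corpus = [
    Utterance("broadcast_001.wav", 0.0, 4.5, "こんにちは"),
    Utterance("broadcast_001.wav", 4.5, 10.5, "今日の天気です"),
]

print(f"{len(corpus)} utterances, {total_hours(corpus):.4f} hours")
```

A speech recognition model is trained on many such pairs, which is why both the number of hours and the accuracy of each transcript matter.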
It seems that it took a lot of time, including data collection, to become the world's largest Japanese language speech corpus. Have you been researching it since joining Reazon?
I had been steadily collecting data on my own even before joining the company. However, once I started working on it in earnest, there were many technical difficulties in building a high-quality corpus at large scale. We needed many talented colleagues, and we could not have reached the release without working on it as an organization. In addition to the technical issues, we also had to confirm that there were no legal or rights-related problems. As a result, it took several years to reach the release.
Behind the release of "ReazonSpeech" were many efforts that can hardly be quantified.
Yin, since joining Reazon, I think you have been researching speech recognition-related matters. Did you have any difficulties in that process?
I did not have much trouble with the research itself, such as understanding the problems and proposing methods, because I had done that as a student. However, unlike research at university, I learned that the scale of this corpus is overwhelmingly large, that it cannot be tackled with existing resources alone, and that support from many people is necessary.
What is important is to continue updating from now on.
As you say, corporate research requires coordination among various parties, and there are many difficult parts. However, I think that such efforts led to this release.
We kept working by trial and error until the very end, so just before the release I felt that I had done everything I should have done. But once we actually released it, I felt that I had finally reached the starting line.
Since the accuracy of speech recognition improves as the corpus grows, I feel that it needs to be updated continuously. Also, since Japanese has an unusually large number of homophones even compared with other languages of the world, I think there are limits to what can be done with speech recognition. I hope to use this release as a stepping stone toward creating a mechanism that solves this problem.
I was simply happy that my work was being used by many people. I hope that more and more people will use "ReazonSpeech" in the future, and I think it is important to keep updating it.
I want to create something that becomes a shared standard and contribute to a world in which everyone can share.
You both express a desire to share the value of "ReazonSpeech" with more people by continuing to update it. I think other companies besides Reazon are also releasing speech recognition products. Are those companies on your mind?
Personally, my goal isn't just to create speech recognition with higher accuracy than other companies. I'm more focused on creating a foundation for sharing different speech corpora. While it's important for a company to be aware of the competitive advantage of its own services, I believe it's more crucial for Japan as a whole to raise the level of its foundational technologies. I want to change the structure in which we compete for a small domestic market and risk being left behind globally.
You mentioned wanting to make it shareable with more people. Do you have plans to expand beyond Japanese to other languages in the future?
Because laws vary by country, immediate expansion might be challenging. Of course, creating an English corpus in Japan for domestic use is fine, but whether that corpus can be legally used in other countries requires verification and organization.
In the United States, there's the concept of "fair use," which has served as a legal basis for machine learning research using collected data. While Japan doesn't have a fair use law per se, a 2019 legal amendment allows information analysis, such as machine learning and corpus generation from collected data, for both commercial and non-commercial purposes. This system goes a step further than fair use in the U.S. Thanks to this amendment, we were able to release "ReazonSpeech" under a free license. Although Japan lags in digitalization and has a lower level of productivity than other countries, using such excellent legal frameworks to establish affordable, high-quality technology as societal infrastructure offers ample opportunity for recovery.
Creating a world where everyone can share is a goal that requires several steps, but it's something worth achieving.
Could you also share the future prospects of the Human Interaction Lab?
Within human interaction, I'm particularly interested in how to smoothly convey what people are thinking to others. Speech recognition is one means, but there are various others. Technologies such as remotely operated robots that extend the range of human activity and collaborative robots (*2) that work alongside people also fall within this scope. Currently, we lack the people to research all these fields comprehensively. In the future, I want to build an organization where we can increase the number of colleagues who research together and pour their passion into each field.
From such a research organization, I aim to create technology that drives zero-to-one innovation, contributing to Reazon's vision of being the world's best company. Additionally, apart from the research institute, I'm planning to establish an organization dedicated to putting highly practical AI technologies to use. I'm eager to welcome engineers who love AI and machine learning technologies and are more interested in practical applications than in research.
*2. Collaborative Robots: Industrial robots that can work together with humans in the same space
Do you have a specific ideal candidate for the Human Interaction Lab in mind?
Someone deeply knowledgeable in cutting-edge technologies like machine learning and robotics, and actively engaged in hands-on creation and programming. Active participation in open-source software activities would be even better. It's crucial that they have personally experienced how today's technological innovations advance by standing on the shoulders of giants. I hope Reazon will attract colleagues who, as leaders of open innovation, can think together about how to contribute to the community.
I look forward to seeing technologies produced by the Human Interaction Lab that will have a significant impact on society.
Thank you all for gathering today despite your busy schedules!
Lastly...
Congratulations to "ReazonSpeech" for winning the Excellence Award at the 29th Annual Meeting of the Association for Natural Language Processing!