NEW YORK, 22 August 2023 – Facebook’s parent company, Meta Platforms, has introduced a powerful AI model known as SeamlessM4T, capable of translating and transcribing speech across dozens of languages. The technology represents a significant step towards enabling real-time communication across language barriers.
According to a blog post by Meta Platforms, the SeamlessM4T model can facilitate translations between text and speech in nearly 100 languages. Furthermore, it offers full speech-to-speech translation for 35 languages, a fusion of capabilities previously available only through separate models.
Meta CEO Mark Zuckerberg has expressed his vision of these tools serving as a bridge for interactions among users worldwide in the metaverse, the network of interconnected virtual worlds on which Meta is betting its future.
Meta Platforms has made this model accessible to the public for non-commercial purposes.
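The blog post quoted here does not include usage details, but as an illustration of what the public release enables, the following is a minimal sketch using the Hugging Face transformers integration of SeamlessM4T. That integration, the checkpoint name facebook/hf-seamless-m4t-medium, and the example sentences are assumptions for illustration; none of them comes from this article.

```python
import scipy.io.wavfile
from transformers import AutoProcessor, SeamlessM4TModel

# Load a SeamlessM4T checkpoint (name assumed; the article does not
# specify how the model is distributed).
processor = AutoProcessor.from_pretrained("facebook/hf-seamless-m4t-medium")
model = SeamlessM4TModel.from_pretrained("facebook/hf-seamless-m4t-medium")

# Text-to-text translation (T2TT): English to French.
text_inputs = processor(text="Hello, how are you?", src_lang="eng", return_tensors="pt")
output_tokens = model.generate(**text_inputs, tgt_lang="fra", generate_speech=False)
print(processor.decode(output_tokens[0].tolist()[0], skip_special_tokens=True))

# Text-to-speech translation (T2ST): the same input rendered as French audio,
# the fused capability the article describes.
audio = model.generate(**text_inputs, tgt_lang="fra")[0].cpu().numpy().squeeze()
scipy.io.wavfile.write("bonjour.wav", rate=model.config.sampling_rate, data=audio)
```

In the same sketch, swapping the text input for an audio waveform would exercise the speech-to-text and speech-to-speech directions that the single model also covers.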
Throughout this year, the world’s largest social media company has introduced a series of mostly free AI models. These include Llama, a large language model that poses a significant challenge to the proprietary models offered by Microsoft-backed OpenAI and Alphabet’s Google.
Zuckerberg asserts that an open AI ecosystem benefits Meta, as the company gains more from effectively crowdsourcing the development of consumer-oriented tools for its social platforms than from charging for model access.
However, Meta faces the same legal questions as the rest of the industry over the training data used to create its models.
In July, comedian Sarah Silverman and two other authors filed copyright infringement lawsuits against both Meta and OpenAI, alleging the unauthorized use of their books as training data.
Regarding the SeamlessM4T model, Meta researchers explained in a research paper that they gathered audio training data from 4 million hours of “raw audio originating from a publicly available repository of crawled web data,” without identifying the repository.
A Meta spokesperson did not respond to questions about the origin of the audio data. According to the paper, the text data was drawn from datasets created the previous year that pulled content from Wikipedia and associated websites.