Project Description
Ayta.ai addresses speech barriers for people who stutter, enabling them to communicate more freely during online calls on platforms such as Zoom and Google Meet.
Our solution is based on the ability of many individuals who stutter to speak fluently when whispering: the system captures the user's whisper and instantly converts it into regular speech while preserving individual vocal characteristics. The conversion delay does not exceed one second, ensuring real-time naturalness and eliminating discomfort for conversation partners.
Thanks to Ayta.ai, people who stutter can participate in work meetings, educational events, and informal gatherings without hesitation. The project enhances quality of life and removes social barriers, allowing users to express their thoughts clearly and confidently.
Our solution integrates seamlessly into existing video conferencing platforms, simplifying implementation and scaling.
Technologies used in the project:
Real-Time ASR (Automatic Speech Recognition): Based on the HuBERT architecture for accurate and fast whisper recognition, ensuring near-instant response.
Real-Time TTS (Text-to-Speech): Utilizes StyleTTS2 for generating realistic, emotionally rich speech in real time.
Voice cloning: The ECAPA-TDNN model preserves the unique vocal characteristics of each user, allowing their voice to sound natural and recognizable after converting whisper to normal speech.
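To make the interplay of these three components concrete, below is a minimal, illustrative Python sketch of how such a whisper-to-speech pipeline could be wired together. The checkpoints used here (facebook/hubert-large-ls960-ft, speechbrain/spkrec-ecapa-voxceleb) are public models chosen only for illustration, and synthesize_styletts2() is a hypothetical placeholder; this is not the production Ayta.ai implementation.

```python
# Illustrative pipeline sketch: whispered audio -> text (HuBERT-based ASR) ->
# speech in the user's own voice (StyleTTS2 conditioned on an ECAPA-TDNN
# speaker embedding). Checkpoints and synthesize_styletts2() are assumptions
# for illustration, not the actual Ayta.ai system.
import torch
import torchaudio
from transformers import Wav2Vec2Processor, HubertForCTC
from speechbrain.pretrained import EncoderClassifier

ASR_CHECKPOINT = "facebook/hubert-large-ls960-ft"         # public HuBERT CTC model
SPEAKER_CHECKPOINT = "speechbrain/spkrec-ecapa-voxceleb"  # public ECAPA-TDNN model

processor = Wav2Vec2Processor.from_pretrained(ASR_CHECKPOINT)
asr_model = HubertForCTC.from_pretrained(ASR_CHECKPOINT).eval()
speaker_model = EncoderClassifier.from_hparams(source=SPEAKER_CHECKPOINT)


def transcribe_whisper(wav: torch.Tensor, sample_rate: int) -> str:
    """Recognize whispered speech with the HuBERT CTC model."""
    if sample_rate != 16_000:
        wav = torchaudio.functional.resample(wav, sample_rate, 16_000)
    inputs = processor(wav.squeeze().numpy(), sampling_rate=16_000,
                       return_tensors="pt")
    with torch.no_grad():
        logits = asr_model(inputs.input_values).logits
    return processor.batch_decode(torch.argmax(logits, dim=-1))[0]


def extract_voice_embedding(reference_wav: torch.Tensor) -> torch.Tensor:
    """Extract an ECAPA-TDNN embedding capturing the user's vocal identity."""
    with torch.no_grad():
        return speaker_model.encode_batch(reference_wav).squeeze()


def synthesize_styletts2(text: str, voice_embedding: torch.Tensor) -> torch.Tensor:
    """Hypothetical placeholder for a StyleTTS2 synthesizer conditioned on the
    speaker embedding; a real system would call its trained model here."""
    raise NotImplementedError("plug in a StyleTTS2 inference call")


def whisper_to_speech(whisper_wav: torch.Tensor, sample_rate: int,
                      reference_wav: torch.Tensor) -> torch.Tensor:
    """Full chain: whispered input plus a short voiced reference of the user."""
    text = transcribe_whisper(whisper_wav, sample_rate)
    embedding = extract_voice_embedding(reference_wav)
    return synthesize_styletts2(text, embedding)
```

In this arrangement, the voiced reference recording only needs to be processed once per user, so at call time the latency budget is dominated by recognition and synthesis.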
We presented the developed models at two leading international A-level conferences in the field of speech technologies:
Interspeech 2024 (http://dx.doi.org/10.21437/Interspeech.2024-2091)
ICASSP 2024 (http://dx.doi.org/10.1109/OJSP.2023.3343342)
Project Stage
Working solution
Markets and Application Areas
Education: Supporting students with speech impairments in distance learning.
Corporate sector: Promoting inclusivity in workplaces and participation in online conferences.
Healthcare: Rehabilitation and support for people with stuttering and other speech disorders.
Social platforms: Removing barriers in communication through voice chats.
Cybersecurity: Voice-based biometric authentication for users with speech impairments.
Key Achievements
The Ayta.ai team has developed a technology for converting whisper into fully articulated speech with high similarity to the user's original voice. This solution significantly reduces communication barriers for people who stutter.
We have minimized the delay between the end of a spoken phoneme and the playback of the resulting speech to 800 milliseconds, ensuring comfortable real-time interaction.
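As a simple illustration of what this 800-millisecond figure measures (not the project's actual benchmarking code), one can timestamp the end of the whispered input and the moment the converted audio is ready for playback; in the sketch below the conversion step is simulated with a fixed placeholder delay.

```python
# Illustrative latency harness: delay from the end of the whispered input to
# the moment converted speech is ready for playback. convert_whisper_stub()
# merely simulates the pipeline with a fixed processing time.
import time


def convert_whisper_stub(processing_time_s: float = 0.8) -> None:
    """Stand-in for the whisper-to-speech pipeline (ASR + cloning + TTS)."""
    time.sleep(processing_time_s)


def end_to_end_latency_ms() -> float:
    t_input_end = time.perf_counter()       # whispered phrase has just ended
    convert_whisper_stub()                  # run the (simulated) conversion
    t_playback_ready = time.perf_counter()  # synthesized audio is ready
    return (t_playback_ready - t_input_end) * 1000.0


if __name__ == "__main__":
    print(f"End-to-end latency: {end_to_end_latency_ms():.0f} ms")
```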
A key achievement is that the program has received positive feedback from individuals with severe stuttering, for whom the solution has proven to be not only a practical tool but also a therapeutic aid.
Access to this technology enables users to confidently participate in business meetings, educational events, and public platforms, thereby expanding their social and professional opportunities.
The developed system integrates seamlessly with leading video conferencing platforms, ensuring ease of connection and extensive scalability.
Measurable Results
During project testing, we received over 85 responses from users with stuttering who praised the convenience and effectiveness of the solution. Their feedback indicates that the technology not only facilitates communication but also positively impacts emotional well-being by reducing anxiety and insecurity in conversations.
The high degree of similarity to the user's voice makes the communication process natural, encouraging more active participation of people with stuttering in professional and social events.
In the long term, this can yield economic benefits by expanding the client and partner base, as well as social impact by improving quality of life, increasing engagement, and enhancing employment opportunities for people with stuttering.
Project Uniqueness
Our project is unique because, unlike competitors, we apply a holistic approach based on neural networks (HuBERT, StyleTTS2, ECAPA-TDNN), ensuring high sound quality and solution flexibility.
The only similar product on the market comes from the Netherlands (https://whispp.com); it relies on classical digital signal processing, which does not provide the same level of speech quality and naturalness.
Thanks to our developments in voice recognition, synthesis, and cloning, Ayta.ai delivers extremely low latency while preserving the user's unique vocal characteristics and achieving a "live" sound.
This comprehensive neural network approach makes our solution more accurate, adaptive, and comfortable for real-world application.
Future Plans
We plan to further optimize processing speed, reducing latency to 300–400 milliseconds to make communication even more comfortable.
To expand our audience, we will collaborate with medical centers, speech therapists, and public organizations, assisting people with stuttering and other speech disorders (e.g., dysphonia or post-surgery throat conditions).
Additionally, we are developing a mobile version of our solution that can be used for both video calls and regular cellular calls. We believe this will attract a new audience and provide a convenient alternative to traditional speech rehabilitation methods.
Beyond helping people with stuttering, we see potential applications in telemedicine and remote rehabilitation, contributing to improved quality of life for individuals with various speech disorders.
Partners or Investors
Arsen Tomsky - initiator and investor of Ayta.ai; founder and CEO of inDrive