How Juvoly built its own AI speech recognition to beat OpenAI’s Whisper

Source: Techzine Europe

Now, Juvoly's AI can accurately transcribe medical conversations in multiple languages, making doctors' work easier and more efficient.

Introduction to Juvoly V2

We spoke with Juvoly co-founder and CEO Thomas Kluiters to learn more. Before diving deeper into the startup’s broader solution, it’s crucial to understand the origins of Juvoly V2, their advanced speech recognition model. Juvoly V2 was specifically created to address the shortcomings of OpenAI’s Whisper when it comes to documenting Dutch conversations in the medical field. While models like GPT-4 and GPT-4o captivated global attention through ChatGPT, Whisper had been considered the benchmark for speech recognition since its launch in September 2022.

Although OpenAI claimed Whisper supported multiple languages, its handling of languages other than English proved inadequate in practice, particularly when transcribing medical terminology. The case in point here is its poor performance in Dutch, although Kluiters notes that speakers of other non-English languages have drawn similar conclusions. The less well represented a language is in the training data, the worse Whisper’s output becomes.

Juvoly's Starting Point

According to Kluiters, this flaw remained largely unnoticed: “Many assume speech recognition is a solved problem, but that’s simply not true.” While Whisper can perform reasonably well in English medical contexts, it falls short elsewhere. “If you want a reliable Dutch-language model, you need developers who understand the language thoroughly,” he explains, pointing out that subtle errors are easily overlooked by developers who do not speak the language. Moreover, benchmarks rarely reflect real-world AI performance accurately. Whisper, with roughly 1.5 billion parameters in its largest version, tends to be overly “creative,” which often results in inaccuracies or “hallucinations.” These are particularly damaging when doctor-patient conversations must be recorded precisely. Additionally, cloud-based speech recognition services are notoriously costly because they are priced per hour of audio.
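One way to see why aggregate benchmarks can hide the errors Kluiters describes is word error rate (WER), a commonly used speech recognition metric. The article does not say which metric Juvoly uses, so the sketch below is an illustration only: the function name and the Dutch sentences are hypothetical. A single substituted word, hypertensie (high blood pressure) becoming hypotensie (low blood pressure), produces one error in six words while reversing the clinical meaning.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example sentences, not taken from Juvoly's data.
reference = "de patiënt heeft last van hypertensie"
hypothesis = "de patiënt heeft last van hypotensie"

wer = word_error_rate(reference, hypothesis)
print(f"WER: {wer:.3f}")
```

A score like this looks respectable on a leaderboard, which is why a reviewer who understands both the language and the medical vocabulary is still needed to judge whether a transcript is usable.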

Juvoly V2 addresses these issues by simplifying the model compared to Whisper and specifically training it on Dutch medical conversations. The result is a solution that’s significantly cheaper, less prone to hallucinations, and about 10% more accurate than Whisper, while being 40 times faster. This speed enables real-time applications. Juvoly V2 also greatly reduces energy consumption, using just 350 Wh per 100 users—compared to the standard 11,000 Wh (11 kWh). A full year’s usage for a Juvoly customer equates to emitting only 200 grams of CO2, similar to driving a gasoline car for two kilometers.
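The energy figures above can be put side by side with a few lines of arithmetic. This is a sketch based only on the two numbers quoted in the article; the article does not state the time period the figures cover or what “standard” refers to, and the function and constant names are made up for illustration.

```python
# Figures quoted in the article, in watt-hours per 100 users.
JUVOLY_WH_PER_100_USERS = 350
STANDARD_WH_PER_100_USERS = 11_000


def energy_comparison(model_wh: float, baseline_wh: float) -> tuple[float, float]:
    """Return (how many times less energy, fractional saving) versus a baseline."""
    ratio = baseline_wh / model_wh
    saving = 1 - model_wh / baseline_wh
    return ratio, saving


ratio, saving = energy_comparison(JUVOLY_WH_PER_100_USERS, STANDARD_WH_PER_100_USERS)
per_user_wh = JUVOLY_WH_PER_100_USERS / 100

print(f"About {ratio:.1f}x less energy ({saving:.1%} saving)")
print(f"Roughly {per_user_wh:.1f} Wh per user")
```

On these numbers, Juvoly V2 uses a little over one thirtieth of the energy of the baseline, or about 3.5 Wh per user.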

Upcoming Innovations from Juvoly

In the next two weeks, Juvoly will launch an upgraded model, Juvoly V3, which promises even better performance, automatic language recognition, and speaker identification. “Everyone thinks you’re crazy if you build your own speech model,” says Kluiters. “But we did, and it’s paying off significantly now.” Unlike services built on multimodal models such as Google’s Gemini, Juvoly keeps patient information within Europe, and predominantly within NorthC’s data centers. With Gemini, sensitive conversations might remain unencrypted in the cloud for up to 55 days for “abuse monitoring” purposes, a significant risk for healthcare providers.

Moving Away from the Cloud

Juvoly’s ultimate goal is complete independence from cloud services. Its recently acquired NVIDIA B200 system, a significant step in that direction, will soon be inaugurated by Constantijn van Oranje. The “B” stands for Blackwell, NVIDIA’s newest GPU architecture. Juvoly currently owns two B200 nodes and aims for eight by year-end. Each node contains eight GPUs, so Juvoly plans to have 64 Blackwell GPUs operational by late 2025. The company also uses earlier NVIDIA architectures such as the H100 (Hopper) and L40S (Ada Lovelace), which remain effective, though less efficient than their newer counterparts.

Future Plans and Challenges

Interestingly, the GPUs sometimes outpace the CPUs that work alongside them on Juvoly’s workloads, which creates bottlenecks. Kluiters describes scenarios where the GPUs finish a task in 12 milliseconds while the CPUs take around 60 milliseconds, leaving the expensive NVIDIA chips idle. Juvoly also uses the new GPU capacity for large language models (LLMs), which form the backbone of the real-time summaries in Juvoly’s QuickConsult. Physicians can track the symptoms discussed during a consultation as they come up, rather than relying solely on the transcript. For post-consultation summaries, Juvoly still uses GPT-4o on Azure, but during conversations, open-source models such as Gemma or Llama identify and classify symptoms. The company’s objective is clear: running all workloads locally within NorthC Datacenters. Though buying hardware independently can seem daunting, Kluiters praises NorthC for making the transition straightforward. Rather than paying thousands per month for cloud nodes, Juvoly now spends just a few hundred euros per month on dedicated hardware, with ample room for growth.
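The GPU/CPU mismatch Kluiters mentions at the start of this section can be illustrated with a simple throughput model. This is a sketch under stated assumptions, not a description of Juvoly’s actual pipeline: it assumes every request needs one CPU step and one GPU step, that the two stages run as a pipeline, and that CPU work scales perfectly across parallel workers. The function names are hypothetical; only the 12 ms and 60 ms figures come from the article.

```python
import math

GPU_MS = 12  # GPU time per task, as quoted in the article
CPU_MS = 60  # CPU time per task, as quoted in the article


def gpu_utilization(gpu_ms: float, cpu_ms: float, cpu_workers: int = 1) -> float:
    """Fraction of time the GPU is busy in a steady-state two-stage pipeline."""
    # With N parallel CPU workers, one finished CPU step arrives every cpu_ms / N.
    cpu_interval = cpu_ms / cpu_workers
    # The slower stage sets the pace of the whole pipeline.
    cycle = max(gpu_ms, cpu_interval)
    return gpu_ms / cycle


def cpu_workers_to_saturate(gpu_ms: float, cpu_ms: float) -> int:
    """Smallest number of parallel CPU workers that keeps the GPU fully busy."""
    return math.ceil(cpu_ms / gpu_ms)


single_worker = gpu_utilization(GPU_MS, CPU_MS)
workers_needed = cpu_workers_to_saturate(GPU_MS, CPU_MS)

print(f"GPU busy {single_worker:.0%} of the time with one CPU worker")
print(f"CPU workers needed to keep the GPU busy: {workers_needed}")
```

Under these assumptions the GPU sits idle four fifths of the time when a single CPU worker feeds it, and about five workers per GPU would be needed to close the gap. Real pipelines batch requests and overlap stages differently, so the numbers are indicative only.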

NorthC's Role in Facilitating Startups

Piet Sjoukes, Director of Sales at NorthC Datacenters, explains how the company supports startups like Juvoly. He emphasizes continuity: “Our core service is reliability. Clients can’t afford downtime from cooling or power failures.” He jokes that NorthC’s real product is “a good night’s rest,” meaning peace of mind for clients who rely heavily on uptime. About half of NorthC’s clientele, including Juvoly, are high-tech innovators that often push hardware boundaries. “They operate at the cutting edge of technology,” says Sjoukes, pointing to AI’s immense computing demands. Data centers now have to accommodate far higher power densities, sometimes exceeding 40 kW per rack compared with the traditional 3 kW. NorthC uses modular data center construction to handle these varying demands efficiently, combining traditional cooling with more advanced methods such as immersion cooling, on-chip cooling, and hot aisle containment.

HONESTAI ANALYSIS

As healthcare increasingly seeks efficiency amid resource constraints, technology providers like Juvoly become invaluable partners. Juvoly demonstrates how innovative software paired with efficient, powerful hardware can significantly enhance healthcare delivery. While software drives meaningful improvements, it remains dependent on robust, energy-efficient infrastructure. Clear communication and collaborative planning between startups and data centers like NorthC prove essential to achieving sustained growth and innovation. Ultimately, Juvoly’s approach highlights the value of targeted innovation—efficiently serving specific niches, like Dutch medical professionals, with tailored solutions. This careful integration of technology, infrastructure, and human-centered design promises substantial benefits for both doctors and patients.

