Launch of Babel Fish AI: Chrome Extension for Voice Transcription and Translation

Babel Fish AI, a personal project, is an innovative Chrome extension that turns voice into text with exceptional accuracy, while offering an optional automatic translation feature. Designed to be reliable and ad-free, it provides high-quality voice transcription via OpenAI’s Whisper API. I created this extension partly to address a personal need: to simplify communication with AIs by dictating my requests. You can use it exclusively for transcription or enable translation to facilitate multilingual communication. It was also the perfect opportunity to test Roo Code, an AI-assisted development tool.

1. Overview

Babel Fish AI was developed with the help of AI models from Anthropic (Claude), Google (Gemini), and OpenAI, all driven through Roo Code. Its intuitive interface and customizable configuration provide fast, accurate transcription, with an optional immediate translation to overcome language barriers.

2. Features and Options

Advanced Voice Transcription:
- High-quality audio capture via the microphone.
- Transcription performed by OpenAI’s Whisper API, with automatic insertion into the active field or display in a dialog window.
Optional Automatic Translation:
- Fast and faithful translation of the transcribed text, activatable as needed.
- Option to use the extension solely for transcription.
Full Multilingual Support: Babel Fish AI processes voice input and displays text in various languages, broadening its international use.
Supported languages: 🇸🇦 Arabic, 🇩🇪 German, 🇺🇸 English, 🇪🇸 Spanish, 🇫🇷 French, 🇮🇳 Hindi, 🇮🇹 Italian, 🇯🇵 Japanese, 🇰🇷 Korean, 🇳🇱 Dutch, 🇵🇱 Polish, 🇵🇹 Portuguese, 🇷🇴 Romanian, 🇸🇪 Swedish, 🇨🇳 Chinese.
Intuitive and Customizable Interface:
- Choice between inserting text into the active input area or showing it in a dialog window.
- Customizable status banner (color, opacity, duration) for immediate visual feedback.
Advanced Options:
- Expert mode offering detailed configuration (API URLs, per-domain parameters, etc.).
- Extension settings accessible via a user-friendly options page.

3. Behind the Scenes of Development: My Journey with AI and Roo Code

Babel Fish AI is not just a Chrome extension; it is also the result of an exciting experiment with artificial intelligence and an innovative tool: Roo Code.

Let me tell you how I turned an idea into reality, relying on the power of several AIs and a unique development environment.

3.1. Roo Code: My AI Coding Companion

Roo Code is much more than a simple VS Code extension. It is a true autonomous coding agent that assists you in your work. Imagine an AI co-pilot capable of:

- Communicating in natural language.
- Reading and writing files directly in your workspace.
- Executing terminal commands.
- Integrating with various APIs (OpenAI, Google Gemini, etc.).
- Adapting its “personality” through “custom modes”.

To develop Babel Fish AI, I used Roo Code like a conductor, guiding different AIs to collaborate on the extension’s creation.

3.2. A Cast of AIs for a Unique Project

I called on a real panel of AIs, each with its strengths and weaknesses:

o3-mini, Claude 3.5 Sonnet, Gemini (exp 1206, 1.5 pro exp 0827, Flash Think), GPT-4o: These models were my main collaborators, each contributing their part to the project.
Gemini (free models): Being free was a major advantage, even if I had to juggle quotas (fortunately, Roo Code automatically retries failed requests!).

3.3. Overcoming Obstacles: A Journey of Challenges (and Solutions!)

Development was not a smooth ride. I encountered several challenges, notably:

Microphone Access and Permissions: At first, I couldn’t get permission to access the microphone! The AIs insisted on adding a “microphone” permission that doesn’t exist in Chrome’s Manifest V3. I had to dive into the official documentation and provide it to Claude 3.5 Sonnet so that it could find the solution.
Secure Storage of the API Key: Another headache: how to securely store and retrieve the OpenAI API key in Chrome? I provided the official documentation to Gemini, which managed to propose the appropriate code to interact with Chrome storage.

3.4. Collaboration and Experimentation: The Key to Success

I relied on several different AIs for the options page and the localizations, which also let me put Roo Code through its paces.

Beyond these specific challenges, I was able to:

Test Roo Code in real conditions: Developing Babel Fish AI was also an opportunity to put this promising tool to the test.
Generate the icon and background image: Thanks to DALL·E 3 (integrated with OpenAI), I was able to create a unique visual identity for the extension.
Customize the Status Banner: I used GIMP to extract colors from the image, then asked the AI to code the options part allowing configuration of the banner colors.
Translate the Interface into Multiple Languages: I leveraged AI via Roo Code to generate the localization files (_locales) and translate the options into many languages, a huge time saver!
Create the dropdown menus: The AI generated everything, a massive time saver.
Handle the HTML rendering of the options: Done in collaboration with the AI.

3.5. Difficulties Encountered

Injecting text into an iframe on Google Chat: Text injection did not work there, so I added an exception for that site and kept the default pop-up instead.
Expert Mode and permissions: To allow changing the API URLs, I had to authorize all hosts in the manifest. This seems to cause issues with Chrome’s Enhanced Safe Browsing, and I don’t see a solution for now (if anyone has an idea, I’m all ears!).

4. Why Babel Fish AI? An Extension Born from a Need

Before diving into technical details, it’s important to understand why I created Babel Fish AI. It all started from a personal need:

Tired of typing long prompts: I wanted to be able to communicate with AIs more naturally, by speaking directly to them.
Poor transcriptions: The transcription extensions I had tested were disappointing.
Desire for transparency: I wanted an open source, ad-free extension whose code I controlled.

Babel Fish AI was born from the desire to create a tool that met my own needs while being useful to the community.

5. A Closer Look at the Code and Technical Details

Babel Fish AI is built on a typical Chrome extension architecture, with some specifics related to the use of OpenAI’s APIs.

Here is an overview of the main components:

5.1. General Architecture

The extension is composed of several files that interact with each other:

manifest.json: The main configuration file of the extension. It defines permissions, scripts, accessible resources, etc.
background.js: The service worker that runs in the background. It handles events (click on the icon, keyboard shortcuts), injects the content script if necessary, and communicates with the content script.
content.js: The script injected into web pages. It interacts directly with the DOM, captures microphone audio, calls the transcription and translation APIs, and displays results.
src/utils/api.js: Contains functions to call OpenAI’s Whisper API (transcription).
src/utils/translation.js: Contains functions to call OpenAI’s GPT API (translation).
src/utils/ui.js: Contains utility functions for managing the user interface (banner, dialog box, copy button).
src/constants.js: Defines constants for configuration, states, actions, etc.
src/pages/options/: Contains files for the extension’s options page (HTML, CSS, JavaScript).

5.2. `manifest.json`

The manifest.json file uses Manifest V3. It declares the following permissions:

activeTab: To access the active tab.
storage: To store the API key and extension options.
commands: To define keyboard shortcuts.
scripting: To inject the content script.

Under the separate host_permissions key, it also declares https://api.openai.com/* for API calls and https://*/*, which is necessary in expert mode to allow the user to specify custom API URLs.

The content_scripts are injected into all URLs (<all_urls>) and run at document end ("run_at": "document_end"). The background script is declared as a module-type service worker ("type": "module").
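
As a rough illustration of the settings described above (not the extension's actual file), the manifest could look like the following, written here as a JavaScript object that serializes to manifest.json. The version placeholder and the root-level paths to background.js and content.js are assumptions. The permissions list simply mirrors the one given in the text.

```javascript
// Sketch of a Manifest V3 configuration matching the description above.
const manifest = {
  manifest_version: 3,
  name: "Babel Fish AI",
  version: "0.0.0", // placeholder, not the real version number
  // Permissions as listed in the text above.
  permissions: ["activeTab", "storage", "commands", "scripting"],
  // The second pattern is the broad one needed for custom API URLs in expert mode.
  host_permissions: ["https://api.openai.com/*", "https://*/*"],
  // Module-type service worker (file path is an assumption).
  background: { service_worker: "background.js", type: "module" },
  content_scripts: [
    { matches: ["<all_urls>"], js: ["content.js"], run_at: "document_end" },
  ],
};

// Serialized form, ready to be written to manifest.json.
const manifestJson = JSON.stringify(manifest, null, 2);
```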

5.3. `background.js`

The background script has several roles:

Injecting the content script: If the content script is not already injected, background.js injects it upon interaction (icon click or keyboard shortcut).
Event handling: It listens for the chrome.runtime.onMessage (messages from the content script), chrome.action.onClicked (icon click), and chrome.commands.onCommand (keyboard shortcuts) events.
Communication with the content script: It sends messages to the content script to start or stop recording ({ action: ACTIONS.TOGGLE }).
Icon update: It updates the extension icon badge to indicate recording state (recording, stopped, error).
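
The event handling and badge update described above can be sketched as follows. This is a minimal illustration under assumptions, not the real background.js: the value of ACTIONS.TOGGLE, the BADGES table, and the `state` field of status messages are hypothetical, and on-demand injection of the content script is left out. The `chromeApi` parameter stands for the global `chrome` object, which is passed in so the sketch can run outside the browser.

```javascript
// Hypothetical constants; the real ones live in src/constants.js.
const ACTIONS = { TOGGLE: "toggle" };
const BADGES = {
  recording: { text: "REC", color: "#d93025" },
  stopped: { text: "", color: "#5f6368" },
  error: { text: "!", color: "#f9ab00" },
};

// Register the three listeners on the given chrome-like API object.
function registerBackgroundHandlers(chromeApi) {
  // Ask the content script of the given tab to start or stop recording.
  const toggle = (tab) => {
    if (tab && tab.id !== undefined) {
      chromeApi.tabs.sendMessage(tab.id, { action: ACTIONS.TOGGLE });
    }
  };

  chromeApi.action.onClicked.addListener((tab) => toggle(tab));
  chromeApi.commands.onCommand.addListener((command, tab) => toggle(tab));

  // Status messages from the content script drive the icon badge.
  chromeApi.runtime.onMessage.addListener((message, sender) => {
    const state = message && message.state;
    const known = Object.prototype.hasOwnProperty.call(BADGES, state);
    if (!known || !sender.tab) return;
    const { text, color } = BADGES[state];
    chromeApi.action.setBadgeText({ text, tabId: sender.tab.id });
    chromeApi.action.setBadgeBackgroundColor({ color, tabId: sender.tab.id });
  });
}
```

In a real service worker, the call would simply be `registerBackgroundHandlers(chrome)`.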

5.4. `content.js`

The content script is the heart of the extension. It performs the following tasks:

Initialization: It initializes the API key and banner color options from Chrome storage.
Audio capture: It uses the navigator.mediaDevices.getUserMedia API to access the microphone and record audio.
Transcription: It uses the function transcribeAudio (src/utils/api.js) to send audio to OpenAI’s Whisper API and obtain the transcription.
Translation (optional): If the translation option is enabled, it uses the function translateText (src/utils/translation.js) to send the transcribed text to OpenAI’s GPT API and obtain the translation.
Display: It shows the transcription (and translation, if applicable) either in the active element of the page (if it’s a text field or editable element) or in a dialog box.
Banner management: It displays a status banner at the top of the page to indicate recording status or show error messages.
Communication with the background script: It sends messages to the background script to indicate start, end, or error of recording.
Listening to option changes: It listens for changes in the extension options and updates the display accordingly.
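
The audio capture step can be sketched with getUserMedia and MediaRecorder. This is a minimal illustration, not the actual content.js: the function name `startRecording` and the fallback MIME type are assumptions. `mediaDevices` stands for navigator.mediaDevices and `Recorder` for MediaRecorder. Both are parameters so the sketch can be exercised with fakes.

```javascript
// Start recording from the microphone. Resolves with a stop() function that
// ends the recording and resolves with the captured audio as a Blob.
async function startRecording(mediaDevices, Recorder) {
  // The browser shows its own microphone prompt for the page here.
  const stream = await mediaDevices.getUserMedia({ audio: true });
  const recorder = new Recorder(stream);
  const chunks = [];

  recorder.ondataavailable = (event) => {
    if (event.data && event.data.size > 0) chunks.push(event.data);
  };

  const stopped = new Promise((resolve) => {
    recorder.onstop = () => {
      // Release the microphone so the browser's recording indicator goes away.
      stream.getTracks().forEach((track) => track.stop());
      resolve(new Blob(chunks, { type: recorder.mimeType || "audio/webm" }));
    };
  });

  recorder.start();
  return function stop() {
    recorder.stop();
    return stopped;
  };
}
```

In a page, this would be called as `const stop = await startRecording(navigator.mediaDevices, MediaRecorder)`, and later `const audioBlob = await stop()` yields the audio to send for transcription.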

5.5. Communication

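Communication between the background script and the content script goes through Chrome’s messaging API: the content script sends with chrome.runtime.sendMessage, the background script addresses a specific tab with chrome.tabs.sendMessage, and both sides receive with chrome.runtime.onMessage.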
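
On the content script side, the message handling might look like the sketch below. The helper `createToggleListener`, the shape of the `recorder` object, and the `state` values are hypothetical. The listener reacts to the toggle action and reports the new state back to the background script.

```javascript
// Build an onMessage listener for the content script.
// `toggleAction` is the action name to react to (ACTIONS.TOGGLE in the text),
// `recorder` exposes start()/stop(), and `notifyBackground` sends a status
// message back to the background script.
function createToggleListener(toggleAction, recorder, notifyBackground) {
  let recording = false;
  return function onMessage(message) {
    if (!message || message.action !== toggleAction) return;
    if (recording) {
      recorder.stop();
    } else {
      recorder.start();
    }
    recording = !recording;
    notifyBackground({ state: recording ? "recording" : "stopped" });
  };
}

// In the extension this could be wired as:
//   chrome.runtime.onMessage.addListener(
//     createToggleListener(ACTIONS.TOGGLE, recorder,
//       (msg) => chrome.runtime.sendMessage(msg)));
```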

5.6. API

The extension uses two OpenAI APIs:

Whisper: For voice transcription. The default URL is https://api.openai.com/v1/audio/transcriptions, and the model used is whisper-1.
GPT: For translation (optional). The default URL is https://api.openai.com/v1/chat/completions, and the model used is gpt-4o-mini.

The API URLs can be changed in the extension options (expert mode).
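
To make the two calls concrete, here is a hedged sketch of how the requests could be built. The URLs and model names come from the text above. The function names, the `recording.webm` file name, the system prompt wording, and the `status` property attached to errors are assumptions, and this is not the code from src/utils/api.js or src/utils/translation.js.

```javascript
const API_DEFAULTS = {
  transcriptionUrl: "https://api.openai.com/v1/audio/transcriptions",
  transcriptionModel: "whisper-1",
  translationUrl: "https://api.openai.com/v1/chat/completions",
  translationModel: "gpt-4o-mini",
};

// Multipart request for the transcription endpoint.
function buildTranscriptionRequest(audioBlob, apiKey, config = API_DEFAULTS) {
  const form = new FormData();
  // The file extension should match the recorded audio format.
  form.append("file", audioBlob, "recording.webm");
  form.append("model", config.transcriptionModel);
  return {
    url: config.transcriptionUrl,
    init: {
      method: "POST",
      // No Content-Type header: fetch adds the multipart boundary itself.
      headers: { Authorization: `Bearer ${apiKey}` },
      body: form,
    },
  };
}

// JSON request for the chat completions endpoint, used here for translation.
function buildTranslationRequest(text, targetLanguage, apiKey, config = API_DEFAULTS) {
  return {
    url: config.translationUrl,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        model: config.translationModel,
        messages: [
          {
            role: "system",
            content: `Translate the user's text into ${targetLanguage}. Reply with the translation only.`,
          },
          { role: "user", content: text },
        ],
      }),
    },
  };
}

// Send a prepared request and return the parsed JSON body.
async function sendRequest(request, fetchImpl = fetch) {
  const response = await fetchImpl(request.url, request.init);
  if (!response.ok) {
    throw Object.assign(new Error("API request failed"), { status: response.status });
  }
  return response.json();
}
```

With these helpers, the transcription is the `text` field of the first response, and the translation is `choices[0].message.content` of the second. Passing a different `config` object is one way to support the custom URLs of expert mode.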

5.7. Storage

The extension uses chrome.storage.sync to store:

The OpenAI API key (apiKey).
The extension options (display, translation, banner colors, etc.).
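
Reading these values and reacting to changes could be sketched as follows. Only `apiKey` is named in the text. The other option names and all default values are hypothetical. `storageArea` stands for chrome.storage.sync, whose get() accepts an object of defaults and returns a promise in Manifest V3.

```javascript
// Hypothetical option names and defaults, except `apiKey`.
const SETTINGS_DEFAULTS = {
  apiKey: "",
  translationEnabled: false,
  targetLanguage: "en",
  bannerColor: "#336699",
  bannerOpacity: 0.9,
};

// Load the stored settings, with defaults filled in for missing keys.
async function loadSettings(storageArea) {
  return storageArea.get(SETTINGS_DEFAULTS);
}

// Merge a chrome.storage.onChanged payload ({ key: { oldValue, newValue } })
// into the current settings without mutating them. A removed key has no
// newValue and falls back to its default.
function applyChanges(settings, changes) {
  const next = { ...settings };
  for (const [key, change] of Object.entries(changes)) {
    if (!(key in SETTINGS_DEFAULTS)) continue; // ignore unrelated keys
    next[key] = "newValue" in change ? change.newValue : SETTINGS_DEFAULTS[key];
  }
  return next;
}
```

In the content script, `applyChanges` would typically be called from a chrome.storage.onChanged listener, after checking that the reported area name is "sync".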

5.8. User Interface

The extension’s user interface is managed by several functions in src/utils/ui.js and content.js:

Banner: A status banner is displayed at the top of the page to indicate recording status or show error messages. The banner’s color and opacity are customizable.
Dialog box: The transcription (and translation) can be displayed in a dialog box. The display duration is customizable.
Copy button: A copy button is added to the dialog box to easily copy the transcribed text.
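
As an illustration of the customizable banner, the sketch below turns a color and an opacity into inline styles. The option names `bannerColor` and `bannerOpacity`, the `#rrggbb` input format, and the layout values are assumptions rather than the actual code of src/utils/ui.js.

```javascript
// Convert "#rrggbb" plus an opacity into a CSS rgba() color.
// The opacity is clamped to the range [0, 1].
function toRgba(hex, opacity) {
  const match = /^#?([0-9a-f]{2})([0-9a-f]{2})([0-9a-f]{2})$/i.exec(hex);
  if (!match) throw new Error(`Unsupported color: ${hex}`);
  const [r, g, b] = match.slice(1).map((part) => parseInt(part, 16));
  const alpha = Math.min(1, Math.max(0, opacity));
  return `rgba(${r}, ${g}, ${b}, ${alpha})`;
}

// Inline styles for a banner pinned to the top of the page.
function bannerStyle(options) {
  return {
    position: "fixed",
    top: "0",
    left: "0",
    width: "100%",
    zIndex: "2147483647", // stay above page content
    background: toRgba(options.bannerColor, options.bannerOpacity),
  };
}
```

On a page, the result can be applied with `Object.assign(bannerElement.style, bannerStyle(options))`.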

5.9. Error Handling

Errors are managed centrally. Error messages are defined in src/constants.js (object ERRORS). Errors are shown to the user via the status banner.
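
A centralized error path of this kind could be sketched as below. The entries of ERRORS, the function names, and the `status` property on API errors are hypothetical. NotAllowedError and NotFoundError are the standard names getUserMedia uses when access is denied or no microphone is available.

```javascript
// Hypothetical entries; the real messages live in the ERRORS object of src/constants.js.
const ERRORS = {
  MIC_DENIED: "Microphone access was denied.",
  NO_MIC: "No microphone was found.",
  INVALID_API_KEY: "The API key was rejected.",
  UNKNOWN: "An unexpected error occurred.",
};

// Map a thrown error to a user-facing message.
function toUserMessage(error) {
  if (error && error.name === "NotAllowedError") return ERRORS.MIC_DENIED;
  if (error && error.name === "NotFoundError") return ERRORS.NO_MIC;
  if (error && error.status === 401) return ERRORS.INVALID_API_KEY;
  return ERRORS.UNKNOWN;
}

// Single entry point: every failure ends up in the status banner.
// `showBanner(message, kind)` is a hypothetical UI helper.
function reportError(error, showBanner) {
  const message = toUserMessage(error);
  showBanner(message, "error");
  return message;
}
```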

6. Resources

GitHub repository: Babel Fish AI
Chrome Web Store: Babel Fish AI

7. Conclusion

Babel Fish AI combines accurate voice transcription with optional translation, behind an intuitive interface and with the API key kept in Chrome’s own storage. It makes converting voice to text faster and helps lift the language barrier. Try Babel Fish AI on the Chrome Web Store and discover how it can transform your communication.

Thank you for reading, and enjoy using Babel Fish AI!

This document was translated from French into English using the gpt-5-mini model. For more information on the translation process, see https://gitlab.com/jls42/ai-powered-markdown-translator