Babel Fish AI is a Chrome extension that turns speech into text and can optionally translate the result. It is a personal project, designed to be reliable and ad-free, and it relies on OpenAI’s Whisper API for transcription. I created it partly to fulfill a personal need: dictating my requests to AIs instead of typing them.
You can use it exclusively for transcription or activate translation to facilitate multilingual communication.
It was also the perfect opportunity to test Roo Code, an AI-assisted development tool.

1. Presentation

Babel Fish AI was developed with the support of AI models such as Claude, Gemini, and OpenAI’s models, all used through Roo Code.
Thanks to its intuitive interface and customizable configuration, Babel Fish AI allows you to obtain fast and accurate transcription, with an immediate translation option to overcome language barriers.

2. Features and Options

  • Advanced Voice Transcription:

    • High-quality audio capture via the microphone.
    • Transcription carried out by OpenAI’s Whisper API, with automatic insertion into the active field or display in a dialog window.
  • Optional Automatic Translation:

    • Fast and faithful translation of the transcribed text, activatable as needed.
    • Option to use the extension solely for transcription.
  • Full Multilingual Support: Babel Fish AI processes the voice input and displays the text in various languages, broadening its international use.
    Supported languages: 🇸🇦 Arabic, 🇩🇪 German, 🇺🇸 English, 🇪🇸 Spanish, 🇫🇷 French, 🇮🇳 Hindi, 🇮🇹 Italian, 🇯🇵 Japanese, 🇰🇷 Korean, 🇳🇱 Dutch, 🇵🇱 Polish, 🇵🇹 Portuguese, 🇷🇴 Romanian, 🇸🇪 Swedish, 🇨🇳 Chinese.

  • Intuitive and Customizable Interface:

    • Choice between display in the active input area or a dialog window.
    • Customizable status banner (color, opacity, duration) for immediate visual feedback.
  • Advanced Options:

    • Expert mode offering detailed configuration (API URLs, per-domain settings, etc.).
    • Extension settings accessible via a user-friendly options page.

3. Behind the Scenes of Development: My Adventure with AI and Roo Code

Babel Fish AI is not just a Chrome extension; it is also the result of an exciting experiment with artificial intelligence and an innovative tool: Roo Code.

Let me tell you how I turned an idea into reality by relying on the power of several AIs and a unique development environment.

3.1. Roo Code: My AI Coding Companion

Roo Code is much more than just a VS Code extension. It is a true autonomous coding agent that assists you in your work. Imagine an AI co-pilot capable of:

  • Communicating in natural language.
  • Reading and writing files directly in your workspace.
  • Executing terminal commands.
  • Integrating with various APIs (OpenAI, Google Gemini, etc.).
  • Adapting its “personality” through “custom modes.”

To develop Babel Fish AI, I used Roo Code like a conductor, guiding different AIs to collaborate in creating the extension.

3.2. A Cast of AIs for a Unique Project

I called upon a true panel of AIs, each with its own strengths and weaknesses:

  • o3-mini, Claude 3.5 Sonnet, Gemini (exp 1206, 1.5 pro exp 0827, Flash Think), GPT-4o: These models were my main collaborators, each contributing its part.
  • Gemini (free models): Their free availability was a major asset, although I had to juggle the quotas (luckily, Roo Code automatically retries requests that fail!).

3.3. Overcoming Obstacles: A Journey Full of Hurdles (and Solutions!)

Development was not a smooth ride. I encountered several challenges, notably:

  • Microphone Access and Permissions: At first, it was impossible to obtain permission to access the microphone! The AIs insisted on adding a “microphone” permission that does not exist in Chrome’s Manifest V3. I had to dive into the official documentation and provide it to Claude 3.5 Sonnet so that it could find the solution.

  • Secure Storage of the API Key: Another headache: how to securely store and retrieve the OpenAI API key in Chrome? I provided the official documentation to Gemini, which then proposed the appropriate code to interact with Chrome’s storage.

3.4. Collaboration and Experimentation: The Key to Success

I used almost all of these AIs to build the options and the localizations, which also let me test Roo Code along the way.

Beyond these specific challenges, I was able to:

  • Test Roo Code in Real Conditions: The development of Babel Fish AI was also an opportunity to put this promising tool to the test.
  • Generate the Icon and Background Image: Thanks to DALL-E 3 (OpenAI’s image model), I was able to create a unique visual identity for the extension.
  • Customize the Display Banner: I used GIMP to extract colors from the image, then asked the AI to code the options section that allows configuring the banner colors.
  • Translate the Interface into Multiple Languages: I harnessed the power of AI via Roo Code to generate the localization files (_locales) and translate the options into many languages, saving considerable time!
  • Create the Dropdown Menus: The AI generated all of them, saving a tremendous amount of time.
  • Manage the HTML Rendering of the Options: Done in collaboration with the AI.

3.5. Difficulties Encountered

  • Issue with injecting text into an iframe for Google Chat: An exception was added, and I left a default pop-up.
  • Expert Mode and permissions: To allow changing the API URLs, I had to authorize all hosts in the manifest. This seems to cause issues with Chrome’s enhanced protection mode (Enhanced Safe Browsing), and I don’t see a solution for now (if anyone has an idea, I’m all ears!).

4. Why Babel Fish AI? An Extension Born from a Need

Before delving into the technical details, it is important to understand why I created Babel Fish AI. It all started from a personal need:

  • Tired of typing long prompts: I wanted to be able to communicate with AIs more naturally, by speaking to them directly.
  • Mediocre transcriptions: The transcription extensions I had tested were disappointing.
  • Desire for transparency: I wanted an open source, ad-free extension whose code I controlled.

Babel Fish AI was thus born from the desire to create a tool that meets my own needs while being useful to the community.

5. A Closer Look at the Code and Technical Details

Babel Fish AI is built on a typical Chrome extension architecture, with some specificities related to the use of OpenAI’s APIs.

Here is an overview of the main components:

5.1. General Architecture

The extension is composed of several JavaScript files that interact with each other:

  • manifest.json: The main configuration file of the extension. It defines the permissions, scripts, accessible resources, etc.
  • background.js: The service worker that runs in the background. It handles events (click on the icon, keyboard shortcuts), injects the content script when necessary, and communicates with the content script.
  • content.js: The script that is injected into the web pages. It directly interacts with the DOM, captures audio from the microphone, calls the transcription and translation APIs, and displays the results.
  • src/utils/api.js: Contains the functions to call the OpenAI Whisper API (transcription).
  • src/utils/translation.js: Contains the functions to call the OpenAI GPT API (translation).
  • src/utils/ui.js: Contains utility functions to manage the user interface (banner, dialog box, copy button).
  • src/constants.js: Defines constants for configuration, states, actions, etc.
  • src/pages/options/: Contains the files for the extension’s options page (HTML, CSS, JavaScript).

5.2. manifest.json

The manifest.json file uses version 3 of the manifest. It declares the following permissions, plus host permissions under their own key:

  • activeTab: To access the active tab.
  • storage: To store the API key and the extension’s options.
  • commands: To define keyboard shortcuts.
  • scripting: To inject the content script.
  • host_permissions: https://api.openai.com/* for API calls, and https://*/*, which is necessary in expert mode to allow the user to specify custom API URLs.

The content_scripts are injected into all URLs (<all_urls>) and run once the DOM has finished loading ("run_at": "document_end"). The background script is declared as a module-type service worker ("type": "module").
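
As a rough illustration, a manifest matching this description might look like the sketch below, written as a JavaScript object so it can be serialized to manifest.json. Only the permissions, host permissions, content-script settings, and module service worker come from the description above; the version, the file paths, the empty action entry, and the command name and shortcut are assumptions made for the example.

```javascript
// Hypothetical sketch of the manifest described above; not the extension's actual file.
const manifest = {
  manifest_version: 3,
  name: "Babel Fish AI",
  version: "0.0.0", // placeholder version (assumption)
  // Permissions as listed in the text.
  permissions: ["activeTab", "storage", "commands", "scripting"],
  // The second pattern is what expert mode needs for custom API URLs.
  host_permissions: ["https://api.openai.com/*", "https://*/*"],
  background: { service_worker: "background.js", type: "module" },
  content_scripts: [
    { matches: ["<all_urls>"], js: ["content.js"], run_at: "document_end" },
  ],
  action: {}, // assumed: an action entry is needed to receive icon clicks
  commands: {
    // Command name and shortcut are assumptions.
    "toggle-recording": {
      suggested_key: { default: "Alt+Shift+R" },
      description: "Start or stop recording",
    },
  },
};

// Serialize to the text that would be saved as manifest.json.
const manifestJson = JSON.stringify(manifest, null, 2);
console.log(manifestJson);
```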

5.3. background.js

The background script has several roles:

  • Injection of the content script: If the content script is not already injected, background.js injects it upon an interaction (click on the icon or keyboard shortcut).
  • Event Handling: It listens for chrome.runtime.onMessage events (messages from the content script), chrome.action.onClicked (icon click), and chrome.commands.onCommand (keyboard shortcuts).
  • Communication with the content script: It sends messages to the content script to start or stop recording ({ action: ACTIONS.TOGGLE }).
  • Updating the Icon: It updates the extension icon badge to indicate the recording status (in progress, stopped, error).
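
The roles above could be sketched roughly as follows. This is a simplified illustration, not the actual background.js: the value of ACTIONS.TOGGLE, the badge texts, the shape of the status message ({ state }), and the chromeApi parameter (passed in so the logic can be exercised outside the browser) are all assumptions. It uses chrome.tabs.sendMessage because that is the call Chrome provides for reaching a content script in a given tab.

```javascript
// Hypothetical sketch; ACTIONS.TOGGLE is named in the text, its value is assumed.
const ACTIONS = { TOGGLE: "toggle" };

// Ask the tab's content script to toggle recording; inject it first if it is missing.
async function toggleRecording(chromeApi, tabId) {
  try {
    await chromeApi.tabs.sendMessage(tabId, { action: ACTIONS.TOGGLE });
  } catch (err) {
    // No listener answered in that tab: inject content.js, then retry once.
    await chromeApi.scripting.executeScript({ target: { tabId }, files: ["content.js"] });
    await chromeApi.tabs.sendMessage(tabId, { action: ACTIONS.TOGGLE });
  }
}

// Reflect the recording state on the icon badge (badge texts are assumptions).
function updateBadge(chromeApi, state) {
  const text = { recording: "REC", stopped: "", error: "!" }[state] ?? "";
  return chromeApi.action.setBadgeText({ text });
}

// Register the three listeners mentioned in the text.
function registerListeners(chromeApi) {
  chromeApi.action.onClicked.addListener((tab) => toggleRecording(chromeApi, tab.id));
  chromeApi.commands.onCommand.addListener((command, tab) => {
    if (tab?.id !== undefined) toggleRecording(chromeApi, tab.id);
  });
  chromeApi.runtime.onMessage.addListener((message) => {
    if (message?.state) updateBadge(chromeApi, message.state);
  });
}

// Only wire things up when running inside the extension's service worker.
if (typeof chrome !== "undefined" && chrome.action) registerListeners(chrome);
```

Passing the API object as a parameter is only a convenience for testing; inside the extension the global chrome object would be used directly.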

5.4. content.js

The content script is the heart of the extension. It performs the following tasks:

  • Initialization: It initializes the API key and the banner color options from Chrome storage.
  • Audio Capture: It uses the navigator.mediaDevices.getUserMedia API to access the microphone and record audio.
  • Transcription: It uses the transcribeAudio function (src/utils/api.js) to send the audio to the OpenAI Whisper API and obtain the transcription.
  • Translation (optional): If the translation option is activated, it uses the translateText function (src/utils/translation.js) to send the transcribed text to the OpenAI GPT API and get the translation.
  • Display: It displays the transcription (and translation, if applicable) either in the active element of the page (if it is a text field or an editable element) or in a dialog box.
  • Banner Management: It displays a status banner at the top of the page to indicate the recording status or display error messages.
  • Communication with the background script: It sends messages to the background script to indicate the start, end, or an error during recording.
  • Listening for Option Changes: It listens for changes in the extension options and updates the display accordingly.
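
To make this flow concrete, here is a minimal sketch of the capture, transcription, translation, and display steps. It is an assumption-laden illustration rather than the real content.js: the option names (enableTranslation, targetLanguage, display), the deps object used to pass in transcribeAudio and translateText, and the use of MediaRecorder to turn the microphone stream into a blob are all hypothetical.

```javascript
// Hypothetical sketch of the content-script flow; option names are assumptions.

// True when text can be inserted into the element (text field or editable element).
// Simplified: a real check would also look at the input's type.
function isEditable(element) {
  if (!element) return false;
  const tag = (element.tagName || "").toLowerCase();
  return tag === "textarea" || tag === "input" || element.isContentEditable === true;
}

// Transcribe, optionally translate, then decide where the result should be shown.
async function processRecording(audioBlob, options, deps) {
  let text = await deps.transcribeAudio(audioBlob, options.apiKey);
  if (options.enableTranslation) {
    text = await deps.translateText(text, options.targetLanguage, options.apiKey);
  }
  const useActiveField = options.display === "active" && isEditable(deps.activeElement);
  return { text, target: useActiveField ? "active" : "dialog" };
}

// Record from the microphone until stop() is called (browser only).
async function startRecording() {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const recorder = new MediaRecorder(stream); // assumed recording mechanism
  const chunks = [];
  recorder.ondataavailable = (event) => chunks.push(event.data);
  recorder.start();
  return {
    stop: () =>
      new Promise((resolve) => {
        recorder.onstop = () => {
          stream.getTracks().forEach((track) => track.stop()); // release the microphone
          resolve(new Blob(chunks, { type: recorder.mimeType }));
        };
        recorder.stop();
      }),
  };
}
```

Falling back to the dialog whenever the focused element is not editable mirrors the display rule described in the list above.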

5.5. Communication

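Communication between the background script and the content script goes through Chrome’s messaging API: the content script sends messages with chrome.runtime.sendMessage, the background script addresses a specific tab with chrome.tabs.sendMessage, and both sides receive messages through chrome.runtime.onMessage.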
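
On the content-script side, this exchange could look something like the sketch below. The state names ("recording", "stopped", "error"), the message shape, the recorder object, and the chromeApi parameter are assumptions chosen for the example; only the toggle action and the messaging calls come from the text.

```javascript
// Hypothetical content-script side of the messaging; state names are assumptions.
const ACTIONS = { TOGGLE: "toggle" };

// Build a handler that toggles recording and reports the new state to the background.
function createToggleHandler(chromeApi, recorder) {
  let recording = false;
  return async function onMessage(message) {
    if (message?.action !== ACTIONS.TOGGLE) return; // ignore unrelated messages
    try {
      if (recording) await recorder.stop();
      else await recorder.start();
      recording = !recording;
      await chromeApi.runtime.sendMessage({ state: recording ? "recording" : "stopped" });
    } catch (err) {
      await chromeApi.runtime.sendMessage({ state: "error", detail: String(err) });
    }
  };
}

// Only register the listener when running inside a page with the extension APIs.
if (typeof chrome !== "undefined" && chrome.runtime?.onMessage) {
  // Placeholder recorder; a real one would wrap the microphone capture.
  const placeholderRecorder = { start: async () => {}, stop: async () => {} };
  const handler = createToggleHandler(chrome, placeholderRecorder);
  chrome.runtime.onMessage.addListener((message) => {
    handler(message);
  });
}
```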

5.6. API

The extension uses two OpenAI APIs:

  • Whisper: For voice transcription. The default URL is https://api.openai.com/v1/audio/transcriptions, and the model used is whisper-1.
  • GPT: For translation (optional). The default URL is https://api.openai.com/v1/chat/completions, and the model used is gpt-4o-mini.

The API URLs can be changed in the extension options (expert mode).
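
The two calls can be sketched as follows. The endpoints and model names are the ones given above, and the request formats follow OpenAI’s public API; the function signatures, the options argument (url, fetchFn), the "recording.webm" file name, and the translation prompt are assumptions for illustration and may differ from what src/utils/api.js and src/utils/translation.js actually do.

```javascript
// Hypothetical sketch of the two API calls; function signatures are assumptions.
const DEFAULTS = {
  transcriptionUrl: "https://api.openai.com/v1/audio/transcriptions",
  translationUrl: "https://api.openai.com/v1/chat/completions",
};

// Send recorded audio to the Whisper endpoint and return the transcribed text.
async function transcribeAudio(audioBlob, apiKey, { url = DEFAULTS.transcriptionUrl, fetchFn = fetch } = {}) {
  const form = new FormData();
  form.append("file", audioBlob, "recording.webm"); // file name is an assumption
  form.append("model", "whisper-1");
  const response = await fetchFn(url, {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}` }, // multipart Content-Type is set automatically
    body: form,
  });
  if (!response.ok) throw new Error(`Transcription failed: HTTP ${response.status}`);
  return (await response.json()).text;
}

// Ask the chat completions endpoint to translate the text.
async function translateText(text, targetLanguage, apiKey, { url = DEFAULTS.translationUrl, fetchFn = fetch } = {}) {
  const response = await fetchFn(url, {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "gpt-4o-mini",
      messages: [
        // The prompt wording is an assumption.
        { role: "system", content: `Translate the user's text into ${targetLanguage}. Reply with the translation only.` },
        { role: "user", content: text },
      ],
    }),
  });
  if (!response.ok) throw new Error(`Translation failed: HTTP ${response.status}`);
  return (await response.json()).choices[0].message.content.trim();
}
```

Accepting the URL as an option is one way to support the expert-mode override: the value saved in the options would simply be passed in place of the default.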

5.7. Storage

The extension uses chrome.storage.sync to store:

  • The OpenAI API key (apiKey).
  • The extension options (display, translation, banner colors, etc.).
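
A minimal sketch of how these values might be read, written, and watched is shown below. Only the apiKey key is named in the text; the other option names and their default values are assumptions, and the storage area is passed as a parameter (chrome.storage.sync in the extension) so the helpers can also be tried with an in-memory stand-in.

```javascript
// Hypothetical sketch; apiKey is named in the text, the other keys and defaults are assumed.
const DEFAULT_OPTIONS = {
  apiKey: "",
  display: "active",
  enableTranslation: false,
  bannerColor: "#2b6cb0",
  bannerOpacity: 0.9,
};

// Read stored options; passing an object to get() supplies defaults for unset keys.
async function loadOptions(storageArea) {
  return storageArea.get(DEFAULT_OPTIONS);
}

// Persist only the options that changed.
async function saveOptions(storageArea, changes) {
  await storageArea.set(changes);
}

// Call onChange with the new values whenever synced options change.
function watchOptions(chromeApi, onChange) {
  chromeApi.storage.onChanged.addListener((changes, areaName) => {
    if (areaName !== "sync") return;
    const updated = {};
    for (const [key, change] of Object.entries(changes)) updated[key] = change.newValue;
    onChange(updated);
  });
}
```

In the extension this would be used as `const options = await loadOptions(chrome.storage.sync);`, with watchOptions(chrome, ...) keeping the content script in step with the options page.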

5.8. User Interface

The extension’s user interface is managed by several functions in src/utils/ui.js and content.js:

  • Banner: A status banner is displayed at the top of the page to indicate the recording status or display error messages. The color and opacity of the banner are customizable.
  • Dialog Box: The transcription (and translation) can be displayed in a dialog box. The display duration is customizable.
  • Copy Button: A copy button is added to the dialog box to easily copy the transcribed text.
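
As an illustration of the customizable banner, the sketch below combines a color and an opacity into a CSS value and shows a banner for a given duration. The element id, the inline styles, and the parameter names are assumptions; the actual helpers in src/utils/ui.js may be organized quite differently.

```javascript
// Convert "#rrggbb" plus an opacity in [0, 1] into a CSS rgba() string.
function toRgba(hexColor, opacity) {
  const match = /^#?([0-9a-f]{2})([0-9a-f]{2})([0-9a-f]{2})$/i.exec(hexColor);
  if (!match) throw new Error(`Invalid color: ${hexColor}`);
  const [r, g, b] = match.slice(1).map((part) => parseInt(part, 16));
  const alpha = Math.min(1, Math.max(0, opacity)); // clamp to the valid range
  return `rgba(${r}, ${g}, ${b}, ${alpha})`;
}

let bannerTimer; // pending hide timer, if any

// Show a status banner at the top of the page (browser only; the id is an assumption).
function showBanner(message, { color, opacity, durationMs }) {
  let banner = document.getElementById("babel-fish-banner");
  if (!banner) {
    banner = document.createElement("div");
    banner.id = "babel-fish-banner";
    Object.assign(banner.style, {
      position: "fixed",
      top: "0",
      left: "0",
      right: "0",
      zIndex: "2147483647",
      padding: "8px",
      textAlign: "center",
      color: "#fff",
    });
    document.body.appendChild(banner);
  }
  banner.style.backgroundColor = toRgba(color, opacity);
  banner.textContent = message;
  clearTimeout(bannerTimer); // a newer message replaces any pending hide
  if (durationMs > 0) bannerTimer = setTimeout(() => banner.remove(), durationMs);
}
```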

5.9. Error Management

Errors are managed centrally. Error messages are defined in src/constants.js (the ERRORS object). Errors are displayed to the user via the status banner.
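
One way such central handling could be organized is sketched below. The ERRORS object is named in the text, but its keys and messages here are invented for the example, as are the toUserMessage and handleError helpers; the showBanner argument stands in for whatever function displays the status banner.

```javascript
// Hypothetical ERRORS object; the text names it, but these keys and messages are assumed.
const ERRORS = {
  MIC_ACCESS: "Microphone access was denied.",
  API_KEY_MISSING: "No API key configured. Open the options page to add one.",
  API_FAILURE: "The API request failed.",
  UNKNOWN: "An unexpected error occurred.",
};

// Map a thrown error to a user-facing message.
function toUserMessage(error) {
  // getUserMedia rejects with a NotAllowedError when the user denies access.
  if (error?.name === "NotAllowedError") return ERRORS.MIC_ACCESS;
  const code = error?.code;
  if (typeof code === "string" && Object.prototype.hasOwnProperty.call(ERRORS, code)) {
    return ERRORS[code];
  }
  return ERRORS.UNKNOWN;
}

// Central handler: every failure goes through here and ends up in the banner.
function handleError(error, showBanner) {
  const message = toUserMessage(error);
  showBanner(message);
  return message;
}
```

Routing every failure through a single function keeps the wording in one place, which also makes the messages easier to localize.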

6. Resources

7. Conclusion

Babel Fish AI offers a complete solution for accurate voice transcription with optional translation, an intuitive interface, and enhanced security. It streamlines the conversion of voice to text and helps lift the language barrier. Try Babel Fish AI on the Chrome Web Store and discover how it can transform your communication.


Thank you for reading, and enjoy using Babel Fish AI!

This document was translated from the French version into English using the o3-mini model. For more information on the translation process, see https://gitlab.com/jls42/ai-powered-markdown-translator