While using my AI-powered Markdown translation script (Mistral AI and OpenAI) on the README of my Stable Diffusion on GitLab project, I ran into a major issue: some parts of the text were left untranslated, and some code blocks were translated when they should have been left untouched. This article presents the improvement I made to solve this problem.

Encountered Problem

During the translation of the Stable Diffusion README, the script did not correctly differentiate code blocks from normal text in some places. This led to inappropriate translations of content that should have remained unchanged, thus revealing the need for finer handling of code blocks.

Resolution Strategy

To solve this problem, I enhanced the script to accurately identify and extract code blocks before AI translation, then properly restore them after translation. This approach prevents any unwanted alteration of content.

Improvement Mechanism

  • Accurate Identification and Extraction: Thanks to an improved regular expression, code blocks are now clearly identified and extracted before translation, thus avoiding their alteration.
  • Adequate Restoration: Code blocks are reinserted at their original location after translation, ensuring the fidelity of the final content.
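The two steps above can be sketched as follows. This is a minimal illustration, not the script's actual code: the regular expression, the `@@CODE_BLOCK_n@@` placeholder format, and the function names `extract_code_blocks` and `restore_code_blocks` are all assumptions made for the example. It only handles blocks fenced with three backticks.

```python
import re

# Built programmatically so the example itself contains no literal fence.
FENCE = "`" * 3

# Hypothetical pattern: a line starting with three backticks, then everything
# up to the next line that starts with three backticks.
FENCE_RE = re.compile(
    rf"^{FENCE}[^\n]*\n.*?^{FENCE}[ \t]*$",
    re.DOTALL | re.MULTILINE,
)


def extract_code_blocks(markdown):
    """Replace each fenced block with a placeholder; return text and blocks."""
    blocks = []

    def stash(match):
        blocks.append(match.group(0))
        # Placeholder format is an assumption of this sketch.
        return f"@@CODE_BLOCK_{len(blocks) - 1}@@"

    return FENCE_RE.sub(stash, markdown), blocks


def restore_code_blocks(translated, blocks):
    """Put each stored block back where its placeholder appears."""
    for index, block in enumerate(blocks):
        translated = translated.replace(f"@@CODE_BLOCK_{index}@@", block)
    return translated
```

Only the text returned by `extract_code_blocks` would be sent to the model. One caveat with any placeholder scheme is that the model may alter or drop a placeholder, so checking that every placeholder is still present after translation is a sensible extra step.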

Benefits of the Improvement

  1. Preservation of Code Blocks: Code blocks are set aside before translation and come back unchanged, so the technical content stays accurate.
  2. Increased Reliability: The script now reliably handles complex Markdown documents containing large code blocks. Translation is fully automated and requires no manual touch-ups, which makes the process faster and more efficient.
  3. Better Differentiation: The improvement in code block detection allows for better differentiation between text to be translated and code blocks to be preserved. This reduces translation errors and ensures a more accurate and coherent result.

Translation Results

To see the improvements in action, take a look at the translated versions of the original French README of the Stable Diffusion on GitLab project.

These translations demonstrate the ability of the enhanced script to effectively handle code blocks and provide accurate and consistent translations for different languages, all without any manual touch-ups.

Access to the Enhanced Script

You can find the enhanced script on the AI-Powered Markdown Translator project, available for use or adaptation according to your needs.

New Features and Improvements

In addition to the improvement in code block detection and handling, the AI-powered Markdown translation script has benefited from several other updates and enhancements. Here is an overview of the new features:

Improved Output File Management

The script now checks whether an output file already exists before starting a translation. If it does and the --force option is not enabled, the script displays a message saying that the translation is skipped and moves on to the next file. This avoids redundant translations and saves time.

Improved Existing File Detection

The detection of existing files has been improved by using the glob library. The script now checks if a translation already exists, regardless of the model used, by searching for files matching the base name of the original file and the target language.
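As an illustration, a model-independent lookup with glob might look like the sketch below. The function name `find_existing_translations` and its parameters are hypothetical; the pattern assumes the {base}-{target_language}-{model}.md naming used by the script's output files.

```python
import glob
import os


def find_existing_translations(source_path, target_language, output_dir):
    """Return existing translations of source_path, whatever model made them."""
    base = os.path.splitext(os.path.basename(source_path))[0]
    # glob.escape keeps special characters in names from acting as wildcards.
    pattern = os.path.join(
        glob.escape(output_dir),
        f"{glob.escape(base)}-{target_language}-*.md",
    )
    return sorted(glob.glob(pattern))
```

The wildcard stands in for the model name, so a translation produced by any model counts as existing.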

Inversion of Model and Language in Output File Name

The format of the output file name has been changed to better reflect the target language and the model used. Now, the output file name will be in the format {base}-{target_language}-{model}.md instead of {base}-{model}-{target_language}.md.
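A small sketch of the new naming scheme, with a hypothetical helper `build_output_name` (the real script may build the name differently):

```python
import os


def build_output_name(source_path, target_language, model):
    """Build the output file name as {base}-{target_language}-{model}.md."""
    base = os.path.splitext(os.path.basename(source_path))[0]
    return f"{base}-{target_language}-{model}.md"
```

For example, `build_output_name("docs/README.md", "en", "claude-3-opus-20240229")` gives `README-en-claude-3-opus-20240229.md`. Putting the language first also means all translations into one language share a common prefix, which is convenient for prefix-based lookups.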

Addition of --force Option

A new --force option has been added to the script. When enabled, the script will force the translation even if a translation already exists for the input file. This can be useful when you want to update the translations with a newer model or make changes to the translation parameters.
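The behaviour of --force can be sketched with argparse as below. Only the --force flag name comes from the script; `build_parser`, `should_translate`, and the help text are assumptions made for the example.

```python
import argparse


def build_parser():
    """Command-line parser with a boolean --force flag (off by default)."""
    parser = argparse.ArgumentParser(description="Translate Markdown files (sketch).")
    parser.add_argument(
        "--force",
        action="store_true",
        help="translate even if a translation already exists",
    )
    return parser


def should_translate(existing_outputs, force):
    """Translate when forced, or when no earlier translation was found."""
    return force or not existing_outputs
```

With this logic, a file that already has a translation is skipped unless --force is passed, while a file with no translation is always processed.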

These improvements and new features make the AI-powered Markdown translation script even more powerful and flexible, making it easier to manage and translate your Markdown documents.

Conclusion

This update is a significant step forward for the Markdown translation tool, expanding its ability to handle technical documents. I will keep refining the tool, with the aim of making open source projects more accessible to a global audience.

Stay tuned for more updates and innovations in the exciting world of generative AI and automation!

This document was translated from the fr version into en using the claude-3-opus-20240229 model. For more information on the translation process, see https://gitlab.com/jls42/ai-powered-markdown-translator