While using my AI-powered Markdown translation script (Mistral AI and OpenAI) on the README of my project Stable Diffusion on GitLab, I ran into a major problem: some parts of the text were left untranslated, and some code blocks were translated when they should have stayed unchanged. This article presents the improvement I made to fix this issue.

Encountered Problem

During the translation of the Stable Diffusion README, the script failed in certain places to distinguish code blocks from normal text. As a result, content that should have remained unchanged was translated, which showed that code blocks needed finer-grained handling.

Resolution Strategy

To solve this issue, I improved the script to precisely identify and extract code blocks before the AI translation and then correctly restore them after the translation. This approach prevents any unwanted alteration of content.

Improvement Mechanism

  • Precise Identification and Extraction: An improved regular expression identifies the code blocks and extracts them before translation, so the AI model never sees or alters them.
  • Accurate Restoration: After translation, the code blocks are reinserted at their original locations, so the final content stays faithful to the source.
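
The extract-then-restore idea can be sketched as follows. This is a minimal illustration, not the script's actual code: the regular expression, the `#CODE_BLOCK_n#` placeholder format, and the function names `extract_code_blocks` and `restore_code_blocks` are all assumptions made for the example.

```python
import re

# Assumption: code blocks are fenced with three backticks.
# The fence is built with "`" * 3 only to keep this listing readable.
FENCE = "`" * 3
FENCE_RE = re.compile(FENCE + r".*?" + FENCE, re.DOTALL)


def extract_code_blocks(text):
    """Replace each fenced block with a placeholder; return (masked_text, blocks)."""
    blocks = []

    def stash(match):
        blocks.append(match.group(0))
        # Hypothetical placeholder format; the trailing '#' avoids
        # confusing block 1 with block 10 during restoration.
        return f"#CODE_BLOCK_{len(blocks) - 1}#"

    return FENCE_RE.sub(stash, text), blocks


def restore_code_blocks(text, blocks):
    """Put the original blocks back in place of their placeholders."""
    for index, block in enumerate(blocks):
        text = text.replace(f"#CODE_BLOCK_{index}#", block)
    return text
```

Only the masked text would be sent to the translation model. The sketch assumes the model returns the placeholders unchanged; a real implementation would probably want to verify that before restoring.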

Advantages of the Improvement

  1. Preservation of Code Blocks: Translation no longer touches code blocks, so the technical content stays intact and accurate.
  2. Increased Reliability: The script now reliably handles complex Markdown documents containing large code blocks. The translation is fully automated and requires no manual retouching, which makes the process faster and more efficient.
  3. Better Differentiation: Improved detection of code blocks separates the text to translate more clearly from the code to preserve. This reduces translation errors and gives a more accurate and consistent result.

Translation Results

To see the improvements in action, take a look at the translated versions of the original French README from the project Stable Diffusion on GitLab:

These translations demonstrate the ability of the improved script to efficiently handle code blocks and provide precise and consistent translations for different languages, all without any manual retouching.

Access to the Improved Script

You can find the improved script on the project AI-Powered Markdown Translator, available for use or adaptation according to your needs.

New Features and Improvements

In addition to the improvement of code block detection and management, the AI-powered Markdown translation script has benefited from several other updates and improvements. Here is an overview of the new features:

Improved Output File Management

The script now checks whether an output file already exists before starting a translation. If it does and the --force option is not set, the script prints a message saying that the translation was skipped and moves on to the next file. This avoids redundant translations and saves time.

Enhanced Existing Files Detection

The detection of existing files has been improved using the glob library. The script now checks if a translation already exists, regardless of the model used, by searching for files matching the basename of the original file and the target language.
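
A minimal sketch of this kind of check, assuming output files sit in a known directory and follow a `{base}-{target_language}-{model}.md` naming scheme. The function name `find_existing_translations` and the `output_dir` parameter are hypothetical, not taken from the script.

```python
import glob
import os


def find_existing_translations(input_path, target_language, output_dir="."):
    """Return existing translations of input_path for one language, any model."""
    base = os.path.splitext(os.path.basename(input_path))[0]
    # glob.escape keeps characters such as '[' in names from acting as wildcards.
    # The trailing '*' stands for the model name, whatever it is.
    pattern = os.path.join(
        glob.escape(output_dir),
        f"{glob.escape(base)}-{glob.escape(target_language)}-*.md",
    )
    return sorted(glob.glob(pattern))
```

An empty result means no translation exists yet for that file and language.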

Reversal of Model and Language in the Output File Name

The format of the output file name has been changed to better reflect the target language and the model used. From now on, the output file name will be in the format {base}-{target_language}-{model}.md instead of {base}-{model}-{target_language}.md.
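
For illustration, the new naming scheme could be produced by a small helper like the one below. The function name `output_filename` is an assumption, and the real script may build the name differently.

```python
import os


def output_filename(input_path, target_language, model):
    """Build '{base}-{target_language}-{model}.md' next to the input file."""
    base, _extension = os.path.splitext(input_path)
    return f"{base}-{target_language}-{model}.md"
```

With the language placed before the model, all translations of a file into the same language sort together and can be matched with a single wildcard on the model part.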

Addition of the --force Option

A new --force option has been added to the script. When activated, the script will force the translation even if a translation already exists for the input file. This can be useful when you want to update translations with a newer model or make changes to the translation settings.
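
A rough sketch of how such a flag might be wired with `argparse`. Only the `--force` option comes from the article; the positional `files` argument and the helper names `build_parser` and `should_translate` are assumptions for the example.

```python
import argparse


def build_parser():
    """Command-line parser for an illustrative translation script."""
    parser = argparse.ArgumentParser(description="Translate Markdown files (sketch)")
    # Assumption: input files are passed as positional arguments.
    parser.add_argument("files", nargs="+", help="Markdown files to translate")
    parser.add_argument(
        "--force",
        action="store_true",
        help="translate even if a translation already exists",
    )
    return parser


def should_translate(existing_outputs, force):
    """Translate when forced, or when no earlier translation was found."""
    return force or not existing_outputs
```

Here `existing_outputs` would be the list of translations already found for the file and target language. Without `--force`, a non-empty list means the file is skipped.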

These enhancements and new features make the AI-powered Markdown translation script even more powerful and flexible, thus easing the management and translation of your Markdown documents.

Conclusion

This update is a significant step forward for the Markdown translation tool, extending its ability to process technical documents. My aim in continuing to refine it is to make open-source projects more accessible to a global audience.

Stay tuned for more updates and innovations in the exciting world of generative AI and automation!

This document has been translated from the fr version to the en language using the gpt-4-1106-preview model. For more information about the translation process, visit https://gitlab.com/jls42/ai-powered-markdown-translator