While using my AI-powered Markdown translation script (Mistral AI and OpenAI) on the README of my Stable Diffusion on GitLab project, I ran into a major issue: some parts of the text were not translated, and some code blocks were translated when they should have been left untouched. This article presents the improvement I made to solve this critical problem.
Problem Encountered
While translating the Stable Diffusion README, the script sometimes failed to distinguish code blocks from regular text. As a result, content that should have remained unchanged was translated, which showed that code blocks needed finer-grained handling.
Resolution Strategy
To fix this, I changed the script so that it precisely identifies and extracts code blocks before the text is sent to the AI model, then restores them in place after translation. This prevents the translation from altering the code.
Improvement Mechanism
- Identification and Precise Extraction: Thanks to an improved regular expression, code blocks are now clearly identified and extracted before translation, preventing their alteration.
- Proper Restoration: Code blocks are reinserted at their original positions after translation, ensuring the fidelity of the final content.
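The script's own regular expression is not reproduced in this article, so the following is only a minimal Python sketch of the extract/restore technique, under stated assumptions: code blocks are fenced with three or more backticks or tildes at the start of a line, and each block is swapped for a placeholder such as @@CODEBLOCK_0@@ (a hypothetical format) that the model is unlikely to translate. The function names are also hypothetical.

```python
import re

# Fenced code block: an opening fence of 3+ backticks or tildes at the start of a
# line (optionally followed by an info string such as "bash"), then everything up
# to a line made of exactly the same fence. Indented fences are not handled.
FENCE_RE = re.compile(
    r"^(?P<fence>`{3,}|~{3,})[^\n]*\n.*?^(?P=fence)[ \t]*$",
    re.DOTALL | re.MULTILINE,
)
# Hypothetical placeholder format, chosen so a model is unlikely to translate it.
PLACEHOLDER_RE = re.compile(r"@@CODEBLOCK_(\d+)@@")


def extract_code_blocks(markdown: str) -> tuple[str, list[str]]:
    """Replace each fenced code block with a numbered placeholder."""
    blocks: list[str] = []

    def stash(match: re.Match) -> str:
        blocks.append(match.group(0))
        return f"@@CODEBLOCK_{len(blocks) - 1}@@"

    return FENCE_RE.sub(stash, markdown), blocks


def restore_code_blocks(translated: str, blocks: list[str]) -> str:
    """Put the original code blocks back in place of their placeholders."""
    return PLACEHOLDER_RE.sub(lambda m: blocks[int(m.group(1))], translated)


if __name__ == "__main__":
    doc = "Installation :\n\n```bash\npip install foo\n```\n\nTerminé."
    masked, blocks = extract_code_blocks(doc)
    print(masked)  # the code block is replaced by @@CODEBLOCK_0@@
    # ... the masked text would be sent to the translation model here ...
    print(restore_code_blocks(masked, blocks) == doc)  # True
```

Restoration is done in a single regex pass with a replacement function, so a placeholder that happens to appear inside a restored code block is not substituted a second time.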
Benefits of the Improvement
- Preservation of Code Blocks: Translation no longer touches code blocks, so the technical content keeps its accuracy and integrity.
- Increased Reliability: The script now reliably handles complex Markdown documents containing important code blocks. Translation is fully automated and requires no manual touch-ups, which saves time.
- Better Differentiation: More accurate detection of code blocks draws a clearer line between text to translate and code to preserve. This reduces translation errors and produces more accurate, consistent results.
Translation Results
To see the improvements in action, take a look at the translated versions of the original French README for the Stable Diffusion on GitLab project:
- README in English (translated with gpt-4-1106-preview, with no edits)
- README in Spanish (translated with gpt-4-1106-preview, with no edits)
- README in Chinese (translated with gpt-4-1106-preview, with no edits)
These translations show that the improved script handles code blocks correctly and produces accurate, consistent translations in several languages, all without any manual edits.
Access to the Improved Script
You can find the improved script in the AI-Powered Markdown Translator project, available for use or adaptation to your needs.
New Features and Enhancements
In addition to improved detection and handling of code blocks, the AI-powered Markdown translation script has received several other updates and enhancements. Here is an overview of the new features:
Improved Output File Handling
The script now checks whether the output file exists before starting a translation. If it already exists and the --force option is not set, the script prints a message saying the file will not be translated and moves on to the next file. This avoids redundant translations and saves time.
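As an illustration of this rule, here is a minimal Python sketch; the function name should_translate and the exact message are hypothetical and not taken from the script.

```python
import os


def should_translate(output_path: str, force: bool = False) -> bool:
    """Skip the file when its output already exists, unless force is set."""
    if os.path.exists(output_path) and not force:
        print(f"{output_path} already exists; skipping translation (use --force to redo it).")
        return False
    return True
```

In a loop over input files, a False result would simply mean continuing with the next file.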
Enhanced Existing File Detection
Existing-file detection has been improved using the glob library. The script now checks whether a translation already exists, regardless of the model used, by looking for files whose names match the base name of the original file and the target language.
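A minimal sketch of such a lookup with Python's glob module, assuming (hypothetically) that translations are written next to the source file and follow the {base}-{langue_cible}-{modèle}.md naming; the function name is illustrative only. The model part is matched with a wildcard, which is what makes the check model-independent.

```python
import glob
import os


def find_existing_translations(input_path: str, target_lang: str) -> list[str]:
    """Return every translation of input_path into target_lang, whatever the model."""
    directory, filename = os.path.split(input_path)
    base, _ext = os.path.splitext(filename)
    # glob.escape protects paths that contain *, ? or [ characters.
    pattern = os.path.join(glob.escape(directory), f"{glob.escape(base)}-{target_lang}-*.md")
    return sorted(glob.glob(pattern))
```

For README.md and the target language en, the pattern is README-en-*.md, which matches README-en-gpt-4-1106-preview.md but not README-es-gpt-4-1106-preview.md.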
Swapping Model and Language in the Output File Name
The output file naming format has been changed to better reflect the target language and the model used. Output files are now named using the {base}-{langue_cible}-{modèle}.md format (base name, target language, model) instead of {base}-{modèle}-{langue_cible}.md.
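The two layouts can be compared with a short Python sketch; the helper names are hypothetical, and only the two format strings come from the description above.

```python
def output_filename(base: str, target_lang: str, model: str) -> str:
    """New layout: {base}-{langue_cible}-{modèle}.md (language before model)."""
    return f"{base}-{target_lang}-{model}.md"


def legacy_output_filename(base: str, target_lang: str, model: str) -> str:
    """Previous layout: {base}-{modèle}-{langue_cible}.md (model before language)."""
    return f"{base}-{model}-{target_lang}.md"


if __name__ == "__main__":
    print(output_filename("README", "en", "gpt-4-1106-preview"))         # README-en-gpt-4-1106-preview.md
    print(legacy_output_filename("README", "en", "gpt-4-1106-preview"))  # README-gpt-4-1106-preview-en.md
```

One practical effect of the new order is that all translations into a given language share the prefix {base}-{langue_cible}-, so they sort together in a directory listing and a single prefix pattern finds them, whatever model produced them.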
Addition of the --force Option
A new --force option has been added to the script. When enabled, the script translates the input file even if a translation already exists. This is useful when you want to refresh translations with a newer model or different translation parameters.
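A command-line flag like this is typically declared as a boolean switch. The sketch below uses Python's argparse and is an assumption about how such an option could be wired, not the script's actual argument parser; build_parser is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Markdown translation (sketch)")
    # store_true: args.force is False by default and True when --force is passed.
    parser.add_argument(
        "--force",
        action="store_true",
        help="translate even if a translation already exists",
    )
    return parser
```

For example, build_parser().parse_args(["--force"]).force returns True, and the resulting value can be passed to the existing-output check so that an existing translation is overwritten instead of skipped.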
These improvements and new features make the AI-powered Markdown translation script even more powerful and flexible, making it easier to manage and translate your Markdown documents.
Conclusion
This update represents a significant advance for the Markdown translation tool, expanding its ability to handle technical documents. Continued refinement of this tool aims to make open-source projects more accessible to a global audience.
Stay tuned for more updates and innovations in the exciting world of generative AI and automation!
This document was translated from French into English using the gpt-5-mini model. For more information about the translation process, see https://gitlab.com/jls42/ai-powered-markdown-translator