While using my AI-powered Markdown translation script (Mistral AI and OpenAI) on the README of my project Stable Diffusion on GitLab, I ran into a major problem: some parts of the text were left untranslated, and some code blocks were translated when they should not have been. This article presents the improvement made to resolve this critical issue.
Encountered Problem
During the translation of the Stable Diffusion README, the script did not properly differentiate between code blocks and normal text in certain places. This resulted in inappropriate translations of content that should have remained unchanged, revealing the need for finer management of code blocks.
Resolution Strategy
To solve this issue, I improved the script to precisely identify and extract code blocks before the AI translation and then correctly restore them after the translation. This approach prevents any unwanted alteration of content.
Improvement Mechanism
- Precise Identification and Extraction: Thanks to an improved regular expression, the code blocks are now clearly identified and extracted before translation, thus avoiding their alteration.
- Adequate Restoration: Code blocks are reinserted at their original location after the translation, ensuring the fidelity of the final content.
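The two steps above can be sketched as follows. This is a minimal illustration, not the script's actual code: the regular expression, the [[CODE_BLOCK_n]] placeholder format, and the function names extract_code_blocks and restore_code_blocks are all assumptions made for the example, and it only handles backtick fences that start at the beginning of a line.

```python
import re

# Build the fence marker programmatically so this example can itself
# live inside a fenced block.
FENCE = "`" * 3

# Assumption: a code block opens and closes with three backticks at the
# start of a line. This pattern is illustrative, not the script's own.
FENCE_RE = re.compile(rf"^{FENCE}.*?^{FENCE}[ \t]*$", re.DOTALL | re.MULTILINE)

# Hypothetical placeholder format; any token the model leaves alone would do.
PLACEHOLDER_RE = re.compile(r"\[\[CODE_BLOCK_(\d+)\]\]")


def extract_code_blocks(text):
    """Replace each fenced block with a numbered placeholder.

    Returns the masked text and the list of original blocks.
    """
    blocks = []

    def stash(match):
        blocks.append(match.group(0))
        return f"[[CODE_BLOCK_{len(blocks) - 1}]]"

    return FENCE_RE.sub(stash, text), blocks


def restore_code_blocks(text, blocks):
    """Put the original blocks back where their placeholders are."""
    return PLACEHOLDER_RE.sub(lambda m: blocks[int(m.group(1))], text)
```

Only the masked text would be sent to the translation model; the stored blocks never leave the script, so they cannot be altered. The sketch relies on the model returning each placeholder unchanged.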
Advantages of the Improvement
- Preservation of Code Blocks: The translation no longer touches code blocks, so commands and snippets remain intact and technically accurate.
- Increased Reliability: The script now reliably handles complex Markdown documents containing significant code blocks. The translation is fully automated and requires no manual retouching, thus improving the efficiency and speed of the process.
- Better Differentiation: The improved detection of code blocks allows for better differentiation between the text to translate and the code blocks to preserve. This reduces translation errors and ensures a more accurate and consistent outcome.
Translation Results
To see the improvements in action, take a look at the translated versions of the original French README from the project Stable Diffusion on GitLab:
- README in English (translated with gpt-4-1106-preview, without any retouching)
- README in Spanish (translated with gpt-4-1106-preview, without any retouching)
- README in Chinese (translated with gpt-4-1106-preview, without any retouching)
These translations demonstrate the ability of the improved script to efficiently handle code blocks and provide precise and consistent translations for different languages, all without any manual retouching.
Access to the Improved Script
You can find the improved script in the AI-Powered Markdown Translator project, where it is available to use or adapt to your needs.
New Features and Improvements
In addition to the improvement of code block detection and management, the AI-powered Markdown translation script has benefited from several other updates and improvements. Here is an overview of the new features:
Improved Output File Management
The script now checks whether an output file exists before starting the translation. If an output file already exists and the --force option is not activated, the script displays a message indicating that the translation was not performed and moves on to the next file. This avoids redundant translations and saves time.
Enhanced Existing Files Detection
The detection of existing files has been improved using the glob library. The script now checks whether a translation already exists, regardless of the model used, by searching for files matching the basename of the original file and the target language.
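A minimal sketch of such a check, assuming the {base}-{target_language}-{model}.md naming scheme and treating the model part as a wildcard. The function name find_existing_translations is hypothetical and is not taken from the script.

```python
import glob
import os


def find_existing_translations(source_path, target_language):
    """List translations of source_path into target_language, for any model.

    Assumption: {base} is the source path without its extension.
    """
    base, _ = os.path.splitext(source_path)
    # glob.escape protects special characters ([, ], *, ?) in the path itself;
    # the trailing * stands in for whatever model name was used.
    pattern = f"{glob.escape(base)}-{target_language}-*.md"
    return sorted(glob.glob(pattern))
```

An empty list means no translation exists yet for that language, whichever model might have produced one.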
Reversal of Model and Language in the Output File Name
The format of the output file name has been changed to better reflect the target language and the model used. From now on, the output file name will be in the format {base}-{target_language}-{model}.md instead of {base}-{model}-{target_language}.md.
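As an illustration of the new naming scheme, here is a short sketch. The helper name build_output_path is hypothetical, and it assumes {base} means the input path without its extension.

```python
import os


def build_output_path(source_path, target_language, model):
    """Build an output name in the {base}-{target_language}-{model}.md format."""
    # Assumption: {base} is the source path stripped of its extension.
    base, _ = os.path.splitext(source_path)
    return f"{base}-{target_language}-{model}.md"
```

Putting the language before the model groups all translations into one language under a common prefix, which is what makes a model-independent lookup by prefix possible.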
Addition of the --force Option
A new --force option has been added to the script. When activated, the script will force the translation even if a translation already exists for the input file. This can be useful when you want to update translations with a newer model or make changes to the translation settings.
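The behaviour of the flag can be sketched with argparse. Only the --force option name comes from the script; build_parser, should_translate, and the help text are assumptions made for this example.

```python
import argparse


def build_parser():
    """Hypothetical command-line parser exposing only the --force flag."""
    parser = argparse.ArgumentParser(
        description="Translate Markdown files (illustrative sketch)."
    )
    parser.add_argument(
        "--force",
        action="store_true",
        help="translate even if a translation already exists",
    )
    return parser


def should_translate(existing_translations, force):
    """Translate when forced, or when no translation was found."""
    return force or not existing_translations
```

With this logic, a file that already has a translation is skipped by default and processed again only when --force is passed.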
These enhancements and new features make the AI-powered Markdown translation script even more powerful and flexible, thus easing the management and translation of your Markdown documents.
Conclusion
This update is a significant advancement for the Markdown translation tool, expanding its ability to process technical documents. I will keep refining the tool, with the aim of making open-source projects more accessible to a global audience.
Stay tuned for more updates and innovations in the exciting world of generative AI and automation!
This document has been translated from the fr version to the en language using the gpt-4-1106-preview model. For more information about the translation process, visit https://gitlab.com/jls42/ai-powered-markdown-translator