Hello Carl,
I guess that this is not so difficult to implement, and I would
definitely consider doing so if I had more time in my days.
It may happen that we have another lockdown period here in France, but
I'm not sure that I will devote this to po4a this time, unfortunately.
In the meantime, could you please open a bug, either on GitHub or on
https://salsa.debian.org/mquinson/po4a/ so that we don't forget?
I'll try to explain in your bug how this could be implemented, as I did in
https://github.com/mquinson/po4a/issues/272. The Markdown parser is a
bit more hairy than the po4a script that we'll have to modify for
#272, but that's still doable.
One question is whether these Hugo shortcodes can appear in the middle
of a paragraph (or even in something that is not a plain paragraph,
such as a title), or whether they can only appear alone in a
paragraph. And how do you want to react if they appear in the middle of
a paragraph: should we split the paragraph there (error prone for the
translators), or maybe put a placeholder in the text to translate
(somewhat harder to implement, but we did it e.g. for HTML)?
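To make the placeholder option concrete, here is a minimal sketch in Python (not po4a's Perl, and not how po4a handles HTML). It assumes shortcodes are delimited by {{< ... >}} or {{% ... %}}. The placeholder token <sc1/> and both function names are hypothetical, chosen only for illustration.

```python
import re

# Assumption: a shortcode is anything between {{< ... >}} or {{% ... %}}.
SHORTCODE_RE = re.compile(r"\{\{[<%].*?[%>]\}\}", re.S)
# Hypothetical placeholder token shown to translators, e.g. <sc1/>.
PLACEHOLDER_RE = re.compile(r"<sc(\d+)/>")


def protect_shortcodes(paragraph):
    """Replace each shortcode with a numbered placeholder.

    Returns the masked paragraph and the list of original shortcodes.
    """
    saved = []

    def _stash(match):
        saved.append(match.group(0))
        return f"<sc{len(saved)}/>"

    return SHORTCODE_RE.sub(_stash, paragraph), saved


def restore_shortcodes(translated, saved):
    """Put the original shortcodes back in place of the placeholders."""

    def _unstash(match):
        # Placeholders are numbered from 1; the list is indexed from 0.
        return saved[int(match.group(1)) - 1]

    return PLACEHOLDER_RE.sub(_unstash, translated)


text = 'See {{< ref "install.md" >}} for details.'
masked, saved = protect_shortcodes(text)
print(masked)
print(restore_shortcodes("Voir <sc1/> pour plus de details.", saved))
```

With this approach the paragraph stays in one piece for the translator, who can move the placeholder within the sentence but never sees the shortcode syntax itself.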
Thanks for reporting,
Mt.
On Thu, Oct 08, 2020 at 10:51:17AM -0000, carl(a)carlschwan.eu wrote:
Hello,
One year ago at KDE we started using po4a for translating a part of
our main website
kde.org/announcements/releases. This site is built with
the Hugo static generator. The goal was to start with a small portion
of the website and then extend it to finally i18n the entire
website. It worked, but there were a few problems with the
extraction (related to the front matter and shortcode detection).
Because of that, I decided to write my own extractor in
Python, fixing the shortcomings of the po4a system.
But recently I learned that po4a now supports, since the May "LockDown"
release, extracting strings from YAML front matter. There is now only
one remaining problem with extraction from Hugo Markdown files: the
shortcodes.
Hugo's shortcodes are a macro system with a syntax similar to HTML, e.g.
{{< figure src="..." alt="Description" >}}. My script extracts
only the string "Description", but po4a extracts the entire thing. This is
quite problematic in my experience, because translators tend to translate
"figure", or some automatic tooling tends to replace the quotation marks
with French guillemets. This breaks the entire website generation, and each
time it needs some manual intervention to fix the translations (a quite
stressful process shortly before a release).
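As an illustration of the behaviour described above, here is a minimal Python sketch that pulls only the translatable attribute values out of a shortcode. It is not the actual script linked below, nor po4a code. The set of translatable attribute names (alt, title, caption) and the function name are assumptions made for the example.

```python
import re

# Assumption: shortcodes look like {{< name key="value" ... >}} or {{% ... %}}.
SHORTCODE_RE = re.compile(
    r"\{\{[<%]\s*(?P<name>[\w/-]+)(?P<args>.*?)\s*[%>]\}\}", re.S
)
# Named arguments of the form key="value".
ATTR_RE = re.compile(r'(?P<key>[\w-]+)="(?P<value>[^"]*)"')

# Hypothetical allow-list of attributes whose values should be translated.
TRANSLATABLE_ATTRS = {"alt", "title", "caption"}


def extract_shortcode_strings(text):
    """Return the translatable attribute values found in all shortcodes."""
    strings = []
    for shortcode in SHORTCODE_RE.finditer(text):
        for attr in ATTR_RE.finditer(shortcode.group("args")):
            if attr.group("key") in TRANSLATABLE_ATTRS and attr.group("value"):
                strings.append(attr.group("value"))
    return strings


print(extract_shortcode_strings('{{< figure src="a.png" alt="Description" >}}'))
```

Only the value of alt is offered for translation here. The shortcode name and the src path never reach the translator, so they cannot be altered by accident.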
So I'm wondering if someone would be interested in adding this feature to
the po4a Markdown extractor, or in guiding a Perl newbie like me to implement it.
This would allow me to stop maintaining custom tooling, and I think
it would make many devs using Hugo for their system quite happy.
If the structure is in place, it could even be helpful for Jekyll, which has
a similar feature with a different syntax.
Just for comparison, this is the output from my script:
https://invent.kde.org/-/snippets/1275
and this is the output from po4a
https://invent.kde.org/-/snippets/1274.
And this is the script:
https://invent.kde.org/websites/kde-org-announcements-releases/-/blob/mas...
but it is quite ugly, and it does a bit more than just extracting the
content from the Markdown files: it also extracts from the config files
and the Hugo string messages.
Regards,
Carl Schwan
https://carlschwan.eu
--
Better to have an approximate answer to the right question than a precise
answer to the wrong question. -- John Tukey as quoted by John Chambers