po4a for extracting Hugo markdown

Thursday, 8 October 2020

Hello,

One year ago at KDE we started using po4a to translate part of our main
website, kde.org/announcements/releases, which is built with the Hugo
static site generator. The goal was to start with a small portion of the
website and then extend the approach until the entire website was
internationalized. It worked, but there were a few problems with the
extraction (related to front matter and shortcode detection). Because of
that, I decided to write my own extractor in Python to work around these
shortcomings of po4a.

But recently I learned that po4a has supported extracting strings from
YAML front matter since the May "LockDown" release. There is now only
one remaining problem with extracting Hugo markdown files: the
shortcodes.

Hugo's shortcodes are a macro system with a syntax similar to HTML, e.g.
{{< figure src="..." alt="Description" >}}. My script extracts only the
string "Description", but po4a extracts the entire thing. This is quite
problematic in my experience, because translators tend to translate
"figure", or some automatic tooling replaces the quotation marks with
French guillemets. Either one breaks the entire website generation, and
each time someone has to fix the translations manually (a quite stressful
process shortly before a release).
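
As a rough illustration of the behaviour described above (this is not the
actual script, only a minimal sketch), the following Python extracts just
the translatable attribute values from single-line shortcodes. The
function name, the regular expressions, and the list of translatable
attributes ("alt", "title", "caption") are all assumptions made for the
example.

```python
import re

# Hypothetical set of attribute names whose values should be translated;
# everything else (shortcode name, src, class, ...) is left untouched.
TRANSLATABLE_ATTRS = {"alt", "title", "caption"}

# Matches opening shortcodes such as {{< name ... >}} or {{% name ... %}}
# that fit on a single line. Closing shortcodes ({{< /name >}}) do not match.
SHORTCODE_RE = re.compile(r"\{\{[<%]\s*(\w[\w-]*)(.*?)\s*[>%]\}\}")

# Matches key="value" pairs inside a shortcode.
ATTR_RE = re.compile(r'([\w-]+)\s*=\s*"([^"]*)"')


def extract_shortcode_strings(text):
    """Return (shortcode, attribute, value) for each translatable attribute."""
    found = []
    for match in SHORTCODE_RE.finditer(text):
        name, body = match.group(1), match.group(2)
        for key, value in ATTR_RE.findall(body):
            if key in TRANSLATABLE_ATTRS and value:
                found.append((name, key, value))
    return found


sample = '{{< figure src="a.png" alt="Description" >}}'
print(extract_shortcode_strings(sample))
```

Because only the attribute value reaches the translator, the shortcode
name and the quotation marks around the value never appear in the PO file
and so cannot be altered by accident.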

So I'm wondering if someone would be interested in adding this feature to
the po4a markdown extractor, or in guiding a Perl newbie like me through
implementing it. This would allow me to stop maintaining custom tooling,
and I think it would make many developers who use Hugo quite happy. Once
the structure is in place, it could even be helpful for Jekyll, which has
a similar feature with a different syntax.

Just for comparison, this is the output from my script:
https://invent.kde.org/-/snippets/1275
and this is the output from po4a https://invent.kde.org/-/snippets/1274.

And this is the script:
https://invent.kde.org/websites/kde-org-announcements-releases/-/blob/mas...
but it is quite ugly, and it does a bit more than just extract the
content from the markdown files: it also extracts strings from the config
files and the Hugo string messages.

Regards,
Carl Schwan
https://carlschwan.eu
