Is Monocode a different thing?—Attending the 2024 Unicode Technology Workshop

The Event

Last week (the week of October 20, 2024), I had the opportunity to attend the Unicode Technology Workshop at the Google campus in Sunnyvale, California. I originally became interested on the recommendation of one of my professors, who was teaching a class that centered heavily on the Unicode Bidirectional Algorithm (the algorithm that determines how RTL text is displayed online, in text editors, and in pretty much every electronic system you can think of). To be honest, I didn’t know what to expect. Part of me was expecting that a lot of it would go right over my head, but I was pleasantly surprised. I found a lot of it very accessible and immediately applicable to my education at MIIS and my career search, and I could not wait to share what I learned with my professor. In this writeup, I’ll provide my key takeaways, followed by a quick summary of how I intend to use what I learned going forward. I’ll close with a short list of the talks I attended, so you can get a feel for the kinds of topics covered if you are thinking about attending next year.

But before I get to all of that, I want to take a moment to mention one of the things I appreciated most about this conference. I’ve been to a few conferences, and they all tend to revolve around marketability, costs, services exchanged for money, and other very business-related aspects. These are all vital to operating in the world today, and it is not my intention to slight that kind of thinking. I just want to say that it was nice to be part of a conference that was focused on accessibility more than anything else; everything people talked about led back to making foreign languages better supported, better documented, or simply better recognized. It was a breath of fresh air and humanity.

3 Key Learnings

International Vital Encoding System, small team of volunteers

I was shocked to learn that Unicode, the system behind how most of the world’s languages are encoded and displayed electronically, is run by a relatively small group made up primarily of volunteers.
It was interesting to see the organizational structure and how projects are broken into “working groups,” which are populated by interested parties and volunteers.
Something that stuck out to me was when somebody mentioned “if you know something about a Digitally Disadvantaged Language (DDL) and you’re in this room… that probably makes you the Unicode expert on that language.”
- I found this inspiring – it made me feel that Unicode allows individuals to make a huge difference in the world

DDLs, where we are and where we need to go

I learned about the designation DDL (Digitally Disadvantaged Language), which is assigned to languages that lack significant technological support.
Unicode has built, and is continuing to develop, a special survey tool that allows users to submit information about a language and have it vetted and reviewed by others who also know that language. All of the survey data is entered into a comprehensive database known as the Common Locale Data Repository (CLDR).
- People and organizations get representative voting power and can vote to affirm or change existing translations
- There’s an expansive, yet finite list of translations necessary for a language to be considered “supported” by Unicode
  - Supporting a DDL comes down to getting reliable, vetted translations for most, if not all, of the items on that list
Supporting languages is not as simple as just polling users, especially when it comes to dialects and languages that aren’t officially supported by a government
- Who gets to decide what the “correct” translations are? If the speakers live in city A and city B, but the two cities use different translations, whose should be entered into the database?
- If a government decides on a translation, but communities disagree, does the power of the government supersede that of the people?
Communities using DDLs can approach Unicode and work with them to build a virtual keyboard for their language
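
To make the voting question above a little more concrete, here is a minimal sketch of weighted voting on competing translations. To be clear, this is purely my own illustration and not CLDR’s actual vote-resolution rules: the function name, the voter names, and the weights are all hypothetical. It only shows how giving one party more voting power can outweigh several community votes.

```python
from collections import defaultdict


def tally(votes, weights):
    """Sum weighted votes per candidate translation and return the leader.

    votes:   list of (voter, candidate) pairs
    weights: dict mapping voter -> voting weight (hypothetical values)
    """
    totals = defaultdict(int)
    for voter, candidate in votes:
        # Voters without an explicit weight count as 1 (an assumption).
        totals[candidate] += weights.get(voter, 1)
    # Highest total wins; ties are broken alphabetically so the
    # result is deterministic.
    winner = min(totals, key=lambda c: (-totals[c], c))
    return winner, dict(totals)


# Hypothetical scenario: one heavily weighted organization vs. two
# community members who prefer a different translation.
weights = {"org_vetter": 4, "community_a": 1, "community_b": 1}
votes = [
    ("org_vetter", "translation X"),
    ("community_a", "translation Y"),
    ("community_b", "translation Y"),
]

winner, totals = tally(votes, weights)
print(winner, totals)
```

In this made-up scenario, two of the three voters prefer “translation Y,” yet “translation X” wins on weight, which is exactly the kind of tension the questions above are getting at.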

Emojis, the new language, the new cause of technical headaches

Emojis are encoded the same way that all characters are
- Each emoji is assigned a code point, and each typeface then supplies its own glyph (the visual form of that emoji)
  - This causes an interesting problem: everybody knows what an “a” should look like, so typefaces can design a fancy “a” without too much worry, but what’s the prototypical peach emoji supposed to look like? Will it change drastically based on typeface?
Emoji use differs across cultures, sometimes greatly
- Emojis are associated with certain keywords that are recorded in the CLDR, but these keywords need to be transcreated into other languages
  - If an emoji is associated with happiness in one language but stress in another, the CLDR should be updated to indicate that
Emojis pose an interesting problem for RTL languages
- With emojis that imply or explicitly show a direction, the logic flips entirely when they are presented in an RTL language
- Unicode recently encoded a mechanism for flipping emojis that have an associated direction, which was more complicated than it sounds
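
For anyone who wants to poke at this themselves, here is a small sketch using Python’s built-in `unicodedata` module. It shows that an emoji is just a code point with a name, like any other character, and that a direction-specific emoji can be built as a sequence of several code points. One assumption on my part: I am taking the right-facing runner (the runner, then a zero width joiner, then a right arrow, then a variation selector) as an example of the direction mechanism described above. The helper name `describe` is mine, and whether the sequence displays as a single flipped emoji depends on your font and platform.

```python
import unicodedata


def describe(text):
    """Return (code point, Unicode name) pairs for each code point in text."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]


# A single emoji is a single code point; the font decides how it looks.
peach = "\U0001F351"

# Assumption: the right-facing runner is the base runner emoji joined
# to a right arrow with a zero width joiner, followed by a variation
# selector that requests emoji presentation.
runner = "\U0001F3C3"
runner_facing_right = runner + "\u200D" + "\u27A1" + "\uFE0F"

for label, text in [("peach", peach), ("runner facing right", runner_facing_right)]:
    print(label)
    for code_point, name in describe(text):
        print("  ", code_point, name)
```

The peach is one code point, while the right-facing runner is four, even though a supporting font draws it as one picture.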

Moving Forward

Looking forward, I can absolutely see how the insights I gained from the Unicode Technology Workshop will give me unique perspectives and a better understanding of the localization landscape, both in my education and beyond. With a solid understanding of Unicode’s structure and mission, I feel equipped to begin or join conversations about how best to support language accessibility in technology, particularly for Digitally Disadvantaged Languages (DDLs). I plan to integrate this knowledge into my studies at MIIS, starting with an in-depth conversation with my RTL languages professor, and I also look forward to implementing more of Unicode’s message formats in my programs and code. This experience has also inspired me to advocate, in my future career, for inclusivity in digital content and language support, ensuring that we, as members of the language community, keep people and the way they communicate at the forefront of our minds. The knowledge I’ve gained about the technical and cultural implications of how languages are represented in the CLDR, and of how they are encoded, will empower me to approach future projects with both technical perspective and cultural sensitivity.

Conference at a Glance

Day 1

Behind the Curtains: Unicode Technical Groups
Mark Davis
A quick overview of how Unicode leadership is structured

Solving Inflection
Nebojša Ćirić, George Rhoten
A discussion on the ongoing issue of supporting inflection in languages, especially those that have several different inflections

Volunteers for Keyboards for Indigenous Language Communities
Tex Texin
A look into the processes of how Unicode helps build virtual keyboards for communities that need them

Indic Script Policy & Planning in the Digital Age
Karthik Malli
An overview of the linguistic richness of India and the challenge it was and continues to be to support all spoken languages in India

How To Not Run Towards The Bear: Directionality & Emoji
Kamilé Demir, Ben Joeng (Yang)
A discussion of the issues of directionality with regard to emoji, along with some solutions that have been developed

Date, Time and Timezone for Netflix Live Events
Shawn Xu, Chester Fung
An example of how Netflix currently solves the issue of translating time zones in their products

What is a Valid Person Name?
Michael McKenna
A talk about how difficult it is to validate names in languages and some approaches to doing so

A User-Centric Approach to a Bidi Text Interface
Adil Allawi
An overarching look at how the Unicode Bidirectional Algorithm can and should be implemented

Day 2

MessageFormat 2 Technical Preview: Where Are We Now?
Addison Phillips
An in-depth look at how the new proposed MessageFormat 2 can be utilized in programs going forward and how it improves upon MessageFormat 1

The Emoji Experience
Jennifer Daniel
A deep dive into how emojis are used and encoded, plus a walkthrough of how new emojis come into being and become supported by Unicode

Common Locale Data Repository – Using the Survey Tool to Expand Language Coverage
Conrad Nied
A workshop where we got hands-on experience using the CLDR survey tool to propose new translations and vote on them