
The Complex Script Dilemma

I got interested in how computers render Unicode complex scripts when I found out that Persian is not supported in graphical suites. I wanted to find a way to fix this, or at least make a free program or plugin that renders Persian text properly. Twelve years ago I wrote my very first program to solve this problem, and I’ve been improving and maintaining it ever since.

This is a non-technical overview of how complex scripts like Persian work in computers and how to write Persian in programs that don’t support it.

What are Complex Scripts?

Scripts that require complex text layout to display properly are known as Complex Scripts. Think of this as each character having multiple shapes. In English we have lowercase and capital letters. In Persian a character can have up to four different shapes, and the shape or positioning of each character depends on its relation to the characters around it. This is called Contextual Shaping.

Let’s look at an example of the four shapes of a Persian character. The character below makes the sound B in Persian and is shown, from right to left, in its Initial, Middle, End and Isolated shapes.

ب‍ ‍ب‍ ‍ب ب
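Unicode happens to record these shapes as separate "presentation form" code points, so they can be looked up directly. Here is a minimal sketch using Python's standard `unicodedata` module to list the forms of the letter Beh (U+0628). The helper name `contextual_forms` is made up for this example, and it assumes the shapes live in the two Arabic Presentation Forms blocks.

```python
import unicodedata

# Arabic Presentation Forms-A and -B, where Unicode keeps the shaped variants.
PRESENTATION_RANGES = [(0xFB50, 0xFDFF), (0xFE70, 0xFEFF)]

def contextual_forms(letter):
    """Return {form name: shaped character} for a single base letter.

    Hypothetical helper for illustration only.
    """
    forms = {}
    for start, end in PRESENTATION_RANGES:
        for code in range(start, end + 1):
            # A decomposition looks like "<initial> 0628".
            parts = unicodedata.decomposition(chr(code)).split()
            if len(parts) == 2 and int(parts[1], 16) == ord(letter):
                forms[parts[0].strip("<>")] = chr(code)
    return forms

for form, glyph in contextual_forms("\u0628").items():
    print(f"{form:9} U+{ord(glyph):04X} {unicodedata.name(glyph)}")
```

Unicode calls the Middle and End shapes "medial" and "final", which is why those names show up in the output.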

The Bi-directional Side

Did you think that was all? We’re just getting started! Not only do scripts like Persian require complex text layout, they are also Bi-directional. Persian characters are written Right-to-Left, while numbers and Latin characters are written Left-to-Right. This introduces issues with the order in which words are displayed, especially for text that mixes both directions and contains special characters like the period, exclamation mark, hash, etc.
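To see why punctuation is the troublemaker, it helps to look at the direction class that Unicode assigns to each character. The sketch below uses the standard `unicodedata.bidirectional` function; the `bidi_classes` wrapper and the sample string are just for illustration.

```python
import unicodedata

def bidi_classes(text):
    """Pair every character with its Unicode bidirectional class."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]

# "AL" = Arabic letter (right-to-left), "L" = left-to-right letter,
# "EN" = European number, "WS" = whitespace, "ON" = other neutral.
sample = "متن Text 12!"
for ch, cls in bidi_classes(sample):
    print(f"U+{ord(ch):04X} {cls}")
```

Letters have a strong direction of their own, but characters like the space and the exclamation mark do not, so their position depends on what surrounds them.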

To deal with this problem, Unicode includes a set of characters called Unicode Control Characters.

What are Unicode Control Characters?

Many Unicode control characters are used to fix issues with the order in which words are displayed, but these characters themselves have no visual or spatial representation.
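A few of these characters can be inspected with Python's standard `unicodedata` module. This is only a small sample of the directional controls, and `strip_direction_controls` is a made-up helper that shows they take up a slot in the string without being visible.

```python
import unicodedata

# A small sample of Unicode's directional formatting characters.
CONTROLS = ["\u200E", "\u200F", "\u202A", "\u202B", "\u202C", "\u202D", "\u202E"]

for ch in CONTROLS:
    # Category "Cf" means "format": the character has no glyph of its own.
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} "
          f"category={unicodedata.category(ch)}")

def strip_direction_controls(text):
    """Remove the controls listed above, leaving the visible characters."""
    return "".join(ch for ch in text if ch not in CONTROLS)

# A trailing RIGHT-TO-LEFT MARK is a common way to keep "!" attached to
# the Persian word when the surrounding text is left-to-right.
marked = "متن!\u200F"
print(len(marked), len(strip_direction_controls(marked)))
```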

Example Time!

In the example below, bi-directional and contextual shaping rules are applied step by step.

→ Typing order
ﻩﻥﻭﻡﻥ ﻥﺕﻡ Sample Text

← Bi-directional rules applied
ﻡﺕﻥ ﻥﻡﻭﻥﻩ Sample Text

Contextual shaping applied
ﻣ ﺘ ﻦ ﻧ ﻤ ﻮ ﻧ ﻪ Sample Text

Final text to render
متن نمونه Sample Text
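The steps above can be sketched in a few lines of Python. This is a simplified illustration, not a full implementation: the joining rules ignore diacritics, ligatures and the joiner control characters, and the reordering only reverses runs of right-to-left letters instead of following the whole Unicode Bidirectional Algorithm. The names `build_form_table`, `shape` and `to_visual` are made up for this example, and shaping runs before reordering here because that keeps the neighbour checks simple.

```python
import unicodedata

def build_form_table():
    """Map base letter -> {form: presentation glyph}, read from Unicode data."""
    table = {}
    for start, end in [(0xFB50, 0xFDFF), (0xFE70, 0xFEFF)]:
        for code in range(start, end + 1):
            parts = unicodedata.decomposition(chr(code)).split()
            if len(parts) == 2:  # single-letter forms only, no ligatures
                base = chr(int(parts[1], 16))
                table.setdefault(base, {})[parts[0].strip("<>")] = chr(code)
    return table

FORMS = build_form_table()

def shape(text):
    """Pick a contextual form for each letter (simplified joining rules)."""
    out = []
    for i, ch in enumerate(text):
        forms = FORMS.get(ch)
        if not forms:
            out.append(ch)  # not a shapeable letter: keep as typed
            continue
        prev = FORMS.get(text[i - 1], {}) if i > 0 else {}
        nxt = FORMS.get(text[i + 1], {}) if i + 1 < len(text) else {}
        # A letter with an initial form can connect to the letter after it.
        joins_prev = "initial" in prev and "final" in forms
        joins_next = "initial" in forms and "final" in nxt
        if joins_prev and joins_next and "medial" in forms:
            key = "medial"
        elif joins_prev:
            key = "final"
        elif joins_next:
            key = "initial"
        else:
            key = "isolated"
        out.append(forms.get(key, ch))
    return "".join(out)

def is_rtl(ch):
    return unicodedata.bidirectional(ch) in ("R", "AL")

def to_visual(text):
    """Naive reordering: reverse each run of right-to-left letters,
    keeping a space inside the run when letters follow it."""
    out, run = [], []
    for i, ch in enumerate(text):
        inside = (ch == " " and bool(run)
                  and i + 1 < len(text) and is_rtl(text[i + 1]))
        if is_rtl(ch) or inside:
            run.append(ch)
        else:
            out.extend(reversed(run))
            run = []
            out.append(ch)
    out.extend(reversed(run))
    return "".join(out)

shaped = shape("متن نمونه Sample Text")
print([f"U+{ord(c):04X}" for c in to_visual(shaped)])
```

Printing the result of `to_visual` as text in a terminal or browser that already applies bi-directional rules will flip the Persian part again, so the example prints code points instead. The string only looks right in a renderer that draws characters strictly left to right.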

The Good News

Most operating systems and web browsers today support Unicode and use algorithms to handle contextual shaping and bi-directional problems.

The Bad News

Sadly, graphical suites generally do *not* support complex scripts. Adobe products handle this a little better than others, but in general Adobe, Corel, Autodesk, and other commercial programs do not support complex script rendering.

Psst. LinkedIn, you do not support bi-directional text! And it’s very easy to fix if you contact me! ;)

Now you might be wondering why operating systems and browsers can handle this and graphical suites cannot. This goes back to the foundation of the program and how it was written. The developer must make sure that every part of the software is Unicode compliant. Skipping this crucial step makes it nearly impossible to add Unicode support later.

There is Hope

Even with developers like Autodesk that don’t care enough to implement Unicode properly, there is still a way to support Persian in these programs. Users can write Persian or Arabic in graphical suites with LeoMoon ParsiNegar, a tool I’ve developed.

How Does ParsiNegar Work?

So far we’ve learned that to render complex scripts like Persian, the program must be Unicode compliant and must have algorithms to handle contextual shaping and bi-directional text.

Graphical suites, on the other hand, do not support Unicode, contextual shaping or bi-directional text. They only support the roughly 255 characters of a single-byte, extended-ASCII character set. To write Persian in these programs, you need to use LeoMoon ParsiNegar.

LeoMoon ParsiNegar comes loaded with 400 special fonts. These fonts have all possible forms of all Persian characters mapped onto those 255 character codes. All you have to do is type your text in the ParsiNegar editor and click Convert. ParsiNegar applies the bi-directional rules, then calculates the contextual shape of every character, and finally maps the exact form of each character to its position in the special font.
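To give a feel for that last mapping step, here is a toy sketch in Python. It is not ParsiNegar's actual code or font layout: the slot table `SLOT_OF` is invented for this example and covers only the shapes needed for the sample text, and `to_font_codes` is a made-up name.

```python
# Hypothetical font layout: each shaped glyph gets one single-byte slot,
# starting at 0x80 so that plain ASCII stays untouched. A real font would
# ship its own table; this one is invented for the example.
GLYPHS = "\uFEE3\uFE98\uFEE6\uFEE7\uFEE4\uFEEE\uFEEA"  # shapes used in "متن نمونه"
SLOT_OF = {glyph: 0x80 + i for i, glyph in enumerate(GLYPHS)}

def to_font_codes(visual_text):
    """Replace shaped glyphs with slot numbers and keep ASCII as-is."""
    codes = []
    for ch in visual_text:
        if ch in SLOT_OF:
            codes.append(SLOT_OF[ch])
        elif ord(ch) < 0x80:
            codes.append(ord(ch))
        else:
            raise ValueError(f"no slot for U+{ord(ch):04X} in this font")
    return bytes(codes)

# Input is assumed to be already shaped and reversed into left-to-right
# visual order, which is the order a non-Unicode program draws characters in.
visual = "\uFEEA\uFEE7\uFEEE\uFEE4\uFEE7 \uFEE6\uFE98\uFEE3"
print(list(to_font_codes(visual)))
```

The resulting bytes look like gibberish in a normal font, which is why the matching special font has to be selected in the target application.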

In the target application, after pasting the converted text, the user has to pick one of the special fonts to display the Persian text correctly.

If you are interested in a more technical overview of how ParsiNegar works, you can contact me using the contact form.

Phew... Never thought the language I’ve used my whole life is this complex!