Saturday, September 25, 2004

Unicode and Biblical Studies

I just explained a little bit about Unicode to a friend of mine, and I thought I would go ahead and blog it, along with my thoughts on what biblical studies (and related fields, really) should do for the benefit of all.

First of all, Unicode is da bomb. Essentially, the idea of Unicode is to define a universal system of codes whereby every character in every language is mapped to a specific code. The standards group that defines which code goes with which character is the Unicode Consortium. It is an international standards body that all the major software makers listen to. Microsoft, Apple, and the Linux vendors all do their work according to the standards produced by the consortium, and all ship Unicode fonts by default on their systems. The way the consortium does its work is to separate characters into character sets, or ranges. So the Greek character set gets one range of codes, Hebrew another, Sanskrit another, etc.
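To make the "one code per character, one range per script" idea concrete, here is a minimal Python sketch. The helper names `describe` and `block_of` are made up for this illustration. The two ranges are the Greek and Hebrew ranges from the published Unicode code charts, and everything outside them is lumped into "other" to keep things short.

```python
import unicodedata

# Ranges taken from the Unicode code charts (the end of each range is exclusive).
GREEK_RANGE = range(0x0370, 0x0400)   # U+0370..U+03FF
HEBREW_RANGE = range(0x0590, 0x0600)  # U+0590..U+05FF


def describe(ch):
    """Return the code in U+XXXX notation and the official character name."""
    return f"U+{ord(ch):04X}", unicodedata.name(ch)


def block_of(ch):
    """Say which of the ranges above a character falls in (hypothetical helper)."""
    code = ord(ch)
    if code in GREEK_RANGE:
        return "Greek"
    if code in HEBREW_RANGE:
        return "Hebrew"
    if code < 0x80:
        return "Basic Latin"
    return "other"


for ch in "aαא":
    code, name = describe(ch)
    print(code, name, "->", block_of(ch))
# U+0061 LATIN SMALL LETTER A -> Basic Latin
# U+03B1 GREEK SMALL LETTER ALPHA -> Greek
# U+05D0 HEBREW LETTER ALEF -> Hebrew
```

The point is that the Latin "a" and the Greek "α" are different codes altogether, so no font has to pretend one is the other.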

And what it does for you is great. Let's contrast Unicode Greek with non-Unicode Greek. Generally an HTML page will have a default font that most of the page is written in. Every time you want to depart from the default English characters you have to switch fonts to something else, like Mounce, the BibleWorks font, or whatever. And as you know, most Greek fonts used today basically work by substitution: the font draws the "α" character where the "a" character would be, or a final sigma in place of the quotation mark. And so you get Greek-looking characters showing up on the web, or in print, whatever your medium is. That is the non-Unicode way.

In the Unicode world, you never need to switch fonts to do this. These characters all live in one font. To type in the other character set you just switch keyboard layouts. In Windows XP (and 2000, I think), you can set up different keyboard layouts and alternate between them by pressing Alt+Shift. Right now I have two set up, English and Greek. So to switch to typing Greek I just hit Alt+Shift ανδ Ι αμ τυπινγ ιν Γρεεκ. Those last few words were Unicode Greek, BTW. So whenever you write papers or whatever, you just switch keyboards instead of switching fonts.

But for production it isn't really much better than the non-Unicode way (though I prefer it a little bit). It is the distribution that makes Unicode so incredibly great. Write a paper that uses the Mounce font and the reader has to have the Mounce font on his machine to view the Greek. If you decide you like the look of the BibleWorks Greek font better than Mounce's, you can't just change the font in the document, because the two fonts map English keystrokes to different Greek characters. You have to change them manually or get a program that will do it for you. Unicode theoretically solves all of this. All someone needs is a Unicode font that supports both the English and the Greek character sets (and they aren't hard to find).
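For the curious, the kind of conversion program mentioned above can be sketched in a few lines of Python. The keystroke table here is hypothetical. It is not the actual mapping of the Mounce or BibleWorks fonts; a real converter would need the full, verified table for each font, and it would have to be applied only to the stretches of a document that are set in that font.

```python
# Hypothetical keystroke table for an imaginary legacy Greek font.
# The real Mounce and BibleWorks tables are different and much larger.
LEGACY_TO_UNICODE = str.maketrans({
    "a": "α",
    "g": "γ",
    "l": "λ",
    "o": "ο",
    "s": "σ",
    '"': "ς",  # quotation mark standing in for final sigma
})


def legacy_to_unicode(text):
    """Replace each legacy keystroke with the Unicode Greek character it stood for."""
    return text.translate(LEGACY_TO_UNICODE)


print(legacy_to_unicode('logo"'))  # λογος
```

Once the text is stored as real Greek characters, changing its look is only a matter of picking a different Unicode font. No second conversion is needed.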

So, essentially, that is Unicode. Technology is moving that way already. In many areas of technology it is the only way of doing things now. The biblical studies world is lagging behind on this, unfortunately, but that is nothing new. I don't generally like using Libronix very much, but this is one thing they are doing right. They use Unicode. No other major Bible software vendor does.

Now, there are issues with Unicode. Greek, by and large, is covered well by font vendors and such. Last I heard, though, there was quite a bit of debate about how well Hebrew was doing: Hebrew itself is covered in Unicode, but I don't believe all the symbols used in modern critical editions of the Hebrew Bible are, such as the Masoretic symbols. Coptic also has issues. I haven't yet found a Unicode font that supports Coptic-looking characters. The Coptic character set is lumped in with the Greek character set, since the two are identical with the exception of a few characters, but even though the Greek alpha and the Coptic alpha are the same letter, they are generally formed very differently. So all Unicode Greek fonts look Greek and not Coptic, which works out well for Greek and horribly for Coptic.

But the tech world is ready to move on from the old way of doing fonts in general. Some work still needs to be done. If you're doing Greek, go Unicode and save yourself some trouble a few years from now, when people will start getting annoyed if you don't use it.

At some point later (probably several days) I'll post something about specific fonts.


At 6:57 PM, Blogger Tim said...

Despite a few minor difficulties of the sort you mention, Hebrew Unicode is great, mainly because it actually WORKS right to left. Most "font" solutions require the characters to be input left to right. Searching etc. is bizarre. Unicode works the right way round and displays the right way round too!

At 7:33 PM, Blogger Eric Sowell said...

Thanks, Tim! It is nice to hear from someone with more first-hand experience with Hebrew Unicode than I have (I've played with it a little and seen some discussions about it on some mailing lists). If you know of a good place on the web to find more information, send me a link. I'll post it so anyone interested can get the info they may need.

