Unicode Security Risks

Abstract: Computers process numbers, not letters, so text must be converted into numbers before it can be processed. There are many schemes for encoding characters as numbers. US-ASCII is one well-known scheme, but it defines only 128 characters: the unaccented Latin letters used in English, the digits, basic punctuation, and control codes. By contrast, Unicode is an international standard that assigns a unique number, called a code point, to each character in the world’s languages. This document provides a brief overview of Unicode and discusses the security risks that using it can introduce. It includes background on the growth of Unicode, definitions of commonly used Unicode terms, tips for creating filters to avoid visual spoofing attacks, and links to tools and further information.
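
As a minimal illustration of the two ideas above (characters as numbers, and visual spoofing), the following Python sketch prints the number assigned to each character in a string and compares two strings that can look identical on screen. The helper name code_points and the sample strings are hypothetical, chosen only for this example; they do not come from any tool discussed in this document.

```python
import unicodedata


def code_points(text: str) -> list[str]:
    """Return each character's Unicode number in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


# Hypothetical sample strings: the second one replaces each Latin "a"
# with CYRILLIC SMALL LETTER A (U+0430), which many fonts draw identically.
latin = "paypal"
spoofed = "p\u0430yp\u0430l"

print(code_points("A"))              # the number behind a single letter
print(code_points(latin))
print(code_points(spoofed))
print(latin == spoofed)              # the strings differ despite looking alike
print(unicodedata.name(latin[1]))    # official name of the Latin character
print(unicodedata.name(spoofed[1]))  # official name of the look-alike
```

Comparing code points or character names, rather than trusting how a string is rendered, is the basic observation that filters against visual spoofing build on.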

