UTF-8 is a way of encoding Unicode characters, designed in a single evening by two computer programmers, Ken Thompson and Rob Pike, in New Jersey. Each character is encoded as a sequence of one or more 8-bit bytes: plain ASCII characters require only one byte, while characters from other scripts take up to 4 bytes (the original design allowed sequences of up to 6 bytes, but the current standard, RFC 3629, limits them to 4). The principle of UTF-8 is to represent as wide a range of characters as possible within a single byte, while also using the first byte of each sequence to indicate how many more bytes, if any, are needed to complete the encoding.
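As a rough illustration of the variable-length idea, the following Python sketch counts the bytes that Python's built-in UTF-8 codec produces for a few sample characters. The helper name `utf8_length` and the choice of sample characters are assumptions made for this example only.

```python
def utf8_length(ch: str) -> int:
    """Return how many bytes the built-in codec uses to encode ch as UTF-8."""
    return len(ch.encode("utf-8"))


# Sample characters, written as escapes so the code points are unambiguous:
# U+0041 (A), U+00E9 (e with acute), U+20AC (euro sign), U+1F600 (an emoji).
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

for ch in samples:
    encoded = ch.encode("utf-8")
    # Print the code point, the byte count, and the bytes in hexadecimal.
    print(f"U+{ord(ch):04X}: {utf8_length(ch)} byte(s): {encoded.hex(' ')}")
```

The ASCII letter occupies one byte, and the other three samples occupy two, three, and four bytes respectively.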
Each byte sequence starts with a byte whose initial bits indicate how many bytes in total are needed to encode the character. If the initial bit is 0, the byte represents a 7-bit ASCII character, which fits in one byte. Thus any one of 2^7, or 128, ASCII characters may be represented this way. Nearly all text written in American English fits in this encoding.
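The role of the first byte can be sketched in Python as follows. The function name `sequence_length` is hypothetical, and the bit patterns for two-, three-, and four-byte sequences come from the UTF-8 definition in RFC 3629 rather than from the text above. The sketch only inspects the leading bits; it does not reject the few leading-byte values that the standard forbids for other reasons.

```python
def sequence_length(first_byte: int) -> int:
    """Return the total number of bytes in a sequence, judged from its first byte."""
    if (first_byte & 0b1000_0000) == 0b0000_0000:  # 0xxxxxxx: 7-bit ASCII
        return 1
    if (first_byte & 0b1110_0000) == 0b1100_0000:  # 110xxxxx: two-byte sequence
        return 2
    if (first_byte & 0b1111_0000) == 0b1110_0000:  # 1110xxxx: three-byte sequence
        return 3
    if (first_byte & 0b1111_1000) == 0b1111_0000:  # 11110xxx: four-byte sequence
        return 4
    # 10xxxxxx marks a continuation byte, which never starts a sequence;
    # 11111xxx is not used by the current standard.
    raise ValueError(f"0x{first_byte:02X} cannot start a UTF-8 sequence")


# Usage: inspect the first byte of an encoded ASCII character.
first = "A".encode("utf-8")[0]
print(f"{first:08b} -> {sequence_length(first)} byte(s)")
```

Because the length is known after reading a single byte, a decoder can tell how many continuation bytes to consume before it starts on the next character.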