Determining if something is non-text content?
“any content that is not a sequence of characters that can be programmatically determined or where the sequence is not expressing something in human language”
(Non-text content definition on the W3.org website)
Well, thanks for that super helpful definition. Let me try to distill it into something we can all understand.
“Any content” is pretty straightforward, depending on your definition of content. When we consider a web page, there are a lot of different elements that could be considered content. I like to think of it as all of the consumables of the web page (visible or not). The <title> element, for instance, is content. Images are content. Buttons and inputs are content. Videos are content. Dividing lines (horizontal rules) are content. <meta> descriptions are content. Even the structure of the page can be considered content, but that introduces more problems than it solves, so we’ll table that discussion for now. Now that we understand that “any content” is quite literally any content, we can begin to define what types of content are considered “non-text”. Before we do that, though, it’s helpful to understand what is considered “text content”, because it may include more than you realize.
A “sequence of characters” can be made up of any kind of character. This includes (but is not limited to) ASCII art, font glyphs, emoticons, leetspeak, and images representing text. “Images” does not refer only to the <img> element; it can also mean SVG, glyphs, canvas, and many other element or content types. So an SVG, glyph, or canvas that represents a word or a character would be considered content representing text. Now that we’ve established that a sequence of characters can take many forms, let’s move to the next step.
Take all the content on a page and run it through two filters.
- Is the content itself a sequence of characters that are programmatically determinable?
- If it is a sequence of characters, does the sequence of characters express something in human language?
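The two filters can be sketched as a small decision function. This is a minimal sketch, not a testing tool: `Content` and `is_non_text` are hypothetical names, and it assumes someone has already recorded the programmatically determinable text (if there is any) and made the human-language judgment, which in practice is a manual call.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Content:
    """Hypothetical model of one piece of page content (not a real API)."""
    description: str
    # Text assistive technology can read directly (e.g. a text node),
    # or None when the content is only pixels, paths, or other visuals.
    determinable_text: Optional[str]
    # A human judgment: does the content say something in a human language?
    expresses_human_language: bool


def is_non_text(item: Content) -> bool:
    """Apply the two filters in order."""
    # Filter 1: no programmatically determinable sequence of characters.
    if item.determinable_text is None:
        return True
    # Filter 2: the characters exist but do not express human language.
    return not item.expresses_human_language


examples = [
    Content("<p>blue</p>", "blue", True),
    # Looks like the word "blue", but filter 1 already rules it non-text.
    Content('<img src="text-representing-blue.jpg">', None, True),
    Content("ASCII fish <><", "<><", False),
]

for item in examples:
    label = "non-text" if is_non_text(item) else "text"
    print(f"{item.description}: {label}")
```

Order matters here: the second filter only runs once the first one has found characters to examine.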
Let’s take a moment and consider the first question. The key point here is to understand what “programmatically determinable” means. For example, an <img src="text-representing-blue.jpg"> element representing the word “blue” does not, on its own, have any programmatically determinable text. In contrast, a sequence of characters presented within a <p> element is programmatically determinable: in <p>blue</p>, the programmatically determinable text is “blue”. Assistive technology cannot be relied on to apply OCR (Optical Character Recognition) to page content. This means that, while an image may look like a sequence of characters, it is nothing more than pixels of different colors arranged to create a visual-only representation of text. While it may imitate text, on its own it cannot be consumed as text. The same is true for emoticons, SVG, canvas, and other content (like buttons, inputs, CSS backgrounds, etc.). The key is to determine whether the content is a sequence of characters that can be programmatically determined. In the two examples provided, the paragraph (<p>) element with the text would be considered text content, and the image of text (<img>) would be considered non-text content.
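To see the difference between the two examples in practice, here is a rough sketch that uses Python’s standard `html.parser` module to collect the text nodes of an HTML fragment. It is a deliberate simplification, assuming text nodes are the only source of text: it ignores attributes such as `alt` or `aria-label` and the full accessible-name computation a browser performs. It only illustrates why an image with no text alternative leaves nothing to read. `TextCollector` and `determinable_text` are illustrative names.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects the character data (text nodes) found in an HTML fragment."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: list[str] = []

    def handle_data(self, data: str) -> None:
        # Called only for text between tags; attribute values such as
        # src never reach this method.
        if data.strip():
            self.chunks.append(data.strip())


def determinable_text(fragment: str) -> str:
    """Return the text nodes of the fragment joined by spaces."""
    parser = TextCollector()
    parser.feed(fragment)
    parser.close()
    return " ".join(parser.chunks)


print(repr(determinable_text("<p>blue</p>")))
print(repr(determinable_text('<img src="text-representing-blue.jpg">')))
```

The paragraph yields `'blue'`, while the image yields an empty string: the pixels may spell a word, but there are no characters to hand to assistive technology.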
Second, if the content passes the first question, then it is programmatically determinable text! Hooray! Not so fast. Even though it is a programmatically determinable sequence of characters, it may still be non-text content. For example, placing less-than and greater-than characters in sequence can produce something that looks like a fish (e.g. <><). This is an extremely simple form of ASCII art, but it serves the purpose of illustration. This sequence of characters represents a fish, but what a screen reader announces is “less than, greater than, less than”. I’m probably stating the obvious here, but that’s not the same thing as “fish”. Emoticons are similar, since this kind of ASCII art is often rendered as an image. Like the previous question, this one hinges on a specific concept: the phrase “expressing something in human language”. ASCII art, glyphs, and emoticons on their own do not express anything in human language. Emoticons sometimes get converted into something meaningful, but on their own they are in essence ASCII art. So if a sequence of characters presented visually does not express something in a human language, it is non-text content.
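The fish example can be approximated with Python’s `unicodedata` module, which looks up each character’s official Unicode name. This is only a stand-in for a screen reader (real announcements depend on the screen reader and its punctuation settings), and `spell_out` is a hypothetical helper, but it shows that the characters themselves carry no word for “fish”.

```python
import unicodedata


def spell_out(sequence: str) -> str:
    """Name each character in turn, roughly the way a screen reader falls
    back to naming symbols when there is no word to read. Approximation only."""
    names = []
    for ch in sequence:
        # unicodedata.name raises ValueError for unnamed characters (such as
        # control characters) unless a default is given, so pass the code point.
        names.append(unicodedata.name(ch, f"U+{ord(ch):04X}").lower())
    return ", ".join(names)


print(spell_out("<><"))
# less-than sign, greater-than sign, less-than sign
```

Every character is perfectly determinable, yet the result describes punctuation, not a fish, which is exactly why the second filter exists.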