Document Accessibility
The Simple Cases
Volker Sorge
Overview
- Accessibility of Simple Documents
- Revisit Example
- Triage issues
- How to fix them
- Guidelines for Simple Documents
- Experiments with transformation tools
Summary of Findings
Example site
- Missing first level heading
- Images missing alternative text
- Missing or uninformative page title
- Control elements without Label
- The <html> element does not have a lang attribute
- Element has insufficient color contrast
- Buttons have no meaningful content
- Content is ARIA hidden
- Links are hidden (3 behind images, one in a button)
- No page regions
Note that not all tools will find all problems!
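A few of the findings above are simple enough to detect mechanically. The following is a minimal sketch, not a replacement for a real evaluation tool: it uses only Python's standard html.parser module and checks four of the listed items (lang attribute, first level heading, page title, image alternative text). The names SimpleAccessibilityAudit and audit are hypothetical and chosen for illustration only.

```python
from html.parser import HTMLParser


class SimpleAccessibilityAudit(HTMLParser):
    """Hypothetical helper: records a few of the findings from the list above."""

    def __init__(self):
        super().__init__()
        self.has_lang = False
        self.has_h1 = False
        self.images_without_alt = 0
        self.title_text = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "html" and attributes.get("lang"):
            self.has_lang = True
        elif tag == "h1":
            self.has_h1 = True
        elif tag == "img" and "alt" not in attributes:
            # An empty alt="" is allowed for decorative images,
            # so only a completely missing attribute is counted.
            self.images_without_alt += 1
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title_text += data


def audit(html_text):
    """Return a list of human-readable findings for the given HTML string."""
    parser = SimpleAccessibilityAudit()
    parser.feed(html_text)
    parser.close()

    findings = []
    if not parser.has_lang:
        findings.append("html element has no lang attribute")
    if not parser.has_h1:
        findings.append("missing first level heading")
    if not parser.title_text.strip():
        findings.append("missing page title")
    if parser.images_without_alt:
        findings.append(f"{parser.images_without_alt} image(s) missing alternative text")
    return findings
```

Calling audit() on a page's source returns an empty list when none of these four problems is present. Issues such as color contrast or uninformative titles need rendering or human judgment and are outside the scope of this sketch.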
Triaging the Problems
- Usability vs Accessibility
- Helpful for everyone vs important for some
- Content vs Structure
- Static information vs interactive features
- Syntactic vs Semantic
- The content of elements vs their meaning/usage
Triaging: Usability vs Accessibility
Usability
- Hidden links:
- Mouse pointer does not change or vanishes
- Link in a button
- Buttons: What do they do?
- Uninformative text
Accessibility
- Pretty much everything else
Triaging: Content vs Structure
- Content
- Images missing alternative text
- Control elements without Label
- Buttons have no meaningful content
- Element has insufficient color contrast
- Links are hidden (3 behind images, one in a button)
- Structural elements
- Missing first level heading
- Missing or uninformative page title
- The <html> element does not have a lang attribute
- No page regions
- Both: Content is ARIA hidden
Triaging: Syntactic vs Semantic
Let's have a closer look at Syntax vs Semantics
- Syntactic elements
- Images missing alternative text
- Hidden links
- Element has insufficient color contrast
- Semantic elements
- Buttons have no meaningful content
- Missing
- Ordering of headings
- ARIA hidden
- The <html> element does not have a lang attribute
- No page regions
Documents
Even for simple documents
- Usability increasingly harder:
- Print
- Electronic document
- Web document
- Accessibility increasingly easier:
- Print
- Electronic document
- Web document
- Really?
Analogy to Typesetting
- Content: the text you write
- Styling:
- LaTeX: added by the class file
- Word: WYSIWYG editor
- Interaction: e.g., slides, powerpoints
Separation of Concerns: Different Axes
Natural (and formal) languages:
- Syntax
- Semantics
- Pragmatics
Different Axes for Typesetting
- Syntax: the (visible) text
- Semantics: the styling or annotations
- Using \section, \subsection, or Heading1, Heading2, ... instead of font sizes
- Using lists and items instead of working with bullet points and indentation
- Pragmatics: This is a stretch!
- Semantics is exploited by different styles or classes
- Generation of different output formats
The output is nearly always a fixed-size PDF for print, so the semantics is mostly discarded.
The Axes on the Web
We are interested in getting documents into flexible markup
- Content is still the text
- Markup is used to add semantics
- Adding context via styles or programmatically
But there is much more to it, and the semantics stays useful after the document is published.
Semantics: How to get it
Getting semantics into documents
- Simple structural components
- Improved with simple attributes and ARIA
- Advanced with ARIA and JavaScript
Authoring vs Conversion
Authoring for the web
- Today web documents are rarely authored from scratch
- Sites are built by static site generators
- Or pages are authored in wiki-like management systems
- Content is generally written in lightweight markup languages like Markdown
What we are really interested in is converting content
- Documents authored in Word, LaTeX etc. already have desired structure
- Convert the documents from that format
- Preserve the good parts
Let's have a look at some conversion techniques
Simple workflow
- Take a source document
- Use a converter to generate HTML
- Check the generated HTML with an evaluation tool
- Test with keyboard and screen reader
Conversion towards web formats
- pandoc - bi-directional "swiss army knife"
- markdown, TeX, docx, and many many more
- LaTeX
- PDF
Preserving Structure 1
- converter philosophies differ, ranging from a focus on layout to extraction of abstract semantics
- no converter has 'perfect' support (in any sense)
- some lack even basic features (tables, figures, math mode)
- non-standard packages usually need re-implementation (good and bad)
- some can digest enough TeX to resolve simple macros
Preserving Structure 2
- follow standard document setups
- use standard sectioning
- use standard environments with established defaults (e.g., lists, figures, tables)
- keep compatibility in mind for both the converter and its dependencies (e.g., MathJax)
For Our Examples
We will work with two systems:
- Pandoc
- tex4ht
Please install either (or both) during the break.
Pandoc
- virtually every format
- TeX input fairly limited but useful when it works
- watch out for unsupported content disappearing
- TeX output very well supported (e.g., markdown+TeX)
- highly extensible ("filters")
- large community
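Because unsupported content can disappear silently, a rough before/after count can help spot losses. The following is a minimal sketch under stated assumptions: it counts \section, \subsection and \subsubsection commands in a LaTeX source and compares them with the number of h1 to h6 elements in the generated HTML. It assumes that the document title is rendered as a heading carrying the class "title" (which is skipped) and that no sectioning commands sit in commented-out lines. The function names are hypothetical.

```python
import re
from html.parser import HTMLParser

# Matches \section{, \subsection*{, \subsubsection[short]{ and similar.
# Assumption: commented-out sectioning commands are not filtered out.
SECTION_COMMAND = re.compile(r"\\(?:sub){0,2}section\*?\s*[\[{]")

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingCounter(HTMLParser):
    """Counts heading elements, skipping those marked with class 'title'."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag not in HEADING_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if "title" not in classes:
            self.count += 1


def count_tex_sections(tex_source):
    """Number of sectioning commands found in the LaTeX source text."""
    return len(SECTION_COMMAND.findall(tex_source))


def count_html_headings(html_text):
    """Number of non-title headings found in the HTML text."""
    counter = HeadingCounter()
    counter.feed(html_text)
    counter.close()
    return counter.count


def headings_lost(tex_source, html_text):
    """Positive result: the HTML has fewer headings than the source has sections."""
    return count_tex_sections(tex_source) - count_html_headings(html_text)
```

A result of 0 only says that the counts match, not that the headings are correct. The same idea could be applied to tables or figures, but that is left out here to keep the sketch short.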
tex4ht
- works via DVI thus very generic
- good MathJax support
- dual output well supported (config, custom macro)
- make4ht simplifies a lot
- maintainer active on tex.stackexchange
pandoc basic commands
Download the TeX example or the Word example, then simply run:
pandoc input.docx -t html -o output.html --standalone
- Inspect the options:
- -o is followed by the output filename.
- --standalone ensures that pandoc produces a full HTML file, not just an HTML fragment.
- -t is the target format, in our case HTML.
Note that -t can often be omitted when the output format is clear from the extension of the output filename.
pandoc input.tex -o output.html --standalone
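When several documents need converting, the same command can be assembled in a small script. The following is a minimal sketch that only uses the options shown above (-t, -o, --standalone); the helper names build_pandoc_command and convert are hypothetical, and, unlike the commands above, the output file is named after the input file rather than output.html.

```python
import shutil
import subprocess
from pathlib import Path


def build_pandoc_command(source, target_format="html"):
    """Build the argument list for the pandoc call shown above.

    The output file sits next to the source and takes the target format
    as its extension (an illustrative choice, not a pandoc requirement).
    """
    source_path = Path(source)
    output_path = source_path.with_suffix("." + target_format)
    return [
        "pandoc",
        str(source_path),
        "-t", target_format,
        "-o", str(output_path),
        "--standalone",
    ]


def convert(source, target_format="html"):
    """Run pandoc on one file and return the name of the generated file."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc was not found on PATH")
    command = build_pandoc_command(source, target_format)
    # check=True raises CalledProcessError if pandoc exits with an error.
    subprocess.run(command, check=True)
    return command[command.index("-o") + 1]
```

For example, convert("input.docx") would run the first command above and write input.html, provided pandoc is installed.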
tex4ht/make4ht basic commands
tex4ht and make4ht are part of standard TeX distributions.
They work for LaTeX documents only.
htlatex input.tex "xhtml,html5,charset=utf-8" " -cmozhtf -utf8"
Equivalent and simpler:
make4ht input.tex
Inspecting the Output
Output is best inspected when served in your browser.
Serve it via the http protocol, not file. Here are some ways to run a simple server in your directory:
- Python 2
python -m SimpleHTTPServer 8000
- Python 3
python3 -m http.server 8000 --bind 127.0.0.1
- Node.js (usually runs on port 8080)
npx http-server
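The Python 3 one-liner can also be written as a short script, which is handy when the served directory or port should be set in one place. The following is a minimal sketch using only the standard http.server module (Python 3.7 or later); the function name make_local_server is hypothetical. Like the command above, it binds to 127.0.0.1 so the files are only reachable from your own machine.

```python
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def make_local_server(directory=".", port=8000):
    """Create (but do not start) a server for `directory` on 127.0.0.1."""
    # The `directory` argument tells the handler which folder to serve.
    handler = partial(SimpleHTTPRequestHandler, directory=directory)
    return ThreadingHTTPServer(("127.0.0.1", port), handler)


# Typical use (stop with Ctrl+C):
#
#     server = make_local_server()
#     server.serve_forever()
```

With the default arguments the converted file is then available at http://127.0.0.1:8000/output.html, assuming it is in the current directory.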