Portable Document Format (PDF) accessibility is not a new topic; it is well understood and well explained by certain experts.
However, the implications are almost universally unknown to organisations.
Perhaps if I outline the four broad levels of technical PDF accessibility, and what most organisations do, someone will take note?
Levels of accessibility
Broadly speaking, any PDF will fall into one of the following categories depending on how it was created:
- Created from unsupported graphical tool. (Not accessible)
- Created from unsupported text based tool. (Probably not accessible)
- Tags automatically added. (Might be accessible)
- Tags added and carefully edited. (Accessible)
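As a rough illustration, the four categories above can be written as a small decision function. This is only a sketch: the names `PdfLevel` and `classify`, and the three yes/no questions it asks, are my own shorthand rather than anything defined by Acrobat or the PDF format.

```python
from enum import Enum


class PdfLevel(Enum):
    """The four broad levels, each paired with the verdict from the list above."""
    GRAPHICAL_TOOL = "Not accessible"
    TEXT_TOOL_UNTAGGED = "Probably not accessible"
    AUTO_TAGGED = "Might be accessible"
    TAGGED_AND_EDITED = "Accessible"


def classify(has_usable_text: bool, has_tags: bool, manually_checked: bool) -> PdfLevel:
    """Map three facts about how a PDF was made to one of the four levels.

    The parameters are hypothetical simplifications:
    - has_usable_text: the tool embedded real text rather than only graphics
    - has_tags: tags were added when the PDF was created
    - manually_checked: someone edited and checked the tags afterwards
    """
    if not has_usable_text:
        return PdfLevel.GRAPHICAL_TOOL
    if not has_tags:
        return PdfLevel.TEXT_TOOL_UNTAGGED
    if not manually_checked:
        return PdfLevel.AUTO_TAGGED
    return PdfLevel.TAGGED_AND_EDITED


if __name__ == "__main__":
    # A Word document exported with tags but never checked by hand.
    level = classify(has_usable_text=True, has_tags=True, manually_checked=False)
    print(level.name, "->", level.value)
```

Each question only matters if the previous one was answered "yes", which is why the levels form a ladder rather than a grid.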
Created from unsupported graphical tool
A typical scenario is a poster or leaflet made with a desktop publishing tool such as Quark. Such a tool does not embed text in any useful way when a PDF is created, so the text will be invisible to screen readers, and the other accessibility features available when reading will not work.
These files cannot be considered accessible in any way.
Created from unsupported text based tool
When you use Acrobat Pro (version 5 or greater, preferably 7 or greater) with a supported application (e.g. MS Word), you can add ‘tags’ to PDF documents, providing a structure to the document that is used by screen readers and other access (and mobile) technologies.
If you aren’t using a supported application with Acrobat Pro, then you will be creating a PDF without the HTML-like tags. This is where there is some grey area.
If the document is simple, then it is likely that many people using access technologies will still be able to access it if they have the latest software. However, there is no guarantee, and it is possible many people won’t be able to access it.
These also cannot be considered accessible, and the vast majority of PDFs online fall into this category.
Tags automatically added
Using a supported application such as Word, with Acrobat Pro, and adding tags, you have the basis for an accessible PDF.
However, whether people with access technologies will be able to use it is another matter, and varies greatly depending on the source document. The kinds of things that can, and usually do, go wrong are:
- Images included in a document don’t have alternative texts.
- Word styles are not used, meaning Acrobat has little chance of working out what the structure should be (although it does try).
- Word’s formatting has become confused, separating content in strange ways that are not visible until you check the reading order.
- The document includes tables, which need marking up through the Acrobat interface.
If you run an accessibility check (part of Acrobat Pro) on any document that hasn’t been manually altered, it is likely to report problems, and even then an automatic check cannot identify some of the problems listed above.
A document that has tags has a chance of being accessible. For simple documents (no columns, tables or images) it is probably OK. For anything more than basic, it is likely to take manual intervention to be properly accessible.
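If you just want a quick idea of whether a file has any tags at all, a crude check is possible without Acrobat. A tagged PDF carries a `/StructTreeRoot` entry and a `/MarkInfo` dictionary with `/Marked true` in its document catalog, so the sketch below simply searches the raw bytes for those markers. The function name `looks_tagged` is hypothetical, and the approach assumes the catalog is stored uncompressed; newer PDFs can keep it inside a compressed object stream, in which case this will wrongly report "no tags".

```python
import re


def looks_tagged(pdf_bytes: bytes) -> bool:
    """Crude heuristic: do the raw bytes contain the markers of a tagged PDF?

    Assumption: the document catalog is not hidden inside a compressed
    object stream. A False result is therefore not conclusive.
    """
    has_struct_tree = b"/StructTreeRoot" in pdf_bytes
    marked_true = re.search(rb"/Marked\s+true", pdf_bytes) is not None
    return has_struct_tree and marked_true


if __name__ == "__main__":
    import sys

    # Usage: python looks_tagged.py some-file.pdf
    with open(sys.argv[1], "rb") as handle:
        print("tags found" if looks_tagged(handle.read()) else "no tags found")
```

Even a positive result only puts the file in the "might be accessible" category: it says tags exist, not that the structure, reading order or alternative texts are right.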
Tags added and carefully edited
If you’ve used the right software, and edit the PDF in Acrobat Pro afterwards, you can:
- Check that the structure is right, and correct it.
- Check the reading order is right, and correct it.
- Specify the language (e.g. English).
- Add alt texts.
- Mark up tables properly.
- Remove non-text artifacts.
And generally clean it up. At this point, you can be satisfied that the document is accessible.
In my experience, for a 40 page Word document that has been perfectly marked up with Word styles and includes 1 large table and 10 images, you should allow about 3 hours for this process. If the source isn’t Word, or it isn’t using styles for headings, double that time.
And if you or someone else edits the Word document? You have to re-do it; PDF is an end format.
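To get a feel for how this adds up, here is a back-of-the-envelope sketch of the rule of thumb above. The 3 hour baseline and the doubling rule are my figures from this section; the `Document` class, the `revisions` field and the assumption that every re-do costs as much as the first pass are hypothetical simplifications.

```python
from dataclasses import dataclass

# Rule-of-thumb figure for a well-styled 40 page Word document
# with 1 large table and 10 images.
BASELINE_HOURS = 3.0


@dataclass
class Document:
    title: str
    from_word: bool
    uses_heading_styles: bool
    revisions: int = 0  # hypothetical: times the source is edited after the first export


def remediation_hours(doc: Document) -> float:
    """Estimate the hours needed to make (and keep) one PDF accessible."""
    per_pass = BASELINE_HOURS
    if not (doc.from_word and doc.uses_heading_styles):
        per_pass *= 2  # source isn't Word, or headings aren't styled
    # Assumption: each edit of the source means repeating the whole process.
    return per_pass * (1 + doc.revisions)


if __name__ == "__main__":
    docs = [
        Document("Annual report", from_word=True, uses_heading_styles=True),
        Document("Strategy paper", from_word=True, uses_heading_styles=False, revisions=1),
    ]
    for doc in docs:
        print(f"{doc.title}: {remediation_hours(doc):.0f} hours")
```

The numbers are only as good as the rule of thumb, but they make the cost of a late edit to the source document visible.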
Implications for organisations
Many organisations put a lot of PDFs online, and certain content may only be available through that document. This is an accessibility issue, quite a big (legal/moral/technical) one.
What to do?
There are several factors to balance; the main ones are:
- The difficulty that people, especially those with access issues, can have with PDF documents.
- How appropriate it is to put up long documents as web pages.
The bottom line in terms of accessibility is that some people cannot access PDF documents easily. However, that does not mean adding a very long HTML page (or set of pages) is most appropriate.
Where a document is very long and not suitable for online reading, a PDF would seem to be the most obvious format to use.
Working on the assumption that a typical scenario is for a PDF to be added to a news story, council meeting, AGM notice or similar, the steps to ensure accessibility & interoperability are:
- Include a summary on the HTML page (which itself is good for online reading).
- Make the PDF itself accessible (i.e. use one of the supported source applications such as Word and create a ‘tagged’ PDF document).
- Include the ability for people to request another format such as Word or Rich Text Format (RTF).
If the above advice is followed, an organisation is making a reasonable effort to ensure that everyone can access their content. PDFs are not inherently inaccessible. However, they have only become accessible in recent versions, and are not yet easy for everyone to access.
In time (as people catch up with more up to date software), the third step (offering another format on request) may not be necessary.
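For anyone building this into a content management system, the three steps listed earlier might be sketched as a small template function. This is an illustration only: `document_block`, its parameters and the example URLs are made up, and the wording of the request link is just one option.

```python
from html import escape


def document_block(summary: str, pdf_url: str, pdf_title: str, request_url: str) -> str:
    """Return an HTML fragment with a summary, the PDF link and a request link.

    All names and the markup are hypothetical; adapt them to your own templates.
    """
    return (
        # Step 1: a summary on the HTML page itself.
        f"<p>{escape(summary)}</p>\n"
        # Step 2: a link to the (tagged) PDF, labelled with its format.
        f'<p><a href="{escape(pdf_url)}">{escape(pdf_title)} (PDF)</a></p>\n'
        # Step 3: a way to ask for another format.
        f'<p><a href="{escape(request_url)}">'
        "Request this document in another format, such as Word or RTF</a></p>"
    )


if __name__ == "__main__":
    print(document_block(
        summary="Agenda and papers for the annual general meeting.",
        pdf_url="/docs/agm-notice.pdf",
        pdf_title="AGM notice",
        request_url="/contact/alternative-formats",
    ))
```

Generating the block in one place means an author cannot publish the PDF link without the summary and the request link alongside it.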
Taking a step back, this should obviously affect policy. It would be unrealistic to expect organisations to convert thousands of historical documents to accessible PDFs. However, as in the above recommendations, giving people the ability to request a different format would make the content available in practice without creating undue work.
It is still difficult for organisations to change processes to start using accessible PDFs from this point on. It requires significant changes to how people author content, such as enforcing the use of Word styles and adding alternative texts for images. Only then can the knowledge of how to create accessible PDFs be put to use.