PDF vs HTML for organisations

The Australian Government recently released a study into the Accessibility of the Portable Document Format (PDF) for people with a disability, which Duff Johnson analysed very effectively.

I can agree with almost all of Duff’s points, and it’s covered so well I didn’t feel I needed to check the source material (although I will). But as is the nature of blogging, there is something I’d like to disagree with: the comparison with HTML.

Duff knows a great deal more than I do about PDF standards and technologies. However, I’m pretty strong on the web-standards side of things, so it’s a useful discussion around the intersection of these areas.

I’d not heard it put in this way, but it is an excellent point:

PDF creation is democratic, HTML is centralized

…most people don’t write HTML, so most don’t author documents with any attention to semantics. Since PDFs may be created and posted without the benefit of a content management professional, it’s harder to impose authoring standards outside of specific organizations.

I agree with the point made, but there is a good reason why HTML websites are more likely to be accessible than PDFs in this context: the interface.

Interface lockdown

When you set up the editor in a Content Management System, you can lock it down to only allow semantic elements. You can even make the inclusion of style-oriented elements look wrong by adding certain CSS.

The key thing is that the available options are accessible, and the inaccessible ones have been removed (e.g. font/background colours). There are other things to do, but that problem is solved once and then stays solved for that website. If the CMS is sufficiently usable, you can even extend the number of contributors without worrying about the accessibility.
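As a rough illustration of what "only allow semantic elements" can mean in practice, here is a minimal sketch in Python using only the standard library. The allow-list, the `SemanticFilter` class and the `lock_down` function are hypothetical names made up for this example, not part of any particular CMS. It is also not a security sanitizer (it does not vet link targets, for instance); it only shows the idea of dropping style-oriented markup while keeping the text.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allow-list: semantic elements only, so no <font> and no style attributes.
ALLOWED_TAGS = {"p", "h2", "h3", "ul", "ol", "li", "strong", "em",
                "a", "img", "table", "tr", "th", "td"}
ALLOWED_ATTRS = {"a": {"href"}, "img": {"src", "alt"}}
VOID_TAGS = {"img"}  # elements that never have a closing tag


class SemanticFilter(HTMLParser):
    """Re-emits only allow-listed tags and attributes; text is always kept."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_TAGS:
            return  # drop the tag itself, but keep whatever text it wraps
        allowed = ALLOWED_ATTRS.get(tag, set())
        kept = "".join(
            f' {name}="{escape(value or "")}"'
            for name, value in attrs
            if name in allowed
        )
        self.out.append(f"<{tag}{kept}>")

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS and tag not in VOID_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def lock_down(html):
    """Return the markup with everything outside the allow-list stripped."""
    parser = SemanticFilter()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

For example, `lock_down('<p style="color:red">Hi <font color="red">there</font></p>')` keeps the paragraph and its text but loses the inline colour and the `font` element. In a real CMS this filtering would normally be done by the editor's own configuration rather than by hand-written code.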

Theoretically, you can also lock down Word templates so that you can only use Word’s styles (I’m not sure about InDesign?). However, it’s a pain to implement, and I’ve not come across an organisation prepared to do so.

How organisations typically publish documents

In a typical medium/large organisation that publishes web content, it is generally non-technical people updating web pages and uploading documents. When I’ve run courses teaching people how to make accessible PDFs, it is generally people on a web team that attend (government, private and charity sector organisations). They are not coders, but content authors and managers.

The web team are usually happy with the web pages, but they are sent inaccessible PDFs to publish, often without access to the source documents. The central team simply doesn’t have the resource to repeat the same accessibility fixes on every document they are sent.

I’m not blaming PDF for this situation, and I’m even reluctant to blame Microsoft (where the interface matters most); it’s the people buying software who aren’t aware of the issue.

Pressure from organisational procurement (on Microsoft and Adobe) to provide a good option to use a locked-down interface (and enforce its use) would prevent most of the accessibility problems we see. Oh, not to mention some more competition in the PDF-tagging software space.

Therefore, I agree with Duff that the report is likely to lead to an ineffective policy, even though I have a different perspective on why.

7 contributions to “PDF vs HTML for organisations”

  1. So… how do we disagree? I’m not at all sure that we do. 🙂

    Web content (HTML) management is centralized whereas PDF is not, we agree on that. I also agree that this centralization makes it a lot harder for folks who use server-based web-content systems (CMS, wikis, etc) to create inaccessible content. That’s all great – so far as it goes.

    I concur that authoring software (for that’s where it needs to happen) should be capable – at least – of walking the author through the accessibility issues pertaining to their document, and advising them on how to go about ensuring that it’s accessible.

    Happily, the same (mostly) education that applies to basic HTML accessibility also applies to Word and to PDF.

    Perhaps my one quibble is that I feel that at the end of the day, you _have_ to educate – CMS template management is only a partial solution. Why use a table instead of the TAB key? Why use a list instead of paragraphs, or ensure that headings are logically organized? When (and how) should you implement footnotes? Software can’t do it all – users have to be clued in.

    I 100% agree with you re pressure on those companies (and others) to come up with better tools. While authoring tools are the critical issue, we need better tag manipulation tools as well.

    Thanks for your comments.

    Duff.

  2. There is some education needed (e.g. appropriate alt text), but a lot of things are preventable. If you are used to something like Dreamweaver then that is fairly equivalent to Word. However, a locked-down editor in a CMS is quite different.

    For example, pressing tab doesn’t create columns in many web editors, and the bullet button results in a nicer looking list than putting a dash at the beginning of a line. (Often making something look right is enough to encourage correct markup.)

    Two differences for a web page editor set up well in a CMS are:

    1. You do not have to change people’s learned behaviour, because they consider it new/different. With tools that they’ve been using for years it’s an uphill battle.
    2. The authoring tool defaults to the accessible way of doing things, so less education is needed.

    A little example to highlight this: when you add an image in our (Defacto) CMS, you upload the image and immediately have to add alt-text (with help available in context); it’s a required field.

    When you add an image in Word, you have to know to select ‘format picture’ and go to the Web/Alt Text tab.

    That difference in workflow means that people have to think about alt-text for web pages, but not for Word.

  3. Alastair,

    You’ve described the situation at the college where I work to a tee. Our web team consistently receives inaccessible PDFs with no source file every week. While the college requires staff training on security and managing student info, training on the applications used throughout the college is not required. It’s frustrating since our core mission is education for our students; I wish that same focus applied to employees.

  4. Alastair, I’m fully with Duff on this point. Like him, I think you and I actually agree, too. In other words, it takes education.

    Although one can lock things down in a CMS in theory, try doing it in practice. There will be the group that uses a table instead of CSS to create a layout within the page content. And you have to allow headings in content. What if some of your content contributors don’t like the way that an <h2> looks in your CSS, so they skip to an <h3>? And what about the people who make an entire paragraph an <h3> because they want to give it extra emphasis?

    Those might be hypothetical problems in your environment, but they aren’t in mine.

    Still, the challenge is even greater with the PDF:

    • Even if the source document has the right markup, the PDF must be created with the right tools and using the right settings. And even then, it must be reviewed and, sometimes, retouched.
    • To put the correct markup on the source document, authors must know which of the commands and buttons offered by their word-processing software they should use, and which commands and buttons they should ignore.
    • I have not yet found a commercially available word processor that has an interface (even an optional interface) that displays all the tools one can use to create an accessible document and hides all the tools that format text without adding markup.

    I’ve created a toolbar for Word 2003 that does a pretty good job of doing just that. I also have a tab for the Word 2007/2010 ribbon that similarly makes it easier to create accessible documents — and harder to create inaccessible documents.

    Even when we teach people the right thing to do, we need to be able to give them tools that make doing the right thing not only easy but natural. If the tool I have looks and feels like a hammer, it doesn’t matter how much I know about the right way to drive screws. Either I can do the wrong thing, or someone can help me figure out that I have to unscrew the base of the handle to find a screwdriver hidden inside.

    Open a word-processing application. Look at it as if you were seeing it for the first time. Even knowing what you know, what seems to be the right way to format your document? Doesn’t it look like someone has given you a hammer and asked you to drive a screw?

    I’m convinced that most of the problems we have with inaccessible PDFs can be fixed only by fixing the interface in the typical word-processing application.

    And it wouldn’t hurt if Adobe would do some serious usability testing on the process of checking PDFs for accessibility, identifying the source of each problem found, and fixing that problem. And, while they’re at it, do the same for the process of exporting an InDesign document as a PDF. Maybe it’s possible to get an accessible PDF that way, but the news I get from the people I know who use InDesign is that they can’t figure it out. And the chatter I see in online forums indicates that they’re not alone in their inability.

    But even with all those problems, I consider it shortsighted to blacklist the format. If I create a perfectly accessible PDF, why should I have to put the same file online in another accessible format?

  5. Hi Cliff,

    I know what you mean. Here are a couple of the things we do to get over those issues on the CMS side:

    • Make sure that you include some table-like layout templates, e.g. image to the left and text on the right.
    • Make sure that the headings ‘look’ right, in sizing and prominence.
    • We’re also working on an error-checking / highlighting feature to point out mis-nested headings, very long headings, or complete sentences of bold.
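
The kind of error checking mentioned in the last bullet could look something like the sketch below. It is written in Python with the standard library only, and everything in it is an assumption made for illustration: the `check_content` function, the two thresholds, and the idea that the page template supplies the top-level heading. It is not how any specific CMS does it.

```python
import re
from html.parser import HTMLParser

MAX_HEADING_CHARS = 80  # assumed threshold for a "very long" heading
MAX_BOLD_WORDS = 8      # assumed threshold for "complete sentences of bold"


class _Collector(HTMLParser):
    """Collects the text of every heading and bold run in the content."""

    def __init__(self):
        super().__init__()
        self.items = []   # (tag, text) pairs, in the order the elements close
        self._stack = []  # currently open tracked tags with their text buffers

    def handle_starttag(self, tag, attrs):
        if re.fullmatch(r"h[1-6]", tag) or tag in ("strong", "b"):
            self._stack.append((tag, []))

    def handle_data(self, data):
        for _, buf in self._stack:
            buf.append(data)

    def handle_endtag(self, tag):
        if self._stack and self._stack[-1][0] == tag:
            name, buf = self._stack.pop()
            self.items.append((name, " ".join("".join(buf).split())))


def check_content(html, page_heading_level=1):
    """Return a list of warning strings for the editor to highlight."""
    collector = _Collector()
    collector.feed(html)
    collector.close()
    warnings = []
    previous = page_heading_level  # assume the template provides the <h1>
    for tag, text in collector.items:
        if tag in ("strong", "b"):
            if len(text.split()) > MAX_BOLD_WORDS:
                warnings.append(f"long bold run: {text!r}")
            continue
        level = int(tag[1])
        if level > previous + 1:
            warnings.append(
                f"heading level skipped: h{previous} to h{level} ({text!r})"
            )
        if len(text) > MAX_HEADING_CHARS:
            warnings.append(f"very long heading: {text!r}")
        previous = level
    return warnings
```

Calling `check_content("<h2>Intro</h2><h4>Detail</h4>")` returns one warning about the jump from `h2` to `h4`, which is the case of someone skipping a level because they prefer how a lower heading looks. A whole paragraph marked up as a heading would trip the length check instead.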

    I’m in complete agreement that the challenge is even greater with the PDF, and that the problems need fixing before the PDF stage.

    You can lock down Word templates to only use pre-defined styles, but it’s really hard to get people to accept that compared to an editor in a CMS.

    I wouldn’t blacklist PDF as a format either, but I would suggest that it’s easier for a web team to publish all web pages accessibly than all PDFs accessibly. That’s partly workflow, and partly people’s expectations of what they can do.

  6. Alastair,

    It’s very hard to compare HTML and PDF in this way. PDF is used precisely because one is NOT dependent on a web-content professional and CMS (locked-down or otherwise) to create it. Nor is one subject to the vagaries of connection, or browser.

    Quite apart from the fact that it can be authored offline, a PDF can include arbitrary pages, scans, diagrams, etc. HTML is only capable of addressing a tiny portion of the range of everyday document types that fall within the scope of PDF.

    Sure, if it’s a question of what format is best for squirting some paras and headings on a web-site, then HTML is usually entirely adequate. Those are the easy cases, no problem. But what of the many many harder cases?

  7. For online publishing I don’t think connection is an issue.

    I agree that PDFs can include more arbitrary elements than HTML, but then, support for those elements from assistive technology is either equivalent to HTML or behind HTML.

    You can also turn that around: if it’s a question of squirting some paras into a document, then a PDF is probably fine, but what about the harder cases? Most published PDFs I come across don’t even include headings.

    The reason I pick on the ‘for organisations’ in the title of the post is that you have two general situations for the formats in medium and large organisations:

    1. HTML, where you (can) have an expertly set-up CMS that makes accessible HTML the default, easiest option to produce (including simple tables, images and multimedia). A wide variety of people can author content within that system, and a central team monitors the output fairly easily.
    2. PDF, where you have a wide variety of people creating PDFs with a small variety of authoring tools. The central team receive generally inaccessible PDFs and are rarely able to remedy them all.

    I see that situation again and again, and only once that I know of has an organisation put in place the equivalent for producing PDFs. (Their central team did accessible PDF training, and other authors from around the business did accessible Word training. They then created locked-down Word templates for use in creating online documents.)
    And looking at their site now, their CMS-based content is still much better from an accessibility point of view.

    I’m not saying that PDF isn’t as accessible or shouldn’t be used; for example, we use PDFs for in-depth reports.

    The difference for HTML (in the CMS case) is that the expertise is needed once, up-front, and then the authors with minimal training can use that tool to produce accessible content.

    Until there is an authoring environment that is as structured for creating PDFs, the average accessibility of CMS-produced HTML is going to be better.

    For policy, I would suggest accounting for this and making sure PDFs are used when appropriate and that there is sufficient resource and education for those instances. However, I don’t think a reduction in the sheer quantity of PDFs would be a bad thing overall.
