AI for good accessibility

This is a post for the presentation I gave at Virta11y 2025. I'm not a fan of just posting slides: there isn't enough commentary, and it's harder to make accessible. So this is slides and commentary, with a supplementary video at the top. The text in the pictures of the slides is not very readable (although you can right-click and open in a new tab to expand it), but I cover the content in plain text next to / after the image.


Or there is a YouTube version if you'd prefer.

Introduction

Intro slide with logos of Gain and W3C, with a picture of Alastair talking, pale jumper with dark horizontal stripes, and a dark beard with pale vertical stripes.

Quick introduction of me, Alastair Campbell. I spent several decades at Nomensa before we joined Gain. Most of that time has been working in accessibility, with some user-research, information architecture, and front-end code.

My other hat is co-chair of the Accessibility Guidelines Working Group, working on the Web Content Accessibility Guidelines.

A large umbrella icon, with 6 AI terms underneath. Expert systems, Machine learning, Deep learning (LLMs etc), Generative AI, Reinforcement Learning, and Probabilistic AI / Bayesian Methods.

The first thing to acknowledge is that AI is a big umbrella term.

There are lots of tools that come under this umbrella, from the more traditional expert systems to the latest Large Language Model (LLM).

I'm not an AI expert (is anyone an expert across all of this?), so I'm coming at it from an accessibility point of view.

What are the best tools for the job?

Oh, and this is the only slide where I used AI to fill in some of the text content, I thought it should know quite a bit about itself.

Three terms are left, Expert systems, Machine / Deep learning, Generative AI.

These are the main areas I've noticed affecting accessibility.

Expert systems are the more traditional version, pre-programmed, useful to give things like a pre-determined answer to a particular question.

The recent advancements in LLMs are what have taken all the oxygen in the discussion, and have massively improved things like:

  • Speech to text (and text to speech)
  • Image recognition and production
  • Other types of content creation, including coding

I'm taking a purely accessibility lens; I'm not touching on sustainability or copyright here. Those are very important issues, but not my area of expertise.

Mediocre AI

Title slide with nauseating purple and blue background. Title of Mediocre AI, subtitle of Current Constraints.

I'm going to start with the negative.

I almost called this the "Evil AI" section, to balance the good section later.

However, it's not malicious, so I don't consider the tools themselves evil. It's just not very good (for accessibility), it's not the right tool for the job.

Hand-drawn style picture showing garbage flowing into and out of a computer.

The first thing I learned in computer science (GCSE) in the 90s was GIGO: Garbage in, garbage out.

Large Language Models (LLMs) are trained on mass data, e.g. the web.

The average website has many accessibility issues. Therefore, the average code produced will have accessibility issues.

The average text content ingested doesn't consider disabilities. Therefore, the average text created could be exclusionary or use ableist language.

Screenshot from a LinkedIn post, follow the link for the details.
LinkedIn post

You can add custom instructions for an LLM, e.g. "Always produce WCAG 2.2 compliant output", however, it doesn't "understand" concepts like WCAG compliance. It works by association, at best it's triggering code similar to that near such a statement, which is often not correct.

It would really need a specific (accessible) set of training data. Even then I worry that it would average out different patterns, for example, using attributes from a tab-pattern in an accordion, or mixing up HTML tables and ARIA grid roles. (I've seen developers do this during the learning process.)
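To make that mix-up concrete, here's a hand-written sketch of an accordion header using the disclosure pattern, followed by the kind of tab-pattern attributes I've seen blended in by mistake (the IDs and text are just illustrative):

    <!-- Disclosure-pattern accordion header: a button toggles a panel -->
    <h3>
      <button aria-expanded="false" aria-controls="delivery-panel">
        Delivery options
      </button>
    </h3>
    <div id="delivery-panel" hidden>
      <!-- panel content -->
    </div>

    <!-- The mix-up: tab-pattern attributes bolted onto the same widget.
         role="tab" expects a parent tablist and arrow-key handling, and
         aria-selected means nothing on an accordion header. -->
    <h3>
      <button role="tab" aria-selected="false" aria-expanded="false">
        Delivery options
      </button>
    </h3>

Both versions look identical on screen, which is exactly why an averaged-out model (or a rushed developer) wouldn't notice the difference.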

There is a relatively small and known set of accessible patterns because there are only so many types of interaction supported by operating systems. Therefore, an expert-system or decision tree might be a better tool for accessible code generation, something more like a template that fills in content from your context.

Graph showing a hypothetical normal distribution, peaking in the middle and going to zero on each side. There is a little arrow on the left saying 'low vision', and one on the right saying 'Sensitivity to animation'.

As humans, and therefore any AIs trained on content from humans, we tend to assume averages.

In showing a normal distribution on screen I mean this in the mathematical sense. If you take any ability like sight, perception, mobility, there is a distribution of abilities, often in a bell-shaped curve.

I mean, nobody is really "normal", it's pretty rare to find a completely average person! We're all in various places across various scales.

My point here is that, by definition, accessibility is about people whose abilities are not average on one or two scales. Enough that it affects their daily life.

Which leads to...

Painted style picture of a robot using a desktop computer.

People have been using LLMs to evaluate interfaces, thinking of it as "usability testing".

According to some research I found on this, the results left a lot to be desired. It suffered from the "averaging" effect; the feedback was very generic. For example, "Users might not know what certain buttons or icons do without proper labelling or instructions". I'd expect more from a fresh graduate starting out in their UX career.

If it's not useful for general public users, it's definitely not useful for representing people with disabilities.

This process of testing sort of reminds me of the "model collapse" concept. If you train a model on its own output, the range of responses narrows dramatically until it essentially collapses. Not the same thing, but it's a type of inception I'd want to avoid.

Two screenshots from ChatGPT showing how it described each image. One is Donald Trump walking down the steps from Air Force One. The other is of Jordan Winn standing in the foreground with a line of dancers in identical white tops and red skirts in the background.

It would be remiss not to talk about hallucinations.

I put two images from BBC news stories into ChatGPT and asked it to describe them. One was a well known subject (ahem, Donald Trump). The other was also of a person, but from a local news story.

The image I uploaded is shown, and ChatGPT added one or two similar images to the output. I assume these are images it is referencing to generate the text output.

The Trump picture is very well described. I wouldn't say it was better than the concise alt-text provided by the website, but it was certainly accurate enough.

The odd thing though is the association for the local news story. Somehow a picture of a man in the foreground and dancers in the background is described as "a large military vehicle or tracked machine, possibly an armoured vehicle, moving or positioned in a snowy or icy landscape at night (or during very low-light conditions)."

My guess is that the colours (e.g. red dresses) are slightly similar to the sunset in the reference picture. However, there is no indication that it isn't confident in the result; it is stated as fact. If it's usually right, you might not question it.

Ecosystem

Title slide:  Ecosystem, where should the action happen?

Something I don't see talked about enough is how accessibility is an ecosystem.

We have complex interactions between people, companies, products, services and the barriers that can be created.

Let's take a quick canter through the history of digital accessibility, with an eye to where in this ecosystem accessibility happens.

Three pictures, the first is two women working on very large computing machinery titled 1960s. The second shows a small computer terminal from the 1970s. The third is an early Mac desktop computer from the 1980s.

My aunt-in-law (is that a thing?) started programming with punch-card computers in the 50s or 60s. This was a very specialist kind of work, and there were no accessibility features (that I know of).

In the 70s computing moved to low-powered terminals connected to a relatively powerful central server, where you would wait for computing time. Again, pretty much zero accessibility features included.

In the 80s things moved to desktop computing. These were generally not networked (at least at home), but they were powerful enough to run accessibility applications like screenreaders and voice input. They weren't great, but they were possible.

Three pictures, the first shows a Windows XP laptop and Google maps marked 2000s. The second shows Facebook on mobile, marked 2010s. The third shows a screenshot of ChatGPT marked 2020s.

In the 2000s computing became more efficient, laptops became possible, but the bigger thing was that the web took off.

This was very good for accessibility, as the web was built to be accessible by default. It is easy to add barriers, but it is much easier to make something accessible compared to a platform where accessibility is always an extra. It's a standardised platform that worked across operating systems, so it was to everyone's advantage to be compatible with it.

In the 2010s mobile apps took off, but the combination of new platforms (where accessibility lagged) and touchscreens (a blank slate of an interface) was a major setback for a while.

Thankfully the mobile platforms like iOS and Android have come on a long way quite quickly, and in usability testing we often find people with disabilities preferring the mobile interface.

The interesting thing about the 2020s is that there isn't a new user-platform or a new interface, but there is a different way of interacting with companies and services.

Diagram that goes from left to right. On the left are developers, designers, content people feeding into content management systems (e.g. Drupal), and front-end frameworks. It flows through the internet and gets to users on the right hand side.

The interface is the thing that is accessible, or not.

Let's look at this as a means of communication.

People on one side of the internet are creating an interface and content for other people to use. Often people you don't know, so you don't know their abilities.

There are so many ways that digital interfaces are created, the diagram is one example, but the key is that lots of people and systems contribute to the interface.

Similar diagram flowing left to right, but expanding the user side. From the internet, there are layers for browsers, operating systems, and assistive technology before it gets to the user.

What most people on the creation side don't realise is how that communication is received and interpreted.

On the web, the browser receives the HTML, scripting, CSS and assets. It interprets those into a Document Object Model (DOM) and an accessibility tree.

That accessibility information is passed on to the operating system, which provides it to assistive technology.
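As a rough illustration (heavily simplified, the real tree carries many more properties), here's how one element in the site's code ends up as an object that assistive technology can query:

    <!-- Markup the site sends: -->
    <button aria-pressed="false">Mute notifications</button>

    <!-- Roughly what the browser exposes in the accessibility tree,
         which the operating system's accessibility API hands on to
         assistive technology:
           role:  toggle button
           name:  "Mute notifications"
           state: not pressed, focusable -->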

Each of these layers can translate or manipulate the interface for the user.

It is a little different for native apps (there's no browser), but there are operating system features and assistive technologies available to customise app experiences.

A simplified diagram, from the content creator it says 'Standardised interface, content & code', and towards the user it says 'Standardised user-agent'.

So the internet is the medium in the middle where the interface transits from the creator to the user.

Using standardised interfaces means that standardised user-agents know how to deal with them.

I'd like to highlight a few examples of how this can work.

The first example is visual manipulation. Someone with low vision might be using a large screen, but need the content to be bigger, much bigger.

Browser zoom (in this case) and/or magnification software can do that.

Some people with cognitive impairments, or some visual issues, might need a dark background and high contrast.

Some people with cognitive impairments might need a different font, or simply more spacing between words and letters.

These kinds of manipulations are commonplace when using the web. But they do require websites to be put together pretty well.
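WCAG's text spacing criterion (1.4.12) is framed around exactly this kind of user-side change: if a user stylesheet, plugin or bookmarklet applies something like the following, the site shouldn't clip or overlap text. This is a minimal sketch of such an override, not any particular tool's output:

    <style>
      /* User-side override applying the WCAG 1.4.12 spacing values */
      * {
        line-height: 1.5 !important;
        letter-spacing: 0.12em !important;
        word-spacing: 0.16em !important;
      }
      p {
        margin-bottom: 2em !important;
      }
    </style>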

It isn't all great. There are plenty of sites, like Trip.com, which undermine the user-agent.

In this case they haven't considered zoom when making the header 'fixed' to the top of the viewport.

They also use fonts to create icons, rather than SVG, which breaks when people replace the font.
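The difference, roughly (class names are illustrative): an icon font relies on a character in a particular font rendering as the icon, so when a user swaps the font the glyph disappears or turns into a stray letter, whereas an inline SVG doesn't depend on the text font at all.

    <!-- Icon font: the icon lives in the font, so it breaks if the
         user replaces fonts -->
    <button class="icon-font-search" aria-label="Search"></button>

    <!-- Inline SVG: survives font replacement -->
    <button aria-label="Search">
      <svg viewBox="0 0 24 24" aria-hidden="true" focusable="false">
        <circle cx="10" cy="10" r="6" fill="none" stroke="currentColor" stroke-width="2"/>
        <line x1="14.5" y1="14.5" x2="21" y2="21" stroke="currentColor" stroke-width="2"/>
      </svg>
    </button>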

Three screenshots, the first is a listing in a podcast app. The second shows the

Not to leave out native, but Android takes a web / XML approach to layout and apps can have good text sizing / zoom support.

iOS (pictured) has quite a few visual options as well and allows for adjustable layouts. The layout aspect isn't always known to (or used by) developers. In my experience it's slightly less likely that the layout will adapt on iOS.

Screenshot of the App store in iOS, with Voice Control active and labels showing for each button.

Voice-input is massively useful for some people with mobility impairments, and if the buttons are all labelled well, it can be easy to use.

However, creating custom controls without the accessibility attributes can break this interface.
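What Voice Control needs is an accessible name it can display and match against what the user says. On the web, the equivalent problem looks something like this (the handler and class name are placeholders):

    <!-- No accessible name: the user can't say "Tap Add to basket";
         they're reduced to "Show numbers" or the grid overlay -->
    <div class="btn" onclick="addToBasket()">
      <img src="basket.svg" alt="">
    </div>

    <!-- Named control: "Tap Add to basket" just works -->
    <button onclick="addToBasket()">
      <img src="basket.svg" alt="" aria-hidden="true">
      Add to basket
    </button>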

I'm showing the App Store, with Voice Control showing all the button names, which makes it very quick to use.

Sometimes I turn this on when my phone is on my desk, so I'm working on my computer and talking to my mobile. I should probably stop multitasking that much though.

Another manipulation of the interface is using a screenreader. It takes in information about the content and code and creates a linear audio interface.

Imagine I want to buy an iPad air. The code and meta-data of the page makes this relatively easy.

I can navigate by heading, rather than trawling through all the content.

Real screenreader users might figuratively roll their eyes at this simple and slow example, but compared to listening through everything in order, it is a much more effective way of navigating.

It works when the site represents the visuals in the code.
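The kind of markup that enables navigating by heading is nothing exotic, just real heading elements matching the visual section titles (illustrative, not the retailer's actual code):

    <h1>iPad Air</h1>
    <h2>Finish</h2>
    <!-- colour options -->
    <h2>Storage</h2>
    <!-- capacity options -->
    <h2>Delivery</h2>
    <!-- delivery and payment options -->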

Two screenshots, one of a news article on Sky.com, with adverts down both sides, in the middle, and video content rolling. The second screenshot is of 'reader mode' for the same page, a plain article.

For many people with cognitive impairments, or even people who wouldn't consider themselves disabled, sites can be overwhelming. Visual clutter, animations, and noise around the focus of reading can disrupt people's attention.

Most browsers allow you to go into a reading mode, which can make the page actually readable for some people.

A repeat of the creator to user diagram, highlighting that all the solutions shown so far have been on the user-agent side.

All of these features are user-agent side.

The site can break these features, or test for compatibility and make sure they work.

A basic wireframe with a large grey box, a yellow button, and a yellow box overlaid the grey box taking up about one third of the space.

I'll tackle a contentious example. Well, it's not really contentious in the accessibility community, but legally it can be.

Overlays offer two things:

  • Features for people to customise the website.
  • Remediation of accessibility issues found on the site.

The most contentious thing is the marketing; they tend to offer more than they can really do. See the Overlay factsheet for more, signed by over 1000 accessibility folks.

Note: In the presentation I used a screenshot, but I don't have the time or persistence of Adrian Roselli.

The creator to user diagram, with the overlay inserted into the front-end of the website, on the server side of the equation.

The main issue I have with overlays (which I wrote about back in 2006) is where in the ecosystem they live, because:

  • If a feature is needed, it's needed for all websites/apps, not just the ones paying this provider. Assuming there is a market for these products (i.e. competition), there would never be a consistent and persistent interface for users across sites.
  • The remediation features are generic across sites; therefore, they cannot do anything a user-agent couldn't do. If they offer customisation, that customisation would be better done to the site rather than through an abstraction layer.
  • Adding a script on top of modern JavaScript frameworks can be unpredictable.
  • An overlay lacks the user's context; it doesn't (and shouldn't) know their abilities or setup.

AI for good

Title slide: AI for good.

So where is AI useful now, and where does it fit?

Screenshot from Vinted, with 5 photos of the same smart black shoes.

Alternative text for images in e-commerce sites is tricky, especially for a site where the users upload the pictures of products. You can incorporate instructions in the interface, but you can't make people add useful alt-text.

Adding alt-text to these images, in the age of AI, could be done on the user-agent end, or generated by the site.

The best context is actually on the site's end, not the user's. It would be possible to train a Machine Learning (ML) model on the site's back-catalogue with human-added alternative text.

Picture of a group of friends laughing on the beach, and the automated description from JAWS.

This description, provided by JAWS (via ChatGPT or Claude), is very good. The short description is:

The image shows a group of six friends sitting on the beach. They are all smiling and laughing, and some of them are singing. One of them is playing the guitar. The sun is setting in the background, and the waves are crashing on the shore.

However, imagine this was a fashion website: the description isn't focused on the clothing, and there's a lot of noise.

Again, the primary context is with the site's content rather than the user, so there is more potential for good alt-text at the site's end.

A picture taken at CSUN of Jenny Lay-Flurrie doing her presentation, in the background is a slide showing a complex diagram of Microsoft's quarterly income statement.

The translation of this diagram by ChatGPT was very good, I didn't spot any issues or mistakes, even though the presenter's head was blocking part of the diagram!

An advantage this description has over standard alt text is that it can include structure. Simple alt-text is plain-text with no structure. The description for this diagram really needs lists and headings.
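On a website, one way to carry that structure is to keep the alt text short and put the full description in real HTML next to the image, because alt text (and an aria-describedby reference) gets flattened into a plain string. A sketch of the pattern, with placeholder content:

    <figure>
      <img src="income-statement.png"
           alt="Diagram of the quarterly income statement; full description follows.">
      <figcaption>
        <details>
          <summary>Full description of the diagram</summary>
          <h3>First branch of the diagram</h3>
          <ul>
            <li>First figure in that branch</li>
            <li>Second figure in that branch</li>
          </ul>
          <!-- repeat headings and lists for the other branches -->
        </details>
      </figcaption>
    </figure>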

The only thing I'd point out is that the description only needs to be generated once. If that were done at the site, that's one subscription. Or every user (who needs it) needs a subscription.

A screenshot of a video, and a long description from the Piccybot app.

There's an ongoing discussion in the Accessibility Guidelines Working Group about videos which have no pauses where you could insert audio description.

In this example video, of which I showed the first 7 seconds in the presentation, it is really packed with visual information.

There are apps and platforms which can describe these, and I used Piccybot to generate an automatic description of the visuals, and it did a good job. But because there's so much, it's looonnnggg.

It could be much more concise, more useful, if the author could produce it. Or at least edit it down.

The creators to users diagram, with an arrow pointing to the content-side saying 'best place for alternatives to be created', and an arrow pointing to assistive technologies saying 'good fallback'.

When creating alternatives, assistive technologies (mainly screenreaders) lack the relevant context; they can't know what the author intended.

A specifically trained LLM / machine-learning 'thing' at the creator's end would be better.

However, I have to acknowledge that a screenreader's ability to create a description on-demand is a very useful fall-back. It is much better than nothing.

Screenshot from an Instagram reel, showing the comments, and selecting text from the comments.

There is a massive volume of images with text embedded into them now. Often people on social media will try to get around character limits or just want a particular presentation style.

Optical character recognition has been an accessibility feature since at least the 90s, but it hasn't been mainstream for long.

Now it's built into things like iPhoto and Android.

I use it for copying recipes from video comments, as the apps prevent copy-paste. The screenshot is literally one I took of a recipe in the comments of an Instagram post; since copying is blocked there, I screenshot it and use iPhoto to copy the text out.

Screenshot from the iOS Personal Voice feature.

I used the feature in iOS to create a synthetic voice based on my own. I just did the quick version and didn't do a great job, but I think it's recognisable as me? The audio plays the following in my voice:

"Personal Voice" allows users to create a voice that sounds like them.

It integrates with "Live Speech" so users can speak with their voice.

Could be a privacy nightmare, but it's on-device.

Two screenshots of a homepage, one includes lots of side-bar gumph, the other only shows the main content and includes some custom icons with particular words.

The screenshots are taken from a personalisation spec at W3C, demonstrating that with a little markup and a browser plugin you could personalise a page to massively simplify it. It could also add particular icons to words, enabling some people to understand the content.
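The plumbing behind that demo is mostly attributes in the markup that a plugin can key off. The first part below is standard HTML; the second shows the idea from the draft WAI-Adapt work, so treat the attribute name and the symbol ID as illustrative rather than final:

    <!-- Standard HTML: the autocomplete token identifies the field's
         purpose, so a plugin can add a familiar icon or fill it in -->
    <label for="email">Email address</label>
    <input id="email" type="email" autocomplete="email">

    <!-- Draft WAI-Adapt idea: associate a symbol with a word so a
         plugin can render an icon next to it (placeholder ID) -->
    <span adapt-symbol="12345">cup of tea</span>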

I couldn't find an AI version of this, but it is ripe for being done with machine-learning.

Transforming an interface, including the actual content on the page for reading-level, simplicity, removing distractions, should be relatively easy for an AI plugin.

The content-creator to user diagram, pointing to the user-agent side of the diagram.

These features are very personal to the user, potentially unique, so they really must be done by the user-agent.

Linkedin post from Robbie Crow showing him reading to his very young daughter using Meta Glasses.

Robbie Crow (strategic disability lead) from the BBC posted this, where he used Meta Ray-Ban glasses (device) and Be My Eyes (service) to enable him to read to his young daughter.

Although it would be lovely to have this feature run locally, it seems infeasible for the near to middle future, at least on a small device like glasses.

Tesco and 'Be My Eyes' logos next to a picture of someone in a shop scanning a product with their phone.

Be My Eyes have also partnered with Tesco to provide a shopping service. According to their press release, customers can get live assistance with identifying products, checking special offers and checking out.

I can see this being very useful for people with enough vision to get around a shop, but not enough to confidently buy products independently.

In this case a lot of the context is with the shop, so having specialist (live or AI) helpers who know the products and check-out would be more effective than a generic AI on your phone (user-agent).

Screenshot of the Mentra homepage with big text saying 'The world's most innovative thinkers are being overlooked.'

Mentra have developed an employment network by and for neurodivergent people. It matches jobs and candidates, as you'd expect. However, it incorporates work environment and communication factors that particularly affect neurodiverse people.

Interestingly, it reverses the usual process, and it isn't candidates who apply, it suggests candidates to employers who then get in touch. Apparently, this can de-stress the situation for the candidates.

The content-creator to user diagram, with the server-side area highlighted and saying that the server / service cannot be on-device for the user.

For the last few services, they really have to be server or service-side. An algorithm for matching people and jobs needs to have all the data. A service connecting users with specialist agents needs centralised infrastructure.

There are obvious privacy implications for this type of service, so it really comes down to a value vs trust proposition. I.e. Is the value you (as a user) get worthwhile? Is the data required by the service innocuous enough? How much trust/distrust do you have in the provider?

Screenshot of a flight checkout labelled by computer vision, showing what type of component it thought each was. Also there's a Deque logo added next to the screenshot.

Large accessibility companies (generally located in the US) have lots of their own accessibility-specific data to train AI models. The one I know of that is working in this area (and publishes info on their approach) is Deque, for example in this article on Enhancing Accessibility with AI and ML.

In the pictured example, optical character recognition and computer vision are used to identify interface objects, i.e. what type of thing each one is. A significant amount of time in accessibility audits is spent working this type of thing out.

They note it is not perfect; it guides evaluation.

Screenshot of Eastern Sierra Avalanche Center labelled by ML, a particular navigation item is highlighted, and a panel on the right helps the user evaluate the results.

Deque call this Human-Centered AI, where it is guiding humans rather than replacing them. The interface presents each result and asks the human if the result is accurate.

As well as improving accuracy, I assume this could also help to improve the system, to train it.

However, for a developer in a hurry, would they pick up the difference between a menu and a list item? The naming of some components is not intuitive (e.g. navigation on a website is generally not a "menu").
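For context on why that naming trips people up: in ARIA terms a "menu" is an application-style widget with its own keyboard behaviour, whereas ordinary site navigation is a landmark containing a list of links. A rough sketch of the two:

    <!-- Site navigation: what most "menus" on websites actually are -->
    <nav aria-label="Main">
      <ul>
        <li><a href="/products">Products</a></li>
        <li><a href="/support">Support</a></li>
      </ul>
    </nav>

    <!-- An ARIA menu: implies application-style behaviour (arrow keys,
         focus management) and hides the fact that these are links -->
    <ul role="menu">
      <li role="menuitem">Products</li>
      <li role="menuitem">Support</li>
    </ul>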

Still, as part of the service there is a chatbot available that has been trained on Deque's accessibility materials. As someone who spends a lot of time answering developers' (and designers') questions on accessibility, I'd guess it could cover at least 80% of the common queries effectively. So long as the person asking knows how to form the question well enough to trigger a good answer.

The content-creator to user diagram, pointing to the content-creator side for testing to ensure compatibility. Then there's a question on the user-agent side: in future, could the user-agent do this automatically?

Currently there's a lot of value in AI supported testing, IF you have good data to train it on.

But thinking ahead, the types of tasks the testing-AI provides could be done by the user-agent. Imagine that a site has ignored accessibility, just implementing what works visually.

The user-agent (e.g. screenreader) could check a webpage and discover there are big discrepancies between the markup and the visual intent. It could add its own layer on top, over-riding the poor code provided by the site.

There's another big IF though, it would have to be nearer to 100% accurate for this to work. A bit like voice-assistants, if it gets it wrong a couple of times it could be super-frustrating.

Screenshot of a product search listing from Innosearch, including an AI chat window on one side.

Another widescale approach is Innosearch, which aggregates products from across the web.

It also feeds the data into an LLM (I assume) and provides a chat so you can filter products down and interrogate the product details.

From Léonie Watson's experience, it provided better details than the product page on the website! It seems the ability to analyse the images contributes to the improved information which can then be provided in text or as part of the filtering.

Future gazing

Info-bot diagram, showing multiple interfaces (e.g. TV remote, thermostat) being fed into the info-bot and then provided in different forms to a tech-savvy blind person, and a low-vision person who struggles with menus.

Peeking a bit further into the future, Gregg Vanderheiden & Vint Cerf hosted a workshop on the future of accessibility, which came up with an "info-bot" idea:

  • An open-source bot taught to understand interfaces at least as well as the 50th percentile person (i.e. average person).
  • Point the info-bot to any interface.
  • Info-bot then translates the interface for someone, based on their needs.

It sounds great, academically (see the research paper). However, could it be a bit like the Telephone game where information is lost in the transformation?

Comic with two panels. Panel one shows someone at a computer saying: A.I. turns this single bullet point into a long email I can pretend I wrote. The second panel shows someone else sitting at a computer saying: A.I. makes a single bullet point out of this long email I can pretend I read.

There's a company called UiPath that provides a service that's a bit like the QA tools Cucumber / Selenium. It uses a browser like a human to complete tasks on enterprise systems, automating back-office / repetitive tasks.

All because it's cheaper than replacing legacy systems with something that works directly.

Some of this reminds me of the comic where person 1 sends a bullet point to person 2, mediated by AI.

All this just makes me think:

Why not provide the data to the agent instead?

Don't bother creating an interface at all, let the agent do it.

Sidenote: Nilay Patel at The Verge calls this the DoorDash problem; there are many commercial issues.

Conclusions

Title slide: Not evenly distributed.

Reviewing the pockets of progress in various areas, the William Gibson quote has never been more apt:

"The future is already here - it's just not evenly distributed"

Considerations for applying AI in accessibility

Where is the context? Where is the best information? If it's about the user, it should probably be a user-agent feature. If it needs information aggregated across users, then perhaps it needs to be centralised.

Who's paying for it? Why make hundreds or thousands of people subscribe to a service (e.g. describing videos) if you could do it once in the production process?

Privacy concerns: disability is a protected characteristic, so it's best to structure a service so that you don't need to know whether the person has a disability.

Given those, put the processing in the right place.

Considerations for AI as user-agent

If your interface is going to be consumed by an AI on behalf of humans, that is sort-of a Search Engine Optimisation paradigm: you are providing information for bots.

Make sure your interface can be accurately understood, quickly, by the average human! Use clear (even "boring") interface elements that provide good affordances.

It isn't clear how much code-analysing these AI bots do now, or will do later, so code may still be important.

If you know of aggregators trawling your information, check how it comes out as a user of their service, a bit like checking SEO results for your domain.

Overall: Please think before you AI...