Evaluating Generative AI Services
Like most people, I've got a picture-based social media account (in this case, Instagram). Over the past 18 months, I've been bombarded with ads for some service or other which is better with Generative AI: learning languages, getting style advice, video and picture editing, building websites, to name but a few. While some folks in tech pontificate about the direction of AI technology (such as AGI being achievable in the next 18 months), it feels like our current iteration - the next step on from data science - is still early in its legitimate development.
So, how can you evaluate whether one of these ads (and don't get me started on how many ads are aimed at businesses) is worth investing your time/money in? In this post, I hope to share my thoughts on separating the hype from something that could be worth checking out. As always with a data problem, it goes back to looking at the sources.
Where could the source data come from?
Have I seen this online already?
This question is both the most underrated and the most important, particularly for businesses. All Gen AI systems need to start with something - they are not capable of spontaneous planning and thought, only suggestions. Therefore, if a product seems to be offering to do or advise on something which isn't readily available with a cursory wander around the internet, it's probably too good to be true, or it's getting its source data in a way you might not expect (i.e. your and other customers' data all put in a great big computerin' blender). Successful chatbots, for example, have been around for a long, long time - it's not difficult to combine a general LLM with an IVR/decision system (something else which is pretty old) to build an effective "AI Agent" to answer consumer enquiries, unless your service might actually be pretty niche or complex...
Does the data reflect the nuance?
In real life, is this a simple flow chart or the tree of life?
Local governments across the world are perpetually short of funding. I've seen products offering to somehow sprinkle some Gen AI magic on services to make interactions with rate payers cheaper, for example; handling those interactions is a job which often requires a level of local knowledge, and isn't great to source too far from the locale in question. In theory, this sounds like a great idea! It's just a chatbot, as per above, and surely, in somewhere like the UK, 90% of enquiries are about potholes, right?
Wrongo. I might assume that many enquiries are about potholes, because it is something British people love to complain about (legitimately, most of the time). But most of us don't interact much with our municipal bodies at all, and when we do, there tends to be something of a long story involved - social care for a relative, live information about school closures in inclement weather, or discussions about planning or the environment in the area. There is unlikely to be a vast, clean-of-PII-but-full-of-local-details data set available for every authority body, and even enquiries which surely could be grouped together across different bodies (bin day enquiries) collide with reality pretty hard (the UK has 39 different collection regimens, as per this excellent post). Therefore, we need to consider one last thing about the sources...
What is the breadth of sources?
How diverse could the training data be?
The breadth of potential sources for a given service can be thought of in two ways: the format, and the fences. Text is the most prevalent format of content on the Internet by some margin, and proliferates regardless of platform ownership, language, trends, and especially, copyright. Therefore, any service utilising this bread-and-butter of the Internet gets a head start on source diversity, which is why models such as ChatGPT, Claude, Perplexity, Gemini and so on are very good at giving advice and tutorials about coding, where content is voluminous and in many different structures (guided tutorials, product documentation, Stack Overflow Q&As), and about careers, because hustle culture feels like it is at least 20% of the Anglophone internet (and the best network on Stack Overflow).
Content in other formats, such as video and sound (and to a lesser extent images), is both less prevalent and often in walled gardens. It is not that straightforward to download and code every video uploaded to YouTube or Vimeo, or every short form on TikTok or Instagram, unless you are the platform owner (and even then, probably still quite difficult in places). If something is promising you an easy way to generate content that would otherwise have to come from a walled garden, or from somewhere with notably robust IP enforcement, it's probably too good to be true.
So, when evaluating a potential new Gen AI tool, ask yourself these questions about the potential sources of the data:
- Where could it come from?
- How detailed or nuanced could that data be, to reflect what is happening in real life?
- How diverse could the sources be for this subject?
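If you like your checklists a bit more mechanical, the three questions above could be jotted down as a rough scoring rubric. This is only a sketch of one way to do it: the `SourceCheck` class, the 0 to 2 scale, and the verdict wording are all made up for illustration, and the scores themselves are still your own judgement calls.

```python
from dataclasses import dataclass


@dataclass
class SourceCheck:
    question: str
    score: int  # hypothetical scale: 0 = implausible, 1 = unsure, 2 = very plausible


def assess(checks):
    """Turn the three answers into a rough verdict (thresholds are arbitrary)."""
    # One implausible answer is enough to be suspicious of the whole pitch.
    if any(c.score == 0 for c in checks):
        return "probably too good to be true"
    # Only a clean sweep earns the benefit of the doubt.
    if all(c.score == 2 for c in checks):
        return "worth checking out"
    return "dig into the sources first"


# Illustrative scores only: the council chatbot example from earlier.
council_chatbot = [
    SourceCheck("Where could it come from?", 1),
    SourceCheck("How detailed or nuanced could that data be?", 0),
    SourceCheck("How diverse could the sources be?", 1),
]

# Illustrative scores only: a text-based coding tutorial assistant.
coding_tutor = [
    SourceCheck("Where could it come from?", 2),
    SourceCheck("How detailed or nuanced could that data be?", 2),
    SourceCheck("How diverse could the sources be?", 2),
]

print(assess(council_chatbot))
print(assess(coding_tutor))
```

The design choice here is deliberately pessimistic: a single weak answer about the sources sinks the verdict, which matches the "probably too good to be true" instinct running through this post.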