Generative AI - Abstractions, trade-offs, and learning by doing
A question that comes up often in the generative AI space is whether people should use generative AI without a deep understanding of how these systems work. Another way to put this is to ask whether we should rely on higher-level abstractions in our work.
I’m personally focused on the professions of developers and architects, so that’s what I’ll talk about today. We all rely on abstractions, but making the right decisions about the level of abstraction to work at is what differentiates a great developer or architect from a mediocre one. This isn't a new discussion; it comes up with both new and older technologies. I often argue that developers and architects should be on the more conservative side here. Why is this?
I think it's important to take a couple of things into account when making these recommendations.
What is the potential downside of implementing without full understanding?
How much does implementation facilitate learning?
These two considerations weigh against each other when we try to decide whether it makes sense to implement without deep understanding. If implementation will help us learn about and avoid the trade-offs and downsides of a technology, that is an argument for learning by doing. If the potential downsides are very large, that is an argument against. Remember, these attributes weigh against each other: even if the potential downside is small, if the learning upside is also small there is little to gain by diving in, and it may still make sense to avoid the abstraction.
All of this must, of course, be weighed against other factors such as overall value (which includes the downside), so this is just one consideration of many that need to be taken into account.
So what of generative AI? Unfortunately, here the potential downsides are fairly high and the ability to learn through the abstraction is quite low. The prompt itself is an abstraction that seems to work against learning what is really going on inside generative AI models such as LLMs, and it spawns troublesome secondary abstractions of its own, such as anthropomorphizing LLMs and certain kinds of magical thinking about what LLMs are really capable of.
I believe the reason for this lack of learning is that the primary failure modes of LLMs are usually invisible to non-experts. Hallucinations can only be spotted if the user already has good knowledge of the domain, is able to think critically about it, and is willing to make the effort to do their own research. Similarly, bias in all its forms, including racism and sexism, is invisible at least as often as it is visible, and both filter models and fine-tuning have been deployed on LLMs to make bias even less visible than it would otherwise be.
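To make this concrete, here is a minimal sketch of what sits beneath the prompt abstraction. It uses the Hugging Face transformers library and the small GPT-2 checkpoint; both the tooling and the prompt are my own illustrative choices, not anything a particular deployment would prescribe. The point it demonstrates is that the model's raw output is a probability distribution over the next token, not a statement of fact:

```python
# A minimal sketch: peek beneath the prompt abstraction.
# Assumes the Hugging Face transformers library and the small GPT-2
# checkpoint; both choices are illustrative.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "The inventor of the telephone was"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    # logits has shape (batch, sequence_length, vocab_size)
    logits = model(**inputs).logits

# The model's entire output is a probability distribution over the
# next token. Nothing here consults a source of truth.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, k=5)
for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(token_id.item())!r}: p={prob.item():.3f}")
```

Seen this way, it becomes concrete that, to the model, a fluent continuation and a true one are the same thing: text generation is just sampling from this distribution, which is why hallucinated output can look every bit as confident as accurate output.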
For these reasons, I strongly urge developers and architects intending to deploy these models in products and infrastructure to first gain a good understanding of how they really work. This is not as difficult as it may seem. Bea Stollnitz's blog posts on these topics are accessible, detailed, and link to the original papers, which are themselves usually written in a largely accessible manner for readers with some technical background. Other resources and explainers are also available if you prefer a different style or want to learn about a different kind of model (Stollnitz focuses mostly on GPT LLMs).
It’s important to remember that this recommendation is for developers and architects looking to deploy these models in products. For developers and architects using these models as day-to-day assistants to augment their own capabilities, the calculus is quite different! Developers are experts in their own domain, so they are in a much better position to spot and address hallucinations. Additionally, racist and sexist biases in these models are unlikely to express themselves in a damaging way when we are programming algorithms or getting feedback on coding practices and style from an LLM-based tool like GitHub Copilot. Because the first downside is mitigated and the second is smaller, it is much more likely that a developer can responsibly and productively use tools like GitHub Copilot or ChatGPT as assistants in their professional work. The same is true for most domain experts.