About two years ago, I posted on this blog, expressing a concern that has since only grown.

The tech industry has become reliant on AI for productivity. Satya Nadella, Microsoft’s CEO, claims that up to 30% of Microsoft’s code was written by AI as of last year; the banking sector expects to cut significant parts of its workforce and replace them with AI; and large companies like Siemens and Volkswagen are making big-bet investments in AI to boost their productivity. These and other factors have led to a decline in entry-level positions and increasing pressure on young developers to perform and keep up with “the machine”: software developers are expected to use AI to boost their productivity and write more code, rather than stop and think about what they’re doing.

According to university faculty, students are over-reliant on AI to do some of the work, hindering their learning and under-performing as a result: “Exploring the underlying mechanism, additional analyses show that the effect is particularly detrimental to students with high learning potential, suggesting an effect whereby GenAI tool usage hinders learning.” (Wecks et al.: “Generative AI Usage and Exam Performance”).

One article of note describes the issue particularly well:

Programming students have a widespread access to powerful Generative AI tools like ChatGPT. While this can help understand the learning material and assist with exercises, educators are voicing more and more concerns about an overreliance on generated outputs and lack of critical thinking skills. It is thus important to understand how students actually use generative AI and what impact this could have on their learning behavior. To this end, we conducted a study including an exploratory experiment with 37 programming students, giving them monitored access to ChatGPT while solving a code authoring exercise. The task was not directly solvable by ChatGPT and required code comprehension and reasoning. While only 23 of the students actually opted to use the chatbot, the majority of those eventually prompted it to simply generate a full solution. We observed two prevalent usage strategies: to seek knowledge about general concepts and to directly generate solutions. Instead of using the bot to comprehend the code and their own mistakes, students often got trapped in a vicious cycle of submitting wrong generated code and then asking the bot for a fix. Those who self-reported using generative AI regularly were more likely to prompt the bot to generate a solution. Our findings indicate that concerns about potential decrease in programmers’ agency and productivity with Generative AI are justified. We discuss how researchers and educators can respond to the potential risk of students uncritically over-relying on Generative AI. We also discuss potential modifications to our study design for large-scale replications. (Rahe and Maalej, “How Do Programming Students Use Generative AI?”. arXiv:2501.10091)

At a societal level, the concern becomes even greater: “The Impact of Artificial Intelligence on Human Thought” by Rénald Gesnot highlights a variety of societal concerns, including the “standardization of thought” and “cognitive offloading”, and the sedentarization that comes with it. (See also Derga et al.: “From tools to threats: a reflection on the impact of artificial-intelligence chatbots on cognitive health”.) This puts our ability, as humans and as a society, to be creative and to innovate at risk.

We’ve already seen what a social media echo chamber can do to political discourse and how “engagement” has become an economic good. With the advent of bots integrated into social media, a boombox of like-minded noise is added to that echo chamber. This can amplify our own errors in judgement as well as influence our decisions and cognitive biases in social settings, as it has already been shown to do in experimental settings. (See Vicente and Matute, “Humans inherit artificial intelligence biases”.) The same is true for software development: chatbots like the ones integrated through GitHub Copilot tend to follow familiar patterns based on the data they were trained on, and often have a hard time staying within the guardrails a human developer can be trained to respect. Instructions like “no crypto math outside the crypto library” or “use OpenSSL’s new APIs only, no legacy API use allowed” may have to be repeated or made part of automated checks on the bot’s output, where a human would understand the message and refrain from going outside those guardrails once properly trained.
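As a sketch of what such an automated check could look like, the snippet below scans source files for crypto primitives imported outside a sanctioned wrapper module. Everything in it is a hypothetical illustration: the banned module names, the `src/cryptowrap/` path, and the `find_violations` helper are assumptions for this example, not any real project’s policy.

```python
"""Hypothetical guardrail: flag crypto primitives imported outside
an approved wrapper module. All module names and paths here are
illustrative assumptions, not a real project's layout."""
import re

# Imports we only permit inside the sanctioned wrapper module.
BANNED = re.compile(r"^\s*(import|from)\s+(Crypto|hashlib|hmac)\b")
ALLOWED_PATHS = ("src/cryptowrap/",)  # assumed wrapper location

def find_violations(path: str, source: str) -> list[int]:
    """Return the line numbers where banned crypto imports appear."""
    if path.startswith(ALLOWED_PATHS):
        return []  # the wrapper itself is allowed to use the primitives
    return [
        lineno
        for lineno, line in enumerate(source.splitlines(), start=1)
        if BANNED.match(line)
    ]
```

Run over the files touched by a bot-generated pull request, a non-empty result would fail the CI gate, turning the oft-repeated instruction into an enforced rule.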

The answer to this growing problem is not to stop using AI: even proposed moratoria on AI development in favor of AI safety research don’t call for stopping the use of existing models, but rather for redirecting research effort from ever more intelligent and powerful models to safety guardrails and alignment assurance. Rather, the answer is to put the human being back at the centre of development. Some business leaders talk a great deal about innovation and tend to put a lot of faith in AI to drive it, but while AI itself is certainly an innovation in the sense that it is both novel and useful, plugging AI into every part of every business process does not make a business innovative.

To continue innovating, especially in large businesses, companies need fresh ideas, new perspectives, and change agents to join their ranks. The generation of young developers joining the workforce today is supposed to be that zephyr of fresh air that, just strong enough to blow the dust away yet gentle enough to leave the old dogs soundly asleep, keeps the innovation oxygenated and alive. For that to work, though, they need to be allowed to learn, to train their brain rather than the model, and (most importantly) to make mistakes.

So the question becomes: how do we protect what juniors uniquely bring while allowing them to work with AI responsibly and teaching them what good looks like? I think the answer is to give them a staged learning path that allows them to be productive early, onboarding with the tools and with the business following a 30-60-90-day kind of plan.

The first stage would be to work with the V&V team: test the product, learn what the expected behaviour is, where it is counter-intuitive, where the gaps in documentation are, and what bugs are hiding. This teaches the junior developer about the product while giving the business a fresh set of eyes on it, helping find real bugs and, perhaps, automating some manual tests, saving future work. AI bots are typically good at test automation, especially when standard frameworks are used, and potential mistakes, while they still have an impact on productivity, at least don’t make it to the customer. Of course, test artefacts still need to be produced and reviewed, as does the automation code, but the environment is safer.
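To make the automation step concrete, here is a minimal sketch of a manual test case turned into an automated check using Python’s standard unittest framework. `parse_quantity` and its expected behaviours are hypothetical stand-ins for real product code, not anything from the scenario above.

```python
"""A manual test script ("enter 3, expect 3; enter -1, expect an
error") rewritten as an automated unittest case. `parse_quantity`
is a hypothetical stand-in for real product code."""
import unittest

def parse_quantity(text: str) -> int:
    """Toy product code: parse a non-negative item count."""
    value = int(text.strip())
    if value < 0:
        raise ValueError("quantity must be non-negative")
    return value

class TestParseQuantity(unittest.TestCase):
    def test_valid_quantities(self):
        # Manual step: "enter a valid count, expect it echoed back".
        for text, expected in [("3", 3), (" 42 ", 42), ("0", 0)]:
            self.assertEqual(parse_quantity(text), expected)

    def test_negative_quantity_rejected(self):
        # Manual step: "enter -1, expect an error message".
        with self.assertRaises(ValueError):
            parse_quantity("-1")
```

Run with `python -m unittest`; once checked in, the manual step never has to be repeated by hand, and a regression fails the build instead of reaching the customer.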

The second stage, once Junior knows what the product is supposed to do and how it is tested, is to pair them with a senior developer. At this point, they’ll start writing product code and providing second code reviews for the senior developers. Their own code should now be reviewed by Copilot, but not produced by it – at least not beyond inline suggestions. This is where Junior learns what good code looks like: the patterns to follow and the anti-patterns to avoid. It is also where your seniors may need some coaching and guidance of their own: innovation starts with doing something differently, looking at whatever you’re working on from a new perspective, and killing the age-old “we’ve always done it this way” mentality. The introduction of AI does part of that, and part of what it does is positive change, but a concern raised by Gesnot in his paper is that it brings the same change everywhere it’s used. The cultural diversity and distinctiveness brought to the table by young engineers in a global team is the value they add here, though it may try the patience of the old guard.

During this second phase, Junior may be tempted to unlearn some of what they learned in the first phase: those test automations may seem less “shiny” than the product code. Hopefully, though, you have a solid “shift left” quality plan in place already and your developers are expected, if not required, to write their own integration tests. This is where behaviour-driven development (BDD) and test-driven development (TDD) practices become integral. If you’re not doing this already, make it part of the program and require your junior developers to adopt shift-left, BDD, and TDD.
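As a minimal illustration of the TDD rhythm, the sketch below shows a test written first and the just-enough implementation that makes it pass. The `Basket` class and its behaviour are hypothetical examples invented for this sketch, not part of any product discussed above.

```python
"""A minimal red-green TDD cycle. The test below was (notionally)
written first and failed; `Basket` is the smallest implementation
that makes it pass. Both are hypothetical illustrations."""
import unittest

class Basket:
    """Just enough code to satisfy the test below."""
    def __init__(self) -> None:
        self._items: dict[str, int] = {}

    def add(self, sku: str, qty: int = 1) -> None:
        self._items[sku] = self._items.get(sku, 0) + qty

    def count(self, sku: str) -> int:
        return self._items.get(sku, 0)

class TestBasket(unittest.TestCase):
    # Red first: this test existed before Basket did.
    def test_adding_twice_accumulates(self):
        basket = Basket()
        basket.add("apple")
        basket.add("apple", 2)
        self.assertEqual(basket.count("apple"), 3)
```

The next requirement gets its own failing test before any new code is written, which keeps the human, not the bot, in charge of specifying behaviour.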

Finally, when they’ve had a chance to learn what good looks like, have accumulated experience with the product, and their code reviews from the seniors have become consistently positive, turn on agentic mode. Keep guardrails in place (unit test coverage, automated integration tests, static analysis, software composition analysis, and so on), but allow especially the high-performing juniors to leverage agentic AI assistants and boost their creativity by automating the churn.

This approach has two advantages. First, it brings immediate value by having Junior find bugs, contribute to your shift-left quality program, and eliminate some of your technical debt in test automation, while also allowing them to learn your product and become more effective. Second, it builds a solid foundation for future innovation by ensuring the competence of tomorrow’s greybeards.

That’s the longer view: today’s high-performing seniors were once junior developers. They’ve had the opportunity to learn, make mistakes, and be the subject of the occasional muttered expletive from the seniors of their time. Today’s junior needs a similar investment of time and patience to become tomorrow’s SME. The question isn’t whether you can afford to be patient with juniors now. It’s whether you can afford not to be.