Top Feature Gaps for Improving Code and Test Gen with GenAI
By Jon Chappell jon@sage-tech.ai
Use of generative AI throughout the full SDLC is a vast subject. I don’t pretend to have thought through, let alone done, hands-on GenAI work for every aspect of the SDLC. However, based on the work I’ve done so far with GenAI code and test generation, here are the most important feature gaps for this work; in other words, what I most wish I had that would improve outcomes. A full list would likely run to at least 20 items, but here are the top four. Sage can help with each of these.
GenAI Code Quality Assessment: Assess source code and determine how well a code artifact (module/file/class/method) meets criteria for code that can be modified successfully by an LLM (i.e., LLM friendliness), as well as traditional software design criteria around modularization, responsibility, extensibility, and other factors that affect understandability and maintainability, e.g. code bloat, complexity, size, clarity, and dependencies. In particular, we are interested in identifying code that will not be easily consumed and modified by LLMs.
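To make this concrete, here is a minimal sketch of the kind of static signals such an assessment might start from, using Python's standard ast module. The thresholds and findings are illustrative assumptions, not Sage's actual criteria, and an LLM-based review would layer on top of checks like these.

```python
# Sketch: score a Python file on a few heuristics that correlate with
# "LLM friendliness" -- size, function length, nesting, import fan-out.
# Thresholds below are illustrative assumptions only.
import ast
import sys

def assess_file(path: str) -> dict:
    source = open(path, encoding="utf-8").read()
    tree = ast.parse(source)
    lines = source.splitlines()

    func_lengths = []
    imports = 0
    max_depth = 0

    def depth(node, d=0):
        # AST nesting depth as a rough proxy for structural complexity
        nonlocal max_depth
        max_depth = max(max_depth, d)
        for child in ast.iter_child_nodes(node):
            depth(child, d + 1)

    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            func_lengths.append((node.end_lineno or node.lineno) - node.lineno + 1)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            imports += 1
    depth(tree)

    findings = []
    if len(lines) > 500:
        findings.append("file too large to fit comfortably in one prompt")
    if func_lengths and max(func_lengths) > 80:
        findings.append("very long function(s); hard to modify in isolation")
    if max_depth > 12:
        findings.append("deeply nested code; high complexity load")
    if imports > 25:
        findings.append("high import fan-out; wide dependency surface")

    return {"path": path, "loc": len(lines), "functions": len(func_lengths),
            "max_func_loc": max(func_lengths, default=0),
            "imports": imports, "findings": findings}

if __name__ == "__main__":
    for p in sys.argv[1:]:
        print(assess_file(p))
```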
Gap Analysis: Identify where the code implementation is not consistent with requirements and design statements, including the acceptance test plan, architectural patterns, and feature scope. An important part of this is performing the analysis incrementally as we iterate on generating code and tests for the application: imagine a daily (or more frequent) re-baselining that identifies what code has changed since the last pass and looks for places where the code has drifted away from expectations.
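A minimal sketch of that incremental re-baselining loop follows, assuming a git repository and a baseline tag created at the last pass; review_against_requirements is a hypothetical stand-in for the LLM drift review, not an actual API.

```python
# Sketch: find files changed since the last baseline commit, pair each diff
# with the requirements it should satisfy, and queue it for an LLM drift
# review. Illustrative only; `review_against_requirements` is hypothetical.
import subprocess

def changed_files(baseline: str, repo: str = ".") -> list[str]:
    out = subprocess.run(
        ["git", "diff", "--name-only", baseline, "HEAD"],
        cwd=repo, capture_output=True, text=True, check=True,
    )
    return [f for f in out.stdout.splitlines() if f.endswith(".py")]

def file_diff(baseline: str, path: str, repo: str = ".") -> str:
    out = subprocess.run(
        ["git", "diff", baseline, "HEAD", "--", path],
        cwd=repo, capture_output=True, text=True, check=True,
    )
    return out.stdout

def review_against_requirements(path: str, diff: str, requirements: str) -> str:
    # Hypothetical: build a prompt pairing the diff with the relevant
    # requirement and acceptance-test excerpts, then ask an LLM whether the
    # change drifts from them. Here we just return the prompt.
    return ("Requirements:\n" + requirements +
            "\n\nChange to " + path + ":\n" + diff +
            "\n\nDoes this change conflict with or drift from the requirements?")

if __name__ == "__main__":
    baseline = "baseline/2024-06-01"  # e.g. a tag created at the last re-baseline
    for path in changed_files(baseline):
        diff = file_diff(baseline, path)
        print(review_against_requirements(path, diff, "<requirement excerpts>")[:200])
```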
LLM Context Management: Tools that help select an effective subset of the application source code to work on for a given GenAI feature and workflow, and which monitor LLM context size / token usage to minimize errors caused by overloading the LLM’s context.
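As a rough illustration of the selection-within-a-budget part, here is a sketch that greedily packs the most relevant source files into a fixed token budget before building a prompt. The budget, the keyword-based relevance score, and the chars-per-token estimate are all assumptions; a real tokenizer (e.g. tiktoken) and a smarter retrieval step would replace them.

```python
# Sketch: pick a subset of source files that fits a token budget, ranked by a
# crude keyword-relevance score. Illustrative assumptions throughout.
from pathlib import Path

TOKEN_BUDGET = 60_000     # assumed budget, kept below the model's context limit
CHARS_PER_TOKEN = 4       # crude approximation; swap in a real tokenizer

def estimate_tokens(text: str) -> int:
    return len(text) // CHARS_PER_TOKEN + 1

def relevance(path: Path, keywords: list[str]) -> int:
    # Hypothetical scoring: count keyword hits for the feature being worked on
    text = path.read_text(encoding="utf-8", errors="ignore").lower()
    return sum(text.count(k.lower()) for k in keywords)

def select_context(root: str, keywords: list[str]) -> list[Path]:
    files = sorted(Path(root).rglob("*.py"),
                   key=lambda p: relevance(p, keywords), reverse=True)
    selected, used = [], 0
    for path in files:
        cost = estimate_tokens(path.read_text(encoding="utf-8", errors="ignore"))
        if used + cost > TOKEN_BUDGET:
            continue  # skip files that would overflow the context budget
        selected.append(path)
        used += cost
    return selected

if __name__ == "__main__":
    for p in select_context("src", ["invoice", "billing"]):
        print(p)
```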
Verification: LLM-driven (non-deterministic) and traditional (deterministic) tools (e.g. linters, automated tests) used to verify code and test generation results: “Does the generated code do what we expected it to do?” The goal, of course, is to find defects early.
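A minimal sketch of wiring the deterministic side together is below, assuming pytest and ruff happen to be the project's test runner and linter; the LLM-driven check is indicated only as a comment, since its shape depends on the model and prompt tooling in use.

```python
# Sketch: run deterministic verifiers over freshly generated code and collect
# pass/fail results; an LLM-based "does it do what we expected?" check would
# be layered on top. Assumes pytest and ruff are installed.
import subprocess

def run(cmd: list[str]) -> dict:
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return {"cmd": " ".join(cmd), "passed": proc.returncode == 0,
            "output": (proc.stdout + proc.stderr)[-2000:]}

def verify(target: str = ".") -> list[dict]:
    results = [
        run(["ruff", "check", target]),   # deterministic: lint
        run(["pytest", "-q", target]),    # deterministic: automated tests
    ]
    # Non-deterministic step (hypothetical): feed failures plus the original
    # requirement to an LLM and ask whether the generated code does what we
    # expected it to do.
    return results

if __name__ == "__main__":
    for r in verify():
        status = "PASS" if r["passed"] else "FAIL"
        print(f"{status}  {r['cmd']}")
```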