Helping Generic AI Coding Tools Tackle Enterprise Scale Projects

White Paper for Sage Technologies AI by Eric Newcomer

Introduction

Most people are aware by now that generative AI tools can significantly improve productivity by summarizing meetings and documents, and executing analytical and research tasks.

And most people are also aware that generative AI coding tools significantly improve developer productivity as well.

But fewer are aware that the challenges of developing, modernizing, and migrating complex applications at enterprise scale exceed the cognitive capacity of these AI coding tools.

Because they are trained on general knowledge, generic AI code generation tools are sufficient for creating general purpose business applications.

They do not, however, have the training or breadth of understanding to create or modernize strategic (and proprietary) core business applications. A significant gap exists between the general knowledge an LLM possesses and the specific knowledge an enterprise application modernization requires.

Applications that automate an organization’s core business are part of how it competes, and they represent unique and complex challenges for IT systems. Maintaining and modernizing such mission critical applications is risky, and many such projects fail. AI coding agents multiply this risk when developers generate code faster than they can understand what they are modifying.

Examples include Tesla’s real time routing system for directing drivers to the next open charging station, Amazon’s ordering and delivery application, Citibank’s global faster payment processing system, Uber’s ride sharing application, and Netflix’s streaming service.

Applications such as these are not the same as applications built using general coding knowledge. Code for strategic core business applications is considered valuable intellectual property, and closely guarded.

Viewed in the context of IT evolution, such applications have existed in various forms for decades and are part of how large organizations compete with each other. Many, if not all, of them need attention to bring them up to modern standards.

Understanding, managing, evolving, migrating, and modernizing such applications requires specific expertise and experience. It’s not something you can learn in a university setting, or train an LLM on using publicly available code.

Foundation Model Limits

Generative AI is not truly intelligent and cannot understand the reasoning behind a complex application. It’s a probabilistic system based on statistically predicting the right answers to a prompt.

Generative AI coding tools are capable of generating large volumes of source code in response to a prompt but are incapable of ensuring all of the code conforms to a specific architecture, or is of sufficient quality for the purpose.

Generated code often doesn’t work the way it’s supposed to, and often doesn’t meet requirements for enterprise level qualities of service such as scalability, reliability, failover, security, and performance under load. The resulting code, especially for larger applications, requires extensive manual review and iterative rework.

ChatGPT and other foundational models such as Claude, Gemini, and Grok do well at generating code for simple tasks consistent with their training, such as creating a webpage or a basic mobile app. They often fail, however, at generating code for the kind of complex, unique applications a large enterprise requires to operate at scale: global payroll payment processing, credit card transaction processing, insurance claim processing, transportation logistics planning, and so on.

Generative AI coding tools just don’t have sufficient context to accurately create or modernize complex enterprise applications.

Understanding the business requirements, target architecture, and overall IT environment for exactly what needs to be done to create or modernize a complex application is crucial for an enterprise application to succeed, especially when working in a landscape already largely populated with legacy applications with interdependencies that may also need to be modernized or migrated.

Foundational models fall short in their ability to work with complex applications at a large scale. When there are too many variables, and the output needs to be carefully validated against strict requirements, these models typically prove inadequate to the task.

Gen AI Context Windows

When analyzing application code, foundation models all have context window limitations on the number of tokens they can ingest, which restricts the number of lines of code (LoC) they can process.

A popular rule of thumb estimates 100 lines of code per 1,000 tokens, though real-world ratios vary considerably by language and coding style. Many large legacy applications exceed a million LoC.
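As a rough sizing exercise, the rule of thumb above can be turned into a few lines of arithmetic. The context window size used below is an illustrative assumption for the sketch, not a figure quoted for any particular model:

```python
# Rough sizing sketch using the rule of thumb from the text:
# ~100 lines of code per 1,000 tokens, i.e. ~10 tokens per line.
TOKENS_PER_LINE = 10

def estimated_tokens(lines_of_code: int) -> int:
    """Estimate the tokens needed to ingest a codebase of this size."""
    return lines_of_code * TOKENS_PER_LINE

def chunks_needed(lines_of_code: int, context_window_tokens: int) -> int:
    """How many context-window-sized chunks the codebase must be split into."""
    tokens = estimated_tokens(lines_of_code)
    return -(-tokens // context_window_tokens)  # ceiling division

# A 1.2M LoC legacy application against an assumed 200,000-token window:
print(estimated_tokens(1_200_000))        # -> 12000000
print(chunks_needed(1_200_000, 200_000))  # -> 60
```

Sixty separate chunks means sixty separate analyses with no shared view of cross-chunk dependencies, which is exactly the manual assembly problem described below.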

It’s true that you can improve your chances of success by breaking the problem up and getting the models to work on one part of the application at a time, but then you are playing the role of the architect manually, validating that the generated code is correct, and assembling the pieces.

Even working within the largest context window, however, which allows for processing 1.2M LoC, you have no real control over the results, which typically include hallucinations and inaccuracies. The larger the code base, the greater the chance for error, and the greater the need for expensive rework.

And when you get close to the context window limit, or even exceed it, the results can be completely unpredictable.

Establishing Context

While AI models understand the basics of source code, they do so at a simple associative level, not within an overall architectural context. They are not capable of fitting the pieces together into a comprehensive representation of an application, or recognizing and depicting an overarching architectural representation.

Such a gap in understanding can’t be solved by a larger context window. An LLM has no way to distinguish a critical business rule from a formatting helper, for example. And knowledge and understanding are not the same as reading and analyzing the code: a complex enterprise application is a system that extends beyond source code to include runtime behavior, data relationships, integration contracts, and tribal knowledge (the human interpretation of naming conventions and authorization schemes, for example).

Without this proper understanding and specific knowledge about an application, accurate modernization isn’t possible. (Here again is the contrast between the general knowledge of an LLM and the specific, often proprietary, knowledge of an enterprise application.)

Enterprise project leaders may have heard from Microsoft, GitHub, AWS, and the major consultancies that AI “understands code now,” and may accept that claim at face value. Is the understanding accurate? Typically not.

Correlation, complete information, and continuous validation are required to establish the right context for modernization.

Today, the challenge is more pressing than ever, with the growing requirement to quickly evolve systems to improve customer satisfaction and include new functionality. Never mind the need to support a growing number of new AI projects.

A Solution

Imagine you could use generative AI to generate a comprehensive modernization plan, including examples of the code you might want to generate, iterate on the plan until it’s exactly the way you want it, and only then hand off to the coding agents to generate the new application. You would have an architecture toolbox to guide you every step of the way, documenting the existing system you want to modernize or expand and automatically validating all new code against the project requirements and plan.

Using an AI architecture tool such as Sage-Tech AI provides all of this out of the box, lowers risk, and improves the capacity to migrate and modernize, fundamentally changing the nature of this challenge for the better.

Sage-Tech AI uses an LLM to analyze application code and other artifacts, and creates a model of the application from that analysis. From the model, it generates a full set of documents, including architecture diagrams, training materials, project plans, compliance documentation, business domain mappings, and integration points, which are fed back into an LLM to further refine the model.

The model uses a proprietary knowledge graph to correlate and associate related application components and analyze them in relationship to each other. Sage-Tech AI stores the model in JSON format in a database. The model typically undergoes multiple iterations to improve its quality and accuracy, using confidence scores and other metrics to confirm the LLM’s output.
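As a purely illustrative sketch (Sage-Tech AI’s actual model format is proprietary and not published), a knowledge graph of application components serialized to JSON might look something like this; the node and edge schema here is invented for illustration:

```python
import json

# Hypothetical knowledge-graph structure: application components as nodes,
# relationships as edges, with a confidence score on LLM-derived findings.
# This schema is invented for illustration, not Sage-Tech AI's real format.
graph = {
    "nodes": [
        {"id": "order-service", "type": "bounded_context"},
        {"id": "Order", "type": "aggregate_root"},
        {"id": "max-order-lines", "type": "business_rule", "confidence": 0.92},
    ],
    "edges": [
        {"from": "Order", "to": "order-service", "relation": "belongs_to"},
        {"from": "max-order-lines", "to": "Order", "relation": "constrains"},
    ],
}

def neighbors(graph: dict, node_id: str) -> list:
    """Return ids of nodes that node_id points to, for correlation queries."""
    return [e["to"] for e in graph["edges"] if e["from"] == node_id]

serialized = json.dumps(graph)  # the kind of JSON document stored in a database
print(neighbors(graph, "max-order-lines"))  # -> ['Order']
```

Storing the model as a graph rather than flat text is what allows related components to be analyzed in relationship to each other, as described above.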

Sage-Tech AI exposes the application data model using an MCP server to feed information about the application to an LLM to combine its general knowledge with application specific information.

For example, if your goal is to convert a monolithic application into microservices, the LLM applies its general knowledge about microservices while the Sage-Tech AI MCP server supplies the relevant information about the specific application to create and implement a migration plan.
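This division of labor can be sketched schematically. Everything below is a hypothetical stand-in: the function names and example facts are invented for illustration and do not reflect the real Sage-Tech AI MCP interface.

```python
# Schematic sketch of the flow described above: the coding agent's prompt is
# augmented with application-specific facts retrieved from the model.
# All names and facts here are hypothetical, not the actual MCP tool API.

def fetch_app_context(component: str) -> str:
    """Stand-in for an MCP tool call returning model facts for a component."""
    facts = {
        "billing": ("bounded context 'billing'; aggregate root Invoice; "
                    "rule: invoices are immutable after posting"),
    }
    return facts.get(component, "no model data")

def build_prompt(task: str, component: str) -> str:
    """Combine the general task (LLM knowledge) with app-specific context."""
    return (f"Task: {task}\n"
            f"Application context: {fetch_app_context(component)}")

prompt = build_prompt("extract billing into a microservice", "billing")
```

The LLM contributes general microservices knowledge; the model server contributes the facts about this particular application that the LLM could never infer on its own.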

Sage-Tech AI in other words adds context to foundational models so they can handle large, complex enterprise applications that are strategic to a business.

During the analysis phase, Sage-Tech AI automatically discovers and incorporates into its model:

  • Bounded Contexts: Delineate service boundaries for refactoring and incremental migration or modernization (for example, to migrate from a monolith to microservices)

  • Entry Points: Map an application’s calling structure in complete detail so that a modernized version can maintain the application’s system-to-system connectivity

  • Aggregate Roots: Identify core business objects, such as where operational business records are created, changed and shared.

  • Business Rules: Find, understand, record, and replicate the business rules. Associate the rules with contexts, aggregate roots, and workflows

  • Workflows: Understand automated processes that are acting on your aggregate roots, and find implementation constraints and business rules.

  • Implementation Constraints: Identify constraints such as an EDI partner that can only accept a certain record size for a data exchange, and associate them with the application knowledge graph, to make sure the modernized system represents the entire existing automated business process.

Using a general LLM for complex application understanding is like navigating a city from verbal turn-by-turn directions alone. Sage-Tech AI is like having a detailed map of the entire city, annotated with every building, street, and landmark.

Context and the right level of understanding make all the difference between a general purpose application that anyone can build and a strategic application that can help the business grow, increase revenue, and beat the competition.

Modernization ROI

Using AI to analyze and create a model for deep application understanding automates what previously was a significant manual effort, accelerates modernization and other related projects, reduces time to production, helps increase revenue, and reduces capex. In particular, understanding the complexity and details of a mission critical system is the key to success.

Automating enterprise architecture functions provides comprehensive landscape coverage and consistent implementation of strategic policies and approaches to modernization, migration, and rationalization, as well as to new enterprise application work. Such work is risky, and businesses are reluctant to allocate budget for “big bang” modernization projects as a result of prior failures.

An old adage in the computer industry says that the more work you do on a project upfront, the better the results will be at the end. Doing the architecture and design work upfront helps ensure a good outcome at the end of the project, and this is exactly what Sage-Tech AI is designed for.

AI automation can also enable modernization and other projects that were not considered possible using manual effort alone.

An analysis comparing the level of effort and results in understanding the 1.3M LoC open source COBOL application ACAS using Sage-Tech AI with Claude Sonnet versus using Claude Sonnet by itself and versus manual analysis showed that:

  • Sage-Tech AI took two minutes to generate comprehensive architecture documents, including a seven-layer architectural classification, identifying thirty-six business functions and 182 copybook dependencies

  • Claude Sonnet took eight to twelve hours to produce limited results, missing architectural patterns, discovering structure file by file with no cross-file dependency analysis, and exhausting its context window

In addition, Sage-Tech AI produced a migration scope definition, generated code translation examples (including source code, data migration, and GUIs), and created a target architecture design.

The Intellyx Take

Working with complex enterprise applications has always required a specific set of skills. On the one hand, you need to understand the particular characteristics of the business, especially those that make it unique and competitive, and turn those into application requirements.

Then on the other hand you have to figure out how to implement those requirements using the right combination of technologies – the right type of data storage and retrieval, the right type of UX, the network communication requirements, the security, reliability, performance characteristics, and so on.

And an exercise like this is hard enough to get right the first time. Historically, close to half of large bespoke enterprise application development projects fail, as do many modernization and migration projects.

It’s not an easy thing to do. And it isn’t easy for generative AI to do it, either.

What’s needed (as is often the case) is the right information, in the right place, at the right time. In generative AI terms, proper context engineering. And this is what Sage-Tech AI provides.

Copyright © Intellyx B.V. Sage-Tech AI is an Intellyx customer. No AI was used to write this content. Image by Google Gemini.
