It's easy to fail with Gen AI. Here's what you can do to avoid it

A useful example of a ChatGPT application for an ERP system, the risks involved, and the workarounds.

Best Practices
Engineering insights
Machine learning
10 min read

Suppose you have an ERP. It has a bunch of APIs that are somehow connected to the UI. An individual working with the UI – say, an HR manager – needs to figure out how to use the UI to build a report. On top of that, some reports won’t look the way you need them to if you build them with the available UI, or the insights may come in an inconvenient format.

For example, you have a bunch of tables, but you want a graph and need to somehow group the data. Ask the ChatGPT bot to do it for you. The bot will decide on its own which APIs to call, retrieve the data, group it, and generate the report. As simple as that.

Hi, I’m Ilya Mokin, Head of R&D at Symfa. In this article, I’ll show you how you can implement such a solution in your company. I’ll also briefly go through the potential issues you might run into and show you a few of my screenshots, so you can see what the problem actually looks like, should you face it.

Important: This is an industry-agnostic solution, a mere example of using AI technology to get quick wins for your business. I hope technical managers will find this article most useful, and I’ll be happy to hear feedback from business-side AI advocates, too (though some of the terminology may feel technical).

Table of Contents

  • Use case and the immediate benefits that Gen AI promises
  • What are the potential issues?
  • There are workarounds for this problem
  • Does the data quality issue persist?
  • It’s a long long way to perfection

Use case and the immediate benefits that Gen AI promises

Below you can see the request I made to ChatGPT and the graph it built from the collected data. It shows a workload breakdown for a specific developer: how many hours they tracked on each project during the year.

I explained to the GPT bot how to use the ERP system’s APIs so that it could make requests to them on its own.

  • I described to the bot what data I need
  • The bot decided which APIs to go to and how to process the data
  • The bot generated Python code and executed it
  • The bot returned the result as a picture with a breakdown

[Chart: the developer’s hours per project, generated by ChatGPT]

In short, it gave a basic yet coherent answer to my request.

This is a bot that runs across your CRM or ERP system, collects data, and builds reports for you. To be exact, it works with any system that exposes RESTful APIs. Such systems most likely describe those APIs with Swagger, and even if they don't, it's easy to add Swagger definitions. Feed the Swagger files to OpenAI’s GPT or any other LLM, and the bot will be able to call your ERP/CRM endpoints and build charts from the results of those calls.
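For illustration, here is a minimal sketch of that setup. The spec file name (erp-openapi.json), the model, and the prompts are assumptions, not our production code; note that the model only proposes which endpoint to call – your own code still executes the HTTP request.

from pathlib import Path
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical Swagger/OpenAPI file describing the ERP's REST API
spec = Path("erp-openapi.json").read_text()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system",
         "content": "You answer reporting questions about our ERP. "
                    "Its REST API is described by this OpenAPI spec:\n" + spec},
        {"role": "user",
         "content": "Which endpoint and parameters give me hours per project "
                    "for developer 42 in 2024?"},
    ],
)
print(response.choices[0].message.content)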

In the best-case scenario, that would be it. In the real world, this is still possible, but there’ll be a few plot twists down the road.

What are the potential issues?

1. The bot is unpredictable

The query you saw above is a sterile, hand-picked example – some 20 unsuccessful queries preceded it. Just because the bot produced this graph doesn't mean it's ready for production use. It's important to understand that ChatGPT can be used in an ERP, but this particular solution still needs a lot of work.

  • The bot itself decides how many requests it needs to make, and every redundant request wastes your tokens.

[Screenshot: the bot makes one try after another, wasting your tokens]

Say, what happens if you accidentally make the wrong request? The bot may decide it needs to use a specific API, a huge one. The API returns a massive amount of data that the bot cannot handle, and it crashes. It tries again – and again it crashes. And again. Each of these requests consumes tokens. In other words, you pay for attempts that have zero chance of succeeding.
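Two cheap guards go a long way here: cap the number of retries and truncate oversized responses before they ever reach the bot. A minimal sketch, with illustrative limits and no real endpoint assumed:

import requests

MAX_ATTEMPTS = 3     # illustrative: stop burning tokens after a few tries
MAX_CHARS = 20_000   # illustrative: roughly bounds what one response can cost

def fetch_for_bot(url: str, params: dict) -> str:
    last_error = None
    for _ in range(MAX_ATTEMPTS):
        try:
            resp = requests.get(url, params=params, timeout=30)
            resp.raise_for_status()
            # Truncate instead of letting a huge payload crash the bot
            return resp.text[:MAX_CHARS]
        except requests.RequestException as exc:
            last_error = exc
    raise RuntimeError(f"Giving up after {MAX_ATTEMPTS} attempts") from last_error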

2. Glitches are a natural thing

The bot generates the code in Python, alright. But if the API returns a very large amount of data, there is a high chance of a glitch. The bot acts as an assistant rather than the one in charge. The dialogue below is very close to real dev-bot communication.

The developer: How do I generate such and such a report?
Bot: Well, it may look like this; try adding your own data instead.

Okay, that’s good enough for training purposes, but it’s real business we’re talking about here.

The bot may retrieve the existing data, but take only a small portion of it, or none at all (see the example below) – say, two lines instead of a hundred. It inserts that data into the code and generates a graph – voilà, you have a chart with two bars instead of a hundred. The data seems to be returned correctly, but the code is useless (see the work_logs array below).

import matplotlib.pyplot as plt
from collections import defaultdict
from datetime import datetime, timedelta


# Data from the work logs (simplified for this example)
work_logs = [
    # ... (actual data from query presented as list of dictionaries)
]


# Initialize a dictionary for all January 2024 days with default 0 spent time
days_in_january = {datetime(2024, 1, day): {'spent': 0, 'issue': ''} for day in range(1, 32)}


# Populate dictionary with the work_logs data
for log in work_logs:
    date = datetime.fromisoformat(log['date'][:-1])  # Using [:-1] to remove the 'Z' at the end
    issue = log['issue'][:20] + '...' if len(log['issue']) > 20 else log['issue']  # Truncate issue if too long
    days_in_january[date]['spent'] += log['spent']
    if days_in_january[date]['issue']:
        days_in_january[date]['issue'] += '\n'  # Newline if there's already an issue for this day
    days_in_january[date]['issue'] += issue


# Transform data into lists for plotting
dates = list(days_in_january.keys())
spent_hours = [info['spent'] for info in days_in_january.values()]
issues = [info['issue'] for info in days_in_january.values()]


# Create a bar chart
plt.figure(figsize=(10, 8))
bars = plt.bar(dates, spent_hours)


# Add issue text inside each bar
for bar, issue in zip(bars, issues):
    yval = bar.get_height()
    plt.text(bar.get_x() + bar.get_width()/2, yval / 2, issue, ha='center', va='bottom', color='white', fontsize=7, rotation=90)


# Set up the x-axis to show each day as a tick
plt.gca().xaxis.set_major_formatter(plt.matplotlib.dates.DateFormatter('%Y-%m-%d'))
plt.gca().xaxis.set_major_locator(plt.matplotlib.dates.DayLocator(interval=1))
plt.xticks(rotation=90)


plt.xlabel('Date')
plt.ylabel('Hours Spent')
plt.title('Hours Spent per Day in January 2024 with Task Issue Labels')
plt.tight_layout()  # Adjust layout to make room for the x-axis labels


# Show the plot
plt.show()

ChatGPT generated valid code but ignored ALL the data, inserting a stub that produces a useless empty chart.


Two failed request attempts: the first time, Code Interpreter threw an error; the second attempt was more successful – the bot produced a result, but a wrong one. Each attempt wastes your tokens.

So, the real dialogue between the user and the bot will more likely go like this:

The developer: Generate a report for me. Do this and that. Present it in the form of a chart. Please insert all the data into the report.
Bot: You are sending too many messages! Please give me time. (Shuts down.)

Naturally, an ordinary person gives up after a while. A developer can guess where the problem lies, redo the query, and even get a proper result; a regular user won’t go to such lengths.

There are workarounds for this problem

I mentioned Swagger before – feed the Swagger files to the bot, and it decides which requests to make. But when APIs aren’t adapted for a bot, they can return a lot of data, which sometimes makes the bot freeze. Or the bot can make an incorrect request to the API and then generate incorrect code ... Too risky.

It makes more sense to use function calling – the general approach that OpenAI suggests implementing. OpenAI supports function descriptions: describe each of your functions to the bot in a special format. This works better because OpenAI trained the model to format function calls well (and other model vendors are working on this too). If you want to connect to the ERP endpoints, feeding the model a huge OpenAPI spec may work, but it won’t be a very stable solution. If you want stability, describe each function separately by hand.
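Here is a minimal sketch of one such hand-described function. The endpoint get_work_logs, its URL, and its parameters are hypothetical – substitute your ERP’s real API:

import json
import requests
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# One hand-written function description in OpenAI's tool format
tools = [{
    "type": "function",
    "function": {
        "name": "get_work_logs",
        "description": "Return work logs for a user within a date range",
        "parameters": {
            "type": "object",
            "properties": {
                "user_id": {"type": "string"},
                "date_from": {"type": "string", "description": "ISO date, e.g. 2024-01-01"},
                "date_to": {"type": "string", "description": "ISO date, e.g. 2024-01-31"},
            },
            "required": ["user_id", "date_from", "date_to"],
        },
    },
}]

def get_work_logs(user_id, date_from, date_to):
    # Hypothetical ERP endpoint – your app makes the HTTP call, not the model
    resp = requests.get(
        "https://erp.example.com/api/work-logs",
        params={"userId": user_id, "from": date_from, "to": date_to},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()

messages = [{"role": "user", "content": "How many hours did user 42 log in January 2024?"}]
first = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)

# If the model chose to call the function, execute it and send the result back
call = first.choices[0].message.tool_calls[0]
args = json.loads(call.function.arguments)
messages.append(first.choices[0].message)
messages.append({"role": "tool", "tool_call_id": call.id,
                 "content": json.dumps(get_work_logs(**args))})

final = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)
print(final.choices[0].message.content)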

But even this does not guarantee 100% success.

If you want seriously impressive results, prepare a dataset for your API and fine-tune the model. Here, again, another door may slam in your face – OpenAI does not support fine-tuning for the Assistant model with the Code Interpreter. It will probably become available in the future, but for now you can use a fine-tuned chat model… which adds extra work with Python code execution and a bunch of other surprises.
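For reference, one training record in OpenAI’s chat fine-tuning format could look roughly like the sketch below. It pairs an ERP question with the function call we want the model to learn; the endpoint name and arguments are the same hypothetical ones as above.

import json

# Illustrative single record; a real dataset needs many such examples
record = {
    "messages": [
        {"role": "system", "content": "You are an ERP reporting assistant."},
        {"role": "user", "content": "Hours logged by user 42 in January 2024?"},
        {"role": "assistant", "tool_calls": [{
            "id": "call_1",
            "type": "function",
            "function": {
                "name": "get_work_logs",  # hypothetical endpoint from above
                "arguments": json.dumps({"user_id": "42",
                                         "date_from": "2024-01-01",
                                         "date_to": "2024-01-31"}),
            },
        }]},
    ],
}

# Fine-tuning expects one JSON object per line (JSONL)
with open("train.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(record) + "\n")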

On top of that, you could create a decision tree with sub-instructions and a list of bots for different sub-cases. The instruction decides which bot to use for each user intent – a pretty old approach that is still useful even with new LLMs; a minimal routing sketch follows the diagram below.

[Diagram: routing a user’s request to sub-bots]
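In the sketch, one cheap classification call picks the sub-bot, whose own instructions then handle the request. The labels and instructions are illustrative assumptions.

from openai import OpenAI

client = OpenAI()

# Illustrative sub-bots: each intent label maps to its own instructions
ROUTES = {
    "workload_report": "You build workload reports from ERP work logs.",
    "invoice_lookup": "You answer questions about invoices in the ERP.",
    "other": "You are a general-purpose ERP assistant.",
}

def route(user_message: str) -> str:
    # One cheap call classifies the intent; the label picks the sub-bot
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "Classify the request as one of: "
                        + ", ".join(ROUTES)
                        + ". Answer with the label only."},
            {"role": "user", "content": user_message},
        ],
    )
    label = resp.choices[0].message.content.strip()
    return ROUTES.get(label, ROUTES["other"])

print(route("Show me hours per project for developer 42 in 2024."))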

Then again, maybe you should give open-source models a chance, the ones you can fine-tune yourself? Think Falcon, Mixtral, Llama, OpenChat, or google/gemma – all publicly available via Hugging Face. In the long term this may be the cheaper option, depending on how actively you use the bot (see Azure's pricing as a reference point).
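Getting a first answer out of such a model takes a few lines with the transformers library. A minimal sketch, assuming you have accepted the gated model’s license on Hugging Face and have a GPU to run it on:

from transformers import pipeline

# google/gemma-2b-it is gated: accept its license on Hugging Face first
generator = pipeline("text-generation", model="google/gemma-2b-it")

out = generator("Summarize the January work logs for user 42.", max_new_tokens=200)
print(out[0]["generated_text"])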


Does the data quality issue persist?

You cannot control the bot’s behavior under the hood (i.e., which data it takes to generate the report). Proper fine-tuning of the model and the bot may ensure an 80% success rate, and the APIs that cause the bot to freeze can simply be blocked. But the remaining 20% of cases where the data can be inaccurate won’t go away – that’s 20% of some important data missed.
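As for the freezing APIs, blocking them can be as simple as an allow-list over the tool definitions, so a problematic endpoint never appears in the bot’s tool list. A sketch with illustrative endpoint names:

# Only vetted endpoints are ever exposed to the bot
ALLOWED = {"get_work_logs", "get_projects"}  # illustrative names

def safe_tools(all_tools: list[dict]) -> list[dict]:
    # Drop any function known to freeze the bot or flood it with data
    return [t for t in all_tools if t["function"]["name"] in ALLOWED]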

The solution for now – treat this system as the assistant it’s built to be.

The model itself is designed to HELP with coding. That is, it isn’t yet strong enough to generate code that’s 100% ready for execution. OpenAI is certainly trying to fix this, but the data quality risk persists.

It’s a long long way to perfection

This article covers the ERP case, but you can connect ChatGPT to any other APIs – CRM, Jira, you name it. Moreover, it’s not only about performance charts. You can get any kind of report with any data that your system doesn’t easily let you extract.

Many companies build analytics add-ons on top of existing systems. Using ChatGPT for extra analytics saves you the cost of building a separate system. With the solution you’ve just read about – the bot calls the existing APIs and builds a report for you – easy-access analytics may become a new reality for your business.

In contrast, similar reports are currently made in Jira and Azure DevOps by Symfa talents. But those are prepared either by trained Senior PMs or BI specialists who know the ins and outs of the time tracking system, the queries, the data, and the visualization capabilities of those tools.


Jira and Azure DevOps reports made by Senior PMs at Symfa

ChatGPT makes doing the same work way easier – just give the bot a task and it does everything for you. However, the fine-tuning and testing effort ahead of us is enormous. As enormous as the potential that AI tech holds.

Stay tuned! Follow us on LinkedIn and X to be the first to know about our updates.
We post regularly about current tech trends and share honest software development backstage stories.
