Use case and the immediate benefits that Gen AI promises
Below you can see the request I made to ChatGPT and the graph it built from the data it collected. It shows a workload breakdown for a specific developer: how many hours they tracked for each project during the year.
I explained to the GPT bot how to use the ERP system APIs so that it could make requests to them on its own.
- I described to the bot what data I needed
- The bot decided which APIs to go to and how to process the data
- The bot generated Python code and executed it
- The bot returned the result as a picture with a breakdown
In short, it gave a basic yet coherent answer to my request.
This is a bot that runs on top of your CRM or ERP system, collects data and builds reports for you. To be exact, it works with any system that exposes RESTful APIs. Such systems most likely already have a Swagger (OpenAPI) description, and even if they don't, it's easy to add one. Feed the Swagger files to OpenAI's GPT or any other LLM, and the bot will be able to make requests to the endpoints of your ERP/CRM and build charts based on the results of those requests.
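To make the setup concrete, here is a minimal sketch of the idea, assuming the ERP's Swagger description sits in a local erp_swagger.json file (a made-up name) and that you use OpenAI's Python client; the model name and prompt text are placeholders, not the exact configuration we used.

```python
# A minimal sketch, not production code: hand the ERP's Swagger/OpenAPI spec
# to the model as context so it can decide which endpoints to call.
# "erp_swagger.json" is a hypothetical file name.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

with open("erp_swagger.json") as f:
    swagger_spec = json.load(f)

system_prompt = (
    "You are a reporting bot for our ERP system. "
    "Below is the OpenAPI (Swagger) description of its REST endpoints. "
    "Decide which endpoints to call and how to process the data to answer the user.\n\n"
    + json.dumps(swagger_spec)
)

response = client.chat.completions.create(
    model="gpt-4o",  # any capable model; the exact choice is an assumption
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "Build a chart of hours tracked per project for developer X in 2024."},
    ],
)
print(response.choices[0].message.content)
```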
In the best-case scenario, that would be it. In the real world, this is still possible, but there’ll be a few plot twists down the road.
What are the potential issues?
1. The bot is unpredictable
The query you saw above is quite a sterile example, with some 20 unsuccessful queries preceding it. Just because the bot generated this graph doesn't mean it's ready for production use. It's important to understand that ChatGPT can be used with an ERP, but this particular solution still needs a lot of work.
- The bot itself decides how many requests it needs to make. Every redundant request wastes your tokens.
The bot makes one try after another, wasting your tokens
Say, what happens if you accidentally make the wrong request? The bot may decide that it needs to use a specific API, a huge one. The API returns a massive amount of data that the bot cannot handle, and it crashes. It tries again and crashes again. And again. Each of these requests consumes tokens. In other words, you pay for attempts that have zero chance of succeeding.
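One way to stop this loop from eating tokens is to guard the layer that actually executes the bot's requests: cap the retries and refuse to pass oversized payloads to the model. The sketch below illustrates the idea; the function name, limits and error messages are invented for illustration, not part of any SDK.

```python
# A sketch of a guard around the bot's API calls: cap retries and payload size
# so a failing or oversized request never reaches the model.
import requests

MAX_PAYLOAD_BYTES = 200_000   # anything bigger won't fit the bot's context anyway
MAX_ATTEMPTS = 3              # hard cap so a failing call cannot loop forever

def fetch_erp_endpoint(url: str, params: dict | None = None) -> str:
    """Call an ERP endpoint and return its body, or a short error the bot can reason about."""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            resp = requests.get(url, params=params, timeout=30)
            resp.raise_for_status()
        except requests.RequestException as exc:
            if attempt == MAX_ATTEMPTS:
                return f"ERROR: request failed after {MAX_ATTEMPTS} attempts: {exc}"
            continue
        body = resp.text
        if len(body.encode()) > MAX_PAYLOAD_BYTES:
            # Better to tell the bot the payload is too big than to crash it
            return "ERROR: response too large, narrow the query (add filters or pagination)."
        return body
    return "ERROR: unreachable"
```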
2. Glitches are a natural thing
The bot generates the code in Python, alright. But if the API returns a very large amount of data, there is a high chance of a glitch. The bot acts as an assistant rather than the one in charge. The dialogue below is very similar to real developer-bot communication.
The developer: How do I generate such and such report?
Bot: Well, it may look like this, try adding your own data instead.
Okay, that's good enough for training purposes, but it's real business we're talking about here.
The bot may retrieve the existing data but take only a small portion of it, or nothing at all (see the example below): say, two lines instead of a hundred. It inserts that data into the code, generates a graph, and voilà, you have a chart with two bars instead of a hundred. The data seems to be returned correctly, but the code is useless (see the work_logs array below).
```python
import matplotlib.pyplot as plt
from collections import defaultdict
from datetime import datetime, timedelta

# Data from the work logs (simplified for this example)
work_logs = [
    # ... (actual data from query presented as list of dictionaries)
]

# Initialize a dictionary for all January 2024 days with default 0 spent time
days_in_january = {datetime(2024, 1, day): {'spent': 0, 'issue': ''} for day in range(1, 32)}

# Populate dictionary with the work_logs data
for log in work_logs:
    date = datetime.fromisoformat(log['date'][:-1])  # Using [:-1] to remove the 'Z' at the end
    issue = log['issue'][:20] + '...' if len(log['issue']) > 20 else log['issue']  # Truncate issue if too long
    days_in_january[date]['spent'] += log['spent']
    if days_in_january[date]['issue']:
        days_in_january[date]['issue'] += '\n'  # Newline if there's already an issue for this day
    days_in_january[date]['issue'] += issue

# Transform data into lists for plotting
dates = list(days_in_january.keys())
spent_hours = [info['spent'] for info in days_in_january.values()]
issues = [info['issue'] for info in days_in_january.values()]

# Create a bar chart
plt.figure(figsize=(10, 8))
bars = plt.bar(dates, spent_hours)

# Add issue text inside each bar
for bar, issue in zip(bars, issues):
    yval = bar.get_height()
    plt.text(bar.get_x() + bar.get_width() / 2, yval / 2, issue, ha='center', va='bottom',
             color='white', fontsize=7, rotation=90)

# Set up the x-axis to show each day as a tick
plt.gca().xaxis.set_major_formatter(plt.matplotlib.dates.DateFormatter('%Y-%m-%d'))
plt.gca().xaxis.set_major_locator(plt.matplotlib.dates.DayLocator(interval=1))
plt.xticks(rotation=90)
plt.xlabel('Date')
plt.ylabel('Hours Spent')
plt.title('Hours Spent per Day in January 2024 with Task Issue Labels')
plt.tight_layout()  # Adjust layout to make room for the x-axis labels

# Show the plot
plt.show()
```
ChatGPT generated valid code but ignored ALL the data by inserting a stub, which produces a useless empty chart.
Two failed request attempts: the first time the Code Interpreter generated an error; the second attempt was more successful, as the bot produced a result, but the result was wrong. Each attempt wastes your tokens.
So, the real dialogue between the user and the bot will more likely go like this:
The developer: Generate a report for me. Do this and that. Present it in the form of a chart. Please insert all the data into the report.
Bot: You are sending too many messages! Please, give me time (shuts down).
Naturally, an ordinary person gives up after a while. A developer can guess where the problem lies, redo the query and even get the proper result, while a common user won't go to such lengths.
There are workarounds for this problem
I mentioned Swagger before: feed the Swagger files to the bot, and it will decide what requests to make. But whenever APIs are not adapted for a bot, they can return a lot of data, which sometimes causes the bot to freeze. Or the bot can make an incorrect request to the API and then generate incorrect code... Too risky.
It makes more sense to use function calling, the general approach OpenAI itself suggests. OpenAI supports function descriptions: you describe each function for the bot in a special JSON schema format. This works better because OpenAI trained the model to produce well-formed function calls (other model vendors are working on this too). If you want to connect to the ERP endpoints, feeding a huge OpenAPI spec may work, but it won't be a very stable solution. If you want stability, you need to describe each function separately by hand.
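For illustration, a hand-written function description passed through the tools parameter of OpenAI's chat API might look like the sketch below. The function name get_work_logs and its parameters are hypothetical examples, not a real ERP API.

```python
# A sketch of function calling with a hand-written function description.
# get_work_logs and its parameters are invented for illustration.
from openai import OpenAI

client = OpenAI()

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_work_logs",
            "description": "Return the hours a developer tracked per project for a date range.",
            "parameters": {
                "type": "object",
                "properties": {
                    "developer_id": {"type": "string", "description": "ERP ID of the developer"},
                    "date_from": {"type": "string", "description": "Start date, ISO 8601 (YYYY-MM-DD)"},
                    "date_to": {"type": "string", "description": "End date, ISO 8601 (YYYY-MM-DD)"},
                },
                "required": ["developer_id", "date_from", "date_to"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "How many hours did developer 42 track per project in 2024?"}],
    tools=tools,
)

# Instead of free-form text, the model returns a structured tool call;
# your own code executes it against the ERP and feeds the result back to the model.
msg = response.choices[0].message
if msg.tool_calls:
    call = msg.tool_calls[0]
    print(call.function.name, call.function.arguments)
```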
But even this does not guarantee 100% success.
If you want seriously impressive results, prepare a dataset for your API and fine-tune the model. Here, again, another door may slam in your face: OpenAI does not support fine-tuning for the Assistant model with the Code Interpreter. It will probably be available in the future, but for now you can only use a fine-tuned chat model... which adds extra work around Python code execution and a bunch of other surprises.
In addition to that, you could create a decision tree with sub-instructions and a list of bots for different sub-cases. A top-level instruction decides which bot to use for each user intent (see the sketch below); it's a pretty old approach that is still useful even with new LLMs.
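A minimal sketch of such a router follows. The intent labels, sub-bot instructions and model names are made up for illustration; the point is simply to classify first and only then hand the request to a narrow bot.

```python
# A sketch of the "decision tree" idea: classify the user's intent first,
# then answer with the instructions of a narrow sub-bot for that intent.
from openai import OpenAI

client = OpenAI()

SUB_BOTS = {
    "workload_report": "You build workload reports. Use only the work-log endpoints...",
    "project_budget": "You answer budget questions. Use only the finance endpoints...",
    "other": "Politely explain that you only handle reporting questions.",
}

def route(user_message: str) -> str:
    """Pick one intent with a cheap model, then answer with that sub-bot's instructions."""
    intents = ", ".join(SUB_BOTS)
    classification = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": f"Classify the request into one of: {intents}. Reply with the label only."},
            {"role": "user", "content": user_message},
        ],
    )
    intent = classification.choices[0].message.content.strip()
    instructions = SUB_BOTS.get(intent, SUB_BOTS["other"])
    answer = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": instructions},
            {"role": "user", "content": user_message},
        ],
    )
    return answer.choices[0].message.content

print(route("Show me how many hours developer 42 logged per project this year"))
```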
Or maybe you should give other open-source models a chance and fine-tune one yourself? Think Falcon, Mixtral, Llama, OpenChat, google/gemma, all publicly available via Hugging Face. In the long term this may be a cheaper option, depending on how actively you use the bot (see Azure's pricing as a reference point).
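Trying such a model before committing to fine-tuning is a one-liner with the Hugging Face transformers library; a rough sketch is below. The Mixtral checkpoint named here is just one of the public options, and it needs serious GPU memory, so a smaller Llama or Gemma checkpoint may be a more realistic starting point.

```python
# A quick, rough way to try an open model locally before investing in fine-tuning.
# The model ID is one public option; swap in a smaller checkpoint if hardware is limited.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",
    device_map="auto",  # requires the accelerate package; spreads the model across available devices
)

prompt = "Summarize the hours developer 42 tracked per project in 2024, given this JSON: ..."
print(generator(prompt, max_new_tokens=200)[0]["generated_text"])
```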
Does the data quality issue persist?
You cannot control the behavior of the bot under the hood (i.e., what data it takes to generate the report). Proper fine-tuning of the model and the bot may ensure an 80% success rate, and the APIs that cause the bot to freeze can simply be blocked (see the sketch below). But the remaining 20% of cases where the data can be inaccurate are still there. That's 20% of potentially important data missed.
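As for blocking the freezing APIs, a simple denylist on whatever layer executes the bot's requests is usually enough. A minimal sketch, with invented endpoint paths:

```python
# One way to "simply block" problematic endpoints: keep a denylist and refuse
# to execute requests that target them. Paths here are invented examples.
BLOCKED_ENDPOINTS = {"/api/v1/audit-log", "/api/v1/full-export"}

def is_allowed(endpoint_path: str) -> bool:
    """Reject calls to endpoints known to return too much data for the bot."""
    return endpoint_path not in BLOCKED_ENDPOINTS

assert is_allowed("/api/v1/work-logs")
assert not is_allowed("/api/v1/full-export")
```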
The solution for now: treat this system as the assistant it's built to be.
The model itself is designed to HELP with coding. That is, it isn't yet strong enough to generate code that's 100% ready for execution. OpenAI is certainly trying to fix this, but the data quality risk persists.
It’s a long, long way to perfection
This is the ERP case I’m talking about, but you can connect ChatGPT to any other APIs: CRM, Jira, you name it. Moreover, it’s not only about performance charts. You can get any kind of report with any data that your system does not easily let you extract.
Many companies build analytics add-ons on top of existing systems. Using ChatGPT for extra analytics saves you the cost of building a separate system. With the solution you’ve just read about, where the bot combines the existing APIs and makes a report for you, easy-access analytics may become a new reality for your business.
For contrast: similar reports are currently produced in Jira and Azure DevOps by Symfa specialists. But those are prepared either by trained senior PMs or BI specialists who know the ins and outs of the time tracking system, the queries, the data and the visual capabilities of those systems.
Jira and Azure DevOps reports made by Senior PMs at Symfa
ChatGPT makes doing the same work way easier: just give the bot a task and it does everything for you. However, the fine-tuning and testing effort ahead of us is enormous. As enormous as the potential that AI tech holds.
Stay tuned! Follow us on LinkedIn and X to be the first to know about our updates.
We post regularly about current tech trends and share honest software development backstage stories.