Learning AI integration by letting a chatbot insult you

Nuno Cardoso

Nuno comes from a research background in computer science, in the areas of natural language processing, information retrieval, semantic web, and artificial intelligence. He has a proactive personality and is always curious about new technologies and development trends that create better user experiences in mobile and desktop applications. He is an advocate of code simplicity, readability, predictability and maintainability. Web products should be easy to understand, operate and maintain throughout their lifespan, for both users and developers.

09/14/2023

We have seen an impressive amount of news about AI applications in the health, entertainment, mapping, and service industries. We read about AI passing exams, driving cars, diagnosing rare diseases, and predicting floods, but the most popular application so far is AI chatbots such as ChatGPT.

As a business that wants to jump on the AI bandwagon, we want to start somewhere. Where to begin? Should we search for seminars online? Buy books about AI? Buy an AI start-up company? Are there cheaper methods?

We're tech people. We like to build stuff to learn. So, while we do not have a concrete application to work with, let's mock one.

This is the story of how Nuno Cardoso and Manfred Bjørlin made a roasting app with the help of AI chatbots, for our own (and others') amusement, to test our self-esteem and self-deprecating humor while we assess how we can bring AI to our products.

How does AI Face Roast work?

The roast app idea is straightforward:

Take a picture of the subject
Build a prompt from the picture description, with the desired tone
Display the AI's output as the picture caption
Add the mugshot to the picture pile

We split the work into frontend and backend, and here is how we made it.

Frontend

The frontend code is React and TypeScript. We use Redux for state management, as we can't live without it. It may sound like too much for a simple app, but it really helps speed up development and debugging while prototyping the app.

To illustrate this, here is an example: when the app boots, we fetch a list of previous roast pictures so that we can display a picture wall in the background. The picture collection is stored with the help of actions and reducers. When a new roast picture is made, we append it to the pile by dispatching Redux actions.
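The load-then-append flow can be sketched as a plain TypeScript reducer, written here without importing Redux itself. The `RoastPicture` fields and the action names are assumptions made for illustration, not the app's actual code.

```typescript
// Hypothetical shape of one roast picture; the real app's fields may differ.
interface RoastPicture {
  id: string;
  imageUrl: string;
  caption: string;
}

interface PicturesState {
  pictures: RoastPicture[];
}

// Hypothetical action names, one for the initial load and one for a new roast.
type PicturesAction =
  | { type: "pictures/loaded"; payload: RoastPicture[] }
  | { type: "pictures/added"; payload: RoastPicture };

const initialState: PicturesState = { pictures: [] };

// Pure reducer: returns a new state object and never mutates the old one.
function picturesReducer(
  state: PicturesState = initialState,
  action: PicturesAction
): PicturesState {
  switch (action.type) {
    case "pictures/loaded":
      // On boot: replace the pile with the previously stored roasts.
      return { pictures: action.payload };
    case "pictures/added":
      // New roast: append it; existing entries keep their object identity.
      return { pictures: [...state.pictures, action.payload] };
    default:
      return state;
  }
}
```

Because the existing entries are reused as-is and each one carries a stable `id`, a component that renders the pile with `id` as its key only has to mount the newly added picture.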

We can avoid unnecessary re-renders and flashes, which always look bad in single-page applications, just by letting Redux manage the app's data. Only the background component responsible for rendering pictures re-renders when a new roast picture is added. All other components stay the same.

With the help of React's key prop, old pictures are left untouched in the background, while new pictures get rendered seamlessly.

Now, to take a picture, HTML5 provides easy access to the webcam's video feed, so that is not a challenge anymore with modern browsers.

Most of the work actually went into ensuring that the frontend would integrate properly with the backend in both development and production environments. For that, we needed a simple Node.js Express server running as a proxy for API calls, to help circumvent CORS restrictions.

Finally, everything was packed into a Docker container and published to Azure Cloud, waiting for people wishing to be roasted.

Backend

In the back end, we have yet another proxy service doing several tasks.

First, the image is saved in a Cosmos DB database. Then the image is sent to the Azure Facial Recognition API in Azure AI Services, which gives us a fairly neutral description of what is seen in the picture.

If we use this image as an example (I know, I should be a model!):

After a bit of work to make the JSON result from the facial recognition human-readable, this is the description we end up with: "a person with the hand on the chin. This image fits into the following categories: others and people. Out of the faces I can identify, I'd guess they're: Male (50). Other stuff I see in the picture is: window and person."
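As a rough sketch of that "make it human-readable" step, the function below turns an analysis result into the sentence shown above. The `ImageAnalysis` shape is a simplified assumption; the JSON actually returned by the Azure service is structured differently and would first have to be mapped into something like this.

```typescript
// Hypothetical, simplified result shape; not the real Azure response format.
interface ImageAnalysis {
  caption: string;
  categories: string[];
  faces: { gender: string; age: number }[];
  tags: string[];
}

// Joins items as "a", "a and b" or "a, b and c".
function joinWithAnd(items: string[]): string {
  if (items.length <= 1) return items.join("");
  const head = items.slice(0, -1).join(", ");
  return `${head} and ${items[items.length - 1]}`;
}

// Builds the plain-English description; empty sections are simply left out.
function describeImage(analysis: ImageAnalysis): string {
  const sentences: string[] = [`${analysis.caption}.`];
  if (analysis.categories.length > 0) {
    sentences.push(
      `This image fits into the following categories: ${joinWithAnd(analysis.categories)}.`
    );
  }
  if (analysis.faces.length > 0) {
    const guesses = analysis.faces.map((face) => `${face.gender} (${face.age})`);
    sentences.push(
      `Out of the faces I can identify, I'd guess they're: ${joinWithAnd(guesses)}.`
    );
  }
  if (analysis.tags.length > 0) {
    sentences.push(`Other stuff I see in the picture is: ${joinWithAnd(analysis.tags)}.`);
  }
  return sentences.join(" ");
}

// Sample input matching the example picture in this post.
const sampleAnalysis: ImageAnalysis = {
  caption: "a person with the hand on the chin",
  categories: ["others", "people"],
  faces: [{ gender: "Male", age: 50 }],
  tags: ["window", "person"],
};
const sampleDescription = describeImage(sampleAnalysis);
```

With the sample input, `sampleDescription` is the same text as the description quoted above.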

The database record is updated with this description, and then a curated prompt is prepared from it. On the main page, when taking a picture, you choose whether you want a roast, a compliment, a backhanded compliment, etc., and this choice is inserted into the prompt to ChatGPT. For this image we asked for a "hard roast", so the prompt going to ChatGPT is: "Make a two sentence tongue in cheek, but not too mean hard roast of a person that is described like this:" followed by the description above. As a fun fact, we had to add the "but not too mean" for the result to actually be funny, and not just plain mean.

The prompt is then sent to the ChatGPT Completion API, and the first result is chosen. The result is saved to the database, and the URL from which to fetch it is returned to the frontend.
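The prompt construction can be sketched as a small template function. The list of tones, the assumption that every tone reuses the same template, and the single space between the colon and the description are illustrative guesses; only the "hard roast" wording comes from the prompt quoted in this post.

```typescript
// Tones mentioned in this post; the real app may offer others.
type Tone = "roast" | "hard roast" | "compliment" | "backhanded compliment";

// Inserts the chosen tone into the prompt template and appends the description.
function buildPrompt(tone: Tone, description: string): string {
  return (
    `Make a two sentence tongue in cheek, but not too mean ${tone} ` +
    `of a person that is described like this: ${description}`
  );
}

// Example: the "hard roast" request for a (shortened) description.
const hardRoastPrompt = buildPrompt("hard roast", "a person with the hand on the chin.");
```

Keeping the template in one place makes it easy to experiment with the wording, which is how a phrase like "but not too mean" can be tuned without touching the rest of the backend.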

And the roast from ChatGPT for this picture? "He must have been thinking for hours about the best way to scratch his chin with his hand. No matter how hard he tries, it looks like he'll never find the answer." I'm not sure whether to be more insulted by the roast or by the fact that it thought I was 50, but that's life.

Both of these requests (to Azure Facial Recognition API and ChatGPT API) are simple HTTP POST requests with JSON bodies, and the results are given in JSON format as well.

So this may be one of the simplest examples of how to combine two APIs in series to generate a more complete result. In other words, this proxy carries out one well-defined job by performing a number of individual tasks.
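A minimal sketch of that chaining is shown here, with each step passed in as a function so that it runs without network access. The `RoastSteps` interface and its member names are hypothetical; in a real proxy `describe` and `complete` would wrap the two HTTP POST calls, and the database updates would sit between the steps.

```typescript
// Hypothetical step functions; real ones would wrap the two HTTP POST requests.
interface RoastSteps {
  // Image in, plain-English description out (the image analysis call).
  describe: (imageBase64: string) => Promise<string>;
  // Description in, prompt out (tone already chosen by the caller).
  buildPrompt: (description: string) => string;
  // Prompt in, candidate completions out (the chatbot call).
  complete: (prompt: string) => Promise<string[]>;
}

// Runs the steps in series; each step depends on the output of the previous one.
async function roastPipeline(imageBase64: string, steps: RoastSteps): Promise<string> {
  const description = await steps.describe(imageBase64);
  const prompt = steps.buildPrompt(description);
  const candidates = await steps.complete(prompt);
  if (candidates.length === 0) {
    throw new Error("The completion step returned no results");
  }
  // As in the post, the first result is chosen.
  return candidates[0];
}
```

Injecting the steps also makes the proxy easy to test: stub functions can stand in for both services while the orchestration logic stays the same.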

Is this the way?

It is at least one way. Probably the way to move forward with AI is to combine different AI services with "older" existing services or build AI services into your platforms and solutions.

A single AI service will probably not revolutionize your business. What can make the difference is how well you implement AI services and place them at suitable points in your business solution.

One thing is for sure - brush up on your API and proxy skills, because you are going to do a lot of integration work in your app.

Is it expensive?

It depends. ChatGPT by itself, for something as small as this, came to a total of $0.30 after two days at a conference with a lot of people using it. But as with most services, this is usage-specific. If you add a lot of your own data to the request, and/or connect your AI services to your data, the cost will increase. At the moment the prices are stabilizing, at least for the more "seasoned" services, but the cost has to be calculated for every integration. There are also always ways of tweaking the cost by limiting data access, etc.

If you're just starting to test ChatGPT with your own data, the platform has a great Playground that gives you access to more functionality than the frontend most users see when accessing ChatGPT. It is available once you have registered as a developer. And a hot tip for when you get into the documentation and want to connect this to some data: take a look at the functions and function_call feature in the Chat request.
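To give an idea of what that feature looks like, here is a minimal sketch of a Chat request body that declares one function. The function name `lookup_customer`, its parameter schema and the model name are assumptions chosen for illustration; check the current API documentation before relying on any of the field names.

```typescript
// One callable function, described to the model with a JSON Schema for its arguments.
interface ChatFunction {
  name: string;
  description: string;
  parameters: Record<string, unknown>;
}

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatRequest {
  model: string;
  messages: ChatMessage[];
  functions: ChatFunction[];
  // "auto" lets the model decide; { name } forces a specific function.
  function_call: "auto" | "none" | { name: string };
}

// Builds a request that lets the model ask for a (hypothetical) customer lookup.
function buildLookupRequest(question: string): ChatRequest {
  return {
    model: "gpt-3.5-turbo", // assumed model name
    messages: [{ role: "user", content: question }],
    functions: [
      {
        name: "lookup_customer", // hypothetical function in your own backend
        description: "Look up a customer record by e-mail address",
        parameters: {
          type: "object",
          properties: {
            email: { type: "string", description: "The customer's e-mail address" },
          },
          required: ["email"],
        },
      },
    ],
    function_call: "auto",
  };
}
```

The model never runs the function itself. When it decides a call is needed, it answers with the function name and a JSON string of arguments, and it is up to your own code to execute the lookup and send the result back in a follow-up request.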

But how is this useful?

That is kind of up to your imagination. I think no one has really hit the nail on the head yet. There are a lot of somewhat useful applications out there, but none that really leverages AI to revolutionize a business.

Maybe your idea is the one? And I think the best ideas come up while playing.

What did we learn from this?

We are very impressed with the AI's capabilities, not only in recognizing the objects and themes in an image but also in offending us with some well-deserved roasts.

Have fun!

If you want to know more about how we used this app at our stand at JavaZone Oslo 2023, as in the image below, check out this recap.