There's No AI Without APIs

Adopting AI without understanding the critical APIs underneath can lead to significant security exposure.

One of the major technologies to arise in the last decade is LLM-driven AI. These systems, referred to in shorthand simply as “AI”, provide incredible processing capabilities, producing everything from human-like natural language to working code generated from rough service drafts.

While much of this development has happened in the highly visible consumer segment of the market, it is in fact powered by complex APIs underpinning LLM services. Unfortunately, adopting AI without understanding the critical APIs underneath can lead to significant security exposure.

With this in mind, let’s take a look at the current AI market and the APIs that drive it. We’ll survey the landscape as a whole and start to unpack the security risks underlying these systems and their adoption. Finally, we’ll make the case that there is no AI without APIs – and accordingly, there is no secure AI without secure APIs driving it!

The Intersection of AI and APIs

Firstly, we should state what the relationship between “AI” as a broad industry term and the API space actually is. In the industry, AI is often used as a synonym for LLMs, or Large Language Models. This is somewhat of a misnomer – current AI solutions are not general artificial intelligence, though these Large Language Models can nonetheless turn out content that feels “human-like”. Securing LLM-based APIs requires an understanding of how these models work and how data is designed to flow between each element of the service.

Most LLM providers offer an API to interact with their service, connecting external subscribers to the LLM for off-site processing. As an example, a provider whose LLM specializes in generating images from text might offer an API that can be integrated into a design application, allowing users to generatively create additional content from existing pictures using the LLM in question.

The Growing Significance of AI in Modern Applications

It is this promise – the idea of simply sending a query from a service to an external endpoint for further processing – that makes LLMs and AI so attractive for integration. Modern applications have started targeting LLM APIs and AI more generally as part of their service offerings for a few core reasons:

  • LLMs lower the bar for creation – generating boilerplate text or simple iconography doesn’t take much time on its own, but the cost adds up when it’s repeated over and over again. LLMs offer a way to generate this content with far less friction, reducing “rote task” cost.
  • AI offers new development avenues – AI and LLMs offer a second “brain” to toss ideas at, opening up new avenues for generation and experimentation.
  • LLMs and AI drive down cost – many of the rote tasks that AI replaces are otherwise handled by contractors or other high-cost methodologies. LLMs and AI promise to drive this price down and free up those resources for other tasks.

For these reasons, AI solutions have become widespread in the last few years, especially following the release of high-profile offerings such as ChatGPT and RunwayML.

The Unseen Link: Why APIs are Essential for AI

With all of this value on the table, it’s easy to forget what is actually driving these systems – APIs.

An API, or application programming interface, is an interface between systems that sets a commonly agreed-upon standard and protocol for interaction. AI and LLMs are no different from any other system – they need a way to integrate with external partners and users. Any AI worth its salt in the marketplace is going to offer an API, and as such, understanding how to securely use these APIs – and the very models they leverage – is essential to adopting and securing AI at scale.

APIs are a crucial component of the Emerging LLM Stack (Source: https://a16z.com/emerging-architectures-for-llm-applications/)

Unfortunately, this step can be lost in the rapid adoption that has been common in the industry. It’s tempting to think of AI security as something each provider handles on their end, but the reality is that AI security is just API security with some extra steps. To get a firm grasp on this, we must ask a major question about AI – just what is it, exactly, and how does it work?

The AI-Driven Landscape

Overview of AI Platforms and Language Models (LLMs)

So just what is an LLM? We’ve mentioned previously that LLM means “Large Language Model”, but how does it do what it does?

Imagine you have a computer that is able to mimic whatever you say to it. If you say “banana”, it will repeat “banana”. If you say “twenty is two tens”, it will say “twenty is two tens”. Now imagine that this computer is trained to identify relationships – when you next say “twenty is two tens”, the computer might be able to extrapolate that thirty would be three tens, and so forth. If you tell it that a banana is yellow, it will remember this, and if you mention something like a lemon in the future, it might say “this lemon is yellow - just like a banana!”

This seems simple at first glance, but imagine this continuing at a massive scale. What if, instead of simple sentences, you could feed this model millions and millions of pages of books, scanned dictionaries and thesauruses, academic literature, and more? With all of this information, the LLM could “learn” relationships and facts.

Take this a step further. Now that the computer knows all of the information we gave it, what if we asked the system to tell us something in the same style as the statements it heard earlier? It might say “the orange is orange”, even though we never gave it that exact prompt.

Herein lies the benefit of an LLM – the system has learned to associate objects with colors, but has also learned the grammatical rules to express this relationship. With this, you could say something like “write me a poem in the style of Keats”, or “explain to me what the moon is made of as if I was five years old”, and get back a legible, understandable body of text.
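
To make this concrete, here is a toy sketch in Python. It is emphatically not how a real LLM works internally (real models learn statistical relationships across billions of parameters), but it illustrates how simple learned associations let a system answer questions it was never explicitly taught:

```python
# Toy illustration of "learned associations" - NOT a real LLM.
from collections import defaultdict

facts = defaultdict(dict)

def learn(sentence: str) -> None:
    """Record a simple 'X is Y' relationship from a sentence."""
    words = sentence.lower().rstrip(".!").split()
    if "is" in words:
        subject = words[words.index("is") - 1]
        attribute = words[words.index("is") + 1]
        facts[subject]["is"] = attribute

def recall(subject: str) -> str:
    """Answer in the same style as the statements learned above."""
    attribute = facts.get(subject.lower(), {}).get("is")
    return f"The {subject} is {attribute}." if attribute else "I don't know."

learn("A banana is yellow")
learn("An orange is orange")
print(recall("orange"))  # -> The orange is orange.
```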

The Role of APIs in Enabling AI Capabilities

So far so good. Now let’s go a step further. If we fed this machine code from an application, it might learn how each step is accomplished and what good code structure looks like. What if we then asked it to review code that we provide to see if it is “correct” (which, in this case, means “close in structure to the code it has learned”)?

In this example, we are asking the LLM to ensure that what we have done matches what it has learned to be correct. Now what if we wanted to provide this service through an API, checking code as you develop it in an IDE?

This is the role of APIs in enabling AI capabilities. With a few endpoints and a structured LLM, you could easily develop any number of systems that act as a code corrector, a security verifier, a human-readability scanner, etc. And all of this could happen invisibly through an external API away from the end user, providing incredible value without disrupting the user flow.
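
As a rough illustration, the client side of such a code-review endpoint might look like the sketch below. The URL, model name, and response fields are hypothetical placeholders rather than any real provider's contract:

```python
# Hypothetical sketch of calling an external LLM code-review API.
# The endpoint URL, model name, and JSON shape are assumptions.
import os
import requests

LLM_API_URL = "https://llm.example.com/v1/review"  # hypothetical endpoint

def review_code(source: str) -> str:
    """Send source code to an external LLM service and return its review."""
    response = requests.post(
        LLM_API_URL,
        headers={"Authorization": f"Bearer {os.environ['LLM_API_KEY']}"},
        json={
            "model": "code-reviewer-1",  # hypothetical model name
            "prompt": f"Review this code for defects:\n\n{source}",
        },
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["review"]  # hypothetical response field

# An IDE plugin could call review_code() on save and surface the result inline.
```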

What is unique about the relationship between APIs and LLM AI solutions is that the interaction is often bidirectional – as much as APIs can leverage AI solutions, they can also improve them! APIs can be used to integrate training data into a corpus for training an LLM, they can help manage interconnected systems and control data flow, and they can even connect LLMs together for iterative model improvement – in many ways, APIs and LLMs are thus best seen as symbiotic enablement engines.

Examples of AI Applications Powered by APIs

For context and clarity, let’s look at two example APIs in this space.

Perhaps the most visible AI-based API is Google’s Speech-to-Text service. This service uses Google’s AI systems to generate text from speech input, utilizing a complex and powerful neural network to clean up audio and convert it to usable text in a very short timeframe. While the core model is quite powerful, several other trained models are available, allowing a variety of audio input to be processed efficiently, both in the cloud and on-device.
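
As a sketch of what this looks like in practice, the snippet below transcribes a short local audio file with the google-cloud-speech Python client; the file name, encoding, and sample rate are assumptions you would adjust for your own audio:

```python
# Sketch of a Speech-to-Text call using the google-cloud-speech
# client library (pip install google-cloud-speech). The audio file,
# encoding, and sample rate below are assumptions for illustration.
from google.cloud import speech

client = speech.SpeechClient()

with open("meeting.wav", "rb") as f:
    audio = speech.RecognitionAudio(content=f.read())

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="en-US",
)

response = client.recognize(config=config, audio=audio)
for result in response.results:
    print(result.alternatives[0].transcript)
```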

Another API, and perhaps the higher profile of the two, is the DALL-E 3 API. This model is part of the OpenAI platform and is a sibling to ChatGPT. DALL-E 3 allows for the creation of graphical content from text, and has been used widely for ideation and marketing material. The API allows users to generate this content natively within their applications and workflows, offering simplicity and flexibility.
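
For instance, generating an image through the OpenAI Python SDK can be as short as the sketch below; exact parameter names may vary by SDK version:

```python
# Sketch of an image-generation call via the OpenAI Python SDK
# (pip install openai). Parameters may vary by SDK version.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

result = client.images.generate(
    model="dall-e-3",
    prompt="A watercolor lighthouse at dawn, for a travel brochure",
    size="1024x1024",
    n=1,
)
print(result.data[0].url)  # URL of the generated image
```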

The Vulnerabilities in the Link: API Security

While this technology is very promising, it does come with its own set of risks and considerations. 

Risks Associated with Inadequate API Security in AI Systems

LLMs and AI APIs are powerful tools – but they should be understood within the context of their use. By design, AI APIs take data and process it – and this processing often (but not always) leverages external resources. This introduces quite a few risks for data security and integrity, and, in certain cases, can also result in accidental data exfiltration.

As an example of how this can happen, ChatGPT was recently discovered to have a glitch that allowed private training data, including personal information, to be revealed through a strange channel – simply asking the service to repeat the word “poem” over and over again. This exploit was addressed rather quickly, but it shows that AI APIs process data in such a way that many security weaknesses are not even entirely visible – and are certainly hard to plan against.

Even when information is processed without external resources, there is the real possibility that additional security risk is created. Some LLMs generate and log reports as part of their core offering, and while the information may be safe and encrypted on-device, there have been more than a few examples of services not properly encrypting logs. These logs could lead to greater vulnerability if exposed.

There is also the fact that these systems are not as tried-and-true as existing ones. The LLM landscape is relatively new, and while AI APIs are flashy and becoming widely adopted, they are not proven by decades of use. That’s not to say this automatically makes them a bad choice, but it does raise the question of whether they have been tested enough to be trusted with our most sensitive data.

Finally, and relatedly, there’s the simple fact that these systems are not perfect – despite what some adopters may think. LLMs have gotten a bad rap lately because so many people simply trust their output to be true. LLMs don’t push perfect code or true text – they create output that is human-like, and it’s entirely on the end user to validate this information.

What makes this difficult to deal with is that the LLM will often insist that its output is factual and correct, an occurrence termed “hallucination” by developers and researchers. What this means for the end user is that an API which corrects grammar might confidently tell you that you are wrong – despite the fact that you are correct.

While this issue is getting better with every new revision, it does raise the question – can we trust these systems with vital security systems and practices?

API Security: A Must for AI Organizations 

Thankfully, there are some best practices that can be leveraged to secure AI API implementations. While these are strong starting points, it’s important to remember that these are just that – starting points. Every instance will come with its own caveats and considerations, and short of adopting a trusted partner to manage these issues, you will need to invest some serious time and energy in ensuring adequate security in your particular environment.

The Two-Way Relationship

Firstly, adopters should remember that these systems are principally two-way. While you are getting text, images, and other output from LLMs and other AI APIs, you are also feeding data in to get that output. This means that you need to properly sanitize your data and consider what is flowing into the service as much as what is flowing out of it.

For example, using an AI system to validate the structure of a database object before pushing to production is not a bad idea. This can help identify areas where data is stored redundantly, where there is some inefficiency in storage, and so on. What makes this process dangerous is using live production data instead of fake testing data – you could be pushing actual user information to an external service, from which that data could later be retrieved!

If instead you sanitize your inputs with variables or better-structured queries, you can get the same benefit without putting any proprietary or private information at risk.
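
A minimal sanitization pass might look like the sketch below. The two patterns are illustrative only; a production redaction layer would need far broader coverage:

```python
# Illustrative input sanitization before data leaves your boundary.
# These two patterns are examples, not an exhaustive PII filter.
import re

REDACTIONS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "<EMAIL>"),  # email addresses
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "<SSN>"),      # US SSN format
]

def sanitize(payload: str) -> str:
    """Replace obviously sensitive values with placeholders."""
    for pattern, placeholder in REDACTIONS:
        payload = pattern.sub(placeholder, payload)
    return payload

record = "user: jane@example.com, ssn: 123-45-6789"
print(sanitize(record))  # -> user: <EMAIL>, ssn: <SSN>
```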

You should also consider that the data sets used by AI can be poisoned – and often, this can occur as part of using the LLM in a core data stack. LLMs reprocess data, constantly introduce new data from new sets, and ultimately ingest more context and information over time that could be less trustworthy than traditional sources. Accordingly, you should treat the data coming back from the service as an untrusted source and sanitize it. While unsanitized data input makes for funny webcomics, in the real world it could lead to massive data loss and service damage – and all it takes to pull this off is enough poison in the data set.
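
One simple way to treat a response as untrusted is to validate it against a strict schema before anything downstream touches it. The expected response shape here is an assumption for illustration:

```python
# Sketch of validating an LLM response as untrusted input.
# The expected fields are assumptions for illustration.
import json

ALLOWED_SEVERITIES = {"info", "warning", "error"}

def parse_llm_finding(raw: str) -> dict:
    """Accept only well-formed findings; reject everything else."""
    finding = json.loads(raw)  # raises ValueError on malformed output
    if set(finding) != {"severity", "message"}:
        raise ValueError("unexpected fields in LLM response")
    if finding["severity"] not in ALLOWED_SEVERITIES:
        raise ValueError("unexpected severity value")
    return finding

print(parse_llm_finding('{"severity": "warning", "message": "unused import"}'))
```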

Ensure Accuracy through Auditing

LLMs, and the AI APIs that result from them, suffer from hallucinations, but this does not make them useless. It makes them fallible – and just as you would with any fallible content, you must audit and verify accuracy. AI API adoption must come with its own auditing and accuracy workflows to ensure that whatever is generated is accurate, function-appropriate, and built for purpose.
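
For generated code, even a trivial automated gate helps. The sketch below only checks that Python output parses at all; a real auditing workflow would layer linting, tests, and human review on top of it:

```python
# Minimal accuracy gate for LLM-generated Python code: reject output
# that is not even syntactically valid before any human review.
import ast

def passes_basic_audit(code: str) -> bool:
    """Return True only if the generated code parses as Python."""
    try:
        ast.parse(code)
        return True
    except SyntaxError:
        return False

assert passes_basic_audit("def add(a, b):\n    return a + b")
assert not passes_basic_audit("def add(a, b) return a + b")  # missing colon
```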

Related to this topic is the concept of visibility. Because AI APIs often push their output directly to the endpoint, it can be easy to lose track of what is internally generated content and what is externally generated. Accordingly, you need to track all of these generative systems throughout the service by ensuring high visibility. This requires a significant level of maturity across the system, as any gap in visibility will make it easy to lose track of this content at scale.

As a part of this approach, you need to have adequate logging and monitoring on everything that is happening throughout your system. While this is generally good advice regardless, many adopters of AI technology are tempted to simply pipe the output of these LLMs into the service directly. This is highly dangerous for many reasons, but the biggest issue is that it often sidesteps logging entirely. Treat these connections for what they are – external APIs that need to be logged and monitored.
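
A thin logging wrapper around the external call is often enough to start. In this sketch, call_llm is a stand-in for whatever client you actually use, and only metadata is recorded so that sensitive prompt content stays out of the logs:

```python
# Sketch of wrapping an external LLM call with audit logging.
# call_llm is a placeholder for your actual client function.
import json
import logging
import time

logger = logging.getLogger("llm_gateway")

def logged_llm_call(call_llm, prompt: str) -> str:
    """Invoke the LLM client and record metadata about the exchange."""
    started = time.time()
    output = call_llm(prompt)
    logger.info(json.dumps({
        "event": "llm_call",
        "prompt_chars": len(prompt),  # log sizes, not raw content
        "output_chars": len(output),
        "latency_ms": round((time.time() - started) * 1000),
    }))
    return output
```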

Prioritizing API Security in AI Development

Simply put, your posture is going to inform just how effective your AI API security is. You could make an incredibly secure integration with an AI API and LLM ecosystem, but if your service itself is insecure, it won’t matter.

Accordingly, you need to invest heavily in ensuring a proper security posture. This requires a lot of pieces to come together – proper authentication, authorization, a zero-trust approach, threat modeling, and more – but getting it right pays for itself many times over.
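
As one small piece of that posture, every inbound request to an AI-backed endpoint should be authenticated explicitly. The sketch below shows a minimal bearer-token check; a production system would layer a secrets manager and full authorization logic on top:

```python
# Minimal "authenticate every request" sketch for an AI-backed endpoint.
# Key storage is simplified here; use a secrets manager in production.
import hmac
import os

def authorize(request_headers: dict) -> bool:
    """Authenticate every call - no request is implicitly trusted."""
    presented = request_headers.get("Authorization", "").removeprefix("Bearer ")
    expected = os.environ.get("SERVICE_API_KEY", "")
    # Constant-time comparison to avoid timing side channels.
    return bool(expected) and hmac.compare_digest(
        presented.encode(), expected.encode()
    )
```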

Adopt a position of prioritizing API security. Security-first as a term has become popular for this concept, but at the end of the day, whatever you call it, you must resolve every issue down to this – “step one – how do we do this securely?”

Adopt a Trusted Partner – How FireTail Can Help

API Security is not going away, and with the development of AI APIs, security is only going to become more complex. Thankfully, FireTail can partner with you to deliver incredible security with very low friction.

FireTail is a leading security solution for APIs and the businesses that use them. By leveraging FireTail’s proven platform, you can integrate external services, internal systems, and more, unlocking huge business potential while delivering effective, secure, and safe functionality for your user base.

FireTail offers the most complete set of features for modern organizations seeking true end-to-end API security. The Security Posture Management offering allows organizations to gain full visibility into their service ecosystem, discovering potential vulnerabilities and addressing them quickly and securely. When paired with FireTail’s world-class Audit Trail system, which unlocks comprehensive auditing across the service, organizations can gain a full view that is not provided anywhere else. Finally, API Alerting and Monitoring allows you to effectively track your service, proactively engaging with threats and detecting potential attacks before they become a problem.

API Security is critical, and it requires a trusted partner – check out FireTail today and see just what it can do for you!