Part 2: AI, Architecture, and Ontologies

A principles-based approach

What’s facing organizations today is a mountain range akin to the one they faced when computer use first became widespread: how to get all the data that was on paper into digital form so that it could be consumed. This was a massive undertaking, one that consumed millions of dollars in rote, error-prone data entry and that is still not complete today. This should sound familiar. 

I’d argue, however, that the current mountain range is possibly harder to traverse than even the previous one. This time, instead of interns facing endless rooms of filing cabinets, highly specialized data analysts are facing unknown terabytes of data to sort through, stored in many different places, none of which is as easy to find as a cardboard box hiding on the top shelf. 

Meme image showing a stick figure climbing a mountain with the text "Digitization" in the top half of the image, then showing in the bottom half that the same figure is actually just climbing a small part of a huge mountain with the text "AI Optimization" written on it.

To further complicate the matter, all of this data has been organized along different logical lines, as discussed last week in Part One of this series. 

To help the professionals in their mountain climbing, enter: Ontologies. Or Enterprise Architecture. Whichever one you prefer. They are different, which is the subject of many contentious debates, but the non-PhD-level bottom line is that both refer to the creation of a list of terms into which each piece of data your organization holds can be sorted. This overall sorting methodology is crucial to building useful AI workflows because it stitches together all the unique logic that underlies the older data sets. Seems reasonable! 

But, another problem: data very frequently “fit into more than one category, and belong to several groups at once. It [isn’t] an exact science.” (Packer, Neil. One of a Kind. 2020). This is called a many-to-many relational state. 

Meme image showing Willy Wonka drinking from a flower that's a tea cup, with the words "This is a flower which is also a tea cup. Materials usually limit application, but not here, in this case. Material is immaterial." written on it.

Let’s stop a moment and ground ourselves: many-to-many relational states might sound abstract, but I assure you, they are not. You exist in a many-to-many relational state every day: you might be a parent, a child, a sibling, a professional colleague, an athlete, and a hobbyist gardener all at the same time. And look at you! Stable as a helium atom. So, as you can see, relational states are all around us, and we navigate them just fine. 
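For the more code-minded, here is a minimal sketch of what a many-to-many relationship looks like as data. The names and roles are made up for illustration, and a real system would more likely use a database join table than two Python dictionaries. The point is only that one person maps to many roles and one role maps to many people, with no role ranked above another.

```python
from collections import defaultdict

# Hypothetical example data: each (person, role) pair is one membership.
memberships = [
    ("Ana", "parent"),
    ("Ana", "sibling"),
    ("Ana", "gardener"),
    ("Ben", "sibling"),
    ("Ben", "athlete"),
]

# Index the same pairs in both directions so either side can be looked up.
roles_by_person = defaultdict(set)
people_by_role = defaultdict(set)

for person, role in memberships:
    roles_by_person[person].add(role)
    people_by_role[role].add(person)

# One person, many roles.
print(sorted(roles_by_person["Ana"]))    # ['gardener', 'parent', 'sibling']
# One role, many people.
print(sorted(people_by_role["sibling"]))  # ['Ana', 'Ben']
```

Sets are used rather than a single "primary role" field precisely because none of these memberships cancels out or outranks another.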

Because we navigate many-to-many relational states okay, it doesn’t seem like it should be much of a problem for an AI, either, and it’s not, under the right circumstances and given the right direction. The problem is sorting the data into this state. 

The impulse is to try to create lists that are true, but that presupposes that the data and their placement fit neatly into true and false states with regard to the taxonomic terms. Brain bend incoming: frequently, the data do not fit neatly into true and false states, even in a many-to-many organizational framework. You, after all, do not cease to be a sibling when you begin to be a professional colleague, nor is either state better or worse or of higher or lower value than the other. So instead of conceptual neatness, there are multiple trues and multiple falses in any data set, many of which overlap each other simultaneously, depending on how you’re approaching the data at the moment, and many of these are not of higher or lower value than the others.* 

The taxonomy, therefore, cannot be arranged neatly along data that is true in one place and false in another, nor hierarchically along axes of better or worse, higher value and lower value.  

For this reason, organizations and data professionals must relinquish the quest for truth in data organization and embrace a principles-based approach. As a start, any data taxonomy must:

  1. Be human-centered, that is, designed for the humans using the AI workflow. This means creating a taxonomy that’s arranged in the way that makes the most sense to the most people who use it. 

  2. Be short. A taxonomy of 10,000 terms isn’t going to be much use, if only because you lose the efficiency of having an AI do some of the work. 

  3. Be constantly worked on. Upkeep and evolution of these taxonomies need to be considered a required cost for all organizations running AI workflows and RAG models on their data. The boffins running the taxonomy will need to constantly tend to it to ensure it still makes sense and is evolving in a way that reflects the business, technological, and organizational logic prevalent at the time. 
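The second and third principles lend themselves to a simple automated check; the first one does not, since only the people using the taxonomy can say whether it makes sense to them. Below is a rough sketch of such a check. The term cap, the review interval, and the `Term` and `audit` names are all assumptions made up for this example, not recommended values or an established tool.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Assumed thresholds, purely illustrative; every organization would set its own.
MAX_TERMS = 500
REVIEW_INTERVAL = timedelta(days=180)


@dataclass
class Term:
    name: str
    last_reviewed: date


def audit(terms, today):
    """Return human-readable warnings about taxonomy size and stale terms."""
    warnings = []
    if len(terms) > MAX_TERMS:
        warnings.append(f"{len(terms)} terms exceeds the cap of {MAX_TERMS}")
    for term in terms:
        if today - term.last_reviewed > REVIEW_INTERVAL:
            warnings.append(f"'{term.name}' is overdue for review")
    return warnings


# Hypothetical taxonomy with one stale term and one recently reviewed term.
terms = [
    Term("invoice", date(2024, 1, 10)),
    Term("contract", date(2024, 11, 2)),
]
for warning in audit(terms, today=date(2025, 1, 1)):
    print(warning)  # 'invoice' is overdue for review
```

A check like this only flags terms for a human to look at; deciding whether a term still reflects the business logic of the day remains a judgment call.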

Of course, many organizations will simply adopt AI in the same way they made the leap to digital: by taking what they can when they have to and not really worrying about the rest. But in a world where a few glimmers of clarity can mean beating your rivals to market or gaining that market edge, the smart companies will not only invest in the fastest tech—they’ll throw money and intelligence at the tech’s tuning to make sure it keeps performing. 

*For more on this concept, see Edward de Bono’s Parallel Thinking (Vermillion, 1994). His position on the arbitrary nature of truth goes a bit far for me, insofar as I don’t think he adequately bounds his statements at the point of physical reality, even though physics isn’t perfectly rational, but that’s just a quibble. Overall, the book is super amazing and useful when thinking about how to organize concepts in modern business and culture.