Taxonomies in Content Graphs - Ian Piper - Taxonomy Talk
Ian Piper demonstrates building semantic content graphs using SKOS taxonomies and APIs, highlighting corpus analysis and human-centered tagging.
Taxonomies in Content Graphs: The Importance of Taxonomies in Building Graphs
Resource information
Presenter: Ian Piper
Event: Taxonomy Talk
Date recorded: April 25, 2025
Duration: 53:08
Format: Presentation
Language: English
Original audio: English
Subtitles: English
Transcript: Available (English)
Tags: structured content, information models, entity extraction, SKOS taxonomies, content graphs
Summary
Ian Piper demonstrates how to build content graphs by combining structured content with SKOS taxonomies, using APIs and semantic technologies to create rich information networks. He walks through the complete process from content modeling and taxonomy development through corpus analysis and entity extraction, showing practical examples using content management systems and tools like Drupal, PoolParty, and GraphDB. This presentation is essential for information architects and content strategists who want to move beyond simple tagging to create sophisticated, queryable knowledge systems that reveal deep connections between content and concepts.
Key Takeaways
Break content into atomic structural components - Don't treat content as simple text blobs; decompose articles into granular elements like titles, authors, abstracts, and classifications to enable richer graph connections and better semantic linking.
Use corpus analysis to tailor taxonomies to your content - Start with a basic taxonomy, then run corpus analysis against your actual content to identify gaps, find unused concepts, and discover new terms that should become taxonomy concepts through an iterative refinement process.
APIs are essential for content graph architecture - Every component (content management system, taxonomy tool, triple store) must have robust APIs to enable the middleware connections that make content graphs functional and maintainable.
Human judgment beats auto-tagging - While entity extraction can suggest taxonomy matches for content, human experts should make final tagging decisions because they understand context and aboutness better than automated systems, avoiding false positives.
Content graphs enable multi-dimensional exploration - Unlike traditional hierarchical navigation, RDF-based content graphs allow users to explore information by any relationship dimension (by author, by concept, by project) using query languages like SPARQL, revealing unexpected connections between content pieces.
Summary and key takeaways were generated using AI assistance and may not capture all nuances of the original presentation.
Video
Transcript
Transcript (English)
The transcript was generated and cleaned up for readability using AI.
00:00:10.000 --> 00:00:24.000 Bob: I don't know. I'll take your word for it. Okay, it'll be fine. All these things seem so spectacularly easy until like the last second when you're trying to implement them.
00:00:26.000 --> 00:00:40.000 Bob: So we're just going to give people another minute or two to filter in here from whatever their mornings or evenings are. How do I do that, Connor?
00:00:40.000 --> 00:00:47.000 Connor: You should be able to see it in the participants pane, I believe.
00:00:47.000 --> 00:01:03.000 Bob: Got it. Okay, there's Ian, and I can say make co-host. And I can say there's Connor Cantrell.
00:01:03.000 --> 00:01:06.000 Ian: I see that I'm a co-host now.
00:01:06.000 --> 00:01:11.000 Bob: All right. The system works.
00:01:11.000 --> 00:01:28.000 Bob: Let's make sure and thank Grace. Grace is doing tech support from her car in the middle of a national park somewhere in the western United States.
00:01:28.000 --> 00:01:52.000 Bob: Okay. I think people are starting to filter in here. So let me just do some quick introductions and housekeeping. Thanks for joining us this morning, afternoon or evening, wherever you are in the world. My name is Bob Kissenchak, and I'm hosting and moderating this Taxonomy Talk event today. Thanks for coming and thanks. Hopefully you all heard about it through Taxonomy Talk.
00:01:52.000 --> 00:02:31.000 Bob: Our speaker today is Ian Piper, who kindly volunteered to give this talk today. Ian is the director of Telluride Information Services. He specializes in designing and building information models, ontologies, and taxonomies for content classification.
00:02:49.000 --> 00:02:54.000 Ian: Bob, you've frozen.
00:02:54.000 --> 00:03:09.000 Connor: No, I think it's on Bob's end. Ian, we could give him a minute to see if he can rejoin, but unfortunately I don't have access to your intro information. Maybe you'd like to finish your own introduction?
00:03:09.000 --> 00:03:48.000 Ian: Of course. Yes, I know some of you from the names I see on the screen here. Probably very few of you know me. My name's Ian Piper. I run a small business in the UK building taxonomies and ontologies for a range of different client types. I'm not going to go into huge detail over that. If you're interested, get in touch with me and we can talk further. But I would suggest that we should perhaps just get on with the talk. Is that okay?
00:03:56.000 --> 00:04:22.000 Ian: I'm going to run this presentation as a keynote slideshow for the most part, but I'll be stepping out a couple of times to show you applications that I'm using for content graph purposes as illustrations of what I'm going to talk about. For the most part, it's a slideshow, I'm afraid.
00:04:32.000 --> 00:04:54.000 Ian: So I'm going to share my screen. Despite appearances, I'm really not that technologically savvy. I'm hoping that you can see a screen with the title "Taxonomies and Content Graphs." Could somebody just confirm that you can see that?
00:05:08.000 --> 00:05:55.000 Ian: Great. So that's the title of the talk and what I'm going to be talking about today is content graphs, which I'll explain in detail - my own perspective on content graphs, which may differ from those of others in the meeting. So I'm going to give an overview of content graphs as I understand them. I'm going to be talking about how you go about designing content with taxonomies in mind. And then I'm going to talk about building a taxonomy that is tailored for content. And then finally, using all of that and a tool called entity extraction to actually build a content graph.
00:05:55.000 --> 00:06:25.000 Ian: So essential components of a content graph. Well, the core of it is content with a well-defined structure. And I'll talk a little about what I mean by that here. But essentially, I mean anything but a simple blob of text that contains all of the information for a content article or whatever. I mean split down into logical structural components.
00:06:25.000 --> 00:06:51.000 Ian: Second component is a taxonomy of concepts. And when I say taxonomy, I'm almost always talking about a SKOS taxonomy. And I'll describe that in a little more detail later. But a key point, if we're talking about content graphs, is that the concepts, the taxonomy of concepts, should be tailored to the knowledge domain of the content that you're working in.
00:06:51.000 --> 00:07:24.000 Ian: You need a process for tagging your content objects with taxonomy concepts. Something that surprises me is how often this is overlooked: organizations will create a taxonomy and will have content, but really don't spend much time linking the two together. Or when they do, they do it in a very simplistic way. I don't think we should do it in such a simplistic way, and I'm going to talk about that today.
00:07:24.000 --> 00:07:39.000 Ian: When you tag content with concepts, you need a place to store that link - what I call a tagging fact. You need to store that information, and that's another thing that we're going to talk about later.
00:07:39.000 --> 00:08:12.000 Ian: And a really important component for me is that you need an application programming interface for each of these components, simply so that they can talk to each other. In the old days, APIs just didn't exist. Nowadays, with many of the systems that I work with, an API is one of the first things I look for, so that I know that when I build an external application or work with another application, it knows how to talk to it.
00:08:12.000 --> 00:08:50.000 Ian: So anyway, here's a schematic of what I'm talking about when I talk about a content graph. It's a collection of information objects, each of which has its own structure and each of which can be linked to other information objects in a network. So this might include text objects which have an internal structure in a content management system. Might include things like authors. Those might be in the same content management system, of course, but it makes sense to store them separately, as we'll discuss a little later.
00:08:50.000 --> 00:09:18.000 Ian: Might be organizations. These are all valid content as far as I'm concerned. And it might be projects. I've worked with all four of those components in some of my client projects. But tying them all together, gluing them all together, are concepts - taxonomy concepts.
00:09:18.000 --> 00:10:07.000 Ian: A content graph can be made up of a number of different components and I've got a schematic here illustrating that idea. We've got on the left content objects and those could be your narrative text content. They could be things like author information or any of the other things - projects, organizations, other such things that I've mentioned already. They'll contain taxonomy concepts from a SKOS taxonomy, in order to give you all of the richness that you need to have a really capable graph.
00:10:07.000 --> 00:10:39.000 Ian: You need a triple store, which is where when you've created your tagging facts, you need a place to store them and that triple store is where you would store this information. But how that's all pulled together is bespoke middleware. And I say bespoke because I'm unaware of any commercial tools that do this kind of thing out of the box. And that's why I believe that APIs are so important in gluing everything together.
00:10:39.000 --> 00:11:08.000 Ian: Each of these tools that I've got on screen here - my content management system in the example I'm going to show you is Drupal, and that has a number of APIs actually. The taxonomy is provided by PoolParty in this particular case, and that has a RESTful API as well. And then the triple store comes from GraphDB. GraphDB has a number of APIs, and I've used these in my work.
00:11:08.000 --> 00:11:54.000 Ian: When we're talking about content type, which is the first component that I want to talk about - I apologize in advance if what I'm saying seems impossibly trivial, but you know, sometimes this is overlooked - if you are building content in a content management system, structural components in that content are really important. When you are looking to build content into a content management system, particularly as objects, you need to be looking at things like the structural components that you can break that content down into: titles, authors, teasers, full text, any existing classification and so on.
00:11:54.000 --> 00:12:14.000 Ian: Once you've inspected that, that tells you the kind of granular atomic components in your content object that you want to take hold of. And each of those, incidentally, could be subject to classification against a taxonomy.
00:12:14.000 --> 00:12:36.000 Ian: Then you create within your content system content types. And as I'll explain in a moment, alongside a content type, you create an information model that gives you an RDF representation of that content type.
00:12:36.000 --> 00:12:53.000 Ian: With all of the components of a content graph that I'm going to talk about today, URIs are really important. Unique identifiers are really important and they should be URIs because all of this stuff uses the web one way or another.
00:12:53.000 --> 00:13:23.000 Ian: When you're building a content type, it's worth looking at how you can aggregate the individual items in a content type together to make larger content types or collections of content. Since I use Drupal, these things are called views in Drupal - the way that you collect together pieces of content in order to create, say, a report or to export an entire set of content using an API.
00:13:23.000 --> 00:13:49.000 Ian: I mentioned API and I'm going to mention it several times more. It's important to look for the opportunities to use an API as you're developing your content, both for importing content into your content management system and exporting it to other systems. And I always tell clients to check for an API when you select a content management system.
00:13:49.000 --> 00:14:21.000 Ian: Now, in this particular example that I'm showing today, I use as my source of content a really useful academic content source and this is called arXiv. And it is produced by a US university, and I'm struggling to remember which one it is at the moment, but I'll remember it at the end, I'm sure.
00:14:21.000 --> 00:14:42.000 Ian: We're looking here at a piece of content that I used to extract and bring into a content management system. I want to show how when you look at a piece of content like this, you can break it down into its different parts. So this has a title.
00:14:42.000 --> 00:15:07.000 Ian: It has a set of authors, about which I'm going to talk a little more later. It has a teaser or an abstract or a summary. It gives access to the full text of the article. And it contains tags or subjects or classification items.
00:15:07.000 --> 00:15:27.000 Ian: And so, if you can get that kind of breakdown of content - particularly if you can fire an API request at it and get this back as atomic content - then part of your job's already done.
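arXiv does expose its metadata through an API (an Atom feed from its export endpoint), which is exactly the "fire an API request at it" step Ian describes. Here is a minimal sketch that parses a trimmed, made-up Atom entry into the atomic components listed above (title, authors, abstract, subject classifications); the sample entry and its values are illustrative, not a real arXiv record:

```python
import xml.etree.ElementTree as ET

# A trimmed, hypothetical arXiv-style Atom entry. Real responses come back
# from the export API (http://export.arxiv.org/api/query?...).
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>http://arxiv.org/abs/0000.00000v1</id>
    <title>Online Learning in Discrete Hidden Markov Models</title>
    <summary>We study online learning...</summary>
    <author><name>A. Author</name></author>
    <author><name>B. Author</name></author>
    <category term="cs.LG"/>
    <category term="stat.ML"/>
  </entry>
</feed>"""

ATOM = "{http://www.w3.org/2005/Atom}"

def atomize(feed_xml):
    """Break each feed entry down into the atomic components of a
    content object: title, authors, abstract/teaser, and subjects."""
    items = []
    for entry in ET.fromstring(feed_xml).iter(ATOM + "entry"):
        items.append({
            "id": entry.findtext(ATOM + "id"),
            "title": entry.findtext(ATOM + "title"),
            "abstract": entry.findtext(ATOM + "summary").strip(),
            "authors": [a.findtext(ATOM + "name")
                        for a in entry.iter(ATOM + "author")],
            "subjects": [c.get("term")
                         for c in entry.iter(ATOM + "category")],
        })
    return items
```

Each dictionary returned here maps directly onto the fields of a content type in the CMS, which is what makes the API route so much cleaner than scraping the web page.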
00:15:27.000 --> 00:15:57.000 Ian: Having broken down visually the content that you've seen in the web page, building a content type allows you to determine the name, the format, the cardinality and so on of the particular item of content that you're dealing with - so an abstract or an ID or an author or whatever, categories and subcategories.
00:15:57.000 --> 00:16:26.000 Ian: And as I said earlier, alongside the content type, you should really be thinking about an information model, an RDF model that mimics that, that parallels it. And the reason for that is that later on we're going to be using taxonomy concepts to link to content items. And the kind of semantic glue that we're going to use to link them together is created for us by the information model that we're creating here.
00:16:26.000 --> 00:16:59.000 Ian: So you can see in the screenshot that this content graph model has content graph items, content graph persons, content graph projects as classes, and it has a set of class relations that can be used to glue these together. So you can see "author of," for example, and "has author," which are inverse relations that link together a content item and an author.
00:16:59.000 --> 00:17:47.000 Ian: I want to talk a little bit more about authors because in this particular case - it's not always the case - but in this particular case, authors presented a little bit of a problem, because the author field where I'd like to find an author actually contains a comma-delimited list of all of the authors. And that means you have to do some extra work to parse the data out. So in this particular example, a human will look at this and see two authors: Jalil Chafai and Didier Concordé. They'll also see the stuff in brackets after each author name, and you might infer that that's an institutional affiliation. It is, in fact, the University of Toulouse.
00:17:47.000 --> 00:18:09.000 Ian: Ideally, this is exactly the kind of information that you'd want to be parsing out into smaller blocks. So when you're doing your content analysis, you might be looking at authors and trying to create individual authors with the various components of their names broken down. If you're in an organization and you're building your own content graph, you've got much more control over this, of course.
00:18:09.000 --> 00:18:41.000 Ian: But just to spell it out, a name could be any of these. These are seven different renditions of my name - by no means all that have ever been used. And a human can look at these and say, this is probably the same person. But a simple program won't be able to do that.
00:18:41.000 --> 00:19:12.000 Ian: And I'm using this as an argument for saying whatever information type you're inspecting when you're building a content type or an information model, break it down as far as you can into logical units. So first name, last name, middle name, initial, title, alternative title, institutional affiliation and so on. Now, that's all I'm going to say about content for the moment.
00:19:12.000 --> 00:19:30.000 Ian: Time to move on to taxonomies, which is what a lot of us are interested in in any case. When I think of taxonomies or structured taxonomies in particular, I think SKOS. It's the kind of de facto standard for building taxonomies, at least in my work.
00:19:30.000 --> 00:19:56.000 Ian: And SKOS, which stands for Simple Knowledge Organization System, is itself based on RDF - the broader RDF ontology. It's an example of RDF. And when you create a taxonomy, you're creating instance information based on the SKOS ontology.
00:19:56.000 --> 00:20:25.000 Ian: A SKOS taxonomy will contain URIs as immutable unique identifiers. And that's really important because it's the handle by which you get hold of a taxonomy concept in the future. And it's how you build the content graph. Being an RDF graph, it links URIs together. My process for doing this kind of work is to build a basic taxonomy by fair means or foul as a starting point.
00:20:25.000 --> 00:20:56.000 Ian: Then to elaborate that taxonomy using one of a variety of tools, and I tend to use corpus analysis, which I'll show you in a moment. Within your taxonomy, look for opportunities to link concepts together. This is important because later on when you've got a content graph, you may want to inspect the concepts in a content graph and find out what other concepts they might be related to.
00:20:56.000 --> 00:21:40.000 Ian: And think about how this is going to be consumed by an external application. As I said a little while ago, that point's so often overlooked: you build a taxonomy, but you don't really explore how it's going to get used. For example, one of my clients recently built a very complex SKOS-based taxonomy which could have been used to create a very sophisticated, rich, graph-based information resource. But then they only ever used the preferred labels from that taxonomy, which they exported, put into Excel, and then used to create a keyword list. Feels like a bit of a waste of an opportunity to me.
00:21:40.000 --> 00:22:19.000 Ian: So building a basic taxonomy. The starting point is to look at the knowledge domain of the content that you want to deal with. If you're building a general purpose taxonomy out in the wide world, of course, you want it to be general purpose. But within an organization, you have the luxury of being able to say you can put corners around your content and say, I want my taxonomy to service this content.
00:22:19.000 --> 00:22:45.000 Ian: When you're looking at your content, think about what the aboutness of it is. Now, at this point, I should say the word "aboutness" I got from one of the people on this call, Yona Levinson. I learned that from her many years ago, and I'm so glad she taught me that word because I use it all the time now. It's a nice kind of summary word that says, what is this content about?
00:22:45.000 --> 00:23:34.000 Ian: I'd look for opportunities for using existing keywords. And I've shown you in that screenshot of the arXiv content earlier that they use a subject classification. And that's great. That's a good starting point for building your basic taxonomy. And I would also look for search logs - what are your users looking for? What are the things that are in their minds when they come to your site or come to you or use your system? Subject matter experts are really important. They're often the best judge of aboutness of content. And there are some practical tools like card sorting that you can use to build this basic taxonomy.
00:23:34.000 --> 00:24:01.000 Ian: What I've got here is a basic taxonomy that I derived from the subject classification that arXiv uses. It's a two-level, fairly straightforward taxonomy. And so it's a subject categorization really. So that would be the starting point for elaborating and building a richer taxonomy using corpus analysis.
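A two-level subject taxonomy like the one derived from the arXiv classification can be expressed in SKOS with little more than `skos:prefLabel` and `skos:broader`. Here is a minimal sketch that mints a URI for each concept and emits N-Triples; the base URI and the category labels are placeholders for illustration, not Ian's actual scheme:

```python
import uuid

SKOS = "http://www.w3.org/2004/02/skos/core#"
BASE = "http://example.org/taxonomy/"   # hypothetical base URI

def skos_triples(scheme):
    """Emit N-Triples for a two-level {category: [subjects]} taxonomy,
    minting a stable URI for every concept."""
    lines = []

    def concept(label, broader=None):
        # uuid5 gives the same URI for the same label on every run,
        # which keeps the identifiers immutable across rebuilds.
        uri = BASE + uuid.uuid5(uuid.NAMESPACE_URL, BASE + label).hex
        lines.append(f"<{uri}> <{SKOS}prefLabel> \"{label}\"@en .")
        if broader:
            lines.append(f"<{uri}> <{SKOS}broader> <{broader}> .")
        return uri

    for category, subjects in scheme.items():
        top = concept(category)
        for s in subjects:
            concept(s, broader=top)
    return lines

triples = skos_triples({"Astrophysics": ["Comets", "Asteroids", "Meteorites"]})
```

The deterministic URIs matter because, as the talk stresses, the URI is the handle by which everything else in the graph gets hold of a concept later.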
00:24:01.000 --> 00:24:44.000 Ian: So what is corpus analysis? Corpus analysis is a process that enables you to enrich an existing taxonomy and tailor it for a body of content that you know well. So you start with your basic taxonomy, you identify a body of content which is a corpus that is suited to the knowledge domain you want to work in. And then you run the analysis. I'll show you this in a moment. I've got a demo that I can show you. But the purpose of this is to find matches between the content and the concepts and to look for gaps that you can then fill.
00:24:44.000 --> 00:25:21.000 Ian: So at this point I'm going to step out of the presentation and into a corpus analysis. And you're going to have to bear with me for a moment because I'm afraid I've been timed out and I have to log back in again. [Demo time] So here I am in my taxonomy and I've created a corpus. The corpus contains about 1200 documents. And that's one of the things that inevitably you need to do when you're doing corpus analysis: the more documents you can bring to bear, the better, because it gives you a wider scope of relevant content.
00:25:40.000 --> 00:26:18.000 Ian: I won't go through the actual process because it sometimes takes hours to run. But this is the result that you get back. And it tells you some really useful things. First of all, there are around 400 or so concepts in my taxonomy, and you'll see that 392 of those have been found. And that means that of the 400-plus taxonomy concepts, this corpus analysis has found most of them across the content. And that means it's relevant content. But there are some other things we can look for here.
00:26:18.000 --> 00:27:06.000 Ian: We can look at the extracted concepts. These are the concepts that have been identified as part of the analysis. And you can look at a variety of things. One of these is concepts that have been found in the corpus, which is illustrated here. And those are cases where there's a concept in the taxonomy and there's a piece of content, one or more pieces of content that match, and you see a frequency of occurrences as part of this table of information. And this is useful to kind of give you confidence that the taxonomy that you've got matches well to the content.
00:27:06.000 --> 00:27:52.000 Ian: You can also look for concepts not found in the corpus. And this is also really useful. Because what it's saying is basically you've got a taxonomy concept but you've got no matching content in your corpus content. And that could be for a variety of reasons. It might be that you haven't got a sufficiently broad corpus. It might be the taxonomy is wrong. Or it might be that you've got things like portmanteau concepts, which we've got here. You can see I'm highlighting "comets, asteroids and meteorites" as one of the concepts. Now that's a concept in the taxonomy, but there's no matching content.
00:27:52.000 --> 00:28:18.000 Ian: And that's simply because that string doesn't occur in any of the text-based content within the whole of the corpus. In circumstances like that, I would be tempted to break that portmanteau concept down into three concepts: comets, asteroids, and meteorites. And I would predict that if I did that and then re-ran the corpus analysis, I'd get a higher hit rate.
00:28:18.000 --> 00:28:54.000 Ian: And that's a key point: this is iterative. You start off with a very simple taxonomy and then you use the corpus analysis to enrich it, to bring in a greater number of candidate concepts. And that leads me to the last bit of information you have in a corpus analysis, and that is extracted terms. Extracted terms are terms that appear frequently in your corpus
00:28:54.000 --> 00:29:36.000 Ian: but that have no matching concepts in the taxonomy. For the most part, these are things that are never going to be in a taxonomy. For example, the word "almost." It may occur many times in your corpus content, but that doesn't mean it means anything. But some are worth promoting - at least in the initial iterations. By the way, I'm showing you something here that is the endpoint of about four iterations of corpus analysis. So by the time we get to this, there are really diminishing returns. A lot of the extracted terms that could ever have been promoted into concepts already have been.
00:29:36.000 --> 00:30:10.000 Ian: And that's really the point of the extracted terms view of a corpus analysis. It tells you things that could be concepts but haven't been identified yet as part of the taxonomy. So that's all I wanted to show you on that, I'm just going to go back into my presentation now. [End demo]
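The three views from the demo - concepts found in the corpus, concepts not found, and extracted terms with no matching concept - can be approximated with a toy word-counting version. Real corpus analysis tools use far more sophisticated linguistic matching than the substring counts assumed here, but the sketch shows why a portmanteau label can go unmatched while its parts are common:

```python
import re
from collections import Counter

def corpus_analysis(concept_labels, corpus, top_terms=5):
    """Toy corpus analysis: returns (found concepts with frequencies,
    concepts not found, frequent terms with no matching concept)."""
    text = " ".join(corpus).lower()
    # A concept is "found" only if its full label occurs literally,
    # which is exactly how portmanteau concepts end up with zero hits.
    found = {c: text.count(c.lower()) for c in concept_labels
             if c.lower() in text}
    not_found = [c for c in concept_labels if c not in found]
    # Candidate extracted terms: frequent words not already covered
    # by any concept label.
    words = Counter(re.findall(r"[a-z]{4,}", text))
    labelled = {w for c in concept_labels for w in c.lower().split()}
    extracted = [(w, n) for w, n in words.most_common()
                 if w not in labelled][:top_terms]
    return found, not_found, extracted
```

Running this repeatedly, promoting useful extracted terms into concepts each time, mirrors the iterative enrichment loop described above.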
00:30:10.000 --> 00:30:42.000 Ian: So this is a screenshot showing the stuff that I just showed you there. So to recap where we are at the moment. Can somebody just confirm, can you see the screen with the word "recap" at the top? [Confirmation from audience] Okay, good. Sorry, I'm a little bit paranoid about multi-screen presentations and just want to make sure I'm showing you the thing I hope I'm showing you.
00:30:42.000 --> 00:31:20.000 Ian: So to recap where we are, we've got a set of structured content in a content management system that has been created as a series of effectively content objects. And we've modeled that in RDF so that we can then have the tools to link it to taxonomy concepts. We've got those taxonomy concepts in a structured taxonomy. And having done corpus analysis, it's tailored to the content. So we've got the components of a graph now, and we can go ahead and build that graph.
00:31:20.000 --> 00:31:54.000 Ian: Now, sadly, there are no commercial offerings that I know of. I could be wrong, but I don't know of any commercial tools to build and navigate content graphs. And so, unfortunately, it's a DIY process. And I've got to confess at this point, I am not a software developer. So the middleware software I'm going to show you later on is a little bit clunky and amateur. But it works.
00:31:54.000 --> 00:32:24.000 Ian: To build your content graph, build it on systems that have well-characterized RESTful APIs so that you know how to talk to them and get reasonable answers back. You retrieve content objects from your content management system using an API, and that means you can get back the components of content aggregated however you need to into larger bodies.
00:32:24.000 --> 00:32:57.000 Ian: And then you run a process to match those content objects to taxonomy concepts out of your taxonomy management system. And those are things that I call tagging facts. You've got a piece of content, you've got a taxonomy concept, and you've got a match between them. And that match is powered by your information model. So you've got "content article has author so-and-so," "content article has subject taxonomy concept."
00:32:57.000 --> 00:33:14.000 Ian: Having created those matches, you then have to put them into a form that you can make use of. And you would do that by putting it into a triple store, which is the next bit that I'm going to talk about.
00:33:14.000 --> 00:33:42.000 Ian: Just to reiterate the slide I showed earlier. What we've got here is content objects that we extract into middleware. We've got taxonomy concepts that we extract and match to the content. That creates tagging triples, which we then store in a triple store. And the way I do that in this example is by using something called entity extraction.
00:33:42.000 --> 00:33:59.000 Ian: That's a feature of the taxonomy system that I'm using. And basically it's a way of creating a match between the taxonomy concepts and content. And if that sounds a little bit like corpus analysis, I think it rests on the same sort of technology.
00:33:59.000 --> 00:34:32.000 Ian: You analyze a piece of content, you hold it next to a taxonomy and you say, where are my matches? What taxonomy concepts are present in this body of content? The extractor, therefore, as a minimum needs to load your content system and it needs to load your taxonomy. The particular one that I use is in the PoolParty taxonomy system. And I'm not going to show you that for the simple reason that there isn't really a good user interface for it. It's really used through its API, which is how I've used it, and I'll show you that in a moment.
00:34:32.000 --> 00:35:17.000 Ian: So what you need to do in order to do this DIY development is to create an application that feeds in the taxonomy and feeds in the content, matches them up together, and then creates a sort of tagging triple, tagging fact. And by the way, I need to emphasize this last point - this is not auto-tagging. I don't believe in auto-tagging. It has too many false positives. The human being has the best insight and should have the final say.
00:35:17.000 --> 00:36:06.000 Ian: So this rather clunky screenshot shows the application that I developed. In the left panel is a set of content items. In the middle is the currently selected content item. It loads the full text from a PDF and then fires that content off to the extractor, and gets back a list, shown in the right-hand column, of matching concepts. So what this is doing is saying that this article here, "Online Learning in Discrete Hidden Markov Models," could be about these things, this list of things on the right. And you may be able to read them here: Bayesian decision-making, Markov model, neural networks, machine learning. Those are the items I've ticked
00:36:06.000 --> 00:36:36.000 Ian: out of a list of potential classifications for this content. You'll see there's a lot that I haven't ticked, and that's an illustration of my belief that the human being is best placed for making that decision, making that choice.
00:36:36.000 --> 00:37:05.000 Ian: What you get out of that is shown in the next screen, which - and I apologize, it's rather difficult to read - is a list of triples. And I've broken these triples out for you further down. I'll just explain what each of them is. I suppose I should mention that a triple is made up of three components: a subject, an object, and a predicate. The subject is the thing that's doing something, the object is the thing that's being acted on, and the predicate is what the action is.
00:37:05.000 --> 00:37:59.000 Ian: So we've got a subject, we've got an object, and we've got a predicate. And the subject in this case is a content graph item, and you can read through the URI here to see that there's a base URI for the model. And it's a class of content graph item, and there's a specific instance here, which is... it's called "name 1211" and that's the original arXiv ID. So what I've done is minted a URI based on the original arXiv ID so I can get back to that if I need to. At the other end, the object is a taxonomy, and in this case the taxonomy is called "Telluride content graph" and then the particular instance of a concept - apologies - has got this long UUID.
00:37:59.000 --> 00:38:47.000 Ian: So in effect, we're saying that this content item, as the subject of the triple, has a topic that is represented by this taxonomy concept. And emitting all of these triples is essentially what you're doing when you take a body of content and tag it. I'm sorry, we're going to have to go through these labels again. I told you I'm not particularly technically well-versed. So the last stage in building the content graph is to actually build it - to import these triples into a graph database.
00:38:47.000 --> 00:39:22.000 Ian: The format that I used in the last screen for that set of triples is something called N-Triples. That's quite useful if you need to stream out a whole range of semantic triples to use somewhere else. Now, the process I went through here is not what you'd do in a real-world situation: I exported a file and I imported a file. In the real world, you'd use an API to do this directly - and in fact, I have done that in GraphDB. But for illustration purposes today, I want to show you exporting a file and importing it into GraphDB.
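As a concrete illustration of the two steps Ian describes - minting a URI from the original arXiv ID and writing the tagging links out as N-Triples - here is a minimal Python sketch. The base URIs, the concept UUID, and the use of dcterms:subject as the tagging predicate are illustrative assumptions, not the actual identifiers from the demo:

```python
# Minimal sketch of emitting N-Triples for content-item -> concept links.
# All URIs below are illustrative placeholders, not the real demo URIs.

BASE = "http://example.org/contentgraph/"
TAXONOMY = "http://example.org/taxonomy/"
DCT_SUBJECT = "http://purl.org/dc/terms/subject"  # assumed tagging predicate

def mint_item_uri(arxiv_id: str) -> str:
    """Mint a URI for a content item from its original arXiv ID."""
    return f"{BASE}item/{arxiv_id}"

def ntriple(subject: str, predicate: str, obj: str) -> str:
    """Serialize one triple in N-Triples syntax: URIs in angle brackets,
    terminated by a full stop."""
    return f"<{subject}> <{predicate}> <{obj}> ."

item = mint_item_uri("1211.0001")
concept = f"{TAXONOMY}concept/0f8fad5b-d9cb-469f-a165-70867728950e"
print(ntriple(item, DCT_SUBJECT, concept))
```

N-Triples is deliberately simple - one statement per line, no prefixes - which is exactly why it's convenient for streaming a whole batch of statements into a triple store.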
00:39:50.000 --> 00:40:29.000 Ian: Once it's in GraphDB, this is what the collection of triples looks like. It's similar to what we had before. On the left-hand side, we have a subject, then we have a predicate, then we have an object. You may notice, looking at this, that there's a prefix used in the graph database - a shorthand for the full base URI for the content. And we can see a range of triples here. I'm just going to step out again and show you some stuff in GraphDB.
00:40:29.000 --> 00:41:26.000 Ian: [Demo attempt] So this is GraphDB and you can see this table here is the same thing I was just showing you in that slide. And so this is a set of triples, and the first few you can see are all framed around the same subject. So we're determining what type it is - it's a Telluride content graph item. We can see a label, we can see a title which happens to be the same as the label. We can see the author. And the author here is pointing to another object in the system, which is a Telluride content graph person, and that is the URI pointing to that person. And so on.
00:41:26.000 --> 00:42:28.000 Ian: So this is now your graph. This is a content graph. And while it may not look particularly more useful than what we've already looked at, it is. Because this is an RDF graph, we can explore it in a range of different ways. One of the nice things about RDF, and why I like using it, is that the nature of the triple store means you can explore it by any dimension of the triples. So you could, for example, say "Show me every RDF triple that has this as its subject," or "Show me every RDF triple that has a 'has author' predicate," and so on.
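The "explore by any dimension" idea can be sketched without any RDF tooling at all: a triple store is conceptually a set of (subject, predicate, object) statements, and a query is a pattern with wildcards. The data and prefixes below are made up for illustration:

```python
# A tiny in-memory "triple store": a list of (subject, predicate, object)
# tuples. All identifiers are invented for illustration.
triples = [
    ("item:1211", "rdf:type",      "tcg:ContentGraphItem"),
    ("item:1211", "tcg:hasAuthor", "person:42"),
    ("item:1211", "dct:subject",   "concept:markov-model"),
    ("item:1212", "tcg:hasAuthor", "person:42"),
]

def match(s=None, p=None, o=None):
    """Return every triple matching the pattern; None acts as a wildcard,
    so any of the three dimensions can be fixed or left open."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# "Show me every triple that has this as its subject"
print(match(s="item:1211"))
# "Show me every triple that has a 'has author' predicate"
print(match(p="tcg:hasAuthor"))
```

SPARQL queries against a real store like GraphDB work on the same principle: fixed terms and variables in a triple pattern, matched against every statement in the graph.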
00:42:28.000 --> 00:42:47.000 Ian: You can explore this using SPARQL. I'm no SPARQL expert, I have to say. I don't know why it's not showing me my update. Apologies for this. Maybe I'm not going to be able to show you that. Oh, this is interesting.
00:42:47.000 --> 00:43:29.000 Ian: Well, we've hit a bit of a technical glitch here. Okay, well, I guess I better step out at this point. [Technical difficulties] I'm sorry. It looks like I can't get into this data. It looks like I've got to do something with my license. I think I know why that is. I think that GraphDB has just been released in a new version and the old version is probably now not working. This was working two hours ago.
00:43:29.000 --> 00:43:40.000 Ian: Okay, I'm going to step out back into the presentation because I think we're not going to get anywhere with that.
00:43:40.000 --> 00:45:03.000 Ian: So I can't unfortunately show you a SPARQL query or the results of the query, but take it from me, you can use SPARQL or GraphQL or some other query language to explore your triple store. You can also do it visually, and it's a real shame that the demo I just tried didn't work, because the visual graph is a very good way of exploring a content graph. What you can see in this content graph here - the different colored blobs - the pinkish red ones represent our articles, or content graph items. The blue represent authors. The yellow represent taxonomy concepts. So you can see that this graph has got links between them: we've got a concept here that is linked to five different articles, we've got another concept here that is linked to two articles, and so on. And if I were able to show you the real thing, you could elaborate this out to show other connections and explore further in your graph.
00:45:03.000 --> 00:45:11.000 Ian: Okay, I'm coming to the end of my talk now.
00:45:11.000 --> 00:45:55.000 Ian: Hopefully, I've demonstrated that a content graph is a structured information source based on different components of content and taxonomies and semantic glue to hold it all together. That you can use SPARQL or GraphQL to extract information from different dimensions, which I haven't succeeded in doing. I'm sorry about that. You can create external programs that can talk using APIs to navigate your graph. And you can use the object and data properties in that graph to gain further information using something I haven't really covered at all today, which is inferencing.
00:45:55.000 --> 00:46:39.000 Ian: Everything I've shown you in the graph so far is asserted information. But I mentioned earlier that it's quite nice, when you're building a taxonomy, to have concept-to-concept links. Once you've put this into GraphDB, those links allow you to use some inferencing to ask: what other concepts are like this concept? What other concepts are used to tag this content? And then, looking at that content, you can explore what other concepts it uses, and so on.
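The concept-hopping Ian describes - from a concept to the content tagged with it, and on to that content's other concepts - is a short graph walk. A rough sketch over hypothetical tagging data (the article and concept names are invented, not taken from the demo):

```python
# Hypothetical tagging data: which concepts tag which articles.
tags = {
    "article:A": {"concept:markov-model", "concept:machine-learning"},
    "article:B": {"concept:machine-learning", "concept:neural-networks"},
    "article:C": {"concept:bayesian-decision-making"},
}

def related_concepts(concept: str) -> set:
    """Concepts that co-occur with `concept` on at least one article:
    find every article tagged with the concept, collect the other
    concepts those articles use."""
    related = set()
    for article, concepts in tags.items():
        if concept in concepts:
            related |= concepts
    related.discard(concept)
    return related

print(related_concepts("concept:machine-learning"))
```

A reasoner in a store like GraphDB can take this further, materializing links that were never asserted directly (e.g. via transitive broader relationships), but the traversal idea is the same.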
00:46:39.000 --> 00:47:31.000 Ian: You can also explore the use of machine learning, though I've got to put my hand up and say that I am no fan of AI. I'm an AI skeptic. While I believe in machine learning, and there are some pretty sound algorithms for machine learning, I am not a particular advocate of AI in content graphs. I may be proven wrong - wouldn't surprise me at all - and that's as much as I'm going to say. I've written a series of articles that go into a lot more detail on content graphs, and they're available at my website. There's a contact here for me if you want to get in touch. And I want to particularly thank GraphWise and Cornell University for making these particular resources available during the work that I've described here.
00:47:31.000 --> 00:47:47.000 Ian: And thanks for listening.
00:47:47.000 --> 00:48:27.000 Bob: Thanks, Ian. That was fantastic. I think I agree with almost every single thing you said, which is fantastic. That's not always the case. There are a couple of questions popping up in the chat. I'm going to invite other people to add things to the chat. We have 10 minutes or so to get through them. So I'm going to start here with the question from Temitayo, and excuse me if I didn't pronounce that quite right. He was wondering if he can use, this is back to when you were doing term extraction with corpus analysis. Can he use or have you ever used WordNet for corpus analysis? Can you talk a little bit about that?
00:48:27.000 --> 00:49:23.000 Ian: Yeah, well, I think you can. I haven't. So I think it could be an extremely good source for building taxonomies. WordNet, however, is general purpose, so you're going to get a general-purpose taxonomy out of it. It's probably worth re-emphasizing - I've found this with organizations - that if you try to apply a general-purpose taxonomy to a very narrowly focused knowledge domain, it's not always that successful. When you do this sort of corpus analysis step, you may get some false results back, because you're trying to apply really general-purpose terms to really narrowly specified content.
00:49:23.000 --> 00:49:40.000 Bob: I completely agree with that. And then Bonnie asks if you could talk about, and I think this is probably subjective at this point. What do you see as the difference between a concept graph and a knowledge graph?
00:49:40.000 --> 00:50:32.000 Ian: Yeah, well, it's a question of specificity, really. A content graph is an example of a knowledge graph. I use the term though to emphasize the importance of narrative content. A knowledge graph more generally simply is a way of linking stuff to other stuff, structured content or structured information to other structured information, and it's general purpose - could be anything. A content graph is kind of centering the domain in systems that use narrative content. And I would say more specifically that use structured taxonomies for tagging, because I think that's a crucial part of it.
00:50:32.000 --> 00:50:49.000 Bob: And then, and I think the question was actually, and maybe I read it poorly, between a concept graph and a knowledge graph, I suspect your answer is the same, is that it's a matter of specificity.
00:50:49.000 --> 00:51:00.000 Ian: Yeah, I'm sorry, I misheard you. I thought you said content graph.
00:51:00.000 --> 00:51:00.000 Bob: No, I might have just mispronounced it. So Michelle asks, where do ontologies fit in? And I think this is probably a complicated answer.
00:51:00.000 --> 00:53:44.000 Ian: Well, yeah, I mean, the information model that I showed partway through the talk - a very simple information model - was a custom ontology. But I'd like to just cite one of my current projects. I won't say who the client is, but the client wants an ontology that effectively acts as a template or a recipe for the particular type of information that they're working with. It doesn't have to have instances built in, but it can - there's no reason why you can't have an ontology that contains individuals. And one of the nice things about having individuals in your ontology is that you can use things like SHACL to test the validity of your data against your ontology. So in summary, I would say an ontology only needs to be a template or a recipe or a framework for a knowledge domain. It can optionally also contain individuals that are instances of that ontology. As an example, I mentioned SKOS taxonomies earlier on. The taxonomy that I built using PoolParty in this particular case is a set of instances that conform to the SKOS ontology. To conform to the SKOS ontology, you only need a small set of classes and a set of object and data properties. But when you build a taxonomy in PoolParty, as I do, you're creating a whole bunch of individuals that conform to that ontology. I'll just mention one other thing there, which is that quite often when I'm building a taxonomy like this, I find that SKOS isn't enough. And so I build a custom ontology that I can then apply to the taxonomy. And that's the beauty of RDF, of course: you can import other ontologies into the one that you're working with, so you can expand the capabilities of your taxonomy.
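Ian's point that a PoolParty-built taxonomy is a set of individuals conforming to the SKOS ontology can be sketched as plain triple data: each concept is typed as skos:Concept and carries labels and broader/narrower links. The URIs and labels below are placeholders, not the actual taxonomy from the talk:

```python
# A SKOS taxonomy is just instance data: each concept is an individual
# of class skos:Concept with labels and concept-to-concept links.
# Concept URIs and labels here are invented placeholders.

SKOS = "http://www.w3.org/2004/02/skos/core#"

store = [
    ("concept:ml", "rdf:type",          f"{SKOS}Concept"),
    ("concept:ml", f"{SKOS}prefLabel",  "machine learning"),
    ("concept:nn", "rdf:type",          f"{SKOS}Concept"),
    ("concept:nn", f"{SKOS}prefLabel",  "neural networks"),
    ("concept:nn", f"{SKOS}broader",    "concept:ml"),  # concept-to-concept link
]

def concepts(triples):
    """All individuals typed as skos:Concept in the triple data."""
    return {s for s, p, o in triples
            if p == "rdf:type" and o == f"{SKOS}Concept"}

print(concepts(store))
```

A custom ontology extending SKOS would simply add further classes and properties to this namespace mix - the instance data stays in the same triple form.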
00:53:44.000 --> 00:54:28.000 Bob: Yeah, I think the sort of mix and match of things that you grab from various namespaces is an underrated feature of ontologies. I guess I would also maybe say that from a slightly different point of view, if you turn your head and squint, knowledge graph is just like a less scary word for ontology, depending on how you're thinking about the parts and the wholes and how they all fit together. Anyone else, while we have Ian here for about five more minutes, have any other questions that they want to throw in the chat? Andy Fitzgerald says a knowledge graph is an implied ontology, which I think is a nice way of thinking about it.
00:54:28.000 --> 00:54:34.000 Ian: Yeah. Agree.
00:54:34.000 --> 00:54:45.000 Bob: Ian and everyone. Sorry. I have a question from Heather. Is there a text that you would recommend for someone getting started in taxonomies?
00:54:45.000 --> 00:55:35.000 Ian: Oh, well, one of my colleagues, Heather Hedden, has written a book called "The Accidental Taxonomist," which is pretty good. There's also a book by the Semantic Web Company called "The Knowledge Graph Cookbook," I think it's called, which is pretty good, and that touches on ontologies and taxonomies. And I mean, if you're really struggling, go to my website, because I've got 30 or 40 articles on taxonomies and ontologies and the like there, which I hope are reasonably digestible.
00:55:35.000 --> 00:56:58.000 Bob: And I'll use this opportunity, Heather Epstein-Diaz, if you're not already, I'm going to throw in the chat here. Probably most of you came here through the Taxonomy Talk Discord server. If you're not already in the Discord server, it's a free community for taxonomy and other semantic practitioners. I've just thrown a link in the chat. Also, we are almost constantly soliciting for other talks for this series. If you're interested in giving a talk next quarter in this very forum, the Taxonomy Talk, if you will. I have a link for it, but I lost the handle on it this morning in all the other confusion. Please get in touch with me or Grace through the Discord, but we'll be putting out a call for other talks and announcements. Oh, thank you, Connor. Connor has dropped the proposal talk link in the Discord chat here. Again, I want to thank Ian Piper and everyone for joining us and bearing with us through the minor technical difficulties along several axes that we had. This was great. Really enjoyed the talk. And yeah, I hope everyone has a great rest of the day.
00:56:58.000 --> 00:57:15.000 Bob: The recording from the talk once processed will be distributed to people who signed up for the talk. Thanks again, Ian. Thanks, everyone.
00:57:15.000 --> 00:57:15.000 Ian: No problem. Have a good day.
Transcript highlights
On the overlooked connection between taxonomies and content:
"There's something that surprises me - it's so often overlooked that people will create, organizations will create a taxonomy and will have content, but really don't spend much time linking them together. Or when they do, they do it in a very simplistic way." (07:00 - 07:16)
On human judgment vs. automation:
"And by the way, I need to emphasize this last point - this is not auto-tagging. I don't believe in auto-tagging. It has too many false positives. The human being has the best insight and should have the final say." (35:09 - 35:23)
On aboutness (crediting Yonah Levenson):
"The word 'aboutness' I got from one of the people on this call, Yonah Levenson. I learned that from her many years ago, and I'm so glad she taught me that word because I use it all the time now. It's a nice kind of summary word that says, what is this content about?" (22:27 - 22:45)
On wasted opportunities with taxonomies:
"For example, one of my clients recently built a very complex SKOS-based taxonomy which had the opportunity of being used to create a very sophisticated, rich, graph-based information. But then they only ever use the preferred labels from that taxonomy, which they exported, put into Excel, and then used to create a keyword list in the taxonomy. Feels like a bit of a waste of opportunity to me." (21:15 - 21:40)
Related resources
Tools & references mentioned
The Accidental Taxonomist by Heather Hedden This book is a comprehensive guide to building information taxonomies, developing taxonomy standards for IA, SEO and content management.
The Knowledge Graph Cookbook by Semantic Web Company This book focuses on the methodology and practical application of building and leveraging knowledge graphs for various business needs.
Part of Collections
Taxonomy essentials - Essential techniques and methodologies for building effective taxonomies, from basic concepts to advanced corpus analysis methods.
Content strategy - Exploring the intersection of content modeling, structured data, and semantic technologies for better information organization.
Taxonomy - Real-world examples of taxonomy implementation, from corpus analysis to content graph development, showing practical challenges and solutions.
This resource is maintained by Grace Lau.