# Building Knowledge Graphs to Enable Agentic AI

Source: https://subscription.packtpub.com/video/data/9781806675555/p1/video1_2/building-knowledge-graphs-to-enable-agentic-ai

In this session, build a knowledge graph one data product at a time to provide the right context to AI agents and fully exploit the potential of agentic AI. The talk explains why robust enterprise knowledge graphs and information architecture are essential to make agentic systems reliable in production, and how to build them in a federated "knowledge mesh" way.

---

## 1. Why context engineering matters

- Agentic systems are now built around **large language models (LLMs)** instead of purely symbolic rule engines, which makes impressive demos easy but production reliability hard.
- The main production gap is often **not model intelligence** but **poor communication of context**: the agent is not given the right information about its environment, tasks, and state.
- LLM context windows are rapidly expanding (up to millions of tokens), but simply stuffing more data into the prompt **reduces accuracy** and increases **cost** and **latency**, especially beyond roughly 80–100K tokens.

### Definition of context engineering

- **Context engineering** is the art and science of filling the context window with **just the right information** at each step: no more and no less.
- It extends beyond prompt engineering to include:
  - **Procedural instructions** (the prompt itself).
  - **State** (current plan, workflow step, recent actions).
  - **Knowledge** (domain information, environment description, task-specific facts).

---

## 2. Memory models for agents (human-inspired)

- Context engineering requires **memory**: not all information can be kept in the context window, so it must be stored and selectively retrieved.
- Agent memory frameworks increasingly mimic **human cognitive architecture**, with:
  - **Short-term / working memory**: the current context window.
  - **Long-term memory**, split into:
    - **Episodic memory**: a transactional trail of recent actions and experiences (task progress, interaction history).
    - **Semantic memory**: conceptual knowledge of the world and the domain.
    - **Procedural memory**: skills and know-how (how to execute a recipe, follow a procedure, etc.).

### Layered information architecture

- Information is organized into layers:
  - **Data**: raw observations (e.g., sensor readings) in short-term memory.
  - **Information**: data enriched with **metadata** that adds meaning and context.
  - **Knowledge**: abstracted, generalized concepts and models that form a mental model of the world.
- For an agent, this **information architecture** (data → information → knowledge) must be explicitly designed and populated with what is relevant to the agent's tasks, as sketched below.
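To make the memory split concrete, here is a minimal sketch assuming nothing beyond the talk's terminology; every class, field, and function name below is invented for illustration, and the keyword-match "retrieval" is deliberately naive:

```python
from dataclasses import dataclass, field

@dataclass
class LongTermMemory:
    """Hypothetical long-term memory split along the lines described above."""
    episodic: list[str] = field(default_factory=list)        # trail of recent actions and experiences
    semantic: dict[str, str] = field(default_factory=dict)   # concept name -> domain fact
    procedural: dict[str, list[str]] = field(default_factory=dict)  # skill name -> steps

def build_context(memory: LongTermMemory, task: str, budget_chars: int = 4000) -> str:
    """Fill the context window with just the right information, no more and no less.

    Selection is a naive keyword match here; a real agent would use retrieval.
    """
    facts = [fact for concept, fact in memory.semantic.items() if concept.lower() in task.lower()]
    recent = memory.episodic[-5:]  # only the latest episodes, not the full trail
    context = "\n".join(["# Task", task, "# Knowledge", *facts, "# Recent actions", *recent])
    return context[:budget_chars]  # hard cap: over-stuffing hurts accuracy, cost, and latency
```

The point of the sketch is the separation of concerns: the stores persist everything, while `build_context` decides what little of it enters the window at each step.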
---

## 3. Enterprise information architecture vs. per-agent silos

- If each agent builds its own data/information/knowledge stack as a byproduct of its implementation, **every agent will model key concepts differently**.
- Example: two teams separately explain "customer" to two different agents; each encodes the concept differently, leading to **semantic mismatches** when the agents must coordinate.
- To avoid this, the **enterprise information architecture** should be a **corporate asset**, shared and built incrementally, from which individual agents load their relevant slice of memory.

### Benefits of a shared architecture

- When two agents need context on "customer," they **load from the same central model and datasets**, so they:
  - Speak the **same language**.
  - Refer to the **same datasets/data products**.
- This shared architecture enables **semantic interoperability** among agents instead of a "big agentic mess" of incompatible components.

---

## 4. What is an enterprise knowledge graph?

- An **enterprise knowledge graph (EKG)** is presented as the best model for formalizing the **unified enterprise information architecture**.
- It integrates three layers into one machine-readable model:
  - **Data layer**: the actual business data.
  - **Metadata / information layer**: descriptions and data contracts.
  - **Knowledge / conceptual layer**: ontologies and concepts.
- The EKG is both **human-readable** and **machine-readable**, usable by LLMs, formal reasoners, and other systems.

---

## 5. Data and information layers: data products

### Data layer

- The base of the graph is the data generated by **transactional systems**, which the agent ultimately needs in order to act correctly.
- Modern data management has shifted from monolithic warehouses to **modular architectures** composed of **data products**.

### Data products

- A **data product** is a **software artifact that exposes data** as its primary function.
- It bundles:
  - The **data** itself.
  - The **code** that transforms and serves it.
  - The **infrastructure** to run that code.
  - The **interfaces** to consume the data.
- Crucially, data products include **descriptors or data contracts** with rich **metadata**, so the data is understandable and reusable.
- Managing data as products covers the first two layers of the knowledge graph:
  - The data plane (raw data).
  - The information plane (metadata that turns data into information).

---

## 6. The knowledge layer: ontologies and semantics

### Problem: technical vs. semantic interoperability

- Data products are often **technically interoperable** (APIs, formats) but not **semantically interoperable**.
- Different domains (e.g., sales vs. marketing) model the same concept (customer) differently; consumers must repeatedly do **translation work** across domains.

### Central ontology

- To solve this, the organization needs a **central ontology**:
  - A **unified conceptual model** of key business concepts (customer, product, order, etc.).
  - Used to **normalize and relate** data products across domains.
- The ontology is encoded as **triples** (subject–predicate–object), forming a graph in which:
  - Nodes are **concepts**.
  - Edges (predicates) encode **relationships**.
- The **meaning of a concept** comes from its position and relations in the graph (what it is and is not connected to), not from a textual definition. A small sketch follows.
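As an illustration of concepts-as-nodes and predicates-as-edges (the `ex:` namespace and the specific concept names are invented for this sketch; the talk does not prescribe a toolkit), a tiny slice of such an ontology can be written as triples with Python's rdflib:

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, RDFS

# Hypothetical enterprise namespace; any IRI scheme would do.
EX = Namespace("https://example.org/ontology/")

g = Graph()
g.bind("ex", EX)

# Concepts are nodes; predicates are edges (subject-predicate-object triples).
g.add((EX.Customer, RDF.type, RDFS.Class))
g.add((EX.Order, RDF.type, RDFS.Class))
g.add((EX.placesOrder, RDFS.domain, EX.Customer))
g.add((EX.placesOrder, RDFS.range, EX.Order))
g.add((EX.Customer, RDFS.label, Literal("Customer")))

# The meaning of Customer now lives in its relations, and the same graph
# can be serialized for humans and machines alike.
print(g.serialize(format="turtle"))
```

Note that nothing here is a prose definition: `Customer` is characterized entirely by the edges attached to it, which is the point made above.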
### Linking data products to the ontology

- Extend data product contracts to include **links to ontology concepts**:
  - For example: "this dataset contains instances of the concept `Customer` defined in the central ontology," or "this column is the `shipment address` property of `Customer`."
- This makes data products **semantically interoperable**: if two products link to the same concept, their data can be reliably combined by agents and humans.

---

## 7. The scaling problem and "knowledge mesh"

### Central ontology bottleneck

- Building a comprehensive central ontology is **complex**; a central team quickly becomes a **bottleneck**.
- When needed concepts are stuck in the ontology team's backlog, data product teams may stop linking to the ontology, and consumers bypass it, undermining semantic interoperability.

### Applying data mesh principles to knowledge

The talk proposes a **"knowledge mesh"**: applying the four **data mesh** principles to **knowledge management**. The four principles, adapted:

1. **Knowledge as a product**
   - The enterprise ontology is **not monolithic** but composed of **smaller sub-ontologies**, each managed as a **product**.
   - Each knowledge product (sub-ontology) has clear ownership and evolves over time with use cases.
2. **Domain orientation (knowledge domains)**
   - Ownership lies with **knowledge domains** (e.g., Customer, Product, Order-to-Cash), not strictly with business departments.
   - Knowledge domains cut across business domains, bringing together representatives from all areas that use a concept like "customer" to agree on a shared model.
3. **Federated computational governance**
   - A **federated governance team** defines global modeling rules so sub-ontologies can be composed into a coherent whole.
   - This requires a platform that supports:
     - Creation, evolution, and versioning of ontologies.
     - Composition of sub-ontologies into the enterprise ontology.
     - Interfaces for searching concepts and discovering linked data products.
4. **Self-serve infrastructure (EIOP platform)**
   - The talk references an **Enterprise Information Operation Platform (EIOP)** that:
     - Manages data products, knowledge products, and the links between them.
     - Supports building and maintaining the enterprise knowledge graph in a federated, scalable way.

---

## 8. Use-case-driven incremental growth of the knowledge graph

- The knowledge graph should grow **incrementally, driven by concrete use cases**, not by modeling for its own sake.
- For each prioritized use case (often an AI/agent use case):
  - Identify the **data products** needed or to be extended.
  - Identify the **concepts/knowledge products** needed to describe and semantically align those data products.
  - Run two parallel streams: build/extend the data products, and build/extend the ontology concepts (knowledge products).
  - Link the data products to the concepts; this **adds nodes and edges** to the enterprise knowledge graph, as in the sketch below.
- Over time, use case after use case, the graph **expands organically**, always anchored in business value.
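Continuing the illustrative namespace from section 6 (the `dp:` registry and both linking predicates below are hypothetical, not from the talk; real deployments would pick a standard vocabulary), linking a data product contract to ontology concepts is literally a matter of adding nodes and edges:

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDFS

EX = Namespace("https://example.org/ontology/")       # central ontology (hypothetical)
DP = Namespace("https://example.org/data-products/")  # data product registry (hypothetical)

g = Graph()
g.bind("ex", EX)
g.bind("dp", DP)

# A data product delivered for a concrete use case becomes a node...
g.add((DP.crm_customers, RDFS.label, Literal("CRM customers dataset")))

# ...and its contract links it to the shared concepts it instantiates.
g.add((DP.crm_customers, EX.containsInstancesOf, EX.Customer))

# Even individual columns can be mapped to ontology properties.
g.add((DP.crm_customers_ship_addr, EX.mapsToProperty, EX.shipmentAddress))
```

Any other product that links to `ex:Customer` is now combinable with this one without per-consumer translation work, which is exactly the semantic interoperability the mesh is after.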
---

## 9. Interaction with LLMs and agents

### Using the knowledge graph with LLMs

- When a user asks a question, the agent can:
  - Perform **concept extraction** on the query.
  - Search the **knowledge graph** for relevant concepts and relations.
- The retrieved graph structure can be passed to the LLM as:
  - Formal triples (RDF, Turtle-style), or
  - **Natural-language sentences** representing those triples.
- The speaker prefers the **natural-text representation**, since LLMs are trained far more heavily on text than on formal ontology formats.

### Agents learning from experience

- Context flows both ways:
  - From memory to the context window (**retrieval**).
  - From the context window back to memory (**writing**), capturing feedback and outcomes.
- When agents write back to memory, their internal state changes, so they **learn from experience**, much as humans do.
- User feedback and environmental signals thus influence the agent's memory and, over time, its behavior.

### Generative AI vs. agentic AI

- **Generative AI (GenAI)**: LLMs used primarily to **generate content** (text, images) in response to prompts.
- **Agentic AI**: LLM-based systems that **also interact with the external environment** through sensors and actuators, taking actions autonomously.
- In short: agentic AI = GenAI **plus** the ability to act in the world, guided by context and goals.

---

## 10. Final takeaways

- There will be **no reliable AI or agentic systems** without a reliable **enterprise information architecture**.
- Intelligence (LLMs, orchestration frameworks) will increasingly be provided by external vendors, but **your knowledge is unique and must be structured by you**.
- The best way to **future-proof AI investments** is to continuously improve the enterprise knowledge graph and information architecture so agents always receive high-quality context.
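As a closing sketch of the retrieval flow from section 9 (the triple-to-sentence mapping below is a naive illustration, not the speaker's implementation), triples retrieved around an extracted concept can be verbalized into the natural-text form the speaker prefers before being placed in the context window:

```python
from rdflib import Graph, Namespace

EX = Namespace("https://example.org/ontology/")  # same hypothetical namespace as above

def verbalize(graph: Graph, concept) -> list[str]:
    """Turn the triples around a concept into plain sentences for an LLM prompt."""
    sentences = []
    for s, p, o in graph.triples((concept, None, None)):
        # Use the local name of each IRI as a readable word (naive, for illustration).
        subj, pred, obj = (str(t).split("/")[-1].split("#")[-1] for t in (s, p, o))
        sentences.append(f"{subj} {pred} {obj}.")
    return sentences

# e.g. the triple (EX.Customer, EX.placesOrder, EX.Order) becomes "Customer placesOrder Order."
```

An agent would retrieve the sub-graph matched by concept extraction, verbalize it this way, and place the resulting sentences in the context window alongside its instructions and state.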