https://subscription.packtpub.com/video/data/9781806675555/p1/video1_4/closing-panel-building-ai-agent
# **Closing Panel: Building AI Agents**
This panel brings together hands-on practitioners and technical leads to discuss what it takes to build useful AI agents. We'll unpack the real-world tradeoffs, from toolchain decisions and infrastructure choices to team workflows and failure handling. Expect candid insights on what works, what breaks, and how to build agents that are not just impressive but reliable and aligned with business needs.
This closing panel, **"Building AI Agents,"** is a Q&A-driven discussion with three enterprise practitioners focused on how to design, evaluate, scale, and safely deploy AI agents in production.
---
## Panel context and participants
- Moderator Manish hosts three experts: **Kapil** (AI platform architect in retail/healthcare/telecom), **Sandeep** (AI leader at AWS), and **Raphaël** (CTO and AI/data engineering strategist in Hong Kong).
- The session is entirely audience-question driven, covering evaluation, architecture patterns, autonomy, skills, costs, and future directions of multi-agent systems.
---
## Evaluating agents in production
- The panel distinguishes **two evaluation layers**:
- **Outcome-based metrics**: task completion rate, efficiency, escalation frequency, user complaints, proportion of interactions resolved without humans.
- **Operations-based metrics**: LLM evaluation scores, tool-call accuracy, planning efficiency, latency, error rates, hallucinated outputs, unsafe recommendations.
- Everyone stresses **“design for observability from day one”**:
- Use centralized logging, metrics, and traces; track tool calls, reasoning steps, and decision paths.
- Prefer open standards like **OpenTelemetry** and emerging AI observability specs from the Linux Foundation/IEEE to standardize agent metrics (a tracing sketch follows this list).
- Recommended feedback mechanisms:
- Simple **user ratings** ("Was this helpful?") and active feedback loops comparing past and current performance.
- **High-fidelity alerting** and anomaly detection for degradation in agent performance.
- **Human-in-the-loop oversight** on sensitive decisions.
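To make "observability from day one" concrete, here is a minimal sketch of wrapping every tool invocation in an OpenTelemetry span, so tool choice, arguments, latency, and errors land in the same trace as the reasoning steps. The span names, attributes, and console exporter are illustrative choices, not something the panel prescribed:

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

# Console exporter for the sketch; production would export to a collector.
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("agent.tools")

def call_tool(name, fn, **kwargs):
    # One span per tool call: name, arguments, outcome, and any exception.
    with tracer.start_as_current_span(f"tool.{name}") as span:
        span.set_attribute("tool.name", name)
        span.set_attribute("tool.args", repr(kwargs))
        try:
            result = fn(**kwargs)
            span.set_attribute("tool.ok", True)
            return result
        except Exception as exc:
            span.set_attribute("tool.ok", False)
            span.record_exception(exc)
            raise

if __name__ == "__main__":
    print(call_tool("add", lambda a, b: a + b, a=2, b=3))  # span printed to console
```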
---
## Architecture and memory-efficient patterns
- A recurring pattern is to break agent behavior into **staged modules**:
1. **Input** (user query, prompt).
2. **Reasoning** (decompose problems, interpret state).
3. **Planning** (explicitly encoded plans and tool routing rules).
4. **Execution** (tool/API calls, document generation, side effects).
5. **Feedback** (record actions and outcomes).
6. **Memory update** (persist state, knowledge, and lessons).
- Decouple expensive LLM loops from execution: push heavy I/O (API calls, storage, PDF generation) **outside** the core reasoning loop to avoid context bloat.
- Emphasis on **token budgeting** and **memory control** for open‑source LLMs:
- Control input/output limits, use checkpoints in long tasks, and avoid packing entire histories into context.
- Two design modes:
- **Planner agents**: the LLM decides the plan and tools (more flexible, but unreliable with smaller models).
- **Workflow/DAG-based agents**: an explicit directed acyclic graph, with LLMs only at decision nodes; this reduces the planning burden and works better with weaker models like LLaMA when tasks are repetitive and well understood (a minimal sketch follows this list).
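As a concrete illustration of these patterns, here is a minimal sketch of a workflow-style agent: deterministic nodes do the heavy I/O outside the LLM loop, a crude token budget trims history, and the model appears only at a single decision node. The node names, budget, and the placeholder `llm` function are assumptions, not the panel's code:

```python
from dataclasses import dataclass, field

# Placeholder for any chat-completion client; wire in your own model here.
def llm(prompt: str) -> str:
    raise NotImplementedError

TOKEN_BUDGET = 2000  # rough cap on context fed to the decision node

def within_budget(history: list[str], budget: int = TOKEN_BUDGET) -> str:
    # Naive token budgeting: keep only the most recent turns that fit,
    # rather than packing the entire history into context.
    kept, used = [], 0
    for turn in reversed(history):
        cost = len(turn) // 4  # crude chars-per-token estimate
        if used + cost > budget:
            break
        kept.append(turn)
        used += cost
    return "\n".join(reversed(kept))

@dataclass
class State:
    query: str
    history: list[str] = field(default_factory=list)

# Deterministic nodes: heavy I/O (retrieval, PDF generation, API calls)
# lives here, outside the LLM loop, so it never bloats the context.
def retrieve(state: State) -> State:
    state.history.append("retrieved: <doc summaries>")
    return state

def report(state: State) -> State:
    state.history.append("report written to storage")
    return state

DAG = {"retrieve": retrieve, "report": report}

def decide(state: State) -> str:
    # The only LLM call in the graph: one narrow routing decision,
    # which is the part that stays reliable even with smaller models.
    answer = llm(f"{within_budget(state.history)}\n\nQuery: {state.query}\n"
                 "Reply with exactly one word: retrieve or report.")
    return "retrieve" if "retrieve" in answer.lower() else "report"

def step(state: State) -> State:
    return DAG[decide(state)](state)
```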
---
## Common failure modes in real deployments
- **Infinite loops / runaway reasoning**:
- Agents stuck re-planning or re-trying; costs explode if there is no **circuit breaker** or retry limit.
- **Tool misuse and overload**:
- Example: an MCP-exposed database where the agent happily calls "get all customers" and returns a million records without pagination, crashing the system.
- **Null/failed API responses treated as success**:
- Agents often proceed as if a failed or empty call succeeded; require explicit checks and fallback strategies (see the guarded-call sketch after this list).
- **Compounding errors**:
- In multi-step plans, a single early error propagates, so overall reliability drops sharply. The panel cites research showing that when prompts contain many instructions (e.g., 50–100), success rates fall dramatically even for top models.
- **Cost blow‑ups**:
- Teams underestimate **cloud and LLM cost** at first; later they discover that unconstrained agent behavior (long contexts, frequent tool calls) dramatically raises bills.
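Several of these failure modes can be blocked at the tool boundary. Below is a hedged sketch of a guarded fetch wrapper with a retry circuit breaker, mandatory pagination, and explicit failure signaling; the page sizes, retry counts, and `fetch_page(offset=..., limit=...)` signature are illustrative assumptions:

```python
import time

MAX_RETRIES = 3     # circuit breaker: no runaway retry loops
PAGE_SIZE = 100     # never "get all customers" in one unpaged call
MAX_PAGES = 50      # hard cap on how much one tool call may pull

class ToolFailure(Exception):
    """Raised so the agent cannot mistake a failed call for success."""

def guarded_fetch(fetch_page) -> list:
    rows, page = [], 0
    while page < MAX_PAGES:
        for attempt in range(MAX_RETRIES):
            try:
                batch = fetch_page(offset=page * PAGE_SIZE, limit=PAGE_SIZE)
                break
            except Exception:
                time.sleep(2 ** attempt)  # back off, then give up loudly
        else:
            raise ToolFailure(f"page {page} failed after {MAX_RETRIES} retries")
        if batch is None:
            raise ToolFailure(f"page {page} returned null")  # null is not success
        rows.extend(batch)
        if len(batch) < PAGE_SIZE:
            return rows  # short page: end of data
        page += 1
    raise ToolFailure(f"over {MAX_PAGES * PAGE_SIZE} rows; refusing to continue")
```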
---
## Autonomy: how much to give and when
- Agents should be placed on an **autonomy ladder**, not immediately given full control:
- **Level 0** – Logging only; agent suggestions are not executed.
- **Low levels** – Agents suggest actions in low‑risk areas (e.g., re‑running tests) without review.
- **Mid levels** – Agents can escalate or repair common failures but must be tested and reviewed.
- **Top level** – Very rare: fully autonomous in production, allowed only after strong evidence and guardrails.
- Panelists recommend **never granting full autonomy** without:
- A robust **feedback loop**.
- Clear **risk indexing** of use cases (high-risk → tighter constraints and more human oversight); a gating sketch follows this list.
- Rule of thumb from practice:
- Start with **micro-agents** that have **small, focused goals** and minimal instructions; chaining specialized agents reduces error compounding versus one large universal agent.
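A minimal sketch of the autonomy ladder as code, with a per-action required level standing in for the risk index; the levels, actions, and mapping are illustrative, not from the panel:

```python
from enum import IntEnum

class Autonomy(IntEnum):
    LOG_ONLY = 0   # suggestions are recorded, never executed
    LOW = 1        # auto-execute only low-risk actions (e.g., re-running tests)
    MID = 2        # may repair common failures, with testing and review
    FULL = 3       # rare: full production autonomy after strong evidence

# Illustrative risk index: in practice this comes from a reviewed
# catalog of use cases, not hard-coded values.
REQUIRED_LEVEL = {
    "rerun_tests": Autonomy.LOW,
    "restart_service": Autonomy.MID,
    "refund_customer": Autonomy.FULL,
}

def dispatch(action: str, granted: Autonomy) -> str:
    required = REQUIRED_LEVEL.get(action, Autonomy.FULL)  # unknown action: maximum caution
    if granted >= required:
        return f"EXECUTE {action}"  # within the agent's granted rung
    return f"SUGGEST {action} (needs {required.name}, granted {granted.name})"

# An agent on the lowest rung only ever produces suggestions.
print(dispatch("rerun_tests", Autonomy.LOG_ONLY))  # SUGGEST ...
print(dispatch("rerun_tests", Autonomy.LOW))       # EXECUTE ...
```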
---
## Orchestration vs fully agentic systems
- Many teams are “stuck between orchestration frameworks and full autonomy.” Panel guidance:
- Think **system-first**, not model-first: design from **business workflows and risks back** to agents.
- Use **explicit orchestration** (DAGs, workflow engines) for repeatable, high-risk tasks; reserve agentic planning for parts where flexibility adds value (a routing sketch closes this section).
- Future is likely **hybrid topologies**, mixing:
- Central orchestration (like Kubernetes + orchestrators) with
- **Choreographed peer-to-peer services** where agents communicate directly under shared goals.
- One expert explores a **“marketplace agent” paradigm**:
- Enterprises host a catalog of specialized agents (internal or SaaS) and workflows "hire" them as needed, with central governance for audit, access, and cost control.
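One way to read the "system-first" guidance as code: a router that sends known tasks through explicit workflows, escalates unmatched high-risk work to humans, and reserves agentic planning for the rest. The registry shape, threshold, and `plan_and_execute` method are assumptions for illustration:

```python
HIGH_RISK = 0.7  # illustrative threshold from a risk-indexing exercise

def route(task_name: str, risk_index: float, workflows: dict, planner):
    if task_name in workflows:
        return workflows[task_name](task_name)  # deterministic, audited DAG path
    if risk_index >= HIGH_RISK:
        # High-risk work with no approved workflow goes to a human,
        # never to free-form agent planning.
        raise PermissionError(f"{task_name}: escalate to human review")
    return planner.plan_and_execute(task_name)  # flexible agentic path
```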
---
## Small models, domain specificity, and cost
- Question: Are **domain‑specific small models + agents** a good combo?
- Today, small models often underperform on **planning** and complex reasoning; the best planner agents are still frontier models (Anthropic, OpenAI, etc.).
- Promising pattern: use a **strong model for high-level planning** and **small domain-tuned models** for specific actions (e.g., code generation, niche classification) to cut costs (a dispatch sketch follows this list).
- Small models are appealing for **energy and carbon efficiency**; blog posts and research benchmarks report substantial energy savings.
- The panel expects small, specialized models to play a larger role as research improves their planning capabilities.
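A sketch of the planner/executor split under stated assumptions: the model names, the tagged-plan format, and the placeholder `complete` helper are invented for illustration:

```python
PLANNER_MODEL = "frontier-large"                 # strong at decomposition
ACTION_MODELS = {"codegen": "small-code-tuned",  # cheap, domain-tuned executors
                 "classify": "small-domain-classifier"}

def complete(model: str, prompt: str) -> str:
    raise NotImplementedError("wire up your model clients here")

def run_task(task: str) -> list[str]:
    # One expensive planning call, then cheap specialized calls per step.
    plan = complete(PLANNER_MODEL,
                    f"Break this into steps, one per line, tagged codegen: or classify:\n{task}")
    outputs = []
    for line in plan.splitlines():
        tag, _, step = line.partition(":")
        if not step.strip():
            continue  # skip untagged or empty lines
        model = ACTION_MODELS.get(tag.strip(), PLANNER_MODEL)  # fall back to the big model
        outputs.append(complete(model, step.strip()))
    return outputs
```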
---
## Skills, org readiness, and build vs buy
### Skills and roles
- Building agents is **not just ML**:
- Requires strong **software engineering**, distributed systems, and testing skills, because agents are "just software with extra uncertainty."
- Needed competencies include:
- **Data engineering and information architecture** (solid data foundations for tools and retrieval).
- **ML/LLM engineering** for prompt design, fine‑tuning, and evaluation.
- **Platform and MLOps** for CI/CD, deployment, observability.
- **Security, governance, and ethics** for compliance and safe behavior.
- Certifications (AI, data engineering, cloud AI offerings) can help with credibility, but panelists prioritize **hands-on POCs and real workflows** over certificates.
### Build vs buy agent stacks
- Classic **build vs buy** logic still applies:
- Consider **time-to-market**, customer needs, internal skills, and regulatory constraints.
- Off-the-shelf agents/agent platforms help you move fast but still require integration into your systems and workflows.
- Key questions when buying:
- **Where does the LLM run?** Who controls data and logs (important for privacy/regulation)?
- How will you **govern**, **observe**, and **test** third‑party agent behavior?
- Expect **convergence of frameworks**: over the next couple of years, major agent frameworks are likely to align around shared standards like MCP and common abstractions.
---
## Tooling gaps and future needs
- Major gap: **evaluation and testing tools specifically for agents**, not just raw LLMs. Needed capabilities include:
- Tracing tool calls, memory updates, and decision branches.
- Assessing whether the agent chose the right tools and handled errors correctly.
- Regression testing of multi-step behaviors (a test sketch follows this list).
- Observed needs:
- **Full-stack agent observability**: decision tracing, structured logs, replayable sessions, integrated with OpenTelemetry.
- Better **debugging UX** for multi-agent flows (visual DAGs, state timelines, what-if replays).
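As a sketch of what agent-level regression testing could look like, the test below replays a recorded session and asserts on the tool-call sequence rather than exact text; `run_agent`, the fixture file, and the trace schema are assumptions about a hypothetical harness:

```python
import json

def run_agent(fixture: str) -> list[dict]:
    # Placeholder: a real harness would replay the recorded session
    # through the agent and return its structured trace.
    with open(fixture) as f:
        return json.load(f)

def test_refund_flow_tool_sequence():
    trace = run_agent("refund_request.json")  # replayable session fixture
    calls = [s["tool"] for s in trace if s["type"] == "tool_call"]
    # The right tools, in the right order: look up before refunding.
    assert calls[:2] == ["lookup_order", "issue_refund"]
    # No unrelated, dangerous tools in this flow.
    assert "delete_records" not in calls
    # Failed calls must be surfaced, never silently treated as success.
    assert all(s.get("status") != "error_ignored" for s in trace)
```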
---
## Strategic perspective on AI agents
- Agents should be seen as **system components** in a larger architecture, not magical replacements for humans or developers.
- Enterprises should shift mindset from **task‑only modeling (models predicting a label)** to **goal‑oriented systems**:
- From "model predicts risk" to "agent flags risk, coordinates follow-ups, and books consultations."
- Panelists encourage organizations to:
- Start with **narrow, high-value use cases** (e.g., cart-abandonment support, appointment scheduling) and grow from there.
- Keep agents under **measured autonomy**, with clear metrics, guardrails, and human oversight, especially in early deployments.