Agentic AI reshapes martech economics and infrastructure

Summary
– AI pricing has shifted from flat-rate to token-based models, which is problematic because agentic workflows that chain multiple tool calls consume many tokens and quickly exceed free or low-cost subscription limits.
– A daily marketing pipeline (e.g., searching 200 results, summarizing, generating headlines) can use 4,000–5,000 tokens per run, accumulating over 100,000 tokens monthly and surpassing typical caps on platforms like OpenAI or Anthropic.
– High token usage does not guarantee better output, but users still pay for every token, leading to throttled workflows or expensive overage fees for marketing teams.
– The solution is to keep raw data under your control using non-LLM filtering (e.g., keyword scoring or vector search) to select only relevant pieces before sending them to the model, which can cut token costs by 60% or more without sacrificing insight quality.
– Tools like Hermes Agent allow teams to own their context by running on their own infrastructure and storing conversation history and tool outputs locally, making them provider-agnostic and scalable compared to vendor-tied options like Claude Cowork.
The marketing world’s initial romance with AI felt like an endless buffet: pay a flat fee and consume as much as you want. But the party is over. Providers are now shifting to token-based pricing just as agentic workflows become standard practice, and these agents devour tokens at an alarming rate. If martech infrastructure doesn’t evolve to control costs, the bill will become unsustainable.
When AI connects to your actual business systems, it transforms from a simple Q&A bot into a powerful orchestrator. It can pull customer records from your CRM, analyze campaign performance, browse the web, and generate a personalized report in one seamless workflow. This magic happens through tool calling, where AI accesses external systems via APIs and Model Context Protocol (MCP) connections. The productivity boost for marketers is enormous, eliminating the need to jump between half a dozen apps. The hidden cost? Every tool call burns tokens. AI agents, in particular, consume a staggering number because they feed the entire task history, their internal reasoning, and all external data back through the model at every step of their problem-solving loop.
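To see why resending the full history at every step is expensive, here is a minimal sketch of the accounting. The function name and all token figures are hypothetical, chosen only for illustration; real agents and providers count tokens differently and may cache part of the input.

```python
def cumulative_input_tokens(system_tokens, tokens_added_per_step):
    """Total input tokens sent when every step resends the whole history.

    system_tokens: size of the initial prompt (hypothetical figure).
    tokens_added_per_step: tokens each step appends to the history
    (reasoning plus tool output), one entry per step.
    """
    total_sent = 0
    history = system_tokens
    for added in tokens_added_per_step:
        total_sent += history   # the model reads the entire history so far
        history += added        # this step's output joins the history
    return total_sent


# Five steps that each add 800 tokens to a 500-token starting prompt.
resent = cumulative_input_tokens(500, [800] * 5)
unique = 500 + 800 * 5
print(resent, unique)  # 10500 4500
```

Under these assumed numbers the model reads 10,500 input tokens to process 4,500 tokens of unique content, and the gap widens with every additional step.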
The token cap reality hits hard. Consider a typical daily marketing pipeline: searching 200 results, summarizing them, and generating five headline variations. This easily consumes 4,000 to 5,000 tokens per run. Over a 30-day month, you’re looking at well over 100,000 tokens. That blows past free-tier limits on platforms like OpenAI and Anthropic and can exhaust a $20 subscription before the second week is over. (These estimates are based on standard industry tokenization metrics; actual usage will vary by model, prompt structure, and output length.)
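The monthly figure follows from simple multiplication. The sketch below reproduces it, assuming one run per day as the text implies; the function name is illustrative only.

```python
def monthly_tokens(tokens_per_run, runs_per_day=1, days=30):
    """Tokens consumed in a month by a pipeline that runs on a fixed schedule."""
    return tokens_per_run * runs_per_day * days


low = monthly_tokens(4_000)
high = monthly_tokens(5_000)
print(low, high)  # 120000 150000
```

One daily run at 4,000 to 5,000 tokens therefore lands between 120,000 and 150,000 tokens over 30 days, and the total scales linearly if the pipeline runs more than once a day.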
The problem is that token consumption doesn’t correlate with result quality. As Scott Brinker and Frans Riemersma note in the State of Martech 2026 report, “more input does not automatically mean better output,” but you still pay for every bit. Claude Cowork and similar tool-heavy environments make this painfully visible. Every file read, every search, every API call adds billable tokens. Users on a $20 subscription often hit throttling by week two, forced to choose between crippling their workflow or paying exorbitant overage fees. Neither option is sustainable for a daily marketing pipeline.
The solution is owned context, not dependence on a single provider. Keep raw data under your control in a shared team database like PostgreSQL or Qdrant, a cloud data warehouse like Snowflake or BigQuery, or a folder in shared cloud storage. Then use lightweight, non-LLM filtering logic to extract relevant pieces before anything touches the model. You might use an LLM once to set this up, the way you’d use AI to write a formula. But after that, it runs automatically on every batch of new data without calling an LLM at all. Simple keyword scoring or vector similarity search, both orders of magnitude cheaper than an LLM call, can rank data by relevance. When a social listening pipeline pulls 500 tweets about a brand, the filtering step quietly selects the 10 most relevant ones and sends only those to the model. The token bill can drop by 60% or more, while insight quality remains unchanged.
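A keyword-scoring filter of the kind described here can be sketched in a few lines of standard-library Python. This is one possible shape under stated assumptions, not a prescribed implementation: the function names, the weight table, and the sample posts are all hypothetical, and a vector similarity search could fill the same role.

```python
import re
from collections import Counter


def keyword_score(text, weights):
    """Score a text by summing weight * occurrences for each keyword.

    weights maps lowercase keywords to hypothetical relevance weights.
    """
    counts = Counter(re.findall(r"[a-z0-9']+", text.lower()))
    return sum(weight * counts[word] for word, weight in weights.items())


def select_top_k(texts, weights, k=10):
    """Return up to k texts with the highest non-zero keyword scores."""
    scored = [(keyword_score(t, weights), i, t) for i, t in enumerate(texts)]
    relevant = [item for item in scored if item[0] > 0]
    # Highest score first; ties keep their original order.
    relevant.sort(key=lambda item: (-item[0], item[1]))
    return [text for _, _, text in relevant[:k]]


posts = [
    "Acme refund was slow",
    "nice weather today",
    "Acme support refund refund",
]
weights = {"acme": 1.0, "refund": 2.0}
print(select_top_k(posts, weights, k=2))
```

Only the texts returned by `select_top_k` would be placed in the prompt, so the model never sees the unrelated posts. No LLM call is involved in the ranking itself.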
Several tools can implement this filtering. Hermes Agent, Claude Cowork, Claude Code, and Perplexity Computer all connect an LLM to external tools, enabling API calls, file reads, and workflow automation across multiple apps. However, Hermes runs on your infrastructure and is provider-agnostic, while the others are tied to Anthropic’s and Perplexity’s models. Other notable tools include OpenClaw (380K+ GitHub stars), an open-source agent harness with filesystem-based memory stores; OpenAI Codex CLI (93K stars), offering terminal-based agent access with local file persistence; and orchestration frameworks like LangChain (140K stars) and CrewAI (54K stars), which you build against rather than use directly.
What they all share is that the model is a guest in your system, not the landlord. Hermes takes this principle to an extreme, maintaining a persistent local context store: your conversation history, tool outputs, and embeddings remain in your database and stay accessible across sessions. A memory layer learns from each interaction, capturing preferences and corrections so the agent improves over time rather than starting fresh every session. Its built-in tool ecosystem (web, terminal, APIs, vision, Python) means the same pipeline that pulls Salesforce or HubSpot records, checks a data warehouse, and drafts a report also captures intermediate results and saves them locally. And because it’s provider-agnostic, switching from OpenRouter to a self-hosted LLaMA requires only a config line change.
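The idea of a local context store can be illustrated with SQLite from the Python standard library. This is a generic sketch of the pattern, not Hermes Agent's actual storage layer: the table layout, function names, and sample payloads are all assumptions made for illustration.

```python
import json
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a local context store; pass a file path to persist it."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS context ("
        "id INTEGER PRIMARY KEY, "
        "session TEXT NOT NULL, "
        "kind TEXT NOT NULL, "
        "payload TEXT NOT NULL)"
    )
    return conn


def save(conn, session, kind, payload):
    """Store one JSON-serializable item, such as a tool output or a message."""
    conn.execute(
        "INSERT INTO context (session, kind, payload) VALUES (?, ?, ?)",
        (session, kind, json.dumps(payload)),
    )
    conn.commit()


def load(conn, kind):
    """Return every stored item of one kind, across all sessions, oldest first."""
    rows = conn.execute(
        "SELECT payload FROM context WHERE kind = ? ORDER BY id", (kind,)
    ).fetchall()
    return [json.loads(row[0]) for row in rows]


store = open_store()
save(store, "monday", "tool_output", {"source": "crm", "rows": 3})
save(store, "tuesday", "tool_output", {"source": "warehouse", "rows": 7})
print(load(store, "tool_output"))
```

Because the data lives in a database the team controls, a later session, or a different model provider, can read the same intermediate results without regenerating them.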
The product is the implementation, but the pattern is what matters. Any team can adopt it. The message isn’t “use Hermes Agent.” It’s “start building systems that let you own your context, because the provider-centric approach cannot scale.” The momentum behind agentic, context-owning tools is undeniable. But the strategic question remains: do you want to pay for the work, or own the infrastructure and pay only for the reasoning? A bigger subscription still risks running out of capacity. A different architecture removes that issue entirely. Every marketing team must choose which side of that equation they want to be on.
This is the first in a three-part series on the shift toward agentic marketing workflows and the infrastructure needed to support them. Part 2 walks through how the architecture works in practice. Part 3 covers getting started with Hermes Desktop: installation, skills, and workflows.
(Source: MarTech)

