AI Agents Need Context, Not Perfect Data: The Case for Risk-Based Data Quality

Gleb Mezhanskiy spent years building tools to make enterprise data clean. In March 2026, the Datafold CEO told his audience the effort never paid off the way software monitoring did for companies like Datadog. His argument is now reshaping how data leaders define quality going into the back half of 2026.

A Vendor CEO Calls the Industry a Disappointment

Mezhanskiy laid out the case in a March 5 post titled “Data Engineering in 2026: 12 Predictions.” Prediction eleven argues data teams will stop chasing data quality because AI agents care about context instead. Years of investment and engineering effort, he wrote, never produced a breakout success comparable to Datadog’s rise in software monitoring. Data quality, in his telling, moved from a line item on annual goals to something teams handle on a best-effort basis.

The claim deserves a caveat before it travels any further. Mezhanskiy runs one company in a crowded field, and his view reflects a single vendor’s vantage point, not an industry consensus. Monte Carlo reported raising $236 million, while Bigeye reported $73.5 million in total funding. Datafold separately announced a $20 million Series A. Together, the three companies disclosed at least $329.5 million in funding, spread across quality, reliability, and observability rather than one tidy category. The label of failure depends on which yardstick gets used, and Mezhanskiy picked a yardstick favoring his prediction.

Why Data Resists the Software Playbook

Mezhanskiy’s strongest point has nothing to do with funding rounds. He argues data is harder to test than software because ground truth keeps shifting. A login either succeeds or it does not. An “active user” can mean three different things depending on whether marketing, product, or finance is asking, and no amount of column-level testing settles a disagreement over definitions. Add more alerts to a noisy pipeline and the value of each new alert drops fast.

The comparison holds up better as a spectrum than a hard line. Security teams chase ambiguous signals every day, and plenty of AI-driven software ships with outcomes nobody can verify with certainty. Software ground truth is not always as clean as Mezhanskiy’s framing suggests. What sets data apart is the scale of the ambiguity: a single warehouse can hold dozens of conflicting definitions for the same business concept, and a software team rarely faces so many forks in the same afternoon.

What AI Agents Need

Here is the part of Mezhanskiy’s argument worth taking seriously even with the caveats attached. An agent pulling from a warehouse needs more than a validated column. It needs lineage showing where a number came from, the transformation logic behind it, documentation explaining why a fallback table exists, and an ontology connecting business entities like customer, order, and product. Mezhanskiy calls the combination a context graph, and Datafold now sells one alongside its conventional quality tools.

Worth saying directly: Mezhanskiy is not a neutral narrator. His company profits if buyers shift spending from quality monitoring toward the context layer his prediction describes, and the financial stake does not vanish just because the underlying argument is reasonable. The argument still has a limit: context helps an agent interpret a number correctly, but it does not turn a corrupt, stale, or biased number into a safe one. Lineage tells an agent where data came from, not whether the data deserved trust in the first place.

What Risk-Based Data Quality Looks Like in Practice

The most useful evidence in this debate is not a prediction. It is a pattern already showing up in how teams build data contracts. The Open Data Contract Standard, published by Bitol under Apache 2.0 through the LF AI and Data Foundation, defines a vendor-neutral YAML format covering schemas, quality rules, ownership, support channels, and service levels. Teams use the standard to formalize what a dataset promises, without locking into one company’s platform.

A handful of operating habits separate teams getting value from contracts from teams adding paperwork:

Put responsibility on the team producing a dataset, not the team consuming it three pipelines downstream.
Store contracts as code in version control rather than as a slide deck nobody opens again.
Run checks in CI or in the pipeline itself, so a violation gets caught before it reaches a dashboard or an agent.

Monte Carlo’s guidance tells customers to keep contracts lightweight and aimed at pipelines carrying real business weight, rather than every table in the warehouse. Soda and Atlan support the same pattern through YAML files, Git workflows, and rule enforcement, and neither positions its tooling as a requirement: each treats automation as a convenience layered on top of a discipline a team could run with a text editor and a CI pipeline.

The Counterevidence

Market researchers do not support a collapse story for data-quality spending. Mordor Intelligence estimates the data-quality tools market will grow from $3.27 billion in 2026 to $7.39 billion by 2031, a 17.7 percent compound annual growth rate. Treat the figure as a commercial estimate rather than an audited total. Different research firms define the category differently and land on numbers disagreeing with each other, which is normal for a market this fragmented and says more about inconsistent definitions than about the underlying trend.
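The growth rate can be checked from the two endpoints alone. A minimal sketch, assuming the standard compound-annual-growth formula and treating 2026 to 2031 as five compounding periods; the function name `cagr` is ours, not the research firm's:

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Return compound annual growth as a fraction (0.10 means 10 percent)."""
    return (end_value / start_value) ** (1 / years) - 1


# Endpoints quoted in the estimate, in billions of dollars.
rate = cagr(start_value=3.27, end_value=7.39, years=2031 - 2026)
print(f"{rate:.1%}")  # prints 17.7%
```

The two endpoints and the stated rate agree with each other, which says nothing about whether the endpoints themselves are right.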

The safer read: spending keeps growing while the definition of quality gets broader. No evidence reviewed for this piece shows budgets moving from quality monitoring into context graphs. Joe Reis’s 2026 State of Data Engineering Survey, drawn from 1,101 practitioners over two weeks in late 2025, makes a simple point: quality has not slipped down anyone’s list of worries. Thirty-four percent of respondents named data quality or reliability as a major drain on team time, and just over ten percent called it their single biggest organizational bottleneck. Respondents skew senior and are concentrated in North America and Europe, and Reis describes the percentages as indicative rather than definitive. Even with caveats, the survey points toward teams stretched thin by quality work, not teams walking away from it.

A Framework for Tiering Data Quality

The practical move is not to pick a side between Mezhanskiy’s prediction and the survey data. It is to stop treating every dataset like it deserves the same level of scrutiny. A four-tier model gives data leaders a starting point for deciding where strict contracts belong and where lighter documentation will do.

Tier 0 covers revenue- and regulatory-critical data: billing systems, financial reporting feeds, and compliance submissions. Each dataset here gets a formal contract, automated checks running in CI, a named owner, and an on-call page when something fails.

Tier 1 covers customer- and product-critical data: dashboards customers see directly, metrics executives report externally, and machine learning features feeding customer-facing predictions. Each dataset still gets a formal contract, with scheduled checks and an alert routed to an owner, though without paging anyone at 2 a.m.

Tier 2 covers internal and operational data: ad hoc reporting, internal analytics, and experimentation tables. Lightweight documentation and preserved lineage matter more than a formal contract here, since a mistake stays contained inside one team.

Tier 3 covers exploratory data: one-off exports, scratch tables, and prototype datasets. No contract applies, no quality guarantee exists, and each dataset carries a clear label saying so.
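The four tiers can be written down as a small lookup so tooling applies the same controls everywhere. This is a sketch only: the `TierPolicy` field names and string values are illustrative assumptions, not part of any standard, and the text does not say whether Tier 2 runs checks, so "none" there is a guess.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierPolicy:
    formal_contract: bool
    checks: str      # where checks run: "ci", "scheduled", or "none"
    on_failure: str  # "page_on_call", "alert_owner", or "none"
    note: str


# Hypothetical encoding of the four tiers described above.
TIER_POLICIES = {
    0: TierPolicy(True, "ci", "page_on_call", "named owner required"),
    1: TierPolicy(True, "scheduled", "alert_owner", "no overnight paging"),
    # Assumption: Tier 2 relies on documentation and lineage, not checks.
    2: TierPolicy(False, "none", "none", "lightweight docs, lineage preserved"),
    3: TierPolicy(False, "none", "none", "labeled as carrying no guarantee"),
}


def requires_contract(tier: int) -> bool:
    """True for tiers whose datasets get a formal contract."""
    return TIER_POLICIES[tier].formal_contract
```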

Three questions place most datasets correctly.

Would a wrong number trigger financial loss, legal exposure, or a regulatory filing problem? Tier 0.
Does the dataset feed a customer-facing surface or a metric reported outside the company? Tier 1, unless the financial or regulatory exposure already pushed it to Tier 0.
Does more than one team rely on the dataset for decisions, without any external or regulatory stakes attached? Tier 2.

Anything left over, one-off exports and prototypes included, defaults to Tier 3.
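Read in order, the three questions form a short decision chain in which the first "yes" wins. A minimal sketch, assuming each question has already been answered as a boolean; the function and parameter names are hypothetical:

```python
def assign_tier(
    financial_legal_or_regulatory_risk: bool,
    customer_facing_or_reported_externally: bool,
    used_by_multiple_teams: bool,
) -> int:
    """Apply the three questions in order; the first 'yes' decides the tier."""
    if financial_legal_or_regulatory_risk:
        return 0
    if customer_facing_or_reported_externally:
        return 1
    if used_by_multiple_teams:
        return 2
    return 3  # one-off exports, scratch tables, prototypes


# A customer dashboard that also feeds a regulatory filing lands in Tier 0.
print(assign_tier(True, True, False))   # prints 0
print(assign_tier(False, False, True))  # prints 2
```

Ordering the checks from highest stakes to lowest is what implements the "unless already pushed to Tier 0" exception without a special case.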

Once a dataset earns a contract, the document needs six fields, regardless of format:

Schema and data types for every field a consumer might touch, with nullable fields and expected ranges spelled out.
Freshness and availability targets stated as a number, not a description: updated within four hours, available 99.5 percent of business days.
Quality thresholds and the checks enforcing them: completeness, uniqueness, and any business rule specific to the dataset.
A named producer team, a named consumer team, and an escalation path for when the two disagree.
A change management process describing how schema changes get announced and how long consumers get to adapt.
A support channel, stated by name, where a consumer reports a problem and gets a response time commitment.

For an illustrative example, picture a subscription company assigning its monthly recurring revenue table to Tier 0. The six fields might read:

Schema: customer_id (string, not null), mrr_amount (decimal, zero or greater), billing_period (date).
Freshness: updated within four hours of each billing run.
Quality checks: completeness at 99.9 percent or higher, uniqueness enforced on customer_id plus billing_period.
Ownership: the Billing Platform team produces the table, Finance Reporting consumes it, and disputes escalate to the on-call data engineer within 15 minutes.
Change management: schema changes get announced two weeks ahead in the #data-contracts channel.
Support: a named inbox commits to a response within one business day.
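The freshness and quality lines of this example translate fairly directly into checks a pipeline or CI job could run. The sketch below is one possible reading, not a reference implementation: it assumes rows arrive as dictionaries, and it assumes "completeness" means the share of rows with all three fields present, which the example contract does not define.

```python
from datetime import date, datetime, timedelta
from decimal import Decimal

REQUIRED_FIELDS = ("customer_id", "mrr_amount", "billing_period")
MAX_STALENESS = timedelta(hours=4)
MIN_COMPLETENESS = 0.999


def check_mrr_table(rows, billing_run_finished, table_updated):
    """Return the names of contract checks the table violates."""
    violations = []

    # Freshness: updated within four hours of the billing run.
    if table_updated - billing_run_finished > MAX_STALENESS:
        violations.append("freshness")

    # Completeness (assumed definition): share of rows with no missing field.
    complete = [r for r in rows if all(r.get(f) is not None for f in REQUIRED_FIELDS)]
    if not rows or len(complete) / len(rows) < MIN_COMPLETENESS:
        violations.append("completeness")

    # Uniqueness on customer_id plus billing_period.
    keys = [(r["customer_id"], r["billing_period"]) for r in complete]
    if len(keys) != len(set(keys)):
        violations.append("uniqueness")

    # Range: mrr_amount must be zero or greater.
    if any(r["mrr_amount"] < 0 for r in complete):
        violations.append("mrr_amount_range")

    return violations


run_done = datetime(2026, 3, 1, 2, 0)
rows = [
    {"customer_id": "c-1", "mrr_amount": Decimal("49.00"), "billing_period": date(2026, 3, 1)},
    {"customer_id": "c-1", "mrr_amount": Decimal("49.00"), "billing_period": date(2026, 3, 1)},
]
print(check_mrr_table(rows, run_done, run_done + timedelta(hours=5)))
# prints ['freshness', 'uniqueness']
```

In a CI job, a non-empty list would fail the build; for a Tier 0 table in production, the same list would trigger the on-call page.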

A scratch table feeding a one-off cohort analysis needs none of this. The cost of writing six fields for every table in the warehouse is exactly why most contract programs stall, and tiering exists to keep the cost pointed at the data where it pays for itself.

Data leaders tracking this model should watch incident impact, detection time, false-alert volume, and contract violations by tier, rather than a single company-wide quality score hiding where the real damage happens. A Tier 0 violation and a Tier 3 violation are not the same event, and a dashboard treating them equally will bury the signal leaders need most.
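Breaking the numbers out by tier is a small grouping exercise. The sketch below assumes a hypothetical event log where each alert records its tier, whether it turned out to be a false alert, and minutes to detection; incident impact is left out because it depends on how a given team measures it.

```python
from collections import defaultdict
from statistics import median


def summarize_by_tier(events):
    """Group alert events by tier and report counts plus median detection time."""
    grouped = defaultdict(list)
    for event in events:
        grouped[event["tier"]].append(event)

    summary = {}
    for tier in sorted(grouped):
        real = [e for e in grouped[tier] if not e["false_alert"]]
        summary[tier] = {
            "violations": len(real),
            "false_alerts": len(grouped[tier]) - len(real),
            "median_minutes_to_detect": (
                median(e["minutes_to_detect"] for e in real) if real else None
            ),
        }
    return summary


events = [
    {"tier": 0, "false_alert": False, "minutes_to_detect": 12},
    {"tier": 0, "false_alert": False, "minutes_to_detect": 30},
    {"tier": 3, "false_alert": True, "minutes_to_detect": 5},
]
print(summarize_by_tier(events))
```

A single blended score over these three events would count the Tier 3 false alert alongside two real Tier 0 violations; the per-tier view keeps them apart.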

Where Tiering Breaks Down

Two failure modes show up almost immediately once a team adopts a tier model, and both come from how the model is run rather than from the tiers themselves.

The first is tier creep. Every team believes its data matters most, and a model with no enforcement mechanism drifts toward labeling everything Tier 0 within a year. The fix is procedural rather than technical: route Tier 0 nominations through finance, legal, or whichever function owns the regulatory exposure, and require a stated dollar figure or compliance citation before a dataset earns the label.
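The nomination rule can be enforced mechanically at the point where a Tier 0 label is requested. A sketch under stated assumptions: the `Tier0Nomination` record and the set of approving functions are hypothetical, and "compliance" is added here only as an example of a function that might own regulatory exposure.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: these are the functions allowed to sponsor a Tier 0 label.
APPROVING_FUNCTIONS = {"finance", "legal", "compliance"}


@dataclass
class Tier0Nomination:
    dataset: str
    approver_function: str
    dollar_exposure: Optional[float] = None
    compliance_citation: Optional[str] = None


def accept_tier0(nomination: Tier0Nomination) -> bool:
    """Accept only nominations with an approved sponsor and stated exposure."""
    if nomination.approver_function not in APPROVING_FUNCTIONS:
        return False
    has_dollar_figure = (
        nomination.dollar_exposure is not None and nomination.dollar_exposure > 0
    )
    has_citation = bool(
        nomination.compliance_citation and nomination.compliance_citation.strip()
    )
    return has_dollar_figure or has_citation
```

A nomination from an analytics team with no dollar figure or citation is rejected by default, which is the pressure the procedure is meant to apply.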

The second failure mode connects directly back to Mezhanskiy’s argument. A tier assignment lives in a person’s head or a wiki page unless someone writes it into metadata an agent or a query engine can read. An AI agent pulling from a warehouse has no way to know a table is a Tier 3 scratch dataset unless the assignment travels with the table itself, through tags, a catalog entry, or the context graph Mezhanskiy’s company sells. Skip the tagging step, and an agent can pull from an unreviewed prototype table to answer a question belonging to Tier 0. The result turns a low-stakes dataset into the basis for a high-stakes decision, and nobody notices until something breaks.
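Once the tier travels with the table as machine-readable metadata, a guard in front of the agent becomes a few lines. This sketch assumes tier tags are available as a plain mapping from table name to tier and that untagged tables are treated as Tier 3; the table names and the `usable_tables` helper are hypothetical, not any catalog's API.

```python
UNTAGGED_DEFAULT_TIER = 3  # assumption: no tag means no guarantee


def usable_tables(question_tier, candidate_tables, tier_tags):
    """Keep only tables whose tier is at least as strict as the question's."""
    usable = []
    for table in candidate_tables:
        table_tier = tier_tags.get(table, UNTAGGED_DEFAULT_TIER)
        if table_tier <= question_tier:
            usable.append(table)
    return usable


tier_tags = {"finance.mrr_monthly": 0, "scratch.cohort_export": 3}
candidates = ["finance.mrr_monthly", "scratch.cohort_export", "tmp.untagged_copy"]
print(usable_tables(0, candidates, tier_tags))  # prints ['finance.mrr_monthly']
```

Defaulting untagged tables to the lowest tier is the conservative choice: a missing tag blocks high-stakes use instead of silently permitting it.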

Tiers also need a review cadence, since a dataset’s risk profile rarely stays fixed. A cohort analysis built for one board meeting can turn into a recurring metric a CFO quotes externally within two quarters, at which point the dataset has quietly moved from Tier 3 to Tier 1 without a contract ever attached to it. A quarterly re-tiering review, owned by whoever runs the data platform, catches the drift before a metric goes external.

Data teams are not giving up on quality. They are admitting universal coverage was always a fiction, and the fiction grew more expensive once AI agents started running on the same warehouses humans used to babysit by hand. Strong programs in 2026 will decide, in writing, where bad data does real damage, defend the ground hard, and leave enough context behind for people and machines to handle everything else with open eyes.
