

Agentic rule creation: Context-aware data quality at scale

By Marc Boner

Data quality rules are the foundation of trustworthy data systems. They catch errors before they propagate, flag anomalies in real time, and ensure compliance with business logic. Yet for most data teams, creating comprehensive rule sets has been a time-consuming manual process, one that requires both deep domain knowledge and technical SQL expertise.

A data engineer typically spends hours analyzing table schemas, checking column relationships, profiling data distributions, and crafting SQL queries to validate business logic. A single table might require more than 20 rules covering accuracy, completeness, freshness, and conformity checks. Multiply that across dozens or hundreds of tables, and rule creation becomes a significant bottleneck.

We built DQC's agentic rule creation to solve this problem. Instead of hours of manual work, our platform generates context-aware, comprehensive rule sets in minutes, without requiring users to write a single line of SQL.

Intelligence That Understands Your Data

When you connect a data source to DQC and trigger rule creation, you are not just running a generic template. Behind the scenes, an agentic workflow activates: a coordinated system of statistical analysis, machine learning algorithms, and large language models that work together to understand your specific data context.

The process begins with context analysis. Our LLMs examine table and column names, data types, sample values, and metadata to understand what kind of data they are looking at. Is this an SAP MARA materials master table? A Snowflake customer database? An e-commerce orders table in BigQuery? The answer fundamentally shapes which rules make sense.

Once context is established, multiple parallel workflows activate, analyzing column-level statistics, examining relationships between columns, and leveraging machine learning to detect patterns in your data. Throughout this process, LLMs orchestrate the workflow, making decisions about which checks to prioritize based on the table's role in your data ecosystem.
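To make "column-level statistics" more concrete, here is a minimal Python sketch of profiling a single column and turning that profile into candidate rules. It is not DQC's implementation. `ColumnProfile`, `profile_column`, and `suggest_rules` are hypothetical names, and the rule strings are SQL-like text used only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class ColumnProfile:
    """Basic statistics for one column (hypothetical structure)."""
    name: str
    row_count: int
    null_rate: float
    distinct_count: int
    minimum: Optional[float] = None
    maximum: Optional[float] = None


def profile_column(name: str, values: Sequence[object]) -> ColumnProfile:
    """Compute null rate, distinct count, and (for numeric columns) min/max."""
    row_count = len(values)
    non_null = [v for v in values if v is not None]
    null_rate = (row_count - len(non_null)) / row_count if row_count else 0.0
    distinct_count = len(set(non_null))  # assumes hashable cell values
    # Treat the column as numeric only if every non-null value is an int or float.
    numeric = [v for v in non_null if isinstance(v, (int, float)) and not isinstance(v, bool)]
    if non_null and len(numeric) == len(non_null):
        return ColumnProfile(name, row_count, null_rate, distinct_count,
                             float(min(numeric)), float(max(numeric)))
    return ColumnProfile(name, row_count, null_rate, distinct_count)


def suggest_rules(profile: ColumnProfile) -> List[str]:
    """Turn a profile into candidate rules (SQL-like text, for illustration only)."""
    rules: List[str] = []
    if profile.row_count and profile.null_rate == 0.0:
        rules.append(f"{profile.name} IS NOT NULL")
    if profile.row_count and profile.distinct_count == profile.row_count:
        rules.append(f"{profile.name} IS UNIQUE")
    if profile.minimum is not None and profile.maximum is not None:
        rules.append(f"{profile.name} BETWEEN {profile.minimum} AND {profile.maximum}")
    return rules


profile = profile_column("OrderTotal", [10.0, 25.5, 99.0, 25.5])
print(suggest_rules(profile))
# ['OrderTotal IS NOT NULL', 'OrderTotal BETWEEN 10.0 AND 99.0']
```

Deriving a range rule directly from the observed minimum and maximum is deliberately naive. A production system would more likely add tolerance or rely on historical distributions, so that legitimate new values do not immediately trip the rule.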

This is what makes the system truly agentic: it adapts its approach based on what it discovers. A customer email column triggers pattern validation and format checks. A financial amount column prompts range validation and cross-table reconciliation. A timestamp column activates freshness monitoring and temporal consistency rules.
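The adaptive behavior described above can be approximated, very roughly, by a dispatch from column characteristics to check categories. The sketch below uses hard-coded keyword and type heuristics purely as a stand-in. The function name `checks_for_column`, the type groups, and the category labels are all hypothetical. The actual workflow relies on LLMs and statistical analysis rather than a lookup table like this.

```python
from typing import List

# Hypothetical type groups; real warehouses use many more type names.
NUMERIC_TYPES = {"NUMERIC", "DECIMAL", "FLOAT64", "DOUBLE"}
TEMPORAL_TYPES = {"DATE", "DATETIME", "TIMESTAMP"}
AMOUNT_KEYWORDS = ("amount", "total", "price")


def checks_for_column(name: str, data_type: str) -> List[str]:
    """Map a column to candidate check categories with simple keyword/type heuristics.

    A stand-in for illustration only: it is not how DQC's agentic workflow decides.
    """
    lowered = name.lower()
    dtype = data_type.upper()
    checks: List[str] = []
    if "email" in lowered:
        # Email columns: pattern validation and format checks
        checks += ["pattern_validation", "format_check"]
    if dtype in NUMERIC_TYPES and any(k in lowered for k in AMOUNT_KEYWORDS):
        # Financial amounts: range validation and cross-table reconciliation
        checks += ["range_validation", "cross_table_reconciliation"]
    if dtype in TEMPORAL_TYPES:
        # Date/time columns: freshness monitoring and temporal consistency
        checks += ["freshness_monitoring", "temporal_consistency"]
    return checks


for column, dtype in [("customer_email", "STRING"), ("OrderTotal", "NUMERIC"), ("OrderDate", "TIMESTAMP")]:
    print(column, checks_for_column(column, dtype))
```

A fixed table like this breaks down quickly on cryptic names (for example SAP field codes), which is exactly the gap that context analysis with LLMs is meant to close.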

From Context to Comprehensive Coverage

Let's look at how this works in practice. Consider a retail company's orders table in BigQuery. When our platform analyzes this table, it identifies the key business fields: OrderDate, OrderTotal, CustomerID, and OrderStatus. Within minutes, it generates rules like:

  • Temporal accuracy: OrderDate must not be in the future

  • Referential integrity: CustomerID must exist in the customers table

  • Business logic: OrderTotal must equal the sum of associated line items (cross-table consistency)

  • Status conformity: OrderStatus must be one of a predefined set of valid values

  • Completeness: Critical fields like CustomerID and OrderTotal must not be null

  • Freshness: New orders should appear within expected time windows
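Several of these rules are simple enough to express directly as code. The following Python sketch applies the temporal, referential, status, completeness, and freshness checks to orders held in memory as dictionaries. The cross-table total check needs a second table and is left out here. The sketch assumes a hypothetical `OrderID` key for reporting, a hypothetical set of valid statuses, and a 24-hour freshness window. In practice, rules like these would run as queries against the warehouse rather than over Python lists.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List, Set

# Hypothetical set of allowed statuses; a real rule would use the business's own list.
VALID_STATUSES = {"pending", "shipped", "delivered", "cancelled"}


def validate_orders(
    orders: List[Dict[str, Any]],
    customer_ids: Set[str],
    now: datetime,
    freshness_window: timedelta = timedelta(hours=24),
) -> List[str]:
    """Apply row-level and table-level checks to in-memory orders; return violation messages."""
    violations: List[str] = []
    for order in orders:
        oid = order.get("OrderID")  # hypothetical key, used only for reporting
        # Completeness: critical fields must not be null
        for field in ("CustomerID", "OrderTotal"):
            if order.get(field) is None:
                violations.append(f"order {oid}: {field} is null")
        # Temporal accuracy: OrderDate must not be in the future
        order_date = order.get("OrderDate")
        if order_date is not None and order_date > now:
            violations.append(f"order {oid}: OrderDate is in the future")
        # Referential integrity: CustomerID must exist in the customers table
        cid = order.get("CustomerID")
        if cid is not None and cid not in customer_ids:
            violations.append(f"order {oid}: CustomerID {cid!r} not found in customers")
        # Status conformity: OrderStatus must be one of the allowed values
        status = order.get("OrderStatus")
        if status not in VALID_STATUSES:
            violations.append(f"order {oid}: OrderStatus {status!r} is not a valid value")
    # Freshness (table level): the newest order must fall within the expected window
    dates = [o["OrderDate"] for o in orders if o.get("OrderDate") is not None]
    if not dates or now - max(dates) > freshness_window:
        violations.append("table: no new orders within the freshness window")
    return violations


now = datetime(2024, 1, 10, 12, 0, tzinfo=timezone.utc)
orders = [
    {"OrderID": 1, "CustomerID": "c1", "OrderTotal": 50.0,
     "OrderStatus": "shipped", "OrderDate": now - timedelta(hours=2)},
    {"OrderID": 2, "CustomerID": "c9", "OrderTotal": None,
     "OrderStatus": "lost", "OrderDate": now + timedelta(days=1)},
]
for message in validate_orders(orders, {"c1"}, now):
    print(message)
```

Order 1 passes every check, while order 2 violates completeness, temporal accuracy, referential integrity, and status conformity.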

For a different context (say, a PostgreSQL user authentication table), the generated rules would look completely different: password hash format validation, email pattern conformity, account creation timestamp logic, and suspicious login attempt detection.

The cross-table consistency check deserves special attention because it represents the kind of sophisticated validation that's particularly painful to create manually. In our orders example, the system doesn't just validate individual columns; it understands that OrderTotal should reconcile with line item details in a separate table. It generates the appropriate join logic and validation rules automatically, catching discrepancies that might indicate data pipeline issues or calculation errors.
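At its core, the reconciliation aggregates line items per order and compares the sum with OrderTotal. Here is a minimal Python sketch. It assumes orders are given as a mapping from a hypothetical order ID to OrderTotal, and line items as (order ID, amount) pairs. It uses `decimal.Decimal` so that currency sums compare exactly instead of suffering floating-point rounding. The function name `reconcile_order_totals` is illustrative, not part of DQC's platform.

```python
from collections import defaultdict
from decimal import Decimal
from typing import Dict, List, Tuple


def reconcile_order_totals(
    orders: Dict[str, Decimal],
    line_items: List[Tuple[str, Decimal]],
) -> List[str]:
    """Flag orders whose OrderTotal differs from the sum of their line items."""
    # Aggregate line items per order (the GROUP BY step of the equivalent SQL).
    sums: Dict[str, Decimal] = defaultdict(Decimal)
    for order_id, amount in line_items:
        sums[order_id] += amount
    mismatches: List[str] = []
    # Compare each order's total with its aggregated line items (the join step).
    for order_id, total in orders.items():
        line_sum = sums.get(order_id, Decimal("0"))
        if total != line_sum:
            mismatches.append(f"order {order_id}: OrderTotal {total} != line items {line_sum}")
    # Line items whose order is missing also point to a pipeline problem.
    for order_id in sorted(sums.keys() - orders.keys()):
        mismatches.append(f"line items reference missing order {order_id}")
    return mismatches


orders = {"A": Decimal("30.00"), "B": Decimal("15.00")}
line_items = [("A", Decimal("10.00")), ("A", Decimal("20.00")),
              ("B", Decimal("5.00")), ("C", Decimal("1.00"))]
for problem in reconcile_order_totals(orders, line_items):
    print(problem)
```

Order A reconciles, order B does not, and the line item for C has no parent order. In a warehouse such as BigQuery, the same check would typically be written as a join between the two tables with a GROUP BY on the order key.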

Two Pathways to Intelligent Rules

We've designed two complementary ways to access this agentic capability, both serving the same goal of automated, context-specific rule creation.

The rule prediction button provides comprehensive coverage with a single click. After connecting your data source, click the button and within minutes you'll have a complete rule set covering accuracy, completeness, freshness, and conformity checks for your table. This is ideal when you are onboarding new data sources or want to establish baseline quality monitoring quickly.

Our DQ-AI Assistant offers a more conversational approach. You can ask for specific types of rules: "Suggest completeness checks for customer data" or "What consistency rules should I have between orders and inventory?" The assistant uses the same agentic workflow but tailors its output to your specific request, making it perfect for iterative refinement or targeted quality improvements.

Both pathways eliminate the need for SQL expertise. Over 90% of our rules are no-code, making data quality accessible to analysts, business users, and anyone who understands their data's business context, regardless of technical background.

The Shift from Manual to Agentic

The transformation from manual rule creation to agentic automation represents more than just a speed improvement, though reducing a 10-minute manual process to 30 seconds certainly matters. It's about democratizing data quality. Previously, creating comprehensive rule sets required specialized SQL knowledge and understanding of statistical tests. Now, that expertise is embedded in the system itself.

This shift enables data teams to focus on what humans do best: understanding business requirements, investigating quality issues, and making decisions about data remediation strategies. The tedious work of translating those requirements into technical validation rules happens automatically.

See It in Action

Agentic rule creation is available now on the DQC Platform. If you are curious about how context-aware automation could accelerate your data quality initiatives, we'd be happy to show you the system in action with your own data.
