Table of Contents
Introduction
Denodo data virtualization performance problems almost always have the same root cause: the platform is being used as a query router when it should be used as an intelligent query optimizer, or the caching layer has been configured for convenience rather than for workload profile. The Denodo query optimizer is sophisticated, but it makes decisions based on available information — and in environments where source statistics are stale and caching policies are applied uniformly, the optimizer is working with incomplete inputs.
This post covers the performance levers that matter in production Denodo deployments — query pushdown, cost-based optimization, caching strategy, MPP integration, and source statistics — with the specificity needed for architects and engineers managing live environments.
Understanding the Denodo Query Optimizer: Rule-Based and Cost-Based
The Denodo Virtual DataPort optimizer runs in two phases. The rule-based phase applies structural simplification rules to the query plan before execution: predicate pushdown, join reordering where provable, and algebraic simplification of filter conditions. The cost-based phase uses statistics about source data cardinality, index availability, and estimated row counts to choose between execution strategies.
Cost-based optimization is only as good as the statistics feeding it. Denodo collects statistics about source views through an explicit statistics collection process — this is not automatic. In environments where source data volumes change significantly over time and statistics are not refreshed, the optimizer will make suboptimal execution decisions based on outdated cardinality estimates. Establishing a regular statistics refresh schedule, particularly for large or rapidly changing source tables, is a foundational performance practice.
The query execution plan for any Denodo view is inspectable through the Execution Trace in the Design Studio. The trace shows which operations were pushed down to source systems, which were executed at the virtual layer, and the estimated vs. actual row counts at each step. Reading execution traces is the primary diagnostic skill for Denodo performance work.
Pushdown Optimization: The First Line of Performance
Predicate pushdown is the most impactful performance mechanism in Denodo’s arsenal. When a filter condition in a Denodo query can be translated into a predicate that the underlying source database understands and executes natively, Denodo delegates that filtering work to the source. The source returns only the matching rows, rather than returning the full table for Denodo to filter in-memory.
The conditions for effective pushdown depend on the capabilities of the source adapter. Well-supported JDBC sources (Oracle, SQL Server, PostgreSQL, Redshift, Snowflake) support full predicate pushdown including complex filter conditions and aggregations. APIs, NoSQL sources, and flat files have more limited pushdown capability, requiring Denodo to perform more work at the virtual layer.
Join pushdown — where Denodo pushes the entire join operation to a source that contains both tables — is a significant performance enabler for joins between tables in the same database. When both tables in a join reside in the same source, Denodo can generate a single SQL statement that executes the join at the source, returning only the joined result set. When tables span different sources, the join must be executed at the virtual layer, with data retrieved from each source separately.
Understanding which joins in your virtual model are cross-source versus same-source is a key part of model design. Where possible, structuring virtual views so that joins between large tables occur within the same source system, with cross-source joins happening at a higher aggregation level, significantly reduces the data volume that Denodo must process in-memory.
Caching Strategy: Selective Materialization, Not Blanket Caching
Denodo’s cache is not a replacement for ETL — it is a selective materialization layer for specific performance scenarios. Applying full caching to every virtual view eliminates the real-time benefit of data virtualization and turns the platform into a replication tool.
The scenarios where caching delivers genuine value are specific: slow or rate-limited sources (web services, APIs, legacy databases with high query latency); sources where the underlying data changes infrequently relative to query frequency; and complex aggregations that are expensive to recompute at query time but used repeatedly by multiple consumers.
Virtual DataPort supports two cache modes. Partial cache stores results for specific filter conditions — when a query is run with filter conditions that were previously cached, the cached result is returned. New filter combinations miss the cache and execute against the source. Full cache materializes the entire view’s data into the cache database. Full caching is appropriate for moderately sized views over slow sources; it is inappropriate for large fact tables where fresh data is required.
Cache invalidation policy matters as much as cache mode. Time-based expiration is the simplest approach but can serve stale data. For sources with event-driven update patterns, configuring cache invalidation to trigger on upstream process completion (via a Denodo stored procedure or scheduler job) keeps cached data current without requiring time-based polling.
MPP Integration: When to Use the Lakehouse Accelerator
Denodo’s MPP (Massively Parallel Processing) engine is engaged when analytical queries require heavy computation — large cross-source joins, window functions, or aggregations that cannot be pushed down to a single source. The Lakehouse Accelerator, available in Denodo Enterprise Plus, integrates Presto (an MPP engine) directly within the Denodo platform.
When the Lakehouse Accelerator is configured, Denodo’s optimizer can choose to move data from source systems into temporary tables in the MPP engine and execute the full query there in parallel — achieving data warehouse-level performance on cross-source analytical queries without requiring a separate data warehouse. The optimizer makes this decision per-query based on cost estimates, selecting MPP execution when the parallel processing benefit outweighs the data movement cost.
The MPP engine also supports Apache Iceberg format for caching and bulk load operations, extending its compatibility with modern lakehouse architectures. In Fabric or Databricks environments, Denodo’s Iceberg-based cache means the cached data is readable by other tools in the lakehouse ecosystem without format conversion.
Conclusion
Denodo performance tuning is a discipline that starts with execution trace analysis, not trial-and-error caching. The highest-return investments are keeping source statistics current, maximizing pushdown through thoughtful virtual model design, and applying caching selectively to the scenarios where it genuinely reduces source load. MPP integration is the tool for heavy analytical workloads where federation alone is insufficient.
Prism Analytics partners with enterprises on Denodo data virtualization implementations — from initial architecture through performance tuning and ongoing optimization. Contact us to explore how we can help.
