Direct Lake in Microsoft Fabric is a storage mode that lets Power BI semantic models query data in OneLake directly. Positioned as a step up from DirectQuery and Import mode, it aims to remedy the limitations of both.
This article takes a closer look at how well it makes Power BI more agile and efficient, especially in light of the AtScale TPC-DS Benchmark Report.
Performance Concerns With Power BI/Direct Lake
Import mode in Power BI is storage- and resource-intensive and requires frequent data refreshes. DirectQuery, though live, is constrained by the source system's response time.
Direct Lake aims to address both issues by bypassing the warehouse/lakehouse SQL endpoint and reading the underlying data in OneLake directly.
Does it deliver?
Query Failures
During the AtScale tests, Power BI/Direct Lake returned queries in around 500 milliseconds at the 100 GB scale, which is reasonably fast. As data volumes grew beyond 1 TB, however, it hit performance bottlenecks and queries started timing out.
Moreover, Power BI fell back to DirectQuery mode to keep queries from failing, bringing back the very performance issues Direct Lake was meant to avoid.
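To make the fallback risk concrete, here is a minimal, purely illustrative Python pre-check of whether a model is likely to stay in Direct Lake mode. The threshold values are assumptions for demonstration only; the real fallback guardrails vary by capacity SKU and are documented by Microsoft.

```python
# Illustrative only: a rough pre-check of whether a model is likely to stay in
# Direct Lake mode or fall back to DirectQuery. The threshold values below are
# placeholder assumptions, not Microsoft's published guardrails, which vary by SKU.

ASSUMED_GUARDRAILS = {
    "max_rows_per_table": 1_500_000_000,   # assumed limit, for illustration only
    "max_model_memory_gb": 25,             # assumed limit, for illustration only
}

def likely_to_fall_back(row_counts: dict, model_memory_gb: float) -> bool:
    """Return True if the model would likely breach an assumed guardrail."""
    if model_memory_gb > ASSUMED_GUARDRAILS["max_model_memory_gb"]:
        return True
    return any(rows > ASSUMED_GUARDRAILS["max_rows_per_table"]
               for rows in row_counts.values())

# Example: a large fact table at the ~1 TB scale blows past the assumed row limit.
print(likely_to_fall_back({"fact_sales": 4_000_000_000}, model_memory_gb=60))  # True
```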
Lazy Loading + Fallback to DirectQuery = Performance Lag
As per the AtScale test results, refreshing Power BI models slowed performance, with a cold cache effect kicking in. After a refresh the cache is cold: the column data a query needs hasn't yet been loaded from OneLake into memory, so early queries pay that loading cost. Only once the cache is warm, with frequently queried columns already resident in memory, do response times recover.
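A toy model of on-demand column loading helps illustrate the effect. The sketch below is a simplification for illustration, not Direct Lake's actual implementation: the first query after a refresh finds an empty cache and pays the load cost, while repeat queries hit columns that are already in memory.

```python
import time

# Toy model of a cold vs. warm cache: columns are loaded into memory only when a
# query first touches them. A simplification for illustration, not Direct Lake's code.

LOAD_COST_SECONDS = 0.5          # assumed cost of pulling one column from OneLake
column_cache = {}

def read_column(name):
    if name not in column_cache:             # cold: pay the load cost
        time.sleep(LOAD_COST_SECONDS)
        column_cache[name] = f"{name}_data"
    return column_cache[name]                # warm: served straight from memory

def run_query(columns):
    start = time.perf_counter()
    for col in columns:
        read_column(col)
    return time.perf_counter() - start

column_cache.clear()                          # a refresh leaves the cache cold
print(f"cold run: {run_query(['sales', 'date']):.2f}s")   # pays two load costs
print(f"warm run: {run_query(['sales', 'date']):.2f}s")   # near-instant
```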
A semantic model refresh on top of this can exacerbate the memory pressure. Semantic models need space to store information about data structures, dataset associations and their multiple representations, and they can roughly double in memory during a refresh because Power BI/Direct Lake keeps an exact copy of the model until the refresh completes.
When the data exceeded the available memory in the tests, the solution reverted to DirectQuery mode.
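A back-of-the-envelope headroom calculation shows why a refresh can tip a model over the edge. The model and capacity sizes below are illustrative assumptions, not figures from the AtScale report.

```python
# Back-of-the-envelope headroom check: if a refresh temporarily holds two copies
# of the model, peak memory is roughly double its steady-state size. The sizes
# below are illustrative assumptions, not figures from the benchmark.

model_size_gb = 18           # assumed in-memory size of the semantic model
capacity_memory_gb = 25      # assumed memory limit of the capacity SKU

peak_during_refresh_gb = model_size_gb * 2    # original copy + refreshed copy
if peak_during_refresh_gb > capacity_memory_gb:
    print(f"Refresh needs ~{peak_during_refresh_gb} GB but only "
          f"{capacity_memory_gb} GB is available: expect eviction or a "
          "fallback to DirectQuery.")
```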
Fabric Lakehouse Limitations
Fabric relies on the Power BI web experience for data modeling, and this coupling isn't without its considerations. The web editor doesn't support calculated columns because DAX support there is unavailable, and building models in Power BI Desktop isn't an option since there's no way to share them with the web version.
The AtScale team had to exclude certain tests as they encountered modeling issues with the Semantic Model web editor. Though query results improved somewhat when switching to a higher capacity configuration, the fallback behavior persisted.
Microsoft Support for Semantic Layer Technologies
Microsoft provides several workarounds for large data sizes: semantic model scale-out, autoscale and the large semantic model storage format setting.
Semantic Model Scale-Out
As discussed above, Power BI memory gets stretched when a semantic model refreshes because a replica is created. Scale-out builds on that replication: Microsoft provides one read-write replica and multiple read-only semantic model replicas, and every time the combined CPU usage of the active read-only replicas exceeds the capacity, a new replica is created to share the load.
A single read-write replica might circumvent data duplication during write-back, but it can also be limiting: when query overload happens, throttling is applied, which can add latency and reduce throughput.
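The scaling rule described above can be sketched as a simple loop. This mirrors the behavior as described, with made-up CPU numbers; it is not the capacity platform's actual algorithm.

```python
# Illustrative sketch of the scale-out rule described above: whenever the combined
# CPU usage of the active read-only replicas exceeds the capacity threshold, one
# more replica is added. The CPU numbers are made up.

CAPACITY_CPU_THRESHOLD = 100.0        # assumed CPU budget, arbitrary units

def scale_replicas(cpu_samples):
    """Return the read-only replica count after processing per-slot CPU samples."""
    replicas = 1
    for sample in cpu_samples:
        combined = sum(sample[:replicas])     # CPU used by the active replicas
        if combined > CAPACITY_CPU_THRESHOLD:
            replicas += 1                     # spin up another read-only replica
    return replicas

# Three time slices of per-replica CPU load; the spike in slice two forces scale-out.
samples = [[60.0, 0.0, 0.0], [120.0, 0.0, 0.0], [70.0, 55.0, 0.0]]
print(scale_replicas(samples))        # -> 3 after two threshold breaches
```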
Autoscale
While autoscaling sounds like a viable solution for handling data surges, it comes with its own set of considerations in the context of Power BI and Direct Lake.
When demand exceeds what the capacity can handle, autoscale can add one vCore at a time for a 24-hour period, at roughly $85 per vCore per day. If autoscale triggers frequently, you're looking at a significant cost center and might have to shell out a tidy sum for a higher SKU anyway.
Plus, performance may dip briefly as the system adjusts to the new vCore, and the 24-hour window before another core can be added might be too long for highly dynamic workloads. As an alternative, pre-aggregating data might improve query response times, but Power BI doesn't do that.
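A quick estimate makes the trade-off tangible. The sketch below assumes the roughly $85 per vCore per 24-hour rate cited above; actual pricing varies by region and agreement.

```python
# Rough autoscale cost estimate. The $85 per vCore per 24-hour rate follows the
# figure cited above; actual pricing depends on region and licensing agreement.

RATE_PER_VCORE_PER_DAY = 85.0

def autoscale_cost(days_triggered_per_month, vcores_added=1, months=1):
    """Estimated spend when each trigger bills a vCore for a 24-hour block."""
    return days_triggered_per_month * vcores_added * RATE_PER_VCORE_PER_DAY * months

# Surges that add one vCore on 10 days a month cost $850/month, about $10,200 a
# year, a figure worth comparing against the jump to a higher SKU.
print(autoscale_cost(days_triggered_per_month=10, months=12))   # 10200.0
```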
Large Semantic Model Storage Format
Data compression in Power BI is an effective way to shrink a semantic model's footprint, and the large storage format lifts the default size ceiling; more efficient memory management, in turn, drives faster responses.
However, the feature isn't available with a Pro license, the model size in Premium is still capped at 100 GB, and once a model uses the large format, downloading it as a .pbix file can be tricky.
Cost Considerations
Microsoft has priced Fabric at par with the Power BI Premium capacity model. You’re charged for the allocated CPU, memory and storage. However, you’ll have to shell out extra for OneLake storage costs, and frequent data-intensive workloads can burn a hole in your pocket.
While Fabric offers dashboards for monitoring usage and costs, per-user metrics might not be available, hindering smaller teams looking to save every dime. As the platform is still under development, guesstimates might be your best bet while planning a long-term investment in Microsoft Fabric.
Alternatives
The maturity of the Power BI/Direct Lake-Fabric universe is a consideration for some users. On the other hand, vendors like AtScale and Kyvos offer mature semantic layers that can scale to handle thousands of users while maintaining sub-second query responses.
They can process terabytes of data and support effectively unlimited model sizes because aggregates are persisted on disk. AtScale, for example, creates materialized views in the source system and allocates storage to the aggregate tables while keeping them available to users.
With an advanced OLAP engine, Kyvos handles unbalanced and ragged hierarchies. When entities don’t go down to the required level of detail or skip a level, it transforms them into a consistent format. Plus, it supports MDX functions for calculated measures out of the box.
Kyvos offers dedicated connectors for Power BI, Tableau, MicroStrategy and many other BI tools. Incremental refreshes every few minutes keep data up to date, and the platform can accommodate millions of rows in a single request. Intelligent data aggregation supports sub-second queries on billions of rows without performance degradation.
Conclusion
In its current form, Power BI/Direct Lake offers advantages for users inside the Microsoft ecosystem and for queries on small to medium-sized datasets. It will be interesting to see how it evolves in the coming years.
However, established semantic layer solutions might be better suited for handling extremely large data volumes or for functionalities not yet available in Power BI/Direct Lake.