There is a new generation of business intelligence and analytics tools that let us pull information directly from the source without a complicated data warehouse in between. It sounds appealing: you can build an impressive dashboard with real-time data. The question is when to use real-time (live) data versus scheduled extractions.
Dashboards are an excellent way to expose data through different components, such as charts, indicators, filters, text, images, or web pages, on a single screen. When used the right way, dashboards make it possible to identify trends, contributions, and peaks at a glance.
Depending on the data, the process, the frequency with which the dashboard's information is used, and other parameters, the right approach to handling the data can differ.
Sometimes you need a daily visualization of a set of KPIs from the previous day’s close. For this group of KPIs, an extraction of the data overnight looks like an acceptable solution.
Several tools offer this feature: you can schedule an extraction and keep it in memory or in an embedded database. This snapshot of the data can include transformations and aggregations generated in advance, which also gives you fast report execution.
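As a minimal sketch of that idea, assuming a hypothetical `sales` table in the source and using SQLite as a stand-in for both the source and the embedded snapshot database:

```python
import sqlite3

def build_snapshot(source_conn, snapshot_path="snapshot.db"):
    """Scheduled (e.g., overnight) job: extract from the source and
    pre-aggregate into an embedded database that the reporting tool reads."""
    snap = sqlite3.connect(snapshot_path)
    snap.execute("DROP TABLE IF EXISTS daily_kpi")
    snap.execute(
        "CREATE TABLE daily_kpi (day TEXT, region TEXT, revenue REAL, orders INTEGER)"
    )
    # The expensive aggregation runs once here, not on every dashboard view.
    rows = source_conn.execute(
        "SELECT day, region, SUM(amount), COUNT(*) FROM sales GROUP BY day, region"
    ).fetchall()
    snap.executemany("INSERT INTO daily_kpi VALUES (?, ?, ?, ?)", rows)
    snap.commit()
    return snap
```

A scheduler (cron, Airflow, or the BI tool's own scheduler) would call `build_snapshot` once per night; dashboard queries then read the small, pre-summarized `daily_kpi` table instead of the source.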
On some data sets, users may request frequent extractions during the day because they need the latest information from the source. This kind of need can bring disadvantages to that source: performance may suffer, the network may slow down, and other users of the same provider may even be disrupted. In those cases, another option needs to be evaluated: perhaps the dashboards can go directly to the source and show live data.
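By contrast, a live connection runs its query against the source on every dashboard refresh. A sketch under the same hypothetical `sales` table assumption:

```python
import sqlite3

def live_kpi(source_conn):
    """Runs on every dashboard refresh: the query goes straight to the
    source, so users see current data but the source carries the load."""
    return source_conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    ).fetchall()
```

Every viewer of the dashboard triggers this query again, which is exactly the load concern described above.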
There are many things to take into consideration in both scenarios. We need to think about metrics that involve many aggregations, complicated transformations, large volumes of data, or historical data that needs to be exposed. Depending on these characteristics, you might consider a data warehouse, which can provide better performance.
So the question is: do we need to take the information directly from the source each time the user requests it, or can we use a snapshot of the data and get the information in a two-step process?
The answer will depend on multiple factors, such as how frequently the data is accessed, the amount of data, and the processing power you have. You also need to consider whether the owner of the source will let you access it during the day without limitations.
Here is a group of key points to help you figure out which option best fits your requirement:
Live data:

- Enables monitoring and faster response time to issues.
- Could create performance issues (e.g., depending on the amount of data, calculations, etc.).
- Recommended for sources that do not need additional manipulation and are ready for reporting.

Extractions:

- Allow the creation of time-based reports (e.g., financial statements) for performance review.
- Typically offer better performance, as data can be pre-aggregated.
- Adjustments or corrections to historical data require reloading multiple periods.
- Snapshots are likely to be much faster than live queries because the data is already in memory or in a database within the reporting tool.
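The two-step (snapshot) path can be sketched as follows: the dashboard reads only a pre-aggregated table in the embedded database and never touches the source at view time (the `daily_kpi` table name is hypothetical):

```python
import sqlite3

def refresh_dashboard(snapshot_conn):
    """Step two of the two-step process: serve the dashboard from the
    small, pre-aggregated snapshot; the live source is never queried."""
    return snapshot_conn.execute(
        "SELECT region, SUM(revenue) FROM daily_kpi GROUP BY region ORDER BY region"
    ).fetchall()
```

Because the heavy work already happened at extraction time, this read is cheap no matter how many users refresh the dashboard.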
We hope this information answers any questions you may have about this new way to query data.
If you would like to discuss more, please contact us at firstname.lastname@example.org