Quirky query

Post by **fmulder** » Wed Aug 16, 2017 8:38 am

We implemented a panel to collect historical data from value archives. The user can define a filter to limit the amount of data and to get faster(!) results. Unfortunately, applying the filter did not make it faster and i did some research.

TIP: Give the datamanager the debgug flag '-dbg query' to see what actually happens !

Lets first examine the following query:

Code: Select all

  string strQuery = 
    "SELECT 'COUNT(_original.._value)' "
    "FROM "
      "'MyDatapoint.variables.*' REMOTE 'AME:' "
    "WHERE "
      "(_DPT =\\"peanutbutter\\")  TIMERANGE(\\"2017.08.08 00:00:00\\",\\"2017.08.11 00:00:00\\",1,0)";

We should not get any data at all when we use the WHERE clause 'peanutbutter' (because that is clearly not how I'd call my datapoint types).

When you now look at the debug messages from the datamanager then you'll see that it is intelligent enough to realize that there are no datapoints matching the WHERE clause. The query is fast and there are no results.

But now consider the following query:

Code: Select all

  string strQuery = 
    "SELECT 'COUNT(_original.._value)' "
    "FROM "
      "'MyDatapoint.variables.*' REMOTE 'AME:' "
    "WHERE "
      "(_DPT =\\"peanutbutter\\")  OR (_DPT =\\"chocolate\\") TIMERANGE(\\"2017.08.08 00:00:00\\",\\"2017.08.11 00:00:00\\",1,0)";

This query will still return no results, but it will take quite some time. Investigation of the debug log messages shows:

* The Datamanager will first (through the value archives) collect all data matching the 'FROM' and the timerange
* It will then apply the WHERE and conclude that no dp element matched the WHERE
* And will eventually return an 'empty' result

What can we learn from this:

* Apparently: a simple WHERE is done before collecting data.
* A complex WHERE is done after collecting ALL the data !

Our partial solution : we've moved the selection of datapoints into the FROM statement. Our FROM statement is now huge and our WHERE is now gone. If we now have 14 dp elements that match our filter, then our FROM mentions 14 dp elements and we see that the Datamanager is collecting data from exactly 14 dp elements.

Hope this helps and explains why your queries sometimes take a long time

Share the fun
Frenk Mulder

Fri Sep 01, 2017 9:10 am

During my tests I did not see in duration for the different queries.

But in my experience it is better to use an IN list instead of multiple OR-ed comparisons.

Code: Select all

WHERE _DPT IN ("peanutbutter","chocolate")

Post by **fmulder** » Fri Sep 01, 2017 10:02 am

You miss my point.

The 'where' statement that I show is completely bullshit. The DataManager should not have done any data collection at all. The time should have been 0 because there's no datapoint at all matching the where statement.
Still, the datamanager first collects all the data and ... then concludes that none of the collected data matches the WHERE statement ????

share the fun
Frenk Mulder

Post by **leoknipp** » Fri Oct 06, 2017 4:06 pm

I did tests with different queries.
During the tests I have seen what Frenk has observed. If the WHERE statement causes more than one _DPT condition the query is started, even if the result for the FROM statement does not fit the WHERE clause.

When using the _DPT IN statement the query is not started if the FROM filter does not match the dp type filter.

If the WHERE condition can be applied before starting the query depends not only on the complexity of the condition. It also depends on the WHERE condition itself.
If you do TIMERANGE query and filter for one of the configs + attributes in the SELECT condition the WHERE condition must be applied after reading data, e.g.
SELECT '_offline.._value' FROM 'Exa*' WHERE '_offline.._value' = 0 TIMERANGE("2017.10.05 15:35:22","2017.10.06 15:35:23",1,0)

Best Regards
Leopold Knipp
Senior Support Specialist

Post by **fmulder** » Sun Oct 08, 2017 8:59 pm

Leo,

Thanks for the research. This query did help me to better understanedhow the queries are actually being executed.

As a solution I now have a script function that basically creates a very large "FROM" statement. i found that it is no problem to build a FROM statement that shows 1000's of DP elements.

So:
* if you have 20.000 dp elements
* But you want a query to collect data from 1000 of them
* Then simply create your own FROM statement with 1000 DP elements.

Share the fun
Frenk Mulder

Post by **leoknipp** » Mon Oct 09, 2017 8:35 am

In case of RDB archiving reading historical data takes a lot of time if queryRDBDirect = 1 is used and the FROM statement contains a lot of entries.
This issues is already known, right now we do not have a solution for it.

Best Regards
Leopold Knipp
Senior Support Specialist

HowTos

Quirky query

Quirky query

Re: Quirky query

Re: Quirky query

Re: Quirky query

Re: Quirky query

Re: Quirky query