Mar 032014
 

The optimizer does particularly badly with the query below. It estimates the query will return 5,101 rows when in fact nothing matches the predicate:

select *
from   this_query_returns_nothing
where  there_are_no_values < 0;

no rows selected


Execution Plan
----------------------------------------------------------
Plan hash value: 1995374266

----------------------------------------------------------------
| Id  | Operation         | Name                       | Rows  |
----------------------------------------------------------------
|   0 | SELECT STATEMENT  |                            |  5101 |
|*  1 |  TABLE ACCESS FULL| THIS_QUERY_RETURNS_NOTHING |  5101 |
----------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("THERE_ARE_NO_VALUES"<0)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         45  consistent gets
          0  physical reads
          0  redo size
        304  bytes sent via SQL*Net to client
        365  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          0  rows processed

Creating an index doesn’t help us because there’s only a few distinct values in the THERE_ARE_NO_VALUES column:

select count (distinct there_are_no_values) 
from   this_query_returns_nothing;

         C
----------
         2

create index no_better on 
this_query_returns_nothing ( there_are_no_values );

Index created.

select *
from   this_query_returns_nothing
where  there_are_no_values < 0;

no rows selected


Execution Plan
----------------------------------------------------------
Plan hash value: 1995374266

----------------------------------------------------------------
| Id  | Operation         | Name                       | Rows  |
----------------------------------------------------------------
|   0 | SELECT STATEMENT  |                            |  5101 |
|*  1 |  TABLE ACCESS FULL| THIS_QUERY_RETURNS_NOTHING |  5101 |
----------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("THERE_ARE_NO_VALUES"<0)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         45  consistent gets
          0  physical reads
          0  redo size
        304  bytes sent via SQL*Net to client
        365  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          0  rows processed

We can see that the lowest value in the table is 0:

select min(there_are_no_values)
from   this_query_returns_nothing;

MIN(THERE_ARE_NO_VALUES)
------------------------
                       0

More importantly, we know that values less than 0 won’t be recorded in the table.

How can we get the optimizer to figure this out? Answers in the comments!

(note: consider the stats up-to-date. I didn’t do anything to “trick” the optimizer by creating fake/non-representative stats).

  6 Responses to “SQL Quiz: The optimizer got it wrong. How can we make it do better?”

  1. Hi Chris,

    from your query it looks like the THERE_ARE_NO_VALUES-column is integer/number. Is this really the case? If not, it might be a good idea to change the datatype.
    Does it help to put a Check constraint on the table, allowing only the possible values?

    Is this a riddle or a real-life problem?

    Best regards,
    Salek

    • You’re on the right track with a check constraint :)

      The column is an integer. It’s a riddle, though it could plausibly be a real world problem if people aren’t aware of the upper/lower bounds of your data.

      • OK,

        then I go with
        – create check constraint: ALTER TABLE this_query_returns_nothing ADD CONSTRAINT CK_tqrn_tanv1 CHECK ( there_are_no_values IN () ); OR
        – create check constraint: ALTER TABLE this_query_returns_nothing ADD CONSTRAINT CK_tqrn_tanv2 CHECK ( there_are_no_values BETWEEN x AND y );
        – set not null (if correct): ALTER TABLE this_query_returns_nothing MODIFY ( there_are_no_values NOT NULL );
        – create foreign key (if correct): ALTER TABLE this_query_returns_nothing ADD CONSTRAINT FK_tqrn_parenttable FOREIGN KEY ( there_are_no_values ) REFERENCES parenttable ( id );

        Best regards,
        Salek

        • Your second option is the closest to what I did – added a check constraint the there_are_no_values >= 0. The optimizer is then able to automatically add this predicate to the query and recognise that it won’t return any data! I’ll post a full example later (probably tomorrow) unless someone beats me to it.

  2. Hello Chris,

    and what is happening if you hint dynamic sampling in the query?


    select /*+ dynamic_sampling(t 10) */ *
    from this_query_returns_nothing t
    where there_are_no_values < 0;

    regards,
    Peter

    • That may help, but we can do better. Salek’s onto the right answer – see my response to him.

 Leave a Reply

(required)

(required)

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>