Monday 2 May 2016

The Importance of Virtual Destructors

Our system had a memory leak. It wasn't huge, but network software (unlike phones or laptops) has to run for months or years without failure, so it was unacceptable. Unlike most complex software in C and C++, we are not plagued by memory leaks, so this was a bad surprise. We make extensive use of smart pointers (though not the horrible shared_pointer class, which is an ugly accident waiting to happen) and the RAII paradigm. As a result, objects get deleted when they should, whether they are short-term temporaries or long-term configuration information. Yet, we were leaking memory.

The fix, once we tracked it down, turned out to be the addition of a one-line function with an empty body:

virtual ~query_helper_iterator_base() { };

How can an empty function fix a complex memory leak? Read on...

Our system (STM) keeps track of several very large collections, of potentially millions of objects: network applications, hosts, users, network routes and several others. It provides a SQL-like Rest API that can be used to retrieve them based on complex filters and other criteria. For example, you can say "find the top 20 applications ordered by traffic rate, and tell me their current traffic rate and flow count".

Our web-based GUI maintains several charts, which it updates periodically every few seconds via the Rest API. Many users can run their own copy of the GUI, so these complex queries can be seen several times per second. The problem is that they require the whole collection to be scanned, and every entry to be inspected. As you may imagine, this is very expensive in CPU cycles and memory bandwidth, and we found it was a performance problem for our larger installations.

All of these collections are scanned in the background every few seconds, to maintain performance information. It doesn't seem hard, as we scan the whole collection and have them in L1 cache, to preload a cache for common queries such as the one above. Hence we came to write the query_helper collection of classes. Thus armed, we can say to a collection "do you have a cache which matches this query?". For example, we keep a cache of the top 1000 applications by traffic rate. When the GUI asks for the current top 20, we simply look at the cache and pick the top 20 from it - much easier than scanning a million applications.

Different caches have different implementations, depending on the nature of the query they are designed to support. For example, some use a simple STL vector, while others, which are explicitly ordered, use an STL set. This implementation detail needs to be hidden from the internal user, who merely knows that they have been given a cache that matches their query. This is, of course, an obvious application of virtual functions and pure base classes. The user sees a query_helper object, whose implementation is hidden behind them. So far, so straightforward.

The problem comes when we consider the iterators for these collections. The normal usage is a range-based for loop, which has to create a concrete iterator over the collection:

query_helper *qh = collection->find_matching_helper(rest_query);
if (qh) {
    for (object *obj : qh) {
        ...

The type of the iterator must be known, in detail, at compile time, yet the full type of the collection varies dynamically at runtime. How can this be made to work?

The solution is to use a second level of iterator, to which the base class iterator simply contains a pointer. That can in turn use virtual functions to implement an iterator in terms of the specific implementation of the actual query_helper object. The outer iterator simply delegates every function to the corresponding virtual function of the inner iterator. The only complication is that creating an iterator requires a dynamic memory allocation, but in the overall scheme of the Rest API implementation this is small stuff.

By now you're probably wondering what this has to do with our memory leak. I need to explain another implementation detail. The caches are rebuilt from scratch at every scan of the collection. But what happens if someone is using one when this happens? How can we know when it is safe to delete the old cache? The answer is that our cache uses our generic intrusive reference count mechanism. The object contains a reference count, and the same "mixin" class (called reference_countee) that provides it also provides a class-specific smart pointer that manipulates the counter, deleting the object when the count falls to zero.

So all we have to do is include said smart pointer in the iterator object, and it will protect the old cache from being deleted until the iterator is deleted. No possibility of dangling references, and all taken care of by C++'s constructor and destructor system. Nothing can go wrong...

Except (as you've guessed) something did go wrong. In fact, any cache that got used during its brief lifetime was never getting deleted. Reading the code, everything should have worked. The outer iterator (for the query_helper base class) has a destructor, which in turn deletes the inner iterator it points to The inner iterator's destructor destructs the smart pointer, which in the process decrements the reference count and, if this was the last user, deletes the object. That mechanism - the same identical C++ source code - is used in dozens of places in the code, and it works.

This, finally, is where the empty one line function comes in. C++, because of its heritage in C, doesn't assume that a function is virtual. Unless it's explicitly declared to be, the compiler assumes it can safely generate a static call to the member function of the actual class pointed to, rather than incurring he extra cost of using a function pointer. The base class of the inner iterator has no content, so the automatically-generated destructor does nothing.

So... when the outer iterator was deleted, its destructor was called. That correctly called the destructor for its contained inner iterator (thanks to std::auto_ptr... don't get me started on why the new unique_ptr is a terrible idea). But, instead of calling the proper destructor for the derived class, which would decrement the reference count and hence allow the old cache to finally be deleted, it was just calling the trivial destructor for the base class.

It's not the code in our one-line function that matters, it's simply the fact that it exists and (most important) is declared virtual. That tells the compiler that instead of generating a static call, it must look in the object's virtual function table (vtable) to find the corresponding function in the derived class.

C++11 purists will say, correctly, that there is another way this should be caught. If we define a function whose sole purpose is to override a virtual function in a base class, it should be given the 'override' qualifier. This would have generated a compiler error since the base class didn't in fact have such a function to override. We consider our wrists duly slapped, and will redouble our efforts to ensure that such functions are duly qualified.

This is pretty hard to understand, so at risk of making it even harder, here are some code snippets to illustrate what is going on. (It's not quite right yet - it's very hard to get the blog system to handle angle brackets).

/************************************************************************
 * First the base type for the INNER iterator - the one that MUST have
 * the critical virtual destructor. All its member functions, like
 * operator++ shown below, are pure virtual, and there are no member
 * variables.
 ************************************************************************/

class subtype_iterator_base : public std::forward_iterator_tag
{
public:
    // virtual ~subtype_iterator_base() { }; // the vital virtual destructor!
    virtual subtype_iterator_base &operator++() = 0;
    ....
};

/************************************************************************
 * Now the base query_helper class. This is the only class seen by the
 * users of this function - they are never aware of any of its derived
 * classes.
 ************************************************************************/

class query_helper
{
    ....    
/************************************************************************
 * Now the OUTER iterator of the base class. It contains only one member
 * variable, an auto_ptr to the base INNER iterator which actually does the 
 * work. All its member functions simply delegate to the corresponding
 * virtual member function of the inner iterator.
 ************************************************************************/

    class iterator : public std::forward_iterator_tag
    {
    private:
        auto_ptr my_sub_iter;
    public:
        iterator &operator++() { my_sub_iter->operator++(); return *this; };
        ....
    };
    ....
};

/************************************************************************
 * Finally, a specific derived class of the query_helper class, in this
 * implemented (somehow) using an STL set. It defines an iterator class,
 * derived from subtype_iterator_base, which in turn has a smart
 * reference-counting pointer to the query_helper it is using.
 * 
 * Note the inheritance from reference_countee (using CRTP), which provides
 * both the intrusive reference count and the ::pointer smart pointer
 * class. 
 ************************************************************************/

template<class OBJECT_CLASS, class COLLECTION_CLASS>
class query_helper_xxx : public query_helper,
                         public reference_countee<query_helper_xxx>
{
private:
    std::set<OBJECT_CLASS*> my_items;
public:
    class my_iterator_t : public subtype_iterator_base
    {
    private:
        typename std::set<OBJECT_CLASS>::iterator my_iterator;
        typename query_helper_xxx<OBJECT_CLASS,COLLECTION_CLASS> ::pointer 
            my_qh;
    public:
        C *operator*() const override { return *this->my_iter; };
        ....
    };
    ....
};

No comments: