Fun Iterating PagedCollections With Generics and Iterators


Oh boy, are you in for a roller coaster ride now!

Let me start with a question: how do you iterate through a large collection of data without loading the entire collection into memory?

The following scenario probably sounds quite familiar to you. You have a lot of data to present to the user. Rather than slapping all of the data onto a page, you display one page of data at a time.

One technique for this approach is to define an interface for paged collections like so…

/// <summary>
/// Base interface for paged collections.
/// </summary>
public interface IPagedCollection
{
    /// <summary>
    /// The total number of items being paged through.
    /// </summary>
    int MaxItems { get; set; }
}

/// <summary>
/// Base interface for generic paged collections.
/// </summary>
public interface IPagedCollection<T> 
    : IList<T>, IPagedCollection
{
}

The concrete implementation of a generic paged collection is really simple.

/// <summary>
/// Concrete generic base class for paged collections.
/// </summary>
/// <typeparam name="T">The type of item in the collection.</typeparam>
public class PagedCollection<T> : List<T>, IPagedCollection<T>
{
    private int maxItems;

    /// <summary>
    /// Gets or sets the total number of items being paged through.
    /// </summary>
    public int MaxItems
    {
        get { return this.maxItems; }
        set { this.maxItems = value; }
    }
}

A method that returns such a collection will typically have a signature like so:

public IPagedCollection<DateTime> GetDates(int pageIndex
    , int pageSize)
{
    //Some code to pull the data from database 
    //for this page index and size.
    return new PagedCollection<DateTime>();
}

A PagedCollection represents one page of data from the data source (typically a database). As you can see from the above method, the consumer of the PagedCollection handles tracking the current page to display. This logic is not encapsulated by the PagedCollection at all. This makes a lot of sense in a web application since you will only show one page at a time.
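
To make the contract concrete, here is a minimal sketch of such a method. It assumes an in-memory list stands in for the database, and `InMemoryDateSource` is a hypothetical name that is not part of the downloadable code. It also assumes `MaxItems` carries the total number of items across all pages, as the interface comment describes.

```csharp
using System;
using System.Collections.Generic;

public interface IPagedCollection
{
    int MaxItems { get; set; }
}

public interface IPagedCollection<T> : IList<T>, IPagedCollection
{
}

public class PagedCollection<T> : List<T>, IPagedCollection<T>
{
    private int maxItems;

    public int MaxItems
    {
        get { return this.maxItems; }
        set { this.maxItems = value; }
    }
}

// Hypothetical stand-in for a database-backed data source.
public class InMemoryDateSource
{
    private readonly List<DateTime> allDates = new List<DateTime>();

    public InMemoryDateSource(int count)
    {
        DateTime start = new DateTime(2006, 8, 1);
        for (int i = 0; i < count; i++)
        {
            this.allDates.Add(start.AddDays(i));
        }
    }

    public IPagedCollection<DateTime> GetDates(int pageIndex, int pageSize)
    {
        PagedCollection<DateTime> page = new PagedCollection<DateTime>();

        // Assumption: MaxItems is the total across all pages, not the page size.
        page.MaxItems = this.allDates.Count;

        // Copy only the slice of items that belongs to the requested page.
        int first = pageIndex * pageSize;
        int last = Math.Min(first + pageSize, this.allDates.Count);
        for (int i = first; i < last; i++)
        {
            page.Add(this.allDates[i]);
        }
        return page;
    }
}
```

The caller still decides which page to ask for. The page itself only knows its own items and the overall total.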

But there are times when you might wish to iterate over every page, as in a streaming situation.

For example, suppose you need to perform some batch transformation of a large number of objects stored in the database, such as serializing every object into a file.

Pulling every object into memory and iterating over the huge collection leaves you with either one really big call to Flush() at the end or, if you flush after each item, far too much flushing. A better approach might be to page through the objects, calling Flush() once after each page.

The CollectionBook class is useful for just that purpose. It is a class that makes use of iterators to iterate over every page in a set of data without having to load every record into memory.

You instantiate the CollectionBook with a PagedCollectionSource delegate. This delegate is used to populate the individual pages of the data we are iterating over.

public delegate IPagedCollection<T> 
    PagedCollectionSource<T>(int pageIndex, int pageSize);

When iterating over the pages of a CollectionBook instance, each iteration calls the delegate to retrieve the next page of data (an instance of IPagedCollection<T>). This uses the new iterators feature of C# 2.0.

Here is the code for the enumerator.

/// Iterates through each page one at a time, calling the 
/// PagedCollectionSource delegate to retrieve the next page.
public IEnumerator<IPagedCollection<T>> GetEnumerator()
{
  if (this.pageSize <= 0)
    throw new InvalidOperationException
      ("Cannot iterate a page of size zero or less");

  int pageIndex = 0;
  int pageCount = 0;

  if (pageCount == 0)
  {
    //The first page tells us how many items exist in total.
    IPagedCollection<T> page 
      = pageSource(pageIndex, this.pageSize);
    pageCount = (int)Math.Ceiling((double)page.MaxItems / 
      this.pageSize);
    yield return page;
  }

  //We've already yielded page 0, so start at 1
  while (++pageIndex < pageCount)
  {
    yield return pageSource(pageIndex, this.pageSize);
  }
}
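
The page count arithmetic deserves a second look. Assuming the divisor is the page size, the cast to double matters because dividing one int by another truncates, which would silently drop a partially filled last page. Here is a small sketch of the calculation on its own. `PageMath` and both method names are hypothetical helpers for illustration, not part of the downloadable code.

```csharp
using System;

// Hypothetical helper that isolates the page count calculation.
public static class PageMath
{
    // Same approach as the enumerator: divide as doubles, then round up.
    public static int PageCount(int totalItems, int pageSize)
    {
        if (pageSize <= 0)
            throw new ArgumentOutOfRangeException("pageSize");
        return (int)Math.Ceiling((double)totalItems / pageSize);
    }

    // Integer-only equivalent for non-negative totals. Note that the
    // addition can overflow when totalItems is close to int.MaxValue.
    public static int PageCountInteger(int totalItems, int pageSize)
    {
        if (pageSize <= 0)
            throw new ArgumentOutOfRangeException("pageSize");
        return (totalItems + pageSize - 1) / pageSize;
    }
}
```

With seven items and a page size of three, both methods report three pages, whereas plain integer division (7 / 3) would report two.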

The following is an example of instantiating a CollectionBook using an anonymous delegate.

CollectionBook<string> book = new CollectionBook<string>(
    delegate(int pageIndex, int pageSize)
    {
        return pages[pageIndex];
    }, 3);
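
To see the pieces working together, here is a self-contained sketch. Only the enumerator is shown in this post, so the constructor and fields of `CollectionBook<T>` below are my guess at a reasonable shape. `BookDemo` and `WriteAll` are hypothetical names, and the seven-item array stands in for a real data source. The loop flushes the writer once per page, as in the batch scenario described earlier.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.IO;

public interface IPagedCollection { int MaxItems { get; set; } }

public interface IPagedCollection<T> : IList<T>, IPagedCollection { }

public class PagedCollection<T> : List<T>, IPagedCollection<T>
{
    private int maxItems;
    public int MaxItems
    {
        get { return this.maxItems; }
        set { this.maxItems = value; }
    }
}

public delegate IPagedCollection<T>
    PagedCollectionSource<T>(int pageIndex, int pageSize);

// Assumed shape: the constructor and fields are illustrative guesses.
public class CollectionBook<T> : IEnumerable<IPagedCollection<T>>
{
    private readonly PagedCollectionSource<T> pageSource;
    private readonly int pageSize;

    public CollectionBook(PagedCollectionSource<T> pageSource, int pageSize)
    {
        if (pageSource == null)
            throw new ArgumentNullException("pageSource");
        this.pageSource = pageSource;
        this.pageSize = pageSize;
    }

    public IEnumerator<IPagedCollection<T>> GetEnumerator()
    {
        if (this.pageSize <= 0)
            throw new InvalidOperationException(
                "Cannot iterate a page of size zero or less");

        // The first page reports the total, which gives the page count.
        int pageIndex = 0;
        IPagedCollection<T> page = this.pageSource(pageIndex, this.pageSize);
        int pageCount = (int)Math.Ceiling(
            (double)page.MaxItems / this.pageSize);
        yield return page;

        // Page 0 has already been yielded, so start at 1.
        while (++pageIndex < pageCount)
        {
            yield return this.pageSource(pageIndex, this.pageSize);
        }
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}

public static class BookDemo
{
    // Writes every item one page at a time, flushing once per page.
    // Returns the number of pages visited.
    public static int WriteAll(TextWriter writer)
    {
        string[] items = { "a", "b", "c", "d", "e", "f", "g" };

        CollectionBook<string> book = new CollectionBook<string>(
            delegate(int pageIndex, int pageSize)
            {
                PagedCollection<string> page = new PagedCollection<string>();
                page.MaxItems = items.Length;
                int last = Math.Min((pageIndex + 1) * pageSize, items.Length);
                for (int i = pageIndex * pageSize; i < last; i++)
                {
                    page.Add(items[i]);
                }
                return page;
            }, 3);

        int pagesVisited = 0;
        foreach (IPagedCollection<string> page in book)
        {
            foreach (string item in page)
            {
                writer.WriteLine(item);
            }
            writer.Flush();
            pagesVisited++;
        }
        return pagesVisited;
    }
}
```

Because GetEnumerator is an iterator block, nothing in it runs until the first MoveNext call. That means the page size check throws when iteration starts, not when the enumerator is requested.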

I wrote some source code and a unit test you can download that demonstrates this technique. I am including a C# project library that contains these classes and one unit test. To get the unit test to work, simply reference your unit testing assembly of choice and uncomment a few lines.





5 responses

  1. Mike August 14th, 2006

    This is very cool. Thanks for sharing!

  2. Scott August 14th, 2006

    Thanks for sharing! I've been thinking about this particular problem too.

  3. Jay R. Wren August 16th, 2006

    I've recently written something similar, but I'm also trying to use ObjectDataSource to allow automatic paging with ASP.NET DataGridView.
    I've not quite got it working. I have a generic type which implements all the appropriate interfaces.
    BindingObjectDataSource<T> : ObjectDataSource, IBindingList, IList, ICollection,
    IEnumerable, ICancelAddNew, IRaiseItemChangedEvents, IList<T>, ICollection<T>,
    with two methods defined:
    public IList<T> GetItems()
    public IList<T> GetItems(int startIndex, int maxRows)
    yet the following fails:
    GridView1.DataSource = bObjectDataSource;
    <asp:GridView ID="GridView1" runat="server"
    AllowPaging="true" AllowSorting="true"

    PagerSettings-NextPageText="next" PagerSettings-PreviousPageText="prev"
    The error is that a non-generic version of GetItems() could not be found.
    This seems strange.

  4. Bill Pierce August 18th, 2006

    Hey Phil,
    I know it's off topic but what tool do you use to format your sourcecode for posting to your blog? I'm using Copy Source As HTML and it has some issues.

  5. Haacked August 18th, 2006

    Hey Bill, I use Manoli's CSharp Format.
    Somebody wrote a plugin for Windows Live Writer too, but not sure if it has been published yet.