db4o Queries on Large Datasets and a bit of Linq

by Christoph Menge in .NET, db4o

Fortunately, my last small note on db4o performance will soon be outdated. Newer releases of db4o will no longer rely on Cecil to perform reflection, which speeds up db4o linq queries. However, make sure you have Mono.Reflection.dll in your app! There are also some restrictions when it comes to the Compact Framework and native queries (which still need Cecil), so you’d best read the official db4o announcement.

Talking about speed and performance, I just came across an issue that was also discussed in the db4o forums very recently: sort operations on large datasets.
Note: This has nothing to do with linq or linq to db4o; plain SODA queries behave the same way.

Let’s take a very common example: finding the most recent items, such as the latest blog or forum posts or some other ‘top list’, among a very large number of entries N:

        var mostRecentPosts = (from Posts o in ObjectContainer
                               orderby o.Created descending
                               select o).Take(100).ToList();

This query will be very slow on a large number of objects. Why? The BTREE operation itself is very fast, as it should be, but unfortunately db4o then invokes its SODA system on each of the objects, even those the BTREE operation has already ruled out.

See this discussion on the db4o forum and the associated Jira issue.

This is somewhat sad, because the same query on a previously filtered set of items is blazingly fast, about 100x faster:

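        // Cutoff for the filter: only posts from the last 24 hours.
        var yesterday = DateTime.Now.AddDays(-1);

        var mostRecentPosts = (from Posts o in ObjectContainer
                               where o.Created > yesterday
                               orderby o.Created descending
                               select o).Take(100).ToList();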

The latter operation plays roughly in the same league as SQL Server. (Don’t flame me: I know that performance comparisons and profiling are really complicated and that a zillion factors influence them. That’s why I say ‘roughly the same league for this query’, and I’m talking about default setups.)

This is a somewhat unfortunate situation, because it produces slow queries for many typical applications, unless you have a strong where clause that cuts N down from millions to hundreds. If you’d like to see this fixed, head over to the db4o issue tracker and vote for the issue. Thanks!
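
If your query really is just ‘the newest 100’ and has no natural where clause, one possible workaround is to invent one: ask for a narrow time window first and widen it until enough posts come back. The following is only a minimal sketch of that idea, not something taken from db4o itself. The `RecentPosts.TakeMostRecent` helper and its `queryNewerThan` delegate are names I made up for illustration, `Posts` is a stand-in with an assumed `Created` field, and the one-day starting window and the doubling are arbitrary choices.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for the stored class used in the queries above (assumed shape).
public class Posts
{
    public DateTime Created;
}

public static class RecentPosts
{
    // queryNewerThan (hypothetical) runs the filtered, descending-ordered
    // query for a given cutoff, i.e. the "where o.Created > cutoff" query.
    public static List<Posts> TakeMostRecent(
        Func<DateTime, IEnumerable<Posts>> queryNewerThan,
        DateTime now,
        int count,
        int maxDoublings = 12)
    {
        var window = TimeSpan.FromDays(1);   // arbitrary starting window
        var result = new List<Posts>();

        for (int attempt = 0; attempt <= maxDoublings; attempt++)
        {
            result = queryNewerThan(now - window).Take(count).ToList();
            if (result.Count >= count)
                break;                       // the window was wide enough
            window += window;                // otherwise double it and retry
        }

        // May hold fewer than 'count' items if even the widest window
        // (about 11 years with the defaults) did not contain enough posts.
        return result;
    }
}
```

With db4o, the delegate would wrap the filtered query shown earlier, for example `cutoff => from Posts o in ObjectContainer where o.Created > cutoff orderby o.Created descending select o`. Each attempt is a separate query, so this only pays off while the filtered queries stay as fast as described above.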

I hope to post a bit more on db4o over the next few days.

