The Object-Document Mismatch: MongoDB and db4o with Linq

by Christoph Menge in .NET

Rob Conery recently wrote about using MongoDB with Linq. I was really intrigued by the fact that you can use elegant, readable, type-safe Linq queries to access the MongoDB document database. To be honest, I had no clue what a document database really is, but if it speaks Linq it must be cool, I thought.

So I dug a bit deeper into MongoDB and NoRM, a nifty C# driver for MongoDB developed by Andrew Theken and several others. You might want to grab a copy on GitHub, where you can see how incredibly active the project is! Now, back to the evaluation: what is the best way to evaluate a database? Build a live product with it, of course!

Since I was completely abusing db4o for said project (in fact, I am storing something you’d call documents there), I decided that this would be a great candidate for a migration. So now we’re migrating from an object database to a document database and from an ACID database to a NoSQL solution.

MongoDB is considered a NoSQL solution, while db4o is considered ‘soft’ NoSQL (see Stefan’s comment at the bottom). Why the distinction, when neither relies on, supports or uses SQL at all? But then again, that is not what the term »NoSQL« is about. It is probably one of the most misleading terms ever coined; perhaps it should read »Not ACID« or just »Persistence without Prejudgement, PwoP«. db4o makes ACID guarantees, comes from an embedded background and offers single-server durability, while MongoDB is made for the net, does not have single-server durability, supports MapReduce and is driven by JavaScript.

Hell, they couldn’t be more different. But then again, I can access both through nearly identical interfaces:

// Linq is understood by both, so either query style could be used with both:
var u = (from Note n in container
         where n.Text == null select n).ToList();
var v = session.Query<Tag>().Where(p => p.Name != null);
// Store an object graph to Mongo using NoRM:
session.Add(testNote);
// Store an object graph to db4o:
container.Store(testNote);

Don’t be fooled: although these lines could all be used with MongoDB or db4o, even with the very same classes, they are still fundamentally different in behavior. Also, for anything but the simplest problems, you can’t just persist the same domain models.

What is a document now?

A document is not just an unstructured piece of data. It’s not a BLOB. Instead, an instance of a class, plus all it refers to, could be a document:

class UniqueIdObject
{
    public Guid Id { get; private set; }
    public UniqueIdObject() { Id = Guid.NewGuid(); }
}

class Report : UniqueIdObject
{
    public Report() : base() { Tags = new List<Tag>(); }
    public string Text { get; set; }
    public List<Tag> Tags { get; set; }
}

class Tag : UniqueIdObject
{
    public string Name { get; set; }
}

Now we might want to store a report in the database, and the code needed is just:

using (NoRMSession session = new NoRMSession())
{
    Report newReport = new Report() { Text = "Hello World, MongoDB!" };
    newReport.Tags.Add(new Tag() { Name = "Tag 1" });
    newReport.Tags.Add(new Tag() { Name = "Tag 2" });
    session.Add(newReport);
}

That’s it! Wow! Of course, we haven’t taken care of indexing and the like, and since MongoDB is schemaless (or better, has a dynamic schema), we need to do that in code. But still, this is essentially all you need.

The important question now is: what happens to those little `Tags` we deliberately put into a separate class? Here the mapper hides a bit of truth from us, because MongoDB works on so-called “collections”. The call session.Add(newReport) is resolved to session.Add<Report>(newReport), which in turn puts the newReport object into the Report collection! So the object graph, as it is, will be serialized into the Report collection, including our little Tag objects!
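The type-to-collection mapping described above can be mimicked with a small in-memory stand-in. This is only a sketch: `FakeSession` is a hypothetical class written for illustration and is not the NoRM API. It shows how the generic type argument, inferred from the argument passed to `Add`, decides which collection an object graph lands in.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// In-memory stand-in for a session; hypothetical, NOT the NoRM API.
class FakeSession
{
    private readonly Dictionary<string, List<object>> _collections =
        new Dictionary<string, List<object>>();

    // T is inferred from the argument, so Add(report) means Add<Report>(report).
    // The collection is chosen by the type name of T alone.
    public void Add<T>(T item)
    {
        string name = typeof(T).Name;
        List<object> collection;
        if (!_collections.TryGetValue(name, out collection))
        {
            collection = new List<object>();
            _collections[name] = collection;
        }
        collection.Add(item);
    }

    // Returns the items of the collection for T, or an empty sequence
    // if nothing of that type was ever added directly.
    public IEnumerable<T> Query<T>()
    {
        List<object> collection;
        return _collections.TryGetValue(typeof(T).Name, out collection)
            ? collection.Cast<T>()
            : Enumerable.Empty<T>();
    }

    public IEnumerable<string> CollectionNames
    {
        get { return _collections.Keys; }
    }
}
```

Adding a single object whose class holds a list of other objects creates exactly one collection in this stand-in, named after the outer class. The inner objects travel along inside it and never get a collection of their own.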

Each item with an orange border is an ‘atomic’ item in its respective data store:

[Figure: db4o serialized graph]

[Figure: mongo serialized document]

Let’s naively try to fetch all tags:

var v = session.Query<Tag>().Where(p => p.Name != null);

This does not work: the query comes back without any tags, because there is no Tag collection! Instead, the tag lists are part of the reports we put into the Report collection. Note that this would work in db4o, because db4o stores references as references, while ‘documents’ store the contained data instead of references. This is beautifully simple, but it’s also very different from what you might expect, and it has lots of implications for your object structure.
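To make the consequence concrete, here is a minimal sketch in plain LINQ to Objects, with no database involved. The `Report` and `Tag` classes are trimmed copies of the ones above, and `EmbeddedTags.AllTagNames` is a hypothetical helper, not something NoRM provides. Because tags only exist inside their reports, reaching them means going through the reports first:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Tag
{
    public string Name { get; set; }
}

class Report
{
    public Report() { Tags = new List<Tag>(); }
    public string Text { get; set; }
    public List<Tag> Tags { get; set; }
}

static class EmbeddedTags
{
    // Hypothetical helper: collects the distinct tag names embedded in a
    // set of already-loaded reports. Runs in memory, not on the server.
    public static List<string> AllTagNames(IEnumerable<Report> reports)
    {
        return reports
            .SelectMany(r => r.Tags)      // flatten each report's embedded list
            .Where(t => t.Name != null)
            .Select(t => t.Name)
            .Distinct()
            .OrderBy(n => n, StringComparer.Ordinal)
            .ToList();
    }
}
```

This loads every report just to read its tags, which is fine for a handful of documents but is exactly the kind of cost the document model makes you think about.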

Thoroughly think through your schemaless schema

MongoDB is made for scalability and simplicity, so it does not care for our foolish approach of fetching tags directly. There are ways to access that data directly, however. We could use a deep-graph query or write a JavaScript Map/Reduce instruction, but that is a bit out of scope right now. What’s more important is that it calls for changes to our domain model objects: if we really want to store a reference, we need to do so manually, in ye olde SQL way, by storing the associated Id instead of the object. Of course, that makes deserialization a bit more complicated, because the objects we now retrieve from the database aren’t ready to use as they come.
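Storing the Id instead of the object could look roughly like the sketch below. All names here (`TagDocument`, `ReportDocument`, `ReferenceResolver`) are hypothetical and chosen for illustration; the lookup works on tags that have already been fetched separately, so no driver call is assumed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class TagDocument
{
    public TagDocument() { Id = Guid.NewGuid(); }
    public Guid Id { get; set; }
    public string Name { get; set; }
}

// A report that stores tag ids instead of embedded tag objects.
class ReportDocument
{
    public ReportDocument() { Id = Guid.NewGuid(); TagIds = new List<Guid>(); }
    public Guid Id { get; set; }
    public string Text { get; set; }
    public List<Guid> TagIds { get; set; }
}

static class ReferenceResolver
{
    // Looks up each stored id among the separately fetched tags.
    // Ids without a matching tag (dangling references) are skipped.
    public static List<TagDocument> ResolveTags(
        ReportDocument report, IEnumerable<TagDocument> allTags)
    {
        Dictionary<Guid, TagDocument> byId = allTags.ToDictionary(t => t.Id);
        var resolved = new List<TagDocument>();
        foreach (Guid id in report.TagIds)
        {
            TagDocument tag;
            if (byId.TryGetValue(id, out tag))
                resolved.Add(tag);
        }
        return resolved;
    }
}
```

The extra resolution step is the price of the manual reference: the report that comes out of the store only knows ids, and somebody has to turn them back into objects.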

However, automating that process brings several complications: handling cyclic references, fetching or activating objects on the fly (called Transparent Activation in the db4o world), and making sure we’re not inducing a massive performance hit along the way.
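As a rough illustration of on-the-fly fetching, the sketch below defers loading until a referenced object is first accessed. `LazyRef<T>` is a hypothetical wrapper invented for this example; it is not db4o’s Transparent Activation and not part of NoRM, and the loader delegate stands in for whatever fetches a document by Id.

```csharp
using System;

// Hypothetical wrapper: stores only the id and loads the target on first access.
class LazyRef<T> where T : class
{
    private readonly Func<Guid, T> _loader;
    private T _value;
    private bool _loaded;

    public LazyRef(Guid id, Func<Guid, T> loader)
    {
        if (loader == null) throw new ArgumentNullException("loader");
        Id = id;
        _loader = loader;
    }

    public Guid Id { get; private set; }

    public bool IsLoaded
    {
        get { return _loaded; }
    }

    // The first access calls the loader; later accesses reuse the result.
    public T Value
    {
        get
        {
            if (!_loaded)
            {
                _value = _loader(Id);
                _loaded = true;
            }
            return _value;
        }
    }
}
```

Because nothing is loaded until `Value` is read, two objects can point at each other without the deserializer running in circles. The performance concern remains, though: every first access may turn into a separate round trip.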

Also, updating objects can be painful. Suppose we store a list of Reports for each user. We might put the list of reports directly into the user object, or store a list of ReportIds for each user and put the Reports into their very own Report collection. As usual, there is no silver bullet, so this decision depends on the specific needs of the application, but whatever decision we take will not be visible to users of the resulting objects. In fact, it leads to some unwanted strong coupling:

class ReportService
{
  private NoRMSession _session;

  // ...
  // ReportDetailXY and LastChanged are illustrative properties of Report.
  public void UpdateReportDetail(Report report)
  {
    report.ReportDetailXY = ComplicatedCalculation();
    report.LastChanged = DateTime.UtcNow;

    // If the reports are in their very own Report Collection, this
    // is fine. However, if they are contained as lists in the user
    // who owns them, we're in trouble and this will fail!
    _session.Update(report);
  }
}

This might not be a big problem for really small applications, and if you really need performance (why else would you choose a NoSQL system?) you have to fine-tune your objects anyway. Still, I have the feeling that there is some room for improvement here, and a basic wrapper could help in overcoming some of the issues raised. Some basic ideas on how to approach this will follow shortly.
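One possible shape for such a wrapper, purely as a sketch under my own assumptions: hide the storage layout behind a small interface, so that code updating a report does not need to know whether reports live in their own collection or inside their owner. Everything below (`IReportStore`, both store classes, `StoredReport`, `StoredUser`) is hypothetical and backed by in-memory stand-ins rather than a real session.

```csharp
using System;
using System.Collections.Generic;

class StoredReport
{
    public StoredReport() { Id = Guid.NewGuid(); }
    public Guid Id { get; set; }
    public string Text { get; set; }
}

class StoredUser
{
    public StoredUser() { Reports = new List<StoredReport>(); }
    public List<StoredReport> Reports { get; set; }
}

// Hypothetical abstraction: callers update a report without knowing its layout.
interface IReportStore
{
    bool Update(StoredReport report);
}

// Layout 1: reports live in their own collection, keyed by id.
class SeparateCollectionStore : IReportStore
{
    private readonly Dictionary<Guid, StoredReport> _reports;

    public SeparateCollectionStore(Dictionary<Guid, StoredReport> reports)
    {
        _reports = reports;
    }

    public bool Update(StoredReport report)
    {
        if (!_reports.ContainsKey(report.Id)) return false;
        _reports[report.Id] = report;
        return true;
    }
}

// Layout 2: reports are embedded in their owning user, so an update has to
// find the slot in the user's list; the user is the unit that gets saved.
class EmbeddedInUserStore : IReportStore
{
    private readonly StoredUser _owner;

    public EmbeddedInUserStore(StoredUser owner)
    {
        _owner = owner;
    }

    public bool Update(StoredReport report)
    {
        int index = _owner.Reports.FindIndex(r => r.Id == report.Id);
        if (index < 0) return false;
        _owner.Reports[index] = report;
        return true;
    }
}
```

A service would then depend on `IReportStore` only, and the embedded-versus-referenced decision could change without touching it. Whether this is worth the extra layer depends, again, on the application.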

Wrapping it up

Obviously, I’m comparing apples and oranges here: db4o is made to make persistence easier, especially with complex domain model objects. Being an object database, db4o behaves exactly the way you’d expect objects under serialization to behave, but that makes it quite complex. MongoDB is focused on simplicity and scalability. Through the document concept, you gain simplicity on the database, administration and wrapper (driver) side, but you have to struggle with a slight impedance mismatch in code, especially against Linq, again.

3 Comments

  1. Dear Christoph,

    very nice writeup!

    But one sentence is not quite correct. You wrote:
    “MongoDB is considered a NoSQL solution, while db4o is not.”
    And the “is not” references to my website http://nosql-database.org
    (#2 in Google right after the Wikipedia).

    But as you can see db4o is clearly listed under this website.
    I put db4o in the category “soft” nosql (which is nosql).
    The rationale for putting systems like db4o or XML databases in the
    “soft” nosql and not the “core” nosql is explained in my answer on
    this blog: http://www.kellblog.com/2010/04/11/yes-virginia-marklogic-is-a-nosql-system/

    I have been a part of the db4o team since 2003/04 and I would have a lot
    of reasons to support db4o (e.g. The Definitive Guide book by Apress).
    But the categories under the nosql websites and the listing for db4o and for all is quite fair I assume.

    Best Regards
    Stefan Edlich

  2. Hi Stefan,

    thanks a lot for your feedback and the clarification! After one month, I finally managed to correct the actual text – sorry for the delay.

    Best,
    Chris
