<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>emphess .NET &#187; db4o</title>
	<atom:link href="http://www.emphess.net/tag/db4o/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.emphess.net</link>
	<description>Christoph Menge&#039;s Blog</description>
	<lastBuildDate>Tue, 15 Jun 2010 00:50:30 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>The Object-Document Mismatch: MongoDB and db4o with Linq</title>
		<link>http://www.emphess.net/2010/05/05/the-object-document-mismatch-mongodb-and-db4o-with-linq/</link>
		<comments>http://www.emphess.net/2010/05/05/the-object-document-mismatch-mongodb-and-db4o-with-linq/#comments</comments>
		<pubDate>Wed, 05 May 2010 15:43:07 +0000</pubDate>
		<dc:creator>Christoph Menge</dc:creator>
				<category><![CDATA[.NET]]></category>
		<category><![CDATA[db4o]]></category>
		<category><![CDATA[linq]]></category>
		<category><![CDATA[MongoDB]]></category>
		<category><![CDATA[NoSQL]]></category>

		<guid isPermaLink="false">http://www.emphess.net/?p=232</guid>
		<description><![CDATA[Rob Conery recently wrote about using MongoDB with Linq. I was really intrigued by the fact that you can use elegant, readable, type-safe Linq-queries to access MongoDB document database. To be honest, I had no clue what an object database really is, but when it speaks Linq it must be cool, I thought. 
So I [...]]]></description>
			<content:encoded><![CDATA[<p>Rob Conery recently <a href="http://blog.wekeroad.com/2010/03/04/using-mongo-with-linq">wrote about using MongoDB with Linq</a>. I was really intrigued by the fact that you can use elegant, readable, type-safe Linq queries to access the <a href="http://www.mongodb.org/display/DOCS/Home">MongoDB document database</a>. To be honest, I had <strong>no clue</strong> what an <strong>object database</strong> really is, but when it speaks Linq it must be cool, I thought. </p>
<p>So I dug a bit deeper into MongoDB and NoRM, which is a nifty C# driver for MongoDB developed by <a href="http://andrewtheken.com/">Andrew Theken</a> and several others. You might want to <a href="http://github.com/atheken/NoRM">grab a copy at github</a>, where you can see how incredibly active the project is! Now, back to the evaluation: what is the best way to evaluate a database? Of course, build <a href="http://www.backlink-tracker.net/">a live product using it</a>!</p>
<p>Since I was completely abusing db4o for said project (in fact, I am storing something you&#8217;d call documents there), I decided that this would be a great candidate for a migration. So now we&#8217;re migrating from an object database to a document database and from an ACID database to a NoSQL solution.</p>
<p>MongoDB is considered a <a href="http://nosql-database.org/">NoSQL</a> solution, <a href="http://www.kellblog.com/2010/04/11/yes-virginia-marklogic-is-a-nosql-system/">while db4o is considered &#8216;soft&#8217; NoSQL</a> &#8211; see Stefan&#8217;s comment at the bottom. Why the distinction, when neither relies on, supports or uses SQL at all? But then again, that is not what the term &raquo;<em>NoSQL</em>&laquo; is all about. It&#8217;s probably one of the most misleading terms ever coined and perhaps it should read &raquo;Not ACID&laquo; or just &raquo;Persistence without Prejudgement, <em>PwoP</em>&laquo;. db4o makes ACID guarantees, comes from an embedded background and offers single-server durability, while MongoDB is made for the net, does not have single-server durability, supports MapReduce and is driven by JavaScript. </p>
<p>Hell, they couldn&#8217;t be more different. But then again, I can access both using identical interfaces:</p>
<pre class="brush: csharp">
// Linq is understood by both, so you could use the lines in both:
var u = (from Note n in container
         where n.Text == null select n).ToList();
var v = session.Query&lt;Tag&gt;().Where(p =&gt; p.Name != null);
// Store an object graph to Mongo using NoRM:
session.Add(testNote);
// Store an object graph to db4o:
container.Store(testNote);
</pre>
<p>Don&#8217;t be fooled &#8211; although these lines could all be used with MongoDB or db4o and all of them could even be used with the very same classes, they are still fundamentally different in behavior. Also, for anything but the simplest problems, <strong>you can&#8217;t just persist the same domain models</strong>.</p>
<h2>What is a document now?</h2>
<p>A document is not just an unstructured piece of data. It&#8217;s not a BLOB. Instead, an instance of a class, <strong>plus all it refers to</strong>, could be a document:</p>
<pre class="brush: csharp">
class UniqueIdObject
{
    public Guid Id { get; private set; }
    public UniqueIdObject() { Id = Guid.NewGuid(); }
}

class Report : UniqueIdObject
{
    public Report() : base() { Tags = new List&lt;Tag&gt;(); }
    public string Text { get; set; }
    public List&lt;Tag&gt; Tags { get; set; }
}

class Tag : UniqueIdObject
{
    public string Name { get; set; }
}
</pre>
<p>Now we might want to store a report in the database, and the code needed is just:</p>
<pre class="brush: csharp">
using (NoRMSession session = new NoRMSession())
{
    Report newReport = new Report() { Text = &quot;Hello World, MongoDB!&quot; };
    newReport.Tags.Add(new Tag() { Name = &quot;Tag 1&quot; });
    newReport.Tags.Add(new Tag() { Name = &quot;Tag 2&quot; });
    session.Add(newReport);
}
</pre>
<p>That&#8217;s it! Wow! Of course, we haven&#8217;t taken care of indexes and the like, and since MongoDB is schemaless (or better, has a dynamic schema) we need to do that in code. But still, this is essentially all you need.</p>
<p>The important thing now is: What happens to those little <code>Tags</code> we deliberately put into a separate class? Now the mapper hides a bit of truth from us, because MongoDB works on so-called &#8220;Collections&#8221;. <code>session.Add(newReport)</code> will be resolved to <code>session.Add&lt;Report&gt;(newReport)</code>, which will in turn put the <code>newReport</code> object <strong>into the Report collection!</strong> So the object graph, as it is, will be serialized into the Report collection, <em>including</em> our little Tag objects!</p>
<p>Each item with an orange border is an &#8216;atomic&#8217; item in its respective data store:</p>
<div id="attachment_242" class="wp-caption alignleft" style="width: 310px"><a href="http://www.emphess.net/wp-content/uploads/2010/05/db4o-graph.png"><img src="http://www.emphess.net/wp-content/uploads/2010/05/db4o-graph-300x184.png" alt="" title="db4o-graph" width="300" height="184" class="size-medium wp-image-242" /></a><p class="wp-caption-text">db4o serialized graph</p></div>
<p><div id="attachment_241" class="wp-caption alignleft" style="width: 310px"><a href="http://www.emphess.net/wp-content/uploads/2010/05/mongo.png"><img src="http://www.emphess.net/wp-content/uploads/2010/05/mongo-300x200.png" alt="" title="mongo" width="300" height="200" class="alignright size-medium wp-image-241" /></a><p class="wp-caption-text">mongo serialized document</p></div><br />
</p>
<div style="clear: both;"></div>
<p>Let&#8217;s naively try to fetch all tags:</p>
<pre class="brush: csharp">
var v = session.Query&lt;Tag&gt;().Where(p =&gt; p.Name != null);
</pre>
<p>This does not work: <code>v</code> is <code>null</code> because there is no <code>Tag</code> collection! Instead, the tag lists are <strong>part of the reports</strong> we put into the <code>Report</code> collection. Note that this would work in db4o, because db4o will store references as references, while &#8216;documents&#8217; store the contained data instead of references. This is beautifully simple, but it&#8217;s also very different from what you might expect and it has lots of implications for your object structure.</p>
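<p>To make the difference concrete, here is a minimal in-memory sketch. It uses plain LINQ to Objects and does not touch NoRM or MongoDB at all; the <code>EmbeddedTagsSketch</code> class, the simplified <code>Report</code> and <code>Tag</code> classes (without the Id base class) and the list standing in for the <code>Report</code> collection are assumptions made for illustration. Since the tags only exist inside their reports, we have to go through the reports to reach them:</p>
<pre class="brush: csharp">
using System;
using System.Collections.Generic;
using System.Linq;

class Tag { public string Name { get; set; } }

class Report
{
    public Report() { Tags = new List&lt;Tag&gt;(); }
    public string Text { get; set; }
    public List&lt;Tag&gt; Tags { get; set; }
}

// Hypothetical helper, not part of NoRM.
static class EmbeddedTagsSketch
{
    // Stand-in for the Report collection: every report carries its own tags.
    public static List&lt;Report&gt; BuildReports()
    {
        var first = new Report { Text = &quot;Hello World, MongoDB!&quot; };
        first.Tags.Add(new Tag { Name = &quot;Tag 1&quot; });
        first.Tags.Add(new Tag { Name = &quot;Tag 2&quot; });
        var second = new Report { Text = &quot;Another report&quot; };
        second.Tags.Add(new Tag { Name = &quot;Tag 2&quot; });
        return new List&lt;Report&gt; { first, second };
    }

    // Flatten the embedded tag lists. The same name can show up in several
    // reports because each document holds its own copy, hence the Distinct().
    public static List&lt;string&gt; AllTagNames(IEnumerable&lt;Report&gt; reports)
    {
        return reports.SelectMany(r =&gt; r.Tags)
                      .Select(t =&gt; t.Name)
                      .Distinct()
                      .OrderBy(n =&gt; n, StringComparer.Ordinal)
                      .ToList();
    }
}
</pre>
<p>Calling <code>EmbeddedTagsSketch.AllTagNames(EmbeddedTagsSketch.BuildReports())</code> yields the two distinct names, although three tag objects were stored. Whether a driver can push such a flattening query down to the server is a separate question that this sketch does not answer.</p>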
<h2>Thoroughly think through your schemaless schema</h2>
<p>MongoDB is made for <em>scalability and simplicity</em>, so it does not care for our foolish approach to fetch tags directly. There are ways to access that data directly, however. We could use a deep-graph query or write a JavaScript Map/Reduce instruction, but that is a bit out of scope right now. What&#8217;s more important is that it calls for <strong>changes to our domain model objects</strong>: If we really want to store a reference, we need to do so manually, in ye olde SQL way, by storing the associated Id instead of the object. Of course, that makes deserialization a bit more complicated because the objects we now retrieve from the database aren&#8217;t ready to use as they come.</p>
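<p>A manual reference could look roughly like the following sketch. It is an illustration under assumptions, not NoRM code: the <code>TagIds</code> property and the <code>ManualReferenceSketch.ResolveTags</code> helper are hypothetical names, and the tags are assumed to have been loaded separately from their own collection.</p>
<pre class="brush: csharp">
using System;
using System.Collections.Generic;
using System.Linq;

class Tag
{
    public Tag() { Id = Guid.NewGuid(); }
    public Guid Id { get; private set; }
    public string Name { get; set; }
}

class Report
{
    public Report() { TagIds = new List&lt;Guid&gt;(); }
    public string Text { get; set; }
    // Only the ids are stored with the report, not the Tag objects.
    public List&lt;Guid&gt; TagIds { get; set; }
}

// Hypothetical helper: the extra step that makes a loaded report usable.
static class ManualReferenceSketch
{
    // Resolve the stored ids against a separately loaded set of tags.
    // Ids without a matching tag are skipped.
    public static List&lt;Tag&gt; ResolveTags(Report report, IEnumerable&lt;Tag&gt; allTags)
    {
        Dictionary&lt;Guid, Tag&gt; byId = allTags.ToDictionary(t =&gt; t.Id);
        var resolved = new List&lt;Tag&gt;();
        foreach (Guid id in report.TagIds)
        {
            Tag tag;
            if (byId.TryGetValue(id, out tag)) resolved.Add(tag);
        }
        return resolved;
    }
}
</pre>
<p>The price of the shared tags is visible right away: a report is no longer complete after loading, and somebody has to call the resolving step.</p>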
<p>However, automating that process introduces several complications: the need to <strong>handle cyclic references</strong>, a concept for fetching or activating objects on the fly (called <strong>Transparent Activation</strong> in the db4o world), and making sure we&#8217;re not incurring a massive performance hit along the way.</p>
<p>Also, updating objects can be painful. Suppose we stored a list of <code>Reports</code> for each user. Now we might want to put the list of reports directly into the user object, <strong>or</strong> store a list of <code>ReportIds</code> for each user and put the Reports into their very own ReportCollection. As usual, <strong>there is no silver bullet</strong>, so this decision depends on the specific needs of the application, but whichever decision we take will not be visible to users of the resulting objects. In fact, it leads to some unwanted <strong>strong coupling</strong>:</p>
<pre class="brush: csharp">
class ReportService
{
  private NoRMSession _session;

  // ...
  public void UpdateReportDetail()
  {
    this.ReportDetailXY = ComplicatedCalculation();
    this.LastChanged = DateTime.UtcNow;

    // If the reports are in their very own Report Collection, this
    // is fine. However, if they are contained as lists in the user
    // who owns them, we&#039;re in trouble and this will fail!
    _session.Update(this);
  }
}
</pre>
<p>This might not be a big problem for <strong>really small applications</strong>, and if you really need performance (why else would you choose a NoSQL system?) you have to fine-tune your objects anyway. Still, I have the feeling that there is some room for improvement here, and a <strong>basic</strong> wrapper could help in overcoming some of the issues raised. Some basic ideas on how to approach this will follow shortly.</p>
<h2>Wrapping it up</h2>
<p>Obviously, I&#8217;m comparing Apples and Oranges here: db4o is made to <strong>make persistence easier</strong>, especially with complex domain model objects. db4o, being an <strong>object database</strong>, behaves exactly the way you&#8217;d expect objects under serialization to behave, but that makes it quite complex. MongoDB is focused on <strong>simplicity and scalability</strong>. Through the document concept, you gain simplicity on the database, administration and wrapper (driver)-side, but you have to struggle with a slight <strong>impedance mismatch</strong> in code, especially against Linq, again.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.emphess.net/2010/05/05/the-object-document-mismatch-mongodb-and-db4o-with-linq/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>My db4o Wishlist</title>
		<link>http://www.emphess.net/2010/04/14/my-db4o-wishlist/</link>
		<comments>http://www.emphess.net/2010/04/14/my-db4o-wishlist/#comments</comments>
		<pubDate>Wed, 14 Apr 2010 22:30:18 +0000</pubDate>
		<dc:creator>Christoph Menge</dc:creator>
				<category><![CDATA[.NET]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[db4o]]></category>
		<category><![CDATA[linq]]></category>
		<category><![CDATA[Object Database]]></category>
		<category><![CDATA[OODBMS]]></category>

		<guid isPermaLink="false">http://www.emphess.net/?p=168</guid>
		<description><![CDATA[After finding that db4o did not screw up in our projects, I dug a bit through their issue tracker, which is a very important resource you should definitely check out if you&#8217;re working with db4o!
Just to get that straight: I&#8217;m an avid db4o user and really love it. These issues are not critical and they [...]]]></description>
			<content:encoded><![CDATA[<p>After finding that <a href="http://www.emphess.net/2010/04/12/nosql-approaches-trying-to-use-db4o-in-the-real-world/">db4o did not screw up</a> in our projects, I dug a bit through <a href="http://tracker.db4o.com/secure/Dashboard.jspa">their issue tracker</a>, which is a very important resource you should definitely check out if you&#8217;re working with db4o!</p>
<p>Just to get that straight: I&#8217;m an avid db4o user and really love it. These issues are not critical and they don&#8217;t stop me from using or evangelizing db4o. However, I think there is some lack of awareness of some issues.</p>
<p>Also, I&#8217;d like to spawn some discussion about the issues below. Unfortunately, due to their changes to the forum system, most of the original discussions on the <a href="http://developer.db4o.com/Forums.aspx">db4o forum</a> are hard to find or possibly lost. You may want to <em>vote on the issues you deem most pressing</em>, which you can easily do in their issue tracker! I&#8217;m very interested in what you think about this little selection.</p>
<h2>Selected Issues</h2>
<p>&raquo; <a href="http://tracker.db4o.com/browse/COR-1133">Don&#8217;t run SODA when no more constraints are present</a><br />
I <a href="http://www.emphess.net/2010/03/16/db4o-queries-on-large-datasets-and-a-bit-of-linq/">blogged about this already</a>, because you experience this in very common scenarios, namely whenever you query a small subset of a larger candidate set. For example, consider selecting the last 50 posts on a blog/qa-site/etc. What will happen is that db4o runs the BTREE query for the sort operation (blazing), <em>then hydrates (?) all objects</em>, then returns the first 50 of them and throws away the rest. The thing is that there is no need to further inspect the items, and activating all of them is basically a linear operation. Thus, this common type of query currently runs in <em>O(n)</em> instead of <em>O(log n)</em>, which is an incredibly dramatic difference.</p>
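<p>The effect can be imitated without db4o. The following sketch is only an analogy: the sorted id sequence stands in for the result of the index query, <code>Activate</code> stands in for hydrating an object, and all names in <code>TopNSketch</code> are made up for this illustration.</p>
<pre class="brush: csharp">
using System;
using System.Collections.Generic;
using System.Linq;

static class TopNSketch
{
    // Counts how many objects were &quot;activated&quot;.
    public static int Activations;

    // Stand-in for hydrating/activating a stored object from its id.
    static string Activate(int id)
    {
        Activations++;
        return &quot;Post &quot; + id;
    }

    // Activates every candidate first, then keeps the first &#039;count&#039;.
    public static List&lt;string&gt; ActivateAllThenTake(IEnumerable&lt;int&gt; sortedIds, int count)
    {
        return sortedIds.Select(id =&gt; Activate(id)).ToList().Take(count).ToList();
    }

    // Cuts the already sorted id sequence first and activates only
    // the objects that are actually returned.
    public static List&lt;string&gt; TakeThenActivate(IEnumerable&lt;int&gt; sortedIds, int count)
    {
        return sortedIds.Take(count).Select(id =&gt; Activate(id)).ToList();
    }
}
</pre>
<p>With 10,000 candidate ids and <code>count = 50</code>, the first variant performs 10,000 activations and the second one only 50, while both return the same 50 items. The sketch says nothing about the cost of the index lookup itself.</p>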
<p><span id="more-168"></span></p>
<p>&raquo; <a href="http://tracker.db4o.com/browse/COR-1899">LINQ-Implementation is not &#8216;thread&#8217;-safe</a><br />
A very <a href="http://blog.stevensanderson.com/2007/11/29/linq-to-sql-the-multi-tier-story/">similar issue has been on LINQ-to-SQL&#8217;s todo list some time ago</a>.<br />
I&#8217;m not sure whether this is so much the typical use case. For the db4o case, it teaches us two things right now:</p>
<ul>
<li>Container reuse is non-trivial and should be approached with extreme care. You don&#8217;t want to run into this kind of byzantine error in a live app.</li>
<li>The object identity problem is similar in both Object-Relational Mapping and OODBMS.</li>
</ul>
<p>&raquo; <a href="http://tracker.db4o.com/browse/COR-191">Scalable server architecture: multiple readers against the same file, transactional files</a><br />
This sounds daunting, and it&#8217;s probably a huge one, as you can see from its age. I also believe this might be politically challenging, because this moves in the direction of <a href="http://www.versant.com">Versant&#8217;s</a> large-scale object database. However, there seems to be a lot of movement in that direction from the user side &#8211; people are asking for features of this kind more and more lately, largely due to the ultra-cool LINQ integration db4o has. I suppose it&#8217;d be wise to focus on this kind of scenario as it could really become the preferred way of writing web applications: it&#8217;s extremely agile, supports rapid development, is flexible in that the same (LINQ) code could be used for different persistence layers if that should ever be needed and leads to clean, compile-time checked, type-safe code.</p>
<p>&raquo; <a href="http://tracker.db4o.com/browse/COR-236">Sanitize reflector design &#8211; remove core dependencies on generic reflector</a><br />
Being able to get rid of the generic reflector seems important, I&#8217;m already building my own code for this. Here we have conflicting requirements: The GenericReflector makes db4o very easy to use and may help beginners. It is also required in client-server scenarios where the server doesn&#8217;t have the necessary model dlls, but for most applications I think you should try to avoid it. Storing data in a generic manner is slow and requires a lot more space, making it highly inefficient.</p>
<p>Current attack vector: <a href="http://developer.db4o.com/Forums/tabid/98/aff/4/aft/9635/afv/topic/Default.aspx">Throw a Listener on the object created event</a> on the server and make sure the server knows the type.</p>
<p>&raquo; <a href="http://tracker.db4o.com/browse/COR-1905">Allow immediate TCP port reuse</a><br />
When opening a server, the TCP port will be blocked in case the server crashes, the app is terminated, etc. Since that might happen quite often when you use the &#8216;integrated server&#8217; where the server is actually created in your web application, a restart of the web application will fail because the TCP port is blocked. In a client-server scenario, on the other hand, a simple restart wouldn&#8217;t be possible because you need to assign a new port to clients or wait 6 minutes. This should be fairly simple, but I don&#8217;t know if that comes with any side-effects.</p>
<p>&raquo; <a href="http://tracker.db4o.com/browse/COR-113">Fast Collections</a><br />
This is a huge one. The cool thing is that this could allow much more complicated queries to be executed in reasonable timeframes. However, I&#8217;m a bit worried about <a href="http://tracker.db4o.com/browse/COR-644">the issue &#8220;FastCollections: Inside BTree List implementation&#8221;</a>, because that sounds really important, but is in state &#8220;Won&#8217;t fix&#8221;.</p>
<p>&raquo; <a href="http://tracker.db4o.com/browse/COR-478">Locking</a><br />
Optimistic locking would be nice to have, but I think you can do this yourself rather easily.</p>
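<p>A hand-rolled version check might look like the sketch below. It is not a db4o feature and uses no db4o API; the <code>VersionedNote</code> class and the <code>TryUpdate</code> helper are hypothetical, and in a real application the comparison and the write would have to happen inside one transaction.</p>
<pre class="brush: csharp">
using System;

class VersionedNote
{
    public string Text { get; set; }
    public int Version { get; set; }
}

static class OptimisticLockSketch
{
    // &#039;stored&#039; is the state currently in the database, &#039;edited&#039; is the copy
    // a client modified. The write only goes through if nobody else bumped
    // the version in the meantime.
    public static bool TryUpdate(VersionedNote stored, VersionedNote edited)
    {
        if (stored.Version != edited.Version) return false; // stale copy
        stored.Text = edited.Text;
        stored.Version++;
        return true;
    }
}
</pre>
<p>A caller that gets <code>false</code> back has to reload the object and decide whether to retry or to report the conflict.</p>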
<p>&raquo; <a href="http://tracker.db4o.com/browse/COR-1772">A new object is stored upon value type updates</a><br />
This is rated critical, so it&#8217;s not an item for a &#8220;wishlist&#8221;. I&#8217;m not sure if I understand its implications and I rarely use value types apart from Guids, and updating Guids is pointless &#8211; still a db4o user should probably know this and keep this in mind, so I felt it should go here &#9760;.</p>
<h2>db4o Configuration</h2>
<p>This one is not really in the issue tracker as a single item, and it&#8217;s more of a general remark. <em>One of the rather messy things in db4o is configuration</em>. Even with the <a href="http://programing-fun.blogspot.com/2008/10/changes-in-db4o-configuration.html">new configuration interface</a>, there is quite a bit of confusion among users. The reason, in my eyes, is mostly a combination of incomplete documentation and unexpected behaviour. Examples:</p>
<ul>
<li>The indexation setting for fields is the only configuration setting that is persistent. Everything else, including unique constraints, needs to be re-set when (or &ndash; more precisely &ndash; <em>before</em>) opening the <code>ObjectContainer</code>.</li>
<li>Applying an option to a field that doesn&#8217;t exist will not trigger a warning or an <code>Exception</code></li>
<li>Some settings simply won&#8217;t have any effect when you perform them after opening the <code>ObjectContainer</code>, but they do not warn you.</li>
<li>Certain settings <strong>must</strong> be applied on the server, a few <strong>must</strong> be applied on the client and with some &#8230; well, you just set them on both just to make sure. Here, db4o does throw exceptions, however!</li>
<li>Several options need to be set before <em>creating</em> the object container (e.g. string encoding) and cannot be reset afterwards, again being completely silent about the ineffectiveness of the respective settings.</li>
<li>Some settings, such as field-based cascade-on-activate, <a href="http://developer.db4o.com/Forums/tabid/98/aff/4/aft/9783/afv/topic/Default.aspx">simply don&#8217;t seem to work at all</a></li>
</ul>
<p>This leads to lots and lots of confusion. Most importantly, it is often hard to determine whether a certain setting was successfully applied, or not. Also, some defaults are unexpected:</p>
<ul>
<li>Default <code>ActivationDepth</code> is (completely random) 5. Why not 8? Or 2? This troubles beginners a lot. Either set it to infinity, or to zero. Everything else feels just random. You can still include a line <code>ActivationDepth = 5;</code> in beginner&#8217;s samples, thereby showing them that the setting is there and that they need to be aware of it.</li>
<li>Default string encoding seems to be <code>UTF-16</code> or <code>UCS-2</code>, probably the most useless encodings around, despite the Windows Kernel working with it. <code>UTF-8</code> would come in as a reasonable default, but with <code>UTF-16</code> half of your database is probably zeros, because even in non-english environments, there is still a lot of mostly ASCII-data to be stored (such as URLs, Email addresses, base64 encoded information, SHA-hashes, etc.). Also many languages have non-ASCII characters only sparingly, German being one example.</li>
</ul>
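<p>The size difference between the encodings is easy to check with the .NET <code>Encoding</code> classes. This is just a small measurement helper (<code>EncodingSizeSketch</code> is a made-up name) and says nothing about how db4o lays out strings on disk:</p>
<pre class="brush: csharp">
using System;
using System.Text;

static class EncodingSizeSketch
{
    public static int Utf8Bytes(string s) { return Encoding.UTF8.GetByteCount(s); }

    // Encoding.Unicode is the .NET name for UTF-16 (little endian).
    public static int Utf16Bytes(string s) { return Encoding.Unicode.GetByteCount(s); }
}
</pre>
<p>For the 22-character URL <code>http://www.emphess.net</code> this gives 22 bytes in UTF-8 and 44 bytes in UTF-16. For the German word <code>&quot;Gr\u00fc\u00dfe&quot;</code> (five characters, two of them non-ASCII) it gives 7 versus 10 bytes.</p>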
<p>I think it&#8217;d be really cool if the configuration interface was a little more explicit and would throw Exceptions instead of silently ignoring requests that cannot be fulfilled.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.emphess.net/2010/04/14/my-db4o-wishlist/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>NoSQL Approaches: Trying to use db4o in the Real World</title>
		<link>http://www.emphess.net/2010/04/12/nosql-approaches-trying-to-use-db4o-in-the-real-world/</link>
		<comments>http://www.emphess.net/2010/04/12/nosql-approaches-trying-to-use-db4o-in-the-real-world/#comments</comments>
		<pubDate>Mon, 12 Apr 2010 22:14:38 +0000</pubDate>
		<dc:creator>Christoph Menge</dc:creator>
				<category><![CDATA[.NET]]></category>
		<category><![CDATA[Entrepreneurship]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[db4o]]></category>
		<category><![CDATA[Architecture]]></category>
		<category><![CDATA[ASP.NET MVC]]></category>
		<category><![CDATA[Database]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[Startup]]></category>

		<guid isPermaLink="false">http://www.emphess.net/?p=138</guid>
		<description><![CDATA[We&#8217;ve been working a lot on db4o related and db4o based projects lately, and close to completion of the first and most simple product, we really hit a few roadblocks.
UPDATE: Just after releasing this article, I found the bug in our code. It&#8217;s not db4o&#8217;s fault after all&#8230;
Motivation
One thing up front: We don&#8217;t need an [...]]]></description>
			<content:encoded><![CDATA[<p>We&#8217;ve been working a lot on <a href="http://www.db4o.com">db4o</a> related and db4o based projects lately, and close to completion of the first and most simple product, we really hit a few roadblocks.<br />
<strong>UPDATE: Just after releasing this article, I found the bug in our code. It&#8217;s not db4o&#8217;s fault after all&#8230;</strong></p>
<h2>Motivation</h2>
<p>One thing up front: We don&#8217;t need an <a href="http://en.wikipedia.org/wiki/Object_database">object database</a> for the simple tools we currently build, but we felt it was a good idea to get some acquaintance with the technology, because we will certainly need it for our (stealth) startup &#8220;pactas&#8221; soon. In pactas, the data structure is really complicated (with a fancy class hierarchy) and it probably will change very frequently.</p>
<p>Also, since I am such a big fan of reusability, we really develop a web engine &#8211; a framework that allows us to reuse a lot of the code for different projects and make sure most of our code has been tested thoroughly in the field. This proves to be a significant design decision when it comes to data modeling, and while I&#8217;m very happy with the decision, it certainly makes development harder.</p>
<h2>The Problem</h2>
<p>So, in time with the release of our simple <a href="http://www.backlink-tracker.net">backlink tracker</a>, one of our development database files started to show strange behaviour &#8211; a certain query (via LINQ) would not order objects anymore &#8211; the <code>orderby</code>-clause seemingly was completely ignored. Our live server even came up with an &#8220;Invalid DateTimeKind specified&#8221; exception when trying to perform the query! What&#8217;s worse: The problem kept occurring from time to time, but it was not reproducible! Byzantine errors are clearly my favourite&#8230; </p>
<p>We thought the issue might be related to the current development/unstable versions of db4o that we were using (7.12 and 7.13). Using the stable version of db4o (7.4) proved difficult, because the old LINQ provider falls back to LINQ to Objects very often, which requires fetching all objects in question from the database &#8211; that is very, very slow for a lot of objects, so we had to abandon that. We clearly wanted to stick to LINQ for a number of reasons (compile-time checking, readability, reusability).</p>
<p>Obviously the problem is related to the <code>orderby</code> operation on <code>DateTime</code> fields. I tried to modify the query, removed grouping because I feared it might be unstable (<em>warning</em>: the <em>sort</em> operation is, in fact, unstable! Unstable grouping would be useless, but the grouping is stable so that&#8217;s fine), even debugged the db4o code, but I couldn&#8217;t find any problem in there. Since the code is rather complex and that was the first time I took a look at it, I was happy to find some of the relevant code at all. <font style="text-decoration: line-through;">Somewhere in the deeps of it, something screwed up.</font> I didn&#8217;t want to spend too much time on that since I had to chew some particle physics on the side. </p>
<h2>Solutions?</h2>
<p>At the time of writing this (in fact, yesterday) I had something like a hotfix, but it turned out it&#8217;s complete nonsense &#8211; it worked around our internal bug in a very peculiar way. No more, no less.</p>
<p>When talking about this, somebody mentioned that it wasn&#8217;t such a good idea to use <code>DateTime</code> at all, and we should store the ticks instead. That led to two long discussions with my co-ed Christian. We concluded: First, the power of object databases is that they do not force you to hack around and find different representations for your data (which raises the bar for object databases). The one big shortcoming of SQL is that it forces you to find a second, equally good, but different representation of your data and you need to translate between these representations all the time. You need to synchronize them. And you lose a lot of fancy features (such as lists, generics, inheritance, etc.) on the way.</p>
<p>Secondly, Christian suggested that object databases suffer badly from <a href="http://www.joelonsoftware.com/articles/LeakyAbstractions.html">leaky abstractions</a>, in that they break encapsulation in a way that leaks out a hell of a lot of implementation details. As Joel puts it: &#8220;All non-trivial abstractions, to some degree, are leaky.&#8221; The point is: <font style="text-decoration: line-through;">With the error we&#8217;re currently encountering, it&#8217;s becoming a <em>real problem</em>. This is not an <a href="http://www.codinghorror.com/blog/2009/06/all-abstractions-are-failed-abstractions.html">imperfect piece of architecture</a>, it&#8217;s an exception in a database query. It kills the app dead! Boom!</font></p>
<p>In order to get activation [depth] straight, db4o needs to know how .NET&#8217;s containers work internally &#8211; that is clearly a detail that should be hidden, but db4o knows about it. db4o also takes care of that, but it leads to some messy issues. There is special code in db4o that handles containers, non-trivial objects such as <code>strings</code>, <code>DateTime</code> (which are non-trivial because they use 62 bits for the actual ticks and 2 bits for the <a href="http://msdn.microsoft.com/en-us/library/shx7s921.aspx">DateTimeKind</a>) and Lists. This is an implementation detail of the .NET framework, and it might change over time. There&#8217;s <a href="http://tracker.db4o.com/browse/COR-1582">been a bug with <code>Map</code></a>, and I&#8217;m almost sure there is a bug with <code>DateTime</code>, too. Don&#8217;t get me wrong: The fact that <em>some kind of mapping</em> is needed is a somewhat generic (if not <em>the</em>) problem of serialization, it&#8217;s not really db4o-specific, and cannot be eliminated. In Hibernate, there is also a lot of code that handles the mapping of lists and the like but it doesn&#8217;t rely on implementation detail, thus it&#8217;s not (as) leaky.</p>
<h2>Back to SQL?</h2>
<p>Here&#8217;s the thing: It&#8217;d be beneficial to have a storage that is <em>independent of the actual implementation on top</em>, because it decouples the data store from the application, which is <a href="http://developer.db4o.com/Forums/tabid/98/aff/4/aft/9847/afv/topic/Default.aspx">not what db4o does</a>. But wait, that is exactly what SQL is, right? Indeed: SQL forces you to map (or to cut down) your stuff to its internal features. A list becomes a foreign key on the other table, but that doesn&#8217;t play well with derived types, generics, etc&#8230; This is tricky as we all know, and <a href="http://www.ohloh.net/p/nhibernate/analyses/latest">you need lots of code to do that</a>.</p>
<p>Worse, SQL forces you to do that mapping for everything, including your own classes, and it demands a schema for every type of object &#8211; this is not what I want. I&#8217;d like to see a set of base objects in an object database which are natively understood by the DB. These objects can be mapped from and to by a layer which may be part of the db, or can be added manually if you need something very special. Still, it should allow you to store your object in the database with all its magic, only that known objects will be translated, e.g. a <code>DateTime</code> will be stored as <code>long Ticks</code> and a separate <code>DateTimeKind</code> flag, making comparison operations easier (note that the comparison only works on the ticks: Whether the time is local or UTC is not considered by .NET in comparisons). Lists will be unfolded into an internal tree representation if indexed, so they become much easier for the database to cope with. That would also make <a href="http://www.gamlor.info/wordpress/?p=1069">managing m:n relations</a> easier, since they could then be viewed as bilateral relationships &#8211; something you often need. Right now, they&#8217;re unilateral, thereby requiring some additional management on your behalf.</p>
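<p>As a rough sketch of that idea (this is not how db4o stores dates; <code>StoredDateTime</code> is a hypothetical name invented for illustration), a <code>DateTime</code> could be split into its ticks and its kind, with ordering defined on the ticks alone, mirroring how .NET itself compares dates:</p>
<pre class="brush: csharp">
using System;

// Hypothetical storage shape, not part of db4o: a DateTime split into two plain fields.
public struct StoredDateTime
{
    public long Ticks;          // the part an index would sort on
    public DateTimeKind Kind;   // Unspecified, Utc or Local, kept only for the round trip

    public static StoredDateTime From(DateTime value)
    {
        return new StoredDateTime { Ticks = value.Ticks, Kind = value.Kind };
    }

    public DateTime ToDateTime()
    {
        return new DateTime(Ticks, Kind);
    }

    // Like DateTime.CompareTo, this looks at the ticks only and ignores the kind.
    public int CompareTo(StoredDateTime other)
    {
        return Ticks.CompareTo(other.Ticks);
    }
}

public static class StoredDateTimeDemo
{
    public static void Main()
    {
        var utc = new DateTime(2010, 4, 12, 10, 0, 0, DateTimeKind.Utc);
        var local = new DateTime(utc.Ticks, DateTimeKind.Local);

        // .NET treats these as equal, although they denote different instants
        // on any machine whose time zone is not UTC.
        Console.WriteLine(utc == local);
        Console.WriteLine(StoredDateTime.From(utc).CompareTo(StoredDateTime.From(local)));

        // The kind survives the round trip.
        Console.WriteLine(StoredDateTime.From(local).ToDateTime().Kind);
    }
}
</pre>
<p>This prints <code>True</code>, <code>0</code> and <code>Local</code>. The first line is exactly the behaviour mentioned above: the kind is carried along, but it plays no part in comparisons.</p>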
<p>Migrating to SQL is clearly an option for this product, since data couldn&#8217;t fit SQL any better, but that doesn&#8217;t solve the issue for our pactas project, where we certainly will need schema-less data, complex object hierarchies, etc.</p>
<p>However, there are a few issues that <font style="text-decoration: line-through;">remain unresolved as of now and they do qualify as show stoppers</font> :</p>
<ul>
<li style="text-decoration: line-through;">There&#8217;s an irreproducible bug that potentially wrecks the db</li>
<li>There&#8217;s a <a href="http://www.emphess.net/2010/03/16/db4o-queries-on-large-datasets-and-a-bit-of-linq/">nasty performance issue with larger amounts of data</a></li>
</ul>
<h2>Alternatives</h2>
<p>I&#8217;m quite dissatisfied that we <font style="text-decoration: line-through;">have to abandon db4o at this stage</font>, because I believe it&#8217;s the best kind of serialization I&#8217;ve ever experienced. It&#8217;s <a href="http://blog.wekeroad.com/2010/02/06/nosql-a-practical-approach-part-1">perfectly simple</a>, it&#8217;s fast and, most importantly, code-centric. If reusability is a concern, having the database structure and/or the ORM dictate the objects is a major pain.</p>
<p><a href="http://en.wikipedia.org/wiki/ADO.NET_Entity_Framework">Entity Framework 4 (EF4)</a> promises to handle this a lot better through <a href="http://blogs.msdn.com/adonet/archive/2009/05/12/sneak-preview-model-first-in-the-entity-framework-4-0.aspx">&#8220;Model First&#8221;</a> and <a href="http://blogs.msdn.com/efdesign/archive/2009/06/10/code-only.aspx">&#8220;Code Only&#8221;</a>, but I am still a bit afraid of EF because it doesn&#8217;t appear to be anywhere near lightweight, and we will need schemaless storage for our future products anyway.</p>
<p><a href="http://www.versant.com">Versant</a>, which bought db4o some time ago, also offers its large-scale object database, which seems to suit large web applications a lot better. First, it&#8217;s certainly made for huge amounts of data (unlike db4o, which comes from an embedded background and <a href="http://www.sdtimes.com/link/33117">is aimed at database sizes in the low-GB range</a>, but <a href="https://developer.db4o.com/Documentation/Reference/db4o-7.4/java/reference/html/reference/tuning/performance_hints/increasing_the_maximum_database_file_size.html">supports up to 254 GB per file</a>) and <a href="http://developer.db4o.com/Forums/tabid/98/aff/4/aft/9855/afv/topic/Default.aspx">handles multi-threading better</a>. Also, since db4o <a href="http://www.itwire.com/sponsored-announcements/38149-versant-expands-db4o-open-source-licensin">has now moved to v3 of the GPL</a>, it may not be freely usable in non-open-source web applications anymore, so both solutions now have a price tag.</p>
<p>There is also an <a href="http://www.versant.com/en_US/solutions/oem_program/">ISV/OEM empowerment program</a> for Versant&#8217;s Object Database, which seems to make it affordable, but I haven&#8217;t looked at it in detail yet. Over the next weeks, I will have to evaluate a few of those other NoSQL solutions such as <a href="http://cassandra.apache.org/">Cassandra</a> and <a href="http://www.mongodb.org/">MongoDB</a>, just to name two totally different options.</p>
<h2>Aftermath</h2>
<p>So what was the issue, after all? db4o did not screw up, we did: Take a blend of local and UTC time based on a completely random criterion, add two spoons of daylight saving time changes, add a misconfigured time zone on the server, bake for two weeks at 500 &deg;C and pling! You&#8217;ve got yourself some really strange issue cake. Lessons learned:</p>
<ul>
<li>Debugging issues with object databases is harder than with an RDBMS because the information is not chopped up.</li>
<li>Do code reviews. Do code reviews.</li>
<li>There aren&#8217;t too many alternatives to db4o, after all.</li>
<li>With the elaborate architecture and code-centric design we currently have, going back to SQL is a huge pain, and we won&#8217;t do it.</li>
</ul>
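<p>For what it&#8217;s worth, a local/UTC blend like the one described above is the kind of thing a single choke point can guard against. A minimal sketch, assuming every timestamp passes through one helper before it is stored (<code>Timestamps.Normalize</code> is a hypothetical name invented for this sketch, not part of db4o or .NET):</p>
<pre class="brush: csharp">
using System;

// Hypothetical helper, invented for this sketch.
public static class Timestamps
{
    // Returns a UTC value for Utc and Local input and refuses Unspecified input,
    // because guessing a time zone is how local and UTC values get blended.
    public static DateTime Normalize(DateTime value)
    {
        switch (value.Kind)
        {
            case DateTimeKind.Utc:
                return value;
            case DateTimeKind.Local:
                return value.ToUniversalTime();
            default:
                throw new ArgumentException("Timestamp has no DateTimeKind; refusing to guess.", "value");
        }
    }

    public static void Main()
    {
        DateTime created = Normalize(DateTime.Now);   // Local in, Utc out
        Console.WriteLine(created.Kind);              // prints "Utc"
    }
}
</pre>
<p>Throwing on <code>Unspecified</code> is a deliberate choice in this sketch: it turns a silent two-week bake into an immediate exception. It does nothing about a misconfigured server time zone, though, which still skews every <code>Local</code> value on the way in.</p>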
<p>All serialization is painful.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.emphess.net/2010/04/12/nosql-approaches-trying-to-use-db4o-in-the-real-world/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>db4o Queries on Large Datasets and a bit of Linq</title>
		<link>http://www.emphess.net/2010/03/16/db4o-queries-on-large-datasets-and-a-bit-of-linq/</link>
		<comments>http://www.emphess.net/2010/03/16/db4o-queries-on-large-datasets-and-a-bit-of-linq/#comments</comments>
		<pubDate>Tue, 16 Mar 2010 12:26:49 +0000</pubDate>
		<dc:creator>Christoph Menge</dc:creator>
				<category><![CDATA[.NET]]></category>
		<category><![CDATA[db4o]]></category>
		<category><![CDATA[linq]]></category>
		<category><![CDATA[Performance]]></category>

		<guid isPermaLink="false">http://www.emphess.net/?p=123</guid>
		<description><![CDATA[My last small note on db4o performance will soon be outdated &#8211; fortunately. Newer releases of db4o will no longer rely on Cecil to perform reflection, thereby speeding up db4o linq queries &#8211; However, make sure you have Mono.Reflection.dll in your app! Also there are some restrictions when it comes to the compact framework and [...]]]></description>
			<content:encoded><![CDATA[<p>My last small note on db4o performance <a href="http://developer.db4o.com/Forums/tabid/98/aff/37/aft/9716/afv/topic/Default.aspx">will soon be outdated</a> &#8211; fortunately. Newer releases of db4o will no longer rely on Cecil to perform reflection, thereby speeding up db4o linq queries &#8211; <strong>However, make sure you have Mono.Reflection.dll in your app!</strong> Also, there are some restrictions when it comes to the Compact Framework and native queries (which still need Cecil), so you&#8217;d best make sure to <a href="http://developer.db4o.com/Blogs/Product/tabid/167/archive/month/date/2010-02-28/Default.aspx" target="_blank">read this official db4o announcement</a>.</p>
<p>Talking about speed and performance, I just came across an issue that was also discussed in db4o forums very recently: Sort operations on large datasets.<br />
Note: This has nothing to do with linq or linq to db4o, it&#8217;s just the same for SODA queries.</p>
<p>Let&#8217;s take a very common example: Find the most recent items, for example the most recent blog/forum posts or some other &#8216;top list&#8217;, among a very large number of entries N:</p>
<pre class="brush: csharp">
        var mostRecentPosts = (from Posts o
              in ObjectContainer
              orderby o.Created descending
              select o).Take(100).ToList();
</pre>
<p>This query will be really, really slow on a large number of objects. Why? Essentially, the BTREE operation is very fast, as it should be, but unfortunately db4o will invoke its SODA system on each of the objects, even if they have already been ruled out by the BTREE operation.</p>
<p>See <a href="http://developer.db4o.com/Forums/tabid/98/aff/4/aft/9751/afv/topic/afpgj/1/Default.aspx#27735">this discussion on the db4o forum</a> and <a href="http://tracker.db4o.com/browse/COR-1133">the associated Jira-bug</a>.</p>
<p>This is somewhat sad, because a query on a previously filtered set of items is blazingly fast, about 100x faster:</p>
<pre class="brush: csharp">
        var mostRecentPosts = (from Posts o
              in ObjectContainer
              where o.Created &gt; yesterday
              orderby o.Created descending
              select o).Take(100).ToList();
</pre>
<p>The latter operation plays roughly in the same league as SQL Server (don&#8217;t flame me &#8211; performance comparisons and profiling are really complicated and there are a zillion factors that influence them, I know. That&#8217;s why I say &#8216;roughly the same league for this query&#8217;, and I&#8217;m talking about default setups).</p>
<p>This is a somewhat unfortunate situation, because it produces slow queries for many typical applications &#8211; unless you have a strong where clause that cuts down N from millions to hundreds. If you&#8217;re interested in seeing this issue fixed, head over to their <a href="http://tracker.db4o.com/browse/COR-1133">issue tracker and vote for the issue to be fixed</a>! Thanks!</p>
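<p>Until then, one possible workaround (a sketch, not something taken from the db4o documentation) is to start with a narrow time window and widen it until enough items turn up. The code below runs against a plain in-memory list so that it is self-contained; <code>Post</code>, <code>RecentPosts.Find</code> and the doubling window are assumptions made for illustration, and with db4o the <code>source</code> would be the Linq query over the object container instead:</p>
<pre class="brush: csharp">
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the stored type.
public class Post
{
    public DateTime Created;
}

public static class RecentPosts
{
    // Widens the window (1 day, 2 days, 4 days, ...) until 'count' posts are found
    // or the window reaches or exceeds 'maxWindow'. Each pass keeps the where clause selective.
    public static List&lt;Post&gt; Find(IEnumerable&lt;Post&gt; source, DateTime now, int count, TimeSpan maxWindow)
    {
        TimeSpan window = TimeSpan.FromDays(1);
        while (true)
        {
            DateTime cutoff = now - window;
            List&lt;Post&gt; hits = (from p in source
                               where p.Created &gt; cutoff
                               orderby p.Created descending
                               select p).Take(count).ToList();

            if (hits.Count &gt;= count || window &gt;= maxWindow)
                return hits;

            window = TimeSpan.FromTicks(window.Ticks * 2);
        }
    }

    public static void Main()
    {
        DateTime now = new DateTime(2010, 3, 16, 12, 0, 0, DateTimeKind.Utc);
        var posts = new List&lt;Post&gt;();
        for (int i = 0; i &lt; 10; i++)
            posts.Add(new Post { Created = now.AddDays(-i) - TimeSpan.FromHours(1) });

        // Only one post falls inside the first day, so the window has to grow
        // before three are found.
        List&lt;Post&gt; top = Find(posts, now, 3, TimeSpan.FromDays(30));
        Console.WriteLine(top.Count);
    }
}
</pre>
<p>The trade-off is that a sparse dataset costs several queries instead of one, and if there are fewer than <code>count</code> items in total, the loop only stops once the window hits the cap.</p>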
<p>I&#8217;ll be posting a bit more on db4o over the next few days I hope.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.emphess.net/2010/03/16/db4o-queries-on-large-datasets-and-a-bit-of-linq/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
	</channel>
</rss>
