<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>emphess .NET &#187; linq</title>
	<atom:link href="http://www.emphess.net/tag/linq/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.emphess.net</link>
	<description>Christoph Menge&#039;s Blog</description>
	<lastBuildDate>Tue, 15 Jun 2010 00:50:30 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>The Object-Document Mismatch: MongoDB and db4o with Linq</title>
		<link>http://www.emphess.net/2010/05/05/the-object-document-mismatch-mongodb-and-db4o-with-linq/</link>
		<comments>http://www.emphess.net/2010/05/05/the-object-document-mismatch-mongodb-and-db4o-with-linq/#comments</comments>
		<pubDate>Wed, 05 May 2010 15:43:07 +0000</pubDate>
		<dc:creator>Christoph Menge</dc:creator>
				<category><![CDATA[.NET]]></category>
		<category><![CDATA[db4o]]></category>
		<category><![CDATA[linq]]></category>
		<category><![CDATA[MongoDB]]></category>
		<category><![CDATA[NoSQL]]></category>

		<guid isPermaLink="false">http://www.emphess.net/?p=232</guid>
		<description><![CDATA[Rob Conery recently wrote about using MongoDB with Linq. I was really intrigued by the fact that you can use elegant, readable, type-safe Linq-queries to access MongoDB document database. To be honest, I had no clue what an object database really is, but when it speaks Linq it must be cool, I thought. 
So I [...]]]></description>
			<content:encoded><![CDATA[<p>Rob Conery recently <a href="http://blog.wekeroad.com/2010/03/04/using-mongo-with-linq">wrote about using MongoDB with Linq</a>. I was really intrigued by the fact that you can use elegant, readable, type-safe Linq-queries to access <a href="http://www.mongodb.org/display/DOCS/Home">MongoDB document database</a>. To be honest, I had <strong>no clue</strong> what an <strong>object database</strong> really is, but when it speaks Linq it must be cool, I thought. </p>
<p>So I dug a bit deeper into MongoDB and NoRM, which is a nifty C# driver for MongoDB developed by <a href="http://andrewtheken.com/">Andrew Theken</a> and several others. You might want to <a href="http://github.com/atheken/NoRM">grab a copy at github</a>, where you can see how incredibly active the project is! Now, back to the evaluation: what is the best way to evaluate a database? Of course, build <a href="http://www.backlink-tracker.net/">a live product using it</a>!</p>
<p>Since I was completely abusing db4o for said project (in fact, I am storing something you&#8217;d call documents there), I decided that this would be a great candidate for a migration. So now we&#8217;re migrating from an object database to a document database and from an ACID database to a NoSQL solution.</p>
<p>MongoDB is considered a <a href="http://nosql-database.org/">NoSQL</a> solution, <a href="http://www.kellblog.com/2010/04/11/yes-virginia-marklogic-is-a-nosql-system/">while db4o is considered &#8217;soft&#8217; NoSQL</a> &#8211; see Stefan&#8217;s comment at the bottom. Why the distinction, when neither relies on, supports or uses SQL at all? But then again, that is not what the term &raquo;<em>NoSQL</em>&laquo; is all about. It&#8217;s probably one of the most misleading terms ever coined and perhaps it should read &raquo;Not ACID&laquo; or just &raquo;Persistence without Prejudgement, <em>PwoP</em>&laquo;. db4o makes ACID guarantees, comes from an embedded background and offers single-server durability, while MongoDB is made for the net, does not have single-server durability, supports MapReduce and is driven by JavaScript. </p>
<p>Hell, they couldn&#8217;t be more different. But then again, I can access both using identical interfaces:</p>
<pre class="brush: csharp">
// Linq is understood by both, so you could use the lines in both:
var u = (from Note n in container
         where n.Text == null select n).ToList();
var v = session.Query&lt;Tag&gt;().Where(p =&gt; p.Name != null);
// Store an object graph to Mongo using NoRM:
session.Add(testNote);
// Store an object graph to db4o:
container.Store(testNote);
</pre>
<p>Don&#8217;t be fooled &#8211; although these lines could all be used with MongoDB or db4o, and all of them could even be used with the very same classes, they are still fundamentally different in behavior. Also, for anything but the simplest problems, <strong>you can&#8217;t just persist the same domain models</strong>.</p>
<h2>What is a document now?</h2>
<p>A document is not just an unstructured piece of data. It&#8217;s not a BLOB. Instead, an instance of a class, <strong>plus all it refers to</strong>, could be a document:</p>
<pre class="brush: csharp">
class UniqueIdObject
{
    public Guid Id { get; private set; }
    public UniqueIdObject() { Id = Guid.NewGuid(); }
}

class Report : UniqueIdObject
{
    public Report() : base() { Tags = new List&lt;Tag&gt;(); }
    public string Text { get; set; }
    public List&lt;Tag&gt; Tags { get; set; }
}

class Tag : UniqueIdObject
{
    public string Name { get; set; }
}
</pre>
<p>Now we might want to store a report in the database, and the code needed is just:</p>
<pre class="brush: csharp">
using (NoRMSession session = new NoRMSession())
{
    Report newReport = new Report() { Text = &quot;Hello World, MongoDB!&quot; };
    newReport.Tags.Add(new Tag() { Name = &quot;Tag 1&quot; });
    newReport.Tags.Add(new Tag() { Name = &quot;Tag 2&quot; });
    session.Add(newReport);
}
</pre>
<p>That&#8217;s it! Wow! Of course, we haven&#8217;t taken care of indexation and stuff and since MongoDB is schemaless (or better, has a dynamic schema) we need to do that in code. But still, this is essentially all you need.</p>
<p>The important thing now is: What happens to those little <code>Tags</code> we deliberately put into a separate class? Now the mapper hides a bit of truth from us, because MongoDB works on so-called &#8220;Collections&#8221;. <code>session.Add(newReport)</code> resolves to <code>session.Add&lt;Report&gt;(newReport)</code>, which will in turn put the <code>newReport</code> object <strong>into the Report-Collection!</strong> So the object graph, as it is, will be serialized into the Report collection, <em>including</em> our little Tag objects!</p>
<p>Each item with an orange border is an &#8216;atomic&#8217; item in its respective data store:</p>
<div id="attachment_242" class="wp-caption alignleft" style="width: 310px"><a href="http://www.emphess.net/wp-content/uploads/2010/05/db4o-graph.png"><img src="http://www.emphess.net/wp-content/uploads/2010/05/db4o-graph-300x184.png" alt="" title="db4o-graph" width="300" height="184" class="size-medium wp-image-242" /></a><p class="wp-caption-text">db4o serialized graph</p></div>
<p><div id="attachment_241" class="wp-caption alignleft" style="width: 310px"><a href="http://www.emphess.net/wp-content/uploads/2010/05/mongo.png"><img src="http://www.emphess.net/wp-content/uploads/2010/05/mongo-300x200.png" alt="" title="mongo" width="300" height="200" class="alignright size-medium wp-image-241" /></a><p class="wp-caption-text">mongo serialized document</p></div><br />
</p>
<div style="clear: both;"></div>
<p>Let&#8217;s naively try to fetch all tags:</p>
<pre class="brush: csharp">
var v = session.Query&lt;Tag&gt;().Where(p =&gt; p.Name != null);
</pre>
<p>This does not work: <code>v</code> is <code>null</code> because there is no tag collection! Instead, the tag lists are <strong>part of the reports</strong> we put into the <code>Report</code> collection. Note that this would work in db4o, because db4o stores references as references, while &#8216;documents&#8217; store the contained data instead of references. This is beautifully simple, but it&#8217;s also very different from what you might expect and it has lots of implications for your object structure.</p>
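<p>To make the difference concrete, here is a minimal sketch that uses plain in-memory LINQ to Objects rather than NoRM or db4o. The simplified <code>Report</code> and <code>Tag</code> classes and the <code>EmbeddedTags</code> helper are made up for illustration; the point is only that embedded tags are reachable through their reports and not through a collection of their own:</p>
<pre class="brush: csharp">
using System;
using System.Collections.Generic;
using System.Linq;

class Tag { public string Name { get; set; } }

class Report
{
    public Report() { Tags = new List&lt;Tag&gt;(); }
    public string Text { get; set; }
    public List&lt;Tag&gt; Tags { get; set; }
}

static class EmbeddedTags
{
    // Walk every report and flatten the embedded tag lists into one sequence.
    public static List&lt;string&gt; AllTagNames(IEnumerable&lt;Report&gt; reports)
    {
        return reports.SelectMany(r =&gt; r.Tags)
                      .Where(t =&gt; t.Name != null)
                      .Select(t =&gt; t.Name)
                      .Distinct()
                      .ToList();
    }

    static void Main()
    {
        var first = new Report { Text = &quot;Hello World, MongoDB!&quot; };
        first.Tags.Add(new Tag { Name = &quot;Tag 1&quot; });
        first.Tags.Add(new Tag { Name = &quot;Tag 2&quot; });
        var second = new Report { Text = &quot;Another report&quot; };
        second.Tags.Add(new Tag { Name = &quot;Tag 2&quot; });

        foreach (string name in AllTagNames(new[] { first, second }))
            Console.WriteLine(name);
    }
}
</pre>
<p>Note that this runs on the client after the reports have been loaded, so it is an illustration of the document shape rather than a query strategy.</p>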
<h2>Thoroughly think through your schemaless schema</h2>
<p>MongoDB is made for <em>scalability and simplicity</em>, so it does not care for our foolish attempt to fetch tags directly. There are ways to access that data directly, however. We could use a deep-graph query or write a JavaScript Map/Reduce instruction, but that is a bit out of scope right now. What&#8217;s more important is that it calls for <strong>changes to our domain model objects</strong>: If we really want to store a reference, we need to do so manually, in ye olde SQL way, by storing the associated Id instead of the object. Of course, that makes deserialization a bit more complicated because the objects we now retrieve from the database aren&#8217;t ready to use as they come: the referenced objects have to be looked up in a second step.</p>
<p>However, automating that process will induce several complications, among them the need to <strong>handle cyclic references</strong>, the need for a concept for fetching or activating objects on the fly (called <strong>Transparent Activation</strong> in the db4o world), and making sure we&#8217;re not inducing a massive performance hit along the way.</p>
<p>Also, updating objects can be painful. Suppose we stored a list of <code>Reports</code> for each user. Now we might want to put the list of reports directly into the user object, <strong>or</strong> store a list of <code>ReportIds</code> for each user and put the Reports into their very own ReportCollection. As usual, <strong>there is no silver bullet</strong>, so this decision depends on the specific needs of the application, but whatever decision we take will not be visible to users of the resulting objects. In fact, it leads to some unwanted <strong>strong coupling</strong>:</p>
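<p>A minimal sketch of storing Ids instead of objects, again using plain in-memory collections instead of a real database. <code>ReportDocument</code>, <code>TagIds</code> and <code>ResolveTags</code> are hypothetical names, and the dictionary merely stands in for a separate tag collection:</p>
<pre class="brush: csharp">
using System;
using System.Collections.Generic;

class Tag
{
    public Tag() { Id = Guid.NewGuid(); }
    public Guid Id { get; private set; }
    public string Name { get; set; }
}

// Stores only the Ids of its tags, so the tags can live in their own collection.
class ReportDocument
{
    public ReportDocument() { TagIds = new List&lt;Guid&gt;(); }
    public string Text { get; set; }
    public List&lt;Guid&gt; TagIds { get; set; }
}

static class ManualReferences
{
    // Second step after loading a report: look up each referenced tag by Id.
    // Ids without a matching tag are skipped.
    public static List&lt;Tag&gt; ResolveTags(ReportDocument report, IDictionary&lt;Guid, Tag&gt; tagCollection)
    {
        var tags = new List&lt;Tag&gt;();
        foreach (Guid id in report.TagIds)
        {
            Tag tag;
            if (tagCollection.TryGetValue(id, out tag))
                tags.Add(tag);
        }
        return tags;
    }

    static void Main()
    {
        var tag = new Tag { Name = &quot;Tag 1&quot; };
        var tagCollection = new Dictionary&lt;Guid, Tag&gt; { { tag.Id, tag } };

        var report = new ReportDocument { Text = &quot;Hello World, MongoDB!&quot; };
        report.TagIds.Add(tag.Id);

        foreach (Tag resolved in ResolveTags(report, tagCollection))
            Console.WriteLine(resolved.Name);
    }
}
</pre>
<p>The extra lookup is exactly the part a mapper would have to automate, and skipping unknown Ids silently is just one possible choice for this sketch.</p>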
<pre class="brush: csharp">
class ReportService
{
  private NoRMSession _session;

  // ...
  public void UpdateReportDetail()
  {
    this.ReportDetailXY = ComplicatedCalculation();
    this.LastChanged = DateTime.UtcNow;

    // If the reports are in their very own Report Collection, this
    // is fine. However, if they are contained as lists in the user
    // who owns them, we&#039;re in trouble and this will fail!
    _session.Update(this);
  }
}
</pre>
<p>This might not be a big problem for <strong>really small applications</strong>, and if you really need performance (why else would you choose a NoSQL system?) you have to fine-tune your objects anyway. Still, I have the feeling that there is some room for improvement here, and a <strong>basic</strong> wrapper could help in overcoming some of the issues raised. Some basic ideas on how to approach this will follow shortly.</p>
<h2>Wrapping it up</h2>
<p>Obviously, I&#8217;m comparing Apples and Oranges here: db4o is made to <strong>make persistence easier</strong>, especially with complex domain model objects. db4o, being an <strong>object database</strong>, behaves exactly the way you&#8217;d expect objects under serialization to behave, but that makes it quite complex. MongoDB is focused on <strong>simplicity and scalability</strong>. Through the document concept, you gain simplicity on the database, administration and wrapper (driver)-side, but you have to struggle with a slight <strong>impedance mismatch</strong> in code, especially against Linq, again.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.emphess.net/2010/05/05/the-object-document-mismatch-mongodb-and-db4o-with-linq/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>My db4o Wishlist</title>
		<link>http://www.emphess.net/2010/04/14/my-db4o-wishlist/</link>
		<comments>http://www.emphess.net/2010/04/14/my-db4o-wishlist/#comments</comments>
		<pubDate>Wed, 14 Apr 2010 22:30:18 +0000</pubDate>
		<dc:creator>Christoph Menge</dc:creator>
				<category><![CDATA[.NET]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[db4o]]></category>
		<category><![CDATA[linq]]></category>
		<category><![CDATA[Object Database]]></category>
		<category><![CDATA[OODBMS]]></category>

		<guid isPermaLink="false">http://www.emphess.net/?p=168</guid>
		<description><![CDATA[After finding that db4o did not screw up in our projects, I dug a bit through their issue tracker, which is a very important resource you should definitely check out if you&#8217;re working with db4o!
Just to get that straight: I&#8217;m an avid db4o user and really love it. These issues are not critical and they [...]]]></description>
			<content:encoded><![CDATA[<p>After finding that <a href="http://www.emphess.net/2010/04/12/nosql-approaches-trying-to-use-db4o-in-the-real-world/">db4o did not screw up</a> in our projects, I dug a bit through <a href="http://tracker.db4o.com/secure/Dashboard.jspa">their issue tracker</a>, which is a very important resource you should definitely check out if you&#8217;re working with db4o!</p>
<p>Just to get that straight: I&#8217;m an avid db4o user and really love it. These issues are not critical and they don&#8217;t stop me from using or evangelizing db4o. However, I think there is some lack of awareness of some issues.</p>
<p>Also, I&#8217;d like to spawn some discussion about the issues below. Unfortunately, due to their changes to the forum system, most of the original discussions on the <a href="http://developer.db4o.com/Forums.aspx">db4o forum</a> are hard to find or possibly lost. You may want to <em>vote on the issues you deem most pressing</em>, which you can easily do in their issue tracker! I&#8217;m very interested in what you think about this little selection.</p>
<h2>Selected Issues</h2>
<p>&raquo; <a href="http://tracker.db4o.com/browse/COR-1133">Don&#8217;t run SODA when no more constraints are present</a><br />
I <a href="http://www.emphess.net/2010/03/16/db4o-queries-on-large-datasets-and-a-bit-of-linq/">blogged about this already</a>, because you experience this in very common scenarios, namely whenever you query a small subset of a larger candidate set. For example, consider selecting the last 50 posts on a blog/qa-site/etc. What will happen is that db4o runs the BTREE query for the sort operation (blazing), <em>then hydrates (?) all objects</em>, then returns the first 50 of them and throws away the rest. The thing is that there is no need to further inspect the items, and activating all of them is basically a linear operation. Thus, this common type of query currently runs in <em>O(n)</em> instead of <em>O(log n)</em>, which is an incredibly dramatic difference.</p>
<p>&raquo; <a href="http://tracker.db4o.com/browse/COR-1899">LINQ-Implementation is not &#8216;thread&#8217;-safe</a><br />
A very <a href="http://blog.stevensanderson.com/2007/11/29/linq-to-sql-the-multi-tier-story/">similar issue was on LINQ-to-SQL&#8217;s todo list some time ago</a>.<br />
I&#8217;m not sure whether this is a typical use case. For the db4o case, it teaches us two things right now:</p>
<ul>
<li>Container reuse is non-trivial and should be approached with extreme care. You don&#8217;t want to run into this kind of byzantine error in a live app.</li>
<li>The object identity problem is similar in both Object-Relational Mapping and OODBMS</li>
</ul>
<p>&raquo; <a href="http://tracker.db4o.com/browse/COR-191">Scalable server architecture: multiple readers against the same file, transactional files</a><br />
This sounds daunting, and it&#8217;s probably a huge one, as you can see from its age. I also believe this might be politically challenging, because this moves in the direction of <a href="http://www.versant.com">Versant&#8217;s</a> large-scale object database. However, there is a lot of movement in that direction from the user side, it seems &#8211; people are asking for features of this kind more and more lately, largely due to the ultra-cool LINQ integration db4o has. I suppose it&#8217;d be wise to focus on this kind of scenario as it could really become the preferred way of writing web applications: it&#8217;s extremely agile, supports rapid development, is flexible in that the same (LINQ) code could be used for different persistence layers if that should ever be needed, and leads to clean, compile-time checked, type-safe code.</p>
<p>&raquo; <a href="http://tracker.db4o.com/browse/COR-236">Sanitize reflector design &#8211; remove core dependencies on generic reflector</a><br />
Being able to get rid of the generic reflector seems important, I&#8217;m already building my own code for this. Here we have conflicting requirements: The GenericReflector makes db4o very easy to use and may help beginners. It is also required in client-server scenarios where the server doesn&#8217;t have the necessary model dlls, but for most applications I think you should try to avoid it. Storing data in a generic manner is slow and requires a lot more space, making it highly inefficient.</p>
<p>Current attack vector: <a href="http://developer.db4o.com/Forums/tabid/98/aff/4/aft/9635/afv/topic/Default.aspx">Throw a Listener on the object created event</a> on the server and make sure the server knows the type.</p>
<p>&raquo; <a href="http://tracker.db4o.com/browse/COR-1905">Allow immediate TCP port reuse</a><br />
When opening a server, the TCP port will be blocked in case the server crashes, the app is terminated, etc. Since that might happen quite often when you use the &#8216;integrated server&#8217; where the server is actually created in your web application, a restart of the web application will fail because the TCP port is blocked. In a client-server scenario, on the other hand, a simple restart wouldn&#8217;t be possible because you need to assign a new port to clients or wait 6 minutes. This should be fairly simple, but I don&#8217;t know if that comes with any side-effects.</p>
<p>&raquo; <a href="http://tracker.db4o.com/browse/COR-113">Fast Collections</a><br />
This is a huge one. The cool thing is that this could allow much more complicated queries to be executed in reasonable timeframes. However, I&#8217;m a bit worried about <a href="http://tracker.db4o.com/browse/COR-644">the issue &#8220;FastCollections: Inside BTree List implementation&#8221;</a>, because that sounds really important, but is in state &#8220;Won&#8217;t fix&#8221;.</p>
<p>&raquo; <a href="http://tracker.db4o.com/browse/COR-478">Locking</a><br />
Optimistic locking would be a nice-to-have thingie, but you can do this yourself rather easily, I think.</p>
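<p>A hand-rolled version could look roughly like the following sketch. It is not db4o API: <code>VersionedStore</code>, <code>Versioned</code> and <code>TryUpdate</code> are invented names, and an in-memory dictionary stands in for the database. The idea is to keep a version number with each object and to reject an update that was based on a stale version:</p>
<pre class="brush: csharp">
using System;
using System.Collections.Generic;

class Versioned
{
    public string Value { get; set; }
    public int Version { get; set; }
}

class VersionedStore
{
    private readonly Dictionary&lt;Guid, Versioned&gt; _items = new Dictionary&lt;Guid, Versioned&gt;();
    private readonly object _sync = new object();

    public Guid Add(string value)
    {
        Guid id = Guid.NewGuid();
        lock (_sync) { _items[id] = new Versioned { Value = value, Version = 1 }; }
        return id;
    }

    // Returns a copy, so callers cannot change the stored state directly.
    public Versioned Read(Guid id)
    {
        lock (_sync)
        {
            Versioned current = _items[id];
            return new Versioned { Value = current.Value, Version = current.Version };
        }
    }

    // Succeeds only if nobody has updated the item since expectedVersion was read.
    public bool TryUpdate(Guid id, int expectedVersion, string newValue)
    {
        lock (_sync)
        {
            Versioned current = _items[id];
            if (current.Version != expectedVersion)
                return false;
            current.Value = newValue;
            current.Version++;
            return true;
        }
    }
}

static class OptimisticLockingDemo
{
    static void Main()
    {
        var store = new VersionedStore();
        Guid id = store.Add(&quot;draft&quot;);
        Versioned mine = store.Read(id);
        Versioned theirs = store.Read(id);

        Console.WriteLine(store.TryUpdate(id, theirs.Version, &quot;their edit&quot;)); // True
        Console.WriteLine(store.TryUpdate(id, mine.Version, &quot;my edit&quot;));      // False: stale version
    }
}
</pre>
<p>The caller that loses the race has to re-read the object and decide whether to retry, which is the part that depends on the application.</p>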
<p>&raquo; <a href="http://tracker.db4o.com/browse/COR-1772">A new object is stored upon value type updates</a><br />
This is rated critical, so it&#8217;s not an item for a &#8220;wishlist&#8221;. I&#8217;m not sure if I understand its implications and I rarely use value types apart from Guids, and updating Guids is pointless &#8211; still a db4o user should probably know this and keep this in mind, so I felt it should go here &#9760;.</p>
<h2>db4o Configuration</h2>
<p>This one is not really in the issue tracker as a single item, and it&#8217;s more of a general remark. <em>One of the rather messy things in db4o is configuration</em>. Even with the <a href="http://programing-fun.blogspot.com/2008/10/changes-in-db4o-configuration.html">new configuration interface</a>, there is quite a bit of confusion among users. The reason, in my eyes, is mostly a combination of incomplete documentation and unexpected behaviour. Examples:</p>
<ul>
<li>The indexation setting for fields is the only configuration setting that is persistent. Everything else, including unique constraints, needs to be re-set when (or &ndash; more precisely &ndash; <em>before</em>) opening the <code>ObjectContainer</code>.</li>
<li>Applying an option to a field that doesn&#8217;t exist will not trigger a warning or an <code>Exception</code></li>
<li>Some settings simply won&#8217;t have any effect when you perform them after opening the <code>ObjectContainer</code>, but they do not warn you.</li>
<li>Certain settings <strong>must</strong> be applied on the server, a few <strong>must</strong> be applied on the client and with some &#8230; well, you just set them on both just to make sure. Here, db4o does throw exceptions, however!</li>
<li>Several options need to be set before <em>creating</em> the object container (e.g. string encoding) and cannot be reset afterwards, again being completely silent about the ineffectiveness of the respective settings.</li>
<li>Some settings, such as field-based cascade-on-activate, <a href="http://developer.db4o.com/Forums/tabid/98/aff/4/aft/9783/afv/topic/Default.aspx">simply don&#8217;t seem to work at all</a></li>
</ul>
<p>This leads to lots and lots of confusion. Most importantly, it is often hard to determine whether a certain setting was successfully applied, or not. Also, some defaults are unexpected:</p>
<ul>
<li>Default <code>ActivationDepth</code> is (completely random) 5. Why not 8? Or 2? This troubles beginners a lot. Either set it to infinity, or to zero. Everything else feels just random. You can still include a line <code>ActivationDepth = 5;</code> in beginner&#8217;s samples, thereby showing them that the setting is there and that they need to be aware of it.</li>
<li>Default string encoding seems to be <code>UTF-16</code> or <code>UCS-2</code>, probably the most useless encodings around, despite the Windows kernel working with them. <code>UTF-8</code> would be a reasonable default, but with <code>UTF-16</code> half of your database is probably zeros, because even in non-English environments, there is still a lot of mostly ASCII data to be stored (such as URLs, email addresses, base64-encoded information, SHA hashes, etc.). Also, many languages use non-ASCII characters only sparingly, German being one example.</li>
</ul>
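<p>The size difference for mostly-ASCII data is easy to check with the standard .NET <code>System.Text.Encoding</code> classes. Nothing here is db4o-specific, and the sample strings and the <code>EncodingSizes</code> helper are arbitrary choices for this sketch:</p>
<pre class="brush: csharp">
using System;
using System.Text;

static class EncodingSizes
{
    public static int Utf8Bytes(string s) { return Encoding.UTF8.GetByteCount(s); }

    // Encoding.Unicode is .NET&#039;s name for UTF-16 (little endian).
    public static int Utf16Bytes(string s) { return Encoding.Unicode.GetByteCount(s); }

    static void Main()
    {
        // The second sample is a German phrase with two non-ASCII characters.
        string[] samples = { &quot;http://www.emphess.net&quot;, &quot;Gr\u00FC\u00DFe aus Deutschland&quot; };
        foreach (string s in samples)
            Console.WriteLine(&quot;{0} chars: UTF-8 {1} bytes, UTF-16 {2} bytes&quot;,
                              s.Length, Utf8Bytes(s), Utf16Bytes(s));
    }
}
</pre>
<p>For the 22-character URL this gives 22 bytes in UTF-8 against 44 in UTF-16; the German phrase needs 23 against 42.</p>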
<p>I think it&#8217;d be really cool if the configuration interface were a little more explicit and threw Exceptions instead of silently ignoring requests that cannot be fulfilled.</p>
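<p>Such a fail-fast check could look roughly like this. It is a sketch of the desired behaviour only, not db4o&#8217;s configuration API: <code>StrictFieldConfig</code>, <code>IndexField</code> and the <code>Post</code> class are hypothetical, and the check simply uses plain .NET reflection to reject a field name that does not exist:</p>
<pre class="brush: csharp">
using System;
using System.Collections.Generic;
using System.Reflection;

class Post
{
    private DateTime _created;
    public DateTime Created { get { return _created; } set { _created = value; } }
}

class StrictFieldConfig
{
    private readonly List&lt;string&gt; _indexedFields = new List&lt;string&gt;();

    public IList&lt;string&gt; IndexedFields { get { return _indexedFields; } }

    // Registers an index request, but throws if the type has no such field.
    public void IndexField(Type type, string fieldName)
    {
        const BindingFlags flags = BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic;
        if (type.GetField(fieldName, flags) == null)
            throw new ArgumentException(
                &quot;Type &quot; + type.Name + &quot; has no field named &quot; + fieldName + &quot;.&quot;, &quot;fieldName&quot;);
        _indexedFields.Add(type.FullName + &quot;.&quot; + fieldName);
    }
}

static class StrictConfigDemo
{
    static void Main()
    {
        var config = new StrictFieldConfig();
        config.IndexField(typeof(Post), &quot;_created&quot;);        // fine, the field exists

        try { config.IndexField(typeof(Post), &quot;_creatd&quot;); } // typo in the field name
        catch (ArgumentException e) { Console.WriteLine(e.Message); }
    }
}
</pre>
<p>A typo in a field name then surfaces at startup instead of as a mysteriously slow query later on.</p>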
]]></content:encoded>
			<wfw:commentRss>http://www.emphess.net/2010/04/14/my-db4o-wishlist/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>db4o Queries on Large Datasets and a bit of Linq</title>
		<link>http://www.emphess.net/2010/03/16/db4o-queries-on-large-datasets-and-a-bit-of-linq/</link>
		<comments>http://www.emphess.net/2010/03/16/db4o-queries-on-large-datasets-and-a-bit-of-linq/#comments</comments>
		<pubDate>Tue, 16 Mar 2010 12:26:49 +0000</pubDate>
		<dc:creator>Christoph Menge</dc:creator>
				<category><![CDATA[.NET]]></category>
		<category><![CDATA[db4o]]></category>
		<category><![CDATA[linq]]></category>
		<category><![CDATA[Performance]]></category>

		<guid isPermaLink="false">http://www.emphess.net/?p=123</guid>
		<description><![CDATA[My last small note on db4o performance will soon be outdated &#8211; fortunately. Newer releases of db4o will no longer rely on Cecil to perform reflection, thereby speeding up db4o linq queries &#8211; However, make sure you have Mono.Reflection.dll in your app! Also there are some restrictions when it comes to the compact framework and [...]]]></description>
			<content:encoded><![CDATA[<p>My last small note on db4o performance <a href="http://developer.db4o.com/Forums/tabid/98/aff/37/aft/9716/afv/topic/Default.aspx">will soon be outdated</a> &#8211; fortunately. Newer releases of db4o will no longer rely on Cecil to perform reflection, thereby speeding up db4o linq queries &#8211; <strong>However, make sure you have Mono.Reflection.dll in your app!</strong> Also there are some restrictions when it comes to the compact framework and native queries (which still need Cecil), so you&#8217;d best make sure to <a href="http://developer.db4o.com/Blogs/Product/tabid/167/archive/month/date/2010-02-28/Default.aspx" target="_blank">read this official db4o announcement</a>.</p>
<p>Talking about speed and performance, I just came across an issue that was also discussed in db4o forums very recently: Sort operations on large datasets.<br />
Note: This has nothing to do with linq or linq to db4o, it&#8217;s just the same for SODA queries.</p>
<p>Let&#8217;s take a very common example: Find the most recent items, for example the most recent blog/forum posts or some other &#8216;top list&#8217;, among a very large number of entries N:</p>
<pre class="brush: csharp">
        var mostRecentPosts = (from Posts o
              in ObjectContainer
              orderby o.Created descending
              select o).Take(100).ToList();
</pre>
<p>This query will be really, really slow on a large number of objects. Why? Essentially, the BTREE operation is very fast, as it should be, but unfortunately db4o will invoke its SODA system on each of the objects, even if they have already been ruled out by the BTREE operation. </p>
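<p>The effect can be illustrated without db4o. In the following sketch, <code>FakeStore</code>, <code>Activate</code> and the activation counter are invented stand-ins and say nothing about db4o&#8217;s internals: the sorted array plays the role of the BTREE index, and the counter shows how many objects get instantiated when everything is activated before taking the top entries, compared with cutting down the index first:</p>
<pre class="brush: csharp">
using System;
using System.Collections.Generic;
using System.Linq;

class FakeStore
{
    private readonly int[] _idsNewestFirst;   // stand-in for an index sorted by creation date
    public int Activations { get; private set; }

    public FakeStore(int count)
    {
        _idsNewestFirst = Enumerable.Range(0, count).Reverse().ToArray();
    }

    // Stand-in for loading (activating) a full object from disk.
    private string Activate(int id)
    {
        Activations++;
        return &quot;post #&quot; + id;
    }

    // Activates every candidate and only then keeps the first &#039;take&#039; results.
    public List&lt;string&gt; ActivateAllThenTake(int take)
    {
        Activations = 0;
        return _idsNewestFirst.Select(id =&gt; Activate(id)).ToList().Take(take).ToList();
    }

    // Cuts the index down first, so only &#039;take&#039; objects are activated.
    public List&lt;string&gt; TakeThenActivate(int take)
    {
        Activations = 0;
        return _idsNewestFirst.Take(take).Select(id =&gt; Activate(id)).ToList();
    }
}

static class ActivationDemo
{
    static void Main()
    {
        var store = new FakeStore(100000);
        store.ActivateAllThenTake(100);
        Console.WriteLine(&quot;activate all, then take: &quot; + store.Activations);
        store.TakeThenActivate(100);
        Console.WriteLine(&quot;take, then activate:     &quot; + store.Activations);
    }
}
</pre>
<p>With 100,000 fake posts, the first variant performs 100,000 activations and the second only 100, although both return the same 100 entries.</p>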
<p>See <a href="http://developer.db4o.com/Forums/tabid/98/aff/4/aft/9751/afv/topic/afpgj/1/Default.aspx#27735">this discussion on the db4o forum</a> and <a href="http://tracker.db4o.com/browse/COR-1133">the associated Jira-bug</a>.</p>
<p>This is somewhat sad, because a query on a previously filtered set of items is blazing, about 100x faster:</p>
<pre class="brush: csharp">
        var mostRecentPosts = (from Posts o
              in ObjectContainer
              where o.Created &gt; yesterday
              orderby o.Created descending
              select o).Take(100).ToList();
</pre>
<p>The latter operation plays roughly in the same league as SQL Server (don&#8217;t flame me &#8211; performance comparisons and profiling are really complicated and there are a zillion factors that influence them, I know. That&#8217;s why I say &#8216;roughly the same league for this query&#8217; and I&#8217;m talking about default setups). </p>
<p>This is a somewhat unfortunate situation, because it produces slow queries for many typical applications &#8211; unless you have a strong where clause that cuts down N from millions to hundreds. If you&#8217;re interested in seeing this issue fixed, head over to their <a href="http://tracker.db4o.com/browse/COR-1133">issue tracker and vote for the issue to be fixed</a>! Thanks!</p>
<p>I&#8217;ll be posting a bit more on db4o over the next few days I hope.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.emphess.net/2010/03/16/db4o-queries-on-large-datasets-and-a-bit-of-linq/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
	</channel>
</rss>
