My db4o Wishlist

by Christoph Menge in .NET, db4o, Software

After finding that db4o did not screw up in our projects, I dug a bit through their issue tracker, which is a very important resource you should definitely check out if you’re working with db4o!

Just to get that straight: I’m an avid db4o user and really love it. These issues are not critical, and they don’t stop me from using or evangelizing db4o. However, I think many users are not aware of some of them.

Also, I’d like to spawn some discussion about the issues below. Unfortunately, due to their changes to the forum system, most of the original discussions on the db4o forum are hard to find or possibly lost. You may want to vote on the issues you deem most pressing, which you can easily do in their issue tracker! I’m very interested in what you think about this little selection.

Selected Issues

» Don’t run SODA when no more constraints are present
I blogged about this already, because you run into it in very common scenarios, namely whenever you query a small subset of a larger candidate set. For example, consider selecting the last 50 posts on a blog, Q&A site, etc. What happens is that db4o runs the B-tree query for the sort operation (blazing fast), then hydrates (activates, in db4o terms) all objects, then returns the first 50 of them and throws away the rest. The thing is that there is no need to inspect the remaining items, and activating all of them is a linear operation. Thus, this common type of query currently runs in O(n) instead of O(log n), which is a dramatic difference.
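To make the difference concrete, here is a minimal Python sketch. It is not db4o code: `FakeStore`, `activate`, `latest_eager` and `latest_lazy` are hypothetical names, and the sorted index is simulated with a plain list. It only counts how many objects get activated in each strategy.

```python
from itertools import islice


class FakeStore:
    """Hypothetical stand-in for an object store; not the db4o API."""

    def __init__(self, size):
        # Pretend this is an index already sorted by date, newest first.
        self.sorted_ids = list(range(size - 1, -1, -1))
        self.activations = 0

    def activate(self, object_id):
        # Stand-in for the expensive step of reading and materializing an object.
        self.activations += 1
        return {"id": object_id}

    def latest_eager(self, count):
        # What the post describes: activate every candidate, keep the first few.
        everything = [self.activate(i) for i in self.sorted_ids]
        return everything[:count]

    def latest_lazy(self, count):
        # What the issue asks for: stop walking the index after `count` entries.
        return [self.activate(i) for i in islice(self.sorted_ids, count)]


store = FakeStore(100_000)
eager = store.latest_eager(50)
eager_cost = store.activations

store.activations = 0
lazy = store.latest_lazy(50)
lazy_cost = store.activations

print(eager_cost, lazy_cost)
```

Both calls return the same 50 objects. The eager variant activates every object in the store to get there, while the lazy one activates only the 50 it returns.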

» LINQ-Implementation is not ‘thread’-safe
A very similar issue was on LINQ-to-SQL’s to-do list some time ago.
I’m not sure whether this is really the typical use case. For db4o, it teaches us two things right now:

  • Container reuse is non-trivial and should be approached with extreme care. You don’t want to run into this kind of byzantine error in a live app.
  • The object identity problem is similar in both Object-Relational Mapping and OODBMS

» Scalable server architecture: multiple readers against the same file, transactional files
This sounds daunting, and it’s probably a huge one, as you can see from its age. I also believe this might be politically challenging, because it moves in the direction of Versant’s large-scale object database. However, there seems to be a lot of movement in that direction from the user side: people have been asking for features of this kind more and more lately, largely due to the ultra-cool LINQ integration db4o has. I suppose it’d be wise to focus on this kind of scenario, as it could really become the preferred way of writing web applications. It’s extremely agile, supports rapid development, and leads to clean, compile-time checked, type-safe code. It’s also flexible in that the same (LINQ) code could be used for different persistence layers if that should ever be needed.

» Sanitize reflector design – remove core dependencies on generic reflector
Being able to get rid of the generic reflector seems important; I’m already building my own code for this. Here we have conflicting requirements: the GenericReflector makes db4o very easy to use and may help beginners. It is also required in client-server scenarios where the server doesn’t have the necessary model DLLs, but for most applications I think you should try to avoid it. Storing data in a generic manner is slow and requires a lot more space, which makes it highly inefficient.

Current attack vector: Throw a Listener on the object created event on the server and make sure the server knows the type.

» Allow immediate TCP port reuse
When you open a server, its TCP port stays blocked if the server crashes, the app is terminated, etc. That might happen quite often when you use the ‘integrated server’, where the server is actually created in your web application: a restart of the web application will then fail because the TCP port is blocked. In a client-server scenario, on the other hand, a simple restart isn’t possible because you need to assign a new port to clients or wait 6 minutes. Fixing this should be fairly simple, but I don’t know if that comes with any side-effects.
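At the socket level, the usual way to get immediate port reuse is the SO_REUSEADDR option, set before binding. The Python sketch below shows just that mechanism; `open_listener` is a hypothetical helper, and whether db4o could expose such a switch is an assumption on my part.

```python
import socket


def open_listener(port, host="127.0.0.1"):
    """Open a listening TCP socket that may rebind a recently used port."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # SO_REUSEADDR must be set before bind() to have any effect.
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))
    server.listen()
    return server


first = open_listener(0)            # port 0: let the OS pick a free port
port = first.getsockname()[1]
first.close()

second = open_listener(port)        # rebind the same port right away
rebound_port = second.getsockname()[1]
second.close()
```

One side-effect worth knowing: the option does not mean the same thing on every platform. On Windows, SO_REUSEADDR can let another process bind a port that is still actively in use, so it deserves some care there.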

» Fast Collections
This is a huge one. The cool thing is that this could allow much more complicated queries to be executed in reasonable timeframes. However, I’m a bit worried about the issue “FastCollections: Inside BTree List implementation”, because that sounds really important, but is in state “Won’t fix”.

» Locking
Optimistic locking would be a nice-to-have thingie, but you can do this yourself rather easily, I think.
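A do-it-yourself version usually boils down to a version counter that is checked on every write. Here is a minimal Python sketch of that idea; `VersionedStore` and `StaleObjectError` are hypothetical names and have nothing to do with the db4o API.

```python
class StaleObjectError(Exception):
    """Raised when someone else updated the record first."""


class VersionedStore:
    """Hypothetical in-memory store; stands in for any persistence layer."""

    def __init__(self):
        self._rows = {}  # key -> (version, value)

    def read(self, key):
        return self._rows.get(key, (0, None))

    def write(self, key, expected_version, value):
        current_version, _ = self._rows.get(key, (0, None))
        if current_version != expected_version:
            raise StaleObjectError(
                f"{key}: expected version {expected_version}, found {current_version}"
            )
        self._rows[key] = (current_version + 1, value)
        return current_version + 1


store = VersionedStore()
store.write("post-1", 0, "draft")

# Two editors read the same version ...
version_a, _ = store.read("post-1")
version_b, _ = store.read("post-1")

store.write("post-1", version_a, "edit by A")      # succeeds
try:
    store.write("post-1", version_b, "edit by B")  # stale, rejected
    conflict = False
except StaleObjectError:
    conflict = True
```

In a real application the version check and the write have to happen atomically, for example inside one transaction. Otherwise two writers can still slip past the check at the same time.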

» A new object is stored upon value type updates
This is rated critical, so it’s not an item for a “wishlist”. I’m not sure if I understand its implications and I rarely use value types apart from Guids, and updating Guids is pointless – still a db4o user should probably know this and keep this in mind, so I felt it should go here ☠.

db4o Configuration

This one is not really in the issue tracker as a single item, and it’s more of a general remark. One of the rather messy things in db4o is configuration. Even with the new configuration interface, there is quite a bit of confusion among users. The reason, in my eyes, is mostly a combination of incomplete documentation and unexpected behaviour. Examples:

  • The indexation setting for fields is the only configuration setting that is persistent. Everything else, including unique constraints, needs to be re-set when (or – more precisely – before) opening the ObjectContainer.
  • Applying an option to a field that doesn’t exist will not trigger a warning or an Exception
  • Some settings simply won’t have any effect when you perform them after opening the ObjectContainer, but they do not warn you.
  • Certain settings must be applied on the server, a few must be applied on the client and with some … well, you just set them on both just to make sure. Here, db4o does throw exceptions, however!
  • Several options need to be set before creating the object container (e.g. string encoding) and cannot be reset afterwards, again being completely silent about the ineffectiveness of the respective settings.
  • Some settings, such as field-based cascade-on-activate, simply don’t seem to work at all

This leads to lots and lots of confusion. Most importantly, it is often hard to determine whether a certain setting was successfully applied, or not. Also, some defaults are unexpected:

  • The default ActivationDepth is a seemingly arbitrary 5. Why not 8? Or 2? This troubles beginners a lot. Either set it to infinity, or to zero. Everything else just feels random. You can still include a line ActivationDepth = 5; in beginners’ samples, thereby showing them that the setting is there and that they need to be aware of it.
  • The default string encoding seems to be UTF-16 or UCS-2, probably the most useless encodings around, despite the Windows kernel working with them. UTF-8 would be a reasonable default. With UTF-16, half of your database is probably zeros, because even in non-English environments there is still a lot of mostly-ASCII data to be stored (such as URLs, email addresses, base64-encoded information, SHA hashes, etc.). Also, many languages use non-ASCII characters only sparingly, German being one example.
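The encoding point is easy to check for yourself. This small Python sketch compares the encoded size of a few strings; the sample strings and the `encoded_sizes` helper are made up for illustration, and it says nothing about how db4o lays out strings on disk.

```python
samples = [
    "https://example.com/post/42",   # URL, pure ASCII
    "user@example.com",              # email address, pure ASCII
    "Grüße aus Köln",                # German text with a few non-ASCII letters
]


def encoded_sizes(text):
    """Return (utf8_bytes, utf16_bytes) for a string, without a byte order mark."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


for text in samples:
    utf8_size, utf16_size = encoded_sizes(text)
    print(f"{text!r}: UTF-8 {utf8_size} bytes, UTF-16 {utf16_size} bytes")
```

For pure ASCII strings, UTF-16 takes exactly twice the space of UTF-8, and every second byte is zero.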

I think it’d be really cool if the configuration interface were a little more explicit and threw Exceptions instead of silently ignoring requests that cannot be fulfilled.
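To illustrate what I mean by "more explicit", here is a rough Python sketch of a fail-fast configuration object. Everything in it (`StrictConfig`, `index_field`, `ConfigurationError`) is hypothetical and only models the behaviour I’d like to see; it is not the db4o configuration API.

```python
class ConfigurationError(Exception):
    """Raised instead of silently ignoring a setting."""


class StrictConfig:
    """Hypothetical fail-fast configuration; not the db4o configuration API."""

    def __init__(self, known_fields):
        self._known_fields = set(known_fields)
        self._indexed = set()
        self._opened = False

    def index_field(self, field_name):
        if self._opened:
            raise ConfigurationError(
                f"cannot index {field_name!r}: container is already open"
            )
        if field_name not in self._known_fields:
            raise ConfigurationError(f"unknown field {field_name!r}")
        self._indexed.add(field_name)

    def open(self):
        # From here on, settings that only work before opening are rejected.
        self._opened = True
        return sorted(self._indexed)


def raises_configuration_error(action):
    """Small helper for the demo: True if `action` raises ConfigurationError."""
    try:
        action()
    except ConfigurationError:
        return True
    return False


config = StrictConfig(known_fields=["title", "created"])
config.index_field("created")
typo_rejected = raises_configuration_error(lambda: config.index_field("craeted"))
applied = config.open()
late_rejected = raises_configuration_error(lambda: config.index_field("title"))
```

The misspelled field name and the setting applied after opening both fail loudly, so you know right away which settings actually took effect.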
