Goodbye

by Christoph Menge in Software

It’s time for a change! Please find my new blog at http://www.waistcode.net
Would love to see you there! –Chris


Suggest a Page to all facebook Friends

by Christoph Menge in Software

Bulk operations on facebook aren’t too easy to accomplish and typically involve some javascript hacking. For example, there are a number of scripts on the net that show you how to invite all your friends to an event.

However, when I want to share a page, I’d rather not come up with some kind of wannabe-event. I simply want to suggest the page to my friends.

It took me some time to figure out how to do that. The basic idea is the same – inject a bit of javascript that invokes a click() on all items, preferably using Chrome’s console. The problem is that the actual list of friends is in an IFRAME, and the Same Origin Policy will prevent you from gaining access to the contents of the IFRAME.

The Solution

  1. If you haven’t installed it yet, install Google Chrome. It’s definitely the best browser for this kind of productivity boosting.
  2. Create a new shortcut to Chrome with an added command line option --disable-web-security. This will deliberately disable a security feature called “Same Origin Policy“. On my machine, the link looks like
    C:\Users\UserName\AppData\Local\Google\Chrome\Application\chrome.exe --disable-web-security
  3. Warning: This is dangerous. Do not use this shortcut for regular browsing, but only for bulk tricks like toggling all friends.
  4. Open the security-disabled Chrome, navigate to the facebook page you want to share and click “Suggest to Friends”.
  5. Right-click somewhere on the page and select “Inspect element” to open the developer tools, then switch to the Console tab.
  6. Now, enter the following in the console:
    var v = document.getElementById("social_graph_invite_iframe");
    var friends = v.contentWindow.document.getElementById("all_friends").childNodes;
    for (var i = 0; i < friends.length; i++) {
        friends[i].childNodes[0].onclick();
    }

  7. Hit Enter in the console window to select/deselect all friends. This is nothing but a ‘toggle all’ feature the hard way.

A Detailed Walkthrough

Create a new shortcut

First, find your existing Chrome link, right-click it and select “Properties”:
[Screenshot: the Properties dialog of the default Google Chrome shortcut, with the Target field highlighted]

Select the text in “Target” and copy it (Ctrl + C).

Next, right-click the desktop and choose “New -> Shortcut”:
[Screenshot: the desktop context menu with New -> Shortcut highlighted]

In the dialog, paste (Ctrl + V) what you just copied and add --disable-web-security. Mind the space character in between!

Click “Next” and enter a name for the new link, for example “Chrome (Insecure)”, which reminds you NOT to use this for day-to-day browsing.

Click “Finish” and you should end up with a new shortcut on your desktop.

Applying the Javascript

Screenshots for the facebook / chrome part are coming, but that takes a little longer to ensure privacy…

That’s about it for now. Now go and share this link with all your friends :-)



LESS Grid CSS for Fluid Width Grids

by Christoph Menge in .NET

I’m a huge fan of LESS. CSS is nice, but using LESS makes your files so much cleaner. Now I had to cope with some grid systems today, and I figured that many online grid generators are either broken, down or buggy. Perhaps it’s just not my lucky day. Anyway, the cool thing is that those grid CSS files are somewhat simple – just some basic math.

LESS CSS?! What?

Using LESS mixins, we can replace cumbersome CSS with a few simple statements. A mixin basically copies all properties over. For example, the following LESS will produce really huge, red text if applied:

.huge{
    font-size: 80px;
}

.hugeWarning{
    .huge;
    color: red;
}

OK, nice, but the CSS equivalent wouldn’t be very different. You can also parametrize mixins, though; the classic example (taken directly from the LESS website) is rounded corners:

.rounded_corners (@radius: 5px) {
  -moz-border-radius: @radius;
  -webkit-border-radius: @radius;
  border-radius: @radius;
}

#header {
  .rounded_corners;
}

#footer {
  .rounded_corners(10px);
}

LESS in ASP .NET

If you want to use LESS from your shiny new ASP.NET MVC3 application, I suggest you grab a copy of Justin Etheredge’s brilliant zero-friction SquishIt library, which compiles LESS to CSS, combines the files and minifies them. Oh, yes, and of course it also handles JavaScript combining and minification!
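
The bundling itself is essentially a one-liner with SquishIt’s fluent API. A minimal sketch (the file names and paths are placeholders of mine, and the exact calls may differ between SquishIt versions):

// Typically called from a view or layout (Bundle lives in SquishIt.Framework):
// combine, compile and minify the LESS files into one CSS file and get back
// the <link> tag that references the result.
string cssLink = Bundle.Css()
    .Add("~/Content/reset.css")
    .Add("~/Content/grid.less")      // .less files are compiled to CSS
    .Add("~/Content/site.less")
    .Render("~/Content/combined.css");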

Now, since LESS also supports some mathematical operations, we can do all the nasty grid-calculation using LESS. For a fluid grid, this becomes something like this:

/* Fluid grid based on 960.gs syntax
This grid contains ONLY the fluid-container_12 class and its associated grid, pull, push, suffix and prefix classes. Note that this CSS does not support nesting of grids.
*/

@columnCount: 12; // When you increase this, you also need to add some more classes below
@halfGutter: 1%; // half of the gutter width

.fluid-container_12 {
	margin-left: 0px;
	margin-right: 0px;
	width: 100%;
}

.grid (@n: 1)
{
	width: @n * 100% / @columnCount - 2.0 * @halfGutter;
	margin-left: @halfGutter;
	margin-right: @halfGutter;
}

.fluid-container_12 .grid_1{ .grid(1); }
.fluid-container_12 .grid_2{ .grid(2); }
.fluid-container_12 .grid_3{ .grid(3); }
.fluid-container_12 .grid_4{ .grid(4); }
.fluid-container_12 .grid_5{ .grid(5); }
.fluid-container_12 .grid_6{ .grid(6); }
.fluid-container_12 .grid_7{ .grid(7); }
.fluid-container_12 .grid_8{ .grid(8); }
.fluid-container_12 .grid_9{ .grid(9); }
.fluid-container_12 .grid_10{ .grid(10); }
.fluid-container_12 .grid_11{ .grid(11); }
.fluid-container_12 .grid_12{ .grid(12); }

.prefix(@n:1)
{
	padding-left: @n * 100% / @columnCount;
}

.fluid-container_12 .prefix_1 { .prefix(1); }
.fluid-container_12 .prefix_2 { .prefix(2); }
.fluid-container_12 .prefix_3 { .prefix(3); }
.fluid-container_12 .prefix_4 { .prefix(4); }
.fluid-container_12 .prefix_5 { .prefix(5); }
.fluid-container_12 .prefix_6 { .prefix(6); }
.fluid-container_12 .prefix_7 { .prefix(7); }
.fluid-container_12 .prefix_8 { .prefix(8); }
.fluid-container_12 .prefix_9 { .prefix(9); }
.fluid-container_12 .prefix_10 { .prefix(10); }
.fluid-container_12 .prefix_11 { .prefix(11); }
.fluid-container_12 .prefix_12 { .prefix(12); }

.suffix(@n:1)
{
	padding-right: @n * 100% / @columnCount;
}

.fluid-container_12 .suffix_1 { .suffix(1); }
.fluid-container_12 .suffix_2 { .suffix(2); }
.fluid-container_12 .suffix_3 { .suffix(3); }
.fluid-container_12 .suffix_4 { .suffix(4); }
.fluid-container_12 .suffix_5 { .suffix(5); }
.fluid-container_12 .suffix_6 { .suffix(6); }
.fluid-container_12 .suffix_7 { .suffix(7); }
.fluid-container_12 .suffix_8 { .suffix(8); }
.fluid-container_12 .suffix_9 { .suffix(9); }
.fluid-container_12 .suffix_10 { .suffix(10); }
.fluid-container_12 .suffix_11 { .suffix(11); }
.fluid-container_12 .suffix_12 { .suffix(12); }

.push(@n:1)
{
	left: @n * 100% / @columnCount;
}

.fluid-container_12 .push_1 { .push(1); }
.fluid-container_12 .push_2 { .push(2); }
.fluid-container_12 .push_3 { .push(3); }
.fluid-container_12 .push_4 { .push(4); }
.fluid-container_12 .push_5 { .push(5); }
.fluid-container_12 .push_6 { .push(6); }
.fluid-container_12 .push_7 { .push(7); }
.fluid-container_12 .push_8 { .push(8); }
.fluid-container_12 .push_9 { .push(9); }
.fluid-container_12 .push_10 { .push(10); }
.fluid-container_12 .push_11 { .push(11); }
.fluid-container_12 .push_12 { .push(12); }

.pull(@n:1)
{
	right: @n * 100% / @columnCount;
}

.fluid-container_12 .pull_1 { .pull(1); }
.fluid-container_12 .pull_2 { .pull(2); }
.fluid-container_12 .pull_3 { .pull(3); }
.fluid-container_12 .pull_4 { .pull(4); }
.fluid-container_12 .pull_5 { .pull(5); }
.fluid-container_12 .pull_6 { .pull(6); }
.fluid-container_12 .pull_7 { .pull(7); }
.fluid-container_12 .pull_8 { .pull(8); }
.fluid-container_12 .pull_9 { .pull(9); }
.fluid-container_12 .pull_10 { .pull(10); }
.fluid-container_12 .pull_11 { .pull(11); }
.fluid-container_12 .pull_12 { .pull(12); }
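
To check the math: with @columnCount: 12 and @halfGutter: 1%, .grid(3) compiles to a width of 23% (3 · 100 / 12 = 25, minus the 2% total gutter) plus a 1% margin on each side.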

Nifty, huh?


ASP.NET MVC3 is a blast! If you haven’t tried it, you definitely should – it’s really fun to work with and brings so many cool new features! Now that the MVC3 Release Candidate is out, there’s practically nothing missing.

Most importantly however, MVC3 introduces the Razor View Engine and separates MVC and WebPages, so the web rendering is no longer hard-wired into the ASP.NET subsystem.

Apart from its much cleaner syntax, Razor is also more flexible and it’s open source. Matthew Abbot and Ben of BuildStarted.com have written some interesting articles on how to use Razor without MVC and even without ASP.NET at all.

PDFs vs. HTML

Now, what does all that have to do with PDFs? Writing to PDFs itself isn’t necessarily too hard – you can easily draw some text here and there or draw lines, for example using the excellent iTextSharp library. But the real deal is typesetting!

Unfortunately, HTML and paper don’t make a very good fit. Breaking HTML tables across pages, for example, is an insufferable pain. HTML wasn’t made with paper in mind – a good thing for HTML, and a good reason not to abuse it for printing PDFs.

Introducing RazorTex

Now, Razor makes only a few assumptions about the type of data it renders – it doesn’t really have to be HTML… With RazorTex, you can create LaTeX-Views on the fly instead! A word of warning though: RazorTex is extremely young and unstable – more sample code than anything else – so be prepared! There’s a ton of features missing, but it works. If you’re ready for a rather rough ride, go ahead!

So how does it work, behind the scenes? RazorTex will use the Razor engine to compile code into a LaTeX file. A sample section from a .cstex file might look like this:

\frac{@Model.Factor.ToString("F2")}{@Model.Denominator.ToString("F2")} &= @Model.Result.ToString("F2") \\
\int_{@Model.LowerBound}^{@Model.UpperBound} @Model.Expression @Model.IntegrationVariable &= @Model.IntegrationResult \\

The '&=' and '\\' are LaTeX markup: an alignment character (lining the equations up at the equals sign) and a line break, respectively.

If you don’t know LaTeX, the syntax can be really terrifying at first. However, there is no doubt LaTeX is extremely powerful when it comes to typesetting complicated document layouts. For example, it supports high-quality text rendering with microtypography, breaking tables across pages, typesetting mathematical formulas and rendering vector images, to name only a few. There are many additional packages you can use for practically any type of typesetting problem.

Running the Sample

Unfortunately, there are a few obstacles to running the sample, and you’ll have to install a bunch of things:

  1. Download and install ASP.NET MVC3 Release Candidate from Microsoft. This includes the Razor view engine
  2. Download and install MiKTeX 2.8
  3. Download and install WinShell for LaTeX, an IDE for LaTeX [recommended]
  4. Before running the sample, open the two included sample .TeX files with WinShell and run the PDF compiler. This will install the required LaTeX packages. If you don’t install the packages, the sample might deadlock!
  5. Adjust the settings in app.config according to your needs and create the temporary and output folders.
  6. Finally, the application should run and create great pdfs from a console application…

The Architecture

Let’s take a quick look at the architecture of the Razor pipeline. Most importantly, Razor views are compiled – a huge advantage when you need to render a lot of files with similar structure. However, we need to take care of the compilation ourselves. Fortunately, Andrew Nurse has already accomplished that for us. For details, refer to his article. In short, we read the .cstex file, create a RazorParser and throw in our template string. This will create C# code which in turn can be compiled to binary using CodeDom. Of course, there are some subtleties involved such as assigning the correct base class, importing some namespaces, and so forth.
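
In code, the pipeline boils down to a handful of calls against System.Web.Razor and CodeDom. A rough sketch of the idea (the file, class and assembly names are my own placeholders, not necessarily what RazorTex uses):

// Namespaces involved: System.Web.Razor, System.CodeDom.Compiler, Microsoft.CSharp, System.IO.
// Configure the Razor host to use our own base class and namespace instead of
// the ASP.NET defaults.
RazorEngineHost host = new RazorEngineHost(new CSharpRazorCodeLanguage());
host.DefaultBaseClass = "LatexTemplate";
host.DefaultNamespace = "RazorTex.CompiledViews";
host.NamespaceImports.Add("System");

// Parse the .cstex template into a CodeDOM compile unit...
RazorTemplateEngine engine = new RazorTemplateEngine(host);
GeneratorResults results;
using (StreamReader reader = File.OpenText("Views/Report.cstex"))
{
    results = engine.GenerateCode(reader);
}

// ...and compile that unit into an (in-memory) assembly.
CompilerParameters parameters = new CompilerParameters { GenerateInMemory = true };
parameters.ReferencedAssemblies.Add("RazorTex.dll");
CompilerResults compiled = new CSharpCodeProvider()
    .CompileAssemblyFromDom(parameters, results.GeneratedCode);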

Once they are compiled, executing the latex view is blazing fast, but compilation takes some time. Therefore, it’s a good idea to cache these. Right now, there is some caching in RazorFlux, but it’s probably a good idea to compile all the latex-views in one go (and into a single assembly) – at least, that’s the way ASP.NET does it.

Additional Features, LatexHelper

Two particularly helpful features when rendering HTML are the HtmlHelper and UrlHelper extensions. Of course, they don’t make a lot of sense in LaTeX, but there are some LaTeX features where a custom helper would come in handy – a LatexHelper!?

Indeed, we can inject our own helpers through the base class:

Type baseType = (modelType == null)
            ? typeof(LatexTemplate)
            : typeof(LatexTemplate<>).MakeGenericType(modelType);

generator.GeneratedClass.BaseTypes.Add(baseType);

public abstract class LatexTemplate : ILatexTemplate
{
    public LatexHelper Latex { get; set; }
    public bool SuppressEmptyLines { get; set; }
}

So far, the LatexHelper is used primarily for “display templates”. Also, the LatexHelper certainly contains the worst hacks in the code. More about that later.
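
For illustration, such a helper might offer an escape method for LaTeX’s special characters. This is a hypothetical sketch of mine, not code taken from RazorTex:

public class LatexHelper
{
    // Escape characters that LaTeX would otherwise interpret as commands.
    // The backslash must be handled first so the later replacements don't
    // touch the escape sequences we just inserted.
    public string Escape(string text)
    {
        return text
            .Replace(@"\", @"\textbackslash{}")
            .Replace("&", @"\&")
            .Replace("%", @"\%")
            .Replace("#", @"\#")
            .Replace("_", @"\_")
            .Replace("$", @"\$");
    }
}

In a .cstex view, that might then be used as @Latex.Escape(Model.Title).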

Drawbacks

Some disadvantages shouldn’t go unmentioned:

  • While Razor doesn’t seem to make too many assumptions, it does assume that line breaks don’t play a role – true for HTML but damn wrong for LaTeX. I have had some trouble with that, but nothing that can’t be solved in the .cstex file. Still looking for a better solution though.
  • Pdflatex isn’t too fast – there might be faster solutions for creating PDFs, but the quality of pdflatex is very high.
  • It seems impossible to get pdflatex to work with streams instead of files, so you’ll have to write temporary files – no big deal, but a little painful. If anybody knows how to do it, I’d be glad to know!



Rob Conery recently wrote about using MongoDB with Linq. I was really intrigued by the fact that you can use elegant, readable, type-safe Linq queries to access the MongoDB document database. To be honest, I had no clue what a document database really is, but when it speaks Linq it must be cool, I thought.

So I dug a bit deeper into MongoDB and NoRM, which is a nifty C# driver for MongoDB developed by Andrew Theken and several others. You might want to grab a copy at github, where you can see how incredibly active the project is! Now, back to the evaluation: what is the best way to evaluate a database? Of course, build a live product using it!

Since I was completely abusing db4o for said project (in fact, I am storing something you’d call documents there), I decided that this would be a great candidate for a migration. So now we’re migrating from an object database to a document database and from an ACID database to a NoSQL solution.

MongoDB is considered a NoSQL solution, while db4o is considered ‘soft’ NoSQL – see Stefan’s comment at the bottom. Why the distinction, when neither relies on, supports or uses SQL at all? But then again, that is not what the term »NoSQL« is really about. It’s probably one of the most misleading terms ever coined, and perhaps it should read »Not ACID« or just »Persistence without Prejudgement, PwoP«. db4o makes ACID guarantees, comes from an embedded background and offers single-server durability, while MongoDB is made for the net, does not have single-server durability, supports MapReduce and is driven by JavaScript.

Hell, they couldn’t be more different. But then again, I can access both using identical interfaces:

// Linq is understood by both, so you could use the lines in both:
var u = (from Note n in container
         where n.Text == null select n).ToList();
var v = session.Query<Tag>().Where(p => p.Name != null);
// Store an object graph to Mongo using NoRM:
session.Add(testNote);
// Store an object graph to db4o:
container.Store(testNote);

Don’t be fooled – although these lines could all be used with MongoDB or db4o and all of them could even be used with the very same classes, they are still fundamentally different in behavior. Also, for anything but the most simple problems, you can’t just persist the same domain models.

What is a document now?

A document is not just an unstructured piece of data. It’s not a BLOB. Instead, an instance of a class, plus all it refers to, could be a document:

class UniqueIdObject
{
    public Guid Id { get; private set; }
    public UniqueIdObject() { Id = Guid.NewGuid(); }
}

class Report : UniqueIdObject
{
    public Report() : base() { Tags = new List<Tag>(); }
    public string Text { get; set; }
    public List<Tag> Tags { get; set; }
}

class Tag : UniqueIdObject
{
    public string Name { get; set; }
}

Now we might want to store a note in the database, and the code needed is just

using (NoRMSession session = new NoRMSession())
{
    Report newReport = new Report() { Text = "Hello World, MongoDB!" };
    newReport.Tags.Add(new Tag() { Name = "Tag 1" });
    newReport.Tags.Add(new Tag() { Name = "Tag 2" });
    session.Add(newReport);
}

That’s it! Wow! Of course, we haven’t taken care of indexing yet, and since MongoDB is schemaless (or rather, has a dynamic schema), we need to do that in code. But still, this is essentially all you need.

The important thing now is: What happens to those little `Tags` we deliberately put into a separate class? Now the mapper hides a bit of truth from us, because MongoDB works on so-called “Collections”. session.Add(newReport) will be session.Add<Report>(newReport), which will in turn put the newReport object into the Report-Collection! So the object graph, as it is, will be serialized into the Report collection, including our little Tag objects!

Each item with an orange border is an ‘atomic’ item in its respective data store:

[Diagram: the db4o serialized object graph]

[Diagram: the MongoDB serialized document]

Let’s naively try to fetch all tags:

var v = session.Query<Tag>().Where(p => p.Name != null);

This does not work, v is null because there is no tag collection! Instead, the tag lists are part of the reports we put into the Report collection. Note that this would work in db4o, because db4o will store references as references, while ‘documents’ store the contained data instead of references. This is beautifully simple, but it’s also very different from what you might expect and it has lots of implications for your object structure.

Thoroughly think through your schemaless schema

MongoDB is made for scalability and simplicity, so it does not care for our foolish attempt to fetch tags directly. There are ways to access that data directly, however. We could use a deep-graph query or write a javascript Map/Reduce instruction, but that is a bit out of scope right now. What’s more important is that it calls for changes to our domain model objects: if we really want to store a reference, we need to do so manually, in ye olde SQL way, by storing the associated Id instead of the object. Of course, that makes deserialization a bit more complicated, because the objects we now retrieve from the database aren’t ready to use as they come.
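
As a sketch of that manual approach (the User class and the query below are my illustration, not part of the sample code above), the owning object stores only ids, and resolving them takes a second query:

class User : UniqueIdObject
{
    public string Name { get; set; }
    // References instead of embedded documents: only the ids are stored.
    public List<Guid> ReportIds { get; set; }
}

// Resolving the references by hand; whether the Contains() call is translated
// by the Linq provider or evaluated client-side depends on the driver.
User user = session.Query<User>().First(u => u.Name == "chris");
List<Report> reports = session.Query<Report>()
    .Where(r => user.ReportIds.Contains(r.Id))
    .ToList();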

However, automating that process will induce several complications, among them the need to handle cyclic references, a concept for fetching or activating objects on the fly (called Transparent Activation in the db4o world), and making sure we’re not inducing a massive performance hit along the way.

Also, updating objects can be painful. Suppose we stored a list of Reports for each user. Now we might want to put the list of reports directly into the user object, or store a list of ReportIds for each user and put the Reports into their very own Report collection. As usual, there is no silver bullet, so this decision depends on the specific needs of the application, but whatever decision we take will not be visible to users of the resulting objects. In fact, it leads to some unwanted strong coupling:

class ReportService
{
  private NoRMSession _session;

  // ...
  public void UpdateReportDetail(Report report)
  {
    report.ReportDetailXY = ComplicatedCalculation();
    report.LastChanged = DateTime.UtcNow;

    // If the reports are in their very own Report Collection, this
    // is fine. However, if they are contained as lists in the user
    // who owns them, we're in trouble and this will fail!
    _session.Update(report);
  }
}

This might not be a big problem for really small applications, and if you really need performance (why else would you choose a NoSQL system?) you have to fine-tune your objects anyways. Still, I have the feeling that there is some space for improvement here, and a basic wrapper could help in overcoming some of the issues raised. Some basic ideas on how to approach this will follow shortly.

Wrapping it up

Obviously, I’m comparing Apples and Oranges here: db4o is made to make persistence easier, especially with complex domain model objects. db4o, being an object database, behaves exactly the way you’d expect objects under serialization to behave, but that makes it quite complex. MongoDB is focused on simplicity and scalability. Through the document concept, you gain simplicity on the database, administration and wrapper (driver)-side, but you have to struggle with a slight impedance mismatch in code, especially against Linq, again.



Mark Zuckerberg’s short keynote at f8 (it’s being repeated there, just head over!) was quite interesting. In general, I’m not really surprised, because the vision of facebook has always been to give an identity to people and to minimize the overhead needed to get in touch. The announcements made today are basically just the logical next steps. The implications for both product development and the technical side are quite far-reaching, however, because there will indeed be a massive reduction in friction, and the overhead for developing facebook apps is just minimal. Most importantly: the semantic web is coming. Now.

Credits

Facebook offers a kind of generic currency – facebook credits. This is still in closed beta, but according to Mark Zuckerberg, you can already get in touch with them to get signed up. Again, this makes developing a payment-enabled web application much easier. And it probably will make a huge lot of money for facebook.

Open Graph API

In short, there’s a new API called »Open Graph«. Apparently, this API is much simpler than the old facebook API, which is why I believe this will be a very powerful system. It will also save me a long research into which of the .NET facebook API wrappers is best – they are all obsolete as of now. Facebook promises that you won’t need to change your API calls in the future, though. In the past, there have been quite a number of revisions of the API, often with breaking changes. The new API promises to be a lot simpler, but I have quite a set of questions in the back of my head as of now.

This simple API is also the web-wide return of the IFRAME. And here is my like button, with the evil color scheme (which looks just like the normal one?):

Note that this won’t show anything useful if you don’t have a facebook cookie set, which is quite annoying I believe.

If you want your own web-wide “Like” button, Facebook’s developer pages will generate the IFRAME snippet for you.



Google Pimps its Webmaster Tools

by Christoph Menge in Entrepreneurship, Software

Google’s Webmaster Tools underwent some changes over the last few weeks, but most of them were minor. Now they have released a completely overhauled “Top Search Queries” section, and it’s really cool. Most importantly, Webmaster Tools will no longer claim your page is on position x for a given keyword; instead, it shows how often the page appeared on which results page (e.g. how often your site appeared on page three for the keyword “buzzword” and how many people clicked it). The old approach was obviously incompatible with customized search results and therefore often indicated very confusing positions. It once told me I was on position #1 for the keyword “Linq”.

[Screenshot: the new Top Search Queries report in Webmaster Tools]

They also added fancy charts (although I suspect these don’t work accurately yet, because of that peak you see in the screenshot) that look a lot like Google Analytics. The downside is that clicking on Google search results now links to another Google page, which then redirects you to the target – not exactly big trouble, but annoying when you’re used to copying links into your blog posts ;-)



My db4o Wishlist

by Christoph Menge in .NET, db4o, Software

After finding that db4o did not screw up in our projects, I dug a bit through their issue tracker, which is a very important resource you should definitely check out if you’re working with db4o!

Just to get that straight: I’m an avid db4o user and really love it. These issues are not critical and they don’t stop me from using or evangelizing db4o. However, I think there is a lack of awareness of some of these issues.

Also, I’d like to spawn some discussion about the issues below. Unfortunately, due to their changes to the forum system, most of the original discussions on the db4o forum are hard to find or possibly lost. You may want to vote on the issues you deem most pressing, which you can easily do in their issue tracker! I’m very interested in what you think about this little selection.

Selected Issues

» Don’t run SODA when no more constraints are present
I blogged about this already, because you experience this in very common scenarios, namely whenever you query a small subset of a larger candidate set. For example, consider selecting the last 50 posts on a blog/qa-site/etc. What will happen is that db4o runs the BTREE query for the sort operation (blazing), then hydrates (?) all objects, then returns the first 50 of them and throws away the rest. The thing is that there is no need to further inspect the items, and activating all of them is basically a linear operation. Thus, this common type of query currently runs in O(n) instead of O(log n), which is an incredibly dramatic difference.
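
The pattern in question is as plain as it gets; a sketch (the Post class and its Created property are made up for illustration, and the query syntax assumes the Db4objects.Db4o.Linq extensions):

// Only 50 objects are needed, but every candidate gets activated before the
// Take() is applied, so the query degrades to O(n).
var latest = (from Post p in container
              orderby p.Created descending
              select p)
             .Take(50)
             .ToList();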




The wikorum-engine StackExchange just announced they completely changed their business model. Basically, instead of buying a license from them, you now need to suggest a concept for a site. That concept can then be voted on by the community, and if you can prove enough people are interested, they will set up the site. The whole site is then community-owned and will be operated by StackOverflow. The details are much more complicated, of course. I still consider StackExchange a commercial technology. They don’t charge you for the technology, but they don’t let you share in the returns of any effort you put in there, either.

That Domain-Issue

There is something peculiar about this I really don’t understand: if the domain is actually owned by the individual who suggested the concept, then Fog Creek will never have a reliable revenue stream, because the owner of the domain is the master: he could simply switch to a different technology, provided he manages to keep the links intact (which might be an important factor when choosing an alternative technology).

On the other hand, if the domain is supposed to be owned by Fog Creek, they have to buy every domain that is suggested in the forum that exact second, otherwise domain grabbers will just … well, grab it (or even other signed-in users will). Democratic voting for domain names? I don’t think so. Personally, I would never even write down a domain name, especially not in a forum: if I like it and it’s not taken, I’d simply buy it; that gives me a year to think about it.

Not Convincing

Summing it up, I don’t think the move to their new model is a wise decision. My primary concerns are:

  • The net is a trial-and-error place. You set something up and see if it works. That might not be a wise model, but it is very democratic and has proven to work fine. Perhaps because it is not so ‘wise’. Note that wisdom grows from experience, which inherently incapacitates certain types of innovation, namely those that completely contradict past experience.
  • The code remains closed, so real customizations are not possible. This is a major flaw. I’ve got a dozen ideas of sites that could become really, really helpful to people, but they need some additional features or some heavy customization that is a major pain if you can’t access the sources. They don’t even have an XSLT processor you could use as of now…
  • If a site does not offer any way to generate revenue, you can’t invest in its development, neither technical development nor marketing, and the former is already inhibited by the fact that customization is not possible from the technical side.
  • This move will dramatically increase commitment to the development of open source clones such as OSQA, which looks very interesting by the way. There is also Shapado, and I think more will come. Crowd development can be extremely efficient and fast.

For all who operate a StackExchange site right now, the good news is that the site will keep running for free for another three months, or, if you have some real action going on, for a year. That is great, because it allows for an easy transition to some other technology. It’s a bit sad to see this happen to such great technology, but I don’t believe the new model will work. Building a community remains a lot of effort, and I believe people won’t put that effort in if they can’t call it their baby, can’t make any revenue, can’t customize the code, and can’t use the page to direct some traffic to other projects. We’ll see how it works out. I guess I will establish a small knowledge base / link list at wikorum.net to aggregate some related information.



We’ve been working a lot on db4o-related and db4o-based projects lately, and close to completing the first and simplest product, we really hit a few roadblocks.
UPDATE: Just after releasing this article, I found the bug in our code. It’s not db4o’s fault after all…

Motivation

One thing up front: We don’t need an object database for the simple tools we currently build, but we felt it was a good idea to get acquainted with the technology, because we will certainly need it for our (stealth) startup “pactas” soon. In pactas, the data structure is really complicated (with a fancy class hierarchy) and it probably will change very frequently.

Also, since I am such a big fan of reusability, we are really developing a web engine – a framework that allows us to reuse a lot of code across different projects and make sure most of our code has been thoroughly field-tested. This proves to be a significant design decision when it comes to data modeling, and while I’m very happy with the decision, it certainly makes development harder.

The Problem

So, in time with the release of our simple backlink tracker, one of our development database files started to show strange behaviour – a certain query (via LINQ) would not order objects anymore; the orderby clause was seemingly ignored completely. Our live server even came up with an “Invalid DateTimeKind specified” exception when trying to perform the query! What’s worse: the problem kept occurring from time to time, but it was not reproducible! Byzantine errors are clearly my favourite…

We thought the issue might be related to the current development/unstable versions of db4o that we were using (7.12 and 7.13). Using the stable version of db4o (7.4) proved difficult, because the old LINQ provider falls back to LINQ to Objects very often, which requires fetching all objects in question from the database – that is very, very slow for a lot of objects, so we had to abandon that. We clearly wanted to stick to LINQ for a number of reasons (compile-time checking, readability, reusability).

Obviously the problem is related to the orderby operation on DateTime fields. I tried to modify the query, removed grouping because I feared it might be unstable (warning: the sort operation is, in fact, unstable! Unstable grouping would be useless, but the grouping is stable, so that’s fine), even debugged the db4o code, but I couldn’t find any problem in there. Since the code is rather complex and this was the first time I took a look at it, I was happy to find some of the relevant code at all. Somewhere in the depths of it, something screwed up. I didn’t want to spend too much time on that, since I had to chew on some particle physics on the side.

Solutions?

At the time of writing this (in fact, yesterday) I had something like a hotfix, but it turned out to be complete nonsense – it worked around our internal bug in a very peculiar way. No more, no less.

When talking about this, somebody mentioned that it wasn’t such a good idea to use DateTime at all, and that we should store the ticks instead. That led to two long discussions with my co-ed Christian. We concluded: First, the power of object databases is that they do not force you to hack around and find different representations for your data (which raises the bar for object databases). The one big shortcoming of SQL is that it forces you to find a second, equally good, but different representation of your data, and you need to translate between these representations all the time. You need to synchronize them. And you lose a lot of fancy features (such as lists, generics, inheritance, etc.) along the way.

Secondly, Christian suggested that object databases suffer from leaky abstractions badly, in that they break encapsulation in a way that leaks out a hell of a lot of implementation details. As Joel puts it: “All non-trivial abstractions, to some degree, are leaky.” The point is: With the error we’re currently encountering, it’s becoming a real problem. This is not an imperfect piece of architecture, it’s an exception in a database query. It kills the app dead! Boom!

In order to get activation [depth] straight, db4o needs to know how .NET’s containers work internally – that is clearly a detail that should be hidden, but db4o knows about it. db4o also takes care of that, but it leads to some messy issues. There is special code in db4o that handles containers, non-trivial objects such as strings, DateTime (which are non-trivial because they use 62 bits for the actual ticks and 2 bits for the DateTimeKind) and Lists. This is an implementation detail of the .NET framework, and it might change over time. There’s been a bug with Map, and I’m almost sure there is a bug with DateTime, too. Don’t get me wrong: The fact that some kind of mapping is needed is a somewhat generic (if not the) problem of serialization, it’s not really db4o-specific, and cannot be eliminated. In Hibernate, there is also a lot of code that handles the mapping of lists and the like but it doesn’t rely on implementation detail, thus it’s not (as) leaky.

Back to SQL?

Here’s the thing: it’d be beneficial to have a storage layer that is independent of the actual implementation on top, because it decouples the data store from the application, which is not what db4o does. But wait, that is exactly what SQL is, right? Indeed: SQL forces you to map (or cut down) your stuff to its internal features. A list becomes a foreign key on the other table, but that doesn’t play well with derived types, generics, etc. This is tricky, as we all know, and you need lots of code to do that.

Worse, SQL forces you to do that mapping for everything, including your own classes, and it demands a schema for every type of object – this is not what I want. I’d like to see a set of base objects in an object database which are natively understood by the DB. These objects can be mapped from and to by a layer which may be part of the db, or can be added manually if you need something very special. Still, it should allow you to store your object in the database with all its magic, only that known objects will be translated, e.g. a DateTime will be stored as long Ticks with the DateTimeKind flag kept separately, making comparison operations easier (note that the comparison only works on the ticks: whether the time is local or UTC is not considered by .NET in comparisons). Lists will be unfolded into an internal tree representation if indexed, so they become much easier for the database to cope with. That would also make managing m:n relations easier, since they could then be viewed as bilateral relationships – something you often need. Right now, they’re unilateral, thereby requiring some additional management on your behalf.
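
To make that concrete, here is a minimal sketch of the kind of translation I have in mind (the StoredDateTime type is hypothetical, not an existing db4o feature):

// Store a DateTime as plain UTC ticks plus the original kind: the long is
// trivial to index and compare, and the kind is only kept to restore the
// original value on the way out.
struct StoredDateTime
{
    public long UtcTicks;
    public DateTimeKind Kind;

    public static StoredDateTime From(DateTime value)
    {
        return new StoredDateTime
        {
            // ToUniversalTime() treats Unspecified values as local time.
            UtcTicks = value.ToUniversalTime().Ticks,
            Kind = value.Kind
        };
    }

    public DateTime ToDateTime()
    {
        DateTime utc = new DateTime(UtcTicks, DateTimeKind.Utc);
        return Kind == DateTimeKind.Utc ? utc : utc.ToLocalTime();
    }
}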

Migrating to SQL is clearly an option for this product, since data couldn’t fit SQL any better, but that doesn’t solve the issue for our pactas project, where we certainly will need schema-less data, complex object hierarchies, etc.

However, there are a few issues that remain unresolved as of now, and they do qualify as show stoppers.

Alternatives

I’m quite dissatisfied that we have to abandon db4o at this stage, because I believe it’s the best kind of serialization I’ve ever experienced. It’s perfectly simple, it’s fast and most importantly, code-centric. If reusability is a concern, having the database structure and/or the ORM mapper dictate the objects is a major pain.

Entity Framework 4 (EF4) promises to handle this a lot better through “Model First” and “Code Only”, but I am still a bit afraid of EF because it doesn’t appear to be anything near lightweight and we will need schemaless storage for our future products anyways.

Versant, which bought db4o some time ago, also offers its large-scale object database, which seems to suit large web applications a lot better. First, it’s certainly made for huge amounts of data (unlike db4o, which comes from an embedded background and is aimed at database sizes in the low GB area, but supports up to 254 GB per file) and handles multi-threading better. Also, since db4o has now moved to v3 of the GPL, it may not be freely usable in non-open-source web applications anymore, so both solutions now have a price tag.

There is also an ISV/OEM empowerment program for Versant’s Object Database, which seems to make it affordable, but I haven’t looked at it in detail yet. Over the next weeks, I will have to evaluate a few of those other NoSQL solutions such as Cassandra and MongoDB, just to name two totally different options.

Aftermath

So what was the issue, after all? db4o did not screw up, we did: take a blend of local and UTC time based on a completely random criterion, add two spoons of daylight savings time changes, add a misconfigured timezone on the server, bake for two weeks at 500 °C and pling! You’ve got yourself some really strange issue cake. Lessons learned:

  • Debugging issues with object databases is harder than with an RDBMS because the information is not chopped up.
  • Do code reviews. Do code reviews.
  • There aren’t too many alternatives to db4o, after all
  • With the elaborated architecture and code-centric design we currently have, going back to SQL is a huge pain, and we won’t do it

All serialization is painful.

