Why is Tridion's configuration library called the TDSXGit?

Posted by Dominic Cronin at Mar 09, 2008 07:05 PM | Permalink

Filed under: SDL Tridion, Tridion

Those of you who read my previous post will remember that I accessed Tridion's configuration by instantiating a TDSXGit.Configuration object. People who've worked with Tridion for a while may remember that it used to be quite common to edit the configuration in a file called cm_cnfg_git.xml. This file is still there, but without the xml extension, and these days it's encrypted so it doesn't make much sense to try to edit it directly.

To an Englishman like myself, this name TDSXGit is vaguely funny, because "git" in British English is a mild term of abuse. It's not uncommon for me to come out with phrases like: "Which stupid git broke the build"? It's definitely abuse, but fairly mild; you can say it to someone that you like.

But to the point: Back in the R4 days, Tridion's configuration data was kept in the registry, which was all well and good, but had it's own problems. When R5 was designed, there was so much XML around the place that it seemed much more sensible to keep the configuration in an XML file. The problem with this was that all that disk IO would have been a total performance killer. We needed a memory cache. Good idea, you might think, but in a COM-based web application, how do you do that? The design we ended up with makes use of a couple of fairly obscure features of COM. (By the way - I'm not claiming any credit for this, just describing what was designed by other members of the team.)

The idea is to get an object to remain in memory, and to provide a mechanism whereby any code within the application can grab a reference to the object. In COM, a reference to an object is always a pointer to an interface. Memory access in COM is controlled by "apartments" - objects running in one apartment can't directly access objects running in another apartment. In particular, if you have an interface pointer for an object in one apartment, you can't just use that pointer from a different apartment. The interface pointer needs to be "marshalled" across the apartment boundary; in other words, if you should be talking to a proxy that's local to your apartment, you'll get a pointer to that instead. The mechanism for doing this is called the global interface table, hence the acronym GIT.

The GIT is visible from anywhere in the process, and if you register an interface with the GIT, that immediately takes place of the first problem, that of keeping the object in memory. In COM, memory management is done by reference counting. An object keeps track of how many other objects currently have a reference to it, and if that number drops to zero, the object will self-destruct, thereby freeing any memory it was using. As soon as you register an object with the GIT, well the GIT has a reference to it, and therefore it isn't going to self destruct, so you have your memory cache.

When you register an object with the GIT, the API hands you back a "cookie". A cookie in this context is just a number. If you know this number, you can ask the GIT for an interface pointer that references the object. You can keep doing this as many times as you like, unless there's been an explicit call to release the object from the GIT. The interface pointer you get back will work in the apartment you are in.

There's one more thing that you need to make this all work, and that's a way of making sure the cookie is always available when you need to get hold of your memory cache. For this, you can use another obscure COM feature: the shared properties manager (SPM). This just allows you to save a value by name and retrieve it. (The SPM also takes care of a couple of other things, like grouping the properties to prevent name collisions, and locking to control access contentions.)

So when a Tridion process first accesses the configuration, the configuration file will be decrypted and loaded into a DOM object, and the DOM will be registered with the GIT. The cookie is then stored in the SPM. Any subsequent accesses for the life of the process will be simply a matter of grabbing the cookie from the SPM and using it to get the interface pointer from the GIT.

There are other techniques that could be used, but this has the advantage not only of eliminating disk IO, but also repeated parsing of the XML to create the DOM.

This should explain why when you update some configuration value, you have to shut down each of the processes that make use of the configuration. The GIT and SPM are each specific to a process. It is technically possible to get TDSXGit.Configuration to release the DOM from the GIT, but none of Tridion's application code actually does this. That's a reasonable design for a server application that isn't re-configured very often.

In theory (at least according to the theory I just described), it should only be necessary to restart a process if it is affected by the configuration value that just changed, but my own experience flies in the face of this, and I always restart all the processes. It's a bit of a cargo cult thing I suppose, but I'll keep doing it. Actually, I'd love someone to point out where my reasoning is flawed. I hate that cargo cult thing.

"Which stupid git forgot to restart the processes after that configuration change?"

Dominic Cronin's web site

Why is Tridion's configuration library called the TDSXGit?