Disposing Resources

When you are working with managed resources in C#, you can usually let the Garbage Collector deal with tidying up after you. The memory allocation problems that are familiar from languages such as C are dealt with for you.

The Garbage Collector is not magic, though. First off, you don’t usually control when garbage collection occurs. The Framework will initiate a collection whenever it thinks it’s necessary. Sometimes you’ll go a long time between collections, but as resources become scarce, it will happen more often.

You can force a collection, but it’s rarely a good idea. The Garbage Collector itself will manage when collections need to occur, and forcing multiple early collections is almost always worse than letting them happen automatically.
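
For reference, a forced collection in .NET is just a call to GC.Collect():

// Forces an immediate garbage collection; as noted above, this is rarely worth doing
GC.Collect();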

The Garbage Collector cannot deal with unmanaged resources. The most common ones you will encounter relate to database connections and file access.

You should do two things when dealing with unmanaged resources. First, wrap them in using statements. This ensures that the Dispose() method is called once you are finished with the resource.
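
As a rough sketch, assuming a SQL Server database (the connection string and class here are made up for illustration; the same pattern applies to file streams or any other IDisposable resource):

using System.Data.SqlClient;

class CustomerRepository
{
  public void LoadCustomers()
  {
    // The using statement guarantees Dispose() is called when the block exits,
    // even if an exception is thrown part-way through
    using (var connection = new SqlConnection("Server=.;Database=Customers;Integrated Security=true"))
    {
      connection.Open();
      // ... run your queries against the connection here ...
    }
  }
}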

The second thing is to make sure that your own classes implement the IDisposable interface. This means that when Dispose() is called you have a chance to clean up your unmanaged resources, preventing leaks and ensuring that the Garbage Collector can do its job.
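
A minimal sketch of how that might look (the class and file name are made up for this example):

using System;
using System.IO;

class LogWriter : IDisposable
{
  // The FileStream holds an operating system file handle that the
  // Garbage Collector cannot release on its own
  private readonly FileStream _stream = new FileStream("app.log", FileMode.Append);

  public void Write(string message)
  {
    // ... write the message to the stream ...
  }

  public void Dispose()
  {
    // Release the underlying file handle as soon as the caller is finished
    _stream.Dispose();
  }
}

Callers can then wrap a LogWriter in a using statement exactly as above, so the handle is released as soon as the block ends.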

There are a lot of further complexities to garbage collection and memory management. Most of the time you won’t have to worry too much about them, but it’s good to know the basics for the times you do.

Connections Per Hostname

There are lots of ways to improve the performance of your website; some are easier than others to implement, and some will have a greater or lesser effect.

One potentially easy win is to limit the number of requests served from any single hostname. Most modern browsers will only make up to six simultaneous requests to a single host. If you are serving all of your resources from the same domain, e.g. http://www.mydomain.com, then the browser will request the first six, wait for them to load, then start requesting the next six.

If you split your resources across multiple domains then this queuing will not occur. As a simple improvement, load your static content from content.mydomain.com (or similar). This means that your dynamic pages will load and start pulling in your static content very quickly, rather than the static requests queuing behind all the requests to the dynamic pages.
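
As a rough sketch, a small helper (the names here are made up for illustration) can keep the split in one place, so that every static asset URL is built against the content domain:

static class StaticContent
{
  private const string ContentHost = "http://content.mydomain.com";

  // e.g. StaticContent.Url("/css/site.css") returns "http://content.mydomain.com/css/site.css"
  public static string Url(string path)
  {
    return ContentHost + path;
  }
}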

This splitting out also leads to other possible benefits. You can put your static resources onto a CDN, so the performance of content.mydomain.com will be drastically better than if it were served from your own servers. It also allows for further tweaks: the domain can be configured to be cookieless (you’d use mydomaincontent.com in this case), which saves sending cookies with static requests, where you should never need them.

This is a quick improvement that should be simple to set up. The major effort is in configuring the domains correctly and planning your system to allow for these split domains. Once you’ve done that, you’ll increase performance for everyone using your site, without having to improve the code that drives the system.

Breaking Changes

Making a breaking change is just about the most destructive thing you can do in a software system. A change that makes everything that went before obsolete will severely limit your options for the future.

A breaking change is one where the system has changed in such a way that you cannot go back to the old version.

If you can’t go back to the old version, then you need to be certain that your new version will work. That may sound pretty simple, but it’s always harder in practice.

You should design your system to reduce the number of breaking changes, and to allow for the simplest methods of coping with a breaking change.

When you have the choice of doing a little extra work to make a smooth transition, give enough weight to the prospect of your change failing when you weigh it up. Don’t just assume that everything will work and that you can always roll forwards.

If you can isolate your breaking change to a single layer then you can split your deployment into several pools. You need a reliable way to send users to either the new or the old code; otherwise this will not work.

If you design ahead of time to take account of breaking changes, then when you finally need to make one it won’t be as painful as it might otherwise have been.

Performance Tradeoffs

Performance is often a key concern in designing systems. Every time we consider performance we are making some kind of trade-off in the wider system, and this needs to be understood or the system will fail.

Generally a performance requirement is phrased in a vague manner, indicating the system should be fast, responsive or otherwise quick. This is going to be very hard to design well for.

A good requirement will let us start making the tradeoffs we need. It will request that a key page is loaded in less than a second, or that calls to external services complete in less than 100 milliseconds. The requirement here is measurable, so we know if we have achieved it or not.
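
A requirement like that can also be checked directly. As a rough sketch (the external service call here is hypothetical):

using System;
using System.Diagnostics;

class ServiceTimingCheck
{
  static void MeasureCall()
  {
    var stopwatch = Stopwatch.StartNew();

    CallExternalService();

    stopwatch.Stop();

    // The 100 millisecond figure comes straight from the requirement,
    // so the result is a clear pass or fail
    Console.WriteLine(stopwatch.ElapsedMilliseconds <= 100
      ? "Requirement met"
      : "Requirement missed");
  }

  static void CallExternalService()
  {
    // ... call the external service being measured ...
  }
}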

With the measurable requirement in hand, we can decide how to achieve the required performance. It might be easy, and just be met as the system is designed. We might need to cache data, or use more powerful hardware (trading cost for performance). We may find we need to use a lower level language or code module (trading maintainability for performance). We might have to use new or unproven technologies (trading risk for performance).

Once the requirement is understood we can look at the key performance tradeoffs, make the system design with these in mind and ensure that the stakeholders in the system are aware of the choices and options available to them. If we don’t have measurable performance requirements, we can’t make these informed decisions, and the system will suffer.

Map Reduce

The idea of Map Reduce is very simple. It is a method to split up a complicated problem into smaller work packets that can be solved on a distributed system.

There are two main parts to Map Reduce, the Mapping function and the Reducing function.

The Mapping function is used to split up your dataset into manageable chunks. It is performed by the master node in the system. The Reducing function will be run on each of your worker nodes. It takes a mapped set of data, processes it, and returns the results to the master node. The master will then collate all of the reduced values and return the final results.

There are some further complexities to Map Reduce implementations, namely how the data is actually sent to a worker, how the results are returned and how the scheduling is managed. This is basically what a system like Hadoop will manage for you, so you can concentrate on the details of your Mapping and Reducing.

The canonical example is to produce a count of the words in a document. The input to your mapping function is a string containing the text of the document. This will be split into words by the mapper, and a list of key-value pairs identifying each word will be sent to the worker nodes. The reduce function will simply count the values provided and return the results.


"A simple string with a repeated word"

Map function


static void Map(string mapInput)
{
  // Split the document into lowercase words and emit each word as a key
  foreach (var word in mapInput.ToLower().Split(' '))
  {
    Console.WriteLine(word);
  }
}

Reduce function

static void Reduce(string word, string[] values)
{
  // Each reducer receives every occurrence of a single word and emits its count
  Console.WriteLine(word + " " + values.Length);
}

This is rough pseudo-code showing what you’d need to implement; it will not work exactly as written, and you will need to customise it for your actual implementation.

From this we’d expect to see the following results:

a 2
repeated 1
simple 1
string 1
with 1
word 1

From this simple example, you should be able to see how we can expand to cope with much more complicated and interesting problems.