Feb 20, 2010

Custom Error Reporting with log4net

I recently started a new position where hunting for errors meant logging into one of two active web servers, looking over a couple of directories that were logging via log4net, and also checking the Windows event log.  Needless to say this was a PITA.  I decided my first initiative was to improve the visibility into our application errors, to better understand our production issues.  To compound the problem, we weren't getting context like server variables (browser, referring url, etc.) or the logged-in user, which can be very helpful in the discovery process and also for support.

Typically I would try to use something like Elmah, because the less work the better, but there were a few snags.  One, we are using a custom db session provider which helped to link the dying ASP pages to .NET.  Two, we use Fluorine and NHibernate, and they do a lot of internal logging using log4net.  Additionally, our existing app had log4net logging all over the place.  So I decided to set out on a custom appender to consolidate.  There were a few configurations I thought of, but I settled on inserting all errors into the database and using an admin interface to view, data-mine, and manage our exceptions.

First thing I had to do was add a Global.asax to all 8 of our applications to catch unhandled exceptions. Each one had something like the following:

void Application_Error(object sender, EventArgs e)
{
    // Code that runs when an unhandled error occurs
    log4net.ILog log = log4net.LogManager.GetLogger("MyApp");
    if (log.IsErrorEnabled)
        log.Error("An uncaught exception occurred", this.Server.GetLastError());
}

void Application_Start(object sender, EventArgs e) 
{
    // Code that runs on application startup 
    log4net.Config.XmlConfigurator.Configure();
}
Next I wanted to find a decent database appender that wouldn't affect the performance of our app too much. Luckily I found Ayende's AsyncBulkInsertAppender which, as its name suggests, is both async and queues up inserts to a configurable queue length.  With some minor tweaks, I was able to get this to work with our app.  I added some additional context to get our user (pulled from a cookie on the current request), and I could also stuff server variables into a custom column I created.  I started by overriding the appender's Append method.  Inside that method you can add custom context to the logging event.

protected override void Append(LoggingEvent loggingEvent)
{
    try
    {
        SetUrl(loggingEvent);
    }
    catch (Exception ex)
    {
        // Never let context-gathering take the appender down with it.
        ErrorHandler.Error("AsyncBulkInsertAppender ERROR", ex);
    }

    base.Append(loggingEvent);
}

protected virtual void SetUrl(LoggingEvent loggingEvent)
{
    if (IsInWebContext())
    {
        loggingEvent.Properties["url"] = HttpContext.Current.Request.Url.ToString();
    }
}

private bool IsInWebContext()
{
    return HttpContext.Current != null;
}

Next I added the appender to a few configs and set them to log errors only.  I found out while doing this that configs can cascade within the same directory tree, even if the apps are in different app pools, so I simultaneously cleaned up a lot of our redundant web.configs during this process.  One thing you'll need to know is how to add a custom column to your appender.  Here is an example of the column I used to store the url.

<mapping>
    <column value="Url" />
    <layout type="log4net.Layout.PatternLayout">
        <conversionPattern value="%X{url}" />
    </layout>
</mapping>
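
For context, that mapping nests inside the appender declaration, which is also where the errors-only restriction lives. Roughly like this (the type and connection-string element names are illustrative, so check the appender's own schema; the threshold element is standard log4net):

<appender name="ErrorAppender" type="YourNamespace.AsyncBulkInsertAppender, YourDll">
    <!-- standard log4net: only ERROR and above reach this appender -->
    <threshold value="ERROR" />
    <!-- connection/table settings as the appender defines them -->
    <connectionString value="..." />
    <!-- custom column mappings, like the Url one above, go here -->
</appender>
<root>
    <appender-ref ref="ErrorAppender" />
</root>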

Everything was going well, and I was ready to build my interface.  I tested each site by throwing an error and checking the log; then I realized that SOAP exceptions from web services are handled outside the normal pipeline, and thus weren't caught by the Global.asax.  Shit.  I did a little more google-jerking and hacked together the following:

using System;
using System.Web.Services.Protocols;

public class SoapExceptionHandler : SoapExtension
{
    public override void ProcessMessage(SoapMessage message)
    {
        // By AfterSerialize, any exception thrown by the service method
        // has been captured on the message, so log it here.
        if (message.Stage == SoapMessageStage.AfterSerialize)
        {
            if (message.Exception != null)
            {
                log4net.ILog log = log4net.LogManager.GetLogger("WebService");
                if (log.IsErrorEnabled)
                    log.Error("An uncaught web service exception occurred", message.Exception);
            }
        }
    }

    // No per-service or per-method state is needed, so the initializers are no-ops.
    public override object GetInitializer(Type serviceType)
    {
        return null;
    }

    public override object GetInitializer(LogicalMethodInfo methodInfo, SoapExtensionAttribute attribute)
    {
        return null;
    }

    public override void Initialize(object initializer) { }
}

And added this in the web.config:

.......
    <webServices> 
      <soapExtensionTypes> 
        <add type="YourNamespace.SoapExceptionHandler,YourDll" priority="1" group="High"/>
      </soapExtensionTypes> 
    </webServices>    
</system.web>

One thing you *need* to know is that you can't test this from the little test page that .NET creates for the service (that page invokes the service over plain HTTP POST, not SOAP, so SoapExtensions never run). The best way is to call the web service from a separate test page, making sure the service throws an exception. Don't waste hours of your life trying to debug why your custom SoapExtension isn't working. Argggg.
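
If it helps, my throwaway test page boiled down to something like this (MyService and ThrowSomething are hypothetical stand-ins for your generated proxy and a service method that throws; SoapException comes from System.Web.Services.Protocols):

protected void Page_Load(object sender, EventArgs e)
{
    MyService proxy = new MyService();
    try
    {
        proxy.ThrowSomething(); // server-side method that throws
    }
    catch (SoapException)
    {
        // Expected -- the extension should have logged it on the server.
    }
}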


So now I've got all errors from all applications logging into one place.  I built my interface, with a filter on just about everything.  I also added the ability to 'handle' exceptions as a means of managing errors that need attention.



Much better.  Now we are depressed at the amount of log4net errors and warnings we see, but at least we can address them. :)  Next on my list is the ability to maintain and push an svn branch for 'hotfixes' so we can address these bugs in real time without rolling out code that isn't ready for primetime.

Nov 28, 2009

Adventures in Information Extraction (Part I)

I'm currently embarking on a personal project to parse and index content from certain specific blogs.  Before this adventure, my knowledge of the relevant body of information science was merely a term, Natural Language Processing (NLP), that I had no understanding of, and still don't.  Now that I've done some google-jerking, and with a bit of helpful clarification from a friend of mine enrolled in grad school, I've hopefully started to narrow my initial effort down to some more specific technologies and fields of study.  Nonetheless, a lot of this will change as time goes on.

First let me describe, in general, my problem set, and let me preface by saying this is currently my 10,000 ft view of the approach.  I wish to target specific blogs and extract from them strictly blog content, minus the useless parts of the page (navigation, footer, header, etc.).  Initially I hope to target two sites that host lots of other blogs; in doing this, I should be able to extract a general template (a common body element, for example) that contains the content of interest.  With content in hand, I would like to store this stuff in an index for quick retrieval using Lucene.Net.

What exactly I am going to extract remains a bit muddy at the moment, as I try to explore the relationship between certain Natural Language Processing techniques and Lucene.  For instance, Lucene has the ability to tokenize (basically, find words), perform stemming (map variations such as 'traveled', 'traveling', and 'travel' to the root 'travel'), and filter stopwords ('the', 'a').  These things seem foundational to search in general, but what I really want to index are place names like 'The Taj Mahal', 'Pismo Beach', or 'Moe's Tavern'.  My research has pointed me to Named Entity Extraction, a subfield of Information Extraction.  Here the field of IE starts to get a bit murky for me, and this theme will perpetuate as I go further down the rabbit hole.

I first started to try and find a good api and accompanying tutorial on these things.  Many roads pointed to NLTK and its quite descriptive, and free, online book.  The library is built with Python, and I've traditionally been brewing C# and .NET for a while, but the idea of learning a new language is welcome to me.  My initial opinion of Python is that it's much easier to learn and use than a more regimented language like C# (though .NET has been making impressive headway into functional programming with F#).  Additionally, the syntax and libraries are more adept at parsing text.  The plan so far is to work my way through the online book to Chapter 7, which goes into exactly what I'm looking for.

Anyway, all this stuff is like drinking from a fire hose: a new language, two new apis, and new subject matter, but I'm hoping I'll be able to put together a prototype in my spare time.  So far, I've been able to do some fun things with Python using frequency distribution (how many times a word appears in text), finding a word in its context (concordance), and filtering out some proper nouns using POS tagging, a small step closer to getting my place names.  So for now, I'll keep inching towards Named Entity Extraction.  Once I have a working prototype for getting place names, I can circle back on some of the other things, like feeding content into a simple Lucene index and figuring out the relationship between NLTK and Lucene.  Until I have some automatic extraction methods, I'm using Dapper to get me some data to play with.
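
To make the Lucene.Net half a little more concrete, feeding a cleaned-up post into an index is only a few lines.  This is a minimal sketch against the 2.x-era API, with field names of my own choosing:

using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;

public static void IndexPost(string postUrl, string postContent)
{
    // Create (or overwrite) an index on disk.
    Directory dir = FSDirectory.GetDirectory("blog-index", true);
    IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(), true);

    Document doc = new Document();
    // Store the url verbatim; tokenize the extracted content.
    doc.Add(new Field("url", postUrl, Field.Store.YES, Field.Index.UN_TOKENIZED));
    doc.Add(new Field("content", postContent, Field.Store.YES, Field.Index.TOKENIZED));
    writer.AddDocument(doc);
    writer.Close();
}

The StandardAnalyzer does the tokenizing and stopword filtering mentioned above at index time; place names would presumably become additional fields once the entity extraction works.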

It seems funny writing all this up, because tomorrow the direction will change.  But I'll stay flexible, and hopefully writing about my experience will help others avoid my stupid mistakes.

Nov 16, 2009

NHibernate 2.1 Performance Issues with Flush and LinFu

We recently discovered our nightly processes were taking much longer (50 - 80%) and wondered why. My focus was on the upgrade from 2.0.1.4 straight to 2.1.1. The first thing I tried was changing the proxy factory from LinFu to Castle. This gave a surprising boost. For this particular operation some object properties were being accessed hundreds of thousands of times. Redgate Profiler showed that, in aggregate, proxied object access can add up: 3300 objects loaded, with 3 many-to-one objects eagerly loaded, had some properties accessed 350,000 times. One particular property took roughly 2 seconds of access time over the life of the process, and some of the others took roughly a second, all adding up to roughly 12 seconds of a 100-second process. Whoa. Keep in mind that's with Redgate's deepest profiling on. Switching to Castle seemed to cut that to 5 seconds. Anyone know what's going on here?
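
For anyone who wants to try the same swap: in NHibernate 2.1 the proxy framework moved out of the core into separate ByteCode assemblies, and it's selected with a session-factory property:

<!-- LinFu -->
<property name="proxyfactory.factory_class">
    NHibernate.ByteCode.LinFu.ProxyFactoryFactory, NHibernate.ByteCode.LinFu
</property>

<!-- Castle -->
<property name="proxyfactory.factory_class">
    NHibernate.ByteCode.Castle.ProxyFactoryFactory, NHibernate.ByteCode.Castle
</property>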

But the big surprise of the day was what happened with flushing from 2.0.1 to 2.1+. A basic flush in this process took .42 seconds on 2.0.1, but .79 seconds on 2.1.1. Really? What the hell happened in between those versions? Just to be sure, I checked 2.1.0 and 2.0.1 again; the numbers held. I wonder if abstracting dynamic proxying out of the NH core had some unintended consequences.

Needless to say we are back on 2.0.1 for the time being. The process that was taking 1 minute 11 seconds now takes 47 seconds. Why? A couple of reasons. First, the process calls embedded business logic many times, which ends up doing "safety" flushing even though it's a mostly read-only session.  The next refactor is to fix that problem, but all told the flush difference accounted for 10 - 15 seconds.  Second was the proxies, which accounted for the remainder of the difference. I didn't do any more benchmarks of 2.0.1 vs Castle on 2.1.1, but I'm sure there are differences there from all the testing that was done. In either case there's nothing in 2.1+ that's worth the performance tradeoffs we experienced. NHibernate.Linq seemed like it would have been a good reason to move, but it's not yet ready for production use. So for now I'll take the perf benefits of 2.0.1 over the minor upgrades of 2.1+.
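
For reference, the flushing refactor amounts to turning off auto-flush on the read-mostly session and flushing explicitly at the real write points. A sketch (RunNightlyProcess is a hypothetical entry point; FlushMode.Never is the NHibernate 2.x name for manual flushing):

using (ISession session = sessionFactory.OpenSession())
{
    // Don't let queries trigger "safety" auto-flushes on read-mostly work.
    session.FlushMode = FlushMode.Never;

    RunNightlyProcess(session);

    // Flush once, explicitly, where the writes actually happen.
    session.Flush();
}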