Aug 14, 2015

Topic Analysis Exploration

I've been experimenting with Natural Language Processing, and I'm keenly interested in unsupervised techniques such as LDA and LSI. I have a fascination with unsupervised techniques, like clustering and neural networks, that have the ability to provide meaning without preconceived influence. The basic steps to set up LSI or LDA analysis are covered in the Gensim tutorials. If you don't know Gensim, it's a pretty sweet set of libraries for topic analysis and there's even a port of Google's Word2Vec to Python with some key performance improvements. I appreciate the focus on performance here, something that I think is rare in academic-like libraries.

My current knowledge of NLP is still pretty elementary, but I've focused on seeing a) what's possible and b) what has tutorials/libraries to get going. For purposes of this blog I'll stick to topic analysis, which Gensim does well. Roughly speaking, here was my R&D process, which wasn't rigorous or scientific by any means. Lots of trial and error.

  1. Pull blogs from Elasticsearch by country
  2. Filter stop words, and perform lemmatization/stemming
  3. Create a corpus and dictionary
  4. Run that corpus through LDA or LSI
  5. 'Read the tea leaves' (a topic is just a collection of words, so interpreting one can be difficult and requires some insight)
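Here's a minimal sketch of what steps 2 and 3 boil down to, in plain Python so it runs without any libraries. It is not the code behind this post: the stop word list, the sample blogs, and the function names are all made up for illustration, and the dictionary and bag-of-words structures are only a stand-in for what a library like Gensim builds for you. Stemming/lemmatization is left out.

```python
from collections import Counter

# Tiny illustrative stop word list; a real run would use a much longer one.
STOP_WORDS = {"the", "a", "and", "in", "of", "to", "we", "was", "were"}


def tokenize(text):
    """Lowercase, split on whitespace, strip punctuation, drop stop words."""
    tokens = (word.strip(".,!?;:'\"()") for word in text.lower().split())
    return [t for t in tokens if t and t not in STOP_WORDS]


def build_dictionary(token_lists):
    """Map every distinct token to an integer id, in order of first appearance."""
    token2id = {}
    for tokens in token_lists:
        for token in tokens:
            token2id.setdefault(token, len(token2id))
    return token2id


def to_bow(tokens, token2id):
    """Turn one document into a sorted list of (token_id, count) pairs."""
    counts = Counter(token2id[t] for t in tokens if t in token2id)
    return sorted(counts.items())


# Hypothetical stand-ins for blog posts pulled from the search index.
blogs = [
    "We tracked gorillas in Bwindi and the gorillas were calm.",
    "Lions and elephants in the park.",
]
token_lists = [tokenize(b) for b in blogs]
dictionary = build_dictionary(token_lists)
corpus = [to_bow(tokens, dictionary) for tokens in token_lists]
```

The `corpus` list (one bag of words per blog) is the kind of input an LDA or LSI model is trained on. Stripping punctuation during tokenizing also keeps variants like 'goats' and 'goats,' from being counted as separate words.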

First I tried LSI, and at times you really have to investigate what the topic is about. However, the sample below (from LDA analysis), extracted from Uganda travel blogs, is a bit more straightforward. The print format is a bit confusing: probability*word + probability2*word2... A collection of words is listed with the corresponding 'strength' of each word. This is a topic on safaris in Queen Elizabeth National Park.

topic #3 (0.010): 0.010*bwindi + 0.010*lions + 0.008*queen + 0.008*elephants + 0.008*elizabeth + 0.007*tracking + 0.007*impenetrable + 0.007*gorillas. + 0.005*elephant + 0.005*park
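If the printed format is hard to read, a few lines of Python can split it into (word, weight) pairs. This is only a sketch for the exact format shown in this post; `parse_topic` is a made-up helper, not part of any library, and other versions of the output may be formatted differently.

```python
def parse_topic(line):
    """Split a printed topic into (word, weight) pairs, strongest first."""
    # Drop a leading "topic #3 (0.010): " prefix if there is one.
    _, _, terms = line.rpartition("): ")
    pairs = []
    for term in terms.split(" + "):
        weight, _, word = term.partition("*")
        pairs.append((word, float(weight)))
    # sorted() is stable, so words with equal weights keep their printed order.
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


line = "topic #3 (0.010): 0.010*bwindi + 0.010*lions + 0.008*queen + 0.008*elephants"
for word, weight in parse_topic(line):
    print(f"{word:<12}{weight:.3f}")
```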

This one is a bit tougher, take a look. Goats and beneficiaries? WTF? Type those 2 words into africa.wanderight.com and filter by 'Uganda'. You'll see a few blogs related to Vets without Borders (VWB). Pretty cool, huh?

topic #24 (0.010): 0.012*goats + 0.005*goats, + 0.005*beneficiaries + 0.003*pens + 0.003*disabled + 0.002*chuck + 0.002*background + 0.002*vaccinate + 0.002*tracked + 0.002*right?

Having a fast search engine on hand to pair words together has been super helpful for figuring out what a topic really is. But it's hit or miss. I have no idea what the one below is about. Maybe you can figure it out.

topic #34 (0.010): 0.004*learned + 0.004*played + 0.004*stories + 0.004*tents + 0.004*resort + 0.004*dance + 0.004*grateful + 0.003*treat + 0.003*exhausted + 0.003*medicine

Here are some things I've tried to get better topics.

  • Improved stop words. Originally I used the NLTK list, but then I just used this list
  • I've recently played with stemming and I think the results have improved slightly (Why didn't I use the raw Snowball field within Elasticsearch? In short, I couldn't find anything with 5 minutes of googling, but really I liked having more control over the data, like stopwords)
  • With LDA, I tried more passes which improves results at the cost of performance. Fine for my exercise. Similarly, I tried more iterations with LSI.
  • Vary the number of topics. With some of these countries I don't have a ton of blogs. For Uganda I have just a bit over 400. I haven't nailed this down yet, but 50 seems to do ok.
  • Just now I tried a bigram method since some of these blog posts are so long. The results aren't as strong, and I can see that it's just clustering random things from single blogs. But there are still some telling word pairs: 'white water', 'health center', 'gorilla tracking'.
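For the bigram idea, here is one way it could be sketched in plain Python. This is an assumption about how you might do it, not the method used above: it counts how many different blogs each word pair shows up in, which is one possible way to keep a single long blog from dominating. The function names and the `min_docs` threshold are made up.

```python
from collections import Counter


def bigrams(tokens):
    """Pair each token with the one that follows it."""
    return list(zip(tokens, tokens[1:]))


def shared_bigrams(token_lists, min_docs=2):
    """Keep bigrams that appear in at least min_docs different documents."""
    doc_counts = Counter()
    for tokens in token_lists:
        # set() counts a pair once per document, however often it repeats there.
        doc_counts.update(set(bigrams(tokens)))
    return {" ".join(pair): n for pair, n in doc_counts.items() if n >= min_docs}


# Hypothetical, already-tokenized blogs.
docs = [
    ["white", "water", "rafting"],
    ["white", "water", "white", "water"],
    ["health", "center"],
]
print(shared_bigrams(docs))
```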

I probably should have used a more rigorous method for optimizing the inputs, but I talked myself into thinking these are subjective enough anyway. Since I ran so many quick trials, I was able to know what should show up if I varied something. Another method I used to just determine the efficacy of topic analysis in general was to see if I could find 'Things to Do' listed in TripAdvisor. Getting those topics, I presume, would just get my foot in the door. The reality is that I hope to find things that are difficult to track down in TripAdvisor or a general Google search. Like Vets without Borders. Volunteering as a means of travel is totally legit, but not a money-making adventure. Probably why you can't find it within travel channels, which is why I think online travel planning sucks.

My overall impression of LDA and LSI is lukewarm at best. I can find some interesting things fast, but there are only a few of those things in 50 or 100 topics. For the rest you have to dig a bit. So I might have half a dozen topics that are solid. Perhaps part of this stems from certain blogs that are really many blogs combined into one. Words and documents are key pieces in these methodologies that provide boundaries for the model to learn from. Perhaps to get better data I could make a document a paragraph. Something worth trying. But for now I'd be willing to bet I progressed to the edge of the 80/20 rule. Any future gains would be incrementalish.
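The paragraph-as-document idea could be tried with something like the sketch below. It assumes paragraphs are separated by blank lines in the cleaned text, which may not hold for every blog, and the `min_words` cutoff is an arbitrary guess to drop one-line fragments.

```python
import re


def split_paragraphs(text, min_words=5):
    """Split text on blank lines and keep paragraphs with enough words."""
    paragraphs = re.split(r"\n\s*\n", text)
    return [p.strip() for p in paragraphs if len(p.split()) >= min_words]


# Hypothetical blog text with a short fragment in the middle.
post = (
    "Day one in Kampala was busy and loud.\n\n"
    "Short note.\n\n"
    "We drove west to the park and saw elephants."
)
documents = split_paragraphs(post)
```

Each entry in `documents` would then be tokenized and fed to the model as its own document instead of the whole post.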

Back to my TripAdvisor hypothesis. It turns out topic analysis combined with manual interpretation can't match the 'things to do' in TripAdvisor. My sense is that there is a collection of techniques that will get me there. Some of those things I'm trying, so I've still got a lot to learn. If you're interested in diving a bit deeper on topic analysis, check this out.

Jul 24, 2015

Wanderight, a technical journey

While I've been vague about my idea and project, things are coming into focus. In this time of entrepreneurial exploration I've had time to get reacquainted with developing software. Every technology I've employed so far is new to me: operating system, cloud hosting platform, programming language, text based search platform... But like most other software devs I enjoy learning new things. The pragmatist in me knows that many of these tools are the best for the job. Python is good at manipulating text, data analysis, and building things quickly. Elasticsearch is great at encapsulating the complexity and learning curve of running Lucene or Solr. Linux is the best platform to run and operate Python and Elasticsearch. Azure is the best platform for giving me free hosting which is awesome at keeping my burn rate low.

I'd like to focus on the technology aspects of what I'm working on from the perspective of a beginner on many fronts. But I also have the belief that I'll be able to do some things with technologies that I barely understand as of yet. That's where the fun comes in. Without explaining the product and the opportunity/market...yada yada, I'll focus on the operational learnings and a bit about the motive for the focus. So first, a bit of a primer: I believe that many textual sources have a wealth of information that is mostly untapped. If you think Google News can aggregate well and Google Search can find things well, I'd hope to do something similar with travel. As a person that's traveled and explored in over a dozen countries, I hate the over-sponsored, highly-advertised world of online travel. There are people out there giving authentic information on travel blogs, forums... you just need to tease it out. Make the content easy to find and relatable to the end user, and be diligent about spotting and identifying that which is 'sponsored'. The easiest way I could think to start is with narrative travel blogs. Which brings me to my tech. Parse, clean, and analyze travel blogs. What I eventually want to have is a site that will serve up authentic travel information to the international planning traveler. As far as I can tell, there are 3 distinct categories.

  1. Build a 'basic' search engine based on a semi-curated content feed. By semi-curated I mean that I'll only be putting in content that I presume isn't sponsored. Some of that authentic content is apparent, some of it is tricky, so start with the obvious.
  2. Discover your data set. Yay data science. Understand key characteristics about it. Get a sense for what is 'good' and define those as hypotheses in your product journey. Clean your dataset. Don't even allow things in that are in a foreign language, fall outside a certain content length, or aren't based in the region of focus. Not to mention the cleaning and data janitorial work that needs to be done before it can be searched. Luckily Elasticsearch does a lot of that lifting for you: stemming, stopwords... Obviously I need to pick out the main content, key in on metadata, and clean HTML while parsing (covered in step 1).
  3. Generally speaking, find more intelligent ways to serve content up to the user. Things like region discovery through topic analysis. Or categorization to filter content for a traveler archetype. Summarize long posts to make searching faster. Extract places to enhance your search and discover capability. Recommend blogs to read based on blogs of interest. I believe that information extraction, natural language processing, and machine learning have become much more accessible to the non-academic. That being said, this step isn't plug and play but there is a wealth of books, platforms, and sample code to get this autodidact well on my way. If nothing else, this is waaaaay better than going back to school for a degree in what I can learn with the power of the internets.
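To make the cleaning in step 2 a bit more concrete, here's a rough sketch that strips HTML down to visible text and applies a minimum length check. It uses only the standard library's `html.parser` as a stand-in for a proper scraping library, and the class name, helper names, and the 50-word threshold are all assumptions for illustration.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of script and style tags."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Return the visible text of an HTML page as one string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def long_enough(text, min_words=50):
    """Reject posts that are too short to be a narrative blog."""
    return len(text.split()) >= min_words


page = (
    "<html><head><style>p {color: red}</style></head><body>"
    "<h1>Gorilla tracking</h1><script>var x = 1;</script>"
    "<p>We hiked into Bwindi.</p></body></html>"
)
text = html_to_text(page)
```

A real blog page needs more than this (picking out the main content block, reading metadata, language detection), but the shape is the same: extract, then filter before anything gets indexed.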

Now the details and my reflection 6 weeks in with so much new technology. To address step 1) I chose 2 key technologies: Python and Elasticsearch. Python for its text parsing prowess and Elasticsearch for the heavy lifting of Lucene. BeautifulSoup is an excellent library for finding information in blogs and scraping out what you want. Learning Python was fairly straightforward. The 2 languages I'm most familiar with are C# and Javascript. Python is a lot more like Javascript. The key difference building software this time around is that I have a veteran (read cantankerous) view on technology. Everything in its place. I used to get wrapped around the axle with proper coding technique, architecture, scalability...but that doesn't have much place here. Those things matter most when you have different problems to solve than trying to see if other people give a shit about what you're building. This graphic is telling of the different development mindsets. My nature is to be a 'Settler', but I've longed to be a 'Pioneer'. Which basically means now I'm a giant fucking hack. You could drive a Mack truck through my hacks. It's fun and awesome. But I think the key difference is (and this is a big *if*) I would know how to scale things when the time comes. Whether I'd need to burn it to the ground and rewrite or build up what's there, I can figure it out because I have in the past. That's a comfort that not many others have and I consider it a key personal asset.

A long time ago I worked with Lucene. Like 10 years ago. What Elasticsearch has done for text search since that time is awesome. Getting going out of the box is easy. The tutorials and the amount of StackOverflow help make adoption so much less intimidating than trying to learn what TF-IDF is when you just want to get shit working. Get it working first, dive deep later. Elasticsearch is awesome at that. I don't need to care that it runs on 5 shards or that I need to use a Snowball analyzer for proper stemming. I can get going and find lots of information on the internets when I run into issues. That being said, the run-into-walls approach only works so long. Eventually you need to dive deep, like yesterday. I spent at least half the day understanding how custom query scoring works, how to avoid using TF-IDF on certain fields that are short, and looking at very complex 'explain' query chains. I went deep into the rabbit hole that day my friends, but my search is so much better. Being able to find terms in a statistically relevant way just gets your foot in the door. Boosting blogs that are more recent, longer in length, and contain more pictures really makes a difference in the quality of your first few results.
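To give a flavor of that kind of boosting, here is a sketch of a search body built as a Python dict using Elasticsearch's `function_score` query. It is not the query behind this site: the field names (`content`, `published`, `word_count`, `picture_count`), the decay scale, and the way the pieces are combined are all assumptions, and the options available depend on your Elasticsearch version.

```python
import json


def build_search_body(terms):
    """Text match on the content, multiplied by recency/length/picture boosts."""
    return {
        "query": {
            "function_score": {
                # Plain relevance scoring on the (hypothetical) content field.
                "query": {"match": {"content": terms}},
                "functions": [
                    # Decay the boost as the publish date gets older;
                    # for date fields the origin defaults to now.
                    {"gauss": {"published": {"scale": "365d", "decay": 0.5}}},
                    # log1p dampens the effect of very long posts.
                    {"field_value_factor": {
                        "field": "word_count", "modifier": "log1p", "missing": 0}},
                    # Same dampening for posts with lots of pictures.
                    {"field_value_factor": {
                        "field": "picture_count", "modifier": "log1p", "missing": 0}},
                ],
                # Add the three boosts together, then multiply the text score.
                "score_mode": "sum",
                "boost_mode": "multiply",
            }
        }
    }


body = build_search_body("gorilla tracking")
print(json.dumps(body, indent=2))
```

The printed JSON is what would be sent as the search request body. Running the same search with the 'explain' option turned on is how you'd check what each function actually contributes to a result's score.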

For hosting I originally started with AWS. Every major cloud platform has some sort of offer for startups. AWS really raises the bar for getting anything free out of them. Like for $1k/year, I'd need to take an online MIT course on entrepreneurship. No thanks. I bought the professor's book on audio, time better spent. Azure has a BizSpark program that lowers the bar to entry and is awesome for people like me who are bootstrapping an MVP. 2 weeks after I applied, I got in. I've got a basic Web App hosting a static single page that uses Angular and other JavaScript goodies to talk to the backend. I've got another Ubuntu VM running Elasticsearch (and an experimental Django API). That's it, and my shit is pretty fast, but I've only got like 4k blogs at the moment. It's hard to admit this, but I've been a Windows guy the majority of my career. I was a self-deprecating Windows user and I knew that Linux was a better platform for building most software (save C#), I'd just never taken the time to learn it. I'm a newish Mac user and now I've refreshed my memory on VI, I 'grep' shit, write bash scripts, and in general still have little idea what I'm doing. But the advantage to working in the command line is becoming clear when you get the shortcuts down, especially installing software. pip install is my friend, but don't get me started on how I don't have a virtual environment for my Python libraries. Did you know there are 2 locations for libraries when using Python? It sucks.

Right now I have a domain where you can go search narrative travel blogs in Africa. Eventually, I'll tell the world about it, but I have a few more big ticket items to address. That being said, I'm looking for early adopters! Free internet scouring if you are traveling to Africa!

Jun 22, 2015

Entrepreneurial Beginnings

It's been some time since my last post on quitting. Since then I've been endeavoring on my entrepreneurial journey, while admittedly soaking up summertime in Boulder. I wanted to recap, if for no other reason than to keep a running catalog of the course of events.

First week after I quit, I jumped right into Boulder Startup week. This event has come and gone with my attention for years, so my timing this year wasn't all that coincidental. I went to a number of activities and got to catch up with those I've known professionally at some point in my 8 year tenure living in Boulder. In general, TechStars and its attitude pervade the startup culture here. There are other players in town, but TechStars helped to pioneer the culture as it is. One thing I've felt personally and hear from others in the community is how Boulder is a 'give first' community. That's straight from Brad Feld and his notions on Startup Communities. The most helpful events were the TechStars mentor sessions and startup demo presentations. The Founder Stories were great too. These sessions reminded me of a book I read years back called Founders at Work. Super helpful for newbs like me. One thing I wish there were more of were sessions hosted by the bootstrapped. I've generally worked for bootstrapped software companies for the last 10 years, and there is something to be said for not taking money and making your own way. There was one excellent and informal session put on by SnapEngage. I learned some about Open Book Management and liked 3 things: they are super diligent about who they bring in, have a great operational model where workers can meaningfully contribute to company metrics, and everyone shares *equally* in the company's success. Seems like a great company to work for and something I'd hope to model my business operations after.

In the general search for more information on how to start my adventures, I'm keenly aware of the need for other cofounders. Not only does TechStars lean heavily on this, I've always enjoyed, as many do, the company you keep at work. Not to mention the numerous benefits of having someone else to do some lifting, execute where you are weak, and balance out each other's life/mental roller coaster. Finding a cofounder isn't easy, but nothing sends a message that you are serious like quitting a well paying job in the prime of your career. I'm lucky that I have a good background and narrative that seems to help when generating interest from others. I've been trying out CoFounder's Lab. Basically it's a social network for those looking for startup opportunities and building a team. It's great to meet others who can help give you feedback on your idea, network, or form partnerships. The forum of networking needs a lot of love, and the random meets don't produce much return. You've got to talk to 20 people to find something of interest. I believe firmly that these things need to happen organically over time. It's obviously much more advantageous to have a cofounder that you've worked with in the past or even someone that's in your network, but my list has been tested already. Networking and planting seeds now is key, you just need to have a long term mindset.

I've had to put my ego and introversion aside. I'm cold contacting lots of folks in the area and in the industry. I'd guess I have an 80% response rate so far and persistence pays off for those high value players. Not knowing the industry I'm entering into, I give priority to those who know the industry or those who've started startups. It's important to know what the prevailing opinions are. You can't carve a niche out of an industry if you don't first know how it works. In the last 2 weeks a travel focused incubator, TravelPort Labs, has opened up in Denver, which seems to have a lot of pieces that might help me. I met with them 2 weeks ago and I like that they are starting their own initiative for the first time like me, that I'd get a decent amount of attention, and that they have experienced mentors and helpful UX/UI resources. My reservations are whether we'd have conflicting operational approaches, the 1.25 hr drive each way, and whether I'd be limited to the travel industry in case the technology takes me elsewhere. In any case, I'll apply because I had a great conversation with 2 guys from the program.

Not knowing the industry and never having done market research, I generally was a bit disorganized at doing this on my own. But I found a few footholds with large market research firms, competitor analysis, and a gem of a site, Tnooz, which focuses on latest trends in travel, especially startups. I have a sense of the challenges in the travel planning space, its size, and the behaviors of the target demographic. The space is so crowded and generally everybody is faced with the challenge of 1 or 2 engagements per year and a high CAC. There's been a number of times I've gone to a travel planning site and been discouraged at the fact that a) it exists b) it seems well done from a UI standpoint. Then I test it and find the confidence to keep moving forward and carving out a niche. I'm the sort of entrepreneur that is a builder and has a personal need for the product, the question is whether others agree and are willing to pay for it. Nonetheless, it's at best a journey and a reasonable place to start.

So far I've taken a stab at the modern version of a business plan, the Lean Kanban board. I've identified my hypotheses and have strategies to get there. When you talk to the serial entrepreneurs around here it's "get out of the building", test, pivot, product/market fit. But in this "process", I think you need to have some core narrative or principles you are revolving around. For me it's technology, operational narratives, and an industry that you can give a shit about. I'd like to have all 3, but the buck will stop somewhere, that's what the hypotheses are for. From here on out when starting a business, you'll get tested on where to go and where to start. You need to really have your head on straight about who you are and why you are doing this.

That mostly sums up the activities of the first few weeks. Now I'm onto writing code and building a product, a subject for the next blog post.

Jun 2, 2015

Travel: A Necessary American Curriculum



I wish traveling was a required life curriculum.  Not vacationing, but travel.  Sitting on a beach sipping mai tais is plenty needed at points in life, but that's not what I'm talking about. How is travel different than vacationing?  I think when you travel you have a purpose in mind.  Perhaps you want to see and experience things.  Different perspectives, different food, different landscapes.  Perhaps you need to move through something, a problem or the vague notion of one.  Travel at its best is a journey that helps you step out of the patterns of your routine, and engage in the therapeutic activities of wanderlust.  Perhaps you’re keenly aware of how fucked up things are in your life and your therapies are much more deliberate in nature.  For other people, like myself, you love to learn.  Learning is a paramount activity for its own sake and there's nothing quite like experiential learning.  Remember that scene in Good Will Hunting, where Robin Williams says to Matt Damon, “you’re just a kid, you don’t have the faintest idea what you’re talkin’ about…I’ll bet you can’t tell me what it smells like in the Sistine Chapel”.  Experiential learning has a staying effect and an impression that's hard to replicate from other methods of learning.

Why should travel be required?  We’ve all met bigoted and ignorant people in life.  How much do you think those people have traveled?   "Travel is fatal to prejudice, bigotry, and narrow-mindedness." - Mark Twain.  Experiencing different cultures causes you to reflect on what has shaped you, and certain liberal rhetorics become more clear.  Chomsky’s Manufacturing Consent takes on a whole new meaning when you see how little advertising there is elsewhere.  Sure other countries might be outright corrupt when you need to pay off the cops, but at least it's on the table and not so subversive.  Military-Industrial Complex, Consumerism, Narcissism… these are concomitant side-effects of American society.  I'm not here to bash ‘merica, but merely see it for what it is.  Take the emotion out of the rhetoric by stepping outside of your routine and truly see how your circumstances came to be.  Engage, if but momentarily, in a different way of living or viewing the world.  Go to some non-western 3rd world country and see how much happier people can be even when they have little material wealth.

I remember reading Zen and the Art of Motorcycle Maintenance by Robert Pirsig in my travels to Fiji.  As an analytical person his analysis of the scientific process was eye opening to me.  Years later I read his follow-on, Lila: An Inquiry into Morals.  Lila touched on anthropology and explored a culture's values and mores in relation to other cultures, intellect, and biology.  These things take on a much deeper meaning when you understand just how different value systems of different countries and cultures can be.  It’s quite certain that nobody has it figured out, but surely some more than others.  When you see the connectedness and community of societies outside the Ayn-Randian dystopia, you can see that maybe you are missing something.  When reading Lila, I realized that most disagreements between persons stem from different value systems, different ways of looking at the world.  What I can see now is that the black and white morality of a Christian derived US value system is really bullshit.  Right and wrong is just a matter of what side you are on.  And war is driven by our primitive aspects of ‘us vs them’.  Nothing codifies a group of individuals like having an enemy: in business, in sports, in nation building…  What does this have to do with travel?  You develop meta-attitudes and value systems that embrace other cultures.  You become a citizen of the world, not just the US.  

Do you travel?  Did you used to?  Do you have a ‘wish/bucket list’ in the recesses of your mind?  Dig it up, plan a trip.  Do you have a family?  There are a ton of resources out there on family travel.  Are you low on money?  There are resources out there for being a nomad on dollars a day.  Do you only have 3 weeks of vacation and use it sparingly?  That’s just a shitty excuse.  And a broken byproduct of the American work culture best known as the Paradox of Productivity.  We are the least traveled of any developed nation and it's to our detriment.  Maybe you’ve read the top 5 regrets of the dying, most of it is obvious right?  The challenge is to step out of your routine, to make it a priority.  Or perhaps you’ve seen that paying for experiences is vastly better for you than material things.  By no means am I a nomad, but I live well within my means.  And I’d like to think I'm unconstrained by those trappings of American materialism.  Fuck your car, your house, your overinflated sense of self worth at your job, and go travel.  And travel with purpose. (When you do, feel free to share with me how flawed my views of contemporary American society are.)

May 22, 2015

On Quitting as CTO, Wantrepreneurship, and Starting from Scratch


Shitting my pants before bungee jumping
Two weeks ago I quit my job as CTO.  With long aspirations to become CTO, it was a surprise to many, and at the time I realized it, to me as well.  As CTO you get to help shape the company in a significant way, which can be very rewarding.  From company to company the role can vary greatly as Werner points out.  As the company grows, leadership needs to have consistent values, messaging, and scalable operations.  Reticence and self discipline go a long way at a company that has its fair share of organizational and technical debt.  In fact, I've become convinced of what everybody in HR already knows: emotional intelligence is truly a defining characteristic of most good executives.  I was fortunate enough to have a sense of this from before I even entered the workforce.  When I was a teen in the early and mid 90's the nature of intelligence was in debate and I was soaking it all in.  A book called Multiple Intelligences came out, then a highly controversial book, The Bell Curve, and then Emotional Intelligence after that.  My general takeaway after this debate was that intelligence is simply the ability to adapt to one's environment.  That's a hotly contested opinion, but one that's stuck with me for a long time.  But here's the catch, you have to pick the right environment.  After scaling with the company for five and a half years, my personal career goals no longer aligned with the company's narrative.  Having considered myself a change agent for so long I realized I no longer had the drive to push things forward.  It was time to step out of the way to let the talented team I'd worked so hard to build, step up.

Being as it was, the decision didn't come easy.  My disposition is typically all-in.  I knew that once I'd lost the desire for my current job, I'd feel like a fraud.  It was important to me that I wasn't one of those employees phoning it in.  I absolutely hate that.  I'd sooner do something that's whole ass than half ass.  Second, the timing for me financially is amenable to being without income for a period.  I've deliberately lived my life free of financial burden and undue American materialism.  I have a reasonable mortgage, no other debt, no kids, and a wife that has a more than adequate income to sustain our standard of living.  The toughest part of quitting was leaving the team I'd hired.  Who you work with is paramount in my book and this statement from Peter Senge in the Learning Organization really strikes a chord with me:
"When you ask people about what it is like being part of a great team, what is most striking is the meaningfulness of the experience. People talk about being part of something larger than themselves, of being connected, of being generative. It becomes quite clear that, for many, their experiences as part of truly great teams stand out as singular periods of life lived to the fullest. Some spend the rest of their lives looking for ways to recapture that spirit."
I've been part of a great team and when I was able to build my own I sought to define and establish a set of principles needed to rebuild something similar.
  1. Create growth opportunity and narrative 
  2. Find those who show an aptitude for continual learning 
  3. Find those who work well with others 
  4. Make sure the role and growth opportunity is defined 
  5. Only hire those who give a shit
The good news is that with a dugout full of talented people they'll lean on each other and their own talents to find their path forward.  Of that much I am certain.

Those that know me well, know that I've had intentions on starting my own business for a long time.  Being an analytical person, I wanted to better understand the landscape.  By working for 4 startups in 10 years, I got to learn a lot.  As a developer I learned how to build and maintain software.  As a manager I exercised my people skills.  As an executive I've exercised my business acumen. Brad Feld talks about Wise's Talent Triangle in Startup Opportunities and I have 2 of the 3 covered: business acumen and operational experience.  So in essence, I think I have enough of a foundation to find success in the entrepreneurial game.  But unlike the rest of Brad's book I don't really meet the checklist of knowing when to quit your day job.  I'd give myself a 7/10.  But that's passing, right?

I'm sure there are many stories of how folks decided to quit their jobs.  It's not like I have this project on the side that's gaining traction and it's not like I have a co-founder who has wanted to start something.  Generally speaking, I'm at a pretty open stage with nary a plan.  I do have an idea that's nagged at me for some time.  It's in a space I'm passionate about, but have little experience in.  The technology approach is appealing to the nerd in me, but I'm still trying to discover the market opportunity.  At the very least, I'd end up with some compelling technology discovery and perhaps a way with which to leverage that into a product will become clear.  I need time.  Time to explore and create.   It's been too long since I've created things or worked on a team that's building something.  I'm most eager to get back into that seat, get more hands on with coding and data science.  That in and of itself will be a win.

So far, it's an emotional roller coaster.  Having been at the company for so long, part of my identity was wrapped up in it.  Combine that with the fact that I'm not racing this year, I'm a ship without a rudder.  But it's also a good opportunity for reinvention.  I went to Boulder Startup Week, and got some great advice from a few mentors.  I've also reached out to old colleagues and old mentors.  It's really encouraging to have a good support network in my community and from my past in general.  But in the end you need to distill all the information you get into a direction.  Lean Startup is all the craze these days, but it's no silver bullet.  You can't substitute methodology for good ole fashion critical thinking.  Time to get to work.

Mar 7, 2010

Nullable method parameters with FluorineFx

After upgrading to the latest version of FluorineFx, we noticed quite a few new exceptions: "Could not find a suitable method with name %". We checked the parameters, overloads, etc.  One thing was consistent, each had at least one nullable parameter.  We've already branched FluorineFx for datetime issues (TimezoneCompensation.None doesn't actually mean none, that's another post), so I took a crack at fixing this one as well.  I traced everything back to the bloated method TypeHelper.IsAssignable().  As best I can tell this tries to see if the method parameter in-hand can be assigned to the parameter type of the method.  At the heart of things, it's using the .NET TypeConverter, but it won't handle nullables.  You need to use the NullableConverter to get at the underlying type instead.  We added the following lines of code at line 693 of the TypeHelper file:

........
if (obj != null)
{
    if (isNullable)
    {
        NullableConverter nullableConverter = new NullableConverter(targetType);
        targetType = nullableConverter.UnderlyingType;
    }

    TypeConverter typeConverter = ReflectionUtils.GetTypeConverter(obj);//TypeDescriptor.GetConverter(obj);
........
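
To make the idea a little more concrete, here is a minimal standalone sketch of the same unwrapping step outside of FluorineFx. The class and method names (NullableUnwrapSketch, Unwrap) are made up for illustration and are not part of FluorineFx, and the isNullable check is my assumption about how that flag might be computed. NullableConverter, Nullable.GetUnderlyingType, and TypeDescriptor.GetConverter are standard .NET APIs.

```csharp
using System;
using System.ComponentModel;

public static class NullableUnwrapSketch
{
    // Returns the underlying type for Nullable<T> (for example int? -> int),
    // or the type itself when it is not nullable.
    public static Type Unwrap(Type targetType)
    {
        // Assumption: this is one way to compute the isNullable flag.
        bool isNullable = Nullable.GetUnderlyingType(targetType) != null;
        if (isNullable)
        {
            NullableConverter nullableConverter = new NullableConverter(targetType);
            targetType = nullableConverter.UnderlyingType;
        }
        return targetType;
    }

    public static void Main()
    {
        Type unwrapped = Unwrap(typeof(int?));
        // Once unwrapped, the ordinary converter for the underlying type applies.
        TypeConverter converter = TypeDescriptor.GetConverter(unwrapped);
        object value = converter.ConvertFromInvariantString("42");
        Console.WriteLine(unwrapped.FullName + " " + value);
    }
}
```

The guard matters because the NullableConverter constructor throws if it is handed a type that isn't a Nullable<T>.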

Why don't I just check this into trunk? Well, I tried to contact Zoltan, the main contributor to FluorineFx as far as I can tell, and he's completely unresponsive. Bummer. There are a few other nitpicks we'd really like to check in.

Feb 20, 2010

Custom Error Reporting with log4net

I recently started a new position where hunting for errors included logging into one of two active web servers, looking over a couple of directories that were logging via log4net, and also checking the Windows event log. Needless to say this was a PITA. I decided my first initiative was to try and improve the visibility into our application errors, to better understand our production issues. To compound the issue, we weren't getting context like server variables (browser, referring url, etc.) or the logged-in user, which can be very helpful in the discovery process and also for support. Typically I would try to use something like Elmah, because the less work the better, but there were a few snags. One, we are using a custom db session provider which helped to link the dying ASP pages to .NET. Two, we use Fluorine and NHibernate, and they do a lot of internal logging using log4net. Additionally, our existing app had log4net logging all over the place. So I decided to set out on a custom appender to consolidate. There were a few configurations I thought of, but I settled on inserting all errors into the database and using an admin interface to view, datamine, and manage our exceptions. The first thing I had to do was insert a Global.asax in all 8 of our applications to catch all unhandled exceptions. Each one had something like the following:

void Application_Error(object sender, EventArgs e) 
{ 
    // Code that runs when an unhandled error occurs 
    log4net.ILog log = log4net.LogManager.GetLogger("MyApp");
    if (log.IsErrorEnabled)
        log.Error("An uncaught exception occurred", this.Server.GetLastError());

}

void Application_Start(object sender, EventArgs e) 
{
    // Code that runs on application startup 
    log4net.Config.XmlConfigurator.Configure();
}

Next I wanted to find a decent database appender that wouldn't affect the performance of our app too much. Luckily I found Ayende's AsyncBulkInsertAppender which, as its name suggests, is async and queues up inserts to a configurable queue length. With some minor tweaks, I was able to get this to work with our app. I added some additional context to capture our user, pulled from a cookie on the current request, and I could also stuff server variables into a custom column I created. I started by overriding the Append method of the appender. Inside that method you can add custom context to the logging event.

protected override void Append (LoggingEvent loggingEvent)
{
    try
    {
        SetUrl(loggingEvent);             
    }
    catch (Exception ex)
    {
        ErrorHandler.Error("AsyncBulkInserterAppender ERROR", ex);
    }

    base.Append(loggingEvent);
}

protected virtual void SetUrl (LoggingEvent loggingEvent)
{
    if (IsInWebContext())
    {
        loggingEvent.Properties["url"] = HttpContext.Current.Request.Url.ToString();
    }
}

private bool IsInWebContext ()
{
    return HttpContext.Current != null;
}

Next I added the appender to a few configs and set them to log errors only. I found out while doing this that you can cascade configs within the same directory, even if they are in different app pools. So I simultaneously cleaned up a lot of our redundant web.configs during this process. One thing you'll need to know is how to add a custom column to your appender. Here is an example of the column I used to store the url.

<mapping>
    <column value="Url" />
    <layout type="log4net.Layout.PatternLayout">
        <conversionPattern value="%X{url}" />
    </layout>
</mapping>
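
As for the 'errors only' part, I didn't show that config above, so treat this as a rough sketch of one common way to do it in log4net rather than what we actually shipped. The appender name and the type attribute are placeholders. The threshold element is a standard setting on appenders that derive from log4net's AppenderSkeleton, which I'm assuming holds for the appender here since it overrides Append and uses ErrorHandler.

```xml
<log4net>
  <!-- "ErrorDbAppender" and the type attribute are placeholders, not real names -->
  <appender name="ErrorDbAppender" type="YourNameSpace.YourAppender, YourDll">
    <!-- drop anything below ERROR before it reaches this appender -->
    <threshold value="ERROR" />
  </appender>
  <root>
    <level value="ALL" />
    <appender-ref ref="ErrorDbAppender" />
  </root>
</log4net>
```

Putting the threshold on the appender rather than on the root logger leaves other appenders free to keep logging at their own levels.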

Everything was going well, and I was ready to build my interface. I tested each site by throwing an error and checking the log. Then I realized that SOAP exceptions from web services were outside the normal pipeline, and thus weren't caught within the global.asax. Shit. I did a little more googlejerking and hacked together the following:

public class SoapExceptionHander : SoapExtension
{
    public override void ProcessMessage (System.Web.Services.Protocols.SoapMessage message)
    {
        if (message.Stage == SoapMessageStage.AfterSerialize)
        {
            if (message.Exception != null)
            {
                log4net.ILog log = log4net.LogManager.GetLogger("WebService");
                if (log.IsErrorEnabled)
                    log.Error("An uncaught web service exception occurred", message.Exception);
            }
        }
    }
    
    public override object GetInitializer(Type serviceType) 
    { 
        return null; 
    } 

    public override object GetInitializer(LogicalMethodInfo methodInfo, SoapExtensionAttribute attribute) 
    { 
        return null; 
    } 

    public override void Initialize(object initializer){ } 
    
}

And added this to the web.config:

.......
    <webServices> 
      <soapExtensionTypes> 
        <add type="YourNameSpace.SoapExceptionHander,YourDll" priority="1" group="High"/> 
      </soapExtensionTypes> 
    </webServices>    
</system.web>

One thing you *need* to know is that you can't test this from the little test page that .NET creates. The best way to do this is to call the web service from a test page of your own, making sure the service is throwing an exception. Don't waste hours of your life trying to debug why your custom SoapExtension isn't working. Argggg.


So now I've got all errors from all applications logging into one place.  I built my interface, with a filter on just about everything.  I also added the ability to 'handle' exceptions as a means of managing errors that need attention.



Much better. Now we are depressed at the amount of log4net errors and warnings we see, but at least we can address them. :) Next on my list is the ability to maintain and push a branch of svn for 'hotfixes' so we can address these bugs in real time without rolling out code that isn't ready for primetime.