select * from Aggghhhhhh;
My efforts at being efficient were thwarted this evening by a rather annoying Castle ActiveRecord buu — err – behavior. I decided to juice these lemons into lemonade by turning this aggravation into a blog post, as maybe it will help someone else out there.
It all started when I added a new column to the Users table. This was an Int64 type mapped to the Postgres “bigint” data type. After getting this all setup and finishing the code I was working on, I copied the bits out to the Alpha server to be used by my small contingent of test volunteers. Right when I logged on to the production site, the main “page loading” web method call threw an exception which indicated a SQL error. The SQL log file informed me that my unique constraint on my new column was being violated. This was odd, since nothing I was doing should even update that column. What was more odd is that this behavior didn’t repro on my development database. So I decided to point my dev machine over to the production database and see what was going on.
The crash was one of the weirdest things I’ve seen. The following line of code, which is part of the user authentication mechanism, was the culprit:
int i = User.Count(“UserId = ? and SecurityGuid = ? and SessionGuid = ?”, credentials.UserId, credentials.SecurityGuid, credentials.SessionGuid);
This is of course odd because this statement should do nothing but count some rows in the database, there’s no way in a million years this should be updating any data! Plus, the line of code had already been run several times successfully. It gets weirder. After looking at the inner exception, which contained the root SQL text being run, I noticed the UPDATE command was attempting to update other users in the database! The code was not only modifying data in the database, it was attempting to modify other user accounts! Ok now this is annoying.
Well, clearly there were some pending updates in the buffer, and Count() was flushing the buffer to ensure an accurate count. But what was updating the Users table? After spending about 20 minutes stepping through each line of code, adding a plethora of strategic breakpoints, I had exhausted all my patience and was getting no closer to figuring out what was going on. This annoyance was compounded by the fact that my production server was now down. It was 2am and I wanted to go to bed!
At this point, I decided to DROP the unique constraint on the new column, which was of course the symptom of the error and not the cause. Out of any other ideas, I figured this would either completely corrupt my database or perhaps provide some insight on the nature of the problem. The latter was thankfully what occured.
When I dropped the constraint, as expected the crash went away. What was in the database was nothing less than enlightening. Several of the users had the new column set to 0. A ha! It was making sense now.
Stop reading here if you’re adventurous enough to hypothesize the nature of the problem. It took me a few minutes to understand what was going on.
The new bigint column is nullable (by design,) however the Int64 mapped property was not. It turns out, I was reading a set of recipes which have a relationship with the Users table (through the recipe owner). When my code read the user’s cookbook, some User objects were instantiated. The new column was set to 0, since Int64 is a value type. Since the database value was known to be null, the record was flagged as dirty. When the HQL was run, NHibernate decided it needed to update those dirty records and flush the buffer. The second user updated violated the unique constraint since both rows had a new value of 0. The reason this appeared on the production server and not my development database was that my production user account had recipes in the cookbook owned by other users.
The solution was of course change the Int64 to Int64? so that the null’ed value could be represented. This is just another tale where an hour of debugging results in a one character fix. If I had a nickel for all of those.