Public musings, often on software development RSS 2.0
# Friday, July 29, 2005

As you can tell I'm using this morning to really catch up on my blog posts.  I've been meaning to update them but things were busy and then I was sent on this assignment called 'vacation'.   Turns out I really enjoyed that assignment... I need to post a couple picks of my wonderful neice and nephew, but I'm all ready for a repeat. 

But for now I just wanted to send out a quick reference to a new technical book regarding Visual Studio Tools for Office(VSTO).  The book is being published in September, but I was fortunate enough to get an early look because InterKnowlogy has been doing alot of VSTO work lately and Tim Huckaby (our CEO) spoke at Tech Ed on this topic.  So after Tim took a look at the book so he could say what he thought of it he passed it to me and I put it to use.  The book covers not only VSTO but integration with Microsoft Office 2003 from .NET 2.0.  It talks about how to call office from your application and just as importantly how too build really powerfull VSTO apps.  More importantly it doesn't draw a line in the sand with Word and Excel, this book goes into customizing Outlook and Infopath.  This book covers stuff that some of our IK developers have had to learn the hard way.

The book will be released mid september but there is Addison-Wesley and a portion of Chapter 1 is posted here: http://www.awprofessional.com/title/0321334884

The version of Chapter One I previewed is noticably longer and the book comes with plenty of code samples.  Although it's written in C# instead of my preffered language of Visual Basic I definitely recommend it for anyone who is looking to leverage the power of Office 2003 in their custom application.  Converting samples from C# to VB just helps you really understand the code.

Friday, July 29, 2005 9:41:56 AM (Pacific Daylight Time, UTC-07:00)  #    Comments [0] -
Technology | VSTO

So I started documenting my observations on GUIDs and how they are used in databases in a previous entry and it’s time to get back to this topic.  The original entry which discusses what GUIDs are and the model they typically follow can be found here: http://nerdnotes.net/blog/PermaLink,guid,0ab98866-4d22-4874-bd1d-10ab6fa483b6.aspx

 

This entry is going to focus on why it’s bad to use a GUID as the primary key on a SQL Table.  As I noted in my previous entry, one of the characteristics of GUIDs that make them unique is that they are generated randomly.  On the other hand a primary key in SQL Server uses a clustered index.  A clustered index describes an index where each of the entries in the index is kept in sorted order.   Specifically the idea is that since entries are inserted into the index in sorted order, it is possible to very quickly find an entry without needing to scan the entire index.

 

Key to this discussion is an understanding of the implications of a clustered index.  As with any index in a database, it takes extra time to add an entry into an index.  The advantage of an index is that they speed up lookup on a table.  Sometimes, indexes are unique (such as the one for the primary key) and in some cases they are clustered.  Each of these requirements can slow the amount of time needed to add the latest entry to that index.  The result is that when we work with a table and we start making decisions on what should and shouldn’t be indexed we are looking to balance the cost of doing an insertion or update vs. the speed of doing a lookup.

 

Of the indices you place on your table, the one with the most direct impact on the speed of your database's query is the clustered index which is by default on the primary key.  It is possible to create a  primary key that doesn’t use a clustered index, but this is a rare event and not even considered by most DBAs and developers.  What's important however is the unique characteristic of a clustered index.  Clustered indices are sorted, that is each entry is physically ordered based on it’s sort order.  This is where the difference between an identity/integer based primary key and a GUID based key becomes vital.

 

The way that GUIDs are generated randomly means that unlike an identity column where the next value always sorts after the preceding value, GUIDs are random.  Thus when you need to add this new entry to your clustered index, instead of SQL Server appending the latest entry to the bottom of the index which is a quick and relatively painless operation, it is constantly attempting to insert your new entry into the middle of the index.  Doesn’t sound like a big deal, well it is because it triggers block splits and a lot of additional information.  Not only that but a 4 or 8 byte integer value takes up a lot less space then does a 64 byte GUID.

 

Yes, space, not something we normally worry about anymore, and from the standpoint of your database as a whole it’s not an issue.  It's an issue because of the fact that you are putting this value in an index.  The whole point of an index is that multiple entries can be quickly accessed because each entry in the index is small, but with GUID values aren't small.

 

So this accounts for two easy to verify technical reasons why GUIDs are a poor choice for use as the primary key on your table.  Other reasons include the fact that since SQL Server and .NET v1.1 have slightly different physical representations for GUIDs you are constantly converting between the formats, and the fact that identity values are much more human readable in a test environment as opposed to GUIDs.

 

But aside from the arguments against GUIDs what are the arguments for GUIDs.  Well the first is that since behind the scenes SQL Server applies a row identifier in the form of a GUID you are actually saving space by re-using that row GUID for your primary key.  Of course the fallacy here is that just because SQL Server has assigned a value behind the scenes to the row doesn’t mean you are using that value in your index.  As a result your indexing is still more expensive.

 

More importantly is a common security issue in web based applications.  HTML/CGI developers and their future counterparts in ASP and ASP.NET need a way to identify the user between HTTP requests.  The challenge here is that each HTTP request needs to carry an identifier for the current record in the database, so there is a huge temptation to use the primary key for that record.  The problem is that if you are using identity columns and are passing those integer values, it is very easy for a hacker to manipulate that integer value by adding or subtracting 1 and getting someone else’s record.  This is VERY bad.

 

So one way to resolve this was to use GUID values as the primary key.  This way a hacker couldn’t just shift a digit and get another record, of course they could start guessing at the correct ID for another user or record, but the odds of guessing correctly are incredibly small.  Unfortunately this solution still has a flaw, which is that once a hacker has that identity value even though it’s a GUID security is still compromised.  Fortunately there is a simple and much better solution for .NET developers.  The Session ID allows you to associate each request with data maintained on the server.

 

Session values are temporary and as a result resolve the security issues seen with passing identity columns across the web.  The only remaining reason (that I know of off the top of my head) for using GUIDs has to do with resolving differences across multiple independent databases.  This is another large topic, and so I’m going to put off that discussion until a future post, but suffice it to say for the current discussion that this remaining reason has nothing to do with transaction databases.   Unlike a database which is being used to hold records from multiple different unique sources your transaction database doesn’t need to be concerned with integrating entries from another source… but as I said that’s a topic for another post.

Friday, July 29, 2005 9:15:46 AM (Pacific Daylight Time, UTC-07:00)  #    Comments [0] -
SQL Server | Technology

Those that know me are aware that while I ride my road bike near my home on the coast I regularly get up to the Fawnskin/Big Bear area to ride my Mountain bike.  For those who aren’t familiar with the area Fawnskin lies on the North Shore of Big Bear Lake. Unlike my roadie which I ride for miles, my Mtn bike is a pure adventure tool.  So I don’t count the miles I ride on it toward my roadie goals whether I’m doing about an hour with my coworkers in the Flightline area or out for an adventure up in the San Bernardino National Forest.

 

Note with my coworkers the normal ride is about 5 miles and we do it in around 50-75 minutes.  It’s a loop with some steep but by definition relatively short climbs.  The ride is all single track and rather technical.  On the other hand up in Big Bear the rides tend to be long climbs (1-3 miles generally ascending) for literally a thousand or more feet and then the accompanying downhill.  A good example is my original ride up Polique Canyon Rd. to 2N71 across to 3N12 down to 3N14 and into Fawnskin.  The ride is about 10 miles and the first 5 miles climb from the lake elevation of 6900 feet to an elevation of ~8000 feet, followed by an equal down hill back to town.

 

This past 4th of July holiday weekend I decided to explore a new route.  I’ve been taking a more challenging route that takes me up Polique Canyon (2N09) and then across Holcomb Valley (3N16) to 3N12 which brings me back to 3N14 just south of the YMCA camp so I descend down into Fawnskin.  This has a massive 2.5 mile climb up 2N09 followed by a very tough climb up 3N12 (before it descends).  Overall the goal, since these roads aren’t very technical (rocks and difficult handling) is to ‘keep it on the middle ring’ and when I’m fresh I can do just that.  So this weekend it was time for a new challenge.

 

I had already gone to the east on 3N16 out toward Baldwin Lake, so I wanted to go west.  What I saw on my map indicated that I could continue a short distance past 3N12 and hit 3N08 and that it would take me around to 3N14 down below the Hanna Flat campground area.  So this was my new route.  The first 4 miles took me to the top of 2N09.  The next 4 miles took me down into Holcomb Valley, across to 3N08 and started me along 3N08.  These 4 miles were mostly rolling hills and some sand, nothing too difficult.  Then I got into 3N08.

 

3N08 is the first fire road that I would classify as 'difficult'.  Now over the course of 3N08 (heading West) you are loosing elevation, but it isn't a downhill ride.  In fact there is a lot of climbing and a lot of it up steep inclines.  Areas that definitely call for the granny (little) front chain ring and your lowest gears.  The down hills are very steep and ugly with lots of rocks.  This is a tough downhill and I know I’m not ready to head up this road yet.  

 

A word of warning however, the first time I road this I started to doubt my map.  After all fire roads are exactly well documented and the map I use sometimes has minor errors.  As you ride along you have a beautiful view of a river valley and on the other side are the mountains between you and Fawnskin/Big Bear.  Now of course along the way my thought was "uh-oh I'm on the wrong mountains! When do I go across (there aren’t many suspension bridges in the forest)?  Where am I really going?".   My concern was if the map was wrong I was headed to Green Valley Lake and was going to be calling home for a ride back.   Of course, by the time it became an issue I was past what I considered to be the point of no return so I continued forward.  The net is the next 4 miles down 3N08 go on forever and you are literally out in the middle of nowhere, but you can’t miss 3N14, because this part of 3N08 dead-ends into this much larger fire road.  At the same time 3N08 is very technical to the point that there is one downhill section that I walk because of the jutting rocks that could cause major bodily injury.

 

Once to the 'bottom' of 3N08 you hit 3N14 right in the river valley.  This gives you a long 2 mile climb up 3N14 (a more well maintained/traveled fire road) past the Hanna Flat campgrounds and the YMCA Camp Whittle and down into Fawnskin.  The views off to the north-west while climbing 3N14 are great and I'm pretty certain that was Lake Arrowhead I could see off in the distance.  All told you are looking at about 17.5 miles depending on where you start.  It was a very challenging ride and one I repeated a second time so I could be more comfortable on it in the future, of course the future is tomorrow at this point... so I'll be out on the roads again on Saturday morning.

Friday, July 29, 2005 9:13:13 AM (Pacific Daylight Time, UTC-07:00)  #    Comments [0] -
Cycling
# Wednesday, July 20, 2005

After logging in, be sure to visit all the options under Configuration in the Admin Menu Bar above. There are 26 themes to choose from, and you can also create your own.

 

Wednesday, July 20, 2005 12:00:00 AM (Pacific Daylight Time, UTC-07:00)  #    Comments [0] -
dasBlog
# Wednesday, June 15, 2005
One of those things that happens when you ride a consistent route is it's nice to get a feel for your best time.  Of course the trick is remembering your previous best time. Today Tim tracked our time out on the standard Flightline route that he, Adam and I have been riding since they started the environmental destruction to put in a couple new roads through our local mtn. bike area.  Today's time was 50 minutes 42 seconds.  So next time we need a reference we can get one.... we can do better... it's only 5 miles.
Wednesday, June 15, 2005 9:11:35 PM (Pacific Daylight Time, UTC-07:00)  #    Comments [0] -
Cycling
# Sunday, June 12, 2005

Time for a more technical entry here in my blog.  As you can probably tell, I have a pretty strong belief that my blog, should not be a flat one-dimensional product.  After all it's important that we maintain interests outside of our job, so as I sit here watching a bit of OLN's Cyclism Sunday I figure I'll put in a quick technical entry before I go for my own Sunday ride down the coast.

In this case I want to talk about the use of GUID's.  First I'm going to talk about the characteristics of what a GUID is...  Historically of course Microsoft introduced the GUID structure around the same time that UUID's were introduced.  SO let's start with what's a UUID.  Good definitions are available here: http://www.dsps.net/uuid.html and here: http://www.opengroup.org/onlinepubs/9629399/apdxa.htm.  As you can see the UUID is defined as a Universally Unique Identifier and it is a 128 bit or 16byte value.  A value of this size is represented bya seres of hexadecimal (base 16) pairs.  This format is the same one used by Microsoft in the implementation of the GUID.  One of my favorite lines which I remember hearing from a Microsoft employee was that Microsoft had named their implementation of UUID as GUID, because they didn't consider a value of this size to be unique across the universe but hopefully something which would be valid at the global level. 

in part I'm going to allow Microsoft to give us a complete defintion available from MSDN here: http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpref/html/frlrfsystemguidclasstopic.asp  I happen to really like this definition from Microsoft because it clarifies something which people sometimes mistake: GUIDs are NOT guaranteed to be unique.  In fact let me quote the way that Microsoft phrases from this page: "Such an identifier has a very low probability of being duplicated."  This is an important statment because it recognizes the reality that GUIDs aren't guaranteed to be unique and shouldn't be treated as such in every instance.  The reason that a GUID can be treated as unique is that it uses a number range that is large enough to make random generation of the same values a low probability.  However, it is the use of random generation which also causes GUID's to repeat before the entire range of values can be used.

So what are the 'system' characteristics where a GUID makes sense?

Let me break the answer into a series of bulleted characteristics:

  • A decentralized model where several disconnected or different systems need to generate identifiers.
  • Identifiers will generally be unique. (meaning they might not always be unique, so uniqueness is not a true requirement)
  • it should be possible to easily change the identifier assigned to any given system with little or no impact to account for those instances where a duplicate GUID is created.  
  • the order that the identifiers are created and assigned should have no impact on behavior.

So what are some example systems where these characteristics fit well, well for starters the 'system' for which they were originally created, as identifiers for objects.  For developers using Microsoft technology GUIDs became famous during the progression of COM.  As an object is created a unique GUID is assigned to that object and when deployed those objects which may have been created by any software vendor should be unique.  Every now and then a collision does occur and sometimes those collisions even make it out into the world at large, but when that happens the next version of software can change out the GUIDs associated with that software with little or no system impact.  (Yes I've seen it in comercial products and I would prefer to not name the vendor involved, but in short they had an object which collided with a GUID used by a driver on certain PC's) Similarlly the value of the GUID that is assigned to an object does not change the performance of that object either during installation or at runtime.

Of course developers fell in love with this model and became focused on the word 'unique' in the title.  As a result, whenever a developer thinks they need a 'unique' value they immediately think of GUIDs.  However there are situations where a GUID is not a good solution, and in fact SQL Server presents a common one.  So in my next post I'm going to discuss why GUIDs should never be used as the unique identifier for rows in a transaction database.... I'll also discuss how a transaction database is different from a data warehouse and as a result why a GUID might work just fine for that solution.

Sunday, June 12, 2005 4:00:02 PM (Pacific Daylight Time, UTC-07:00)  #    Comments [0] -
Technology | SQL Server

I've blogged in the past on making apheresis donations to the San Diego Blod Bank.  In that post I discussed how the blood bank tries to ensure you always make a two unit donation of some form.  Since that post I had a couple updates.  The first is that I'm regularly finding myself as a double platlet donor.  This means I can potentially donate every two weeks, but more importantly it means that about every other donation (3 donations out of 4) I actually am a triple unit donor.  That's pretty cool, but I've also learned some more details of the advantages of this method of donation.

The first things I learned was that while I knew that platlets are used to help cancer patients and red cells are used for  anemia, blood during surgery etc, but I wondered what plasma was used for?  Turns out one of the primary uses of Plasma is for the treatment of severe burns.  Burn victims need additional blood volume because their bodies are essentially taking that liquid portion of their blood and using it to help rebuild damaged skin etc.  Since plasma makes up the majority of your actual blood volume having transfusions of it on hand can allow your boddy to heal faster.  I also learned that Plasma was one of the longest lasting blood products in terms of how long the blood bank can store it.

More importantly however is the second thing I learned.  While I'm type 0 which makes my red blood cells very valuable, Plasma works in the opposite fashion.  Thus people who are type AB whose blood type is fairly rare can make universal Plasma donations.  These plasma donations are then available to people of any blood type.  So remember regardless of your blood type, the local blood bank (whether the American Red Cross or San Diego Blood Bank here in San Diego) is pretty much in constant need of a donation of your blood.  After all ~95% of people will habe a blood donation at some point in their life, even though only ~5% of the population actually donates.

Sunday, June 12, 2005 2:22:31 PM (Pacific Daylight Time, UTC-07:00)  #    Comments [0] -
About the Nerd
# Friday, June 10, 2005

I mentioned it's been awhile since I did a bike update.  Three log hits for this entry. 

First is my current mileage for the year, my odometer reads 4926, or just over 1075 for the year thus far.  Thus I'm getting the month of June to get ahead for the coming 6 months and to put me on track to possibly even reach all the way to 6,000 on my odometer by the end of the year.  A good portion of this was done over my vacation when I took my bike and every other morning of my vacation I got in a 52 mile ride from Laughlin, NV to Needles, CA on River Rd. which lies West of the Colorado River, then returning up Arizona Rt. 95 through Bullhead.  The 'fun' part of this ride, is when I leave at 7AM the light winds are from the South and in my face, but as the sun starts to heat the environment the winds shift and they come from the North so that by the time I am headed through Bullhead I am facing a steady 10-20mph wind from the North.  Its a good training ride, although portions of the road could really use some maintenance.

Second regards the Tour de Cure in East San Diego County.  The ride was the 21st of May.  The Tour de Cure is the American Diabetes association annual fund raising ride.  The ride is held around the country in different locations and the Spring, and this year only ride in San Diego (last year I helped organize a 100+ mile ride over Mt. Palomar, but we not holding that ride this year).  At any rate the 62.5 miles includes over 6,000 feet in elevation gain and generally draws between about 250-350 people.  We departed at 7:00AM and headed up into the mountains of East San Diego county.  Unfortunately it was a hot day with the temperature for the day hitting well over 90 degrees farenheit.  I was 5 minutes late starting so I got to ride without the benefit of a group for almost the entire day.

After the first 10 miles the route starts heading up and the next 22 miles (on the metric century course) are predominently uphill as your elevation goes from ~1000 to ~5000 ft. above sea level.  It took me from 7AM till 10AM to reach this halfway point and by the time I got there my left foot was numb... never a good sign so I spent a little time at the halfway point in the shade and watched one or two people drop out of the ride.  Then I started the return, fortunately it is mostly downhill (along a different route) but unfortunately it does have some good uphill sections.  In the end it took me till Noon to finish the ride and I did the 62.5 miles with an average speed of 13.9mph.  All told it was a great ride and we raised quite a bit of money for the ADA.

Finally, I'm going for a ride tomorrow with the NCCC again.  It's been like 5 weeks since I last rode with them and we're doing what is now known as Route 58.  Jeff Gross who lays out the Saturday rides has started posting them on his wife's website, so the route for tomorrow is here: http://www.kimberlygross.com/PageManager/Default.aspx?PageID=932815&NF=1.  The nice thing with Jeff's new system is he includes a map so you can get a feel of where the ride should go if you are unfamilar with it.  In this case he recently added a new portion to the end of the ride so instead of it being a 34 mile ride it's a 38 mile ride.  I put it into my topo software and came up with the following elevation profile which I'm including to test embedded graphics in my cross posting.  The total elevation gain for the ride is about 2500ft, and it should be a good ride.

Of course I've skipped over quite a bit in this quick update.  Let's see I've been managing to keep up a minimum of once a week to work on my bike and several mtn bike rides both with co-workers in Carlsbad and on my own up in the Mountains of Big Bear, including the use of the Sky Chair at Snow Summit to explore some of the Southern side of Big Bear Lake.  I'll try to get organized enough to post one or two pictures from up in the mountains this weekend.

Friday, June 10, 2005 11:29:54 PM (Pacific Daylight Time, UTC-07:00)  #    Comments [0] -

# Thursday, June 09, 2005

Many of you may be familiar with the Yellow wristbands supported by Lance Armstrong's live strong foundation to support the fight against cancer.  Some of you who know me may have noticed that I've transferred my yellow band to a red band... not that I'm particularly consistent on wearing any wristband, and nothing against cancer research. I have several family members (both living and deceased) touched by it. 

The red wrist bands support Diabetes and are available from the American Diabetes Association.  As most people know my wife Tracie and I are very involved in the fight against Diabetes (since I have Type 2 Diabetes).  You can order the wrist bands online from the ADA at: http://store.diabetes.org/products/product_category.jsp?FOLDER%3C%3Efolder_id=2534374302024093&bmUID=1123024922007   (Alternatively you can contact your local ADA office who can help you obtain a wrist band.)

They contain the ADA's message for Care-Cure-Committment since unlike many organizations the ADA isn't just out to find a cure, but also to ensure that the 18million or so Americans who have Diabetes (many similar to high blood pressure don't know it) learn to care for themselves.  The ADA plays an important role in Diabetes education especially in those communities such as the Hispanic and Native American populations where incidence of this genetic disease is much higher.

Thursday, June 09, 2005 1:07:49 PM (Pacific Daylight Time, UTC-07:00)  #    Comments [0] -
Diabetes
Look at Byetta and my historical numbers...
Thursday, June 09, 2005 12:56:05 PM (Pacific Daylight Time, UTC-07:00)  #    Comments [0] -
Diabetes
Archive
<July 2005>
SunMonTueWedThuFriSat
262728293012
3456789
10111213141516
17181920212223
24252627282930
31123456
About the author/Disclaimer

Disclaimer
The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.

© Copyright 2010
Bill Sheldon
Sign In
All Content © 2010, Bill Sheldon