Public musings, often on software development
# Sunday, June 12, 2005

Time for a more technical entry here in my blog.  As you can probably tell, I have a pretty strong belief that my blog should not be a flat, one-dimensional product.  After all, it's important that we maintain interests outside of our jobs, so as I sit here watching a bit of OLN's Cyclism Sunday I figure I'll put in a quick technical entry before I go for my own Sunday ride down the coast.

In this case I want to talk about the use of GUIDs.  First I'm going to talk about the characteristics of a GUID.  Historically, of course, Microsoft introduced the GUID structure around the same time that UUIDs were introduced, so let's start with what a UUID is.  Good definitions are available here: http://www.dsps.net/uuid.html and here: http://www.opengroup.org/onlinepubs/9629399/apdxa.htm.  As you can see, a UUID is defined as a Universally Unique Identifier and it is a 128-bit (16-byte) value.  A value of this size is usually written as a series of hexadecimal (base 16) pairs, 32 hex digits in all.  This format is the same one used by Microsoft in the implementation of the GUID.  One of my favorite lines, which I remember hearing from a Microsoft employee, was that Microsoft had named their implementation of UUID as GUID because they didn't consider a value of this size to be unique across the universe, but hopefully something which would be valid at the global level.
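To make the size and format concrete, here is a minimal sketch using Python's standard `uuid` module (my choice for illustration, not something the definitions above depend on). It generates a random UUID and shows the same 128-bit value as raw bytes, as 32 hex digits, and in the canonical dashed form; the brace-wrapped, upper-case variant at the end is only an approximation of how Microsoft tools commonly display GUIDs.

```python
import uuid

# Generate a random (version 4) UUID with the standard library.
u = uuid.uuid4()

raw = u.bytes        # the underlying 16-byte value
hex_digits = u.hex   # 32 hexadecimal digits, no separators
text = str(u)        # canonical 8-4-4-4-12 dashed form

print(len(raw) * 8, "bits")
print(hex_digits)
print(text)

# Assumption for illustration: the brace-wrapped, upper-case display style
# often seen in Microsoft tooling. It is the same value, formatted differently.
registry_style = "{" + text.upper() + "}"
print(registry_style)
```

Whichever form you look at, it is the same 16 bytes underneath; only the presentation changes.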

In part I'm going to let Microsoft give us a complete definition, available from MSDN here: http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpref/html/frlrfsystemguidclasstopic.asp  I happen to really like this definition from Microsoft because it clarifies something which people sometimes get wrong: GUIDs are NOT guaranteed to be unique.  In fact, let me quote the way Microsoft phrases it on this page: "Such an identifier has a very low probability of being duplicated."  This is an important statement because it recognizes the reality that GUIDs aren't guaranteed to be unique and shouldn't be treated as such in every instance.  The reason that a GUID can usually be treated as unique is that it draws from a number range large enough that randomly generating the same value twice has a low probability.  However, it is that same random generation which means a GUID can repeat long before the entire range of values has been used.
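To put a rough number on "very low probability", here is a small sketch using the standard birthday approximation. It assumes purely random identifiers with 122 random bits (as in a version 4 UUID); that is my assumption for illustration, not a statement about how any particular Microsoft GUID generator works, and the function name is hypothetical.

```python
import math

# Assumption: version 4 style identifiers, where 122 of the 128 bits are random.
RANDOM_BITS = 122


def collision_probability(count, bits=RANDOM_BITS):
    """Approximate chance that `count` random values contain a duplicate."""
    space = 2 ** bits
    # Birthday approximation: p ~= 1 - exp(-n(n-1) / 2N).
    exponent = count * (count - 1) / (2 * space)
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(-exponent)


for count in (10**6, 10**9, 10**12, 2**61):
    print(f"{count:>22,d} values -> p = {collision_probability(count):.3e}")
```

The point of the last row is the one made above: a duplicate becomes likely after roughly 2^61 values, which is nowhere near the 2^122 values in the range.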

So what are the 'system' characteristics where a GUID makes sense?

Let me break the answer into a series of bulleted characteristics:

  • A decentralized model where several disconnected or different systems need to generate identifiers.
  • Identifiers will generally be unique (meaning they might not always be unique, so strict uniqueness is not a true requirement).
  • It should be possible to easily change the identifier assigned to any given system, with little or no impact, to account for those instances where a duplicate GUID is created.
  • The order in which the identifiers are created and assigned should have no impact on behavior.
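As a rough illustration of these characteristics, the sketch below merges records from two disconnected systems that each generated their own identifiers. Everything here (`merge_records`, the dictionaries, the record values) is hypothetical and exists only for the example. If an identifier happens to clash, the incoming record is simply given a fresh one, which only works because nothing depends on the specific value or on the order in which identifiers were created.

```python
import uuid


def merge_records(local, remote, new_id=uuid.uuid4):
    """Merge two {identifier: record} dicts, re-keying remote clashes.

    Hypothetical helper: `new_id` is any callable returning a fresh identifier.
    """
    merged = dict(local)
    for ident, record in remote.items():
        # A duplicate is rare but possible; swap in a new identifier.
        while ident in merged:
            ident = new_id()
        merged[ident] = record
    return merged


# Two systems that never talked to each other; force a clash to show the swap.
shared = uuid.uuid4()
system_a = {shared: "order from system A"}
system_b = {shared: "order from system B", uuid.uuid4(): "another order"}

combined = merge_records(system_a, system_b)
print(len(combined), "records after merge")
```

If re-keying a record like this would break something in your system, that is a sign the characteristics above do not hold for it.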

So what are some example systems where these characteristics fit well?  Well, for starters, the 'system' for which they were originally created: identifiers for objects.  For developers using Microsoft technology, GUIDs became famous during the progression of COM.  As an object is created a GUID is assigned to it, and when deployed those objects, which may have been created by any software vendor, should each be unique.  Every now and then a collision does occur, and sometimes those collisions even make it out into the world at large, but when that happens the next version of the software can change out the GUIDs associated with it with little or no system impact.  (Yes, I've seen it in commercial products, and I would prefer not to name the vendor involved, but in short they had an object which collided with a GUID used by a driver on certain PCs.)  Similarly, the value of the GUID that is assigned to an object does not change the performance of that object, either during installation or at runtime.

Of course, developers fell in love with this model and became focused on the word 'unique' in the name.  As a result, whenever a developer thinks they need a 'unique' value they immediately think of GUIDs.  However, there are situations where a GUID is not a good solution, and in fact SQL Server presents a common one.  So in my next post I'm going to discuss why GUIDs should never be used as the unique identifier for rows in a transaction database.  I'll also discuss how a transaction database is different from a data warehouse and, as a result, why a GUID might work just fine for that solution.

Sunday, June 12, 2005 4:00:02 PM (Pacific Daylight Time, UTC-07:00)
Technology | SQL Server
Disclaimer
The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.

© Copyright 2010
Bill Sheldon