Public musings, often on software development RSS 2.0
# Thursday, April 17, 2008

This is one of those 'hey look at me' posts that always make me feel like... well if you recognize the title of the movie that the quote which is the title of this movie comes from - that pretty much sums it up.

Anyway a few 'ads'. 

First off, I have a new article available over at SQL Magazine.  It's a very introductory article to LINQ for SQL so if you are looking for a good starting point for just getting started with LINQ, here's a short article that might be of assistance: http://www.sqlmag.com/Article/ArticleID/98205/sql_server_98205.html

The second item fits the post a bit better.  Back in the first Quarter I signed on to do another book - yes my wife is ready to kill me - which since she is pregnant get's the pregnancy multiplier (we're currently around 7 or 8 so the danger level is getting pretty high).  At any rate if you are interested it's still way out in the future - like October 2008 - if "we" (me) make "our" (my) final due date - here is the page: http://www.amazon.com/o/ASIN/0470377313/105-1544171-6096430

As you can note on that page this next book is an Office Business Applications book.  Of note, it will have both C# and VB samples (ok VB sample - but more on that later) and covers using WPF with Outlook Form Regions and Excel not to mention server side document generation.  That's the good news - the bad news - well I'm late on my chapters - of course that's pretty typical for me - the question is can I catch up in the next few weeks - especially given the increasing pregnancy multiplier...

Finally, I thought I should mention that my last book is finally getting read to be available.  At 1600 pages it pretty much is a phone book, and it should ship for the first week of May which apparently is fast approaching: http://www.amazon.com/Professional-Visual-Basic-2008-Evjen/dp/0470191368/ref=sr_1_1

 

Thursday, April 17, 2008 5:27:25 PM (Pacific Daylight Time, UTC-07:00)  #    Comments [0] -
.NET | LINQ | SQL Server | Technology | Visual Basic | VSTO
# Sunday, February 24, 2008

The Heroes Happen Here (http://HeroesHappenHere.com) launch event for Visual Studio 2008 happens this week.  As with the Visual Studio 2005 launch event this is a shared launch that includes Windows Server 2008 and SQL Server 2008.   Visual Studio 2008 actually released back in November of 2007 and since then we've all had a chance to start building applications.  The new features in Visual Studio 2008 are very powerful across the board, but those associated with Visual Basic and LINQ happen to be particularly powerful.  I'm sure the the launch will do a great job of showing off the new features, after all it seems like just last week at the Office Developers Conference that we were showing off several of the features related to created solution based on the Office 2007 products. (There was a truly awesome demonstration of the built in debugging capability of Visual Studio 2008 with custom MOSS workflows during the ODC that I happen to know a little about.)

I don't want to steal any of Microsoft's thunder related to Wednesday's launch event in LA, but instead to look beyond the current release.  Not way into the future (ie. Hawaii - a future version (next?) of Visual Studio) but something much closer - the coming updates to Visual Studio 2008.  That's right we haven't even gotten to the launch and of course the focus is already shifting to the next set of features.  This makes more sense when you consider that SQL Server 2008 - which is part of this week's launch event isn't actually releaseing until sometime in the 3rd quarter of 2008.  Thus I'm going to point out some things which it certainly appears will be releasing for Visual Studio 2008 at the same time as SQL Server 2008 releases.

Why do I say that - well the first example is a set of updates to ADO.NET to support the new features of SQL Server 2008 - http://blogs.msdn.com/adonet/archive/2008/02/21/coming-soon-to-linq-to-sql.aspx Yes as part of the release of SQL Server 2008 ADO.NET will be getting updates to support the new data types which are going to be available.  The nice thing about that post is that it helps clarify those new SQL Server features which are most likely to be used by application developers right out of the gate.

On the other hand a single blog post really doesn't firm up that those enhancements have to arrive with SQL Server 2008 -  they might not arrive until say October or November or even later, rather it is the fact that we also have word on several other new Client features which are going to be releasing "this summer."  Scott Guthrie outlines several new enhancements to Visual Studio 2008's client model http://weblogs.asp.net/scottgu/archive/2008/02/19/net-3-5-client-product-roadmap.aspx 

I personally am very interested in some of the new deployment enhancements, but lets get real there is no way Microsoft is making more then one release of enhancements this summer so the combination of the two blog posts gives us a pretty good idea of when to expect the next set of developer tool updates (http://www.sqlmag.com/Article/ArticleID/98161/sql_server_98161.html).

It's nice to see that new tools and enhancements to Visual Studio 2008 are on the way even if as some at Microsoft admit the new tools are coming at a breakneck pace: http://blogs.msdn.com/rbarker/archive/2008/02/04/test-post-from-windows-live-writer.aspx  (btw, I'm planning to download the Live Writer SDK as soon as I get caught up and have some free time...)

Sunday, February 24, 2008 3:22:08 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0] -
.NET | SQL Server | Technology
# Thursday, July 27, 2006

Almost a year ago I posted my second note regarding GUIDs and Databases.  That post is available here: http://blogs.interknowlogy.com/billsheldon/archive/2005/07/29/369.aspx but in the time since that post SQL Server has seen a new release and now we have SQL Server 2005 available and the SQL Team has added another wrinkle to the discussion.

In talking to one of my coworkers (Tim McCarthy) the other day he mentioned that he had run into a new SQL Function which would increment GUIDs, which is the reason for this post.  Now to briefly recap my previous posts one of the biggest disadvantages to using GUIDs is that people tend to figure that since they should remain unique they will make a good primary key.  Unfortunately the nature of a Primary Key is that it by default relies on a clustered index.  For those unfamiliar with indexes, a clustered index is an index in which entries are written to the disk in their sorted order so that lookups against it are very fast.  As such the fastest way to add entries to a clustered index is with the use of incremental values so that the system appends new entries to the end of the index.  This reduces the number of page splits which are far more expensive and occur when you insert items into the middle of the index.

Unfortunately by default GUIDs are randomly generated which results in new entries constantly being inserted into the middle of the clustered index.  I knew one company which got a new implementation of their database from an offshore company where all the indexes had been changed to GUIDs and as a result their new database couldn't even support 10% of their customer base.  Their short term solution, outside of rearchitecting the work, was to create a middle tier function which would sequentially generate those GUIDs to cut down on page splits.  (Long term they reduced the number of tables using GUIDs and generally cleaned up a mess of other problems - the GUIDs alone weren't the only issue.) However, if they had had SQL 2005 available then instead of needing several days to write and a custom solution they could simply have implemented the new "NewSequentialID()" function.  NewSequentialID does just what the name implies it creates a new GUID which is one greater then the last GUID created on that server.  Note that it is server sequential and not table sequential, for each table the guarantee is only that each new entry will be larger then the last.

More information on this new function is available at: http://msdn2.microsoft.com/en-us/library/ms189786.aspx

As noted in that function there are a couple of issues - for example using this function on you user table makes it possible to guess the internal identity of the next user, so if you are using the randomness of GUIDs as part of a security scheme that prevents easy guessing of other user id's this function isn't for you.

Additionally it limits one of the other claims for GUIDs which is that you can use them to make data portable between databases.  Since randomly generated GUIDs are (almost always) unique even though generated at the same time on different machines it is possible to transfer all of the data associated with a user's GUID into another database with minimal risk of collision. 

Of course these were the only two functions that made using a GUID a useful idea.  So don't see this as a function which will suddenly allow you to use GUIDs without a performance penalty.  Fact is int and bigint columns with an identity property will still be faster for your overall database.  Instead this function is there for all of you who already have that GUID primary key and now need a quick and easy solution to reduce it's impact on your overall database performance.

Thursday, July 27, 2006 10:35:42 PM (Pacific Daylight Time, UTC-07:00)  #    Comments [0] -
.NET | SQL Server
# Tuesday, April 18, 2006

By default ADO.NET, including ADO.NET 2.0, uses Named Pipes to connect to SQL Server. (Including SQL Server 2005) However, in some environments where there are firewalls and other network appliances between say your ASP.NET web client or other Windows Form client and the database you might find that Named Pipes don't work.

If you need to use TCP/IP to connect to SQL Server instead of Named Pipes then the solution is to use the connection string parameter "Network Library".  You simply add "Network Library=dbmssocn" to your connection string.  This is defined in the MS KB article at: http://support.microsoft.com/default.aspx?scid=kb;en-us;Q315159 

Tuesday, April 18, 2006 2:38:47 PM (Pacific Daylight Time, UTC-07:00)  #    Comments [0] -
.NET | SQL Server | Technology
# Monday, January 23, 2006

Just a couple quick hits from this weekend's code camp.

Firstly, I'll have the materials from all three of my sessions posted on either Wed. or Thursday of this week.  I've got another presentation Tuesday night at the San Diego .NET User Group which I'm currently focusing on, but you are welcome to attend that meeting it's down in San Diego here's their website for more information: http://www.sandiegodotnet.com

Secondly after my session on Visual Studio 2005 and SQL Server 2005 I was asked a pair of questions.  The first question was "We are seeing tools put the T-SQL text of database statements into our source code.  Is this as good as using stored procedures?".  The answer to this question was easy.  Never use text based queries in your code as that leaves the possibility that you will be susceptible to SQL Injection.  There are several tools that will try to help ensure that this isn't possible by blocking special characters etc.  but the fact is the easiest solution which is also one of the most performance enhancing is to ALWAYS USE STORED PROCEDURES.

The second question they asked however caught me off-guard.  It was "We've seen that Microsoft tends to separate the Create, Update and Delete statements into separate stored procedures.  However, other tools have combined these with a simple flag to indicate which action should be taken.  Is there a specific reccomendation?".  Here my answer was that I couldn't think of anything specific and that although I always separated the items I couldn't think of a reason that mattered beyond style.  In this case the person asking was saying they liked to have a smaller number of stored procedures.... which has it's own disadvantage with multiple developers.   But anyway after the day was over and I was driving home the correct answer came to mind, which is that you should NOT COMBINE the CREATE, UPDATE, and DELETE statements in a SINGLE STORED PROCEDURE.

The reason is quite simple, security (again).  In this case it has to do with the fact that you of course are not using your 'SA' account for your website.  You should be using an account with limited permission (for a whole host of reasons).  So what's the big deal, well in most cases we design systems that don't allow a user to Delete an entry from one of our tables.  So if I have a stored procedure that creates my entries and I of course give anonymous or external customers access to that stored procedure then that's in the scope of what I expect a hacker might at somepoint compromise and I accept that risk.  However, if that stored procedure also contains the entry delete logic, then I have no way of limiting a user's permissions at the database level to prevent a hacker who might violate a small portion of my application's logic from doing significant damage to my application.  By separating out the Create, Update and Delete logic, I can create a security model which will prevent an anonymous hacker who get's unfettered access to my create logic from also being able to delete valid data from my database.

That's the short of what I consider to be a very good reason to not combine your Create, Update and Delete logic in a single stored procedure.

Monday, January 23, 2006 10:32:19 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0] -
.NET | SQL Server | Technology
# Friday, July 29, 2005

So I started documenting my observations on GUIDs and how they are used in databases in a previous entry and it’s time to get back to this topic.  The original entry which discusses what GUIDs are and the model they typically follow can be found here: http://nerdnotes.net/blog/PermaLink,guid,0ab98866-4d22-4874-bd1d-10ab6fa483b6.aspx

 

This entry is going to focus on why it’s bad to use a GUID as the primary key on a SQL Table.  As I noted in my previous entry, one of the characteristics of GUIDs that make them unique is that they are generated randomly.  On the other hand a primary key in SQL Server uses a clustered index.  A clustered index describes an index where each of the entries in the index is kept in sorted order.   Specifically the idea is that since entries are inserted into the index in sorted order, it is possible to very quickly find an entry without needing to scan the entire index.

 

Key to this discussion is an understanding of the implications of a clustered index.  As with any index in a database, it takes extra time to add an entry into an index.  The advantage of an index is that they speed up lookup on a table.  Sometimes, indexes are unique (such as the one for the primary key) and in some cases they are clustered.  Each of these requirements can slow the amount of time needed to add the latest entry to that index.  The result is that when we work with a table and we start making decisions on what should and shouldn’t be indexed we are looking to balance the cost of doing an insertion or update vs. the speed of doing a lookup.

 

Of the indices you place on your table, the one with the most direct impact on the speed of your database's query is the clustered index which is by default on the primary key.  It is possible to create a  primary key that doesn’t use a clustered index, but this is a rare event and not even considered by most DBAs and developers.  What's important however is the unique characteristic of a clustered index.  Clustered indices are sorted, that is each entry is physically ordered based on it’s sort order.  This is where the difference between an identity/integer based primary key and a GUID based key becomes vital.

 

The way that GUIDs are generated randomly means that unlike an identity column where the next value always sorts after the preceding value, GUIDs are random.  Thus when you need to add this new entry to your clustered index, instead of SQL Server appending the latest entry to the bottom of the index which is a quick and relatively painless operation, it is constantly attempting to insert your new entry into the middle of the index.  Doesn’t sound like a big deal, well it is because it triggers block splits and a lot of additional information.  Not only that but a 4 or 8 byte integer value takes up a lot less space then does a 64 byte GUID.

 

Yes, space, not something we normally worry about anymore, and from the standpoint of your database as a whole it’s not an issue.  It's an issue because of the fact that you are putting this value in an index.  The whole point of an index is that multiple entries can be quickly accessed because each entry in the index is small, but with GUID values aren't small.

 

So this accounts for two easy to verify technical reasons why GUIDs are a poor choice for use as the primary key on your table.  Other reasons include the fact that since SQL Server and .NET v1.1 have slightly different physical representations for GUIDs you are constantly converting between the formats, and the fact that identity values are much more human readable in a test environment as opposed to GUIDs.

 

But aside from the arguments against GUIDs what are the arguments for GUIDs.  Well the first is that since behind the scenes SQL Server applies a row identifier in the form of a GUID you are actually saving space by re-using that row GUID for your primary key.  Of course the fallacy here is that just because SQL Server has assigned a value behind the scenes to the row doesn’t mean you are using that value in your index.  As a result your indexing is still more expensive.

 

More importantly is a common security issue in web based applications.  HTML/CGI developers and their future counterparts in ASP and ASP.NET need a way to identify the user between HTTP requests.  The challenge here is that each HTTP request needs to carry an identifier for the current record in the database, so there is a huge temptation to use the primary key for that record.  The problem is that if you are using identity columns and are passing those integer values, it is very easy for a hacker to manipulate that integer value by adding or subtracting 1 and getting someone else’s record.  This is VERY bad.

 

So one way to resolve this was to use GUID values as the primary key.  This way a hacker couldn’t just shift a digit and get another record, of course they could start guessing at the correct ID for another user or record, but the odds of guessing correctly are incredibly small.  Unfortunately this solution still has a flaw, which is that once a hacker has that identity value even though it’s a GUID security is still compromised.  Fortunately there is a simple and much better solution for .NET developers.  The Session ID allows you to associate each request with data maintained on the server.

 

Session values are temporary and as a result resolve the security issues seen with passing identity columns across the web.  The only remaining reason (that I know of off the top of my head) for using GUIDs has to do with resolving differences across multiple independent databases.  This is another large topic, and so I’m going to put off that discussion until a future post, but suffice it to say for the current discussion that this remaining reason has nothing to do with transaction databases.   Unlike a database which is being used to hold records from multiple different unique sources your transaction database doesn’t need to be concerned with integrating entries from another source… but as I said that’s a topic for another post.

Friday, July 29, 2005 9:15:46 AM (Pacific Daylight Time, UTC-07:00)  #    Comments [0] -
SQL Server | Technology
# Sunday, June 12, 2005

Time for a more technical entry here in my blog.  As you can probably tell, I have a pretty strong belief that my blog, should not be a flat one-dimensional product.  After all it's important that we maintain interests outside of our job, so as I sit here watching a bit of OLN's Cyclism Sunday I figure I'll put in a quick technical entry before I go for my own Sunday ride down the coast.

In this case I want to talk about the use of GUID's.  First I'm going to talk about the characteristics of what a GUID is...  Historically of course Microsoft introduced the GUID structure around the same time that UUID's were introduced.  SO let's start with what's a UUID.  Good definitions are available here: http://www.dsps.net/uuid.html and here: http://www.opengroup.org/onlinepubs/9629399/apdxa.htm.  As you can see the UUID is defined as a Universally Unique Identifier and it is a 128 bit or 16byte value.  A value of this size is represented bya seres of hexadecimal (base 16) pairs.  This format is the same one used by Microsoft in the implementation of the GUID.  One of my favorite lines which I remember hearing from a Microsoft employee was that Microsoft had named their implementation of UUID as GUID, because they didn't consider a value of this size to be unique across the universe but hopefully something which would be valid at the global level. 

in part I'm going to allow Microsoft to give us a complete defintion available from MSDN here: http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpref/html/frlrfsystemguidclasstopic.asp  I happen to really like this definition from Microsoft because it clarifies something which people sometimes mistake: GUIDs are NOT guaranteed to be unique.  In fact let me quote the way that Microsoft phrases from this page: "Such an identifier has a very low probability of being duplicated."  This is an important statment because it recognizes the reality that GUIDs aren't guaranteed to be unique and shouldn't be treated as such in every instance.  The reason that a GUID can be treated as unique is that it uses a number range that is large enough to make random generation of the same values a low probability.  However, it is the use of random generation which also causes GUID's to repeat before the entire range of values can be used.

So what are the 'system' characteristics where a GUID makes sense?

Let me break the answer into a series of bulleted characteristics:

  • A decentralized model where several disconnected or different systems need to generate identifiers.
  • Identifiers will generally be unique. (meaning they might not always be unique, so uniqueness is not a true requirement)
  • it should be possible to easily change the identifier assigned to any given system with little or no impact to account for those instances where a duplicate GUID is created.  
  • the order that the identifiers are created and assigned should have no impact on behavior.

So what are some example systems where these characteristics fit well, well for starters the 'system' for which they were originally created, as identifiers for objects.  For developers using Microsoft technology GUIDs became famous during the progression of COM.  As an object is created a unique GUID is assigned to that object and when deployed those objects which may have been created by any software vendor should be unique.  Every now and then a collision does occur and sometimes those collisions even make it out into the world at large, but when that happens the next version of software can change out the GUIDs associated with that software with little or no system impact.  (Yes I've seen it in comercial products and I would prefer to not name the vendor involved, but in short they had an object which collided with a GUID used by a driver on certain PC's) Similarlly the value of the GUID that is assigned to an object does not change the performance of that object either during installation or at runtime.

Of course developers fell in love with this model and became focused on the word 'unique' in the title.  As a result, whenever a developer thinks they need a 'unique' value they immediately think of GUIDs.  However there are situations where a GUID is not a good solution, and in fact SQL Server presents a common one.  So in my next post I'm going to discuss why GUIDs should never be used as the unique identifier for rows in a transaction database.... I'll also discuss how a transaction database is different from a data warehouse and as a result why a GUID might work just fine for that solution.

Sunday, June 12, 2005 4:00:02 PM (Pacific Daylight Time, UTC-07:00)  #    Comments [0] -
Technology | SQL Server
Archive
<March 2010>
SunMonTueWedThuFriSat
28123456
78910111213
14151617181920
21222324252627
28293031123
45678910
About the author/Disclaimer

Disclaimer
The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.

© Copyright 2010
Bill Sheldon
Sign In
All Content © 2010, Bill Sheldon