The data returned is completely different - not through chance, but by presentation of a representative sample. How about taking a sample at regular intervals throughout the table? The above is clearly false, so we need a better way of sampling. Taking this result set, a person uninformed about the nature of the table might conclude all their customers live in Bothell. To guarantee the order of rows returned from a heap, you must use the ORDER BY clause." Note this is NOT guaranteedīehavior for heap data in particular - see this quote from BOL: "Usually data is initially stored in the order in which is the rows are inserted into the table, but the Database Engine can move data around in the heap to store the rows efficiently so the data order cannot be predicted. This is because the results were specified to be returned in no particular order, but were in fact returned in order of the AddressID column. Going from this sample alone, we can see that all the people returned live in Bothell, and share zip code 98011. Selecting the top 10 rows of data yields this result (just to give you an idea of the shape of the data).Īs an aside, this is a general piece of code I created to generate random-ishĭata whenever I needed it - feel free to take it and augment/pillage it to your I will also be using the AdventureWorks 2012 (non-DW) database, available here: It should take only a couple of minutes to run and is tested on SQL Server 2012 Developer Edition. You can use the T-SQL code below to set this up. Setupįor this tip, I will be using a data set containing an identity INT column (to establish the degree of randomness when selecting rows) and other columns filled with pseudo-random data of different data types, to (vaguely) simulate real data in a table. Tip, you should have an appreciation of the benefits of sampling over using methods like TOP N and know how to apply at least one method to achieve this in SQL Server. We'll talk a little about statistical sampling in general (the differences between random, systematic and stratified) with examples, and we'll take a look at how SQL statistics are sampled as a case in point, and the options we can use to override this sampling. It will also show you an alternative method - a mathematical method using NEWID() coupled with CHECKSUM and a bitwise operator, noted by Microsoft in the TABLESAMPLE TechNet article. This tip will show you how to use TABLESAMPLE in T-SQL to retrieve pseudo-random data samples, and talk through the internals of TABLESAMPLE and where it isn't appropriate.
0 Comments
Leave a Reply. |