From time to time I find some SQL that really makes me smile and sit back and stare at it with awe. Today is one of those days, let me show you…
Many of the maintenance windows in Dynamics GP ask for ranges of numbers to be entered. They consist of a start number and then an end number, start and end points for which you would like to execute a operation over.
It is a common thing in GP is to use a GP macro mail merge to automate the user interface where a repetitive operations are required. If a macro mail merge is being used for one of these windows, then it requires the start and end numbers over which we would like to perform the operation.
Let move to something more tangible and the real world example I’m working on. However don’t get tied down by my example, there are many other range windows in GP that this principle would work on too.
I want to surgically remove 901,872 historical sales documents from GP, using a macro. I am using a macro so that we are safe, the GP biz logic gets applied by the UI. I could use SQL but always best to try macro first so to benefit from the safety the UI gives us. This could take some time, supplying the document numbers and one by one deleting them.
Master numbers are a number id that ties together a string of sales documents that are related, it is held against all sales documents. For efficiency we notice that we can remove documents by master number, so all related documents, quote->order->invoice->return will be deleted at the same time. This would mean looping through the remove operation of window much less, as one master number removes many documents.
This brings us to 446,741 unique master numbers that need removing from 901,872 sales documents (yes I’ve checked and only three master numbers in my range link to related documents outside my range, so I’ve taken those out of this list). I could macro the window to set the "from" and "to" to be the same master number, thus removing one set of related documents at a time, then moving to the next number and then the next. It would work, but testing showed I’d be here all week doing it. Besides I can see somehow that I can be more clever than that!
GP is more efficient on these types of operation if you can supply ranges of numbers to work on, so really I want to look at the list of 446,741 numbers and find all the contiguous ranges within the sequence. Then identifying the start and end numbers to put in the UI start and end fields. Although it is worth remembering sometimes there will just be on number in a particular range, when there are gaps on both sides of the master number.
Hitting Google I found the following post, where they use recursive CTEs to find the ranges. Restricting my sequence to only 10,000 records, it took about three mins to run. Letting it run on the full sequence, I ran out of patience (and probably server resources) to let it get to end. I did try to tune it with some physical tables indexed to try and improve performance, but ultimately recursive CTE’s are terrible for large data sets like this.
I went looking again and found another solution in this post.
This is genius insight in its script design. I do use ROW_NUMBER() a lot myself for all sorts of weird problem solving, but had never seen it applied like this before.
I adapted the snippet of the SQL to my problem and came up with this:
DECLARE @MNTable TABLE (MSTRNUMB INT PRIMARY KEY (MSTRNUMB)); INSERT INTO @MNTable SELECT DISTINCT MSTRNUMB AS MSTRNUMB FROM [dbo].[SOP30200] --WHERE Blah blah... ORDER BY MSTRNUMB; WITH CTE AS ( SELECT ROW_NUMBER() OVER ( ORDER BY MSTRNUMB ) - MSTRNUMB AS Groups ,MSTRNUMB FROM @MNTable ) SELECT MIN(MSTRNUMB) AS [From] ,MAX(MSTRNUMB) AS [To] FROM CTE GROUP BY Groups ORDER BY MIN(MSTRNUMB)
So from a query that never finished processing we go to this:
Really, it is now down to a few seconds, to get all the results!
We have now only 3,760 row ranges, from 446,741 master numbers, so I now only need to get the macro to iterate the remove window 3,760 times, that is a much more acceptable!
So the mail merge macro now takes each row of this result set and uses the from and to values from the query to populate the boxes for the master number and then sets the report options and finally then processes. Job done.
Back to the SQL for a moment…
The SQL is fascinating for this though, looking at the CTE data we see how it works.
The row number increments but the subtracting of the ROW_NUMBER from the master number sequence causes a banding or grouping number to be created. So long as the sequence number is only one greater than the previous number, then the banding number is maintained the same in the group. This is because the relative difference between the ROW_NUMBER and the master number has not changed. If the master number jumps up because of a gap it will have gone up by more than the row number, so the grouping will number will be changed to a new grouping number.
This grouping number is then subsequently used to gather the max and min values for each section. I love the efficiency and simplicity of this solution. It also shows how many ways there can be to solve a problem.
I can see further improvements are possible. If we in-fill the sequence of master numbers before processing them with dummy master numbers, where documents don’t exist in the database, then this will further reduce the number of times we have to loop. In that case it would allow cases where the max of one range to the min of the next contain no documents to be combined into one range.
I’ve ran out of time and leave it to the reader to investigate that optimisation.