Tuesday, June 21, 2016

When to Use Option Recompile in SQL

What is option recompile?

When a stored procedure is compiled, the generated query plan is optimized for the current state of the database. This is really good if the relative sizes of the objects used in the stored procedure haven't changed much since the last time the query plan was generated. The efficient query plan can easily be fetched from the cache and won't have to be generated again, which is generally good for performance. But this can be terrible for performance if the database objects have changed drastically since the previous run.

By default the query plan is fetched from the cache and is only recompiled when the underlying tables change or SQL Server is restarted. By adding the OPTION (RECOMPILE) hint to a statement in a stored procedure you are telling SQL Server to always recompile the query plan for that statement instead of fetching it from the cache.
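As a minimal sketch of where the hint goes (this procedure and its tables are hypothetical, not our actual production code):

CREATE PROCEDURE dbo.GetOrdersByDateRange
    @StartDate DATE,
    @EndDate DATE
AS
BEGIN
    SELECT OrderId, CustomerId, Total
    FROM dbo.Orders
    WHERE OrderDate BETWEEN @StartDate AND @EndDate
    OPTION (RECOMPILE); -- generate a fresh plan for this statement on every execution
END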

When would I use this in practice?

The problem - We had a scheduled stored procedure taking a really long time in production. Usually a slow-running query can easily be reproduced in testing, which leads to an eventual fix. In this case the query was only slow sometimes in production, and it was not possible to reproduce in staging with the same data.

Experimentation - Normally when a problem can't be reproduced on prod data I would suspect it's load related, outdated statistics, or even a maintenance script running at the wrong time. All of these were ruled out: we made sure statistics were up to date, the SQL Server load was low, and no maintenance scripts were running, yet the problem still occurred.

I had a feeling it had to do with a bad query plan but couldn't figure out why until I luckily reproduced the issue in staging. I accidentally ran the procedure once with incorrect parameters, then ran it a second time with the correct parameters, and was able to duplicate the issue. The generated query plan differed greatly from what I had seen previously in the quick runs. I had stumbled across what is known as "parameter sniffing".

The solution - Adding option recompile to the stored procedure of course! Parameter sniffing was the issue because the stored procedure was executing dynamic SQL and embedding parameter date values into the executed statement. This caused performance to go haywire if the query was run once on a set of dates that returned few records and then immediately after on a different set of parameters that returned hundreds of thousands of records.

The query plan was cached as if the amount of data within those parameter ranges was always small. It was NOT always small once the query was executed again with a different set of parameters. So the next time the script ran it used the bad query plan with different dates and puked. Option recompile fixed the problem because a new plan was generated before each run.

Tuesday, June 7, 2016

What is the Difference Between Currying and Partial Application?

Currying is the act of breaking a function with multiple parameters into a chain of one-parameter functions.

Here is a simple function with multiple parameters that is not curried.
// multiple parameter function
let add x y = x + y 

Here is the same function but curried.
// curried function
let add x = 
  let addInner y = x + y 
  addInner

As you can see, the curried function is made up of another function with the first parameter baked in. Because this function is curried it can now be partially applied (you actually get this automatically with any multi-parameter function in F#). Partial application is a powerful technique in functional programming if used correctly. For instance, here is a simple partial application example for the function above.
// partially applied functions
let add4 = add 4 
let add5 = add 5 

add4 1 // returns 5
add5 1 // returns 6

The add4 function partially applies the add function, which means it returns the addInner function with 'x' baked in. So the add4 function becomes:
addInner y = 4 + y

This example is trivial, so how could partial application be used in production code? Often we have to retrieve something from the database, like a customer, and then continue with the business operation.

Business logic function:
let businessLogic (dbFun:int->Customer) customerId = 
  let customer = dbFun customerId
  // other logic

This business logic function takes as a parameter a function that retrieves a customer from the database. But the database function below needs a connection string as well (passing connection string info in makes managing connection strings easier).

Database function:
let getCustomerFromDb connectionString customerId = 
  // get customer from real database

This function has the signature string->int->Customer. We use partial application to turn it into an int->Customer so it can be used in the businessLogic function.
let customerId = 5

// partially applied get customer function
let entryPoint = businessLogic (getCustomerFromDb "RealConnection") customerId 

In a non-functional language you would typically use dependency injection to achieve the same thing. Partial application achieves the same goal but is simpler and doesn't require extra setup code.
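This also pays off in testing. Here is a quick sketch (the Customer record and fake function are hypothetical, just for illustration): a test double is simply another function with the right signature, no mocking framework required.

// hypothetical record type standing in for the real Customer
type Customer = { Id: int; Name: string }

// a fake database function: int->Customer, the same shape as the partially applied real one
let fakeGetCustomer (customerId: int) : Customer =
  { Id = customerId; Name = "Test Customer" }

// exercise businessLogic without touching a real database
let testResult = businessLogic fakeGetCustomer 5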

Tuesday, May 24, 2016

Your Builds Should Compile Without Warnings

Few developers would disagree with that statement. Yet it's rare that I come across a project where I can build successfully without any compiler or build warnings. Why is this the case? Why are we not more rigorous as a profession?

The objections I usually hear are that warnings are nit-picky and the team was on a time crunch. This is not acceptable in my opinion. It's a classic case of racking up technical debt without an immediate plan to pay off that debt. A big no-no.

Let's review a few of the bugs I have come across in my time trying to clean up warnings.

1. SQL injection security holes - I have actually found a few of these in my time cleaning up code warnings. If you use dynamic SQL to build up a query and don't parameterize your inputs (eek!) then you expose yourself to a SQL injection attack. Code analysis warnings can detect this.

2. Memory leaks - If you instantiate a class that implements IDisposable and don't dispose of it, you are wasting memory and relying on the garbage collector to do the job (see the sketch after this list). These bugs suck to find in production. Best to find them with the compiler.

3. Conflicting DLL versions - This build warning is gross and often difficult to track down. It means that somewhere there are two references to the same DLL but with different versions. Ignore this warning at your own peril; there is no telling the type of runtime errors that might result.
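For the IDisposable case the fix is usually a one-liner in F#. A minimal sketch (the file name is made up for illustration):

open System.IO

// 'use' disposes the reader deterministically when it goes out of scope,
// instead of leaving cleanup to the garbage collector
let readFirstLine () =
  use reader = new StreamReader("data.txt")
  reader.ReadLine()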

Considering warnings can surface these types of problems at compile time instead of at runtime, the excuse that your team was on a time crunch is a very bad one. You are trading short-term speed for long-term pain.

How to achieve a build without warnings?
From the very beginning of a project the build should be set up to fail if there are any warnings, and the warning level should be set high. If there are nit-picky warnings (there are admittedly quite a few) that can be ignored, then decide as a team to have the compiler suppress them. It should be a conscious decision. Start out strict and dial back the strictness after carefully reviewing each warning.
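In a .NET project this can be enforced with a few MSBuild properties in the project file. A minimal sketch (the suppressed warning number is only an example of one a team might consciously choose to ignore):

<PropertyGroup>
  <TreatWarningsAsErrors>true</TreatWarningsAsErrors> <!-- fail the build on any warning -->
  <WarningLevel>4</WarningLevel>                      <!-- report at the highest level -->
  <NoWarn>1591</NoWarn>                               <!-- team-approved suppressions go here -->
</PropertyGroup>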

Tuesday, May 10, 2016

Improve a Little Each Day

This post is a little philosophical and not very original, but I think it's a powerful idea that is backed up by modern psychological research. There are few geniuses among us, so for most of us mastering a subject requires methodical and persistent practice. This idea can be applied to a few areas in software development.

1. Apply it to improving yourself

Pick out three topics where you would like to improve as a developer and do one small thing a day to improve in those areas. Track your progress and mark an X for each time you accomplish a small goal. I would be a much better developer had I been doing this since the beginning of my career. Little did I realize learning begins when school ends, not the other way around.

Use this method to become more efficient with your tools. Just once each day do one of the following:

  a. Look up a keyboard shortcut for something you've been doing manually with the mouse.
  b. Learn how to use the command line to accomplish a specific task.
  c. Instead of manually scanning through text or code files, use a regex find and replace.

Over time you will become a master of your environment.

2. Apply this method to the code you touch - "Boy Scout Rule"

Each time you are in an area of code, leave it at least slightly better than when you arrived. If the whole team takes this approach to heart, your codebase will drastically improve.

3. Apply it to your team as a whole

Is provisioning new machines difficult? Is pushing to production difficult? Then change it. Each time you provision a machine, automate one small thing. Each time you push to production, add one automation step to make it easier the next time. This will add up over time.

At the heart of it, improving each day involves first identifying the major problems. Then break them down into bite-size chunks and fix them methodically. Over time you will see big results.

Tuesday, April 26, 2016

How to Develop an F# .NET Web App on a Mac using Atom + Ionide + Mono

I've been developing .NET apps for 8 years, and for most of my career the thought of developing a .NET app outside of Visual Studio was crazy. Not all developers like heavyweight IDEs like Visual Studio, and to make matters worse, as a .NET developer you are stuck on Windows (I actually like the Windows OS though). But all of those things are changing as Microsoft embraces open source (hello CoreCLR!) and wins back the developer community. It's now possible to write professional .NET apps on a Mac that can run on Mac OS, Linux, and Windows.

I'll walk you through a sample project I completed on my Mac at home. The app is a simple web service called "Sports Stats" (here is the full source code) that retrieves stats for a specific baseball player or golfer.

The Setup

Step 1 - Install Atom. It's also possible to use Visual Studio Code, but I chose Atom for this project. Atom is a text editor created by GitHub.

Step 2 - Install Mono. Mono is an open source implementation of .NET that allows you to run .NET applications cross-platform. Eventually you will be able to use coreCLR for this purpose but it's not quite ready yet.

Step 3 - Next comes Ionide. Ionide is an awesome open source package that allows F# development in VSCode and Atom. Install these Ionide packages in Atom.

Step 4 - Install F# Yeoman. Yeoman will create scaffolding for projects and solutions so you don't have to manually create the Visual Studio project XML files.

Step 5 - Now that everything is installed, we start by using Yeoman to create the solution. Just type "yo fsharp" in the command line. For this project I created a simple web service, so I used the Console application template.

The Dependencies

1. FAKE - F# library for building.
2. FSharp.Data - F# HTML type provider used for parsing stat web pages.
3. FsUnit - Unit testing library that makes F# tests read well. Instead of using Assert.Equals, tests can look like this:
result |> should equal 10
4. Newtonsoft.Json - Library for serializing JSON.
5. Suave - Amazing F# library that allows setting up a fast, lightweight, and non-blocking web server.
6. xUnit - Unit testing library that works better than NUnit with F# projects.
7. Paket - Dependency manager for .NET (much better than NuGet).

The Code

I used FAKE to build all the projects. The build script uses Paket to download all dependencies, then builds the projects, then runs tests. Here is the full build.fsx file for reference.
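Since the full build.fsx is only linked here, below is a heavily simplified sketch of that flow using the FAKE 4 API (the globs and target bodies are illustrative, not the project's real script):

// minimal FAKE sketch: Paket restore is assumed to have run before this script
#r "packages/FAKE/tools/FakeLib.dll"
open Fake

Target "Build" (fun _ ->
  !! "src/**/*.fsproj"                  // hypothetical project layout
  |> MSBuildRelease "build/" "Build"
  |> Log "Build output: ")

Target "Test" (fun _ ->
  trace "run the xUnit test assemblies here")

// build first, then run tests
"Build" ==> "Test"
RunTargetOrDefault "Test"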

One of the reasons I love writing code in F# is the ability to easily use the REPL. After programming this way there is no going back for me. The F# REPL in Atom isn't perfect yet, but it really helps development. It allows quick testing of small functions and promotes the use of pure functions in your program.

REPL example
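For example, exercising a small pure function in FSI looks like this (an illustrative session, not captured from this project):

> let add x y = x + y;;
val add : x:int -> y:int -> int

> add 2 3;;
val it : int = 5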
I used Suave for my web server and there is a lot I like about this library. It makes writing asynchronous code very easy, and it provides full flexibility without requiring you to write a ton of code. Here is the entry point of my program, which uses Suave. It's very simple to understand: it forwards the specified HTTP routes to the appropriate function. This is much nicer than the WebApi controller classes that were necessary when using Microsoft.Owin.

Entry point:
let routes (db:IDB) =
  choose
    [ GET >=>
      choose [ path "/Golf/LowestTournament" >=> SportService.getLowestTournament db
               path "/Golf/LowestRound" >=> SportService.getLowestRound db
               path "/Golf/TotalEarnings" >=> SportService.getTotalGolfEarnings db
               path "/Baseball/Homeruns" >=> SportService.getHomeruns db
               path "/Baseball/Strikeouts" >=> SportService.getStrikeouts db
               path "/Baseball/Steals" >=> SportService.getSteals db ]]

[<EntryPoint>]
let main argv =
    startWebServer defaultConfig (routes Database.DB)
    0

The other good thing about the routes function is that it's fully unit-testable. The database connection is passed in at runtime, so it's possible to test HTTP requests and responses by simply testing the routes function.

Here is an example of a unit test that does just that.

let fakeDB (response:Response) =
  { new IDB with
      member x.GetLowestTournament first last = response
      member x.GetLowestRound first last = response
      member x.GetTotalGolfEarnings first last = response
      member x.GetHomeruns first last = response
      member x.GetStrikeouts first last = response
      member x.GetSteals first last = response
  }

[<Fact>]
let ``Golf lowest tournament total Tiger Woods``() =
  let expectedResponse = "{\"FirstName\":\"Tiger\",\"LastName\":\"Woods\",\"Stat\":{\"Case\":\"LowestTournament\",\"Fields\":[-27]}}"
  let athlete = defaultAthlete "Tiger" "Woods" (LowestTournament -27)

  result "Tiger" "Woods" "Golf\LowestTournament" (fakeDB athlete)
  |> validateSuccess expectedResponse

This unit test creates a fake database on the fly and passes that database into the routes function. The HTTP response is then fully validated. This unit test provides a lot of value and actually helped me quite a few times in development when I broke some of the routes by accident.

Eventually, after the route is matched and its corresponding function is called, the FSharp.Data HTML type provider is used. The type provider loads the specified HTML page and parses through it appropriately. The parsing code I wrote is a little dirty because the page I used for getting the stats is created dynamically and didn't have good class names. Here is the parsing code for the golf stats.
let stat (html:HtmlDocument) (input:GolfInput) =
  let tables = html.Descendants ["table"]

  match Seq.length tables with
  | 0 -> Failure RecordNotFound
  | _ -> let value =
           tables
           |> Seq.head
           |> (fun x -> x.Descendants ["tbody"])
           |> Seq.head
           |> (fun x -> x.Descendants ["tr"])
           |> Seq.map (input.MapFunction input.Data.ColumnIndex)
           |> Seq.filter input.FilterFunction
           |> input.TotalFunction

         Success { FirstName = input.Data.FirstName
                   LastName = input.Data.LastName
                   Stat = input.Data.ValueFunction value }

This is also fully unit-testable. I simply pass in a sample HTML page and verify the result like so.

[<Literal>]
let golfHtml =
  """<html>
         <body>
             <table>
                 <tbody>
                     <tr>
                        <td>login</td>
                        <td>Win</td> <!-- Final finish -->
                        <td>61-67-70-71=269</td> <!-- Final score -->
                        <td>-27</td> <!-- Final score to par -->
                        <td>$864,000</td> <!-- Final money -->
                        <td>fedex</td>
                    </tr>
                    <tr>
                        <td>login</td>
                        <td>T15</td> <!-- Final finish -->
                        <td>66-71-70-71=278</td> <!-- Final score -->
                        <td>-28</td> <!-- Final score to par -->
                        <td>$1,997,000</td> <!-- Final money -->
                        <td>fedex</td>
                    </tr>
                    <tr>
                        <td>login</td>
                        <td>Win</td> <!-- Final finish -->
                        <td>72-71-70-71=284</td> <!-- Final score -->
                        <td>-18</td> <!-- Final score to par -->
                        <td>$322,000</td> <!-- Final money -->
                        <td>fedex</td>
                   </tr>
                   <tr>
                        <td>login</td>
                        <td>T33</td> <!-- Final finish -->
                        <td>58-77-64-60=259</td> <!-- Final score -->
                        <td>-17</td> <!-- Final score to par -->
                        <td>$659,000</td> <!-- Final money -->
                        <td>fedex</td>
                   </tr>
               </tbody>
          </table>
      </body>
  </html>"""

[<Fact>]
let ``Golf lowest round``() =
  let input = { FirstName = "Tiger"; LastName = "Woods"; ColumnIndex = 2; ValueFunction = LowestRound }
  let golfInput = { Data = input; MapFunction = GolfStats.lowestRoundMap; FilterFunction = (fun x -> x > 50); TotalFunction = Seq.min }
  let expected = Success { FirstName = "Tiger"; LastName = "Woods"; Stat = LowestRound 58}
  let doc = HtmlDocument.Parse golfHtml

  (GolfStats.stat doc golfInput)
  |> should equal expected

Here is the end result. A beautiful front-end showcasing my work!
Simple front-end using the API
Results: 

The bad - I couldn't get the FSI REPL to work with the FSharp.Data type provider. This was a shame because (as far as I know) debugging is not enabled in Atom with Ionide, and this limitation made it difficult to write some of the HTML parsing code. Also, adding new files to the project was painful because manually editing the .fsproj files was error prone.

The good - I love the fact that I can create F# .NET apps on a Mac without running a Windows VM. Atom and Ionide work well together, and this app was created with all open source packages and software. Given this process would also run on Linux, it is possible to create first-class, scalable web services that would be inexpensive to host. It's close to becoming a viable option for a startup in my opinion.

Tuesday, April 19, 2016

What is Box in F#?

The first time I had to use the box function in F# there was some confusion on our team about what exactly was happening under the hood. Our situation arose when we had set an HTTP parameter to required in our WebAPI controller.

The Problem - The parameter was an int, it was required, and if it was not provided our application was supposed to return an error code. When the parameter was not provided, WebAPI would set the required parameter to null even though an int cannot be null. This was very confusing, and to make matters worse we were not able to make the required parameter a Nullable<int> with WebAPI.

The Solution - Box is the solution! From our searching, by boxing the int we could check if it is null even though it's not nullable (confusing). But what does the box function do? According to MSDN, the box function in F# "boxes a strongly typed value". Ok... Thanks documentation.

That's not very helpful, but with further digging in this MSDN article the answer is that "Boxing is the process of converting a value type to the type object..." It goes on to say "it wraps the value inside a System.Object and stores it on the managed heap..."

So, boxing a value wraps the value type inside a System.Object, and therefore we can check if the object is null. Then we have to unbox it back to an int after the null check.
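Here is a minimal sketch of that technique (the function name and option-typed result are my own, just for illustration):

// returns None when WebAPI has left the "required" int parameter null
let tryGetRequiredInt (value: int) : int option =
  let boxed = box value          // wrap the value type in a System.Object
  if isNull boxed then None      // a null check is now possible
  else Some (unbox<int> boxed)   // unbox back to a strongly typed int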

Now the next time you box you'll hopefully have a better idea of what's happening. Happy boxing everyone!


Tuesday, April 12, 2016

From Development to Production

Each development team has their own way of developing software. I wouldn't advocate for only one development method, but here is a method we have used in the past that worked well.

The steps:

1. Have a list of well-thought-out and priority-ordered tasks ready to go. Having a well-groomed backlog is the best way to identify risks early in the process and is also a team morale booster. Estimate items at a granular level on a scale like small, medium, or large, and only allow items smaller than large to be marked as ready.

2. It's also important to note that QA should be involved: a necessary step in calling an item ready is that everyone in the development cycle knows what work needs to be completed to call the task finished.

3. The developer picks a task off the top of the list and marks it as in progress, then starts work in a development branch off of the team integration branch.

*Branching aside: All our development was done in individual development branches, and testing was done in an integration branch.

4. Once development is complete the task is marked as resolved and a pull request is created into the integration branch.

5. Another team member will grab the task and mark it as in code review. The reviewer then reviews the pull request and, when finished, merges it into integration and marks the task as dev complete.

*One additional note here: in order to get to this stage a build has to complete on TeamCity, which includes all unit test runs. If any of the tests failed, the build would fail.

6. QA is then notified of a completed task, so the QA member will take the task and create a Git tag (release) off the integration branch. This is done so development can keep on churning without additional commits being added to the integration branch that would require re-testing.

7. QA writes automation and tests the task and when complete marks the task as QA complete and assigns the task back to dev.

8. Dev then merges the tag into master and creates a PROD release with a proper version number (we did our best attempt at semantic versioning); see the Git sketch after this list.

9. The PROD release is then built in TeamCity (along with unit test runs). We used Octopus release promotion, so this release was required to be deployed to a QA environment where all automation was automatically run again (any failures would cause the release to halt). Once completed successfully, the release would be deployed to production with the click of a button.
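As a rough illustration of steps 6 and 8 (branch and tag names here are hypothetical), the Git side looks something like this:

# step 6: QA tags the integration branch so testing is pinned to a known commit
git checkout integration
git tag -a qa-1.4.0 -m "QA release candidate"

# step 8: after QA sign-off, dev merges the tested tag into master and tags the PROD release
git checkout master
git merge qa-1.4.0
git tag -a v1.4.0 -m "PROD release 1.4.0"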

This process isn't completely novel, but it worked really well for our team. The particular pieces I liked were our use of code reviews and git releases. Code reviews helped with defects and team cohesion, whereas git releases made it really easy to separate and document our work.