Tuesday, June 21, 2016

When to Use Option Recompile in SQL

What is option recompile?

When a stored procedure is compiled, the generated query plan is optimized for the current state of the database. This works well if the relative sizes of the objects used in the stored procedure haven't changed much since the plan was generated: the efficient plan can be fetched from the cache rather than generated again, which is generally good for performance. But it can be terrible for performance if the database objects have changed drastically since the previous run.

By default the query plan is fetched from the cache and is only recompiled when the underlying tables change or SQL Server is restarted. By adding the OPTION (RECOMPILE) hint to a statement in a stored procedure you are telling SQL Server to recompile the query plan on every execution instead of fetching it from the cache.
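For example, the hint is appended to the statement whose plan should be recompiled each time (the table and parameters here are made up):
-- Recompile the plan for this statement on every execution
SELECT OrderId, Total
FROM dbo.Orders
WHERE OrderDate BETWEEN @StartDate AND @EndDate
OPTION (RECOMPILE);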

When would I use this in practice?

The problem - We had a scheduled stored procedure taking a really long time in production. Usually a slow-running query can be reproduced easily in testing, which leads to an eventual fix. In this case the query was only slow sometimes in production, and it was not possible to reproduce in staging with the same data.

Experimentation - Normally when a problem can't be reproduced on production data I suspect it's load related, outdated statistics, or even a maintenance script running at the wrong time. All of these were ruled out: statistics were up to date, the SQL Server load was low, and no maintenance scripts were running, yet the problem still occurred.

I had a feeling it had to do with a bad query plan but couldn't figure out why, until I luckily reproduced the issue in staging. I accidentally ran the procedure once with incorrect parameters, then ran it a second time with the correct parameters, and was able to duplicate the issue. The generated query plan differed greatly from what I saw previously in the quick runs. I had stumbled across what is known as "parameter sniffing".

The solution - Adding OPTION (RECOMPILE) to the stored procedure of course! Parameter sniffing was the issue because the stored procedure was executing dynamic SQL and embedding parameter date values into the executed statement. This caused performance to go haywire if the query was run once with a set of dates that returned few records, and then immediately after with a different set of parameters that returned hundreds of thousands of records.

The cached query plan assumed the amount of data within those parameter ranges was always small, and it was NOT always small once the query was executed with a different set of parameters. The next time the script ran with different dates, it used the bad query plan and puked. OPTION (RECOMPILE) fixed the problem because a new plan is generated before each run.

Tuesday, June 7, 2016

What is the Difference Between Currying and Partial Application?

Currying is the act of breaking apart a function with multiple parameters into a chain of single-parameter functions.

Here is a simple function with multiple parameters that is not curried.
// multiple parameter function
let add x y = x + y 

Here is the same function but curried.
// curried function
let add x = 
  let addInner y = x + y 
  addInner

As you can see, the curried function is made up of another function with the first parameter baked in. Because this function is curried it can now be partially applied (you actually get this automatically with any multi-parameter function in F#). Partial application is a powerful technique in functional programming if used correctly. For instance, here is a simple partial application example for the function above.
// partially applied functions
let add4 = add 4 
let add5 = add 5 

add4 1 // returns 5
add5 1 // returns 6

The add4 function partially applies the add function, which means it returns the addInner function with 'x' baked in. So the add4 function becomes:
addInner y = 4 + y

This example is trivial, so how could partial application be useful in production code? Often in our production code we have to retrieve something from the database, like a customer, and then continue with the business operation.

Business logic function:
let businessLogic (dbFun:int->Customer) customerId = 
  let customer = dbFun customerId
  // other logic

This business logic function takes as a parameter a function that retrieves a customer from the database. But the database function below needs a connection string as well (passing connection string info in makes managing connection strings easier).

Database function:
let getCustomerFromDb connectionString customerId = 
  // get customer from real database

This function has the signature string->int->Customer. We use partial application to turn it into an int->Customer so it can be used in the businessLogic function.
let customerId = 5

// partially applied get customer function
let entryPoint = businessLogic (getCustomerFromDb "RealConnection") customerId 

In a non-functional language you would typically reach for dependency injection to achieve the same thing. Partial application achieves the same goal but is simpler and doesn't require extra setup code.

Tuesday, May 24, 2016

Your Builds Should Compile Without Warnings

Few developers would disagree with that statement. Yet I rarely come upon a project that builds successfully without any compiler or build warnings. Why is this the case? Why are we not more rigorous as a profession?

The objections I usually hear are that warnings are nit-picky and the team was on a time crunch. This is not acceptable in my opinion. It's a classic case of racking up technical debt without an immediate plan to pay off that debt. A big no-no.

Let's review a few of the bugs I have come across in my time trying to clean up warnings.

1. SQL injection security holes - I have actually found a few of these in my time cleaning up code warnings. If you use dynamic SQL to build up a query and don't parameterize your inputs (eek!) then you expose yourself to a SQL injection attack. Code warnings can detect this.

2. Memory Leaks - If you instantiate a class that implements IDisposable and don't dispose of it, you are wasting memory and relying on the garbage collector to do the job. These bugs suck to find in production. Best to find them with the compiler.

3. Conflicting DLL versions - This build warning is gross and often difficult to track down. It means that somewhere there are two references to the same DLL but with different versions. Ignore this warning at your own peril; there is no telling what type of runtime errors might result.

Considering warnings can detect these types of problems at compile time rather than at runtime, the excuse that your team was on a time crunch is a very bad one. You are trading short term speed for long term pain.

How to achieve a build without warnings?
From the very beginning of a project the build should be set up to fail if there are any warnings. And the warning level should be set high. If there are nit-picky warnings (there are admittedly quite a few) that can be ignored then decide as a team to have the compiler suppress them. It should be a conscious decision. Start out strict and dial back the strictness after carefully reviewing each warning.
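As a sketch, in a .NET project this can be as simple as flipping a couple of MSBuild properties in the project file (adjust the names to your build system):
<PropertyGroup>
  <!-- Fail the build on any warning, at the highest warning level -->
  <TreatWarningsAsErrors>true</TreatWarningsAsErrors>
  <WarningLevel>4</WarningLevel>
</PropertyGroup>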

Tuesday, May 10, 2016

Improve a Little Each Day

This post is a little philosophical and not very original, but I think it's a powerful idea backed up by modern psychological research. There are few geniuses among us, so for the majority of us mastering a subject requires methodical and persistent practice. This idea can be applied to a few areas in software development.

1. Apply it to improving yourself

Pick out three topics where you would like to improve as a developer and do one small thing a day to improve in those areas. Track your progress and mark an X for each time you accomplish a small goal. I would be a much better developer had I been doing this since the beginning of my career. Little did I realize learning begins when school ends, not the other way around.

Use this method to become more efficient with your tools. Just once each day do one of the following:

  a. Look up a shortcut for something you've done manually instead of using the mouse.
  b. Learn how to use the command line to accomplish a specific task.
  c. Instead of looking through text or code files use a regex find and replace.

Over time you will become a master of your environment.

2. Apply this method to the code you touch - "Boy Scout Rule"

Each time you are in an area of code, leave it at least slightly better than when you arrived. If the whole team takes this approach to heart, your codebase will drastically improve.

3. Apply it to your team as a whole

Is provisioning new machines difficult? Is pushing to production difficult? Then change it. Each time you provision a machine, automate one small thing. Each time you push to production, add one automation step to make it easier the next time. This will add up over time.

At the heart, improving each day involves first identifying the major problems. Then break them down into bite-sized chunks and fix them methodically. Over time you will see big results.

Tuesday, April 26, 2016

How to Develop an F# .NET Web App on a Mac using Atom + Ionide + Mono

I've been developing .NET apps for 8 years, and for most of my career the thought of developing a .NET app outside of Visual Studio was crazy. Not all developers like heavy-handed IDEs like Visual Studio, and to make matters worse, as a .NET developer you are stuck on Windows (I actually like the Windows OS though). But all of that is changing as Microsoft embraces open source (hello coreCLR!) and wins back the developer community. It's now possible to write professional .NET apps on a Mac that can run on Mac OS, Linux, and Windows.

I'll walk you through a sample project I completed on my Mac at home. The app is a simple web service called "Sports Stats" (here is the full source code) that retrieves stats for a specific baseball player or golfer.

The Setup

Step 1 - Install Atom. It's also possible to use Visual Studio Code, but I chose Atom for this project. Atom is a text editor created by GitHub.

Step 2 - Install Mono. Mono is an open source implementation of .NET that allows you to run .NET applications cross-platform. Eventually you will be able to use coreCLR for this purpose but it's not quite ready yet.
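If you use Homebrew, one way to install it is:
    brew install mono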

Step 3 - Next comes Ionide. Ionide is an awesome open source package that allows F# development in VSCode and Atom. Install these Ionide packages in Atom.

Step 4 - Install fsharp yeoman. Yeoman creates scaffolding for projects and solutions so you don't have to manually create the Visual Studio project XML files.

Step 5 - Now that everything is installed we start by using yeoman to create the solution. Just start by typing "yo fsharp" in the command line. In this project I created a simple web service so I used the Console application template.

The Dependencies

1. FAKE - F# library for building.
2. FSharp.Data - F# HTML type provider used for parsing stat web pages.
3. FsUnit - Unit testing library that makes F# tests read well. So instead of using Assert.Equals, tests can look like this:
result |> should equal 10
4. Newtonsoft.Json - Library for serializing JSON
5. Suave - Amazing F# library that allows setting up a fast, lightweight, and non blocking web server.
6. xUnit - Unit testing library that works better than NUnit with F# projects.
7. Paket - Dependency manager for .NET (much better than NuGet).
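For reference, a paket.dependencies file covering this list would look roughly like this sketch (source URL simplified, versions omitted):
source https://nuget.org/api/v2

nuget FAKE
nuget FSharp.Data
nuget FsUnit
nuget Newtonsoft.Json
nuget Suave
nuget xunit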

The Code

I used FAKE to build all the projects. The build script uses Paket to download all dependencies, then builds the projects, then runs tests. Here is the full build.fsx file for reference.
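If you haven't seen FAKE before, a build.fsx is just F#. Here is a rough sketch of the shape of such a script (paths and target names invented; see the linked file for the real one):
#r "packages/FAKE/tools/FakeLib.dll"
open Fake

let buildDir = "./build/"

// Compile all projects in Release mode
Target "Build" (fun _ ->
    !! "src/**/*.fsproj"
    |> MSBuildRelease buildDir "Build"
    |> Log "Build output: ")

Target "Default" (fun _ -> trace "Build complete")

// Run the Build target before Default
"Build" ==> "Default"
RunTargetOrDefault "Default"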

One of the reasons I love writing code in F# is the ability to easily use the REPL. After programming this way there is no going back for me. The F# REPL in Atom isn't perfect yet, but it really helps development. It allows for quick testing of small functions and promotes the use of pure functions in your program.

REPL example
I used Suave for my web server and there are a lot of things I like about this library. It makes writing asynchronous code very easy, and it provides full flexibility without requiring you to write a ton of code. Here is the entry point of my program, which uses Suave. It's very simple to understand: it forwards the HTTP routes specified to the appropriate function. This is much nicer than the WebApi controller classes that were necessary when using Microsoft.Owin.

Entry point:
let routes (db:IDB) =
  choose
    [ GET >=>
      choose [ path "/Golf/LowestTournament" >=> SportService.getLowestTournament db
               path "/Golf/LowestRound" >=> SportService.getLowestRound db
               path "/Golf/TotalEarnings" >=> SportService.getTotalGolfEarnings db
               path "/Baseball/Homeruns" >=> SportService.getHomeruns db
               path "/Baseball/Strikeouts" >=> SportService.getStrikeouts db
               path "/Baseball/Steals" >=> SportService.getSteals db ]]

[<EntryPoint>]
let main argv =
    startWebServer defaultConfig (routes Database.DB)
    0

The other good thing about the routes function is that it's fully unit-testable. The database connection is passed in at runtime, so it's possible to test HTTP requests and responses by simply testing the routes function.

Here is an example of a unit test that does just that.

let fakeDB (response:Response) =
  { new IDB with
      member x.GetLowestTournament first last = response
      member x.GetLowestRound first last = response
      member x.GetTotalGolfEarnings first last = response
      member x.GetHomeruns first last = response
      member x.GetStrikeouts first last = response
      member x.GetSteals first last = response
  }

[<Fact>]
let ``Golf lowest tournament total Tiger Woods``() =
  let expectedResponse = "{\"FirstName\":\"Tiger\",\"LastName\":\"Woods\",\"Stat\":{\"Case\":\"LowestTournament\",\"Fields\":[-27]}}"
  let athlete = defaultAthlete "Tiger" "Woods" (LowestTournament -27)

  result "Tiger" "Woods" "Golf\LowestTournament" (fakeDB athlete)
  |> validateSuccess expectedResponse

This unit test creates a fake database on the fly and passes that database into the routes function. The HTTP response is then fully validated. This unit test provides a lot of value and actually helped me quite a few times in development when I broke some of the routes by accident.

Eventually, after a route is matched and its corresponding function is called, the FSharp.Data HTML type provider is used. The type provider loads the specified HTML page and parses through it appropriately. The parsing code I wrote is a little dirty because the page I used for getting the stats is created dynamically and didn't have good class names. Here is the parsing code for the golf stats.
let stat (html:HtmlDocument) (input:GolfInput) =
  let tables = html.Descendants ["table"]

  match Seq.length tables with
  | 0 -> Failure RecordNotFound
  | _ -> let value =
           tables
           |> Seq.head
           |> (fun x -> x.Descendants ["tbody"])
           |> Seq.head
           |> (fun x -> x.Descendants ["tr"])
           |> Seq.map (input.MapFunction input.Data.ColumnIndex)
           |> Seq.filter input.FilterFunction
           |> input.TotalFunction

         Success { FirstName = input.Data.FirstName
                   LastName = input.Data.LastName
                   Stat = input.Data.ValueFunction value }

This is also fully unit-testable. I simply pass in a sample HTML page and verify the result like so.

[<Literal>]
let golfHtml =
  """<html>
         <body>
             <table>
                 <tbody>
                     <tr>
                        <td>login</td>
                        <td>Win</td> <!-- Final finish -->
                        <td>61-67-70-71=269</td> <!-- Final score -->
                        <td>-27</td> <!-- Final score to par -->
                        <td>$864,000</td> <!-- Final money -->
                        <td>fedex</td>
                    </tr>
                    <tr>
                        <td>login</td>
                        <td>T15</td> <!-- Final finish -->
                        <td>66-71-70-71=278</td> <!-- Final score -->
                        <td>-28</td> <!-- Final score to par -->
                        <td>$1,997,000</td> <!-- Final money -->
                        <td>fedex</td>
                    </tr>
                    <tr>
                        <td>login</td>
                        <td>Win</td> <!-- Final finish -->
                        <td>72-71-70-71=284</td> <!-- Final score -->
                        <td>-18</td> <!-- Final score to par -->
                        <td>$322,000</td> <!-- Final money -->
                        <td>fedex</td>
                   </tr>
                   <tr>
                        <td>login</td>
                        <td>T33</td> <!-- Final finish -->
                        <td>58-77-64-60=259</td> <!-- Final score -->
                        <td>-17</td> <!-- Final score to par -->
                        <td>$659,000</td> <!-- Final money -->
                        <td>fedex</td>
                   </tr>
               </tbody>
          </table>
      </body>
  </html>"""

[<Fact>]
let ``Golf lowest round``() =
  let input = { FirstName = "Tiger"; LastName = "Woods"; ColumnIndex = 2; ValueFunction = LowestRound }
  let golfInput = { Data = input; MapFunction = GolfStats.lowestRoundMap; FilterFunction = (fun x -> x > 50); TotalFunction = Seq.min }
  let expected = Success { FirstName = "Tiger"; LastName = "Woods"; Stat = LowestRound 58}
  let doc = HtmlDocument.Parse golfHtml

  (GolfStats.stat doc golfInput)
  |> should equal expected

Here is the end result. A beautiful front-end showcasing my work!
Simple front-end using the API
Results: 

The bad - I couldn't get the FSI REPL to work with the FSharp.Data type provider. This was a shame because (as far as I know) debugging is not enabled in Atom with Ionide, so this limitation made it difficult to write some of the HTML parsing code. Also, adding new files to the project was painful because manually editing the .fsproj files was error prone.

The good - Love the fact I can create F# .NET apps on a Mac without running a Windows VM. Atom and Ionide work well together, and this app was created with all open source packages and software. Given this process would also run on Linux, it is possible to create first-class, scalable web services that would be inexpensive to host. It's close to becoming a viable option for a startup in my opinion.

Tuesday, April 19, 2016

What is Box in F#?

The first time I had to use the box function in F# there was some confusion on our team about what exactly was happening under the hood. Our situation arose when we set an HTTP parameter to required in our WebAPI controller.

The Problem - The parameter was an int, it was required, and if it was not provided our application was supposed to return an error code. When the parameter was not provided, WebAPI would set the required parameter to null even though an int cannot be null. This was very confusing, and to make matters worse we were not able to make the required parameter a Nullable<int> with WebAPI.

The Solution - Box is the solution! From our searching, by boxing the int we could check if it is null even though it's not nullable (confusing). But what does the box function do? According to MSDN, the box function in F# "boxes a strongly typed value". Ok... Thanks documentation.

That's not very helpful, but with further digging in this MSDN article the answer is "Boxing is the process of converting a value type to the type object..." It goes on to say "it wraps the value inside a System.Object and stores it on the managed heap..."

So, boxing a value wraps the value type inside a System.Object, and therefore we can check whether the object is null. After the null check we unbox it back to an int.
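Here is a minimal sketch of the idea (the record type is hypothetical, not our real WebAPI model):
type Request = { Id : int }

// box wraps the value in a System.Object, so the null check compiles
// even though int itself is not nullable
let tryGetId (request: Request) =
  let boxed = box request.Id
  if isNull boxed then None
  else Some (unbox<int> boxed)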

Now the next time you box you'll hopefully have a better idea of what's happening. Happy boxing everyone!


Tuesday, April 12, 2016

From Development to Production

Each development team has their own way of developing software. I wouldn't advocate for only one development method, but here is a method we have used in the past that worked well.

The steps:

1. Have a list of well thought out and priority-ordered tasks ready to go. Having a well-groomed backlog is the best way to identify risks early in the process and is also a team morale booster. Estimate items at a granular level on a scale like small, medium, or large, and only allow items smaller than large to be marked as ready.

2. QA should also be involved: a necessary step to calling an item ready is that everyone in the development cycle knows what work needs to be completed to call the task finished.

3. The developer picks a task off the top of the list and marks it as in progress, then starts work in a development branch off of the team integration branch.

*Branching aside: All our development was done in individual development branches and testing done in an integration branch.

4. Once development is complete the task is marked as resolved and a pull request is created into the integration branch.

5. Another team member grabs the task and marks it as in code review. The reviewer then reviews the pull request and, when finished, merges it into integration and marks the task as dev complete.

*One additional note here is that in order to get to this stage a build would have to be completed on TeamCity, which includes all unit test runs. If any of the tests failed, the build would fail.

6. QA is then notified of a completed task, so the QA member will take the task and create a Git tag (release) off the integration branch. This is done so development can keep on churning without additional commits being added to the integration branch that would require re-testing.

7. QA writes automation and tests the task and when complete marks the task as QA complete and assigns the task back to dev.

8. Dev then merges the tag into master and creates a PROD release with a proper version number (we did our best attempt at semantic versioning).

9. The PROD release is then built in TeamCity (along with unit test runs). We used Octopus release promotion, so this release was required to be deployed to a QA environment where all automation was automatically run again (any failure would halt the release). Once completed successfully, the release would be deployed to production with the click of a button.

This process isn't completely novel, but it worked really well for our team. The particular pieces I liked were our use of code reviews and git releases. Code reviews helped with defects and team cohesion, whereas git releases made it really easy to separate and document our work.

Tuesday, April 5, 2016

How to Watch ESPN While Traveling Internationally

I'm currently traveling internationally for a few months, and one of my biggest dilemmas was how I was going to keep up with my sports teams. ESPN and other content providers will often only serve content within the U.S. To get around this we have a few options.

Option 1 - Use a DNS proxy
This may be the simplest option, and with Smart DNS Proxy setup is a breeze. The proxy server needs to be located in an approved content region for your provider (in my case the U.S.). The DNS server obscures your client IP address by routing certain traffic through other proxy servers. This is a simple solution, but not as fast or secure as using a VPN as in option 2.

Option 2 - Set up a VPN
For the best solution, set up a VPN using Hide My Ass. This is the fastest, most secure, and most reliable option. Hide My Ass sets up a private network and routes your traffic through their proxy servers, which masks your location.

What all of this means is it was possible to watch my beloved Kentucky Wildcats while on the beach in Fiji.

Tuesday, March 29, 2016

F# Yield Keyword

The yield keyword in F# is similar to the return keyword in that it returns a value. The difference is that yield is used within a sequence (or an enumeration) and it does not stop the enumeration.

Here is an example of filtering a list and returning a list of lists.
[3; 6; 8; 9]
|> List.filter(fun x -> x > 5)
|> List.map(fun x -> [x + 1])
Returns:
val it : int list list = [[7]; [9]; [10]]

To accomplish this task with for..in and yield we could do the following:
[for x in [3; 6; 8; 9] do
  if x > 5 then
    yield [x + 1]]
Returns:
val it : int list list = [[7]; [9]; [10]]

yield is similar to return in that both have counterparts: ReturnFrom (return!) and YieldFrom (yield!). Yield bang (yield!) will do a collect and flatten the returned sequence. Here is the example using yield bang.
[for x in [3; 6; 8; 9] do
  if x > 5 then
    yield! [x + 1]]
Returns:
val it : int list = [7; 9; 10]

Tuesday, March 22, 2016

What are Fluent Interfaces?

What are they?
A fluent interface is an API facade that uses method chaining and preserves state throughout the method calls. This type of API can lead to more readable OO code because each method call returns a new instance that contains the new state, meaning the result of each call does not have to be stored in a variable and calls can be chained together.

When to use?
Functional programming has pipelining and higher-order functions to achieve the readability win that fluent interfaces provide in OO languages, so I think fluent interfaces are unnecessary in functional languages. In OO languages, fluent interfaces can provide more readable code using method chaining. Use them when your API needs to be simplified and you want it to read well.

Take this LINQ example:
[1; 2; 4]
  .Select(fun x -> x + 1)
  .Where(fun y -> y < 4)

There is a lot going on under the hood, but by simply looking at this code it is easy to tell what is supposed to happen.

How to implement?
To implement method chaining, each method must return a new instance of a class that takes the current state into account. A fluent interface is not necessarily just method chaining, so how to implement a fluent interface is specific to the domain space. We'll use a simple method chaining example here.

Here is a class with two actions.
type Counter = { Count: int } with
  member this.Increment : Counter = 
    { Count = this.Count + 1 }
  member this.Print : Counter = 
    printfn "Current Count = %i" this.Count
    { Count = this.Count }

Each action returns a new instance of the class and the method calls can be chained together.
{ Count = 3 }
  .Increment
  .Print
  .Increment
  .Increment
  .Print

The output of the above statement would be:
Current Count = 4
Current Count = 6
This example is trivial, but the same technique can be used in complicated scenarios, resulting in more readable code.

Tuesday, March 15, 2016

My Favorite F# Feature - Creating an Anonymous Class from an Interface

There are a lot of delightful F# language features and the best may be the ability to create an anonymous class implementing an interface. When I first discovered this feature I jumped out of my chair with excitement. Let's walk through a simple scenario.

Here we have a simple interface:
type SomeRecord = { 
    ID : int 
    Name : string 
}
    
type IDatabase =
    abstract member Get: int -> SomeRecord
    abstract member Save: SomeRecord -> bool

Here is the normal way to implement an interface:
type RealDatabase() =
    interface IDatabase with 
        member this.Get id = // database get implementation
        member this.Save record = // database save implementation

This could be used in code as follows (admittedly a rather contrived example):
let Service (db:IDatabase) id = db.Get id

When writing a unit test for this function we could create a test class implementing the interface. But F# allows creating an instance of the interface on the fly (an object expression), which makes testing really easy, like so:
[<Fact>]
let ``Some unit test``() =
    let fakeDb = { new IDatabase with 
                       member this.Save record = true 
                       member this.Get id = {ID = id; Name = "Test"} } 

    (Service fakeDb 2) |> should equal {ID = 2; Name = "Test"}

This is much easier than using a mocking library like Moq. The ability to easily create classes implementing an interface without much code is another reason I really like F#. The language doesn't get in the way when trying to accomplish a task.

Tuesday, March 8, 2016

The Search Wars - grep vs ack vs ag

Developers should be constantly searching and refining their toolkit. Knowing the right tool to use for the job is an important aspect of being productive. Here are three competing ways to search using the command line.

grep

grep is a tried and true UNIX command that has been around for decades. It's lightweight, fast, and successfully does the job of quickly searching. grep is a command line utility that every developer should become familiar with. While grep is great, there are a few alternatives that offer some advantages.

ack

Why use it?

For one, ack has 25% fewer characters than grep, a typing win! ack is designed for developers and is best suited for searching code files. It is faster than grep for this purpose because it ignores common source control directories like .git and Subversion directories.

How to install for OSX?

Use homebrew:
    brew install ack
How to use?

Here is a simple usage. The results are very clean and even come with highlighting.
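To search a source tree for a string (the pattern and path here are arbitrary):
    ack "connectionString" src/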
ag

Why use it?

ag contains 33% fewer characters than ack! It's also an order of magnitude faster than ack because it's written in C and is able to take advantage of multiple CPU cores, along with a number of other performance enhancements described on the GitHub page.

How to install for OSX?

Use homebrew:
    brew install the_silver_searcher
How to use?

Usage is very similar to ack.
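Again with an arbitrary pattern:
    ag "connectionString" src/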
I have to admit sometimes I still use Sublime Text for regex pattern matching, but the above utilities are good, reliable, and fast options that every developer should have in their toolkit.

Wednesday, March 2, 2016

Book Review: Dependency Injection in .NET

Dependency Injection in .NET changed the way I think about software development. Dependency injection is a topic that is easy to describe, but not until reading this book did I feel it in my bones as I was writing code. It's a powerful topic. Here are my main takeaways from reading this excellent book.

Use constructor injection - not setter injection. Constructor injection is the cleanest way of passing dependencies. This way, whenever you have an instance of a class you know its dependencies have been specified. With setter injection, guard clauses must be added to ensure the dependency has been set (besides, as we know, mutation is the devil!).
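Here is a minimal F# sketch of the idea (types invented for illustration). The dependency is required at construction time, so an instance cannot exist without it:
// Hypothetical dependency
type IOrderRepository =
    abstract member Save : int -> bool

// Constructor injection: no instance without a repository, no guard clauses
type OrderService(repository: IOrderRepository) =
    member this.PlaceOrder orderId =
        repository.Save orderId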

Make your code testable - unit tests help ensure correctness. A side benefit of writing testable code is loose coupling, where components have no knowledge of how the surrounding system works. This makes the program easier to change in the future. If you start with this principle when designing your application you will end up with a much cleaner design, resulting in fewer defects.

.NET Containers - the last part of the book goes over the most popular .NET dependency injection containers. I'm not going to cover them in detail here, but Castle Windsor, StructureMap, and Spring.NET are among the containers covered in the book. Good patterns and anti-patterns for using these containers are discussed with clear examples. I was surprised at the extensive features provided by these libraries.

Even non OO and non .NET programmers can benefit from the ideas in this book. These patterns remind me of good functional code, where the program is made up of mostly pure functions that are easily testable. Passing dependencies and settings into functions instead of defining them within the function is a great way to write applications.

Tuesday, February 23, 2016

How to Format Code in a Blog Post?

There are quite a few ways to easily format code in HTML, I'll walk you through the steps I took.

Step 1 - Find a good CSS library for code formatting

The best HTML syntax highlighter library I found was PrismJs. It's free, incredibly thorough in the languages it supports, and it provides a professional look. You simply choose the language support you like and download the CSS and JavaScript files.

Here is a before look:
let items = [1; 3; 4]
items |> List.map(fun x -> x + 1)
Here is the code after formatting with PrismJs:
let items = [1; 3; 4]
items |> List.map(fun x -> x + 1)

Step 2 - Move from Wordpress to Google Blogger

I originally had the blog on WordPress using a free template, but I decided to move it to Google Blogger because Blogger provides free template editing. WordPress has other advantages, but for my purpose I simply needed to tweak the template to include the Prism CSS and JavaScript files. Since I only had a few posts this was not a difficult move.

Step 3 - Host PrismJs CSS and Javascript files in Dropbox
I used the instructions in this article to host the files in Dropbox so they can be referenced publicly.

Step 4 - Update template HTML to include CSS and Javascript files

In Google Blogger I went to Template -> Edit HTML and added this piece in the header.
<link href='https://dl.dropboxusercontent.com/s/wvc7y6gckrru3ex/prism.css' rel='stylesheet'/>
<script src='https://dl.dropboxusercontent.com/s/sy0k7bc7ixl6xan/prism.js' type='text/javascript'/>
It was that easy! Now my blogging code looks much nicer.

Saturday, February 20, 2016

CORS for security

What is it? 

Cross-Origin Resource Sharing (CORS) is a W3C spec that allows resource sharing across domains. Some resources are allowed to come from any domain, but web fonts and AJAX requests are limited to the same domain as the parent web page. This presents a problem if an AJAX request from www.example1.com wants to request a resource from www.example2.com. In order to fulfill that request, a CORS header will have to be added to the response.

Why? 

All modern browsers implement the same-origin policy as a preventative measure to keep malicious scripts from accessing data on other domains. This is a bit unfortunate for development teams implementing APIs to be used across distributed environments. It can be perfectly valid for an API to serve AJAX requests from multiple domains. This is why the CORS spec was introduced, and it is necessary for teams with environments spanning multiple domains.

How to use it? 

To successfully serve a cross-origin request, the server side must specify the Access-Control-Allow-Origin and Access-Control-Allow-Methods headers. The Allow-Origin header specifies the domains allowed to access the resource, and Allow-Methods specifies the HTTP methods allowed (GET, PUT, etc.). To allow all origins, the Allow-Origin header can simply specify '*'.
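For example, a response that lets www.example1.com make cross-origin GET and PUT requests would include headers like these:
Access-Control-Allow-Origin: http://www.example1.com
Access-Control-Allow-Methods: GET, PUT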

To Vim or Not to Vim?

I try to learn something new about one of my tools every day. Slow, methodical, constant improvement is important to staying effective as a developer. We interact with numerous tools and we should constantly be re-evaluating them to try and become more efficient at our job.

While I have been keeping up with my commitment to learning about my tools, one of the tasks I've been putting off is going back to basics and learning Vim. You may be asking, with all the advances in editors throughout the years, why would I want to go back and learn Vim?

Well, the fact this editor is still widely used in the community after 25 years is reason enough. Besides, even though one can be extremely productive using a modern IDE I feel they can make you complacent. Too much magic behind the scenes. So, that's why I decided to start the Vim journey.

Now what?
 
I knew it would be frustrating at first, but not this frustrating! In the beginning I felt completely worthless in this crazy Vim land. I try to type a word and the next thing I know I've deleted half the text on the screen and my cursor is skipping around everywhere.

A co-worker introduced me to vimtutor and sanity was restored. Over time I became familiar with the commands and I'm starting to understand the logic behind its power. Small commands that can be combined to do really complicated things. I'm a huge fan of not using the mouse and Vim takes this to a whole new level. The only problem is now I want it everywhere! At least thanks to vimium I can happily vim in the browser :)

Why Would I Use a Monad?

In this post I am going to describe a simple case for when to use a monad. A monad is nothing to be frightened of; it's simply a structure that describes a chain of operations. Monads should be embraced because, if used correctly, they can lead to more readable code. And monads are easy to understand! So, let's walk through a common scenario.

We'll start out with a standard pattern I've observed in a normal C# application. Let's say there is a function that should be called, but only if a number of other validation functions first succeed. The C# code could look as follows (this code could be written better but it is just an example):
public static bool SaveRecord(int input) {
    var valid = IsGreaterThanZero(input);
    if (valid) {
        valid = IsLessThanTen(input);
    }
    if (valid) {
        valid = EqualsFour(input);
    }
    if (valid) {
        SaveRecord(input);
    }
    return valid;
}
In this example we want to call SaveRecord, but only depending on the results of some other functions. Instead of branching on the valid boolean we could use exceptions, or we could put those validation calls on one line, but I'm going to argue for a better way. First, let's write some code in F#. I'm a fan of Scott Wlaschin's railway-oriented programming, where program execution short-circuits on an error instead of using exceptions. To do this we need a type like so:
type Response =
 | Success of int
 | Failure of string
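For reference, here is one way the validation functions used in the rest of this post might be defined (bodies invented to mirror the C# checks above):
let isGreaterThanZero x = if x > 0 then Success x else Failure "must be greater than zero"
let isLessThanTen x = if x < 10 then Success x else Failure "must be less than ten"
let equalsFour x = if x = 4 then Success x else Failure "must equal four"
let saveRecord x = Success x // stand-in for the real database save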
Instead of returning simple types, our function inputs and outputs can be wrapped with this Response type. By using it, our functions will have more well-defined inputs and outputs, which can lead to more readable programs, as we will see below. So, let's re-implement the above function in F# using this Response type (warning: ugly code below!):
let yuck input =
       match isGreaterThanZero input with
       | Failure f -> Failure f
       | Success a -> match isLessThanTen a with
                         | Failure f -> Failure f
                         | Success b -> match equalsFour b with
                                           | Failure f -> Failure f
                                           | Success c -> saveRecord c
As you can see, that looks terrible. After every response we have to check whether or not the response is valid before continuing, but luckily the language provides a way to clean it up. I'll show you step by step. First, let's create a function called bind that will help remove some duplicate logic in this function.
let bind m f =
  match m with
  | Failure f -> Failure f
  | Success a -> f a
This bind function takes a Response and a function. It propagates the failure if the response is a Failure; if it's a Success it calls the function with the unwrapped value as its parameter. This removes the nested match statements above and helps readability.
let withBind input =
  let result1 = bind (Success input) isGreaterThanZero
  let result2 = bind result1 isLessThanTen
  let result3 = bind result2 equalsFour
  bind result3 saveRecord
This is much more readable than the first function, but we can take it a step further. We can use computation expressions in F# by building a type around the bind function we created, which looks as follows:
type ValidationBuilder() =
  member this.Bind(m, f) = bind m f
  member this.Return(x) = Success x
  member this.ReturnFrom(x) = x
  member this.Zero() = Success 0

let validation = new ValidationBuilder()
With this type we can write a function that is cleaner than before.
let withContinuation input =
  validation {
    let! result1 = isGreaterThanZero input
    let! result2 = isLessThanTen result1
    let! result3 = equalsFour result2
    return! (saveRecord result3)
  }
The computation expression allows us to build in the continuation logic so it doesn't have to be repeated everywhere in our application. The let! keyword automatically calls the bind function, unwraps the value, and continues on, unless there is a failure, in which case none of the subsequent lines are executed. Another option here is an infix operator instead of a computation expression. These can be confusing if used too often, but here is how we could change the function using an infix operator.
let (>>=) m f =
  match m with
  | Failure f -> Failure f
  | Success a -> f a
The infix operator works just like the bind function, and when we use it in our original function this is the result.
let withInfixOperator input =
  input
  |> isGreaterThanZero
  >>= isLessThanTen
  >>= equalsFour
  >>= saveRecord
This is the most readable of all the options and shows the power of F#. See, monads aren't scary! If used correctly, your programs can contain mostly program logic and are very easy to read, as long as the basic concepts are well understood.

What I'm Currently Reading - Release It!

It's an oldie but a goodie. If you look at any top-10 list of software engineering books, Release It! will be somewhere near the top. Release It! focuses on designing robust software applications and the pitfalls that come with complex distributed systems.

I like this book because it takes a deep dive into those nasty, crippling production bugs that can bring down even the most well-thought-out system. I've seen my fair share of these types of bugs, where the actual problem is an innocuous line of code that was successfully code reviewed, unit-tested, and automated, yet that wasn't enough to prevent a debilitating application crash. This book gives good advice on how to deal with this reality and design systems that can withstand the rigors of production.
Here are my main takeaways so far.

Be cynical!: This book teaches us to be cynical, pessimistic, and distrustful of our code. Danger lurks at every integration point and we must write our code expecting it to fail. Think deeply about the consequences of failing code and design your architecture with that in mind.

Be anti-fragile: Unit tests and automation are wonderful, but we need our software to be anti-fragile. Beat up your software in testing. Stress your system with load tests and keep your application running in test for long periods of time to try to uncover those bugs usually only found in production.

Know your design requirements: Should the checkout process be frozen because we can't talk to a third party affiliate? Probably not. Know the requirements of your system and design your failure scenarios accordingly.

Deploy early and often: The deployment process is one of the first things you should iron out. Deployments should be simple and low risk; if they are too hard, you need to rethink your design. Having an easy deployment process makes the team more likely to refactor code and make changes, which is a good thing. I have seen codebases where the team is afraid to make changes because of the difficulty and risk of deploying. This is demoralizing, and a lesson that pushing our code needs to be easy.

I would recommend this book to a software engineer with any level of experience. It is a good lesson for all.

My Journey to Functional Programming

Over the last year I have transitioned from an imperative and object oriented style to a declarative and functional style of programming. Even though there was much gnashing of teeth along the way it was a worthy pursuit. I believe a declarative functional style is beneficial in a few ways I will describe below, but first I'd like to describe the journey in case you find yourself making the same transition. Last year a functional programming advocate on our team convinced us to use F# for our team's next project. What followed was a path to the dark functional side.
  1. Dismissal: I was initially skeptical about this decision as functional programming reading materials were being passed around. They stated the style was easier to read and reason about, but this did not seem so at first. The code looked mathematical and cryptic. During these initial stages it was difficult to see how these differences would make life easier as a programmer.
  2. Helplessness: Using a functional programming language for the first time felt like trying to program blindfolded, without hands, alone in a dark room. How do you loop? Why can't I change this record? Why is this not building!? What's a continuation? The programs that resulted from those initial iterations were a hybrid functional/object-oriented style that took longer to create and didn't have a consistent feel. It wasn't until after building multiple real-world applications that it began to sink in. Then, after many tears...
  3. Bliss: It's all about the functions, stupid! For my entire software engineering career, building software consisted of imperative object mutation while using OO patterns as the main way of sharing code. Turning that thinking around to writing programs using mostly pure, composable functional expressions was difficult, but the benefits are worth the initial struggle.
Why declarative and functional?
  • Expressiveness: Writing code in a mostly pure, declarative way using a functional language leads to easy-to-change, easy-to-understand, and testable code. If these practices are followed, it is rather easy to understand the flow of the program because most of the logic is contained in small blocks unaffected by state. Declarative code tells you its intent (a side benefit is fewer code comments) whereas imperative code tells you how.
  • Functional patterns: Many books have been written on OO patterns, and a lot of those patterns are useful and elegant ways of solving specific problems. In functional programming the answer to most of these problems is simply to create more functions. With continuations and currying it's trivial to break apart functions and share code in elegant ways.
  • Interfaces: Even when using an OO style best practice is to program to the interface instead of the implementation. An interface is simply a function definition, so functional programming takes the adage to heart.
I never thought I would say it, but I'm firmly in the functional camp now and happy to be there! In later posts I will go over specific examples of why I enjoy this style.

"Timeout expired. The timeout period elapsed prior to obtaining a connection from the pool. This may have occurred because all pooled connections were in use and max pool size was reached"

An email inbox full of this error message is not the best way to start a morning. Our service had been running in production smoothly for months, but software engineering is a cruel profession and bad things are bound to happen.

Luckily only one of our three production servers was down, so we quickly pulled the problem child out of the load balancer rotation and began to investigate. We used perfmon to monitor the open connections on the server and were easily able to fill up the connection pool with only a few hundred concurrent requests from our test scripts. After waiting a few minutes the connections were reclaimed and the service would become responsive again.

At this point it was rather obvious we were leaking connections, but how? This error message is an indication that connections were being taken from the pool but not released. So the first obvious thing to do was check that all our SQL clients were being properly disposed. It would have been an easy fix if we had somehow missed putting our SQL clients in using blocks, but after looking through the code all of our clients were in using blocks and should have been disposed properly.

Connection pooling
SQL connections are expensive to create. Establishing one involves creating a socket, doing an initial handshake, authenticating the credentials, doing transaction checks, etc. Connection pooling is an optimization technique used to minimize the number of times a connection must be established. When a SQL client wants to connect to the database, a connection is either created or taken from the existing pool of connections. Once the SQL client is finished with the connection, the connection is closed when Dispose() is called and goes back to the connection pool. Each connection pool is associated with a distinct connection string. If the maximum number of connections in the pool has been reached and there are no available connections in the pool, a timeout will occur when a connection is requested.
Duplicating a production problem locally is the holy grail of fixing production bugs; once a problem is duplicated locally, fixing it is usually straightforward. In this case I was not able to reproduce the problem locally, even when sending 10,000 concurrent requests at my local service. Am I hitting the same endpoint? Yes. Similar CPU/memory specs as production? Yes. After running through many of these questions we tried setting <gcServer enabled="true"/> on my local dev machine and YES!! Duplication. This server setting is used in production and was required to duplicate the problem.

Garbage collection
The gcServer element specifies whether to use server or workstation garbage collection. Workstation garbage collection runs on the same user thread that triggered the collection, so it must compete with other threads when garbage collecting. Server garbage collection dedicates high-priority threads specifically to garbage collection, so this setting performs better on multi-core CPU machines.
After being able to reproduce the problem locally it was easy to narrow down the offending code (F#).

use cmd = new GetRecords(connectionString)
let! dbResult = cmd.AsyncExecute(contactID)
let records =
    dbResult
    |> Seq.map Convert.convertRecords
That doesn't look so bad, right? The SQL command is in a use binding, and Dispose() should be called on it when it goes out of scope. Well, we had our suspicions about the fact that the database call was returning a sequence, and we were curious how this code would perform:

use cmd = new GetRecords(connectionString)
let! dbResult = cmd.AsyncExecute(contactID)
let records =
    dbResult
    |> Seq.toArray
    |> Array.toSeq
    |> Seq.map Convert.convertRecords
Voila! Materializing the sequence to an array before returning actually fixed the problem. But why? A sequence is not computed until it is traversed, so returning a sequence of objects retrieved from a database keeps the database connection open until the sequence is fully traversed. This is bad because we want that database connection returned back to the pool as quickly as possible so it can be used by other threads.

Summary
Be wary of lazily evaluated sequences of items returned from a database. It is best practice to materialize the sequence to an array before returning it out of the function. Also, make sure your load tests are run before EVERY release in a QA environment that exactly mirrors production. Hopefully that will save you headaches in the future.