In computing, a newline, also known as a line ending, end of line (EOL), or line break, is a special character or sequence of characters signifying the end of a line of text and the start of a new line. The actual codes representing a newline vary across operating systems, which can be a problem when exchanging text files between systems with different newline representations.
I was using a resource (.resx) file to store a large block of comma-separated values (CSV) text. Each line held a key-value pair that mapped a product code in an old system to its code in a new system. In code, I split the whole text on Environment.NewLine and then split each line on the comma to build the map.
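The original snippet is not reproduced here, so the following is only a minimal sketch of the approach described above. The class name `ProductCodeMap` and the method `Parse` are hypothetical, and the real code read its text from the resource file rather than taking it as a parameter.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical reconstruction of the first approach: the names are
// illustrative and not taken from the original code base.
public static class ProductCodeMap
{
    public static Dictionary<string, string> Parse(string csvText)
    {
        var map = new Dictionary<string, string>();

        // Split on the platform newline: "\r\n" on Windows, "\n" on Unix.
        var lines = csvText.Split(
            new[] { Environment.NewLine },
            StringSplitOptions.RemoveEmptyEntries);

        foreach (var line in lines)
        {
            // Each line is expected to look like "oldCode,newCode".
            var parts = line.Split(',');
            if (parts.Length < 2)
            {
                continue; // skip lines without a comma
            }
            map[parts[0]] = parts[1];
        }

        return map;
    }
}
```

Note that this only separates lines when the text uses the same newline sequence as the machine running the code. On Windows, text with LF-only endings would come back from the first split as a single "line", so lookups for most keys would miss.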
It all worked fine on my machine and even on other team members' machines. There was no reason to doubt this piece of code, until we noticed on the development environment that the mapped value in the destination system was always null.
Analyzing the Issue
In the destination system, every other value was populated as expected; only this mapping was missing, so it was easy to narrow the problem down to the class that returned the mapped value. Initially, I thought the resource file was not getting bundled properly. I used dotPeek to decompile the application and verified that the resource file was bundled and contained exactly the same text (visually) as expected.
I copied the resource text from the decompiled code in dotPeek into Notepad2 (configured to show line endings), and everything started falling into place. The resource text in the build-generated assembly had LF (\n) line endings, while the one on our development machines had CRLF (\r\n). All machines, including the build machines, run Windows, where the expected value of Environment.NewLine is CRLF: a string containing “\r\n” for non-Unix platforms, or a string containing “\n” for Unix platforms.
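When an editor that shows line endings is not at hand, the same check can be done in code. This is only a sketch; `LineEndingInspector` and `ShowLineEndings` are hypothetical names introduced for illustration and were not part of the original investigation.

```csharp
using System;

// Hypothetical helper: makes carriage returns and line feeds visible
// by replacing them with their escaped spelling.
public static class LineEndingInspector
{
    public static string ShowLineEndings(string text)
    {
        // Replace CR first, then LF; the inserted text contains only a
        // backslash and a letter, so the second pass cannot re-match it.
        return text.Replace("\r", "\\r").Replace("\n", "\\n");
    }
}
```

For example, calling `LineEndingInspector.ShowLineEndings("A1,B1\r\nA2,B2\n")` returns the text `A1,B1\r\nA2,B2\n` with the escapes spelled out, which makes a mix of CRLF and LF endings easy to spot in a log or debugger.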
Finding the Root Cause
We use git for source control, configured to use ‘auto’ line endings at the repository level. This ensures that the source code, when checked out, matches the line-ending format of the machine. We use Bamboo on our build servers, which run Windows. The checked-out files on the build server had LF line endings, which in turn got compiled into the assembly.
The checkout step in Bamboo used the built-in git plugin (JGit), which has certain limitations. Using native git is recommended to get the full set of git features. JGit also has a known issue with line endings on Windows machines and checks out files with LF endings. So whenever the source code was checked out on the build server, every line ending in the file was LF before compilation. The resource file therefore ended up with LF line endings in the assembly, and the code could no longer find Environment.NewLine (\r\n) to split on.
There are two possible ways to fix this issue:
- Switch to using native git in the Bamboo build process
- Split the text on LF and trim any excess characters. This reduces the dependency on line-ending variations and settings between machines, though it only holds until the text passes through a system that uses yet another format.
I chose to use LF to split the text and trim any additional characters, while also updating Bamboo to use native git for checkout.
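The updated snippet is not reproduced here either, so this is a minimal sketch of the LF-based split under the same assumptions as before. The class name `LfTolerantProductCodeMap` is hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of the fix: split on LF only and trim whatever
// is left over, so both CRLF and LF text produce the same map.
public static class LfTolerantProductCodeMap
{
    public static Dictionary<string, string> Parse(string csvText)
    {
        var map = new Dictionary<string, string>();

        foreach (var rawLine in csvText.Split('\n'))
        {
            // Trim removes a trailing '\r' left behind by CRLF text,
            // along with any other surrounding whitespace.
            var line = rawLine.Trim();
            if (line.Length == 0)
            {
                continue; // skip blank lines, e.g. after a final newline
            }

            var parts = line.Split(',');
            if (parts.Length < 2)
            {
                continue; // skip lines without a comma
            }
            map[parts[0].Trim()] = parts[1].Trim();
        }

        return map;
    }
}
```

Because the split no longer depends on Environment.NewLine, the result is the same regardless of which platform runs the code. Text that uses a bare CR as its only separator would still not be split by this sketch.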
Protecting Against Line Endings
The easiest and fastest way for this to have come to my notice would have been a unit test. With a test in place, the failure shows up on the build machine itself. Such a test passes on my local machine but fails on the build machine, because UsageMap would not return any value for the destination system.
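The original test is not reproduced here. The sketch below shows one possible shape for such a check. The `UsageMap` constructor, the `GetNewCode` method, and the sample codes are all assumptions made for illustration, and the check is written as a plain method that throws on failure rather than against any particular test framework. Unlike a test that reads the bundled resource, this variant feeds both line-ending styles explicitly, so it would also fail on a developer machine if the parsing depended on one of them.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the real UsageMap, whose code is not shown.
public class UsageMap
{
    private readonly Dictionary<string, string> _map =
        new Dictionary<string, string>();

    public UsageMap(string csvText)
    {
        foreach (var rawLine in csvText.Split('\n'))
        {
            var parts = rawLine.Trim().Split(',');
            if (parts.Length == 2)
            {
                _map[parts[0].Trim()] = parts[1].Trim();
            }
        }
    }

    // Returns the mapped code, or null when the old code is unknown.
    public string GetNewCode(string oldCode)
    {
        string value;
        return _map.TryGetValue(oldCode, out value) ? value : null;
    }
}

public static class UsageMapTests
{
    // Throws if any code fails to map for either line-ending style.
    public static void MapsEveryCodeForAnyLineEnding()
    {
        foreach (var newline in new[] { "\r\n", "\n" })
        {
            var map = new UsageMap(
                "OLD1,NEW1" + newline + "OLD2,NEW2" + newline);

            if (map.GetNewCode("OLD1") != "NEW1" ||
                map.GetNewCode("OLD2") != "NEW2")
            {
                throw new Exception("Mapping failed for a line-ending style.");
            }
        }
    }
}
```

In a real project, the body of `MapsEveryCodeForAnyLineEnding` would sit inside a test method of whichever unit-testing framework the team already uses, with the framework's own assertions in place of the thrown exception.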
Since there are different systems with different line endings, and also applications with line-ending settings and issues of their own, there does not seem to be a ‘one fix for all’ solution. The best I can think of in these cases is to protect ourselves with such unit tests. They fail fast and bring the problem to our notice immediately. Have you ever had to deal with an issue with line endings and found better ways to handle them?