The Carriage Return and Line Feed Characters -- How They Affect Text Files On Different Platforms
If you're like I used to be, you always have trouble remembering the difference between how Windows and Linux terminate lines in text files. Does Windows add the extra stuff, or does Linux? What exactly is the extra stuff? How do I get the stuff out?
Well, hopefully by the end of this you'll have it straight once and for all.
First and foremost, let's establish what the characters are and how they differ. Both are control characters, meaning they're non-printing codes that tell a program or device how to handle the text, rather than symbols meant to be displayed to the user. The Carriage Return (CR) is ASCII character 13, and its name comes from a typewriter's carriage returning so that typing resumes at the left edge of the sheet of paper. Think "returning of the carriage" to the left.
The Line Feed (LF) is ASCII character 10, and it harks back to a typewriter rolling the paper up by one line. Interestingly enough, the combination of these two functions is what the ENTER/RETURN key gives you. The two-character sequence, CR followed by LF, is known as CRLF, and it both moves you back to the left and down a line.
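To see the two characters for yourself, you can dump the raw bytes of a line. Here's a minimal sketch using the POSIX `od` utility; `byte_values` is a hypothetical helper name, not a standard command.

```shell
# Hypothetical helper: print the decimal value of every byte read from stdin.
# od -An drops the offset column, -tu1 prints each byte as an unsigned decimal,
# and xargs (which runs echo by default) squeezes od's padding into single spaces.
byte_values() {
  od -An -tu1 | xargs
}

# A Windows-style line: the letter A, then CR, then LF.
printf 'A\r\n' | byte_values    # prints: 65 13 10

# A *Nix-style line: the letter A, then LF only.
printf 'A\n' | byte_values      # prints: 65 10
```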
Essentially, the crux of the whole CR / LF / File Corruption issue is the fact that Windows, Macs, and *Nix terminate text file lines differently. Below is a list of how they break down:
- *Nix uses the LF character
- Macs use the CR character (this applies to classic Mac OS, version 9 and earlier; Mac OS X and later use LF, like *Nix)
- Windows uses both -- with the CR coming before the LF
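If you're not sure which of these conventions a given file uses, you can count its CR and LF bytes. The sketch below does that with `tr` and `wc`; `line_ending` is a hypothetical helper name, and the CRLF verdict is a heuristic that assumes equal, non-zero CR and LF counts mean the two always appear as pairs.

```shell
# Hypothetical helper: guess a file's line-ending convention.
# The body runs in a subshell ( ... ) so cr and lf don't leak into the caller.
line_ending() (
  # tr -cd KEEPS only the named byte; $(( )) strips wc's padding on BSD/macOS.
  cr=$(( $(tr -cd '\r' < "$1" | wc -c) ))
  lf=$(( $(tr -cd '\n' < "$1" | wc -c) ))
  if [ "$cr" -eq 0 ] && [ "$lf" -eq 0 ]; then
    echo "none"          # no line terminators at all
  elif [ "$cr" -eq 0 ]; then
    echo "LF"            # *Nix (and Mac OS X and later)
  elif [ "$lf" -eq 0 ]; then
    echo "CR"            # classic Mac OS
  elif [ "$cr" -eq "$lf" ]; then
    echo "CRLF"          # Windows (heuristic: assumes CRs and LFs are paired)
  else
    echo "mixed"
  fi
)

# Usage (file name is a placeholder):
# line_ending somefile.txt
```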
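How this plays out is that if you write a file in Windows and transfer it bit for bit to a *Nix machine, every line will carry an extra CR character, which can cause all sorts of havoc. A shell script saved this way, for instance, may fail with a "bad interpreter" error because the CR at the end of the first line gets read as part of the interpreter's path. On the other hand, if you transfer a file from a *Nix machine to a Windows machine the same way, programs that expect CRLF (older versions of Notepad, for example) will show a bunch of lines joined together by little boxes where there are supposed to be line breaks, because the lines are missing the CR character.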
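A quick way to spot stray CRs from a Windows file on a *Nix box is `cat -v`, which displays each CR as `^M`. This is a small sketch; note that the `-v` flag is supported by GNU and BSD `cat` but isn't required by POSIX.

```shell
# A one-line shell script saved with Windows (CRLF) line endings.
# cat -v shows the otherwise invisible CR as the two characters ^M;
# the LF is left alone and still ends the line.
printf 'echo hello\r\n' | cat -v    # prints: echo hello^M
```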
How To Fix It
The good news is that there are plenty of ways to fix this problem. To start with, if you've ever used one of the more advanced FTP programs, you've probably noticed the Binary and ASCII options. If you use Binary, files are transferred "bit for bit", exactly as they are, between the source and destination. If a text file is transferred between a *Nix and a Windows box (or vice versa) in this mode, the symptoms mentioned above will surface.
If you use ASCII mode for that same transfer, however, the CR / LF conversions are done for you: in a Windows --> *Nix transfer the CR characters are removed, and in a *Nix --> Windows transfer they are added.
In addition, you can always use tr to translate from one format to another:
Windows --> NIX:
tr -d '\r' < windowsfile > nixfile    # delete the carriage returns
Mac --> NIX:
tr '\r' '\n' < macfile > nixfile    # translate carriage returns into newlines
NIX --> Mac:
tr '\n' '\r' < nixfile > macfile    # translate newlines into carriage returns
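tr can't handle the NIX --> Windows direction, because it maps one character to one character and has no way to turn a single LF into the two-byte CRLF sequence. awk can. Here's a minimal sketch; `nix_to_windows` is a hypothetical helper name, and it assumes the input doesn't already contain CRs.

```shell
# Hypothetical helper: convert *Nix (LF) line endings to Windows (CRLF).
# awk reads each line into $0 without its LF, and printf writes it back
# followed by CR LF. A final line with no trailing LF still gets CRLF.
# Assumes the input has no CRs already; a CRLF file would end up with doubled CRs.
nix_to_windows() {
  awk '{ printf "%s\r\n", $0 }'
}

# Usage (file names are placeholders):
# nix_to_windows < nixfile > windowsfile
```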
Yet another option is to do this from within vi (strictly speaking Vim, since fileformat is a Vim option) like so:
:set fileformat=unix
You can switch among the three formats (unix, mac, and dos) this way, and when you save with :w, the file is rewritten with the line endings of the format you chose.
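The same trick works without opening an editor window, which is handy in scripts. Below is a sketch that runs Vim in silent Ex mode; `set_fileformat` is a hypothetical helper name, and it assumes Vim is installed. One caveat: Vim's default 'fileformats' on *Nix is unix,dos, so a classic Mac (CR-only) file won't be auto-detected; for those you'd need to add mac to 'fileformats' or reopen the file with `:e ++ff=mac` first.

```shell
# Hypothetical helper: set_fileformat FILE FORMAT  (FORMAT is unix, dos, or mac)
# -N  turns off Vi-compatible mode so Vim auto-detects unix/dos endings on read;
#     silent Ex mode skips the vimrc, and with 'compatible' left on,
#     'fileformats' can be empty, which disables that detection.
# -es runs Vim in silent Ex mode, so no terminal or screen is needed.
# -c  runs each Ex command after the file is loaded: set the format, then
#     :wq writes the file with the new line endings and quits.
set_fileformat() {
  vim -N -es -c "set fileformat=$2" -c 'wq' "$1"
}

# Usage (file name is a placeholder):
# set_fileformat windowsfile unix
```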