Remove Carriage Returns (^M)

Brief tutorial on how to remove carriage returns in UNIX. Use CTRL-V+CTRL-M in SED or VI. Use octal \026 \015. Use tr or perl.

DOS to UNIX

Whenever you transfer files from DOS (Windows) to UNIX boxes, carriage returns (CR, hex 0D) may come along in your text files.  This will sometimes prevent the file from working properly in UNIX, although I have actually had some files work with the carriage returns left in.  I know the simple VI command to remove the carriage returns (^M) at the end of each line, and there are several other ways to remove them.

I ran into a problem on my home router running DD-WRT. It would not allow me to type CTRL-V+CTRL-M to create the carriage return (^M) in vi or anywhere else. I suspect it is a BusyBox shell issue. So I had to find another way. This document shows the standard methods as well as a new method I used on my router. I am not including 'dos2unix' and 'unix2dos' as these may not be installed. I prefer to remember standard methods that should work on all boxes without installing additional software.

Who Uses What

A brief rundown on which OS uses which symbol for a line terminator.

  • LF = linefeed (move cursor down) - CTRL-J / ^J / hex 0A / Sometimes written as NL (newline)
  • CR = carriage return (return cursor to left margin) - CTRL-M / ^M / hex 0D
  • UNIX = LF only
  • DOS = CRLF (each line ends with CR then LF)
  • MAC = CR only (classic Mac OS; Mac OS X and later use LF like UNIX)
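
If you are not sure which terminator a file uses, you can look at its bytes. The sketch below is only an illustration: the file names unix.txt and dos.txt and the helper count_cr are made up, and it assumes a shell with printf, tr, wc, od and mktemp available.

```shell
# Create two sample files in a scratch directory (file names are hypothetical).
tmpdir=$(mktemp -d)
printf 'one\ntwo\n'     > "$tmpdir/unix.txt"   # LF only
printf 'one\r\ntwo\r\n' > "$tmpdir/dos.txt"    # CR then LF

# Count the CR bytes in a file: delete every byte that is not a CR, count the rest.
count_cr() {
    tr -cd '\r' < "$1" | wc -c | tr -d ' '
}

unix_crs=$(count_cr "$tmpdir/unix.txt")
dos_crs=$(count_cr "$tmpdir/dos.txt")
echo "unix.txt has $unix_crs carriage returns"
echo "dos.txt has $dos_crs carriage returns"

# od -c prints each byte; a CR shows up as \r and an LF as \n.
od -c "$tmpdir/dos.txt"
rm -rf "$tmpdir"
```

A count of zero means the file is already LF only.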

Typing ^M

To create the return character in vi or on the command line you normally type "CTRL-V+CTRL-M". That is, you press and hold the control key, press 'v' and then 'm' (without the single quotes), and then release the control key. Another way is to use '\r' (backslash r). And one more way is to use octal: type '\026' for 'CTRL-V' and '\015' for 'CTRL-M'.
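
When ^M cannot be typed at all, one workaround is to let printf produce the character. This is a small sketch, not the only way to do it; the variable names are made up, and it assumes a printf that understands '\r' and octal escapes (the POSIX one does).

```shell
# Each printf emits one carriage return byte; od -An -to1 shows it in octal.
from_escape=$(printf '\r'   | od -An -to1 | tr -d ' \n')
from_octal=$(printf '\015' | od -An -to1 | tr -d ' \n')
echo "escape: $from_escape  octal: $from_octal"

# Keep a real CR in a variable for later use in sed or elsewhere.
# Command substitution strips only trailing newlines, so the CR survives.
cr=$(printf '\r')
cr_len=${#cr}
echo "length of cr variable: $cr_len"
```

Both notations give octal 015, and the variable holds exactly one character.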

The Commands

Below are the various methods to edit the file. There are more methods than listed here. One of these is likely to work.

vi

The percent sign (%) will perform the search and replace on all lines. And the 'g' at the end does the search and replace globally and not on just the first instance in the line.

:%s/^M//g

Other ways to accomplish the same task.

:%s/\r//g
:%s/\r\(\n\)/\1/g

This wiki article has some other useful tidbits about VI (VIM) and file formats.

Display the fileformat option (ff) for the current buffer, and the fileformats global option (ffs) which determines how VIM reads and writes files.

:set ff? ffs?
:verbose set ff? ffs? (Also shows where each option was last set, e.g. in your vimrc)

Convert from dos/unix to unix

:update             Save any changes
:e ++ff=dos         Edit file again using dos FF (fileformats ignored)
:setlocal ff=unix   This buffer will use LF only
:w                  Write buffer using unix (LF-only) line endings

Convert from dos/unix to dos

:update             Save any changes
:e ++ff=dos         Edit file again using dos FF (fileformats ignored)
:w                  Write buffer using dos (CRLF) line endings

tr (translate)

The '-d' flag deletes every occurrence of the listed characters. Input the original file and output the new file. The '\r' (backslash r) is another way to type the carriage return.

tr -d '\r' < file > newfile
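
Here is a small worked run of the tr command, just to show the effect on byte counts. The sample content is made up, and the file lives in a scratch directory created with mktemp (assumed to be available).

```shell
tmpdir=$(mktemp -d)
# Two DOS-style lines: 13 bytes in total, 2 of them carriage returns.
printf 'alpha\r\nbeta\r\n' > "$tmpdir/file"

# Same command as above, pointed at the sample file.
tr -d '\r' < "$tmpdir/file" > "$tmpdir/newfile"

before=$(wc -c < "$tmpdir/file" | tr -d ' ')
after=$(wc -c < "$tmpdir/newfile" | tr -d ' ')
echo "before: $before bytes, after: $after bytes"
rm -rf "$tmpdir"
```

Do not redirect the output back onto the input file. The shell truncates the file before tr gets to read it.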

sed

Search and replace using sed is quite common. You can type ^M various ways. I am showing three methods. The '-e' is for scripting on the command line and running those commands directly. The greater than sign (>) redirects the output to a new file. Otherwise the results are just displayed to stdout (standard out). You can add a 'g' to the end for global replacement just like in vi. The '-r' turns on extended regular expressions in GNU sed, which is what lets the '|' (or) in the third command work. In that third command the shell's $'...' quoting converts '\026' and '\015' into the real CTRL-V and CTRL-M characters before sed sees them.

sed -e 's/\r//' file > newfile

sed -e 's/^M//' file > newfile

sed -r $'s/\026|\015//g' file > newfile
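
If your sed does not understand '\r' and you cannot type ^M, another option is to hand sed a real carriage return through a shell variable. This is a sketch under that assumption; the variable name cr and the sample content are made up. It also anchors the match to the end of the line, so only the trailing CR is removed.

```shell
tmpdir=$(mktemp -d)
# Line 1 ends in CR. Line 2 has a CR in the middle and another at the end.
printf 'alpha\r\nbe\rta\r\n' > "$tmpdir/file"

# Put a literal CR into a variable, then build the sed script around it.
cr=$(printf '\r')
# \$ keeps the dollar sign literal for sed, so the script is s/<CR>$//
sed "s/${cr}\$//" "$tmpdir/file" > "$tmpdir/newfile"

# Only the CR in the middle of line 2 should be left.
remaining=$(tr -cd '\r' < "$tmpdir/newfile" | wc -c | tr -d ' ')
echo "carriage returns left: $remaining"
rm -rf "$tmpdir"
```

Drop the \$ and add a 'g' if you want every CR removed, as in the commands above.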

perl

You can also use a one line perl command. The '-e' is used to enter one line of script to be run by perl. Multiple '-e' commands may be given. The '-p' wraps the script in a loop that reads the input one line at a time, runs the script on it, and prints the result. Useful for one liner commands. The '-i(extension)' will edit the file in place. The original file is saved as 'filename.extension' (here 'filename.bak'). The edited file is saved under the original filename.

perl -pi.bak -e 's/\r//g' filename

GETOPTS and OPTIND

Understanding OPTIND

Figuring out OPTIND in shell scripting can be difficult.  And if you don't touch a script for a while it is an easy thing to forget.  This is a quick post to help explain OPTIND while using GETOPTS.

Definitions and Explanations

OPTIND Definition = The index of the next argument to be processed by GETOPTS.  The shell initializes this value to 1 when the script starts.
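
One consequence of this definition: the shell sets OPTIND to 1 only once, so if you call GETOPTS again (for example inside a function) you need to reset it yourself. The sketch below uses a made-up function count_flags and made-up flags 'a' and 'b' to show the reset.

```shell
# Hypothetical helper: count how many of the flags -a / -b were passed.
count_flags() {
    OPTIND=1    # reset so that every call starts at the first argument
    seen=0
    while getopts "ab" flag; do
        seen=$((seen + 1))
    done
}

count_flags -a -b
first=$seen
count_flags -a
second=$seen
echo "first call saw $first flags, second call saw $second"
```

Without the OPTIND=1 line the second call would start from the index where the first call stopped.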

GETOPTS Flags

  • a: = If a flag is followed by a colon (e.g. 'a:') an argument is required. The argument is stored in OPTARG.
  • a = A flag with no colon does not take an additional argument.
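
A minimal sketch of the two kinds of flags in use. The function parse_demo and the variables a_value and b_set are made up for this example.

```shell
# Hypothetical function: -a takes an argument, -b is a plain switch.
parse_demo() {
    OPTIND=1
    a_value=""
    b_set=no
    while getopts "a:b" flag; do
        case "$flag" in
            a) a_value=$OPTARG ;;   # 'a:' so the next word lands in OPTARG
            b) b_set=yes ;;         # 'b' has no colon, OPTARG is not used
            *) return 1 ;;          # getopts sets flag to '?' for unknown options
        esac
    done
}

parse_demo -a hello -b
echo "a_value=$a_value b_set=$b_set"
```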

Quick Script to Help Explain

The script below will be used to help explain OPTIND and other GETOPTS settings.

#!/bin/bash
#
# Quick and dirty script to explain getopts
# OPTIND SHIFT and $#
#

echo "BEFORE GETOPTS OPTIND = $OPTIND"
# A letter followed by a colon (c: b: e:) takes an argument, stored in OPTARG
while getopts "xyzc:b:e:hv" flag
do
    echo "FLAG=$flag OPTIND=$OPTIND OPTARG=$OPTARG"
done
echo
echo "ALL PASSED PARMS (\$@) = $*"
echo "Total Num of PARMS (\$#) = $#"
echo "SHIFT of OPTIND minus 1 aka"
echo "OPTIND = $OPTIND minus SHIFTING NUM = $((OPTIND - 1))"
echo "Before SHIFT ARG1 (\$1) = $1"
# Drop everything getopts parsed so that $1 is the first non-option argument
shift $((OPTIND - 1))
echo "AFTER SHIFT ARG1 (\$1) = $1"

#EOF

Examples

Using "xyzc:b:e:hv" as the GETOPTS flags here is how to count the arguments and OPTIND index numbers.

Here are two examples that pass the same options to GETOPTS.  But the OPTIND will be different.

  • OPTION 1:  # ./parse.sh -x -y -z -c c_arg -b b_arg ARG1 ARG2
  • OPTION 2:  # ./parse.sh -xyz -c c_arg -b b_arg ARG1 ARG2

OPTION 1 Explanation

OPTIND starts at 1 for both options.

  • GETOPTS will parse the flag '-x' and the OPTIND index will increase to 2
  • Parse the '-y' flag and the OPTIND index will increase to 3
  • Parse the '-z' flag and the OPTIND index will increase to 4
  • Parse the '-c c_arg' flag and the OPTIND index will increase to 6.  The '-c' will increase the OPTIND.  And the 'c_arg' will be set as OPTARG and also increase the OPTIND index number.
  • Parse the '-b b_arg' flag and the OPTIND index will increase to 8.

The final OPTIND index number is 8.  The total number of parameters ($#) passed to the script is 9: the 7 parsed by GETOPTS plus ARG1 and ARG2.

OPTION 2 Explanation

Again OPTIND starts at 1.

  • GETOPTS will parse the flag '-xyz'.  But instead of increasing the index for X and Y the OPTIND index will only increase to 2 after GETOPTS parses 'z'.  Three flags are combined into one flag as none of them require a separate argument (OPTARG).
  • Parse the '-c c_arg' flag and increase OPTIND by 2 just like in OPTION 1.  So OPTIND will change from 2 to 4 after parsing this flag.
  • Parse the '-b b_arg' flag and the OPTIND index will increase to 6.

The final OPTIND index number is 6.  The total number of parameters ($#) passed to the script is 7: the 5 parsed by GETOPTS plus ARG1 and ARG2.  The '-xyz' flag only counts as one parameter and only increases OPTIND by 1.
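
The counting for both options can be checked with a short function. This is only a sketch: the function final_optind and its result variables are made up, and it uses the same flag string as the examples.

```shell
# Hypothetical helper: run getopts over the arguments and record the totals.
final_optind() {
    OPTIND=1
    while getopts "xyzc:b:e:hv" flag; do
        :   # nothing to do per flag, only the final index matters here
    done
    last_optind=$OPTIND                  # index of the first unparsed argument
    total=$#                             # everything that was passed in
    leftover=$(( total - OPTIND + 1 ))   # what remains after shift OPTIND-1
}

final_optind -x -y -z -c c_arg -b b_arg ARG1 ARG2
opt1="$last_optind $total $leftover"
final_optind -xyz -c c_arg -b b_arg ARG1 ARG2
opt2="$last_optind $total $leftover"
echo "OPTION 1: OPTIND, \$#, leftover = $opt1"
echo "OPTION 2: OPTIND, \$#, leftover = $opt2"
```

In both cases two arguments (ARG1 and ARG2) are left over, which is why shifting by OPTIND minus 1 works no matter how the flags were grouped.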

Script Output

To make this clearer here is the output of the script after running each option.

OPTION 1:

# ./parse.sh -x -y -z -c c_arg -b b_arg ARG1 ARG2
BEFORE GETOPTS OPTIND = 1
FLAG=x OPTIND=2 OPTARG=
FLAG=y OPTIND=3 OPTARG=
FLAG=z OPTIND=4 OPTARG=
FLAG=c OPTIND=6 OPTARG=c_arg
FLAG=b OPTIND=8 OPTARG=b_arg

ALL PASSED PARMS ($@) = -x -y -z -c c_arg -b b_arg ARG1 ARG2
Total Num of PARMS ($#) = 9
SHIFT of OPTIND minus 1 aka
OPTIND = 8 minus SHIFTING NUM = 7
Before SHIFT ARG1 ($1) = -x
AFTER SHIFT ARG1 ($1) = ARG1


OPTION 2:

# ./parse.sh -xyz -c c_arg -b b_arg ARG1 ARG2
BEFORE GETOPTS OPTIND = 1
FLAG=x OPTIND=1 OPTARG=
FLAG=y OPTIND=1 OPTARG=
FLAG=z OPTIND=2 OPTARG=
FLAG=c OPTIND=4 OPTARG=c_arg
FLAG=b OPTIND=6 OPTARG=b_arg

ALL PASSED PARMS ($@) = -xyz -c c_arg -b b_arg ARG1 ARG2
Total Num of PARMS ($#) = 7
SHIFT of OPTIND minus 1 aka
OPTIND = 6 minus SHIFTING NUM = 5
Before SHIFT ARG1 ($1) = -xyz
AFTER SHIFT ARG1 ($1) = ARG1