Lead Image © Dmitriy Sladkov, 123RF.com

Shell practice: Introduction to the sed stream editor

Quick Edit

With sed, you can edit text data without an interactive user interface, using pipes or input redirection. Sed lets you execute extensive editing commands on a single line. By Harald Zisler

Sed (Stream EDitor) [1] automates repetitive operations on a text file and is especially effective when used in a shell script and with regular expressions (regex). In this article, I send program output to the screen. If you want to participate and practice, simply use the text files provided [2].

Sed Commands

Sed accepts commands and data from virtually anywhere. You can pass in commands directly or read them in from a file. The data can be piped, redirected, or input from a text file. The output can be sent to the screen (usually stdout), through a pipe to the next command, or redirected to a destination file. (See the "Sed Call Options" box.) To have the shell resolve variables inside a sed command, you sometimes need to substitute the " for the ' character.

Syntax

The basic syntax structure is shown in Figure 1. An address tells sed which lines an editing command applies to; without an address, the command applies to every line. You can combine addresses as long as doing so doesn't affect clarity. If you want to change "everything except," you can negate an address with the ! character.

Figure 1: Sed syntax structure.

To put multiple commands on one line, use the -e option:

sed -e 'command1' -e 'command2' ... -e 'commandN' ....
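As a minimal sketch of chaining commands with -e, the following call applies two substitutions to one line taken from textdata.txt. The variable name result is only an illustration, not part of the article's files.

```shell
# Two editing commands in one call; sed applies them in the order given.
result=$(printf '%s\n' 'chris hemsworth - Thor 0885465468798746' |
  sed -e 's/chris/Chris/' -e 's/hemsworth/Hemsworth/')
printf '%s\n' "$result"
```

Because the commands run in order, a later command sees the result of an earlier one.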

Or, you can add these commands from a script file.

Script Files

A script file should have a single line for each statement. For example:

s/Gans//
s/jo/Jo/g

The first line removes the word Gans – rather, it substitutes nothing for the word, but only for the first instance of the search string on each line. The second line substitutes Jo for jo for all instances because of the g option.
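The two-line script above can be tried out roughly as follows. This is a sketch: the temporary file and the input line are made up for the demonstration, and the -f option reads the commands from the file.

```shell
# Write the two script lines from the text to a temporary file.
script=$(mktemp)
printf '%s\n' 's/Gans//' 's/jo/Jo/g' > "$script"

# Apply the script with -f; the input line is invented for this test.
result=$(printf '%s\n' 'Gans jo and jo' | sed -f "$script")
printf '%s\n' "$result"

# Clean up the temporary script file.
rm -f "$script"
```

The leading space in the result is what remains after Gans has been replaced with nothing.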

To create an executable sed script, include the shebang (#!) interpreter statement on the first line:

#!/bin/sed -f

If you make the script executable (e.g., chmod 700 [SCRIPTNAME]), you can call it like any other program. You wouldn't normally use this option. Rather, you would put sed and any script file calls in a shell script. In some cases, the order of the commands matters. Test your scripts before making them "real" to avoid errors and data loss.

Sample Data

To exercise your sed skills you can use the textdata.txt file in Listing 1. This file contains empty lines, typos, and other errors. The second sample file I'll use in this article is called testlist.txt (Listing 2) and contains dates formatted in a number of different ways.

Listing 1: textdata.txt

chris hemsworth - Thor 0885465468798746
Scarlett Johansson - Black Widow 08755466584
Robert Downey - Iron Man 0987654321
Mark Ruffalo - Hulk 0405458765143321
Chris Evans - Captain America 0548/9988776655
Jeremy renner - Hawkeye 555/8812470
Tom Hiddleston  - Loki 87841487014848
Samuel Jackson - Nick Fury 043/956026386
Cobie Smulders - Maria Hill 23514560145
Hugh jackman - Wolverine 801539193
Paul Rudd - Ant Man 497349000

Listing 2: testlist.txt

22 April 1984
 7.04.1985
30 March 1986
19 April 1987
03.04.1988
26 March 1989
15 April 1990
31-March-1991
19 April 1992
11 April 1993
 3 April 1994
16. April 1995
 7 April 1996
30 March 1997
12 April 1998

Regular Expressions

Regular expressions are used in sed to describe string patterns. The more regex you use, the more complex the statement and the more confusing the command can be to understand. Some characters are valid both as special shell characters and as regex instructions, so you need to "escape" them with the \ character (Table 1). The construct [ABC] means "contains A or B or C," whereas the construct /ABC/ means "contains exactly that string."
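The difference between a character list and a literal string can be seen in a short sketch. The three input lines are invented for the comparison and are not part of the sample files.

```shell
# Three made-up lines: only the first contains the literal string ABC.
input=$(printf '%s\n' 'xABCx' 'only B here' 'none')

# /ABC/ selects lines that contain exactly that string.
literal=$(printf '%s\n' "$input" | sed -n '/ABC/p')

# /[ABC]/ selects lines that contain at least one of A, B, or C.
anyof=$(printf '%s\n' "$input" | sed -n '/[ABC]/p')

printf 'literal:\n%s\nany of:\n%s\n' "$literal" "$anyof"
```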

Table 1: Special Characters

Character  Function
(          Opens a grouped expression
)          Closes a grouped expression
{          Opens a command block or a repetition count
}          Closes a command block or a repetition count
[          Opens a list of characters
]          Closes a list of characters
"          Quotes a statement in which shell variables are resolved
'          Quotes a statement in which shell variables are not resolved
`          Encloses a shell command substitution
.          Any character other than a newline
,          Separates parameters, such as the line numbers of a range
:          Sets labels (for the t and b commands)
$          End of line or, as an address, the last line
&          Placeholder for the matched text, used in the replacement statement
|          OR (regex separator)
/          Separator in editing commands
^          Beginning of line, or negation in a list of characters
\          Escape character
!          After an address: apply the command to all lines except the addressed ones
*          Zero or more times
+          One or more times
=          Output line number
\n         Newline, line feed
\t         Tab character

Options and Editing Commands

Confusingly, sed has both options and commands with options. As is usual in Linux, options are preceded by the - character. The command options follow the command. Tables 2-4 provide an overview.

Table 2: Sed Options

Action                                                      Option
Execute command (can usually be omitted)                    -e
Disable data buffering                                      -u
Treat files separately                                      -s
Use extended regex                                          -r (or -E)
Edit file in place; create backup with the given extension  -i[FILEEXTENSION]
Read and execute script file                                -f [SCRIPTFILE]
Suppress automatic output of (unaffected) lines             -n
Show version                                                --version
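The -i option is the one most likely to cause data loss, so here is a cautious sketch of how it behaves when an extension is attached directly to the option. The directory and file names are hypothetical, and the extension .bak is just a common choice.

```shell
# Work in a throwaway directory so no real file is touched.
work=$(mktemp -d)
printf '%s\n' 'Paul Rudd - Ant Man 497349000' > "$work/sample.txt"

# Edit the file in place; the original is kept as sample.txt.bak.
sed -i.bak 's/Ant Man/Ant-Man/' "$work/sample.txt"

edited=$(cat "$work/sample.txt")
backup=$(cat "$work/sample.txt.bak")
printf 'edited: %s\nbackup: %s\n' "$edited" "$backup"

rm -rf "$work"
```

Writing the extension without a space after -i keeps the call unambiguous, because a separate word would otherwise be read as the sed script.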

Table 3: Editing Commands

Action                                  Command
Add lines above this one                i
Add lines below this one                a
Output this line                        p
Output this line with a maximum length  l [LENGTH]
Replace characters with others          y
End sed                                 q
Replace this line with text             c
Delete this line                        d
Search and replace                      s

Table 4: Editing Command Options

Action                                            Option
Output line number (a command in its own right)   =
All occurrences                                   g
Outputs modified line with the s editing command  p
Write the edited line to a file                   w

Searching

The search function, among other things, can be used to replace text sections, in which case the search query serves as the address. You can also use regex for search patterns. Table 5 shows some of the possibilities, with some examples in Table 6, and Figures 2-7 show some of the results. In these examples, sed reads either a data stream from a pipe or a text file directly.

Table 5: Patterns and Addressing

Action                                        Pattern
All lines                                     (null)
Line 25                                       25
Not line 25                                   25!
Lines 10 through 20                           10,20
Last line                                     $
Not pattern                                   '/PATTERN/!'
Character at beginning of line                /^CHAR/
String                                        /STRING/
Character set                                 [CHARS]
Any letter                                    [[:alpha:]]
Lowercase                                     [[:lower:]]
Uppercase                                     [[:upper:]]
Alphanumeric                                  [[:alnum:]]
Digit                                         [[:digit:]]
Hexadecimal digit                             [[:xdigit:]]
Tab and space                                 [[:blank:]]
Whitespace (space, tab, newline, etc.)        [[:space:]]
Control character                             [[:cntrl:]]
Printable characters (no control characters)  [[:print:]]
Visible characters (without spaces)           [[:graph:]]
Punctuation                                   [[:punct:]]

Character classes such as [:alpha:] are only valid inside a character list, which is why they appear in double brackets.

Table 6: Sample Searches and Patterns

Search for                                                      Pattern            Example                                     Figure
Term, name                                                      '/TERM/'           cat textdata.txt | sed -n '/Evans/p'
All lines containing 'man' or 'Man'                             '/[Mm]an/'         sed -n '/[Mm]an/p' textdata.txt             2
All lines except 3 through 5                                    '3,5!'             sed -n '3,5!'p textdata.txt                 3
All lines except those containing 'Man'                         '/Man/!'           sed -n '/Man/!'p textdata.txt               4
Lines containing 'H' or 'G'                                     '/[HG]/'           sed -n '/[HG]/'p textdata.txt
Lines not containing 'H' or 'G'                                 '/[H]\|[G]/!'      sed -n '/[H]\|[G]/!'p textdata.txt          5
Line 3                                                          '3'                cat textdata.txt | sed -n '3p'
Last line                                                       '$'                cat textdata.txt | sed -n '$p'
All lines outside the range from 'R.' to the next 'M.'          '/[R]./,/[M]./!'   sed -n '/[R]./,/[M]./!'p textdata.txt       6
Lines containing alphanumeric characters (not only whitespace)  '/[[:alnum:]]/'    cat textdata.txt | sed -n '/[[:alnum:]]/'p  7

Two patterns separated by a comma form a range: It starts at a line matching the first pattern (here, an R followed by any character) and ends at the next line matching the second pattern (an M followed by any character).

Figure 3: Output all lines except the third through fifth.
Figure 4: Output of lines except those containing 'Man'.
Figure 5: Output all lines except those containing 'H' or 'G'.
Figure 6: Output all lines outside the range from a line matching 'R.' through the next line matching 'M.'.
Figure 7: Output all lines that contain alphanumeric characters (no empty lines or lines containing only spaces, tabs, etc.).

Note the output of the following command:

sed -n '/[C]/,/[c]/!'p textdata.txt
chris hemsworth - Thor ...
Scarlett Johansson - Black Widow ...
Robert Downey - Iron Man ...
Mark Ruffalo - Hulk ...
Paul Rudd - Ant Man ...

Only lines that don't fall between the first occurrence of C (Chris Evans/Cobie Smulders) and c (Samuel Jackson/Hugh jackman) are output. If you reverse the letters, putting the lowercase c before the uppercase,

sed -n '/[c]/,/[C]/!'p textdata.txt
Jeremy renner - Hawkeye ...
Tom Hiddleston - Loki ...

you only get two lines. Everything between chris and Chris (inclusive), between Jackson and Cobie (just those lines), and between jackman and EOF is suppressed.
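To see which lines fall outside such ranges, it can help to print line numbers with the = command instead of the lines themselves. The following sketch assumes the names from Listing 1, shortened to the actor column; the variable names and the use of paste to join the numbers are illustrative choices.

```shell
# Actor names from Listing 1, in the same order as in textdata.txt.
names=$(printf '%s\n' \
  'chris hemsworth' 'Scarlett Johansson' 'Robert Downey' 'Mark Ruffalo' \
  'Chris Evans' 'Jeremy renner' 'Tom Hiddleston' 'Samuel Jackson' \
  'Cobie Smulders' 'Hugh jackman' 'Paul Rudd')

# Numbers of the lines outside every range that runs from a C line to the next c line.
upper_first=$(printf '%s\n' "$names" | sed -n '/C/,/c/!=' | paste -sd' ' -)

# The same with the letters reversed.
lower_first=$(printf '%s\n' "$names" | sed -n '/c/,/C/!=' | paste -sd' ' -)

printf 'C..c leaves lines: %s\nc..C leaves lines: %s\n' "$upper_first" "$lower_first"
```

A range that finds its start pattern but never finds its end pattern runs to the end of the input, which is why the last lines disappear in the second call.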

If you want to be absolutely certain that sed is doing what you want, you can combine several simple calls with pipes. The following command suppresses empty lines and lines with Man (Figure 8):

cat textdata.txt | sed -n '/[[:alnum:]]/'p | sed -n '/Man/!'p

Figure 8: Processing several instances of sed using a pipe.
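A small self-contained version of this two-stage filter might look like the sketch below. The input is a made-up excerpt with one empty line and one line of spaces; note that the character class is written inside a character list, that is, as [[:alnum:]].

```shell
# Two data lines plus an empty line and a line that contains only spaces.
input=$(printf '%s\n' 'Robert Downey - Iron Man' '' '   ' 'Mark Ruffalo - Hulk')

# Stage 1 keeps lines with at least one alphanumeric character;
# stage 2 drops the lines that contain Man.
result=$(printf '%s\n' "$input" |
  sed -n '/[[:alnum:]]/p' |
  sed -n '/Man/!p')

printf '%s\n' "$result"
```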

Substituting and Removing

With the s command, you can replace matched expressions. The search string and its replacement do not have to be the same length. The detailed syntax is shown in Figure 9.

Figure 9: Syntax of the search-and-replace statement.

You can limit the search and replace statement to specific lines by preceding the command with the line number, as shown here:

sed -n '5s/OLD/NEW/p' [TEXTFILE]

Or, for a range of lines, use:

sed -n '1,4s/OLD/NEW/p' [TEXTFILE]

You can also suppress changes to certain lines using the exclamation mark:

sed -n '20,80!s/OLD/NEW/p' [TEXTFILE]

Furthermore, you can limit changes to lines that contain certain strings or patterns that are not the same as those used for the search-and-replace statement,

sed -n '/[STRING|PATTERN]/s/OLD/NEW/gp' [TEXTFILE]

and you can delete the matched string by replacing it with an empty string.
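The effect of an address in front of the s command can be checked with a few invented lines. This is a sketch only: the data and variable names are made up, and -n is left out so that every line, changed or not, is shown.

```shell
# Four made-up lines that all contain the search string.
data=$(printf '%s\n' 'row1 OLD' 'row2 OLD' 'row3 OLD' 'row4 OLD')

# Substitute on line 2 only.
single=$(printf '%s\n' "$data" | sed '2s/OLD/NEW/')

# Substitute on lines 2 through 3.
range=$(printf '%s\n' "$data" | sed '2,3s/OLD/NEW/')

# Substitute everywhere except on lines 2 through 3.
outside=$(printf '%s\n' "$data" | sed '2,3!s/OLD/NEW/')

printf '%s\n---\n%s\n---\n%s\n' "$single" "$range" "$outside"
```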

By default, only the first occurrence of the search string on a line is processed. To replace all instances, add the g (global) option at the end of the statement. The stream editor can be a silent partner if the -n option is set, so if you want to see what's going on, add the p (print) option. You can also write results to an output file with w (write). Table 7 shows some short examples.

Table 7: Sample Search and Replace

Action                                                                      Example                                         Figure
Replace pattern at the first occurrence only                                cat textdata.txt | sed -n 's/e/E/p'             10
Replace pattern at every occurrence                                         cat textdata.txt | sed -n 's/e/E/gp'            10
Delete the word 'Man'                                                       sed -n 's/Man//gp' textdata.txt                 11
Replace 'Iron' with 'Tin' on line 4                                         cat textdata.txt | sed -n '4s/Iron/Tin/gp'      12
Replace '0' with '089' on all lines containing 'Man' or 'man'               sed -n '/[Mm]an/s/0/089/gp' textdata.txt        13
Replace '0' with '089' on all lines except those containing 'Man' or 'man'  sed -n '/[Mm]an/!s/0/089/gp' textdata.txt       14
Delete all numbers and slashes (/) and hyphens (-)                          cat textdata.txt | sed -n s'/[0-9\/-]//'gp      15

Figure 10: Using the "global" (g) option.
Figure 11: Deleting the word 'Man'.
Figure 12: Limiting the search and replace to one line.
Figure 13: Limiting the search and replace to selected lines.
Figure 14: Excluding lines for the search-and-replace statement.
Figure 15: Deleting numbers and symbols from lines.
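The options that follow the s command can be compared side by side in a short sketch. It uses one line from Listing 1; the temporary output file for the w option is a hypothetical name created by mktemp.

```shell
line='Jeremy renner - Hawkeye 555/8812470'

# Without g, only the first e on the line is replaced.
first=$(printf '%s\n' "$line" | sed 's/e/E/')

# With g, every e on the line is replaced.
all=$(printf '%s\n' "$line" | sed 's/e/E/g')

# With w, each changed line is also written to a file; the line
# without an e is not changed and therefore not written.
outfile=$(mktemp)
printf '%s\n' "$line" 'Tom' | sed -n "s/e/E/gw $outfile"
written=$(cat "$outfile")
rm -f "$outfile"

printf '%s\n%s\n%s\n' "$first" "$all" "$written"
```

The script for the w example is in double quotes only so that the shell can insert the temporary file name.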

The more complex example in Figure 16 converts the inconsistently formatted date syntax in the testlist.txt file to a common, unified (European) date format DD/MM/YYYY. Be sure to press Enter immediately after the backslash at the line's end. Alternatively, you can omit the backslash and let the command wrap; the pipe character connects with the lines that follow; however, this results in a less readable screen display.

Figure 16: Formatting dates.

The list is read in the first line and the following lines each pipe their output to the next command: Take any present leading space characters and substitute the number 0; replace any minus signs in dates with spaces; substitute any month written as a word with its numeric value followed by a slash; substitute any two-digit number at the beginning of a line (^), 0 through 3 and any digit, and any space character with "itself" (&) followed by a slash.

To reuse part of the search pattern in the replacement, enclose it in parentheses, which you have to be sure to escape with \. The final sed statements delete all remaining space characters (using the s command's g option).
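Because Figure 16 is only available as an image, here is a simplified sketch of a similar pipeline. It is not the command from the figure: It uses fewer steps, handles only the two month names that occur in testlist.txt, and converts every run of separators directly into a slash. The five input lines are a sample of the formats in Listing 2.

```shell
# Five of the date formats found in testlist.txt.
dates=$(printf '%s\n' '22 April 1984' ' 7.04.1985' '31-March-1991' \
  '16. April 1995' ' 3 April 1994')

# Step 1: a leading space becomes a leading zero.
# Step 2: month names become two-digit numbers.
# Step 3: each run of hyphens, dots, or spaces becomes one slash.
# Step 4: uniq drops adjacent duplicate lines.
normalized=$(printf '%s\n' "$dates" |
  sed 's/^ /0/' |
  sed -e 's/March/03/' -e 's/April/04/' |
  sed 's|[-. ][-. ]*|/|g' |
  uniq)

printf '%s\n' "$normalized"
```

The third step uses | instead of / as the separator of the s command so that the slash in the replacement does not need to be escaped.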

The uniq command on the last line of Figure 16 collapses adjacent duplicate lines so that each date is output only once. You can also "carry over" all or part of the original string into the replacement statement. Check out the following example:

echo "happy" | sed -n s'/happy/un&/'p

This example replaces happy with unhappy. You can also convert characters from lowercase to uppercase:

cat textdata.txt | sed -n s'/\([[:lower:]]\)/\U&/'pg

The \U before the & converts the matched text to uppercase (\U and \L are GNU sed extensions). Similarly, you can use

cat textdata.txt | sed -n s'/\([[:upper:]]\)/\L&/'pg

to convert from uppercase to lowercase.
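A compact way to try both conversions is sketched below. It assumes GNU sed, because \U and \L are not part of POSIX sed; the second input string is a made-up variant of a name from Listing 1.

```shell
# Capitalize only the first character of the line (GNU sed assumed).
capital=$(printf '%s\n' 'chris hemsworth' | sed 's/^[[:lower:]]/\U&/')

# Convert every uppercase letter on the line to lowercase.
lower=$(printf '%s\n' 'Hugh JACKMAN' | sed 's/[[:upper:]]/\L&/g')

printf '%s\n%s\n' "$capital" "$lower"
```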

Character Replacement

For one-for-one character replacement, use the y command. The search list should contain all the characters that need to be replaced, and the replacement list must have the same number of characters. The command uses y alone (no s), and -n should be omitted:

sed y'/[Search CHAR]/[Replacement CHAR]/'

Substitute the first character of only the lines in textdata.txt that begin with lowercase c with uppercase C (Figure 17).

Figure 17: One-for-one character replacement.
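The following sketch shows the y command with and without an address. It is not necessarily the command shown in Figure 17; the two data lines come from Listing 1 without the numbers, and cab is an invented test string. Keep in mind that y changes every listed character on an addressed line, not only the first one.

```shell
# Each character in the first list is replaced by its counterpart in the second.
upper=$(printf '%s\n' 'cab' | sed 'y/abc/ABC/')

# With the address /^c/, only lines that begin with a lowercase c are changed.
data=$(printf '%s\n' 'chris hemsworth - Thor' 'Chris Evans - Captain America')
fixed=$(printf '%s\n' "$data" | sed '/^c/y/c/C/')

printf '%s\n%s\n' "$upper" "$fixed"
```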

You use c to replace entire lines,

sed '/PATTERN/'c'REPLACEMENT'

or like this:

sed [LINE(n)] c'REPLACEMENT'

The example in Figure 18 replaces an empty line with a series of dashes.

Figure 18: Replacing an entire line matched to a search pattern.

In place of a search pattern, you can use line numbers. Be aware that even if you specify multiple line numbers they will all be replaced by a single instance of the replacement string. If you choose three lines, for example, it will look like the first line is replaced and the second and third lines are deleted.

The top example in Figure 19 replaces the blank line with a series of hash marks. The bottom example removes lines 2 through 4 and inserts a given line of text.

Figure 19: Replacing whole lines by line number.
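Both variants of the c command can be sketched with four invented lines, one of them empty. The replacement texts are made up, and the commands use the form with a backslash and a line break after c, which should work in GNU sed and in other implementations.

```shell
# Four made-up lines; the second one is empty.
data=$(printf '%s\n' 'line 1' '' 'line 3' 'line 4')

# Replace every empty line with a row of dashes.
dashed=$(printf '%s\n' "$data" | sed '/^$/c\
----------')

# Replace lines 2 through 3: the text appears once for the whole range.
merged=$(printf '%s\n' "$data" | sed '2,3c\
(two lines replaced)')

printf '%s\n---\n%s\n' "$dashed" "$merged"
```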

The d command deletes lines that match a pattern or a line number:

sed '/PATTERN/'d
sed [LINE(n)]d

Using the commands in Figure 20, you can search for and delete an empty line and then delete the fourth line.

Figure 20: Deleting lines.
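As a minimal sketch of both forms of d, the following calls work on four invented lines, one of them empty. The data and variable names are assumptions for the demonstration.

```shell
# Four made-up lines; the second one is empty.
data=$(printf '%s\n' 'line 1' '' 'line 3' 'line 4')

# Delete every empty line.
no_empty=$(printf '%s\n' "$data" | sed '/^$/d')

# Delete the fourth line only.
no_fourth=$(printf '%s\n' "$data" | sed '4d')

printf '%s\n---\n%s\n' "$no_empty" "$no_fourth"
```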

Adding and Inserting

You can add lines to a file after (a) or before (i) a given line number or a line matched by a search pattern (Figure 21). If you enter multiple line numbers or the pattern matches multiple times, the insertion occurs for each instance.

Figure 2: Searching for uppercase (Man) or lowercase (man).
Figure 21: Adding lines.

In the first sed command, a new line is added above the first line in the file; in the next command, it's added at the end ($). The command at the next prompt adds a new line above the matched search pattern, and the next line adds it below.
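The four cases described here can be sketched as follows. These are not the exact commands from Figure 21; the two data lines come from Listing 1 without the numbers, the inserted texts are made up, and the commands use the form with a backslash and a line break after i or a.

```shell
data=$(printf '%s\n' 'Robert Downey - Iron Man' 'Mark Ruffalo - Hulk')

# Insert a line above line 1 and append a line after the last line ($).
framed=$(printf '%s\n' "$data" | sed -e '1i\
--- start of list ---' -e '$a\
--- end of list ---')

# Insert a line above and append a line below the line that matches Hulk.
around=$(printf '%s\n' "$data" | sed -e '/Hulk/i\
(above Hulk)' -e '/Hulk/a\
(below Hulk)')

printf '%s\n\n%s\n' "$framed" "$around"
```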

Shell Variables

If a shell variable needs to be resolved, you need to enclose the statements in double quotes (") instead of single quotes ('). The short shell script in Listing 3 shows how to handle variables by searching through the sample file and outputting the matching lines. Figure 22 shows the result.

Listing 3: searchString.sh

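#!/bin/sh
# Prompt for a search string and print the matching lines of textdata.txt.
printf "Enter search string: "; read -r sstring
# Double quotes let the shell resolve $sstring before sed sees the pattern.
cat textdata.txt | sed -n "/$sstring/"p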
Figure 22: Handling variables in Bash scripts.
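The difference between the two kinds of quotes can be made visible without any user input. In this sketch, the search string is set directly, and the two data lines are taken from Listing 1 without the numbers; all variable names are illustrative.

```shell
sstring='Man'
data=$(printf '%s\n' 'Robert Downey - Iron Man' 'Mark Ruffalo - Hulk')

# Double quotes: the shell replaces $sstring, and sed searches for Man.
double=$(printf '%s\n' "$data" | sed -n "/$sstring/p")

# Single quotes: sed receives the text $sstring unchanged and finds nothing.
single=$(printf '%s\n' "$data" | sed -n '/$sstring/p')

printf 'double quotes: %s\nsingle quotes: %s\n' "$double" "$single"
```

If the search string can contain a slash, the pattern breaks, because the slash would end the address early; checking or escaping the input first is a sensible precaution.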

Conclusion

With sed, you can execute complex text manipulation commands without intervention. Its cryptic syntax encourages building scripts one bit at a time.