Shell practice: Introduction to the sed stream editor
Quick Edit
Sed (Stream EDitor) [1] automates repetitive operations on a text file and is especially effective when used in a shell script and with regular expressions (regex). In this article, I send program output to the screen. If you want to participate and practice, simply use the text files provided [2].
Sed Commands
Sed accepts commands and data from virtually anywhere. You can pass in commands directly or read them from a file. The data can be piped, redirected, or read from a text file. The output can be sent to the screen (usually stdout), through a pipe to the next command, or redirected to a destination file. (See the "Sed Call Options" box.) To let the shell resolve variables inside a command, you sometimes need to enclose the command in double quotes (") instead of single quotes (').
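The input and output routes just described can be tried out with a throwaway file. The following is a minimal sketch rather than one of the article's figures; the temporary files and their two-line content are made up for illustration.

```shell
# Hypothetical two-line sample file, created only for this sketch.
tmpfile=$(mktemp)
printf 'one\ntwo\n' > "$tmpfile"

# The same edit, with the data arriving by three different routes.
from_file=$(sed 's/one/1/' "$tmpfile")          # sed opens the file itself
from_pipe=$(cat "$tmpfile" | sed 's/one/1/')    # data arrives through a pipe
from_stdin=$(sed 's/one/1/' < "$tmpfile")       # data arrives by redirection

# Output normally goes to stdout; here it is redirected to a second file.
outfile=$(mktemp)
sed 's/one/1/' "$tmpfile" > "$outfile"
to_file=$(cat "$outfile")

printf '%s\n' "$from_file"
rm -f "$tmpfile" "$outfile"
```

All four variables should hold the same two lines, because only the route changes, not the edit.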
Syntax
The basic syntax structure is shown in Figure 1. Wherever an editing command is used, an address determines which lines it acts on; without an address, the command acts on every line. You can provide multiple addresses as long as doing so doesn't affect clarity. If you want to change "everything except," you can negate the address with the ! character.
To put multiple commands on one line, use the -e option:
sed -e 'command1' -e 'command2' ... -e 'commandN' ....
Or, you can add these commands from a script file.
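As a small sketch of the -e form (the input strings are invented for this example), the commands run in the order given for each line, which is why swapping them can change the result:

```shell
# Two substitutions applied one after the other to each line.
result=$(printf 'hello world\n' | sed -e 's/hello/goodbye/' -e 's/world/moon/')

# Order matters: the second command sees the output of the first.
ordered=$(printf 'a\n' | sed -e 's/a/b/' -e 's/b/c/')    # a -> b -> c
reversed=$(printf 'a\n' | sed -e 's/b/c/' -e 's/a/b/')   # no b yet, then a -> b

printf '%s\n' "$result" "$ordered" "$reversed"
```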
Script Files
A script file should have a single line for each statement. For example:
s/Gans//
s/jo/Jo/g
The first line removes the word Gans; more precisely, it substitutes nothing for the search string, but only for its first occurrence on each line. The second line substitutes Jo for every occurrence of jo because of the g option.
To create an executable sed script, include the shebang (#!) interpreter statement on the first line:
#!/bin/sed -f
If you make the script executable (e.g., chmod 700 [SCRIPTNAME]), you can call it like any other program. You wouldn't normally do this, however. Rather, you would put sed and any script file calls in a shell script. In some cases, the order of the commands matters. Test your scripts on copies of your data before using them in earnest to avoid errors and data loss.
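To see a script file at work, you can write the two statements shown above to a file and hand it to sed with -f. This is only a sketch: The temporary file name comes from mktemp, and the input string is invented so that both substitutions have something to do.

```shell
# Write the two-line sed script to a temporary file (one statement per line).
script=$(mktemp)
printf 's/Gans//\ns/jo/Jo/g\n' > "$script"

# Run the script file against a made-up input line.
result=$(printf 'Gans-jo-jo-Gans\n' | sed -f "$script")

printf '%s\n' "$result"
rm -f "$script"
```

Only the first Gans disappears, whereas both instances of jo are capitalized.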
Sample Data
To exercise your sed skills, you can use the textdata.txt file in Listing 1. This file contains empty lines, typos, and other errors. The second sample file I'll use in this article is called testlist.txt (Listing 2) and contains dates formatted in a number of different ways.
Listing 1: textdata.txt
chris hemsworth - Thor 0885465468798746
Scarlett Johansson - Black Widow 08755466584
Robert Downey - Iron Man 0987654321
Mark Ruffalo - Hulk 0405458765143321
Chris Evans - Captain America 0548/9988776655
Jeremy renner - Hawkeye 555/8812470
Tom Hiddleston - Loki 87841487014848
Samuel Jackson - Nick Fury 043/956026386
Cobie Smulders - Maria Hill 23514560145
Hugh jackman - Wolverine 801539193
Paul Rudd - Ant Man 497349000
Listing 2: testlist.txt
22 April 1984
7.04.1985
30 March 1986
19 April 1987
03.04.1988
26 March 1989
15 April 1990
31-March-1991
19 April 1992
11 April 1993
3 April 1994
16. April 1995
7 April 1996
30 March 1997
12 April 1998
Regular Expressions
Regular expressions are used in sed to describe string patterns. The more regex you use, the more complex the statement becomes and the harder the command is to understand. Some characters are special both to the shell and in regex, so you need to "escape" them with the \ character (Table 1). The construct [ABC] means "contains A or B or C," whereas the construct /ABC/ means "contains exactly that string."
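The difference between the two constructs is easy to check. In this minimal sketch, the three input lines are invented for illustration:

```shell
data=$(printf 'CAB\nABC\nxyz\n')

# [ABC] matches any line containing at least one of the characters A, B, or C.
any_of=$(printf '%s\n' "$data" | sed -n '/[ABC]/p')

# ABC matches only lines containing the three characters in exactly that order.
exact=$(printf '%s\n' "$data" | sed -n '/ABC/p')

printf 'any of: %s\n' $any_of
printf 'exact:  %s\n' "$exact"
```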
Table 1: Special Characters
| Character | Function |
|---|---|
| | Opens statement |
| | Ends statement |
| | Opens optional statement |
| | Closes optional statement |
| | Opens a list of characters |
| | Closes a list of characters |
| | Masks a statement in which shell variables are resolved |
| | Masks a statement in which shell variables are not resolved |
| | Encloses a statement block |
| | Any character other than a newline |
| | Separates parameters, such as line items |
| | Sets labels ( |
| | End of document, end of line, or last line |
| | Placeholder for search patterns, included in the replacement statement |
| | OR (regex separator) |
| | Separator in editing commands |
| | Beginning of line, or negation in a search pattern |
| | Escape character |
| | After a line number: do not output this line |
| | 0 or any number of times |
| | Pattern present at least once |
| | Output line number |
| | Newline, line feed |
| | Tab character |
Options and Editing Commands
Confusingly, sed has both options and commands with options. As is usual in Linux, options are preceded by the - character. The command options follow the command. Tables 2-4 provide an overview.
Table 2: Sed Options
| Action | Option |
|---|---|
| Execute command (can usually be omitted) | |
| Disable data buffering | |
| Treat files separately | |
| Use extended regex | |
| Create backup file | |
| Read and execute script file | |
| Suppress (unaffected) text areas | |
| Show version | |
Table 3: Editing Commands
| Action | Command |
|---|---|
| Add lines above this one | |
| Add lines below this one | |
| Output this line | |
| Output this line with a maximum length | |
| Replace signs with others | |
| End sed | |
| Replace text in this line | |
| Delete this line | |
| Search and replace | |
Table 4: Editing Command Options
| Action | Option |
|---|---|
| Output line number | |
| All occurrences | |
| Outputs modified line with the | |
| Write the edited line in the file | |
Searching
The search function can be used, among other things, to replace sections of text, in which case the search query serves as the address. You can also use regex for search patterns. Table 5 shows some of the possibilities, Table 6 provides some examples, and Figures 2-7 show some of the results. In these examples, sed both reads a piped data stream and accesses a text file directly.
Table 5: Patterns and Addressing
| Action | Pattern |
|---|---|
| All lines | (null) |
| Line 25 | |
| Not line 25 | |
| Lines 10 through 20 | |
| Last line | |
| Not pattern | |
| Character at beginning of line | |
| String | |
| Character set | |
| Any character | |
| Lowercase | |
| Uppercase | |
| Alphanumeric | |
| Digit | |
| Hexadecimal digit | |
| Tab and space | |
| Space | |
| Control character | |
| Printable characters (no control characters) | |
| Visible characters (without spaces) | |
| Punctuation | |
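The following sketch shows a few of these address types in standard sed syntax. It is not a reconstruction of the table's pattern column; the five numbered input lines and the variable names are made up for illustration.

```shell
lines=$(printf 'l1\nl2\nl3\nl4\nl5\n')

single=$(printf '%s\n' "$lines" | sed -n '2p')       # one line by number
range=$(printf '%s\n' "$lines" | sed -n '2,4p')      # a range of lines
last=$(printf '%s\n' "$lines" | sed -n '$p')         # the last line
negated=$(printf '%s\n' "$lines" | sed -n '2,4!p')   # everything outside the range

# A POSIX character class must sit inside a bracket expression: [[:digit:]]
digits=$(printf 'abc\n123\n' | sed -n '/[[:digit:]]/p')

printf '%s\n' "$single" "$last" "$digits"
```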
Table 6: Sample Searches and Patterns
| Search for | Pattern | Example | Figure |
|---|---|---|---|
| Term, Name | | | – |
| All lines containing 'man' or 'Man' | | | 2 |
| All lines except 3 through 5 | | | 3 |
| All lines except those containing 'Man' | | | 4 |
| Lines containing 'H' or 'G' | | | – |
| Lines not containing 'H' or 'G' | | | 5 |
| Line 3 | | | – |
| Last line | | | – |
| Multiple patterns: Do not output lines containing an 'R' somewhere and an 'M' somewhere else | | | 6 |
| All lines containing some alphanumeric characters (i.e., not all spaces) | | | 7 |
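Searches of the kind listed in Table 6 might look like the following sketch. The commands are my own illustrations, not necessarily the ones from the table or the figures, and the input is a five-line excerpt of the names in Listing 1 without the numbers.

```shell
data=$(printf '%s\n' \
    'Robert Downey - Iron Man' \
    'Mark Ruffalo - Hulk' \
    'Hugh jackman - Wolverine' \
    'Tom Hiddleston - Loki' \
    'Paul Rudd - Ant Man')

# Lines containing 'man' or 'Man'
man_any=$(printf '%s\n' "$data" | sed -n '/[mM]an/p')

# All lines except 3 through 5
not_3_5=$(printf '%s\n' "$data" | sed -n '3,5!p')

# Lines not containing 'H' or 'G'
not_hg=$(printf '%s\n' "$data" | sed -n '/[HG]/!p')

printf '%s\n' "$man_any"
```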
Note the output of the following command:
sed -n '/[C]/,/[c]/!'p textdata.txt
chris hemsworth - Thor ...
Scarlett Johansson - Black Widow ...
Robert Downey - Iron Man ...
Mark Ruffalo - Hulk ...
Paul Rudd - Ant Man ...
Only lines that don't fall between a line containing C (Chris Evans/Cobie Smulders) and the next line containing c (Samuel Jackson/Hugh jackman) are output. If you reverse the letters, putting the lowercase c before the uppercase,
sed -n '/[c]/,/[C]/!'p textdata.txt
Jeremy renner - Hawkeye ...
Tom Hiddleston - Loki ...
you only get two lines. Everything between chris and Chris (inclusive), between Jackson and Cobie (just those lines), and between jackman and EOF is suppressed.
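One detail of pattern ranges is easy to miss: Sed looks for the closing pattern only from the line after the one that opened the range. The following sketch uses four invented lines to show this; the first line contains both C and c but does not close the range it opens.

```shell
data=$(printf 'Cc\nx\nc\ny\n')

# The range opens on line 1 and closes on line 3, the next line containing c.
inside=$(printf '%s\n' "$data" | sed -n '/C/,/c/p')

# Negating the address prints only what lies outside the range.
outside=$(printf '%s\n' "$data" | sed -n '/C/,/c/!p')

printf 'inside:  %s\n' $inside
printf 'outside: %s\n' "$outside"
```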
If you want to be absolutely certain that sed is doing what you want, you can combine several simple calls with pipes. The following command suppresses empty lines and lines with Man (Figure 8):
cat textdata.txt | sed -n '/[[:alnum:]]/'p | sed -n '/Man/!'p
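The same two-stage filter can be tested without the sample file. In this sketch, the five input lines (one empty, one containing only spaces) are invented for illustration:

```shell
# Stage 1 keeps lines with at least one letter or digit;
# stage 2 drops the lines that contain Man.
result=$(printf 'Iron Man\n\nHulk\n   \nLoki\n' \
    | sed -n '/[[:alnum:]]/p' \
    | sed -n '/Man/!p')

printf '%s\n' "$result"
```

Note the doubled brackets: The class name [:alnum:] is only valid inside a bracket expression.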
Substituting and Removing
With the s command, you can replace matched expressions. The search and replacement strings don't need to be the same length. The detailed syntax is shown in Figure 9.
You can limit the search and replace statement to specific lines by preceding the command with the line number, as shown here:
sed -n '5s/OLD/NEW/p' [TEXTFILE]
Or, for a range of lines, use:
sed -n '1,4s/OLD/NEW/p' [TEXTFILE]
You can also suppress changes to certain lines using the exclamation mark:
sed -n '20,80!s/OLD/NEW/p' [TEXTFILE]
Furthermore, you can limit changes to lines that contain certain strings or patterns that are not the same as those used for the search-and-replace statement,
sed -n '/[STRING|PATTERN]/s/OLD/NEW/gp' [TEXTFILE]
and you can delete the matched string by replacing it with an empty string.
By default, only the first occurrence of the search string on a line is processed. To replace all instances, add the g (global) option at the end of the statement. The stream editor is a silent partner if the -n option is set, so if you want to see what's going on, add the p (print) option. You can also write results to an output file with w (write). Table 7 shows some short examples.
Table 7: Sample Search and Replace
| Action | Example | Figure |
|---|---|---|
| Replace pattern at the first occurrence only | | 10 |
| Replace pattern at every occurrence | | 10 |
| Delete the word 'Man' | | 11 |
| Replace 'Iron' with 'Tin' on line 4 | | 12 |
| Replace '0' with '089' on all lines containing 'Man' or 'man' | | 13 |
| Replace '0' with '089' on all lines except those containing 'Man' or 'man' | | 14 |
| Delete all numbers and slashes (/) and hyphens (-) | | 15 |
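Some of the actions in Table 7 can be sketched as follows. These commands are illustrations under my own assumptions, not the table's original examples; the three input lines are shortened stand-ins for entries in textdata.txt, so Iron Man sits on line 1 here rather than line 4.

```shell
data=$(printf 'Iron Man 0987\nHulk 0405\nAnt Man 4973\n')

# Delete the word Man by substituting an empty string.
no_man=$(printf '%s\n' "$data" | sed 's/ Man//')

# Replace Iron with Tin on one specific line only.
tin=$(printf '%s\n' "$data" | sed -n '1s/Iron/Tin/p')

# Replace every 0 with 089, but only on lines containing Man ...
on_man=$(printf '%s\n' "$data" | sed '/Man/s/0/089/g')

# ... or only on lines that do not contain Man.
not_man=$(printf '%s\n' "$data" | sed '/Man/!s/0/089/g')

printf '%s\n' "$on_man"
```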
The more complex example in Figure 16 converts the inconsistently formatted dates in the testlist.txt file to a common, unified (European) date format, DD/MM/YYYY. Be sure to press Enter immediately after the backslash at the end of each line. Alternatively, you can omit the backslash and let the command wrap, because the pipe character connects each command with the lines that follow; however, this results in a less readable screen display.
The list is read in the first line, and each of the following lines pipes its output to the next command: Replace any leading space characters with the number 0; replace any minus signs in dates with spaces; substitute any month written as a word with its numeric value followed by a slash; and substitute any two-digit number at the beginning of a line (^), that is, 0 through 3 followed by any digit and a space character, with "itself" (&) followed by a slash.
To reuse part of the search pattern in the replacement, enclose it in parentheses, which you have to be sure to escape with \. The final sed statements delete all remaining space characters (through the s command's g option).
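Because the steps are easier to follow in code, here is a minimal sketch of one way to normalize such dates. It is not the command from Figure 16: It uses a single sed call with several -e expressions instead of a chain of pipes, handles only the months March and April that occur in testlist.txt, and works on five sample dates taken from Listing 2.

```shell
dates=$(printf '%s\n' '22 April 1984' '7.04.1985' '31-March-1991' '3 April 1994' '16. April 1995')

# 1. Hyphens become spaces.
# 2. A dot, with any spaces after it, becomes a single space.
# 3. Month names become two-digit numbers.
# 4. A single-digit day gets a leading zero; \1 reuses the group in \( \).
# 5. The remaining spaces become slashes.
normalized=$(printf '%s\n' "$dates" | sed \
    -e 's/-/ /g' \
    -e 's/\. */ /g' \
    -e 's/ March / 03 /' \
    -e 's/ April / 04 /' \
    -e 's/^\([0-9]\) /0\1 /' \
    -e 's| |/|g')

printf '%s\n' "$normalized"
```

The last expression uses | as the separator of the s command so that the slash in the replacement doesn't need to be escaped.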
The uniq command on the last line collapses adjacent duplicate lines so that each one is output only once. You can also "carry over" all or part of the matched string into the replacement statement. Check out the following example:
echo "happy" | sed -n s'/happy/un&/'p
This example replaces happy with unhappy. You can also convert characters from lowercase to uppercase:
cat textdata.txt | sed -n s'/\([[:lower:]]\)/\U&/'pg
The \U before the & indicates that the matched text must be converted to uppercase. You can do the following:
cat textdata.txt | sed -n s'/\([[:upper:]]\)/\L&/'pg
to convert from uppercase to lowercase.
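The following sketch collects the & placeholder and the case conversions in one place. It assumes GNU sed, because \U and \L are GNU extensions that other sed implementations may not support; the input words are taken from or modeled on the sample data.

```shell
# & stands for whatever the search pattern matched.
unhappy=$(printf 'happy\n' | sed 's/happy/un&/')

# GNU sed only: \U and \L convert the text that follows to upper- or lowercase.
upper=$(printf 'thor\n' | sed 's/[[:lower:]]/\U&/g')
lower=$(printf 'LOKI\n' | sed 's/[[:upper:]]/\L&/g')

# Without g and with ^, only a lowercase first letter is capitalized.
capital=$(printf 'chris hemsworth\n' | sed 's/^[[:lower:]]/\U&/')

printf '%s\n' "$unhappy" "$upper" "$lower" "$capital"
```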
Character Replacement
For character-by-character replacement, use the y command. The pattern should contain all the characters that need to be replaced, and the replacement statement must have the same number of characters. The y command takes no trailing options such as g or p, and -n should be omitted:
sed y'/[Search CHAR]/[Replacement CHAR]/'
The example in Figure 17 replaces the first character with an uppercase C, but only on those lines in textdata.txt that begin with a lowercase c.
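A short sketch of the y command, using names from the sample data, shows that it maps characters one to one across the whole line unless an address restricts the lines it touches. The commands are illustrations and not necessarily those in Figure 17.

```shell
# Every c becomes C and every h becomes H, wherever they occur.
all=$(printf 'chris hemsworth\n' | sed 'y/ch/CH/')

# With an address, y acts only on lines that begin with a lowercase c.
# Scarlett contains a c, too, but the line does not start with one.
addressed=$(printf 'chris\nScarlett\n' | sed '/^c/y/c/C/')

printf '%s\n' "$all" "$addressed"
```

Keep in mind that y would also change any further c on a matching line, not just the first character.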
You use c to replace entire lines,
sed '/PATTERN/'c'REPLACEMENT'
or like this:
sed [LINE(n)]c'REPLACEMENT'
The example in Figure 18 replaces an empty line with a series of dashes.
In place of a search pattern, you can use line numbers. Be aware that even if you specify multiple line numbers they will all be replaced by a single instance of the replacement string. If you choose three lines, for example, it will look like the first line is replaced and the second and third lines are deleted.
The top example in Figure 19 replaces the blank line with a series of hash marks. The bottom example removes lines 2 through 4 and inserts a given line of text.
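Both cases can be sketched as follows. These are not the commands from Figures 18 and 19; the input lines are invented, and the replacement text is written in the traditional form, on its own line after c and a backslash, rather than in the one-line form shown above.

```shell
# Replace an empty line with a row of dashes.
dashes=$(printf 'top\n\nbottom\n' | sed '/^$/c\
----------')

# Replace lines 2 through 4: the text appears once, not three times.
block=$(printf 'l1\nl2\nl3\nl4\nl5\n' | sed '2,4c\
(three lines replaced)')

printf '%s\n' "$dashes" "$block"
```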
The d command deletes lines that match a pattern or line numbers:
sed '/PATTERN/'d
sed [LINE(n)]d
Using the commands in Figure 20, you can search for and delete an empty line and then delete the fourth line.
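As a sketch with four invented lines (not the commands from Figure 20), deleting by pattern and by line number looks like this. Line numbers always refer to the input, so line 4 is still line 4 even after the empty line has been deleted.

```shell
data=$(printf 'l1\n\nl3\nl4\n')

# Delete empty lines.
no_blank=$(printf '%s\n' "$data" | sed '/^$/d')

# Delete the empty line and the fourth input line in one call.
both=$(printf '%s\n' "$data" | sed -e '/^$/d' -e '4d')

printf '%s\n' "$no_blank"
```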
Adding and Inserting
You add lines to a file after (a) or before (i) a given line number or a line found with a search pattern (Figure 21). If you enter multiple line numbers or the pattern matches multiple times, the insertion occurs for each instance.
In the first sed command, a new line is added above the first line in the file; in the next command, it's added at the end ($). The command at the next prompt adds a new line above the matched search pattern, and the next one adds it below.
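The following sketch mirrors the cases just described with two invented input lines. It is not a copy of Figure 21, and it writes the new text on its own line after the command and a backslash.

```shell
data=$(printf 'first\nlast\n')

# Insert above line 1.
above=$(printf '%s\n' "$data" | sed '1i\
HEADER')

# Append below the last line ($).
below=$(printf '%s\n' "$data" | sed '$a\
FOOTER')

# Insert above, and append below, a line found by a pattern.
before_match=$(printf '%s\n' "$data" | sed '/last/i\
-- before --')
after_match=$(printf '%s\n' "$data" | sed '/first/a\
-- after --')

printf '%s\n' "$above"
```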
Shell Variables
If a shell variable needs to be resolved, you need to enclose the statements in double quotes (") instead of single quotes ('). The short shell script in Listing 3 shows how to handle variables by searching through the sample file and outputting the matching lines. Figure 22 shows the result.
Listing 3: searchString.sh
#!/bin/sh
# Ask for a search string and print the matching lines of textdata.txt
printf 'Enter search string: '
read -r sstring
sed -n "/$sstring/p" textdata.txt
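The effect of the quoting is easiest to see side by side. In this sketch, the search string is set directly instead of being read from the keyboard, and the three input lines are shortened stand-ins for the sample file.

```shell
sstring="Man"
data=$(printf 'Iron Man\nHulk\nAnt Man\n')

# Double quotes: the shell replaces $sstring before sed sees the command.
expanded=$(printf '%s\n' "$data" | sed -n "/$sstring/p")

# Single quotes: sed receives the literal text $sstring and finds nothing.
literal=$(printf '%s\n' "$data" | sed -n '/$sstring/p')

printf 'expanded: %s\n' "$expanded"
printf 'literal:  [%s]\n' "$literal"
```

Be aware that a search string containing a slash or regex special characters changes the meaning of the sed command, so input from users should be checked first.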
Conclusion
With sed, you can execute complex text manipulation commands without intervention. Its cryptic syntax encourages building scripts one bit at a time.