NUTS AND BOLTS Data Entry Shell Masks Lead image: Dmitriy Syechin, 123RF

Data acquisition with shell tools

Wheat and Chaff

If you need to enter large quantities of data manually, input masks in a terminal are often faster than a GUI. By Harald Zisler

Companies that need to enter large volumes of data manually often take on temporary staff and allow little time to train their new recruits. Additionally, standard programs such as spreadsheets can be hard to understand and thus might not be an option for data entry.

At this point, companies might turn to the IT department for a more intuitive approach. Data entry scripts that prompt the data entry staff in a very simple way can sometimes provide a solution.

Terminal programs exist for any operating system, and they can support access via dialup connections, encrypted Internet connections, internal networks, or even the serial port.

Masks for data entry programs also are easily coded and modified, thus removing the need for complex provisioning at the desktop. In the simplest of cases, the data entry script will be based directly on the structure of the data you need to enter.

Ergonomics and Tools

Mask-oriented data entry scripts (Figure 1) query the user for the fields in the order in which they appear on printed forms (Figure 2).

Figure 1: Mask-oriented data entry script based on the form shown in Figure 2.

Figure 2: Data entry form with fixed fields.

In contrast, universally deployable scripts work with entry data that is based on indexes. In a separate step, the information media are manually edited before proceeding to data entry. In some cases, the data entry staff also merges the input values and the field labels. This process is useful if the data is complex and the structure cannot be rendered in a fixed mask.

The advantage here is that the program is used in the same way no matter what data you are working with. In most cases, the order in which the data is entered is not important.

As the script example shows, the indexes, referred to as keys in this case, handle the submask functions. You can use this technique to define whether you are entering new records or modifying existing ones, whether the records are destined for the expense or income accounts, and so on. The instruction sheet doesn't need to be a fixed form; instead, it must include keys, indexes, and shortcuts.

Figure 3 shows where data entry for the receipt starts. The key (fuel costs) is entered first and applies until the data entry employee interrupts the loop. The employee then simply enters the field numbers with the accompanying values (vehicle number in this case).

Figure 3: Starting to enter a fuel receipt.

The script displays the data entered until the entry loop terminates (Figure 4). The data entry employee thus has a view of what has been typed from the sheet for the current key and can identify any errors.

Figure 4: Fuel receipt entry: Displaying the entered data.

Data entry is more user friendly if users can use the number pad to type figures. If the database uses a comma as the decimal separator, the script can handle the conversion to a period:

echo -n "Net price: ";read input
net=$(echo $input | tr \, \.)

This functionality removes the need for users to switch between the alphanumeric keys and the numeric pad on the keyboard.

In many cases, users repeatedly need to enter the same, or very similar, data. Inserting a default value into the input line prevents errors and saves time.

Some versions of read allow you to set a prompt and thus save the effort of an echo command, but they do not allow you to set defaults for editing. The readpreprompt command [1] gives administrators a convenient approach of implementing this in shell script:

vbyear="2010"
year=`readpreprompt "Calendaryear: " $vbyear`

The readpreprompt command expects a prompt and a default surrounded by backticks, which you can either hard code or feed in using a variable.

The use of intuitive shortcuts, which are expanded by the scripts, is generally a big help for users. This approach also speeds up data entry and decreases input errors.

At the end of the input, the script displays the new record once more and offers the user the option of correcting the entry before finally sending the record to the input file or database. Batch sums and checksums are also useful for identifying data entry errors.

Value and Plausibility Checking

Input errors can be identified by reading the input (before saving the data) or, in more complex cases, by dual entry. Plausibility checks are hard-coded into the data entry script (preventing dual entry lines; e.g., as in Listing 1 and Figure 5); for more complex tasks, the information on which the checks are based can be swapped out to more easily editable control files or retrieved by querying a database. The script then loads the controls as needed.

Listing 1: Offline Data Entry Program

01 #! /bin/sh
02 clear
03 while true
04 do
05 echo "New record"
06 maxrecord=`ls -1 *.dat | sort | tail -1 | cut -d. -f 1`
07 newrecord=$(echo $maxrecord + 1 | bc )
08 width=`echo $newrecord | wc -c`
09
10 ...
11
12 # If the line already exists ....
13 if [ $back -eq 0 ];
14    then
15    echo "############### Line already exists! #############"
16    grep $keyfield $newdat
17    echo "------------------------------------------------------"
18    echo -n "Replace stored data line (y)? ";read we
19    if [ "$we" = "y" ];
20       then
21
22       # Delete line and write to
23       # temporary file
24
25       # Build sed instruction
26       resp=$(echo "sed '/$keyfield'/d")
27
28       # Execute with eval
29       cat $newdat | eval $resp > $newdat.tmp
30
31       # Rename temporary file again
32       mv $newdat.tmp $newdat
33
34       # Add new data line
35       echo $input >> $newdat
36
37    fi
38 else
39    echo $input >> $newdat
40 fi
41
42 clear
43
44 done
45 # Calculate checksum invoice amount
46
47 a=`cat $newdat | grep "50|21|" | cut -d\| -f3`
48 b=`cat $newdat | grep "50|25|" | cut -d\| -f3`
49
50 a=$(echo $a | tr \, \.)
51 b=$(echo $b | tr \, \.)
52
53
54 psum=$(echo "$a+$b" | bc )
55 echo -n "Invoice amount: $psum, continue .....";read we
56
57 done
58 done

Figure 5: Plausibility checks warn the user of possible input errors.

Creating checksums (number of lines, sum of numeric values) helps ensure that records are complete. The number of lines is returned by a call to wc -l. Listing 1 also shows an example of calculating a checksum when calculating the invoice amount for the fuel receipt. The complete listing is available from the Admin magazine website [2].

Online and Offline Data Entry

Online data entry sends the data directly to the database. The shell script opens a connection to the database, generates SQL instructions, and passes them into the database. The advantage of this approach is that your records are always up to date.

Shell sessions for online input can even use a dialup connection. Users of Windows operating systems can use the PuTTY [3] SSH terminal, which supports the required encrypted terminal access to the server system. A small script opens the database connection (Listing 2).

Listing 2: Database Connection

01 #! /bin/sh
02 echo "Demo for database write access"
03 rm input.sql
04 touch input.sql
05 sleep 3
06 clear
07
08 # Entry
09
10 while true
11 do
12 echo -n "Text entry: ";read text
13 echo -n "Numeric entry: ";read number
14 echo -n "Save (s) End (e) ";read we
15 if [ "$we" = "s" ];
16    then
17    echo "Insert into demo values ('$text',$number);" >> input.sql
18 elif [ "$we" = "e" ];
19    then
20    # Write to database
21    psql -d harald -f input.sql
22    exit
23 fi
24 done

In contrast, data entry and database updates occur separately in offline mode. After entry, the data is sent to the database en bloc, thus avoiding the need for a permanent connection to the database or application server.

Additionally, hybrid approaches are possible. For example, you could open a connection to the database server after entering a stack of receipts and submit the cached commands. This approach reduces network load on the one hand and keeps the database up to date in batches.

Data from offline entries can be prepared for writing to the database with the use of grep, cut, tr, sed, or awk. When the data has been prepared, Listing 3 creates the required SQL code.

Listing 3: Generating SQL Code

01 #! /bin/sh
02 echo -n "Name database input file: ";read dn
03 for i in `ls -1 *.dat`
04 do
05 echo $i
06
07 # Parse data
08 vehicle=$(cat $i | grep "50|10|" | cut -d \| -f 3)
09 mileage=$(cat $i | grep "50|15|" | cut -d \| -f 3)
10 date=$(cat $i | grep "50|17|" | cut -d \| -f 3)
11 quantity=$(cat $i | grep "50|20|" | cut -d \| -f 3 | tr \, \.)
12 net=$(cat $i | grep "50|21|" | cut -d \| -f 3 | tr \, \.)
13 tax=$(cat $i | grep "50|25|" | cut -d \| -f 3 | tr \, \.)
14
15 # Write SQL instruction file
16 echo "Insert into fuel values ('$vehicle',$mileage,'$date',$quantity, $net, $tax)" >> $dn

Conclusions

Shell scripts are easy to write, cause virtually no load on the system, and are a very useful tool in manual data entry. Temporary workers without IT skills can put in useful work after a short introductory session.