Data acquisition with shell tools
Wheat and Chaff
Companies that need to enter large volumes of data manually often take on temporary staff and allow little time to train their new recruits. Additionally, standard programs such as spreadsheets can be hard to understand and thus might not be an option for data entry.
At this point, companies might turn to the IT department for a more intuitive approach. Data entry scripts that prompt the data entry staff in a very simple way can sometimes provide a solution.
Terminal programs exist for any operating system, and they can support access via dialup connections, encrypted Internet connections, internal networks, or even the serial port.
Masks for data entry programs also are easily coded and modified, thus removing the need for complex provisioning at the desktop. In the simplest of cases, the data entry script will be based directly on the structure of the data you need to enter.
Ergonomics and Tools
Mask-oriented data entry scripts (Figure 1) query the user for the fields in the order in which they appear on printed forms (Figure 2).
In contrast, universally deployable scripts work with entry data that is based on indexes. In a separate step, the information media are manually edited before proceeding to data entry. In some cases, the data entry staff also merges the input values and the field labels. This process is useful if the data is complex and the structure cannot be rendered in a fixed mask.
The advantage here is that the program is used in the same way no matter what data you are working with. In most cases, the order in which the data is entered is not important.
As the script example shows, the indexes, referred to as keys in this case, handle the submask functions. You can use this technique to define whether you are entering new records or modifying existing ones, whether the records are destined for the expense or income accounts, and so on. The instruction sheet doesn't need to be a fixed form; instead, it must include keys, indexes, and shortcuts.
Figure 3 shows where data entry for the receipt starts. The key (fuel costs) is entered first and applies until the data entry employee interrupts the loop. The employee then simply enters the field numbers with the accompanying values (vehicle number in this case).
The script displays the data entered until the entry loop terminates (Figure 4). The data entry employee thus has a view of what has been typed from the sheet for the current key and can identify any errors.
Data entry is more user friendly if users can use the number pad to type figures. If the database uses a comma as the decimal separator, the script can handle the conversion to a period:
echo -n "Net price: ";read input net=$(echo $input | tr \, \.)
This functionality removes the need for users to switch between the alphanumeric keys and the numeric pad on the keyboard.
In many cases, users repeatedly need to enter the same, or very similar, data. Inserting a default value into the input line prevents errors and saves time.
Some versions of read
allow you to set a prompt and thus save the effort of an echo
command, but they do not allow you to set defaults for editing. The readpreprompt
command [1] gives administrators a convenient approach of implementing this in shell script:
vbyear="2010" year=`readpreprompt "Calendaryear: " $vbyear`
The readpreprompt
command expects a prompt and a default surrounded by backticks, which you can either hard code or feed in using a variable.
The use of intuitive shortcuts, which are expanded by the scripts, is generally a big help for users. This approach also speeds up data entry and decreases input errors.
At the end of the input, the script displays the new record once more and offers the user the option of correcting the entry before finally sending the record to the input file or database. Batch sums and checksums are also useful for identifying data entry errors.
Value and Plausibility Checking
Input errors can be identified by reading the input (before saving the data) or, in more complex cases, by dual entry. Plausibility checks are hard-coded into the data entry script (preventing dual entry lines; e.g., as in Listing 1 and Figure 5); for more complex tasks, the information on which the checks are based can be swapped out to more easily editable control files or retrieved by querying a database. The script then loads the controls as needed.
Listing 1: Offline Data Entry Program
01 #! /bin/sh 02 clear 03 while true 04 do 05 echo "New record" 06 maxrecord=`ls -1 *.dat | sort | tail -1 | cut -d. -f 1` 07 newrecord=$(echo $maxrecord + 1 | bc ) 08 width=`echo $newrecord | wc -c` 09 10 ... 11 12 # If the line already exists .... 13 if [ $back -eq 0 ]; 14 then 15 echo "############### Line already exists! #############" 16 grep $keyfield $newdat 17 echo "------------------------------------------------------" 18 echo -n "Replace stored data line (y)? ";read we 19 if [ "$we" = "y" ]; 20 then 21 22 # Delete line and write to 23 # temporary file 24 25 # Build sed instruction 26 resp=$(echo "sed '/$keyfield'/d") 27 28 # Execute with eval 29 cat $newdat | eval $resp > $newdat.tmp 30 31 # Rename temporary file again 32 mv $newdat.tmp $newdat 33 34 # Add new data line 35 echo $input >> $newdat 36 37 fi 38 else 39 echo $input >> $newdat 40 fi 41 42 clear 43 44 done 45 # Calculate checksum invoice amount 46 47 a=`cat $newdat | grep "50|21|" | cut -d\| -f3` 48 b=`cat $newdat | grep "50|25|" | cut -d\| -f3` 49 50 a=$(echo $a | tr \, \.) 51 b=$(echo $b | tr \, \.) 52 53 54 psum=$(echo "$a+$b" | bc ) 55 echo -n "Invoice amount: $psum, continue .....";read we 56 57 done 58 done
Creating checksums (number of lines, sum of numeric values) helps ensure that records are complete. The number of lines is returned by a call to wc -l
. Listing 1 also shows an example of calculating a checksum when calculating the invoice amount for the fuel receipt. The complete listing is available from the Admin magazine website [2].
Online and Offline Data Entry
Online data entry sends the data directly to the database. The shell script opens a connection to the database, generates SQL instructions, and passes them into the database. The advantage of this approach is that your records are always up to date.
Shell sessions for online input can even use a dialup connection. Users of Windows operating systems can use the PuTTY [3] SSH terminal, which supports the required encrypted terminal access to the server system. A small script opens the database connection (Listing 2).
Listing 2: Database Connection
01 #! /bin/sh 02 echo "Demo for database write access" 03 rm input.sql 04 touch input.sql 05 sleep 3 06 clear 07 08 # Entry 09 10 while true 11 do 12 echo -n "Text entry: ";read text 13 echo -n "Numeric entry: ";read number 14 echo -n "Save (s) End (e) ";read we 15 if [ "$we" = "s" ]; 16 then 17 echo "Insert into demo values ('$text',$number);" >> input.sql 18 elif [ "$we" = "e" ]; 19 then 20 # Write to database 21 psql -d harald -f input.sql 22 exit 23 fi 24 done
In contrast, data entry and database updates occur separately in offline mode. After entry, the data is sent to the database en bloc, thus avoiding the need for a permanent connection to the database or application server.
Additionally, hybrid approaches are possible. For example, you could open a connection to the database server after entering a stack of receipts and submit the cached commands. This approach reduces network load on the one hand and keeps the database up to date in batches.
Data from offline entries can be prepared for writing to the database with the use of grep
, cut
, tr
, sed
, or awk
. When the data has been prepared, Listing 3 creates the required SQL code.
Listing 3: Generating SQL Code
01 #! /bin/sh 02 echo -n "Name database input file: ";read dn 03 for i in `ls -1 *.dat` 04 do 05 echo $i 06 07 # Parse data 08 vehicle=$(cat $i | grep "50|10|" | cut -d \| -f 3) 09 mileage=$(cat $i | grep "50|15|" | cut -d \| -f 3) 10 date=$(cat $i | grep "50|17|" | cut -d \| -f 3) 11 quantity=$(cat $i | grep "50|20|" | cut -d \| -f 3 | tr \, \.) 12 net=$(cat $i | grep "50|21|" | cut -d \| -f 3 | tr \, \.) 13 tax=$(cat $i | grep "50|25|" | cut -d \| -f 3 | tr \, \.) 14 15 # Write SQL instruction file 16 echo "Insert into fuel values ('$vehicle',$mileage,'$date',$quantity, $net, $tax)" >> $dn
Conclusions
Shell scripts are easy to write, cause virtually no load on the system, and are a very useful tool in manual data entry. Temporary workers without IT skills can put in useful work after a short introductory session.