Working with Files

This chapter will first describe general characteristics of Unix commands. It will then discuss some commands which are commonly used to create and manipulate files.

A summary of some of the most commonly used Unix commands is presented in Command Comparisons. A list of Unix utilities by function, as well as detailed descriptions of the most commonly used Unix utilities, is included at the back of A Practical Guide to the Unix System by Mark G. Sobell.

Unix File Names

It is important to understand the rules for creating Unix files:

  • Unix is case sensitive! For example, "fileName" is different from "filename".
  • It is recommended that you limit file names to letters, digits, the underscore (_), and the dot (.). A dot in a Unix file name is simply a character, not a delimiter between name components, so a file name may contain more than one dot. A file whose name begins with a dot is hidden from the normal ls listing; use the -a option of ls to display hidden files.
  • Although most systems allow longer names, 14 characters is a safe maximum length for a file name.
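You can see the hidden-file and multiple-dot rules in action with the following minimal sketch. It assumes a Bourne-compatible shell (sh, ksh, bash) and creates a few made-up files in a scratch directory obtained from mktemp, so none of your own files are affected.

```shell
#!/bin/sh
# Work in a scratch directory so no existing files are touched.
work=$(mktemp -d)
cd "$work" || exit 1

# Hypothetical names: one with several dots, one hidden "dot file".
touch notes.txt report.v2.txt .profile_test

echo "ls:"
ls          # .profile_test is not listed
echo "ls -a:"
ls -a       # .profile_test appears, along with . and ..
```
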

Unix shells typically include several important wildcard characters. The asterisk (*) matches zero or more characters (e.g., abc* matches any file name beginning with the letters abc), the question mark (?) matches any single character, and square brackets ([ and ]) enclose a set of characters, any one of which will match (e.g., [a-d] matches a single a, b, c, or d). Execute the following commands and observe the results:

  ls m*
  ls *.f
  ls *.?
  ls [a-d]*
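The directory you run the commands above in may not contain matching files. The sketch below therefore creates hypothetical file names chosen to exercise each wildcard in a scratch directory (assuming a Bourne-compatible shell and mktemp).

```shell
#!/bin/sh
work=$(mktemp -d)
cd "$work" || exit 1
# Made-up names that show what each wildcard does and does not match.
touch abc abcdef abx main.f model.c b1 d9 e1

echo "abc*:";   ls abc*     # abc and abcdef (zero or more characters after abc)
echo "*.f:";    ls *.f      # main.f
echo "*.?:";    ls *.?      # main.f and model.c (exactly one character after the dot)
echo "[a-d]*:"; ls [a-d]*   # names starting with a, b, c or d; e1 is excluded
```
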

Note for PC users: Unix uses forward slashes (/) instead of backslashes (\) to separate directory names in a path.

Looking at the Contents of Files

You can examine the contents of files using a variety of commands. cat, more, pg, head, and tail are described here. Of course, you can always use an editor; to use vi in "read-only" mode to examine the contents of the file "argtest", enter:

  vi  -R   argtest

You can now use the standard vi commands to move through the file; however, vi will not save changes to the file unless you deliberately force a write with :w!. This option is useful when you simply want to look at a file and want to avoid making accidental changes while doing so.

Use the vi :q command to exit from the file (or :q! to discard any changes you made while viewing it).

cat Command

cat is a utility used to conCATenate files. Thus it can be used to join files together, but it is perhaps more commonly used to display the contents of a file on the screen.

Observe the output produced by each of the following commands:

  cd;    cd  xmp
  cat        cars
  cat  -vet  cars
  cat  -n    cars

The semicolon (;) in the first line of this example is a command separator that lets you enter more than one command on a line. When you press <Return>, cd runs first and changes to your home directory; then "cd xmp" changes into the subdirectory "xmp". Entering this line is equivalent to entering the two commands on separate lines. They are included here to ensure that you are in the subdirectory containing "cars" and the other example files. You need not enter them if you are already in the "xmp" directory created when you copied the example files (see Sample Files if you have not already copied these files).

The "-vet" options display non-printing characters (-v), mark the end of each line with a "$" (-e), and show tab characters as "^I" (-t); the "-n" option numbers each line as it is displayed.
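The effect of these options is easiest to see on a file whose invisible characters you know in advance. The following sketch builds a small made-up file (not the sample "cars" file) containing a tab and a trailing space, assuming a Bourne-compatible shell and a cat that supports -v, -e, -t, and -n, as the commands above do.

```shell
#!/bin/sh
work=$(mktemp -d)
cd "$work" || exit 1
# Hypothetical two-line file: a tab between fields and a trailing
# space after "Mustang", both invisible in a plain listing.
printf 'Ford\tMustang \nHonda\tCivic\n' > cars.sample

cat cars.sample        # tab and trailing space look like ordinary blanks
cat -vet cars.sample   # tab shown as ^I, end of each line marked with $
cat -n cars.sample     # each line preceded by its line number
```
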

You can also use the cat command to join files together:

  cat  page1
  cat  page2
  cat  page1  page2 > document
  cat  document

Note: If the file "document" already exists, its previous contents are replaced by the contents of the files "page1" and "page2".
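To confirm that ">" replaces rather than adds, you can run the same concatenation twice. The sketch below does this on made-up "page1" and "page2" files in a scratch directory, assuming a Bourne-compatible shell.

```shell
#!/bin/sh
work=$(mktemp -d)
cd "$work" || exit 1
printf 'This is page one.\n' > page1
printf 'This is page two.\n' > page2

cat page1 page2 > document   # creates "document"
cat document

cat page1 page2 > document   # running it again replaces the contents
wc -l < document             # still 2 lines, not 4
```
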

Cautions in using the cat command

The cat command should only be used with "text" files; it should not be used to display the contents of binary files (e.g., compiled C or FORTRAN programs). Unpredictable results may occur, including the termination of your logon session, when the cat command is used on binary files. Use the command "file *" to display the type of each file in a directory before using the cat command on any unknown file. You can use the od (Octal Dump) command to display the contents of non-text files; enter "man od" for details. For example, to display the contents of "a.out" in both hexadecimal and character representation, enter:

  od  -xc  a.out
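The check-before-you-cat advice can be partly automated. The sketch below defines a hypothetical shell function, show, that treats any file containing a NUL byte as binary and displays it with od -c instead of cat. This heuristic is an assumption made for illustration; it is much cruder than what the file command does. The sketch assumes a Bourne-compatible shell and uses made-up files in a scratch directory.

```shell
#!/bin/sh
# show: hypothetical helper that refuses to cat files that look binary.
# Heuristic (an assumption): a file containing a NUL byte is binary.
show() {
    total=$(wc -c < "$1")
    without_nul=$(tr -d '\000' < "$1" | wc -c)
    if [ $total -ne $without_nul ]; then
        echo "$1 looks binary; showing it with od -c" >&2
        od -c "$1"
    else
        cat "$1"
    fi
}

work=$(mktemp -d)
cd "$work" || exit 1
printf 'plain text\n' > note.txt
printf 'AB\000\001CD' > blob.bin   # made-up data containing a NUL byte

show note.txt    # printed normally
show blob.bin    # dumped with od, NUL shown as \0
```
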

Warning! cat and other Unix commands can destroy files if not used correctly. For example, as illustrated in the Sobell book, cat (like cp and mv) can overwrite, and thus destroy, existing files. Observe the results of the following command:

  cat  letter page1 >  letter

Typically Unix does not return a message when a command executes successfully; part of the Unix philosophy is "no news is good news." Thus the appearance of a message is a warning that the command did not complete as intended. Here the shell processes the redirection before cat even starts: it truncates "letter" to an empty file. cat is then asked to copy the now-empty "letter" followed by "page1" into "letter". Because "letter" is both an input file and the output file, cat typically reports an error diagnostic.

Now use the "cat" command to individually examine the contents of the files "letter" and "page1". Observe that the file "letter" does not contain the original contents of the files "letter" and "page1" as was intended.

Use the following command to restore the original file "letter":

  cp  ~aixstu00/xmp/letter  .
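There are two ways to get the result the failed command intended. You can write the combined output to a new file and rename it only after cat succeeds, or, when the goal is simply to add "page1" to the end of "letter", you can use the append operator (>>) described under Redirection. The sketch below shows both on made-up scratch files, assuming a Bourne-compatible shell. The names letter.new and letter.copy are hypothetical.

```shell
#!/bin/sh
work=$(mktemp -d)
cd "$work" || exit 1
printf 'Dear Sir,\n' > letter
printf 'Page one text.\n' > page1
cp letter letter.copy        # second copy, used for approach 2

# Approach 1: never name an input file as the output. Write to a new
# file, then rename it over the original only if cat succeeded.
cat letter page1 > letter.new && mv letter.new letter

# Approach 2: to add page1 to the end of a file, append to it.
cat page1 >> letter.copy

cat letter                   # original text followed by page1
```
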

more Command

You can display and browse files with the more command. The more command is useful when examining a large file, as it displays the file contents one page (screen) at a time so that you can examine each page before moving on. As with the man command, you press the space bar to proceed to the next screen of the file. On many systems, pressing the <b> key will page backwards in the file. To terminate more at any time, press <q>.

To examine a file with the more command, simply enter:

  more  file_name

See the online manual pages for additional information.

On many systems the man command uses the more command (or a similar pager) to display the manual pages; thus the commands you are familiar with in using man will also work with more.

Not all Unix systems include the more command; some implement the pg command instead. VTAIX includes both the more and pg commands. When using the pg command, press <Return> to page down through a file instead of using the space bar.

Observe the results of entering the following commands:

  more  argtest
  pg    argtest

head Command

The head command is used to display the first few lines (10 by default) of a file. This command can be useful when you wish to look for specific information which would be found at the beginning of a file. For example, enter:

  head  argtest
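The -n option changes how many lines head displays. The following sketch shows this on a made-up 20-line file in a scratch directory, assuming a Bourne-compatible shell.

```shell
#!/bin/sh
work=$(mktemp -d)
cd "$work" || exit 1
# Hypothetical 20-line file: "line 1" through "line 20".
i=1
while [ $i -le 20 ]; do
    echo "line $i"
    i=$((i + 1))
done > numbers

head numbers          # first 10 lines (the default)
head -n 3 numbers     # first 3 lines only
```
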

tail Command

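The tail command is used to display the last lines (10 by default) of a file. This command can be useful to monitor the status of a program which appends output to the end of a file; "tail -f file_name" keeps displaying new lines as they are appended until you press <Ctrl-C>. For example, enter: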

  tail  argtest
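tail accepts the same -n option as head. The sketch below uses a made-up 20-line file in a scratch directory and assumes a Bourne-compatible shell. It leaves out tail -f because that form keeps running until it is interrupted.

```shell
#!/bin/sh
work=$(mktemp -d)
cd "$work" || exit 1
# Hypothetical 20-line file: "line 1" through "line 20".
i=1
while [ $i -le 20 ]; do
    echo "line $i"
    i=$((i + 1))
done > numbers

tail numbers          # last 10 lines: line 11 through line 20
tail -n 2 numbers     # last 2 lines: line 19 and line 20
```
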

Copying, Erasing, Renaming

Warning! The typical Unix operating system provides no 'unerase' or 'undelete' command. If you mistakenly delete a file, you must depend on the backups you or the system administrator have maintained in order to recover it. You also need to be careful with commands like cp (copy) and mv (move), which can overwrite existing files. If you are using the C or Korn Shell, you can create a command alias that prompts you for verification before these commands overwrite files.

Copying Files

The cp command is used to copy a file or group of files. You have already seen an example application of the cp command when you copied the sample files to your userid (see Sample Files). Now let's make a copy of one of these files. Recall that you can obtain a listing of the files in the current directory using the ls command. Observe the results of the following commands:

  ls  l*
  cp  letter  letter.2
  ls  l*

Note: Unlike the copy command of some other operating systems, such as PC/DOS, the Unix cp command requires you to specify a target; it does not assume the current directory if no "copy-to" target is given. To copy a file into the current directory, use a dot (.) as the target.
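The sketch below shows the three cases on made-up files in a scratch directory, assuming a Bourne-compatible shell: copying to a new name, copying into the current directory with "." as the target, and omitting the target altogether, which is an error.

```shell
#!/bin/sh
work=$(mktemp -d)
cd "$work" || exit 1
printf 'Dear Sir,\n' > letter
mkdir archive

cp letter letter.2                  # copy under a new name
(cd archive && cp ../letter .)      # "." means: into the current directory
cp letter 2>/dev/null || echo "cp with no target is an error"
```
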

Erasing Files

Unix uses the command rm (ReMove) to delete unwanted files. To remove the file "letter.2" which we have just created, enter:

  rm  letter.2

Enter the command "ls l*" to display a list of all files beginning with the letter "l". Note that letter.2 is no longer present in the current directory.

The rm command can be used with wildcards in file names; however, this can be dangerous, as you might erase files you had wanted to keep. It is recommended that you use the "-i" (interactive) option of rm for wildcard deletes; you will then be prompted for each file and must respond with "y" or "Y" to delete it.
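The sketch below shows rm -i working through a wildcard match on made-up scratch files, assuming a Bourne-compatible shell. The answers are piped in ("y" for the first file, "n" for the second) only so that the sketch runs unattended; this relies on rm -i reading its answers from standard input, as common implementations do. Normally you would type the answers yourself.

```shell
#!/bin/sh
work=$(mktemp -d)
cd "$work" || exit 1
touch draft1.tmp draft2.tmp keep.txt

# rm -i asks about each file matched by *.tmp, in sorted order:
# draft1.tmp gets "y" (removed), draft2.tmp gets "n" (kept).
printf 'y\nn\n' | rm -i *.tmp
ls
```
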

Renaming a File

The typical Unix operating system utilities do not include a rename command; however, we can use the mv (MoVe) command (see Working with Directories for additional uses of this command) to "move" a file from one name to another. Observe the results of the following commands:

  ls  [dl]*
  mv  letter  document
  ls  [dl]*
  mv  document letter
  ls  [dl]*

Note: The first mv command overwrites the file "document", which you had created in an earlier exercise by concatenating "page1" and "page2". No warning is issued when the mv command moves a file onto the name of an existing file. To be prompted for confirmation before mv overwrites an existing file, use the "-i" (interactive) option of the mv command, e.g.:

  mv  -i  page1  letter

You will now be told that the file "letter" already exists and you will be asked if you wish to proceed with the mv command. Answer anything but "y" or "Y" and the file "letter" will not be overwritten. See Command Alias Applications for information on creating an alias for mv which incorporates the "-i" option to prevent accidental overwrites when renaming files.
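The following sketch repeats the example on made-up scratch files, assuming a Bourne-compatible shell. The "n" answer is piped in only so that the sketch runs unattended; this relies on mv -i reading its answer from standard input, as common implementations do.

```shell
#!/bin/sh
work=$(mktemp -d)
cd "$work" || exit 1
printf 'letter text\n' > letter
printf 'page one\n' > page1

# Decline the overwrite prompt: both files are left untouched.
echo n | mv -i page1 letter
cat letter        # still the original letter text
```
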

Using the Command Line

The command interpreter (shell) provides the mechanism by which input commands are interpreted and passed to the Unix kernel or other programs for processing. Observe the results of entering the following "commands":

  ./filesize
  ./hobbit
  ./add2
  ls -F

Observe that "filesize" is an executable shell script which displays the size of files; ls -F marks executable files with an asterisk (*) and directories with a slash (/). Also note that "./hobbit" and "./add2" generate error diagnostics, as there is no command or file named "hobbit" and the file "add2" lacks execute permission.

Standard Input and Standard Output

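As you have seen previously, Unix expects standard input to come from the keyboard. For example, enter the cat command by itself, type a line of text such as my_text, and then press <Ctrl-D> to signal the end of input; cat copies what you type to standard output: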

  cat
  my_text
  <Ctrl-D>

Standard output is typically displayed on the terminal screen, e.g., enter:

  cat cars

Standard error, the stream to which programs write their error diagnostics, is also typically displayed on the terminal screen, e.g., enter:

  ls xyzpqrz
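Because standard output and standard error are separate streams, they can be sent to different places. The sketch below uses the ">" operator covered under Redirection, plus "2>" for standard error. It assumes a Bourne-compatible shell (sh, ksh, bash), since the C shell uses different syntax for redirecting standard error. The file names are made up.

```shell
#!/bin/sh
work=$(mktemp -d)
cd "$work" || exit 1
touch cars

# By default both streams reach the screen.
ls cars xyzpqrz
# ">" sends standard output (descriptor 1) to a file;
# "2>" sends standard error (descriptor 2) to a different file.
ls cars xyzpqrz > out.txt 2> err.txt
cat out.txt    # the listing: cars
cat err.txt    # the diagnostic about xyzpqrz
```
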

Redirection

As illustrated above, many Unix commands read from standard input (typically the keyboard) and write to standard output (typically the terminal screen). The redirection operators enable you to read input from a file (<) or write program output to a file (>). When output is redirected to a file, the program output replaces the original contents of the file if it had previously existed; to add program output to the end of an existing file, use the append redirection operator (>>).

Observe the results of the following command:

  ./a.out

You will be prompted to enter a Fahrenheit temperature. After entering a numeric value, a message will be displayed on the screen informing you of the equivalent Centigrade temperature. In this example, you entered a numeric value as standard input via the keyboard and the output of the program was displayed on the terminal screen.

In the next example, you will read data from a file and have the result displayed on the screen (standard output):

  cat  data.in
  ./a.out  <  data.in
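
Now you will read from standard input (the keyboard) and write to a file. After entering the first command below, type 35 and press <Return>; then display the file:

  ./a.out  >  data.two
  35
  cat  data.two

Now read input from the file "data.in" and append the result to the existing file "data.two":

  ./a.out  <  data.in  >>  data.two
  cat  data.two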
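The sample a.out program is not listed in this chapter. The sketch below instead uses a hypothetical shell function, f2c, as a rough stand-in so the three redirection patterns can be tried anywhere. It assumes a Bourne-compatible shell, uses integer arithmetic only (so 35 F becomes 1 C rather than about 1.67 C), and pipes in the 35 that you would otherwise type.

```shell
#!/bin/sh
# f2c: hypothetical stand-in for a.out. Reads a Fahrenheit value from
# standard input and writes the Centigrade value to standard output.
# The prompt goes to standard error so it stays out of redirected output.
f2c() {
    printf 'Enter a Fahrenheit temperature: ' >&2
    read f
    echo "$f F = $(( (f - 32) * 5 / 9 )) C"
}

work=$(mktemp -d)
cd "$work" || exit 1
echo 212 > data.in

f2c < data.in                # input from a file, output to the screen
echo 35 | f2c > data.two     # output to a file (35 piped in for the keyboard)
f2c < data.in >> data.two    # input from a file, output appended
cat data.two
```

Sending the prompt to standard error is a design choice in this sketch. If a program writes its prompt to standard output instead, the prompt text ends up in the output file as well.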

As another example of redirection, observe the result of the following two commands:

  ls  -la  /etc  >  temp
  more  temp

Here we have redirected the output of the ls command to the file "temp" and then used the more command to display the contents of this file a page at a time. In the next section, we will see how a pipe can simplify this operation.

Additional exercises illustrating the use of redirection are included in Using the C Programming Language and Review of Redirection.

Using Pipes and Filters

A filter is a Unix program which accepts input from standard input and places its output in standard output. Filters add power to the Unix system as programs can be written to use the output of another program as input and create output which can be used by yet another program. A pipe (indicated by the symbol "|" -- vertical bar) is used between Unix commands to indicate that the output from the first is to be used as input by the second. Compare the output from the following two commands:

  ls -la /etc
  ls -la /etc | more

The first command above displays all the files in the "/etc" directory in long format. This information is hard to use because it scrolls by too quickly to read. In the second line, the output of the ls command is piped into the more command, so we can examine the information one screen at a time and can even back up to a prior screen if we wish. As you become more familiar with Unix, you will find that piping output to the more command is useful in a variety of applications.

The sort command can be used to sort the lines in a file in a desired order. Now enter the following commands and observe the results:

  who
  sort cars
  who  |  sort

The who command displays a listing of logged-on users, and the sort command sorts lines of text. The second command sorts the lines in the file "cars" alphabetically (comparing whole lines, which begin with the first field) and writes the result to standard output. The third command passes the output of the who command to the sort command before it is displayed; the result is a listing of logged-on users in alphabetical order.
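The output of who differs from system to system, so the sketch below pipes a hypothetical stand-in function, fake_who, into sort. The function prints a few fixed, made-up lines. The sketch assumes a Bourne-compatible shell.

```shell
#!/bin/sh
# fake_who: hypothetical stand-in for who, printing made-up lines.
fake_who() {
    printf '%s\n' \
        'wilma    pts/2    Mar  4 09:12' \
        'barney   pts/0    Mar  4 08:55' \
        'fred     pts/1    Mar  4 09:01'
}

fake_who            # original order
fake_who | sort     # same lines, in alphabetical order by user name
```
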

The following example uses the "awk" and "sort" commands to select and reorganize the output generated by the "ls" command:

  ls -l | awk '/:/ {print $5,$9}' | sort -nr


Observe that the output lists file sizes and file names in decreasing order of size. Here the ls command first generates a "long" listing of the files in the current directory, which is piped to the "awk" utility, whose output is in turn piped to the "sort" command.

"awk" is a powerful utility that reads its input line by line, looks for lines matching the patterns in its program, and performs the corresponding actions on them. Slash (/) characters delimit the pattern to be matched, and the action to be taken is enclosed in curly braces. If no pattern is specified, the action is applied to every line; if no action is specified, every line matching the pattern is printed. Because the pattern here is a colon (:), awk selects the lines that describe files (the time column of each file entry contains a colon) and passes the information in the 5th and 9th columns (size and name) to the sort command. Note that ls -l shows the year instead of the time for files modified more than six months ago, so such files have no colon in that column and are skipped by this pattern.

Note: If the ls command on your system does not include a column listing group membership, use {print $4,$8} in place of {print $5,$9} in the awk command above.

Here the "sort" command options "-nr" specify that the output from "awk" is to be sorted in reverse numeric order, i.e., from largest to smallest.
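You can check the pipeline against files whose sizes you already know. The sketch below creates three made-up files of 100, 300, and 200 bytes in a scratch directory, assuming a Bourne-compatible shell and an ls -l that includes the group column. Setting LC_ALL=C is an added assumption meant to keep the ls date format predictable, so that the time column of these new files contains a colon.

```shell
#!/bin/sh
work=$(mktemp -d)
cd "$work" || exit 1
# Hypothetical files of known sizes.
dd if=/dev/zero of=small.dat  bs=100 count=1 2>/dev/null
dd if=/dev/zero of=large.dat  bs=300 count=1 2>/dev/null
dd if=/dev/zero of=medium.dat bs=200 count=1 2>/dev/null

# Same pipeline as above: select lines with a colon, print size and
# name, then sort numerically from largest to smallest.
LC_ALL=C ls -l | awk '/:/ {print $5, $9}' | sort -nr
```
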

For additional information on the "awk" and "sort" commands, see the online man pages or the References included as part of this documentation; the appendix of the Sobell book includes an overview of the "awk" command and several pages of examples illustrating its use.

The preceding command is somewhat complex, and it is easy to make a mistake when entering it. If this were a command we wanted to use frequently, we could put it in a shell script, as has been done in the sample file "filesize". To use this shell script, enter either of the following commands (the second form runs the script with sh and works even if the file lacks execute permission):

  ./filesize
      or
  sh  filesize

If you examine the contents of this file with the cat or vi commands, you will see that it contains nothing more than the piping of the ls command to awk and then piping the output to sort.
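The sample "filesize" file itself is not reproduced in this chapter. The sketch below writes a script containing the same pipeline and runs it both ways. The #!/bin/sh first line and the file name filesize.sketch are assumptions, not the sample file's actual contents. It works in a scratch directory with a Bourne-compatible shell.

```shell
#!/bin/sh
work=$(mktemp -d)
cd "$work" || exit 1

# Write the script with a here-document. Quoting 'EOF' stops the shell
# from expanding $5 and $9 while the file is being written.
cat > filesize.sketch <<'EOF'
#!/bin/sh
# List file sizes and names, largest first.
ls -l | awk '/:/ {print $5, $9}' | sort -nr
EOF

sh filesize.sketch          # works without execute permission
chmod u+x filesize.sketch
./filesize.sketch           # works once the file is executable
```
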

The tee utility is used to send output to a file at the same time it is displayed on the screen:

  who | tee who.out | sort 
  cat who.out

Here you should have observed that a list of logged on users was displayed on the screen in alphabetical order and that the file "who.out" contained an unsorted listing of the same userids.
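You can reproduce the same behavior without depending on who being installed. The sketch below pipes a few made-up user names through tee and sort in a scratch directory, assuming a Bourne-compatible shell.

```shell
#!/bin/sh
work=$(mktemp -d)
cd "$work" || exit 1
# Made-up user names stand in for the output of who.
printf 'wilma\nbarney\nfred\n' | tee who.out | sort   # screen: sorted
cat who.out                                           # file: original order
```
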

Some Additional File Handling Commands

Word Count

The command wc displays the number of lines, words, and characters in a file.

To display the number of lines, words, and characters in the file file_name, enter:

  wc  file_name
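The -l, -w, and -c options restrict the output to lines, words, or characters (bytes). The sketch below uses a made-up two-line file in a scratch directory and assumes a Bourne-compatible shell. Reading the file through "<" makes wc omit the file name from its output.

```shell
#!/bin/sh
work=$(mktemp -d)
cd "$work" || exit 1
# Hypothetical file: 2 lines, 5 words, 26 bytes (newlines included).
printf 'the quick brown\nfox jumps\n' > sample.txt

wc sample.txt        # lines, words, characters, then the file name
wc -l sample.txt     # lines only
wc -w < sample.txt   # words only, without the file name
```
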

Comparing the Contents of Two Files: the cmp and diff Commands

The cmp and diff commands are used to compare files. (The "comp" command does not compare files; it is used to "compose a message".)

The cmp command can be used with both binary and text files. It reports the location (byte and line number) of the first difference between the two files; if the files are identical, it displays nothing.

The diff command can be used to compare text files and its output shows the lines which are different in the two files: a less than sign ("<") appears in front of lines from the first file which differ from those in the second file, a greater than symbol (">") precedes lines from the second file. Matching lines are not displayed.

Observe the results of the following commands:

  cmp   page1  page2
  diff  page1  page2
  
Lines 1 and 2 of these two files are identical, line 3 differs by one character, and "page1" contains a blank line following line 3, while "page2" does not.

See the man pages for additional information on using these commands.
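The sketch below builds two made-up files shaped like the pages just described (not the actual sample files) in a scratch directory, assuming a Bourne-compatible shell. It also shows cmp -s, which prints nothing and reports only through its exit status (0 if the files are identical, 1 if they differ). This form is convenient in scripts.

```shell
#!/bin/sh
work=$(mktemp -d)
cd "$work" || exit 1
# Lines 1-2 match, line 3 differs by one character, and p1 has an
# extra blank line after line 3.
printf 'alpha\nbeta\ngamma\n\n' > p1
printf 'alpha\nbeta\ngammo\n' > p2

cmp p1 p2     # reports the first differing byte and its line number
diff p1 p2    # lines only in p1 start with <, lines only in p2 with >

if cmp -s p1 p2; then echo "identical"; else echo "different"; fi
```
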
