Files in Unix
=============================================================================
Overview: In this lesson, we will begin studying files, which store
long-term data on disks. We talk about how files are named, how they are
created, destroyed, and viewed, and we show some simple commands.
=============================================================================

Section Topics
------- ------
     What is a file?
     File Names
     Contents of files
     Viewing Contents of Files
     Changing Files
     Some basic file commands
     More about File Types
     Creating files
     Deleting and Renaming Files


What is a file?
---------------

Many people have an intuitive concept of a computer file from work on
Macintoshes, IBM PCs, or other microcomputers. Just to review, a file is
any collection of data that is stored in a permanent area of a computer
system. Thus, a file may be on a disk (hard or floppy), on tape, or on
some other long-term storage device.

Files model real-world entities like documents, forms, reports, memos,
pictures, sound clips, video clips, and almost any other way of storing
some information together. In the "old days" (twenty years ago or more),
files were datasets, which were huge lists of numbers representing
measurements on something.

Every operating system has some way of storing and accessing files. Some
operating systems distinguish one type of file from another. For example,
a database file is stored in a different way than an executable program.
In order to be as simple and general as possible, UNIX makes no assumption
about what is in a file: to UNIX, any sequence of bytes is a file. The
exact type of the file doesn't matter to UNIX, although the program that
reads and interprets the file expects it to be of a certain type.

Files are stored on long-term storage devices like hard or floppy disks,
but they are organized in logical clusters called directories. (This is
the same concept as directories in MS-DOS or folders on the Macintosh.)
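The point above, that UNIX accepts any sequence of bytes as a file, can be tried directly. The sketch below is a small sh script rather than commands typed at the % prompt; it assumes the widely available mktemp utility for a scratch directory, and the file name blob is made up for the example. It stores five bytes, two of them non-printable, and UNIX keeps them without asking what they mean.

```shell
#!/bin/sh
# Sketch: UNIX stores whatever bytes it is given, printable or not.
# Assumes the common mktemp utility; the scratch directory is removed on exit.
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT

# Five bytes: "h", "i", a newline, and two non-printable bytes
# written as octal escapes (001 and 377).
printf 'hi\n\001\377' > "$tmp/blob"

# od -c shows each byte safely, printing non-printable ones as octal codes.
od -c "$tmp/blob"

# wc -c counts bytes; tr strips the padding some versions of wc add.
size=$(wc -c < "$tmp/blob" | tr -d ' ')
echo "blob holds $size bytes"
```

The last line prints "blob holds 5 bytes". UNIX neither knows nor cares that the last two bytes are not text.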
Every user has a directory called their home directory. When you log in,
you are placed in your home directory. You can find out what this is by
typing the command "pwd" (print working directory) right after you log
in; it prints the directory you are currently in:

     % pwd

UNIX responds with something like the following (if your username is
"jim"):

     /usr1/mnt/dept/jim

More about directories is given in the "directories" tutorial.


File Names
----------

Files have names so that we can tell UNIX to do things with them. These
names can be quite long in UNIX (up to 255 bytes on most file systems)
and can even contain special characters. UNIX is, in general, case
sensitive, so a file named "xyz" is NOT the same as one named "Xyz".

Contrary to what you might have learned in other places, you can have
numbers as file names, or you can use dashes or pluses, or asterisks or
even blanks. Blanks, however, present a slight problem: the shell uses
blanks to separate the words of a command, so it can't tell where the
file's name ends. Thus, if you have a blank in your file name, you must
enclose the whole name in double quotes, such as "my file". (Quoting also
keeps the shell from treating an asterisk as a wildcard.)

The slash is forbidden in file names, since it is used to separate
directory names. (The only other forbidden character is the null
character, which you could not easily type anyway.)


Contents of files
-----------------

Though any given file is, to UNIX, just an undifferentiated sequence of
bytes, a file is usually meant to be used with a particular program or
command. We sometimes call this interpretation of the file's contents its
type. For example, a file that contains a spreadsheet holds different
data than one that contains a video image.

In UNIX, unlike MS-DOS, where the file type is part of the file's name,
the name does not need to indicate the type. For example, a file that
contains a C program usually ends in ".c", such as program1.c, but there
is no rule that says it MUST have this kind of name.

The most generic type of file is just ASCII text.
ASCII is a character code that assigns a number to each letter, digit,
and punctuation mark, and each character is stored in one 8-bit chunk of
computer memory (called a byte). A file that contains only printable
characters is said to store ASCII text. This might be a letter or a memo
to somebody, a document, or a dataset of numbers.

Other types of files will be discussed as they arise. Not all of them use
just printable characters, so some care is needed when printing or
viewing such files.


Viewing Contents of Files
-------------------------

To see what is in an ordinary text file, a purr-fect command is "cat".
Just follow the cat command with the file's name:

     % cat syllabus

This command reads the file and prints it out to your terminal, scrolling
if the file is longer than one screen's worth. Sometimes this goes by so
fast that you can't see it all before it is gone.

A better command for viewing a file in a controlled way is "less", which
shows you only as many lines as your screen can display at one time:

     % less syllabus

After each screenful it gives you a prompt, which is a colon. To continue,
press the space bar, or type q to quit. Other options include going
backwards, or advancing through the file one line at a time rather than
one screenful. As usual, there is help for the less command; to see it,
just type the letter h.

Another similar command is "more", which works more or less like "less".

If you want to see the end of a file without waiting for the entire file
to scroll past you using cat, the "tail" command pulls the cat's tail and
shows you the "rear end" of the file (by default, the last 10 lines):

     % tail bigfile

Options let you control how big a tail section you see; for example,
"tail -n 20 bigfile" shows the last 20 lines. The "head" command is the
analogous command that shows you just the first part of a file.


Changing Files
--------------

Naturally we often need to change the contents of a file. Sometimes we
make up new files, and other times we change existing files: we edit our
term paper, fix program errors, or enter data into a dataset file.
The process of changing the contents of a file is called "editing". There
are many ways to edit a file, most often with programs like "vi" or
"emacs". These programs are called editors; they let you view the file on
your terminal screen and make additions, corrections, or deletions, much
as you would alter a document in a word processing system on a Macintosh.

Other commands change files too; "sed" and "awk" are commonly used for
this. They are sometimes called stream editors, or non-interactive
editors, because you specify all the changes in advance: they read the
file, apply the changes in a batch, and write out the result, rather than
letting you sit and watch the changes being made on your screen.

There are two main styles of interactive editors. Older editors are line
editors: they display one line at a time and let you give a subcommand to
change it. You do not "move around" on the screen to specify where to
make the change. Newer editors are more like the software most people know
from personal computers: there is a cursor that you can move around, and
changes happen at the spot where the cursor is. They are sometimes called
"full-screen editors" or "visual editors" (as if one used the older line
editors with eyes closed!)

We will not discuss the older line editors here, but just name them: they
are "ed" and "ex". A separate lesson discusses the popular "vi" editor,
which is the most common full-screen editor in UNIX. Another popular
editor is "emacs", but we do not use it at Canisius.


Some basic file commands
------------------------

One of the most overworked UNIX commands doesn't even deal with files
directly. It is the "ls" command, which lists the names of the files in a
directory. Since users want to see what files are on disk, they often use
"ls" to do this:

     % ls

This lists the files in the current directory. You can also give the name
of a directory explicitly, if it is different from the current one:
     % ls /mnt1/dept/meyer

There are a ton of options on the ls command, some of which will be
explained later. You can read all about them by typing

     % man ls

There are two ways to find out how big a file is. The "wc" command, which
stands for "word count", counts lines, words, and bytes. A word is any
consecutive sequence of non-blank characters (blanks, tabs, and newlines
all separate words). Here's an example:

     % wc file1
     199 445 4827 file1

This says that the file named "file1" has 199 lines, 445 words, and 4827
bytes. Options to "wc" let you get just the line count (-l), just the
word count (-w), or just the byte count (-c). Here's an example of
printing out only the line count:

     % wc -l file1
     199 file1

Another way is to use the "ls" command with the -l option, where -l
stands for "long listing". Here's an example:

     % ls -l main.c
     -rw------- 1 meyer 4827 Jun 9 12:44 main.c

The number 4827 is the size of the file in bytes. It is possible, by the
way, to have a file that is 0 bytes long. It still exists, but contains
no information and uses up no disk space for data, although a small
amount of disk space is used to store the name and attributes of the
file. (The "directories" tutorial explains more about the long output of
"ls".)


More about File Types
---------------------

Earlier we alluded to the fact that the data in files has a certain type.
For example, some files contain only printable ASCII text and may have
been created by "vi" or some other editor. Another file may contain
compiled object code. Other files contain shell scripts, sound files, or
video images.

Often the name suggests the file's type by using an extension (suffix),
as in MS-DOS. For example, "mypgm.c" is probably a C source file. Unlike
MS-DOS, UNIX does not limit a file name to just one extension or to an
extension of just three characters. UNIX does not care what extensions
you have on your file names.
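To see that UNIX really does not care about extensions, the sketch below saves the same bytes under three names: one with no extension, one that looks like C source, and one with two extensions. It then measures each file with wc -c. It is a small sh script; the names rhyme, rhyme.c, and rhyme.draft.txt are invented for the example, and the mktemp utility is assumed to be available.

```shell
#!/bin/sh
# Sketch: one set of bytes saved under three names. The suffixes are just
# part of each name; UNIX stores and reads all three files the same way.
# Assumes the common mktemp utility; the scratch directory is removed on exit.
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

printf 'Hickory dickory dock\n' > rhyme
cp rhyme rhyme.c           # named like C source, but it is a nursery rhyme
cp rhyme rhyme.draft.txt   # two suffixes; neither one is special to UNIX

for f in rhyme rhyme.c rhyme.draft.txt; do
    # Print each name with its size in bytes (tr strips any padding from wc).
    printf '%s: %s bytes\n' "$f" "$(wc -c < "$f" | tr -d ' ')"
done
```

Each of the three lines reports 21 bytes, because the suffix changes nothing about how the file is stored or read. Running "file" on rhyme.c is a useful follow-up: its guess is based on the contents rather than the name, though the exact wording of its report varies from system to system.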
The extensions are only important because they allow you, and the utility
programs that you run, to identify files. Here are some of the more
common extensions and what they signify:

     .c      C source program
     .o      object program created by some compiler
     .s      assembler source program
     .h      header file for C or some other language
     .a      archive library, used to bundle together compiled subroutines
     .Z      compressed file
     .f      FORTRAN source program
     .p      Pascal source program

But many of the extensions that you find are JUST conventions that are
not enforced by the compiler and exist only to help the user. Here are
some of them:

     .tar    archive produced by the "tar" command
     .a      Ada source program (note the clash with archive libraries above)
     .ada    Ada source program
     .l      LISP source program

Some file types are printable and some are unprintable, meaning that they
contain byte values that are not printable characters. Such bytes cannot
be displayed properly on a printer or a screen, and may in fact "screw up"
the printer or screen. If you "cat" one of these files to the screen, bad
things may happen. (Usually the worst result is a temporarily locked-up
terminal, in which case you have to turn off your terminal and log in
again.)

To avoid screwing up your terminal but still determine what type of data
is in a file, use the "file" command. It makes a guess as to what type of
data is in the file, but it can be fooled. For example, if the file
contains code in some programming language that "file" does not
recognize, it may just report that the file contains "English text".

     % file xyz

If the description says the file is a compiled executable program, then
it is definitely unprintable. (A shell script, by contrast, is ordinary
text even if "file" calls it executable.) There are several different
types of executable files on the SUN UNIX system. For example, one is

     sparc demand paged dynamically linked executable not stripped

Believe it or not, each of these terms has a definite meaning and is not
there just to confuse or irritate you!


Creating files
--------------

Some files are created by humans, such as source programs.
These files are usually created using editors such as "vi". Other files
are created by programs when they run, usually as output files. These
files can subsequently be changed by using an editor, if they are
printable.

Though the "vi" editor is explained in detail in a later lesson, here we
want to introduce a very simple way to create a printable file of text.
You can use this method whenever you want to, even after you know how to
use "vi".

First, let's demonstrate it and then explain it. Using the "cat" command,
we are going to have UNIX redirect what we type in at the keyboard into a
file called "my.newfile".

     % cat > my.newfile
     Hickory dickory dock
     the mouse ran up the clock

When we are done, we press CONTROL-D at the start of a new line, and we
get the shell prompt again. CONTROL-D is the general end-of-input signal
in UNIX.

This works because UNIX allows us to alter the direction of our data. No
file name is given to cat, so it reads from its default input, the
keyboard, and writes to its default output, the screen. By itself that
would just echo what we type back to us, which is not terribly useful.
But the > sign tells the shell to redirect cat's output into a file called
"my.newfile", and so we accomplish our purpose. (Be careful: if
"my.newfile" already exists, > replaces its old contents.)


Deleting and Renaming Files
---------------------------

To remove unwanted files, use "rm". For example:

     % rm syllabus.dit

This command is often called "delete" or "erase" in other systems. But
WATCH OUT!!! UNIX is unforgiving and does not let you "un-remove" a file.
What is gone is gone!

Sometimes files must be renamed. UNIX does not have a rename command;
instead, the "mv" command, which stands for "move", is used to rename
files. The new name follows the old name. In the following example, an
existing file "xyz" will be renamed to "abc":

     % mv xyz abc

The mv command does other things too, such as moving files into different
directories. This will be discussed later.
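The create, rename, and delete steps from the last two sections can be run together as a small sh script. A script cannot press CONTROL-D, so this sketch uses a here-document (the <<'EOF' ... EOF lines) to stand in for typing at the keyboard. That substitution, the scratch directory from the assumed mktemp utility, and the file name rhyme are choices made for this sketch, not part of the lesson above.

```shell
#!/bin/sh
# Sketch: create, rename, and remove a file in a scratch directory.
# Assumes the common mktemp utility; the directory is deleted on exit.
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

# Create: the here-document plays the part of the keyboard, and its
# closing EOF line plays the part of CONTROL-D. The > sends cat's
# output into my.newfile instead of to the screen.
cat > my.newfile <<'EOF'
Hickory dickory dock
the mouse ran up the clock
EOF

# Rename: mv takes the old name first, then the new one.
mv my.newfile rhyme
lines=$(wc -l < rhyme | tr -d ' ')
echo "rhyme has $lines lines"

# Delete: once rm runs, the file is gone; there is no "un-remove".
rm rhyme
ls    # prints nothing: the directory is now empty
```

Typed at the % prompt instead, the same sequence is "cat > my.newfile" (ending with CONTROL-D), then "mv my.newfile rhyme", then "rm rhyme".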