A text file is a sequence of bits, stored on a secondary storage device such as a floppy disk or hard disk, that is interpreted in a particular way. The sequence of bits is divided into bytes (8 bits to a byte), and each byte holds the code for an ASCII character (see Appendix A of your textbook for a table of the ASCII values).
For many applications, it is inconvenient for a user to type all of the information a program needs to run every time that program runs. For example, an airline reservation program needs to be told significant information to work (what flights are available, which seats are taken so far, etc.), but if a user had to type all of this information before the program could be used the program would be impractical. Files provide a mechanism for programs to use information multiple times. We store the information needed by the program in a file that is then re-read every time the program runs. Thus the user needs to type this information only once.
Another useful aspect of files is that a program can not only read files but also write new files and update existing ones. For example, we might have an airline reservation program add reservations to a file as they are made so that future users will not try to reserve seats already taken. This information would then be available to later runs of the program, which read the file.
To make it possible to manipulate files, C provides mechanisms for creating connections to files (using fopen) and for closing those connections (using fclose). Details on how to use these routines are given in the textbook and class notes.
Once a connection is made to a file, we can read information from that file (if the connection is a read connection) or write information to that file (if the connection is a write or append connection). To read information from a file we can use the fscanf routine, which is a form of the scanf command that applies to files (scanf applies to information typed at the keyboard). To write information to a file we can use the fprintf routine, which is a form of the printf command that applies to files (printf applies to information printed to the terminal). We also have other routines that let us read single characters from a file (getc, fgetc) and write single characters to a file (putc, fputc). These routines allow us to do more low-level reading and writing of a file (though fscanf and fprintf can also be used to read or write a single character).
In this lab we are going to get some experience with opening and closing files and with reading information from and writing information to files.
The following program attempts to open a file named getadr.htm and to show that file to the user by printing it to the text window. In this lab we are going to make a few changes to the program to make it able to read and show us the text from HTML (web) documents. We are then going to write the resulting text to a new file.
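    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        FILE *instream;
        int ch;   /* int rather than char so that EOF can be detected */

        /* A backslash inside a C string must be written as \\ */
        if ((instream = fopen("C:\\getadr.htm", "r")) == NULL) {
            printf("Unable to open file \"getadr.htm\"\n");
            exit(EXIT_FAILURE);
        }

        /* Read one character at a time until the end of the file */
        while ((ch = fgetc(instream)) != EOF) {
            printf("%c", ch);
        }

        fclose(instream);
        return 0;
    }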
First, copy the code to a directory on your disk. Next, make sure that no file named getadr.htm exists on that disk. Now compile and run the code. The program should terminate immediately, reporting that it was unable to open getadr.htm.
Access the file at this link through your web browser and use the FILE/SAVE AS command in your web browser to save the file as C:\getadr.htm. Now rerun your program and see what it produces.
Add statements to the code that print a line number before each line, in the form
    nnnn:
as in
    1:
for line 1. The easiest way to do this is to print the line number for line 1 before the loop that reads the characters starts, and then add code to print a new line number after each newline character is read.
Next, set up your code so that any piece of text between a pair of angle brackets (<>) does not appear on the screen. To do this, check the character just read before writing it. If the character is a left angle bracket (<), do not print it out. Instead, set a flag (an integer variable) to indicate that you are currently reading an HTML command (something between angle brackets). Then, as long as this flag is set, do not print out any of the characters read. When a right angle bracket (>) is read, set the flag back to its initial state (not reading an HTML command).
Finally, set up the program so that every character that appears on the screen is also written to a file named getadr.txt. To do this you will need to add a new FILE * variable that is set up using an fopen on getadr.txt. Then, every time you use a printf command to show a character on the screen, use a corresponding fprintf command to write the character to the file getadr.txt. Do not forget to add a corresponding fclose command at the end of your program.
Turn in a hard copy of your final program. Also, turn in a copy of your input file and the file produced by your program.