In this assignment you will read and analyze a text file. The problem is to read a text file character by character and print out a "token map" of the file, where tokens are meaningful objects in the file such as words, numbers, or punctuation marks. In the token map you will print Xlength for each word or number, where length is the number of characters in the token and X is 'C' if the token is a word that starts with a capital letter, 'L' if it is a word that starts with a lowercase letter, and 'N' if the token is a number. For each punctuation character you will print a 'P'. For example, suppose the text file contains the following:
    Constitution of the United States of America
    (In Convention, September 17, 1787)

    Preamble
    We the people of the United States, in order to form a more
    perfect union, establish justice, insure domestic tranquility,
    provide for the common defense, promote the general welfare, and
    secure the blessing of liberty to ourselves and our posterity, do
    ordain and establish the Constitution of the United States of
    America.

Your program should produce the following response:

    1: C12 L2 L3 C6 C6 L2 C7
    2: P C2 C10 P C9 N2 P N4 P
    3:
    4: C8
    5: C2 L3 L6 L2 L3 C6 C6 P L2 L5 L2 L4 L1 L4
    6: L7 L5 P L9 L7 P L6 L8 L11 P
    7: L7 L3 L3 L6 L7 P L7 L3 L7 L7 P L3
    8: L6 L3 L8 L2 L7 L2 L9 L3 L3 L9 P L2
    9: L6 L3 L9 L3 C12 L2 L3 C6 C6 L2
    10: C7 P
    11:
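To make the tagging rule concrete, here is a minimal sketch in C (C is an assumption; the assignment does not name a language) that turns one complete token into its map entry. The function name tag_token and the idea of holding the token in a string are hypothetical and for illustration only. Your program should work character by character, so in a real solution the length would come from counting characters as they are read.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write the token-map entry for one complete,
 * non-empty token into buf. Words get C or L plus their length, numbers
 * get N plus their length, and anything else is treated as punctuation. */
void tag_token(const char *tok, char *buf, size_t cap)
{
    unsigned char first = (unsigned char)tok[0];
    size_t len = strlen(tok);

    if (isupper(first))
        snprintf(buf, cap, "C%zu", len);   /* word starting with a capital */
    else if (islower(first))
        snprintf(buf, cap, "L%zu", len);   /* word starting with lowercase */
    else if (isdigit(first))
        snprintf(buf, cap, "N%zu", len);   /* number */
    else
        snprintf(buf, cap, "P");           /* punctuation mark */
}
```

For example, "Constitution" yields C12, "of" yields L2, "1787" yields N4, and "(" yields P, matching the first two lines of the sample.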
Note that:
I suggest that you proceed in stages:
While you are free to design your program any way you wish, you must follow good top-down design principles. For example, you might structure your program so that each time the start of a word or number is read, a function (or functions) is called to read to the end of that word or number.
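One way to follow this design is sketched below in C, under stated assumptions. The names token_map, read_word, read_number, and read_rest are hypothetical. The output layout (each input line's entries on one output line, prefixed by the line number, with single spaces between entries) is inferred from the sample above rather than specified. A line number is printed whenever a new line begins, which is why a file ending in a newline produces a final empty entry such as 11:. The helpers rely on ungetc, which the C standard guarantees for one character of pushback, to hand the character that ended a word or number back to the main loop.

```c
#include <ctype.h>
#include <stdio.h>

/* Read the rest of a run whose first character the caller has already
 * consumed. Keeps reading while pred(c) is true, pushes the first
 * non-matching character back with ungetc, and returns the run length. */
static int read_rest(FILE *in, int (*pred)(int))
{
    int len = 1;                 /* count the already-consumed first character */
    int c;

    while ((c = fgetc(in)) != EOF && pred(c))
        len++;
    if (c != EOF)
        ungetc(c, in);           /* let the main loop see this character next */
    return len;
}

/* Called when the first letter of a word has just been read. */
static int read_word(FILE *in)   { return read_rest(in, isalpha); }

/* Called when the first digit of a number has just been read. */
static int read_number(FILE *in) { return read_rest(in, isdigit); }

/* Print the token map of in to out, one numbered output line per input line. */
void token_map(FILE *in, FILE *out)
{
    int line = 1;
    int c;

    fprintf(out, "%d:", line);
    while ((c = fgetc(in)) != EOF) {
        if (c == '\n') {
            /* End this output line and number the next input line. */
            line++;
            fprintf(out, "\n%d:", line);
        } else if (isalpha(c)) {
            char tag = isupper(c) ? 'C' : 'L';
            fprintf(out, " %c%d", tag, read_word(in));
        } else if (isdigit(c)) {
            fprintf(out, " N%d", read_number(in));
        } else if (ispunct(c)) {
            fputs(" P", out);
        }
        /* Spaces, tabs and any other characters separate tokens and print nothing. */
    }
    fputc('\n', out);
}
```

A main program could call token_map(stdin, stdout), or pass a stream obtained from fopen. Fed the first two lines of the sample, this sketch prints 1: C12 L2 L3 C6 C6 L2 C7 and 2: P C2 C10 P C9 N2 P N4 P, followed by an empty 3: entry.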
2 extra points - make it so that at most one single-quote character (apostrophe) may appear in a word, though not as its first character. For example, don't would then count as one word of length 5 rather than as a word of length 3, a punctuation mark, and another word of length 1.
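A possible word reader for this extra-credit rule is sketched below in C. The name read_word_apostrophe is hypothetical. It assumes the caller has already read the word's first letter, so an apostrophe can never be the first character; a leading apostrophe still reaches the punctuation branch of the main loop. Reading the rule literally, a trailing apostrophe (as in dogs') is also counted as part of the word.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical extra-credit word reader: called after the first letter of
 * a word has been read. Consumes letters plus at most one apostrophe and
 * returns the total word length, including the first letter. */
int read_word_apostrophe(FILE *in)
{
    int len = 1;          /* first letter already consumed by the caller */
    int seen_quote = 0;   /* has this word's one apostrophe been used yet? */
    int c;

    while ((c = fgetc(in)) != EOF) {
        if (isalpha(c)) {
            len++;
        } else if (c == '\'' && !seen_quote) {
            seen_quote = 1;
            len++;
        } else {
            break;        /* any other character ends the word */
        }
    }
    if (c != EOF)
        ungetc(c, in);    /* leave the terminating character for the main loop */
    return len;
}
```

With this reader, don't has length 5, while in rock'n'roll only rock'n (length 6) is read as a word and the second apostrophe is left for the main loop to report as punctuation.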
2 extra points - allow a run of consecutive dashes (and ONLY dashes), such as -- (two consecutive dashes) or --- (three consecutive dashes), to be treated as a single punctuation mark. When such a punctuation mark is longer than one character, print P followed by the number of characters in it (for example, P2 for --); a single-character punctuation mark is still printed as just P.
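The dash rule can be sketched in C as two small functions; both names, read_dashes and format_punct, are hypothetical. read_dashes assumes the caller has already read the first '-' and returns the length of the whole run, and format_punct produces P for a one-character mark and P followed by the length otherwise.

```c
#include <stdio.h>

/* Hypothetical extra-credit helper: called after the first '-' has been
 * read. Consumes any further consecutive dashes and returns the run length. */
int read_dashes(FILE *in)
{
    int len = 1;          /* first dash already consumed by the caller */
    int c;

    while ((c = fgetc(in)) == '-')
        len++;
    if (c != EOF)
        ungetc(c, in);    /* the first non-dash belongs to the next token */
    return len;
}

/* Format a punctuation entry into buf: "P" for a single character,
 * "P<n>" when the mark is n > 1 characters long. */
void format_punct(char *buf, size_t cap, int len)
{
    if (len > 1)
        snprintf(buf, cap, "P%d", len);
    else
        snprintf(buf, cap, "P");
}
```

In a character-by-character main loop, a test for c == '-' would go before the general punctuation test, calling read_dashes and then printing the entry from format_punct; every other punctuation character keeps its single P.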