Or, the Art of WAR
Ted Pedersen
September 2003
There are a number of good books available that talk about writing for research. I recommend that you find one that you like, and that you read it carefully and follow its advice (let me know whose advice you are following before you get too deeply into this). One book I like fairly well is called “The Craft of Research”, by Wayne C. Booth, et al. If you aren’t able to find a guidebook on your own, I recommend this one to you.
The important thing about your writing is that it be clear. The particular style or organization is up to you, but it must end up making sense. We presume that as graduate students you are able to write in a grammatical and organized way. If this is not the case then you probably should not have been admitted to the program, so you can either work to correct these problems or resign in disgrace.
Here is my most basic and important rule. When you are writing, do not refer to any other books, articles, or papers. Do not, under any circumstances, read other people's papers while you write. You will almost certainly plagiarize. If you can't write about a topic without looking at published sources or descriptions, then you don't understand it well enough to have any business writing about it. You will of course need notes of your own experimental results and algorithms. These should be ideas and materials that are original to you. Any notes derived from other sources should be done with the utmost care to avoid plagiarizing. “The Craft of Research” has some ideas about how to do this that I suggest you follow. I have some extended examples of plagiarism here: http://www.d.umn.edu/~tpederse/Pubs/plag.htm. Please make sure you read these over very carefully, and let me know if there is anything at all that is unclear.
My best advice as to writing is to write within yourself. Do not try to sound like me; do not try to sound like anyone but yourself. Write what you know and what you understand and tell me what you have learned and what you think. Do so clearly and concisely. Be formal in your writing, but not excessively so. Introduce notation only when you must, and make sure that it is easy to understand and consistent. Define terms and ideas as they are introduced. Theses should be relatively short, so don’t repeat yourself. Say things once, say them well.
How do you know if your writing is any good? Does it pass the reading aloud test? Can you read your paper to someone and have them follow it relatively easily without having a paper copy in front of them? If so, your paper is probably pretty well organized. You have probably started with more general ideas, and then gotten more specific. You have probably defined terms as you introduced them. You probably have not written things in such a way that you force the reader to turn back and forth in your paper, or have them make notes to themselves as they read in order to understand your presentation. You must strive to build a clean and coherent representation of your ideas in the mind of your reader. Bad organization, poor grammar, and spelling errors chip away very quickly at that model, and ultimately make it impossible for the reader to construct an elegant structure in their mind regarding your ideas. Instead, they may be left with a tiny broken-down shack with no running water, and this is not a pleasant thing to have in your mind.
Your reference list should be honest. In other words, cite only those books, papers, and articles that you have actually read (or at least skimmed) and that have actually provided insight into your work, and given you ideas. Don't pad your reference lists. Any reference mentioned in a paper or your thesis will be assumed to be known to you, so you may expect questions on why you cited it and what it contains during your defense. I may also ask you to produce copies of any reference you cite, so please make sure that you keep track of your references. In general, books, journals, and conference papers are good sources of information. Workshop papers, book reviews, technical reports, and material on web pages are not usually good. There are exceptions to this of course, but as a rule of thumb this is good.
For our discipline, the journal "Computational Linguistics" is the premier forum for published research. Other reliable journals include "Machine Translation" and the "Journal of Natural Language Engineering". Conferences that usually contain good information include the annual meetings of the Association for Computational Linguistics (ACL, NAACL, and EACL), the biennial International Conference on Computational Linguistics (COLING), and the annual conference on Empirical Methods in Natural Language Processing (EMNLP). All papers appearing in ACL related events are available at the ACL/LDC Repository (http://acl.ldc.upenn.edu). You can find out all about ACL related conferences at the ACL web site (http://www.aclweb.org).
Conferences on Artificial Intelligence (AAAI and IJCAI), machine learning (ICML), or data mining (KDD) also contain high quality publications on NLP or closely related issues. The Journal of Artificial Intelligence Research sometimes has NLP related material that is quite useful (http://www.jair.org/). We are also starting to see NLP papers creeping into computational biology and bioinformatics. These are less likely to serve as references in your work, but nonetheless it is interesting to see how these ideas are being applied in a rather new area.
Again, I think the book The Craft of Research may be helpful, in that it goes into a bit of detail about writing introductions. One of its suggestions is that you might want to start with an outline. (The authors make the point, though, that you should not be a slave to your outline, or spend a great deal of time coming up with a very formal outline.) This is entirely appropriate, and might help to structure things. As I've said before, your thesis will be a short enough document that there is no need to repeat much of anything, so careful organization is important.
Your introduction should be exactly that - an introduction to your thesis.
Write it so that someone who knows very little about our area can understand
what you are doing. It's important to remember that you are not writing this
for me - the introduction is for that committee member who may have little background
in this area, or for your fellow students (not in the NLP
group) who might be interested.
One of my thoughts when writing an introduction is that I would like my mother
or father to be able to understand it. This is actually a rather nice goal because
your mother and/or father would probably enjoy reading about your thesis topic,
and while they won't want to read about your algorithms and so forth, a general
overview that explains what problem you have tried to solve, why it really
matters that anyone solve this problem, and how you went about doing this might
be rather satisfying for them.
In a more pragmatic sense, consider the introduction as an executive summary
that you could give to a potential employer so as to explain to them what it is
you did your thesis on. Imagine taking the introduction, printing it up
separately and distributing it attached to your resume. I am not suggesting
that you do this of course, but this is meant to give you an idea of what the
introduction should achieve. The important point is that it should be
relatively intuitive and it should be self-contained.
So, the introduction to your thesis (or your thesis proposal) must have the
following goals:
1) Describe the problem you are solving.
This must be very intuitive and written in an engaging way that will make the
reader interested in what you are doing. What's the problem? Be specific, use
examples, make the examples interesting and compelling. (I am using the term
"solve" quite loosely here of course.)
2) Explain why you want to solve this problem.
This is where you motivate that the problem is worth solving, and you can
describe the potential impact of your work. Imagine that you completely solve
your problem. How is the world a different and better place? What can we do now
that we couldn't before your thesis?
3) Explain the approach you take to solving this problem.
This is not the place for algorithms; instead, simple examples are best. You
should explain the general ideas that underlie your approach and make you
believe that it is sound. Your mission here is to convince the reader that your
approach is sensible and reasonable.
4) Explain how you know that you solved it (evaluate).
A thesis claims to make some contribution. Here you tell us how you know that you
did what you claim. This section will be hard to write as of now, but you can
at least summarize how you are planning to evaluate your solution, and how you
will know that you have made progress.
5) A formal statement of your thesis (i.e., the thesis statement).
What hypothesis underlies your research? What is the question that drives your
research? What question are you seeking to answer? This must be specific and it
is the one part of the introduction that should be technical to a degree. The
question should be one that is interesting regardless of the outcome. For
instance, "Can I implement the algorithm of
The above is not intended to serve as an outline. You must organize your
introduction in a style that suits you. These are just items that you must make
sure you address.
The introduction should be unique text. In other words, don't cut and paste
text from the interior of the thesis into the introduction, and vice versa. You
are writing the introduction for the novice who is trying to decide if they
really want to read about your research. Make it engaging, exciting, and indeed
entertaining. Then, when they get to the interior sections of the thesis you
can hit them with the details. They will want them at that point. They do not
want them in the introduction.
The introduction is probably the hardest part of the thesis to write. It is
also, along with the conclusions that you draw from your research, the most
important.
I urge you to refer to books like The Craft of Research to get ideas on how to
do this. You are also welcome to look at other theses and dissertations to see
how they are organized. Keep track of the outside sources you are looking at in
terms of getting ideas for writing - I would like to know what they are,
especially if you find them helpful. There are quite a few books about writing
and technical writing that you can draw upon as well. I understand you probably
have not written like this before, so you should seek out as much help as you
can from these kinds of external sources.
How to Do Research
This is a tricky question, and we’ll work on this throughout your time
here. However, I firmly believe there is a connection between how you think
about your writing and how you think about your research. (That seems obvious
now that I say it.) What I mean is that when you think about your research, you
should think about how you are going to write about it to make it compelling,
interesting, and important. If you can’t think of any way to do that, it might
be that the question you are researching is not terribly interesting, or you
don’t understand it very well yet.
Another book that I like very much and find somewhat inspiring (really) is
called “Advice For a Young Investigator” by Santiago Ramón y Cajal. The author
(1852-1934) is often called the founder of neuroscience, and he talks about the
challenges of doing research, and gives some ideas for how to think about it
(and how to do it). The UMD library has this available for electronic checkout.
Technical Issues
Your thesis should be written using LaTeX, a document preparation system
available on Unix/Linux. You should start to get used to LaTeX now by using it
for your writing. You should find a LaTeX source file from a previous thesis and
use that as a guideline for your own work. We have such examples/templates
available in /home/cs/tpederse/mypublic. A good LaTeX book is LaTeX: A Document
Preparation System, User's Guide and Reference Manual, by Leslie
Lamport. You can find quite a bit of information on