Some Frequent Writing Tips I Give Software Engineering Thesis Students
There is some writing-related advice I have found myself giving over and over to many of my thesis students (as well as some doctoral students), so I decided to write a writing FAQ rather than dozens of individual emails. A few comments before we get started:
- The text is mainly written with bachelor and master students working on their thesis report (and especially those in software engineering or closely related disciplines) in mind. However, large parts will also be applicable to any other kind of academic text, such as scientific papers or class reports. That said, do note that conventions between different fields (even within Computer Science) vary quite a bit.
- Evidently, all of the following is my personal view. I don’t claim to be an expert writing guru. However, I have supervised in the range of 30 student projects, (co-)authored over 100 papers, and reviewed several times as many. That said, at the end of the day, the stylistic preferences of your examiner or examination committee are more important than anything I write here.
- I will be talking exclusively about writing your report. This is not the right place if you are looking for tips on how to plan and conduct your research.
- The text uses Latex terminology in a few places and assumes that you actually use Latex (rather than, say, MS Word). Most of the text will still be applicable if you don’t use Latex, but you’ll need to translate the terminology here and there.
I expect that this will become a living document, with me adding tips and refining what’s already there based on my ongoing supervising experience.
Changelog:
26–07–21: added that this text assumes usage of Latex, added some more FAQs
Outlining
Your outline is the skeleton of your report. It’s the structure that holds everything together, and if your outline is confusing there isn’t much the rest of your text can do to fix it. Hence, it pays to put a little love and thought into what you describe where.
- Have a standard top-level outline. I understand the desire to innovate, but the top-level structure (the sections or chapters) of your report is not the place to let your creativity flow. Thesis reports in software engineering tend to follow a standardised template. Start with the following structure: (1) Introduction, (2) Background, (3) Related Work, (4) Method, (5) Results, (6) Discussion, and (7) Conclusions. Some deviations are plausible (e.g., not every report needs a background), but if you feel you need additional sections, think hard about whether this content really should not just be a subsection in one of the sections above. Ask your supervisor for examples of good previous theses and study what they write where.
- Your report needs to tell a story. You describe a difficult problem in the introduction, show how previous approaches have not solved this problem in the related work section, describe how you will tackle it in the method, show how well it went in the results, and describe what your results mean in a broader context (and especially who should care about these results) in the discussion. Develop your story early in the writing process (I have found it helps to explain your thesis to other people front-to-back, if necessary a rubber duck).
- Describe what’s needed for your story to work. Not more, not less. Whenever you describe anything in your report, think about how the thing you are currently writing supports your story. Counter-intuitively, your thesis report does not need to report everything that you have done, and it does not need to report on results in the order you have acquired them. Only the parts that help your story need to be reported. This does not mean that you should cherry-pick your best data (your story can easily be “we tried this reasonable thing and it mostly did not work”). However, it does mean that auxiliary results you find mildly interesting but which are only loosely connected to your main story should probably be left out. By and large the same is true for ideas you toyed around with but which went nowhere conclusive.
- Plan what you write. Different people have different writing approaches, and I’m not going to tell you what will work best for you. Some need to assemble most of the text in their head before they can start writing a section, and others start dumping text to the page immediately and then edit and edit again. You do you. However, at some point in the process you need to stop and think about what you are actually writing. Every section consists of subsections (which in turn may consist of sub-subsections), and all of these sections consist of paragraphs. Each paragraph should have more or less one idea, and these ideas need to logically build onto each other. Think about what each paragraph and section contributes to the story of your thesis and the smaller story of this section, either before writing anything or when editing. However, do think about it. The one approach I never see working is dumping lots of text onto a page and then micro-editing it ad infinitum without ever looking at the larger composition of the text.
- Be mindful of repetition. Repetition, if used sparingly and strategically, can be a powerful tool to emphasise key findings or key aspects of your work. However, in general, you should avoid repeating the same argument or data in different places in your thesis report, unless you have a specific reason why it is important to emphasise a specific argument.
- Be mindful of the order in which you explain things. Your report should be written for a person with no prior knowledge of your work. That means that you cannot, for example, write your introduction with the assumption in mind that your reader already knows roughly what your method looks like. You can safely assume that your reader has normal computer science knowledge that can be expected from an average graduate of your program. Beyond that, the only prior knowledge you can expect in Section N is what you have explicitly described in Section 1 to Section N-1. Watch out for this specifically when proof-reading your work — will the reader be lacking context or prior knowledge to follow your argument? If yes, this context or prior knowledge needs to be established earlier (often in the introduction or background section).
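To make the standard top-level outline from above concrete, here is a minimal Latex sketch. The document class and the use of chapters (rather than sections) are my assumptions for illustration; your university template may well prescribe something different.

```latex
% Minimal thesis skeleton following the standard top-level outline.
% Assumption: the plain "report" class; swap in your university's template.
\documentclass{report}

\begin{document}

\chapter{Introduction}   % the difficult problem and why it matters
\chapter{Background}     % optional: concepts the reader needs later on
\chapter{Related Work}   % why previous approaches have not solved the problem
\chapter{Method}         % how you tackle the problem
\chapter{Results}        % how well it went
\chapter{Discussion}     % what the results mean, and who should care
\chapter{Conclusions}

\end{document}
```

The comments map each chapter to its role in the story. In a real report, every chapter of course contains text before its first subsection.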
On Floats (Figures, Tables, etc.)
“Floats” are all the things in your thesis that aren’t text, such as figures, tables, algorithms, code snippets, interview quotations, etc. (not all of them are strictly “Latex floats”, but you get the idea).
This will be a somewhat long section, since many students struggle with how to use floats effectively in their reports.
- There is a duality between floats and text. Floats and text live side-by-side in your report, and they need to tell your story together. You can’t have only text, but you also can’t have only floats. Both are important. That said, in most cases students aren’t using nearly enough floats.
- All floats need to be described in text (…) Depending on the type of float and its purpose in your story, the float may sometimes provide an overview which is then detailed in the text (example: your method section will often have a graphical overview of the steps of your research, which are then described in more detail in the text). In other cases, the float has the details whereas the text only provides a summary or interpretation (example: tables or graphs will often contain much more detail than the text describing them). Yet, in both cases, the float needs to at least be introduced in the text, and in both cases the float and text work together. Don’t put a figure in and expect the reader to guess when to look at it, and what to look for specifically. As a high-level guideline, I should be able to roughly understand your story from looking at your floats alone, and when I read the report in detail, the text should make clear when to look at which float (and what I should note in it).
- (…) and most text should be accompanied by a float. Most students have a tendency to produce predominantly text (“walls of text”), and to only add figures or tables if clearly and evidently necessary (e.g., in the results). This leads to thesis reports that are exceptionally unpleasant to read. Consider generously adding floats, if for no other reason than to give your text structure and to make it a less tedious read (however, once you start doing that, you will find that often the float you added originally as window dressing actually does make your story a lot easier to digest). For scientific papers my personal guideline is that every page that is only text (no title page, no figure, no table, no quotation, no nothing) is ugly and should be avoided. For thesis reports you can be a bit more generous, but if you have, say, 5 consecutive pages of “only text” you should start to wonder if something can be done to make this part less tedious.
- All important results should be supported by a float (…) do not, I repeat, do not hide important data in your text alone. If you are doing quantitative work, all important results should be shown in a table or figure. If you are analyzing interview data, all important arguments should be accompanied by a related interview quote.
- (…) and all important results should be interpreted in text. It’s not enough if your important results are shown in a float, they also need to be mentioned and interpreted in the text. Yes, this sometimes leads to small redundancies. That’s fine. That said, if you feel your text is just re-iterating what the float already shows, your text isn’t actually providing an interpretation of your results, just a transcript. The fix is then to improve your text, not remove it. For example, assume you have a table that shows survey responses in %. Your text should not just say “45% have picked option A, but only 12% have picked option B.” Your text needs to discuss these results — are they unexpected, or in line with existing work? What can be learned from these results? Do they fit previous results you have presented earlier in your report? Similarly, if you discuss a plot (for example, some performance measurements), your text should not just say that “the response time is stable at about 65ms”, but should provide an interpretation of what this actually means. Is 65ms good? Did we expect a stable performance, or did we expect that performance would improve over time? This interpretation doesn’t have to be long, but it does have to be there and it should not be entirely trivial. Interpreting your results is part of the research, and presumably you had some idea of what to do with the data you collected when you started collecting it.
- However, not every little result needs to be discussed in text. The tip above explicitly says “important results”. It’s fine to skip discussing unimportant results. For example, your survey results table may contain a dozen different options, some of which have rarely been picked (and you did not expect them to be picked often). It’s ok to just write “Further, options E, F, and G have been selected by less than 3% of participants.”, without deeper commentary.
- Avoid walls of text at all costs. A “wall of text” is a long paragraph that is not broken up by the start of a new section, a float, or at the very least a paragraph break. What exactly counts as a wall of text is difficult to define in a vacuum, but every paragraph approaching a third or half a page in length should be critically examined. You are probably trying to do too much in one paragraph. Think about ways to break it up. However, even with paragraph breaks, a text that’s just lots of paragraphs without floats or subsections can start to look like a wall of text. If your text has this problem, you are probably under-structuring it, and should be using more subsections / sub-subsections or floats.
- Floats can be used as a structuring tool. Good writers use float positioning cleverly to break their text when one argument is finished and the next starts, or to break up what would otherwise be a wall of text. This requires you to use the [h!] float argument in Latex (by default Latex likes to move floats to the top and gobble them together, hence the name “float”: they float through the text). This is not always required or even useful, but I have found that students who think not only about what floats to use, but also where (on which page, between which paragraphs) they should be positioned, write the most pleasant-to-read reports.
- However, there is such a thing as a float that’s too obvious. Contrary to what has been discussed so far, not every possible float actually improves a report. Tables or pie charts that only show two numbers, “architectural diagrams” that only contain two boxes and an arrow, and similar overly simplistic floats look unprofessional and give the reader the feeling that the report needed filler material. The fix here is normally not to remove the float, but to add more detail. Can the two numbers be split up and presented in a more detailed and insightful manner (e.g., rather than just saying “response time A” and “response time B”, measure the performance of different parts of your solution)? Is the architecture of your system really that simple, or are you just massively over-abstracting?
- Floats need to be readable. This should go without saying, but a float that cannot be read and understood is pointless. That means that fonts in your figures should not be (much) smaller than in the text, and data in plots should not be overlapping to the extent that you cannot tell anymore what’s going on. Further, make sure that your figures are not “fuzzy” and do not have a distorted aspect ratio. Avoid this problem by only including vector graphics, avoiding bitmap formats of all sorts (PNG, JPEG, etc.), and always fixing the aspect ratio in Latex.
- Symbols in diagrams should have semantics. When drawing a diagram, avoid using a bunch of different types of boxes for no reason at all. Preferably, use a well-defined notation such as the UML, EER, or BPMN. If you are using your own notation, make sure that you are using the same boxes for the same type of element (and different boxes for different types of elements). Consider adding a legend to your custom-notation diagram. If you struggle defining a legend, your notation is probably not well-defined.
- Floats need to be your own. Never just copy-and-paste figures you found on the Internet or in other papers. You’ll need to draw your own visuals (although you can make a derivation of an existing figure, as long as you cite the source appropriately).
- Strive for a decent level of “visual appeal” in your figures. Not everybody is a budding graphics designer, but visually appealing figures immediately make a thesis report look substantially more professional. Do your best. It’s fine if you spend some time working on your figures. After all, they are the part of your report that stands out most to a casual reader.
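Several of the tips above (introduce the float in the text, position it with [h!], keep the aspect ratio fixed, use vector graphics) can be combined in a few lines of Latex. The following is a minimal sketch, not the one right way to do it. The label name, caption, and sentences are made up for illustration, and example-image is, as far as I know, a placeholder graphic that ships with the mwe package in common TeX distributions; replace it with your own vector PDF.

```latex
\documentclass{article}
\usepackage{graphicx}  % provides \includegraphics

\begin{document}

% The text introduces the float and tells the reader what to look for.
Figure~\ref{fig:method-overview} gives an overview of the steps of our study.
Note that data collection and analysis were repeated in two iterations.

\begin{figure}[h!]  % [h!] asks Latex to place the float here rather than at the top of a page
  \centering
  % Only the width is given, so the height is scaled proportionally
  % and the aspect ratio stays intact.
  % Assumption: example-image is a placeholder file, not a real result.
  \includegraphics[width=0.8\linewidth]{example-image}
  \caption{Overview of the research method.}
  \label{fig:method-overview}
\end{figure}

% The figure now visually separates the previous argument from the next one.
The next argument starts after the figure.

\end{document}
```

As with all cross-references, you may need to run Latex twice before the figure number resolves.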
Important Overall Stylistic Tips
A general theme of the following tips that relate to writing style (this section and next) is that you should strive for consistency.
- Use consistent terminology. In contrast to what you may have learned in your English writing classes in school, avoid using different terms for the same concepts “for variety”. This is fine for the “general English” part of the text (do vary sentence structures, verbs, etc.), but for technical terminology, pick the most fitting term and stick with it through the entire text. For example, don’t use “fault tolerance” and “reliability”, or “response time” and “execution time”, interchangeably. Decide which term is most appropriate, and use only that. Flip-flopping between different terms introduces confusion.
- Use consistent (and appropriate) tenses. Past tense and present tense can both be appropriate for your thesis, but be consistent. Decide whether the report is written as if you were currently doing the work (present tense), or as if you were writing the report after finishing the work (past tense). This decision is independent from when you actually do the writing! Don’t mix between sections, and definitely don’t mix within sections. An exception is the conclusions: there, only past tense really fits. Future tense should be reserved for the thesis proposal or if you are describing future work.
- Be careful not to speculate too much. For the largest part, everything you write in your thesis needs to be based on data (either your own or what previous work has reported, with reference). However, there are some places where you need to speculate a little. For instance, when interpreting results, you will need to do more than just transcribe numbers from the plot (see also the section on floats). However, make clear what is evidence that’s directly observable from your results and what is your own interpretation of that evidence. Weasel words (“we believe”, “the data suggests that”, etc.) can be used to a certain extent, but don’t overdo it. And, of course, make sure that your interpretation is reasonable and the most likely interpretation. It should also be noted that the discussion section by its nature has more speculative content than, for instance, the results or related work sections.
- “Folk results” do not require a reference. Not literally every statement in your report needs a reference. There is a notion of “folk results”, things that are evident to everybody with experience in the field and / or common sense, and you don’t need to provide references for those. For example, a claim that “software developers would like to improve their productivity” should be sufficiently self-evident that a reference would not be required. When deciding if you need a reference for a statement, ask yourself if a reasonable colleague could doubt the veracity of the statement. If yes, you’ll need evidence through a reference (or your own data).
- Make sure that it’s abundantly clear what a reference refers to. When using references, the most important thing is that the text is clear about what part of your argument a reference is meant to support. For instance, if you are writing a background section that’s predominantly based on a single source, you don’t have to cite this source in every paragraph (or, worse, after every sentence) — it’s entirely sufficient to say at the beginning that “the following section is based on Leitner et al. [2].” Avoid peppering in references in random places when the sentence structure does not make it clear what I will find in that reference. For instance, in the sentence “Contrary to popular belief [3], response time is not always taken seriously by developers [4].”: it’s quite clear to me what to expect in ref. [4], but what is ref. [3] doing? For similar reasons it’s rarely good style to group many references together ([5,6,7,8,9]).
- Avoid grandiose claims. Be truthful (and conservative) about what your research really shows. Avoid over-stating the importance of your results, as well as over-interpreting your data. It’s fine if your research mostly provides a narrow view on an even more narrow problem (that’s what most research does), but it’s not ok if you use this narrow data to make far-reaching claims about software development or the world in general. For example, a mining study on 20 GitHub projects cannot be used to “prove that software developers do not consider performance”, nor does a survey with 50 respondents allow you to talk about software developers in general. Again, some amount of “weaseling” with weak formulations is often required to avoid unwarranted generalization (“Our interviewees consistently argued that …”). As a sidenote, the word “prove” is problematic in general — very little of what we do in software engineering research (empirical studies, design science, experiments, etc.) is suitable to “prove” anything in a strict, mathematical sense. It’s best to avoid the word entirely, unless you are actually developing a mathematical proof.
Stylistic Nitpicks
Here follows a longer list of smaller, detailed comments. They aren’t individually very important, but they add up to give your report a professional look and feel.
- Avoid colloquial language. In formal texts, such as a paper or thesis report, you want to stay away from informal language of all sorts. This includes word contractions (“don’t”, “isn’t”), but also influences your choice of words and style. When in doubt, go for a more formal style in your report.
- Active voice is ok (even preferred). In contrast to what many academic style guides say, it is nowadays perfectly fine to use active phrasing in your thesis report rather than passive (e.g., “We have collected 60M data points” versus “60M data points have been collected”). Active and passive phrasing can be mixed, but in general try to prefer active since it just sounds more natural. Never speak about yourself in third person, that’s just weird (“The authors of this thesis have interviewed 20 subjects.”).
- Avoid empty subsections. Every section and subsection has text, even if it’s just one sentence that explains what will follow. That means that in your Latex source code a \section{A} should never be directly followed by \subsection{B} without any normal text between.
- Acronyms are introduced on first usage, and subsequently only the acronym is used. Your report probably contains many acronyms (TCP, SE, WWW, …). Every acronym should be introduced when it is first used in the text in the form “full name (acronym)”, and from then on you only ever use the acronym. Don’t flip-flop between long and short form, and never use the acronym before it is introduced. There are many good Latex packages that automate this process (for instance the acronym package). Use them. Exceptions: avoid using the short form of acronyms in the abstract as well as in section headers (acronyms that are so common that they can be considered folk knowledge, such as WWW, are ok).
- Be consistent about word capitalization. In titles and captions, either use “Normal English capitalization” or “Every Major Word Starts With a Capital Letter Capitalization”, but pick one and then stick with it. In text, avoid randomly capitalizing words (there are a few exceptions; notably, the word “Web”, e.g., Web applications, is often spelled with a capital W). Acronyms are often ALL CAPS. The most important thing, as always, is to be consistent: if you choose to spell “Web” with a W, then do it every time.
- “Labels” are spelled with a capital first letter. All kinds of “labels” that you use for cross-referencing in your report (Figure 1, Table 2, Algorithm 3, Chapter 4, etc.) are always spelled with a capital first letter. However, that’s not true if the word “figure” or “table” is just used in the normal English sense. Example: “As can be seen in Figure 1, the response time degrades after 120s of experiment time. Comparing to the results shown in the previous figures, this is unexpected (…)”.
- Be consistent about formatting. If you choose to format project names in \texttt{} in one section, do it in every section. If you refer to projects in the format “orgname/projectname” in one section, do it in every section.
- “i.e.” and “e.g.” are always followed by a comma. “i.e.” (“id est”, or “that is”) and “e.g.” (“exempli gratia”, or “for example”) are two shortcuts that are super-frequently used in English academic texts. They are always followed by a comma, e.g., in the following example I am using both shortcuts in a weird, i.e., not very natural, manner. Also: don’t confuse them, they mean very different things.
- References are annotations, not part of the text. Think of references [5] as metadata that’s not supposed to be read. Consequently, avoid writing things like “As can be seen in [6], bots are mainly used as a productivity tool.”. A better wording would be “As argued by Leitner et al. [6], bots are mainly used as a productivity tool.”
- “et al.” is only used for papers with three or more authors. On the topic of “et al.” (“et alia”, or “and others”), this convention is only used for papers with three or more authors. For papers with a single or two authors simply write their name(s), as in “Erlenhov and Leitner have argued …”. And yes, the correct spelling is “et al.” — no dot after “et”, but with a dot after “al”.
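A few of the nitpicks above translate directly into Latex source. The following is a small sketch under my own assumptions: the section titles and sentences are invented for illustration, and the acronym package is just one of several packages that can handle acronyms.

```latex
\documentclass{article}
\usepackage{acronym}  % provides \ac and the acronym environment

\begin{document}

% Acronyms are defined once, before their first use.
\section*{Acronyms}
\begin{acronym}
  \acro{TCP}{Transmission Control Protocol}
\end{acronym}

\section{Background}
% One introductory sentence, so that \section is never
% directly followed by \subsection.
This section introduces the concepts needed to follow the rest of the report.

\subsection{Network Protocols}
% The first \ac prints "Transmission Control Protocol (TCP)",
% every later \ac prints just "TCP".
Our prototype communicates via the \ac{TCP}.
We chose \ac{TCP} because it is widely supported.

% Project names are formatted the same way in every section.
We analyzed the project \texttt{orgname/projectname}.

\end{document}
```

The advantage of letting a package handle acronyms is that the “first usage” bookkeeping stays correct even when you move paragraphs around during editing.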
Writing Mechanics
This isn’t about your report text per se, but I felt it useful to also add some words about how to actually do the writing, and what tools to use.
- Use a spell checker. In the year 2021, there is absolutely no reason to submit a text full of spelling and grammar errors. The absolute minimum is using a basic spell checker. However, there are much better tools available that also help with grammar and even to some degree style (many people like Grammarly, especially if you can fork over the money for a premium subscription). One thing is clear — don’t assume that your supervisor will handle spell-checking for you.
- Use Latex. There isn’t much more to say about this, but if you are writing an academic text that’s longer than a few pages, neither MS Word nor Google Docs is the right tool. Latex is.
- MS Excel is not a great tool for data analysis and plotting. MS Excel, well, excels at many things, but doing thesis-level data analysis and plotting isn’t one of them. One of the reasons for that is that the resulting plots will often look downright terrible (and you can’t export plots easily in a vector format either). Better options include ggplot2 for R or Plotly for Python. At the end of the day, I don’t care so much what you use to produce your plots, as long as the end result looks professional (and plots from Excel usually don’t).
- Use either Overleaf or store your report in a Git repo. I don’t care much how exactly you want to manage your text, but the only copy of your Latex source code should not be sitting on your computer. Overleaf is quickly becoming the de-facto standard for thesis students; if you have no other preferences, then use that. Alternatively, we have run successful thesis projects where students managed their report source code using Git (and before that SVN) for years. This also works, and may be particularly attractive if you are old-fashioned like me and enjoy editing text in a good text editor.
Supervisor Interactions
Finally, some tips about how to effectively collaborate with your supervisor (or any other person whose job is to give you feedback, but not do the work for you).
- Give your supervisor time to review your work. One of my pet peeves as a supervisor is getting texts “to review” a few hours before a meeting. This most likely means that I will look over your text live during the meeting, and the best you will get are high-level comments. This is particularly important towards the end of your thesis project, when I should read your thesis in detail. When doing your time plan for the final weeks, consider that I will need time to read 50–100 pages of text, and that afterwards you will need time to address what are often multiple pages of comments. Often, more than one such cycle is needed. In short, if your plan is to send me your first draft a week before you need to submit, we are going to have a problem.
- Manage your supervisor. Related to the previous point, do communicate clearly to your supervisor what you expect from them until the next meeting. Should I be reading specific section(s)? If yes, what kind of feedback are you looking for (Just general “is this the right direction” kind of feedback, or is the text stable and you are expecting detailed line-level comments? Should I be looking primarily at your writing, or do you want to discuss your approach and results?). Specific requests for feedback make your supervisor’s life substantially easier, especially if they are realistic (“please give me feedback on my new method figure and let me know if it’s understandable” is realistic, “please go over the entire report every week” isn’t).