COMP0087: Statistical Natural Language Processing (2024/25)

Group project

The project is due on 9 April 2025 (Wednesday) at 16:00 (Europe/London) and constitutes 100% of your mark. In the case of extenuating circumstances, the longest extension that any group member is individually entitled to applies to the group as a whole.

The goal of the project is to be defined by its members and is constrained by only two primary requirements. Firstly, the project must involve natural language in some form, although it can, for example, involve other modalities or formal languages in addition to natural language. Secondly, the project must be empirical in nature and involve some form of research hypothesis: "Which model works best for this form of data?", "What empirical impact does using this model in combination with this method have?", "To what degree do the following existing models fail for the given task?", etc. Apart from these two requirements, our wish is for students to be as free as possible in forming their own project goals.

Group formation

Register your group of at least six and at most seven students in the dedicated Moodle forum "Group project formation" by 21 January 2025 (Tuesday) at 23:59 (Europe/London). In your forum post, provide the following details: (1) the UCL e-mail addresses of all members of the group, (2) a desired name for the group, and (3) a brief description of the project goal of up to 100 words.

Optionally, you may list which teaching assistants you would prefer to be assigned, but please understand that all teaching assistants are capable of supervising each project and that we may not be able to satisfy all such preferences.

After the group formation deadline, each group will be assigned a teaching assistant.

We reserve the right to assign students who fail to find a group on their own to groups with fewer members than the cap.

Note that you are allowed to change the project goal at any point after the deadline, for example in light of discussions with your assigned teaching assistant, but we advise each group to have committed to a project goal by week three at the very latest.

Dataset restrictions

Student projects may under no circumstances use data such as social media posts, clinical data, etc. Note that anonymisation or evidence of prior use of the data for scientific publications does not exempt you from this rule. No exceptions will be made, but if you find that this restriction affects your group, do reach out to your assigned teaching assistant or the teaching staff and we will do our best to find alternative approaches and datasets that align as closely as possible with your project vision.

Progress reports

For each of the weeks of 3 February, 24 February, 10 March, and 24 March, each group shall provide a progress report to their assigned teaching assistant by 23:59 (Europe/London) on the Friday of that week. The purpose of these reports is to ensure the steady, gradual progress of the project and to keep track of group member contributions. Submit a single PDF of at most one page via e-mail to your assigned teaching assistant and use the following LaTeX template. Do not make changes to the LaTeX template; just change the group name definition and add text to the document body.

Students are not expected to spend a substantial amount of time on these reports and should instead focus on their project. Please see this example of a perfectly valid, brief progress report.

Project report

Your report shall contain at least the following: (1) an abstract briefly summarising the research problem and the outcome of your work; (2) an introduction describing the problem/research question at a high level, as well as the main outcomes of your work; (3) a related work section discussing how your work relates to prior work and what makes it different; (4) a methods section, detailing what you did, how you did it, and with what motivation; (5) an experiments section, describing the empirical setup and metrics; (6) a results and discussion section, presenting your findings as well as critically analysing them using quantitative and qualitative methods; and (7) a conclusions section, summarising your research outcome, as well as a discussion of future work if you or others were to build upon your work.

You are required to use LaTeX and the Transactions of the Association for Computational Linguistics (the leading scientific journal in natural language processing) style to typeset your report, as doing so will give you access to and experience with the tools used for professional academic writing and prepare you for your thesis project. The style and template files are available on the journal submissions page in the "General Submission Format" section.

Many (if not most) academics use Overleaf to collaboratively author documents, an option that you may want to consider depending on your personal preferences.

Submit a single PDF of at most eight pages on Moodle. The page limit excludes references and an optional appendix, which is not guaranteed to be taken into consideration when marking. Use only your group name for the file (for example: "largest language models.pdf").

Do not make any modifications to the LaTeX template, for example by reducing or increasing the font size, margins, etc. Instead, carefully weigh what is essential to present the most important aspects of your project finding(s) and move all other content to the optional appendix, which you can reference in the main body of the report ("...for a listing of additional experiments, please see Appendix A"). A well-written report is one which succinctly communicates the core finding(s) to the reader, not the one with the most pages.
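
As an illustration of referencing the appendix from the main body, the following is a minimal sketch using only standard LaTeX commands. It assumes the template leaves \appendix, \label, and \ref at their standard behaviour; the section title and the label name app:additional-experiments are hypothetical placeholders, not part of the template.

```latex
% Main body: refer to the appendix by label rather than by a hard-coded letter.
For a listing of additional experiments, please see
Appendix~\ref{app:additional-experiments}.

% After the references: \appendix switches section numbering to letters (A, B, ...).
\appendix
\section{Additional Experiments}
\label{app:additional-experiments}
```

Referencing by label keeps the appendix letter correct if appendix sections are later added or reordered, and it does not touch the template itself.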

Marking

All projects are double-marked and we use the 2020/21 UCL Computer Science Postgraduate Project Marking Guidelines as the basis for the marking.

Group members generally receive the same mark, but we will make adjustments in cases of substantially unequal contributions among the group members. If you have concerns related to member contributions, please contact the member of the teaching team responsible for the group projects.