
I am a research assistant in corpus-based translation studies at the Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University (PolyU). In the next academic year (from September 2021), I will begin my PhD studies in the same department.

Since completing my M.A. in English Language Studies, I have worked on a number of projects. In 2019-20, I was a research fellow at Lingnan University (LU), where I conducted social policy research from a linguistic perspective. In 2017-19, I was a research associate on a discipline-specific English enhancement programme for the Department of Civil and Environmental Engineering (PolyU). In 2016-17, I built a postgraduate thesis corpus and helped incorporate data-driven learning into graduate thesis writing course materials at The University of Hong Kong (HKU). In 2018, I archived, annotated and studied the digital representation of South Asians on diasporic websites (LU).

I earned my master's degree in English Language Studies (PolyU) and my bachelor's degree in Communication – Journalism (HKBU), during which I received training in both quantitative and qualitative research. I feel fortunate to have met so many wonderful people in the field who have involved me in their research and inspired me to pursue further study.



My research explores what cross-linguistic big data can tell us. By comparing the patterns derived from corpora of different languages (or testing hypotheses in them), we can draw on the systematicity of naturally occurring data to make better sense of the world. Research interests: corpus linguistics, computational linguistics, NLP, translation studies, (critical) discourse analysis.


Past Projects

United Nations General Debate Corpus Research


I worked with Dr. Kanglong Liu and Dr. William Feng of The Hong Kong Polytechnic University to diachronically evaluate the stances taken by the U.S. and China in the United Nations General Debate Corpus (UNGDC).

ECS Grant: Digital Representations of Cultural Identities in Online Spaces: A Multimodal Social Semiotic Study of South Asian Diasporic Websites

(2018 – 2019)

I assisted Dr. Preet Hiradhar of Lingnan University in archiving and studying South Asian community homepages in Hong Kong and Singapore. To understand how South Asians represent themselves online, we designed a coding scheme, annotated the multimodal elements, and compared the patterns with the labels suggested in the extant literature. We look forward to sharing the findings soon.

‘Fear and disgust’ – A corpus study of sentiment towards sporting events as expressed multimodally on 4chan’s /sp/ board


Dr. Peter Crosthwaite and I co-authored a chapter on how sentiment plays out linguistically and visually in online discussions of exciting live sporting events such as UFC fights. We analysed the sentiment negotiated in text and in memes respectively, and further discussed the intermodal relations between the two. The chapter can be found in Callies and Levin (2019), Corpus Approaches to the Language of Sports: Texts, Media, Modalities.

TDLE Grant: Content-based English enhancement for Science and Engineering students

(2017 – 2019)

Word cloud for Fluid Mechanics: students can click on a word to see its collocations and example sentences.

Because many civil engineering students have fewer chances to read, write and speak English than their humanities counterparts, Dr. Barbara Siu of the Department of Civil and Environmental Engineering (PolyU) initiated a subject-specific English enhancement programme with the English Language Centre in 2017.

Across Semester 2 of 2017/18 and the 2018/19 academic year, we selected four core subjects as a testbed. Two mini corpora of textbooks and reading materials were compiled to generate lists of the most frequent words. Collocations and example concordance lines for these words were given to students in the subjects for reference. By providing authentic examples of frequent words, we hoped that students would, at the very least, understand the readings thoroughly without much difficulty. The data were visualised in the form of a word cloud.
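
The wordlist and collocation step can be illustrated with a short Python sketch. It is not the project's actual pipeline: the regex tokeniser, the small stopword list, the ±2-word collocation window and the toy sentences are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical stopword list; a real study would use a fuller one
STOPWORDS = {"a", "an", "the", "of", "in", "to", "with", "by", "is", "and"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def frequency_wordlist(tokens: list[str], top_n: int = 10) -> list[tuple[str, int]]:
    """Return the top_n most frequent tokens, ignoring stopwords."""
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(top_n)


def collocates(tokens: list[str], node: str, window: int = 2) -> Counter:
    """Count non-stopword tokens within +/- `window` positions of each occurrence of `node`."""
    found = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        start, end = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i and tokens[j] not in STOPWORDS:
                found[tokens[j]] += 1
    return found


if __name__ == "__main__":
    # Toy sentences standing in for a fluid mechanics reading corpus
    text = ("Fluid pressure increases with depth. The fluid pressure acts "
            "normal to a surface. Viscous fluid flow causes shear stress in the fluid.")
    tokens = tokenize(text)
    print(frequency_wordlist(tokens, top_n=2))         # [('fluid', 4), ('pressure', 2)]
    print(collocates(tokens, "fluid").most_common(1))  # [('pressure', 2)]
```

Raw co-occurrence counts are the simplest way to rank collocates. Association measures such as mutual information or log-likelihood are common alternatives.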

We also developed final year project report analyses, exam question deconstruction, a subject-specific glossary with definitions and pronunciation practice (with SpeechAce), essay annotation, group discussions of exam papers, and a speech imitation game for four cohorts of Civil Engineering undergraduates.

Writing with Suspense: A Corpus-based News Lead Analysis


This is my side project investigating the syntactic structures of news leads favoured by different genres of news. The main motivation comes from my undergraduate background. As journalism students, we spent a great deal of time learning to write concise, accurate and intriguing leads (the first sentence of a news article), since the lead plays a decisive role in whether readers keep reading. However, many coaching guides only introduce the various types of lead and fail to systematically present the syntactic features that characterise different news types.

So I started my own inquiry. Preliminary findings were presented at the 2018 International Conference on Bilingual Learning and Teaching.


TDG Project: Enhancing Disciplinary Postgraduate Thesis Writing via a Data-Driven Learning Approach

(2016 – 2017)

I was recruited by Dr. Lillian Wong, Dr. Peter Crosthwaite and Dr. Lisa Cheung to work on a project funded by a Teaching Development Grant to enhance the academic writing of HKU postgraduate research students. The project ran in five phases over one year:

  1. Ask supervisors across 10 faculties to recommend excellent Ph.D. / M.Phil. theses in their disciplines.
  2. Compile a 10-million-word corpus of HKU research theses.
  3. Prepare corpus activity tasks and a demonstration video that guide students to conduct corpus-informed language enquiry proactively.
  4. Promote use of the corpus in numerous workshops and core thesis writing courses.
  5. Evaluate the effectiveness of corpus-informed language practice through surveys and focus group interviews.
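
Phase 3 asks students to query the corpus themselves, typically through a key-word-in-context (KWIC) concordance. The Python sketch below shows roughly what such a query returns. It is an illustrative stand-in rather than the tool used in the project, and the function name `kwic`, the context width and the sample sentences are assumptions.

```python
import re


def kwic(texts: list[str], query: str, width: int = 30) -> list[str]:
    """Return key-word-in-context lines for each whole-word, case-insensitive match of `query`."""
    pattern = re.compile(r"\b" + re.escape(query) + r"\b", re.IGNORECASE)
    lines = []
    for text in texts:
        for match in pattern.finditer(text):
            left = text[max(0, match.start() - width):match.start()]
            right = text[match.end():match.end() + width]
            # Right-align the left context so the node words line up in one column
            lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines


if __name__ == "__main__":
    # Invented sentences standing in for thesis extracts
    sample = [
        "In this chapter, we argue that the model is robust.",
        "Previous studies argued otherwise, but we argue the opposite.",
    ]
    for line in kwic(sample, "argue", width=20):
        print(line)
```

Because the pattern uses word boundaries, a query for "argue" does not match "argued". A student who also wants inflected forms would need a lemma-aware or wildcard search.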

The general feedback was positive, and users also offered many constructive ideas. We have published some interesting findings in ReCALL 🙂

MA Thesis: Heavy metal music lyrics: identity construction and social struggles

(2015 – 2016)

I would say this was the greatest joy I have ever had in my studies! As a music enthusiast (a metalhead in particular), I was thrilled to get my hands on identifying the identity of heavy metal music and making sense of the social relations (mostly struggles) between metal bands and mainstream culture.


I collected 1,152 heavy metal song lyrics to form a metal lyrics corpus, and 692 mainstream pop song lyrics to compile a ‘control corpus’ of the same number of words. I then calculated the frequency ratio of the most frequent lexical words in the heavy metal corpus relative to the pop song corpus, and narrowed the list down to the 11 lexical words showing the greatest frequency difference between the corpora. In other words, these words are dominant in, and exclusive to, heavy metal lyrics. I evaluated concordances of these words with Martin & White's (2005) Appraisal System to systematically describe the affect, judgement and appreciation negotiated in metal lyrics, and then compared my findings with the labels found in the extant literature.
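
The frequency-ratio comparison can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the thesis's actual procedure. Frequencies are normalised per 10,000 tokens. An add-one smoothing constant, which is my choice and not taken from the study, prevents division by zero for words absent from the control corpus. A `min_freq` cut-off drops rare words, and the toy token lists are invented.

```python
from collections import Counter


def frequency_ratios(target: list[str], reference: list[str],
                     min_freq: int = 2, smoothing: float = 1.0) -> list[tuple[str, float]]:
    """Rank words by normalised frequency in `target` divided by that in `reference`.

    Frequencies are per 10,000 tokens; `smoothing` is added to raw reference
    counts (an assumption) so words missing from the reference corpus do not
    cause division by zero.
    """
    t_counts, r_counts = Counter(target), Counter(reference)
    t_size, r_size = len(target), len(reference)
    ratios = []
    for word, t_raw in t_counts.items():
        if t_raw < min_freq:
            continue  # skip rare words, whose ratios are unstable
        t_norm = t_raw / t_size * 10_000
        r_norm = (r_counts[word] + smoothing) / r_size * 10_000
        ratios.append((word, t_norm / r_norm))
    return sorted(ratios, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Invented token lists; real input would be the lexical words of each corpus
    metal = ["death", "blood", "fire", "death", "love", "blood", "death", "night"]
    pop = ["love", "baby", "love", "night", "heart", "love", "dance", "death"]
    print(frequency_ratios(metal, pop))  # [('blood', 2.0), ('death', 1.5)]
```

The sketch assumes the input lists already contain only lexical (content) words. Keeping the top entries of the ranked list, such as the first 11, loosely mirrors the narrowing-down step described above.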

I am happy to report that we found some surprising patterns, which have been published in Social Semiotics!

Publications

Cheung, J.O., & Feng, D. (2021). Attitudinal meaning and social struggle in heavy metal song lyrics: A corpus-based analysis. Social Semiotics, 31(2), 230-247. doi:10.1080/10350330.2019.1601337

Mok, K.-H., Xiong, W., Ke, G., & Cheung, J.O. (2021). Impact of COVID-19 pandemic on international higher education and student mobility: Student perspectives from mainland China and Hong Kong. International Journal of Educational Research, 105, 101718. doi:10.1016/j.ijer.2020.101718

Wong, A.H.K., Cheung, J.O., & Chen, Z. (2021). Promoting effectiveness of “working from home”: Findings from Hong Kong working population under COVID-19. Asian Education and Development Studies, 10(2), 210-228. doi:10.1108/AEDS-06-2020-0139

Cheung, J.O. (26 June 2020). Public housing notice in monolingual Chinese: Is Hong Kong still bilingual? Think Hong Kong.

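Cheung, J.O. (26 June 2020). 《賢聚嶺南》只印有中文的屋邨告示：香港的雙語實踐還復存? [Public housing estate notices printed only in Chinese: Is Hong Kong's bilingual practice still alive?] (Chinese version)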

Cheung, J.O. (2019). Review of the book Sensory Linguistics: Language, Perception and Metaphor, by B. Winter,

Crosthwaite, P., & Cheung, J.O. (2019). ‘Fear and Disgust’ – A corpus study of sentiment towards sporting events as expressed multimodally on 4chan’s /sp/ board. In M. Callies & M. Levin (Eds.), Corpus Approaches to the Language of Sports: Texts, Media, Modalities. London: Bloomsbury Academic.

Crosthwaite, P., Wong, L.L.C., & Cheung, J.O. (2019). Characterising graduate students’ corpus query and usage patterns for Data-driven Learning. ReCALL, 31(3), 255-275. doi:10.1017/S0958344019000077

Presentations and Guest Lectures

Cheung, J.O. (2020). Guest lecture: exploring social media policy with corpus linguistic methods. DPS709 Engaging the Media and Public Communications. Lingnan University, 28 March 2020.

Cheung, J.O. (2020). Guest lecture: exploring social media policy with visual grammar. DPS709 Engaging the Media and Public Communications. Lingnan University, 21 March 2020.

Siu, W.Y., & Cheung, J.O. (2018). Content-based English enhancement scheme: A case study in Civil Engineering core courses. 2nd International Conference on English Across the Curriculum. The Hong Kong Polytechnic University, 4-5 December 2018.

Cheung, J.O. (2018). Writing with suspense: A corpus-based news lead analysis. 2018 International Conference on Bilingual Learning and Teaching. The Open University of Hong Kong, 25-27 October 2018.

Cheung, J.O. (2018). ‘Disciplinary DDL is not enough’: content-based subject-specific wordlists for civil engineering undergraduates. International Conference on English Language Education in the Chinese Context. The Education University of Hong Kong, 4-5 May 2018.

Cheung, J.O. (2018). Guest lecture: critical discourse analysis. GE1305 Understanding media in a multicultural world. School of Continuing and Professional Studies, The Chinese University of Hong Kong, 14 March 2018.

Cheung, J.O. (2017). Attitude, identity and social struggle: a corpus-based appraisal analysis of Heavy Metal music lyrics. 6th New Zealand Discourse Conference. Auckland University of Technology, 6-9 December 2017.