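Faculty Profile

Hai Hu (胡海), Tenure-Track Assistant Professor

Department: Department of Translation

Career and Education

Assistant Professor, Department of English / Department of Translation, Shanghai Jiao Tong University (2021-2023 / 2023-present)

PhD in Computational Linguistics, Indiana University Bloomington, USA (2015-2021)

MA in English Linguistics, Renmin University of China (2012-2015)

BA in English Language and Literature, Renmin University of China (2008-2012)

Exchange student, Department of Linguistics, University of Tübingen, Germany (2013-2014)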

 

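Personal homepage

Teaching and Research

Students interested in computational linguistics, natural language processing, or corpus linguistics, or considering a master's degree (academic master's in linguistics or professional master's in translation), are welcome to contact me (hu.hai [shift+2] sjtu.edu.cn).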

 

News

Try the ChatGPT English essay detector developed by our group: https://huggingface.co/spaces/SJTU-CL/argugpt-detector

Paper: Liu, Y., Zhang, Z., Zhang, W., Yue, S., Zhao, X., Cheng, X., Zhang, Y., & Hu, H. (2023). ArguGPT: evaluating, understanding and identifying argumentative essays generated by GPT models. ArXiv, abs/2304.07666. paper. data.

 

Use our Chinese syntactic acceptability dataset to evaluate your language model: https://github.com/huhailinguist/CoLAC/

Hai Hu*, Ziyin Zhang*, Weifang Huang, Jackie Yan-Ki Lai, Aini Li, Yina Patterson, Jiahui Huang, Peng Zhang, Chien-Jer Charles Lin, Rui Wang. (2023). Revisiting Acceptability Judgements: CoLAC - Corpus of Linguistic Acceptability in Chinese. ArXiv, abs/2305.14091. *Equal contribution. paper. data.
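Acceptability benchmarks in the CoLA tradition are commonly scored with the Matthews correlation coefficient (MCC), which, unlike plain accuracy, does not reward a model for always predicting the majority class. The sketch below assumes binary labels (1 = acceptable, 0 = unacceptable); whether CoLAC's own evaluation scripts use exactly this metric and label encoding is an assumption here, so check the repository before comparing numbers.

```python
"""Matthews correlation coefficient for binary acceptability labels (sketch)."""
import math


def matthews_corrcoef(gold, pred):
    """MCC for labels in {0, 1} (1 = acceptable); returns 0.0 when undefined."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    tn = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 0)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Made-up labels: a model that always answers "acceptable".
gold = [1, 1, 1, 1, 0, 0]
pred = [1, 1, 1, 1, 1, 1]
print(matthews_corrcoef(gold, pred))  # 0.0, although accuracy is 4/6
```

Returning 0.0 when a row or column of the confusion matrix is empty is a common convention for this otherwise undefined case.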

 

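Research interests:

Computational linguistics; natural language understanding; natural language inference; syntactic treebanks; corpus-based translation studies; digital humanities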

 

Research projects:

  • Principal investigator, Youth Project in Humanities and Social Sciences, Ministry of Education of China (2022-present, ongoing)
  • Building a syntactic treebank of translated texts (2019-2022, ongoing): seed grant from the Renmin University of China–Indiana University Joint Funding Program
  • Original Chinese Natural Language Inference (OCNLI) corpus. link
  • Chinese Language Understanding Evaluation (CLUE) benchmark. link

 

Talent programs:

  • Shanghai Pujiang Talent Program (2022)

 

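Courses taught

Shanghai Jiao Tong University: Academic English Writing, College English, English Listening and Speaking, Language Intelligence

Indiana University: Introduction to Linguistics, Logic and Mathematics in Cognitive Science (teaching assistant)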

 

Research areas and publications:


【Natural language understanding / natural language inference】

 

In this line of research, I work on:

 

1) teaching computers to understand human language through natural language inference, using both logic-based methods (monotonicity calculus) and deep learning methods (pre-trained language models such as BERT), in collaboration with logicians and computer scientists.

We ask questions such as:

If we know that "All students party on New Year's Eve" and that "Most students get drunk at every party", does it follow that "Most PhD students get drunk on New Year's Eve"? (Find the answer at the bottom of the page.)

In short, I teach computers to reason using logic-based models or deep learning models such as BERT.
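To make the logic-based idea concrete, here is a toy sketch of monotonicity reasoning. It assumes a hand-written table of how a few quantifiers behave in their first (noun) argument and a tiny hypothetical noun taxonomy (`MORE_GENERAL`), and it only handles sentences of the form quantifier + noun + predicate that differ in the noun. Real systems such as MonaLog compute polarity over full syntactic derivations; this is not their code.

```python
"""Toy monotonicity reasoning over (quantifier, noun, predicate) sentences."""

# Monotonicity of each quantifier in its first argument (the noun):
# "down": swapping in a MORE specific noun preserves truth;
# "up":   swapping in a MORE general noun preserves truth.
FIRST_ARG_MONOTONICITY = {"all": "down", "every": "down", "no": "down", "some": "up"}

# Hypothetical mini-taxonomy: each key denotes a subset of its value.
MORE_GENERAL = {"PhD students": "students", "students": "people"}


def is_more_specific(specific: str, general: str) -> bool:
    """Walk up the taxonomy from `specific`, looking for `general`."""
    current = specific
    while current in MORE_GENERAL:
        current = MORE_GENERAL[current]
        if current == general:
            return True
    return False


def entails(premise: tuple, hypothesis: tuple) -> bool:
    """True only if the toy rules prove the hypothesis from the premise."""
    q1, noun1, pred1 = premise
    q2, noun2, pred2 = hypothesis
    if q1 != q2 or pred1 != pred2:
        return False  # outside this toy fragment
    if noun1 == noun2:
        return True
    direction = FIRST_ARG_MONOTONICITY.get(q1)
    if direction == "down":
        return is_more_specific(noun2, noun1)
    if direction == "up":
        return is_more_specific(noun1, noun2)
    return False  # quantifier not in the table: no conclusion drawn


print(entails(("all", "students", "party"), ("all", "PhD students", "party")))    # True
print(entails(("some", "students", "party"), ("some", "PhD students", "party")))  # False
```

Quantifiers missing from the table, such as "most", get no conclusion at all, which is the safe default for a proof-based system.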

 

2) constructing benchmarks and infrastructure for evaluating NLP models, mainly in Chinese, partly to expose their weaknesses on specific linguistic phenomena such as Chinese classifiers.

In other words, I build datasets to train, test, and toy with computer models!
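Exposing a model's weakness on one phenomenon usually means reporting accuracy per phenomenon instead of a single overall number. A minimal sketch, assuming each evaluation item carries a hypothetical `phenomenon` tag (for example `classifier`) next to its gold NLI label; the examples and predictions below are made up.

```python
"""Break down NLI accuracy by linguistic phenomenon (illustrative sketch)."""
from collections import defaultdict


def accuracy_by_phenomenon(examples, predictions):
    """Return {phenomenon: accuracy}.

    `examples` is a list of dicts with the hypothetical keys "label" and
    "phenomenon"; `predictions` holds one predicted label per example.
    """
    if len(examples) != len(predictions):
        raise ValueError("need exactly one prediction per example")
    correct = defaultdict(int)
    total = defaultdict(int)
    for example, predicted in zip(examples, predictions):
        tag = example["phenomenon"]
        total[tag] += 1
        correct[tag] += int(predicted == example["label"])
    return {tag: correct[tag] / total[tag] for tag in total}


# Invented data, only to show the output format.
examples = [
    {"label": "entailment", "phenomenon": "classifier"},
    {"label": "contradiction", "phenomenon": "classifier"},
    {"label": "neutral", "phenomenon": "negation"},
    {"label": "entailment", "phenomenon": "negation"},
]
predictions = ["entailment", "neutral", "neutral", "entailment"]
print(accuracy_by_phenomenon(examples, predictions))
# {'classifier': 0.5, 'negation': 1.0}
```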

 

  • Aikaterini-Lida Kalouli*, Hai Hu*, Alexander F. Webb, Lawrence S. Moss, Valeria de Paiva. (2023). Curing the SICK and other NLI maladies. Computational Linguistics. 49 (1): 199–243. doi: https://doi.org/10.1162/coli_a_00465. *equal contributions. paper. data. (SSCI)
  • Xu, Liang, Xiaojing Lu, Chenyang Yuan, Xuanwei Zhang, Huilin Xu, Hu Yuan, Guoao Wei, Pan Xiang, Xin Tian, Hai Hu. (2021). FewCLUE: A Chinese few-shot learning evaluation benchmark. arXiv preprint arXiv:2107.07498. paper. code.
  • Hu, Hai, He Zhou, Zuoyu Tian, Yiwen Zhang, Yina Ma, Yanting Li, Yixin Nie, Kyle Richardson (2021). Investigating Transfer Learning in Multilingual Pre-trained Language Models through Chinese Natural Language Inference. In: Findings of ACL. paper. code.

  • Xu, Liang, Hai Hu, Xuanwei Zhang, Lu Li, Chenjie Cao, Yudong Li, Yechen Xu, Kai Sun, Dian Yu, Cong Yu, Yin Tian, Qianqian Dong, Weitang Liu, Bo Shi, Yiming Cui, Junyi Li, Jun Zeng, Rongzhao Wang, Weijian Xie, Yanting Li, Yina Patterson, Zuoyu Tian, Yiwen Zhang, He Zhou, Shaoweihua Liu, Zhe Zhao, Qipeng Zhao, Cong Yue, Xinrui Zhang, Zhengliang Yang, Kyle Richardson, and Zhenzhong Lan (2020). CLUE: A Chinese Language Understanding Evaluation Benchmark. In Proceedings of the 28th International Conference on Computational Linguistics (COLING). pp. 4762–4772. paper. website. GitHub page.

  • Hu, Hai, Kyle Richardson, Liang Xu, Lu Li, Sandra Kuebler, and Larry Moss. (2020). OCNLI: Original Chinese Natural Language Inference. In: Findings of the Association for Computational Linguistics: EMNLP 2020. pp. 3512–3526. paper. code and data. leaderboard.

  • Richardson, Kyle, Hai Hu, Larry Moss, and Ashish Sabharwal. (2020). Probing Natural Language Inference Models through Semantic Fragments. In: Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence. pp. 8713-8721. paper. code and data.

  • Hu, Hai, Qi Chen, Kyle Richardson, Atreyee Mukherjee, Lawrence S Moss, and Sandra Kuebler. (2020). MonaLog: a Lightweight System for Natural Language Inference Based on Monotonicity. In: Proceedings of the Society for Computation in Linguistics 2020. pp. 319-329. paper. poster. code.

  • Hu, Hai, Qi Chen and Larry Moss. (2019). Natural Language Inference with Monotonicity. In Proceedings of the 13th International Conference on Computational Semantics (IWCS 2019), pp. 8–15. Gothenburg, Sweden. paper.

  • Hu, Hai, and Lawrence S. Moss. (2018). Polarity Computations in Flexible Categorial Grammar. In Proceedings of the 7th Joint Conference on Lexical and Computational Semantics: *SEM, pp. 124–129. New Orleans, Louisiana, USA. paper. poster. code.


【Semantic change】



Here I work on detecting semantic change using word embeddings (word2vec, GloVe) in low-resource scenarios, e.g., medieval Spanish.

In short, I use word embeddings to detect which words have changed meaning over time.
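Embeddings trained separately on texts from different periods live in unrelated coordinate systems, so a word's two vectors cannot be compared directly. Besides aligning the spaces (for example with orthogonal Procrustes), one common option is a second-order measure: compare how similar the word is to the same set of anchor words within each period. The sketch below assumes the embeddings are already loaded as plain Python lists; the 2-D vectors, the anchor words, and the name `change_score` are invented for illustration, and this is not necessarily the method used in the publications in this section.

```python
"""Second-order semantic change score between two periods (toy sketch)."""
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def change_score(word, anchors, emb_old, emb_new):
    """1 - cosine between the word's similarity profiles in two periods.

    A profile lists the word's similarity to every anchor word inside one
    period's space, so the two spaces never need to be aligned.
    0 means an unchanged neighbourhood; larger values suggest more change.
    """
    profile_old = [cosine(emb_old[word], emb_old[a]) for a in anchors]
    profile_new = [cosine(emb_new[word], emb_new[a]) for a in anchors]
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 1.0 - cosine(profile_old, profile_new))


# Invented 2-D vectors. The "new" space is the "old" one rotated by 90
# degrees, mimicking an independently trained model; only "shifted"
# really moves relative to the anchors.
anchors = ["a", "b", "c"]
emb_old = {"a": [1, 0], "b": [0, 1], "c": [1, 1], "stable": [1, 0], "shifted": [1, 0]}
emb_new = {"a": [0, 1], "b": [-1, 0], "c": [-1, 1], "stable": [0, 1], "shifted": [-1, 0]}

for w in ["stable", "shifted"]:
    print(w, round(change_score(w, anchors, emb_old, emb_new), 3))
```

Despite the rotation, the stable word scores essentially 0 and the shifted word about 0.67. With real data, the choice of anchors (e.g., the union of a word's nearest neighbours in both periods) strongly affects the scores, which matters especially for small historical corpora.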

 

  • Amaral, Patrícia, Hai Hu and Sandra Kübler (2022). "Tracing semantic change with distributional methods: The contexts of algo". Diachronica. https://doi.org/10.1075/dia.21012.ama paper (SSCI).

  • Hu, Hai, Patrícia Amaral and Sandra Kübler (2022). "Word Embeddings and Semantic Shifts in Historical Spanish: Methodological Considerations". Digital Scholarship in the Humanities. Volume 37, Issue 2, Pages 441–461. https://doi.org/10.1093/llc/fqab050 paper. code. (SSCI)


【Corpus-based translation studies / treebanks of translated Chinese】



I am also interested in the morphological, syntactic, and stylistic characteristics of translated Chinese and Europeanized Chinese.

To this end, I 1) employ machine learning methods to study translations and 2) build treebanks (syntactically annotated corpora) to look into the syntactic features of translationese.

In short, I study the features of translated texts using machine learning and treebanks I have built myself.
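One simple way to look for syntactic traces of translationese in a treebank is to compare how often each dependency relation occurs in translated versus non-translated text. The sketch below assumes the treebank is stored in the CoNLL-U format used by Universal Dependencies, which may not match the format of the treebanks described here; the two one-sentence "corpora", their simplified annotation, and the helper names are invented for illustration.

```python
"""Compare dependency-relation profiles of two parsed corpora (toy sketch)."""
from collections import Counter


def relation_frequencies(conllu_text):
    """Relative frequency of each DEPREL label in CoNLL-U formatted text."""
    counts = Counter()
    for line in conllu_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip sentence breaks and comment lines
        columns = line.split("\t")
        if not columns[0].isdigit():
            continue  # skip multiword-token ranges ("1-2") and empty nodes ("1.1")
        counts[columns[7]] += 1  # the 8th column holds the relation label
    total = sum(counts.values())
    return {rel: n / total for rel, n in counts.items()}


def frequency_difference(freq_a, freq_b):
    """Per-label difference freq_a - freq_b (missing labels count as 0)."""
    labels = sorted(set(freq_a) | set(freq_b))
    return {rel: freq_a.get(rel, 0.0) - freq_b.get(rel, 0.0) for rel in labels}


def row(idx, form, head, deprel):
    """Build one CoNLL-U token line; unused columns are filled with '_'."""
    return "\t".join([str(idx), form, "_", "_", "_", "_", str(head), deprel, "_", "_"])


# Two invented one-sentence "corpora" with simplified annotation.
original = "\n".join([row(1, "他", 2, "nsubj"), row(2, "吃", 0, "root"), row(3, "饭", 2, "obj")])
translated = "\n".join([row(1, "他", 2, "nsubj"), row(2, "吃", 0, "root"),
                        row(3, "了", 2, "aux"), row(4, "饭", 2, "obj")])

diff = frequency_difference(relation_frequencies(translated), relation_frequencies(original))
print({rel: round(d, 3) for rel, d in diff.items()})
```

On real data, the largest positive and negative differences point to relations worth a closer look; they would still need a significance test before any claim about translationese.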

 

  • Hu, Hai and Sandra Kübler. (2021). Investigating Translated Chinese and Its Variants Using Machine Learning. In Natural Language Engineering. Volume 27, Issue 3, May 2021, pp. 339-372. https://doi.org/10.1017/S1351324920000182 (SCI/SSCI/AHCI) paper. code.

  • Hu, Hai, Yanting Li, Yina Patterson, Zuoyu Tian, Yiwen Zhang, He Zhou, Sandra Kübler, and Chien-Jer Charles Lin (2020). "Building a Literary Treebank for Translation Studies in Chinese". In: Proceedings of 19th International Workshop on Treebanks and Linguistic Theories (TLT). pp. 18-31. paper.

  • Hu, Hai, Wen Li, and Sandra Kübler. (2018). Detecting Syntactic Features of Translated Chinese. In Proceedings of the 2nd Workshop on Stylistic Variation, pp. 20-28. New Orleans, Louisiana, USA. paper. slides. video presentation.


【Others】



I'm a linguist, so I also collaborate with other linguists on very linguistic-y projects, some of which use computational modeling.

For example, I study why people in Chengdu pronounce "吃饭" (chīfàn, 'to eat') as "吃fɛn".

 

  • Li, A., Tamminga, M., & Hu, H. (2023). Intra- and interspeaker repetitiveness in Chengdu Mandarin locative variation. Language Variation and Change, 1-21. doi:10.1017/S095439452300008X Paper.
  • Lin, Chien-Jer Charles, and Hai Hu. (in press). Linking comprehension and production: Frequency distribution of Chinese relative clauses in the Sinica Treebank. In Chu-Ren Huang, Shukai Hsieh, & Peng Jin (eds.) Chinese Language Resources: Data Collection, Linguistic Analysis, Annotation and Language Processing. Springer. paper

  • Hu, Hai and Yiwen Zhang. (2017). Path of Vowel Raising in Chengdu Dialect of Mandarin. In Proceedings of the 29th North American Conference on Chinese Linguistics. Rutgers, NJ. pp. 481-498. paper. abstract.

 

For a complete list of publications, see: https://huhailinguist.github.io/publications/

 

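Translations:

  • 《表象与本质——类比,思考之源和思维之火》 (Chinese translation of Surfaces and Essences: Analogy as the Fuel and Fire of Thinking), translated by Liu Jian, Hu Hai, and Chen Qi; by Douglas Hofstadter (USA) and Emmanuel Sander (France); Zhejiang People's Publishing House; 2018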

 


Recent talks: 

  • 2023/04: Progress and Prospects of Pre-trained Models. SJTU SFL.
  • 2022/03: Examining the Replicability of Grammaticality Judgments in Chinese Journal Articles: Dialectal Influences and Sources of Variability. Annual Conference on Human Sentence Processing (UC Santa Cruz; Online)
  • 2021/12: Recent progress in natural language inference. AWS AI Lab Shanghai.
  • 2021/11: Every time I hire a linguist, my accuracy goes down: why NLU still needs linguists now? Fudan University NLP lab.

 


Students: 

  • Yikang Liu: master's student in linguistics at SFL, SJTU



Thesis advisees/committee member (previous; current):

  • BS thesis in CS: Ziyin Zhang (co-adviser; BS in CS/BA in English, SJTU; master's student in CS, SJTU)
  • BA thesis: Chunhao Wang (BA in English/BS in Math; master's student at UC Berkeley); Shisen Yue; Jieqiong Ding
  • PhD thesis: Qi Chen (committee member; Cognitive Science, Indiana University)

 

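(The answer to the inference question is: NO, it does not follow. "Most" is not downward monotone in its first argument, so the majority of students who get drunk on New Year's Eve may include none of the PhD students.)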
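The answer can be checked with a small counterexample. The sketch below builds a hypothetical model with five students, two of them PhD students, reads "most" as "more than half", and verifies that both premises are true (under either scope reading of the second premise) while the conclusion is false. The student names, party names, and sets are invented for illustration.

```python
"""Toy set-theoretic countermodel for the New Year's Eve inference."""


def most(restrictor: set, scope: set) -> bool:
    """'Most A are B', read as: more than half of the As are Bs."""
    return len(restrictor & scope) > len(restrictor) / 2


# Hypothetical model: five students, two of whom are PhD students.
students = {"s1", "s2", "s3", "s4", "s5"}
phd_students = {"s1", "s2"}
parties = ["new_years_eve", "spring_party"]
partied_on_nye = set(students)
drunk_at = {
    "new_years_eve": {"s3", "s4", "s5"},
    "spring_party": {"s3", "s4", "s5"},
}

# Premise 1: all students party on New Year's Eve.
premise1 = students <= partied_on_nye
# Premise 2, reading A: at every party, most students get drunk.
premise2 = all(most(students, drunk_at[p]) for p in parties)
# Premise 2, reading B: most students are drunk at every single party.
always_drunk = {s for s in students if all(s in drunk_at[p] for p in parties)}
premise2_alt = most(students, always_drunk)
# Conclusion: most PhD students get drunk on New Year's Eve.
conclusion = most(phd_students, drunk_at["new_years_eve"])

print(premise1, premise2, premise2_alt, conclusion)  # True True True False
```

One model where the premises hold and the conclusion fails is enough to show the inference is invalid.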

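Professional Service

Workshop organization:

  • NAtural LOgic meets MAchine learning (NALOMA) workshop at WESSLLI 2020. webpage



Reviewing:

  • Computational linguistics conferences, including ACL, EMNLP, NAACL, and CCL
  • Academic journals, including Natural Language Engineering

Address: Yang Yongman Building, Minhang Campus, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, China

Postal code: 200240  Website: http:

Tel: 021-34205664 (Party and Administrative Office)  021-34204723 (Teaching and Research Office)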
