EB: Tell our readers a little about your background – what kick-started your interest in technology?
YL: Though today I work in the artificial intelligence-related field of Natural Language Processing (NLP), I never saw a real computer until I went to college. Growing up in the 1980s in Jinsha, a remote small town in southwest China, the computer was merely an abstract concept that I heard about on TV and read about in our local newspaper.
Nevertheless, when I became the first student from my county admitted to Tsinghua University, 2,000 kilometers away in Beijing, having ranked at the top of my class among more than a quarter million participants in the National Higher Education Entrance Examination across the entire Guizhou province, I chose to pursue a dual degree in Automation and Economics. I was inspired by what I read in the newspaper and in science fiction. I hoped that this combined curriculum of Computer Science, Electrical Engineering, and Economics would equip me with the knowledge and tools to make people’s lives better by automating tedious manual work.
EB: So how did you end up in the US?
YL: My newfound passion for using computers to make the world better led me to a Computer Science PhD in the US, at the University of Michigan, where I studied under Dr. H. V. Jagadish, who is well known for database usability research. But even here, my double life continued, as I took an internship with MBA students to transfer technology we invented in school out into the real world. For one project, we helped commercialize a patented technology for culturing bone tissue to speed up the discovery of drugs for diseases such as osteoporosis; for another, we developed mobile-based technology to help improve student engagement in the classroom.
Now at IBM Research for almost 10 years, I lead the ScalableNLP, or SNap group.
EB: What does a Natural Language Processing researcher actually do?
YL: A key problem we work on is information extraction (IE), or the task of extracting structured information from unstructured or semi-structured data. It allows machines to read and construct knowledge bases – the cornerstone of many cognitive systems, including IBM Watson.
And, in perhaps perfect professional poetry for me, there are two general approaches to IE. The machine learning approach is extremely popular in academic research. However, this approach typically requires a large collection of labeled data that is difficult to obtain in practice. Moreover, the learned models are often black boxes: their inner workings are largely hidden, and thus difficult to understand and explain.
The other approach to IE is to develop algorithms using declarative languages. It is a popular approach in the commercial world, as it requires virtually no labeled data and results in easy-to-understand programs. However, this approach can be time-consuming and labor-intensive.
The research philosophy of my team, as exemplified by SystemT — a state-of-the-art natural language processing engine currently powering over 10 IBM products and services — has been that the best solution is the one that combines the best of these two approaches.
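Editor's note: to make the declarative, rule-based approach described above a little more concrete, here is a minimal Python sketch. It is an illustration only and is not SystemT, SEER, or any IBM API. The `Degree` type, the `DEGREE_RULE` pattern, the `extract_degrees` function, and the short list of degree keywords are all hypothetical names and assumptions chosen for this example. The point it tries to show is that a single human-readable rule can turn unstructured text into structured records without any labeled training data, at the cost of someone having to write and maintain the rule by hand.

```python
import re
from typing import List, NamedTuple


class Degree(NamedTuple):
    """One structured record extracted from free text (hypothetical schema)."""
    degree: str
    institution: str


# Hypothetical hand-written rule: a degree keyword, followed later in the same
# sentence by "from" or "at" and a capitalized institution name.
DEGREE_RULE = re.compile(
    r"\b(?P<degree>PhD|MBA|BSc|MSc)\b"          # degree keyword
    r"[^.]*?\b(?:from|at)\s+(?:the\s+)?"        # stay inside the sentence
    r"(?P<institution>[A-Z][a-z]+(?:\s+(?:of\s+)?[A-Z][a-z]+)*)"
)


def extract_degrees(text: str) -> List[Degree]:
    """Apply the rule to the text and return every match as a Degree record."""
    return [
        Degree(m.group("degree"), m.group("institution"))
        for m in DEGREE_RULE.finditer(text)
    ]
```

Calling `extract_degrees("She earned a PhD in Computer Science at the University of Michigan.")` yields one record pairing the degree with the institution. Because the rule is plain text, a reader can see exactly why a match was or was not produced, which is the transparency advantage mentioned above. Covering more phrasings means writing more rules, which is the labor cost.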
EB: What have you been working on recently?
YL: We recently created SEER, a tool that learns how to create models in the form of visual declarative extraction programs, based on a small number of user-specified examples. It lets users develop high-quality NLP algorithms that are transparent and explainable with minimal training. For example, SEER can identify education and employment history from bibliographies, as well as identify natural habitats for endangered species. So, government agencies could potentially better understand how to educate citizens for in-demand jobs, while other agencies can better protect the natural resources that preserve wildlife.
Next, my team and I hope to open SEER to IBM developers, and help them with everything from mobile apps, to the next Watson API.
EB: Going from small-town Guizhou to Silicon Valley is amazing – what do you hope others can learn from your journey?
YL: I’m proud that I followed my dream all the way to the world’s technology mecca, Silicon Valley, where I can share my passion to improve diversity in the STEM field. My journey has given me the opportunity to actively mentor women and under-represented minorities, thanks to programs such as the internship program Leading to Africa.
My studies and work at IBM have motivated me to regularly organize technical talks and activities for the Women’s Network of Northern California, a community for IBM technical women. And I am privileged to have served on the MentorNet Mentor-Protégé Council since 2013 and, beginning this year, the BSCS External Advisory Board of San Jose State University.
EB: What would your advice be to others looking to embark on a similar journey – small-town people with big aspirations?
YL: My message through all of these activities is to encourage people to pursue the opportunities they are most passionate about, even if they are following a unique path. I’m proof that you can combine two seemingly disparate disciplines. Or become proficient in an area that would seem out-of-reach given your background. In short, it’s not where you’re from that determines your future. It’s you.