KDD-2018 Tutorial T39

Building a Large-scale, Accurate and Fresh Knowledge Graph



One of the recurring criticisms of the current state of Artificial Intelligence (AI) is the deficiency caused by its lack of background knowledge. A knowledge graph, where entities are represented as nodes and relations among entities are represented as directed edges, can significantly close this gap. Building and maintaining a large-scale, accurate and fresh knowledge graph, however, is a significant endeavor. From the ontological systems required to map the world’s knowledge, to the information extraction systems needed to collect accurate and high-coverage facts from both structured and unstructured sources, and from the quality assurance process necessary for weeding out errors and inconsistencies, to the frequent updates demanded by a fresh knowledge graph, these tasks present numerous challenges and remain open research areas. In this tutorial we first provide an overview of Microsoft’s Knowledge Graph. We then survey recent work addressing the challenges encountered in all phases of knowledge graph construction. Finally, we conclude with a view towards the future, where knowledge graphs can contribute even more significantly in several application domains.
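As a rough illustration of the data structure described above (entities as nodes, relations as directed edges), the following minimal Python sketch stores facts as subject-relation-object triples. The class name `KnowledgeGraph`, its methods, and the relation labels are hypothetical and chosen only for illustration; they are not taken from Microsoft's Knowledge Graph.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Toy knowledge graph: nodes are entities, directed edges are relations."""

    def __init__(self):
        # Hypothetical layout: subject entity -> set of (relation, object) pairs.
        self._out = defaultdict(set)

    def add(self, subject, relation, obj):
        """Add the directed edge subject --relation--> obj."""
        self._out[subject].add((relation, obj))

    def objects(self, subject, relation):
        """Return the entities reachable from subject via relation, sorted."""
        return sorted(o for r, o in self._out.get(subject, ()) if r == relation)

    def edges(self):
        """Return every (subject, relation, object) triple, sorted."""
        return sorted(
            (s, r, o) for s, pairs in self._out.items() for r, o in pairs
        )


# Illustrative facts only; the relation names are assumptions.
kg = KnowledgeGraph()
kg.add("Microsoft", "headquartered_in", "Redmond")
kg.add("Redmond", "located_in", "Washington")
```

Because edges are directed, a query such as `kg.objects("Redmond", "headquartered_in")` returns an empty list even though an edge with that relation points at Redmond.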

Target Audience

Our goal is to give the audience a comprehensive understanding of the challenges faced in constructing a large-scale, accurate and fresh knowledge graph, to survey the state-of-the-art solutions proposed in the field, and to spark excitement about an endeavor that can lead to improved AI applications.

The audience is required to have a basic understanding of knowledge graphs (knowledge bases) and intermediate knowledge of natural language processing and machine learning.


ICC Capital Suite Room 10 (Level 3), ExCel London


13:00-17:00, August 19, 2018 (Sunday)


Yuqing Gao

Affiliation: Knowledge Graph, Microsoft

Email: yuga@microsoft.com

Bio: Yuqing Gao, Ph.D., is an IEEE Fellow, recognized for her distinguished contributions to speech recognition, speech-to-speech translation and natural language understanding, and a Partner Group Engineering Manager at Microsoft. She leads the worldwide team for Microsoft’s Knowledge Graph. Her work has been featured by MIT Technology Review Magazine, Time Magazine, CNN, ABC, BBC and many other major media outlets. She has published over 120 papers and holds 35 issued patents. Prior to joining Microsoft, she worked as a research staff member and later Distinguished Engineer at the IBM TJ Watson Research Center, where she created IBM Watson for Finance and IBM Mastor (a speech-to-speech translator), which won a number of DARPA evaluations and was deployed in the real world.

Jisheng Liang

Affiliation: Knowledge Graph, Microsoft

Email: jilian@microsoft.com

Bio: Jisheng Liang, Ph.D., is an applied scientist and a Partner Software Engineering Manager at Microsoft. He helped create Microsoft’s Knowledge Graph. Before Microsoft, he was the Chief Scientist and one of the founding members of Evri, an NLP-based semantic search startup. He received his Ph.D. from the University of Washington in the areas of pattern recognition and machine learning.

Benjamin Han

Affiliation: Knowledge Graph, Microsoft

Email: diha@microsoft.com

Bio: Benjamin Han is a Principal Scientist in the Knowledge Graph group at Microsoft. His research has focused on multilingual, multi-domain information extraction (mention detection, coreference resolution, relation extraction and slot-filler extraction) and other NLP-related technologies. As a member of the IBM teams, he was a major contributor to the top-performing systems in the ACE, GALE, TAC-KBP and other evaluations. Prior to joining Microsoft, he worked as a research staff member in the Multilingual NLP Technologies group at the IBM TJ Watson Research Center.

Mohamed Yakout

Affiliation: Knowledge Graph, Microsoft

Email: myakout@microsoft.com

Bio: Mohamed Yakout, Ph.D., is a Principal Scientist at Microsoft. His research involves improving data quality, web-scale data integration and building knowledge graphs. In particular, he focuses on scalable technologies for entity matching and resolution, and on ontological representation of structured data along with automatic techniques for schema inference, matching and mapping. Before Microsoft, Mohamed pursued his Ph.D. at Purdue University. He received the best dissertation award at the International Conference on Information Quality in 2012.

Ahmed Mohamed

Affiliation: Knowledge Graph, Microsoft

Email: ahmedat@microsoft.com

Bio: Ahmed Mohamed is a Senior Applied Data Scientist in the Knowledge Graph team at Microsoft. He leads the team responsible for knowledge-based conversation understanding. Prior to this, Ahmed was for many years the main developer behind many releases of the Bing Question Answering System. Before that, he helped build a custom probabilistic graphical models platform for Bing's Document Understanding engine. Ahmed's semantic extractor models held the record for quality and coverage across the entire Bing index in 2012. Prior to Microsoft, Ahmed worked at a research startup focusing on image processing, and before that he completed his undergraduate and graduate studies, focusing on parallel algorithms and machine learning, at the Faculty of Engineering, Cairo University.

Corresponding Author

Benjamin Han (diha@microsoft.com)

Tutorial Outline

Part I: Introduction

Part II: Acquiring Knowledge in the Wild

Part III: Building Knowledge Graph

Part IV: Serving Knowledge to the World


Final version


(c) 2018 Microsoft Corporation. All rights reserved. This document is provided "as-is." Information and views expressed in this document, including URL and other Internet Web site references, may change without notice. You bear the risk of using it.

Some examples are for illustration only and are fictitious. No real association is intended or inferred.

This document does not provide you with any legal rights to any intellectual property in any Microsoft product. You may copy and use this document for your internal, reference purposes.