Building a Large-scale, Accurate and Fresh Knowledge Graph
One of the recurring criticisms of the current state of Artificial Intelligence (AI) is the deficiency caused by the lack of background knowledge. A knowledge graph, where entities are represented as nodes and relations among entities are represented as directed edges, can significantly close this gap. Building and maintaining a large-scale, accurate and fresh knowledge graph, however, is a significant endeavor. From the ontological systems required to map the world’s knowledge, to the information extraction systems needed to collect accurate and high-coverage facts from both structured and unstructured sources, and from the quality assurance process necessary for weeding out errors and inconsistencies, to the frequent updates demanded by a fresh knowledge graph, the tasks present numerous challenges and still constitute many open research areas. In this tutorial, we first provide an overview of Microsoft’s Knowledge Graph. We then survey recent work addressing the challenges encountered in all phases of knowledge graph construction. Finally, we conclude with a view towards the future, where knowledge graphs can contribute even more significantly in several application domains.
Our goal is to give the audience a comprehensive understanding of the challenges faced in constructing a large-scale, accurate and fresh knowledge graph, to survey the state-of-the-art solutions proposed in the field, and hopefully to spark excitement about an endeavor that leads to improvements in AI applications.
The audience is required to have a basic understanding of knowledge graphs (knowledge bases) and intermediate knowledge of natural language processing and machine learning.
ICC Capital Suite Room 10 (Level 3), ExCel London
13:00-17:00, August 19, 2018 (Sunday)
Affiliation: Knowledge Graph, Microsoft
Email: yuga@microsoft.com
Bio: Yuqing Gao, Ph.D., is an IEEE Fellow for her distinguished contribution to speech recognition, speech-to-speech translation and natural language understanding, and a Partner Group Engineering Manager at Microsoft. She leads the world-wide team for Microsoft’s Knowledge Graph. Her work has been featured by MIT Technology Review Magazine, Time Magazine, CNN, ABC, BBC and many other major media outlets. She has published over 120 papers and holds 35 issued patents. Prior to joining Microsoft, she worked as a research staff member and later as a Distinguished Engineer at the IBM TJ Watson Research Center, where she created IBM Watson for Finance and IBM Mastor (a speech-to-speech translator), which won a number of DARPA evaluations and was deployed in the real world.
Affiliation: Knowledge Graph, Microsoft
Email: jilian@microsoft.com
Bio: Jisheng Liang, Ph.D., is an applied scientist and a Partner Software Engineering Manager at Microsoft. He helped create Microsoft’s Knowledge Graph. Before Microsoft, he was the Chief Scientist and one of the founding members of Evri, an NLP-based semantic search startup. He received his Ph.D. from the University of Washington in the areas of pattern recognition and machine learning.
Affiliation: Knowledge Graph, Microsoft
Email: diha@microsoft.com
Bio: Benjamin Han is a Principal Scientist in the Knowledge Graph group at Microsoft. His research has focused on multilingual, multi-domain information extraction (mention detection, coreference resolution, relation extraction and slot-filler extraction) and other NLP-related technologies. As part of the IBM teams, he was a major contributor to the top-performing systems in the ACE, GALE, TAC-KBP and other evaluations. Prior to joining Microsoft, he worked as a research staff member in the Multilingual NLP Technologies group at the IBM TJ Watson Research Center.
Affiliation: Knowledge Graph, Microsoft
Email: myakout@microsoft.com
Bio: Mohamed Yakout, Ph.D., is a Principal Scientist at Microsoft. His research involves improving data quality, web-scale data integration and building knowledge graphs. In particular, he focuses on scalable technologies for entity matching and resolution, and on ontological representation of structured data along with automatic techniques for schema inference, matching and mapping. Before Microsoft, Mohamed pursued his Ph.D. at Purdue University. He received the best dissertation award at the International Conference on Information Quality in 2012.
Affiliation: Knowledge Graph, Microsoft
Email: ahmedat@microsoft.com
Bio: Ahmed Mohamed is a Senior Applied Data Scientist in the Knowledge Graph team at Microsoft. He leads the team responsible for knowledge-based conversation understanding. Prior to this, Ahmed was the main developer who built many releases of the Bing Question Answering System over many years. Before that, Ahmed helped build a custom probabilistic graphical models platform for Bing's Document Understanding engine. Ahmed's semantic extractor models held the record in quality and coverage in the entire Bing index in 2012. Prior to Microsoft, Ahmed worked at a research startup focusing on image processing, and before that he finished his undergraduate and graduate studies, focusing on parallel algorithms and machine learning, at the Faculty of Engineering at Cairo University.
Benjamin Han (diha@microsoft.com)
Part I: Introduction
Part II: Acquiring Knowledge in the Wild
Part III: Building Knowledge Graph
Part IV: Serving Knowledge to the World
(c) 2018 Microsoft Corporation. All rights reserved. This document is provided "as-is." Information and views expressed in this document, including URL and other Internet Web site references, may change without notice. You bear the risk of using it.
Some examples are for illustration only and are fictitious. No real association is intended or inferred.
This document does not provide you with any legal rights to any intellectual property in any Microsoft product. You may copy and use this document for your internal, reference purposes.