Ontology Enrichment

A central focus of our prior work has been the collection and analysis of contemporary and historical constitutions, which we have organized as the Comparative Constitutions Project (CCP) (Elkins and Ginsburg 2021). Using the CCP ontology, we have identified a core set of 330 topics in each written constitution of independent countries from 1789 to the present. This spatial-temporal identification of constitutional elements has since facilitated at least 200 published analyses of the origins and consequences of these institutions. Yet, we do not see CCP as the authoritative ontology in this space. If anything, we see CCP as a preliminary reference set for the integration of other ontologies in comparative law.

Our ontology enrichment project thus refines the CCP ontology to expand its conceptual scope and to optimize the ontology for use with NLP tools. Ontologies that express topics in ways that rely on analysts’ implicit knowledge are not best suited for use with NLP tools. For one thing, the definitions of two distinct concepts in an ontology may include similar words if they relate to the same context, leaving concepts insufficiently disambiguated and unable to retrieve distinct passages in a corpus. Topic refinement is thus needed to increase the resolution of topics, rewording them to increase the semantic distance between topics and reduce the number of common sentences they match. Ontology enrichment is also needed to expand the conceptual scope of our ontology to capture evolving constitutional content and diverse conceptualizations in the constitutional domain.

Our approach takes several steps to refine the CCP ontology for use with our NLP tools and to increase its topical scope. This will:

Refinement of the CCP ontology is a requisite step for our follow-on projects. Yet, it also allows us to answer several research questions related to ontology enrichment, namely: Which topic structures are most compatible with our NLP tools? How can we measure improved CCP topic separation and topic matching? How can ontology comparison improve ontology accuracy, completeness, and alignment across a domain?

Papers

Expanding Your Vocabulary: Topic Integration Combining Automation and Human Expertise

Topic discovery and integration are essential to keep vocabularies—the set of concepts underlying a textual corpus—relevant. New topics come from a number of sources, including recent additions to a corpus, reconceptualizations of a domain, or concepts from other domains. We present a methodology that combines automation and human expertise to assess candidate topics. To develop the methodology we used a vocabulary created by the Comparative Constitutions Project (CCP) that tracks more than 330 topics within a corpus of national constitutions. The methodology enables users to formulate candidate topics that comprise grammatically well-formed labels and descriptions. A sentence-level semantic similarity model is used to search for constitution sections that are similar in meaning to a topic, provided the topic is not similar in meaning to any existing topic. Search results are output as a CSV file in which users identify the constitution sections that best match the meaning of the topic. Domain experts collaborate on the design of topics by iteratively refining the topic formulation until it captures all applicable sections. A panel of scholars then decides whether to accept topics into the CCP vocabulary, after which matching constitution sections are automatically tagged with accepted topics. Using our methodology, several new topics have been added to the CCP vocabulary, some of which we use here to illustrate our process and results. The methodology provides researchers with a systematic way to expand existing vocabularies.

Aligning Social Science Ontologies via Multipartite Graphs and Text Embeddings

We propose a methodology to align social science ontologies, which are often loosely connected yet not fully cross-compatible. In our framework, multiple ontologies are represented via one multipartite graph, where vertices are topics and edges measure the semantic similarity between topics from different ontologies. This abstraction allows us to make the problem of alignment tractable by generating graph-level statistics and visualizations. In an illustration, we align two ontologies for topics in constitutions and peace agreements. Our methodology allows us to identify where existing ontologies align or differ, as well as to assess how potential new topics might fit in them. Future steps include public-facing interactive tools and direct connections with in-ontology and out-of-ontology text corpora.

Tools

We have developed new interactive tools we are using in our work and preparing for public release in the next year. These include the following tools related to ontology and corpus comparison.

Ontology Comparison Tool

This tool allows users to explore how their topics relate to topics in other ontologies. It identifies topics in two ontologies that are most similar, producing clusters of similar topics. This allows users to see how others have conceptualized similar topics and decide whether they should reformulate their topics or add new topics to fully capture the concept.

Ontology Summarization Tool

This tool allows users to explore ontologies to identify topics that are related to their corpus but missing from their own ontology. It matches their and other ontologies to their corpus, identifying whether each section of the corpus best matches their or another ontology. This reveals corpus text that is not yet captured by their ontology—and prospective new topics to do so.

Ontology Enrichment Tool

This tool allows users to leverage both automation and human insight in integrating new topics into their ontologies. It allows users to test varied topic formulations to ensure a new topic’s formulation is distinct from existing topics in the ontology and captures most appearances of the topic in the corpus.

Segments-as-Topic Matching Tool

This tool allows users to create new topics from clusters of semantically similar segments in the corpus they seek to classify. It allows users to gather an initial set of segments that form the conceptual baseline of the topic and leverage those segments to find additional relevant segments in the corpus.

Documentation

Topic Refinement Process

This documentation describes our rationale for refining existing topics in the Comparative Constitutions Project (CCP) ontology. It also describes our initial revisions of the CCP ontology for use with NLP applications, results of those revisions, and ongoing revisions to improve topic matching and classification. View our topic refinement process.

Ontology Enrichment Workflow

This documentation describes our workflow for combining automation and human expertise to systematically discover new topics and assess existing topics in an ontology. Topic discovery includes generating new candidate topics, selecting a candidate topic, designing and classifying the topic, assessing if the topic should be accepted or rejected or re-designed, applying a classification, and documenting the decision. Topic assessment includes selecting an existing topic, classifying the topic, assessing if the topic should be maintained or retired or re-designed, and documenting the decision. View our ontology enrichment workflow.

Ontology Enrichment Methodology

This documentation describes our methodology for using the project’s ontology enrichment tool to add and revise topics in the CCP ontology. It defines how to select the search parameters, search the CCP ontology to compare a candidate topic to existing CCP topics, search national constitutions to view constitution sections that match the candidate topic, review results and revise the topic formulation to capture appropriate constitutional content, and document these decisions. View our ontology enrichment methodology.