Utility-Theoretic Ranking for Semi-Automated Text Classification

Day - Time: 20 May 2013, 11:00

Place: Area della Ricerca CNR di Pisa - Room: C-29

Speakers

Giacomo Berardi

Abstract

Suppose an organization needs to classify a set D of textual documents, and that D is too large to be classified manually, so that some form of automated text classification (TC) is the only viable option. Suppose also that the organization has strict accuracy standards, so that the effectiveness obtainable with state-of-the-art TC technology is not sufficient. In this case, the most plausible strategy is to classify D with an automatic classifier F and then have a human editor inspect the results, correcting misclassifications where appropriate. The human editor will obviously inspect only a subset D' of D, since otherwise there would be no point in an initial automated classification phase. We call this scenario Semi-Automated Text Classification (SATC). An automated system can support this process by ranking the automatically labelled documents so as to maximize the expected increase in effectiveness that derives from inspecting D'. An obvious strategy is to rank D so that the documents that F has classified with the lowest confidence come first. In this work we show that this strategy is suboptimal. We develop a new utility-theoretic ranking method based on the notion of inspection gain, defined as the improvement in classification effectiveness that would derive from inspecting and correcting a given automatically labelled document. We also propose a new effectiveness measure for SATC-oriented ranking methods, based on the expected reduction in classification error brought about by partially inspecting a list generated by a given ranking method.
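The following is a minimal Python sketch of the two ranking strategies contrasted above. It makes several assumptions: a single binary class, F1 as the effectiveness measure, and calibrated probabilities from F. The names (Doc, f1, confidence_ranking, utility_ranking) are hypothetical, and the gain estimate is a simplification, not necessarily the method presented in the talk. The baseline puts the least confident documents first. The utility-based variant scores each document by the probability that F misclassified it times the F1 improvement that correcting it would bring.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    doc_id: str
    predicted: bool    # label assigned by the automatic classifier F
    p_positive: float  # F's probability that the document is positive (assumed calibrated)


def f1(tp: float, fp: float, fn: float) -> float:
    """F1 from (possibly fractional) contingency counts; 1.0 if there is nothing to find."""
    denom = 2 * tp + fp + fn
    return 1.0 if denom == 0 else 2 * tp / denom


def confidence_ranking(docs):
    """Baseline: documents whose probability is closest to 0.5 (lowest confidence) first."""
    return sorted(docs, key=lambda d: abs(d.p_positive - 0.5))


def utility_ranking(docs):
    """Rank by expected inspection gain = P(misclassified) * F1 gain if corrected."""
    # Estimate the contingency table from F's own probabilities (a simplifying assumption).
    tp = sum(d.p_positive for d in docs if d.predicted)
    fp = sum(1 - d.p_positive for d in docs if d.predicted)
    fn = sum(d.p_positive for d in docs if not d.predicted)
    base = f1(tp, fp, fn)
    # F1 gain from correcting one false positive / one false negative
    # (counts are clamped at zero so tiny collections do not go negative).
    gain_fp = f1(tp, max(fp - 1, 0.0), fn) - base
    gain_fn = f1(tp + 1, fp, max(fn - 1, 0.0)) - base

    def expected_gain(d: Doc) -> float:
        if d.predicted:  # a positive prediction can only be a false positive
            return (1 - d.p_positive) * gain_fp
        return d.p_positive * gain_fn  # a negative prediction can only be a false negative

    return sorted(docs, key=expected_gain, reverse=True)


if __name__ == "__main__":
    docs = [
        Doc("a", True, 0.55), Doc("b", False, 0.40),
        Doc("c", True, 0.95), Doc("d", True, 0.90),
        Doc("e", False, 0.05), Doc("f", False, 0.10),
    ]
    print([d.doc_id for d in confidence_ranking(docs)][:2])  # ['a', 'b']
    print([d.doc_id for d in utility_ranking(docs)][:2])     # ['b', 'a']
```

The baseline ranks document a first because its probability is closest to 0.5. The utility-based ranking puts b first because, in this example, correcting a false negative raises F1 more than correcting a false positive. This is one way in which a pure confidence ordering can be suboptimal.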
We report experimental results showing that, according to the proposed measure, our ranking method can achieve substantially higher expected reductions in classification error than the baseline method above.
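The idea of scoring a ranking by the expected error reduction from partial inspection can be illustrated with a simplified Python sketch. It assumes that inspecting a misclassified document always corrects it and that error is the number of misclassified documents. It also assumes that the editor stops after n documents with weight persistence**(n-1), a geometric "patience" distribution truncated at the length of the list. These choices and the function names are hypothetical and may differ from the measure presented in the talk.

```python
def error_reduction_curve(ranking, predicted, truth):
    """Fraction of F's errors removed after inspecting and correcting the top-n documents.

    Element n-1 of the returned list corresponds to inspecting the first n documents.
    """
    is_error = [predicted[d] != truth[d] for d in ranking]
    total = sum(is_error)
    if total == 0:
        raise ValueError("F made no errors: error reduction is undefined")
    curve, fixed = [], 0
    for err in is_error:
        fixed += err  # inspecting a misclassified document corrects it
        curve.append(fixed / total)
    return curve


def expected_error_reduction(ranking, predicted, truth, persistence=0.9):
    """Average the curve over inspection depths n, weighting depth n by persistence**(n-1).

    The geometric weighting, truncated at len(ranking) and renormalized, is an
    illustrative assumption about how far down the list the editor is likely to read.
    """
    if not 0 < persistence <= 1:
        raise ValueError("persistence must be in (0, 1]")
    curve = error_reduction_curve(ranking, predicted, truth)
    weights = [persistence ** n for n in range(len(curve))]
    return sum(w * r for w, r in zip(weights, curve)) / sum(weights)


if __name__ == "__main__":
    predicted = {"x": True, "y": False, "z": True}
    truth = {"x": False, "y": False, "z": True}  # only "x" is misclassified
    print(expected_error_reduction(["x", "y", "z"], predicted, truth, 0.5))  # 1.0
    print(expected_error_reduction(["y", "z", "x"], predicted, truth, 0.5))  # about 0.142857 (1/7)
```

A ranking that puts the misclassified documents near the top scores close to 1. A ranking that leaves them at the bottom is penalized, because the editor is assumed to be less and less likely to reach them.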

NOTE: This seminar is the third in a series of six seminars given by the winners of the "Young researchers ISTI 2013" prize. Giacomo Berardi placed first in the PhD student category.

Latest Announcements

Seminars

A general framework for distributed approximate similarity search with arbitrary distances

2024-09-26

While many similarity search algorithms are specifically adapted to metric distances, they are unsuitable for alternatives like the cosine distance, which has gained popularity, particularly with embeddings and text mining. To address this issue, we propose GDASC (General Distri...

Room: C-29

Seminars

Model-Driven Engineering Meets Model-Based Testing

2024-09-17

In this talk, I will focus on a connection between stable-failures refinement and the ioco conformance relation. Both behavioural relations underlie methodologies that have gained traction in industry: stable-failures refinement is used in several commercial Model-Driven Engin...

Room: Faedo (C-29)

Seminars

Explainability in deep learning models applied to spatio-temporal problems

2024-09-13

Artificial Intelligence (AI) is transforming society, affecting everything from industry to decision making, and concerns about its transparency have increased. Explainable Artificial Intelligence (XAI) is crucial to addressing this problem, allowing us to obtain a better understand...

Room: C-29

Seminars

Making 5G Networks Reliable for Next-generation Applications using AI

2024-05-27

The emergence of 5G technology marks a significant milestone in developing telecommunication networks, enabling exciting new applications such as augmented reality and self-driving vehicles. However, these improvements bring an increased management complexity and a special con...

Room: C-29

Seminars

Giovani in un'ora - Seminar series - Part five

2024-11-27

Giulio Del Corso - "Generating a physically accurate cardiac MRI: a story of (interesting) failures and (justifiable) numerical shortcuts". Abstract: "The generation of physically accurate cardiac-MRI video sequences is a challenging topic that requires a combination of modern g...

Room: C-29

Seminars

Giovani in un'ora - Seminar series - Part four

2024-11-20

Luca Ciampi - "Mind the Prompt: A Novel Benchmark for Prompt-based Class-Agnostic Counting". Abstract: "Object counting estimates the number of objects in images or video frames. Studies reveal that the human brain employs two distinct methods for counting objects owing to the s...

Room: C-29

Seminars

Giovani in un'ora - Seminar series - Part three

2024-10-16

Gabriele Lagani - "Hebbian learning algorithms for deep neural networks: explorations and outlooks". Abstract: Deep learning systems have achieved outstanding results in various AI tasks. However, such systems suffer from a number of limitations, for example in terms of energy an...

Room: C-29

Seminars

Giovani in un'ora - Seminar series - Part two

2024-10-09

Saira Bano - "From Complexity to Clarity: Enhancing Cross-Modal Knowledge Distillation via Multimodal Teacher Ensembles". Abstract: Traditional knowledge distillation (KD) typically uses a large, complex teacher model, often trained to a single modality, to transfer knowledge to...

Room: C-29

Seminars

Patient Interaction – for well-being, productivity and sustainability

2024-10-08

We live in a world of instant results and fleeting gratification. In HCI no less: the design principles for direct manipulation require immediate feedback and, in the case of graphical actions, sub-second responses. In addition, computers expect us to give them our undivided atten...

Room: Faedo (C-29)

Seminars

Giovani in un'ora - Seminar series - Part one

2024-10-02

Ali Reza Omrani - "Machine Learning to Measure Vocal Stereotypy: An Extension". Abstract: Repeated measurement of behavior is a process central to behavior analysis, but its implementation occasionally requires hiring observers dedicated exclusively to data collection, which may...

Room: C-29