The Digester

Researchers built Negativas, a spaCy-based tool that automates the search for and classification of three types of sentential negation with não in Brazilian Portuguese speech. They report 93% accuracy but note limits from missing prosody, ambiguity in double negation, and data imbalance.

Negativas is a Python prototype built with spaCy that automatically locates and labels three sentential negation patterns with não: pre-verbal (NEG1), double (NEG2), and post-verbal (NEG3).
The tool was tested on 22 interviews from the Deslocamentos 2020 sample of the Falares Sergipanos corpus and found 3,338 occurrences of não, of which 2,085 were classified as sentential negation.
Negativas achieved 93% overall accuracy and moderate agreement with the human annotations (Cohen's kappa = 0.58).
Human annotators showed moderate inter-annotator agreement (Fleiss' kappa = 0.57), and NEG1 (pre-verbal) accounted for about 91% of negation instances.
Key limitations include transcripts without punctuation or prosodic cues, frequent confusion around double negation, and class imbalance that inflates some performance metrics.
The code is openly available on GitHub, and the interview recordings can be requested from the hosting lab for research purposes.
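To make the three patterns concrete, here is a minimal Python sketch of a position-based heuristic. It is not Negativas' actual rule set: the function `classify_negation`, its clause-level input, and the rules themselves are hypothetical. It takes pre-tagged (word, Universal POS) pairs of the kind spaCy's Portuguese pipeline exposes via `token.text` and `token.pos_`, so it runs without downloading a model.

```python
from typing import List, Optional, Tuple

# A token as (word form, Universal POS tag), e.g. spaCy's token.text / token.pos_
Token = Tuple[str, str]


def classify_negation(clause: List[Token]) -> Optional[str]:
    """Label a clause NEG1, NEG2 or NEG3 from where 'não' sits (hypothetical heuristic)."""
    words = [w.lower() for w, _ in clause]
    # Index of the first verb or auxiliary in the clause, if any
    verb_idx = next((i for i, (_, pos) in enumerate(clause) if pos in ("VERB", "AUX")), None)
    if verb_idx is None:
        return None  # no verb, e.g. a bare answer "não": not sentential negation here
    pre_verbal = "não" in words[:verb_idx]
    # Post-verbal negation is taken to be a clause-final "não" after the verb
    post_verbal = verb_idx < len(words) - 1 and words[-1] == "não"
    if pre_verbal and post_verbal:
        return "NEG2"  # não V ... não
    if pre_verbal:
        return "NEG1"  # não V
    if post_verbal:
        return "NEG3"  # V ... não
    return None


if __name__ == "__main__":
    print(classify_negation([("eu", "PRON"), ("não", "ADV"), ("sei", "VERB")]))  # NEG1
    print(classify_negation([("não", "ADV"), ("sei", "VERB"), ("não", "ADV")]))  # NEG2
    print(classify_negation([("sei", "VERB"), ("não", "ADV")]))  # NEG3
```

Even this toy rule shows one plausible source of the double-negation trouble: an unpunctuated "eu não sei não" is labeled NEG2 whether or not the final não was really a separate answer particle, because the transcript carries no punctuation or prosody to tell the two apart.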
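The two agreement figures rest on different statistics: Cohen's kappa compares two label sequences (tool vs. human), while Fleiss' kappa extends chance-corrected agreement to several annotators. Both have the form (p_o - p_e) / (1 - p_e), observed agreement corrected for the agreement expected by chance. The pure-Python sketch below computes both; the toy labels are invented for illustration and are not the study's annotations, and the Fleiss version assumes every item was labeled by the same number of annotators.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Chance-corrected agreement between two label sequences of equal length."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n  # observed agreement
    ca, cb = Counter(a), Counter(b)
    # Expected agreement if both raters labeled independently with their own label frequencies
    p_e = sum(ca[k] * cb[k] for k in ca.keys() | cb.keys()) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa undefined: both raters used one identical label throughout")
    return (p_o - p_e) / (1 - p_e)


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """counts[i][j] = number of annotators who put item i in category j."""
    if not counts:
        raise ValueError("need at least one item")
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("every item needs the same number (>= 2) of annotators")
    n_items, n_cats = len(counts), len(counts[0])
    # Mean per-item agreement: share of annotator pairs that agree on each item
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1)) for row in counts
    ) / n_items
    # Chance agreement from the overall category proportions
    totals = [sum(row[j] for row in counts) for j in range(n_cats)]
    p_e = sum((t / (n_items * n_raters)) ** 2 for t in totals)
    return (p_bar - p_e) / (1 - p_e)


if __name__ == "__main__":
    tool = ["NEG1", "NEG1", "NEG1", "NEG2", "NEG3", "NEG1"]
    human = ["NEG1", "NEG1", "NEG2", "NEG2", "NEG3", "NEG1"]
    print(round(cohens_kappa(tool, human), 3))  # 0.714
    # 5 items, 3 annotators, columns = [NEG1, NEG2, NEG3]
    votes = [[3, 0, 0], [3, 0, 0], [2, 1, 0], [0, 3, 0], [1, 0, 2]]
    print(round(fleiss_kappa(votes), 3))  # 0.516
```

Both values reported for the study (0.58 and 0.57) fall in the 0.41 to 0.60 band that the widely used Landis and Koch scale labels "moderate".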
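Why the imbalance matters: with NEG1 making up about 91% of negations, a classifier that never predicts NEG2 or NEG3 can still post high accuracy. The sketch below uses hypothetical counts chosen only to mirror that share (91 NEG1, 6 NEG2, 3 NEG3; not the study's data) and compares accuracy with macro-averaged recall for a baseline that always predicts NEG1.

```python
from typing import Dict, List


def accuracy(gold: List[str], pred: List[str]) -> float:
    """Share of items where the prediction matches the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def per_class_recall(gold: List[str], pred: List[str]) -> Dict[str, float]:
    """For each gold label, the share of its items that were predicted correctly."""
    recall = {}
    for label in sorted(set(gold)):
        idx = [i for i, g in enumerate(gold) if g == label]
        recall[label] = sum(pred[i] == label for i in idx) / len(idx)
    return recall


# Hypothetical gold labels mirroring a ~91% NEG1 share (not the study's data)
gold = ["NEG1"] * 91 + ["NEG2"] * 6 + ["NEG3"] * 3
pred = ["NEG1"] * len(gold)  # majority-class baseline: always predicts NEG1

acc = accuracy(gold, pred)
rec = per_class_recall(gold, pred)
macro_recall = sum(rec.values()) / len(rec)

print(acc)  # 0.91
print(rec)  # {'NEG1': 1.0, 'NEG2': 0.0, 'NEG3': 0.0}
print(round(macro_recall, 3))  # 0.333
```

Per-class and macro-averaged scores expose what the headline accuracy hides here, which is why they are a common companion to accuracy on skewed label distributions.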

Negativas: NLP tool classifies 'não' negation in Brazilian Portuguese speech with 93% accuracy

Sources