This research project addressed the pressing issue of toxic language in online conversations through the development and evaluation of multi-label classification models. Using the Kaggle competition dataset, the study built classifiers based on Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) architectures in order to compare their performance and validate their accuracy.
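The report does not fix a specific framework or set of hyperparameters, so the following is only a minimal illustrative sketch of an LSTM-based multi-label classifier in Keras, assuming six binary toxicity labels (as in the Jigsaw Toxic Comment Classification Challenge), an assumed vocabulary size, and an assumed maximum sequence length; a CNN variant would swap the recurrent layer for Conv1D and pooling layers.

```python
# Illustrative multi-label LSTM toxicity classifier (hypothetical hyperparameters).
import tensorflow as tf
from tensorflow.keras import layers, models

VOCAB_SIZE = 20_000   # assumed vocabulary size
NUM_LABELS = 6        # assumed labels: toxic, severe_toxic, obscene, threat, insult, identity_hate

model = models.Sequential([
    layers.Embedding(VOCAB_SIZE, 128),
    layers.Bidirectional(layers.LSTM(64)),
    layers.Dropout(0.3),
    layers.Dense(64, activation="relu"),
    # Sigmoid (not softmax) so each label is predicted independently.
    layers.Dense(NUM_LABELS, activation="sigmoid"),
])

model.compile(
    optimizer="adam",
    loss="binary_crossentropy",  # per-label binary loss suits multi-label outputs
    metrics=[tf.keras.metrics.AUC(name="roc_auc", multi_label=True)],
)

# X_train: integer-encoded, padded comment sequences; y_train: binary label matrix.
# model.fit(X_train, y_train, validation_split=0.1, epochs=3, batch_size=128)
```

A CNN counterpart would replace the `Bidirectional(LSTM(...))` layer with, for example, `Conv1D` followed by `GlobalMaxPooling1D`, keeping the same sigmoid output head and loss.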
The CNN and LSTM models demonstrated robust performance, achieving high label accuracy and ROC AUC values that confirm their effectiveness in classifying toxic language. Evaluation extended beyond the familiar Kaggle dataset to an entirely unseen dataset, demonstrating the models' generalisability and their potential for real-world application. Both models handled multi-label and single-label classification of unfamiliar data, showcasing their versatility in toxicity detection.
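As a hedged illustration of how label accuracy and ROC AUC might be computed for such models (the report's exact evaluation code and variable names are not given, so `model`, `X_unseen`, and `y_unseen` below are assumptions), one could use scikit-learn as follows:

```python
# Illustrative evaluation of a trained multi-label model; names are assumptions.
import numpy as np
from sklearn.metrics import roc_auc_score

def evaluate_multilabel(y_true: np.ndarray, y_prob: np.ndarray, threshold: float = 0.5):
    """y_true: binary label matrix (n_samples, n_labels); y_prob: predicted probabilities."""
    y_pred = (y_prob >= threshold).astype(int)

    # Per-label accuracy: fraction of comments for which each label is predicted correctly.
    per_label_accuracy = (y_pred == y_true).mean(axis=0)

    # ROC AUC per label and macro-averaged across labels.
    per_label_auc = roc_auc_score(y_true, y_prob, average=None)
    macro_auc = roc_auc_score(y_true, y_prob, average="macro")

    return per_label_accuracy, per_label_auc, macro_auc

# The same routine can be applied to a previously unseen dataset to probe generalisability:
# acc, auc, macro = evaluate_multilabel(y_unseen, model.predict(X_unseen))
```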
Several avenues for future work were proposed: addressing potential misclassifications, exploring multilingual toxicity detection, optimising real-time detection in live conversations, and considering hybrid systems that combine human input with pre-trained language models. Attention was also directed towards detecting subtler forms of toxicity by accounting for platform-specific nuances and addressing bias patterns, so as to ensure transparency and consistency across results.
This project contributes valuable tools for improving toxicity detection accuracy and fostering a more respectful, inclusive digital environment. Both the CNN and LSTM models stand out as effective solutions with potential applications beyond the original dataset, underlining the value of continued work on refining algorithms for adaptive and fair toxicity mitigation across diverse online environments.