Motivation
For what purpose was the dataset created?
We created TCAB to facilitate future research on analyzing, understanding, detecting, and labeling adversarial attacks for text classifiers.
Who created the dataset (e.g., which team, research group) and on behalf of which entity (e.g., company, institution, organization)?
TCAB was created by the REACT-NLP team, consisting of researchers at the University of Oregon and University of California Irvine. The project was led by Daniel Lowd and Sameer Singh, and the dataset was constructed by Kalyani Asthana, Zhouhang Xie, Wencong You, Adam Noack, and Jonathan Brophy.
Who funded the creation of the dataset?
This work was supported by the Defense Advanced Research Projects Agency (DARPA), agreement number HR00112090135.
Composition
What do the instances that comprise the dataset represent (e.g., documents, photos, people, countries)?
The dataset contains text only.
How many instances are there in total (of each type, if appropriate)?
In total, TCAB contains 2,414,594 instances: 1,504,607 successful adversarial instances, and 909,987 clean (unperturbed) examples.
Does the dataset contain all possible instances or is it a sample (not necessarily random) of instances from a larger set?
We attack the test set of each domain dataset; thus, the adversarial examples in TCAB are derived from only a portion (the test set) of each original domain dataset.
What data does each instance consist of?
Each instance contains the original text, perturbed text, original ground-truth label, and whether the instance is clean or perturbed. For perturbed instances, we also provide information about the attack such as the method used, toolchain, maximum number of queries, time taken per attack, and more. Please refer to our dataset page for a full description of all attributes.
Is there a label or target associated with each instance?
We provide labels indicating (1) whether the text is an adversarial instance and (2) what algorithm created the corresponding adversarial instance, which could then be used as labels for detecting adversarial attacks and identifying the attacking algorithm.
Is any information missing from individual instances?
There is no missing information from our instances.
Are relationships between individual instances made explicit (e.g., users' movie ratings, social network links)?
There are no explicit relationships between instances. However, our metadata supports grouping operations such as collecting instances perturbed from the same original text and instances attacked by the same attacking algorithm.
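For example, once the released CSVs are loaded into a pandas DataFrame, such groupings are one-liners. This is only a sketch: the file and column names below ("tcab_attacks.csv", "original_text", "attack_name") are illustrative placeholders; see the dataset page for the exact schema.

```python
import pandas as pd

# Illustrative file and column names; consult the dataset page for the schema.
df = pd.read_csv("tcab_attacks.csv")

# All perturbations that originate from the same clean text.
per_source = df.groupby("original_text")

# All instances produced by the same attacking algorithm.
per_attack = df.groupby("attack_name")

# Number of instances generated by each attack, largest first.
print(per_attack.size().sort_values(ascending=False))
```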
Are there recommended data splits (e.g., training, development/validation, testing)?
We randomly split the original dataset into training, validation, and test sets. To avoid potential information leakage, all instances that originate from the same original text (including the unperturbed text itself) are assigned to the same split.
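The sketch below shows one way to build such a leakage-free split with scikit-learn's GroupShuffleSplit, grouping on the source text. It is illustrative only (not our released split code), and the "original_text" column name is an assumption.

```python
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

df = pd.read_csv("tcab_attacks.csv")  # illustrative file name

# Keep every instance derived from the same source text in a single split,
# so perturbations of a test example can never leak into training.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(splitter.split(df, groups=df["original_text"]))
train_df, test_df = df.iloc[train_idx], df.iloc[test_idx]
```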
Are there any errors, sources of noise, or redundancies in the dataset?
Yes. Adversarial attacks are noisy by nature, and there are cases where human judges believe an attack flips the true label of the text being perturbed. We present human evaluation results to help researchers gauge how often this occurs in our dataset.
Is the dataset self-contained, or does it link to or otherwise rely on external resources (e.g., websites, tweets, other datasets)?
The dataset is self-contained and does not have external dependencies.
Does the dataset contain data that might be considered confidential (e.g., data that is protected by legal privilege or by doctor–patient confidentiality, data that includes the content of individuals’ non-public communications)?
The dataset does not contain confidential information.
Does the dataset contain data that, if viewed directly, might be offensive, insulting, threatening, or might otherwise cause anxiety?
Some instances in our dataset are derived from hate speech datasets and thus may contain offensive content.
Does the dataset relate to people?
The dataset is not related to people.
Collection Process
How was the data associated with each instance acquired?
Our dataset is derived from publicly available datasets.
What mechanisms or procedures were used to collect the data (e.g., hardware apparatus or sensor, manual human curation, software program, software API)?
We use TextAttack and OpenAttack, open-source toolchains that provide fully automated, off-the-shelf attacks, to generate adversarial examples for TCAB. We ensure that every released adversarial instance successfully flips the prediction of the victim model, and we conduct an additional human evaluation to check the quality of the generated adversarial instances.
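As a rough illustration of how these toolchains are used, the sketch below runs one off-the-shelf TextAttack recipe against a fine-tuned classifier; the recipe, victim model, and dataset are placeholders rather than the exact TCAB configuration.

```python
import transformers
from textattack import Attacker, AttackArgs
from textattack.attack_recipes import TextFoolerJin2019
from textattack.datasets import HuggingFaceDataset
from textattack.models.wrappers import HuggingFaceModelWrapper

# Placeholder victim model; TCAB attacks its own fine-tuned target models.
name = "textattack/bert-base-uncased-SST-2"
model = transformers.AutoModelForSequenceClassification.from_pretrained(name)
tokenizer = transformers.AutoTokenizer.from_pretrained(name)
victim = HuggingFaceModelWrapper(model, tokenizer)

# Build an off-the-shelf attack recipe and run it on a few test examples.
attack = TextFoolerJin2019.build(victim)
dataset = HuggingFaceDataset("glue", "sst2", split="validation")
attacker = Attacker(attack, dataset, AttackArgs(num_examples=10))
results = attacker.attack_dataset()  # only successful label flips are kept in TCAB
```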
If the dataset is a sample from a larger set, what was the sampling strategy (e.g., deterministic, probabilistic with specific sampling probabilities)?
We randomly split each domain dataset into training, validation, and test sets. We then train different target models using the training and validation sets, and generate adversarial instances by attacking those target models on the test set.
Who was involved in the data collection process (e.g., students, crowdworkers, contractors) and how were they compensated (e.g., how much were crowdworkers paid)?
Our data collection process is automated. First, we train multiple target classifiers for each domain dataset; we then attack those target models using publicly available attack algorithms to generate adversarial instances.
Over what timeframe was the data collected?
We generated attacks and curated instances for TCAB over a period of 12 months, from November 1, 2020 to November 1, 2021.
Were any ethical review processes conducted (e.g., by an institutional review board)?
No.
Does the dataset relate to people?
The dataset is not related to people.
Preprocessing/Cleaning/Labeling
Was any preprocessing/cleaning/labeling of the data done (e.g., discretization or bucketing, tokenization, part-of-speech tagging, SIFT feature extraction, removal of instances, processing of missing values)?
No, we directly use the text instances from each domain dataset.
Uses
Has the dataset been used for any tasks already?
We present benchmarks for attack detection and attack labeling.
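As a hedged sketch of the detection task only, the snippet below trains a simple bag-of-words detector to separate clean from attacked text; the file and column names ("text", "is_attacked") are illustrative assumptions, and this is not one of the benchmark models we report.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Illustrative file and column names; see the dataset page for the schema.
train = pd.read_csv("tcab_train.csv")
test = pd.read_csv("tcab_test.csv")

# TF-IDF features over word unigrams/bigrams + a logistic regression detector.
detector = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                         LogisticRegression(max_iter=1000))
detector.fit(train["text"], train["is_attacked"])
print("detection accuracy:", detector.score(test["text"], test["is_attacked"]))
```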
Is there a repository that links to any or all papers or systems that use the dataset?
Xie et al. (2021) use this dataset for inferring attributes of different attacking algorithms.
What (other) tasks could the dataset be used for?
TCAB can also be used for attack localization, attack target labeling, and attack characterization.
Is there anything about the composition of the dataset or the way it was collected and preprocessed/cleaned/labeled that might impact future uses?
Currently, all instances are in English, and thus might yield observations that are specific to English. However, we publicly release our code for generating adversarial attacks and training victim models, enabling the extension of TCAB to other languages.
Are there tasks for which the dataset should not be used?
No.
Distribution
Will the dataset be distributed to third parties outside of the entity (e.g., company, institution, organization) on behalf of which the dataset was created?
The dataset is publicly available.
How will the dataset be distributed (e.g., tarball on website, API, GitHub)? Does the dataset have a digital object identifier (DOI)?
All instances in TCAB are publicly available as CSV files on Zenodo. The DOI of the dataset is 10.5281/zenodo.7226519.
When will the dataset be distributed?
The dataset is currently available for download.
Will the dataset be distributed under a copyright or other intellectual property (IP) license, and/or under applicable terms of use (ToU)?
The dataset is distributed with a Creative Commons Attribution 4.0 International license.
Have any third parties imposed IP-based or other restrictions on the data associated with the instances?
No.
Do any export controls or other regulatory restrictions apply to the dataset or to individual instances?
No.
Maintenance
Who will be supporting/hosting/maintaining the dataset?
The REACT-NLP team will continue to support the TCAB dataset.
How can the owner/curator/manager of the dataset be contacted (e.g., email address)?
Please direct any questions to Sameer Singh (sameer@uci.edu) or Daniel Lowd (lowd@cs.uoregon.edu).
Is there an erratum?
We will provide an erratum on the GitHub repository if errors are found in the future.
Will the dataset be updated (e.g., to correct labeling errors, add new instances, delete instances)?
If errors need to be corrected, we will publish a new version of the dataset and record those changes on the main project website.
If the dataset relates to people, are there applicable limits on the retention of the data associated with the instances (e.g., were individuals in question told that their data would be retained for a fixed period of time and then deleted)?
Our dataset is not related to people.
Will older versions of the dataset continue to be supported/hosted/maintained?
If TCAB is updated in the future, we will ensure backward compatibility by time-stamping different versions of the dataset.
If others want to extend/augment/build on/contribute to the dataset, is there a mechanism for them to do so?
TCAB is designed to be extended with new datasets, attack instances, and target models. Code and instructions for extending TCAB are available in the TCAB Generation repository.