Car Preference Dataset



We set up an experiment on Amazon Mechanical Turk to collect real pairwise preferences from users. In this experiment, users are shown two cars and asked which one they prefer, based on the cars' attributes. We ran two experiments, with 10 and 20 cars respectively.

The preference questionnaire contains control questions: randomly selected preferences asked a second time with the two cars in reversed order. These control questions were included to measure the consistency of each user's answers. The dataset includes all the information collected in the experiments. Users of the data may choose to exclude responses based on the control questions, because inconsistent answers to them indicate that a user may have responded randomly.
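Because every preference lists the preferred car first, a control answer can be checked against the user's original answer for the same pair of cars. The sketch below assumes a hypothetical row layout of (user ID, preferred item, other item, control flag); the layout and the function name `count_consistent_controls` are illustrative, not part of the dataset. It counts consistent control answers per user, which is presumably what the per-user count of correctly answered control questions in the user files records.

```python
# Hypothetical row layout (not confirmed by the dataset description):
# (user_id, preferred_item, other_item, is_control)
Row = tuple[int, int, int, int]


def count_consistent_controls(rows: list[Row]) -> dict[int, int]:
    """Per user, count the control questions whose answer agrees with the
    user's original answer for the same pair of cars."""
    # Original (non-control) answers: (user, unordered pair) -> preferred car.
    original: dict[tuple[int, frozenset[int]], int] = {}
    for user, first, second, is_control in rows:
        if is_control == 0:
            original[(user, frozenset((first, second)))] = first

    consistent: dict[int, int] = {}
    for user, first, second, is_control in rows:
        if is_control == 1:
            consistent.setdefault(user, 0)
            # A consistent user prefers the same car in both orderings.
            if original.get((user, frozenset((first, second)))) == first:
                consistent[user] += 1
    return consistent


rows = [(1, 3, 7, 0), (1, 3, 7, 1), (2, 3, 7, 0), (2, 7, 3, 1)]
print(count_consistent_controls(rows))  # {1: 1, 2: 0}
```

Here user 1 chose car 3 in both orderings, while user 2 switched to car 7 on the reversed question.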

Data format:

The data collected in each experiment is formatted as three CSV files, representing user attributes, item attributes, and users' preferences over items, respectively. In each preference, the given user prefers the first item (indicated by its ID) to the second one. A flag records whether the preference was collected as a control question (1) or not (0).
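A minimal sketch for loading a preferences file, assuming the hypothetical column order user ID, first (preferred) item ID, second item ID, control flag. Check this order against the actual header before relying on it. The files do carry a header line, so `has_header` defaults to `True`. The `Preference` class and `read_preferences` function are illustrative names.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class Preference:
    user_id: int
    preferred: int    # ID of the car the user chose
    other: int        # ID of the car the user did not choose
    is_control: bool  # True if asked as a control question


def read_preferences(text: str, has_header: bool = True) -> list[Preference]:
    """Parse preference rows, assuming the hypothetical column order
    user ID, first (preferred) item ID, second item ID, control flag."""
    reader = csv.reader(io.StringIO(text))
    if has_header:
        next(reader, None)  # skip the header line
    prefs = []
    for row in reader:
        if not row:
            continue  # ignore blank lines
        user, first, second, control = (int(v) for v in row[:4])
        prefs.append(Preference(user, first, second, control == 1))
    return prefs


sample = "user,item1,item2,control\n1,3,7,0\n1,7,3,1\n"
for p in read_preferences(sample):
    print(p)
```

With a real file, pass the result of `open(path).read()` instead of the inline sample.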

User attributes:

We restricted our experiments to US users only, in order to collect more uniform preferences. The following information was collected from the participants and is included in the dataset:

  • Education: No response (0), High school (1), Bachelors (2), PhD (3)
  • Age: No response (0), Below 25 (1), Between 25-30 (2), Between 30-35 (3), Above 40 (4)
  • Gender: No response (0), Male (1), Female (2)
  • Region: No response (0), South (1), West (2), North East (3), Mid West (4)

For each user, the user data files also include an ID (in the first column) and the number of correctly answered control questions (in the last column).
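The code tables above can be turned into lookups for decoding a user row. This sketch takes the ID from the first column and the correct-control count from the last column, as stated above. It assumes the four attributes sit in between in the listed order (education, age, gender, region), which is a guess to verify against the file header. `decode_user` is a hypothetical helper.

```python
# Code tables copied from the user attribute list.
EDUCATION = {0: "No response", 1: "High school", 2: "Bachelors", 3: "PhD"}
AGE = {0: "No response", 1: "Below 25", 2: "Between 25-30",
       3: "Between 30-35", 4: "Above 40"}
GENDER = {0: "No response", 1: "Male", 2: "Female"}
REGION = {0: "No response", 1: "South", 2: "West", 3: "North East", 4: "Mid West"}


def decode_user(row: list[str]) -> dict[str, object]:
    """Decode one user row. ID first and correct-control count last are
    documented; the middle order (education, age, gender, region) is assumed."""
    values = [int(v) for v in row]
    user_id, controls_correct = values[0], values[-1]
    education, age, gender, region = values[1:5]
    return {
        "id": user_id,
        "education": EDUCATION[education],
        "age": AGE[age],
        "gender": GENDER[gender],
        "region": REGION[region],
        "controls_correct": controls_correct,
    }


print(decode_user(["12", "2", "1", "2", "3", "4"]))
```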

First Experiment

In this experiment, we used 10 items and generated all 45 possible pairwise preferences (one for each of the 10 × 9 / 2 pairs of cars). We ran this experiment in two rounds, collecting data from 40 and 20 users, respectively. Cars with the following attributes were presented to the participants in the first experiment:

Car Attributes

  • Body type: Sedan (1), SUV (2)
  • Transmission: Manual (1), Automatic (2)
  • Engine capacity: 2.5L, 3.5L, 4.5L, 5.5L, 6.2L
  • Fuel consumed: Hybrid (1), Non-Hybrid (2)

In addition to these attributes, each car is assigned an ID, which is used in the preference set and included in the items file.
Download: [Compressed zip] [User set] [Item set] [Preference set]
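Preference-learning models generally need items as numeric feature vectors. This is a minimal sketch for the first experiment's cars. It assumes the hypothetical column order ID, body type, transmission, engine capacity, fuel. It also assumes engine capacity is stored in litres, since no numeric codes are listed for it. Each two-valued attribute becomes a 0/1 indicator. `Car` and `parse_car` are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Car:
    item_id: int
    body_type: int        # Sedan (1), SUV (2)
    transmission: int     # Manual (1), Automatic (2)
    engine_litres: float  # one of 2.5, 3.5, 4.5, 5.5, 6.2 (assumed stored in litres)
    fuel: int             # Hybrid (1), Non-Hybrid (2)

    def features(self) -> list[float]:
        """Numeric feature vector: 0/1 indicators plus engine size in litres."""
        return [
            float(self.body_type == 2),     # 1.0 for SUV, 0.0 for sedan
            float(self.transmission == 2),  # 1.0 for automatic
            self.engine_litres,
            float(self.fuel == 1),          # 1.0 for hybrid
        ]


def parse_car(row: list[str]) -> Car:
    """Assumes the hypothetical column order ID, body, transmission, engine, fuel."""
    return Car(int(row[0]), int(row[1]), int(row[2]), float(row[3]), int(row[4]))


car = parse_car(["4", "2", "1", "3.5", "1"])
print(car.features())  # [1.0, 0.0, 3.5, 1.0]
```

For the second experiment, body type has three values, so a one-hot encoding would be the natural extension.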

Second Experiment

In the second experiment, we used 20 items, giving 190 possible pairwise preferences, and randomly generated 5 subsets of 38 preferences each, one of which was presented to each user. In contrast to the first experiment, only a sparse preference set was therefore collected from each user. Cars with the following attributes were presented to the participants in the second experiment:

Car Attributes

  • Body type: Sedan (1), SUV (2), Hatchback (3)
  • Transmission: Manual (1), Automatic (2)
  • Engine capacity: 2.5L, 3.5L, 4.5L, 5.5L, 6.2L
  • Fuel consumed: Hybrid (1), Non-Hybrid (2)
  • Engine/Transmission layout: All-wheel drive (AWD) (1), Front-wheel drive (FWD) (2)

In addition to these attributes, each car is assigned an ID, which is used in the preference set and included in the items file.
Download: [Compressed zip] [User set] [Item set] [Preference set]
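The numbers in the two experiments follow from counting unordered pairs. 10 cars give 10 × 9 / 2 = 45 pairs, and 20 cars give 20 × 19 / 2 = 190 = 5 × 38 pairs. The sketch below shows one way to produce equal random subsets by shuffling all pairs and cutting them into equal slices. It is an assumption for illustration, not necessarily how the authors drew their subsets. `split_pairs` and its `seed` parameter are hypothetical.

```python
import random
from itertools import combinations


def split_pairs(n_items: int, n_subsets: int, seed: int = 0) -> list[list[tuple[int, int]]]:
    """Shuffle all unordered pairs of items 1..n_items and cut them into
    n_subsets disjoint subsets of equal size."""
    pairs = list(combinations(range(1, n_items + 1), 2))
    if len(pairs) % n_subsets != 0:
        raise ValueError("pairs do not divide evenly into subsets")
    random.Random(seed).shuffle(pairs)
    size = len(pairs) // n_subsets
    return [pairs[i * size:(i + 1) * size] for i in range(n_subsets)]


subsets = split_pairs(20, 5)
print([len(s) for s in subsets])   # [38, 38, 38, 38, 38]
print(len(split_pairs(10, 1)[0]))  # 45
```

With a single subset, the same function reproduces the first experiment's full set of 45 preferences.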

Please note:

  • Some users left questions unanswered, so they have fewer preferences than expected (fewer than 45 in the first experiment and fewer than 38 in the second).

  • A simple Matlab (Octave) function is provided here that removes the control questions, as well as users who made more than an acceptable number of mistakes (given as the function's argument). Please make sure to remove the header lines from the users and preferences files before calling the function.
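The Matlab function itself is not reproduced here. As a rough Python analogue, not a port of it, this sketch drops all control rows. It also drops users whose wrong control answers exceed a threshold. Mistakes are counted as control questions asked minus the correct-control count from the last column of the users file. It also counts each user's answered (non-control) preferences, so you can spot users below 45 or 38. The row layout (user ID, preferred item, other item, control flag) and the function names are assumptions.

```python
from collections import Counter

# Hypothetical row layout: (user_id, preferred_item, other_item, is_control)
Row = tuple[int, int, int, int]


def filter_preferences(rows: list[Row], correct_controls: dict[int, int],
                       max_mistakes: int) -> list[Row]:
    """Drop all control rows, and drop every user whose wrong control answers
    (controls asked minus controls answered correctly) exceed max_mistakes.
    correct_controls maps user ID -> last column of the users file."""
    asked = Counter(user for user, _a, _b, is_control in rows if is_control == 1)
    users = {user for user, _a, _b, _c in rows}
    keep = {u for u in users
            if asked[u] - correct_controls.get(u, 0) <= max_mistakes}
    return [r for r in rows if r[3] == 0 and r[0] in keep]


def answered_counts(rows: list[Row]) -> Counter[int]:
    """Number of non-control preferences each user answered; compare with
    45 (first experiment) or 38 (second) to spot unanswered questions."""
    return Counter(user for user, _a, _b, is_control in rows if is_control == 0)


rows = [(1, 3, 7, 0), (1, 3, 7, 1), (2, 3, 7, 0), (2, 7, 3, 1), (2, 4, 5, 0)]
correct = {1: 1, 2: 0}
print(filter_preferences(rows, correct, max_mistakes=0))  # [(1, 3, 7, 0)]
print(answered_counts(rows))  # Counter({2: 2, 1: 1})
```

Users with no control rows count as having zero mistakes and are kept.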


You are welcome to use the data on this page for research; however, please acknowledge its use with a citation:

Learning Community-based Preferences via Dirichlet Process Mixtures of Gaussian Processes, E. Abbasnejad, S. Sanner, E. V. Bonilla, P. Poupart, In Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI), 2013. Beijing, China.

@INPROCEEDINGS{Abbasnejad2013,
   author = {E. Abbasnejad and S. Sanner and E. V. Bonilla and P. Poupart},
   title = {Learning Community-based Preferences via Dirichlet Process Mixtures of Gaussian Processes},
   booktitle = {Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI)},
   year = {2013}
}

For inquiries or bug reports, please contact Ehsan Abbasnejad: ehsan (dot) abbasnejad (at) nicta (dot) com (dot) au.