One of the major terminological forces driving ICT development today is that of ‘big data.’ While the phrase may sound inclusive and integrative, ‘big data’ approaches are in fact highly selective, excluding any input that cannot be effectively structured, represented, or, indeed, digitised. Data of this messy, dirty sort are precisely the kind that humanities and cultural researchers deal with best. The contribution of the K-PLEX project will therefore be to investigate these elements of humanities and cultural data, and the strategies researchers have developed to deal with them. In doing so, it will remain at the margins of ICT so as to better shed light on the gap between analogue or augmented digital practices and fully computational ones. It will thereby expand our awareness of the risks inherent in big data and suggest ways in which phenomena that resist datafication can still be represented (if only by their absence) in knowledge creation approaches reliant upon the interrogation of large data corpora. K-PLEX approaches this challenge in a comparative, multidisciplinary and multisectoral fashion, focusing on three key challenges to the knowledge creation capacity of big data approaches: the manner in which data that are not digitised or shared become ‘hidden’ from aggregation systems; the fact that data are human-created and lack the objectivity often ascribed to the term; and the subtle ways in which data that are complex almost always become simplified before they can be aggregated. It will approach these questions from a humanities research perspective, but will use social science research tools to look at both the humanistic and computer science approaches to the term ‘data’ and its many possible meanings and implications. In this way, the K-PLEX project will define and describe key aspects of data that are at risk of being left out of our knowledge creation processes in a system where large-scale data aggregation is increasingly becoming the gold standard.