Big Data — As the behavioral economist Dan Ariely famously put it, “Big Data is like teenage sex: everyone talks about it, nobody really knows how to do it.” Big Data refers to collections of data sets too large or complex for commonly used software tools to manage. Their size is typically measured with funny prefixes, as in terabytes and petabytes. Big Data analytics is basically about making business sense of large volumes of information, using dedicated tools to support data-driven decisions.
Hadoop — In plain words, Hadoop is a scalable open-source framework for storing and processing massive amounts of data (“Big Data”). It is currently the technology of choice in data management, used by the likes of Yahoo, Facebook and Amazon. The underlying concept of Hadoop is to spread all data across many ordinary machines, in a distributed and flat rather than a hierarchical structure, which makes it more manageable, robust and accessible.
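One way to picture how a distributed framework of this kind divides work is the map-and-reduce pattern that Hadoop popularized: each block of data is processed on its own, and the partial results are then combined. The following is a minimal sketch in plain Python, not Hadoop's actual API; the sample posts and the function names `map_block` and `reduce_pairs` are assumptions made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample data: each string stands in for a block of data
# that would live on a different machine in a real cluster.
blocks = [
    "pain relief after new dose",
    "new dose caused headache",
    "headache relief at last",
]

def map_block(block):
    """Map step: turn one block into (word, 1) pairs, independently of other blocks."""
    return [(word, 1) for word in block.split()]

def reduce_pairs(pairs):
    """Reduce step: add up the counts for each word across all blocks."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)

# Here the blocks are mapped one after another; a cluster would map them in parallel.
mapped = [pair for block in blocks for pair in map_block(block)]
word_counts = reduce_pairs(mapped)
print(word_counts)
```

Because the map step never needs to look at more than one block, adding machines lets more blocks be handled at the same time, which is the intuition behind the framework's scalability.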
UGC (User Generated Content) — Any form of written content that people create on the Internet; in healthcare, it refers to content written by patients and their caregivers. It is typically described as unstructured data, i.e. information that is not organized in a predefined way and is not easily interpreted by traditional databases or data models.
NLP (Natural Language Processing) — NLP refers to technology that “understands” plain language and can derive meaning from natural human input. Typically based on sets of rules, machine learning, statistics and dictionaries, NLP is used throughout our day-to-day lives, with Apple’s Siri being one well-known example. Health-specific NLP is purpose-built to extract patient experiences, attitudes and perceptions from patient UGC.
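To make the "rules and dictionaries" part concrete, here is a minimal sketch of dictionary-based extraction with a single negation rule. It is a toy, not how any production health NLP system works: the dictionary `SYMPTOM_TERMS`, the 15-character negation window and the sample post are all assumptions chosen for illustration.

```python
import re

# Hypothetical mini-dictionary: everyday phrases mapped to a normalized concept.
SYMPTOM_TERMS = {
    "headache": "headache",
    "head hurts": "headache",
    "can't sleep": "insomnia",
    "insomnia": "insomnia",
    "nauseous": "nausea",
    "nausea": "nausea",
}

# Words that, when they appear just before a mention, suggest the symptom is absent.
NEGATION = re.compile(r"\b(no|not|never|without)\b")

def extract_symptoms(post):
    """Return the set of symptoms a post mentions, skipping negated mentions."""
    text = post.lower()
    found = set()
    for phrase, concept in SYMPTOM_TERMS.items():
        for match in re.finditer(re.escape(phrase), text):
            # Crude rule: skip the mention if a negation word appears
            # within the 15 characters that precede it.
            window = text[max(0, match.start() - 15):match.start()]
            if not NEGATION.search(window):
                found.add(concept)
    return found

post = "Since the new pills my head hurts and I can't sleep, but no nausea so far."
symptoms = sorted(extract_symptoms(post))
print(symptoms)  # ['headache', 'insomnia']
```

The dictionary handles the gap between how patients write ("my head hurts") and the concept being tracked, while the rule keeps "no nausea" from being counted as a report of nausea. Real systems layer statistics and machine learning on top of such building blocks.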
Scalability — The ability to rapidly process growing amounts of data without compromising the breadth or depth of analysis. Scalable technology makes it possible to “handle” the data quickly or in real time, regardless of how much the volume increases. In the realm of the social health web, this volume can amount to millions of patient posts per day.
Sentiment Analysis — Refers to analytical tools that identify and extract “opinions” or other subjective information from text, often based on simple terms, to determine a speaker’s attitude toward the topic being discussed. In the context of social media, such tools can also track key metrics like Facebook “likes” or brand mentions to gauge the popularity or “buzz” of a brand or topic.
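The "simple terms" approach can be sketched as a word-list scorer: count positive words, subtract negative words, and label the result. This is a deliberately naive illustration; the word lists `POSITIVE` and `NEGATIVE` and the sample posts are invented here and are not taken from any real sentiment lexicon.

```python
# Hypothetical word lists, for illustration only.
POSITIVE = {"great", "better", "relief", "love", "helped"}
NEGATIVE = {"worse", "awful", "pain", "hate", "nauseous"}

def sentiment_score(text):
    """Score = number of positive words minus number of negative words."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def label(text):
    """Turn the numeric score into a coarse label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

posts = [
    "This drug helped, I feel so much better!",
    "Awful week, the pain got worse.",
    "Started the treatment on Monday.",
]
labels = [label(p) for p in posts]
print(labels)  # ['positive', 'negative', 'neutral']
```

A scorer like this sees only isolated words, so it cannot tell "the pain is gone" from "the pain is back". That limitation is one reason term counting alone gives a shallow reading of patient conversations.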
Ontologies — Structures that organize terms and concepts through their relationships and connections. Medical ontologies provide the “right” context for further analysis by mapping how different concepts (such as drugs and conditions) are associated with one another. By essentially “understanding” the domain, ontologies enable deeper content analysis.
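A small ontology can be pictured as a list of (subject, relation, object) statements that software can follow from one concept to the next. The sketch below assumes a handful of illustrative statements; the relation names (`is_a`, `treats`, `may_cause`, `involves`) and the helper functions are hypothetical, and the entries are examples rather than a medical reference.

```python
# Hypothetical mini-ontology expressed as (subject, relation, object) statements.
TRIPLES = [
    ("ibuprofen", "is_a", "NSAID"),
    ("naproxen", "is_a", "NSAID"),
    ("NSAID", "treats", "inflammation"),
    ("NSAID", "may_cause", "stomach upset"),
    ("arthritis", "involves", "inflammation"),
]

def related(term, relation):
    """Concepts directly linked to `term` by `relation`."""
    return {obj for subj, rel, obj in TRIPLES if subj == term and rel == relation}

def inherited(term, relation):
    """Follow is_a links upward, collecting `relation` from the term and its parents."""
    results = set(related(term, relation))
    for parent in related(term, "is_a"):
        results |= inherited(parent, relation)
    return results

print(inherited("ibuprofen", "treats"))     # {'inflammation'}
print(inherited("naproxen", "may_cause"))   # {'stomach upset'}
```

Nothing in the list says directly that ibuprofen treats inflammation; the link is found by following the `is_a` relation to the drug class. This is the sense in which an ontology supplies context that a flat list of keywords lacks.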