Activities for FYF 101J
18 September 2009

Introduction to data


Today's session has four purposes:

1. To review the email and social networking suggestions developed during Wednesday's session.

2. To define the term "data".

3. To address why people should care about data.

4. To discuss kinds of data.

Introduction to Data

Basic premise: Virtually all scientists, engineers, and educated citizens work with data, which are used to better understand a particular event, relationship, or phenomenon. On that basis, students should understand the main issues surrounding data.

A. Define data

B. Usage - "Data" is the plural of the Latin word "datum", and is thus most correctly treated as a plural noun. However, the word is commonly used as a singular noun, and that usage is often acceptable.


I. Why should we care about data?

A. Virtually all objective analyses (e.g., scientific, business) depend upon the collection and analysis of data. Both must be done correctly, or our knowledge will be based on faulty information.

B. Much science informs decision making by administrators and the general public (e.g., effectiveness of certain drugs, safety of hazardous materials, design of technological items, determining how best to feed ourselves). If data are faulty or poorly presented, decision making can be impaired.

C. In the coming decades, data will become more readily available, especially through sharing over the internet. Uploading that information and presenting it in an easily digestible form will be a huge undertaking over the next 10-15 years.


II. Format of lectures relating to data

A. Discuss general issues relating to data

B. Discuss graphical presentation of data

C. Investigate how to draw inferences from data (use information to gain knowledge).


III. Kinds of Data

A. Data are often classified as being qualitative or quantitative

1. Qualitative data - cannot be expressed numerically (e.g., shapes, colors, degree of health)

2. Quantitative data - can be expressed numerically

B. Quantitative data can be in the form of discrete or continuous variables

1. Discrete variables are expressed as integers (e.g., number of neutrons in an atom)

2. Continuous variables can take any value within a range, including fractional values (e.g., human height, percent carbon dioxide in the air)
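
As a rough illustration of the categories in section III, the following Python sketch labels a few made-up observations as qualitative, discrete, or continuous. The function name classify and the sample values are hypothetical, and the rule it applies (text is qualitative, whole numbers are discrete, other numbers are continuous) is a simplification rather than a formal definition.

```python
def classify(value):
    """Return a rough label for one observation (illustrative rule only)."""
    # Anything that is not a number (or is a True/False flag) is treated as qualitative.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return "qualitative"
    # Whole-number counts are treated as discrete.
    if isinstance(value, int):
        return "quantitative, discrete"
    # Remaining numbers (floats) are treated as continuous measurements.
    return "quantitative, continuous"


# Made-up observations, one per category.
observations = {
    "leaf shape": "oval",
    "neutrons in a carbon-12 atom": 6,
    "human height (m)": 1.72,
}

for name, value in observations.items():
    print(f"{name}: {classify(value)}")
```

Note that the rule looks only at how a value is stored. A height recorded as a whole number of centimeters would be labeled discrete here even though height itself is continuous, so the label should be checked against what was actually measured.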

IV. Data are often used to depict information about a population

A. Generate a histogram

B. Measures of central tendency

1. Mean

2. Mode

3. Median
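
The three measures of central tendency listed above can be computed with Python's standard statistics module. This is a minimal sketch; the list heights_cm is a made-up sample chosen so the three measures differ.

```python
import statistics

# Made-up sample of seven heights in centimeters.
heights_cm = [160, 165, 165, 170, 175, 180, 182]

mean_height = statistics.mean(heights_cm)      # sum of values divided by their count
median_height = statistics.median(heights_cm)  # middle value once sorted
mode_height = statistics.mode(heights_cm)      # most frequent value

print("mean:", mean_height)
print("median:", median_height)
print("mode:", mode_height)
```

For this sample the mean is 171, the median is 170, and the mode is 165, which shows that the three measures need not agree.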

C. Measures of variability

1. Range

2. Variance

3. Standard deviation
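
The measures of variability above can be sketched the same way. The sample values are made up, and the sketch assumes the list is the entire population, so it uses the population formulas (pvariance and pstdev). When the data are only a sample drawn from a larger population, statistics.variance and statistics.stdev divide by n - 1 instead of n.

```python
import statistics

# Made-up data, treated here as a complete population of eight values.
values = [2, 4, 4, 4, 5, 5, 7, 9]

value_range = max(values) - min(values)           # largest minus smallest value
population_variance = statistics.pvariance(values)  # mean squared deviation from the mean
population_sd = statistics.pstdev(values)           # square root of the variance

# For comparison: the sample variance divides by n - 1 rather than n.
sample_variance = statistics.variance(values)

print("range:", value_range)
print("population variance:", population_variance)
print("population standard deviation:", population_sd)
print("sample variance:", sample_variance)
```

Here the mean is 5, the range is 7, the population variance is 4, and the population standard deviation is 2.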

D. Histogram analysis of populations often results in a bell-shaped curve

1. Discuss population parameters in relation to curve

2. Populations sometimes show bimodal or trimodal distributions

3. Be aware of skewness (asymmetry) and kurtosis (heaviness of the tails), which describe how a distribution departs from the bell shape
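
The histogram in item A and the shape measures in item D can be sketched in plain Python. The function names (histogram_counts, skewness, excess_kurtosis) and the data are hypothetical. Skewness and kurtosis are computed here as the third and fourth standardized population moments, which is one common convention among several; kurtosis is reported as "excess" kurtosis, so a normal (bell-shaped) distribution scores 0.

```python
from collections import Counter
import statistics


def histogram_counts(values, bin_width):
    """Count how many values fall in each bin; keys are the bin starting points."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


def skewness(values):
    """Population skewness: the average cubed z-score (0 for a symmetric sample)."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return sum(((v - mu) / sigma) ** 3 for v in values) / len(values)


def excess_kurtosis(values):
    """Population excess kurtosis: the average fourth-power z-score, minus 3."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return sum(((v - mu) / sigma) ** 4 for v in values) / len(values) - 3


# Made-up scores that are symmetric about 3.
scores = [1, 2, 2, 3, 3, 3, 4, 4, 5]

# Print a simple text histogram, one row per bin.
for bin_start, count in histogram_counts(scores, 1).items():
    print(f"{bin_start:>3} | {'#' * count}")

print("skewness:", skewness(scores))
print("excess kurtosis:", excess_kurtosis(scores))
```

Because the made-up scores are symmetric, their skewness is zero (up to rounding). A sample with a long right tail, such as [1, 1, 1, 2, 2, 3, 10], gives a positive skewness instead.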


V. Additional issues relating to data

A. An important activity conducted in labs is Quality Assurance / Quality Control (QA/QC). Each lab should follow a set of standards to provide confidence in the data. See example 1, example 2, example 3.

B. Data are often described by metadata, which is information that describes individual categories of data. Generating good metadata will be important for efforts to put data on-line, because metadata provides a basis for searching data among various databases. See example. See also this page.

C. Data are often proprietary, and subject to restrictions concerning intellectual property rights.

D. Exchange and analysis of data in an on-line environment is termed informatics. Data-mining and meta-analyses will be activities that many people will pursue in the future.
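
To make the point in item B about metadata concrete, the sketch below stores a few descriptive fields alongside two made-up datasets and searches them by keyword. The field names (title, units, keywords), the records, and the function search_by_keyword are all hypothetical and do not follow any particular metadata standard.

```python
# Each record holds metadata only: a description of a dataset, not the data values.
datasets = [
    {
        "title": "Stream temperature readings",
        "units": "degrees Celsius",
        "keywords": ["water", "temperature"],
    },
    {
        "title": "Tree height survey",
        "units": "meters",
        "keywords": ["forest", "height"],
    },
]


def search_by_keyword(records, keyword):
    """Return the titles of records whose metadata keywords include the keyword."""
    return [record["title"] for record in records if keyword in record["keywords"]]


print(search_by_keyword(datasets, "forest"))
```

The search never touches the measurements themselves. It works only because each dataset carries a consistent description, which is why good metadata matters when data are shared across databases.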


This page posted and maintained by Kenneth M. Klemow, Ph.D., Biology Department, Wilkes University, Wilkes-Barre, PA 18766. (570) 408-4758,