Assessment Tools

||~ The Learner ||~ Assessment Tools ||~ Outcomes ||~ Resources ||

=Assessment Tools=

In this day of standardized testing, when teacher effectiveness is measured through test scores, nothing in school seems to carry more weight than assessments. Administrators review assessment scores throughout the year, hoping to predict how students will perform on the end-of-year tests: final exams, the CRCT, and the EOCT. When selecting assessments to measure student knowledge as accurately as possible, several aspects need to be considered. Every teacher, no matter the level, aims to improve in any way possible in order to instruct students better. Being able to use these tools and concepts allows each of us to gather accurate data that we can apply to our research and to improving our instructional design process.

**//Assessment Issues//**

While "assessment results have important implications for instruction" (Porter, 1995), assessment should be used as a means of ensuring that worthwhile academic content is being taught, and taught effectively. The results of an assessment can provide a wealth of information for the teacher, the department, the administration, and the school. Results from the Georgia Criterion-Referenced Competency Test (CRCT), for example, are used to assess teaching and learning in schools across the state. Groups within the school, the community, and the state use "assessment results in a formative way to determine how well they are meeting instructional goals and how to alter curriculum and instruction so that goals can be better met" (Porter, 1995).

Individual departments will often take the time to analyze the scores from a major test. By analyzing and comparing scores, the department can see the strengths and weaknesses of the group, and teachers can use the same information to gauge their own effectiveness in the classroom. Schools face a struggle when interpreting this information: as Porter (1995) summarizes, if there is a discrepancy between what the school deems important to assess and how the school chooses to assess it, "the results are meaningless, if not potentially harmful." In addition, the quality of the assessment tool plays an important role in measuring the effectiveness of the lessons. We have all seen tests that were given and then deemed invalid; such results do little more than potentially harm the futures of our students.
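To make the kind of score analysis described above concrete, here is a minimal sketch (in Python, with entirely hypothetical students, domains, and scores; nothing here comes from the cited sources) of how a department might aggregate results by content domain to spot relative strengths and weaknesses.

[[code format="python"]]
# Illustrative sketch (hypothetical data): summarizing assessment results
# by content domain so a department can spot relative strengths and weaknesses.
from collections import defaultdict
from statistics import mean

# Each record: (student_id, content_domain, percent_correct)
results = [
    ("s01", "Algebra", 78), ("s01", "Geometry", 62),
    ("s02", "Algebra", 85), ("s02", "Geometry", 70),
    ("s03", "Algebra", 74), ("s03", "Geometry", 55),
]

by_domain = defaultdict(list)
for _, domain, score in results:
    by_domain[domain].append(score)

for domain, scores in sorted(by_domain.items()):
    print(f"{domain}: mean {mean(scores):.1f} across {len(scores)} students")
[[code]]

The same summary could be run by teacher, class period, or standard to guide reteaching decisions.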

**//Performance Based Assessments//**
Performance-based assessments are "sometimes characterized as assessing real life" (Price, 2010). The tasks presented when using performance-based assessment are "authentic tasks [such as] essays, oral presentations, open-ended problems, hands-on problems, [and] real-world situations" (Price, 2010). These types of tasks are intended for teachers who are not simply interested in whether a child can select an answer from four choices or match vocabulary words to their definitions. As Price (2010) summarizes, teachers using these types of assessments are "concerned with problem solving and understanding." There are three general categories of performance-based assessments: "performances, portfolios, and projects" (Price, 2010).

**//Portfolios//**

A portfolio is a growing document that the learner constructs, consisting of a representative sample of their work. It can be in paper form or electronic/web based. Students use these materials to chart and reflect on their evolution as learners. Portfolios are a tool that can be used for elementary school students to see the growth in their writing within a year and from year-to-year or for a college student as an opportunity to catalog their coursework. The portfolio can be used as an assessment tool for documenting growth and learning. Rather than assessing a student's writing performance from one piece of writing, his growth can be assessed throughout the year.

An answer key is a document with the answers to a given assessment.

**//Content Analysis//**

Content analysis is "any technique for making inferences by objectively and systematically identifying specified characteristics of messages" (Stemler, 2001). Through the use of these codes, systems, and characteristics, researchers and instructional designers can sift through a large amount of data in a reasonable amount of time. In instructional design, the researcher uses this information to assess various components of the instruction and assessment, investigating the problem and the situation surrounding it. Many research and instructional design models explicitly include content analysis as part of their design.

Stemler (2001) discusses "six questions [that] must be addressed in every content analysis:

<span style="font-family: Verdana,Geneva,sans-serif;">1) Which data are analyzed? 2) How are they defined? 3) What is the population from which they are drawn? 4) What is the context relative to which the data are analyzed? 5) What are the boundaries of the analysis? 6) What is the target of the inferences"

//The Content Analysis Guidebook// by Kimberly A. Neuendorf, excerpted in The Content Analysis Guidebook Online (2007), provides examples of contexts in which content analysis is part of the design.
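As an illustration of the coding idea behind content analysis, the following minimal Python sketch applies an assumed keyword-based coding scheme to a few invented open-ended responses and tallies how often each code appears. The codes, keywords, and responses are hypothetical and are not drawn from Stemler (2001) or Neuendorf (2007); real content analysis would rely on a carefully defined codebook and trained coders.

[[code format="python"]]
# Illustrative sketch of a simple content-analysis pass (hypothetical
# keyword-based coding scheme; codes and responses are invented).
# Each open-ended response is checked against each code's keywords
# and category frequencies are tallied.
from collections import Counter

coding_scheme = {
    "motivation": ["motivated", "interested", "bored"],
    "difficulty": ["hard", "difficult", "confusing"],
    "collaboration": ["group", "partner", "team"],
}

responses = [
    "The group work kept me interested.",
    "The directions were confusing and the problems were hard.",
    "I was bored until we worked with a partner.",
]

tally = Counter()
for response in responses:
    text = response.lower()
    for code, keywords in coding_scheme.items():
        if any(keyword in text for keyword in keywords):
            tally[code] += 1

for code, count in tally.most_common():
    print(f"{code}: appears in {count} of {len(responses)} responses")
[[code]]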

**//Test Reliability and Validity//**
[[media type="youtube" key="DkWukQ0AbRg" height="258" width="416" align="left"]]

<span style="color: #ff00ff; font-family: Verdana,Geneva,sans-serif;"> Test reliability <span style="color: #000000; font-family: Verdana,Geneva,sans-serif;"> is important for ensuring the "consistency of [the assessment] measure" (Cherry, 2010). Test are considered "reliable if it is consistent within itself and across time" (Hoover, 2010). As indicated by the video (PearsonPTE, 2009), the assessment should be constructed in a way that allows for a consistency of score across time. Kendra Cherry (2010) has identified four ways in which reliability can be measured: <span style="color: #000000; font-family: Verdana,Geneva,sans-serif;">While these are the types of reliability that an assessment should have, there are several factors that can affect the reliability of the measure. As Lucy Jacobs (1991) discusses, "reliability shows the extent to which test scores are free from errors of measurement." She does indicate that it is nearly impossible for classroom tests to be perfectly reliable "because random errors operate to cause scores to vary or be inconsistent from time to time and situation to situation. The goal is to try to minimize these inevitable errors of measurement" (Jacobs, 1991) as a way to have as much reliability as possible. The following factors should be taken into account when creating an assessment: <span style="font-family: Verdana,Geneva,sans-serif;"> Test validity, in contrast to test reliability, measures the level "to which the test actually measures what it claims to measure" (Hoover, 2010). In a practical sense, if the teacher is hoping to assess the knowledge of the Pythagorean Theorem, should it be required that the answer be written in simplest form? Is the ability to simplify a square root a critical skill in finding the missing side length of a right triangle? These are the types of questions that the individual or group making the assessment should ask themselves. Likewise, as Packer (2004) discusses, should success on a math exam be dependent on a high level understanding of the English language? Many math teachers would agree that the ability to take the information from a word problem and apply the skills learned in the math class would indicate a high level of understanding and ability in the math class. However, if a student struggles in reading or if English is their second or weaker language, their ability to apply the math might be inhibited. Is it really their knowledge of math that is being tested or their ability to read critically and to read well? Again, these are questions that must be addressed in order to construct a valid assessment. Rather than assessing a <span style="color: #ff00ff; font-family: Verdana,Geneva,sans-serif;">subskill <span style="font-family: Verdana,Geneva,sans-serif;">, the teacher needs to ensure that he/she is assessing the actual skill in question. In addition, whether or not a test is valid affects the level at which the researcher (teacher, department head, principal, curriculum instruction personnel, etc.) can make conclusions and decisions based on the test data. If the assessment measure is deemed invalid or unreliable, the data is also invalid and unreliable. Another integral part of test validity and reliability is to remember the <span style="color: #ff00ff; font-family: Verdana,Geneva,sans-serif;">audience <span style="color: #000000; font-family: Verdana,Geneva,sans-serif;">- the intended participants in the assessment. 
If a test is intended for a high school student, the reliability and validity could be affected if the test were given to a middle or elementary school student.
 * <span style="color: #000000; font-family: Verdana,Geneva,sans-serif;">Test-Retest Reliability: A test has test-retest reliability if, when "the test is administered twice at two different points in time," (Cherry, 2010) the scores are similar. This type of reliability is only pertinent when considering "things that are stable over time, such as intelligence" (Cherry, 2010). If a person takes an intelligence test at one point in their life and then again at a different time, the scores should be similar. If there is a large discrepancy in scores, one can assume that the test is an unreliable measure.
 * <span style="color: #000000; font-family: Verdana,Geneva,sans-serif;">Inter-rater Reliability: Another way to ensure reliability of the test is to have multiple people score the assessments. After scoring, "the scores are then compared to determine consistency" (Cherry, 2010). As summarized by Cherry (2010), after the assessments have been scored, the raters compare the scores for the items and "calculate the percentage of agreement between the raters. So, if the raters agree 8 out of 10 times, the test has an 80% inter-rater reliability rate."
 * <span style="color: #000000; font-family: Verdana,Geneva,sans-serif;">Parallel-Forms Reliability: In order to determine if a test meets parallel-forms reliability, one would compare it to another test that uses the same content. If the two assessments are given at the same time, they should yield the same scores. If they do, the test is deemed reliable.
 * <span style="color: #000000; font-family: Verdana,Geneva,sans-serif;">Internal Consistency Reliability: Within this type of reliability, "you are comparing test items that measure the same construct to determine the tests internal consistency" (Cherry, 2010). When assessing for internal consistency reliability, one would expect for the taker to mark the same answers to questions that ask for the same thing. If so, one could determine that the test does indicate reliability.
 * <span style="color: #000000; font-family: Verdana,Geneva,sans-serif;">Item Sampling: The length of the test can indicate reliability, as it ensures that an adequate sampling of questions are being included. This allows for the cases in which a student is permitted to miss some questions while they get others correct, indicating their overall knowledge of a topic. Jacobs (1991) discusses that, "a one-question test would not provide a reliable estimate of the students' knowledge ... [because s]tudents who knew this one question would have perfect achievement, but students who didn't would fail. It is important to include enough questions to accurately measure the knowledge of the student. Length of the test also works to combat other "chance factors, such as guessing" (Jacobs, 1991).
 * <span style="color: #000000; font-family: Verdana,Geneva,sans-serif;">Item Construction: The way in which a test question is written can be a major factor in the way that it is answered. When questions are "poorly worded or ambiguous or trick questions " (Jacobs, 1991), the questions are up to interpretation which affects the answers given and the test reliability. One example of a poor question would be: "//To Kill a Mockingbird// was written by _." While it may seem clear that the answer would be Harper Lee, the author, other correct answers could include "a woman," "1960," "a woman who hoped to bring racial issues to the forefront." It is important to consider the way in which a question is worded when constructing an assessment.
 * <span style="color: #000000; font-family: Verdana,Geneva,sans-serif;">Test Administration: There is a reason that standardized tests are scripted. The words spoken can affect the way in which a student selects an answer. In addition, "factors such as "heat, light, noise, confusing directions, and different testing time allowed to different students can affect students' scores" (Jacobs, 1991). These issues are difficult to control since they affect each student in a different way. The test giver can do his/her best to minimize distraction and make the environment comfortable for all involved.
 * <span style="color: #000000; font-family: Verdana,Geneva,sans-serif;">Scoring: The ways in which a test is scored can also relate to its reliability. Jacobs (1991) indicates that, overall, multiple choice tests tend to have a higher level of reliability than <span style="color: #ff00ff; font-family: Verdana,Geneva,sans-serif;">short answer <span style="color: #000000; font-family: Verdana,Geneva,sans-serif;"> - answers "requir[ing] responses of one word to a few sentences" (Short answer, 2010) - or <span style="color: #ff00ff; font-family: Verdana,Geneva,sans-serif;">essay <span style="color: #000000; font-family: Verdana,Geneva,sans-serif;">tests due to the subjectivity when scoring. However, there are measures that can be used to ensure scoring objectivity:
 * <span style="color: #000000; font-family: Verdana,Geneva,sans-serif;">Rubrics, a specific type of scoring guide, are ways that raters can ensure fairness when scoring assessments. <span style="color: #ff00ff; font-family: Verdana,Geneva,sans-serif;"> Rubrics <span style="color: #000000; font-family: Verdana,Geneva,sans-serif;">and <span style="color: #ff00ff; font-family: Verdana,Geneva,sans-serif;">scoring guides <span style="color: #000000; font-family: Verdana,Geneva,sans-serif;">are tools for "judg[ing] the quality of student performance in relation to content standards. [They] provide a specific criteria to describe a range of possible student responses and a constant set of guidelines to rate student work" (Munsen, 2009). When utilizing a rubric, it is important to remember a few key aspects. Rubrics and scoring guides are strong tools that allow the student to understand exactly how he/she is being assessed. They " <span style="font-family: Verdana,Geneva,sans-serif;">communicate detailed explanations of what constitutes excellence throughout a project and provide a clear teaching directive" (Rose, 2010). In addition, they ensure that the teacher is rating the knowledge and learning of the student. Websites exist that allow teachers the chance to create their own rubric, create a rubric based on the categories and number of choices, or use ready made rubrics.
 * <span style="font-family: Verdana,Geneva,sans-serif;">Another type of scoring guide is the rating scale . A rating scale (another word for checklist) is a set of categories designed to elicit information about a <span style="font-family: Verdana,Geneva,sans-serif;"><span class="wiki_link_ext">quantitative (emotional data) or a <span class="wiki_link_ext">qualitative (numerical data) attribute. A Likert scale is probably the most common rating scale. When utilizing a Likert scale, the participant is asked to select how strongly they agree or disagree with a statement. These types of choices, where participants are given a selection of options and must chose one, are called <span style="color: #ff00ff; font-family: Verdana,Geneva,sans-serif;">forced-choice questions <span style="font-family: Verdana,Geneva,sans-serif;">.
 * <span style="color: #000000; font-family: Verdana,Geneva,sans-serif;">Difficulty of the test: If a test is too hard or too easy, it indicates low reliability. When scores are "clustered together at either the high end or the low end of the scale, with small differences among students" (Jacobs, 1991), it suggests that the test has little reliability in truly assessing knowledge. Scores should be "spread out over the entire scale, showing real differences among students" (Jacobs, 1991).
 * <span style="color: #000000; font-family: Verdana,Geneva,sans-serif;">Student Factors: There are always factors that can affect the way that a student performs on a day to day basis. The condition in which he enters the classroom is essentially out of the hands of the teachers. The teacher has no control over the home life or morning that the student may have on the day of the test. She also has no control over the health and well-being on this day. Factors such as "fatigue, illness, or anxiety can induce error and lower reliability because they affect performance and keep a test from being a measure of their true ability or achievement" (Jacobs, 1991). The teacher must seek to find a balance between the <span style="font-family: Verdana,Geneva,sans-serif;"><span style="color: #ff00ff; font-family: Verdana,Geneva,sans-serif;">behavior <span style="color: #000000; font-family: Verdana,Geneva,sans-serif;">of the student and the need for neutrality when testing.
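To show the arithmetic behind two of the reliability checks discussed above, here is a minimal Python sketch using assumed scores throughout. It reproduces Cherry's (2010) percent-agreement example for inter-rater reliability and, as one common way to quantify "similar scores" across two administrations (an assumption beyond the sources cited here), computes a Pearson correlation for test-retest reliability.

[[code format="python"]]
# Illustrative sketch (hypothetical scores) of two reliability checks:
# inter-rater percent agreement (Cherry's 8-of-10 -> 80% example) and a
# test-retest comparison via Pearson correlation.
from statistics import correlation  # available in Python 3.10+

# Inter-rater reliability: count the items on which the two raters agreed.
rater_a = [3, 4, 2, 5, 3, 4, 1, 2, 5, 4]
rater_b = [3, 4, 2, 5, 2, 4, 1, 3, 5, 4]
agreements = sum(a == b for a, b in zip(rater_a, rater_b))
print(f"Inter-rater agreement: {agreements / len(rater_a):.0%}")  # 80%

# Test-retest reliability: same students, same test, two points in time.
first_administration = [72, 85, 90, 64, 78]
second_administration = [70, 88, 87, 66, 80]
r = correlation(first_administration, second_administration)
print(f"Test-retest correlation: {r:.2f}")
[[code]]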

**//Case Study Analysis//**
<span style="color: #ff00ff; font-family: Verdana,Geneva,sans-serif;">Case studies <span style="font-family: Verdana,Geneva,sans-serif;"> are often used as a way to record "events that occured at a particular company or within a particular industry over a number of years" (Schweitzer, 2010). Case studies are utilized in many areas of business and education. The benefits of a case study transcend the setting; Information related to "objectives, strategies, challenges, results, recommendations, and more" can be determined based on the case study (Schweitzer, 2010). Case studies are used, in part, to "illustrate what a student has learned and retained in class" (Schweitzer, 2010). If the <span style="color: #ff00ff; font-family: Verdana,Geneva,sans-serif;">case study analysis <span style="font-family: Verdana,Geneva,sans-serif;"> is to be professional and accurate, there must be a clear understanding and expectation of the issues presented within the classroom and among the students. The specific details surrounding the school, classroom, teacher, and students will affect the outcome of the case study, and it is imperative that these issues be dealt with when analyzing the case study. Schweitzer (2010) recommends doing a thorough reading of the case study before beginning your analysis. In addition, she suggests taking notes and "re-reading the case just to make sure you haven't missed anything (Schweitzer, 2010).

<span style="font-family: Verdana,Geneva,sans-serif;">While a case study analysis allows the instructional designer to garner more information about the participants and surroundings of a case, a <span style="color: #ff00ff; font-family: Verdana,Geneva,sans-serif;">task analysis <span style="font-family: Verdana,Geneva,sans-serif;"> is important for "analyzing and articulating the kind of learning that you expect the learners to know how to perform" (Dabbagh, 2010). As an instructional designer, it is critical that one assesses the outcomes of the tasks assigned, and utilizing a task analysis can allow for the "classifi[cation of] tasks according to learning outcomes, inventory [of] tasks, select[ion of] tasks, decompos[ition of] tasks, and sequencing [of] tasks and subtasks" (Dabbaugh, 2010). In addition to the instructional designer having a better understanding of the task assigned and its outcomes, <span style="font-family: Verdana,Geneva,sans-serif;">there are many reasons that one might choose to perform a task analysis. Dabbaugh (2010) describes eight outcomes that might cause one to participate in a task analysis:"
 * 1) determine the instructional goals and objectives;
 * 2) define and describe in detail the tasks and sub-tasks that the student will perform;
 * 3) specify the knowledge type (declarative, structural, and procedural knowledge) that characterizes a job or task;
 * 4) select learning outcomes that are appropriate for instructional development;
 * 5) prioritize and sequence tasks;
 * 6) determine instructional activities and strategies that foster learning;
 * 7) select appropriate media and learning environments;
 * 8) construct performance assessments and evaluation.
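As a small illustration of what decomposing and sequencing tasks and sub-tasks might look like in practice, here is a hypothetical Python sketch that records a task hierarchy, tags each task with one of the knowledge types named above (declarative, structural, procedural), and prints the tasks in teaching order. The task content is invented for the example and is not taken from Dabbagh (2010).

[[code format="python"]]
# Illustrative sketch (hypothetical content): decomposing a task into
# subtasks, tagging each with a knowledge type, and printing them in
# the order they would be taught.
from dataclasses import dataclass, field

@dataclass
class Task:
    name: str
    knowledge_type: str            # "declarative", "structural", or "procedural"
    subtasks: list["Task"] = field(default_factory=list)

pythagorean = Task("Find a missing side of a right triangle", "procedural", [
    Task("State the Pythagorean Theorem", "declarative"),
    Task("Identify the hypotenuse and legs", "structural"),
    Task("Substitute known lengths and solve", "procedural"),
])

def sequence(task: Task, depth: int = 0) -> None:
    """Print a task and its subtasks in teaching order."""
    print("  " * depth + f"{task.name} [{task.knowledge_type}]")
    for subtask in task.subtasks:
        sequence(subtask, depth + 1)

sequence(pythagorean)
[[code]]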
