AN ANALYSIS OF FINAL TEST QUALITY ON EFL STUDENTS

Introduction
One of the most important parts of the learning process is testing. The aims of the learning process are achieved when students pass the test and reach the standard scores set by their teachers (Hasanah et al., 2020). When students fail a test, most teachers assume that the students' ability is declining, without considering the quality standards applied in designing the test. They design the test only to fulfil their duty.
Student achievement can be measured by giving a test, and the test should go through validation (Bachman, 2002). However, accurate information can only be obtained from an accurate measurement, which means that the test itself must be good. According to information obtained informally on February 13, 2020, through interviews with several teachers at a school in Parepare, the teachers only designed, created, and checked the final test without analyzing it.
Related to this, when the researcher took part in the Field Experience Program (FEP) at a school in Parepare, he found that some teachers shared test items with each other through short message service or WhatsApp. They made the test forms, 50% for the third grade of senior high school, 30% for the second grade, and 20% for the first grade, but sometimes they did not make them on their own; they simply asked another teacher for test items. The reasons they gave for this practice were: a) too many classes and teaching hours, b) some teachers teach at more than one school, and c) tiredness after many activities at school.
According to the researcher's interviews, several problems contributed to the weaknesses of the tests made by teachers. Some of them were limits on time, the teacher's energy, and cost. However, the main factor was that teachers had little opportunity to try out the test, that is, to test the test in order to measure and analyze its items.
Generally, because the tests are made by the teachers themselves, they are used only privately: they are meant for the teachers' own students and not for the public. That is why teachers rarely try out their tests. As a result, teacher-made tests are not accompanied by information about the quality, validity, and reliability of their items.
Research on final test results so far has only treated the results as NEM, while doubts about the test items themselves were raised in Budiharso et al.'s research (2020) on the validity and reliability of English tests for senior high school, Riandy's research (2018) on the item analysis of an English final test for junior high school, and Astutia's research (2019) on the quality of an English final test for junior high school. The three studies show that the quality of the items in English final tests is indeed low. We can therefore assume that declining student scores are caused not only by the decreasing ability of students but also by the quality of the English final tests made by teachers. Every test given to students should be appropriate for every ethnic group and district (Pudjastawa et al., 2021), including Parepare. Therefore, the researcher was motivated to examine the quality of the English teacher-made final test for senior high school in Parepare.
This research was conducted at SMA Negeri 3 Parepare in the academic year 2020-2021 and focused on second-grade students. This school is one of the best senior high schools in Parepare. It is also noted as an early adopter of Curriculum 2013 (K-13), which it has used as its competence standard for teaching and learning since 2013. This suggests that the school is quick to adapt to new changes, especially the

Subject of the Research
The subject of this research was the 58 answer sheets of the English final test given to students registered in the second grade of senior high school SMA Negeri 3 Parepare in the academic year 2020-2021. The test was administered concurrently to three second-grade classes, XI MIPA 1, XI MIPA, and XI IPS 1, and the total population was 134 students. However, the researcher took only 58 students from the three classes as test participants, and the sample was selected using a random group sampling technique.

Method
The researcher used three steps to collect the data. First, each item on the test answer sheets was marked as true or false, which produced a record of the number of items answered correctly and incorrectly. Second, the answer sheets were split into two groups, a top group and a bottom group, with 29 sheets in each. This step produced the data needed to identify P and D for each item. Third, after these two steps were done, the results were interpreted to assess distractor effectiveness, validity, and reliability.
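The first two steps can be sketched in Python as follows. This is a minimal illustration, not the researcher's actual procedure: the function names and the sample data are hypothetical, and ranking the sheets by total score before splitting them is an assumption, since the text only states that the sheets were divided into a top and a bottom group.

```python
from typing import Dict, List, Tuple


def score_sheet(answers: List[str], key: List[str]) -> List[int]:
    """Mark each item 1 (answered correctly) or 0 (answered incorrectly)."""
    return [1 if a == k else 0 for a, k in zip(answers, key)]


def split_top_bottom(marked: Dict[str, List[int]]) -> Tuple[List[str], List[str]]:
    """Rank sheets by total score (assumed) and split them into two halves.

    With an odd number of sheets, the middle sheet is left out of both groups.
    """
    ranked = sorted(marked, key=lambda sid: sum(marked[sid]), reverse=True)
    half = len(ranked) // 2
    top = ranked[:half]
    bottom = ranked[len(ranked) - half:]
    return top, bottom


# Hypothetical data: four sheets and a three-item answer key.
key = ["A", "B", "C"]
sheets = {
    "s1": ["A", "B", "C"],
    "s2": ["A", "B", "D"],
    "s3": ["B", "B", "D"],
    "s4": ["D", "D", "D"],
}
marked = {sid: score_sheet(ans, key) for sid, ans in sheets.items()}
top_group, bottom_group = split_top_bottom(marked)
```

With the 58 sheets described above, the same split would give two groups of 29 sheets each.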

The Difficulty of The Final Test
The students took the final test in the academic year 2020/2021, and 58 answer sheets of the English teacher-made final test were used as the sample. The findings show that all test items were at a medium level. For clarity, the researcher provides a table that briefly describes the difficulty level of each item. The table has four columns: the first shows the item number, the second shows the result of the P level analysis, the third shows the difficulty classification, and the fourth shows the difficulty level status. To obtain the P level of each item, the researcher used Bachman's formula.
For example, 58 students took the test, which had 30 questions, all of them multiple choice. On the first item, 38 of the 58 students answered correctly, and the index of difficulty for that item was 0.64.
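The calculation can be sketched as below, assuming the usual proportion-correct definition of the difficulty index, P = (number of correct answers) / (number of test takers). The exact form of the formula used in the study is not reproduced in the text, so the function names and the cut-off values for "difficult" and "easy" are illustrative assumptions; only the 0.30-0.70 medium band is taken from the article.

```python
def difficulty_index(n_correct: int, n_takers: int) -> float:
    """Proportion of test takers who answered the item correctly."""
    if n_takers <= 0:
        raise ValueError("n_takers must be positive")
    return n_correct / n_takers


def classify_difficulty(p: float, low: float = 0.30, high: float = 0.70) -> str:
    """Classify an item; 0.30-0.70 is treated as the medium band."""
    if p < low:
        return "difficult"  # assumed label for items below the medium band
    if p > high:
        return "easy"       # assumed label for items above the medium band
    return "medium"


# Counts quoted for the first item: 38 correct answers out of 58 students.
p_item_1 = difficulty_index(38, 58)
level_item_1 = classify_difficulty(p_item_1)  # "medium"
```

Note that 38/58 evaluates to about 0.66, so the reported 0.64 may reflect rounding or a slightly different count; either value falls in the medium band.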

The Discrimination of the Final Test
The findings show that, in the final test for the second grade of SMA Negeri 3 Parepare in the academic year 2020/2021, six items were classified as excellent, one item as satisfactory, and the rest as good. For clarity, the researcher provides a table that briefly describes the discrimination of each item. The table comprises four columns. The first column contains the item number. The second column displays the result of the discrimination analysis. The third column contains the D classification. The fourth column contains the interpretation. The researcher used a formula adapted from Proyonigor.

The Effectiveness of the Distractor
The findings show that several distractors have scores under 5%, whereas a good distractor should score more than 5%. For clarity, the researcher provides a table that briefly describes the effectiveness of the distractors of each item. The table has seven columns: the first shows the item; the second, third, fourth, fifth, and sixth show the results of the distractor effectiveness analysis; and the last shows the answer key, which the researcher has marked with brackets. To obtain the effectiveness score, the researcher used a formula adopted from Sudjiono.
The data on distractor effectiveness for the final test for the second grade of SMA Negeri 3 Parepare show that several distractors need to be replaced because their distracting power is low, that is, below 5%. A good distractor is one that is avoided by high-ability students and selected by lower-ability students. If a distractor is selected by 5% of the students, it functions well.
From the data and statements above, the writer concludes that some distractors in the final test for second-grade students of SMA Negeri 3 Parepare function poorly, so they need to be replaced with new distractors that function well.
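The 5% criterion can be checked with a short sketch like the one below. The response data are hypothetical, the function names are illustrative, and the five options A to E are inferred from the table description; treating "at least 5%" as the cut-off is also an assumption, since the text gives both "more than 5%" and "5%".

```python
from collections import Counter
from typing import Dict, List


def option_percentages(responses: List[str], options: str = "ABCDE") -> Dict[str, float]:
    """Percentage of test takers who chose each option of one item."""
    if not responses:
        raise ValueError("responses must not be empty")
    counts = Counter(responses)
    return {opt: 100.0 * counts.get(opt, 0) / len(responses) for opt in options}


def weak_distractors(responses: List[str], key: str,
                     options: str = "ABCDE", threshold: float = 5.0) -> List[str]:
    """Distractors (wrong options) chosen by fewer than `threshold` percent."""
    pct = option_percentages(responses, options)
    return [opt for opt in options if opt != key and pct[opt] < threshold]


# Hypothetical responses of 58 students to one item whose key is "A".
responses = ["A"] * 38 + ["B"] * 12 + ["C"] * 6 + ["D"] * 1 + ["E"] * 1
flagged = weak_distractors(responses, key="A")  # ["D", "E"]
```

In this invented example, options D and E are each chosen by one student out of 58, which is under 5%, so they would be flagged for replacement.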

The Validity of the Final Test
The findings show that twenty-one items of the final test were valid and nine items were invalid. For clarity, a table that briefly describes the validity of each item is provided below. The table has four columns: the first shows the item number, the second shows the result of the validity analysis, the third shows the product-moment critical value at the 5% significance level (95% confidence), and the fourth shows the validity status. To obtain the validity of each item, the researcher used the product-moment correlation.
Based on the table above, the author concludes that twenty-one items are valid and nine items are invalid. This reflects the current state of the final test for second-grade students of SMA Negeri 3 Parepare.
In addition, an item is said to be valid if its correlation coefficient is greater than or equal to the product-moment critical value at the 5% significance level (95% confidence). Therefore, items that are valid do not need to be replaced and can be placed in the question bank for future use, whereas invalid items need to be replaced.

The Reliability of the Final Test
The data show that the final test for the second grade of SMA Negeri 3 Parepare is reliable, with a reliability index of 0.972. This index was judged against a standard criterion: a test is said to be reliable if its reliability coefficient is greater than or equal to the product-moment critical value at the 5% significance level (Arikunto, 2010). For more details, the researcher presents the reliability analysis table as follows. Based on the table above, the author concludes that the test is reliable, because the reliability index obtained with the Spearman-Brown (2013) product-moment formula, 0.972, is greater than the product-moment critical value. Reliability is the extent to which the same scores or marks are awarded when the same test is assessed by two or more different examiners or by the same examiner on different occasions. For a test to be reliable, its measurement must be consistent.
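A split-half estimate with the Spearman-Brown correction can be sketched as follows. The article does not state how the test was split, so the odd-even split used here is an assumption, and the function names and the small score matrix are hypothetical. The correction itself is the standard formula r11 = 2r / (1 + r), where r is the product-moment correlation between the two half-test scores.

```python
from math import sqrt
from typing import List


def pearson_r(x: List[float], y: List[float]) -> float:
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)


def spearman_brown(r_half: float) -> float:
    """Step a half-test correlation up to full-test reliability."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(matrix: List[List[int]]) -> float:
    """Rows are students, columns are items scored 1 or 0 (odd-even split assumed)."""
    odd_scores = [sum(row[0::2]) for row in matrix]   # items 1, 3, 5, ...
    even_scores = [sum(row[1::2]) for row in matrix]  # items 2, 4, 6, ...
    return spearman_brown(pearson_r(odd_scores, even_scores))


# Hypothetical scores: four students, four items.
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
reliability = split_half_reliability(scores)
```

The resulting coefficient would then be compared with the product-moment critical value for the sample size, as described above.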

The Difficulty Level of Final Test
Based on the findings, the difficulty data for the final test show that all test items are at a medium level. As shown by the data, the difficulty index of the items lies between 0.30 and 0.70, the range for medium-level items. This means that the items can be placed in a question bank and reused in later tests. They could also serve as examples for making a new test for second-grade students of SMA Negeri 3 Parepare.
Moreover, a good test is one that is neither too easy nor too difficult for students. It should offer answer options that students can plausibly select and that are close to the answer key. The final test made by the teacher for the second grade of SMA Negeri 3 Parepare already met these criteria. Very easy items can still be useful: they give low-ability students a feeling of "success" and can serve as warm-up questions, while very difficult items can challenge the most highly skilled students. A good test also takes into account that students recognize and note the characteristics of the teacher's tests, such as whether they are always very easy or very difficult (Ali, 2020). Therefore, a test must follow the standard and meet the characteristics of a good test. The texts used in the test should meet the students' needs for guided reading. From the statements above, the author concludes that the final test made by the teacher for the second grade of SMA Negeri 3 Parepare meets the standard criteria: it is neither too easy nor too difficult for students.

The Discrimination of the Final Test
According to the data, the final test for the second grade of SMA Negeri 3 Parepare in the academic year 2020/2021 has six items classified as excellent, with scores in the range 0.70-1.00, one item classified as satisfactory, with a score in the range 0.20-0.40, and the remaining items classified as good, with scores in the range 0.40-0.70.
Based on the statement above, the writer concludes that this is a good result because no item had a negative sign. A negative sign indicates an item whose quality is inverted, namely one on which high-ability students appear to be low-ability and lower-ability students appear to be high-ability. The final test for the second grade of SMA Negeri 3 Parepare contained no such items.
A good test item is answered correctly by high-ability students. For example, if the group of high-ability students answers an item correctly and the group of lower-ability students answers it incorrectly, the item has excellent discrimination power.
Based on the statements above, the writer concludes that the final test for the second grade of SMA Negeri 3 Parepare has a good discrimination level: students who have mastered the material taught by the teacher answered the items correctly, as shown by the data.
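The discrimination index and its classification can be sketched as below, assuming the common definition D = (correct answers in the top group minus correct answers in the bottom group) / group size. The formula used in the study is not reproduced in the text, so this definition, the function names, the "poor" label for values below 0.20, and the handling of values that fall exactly on a boundary are all assumptions; the ranges 0.20-0.40, 0.40-0.70, and 0.70-1.00 come from the article.

```python
def discrimination_index(top_correct: int, bottom_correct: int, group_size: int) -> float:
    """Difference in proportion correct between the top and bottom groups."""
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    return (top_correct - bottom_correct) / group_size


def classify_discrimination(d: float) -> str:
    """Map D to the classification bands described in the article."""
    if d < 0:
        return "negative"      # more bottom-group than top-group students answered correctly
    if d < 0.20:
        return "poor"          # assumed label; the article does not name this band
    if d < 0.40:
        return "satisfactory"
    if d < 0.70:
        return "good"
    return "excellent"


# Hypothetical counts for one item, with 29 sheets in each group.
d_item = discrimination_index(top_correct=25, bottom_correct=10, group_size=29)
label = classify_discrimination(d_item)  # "good"
```

An item on which more bottom-group than top-group students answer correctly produces a negative D, which is the "negative sign" discussed above.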

The Effectiveness of the Distractor
Based on the findings, the data on distractor effectiveness for the final test for the second grade of SMA Negeri 3 Parepare show that several distractors need to be replaced because their distracting power is low, that is, below 5%. A good distractor is one that is avoided by high-ability students and selected by lower-ability students. If a distractor is selected by 5% of the students, it functions well.
Therefore, from the statements above, the writer concludes that some distractors in the final test for second-grade students of SMA Negeri 3 Parepare function poorly, so they need to be replaced with new distractors that function well.

The Validity of the Final Test
Based on the findings, the test data show that twenty-one items were valid and nine items were invalid. This simply gives us a picture of the current state of the final test. In addition, an item is considered valid if its correlation coefficient is higher than or equal to the product-moment critical value at the 5% significance level (95% confidence). Therefore, items that are already valid do not need to be replaced and can be saved in the question bank for future use, but the invalid items need to be replaced.
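The validity rule can be illustrated with a short sketch that correlates each item's scores with the students' total scores and compares the result with a critical value. Correlating items with total scores is a common way to apply the product-moment formula and is assumed here, as are the function names, the sample data, and the critical value of 0.5; in practice the critical value is read from a product-moment table for the actual sample size.

```python
from math import sqrt
from typing import List, Tuple


def pearson_r(x: List[float], y: List[float]) -> float:
    """Pearson product-moment correlation; returns 0.0 if a variable is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # undefined correlation treated as 0.0 (an assumption)
    return sxy / sqrt(sxx * syy)


def item_validity(matrix: List[List[int]], r_critical: float) -> List[Tuple[int, float, bool]]:
    """Return (item number, r, is_valid) for each item.

    Rows of `matrix` are students and columns are items scored 1 or 0.
    """
    totals = [sum(row) for row in matrix]
    results = []
    for j in range(len(matrix[0])):
        item_scores = [row[j] for row in matrix]
        r = pearson_r(item_scores, totals)
        results.append((j + 1, r, r >= r_critical))
    return results


# Hypothetical scores: four students, three items; 0.5 is an invented critical value.
scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
]
validity = item_validity(scores, r_critical=0.5)
invalid_items = [number for number, _, ok in validity if not ok]  # [3]
```

Items returned in `invalid_items` correspond to those that would be replaced, while the remaining items could go into the question bank.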

The Reliability of the Final Test
According to the data, the reliability analysis using the Spearman-Brown (2013) product-moment formula showed that the final test for the second grade of SMA Negeri 3 Parepare was reliable, as the reliability index was 0.972, which was higher than the product-moment critical value. Reliability is the extent to which the same grades or marks are awarded when the same tests are graded by two or more different examiners or by the same examiner on different occasions. In short, to be reliable, a test must be consistent in its measurement; reliability is the degree to which a test consistently measures whatever it measures. This simply gives us a picture of the current state of the final test for the second grade of SMA Negeri 3 Parepare.
The researcher realizes that this study has a weakness. It is a quantitative descriptive study that analyzed the final test for second-grade students of SMA Negeri 3 Parepare in terms of difficulty, discrimination level, distractor effectiveness, validity, and reliability. The research would be more useful if the researcher also helped the teacher, where necessary, to revise the items of the final test. In addition, it is important that the material of the final test be closely related to the culture of the students (Rahman et al., 2022).

Conclusion
Alderson (1995) said, "Language Test Construction and Evaluation describes the process of language test construction clearly and comprehensively". After collecting and analyzing the data, the researcher found that the final test items of SMA Negeri 3 Parepare were at a medium level of difficulty, which is a good sign because the test was neither too easy nor too difficult for students. The discrimination level of the final test items was good: no item had a negative sign, and students who had mastered the material taught by the teacher answered the items correctly, as shown by the data. The distractors in several items were not effective, and several of them need to be replaced or even removed. For example, distractors D and E in item number one have a score of 1.6%, which is less than 5%, so these two distractors are not effective. As for validity, nine items (30%) were invalid, namely items 6, 9, 10, 17, 24, 25, 26, 27, and 28, and these items need to be replaced, while twenty-one items (70%) were valid. The final test items were reliable, since the reliability index was 0.972, which was higher than the product-moment critical value.
Firman Ahmad Faizal et al. (2020)