Statistical Analysis for Managers

Establishing an airline is difficult, especially when competition already exists. Competition from existing airline companies may be hard for a newly established airline to overcome. This competition raises the problem of how a newly established airline company should set its fares in order to reach its potential revenue while competing with the existing airline companies.

To solve this problem, a statistical analysis will be conducted. The analysis aims to answer questions about pricing patterns in the airline industry, in particular whether the average fare that can be charged is below 150. Specifically, the analysis will aim to answer the following questions:
Is there a difference in the average fare price charged by different airline companies?
Is there an identifiable trend in the fare price over time?
From these research questions, the following hypotheses are formulated.

There is a difference in the average fare price charged by different airline companies.

There is an increasing trend in the fare price charged by the airline companies over time.

The motivation for these hypotheses is to give useful insights to investors who are interested in determining the feasibility of putting up a new airline company. In addition, the hypotheses establish the direction of the study. Last, the hypotheses help identify the variables that may be useful for the analysis.

Descriptive Statistics
Descriptive statistics are used to organize and summarize the data so that the statistical analysis can be conducted more easily. The descriptive statistics yielded the following initial findings.

The average fares charged by the airline companies differ from each other. According to the summary statistics, American Airlines charges the highest average fare, at 179.03, followed by United Airlines at 172.23 and Delta Airlines at 166.79. NorthWest, TWA, Continental and US Airways have average fares of 163.56, 157.95, 149.14 and 139.79, respectively. Of these averages, only those of Continental and US Airways are below 150. The spread of the prices does not differ much among the airline companies: American, Continental, Delta, NorthWest, United and US Airways have similar standard deviations, while TWA differs, with a standard deviation of 9.82.
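
The comparison against the 150 threshold can be reproduced with a few lines of Python. This is a minimal sketch that works only from the average fares reported above; the names `reported_means` and `THRESHOLD` are illustrative, not part of the study. With the raw monthly fares, the means and standard deviations could instead be computed with `statistics.mean` and `statistics.stdev`.

```python
# Average fares as reported in the summary statistics above.
reported_means = {
    "American": 179.03,
    "United": 172.23,
    "Delta": 166.79,
    "NorthWest": 163.56,
    "TWA": 157.95,
    "Continental": 149.14,
    "US Airways": 139.79,
}

THRESHOLD = 150  # target fare level discussed in the introduction

# Rank airlines from highest to lowest average fare.
ranked = sorted(reported_means.items(), key=lambda item: item[1], reverse=True)
for airline, mean_fare in ranked:
    print(f"{airline:<12} {mean_fare:>7.2f}")

# Airlines whose average fare falls below the threshold.
below_threshold = [name for name, fare in ranked if fare < THRESHOLD]
print(f"Below {THRESHOLD}: {below_threshold}")
```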

Scatter plots were also created to show the trend in the fare price of each airline company and in the average fare across the companies. The scatter plots revealed the same increasing trend for every company and for the average fare: as time increases, fare prices also increase.

Methods
The data used for the analysis are the actual fares charged by the airline companies over the past 56 months. A likely threat to the validity of the analysis is that the data may not satisfy the underlying assumptions of the statistical tests used. When these assumptions are violated, the results of the tests may be misleading.

The researcher is interested in the fares charged by the airline companies over the past 56 months. Thus, the important variables for the study are the fare price and the time in months. The fare price is the most important variable because it is included in every statistical analysis used to answer the research questions. No control variables were included because this is not an experimental study.

Statistical analysis is the key to testing the hypotheses and answering the research questions. The study will use three statistical tools to analyze the data. One-way analysis of variance (ANOVA) will be used to determine whether there is a difference in the average fares of the airline companies. The last statistical tool is regression analysis, which will be used to determine the trend in, and the relationship between, the fare price and the time in months. The regression analysis will also produce a linear model that predicts the fare price from the time in months. No missing values were recorded in the data.

The results of the analysis will be presented in graphs and tables, which organize the results and make it easy to identify which results answer which research question. The results will be obtained using the Megastat add-in for Microsoft Excel.

Results
Analysis of variance was used to determine whether there is a difference between the average fares of the different airline companies. The null hypothesis is that there is no difference between the average fares of the airline companies. The analysis is conducted at the 0.05 significance level. The decision rule is to reject the null hypothesis when the p-value of the F statistic is less than the significance level; otherwise, the researcher fails to reject the null hypothesis. The ANOVA gave an F statistic of 22.62 with a p-value less than 0.001. Thus, the researcher rejected the null hypothesis at the 0.05 significance level.
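
For readers who want to see what the add-in computes, the sketch below derives the one-way ANOVA F statistic from its definition: the between-group mean square divided by the within-group mean square. It is a minimal illustration on toy numbers, not the study's data or the add-in's own code. The p-value would come from the F distribution with k - 1 and N - k degrees of freedom, for example via SciPy's `scipy.stats.f.sf`; assuming each of the seven airlines contributes 56 monthly fares, those degrees of freedom would be 6 and 385.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    `groups` is a list of lists, one list of observations per group.
    """
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k = len(groups)
    n_total = len(all_values)

    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between / ms_within, df_between, df_within


# Toy data (not the airline fares): three small groups with clearly different means.
f_stat, df1, df2 = one_way_anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(f_stat, df1, df2)  # 27.0 2 6
```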

In conjunction with the analysis of variance, a post hoc analysis is conducted to determine which airline companies' average fares differ from the others. The post hoc analysis is also conducted at the 0.05 significance level. It found that the average fare of US Airways differs from those of all the other airline companies. The average fare of Continental Airlines also differs from those of most of the group, but it does not differ from that of US Airways.
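
The section does not state which post hoc procedure the add-in applied. As a sketch under that uncertainty, the code below computes the usual building block of such comparisons: for each pair of groups, the difference in means divided by a standard error built from the pooled within-group mean square of the ANOVA. Procedures differ in the critical value these statistics are compared with (Tukey's method, for instance, uses the studentized range distribution), so the sketch stops at the statistics themselves. The group labels and values are toy data.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def pooled_mse(groups):
    """Within-group mean square (the ANOVA error term)."""
    ss_within = sum((x - mean(g)) ** 2 for g in groups.values() for x in g)
    df_within = sum(len(g) for g in groups.values()) - len(groups)
    return ss_within / df_within


def pairwise_t(groups):
    """t statistic for every pair of groups, using the pooled ANOVA error term."""
    mse = pooled_mse(groups)
    results = {}
    for a, b in combinations(groups, 2):
        se = sqrt(mse * (1 / len(groups[a]) + 1 / len(groups[b])))
        results[(a, b)] = (mean(groups[a]) - mean(groups[b])) / se
    return results


# Toy data (not the airline fares).
toy = {"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]}
for pair, t in pairwise_t(toy).items():
    print(pair, round(t, 3))
```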

Regression analysis is used to determine the trend in each company's fare price, and in the average fare price of all the companies, with respect to time. The regression analysis is conducted at the 0.05 significance level and gave the following results. The coefficient of correlation between the fare price of American Airlines and time is 0.706, which indicates a strong increasing relationship between the fare price of American Airlines and time. The coefficient of determination is 0.498, which means that 49.8 percent of the variation in the fare price of American Airlines is explained by the number of months.
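
The sketch below shows, assuming an ordinary least-squares fit of fare on month number, how the slope, the coefficient of correlation r and the coefficient of determination are obtained. In simple linear regression R² equals r², and it is read as the share of the variation in fares explained by time. The fare values are hypothetical and the function name `simple_regression` is illustrative.

```python
from math import sqrt
from statistics import mean


def simple_regression(x, y):
    """Least-squares fit y = intercept + slope * x, plus Pearson r and R^2."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r, r ** 2


# Hypothetical monthly fares (not the study's data) for months 1 to 6.
months = [1, 2, 3, 4, 5, 6]
fares = [140.0, 143.5, 141.0, 147.0, 149.5, 150.0]
slope, intercept, r, r_squared = simple_regression(months, fares)
print(f"slope={slope:.3f} intercept={intercept:.3f} r={r:.3f} R^2={r_squared:.3f}")
```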

The coefficient of correlation between the fare price of Continental Airlines and time in months is 0.728, which indicates a strong increasing relationship. The coefficient of determination is 0.530, which means that 53 percent of the variation in the fare price of Continental Airlines is explained by the number of months.

The coefficient of correlation between the fare price of Delta Airlines and time is 0.515, which indicates a moderate increasing relationship. The coefficient of determination is 0.265, which means that 26.5 percent of the variation in the fare price of Delta Airlines is explained by the number of months.

The coefficient of correlation between the fare price of NorthWest Airlines and time in months is 0.875, which indicates a strong increasing relationship. The coefficient of determination is 0.766, which means that 76.6 percent of the variation in the fare price of NorthWest Airlines is explained by the number of months.

The coefficient of correlation between the fare price of TWA Airlines and time is 0.263, which indicates a weak increasing relationship. The coefficient of determination is 0.069, which means that 6.9 percent of the variation in the fare price of TWA Airlines is explained by the number of months.

The coefficient of correlation between the fare price of United Airlines and time is 0.609, which indicates a moderate increasing relationship. The coefficient of determination is 0.371, which means that 37.1 percent of the variation in the fare price of United Airlines is explained by the number of months.

The coefficient of correlation between the fare price of US Airways and time is 0.883, which indicates a strong increasing relationship. The coefficient of determination is 0.78, which means that 78 percent of the variation in the fare price of US Airways is explained by the number of months.

The coefficient of correlation between the average fare price of all the airline companies and time in months is 0.793, which indicates a strong increasing relationship. The coefficient of determination is 0.629, which means that 62.9 percent of the variation in the average fare price of all the airline companies is explained by the number of months.
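
Because each regression above has a single predictor, every reported coefficient of determination should equal the square of the reported coefficient of correlation, up to rounding. The short check below applies that identity to the values reported in this section; the tolerance of 0.002 is an assumption chosen to allow for both values being rounded.

```python
# Correlation (r) and coefficient of determination (R^2) as reported above.
reported = {
    "American": (0.706, 0.498),
    "Continental": (0.728, 0.530),
    "Delta": (0.515, 0.265),
    "NorthWest": (0.875, 0.766),
    "TWA": (0.263, 0.069),
    "United": (0.609, 0.371),
    "US Airways": (0.883, 0.78),
    "All airlines (average)": (0.793, 0.629),
}

for name, (r, r_squared) in reported.items():
    # In simple linear regression R^2 is the square of r; allow for rounding.
    consistent = abs(r ** 2 - r_squared) < 0.002
    print(f"{name:<24} r^2={r ** 2:.4f} reported R^2={r_squared} consistent={consistent}")
```

With the values above, every pair passes the check.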

Discussion
The statistical analysis showed interesting results regarding the fares of the airline companies. It showed that there is at least one significant difference between the average fares of the companies (F = 22.62, p < 0.001). In particular, the average fares of Continental Airlines and US Airways differ from those of the other airline companies, and the average fares of these two companies are below the suggested fare of 150.

The statistical analysis also showed that the fare price is increasing over time for all the companies. As time in months increases, the fare price of each company and the average fare price also increase. Time in months is a significant predictor of the fare price of each company and of the average fare price (p < 0.001), except for TWA Airlines (p = 0.0503).
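
The significance of each trend can be related back to r. In simple linear regression, the t test of the slope is equivalent to testing whether the correlation is zero, with t = r * sqrt(n - 2) / sqrt(1 - r^2) on n - 2 degrees of freedom. Assuming each airline's series contains the 56 monthly observations described in the Methods section, TWA's r of 0.263 gives t of about 2.003 on 54 degrees of freedom, just below the two-sided 5 percent critical value of roughly 2.005, which is consistent with the reported p-value of 0.0503. The sketch below computes only the statistic; a p-value would need a t distribution function such as SciPy's `scipy.stats.t.sf`.

```python
from math import sqrt


def correlation_t(r, n):
    """t statistic for H0: population correlation (equivalently, slope) is zero."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)


N_MONTHS = 56  # assumed number of monthly observations per airline

for name, r in [("TWA", 0.263), ("Delta", 0.515), ("US Airways", 0.883)]:
    t = correlation_t(r, N_MONTHS)
    print(f"{name:<11} t = {t:.3f} on {N_MONTHS - 2} df")
```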

Based on the results of the analysis, the company is advised to join the group of airline companies other than Continental Airlines and US Airways. By not joining the group of those two airline companies, the researcher is 95 percent confident that the company will be able to profit, since the other airline companies have average fares greater than 150. In addition, starting up a new airline appears to be a good decision, because the fares of each company and the average fare are increasing over time. Thus, the company is expected to at least break even as long as it joins a group with an average fare greater than 150.

During the course of the study, the researcher identified certain limitations that may affect the validity of the results. One limitation is that the fare price may not measure the profitability of a company, because the fare may include certain value-added services. Thus, the researcher is measuring revenue, not profit itself. Another limitation is that the fare price may be influenced by other factors, such as the class of the fare being paid. For example, an economy class fare is generally lower than a business class fare. Thus, the class of the fare may affect the fare an airline company needs to charge in order to break even or to profit.

The success of a school is determined by many factors, including the effectiveness of teaching across courses and instructors. Differences in teaching effectiveness across courses and instructors pose a problem that needs to be analyzed using statistical tools. From this problem, the researcher formulated research questions to guide the analysis. The research questions for the study are the following:

Is the percentage of students with a grade of B or higher greater than 30 percent?
Is there a relationship between the grades of the students and the instructors?
Is there a relationship between the grades of the students and the courses?
Is there a relationship between the grades of the students and class size?
Is the grade distribution different among the instructors?
Is the grade distribution different among the courses?
From these research questions, the researcher formulated testable hypotheses that will help answer them. The testable hypotheses for the study are the following:
The percentage of students with a grade of B or higher is greater than 30 percent.
The grades of the students depend on the instructors.
The grades of the students depend on the courses.
The grades of the students depend on class size.
The grade distribution is different among the instructors.
The grade distribution is different among the courses.

The motivation for these hypotheses is to give useful insights to school administrators who are interested in the performance of the school. In addition, the hypotheses establish the direction of the study. Last, the hypotheses help identify the variables that may be useful for the analysis.

Descriptive Statistics
Descriptive statistics are used to organize and summarize the data so that the statistical analysis can be conducted more easily. The descriptive statistics yielded the following initial findings.

The researcher tabulated the number of students in each of the following categories: grade, instructor and course. In addition, the researcher created graphs that depict the distribution of the data.

This preliminary analysis gave the following findings.

Grades
The grade with the highest frequency is A, with 118 students, or 15.36 percent of the total number of students. The grade with the lowest frequency is F, with 43 students, or 5.6 percent of the total.

Instructor
The instructor with the most students is Instructor 8, with 103 students, or 13.41 percent of the total number of students. The instructor with the fewest students is Instructor 11, with 10 students, or 1.3 percent of the total.

Course
The course with the highest enrollment is Course 1, with 221 students, or 28.78 percent of the total number of students. Course 5 has the fewest students enrolled, with 7 students, or 0.9 percent of the total.
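
The percentages in this part are simple shares of the 768 students described in the Methods section. The sketch below recomputes them from the reported counts; the labels and constant names are illustrative.

```python
TOTAL_STUDENTS = 768  # total number of students reported in the Methods section

# Counts reported in the preliminary findings.
counts = {
    "Grade A": 118,
    "Grade F": 43,
    "Instructor 8": 103,
    "Instructor 11": 10,
    "Course 1": 221,
    "Course 5": 7,
}

for label, count in counts.items():
    share = 100 * count / TOTAL_STUDENTS
    print(f"{label:<14} {count:>4} students  {share:6.2f}%")
```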

Methods
The data include a total of 768 students, classified by instructor, course and grade. Certain threats may affect the results of the analysis. Because the data were already collected and are only analyzed here, the threats for this study are limited to those arising from violations of the assumptions of the statistical tools used.

The researcher is mainly interested in the relationship between the grades of the students and variables such as instructor, course and class size. The variables needed for the analysis are therefore the grades of the students together with the categorical variables instructor, course and class size. No control variables are included because this is not an experimental study.

To test the hypotheses, several statistical tools are employed. The first is the one-sample proportion test, which is used to determine whether a sample proportion differs from a hypothesized proportion. Next, the researcher employs the chi-square test of independence to determine whether there are relationships between categorical variables. Last, the Kruskal-Wallis test is used to determine whether the distribution of a variable differs among several groups.

The data set contains missing values. Because these missing values correspond to categories with no students, they were replaced by zero.

The results of the analysis will be presented in graphs and tables, which organize the results and make it easy to identify which results answer which research question. The hypothesis tests will be conducted using the Megastat add-in for Microsoft Excel, which gives quick and accurate results.

Results
A one-sample proportion test is used to determine whether the percentage of students with a grade of B or higher is greater than 30 percent. The null hypothesis is that the percentage of students with a grade of B or higher is equal to 30 percent. The test is conducted at the 0.05 significance level, which is the conventional choice. Choosing a higher significance level would make the analysis too lenient, because the null hypothesis would be rejected more easily, while choosing a lower significance level would make it too conservative. The decision rule is to reject the null hypothesis when the p-value of the test statistic is less than the significance level; otherwise, the researcher fails to reject the null hypothesis.

The one-sample proportion test gave a test statistic of 12.41 with a corresponding p-value less than 0.001. Since the p-value is less than the 0.05 significance level, the researcher rejects the null hypothesis.
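
The one-sample proportion test is usually computed with the normal approximation z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n), where p_hat is the sample proportion and p0 the hypothesized proportion. For this study p0 = 0.30 and n = 768, but the number of students with a grade of B or higher is not reported in this section, so the sketch below uses hypothetical counts. It assumes an upper-tailed alternative, matching the research question, and uses only the Python standard library (`math.erfc`) for the normal tail probability. The approximation is generally considered adequate when n * p0 and n * (1 - p0) are both reasonably large, which holds here (768 * 0.30 is about 230).

```python
from math import erfc, sqrt


def one_proportion_z(successes, n, p0):
    """Upper-tailed z test of H0: p = p0 against H1: p > p0."""
    p_hat = successes / n
    se = sqrt(p0 * (1 - p0) / n)       # standard error under H0
    z = (p_hat - p0) / se
    p_value = 0.5 * erfc(z / sqrt(2))  # P(Z >= z) for a standard normal Z
    return z, p_value


# Hypothetical counts for illustration only; the section does not report
# how many of the 768 students earned a B or higher.
z, p = one_proportion_z(successes=60, n=100, p0=0.5)
print(f"z = {z:.3f}, one-sided p = {p:.4f}")
```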

A chi-square test is used to determine whether the grades of the students are independent of the instructors. The null hypothesis is that there is no relationship between the grades of the students and their instructors. The test is conducted at the 0.05 significance level. The decision rule is to reject the null hypothesis when the p-value of the chi-square statistic is less than the significance level; otherwise, the researcher fails to reject the null hypothesis.

The chi-square test gave a chi-square statistic of 333.2 with a corresponding p-value less than 0.001. Since the p-value is less than 0.05, the researcher rejects the null hypothesis.
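
The chi-square test of independence compares the observed counts in a contingency table with the counts expected if the two variables were independent (row total times column total, divided by the grand total). The sketch below computes the Pearson statistic and its degrees of freedom, (rows - 1) times (columns - 1). Because the missing values in this data set were replaced by zero, the sketch drops any row or column whose total is zero, since its expected counts would be zero; whether the add-in does the same is not stated, so treat this as an assumption. A p-value could then be obtained with SciPy's `scipy.stats.chi2.sf`, or the whole test run with `scipy.stats.chi2_contingency`. As a general guideline, the approximation is considered reliable when most expected counts are at least 5. The same computation applies to the course and class-size tables.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table.

    Rows and columns whose totals are zero are dropped first, because their
    expected counts would be zero (division by zero).
    """
    rows = [row for row in table if sum(row) > 0]
    keep_cols = [j for j in range(len(rows[0])) if sum(row[j] for row in rows) > 0]
    rows = [[row[j] for j in keep_cols] for row in rows]

    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(rows):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(col_totals) - 1)
    return chi2, df


# Toy 2 x 2 table (not the school data): rows = groups, columns = outcomes.
stat, df = chi_square_independence([[10, 20], [20, 10]])
print(f"chi-square = {stat:.3f} on {df} df")
```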

A chi-square test is used to determine whether the grades of the students are independent of the course. The null hypothesis is that there is no relationship between the grades of the students and their courses. The test is conducted at the 0.05 significance level. The decision rule is to reject the null hypothesis when the p-value of the chi-square statistic is less than the significance level; otherwise, the researcher fails to reject the null hypothesis.

The chi-square test gave a chi-square statistic of 397.83 with a corresponding p-value less than 0.001. Since the p-value is less than 0.05, the researcher rejects the null hypothesis.

A chi-square test is used to determine whether the grades of the students are independent of class size. The null hypothesis is that the grades of the students are independent of class size. The test is conducted at the 0.05 significance level. The decision rule is to reject the null hypothesis when the p-value of the chi-square statistic is less than the significance level; otherwise, the researcher fails to reject the null hypothesis.

The chi-square test gave a chi-square statistic of 727.37 with a corresponding p-value less than 0.001. Since the p-value is less than 0.05, the researcher rejects the null hypothesis.

A Kruskal-Wallis test is used to determine whether the grade distribution differs among the instructors of the school. The null hypothesis is that there is no difference in the grade distribution among the instructors. The test is conducted at the 0.05 significance level. The decision rule is to reject the null hypothesis when the p-value of the test statistic is less than 0.05; otherwise, the researcher fails to reject the null hypothesis.

The test gave a test statistic (H) of 37.783 with a corresponding p-value less than 0.0001. Since the p-value is less than 0.05, the researcher rejects the null hypothesis.
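
The Kruskal-Wallis test ranks all observations together and asks whether the average rank differs among groups. The sketch below computes H = 12 / (N(N + 1)) * sum(R_i^2 / n_i) - 3(N + 1), where R_i is the rank sum of group i, and divides it by the usual tie correction 1 - sum(t^3 - t) / (N^3 - N), which matters for letter grades because many students share the same grade. It assumes grades are coded as ordered numbers (for example A = 4 down to F = 0); the section does not say how the grades were coded, and whether the add-in applies the tie correction is not stated. Under the null hypothesis H is approximately chi-square distributed with k - 1 degrees of freedom; SciPy's `scipy.stats.kruskal` performs the full test with tie correction.

```python
def average_ranks(values):
    """Ranks 1..N, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H with the standard correction for ties."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)

    # Sum of ranks per group, walking through the pooled list in group order.
    h_sum, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        h_sum += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * h_sum - 3 * (n + 1)

    # Tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N).
    counts = {}
    for x in pooled:
        counts[x] = counts.get(x, 0) + 1
    ties = sum(t ** 3 - t for t in counts.values())
    return h / (1 - ties / (n ** 3 - n))


# Toy data (not the school data); grades coded numerically, e.g. A = 4 ... F = 0.
print(round(kruskal_wallis_h([[1, 2, 3], [4, 5, 6]]), 3))  # 3.857
```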

A Kruskal-Wallis test is used to determine whether the grade distribution differs among the courses of the school. The null hypothesis is that there is no difference in the grade distribution among the courses. The test is conducted at the 0.05 significance level. The decision rule is to reject the null hypothesis when the p-value of the test statistic is less than 0.05; otherwise, the researcher fails to reject the null hypothesis.

The test gave a test statistic (H) of 71.378 with a corresponding p-value less than 0.0001. Since the p-value is less than 0.05, the researcher rejects the null hypothesis.

Discussion
After conducting the statistical analysis, the researcher formulated the following conclusions regarding the hypotheses of the research. First, the analysis showed that the percentage of students with a grade of B or higher is greater than 30 percent (Z = 12.41, p < 0.001), which suggests grade inflation at the school. Second, there is an association between the grades of the students and the instructors of the school (χ² = 333.2, p < 0.001); thus, students' grades depend on the instructor. In addition, there is an association between the grades of the students and their courses (χ² = 397.83, p < 0.001); thus, students' grades also depend on the course. There is also an association between the grades of the students and class size (χ² = 727.37, p < 0.001); thus, students' grades also depend on class size. Lastly, the grade distributions differ among the instructors (H = 37.873, p < 0.001) and among the courses (H = 71.378, p < 0.001).

Certain limitations of the research may affect the overall results. One limitation is that the analysis relies on nonparametric tests. Although the data can only be analyzed with nonparametric tests, these tests are less powerful than parametric tests such as correlation and analysis of variance. In addition, categorical data are less informative than the interval data used in other quantitative research.