Statistics - Introduction
The word Statistics seems to have been derived from the Latin word “status” or the Italian word “statista”. Both words mean a political state. In early years, “statistics” meant a collection of facts about the people in the state, gathered for administrative or political purposes.
Webster defined statistics as “the classified facts representing the conditions of the people in a state, especially those facts which can be stated in numbers or in tables of numbers or in any tabular or classified arrangement.”
A comprehensive definition was given by Prof. Horace Secrist, which is as follows:
“By Statistics we mean aggregates of facts affected to a marked extent by multiplicity of causes, numerically expressed, enumerated or estimated according to reasonable standards of accuracy, collected in a systematic manner for a predetermined purpose and placed in relation to each other.”
The above definitions clearly point out certain characteristics which numerical data must possess in order that they may be called statistics. These are as follows:
(i) Statistics are aggregates of facts: Single and isolated figures are not statistics because they cannot be compared and no meaningful conclusion can be drawn from them. Only an aggregate of facts capable of offering some meaningful conclusion constitutes statistics. (All statistics are expressed in numbers, but not all numbers are statistics.)
(ii) Statistics must be numerically expressed: Statistical methods are applicable only to those data which can be numerically expressed. Qualitative attributes like honesty, intelligence and sincerity are not statistics unless they can be numerically expressed.
(iii) Statistics should be capable of being related to each other: Statistical data should be comparable and connected to each other. If there is no apparent relationship between the data, they cannot be called statistics.
(iv) Statistics should be collected in a systematic manner: For collecting statistical data, a suitable plan should be prepared and the work should be done accordingly.
(v) Statistics should be collected for a definite purpose: The purpose of collecting data must be decided in advance. The purpose should be specific and well defined.
(vi) Statistics are affected to a marked extent by a large number of causes: Facts and figures are affected to a marked extent by the combined influence of a number of forces.
(vii) Reasonable standard of accuracy should be maintained in collection of statistics: Statistics deals with large amounts of data. Instead of counting each and every item, statisticians take a sample and apply the results obtained from the sample to the whole group. The degree of accuracy required largely depends upon the nature and object of the enquiry. If a reasonable standard of accuracy is not maintained, the numbers may give misleading results.
Various stages in statistical investigation:
There are five stages in a statistical investigation, which are given below:
(i) Collection of Data: Utmost care must be exercised in collecting data as they are the foundation of statistical analysis. If the data are faulty, the conclusions drawn can never be reliable.
(ii) Organisation of Data: Data collected from published sources are generally in organised form, but data collected from a survey frequently need organisation. For meaningful analysis, it is necessary to organise the collected data properly. Organising data involves three steps:
(a) Editing of data
(b) Classification of data according to some common characteristics
(c) Tabulation.
(iii) Presentation of Data: Organised data can be further presented in the form of diagrams and graphs.
(iv) Analysis: After collection, organisation and presentation, data are analysed using various statistical methods such as measures of central tendency, measures of variation, correlation, regression etc. to extract information useful for decision-making.
(v) Interpretation: The last stage is interpretation, which is a difficult task and requires a high degree of skill, care and experience. If the data have been analysed but not properly interpreted, the whole object of the investigation may be defeated and wrong conclusions may be drawn.
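The analysis stage can be illustrated with a short Python sketch. The figures below are hypothetical (made-up advertising and sales values), and the helper name pearson_correlation is introduced only for this illustration; it is one possible way to compute the measures named above, not a prescribed method.

```python
import statistics

# Hypothetical data: advertising spend and sales for six months.
advertising = [10, 12, 15, 17, 20, 22]
sales = [40, 44, 52, 55, 63, 66]

def pearson_correlation(x, y):
    """Pearson correlation coefficient computed directly from its definition."""
    mean_x = statistics.mean(x)
    mean_y = statistics.mean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

mean_sales = statistics.mean(sales)      # measure of central tendency
median_sales = statistics.median(sales)  # another measure of central tendency
sd_sales = statistics.stdev(sales)       # measure of variation (sample standard deviation)
r = pearson_correlation(advertising, sales)  # strength of linear relationship

print(f"mean = {mean_sales:.2f}, median = {median_sales}, sd = {sd_sales:.2f}, r = {r:.3f}")
```

Interpreting these numbers (for instance, deciding whether a high correlation justifies a larger advertising budget) is the separate, final stage described in (v).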
Functions and Limitations of Statistics:
The functions of statistics are as follows:
(i) It presents facts in a definite form. Numerical expressions are convincing and, therefore, one of the most important functions of statistics is to present statements in a precise and definite form.
(ii) It simplifies a mass of figures. Data presented in the form of tables, graphs or diagrams, averages or coefficients are simple to understand.
(iii) It facilitates comparison. Once the data are simplified, they can be compared with other similar data. Without such comparison the figures would be useless.
(iv) It helps in prediction. Plans and policies of organisations are invariably formulated well in advance of their implementation. Knowledge of future trends is very useful in framing suitable policies and plans.
(v) It helps in formulating and testing hypotheses. Statistical methods like the z-test, t-test and χ²-test are extremely helpful in formulating and testing hypotheses and in developing new theories.
(vi) It helps in the formulation of suitable policies. Statistics provides the basic material for framing suitable policies. It helps in estimating exports, imports or production programmes in the light of changes that may occur.
(vii) Statistics indicates trend behavior. Statistical techniques such as correlation, regression, time series analysis etc. are useful in forecasting future events.
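As an illustration of the hypothesis testing mentioned in (v), here is a minimal sketch of a two-sided one-sample z-test in Python. It assumes the population standard deviation is known; the function name one_sample_z_test and all the figures are hypothetical and chosen only for the example.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z_test(sample_mean, hypothesised_mean, population_sd, n):
    """Return the z statistic and two-sided p-value for a one-sample z-test."""
    standard_error = population_sd / sqrt(n)
    z = (sample_mean - hypothesised_mean) / standard_error
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical figures: a sample of 64 items with mean 52, tested against a
# hypothesised mean of 50 with a known population standard deviation of 8.
z, p_value = one_sample_z_test(52, 50, 8, 64)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

With these made-up figures the p-value falls just below 0.05, so the hypothesis that the mean is 50 would be rejected at the 5% level of significance.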
Limitations of statistics are as follows:
(i) Statistics deals only with quantitative characteristics. Statistics are numerical statements of facts. Data which cannot be expressed in numbers are incapable of statistical analysis. Qualitative characteristics like honesty, efficiency, intelligence etc. cannot be studied directly.
(ii) Statistics deals with aggregates, not with individuals. Since statistics deals with aggregates of facts, the study of individual measurements lies outside the scope of statistics.
(iii) Statistical laws are not perfectly accurate. Statistics deals with characteristics which are affected by a multiplicity of causes, and it is not possible to study the effect of each of these factors. Due to this limitation, the results obtained are not perfectly accurate but only an approximation.
(iv) Statistical results are only an average. Statistical results reveal only the average behavior. The conclusions obtained statistically are not universally true; they are true only under certain conditions.
(v) Statistics is only one of the methods of studying a problem. Statistical tools do not provide the best solution under all circumstances.
(vi) Statistics can be misused. The greatest limitation of statistics is that they are liable to be misused. Data placed in the hands of an inexperienced person may lead to wrong results. Only persons having a fundamental knowledge of statistical methods can handle the data properly.
Types of statistical data:
Statistical data are of two types:
(a) Primary data
(b) Secondary data.
Primary Data: Data which are collected for the first time for a specific purpose are known as Primary data.
For example: Population census, National income collected by government, Textile Bulletin (Monthly), Reserve Bank of India Bulletin (Monthly) etc.
Secondary Data: Data which have been collected by someone else and are used in an investigation are known as Secondary data. Data are primary to the collector, but secondary to the user.
For example: Statistical abstract of the Indian Union, Monthly abstract of statistics, Monthly statistical digest, International Labour Bulletin (Monthly).
Merits and Demerits of Primary Data:
Merits:
(a) They are reliable and accurate.
(b) If the data are found to be wrong during collection, they can be checked again by cross-examination.
(c) They are more suitable if the field of enquiry is small.
Demerits:
(a) If the field of enquiry is too wide, primary data are not suitable.
(b) Collection of primary data is costly and time-consuming.
(c) Personal bias, prejudice and whims may affect the data.
Merits and Demerits of Secondary Data:
Merits:
(a) While using secondary data, time and labour are saved.
(b) They may also be obtained from unpublished sources.
(c) If secondary data are available, they are much quicker to obtain than primary data.
Demerits:
(a) The degree of accuracy may not be acceptable.
(b) Secondary data may or may not fit the needs of the project.
(c) The data may be influenced by the personal bias of the investigator.
Difference between Primary Data and Secondary Data:
(a) Primary data are those which are collected for the first time and are thus original in character, while Secondary data are those which have already been collected by someone else.
(b) Primary data are in the form of raw material, whereas Secondary data are in the form of finished products.
(c) Primary data are collected directly from the people related to the enquiry, while Secondary data are collected from published materials.
(d) Data are primary in the hands of the institution collecting them, while they are secondary for all others.
Sources of Secondary Data:
(a) Official publications of the central and state governments and district boards.
(b) Publications of research institutions, universities etc.
(c) Economic journals.
(d) Commercial journals.
(e) Reports of committees and commissions.
(f) Publications of trade associations, Chambers of Commerce etc.
Precautions in the use of Secondary Data: The following aspects should be considered before using secondary data:
(i) Suitability: Before using secondary data, the investigator must check whether they are suitable for the present purpose.
(ii) Adequacy: After being satisfied about the suitability of the data, the investigator has to determine whether they are adequate for the present purpose of the investigation.
(iii) Dependability: Dependability of secondary data is determined by the following factors:
(a) The authority which collected the data.
(b) The sampling procedure followed.
(c) The status of the investigator.
(iv) Units in which data are available.
Qualities of Secondary Data:
(a) Data should be reliable.
(b) Data should be suitable for the purpose of the investigator.
(c) Data should be adequate.
(d) Data should be collected by a trained investigator.
Methods of collecting primary Data
(a) Direct Personal Observation: Under this method, the investigator collects the data personally from the persons concerned. The information obtained under this method is original in nature. This method is suitable when the field of enquiry is small.
(b) Indirect Oral Investigation: Under this method, the investigator collects the data from third parties capable of supplying the necessary information. This method is suitable where the information to be obtained is of a complex nature and the informants cannot be approached directly.
(c) Schedule and questionnaire: A list of questions regarding the enquiry is prepared and printed. Data are collected in either of the following ways:
(i) By sending the questionnaire to the persons concerned with a request to answer the questions and return the questionnaire.
(ii) By sending the questionnaire through enumerators who help the informants.
(d) Local reports: This method gives only approximate results at a low cost.
Questionnaire
A questionnaire is simply a list of questions on a printed sheet relating to the survey, which the investigator asks the informants; the answers of the informants are noted down against the respective questions on the sheet. The choice of questions is a very important part of the enquiry, whatever its nature.
Characteristics of an ideal Questionnaire:
(i) The schedule of questions must not be lengthy.
(ii) It should be clear and simple.
(iii) Questions should be arranged in a logical sequence.
(iv) Each question should be brief and must aim at some particular information necessary for the investigation.
(v) Questions on personal matters like income or property should be avoided.
(vi) The units of information should be clearly shown in the sheet.
Tabulation
Tabulation refers to the systematic arrangement of information in rows and columns. Rows are horizontal arrangements and columns are vertical arrangements. In simple words, tabulation is a layout of figures in rectangular form with appropriate headings to explain the different rows and columns. The main purpose of a table is to simplify the presentation and to facilitate comparisons.
According to Neiswanger, "A statistical table is a systematic organisation of data in columns and rows."
The principal objectives of tabulation are stated below:
(i) To make complex data simple: When data are arranged systematically in a table, they become more meaningful and can be easily understood.
(ii) To facilitate comparison: When different data sets are presented in tables, it becomes possible to compare them.
(iii) To economize space: A statistical table furnishes maximum information relating to the study in minimum space.
(iv) To make data fit for analysis and interpretation: Tabulation serves as a link between the collection of data on the one hand and the analysis of such data on the other. In other words, after tabulating the data, it becomes possible to find their averages, dispersion and correlation. Such statistical measures are necessary for their interpretation.
(v) To provide reference: A statistical table can be used as a source of reference for other studies of a similar nature.
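The idea of laying figures out in rows and columns with headings can be sketched in Python as follows. The survey records, the field choices (department and gender) and the function names tabulate and format_table are all hypothetical, used only to show raw data being condensed into a two-way table.

```python
from collections import Counter

# Hypothetical raw survey records: (department, gender) for each employee.
records = [
    ("Sales", "Male"), ("Sales", "Female"), ("Accounts", "Female"),
    ("Sales", "Male"), ("Accounts", "Male"), ("Accounts", "Female"),
    ("Production", "Male"), ("Production", "Male"), ("Sales", "Female"),
]

def tabulate(records):
    """Return (row_labels, column_labels, counts) for a two-way table."""
    counts = Counter(records)  # missing combinations count as 0
    rows = sorted({dept for dept, _ in records})
    cols = sorted({gender for _, gender in records})
    return rows, cols, counts

def format_table(rows, cols, counts):
    """Lay the counts out in rows and columns with headings and row totals."""
    header = "Department".ljust(12) + "".join(c.rjust(8) for c in cols) + "Total".rjust(8)
    lines = [header]
    for r in rows:
        cells = [counts[(r, c)] for c in cols]
        body = "".join(str(v).rjust(8) for v in cells)
        lines.append(r.ljust(12) + body + str(sum(cells)).rjust(8))
    return "\n".join(lines)

rows, cols, counts = tabulate(records)
table_text = format_table(rows, cols, counts)
print(table_text)
```

Nine scattered records are reduced to a three-row table, which makes comparison between departments immediate and saves space, in line with objectives (i) to (iii).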
Importance of Tabulation:
a) Tabulation makes the data brief. Therefore, they can be easily presented in the form of graphs.
b) Tabulation presents the numerical figures in an attractive form.
c) Tabulation makes complex data simple and, as a result, the data become easy to understand.
d) This form of presentation of data is helpful in finding mistakes.
e) Tabulation is useful in condensing the collected data.
f) Tabulation makes it easy to analyze the data from tables.
g) Tabulation is a very cheap mode of presenting data. It saves time as well as space.
h) Tabulation is a device to summarise large, scattered data, so the maximum information may be obtained from these tables.
Limitations of Tabulation
Tabulation suffers from the following limitations:
a) Tables contain only numerical data. They do not contain details.
b) Qualitative expression is not possible through tables.
c) Tables can be used only by experts to draw conclusions. Laymen may not understand them properly.
Classification of Data
The process of arranging data in groups or classes according to their common characteristics is technically known as classification. Classification is the grouping of related facts into classes. It is the first step in tabulation.
In the words of Secrist, "Classification is the process of arranging data into sequences and groups according to their common characteristics or separating them into different but related parts."
Essentials of classification
a) The classification must be exhaustive so that every unit of the distribution may find a place in one group or another.
b) Classification must conform to the objects of the investigation.
c) All the items constituting a group must be homogeneous.
d) Classification should be elastic so that new facts and figures may easily be adjusted.
e) Classification should be stable. If it is not so and is changed for every enquiry, then the data would not be fit for an enquiry.
f) The classes must not overlap. Each item of the data must be found in one and only one class.
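The requirements that classes be exhaustive and non-overlapping can be checked mechanically. The following Python sketch uses hypothetical marks and class limits, and the function name classify is introduced only for illustration; it counts the items in each class and raises an error if any item fails to fall in exactly one class.

```python
# Hypothetical marks obtained by 12 students (out of 50).
marks = [7, 12, 18, 20, 23, 25, 29, 31, 34, 38, 41, 47]

# Each class is a half-open interval [lower, upper), so adjacent classes
# do not overlap; together they cover 0 up to (but not including) 50.
class_limits = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]

def classify(values, limits):
    """Count the values in each class; reject any value not in exactly one class."""
    frequency = {limit: 0 for limit in limits}
    for v in values:
        matches = [(lo, hi) for lo, hi in limits if lo <= v < hi]
        if len(matches) != 1:
            raise ValueError(f"{v} does not belong to exactly one class")
        frequency[matches[0]] += 1
    return frequency

frequency = classify(marks, class_limits)
for (lo, hi), count in frequency.items():
    print(f"{lo}-{hi}: {count}")
```

Using half-open intervals is one common design choice: a mark of 20 goes to the 20-30 class only, which keeps the classes mutually exclusive at the boundaries.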
Population and Sample
Population: Statistics deals with large bodies of data. Single and unconnected data are not statistics. In the field of a statistical enquiry there may be persons, items or any other similar units. The aggregate of all such units under consideration is called the “Universe” or “Population”.
Sample: If a part is selected out of the universe, then the selected part or portion is known as a sample. A sample is only a part of the universe.
Sample survey: It is a survey under which only a part taken out of the universe is investigated. It is not essential to investigate every individual item of the universe.
Census survey or complete enumeration: Under a census survey, detailed information regarding every individual person or item of a given universe is collected.
Difference between Census and Sample survey: The following are the differences between the Census and Sample methods of investigation:
(a) Under the census method, each and every individual item is investigated, whereas under a sample survey only a part of the universe is investigated.
(b) There is no chance of sampling error in a census survey, whereas sampling error cannot be avoided under a sample survey.
(c) A large number of enumerators is required in a census, whereas a smaller number of enumerators is required in a sample survey.
(d) A census survey is more time-consuming and costly as compared to a sample survey.
(e) The census survey is an old method and it is less systematic than the sample survey.
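The contrast between complete enumeration and a sample survey, and the sampling error mentioned in (b), can be illustrated with a small Python simulation. The universe of household incomes below is randomly generated, hypothetical data, and the sample size of 100 is an arbitrary choice for the sketch.

```python
import random
import statistics

# Hypothetical universe: monthly incomes of 1,000 households (made-up numbers).
rng = random.Random(42)  # fixed seed so the run is repeatable
universe = [rng.randint(10_000, 50_000) for _ in range(1000)]

# Census: every item of the universe is investigated.
census_mean = statistics.mean(universe)

# Sample survey: only a part of the universe (here 100 items) is investigated.
sample = rng.sample(universe, 100)
sample_mean = statistics.mean(sample)

# Sampling error: the gap between the sample estimate and the census value.
sampling_error = sample_mean - census_mean

print(f"census mean    = {census_mean:.2f}")
print(f"sample mean    = {sample_mean:.2f}")
print(f"sampling error = {sampling_error:.2f}")
```

The sample needs only a tenth of the enumeration work, at the price of an estimate that generally differs somewhat from the census figure.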
Merits and Demerits of Census:
Merits:
(a) Since all the individuals of the universe are investigated, the highest degree of accuracy is obtained.
(b) Since every item is investigated and no selection is involved, this method is free from sampling error.
(c) It is more suitable if the field of enquiry is small.
(d) Since all the items of the universe are taken into consideration, all the characteristics of the universe can be studied.
Demerits:
(a) If the field of enquiry is too wide, it is not suitable.
(b) A census is costly and time-consuming, and requires a large number of enumerators.
(c) Personal bias, prejudice and whims of the enumerators may affect the data.
Merits and Demerits of sample survey:
Merits:
(a) Since only a part of the universe is investigated, time and labour are saved.
(b) A smaller number of enumerators is required than in a census.
(c) It is less costly and much quicker to complete than a census survey.
Demerits:
(a) Sampling error cannot be avoided, so the degree of accuracy may not be acceptable.
(b) The sample may or may not represent the universe adequately.
(c) The data may be influenced by the personal bias of the investigator.