Hypothesis in Machine Learning

The concept of a hypothesis is fundamental in Machine Learning and data science endeavours. In the realm of machine learning, a hypothesis serves as an initial assumption made by data scientists and ML professionals when attempting to address a problem. Machine learning involves conducting experiments based on past experiences, and these hypotheses are crucial in formulating potential solutions.

It’s important to note that in machine learning discussions, the terms “hypothesis” and “model” are sometimes used interchangeably. However, a hypothesis represents an assumption, while a model is a mathematical representation employed to test that hypothesis. This section on “Hypothesis in Machine Learning” explores key aspects related to hypotheses in machine learning and their significance.

Table of Contents

  • How does a Hypothesis work?
  • Hypothesis Space and Representation in Machine Learning
  • Hypothesis in Statistics
  • FAQs on Hypothesis in Machine Learning

How does a Hypothesis work?

A hypothesis in machine learning is the model’s presumption regarding the connection between the input features and the result. It is a representation of the mapping function that the algorithm is attempting to discover using the training set. The learning process adjusts the weights that parameterize the hypothesis so as to minimize the discrepancy between the expected and actual outputs; a cost function is used to assess the hypothesis’s accuracy. The objective is to optimize the model’s parameters to achieve the best predictive performance on new, unseen data.
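As a minimal sketch of these ideas (the data, learning rate, and iteration count below are invented for illustration and are not taken from the article), the following defines a simple linear hypothesis, a mean-squared-error cost function, and a few gradient-descent updates of the parameters:

```python
import numpy as np

# Toy training data (illustrative values only)
X = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 8.1])

def hypothesis(x, w, b):
    """A simple linear hypothesis h(x) = w*x + b, parameterized by w and b."""
    return w * x + b

def cost(w, b):
    """Mean squared error between the hypothesis's predictions and the targets."""
    return np.mean((hypothesis(X, w, b) - y) ** 2)

# Learning: adjust the parameters to reduce the cost (plain gradient descent)
w, b, lr = 0.0, 0.0, 0.01
for _ in range(2000):
    pred = hypothesis(X, w, b)
    w -= lr * np.mean(2 * (pred - y) * X)   # d(cost)/dw
    b -= lr * np.mean(2 * (pred - y))       # d(cost)/db

print(f"learned hypothesis: h(x) = {w:.2f}*x + {b:.2f}, cost = {cost(w, b):.4f}")
```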

In most supervised machine learning algorithms, our main goal is to find, from the hypothesis space, a hypothesis that maps the inputs to the proper outputs. The following figure shows the common method of finding a possible hypothesis from the hypothesis space:

[Figure: searching the hypothesis space H for a single hypothesis h]

Hypothesis Space (H)

The hypothesis space is the set of all possible legal hypotheses. It is the set from which the machine learning algorithm determines the single best hypothesis that describes the target function or the outputs.

Hypothesis (h)

A hypothesis is a function that best describes the target in supervised machine learning. The hypothesis that an algorithm comes up with depends upon the data as well as upon the restrictions and bias that we have imposed on the data.

The Hypothesis can be calculated as:

y = mx + b

  • m = slope of the line
  • b = intercept (a worked numeric sketch follows this list)
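For example, the following minimal sketch (the data points, the grid of candidate slopes and intercepts, and the helper name training_error are all invented for illustration) treats a small, finite grid of (m, b) pairs as the hypothesis space H and selects the single hypothesis h with the lowest training error:

```python
import itertools
import numpy as np

# Illustrative training data
X = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.1, 4.9, 7.2])

# A small, finite hypothesis space: every (m, b) pair on a coarse grid
slopes = np.linspace(0.0, 4.0, 9)
intercepts = np.linspace(-2.0, 2.0, 9)
H = list(itertools.product(slopes, intercepts))

def training_error(m, b):
    return np.mean((m * X + b - y) ** 2)

# The learner picks the single hypothesis h in H with the lowest training error
m_best, b_best = min(H, key=lambda h: training_error(*h))
print(f"|H| = {len(H)} candidate hypotheses")
print(f"selected h: y = {m_best:.2f}x + {b_best:+.2f}")
```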

To better understand the hypothesis space and hypothesis, consider the following coordinate plane showing the distribution of some data:

[Figure: scatter plot of the training data on a coordinate plane]

Suppose we now have test data for which we have to determine the outputs or results. The test data is as shown below:

[Figure: the training data together with the unlabelled test points]

We can predict the outcomes by dividing the coordinate as shown below:

[Figure: one possible way of dividing the coordinate plane]

So the test data would yield the following result:

[Figure: the outputs predicted for the test points under this division]

But note here that we could have divided the coordinate plane as:

[Figure: an alternative way of dividing the coordinate plane]

The way in which the coordinate would be divided depends on the data, algorithm and constraints.

  • All the legal possible ways in which we can divide the coordinate plane to predict the outcome of the test data together compose the hypothesis space.
  • Each individual possible way is known as a hypothesis.

Hence, in this example the hypothesis space would be like:

[Figure: several candidate divisions of the plane, i.e. the hypothesis space for this example]

The hypothesis space comprises all possible legal hypotheses that a machine learning algorithm can consider. Hypotheses are formulated based on various algorithms and techniques, including linear regression, decision trees, and neural networks. These hypotheses capture the mapping function transforming input data into predictions.

Hypothesis Formulation and Representation in Machine Learning

Hypotheses in machine learning are formulated based on various algorithms and techniques, each with its own representation. For example, the hypothesis of a linear regression model is:

$h(X) = \theta_0 + \theta_1 X_1 + \theta_2 X_2 + ... + \theta_n X_n$

In the case of complex models like neural networks, the hypothesis may involve multiple layers of interconnected nodes, each performing a specific computation.
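As a rough sketch of these two kinds of representation (the arrays, parameter values, and function names below are made up for illustration), the following evaluates a linear hypothesis and a tiny one-hidden-layer network hypothesis on the same inputs:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))                   # 5 examples, 3 features (made up)
theta = np.array([0.5, -1.0, 2.0, 0.3])       # [theta_0, theta_1, theta_2, theta_3]

def linear_hypothesis(X, theta):
    """h(X) = theta_0 + theta_1*X_1 + ... + theta_n*X_n"""
    return theta[0] + X @ theta[1:]

def mlp_hypothesis(X, W1, b1, W2, b2):
    """A one-hidden-layer network: each layer applies its own learned computation."""
    hidden = np.maximum(0.0, X @ W1 + b1)     # ReLU hidden layer
    return hidden @ W2 + b2

W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)
W2, b2 = rng.normal(size=4), 0.0

print("linear hypothesis :", np.round(linear_hypothesis(X, theta), 2))
print("network hypothesis:", np.round(mlp_hypothesis(X, W1, b1, W2, b2), 2))
```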

Hypothesis Evaluation:

The process of machine learning involves not only formulating hypotheses but also evaluating their performance. This evaluation is typically done using a loss function or an evaluation metric that quantifies the disparity between predicted outputs and ground truth labels. Common evaluation metrics include mean squared error (MSE), accuracy, precision, recall, F1-score, and others. By comparing the predictions of the hypothesis with the actual outcomes on a validation or test dataset, one can assess the effectiveness of the model.
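As a quick, self-contained sketch (the labels and predictions below are invented), these metrics can be computed directly from a hypothesis's predictions:

```python
import numpy as np

# Illustrative ground-truth labels and a hypothesis's predictions
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0])

tp = np.sum((y_pred == 1) & (y_true == 1))   # true positives
fp = np.sum((y_pred == 1) & (y_true == 0))   # false positives
fn = np.sum((y_pred == 0) & (y_true == 1))   # false negatives

accuracy  = np.mean(y_pred == y_true)
precision = tp / (tp + fp)
recall    = tp / (tp + fn)
f1        = 2 * precision * recall / (precision + recall)

mse = np.mean((y_pred - y_true) ** 2)        # mean squared error (regression-style)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f} mse={mse:.2f}")
```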

Hypothesis Testing and Generalization:

Once a hypothesis is formulated and evaluated, the next step is to test its generalization capabilities. Generalization refers to the ability of a model to make accurate predictions on unseen data. A hypothesis that performs well on the training dataset but fails to generalize to new instances is said to suffer from overfitting. Conversely, a hypothesis that generalizes well to unseen data is deemed robust and reliable.

The process of hypothesis formulation, evaluation, testing, and generalization is often iterative in nature. It involves refining the hypothesis based on insights gained from model performance, feature importance, and domain knowledge. Techniques such as hyperparameter tuning, feature engineering, and model selection play a crucial role in this iterative refinement process.
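A minimal sketch of such a generalization check (the synthetic data, split sizes, and polynomial degrees below are assumptions made for illustration): fit hypotheses of increasing flexibility on a training split and compare training error with validation error; a large gap signals overfitting.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 30)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)   # noisy synthetic data

# Random train/validation split to test generalization on unseen points
idx = rng.permutation(x.size)
train_idx, val_idx = idx[:20], idx[20:]

def mse(coeffs, i):
    return np.mean((np.polyval(coeffs, x[i]) - y[i]) ** 2)

for degree in (1, 3, 9):
    coeffs = np.polyfit(x[train_idx], y[train_idx], degree)  # fit hypothesis on the training split
    print(f"degree {degree}: train MSE = {mse(coeffs, train_idx):.3f}, "
          f"validation MSE = {mse(coeffs, val_idx):.3f}")
```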

Hypothesis in Statistics

In statistics, a hypothesis refers to a statement or assumption about a population parameter. It is a proposition or educated guess that helps guide statistical analyses. There are two types of hypotheses: the null hypothesis (H0) and the alternative hypothesis (H1 or Ha).

  • Null Hypothesis (H0): This hypothesis suggests that there is no significant difference or effect, and any observed results are due to chance. It often represents the status quo or a baseline assumption.
  • Alternative Hypothesis (H1 or Ha): This hypothesis contradicts the null hypothesis, proposing that there is a significant difference or effect in the population. It is what researchers aim to support with evidence.

FAQs on Hypothesis in Machine Learning

Q. How does the training process use the hypothesis?

The learning algorithm uses the hypothesis as a guide to minimise the discrepancy between expected and actual outputs by adjusting its parameters during training.

Q. How is the hypothesis’s accuracy assessed?

Usually, a cost function that measures the difference between expected and actual values is used to assess accuracy. The aim is to optimize the model so as to minimize this cost.

Q. What is Hypothesis testing?

Hypothesis testing is a statistical method for determining whether or not a hypothesis is correct. The hypothesis can be about two variables in a dataset, about an association between two groups, or about a situation.

Q. What distinguishes the null hypothesis from the alternative hypothesis in machine learning experiments?

The null hypothesis (H0) assumes no significant effect, while the alternative hypothesis (H1 or Ha) contradicts H0, suggesting a meaningful impact. Statistical testing is employed to decide between these hypotheses.




6 - Searching the hypothesis space

Published online by Cambridge University Press:  05 August 2012

In Chapter 5 we introduced the main notions of machine learning, with particular regard to hypothesis and data representation, and we saw that concept learning can be formulated in terms of a search problem in the hypothesis space H. As H is in general very large, or even infinite, well-designed strategies are required in order to perform the search for good hypotheses efficiently. In this chapter we will discuss these general ideas about search in more depth.

When concepts are represented using a symbolic or logical language, algorithms for searching the hypothesis space rely on two basic features:

  • a criterion for checking the quality (performance) of a hypothesis;
  • an algorithm for comparing two hypotheses with respect to the generality relation.

In this chapter we will discuss the above features in both the propositional and the relational settings, with specific attention to the covering test.

Guiding the search in the hypothesis space

If the hypothesis space is endowed with the more-general-than relation (as is always the case in symbolic learning), hypotheses can be organized into a lattice, as represented in Figure 5.6. This lattice can be explored by moving from more general to more specific hypotheses (top-down strategies) or from more specific to more general ones (bottom-up strategies) or by a combination of the two. Both directions of search rely on the definition of suitable operators, namely, generalization operators for moving up in the lattice and specialization operators for moving down.
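The following is a toy, hypothetical sketch of a top-down strategy of this kind (it is not taken from the book; the dataset, attribute names, and scoring rule are invented for illustration). It starts from the most general conjunctive hypothesis and repeatedly applies a specialization operator that adds one attribute test, keeping the candidate that best separates positive from negative examples:

```python
# Toy dataset: each example is a dict of attribute values plus a boolean label.
data = [
    ({"sky": "sunny", "wind": "strong", "humidity": "low"},  True),
    ({"sky": "sunny", "wind": "weak",   "humidity": "low"},  True),
    ({"sky": "rainy", "wind": "strong", "humidity": "high"}, False),
    ({"sky": "sunny", "wind": "strong", "humidity": "high"}, False),
]

def covers(hypothesis, example):
    """A conjunctive hypothesis covers an example if every attribute test matches."""
    return all(example[attr] == val for attr, val in hypothesis)

def quality(hypothesis):
    """Score a hypothesis: covered positives minus covered negatives."""
    return sum((1 if label else -1) for ex, label in data if covers(hypothesis, ex))

# Top-down search: start from the most general hypothesis (the empty conjunction,
# which covers everything) and repeatedly apply the specialization operator
# "add one attribute test" until no negative example is covered.
h = frozenset()
tests = {(attr, val) for ex, _ in data for attr, val in ex.items()}
while any(covers(h, ex) and not label for ex, label in data):
    h = max((h | {t} for t in tests - h), key=quality)

print("learned hypothesis:", sorted(h))   # e.g. [('humidity', 'low')]
```

A bottom-up strategy would run the lattice in the opposite direction, starting from a maximally specific hypothesis and applying generalization operators that drop tests.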


  • Chapter: Searching the hypothesis space
  • Authors: Lorenza Saitta (Università degli Studi del Piemonte Orientale Amedeo Avogadro), Attilio Giordana (Università degli Studi del Piemonte Orientale Amedeo Avogadro), Antoine Cornuéjols
  • Book: Phase Transitions in Machine Learning
  • Online publication: 05 August 2012
  • Chapter DOI: https://doi.org/10.1017/CBO9780511975509.008



What is the difference between hypothesis space and representational capacity?

I am reading Goodfellow et al.'s Deep Learning book. I found it difficult to understand the difference between the definition of the hypothesis space and the representational capacity of a model.

In Chapter 5, it is written about the hypothesis space:

One way to control the capacity of a learning algorithm is by choosing its hypothesis space, the set of functions that the learning algorithm is allowed to select as being the solution.

And about representational capacity:

The model specifies which family of functions the learning algorithm can choose from when varying the parameters in order to reduce a training objective. This is called the representational capacity of the model.

If we take the linear regression model as an example and allow our output $y$ to take polynomial inputs, I understand the hypothesis space as the ensemble of quadratic functions taking input $x$, i.e. $y = a_0 + a_1x + a_2x^2$.

How is it different from the definition of the representational capacity, where the parameters are $a_0$, $a_1$ and $a_2$?

  • machine-learning
  • terminology
  • computational-learning-theory
  • hypothesis-class


3 Answers

Consider a target function $f: x \mapsto f(x)$ .

A hypothesis refers to an approximation of $f$ . A hypothesis space refers to the set of possible approximations that an algorithm can create for $f$ . The hypothesis space consists of the set of functions the model is limited to learn. For instance, linear regression can be limited to linear functions as its hypothesis space, or it can be expanded to learn polynomials.

The representational capacity of a model determines its flexibility, i.e. its ability to fit a variety of functions (which functions the model is able to learn). It specifies the family of functions the learning algorithm can choose from.
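To make the distinction concrete, here is a small illustrative sketch (an editorial addition, not part of the original answer; the data points are invented): enlarging the hypothesis space from straight lines to quadratics can only lower the best achievable training error, which is one informal sense in which the larger space has more capacity.

```python
import numpy as np

# Invented data points
x = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
y = np.array([1.0, 1.4, 2.1, 3.3, 5.2])

# Hypothesis space 1: straight lines          y = a0 + a1*x
# Hypothesis space 2: quadratics (a superset) y = a0 + a1*x + a2*x^2
for degree in (1, 2):
    coeffs = np.polyfit(x, y, degree)                      # best hypothesis in that space
    train_mse = np.mean((np.polyval(coeffs, x) - y) ** 2)
    print(f"degree {degree}: coefficients {np.round(coeffs, 2)}, train MSE {train_mse:.3f}")
```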


  • Does it mean that the set of functions described by the representational capacity is strictly included in the hypothesis space? By definition, is it possible to have functions in the hypothesis space NOT described in the representational capacity? –  Qwarzix Aug 23, 2018 at 8:43
  • It's still pretty confusing to me. Most sources say that a "model" is an instance (after execution/training on data) of a "learning algorithm". How, then, can a model specify the family of functions the learning algorithm can choose from? It doesn't make sense to me. The authors of the book should've explained these concepts in more depth. –  Talendar Oct 9, 2020 at 13:09

A hypothesis space is defined as the set of functions $\mathcal H$ that can be chosen by a learning algorithm to minimize loss (in general).

$$\mathcal H = \{h_1, h_2, \dots, h_n\}$$

The hypothesis class can be finite or infinite. For example, a discrete set of shapes to encircle a certain portion of the input space is a finite hypothesis space, whereas the hypothesis space of parametrized functions such as neural nets and linear regressors is infinite.

Although the term representational capacity is not in vogue, a rough definition would be: the representational capacity of a model is the ability of its hypothesis space to approximate a complex function with zero error; such a function can only be approximated by hypothesis spaces whose representational capacity equals or exceeds the representational capacity required to approximate it.

The most popular measure of representational capacity is the VC dimension of a model. The upper bound for the VC dimension ($d$) of a finite hypothesis space is: $$d \leq \log_2|\mathcal H|$$ where $|\mathcal H|$ is the cardinality of the hypothesis space.
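The following is a small illustrative sketch (not part of the answer above; the threshold grid and test points are invented) that checks shattering by brute force for a finite class of 1-D threshold classifiers and compares the empirically found VC dimension with the $d \leq \log_2|\mathcal H|$ bound quoted above:

```python
import numpy as np
from itertools import product

def threshold_classifiers(thresholds):
    """Hypothesis class: h_t(x) = 1 if x >= t else 0, one hypothesis per threshold t."""
    return [lambda x, t=t: (x >= t).astype(int) for t in thresholds]

def shatters(hypotheses, points):
    """True if every 0/1 labelling of `points` is produced by some hypothesis."""
    produced = {tuple(int(v) for v in h(points)) for h in hypotheses}
    return all(lab in produced for lab in product((0, 1), repeat=len(points)))

H = threshold_classifiers(np.linspace(-2.0, 2.0, 41))            # a finite hypothesis class
print("shatters 1 point :", shatters(H, np.array([0.0])))        # True  -> VC dim >= 1
print("shatters 2 points:", shatters(H, np.array([0.0, 1.0])))   # False -> VC dim is 1
print("finite-H bound   : d <= log2|H| =", round(float(np.log2(len(H))), 2))
```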

A hypothesis space/class is the set of functions that the learning algorithm considers when picking one function to minimize some risk/loss functional.

The capacity of a hypothesis space is a number or bound that quantifies the size (or richness) of the hypothesis space, i.e. the number (and type) of functions that can be represented by the hypothesis space. So a hypothesis space has a capacity. The two most famous measures of capacity are VC dimension and Rademacher complexity.

In other words, the hypothesis class is the object and the capacity is a property (that can be measured or quantified) of this object, but there is not a big difference between hypothesis class and its capacity, in the sense that a hypothesis class naturally defines a capacity, but two (different) hypothesis classes could have the same capacity.

Note that representational capacity (unlike capacity, which is common!) is not a standard term in computational learning theory, while hypothesis space/class is commonly used. For example, this famous book on machine learning and learning theory uses the term hypothesis class in many places, but it never uses the term representational capacity.

Your book's definition of representational capacity is bad, in my opinion, if representational capacity is supposed to be a synonym for capacity, given that that definition also coincides with the definition of hypothesis class, so your confusion is understandable.

  • I agree with you. The authors of the book should've explained these concepts in more depth. Most sources say that a "model" is an instance (after execution/training on data) of a "learning algorithm". How, then, can a model specify the family of functions the learning algorithm can choose from? Also, as you pointed out, the definitions of the terms "hypothesis space" and "representational capacity" given by the authors are practically the same, although they use the terms as if they represent different concepts. –  Talendar Oct 9, 2020 at 13:18


The hypothesis is a common term in Machine Learning and data science projects. As we know, machine learning is one of the most powerful technologies across the world, which helps us to predict results based on past experiences. Moreover, data scientists and ML professionals conduct experiments that aim to solve a problem. These ML professionals and data scientists make an initial assumption for the solution of the problem.

This assumption in Machine learning is known as Hypothesis. In Machine Learning, at various times, Hypothesis and Model are used interchangeably. However, a Hypothesis is an assumption made by scientists, whereas a model is a mathematical representation that is used to test the hypothesis. In this topic, "Hypothesis in Machine Learning," we will discuss a few important concepts related to a hypothesis in machine learning and their importance. So, let's start with a quick introduction to Hypothesis.

A hypothesis is just a guess based on some known facts but has not yet been proven. A good hypothesis is testable; it results in either true or false.

Example: Let's understand the hypothesis with a common example. Suppose a scientist claims that ultraviolet (UV) light can damage the eyes; then it may also cause blindness.

In this example, the scientist just claims that UV rays are harmful to the eyes, but we assume that they may cause blindness. However, it may or may not be possible. Hence, these types of assumptions are called hypotheses.

The hypothesis is one of the commonly used concepts of statistics in Machine Learning. It is specifically used in Supervised Machine learning, where an ML model learns a function that best maps the input to corresponding outputs with the help of an available dataset.

There are some common methods given to find out a possible hypothesis from the hypothesis space, where the hypothesis space is represented by H and a hypothesis by h. These are defined as follows:

Hypothesis Space (H): It is used by supervised machine learning algorithms to determine the best possible hypothesis that describes the target function or best maps inputs to outputs. It is often constrained by the choice of the framing of the problem, the choice of model, and the choice of model configuration.

Hypothesis (h): It is a single candidate function from the hypothesis space that maps inputs to proper outputs. It is primarily based on the data as well as the bias and restrictions applied to the data, and it can be evaluated and used to make predictions.

The hypothesis (h) can be formulated in machine learning as follows:

y = mx + c

Where,

  • y: range
  • m: slope of the line, which divides the test data, i.e. the change in y divided by the change in x
  • x: domain
  • c: intercept (constant)

Example: Let's understand the hypothesis (h) and hypothesis space (H) with a two-dimensional coordinate plane showing the distribution of data as follows:

The hypothesis space (H) is the collection of all legal, possible ways to divide the coordinate plane so that it best maps inputs to proper outputs.

Further, each individual possible way is called a hypothesis (h).

Similar to the hypothesis in machine learning, the hypothesis in statistics is also considered an assumption about the output. However, it is falsifiable, which means it can fail in the presence of sufficient evidence.

Unlike in machine learning, we cannot accept just any hypothesis in statistics, because it is only a proposed result based on probability. Before starting work on an experiment, we must be aware of two important types of hypotheses, as follows:

  • Null Hypothesis: A type of statistical hypothesis which states that there is no statistically significant effect in the given set of observations. It is also known as a conjecture and is used in quantitative analysis to test theories about markets, investment, and finance to decide whether an idea is true or false.
  • Alternative Hypothesis: A direct contradiction of the null hypothesis, which means that if one of the two hypotheses is true, then the other must be false. In other words, an alternative hypothesis is a type of statistical hypothesis which states that there is some significant effect in the given set of observations.

The significance level is the primary thing that must be set before starting an experiment. It defines the tolerance for error, i.e. the level at which an effect can be considered significant. In practice a 5% significance level (corresponding to a 95% confidence level) is commonly used, so results falling in the remaining 5% are treated as evidence against the null hypothesis. The significance level also determines the critical or threshold value: for example, if the confidence level is set to 98%, then the significance level, and hence the critical value for the p-value, is 0.02.

The p-value in statistics is defined as the evidence against a null hypothesis. In other words, the p-value is the probability of obtaining the observed data, or something equally or more extreme, by random chance under the null hypothesis.

The smaller the p-value, the stronger the evidence against the null hypothesis, which means the null hypothesis can be rejected in testing; and vice versa. It is always represented in decimal form, such as 0.035.

Whenever a statistical test is carried out on a sample to find the p-value, the decision always depends upon the critical value. If the p-value is less than the critical value, it shows the effect is significant, and the null hypothesis can be rejected. If it is higher than the critical value, it shows that there is no significant effect and hence we fail to reject the null hypothesis.
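As a hedged illustration of this decision rule (the sample values, the random seed, and the use of SciPy's one-sample t-test are assumptions made for this sketch, not part of the article), the following compares a p-value against a chosen significance level:

```python
import numpy as np
from scipy import stats

# Invented sample: does the population mean differ from 50?
rng = np.random.default_rng(7)
sample = rng.normal(loc=52.0, scale=5.0, size=30)

alpha = 0.05                                   # significance level
t_stat, p_value = stats.ttest_1samp(sample, popmean=50.0)

print(f"t = {t_stat:.3f}, p-value = {p_value:.4f}")
if p_value < alpha:
    print("p < alpha: reject the null hypothesis H0 (population mean = 50)")
else:
    print("p >= alpha: fail to reject the null hypothesis H0")
```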

In the task of mapping inputs to outputs in supervised machine learning, the hypothesis is a very useful concept that helps to approximate a target function. It appears in all analytics domains and is also considered one of the important factors in checking whether a change should be introduced or not. It is also used to assess the efficiency and the performance of models over the entire training dataset.

Hence, in this topic, we have covered various important concepts related to the hypothesis in machine learning and statistics and some important parameters such as p-value, significance level, etc., to understand hypothesis concepts in a better way.






Computational Learning Theory

Sample Complexity for Finite Hypothesis Spaces

  • The growth in the number of required training examples with problem size is called the sample complexity of the learning problem.
  • We will consider only consistent learners, which are those that maintain a training error of 0.
  • We can derive a bound on the number of training examples required by any consistent learner!
  • Fact: Every consistent learner outputs a hypothesis belonging to the version space.
  • Therefore, we need to bound the number of examples needed to assure that the version space contains no unacceptable hypothesis.
  • The version space $VS_{H,D}$ is said to be $\epsilon$-exhausted with respect to $c$ and $\cal{D}$ if every hypothesis $h$ in $VS_{H,D}$ has error less than $\epsilon$ with respect to $c$ and $\cal{D}$: \[(\forall h \in VS_{H,D})\ error_{\cal{D}}(h) < \epsilon \] (A numeric sketch of the sample-size bound that follows from this definition is given below.)
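The following is a small numeric sketch of the sample-size bound commonly derived from this $\epsilon$-exhaustion argument for a consistent learner over a finite hypothesis space, $m \geq \frac{1}{\epsilon}(\ln|H| + \ln(1/\delta))$; the particular values of $|H|$, $\epsilon$ and $\delta$ below are invented for illustration.

```python
import math

def sample_complexity(h_size, epsilon, delta):
    """Number of examples sufficient for a consistent learner over a finite
    hypothesis space H so that, with probability at least 1 - delta, the
    version space is epsilon-exhausted:
        m >= (1/epsilon) * (ln|H| + ln(1/delta))
    """
    return math.ceil((math.log(h_size) + math.log(1.0 / delta)) / epsilon)

# Example: |H| = 2**20 hypotheses, epsilon = 0.1, delta = 0.05
print(sample_complexity(2 ** 20, epsilon=0.1, delta=0.05))   # -> 169 examples
```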

José M. Vidal .

Sparse Regression Ensembles in Infinite and Finite Hypothesis Spaces

  • Published: July 2002
  • Volume 48, pages 189–218 (2002)


  • Gunnar Rätsch, Ayhan Demiriz & Kristin P. Bennett


We examine methods for constructing regression ensembles based on a linear program (LP). The ensemble regression function consists of linear combinations of base hypotheses generated by some boosting-type base learning algorithm. Unlike the classification case, for regression the set of possible hypotheses producible by the base learning algorithm may be infinite. We explicitly tackle the issue of how to define and solve ensemble regression when the hypothesis space is infinite. Our approach is based on a semi-infinite linear program that has an infinite number of constraints and a finite number of variables. We show that the regression problem is well posed for infinite hypothesis spaces in both the primal and dual spaces. Most importantly, we prove there exists an optimal solution to the infinite hypothesis space problem consisting of a finite number of hypotheses. We propose two algorithms for solving the infinite and finite hypothesis problems. One uses a column generation simplex-type algorithm and the other adopts an exponential barrier approach. Furthermore, we give sufficient conditions for the base learning algorithm and the hypothesis set to be used for infinite regression ensembles. Computational results show that these methods are extremely promising.




About this article

Rätsch, G., Demiriz, A. & Bennett, K.P. Sparse Regression Ensembles in Infinite and Finite Hypothesis Spaces. Machine Learning 48 , 189–218 (2002). https://doi.org/10.1023/A:1013907905629



COMMENTS

  1. Hypothesis in Machine Learning

    A hypothesis is a function that best describes the target in supervised machine learning. The hypothesis that an algorithm would come up with depends upon the data and also depends upon the restrictions and bias that we have imposed on the data. The hypothesis can be calculated as: y = mx + b. Where, y = range. m = slope of the line.

  2. PDF LECTURE 16: LEARNING THEORY

    Infinite Hypothesis Space. The previous analysis was restricted to finite hypothesis spaces. Some infinite hypothesis spaces are more expressive than others - e.g., rectangles vs. 17-sided convex polygons vs. general convex polygons; linear threshold functions vs. a conjunction of LTUs. Need a measure of the expressiveness of an infinite

  3. PDF CSC 411 Lecture 23-24: Learning theory

    Finite hypothesis space. A first simple example of PAC learnable spaces - finite hypothesis spaces. Theorem (uniform convergence for finite H): Let H be a finite hypothesis space and $\ell: Y \times Y \to [0,1]$ be a bounded loss function; then H has the uniform convergence property with $M(\epsilon, \delta) = \frac{\ln(2|H|/\delta)}{2\epsilon^2}$ and is therefore PAC learnable by the ERM algorithm. Proof.

  4. PDF ml learning with finite hypothesis sets

    Learning Bound for Finite H - Consistent Case. Theorem: let H be a finite set of functions from X to $\{0, 1\}$ and L an algorithm that for any target concept $c \in H$ and sample S returns a consistent hypothesis $h_S$: $\widehat{R}(h_S) = 0$. Then, for any $\epsilon, \delta > 0$, with probability at least $1 - \delta$, ...

  5. PDF ml learning with infinite hypothesis sets

    VC Dimension (Vapnik & Chervonenkis, 1968-1971; Vapnik, 1982, 1995, 1998). Definition: the VC-dimension of a hypothesis set H is defined by $VCdim(H) = \max\{m : \Pi_H(m) = 2^m\}$, where $\Pi_H$ is the growth function. Thus, the VC-dimension is the size of the largest set that can be fully shattered by H. Purely combinatorial notion.

  6. PDF 10-806 Foundations of Machine Learning and Data Science

    10-806 Foundations of Machine Learning and Data Science, Maria-Florina Balcan, Lecture 4-5: September 21st and September 23rd, 2015. Sample Complexity Results for Infinite Hypothesis Spaces. The Shattering Coefficient: Let C be a concept class over an instance space X, i.e. a set of functions from X to $\{0,1\}$ (where both C and X may be infinite).

  7. PDF CS 391L: Machine Learning: Computational Learning Theory

    Allows unlimited data and computational resources. PAC Model: only requires learning a Probably Approximately Correct concept: learn a decent approximation most of the time. Requires polynomial sample complexity and computational complexity. Learning in the limit model is too strong.

  8. PDF CS 446 Machine Learning Fall 2016 OCT 11, 2016 Computational Learning

    hypothesis space, i.e., if you have a small hypothesis space, you do not have to see too many examples. There is also a trade-off in the choice of hypothesis space: if the space is small, then it generalizes well, but it may not be expressive enough. Consistent Learners: Using the results from the previous section, we can get this general scheme for PAC ...

  9. PDF Foundations of Machine Learning and Data Science

    The VC-dimension of a hypothesis space H is the cardinality of the largest set S that can be shattered by H. Definition: If arbitrarily large finite sets can be shattered by H, then $VCdim(H) = \infty$. H shatters S if $|H_S| = 2^{|S|}$. Shattering, VC-dimension: The VC-dimension of a hypothesis space H is the ...

  10. PDF Learning Theory Part 1: PAC Model

    What if the hypothesis space is not finite? Q: If H is infinite (e.g. the class of perceptrons), what measure of hypothesis-space complexity can we use in place of |H|? A: the largest subset of $\mathcal{X}$ for which H can guarantee zero training error, regardless of the target function. This is known as the Vapnik-Chervonenkis dimension (VC ...

  11. Foundations of Learning from Data

    We will see that PAC can provide learnability bounds for a finite hypothesis space, and by using the Vapnik-Chervonenkis (VC) dimension, such results can be extended to an infinite hypothesis space. These results will give us theoretical characterizations of the difficulty of machine learning problems and the capabilities of certain models.

  12. PDF COS 511: Theoretical Machine Learning

    finite hypothesis spaces since we are using the cardinality of H. This led us to briefly discuss the generalization of Occam's Razor to infinite hypothesis spaces at the end of last week's lecture. Sample Complexity for Infinite Hypothesis Spaces: In order to generalize Occam's Razor to infinite hypothesis spaces, we have to somewhat ...

  13. A Gentle Introduction to Computational Learning Theory

    Computational learning theory, or statistical learning theory, refers to mathematical frameworks for quantifying learning tasks and algorithms. These are sub-fields of machine learning that a machine learning practitioner does not need to know in great depth in order to achieve good results on a wide range of problems. Nevertheless, it is a sub-field where having a high-level understanding of ...

  14. PDF Computational Learning Theory

    Machine Learning, Chapter 7, Part 2 CSE 574, Spring 2004 Sample Complexity for infinite hypothesis spaces • Another measure of the complexity of H called Vapnik-Chervonenkis dimension, or VC(H) • We will use VC(H) instead of |H| • Results in tighter bounds • Allows characterizing sample complexity of infinite hypothesis spaces and is ...

  15. What is a Hypothesis in Machine Learning?

    Supervised machine learning is often described as the problem of approximating a target function that maps inputs to outputs. This description is characterized as searching through and evaluating candidate hypothesis from hypothesis spaces. The discussion of hypotheses in machine learning can be confusing for a beginner, especially when "hypothesis" has a distinct, but related meaning […]

  16. Sample complexity

    Definition: Let $X$ be a space which we call the input space, and $Y$ be a space which we call the output space, and let $Z$ denote the product $X \times Y$. For example, in the setting of binary classification, $X$ is typically a finite-dimensional vector space and $Y$ is the set $\{-1, 1\}$. Fix a hypothesis space $H$ of functions $h: X \to Y$. A learning algorithm over $H$ is a computable map from $Z^*$ to $H$. In other words, it is an algorithm that takes as ...

  17. Searching the hypothesis space (Chapter 6)

    In Chapter 5 we introduced the main notions of machine learning, with particular regard to hypothesis and data representation, and we saw that concept learning can be formulated in terms of a search problem in the hypothesis space H. As H is in general very large, or even infinite, well-designed strategies are required in order to perform efficiently the search for good hypotheses.

  18. Computational Learning Theory

    The number of dichotomies could be finite even for an infinite hypothesis space \(\mathcal {H}\); ... Award committee pointed out that Valiant's paper published in 1984 created a new research area known as computational learning theory that puts machine learning on a sound mathematical footing.

  19. machine learning

    The hypothesis class can be finite or infinite, for example a discrete set of shapes to encircle certain portion of the input space is a finite hypothesis space, whereas the hypothesis space of parametrized functions like neural nets and linear regressors is infinite. ... is commonly used. For example, this famous book on machine learning and ...

  20. Hypothesis in Machine Learning

    The hypothesis is one of the commonly used concepts of statistics in Machine Learning. It is specifically used in Supervised Machine learning, where an ML model learns a function that best maps the input to corresponding outputs with the help of an available dataset. In supervised learning techniques, the main aim is to determine the possible ...

  21. PDF Sample complexity for infinite hypothesis spaces

    COS 511: Theoretical Machine Learning, Lecturer: Rob Schapire, Lecture #5, Scribe: David Bieber, February 19, 2013. Recall Occam's razor. With probability at least $1 - \delta$, a hypothesis $h \in H$ consistent with $m$ examples sampled independently from distribution $D$ satisfies $err(h) \leq \frac{\ln|H| + \ln(1/\delta)}{m}$. Sample complexity for infinite hypothesis spaces

  22. Sample Complexity for Finite Hypothesis Spaces

    Fact: Every consistent learner outputs a hypothesis belonging to the version space. Therefore, we need to bound the number of examples needed to assure that the version space contains no unacceptable hypothesis.

  23. PDF Sparse Regression Ensembles in Infinite and Finite Hypothesis Spaces

    In Section 3.2 we investigate the dual of this linear program for ensemble regression. In Section 3.3, we propose a semi-infinite linear program formulation for "boosting" of infinite hypothesis sets, first in the dual and then in the primal space. The dual problem is called semi-infinite because it has an infinite number of constraints and ...

  24. Sparse Regression Ensembles in Infinite and Finite Hypothesis Spaces

    We explicitly tackle the issue of how to define and solve ensemble regression when the hypothesis space is infinite. Our approach is based on a semi-infinite linear program that has an infinite number of constraints and a finite number of variables. We show that the regression problem is well posed for infinite hypothesis spaces in both the ...