Automated Feature Engineering in Python

摘要： Machine learning is increasingly moving from hand-designed models to automatically optimized pipelines using tools such as H20, TPOT, and auto-sklearn. These libraries, along with methods such as random search, aim to simplify the model selection and tuning parts of machine learning by finding the best model for a dataset with little to no manual intervention. However, feature engineering, an arguably more valuable aspect of the machine learning pipeline, remains almost entirely a human labor.

Typically, feature engineering is a drawn-out manual process, relying on domain knowledge, intuition, and data manipulation. This process can be extremely tedious and the final features will be limited both by human subjectivity and time. Automated feature engineering aims to help the data scientist by automatically creating many candidate features out of a dataset from which the best can be selected and used for training.

In this article, we will walk through an example of using automated feature engineering with the featuretools Python library. We will use an example dataset to show the basics (stay tuned for future posts using real-world data). The complete code for this article is available on GitHub.

The process of constructing features is very time-consuming because each new feature usually requires several steps to build, especially when using information from more than one table. We can group the operations of feature creation into two categories: transformations and aggregations. Let’s look at a few examples to see these concepts in action.

...

Full Text： Medium

若喜歡本文，請關注我們的臉書 Please Like our Facebook Page：

Big Data In Finance

Automated Feature Engineering in Python

留下你的回應

以訪客張貼回應

回應

釘選列表

喜愛列表

Web Services

YOU MAY BE INTERESTED

Popular Tags

	今日	444
	昨日	1579
	本週	8430
	本月	7314
	總訪客量	2707067