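Summary: The "winner's curse": the winning bidder in an auction outbids everyone else, but has probably overestimated the item's value and paid more than it is worth, so the return on the item falls below the normal return and may even be negative. In other words, when you are set on winning the bid, you drift away from your original goal: when we celebrate the scores we achieve on benchmark tasks, we may already have fallen under the "winner's curse".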

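Summary: Create a crawler project that scrapes images, using Tuchong (圖蟲網) as the example. Under "Discover" (發現) and then "Tags" (標籤) in the top menu, images are grouped into categories. We click one tag, use that page as the crawler's entry point, and analyze the page.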

Summary: In this second article on adversarial validation we get to the meat of the matter: what we can do when the train and test sets differ. Will we be able to make a better validation set?

Summary: Many data science competitions suffer from a test set that differs markedly from the training set (a violation of the "identically distributed" assumption). It is then difficult to make a representative validation set. We propose a method for selecting the training examples most similar to the test examples and using them as a validation set. The core of this idea is training a probabilistic classifier to distinguish train examples from test examples.
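
A minimal sketch of the selection step described in the summary above. It assumes a plain logistic regression fitted by gradient descent as the probabilistic classifier, and it ranks training rows by in-sample scores; both choices are simplifications made here for illustration rather than details taken from the article, and every function name is hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(row, w, b):
    """Estimated probability that a row comes from the test set."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, row)) + b)


def fit_logistic(X, y, lr=0.1, epochs=200):
    """Fit logistic regression with batch gradient descent (assumed classifier)."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for row, label in zip(X, y):
            err = predict_proba(row, w, b) - label
            for j in range(n_features):
                grad_w[j] += err * row[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def adversarial_validation_split(train, test, val_fraction=0.2):
    """Return (val_idx, rest_idx, scores) for the training rows.

    Label train rows 0 and test rows 1, fit the classifier, then use the
    training rows that look most like test rows as the validation set.
    """
    X = train + test
    y = [0] * len(train) + [1] * len(test)
    w, b = fit_logistic(X, y)
    # In-sample scores: a simplification; out-of-fold scores would be less biased.
    scores = [predict_proba(row, w, b) for row in train]
    order = sorted(range(len(train)), key=lambda i: scores[i], reverse=True)
    n_val = max(1, int(len(train) * val_fraction))
    return order[:n_val], order[n_val:], scores


# Synthetic demo: the test set is shifted relative to the training set.
random.seed(0)
demo_train = [[random.gauss(0.0, 1.0)] for _ in range(200)]
demo_test = [[random.gauss(2.0, 1.0)] for _ in range(200)]
val_idx, rest_idx, _ = adversarial_validation_split(demo_train, demo_test)
print(len(val_idx), "validation rows,", len(rest_idx), "remaining training rows")
```

In the demo, the rows chosen for validation are the training rows with the largest feature values, which are the ones closest to the shifted test distribution. In practice the classifier would usually be a stronger model, and the scores would come from cross-validated predictions.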

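Summary: When processing data, we often use principal component analysis (PCA) to extract the key information, yet the results and what drives them can be hard to interpret. This article teaches visualization methods for PCA that help us understand the analysis and give meaning to what lies behind it.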