Infinite Loop: 【演算】內插搜尋法

　　內插搜尋法(interpolation search)改良自二分搜尋法(binary search)，也同樣都只能在資料已進行的情況下進行搜尋。但是，若在資料分布均勻時，其效率是會比二分搜尋法還高的。

　　既然內插搜尋法是改良自二分搜尋法，那麼實際的差異在哪呢？首先，內插搜尋法將資料的分布假設為一條直線：

　　若是一串資料於索引值 I_low 與 I_upper 的值分別為 K_low 與 K_upper。假設我們需要找的目標值為 K，且 K 的索引值 I，則我們理應可以推得：(K_upper - K_low) / (I_upper - I_low) = (K - K_low) / (I - I_low)。

　　若整理以上的算式，便可以得到公式：I = I_low + (I_upper - I_low)(K - K_low) / K_upper - K_low。

　　舉例來說，現在我們需要在下面這些資料中搜尋數值 44：

5 12 19 26 37 44 60 65 73 85

　　則 I = 1 + [(10 - 1)(44 - 5) / (85 - 5)] = 5：

5 12 19 26 37 44 60 65 73 85

　　由於 37 小於目標值，因此我們對後面的資料進行同樣的搜尋，I = 6 + [(10 - 6)(44 - 44) / (85 - 44)] = 6：

5 12 19 26 37 44 60 65 73 85

發現第 I 項資料值等於搜尋值，這樣就找到欲尋找的值了。

　　內插搜尋法的虛擬碼大致如下：

void interpolationSearch(Type data[1..n], Type search)
{
    Index low = 1;
    Index upper = n;

    while (low <= upper)
    {
        Index mid = low + (upper - low)
                      * (search - data[low])
                      / (data[upper] - data[low]);

        if (data[mid] = search)
        {
            print mid;
            return;
        }
        else if (data[mid] > search)
        {
            upper = mid - 1;
        }
        else if (data[mid] < search)
        {
            low = mid + 1;
        }
    }

    print "Not found";
}

　　平均而言，在有 n 筆資料的情況下，內插搜尋法只需要進行 log(log(n)) 次比對就可以找到資料！可見其效率之高。

　　但是，在資料並非分布均勻的最差情況下，內插搜尋法則是需要進行 n 次比對才能夠找到資料。此時的效率就比二分搜尋法低得多了。

　　以 C 語言的實作如下：

#include <stdio.h>
#include <stdlib.h>

int interpolationSearch(int[], int, int);

int main(void)
{
    int search, ans;
    int data[] = {5, 12, 19, 26, 37, 44, 60, 65, 73, 85};

    printf("請輸入欲搜尋的資料: ");
    scanf("%d", &search);

    // 呼叫函式進行搜尋
    ans = interpolationSearch(data, search, sizeof(data) / sizeof(int));

    if (ans < 0)
    {
        printf("找不到 %d\n", search);
    }
    else
    {
        printf("在第 %d 筆資料找到 %d\n", ans + 1, search);
    }

    system("pause");
}

int interpolationSearch(int data[], int search, int n)
{
    int low = 0, upper = n - 1;

    while (low <= upper)
    {
        int mid = low + (upper - low)
                      * (search - data[low])
                      / (data[upper] - data[low]);

        if (data[mid] == search)
        {
            return mid;
        }
        else if (data[mid] > search)
        {
            upper = mid - 1;
        }
        else if (data[mid] < search)
        {
            low = mid + 1;
        }
    }

    return -1;
}

4 回覆:

咳嗽di小鱼提到...: 借用了您的图片和算法公式...如果不妥请留言给我哈
http://blog.gregwym.info/cs-240-fu-xi-zong-jie-zhi-liu--dictionary-tricks.html; 2011年4月8日凌晨4:50
Unknown 提到...: @咳嗽di小鱼: 沒問題的, 歡迎引用; 2011年4月8日上午11:10
匿名提到...: 請問一下資料分布均勻的意思是已排序過嗎?; 2011年11月3日晚上7:43
Unknown 提到...: 不是的，是指數字平均散佈於 I_low 至 I_upper 之間。
譬如說：13 19 25 31 37

因為在這種情況下，透過公式「猜」出來的索引值會十分接近欲找尋資料的實際索引值。; 2011年11月3日晚上8:19

Infinite Loop

12月 21, 2008

【演算】內插搜尋法 - Interpolation Search

4 回覆:

張貼留言

About Me

Labels

Search

Archive