Exploring Code Performance

I've been working through some practice problems lately and came across something interesting.

The problem goes like this:

You are given a string and you have to find its first word.

The input string consists of only English letters and spaces.
There aren’t any spaces at the beginning and the end of the string.

Input: A string (str).

Output: A string (str).

Examples:

assert first_word("Hello world") == "Hello"

assert first_word("a word") == "a"

assert first_word("greeting from CheckiO Planet") == "greeting"

assert first_word("hi") == "hi"

How it is used: The first word is a command in a command line.

Precondition: The text can contain a-z, A-Z and spaces.

My first thought was that this is easy: just split the string and return the first element. So I wrote the following code:

def first_word(text: str) -> str:
    return text.split(" ")[0]
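
To see why this also works when the text has no space at all, it helps to look at what split(" ") actually returns. A minimal sketch using the task's own examples:

```python
def first_word(text: str) -> str:
    # split(" ") always returns at least one element, so [0] is safe
    return text.split(" ")[0]


# With spaces: every piece is produced, but only the first one is used.
print("greeting from CheckiO Planet".split(" "))  # ['greeting', 'from', 'CheckiO', 'Planet']

# Without spaces: the result is a one-element list holding the whole string.
print("hi".split(" "))  # ['hi']
```

Note that the whole string gets cut into pieces even though only the first piece is returned.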

It passed, but when I scrolled further down, I found something interesting:

def first_word(text):
    index = text.find(" ")
    return text[:index] if index != -1 else text


"""
It's worth looking at the performance of different methods under the same predefined conditions.
Let's check the runtime of the 4 methods defined below (10000 executions each) for the following 4 cases:
- a short str which contains space chars: "asdf we"*10;
- a short str which doesn't contain space chars: "asdfawe"*10;
- a long str which contains space chars: "asdf we"*100000;
- a long str which doesn't contain space chars: "asdfawe"*100000.
############################################################################################################
from timeit import timeit as t


def first_word_1(text):
    return text.split(" ")[0]

print(t('first_word_1(x)', setup='x = "asdf we"*10', number=10000, globals=globals()))       #  ~11.7 ms
print(t('first_word_1(x)', setup='x = "asdfawe"*10', number=10000, globals=globals()))       #  ~6.1 ms
print(t('first_word_1(x)', setup='x = "asdf we"*100000', number=10000, globals=globals()))   #  ~90928.2 ms
print(t('first_word_1(x)', setup='x = "asdfawe"*100000', number=10000, globals=globals()))   #  ~5562.9 ms


def first_word_2(text):
    index = text.find(" ")
    return text[:index] if index != -1 else text
    
print(t('first_word_2(x)', setup='x = "asdf we"*10', number=10000, globals=globals()))       #  ~6.3 ms
print(t('first_word_2(x)', setup='x = "asdfawe"*10', number=10000, globals=globals()))       #  ~4.7 ms
print(t('first_word_2(x)', setup='x = "asdf we"*100000', number=10000, globals=globals()))   #  ~7.0 ms
print(t('first_word_2(x)', setup='x = "asdfawe"*100000', number=10000, globals=globals()))   #  ~2108.4 ms


def first_word_3(text):
    try:
        index = text.index(" ")
        return text[:index]
    except ValueError:
        return text

print(t('first_word_3(x)', setup='x = "asdf we"*10', number=10000, globals=globals()))       #  ~5.8 ms
print(t('first_word_3(x)', setup='x = "asdfawe"*10', number=10000, globals=globals()))       #  ~8.5 ms
print(t('first_word_3(x)', setup='x = "asdf we"*100000', number=10000, globals=globals()))   #  ~5.8 ms
print(t('first_word_3(x)', setup='x = "asdfawe"*100000', number=10000, globals=globals()))   #  ~2005.8 ms


def first_word_4(text):
    index = -1
    for pos, letter in enumerate(text):
        if letter == " ":
            index = pos
            break
    return text[:index] if index != -1 else text
    
print(t('first_word_4(x)', setup='x = "asdf we"*10', number=10000, globals=globals()))       #  ~13.1 ms
print(t('first_word_4(x)', setup='x = "asdfawe"*10', number=10000, globals=globals()))       #  ~71.1 ms
print(t('first_word_4(x)', setup='x = "asdf we"*100000', number=10000, globals=globals()))   #  ~13.1 ms
print(t('first_word_4(x)', setup='x = "asdfawe"*100000', number=10000, globals=globals()))   #  ~788793.7 ms
############################################################################################################
So what conclusions can be drawn from all of this?

1. Since every string is an instance of the str class, it's preferable to use its built-in methods rather than
write a hand-rolled loop that merely looks like it should be faster. In most cases it won't be. Compare first_word_2
and first_word_4, for example.

2. Although first_word_1 (which uses the .split() method) looks nice and concise, it performs worse on long strings
than first_word_2 and first_word_3 (which use .find() and .index() respectively), especially when the text contains
lots of spaces, because .split() processes the whole string even though only the first piece is needed.

3. str.index() is slightly faster than str.find(), but only when there is a space in the text. Otherwise
an exception has to be handled, which takes some extra time.

Thus, I'd use the str.find() method for this kind of task.
"""
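
One variant the comparison above does not cover is limiting how much work the split does. The sketch below is an addition, not part of the quoted solution: str.split accepts a maxsplit argument, and str.partition stops at the first separator, so neither has to cut up the whole string. The names first_word_5, first_word_6 and run_benchmark are hypothetical and simply continue the numbering above. The run count is lowered to 1000 to keep it quick, and no timings are claimed here; the numbers depend on the machine.

```python
from timeit import timeit


def first_word_5(text):
    # maxsplit=1 stops after the first space instead of splitting everything
    return text.split(" ", 1)[0]


def first_word_6(text):
    # partition returns (head, sep, tail); head is the whole text if no space is found
    return text.partition(" ")[0]


def run_benchmark(number=1000):
    # Same four inputs as in the comparison above
    cases = {
        "short, with spaces": "asdf we" * 10,
        "short, no spaces": "asdfawe" * 10,
        "long, with spaces": "asdf we" * 100000,
        "long, no spaces": "asdfawe" * 100000,
    }
    for func in (first_word_5, first_word_6):
        for label, text in cases.items():
            seconds = timeit(lambda: func(text), number=number)
            print(f"{func.__name__} ({label}): {seconds * 1000:.1f} ms")


if __name__ == "__main__":
    run_benchmark()
```

Both variants still copy the remainder of the string into a second piece, so I would not assume they beat str.find() without measuring.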

All I can say is that I'm still too much of a beginner and need more practice.

Just noting this here for the record.
