Python Clean Code


Guidelines for writing clean and smart code.

Overview



PRODUCTION CODE:

software running on production servers to handle live users and data of the intended audience. Note this is different from production quality code, which describes code that meets expectations in reliability, efficiency, etc., for production. Ideally, all code in production meets these expectations, but this is not always the case.

CLEAN:

readable, simple, and concise. A characteristic of production quality code that is crucial for collaboration and maintainability in software development.

  • Meaningful Names
  • Nice Whitespace
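
A minimal sketch of what meaningful names and whitespace change in practice. Both functions compute the same result; the function names, data, and threshold are hypothetical and chosen only for illustration.

```python
# Hard to read: cryptic names and no spacing around operators.
def f(a,b):
    return [x for x in a if x>b]


# Easier to read: descriptive names and consistent whitespace.
def filter_scores_above(scores, threshold):
    """Return the scores strictly greater than threshold."""
    return [score for score in scores if score > threshold]


exam_scores = [55, 72, 90, 48]  # hypothetical sample data
passing_scores = filter_scores_above(exam_scores, threshold=60)
print(passing_scores)
```

The second version needs no comment to explain what it returns, because the names carry that information.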

MODULAR:

logically broken up into functions and modules. Also an important characteristic of production quality code that makes your code more organized, efficient, and reusable.

  • DRY - Don’t repeat yourself
  • Abstract out logic into functions
  • Minimize the number of entities (functions, classes, modules, etc.)
  • Each function should focus on doing one thing
  • Use abbreviations for parameter names where the meaning stays clear
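
The DRY and single-purpose points above can be sketched as follows. The data and the helper name min_max_scale are hypothetical; the sketch assumes each list contains at least two distinct values, since otherwise the division would fail.

```python
heights = [150, 160, 170]
weights = [50, 60, 70]

# Before: the same scaling logic is written out twice.
heights_scaled = [(h - min(heights)) / (max(heights) - min(heights)) for h in heights]
weights_scaled = [(w - min(weights)) / (max(weights) - min(weights)) for w in weights]


# After: the logic lives in one function that does one thing.
def min_max_scale(values):
    """Scale values linearly to the range [0, 1].

    Assumes values contains at least two distinct numbers.
    """
    lowest, highest = min(values), max(values)
    return [(value - lowest) / (highest - lowest) for value in values]


heights_scaled_dry = min_max_scale(heights)
weights_scaled_dry = min_max_scale(weights)
```

A fix or change to the scaling rule now has to be made in one place only.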

Code Development Process

[Code Experiment] -> [Code Refactoring] (constantly)

Documentation



Line level

Inline comments (explain what the code itself cannot)

Readable code is preferable to comments.
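
A small sketch of the difference, using hypothetical names. The first assignment needs a comment to be understood; the second explains itself, which leaves comments free to record the reason behind a decision.

```python
# Needs a comment to be understood:
t = 86400  # seconds in a day

# Readable without a comment:
SECONDS_PER_DAY = 24 * 60 * 60


def days_to_seconds(days):
    return days * SECONDS_PER_DAY


# A comment is still useful for the "why", which code cannot express:
# cache entries expire after one week (hypothetical requirement).
cache_ttl_seconds = days_to_seconds(7)
```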

Function / Module level - docstring

def function_name(var1, var2=10, *args, long_var_name='hi', **kwargs):
    """Purpose of this function.

    Parameters
    ----------
    var1 : array_like
        Explanation.
    var2 : int or float, optional
        Explanation.
    *args : iterable
        Other arguments.
    long_var_name : {'hi', 'ho'}, optional
        Choices in brackets, default first when optional.
    **kwargs : dict
        Keyword arguments.

    Returns
    -------
    output : list
        Explanation.
    """
    output = list(var1)  # placeholder body so the example runs
    return output

Reference: https://numpydoc.readthedocs.io/en/latest/format.html

Project level - documents

README file

Optimization



  • Prefer vectorized operations over loops
In [1]:
# Experiment: Intersection of two lists

import time
import numpy as np

# Create sample data
list1 = list(range(1, 10001))
list2 = list(range(1, 5001))

# Experiment 1: loop with a list membership test on every item
start_time = time.time()

intersection_list1 = []

for item in list1:
    if item in list2:
        intersection_list1.append(item)

print('Experiment 1: loop, Duration: {:.4f} seconds'.format(time.time() - start_time))

# Experiment 2: np.intersect1d
start_time = time.time()

intersection_list2 = np.intersect1d(list1, list2)

print('Experiment 2: np.inters., Duration: {:.4f} seconds'.format(time.time() - start_time))

# Experiment 3: set intersection
start_time = time.time()

intersection_list3 = set(list1) & set(list2)

print('Experiment 3: set, Duration: {:.4f} seconds'.format(time.time() - start_time))
Experiment 1: loop, Duration: 0.4057 seconds
Experiment 2: np.inters., Duration: 0.0037 seconds
Experiment 3: set, Duration: 0.0008 seconds
In [2]:
# Experiment: Conditional sum

import time
import numpy as np

# Create sample data
list1 = np.arange(1, 100001)

# Experiment 1: loop
start_time = time.time()

total1 = 0
for item in list1:
    if item < 2000:
        total1 += item

print('Experiment 1: loop, Duration: {:.4f} seconds'.format(time.time() - start_time))

# Experiment 2: NumPy boolean mask
start_time = time.time()

total2 = list1[list1 < 2000].sum()

print('Experiment 2: numpy, Duration: {:.4f} seconds'.format(time.time() - start_time))
Experiment 1: loop, Duration: 0.0228 seconds
Experiment 2: numpy, Duration: 0.0003 seconds
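
The experiments above time a single run with time.time(), which can be noisy for very short durations. A hedged alternative is the standard library's timeit module, which repeats each measurement. The sketch below reruns the loop and set variants of the intersection experiment on smaller, hypothetical inputs; the function names are illustrative, and no timings are quoted because they depend on the machine.

```python
import timeit

# Smaller inputs than in the experiment above, so the sketch runs quickly.
list1 = list(range(1, 2001))
list2 = list(range(1, 1001))


def intersect_with_loop():
    """Intersection using a membership test against a list (slow lookup)."""
    return [item for item in list1 if item in list2]


def intersect_with_set():
    """Intersection using sets (hash-based lookup)."""
    return set(list1) & set(list2)


# Each measurement runs the function 5 times; keep the best of 3 measurements.
loop_seconds = min(timeit.repeat(intersect_with_loop, number=5, repeat=3))
set_seconds = min(timeit.repeat(intersect_with_set, number=5, repeat=3))

print(f'loop: {loop_seconds:.4f} s, set: {set_seconds:.4f} s (5 runs each)')
```

Taking the minimum of several repeats reduces the influence of other processes running on the machine.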

Version



  • Use Git for version control
  • Use master and feature branches to manage versions
  • Develop multiple features at once
  • Merge conflicts: Git does not know how to combine two changes and asks you for help.

Scenario 1

  • master develop -> LOCAL ☑develop Pull from develop branch
  • master develop -> LOCAL □develop ☑feature1 Create feature1 branch
  • master develop -> LOCAL □develop ☑feature1 Edit code in feature1 branch
  • master develop -> LOCAL □develop ☑feature1 Commit changes in feature1 branch
  • master develop -> LOCAL ☑develop □feature1 Switch to develop branch
  • master develop -> LOCAL □develop □feature1 ☑feature2 Create feature2 branch
  • master develop -> LOCAL □develop □feature1 ☑feature2 Edit code in feature2 branch
  • master develop -> LOCAL □develop □feature1 ☑feature2 Commit changes in feature2 branch
  • master develop -> LOCAL ☑develop □feature1 □feature2 Switch to develop branch
  • master develop -> LOCAL ☑develop □feature1 □feature2 Pull the latest remote develop branch
  • master develop -> LOCAL ☑develop □feature1 Merge feature2 branch into develop
  • master develop -> LOCAL ☑develop □feature1 Push to remote develop branch
  • master develop -> LOCAL □develop ☑feature1 Switch to feature1 branch
  • master develop -> LOCAL ☑develop Review the remote develop branch and merge it into master

☑ = active local branch; the branches left of "->" are on the remote

Scenario 2

  • master develop -> LOCAL ☑develop Pull from develop branch
  • master develop -> LOCAL □develop ☑model Create model branch
  • master develop -> LOCAL □develop ☑model Tune parameters and commit: commit1, CV score 0.9
  • master develop -> LOCAL □develop ☑model Tune parameters and commit: commit2, CV score 0.8
  • master develop -> LOCAL □develop ☑model Switch back to commit1 (CV score 0.9)
  • master develop -> LOCAL ☑develop □model Switch to develop branch
  • master develop -> LOCAL ☑develop □model Pull the latest remote develop branch
  • master develop -> LOCAL ☑develop Merge model branch into develop
  • master develop -> LOCAL ☑develop Push to remote develop branch
  • master develop -> LOCAL ☑develop Review the remote develop branch and merge it into master

☑ = active local branch; the branches left of "->" are on the remote

Log



  • Be concise and use normal capitalization

  • Logging levels

    • DEBUG - use for anything that happens in the program.
    • ERROR - use to record any error that occurs.
    • INFO - use to record all actions that are user-driven or system-specific, such as regularly scheduled operations.
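
The three levels above map directly onto Python's standard logging module. The following is a minimal sketch, not a recommended configuration: the logger name, the message format, and the two functions are hypothetical.

```python
import logging

# Show every level from DEBUG upward; the format string is an illustrative choice.
logging.basicConfig(level=logging.DEBUG, format='%(levelname)s: %(message)s')
logger = logging.getLogger('clean_code_example')  # hypothetical logger name


def parse_amount(text):
    """Convert text to a float, or return None if it is not a number."""
    logger.debug('Parsing amount from %r', text)
    try:
        return float(text)
    except ValueError:
        logger.error('Could not parse amount %r', text)
        return None


def run_scheduled_import(raw_amounts):
    """Parse a batch of amounts, as a regularly scheduled job might."""
    logger.info('Scheduled import started')
    amounts = [parse_amount(text) for text in raw_amounts]
    parsed_amounts = [amount for amount in amounts if amount is not None]
    logger.info('Scheduled import finished with %d valid amounts', len(parsed_amounts))
    return parsed_amounts


valid_amounts = run_scheduled_import(['10.5', 'abc', '3'])
```

Passing values as arguments ('%r', text) instead of formatting the string beforehand lets the logging module skip the formatting work when a level is disabled.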

Questions to Ask

  • Is the code clean and modular?

    • Can I understand the code easily?
    • Does it use meaningful names and whitespace?
    • Is there duplicated code?
    • Can you provide another layer of abstraction?
    • Is each function and module necessary?
    • Is each function or module too long?
  • Is the code efficient?

    • Are there loops or other steps we can vectorize?
    • Can we use better data structures to optimize any steps?
    • Can we shorten the number of calculations needed for any steps?
    • Can we use generators or multiprocessing to optimize any steps?
  • Is documentation effective?

    • Are in-line comments concise and meaningful?
    • Is there complex code that's missing documentation?
    • Do functions use effective docstrings?
    • Is the necessary project documentation provided?
  • Is the code well tested?

    • Does the code have high test coverage?
    • Do tests check for interesting cases?
    • Are the tests readable?
    • Can the tests be made more efficient?
  • Is the logging effective?

    • Are log messages clear, concise, and professional?
    • Do they include all relevant and useful information?
    • Do they use the appropriate logging level?
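
One of the efficiency questions above asks whether generators could help. As a hedged illustration with a hypothetical workload, the sketch below computes the same sum from a list and from a generator expression and compares the size of the two container objects.

```python
import sys

# Hypothetical workload: sum the squares of the first 100,000 integers.
number_count = 100_000

squares_list = [n * n for n in range(number_count)]       # stores every value
squares_generator = (n * n for n in range(number_count))  # yields one value at a time

# Measure the container objects themselves before the generator is consumed.
list_bytes = sys.getsizeof(squares_list)
generator_bytes = sys.getsizeof(squares_generator)

total_from_list = sum(squares_list)
total_from_generator = sum(squares_generator)

print(f'list: {list_bytes} bytes, generator: {generator_bytes} bytes')
```

The generator gives the same total without holding all values in memory at once, at the cost that it can be iterated only once.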

Tips

  • Use a code linter such as Pylint to check your code

https://www.pylint.org/

In [7]:
# Sample test files
# compute_launch.py

# def days_until_launch(current_day, launch_day):
#     """Return the number of days left before launch.
#
#     current_day (int) - current day as an integer
#     launch_day (int) - launch day as an integer
#     """
#     if launch_day < current_day:
#         days_left = 0
#     else:
#         days_left = launch_day - current_day
#     return days_left

# test_compute_launch.py

# from compute_launch import days_until_launch

# def test_days_until_launch_4():
#     assert days_until_launch(22, 26) == 4

# def test_days_until_launch_0():
#     assert days_until_launch(253, 253) == 0

# def test_days_until_launch_0_negative():
#     assert days_until_launch(83, 64) == 0

# def test_days_until_launch_1():
#     assert days_until_launch(9, 10) == 1
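
The four tests above share one pattern, so they can also be written as a single table-driven test. This is a minimal sketch: it repeats the function in a shortened form so that the block runs by itself, and the name TEST_CASES is hypothetical. Test runners such as pytest collect functions whose names start with test_.

```python
def days_until_launch(current_day, launch_day):
    """Return the number of days left before launch, never less than 0."""
    return max(launch_day - current_day, 0)


# Each row: (current_day, launch_day, expected result)
TEST_CASES = [
    (22, 26, 4),    # launch is in the future
    (253, 253, 0),  # launch is today
    (83, 64, 0),    # launch has already passed
    (9, 10, 1),     # launch is tomorrow
]


def test_days_until_launch():
    for current_day, launch_day, expected in TEST_CASES:
        assert days_until_launch(current_day, launch_day) == expected


# Run directly here; a test runner would call this function automatically.
test_days_until_launch()
```

Adding a new case is then a one-line change to the table, although separate test functions give more specific failure reports.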

Code review references:

https://github.com/lyst/MakingLyst/tree/master/code-reviews

https://www.kevinlondon.com/2015/05/05/code-review-best-practices.html

Variables



Variable Name Guidelines:

  • Input data: data, df
  • Data structure:
  • Type: product_list
  • Period: start_time, end_time, execution_time
  • Sampling: X, y, X_train, y_train, X_test, y_test, X_valid, y_valid
  • Model:
  • Evaluation:
  • Parameter: i = iteration
  • Common abbreviation:
  • Imported library:
    • import numpy as np
    • import pandas as pd
    • import tensorflow as tf
    • import statsmodels.api as sm
    • from sklearn import
    • import matplotlib.pyplot as plt
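
A short sketch that applies the period and sampling names from the list above. The toy data, the 80/20 split, and the fixed seed are assumptions made for illustration; the split is written by hand with the standard library so that the block has no external dependencies.

```python
import random
import time

start_time = time.perf_counter()

# Hypothetical toy data: feature rows X and labels y.
X = [[value] for value in range(10)]
y = [value % 2 for value in range(10)]

# Shuffle the row indices reproducibly, then hold out the last 20% for testing.
indices = list(range(len(X)))
random.Random(0).shuffle(indices)
split_at = int(len(indices) * 0.8)

X_train = [X[i] for i in indices[:split_at]]
y_train = [y[i] for i in indices[:split_at]]
X_test = [X[i] for i in indices[split_at:]]
y_test = [y[i] for i in indices[split_at:]]

end_time = time.perf_counter()
execution_time = end_time - start_time

print(f'train rows: {len(X_train)}, test rows: {len(X_test)}')
```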