Author | ANIRUDDHA BHANDARI   Compiled by | VK   Source | Analytics Vidhya

Overview

  • This Python style guide will help you write neat Python code

  • Learn the different Python conventions and other nuances of Python programming in this style guide

Introduction

Have you ever come across a badly written piece of Python code? I know many of you will nod.

Writing code is part of the role of a data scientist or analyst. Writing nice, clean Python code, on the other hand, is another matter entirely. For a programmer working in analytics or data science (or even software development), the quality of your code will likely shape how others see you.

So how do we write this supposedly beautiful Python code?

Welcome to the Python style guide

Many people in data science and analytics come from non-programming backgrounds. We started by learning the basics of programming, followed by understanding the theory behind machine learning, and then started conquering data sets.

In the process, we often didn’t practice core programming and didn’t pay attention to programming conventions.

That’s what this Python style guide will tackle. We’ll review the Python programming conventions described in the PEP 8 documentation, so that you come away a better programmer!

Table of contents

  • Why is this Python style guide important for data science?

  • What is PEP 8?

  • Understand Python naming conventions

  • Code layout for the Python style guide

  • Be familiar with correct Python comments

  • Whitespace in Python code

  • General programming advice for Python

  • Automatic formatting of Python code

Why is this Python style guide important for data science?

Formatting is an important aspect of programming for several reasons, especially for data science projects:

  • Readability

A good code format will inevitably improve the readability of your code. This will not only make your code more organized, but also make it easier for the reader to understand what is going on in the program. This is especially useful if your program runs to thousands of lines.

You’ll have lots of data frames, lists, functions, plots, etc., and if you don’t follow the right formatting guidelines, you can easily lose track of your own code!

  • Collaboration

If you’re collaborating on a team project, as most data scientists do, good formatting becomes an essential task.

This ensures that the code is understood correctly without causing any trouble. In addition, following a common format pattern maintains program consistency throughout the project lifecycle.

  • Bug fixing

Having well-formatted code will also help you when you need to fix bugs in your programs. Wrong indentation, improper naming, and so on can easily make debugging a nightmare!

Therefore, it is best to start writing your programs in the right style from the beginning!

With that in mind, let’s take a quick look at the PEP 8 style guide that this article covers!

What is PEP 8?

PEP 8 is a Python Enhancement Proposal that serves as the style guide for Python code. It was written by Guido van Rossum, Barry Warsaw, and Nick Coghlan. It describes the rules for writing beautiful and readable Python code.

Following PEP 8’s coding style ensures consistency in Python code, making it easier for other readers, contributors, or yourself to understand.

This article covers the most important aspects of the PEP 8 guidelines, such as how to name Python objects, how to structure code, when to include comments and whitespace, and finally some general programming advice that is important but easily overlooked by most Python programmers.

Let’s learn to write better code!

The official PEP-8 documentation can be found here.

www.python.org/dev/peps/pe…

Understand Python naming conventions

Shakespeare famously asked, “What’s in a name?” If he had met a programmer, he would have gotten a quick reply: “A lot!”

Yes, when you write a piece of code, the names you choose for variables, functions, etc., have a big impact on the understandability of the code. Look at the following code:

# Function 1
def func(x):
    a = x.split()[0]
    b = x.split()[1]
    return a, b
print(func('Analytics Vidhya'))

# Function 2
def name_split(full_name):
    first_name = full_name.split()[0]
    last_name = full_name.split()[1]
    return first_name, last_name
print(name_split('Analytics Vidhya'))

Output:
('Analytics', 'Vidhya')
('Analytics', 'Vidhya')

Both functions do the same thing, but the latter provides a better intuition of what’s going on, even without any comments!

That’s why choosing the right name and following the right naming convention can make a huge difference when writing a program. That being said, let’s look at how to name objects in Python!

Getting started with naming

These techniques can be applied to naming any entity and should be strictly followed.

  • Follow the same pattern
thisVariable, ThatVariable, some_other_variable, BIG_NO
  • Avoid names that are too long, and avoid names that are too terse
this_could_be_a_bad_name = "get this!"
t = "This isn't good either"
  • Use sensible, descriptive names. This will help you remember the purpose of the code later
x = "My Name"  # avoid this
full_name = "My Name"  # this is better
  • Avoid names that start with a number
1_name = "This is bad!"
  • Avoid special characters such as @, !, #, $, etc.
phone_#  # bad name
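The last two tips are enforced by the language itself, which a short check can confirm. This is a minimal sketch; `is_valid_name` is a hypothetical helper introduced only for illustration, built on the standard `str.isidentifier()` method.

```python
def is_valid_name(name):
    """Return True if `name` is a syntactically valid Python identifier."""
    return name.isidentifier()

# The first name follows the tips above; the other two break them.
for candidate in ["full_name", "1_name", "phone_#"]:
    print(candidate, is_valid_name(candidate))
# full_name True
# 1_name False
# phone_# False
```

Names that fail this check cannot be used at all; the remaining tips are about readability, not syntax.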

Variable naming

  • Variable names should always be lowercase
blog = "Analytics Vidhya"
  • For longer variable names, separate words with underscores. This improves readability
awesome_blog = "Analytics Vidhya"
  • Try not to use single-character variable names such as “I” (uppercase i), “O” (uppercase o), and “l” (lowercase L). They can be indistinguishable from the numerals 1 and 0. Take a look:
O = 0 + l + I + 1
  • Naming global variables follows the same convention

Function naming

  • Follow the lowercase-with-underscores naming convention

  • Use expressive names

# avoid
def con():
    ...

# this is better
def connect():
    ...
  • If a function parameter name clashes with a keyword, use a trailing underscore rather than an abbreviation. For example, turn break into break_ instead of brk
# avoiding a name clash
def break_time(break_):
    print("Your break time is", break_, "long")
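The trailing-underscore rule can be sketched with the standard `keyword` module, which knows the reserved words of the running interpreter. The helper name `safe_parameter_name` is an assumption made for this example, not something from PEP 8.

```python
import keyword

def safe_parameter_name(name):
    """Append a trailing underscore when `name` clashes with a Python keyword."""
    return name + "_" if keyword.iskeyword(name) else name

print(safe_parameter_name("break"))    # break_
print(safe_parameter_name("connect"))  # connect
```

The underscore keeps the full word readable, which is the reason the convention prefers `break_` over an abbreviation.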

Class naming

  • Follow the CapWords (also called CamelCase or StudlyCaps) naming convention. Start each word with a capital letter and do not put underscores between words
# follow CapWords
class MySampleClass:
    pass
  • If a class and its subclasses are likely to use the same attribute name, consider adding a double leading underscore to the class attribute

This ensures that the attribute __age in the class Person is accessed as _Person__age. This is Python’s name mangling, which ensures that there are no name collisions

class Person:
    def __init__(self):
        self.__age = 18

obj = Person()
obj.__age  # error
obj._Person__age  # correct
  • Use the suffix “Error” for exception classes
class CustomError(Exception):
    """Custom exception class"""
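To see why name mangling prevents collisions, here is a small runnable sketch. The `Student` subclass and its value 21 are assumptions added for illustration; only `Person` comes from the example above.

```python
class Person:
    def __init__(self):
        self.__age = 18  # stored on the instance as _Person__age


class Student(Person):
    def __init__(self):
        super().__init__()
        self.__age = 21  # stored as _Student__age, so nothing is overwritten


obj = Student()
print(obj._Person__age)   # 18
print(obj._Student__age)  # 21

try:
    obj.__age  # outside a class body the name is not mangled
except AttributeError as err:
    print("AttributeError:", err)
```

Both attributes live side by side on the same object, which is the collision-avoidance the bullet describes.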

Class method naming

  • The first argument of an instance method (a plain method with nothing extra attached) should always be self. It points to the calling object

  • The first argument of a class method should always be cls. It points to the class, not the object instance

class SampleClass:
    def instance_method(self, del_):
        print("Instance method")

    @classmethod
    def class_method(cls):
        print("Class method")
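A common place where the difference between `self` and `cls` matters is an alternative constructor. The following is a sketch under that assumption; the `Temperature` class and its methods are invented for this example.

```python
class Temperature:
    def __init__(self, celsius):
        self.celsius = celsius

    def describe(self):
        # Instance method: `self` is the object the method was called on.
        return f"{self.celsius:.1f} C"

    @classmethod
    def from_fahrenheit(cls, fahrenheit):
        # Class method: `cls` is the class itself, so this builds a new instance.
        return cls((fahrenheit - 32) * 5 / 9)


t = Temperature.from_fahrenheit(212)
print(t.describe())  # 100.0 C
```

Because `from_fahrenheit` calls `cls(...)` rather than `Temperature(...)`, a subclass calling it would get an instance of the subclass back.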

Package and module naming

  • Keep names short and clear

  • Follow the lowercase naming convention

  • For long module names, underscores may be used

  • Avoid underscores in package names

testpackage  # package name
sample_module.py  # module name

Constant naming

  • Constants are usually declared and assigned within a module

  • Constant names should be all uppercase

  • Use underscores for longer names

# the following constants are declared in a .py module
PI = 3.14
GRAVITY = 9.8
SPEED_OF_LIGHT = 3*10**8

Code layout for the Python style guide

Now that you know how to name entities in Python, the next question is how to structure your code!

Honestly, this is very important, because without proper structure your code can turn into a mess, and that is the biggest turn-off for any reviewer.

So without further ado, let’s take a look at the basics of code layout in Python.

Indentation

Indentation is one of the most important aspects of code layout and plays a critical role in Python. It tells the interpreter which lines belong to a block of code. Getting it wrong can be a serious mistake.

Indentation determines which code block a statement belongs to. Imagine trying to write a nested for loop. Writing a line of code outside the loop it belongs to may not give you a syntax error, but you will certainly end up with a logic error that can be time-consuming to debug.
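The nested-loop pitfall can be made concrete with two functions that differ only in the indentation of one line. This is an illustrative sketch; both function names are made up for the example.

```python
def count_in_inner_loop(n):
    total = 0
    for i in range(n):
        for j in range(n):
            total += 1  # inside the inner loop: runs n * n times
    return total


def count_in_outer_loop(n):
    total = 0
    for i in range(n):
        for j in range(n):
            pass
        total += 1  # one level less: runs only n times
    return total


print(count_in_inner_loop(3))  # 9
print(count_in_outer_loop(3))  # 3
```

Both versions run without any error, which is exactly why this kind of mistake is slow to track down.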

Follow the indentation style mentioned below for a consistent Python scripting style.

  • Always follow the four-space indentation rule
# Example
if value < 0:
    print("negative value")

# Another example
for i in range(5):
    print("Follow this rule religiously!")
  • Prefer spaces over tabs

Spaces are recommended over tabs. Tabs may still be used when the code is already indented with tabs.

if True:
    print('4 spaces of indentation used!')
  • Break a large expression into several lines

There are several ways to deal with this situation. One way is to align subsequent statements with the initial delimiter.

# Aligned with the opening delimiter.
def name_split(first_name,
               middle_name,
               last_name):
    print(first_name, middle_name, last_name)

# Another example.
ans = solution(value_one, value_two,
               value_three, value_four)

The second method uses the four-space indentation rule. This will require an additional level of indentation to distinguish parameters from other code within the block.

# Use extra indentation.
def name_split(
        first_name,
        middle_name,
        last_name):
    print(first_name, middle_name, last_name)

Finally, you can even use a “hanging indent”. In the Python context, a hanging indent is the style in which the line containing the opening parenthesis ends with that parenthesis, and the following lines are indented until the closing parenthesis.

# Hanging indent.
ans = solution(
    value_one, value_two,
    value_three, value_four)
  • Indentation with if statements can be a problem

In an if statement with multiple conditions, the “if”, the space, and the opening parenthesis naturally add up to four characters. As you can see, this can be a problem: the continuation lines are indented by the same amount as the body, so the conditions cannot be told apart from the block of code they guard. Now, what do we do?

# This is a problem.
if (condition_one and
    condition_two):
    print("Implement this")

Well, there are a few ways we can get around it. One way is to use extra indentation!

# Use extra indentation.
if (condition_one and
        condition_two):
    print("Implement this")

Another approach is to add a comment between the if statement’s conditions and the code block to separate the two:

# Add a comment.
if (condition_one and
    condition_two):
    # this condition is valid
    print("Implement this")
  • Closing brackets

Suppose you have a very long dictionary. You put each key-value pair on its own line, but where do you put the closing bracket? Does it go on the last line, right after the last key-value pair? Or on a new line? And if it goes on a new line, how should it be indented?

There are a couple of ways to solve this problem.

One way is to align the closing bracket with the first non-whitespace character of the previous line.

learning_path = {
    'Step 1': 'Learn machine learning',
    'Step 2': 'Learn machine learning',
    'Step 3': 'Crack on the hackathons'
    }

The second way is to make it the first character of a new line.

learning_path = {
    'Step 1': 'Learn machine learning',
    'Step 2': 'Learn machine learning',
    'Step 3': 'Crack on the hackathons'
}
  • Break lines before binary operators

If you try to fit too many operators and operands on a single line, it becomes hard to read. Instead, break the expression across several lines for better readability.

The obvious question now is: do you break before or after the operator? The convention is to break before operators. This makes it easy to see each operator next to the operand it acts on.

# Break lines before operators.
gdp = (consumption
       + government_spending
       + investment
       + net_exports)
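Returning to the earlier advice about spaces and tabs: Python 3 refuses to run code whose indentation mixes the two inconsistently. The sketch below builds a small source string (the `"\t"` escape is a tab, the next line uses eight spaces) and compiles it with the built-in `compile()`; the variable names are chosen just for this example.

```python
# Line 2 is indented with a tab, line 3 with eight spaces.
source = "if True:\n\tx = 1\n        y = 2\n"

try:
    compile(source, "<example>", "exec")
    outcome = "compiled"
except TabError as err:
    outcome = f"TabError: {err}"

print(outcome)
```

Sticking to four spaces everywhere avoids this class of error entirely.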

Using blank lines

Cramming lines together only makes it harder for the reader to understand your code. A good way to make your code look cleaner and more pleasant is to introduce an appropriate number of blank lines.

  • Top-level functions and classes should be separated by two blank lines
# separating a class and a top-level function
class SampleClass():
    pass


def sample_function():
    print("Top level function")
  • Methods in a class should be separated by a single blank line
# separating methods within a class
class MyClass():
    def method_one(self):
        print("First method")

    def method_two(self):
        print("Second method")
  • Try not to include blank lines between pieces of code that have related logic and function
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

def remove_stopwords(text):
    stop_words = stopwords.words("english")
    tokens = word_tokenize(text)
    clean_text = [word for word in tokens if word not in stop_words]

    return clean_text
  • Use blank lines sparingly inside functions to separate logical sections. This makes the code easier to understand
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

def remove_stopwords(text):
    stop_words = stopwords.words("english")
    tokens = word_tokenize(text)
    clean_text = [word for word in tokens if word not in stop_words]

    clean_text = ' '.join(clean_text)
    clean_text = clean_text.lower()

    return clean_text

Maximum line length

  • A line contains no more than 79 characters

When you write code in Python, you should not squeeze more than 79 characters into a single line. Treat this limit as a guideline for keeping your statements short.

  • You can split a statement across multiple lines to turn it into shorter lines of code
# breaking into multiple lines
num_list = [y for y in range(100)
            if y % 2 == 0
            if y % 5 == 0]
print(num_list)
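If you want to spot overlong lines yourself, a few lines of Python are enough. This is only a sketch: `find_long_lines` is a hypothetical helper, and the sample source is generated so that its second line is deliberately too long.

```python
def find_long_lines(source, limit=79):
    """Return (line_number, length) pairs for lines longer than `limit`."""
    return [(number, len(line))
            for number, line in enumerate(source.splitlines(), start=1)
            if len(line) > limit]


# Second line: 12 + 20 * 6 + 3 = 135 characters.
sample = "short = 1\n" + "long_name = " + "'x' + " * 20 + "'x'\n"
print(find_long_lines(sample))  # [(2, 135)]
```

Note that the helper itself uses the bracket-based line breaking shown above to stay within the limit.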

Importing packages

Part of the reason many data scientists love Python is the abundance of libraries that make working with data easier. So it is safe to assume you’ll end up importing a bunch of libraries and modules for any data science task.

  • Imports should always be at the top of the Python script

  • Separate libraries should be imported on separate lines

import numpy as np
import pandas as pd

df = pd.read_csv(r'/sample.csv')
  • Imports should be grouped in the following order:

    • Standard library imports
    • Related third-party imports
    • Local application/library-specific imports
  • Put a blank line after each group of imports

from glob import glob

import matplotlib
import numpy as np
import pandas as pd
import spacy

import mypackage
  • You can import multiple classes from the same module in a single line
from math import ceil, floor

Be familiar with correct Python comments

Understanding a piece of uncommented code can be a laborious task. Even the original author of the code can have a hard time remembering exactly what a given line was doing after a while.

Therefore, it is best to comment your code as you write it, so that the reader can correctly understand what you are trying to achieve.

General tips

  • Comments always begin with a capital letter

  • Comments should be complete sentences

  • Update comments when updating code

  • Avoid commenting on the obvious

Block comments

  • Describe the piece of code that follows them

  • Have the same indentation as that piece of code

  • Start with a # followed by a single space

# Remove non-alphanumeric characters from the user input string.
import re

raw_text = input('Enter string: ')
text = re.sub(r'\W+', ' ', raw_text)

Inline comments

  • These comments are on the same line as the code statements

  • They should be separated from the code statement by at least two spaces

  • Start with the usual #, followed by a space

  • Don’t use them to state the obvious

  • Use them sparingly as they can be distracting

info_dict = {}  # dictionary for storing extracted information

Documentation strings

  • Used to describe public modules, classes, functions, and methods

  • Also known as “docstrings”

  • They stand out from other comments because they are enclosed in triple quotes

  • If the docstring fits on a single line, put the closing """ on the same line

  • If the docstring spans multiple lines, put the closing """ on a line by itself

def square_num(x):
    """Return the square of x."""
    return x**2

def power(x, y):
    """Multi-line docstring.

    Return x**y.
    """
    return x**y
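Unlike ordinary comments, docstrings are kept at runtime, which is what lets tools such as `help()` display them. A minimal sketch using the standard `inspect.getdoc` function follows; the docstring text is an assumption written for this example.

```python
import inspect


def power(x, y):
    """
    Return x raised to the power y.

    Both arguments are expected to be numbers.
    """
    return x**y


# getdoc() strips the surrounding blank lines and the common indentation.
summary = inspect.getdoc(power).splitlines()[0]
print(summary)  # Return x raised to the power y.
```

The raw text is also available as `power.__doc__`, with its original indentation intact.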

Whitespace in Python code

Whitespace is often dismissed as a trivial aspect of writing beautiful code. But using whitespace correctly can greatly improve the readability of your code. It keeps statements and expressions from becoming overcrowded, which helps the reader navigate the code with ease.

Key points

  • Avoid putting spaces immediately inside brackets
# correct
df['text'] = df['text'].apply(preprocess)
  • Do not put a space before a comma, semicolon, or colon
# correct
name_split = lambda x: x.split()
  • Do not put a space between a name and the opening bracket that follows it
# correct
print('This is the right way')

# correct
for i in range(5):
    name_dict[i] = input_list[i]
  • When several operators are used, put spaces only around the operators with the lowest priority
# correct
ans = x**2 + b*x + c
  • In slices, the colon acts as a binary operator

Colons should be treated as the lowest-priority operator, with an equal amount of space on either side of each colon

# correct
df_train[lower_bound+5 : upper_bound-5]
  • Trailing whitespace should be avoided

  • Function parameter defaults do not have Spaces around the = sign

def exp(base, power=2):
    return base**power
  • Always surround the following binary operators with a single space on each side:
    • Assignment operators (=, +=, -=, etc.)
    • Comparisons (==, <, >, !=, <>, <=, >=, in, not in, is, is not)
    • Booleans (and, or, not)
# correct
brooklyn = ['Amy', 'Terry', 'Gina', 'Jake']
count = 0
for name in brooklyn:
    if name == 'Jake':
        print("Cool")
        count += 1
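It is worth stressing that the spacing rules around operators are purely for human readers. As a rough check, the standard `ast` module can parse a tidy and a cramped version of the same statement and show that Python sees them as identical; the variable names here are chosen for the example.

```python
import ast

tidy = "ans = x**2 + b*x + c"
cramped = "ans=x**2+b*x+c"

# ast.dump() omits line and column positions by default,
# so only the structure of the code is compared.
same_tree = ast.dump(ast.parse(tidy)) == ast.dump(ast.parse(cramped))
print(same_tree)  # True
```

Since the interpreter does not care, the only audience for good spacing is whoever reads the code next.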

General programming advice for Python

In general, there are many ways to write a piece of code. When they accomplish the same task, it is best to use the recommended writing method and maintain consistency. I’ve covered some of them in this section.

  • Always use “is” or “is not” when comparing with singletons like None. Do not use the equality operators
# wrong
if name != None:
    print("Not null")

# correct
if name is not None:
    print("Not null")
  • Do not compare boolean values to True or False with comparison operators. It may feel intuitive, but it is unnecessary. Just write the boolean expression
# wrong
if valid == True:
    print("Wrong")

# correct
if valid:
    print("Correct")
  • Instead of binding a lambda function to an identifier, use a regular def function. Assigning a lambda to an identifier defeats its purpose, and a named function makes tracebacks easier to follow
# prefer this
def func(x):
    return x**2

# over this
func = lambda x: x**2
  • When catching exceptions, name the exception you are catching. Do not use a bare except. This ensures that the except block does not mask other exceptions, such as the KeyboardInterrupt raised when you try to interrupt execution
try:
    x = 1/0
except ZeroDivisionError:
    print('Cannot divide by zero')
  • Be consistent with your return statements. That is, either all return statements in a function should return an expression, or none of them should. Also, if one return statement returns a value, any return that has nothing to return should say return None explicitly rather than nothing at all
# wrong
def sample(x):
    if x > 0:
        return x+1
    elif x == 0:
        return
    else:
        return x-1

# correct
def sample(x):
    if x > 0:
        return x+1
    elif x == 0:
        return None
    else:
        return x-1
  • If you want to check for a prefix or suffix in a string, use “.startswith()” and “.endswith()” instead of slicing the string. They are cleaner and less error-prone
# correct
if name.endswith('and'):
    print('Great!')
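Three of the points above can be checked in a few lines. This is a sketch for illustration only: the `AlwaysEqual` class is hypothetical and deliberately badly behaved, to show why an identity check against None is safer than an equality check.

```python
class AlwaysEqual:
    """Hypothetical class whose __eq__ claims equality with everything."""

    def __eq__(self, other):
        return True


value = AlwaysEqual()
print(value == None)  # True, which is misleading
print(value is None)  # False, the identity check cannot be overridden

# KeyboardInterrupt does not derive from Exception, so naming the exception
# you expect (or using `except Exception`) leaves Ctrl+C working.
print(issubclass(KeyboardInterrupt, Exception))  # False

name = "brand"
print(name[-3:] == "and", name.endswith("and"))  # True True
print(name.endswith(("and", "ing")))             # True, a tuple checks several suffixes
```

With slicing you have to keep the slice length in step with the suffix by hand, which is where the errors tend to creep in.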

Automatic formatting of Python code

Formatting is not a problem when you write small programs. But imagine having to follow the correct formatting rules for a complex program that runs thousands of lines! This is definitely a daunting task. And, most of the time, you don’t even remember all the formatting rules.

How can we solve this problem? Well, we can do this with some automatic formatters!

An autoformatter is a program that identifies formatting errors and fixes them in place. Black is one such autoformatter: it automatically reformats Python code to conform to the PEP 8 coding style, taking that burden off you.

Black: pypi.org/project/bla…

It can be easily installed with pip by typing the following command in the terminal:

pip install black

But let’s see how Black helps in the real world. Let’s use it to format a script, style_script.py, that contains several kinds of formatting errors.

All we need to do is go to the terminal and type the following command:

black style_script.py

When it finishes, Black will have made its changes and will print a message confirming that the file was reformatted.

When you open the program again, the changes are reflected in the file. The code is now formatted correctly, which helps in case you accidentally violate the formatting rules.

Black can also be integrated with editors such as Atom, Sublime Text, and Visual Studio Code, and even with Jupyter Notebook! This is definitely an extension you won’t want to miss.

In addition to Black, there are other autoformatters, such as autopep8 and yapf, which you can also try!

Conclusion

We’ve covered many of the key points of the Python style guide. If you consistently follow these principles in your code, you’ll end up with cleaner and more readable code.

In addition, when you work as a team on a project, it is beneficial to follow a common standard. It makes it easier for other collaborators to understand. Start adding these style tips to your Python code!

Original link: www.analyticsvidhya.com/blog/2020/0…
