Knowledge database
This code is not really needed or may be simplified
Seems like things could be organized in a better way.
Whatever you want to code, usually someone has already done that. And usually there are some tools to cover your case. And usually you can find them using Google.
Side effects are evil: it's very hard to track what function does under the hood, because function signature does not reflect side effects. Always try to write "pure" functions, i.e. those which receive inputs, do something internally, and return output. You should be able to run such function million times and get exactly the same result with nothing changed in the environment. However, side effects are totally OK if that's the only thing the function does.
Exceptions should be easy to catch. If your code throws only Exception
or ValueError
, then it's very hard to catch specific errors, because all thrown exception classes are the same. Create application-specific exceptions, so that every logical error has its own exception class: class VerySpecificException(Exception): pass
If some value is hidden somewhere in the middle of code, it will be very hard to find it later. Better solutions: 1) use environment variables 2) use constants in module's top section 3) use class variables.
Key constants of a class should be so-called "class variables" instead of using/defining them inside methods. If you create a child class and want to change those variables, you'll have to rewrite entire methods. With class vars, you only have to overwrite those class vars in child. Example: class MyClass: SOME_VALUE = 1
and class MySubclass(Class): SOME_VALUE = 2
Catching Exception
or everything means that you don't understand what the code does. So you may catch and suppress exceptions which you probably didn't want to catch at all but you weren't aware of them. It is better for your program to fail, rather then catching everything.
You should include as few lines of code inside try
block as possible. First, you may accidentally catch exceptions from lines which you didn't want to be caught. Second, the smaller try ... except
blocks are, the less cognitive load it makes.
When you have some code in module scope, the code will be executed when someone imports that module - which makes the module completely non-reusable. Maybe it's not meant to be reusable today, but you'll have to refactor if you need to import the module later. Use if __name__ == '__main__'
to detect whether module was imported or used as a standalone script.
Global variables have one drawback: they are global. Which means that any snippet of code which relies on such global variables also becomes global, i.e. you can't run two copies of that code simultaneously. Also, global variables prevent you from writing clean tests where you can change input parameters and verify that your program behaves correctly - because some input parameter is a global variable.
Idiomatic python type check is if isinstance(type, variable)
or if isinstance((type1, type2), variable)
. Using type(variable) == SomeClass
smells because it won't work if variable is a subclass of SomeClass
.
You. Should. Never. Use. Eval.
Executing code from a variable is very dangerous. Almost always there is a better option than using eval. Usually all you need is either getattr()
or google.
Surprise! Python's requests
module does not have a timeout by default! So if you don't use the timeout
parameter when making a request and if remote resource hangs forever (or is very slow), your code will hang, too. Always use reasonable timeouts. A good value may be 5 seconds.
One-liners' benefit is that they do one action in one line. This means that 1) your code will take less lines of code (which improves readability) 2) there will be only one assignment to a variable (a = x if b else y]
instead of if b: a = x else: a = y
), which is less error-prone 3) if using list/dict/etc comprehensions, you simply write a faster code because they are fast
When catching an exception and then raising another one, you can chain them to get better tracebacks. Use raise MyException from OriginalException
.
When function's argument has a mutable default, it may lead to bugs because defaults are shared across all invocations of function. So def fn(a: list = [])
is a bad practice and should be replaced, for example, with def fn(a: list | None = None): a = a or []
There are functions that receive input and return output. There are functions that receive some object and modify it in-place. For example, reversed()
vs list.reverse()
, dataframe.apply()
vs dataframe.apply(inplace=True)
. Doing both at the same time is confusing: when function returns something, it's not expected that it has any side effect, but here it does.
It is usually not a good practice to hardcode specific python interpreter at specific path: #!/usr/bin/python
. Instead, use #!/usr/bin/env python
, which will interpret $PATH
and make your script more portable.
Python doesn't check whether you inspected all possible values of some collection. If a variable can become "one" or "two", and you check if var == "one": ... elif var == "two": ...
then this code will break when one day you allow variable to be "three". You should always check whether the variable holds something unexpected, and in such case an exception should be raised: else: raise ValueError(f'Unexpected value: {var}')
.
Calling super()
is a well-readable way to show that you want to call super method. If you want to call direct parent's method, just use super().method(args)
. Isn't it beautiful?
Very often developers use beginning in value
instead of value.startswith(beginning)
. For example, checking 'google.com' in url
may become true if url is https://evil.com?q=google.com
. This may lead to vulnerabilities, sometimes severe. If you want to check that some string starts with some value, use some_string.startswith(some_value)
.
If your program is a CLI and it fails, then consider exiting with non-zero code, for example exit(1)
.
Once you open a file, you should close it afterwards. In cpython not closing a file is relatively safe, in pypy it's not, and in general one should not rely on python implementation details.
Using sleep()
and hoping that something will happen within sleep duration (for example, web page loading) is a very bad practice, because nothing guarantees you that. The assumption that some event happens within some time is very weak. Instead of using some duration number, you should strictly rely on mechanisms that report when the page is loaded. As a simple example, use while True
with delays and timeout, and at each iteration check that the page was loaded.
People make typos. Here it is.
In python, when function fails to do something it's a common practice to either raise an exception or return None
. Returning empty strings, False
or anything else is confusing. Additionally it's nice that 1) python's return
is the same as return None
and 2) when execution flow reaches end of function and sees no "return" statement, it invokes return None
automatically.
Some condition not taken into account
It is a good practice to not force using specific python from specific path #!/usr/bin/python
, and instead make #!/usr/bin/env python
which will interpret $PATH
and make your scripts more portable.
zip()
function is a nice way to process pairs, triples etc of elements from different collection. If collections are of different sizes, you may need zip_longest
.
print()
is a nice way to output to stdout. But one day you'll need to not only write to stdout, but also, say, to a file. Another day you'll need to output only severe errors' messages, and nothing else. This all could be solved if using logging
module. Usually it's as easy as from logging import getLogger; log = getLogger(__name__)
.
Calculating something multiple times is redundant - use variables to store result of some calculation.
List comprehensions are faster than .append()
or .extend()
. When you use a loop for filling in a list, it's a high chance that list comprehension will do a better job. You may also use asterisk (*
) to generate list from single and multiple elements: all_items = [*items_group1, *items_group2, single_element]
Developers always write code without bugs. So there's no chance of writing a code that will never exit infinite loop, right? However, if it happens anyway and your program ends up in an infinite loop, it won't just break and raise an exception. It will get stuck, become unresponsive, and maybe will eat all memory if you allocate something inside the loop. Very dangerous, but easy to avoid - use for
loops with iteration limit: for _ in range(10_000): <your code> else: raise LoopError('Ran code 10k times and didn't break from loop, something went terribly wrong')
Many pandas dataframe methods have inplace
param which modifies the dataframe itself and does not create a new one. This is helpful if you want to avoid redundant creation of new dataframe and removing an old one, which happens when assigning to original variable: df = df.apply(...)
-> df.apply(..., inplace=True)
.
When writing decorators, it is a good practice to decorate them with @wraps(decorated_function)
which does some nice things you usually don't think about when decorating: preserving name and docstring of the decorated function.
Searching in list is slow - O(n)
. Searching in set is fast - O(1)
. Also, lists are mutable while sets are not, so if you want to indicate and ensure that some collection never changes - use sets. So [1, 2, 3]
-> {1, 2, 3}
.
Each call of requests.get()
(and other methods) creates a new connection to web resource. It is okay if you make a single request, but is very non-optimal when making subsequent requests. You should always use sessions which will hold connection between requests: session = requests.Session(); response = session.get(url)
Sometimes you need to split string and take only first or last element of the split. By default, str.split()
does as much splits as possible, but you can limit it with maxsplit
parameter. Say, you have a string address = 'domain.com/a/b/c/d'
. address.split('/')
will result in ['domain.com', 'a', 'b', 'c', 'd']
, while address.split('/', maxsplit=1)
will do only minimal required job and result in ['domain.com', 'a/b/c/d']
.
Generators allow you to memory-efficiently iterate over collection of items because you don't need to store all of them in memory at once. Also, if you break at some iteration, you won't even spend memory and CPU time on processing unneeded items.
Idiomatic python way is to "do something, and if it fails - catch an exception". Sometimes developers do it other way round: "ask if something is possible, then do it". This adds an overhead for that "asking" part. For example, checking some key in dictionary and (if it's present) access it requires two operations, while accessing key without check is one operation. Of course, you can't always follow this pattern and it depends on the situation. Read more about this on Google.
Type hints help humans and linters (like mypy
) to understand what to expect "in" and "out" for a function. Not only it serves as a documentation for others (and you after some time, when the code is wiped from your "brain cache"), but also allows using automated tools to find type errors.
Python's built-in pathlib
library is nice! It's less verbose than os.path
and is easier to read, for example: root = Path('root'); subdir = root / 'subfolder1' / 'subfolder2'; subdir.mkdir(exist_ok=True)
. Use it!
"Early quit" is a common pattern which reduces branching and indents in python code. Instead of writing if a: <1000 lines of code> else: <return / raise exception>
, revert the condition: if not a: <return / raise exception>
. This way the "else" clause is not needed, and program logic is straightforward: if something is wrong, quit or raise, otherwise go on.
Functions should be small - at least fit your screen's height. Otherwise they will be hard to read and hard to test. Try splitting big function into smaller ones.
Copy-paste lead to errors very, very often because developers forget to change value on one of copy-pasted lines . You should avoid it as much as possible. Usually a good solution is to extract the things that differ into separate variables.
Dataclasses let you get rid of many boilerplate code, most often the "init hell": def __init__(self, a): self.a = a
. With dataclasses, it's all done automatically!
Using len
and range
in python's for
loop smells. Idiomatic python iteration looks like for element in collection
. If you need element's index as well, use for i, element in enumerate(collection)
.
F-strings are powerful and very easy to read. Compare: 'text ' + str(number) + ' text'
vs f'text {number} text'
There is no need to create something and then modify it if you can create that something already with modifications. For example, items = []; items.append(1)
-> items = [1]
By looking at function name, stranger should roughly understand what the function does. If he doesn't, then the name should be changed.
Never overwrite built-in names, like list
, sum
or max
- if later someone wants to use built-in function, he will find it replaced with your variable. If you really need to use one of those names, append underscore: sum_ = 15
.
There is a well-known pattern "if a: return one else: return two". In fact, the else
keyword is redundant, because when first if
condition evaluates to True
, the function returns, otherwise control flow goes on. Removing else
makes the code cleaner and easier to read.
Simple rule: read a variable name as if you were English-speaking person without a context and guess what's inside - and if you have no clue, then you should give it a better name. Exceptions: i
, j
, x
, y
etc.
requests
module has built-in method to raise exception on bad response: response.raise_for_status()
. If you need to perform some additional actions on bad response, use if not response.ok: ...
if a == True
should be replaced with if a is True
, or better just if a
. if a == False
should be replaced with if a is False
, or better just if not a
. But be careful with cases when a
may be None
or is an object which has special rules of conversion to boolean.
Some methods within a class don't depend on self
variable. This is an indication that such methods should be classmethods. Use the @classmethod
decorator.
Magic variables (usually numeric values without explanation why the values are exactly like that) are evil. One should make them class / module variables or put in some config / env files with explanation why these values were chosen.
When function has too many parameters, it is very hard to read and also it is very easy to accidentally mix them and instead of calling fn(.., o, p, q)
call fn(..., o, q, p)
. Solution is to either refactor the function to receive smaller amount of data (maybe function is too big and can be divided into smaller functions?), or to send one parameter which is some data structure and incorporates all the parameters (this structure could be NamedTuple
, dataclass
, or TypedDict
).
When you want to show that the same type is used in different places - use TypeVar
instead of Any
.
It is a good practice to create new variables as close as possible to the place where they are used. This minimizes cognitive load: fewer things to remember.
Setattr
and getattr
are helpful when you calculate some attribute names dynamically. However, you later, your IDE, and other devs may have troubles understanding which attributes you accessed. Always try to avoid using setattr
and getattr
. Dictionaries are usually a good replacement.
dict
type has built-in methods: keys()
- to get keys iterator; values()
- to get values iterator; items()
- to get (key, value)
iterator. Use it.
Calling super()
doesn't need any arguments in most cases. super(Class, self)
-> super()
.
Context managers may be combined: with open('file1') as file1, open('file2') as file2:
Almost always class-based views are more readable and extendable than plain "function" views. As an example, CBVs allow you to handle GET, POST and other verbs in separate methods, which adds readability to your code.
Use contextlib.suppress
to ignore several types of exceptions: try: ... except ValueError: pass
-> with suppress(ValueError): ...
.
If you start writing in one manner, you should continue doing so till the end. Otherwise it will be harder to read your code.
timedelta
is a python built-in data structure which allows defining time periods in a human-readable way (timedelta(hours=5)
) and converting to seconds. Developers often define time periods in ints like TIMEOUT = 5 * 60
which does not reveal which unit it is, and is hard to read. Use timedelta
.
map()
and filter()
functions came to python by accident and have rare use-case. Usually list comprehension / generator is more readable and faster. list(map(int, filter(lambda x: x > 5, items)))
-> [int(x) for x in items if x > 5]
.