Functions commonly used in Python

1.tf.constant() function
import tensorflow as tf
tf.constant(
	value,
	dtype = None,
	shape = None,
	name = "Const",
	verify_shape = False
)

1.value: required. A scalar value or a list.
2.dtype: the data type of the result, typically tf.float64, tf.float32, etc.
3.shape: the shape of the resulting tensor.
4.name: the name of the constant (default "Const").
5.verify_shape: defaults to False. If set to True, TensorFlow checks whether the shape of value matches shape and raises an error if it does not. This parameter exists only in TensorFlow 1.x (tf.compat.v1.constant in TensorFlow 2.x).

2.seed() function
random.seed(n): if the same value of n is used, the random number functions generate the same sequence every time. If no seed is set, the system chooses one itself (from the current time or an operating-system source of randomness), so the sequence differs from run to run.
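A minimal sketch of the reproducibility described above, using only the standard random module; the seed value 10 is an arbitrary choice:

```python
import random

# Seed the generator, then draw five integers.
random.seed(10)
first_run = [random.randint(0, 100) for _ in range(5)]

# Re-seed with the same value: the same five integers come out again.
random.seed(10)
second_run = [random.randint(0, 100) for _ in range(5)]

print(first_run)
print(second_run)
print(first_run == second_run)  # True: same seed, same sequence
```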

3.join() function and os.path.join() function
join(): joins the elements of a sequence of strings (a list, tuple, etc.) into a new string, separated by the specified character
Syntax: 'sep'.join(seq)
sep: the separator, which can be empty
seq: the sequence of elements to join: a string, tuple, list or dictionary (for a dictionary, its keys are joined); the elements must themselves be strings
All elements of seq are combined into a new string, with sep placed between them
Return value: the string produced by joining the elements with the separator sep
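A short sketch of str.join() with the kinds of sequences listed above; the sample strings are illustrative only:

```python
words = ["home", "develop", "code"]

joined_slash = "/".join(words)            # separator goes between elements
joined_empty = "".join(words)             # empty separator: plain concatenation
joined_tuple = "-".join(("a", "b", "c"))  # a tuple works the same way
joined_keys = ",".join({"x": 1, "y": 2})  # a dict contributes its keys

# Non-string elements are not converted automatically.
try:
    ",".join([1, 2])
    join_error = None
except TypeError as err:
    join_error = type(err).__name__

print(joined_slash, joined_empty, joined_tuple, joined_keys, join_error)
```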
os.path.join(): joins two or more path components
1. A path separator ('/' on Linux/macOS, '\' on Windows) is inserted after each component that does not already end with one
2. If a component is an absolute path, all components before it are discarded
3. If the last component is empty, the resulting path ends with a separator
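The three rules can be checked with a small sketch. It calls posixpath.join (the implementation os.path uses on Linux/macOS) so the result is the same on every platform; on Windows, os.path is ntpath and uses '\' instead:

```python
import posixpath

# Rule 1: separators are inserted between components.
p1 = posixpath.join("home", "develop", "code")
# Rule 2: an absolute component discards everything before it.
p2 = posixpath.join("home", "/develop", "code")
# Rule 3: an empty last component leaves a trailing separator.
p3 = posixpath.join("home", "develop", "")

print(p1)  # home/develop/code
print(p2)  # /develop/code
print(p3)  # home/develop/
```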

import os

Path1 = 'home'
Path2 = 'develop'
Path3 = 'code'

Path10 = Path1 + Path2 + Path3
Path20 = os.path.join(Path1, Path2, Path3)
print('Path10 = ', Path10)
print('Path20 = ', Path20)

Output (on Windows; on Linux/macOS Path20 is home/develop/code):
Path10 =  homedevelopcode
Path20 =  home\develop\code

4.unique() function

np.unique(A, return_index=True, return_inverse=True)

A NumPy function that returns the distinct values of the input array, sorted in ascending order
Optional parameters:
return_index=True: also returns, for each unique value, the index of its first occurrence in the original array, so it has as many elements as the unique array
return_inverse=True: also returns, for each element of the original array, its index in the unique array, so it has as many elements as the original array
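A small sketch of both optional outputs, on a hypothetical input array:

```python
import numpy as np

A = np.array([3, 1, 2, 3, 1])
values, first_idx, inverse = np.unique(A, return_index=True, return_inverse=True)

print(values)           # [1 2 3]      sorted unique values
print(first_idx)        # [1 2 0]      where each unique value first appears in A
print(inverse)          # [2 0 1 2 0]  position of each element of A within values
print(values[inverse])  # [3 1 2 3 1]  indexing with inverse rebuilds A
```

The last line shows why return_inverse has the same length as the original array: it is exactly the mapping needed to reconstruct it.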
5.shape() function
Returns the dimensions (the length along each axis) of a matrix or array
shape[0]: the number of rows
shape[1]: the number of columns
shape: the numbers of rows and columns, returned together as a tuple
(1) When the array or matrix is one-dimensional, only shape[0] can be used

Output: 3
# For a one-dimensional array, shape[0] is the number of elements

(2) When the array or matrix is two-dimensional (note the () and the outer [] that wrap the rows when writing the array)
shape[0]: number of rows read
shape[1]: read the number of columns
shape: row and column arrays are directly output in tuple form

import numpy as np
Output results: (4, 3)

(3) When the array or matrix is three-dimensional (note the one () and two levels of [] outside the data)

import numpy as np
Output results: (1, 4, 3)

(4) For a scalar (a single value), the shape is an empty tuple

import numpy as np
print(np.shape(3))   # Output: ()
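The four cases above can be reproduced in one sketch; the array contents are hypothetical, chosen only to give the shapes 3, (4, 3) and (1, 4, 3):

```python
import numpy as np

a1 = np.array([1, 2, 3])                       # one-dimensional
a2 = np.array([[1, 2, 3], [4, 5, 6],
               [7, 8, 9], [10, 11, 12]])       # 4 rows, 3 columns
a3 = np.array([[[1, 2, 3], [4, 5, 6],
                [7, 8, 9], [10, 11, 12]]])     # one extra outer []

print(a1.shape[0])               # 3: number of elements
print(a2.shape)                  # (4, 3)
print(a2.shape[0], a2.shape[1])  # 4 3: rows, columns
print(a3.shape)                  # (1, 4, 3)
print(np.shape(3))               # (): a scalar has no dimensions
```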

6.read_csv() function
header: the row number(s) to use as the column names. By default pandas infers this and uses the first row (header=0); if the file has no header row, pass header=None so the columns are numbered 0, 1, 2, ...
index_col: the column number(s) or name(s) to use as the row index. For a multi-level row index, pass a list: index_col=[0, 1] uses the first and second columns of the file as index columns
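A minimal sketch of header and index_col, reading hypothetical CSV text from memory with io.StringIO instead of a file:

```python
import io

import pandas as pd

csv_text = "id,name,score\n1,a,90\n2,b,85\n"

# First row holds the column names; the first column becomes the row index.
df = pd.read_csv(io.StringIO(csv_text), header=0, index_col=0)
print(df)

# A file without a header row: columns are numbered 0, 1, 2.
raw = pd.read_csv(io.StringIO("1,a,90\n2,b,85\n"), header=None)
print(raw.columns.tolist())  # [0, 1, 2]

# Two index columns give a two-level row index.
multi = pd.read_csv(io.StringIO(csv_text), index_col=[0, 1])
print(multi.index.nlevels)   # 2
```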

7.format() function
Formats a string: each {} opens a slot in the string, and the arguments of format() are filled into the slots
1. Passing arguments
(1) Positional arguments
Without indices (default order)
"{} {} {}".format("intelligence", "demo", "WeChat official account")
'intelligence demo WeChat official account'
With explicit indices (default order)
"{0} {1} {2}".format("intelligence", "demo", "WeChat official account")
'intelligence demo WeChat official account'
With explicit indices (custom order)
"{1} {0} {2}".format("intelligence", "demo", "WeChat official account")
'demo intelligence WeChat official account'
Extra arguments are ignored
"{} {}".format("intelligence", "demo", "WeChat official account")
'intelligence demo'
An argument used more than once
"{0} {1}|{0} {1} {2}".format("intelligence", "demo", "WeChat official account")
'intelligence demo|intelligence demo WeChat official account'
(2) Keyword arguments
In format(), a slot can name a keyword, and key=value fills that slot
"{name1} {name2}-{style}".format(name2="demonstration", style="WeChat official account", name1="intelligence")
'intelligence demonstration-WeChat official account'
(3) Passing a dictionary
Arguments can be passed as a dictionary by unpacking it with **
d = {'name': 'intelligent demonstration', 'style': 'WeChat official account'}
"{name}-{style}".format(**d)
'intelligent demonstration-WeChat official account'

8.loc and iloc functions
In loc, "loc" stands for location; the i in iloc stands for integer
loc selects data by label, e.g. the row labels 0-4 and the column labels 'A'-'B'. The first argument selects rows (index labels) and the second selects columns

df.loc[0, :]: all columns of the row labelled 0
df.loc[:, 'A']: all rows of column A
':' means "all"; a label slice in loc includes both the start and the end label
iloc selects data by position, i.e. the data in row n and column m. It accepts only integers, and its slices are half-open: the start is included and the end is excluded

For example, to get all the data in the first column, use df.iloc[:, 0]; iloc does not accept a label such as 'A'

In the chained form df.iloc[row][column], iloc takes an integer row position and returns that row as a Series indexed by the column labels, so the second [] is ordinary Series indexing and accepts a column name
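A sketch contrasting the two slicing rules on a hypothetical 5×2 DataFrame (column A holds 0, 2, 4, 6, 8 and column B holds 1, 3, 5, 7, 9):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(10).reshape(5, 2), index=range(5), columns=list("AB"))

by_label = df.loc[0:2, "A"]    # labels 0, 1, 2: both ends included
by_position = df.iloc[0:2, 0]  # positions 0, 1: end excluded
first_row = df.loc[0, :]       # all columns of the row labelled 0
first_col = df.iloc[:, 0]      # all rows of the first column
chained = df.iloc[1]["B"]      # iloc picks row 1, then a column name works

print(len(by_label), len(by_position), chained)  # 3 2 3
```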

9.groupby function

df = pd.DataFrame(np.random.randn(5,2),index = range(0,5,1),columns = list('AB'))


df = df.groupby('A').count().reset_index()

Without reset_index(), the grouping column A stays as the index of the result instead of becoming an ordinary column

Without [['A']]

Replace with [['B']]
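In the example above, column A holds random floats, so every group has exactly one row. The sketch below uses hypothetical data with repeated keys to make the counts visible; note that count() skips missing values:

```python
import pandas as pd

df = pd.DataFrame({"A": ["x", "y", "x", "x"], "B": [1, 2, None, 4]})

counts = df.groupby("A").count()                # "A" becomes the index
flat = df.groupby("A").count().reset_index()    # "A" is a normal column again

print(counts)
print(flat)
```

Group x has three rows but only two non-missing B values, so its count is 2.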

10.python reads yaml configuration file
Read the file contents with open(), then convert the text into a dictionary or list with PyYAML's load functions

import yaml

print("***obtain yaml file data***")
with open(yaml_file, 'r', encoding="utf-8") as file:   # yaml_file: path to the YAML file
    file_data = file.read()
print(file_data)
print("Type:", type(file_data))

***obtain yaml file data***
# yaml key-value pairs correspond to a Python dictionary
usr: my
psw: 123455
Type: <class 'str'>

# Convert the string to a dictionary or list
print("***convert yaml data to a dictionary or list***")
data = yaml.safe_load(file_data)   # yaml.load() without a Loader argument raises an error in PyYAML 6+
print(data)
print("Type:", type(data))

***convert yaml data to a dictionary or list***
{'usr': 'my', 'psw': 123455}
Type: <class 'dict'>

11.pivot() function
Returns a reshaped DataFrame organized by the given index and column values
pivot(index=None, columns=None, values=None)
index: optional; the column to use as the row index of the new DataFrame. If not specified, the existing row index is used
columns: required; the column whose values become the column labels of the new DataFrame
values: optional; the column(s) of the original DataFrame whose values fill the new DataFrame. If not specified, all remaining columns are used, and the result has a multi-level (hierarchical) column index
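A sketch of pivot() turning hypothetical long-format data (one row per date and city) into a wide table:

```python
import pandas as pd

long_df = pd.DataFrame({
    "date": ["d1", "d1", "d2", "d2"],
    "city": ["X", "Y", "X", "Y"],
    "temp": [10, 12, 11, 13],
})

# One row per date, one column per city, cells filled from "temp".
wide = long_df.pivot(index="date", columns="city", values="temp")
print(wide)

# Without values, the remaining columns are used and the columns get two levels.
full = long_df.pivot(index="date", columns="city")
print(full.columns.nlevels)  # 2
```

pivot() requires each (index, columns) pair to be unique; duplicates raise an error, in which case pivot_table() with an aggregation function is the usual alternative.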
12.np.random.choice() function
np.random.choice(a, size=None, replace=True, p=None)
Randomly draws size samples from a (if a is an integer n, the samples come from np.arange(n))
replace: whether to sample with replacement. If False, all values drawn in one call are different; if True, repeats are possible because each drawn value is put back
p: the probability of drawing each element of a. If not specified, every element of a is equally likely
Sampling with replacement
np.random.choice(5, 3) has the same meaning as np.random.randint(0, 5, 3): three numbers drawn from [0, 5) with equal probability
np.random.choice(5, 3, p=[0.1, 0, 0.3, 0.6, 0]) draws three numbers from the five numbers [0, 1, 2, 3, 4] with probabilities p = [0.1, 0, 0.3, 0.6, 0]
Sampling without replacement
np.random.choice(a=5, size=3, replace=False, p=None)
np.random.choice(a=5, size=3, replace=False, p=[0.2, 0.1, 0.3, 0.4, 0.0])
Sampling from list elements
aa_milne_arr = ['pooh', 'rabbit', 'piglet', 'Christopher']
np.random.choice(aa_milne_arr, 5, p=[0.5, 0.1, 0.1, 0.3])
Generating random numbers
np.random.choice(5)     # one random number from [0, 5), equivalent to np.random.randint(0, 5)
np.random.choice(5, 3)  # three numbers from [0, 5) as a one-dimensional array (ndarray), equivalent to np.random.randint(0, 5, 3)
Output (for example): array([1, 4, 1])
Randomly drawing from an array, list, or tuple
Note: whatever the input is, it must be one-dimensional!

L = [1, 2, 3, 4, 5]#List list
T = (2, 4, 6, 2)#Tuple tuple
A = np.array([4, 2, 1])#numpy,array array, must be one-dimensional
A0 = np.arange(10).reshape(2, 5)#The two-dimensional array will report an error
np.random.choice(L, 5)
output:array([3, 5, 2, 1, 5])
np.random.choice(T, 5)
output:array([2, 2, 2, 4, 2])
np.random.choice(A, 5)
Output: array([1, 4, 2, 2, 1])
np.random.choice(A0, 5)#If it is a two-dimensional array, an error will be reported
 Output: ValueError: 'a' must be 1-dimensional
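The sketch below checks the properties described above rather than exact values (the draws are random); the seed 0 is an arbitrary choice that only makes reruns repeatable:

```python
import numpy as np

np.random.seed(0)

# Without replacement: the three values drawn are all different.
no_repeat = np.random.choice(5, size=3, replace=False)

# Elements whose probability is 0 (here 1 and 4) are never drawn.
weighted = np.random.choice(5, size=1000, p=[0.1, 0, 0.3, 0.6, 0])

# Sampling from a one-dimensional list of strings.
names = np.random.choice(["pooh", "rabbit", "piglet", "Christopher"], 5)

# A two-dimensional array is rejected.
error_message = ""
try:
    np.random.choice(np.arange(10).reshape(2, 5), 5)
except ValueError as err:
    error_message = str(err)

print(no_repeat, names, error_message)
```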

13. Use of the tqdm module
Displays a progress bar for a task. tqdm and trange are the most commonly used functions of the tqdm module; they work by repeatedly rewriting the current output line in the shell
tqdm: tqdm.tqdm(iterable), where the iterable in parentheses can be a list, tuple, etc.
desc=None: a str used as the title (description) of the progress bar
total=None: the expected number of iterations
ncols=None: the total width of the progress bar
mininterval=0.1: the minimum update interval, in seconds
maxinterval=10.0: the maximum update interval, in seconds. Other parameters include miniters=None, ascii=None, unit='it', unit_scale=False, dynamic_ncols=False, smoothing=0.3, bar_format=None, initial=0, position=None, and postfix, which passes extra details as a dictionary, e.g. {'speed': 10}

14.set function usage
Creates an unordered collection of unique elements, useful for membership tests, removing duplicates, and computing intersections, differences, unions, etc.
set(iterable) returns a new set object

>>> x = set('runoob')
>>> y = set('google')
>>> x, y
({'b', 'r', 'u', 'o', 'n'}, {'e', 'o', 'g', 'l'})   # duplicates removed
>>> x & y         # intersection
{'o'}
>>> x | y         # union
{'b', 'e', 'g', 'l', 'o', 'n', 'r', 'u'}
>>> x - y         # difference
{'r', 'b', 'u', 'n'}
(Sets are unordered, so the display order of the elements may vary.)

15.str.isdigit() function
Checks whether a string consists only of digits
Syntax: str.isdigit(), no arguments in the parentheses
Return value: True if the string is non-empty and contains only digits; otherwise False

s = "123456"   # only digits in this string
print(s.isdigit())   # True
s = "this is string!!!"
print(s.isdigit())   # False

16.os.listdir() function
Returns a list of the names of the files and folders contained in the specified folder
The list does not include the special entries '.' and '..'
Syntax: os.listdir(path), where path is the directory to list
Return value: a list of the files and folders under the specified path, in arbitrary order

import os

# Directory to list
path = "/var/www/html/"
dirs = os.listdir(path)

# Print all files and folders
for file in dirs:
    print(file)

Output: the names of the entries in /var/www/html/, one per line
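Because the output above depends on the machine, here is a self-contained sketch that lists a temporary directory built on the spot (the names a.txt and sub are hypothetical):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    open(os.path.join(root, "a.txt"), "w").close()  # an empty file
    os.mkdir(os.path.join(root, "sub"))             # an empty folder
    entries = sorted(os.listdir(root))              # order is arbitrary, so sort

print(entries)  # ['a.txt', 'sub']: no '.' or '..', and no recursion into sub
```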

17.filter() function
Filters a sequence: elements that fail the test are dropped, and an iterator over the remaining elements is returned (use list() to turn it into a list). It takes two arguments: a function and an iterable. Each element of the iterable is passed to the function, which returns True or False, and the elements for which it returns True are kept
Syntax: filter(function, iterable), where function is the test function and iterable is an iterable object; returns an iterator

def is_odd(n):
    return n % 2 == 1
tmplist = filter(is_odd, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
newlist = list(tmplist)
print(newlist)
Output:
[1, 3, 5, 7, 9]

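18.assert statement
assert stops the program when a condition that must hold turns out to be false
In Python, assert is a statement rather than a function: assert expression[, message]
If expression is false, Python raises AssertionError (carrying the optional message), which terminates the program unless it is caught. Assertions are skipped when Python runs with the -O option
(For comparison, the C macro void assert(int expression) evaluates the expression and, if it is false (0), prints an error message to stderr and terminates the program by calling abort.)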
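A minimal sketch of Python's assert with a message; safe_sqrt is a hypothetical helper. The example assumes Python is not run with -O, since that option removes assert statements:

```python
def safe_sqrt(x):
    # Hypothetical helper: refuse negative input.
    assert x >= 0, "x must be non-negative"
    return x ** 0.5

print(safe_sqrt(9))  # 3.0

message = None
try:
    safe_sqrt(-1)
except AssertionError as err:
    message = str(err)

print(message)  # x must be non-negative
```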

19.isnull() and any() are used together
isnull() detects missing values and returns a True/False matrix with the same shape as the data
df.isnull().any() shows which columns contain missing values: True means the column has at least one null value. The default axis=0 checks column by column
data.isnull().any().sum(): the number of columns that contain null values
data.isnull().any(axis=1): checks whether each row contains null values
data.isnull().any(axis=1).sum(): the number of rows that contain null values
data.isnull().sum(): the number of null values in each column
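A sketch of the combinations listed above on a small hypothetical DataFrame: column a has no nulls, b has one and c has two:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    "a": [1, 2, 3],
    "b": [1, np.nan, 3],
    "c": [np.nan, np.nan, 3],
})

cols_with_null = data.isnull().any()                 # per column (axis=0)
n_cols_with_null = data.isnull().any().sum()         # columns b and c
rows_with_null = data.isnull().any(axis=1)           # per row
n_rows_with_null = data.isnull().any(axis=1).sum()   # rows 0 and 1
nulls_per_col = data.isnull().sum()                  # a: 0, b: 1, c: 2

print(cols_with_null, n_cols_with_null, rows_with_null, n_rows_with_null, nulls_per_col, sep="\n")
```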
any() function
Checks whether any element of the given iterable is true. It returns False if all elements are false (or the iterable is empty), and True if at least one element is true. Every value except 0, None, empty values and False counts as true

>>> any(['a', 'b', 'c', 'd'])  # list, no element is empty or 0
True
>>> any(['a', 'b', '', 'd'])   # list, one element is empty
True
>>> any([0, '', False])        # list, all elements are 0, '', False
False
>>> any(('a', 'b', 'c', 'd'))  # tuple, no element is empty or 0
True
>>> any(('a', 'b', '', 'd'))   # tuple, one element is empty
True
>>> any((0, '', False))        # tuple, all elements are 0, '', False
False
>>> any([])  # empty list
False
>>> any(())  # empty tuple
False

notnull() function
Returns a Boolean mask: True where a value is present, False where it is missing (NaN or None)

import numpy as np
import pandas as pd
from pandas import Series,DataFrame
0     True
1     True
2    False
3    False
Name: b, dtype: bool
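The code that produced the output above is not shown; the sketch below uses a hypothetical column b with two missing values, which yields the same mask, and then uses the mask to keep only the non-missing rows:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"b": [1.0, 2.0, np.nan, None]})  # hypothetical data

mask = df["b"].notnull()
print(mask)
# 0     True
# 1     True
# 2    False
# 3    False
# Name: b, dtype: bool

kept = df[mask]  # boolean indexing keeps rows where the mask is True
```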

20.upper() method
Converts the lowercase letters in a string to uppercase
Syntax: str.upper(), no arguments in the parentheses
Return value: a copy of the string with lowercase letters converted to uppercase

s = "this is string!!!"
print("s.upper() : ", s.upper())
s.upper() :  THIS IS STRING!!!

21. Magic functions in Python

__init__()
__str__()

How to understand if __name__ == '__main__' correctly
Examples explaining the magic functions in Python
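A minimal sketch of the two magic methods and the __name__ guard mentioned above; the Point class and main() function are hypothetical examples:

```python
class Point:
    def __init__(self, x, y):
        # Called automatically when Point(...) creates a new instance.
        self.x = x
        self.y = y

    def __str__(self):
        # Called by str() and print() to get a readable representation.
        return f"Point({self.x}, {self.y})"


def main():
    print(Point(1, 2))  # Point(1, 2)


# __name__ is "__main__" only when this file is run directly, not when imported.
if __name__ == "__main__":
    main()
```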
22.merge function
Parameter description of merge function in pandas
Description of concat and merge function parameters in pandas
suffixes: a tuple of strings appended to overlapping column names. The default is ('_x', '_y'). For example, if both the left and right DataFrames have a column named data, the result contains data_x and data_y
indicator: rarely used; adds a column that records where each row of the merge came from
indicator has three options:
1) False: the default; no column is added
2) True: adds a column named _merge
3) str: adds a column whose name is the given string
The column contains one of three values:
1) left_only: the key appears only in the left table, so the columns from the right table are filled with missing values
2) right_only: the key appears only in the right table
3) both: the key appears in both tables, i.e. the rows an inner join would keep
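A sketch of suffixes and indicator on two hypothetical tables that share a column named data and overlap only on key K1:

```python
import pandas as pd

left = pd.DataFrame({"key": ["K0", "K1"], "data": [1, 2]})
right = pd.DataFrame({"key": ["K1", "K2"], "data": [3, 4]})

# Outer join with the default suffixes and indicator=True.
merged = pd.merge(left, right, on="key", how="outer", indicator=True)
print(merged)  # columns: key, data_x, data_y, _merge

# Custom suffixes and a custom indicator column name (default how='inner').
custom = pd.merge(left, right, on="key", suffixes=("_left", "_right"), indicator="source")
print(custom)
```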
Merge by index

left = pd.DataFrame({'A': ['A1', 'A2'], 'B': ['B1', 'B2']}, index=['K1', 'K2'])
right = pd.DataFrame({'C': ['C2', 'C3'], 'D': ['D2', 'D3']}, index=['K2', 'K3'])
     A   B
K1  A1  B1
K2  A2  B2
     C   D
K2  C2  D2
K3  C3  D3
result = left.join(right)
     A   B    C    D
K1  A1  B1  NaN  NaN
K2  A2  B2   C2   D2
result1 = pd.merge(left, right, left_index=True, right_index=True, how='left')
     A   B    C    D
K1  A1  B1  NaN  NaN
K2  A2  B2   C2   D2
result2 = pd.merge(left, right, left_index=True, right_index=True, how='right')
      A    B   C   D
K2   A2   B2  C2  D2
K3  NaN  NaN  C3  D3
result3 = pd.merge(left, right, left_index=True, right_index=True, how='inner')
     A   B   C   D
K2  A2  B2  C2  D2

Join key columns on index

left = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'], 'B': ['B0', 'B1', 'B2', 'B3'], 'key': ['K0', 'K1', 'K0', 'K1']})
    A   B key
0  A0  B0  K0
1  A1  B1  K1
2  A2  B2  K0
3  A3  B3  K1
right = pd.DataFrame({'C': ['C0', 'C1'], 'D': ['D0', 'D1']}, index=['K0', 'K1'])
     C   D
K0  C0  D0
K1  C1  D1
left.join(right, on='key')
    A   B key   C   D
0  A0  B0  K0  C0  D0
1  A1  B1  K1  C1  D1
2  A2  B2  K0  C0  D0
3  A3  B3  K1  C1  D1
pd.merge(left, right, left_on='key', right_index=True, how='left', sort=False)
    A   B key   C   D
0  A0  B0  K0  C0  D0
1  A1  B1  K1  C1  D1
2  A2  B2  K0  C0  D0
3  A3  B3  K1  C1  D1

sort_values() function
Sorts a data set by the values in one or more fields. It can sort by the data in given columns or in given rows
DataFrame.sort_values(by='##', axis=0, ascending=True, inplace=False, na_position='last')
by: the name(s) of the column(s) to sort by (or of the row(s), when axis=1)
axis: if axis=0 or 'index', rows are reordered by the values in the columns named in by; if axis=1 or 'columns', columns are reordered by the values in the rows named in by. The default is axis=0
ascending: whether to sort in ascending order. The default is True
inplace: whether to sort the original DataFrame in place instead of returning a sorted copy. The default is False, i.e. a new DataFrame is returned and the original is unchanged
na_position: {'first', 'last'}, where missing values are placed. The default is 'last'
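A sketch of the main parameters on hypothetical score data containing one missing value:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"name": ["b", "a", "c", "d"], "score": [85, 90, np.nan, 70]})

asc = df.sort_values(by="score")  # ascending, NaN placed last
desc_nan_first = df.sort_values(by="score", ascending=False, na_position="first")
print(asc)
print(desc_nan_first)

# axis=1: reorder the columns by the values in the row labelled 0.
wide = pd.DataFrame({"x": [3], "y": [1], "z": [2]})
cols_sorted = wide.sort_values(by=0, axis=1)
print(cols_sorted)
```

Because inplace defaults to False, df itself keeps its original row order.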

try, except code block

try:
    Code block that may raise an exception
except [ (Error1, Error2, ... ) [as e] ]:
    Code block 1 for handling exceptions
except [ (Error3, Error4, ... ) [as e] ]:
    Code block 2 for handling exceptions
except [Exception]:
    Handle other exceptions

In this format, the parts enclosed in [] are optional. Their meanings are as follows:
(Error1, Error2, ...), (Error3, Error4, ...): Error1, Error2, Error3 and Error4 are specific exception types. A single except block can handle several exception types at once.
[as e]: optional; binds the exception object to the name e, so the except block can refer to it (for example, to print its message).
[Exception]: optional; catches any remaining exception derived from Exception, and is usually used in the last except block.
Process:
1. The code in the try block runs first. If an exception occurs, Python creates an exception object and hands it to the interpreter. This is called raising an exception.
2. When the Python interpreter receives the exception object, it looks for an except block that can handle it. If it finds a suitable one, it passes the exception object to that block for processing. This is called catching (handling) the exception. If no except block can handle the exception, the program terminates and the Python interpreter exits.
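A minimal sketch of the structure above; to_int is a hypothetical helper. It shows one except clause handling two exception types, the `as e` binding, and a final catch-all clause:

```python
def to_int(value):
    try:
        return int(value)
    except (ValueError, TypeError) as e:
        # One clause, two exception types; e is the exception object.
        return f"cannot convert: {type(e).__name__}"
    except Exception:
        # Catch-all for anything else (not expected from int() here).
        return "unexpected error"


print(to_int("42"))   # 42
print(to_int("abc"))  # cannot convert: ValueError
print(to_int(None))   # cannot convert: TypeError
```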

23.update() function
Merges one dictionary into another. If both dictionaries contain the same key, the value from the dictionary passed to update() overwrites the existing one
1. The two dictionaries have no keys in common

D = {'one': 1, 'two': 2}
D.update({'three': 3, 'four': 4})  # pass in a dictionary
print(D)
Output: {'one': 1, 'two': 2, 'three': 3, 'four': 4}

2. The two dictionaries share a key, whose value is overwritten

D = {'one': 1, 'two': 2}
D.update({'two': 3, 'four': 4})  # pass a dictionary with a duplicate key
print(D)
Output: {'one': 1, 'two': 3, 'four': 4}

24.map() function
map() applies the given function to every element of the given sequence(s) and returns the results (as an iterator in Python 3, as a list in Python 2)

map(function, iterable, ...)

function: the function to apply
iterable: one or more sequences (iterables)

def square(x):    # calculate the square
    return x ** 2
map(square, [1, 2, 3, 4, 5])   # calculate the square of each element in the list
Output: an iterator, e.g. <map object at 0x100d3d550>
list(map(lambda x: x ** 2, [1, 2, 3, 4, 5]))   # use a lambda and convert the result with list()
Output: [1, 4, 9, 16, 25]
# With two lists, elements at the same position are added together
list(map(lambda x, y: x + y, [1, 3, 5, 7, 9], [2, 4, 6, 8, 10]))
Output: [3, 7, 11, 15, 19]

25. Dictionary items() method
Returns the dictionary's (key, value) pairs as tuples that can be iterated over
Takes no arguments; each key and its value form a tuple, and these tuples are returned together (as a list in Python 2, as a dict_items view in Python 3)

d = {'Google': '', 'Runoob': '', 'taobao': ''}
print(d.items())
Output: dict_items([('Google', ''), ('Runoob', ''), ('taobao', '')])
for key, value in d.items():   # unpack each pair into two variables
    print(key, value)
Output: each key followed by its value (here empty), one pair per line
for i in d.items():            # each item is a (key, value) tuple
    print(i)
Output:
('Google', '')
('Runoob', '')
('taobao', '')

26.drop_duplicates() de-duplication

DataFrame.drop_duplicates(subset=None, keep='first', inplace=False)

subset: the column name(s) used to identify duplicates. The default is None, meaning all columns are compared
keep: one of 'first', 'last' or False; the default is 'first'
1.'first': keep the first occurrence of each duplicate and drop the later ones.
2.'last': keep the last occurrence and drop the earlier ones.
3.False: drop every row that has a duplicate
inplace: Boolean, default False. Whether to drop duplicates directly in the original data (True) or return a copy with the duplicates removed (False).
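A sketch of subset, keep and inplace on a hypothetical four-row table in which rows 0 and 1 are identical and key a occurs three times:

```python
import pandas as pd

df = pd.DataFrame({"k": ["a", "a", "b", "a"], "v": [1, 1, 2, 3]})

first = df.drop_duplicates()                             # whole-row duplicates, keep first
by_k_last = df.drop_duplicates(subset="k", keep="last")  # one row per k, the last one
none_kept = df.drop_duplicates(subset="k", keep=False)   # drop every duplicated k

copy_df = df.copy()
returned = copy_df.drop_duplicates(inplace=True)         # modifies copy_df, returns None

print(first.index.tolist(), by_k_last.index.tolist(), none_kept.index.tolist())
```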

27.python displays all data
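For a NumPy array

import sys
import numpy as np
np.set_printoptions(threshold=sys.maxsize)   # never summarize arrays with "..."

For a DataFrame in pandas

import pandas as pd
# Show all columns
pd.set_option('display.max_columns', None)

# Show all rows
pd.set_option('display.max_rows', None)

# Set the display width of each value to 100 characters (the default is 50)
pd.set_option('display.max_colwidth', 100)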

28.shutil module
move(src, dst): moves src to dst. If dst is an existing directory, src (the file, or the whole folder) is moved inside it. If dst does not exist, the effect is the same as renaming src to dst
src: the source folder or file
dst: the destination folder, or the new name of the file. If src is a folder and dst is an existing file, an error is raised
copy_function: the function used to copy when the move cannot be done by a simple rename (for example, across file systems). Any callable can be passed in; the default is shutil.copy2. This parameter was added in Python 3.5

import shutil, os
# Example 1: move the aaa folder into the bbb folder. If bbb does not exist, this becomes a rename
folder1 = os.path.join(os.getcwd(), "aaa")
folder2 = os.path.join(os.getcwd(), "bbb")
shutil.move(folder1, folder2)
# Example 2: move the aaa.txt file into the bbb folder. If bbb does not exist, aaa.txt is renamed to bbb
file1 = os.path.join(os.getcwd(), "aaa.txt")
folder2 = os.path.join(os.getcwd(), "bbb")
shutil.move(file1, folder2)
# Example 3: rename the aaa.txt file to bbb.txt (if bbb.txt already exists, it may be overwritten, depending on the platform)
file1 = os.path.join(os.getcwd(), "aaa.txt")
file2 = os.path.join(os.getcwd(), "bbb.txt")
shutil.move(file1, file2)
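The examples above work on the current directory. This sketch repeats the "move into a folder" and "rename" cases inside a throwaway temporary directory, so it can be run safely; the names aaa.txt, bbb and ccc.txt are hypothetical:

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as root:
    src_file = os.path.join(root, "aaa.txt")
    with open(src_file, "w") as f:
        f.write("hello")
    dst_dir = os.path.join(root, "bbb")
    os.mkdir(dst_dir)

    # dst is an existing folder: the file is moved inside it.
    moved_to = shutil.move(src_file, dst_dir)

    # dst does not exist: the move acts as a rename.
    renamed_to = shutil.move(moved_to, os.path.join(root, "ccc.txt"))

    # shutil.move returns the final destination path.
    result = (os.path.relpath(moved_to, root),
              os.path.relpath(renamed_to, root),
              sorted(os.listdir(root)))

print(result)
```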

29. The difference between \t, \n and \n\t in Python
\t: the tab character, equivalent to pressing the Tab key once. It is a single character, usually displayed as indentation up to the next tab stop (commonly 4 or 8 columns)
\n: the line feed character, equivalent to pressing the Enter key
\n\t: a line feed followed by a tab, i.e. start a new line and indent it
30.zip() function
Takes iterables as arguments, packs their corresponding elements into tuples, and returns these tuples (a list in Python 2; an iterator in Python 3, which list() turns into a list). The i-th tuple contains the i-th element of each argument. If the arguments have different lengths, the result is truncated to the length of the shortest one. With a single argument, zip returns 1-tuples; with no arguments, it returns an empty result.

seq = ['one', 'two', 'three']
seq1 = [1, 2, 3]
list(zip(seq, seq1))
[('one', 1), ('two', 2), ('three', 3)]

With *, the pairs can be unzipped again, so that each original list becomes one tuple

list(zip(*zip(seq, seq1)))
[('one', 'two', 'three'), (1, 2, 3)]

Convert two lists into a dictionary

dict(zip(seq, seq1))
{'one': 1, 'two': 2, 'three': 3}
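A short sketch of the edge cases described for zip(): unequal lengths, a single argument, and no arguments. The extra element 4 is added on purpose to show truncation:

```python
seq = ["one", "two", "three"]
nums = [1, 2, 3, 4]  # one element longer than seq

pairs = list(zip(seq, nums))    # stops at the shortest input
singles = list(zip(seq))        # one argument: 1-tuples
empty = list(zip())             # no arguments: empty
as_dict = dict(zip(seq, nums))  # pairs become key-value entries

print(pairs)    # [('one', 1), ('two', 2), ('three', 3)]
print(singles)  # [('one',), ('two',), ('three',)]
print(empty)    # []
print(as_dict)  # {'one': 1, 'two': 2, 'three': 3}
```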

31. Writing documents


Posted by BlueKai on Sun, 17 Apr 2022 22:52:07 +0930