Telegram's communication protocol MTProto 2.0, study notes 3 (Telethon code analysis and TL implementation 1)

Telethon code analysis and TL implementation 1

Foreword

I tried the official TDLib package before and got it working, but it is still not as easy to use as the Telethon package, and because Telethon is written in Python it is very easy to modify.

Open source code address: https://github.com/LonamiWebs/Telethon/

Usage:

from telethon import TelegramClient, events, sync
import socks

# These example values won't work. You must get your own api_id and
# api_hash from https://my.telegram.org, under API Development.
#api_id = 12345
#api_hash = '0123456789abcdef0123456789abcdef'

api_id = 94575  # Parameters copied from tdlib
api_hash = "a3406de8d171bb422bb6ddf3bbd800e2"

proxy1 = ("socks5", '127.0.0.1', 1081)
client = TelegramClient('session_name', api_id, api_hash, proxy=proxy1)
client.start()

If the script runs until it prompts for a mobile phone number, the connection to the server works!

Detailed documentation on TL language: https://core.telegram.org/mtproto/TL

Like protobuf, TL defines both data formats and RPC calls, but its grammar and ideas are very different.

  • Constructor: defines a data type, and with it how that type is serialized and deserialized;
  • Method: describes the parameters of a callable method, how those parameters are serialized, and the return type, which corresponds to a constructor;

Constructors and methods are defined in a similar way. The defined statement can be hashed with CRC32 to get a 32-bit integer, which is used to uniquely mark the method or constructor;
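A minimal sketch of that hashing step, assuming the definition string has already been normalized (no `#id` suffix, no braces, single spaces), which is the form quoted in the code comments later in this article. The helper names tl_constructor_id and id_to_wire are made up for illustration; only zlib.crc32 is a real library call. Running it lets you compare the printed values with the identifiers quoted in this article, such as 0x997275b5 for boolTrue.

```python
import zlib


def tl_constructor_id(definition: str) -> int:
    """CRC32 of a normalized TL definition, as an unsigned 32-bit integer."""
    return zlib.crc32(definition.encode("ascii")) & 0xFFFFFFFF


def id_to_wire(constructor_id: int) -> bytes:
    """Identifiers are written to the stream as 4 little-endian bytes."""
    return constructor_id.to_bytes(4, "little")


for definition in ("boolTrue = Bool", "boolFalse = Bool"):
    cid = tl_constructor_id(definition)
    print("{!r:22} -> 0x{:08x}  wire={}".format(definition, cid, id_to_wire(cid).hex()))
```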

Therefore, the first thing a client in any language has to do is read the TL rules, write a TL compiler, and generate the TL part of its code (RPC plus data encoding and decoding) from the schema version officially published at the time. In Telethon the schema files are:

telethon_generator\data\api.tl

telethon_generator\data\mtproto.tl

1. TL object and deserialization analysis

Under the protocol described by TL, all messages are a TLObject, and all requests initiated can be considered as a TLRequest. First, let's investigate how to encapsulate and deserialize related data classes.

The GitHub repository contains the source of a code generator that produces the TL-related files from these rules. Writing all the classes by hand would be exhausting, and the official schema keeps being updated;

Like protobuf, TL mainly provides object serialization, deserialization and the RPC process. First, we need a common file that defines the most basic rules:

See code: telethon\extensions\binaryreader.py

This file is written manually, not automatically generated, and includes the most basic serialization and deserialization protocols:

The tgread_object method is the beginning of all work, so you need to read this method first:

1.1 Methods of reading objects from binary

def tgread_object(self):
        """Reads a Telegram object."""
        # 4-byte little-endian int as constructor number
        constructor_id = self.read_int(signed=False)   
        
        # The telethon\tl\alltlobjects.py file defines all object constructors and methods, encapsulated as a dictionary
        # Search if this constructor currently exists
        clazz = tlobjects.get(constructor_id, None)    
        
        if clazz is None:
            # If you can't find the corresponding number, try parsing some of the most basic types,
            # http://crc32.bchrt.com/ Calculation tool
            # 
            value = constructor_id
            if value == 0x997275b5:  # boolTrue
                return True
            elif value == 0xbc799737:  # boolFalse  crc32('boolFalse = Bool')
                return False
           
            # crc32("vector t:Type # [ t ] = Vector t") = 0x1cb5c415
            elif value == 0x1cb5c415:  # Vector
                return [self.tgread_object() for _ in range(self.read_int())]

            clazz = core_objects.get(constructor_id, None)
            if clazz is None:
                # If you can't find it, go back 4 bytes! !
                self.seek(-4)  # Go back
                pos = self.tell_position()
                error = TypeNotFoundError(constructor_id, self.read())
                self.set_position(pos)
                raise error

        return clazz.from_reader(self)   # Each type implements classMethod, constructing itself from binary

Remark:

0x1cb5c415 is the identifier of the vector type, which needs special handling!

The code above first reads a 4-byte integer from the byte stream, looks up the class in the dictionary (produced by the code generator), and lets that class read the data that follows;

Several built-in types are handled separately because:

  • A Bool value has no payload, only the constructor identifier;

  • A vector needs a loop: read the element count, read N elements, put them in a list and return it;

Finally, if the identifier cannot be matched, an error is raised. The most common cause is that the local TL schema is not compatible with the one used by the other side!
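The dispatch logic described above can be sketched in a self-contained way. This is not Telethon's BinaryReader: the Ping class, its identifier 0x12345678 and the REGISTRY dictionary are hypothetical stand-ins for the generated classes and for the generated lookup table, and a plain io.BytesIO replaces the reader.

```python
import io
import struct

BOOL_TRUE, BOOL_FALSE, VECTOR = 0x997275b5, 0xbc799737, 0x1cb5c415


def read_int(stream, signed=True):
    """Read a 4-byte little-endian integer."""
    return struct.unpack('<i' if signed else '<I', stream.read(4))[0]


class Ping:
    """Hypothetical boxed type with a single int field (not in any real schema)."""
    CONSTRUCTOR_ID = 0x12345678  # made-up identifier

    def __init__(self, value):
        self.value = value

    @classmethod
    def from_reader(cls, stream):
        return cls(read_int(stream))


REGISTRY = {Ping.CONSTRUCTOR_ID: Ping}  # stands in for the generated dictionary


def read_object(stream):
    constructor_id = read_int(stream, signed=False)
    if constructor_id == BOOL_TRUE:       # Bool has no payload
        return True
    if constructor_id == BOOL_FALSE:
        return False
    if constructor_id == VECTOR:          # count, then N boxed elements
        return [read_object(stream) for _ in range(read_int(stream))]
    cls = REGISTRY.get(constructor_id)
    if cls is None:
        stream.seek(-4, io.SEEK_CUR)      # step back over the unknown id
        raise ValueError('unknown constructor 0x%08x' % constructor_id)
    return cls.from_reader(stream)


data = (struct.pack('<Ii', VECTOR, 2)
        + struct.pack('<I', BOOL_TRUE)
        + struct.pack('<Ii', Ping.CONSTRUCTOR_ID, 7))
items = read_object(io.BytesIO(data))
```

Here items is a two-element list: the Bool decoded as True, and a Ping instance built by its own from_reader.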

1.2 Example of deserialization

Here we take one RPC from the key negotiation process discussed in the previous post as an example. The class code shown is produced by the generator; let's see what needs to be generated:

The client calls the method:

req_pq#60469778 nonce:int128 = ResPQ;

The server uses the constructor to construct the return data:

resPQ#05162463 nonce:int128 server_nonce:int128 pq:string server_public_key_fingerprints:Vector<long> = ResPQ;

First look at the method: we searched for 0x60469778 and found:

0x60469778: functions.ReqPqRequest,

The function is defined as:

class ReqPqRequest(TLRequest):
    CONSTRUCTOR_ID = 0x60469778
    SUBCLASS_OF_ID = 0x786986b8

    # Note: Python's int has arbitrary precision, so the 128-bit nonce fits in a plain int (it is not limited to 4 bytes)
    def __init__(self, nonce: int):
        """
        :returns ResPQ: Instance of ResPQ.
        """
        self.nonce = nonce
	
    # turn itself into a dictionary
    def to_dict(self):
        return {
            '_': 'ReqPqRequest',
            'nonce': self.nonce
        }

    # Convert itself into a binary byte stream, first concatenate the 4-byte function ID, then the parameter list, a total of 20 bytes
    def _bytes(self):
        return b''.join((
            b'x\x97F`',  #4 bytes 'x', 0x97, 'F', '`', i.e. 0x78, 0x97, 0x46, 0x60
            self.nonce.to_bytes(16, 'little', signed=True),  # 128 bits, little endian integer
        ))

    # Deserialization is very simple: read 16 bytes as a little-endian integer
    @classmethod
    def from_reader(cls, reader):
        _nonce = reader.read_large_int(bits=128)
        return cls(nonce=_nonce)

The function to read 128 bits here is in binaryreader.py:

 def read_large_int(self, bits, signed=True):
        """Reads a n-bits long integer value."""
        return int.from_bytes(
            self.read(bits // 8), byteorder='little', signed=signed)
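To make the byte layout concrete, here is a small round trip of the req_pq request outside of Telethon. The function names serialize_req_pq and read_large_int_from are illustrative only; the constructor id 0x60469778 and the 16-byte little-endian nonce come from the code above.

```python
import os

REQ_PQ_ID = 0x60469778  # constructor id of req_pq, as quoted above


def serialize_req_pq(nonce: int) -> bytes:
    """4-byte little-endian constructor id followed by the 128-bit nonce."""
    return REQ_PQ_ID.to_bytes(4, 'little') + nonce.to_bytes(16, 'little', signed=True)


def read_large_int_from(data: bytes, bits: int, signed: bool = True) -> int:
    """Same idea as read_large_int, but reading from a bytes object."""
    return int.from_bytes(data[:bits // 8], byteorder='little', signed=signed)


nonce = int.from_bytes(os.urandom(16), 'big', signed=True)
packet = serialize_req_pq(nonce)
decoded = read_large_int_from(packet[4:], bits=128)
```

The packet is 20 bytes long, starts with the same four bytes as the literal b'x\x97F`' in the generated class, and decoding the remainder gives back the original nonce.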

Then we see how the return data of this function is constructed:

telethon\tl\types\__init__.py

class ResPQ(TLObject):
    CONSTRUCTOR_ID = 0x5162463
    SUBCLASS_OF_ID = 0x786986b8

    def __init__(self, nonce: int, server_nonce: int, pq: bytes, server_public_key_fingerprints: List[int]):
        """
        Constructor for ResPQ: Instance of ResPQ.
        """
        self.nonce = nonce
        self.server_nonce = server_nonce
        self.pq = pq
        self.server_public_key_fingerprints = server_public_key_fingerprints

    def to_dict(self):
        return {
            '_': 'ResPQ',
            'nonce': self.nonce,
            'server_nonce': self.server_nonce,
            'pq': self.pq,
            'server_public_key_fingerprints': [] if self.server_public_key_fingerprints is None else self.server_public_key_fingerprints[:]
        }

    def _bytes(self):
        return b''.join((
            b'c$\x16\x05',   # The little-endian representation of the type of 0x5162463
            self.nonce.to_bytes(16, 'little', signed=True),         # 16 bytes
            self.server_nonce.to_bytes(16, 'little', signed=True),  # 16 bytes
            self.serialize_bytes(self.pq),   # Here is the serialize string method in the parent class
            b'\x15\xc4\xb5\x1c',             # 0x1cb5c415 is the vector type
                                             # followed by the number of int elements of 4-byte little endian type
            struct.pack('<i',len(self.server_public_key_fingerprints)),
                                             # Serialize 8-byte long elements one by one
            b''.join(struct.pack('<q', x) for x in self.server_public_key_fingerprints),
        ))

    @classmethod
    def from_reader(cls, reader):
        _nonce = reader.read_large_int(bits=128)
        _server_nonce = reader.read_large_int(bits=128)
        _pq = reader.tgread_bytes()
        reader.read_int()  # skip the vector constructor id (0x1cb5c415)
        _server_public_key_fingerprints = []
        for _ in range(reader.read_int()):
            _x = reader.read_long()
            _server_public_key_fingerprints.append(_x)

        return cls(nonce=_nonce, server_nonce=_server_nonce, pq=_pq, server_public_key_fingerprints=_server_public_key_fingerprints)


Note: For how to use struct to read and write binary data, refer to: https://blog.csdn.net/qq_30638831/article/details/80421019?spm=1001.2014.3001.5506

Remarks: every class derives from TLObject, which encapsulates the most basic methods, such as how to serialize basic types like strings. (TLObject lives in telethon\tl\tlobject.py and is also a hand-written base class.)

Summary: each generated class has:

  1. a class method (from_reader) that acts as a factory function;
  2. a constructor (__init__) that takes the fields as input parameters;
  3. a serialization method (_bytes);
  4. a conversion to a dictionary (to_dict).

1.3 Reading and writing of basic types

1.3.1 Serialization of Basic Types and Composite Types

1.3.1.1 Packaging

The official document https://core.telegram.org/mtproto/serialize divides data into two categories, bare types and boxed types:

  • A boxed type has a name that starts with an uppercase letter. When serialized, the type identifier comes first, followed by the data;

  • A bare type has a name that starts with a lowercase letter. When serialized, no type identifier is added;

  • %X denotes the bare type that corresponds to the boxed type X;

For large arrays, boxing every element would add an identifier to each one, which wastes storage space and bandwidth, so it is more reasonable to use the corresponding bare type!

For example:

    int_couple int int = IntCouple

int_couple is equivalent to %int_couple and %IntCouple.

Take the pair of integers 3, 4. If the identifier of int_couple were 404, the boxed form would be:

    404 3 4

and the bare form would be just 3 4. Here 404 is not a real identifier; the official documentation only uses it as an example. Real identifiers are calculated with CRC32.
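The size difference between the two forms is easy to see in a short sketch. It reuses the placeholder identifier 404 from the official example (not a real CRC32), and the two function names are made up for illustration.

```python
import struct

INT_COUPLE_ID = 404  # placeholder identifier from the example, not a real CRC32


def bare_int_couple(a: int, b: int) -> bytes:
    """Bare form: just the two 4-byte little-endian ints."""
    return struct.pack('<ii', a, b)


def boxed_int_couple(a: int, b: int) -> bytes:
    """Boxed form: 4-byte identifier first, then the bare data."""
    return struct.pack('<I', INT_COUPLE_ID) + bare_int_couple(a, b)


pairs = [(i, i + 1) for i in range(1000)]
bare_size = sum(len(bare_int_couple(a, b)) for a, b in pairs)
boxed_size = sum(len(boxed_int_couple(a, b)) for a, b in pairs)
print(bare_size, boxed_size)
```

For 1000 pairs the bare form takes 8000 bytes and the boxed form 12000, which is the overhead the bare type avoids inside large arrays.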

1.3.1.2 Basic types

Each basic type has both a bare form and a boxed form:

(int, long, double, string) corresponds to (Int, Long, Double, String)

int ? = Int;
long ? = Long;
double ? = Double;
string ? = String;

  • int: little-endian storage, 4 bytes;

  • long: little-endian storage, 8 bytes;

  • double: little-endian storage, 8 bytes;

  • string: see section 1.3.3; it is serialized the same way as bytes.

If the boxed form of any of these 4 types is used, an identifier has to be added in front.

The identifier is calculated using CRC32:

    CRC32("int ? = Int")
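In Python the bare forms of int, long and double map directly onto struct format codes. A minimal sketch, with helper names chosen only for this example:

```python
import struct


def pack_int(value: int) -> bytes:
    return struct.pack('<i', value)     # 4 bytes, little-endian, signed


def pack_long(value: int) -> bytes:
    return struct.pack('<q', value)     # 8 bytes, little-endian, signed


def pack_double(value: float) -> bytes:
    return struct.pack('<d', value)     # 8 bytes, IEEE 754, little-endian


for raw in (pack_int(1), pack_long(-1), pack_double(1.0)):
    print(len(raw), raw.hex())
```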
    
1.3.1.3 Composite types

It is officially recommended to add field names when defining types, such as User and Group,

If you do not write the variable name, the meaning of the field cannot be recognized.

user int string string = User;
group int string string = Group;

Therefore, the following method is recommended:

user id:int first_name:string last_name:string = User;
group id:int title:string description:string = Group;

Extending user with new fields requires defining a new constructor, but the resulting type (User) does not change. During serialization and deserialization, the different constructors are told apart by their identifiers:

userv2 id:int unread_messages:int first_name:string last_name:string in_groups:vector int = User;

1.3.2 Serialization of vector types

Vector can be considered as a built-in type or as a composite type Vector,

vector {t:Type} # [ t ] = Vector t;

This resembles a template container, but the constructor always uses the same identifier, whatever the element type:

const 0x1cb5c415 = crc32("vector t:Type # [ t ] = Vector t")

The order of serialization is:

  • 0x1cb5c415, 4 bytes, marks a vector; it does not change with the element type!

  • then the number of elements as a 4-byte little-endian int;

  • then the N elements, each serialized according to its type (bare elements such as int or long carry no type identifier);

When deserializing, the element type is already known from the definition of the enclosing type, so it does not need to be stored with each element;
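As an illustration of that layout, here is a sketch that serializes and parses a Vector<long>, the same shape as server_public_key_fingerprints in ResPQ. The two function names are hypothetical; the identifier and the field order come from the list above.

```python
import struct

VECTOR_ID = 0x1cb5c415


def serialize_vector_long(values) -> bytes:
    """Vector id, element count, then each element as a bare 8-byte long."""
    return (struct.pack('<I', VECTOR_ID)
            + struct.pack('<i', len(values))
            + b''.join(struct.pack('<q', v) for v in values))


def deserialize_vector_long(data: bytes) -> list:
    constructor_id, count = struct.unpack_from('<Ii', data, 0)
    if constructor_id != VECTOR_ID:
        raise ValueError('not a vector: 0x%08x' % constructor_id)
    # The caller knows the elements are longs, so no per-element id is read.
    return list(struct.unpack_from('<%dq' % count, data, 8))


encoded = serialize_vector_long([1, -1])
decoded = deserialize_vector_long(encoded)
```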

Related to this are: IntHash and StrHash, which are used to represent hash types, which are arrays of key-value pairs,

here:

coupleInt {t:Type} int t = CoupleInt t;
intHash {t:Type} (vector %(CoupleInt t)) = IntHash t;

coupleStr {t:Type} string t = CoupleStr t;
strHash {t:Type} (vector %(CoupleStr t)) = StrHash t;

In C++ terms this is roughly:

template <typename T> using CoupleInt = std::pair<int, T>;

template <typename T> using IntHash = std::vector<CoupleInt<T>>;

The percent sign % indicates that the elements are stored in the array without a constructor identifier.

1.3.3 Serialization of string (bytes)

https://core.telegram.org/mtproto/serialize

  • If the length is less than 254: 1 byte holds the length, followed by the N data bytes; the total is then padded to a multiple of 4 bytes;

  • If the length is greater than or equal to 254: the first byte is 254, followed by the length as a 3-byte little-endian int, followed by the N data bytes; the total is then padded to a multiple of 4 bytes;

About the padding length (0 to 3 zero bytes):

  • If the length is less than 254: (4 - (len(data) + 1) % 4) % 4

  • If the length is greater than or equal to 254: (4 - len(data) % 4) % 4

The code is as follows:

    @staticmethod
    def serialize_bytes(data):
        """Write bytes by using Telegram guidelines"""
        if not isinstance(data, bytes):
            if isinstance(data, str):
                data = data.encode('utf-8')
            else:
                raise TypeError(
                    'bytes or str expected, not {}'.format(type(data)))

        r = []
        if len(data) < 254:
            padding = (len(data) + 1) % 4
            if padding != 0:
                padding = 4 - padding

            r.append(bytes([len(data)]))
            r.append(data)

        else:
            padding = len(data) % 4
            if padding != 0:
                padding = 4 - padding

            r.append(bytes([
                254,
                len(data) % 256,
                (len(data) >> 8) % 256,
                (len(data) >> 16) % 256
            ]))
            r.append(data)

        r.append(bytes(padding))
        return b''.join(r)
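The reading side follows directly from the same rules. The sketch below pairs a compact serializer with a matching parser so the round trip can be checked; it is written from the rules above rather than copied from Telethon, and both function names are illustrative.

```python
def serialize_tl_bytes(data: bytes) -> bytes:
    """Length header, data, then zero padding to a multiple of 4 bytes."""
    if len(data) < 254:
        header = bytes([len(data)])
    else:
        header = bytes([254]) + len(data).to_bytes(3, 'little')
    padding = -(len(header) + len(data)) % 4
    return header + data + bytes(padding)


def deserialize_tl_bytes(buf: bytes):
    """Return (data, number of bytes consumed including padding)."""
    if buf[0] == 254:
        length = int.from_bytes(buf[1:4], 'little')
        start = 4
    else:
        length = buf[0]
        start = 1
    end = start + length
    consumed = end + (-end % 4)
    return buf[start:end], consumed


short = serialize_tl_bytes(b'abc')        # 1 + 3 bytes, already aligned
long_ = serialize_tl_bytes(b'a' * 254)    # 4 + 254 bytes, 2 bytes of padding
```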

1.4 Built-in types

The official documentation states that the relevant basic types are built in: https://core.telegram.org/mtproto/TL-tl

/////
//
// Common Types (source file common.tl, only necessary definitions included)
//
/////

// Built-in types
int ? = Int;
long ? = Long;
double ? = Double;
string ? = String;

// Boolean emulation
boolFalse = Bool;
boolTrue = Bool;

// Vector
vector {t:Type} # [t] = Vector t;
tuple {t:Type} {n:#} [t] = Tuple t n;
vectorTotal {t:Type} total_count:int vector:%(Vector t) = VectorTotal t;

Empty False;
true = True;

"Built-in" here means that we have to implement the logic for these types by hand; the code produced by the generator then builds on these basic functions;

The generated alltlobjects.py contains 1500 identifiers, each mapped to a class;

tlobject.py implements the most basic functions, but it only defines two abstract classes, TLObject and TLRequest,

The factory method that actually reads the data has to be implemented by each generated class:

    @classmethod
    def from_reader(cls, reader):

1.5 Summary

At this point, we have a clear picture of the basic calling logic:

  1. The business layer receives the data stream;
  2. Construct BinaryReader(data) using data stream;
  3. Use reader.tgread_object() as the entry function to try to deserialize;
  4. This function finds the appropriate class and factory function to deserialize the object according to the recognized identifier; (other methods of BinaryReader will also be used in the process)

2. Tracking the implementation of the login verification process algorithm

TelegramClient is the class that the library exposes directly to users. It inherits from a large number of parent classes:

  1. TelegramBaseClient
  2. AuthMethods
  3. AccountMethods
  4. DownloadMethods
  5. DialogMethods
  6. ChatMethods
  7. BotMethods
  8. MessageMethods
  9. UploadMethods
  10. ButtonMethods
  11. UpdateMethods
  12. MessageParseMethods
  13. UserMethods

At the current stage, TelegramBaseClient and AuthMethods are the classes that matter: they establish the connection with the server and trigger the key exchange;

The relevant classes are described in the following table; see also the official protocol document: https://core.telegram.org/mtproto/description

| Class | Description | File |
|---|---|---|
| AuthKey | Encapsulates basic key calculation and management | telethon\crypto\authkey.py |
| MTProtoState | Implements data encryption and decryption, including the calculation of msg_id and seq_no | telethon\network\mtprotostate.py |
| do_authentication function | The state machine for the authentication process is implemented here! | telethon\network\authenticator.py |
| MTProtoSender | Manages the underlying connection; implements the core key exchange flow and the state machine that interacts with the server; contains the receive loop, the send loop, and the dispatch of events for received messages | telethon\network\mtprotosender.py |
| MTProtoPlainSender | Used to send plaintext messages before the keys have been exchanged | telethon\network\mtprotoplainsender.py |
| PacketCodec | Defines the interface for encoding and decoding; an abstract class that implements nothing itself | telethon\network\connection\connection.py |
| Connection | A base class that wraps asyncio.open_connection. A subclass only needs to override the static member packet_codec. It provides the basic TCP connection plus the send and receive loops; the upper layer only needs to call connect, send, recv | telethon\network\connection\connection.py |
| FullPacketCodec | Implements encoding and decoding according to https://core.telegram.org/mtproto#tcp-transport. Sending: 4 bytes total length, 4 bytes send counter, data, 4 bytes checksum (total length = 12 + data length). Decoding uses the same format and verifies the checksum | telethon\network\connection\tcpfull.py |
| ConnectionTcpFull | Inherits from Connection and only sets FullPacketCodec as the codec | telethon\network\connection\tcpfull.py |
| HttpPacketCodec | Sends the data over HTTP and reads the data part back out of the HTTP response | telethon\network\connection\http.py |
| ConnectionHttp | Inherits from Connection and uses HttpPacketCodec as the codec | telethon\network\connection\http.py |

Reference: "python abstract class abc module" https://zhuanlan.zhihu.com/p/508700685

"Python asyncio asynchronous programming" https://www.jianshu.com/p/7fd361cde22c

https://www.jianshu.com/p/eed5da9965f2

MTProtoSender is the core engine; _connect(self) is the entry point once work starts. The process is as follows:

1) Call self._try_connect to establish the underlying TCP connection (possibly tunnelled through another protocol, such as a proxy);

2) If the connection succeeds, try to exchange keys: self._try_gen_auth_key;

3) If connecting or exchanging keys still fails after self._retries attempts, an error is raised, generally a connection error;

4) Once the logical connection is established, start the two loops self._send_loop() and self._recv_loop() (asyncio tasks rather than OS threads);

5) At this point the connection is fully established!
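The retry structure of steps 1) to 3) can be sketched with plain asyncio. This is a simplified illustration, not Telethon's _connect: connect_with_retries and the three fake coroutines are hypothetical, and the real method does more (logging, delays, starting the send and receive loops).

```python
import asyncio


async def connect_with_retries(try_connect, try_gen_auth_key, retries=5, delay=0.0):
    """Return the attempt number that succeeded; raise ConnectionError otherwise."""
    for attempt in range(1, retries + 1):
        # Key exchange is only attempted once the transport is connected.
        if await try_connect(attempt) and await try_gen_auth_key(attempt):
            return attempt
        await asyncio.sleep(delay)
    raise ConnectionError('connection failed {} time(s)'.format(retries))


async def flaky_connect(attempt):      # fake transport: the first attempt fails
    return attempt >= 2


async def key_exchange_ok(attempt):    # fake key exchange: always succeeds
    return True


async def never_connects(attempt):     # fake transport that never comes up
    return False


result = asyncio.run(connect_with_retries(flaky_connect, key_exchange_ok, retries=3))
```

With the fake transport above, the first attempt fails and the second succeeds, so result is 2; a transport that never connects ends in ConnectionError after the configured number of attempts.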

2.1 TCP connection _try_connect

At the beginning of the experiment we call:

client = TelegramClient('session_name', api_id, api_hash, proxy=proxy1)
client.start()

The call stack looks like this:

  1. AuthMethods.start()

  2. AuthMethods._start()

  3. TelegramBaseClient.connect(); by default the constructor uses ConnectionTcpFull as the underlying connection class, so an instance of it is constructed and passed to self._sender.connect()

  4. MTProtoSender.connect(), which in turn calls _connect(); this is the logic analysed in the previous part;

Note: The telethon\network\connection directory defines the classes related to the low-level TCP connection;

As described in the table above, Telegram supports 2 connection methods. Only the TCP mode is discussed here, which is also the default;

The packet format is described at: https://core.telegram.org/mtproto/mtproto-transports#full

4B(length) + 4B(serial number)+ NBytes( data)+ 4B(CRC32)
+----+----+----...----+----+
|len.|seq.|  payload  |crc.|
+----+----+----...----+----+

Specific code reference: telethon\network\connection\tcpfull.py
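A compact sketch of this framing, written from the diagram above rather than copied from tcpfull.py. The names encode_full and decode_full are illustrative, and the checksum is assumed to be the CRC32 of everything before it (length, sequence number and payload).

```python
import struct
import zlib


def encode_full(payload: bytes, seq: int) -> bytes:
    """length (payload + 12), sequence number, payload, CRC32 of all of the above."""
    head = struct.pack('<ii', len(payload) + 12, seq) + payload
    return head + struct.pack('<I', zlib.crc32(head))


def decode_full(packet: bytes):
    """Return (seq, payload) after checking the length field and the checksum."""
    length, seq = struct.unpack('<ii', packet[:8])
    if length != len(packet):
        raise ValueError('length field does not match packet size')
    (crc,) = struct.unpack('<I', packet[-4:])
    if zlib.crc32(packet[:-4]) != crc:
        raise ValueError('invalid checksum')
    return seq, packet[8:-4]


frame = encode_full(b'abcd', seq=0)
seq, payload = decode_full(frame)
```

A real connection would increment the sequence number after every packet sent and read the 4-byte length first to know how many more bytes to pull from the socket.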

The encapsulation of the TCP connection is implemented in the Connection class;

At this point, the TCP connection is completed, the encapsulation and unpacking of the TCP data packets are also completed, and the upper-layer business can happily perform logical interaction.

Note: there are 4 data encapsulation formats in total; due to space limitations they are not discussed further here;

2.2 Key exchange _try_gen_auth_key

After completing the TCP connection in the previous section,

The self._try_gen_auth_key function performs the key exchange process:

  1. First create an MTProtoPlainSender for sending plaintext, where the previous connection needs to be passed;

  2. Call authenticator.do_authentication to execute the state machine, where the key exchange is completed inside the function; after success, an authorization key and a time offset will be obtained;

The previous post already discussed the key exchange protocol; here we compare it against the implementation,

Step 1: Send a 16-byte random number (nonce) and get the server response, which includes (pq, server_nonce, public key fingerprints),

The pair (nonce, server_nonce) is used later as a temporary session ID.

    # Step 1 sending: PQ Request, endianness doesn't matter since it's random
    nonce = int.from_bytes(os.urandom(16), 'big', signed=True)

    # ReqPqMultiRequest builds the data to send; sending it acts as a remote procedure call,
    # and the reply is deserialized into the ResPQ type via the resPQ constructor.
    # The design here is really neat
    res_pq = await sender.send(ReqPqMultiRequest(nonce))

    assert isinstance(res_pq, ResPQ), 'Step 1 answer was %s' % res_pq

    if res_pq.nonce != nonce:
        raise SecurityError('Step 1 invalid nonce from server')

    # get_int parses the bytes as a big-endian integer, giving the large number p*q
    pq = get_int(res_pq.pq)

Note that ReqPqMultiRequest is called here, not the ReqPqRequest examined earlier. This is due to a change in the protocol version: the examples on the official website still use req_pq, but the current protocol version is 2.0, which uses req_pq_multi.

Step 2: Start the DH key exchange: generate a new random number first, then send it encrypted

    # Factorize the product of two large primes to get p and q
    p, q = Factorization.factorize(pq)
    p, q = rsa.get_byte_array(p), rsa.get_byte_array(q)

    # To transmit encrypted information later, create a new random number new_nonce
    new_nonce = int.from_bytes(os.urandom(32), 'little', signed=True)
    # Build the new data to send
    pq_inner_data = bytes(PQInnerData(
        pq=rsa.get_byte_array(pq), p=p, q=q,
        nonce=res_pq.nonce,
        server_nonce=res_pq.server_nonce,
        new_nonce=new_nonce
    ))
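The article does not show what Factorization.factorize does internally. As a stand-in, the sketch below factors a 64-bit product of two primes with Pollard's rho (Floyd cycle detection); this is one common choice and not necessarily the algorithm Telethon uses. It assumes its input really is a product of two distinct primes, otherwise it may not terminate.

```python
import math
import random


def pollard_rho(n: int) -> int:
    """Return a non-trivial factor of a composite n (assumed not prime)."""
    if n % 2 == 0:
        return 2
    while True:
        c = random.randrange(1, n)
        x = y = random.randrange(2, n)
        d = 1
        while d == 1:
            x = (x * x + c) % n            # tortoise: one step
            y = (y * y + c) % n            # hare: two steps
            y = (y * y + c) % n
            d = math.gcd(abs(x - y), n)
        if d != n:                         # d == n means this cycle failed; retry
            return d


def factorize(pq: int):
    """Return (p, q) with p < q for pq = p * q."""
    p = pollard_rho(pq)
    q = pq // p
    return (p, q) if p < q else (q, p)


p, q = factorize(1000000007 * 998244353)  # two well-known primes
```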

Encrypt pq_inner_data:

rsa.py defines the public keys currently used by the server; the right one is found through the fingerprints returned by the server

    # sha_digest + data + random_bytes
    cipher_text, target_fingerprint = None, None
    # Go through the fingerprints returned by the server and encrypt
    # pq_inner_data with the first one we have a public key for
    for fingerprint in res_pq.server_public_key_fingerprints:
        cipher_text = rsa.encrypt(fingerprint, pq_inner_data)
        if cipher_text is not None:
            target_fingerprint = fingerprint
            break

    # This section is for compatibility with the old server's key, which can be ignored
    if cipher_text is None:
        # Second attempt, but now we're allowed to use old keys
        for fingerprint in res_pq.server_public_key_fingerprints:
            cipher_text = rsa.encrypt(fingerprint, pq_inner_data, use_old=True)
            if cipher_text is not None:
                target_fingerprint = fingerprint
                break
	
    if cipher_text is None:
        raise SecurityError(
            'Step 2 could not find a valid key for fingerprints: {}'
            .format(', '.join(
                [str(f) for f in res_pq.server_public_key_fingerprints])
            )
        )
    # The first 2 fields of the sent data are the random numbers exchanged before
    server_dh_params = await sender.send(ReqDHParamsRequest(
        nonce=res_pq.nonce,
        server_nonce=res_pq.server_nonce,
        p=p, q=q,
        public_key_fingerprint=target_fingerprint,
        encrypted_data=cipher_text
    ))

Remarks: The rsa.encrypt() function performs the RSA_PAD procedure. This algorithm is more complicated and will be discussed separately later;

Check whether the server response is valid: the nonces it contains must match the earlier ones, and on failure the hash of new_nonce must match the one we sent, to guard against man-in-the-middle attacks:

    assert isinstance(
        server_dh_params, (ServerDHParamsOk, ServerDHParamsFail)),\
        'Step 2.1 answer was %s' % server_dh_params

    if server_dh_params.nonce != res_pq.nonce:
        raise SecurityError('Step 2 invalid nonce from server')

    if server_dh_params.server_nonce != res_pq.server_nonce:
        raise SecurityError('Step 2 invalid server nonce from server')

    if isinstance(server_dh_params, ServerDHParamsFail):
        nnh = int.from_bytes(
            sha1(new_nonce.to_bytes(32, 'little', signed=True)).digest()[4:20],
            'little', signed=True
        )
        if server_dh_params.new_nonce_hash != nnh:
            raise SecurityError('Step 2 invalid DH fail nonce from server')

    assert isinstance(server_dh_params, ServerDHParamsOk),\
        'Step 2.2 answer was %s' % server_dh_params

Step 3: Compute our own key, check it against the server for consistency, and complete the exchange; the data sent is encrypted with the AES256_ige_encrypt algorithm (AES-256 in IGE mode);

At this point the server response has been received, but it is encrypted, so the ciphertext has to be decrypted first. The decrypted payload has the following layout:

struct Server_DH_inner_data
{
    int128 nonce;
    int128 server_nonce;
    int g;
    string dh_prime;  // modulus: pow(g, {a or b}) mod dh_prime
    string g_a;       // a itself stays secret on the server
    int server_time;
}

//  https://blog.csdn.net/robinfoxnan/article/details/127322483
    # Step 3 sending: Complete DH Exchange
    # First calculate the encryption key and initial vector
    key, iv = helpers.generate_key_data_from_nonce(
        res_pq.server_nonce, new_nonce
    )
    if len(server_dh_params.encrypted_answer) % 16 != 0:
        # See PR#453
        raise SecurityError('Step 3 AES block size mismatch')

    #  Unwrap the answer
    plain_text_answer = AES.decrypt_ige(
        server_dh_params.encrypted_answer, key, iv
    )

    # The first 20 bytes are the checksum, followed by the structure of the service response
    with BinaryReader(plain_text_answer) as reader:
        reader.read(20)  # hash sum
        server_dh_inner = reader.tgread_object()
        assert isinstance(server_dh_inner, ServerDHInnerData),\
            'Step 3 answer was %s' % server_dh_inner

    if server_dh_inner.nonce != res_pq.nonce:
        raise SecurityError('Step 3 Invalid nonce in encrypted answer')

    if server_dh_inner.server_nonce != res_pq.server_nonce:
        raise SecurityError('Step 3 Invalid server nonce in encrypted answer')
	
    # Here are the core parameters of the key exchange
    dh_prime = get_int(server_dh_inner.dh_prime, signed=False)
    g = server_dh_inner.g
    g_a = get_int(server_dh_inner.g_a, signed=False)
    time_offset = server_dh_inner.server_time - int(time.time())

    b = get_int(os.urandom(256), signed=False)
    g_b = pow(g, b, dh_prime)
    gab = pow(g_a, b, dh_prime)
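Why both sides end up with the same value can be checked with a toy Diffie-Hellman run. The modulus below (the Mersenne prime 2**127 - 1) and the generator 3 are chosen only to keep the example small; they are not Telegram's parameters and are not suitable for real use. The function names are illustrative.

```python
import secrets

TOY_PRIME = 2 ** 127 - 1   # toy modulus, NOT the 2048-bit dh_prime
TOY_G = 3                  # toy generator


def dh_public(g: int, secret: int, prime: int) -> int:
    """g^secret mod prime: the value sent to the other side."""
    return pow(g, secret, prime)


def dh_shared(other_public: int, secret: int, prime: int) -> int:
    """(other side's public value)^secret mod prime: the shared key."""
    return pow(other_public, secret, prime)


a = secrets.randbelow(TOY_PRIME - 3) + 2   # server secret
b = secrets.randbelow(TOY_PRIME - 3) + 2   # client secret
g_a = dh_public(TOY_G, a, TOY_PRIME)
g_b = dh_public(TOY_G, b, TOY_PRIME)

server_key = dh_shared(g_b, a, TOY_PRIME)  # (g^b)^a
client_key = dh_shared(g_a, b, TOY_PRIME)  # (g^a)^b, the "gab" in the code above
```

Neither a nor b ever crosses the wire; only g_a and g_b do, which is why the range checks on those two values matter.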

At this time, the key is actually equal to gab:

auth_key = (g_a)^b mod dh_prime;

After preparing the parameters to send, the key-exchange parameters need to be checked:

    # IMPORTANT: Apart from the conditions on the Diffie-Hellman prime
    # dh_prime and generator g, both sides are to check that g, g_a and
    # g_b are greater than 1 and less than dh_prime - 1. We recommend
    # checking that g_a and g_b are between 2^{2048-64} and
    # dh_prime - 2^{2048-64} as well.
    # (https://core.telegram.org/mtproto/auth_key#dh-key-exchange-complete)
    if not (1 < g < (dh_prime - 1)):
        raise SecurityError('g is not within (1, dh_prime - 1)')

    if not (1 < g_a < (dh_prime - 1)):
        raise SecurityError('g_a is not within (1, dh_prime - 1)')

    if not (1 < g_b < (dh_prime - 1)):
        raise SecurityError('g_b is not within (1, dh_prime - 1)')

    safety_range = 2 ** (2048 - 64)
    if not (safety_range <= g_a <= (dh_prime - safety_range)):
        raise SecurityError('g_a is not within (2^{2048-64}, dh_prime - 2^{2048-64})')

    if not (safety_range <= g_b <= (dh_prime - safety_range)):
        raise SecurityError('g_b is not within (2^{2048-64}, dh_prime - 2^{2048-64})')

The reply is again encrypted with the AES key and IV derived above

    # Prepare client DH Inner Data
    client_dh_inner = bytes(ClientDHInnerData(
        nonce=res_pq.nonce,
        server_nonce=res_pq.server_nonce,
        retry_id=0,  # TODO Actual retry ID
        g_b=rsa.get_byte_array(g_b)
    ))

    client_dh_inner_hashed = sha1(client_dh_inner).digest() + client_dh_inner

    # Encryption
    client_dh_encrypted = AES.encrypt_ige(client_dh_inner_hashed, key, iv)

    # Prepare Set client DH params
    dh_gen = await sender.send(SetClientDHParamsRequest(
        nonce=res_pq.nonce,
        server_nonce=res_pq.server_nonce,
        encrypted_data=client_dh_encrypted,
    ))

If the server's response is correct, the two parties have agreed on a key.

The format of the response is as follows:

struct dh_gen_ok
{
    int128 nonce;           // identifies the session
    int128 server_nonce;    // identifies the session
    int128 new_nonce_hash1; // hash to verify
}

Check the result:

    # The answer is 3 possibilities
    nonce_types = (DhGenOk, DhGenRetry, DhGenFail)
    assert isinstance(dh_gen, nonce_types), 'Step 3.1 answer was %s' % dh_gen
    name = dh_gen.__class__.__name__
    if dh_gen.nonce != res_pq.nonce:
        raise SecurityError('Step 3 invalid {} nonce from server'.format(name))

    if dh_gen.server_nonce != res_pq.server_nonce:
        raise SecurityError(
            'Step 3 invalid {} server nonce from server'.format(name))

    auth_key = AuthKey(rsa.get_byte_array(gab))
    nonce_number = 1 + nonce_types.index(type(dh_gen))
    new_nonce_hash = auth_key.calc_new_nonce_hash(new_nonce, nonce_number)

    dh_hash = getattr(dh_gen, 'new_nonce_hash{}'.format(nonce_number))
    if dh_hash != new_nonce_hash:
        raise SecurityError('Step 3 invalid new nonce hash')

    if not isinstance(dh_gen, DhGenOk):
        raise AssertionError('Step 3.2 answer was %s' % dh_gen)

    return auth_key, time_offset
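The article does not show calc_new_nonce_hash. The sketch below follows my reading of https://core.telegram.org/mtproto/auth_key: new_nonce_hashN is the lower 128 bits of the SHA1 of new_nonce, followed by one byte with the value 1, 2 or 3, followed by the 8-byte auxiliary hash of the key. The function names and the exact byte order are assumptions and should be checked against telethon\crypto\authkey.py.

```python
import hashlib
import os
import struct


def aux_hash(auth_key: bytes) -> int:
    """Assumed: first 8 bytes of SHA1(auth_key), read as a little-endian unsigned int."""
    return int.from_bytes(hashlib.sha1(auth_key).digest()[:8], 'little', signed=False)


def calc_new_nonce_hash(auth_key: bytes, new_nonce: int, number: int) -> int:
    """number is 1 (ok), 2 (retry) or 3 (fail), matching dh_gen_ok/retry/fail."""
    nonce_bytes = new_nonce.to_bytes(32, 'little', signed=True)
    data = nonce_bytes + struct.pack('<BQ', number, aux_hash(auth_key))
    # Keep 16 of the 20 SHA1 bytes and read them as a signed 128-bit integer.
    return int.from_bytes(hashlib.sha1(data).digest()[4:20], 'little', signed=True)


auth_key = os.urandom(256)
new_nonce = int.from_bytes(os.urandom(32), 'little', signed=True)
hashes = [calc_new_nonce_hash(auth_key, new_nonce, n) for n in (1, 2, 3)]
```

Because the byte 1, 2 or 3 is mixed into the hash, the client can tell from the matching hash which of the three answers the server really sent.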

To be continued...

Tags: telegram

Posted by flyingeagle855 on Fri, 21 Oct 2022 13:23:29 +1030