Skip to content

Content Encoding & Charset Detection ## Business Purpose

Automatically decompress response bodies and detect character encoding to provide readable text without manual setup. ## Current Behaviors - Response.text property decodes content using self.encoding or auto-detected encoding, with fallback to UTF-8 and error replacement src/requests/models.py:1031. - Response.apparent_encoding uses chardet.detect if available src/requests/models.py:895. - get_encoding_from_headers() returns charset from Content-Type, defaulting to ISO-8859-1 for text and UTF-8 for application/json src/requests/utils.py:569. - guess_json_utf() examines first bytes to guess UTF-16 or UTF-32 encoding src/requests/utils.py:1008. - iter_content with decode_unicode=True returns str chunks src/requests/models.py:912. ## Technical Implementation urllib3 handles content-encoding (gzip, deflate) transparently. The library then provides encoding detection on the decoded bytes. ## Definition of Done - A response with Content-Type text/html; charset=utf-8 is correctly decoded to Unicode. - A gzip-encoded response is automatically decompressed. - r.apparent_encoding returns a valid encoding name from chardet. - r.iter_content(decode_unicode=True) yields str instances. - r.json() works with UTF-16 encoded JSON responses.