Edit File: charsetprober.cpython-39.pyc

����(��e,����������������������@���sL���d�dl�Z�d�dlZd�dlmZmZ�ddlmZmZ�e�d�Z	G�dd��d�Z
dS�)�����N)�Optional�Union����)�LanguageFilter�ProbingStates%���[a-zA-Z]*[�-�]+[a-zA-Z]*[^a-zA-Z�-�]?c�������������������@���s����e�Zd�ZdZejfedd�dd�Zdd�dd�Zee	e
�d�d	d
��Zee	e
�d�dd��Ze
eef�ed
�dd�Zeed�dd��Zed�dd�Zee
eef�ed�dd��Zee
eef�ed�dd��Zee
eef�ed�dd��ZdS�)�
CharSetProbergffffff�?N)�lang_filter�returnc�����������������C���s$���t�j|�_d|�_||�_t�t�|�_d�S�)NT)	r����	DETECTING�_state�activer����logging�	getLogger�__name__�logger)�selfr�����r����b/home/gouroczh/virtualenv/pat/3.9/lib/python3.9/site-packages/pip/_vendor/chardet/charsetprober.py�__init__,���s����zCharSetProber.__init__)r	���c�����������������C���s���t�j|�_d�S��N)r���r
���r����r���r���r���r����reset2���s����zCharSetProber.resetc�����������������C���s���d�S�r���r���r���r���r���r����charset_name5���s����zCharSetProber.charset_namec�����������������C���s���t��d�S�r�����NotImplementedErrorr���r���r���r����language9���s����zCharSetProber.language)�byte_strr	���c�����������������C���s���t��d�S�r���r���)r���r���r���r���r����feed=���s����zCharSetProber.feedc�����������������C���s���|�j�S�r���)r���r���r���r���r����state@���s����zCharSetProber.statec�����������������C���s���dS�)Ng��������r���r���r���r���r����get_confidenceD���s����zCharSetProber.get_confidence)�bufr	���c�����������������C���s���t��dd|��}�|�S�)Ns���([�-])+���� )�re�sub)r ���r���r���r����filter_high_byte_onlyG���s����z#CharSetProber.filter_high_byte_onlyc�����������������C���sZ���t���}t�|��}|D�]@}|�|dd����|dd��}|���sJ|dk�rJd}|�|��q|S�)u7��
        We define three types of bytes:
        alphabet: english alphabets [a-zA-Z]
        international: international characters [\x80-\xFF]
        marker: everything else [^a-zA-Z\x80-\xFF]
        The input buffer can be thought to contain a series of words delimited
        by markers. This function works to filter all words that contain at
        least one international character. All contiguous sequences of markers
        are replaced by a single space ascii character.
        This filter applies to all scripts which do not use English characters.
        N��������r!���)�	bytearray�INTERNATIONAL_WORDS_PATTERN�findall�extend�isalpha)r ����filtered�words�wordZ	last_charr���r���r����filter_international_wordsL���s����

z(CharSetProber.filter_international_wordsc�����������������C���s����t���}d}d}t|���d�}�t|��D�]R\}}|dkrB|d�}d}q$|dkr$||krr|sr|�|�||����|�d��d}q$|s�|�|�|d	����|S�)
a[��
        Returns a copy of ``buf`` that retains only the sequences of English
        alphabet and high byte characters that are not between <> characters.
        This filter can be applied to all scripts which contain both English
        characters and extended ASCII characters, but is currently only used by
        ``Latin1Prober``.
        Fr����c����>r�������<r!���TN)r'����
memoryview�cast�	enumerater*���)r ���r,���Zin_tag�prev�currZbuf_charr���r���r����remove_xml_tagsn���s ����	
zCharSetProber.remove_xml_tags)r����
__module__�__qualname__ZSHORTCUT_THRESHOLDr����NONEr���r����propertyr����strr���r���r����bytesr'���r���r���r����floatr����staticmethodr$���r/���r8���r���r���r���r���r���(���s"���!r���)r
���r"����typingr���r����enumsr���r����compiler(���r���r���r���r���r����<module>���s����