helikite.processing.post.fda
============================

.. py:module:: helikite.processing.post.fda


Attributes
----------

.. autoapisummary::

   helikite.processing.post.fda.CONC_COLUMN_NAME
   helikite.processing.post.fda.GRAD_COLUMN_NAME
   helikite.processing.post.fda.FLAG_COLUMN_NAME
   helikite.processing.post.fda.COLOR_RED
   helikite.processing.post.fda.logger
   helikite.processing.post.fda.FDA_PARAMS_POLLUTION
   helikite.processing.post.fda.FDA_PARAMS_HOVERING
   helikite.processing.post.fda.FDA_PARAMS_CLOUD


Classes
-------

.. autoapisummary::

   helikite.processing.post.fda.FDAParameters
   helikite.processing.post.fda.FDA


Module Contents
---------------

.. py:data:: CONC_COLUMN_NAME
   :value: 'concentration'


.. py:data:: GRAD_COLUMN_NAME
   :value: 'gradient'


.. py:data:: FLAG_COLUMN_NAME
   :value: 'flag'


.. py:data:: COLOR_RED
   :value: '#d73027'


.. py:data:: logger

.. py:class:: FDAParameters

   Parameters for the Flag Detection Algorithm.

   :param inverse: If True, detect low values/gradients instead of high ones.
   :type inverse: bool
   :param avg_time: Resampling frequency (e.g., '1s', '5min') applied before processing.
   :type avg_time: str | None
   :param main_filter: Core detection method.
   :type main_filter: Literal["power_law", "iqr"]
   :param use_neighbor_filter: If True, also flag points adjacent to those flagged by the main filter.
   :type use_neighbor_filter: bool
   :param use_median_filter: If True, flag points exceeding rolling median * factor.
   :type use_median_filter: bool
   :param use_sparse_filter: If True, flag windows with a high density of existing flags.
   :type use_sparse_filter: bool
   :param pl_a: Multiplier 'a' of the power-law curve (a * x^m); used when main_filter='power_law'.
   :type pl_a: float
   :param pl_m: Exponent 'm' of the power-law curve (a * x^m); used when main_filter='power_law'.
   :type pl_m: float
   :param iqr_window: Rolling window size for the IQR calculation; used when main_filter='iqr'.
   :type iqr_window: str | None
   :param iqr_factor: Multiplier for the IQR threshold (Q3 + factor * IQR); used when main_filter='iqr'.
   :type iqr_factor: float | None
   :param lower_thr: Values below lower_thr are considered clean (polluted, if inverse=True) by the main filter.
   :type lower_thr: float
   :param upper_thr: Values above upper_thr are considered polluted (clean, if inverse=True) by the main filter.
   :type upper_thr: float
   :param median_window: Rolling window size for the median filter.
   :type median_window: str | None
   :param median_factor: Multiplier for the median filter threshold.
   :type median_factor: float | None
   :param sparse_window: Rolling window size used to check flag density.
   :type sparse_window: str | None
   :param sparse_thr: Minimum number of flagged points in a window that triggers the sparse filter.
   :type sparse_thr: float | None


   .. py:attribute:: inverse
      :type:  bool


   .. py:attribute:: avg_time
      :type:  str | None
      :value: None


   .. py:attribute:: main_filter
      :type:  Literal['power_law', 'iqr']
      :value: 'power_law'


   .. py:attribute:: use_neighbor_filter
      :type:  bool
      :value: False


   .. py:attribute:: use_median_filter
      :type:  bool
      :value: False


   .. py:attribute:: use_sparse_filter
      :type:  bool
      :value: False


   .. py:attribute:: use_duration_filter
      :type:  bool
      :value: False


   .. py:attribute:: pl_a
      :type:  float


   .. py:attribute:: pl_m
      :type:  float
      :value: 0


   .. py:attribute:: iqr_window
      :type:  str | None
      :value: None


   .. py:attribute:: iqr_factor
      :type:  float | None
      :value: None


   .. py:attribute:: lower_thr
      :type:  float


   .. py:attribute:: upper_thr
      :type:  float


   .. py:attribute:: median_window
      :type:  str | None
      :value: None


   .. py:attribute:: median_factor
      :type:  float | None
      :value: None


   .. py:attribute:: sparse_window
      :type:  str | None
      :value: None


   .. py:attribute:: sparse_thr
      :type:  float | None
      :value: None


   .. py:attribute:: min_duration
      :type:  str | None
      :value: None


.. py:class:: FDA(df: pandas.DataFrame, conc_column_name: str, gt_flag_column_name: str | None, params: FDAParameters)

   Flag Detection Algorithm (FDA) for identifying anomalies in time-series data.

   Based on the algorithm described in:
   "Automated identification of local contamination in remote atmospheric composition time series"
   by Ivo Beck et al. (2022)
   https://doi.org/10.5194/amt-15-4195-2022

   :param df: Input dataframe containing concentration and optional flag data.
   :type df: pandas.DataFrame
   :param conc_column_name: Name of the column with the values to analyze.
   :type conc_column_name: str
   :param gt_flag_column_name: Name of the column with ground truth flags.
   :type gt_flag_column_name: str | None
   :param params: Configuration object for the detection logic.
   :type params: FDAParameters


   .. py:attribute:: _title


   .. py:attribute:: _params


   .. py:attribute:: _df_orig


   .. py:attribute:: _conc_orig


   .. py:attribute:: _df


   .. py:attribute:: _filters
      :type:  list[Callable] | None
      :value: None


   .. py:attribute:: _intermediate_flags
      :type:  list[pandas.Series] | None
      :value: None


   .. py:method:: plot_data(use_time_index: bool = True, figsize=(18, 10), bins=100, fontsize=22, markersize=3, save_path: str | pathlib.Path | None = None)

      Visualize concentration and gradient distributions.

      Generates a joint histogram and time-series plot to inspect raw
      signal behavior and threshold placement.

      :param use_time_index: Plot against timestamps instead of sample index.
      :param figsize: Figure size.
      :param bins: Histogram bin count.
      :param fontsize: Axis label font size.
      :param markersize: Marker size for time-series plot.
      :param save_path: Optional output path for saving the figure.



   .. py:method:: detect() -> pandas.Series

      Execute configured filters and produce a flag series.

      Applies filters sequentially, storing intermediate results, and
      returns the final flag series aligned with the original index.

      :returns: Series indicating detected flag events.



   .. py:method:: plot_detection(use_time_index: bool = True, figsize=None, fontsize=14, markersize=3, yscale='log', save_path: str | pathlib.Path | None = None, start_time: datetime.datetime | None = None, end_time: datetime.datetime | None = None)

      Visualize intermediate filtering stages.

      Displays concentration and gradient signals alongside each
      filter’s flag results.

      :param use_time_index: Plot against timestamps instead of sample index.
      :param figsize: Figure size.
      :param fontsize: Axis label font size.
      :param markersize: Marker size.
      :param yscale: Y-axis scale for concentration.
      :param save_path: Optional output path for saving the figure.
      :param start_time: Optional plot start time.
      :param end_time: Optional plot end time.



   .. py:method:: power_law_filter(conc: pandas.Series, grad: pandas.Series, flag_old: pandas.Series, params: FDAParameters)
      :staticmethod:


      Flag anomalies using a power-law gradient threshold.



   .. py:method:: iqr_filter(conc: pandas.Series, grad: pandas.Series, flag_old: pandas.Series, params: FDAParameters)
      :staticmethod:


      Flag anomalies using rolling interquartile range thresholds.



   .. py:method:: neighbor_filter(conc: pandas.Series, grad: pandas.Series, flag_old: pandas.Series, params: FDAParameters)
      :staticmethod:


      Extend flags to neighboring samples.



   .. py:method:: median_filter(conc: pandas.Series, grad: pandas.Series, flag_old: pandas.Series, params: FDAParameters)
      :staticmethod:


      Flag samples exceeding a rolling median threshold.



   .. py:method:: sparse_filter(conc: pandas.Series, grad: pandas.Series, flag_old: pandas.Series, params: FDAParameters)
      :staticmethod:


      Flag regions with high density of anomalies.



   .. py:method:: duration_filter(conc: pandas.Series, grad: pandas.Series, flag_old: pandas.Series, params: FDAParameters)
      :staticmethod:


      Remove flagged events shorter than a minimum duration.



   .. py:method:: evaluate(conc: pandas.Series, flag: pandas.Series, flag_manual: pandas.Series, verbose: bool = False)
      :staticmethod:


      Compute detection performance metrics when ground truth is available.

      Calculates precision, recall, and F1 score relative to reference flags.



.. py:data:: FDA_PARAMS_POLLUTION
   :type:  FDAParameters


.. py:data:: FDA_PARAMS_HOVERING

.. py:data:: FDA_PARAMS_CLOUD
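The interaction between ``pl_a``, ``pl_m``, ``lower_thr`` and ``upper_thr``, and the metrics reported by ``evaluate``, may be easier to follow with a small standalone sketch. The code below is not the module's implementation: it uses plain Python lists instead of pandas, and ``SketchParams``, ``power_law_flags`` and ``precision_recall_f1`` are hypothetical names introduced only for illustration. It also assumes that the gradient is the absolute difference to the previous sample, which the documentation above does not specify.

```python
from dataclasses import dataclass


@dataclass
class SketchParams:
    """Hypothetical stand-in for a few FDAParameters fields."""
    pl_a: float
    pl_m: float
    lower_thr: float
    upper_thr: float


def power_law_flags(conc, params):
    """Flag samples whose gradient exceeds pl_a * conc ** pl_m."""
    flags = []
    for i, x in enumerate(conc):
        # Assumption: gradient = absolute difference to the previous sample.
        grad = abs(x - conc[i - 1]) if i > 0 else 0.0
        if x < params.lower_thr:
            flags.append(False)  # below lower_thr: treated as clean
        elif x > params.upper_thr:
            flags.append(True)   # above upper_thr: treated as polluted
        else:
            flags.append(grad > params.pl_a * x ** params.pl_m)
    return flags


def precision_recall_f1(flags, reference):
    """Compare detected flags with reference (ground truth) flags."""
    tp = sum(f and r for f, r in zip(flags, reference))
    fp = sum(f and not r for f, r in zip(flags, reference))
    fn = sum(r and not f for f, r in zip(flags, reference))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up concentration values with two contamination events.
conc = [100.0, 102.0, 101.0, 400.0, 120.0, 103.0, 5000.0, 104.0]
reference = [False, False, False, True, True, False, True, False]
params = SketchParams(pl_a=0.5, pl_m=1.0, lower_thr=50.0, upper_thr=2000.0)

flags = power_law_flags(conc, params)
precision, recall, f1 = precision_recall_f1(flags, reference)
print(flags, precision, recall, f1)
```

In this toy series the last sample is flagged although the reference marks it as clean, because the drop back from the spike is itself a large gradient. The sketch only compares the resulting flags against the reference and does not attempt to reproduce the other filters documented above.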