helikite.classes.data_processing_level1
=======================================

.. py:module:: helikite.classes.data_processing_level1


Attributes
----------

.. autoapisummary::

   helikite.classes.data_processing_level1.logger


Classes
-------

.. autoapisummary::

   helikite.classes.data_processing_level1.DataProcessorLevel1


Module Contents
---------------

.. py:data:: logger

.. py:class:: DataProcessorLevel1(output_schema: helikite.classes.output_schemas.OutputSchema, df: pandas.DataFrame, metadata: pydantic.BaseModel, flight_computer_version: str | None = None)

   Bases: :py:obj:`helikite.classes.base.BaseProcessor`


   Level 1 processor that performs quality control, calculates average temperature and humidity,
   computes flight altitude, and runs instrument-specific processing.


   .. py:property:: level
      :type: helikite.classes.output_schemas.Level


      Processing level identifier.


   .. py:attribute:: _df


   .. py:attribute:: _metadata


   .. py:attribute:: _outliers_files
      :type:  set[str]


   .. py:property:: df
      :type: pandas.DataFrame


      Return the current state of the dataframe.


   .. py:property:: coupled_columns
      :type: list[tuple[str, Ellipsis]]


      Return a list of tuples of coupled column names. If any column in a tuple is marked as an outlier,
      all other columns in the same tuple should also be marked as outliers.
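
      The coupling rule can be illustrated with plain pandas. This is only a sketch: the helper
      ``propagate_coupled`` and the column names are hypothetical and not part of helikite, and the
      library's own implementation may differ.

      .. code-block:: python

         import pandas as pd


         def propagate_coupled(
             flags: pd.DataFrame, coupled_columns: list[tuple[str, ...]]
         ) -> pd.DataFrame:
             """Hypothetical helper: spread outlier flags across each coupled group."""
             result = flags.copy()
             for group in coupled_columns:
                 # Ignore columns that are not present in this dataframe.
                 cols = [c for c in group if c in result.columns]
                 if not cols:
                     continue
                 # A row is flagged for the whole group if any member is flagged.
                 any_flagged = result[cols].any(axis=1)
                 for col in cols:
                     result[col] = result[col] | any_flagged
             return result


         flags = pd.DataFrame(
             {
                 "probe_T": [False, True, False],
                 "probe_RH": [False, False, False],
                 "other": [False, False, True],
             }
         )
         result = propagate_coupled(flags, [("probe_T", "probe_RH")])

      Columns outside every tuple, such as ``other`` here, keep their own flags unchanged.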


   .. py:method:: _data_state_info() -> list[str]


   .. py:method:: detect_outliers(outliers_file: str = 'outliers.csv', columns: list[str] | None = None, coupled_columns: list[tuple[str, Ellipsis]] | None = None, circular_ranges: dict[str, tuple] | None = None, acceptable_ranges: dict[str, tuple] | None = None, iqr_factor: numbers.Number = 5)

      Automatically detect statistical outliers and write results to file.
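
      As an illustration of what an ``iqr_factor`` typically controls, the sketch below applies a
      common interquartile-range rule to a single column. It assumes that values outside
      ``[Q1 - k * IQR, Q3 + k * IQR]`` count as outliers; the helper ``iqr_outlier_mask`` is
      hypothetical, and the method's actual rule is not documented here.

      .. code-block:: python

         import pandas as pd


         def iqr_outlier_mask(series: pd.Series, iqr_factor: float = 5) -> pd.Series:
             """Hypothetical helper: flag values far outside the interquartile range."""
             q1 = series.quantile(0.25)
             q3 = series.quantile(0.75)
             iqr = q3 - q1
             lower = q1 - iqr_factor * iqr
             upper = q3 + iqr_factor * iqr
             return (series < lower) | (series > upper)


         values = pd.Series([10.0, 11.0, 10.5, 9.5, 10.2, 500.0])
         mask = iqr_outlier_mask(values)

      With this rule, a larger factor widens the accepted band and flags fewer points.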


   .. py:method:: choose_outliers(y: str = 'flight_computer_pressure', outliers_file: str = 'outliers.csv', use_coupled_columns: bool = True, instruments: list[helikite.instruments.Instrument] | None = None) -> ipywidgets.VBox

      Creates a plot to interactively select outliers in the data.

      A plot is generated where two variables are plotted, and the user can
      click on points to select or deselect them as outliers, or use Plotly's
      selection tools to select multiple points at once.
      If `use_coupled_columns` is True and a value in any column within a group of coupled columns
      is marked as an outlier, then the values in all other columns of that group will also be marked as outliers.

      :param y: The column name of the y-axis variable.
      :type y: str
      :param outliers_file: The path to the CSV file in which to store the outliers.
      :type outliers_file: str
      :param use_coupled_columns: If True, use the coupled columns defined in the instruments.
      :type use_coupled_columns: bool


   .. py:method:: fillna_if_all_missing(values_dict: dict[str, Any] | None = None)

      Fill columns with default values when entirely missing.
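
      The behaviour described above can be sketched with plain pandas, assuming that
      ``values_dict`` maps column names to default values. The helper ``fill_if_all_missing``
      and the column names are hypothetical.

      .. code-block:: python

         import numpy as np
         import pandas as pd


         def fill_if_all_missing(df: pd.DataFrame, values_dict: dict) -> pd.DataFrame:
             """Hypothetical helper: fill a column only when every value is NaN."""
             out = df.copy()
             for col, default in values_dict.items():
                 if col in out.columns and out[col].isna().all():
                     out[col] = default
             return out


         df = pd.DataFrame({"a": [np.nan, np.nan], "b": [1.0, np.nan]})
         filled = fill_if_all_missing(df, {"a": 0.0, "b": 9.0})

      Column ``b`` is left untouched because it is only partially missing.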


   .. py:method:: set_outliers_to_nan()

      Mask detected outliers as NaN in the dataframe.
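
      A minimal sketch of the masking step, assuming the outliers are available as a boolean
      dataframe with the same index as the data. The helper ``mask_outliers`` and the column
      name are hypothetical; the method itself works from the stored outlier files.

      .. code-block:: python

         import numpy as np
         import pandas as pd


         def mask_outliers(df: pd.DataFrame, flags: pd.DataFrame) -> pd.DataFrame:
             """Hypothetical helper: set flagged cells to NaN, column by column."""
             out = df.copy()
             for col in flags.columns:
                 if col in out.columns:
                     out.loc[flags[col], col] = np.nan
             return out


         data = pd.DataFrame({"T": [20.0, 99.0, 21.0]})
         flags = pd.DataFrame({"T": [False, True, False]})
         masked = mask_outliers(data, flags)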


   .. py:method:: plot_outliers_check()

      Plots various flight parameters against flight_computer_pressure.


   .. py:method:: convert_gps_coordinates(lat_col='flight_computer_Lat', lon_col='flight_computer_Long', lat_dir='S', lon_dir='W')

      Convert GPS coordinates to signed decimal format.
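
      The input format is not stated here. The sketch below assumes NMEA-style ``ddmm.mmmm``
      values with a hemisphere letter, which is an assumption rather than a description of the
      method; ``nmea_to_signed_decimal`` is a hypothetical helper.

      .. code-block:: python

         def nmea_to_signed_decimal(value: float, direction: str) -> float:
             """Hypothetical helper: convert ddmm.mmmm to signed decimal degrees."""
             degrees = int(value // 100)
             minutes = value - degrees * 100
             decimal = degrees + minutes / 60
             # Southern and western hemispheres are negative by convention.
             return -decimal if direction in ("S", "W") else decimal


         lat = nmea_to_signed_decimal(7039.522, "S")
         lon = nmea_to_signed_decimal(817.1, "W")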


   .. py:method:: plot_gps_on_map(center_coords=(-70.6587, -8.285), zoom_start=13) -> folium.Map

      Plot GPS track on an interactive map.


   .. py:method:: T_RH_averaging(columns_t: list[str] | None = None, columns_rh: list[str] | None = None, nan_threshold: int = 400)

      Averages reference instrument temperature and humidity data from available sensors and
      plots temperature and RH versus pressure.
      Updates DataFrame with 'Average_Temperature' and 'Average_RH' columns.

      :param columns_t: List of column names containing temperature data.
      :param columns_rh: List of column names containing humidity data.
      :param nan_threshold: Number of NaNs to tolerate before discarding a sensor.
      :type nan_threshold: int
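
      The averaging idea can be sketched as follows, assuming that a sensor is discarded when
      its NaN count exceeds ``nan_threshold`` and that the remaining sensors are averaged row by
      row. The helper ``average_sensors`` and the column names are hypothetical.

      .. code-block:: python

         import numpy as np
         import pandas as pd


         def average_sensors(
             df: pd.DataFrame, columns: list[str], nan_threshold: int = 400
         ) -> pd.Series:
             """Hypothetical helper: row-wise mean over sensors with few enough NaNs."""
             kept = [c for c in columns if df[c].isna().sum() <= nan_threshold]
             # Remaining NaNs are skipped when averaging each row.
             return df[kept].mean(axis=1)


         df = pd.DataFrame(
             {
                 "t1": [10.0, 12.0, 14.0],
                 "t2": [np.nan, np.nan, 20.0],
                 "t3": [12.0, 14.0, np.nan],
             }
         )
         average = average_sensors(df, ["t1", "t2", "t3"], nan_threshold=1)

      Here ``t2`` has two NaNs and is dropped, so the last row averages ``t1`` alone.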


   .. py:method:: plot_T_RH(save_path: str | pathlib.Path | None = None)

      Plot averaged temperature and humidity values.


   .. py:method:: _build_FC_T_columns() -> list[str]


   .. py:method:: _build_FC_RH_columns() -> list[str]


   .. py:method:: altitude_calculation_barometric(offset_to_add: numbers.Number = 0)

      Calculates altitude using barometric formula based on ground pressure/temperature interpolation
      and pressure readings during flight.
      Updates DataFrame with 'Pressure_ground', 'Temperature_ground', and 'Altitude' columns.

      :param offset_to_add: Offset to add to altitude estimate.
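
      For orientation, one common form of the barometric relation is the isothermal hypsometric
      equation, sketched below. Which variant the method uses is not stated here, so the formula,
      the constants, and the helper ``hypsometric_altitude`` are assumptions.

      .. code-block:: python

         import math

         R_DRY_AIR = 287.05  # J / (kg K), specific gas constant of dry air
         GRAVITY = 9.80665  # m / s^2, standard gravity


         def hypsometric_altitude(
             pressure_hpa: float,
             ground_pressure_hpa: float,
             ground_temperature_c: float,
             offset_to_add: float = 0.0,
         ) -> float:
             """Hypothetical helper: height above ground in metres."""
             temperature_k = ground_temperature_c + 273.15
             scale_height = R_DRY_AIR * temperature_k / GRAVITY
             height = scale_height * math.log(ground_pressure_hpa / pressure_hpa)
             return height + offset_to_add


         altitude = hypsometric_altitude(900.0, 1000.0, 15.0)

      At ground pressure the logarithm vanishes, so the result equals the offset.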


   .. py:method:: plot_altitude()

      Plot altitude profile.


   .. py:method:: add_missing_columns()

      Add columns missing from the dataframe filled with NaN. This way, all the dataframes for flights
      from the same campaign have the same columns.
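
      A minimal sketch of the idea, assuming the expected column names are known up front. The
      helper ``add_missing`` and the column names are hypothetical.

      .. code-block:: python

         import numpy as np
         import pandas as pd


         def add_missing(df: pd.DataFrame, expected: list[str]) -> pd.DataFrame:
             """Hypothetical helper: add absent expected columns, filled with NaN."""
             out = df.copy()
             for col in expected:
                 if col not in out.columns:
                     out[col] = np.nan
             return out


         flight = pd.DataFrame({"Altitude": [0.0, 12.5]})
         aligned = add_missing(flight, ["Altitude", "Average_RH"])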


   .. py:method:: calculate_derived(instrument: helikite.instruments.Instrument, verbose: bool = True, *args, **kwargs)

      Run instrument-specific derived calculations.


   .. py:method:: normalize(instrument: helikite.instruments.Instrument, verbose: bool = True, *args, **kwargs)

      Normalize instrument measurements.


   .. py:method:: plot_raw_and_normalized_data(instrument: helikite.instruments.Instrument, verbose: bool = True, *args, **kwargs)

      Plot raw versus normalized data.


   .. py:method:: plot_distribution(instrument: helikite.instruments.Instrument, verbose: bool = True, time_start: datetime.datetime | None = None, time_end: datetime.datetime | None = None, *args, **kwargs)

      Plot distribution of instrument data.


   .. py:method:: plot_vertical_distribution(instrument: helikite.instruments.Instrument, verbose: bool = True, *args, **kwargs)

      Plot vertical distribution of instrument data.


   .. py:method:: plot_flight_profiles(flight_basename: str, save_path: str | pathlib.Path, variables: list[helikite.classes.output_schemas.FlightProfileVariable] | None = None)

      Generate and save flight profile plots.


   .. py:method:: plot_size_distr(flight_basename: str, save_path: str | pathlib.Path, time_start: datetime.datetime | None = None, time_end: datetime.datetime | None = None)

      Generate and save particle size distribution plots combined in a single plot.


   .. py:method:: get_expected_columns(output_schema: helikite.classes.output_schemas.OutputSchema, reference_instrument: helikite.instruments.Instrument, with_dtype: bool) -> list[str] | dict[str, str]
      :classmethod:


      Generate expected dataframe columns at level 1.

      :param output_schema: Schema containing campaign instruments.
      :param with_dtype: Whether to include dtype mapping.

      :returns: List of column names or dict of column-to-dtype.


   .. py:method:: read_data(level1_filepath: str | pathlib.Path) -> pandas.DataFrame
      :staticmethod:


      Load Level 1 dataframe from CSV.


   .. py:method:: export_data(filepath: str | pathlib.Path | None = None)

      Export dataframe in its final state.