helikite.classes.data_processing_level1

Attributes

logger

Classes

DataProcessorLevel1

Level 1 processor that performs quality control, calculates average temperature and humidity and flight altitude, and runs instrument-specific processing.

Module Contents

helikite.classes.data_processing_level1.logger

class helikite.classes.data_processing_level1.DataProcessorLevel1(output_schema: helikite.classes.output_schemas.OutputSchema, df: pandas.DataFrame, metadata: pydantic.BaseModel, flight_computer_version: str | None = None)

Bases: helikite.classes.base.BaseProcessor

Level 1 processor that performs quality control, calculates average temperature and humidity and flight altitude, and runs instrument-specific processing.

property level: helikite.classes.output_schemas.Level: Processing level identifier.

_df

_metadata

_outliers_files: set[str]

property df: pandas.DataFrame: Return the current state of dataframe.

property coupled_columns: list[tuple[str, Ellipsis]]: Return the list of coupled column groups, each given as a tuple of column names. If any column in a tuple is marked as an outlier, all other columns in the same tuple should also be marked as outliers.

_data_state_info() → list[str]

detect_outliers(outliers_file: str = 'outliers.csv', columns: list[str] | None = None, coupled_columns: list[tuple[str, Ellipsis]] | None = None, circular_ranges: dict[str, tuple] | None = None, acceptable_ranges: dict[str, tuple] | None = None, iqr_factor: numbers.Number = 5): Automatically detect statistical outliers and write results to file.
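The iqr_factor default suggests an interquartile-range rule. As a minimal sketch, assuming (this page does not confirm it) that a value is flagged when it lies more than iqr_factor times the IQR beyond the first or third quartile, or outside an optional acceptable range, the check for a single column could look like this. The helper name flag_outliers is hypothetical and is not part of helikite.

```python
import math
import statistics


def flag_outliers(values, iqr_factor=5, acceptable_range=None):
    """Return one boolean per value; True marks a suspected outlier.

    Hypothetical helper for illustration, not helikite's implementation.
    """
    finite = [v for v in values if not math.isnan(v)]
    if len(finite) < 2:
        # Too few points to estimate quartiles, so flag nothing.
        return [False] * len(values)
    q1, _, q3 = statistics.quantiles(finite, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - iqr_factor * iqr, q3 + iqr_factor * iqr
    if acceptable_range is not None:
        # Assumption: a value outside either interval is flagged.
        low = max(low, acceptable_range[0])
        high = min(high, acceptable_range[1])
    # Comparisons with NaN are False, so missing values are never flagged.
    return [v < low or v > high for v in values]


pressure = [10.0, 11.0, 12.0, 11.5, 10.5, 100.0]
flags = flag_outliers(pressure)
```

With these six readings only the last value falls outside the widened quartile bounds. A large factor such as 5 keeps the rule conservative, which fits a workflow where the automatic result is reviewed manually afterwards.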

choose_outliers(y: str = 'flight_computer_pressure', outliers_file: str = 'outliers.csv', use_coupled_columns: bool = True, instruments: list[helikite.instruments.Instrument] | None = None) → ipywidgets.VBox

Creates a plot to interactively select outliers in the data.

A plot is generated where two variables are plotted, and the user can click on points to select or deselect them as outliers, or use Plotly’s selection tools to select multiple points at once. If use_coupled_columns is True and a value in any column within a group of coupled columns is marked as an outlier, then the values in all other columns of that group will also be marked as outliers.

Parameters:

df (pandas.DataFrame)
outliers_file (str)
(str)
use_coupled_columns (bool)
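The coupled-column rule described above can be sketched as follows. This is an illustration under stated assumptions, not helikite's code: the helper propagate_coupled and the column names are hypothetical, and outlier flags are modelled as plain lists of booleans rather than a dataframe.

```python
def propagate_coupled(flags, coupled_columns):
    """Spread outlier flags across each group of coupled columns.

    flags maps a column name to a list of booleans (True = outlier).
    Hypothetical helper for illustration, not helikite's implementation.
    """
    # Copy the input so the caller's flags are left untouched.
    result = {name: list(col) for name, col in flags.items()}
    for group in coupled_columns:
        present = [name for name in group if name in result]
        if not present:
            continue
        n_rows = len(result[present[0]])
        for row in range(n_rows):
            # If any column of the group is flagged in this row, flag them all.
            if any(flags[name][row] for name in present):
                for name in present:
                    result[name][row] = True
    return result


flags = {
    "sensor_T": [False, True, False],
    "sensor_RH": [False, False, False],
    "other": [False, False, True],
}
coupled = propagate_coupled(flags, [("sensor_T", "sensor_RH")])
```

Here the flag on the second row of sensor_T is copied to sensor_RH, while the uncoupled column keeps its own flags.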

fillna_if_all_missing(values_dict: dict[str, Any] | None = None): Fill columns with default values when entirely missing.

set_outliers_to_nan(): Mask detected outliers as NaN in the dataframe.

plot_outliers_check(): Plots various flight parameters against flight_computer_pressure.

convert_gps_coordinates(lat_col='flight_computer_Lat', lon_col='flight_computer_Long', lat_dir='S', lon_dir='W'): Convert GPS coordinates to signed decimal format.
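As a rough sketch of what "signed decimal format" means, assuming the stored coordinates are unsigned decimal degrees and the hemisphere is supplied separately (as the lat_dir and lon_dir arguments suggest), the sign can be applied as below. The function to_signed_degrees is hypothetical; the input format that helikite actually expects is not stated on this page.

```python
def to_signed_degrees(value, direction):
    """Apply the hemisphere sign to an unsigned decimal-degree coordinate.

    Hypothetical helper for illustration, not helikite's implementation.
    """
    direction = direction.upper()
    if direction not in {"N", "S", "E", "W"}:
        raise ValueError(f"unknown hemisphere: {direction!r}")
    magnitude = abs(value)
    # South and west are negative by convention; north and east are positive.
    return -magnitude if direction in {"S", "W"} else magnitude


lat = to_signed_degrees(70.6587, "S")
lon = to_signed_degrees(8.285, "W")
```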

plot_gps_on_map(center_coords=(-70.6587, -8.285), zoom_start=13) → folium.Map: Plot GPS track on an interactive map.

T_RH_averaging(columns_t: list[str] | None = None, columns_rh: list[str] | None = None, nan_threshold: int = 400)

Averages reference instrument temperature and humidity data from available sensors and plots temperature and RH versus pressure. Updates DataFrame with ‘Average_Temperature’ and ‘Average_RH’ columns.

Parameters:

columns_t – List of column names containing temperature data.
columns_rh – List of column names containing humidity data.
nan_threshold (int) – Number of NaNs to tolerate before discarding a sensor.
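The averaging step can be pictured with the sketch below. It assumes that a sensor is discarded when its NaN count exceeds nan_threshold and that the remaining sensors are averaged row by row while ignoring NaNs; both the exact comparison and the helper name average_sensors are assumptions, and plain lists stand in for dataframe columns.

```python
import math


def average_sensors(sensors, nan_threshold):
    """Average several sensor series row by row.

    sensors maps a sensor name to a list of floats (NaN = missing).
    Returns the averaged series and the sorted names of the sensors kept.
    Hypothetical helper for illustration, not helikite's implementation.
    """
    if not sensors:
        return [], []
    # Assumption: a sensor is dropped when it has more NaNs than the threshold.
    kept = {
        name: vals
        for name, vals in sensors.items()
        if sum(math.isnan(v) for v in vals) <= nan_threshold
    }
    n_rows = len(next(iter(sensors.values())))
    averaged = []
    for i in range(n_rows):
        row = [vals[i] for vals in kept.values() if not math.isnan(vals[i])]
        averaged.append(sum(row) / len(row) if row else math.nan)
    return averaged, sorted(kept)


nan = math.nan
temperatures = {
    "t1": [20.0, 21.0, nan, 23.0],
    "t2": [22.0, nan, nan, nan],
    "t3": [20.0, 23.0, 24.0, 25.0],
}
average, used = average_sensors(temperatures, nan_threshold=2)
```

In this example t2 has three missing values and is dropped, so the average is built from t1 and t3 only.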

plot_T_RH(save_path: str | pathlib.Path | None = None): Plot averaged temperature and humidity values.

_build_FC_T_columns() → list[str]

_build_FC_RH_columns() → list[str]

altitude_calculation_barometric(offset_to_add: numbers.Number = 0)

Calculates altitude using the barometric formula, based on interpolated ground pressure and temperature and on pressure readings during flight. Updates DataFrame with ‘Pressure_ground’, ‘Temperature_ground’, and ‘Altitude’ columns.

Parameters:

offset_to_add – Offset to add to altitude estimate.
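For orientation, one common form of the barometric formula is sketched below. Which variant helikite uses is not stated on this page, so the constant lapse rate, the Celsius input, and the function name barometric_altitude are all assumptions made for the illustration.

```python
def barometric_altitude(pressure, pressure_ground, temperature_ground_c,
                        offset_to_add=0.0):
    """Altitude above ground in metres from the standard barometric formula.

    Assumptions: both pressures share the same unit, the ground temperature
    is in degrees Celsius, and the lapse rate is constant.
    Hypothetical helper for illustration, not helikite's implementation.
    """
    R_D = 287.05    # specific gas constant of dry air, J/(kg K)
    G = 9.80665     # standard gravity, m/s^2
    LAPSE = 0.0065  # standard tropospheric lapse rate, K/m
    t0 = temperature_ground_c + 273.15  # ground temperature in kelvin
    exponent = R_D * LAPSE / G
    height = (t0 / LAPSE) * (1.0 - (pressure / pressure_ground) ** exponent)
    return height + offset_to_add


# Standard-atmosphere check: about 898.76 hPa corresponds to roughly 1000 m.
altitude = barometric_altitude(898.76, 1013.25, 15.0)
```

In a real flight the ground pressure and temperature change over time, which is presumably why the method interpolates them before applying the formula to each in-flight pressure reading.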

plot_altitude(): Plot altitude profile.

add_missing_columns(): Add columns missing from the dataframe filled with NaN. This way, all the dataframes for flights from the same campaign have the same columns.

calculate_derived(instrument: helikite.instruments.Instrument, verbose: bool = True, *args, **kwargs): Run instrument-specific derived calculations.

normalize(instrument: helikite.instruments.Instrument, verbose: bool = True, *args, **kwargs)

plot_raw_and_normalized_data(instrument: helikite.instruments.Instrument, verbose: bool = True, *args, **kwargs): Plot raw versus normalized data.

plot_distribution(instrument: helikite.instruments.Instrument, verbose: bool = True, time_start: datetime.datetime | None = None, time_end: datetime.datetime | None = None, *args, **kwargs): Plot distribution of instrument data.

plot_vertical_distribution(instrument: helikite.instruments.Instrument, verbose: bool = True, *args, **kwargs): Plot vertical distribution of instrument data.

plot_flight_profiles(flight_basename: str, save_path: str | pathlib.Path, variables: list[helikite.classes.output_schemas.FlightProfileVariable] | None = None): Generate and save flight profile plots.

plot_size_distr(flight_basename: str, save_path: str | pathlib.Path, time_start: datetime.datetime | None = None, time_end: datetime.datetime | None = None): Generate and save particle size distribution plots combined in a single plot.

classmethod get_expected_columns(output_schema: helikite.classes.output_schemas.OutputSchema, reference_instrument: helikite.instruments.Instrument, with_dtype: bool) → list[str] | dict[str, str]

Generate expected dataframe columns at level 1.

Parameters:

output_schema – Schema containing campaign instruments.
with_dtype – Whether to include dtype mapping.

Returns:

List of column names or dict of column-to-dtype.

static read_data(level1_filepath: str | pathlib.Path) → pandas.DataFrame: Load Level 1 dataframe from CSV.

export_data(filepath: str | pathlib.Path | None = None): Export dataframe in its final state.