helikite

Submodules

Attributes

`__version__`
`__appname__`
`__description__`

Classes

Cleaner

Package Contents

class helikite.Cleaner(instruments: list[helikite.instruments.base.Instrument], reference_instrument: helikite.instruments.base.Instrument, input_folder: str | pathlib.Path, flight_date: datetime.date, flight: str | None = None, time_takeoff: datetime.datetime | None = None, time_landing: datetime.datetime | None = None, time_offset: datetime.time = datetime.time(0, 0))

Bases: helikite.classes.base.BaseProcessor

_instruments: list[helikite.instruments.base.Instrument] = []

input_folder: str

flight = None

flight_date: datetime.date

time_takeoff: datetime.datetime | None = None

time_landing: datetime.datetime | None = None

time_offset: datetime.time

pressure_column: str = 'pressure'

master_df: pandas.DataFrame | None = None

housekeeping_df: pandas.DataFrame | None = None

reference_instrument: helikite.instruments.base.Instrument

_data_state_info() → List[str]

set_pressure_column(column_name_override: str | None = None) → None

Set the pressure column for each instrument’s dataframe.

set_time_as_index() → None

Set the time column as the index for each instrument dataframe.

data_corrections(start_altitude: float = None, start_pressure: float = None, start_temperature: float = None) → None

plot_pressure() → None

Creates a plot with the pressure measurement of each instrument

Assumes the pressure column has been set for each instrument

remove_duplicates() → None: Remove duplicate rows from each instrument based on time index, and clear repeated values in ‘msems_scan_’, ‘msems_inverted_’ columns, and specific ‘mcda_*’ columns, keeping only the first instance.

merge_instruments(tolerance_seconds: int = 0, remove_duplicates: bool = True) → None

Merges all the dataframes from the instruments into one dataframe.

All columns from all instruments are included in the merged dataframe, with unique prefixes to avoid column name collisions.

Parameters:

tolerance_seconds (int) – The tolerance in seconds for merging dataframes.
remove_duplicates (bool) – If True, removes duplicate times and keeps the first result.
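The documentation does not say how the tolerance is applied. One common way to merge time-indexed frames within a tolerance is `pandas.merge_asof`; the sketch below assumes that approach purely for illustration. The helper `merge_with_tolerance`, the prefixes `ref` and `other`, and the sample values are hypothetical and are not taken from Helikite.

```python
import pandas as pd


def merge_with_tolerance(
    frames: dict[str, pd.DataFrame], tolerance_seconds: int = 0
) -> pd.DataFrame:
    """Merge time-indexed frames onto the first frame's index.

    Each frame's columns get a "<name>_" prefix to avoid name collisions.
    """
    merged = None
    for name, frame in frames.items():
        prefixed = frame.sort_index().add_prefix(f"{name}_")
        if merged is None:
            merged = prefixed
            continue
        # Match each left row to the nearest right row within the tolerance.
        merged = pd.merge_asof(
            merged,
            prefixed,
            left_index=True,
            right_index=True,
            direction="nearest",
            tolerance=pd.Timedelta(seconds=tolerance_seconds),
        )
    return merged


ref = pd.DataFrame(
    {"pressure": [1000.0, 990.0]},
    index=pd.to_datetime(["2024-01-01 10:00:00", "2024-01-01 10:00:10"]),
)
other = pd.DataFrame(
    {"temperature": [15.0, 14.0]},
    index=pd.to_datetime(["2024-01-01 10:00:01", "2024-01-01 10:00:20"]),
)
merged = merge_with_tolerance({"ref": ref, "other": other}, tolerance_seconds=2)
```

With a 2-second tolerance, the first `ref` row picks up the `other` reading taken one second later, while the second `ref` row has no partner close enough and is left as NaN.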

export_data(filename: str | None = None) → None

Export all data columns from all instruments to local files

The function will export a CSV and a Parquet file with all columns from all instruments. The files will be saved in the current working directory unless a filename is provided.

The Parquet file will include the metadata from the class.

_apply_rolling_window_to_pressure(instrument, window_size: int = 20)

Apply a rolling window to the pressure measurements of the given instrument.

Then plot the pressure measurements with the rolling window applied.
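The aggregation used inside the window is not stated here. A minimal sketch, assuming a rolling mean over a column named `pressure`, could look like the following; the helper `smooth_pressure` is hypothetical and the plotting step is omitted.

```python
import pandas as pd


def smooth_pressure(
    df: pd.DataFrame, column: str = "pressure", window_size: int = 20
) -> pd.DataFrame:
    """Return a copy of df with the pressure column replaced by its rolling mean."""
    smoothed = df.copy()
    # min_periods=1 avoids NaN at the start, where the window is not yet full.
    smoothed[column] = (
        df[column].rolling(window=window_size, min_periods=1).mean()
    )
    return smoothed


df = pd.DataFrame({"pressure": [1.0, 2.0, 3.0, 4.0]})
smoothed = smooth_pressure(df, window_size=2)
```

A larger `window_size` removes more noise but also flattens short, real pressure changes, so the value is a trade-off.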

define_flight_times()

Creates a plot to select the start and end of the flight

Uses the pressure measurements of the reference instrument to select the start and end of the flight. The user can click on the plot to select the points.

correct_time_and_pressure(max_lag=180, walk_time_seconds: int | None = None, apply_rolling_window_to: list[helikite.instruments.base.Instrument] = [], rolling_window_size: int = constants.ROLLING_WINDOW_DEFAULT_SIZE, reference_pressure_thresholds: tuple[float, float] | None = None, detrend_pressure_on: list[helikite.instruments.base.Instrument] = [], offsets: list[tuple[helikite.instruments.base.Instrument, int]] = [], match_adjustment_with: list[tuple[helikite.instruments.base.Instrument, helikite.instruments.base.Instrument]] = [])

Correct time and pressure for each instrument based on time lag.

Parameters:

max_lag (int) – The maximum time lag to consider for cross-correlation.
walk_time_seconds (int) – The time in seconds to walk the pressure data to match the reference instrument.
apply_rolling_window_to (list[Instrument]) – A list of instruments whose pressure data should be smoothed with a rolling window.
rolling_window_size (int) – The size of the rolling window to apply to the pressure data.
reference_pressure_thresholds (tuple[float, float]) – A tuple with two values (low, high) to apply a threshold to the reference instrument’s pressure data.
detrend_pressure_on (list[Instrument]) – A list of instruments whose pressure data should be detrended.
offsets (list[tuple[Instrument, int]]) – A list of tuples with an instrument and an offset in seconds to apply to the time index.
match_adjustment_with (list[tuple[Instrument, Instrument]]) – A list of tuples of two instruments that should share the same time adjustment. This can be used, for example, when an instrument has no pressure column and must therefore reuse the time adjustment computed for another instrument. The first instrument in each tuple is the one that has the index adjustment, and the second is the one that will be adjusted.
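The parameters above refer to a time lag found by cross-correlation within `max_lag`. The exact procedure is not documented here, so the following is only a sketch of the general idea for two pressure series sampled on the same regular grid. The function `estimate_lag`, its sign convention, and the synthetic data are assumptions made for illustration.

```python
import numpy as np


def estimate_lag(
    reference: np.ndarray, other: np.ndarray, max_lag: int = 180
) -> int:
    """Return the lag (in samples) at which `other` best matches `reference`.

    A positive result means `other` trails `reference` by that many samples.
    Both arrays are assumed to have the same length.
    """
    n = len(reference)
    lags = np.arange(-max_lag, max_lag + 1)
    scores = np.full(lags.shape, np.nan)
    for i, lag in enumerate(lags):
        # Slice both series so that sample k of `a` pairs with sample k of `b`.
        if lag >= 0:
            a, b = reference[: n - lag], other[lag:]
        else:
            a, b = reference[-lag:], other[: n + lag]
        # Skip degenerate overlaps where the correlation is undefined.
        if len(a) > 1 and a.std() > 0 and b.std() > 0:
            scores[i] = np.corrcoef(a, b)[0, 1]
    return int(lags[np.nanargmax(scores)])


# Synthetic pressure dip (a flight-like profile) and a copy delayed by 5 samples.
t = np.arange(300)
reference = 1000.0 - 50.0 * np.exp(-(((t - 150) / 30.0) ** 2))
other = np.roll(reference, 5)
lag = estimate_lag(reference, other, max_lag=20)
```

Once a lag is known, the time index of the lagging instrument can be moved by that many samples so that its pressure profile lines up with the reference.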

shift_msems_columns_by_90s(df: pandas.DataFrame) → pandas.DataFrame

Shift all ‘msems_inverted_’ and ‘msems_scan_’ columns by 90 seconds in time.

Parameters:

df (pd.DataFrame) – The DataFrame containing the time-indexed data to shift.

Returns:

The DataFrame with specified columns time-shifted by 90 seconds.

Return type:

pd.DataFrame
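To make the idea of shifting only some columns in time concrete, here is a sketch in plain pandas. The direction of the shift is not stated above; this example assumes values move 90 seconds later, and values that land on a timestamp absent from the index are dropped. The helper `shift_columns_by_seconds` is hypothetical.

```python
import pandas as pd


def shift_columns_by_seconds(
    df: pd.DataFrame,
    prefixes: tuple[str, ...] = ("msems_inverted_", "msems_scan_"),
    seconds: int = 90,
) -> pd.DataFrame:
    """Shift columns whose names start with one of `prefixes` later in time."""
    shifted = df.copy()
    delta = pd.Timedelta(seconds=seconds)
    for col in df.columns:
        if str(col).startswith(prefixes):
            # shift(freq=...) moves the timestamps; assigning the result back
            # re-aligns it on the original index and leaves NaN elsewhere.
            shifted[col] = df[col].shift(freq=delta)
    return shifted


idx = pd.date_range("2024-01-01 10:00:00", periods=4, freq=pd.Timedelta(seconds=30))
df = pd.DataFrame(
    {"msems_scan_a": [1.0, 2.0, 3.0, 4.0], "pressure": [10.0, 20.0, 30.0, 40.0]},
    index=idx,
)
shifted = shift_columns_by_seconds(df)
```

In this sample only the first `msems_scan_a` value survives, moved to the last timestamp, and `pressure` is untouched.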

fill_missing_timestamps(df: pandas.DataFrame, freq: str = '1S', fill_method: str | None = None) → pandas.DataFrame

Reindex the DataFrame to fill in missing timestamps at the specified frequency. Optionally forward- or backward-fill missing values. Prints the number of timestamps added.

Parameters:

df (pd.DataFrame) – The input DataFrame with a DateTimeIndex.
freq (str) – The desired frequency for the DateTimeIndex (e.g., “1S” for 1 second).
fill_method (str or None) – Method to fill missing values: “ffill”, “bfill”, or None (default: None).

Returns:

A DataFrame with missing timestamps added and values optionally filled.

Return type:

pd.DataFrame

helikite.__version__ = '1.1.3'

helikite.__appname__ = 'helikite-data-processing'

helikite.__description__ = 'Library to generate quicklooks and data quality checks on Helikite campaigns'