2.1.3. rainfallqc.utils package

2.1.3.1. Submodules

2.1.3.2. rainfallqc.utils.data_readers module

Data loading tools.

Classes for reading rain gauge network data at bottom of file.

class rainfallqc.utils.data_readers.GPCCNetworkReader(path_to_gpcc_dir: str, time_res: str, file_format: str = '.zip', unzipped_file_format: str = '.dat')[source]

Bases: GaugeNetworkReader

GPCC rain gauge network reader.

Methods

`get_nearest_overlapping_neighbours_to_target`(...)	Get IDs of the nearest neighbours to a target whilst checking that there is at least a minimum time overlap.
`load_network_data`(data_paths, target_gauge_col)	Load GPCC network data based on file paths.

load_network_data(data_paths: List[str] | ndarray[str], target_gauge_col: str, missing_val: int | float = -999.9) → DataFrame[source]

Load GPCC network data based on file paths.

Parameters:

data_paths: Paths to load network data from.
target_gauge_col: Rainfall data column
missing_val: Missing value (default: -999.9)

Returns:

network_data: Dataframe of GPCC gauges.

class rainfallqc.utils.data_readers.GSDRNetworkReader(path_to_gsdr_dir: str, file_format: str = '.txt')[source]

Bases: GaugeNetworkReader

GSDR rain gauge network reader.

Methods

`get_nearest_overlapping_neighbours_to_target`(...)	Get IDs of the nearest neighbours to a target whilst checking that there is at least a minimum time overlap.
`load_network_data`(rain_col_prefix, data_paths)	Load GSDR network data based on file paths.

load_network_data(rain_col_prefix: str, data_paths: List[str] | ndarray[str], suffix_only: bool = False, gsdr_header_rows: int = 20) → DataFrame[source]

Load GSDR network data based on file paths.

Parameters:

data_paths: Paths to load network data from.
rain_col_prefix: Prefix for rain column name (default is ‘rain’)
suffix_only: Override to only include the suffix (e.g. if the column name is the ID)
gsdr_header_rows: Number of rows to skip in the header of the GSDR data (default=20)

Returns:

network_data: Dataframe of GSDR gauges.

class rainfallqc.utils.data_readers.GaugeNetworkReader(path_to_gauge_network: str)[source]

Bases: ABC

Base class for reading rain gauge networks.

Methods

get_nearest_overlapping_neighbours_to_target(...)

Get IDs of the nearest neighbours to a target whilst checking that there is at least a minimum time overlap.

get_nearest_overlapping_neighbours_to_target(target_id: str, distance_threshold: int | float, n_closest: int, min_overlap_days: int) → set[source]

Get IDs of the nearest neighbours to a target whilst checking that there is at least a minimum time overlap.

Parameters:

target_id: Target gauge to get neighbour IDs of
distance_threshold: Distance threshold to check for neighbours
n_closest: Number of nearest neighbours to return
min_overlap_days: Minimum time overlap between neighbours to return

Returns:

neighbouring_gauge_id: IDs of neighbouring gauges within a given distance to target and min overlapping days

rainfallqc.utils.data_readers.add_datetime_to_gsdr_data(gsdr_data: DataFrame, gsdr_metadata: dict, multiplying_factor: int | float) → DataFrame[source]

Add datetime column to GSDR gauge data using metadata from that gauge.

NOTE: Could maybe extend so can find metadata if not provided?

Parameters:

gsdr_data: GSDR data
gsdr_metadata: Metadata from GSDR file
multiplying_factor: Factor to multiply the data by.

Returns:

gsdr_data: GSDR data with datetime column added

rainfallqc.utils.data_readers.convert_gsdr_metadata_dates_to_datetime(gsdr_metadata: dict) → dict[source]

Convert GSDR metadata date string column to datetime.

Parameters:

gsdr_metadata: Metadata from GSDR file

Returns:

gsdr_metadata: Metadata from GSDR file with start and end date columns

rainfallqc.utils.data_readers.get_paths_using_gauge_ids(gauge_ids: List[str] | ndarray[str], dir_path: str, file_format: str, time_res: str = None) → dict[source]

Get data path of Gauge IDs.

Parameters:

gauge_ids: Array of gauge IDs
dir_path: Path to data directory
file_format: Format of files in directory.
time_res: Time resolution (e.g. ‘mw’ or ‘tw’)

Returns:

gauge_paths: Dictionary of gauge ID and path

rainfallqc.utils.data_readers.load_etccdi_data(etccdi_var: str, path_to_etccdi: str = None) → Dataset[source]

Load ETCCDI data.

Parameters:

etccdi_var: variable to load from ETCCDI
path_to_etccdi: path to ETCCDI data (default is location of data in tests)

Returns:

etccdi_data: Loaded data

rainfallqc.utils.data_readers.load_gpcc_gauge_network_metadata(path_to_gpcc_dir: str, time_res: str, gpcc_file_format: str = '.dat') → DataFrame[source]

Load metadata from GPCC gauges from a directory.

Parameters:

path_to_gpcc_dir: Path to directory with GPCC gauges
time_res: Time resolution (e.g. ‘mw’ or ‘tw’)
gpcc_file_format: Format of file (default is .dat)

Returns:

all_station_metadata: All GPCC gauges metadata as one dataframe.

rainfallqc.utils.data_readers.load_gsdr_gauge_network_metadata(path_to_gsdr_dir: str, file_format: str = '.txt') → DataFrame[source]

Load metadata from GSDR gauges from a directory.

Parameters:

path_to_gsdr_dir: Path to directory with GSDR gauges
file_format: Format of file (default is .txt)

Returns:

all_station_metadata: All GSDR gauges metadata as one dataframe.

rainfallqc.utils.data_readers.read_gpcc_data_from_zip(data_path: str, gpcc_file_name: str, target_gauge_col: str, time_res: str, hour_offset: int = 7, missing_val: int | float = -999) → DataFrame[source]

Read the specific format and header of Global Precipitation Climatology Centre (GPCC) files.

Parameters:

data_path: path to GPCC zip file
gpcc_file_name: Name of GPCC file within zip
target_gauge_col: Name of rainfall column
time_res: ‘daily’ or ‘monthly’
hour_offset: Hours to offset grouped data by (default is 7)
missing_val: Missing value (default: -999)

Returns:

gpcc_data: Data from GPCC file

rainfallqc.utils.data_readers.read_gpcc_metadata_from_zip(data_path: str, time_res: str, gpcc_file_format: str = '.dat') → dict[source]

Read GPCC metadata from zip file.

Parameters:

data_path: path to GPCC zip file.
time_res: Time resolution of data (e.g. daily or monthly)
gpcc_file_format: Default GPCC file format (default: .dat)

Returns:

metadata: Metadata from GPCC file

rainfallqc.utils.data_readers.read_gsdr_data_from_file(data_path: str, raw_data_time_res: str, rain_col_prefix: str = None, rain_col_suffix: str = None, suffix_only: bool = False, gsdr_header_rows: int = 20) → DataFrame[source]

Read GSDR data from file.

Note: this was developed on the GSDR data available from IntenseQC, so it expects a number of header rows in the data file.

Parameters:

data_path: Path to GSDR data file
raw_data_time_res: Time resolution of data record i.e. ‘hourly’ or ‘daily’
rain_col_prefix: Prefix for column for target_gauge_col (set as None by default)
rain_col_suffix: Suffix for column name for target_gauge_col (set as None by default)
suffix_only: Override to only include the suffix (e.g. if the column name is the ID)
gsdr_header_rows: Number of rows to skip in the header of the GSDR data (default=20)

Returns:

gsdr_data: GSDR data as Pandas DataFrame

rainfallqc.utils.data_readers.read_gsdr_metadata(data_path: str) → dict[source]

Read the specific format and header of Global Sub-Daily Rainfall (GSDR) files.

Parameters:

data_path: path to GSDR data file (.txt)

Returns:

metadata: Metadata from GSDR file

2.1.3.3. rainfallqc.utils.data_utils module

All data operations for polars including datetime and calendar functionality.

Classes and functions ordered alphabetically.

rainfallqc.utils.data_utils.back_propagate_daily_data_flags(data: DataFrame, flag_column: str, num_days: int) → DataFrame[source]

Back-fill flags over a number of days.

This will prioritise higher flag values.

Parameters:

data: Daily data with flag_column
flag_column: column with flags
num_days: Number of days to back-propagate

Returns:

data: Data with flags back-propagated
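
The back-propagation described above can be illustrated with a minimal sketch on a plain Python list. This is not the library's polars implementation. The helper name `back_propagate_flags` and the exact window (the flagged day plus the `num_days` days before it) are assumptions made for illustration. Taking the maximum over the window is one way to prioritise higher flag values.

```python
def back_propagate_flags(flags, num_days):
    """Give each day the highest flag found on that day or the next num_days days.

    Hypothetical helper: equivalent to pushing every flag backwards in time
    by up to num_days days and keeping the largest value where flags collide.
    """
    n = len(flags)
    return [max(flags[i : min(n, i + num_days + 1)]) for i in range(n)]


daily_flags = [0, 0, 0, 2, 0, 1, 0]
propagated = back_propagate_flags(daily_flags, num_days=2)
```

Here the flag of 2 on the fourth day spreads to the two days before it and overrides the lower flag of 1 wherever their windows overlap.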

rainfallqc.utils.data_utils.calculate_dry_spell_fraction(data: DataFrame, target_gauge_col: str, dry_period_days: int) → Series[source]

Calculate dry spell fraction.

Parameters:

data: Data with time column
target_gauge_col: Column with rainfall data
dry_period_days: Length of a “dry spell” in days

Returns:

rain_daily_dry_day: Data with dry spell fraction

rainfallqc.utils.data_utils.check_data_has_consistent_time_step(data: DataFrame) → None[source]

Check data has a consistent time step i.e. ‘1h’.

Parameters:

data: Data with time column

Raises:

ValueError: If data has more than one time step
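
As a rough illustration of this kind of check, the sketch below works on a plain list of `datetime` objects rather than a polars DataFrame. The helper names `unique_timesteps` and `check_consistent_time_step` are hypothetical and the library's own logic may differ.

```python
from datetime import datetime, timedelta


def unique_timesteps(times):
    """Return the sorted unique differences between consecutive timestamps."""
    return sorted({later - earlier for earlier, later in zip(times, times[1:])})


def check_consistent_time_step(times):
    """Raise ValueError if the timestamps have more than one distinct step."""
    steps = unique_timesteps(times)
    if len(steps) > 1:
        raise ValueError(f"Data has {len(steps)} different time steps: {steps}")


start = datetime(2020, 1, 1)
hourly = [start + timedelta(hours=i) for i in range(5)]
gappy = hourly + [start + timedelta(hours=10)]  # one 6-hour jump at the end

check_consistent_time_step(hourly)  # passes silently
```

Calling `check_consistent_time_step(gappy)` raises a `ValueError` because both a 1-hour and a 6-hour step are present.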

rainfallqc.utils.data_utils.check_data_is_monthly(data: DataFrame) → None[source]

Check data is monthly.

Parameters:

data: Data with time column

Raises:

ValueError: If data has no monthly time steps

rainfallqc.utils.data_utils.check_data_is_specific_time_res(data: DataFrame, time_res: str | list) → None[source]

Check data has an hourly or daily time step.

Does not work for monthly data, please use ‘check_data_is_monthly’.

Parameters:

data: Data with time column.
time_res: Time resolutions either a single string or list of strings

Raises:

ValueError: If data is not hourly or daily.

rainfallqc.utils.data_utils.check_for_negative_values(df: DataFrame, target_gauge_col: str) → bool[source]

Check if the target column contains any negative values.

Parameters:

df: DataFrame to check.
target_gauge_col: Column to check for negative values.

Raises:

ValueError: If negative values are found in the target column.

rainfallqc.utils.data_utils.convert_daily_data_to_monthly(daily_data: DataFrame, rain_cols: list, perc_for_valid_month: int | float = 95) → DataFrame[source]

Convert daily data to monthly, setting a month to NaN if less than a given percentage of its days is present.

Parameters:

daily_data: Daily data to convert to monthly
rain_cols: Columns with rainfall data
perc_for_valid_month: Percentage of month needed to be classed as a valid month for the monthly group by

Returns:

monthly_data: Monthly data
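
The validity rule can be sketched in plain Python as follows. This is an illustration only, not the library's polars code: the helper name `daily_to_monthly`, the use of a monthly sum as the aggregate, and the `(date, value)` input layout are all assumptions.

```python
import calendar
import math
from collections import defaultdict
from datetime import date, timedelta


def daily_to_monthly(daily, perc_for_valid_month=95.0):
    """Sum daily values per month; months with too few valid days become NaN.

    daily is a list of (date, value) pairs, where value is None or NaN when missing.
    """
    valid_values = defaultdict(list)
    for day, value in daily:
        bucket = valid_values[(day.year, day.month)]  # registers the month even if empty
        if value is not None and not math.isnan(value):
            bucket.append(value)

    monthly = {}
    for (year, month), values in sorted(valid_values.items()):
        expected_days = calendar.monthrange(year, month)[1]
        perc_valid = 100.0 * len(values) / expected_days
        monthly[(year, month)] = sum(values) if perc_valid >= perc_for_valid_month else math.nan
    return monthly


# January 2021 is complete; February 2021 has only 20 of 28 days (about 71%).
january = [(date(2021, 1, 1) + timedelta(days=i), 1.0) for i in range(31)]
february = [(date(2021, 2, 1) + timedelta(days=i), 1.0 if i < 20 else None) for i in range(28)]
monthly = daily_to_monthly(january + february)
```

With the default threshold of 95, January keeps its total while February is set to NaN.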

rainfallqc.utils.data_utils.convert_datarray_seconds_to_days(series_seconds: DataArray) → ndarray[source]

Convert an xarray DataArray from seconds to days. The CDD (consecutive dry days) data from ETCCDI is stored in seconds.

Parameters:

series_seconds: Data in series to convert to days.

Returns:

series_days: Data array converted to days.

rainfallqc.utils.data_utils.downsample_and_fill_columns(high_res_data: DataFrame, low_res_data: DataFrame, data_cols: str | list[str], fill_limit: int, fill_method: str = 'backward', time_col: str = 'time') → DataFrame[source]

Join columns from lower resolution data to higher resolution data and fill gaps.

Parameters:

high_res_data: Higher resolution data (e.g., 15-min)
low_res_data: Lower resolution data with columns to join (e.g., hourly)
data_cols: Column name(s) to join and fill. Can be: - Single column name: “rainfall” - List of columns: [“rain1”, “rain2”] - Regex pattern: “^rain.*$”
fill_limit: Maximum number of intervals to fill
fill_method: “forward”, “backward”, or “none”
time_col: Name of time column (default: ‘time’)

Returns:

high_res_data_filled: High resolution data with filled columns

rainfallqc.utils.data_utils.downsample_monthly_data(sub_monthly_data: DataFrame, monthly_data: DataFrame, data_cols: str | list[str], time_col: str = 'time') → DataFrame[source]

Join monthly data to hourly and fill only within same month.

Parameters:

sub_monthly_data: Sub-monthly data (e.g., hourly)
monthly_data: Monthly data with columns to join
data_cols: Column name(s) to join and fill. Can be: - Single column name: “rainfall” - List of columns: [“rain1”, “rain2”]
time_col: Name of time column (default: ‘time’)

Returns:

result: Sub-monthly data with monthly columns joined and filled within month

rainfallqc.utils.data_utils.extract_negative_values_from_data(data: DataFrame, cols_to_extract_from: list) → DataFrame[source]

Extract negative values from data.

Parameters:

data: Rainfall data.
cols_to_extract_from: Columns to extract negative values from

Returns:

data: Data with only negative values or 0.

rainfallqc.utils.data_utils.extract_positive_values_from_data(data: DataFrame, cols_to_extract_from: list) → DataFrame[source]

Extract positive values from data.

Parameters:

data: Rainfall data.
cols_to_extract_from: Columns to extract positive values from

Returns:

data: Data with only positive values or 0.

rainfallqc.utils.data_utils.format_timedelta_duration(td: timedelta) → str[source]

Convert timedelta to custom strings.

Parameters:

td: Time delta to convert.

Returns:

td: Human-readable timedelta string using largest unit (d, h, m, s).
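
One possible reading of "largest unit" is the largest unit that divides the duration exactly. The sketch below follows that reading. The helper name `format_duration` and the handling of durations that are not a whole number of any larger unit are assumptions.

```python
from datetime import timedelta


def format_duration(td: timedelta) -> str:
    """Format a timedelta with the largest unit (d, h, m, s) that divides it exactly."""
    total_seconds = int(td.total_seconds())
    for unit, size in (("d", 86400), ("h", 3600), ("m", 60)):
        if total_seconds >= size and total_seconds % size == 0:
            return f"{total_seconds // size}{unit}"
    return f"{total_seconds}s"


labels = [format_duration(td) for td in (timedelta(days=1), timedelta(hours=1), timedelta(minutes=15))]
```

Under this rule a duration of 36 hours is reported as ‘36h’ rather than ‘1d’, since it is not a whole number of days.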

rainfallqc.utils.data_utils.get_data_timestep_as_str(data: DataFrame) → str[source]

Get time step of data.

Parameters:

data: Data with time column

Returns:

time_step: Time step of data i.e. ‘1h’, ‘1d’, ‘15m’.

rainfallqc.utils.data_utils.get_data_timesteps(data: DataFrame) → Series[source]

Get data timesteps. Ideally the data should have only one.

Parameters:

data: Data with time column.

Returns:

unique_timesteps: All unique time steps in data (timedelta).

rainfallqc.utils.data_utils.get_dry_period_proportions(dry_period_days: int) → dict[source]

Get dry period proportions.

Parameters:

dry_period_days: Length of a “dry spell” in days (default: 15 days)

Returns:

fraction_dry_days: Dictionary with keys “1”, “2”, “3” with dry spell fractions

rainfallqc.utils.data_utils.get_dry_spells(data: DataFrame, target_gauge_col: str) → DataFrame[source]

Get dry spell column.

Parameters:

data: Rainfall data
target_gauge_col: Column with rainfall data

Returns:

data_w_dry_spells: Data with is_dry binary column

rainfallqc.utils.data_utils.get_expected_days_in_month(data: DataFrame) → DataFrame[source]

Get the expected number of days in each month within the data.

Parameters:

data: Data with ‘year’ and ‘month’ columns

Returns:

data: Data with ‘expected_days_in_month’ column
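
The expected day count for a given year and month is available from Python's standard library, which handles leap years. The sketch below uses plain dictionaries in place of a polars DataFrame, and the helper name `expected_days_in_month` is hypothetical.

```python
import calendar


def expected_days_in_month(year: int, month: int) -> int:
    """Number of calendar days in the given month, accounting for leap years."""
    return calendar.monthrange(year, month)[1]


rows = [{"year": 2023, "month": 2}, {"year": 2024, "month": 2}, {"year": 2024, "month": 7}]
for row in rows:
    row["expected_days_in_month"] = expected_days_in_month(row["year"], row["month"])
```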

rainfallqc.utils.data_utils.get_normalised_diff(data: DataFrame, target_col: str, other_col: str, diff_col_name: str) → DataFrame[source]

Get normalised difference between two columns in data.

Parameters:

data: Data with columns
target_col: Target column
other_col: Other column.
diff_col_name: New column name for difference column

Returns:

data_w_norm_diff: Data with normalised diff

rainfallqc.utils.data_utils.make_month_and_year_col(data: DataFrame) → DataFrame[source]

Make year and month columns for polars dataframe.

Parameters:

data: Data with time column

Returns:

data: Data with year and month columns

rainfallqc.utils.data_utils.normalise_data(data: Series | Expr) → Series[source]

Normalise data to [0, 1].

Parameters:

data: Data to normalise.

Returns:

norm_data: Normalised data.
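
Scaling to [0, 1] is commonly done with min-max normalisation, (x - min) / (max - min). The sketch below shows that formula on a plain list. Whether the library uses exactly this formula is an assumption, and returning zeros for a constant series is a choice made here only to avoid division by zero.

```python
def min_max_normalise(values):
    """Scale values linearly so the minimum maps to 0 and the maximum to 1."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # Assumed behaviour for a constant series; avoids dividing by zero.
        return [0.0 for _ in values]
    return [(v - lowest) / (highest - lowest) for v in values]


normalised = min_max_normalise([2.0, 4.0, 6.0])
```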

rainfallqc.utils.data_utils.offset_data_by_time(data: DataFrame, target_col: str, offset_in_time: int, time_res: str) → DataFrame[source]

Shift/offset data either backwards or forwards in time.

Parameters:

data: Data with column to offset in ‘time’
target_col: Column of data to offset
offset_in_time: Amount to offset data by i.e. 1 for 1 day if time_res set to ‘1d’
time_res: Time resolution like ‘hourly’, ‘daily’, ‘1h’ or ‘1d’

Returns:

data: Offset data by ‘offset_in_time’ amount

rainfallqc.utils.data_utils.replace_missing_vals_with_nan(data: DataFrame, target_gauge_col: str, missing_val: int = None) → DataFrame[source]

Replace no data value with numpy.nan.

Parameters:

data: Rainfall data
target_gauge_col: Column of rainfall
missing_val: Missing value identifier

Returns:

gsdr_data: GSDR data with missing values replaced

rainfallqc.utils.data_utils.resample_data_by_time_step(data: DataFrame, rain_cols: List[str], time_col: str, time_step: str, min_count: int, hour_offset: int) → DataFrame[source]

Group hourly data into daily and check for at least 24 daily time steps per day.

Parameters:

data: Rainfall data to resample
rain_cols: List of columns with rainfall data
time_col: Name of time column
time_step: Time step to resample into (e.g. ‘1d’ for daily, ‘1h’ for hourly, ‘15m’ for 15 minute)
min_count: Minimum number of time steps needed per time period
hour_offset: Time offset in hours (needed if data is not aligned to midnight)

Returns:

resampled_data: Rainfall data grouped into a given time step

2.1.3.4. rainfallqc.utils.neighbourhood_utils module

All neighbourhood and nearby related operations.

rainfallqc.utils.neighbourhood_utils.compute_km_distances_from_target_id(gauge_network_metadata: DataFrame, target_id: str, station_id_col: str) → DataFrame[source]

Compute kilometre distances between gauges in network and target gauge.

Parameters:

gauge_network_metadata: Metadata for gauge network. Each gauge must have ‘longitude’ and ‘latitude’.
target_id: Target gauge to compare against.
station_id_col: Column name for station ID in gauge_network_metadata

Returns:

neighbour_distances_df: Data of distances to a target gauge in kilometres

rainfallqc.utils.neighbourhood_utils.compute_temporal_overlap_days(start_1: datetime, end_1: datetime, start_2: datetime, end_2: datetime) → int[source]

Compute temporal overlap in days.

Note: assumes that the data is contiguous.

Parameters:

start_1: Start time of timestamp 1
end_1: End time of timestamp 1
start_2: Start time of timestamp 2
end_2: End time of timestamp 2

Returns:

overlap_days: Days that overlap between the two timestamps
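
For two contiguous records, the overlap runs from the later of the two starts to the earlier of the two ends. A minimal sketch of that calculation follows. The helper name `overlap_days` is hypothetical, and whether the library counts the end day inclusively is not stated here.

```python
from datetime import datetime


def overlap_days(start_1: datetime, end_1: datetime, start_2: datetime, end_2: datetime) -> int:
    """Whole days shared by two contiguous periods, or 0 if they do not overlap."""
    latest_start = max(start_1, start_2)
    earliest_end = min(end_1, end_2)
    return max(0, (earliest_end - latest_start).days)


shared = overlap_days(
    datetime(2000, 1, 1), datetime(2010, 1, 1),
    datetime(2005, 1, 1), datetime(2015, 1, 1),
)
disjoint = overlap_days(
    datetime(2000, 1, 1), datetime(2001, 1, 1),
    datetime(2005, 1, 1), datetime(2006, 1, 1),
)
```

Because the records are assumed contiguous, gaps inside either record are not subtracted from the result.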

rainfallqc.utils.neighbourhood_utils.compute_temporal_overlap_days_from_target_id(gauge_network_metadata: DataFrame, target_id: str, station_id_col: str, start_datetime_col: str, end_datetime_col: str) → DataFrame[source]

Compute overlap in days between target gauge and its neighbours.

Note: assumes that the data is contiguous.

Parameters:

gauge_network_metadata: Metadata for gauge network. Each gauge must have ‘longitude’ and ‘latitude’.
target_id: Target gauge to compare against.
station_id_col: Column name for station ID in gauge_network_metadata
start_datetime_col: Column name for start datetime in gauge_network_metadata
end_datetime_col: Column name for end datetime in gauge_network_metadata

Returns:

neighbour_overlap_days_df: Neighbouring gauges with overlap days to target gauge.

rainfallqc.utils.neighbourhood_utils.get_ids_of_n_nearest_overlapping_neighbouring_gauges(gauge_network_metadata: DataFrame, target_id: str, distance_threshold: int | float, n_closest: int, min_overlap_days: int, station_id_col: str = 'station_id', start_datetime_col: str = 'start_datetime', end_datetime_col: str = 'end_datetime') → list[source]

Get gauge IDs of nearest n time-overlapping neighbouring gauges.

Parameters:

gauge_network_metadata: Metadata for gauge network. Each gauge must have ‘longitude’ and ‘latitude’.
target_id: Target gauge to compare against.
distance_threshold: Threshold for maximum distance considered
n_closest: Number of closest neighbours.
min_overlap_days: Minimum overlap between target and neighbouring gauges
station_id_col: Column name for station ID in gauge_network_metadata (default ‘station_id’)
start_datetime_col: Column name for start datetime in gauge_network_metadata (default ‘start_datetime’)
end_datetime_col: Column name for end datetime in gauge_network_metadata (default ‘end_datetime’)

Returns:

neighbouring_gauge_id: IDs of neighbouring gauges within a given distance to target and min overlapping days

rainfallqc.utils.neighbourhood_utils.get_n_closest_neighbours(neighbour_distances_df: DataFrame, distance_threshold: int | float, n_closest: int) → DataFrame[source]

Get closest neighbours from neighbour distances data.

May return more than n_closest rows if several gauges are tied at the cut-off distance. Gauges at a distance of 0 are not returned.

Parameters:

neighbour_distances_df: Data of distances to a target gauge
distance_threshold: Threshold for maximum distance considered
n_closest: Number of closest neighbours.

Returns:

n_closest_neighbour_df: Data of n_closest neighbours
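
The tie and zero-distance behaviour described above can be sketched with a plain dictionary of distances. The helper name `n_closest_ids`, the dictionary input and the inclusive distance threshold are assumptions for illustration, not the library's DataFrame-based implementation.

```python
def n_closest_ids(distances, distance_threshold, n_closest):
    """Return IDs of the n closest gauges, keeping ties at the cut-off distance.

    distances maps gauge ID to distance in km. Gauges at distance 0 (such as
    the target itself) and gauges beyond the threshold are ignored.
    """
    if n_closest <= 0:
        return []
    candidates = sorted(
        (dist, gauge_id)
        for gauge_id, dist in distances.items()
        if 0 < dist <= distance_threshold
    )
    if len(candidates) <= n_closest:
        return [gauge_id for _, gauge_id in candidates]
    cutoff = candidates[n_closest - 1][0]
    return [gauge_id for dist, gauge_id in candidates if dist <= cutoff]


distances_km = {"A": 0.0, "B": 5.0, "C": 10.0, "D": 10.0, "E": 50.0, "F": 20.0}
nearest = n_closest_ids(distances_km, distance_threshold=30, n_closest=2)
```

Gauges C and D are both 10 km away, so asking for two neighbours returns three IDs.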

rainfallqc.utils.neighbourhood_utils.get_nearest_non_nan_etccdi_val_to_gauge(etccdi_data: Dataset, etccdi_name: str, gauge_lat: int | float, gauge_lon: int | float, max_distance_km: int | float = 500) → Dataset[source]

Get the value at the nearest non-nan ETCCDI grid cell to the gauge coordinates.

Parameters:

etccdi_data: ETCCDI data with given variable to check
etccdi_name: ETCCDI variable name to check
gauge_lat: latitude of the rain gauge
gauge_lon: longitude of the rain gauge
max_distance_km: Maximum distance in km to search for a non-nan value (default 500 km)

Returns:

nearby_etccdi_data: ETCCDI data at the nearest grid cell with non-nan values

rainfallqc.utils.neighbourhood_utils.get_neighbours_with_min_overlap_days(neighbour_overlap_days_df: DataFrame, min_overlap_days: int) → DataFrame[source]

Get neighbours around gauge with at least min_overlap_days of overlapping time steps.

Note: assumes that the data is contiguous.

Parameters:

neighbour_overlap_days_df: Neighbouring gauges with overlap days to target gauge.
min_overlap_days: Minimum overlap between target and neighbouring gauges

Returns:

neighbour_overlap_days_df: Neighbouring gauges with at least min_overlap_days overlap days.

rainfallqc.utils.neighbourhood_utils.get_rain_not_minima_column(data: DataFrame, target_col: str, other_col: str) → DataFrame[source]

Get rain not equal to minima column.

Combines two functions: one to get the non-zero minima (e.g. 0.1) and one to make the ‘rain_not_minima’ column.

Parameters:

data: Rainfall data
target_col: Target rainfall column
other_col: Other rainfall column

Returns:

data_w_minima_col: Rainfall data with ‘rain_not_minima’ column

rainfallqc.utils.neighbourhood_utils.get_target_neighbour_non_zero_minima(data: DataFrame, target_col: str, other_col: str, default_minima: float = 0.1) → float[source]

Get minimum non-zero value in rainfall data between target and neighbour.

Parameters:

data: Rainfall data
target_col: Target rainfall column
other_col: Other rainfall column
default_minima: Default minimum to use for non-zero value

Returns:

non_zero_minima: Minimum non-zero value.

rainfallqc.utils.neighbourhood_utils.make_rain_not_minima_column_target_or_neighbour(data: DataFrame, target_col: str, other_col: str, data_minima: float) → DataFrame[source]

Get rain values that are not minima rainfall for target or neighbour.

Parameters:

data: Rainfall data
target_col: Target rainfall column
other_col: Other rainfall column
data_minima: Data minimum (i.e. lowest non-zero value)

Returns:

data: Rainfall data with “rain_not_minima” column

2.1.3.5. rainfallqc.utils.spatial_utils module

All spatial operations.

Classes and functions ordered alphabetically.

rainfallqc.utils.spatial_utils.compute_spatial_mean_xr(data: Dataset, var_name: str) → Dataset[source]

Compute the spatial mean of a variable in the data.

Parameters:

data: Data with variable to compute mean from. Should have lat/lon and time (as axis 0)
var_name: Variable to make mean value of

Returns:

data: Data with spatial mean

rainfallqc.utils.spatial_utils.haversine(lon1: DataArray, lat1: DataArray, lon2: ndarray | float, lat2: ndarray | float) → float[source]

Great circle distance (km) between two points on Earth.

Parameters:

lon1: Longitude of point 1
lat1: Latitude of point 1
lon2: Longitude of point 2
lat2: Latitude of point 2

Returns:

distance: Distance between the two points in km
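
For reference, the standard haversine formula can be written with the standard library alone. This scalar sketch is not the library's array-based implementation. The function name `haversine_km`, the inputs in decimal degrees and the mean Earth radius of 6371 km are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant


def haversine_km(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in km between two points given in decimal degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (lon1, lat1, lon2, lat2))
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# One degree of longitude along the equator is roughly 111 km.
one_degree_at_equator = haversine_km(0.0, 0.0, 1.0, 0.0)
```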

2.1.3.6. rainfallqc.utils.stats module

Statistical tests and other indices for rainfall data quality control.

Classes and functions ordered alphabetically.

rainfallqc.utils.stats.affinity_index(data: DataFrame, binary_col: str, return_match_and_diff: bool = False) → tuple | float[source]

Calculate affinity index from binary column.

Parameters:

data: Rainfall data
binary_col: Column with binary data
return_match_and_diff: Whether to return count of matching and difference columns as well as affinity index.

Returns:

affinity: Affinity index.
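
A minimal sketch of one plausible definition follows. It assumes the binary column holds 1 where two gauges agree (for example, both wet or both dry) and 0 where they differ, so that the index is the share of matching time steps. The helper name `affinity_index_sketch`, this interpretation of the column, and the order of the returned tuple are all assumptions rather than documented behaviour.

```python
def affinity_index_sketch(binary_values, return_match_and_diff=False):
    """Fraction of valid entries equal to 1, optionally with the raw counts."""
    valid = [v for v in binary_values if v is not None]  # skip missing entries
    match = sum(1 for v in valid if v == 1)
    diff = sum(1 for v in valid if v == 0)
    affinity = match / (match + diff) if (match + diff) else float("nan")
    if return_match_and_diff:
        return match, diff, affinity
    return affinity


affinity = affinity_index_sketch([1, 1, 0, 1, None])
```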

rainfallqc.utils.stats.dry_spell_fraction(rain_daily: DataFrame, target_gauge_col: str, dry_period_days: int) → Series[source]

Make dry spell fraction column.

Parameters:

rain_daily: Single time-step of rainfall data with ‘dry_day’ column
target_gauge_col: Column with Rainfall data
dry_period_days: Dry periods window in days

Returns:

rain_daily_w_dry_spell_fraction: Single row with dry spell fraction column

rainfallqc.utils.stats.factor_diff(data: DataFrame, target_col: str, other_col: str) → DataFrame[source]

Compute factor diff for polars.

Parameters:

data: Rainfall data
target_col: Target column to compute factor diff for
other_col: Other column to compute factor diff for

Returns:

data_w_factor_diff: Data with factor diff

rainfallqc.utils.stats.filter_out_rain_world_records(data: DataFrame, target_gauge_col: str, time_res: str) → DataFrame[source]

Filter out rain world records based on time resolution.

Parameters:

data: Rainfall data
target_gauge_col: Column with rainfall data
time_res: Temporal resolution of the time series either ‘daily’ or ‘hourly’

Returns:

data_not_wr: Data without rain world records

rainfallqc.utils.stats.fit_expon_and_get_percentile(series: Series, percentiles: list[float]) → dict[float, float][source]

Fit exponential to data series and then get percentile using PPF.

Parameters:

series: Data series to fit exponential distribution.
percentiles: Percentiles (between 0 and 1) to evaluate on the fitted exponential distribution

Returns:

expon_percentiles: Threshold at percentile of fitted distribution
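
The idea can be sketched without any fitting library. With the location fixed at 0, the maximum-likelihood scale of an exponential distribution is the sample mean, and the percent point function (PPF) is -scale * ln(1 - q). Whether the library fixes the location at 0 is not stated here, so treat that as an assumption. The helper name `expon_percentiles_sketch` is hypothetical.

```python
import math
from statistics import fmean


def expon_percentiles_sketch(values, percentiles):
    """Fit an exponential distribution (location fixed at 0) and evaluate its PPF.

    percentiles must lie in [0, 1); q = 1 would map to infinity.
    """
    scale = fmean(values)  # maximum-likelihood estimate of the scale
    return {q: -scale * math.log(1.0 - q) for q in percentiles}


thresholds = expon_percentiles_sketch([1.0, 2.0, 3.0], [0.5, 0.95])
```

With a sample mean of 2, the fitted median is 2 ln 2, and higher percentiles give progressively larger thresholds.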

rainfallqc.utils.stats.gauge_correlation(data: DataFrame, target_col: str, other_col: str) → float[source]

Calculate correlation between rain gauge data columns.

Parameters:

data: Rainfall data
target_col: Target rainfall column
other_col: Other rainfall column

Returns:

corr_coef: Correlation coefficient.

rainfallqc.utils.stats.get_rainfall_world_records() → dict[str, float][source]

Return rainfall world records as of 29/04/25.

See:
- http://www.nws.noaa.gov/oh/hdsc/record_precip/record_precip_world.html
- http://www.bom.gov.au/water/designRainfalls/rainfallEvents/worldRecRainfall.shtml
- https://wmo.asu.edu/content/world-meteorological-organization-global-weather-climate-extremes-archive

Returns:

rwr: rainfall world records set in stats.py

rainfallqc.utils.stats.percentage_diff(target: Expr, other: Expr) → Series[source]

Percentage difference between target and other column.

Parameters:

target: Target data to compare other to
other: Other data

Returns:

perc_diff: Percentage difference

rainfallqc.utils.stats.pettitt_test(arr: Series | ndarray)[source]

Pettitt test for detecting a change point in a time series.

Calculated following Pettitt (1979): https://www.jstor.org/stable/2346729?seq=4#metadata_info_tab_contents.

TAKEN FROM: https://stackoverflow.com/questions/58537876/how-to-run-standard-normal-homogeneity-test-for-a-time-series-data.

Parameters:

arr: The input time series data.

Returns:

tau: Index of the change point (first point of the second segment).
p: p-value for the test statistic.
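
The statistic from Pettitt (1979) can be sketched in pure Python. For each split point t, U_t sums sign(x_i - x_j) over all pairs with i before the split and j at or after it. The test statistic is K = max |U_t|, and the approximate p-value is 2 exp(-6 K^2 / (n^3 + n^2)). This direct version is cubic in the series length and is meant only for illustration. The function name `pettitt_sketch` is hypothetical and this is not the library's code.

```python
import math


def sign(x: float) -> int:
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def pettitt_sketch(values):
    """Return (tau, p): index of the first point after the change, and approximate p-value."""
    n = len(values)
    best_k, tau = 0, 0
    for t in range(1, n):  # t is the first index of the second segment
        u_t = sum(sign(values[i] - values[j]) for i in range(t) for j in range(t, n))
        if abs(u_t) > best_k:
            best_k, tau = abs(u_t), t
    p = min(1.0, 2.0 * math.exp(-6.0 * best_k**2 / (n**3 + n**2)))
    return tau, p


# A step change after the first ten values.
series = [1.0] * 10 + [10.0] * 10
tau, p = pettitt_sketch(series)
```

For this series the change point is found at index 10 and the p-value is well below 0.05.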

rainfallqc.utils.stats.simple_precip_intensity_index(data: DataFrame, target_gauge_col: str, wet_threshold: int | float) → float[source]

Calculate simple precipitation intensity index.

Parameters:

data: Rainfall data
target_gauge_col: Column with rainfall data
wet_threshold: Threshold for rainfall intensity in given time period

Returns:

sdii_val: Simple precipitation intensity index
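
The ETCCDI definition of this index is the total rainfall on wet days divided by the number of wet days. A minimal sketch on a plain list follows. The helper name `sdii_sketch`, the use of "greater than or equal to" for the wet threshold, and the NaN result when there are no wet days are assumptions.

```python
def sdii_sketch(rainfall, wet_threshold):
    """Mean rainfall over time steps at or above wet_threshold, ignoring missing values."""
    wet = [r for r in rainfall if r is not None and r >= wet_threshold]
    return sum(wet) / len(wet) if wet else float("nan")


sdii_val = sdii_sketch([0.0, 0.5, 2.0, 4.0, None], wet_threshold=1.0)
```

Only the values 2.0 and 4.0 count as wet here, so the index is their mean.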

2.1.3.7. Module contents

Utility functions.