2.1.3. rainfallqc.utils package

2.1.3.1. Submodules

2.1.3.2. rainfallqc.utils.data_readers module

Data loading tools.

Classes for reading rain gauge network data are at the bottom of the file.

class rainfallqc.utils.data_readers.GPCCNetworkReader(path_to_gpcc_dir: str, time_res: str, file_format: str = '.zip', unzipped_file_format: str = '.dat')

Bases: GaugeNetworkReader

GPCC rain gauge network reader.

Methods

get_nearest_overlapping_neighbours_to_target(...)

Get IDs of the nearest neighbours to a target whilst checking that there is at least a minimum time overlap.

load_network_data(data_paths, target_gauge_col)

Load GPCC network data based on file paths.

load_network_data(data_paths: List[str] | ndarray[str], target_gauge_col: str, missing_val: int | float = -999.9) → DataFrame

Load GPCC network data based on file paths.

Parameters:
data_paths

Paths to load network data from.

target_gauge_col

Rainfall data column

missing_val

Missing value (default: -999.9)

Returns:
network_data

Dataframe of GPCC gauges.

class rainfallqc.utils.data_readers.GSDRNetworkReader(path_to_gsdr_dir: str, file_format: str = '.txt')

Bases: GaugeNetworkReader

GSDR rain gauge network reader.

Methods

get_nearest_overlapping_neighbours_to_target(...)

Get IDs of the nearest neighbours to a target whilst checking that there is at least a minimum time overlap.

load_network_data(rain_col_prefix, data_paths)

Load GSDR network data based on file paths.

load_network_data(rain_col_prefix: str, data_paths: List[str] | ndarray[str], suffix_only: bool = False, gsdr_header_rows: int = 20) → DataFrame

Load GSDR network data based on file paths.

Parameters:
data_paths

Paths to load network data from.

rain_col_prefix

Prefix for rain column name (e.g. ‘rain’)

suffix_only

Override to only include the suffix (e.g. if the column name is the ID)

gsdr_header_rows

Number of rows to skip in the header of the GSDR data (default=20)

Returns:
network_data

Dataframe of GSDR gauges.

class rainfallqc.utils.data_readers.GaugeNetworkReader(path_to_gauge_network: str)

Bases: ABC

Base class for reading rain gauge networks.

Methods

get_nearest_overlapping_neighbours_to_target(...)

Get IDs of the nearest neighbours to a target whilst checking that there is at least a minimum time overlap.

get_nearest_overlapping_neighbours_to_target(target_id: str, distance_threshold: int | float, n_closest: int, min_overlap_days: int) → set

Get IDs of the nearest neighbours to a target whilst checking that there is at least a minimum time overlap.

Parameters:
target_id

Target gauge to get neighbour IDs of

distance_threshold

Distance threshold to check for neighbours

n_closest

Number of nearest neighbours to return

min_overlap_days

Minimum number of overlapping days between the target and a neighbour for that neighbour to be returned

Returns:
neighbouring_gauge_id

IDs of neighbouring gauges within a given distance to target and min overlapping days

rainfallqc.utils.data_readers.add_datetime_to_gsdr_data(gsdr_data: DataFrame, gsdr_metadata: dict, multiplying_factor: int | float) → DataFrame

Add datetime column to GSDR gauge data using metadata from that gauge.

Note: this could be extended to find the metadata if it is not provided.

Parameters:
gsdr_data

GSDR data

gsdr_metadata

Metadata from GSDR file

multiplying_factor : int or float

Factor to multiply the data by.

Returns:
gsdr_data

GSDR data with datetime column added

rainfallqc.utils.data_readers.convert_gsdr_metadata_dates_to_datetime(gsdr_metadata: dict) → dict

Convert GSDR metadata date string column to datetime.

Parameters:
gsdr_metadata

Metadata from GSDR file

Returns:
gsdr_metadata : dict

Metadata from GSDR file with the start and end date columns converted to datetime

rainfallqc.utils.data_readers.get_paths_using_gauge_ids(gauge_ids: List[str] | ndarray[str], dir_path: str, file_format: str, time_res: str = None) → dict

Get data paths for the given gauge IDs.

Parameters:
gauge_ids

Array of gauge IDs

dir_path

Path to data directory

file_format

Format of files in directory.

time_res

Time resolution (e.g. ‘mw’ or ‘tw’)

Returns:
gauge_paths

Dictionary of gauge ID and path

rainfallqc.utils.data_readers.load_etccdi_data(etccdi_var: str, path_to_etccdi: str = None) → Dataset

Load ETCCDI data.

Parameters:
etccdi_var

variable to load from ETCCDI

path_to_etccdi

path to ETCCDI data (default is location of data in tests)

Returns:
etccdi_data

Loaded data

rainfallqc.utils.data_readers.load_gpcc_gauge_network_metadata(path_to_gpcc_dir: str, time_res: str, gpcc_file_format: str = '.dat') → DataFrame

Load metadata from GPCC gauges from a directory.

Parameters:
path_to_gpcc_dir

Path to directory with GPCC gauges

time_res

Time resolution (e.g. ‘mw’ or ‘tw’)

gpcc_file_format

Format of file (default is .dat)

Returns:
all_station_metadata

All GPCC gauges metadata as one dataframe.

rainfallqc.utils.data_readers.load_gsdr_gauge_network_metadata(path_to_gsdr_dir: str, file_format: str = '.txt') → DataFrame

Load metadata from GSDR gauges from a directory.

Parameters:
path_to_gsdr_dir

Path to directory with GSDR gauges

file_format

Format of file (default is .txt)

Returns:
all_station_metadata

All GSDR gauges metadata as one dataframe.

rainfallqc.utils.data_readers.read_gpcc_data_from_zip(data_path: str, gpcc_file_name: str, target_gauge_col: str, time_res: str, hour_offset: int = 7, missing_val: int | float = -999) → DataFrame

Read the specific format and header of Global Precipitation Climatology Centre (GPCC) files.

Parameters:
data_path

path to GPCC zip file

gpcc_file_name

Name of GPCC file within zip

target_gauge_col

Name of rainfall column

time_res

‘daily’ or ‘monthly’

hour_offset

Hours to offset grouped data by (default is 7)

missing_val

Missing value (default: -999)

Returns:
gpcc_data : DataFrame

Data from GPCC file

rainfallqc.utils.data_readers.read_gpcc_metadata_from_zip(data_path: str, time_res: str, gpcc_file_format: str = '.dat') → dict

Read GPCC metadata from zip file.

Parameters:
data_path

path to GPCC zip file.

time_res

Time resolution of data (e.g. daily or monthly)

gpcc_file_format

GPCC file format (default: .dat)

Returns:
metadata

Metadata from GPCC file

rainfallqc.utils.data_readers.read_gsdr_data_from_file(data_path: str, raw_data_time_res: str, rain_col_prefix: str = None, rain_col_suffix: str = None, suffix_only: bool = False, gsdr_header_rows: int = 20) → DataFrame

Read GSDR data from file.

Note: this was developed using the GSDR data available from IntenseQC, so it expects the data to start with a number of header rows.

Parameters:
data_path

Path to GSDR data file

raw_data_time_res

Time resolution of data record i.e. ‘hourly’ or ‘daily’

rain_col_prefix

Prefix for the target gauge column name (default: None)

rain_col_suffix

Suffix for the target gauge column name (default: None)

suffix_only

Override to only include the suffix (e.g. if the column name is the ID)

gsdr_header_rows

Number of rows to skip in the header of the GSDR data (default=20)

Returns:
gsdr_data

GSDR data as Pandas DataFrame

rainfallqc.utils.data_readers.read_gsdr_metadata(data_path: str) → dict

Read the specific format and header of Global Sub-Daily Rainfall (GSDR) files.

Parameters:
data_path

path to GSDR data file (.txt)

Returns:
metadata

Metadata from GSDR file

2.1.3.3. rainfallqc.utils.data_utils module

All data operations for polars including datetime and calendar functionality.

Classes and functions ordered alphabetically.

rainfallqc.utils.data_utils.back_propagate_daily_data_flags(data: DataFrame, flag_column: str, num_days: int) → DataFrame

Back-fill flags over a number of days.

This will prioritise higher flag values.

Parameters:
data

Daily data with flag_column

flag_column

column with flags

num_days

Number of days to back-propagate

Returns:
data

Data with flags back-propagated
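The back-propagation rule can be illustrated with a minimal pure-Python sketch. It assumes flags are non-negative integers where 0 means "no flag", that each flagged day copies its flag onto the `num_days` preceding days, and that overlapping windows keep the higher flag. The helper `back_propagate_flags` is hypothetical and works on a plain list rather than the polars DataFrame the real function takes.

```python
def back_propagate_flags(flags: list, num_days: int) -> list:
    """Spread each flag back over the preceding num_days days (hypothetical helper)."""
    result = list(flags)  # copy so the caller's list is not modified
    for i, flag in enumerate(flags):
        if flag <= 0:
            continue  # assumption: 0 means "no flag", so nothing to propagate
        for j in range(max(0, i - num_days), i):
            # prioritise higher flag values where propagation windows overlap
            result[j] = max(result[j], flag)
    return result


print(back_propagate_flags([0, 0, 0, 2, 0, 1], num_days=2))  # [0, 2, 2, 2, 1, 1]
```

Day 3 spreads flag 2 onto days 1 and 2; day 5 spreads flag 1 onto days 3 and 4, but day 3 keeps its higher flag of 2.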

rainfallqc.utils.data_utils.calculate_dry_spell_fraction(data: DataFrame, target_gauge_col: str, dry_period_days: int) → Series

Calculate dry spell fraction.

Parameters:
data

Data with time column

target_gauge_col

Column with rainfall data

dry_period_days

Length of a “dry spell” in days

Returns:
rain_daily_dry_day

Data with dry spell fraction

rainfallqc.utils.data_utils.check_data_has_consistent_time_step(data: DataFrame) → None

Check that data has a consistent time step (e.g. ‘1h’).

Parameters:
data

Data with time column

Raises:
ValueError

If data has more than one time step
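Assuming the check works from the differences between consecutive timestamps, a gap in the record also counts as a second time step. The pure-Python sketch below shows that idea with a hypothetical `check_consistent_time_step` helper on a sorted list of `datetime` objects instead of a DataFrame.

```python
from datetime import datetime, timedelta


def check_consistent_time_step(times: list) -> None:
    """Raise ValueError if consecutive timestamps are not evenly spaced (hypothetical helper)."""
    # assumption: times are already sorted in ascending order
    steps = {later - earlier for earlier, later in zip(times, times[1:])}
    if len(steps) > 1:
        raise ValueError(f"Data has more than one time step: {sorted(steps)}")


start = datetime(2021, 1, 1)
hourly = [start + timedelta(hours=h) for h in range(24)]
check_consistent_time_step(hourly)  # passes: every step is 1 hour

try:
    check_consistent_time_step(hourly[:10] + hourly[12:])  # two missing hours
except ValueError as err:
    print(err)
```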

rainfallqc.utils.data_utils.check_data_is_monthly(data: DataFrame) → None

Check data is monthly.

Parameters:
data

Data with time column

Raises:
ValueError

If data does not have monthly time steps

rainfallqc.utils.data_utils.check_data_is_specific_time_res(data: DataFrame, time_res: str | list) → None

Check that data has an hourly or daily time step.

Does not work for monthly data, please use ‘check_data_is_monthly’.

Parameters:
data

Data with time column.

time_res

Time resolution(s), either a single string or a list of strings

Raises:
ValueError

If data is not hourly or daily.

rainfallqc.utils.data_utils.check_for_negative_values(df: DataFrame, target_gauge_col: str) → bool

Check if the target column contains any negative values.

Parameters:
df

DataFrame to check.

target_gauge_col

Column to check for negative values.

Raises:
ValueError

If negative values are found in the target column.

rainfallqc.utils.data_utils.convert_daily_data_to_monthly(daily_data: DataFrame, rain_cols: list, perc_for_valid_month: int | float = 95) → DataFrame

Convert daily data to monthly, setting a month to NaN if fewer than a given percentage of its days have data.

Parameters:
daily_data

Daily data to convert to monthly

rain_cols

Columns with rainfall data

perc_for_valid_month

Percentage of days in a month that must have data for the month to be classed as valid (default: 95)

Returns:
monthly_data

Monthly data
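The valid-month rule can be sketched in pure Python, assuming a month is kept when the percentage of its calendar days with data is at least `perc_for_valid_month`, and that missing days are stored as `None`. The `daily_to_monthly` helper is hypothetical; the real function operates on a polars DataFrame and may treat the boundary differently.

```python
import calendar
from collections import defaultdict
from datetime import date


def daily_to_monthly(daily: dict, perc_for_valid_month: float = 95) -> dict:
    """Sum daily rain into (year, month) totals, or None if too few days have data."""
    totals = defaultdict(float)
    valid_days = defaultdict(int)
    for day, value in daily.items():
        key = (day.year, day.month)
        totals[key] += 0.0  # register the month even if every value is missing
        if value is not None:
            totals[key] += value
            valid_days[key] += 1
    monthly = {}
    for (year, month), total in sorted(totals.items()):
        expected_days = calendar.monthrange(year, month)[1]  # days in that calendar month
        perc_valid = 100 * valid_days[(year, month)] / expected_days
        monthly[(year, month)] = total if perc_valid >= perc_for_valid_month else None
    return monthly


jan = {date(2021, 1, d): 1.0 for d in range(1, 21)}  # only 20 of 31 days
feb = {date(2021, 2, d): 1.0 for d in range(1, 29)}  # all 28 days
print(daily_to_monthly({**jan, **feb}))
# {(2021, 1): None, (2021, 2): 28.0}
```

Counting against `calendar.monthrange` rather than against the rows present means a month with rows simply absent is still judged incomplete.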

rainfallqc.utils.data_utils.convert_datarray_seconds_to_days(series_seconds: DataArray) → ndarray

Convert an xarray DataArray from seconds to days. This is needed because the CDD data from ETCCDI is stored in seconds.

Parameters:
series_seconds

Data array in seconds to convert to days.

Returns:
series_days

Data array converted to days.

rainfallqc.utils.data_utils.downsample_and_fill_columns(high_res_data: DataFrame, low_res_data: DataFrame, data_cols: str | list[str], fill_limit: int, fill_method: str = 'backward', time_col: str = 'time') → DataFrame

Join columns from lower resolution data to higher resolution data and fill gaps.

Parameters:
high_res_data

Higher resolution data (e.g., 15-min)

low_res_data

Lower resolution data with columns to join (e.g., hourly)

data_cols

Column name(s) to join and fill. Can be:

- Single column name: “rainfall”
- List of columns: [“rain1”, “rain2”]
- Regex pattern: “^rain.*$”

fill_limit

Maximum number of intervals to fill

fill_method

Fill method: “forward”, “backward”, or “none” (default: “backward”)

time_col

Name of time column (default: ‘time’)

Returns:
high_res_data_filled

High resolution data with filled columns

rainfallqc.utils.data_utils.downsample_monthly_data(sub_monthly_data: DataFrame, monthly_data: DataFrame, data_cols: str | list[str], time_col: str = 'time') → DataFrame

Join monthly data to sub-monthly (e.g. hourly) data, filling values only within the same month.

Parameters:
sub_monthly_data

Sub-monthly data (e.g., hourly)

monthly_data

Monthly data with columns to join

data_cols

Column name(s) to join and fill. Can be:

- Single column name: “rainfall”
- List of columns: [“rain1”, “rain2”]

time_col

Name of time column (default: ‘time’)

Returns:
result

Sub-monthly data with monthly columns joined and filled within month

rainfallqc.utils.data_utils.extract_negative_values_from_data(data: DataFrame, cols_to_extract_from: list) → DataFrame

Extract negative values from data.

Parameters:
data

Rainfall data.

cols_to_extract_from

Columns to extract negative values from

Returns:
data

Data with only negative values or 0.

rainfallqc.utils.data_utils.extract_positive_values_from_data(data: DataFrame, cols_to_extract_from: list) → DataFrame

Extract positive values from data.

Parameters:
data

Rainfall data.

cols_to_extract_from

Columns to extract positive values from

Returns:
data

Data with only positive values or 0.

rainfallqc.utils.data_utils.format_timedelta_duration(td: timedelta) → str

Convert a timedelta to a short human-readable string.

Parameters:
td

Time delta to convert.

Returns:
td

Human-readable timedelta string using largest unit (d, h, m, s).
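One reading of "largest unit" is the largest of d, h, m and s that divides the duration exactly, which turns one hour into ‘1h’ and 15 minutes into ‘15m’, the same style that get_data_timestep_as_str returns. A minimal sketch under that assumption, using a hypothetical `format_duration` helper:

```python
from datetime import timedelta


def format_duration(td: timedelta) -> str:
    """Express td in the largest of d, h, m, s that divides it exactly (hypothetical helper)."""
    total_seconds = int(td.total_seconds())  # assumption: sub-second parts are dropped
    for unit, seconds_per_unit in (("d", 86_400), ("h", 3_600), ("m", 60)):
        if total_seconds % seconds_per_unit == 0:
            return f"{total_seconds // seconds_per_unit}{unit}"
    return f"{total_seconds}s"


for td in (timedelta(days=1), timedelta(hours=1), timedelta(minutes=15), timedelta(minutes=90)):
    print(format_duration(td))  # 1d, 1h, 15m, 90m (one per line)
```

Under this rule 90 minutes stays as ‘90m’ rather than being rounded to ‘1h’; the library may choose differently.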

rainfallqc.utils.data_utils.get_data_timestep_as_str(data: DataFrame) → str

Get time step of data.

Parameters:
data

Data with time column

Returns:
time_step

Time step of data, e.g. ‘1h’, ‘1d’ or ‘15m’.

rainfallqc.utils.data_utils.get_data_timesteps(data: DataFrame) → Series

Get the unique time steps in the data. Ideally, the data should have only one.

Parameters:
data

Data with time column.

Returns:
unique_timesteps

All unique time steps in data (timedelta).

rainfallqc.utils.data_utils.get_dry_period_proportions(dry_period_days: int) → dict

Get dry period proportions.

Parameters:
dry_period_days

Length of a “dry spell” in days (e.g. 15 days)

Returns:
fraction_dry_days

Dictionary with keys “1”, “2”, “3” with dry spell fractions

rainfallqc.utils.data_utils.get_dry_spells(data: DataFrame, target_gauge_col: str) → DataFrame

Get dry spell column.

Parameters:
data

Rainfall data

target_gauge_col

Column with rainfall data

Returns:
data_w_dry_spells

Data with binary ‘is_dry’ column

rainfallqc.utils.data_utils.get_expected_days_in_month(data: DataFrame) → DataFrame

Get the expected number of days in each month within the data.

Parameters:
data

Data with ‘year’ and ‘month’ columns

Returns:
data

Data with ‘expected_days_in_month’ column

rainfallqc.utils.data_utils.get_normalised_diff(data: DataFrame, target_col: str, other_col: str, diff_col_name: str) → DataFrame

Get the normalised difference between two columns in data.

Parameters:
data

Data with columns

target_col

Target column

other_col

Other column.

diff_col_name

New column name for difference column

Returns:
data_w_norm_diff

Data with normalised diff

rainfallqc.utils.data_utils.make_month_and_year_col(data: DataFrame) → DataFrame

Make year and month columns for polars dataframe.

Parameters:
data

Data with time column

Returns:
data

Data with year and month columns

rainfallqc.utils.data_utils.normalise_data(data: Series | Expr) → Series

Normalise data to [0, 1].

Parameters:
data

Data to normalise.

Returns:
norm_data

Normalised data.
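Normalising to [0, 1] is usually done with min-max scaling, (x - min) / (max - min). Assuming that is the approach here, a pure-Python sketch on a list (the real function accepts a polars Series or expression) looks like the following; mapping constant input to 0 is an assumption of the hypothetical `normalise` helper.

```python
def normalise(values: list) -> list:
    """Min-max scale values to [0, 1] (hypothetical helper)."""
    low, high = min(values), max(values)
    span = high - low
    if span == 0:
        return [0.0 for _ in values]  # assumption: constant input maps to 0
    return [(v - low) / span for v in values]


print(normalise([2.0, 4.0, 6.0]))  # [0.0, 0.5, 1.0]
```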

rainfallqc.utils.data_utils.offset_data_by_time(data: DataFrame, target_col: str, offset_in_time: int, time_res: str) → DataFrame

Shift/offset data either backwards or forwards in time.

Parameters:
data

Data with a ‘time’ column and the column to offset

target_col

Column of data to offset

offset_in_time

Amount to offset data by, e.g. 1 for 1 day if time_res is set to ‘1d’

time_res

Time resolution like ‘hourly’, ‘daily’, ‘1h’ or ‘1d’

Returns:
data

Data offset by the ‘offset_in_time’ amount

rainfallqc.utils.data_utils.replace_missing_vals_with_nan(data: DataFrame, target_gauge_col: str, missing_val: int = None) → DataFrame

Replace no data value with numpy.nan.

Parameters:
data

Rainfall data

target_gauge_col

Column of rainfall

missing_val

Missing value identifier

Returns:
gsdr_data

GSDR data with missing values replaced

rainfallqc.utils.data_utils.resample_data_by_time_step(data: DataFrame, rain_cols: List[str], time_col: str, time_step: str, min_count: int, hour_offset: int) → DataFrame

Resample data into a given time step, checking that each period has at least min_count time steps (e.g. 24 hourly time steps per day).

Parameters:
data

Rainfall data to resample

rain_cols

List of columns with rainfall data

time_col

Name of time column

time_step

Time step to resample into (e.g. ‘1d’ for daily, ‘1h’ for hourly, ‘15m’ for 15 minute)

min_count

Minimum number of time steps needed per time period

hour_offset

Time offset in hours (needed if data is not aligned to midnight)

Returns:
resampled_data

Rainfall data grouped into a given time step
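A pure-Python sketch of resampling hourly data into days with `min_count` and `hour_offset` follows. It assumes each daily period starts `hour_offset` hours after midnight and is labelled by the date on which it starts, that missing values are `None`, and that a period with fewer than `min_count` valid time steps becomes `None`. The helper `resample_hourly_to_daily` is hypothetical; the real function works on a polars DataFrame.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def resample_hourly_to_daily(records, min_count=24, hour_offset=0):
    """Sum (time, value) pairs into daily periods starting at hour_offset (hypothetical)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for time, value in records:
        # shift back so that, e.g. with hour_offset=7, 07:00 to 06:00 falls in one period
        period = (time - timedelta(hours=hour_offset)).date()
        sums[period] += 0.0  # register the period even if its values are missing
        if value is not None:
            sums[period] += value
            counts[period] += 1
    return {
        period: (sums[period] if counts[period] >= min_count else None)
        for period in sorted(sums)
    }


start = datetime(2021, 1, 1, 7)
hourly = [(start + timedelta(hours=h), 0.5) for h in range(25)]
print(resample_hourly_to_daily(hourly, min_count=24, hour_offset=7))
# {datetime.date(2021, 1, 1): 12.0, datetime.date(2021, 1, 2): None}
```

The 25th hour (07:00 on 2 January) starts a new period that has only one valid time step, so it falls below `min_count`.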

2.1.3.4. rainfallqc.utils.neighbourhood_utils module

All neighbourhood and nearby related operations.

rainfallqc.utils.neighbourhood_utils.compute_km_distances_from_target_id(gauge_network_metadata: DataFrame, target_id: str, station_id_col: str) → DataFrame

Compute distances in kilometres between each gauge in the network and the target gauge.

Parameters:
gauge_network_metadata

Metadata for gauge network. Each gauge must have ‘longitude’ and ‘latitude’.

target_id

Target gauge to compare against.

station_id_col

Column name for station ID in gauge_network_metadata

Returns:
neighbour_distances_df

Data of distances to a target gauge in kilometres

rainfallqc.utils.neighbourhood_utils.compute_temporal_overlap_days(start_1: datetime, end_1: datetime, start_2: datetime, end_2: datetime) → int

Compute temporal overlap in days.

Note: assumes that the data is contiguous.

Parameters:
start_1

Start time of period 1

end_1

End time of period 1

start_2

Start time of period 2

end_2

End time of period 2

Returns:
overlap_days

Days that overlap between the two periods
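With contiguous records, the overlap is the gap between the later of the two start times and the earlier of the two end times, clamped at zero when the periods do not overlap. A minimal sketch, assuming the count uses whole days from `timedelta.days` with no extra inclusive end day (the library may count differently); `temporal_overlap_days` is a hypothetical name.

```python
from datetime import datetime


def temporal_overlap_days(start_1, end_1, start_2, end_2) -> int:
    """Whole days shared by two contiguous periods (hypothetical helper)."""
    latest_start = max(start_1, start_2)
    earliest_end = min(end_1, end_2)
    # periods that do not overlap give a negative gap, so clamp at zero
    return max(0, (earliest_end - latest_start).days)


print(temporal_overlap_days(datetime(2000, 1, 1), datetime(2010, 1, 1),
                            datetime(2005, 1, 1), datetime(2020, 1, 1)))  # 1826
```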

rainfallqc.utils.neighbourhood_utils.compute_temporal_overlap_days_from_target_id(gauge_network_metadata: DataFrame, target_id: str, station_id_col: str, start_datetime_col: str, end_datetime_col: str) → DataFrame

Compute the overlap in days between the target gauge and its neighbours.

Note: assumes that the data is contiguous.

Parameters:
gauge_network_metadata

Metadata for gauge network. Each gauge must have ‘longitude’ and ‘latitude’.

target_id

Target gauge to compare against.

station_id_col

Column name for station ID in gauge_network_metadata

start_datetime_col

Column name for start datetime in gauge_network_metadata

end_datetime_col

Column name for end datetime in gauge_network_metadata

Returns:
neighbour_overlap_days_df

Neighbouring gauges with overlap days to target gauge.

rainfallqc.utils.neighbourhood_utils.get_ids_of_n_nearest_overlapping_neighbouring_gauges(gauge_network_metadata: DataFrame, target_id: str, distance_threshold: int | float, n_closest: int, min_overlap_days: int, station_id_col: str = 'station_id', start_datetime_col: str = 'start_datetime', end_datetime_col: str = 'end_datetime') → list

Get gauge IDs of nearest n time-overlapping neighbouring gauges.

Parameters:
gauge_network_metadata

Metadata for gauge network. Each gauge must have ‘longitude’ and ‘latitude’.

target_id

Target gauge to compare against.

distance_threshold

Threshold for maximum distance considered

n_closest

Number of closest neighbours.

min_overlap_days

Minimum overlap between target and neighbouring gauges

station_id_col

Column name for station ID in gauge_network_metadata (default ‘station_id’)

start_datetime_col

Column name for start datetime in gauge_network_metadata (default ‘start_datetime’)

end_datetime_col

Column name for end datetime in gauge_network_metadata (default ‘end_datetime’)

Returns:
neighbouring_gauge_id

IDs of neighbouring gauges within a given distance to target and min overlapping days

rainfallqc.utils.neighbourhood_utils.get_n_closest_neighbours(neighbour_distances_df: DataFrame, distance_threshold: int | float, n_closest: int) → DataFrame

Get closest neighbours from neighbour distances data.

May return more than n_closest neighbours if several neighbours are tied at the n-th closest distance. Neighbours at a distance of 0 are not returned.

Parameters:
neighbour_distances_df

Data of distances to a target gauge

distance_threshold

Threshold for maximum distance considered

n_closest

Number of closest neighbours.

Returns:
n_closest_neighbour_df

Data of n_closest neighbours
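The tie and zero-distance behaviour can be sketched in pure Python on a mapping of gauge ID to distance. Assumptions in the hypothetical `n_closest_neighbours` helper: the distance threshold is inclusive, and every neighbour at exactly the n-th smallest distance is kept.

```python
def n_closest_neighbours(distances: dict, distance_threshold: float, n_closest: int) -> list:
    """Return IDs of the n closest neighbours, keeping ties at the n-th distance."""
    if n_closest <= 0:
        return []
    # exclude gauges 0 km away (e.g. the target itself) and those beyond the threshold
    candidates = sorted(
        (dist, gauge_id)
        for gauge_id, dist in distances.items()
        if 0 < dist <= distance_threshold
    )
    if len(candidates) <= n_closest:
        return [gauge_id for _, gauge_id in candidates]
    cutoff = candidates[n_closest - 1][0]  # distance of the n-th closest neighbour
    return [gauge_id for dist, gauge_id in candidates if dist <= cutoff]


distances = {"T": 0.0, "A": 5.0, "B": 10.0, "C": 10.0, "D": 30.0, "E": 80.0}
print(n_closest_neighbours(distances, distance_threshold=50, n_closest=2))
# ['A', 'B', 'C']
```

Asking for two neighbours returns three because B and C share the second-smallest distance, while T (0 km) and E (beyond 50 km) are excluded.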

rainfallqc.utils.neighbourhood_utils.get_nearest_non_nan_etccdi_val_to_gauge(etccdi_data: Dataset, etccdi_name: str, gauge_lat: int | float, gauge_lon: int | float, max_distance_km: int | float = 500) → Dataset

Get the value at the nearest non-nan ETCCDI grid cell to the gauge coordinates.

Parameters:
etccdi_data

ETCCDI data with given variable to check

etccdi_name

ETCCDI variable name to check

gauge_lat

latitude of the rain gauge

gauge_lon

longitude of the rain gauge

max_distance_km

Maximum distance in km to search for a non-nan value (default 500 km)

Returns:
nearby_etccdi_data

ETCCDI data at the nearest grid cell with non-nan values

rainfallqc.utils.neighbourhood_utils.get_neighbours_with_min_overlap_days(neighbour_overlap_days_df: DataFrame, min_overlap_days: int) → DataFrame

Get neighbours of a gauge that have at least min_overlap_days of overlapping time steps.

Note: assumes that the data is contiguous.

Parameters:
neighbour_overlap_days_df

Neighbouring gauges with overlap days to target gauge.

min_overlap_days

Minimum overlap between target and neighbouring gauges

Returns:
neighbour_overlap_days_df

Neighbouring gauges with at least min_overlap_days overlap days.

rainfallqc.utils.neighbourhood_utils.get_rain_not_minima_column(data: DataFrame, target_col: str, other_col: str) → DataFrame

Get a ‘rain_not_minima’ column of rain values that are not equal to the minimum.

Combines two steps: getting the non-zero minimum (e.g. 0.1) and then making the ‘rain_not_minima’ column.

Parameters:
data

Rainfall data

target_col

Target rainfall column

other_col

Other rainfall column

Returns:
data_w_minima_col

Rainfall data with ‘rain_not_minima’ column

rainfallqc.utils.neighbourhood_utils.get_target_neighbour_non_zero_minima(data: DataFrame, target_col: str, other_col: str, default_minima: float = 0.1) → float

Get minimum non-zero value in rainfall data between target and neighbour.

Parameters:
data

Rainfall data

target_col

Target rainfall column

other_col

Other rainfall column

default_minima

Default minimum to use for the non-zero value (default: 0.1)

Returns:
non_zero_minima

Minimum non-zero value.

rainfallqc.utils.neighbourhood_utils.make_rain_not_minima_column_target_or_neighbour(data: DataFrame, target_col: str, other_col: str, data_minima: float) → DataFrame

Get rain values that are not minima rainfall for target or neighbour.

Parameters:
data

Rainfall data

target_col

Target rainfall column

other_col

Other rainfall column

data_minima

Data minimum (i.e. lowest non-zero value)

Returns:
data

Rainfall data with “rain_not_minima” column

2.1.3.5. rainfallqc.utils.spatial_utils module

All spatial operations.

Classes and functions ordered alphabetically.

rainfallqc.utils.spatial_utils.compute_spatial_mean_xr(data: Dataset, var_name: str) → Dataset

Compute the spatial mean of a variable over latitude and longitude.

Parameters:
data

Data with variable to compute mean from. Should have lat/lon and time (as axis 0)

var_name

Variable to make mean value of

Returns:
data

Data with spatial mean

rainfallqc.utils.spatial_utils.haversine(lon1: DataArray, lat1: DataArray, lon2: ndarray | float, lat2: ndarray | float) → float

Great circle distance (km) between two points on Earth.

Parameters:
lon1 : xr.DataArray

Longitude of point 1

lat1 : xr.DataArray

Latitude of point 1

lon2 : np.ndarray | float

Longitude of point 2

lat2 : np.ndarray | float

Latitude of point 2

Returns:
distance : float

Distance between the two points in km
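The haversine formula can be written with the standard library for scalar inputs (the real function also accepts `xr.DataArray` and `np.ndarray` arguments). The Earth radius of 6371 km is an assumption; the library may use a slightly different value. The argument order (longitude before latitude) follows the signature above, and `haversine_km` is a hypothetical name.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; assumption, the library's constant may differ


def haversine_km(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great circle distance in km between two points given in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (lon1, lat1, lon2, lat2))
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


print(round(haversine_km(0.0, 0.0, 1.0, 0.0), 2))  # 111.19 (one degree of longitude at the equator)
```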

2.1.3.6. rainfallqc.utils.stats module

Statistical tests and other indices for rainfall data quality control.

Classes and functions ordered alphabetically.

rainfallqc.utils.stats.affinity_index(data: DataFrame, binary_col: str, return_match_and_diff: bool = False) → tuple | float

Calculate affinity index from binary column.

Parameters:
data

Rainfall data

binary_col

Column with binary data

return_match_and_diff

Whether to also return the counts of matching and differing values alongside the affinity index.

Returns:
affinity

Affinity index.

rainfallqc.utils.stats.dry_spell_fraction(rain_daily: DataFrame, target_gauge_col: str, dry_period_days: int) → Series

Make dry spell fraction column.

Parameters:
rain_daily

Single time-step of rainfall data with ‘dry_day’ column

target_gauge_col

Column with rainfall data

dry_period_days

Dry period window in days

Returns:
rain_daily_w_dry_spell_fraction

Single row with dry spell fraction column

rainfallqc.utils.stats.factor_diff(data: DataFrame, target_col: str, other_col: str) → DataFrame

Compute the factor difference between two columns using polars.

Parameters:
data

Rainfall data

target_col

Target column to compute factor diff for

other_col

Other column to compute factor diff for

Returns:
data_w_factor_diff

Data with factor diff

rainfallqc.utils.stats.filter_out_rain_world_records(data: DataFrame, target_gauge_col: str, time_res: str) → DataFrame

Filter out rain world records based on time resolution.

Parameters:
data

Rainfall data

target_gauge_col

Column with rainfall data

time_res

Temporal resolution of the time series either ‘daily’ or ‘hourly’

Returns:
data_not_wr

Data without rain world records

rainfallqc.utils.stats.fit_expon_and_get_percentile(series: Series, percentiles: list[float]) → dict[float, float]

Fit exponential to data series and then get percentile using PPF.

Parameters:
series

Data series to fit exponential distribution.

percentiles

Percentiles (between 0-1) to evaluate on the fitted exponential distribution

Returns:
expon_percentiles

Mapping of each percentile to its threshold on the fitted distribution
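Fitting an exponential and reading thresholds from its PPF can be sketched with the standard library. This assumes both location and scale are fitted by maximum likelihood, whose closed form is location = sample minimum and scale = sample mean minus that minimum (this should agree with a default `scipy.stats.expon.fit`). The PPF is then loc - scale * ln(1 - p). The helper name `fit_expon_percentiles` is hypothetical.

```python
import math
import statistics


def fit_expon_percentiles(data: list, percentiles: list) -> dict:
    """Fit an exponential by maximum likelihood and return {percentile: threshold}."""
    loc = min(data)  # MLE of the location parameter
    scale = statistics.fmean(data) - loc  # MLE of the scale parameter
    # inverse CDF (percent point function) of the exponential distribution
    return {p: loc - scale * math.log(1 - p) for p in percentiles}


print(fit_expon_percentiles([1.0, 2.0, 3.0], [0.5, 0.99]))
# approximately {0.5: 1.693, 0.99: 5.605}
```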

rainfallqc.utils.stats.gauge_correlation(data: DataFrame, target_col: str, other_col: str) → float

Calculate correlation between rain gauge data columns.

Parameters:
data

Rainfall data

target_col

Target rainfall column

other_col

Other rainfall column

Returns:
corr_coef

Correlation coefficient.

rainfallqc.utils.stats.get_rainfall_world_records() → dict[str, float]

Return rainfall world records as of 29/04/25.

See:

- http://www.nws.noaa.gov/oh/hdsc/record_precip/record_precip_world.html
- http://www.bom.gov.au/water/designRainfalls/rainfallEvents/worldRecRainfall.shtml
- https://wmo.asu.edu/content/world-meteorological-organization-global-weather-climate-extremes-archive

Returns:
rwr

rainfall world records set in stats.py

rainfallqc.utils.stats.percentage_diff(target: Expr, other: Expr) → Series

Percentage difference between target and other column.

Parameters:
target

Target data to compare other to

other

Other data

Returns:
perc_diff

Percentage difference

rainfallqc.utils.stats.pettitt_test(arr: Series | ndarray)

Pettitt test for detecting a change point in a time series.

Calculated following Pettitt (1979): https://www.jstor.org/stable/2346729?seq=4#metadata_info_tab_contents.

TAKEN FROM: https://stackoverflow.com/questions/58537876/how-to-run-standard-normal-homogeneity-test-for-a-time-series-data.

Parameters:
arr : pl.Series or np.ndarray

The input time series data.

Returns:
tau : int

Index of the change point (first point of the second segment).

p : float

p-value for the test statistic.
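For reference, the Pettitt statistic is U_k = Σ_{i<k} Σ_{j≥k} sgn(x_i − x_j), the change point is the k that maximises |U_k|, and the approximate p-value is 2·exp(−6K² / (n³ + n²)) with K = max|U_k|. The O(n³) sketch below is a plain restatement of those formulas, not the library's implementation. Clipping the p-value at 1 is an added assumption, since the approximation can exceed 1 for small K.

```python
import math


def pettitt(values: list) -> tuple:
    """Return (tau, p): change-point index and approximate p-value (sketch)."""
    n = len(values)
    if n < 2:
        raise ValueError("Need at least two values")

    def sign(d):
        return (d > 0) - (d < 0)

    best_k, best_u = 0, 0
    for k in range(1, n):
        # U_k compares every point before k with every point from k onwards
        u_k = sum(sign(values[i] - values[j]) for i in range(k) for j in range(k, n))
        if abs(u_k) > abs(best_u):
            best_k, best_u = k, u_k
    big_k = abs(best_u)
    p = 2 * math.exp(-6 * big_k**2 / (n**3 + n**2))
    return best_k, min(p, 1.0)


tau, p = pettitt([1.0] * 10 + [5.0] * 10)
print(tau, round(p, 4))  # 10 0.0016
```

Here tau = 10 is the index of the first value of the second segment, matching the return description above.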

rainfallqc.utils.stats.simple_precip_intensity_index(data: DataFrame, target_gauge_col: str, wet_threshold: int | float) → float

Calculate simple precipitation intensity index.

Parameters:
data

Rainfall data

target_gauge_col

Column with rainfall data

wet_threshold

Rainfall threshold for a time step to be classed as wet

Returns:
sdii_val

Simple precipitation intensity index
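In the ETCCDI definition, the simple daily intensity index (SDII) is the total precipitation on wet days divided by the number of wet days, where a wet day has at least 1 mm. A pure-Python sketch follows. It assumes `wet_threshold` plays the role of that 1 mm limit with the same inclusive comparison, and that missing values are `None`. The helper name `sdii` is hypothetical.

```python
def sdii(values: list, wet_threshold: float = 1.0) -> float:
    """Mean rainfall on wet time steps, in the style of ETCCDI SDII (hypothetical helper)."""
    wet = [v for v in values if v is not None and v >= wet_threshold]
    if not wet:
        return float("nan")  # assumption: no wet time steps gives NaN
    return sum(wet) / len(wet)


print(sdii([0.0, 2.0, 4.0, 0.5, None], wet_threshold=1.0))  # 3.0
```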

2.1.3.7. Module contents

Utility functions.