작성일자 : 2023-10-31

Ver 0.1.1

Datamanim_유형1_7_Merge,Concat

33 ~ 100 문제 풀이 : https://youtu.be/00rctVVSSoA?si=Du3KDMiQeHq7Gp-F

7. Merge, Concat ¶

국가별 5세이하 사망비율 통계 : https://www.kaggle.com/utkarshxy/who-worldhealth-statistics-2020-complete
데이터 변형
Dataurl = https://raw.githubusercontent.com/Datamanim/pandas/main/mergeTEst.csv

Q91. df1과 df2 데이터를 하나의 데이터 프레임으로 합쳐라

In [1]:

import pandas as pd

In [15]:

df = pd.read_csv('https://raw.githubusercontent.com/Datamanim/pandas/main/mergeTEst.csv', index_col=0)

In [16]:

df1 = df.iloc[:4,:]
df2 = df.iloc[4:,:]

display(df1)
display(df2)

	2010	2011	2012	2013	2014	2015	2016	2017	2018	2019
Location
Afghanistan	64.023	61.640	59.367	57.170	55.08	53.107	51.267	49.560	47.983	46.453
Albania	11.803	10.807	9.943	9.267	8.79	8.493	8.363	8.363	8.453	8.597
Algeria	23.540	22.907	22.450	22.117	21.85	21.587	21.257	20.850	20.407	19.930
Andorra	4.240	4.033	3.843	3.667	3.49	3.330	3.187	3.060	2.933	2.827

	2010	2011	2012	2013	2014	2015	2016	2017	2018	2019
Location
Angola	75.713	71.280	67.233	63.570	60.430	57.757	55.510	53.460	51.757	50.093
Antigua and Barbuda	8.667	8.223	7.807	7.420	7.070	6.757	6.483	6.230	6.000	5.783
Argentina	12.887	12.380	11.840	11.283	10.733	10.203	9.683	9.177	8.680	8.227
Armenia	16.497	15.677	14.897	14.170	13.477	12.817	12.183	11.583	11.007	10.497
Australia	3.993	3.803	3.623	3.467	3.343	3.253	3.183	3.137	3.090	3.047
Austria	3.573	3.463	3.333	3.210	3.113	3.043	2.987	2.943	2.897	2.843

In [17]:

pd.concat([df1,df2])

Out[17]:

	2010	2011	2012	2013	2014	2015	2016	2017	2018	2019
Location
Afghanistan	64.023	61.640	59.367	57.170	55.080	53.107	51.267	49.560	47.983	46.453
Albania	11.803	10.807	9.943	9.267	8.790	8.493	8.363	8.363	8.453	8.597
Algeria	23.540	22.907	22.450	22.117	21.850	21.587	21.257	20.850	20.407	19.930
Andorra	4.240	4.033	3.843	3.667	3.490	3.330	3.187	3.060	2.933	2.827
Angola	75.713	71.280	67.233	63.570	60.430	57.757	55.510	53.460	51.757	50.093
Antigua and Barbuda	8.667	8.223	7.807	7.420	7.070	6.757	6.483	6.230	6.000	5.783
Argentina	12.887	12.380	11.840	11.283	10.733	10.203	9.683	9.177	8.680	8.227
Armenia	16.497	15.677	14.897	14.170	13.477	12.817	12.183	11.583	11.007	10.497
Australia	3.993	3.803	3.623	3.467	3.343	3.253	3.183	3.137	3.090	3.047
Austria	3.573	3.463	3.333	3.210	3.113	3.043	2.987	2.943	2.897	2.843

Q92. df3과 df4 데이터를 하나의 데이터 프레임으로 합쳐라. 둘다 포함하고 있는 년도에 대해서만 고려한다

In [18]:

df3 = df.iloc[:2,:4]
df4 = df.iloc[5:,3:]

display(df3)
display(df4)

	2010	2011	2012	2013
Location
Afghanistan	64.023	61.640	59.367	57.170
Albania	11.803	10.807	9.943	9.267

	2013	2014	2015	2016	2017	2018	2019
Location
Antigua and Barbuda	7.420	7.070	6.757	6.483	6.230	6.000	5.783
Argentina	11.283	10.733	10.203	9.683	9.177	8.680	8.227
Armenia	14.170	13.477	12.817	12.183	11.583	11.007	10.497
Australia	3.467	3.343	3.253	3.183	3.137	3.090	3.047
Austria	3.210	3.113	3.043	2.987	2.943	2.897	2.843

In [19]:

pd.concat([df3, df4], join = 'inner')

Out[19]:

	2013
Location
Afghanistan	57.170
Albania	9.267
Antigua and Barbuda	7.420
Argentina	11.283
Armenia	14.170
Australia	3.467
Austria	3.210

Q93. df3과 df4 데이터를 하나의 데이터 프레임으로 합쳐라. 모든 컬럼을 포함하고, 결측치는 0으로 대체한다

In [11]:

help(pd.concat)

Help on function concat in module pandas.core.reshape.concat:

concat(objs: 'Iterable[NDFrame] | Mapping[HashableT, NDFrame]', *, axis: 'Axis' = 0, join: 'str' = 'outer', ignore_index: 'bool' = False, keys=None, levels=None, names=None, verify_integrity: 'bool' = False, sort: 'bool' = False, copy: 'bool | None' = None) -> 'DataFrame | Series'
    Concatenate pandas objects along a particular axis.
    
    Allows optional set logic along the other axes.
    
    Can also add a layer of hierarchical indexing on the concatenation axis,
    which may be useful if the labels are the same (or overlapping) on
    the passed axis number.
    
    Parameters
    ----------
    objs : a sequence or mapping of Series or DataFrame objects
        If a mapping is passed, the sorted keys will be used as the `keys`
        argument, unless it is passed, in which case the values will be
        selected (see below). Any None objects will be dropped silently unless
        they are all None in which case a ValueError will be raised.
    axis : {0/'index', 1/'columns'}, default 0
        The axis to concatenate along.
    join : {'inner', 'outer'}, default 'outer'
        How to handle indexes on other axis (or axes).
    ignore_index : bool, default False
        If True, do not use the index values along the concatenation axis. The
        resulting axis will be labeled 0, ..., n - 1. This is useful if you are
        concatenating objects where the concatenation axis does not have
        meaningful indexing information. Note the index values on the other
        axes are still respected in the join.
    keys : sequence, default None
        If multiple levels passed, should contain tuples. Construct
        hierarchical index using the passed keys as the outermost level.
    levels : list of sequences, default None
        Specific levels (unique values) to use for constructing a
        MultiIndex. Otherwise they will be inferred from the keys.
    names : list, default None
        Names for the levels in the resulting hierarchical index.
    verify_integrity : bool, default False
        Check whether the new concatenated axis contains duplicates. This can
        be very expensive relative to the actual data concatenation.
    sort : bool, default False
        Sort non-concatenation axis if it is not already aligned.
    
    copy : bool, default True
        If False, do not copy data unnecessarily.
    
    Returns
    -------
    object, type of objs
        When concatenating all ``Series`` along the index (axis=0), a
        ``Series`` is returned. When ``objs`` contains at least one
        ``DataFrame``, a ``DataFrame`` is returned. When concatenating along
        the columns (axis=1), a ``DataFrame`` is returned.
    
    See Also
    --------
    DataFrame.join : Join DataFrames using indexes.
    DataFrame.merge : Merge DataFrames by indexes or columns.
    
    Notes
    -----
    The keys, levels, and names arguments are all optional.
    
    A walkthrough of how this method fits in with other tools for combining
    pandas objects can be found `here
    <https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html>`__.
    
    It is not recommended to build DataFrames by adding single rows in a
    for loop. Build a list of rows and make a DataFrame in a single concat.
    
    Examples
    --------
    Combine two ``Series``.
    
    >>> s1 = pd.Series(['a', 'b'])
    >>> s2 = pd.Series(['c', 'd'])
    >>> pd.concat([s1, s2])
    0    a
    1    b
    0    c
    1    d
    dtype: object
    
    Clear the existing index and reset it in the result
    by setting the ``ignore_index`` option to ``True``.
    
    >>> pd.concat([s1, s2], ignore_index=True)
    0    a
    1    b
    2    c
    3    d
    dtype: object
    
    Add a hierarchical index at the outermost level of
    the data with the ``keys`` option.
    
    >>> pd.concat([s1, s2], keys=['s1', 's2'])
    s1  0    a
        1    b
    s2  0    c
        1    d
    dtype: object
    
    Label the index keys you create with the ``names`` option.
    
    >>> pd.concat([s1, s2], keys=['s1', 's2'],
    ...           names=['Series name', 'Row ID'])
    Series name  Row ID
    s1           0         a
                 1         b
    s2           0         c
                 1         d
    dtype: object
    
    Combine two ``DataFrame`` objects with identical columns.
    
    >>> df1 = pd.DataFrame([['a', 1], ['b', 2]],
    ...                    columns=['letter', 'number'])
    >>> df1
      letter  number
    0      a       1
    1      b       2
    >>> df2 = pd.DataFrame([['c', 3], ['d', 4]],
    ...                    columns=['letter', 'number'])
    >>> df2
      letter  number
    0      c       3
    1      d       4
    >>> pd.concat([df1, df2])
      letter  number
    0      a       1
    1      b       2
    0      c       3
    1      d       4
    
    Combine ``DataFrame`` objects with overlapping columns
    and return everything. Columns outside the intersection will
    be filled with ``NaN`` values.
    
    >>> df3 = pd.DataFrame([['c', 3, 'cat'], ['d', 4, 'dog']],
    ...                    columns=['letter', 'number', 'animal'])
    >>> df3
      letter  number animal
    0      c       3    cat
    1      d       4    dog
    >>> pd.concat([df1, df3], sort=False)
      letter  number animal
    0      a       1    NaN
    1      b       2    NaN
    0      c       3    cat
    1      d       4    dog
    
    Combine ``DataFrame`` objects with overlapping columns
    and return only those that are shared by passing ``inner`` to
    the ``join`` keyword argument.
    
    >>> pd.concat([df1, df3], join="inner")
      letter  number
    0      a       1
    1      b       2
    0      c       3
    1      d       4
    
    Combine ``DataFrame`` objects horizontally along the x axis by
    passing in ``axis=1``.
    
    >>> df4 = pd.DataFrame([['bird', 'polly'], ['monkey', 'george']],
    ...                    columns=['animal', 'name'])
    >>> pd.concat([df1, df4], axis=1)
      letter  number  animal    name
    0      a       1    bird   polly
    1      b       2  monkey  george
    
    Prevent the result from including duplicate index values with the
    ``verify_integrity`` option.
    
    >>> df5 = pd.DataFrame([1], index=['a'])
    >>> df5
       0
    a  1
    >>> df6 = pd.DataFrame([2], index=['a'])
    >>> df6
       0
    a  2
    >>> pd.concat([df5, df6], verify_integrity=True)
    Traceback (most recent call last):
        ...
    ValueError: Indexes have overlapping values: ['a']
    
    Append a single row to the end of a ``DataFrame`` object.
    
    >>> df7 = pd.DataFrame({'a': 1, 'b': 2}, index=[0])
    >>> df7
        a   b
    0   1   2
    >>> new_row = pd.Series({'a': 3, 'b': 4})
    >>> new_row
    a    3
    b    4
    dtype: int64
    >>> pd.concat([df7, new_row.to_frame().T], ignore_index=True)
        a   b
    0   1   2
    1   3   4

In [20]:

pd.concat([df3, df4], join = 'outer').fillna(0)

Out[20]:

	2010	2011	2012	2013	2014	2015	2016	2017	2018	2019
Location
Afghanistan	64.023	61.640	59.367	57.170	0.000	0.000	0.000	0.000	0.000	0.000
Albania	11.803	10.807	9.943	9.267	0.000	0.000	0.000	0.000	0.000	0.000
Antigua and Barbuda	0.000	0.000	0.000	7.420	7.070	6.757	6.483	6.230	6.000	5.783
Argentina	0.000	0.000	0.000	11.283	10.733	10.203	9.683	9.177	8.680	8.227
Armenia	0.000	0.000	0.000	14.170	13.477	12.817	12.183	11.583	11.007	10.497
Australia	0.000	0.000	0.000	3.467	3.343	3.253	3.183	3.137	3.090	3.047
Austria	0.000	0.000	0.000	3.210	3.113	3.043	2.987	2.943	2.897	2.843

Q94. df5과 df6 데이터를 하나의 데이터 프레임으로 merge함수를 이용하여 합쳐라. Algeria컬럼을 key로 하고 두 데이터 모두 포함하는 데이터만 출력하라

In [21]:

df5 = df.T.iloc[:7,:3]
df6 = df.T.iloc[6:,2:5]

display(df5)
display(df6)

Location	Afghanistan	Albania	Algeria
2010	64.023	11.803	23.540
2011	61.640	10.807	22.907
2012	59.367	9.943	22.450
2013	57.170	9.267	22.117
2014	55.080	8.790	21.850
2015	53.107	8.493	21.587
2016	51.267	8.363	21.257

Location	Algeria	Andorra	Angola
2016	21.257	3.187	55.510
2017	20.850	3.060	53.460
2018	20.407	2.933	51.757
2019	19.930	2.827	50.093

In [24]:

pd.merge(df5, df6, on = 'Algeria', how = 'inner')

Out[24]:

Location	Afghanistan	Albania	Algeria	Andorra	Angola
0	51.267	8.363	21.257	3.187	55.51

Q95. df5과 df6 데이터를 하나의 데이터 프레임으로 merge함수를 이용하여 합쳐라. Algeria컬럼을 key로 하고 합집합으로 합쳐라

In [25]:

pd.merge(df5, df6, on = 'Algeria', how = 'outer')

Out[25]:

Location	Afghanistan	Albania	Algeria	Andorra	Angola
0	64.023	11.803	23.540	NaN	NaN
1	61.640	10.807	22.907	NaN	NaN
2	59.367	9.943	22.450	NaN	NaN
3	57.170	9.267	22.117	NaN	NaN
4	55.080	8.790	21.850	NaN	NaN
5	53.107	8.493	21.587	NaN	NaN
6	51.267	8.363	21.257	3.187	55.510
7	NaN	NaN	20.850	3.060	53.460
8	NaN	NaN	20.407	2.933	51.757
9	NaN	NaN	19.930	2.827	50.093

저작자표시

일	월	화	수	목	금	토
		1	1	1	1	1
1	1	1	1	1	1	1
1	1	1	1	1	1	1
1	1	1	1	1	1	1
1	1	1	1

내 블로그 - 관리자 홈 전환	`Q` `Q`
새 글 쓰기	`W` `W`

글 수정 (권한 있는 경우)	`E` `E`
댓글 영역으로 이동	`C` `C`

이 페이지의 URL 복사	`S` `S`
맨 위로 이동	`T` `T`
티스토리 홈 이동	`H` `H`
단축키 안내	`Shift` + `/` `⇧` + `/`

[빅분기] 유형1_Merge,Concat

7. Merge, Concat ¶

7. Merge, Concat ¶

티스토리툴바

단축키

내 블로그

블로그 게시글

모든 영역

7. Merge, Concat¶

7. Merge, Concat¶

티스토리툴바

단축키

내 블로그

블로그 게시글

모든 영역

7. Merge, Concat ¶

7. Merge, Concat ¶